PHD Virtual Backup for XenServer
Some time ago I was contacted by PHD Virtual about their new product PHD Virtual Backup version 5.6. Since I’m always looking for new products to play with, and haven’t come across a backup solution for XenServer I really liked, their request piqued my interest and I decided to give it a try. I want to tell you, PHD’s product is one I would recommend. To learn more, read on.
There are two versions of the PHD Virtual Backup solution: one for VMware ESX and one for Citrix XenServer. Because “I don’t do VMware”, it was pretty easy to make a choice: PHD Virtual Backup for XenServer.
One of the bloggers I follow, Stephane Thirion a.k.a. ArchyNet, did a post earlier this year about this product on a high level (installation with screenshots and such) so I won’t be doing that 😉
1. Who or what is PHD Virtual?
First, let me introduce you to the vendor… PHD Virtual was founded in 2005 and was a pioneer in 2006 when they introduced their Virtual Backup Appliances to the world, which use virtual machines to back up virtual machines (a rather new concept back in those days). Over the years they have won a bunch of awards, such as a few from VMworld, and they now have over 4500 customers… They are a company that seems to be on a roll.
For more information: http://www.phdvirtual.com
2. I’m using “Automated VM Protection and Recovery”
In XenServer Enterprise edition and higher, a feature named “Automated VM Protection and Recovery” (a.k.a. VMPR) is included. This is a backup method based on snapshots, which you can schedule. I find it ideal in test environments, but I would never use it in a production environment.
It also does no compression of the data when taking the ‘backup’, as most backup solutions do. So from a storage/tape perspective this method consumes a lot…
And, on a personal note, using snapshots as a backup just makes my skin crawl 😉
So, this is a feature of Citrix and not PHD… so what does PHD bring to the table regarding this? Very simple: It does it better…
I won’t go into a deep dive here, but to give you an idea I’ve written down the (simplified!) steps:
1. The PHD VBA reads the metadata from the target and creates a snapshot.
2. The snapshot is attached to the PHD VBA as a virtual disk and the snapshot is removed.
3. The data is deduped, compressed and eventually sent to the backup storage location.
4. The virtual disk is detached and removed.
So yes, PHD Virtual does make use of a snapshot in the backup process, but only minimally… it converts the snapshot to a virtual disk as soon as possible to avoid the problems that may come along with snapshots.
3. PHD Virtual Backup
And now along comes PHD Virtual Backup for XenServer. To my pleasant surprise, it’s not only a simple backup solution but it also has some very nice additional features. The more interesting ones (from my point of view) I will explain and elaborate on later in this post.
First, let’s start with the backup feature. As I wrote, I don’t like solutions that use snapshots as backups. With PHD Virtual Backup, a snapshot is only taken briefly and is never kept as the backup itself, and that’s pretty cool, right?!
PHD Virtual Backup creates backups through an appliance named “PHD Virtual Backup Appliance” (VBA). The installation of the appliance is very, very easy and consists of 4 steps:
1. If the XenCenter client is running, close the XenCenter window.
2. Install phdvb.msi.
3. Import the PHD Virtual Appliance by importing PHDVBA.xva.
4. Right-click the appliance and click on PHD Virtual Backup -> Console to access the console.
Note: Depending on your infrastructure your appliance may not get an IP address from your DHCP server. If that’s the case, simply set a static or dynamic IP manually through the console.
Creating a backup through this product consists of 7 automated steps:
1. The VBA reads VM metadata and creates a snapshot (yes, there is a snapshot).
2. The snapshots are attached to the VBA as virtual disks.
3. The snapshot is removed / deleted.
4. The VBA deduplicates and compresses the data.
5. The VBA sends the data to the backup storage.
6. The VBA detaches the virtual disks.
7. The virtual disks are removed / deleted.
The big difference here between VMPR and PHD Virtual Backup is that the latter doesn’t use a snapshot as a backup. Instead it converts a snapshot into a virtual disk and uses that as the source for the backup. So, there is no active overhead on your VM, except for the moment the snapshot is created… and no snapshots are used as a backup! And when it’s done, it removes the virtual disks and of course also the snapshots: Nice ‘n clean…
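To make the order of those steps a bit more tangible, here is a minimal sketch in Python. It is a toy model and not PHD Virtual’s code: `ToyHost`, `temporary_disk` and `run_backup` are hypothetical names I made up, and the “host” only records the actions in a list. What it shows is that the snapshot is already gone before any data is processed, and that the temporary virtual disk is always cleaned up at the end.

```python
from contextlib import contextmanager


class ToyHost:
    """Hypothetical stand-in for a hypervisor; it only records actions."""

    def __init__(self):
        self.log = []

    def do(self, action: str) -> None:
        self.log.append(action)


@contextmanager
def temporary_disk(host: ToyHost, vm: str):
    """Snapshot the VM, expose it to the appliance as a virtual disk,
    drop the snapshot, and always detach and delete the disk afterwards."""
    host.do(f"read metadata of {vm}")
    host.do(f"create snapshot of {vm}")
    host.do("attach snapshot to VBA as virtual disk")
    host.do("remove snapshot")
    try:
        yield f"disk-of-{vm}"
    finally:
        # Cleanup runs even if the backup itself fails.
        host.do("detach virtual disk")
        host.do("remove virtual disk")


def run_backup(host: ToyHost, vm: str) -> None:
    with temporary_disk(host, vm) as disk:
        host.do(f"deduplicate and compress {disk}")
        host.do(f"send {disk} data to backup storage")


host = ToyHost()
run_backup(host, "vm01")
for step in host.log:
    print(step)
```

The context manager is just my way of expressing the ‘nice ‘n clean’ part: whatever happens during the backup, the detach and remove steps still run.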
Just to go a little bit further you’re even able to exclude a specific VHD from the backup job… and something I’ve encountered just by playing with the product: it automatically discovers new virtual machines! No need to import… sweeeeet!
The Exporter feature in PHD Virtual Backup allows you to export data from your backup storage to external media (USB / Tape / …?) for long-term and/or off-site storage. However, this feature can be very resource intrusive and it’s therefore recommended that you install it either on a physical machine or on a VM which has enough resources and is optimized for performance (no dynamic VHD/memory and such).
The Exporter can be installed with the “PHDVB_Exporter_Install” executable.
After this installation is finished, you’ll have to configure PHD Virtual Backup to use the Exporter feature with only 3 simple steps:
1. Provide a staging path.
2. Provide a backup storage location ( it would need a source, right? 😉 ).
3. Create an exporter job where you define the VM’s to backup.
Note: You can also add the job to Windows Task Scheduler, but it will be added without a set schedule. The advantage of this option is that you can specify that the backup runs whether the user is logged on or not, and that you can configure it for specific operating systems (Win7 / Srv2008R2 / …?).
PHD Virtual has some very interesting technologies included in their products, such as TrueDedupe™ and TrueRestore™ to name a few.
Let’s start with the nicest one: TrueDedupe™.
To put it in simple terms: it uses source-side de-duplication of data at block level, and compression of the data is also done at the VBA (I/O intrusive!). Because only de-duplicated and compressed data is sent over the network to the backup storage, network traffic is minimized. Next to this, the de-duplication is done against actual data and not against some arbitrary template. Here’s a nice overview:
PHD Virtual claims an average savings ratio of 25:1, which means a storage requirement saving of 90% (or more). This number is huge!
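For those who like to check the arithmetic behind a ratio like that: a reduction ratio of N:1 means you keep 1/N of the original data, so the saving is 1 - 1/N. The small Python helpers below are my own illustration (the function names are made up) and simply convert between the two.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of storage saved for a reduction ratio of `ratio`:1."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings: float) -> float:
    """Reduction ratio (the N in N:1) for a given saved fraction."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


print(f"25:1 saves {savings_from_ratio(25):.0%}")
print(f"90% saved corresponds to {ratio_from_savings(0.90):.0f}:1")
```

So 25:1 works out to a 96% saving, and a 90% saving already corresponds to 10:1. The second helper is handy for turning a savings percentage you measure yourself back into a ratio.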
Stubborn as I am, I have to see it to believe it… so I’ve provisioned some VMs and looked at the savings… in my environment it was ‘only’ 82%. And note that I really did my best to include very different data in the virtual machines, with the goal of not hitting that 90% savings ratio.
So I just have to say that I was a little flabbergasted by this (in a good way!).
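To give an idea of what block-level de-duplication plus compression means in general, here is a minimal sketch in Python. To be clear: this is the generic textbook technique and not how TrueDedupe™ is actually implemented, which I don’t know. The fixed 4 KB block size, the SHA-256 hashes and the dictionary used as ‘backup storage’ are all assumptions of mine.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def backup(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into blocks, store each unique block once (compressed),
    and return the list of block hashes needed to rebuild the data."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # only blocks not seen before are stored
            store[digest] = zlib.compress(block)
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data from its list of block hashes."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


store: dict[str, bytes] = {}
vm_a = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
vm_b = b"A" * BLOCK_SIZE * 2 + b"C" * BLOCK_SIZE
recipe_a = backup(vm_a, store)
recipe_b = backup(vm_b, store)

print(f"blocks backed up: {len(recipe_a) + len(recipe_b)}, unique blocks stored: {len(store)}")
```

In this toy example the two ‘VMs’ share their “A” blocks, so seven blocks are backed up but only three are stored. The more your virtual machines look alike (same OS, same applications), the better this works.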
Whenever you create a backup, there is a risk of a ‘corrupt backup’. Some common causes for this are bad sectors and incomplete data blocks. TrueRestore™ solves this problem by checking whether the backup is consistent with 4 simple steps.
1. Verify data blocks in-line during the backup.
2. Check data blocks on the backup target (background process).
3. Identify corrupt data blocks and flag them as ‘bad’.
4. Replace ‘bad’ data blocks with good blocks during the next backup.
This also goes for the restore process. By doing this, a successful restore can be guaranteed, you don’t have to do a manual consistency check or something like it, and you get a next to perfect assurance of consistent data.
Again in simple terms: OpenRestore™ allows for the restore of virtual machines without having to install software on the XenServer host. This is something I like a lot since I’m a big believer in always keeping a virtualization host as clean as possible.
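The four steps above boil down to keeping a checksum per block and comparing against it later. Here is a minimal Python sketch of that general idea; it is not TrueRestore™ itself, and the names (`find_bad_blocks`, `heal`) and the use of SHA-256 are my own assumptions.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def find_bad_blocks(stored: dict[int, bytes], expected: dict[int, str]) -> set[int]:
    """Return the indexes of stored blocks whose checksum no longer matches."""
    return {i for i, block in stored.items() if checksum(block) != expected[i]}


def heal(stored: dict[int, bytes], expected: dict[int, str],
         source: dict[int, bytes], bad: set[int]) -> None:
    """During the next backup, overwrite flagged blocks with fresh copies."""
    for i in bad:
        stored[i] = source[i]
        expected[i] = checksum(source[i])


source = {0: b"boot", 1: b"data", 2: b"logs"}
expected = {i: checksum(b) for i, b in source.items()}  # recorded in-line during backup
stored = dict(source)
stored[1] = b"dat\x00"  # simulate corruption on the backup target

bad = find_bad_blocks(stored, expected)  # background check flags the block as 'bad'
heal(stored, expected, source, bad)      # next backup replaces it with a good block
print(f"bad blocks found: {sorted(bad)}, remaining: {sorted(find_bad_blocks(stored, expected))}")
```

The same check can be run on each block right before a restore, which is where the confidence in a consistent restore comes from.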
4.4. Open Export
The Open Export feature allows for the export of backups to the standard and compressed OVF format. By using the OVF format, this feature also allows for faster backup to tape (or some other long-term storage media). Especially in large environments, the duration of the backup can be critical, so this feature might be very, very useful…
4.5. Direct to Storage restore
Again a feature to optimize the performance and resource utilization. Since this method of restore uses a connection directly to the storage and not over the network, it is far less intrusive in your virtual environment and, additionally, a lot faster.
4.6. VM Replication
For most common configurations two VBA’s are used, one in the DR site and one in the production site. The VBA on the DR site will contact the backup storage at your production site, find the changes and replicate them. It will only replicate the changes (deltas).
So when you configure a replication job, be sure to configure it to replicate after the backup job has finished.
It’s even possible to replicate with just one VBA but then you would use a ‘push’ like system. This could be useful when you don’t want, or don’t have the option to, configure a second VBA.
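Replicating only the changes is easy to picture if you think of the backup storage as a collection of blocks identified by a hash. The Python sketch below is my own simplified illustration of a ‘pull’ style replication under that assumption; the storage layout and the `replicate` function are hypothetical and not taken from the product.

```python
def replicate(production: dict[str, bytes], dr: dict[str, bytes]) -> list[str]:
    """Pull-style replication: copy only the blocks the DR store lacks.
    Both stores map a block hash to block data (an assumed layout)."""
    missing = [digest for digest in production if digest not in dr]
    for digest in missing:
        dr[digest] = production[digest]
    return missing


production = {"h1": b"block-1", "h2": b"block-2"}
dr_site: dict[str, bytes] = {}

first = replicate(production, dr_site)   # the initial sync copies everything
production["h3"] = b"block-3"            # a later backup adds one new block
second = replicate(production, dr_site)  # only the delta is transferred

print(f"first run copied {len(first)} blocks, second run copied {len(second)}")
```

This also shows why the replication job should run after the backup job: the new blocks have to be in the production backup storage before the DR side can find them.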
4.7 Storage Areas
NFS, CIFS and VMDK disk stores can be used as a storage area for the backup. This enables the product to be configured for diverse environments and to use most storage options.
Again, since PHD Virtual Backup takes a snapshot and converts that snapshot to a virtual disk, the only performance intrusion for the virtual machine is the time it takes for a snapshot to be created.
After that point, no connection to the original virtual machine is used.
From a hypervisor point of view, DOM 0 is not used for any I/O operations which means that the performance of the hypervisor itself is not in any danger (or at least from an I/O perspective 😉 ).
And just to make the list of performance benefits complete, it uses multi-stream methods for backups and restores… per PHD VBA! This product performs well and scales too. So yes, the more VBA’s you have, the more load you can handle (kinda “duuhhh” ).
6. Architecture overview
When we put all of this together (with some extra information taken from PHD Virtual’s website) in a picture, you’ll get an architecture overview:
Well, none actually. There is nothing in the product that rubs me the wrong way.
8. Potential future enhancements
However, I think that the following points would add some extra value to the product:
1. It doesn’t have a PowerShell module… and since I do looooove PowerShell, I missed this.
2. VM configuration changes are not replicated from the source to the replicated VM… for me this is a very important feature that it’s currently not able to provide.
3. It’s currently not possible to customize and/or fine tune things such as deduplication ratios and compression methods/algorithms.
4. Preferably I would like a ribbon that can be customized (initially with some basic tasks provided by the vendor at installation) and that learns and adjusts itself according to the tasks performed frequently. So, something like the ribbon in Microsoft Office 2007 and later.
5. The ability to configure ‘roles’ with appropriate rights. This would be ideal for environments where developers are allowed to restore their own virtual machines, but are not allowed to touch anything else in the backup environment. This is something I don’t see enough in other backup solutions and I think this would truly add value to the product.
Of course this has to be configurable on virtual machines, hosts and/or groups of resources.
6. Configure security settings on a group of virtual machines, for example those in your perimeter network or DMZ. This is where I would want additional (read: extreme) security settings, where for the rest of the environment more loose security settings are allowed. For smaller environments, it may not be desired to implement an additional VBA for this.
With those security settings I mean that communication to/from the objects in that group (read: servers) would be encrypted with the strongest encryption possible, using certificates, etc. Any security setting one could think of… whereas the internal environment may have more ‘loose’ security settings, for example due to performance requirements.
7. Configure scheduled times for synchronization between geographical locations. For example, when there is limited bandwidth between locations you may want to limit traffic to specific times. Although users can schedule their backup and replication jobs, I would like it to go a bit further…
Let’s take a hosting provider as an example. Here you may want to configure synchronization / replication settings between locations on a global level, but maybe also for groups and/or specific resources in a different way. Something like the way you configure Maintenance Windows in Microsoft’s SCCM and/or replication connections in a DFS(R) environment.
Looking at the future, you may want to replicate to a location in a cloud… I would want to specify which resources to replicate there, define the times it is allowed to replicate and even throttle the bandwidth for the replication.
I found PHD Virtual Backup to be a very easy-to-use solution which creates backups of virtual machines with limited intrusion on the performance of my environment. It also integrates with other enterprise backup solutions such as those from NetApp, HP, and Symantec.
Next, the VBA is very easy to deploy and has a friendly and easy-to-use management interface. PHD offers a solid product and it’s not even expensive.
Interested? Take a look at http://www.phdvirtual.com to try it out for free!