It’s the last day of VMworld and boy does my head hurt. Way too much information out there. Only 5 more sessions to present at today and then I’m done. I did get a small break to check email and such today and ran across a rather interesting blog posting from Marathon. In this posting they try to tell customers why VMware FT (fault tolerance) is so horrible. I’m fine with people talking bad about VMware as long as it’s accurate. In this case it’s nowhere close and obviously was written by someone that just doesn’t understand VMware or virtualization. I thought I’d take a second to make some corrections here. Go read the source article first. Most of it is quoted here for reference.
1. No component-level fault tolerance. The most common failures that result in unplanned downtime are component failures such as storage, NIC or controller failures. Yet VMware Fault Tolerance doesn’t do anything to protect against I/O, storage or network failures. By not addressing these primary sources of failures, VMware appears to be saying that you/the customer are on your own to figure out how to protect your storage and network connections. This may be okay for the very largest IT staffs in the world, but for the other 98%, it will not be sufficient.
VMware already has features to protect against component failure. If your NIC fails you’ve got NIC teaming built into the system. To set it up simply plug both NICs into the server, go into the network panel and attach both of them to the same virtual switch. Done. 4 clicks. Same thing for storage with the built-in SAN multipathing drivers. I absolutely agree with the author that component failures are the cause of most crashes and that’s why VMware added these features in 2002. VMware FT is not designed for component failure because there’s no sense in moving the VM to another host if you’ve simply lost a NIC uplink. NIC teaming will take care of that with ease and is a LOT cheaper than using CPU and memory resources on another host to overcome the failure.
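To make the point concrete, here is a toy model of what teaming buys you. This is only a sketch and not VMware code: the `Uplink` and `VirtualSwitch` classes and the `vmnic0`/`vmnic1` names are made up for illustration. It shows traffic moving to the surviving uplink while the VM stays right where it is.

```python
from dataclasses import dataclass, field


@dataclass
class Uplink:
    """A physical NIC attached to the virtual switch (hypothetical model)."""
    name: str
    link_up: bool = True


@dataclass
class VirtualSwitch:
    """Toy virtual switch with teamed uplinks; not a real VMware API."""
    uplinks: list = field(default_factory=list)

    def active_uplink(self):
        # Use the first uplink whose link is still up; None means the team is dead.
        for uplink in self.uplinks:
            if uplink.link_up:
                return uplink
        return None

    def send(self, frame: str) -> str:
        uplink = self.active_uplink()
        if uplink is None:
            raise RuntimeError("all uplinks down: host has lost network connectivity")
        return f"{frame} via {uplink.name}"


# Two NICs attached to the same virtual switch, as described above.
vswitch = VirtualSwitch([Uplink("vmnic0"), Uplink("vmnic1")])
before = vswitch.send("frame-1")

vswitch.uplinks[0].link_up = False  # simulate a single NIC failure
after = vswitch.send("frame-2")     # traffic fails over; the VM never moves
```

Only when every uplink in the team is gone does the host itself become the problem, and that is the kind of failure a host-level feature is meant for.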
2. Complexity on top of complexity. In order to use VMware Fault Tolerance, you’ll first have to install both VMware HA and DRS. No small feat in and of themselves. Then, because VMware FT requires NIC teaming, you’ll also have to manually install paired NICs. Then you’ll need to manually setup dual storage controllers (with the software to manage them) because it requires multi-pathing. And to top it all off, you’re required to use an expensive, and often complicated, SAN.
This is where it’s pretty obvious the author has never configured HA or DRS. Let me show you a picture of how hard this is.
See those two check boxes? Click them and you’ve just enabled HA and DRS. If that’s too hard then please comment and let me know how it could possibly be easier. Even my dog has figured out how to do this now. Granted, it’s a pretty smart dog.
As for setting up the dual NICs and dual HBAs, well yes, you have to actually plug the physical devices in. After you’ve done that the **built-in** NIC teaming and HBA drivers will take over and configure most everything for you. The NIC teaming does require 4 extra clicks. The HBA drivers actually figure out the failover paths, match them up, and set up the appropriate form of failover all auto-magically. They’ve been doing this since ESX 1.5 (6 years ago).
Lastly, yes, this requires shared storage. Pretty sure that most environments that want FT (no downtime whatsoever because our business could lose millions) already have a SAN to take advantage of other things virtualization related such as DRS and VMotion.
UPDATE: VMware FT does not require dual NICs or dual HBAs. This is something you **should** have in every virtualization setup that’s running VMs you care anything about but it’s not a requirement to get VMware FT running.
3. Limited CPU fault tolerance. With VMware FT, you’ll need to set up what VMware refers to as a “record/replay” capability on both a primary and secondary server. If something happens to the primary server, the record is stored on the SAN and then restarted on the secondary server. Two things to point out here. First, the whole thing depends on the quality of the SAN. Second, in the words of the VMware engineer who presented at VMworld, “this can take a couple of seconds.” So what happens to your application state in those couple of seconds?
So we’re back to the SAN argument. If you’re the type of company that requires absolutely no downtime for an app – if the app is just that critical – then I’m pretty sure you’re going to have a decent SAN. What’s a decent SAN? From many performance tests I’ve run it’s a broad category depending on the app but it ranges from small NAS appliances to high-end FC. Yes, iSCSI and NFS work great for most applications, even I/O intensive apps. But we’re back to the apps in question that are requiring FT. Those are the apps that usually are sitting on high-end storage. If you’re having so many problems with your SAN that you don’t trust it for FT then you have much bigger issues at hand that VMware or Marathon or any of the other virtualization related vendors aren’t going to help you with. It’s time for new arguments besides “be afraid of the SAN”. Even my father’s insurance business with 3 servers and 15 employees has shared storage in play.
UPDATE: There is some confusion here on what is stored on the SAN. VMware FT requires shared storage (NAS, iSCSI, or FC) to store the virtual disk for the VM. There is no actual “snapshot” for VMware FT. CPU instructions and memory are constantly streamed to the secondary server where they are consumed in real time. This is why the VMs stay in lock step with each other. No CPU or memory instructions are written to a SAN and resumed or anything like that. The virtual disk is stored on shared storage for a few different reasons. First, it’s already there if you’re using VMotion or DRS or VMware HA. Second, it’s a huge waste of disk space to replicate the actual disk file. Third, it takes a long time and a lot of bandwidth to constantly keep disk files in sync. Really the shared storage is the better architecture in this case.
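If the lock-step idea is hard to picture, here is a heavily simplified sketch. It is not how VMware implements FT. It assumes a toy “VM” that is just a deterministic state machine, and every name in it (`ToyVM`, `run_lockstep`, `channel`) is made up for illustration. The point is only that the stream goes host to host and is consumed immediately, so nothing is parked on the SAN to be resumed later.

```python
import random
from collections import deque


class ToyVM:
    """Deterministic state machine standing in for a VM's CPU and memory."""

    def __init__(self):
        self.state = 0

    def step(self, event: int) -> None:
        # The same sequence of events always produces the same state.
        self.state = (self.state * 31 + event) % 1_000_003


def run_lockstep(num_events: int, seed: int = 0):
    """Feed identical events to a primary and a secondary and return both states."""
    rng = random.Random(seed)
    primary, secondary = ToyVM(), ToyVM()
    channel = deque()  # stands in for the network link between the two hosts

    for _ in range(num_events):
        event = rng.randrange(256)         # an unpredictable input, e.g. a packet
        primary.step(event)
        channel.append(event)              # streamed to the other host, not to disk
        secondary.step(channel.popleft())  # consumed right away on the secondary

    return primary.state, secondary.state


primary_state, secondary_state = run_lockstep(1000)
```

Both states come out identical, so in this toy model the secondary has nothing to “restart” if the primary dies; it is already in the same place.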
4. For VMware virtual environments only. VMware FT will only work in VMware environments. It won’t work with other hypervisors, and most importantly, you can’t use it for business critical and mission critical applications that you want to keep on physical server platforms (i.e., non-virtualized environments which still represent the vast majority of customer use cases). Oh well, only the vast majority of critical applications run in physical environments anyway.
This is a funny argument. You’re complaining that a VMware feature works only with VMware environments. I guess I could see that as a valid argument if you’re Marathon and want to play with everyone. Only problem is the Marathon stuff doesn’t work with VMware (Citrix only) so the same argument could be reversed in this case.
The bottom line with all of this is simple: try to make some valid, accurate statements when you’re talking about competitors. At least then people might believe the rest of what you say. Hopefully the author here will take some time to play with the VMware setups to see what he’s really competing against. There are FREE evals here just in case.
UPDATE: This wasn’t talked about but Marathon’s virtualization FT only works with Windows 2003 Standard or Enterprise SP1 today. VMware FT works with any of the over 70 certified guest operating systems that run on Virtual Infrastructure. The Marathon solution also sits deeply embedded within the OS. From their FAQ:
What is involved in migrating our existing applications to a Marathon environment?
The servers to be used for the Marathon environment will need to be configured with just a base Windows OS installed. The Marathon software is then installed on top of these environments to create the virtual Windows environment, on which applications can then be installed. For existing servers, Marathon and its partners can work with you to develop a migration plan that assures minimal impact to users.
This also impacts your ability to patch systems using Marathon products since some patches could impact these deep integration points. Again from the FAQ:
How does Marathon qualify Windows security patches?
Because of their critical nature, we screen and test Microsoft Security Updates that apply to Windows 2000 or Windows 2003 and are posted on Microsoft’s automatic update website area. In the majority of cases, Windows security updates are fully compatible with Marathon products. In the rare cases where an issue is found, we post an advisory on our support website knowledgebase and provide an update to resolve the issue.
With VMware’s solution on the other hand the operating system is untouched and can be installed, patched, and operated normally. Obviously it’s time for a much deeper dive into the real differences between these two solutions.
**CAUTION**: The commenter with the name TopGun below actually works for Stratus – a competitor (sort of) to VMware FT. You can pretty much ignore his rants as a “customer” since that’s obviously a lie. Read the whole story here.

