I was reading through some of my daily Google Alerts that I have setup and came across an article on SQL Solutions talking about how it’s a bad idea to run SQL in a VM because you could loose transactions in a power failure. Now normally I would comment directly on the website but this site (like all of the search* sites) doesn’t provide a mechanism for comments. That’s just stupid in my opinion. I then tried to contact the author 3 times but no response. So here I am on my blog trying to do some education and set the author straight. If you’re reading this, Alex, then please contact me so we can talk about your setup. Now on to the corrections.
Background
The premise of Alex’s article is that VMs will write to the host server’s cache and thus if you had a power failure not all of the SQL transactions would be written to disk and the database will be corrupt. Alex doesn’t give a lot of insight into his setup like what version or product from VMware he’s using. Alex also doesn’t list out his host operating system which does matter as you’ll see below. He just states everything as VMware. Here’s the picture with the architecture diagram from his site.
Education
As you can see he must be running one of VMware’s hosted products such as VMware Server, VMware Workstation, or VMware Fusion. Now is the time to give some insight into how writes are handled in the different hosted products.
- On Linux hosts, VMware does not use unbuffered IO (i.e. O_DIRECT), as it’s not safe/supported across all the Linux versions that VMware cares about. So currently, VMware hosted products always use buffered IO for all IOs (i.e. guest-initiated IOs, VMware internal IOs, etc.).
- On Windows hosts, by default VMware uses unbuffered IO.
- On Mac OS (for Fusion), VMware uses buffered IO by default currently (because early Mac OS versions had some kernel bugs that caused dead-locks when unbuffered IO was used). However there is a likelihood that VMware may start using unbuffered IO for Fusion at some point in the future.
Given that the site is dedicated to SQL server which runs on Windows I’m going to make an assumption that Alex’s tests were run with a Windows host in which case there was some misconfiguration since all IO done by the guest is unbuffered which means no write acknowledgment is done without first receiving it from the storage array. This is why I’m sitting here scratching my head on how Alex got stuff to fail. Given some of the other comments in the post (“excluding “toy” databases like MySQL”) and the generalization of “VMware” I’m sure this is just an education thing that needs to happen.
Conclusion
It is true that in ANY physical or virtualized deployment, poorly designed infrastructure (storage, servers, UPS etc) can lead to instability of SQL Server or any other database server for that matter. Notwithstanding such poor designs, customers can be assured that SQL Server running on VMware ESX (and even the hosted products when setup properly) will always be a reliable and recoverable system that maintains application and data consistency just as it is deployed on physical systems.
It is possible that the tests in the article and the inaccurate conclusions reached therein may be based on testing SQL Server on the hosted VMware Server. Certain host OS’s perform (non-persistent) write-back caching, a default behavior of the host OS that could lead to behavior such as that described. I’m happy to help root cause any such point case if it is clarified/valid, and will then define related best practices to ensure that customers do not use bad configurations, even if they are using VMware Server and not VMware ESX.
I can reproduce this same problem on a physical server attached to a DMX or any other array for that matter or even local DAS storage by disconnecting the batteries that keep cache alive to flush IO in flight when a power failure occurs. This is why every cache system, server, and storage array that is “datacenter” grade is required to have a UPS. Additionally this is why every “datacenter” grade storage configuration should have at least two paths to the disk. The whole article seems to be written on the assumption that infrastructure owners and application owners would simply throw out these principles when running their mission critical SQL databases.
When you put VMware ESX in the stack, the above still doesn’t change. VMware still uses UPS’s and multipath to the storage and highly recommends this best practice configuration.
UPDATE (February 11, 2009): A new KB article has been written describing the methods VMware uses to write data to disk. Go here for more information.



January 27th, 2009 at 2:09 am
Agreed. Glad you read and point it out. That article on SQL solutions is totally a misleading.
Cheers, Kent
January 27th, 2009 at 4:22 pm
So, yes you could lose data if you lose power, but how would it differ running on VMware vs Physical host if both lost power?
funny.
Roger L
January 27th, 2009 at 5:46 pm
To address Roger's question, if you are running SQL Server in a Virtual Machine on an ESX server, or a hosted VMware product that has been configured correctly, the result of a loss of power would be the same recoverable and crash consistent copy of the database being available for both the phyisical and virtual environments. In other words there would be no difference in what you got after recovery were complete.
That is the point which Mike is clearing up as the SQLSolutions article suggested you would get a differenct result and that is not correct. Hopefully this is now clear. The physical and virtual both provide a solid foundation for running a SQL Server and both provide a crash consistant and recoverable database.
February 8th, 2009 at 2:57 pm
[...] http://www.mikedipetrillo.com/mikedvirtualization/2009/01/running-sql-server-on-vmware-products-is-s... [...]
February 10th, 2009 at 12:14 pm
[...] Running SQL Server on VMware Products is Safe – Post dispelling some specific anti-SQL Server on VMware FUD. Post a comment — Trackback URI RSS 2.0 feed for these comments This entry (permalink) was posted on Tuesday, February 10, 2009, at 9:14 am by admin. Filed in Links and tagged active directory, authentication, fud, kerberos. [...]
February 12th, 2009 at 9:03 am
[...] couple of weeks ago I posted about SQL Server running in a VM and how disk writes were assured to happen in contradiction to a [...]
January 22nd, 2010 at 4:15 pm
Hello Mike,
I'm glad I happen to 'find' your blog about VMWare.
Not too familiar with it myself, but I'm still digesting the above as you've written. Hm……the situation with the UPS, that sounds familiar. Know of some cases where people have panicked of data loss due to power outages WHILE backing up 'precious' files from computer to an EXTERNAL HD. Believe it or not.
Does the same principle apply?
I'm still learning. New to the profession and on my way to carving out a career as a DBA. Please keep up the blog.
Incidentally, I'm using a Mac. I'm struggling/trying to grasp how to run VMWare on a Mac OSX to put up Oracle for personal developing purposes/leaning.
Any suggestions or advise from a pro is greatly appreciated from this 'green horn'.
Thank you.
January 22nd, 2010 at 11:15 pm
Hello Mike,
I'm glad I happen to 'find' your blog about VMWare.
Not too familiar with it myself, but I'm still digesting the above as you've written. Hm……the situation with the UPS, that sounds familiar. Know of some cases where people have panicked of data loss due to power outages WHILE backing up 'precious' files from computer to an EXTERNAL HD. Believe it or not.
Does the same principle apply?
I'm still learning. New to the profession and on my way to carving out a career as a DBA. Please keep up the blog.
Incidentally, I'm using a Mac. I'm struggling/trying to grasp how to run VMWare on a Mac OSX to put up Oracle for personal developing purposes/leaning.
Any suggestions or advise from a pro is greatly appreciated from this 'green horn'.
Thank you.