With all the talk around the industry about solid-state PCIe-based storage, and storage vendors such as Netapp releasing virtual storage appliances, I wanted to investigate whether these were solutions I could use for my customers.
As such I recently requested a FusionIO IODrive2 card from Simon Williams (@simwilli) at FusionIO so I could test how these cards perform for VMware virtual machines.
I installed the IODrive2 card into my IBM x3850 M2 server; for more details about the server, check out the “My Lab” page. Note that I have since upgraded my host to vSphere 5.1 Release 799733.
The IODrive2 card is installed in a PCIe x8 25W slot. See below for the maximum performance of this slot type, courtesy of “HowStuffWorks”. These are theoretical maximums, and there are a number of factors which will impact real-world performance.
Here is the quoted performance from the FusionIO website.
For this test, in an attempt to ensure the virtual machine configuration is not a bottleneck, I created a virtual machine with the following configuration.
Windows 2008 R2 Enterprise, Virtual Machine Hardware Version 9 (vSphere 5.1), 4 vCPU, 8GB vRAM, 4 x PVSCSI adapters, 5 x vDisks: 1 for the OS and 4 in a RAID0 stripe.
I have the OS drive and 2 vDisks on PVSCSI adapter 1, and 2 vDisks on each of the remaining PVSCSI adapters.
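For anyone wanting to reproduce a similar layout, the sketch below shows roughly how it could be scripted with PowerCLI. The vCenter, VM and datastore names and the disk sizes are placeholders, not the exact commands I used.
# Rough PowerCLI sketch: data vDisks spread across paravirtual SCSI controllers
Connect-VIServer -Server vcenter.lab.local
$vm = Get-VM -Name "IO-Test-VM"
# First data vDisk shares the existing PVSCSI controller with the OS disk
New-HardDisk -VM $vm -CapacityGB 50 -Datastore "fusionio-ds" | Out-Null
# The remaining data vDisks each get a new paravirtual SCSI controller
1..3 | ForEach-Object {
    $disk = New-HardDisk -VM $vm -CapacityGB 50 -Datastore "fusionio-ds"
    New-ScsiController -HardDisk $disk -Type ParaVirtual | Out-Null
}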
Note: I also tested with additional vDisks in the VM, and performance remained within +/-2%, so I don’t believe the virtual machine configuration was a limiting factor for this test. I also tested two VMs concurrently running the benchmarking software and could not achieve any higher aggregate performance in any of the tests.
This suggests the VMware hypervisor was not the bottleneck for storage performance in these tests.
See below for the configuration of the test VM.
Here is what it looks like in Windows.
Below is the configuration in Windows Disk Management.
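As a rough guide to how such a stripe can be built, the diskpart sketch below creates a RAID0 (striped) volume across four dynamic disks. The disk numbers, volume label and drive letter are assumptions, not necessarily what I used.
# Hypothetical sketch: build the RAID0 stripe via a diskpart script driven from PowerShell
$stripe = @"
select disk 1
convert dynamic
select disk 2
convert dynamic
select disk 3
convert dynamic
select disk 4
convert dynamic
create volume stripe disk=1,2,3,4
format fs=ntfs label=FusionIO quick
assign letter=F
"@
$stripe | Set-Content -Path stripe.txt
diskpart /s stripe.txt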
To test performance, I was planning to use “CrystalDiskMark x64” and VMware’s IO Analyzer 1.1 appliance (which uses IOMeter); however, IO Analyzer does not appear to work on vSphere 5.1, although in fairness to the product I didn’t make any serious attempt to troubleshoot the issue.
Therefore I downloaded “SQLIO” from Microsoft.
Let’s start with CrystalDiskMark, which to be honest I haven’t used before, so let’s see how it goes.
See below for the tests being performed.
To perform the tests I simply hit the “All” button in the top left of the picture above.
To ensure the results were not skewed for any reason, I repeated each test three times, and before running the tests I checked the performance graphs on the ESXi host and ensured there was minimal (<1MBps) disk activity.
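For those wanting to script that pre-flight check rather than eyeball the graphs, something like the PowerCLI sketch below would work; the host name is a placeholder and this is not the exact method I used.
# Sketch: pull realtime aggregate disk throughput for the ESXi host (Value is reported in KBps)
Get-Stat -Entity (Get-VMHost -Name "esxi01.lab.local") -Stat "disk.usage.average" -Realtime -MaxSamples 6 |
    Select-Object Timestamp, Value, Unit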
Test One Results
Test Two Results
Test Three Results
So overall, very consistent results.
Out of interest, I then added another 4 disks (one per PVSCSI controller) and re-ran the test to see if the virtual machine configuration was a limitation.
The results are below and show a minor increase in sequential read, but overall no change worth mentioning.
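If you want to script that step, the hedged PowerCLI sketch below (reusing $vm from the earlier sketch; capacity and datastore name are placeholders) adds one additional vDisk to each existing controller.
# Sketch: one extra vDisk per existing SCSI controller
Get-ScsiController -VM $vm | ForEach-Object {
    New-HardDisk -VM $vm -CapacityGB 50 -Datastore "fusionio-ds" -Controller $_ | Out-Null
}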
I then used the SQLIO tool, which allows much more granular tests.
So in an attempt to get close to the advertised performance, I started with a sequential read with 1024KB I/O size and a queue depth of 32.
The initial results were, I thought, pretty good: 889MBps, and obviously the same number of IOPS (889) due to the 1024KB IO size.
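For reference, that initial run equates to an SQLIO invocation along the lines of the sketch below; the test file path and size in param.txt are assumptions rather than my exact setup.
# Sketch: 1024KB sequential reads for 60 seconds, 32 outstanding IOs, unbuffered, with latency stats
# param.txt contains:  F:\testfile.dat 1 0x0 8192   (file path, threads, affinity mask, size in MB)
.\sqlio.exe -kR -s60 -fsequential -o32 -b1024 -LS -BN -Fparam.txt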
Next, I completed the following tests, each for a duration of 60 seconds, at IO sizes of 1024, 512, 256, 128, 64, 32, 16, 8 and 4KB (a scripted sketch of the sweep follows the list below).
1. Sequential READ , Queue depth 32
2. Random READ , Queue depth 32
3. Sequential WRITE , Queue depth 32
4. Random WRITE , Queue depth 32
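The whole matrix can be driven from a simple PowerShell loop along the lines of the sketch below; it reuses the same flags and param.txt assumption as the single command above and is not my exact script.
# Sketch: sweep all block sizes for sequential/random reads and writes, 60 seconds each, queue depth 32
$blockSizes = 1024,512,256,128,64,32,16,8,4
foreach ($bs in $blockSizes) {
    foreach ($pattern in "sequential","random") {
        .\sqlio.exe -kR -s60 "-f$pattern" -o32 "-b$bs" -LS -BN -Fparam.txt   # read pass
        .\sqlio.exe -kW -s60 "-f$pattern" -o32 "-b$bs" -LS -BN -Fparam.txt   # write pass
    }
}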
The results from the CLI are all shown below in thumbnail format; you can click a thumbnail to see the full-size results.
1024KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
889MBps / 889IOPS | 888MBps / 888IOPS | 317MBps / 317IOPS | 307MBps / 307IOPS
512KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
886MBps / 1773IOPS | 893MBps / 1786IOPS | 321MBps / 642IOPS | 321MBps / 642IOPS
256KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
894MBps / 3577IOPS | 800MBps / 3203IOPS | 323MBps / 1294IOPS | 320MBps / 1280IOPS
128KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
881MBps / 7055IOPS | 806MBps / 888IOPS | 320MBps / 2565IOPS | 321MBps / 6454IOPS
64KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
636MBps / 10189IOPS | 669MBps / 10711IOPS | 321MBps / 5146IOPS | 321MBps / 5142IOPS
32KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
504MBps / 16141IOPS | 486MBps / 15564IOPS | 318MBps / 10186IOPS | 319MBps / 10212IOPS
16KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
306MBps / 19621IOPS | 307MBps / 19671IOPS | 300MBps / 19251IOPS | 290MBps / 18618IOPS
8KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
179MBps / 22975IOPS | 174MBps / 22346IOPS | 169MBps / 21722IOPS | 167MBps / 21493IOPS
4KB Block Size
Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
92MBps / 23761IOPS | 90MBps / 23236IOPS | 89MBps / 22845IOPS | 83MBps / 21378IOPS
Summary of Sequential Read Performance
The sequential read test is where you get the largest numbers, and these are generally the numbers that get advertised, although with the larger block sizes the tests do not represent real-world disk activity.
What is interesting is that I reached the saturation point with just 128KB IOs: 7,055 IOPS at 128KB works out to roughly 881MBps, essentially the same throughput ceiling as the 889MBps achieved with 1024KB IOs.
The performance in these tests was, in my opinion, very good considering the age of the server the card was tested in.
Summary of Sequential Write Performance
I was again surprised at how quickly I reached maximum performance: 16KB IOs achieved over 90% of the sequential write throughput of even 512KB and 1024KB IOs. It would be interesting to see whether this remained the case on a faster test server.
Summary of Random Read Performance
The random read performance is, for me, quite impressive. For applications such as MS SQL, which reads (and writes) in 64KB blocks, the FusionIO card in even an older server will deliver >650MBps of random read performance. Getting that sort of performance out of traditional DAS or SAN storage would require a lot of spindles and cache!
Summary of Random Write Performance
The random write performance is similarly impressive. For applications such as MS SQL, which writes in 64KB blocks, the FusionIO card in even an older server will deliver >300MBps of random write performance. As with the random read performance above, try getting that out of traditional DAS/SAN storage.
Conclusion
I would like to encourage storage vendors to provide details on how they benchmark their products along with real world examples for things like VMware View / SQL / Oracle etc. This would go a long way to helping customers and consultants decide what products may work for them.
Regarding my specific testing: although I suspected this before I began, the older x3850 M2 hardware is clearly a bottleneck for such a high-performance card as the IODrive2.
In other tests I have read, such as Michael Webster’s (IO Blazing Datastore Performance with Fusion-io), these cards are capable of significantly higher performance. Michael’s tests were conducted on relatively new Dell T710s with Westmere-spec CPUs, and he was able to get much closer to the advertised performance figures.
Even in an older server such as my x3850 M2, the FusionIO card performs very well and is a much better alternative than traditional DAS storage, or even some of today’s SAN arrays, which would require numerous spindles to get anywhere near the FusionIO card’s performance. Other DAS/SAN/NAS solutions would also likely be much more expensive.
I can see this style of card playing a large part in enterprise storage in the future. With vendors such as Netapp partnering with FusionIO, it is clear there are big plans for this technology.
One use case I can see in the not too distant future is using storage appliances such as the Netapp Edge VSA to share FusionIO (DAS) storage to vSphere clusters in OnTap Cluster mode.
I am planning several more benchmark-style posts, including one relating to VMware View floating-pool desktop deployments as a demonstration of one of the many use cases for Fusion-io IODrive2 cards, so stay tuned.
Pingback: IO Blazing Datastore Performance with Fusion-io « Long White Virtual Clouds
I think it’s “Fusion-io” not “FusionIO” 🙂
I believe you are correct sir.
To my understanding, the IBM x3850 M2’s PCIe slots are version 1.0, not 2.0, so the bandwidth will be half. Is that right?
I believe you’re correct, and this is why my results were lower than the capabilities of the Fusion-io IODrive2. What I have demonstrated is that the card still performs exceptionally well even on older hardware.
Hi, how about benchmarking it within the VMware VSA and Netapp VSA?
Then we might see whether we should scrap our midrange SAN arrays ASAP 🙂
I plan to test the performance of the Netapp Edge VSA running on the Fusion-io card. Stay tuned.
OFC i am 🙂
Hi Josh, great writeup and thanks for the pingback. I’m wondering if the RAID0 striping in the guest was a limiting factor for your performance also. During my tests I split the IO load over individual files that were placed on each of the individual virtual disks and then read or wrote to them in parallel. This gave my test harness (IO Blazer) access to more OS-level IO subsystem queues without the software RAID overhead. This simulates the way an OLTP database might be laid out. It would be interesting to see what difference, if any, this makes with the hardware you’re running.