FusionIO IODrive2 Virtual Machine performance benchmarking (Part 1)

With all the talk around the industry about solid state PCIe based storage, and storage vendors such as Netapp releasing virtual storage appliances, I wanted to investigate whether these were solutions I could use for my customers.

As such I recently requested a FusionIO IODrive2 card from Simon Williams (@simwilli) at FusionIO so I could test how these cards perform for VMware virtual machines.

I installed the IODrive2 card into my IBM x3850 M2 server. For more details about the server, check out the “My Lab” page; note that I have since upgraded my host to vSphere 5.1 (build 799733).

The IODrive2 card is installed in a PCIe x8 25W slot. See below for the maximum performance figures, courtesy of “HowStuffWorks”. These are theoretical maximums, although there are a number of factors which will impact real-world performance.
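
For a rough sense of what the slot itself can move, the theoretical per-direction bandwidth of a PCIe link can be worked out from the lane count and signalling rate. The sketch below is a minimal calculation assuming standard 8b/10b encoding and the usual 2.5 GT/s (PCIe 1.x) and 5 GT/s (PCIe 2.0) rates; which generation the x3850 M2 slots actually negotiate is not something I have verified, so treat these as ballpark ceilings only.

```python
# Rough theoretical per-direction PCIe bandwidth from lane count and
# signalling rate. 8b/10b encoding means only 80% of raw bits are payload.

def pcie_bandwidth_mb_s(lanes, gt_per_s, encoding_efficiency=0.8):
    """Approximate usable MB/s in one direction for a PCIe link."""
    payload_bits_per_s = lanes * gt_per_s * 1e9 * encoding_efficiency
    return payload_bits_per_s / 8 / 1e6  # bits -> bytes -> MB

if __name__ == "__main__":
    for gen, rate in (("PCIe 1.x", 2.5), ("PCIe 2.0", 5.0)):
        print(f"{gen} x8: ~{pcie_bandwidth_mb_s(8, rate):.0f} MB/s per direction")
```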

Here is the quoted performance from the FusionIO website.

For this test, in an attempt to ensure the virtual machine configuration is not a bottleneck, I created a virtual machine with the following configuration.

Windows 2008 R2 Enterprise, Virtual Machine Hardware Version 9 (vSphere 5.1), 4 vCPUs, 8GB vRAM, 4 x PVSCSI adapters, 5 x vDisks (1 for the OS and 4 in a RAID0 stripe).

I have the OS drive and 2 vDisks on PVSCSI adapter 1, and 2 vDisks on each of the remaining PVSCSI adapters.
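
For anyone wanting to script a layout like this rather than click through the vSphere Client, below is a minimal pyVmomi sketch that adds a single PVSCSI controller and one data vDisk to an existing VM. The host name, credentials, VM name, bus number and 50GB disk size are all placeholder assumptions, and this is purely illustrative of the controller-to-disk mapping rather than a record of how my test VM was built; repeat the same pattern for each additional controller and vDisk.

```python
# Minimal pyVmomi sketch (illustrative only): add one PVSCSI controller
# and one data vDisk to an existing VM. Host, credentials, VM name,
# bus number and disk size are placeholders, not my actual lab values.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab host with a self-signed cert
si = SmartConnect(host="esxi-host.lab.local", user="root",
                  pwd="password", sslContext=ctx)
vm = si.RetrieveContent().searchIndex.FindByDnsName(
    datacenter=None, dnsName="bench-vm", vmSearch=True)

# New PVSCSI controller on SCSI bus 1 (bus 0 already carries the OS disk).
# The negative key is a temporary placeholder resolved by ESXi/vCenter.
ctrl = vim.vm.device.ParaVirtualSCSIController(
    key=-101, busNumber=1,
    sharedBus=vim.vm.device.VirtualSCSIController.Sharing.noSharing)
ctrl_spec = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.add, device=ctrl)

# New 50GB thick vDisk attached to that controller (controllerKey references
# the temporary key above; unitNumber 0 is the first slot on the new bus).
backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
    fileName="", diskMode="persistent", thinProvisioned=False)
disk = vim.vm.device.VirtualDisk(
    backing=backing, controllerKey=-101, unitNumber=0,
    capacityInKB=50 * 1024 * 1024)
disk_spec = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
    fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
    device=disk)

task = vm.ReconfigVM_Task(
    spec=vim.vm.ConfigSpec(deviceChange=[ctrl_spec, disk_spec]))
print("Reconfigure task submitted:", task.info.key)
Disconnect(si)
```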

Note: I also tested adding additional vDisks to the VM, and the performance remained within +/-2%, so I don’t believe this was a limiting factor for this test. I also tested two VMs concurrently running the benchmarking software and could not achieve any higher performance in any of the tests.

This suggests the VMware hypervisor was not a bottleneck for storage performance in these tests.

See below for the configuration of the test VM.

Here is what it looks like in Windows.

Below is the configuration in Windows Disk Management.

To test performance, I was planning to use “CrystalDiskMark x64” and VMware’s IO Analyzer 1.1 appliance (which uses IOMeter); however, the IO Analyzer does not appear to work on vSphere 5.1, although in fairness to the product I didn’t make any serious attempt to troubleshoot the issue.

Therefore, I downloaded “SQLIO” from Microsoft.

Let’s start with CrystalDiskMark, which, to be honest, I haven’t used before, so let’s see how it goes.

See below for the tests being performed.

To perform the tests I simply hit the “All” button in the top left of the picture above.

To ensure the results were not skewed for any reason, I repeated each test three times, and before running the tests I checked the performance graphs for the ESXi host and ensured there was minimal (<1MBps) disk activity.

Test One Results

Test Two Results

Test Three Results

So overall, very consistent results.

Out of interest, I then added another 4 disks (one per PVSCSI controller) and re-ran the test to see if the virtual machine configuration was a limitation.

The results are below and show a minor increase in sequential read, but overall no change worth mentioning.

I then used the SQLIO tool, which can perform much more granular tests.

So, in an attempt to get close to the advertised performance, I started with a sequential read with a 1024KB IO size and a queue depth of 32.

The initial results were, I thought, pretty good: 889MBps and, due to the 1MB IO size, the same number of IOPS (889).
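
Because the IO size fixes the ratio between throughput and IOPS, either number can always be derived from the other. Here is a quick Python helper showing the relationship, using the 889MBps result above (and assuming SQLIO’s “MB” means 1024KB, which that run suggests):

```python
# IOPS and throughput are two views of the same measurement once the IO
# size is fixed. Assumes SQLIO's "MB" means 1024KB, which the 1024KB run
# above suggests (889MBps reported exactly 889 IOPS).
def iops_from_throughput(mb_per_sec, io_size_kb):
    return mb_per_sec * 1024 / io_size_kb

print(iops_from_throughput(889, 1024))  # 1024KB IOs -> 889.0 IOPS
print(iops_from_throughput(306, 16))    # 16KB IOs   -> 19584.0 (~19621 reported later)
```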

Next I ran the following tests, each for a duration of 60 seconds, for IO sizes of 1024, 512, 256, 128, 64, 32, 16, 8 and 4KB (a scripted sketch of this sweep follows the list below).

1. Sequential READ, queue depth 32
2. Random READ, queue depth 32
3. Sequential WRITE, queue depth 32
4. Random WRITE, queue depth 32
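
To keep the sweep consistent across all of those combinations, the runs can be scripted. Below is a rough Python sketch of how such a sweep can be driven; the drive letter, test file and thread count are placeholder assumptions rather than my exact setup, and only the core SQLIO switches are used (-kR/-kW for read/write, -fsequential/-frandom for access pattern, -b for IO size in KB, -o for outstanding IOs, -s for duration and -LS for latency statistics), leaving everything else at its defaults.

```python
# Sketch of the SQLIO sweep: four access patterns across nine IO sizes.
# Assumes sqlio.exe is on the PATH and E:\testfile.dat already exists on
# the striped volume; the path and thread count are placeholders.
import subprocess

IO_SIZES_KB = [1024, 512, 256, 128, 64, 32, 16, 8, 4]
TESTS = [
    ("sequential read",  ["-kR", "-fsequential"]),
    ("random read",      ["-kR", "-frandom"]),
    ("sequential write", ["-kW", "-fsequential"]),
    ("random write",     ["-kW", "-frandom"]),
]

for size_kb in IO_SIZES_KB:
    for name, pattern in TESTS:
        cmd = ["sqlio"] + pattern + [
            "-o32",              # 32 outstanding IOs (queue depth)
            f"-b{size_kb}",      # IO size in KB
            "-s60",              # 60 second run
            "-t4",               # worker threads (placeholder)
            "-LS",               # capture latency statistics
            r"E:\testfile.dat",  # test file on the RAID0 stripe
        ]
        print(f"--- {size_kb}KB {name} ---")
        subprocess.run(cmd, check=True)
```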

The results from the CLI are all shown below in thumbnail format; you can click a thumbnail to see the full-size results.

1024KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
889MBps / 889 IOPS | 888MBps / 888 IOPS | 317MBps / 317 IOPS | 307MBps / 307 IOPS

512KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
886MBps / 1773 IOPS | 893MBps / 1786 IOPS | 321MBps / 642 IOPS | 321MBps / 642 IOPS

256KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
894MBps / 3577 IOPS | 800MBps / 3203 IOPS | 323MBps / 1294 IOPS | 320MBps / 1280 IOPS

128KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
881MBps / 7055 IOPS | 806MBps / 888 IOPS | 320MBps / 2565 IOPS | 321MBps / 6454 IOPS

64KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
636MBps / 10189 IOPS | 669MBps / 10711 IOPS | 321MBps / 5146 IOPS | 321MBps / 5142 IOPS

32KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
504MBps / 16141 IOPS | 486MBps / 15564 IOPS | 318MBps / 10186 IOPS | 319MBps / 10212 IOPS

16KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
306MBps / 19621 IOPS | 307MBps / 19671 IOPS | 300MBps / 19251 IOPS | 290MBps / 18618 IOPS

8KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
179MBps / 22975 IOPS | 174MBps / 22346 IOPS | 169MBps / 21722 IOPS | 167MBps / 21493 IOPS

4KB Block Size

Test 1 – Sequential READ | Test 2 – Random READ | Test 3 – Sequential WRITE | Test 4 – Random WRITE
92MBps / 23761 IOPS | 90MBps / 23236 IOPS | 89MBps / 22845 IOPS | 83MBps / 21378 IOPS

Summary of Sequential Read Performance

The sequential read test is where you get the largest numbers, and these numbers are generally what is advertised, although with the larger block sizes the tests do not represent real-world disk activity.

What is interesting is that I reached the saturation point with just 128KB IOs.

The performance in these tests was, in my opinion, very good considering the older server the card was tested in.

Summary of Sequential Write Performance

I was again surprised at how quickly I reached the maximum performance: a 16KB IO got 90% of the sequential write performance of even the 512KB and 1024KB IOs. With a faster test server, it would be interesting to see if this remained the case.

Summary of Random Read Performance

The random read performance for me is quite impressive: for applications such as MS SQL, which reads (and writes) in 64KB blocks, the FusionIO card in even older servers will deliver >650MBps of random read performance. Getting that sort of performance out of traditional DAS or SAN storage would require a lot of spindles and cache!

Summary of Random Write Performance

The random write performance is similarly impressive: for applications such as MS SQL, which writes in 64KB blocks, the FusionIO card in even older servers will deliver >300MBps of random write performance. As with the random read performance above, try getting that out of traditional DAS/SAN storage.

Conclusion

I would like to encourage storage vendors to provide details on how they benchmark their products along with real world examples for things like VMware View / SQL / Oracle etc. This would go a long way to helping customers and consultants decide what products may work for them.

Regarding my specific testing, although I suspected this prior to beginning, the older x3850 M2 hardware is clearly a bottleneck for such a high performance card as the IODrive2.

In other tests I have read, such as Michael Webster’s (IO Blazing Datastore Performance with Fusion-io), these cards are capable of significantly higher performance. For example, Michael’s tests were conducted on relatively new Dell T710s with Westmere spec CPUs, and he was able to get much closer to the advertised performance figures.

Even in an older server such as my x3850 M2, the FusionIO card performs very well and is a much better alternative than traditional DAS storage, or even some of today’s SAN arrays, which would require numerous spindles to get anywhere near the FusionIO card’s performance. Other DAS/SAN/NAS solutions would also likely be much more expensive.

I can see this style of card playing a large part in enterprise storage in the future. With vendors such as Netapp partnering with FusionIO, it goes to show there are big plans for this technology.

One use case I can see in the not too distant future is using storage appliances such as the Netapp Edge VSA to share FusionIO (DAS) storage to vSphere clusters in ONTAP Cluster-Mode.

I am planning several more benchmark style posts, including one relating to VMware View floating pool desktop deployments as a demonstration of one of the many use cases for FusionIO IODrive2 cards, so stay tuned.

9 thoughts on “FusionIO IODrive2 Virtual Machine performance benchmarking (Part 1)”

  1. Pingback: IO Blazing Datastore Performance with Fusion-io « Long White Virtual Clouds

  2. Hi Josh, great writeup and thanks for the pingback. I’m wondering if the RAID0 striping in the guest was a limiting factor for your performance also. During my tests I split the IO load over individual files that were placed on each of the individual virtual disks and then read or wrote to them in parallel. This gave my test harness (IO Blazer) access to more OS level IO subsystem queues without the software RAID overhead. This simulates the way an OLTP database might be laid out. Be interesting to see what difference, if any, this makes with the hardware you’re running.