How to successfully Virtualize MS Exchange – Part 16 – Virtual Disk Provisioning Types

Once you have made the decision on a storage platform, and assuming you have chosen to use VMFS or NFS datastores, the next decision is how your VMDKs should be provisioned.

The VMware Exchange 2013 Best Practice Guide does not mention disk provisioning options, nor does it make any recommendations. However, you’re in luck, as we will cover all the options along with their pros and cons here.

For Exchange 2010, Microsoft state in Understanding Exchange 2010 Virtualization:

Virtual disks that dynamically expand aren’t supported by Exchange.

Virtual disks that use differencing or delta mechanisms (such as Hyper-V’s differencing VHDs or snapshots) aren’t supported.

However, I have been unable to find confirmation of whether this has changed for Exchange 2013. The Exchange 2013 storage configuration options document does state that thin provisioning for Storage Spaces is supported, but it does not state whether any other form of thin provisioning is or is not supported.

While technically not supported for Exchange 2010, there are plenty of experts who understand and recommend thin provisioning, including MCM and MVP for Exchange Dustin Smith, who in this video talks about some of the considerations and benefits of Thin Provisioning for Exchange 2010.

Now on to the topic at hand:

When creating a Virtual Machine, VMDKs can be provisioned in one of three ways:

1. Thick Provisioned Lazy Zeroed
2. Thick Provisioned Eager Zeroed
3. Thin Provisioned
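
Each of these is described below. For anyone who scripts their provisioning, here is a minimal sketch of how the three types map to the VirtualDisk backing flags when adding a new VMDK. It assumes the pyVmomi SDK; the VM object, SCSI controller key and unit number are placeholders for illustration only.

```python
# Minimal sketch (pyVmomi assumed). 'vm' is an existing vim.VirtualMachine;
# the controller key and unit number are placeholders.
from pyVmomi import vim

def add_vmdk(vm, size_gb, provisioning="thin"):
    """Add a VMDK using one of the three provisioning types."""
    dev_spec = vim.vm.device.VirtualDeviceSpec()
    dev_spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    dev_spec.fileOperation = "create"

    disk = vim.vm.device.VirtualDisk()
    backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
    backing.diskMode = "persistent"
    # The two backing flags select the provisioning type:
    #   thin               -> thinProvisioned = True
    #   thick lazy zeroed  -> thinProvisioned = False, eagerlyScrub = False
    #   thick eager zeroed -> thinProvisioned = False, eagerlyScrub = True
    backing.thinProvisioned = (provisioning == "thin")
    backing.eagerlyScrub = (provisioning == "eagerzeroedthick")
    disk.backing = backing

    disk.capacityInKB = size_gb * 1024 * 1024
    disk.controllerKey = 1000   # placeholder: key of an existing SCSI controller
    disk.unitNumber = 1         # placeholder: a free unit number on that controller
    dev_spec.device = disk

    spec = vim.vm.ConfigSpec(deviceChange=[dev_spec])
    return vm.ReconfigVM_Task(spec=spec)
```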

Starting with Thick Provisioned Lazy Zeroed: the VMDK is thick provisioned, but blocks are only zeroed in a just-in-time fashion as they are first written.

The advantages of Thick Provisioned Lazy Zeroed VMDKs include:

1. Faster VM creation time than Eager Zeroed Thick (the difference is minimal if the storage supports the VAAI Write Same primitive)
2. The entire VMDK's capacity is reserved, making capacity planning easier than with Thin Provisioning

The disadvantages of Thick Provisioned Lazy Zeroed VMDKs include:

1. Slower provisioning than Thin Provisioning (although the difference is generally minimal)
2. The entire VMDK's capacity is reserved and unavailable for use by other virtual machines.

With Thick Provisioned Eager Zeroed (EZT), the VMDK is thick provisioned and all blocks are zeroed at the time of creation. Eager Zeroed Thick VMDKs are supported on all VMFS datastores and on NFS datastores which support the VAAI-NAS Reserve Space primitive.

The advantages of EZT VMDKs these days are really minimal but include:

1. Supporting Oracle RAC and VMware Fault Tolerance (neither being applicable to Exchange)
2. Increased performance versus Lazy and Thin Provisioned VMDKs (but more on this topic later).

However there are a number of downsides to this method which include:

1. Slower VM creation times. The time depends on the size of the VMDK/s being created and the speed of your storage, as every GB needs to be zeroed, just like performing a Full (not Quick) format on a physical server.

Note: Storage arrays that support VAAI with the “Write Same” primitive can offload the zeroing to the storage array, reducing the load on the ESXi host and speeding up provisioning time dramatically.

2. Increased potential for wasted capacity on a datastore.

3. Free space within VMDKs cannot be shared with other VMs, which means every VMDK requires some free space (generally >10% is recommended) to ensure the VM does not run out of space.

Lastly there is Thin Provisioning, which means the VMDK only consumes the amount of space that data has actually been written to, and each block must be zeroed before it is first written.

The advantages of Thin Provisioning VMDKs include:

1. You can create larger VMDKs with no space utilization penalty making capacity planning and growth easier.
2. Reduce wasted or unused space on the storage
3. Allows for disk space to be overcommitted ensuring maximum utilization and flexibility.
4. Free space in VMDKs is not wasted on the datastore reducing capacity requirements compared to Eager and Lazy Zeroed VMDKs.
5. SCSI reservations (VMFS datastores ONLY) causing performance issues (increased latency) as thin provisioned virtual machines (VMDKs) grow are no longer a problem, as the VAAI Atomic Test & Set (ATS) primitive alleviates the need for SCSI reservations.
6. Thin provisioned VMs reduce the overhead for Storage vMotion, Cloning and Snapshot activities, e.g. for Storage vMotion it eliminates the requirement for Storage vMotion (or the array, when offloaded by the VAAI XCOPY primitive) to relocate “white space”. Note: Storage vMotion should rarely if ever be required for Exchange VMs.
7. Thin provisioning leaves maximum available free space on the physical spindles which should improve performance of the storage subsystem as a whole.

The disadvantages of thin provisioning include:

1. Increased risk of running out of space on a datastore or underlying storage array.
2. Additional write penalty of zeroing a block before writing to it. (again more on performance later in this post).
3. Increased importance of monitoring storage capacity utilization.
4. Not supported for Exchange 2010. Note: There is no technical inhibitor to using Thin Provisioning, but supported options are obviously preferable.

All in all, @FrankDenneman (VCDX #29) sums it up perfectly with his article Thin or thick disks? – it’s about management not performance. I would also suggest considering all other workloads in the environment, not just Exchange, when making decisions about Thin Provisioning, as it can be very beneficial and a huge cost saving (especially CAPEX) when purchasing new equipment.

Which brings us to our next topic, Thin Vs Thick Provisioning Performance!

There have been many recommendations not to use Thin Provisioning due to the performance impact of zeroing a block before writing to it. This recommendation has been around for a long time and, like the VMDK on NFS debate, appears to have strong opinions on both sides.

Now for the facts!

From a performance perspective most people are surprised to learn there is no significant performance advantage to using Thick Provisioned (Eager or Lazy Zeroed) VMDKs compared to Thin Provisioned disks.

In addition, with I/O reduced by around 50% from Exchange 2007 to 2010, and by another 50% from 2010 to 2013, Exchange is no longer the huge storage I/O heavy monster it once was.

VMware conducted a Performance Study of VMware vStorage Thin Provisioning back in the ESXi 4.0 days (~2009) which I will briefly summarize.

On page 6 of the performance study, the following graph shows the difference in performance between Thin and Thick VMDKs during zeroing and post-zeroing.

As you can see the performance is almost identical.

[Figure: Thin vs. Thick VMDK performance scaling (zeroing and post-zeroing)]

The next chart, also from page 6, shows a comparison of throughput between Thin and Thick VMDKs. Again we see the difference is insignificant.

[Figure: Aggregate throughput, Thick vs. Thin VMDKs]

As there is no significant performance impact from Thin Provisioning, performance should no longer be considered an objection to using it!

I recommend taking advantage of the flexibility of Thin Provisioning and creating larger Thin Provisioned VMDKs, which can help simplify capacity management from a VM/OS and application perspective, as well as making growth easier for Exchange as mailbox sizes increase over time.


When using thin provisioning, always ensure alerting is properly set up with early warning on both your vSphere environment AND the underlying storage, so you are advised when the capacity of a datastore, underlying LUN/NFS mount or the storage itself is running low and can remediate it.
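
As a rough illustration of the kind of early warning I mean, the short sketch below (pyVmomi assumed; the vCenter address, credentials and the 80% threshold are placeholders, not recommendations) reports datastore usage and thin provisioning over-commitment so low free space can be caught before it becomes an outage. This complements, rather than replaces, alarms on the storage array itself.

```python
# Rough sketch (pyVmomi assumed): report datastore usage and over-commitment.
# vCenter address, credentials and the threshold are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

WARN_USED_PCT = 80  # placeholder warning threshold

ctx = ssl._create_unverified_context()  # lab only; use valid certificates in production
si = SmartConnect(host="vcenter.example.com", user="user", pwd="pass", sslContext=ctx)
try:
    view = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        s = ds.summary
        if not s.accessible or not s.capacity:
            continue
        used_pct = 100 * (s.capacity - s.freeSpace) / s.capacity
        provisioned_gb = (s.capacity - s.freeSpace + (s.uncommitted or 0)) / 2**30
        print(f"{s.name}: {used_pct:.1f}% used, "
              f"{provisioned_gb:.0f} GiB provisioned of {s.capacity / 2**30:.0f} GiB")
        if used_pct >= WARN_USED_PCT:
            print(f"  WARNING: {s.name} is above {WARN_USED_PCT}% used - remediate")
finally:
    Disconnect(si)
```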

In an upcoming post I will discuss the underlying storage, including provisioning type for LUNs and NFS mounts (i.e.: Thin on Thick / Thin on Thin / Thick on Thick and Thick on Thin).

Recommendations for VMDK provisioning:

1. Check with your storage vendor, and unless they have solid justification for not using Thin Provisioning OR you have an operational constraint preventing it, use Thin Provisioned VMDKs. (The pros outweigh the cons in my opinion.)
2. When using Thin Provisioning create larger VMDKs to simplify capacity management at the VM and OS/Application layer.
3. Whether using Thick or Thin provisioning, ensure you test performance using Jetstress and LoadGen with the same provisioning type.
4. Ensure alerting is configured and working to monitor capacity utilization especially when using thin provisioned VMDKs.


More Information on VMDK and Datastore provisioning options:

1. Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning (Thin on Thin)

2. Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning (Thin on Thick)

Back to the Index of How to successfully Virtualize MS Exchange.

Example Architectural Decision – Datastore (LUN) Sizing with Block Based Storage

Problem Statement

In a vSphere environment, what is the most suitable datastore (LUN) sizing to support both production and development workloads, ensuring minimum storage overhead and optimal performance?

Requirements

1. RTO 4hrs
2. RPO 12hrs
3. Support Production and Test & Development Workloads
4. Ensure optimal storage capacity utilization
5. Ensure storage performance is both consistent & maximized
6. Ensure the solution is fully supported
7. Minimize BAU effort (Monitoring)

Assumptions

1. Business critical applications are excluded
2. Block based storage
3. VAAI is supported and enabled
4. VADP backups are being utilized
5. vSphere 5.0 or later
6. Storage DRS will not be used
7. SRM is in use
8. LUNs & VMs will be thin provisioned
9. The average VM will be 100GB in size and 50% utilized
10. Virtual machine snapshots will be used, but not for > 24 hours
11. Change rate of the average VM is <= 15% per 24 hour period
12. Average VM has 4GB RAM
13. No Memory reservations are being used
14. Storage I/O Control (SIOC) is not being used
15. Under normal circumstances storage will not be over committed at the storage array level.
16. The average maximum IOPS per VM is 125 (at 16KB I/O size), i.e. <= 2 MBps per VM
17. The underlying storage has sufficient performance to cater for the average maximum IOPS per VM
18. A separate swap file datastore will be configured per cluster

Constraints

1. Must use the existing storage solution (Block Based Storage)

Motivation

1. Increase flexibility
2. Ensure physical disk space is not unnecessarily wasted
3. Create a Scalable solution
4. Ensure high performance
5. Ensure high utilization of storage resources by reducing “islands” of unused capacity
6. Provide flexibility in the unit size of partial SRM failovers

Architectural Decision

The standard datastore size will be 3TB and contain up to 25 standard virtual machines.

This is based on the following:

25 VMs per datastore x 100GB (assumes no over-commitment) = 2500GB

25 VMs w/ 4GB RAM = 100GB minus 0GB of reservations = 100GB of vswap space, to be stored on the separate swap file datastore

25 VMs w/ snapshots of up to 15% = 375GB

Total = 2500GB + 375GB = 2875GB

Average capacity used per VM = 115GB
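
The arithmetic above is simple enough to sanity check in a few lines; a throwaway Python sketch, with the inputs mirroring the assumptions listed earlier:

```python
# Throwaway sizing check mirroring the assumptions above.
vms_per_datastore = 25
vm_size_gb = 100            # average VM size, no over-commitment assumed
snapshot_allowance = 0.15   # snapshot growth allowance (<= 15% change per 24 hours)

vm_capacity_gb = vms_per_datastore * vm_size_gb            # 2500 GB
snapshot_capacity_gb = vm_capacity_gb * snapshot_allowance # 375 GB
total_gb = vm_capacity_gb + snapshot_capacity_gb           # 2875 GB

print(f"Total capacity required: {total_gb:.0f} GB")
print(f"Average capacity per VM: {total_gb / vms_per_datastore:.0f} GB")  # 115 GB
# vSwap (25 x 4GB RAM with no reservations = 100GB) is excluded here as it
# resides on the separate swap file datastore.
```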

Justification

1. In the worst case scenario, where every VM has used 100% of its VMDK capacity, has 4GB RAM with no memory reservation and a snapshot of up to 15% of its size, the 3TB datastore will still have 197GB remaining, so it will not run out of space.
2. Queue depth is set on a per datastore (LUN) basis, so having 25 VMs per LUN allows for a minimum of 1.28 concurrent I/O operations per VM based on the standard queue depth of 32, although it is unlikely all VMs will issue I/O concurrently, so the average will be much higher. (Both figures are sanity checked in the short sketch after this list.)
3. Thin Provisioning minimizes the impact of situations where customers demand a lot of disk space up front but only end up using a small portion of it.
4. Using Thin provisioning for VMs increases flexibility as all unused capacity of virtual machines remains available on the Datastore (LUN).
5. VAAI automatically raises an alarm in vSphere if a Thin Provisioned datastore's usage is >= 75% of its capacity
6. SCSI reservations causing performance issues (increased latency) as thin provisioned virtual machines (VMDKs) grow are unlikely to be a problem for 25 low I/O VMs, and with VAAI are no longer a concern as the Atomic Test & Set (ATS) primitive alleviates the need for SCSI reservations.
7. As the VMs are low I/O it is unlikely that there will be any significant contention for the queue depth with only 25 VMs per datastore
8. The VAAI UNMAP primitive provides automated space reclamation to reduce wasted space from files or VMs being deleted
9. Virtual machines will be Thin provisioned for flexibility, however they can also be made Thick provisioned as the sizing of the datastore (LUN) caters for worst case scenario of 100% utilization while maintaining free space.
10. Having <=25 VMs per datastore (LUN) allows for more granular SRM fail-over (datastore groups)
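
As a quick check of the headroom and queue depth figures in points 1 and 2 (with the 3TB datastore treated as 3072GB, consistent with the 197GB figure above):

```python
# Quick check of the worst case headroom and per-VM queue depth figures above.
datastore_gb = 3 * 1024           # 3TB datastore treated as 3072GB
worst_case_gb = 25 * 100 * 1.15   # 25 VMs x 100GB, plus 15% snapshot growth = 2875GB
print(f"Worst case headroom: {datastore_gb - worst_case_gb:.0f} GB")   # 197 GB

lun_queue_depth = 32              # standard per-LUN queue depth
vms_per_lun = 25
print(f"Minimum concurrent I/Os per VM: {lun_queue_depth / vms_per_lun:.2f}")  # 1.28
```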

Alternatives

1. Use larger Datastores (LUNs) with more VMs per datastore
2. Use smaller Datastores (LUNs) with fewer VMs per datastore

Implications

1. When performing an SRM failover, the most granular failover unit is a single datastore, which may contain up to 25 Virtual machines.

2. The solution (day 1) does not provide a CapEx saving on disk capacity, but will allow (if desired) over-commitment in the future

Thanks to James Wirth (VCDX#83) @JimmyWally81 for his contributions to this example decision.

Related Articles

1. Datastore (LUN) and Virtual Disk Provisioning (Thin on Thick)

2. Datastore (LUN) and Virtual Disk Provisioning (Thin on Thin)

3. Virtual Machine vSwap Location


Example Architectural Decision – Storage DRS Configuration for VMFS Datastores in a vCloud Environment

Problem Statement

In a production, self service vCloud Director environment, what is the most suitable Storage DRS configuration to improve storage utilization and performance, as well as reduce administrative effort for BAU staff?

Requirements

1. Make the most efficient use of the available storage capacity
2. Maintain consistent level of storage performance
3. Reduce the risk and overhead of capacity management
4. Reduce the risk of an unintentional (or otherwise) DoS event caused by self service

Assumptions

1. vSphere 5.0 or later
2. VMFS 5 Datastores which are Thick Provisioned
3. Deduplication is not in use
4. VAAI is supported by the array and enabled across the vSphere environment
5. All datastores in each respective Datastore clusters reside on the same RAID type with similar spindle types and count
6. All datastores are presented to all hosts within the cluster
7. Array level snapshots are not in use
8. IBM SVC Storage is being used
9. vCloud Director 5.1 or later
10. Storage I/O Control is enabled and set to 30ms

Constraints

1. IBM SVC storage does not currently support VASA (VMware API for Storage Awareness)

Motivation

1. Ensure production storage performance is not negatively impacted
2. Minimize the vSphere administrators workload where possible

Architectural Decision

Set the Storage DRS automation level to “Fully Automated”

  • Set “Utilized Space” threshold to 80%
  • Set “I/O latency” to 15ms
  • I/O Metric Inclusion – Enabled

Advanced Options

  • No recommendations until utilization difference between source and destination is: 10%
  • Evaluate I/O load every 8 Hours
  • I/O Imbalance threshold: 4
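
For completeness, a rough sketch of applying these settings to an existing datastore cluster (“pod”) via pyVmomi. The connection (si) and the pod lookup are placeholders, and while the property names follow the vSphere StorageDrs*ConfigSpec objects, treat the exact types and value ranges (particularly the I/O imbalance threshold, which the client exposes as a 1 to 5 slider) as assumptions to verify against your SDK version.

```python
# Rough sketch (pyVmomi assumed): apply the Storage DRS settings above to an
# existing datastore cluster (StoragePod). 'si' and 'pod' are placeholders;
# verify type/property names and value ranges against your SDK version.
from pyVmomi import vim

def configure_sdrs(si, pod):
    pod_cfg = vim.storageDrs.PodConfigSpec()
    pod_cfg.enabled = True
    pod_cfg.defaultVmBehavior = "automated"   # "Fully Automated"
    pod_cfg.ioLoadBalanceEnabled = True       # I/O Metric Inclusion - Enabled
    pod_cfg.loadBalanceInterval = 8 * 60      # evaluate I/O load every 8 hours (minutes)

    space_cfg = vim.storageDrs.SpaceLoadBalanceConfig()
    space_cfg.spaceUtilizationThreshold = 80      # "Utilized Space" threshold 80%
    space_cfg.minSpaceUtilizationDifference = 10  # source/destination difference 10%
    pod_cfg.spaceLoadBalanceConfig = space_cfg

    io_cfg = vim.storageDrs.IoLoadBalanceConfig()
    io_cfg.ioLatencyThreshold = 15        # I/O latency threshold 15ms
    io_cfg.ioLoadImbalanceThreshold = 4   # per the decision above; check how your
                                          # client version maps the 1-5 slider
    pod_cfg.ioLoadBalanceConfig = io_cfg

    spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_cfg)
    return si.content.storageResourceManager.ConfigureStorageDrsForPod_Task(
        pod=pod, spec=spec, modify=True)
```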

Justification

1. Setting Storage DRS to “Fully Automated” ensures that the administrator does not need to be concerned with the initial placement of virtual machines, as this will be dynamically and intelligently determined and executed

2. “XCOPY” is fully supported for Block based storage, as such, any Storage vMotion activity is offloaded to the array therefore removing the I/O overhead on the compute and storage fabric.

3. Where a significant I/O imbalance is detected by SDRS, the vSphere administrator is not required to take any action; Storage DRS can identify and remediate issues which fall outside the parameters (which are determined by the VMware Architect) automatically. This improves the efficiency of the environment and reduces the involvement of BAU.

4. SDRS provides valuable “initial placement” for new virtual machines which will help avoid a situation where datastores are unevenly balanced from a capacity perspective in the first place, therefore reducing the chance of virtual machines requiring migration.

5. Setting “No recommendations until utilization difference between source and destination is” to 20% ensures that SDRS does not move virtual machines around where significant benefit is not realized. This prevents unnecessary Storage vMotion activity on the disk system; although this is offloaded from the host to the array, the I/O may still impact production performance for workloads on the same disk system.

6. Setting the “I/O Imbalance threshold” (1 = Conservative, 5 = Aggressive) to “4” (the second most aggressive setting) ensures that I/O imbalance is addressed before significant imbalance is experienced by the end users. This level of aggressiveness is acceptable as the Storage vMotion can be offloaded (via the VAAI “XCOPY” primitive) and has almost zero impact on the host. Setting this to “5” may result in minor I/O imbalances being corrected at the cost of a Storage vMotion, and as a result the impact of the more frequent Storage vMotion activity may negate the benefit of the I/O balancing.

7. Storage DRS will address I/O imbalance across the datastore cluster if latency meets or exceeds the set value of 15ms (the default), and in the event of latency increasing during peak times to >= 30ms, Storage I/O Control will ensure fair access to the storage.

Alternatives

1. Use “No Automation (Manual Mode)”
2. Not use Storage DRS

Implications

1. When selecting datastores for the datastore cluster, having VASA enabled allows the “System Capability” column to be populated in the “New Datastore Cluster” wizard, ensuring suitable datastores of similar performance, RAID type and features are grouped together. As VASA is currently NOT supported by SVC, the datastore naming convention needs to accurately reflect the capabilities of the LUN/s to ensure suitable datastores are grouped together.
