Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning (Thin on Thin)

Problem Statement

In a vSphere environment, What is the most suitable disk provisioning type to use for the LUN and the virtual machines to ensure minimum storage overhead and optimal performance?

Requirements

1. Ensure optimal storage capacity utilization
2. Ensure storage performance is both consistent & maximized

Assumptions

1. vSphere 5.0 or later
2. VAAI is supported and enabled
3. The time frame to order new hardware (eg: New Disk Shelves) is <= 4 weeks
4. The storage solution has tools for fast/easy capacity management

Constraints

1. Block Based Storage

Motivation

1. Increase flexibility
2. Ensure physical disk space is not unnecessarily wasted

Architectural Decision

“Thin Provision” the LUN at the Storage layer and “Thin Provision” the virtual machines at the VMware layer

(Optional) Do not present more LUNs (capacity) than you have underlying physical storage (Only over-commitment happens at the vSphere layer)

Justification

1. Capacity management can be easily managed by using storage vendor tools such eg: Netapp VSC / EMC VSI / Nutanix Command Center
2. Thin Provisioning minimizes the impact of situations where customers demand a lot of disk space up front when they only end up using a small portion of the available disk space
3. Increases flexibility as all unused capacity of all datastores and the underlying physical storage remains available
4. Creating VMs with “Thick Provisioned – Eager Zeroed” disks would unnessasarilly increase the provisioning time for new VMs
5. Creating VMs as “Thick Provisioned” (Eager or Lazy Zeroed) does not provide any significant benefit (ie: Performance) but adds a serious capacity penalty
6. Using Thin Provisioned LUNs increases the flexibility at the storage layer
7. VAAI automatically raises an alarm in vSphere if a Thin Provisioned datastore usage is at >= 75% of its capacity
8. The impact of SCSI reservations causing performance issues (increased latency) when thin provisioned virtual machines (VMDKs) grow is no longer an issue as the VAAI Atomic Test & Set (ATS) primitive alleviates the issue of SCSI reservations.
9. Thin provisioned VMs reduce the overhead for Storage vMotion , Cloning and Snapshot activities. Eg: For Storage vMotion it eliminates the requirement for Storage vMotion (or the array when offloaded by VAAI XCOPY Primitive) to relocate “White space”
10. Thin provisioning leaves maximum available free space on the physical spindles which should improve performance of the storage as a whole
11. Where there is a real or perceved issue with performance, any VM can be converted to Thick Provisioned using Storage vMotion not disruptivley.
12. Using Thin Provisioned LUNs with no actual over-commitment at the storage layer reduces any risk of out of space conditions while maintaining the flexibility and efficiency with significantly reduce risk and dependency on monitoring.
13. The VAAI UNMAP primitive provides automated space reclamation to reduce wasted space from files or VMs being deleted

Alternatives

1.  Thin Provision the LUN and thick provision virtual machine disks (VMDKs)
2.  Thick provision the LUN and thick provision virtual machine disks (VMDKs)
3.  Thick provision the LUN and thin provision virtual machine disks (VMDKs)

Implications

1. If the storage at the vSphere and array level is not properly monitored, out of space conditions may occur which will lead to downtime of VMs requiring disk space although VMs not requiring additional disk space can continue to operate even where there is no available space on the datastore
2. The storage may need to be monitored in multiple locations increasing BAU effort
3. It is possible for the vSphere layer to report sufficient free space when the underlying physical capacity is close to or entirely used
4. When migrating VMs from one thin provisioned datastore to another (ie: Storage vMotion), the storage vMotion will utilize additional space on the destination datastore (and underlying storage) while leaving the source thin provisioned datastore inflated even after successful completion of the storage vMotion.
5.While the VAAI UNMAP primitive provides automated space reclamation this is a post-process, as such you still need to maintain sufficient available capacity for VMs to grow prior to UNMAP reclaiming the dead space

Related Articles

1. Datastore (LUN) and Virtual Disk Provisioning (Thin on Thick)CloudXClogo

 

Example Architectural Decision – Virtual Machine Swap file location for SRM protected VMs

Problem Statement

In an environment where multiple vSphere clusters are protected by VMware Site Recovery Manager (SRM) with array based replication. What is the best way to ensure the RTO/RPO is met/exceeded and to minimize the storage replication overhead?

Assumptions

1. Additional storage will not be obtained

2. Eight (8) Paths per LUN are Masked/Zoned

Motivation

1. Optimize underlying storage usage

2. Ensure transient files are not unnesasarily replicated

Architectural Decision

Configure vSphere cluster swapfile policy to Store the swapfile in the datastore specified by the host.

Create and configure a dedicated swap file datastores provided by Tier 1 storage with greater than the capacity of the total vRAM for the cluster itself, along with any/all clusters using the cluster/s as recovery sites.

Justification

1.Decreased storage replication between protected and recovery sites

2. Reduced impact to the underlying storage due to reduced replication

3. Reduces the used space at the recovery site

4. No impact to the ability to recovery to, or failback from the recovery site

5. vMotion performance will not be impacted as all hosts within a cluster share the same swap file datastore which is provided from the existing shared storage

6. There is minimal complexity in setting a dedicated swap file datastore as such, the benefits outweigh the additional complexity

7. In the event of swapping, performance will not be impacted as the swap file is on Tier 1 storage

8. There is no additional Tier 1 storage utilization as the vswap file would alternatively be set to “Store in the same director as the virtual machine”

9. Ensures memory (RAM) over commitment can still be achieved where as setting memory reservations would reduce/eliminate this benefit of vSphere

Implications

1. vMotion performance between clusters will be degraded as the swap file will be moved as part of the vMotion to the destination cluster swap file datastore

2. One (1) datastore out of a maximum of 256 per host are used for the swap file datastore

3. Eight (8) paths out of a maximum of 1024 per host are used for the swap file datastore

Alternatives

1. Store the swapfile in the same directory as the virtual machine

2. Set Virtual machine memory reservations of 100% to eliminate the vswap file

Relates Articles

1. Site Recovery Manager Deployment Location

2. VMware Site Recovery Manager, Physical or Virtual machine?

 

Example Architectural Decision – Number of paths per LUN for VMFS datastores

Problem Statement

In a vSphere environment hosting a large number of VMs,  Virtual machines I/O requirements range from small <100 IOPS to large business critical applications with tens of thousands of IOPS, the ESXi hosts have been configured with 4 x 8Gb FC HBAs.

What is the most suitable number of paths per LUN when using 4 x 8GB FC connections per Host, and how will they be presented in a highly available manner with two (2) SAN Fabrics connected to an Active/Active Enterprise Disk array?

Requirements

1. All LUNs are available on all FC Interfaces
2. The storage be highly available
3. The environment should be able to continue running production workloads in the unlikely event of a dual port HBA, or single Fabric failure.
4. The environment maintain a consistent level of performance

Assumptions

1. The Storage area network has two (2) fabrics each of which is highly available
2. The disk system is presented to both SAN fabrics
3. The number of VMs per host is >100
4. vSphere 4.0 or later
5. Storage array is Active/Active
6. ESXi hosts are large and are designed to drive significant I/O
7. VAAI is supported and enabled

Constraints

1. Maximum paths supported per ESXi host is 1024
2. Maximum number of datastores per ESXi host is 256

Motivation

1. Ensure optimal performance redundancy
2. Maximum the total capacity able to be presented to a cluster

Architectural Decision

Use a standard of 8 paths per LUN

Each LUN will be presented to each HBA via both Controller A and Controller B resulting in two paths per LUN per HBA.

With a total of 4 FC connections across two (2) physical dual port HBAs in a HA configuration with one (1) connection per HBA per Fabric, this equates to a total of 8 paths per LUN to the ESXi host (4 paths per Fabric)

Justification

1. This equates to 4 paths (1 per HBA interface per LUN) per Fabric
2. The use of VMware NMP with “Round Robin” will be used and having all LUNs presented via both fabrics and all HBAs will provide the maximum reducing in latency and the most consistent performance overall
3. 8 paths per LUN ensures up to 128 LUNs can be presented within the 1024 paths per ESXi host limit which will support sufficient capacity for the cluster
4. The solution is highly available as it uses two fabrics and both controllers are Active
5. In the event of a Fabric failure, the remaining Fabric serving 2 x 8Gb connections will provide connectivity to both Controller A and B, with a total of 4 paths
6. Ensures the cluster can have enough LUNs to balance workloads across which will assist keeping latency at a minimum

Alternatives

1. Have less paths per LUN which enabled the use of more LUNs
2. Have more paths per LUN and have less LUNs

Implications

1. LUN sizes will need to be sizes to ensure a maximum of 128 LUNs are sufficient from a capacity perspective to cater for the desired number of virtual machines

vmware_logo_ads