Example Architectural Decision – Number of paths per LUN for VMFS datastores

Problem Statement

In a vSphere environment hosting a large number of VMs, virtual machine I/O requirements range from small workloads of less than 100 IOPS to large business critical applications requiring tens of thousands of IOPS. The ESXi hosts have been configured with 4 x 8Gb FC HBA ports.

What is the most suitable number of paths per LUN when using 4 x 8Gb FC connections per host, and how will they be presented in a highly available manner with two (2) SAN Fabrics connected to an Active/Active enterprise disk array?

Requirements

1. All LUNs are available on all FC interfaces
2. The storage must be highly available
3. The environment must be able to continue running production workloads in the unlikely event of a dual-port HBA failure or a single Fabric failure
4. The environment must maintain a consistent level of performance

Assumptions

1. The Storage area network has two (2) fabrics each of which is highly available
2. The disk system is presented to both SAN fabrics
3. The number of VMs per host is >100
4. vSphere 4.0 or later
5. Storage array is Active/Active
6. ESXi hosts are large and are designed to drive significant I/O
7. VAAI is supported and enabled

Constraints

1. Maximum paths supported per ESXi host is 1024
2. Maximum number of datastores per ESXi host is 256

Motivation

1. Ensure optimal performance and redundancy
2. Maximize the total capacity able to be presented to a cluster

Architectural Decision

Use a standard of 8 paths per LUN

Each LUN will be presented to each HBA via both Controller A and Controller B resulting in two paths per LUN per HBA.

With a total of 4 FC connections across two (2) physical dual-port HBAs in an HA configuration with one (1) connection per HBA per Fabric, this equates to a total of 8 paths per LUN to the ESXi host (4 paths per Fabric).
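
The path arithmetic behind this standard can be sanity checked with a short calculation. The sketch below (illustrative Python, using the HBA and controller counts from this design) confirms that 8 paths per LUN keeps a 128 LUN configuration within the 1024 path per host maximum; the 256 datastores per host maximum is therefore not the limiting factor.

    # Path budget sketch using the figures from this decision (illustrative only)
    hba_ports_per_host = 4        # 2 x dual-port 8Gb FC HBAs, 1 port per HBA per Fabric
    controllers_per_lun = 2       # each LUN presented via Controller A and Controller B
    paths_per_lun = hba_ports_per_host * controllers_per_lun    # 4 x 2 = 8

    max_paths_per_host = 1024     # vSphere configuration maximum (Constraint 1)
    max_luns = max_paths_per_host // paths_per_lun               # 1024 / 8 = 128

    print(f"Paths per LUN: {paths_per_lun}")
    print(f"Maximum LUNs within the path budget: {max_luns}")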

Justification

1. This equates to 4 paths (1 per HBA interface per LUN) per Fabric
2. VMware NMP with “Round Robin” will be used; presenting all LUNs via both Fabrics and all HBAs provides the maximum reduction in latency and the most consistent performance overall
3. 8 paths per LUN ensures up to 128 LUNs can be presented within the 1024 paths per ESXi host limit, which will support sufficient capacity for the cluster
4. The solution is highly available as it uses two fabrics and both controllers are Active
5. In the event of a Fabric failure, the remaining Fabric serving 2 x 8Gb connections will provide connectivity to both Controllers A and B, with a total of 4 paths
6. Ensures the cluster has enough LUNs across which to balance workloads, which will assist in keeping latency to a minimum

Alternatives

1. Use fewer paths per LUN, which enables the use of more LUNs
2. Use more paths per LUN, and fewer LUNs

Implications

1. LUNs will need to be sized to ensure a maximum of 128 LUNs is sufficient from a capacity perspective to cater for the desired number of virtual machines (see the sizing sketch below)
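
A minimal sizing sketch follows; the capacity figure is a hypothetical placeholder and would be replaced with the cluster's actual requirement.

    # LUN sizing sketch for Implication 1 (the capacity figure is a placeholder)
    usable_capacity_required_tb = 200     # hypothetical cluster capacity requirement
    max_luns = 128                        # from the 8 paths per LUN decision

    min_average_lun_size_tb = usable_capacity_required_tb / max_luns
    print(f"Minimum average LUN size: {min_average_lun_size_tb:.2f} TB")   # ~1.56 TB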

Example Architectural Decision – vSphere Path Selection Plugin (PSP) for IBM SVC Storage

Problem Statement

What is the most suitable multipathing policy when using IBM SVC storage?

Requirements

1. Ensure maximum performance and availability for vSphere storage
2. Ensure storage performance is as consistent as possible

Assumptions

1. IBM SVC Storage which is Active/Active
2. VAAI is supported and enabled

Constraints

1. Solution must be supported

Motivation

1. Ensure optimal performance and redundancy
2. Minimize Latency

Architectural Decision

Use vSphere Native Multipathing Plugin (NMP) and configure “VMW_PSP_RR” (Round Robin) as the path selection policy.

Set the default PSP to “VMW_PSP_RR” (Round Robin) for the SATP VMW_SATP_SVC so that all new LUNs automatically use Round Robin.
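
A minimal sketch of applying this decision is shown below, wrapped in Python purely so it can be scripted. It assumes ESXi 5.x style esxcli syntax (on vSphere 4.x the equivalent commands sit under “esxcli nmp”), and the device identifier is a placeholder, so validate the exact syntax against the target build.

    # Sketch: set Round Robin as the default PSP for VMW_SATP_SVC and for an existing LUN.
    # Assumes ESXi 5.x style esxcli syntax; the naa identifier is a placeholder.
    import subprocess

    # New LUNs claimed by VMW_SATP_SVC will default to Round Robin
    subprocess.check_call([
        "esxcli", "storage", "nmp", "satp", "set",
        "--satp", "VMW_SATP_SVC", "--default-psp", "VMW_PSP_RR",
    ])

    # Existing LUNs keep their current PSP and need to be changed individually
    subprocess.check_call([
        "esxcli", "storage", "nmp", "device", "set",
        "--device", "naa.60050768xxxxxxxxxxxxxxxxxxxxxxxx", "--psp", "VMW_PSP_RR",
    ])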

Justification

1. Round Robin helps ensure minimum average latency to the storage by using all available paths
2. Ensures performance is not degraded for some or all virtual machines due to a single HBA or connection being heavily utilized
3. Using “VMW_PSP_FIXED” requires the paths to be manually load balanced to avoid thrashing a single path
4. Using “VMW_PSP_MRU” or “VMW_PSP_FIXED” may lead to inconsistent performance across the LUNs due to some paths being more heavily used than others
5. There is no MPP currently supplied by IBM for SVC storage
6. Round Robin is a supported configuration (Note: although not specifically listed in the Compatibility Matrix)

Alternatives

1. Use “VMW_PSP_FIXED” (Default) – Fixed Pathing
2. Use “VMW_PSP_MRU” – Most Recently Used
3. Use vendor supplied Multipathing Plugin

Implications

1. None

Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning

Problem Statement

In a vSphere environment, what is the most suitable disk provisioning type to use for the LUNs and the virtual machines to ensure minimum storage overhead and optimal performance?

Requirements

1. Ensure optimal storage capacity utilization
2. Ensure storage performance is both consistent & maximized

Assumptions

1. vSphere 4.1 or later
2. VAAI is supported and enabled
3. Array level data replication is being used throughout the environment
4. Monitoring of the environment (including vSphere and Storage) is a manual process
5. The time frame to order new hardware (eg: New Disk Shelves) is a minimum of 3 months

Constraints

1. Block based storage

Motivation

1. Increase flexibility
2. Ensure physical disk space is not unnecessarily wasted

Architectural Decision

“Thick Provision” the LUN at the Storage layer and “Thin Provision” the virtual machines at the VMware layer

Justification

1. Simplified capacity management, as only one layer (the vSphere layer) needs to be monitored for capacity
2. The free space shown by vSphere is actual usable storage
3. Reduces the chance of an “Out of Space” condition
4. Increases flexibility as all unused capacity of all datastores remains available
5. Creating VMs with “Thick Provisioned – Eager Zeroed” disks would increase the provisioning time
6. Creating VMs as “Thick Provisioned” (Eager or Lazy Zeroed) does not provide any significant benefit but adds a serious capacity penalty
7. Using Thin Provisioned virtual machines minimizes storage replication traffic on creation of virtual machines
8. Using Thick Provisioned LUNs reduces the requirement for fast turn around times for purchasing additional capacity
9. Monitoring is essential to successfully and safely use “Thin on Thin”; as monitoring of this environment is a manual process (Assumption 4), thin provisioning at both layers is not suitable

Alternatives

1. Thin provision the LUN and thick provision virtual machine disks (VMDKs)
2. Thick provision the LUN and thick provision virtual machine disks (VMDKs)
3. Thin provision the LUN and thin provision virtual machine disks (VMDKs)

Implications

1. No storage over commitment can occur on the physical array
2. The storage “consumed” will be reported differently between the vSphere Administrator and the Storage Administrator. The vSphere Administrator will see the true utilization, whereas the SAN administrator will see the “Consumed” & “Provisioned” values as the same
3. It is possible for a datastore to become overcommitted, and as a result, if not monitored, the datastore may run out of free space, which would result in an outage (see the sketch below)
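
Because overcommitment can only occur at the datastore layer in this design, a simple provisioned-versus-capacity check per datastore is enough to flag the risk described above. The sketch below uses hypothetical datastore names and figures; in practice the values would come from vCenter monitoring or alarms.

    # Illustrative overcommitment check for Thin-on-Thick datastores.
    # Datastore names and figures are hypothetical placeholders, not real values.
    datastores = {
        # name: (capacity_gb, provisioned_gb, used_gb)
        "Datastore01": (2048, 3100, 1500),
        "Datastore02": (2048, 1800, 1200),
    }

    for name, (capacity, provisioned, used) in datastores.items():
        overcommit_ratio = provisioned / capacity
        free = capacity - used
        if provisioned > capacity:
            print(f"{name}: overcommitted {overcommit_ratio:.2f}:1, {free} GB free - monitor closely")
        else:
            print(f"{name}: not overcommitted ({overcommit_ratio:.2f}:1), {free} GB free")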

Related Articles

1. Datastore (LUN) and Virtual Disk Provisioning (Thin on Thin)
