Competition Example Architectural Decision Entry 1 – TSM backup configuration for PureFlex environment?

Name: Ash Simpson
Title: Virtualization Architect
Company: IBM
Twitter: @Yipikaye1
Profile: VCP4

Problem Statement

Which is the ideal method for TSM backup in a PureFlex environment: LAN free backup, LAN based backup, or both?

Assumptions

1. IBM PureFlex hardware is used

2. Physical TSM server exists within PureFlex.

3. External (Virtual) Tape Library available on PureFlex SAN Fabric.

Constraints

1. Customer has selected PureFlex Infrastructure as hardware platform
2. IBM storage must be used – Storwize V7000 and IBM DS8000
3. ProtecTier VTL available and should be used

Motivation

1. Flexibility of choice based on specific application requirements.
2. The configuration to be deployed has the capability to support both.
3. LAN free backup is becoming a popular option in the industry.
4. LAN free backup negates the need for large backup windows.
5. PureFlex V7000 allows for FlashCopy Manager (FCM)
6. FCM is application aware for many critical Intel workloads such as SQL and Exchange.
7. All Backup I/O is retained within a single PureFlex Chassis

Architectural Decision

Deploy LAN free backup and LAN based backup infrastructure in PureFlex environments with LAN free backup via TSM for VE and FlashCopy Manager as the default. Should a particular application have the requirement for LAN based backup, the infrastructure can support it.

Host the physical TSM server and the ESXi host running the TSM for VE server (kept in place via an affinity rule) in the same chassis.

For the few servers requiring LAN based backup agents, use affinity rules to prefer ESXi hosts in the same PureFlex chassis as the TSM server (a sketch of such a rule follows below).
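
To illustrate the kind of rule involved, below is a minimal pyVmomi (Python) sketch of a non-mandatory ("should run on hosts in group") DRS rule that keeps the TSM backup/proxy VMs on hosts in the same chassis as the TSM server. It is not part of the original design; the vCenter address, credentials, cluster, VM and host names are all hypothetical placeholders.

```python
# Hypothetical sketch: a DRS "should" (non-mandatory) VM-to-host affinity rule
# preferring ESXi hosts in the same PureFlex chassis as the TSM server.
# All names, credentials and object lookups below are placeholders.
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

def find_by_name(vim_type, name):
    """Return the first managed object of the given type with the given name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim_type], True)
    return next(obj for obj in view.view if obj.name == name)

cluster = find_by_name(vim.ClusterComputeResource, "PureFlex-Cluster")
backup_vms = [find_by_name(vim.VirtualMachine, "tsm-ve-proxy-01")]
chassis_hosts = [find_by_name(vim.HostSystem, "esxi-chassis1-node01.example.com")]

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(
            info=vim.cluster.VmGroup(name="TSM-Backup-VMs", vm=backup_vms),
            operation="add"),
        vim.cluster.GroupSpec(
            info=vim.cluster.HostGroup(name="TSM-Chassis-Hosts", host=chassis_hosts),
            operation="add"),
    ],
    rulesSpec=[
        vim.cluster.RuleSpec(
            info=vim.cluster.VmHostRuleInfo(
                name="TSM-VMs-Prefer-TSM-Chassis",
                vmGroupName="TSM-Backup-VMs",
                affineHostGroupName="TSM-Chassis-Hosts",
                enabled=True,
                mandatory=False),  # "should" rule: prefer these hosts, do not require them
            operation="add"),
    ],
)
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```

A non-mandatory rule is used so DRS and HA can still place the VMs on other hosts if the preferred hosts are unavailable.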

Alternatives

1. Provide LAN based backup only

2. Provide LAN free backup only.

Justification

1. Better utilization of network bandwidth in LAN free backup.
2. Improved performance for backup and restore operations is possible in LAN free backup.
3. LAN based backup is still required by certain applications, hence it is recommended to retain this feature.
4. Hosting the TSM server in the same chassis as the proxy/agents prevents north/south network I/O.
5. FlashCopy Manager will reduce backup times by creating application aware snapshots on the storage array.

Implications

1. The hardware infrastructure will have to be configured for both LAN free and LAN based backup. For LAN free backup, the SAN fabric in the PureFlex system will be used for the backup traffic; the backup server transfers data from its storage directly to the tape device via FC.

2. Fibre Channel ports need to be dedicated to backup traffic.

3. Separate zones need to be configured in the Fibre Channel switch module environment for backup traffic.


Example Architectural Decision – Jumbo Frames with IP Storage (Use Jumbo Frames)

Problem Statement

When using IP based storage over a converged 10GB network, should Jumbo Frames be used?

Requirements

1. Fully Supported storage

2. Maximum vSphere environment availability

3. Maximize performance where possible

Assumptions

1. Dedicated 10GB Storage Network which is highly available

2. Two 10GB connections per ESXi host dedicated to IP Storage

3. Storage array supports Jumbo Frames

4. Benefit of Jumbo Frames outweighs the complexity to implement/maintain/support

5. Network performance is constrained at an interrupt level

Constraints

1. Maximum of two connections per ESXi host for IP Storage

Motivation

1. Maximum performance and security

Architectural Decision

Use Jumbo Frames

Justification

1. There is a dedicated physical network for IP storage

2. All devices end to end support Jumbo Frames and this is enabled on all switches globally

3. As only IP storage traffic traverses the dedicated network, a larger MTU will not have any adverse effects on data network traffic.

4. IP storage packets will not be fragmented or dropped, as the storage network has been verified and configured to support Jumbo Frames, thus avoiding costly re-transmits.

5. No routing exists (or is required) for the IP storage network, as such the environment is flat and simple to support

6. IP Storage performance will not be constrained by MTU

7. A standard MTU of 1500 can optionally be configured at the VMKernel layer if performance is negatively impacted by Jumbo Frames, without the need to modify the switch configuration, which supports an MTU of up to 9216.

8. Increasing the MTU will decrease the number of packets required for the same bandwidth, helping prevent the IP storage network from being constrained at an interrupt level (see the sketch below).
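
As a rough illustration of this point (not from the original decision), the short Python sketch below estimates the packet rate needed to fill a 10GB link at an MTU of 1500 versus 9000, ignoring protocol header overhead.

```python
# Illustrative only: approximate packets per second required to saturate a
# 10Gb/s link at different MTUs, ignoring Ethernet/IP/TCP header overhead.
def packets_per_second(throughput_gbps: float, mtu_bytes: int) -> float:
    bytes_per_second = throughput_gbps * 1e9 / 8
    return bytes_per_second / mtu_bytes

for mtu in (1500, 9000):
    print(f"MTU {mtu}: ~{packets_per_second(10.0, mtu):,.0f} packets/sec")
# MTU 1500: ~833,333 packets/sec; MTU 9000: ~138,889 packets/sec (roughly 6x fewer)
```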

Implications

1. A dedicated network needs to be maintained for IP storage which reduces consolidation

2. Storage network needs to be configured for Jumbo Frames

3. The Storage controller needs to be configured for Jumbo Frames

4. The VMKernel/s need to be configured for Jumbo Frames (a configuration sketch follows this list)

5. Where the network becomes constrained at either an interrupt or throughput level, any benefit of Jumbo Frames may be reduced or lost, and IP storage performance may degrade
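
As an illustration of implications 2-4 only (not part of the original decision), the following pyVmomi (Python) sketch shows the kind of per-host change involved: setting an MTU of 9000 on a standard vSwitch and on the IP storage VMkernel interface. The vCenter address, credentials, host name, vSwitch name and vmk device are hypothetical placeholders, and an environment using a Distributed Switch would set the MTU on the dvSwitch instead.

```python
# Hypothetical sketch: enable Jumbo Frames (MTU 9000) on a standard vSwitch and
# on the IP storage VMkernel interface of one ESXi host. Names are placeholders.
import ssl
from pyVim.connect import SmartConnect

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the ESXi host by DNS name (placeholder) and its network config manager.
host = content.searchIndex.FindByDnsName(dnsName="esxi01.example.com", vmSearch=False)
net_sys = host.configManager.networkSystem

# Set MTU 9000 on the vSwitch carrying IP storage traffic (placeholder name).
for vswitch in net_sys.networkInfo.vswitch:
    if vswitch.name == "vSwitch1":
        spec = vswitch.spec
        spec.mtu = 9000
        net_sys.UpdateVirtualSwitch(vswitchName="vSwitch1", spec=spec)

# Set MTU 9000 on the IP storage VMkernel interface (placeholder device name).
for vnic in net_sys.networkInfo.vnic:
    if vnic.device == "vmk1":
        nic_spec = vnic.spec
        nic_spec.mtu = 9000
        net_sys.UpdateVirtualNic(device="vmk1", nic=nic_spec)
```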

Alternatives

1. Do not use Jumbo Frames

2. Use Jumbo Frames in a converged network (i.e. no dedicated IP Storage switches)

Related Articles

1. Example Architectural Decision – Jumbo Frames for IP Storage (Use Jumbo Frames)

 Contributors

Thanks to Rob McNab (IBM) and Peter McCrystal (IBM) for their input into this example architectural decision.

Example Architectural Decision – Network I/O Control Shares/Limits for ESXi Host using IP Storage

Problem Statement

With 10GB connections becoming the norm, ESXi hosts will generally have fewer physical connections than in the past, where 1GB was common, but more bandwidth per connection (and in total) than a host with 1GB NICs.

In this case, the hosts have only two (2) x 10GB NICs, and the design needs to cater for all traffic (including IP storage) for the ESXi hosts.

The design needs to ensure each type of traffic has sufficient burst and sustained bandwidth without significantly impacting the other traffic types.

How can this be achieved?

Assumptions

1. No additional network cards (1GB or 10GB) can be supported
2. vSphere 5.1
3. Multi-NIC vMotion is desired

Constraints

1. Two (2) x 10GB NICs

Motivation

1. Ensure IP Storage (NFS) performance is optimal
2. Ensure vMotion activities (including a host entering maintenance mode) can be performed in a timely manner without impact to IP Storage or Fault Tolerance
3. Fault tolerance is a latency-sensitive traffic flow, so it is recommended to always set the corresponding resource-pool shares to a reasonably high relative value in the case of custom shares.
4. Proactively address potential contention due to limited physical network interfaces

Architectural Decision

Use one dvSwitch to support all VMKernel and virtual machine network traffic.

Enable Network I/O Control and configure NFS and/or iSCSI traffic with a share value of 100; ESXi Management, vMotion and FT will each have a share value of 25. Virtual Machine traffic will have a share value of 50.

Configure the two (2) VMKernels for IP Storage on the dvSwitch and set them to be Active on one 10GB interface and Standby on the second.

Configure two VMKernel interfaces for vMotion on the dvSwitch; set the first as Active on one interface and Standby on the second, and the second VMKernel as Active on the alternate interface and Standby on the first (as required for Multi-NIC vMotion).

A single VMKernel will be configured for Fault Tolerance, set as Active on one interface and Standby on the second.

For ESXi Management, the VMKernel will be configured as Active on the interface where FT is standby and standby on the second interface.

All dvPortGroups for Virtual machine traffic will be active on both interfaces.
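
The Active/Standby layout above can be summarised as simple data. The sketch below is illustrative only: the dvUplink and port group names are placeholders, and the specific uplink each VMKernel is pinned to is one plausible mapping of the description rather than a mandated assignment.

```python
# Illustrative summary (placeholder names) of the dvSwitch failover order described above.
# Each VMKernel port group is Active on one dvUplink and Standby on the other;
# Management is Active where FT is Standby; VM traffic is Active on both uplinks.
teaming_policy = {
    "IP-Storage-VMK-1": {"active": ["dvUplink1"], "standby": ["dvUplink2"]},
    "IP-Storage-VMK-2": {"active": ["dvUplink2"], "standby": ["dvUplink1"]},
    "vMotion-VMK-1":    {"active": ["dvUplink1"], "standby": ["dvUplink2"]},
    "vMotion-VMK-2":    {"active": ["dvUplink2"], "standby": ["dvUplink1"]},
    "FT-VMK":           {"active": ["dvUplink1"], "standby": ["dvUplink2"]},
    "Management-VMK":   {"active": ["dvUplink2"], "standby": ["dvUplink1"]},
    "VM-Networks":      {"active": ["dvUplink1", "dvUplink2"], "standby": []},
}
```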

Justification

1. The share values were chosen to ensure IP storage traffic is not impacted, as this can cause flow-on effects for the environment's performance. vMotion and FT are considered important but, during periods of contention, should not monopolize or impact IP storage traffic. (A worked example of the share values follows this list.)
2. IP Storage is more critical to ongoing cluster and VM performance than ESXi Management, vMotion or FT
3. IP storage requires higher priority than vMotion, which is more of a burst activity and is not as critical to VM performance
4. With a share value of 25, Fault Tolerance still has ample bandwidth to support the maximum of four FT-protected machines per host, even during periods of contention
5. With a share value of 25, vMotion still has ample bandwidth to support multiple concurrent vMotions during contention, and performance should not be impacted on a day-to-day basis. Up to eight concurrent vMotions are supported as vMotion is configured on a 10GB interface (the limit is four on a 1GB interface). Where no contention exists, vMotion traffic can burst and use a large percentage of both 10GB interfaces to complete vMotion activity as fast as possible
6. With a share value of 25, ESXi Management still has ample bandwidth to continue normal operations even during periods of contention
7. When using bandwidth allocation, use "shares" instead of "limits," as the former has greater flexibility for unused capacity redistribution.
8. With a share value of 50, Virtual Machine traffic still has ample bandwidth and should result in minimal or no impact to VM performance across 10Gb NICs
9. Setting Limits may prevent operations from completing in a timely manner where there is no contention
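
To make the share values concrete, the short Python sketch below (illustrative only, not from the original decision) computes the worst-case minimum bandwidth each traffic type would receive on a single fully contended 10GB uplink, given the share values chosen above.

```python
# Illustrative only: worst-case bandwidth per traffic type on one saturated
# 10Gb/s uplink, assuming every traffic type is contending simultaneously.
shares = {
    "NFS / iSCSI":      100,
    "Virtual Machine":   50,
    "vMotion":           25,
    "Fault Tolerance":   25,
    "ESXi Management":   25,
}

uplink_gbps = 10.0
total_shares = sum(shares.values())  # 225

for traffic, share in shares.items():
    minimum = uplink_gbps * share / total_shares
    print(f"{traffic:>16}: {minimum:.2f} Gb/s minimum under full contention")
# IP storage is guaranteed roughly 4.4 Gb/s even in the worst case, while any
# traffic type can use the full uplink when there is no contention.
```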

Implications

1. In the unlikely event of significant and ongoing contention, vMotion performance may affect the ability to evacuate a host in a timely manner. This may extend scheduled maintenance windows.
2. VMs protected by FT may be impacted

Alternatives

1. Use a share value of 50 for IP storage traffic to more evenly share bandwidth during periods of contention. However, this may impact VM performance, e.g. increased CPU WAIT if the IP storage is not keeping up with the storage demand

Related Posts
1. Example VMware vNetworking Design for IP Storage (4 x 10GB NICs)
2. Example VMware vNetworking Design for IP Storage (2 x 10GB NICs)
3. Frank Denneman (VCDX) – Designing your vMotion Network – Multi-NIC vMotion & NIOC