My VCAP-DTD (Desktop Design) Exam Experience

In late December 2012, I received an invite to the VMware Certified Advanced Professional – Desktop Design (VCAP-DTD) BETA exam. In typical BETA fashion, you have a limited period to schedule and take the exam; in this case it was only 25 days.

I then logged into the PearsonVUE website to check for availability and, surprise surprise, over the Christmas / New Year break there was only one slot available: Thursday 20th December 2012. So you’re thinking, so what? Well, I received the BETA exam invite (see below) on Tuesday 18th December, which meant I had very little chance to study.

DTD Exam invite

Because work was flat out, I was unable to do any study before sitting the exam.

Luckily, I have been doing a lot of work with View recently in a poorly designed environment, so most things View related were pretty fresh in my mind.

As with the VCAP-DCD (VMware Certified Advanced Professional – Datacenter Design) exam, the VCAP-DTD is a 195 minute exam with 115 questions made up of multiple choice, drag and drop, and scenario questions, along with a limited number (in my case 6) of Visio style design questions.

The key to all the VCAP exams including VCAP-DTD in my experience (having now sat and passed 4 VCAPs) is time management.

Unlike the VCP exam, where you have plenty of time, and most people finish with a lot of time remaining, VCAP exams almost always come down to the wire.

What is important to understand (although I don’t know the specifics of the scoring system) is that the questions are not weighted equally, so my advice is to make sure you allow enough time for the Visio style questions, as these are a significant portion of your total mark. Ignore these, and I would be surprised if you passed. For my two DCD (4 & 5) exams, along with this exam, I spent almost 15 minutes on each Visio style question, as I believe these are key to passing the exam. This theory has resulted in a 100% pass rate on the exams, so I’m sticking to it.
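To make the time pressure concrete, here is a rough budget using the figures from this post (195 minutes, 115 questions, 6 Visio style questions at roughly 15 minutes each). The number of design questions can differ between candidates, so treat the inputs as assumptions rather than exam facts.

```python
# Rough time budget for the exam, using the figures quoted in this post.
TOTAL_MINUTES = 195      # exam length
TOTAL_QUESTIONS = 115    # total question count
VISIO_QUESTIONS = 6      # design questions reported in this post (may vary)
MINUTES_PER_VISIO = 15   # time allowed per design question


def time_budget(total_minutes, total_questions, visio_questions, minutes_per_visio):
    """Return (minutes reserved for Visio questions, seconds per remaining question)."""
    visio_minutes = visio_questions * minutes_per_visio
    other_questions = total_questions - visio_questions
    remaining_seconds = (total_minutes - visio_minutes) * 60
    return visio_minutes, remaining_seconds / other_questions


visio_minutes, seconds_each = time_budget(
    TOTAL_MINUTES, TOTAL_QUESTIONS, VISIO_QUESTIONS, MINUTES_PER_VISIO
)
print(f"Visio style questions: {visio_minutes} min in total")
print(f"All other questions: about {seconds_each:.0f} s each")
```

With these inputs, 90 minutes go to the design questions, which leaves a little under a minute for each of the other 109 questions.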

The Visio questions require you to know how the various components of a VMware View solution fit together. So ensure you are familiar with the key concepts of View, such as “Blocks” & “Pods” and Storage Tiering, that you understand all the components that make up a View environment (eg: View Composer, Connection Broker, Security Server etc), and that you are clear in your mind how to represent them in a diagram.

I would recommend anyone taking the VCAP-DTD (or the VCAP-DCD) spend some time on a whiteboard drawing different View solutions, and even work with a friend who has View knowledge to pose scenarios for you. This will help you practice turning scenarios into diagrams, which you need to be able to do in under 15 minutes or you risk running out of time.

An area where I see most VDI/View environments fall down is storage. Ensure you understand storage sizing, from both a capacity and a performance perspective. Storage Tiering was released in View 4.5, and in my opinion is key to a View design, and important to understand for a View related design exam.

As with all VMware exams though, there is no secret to what is on the exam. VMware do an excellent job of providing this information by way of the Blueprint, which can be found here.

Ensure you review the Blueprint and are comfortable with all areas. The Blueprint also has links to guides such as the VMware View Architecture Planning document. I had previously read these documents, and I have no doubt that, had I not, I would not have been successful on the exam, especially with no study.

With BETA exams, there is always a long, long loooooong wait to get results. Yesterday (3/5/2013) I saw a tweet from another person who sat the BETA saying results were available on the PearsonVUE website, so I logged in, and sure enough there was my result, “pass”.

VCAP-DTD Exam Result

I still don’t have my score, as this will come via mail at some stage, but a pass is a pass so it doesn’t really matter.

So I am very happy to have passed this exam, and I am now motivated to attempt the VCDX-Desktop track once the VCAP-DTA exam is released, and to submit my VCDX-Desktop application.

As a Bonus, by sitting VCAP-DTD, I also received VCP5-DT, which is stated in the VCAP-DTD blueprint (see below), so if you are VCP5-DCV already, and up for a challenge, you can save sitting two exams, by jumping straight to VCAP-DTD.

DTDpath

So, that’s another VCAP under my belt. I hope this post helps you in your pursuit of your VCAP-DTD and/or VCDX-Desktop (when released).


Example Architectural Decision – Datastore Heartbeats for Clusters protected by SRM

Problem Statement

To enhance the isolation detection abilities of vSphere and minimize the chance of false positive isolation responses, Datastore Heartbeats will be used. What is the most suitable configuration of Datastore Heartbeats for an environment using SRM?

Requirements

1. SRM solution must not be impacted

2. Maximum vSphere environment availability

Assumptions

1. Site Recovery Manager 5.1 protects virtual machines in the cluster/s

2. Appropriate isolation address/es have been configured OR the default isolation address is suitable

3. All storage is presented via Active/Active storage controllers

4. There are some datastores which are not replicated

5. Isolation response is set to “Shutdown”

Constraints

1. None

Motivation

1. Minimize the chance of a false positive isolation event

2. In the event of isolation, automate the recovery of VMs

Architectural Decision

Use Datastore Heartbeats to enhance the isolation detection capabilities of vSphere.

For each cluster where SRM is used, configure Datastore Heartbeating to manually select two non replicated datastores as the heartbeat datastores.
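The selection rule above can be sketched in a few lines. This is only an illustration: the inventory below is hypothetical, and in a real environment the replication status of each datastore would come from the array or SRM rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical inventory: (cluster, datastore name, replicated by the array?)
INVENTORY = [
    ("Cluster-A", "DS-A-01", True),
    ("Cluster-A", "DS-A-02", False),
    ("Cluster-A", "DS-A-03", False),
    ("Cluster-A", "DS-A-04", False),
    ("Cluster-B", "DS-B-01", True),
    ("Cluster-B", "DS-B-02", False),
]


def pick_heartbeat_datastores(inventory, wanted=2):
    """Pick up to `wanted` non replicated datastores per cluster.

    Returns (selection, shortfalls), where shortfalls lists the clusters
    that do not have enough non replicated datastores.
    """
    candidates = defaultdict(list)
    for cluster, name, replicated in inventory:
        candidates[cluster]  # register the cluster even if nothing qualifies
        if not replicated:
            candidates[cluster].append(name)

    selection, shortfalls = {}, []
    for cluster, names in candidates.items():
        selection[cluster] = sorted(names)[:wanted]
        if len(selection[cluster]) < wanted:
            shortfalls.append(cluster)
    return selection, shortfalls


selection, shortfalls = pick_heartbeat_datastores(INVENTORY)
for cluster, names in selection.items():
    print(cluster, "->", names)
print("Clusters needing more non replicated datastores:", shortfalls)
```

Flagging the shortfalls matters because a cluster without enough non replicated datastores cannot follow this decision as written.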

Justification

1. Datastore heartbeating frequently writes to the datastores selected for heartbeating so that, in the event the network is down, isolation, partition or failure can be properly determined. During an SRM recovery, datastores need to be unmounted from the failed site, and this heartbeat I/O may cause one or more datastores to fail to unmount

2. Datastores failing to unmount will cause one or more of the SRM recovery steps to report as failed. Selecting non replicated datastores prevents this from impacting SRM

3. The environment benefits from increased resiliency as a result of datastore heartbeats being used

4. There is no negative impact to the SRM solution

Implications

1. Each cluster will need to have one or more non replicated datastores if Datastore Heartbeating is to be used

2. Additional configuration required to manually select non replicated datastores for heartbeating

Alternatives

1. Do not use Datastore heartbeating

2. Use Datastore Heartbeats and have datastores automatically selected

Related Articles

1. Example Architectural Decision – Host Isolation Response for FC Based storage


 

Example Architectural Decision – Storage I/O Control for Clusters Protected by SRM (Example 2 – Use SIOC)

Problem Statement

In an environment with one or more clusters with virtual machines protected by SRM, what is the most appropriate configuration of Storage I/O control?

Requirements

1. SRM solution must not be impacted

Assumptions

1. vSphere Version 4.1 or later

2. FC (Block) Based Storage OR NFS (File) based Storage

3. Number of datastores is fairly static

Constraints

1. Storage I/O control can prevent unmounting of datastore during a Recovery which can lead to errors being reported by SRM

Motivation

1. Where possible ensure consistent storage performance for all virtual machines

Architectural Decision

Enable and Configure Storage I/O control for all datastores.

Set the congestion threshold to 20ms

Leave the shares value default

Add a Step to each SRM recovery Plan as Step 1 and Select the Step Placement of “Before selected step”.

Configure step type as “Command of SRM Server” and execute the Scheduled Task which will disable SIOC prior to executing a SRM recovery

Justification

1. The benefits of Storage I/O control can still be achieved without impact to the SRM solution

2. SIOC will not impact SRM failover as it will be disabled automatically as part of the SRM recovery plan

3. In the event the Protected site is lost, SIOC will not prevent failover

Implications

1. Increased complexity for the SRM solution

2. An additional step to execute a “Command on SRM Server” is required

3. A Scheduled Task will need to be set up and configured with the setting “Allow task to be run on demand”

4. A script to disable SIOC will need to be prepared and configured with all datastores

Alternatives

1. Enable Storage I/O control and leave default settings

2. Enable storage I/O control and set share values on virtual machines

3. Enable Storage I/O control and set a lower “congestion threshold”

4. Enable Storage I/O control and set a higher “congestion threshold”

5. Disable Storage I/O control

Related Articles

1. Example Architectural Decision – Storage I/O Control for Clusters Protected by SRM (Example 1 – Don’t Use SIOC)

 

Example Architectural Decision – Storage I/O Control for Clusters Protected by SRM (Example 1 – Don’t Use SIOC)

Problem Statement

In an environment with one or more clusters with virtual machines protected by SRM, what is the most appropriate configuration of Storage I/O control?

Requirements

1. SRM solution must not be impacted

Assumptions

1. vSphere Version 4.1 or later

2. FC (Block) Based Storage OR NFS (File) based Storage

Constraints

1. Storage I/O control can prevent unmounting of datastore during a Recovery which can lead to errors being reported by SRM

Motivation

1. Where possible ensure consistent storage performance for all virtual machines

2. Simplicity

Architectural Decision

Do not use Storage I/O control for datastores protected by SRM

Justification

1. Storage I/O control can prevent unmounting of datastore during a Recovery which can lead to errors being reported by SRM

2. Storage I/O control can prevent re-mounting of datastore/s during a failback which can lead to errors being reported by SRM and prevent failback without manual intervention

3. Solution does not require any custom steps added to SRM to facilitate a successful recovery

Implications

1. Storage I/O control cannot be used for Datastores protected by SRM

2. In the event of storage contention, SIOC will not be able to ensure fairness between virtual machines based on their share values

3. Storage Performance may degrade significantly during contention

Alternatives

1. Enable Storage I/O control and leave default settings

2. Enable storage I/O control and set share values on virtual machines

3. Enable Storage I/O control and set a lower “congestion threshold”

4. Enable Storage I/O control and set a higher “congestion threshold”

5. Enable Storage I/O control and as part of the DR runsheet, disable SIOC prior to executing a SRM recovery

Related Articles

1. Example Architectural Decision –  Storage I/O Control for Clusters Protected by SRM (Example 2 – Use SIOC)

 

Example Architectural Decision – Virtual Machine Swap file location for SRM protected VMs

Problem Statement

In an environment where multiple vSphere clusters are protected by VMware Site Recovery Manager (SRM) with array based replication, what is the best way to ensure the RTO/RPO is met/exceeded and to minimize the storage replication overhead?

Assumptions

1. Additional storage will not be obtained

2. Eight (8) Paths per LUN are Masked/Zoned

Motivation

1. Optimize underlying storage usage

2. Ensure transient files are not unnecessarily replicated

Architectural Decision

Configure vSphere cluster swapfile policy to Store the swapfile in the datastore specified by the host.

Create and configure a dedicated swap file datastore provided by Tier 1 storage, with a capacity greater than the total vRAM of the cluster itself plus that of any/all clusters using the cluster as a recovery site.
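As a rough illustration of that sizing rule, the sketch below adds the vRAM of the local cluster to the vRAM of every cluster that recovers to it. The vRAM figures and the 10% headroom are assumptions for the example only, not values from this design.

```python
import math


def swap_datastore_min_gb(local_vram_gb, recovery_vram_gb, headroom_pct=10):
    """Minimum swap file datastore capacity in GB.

    local_vram_gb:    total vRAM configured for VMs in this cluster
    recovery_vram_gb: total vRAM of each cluster that uses this cluster
                      as its recovery site
    headroom_pct:     extra capacity so the datastore is "greater than"
                      total vRAM (an assumption for this sketch)
    """
    total_vram = local_vram_gb + sum(recovery_vram_gb)
    return math.ceil(total_vram * (100 + headroom_pct) / 100)


# Hypothetical example: 2 TB of vRAM locally, plus two protected clusters
# (1 TB and 512 GB of vRAM) that fail over to this cluster.
required_gb = swap_datastore_min_gb(2048, [1024, 512])
print(f"Swap file datastore should be at least {required_gb} GB")
```

This is deliberately conservative: it ignores memory reservations, which reduce the size of each swap file.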

Justification

1. Decreased storage replication between protected and recovery sites

2. Reduced impact to the underlying storage due to reduced replication

3. Reduces the used space at the recovery site

4. No impact to the ability to recover to, or fail back from, the recovery site

5. vMotion performance will not be impacted as all hosts within a cluster share the same swap file datastore which is provided from the existing shared storage

6. There is minimal complexity in setting a dedicated swap file datastore as such, the benefits outweigh the additional complexity

7. In the event of swapping, performance will not be impacted as the swap file is on Tier 1 storage

8. There is no additional Tier 1 storage utilization, as the vswap file would otherwise be stored with the policy “Store in the same directory as the virtual machine”

9. Ensures memory (RAM) overcommitment can still be achieved, whereas setting memory reservations would reduce/eliminate this benefit of vSphere

Implications

1. vMotion performance between clusters will be degraded as the swap file will be moved as part of the vMotion to the destination cluster swap file datastore

2. One (1) datastore out of a maximum of 256 per host is used for the swap file datastore

3. Eight (8) paths out of a maximum of 1024 per host are used for the swap file datastore
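To put the last two implications in context, here is a small worked calculation using the per-host maximums quoted above. It assumes one datastore per LUN and eight paths to every LUN, which may not match every environment.

```python
# Per-host maximums as quoted in the implications above.
MAX_PATHS_PER_HOST = 1024
MAX_DATASTORES_PER_HOST = 256
PATHS_PER_LUN = 8  # from the assumptions of this decision


def max_luns_per_host(paths_per_lun,
                      max_paths=MAX_PATHS_PER_HOST,
                      max_datastores=MAX_DATASTORES_PER_HOST):
    """Number of LUNs a host can see before hitting either limit.

    Assumes one datastore per LUN and the same path count for every LUN.
    """
    return min(max_paths // paths_per_lun, max_datastores)


limit = max_luns_per_host(PATHS_PER_LUN)
print(f"With {PATHS_PER_LUN} paths per LUN a host can see {limit} LUNs")
print(f"The swap file datastore uses 1 of those {limit}")
```

Under these assumptions the path limit is reached before the datastore limit, so the cost of the dedicated swap file datastore is better thought of as one LUN out of 128 rather than one out of 256.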

Alternatives

1. Store the swapfile in the same directory as the virtual machine

2. Set Virtual machine memory reservations of 100% to eliminate the vswap file

Related Articles

1. Site Recovery Manager Deployment Location

2. VMware Site Recovery Manager, Physical or Virtual machine?

 

Example Architectural Decision – Site Recovery Manager Server – Physical or Virtual?

Problem Statement

To ensure Production vSphere environment/s can meet/exceed the required RTOs in the event of a declared site failure, what is the most suitable way to deploy VMware Site Recovery Manager: on a Physical or Virtual machine?

Requirements

1. Meet/Exceed RTO requirements

2. Ensure solution is fully supported

3. SRM be highly available, or be able to be recovered rapidly to ensure Management / Recovery of the Virtual infrastructure

4. Where possible, reduce the CAPEX and OPEX for the solution

5. Ensure the environment can be easily maintained in BAU

Assumptions

1. Sufficient compute capacity in the Management cluster for an additional VM

2. SRM database is hosted on an SQL server

3. vSphere Cluster (ideally Management cluster)  has N+1 availability

Constraints

1. None

Motivation

1. Reduce CAPEX and OPEX

2. Reduce the complexity of BAU maintenance / upgrades

3. Reduce power / cooling / rackspace usage in datacenter

Architectural Decision

Install Site Recovery Manager on a Virtual machine

Justification

1. Ongoing datacenter costs relating to Power / Cooling and Rackspace are avoided

2. Placing Site Recovery Manager on a Virtual machine ensures the application benefits from the availability, load balancing, and fault resilience capabilities provided by vSphere

3. The CAPEX of a virtual machine is lower than a physical system especially when taking into consideration network/storage connectivity for the additional hardware where a physical server was used

4. The OPEX of a virtual machine is lower than a physical system due to no hardware maintenance, minimal/no additional power usage , and no cooling costs

5. Improved scalability and the ability to dynamically add additional resources (where required) assuming increased resource consumption by the VM. Note: Hot Add / Hot Plug must be supported by the guest operating system and enabled while the VM is shut down. Where these features are not supported, virtual hardware can be added with a short outage.

6. Improved manageability as the VMware abstraction layer makes day to day tasks such as backup/recovery easier

7. Ability to non-disruptively migrate to new hardware where EVC is configured in a compatible mode and enabled between hosts within a vSphere data center

Alternatives

1. Place SRM on a physical server

Implications

1. For some storage arrays, the SRM server needs to have access to admin LUNs and using a virtual machine may increase complexity by the requirement for RDMs

I would like to Thank James Wirth VCDX#83 (@jimmywally81) for his contribution to this example architectural decision.

Related Articles

1. Site Recovery Manager Deployment Location

2. Swap file location for SRM protected VMs


 

 

Example Architectural Decision – Site Recovery Manager Deployment Location

Problem Statement

To ensure Production vSphere environment/s can meet/exceed the required RTOs in the event of a declared site failure and easily perform scheduled DR testing, VMware Site Recovery Manager will be used to automate the failover to the secondary site.

What is the most suitable way to deploy Site Recovery Manager to ensure the environment can be maintained with minimal risk/complexity?

Requirements

1. Meet/Exceed RTO requirements
2. Ensure solution is fully supported

Assumptions

1. vCenter is considered a Tier 1 application
2. vSphere 5.1
3. SRM 5.1
4. A single Windows instance hosts vCenter, SSO and Inventory services and is protected by vCenter Heartbeat

Constraints

1. SRM is not protected by vCenter Heartbeat

Motivation

1. Reduce the complexity for BAU maintenance

Architectural Decision

Install Site Recovery Manager on a dedicated Windows 2008 instance

Justification

1. Installing / upgrading / patching SRM, including Storage Replication Adapters (SRAs), may require a reboot or troubleshooting which may impact the production vCenter, including SSO and Inventory services.

2. Having SRM separate to vCenter ensures the failover is not unnecessarily delayed in the event of a disaster due to contention with vCenter on the same VM

3. SRM and vCenter work together in the event of an outage, as such they are less complementary workloads

4. If hosted on vCenter, SRM will then be subject to the same change windows and be impacted during any maintenance performed for applications running on the same OS instance

5. The SRM application has different availability requirements than vCenter, as such if SRM was combined with vCenter, SRM (having a lower availability requirement than vCenter) would have to be treated with the same change management / care as vCenter which would complicate BAU maintenance

6. The SRM service (business) has different maintenance requirements to vCenter, as such they are not suited to be placed on the same VM

7. Having SRM on a dedicated VM aligns with the scaling out recommendation for virtual workloads

8. Having additional components on the same OS increases complexity and may reduce the availability of vCenter

Alternatives

1. Place SRM on the vCenter server

Implications

1. One (1) additional Windows 2008 R2 license will be required

2. One (1) additional Windows instance will need to be maintained in BAU

I would like to Thank James Wirth VCDX#83 (@jimmywally81) for his contribution to this example architectural decision.

Related Articles

1. VMware Site Recovery Manager, Physical or Virtual machine?

2. Swap file location for SRM protected VMs


 

 

Melbourne VMUG Feb 7th 2013 – Optimizing VMware vSphere, vCloud and VDI Environments with Intelligent Storage

Last month I presented a Community Session at the Melbourne VMUG:

“Optimizing VMware vSphere, vCloud and Desktop Environments with Intelligent Storage”

For those who are interested, you can watch the recorded session here.

A special thanks to Craig Waters (@cswaters1), Melbourne VMUG leader, for organizing the Melbourne VMUG and recording/encoding this session for the VMware community.

Example Architectural Decision – vSphere 5.1 Single Sign On (SSO) deployment mode across Active/Active Datacenters

Problem Statement

What is the most suitable deployment mode for vCenter Single-Sign On (SSO) in an environment where there are two (2) physical datacenters running in an Active/Active configuration?

Requirements

1. The solution must be a fully supported configuration
2. Meet/Exceed RTO of 4 hours
3. Environment must support SRM failover between Datacenter A and Datacenter B where an entire datacenter is lost

Assumptions

1. Three (3) vCenter servers will be used, One (1) at Datacenter A and Two (2) at Datacenter B
2. Environment has Two (2) Production clusters (One per Datacenter), and One (1) vCloud Cluster at Datacenter B each with a dedicated vCenter
3. Stretched clusters are not used
4. All vSphere Infrastructure servers (including SSO) are protected by SRM and vSphere HA
5. Inter-site Metropolitan Area Network is high bandwidth (>10Gb), low latency (<5ms) and highly available (99.999%)
6. The average number of authentications per second for each SSO instance is <30 (Configuration Maximum)

Constraints

1. The environment uses traditional agent based backup solution which may not meet RPO/RTO requirements

Motivation

1. Future proof the environment

Architectural Decision

1. Use “Multi-site” SSO deployment mode
2. Do not use SSO “High Availability” clusters
3. The Primary SSO server will be at Datacenter B
4. The remaining vCenter servers will be “Secondaries” and point to the Datacenter B Primary SSO instance
5. Each SSO instance will be on a dedicated Windows 2008 R2 x64 instance
6. Each SSO instance will use the bundled SQL database
7. (Optional) For greater availability, vCenter Heartbeat will be used to protect each SSO instance

Justification

1. The environment is being designed (where possible) to sustain a Metropolitan Area Network failure between the two (2) datacenters

2. If “High Availability” mode is used, at least one (1) vCenter would be accessing SSO across the MAN link which introduces an unnecessary dependency on the MAN links

3. “High Availability” currently requires manual intervention which can be complicated and problematic

4. “Basic” mode prevents the use of Linked Mode which will make management of the environment more difficult

5. Using Multisite mode allows faster access to authentication services as each SSO instance is configured with Active Directory servers located at the same datacenter.

6. Multisite mode is required for the use of Linked Mode, and Linked Mode will make day to day management easier

7. If one SSO instance goes offline for any reason, this will not impact production virtual machines. It will simply prevent any authentication to the affected vCenter server.

8. Having the SSO Primary at Datacenter B ensures only traffic from one vCenter (Datacenter A vCenter) traverses the MAN link as the third vCenter (for vCloud Director) is at Datacenter B

9. In the event of Datacenter B having a full datacenter wide failure for any reason, the Primary SSO instance being offline will not impact the management of Datacenter A OR the ability of the environment to be recovered by SRM.

Alternatives

1. Use “Basic” Mode, resulting in a standalone version of SSO for each vCenter server

2. Use “High Availability Cluster” (sharing the same SSO database and identity sources) with one SSO server per physical datacenter

3. Use “Multisite” deployment with “High Availability Clusters” per datacenter

4. Host SSO database on a SQL Server

5. Run SSO on the vCenter server with or without the SSO database locally

6. Run a single SSO instance shared by all three (3) vCenters and use vCenter Heartbeat running across the MAN to protect SSO

Implications

1. Without a “High Availability Cluster” or SSO being protected by vCenter Heartbeat at each datacenter, the SSO for each site is a Single point of failure where authentication to the affected vCenter will fail

2. In the event of one (1) SSO server failing at Datacenter A, the SSO role does not failover to Datacenter B, or vice versa. In this case, All authentication requests on the site where SSO has failed, will fail.

3. Requires the installable version of SSO, which is Windows Only. The use of the vCenter Server Appliance (VCSA) is not available.

4. Additional Windows 2008 licenses are required for the SSO servers

Related Articles

1. Disabling Single Sign On – Dont Do It! - LongWhiteClouds

I would like to Thank Michael Webster VCDX#66 (@vcdxnz001) for his contribution to this example architectural decision.


 

 

Example Architectural Decision – Host Isolation Response for FC Based storage

Problem Statement

What are the most suitable HA / host isolation settings where the environment uses Storage (IBM SVC) with FC connectivity via a dedicated highly available Storage Area Network (SAN) fabric where ESXi Management and Virtual Machine traffic run over a highly available data network?

Requirements

1. Ensure in the event of one or more hosts becoming isolated, the environment responds in an automated manner to recover VMs where possible

Assumptions

1. The Network is highly available (>99.999% availability)
2. The Storage is highly available (>99.999% availability)
3. vSphere 5.0 or later
4. ESXi hosts are connected to the network via two physically separate switches using two physical NICs

Constraints

1. FC (Block) based storage

Motivation

1. Meet/Exceed availability requirements
2. Minimize the chance of a false positive isolation event

Architectural Decision

Turn off the default isolation address by setting the below advanced setting

“das.usedefaultisolationaddress” = False

Configure three (3) isolation addresses by setting the below advanced settings

“das.isolationaddress1” = 192.168.1.1 (Core Router)

“das.isolationaddress2” = 192.168.1.2 (Core Switch 1)

“das.isolationaddress3” = 192.168.1.3 (Core Switch 2)

Configure Datastore Heartbeating with “Select any of the clusters datastores”

Configure Host Isolation Response to: “Shutdown”
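The advanced settings in this decision can be generated rather than typed by hand, which helps keep the numbering consistent across clusters. The sketch below only builds and validates the key/value pairs; applying them to a cluster (for example via PowerCLI or the vSphere Client) is outside its scope, and the helper name is hypothetical.

```python
import ipaddress


def build_isolation_options(addresses):
    """Build the HA advanced options for custom isolation addresses.

    Returns a dict of option name -> value, using the same option names
    and values as the decision above.
    """
    if not addresses:
        raise ValueError("At least one isolation address is required")

    # Turn off the default isolation address, as per the decision.
    options = {"das.usedefaultisolationaddress": "False"}
    for index, address in enumerate(addresses, start=1):
        ipaddress.ip_address(address)  # raises ValueError if malformed
        options[f"das.isolationaddress{index}"] = address
    return options


# Example addresses taken from the decision above.
options = build_isolation_options(["192.168.1.1", "192.168.1.2", "192.168.1.3"])
for name, value in options.items():
    print(f"{name} = {value}")
```

Validating the addresses up front catches typos before they reach the cluster, where a wrong isolation address could contribute to a false positive isolation event.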

Justification

1. When using FC storage, it is possible for the Management and Virtual Machine networks to be unavailable while the storage network is working perfectly. In this case virtual machines may not be able to communicate with other servers, but can continue reading from and writing to disk. They will likely not be servicing customer workloads, so shutting the VM down gracefully and allowing HA to restart the VM/s on host/s which are not isolated gives the VM a greater chance of being able to resume servicing workloads than remaining on an isolated host.
2. Datastore heartbeating will allow HA to confirm if the host is “isolated” or “failed”. In either case, shutting down the VM will allow HA to recover the VM on a surviving host.
3. As all storage is presented via Active/Active IBM SVC controllers, there is no benefit in specifying specific datastores to be used for heartbeating
4. The selected isolation addresses were chosen as they are all highly available devices in the network which are essential for network communication and cover the core routing and switching components in the network.
5. In an environment where the network is highly available, an isolation event is extremely unlikely. As such, where the three (3) isolation addresses cannot be contacted, it is unlikely the network can be restored in a timely manner, OR the host has suffered multiple concurrent failures (eg: multiple network cards). Performing a controlled shutdown helps ensure that when the network is recovered the VMs are brought back up in a consistent state, OR, where the isolation impacts only a subset of ESXi hosts in the cluster, that the VM/s can be recovered by HA and resume normal operations.

Alternatives

1. Set Host isolation response to “Leave Powered On”
2. Do not use Datastore heartbeating
3. Use the default isolation address

Implications

1. In the event the host cannot reach any of the isolation addresses, virtual machines will be Shutdown
2. Using “Shutdown” as opposed to “Power off” ensures a graceful shutdown of the guest operating system, however this will delay the HA restart of the VM for up to 5 mins (300 seconds) if VMware Tools is unable to do a controlled shutdown, in which case after 300 seconds a “Power Off” will be executed.
3. In the unlikely event of network instability, VMs may be Shutdown prematurely.
