Competition Example Architectural Decision Entry 2 – Use of RDMs in Standard IaaS Clusters

Name: Chris Jones
Title: Virtualization Architect
Twitter: @cpjones44
Profile: VCP5 / VCAP5-DCD

Problem Statement

Several VMs require more than 1.9TB in a single disk. The existing virtual environment has LUNs provisioned that are 2TB in size. Because these VMs have virtual data disks (VMDKs) that are larger than 1.9TB, the infrastructure monitoring solution triggers alarms and raises incident tickets to the Virtual Infrastructure support queue.
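To make the arithmetic behind these alarms concrete, here is a minimal sketch. It assumes the global alert threshold is the 90% figure mentioned under Alternatives, that the datastore is a single LUN of 2TB minus 512 bytes, and that sizes are binary terabytes. The function names are hypothetical, and VMFS metadata and other VM files are ignored.

```python
TB = 1024 ** 4  # bytes in one terabyte (assumption: binary units)

# Largest single-LUN datastore in vSphere 4.1 (see Constraint 2).
MAX_DATASTORE_BYTES = 2 * TB - 512

# Assumed global alert threshold, taken from the 90% figure in Alternative 2.
ALERT_THRESHOLD = 0.90


def datastore_utilisation(vmdk_bytes: int,
                          datastore_bytes: int = MAX_DATASTORE_BYTES) -> float:
    """Fraction of the datastore consumed by one thick-provisioned VMDK."""
    return vmdk_bytes / datastore_bytes


def raises_alert(vmdk_bytes: int, threshold: float = ALERT_THRESHOLD) -> bool:
    """True when the VMDK alone pushes the datastore past the threshold."""
    return datastore_utilisation(vmdk_bytes) > threshold


if __name__ == "__main__":
    vmdk = int(1.9 * TB)
    print(f"utilisation: {datastore_utilisation(vmdk):.1%}")
    print(f"alert raised: {raises_alert(vmdk)}")
```

Under these assumptions a 1.9TB disk on its own consumes roughly 95% of the datastore, so the alarm fires as soon as the disk is provisioned, regardless of how much data the guest has written.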

Assumptions

1. Data within the OSI must reside within the VM and not on some kind of IP based store (like a NAS share).

2. vSphere datastores are presented through FC and not IP based stores (ie. NFS).

3. vSphere Hypervisor is ESXi 4.1.

4. There is no requirement for the VMs to be performing SAN specific functionality or running SCSI target-based software.

Constraints

1. The implemented monitoring solution cannot be customised with triggers and monitoring policies for individual objects within the environment (ie. having one monitoring policy per individual or sub-group of datastores).

2. Maximum vSphere datastore size in version 4.1 is 2TB minus 512 bytes per LUN (extent).

3. Unable to upgrade beyond ESXi 4.1 Update 3.

Motivation

1. Reduce the number of incident tickets being raised, thus improving SLA posture.

2. Reduce the requirement to span single Windows logical volumes across multiple VMDKs.

Architectural Decision

Convert the disk to an RDM (Virtual Compatibility Mode) so that it is no longer subject to datastore monitoring at the vSphere layer.

Alternatives

1. Create smaller VMDKs (ie. 1-1.5TB disks) and create a RAID0 volume within the guest OS.

2. Change the level of alerting so that tickets are not raised for alerts that trigger beyond 90%.

3. Convert the disk to an RDM so that it is no longer subject to datastore monitoring at the vSphere layer.

4. Thin provision the virtual disks.

5. Store the data within the guest on some kind of IP based storage (NAS/iSCSI target).

Justification

1. Option 5 goes against the assumption that data must be local to the VM, so was ruled out.

2. Whilst thin provisioning (Option 4) is an attractive solution, this option is ruled out based on a wider infrastructure decision to thick provision all disks in the environment to reduce the risk of datastores filling up and critical business VMs stopping.

3. Option 1, using smaller VMDKs spread across multiple vSphere datastores, will make these alerts disappear; however, it will create issues when executing a DR recovery for either the individual disks (Active/Passive) or the whole VM (Active/Cold). It only takes one VMDK not being replicated, or the VMDKs being mounted in the wrong order, for the whole Windows volume to be corrupted. Spanning one Windows volume across multiple VMDKs also complicates the recovery of array-based snapshot backups (eg. via SMVI or NetBackup).

4. Option 2 goes against the constraint that the infrastructure monitoring solution cannot create individual alerting policies for either a single datastore or a sub-group of datastores in the inventory. Should individualised policies be created, we would need to ensure that the affected VMDKs that consume 90-95% of a datastore remain on that datastore, as moving from one to another (ie. from Tier 2 to Tier 1) will require a change to the monitoring that has been configured. At this stage, the monitoring solution has no way to track these customised policies, which is the main reason why global environment-wide policies exist.

5. Option 3 and the use of RDMs in Virtual Compatibility Mode will allow the VM to benefit from the features of VMFS, such as advanced file locking for data protection and vSphere snapshotting. The use of RDMs will also allow for VMs to be managed by DRS (ie. can be vMotion’ed) and protected by vSphere HA.
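The DR risk described for Option 1 (one member VMDK of a striped Windows volume missing from replication) can be illustrated with a small check. This is only a sketch: the function names, volume names, and datastore names are hypothetical, and the list of replicated datastores is assumed to be known from the array configuration.

```python
from typing import Dict, List, Set


def datastore_of(vmdk_path: str) -> str:
    """Extract the datastore name from a path such as '[DS01] vm/vm_1.vmdk'."""
    return vmdk_path[vmdk_path.index("[") + 1:vmdk_path.index("]")]


def unreplicated_members(
    volume_members: Dict[str, List[str]],
    replicated_datastores: Set[str],
) -> Dict[str, List[str]]:
    """Per guest volume, list member VMDKs that sit on unreplicated datastores.

    volume_members maps a guest volume (e.g. a Windows drive letter) to the
    datastore-qualified paths of the VMDKs striped together to form it.
    """
    problems: Dict[str, List[str]] = {}
    for volume, vmdks in volume_members.items():
        missing = [p for p in vmdks
                   if datastore_of(p) not in replicated_datastores]
        if missing:
            problems[volume] = missing
    return problems


if __name__ == "__main__":
    # Hypothetical example: a RAID0 volume built from two 1TB VMDKs.
    members = {"E:": ["[DS01] app01/app01_1.vmdk",
                      "[DS02] app01/app01_2.vmdk"]}
    print(unreplicated_members(members, {"DS01"}))
```

Any non-empty result means the striped volume would not survive a failover intact. A single RDM backing the whole volume avoids the need for this kind of check.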

Implications

1. The RDM mapping will need to be recorded clearly to avoid the lengthy process of discovering from scratch which physical LUN is presented to the virtual machine.

One way to record these mappings is to:

A) Record the name of the VM that has the RDM.

B) Record the NAA identifier of the physical LUN(s) presented to the VM.

C) Record the virtual device node on the virtual disk controller where the RDM is attached.

D) Record the Windows drive letter that the RDM is mounted to.

2. Additional paths will be consumed, reducing the total number of vSphere datastores that can be presented to the cluster.
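The record described in items A to D of implication 1 could be kept as a simple structured register. The sketch below is one possible shape, not a prescribed tool: the class and function names, the CSV layout, and the sample values (including the NAA identifier) are all hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional

FIELDS = ["vm_name", "naa_id", "device_node", "drive_letter"]


@dataclass(frozen=True)
class RdmMapping:
    vm_name: str       # A) VM that has the RDM
    naa_id: str        # B) NAA identifier of the physical LUN
    device_node: str   # C) virtual device node, e.g. "SCSI (1:0)"
    drive_letter: str  # D) Windows drive letter the RDM is mounted to


def to_csv(mappings: Iterable[RdmMapping]) -> str:
    """Serialise the register as CSV so it can live alongside other docs."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for mapping in mappings:
        writer.writerow(asdict(mapping))
    return buf.getvalue()


def find_by_naa(mappings: List[RdmMapping], naa_id: str) -> Optional[RdmMapping]:
    """Answer 'which VM and drive uses this LUN?' without rediscovery."""
    return next((m for m in mappings if m.naa_id == naa_id), None)


if __name__ == "__main__":
    # Hypothetical sample entry; the NAA value is a placeholder.
    register = [RdmMapping("sqlvm01", "naa.example0001", "SCSI (1:0)", "E:")]
    print(to_csv(register))
    print(find_by_naa(register, "naa.example0001"))
```

Keeping the lookup keyed on the NAA identifier matches how the storage team is likely to ask the question: starting from a LUN and working back to the VM.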


Competition Example Architectural Decision Entry 1 – TSM backup configuration for PureFlex environment?

Name: Ash Simpson
Title: Virtualization Architect
Company: IBM
Twitter: @Yipikaye1
Profile: VCP4

Problem Statement

Which is the ideal TSM backup method for a PureFlex environment: LAN free backup, LAN based backup, or both?

Assumptions

1. IBM PureFlex hardware is used.

2. Physical TSM server exists within PureFlex.

3. External (Virtual) Tape Library available on PureFlex SAN Fabric.

Constraints

1. Customer has selected PureFlex Infrastructure as the hardware platform.
2. IBM storage must be used – Storwize V7000 and IBM DS8000.
3. ProtecTIER VTL is available and should be used.

Motivation

1. Flexibility of choice based on specific application requirements.
2. The configuration to be deployed has the capability to support both.
3. LAN free backup is becoming a popular option in the industry.
4. LAN free backup negates the need for large backup windows.
5. PureFlex V7000 allows for FlashCopy Manager (FCM).
6. FCM is application aware for many critical Intel workloads such as SQL and Exchange.
7. All backup I/O is retained within a single PureFlex chassis.

Architectural Decision

Deploy LAN free backup and LAN based backup infrastructure in PureFlex environments with LAN free backup via TSM for VE and FlashCopy Manager as the default. Should a particular application have the requirement for LAN based backup, the infrastructure can support it.

Host the Physical TSM server and an ESXi Host with the TSM for VE server (via affinity rule) in the same Chassis.

For the few servers requiring LAN based backup agents, use affinity rules to prefer ESXi hosts in the same PureFlex chassis as the TSM server.
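Because the affinity rules only express a preference, it may be useful to check periodically whether LAN based backup VMs have drifted to hosts in another chassis. The following is a minimal sketch of such a check. The inventory dictionaries, host names, and chassis names are hypothetical and would in practice come from the vCenter and chassis inventories.

```python
from typing import Dict, List


def vms_outside_tsm_chassis(
    host_chassis: Dict[str, str],
    vm_host: Dict[str, str],
    lan_backup_vms: List[str],
    tsm_chassis: str,
) -> List[str]:
    """List LAN-backup VMs currently running outside the TSM server's chassis.

    host_chassis maps each ESXi host to its PureFlex chassis.
    vm_host maps each VM to the ESXi host it currently runs on.
    """
    return sorted(
        vm for vm in lan_backup_vms
        if host_chassis[vm_host[vm]] != tsm_chassis
    )


if __name__ == "__main__":
    # Hypothetical inventory: two hosts in different chassis.
    host_chassis = {"esx01": "chassis-A", "esx02": "chassis-B"}
    vm_host = {"app01": "esx01", "app02": "esx02"}
    print(vms_outside_tsm_chassis(host_chassis, vm_host,
                                  ["app01", "app02"], "chassis-A"))
```

Any VM reported here sends its backup traffic between chassis, which is the North/South I/O the decision is trying to avoid.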

Alternatives

1. Provide LAN based backup only

2. Provide LAN free backup only.

Justification

1. Better utilization of network bandwidth in LAN free backup.
2. Improved performance for backup and restore operations is possible in LAN free backup.
3. LAN based backup is still required by certain applications, hence it is recommended to retain this feature.
4. Hosting TSM server in same chassis as proxy/agents prevents North/South network I/O.
5. FlashCopy Manager will reduce backup times by creating application aware snapshots on the storage array.

Implications

1. The hardware infrastructure will have to be configured for both LAN free and LAN based backup. For LAN free backup, the SAN fabric in the PureFlex system will be used for the backup environment. The backup server transfers data from its storage directly to the tape device via FC.

2. Fibre Channel ports need to be dedicated to backup traffic.

3. Separate zones need to be configured in the Fibre Channel switch module environment for backup traffic.


Example Architectural Decision Competition by VMware Press & Josh Odgers


Welcome to the Example Architectural Decision Competition!

VMware Press, in conjunction with JoshOdgers.com (CloudXC), wishes to announce this competition to find the most innovative and creative virtualization related architectural decisions to real world problems.

All submissions will be posted in this special section of JoshOdgers.com (CloudXC) with the goal to encourage everyone to share their experiences for the benefit of the Virtualization community.

All suitable example architectural decisions submitted to this competition will remain featured on this blog following the competition with credit being given to the author.

The competition will initially run for the next six (6) weeks and depending on the popularity of the competition it may be extended.

A winner will be announced fortnightly and will receive a printed copy of the VMware Press title of their choice.

The runner up will receive a voucher for a VMware Press eBook.

You can see the range of books VMware Press offer here.

If any other vendors wish to contribute prizes to this competition please add a comment to this page or contact me via Twitter (@josh_odgers).

All example architectural decision submissions must use the following format. Any submission without details for the following categories will be ineligible.

Problem Statement

Describe the problem statement or goal of the situation the design decision relates to.

Assumptions

1. Assumption 1
2. Assumption 2
3. Assumption 3

Constraints

1. Constraint 1
2. Constraint 2
3. Constraint 3

Motivation

1. Motivation 1
2. Motivation 2

Architectural Decision

Details of Architectural Decision

Alternatives

1. Alternative 1
2. Alternative 2
3. Alternative 3

Justification

1. Justification 1
2. Justification 2
3. Justification 3
4. Justification 4
5. Justification 5

Implications

1. Implication 1
2. Implication 2
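The eligibility rule above (every category must be present and contain details) can be sketched as a simple check. This is illustrative only and not part of the competition's actual submission process. The function name is hypothetical, and the sketch assumes each heading appears on its own line exactly as written in the template.

```python
from typing import Dict, List

REQUIRED_SECTIONS = [
    "Problem Statement", "Assumptions", "Constraints", "Motivation",
    "Architectural Decision", "Alternatives", "Justification", "Implications",
]


def missing_sections(submission: str) -> List[str]:
    """Return required headings that are absent or have no content below them."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw in submission.splitlines():
        line = raw.strip()
        if line in REQUIRED_SECTIONS:
            # A heading starts a new section; later lines belong to it.
            current = line
            sections.setdefault(current, [])
        elif line and current is not None:
            sections[current].append(line)
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]


if __name__ == "__main__":
    draft = "Problem Statement\n\nDisks exceed 1.9TB.\n\nAssumptions\n"
    print(missing_sections(draft))
```

An empty result means the submission has details under all eight headings. Anything else lists the categories that would make it ineligible.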

Example Architectural Decisions can be submitted via the following form.

Note: Limit of 3 submissions per person, per fortnight.

Winners will be announced on this blog and via Twitter on the following dates:

October 17th
October 31st
November 14th

Good Luck!

COMPETITION ENDED.