Taking vSphere to the Next Level with Converged Infrastructure – vForum2013


At this month's vForum in Sydney, Australia, I will be presenting “Taking vSphere to the Next Level with Converged Infrastructure”.

The session discusses where the virtualization market is today, the problems that exist on the journey to 100% virtual, and how converged infrastructure helps solve these challenges, allowing vSphere architects and administrators to take their virtual environments to the next level.

The session is scheduled for Day 2 of the conference (October 22nd) at 11AM in the Parkside 110A room.

I invite everyone who will be at vForum Sydney to attend the session. For those of you who cannot make the event, the presentation will be posted on this blog along with follow-up articles covering any questions from the session.


Competition Example Architectural Decision Entry 4 – vCloud Allocation Pool Usable Memory

Name: Prasenjit Sarkar
Title: Senior Member of Technical Staff
Company: VMware
Twitter: @stretchcloud
Profile: VCAP-DCD4/5,VCAP-DCA4/5,VCAP-CIA,vExpert 2012/2013

Problem Statement

When using an Allocation Pool with a 100% memory reservation, the per-VM memory overhead means the usable memory is less than what users expect. What is the best way to ensure users can consume the entire memory assigned to the Allocation Pool?

Assumptions

1. vCD 5.1.2 is in use

2. vSphere 5.1 or later is in use

3. The Org VDC is created with the Allocation Pool model

Constraints

1. vCD 5.1.2 has to be used

2. Only Org VDCs using the Allocation Pool model are affected

Motivation

1. Need to be able to use 100% of the memory allocated to a VDC using the Allocation Pool model

2. Optimal use of the memory assigned to the Org VDC and, in turn, to its VMs

Architectural Decision

Because VM memory overhead is a “by design” behaviour, the entire allocated memory cannot be consumed. This will be addressed by enabling the Elastic Allocation Pool option at the vCloud system level and then setting a lower vCPU speed value (260 MHz). This allows VMs to use the entire memory allocated (with a 100% guarantee) to the Org VDC.

Alternatives

1. Over allocate resources to the customer but only reserve the amount they purchased.

Historically, VM memory overhead ranges between <=5% and 20%. Most configurations have an overhead of less than 5%; if you assume that, you could over-allocate resources by 5% but only reserve ~95%. The effect would be that the customer could consume up to the amount of vRAM they purchased, and if they created VMs with low overhead (high vRAM allocations, low vCPU counts) they could actually consume slightly more than they “purchased”. In the case of a 20GHz/20GB purchase, we would have to set the memory allocation to 21GB but set the reservation to ~95%.
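
As a rough illustration of that arithmetic (the 5% overhead assumption and the 20GHz/20GB purchase are the figures from the example above; the helper below is illustrative only and not part of any vCloud Director API):

```python
# Illustrative arithmetic for the over-allocate/under-reserve alternative.
# The 5% assumed overhead and 20GB vRAM purchase come from the example above.

def allocation_for_purchase(purchased_gb, overhead_fraction=0.05):
    """Return (allocation_gb, reservation_pct) so the reserved amount
    still equals what the customer actually purchased."""
    allocation_gb = purchased_gb * (1 + overhead_fraction)  # 20GB -> 21GB
    reservation_pct = purchased_gb / allocation_gb * 100    # ~95.2%
    return allocation_gb, reservation_pct

alloc, res_pct = allocation_for_purchase(20)
print(f"Allocate {alloc:.0f}GB with a {res_pct:.1f}% reservation "
      f"-> {alloc * res_pct / 100:.1f}GB reserved")
# Allocate 21GB with a 95.2% reservation -> 20.0GB reserved
```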

Justification

VM memory overhead depends on many moving targets, such as the model of the CPU in the ESXi host the VM will run on and whether 3D is enabled for the MKS device, so the overhead cannot be predicted exactly and the entire allocated memory can never be consumed at any point in time.

By selecting the elastic VDC, we override this behaviour while still not allowing more VMs to power on than the customer is entitled to. An elastic VDC also gives us the opportunity to set a custom vCPU speed, and lowering the vCPU speed allows more vCPUs to be deployed without being penalised. Without this setting, vCPUs cannot be overcommitted, which is highly undesirable.

260MHz is the lowest vCPU speed that can be set, and it has been chosen to allow system administrators to overcommit vCPUs in a VDC using the Allocation Pool model.
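
As a rough illustration (borrowing the 20GHz CPU allocation from the purchase example above, and assuming, as the justification describes, that the number of deployable vCPUs scales approximately with the CPU allocation divided by the configured vCPU speed):

```python
# Illustrative only: assumes deployable vCPUs ~= CPU allocation / vCPU speed,
# which is the behaviour described in the justification above.
cpu_allocation_mhz = 20_000  # 20GHz Org VDC CPU allocation (example figure)

for vcpu_speed_mhz in (1000, 260):
    print(f"vCPU speed {vcpu_speed_mhz}MHz -> ~{cpu_allocation_mhz // vcpu_speed_mhz} vCPUs")
# vCPU speed 1000MHz -> ~20 vCPUs
# vCPU speed 260MHz -> ~76 vCPUs
```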


Implications

1. One caveat is that individual VMs will not have any memory reservation. Unlike the Reservation Pool model, this Org VDC model does not allow an Org Admin to set per-VM resource reservations, so any VM in an elastic VDC will have no reservation of its own, which is a significant concern for the customer's high-I/O VMs (such as database or mail servers).

The resource reservation can easily be overridden directly in vSphere, but that is not the intent. Hence we flag this as a RISK, as it is likely to hamper the performance of the customer's VMs.

Even though 100% of the memory is reserved at the pool level, and the VMs cannot oversubscribe memory because the limit is still what the customer has bought, there is no option to prefer the resource-hungry VMs if memory contention occurs among them. In a nutshell, all of the VMs receive equal shares.

Equal shares distribute the resources of a resource pool evenly, so there is no guarantee that a hungry VM can obtain more resources on demand.
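
A minimal sketch of what equal shares mean under contention (the VM names, share values and 20GB pool size are hypothetical, for illustration only):

```python
# Hypothetical illustration: with equal shares, every VM in the pool gets the
# same entitlement under contention, regardless of how memory-hungry it is.

def entitlements(pool_memory_gb, vm_shares):
    """Split pool memory proportionally to shares (all equal in this example)."""
    total = sum(vm_shares.values())
    return {vm: pool_memory_gb * share / total for vm, share in vm_shares.items()}

print(entitlements(20, {"db01": 1000, "mail01": 1000, "app01": 1000, "app02": 1000}))
# {'db01': 5.0, 'mail01': 5.0, 'app01': 5.0, 'app02': 5.0}
# The DB and mail servers cannot be favoured without per-VM reservations or custom shares.
```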

Back to Competition Main Page or Competition Submissions

Competition Example Architectural Decision Entry 2 – Use of RDMs in Standard IaaS Clusters

Name: Chris Jones
Title: Virtualization Architect
Twitter: @cpjones44
Profile: VCP5 / VCAP5-DCD

Problem Statement

VMs require more than 1.9TB in a single disk. The existing virtual environment has LUNs provisioned that are 2TB in size. Because these VMs have virtual data disks (VMDKs) larger than 1.9TB, datastore-usage alarms are being triggered by the infrastructure monitoring solution, raising Incident tickets to the Virtual Infrastructure support queue.
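
For context, the arithmetic behind the alarm (the 90% figure is the global monitoring threshold referenced later in this entry; the snippet is illustrative only):

```python
# Why a >1.9TB VMDK on a 2TB datastore breaches a 90% usage threshold.
vmdk_tb = 1.9        # data disk size from the problem statement
datastore_tb = 2.0   # provisioned LUN/datastore size
threshold_pct = 90   # global monitoring threshold referenced in this entry

usage_pct = vmdk_tb / datastore_tb * 100
print(f"{usage_pct:.0f}% used vs {threshold_pct}% threshold -> alarm: {usage_pct > threshold_pct}")
# 95% used vs 90% threshold -> alarm: True
```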

Assumptions

1. Data within the operating system instance (OSI) must reside within the VM and not on some kind of IP based store (like a NAS share).

2. vSphere datastores are presented through FC and not IP based stores (ie. NFS).

3. vSphere Hypervisor is ESXi 4.1.

4. There is no requirement for the VMs to be performing SAN specific functionality or running SCSI target-based software.

Constraints

1. The implemented monitoring solution cannot be customised with triggers and monitoring policies for individual objects within the environment (ie. having one monitoring policy per individual or sub-group of datastores).

2. Maximum vSphere datastore size in version 4.1 is 2TB minus 512 bytes.

3. Unable to upgrade beyond ESXi 4.1 Update 3.

Motivation

1. Reduce the number of incident tickets being raised, thus improving SLA posture.

2. Reduce the requirement to span single Windows logical volumes across multiple VMDKs.

Architectural Decision

Turn the disk into an RDM (Virtual Compatibility Mode) so the disk's capacity is no longer monitored as datastore usage at the vSphere layer.
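
If this were to be scripted, a minimal pyVmomi sketch might look like the following. The vCenter host, credentials, VM name, NAA device path, controller key and unit number are all placeholders, and the snippet is not part of the original decision; it would need to be validated against the ESXi 4.1 / vCenter environment and pyVmomi version in use.

```python
# Hypothetical pyVmomi sketch: present an existing LUN to a VM as an RDM
# in virtual compatibility mode. All names and identifiers are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local",          # placeholder vCenter
                  user="administrator", pwd="********")  # may need an SSL context on newer pyVmomi
content = si.RetrieveContent()
vm = content.searchIndex.FindByDnsName(dnsName="bigdata-vm01", vmSearch=True)  # hypothetical VM

# RDM backing in virtual compatibility mode, as per the decision above.
backing = vim.vm.device.VirtualDisk.RawDiskMappingVer1BackingInfo(
    deviceName="/vmfs/devices/disks/naa.60a98000486e2f64",  # placeholder NAA path
    compatibilityMode="virtualMode",
    diskMode="persistent",
    fileName="")  # mapping file is created alongside the VM

disk = vim.vm.device.VirtualDisk(
    backing=backing,
    controllerKey=1000,  # first SCSI controller; verify on the actual VM
    unitNumber=1,        # a free device node on that controller
    key=-1)              # negative key = new device; capacity is derived from the LUN

spec = vim.vm.ConfigSpec(deviceChange=[
    vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=disk)])

vm.ReconfigVM_Task(spec=spec)  # returns a Task; wait on it in real use
Disconnect(si)
```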

Alternatives

1. Create smaller VMDKs (ie. 1-1.5TB disks) and build a RAID0 volume across them within the guest OS.

2. Change the level of alerting so that tickets are not raised for alerts that trigger beyond 90%.

3. Turn the disk into an RDM to remove the level of monitoring from the vSphere layer.

4. Thin provision the virtual disks.

5. Store the data within the guest on some kind of IP based storage (NAS/iSCSI target).

Justification

1. Option 5 goes against the assumption that data must be local to the VM, so it was ruled out.

2. Whilst thin provisioning (Option 4) is an attractive solution, it is ruled out based on a wider infrastructure decision to thick provision all disks in the environment, reducing the risk of datastores filling up and critical business VMs stopping.

3. Option 1, smaller VMDKs spread across multiple vSphere datastores, would make these alerts disappear; however, it creates issues when trying to execute a DR recovery of either the individual disks (Active/Passive) or the whole VM (Active/Cold). If just one VMDK is not replicated, or the VMDKs are mounted in the wrong order, the whole Windows volume will be corrupted. Spanning one Windows volume across multiple VMDKs also complicates the recovery of array-based snapshot backups (eg. via SMVI or NetBackup).

4. Option 2 goes against the constraint that the infrastructure monitoring solution cannot create individual alerting policies for a single datastore or a sub-group of datastores in the inventory. Should individualised policies be created, we would need to ensure that the affected VMDKs consuming 90-95% of a datastore remain on that datastore, as moving from one datastore to another (ie. from Tier 2 to Tier 1) would require a change to the configured monitoring. At this stage, the monitoring solution has no way to track such customised policies, which is largely why global, environment-wide policies exist.

5. Option 3, the use of RDMs in Virtual Compatibility Mode, allows the VM to benefit from the features of VMFS, such as advanced file locking for data protection and vSphere snapshots. RDMs in this mode also allow VMs to be managed by DRS (ie. they can be vMotion’ed) and protected by vSphere HA.

Implications

1. The RDM mappings will need to be recorded clearly to avoid the lengthy process of discovering from scratch which physical LUN is presented to which virtual machine.

An example of how to record these mappings (a scripted sketch follows after this list) is to:

A) Record the name of the VM that has the RDM.

B) Record the NAA number of the physical LUN(s) that are presented to the VM.

C) Record the virtual device node on the virtual disk controller where the RDM is mounted.

D) Record the Windows drive letter that this RDM is mounted to.

2. Additional paths will be consumed, reducing the total number of vSphere datastores that can be presented to the cluster.
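
Items A) to C) above can be gathered automatically; a minimal, hypothetical pyVmomi sketch is shown below. The connection details are placeholders, and the Windows drive letter in D) still has to be recorded from inside the guest.

```python
# Hypothetical pyVmomi sketch: list every RDM in the inventory with the VM name,
# backing LUN path (contains the naa. identifier) and virtual device node
# (items A to C above). Connection details are placeholders; the Windows drive
# letter (item D) still has to be captured from inside the guest OS.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator", pwd="********")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
for vm in view.view:
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualDisk) and isinstance(
                dev.backing, vim.vm.device.VirtualDisk.RawDiskMappingVer1BackingInfo):
            # SCSI controller keys start at 1000, so key - 1000 gives the bus number.
            print(f"VM: {vm.name}  LUN: {dev.backing.deviceName}  "
                  f"Node: SCSI({dev.controllerKey - 1000}:{dev.unitNumber})  "
                  f"Mode: {dev.backing.compatibilityMode}")

view.DestroyView()
Disconnect(si)
```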

Back to Competition Main Page or Competition Submissions