Competition Example Architectural Decision Entry 5 – New vSphere 5.x environment

Name: Anand Prakash
Title: Virtualization Specialist
Twitter: @ananprak32
Profile: VCA-WM, VCA-DCV, VCA-Cloud, VCP3, VCP-DCV 4/5

Problem Statement

This customer is new to virtualization and is moving to a VMware vSphere 5.x environment. Having managed a large physical estate, they want to carry forward the following settings and practices into the virtual environment:
1. Tiering of all disks allocated to virtual machines, using different tiers for different mount points inside the Windows guest so that OS, data and application logs can sit on different performance levels.
2. Dedicated LUNs for every VMDK assigned to a guest, possibly using RAID inside the guest to enhance performance.
3. Multiple virtual NICs per virtual machine for backup, management and data traffic.
4. Creating larger machines with more memory and CPU so that applications run faster. Virtual machines are sized for future workloads (a minimum of five years) in terms of memory, CPU and storage.
5. Using traditional DR/backup methods in the virtual environment, with a similar dedicated environment at the DR site. The DR servers run all the time so that they can be used in the event of a disaster.
6. The environment has been designed and implemented to meet the above requirements. They are now experiencing multiple performance issues and outages, and the engineers spend most of their time troubleshooting and reworking the environment.
7. Despite the significant investment, they have hit the limit of consolidation. Some legacy servers whose hardware warranty has expired cannot be virtualized; the customer does not want to extend the warranty on these servers, and it is difficult to find spares or upgrade them.

Assumptions

1. The environment is running VMware vSphere 5.0 (VMware vCenter 5.0 and ESXi 5.0).
2. There are 2 sites with 12 servers of the following specification:
HP ProLiant SL210t Gen8 servers with 512GB of memory, 4 x 12-core CPUs and 2 x embedded Intel 1GbE NICs on board.
3. Clusters are made up of 4 hosts each: there are 2 clusters at the primary site (1 Production and 1 Non-Production), and the cluster at the secondary site is dedicated to DR and is not running any workload.
4. They are using a Symmetrix VMAX storage array with a VPLEX solution.
5. Replication technologies are in place to replicate data to the secondary site.
6. Of the two 1GbE NICs, one is dedicated to vMotion and Management and the other is dedicated to virtual machine traffic.
7. The network is capable of supporting 10GbE NICs.
8. The Non-Production environment is not required to run during a DR event.

Constraints

1. The organisation wants to remediate these issues with minimal investment and hardware purchases.
2. The Production and Non-Production environments should be physically separate and should not communicate with each other.

Motivation

1. The environment should be easy to manage and should not require excessive time and effort to support.
2. The environment should provide a higher consolidation ratio and better performance.
3. The configuration should be standardized across the environment and flexible to changing needs.
4. More network redundancy against failures, plus load balancing.
5. The design should deliver customer satisfaction and cost savings.
6. Provide better options to manage capacity.

Architectural Decision

1. Use Automated Storage Tiering (AST) at the storage array. Policies can be set at the storage level to keep infrequently used data on slower, less expensive SATA storage and automatically move it to higher-performing SAS or solid-state drives (SSDs) as it becomes more active. Different groups can be created based on business need (Bronze/Silver/Gold and Replicated/Non-Replicated). Create a Storage DRS cluster and simply deploy virtual machines to it.
Note: Make sure you uncheck “Enable I/O metric for SDRS recommendations” when using Automated Storage Tiering (AST) at the storage array (a scripted example follows this list).
2. Use large 4TB LUNs instead of a small LUN per VMDK. Move virtual machines to the appropriate SDRS group and use a single disk from the datastore to create each drive.
3. Use a single NIC per virtual machine and a network-less backup method such as VMware vSphere Data Recovery. No dedicated management or backup NICs.
4. Deploy VMware vCenter Operations in the environment, analyse the whole environment and right-size the virtual machines based on their active usage as reported by vCenter Operations Manager (see the right-sizing sketch after this list). If required, use VMware Converter to right-size disks (it is easy to grow a disk, but shrinking an allocation requires rework).
5. Buy two additional dual-port 10GbE NICs and install them in all 12 servers. Then create two additional virtual switches on each ESXi host, connect one uplink of each 10GbE NIC to the Production network and the other uplink to the Non-Production network, and keep both onboard uplinks connected to the Management and vMotion VMkernel interfaces. Use ESXi host profiles to standardize this configuration.
Once the above configuration is applied to the ESXi hosts, both Production and Non-Production workloads can run on the same hosts. Instead of using backup/restore or per-disk replication, replicate the large datastores and move the guests that require DR onto them. The Non-Production workload will run on the cluster at the DR site, and VMware Site Recovery Manager can be used to configure a DR plan that shuts down the Non-Production guests at the DR site and powers on the Production guests.
Merge the Production and Non-Production clusters into a single cluster at the primary site, and set higher priority/reservations for the Production guests.
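For decision 1, disabling the SDRS I/O metric can be scripted as well as done in the vSphere Client. Below is a minimal pyVmomi sketch under stated assumptions: the datastore cluster name "Gold-DSC", the vCenter address and the credentials are placeholders, and the spec property names should be verified against your SDK version.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Find the datastore cluster (StoragePod) that sits on the auto-tiered array.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == "Gold-DSC")
view.Destroy()

# Keep SDRS enabled for initial placement and space balancing, but turn off the
# I/O metric so SDRS does not fight the array's Automated Storage Tiering.
spec = vim.storageDrs.ConfigSpec(
    podConfigSpec=vim.storageDrs.PodConfigSpec(enabled=True,
                                               ioLoadBalanceEnabled=False))
content.storageResourceManager.ConfigureStorageDrsForPod_Task(pod=pod,
                                                              spec=spec,
                                                              modify=True)
Disconnect(si)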
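For decision 4, once vCenter Operations has identified an oversized guest, the reconfiguration itself is a single vSphere API call. This is a hedged pyVmomi sketch: the VM name "app01" and the target sizes are illustrative, and the guest normally has to be powered off before CPU or memory can be reduced.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Locate the virtual machine flagged as oversized by the analysis.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "app01")
view.Destroy()

# Shrink the allocation to what the workload actually uses.
spec = vim.vm.ConfigSpec(numCPUs=2, memoryMB=8192)
WaitForTask(vm.ReconfigVM_Task(spec=spec))
Disconnect(si)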

Alternatives

1. Use vSphere Storage Profiles to maintain tiers, or use Storage I/O Control.
2. Leave the current configuration in place.
3. Leave the current configuration in place.
4. Ask the customer to use VMware Capacity Planner to review the environment.
5. Leave the current configuration in place.

Justification

1. You can use vSphere Storage Profiles and SIOC in combination, but maintaining storage profiles after implementation is a tedious task. If you do not use storage profiles you will lose the current configuration and location of the disk tiers (during Storage vMotion, if the advanced option is not used). Virtual machines built with per-disk tiers are difficult to Storage vMotion and recover: when you take a snapshot of such a machine, all I/O goes to the working directory and the naming of the snapshot files and base disks becomes non-standard, so if the virtual machine crashes while a snapshot exists (and the configuration is lost) it will not be easy to recover. Even if you dedicate the Gold tier (SSD disks) to an application, you will not know whether it is fully utilized.
2. If you use a dedicated LUN per disk you will hit the limits of 256 LUNs and 1024 paths per host. Every new datastore also loses some free space to VMFS metadata, and a lot of free space ends up stranded and wasted. If you do not maintain free space, vCenter fills with alerts and it becomes difficult to distinguish genuine alerts from false ones, so you are limited in how many virtual machines you can create. Creating RAID inside the guest can also negate the benefit of the RAID already provided by the storage array.
3. With network-less backup you do not need a dedicated backup NIC, and because the traffic of all virtual NICs travels over the same uplinks, there is no point in having separate virtual NICs for management and data. If there are data traffic issues, which is unlikely on a 10GbE uplink, the virtual machine can still be managed through the console. This matters when using the Nexus 1000v with its limited number of ports: you could hit the port limit well before you hit the memory or CPU limits.
4. VMware Capacity Planner can help once, but continuous capacity planning is needed, so vCenter Operations Manager is the better option. Virtual machines are not aware they are running on virtual hardware; they assume all the resources allocated to them are theirs alone and try to use them, while in reality those resources are shared (so the machines that really need resources suffer). In a virtual environment it is very easy to increase capacity, so unlike physical servers we should provision only for the current workload.
5. With the current configuration there is no redundancy for any of the networks, and although the network is capable of 10GbE, the onboard NICs cannot take advantage of it. The current configuration also requires dedicated hosts for the Production and Non-Production networks. The new setup standardizes the ESXi hosts and makes every host usable for all guests (Production and Non-Production). It also provides the flexibility to group guests of the same family (Windows/Linux) on the same hosts, where the same type of guest OS can make better use of memory Transparent Page Sharing (TPS), and if similar guest OSs are placed on the same datastore the back-end storage can make use of de-duplication and save a lot of capacity.

Implications

1. Simple, easy management of virtual machines: the VMware administrator does not need to spread a machine over multiple datastores and can use a simple Storage vMotion to move virtual machines. There is no need to maintain documentation or periodically check disk-placement compliance. The storage array manages load effectively by monitoring current usage, ensuring all tiers are used efficiently and on demand. Whenever space runs low on the datastore cluster, add a new datastore to the cluster and rebalancing is handled automatically.
2. More free space on the datastores: pooling the free space of many machines results in greater savings and removes the vCenter free-space alerts. More virtual machines can be deployed with increased efficiency, and the environment retains spare capacity to deploy more storage or larger-disk machines if required.
3. Faster backups that do not clog the network, and cost savings because traditional agents are no longer needed to back up individual virtual machines. Memory and CPU utilization on the ESXi hosts is reduced, more ports are available to deploy additional virtual machines, and the consolidation ratio and effective use of physical resources increase, which in turn helps recover the initial hardware cost sooner.
4. Based on the analysis, memory, CPU and storage can be reclaimed from oversized virtual machines, providing additional resources for other virtual machines and reducing CPU wait times. This also allows the business to virtualize more servers and retire old hardware that cannot be upgraded, generating further savings that can be used to buy the 10GbE NICs.
5. With hosts that have plenty of memory and CPU, the network can become the bottleneck; deploying the additional 10GbE NICs removes it. This provides better performance, higher consolidation and more efficient use of the hardware, and makes it possible to use the hardware that was dedicated to DR and was only consuming power and space. This results in further cost savings and higher customer satisfaction.


Competition Example Architectural Decision Entry 4 – vCloud Allocation Pool Usable Memory

Name: Prasenjit Sarkar
Title: Senior Member of Technical Staff
Company: VMware
Twitter: @stretchcloud
Profile: VCAP-DCD4/5, VCAP-DCA4/5, VCAP-CIA, vExpert 2012/2013

Problem Statement

When using an Allocation Pool with 100% memory reservation, the VM memory overhead means the usable memory is less than users expect. What is the best way to ensure users can use the entire memory assigned to the Allocation Pool?

Assumptions

1. vCD 5.1.2 is in use

2. vSphere 5.1 or later is in use

3. Org VDC created with Allocation Pool

Constraints

1. vCD 5.1.2 has to be used

2. Only VDCs using the Allocation Pool model are affected

Motivation

1. Need to use 100% of the memory allocated to the VDC with the Allocation Pool model

2. Optimal use of memory assigned to the Org VDC and then to the VMs

Architectural Decision

Because VM memory overhead is a “by design” fact, the entire allocated memory cannot be consumed. This will be addressed by enabling the Elastic Allocation Pool at the vCloud system level and then setting a lower vCPU speed value (260 MHz). This will allow VMs to use the entire allocated memory (100% guaranteed) in the Org VDC.

Alternatives

1. Over-allocate resources to the customer but only reserve the amount they purchased.

Historically, VM overhead ranges from under 5% up to around 20%. Most configurations have an overhead of less than 5%; if you assume this, you could over-allocate resources by 5% but only reserve ~95%. The effect is that the customer could consume up to the amount of vRAM they purchased, and if they created VMs with low overhead (high vRAM allocations, low vCPU counts) they could actually consume more than they “purchased”. In the case of a 20GHz/20GB purchase we would have to set the allocation to 21GHz/21GB but set the reservation to 95% (see the worked example below).
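The arithmetic behind the 20GB case can be sketched as follows. This is illustrative arithmetic only (not vCloud Director API calls), assuming the ~5% worst-case overhead mentioned above:

# Illustrative arithmetic for the over-allocate-and-reserve-95% alternative.
purchased_gb = 20.0                             # vRAM the customer purchased
overhead = 0.05                                 # assumed worst-case VM memory overhead (~5%)

allocation_gb = purchased_gb * (1 + overhead)   # 21.0 GB set as the Org VDC allocation
reservation = 0.95                              # reserve 95% of the allocation
reserved_gb = allocation_gb * reservation       # 19.95 GB, roughly the 20 GB purchased

print(f"Allocate {allocation_gb:.1f} GB, reserve {reservation:.0%} "
      f"-> {reserved_gb:.2f} GB reserved vs {purchased_gb:.0f} GB purchased")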

Justification

VM memory overhead depends on many moving targets, such as the CPU model of the ESXi host the VM runs on and whether 3D is enabled for the MKS, so you can never consume the entire allocated memory.

By selecting an Elastic VDC we override this behaviour while still not allowing more VMs to power on than the customer is entitled to. An Elastic VDC also gives us the opportunity to set a custom vCPU speed, and lowering the vCPU speed allows you to deploy more vCPUs without being penalised. Without this flag you cannot overcommit vCPUs, which is a significant limitation.

260 MHz is the lowest vCPU speed that can be set, and it has therefore been chosen to allow system administrators to overcommit vCPUs in a VDC with the Allocation Pool model.

 

Implications

1. One caveat is that individual VMs get no memory reservation. By design, an Org VDC does not allow an Org Admin to set per-VM resource reservations (unlike a Reservation Pool), so with elasticity enabled no VM carries a reservation, which is a real concern for the customer's high-I/O VMs (such as database or mail servers).

You could easily override the resource reservation from vSphere, but that is not the intent. Hence this is flagged as a RISK, as it will certainly hamper the performance of the customer's VMs.

Even though 100% of the memory is reserved at the pool level and the VMs cannot oversubscribe beyond what the customer has bought, if there is memory contention among those VMs there is no way to prefer the VMs that are resource hungry. In a nutshell, all of the VMs get equal shares.

Equal shares distribute the resources of a resource pool evenly, so there is no guarantee that a hungry VM can get more resources on demand.


Competition Example Architectural Decision Entry 3 – Scalable network architecture for VXLAN

Name: Prasenjit Sarkar
Title: Senior Member of Technical Staff
Company: VMware
Twitter: @stretchcloud
Profile: VCAP-DCD4/5, VCAP-DCA4/5, VCAP-CIA, vExpert 2012/2013

Problem Statement

You are moving towards a scalable network architecture for your large-scale virtualized datacenter and want to configure VXLAN in your environment. You want to make sure the teaming policy for the VXLAN transport is configured optimally for better performance and reduced operational complexity.

Assumptions

1. vSphere 5.1 or greater
2. vCloud Networking & Security 5.1 or greater
3. Core & Edge Network topology is in place

Constraints

1. Switches must support Static Etherchannel or LACP (dynamic Etherchannel)
2. Only the IP Hash load balancing method can be used if using vSphere 5.1
3. Beacon Probing cannot be used as the failure detection mechanism

Motivation

1. Optimize performance for VXLAN

2. Reduce complexity where possible

3. Choose the best teaming policy for VXLAN traffic with future scalability in mind

Architectural Decision

LACP Passive Mode will be chosen as the teaming policy for the VXLAN transport.

Two or more physical links will be aggregated using LACP on the upstream Edge switches.

The two Edge switches will be connected to each other.

Each ESXi host will be cross-connected to these two physical upstream switches to form a LACP group.

LACP will be configured in Passive mode on the Edge switches so that the participating ports respond to the LACP packets they receive but do not initiate LACP negotiation.
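On the vSphere side this decision translates into LACP enabled on the vDS uplink port group (in active mode, so the hosts initiate negotiation towards the passive switch ports) and, on vSphere 5.1, IP Hash teaming on the VXLAN transport port group. The pyVmomi sketch below is a hedged illustration only: the port group names are placeholders and the 5.1-era property names (lacpPolicy, uplinkTeamingPolicy) should be checked against your SDK version.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

def find_pg(name):
    # Helper: locate a distributed port group by name (placeholder names below).
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
    pg = next(p for p in view.view if p.name == name)
    view.Destroy()
    return pg

def reconfig(pg, port_setting):
    # Push a new default port configuration to the port group.
    spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
        configVersion=pg.config.configVersion, defaultPortConfig=port_setting)
    WaitForTask(pg.ReconfigureDVPortgroup_Task(spec=spec))

# 1. Enable LACP on the uplink port group; active on the host side because the
#    Edge switches run passive and one end must initiate negotiation.
uplink_setting = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
    lacpPolicy=vim.dvs.VmwareDistributedVirtualSwitch.UplinkLacpPolicy(
        enable=vim.BoolPolicy(value=True),
        mode=vim.StringPolicy(value="active")))
reconfig(find_pg("dvSwitch-DVUplinks"), uplink_setting)

# 2. Force IP Hash load balancing on the VXLAN transport port group (the only
#    algorithm supported with LACP on vSphere 5.1).
vxlan_setting = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
    uplinkTeamingPolicy=vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(
        policy=vim.StringPolicy(value="loadbalance_ip")))
reconfig(find_pg("dvPG-VXLAN-Transport"), vxlan_setting)

Disconnect(si)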

Alternatives

1. Use LACP Active Mode and make sure you are using the IP Hash load balancing algorithm in your vDS if using vSphere 5.1.

2. Use LACP Active Mode and use any of the 22 available load balancing algorithms in your vDS if using vSphere 5.5.

3. Use LACP Active Mode with the Cisco Nexus 1000v virtual switch and any of its 19 available load balancing algorithms.

4. Use Static Etherchannel and make sure you are using the IP Hash algorithm *only* in your vDS.

5. If using Failover, have at least one 10GbE NIC to handle the VXLAN traffic.

Justification

1. The Failover teaming policy for the VXLAN VMkernel NIC uses only one uplink for all VXLAN traffic. Although redundancy is available via the standby link, the available bandwidth is not fully used.
2. Static Etherchannel requires IP Hash load balancing to be configured on the switching infrastructure, which uses a hashing algorithm based on source and destination IP addresses to determine which host uplink egress traffic is routed through (a simplified illustration follows this list).

3. Static Etherchannel with IP Hash load balancing is technically complex to implement and has a number of prerequisites and limitations; for example, you cannot use Beacon Probing and you cannot configure standby or unused links.

4. Static Etherchannel does not pre-check both terminating ends before forming the channel group, so if there are issues between the two ends traffic will never pass and vSphere will not see any acknowledgement back on its Distributed Switch.

5. Active LACP mode places a port into an active negotiating state, in which the port initiates negotiation with other ports by sending LACP packets. On vSphere versions prior to 5.5, where only the IP Hash algorithm is supported, LACP will not pass any traffic if vSphere uses any algorithm other than IP Hash (such as Virtual Port ID).

6. Operational complexity is reduced.

7. If using vSphere 5.5, 22 different load balancing algorithms are available and Beacon Probing can be used for failure detection.
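As an aside on point 2, the idea behind IP Hash uplink selection can be shown in a few lines. This is a simplified conceptual model (hash the source and destination IPs, take the result modulo the number of active uplinks), not ESXi's exact implementation:

# Conceptual illustration of IP-hash uplink selection -- not ESXi's exact hash.
import ipaddress

def pick_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    # The same source/destination pair always lands on the same uplink, which
    # is why the upstream switch must treat the uplinks as one logical channel.
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count

print(pick_uplink("192.168.10.21", "192.168.20.15", 2))  # uplink 0 or 1
print(pick_uplink("192.168.10.21", "192.168.20.16", 2))  # may differ per flow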

Implications

1. Initial setup has a small amount of additional complexity; however, this is a one-time task (set and forget).

2. Only the IP Hash algorithm is supported if using vSphere 5.1.

3. Only one LAG is supported per vSphere Distributed Switch if using vSphere 5.1.

4. Unless the IP Hash calculation is worked out manually for each VM's vNIC and the physical NICs, there is no guarantee that traffic will be balanced evenly across the physical links.
