VM Right Sizing – An example of the benefits

I thought this example may be useful to show the benefits of Right sizing a virtual machine.

The VM is an SQL Database server with 4 vCPUs on a cluster which is highly overcommitted with lots of oversized VMs.

As we can see in the graph below, CPU ready was averaging around 10%, and on the 24th of July most vCPUs spiked to greater than 30% CPU ready each, i.e. 30% of the time the server was waiting to be scheduled onto the pCPU cores.

The performance of applications using databases hosted on the server was suffering serious issues during this time.

On the 24th the VM was dropped from 4 vCPUs down to 2 vCPUs, and the results are obvious.

CPU ready dropped immediately (even in a heavily over-committed environment) to around 1% and CPU utilization remained at around the same levels. Performance also improved for applications (for example vCenter) using the database server.

TIP: Right Sizing not only helps the VM you right size, but it helps relieve the contention on the ESXi host (and cluster), which will improve performance for all VMs.

It is also important to point out this VM is the first VM to be right sized, so as more VMs are right sized in the cluster, Ready time will drop further and performance will continue to improve.

This also results in opportunities for greater consolidation within the environment without compromising performance or redundancy.

I would like to point out that I believe this server may benefit from 4 vCPUs, but definitely not in this highly CPU contended environment.

As more virtual machines are Right Sized, this environment would likely have the opportunity to consider increasing vCPUs in suitable VMs after monitoring performance for a suitable period of time. Products like VMware vCenter Operations are excellent for reporting on oversized and undersized VMs.

Do you believe in right sizing now?

Common Mistake: Using CPU reservations to solve CPU Ready

One of the more common problems I see in virtual environments is oversized virtual machines, which typically result in lower performance and, you guessed it, high CPU Ready.

What is CPU Ready?

CPU ready is basically the time it takes a VM to be scheduled onto a physical core after it is placed in the CPU scheduling queue.

What is High CPU Ready?

In my opinion, during peak load, anything above 2% (or 400ms of a 20,000ms sample interval) is a concern and should be monitored. Above 5% will be impacting performance (resulting in lower CPU utilization), and 10% or more should be considered a serious problem and remediated immediately.
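As a rough sketch, the thresholds above can be written down as a small Python helper. The function names and the band labels ("ok", "concern", "impacting", "serious") are my own for illustration, and the 20,000ms default is the same sample interval used in the calculation later in this post.

```python
def classify_cpu_ready(ready_percent: float) -> str:
    """Map a per-vCPU CPU Ready percentage to the bands described in the post."""
    if ready_percent >= 10.0:
        return "serious"    # 10% or more: remediate immediately
    if ready_percent > 5.0:
        return "impacting"  # above 5%: performance is being affected
    if ready_percent > 2.0:
        return "concern"    # above 2%: should be monitored
    return "ok"


def percent_to_ms(ready_percent: float, interval_ms: float = 20000) -> float:
    """Convert a CPU Ready percentage into milliseconds of one sample interval."""
    return ready_percent / 100.0 * interval_ms
```

For example, `percent_to_ms(2.0)` gives the 400ms figure quoted above.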

Below is a screenshot showing CPU ready from a recent test I conducted in my home lab.

To calculate the percentage of CPU Ready, we divide the VM’s “Summation” value (in the screenshot above it’s the “W2K8 CPU TEST VM 1” line) by 20000 (ms), which is the statistics collection interval, then divide the result by the number of vCPUs in the VM.

So if we use the value from the “latest” column, it’s 7337 divided by 20000, which equals 0.36685. We then divide that by 2, as the VM has 2 vCPUs, and we end up with 0.183425.

That’s 18% CPU Ready, which basically means 18% of the time, the VM is not doing anything!
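The calculation above can be sketched in a few lines of Python. The function name and parameter names are hypothetical; the only inputs are the ones used in the post (the summation value in ms, the 20000ms interval and the vCPU count).

```python
def cpu_ready_percent(summation_ms: float, interval_ms: float = 20000, vcpus: int = 1) -> float:
    """Per-vCPU CPU Ready % from a summation value, as calculated in the post."""
    if interval_ms <= 0 or vcpus < 1:
        raise ValueError("interval_ms must be positive and vcpus must be at least 1")
    # summation / interval gives the fraction across all vCPUs; divide by vCPU count
    return summation_ms / interval_ms / vcpus * 100.0


# The worked example from the post: 7337ms, 20000ms interval, 2 vCPUs
print(round(cpu_ready_percent(7337, 20000, 2), 2))  # 18.34
```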

Note: CPU Ready % can be found using ESXTOP or RESXTOP via the vMA or on the ESXi host directly.

Now to try and diagnose the Performance/CPU ready issue, we need to work out if the VM is oversized and if so, Right Size the VM.

What is an Oversized VM?

Basically, a VM which has more compute resources assigned than it requires. For example, a VM which has 4 vCPUs and uses no more than 20% of its CPU.

What is Right Sizing?

In the above example, the VM is oversized as it doesn’t use more than 1 vCPU (or 25%) of the CPU resources, and therefore could be reduced to 1 vCPU and run at 80%.

So the VM is oversized and has High CPU Ready. What happens when we right size it from 4 vCPUs to 1 vCPU, and why does this help performance?

It’s pretty simple: the fewer vCPUs a VM has, the easier the job the CPU scheduler has to find enough physical cores to schedule the VM onto. If a cluster has a lot of oversized VMs, all the VMs are competing for the same physical cores, making it more and more difficult for the scheduler.
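The arithmetic behind the 4 vCPU example above can be sketched as follows. This is a simplified illustration only: the function name is hypothetical, it assumes peak utilization is reported as a percentage of all assigned vCPUs, and it ignores any headroom you may want to keep.

```python
import math
from typing import Tuple


def right_size(vcpus: int, peak_util_percent: float) -> Tuple[int, float]:
    """Return (suggested vCPU count, expected utilization % at that count)."""
    # Demand expressed in "vCPUs worth" of CPU, e.g. 4 vCPUs at 20% = 0.8
    demand = vcpus * peak_util_percent / 100.0
    suggested = max(1, math.ceil(demand))
    return suggested, demand / suggested * 100.0


# The example from the post: 4 vCPUs peaking at 20%
suggested, expected_util = right_size(4, 20.0)
print(suggested, round(expected_util))  # 1 80
```

In practice you would base the peak figure on monitoring over a suitable period, not a single sample.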

But what about setting a CPU Reservation? Don’t reservations “guarantee” resources?

The answer is, Yes and No.

The reservation “reserves” CPU resources measured in MHz, but this has nothing to do with the CPU scheduler.

So setting a reservation will help improve performance for the VM you set it on, but will not “solve” CPU ready issues caused by “oversized” VMs, or by too high an overcommitment ratio of CPU resources.

In my testing I set a reservation of 80% of the VM’s 2 vCPUs’ worth of MHz. Prior to setting the reservation CPU Ready was ~20%, and with the reservation in place it dropped to around 10%. Note: This test was performed with only 25% overcommitment – 5 vCPUs on 4 physical cores, using CPUBUSY to keep the CPUs running at 100% (measured within the guest by Windows Task Manager).

I then set a reservation of 100% of the VM’s 2 vCPUs’ worth of MHz. Prior to this change CPU Ready was ~10%, and it did not get below 2.5% even with the 100% reservation.

The result would have been considerably worse had I tested with 50% or 100% overcommitment, which is generally easily achieved with VMware and a well architected cluster. (I have seen well above these overcommitment numbers with no CPU ready issues.)
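For reference, the overcommitment percentages quoted in this test can be worked out as below. This is a minimal sketch with a hypothetical function name; it simply compares total assigned vCPUs to physical cores, which is how the 25% figure (5 vCPUs on 4 cores) is derived.

```python
def cpu_overcommit_percent(total_vcpus: int, physical_cores: int) -> float:
    """CPU overcommitment as a percentage above a 1:1 vCPU-to-core ratio."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return (total_vcpus / physical_cores - 1.0) * 100.0


print(cpu_overcommit_percent(5, 4))  # 25.0 (the test in this post)
print(cpu_overcommit_percent(8, 4))  # 100.0
```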

Reducing CPU Ready down to 2.5% may sound like a pretty good result, but when we look at the other 3 x 1vCPU VMs on the host (4 core test ESXi 5 host) they had CPU ready of 40%!! Not to mention 2.5% is still not good!

If you have poor performance, and you discover you have High CPU Ready, the best solution is to Right Size Your VMs!

I have recommended exactly that countless times, and customers never believe that performance can increase with fewer vCPUs until after the Right Sizing exercise.

If after Right Sizing you still have CPU Ready, your overcommitment on CPU is simply too high for the workloads within your cluster.

You can address this by:

1. Adding additional compute to the cluster. (Duh!)

2. Using Affinity rules to locate complementary workloads together (lots of small 1 vCPU VMs which don’t have high CPU utilization will generally work well with a limited number of higher vCPU VMs)

3. Using Anti-Affinity rules to separate non-complementary workloads (e.g. don’t place all your 8 vCPU VMs on one host with 300% overcommitment on CPU and expect them to work well).

4. Scaling out (not up) your VMs, i.e. don’t have one 8 vCPU SQL DB server, use 4 smaller 2 vCPU VMs

So now you know better than to use reservations to solve CPU contention.

It’s time to go Right Sizing!

This simple task is about the best bang for buck you will get in your data center since virtualizing on VMware in the first place.

The VCDX Application Process

I was asked by a person interested in attempting the VCDX if I could share my VCDX application / design. Unfortunately, as my application was based on an internal IBM project, it is strictly commercial in confidence.

However, I don’t think this is a huge problem as I can share my experience to assist potential candidates with their applications.

The VCDX Certification Handbook and Application has several sections. This post focuses on section 4.5 “Design Deliverable Documentation” and specifically “A. Architectural design”.

Below is a screen shot of this section.

A piece of advice I shared in my post “The VCDX Journey” was that everything in your design is fair game for the VCDX panel to question you about. So for example, if your design includes Site Recovery Manager or vCloud Director, expect to answer questions about how your design caters for these products.

With that in mind, here are my tips.

Tip # 1 – Your design does not have to be perfect!

Don’t make the mistake of thinking you need to submit a design which follows every single “Best practice”, as this is very rare in reality. “Best Practice” is really a concept for VCPs and to a lesser extent VCAPs. A VCDX should be at a level of expertise to develop best practices, rather than just follow them.

Keep in mind that, regardless of the architectural decisions themselves, you need to be able to justify them and align them to your “Requirements”, “Constraints” & “Assumptions” in both your documentation and the VCDX defense panel itself.

So you may have been forced, due to a “Constraint”, to do something which is not best practice and that you wouldn’t recommend. This is not a problem for the VCDX application, but be sure to fully understand the constraint and document in detail why the decision you made was the best option.

My design did not follow all best practices, nor was it the fastest or most highly available solution I could have designed. Ensure you’re aware of things which you could have done better, or could have changed if you did not have certain constraints, and document the alternatives.

I would suggest a design which complied with all best practices could be harder to defend than one which had a lot of constraints preventing the use of best practices. As a candidate attempting to demonstrate your “Expert” level knowledge, working around constraints to meet your customer’s requirements gives you a better opportunity to show your thinking outside the square. This goes for your documentation as well as the panel itself.

Tip # 2 – Don’t just fill out the VMware Solution Enablement Toolkit (SETs) template!

If you’re a VMware Partner, you will likely have access to VMware SETs. These are great resources which make doing designs easier (especially for people new to VMware architecture), however they are templates, and anyone can fill out a template. As a VCDX applicant, you should be showing your “Expert” level knowledge / experience and innovation.

I personally have created my own template, which is a collaboration of numerous resources, including the SETs, but also has a lot of work I have created myself.

In my template I have a lot more detail than what can be found in the “SET” templates, and this I felt really assisted me in demonstrating my expert level knowledge.

For example, I have a dedicated section for Architectural Decisions (ADs), and I had around 25 ADs in the design I submitted for VCDX. These covered not just specific VMware options, but storage, backup, network etc., as these are all critical parts of a VMware solution. I could have documented a lot more, but I ran out of time.

Tip # 3 – Document all your Requirements / Constraints / Assumptions and reference them.

Throughout your design, and especially your Architectural decisions, you should refer back to your Requirements, Constraints and Assumptions.

Doing this properly will assist the VCDX panel members who review your design to understand the solution. If the design document doesn’t give the reader a clear understanding of the solution, then I would be surprised if you were invited to defend.

During the VCDX defense, you should talk to how you designed to meet the Requirements and how the constraints impacted your design. You should also call out any assumptions, and discuss what risks or impacts these assumptions may have. This will be a huge help in your VCDX defense, so ensuring you have documented the ADs well for your application is a big step towards your application being accepted.

Tip # 4 – Have your design peer reviewed

Where possible I always have my work reviewed by colleagues. Even VCAPs & VCDXs make mistakes, so ensure you have your work reviewed. This is an excellent way to make sure your design makes sense and is complete.

I touched on this in Tip # 3, but make sure a person with zero knowledge of the solution can read your design and understand the solution. So get a review completed by somebody not involved with the project where possible.

Tip # 5 – Include information about Storage/Networking etc in your design

We all know, no VMware solution is complete without some form of Network & Storage, so ensure that your design has at least some high level details of the network & storage. This should assist you in other sections of your design document explaining your Architectural decisions, and give the reader a clearer picture of the whole environment.

Include diagrams of the end to end solution in an appendix so the reader can refer to them if any clarification is required.

Tip # 6 – Read the VCDX handbook and address each criterion.

As per the requirement document screen shot (above), the handbook actually tells you what VMware are looking for in your Architecture design.

It states “Including but not limited to: logical design, physical design, diagrams, requirements, constraints, assumptions and risks.”

Originally, my design did not, in my opinion, strictly meet all of the criteria, so I went back and added details to ensure I exceeded the criteria.

So in choosing what design to use for your application, my recommendation would be to not pick a small/simple design, but to choose one which allows you to show your in-depth knowledge and some innovation. This will make the application process a little more time consuming from a documentation point of view, but should increase your chance of success at the VCDX defense.

I hope this helps, and best of luck to anyone attempting the VCDX @ VMworld this year!