Example Architectural Decision – Host Isolation Response for IP Storage

Problem Statement

What is the most suitable HA / host isolation response when using IP-based storage (in this case, a NetApp HA pair in 7-Mode) where the IP storage runs over physically separate network cards and switches from ESXi management?

Assumptions

1. vSphere 5.0 or greater (to enable use of Datastore Heartbeating)
2. vFiler1 & vFiler2 reside on different physical NetApp controllers (within the same HA pair in 7-Mode)
3. Virtual machine guest operating systems are configured with an I/O timeout of 190 seconds to allow for a controller fail-over (maximum 180 seconds)

Motivation

1. Minimize the chance of a false positive isolation response
2. Ensure that, in the event the storage is unavailable, virtual machines are promptly shut down to minimize impact on the applications/data.

Architectural Decision

Turn off the default isolation address and configure the isolation addresses specified below, which check connectivity to multiple NetApp vFilers (IP storage) on both the vFiler management VLAN and the IP storage interface.

Utilize Datastore Heartbeating, checking multiple datastores hosted across both NetApp controllers (in the HA pair) to confirm the datastores themselves are accessible.

Services VLANs
das.isolationaddress1 : vFiler1 Mgmt Interface 192.168.1.10
das.isolationaddress2 : vFiler2 Mgmt Interface 192.168.2.10

IP Storage VLANs
das.isolationaddress3 : vFiler1 vIF 192.168.10.10
das.isolationaddress4 : vFiler2 vIF 192.168.20.10
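
The isolation address settings above correspond to vSphere HA advanced options. As a minimal sketch in plain Python (no vSphere API calls are made), the helper below builds the key/value pairs that would be entered as HA advanced options on the cluster. `das.usedefaultisolationaddress` and `das.isolationaddressN` are the HA option names; the function name `build_isolation_options` is hypothetical and only used for illustration, and the IP addresses are the example values from this post.

```python
def build_isolation_options(addresses):
    """Return HA advanced options for a list of isolation addresses.

    The default isolation address (the management gateway) is disabled,
    and each supplied address is assigned to das.isolationaddress1..N.
    """
    if not addresses:
        raise ValueError("at least one isolation address is required")
    options = {"das.usedefaultisolationaddress": "false"}
    for index, address in enumerate(addresses, start=1):
        options[f"das.isolationaddress{index}"] = address
    return options


# Example values taken from the decision above.
ISOLATION_ADDRESSES = [
    "192.168.1.10",   # vFiler1 Mgmt interface (Services VLAN)
    "192.168.2.10",   # vFiler2 Mgmt interface (Services VLAN)
    "192.168.10.10",  # vFiler1 vIF (IP storage VLAN)
    "192.168.20.10",  # vFiler2 vIF (IP storage VLAN)
]

if __name__ == "__main__":
    for key, value in build_isolation_options(ISOLATION_ADDRESSES).items():
        print(f"{key} = {value}")
```

Generating the options from one list keeps the numbering consistent if an address is later added or removed.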

Configure Datastore Heartbeating with “Select any of the cluster datastores taking into account my preferences” and select the following datastores:

  • One datastore from vFiler1 (Preference)
  • One datastore from vFiler2 (Preference)
  • A second datastore from vFiler1
  • A second datastore from vFiler2

Configure Host Isolation Response to: Power off.

Justification

1. The ESXi management traffic runs on a standard vSwitch with 2 x 1Gb connections, which connect to different physical switches from the IP storage (and data) traffic (which runs over 10Gb connections). Using the ESXi management gateway (the default isolation address) to determine isolation is not suitable, as the management network can be offline without impacting the IP storage or data networks. This situation could lead to false positive isolation responses.
2. The isolation addresses chosen test both data and IP storage connectivity over the converged 10Gb network.
3. In the event the four isolation addresses (NetApp vFilers on the Services and IP storage interfaces) cannot be reached by ICMP, Datastore Heartbeating will be used to confirm whether the specified datastores (hosted on separate physical NetApp controllers) are accessible before any isolation action is taken.
4. In the event the two storage controllers do not respond to ICMP on either the Services or IP storage interfaces, and the specified datastores are also inaccessible, it is likely there has been a catastrophic failure in the environment, either in the network or in the storage controllers themselves, in which case the safest option is to shut down the VMs.
5. In the event the isolation response is triggered and the isolation does not impact all hosts within the cluster, the VM will be restarted by HA onto a surviving host.
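
The decision flow described in the justification can be sketched as a small Python function. This is only a model of the reasoning in this post, not VMware's HA agent logic: the function name `isolation_action`, the return values, and the datastore names in the usage example are all hypothetical.

```python
def isolation_action(address_reachable, datastore_accessible):
    """Model the decision flow described in this post.

    address_reachable:    maps isolation address -> True if it answered ICMP
    datastore_accessible: maps heartbeat datastore -> True if it is accessible
    Returns "none" or "power_off".
    """
    if not address_reachable or not datastore_accessible:
        raise ValueError("both checks need at least one entry")
    if any(address_reachable.values()):
        # At least one vFiler interface responded: host is not isolated.
        return "none"
    if any(datastore_accessible.values()):
        # Pings failed, but storage is still accessible: take no action.
        return "none"
    # No isolation address and no heartbeat datastore can be reached.
    return "power_off"


if __name__ == "__main__":
    addresses = {
        "192.168.1.10": False,
        "192.168.2.10": False,
        "192.168.10.10": False,
        "192.168.20.10": False,
    }
    # Hypothetical datastore names, one per controller.
    datastores = {"vFiler1_ds1": False, "vFiler2_ds1": False}
    print(isolation_action(addresses, datastores))
```

Only the case where every isolation address and every heartbeat datastore is unreachable results in a power off, which is what keeps false positive isolation responses unlikely.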

Implications

1. In the event the host cannot reach any of the isolation addresses, and datastore heartbeating cannot access the specified datastores, virtual machines will be powered off.

Alternatives

1. Set Host isolation response to “Leave Powered On”
2. Do not use Datastore heartbeating
3. Use the default isolation address

For more details, refer to my post “VMware HA and IP Storage”.

The VCDX Application Process

I was asked by a person interested in attempting the VCDX if I could share my VCDX application / design. Unfortunately, as my application was based on an internal IBM project, it is strictly commercial in confidence.

However, I don’t think this is a huge problem as I can share my experience to assist potential candidates with their applications.

In the VCDX Certification Handbook and Application there are several sections. This post focuses on section 4.5 “Design Deliverable Documentation”, and specifically on “A. Architectural design”.

Below is a screen shot of this section.

A piece of advice I shared in my post “The VCDX Journey” was that everything in your design is fair game for the VCDX panel to question you about. So, for example, if your design includes Site Recovery Manager or vCloud Director, expect to answer questions about how your design caters for these products.

With that in mind, here are my tips.

Tip # 1 – Your design does not have to be perfect!

Don’t make the mistake of thinking you need to submit a design which follows every single “Best Practice”, as this is very rare in reality. “Best Practice” is really a concept for VCPs and, to a lesser extent, VCAPs; a VCDX should be at a level of expertise to develop best practices, rather than just follow them.

Keep in mind that, regardless of the architectural decisions themselves, you need to be able to justify them and align them to your “Requirements”, “Constraints” & “Assumptions”, both in your documentation and in front of the VCDX defense panel.

So you may have been forced by a “Constraint” to do something which is not best practice and that you wouldn’t otherwise recommend. This is not a problem for the VCDX application, but be sure to fully understand the constraint and document in detail why the decision you made was the best option.

My design did not follow all best practices, nor was it the fastest or most highly available solution I could have designed. Ensure you’re aware of things which you could have done better, or could have changed if you did not have certain constraints, and document the alternatives.

I would suggest a design which complied with all best practices could be harder to defend than one which had a lot of constraints preventing the use of best practices. As a candidate attempting to demonstrate your “Expert” level knowledge, working around constraints to meet your customer’s requirements gives you a better opportunity to show your thinking outside the square. This goes for your documentation as well as the panel itself.

Tip # 2 – Don’t just fill out the VMware Solution Enablement Toolkit (SETs) template!

If you’re a VMware Partner, you will likely have access to VMware SETs. These are great resources which make doing designs easier (especially for people new to VMware architecture); however, they are templates, and anyone can fill out a template. As a VCDX applicant, you should be showing your “Expert” level knowledge / experience and innovation.

I personally have created my own template, which is a combination of numerous resources, including the SETs, but also contains a lot of work I have created myself.

In my template I have a lot more detail than what can be found in the “SET” templates, and this I felt really assisted me in demonstrating my expert level knowledge.

For example, I have a dedicated section for Architectural Decisions, where I had around 25 ADs for the design I submitted for VCDX. These covered not just specific VMware options, but also storage, backup, network, etc., as these are all critical parts of a VMware solution. I could have documented a lot more, but I ran out of time.

Tip # 3 – Document all your Requirements / Constraints / Assumptions and reference them.

Throughout your design, and especially your Architectural decisions, you should refer back to your Requirements, Constraints and Assumptions.

Doing this properly will assist the VCDX panel members who review your design to understand the solution. If the design document doesn’t give the reader a clear understanding of the solution, I would be surprised if you are invited to defend.

During the VCDX defense, you should talk to how you designed to meet the Requirements and how the constraints impacted your design. You should also call out any assumptions and discuss what risks or impacts these assumptions may have; this will be a huge help in your VCDX defense. Ensuring you have documented the ADs well for your application is also a big step towards your application being accepted.

Tip # 4 – Have your design peer reviewed

Where possible, I always have my work reviewed by colleagues. Even VCAPs & VCDXs make mistakes, so ensure you have your work reviewed. This is an excellent way to make sure your design makes sense and is complete.

I touched on this in Tip # 3, but make sure a person with zero knowledge of the solution can read your design and understand the solution. So get a review completed by somebody not involved with the project where possible.

Tip # 5 – Include information about Storage/Networking etc in your design

We all know no VMware solution is complete without some form of network & storage, so ensure that your design has at least some high-level details of the network & storage. This should assist you in other sections of your design document when explaining your Architectural Decisions, and give the reader a clearer picture of the whole environment.

Include diagrams of the end to end solution in an appendix so the reader can refer to them if any clarification is required.

Tip # 6 – Read the VCDX handbook and address each criterion.

As per the requirement document screen shot (above), the handbook actually tells you what VMware are looking for in your Architecture design.

It states “Including but not limited to: logical design, physical design, diagrams, requirements, constraints, assumptions and risks.”

Originally, my design did not, in my opinion, strictly meet all of the criteria, so I went back and added details to ensure it exceeded them.

So in choosing which design to use for your application, my recommendation would be to not pick a small/simple design, but to choose one which allows you to show your in-depth knowledge and some innovation. This will make the application process a little more time consuming from a documentation point of view, but should increase your chance of success at the VCDX defense.

I hope this helps, and best of luck to anyone attempting the VCDX @ VMworld this year!