Nutanix support on vSphere – No Bull!

Recently it seems the spreading of Fear, Uncertainty and Doubt (FUD) about Nutanix has ramped up, probably due to Nutanix’s ongoing success and enormous growth.

Still, it’s unfortunate when large companies, and especially people in senior positions at these companies, try to bully smaller companies. Luckily at Nutanix, we’re like HoneyBadgers, and we don’t care. We Hyper-converge anyway!

honey_badger_don__t_care_by_gatorvenom-d40h28z

This sort of attention is expected when you work for a disruptive start-up which, according to independent sources such as IDC, is the hyperconverged market leader with 52% market share.

However, when the FUD creates confusion for customers, it crosses the line, and I need to correct the misinformation for the benefit of customers, because the customer experience is Nutanix’s focus.

The specific FUD I am talking about which was chucked around the blog/twittersphere in the last 24 hours is as follows:

  • “… Nutanix is not entitled to support VMware customers”
  • “Any support you (Nutanix) provide is not ‘Official’”
  • “Your (support) model puts customers in a grey area”
  • “Nutanix (is) providing unofficial customer support for both VMware and Nutanix”
  • “Nutanix should be transparent to customers regarding what services they are entitled to provide, and which ones they aren’t”.

How would I describe the above comments? Simple: Total Bull (shit)!

Now, that made me think of an Australian company (Pedders – logo below) who made the term “No Bull” famous with a series of TV commercials about car servicing and suspension. The ads made light of car mechanics who overcharge people for unnecessary work on vehicles, and made the point that Pedders gives the right advice and “No Bull”.

NoBull

So this post is inspired by Pedders and is about giving you the facts, and No Bull (Shit)!

So is Nutanix a supported platform for vSphere? YES!

How do I know this? I personally completed the hands-on certification work and submitted the successful certification logs to VMware prior to them being approved. In fact, I am still involved in keeping our certifications up to date.

But if you don’t believe me, it’s easy enough to verify. All you need to do is check the Official VMware Hardware Compatibility List (HCL).

To do this simply visit http://www.vmware.com/go/hcl and select Storage/SAN as shown below.
HCL1

Next select the following:

  • Product Release Version : All
  • Partner Name : Nutanix
  • Storage Virtual Appliance Only : Yes
  • Features Category : All

HCL2
Hit Update and View Results and you will get a view like the below.

NutanixHCL

The above shows all Nutanix node types from the NX1000 series all the way to our All-Flash NX-9000 series with support for vSphere 5.1 through vSphere 6.0.

If you check on one of the node types you will see Nutanix is also supported for VAAI-NAS, and as I highlighted in my post “Not all VAAI-NAS solutions are created equal“, Nutanix supports all 4 VAAI-NAS primitives, not a subset which is common especially in the hyperconverged market.

Note: Some HCI solutions have no VAAI support at all!

nutanixhclvaai

So with a quick check of the VMware HCL, we all can see Nutanix is a fully certified and supported solution by VMware Global Support Services (GSS).

What does this mean?

Put simply, any Nutanix customer with up-to-date Support and Subscription (SnS) can call VMware GSS directly and get support.

Nutanix customers are also welcome and encouraged to contact Nutanix directly. As a result, customers get the best of both worlds, or choose which vendor they call based on the quality of support.

What does Nutanix support provide?

End to End support including (but not limited to):

  • The Hypervisor (ESXi, Hyper-V or KVM)
  • The Nutanix layer
  • Performance troubleshooting
  • Networking support
  • Application support such as MS SQL / MS Exchange / Oracle etc

So Nutanix support is really a “One throat to choke” service.

Not only is our service “One throat to choke”, Nutanix also has one of, if not the highest, Net Promoter Scores in the I.T industry, with a score of +88 on a scale of -100 to +100.

Omega-award (1)

I challenge anyone to show me a company with a better NPS in the I.T industry!

Nutanix System Reliability Engineers (SREs) are more often than not Level 3 engineers. Nutanix does not hire the typical Level 1 engineer who essentially just takes a phone call and needs to escalate most calls to another engineer.

Nutanix has numerous VCAP level certified support engineers, as well as the ability to call on one or more of our 12 VMware Certified Design Experts (VCDXs) if/when required. I personally have been involved in numerous escalations, and in some cases I have travelled to customer sites to investigate and drive non-Nutanix issues, such as hypervisor bugs, to a successful resolution. These escalations were done free of charge, to ensure our customers have the best experience possible.

In the event Nutanix support finds a problem with something which is not Nutanix, such as a Hypervisor bug, we don’t hand you off to another vendor, we log the bug on your behalf, manage the case with the Hypervisor vendor and follow it through to conclusion.

To be clear, Nutanix is not an OEM partner of VMware, nor do we ship ESXi with our platform. Does this matter? Not at all. What it does mean is the support costs customers pay to VMware are not shared with Nutanix, that’s all.

Nutanix can and does provide support for vSphere, just as Systems Integrators (SIs), managed service providers and other non OEM VMware partners do.

To cover unusual situations, Nutanix is also a member of TSAnet, a multi-vendor support network whose members include Microsoft and VMware. This ensures that even if your problem is not strictly covered by a support contract, or it’s not a supported configuration, Nutanix will make every effort to ensure the problem is resolved directly with the other vendor/s via TSAnet.

This post was brought to you by my two favourite Chucks, and “No Bull”!

Chuck1 -Chuck_Norris-_01

The new standard in Enterprise Architecture certifications

I am very proud to have been selected to be part of a team of absolute superstars who in the last few months have developed what I believe will be the new standard in Enterprise Architecture certifications, the Nutanix Platform Expert (NPX).

The NPX was developed under the guidance of Lisa O’Leary, a PhD psychometrician and recognized authority in the development of expert-level panel-based assessments for the IT industry. This was a real eye opener for me into how to create a scoring rubric and how to ensure different examiners score as evenly as possible to ensure consistent results.

The NPX certification (along with Nutanix nu.School Education) is designed to produce and certify the best of the best enterprise architects with the main goal of ensuring customers get the best architects to design and deliver solutions which solve real world business problems while maximizing value and reducing ongoing costs.

During the development of NPX, the other members of the group and I basically decided that none of us should be able to achieve NPX without putting in significant time and effort to improve our skills, especially as candidates are required to demonstrate expertise both architecturally and hands-on in multiple hypervisors and vendor software stacks. Considering the talent in the group, this was a big call!

I personally am enjoying the challenge of preparing my submission for the NPX based on a large scale project I am working on at the moment, and look forward to submitting my application and hopefully being invited to the Nutanix Design Review (NDR) to defend. I can already tell you this is more comprehensive than any single design I have done to date, and it will be a blast to defend.

So what will being an NPX mean?

Certified graduates of the NPX Program will have a very unique set of skills, including the demonstrated ability to deliver enterprise-class Web-scale solutions using multiple hypervisors and vendor software stacks on the Nutanix platform (VMware® vSphere®, Microsoft® Hyper-V®, and KVM).

This hypervisor agnostic certification for Enterprise Architects is a first in the industry; our groundbreaking approach allows an NPX the freedom to design cutting-edge Web-scale solutions for customers based solely on their business needs.

The depth and breadth of the solution design and delivery skills validated through our peer-vetted program make NPX the new standard for excellence. In accordance with program goals every NPX will be a superb technologist, a visionary evangelist for Web-scale, and a true Enterprise Architect – capable of designing and delivering a wide range of cutting-edge solutions; custom built to support the business goals of the Global 2000 and government agencies in every region of the world.

So what’s required to achieve NPX?

The first prerequisite is the Nutanix Platform Professional (NPP) certification. The NPP is really the entry level certification showing core Nutanix knowledge.

As per the NPX Application, the NPX certification is a two-stage process:

Stage 1 being a review of a candidate’s NPX Program Application.

If a candidate’s application is accepted they will be invited to participate in the NPX Design Review (NDR).

Now at this stage you’re probably saying, this doesn’t seem that hard, right?

Well, here is an idea of the required documentation:

  • A current state and operational readiness assessment
  • A Web-scale migration and transition plan
  • Documentation of specific business requirements driving the solution design
  • Documentation of assumptions that impacted the solution design
  • Documentation of design constraints that impacted the design and delivery of the solution
  • Documentation describing risks identified in the design and delivery of the solution and how those risks
  • A solution architecture including a conceptual/logical and physical design with appropriate diagrams and descriptions of all functional components of the solution
  • Documentation of operational procedures and verification

The documentation set goes well beyond any certification I am aware of, but more importantly demonstrates a candidate’s ability to produce documentation which ensures the solution can be implemented, validated and operated in the event the lead architect is unavailable. This is a very high standard of documentation which I’ve rarely seen in my career.

In addition, 3 professional references will also be required to validate the candidate’s experience.

Stage 2, the NDR, is modeled after an academic viva voce defense (live, oral exam) and requires candidates to present their solution to, and answer questions posed to them by, NPX-Certified Examiners (NCE). The NDR also includes a series of hands-on exercises, which must be completed by the candidate. Successful completion of both stages is required to earn the NPX credential.

The NPX has a strict policy regarding fictitious solution designs.

NPX candidates may not submit wholly fictitious designs.

I pushed for this during the development of the certification as, in my opinion, an enterprise architect should have a portfolio of work to choose from, which negates the need to create a fictitious design.

That said, partially fictitious designs are permitted when an existing design requires additions or enhancements in order to demonstrate competence in required knowledge areas (e.g., a backup or DR solution may be added if this component was outside the scope of the original design).

Adapting an existing 3-tier solution design to the Nutanix platform is also permitted. In either case the submitted design should contain a majority of solution components architected to support applications with service level agreements specified by actual business stakeholders.

The NDR itself requires the completion of an exercise involving a live Nutanix environment and completion of a design scenario. Both exercises will require demonstration of NPX-level solution design and delivery skills with a second solution stack/hypervisor.

An NPX candidate is permitted to choose the hypervisor they will be tested on during their NDR (it must be different from the hypervisor utilized in the submitted solution design). The hypervisor selected will be used for the Hands-on and Design scenarios during the NDR.

The Hypervisor choices are:

  • VMware® vSphere®
  • Microsoft® Hyper-V®
  • KVM

What next?

I would encourage all enterprise architects to stay tuned for the release of more NPX details via the Nutanix nu.School website and take on the challenge of NPX and become a better architect in the process.

The Nutanix Platform Expert Official Certification Guide is currently being written and should be released at Nutanix .NEXT this coming June.

Summary:

I really enjoyed working with such a talented group of people in developing NPX, and I look forward to being a part of the program firstly as a candidate and as a certified examiner in the future to ensure the quality of Enterprise Architects in the industry only gets better!

Here is a group shot from the final day of NPX development in San Jose.

Names (Left to right): Derek Seaman, Steven Poitras, Jon Kohler, Ray Hassan, Bas Raayman, Raymon Epping, Josh Odgers, Michael Webster, Artur Krzywdzinski, Samir Roshan, Lane Laverett, Mark Brunstad and Richard Arsenian.

Absent for Photo: Magnus Andersson, Lisa O’Leary, PhD Psychometrician.

NPXDevTeam

Calculating Actual Usable capacity? It’s not as simple as you might think! – Part 2 Nutanix

In Part 1, the example provided showed usable capacity for a SAN/NAS using a combination of RAID 10, RAID 5 and RAID 6 along with the various sizing considerations resulted in 35.68TB usable capacity or approx 1/3rd of the RAW 100TB.

In Part 2 we will discuss the misconception that Nutanix (a Hyper-converged platform) provides lower effective usable capacity compared to SAN or NAS solutions.

At a high level, Nutanix uses Replication Factor 2 (RF2), which has the same overhead as RAID 1, so straight away a lot of people jump to the conclusion that the usable capacity is less than a traditional SAN/NAS because *insert your favourite RAID level here* has less overhead.

Let’s say we have a Nutanix cluster with 100TB Raw storage using the most common node type, the NX3050.

Now let’s address the same points as we did in Part 1 for the SAN/NAS example:

So let’s start with the same 100TB RAW as we did for the SAN/NAS example and see where things end up on Nutanix.

1. Deducting hot spare drives

Nutanix does not use hot spare drives; data is balanced across all drives in the “Storage Pool”. To cater for failure, it is recommended to size for N+1 for Replication Factor 2 (RF2) deployments. If we were using NX3050 nodes (the most popular Nutanix node) then the overhead of N+1 would be ~4.8TB RAW.

100TB – N+1 Node (4.8TB RAW) = 95.2TB
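As a rough sketch of this step (not an official sizing tool), the N+1 deduction is simply one node’s raw capacity taken off the cluster total. The function names below are my own for illustration, and the second helper assumes a cluster of identical nodes, which shows why the N+1 overhead shrinks as a percentage when the cluster grows.

```python
NODE_RAW_TB = 4.8  # per-node raw capacity quoted above for the NX3050


def raw_after_n_plus_1(cluster_raw_tb: float, node_raw_tb: float = NODE_RAW_TB) -> float:
    """Raw capacity left after setting aside one node's worth for rebuilds."""
    return cluster_raw_tb - node_raw_tb


def n_plus_1_overhead_pct(node_count: int) -> float:
    """Share of raw capacity reserved for N+1, assuming identical nodes."""
    return 100.0 / node_count


remaining_tb = raw_after_n_plus_1(100.0)
print(f"{remaining_tb:.1f}TB raw remaining after N+1")

# The bigger the cluster, the smaller the N+1 reserve as a percentage.
for nodes in (4, 8, 16, 32):
    print(f"{nodes} nodes: {n_plus_1_overhead_pct(nodes):.2f}% reserved")
```

Compare this with hot spare drives in a SAN/NAS, which are typically dedicated disks rather than capacity spread across the whole pool.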

2. RAID Overhead

Nutanix doesn’t use RAID, but Replication Factor 2 has an overhead of 50% (the same as RAID 10).

95.2TB – 50% (RF2) = 47.6TB remaining
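To put the protection overhead in context, here is a minimal sketch comparing the usable fraction of replication against parity RAID. The 8-disk RAID group sizes are hypothetical and chosen only for illustration; real arrays vary. It is this single line item that leads people to assume RF2 must deliver less usable capacity overall.

```python
def replication_usable_fraction(copies: int) -> float:
    """Fraction of raw capacity left when every block is stored `copies` times."""
    return 1.0 / copies


def parity_raid_usable_fraction(disks: int, parity_disks: int) -> float:
    """Fraction left in a parity RAID group (RAID 5: 1 parity disk, RAID 6: 2)."""
    return (disks - parity_disks) / disks


# Hypothetical 8-disk RAID groups, chosen only for illustration.
schemes = {
    "RF2 / RAID 10": replication_usable_fraction(2),
    "RF3": replication_usable_fraction(3),
    "RAID 5 (7+1)": parity_raid_usable_fraction(8, 1),
    "RAID 6 (6+2)": parity_raid_usable_fraction(8, 2),
}
for name, fraction in schemes.items():
    print(f"{name}: {fraction:.1%} of raw")

# Applying RF2 to the 95.2TB left after the N+1 deduction.
after_rf2_tb = 95.2 * replication_usable_fraction(2)
print(f"{after_rf2_tb:.1f}TB remaining after RF2")
```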

3. Free Space on the platform required to ensure performance

For Nutanix, all write I/O goes to either the Extent Store or the OpLog, both of which are housed on the SSD tier. All random writes are serviced by the OpLog until it reaches 95% capacity, at which point the OpLog is bypassed.

As such, performance remains high until 95% capacity. Therefore only 5% free capacity is required to ensure high performance.

47.6TB – 5% (Free space for performance) = 45.2TB

FYI: Nutanix Performance and Engineering team members including myself typically conduct benchmarks at greater than 90% cluster capacity.

4. Free space per LUN

Nutanix does not use LUNs. Nutanix presents containers to the hypervisor. All containers are thin provisioned and all containers can use all available space in the storage pool. Meaning free space only needs to be managed at the Storage Pool layer, not at each individual container.

As we have already taken into account the 5% free space there is no need to take another 5% of space therefore we remain at 45.2TB usable.

5. Free space per VMDK

As with physical servers and SAN/NAS environments, we don’t want our VMs’ drives running out of capacity. As a result, it is common to size VMDKs well above what is strictly required to make capacity management (operational tasks) easier.

As mentioned in Part 1, I typically see architects recommending upwards of 10-20% free space per VMDK over and above what is required to account for unexpected growth, OS patching etc. This makes perfect sense for the same reason as we have free space per LUN because if space runs out for a VM, it’s another bad day for I.T.

For this example, I will assume the same 10% free space per VMDK as I did for the SAN/NAS example. The difference with Nutanix is that performance remains the same regardless of the VMDK being Thick or Thin provisioned. So with every VM Thin Provisioned, no capacity needs to be reserved for free space within VMDK files, as it would be for traditional environments requiring Eager Zero Thick VMDKs for performance.

So we’re still at 45.2TB usable.
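To illustrate why thin provisioning leaves the usable figure untouched, here is a small sketch comparing logical capacity consumed by thick and thin VMDKs. The VM names and sizes are entirely hypothetical, and replication overhead is deliberately left out so that only the provisioning effect is visible.

```python
from dataclasses import dataclass


@dataclass
class Vmdk:
    name: str
    provisioned_gb: int  # size presented to the guest, including headroom
    written_gb: int      # data the guest has actually written


def thick_consumed_gb(disks: list[Vmdk]) -> int:
    """Thick disks reserve their full provisioned size up front."""
    return sum(d.provisioned_gb for d in disks)


def thin_consumed_gb(disks: list[Vmdk]) -> int:
    """Thin disks consume only what has been written."""
    return sum(d.written_gb for d in disks)


# Hypothetical VMs, each provisioned with headroom above what is written.
disks = [
    Vmdk("sql01", provisioned_gb=1100, written_gb=700),
    Vmdk("exch01", provisioned_gb=2200, written_gb=1500),
    Vmdk("web01", provisioned_gb=110, written_gb=40),
]

thick = thick_consumed_gb(disks)
thin = thin_consumed_gb(disks)
print(f"Thick: {thick}GB, Thin: {thin}GB, headroom not consumed: {thick - thin}GB")
```

With thin provisioning the headroom still exists from the guest’s point of view, it just does not consume pool capacity until it is written.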

Now where are we at?

So far, the first 5 points are fairly easy to calculate.

Next we will look at various factors which further reduce usable capacity for SAN/NAS and see how they apply to Nutanix.

6. Silos for Performance

Nutanix does not require nor recommend silos being created for performance reasons. All VMs can reside in a single container therefore no capacity is unusable as a result of performance requirements.

As no silos are required for maximum performance, we are still at 45.2TB usable.

7. Silos of (or Fragmented) Usable Capacity

Nutanix does not configure usable capacity to containers, a container can use all the available storage in the underlying Storage Pool. Where multiple containers are provisioned, each container can see the total capacity of the storage pool while providing logical separation of the VMs within the containers. This avoids the issue of fragmented free capacity.

The diagram below shows 5 containers hosted by an example Nutanix cluster (Storage Pool) with 100TB total capacity, each container has a capacity of 100TB and 25TB free space in alignment with the underlying storage pool.

NutanixFreeSpace

In this case, when creating a new VM, or adding or expanding VMDKs for existing VMs, it does not matter in which container we place the VM; as long as the new capacity is less than the 25TB available in the pool, it makes no difference to capacity.

This removes the requirement for complex capacity management, or using Storage DRS and Storage vMotion.

So we’re still at 45.2TB usable.
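The fragmentation point can be sketched as a simple placement check. The 25TB of pool free space comes from the container example above; the five LUNs with 5TB free each are a hypothetical siloed layout with the same total free space, used only for contrast.

```python
def fits_in_silos(free_per_lun_tb: list[float], request_tb: float) -> bool:
    """A new disk must fit entirely inside one LUN's free space."""
    return any(free >= request_tb for free in free_per_lun_tb)


def fits_in_pool(pool_free_tb: float, request_tb: float) -> bool:
    """Every container draws on the same pool, so only total free space matters."""
    return request_tb <= pool_free_tb


# Hypothetical: five LUNs with 5TB free each versus one pool with 25TB free.
lun_free_tb = [5.0, 5.0, 5.0, 5.0, 5.0]
pool_free_tb = 25.0
request_tb = 8.0

print("Total free in silos:", sum(lun_free_tb), "TB")
print("8TB fits in a silo: ", fits_in_silos(lun_free_tb, request_tb))
print("8TB fits in pool:   ", fits_in_pool(pool_free_tb, request_tb))
```

Both layouts have 25TB free in total, but only the pooled one can place the 8TB disk without first shuffling data between LUNs.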

Other factors which reduce usable capacity?

8. LUN Provisioning Type

In many cases, especially when talking about high performance applications, storage vendors recommend using Thick Provisioned LUNs and as mentioned in Part 1, It’s anyone’s guess how much space is wasted as a result.

But with Nutanix, all containers are Thin Provisioned, so no capacity is wasted on Thick Provisioning and performance is optimal.

9. Wasted Capacity from using SSDs as Cache

Nutanix does not use SSDs as Cache! The SSDs form part of the Extent Store, which is for persistent data storage. The OpLog, which is also on SSD, is also persistent and not a “cache”. As such, no capacity is being reduced as a result of caching.

10. Snapshot Reserves

Nutanix does not use reserve capacity for snapshots. Snapshots simply use available capacity in the storage pool. If you don’t use snapshots, no space is wasted, if you do use snapshots, then the delta changes are stored. Simple as that.

Summary:

From the 100TB RAW, factoring in a realistic Nutanix configuration including N+1 to tolerate a node failure and allow the cluster to fully self-heal, the effective usable capacity is 45.2TB, which is just under 50% of 100TB RAW.

This is a very simple configuration to manage from both a performance and capacity perspective, and one which is easily calculated and repeatable.

If Replication Factor 3 (RF3) were used (which IMO is rarely if ever required) across the entire environment (which again would be extremely unusual, as VMs which require RF3 can be configured in an RF3 container), then the usable capacity would be ~30TB, which is only slightly below the SAN/NAS example, and RF3 delivers higher resiliency.

In reality, >95% of workloads should be deployed on RF2, with a very small number of VMs possibly using RF3. In reality RF2 is extremely resilient and self healing so IMO RF3 is rarely required.
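For anyone wanting to repeat the numbers in this summary, here is a minimal sketch of the whole calculation. The function and parameter names are my own, and the RF3 case simply reuses the same one-node reserve and 5% free space, which is a simplifying assumption that lines up with the ~30TB figure quoted above.

```python
def usable_capacity_tb(
    raw_tb: float,
    node_raw_tb: float = 4.8,     # one NX3050 node set aside for N+1
    copies: int = 2,              # Replication Factor (2 or 3)
    free_space_pct: float = 5.0,  # free space kept for performance
) -> float:
    """Apply the N+1, replication and free-space deductions in order."""
    after_n_plus_1 = raw_tb - node_raw_tb
    after_replication = after_n_plus_1 / copies
    return after_replication * (1 - free_space_pct / 100)


rf2_tb = usable_capacity_tb(100.0)
rf3_tb = usable_capacity_tb(100.0, copies=3)

print(f"RF2 usable: {rf2_tb:.1f}TB")
print(f"RF3 usable: {rf3_tb:.1f}TB")
```

Steps 4 through 10 do not appear in the function because, as covered above, none of them deduct any capacity on Nutanix.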

So in conclusion, Nutanix usable capacity is ~50% of RAW capacity. The difference between Nutanix and traditional SAN/NAS is that you can actually use almost all the “usable” capacity and maintain optimal performance with little/no complexity.

Nutanix also has data reduction technologies such as Compression and De-duplication, along with intelligent cloning to increase the effective capacity of the storage pool.

While I believe Nutanix’ usable capacity today is excellent especially when considering how resilient RF2 is and comparing usable capacity to many products on the market, Nutanix has the advantage of not being constrained by legacy technologies such as RAID, so I’ll leave you with a little teaser:

Usable capacity will be improving significantly in upcoming releases of Nutanix Operating System. 🙂