An exciting new adventure for this VCDX

I am very pleased to announce that I have decided to take on a new challenge and will be joining the innovative team at Nutanix in July this year, as part of the Solutions and Performance engineering team.

It was only a few short months ago that I first discovered what Nutanix was all about. I had previously seen the classic “No SAN” advertisements on various blogs and at VMworld in 2012, and I am embarrassed to admit that I did not make the time to look into the solution.

Since then, I have spent a lot of time looking into the Nutanix solution and have spoken to a number of people in the industry, including several members of the Nutanix family. It has become obvious to me why Nutanix is one of the most successful and fastest growing start-ups in the industry, although I'm not sure I’d call Nutanix a “Start-up” any more.

The linear scale-out solution provided by Nutanix aligns perfectly with the virtualization best practices that most of us have known for many years, and combines PCIe SSD (Fusion-io) with SATA SSDs and high capacity SATA drives into a high performance, hyper-converged 2RU platform.

Over my many years in the industry I can recall countless scenarios where the Nutanix solution would have been a perfect fit and would have solved numerous problems, both at the technical/architectural level and, importantly, at the business level for both SMB and Enterprise customers.

Now with the release of a wider range of Nutanix blocks including the NX-1000 and NX-6000, the solution is becoming more and more attractive.

In my role I will be part of the team that is responsible for creating high performance solutions and developing best practice guides, reference architectures and case studies for things like virtualization of business critical applications on the Nutanix platform.

A lot of people are already aware of how good the platform is for virtual desktops. My focus is not only on showing how good the solution is for VDI, but also for a wider range of workloads, including Business Critical Applications, server and Big Data workloads.

I am very much looking forward to being a significant part of this exciting company, which already boasts exceptional talent, including two VCDXs in Jason Langone @langonej & Lane Leverett @wolfbrthr. I am very pleased to be working alongside such talent and to be the third VCDX in the Nutanix family.

As I have been doing for the last year or so, I intend to continue to share my experience with the virtualization community via Twitter, blogging, VMUGs, etc., which will now include (but not be limited to) the Nutanix platform.

So stay tuned, as the Nutanix team and I have a number of very interesting projects coming up in the next few weeks and months which I can't wait to share with you.

If you're not already familiar with what Nutanix is all about, here are a couple of quick introductory YouTube videos which I highly recommend you take the time to watch (as I wish I had sooner!).

About Nutanix | How Nutanix Works | 8 Strategies for a Modern Datacenter

Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning (Thin on Thin)

Problem Statement

In a vSphere environment, what is the most suitable disk provisioning type to use for the LUN and the virtual machines to ensure minimum storage overhead and optimal performance?

Requirements

1. Ensure optimal storage capacity utilization
2. Ensure storage performance is both consistent & maximized

Assumptions

1. vSphere 5.0 or later
2. VAAI is supported and enabled
3. The time frame to order new hardware (eg: New Disk Shelves) is <= 4 weeks
4. The storage solution has tools for fast/easy capacity management

Constraints

1. Block Based Storage

Motivation

1. Increase flexibility
2. Ensure physical disk space is not unnecessarily wasted

Architectural Decision

“Thin Provision” the LUN at the Storage layer and “Thin Provision” the virtual machines at the VMware layer

(Optional) Do not present more LUNs (capacity) than you have underlying physical storage, so that over-commitment happens only at the vSphere layer
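
The optional rule above comes down to simple arithmetic, which can be sketched as follows. This is only a minimal illustration: the function names and all capacity figures (in GB) are hypothetical and do not come from any vendor tool or from the environment described here.

```python
def lun_overcommitted(lun_sizes_gb, physical_capacity_gb):
    """Return True if the capacity presented as LUNs exceeds physical capacity."""
    return sum(lun_sizes_gb) > physical_capacity_gb


def vsphere_overcommit_ratio(provisioned_vmdk_gb, datastore_capacity_gb):
    """Ratio of provisioned (thin) VMDK capacity to datastore capacity.

    A value above 1.0 means the datastore is over-committed at the vSphere layer.
    """
    return sum(provisioned_vmdk_gb) / datastore_capacity_gb


# Hypothetical example: three 2 TB LUNs on 8 TB of physical capacity
luns_gb = [2048, 2048, 2048]
physical_gb = 8192

# Hypothetical example: six 500 GB thin VMDKs on one 2 TB datastore
vmdks_gb = [500] * 6

print(lun_overcommitted(luns_gb, physical_gb))        # storage layer check
print(vsphere_overcommit_ratio(vmdks_gb, 2048))       # vSphere layer ratio
```

In this made-up case the storage layer is not over-committed (6144 GB presented against 8192 GB physical), while the datastore is over-committed at the vSphere layer, which is the combination the optional rule describes.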

Justification

1. Capacity can be easily managed by using storage vendor tools, eg: NetApp VSC / EMC VSI / Nutanix Command Center
2. Thin Provisioning minimizes the impact of situations where customers demand a lot of disk space up front when they only end up using a small portion of the available disk space
3. Increases flexibility as all unused capacity of all datastores and the underlying physical storage remains available
4. Creating VMs with “Thick Provisioned – Eager Zeroed” disks would unnecessarily increase the provisioning time for new VMs
5. Creating VMs as “Thick Provisioned” (Eager or Lazy Zeroed) does not provide any significant benefit (ie: Performance) but adds a serious capacity penalty
6. Using Thin Provisioned LUNs increases the flexibility at the storage layer
7. VAAI automatically raises an alarm in vSphere if the usage of a Thin Provisioned datastore is at >= 75% of its capacity
8. SCSI reservations causing performance issues (increased latency) when thin provisioned virtual machines (VMDKs) grow is no longer a concern, as the VAAI Atomic Test & Set (ATS) primitive alleviates the impact of SCSI reservations.
9. Thin provisioned VMs reduce the overhead for Storage vMotion, Cloning and Snapshot activities. Eg: For Storage vMotion it eliminates the requirement for Storage vMotion (or the array when offloaded by VAAI XCOPY Primitive) to relocate “White space”
10. Thin provisioning leaves maximum available free space on the physical spindles which should improve performance of the storage as a whole
11. Where there is a real or perceived issue with performance, any VM can be converted to Thick Provisioned non-disruptively using Storage vMotion.
12. Using Thin Provisioned LUNs with no actual over-commitment at the storage layer reduces the risk of out of space conditions while maintaining flexibility and efficiency, with significantly reduced risk and dependency on monitoring.
13. The VAAI UNMAP primitive provides automated space reclamation to reduce wasted space from files or VMs being deleted

Alternatives

1. Thin provision the LUN and thick provision virtual machine disks (VMDKs)
2. Thick provision the LUN and thick provision virtual machine disks (VMDKs)
3. Thick provision the LUN and thin provision virtual machine disks (VMDKs)

Implications

1. If the storage at the vSphere and array level is not properly monitored, out of space conditions may occur, which will lead to downtime of VMs requiring disk space. VMs not requiring additional disk space can continue to operate even where there is no available space on the datastore
2. The storage may need to be monitored in multiple locations, increasing BAU effort
3. It is possible for the vSphere layer to report sufficient free space when the underlying physical capacity is close to or entirely used
4. When migrating VMs from one thin provisioned datastore to another (ie: Storage vMotion), the Storage vMotion will utilize additional space on the destination datastore (and underlying storage) while leaving the source thin provisioned datastore inflated even after successful completion of the Storage vMotion.
5. While the VAAI UNMAP primitive provides automated space reclamation, this is a post-process. As such, you still need to maintain sufficient available capacity for VMs to grow prior to UNMAP reclaiming the dead space
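
The need to monitor capacity in more than one place (implications 1 to 3) can be illustrated with a small sketch. It assumes usage figures have already been collected from vSphere and from the array by whatever tooling is in use; the `Usage` class, the `capacity_warnings` function, the names and the numbers are all hypothetical. The 75% datastore threshold mirrors the alarm mentioned in the justification, while the 80% physical threshold is only an assumed value that would need to suit the hardware ordering time frame.

```python
from dataclasses import dataclass

DATASTORE_ALARM_PCT = 75.0  # datastore alarm threshold noted in the justification
PHYSICAL_ALARM_PCT = 80.0   # assumed value, not from the original decision


@dataclass
class Usage:
    name: str
    used_gb: float
    capacity_gb: float

    @property
    def used_pct(self):
        return 100.0 * self.used_gb / self.capacity_gb


def capacity_warnings(datastores, physical_pool):
    """Check both the vSphere layer and the physical layer for high usage."""
    warnings = []
    for ds in datastores:
        if ds.used_pct >= DATASTORE_ALARM_PCT:
            warnings.append(f"datastore {ds.name} at {ds.used_pct:.0f}%")
    if physical_pool.used_pct >= PHYSICAL_ALARM_PCT:
        warnings.append(
            f"physical pool {physical_pool.name} at {physical_pool.used_pct:.0f}%"
        )
    return warnings


# Hypothetical figures: both datastores look healthy from vSphere,
# but the underlying physical pool is nearly full.
datastores = [Usage("ds01", 1024, 2048), Usage("ds02", 1000, 2048)]
pool = Usage("pool01", 3700, 4096)

for warning in capacity_warnings(datastores, pool):
    print(warning)
```

With these made-up numbers only the physical pool triggers a warning, which is the situation described in implication 3: checking the vSphere layer alone would not have revealed the problem.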

Related Articles

1. Datastore (LUN) and Virtual Disk Provisioning (Thin on Thick)

Storage DRS Configuration – Architectural Decision making flowchart

I was speaking to a number of people recently who were trying to come up with a one-size-fits-all Storage DRS configuration for a reference architecture document.

As Storage DRS is a reasonably complicated feature, it was my opinion that a one-size-fits-all configuration would not be suitable, and that multiple examples should be provided when writing a reference architecture.

A colleague suggested a flowchart would assist in making the right decision around Storage DRS, so I took up the challenge to put one together.

Below is my version 0.1 of the flowchart, which I thought I would post in the hope of getting some good feedback from the community and creating a good guide to help those who may not have in-depth knowledge or experience to choose what should, in most cases, be an appropriate configuration for SDRS.

This also complements some of my previous example architectural decisions, which are shown in the related articles section below.

As always, feedback is welcome.

I hope you find this helpful.

* Updated to include the previously missing “NO” option for Data replication.

SDRS flowchart V0.2

Related Articles

1. Example Architectural Decision – Storage DRS configuration for NFS datastores

2. Example Architectural Decision – Storage DRS configuration for VMFS datastores