Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning

Problem Statement

In a vSphere environment, What is the most suitable disk provisioning type to use for the LUN and the virtual machines to ensure minimum storage overhead and optimal performance?

Requirements

1. Ensure optimal storage capacity utilization
2. Ensure storage performance is both consistent & maximized

Assumptions

1. vSphere 4.1 or later
2. VAAI is supported and enabled
3. Array level data replication is being used throughout the environment
4. Monitoring of the environment (including vSphere and Storage) is a manual process
5. The time frame to order new hardware (eg: New Disk Shelves) is a minimum of 3 months

Constraints

1. Block based storage

Motivation

1. Increase flexibility
2. Ensure physical disk space is not unnecessarily wasted

Architectural Decision

“Thick Provision” the LUN at the Storage layer and “Thin Provision” the virtual machines at the VMware layer

Justification

1. Simplified capacity management as only one layer (vSphere layer) needs to be monitored for capacity
2. The Free space shown by vSphere is actual usable storage
3. Reduces the chance of an “Out of Space” condition
4. Increases flexibility as all unused capacity of all datastores remains available
5. Creating VMs with “Thick Provisioned – Eager Zeroed” disks would increase the provisioning time
6. Creating VMs as “Thick Provisioned” (Eager or Lazy Zeroed) does not provide any significant benefit but adds a serious capacity penalty
7. Using Thin Provisioned virtual machines minimizes storage replication traffic on creation of virtual machines
8. Using Thick Provisioned LUNs reduces the requirement for fast turn around times for purchasing additional capacity
9. Monitoring is essential to successfully and safely use “Thin on Thin”

Alternatives

1.  Thin Provision the LUN and thick provision virtual machine disks (VMDKs)
2.  Thick provision the LUN and thick provision virtual machine disks (VMDKs)
3.  Thin provision the LUN and thin provision virtual machine disks (VMDKs)

Implications

1. No storage over commitment can occur on the physical array
2. The storage “consumed” will be reported differently between the vSphere Administrator and the Storage Administrator. The vSphere Administrator will see the true utilization, whereas the SAN administrator will see the “Consumed” & “Provisioned” values as the same
3. It is possible for a datastore to become overcommited, and as a result if not monitored the datastore may run out of free space which would result in an outage.

Related Articles

1. Datastore (LUN) and Virtual Disk Provisioning (Thin on Thin)

vmware_logo_ads

Example Architectural Decision – Guest OS Page File Storage in vSphere

Problem Statement

In a vSphere environment using deduplication and an array snapshot based backup solution, Guest OS page files are currently stored on the OS drive (VMDK) which reduces the effectiveness of deduplication as well as placing an overhead on the controllers having to scan data which cannot be deduplicated.

As the Guest OS Paging files are being included in the snapshot process (with the guest OS) this also demands additional capacity for both primary and secondary disk storage for disk to disk backups.

How can this overhead be minimized or eliminated?

Requirements

1. Make the most efficient use of the available storage capacity
2. Maintain consistent level of virtual machine / storage performance
3. Minimize the storage required for primary and secondary snapshot based backups
4. Maintain the array level snapshot based backup solution as it is required to meet RPO/RTOs
5. Maintain the use of deduplication and this has proven to decrease storage requirements and improve performance

Assumptions

1. vSphere 5.0 or later
2. VMFS 5 Datastores which are Thin Provisioned
3. Deduplication is in use for Volumes where Guest OS virtual disks are stored
4. VAAI is supported by the array and enabled across the vSphere environment
5. All datastores are presented to all hosts within the cluster
6. Snapshot based backup solution is being used
7. Virtual Machines are right sized
8. Disk to disk backup data is replicated offsite

Constraints

1. None

Motivation

1. Optimize the storage performance
2. Ensure Tier 1 storage is not wasted with transient files
3. Minimize storage required for snapshot based backups

Architectural Decision

Separate OS page files onto a dedicated VMDK, which will be located on a datastore (or datastore cluster) which is
1. Not Protected by the array level snapshot backup solution
2. Not running deduplication
3. Not running data compression

Justification

1. Allows page files to be stored on different underlying storage including (optionally) high capacity, lower cost, SATA disk
2. Relocating Guest OS page files to another datastore (or datastore cluster) not protected ny snapshots dramatically reduces the amount of Data being protected by the Snapshot based backup solution
3. Reduces the amount of data being replicated to secondary disk backup location/s thus minimizing the bandwidth requirements between datacenters
4. (Optionally) Ensures Tier 1 storage is only used for high performance guests
5. The result of the Virtual Machines being right sized the performance impact/frequency of paging should be minimal
6. Reduces the CPU cycles required for deduplication as data which cannot be deduplicated will not be scanned
7. Reduces the CPU cycles on the storage controllers by not attempting to compress page file data

Alternatives

1. Leave Page Files within the Virtual machines primary VMDK an accept the overhead on the backup solution
2. Turn of paging within the Guest OS (No Page File)

Implications

1. The additional steps of creating a dedicated VMDK for the VM and configuring the Guest OS to use the alternate location
2. Templates need to be updated to the above configuration
3. For environments using Site Recovery Manager,for protected virtual machines, some manual steps are required when setting up the virtual machines for the first time. This increases the work required during setup, however as this is a one time overhead, it is believed the benefit of reduced backup storage and replication traffic (for SRM) outweighs the one time overhead

vmware_logo_ads

Example Architectural Decision – Storage DRS Configuration for VMFS Datastores in a vCloud Environment

Problem Statement

In a production , self service vCloud Director environment, What is the most suitable Storage DRS configuration to improve storage utilization , performance, as well as reduce administrative effort for BAU staff?

Requirements

1. Make the most efficient use of the available storage capacity
2. Maintain consistent level of storage performance
3. Reduce the risk and overhead of capacity management
4. Reduce the risk of a unintentional or otherwise DoS event caused by self service

Assumptions

1. vSphere 5.0 or later
2. VMFS 5 Datastores which are Thick Provisioned
3. Deduplication is not in use
4. VAAI is supported by the array and enabled across the vSphere environment
5. All datastores in each respective Datastore clusters reside on the same RAID type with similar spindle types and count
6. All datastores are presented to all hosts within the cluster
7. Array level snapshots are not in use
8. IBM SVC Storage is being used
9. vCloud Director 5.1 or later
10. Storage I/O Control is enabled at set to 30ms

Constraints

1. IBM SVC storage does not currently support VASA (VMware API for Storage Awareness)

Motivation

1. Ensure production storage performance is not negatively impacted
2. Minimize the vSphere administrators workload where possible

Architectural Decision

Set the DRS automation setting to “Fully Automated”

  • Set “Utilized Space” threshold to 80%
  • Set “I/O latency” to 15ms
  • I/O Metric Inclusion – Enabled

Advanced Options

  • No recommendations until utilization difference between source and destination is: 10%
  • Evaluate I/O load every 8 Hours
  • I/O Imbalance threshold  4

Justification

1. Setting Storage DRS to “Fully Automated”  ensures that the administrator does not need to be concerned with initial placement of virtual machines as this will be dynamically and intelligently determined and executed

2. “XCOPY” is fully supported for Block based storage, as such, any Storage vMotion activity is offloaded to the array therefore removing the I/O overhead on the compute and storage fabric.

3. Where a significant I/O imbalance is detected by SDRS, the vSphere administrator is not required to take any action, Storage DRS can identify and remediate issues which fall outside parameters (which are determined by the VMware Architect) automatically. This improves the efficiency of the environment, and reduces the involvement of BAU.

4. SDRS provides valuable “initial placement” for new virtual machines which will help avoid a situation where datastores are unevenly balanced from a capacity perspective in the first place, therefore reducing the chance of virtual machines requiring migration.

5. Setting the “No recommendations until utilization difference between source and destination is” to 20% ensures that SDRS does not move virtual machines around where significant benefit is not realized  This prevents unnecessary Storage vMotion activity on the disk system, although this is offloaded from the host to the array, the I/O still may impact production performance for workloads on the same disk system.

6. Setting the “I/O Imbalance threshold (5 Aggressive / Conservative 1 ) to “4” (2nd most aggressive)  ensures that I/O imbalance should be addressed before significant imbalance is experienced by the end users. This level of “ aggressiveness” is acceptable as the Storage vMotion can be offloaded (via VAAI “XCOPY” primitive  and has almost zero impact on the host.  Setting this to “5” may result in minor I/O imbalances being corrected, at the cost of a Storage vMotion and as a result the impact of the more frequent Storage vMotion activity may negate the benefit of the I/O balancing.

7. Storage DRS will address I/O imbalance across the datastore cluster if the latency meets or exceeds the set value of 15ms (the default) and in the event of latency increasing during peak times to >=30ms , Storage I/O Control will ensure fair acess to the storage.

Alternatives

1. Use “No Automation (Manual Mode)”
2. Not use Storage DRS

Implications

1. When selecting datastores for the datastore cluster, having VASA enabled allows the “System Capability” column to be populated in the “New Datastore Cluster” wizard to ensure suitable datastores of similar performance, RAID type and features are grouped together. VASA is currently NOT supported by SVC, as such the datastore naming convension needs to accurately reflect the capabilities of the LUN/s to ensure suitable datastores are grouped together.

vmware_logo_ads