Example Architectural Decision – vSphere Path Selection Plugin (PSP) for IBM SVC Storage

Problem Statement

What is the most suitable multipathing policy when using IBM SVC storage?

Requirements

1. Ensure maximum performance and availability for vSphere storage
2. Ensure storage performance is as consistent as possible

Assumptions

1. IBM SVC Storage which is Active/Active
2. VAAI is supported and enabled

Constraints

1. Solution must be supported

Motivation

1. Ensure optimal performance and redundancy
2. Minimize Latency

Architectural Decision

Use vSphere Native Multipathing Plugin (NMP) and configure “VMW_PSP_RR” (Round Robin) as the path selection policy.

Set the default PSP to “VMW_PSP_RR” (Round Robin) for SATP VMW_SATP_SVC so all new LUNs automatically use Round Robin
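
As an illustration only, the pyVmomi (Python) sketch below shows one way the existing LUNs claimed by VMW_SATP_SVC could be switched to Round Robin across all hosts; the vCenter address and credentials are placeholders, not part of this decision. The SATP default itself is normally changed from the ESXi command line (for example "esxcli storage nmp satp set" on vSphere 5.x) so that newly presented LUNs claim Round Robin automatically, leaving a script like this needed only for LUNs that already existed.

# Sketch only: set VMW_PSP_RR on existing LUNs claimed by VMW_SATP_SVC via pyVmomi.
# The vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()   # lab use only; validate certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=context)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        storage = host.configManager.storageSystem
        for lun in storage.storageDeviceInfo.multipathInfo.lun:
            # Only touch devices claimed by the SVC SATP; skip local disks and other arrays
            satp = getattr(lun, "storageArrayTypePolicy", None)
            if satp and satp.policy == "VMW_SATP_SVC":
                policy = vim.host.MultipathInfo.LogicalUnitPolicy(policy="VMW_PSP_RR")
                storage.SetMultipathLunPolicy(lunId=lun.id, policy=policy)
finally:
    Disconnect(si)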

Justification

1. Round Robin helps ensure minimum average latency to the storage by using all available paths
2. Ensure performance is not degraded for some/all virtual machines due to a single HBA or connection being heavily utilized
3. Using “VMW_PSP_FIXED” requires the paths to be manually load balanced to avoid thrashing a single path
4. Using “VMW_PSP_MRU” or “VMW_PSP_FIXED” may lead to inconsistent performance across the LUNs due to some paths being more heavily used than others
5. There is no MPP currently supplied by IBM for SVC storage
6. Round Robin is a supported configuration (note: although not specifically listed in the Compatibility Matrix)

Alternatives

1. Use “VMW_PSP_FIXED” (Default) – Fixed Pathing
2. Use “VMW_PSP_MRU”  – Most Recently Used
3. Use vendor supplied Multipathing Plugin

Implications

1. None

Example Architectural Decision – Datastore (LUN) and Virtual Disk Provisioning

Problem Statement

In a vSphere environment, what is the most suitable disk provisioning type to use for the LUN and the virtual machines to ensure minimum storage overhead and optimal performance?

Requirements

1. Ensure optimal storage capacity utilization
2. Ensure storage performance is both consistent & maximized

Assumptions

1. vSphere 4.1 or later
2. VAAI is supported and enabled
3. Array level data replication is being used throughout the environment
4. Monitoring of the environment (including vSphere and Storage) is a manual process
5. The time frame to order new hardware (e.g. new disk shelves) is a minimum of 3 months

Constraints

1. Block based storage

Motivation

1. Increase flexibility
2. Ensure physical disk space is not unnecessarily wasted

Architectural Decision

“Thick Provision” the LUN at the Storage layer and “Thin Provision” the virtual machines at the VMware layer
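
To illustrate the VMware layer side of this decision, the pyVmomi (Python) sketch below adds a new virtual disk to an existing virtual machine as “Thin Provisioned”; the vCenter address, credentials, VM name and disk size are placeholders for the example.

# Sketch only: add a Thin Provisioned VMDK to an existing VM via pyVmomi.
# The vCenter address, credentials, VM name and 40GB size are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "example-vm")

    # Find the VM's SCSI controller and a free unit number (7 is reserved for the controller)
    controller = next(d for d in vm.config.hardware.device
                      if isinstance(d, vim.vm.device.VirtualSCSIController))
    used = {d.unitNumber for d in vm.config.hardware.device
            if getattr(d, "controllerKey", None) == controller.key}
    unit = next(u for u in range(16) if u != 7 and u not in used)

    disk = vim.vm.device.VirtualDisk()
    disk.capacityInKB = 40 * 1024 * 1024          # 40GB, example size only
    disk.controllerKey = controller.key
    disk.unitNumber = unit
    disk.backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
    disk.backing.diskMode = "persistent"
    disk.backing.thinProvisioned = True           # Thin at the VMware layer per this decision

    change = vim.vm.device.VirtualDeviceSpec()
    change.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    change.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
    change.device = disk
    vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
finally:
    Disconnect(si)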

Justification

1. Simplified capacity management as only one layer (vSphere layer) needs to be monitored for capacity
2. The Free space shown by vSphere is actual usable storage
3. Reduces the chance of an “Out of Space” condition
4. Increases flexibility as all unused capacity of all datastores remains available
5. Creating VMs with “Thick Provisioned – Eager Zeroed” disks would increase the provisioning time
6. Creating VMs as “Thick Provisioned” (Eager or Lazy Zeroed) does not provide any significant benefit but adds a serious capacity penalty
7. Using Thin Provisioned virtual machines minimizes storage replication traffic on creation of virtual machines
8. Using Thick Provisioned LUNs reduces the requirement for fast turn around times for purchasing additional capacity
9. Monitoring is essential to successfully and safely use “Thin on Thin”; as monitoring in this environment is a manual process (see Assumption 4), thin provisioning at both layers is avoided

Alternatives

1. Thin Provision the LUN and thick provision virtual machine disks (VMDKs)
2. Thick provision the LUN and thick provision virtual machine disks (VMDKs)
3. Thin provision the LUN and thin provision virtual machine disks (VMDKs)

Implications

1. No storage over commitment can occur on the physical array
2. The storage “consumed” will be reported differently to the vSphere Administrator and the Storage Administrator. The vSphere Administrator will see the true utilization, whereas the Storage Administrator will see the “Consumed” & “Provisioned” values as the same
3. It is possible for a datastore to become overcommitted; if not monitored, the datastore may run out of free space, which would result in an outage (a simple capacity monitoring sketch follows below).
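
Given implication 3, a basic capacity report is worth automating even where monitoring is otherwise a manual process. The pyVmomi (Python) sketch below prints capacity, free space and provisioned space per datastore; the vCenter details and the 80% alert threshold are placeholders for the example.

# Sketch only: report capacity, free space and provisioned space per datastore so
# overcommitted ("Thick on Thin") datastores can be watched for out of space conditions.
# The vCenter address, credentials and the 80% threshold are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        s = ds.summary
        # uncommitted = space thin provisioned VMDKs could still consume if fully written
        provisioned = s.capacity - s.freeSpace + (s.uncommitted or 0)
        used_pct = 100 * (1 - s.freeSpace / s.capacity)
        flag = "ALERT" if used_pct >= 80 else "ok"
        print(f"{s.name}: {s.freeSpace / 1024**3:.0f} GB free of {s.capacity / 1024**3:.0f} GB, "
              f"{provisioned / 1024**3:.0f} GB provisioned [{flag}]")
finally:
    Disconnect(si)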

Related Articles

1. Datastore (LUN) and Virtual Disk Provisioning (Thin on Thin)

Example Architectural Decision – Guest OS Page File Storage in vSphere

Problem Statement

In a vSphere environment using deduplication and an array snapshot based backup solution, Guest OS page files are currently stored on the OS drive (VMDK), which reduces the effectiveness of deduplication and places an overhead on the storage controllers, which must scan data that cannot be deduplicated.

As the Guest OS paging files are included in the snapshot process (along with the guest OS), this also demands additional capacity for both primary and secondary disk storage for disk to disk backups.

How can this overhead be minimized or eliminated?

Requirements

1. Make the most efficient use of the available storage capacity
2. Maintain a consistent level of virtual machine / storage performance
3. Minimize the storage required for primary and secondary snapshot based backups
4. Maintain the array level snapshot based backup solution as it is required to meet RPO/RTOs
5. Maintain the use of deduplication, which has proven to decrease storage requirements and improve performance

Assumptions

1. vSphere 5.0 or later
2. VMFS 5 Datastores which are Thin Provisioned
3. Deduplication is in use for Volumes where Guest OS virtual disks are stored
4. VAAI is supported by the array and enabled across the vSphere environment
5. All datastores are presented to all hosts within the cluster
6. Snapshot based backup solution is being used
7. Virtual Machines are right sized
8. Disk to disk backup data is replicated offsite

Constraints

1. None

Motivation

1. Optimize the storage performance
2. Ensure Tier 1 storage is not wasted with transient files
3. Minimize storage required for snapshot based backups

Architectural Decision

Separate the Guest OS page files onto a dedicated VMDK, which will be located on a datastore (or datastore cluster) that is:
1. Not Protected by the array level snapshot backup solution
2. Not running deduplication
3. Not running data compression
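
For illustration, the pyVmomi (Python) sketch below attaches a dedicated page file VMDK whose backing file is placed on the dedicated datastore described above; the vCenter details, the VM name "example-vm", the datastore name "PageFile-DS01", the SCSI unit number and the 4GB size are all placeholders, and the VM's folder is assumed to already exist on that datastore.

# Sketch only: attach a dedicated page file VMDK on a datastore that is not snapshot
# protected, deduplicated or compressed. Names, unit number and size are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "example-vm")
    controller = next(d for d in vm.config.hardware.device
                      if isinstance(d, vim.vm.device.VirtualSCSIController))

    disk = vim.vm.device.VirtualDisk()
    disk.capacityInKB = 4 * 1024 * 1024           # sized to match the guest page file (4GB example)
    disk.controllerKey = controller.key
    disk.unitNumber = 1                           # assumes unit 1 is free on the controller
    disk.backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
    disk.backing.diskMode = "persistent"
    disk.backing.thinProvisioned = True
    # The key point: the VMDK lives on the dedicated, unprotected page file datastore
    disk.backing.fileName = "[PageFile-DS01] example-vm/example-vm_pagefile.vmdk"

    change = vim.vm.device.VirtualDeviceSpec()
    change.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    change.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
    change.device = disk
    vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
finally:
    Disconnect(si)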

Justification

1. Allows page files to be stored on different underlying storage including (optionally) high capacity, lower cost, SATA disk
2. Relocating Guest OS page files to another datastore (or datastore cluster) not protected by snapshots dramatically reduces the amount of data being protected by the snapshot based backup solution
3. Reduces the amount of data being replicated to secondary disk backup location/s thus minimizing the bandwidth requirements between datacenters
4. (Optionally) Ensures Tier 1 storage is only used for high performance guests
5. As the virtual machines are right sized, the performance impact and frequency of paging should be minimal
6. Reduces the CPU cycles required for deduplication as data which cannot be deduplicated will not be scanned
7. Reduces the CPU cycles on the storage controllers by not attempting to compress page file data

Alternatives

1. Leave page files within the virtual machine's primary VMDK and accept the overhead on the backup solution
2. Turn off paging within the Guest OS (no page file)

Implications

1. Additional steps are required to create a dedicated VMDK for each VM and to configure the Guest OS to use the alternate page file location (a guest-side sketch follows below)
2. Templates need to be updated to the above configuration
3. For environments using Site Recovery Manager, some manual steps are required for protected virtual machines when setting them up for the first time. This increases the work required during setup; however, as this is a one time overhead, it is believed the benefit of reduced backup storage and replication traffic (for SRM) outweighs the one time overhead
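
Where the guest OS is Windows, pointing the page file at the new VMDK can also be scripted to reduce the manual steps noted above. The sketch below (standard library Python, run inside the guest with administrative rights) writes the PagingFiles registry value; the drive letter P: and the 4096MB sizing are placeholders, and a guest reboot is required before the change takes effect.

# Sketch only: relocate a Windows guest page file to the dedicated VMDK, assumed to be
# formatted and mounted as P:. Run inside the guest as Administrator; reboot afterwards.
import winreg

KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
# Format per entry: "<path> <initial size MB> <maximum size MB>"
new_value = ["P:\\pagefile.sys 4096 4096"]        # drive letter and sizes are examples only

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY, 0, winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "PagingFiles", 0, winreg.REG_MULTI_SZ, new_value)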
