Example VMware vNetworking Design for IP Storage

I am regularly asked how to configure vNetworking to support environments using IP Storage (NFS / iSCSI).

The short answer is, as always, that it depends on your requirements, but below is an example of a solution I designed in the past.

Requirements

1. Provide high performance and redundant access to the IP Storage (in this case it was NFS)
2. Ensure ESXi hosts could be evacuated in a timely manner for maintenance
3. Prevent significant impact to storage performance by vMotion / Fault Tolerance and Virtual Machine traffic
4. Ensure high availability for ESXi Management / VMKernel and Virtual Machine network traffic

Constraints

1. Four (4) x 10Gb NICs
2. Six (6) x 1Gb NICs (Two onboard NICs and a quad port NIC)

Note: In my opinion the above NICs are hardly “constraining”, but they are still important to mention.

Solution

Use a standard vSwitch (vSwitch0) for the ESXi Management VMKernel. Configure vmNIC0 (Onboard NIC 1) and vmNIC2 (Quad Port NIC – port 1) as its uplinks.

ESXi Management will be Active on vmNIC0 and vmNIC2 although it will only use one path at any given time.

Use a Distributed Virtual Switch (dvSwitch-admin) for IP Storage, vMotion and Fault Tolerance.

Configure vmNIC6 (10Gb Virtual Fabric Adapter NIC 1 Port 1) and vmNIC9 (10Gb Virtual Fabric Adapter NIC 2 Port 2) as its uplinks.

Configure Network I/O Control with NFS traffic having a share value of 100, and vMotion and FT each having a share value of 25.

Each VMKernel for NFS will be active on one NIC and standby on the other.

vMotion will be Active on vmNIC6 and Standby on vmNIC9 and Fault Tolerance vice versa.
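The active/standby layout above can be sketched as a small data model to see which traffic lands on which uplink, both normally and after a NIC failure. This is only an illustration: the port group names are hypothetical, and the assumption that the two NFS VMKernels are active on opposite NICs is mine, inferred from "active on one NIC and standby on the other".

```python
# Illustrative model of the dvSwitch-admin teaming policy.
# Port group names are hypothetical; the NFS ordering is an assumption.
TEAMING = {
    "NFS-vmk1": {"active": ["vmnic6"], "standby": ["vmnic9"]},
    "NFS-vmk2": {"active": ["vmnic9"], "standby": ["vmnic6"]},
    "vMotion": {"active": ["vmnic6"], "standby": ["vmnic9"]},
    "FaultTolerance": {"active": ["vmnic9"], "standby": ["vmnic6"]},
}


def uplink_in_use(portgroup, failed=()):
    """Return the first healthy uplink (active before standby), or None."""
    policy = TEAMING[portgroup]
    for nic in policy["active"] + policy["standby"]:
        if nic not in failed:
            return nic
    return None


def traffic_per_uplink(failed=()):
    """Group port groups by the uplink they would currently use."""
    result = {}
    for portgroup in TEAMING:
        result.setdefault(uplink_in_use(portgroup, failed), []).append(portgroup)
    return result


print(traffic_per_uplink())                   # normal operation
print(traffic_per_uplink(failed={"vmnic6"}))  # everything fails over to vmnic9
```

In normal operation vMotion and Fault Tolerance sit on different uplinks; only after a NIC failure do all traffic types share one 10Gb link, which is when the Network I/O Control shares matter most.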

vNetworking Example dvSwitch-Admin

Use a Distributed Virtual Switch (dvSwitch-data) for Virtual Machine traffic

Configure vmNIC7 (10Gb Virtual Fabric Adapter NIC 1 Port 2) and vmNIC8 (10Gb Virtual Fabric Adapter NIC 2 Port 1) as its uplinks.

Conclusion

While there are many ways to configure vNetworking, and there may be more efficient ways to achieve the requirements set out in this example, I believe the above configuration achieves all the customer requirements.

For example, it provides high performance and redundant access to the IP Storage by using two (2) VMKernels, each active on one 10Gb NIC.

IP Storage will not be significantly impacted during periods of contention, as Network I/O Control will ensure that IP Storage traffic receives ~66% of the available bandwidth (100 of the 150 configured shares).

ESXi hosts can be evacuated in a timely manner for maintenance because:

1. vMotion is active on a 10Gb NIC, thus supporting the maximum of 8 concurrent vMotions
2. In the event of contention, the worst-case scenario is that vMotion will receive just short of 2Gb/sec of bandwidth (~1,667Mb/sec, i.e. 25 of 150 shares on a 10Gb NIC)
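The share arithmetic behind these figures can be sketched as follows. It assumes the worst case in which NFS, vMotion and FT all compete on a single 10Gb uplink (for example after a NIC failure), and that bandwidth is divided strictly in proportion to shares; the function name is hypothetical.

```python
def nioc_bandwidth(shares, link_mbps=10_000):
    """Split link bandwidth in proportion to the shares of the traffic
    types that are actively competing on the same uplink."""
    total = sum(shares.values())
    return {name: link_mbps * value / total for name, value in shares.items()}


shares = {"NFS": 100, "vMotion": 25, "FT": 25}
for name, mbps in nioc_bandwidth(shares).items():
    percent = 100 * mbps / 10_000
    print(f"{name}: {mbps:.0f} Mb/s ({percent:.1f}%)")
```

Under these assumptions NFS receives roughly 6,667 Mb/s (about two thirds of the link) and vMotion roughly 1,667 Mb/s. Shares only apply during contention, so any traffic type can use more when the link is otherwise idle.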

High availability is ensured as each vSwitch and dvSwitch has two (2) uplinks from physically different NICs, each connected to a physically separate switch.
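As a quick sanity check of the redundancy claim, the uplink assignments from this design can be written down and tested for single points of failure at the adapter level. The adapter labels below are hypothetical shorthand for the hardware described earlier, and the physical switch cabling is not modelled because the post does not list it.

```python
# Uplinks per virtual switch, as described in the Solution section.
UPLINKS = {
    "vSwitch0": ["vmnic0", "vmnic2"],
    "dvSwitch-admin": ["vmnic6", "vmnic9"],
    "dvSwitch-data": ["vmnic7", "vmnic8"],
}

# Physical adapter behind each vmnic (labels are hypothetical shorthand).
ADAPTER = {
    "vmnic0": "onboard",
    "vmnic2": "quad-port",
    "vmnic6": "vfa-1",
    "vmnic7": "vfa-1",
    "vmnic8": "vfa-2",
    "vmnic9": "vfa-2",
}


def single_points_of_failure(uplinks, adapter):
    """Return switches whose uplinks do not span two physical adapters."""
    return [
        switch
        for switch, nics in uplinks.items()
        if len({adapter[nic] for nic in nics}) < 2
    ]


print(single_points_of_failure(UPLINKS, ADAPTER))  # empty list means no SPOF
```

Pairing vmNIC6 with vmNIC9 (rather than vmNIC7) is what keeps each dvSwitch spread across both Virtual Fabric Adapters.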

Hopefully you have found this example helpful. For an example Architectural Decision, see Example Architectural Decision – Network I/O Control for ESXi Host using IP Storage

Example Architectural Decision – Time Synchronization for Virtual Machines

Problem Statement

What is the best way to keep time synchronized within virtual machine guest operating systems?

Assumptions

1. ESXi hosts are using an accurate and reliable NTP server
2. A level of CPU overcommitment exists in the vSphere cluster

Motivation

1. Prevent the unlikely but possible event of CPU overcommitment introducing time drift into guest operating systems

Architectural Decision

Do not use VMware Tools as the time synchronization source for virtual machines. Instead, configure guest operating systems to use an NTP server.

Justification

1. Excessive overcommitment can cause timekeeping drift at rates that are uncorrectable by time synchronization utilities
2. This ensures time within virtual machines is not impacted by time drift in the event of CPU overcommitment
3. Ensures time will be consistent and provided by a central source for all virtual machines
4. NTP is an industry-standard method of maintaining accurate time
5. Simplifies the process of maintaining time
6. Avoids the potential issue where time runs too fast in a Windows virtual machine when the Multimedia Timer interface is used. See VMware KB 1005953

Implications

1. Any/all templates need to be configured to use an NTP server within the guest operating system
2. All existing servers will need to be updated to use an NTP server within the guest operating system if they currently rely on the hypervisor (VMware Tools) for time
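When auditing templates and existing servers, one rough way to spot VMs that still rely on VMware Tools periodic time sync is to inspect the `tools.syncTime` setting in the VM's .vmx configuration. The sketch below parses .vmx-style text; the helper names are hypothetical, and treating a missing key as "disabled" is an assumption that should be verified against VMware's timekeeping documentation for your version.

```python
def parse_vmx(text):
    """Parse 'key = "value"' lines from .vmx-style text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def tools_time_sync_enabled(vmx_text):
    """Return True if periodic VMware Tools time sync appears enabled.

    Assumption: a missing tools.syncTime key is treated as disabled.
    """
    value = parse_vmx(vmx_text).get("tools.synctime", "false")
    return value.lower() in ("true", "1")


sample = 'displayName = "template-01"\ntools.syncTime = "TRUE"\n'
print(tools_time_sync_enabled(sample))
```

This only covers the periodic synchronization setting; the guest still needs its own NTP configuration as described in the decision above.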

Alternatives

1. Use VMware Tools for time synchronization

Example Architectural Decision – Network Failover Detection Policy

Problem Statement

What is the most suitable network failover detection policy to be used on the vSwitch or dvSwitch NIC team/s in an environment which uses IP storage and has only 2 physical NICs per vSwitch or dvSwitch?

Assumptions

1. vSphere 5.0 or greater
2. Storage presented to the ESXi hosts is NFS via Multi Switch Link Aggregation
3. A maximum of 2 physical NICs exist per dvSwitch
4. Physical Switches support “Link state tracking”

Motivation

1. Ensure a reliable network failover detection solution
2. Ensure Multi switch link aggregation can be used for IP storage

Architectural Decision

Enable “Link state tracking” on the physical switches and use “Link Status” as the network failover detection policy.

Justification

1. To work properly, Beacon Probing requires at least 3 NICs for “triangulation”; otherwise the failed link cannot be determined.
2. “Link state tracking” can be enabled on the physical switch to report upstream network failures where an “edge” & “core” network topology is used, therefore preventing the link status from being OK when traffic cannot reach the destination due to an upstream failure
3. Beacon Probing and the “route based on IP hash” network load balancing option are not compatible, so using Beacon Probing would prevent a single VMKernel from using multiple interfaces for IP storage traffic
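The “triangulation” point in the first justification can be illustrated with a deliberately simplified model. This is not VMware's implementation: it just assumes every NIC in the team sends beacons to every other NIC, and that a NIC whose path has failed neither sends nor receives them. All function names are hypothetical.

```python
from itertools import permutations


def beacon_results(nics, failed):
    """Simulate which (sender, receiver) beacon pairs succeed."""
    return {
        (sender, receiver)
        for sender, receiver in permutations(nics, 2)
        if sender not in failed and receiver not in failed
    }


def diagnose(nics, failed):
    """Decide what the team can conclude from the beacons it saw."""
    received = beacon_results(nics, failed)
    # A NIC is a suspect if it took part in no successful beacon exchange.
    suspects = [n for n in nics if not any(n in pair for pair in received)]
    if not suspects:
        return "all links healthy"
    if len(suspects) == len(nics):
        return "ambiguous: cannot tell which link failed"
    return "failed: " + ", ".join(suspects)


print(diagnose(["vmnic6", "vmnic9"], failed={"vmnic6"}))
print(diagnose(["vmnic6", "vmnic7", "vmnic9"], failed={"vmnic6"}))
```

With two NICs, a single failure silences both directions, so both NICs look equally isolated. With three, the two healthy NICs still hear each other and the odd one out can be identified.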

Implications

1. Link state tracking needs to be supported and enabled on the physical switches
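The interaction between link state tracking and the “Link Status” policy can be sketched with a toy model of an edge switch. It assumes the host-facing cable itself is healthy and that the switch brings its host-facing port down only when every tracked upstream link is down; the function names are hypothetical and real switch behaviour depends on vendor configuration.

```python
def host_port_state(upstream_links_up, tracking_enabled):
    """Return the link state the ESXi NIC sees on its edge switch port."""
    if tracking_enabled and not any(upstream_links_up):
        return "down"  # switch disables the host-facing port
    return "up"


def failover_triggered(upstream_links_up, tracking_enabled):
    """With Link Status detection, failover occurs only on link down."""
    return host_port_state(upstream_links_up, tracking_enabled) == "down"


# Both uplinks from the edge switch to the core have failed.
print(failover_triggered([False, False], tracking_enabled=False))  # False
print(failover_triggered([False, False], tracking_enabled=True))   # True
```

Without tracking, the host keeps sending traffic into a switch that cannot forward it; with tracking, the upstream failure is surfaced as a link-down event that the NIC team can act on.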

Alternatives

1. Use “Beacon Probing”