Example Architectural Decision – Time Synchronization for Virtual Machines

Problem Statement

What is the best way to keep time synchronized within virtual machine guest operating systems?

Assumptions

1. ESXi hosts are using an accurate and reliable NTP server
2. A level of CPU overcommitment exists in the vSphere cluster

Motivation

1. Prevent the unlikely but possible event of CPU overcommitment introducing time drift into guest operating systems

Architectural Decision

Do not use VMware Tools as the time synchronization source for virtual machines. Instead, configure each guest operating system to use an NTP server.

Justification

1. Excessive overcommitment can cause timekeeping drift at rates that are uncorrectable by time synchronization utilities
2. This ensures time within virtual machines is not impacted by time drift in the event of CPU overcommitment
3. Ensures time is consistent and provided by a central source for all virtual machines
4. NTP is an industry standard method of maintaining accurate time
5. Simplifies the process of maintaining time
6. Avoids the potential issue where time runs too fast in a Windows virtual machine when the Multimedia Timer interface is used. See VMware KB 1005953
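
To put justification 1 in numbers, the following is a minimal sketch of the drift arithmetic. The 500 ppm limit is an assumption based on the slew limit commonly documented for ntpd; other clients (chrony, the Windows Time service) behave differently, and the function names are illustrative only.

```python
# Assumption: ntpd's commonly documented maximum slew rate of 500 ppm.
# Other time synchronization clients use different limits.
NTP_MAX_SLEW_PPM = 500.0


def drift_seconds(drift_ppm: float, elapsed_seconds: float) -> float:
    """Clock error accumulated over elapsed_seconds at a constant drift rate."""
    return drift_ppm * 1e-6 * elapsed_seconds


def correctable_by_slew(drift_ppm: float,
                        max_slew_ppm: float = NTP_MAX_SLEW_PPM) -> bool:
    """True if gradual slewing alone can outpace the given drift rate."""
    return abs(drift_ppm) < max_slew_ppm


if __name__ == "__main__":
    seconds_per_day = 86_400
    for ppm in (50.0, 400.0, 2000.0):
        per_day = drift_seconds(ppm, seconds_per_day)
        verdict = "correctable" if correctable_by_slew(ppm) else "not correctable"
        print(f"{ppm:7.1f} ppm -> {per_day:7.2f} s/day, {verdict} by slewing")
```

Under this assumption, a guest drifting at 500 ppm gains or loses roughly 43 seconds per day, and anything faster can only be fixed by stepping the clock rather than slewing it.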

Implications

1. All templates need to be configured to use an NTP server within the guest operating system
2. All existing servers will need to be updated to use an NTP server within the guest operating system if they currently rely on the hypervisor (VMware Tools) for time
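
One way to find existing servers that still rely on VMware Tools for time is to audit each virtual machine's .vmx configuration. The sketch below is hedged: it assumes the periodic sync option appears as `tools.syncTime` in the .vmx text, that "TRUE" or "1" means enabled, and that a missing key should be reported as unknown rather than guessed. The function names and the sample text are hypothetical.

```python
from typing import Dict, Optional


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse simple key = "value" lines from .vmx text into a dict."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def tools_periodic_sync_enabled(text: str) -> Optional[bool]:
    """Return True/False for tools.syncTime, or None if the key is absent."""
    value = parse_vmx(text).get("tools.synctime")
    if value is None:
        return None  # Assumption: do not guess a default for a missing key.
    return value.strip().lower() in {"true", "1"}


if __name__ == "__main__":
    # Hypothetical .vmx fragment for illustration only.
    sample = 'displayName = "app01"\ntools.syncTime = "TRUE"\n'
    print(tools_periodic_sync_enabled(sample))
```

A virtual machine for which this returns True would need its guest operating system pointed at an NTP server and VMware Tools periodic synchronization turned off.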

Alternatives

1. Use VMware Tools for time synchronization

One thought on “Example Architectural Decision – Time Synchronization for Virtual Machines”

  1. Could you elaborate a bit more on justification #1? I understand that excessive overcommitment “can cause timekeeping drift at rates that are uncorrectable”, but how is NTP better than VMware Tools in dealing with such situations?

    I am reading the http://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf whitepaper; here is an extract:
    “One specific problem occurs if native synchronization software happens to set the guest operating system clock forward to the correct time while the virtual machine has an interrupt backlog that it is in the process of catching up. Setting the guest operating system clock ahead is a purely software event that the virtual machine cannot detect, so it also does not detect that it should stop the catch-up process. As a result, the guest operating system clock continues to run fast until catch-up is complete, and it ends up ahead of the correct time. Fortunately, such events are infrequent, and the native synchronization software generally detects and corrects the error the next time it runs.
    Another specific problem is that native synchronization software might employ control algorithms that are tuned for the typical rate variation of physical hardware timer devices. Virtual timer devices have a more widely variable rate, which can make it difficult for the synchronization software to lock onto the proper correction factor to make the guest operating system clock run at precisely the rate of real time. As a result, the guest operating system clock tends to oscillate around the correct time to some degree. The native software might even determine that the timer device is broken and give up on correcting the clock.”

    Based on that, VMware Tools time synchronization looks a bit better than native…

    Recommendation from the same paper: “Generally, it is best to use only one clock synchronization service at a time in a given virtual machine to ensure that multiple services do not attempt to make conflicting changes to the clock. So if you are using native synchronization software, we suggest turning VMware Tools periodic clock synchronization off.”