Holdover Performance
Given an accurate reference clock as an input signal, one can achieve a well-synchronized clock. This is the fundamental principle of timing synchronization whereby a master clock drives the slave clock, always assuming that the master clock itself is stable and accurate.
Occasionally, the master clock may become unavailable. During such periods, the local clock is no longer disciplined by a master (or reference) clock, and most synchronization-based applications need to know how much the clock can drift during such events.
The period during which a clock has no reference clock to synchronize to is called clock holdover. In this state, the clock behaves like a flywheel that keeps spinning at a constant speed even when it is no longer being driven, so holdover is sometimes also referred to as flywheel mode. The measure of how quickly a slave clock drifts away from the reference or master clock is called its holdover performance. The clock’s ability to maintain the same frequency over an interval of time without a reference frequency is called frequency holdover. Similarly, time or phase holdover is the ability to maintain phase accuracy over an interval of time without an external phase reference.
This synchronization across network elements is achieved either via a physical signal (such as SyncE) or via Precision Time Protocol (PTP) packets carrying timing synchronization information. The two approaches can even be combined in a hybrid clocking architecture, whereby frequency is carried with SyncE and phase synchronization with PTP packets. Chapter 7, “Precision Time Protocol,” delves into this a little more.
A physical link failure (or fault) breaks the frequency distribution via SyncE or interrupts the PTP packet path between slave and master. During such events, the slave clock starts slowly “drifting” away from the reference or master clock. Besides many external factors, this frequency and phase drift also depends on the components used for the local clock and the characteristics of the PLL circuitry.
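A common way to reason about this drift is a second-order model of the time (phase) error accumulated after the reference is lost: an initial phase offset x0, a residual fractional frequency offset y0 at the moment holdover starts, and a linear frequency drift (aging) rate D, giving x(t) = x0 + y0·t + ½·D·t². The sketch below illustrates only that simplified model; it ignores temperature effects and oscillator noise, and the function name and parameter values are hypothetical rather than taken from any datasheet or ITU-T recommendation.

```python
def holdover_time_error(t_s: float, x0_s: float, y0: float, drift_per_s: float) -> float:
    """Time error (seconds) t_s seconds after entering holdover.

    Assumed second-order model: x(t) = x0 + y0*t + 0.5*D*t^2
      x0_s        -- phase offset at the start of holdover (s)
      y0          -- fractional frequency offset at the start of holdover
      drift_per_s -- linear frequency drift rate D (fractional frequency per second)
    """
    return x0_s + y0 * t_s + 0.5 * drift_per_s * t_s ** 2


if __name__ == "__main__":
    # Hypothetical oscillator: 1e-11 residual offset, 1e-16/s aging, perfect initial phase.
    for hours in (1, 4, 24, 72):
        t = hours * 3600.0
        err_us = holdover_time_error(t, 0.0, 1e-11, 1e-16) * 1e6
        print(f"{hours:>3} h: {err_us:.3f} us")
```

Because the drift term grows with the square of time, a clock that looks comfortable after a few hours can still exceed its budget after a day or more.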
Obviously, it is desirable for the drift from the reference clock during holdover to be small, but “small” is not a helpful measure, so holdover performance is defined in more quantitative terms. It is measured by removing one or both of the clock sources (frequency and phase) and measuring the output of the slave clock against a reference signal from the master clock.
The ITU-T recommendations specify MTIE and TDEV masks that define the acceptable holdover performance for each clock type (these masks are less strict than the masks used when the clocks are locked to an external reference clock).
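MTIE (maximum time interval error) for an observation interval τ is the largest peak-to-peak time error found in any window of length τ. The following brute-force sketch computes it from uniformly sampled time-error data; the function name and sampling assumptions are illustrative, and a real conformance measurement would follow the sampling, filtering, and interval rules of the relevant ITU-T recommendations rather than this simplified version. TDEV is omitted for brevity.

```python
from typing import Sequence


def mtie(time_error: Sequence[float], window: int) -> float:
    """Maximum time interval error for an observation interval of `window` samples.

    time_error -- time-error samples x(i) taken at a fixed sampling interval
    window     -- number of sampling intervals in the observation interval tau
    Returns the maximum, over all windows, of (max(x) - min(x)) within the window.
    """
    n = len(time_error)
    if window < 1 or window >= n:
        raise ValueError("window must be between 1 and len(time_error) - 1")
    worst = 0.0
    # Each window spans window + 1 samples, i.e. `window` sampling intervals.
    for start in range(n - window):
        segment = time_error[start:start + window + 1]
        worst = max(worst, max(segment) - min(segment))
    return worst
```

To compare against a mask, one would evaluate `mtie` for a range of observation intervals and check each result against the mask limit for that interval.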
Two major factors determine holdover performance: primarily, the internal components of the clock or PLL, and secondarily, the external factors that can affect the clock output signals. Among the components, the oscillator (chiefly its stability) plays the most important role in holdover performance. For example, a clock based on a rubidium (stratum 2) oscillator or an OCXO (stratum 3E) oscillator provides better holdover characteristics than one based on a stratum 4 oscillator.
Similarly, less variation in environmental conditions (especially temperature) during the holdover period results in better holdover characteristics. For example, rubidium oscillators are very stable, but their stability degrades when they are exposed to temperature changes. Figure 5-22 provides a very rough comparison of the holdover performance of clocks with different classes of oscillator. Note that Figure 5-22 is only a graphical representation of the magnitude of deviation over time; a given oscillator may deviate in either the positive or the negative direction.
Figure 5-22 Relative Holdover Performance for Different Classes of Oscillator
Hardware designers are constantly innovating to improve holdover performance. One such innovation is to observe and record the variations in the local oscillator while the reference signal is available; this record is sometimes referred to as history. When the reference signal is lost, this historical information is used to better preserve the accuracy of the output signal. With such techniques, the quality of this holdover data plays an important role in holdover performance.
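One simple way to use such history, shown here purely as an illustration (vendors' actual algorithms are their own and generally more sophisticated), is to fit a straight line to the fractional frequency offsets recorded while locked and then, once in holdover, steer the output by the negative of the extrapolated offset. The least-squares approach and the function names are assumptions made for this sketch.

```python
from typing import Sequence, Tuple


def fit_frequency_history(times_s: Sequence[float],
                          freq_offsets: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through recorded (time, fractional frequency offset) history.

    Returns (offset_at_t0, drift_per_s), where t0 is time zero of `times_s`.
    """
    n = len(times_s)
    if n < 2 or n != len(freq_offsets):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_s) / n
    mean_y = sum(freq_offsets) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("history timestamps must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, freq_offsets))
    drift = sxy / sxx
    return mean_y - drift * mean_t, drift


def predicted_correction(t_s: float, offset_at_t0: float, drift_per_s: float) -> float:
    """Frequency correction to apply at time t_s (same time base as the history):
    the negative of the oscillator offset extrapolated from the fitted line."""
    return -(offset_at_t0 + drift_per_s * t_s)
```

A linear fit captures steady aging but not temperature-driven changes after the reference is lost, which is one reason the quality and length of the recorded history matter.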
There are cases where multiple synchronization paths are available and not all of them are lost simultaneously. In the hybrid case, where frequency synchronization is achieved via the physical network (SyncE) and phase synchronization via PTP packets, only one of the synchronization transport paths might break (for example, only the PTP packet path or only the frequency synchronization path). In such cases, only one of the synchronization modes could move into holdover.
For example, the PTP path to the master might be lost (say, due to a logical path failure to the PTP master) while frequency synchronization (SyncE) is still available. In this case, the frequency stays locked to the reference and only the phase/time moves into holdover.
Why is holdover performance important, and for how long is good holdover performance required?
As an example, consider a 4G LTE-Advanced cell site, which may have a phase alignment requirement to stay within ±1.5 μs of its master at all times. This cell site is equipped with a GPS receiver and external antenna to synchronize its clock to this level of phase accuracy. Suppose this GPS antenna fails for some reason (say, a lightning strike). The clock in this cell site equipment immediately moves into holdover, and the cell site can continue operating only as long as the phase alignment of its radio stays within ±1.5 μs.
Should this GPS antenna failure happen during the night, it might take some time for operations staff to discover and locate the failure and then dispatch a technician to the correct location. Over a long weekend, it might take quite some time for the technician to reach the location and fix the issue (assuming the technician knows what the problem is and has a spare GPS antenna readily available).
Ideally, the clock would stay within a phase alignment of ±1.5 μs for this whole duration and provide uninterrupted service to that area. If the maximum time to detect and correct this fault is (say) 24 hours, then for uninterrupted service the clock at this cell site requires a holdover period of 24 hours at ±1.5 μs. In fact, the generally accepted use case at the ITU-T is that a clock should provide holdover for up to 72 hours (to cover a long weekend).
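The arithmetic behind such a requirement is straightforward if we assume, for illustration, that the phase starts perfectly aligned and that the only error source is either a constant frequency offset or a constant linear drift (both simplifications; real oscillators also suffer temperature-induced wander and noise). A constant fractional frequency offset y accumulates a time error of y·T over a holdover period T, so staying within ±1.5 μs for 24 hours requires |y| ≤ 1.5 μs / 86,400 s, or about 1.7 × 10⁻¹¹. A pure linear drift D accumulates ½·D·T², so the same budget requires |D| ≤ 2 × 1.5 μs / T². The helper names below are hypothetical.

```python
SECONDS_PER_HOUR = 3600.0


def max_frequency_offset(budget_s: float, holdover_s: float) -> float:
    """Largest constant fractional frequency offset y keeping |y*T| within budget."""
    return budget_s / holdover_s


def max_drift_rate(budget_s: float, holdover_s: float) -> float:
    """Largest linear drift rate D (per second) keeping |0.5*D*T^2| within budget."""
    return 2.0 * budget_s / holdover_s ** 2


if __name__ == "__main__":
    budget = 1.5e-6  # +/-1.5 us phase budget from the LTE-Advanced example
    for hours in (24, 72):
        t = hours * SECONDS_PER_HOUR
        print(f"{hours} h: |y| <= {max_frequency_offset(budget, t):.2e}, "
              f"|D| <= {max_drift_rate(budget, t):.2e} per s")
```

Tripling the holdover period to 72 hours divides the allowable constant offset by three (to about 5.8 × 10⁻¹²) and the allowable drift rate by nine, which is consistent with the need for a very high-quality oscillator to meet such a target.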
Note that this requirement (24 hours at ±1.5 μs) is a very strict holdover specification, and almost none of the currently deployed systems (cell site radios and routers) can achieve it. Achieving it for 72 hours is very difficult and requires a clock with a very high-quality oscillator, such as one based on rubidium.
Adding to the difficulty, the weather conditions at a cell site (up some mountaintop) might be extremely challenging, with only minimal control of the environmental conditions for the equipment. Temperature is a very significant factor in the stability of an oscillator, and that stability feeds directly into holdover performance. For this reason, the ITU-T G.8273.2 specification for PTP boundary and slave clocks defines the required holdover performance in the absence of PTP packets for two cases: constant temperature and variable temperature.