Time may go by so slowly in a human world, but in the world of technology even the smallest fluctuation in the speed of time can cause mayhem. All sorts of systems depend heavily on time and, importantly, on the assumption that they and all the devices they communicate with share the same time. A lot of very serious (not to mention annoying) problems can occur if the time isn't in sync. The Internet stays in sync through the use of NTP, and in this incredibly fun little series I'm going to talk you through my journey of giving something back to the Internet.
What is NTP?
That's definitely a good place to start. NTP stands for Network Time Protocol and is a reasonably simple system which networked devices use to get the time. As with most network services, NTP has a client and a server. As you'd probably expect, the client requests the time from the server and the server tells the client what it is. Now, there are a lot of devices on the Internet, and when I say a lot I mean a LOT. There are routers, switches, cameras, various types of phone, laptops, PCs, tablets, servers, lightbulbs, "personal assistants" such as Alexa or Google Home... The list goes on and on. At my house alone, there are a couple dozen devices actively in need of knowing the time - across the country it adds up to millions.
(pool.ntp.org server count)
In order to handle this load some people have taken it upon themselves to create "pools" of NTP servers. There's one such system, www.pool.ntp.org/, which consists of (at the time of writing) 2946 IPv4 and 1217 IPv6 servers globally - each one run by an individual or company who wishes to support the service. I decided I wanted to be a part of this, and the server that runs this blog uses very little of its CPU, so it seemed like the perfect fit (good job I spent all those hours optimizing things, eh?). As I'm writing this I'm realising just how much there is to cover, so I think I'll split this into a little series. Here, I shall talk through what NTP is and...
How NTP works
Explaining every detail as to how NTP operates would be rather boring and I don't think my hands could cope with typing that much, so, I'm going to go over the main parts. Let's talk about how clients ask for the time, a nice easy thing to answer. Clients send a single packet to the server they've chosen (or been assigned - more on that later). Packets are individual units of data which are sent across packet-switched networks such as the Internet. Packets contain all the information they need to get to where they need to go as well as where they came from, if they're part of a series what number they are, what protocol they're using and of course, the payload (the information they're carrying). Data is broken down into reasonably small packets for efficient routing across a network but in the case of NTP, there's so little information to exchange that it can be done in just 2 packets. One packet is the request from a client to the server and the other is the server's reply.
We can dive right down into the depths of network communication and have a look at a packet (in an easy to interpret form thanks to Wireshark):
That is all the information carried in an NTP packet. Just 48 bytes (90 once you add all the bits necessary to get it across a network). The packet can be broken down to the following:
- 2 bits - Leap Indicator - Warns of an impending leap second.
- 3 bits - NTP Version number (4 is the latest).
- 3 bits - Mode - Mainly indicates if the packet is from an NTP client or server, though there are some others.
- 8 bits - Stratum - Level of the local clock. 8 bits allows for 256 values though only 1-16 are used, more on this later.
- 8 bits - Poll Interval - The maximum interval between messages in seconds, stored as a power of 2 (a value of 6 means 2^6 = 64 seconds).
- 8 bits - Precision - Precision of the local clock in seconds, also stored as a power of 2 (so usually a negative number).
- 32 bits - Root Delay - Total roundtrip delay to the server's primary reference source (can be a positive or negative value).
- 32 bits - Root Dispersion - Estimate of the maximum error/variance between the server and its time source.
- 32 bits - Reference clock identifier - Time source of the device which created the packet, NULL for clients.
- 64 bits - Reference Timestamp - Local time at which the local clock was last set or corrected.
- 64 bits - Originate Timestamp - Local time at which the client sent the request.
- 64 bits - Receive Timestamp - Local time at which the request was received by the server.
- 64 bits - Transmit Timestamp - Local time at which the reply was sent from the server.
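If you'd rather see that layout as code, here's a minimal Python sketch that unpacks a 48-byte NTP header using the standard struct module. It assumes the RFC 5905 layout, where the first byte packs a 2-bit leap indicator, a 3-bit version and a 3-bit mode, and it treats the root delay and dispersion as unsigned 16.16 fixed-point values. The function name parse_ntp_header and the dictionary keys are my own inventions for illustration, not part of any library.

```python
import struct

# First byte (LI/VN/Mode), stratum, poll, precision, root delay,
# root dispersion, reference ID, then the four 64-bit timestamps.
NTP_HEADER = struct.Struct("!BBbbII4sQQQQ")  # 48 bytes in total


def parse_ntp_header(packet: bytes) -> dict:
    """Split the first 48 bytes of an NTP packet into named fields."""
    (flags, stratum, poll, precision, root_delay, root_dispersion,
     ref_id, ref_ts, orig_ts, recv_ts, xmit_ts) = NTP_HEADER.unpack(packet[:48])
    return {
        "leap": flags >> 6,               # top 2 bits
        "version": (flags >> 3) & 0x7,    # middle 3 bits
        "mode": flags & 0x7,              # bottom 3 bits
        "stratum": stratum,
        "poll_seconds": 2.0 ** poll,            # stored as a power of 2
        "precision_seconds": 2.0 ** precision,  # stored as a power of 2
        # Assumption: unsigned 16.16 fixed point, converted to seconds.
        "root_delay": root_delay / 2**16,
        "root_dispersion": root_dispersion / 2**16,
        "reference_id": ref_id,
        # Raw 64-bit values: 32 bits of seconds, then 32 bits of fraction.
        "reference_ts": ref_ts,
        "originate_ts": orig_ts,
        "receive_ts": recv_ts,
        "transmit_ts": xmit_ts,
    }
```

Feed it the payload of a reply captured in Wireshark (or received from a socket) and you get the same fields as the list above, just with names attached.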
Trust me when I say that was a lot more boring for me to type than it was for you to read. It does provide a lot of information, some of which I will be referencing later, but there won't be a test (at least by me... probably). One little thing I want to include, though I probably won't reference it, is the precision of the timestamps and the date they count from. Each 64-bit timestamp consists of 32 bits for whole seconds and another 32 bits for the fraction of a second. 32 bits of seconds covers around 136 years, and NTP uses an epoch of January 1st, 1900, so it'll be interesting to see what happens as February 7th, 2036 approaches.
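The timestamp arithmetic is easy enough to check for yourself. This is a small Python sketch, using only the standard library, that converts a raw 64-bit NTP timestamp into a normal date and works out when the 32-bit seconds counter runs out. The names ntp_to_datetime, NTP_EPOCH and ROLLOVER are mine, chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# NTP counts seconds from the start of 1900 (UTC).
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

# Seconds between the NTP epoch (1900) and the Unix epoch (1970).
NTP_TO_UNIX = 2_208_988_800


def ntp_to_datetime(timestamp: int) -> datetime:
    """Convert a raw 64-bit NTP timestamp to a UTC datetime."""
    seconds = timestamp >> 32                     # upper 32 bits: whole seconds
    fraction = (timestamp & 0xFFFFFFFF) / 2**32   # lower 32 bits: fraction
    return NTP_EPOCH + timedelta(seconds=seconds + fraction)


# The seconds field wraps after 2**32 seconds, roughly 136 years.
ROLLOVER = NTP_EPOCH + timedelta(seconds=2**32)
```

Printing ROLLOVER gives 2036-02-07 06:28:16+00:00, which is where the February 7th, 2036 date comes from. The 32-bit fraction gives a resolution of 1/2^32 of a second, a little under a quarter of a nanosecond.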
Back to the packets. The packet from a client can be mostly empty because clients often don't know the time at all or aren't coded in a way that "fills out" the fields. Other than the version number and mode, everything else can be blank (0 in computer speak). That said, clients improve their accuracy as time goes on and more NTP requests are made, and those later requests will need other fields filling out.
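To show just how empty a request can be, here's a rough Python sketch of the simplest possible client. It builds a 48-byte packet with only the version and mode set (mode 3 means "client") and sends it over UDP to port 123, which is the port NTP uses. The function names build_request and query are hypothetical, and the default host is simply the pool mentioned earlier; treat this as an illustration rather than a proper NTP client.

```python
import socket

NTP_PORT = 123  # NTP runs over UDP on port 123


def build_request(version: int = 4) -> bytes:
    """Build a minimal 48-byte client request: only version and mode are set."""
    leap, mode = 0, 3  # leap indicator 0, mode 3 = client
    first_byte = (leap << 6) | (version << 3) | mode
    return bytes([first_byte]) + bytes(47)  # every other field left as zero


def query(host: str = "pool.ntp.org", timeout: float = 2.0) -> bytes:
    """Send one request and return the server's raw reply (needs network access)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (host, NTP_PORT))
        reply, _address = sock.recvfrom(512)
    return reply
```

Calling query() should hand back the server's 48-byte reply, assuming nothing between you and the server blocks UDP port 123. That's the whole two-packet conversation.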
How do NTP Clients get the Time "Right"?
Probably the hardest thing to get your head around (at least for me) is how the clients can get the time to be so accurate. If a server sent just the time back, then it'd be wrong by the time the client received it due to processing and network delays. NTP overcomes this with some fairly straightforward mathematics:
Offset = ((Server's Receive Timestamp - Client's Transmit Timestamp) + (Server's Transmit Timestamp - Client's Receive Timestamp)) / 2
Round-trip delay = (Client's Receive Timestamp - Client's Transmit Timestamp) - (Server's Transmit Timestamp - Server's Receive Timestamp)
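Here's a quick worked example of those two calculations in Python, with the sum of both differences halved for the offset. The timestamps are made up: I've assumed a client whose clock is 100ms slow, a 20ms network delay in each direction and 5ms of processing on the server. The name offset_and_delay and the t1 to t4 labels (client transmit, server receive, server transmit, client receive) are just my shorthand.

```python
def offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Return (offset, round-trip delay) in seconds.

    t1: client's transmit timestamp   t2: server's receive timestamp
    t3: server's transmit timestamp   t4: client's receive timestamp
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Made-up example: client clock 0.100s slow, 0.020s each way, 0.005s on the server.
offset, delay = offset_and_delay(t1=10.000, t2=10.120, t3=10.125, t4=10.045)
```

With those numbers the offset works out at 0.100 seconds (exactly how far behind the client was) and the round-trip delay at 0.040 seconds, give or take floating-point rounding. The server's 5ms of thinking time drops out of the delay, which is the whole point of the second bracket.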
These two values, the offset and round-trip delay, are then compared with the same values calculated from other NTP requests, and the client progressively narrows any inaccuracies. This generally yields an accuracy of around +/- 50ms, though it can yield far better or worse results depending on the accuracy of the servers used, network delay and the accuracy of the client’s own clock (they tend to drift, some quite considerably). It's worth noting that NTP can only synchronise the time reliably if the clock starts within 68 years of the correct time, so keep that in mind when your device hasn't been on for 68 years and one day.
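One simple way to picture that narrowing (and this is only a sketch of the idea, not what any particular NTP client actually does) is to keep the last few samples and trust the one with the shortest round trip, since it had the least opportunity to pick up network jitter. The class name SampleFilter and the window of 8 samples are assumptions for illustration.

```python
from collections import deque


class SampleFilter:
    """Keep recent (offset, delay) samples and prefer the lowest-delay one."""

    def __init__(self, window: int = 8):
        # Old samples fall off the end once the window is full.
        self.samples = deque(maxlen=window)

    def add(self, offset: float, delay: float) -> None:
        self.samples.append((offset, delay))

    def best_offset(self) -> float:
        """Return the offset from the sample with the smallest round-trip delay."""
        if not self.samples:
            raise ValueError("no samples yet")
        offset, _delay = min(self.samples, key=lambda sample: sample[1])
        return offset
```

Each new request adds a sample, and a slow, jittery exchange simply gets ignored in favour of a quicker one still in the window.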
That's a picture of pool.ntp.org's monitoring system's page for my NTP server. My server is in France and the monitoring system is in Los Angeles, but even then it still thinks my server generally maintains an accuracy of +/- 2-4ms. There are two spots, one at the beginning and one in the middle of that graph, where the accuracy drifts. These were both when I restarted the NTP service, so it had to start the syncing process again, demonstrating how it takes a couple of hours for it to home in on the time.
What is this Stratum thing then?
Just like DNS with its caching servers, NTP uses its stratum system to spread load. Stratum 1 servers get their time directly from a stratum 0 device such as an atomic clock (very expensive) or a GPS receiver (considerably less so, but still reasonably expensive). It would be incredibly expensive and impractical for all NTP clients to get their time directly from stratum 1 servers. For accuracy purposes, scenarios involving equipment such as that used in scientific or trading environments will want to use stratum 1 servers (or may even have their own stratum 0 device). For every other situation, the accuracy of a stratum 2-15 server will suffice - in fact, most clients likely can't maintain a time accurate enough to justify using a stratum 1 server. The load created by clients is of course amplified by the fact that it is common and good practice for a client to synchronise with multiple servers to improve accuracy.
So now we know why there's this stratum system, it's probably a good idea to cover what puts a server at a given stratum level. It's really quite straightforward: stratum 2 servers get their time from stratum 1, stratum 3 from stratum 2 and so forth (though each stratum can also synchronise with its own stratum for sanity checking and to provide a more stable time). The accuracy does decrease as you get down the chain but for the most part remains entirely usable. You can simply calculate the stratum of a device by adding one to the stratum of its time source. Stratum 1 through 15 are available, although it'd be highly abnormal for a device to pass even stratum 4. Stratum 16 is used for devices which are unsynchronised, and 17 through 255 are reserved.
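The "add one" rule is small enough to write down as a Python sketch. The function name stratum_of is hypothetical, and treating anything that would land beyond stratum 15 as unsynchronised is my reading of the rules above rather than a quote from any implementation.

```python
UNSYNCHRONISED = 16  # stratum value meaning "not synchronised"


def stratum_of(source_stratum: int) -> int:
    """Return a device's stratum, given the stratum of its time source."""
    if source_stratum >= 15:
        # Assumption: there is no valid stratum below 15, so the device
        # counts as unsynchronised.
        return UNSYNCHRONISED
    return source_stratum + 1
```

So a server fed by a GPS receiver (stratum 0) is stratum 1, a server syncing to that one is stratum 2, and so on down the chain.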
So, what am I doing?
Took a while to get here though I feel we've covered a lot of what I would like to believe is interesting ground. I have created a public stratum 2 time server and added it to www.pool.ntp.org/. You can synchronise your devices to it if you want, ntp2.owennelson.co.uk is the address. I will be going over details on how to create your own NTP server and the considerations you should make, some of the problems I have faced on my setup, and some interesting little things I have discovered along the way in future posts, so I look forward to seeing you there.
Short link: on-te.ch/ntp