This might be wonky, so if you’re not into that sort of thing, you can save yourself 5 minutes and skip this one.
Enterprise installations of software usually consists of several semi-autonomous clusters (or sites) that are arranged in either some sort of hierarchy. Usually a tree-like structure, but it could be a mesh or graph.
Each cluster may have hundreds or even thousands of sensors (not just cameras) that all can be described by their state. A camera may be online, it may be recording, a PIR might be triggered or not and so on. Basically, sensors have three types of properties: Binary, scalar and complex ones (an LPR reader that returns tag and color of a car).
The bandwidth inside each cluster is usually decent (~GBit), and if it isn’t an upgrade of the network is usually a one time fee. Sure, there might be sensors that are at remote locations where the bandwidth is low, but in most cases bandwidth is not an issue inside the cluster. Unfortunately, the bandwidth from the cluster to the outside world is usually not as ample.
It’s not uncommon that a user at the top of the hierarchy will see a number of clusters, and it seems reasonable that this user also wants to see the state of each of the sensors inside the cluster. This, then requires that the state information of each sensor is relayed from the cluster to the client. The client may have a 100 Mbit downstream connection, but if the upstream path from the cluster to the client is a mere 256 Kbit/s then getting the status data across in real time is a real problem.
In the coding world there’s an obsession with statelessness. Relying on states is hard, so is deleting memory after you’ve allocated it, and debugging binary protocols isn’t cakewalk either, so programmers (like me) try to avoid these things. We have garbage collection (which I hate), we have wildly verbose and wasteful containers such as SOAP (which I hate) and we have polling (which I hate).
So, let’s consider a stateless design with a verbose container (but a lot more terse than soap): The client will periodically request the status.
<sensor id="long_winded_guid_as_text" type="redundant_type_info">
The client request can be optimized, but it’s the server that is having the biggest problem. If we look at the actual info in the packet, we are told that a sensor, with a guid, is ON.
Let’s examine the entropy of this message.
The amount of data needed to specify the sensor depends heavily on the number of sensors in the system. If there’s just 2 sensors in total, then the identifier requires 1 bit of information (0 or 1), if there are 4 sensors, you’ll need 2 bits (00, 01, 10 and 11), and so if there are 65000 sensors, you’ll need 16 bits (2 bytes).
The state is binary, so you’ll need just a single bit to encode the state (0 for off, 1 for on).
However, since the state information can be of 3 different types, you’ll need a status type identifier as well. However, this information only needs to be encoded in the message if the type actually changes. If the status type stays the same, it can be cached on the receiving side. However, in order to remain stateless, we’re going to need 2 bits to encode the type (00 for binary, 01 for scalar, 10 for complex).
so, for a system with 60.000 sensors, a message might look like this
[ sensor id ] [ type ] [ value ]
0000 0000 0000 0001 00 1
19 bits in total
The wasteful message is 142 bytes long, or 1136 bits, or 59 times larger, and it is actually pretty terse compared to some of the stuff you’ll see in the real world. In terms of bandwidth, the compact and “archaic” binary protocol can push 59 times as many messages through the same pipe!
Now, what happens if we remove the statelessness? We could cache the status type on the receiving side, so we’d get down to 17 bits (a decent improvement), but we could also decide to not poll, and instead let the server push data only when things change.
The savings here are entirely dependent on the configuration of the system, and sensors with scalar values probably have to continuously send the value. A technique used when recording GPS info is to only let the device send its position when it has changed more than a certain threshold or when a predefined duration has passed. That means that if you’re stationary, we only have location samples at a 1 sample per minute for example, but as you move, the sample rate increases to 1 per second. The same could be used for temperature, speed and other scalar values.
So, how often does a camera transition from detecting some motion, to not detecting any? Once per second? Once per minute? Once per hour? Probably all 3 over a 24 hour period, but on average, let’s say you have one transition every 5 minutes. If you’re running a 1 Hz polling rate vs. a sensor that reports state changes only, you’re looking at a 300x reduction in bandwidth.
In theory we’re now about 17000 times more efficient using states and binary protocols.
In practice, the TCP overhead makes quite a bit less. We might also add a bit of goo to the binary protocol to make it easier to parse, and that’s the other area where things improve: processing speed.
Let’s loosen up the binary protocol a bit by byte-aligning each element. This increases the payload to 32 bits (~8000x improvement)
[ sensor id ] [ type ] [ value ]
0000 0000 0000 0001 0000 0001 0000 0001
It is now trivial, and extremely fast to update the sensor value on the receiving side. Scalar values might be encoded as 32 bit floats, while complex statuses will take up an arbitrary amount of data depending on the sensor.
The point is that sometimes, going “backwards” is actually the low hanging fruit, and sometimes there’s a lot of it to pluck.
The design of a system should take the environment into consideration as well. If you’re writing a module for a client where you have ample bandwidth, and time to market matters, it would be most appropriate to use wasteful, but fast tools. This means that as part of a development contract, the baseline environment must be described. Failure to do so will eventually lead to problems when the client comes back, angry that the system you wrote in 2 days doesn’t work via a dial-up connection.