Technical Innovation

This might be wonky, so if you’re not into that sort of thing, you can save yourself 5 minutes and skip this one.

Enterprise installations of software usually consist of several semi-autonomous clusters (or sites) arranged in some sort of hierarchy: usually a tree-like structure, but it could be a mesh or a graph.

Each cluster may have hundreds or even thousands of sensors (not just cameras) that can all be described by their state. A camera may be online, it may be recording, a PIR may be triggered or not, and so on. Basically, sensor properties come in three types: binary, scalar, and complex (an LPR reader that returns the tag and color of a car, for example).
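To make the three categories concrete, here's a minimal sketch in C; the names and layout are mine, not from any actual VMS:

#include <stdint.h>

/* The three kinds of sensor state described above.
   Names and layout are illustrative, not from any real product. */
typedef enum {
    STATE_BINARY,   /* on/off: recording, motion, PIR triggered */
    STATE_SCALAR,   /* a single number: temperature, speed */
    STATE_COMPLEX   /* structured data, e.g. an LPR read */
} state_type;

typedef struct {
    uint16_t   sensor_id;
    state_type type;
    union {
        uint8_t on;            /* binary */
        float   value;         /* scalar */
        struct {               /* complex: LPR example */
            char tag[16];
            char color[16];
        } lpr;
    } u;
} sensor_state;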

The bandwidth inside each cluster is usually decent (~GBit), and if it isn't, an upgrade of the network is usually a one-time fee. Sure, there might be sensors at remote locations where the bandwidth is low, but in most cases bandwidth is not an issue inside the cluster. Unfortunately, the bandwidth from the cluster to the outside world is usually not as ample.

It’s not uncommon that a user at the top of the hierarchy will see a number of clusters, and it seems reasonable that this user also wants to see the state of each sensor inside those clusters. This, in turn, requires that the state information of each sensor is relayed from the cluster to the client. The client may have a 100 Mbit downstream connection, but if the upstream path from the cluster to the client is a mere 256 Kbit/s, then getting the status data across in real time is a real problem.

In the coding world there’s an obsession with statelessness. Relying on state is hard, so is freeing memory after you’ve allocated it, and debugging binary protocols isn’t a cakewalk either, so programmers (like me) try to avoid these things. We have garbage collection (which I hate), we have wildly verbose and wasteful containers such as SOAP (which I hate), and we have polling (which I hate).

So, let’s consider a stateless design with a verbose container (but a lot more terse than SOAP): the client will periodically request the status.

client:
  <request>
    <type>status</type>
    <match>all</match>
  </request>

server:
  <response>
    <sensor id="long_winded_guid_as_text" type="redundant_type_info">
      <state>ON</state>
    </sensor>
  </response>

The client request can be optimized, but it’s the server that has the bigger problem. If we look at the actual information in the packet, we are told that a sensor, identified by a GUID, is ON.

Let’s examine the entropy of this message.

The amount of data needed to identify the sensor depends heavily on the number of sensors in the system. If there are just 2 sensors in total, the identifier requires 1 bit of information (0 or 1); if there are 4 sensors, you’ll need 2 bits (00, 01, 10 and 11). In general, N sensors require ceil(log2(N)) bits, so if there are 65,000 sensors, you’ll need 16 bits (2 bytes).

The state is binary, so you’ll need just a single bit to encode the state (0 for off, 1 for on).

Since the state information can be of 3 different types, you’ll need a status type identifier as well. Strictly speaking, this only needs to be encoded in the message when the type actually changes; if it stays the same, it could be cached on the receiving side. But in order to remain stateless, we’re going to spend 2 bits encoding the type in every message (00 for binary, 01 for scalar, 10 for complex).

So, for a system with 60,000 sensors, a message might look like this:

[ sensor id          ] [ type ] [ value ]
  0000 0000 0000 0001    00       1

19 bits in total
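As a sketch of what packing that message could look like in C (the layout and function are my own illustration, not any particular product’s wire format):

#include <stdint.h>

/* Pack one status message: 16-bit sensor id, 2-bit type, 1-bit value.
   19 bits of payload round up to 3 bytes on the wire. */
static void pack_status(uint8_t out[3], uint16_t id, uint8_t type, uint8_t on)
{
    uint32_t bits = ((uint32_t)id << 3) | ((uint32_t)(type & 3) << 1) | (on & 1);
    bits <<= 5;                     /* top-align the 19 bits within 24 */
    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
}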

The wasteful XML message is 142 bytes long, or 1,136 bits: roughly 59 times larger, and it is actually pretty terse compared to some of the stuff you’ll see in the real world. In terms of bandwidth, the compact and “archaic” binary protocol can push 59 times as many messages through the same pipe!

Now, what happens if we drop the statelessness? We could cache the status type on the receiving side, bringing us down to 17 bits (a decent improvement), but we could also decide not to poll, and instead let the server push data only when things change.

The savings here depend entirely on the configuration of the system, and sensors with scalar values probably have to send their value continuously. A technique used when recording GPS tracks is to let the device send its position only when it has moved more than a certain threshold, or when a predefined duration has passed. That means that if you’re stationary, you only get location samples at, say, 1 sample per minute, but as you move, the sample rate increases to 1 per second. The same idea works for temperature, speed and other scalar values.
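A sketch of that reporting rule, with the threshold, heartbeat interval and transmit function all being placeholders of my own:

#include <math.h>
#include <time.h>

#define THRESHOLD      0.5   /* report when the value moves this much */
#define HEARTBEAT_SECS 60    /* ...or when this much time has passed  */

void send_value(double v);   /* assumed transmit function, defined elsewhere */

/* Call this at the sensor's native sample rate; it only transmits
   when the value changed enough or the heartbeat expired. */
void maybe_report(double current, double *last_sent, time_t *last_time)
{
    time_t now = time(NULL);
    if (fabs(current - *last_sent) > THRESHOLD ||
        now - *last_time >= HEARTBEAT_SECS) {
        send_value(current);
        *last_sent = current;
        *last_time = now;
    }
}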

So, how often does a camera transition from detecting some motion to detecting none? Once per second? Once per minute? Once per hour? Probably all three over a 24-hour period, but let’s say you average one transition every 5 minutes. Compared to a 1 Hz polling rate, a sensor that reports only state changes sends 1 message per 300 seconds instead of 300, a 300x reduction in bandwidth.

In theory, we’re now roughly 17,700 times (59 × 300) more efficient using state and a binary protocol.

In practice, TCP overhead makes it quite a bit less: an IPv4 + TCP header alone is 40 bytes, which dwarfs a 3-byte payload unless you batch many messages per packet. We might also add a bit of goo to the binary protocol to make it easier to parse, and that’s the other area where things improve: processing speed.

Let’s loosen up the binary protocol a bit by byte-aligning each element. This increases the payload to 32 bits (still a ~10,000x improvement over the verbose polling design):

[ sensor id       ] [ type    ] [ value ]
0000 0000 0000 0001  0000 0001  0000 0001

It is now trivial, and extremely fast, to update the sensor value on the receiving side. Scalar values might be encoded as 32-bit floats, while complex statuses will take up an arbitrary amount of data depending on the sensor.
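Here’s roughly why byte alignment makes the receiving side so cheap; this sketch (field names are mine) assumes sender and receiver agree on byte order:

#include <stdint.h>
#include <string.h>

/* Byte-aligned message: 2-byte sensor id, 1-byte type, 1-byte value. */
#pragma pack(push, 1)
typedef struct {
    uint16_t sensor_id;
    uint8_t  type;
    uint8_t  value;
} status_msg;
#pragma pack(pop)

/* Updating the receiver's state table is one copy and one index;
   no tag parsing, no string matching, no GUID lookup. */
void apply(uint8_t *state_table, const uint8_t buf[4])
{
    status_msg m;
    memcpy(&m, buf, sizeof m);   /* memcpy avoids alignment pitfalls */
    state_table[m.sensor_id] = m.value;
}

Compare that to parsing XML, matching tag names and converting a GUID string to an internal id, all just to flip one bit.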

The point is that sometimes going “backwards” is actually the low-hanging fruit, and sometimes there’s a lot of it to pluck.

The design of a system should take the environment into consideration as well. If you’re writing a module for a client where you have ample bandwidth, and time to market matters, it may be most appropriate to use wasteful but fast-to-build tools. This means that, as part of a development contract, the baseline environment must be described. Failure to do so will eventually lead to problems when the client comes back, angry that the system you wrote in 2 days doesn’t work over a dial-up connection.


CamStreamer

Axis does not support RTMP as a transmission protocol in their firmware (correct me if I am wrong about this). This seems like an omission, as RTMP offers a pretty good way for the camera to initiate an outbound connection to an RTMP service somewhere outside the network. I believe this is how DropCam works (again, an assumption).

As most readers probably know (or should know), an Axis camera is basically a small Linux computer with a camera attached. Axis will let you download a development environment, which you can then use to compile applications for the camera. This is called ACAP.

Before compiling an RTMP library for ACAP, I decided to check if someone else had done this. Turns out CamStreamer had done so, and I decided to give it a whirl.

Setup is extremely simple, and the UI looks modern and clean. It has support for some of the most popular streaming services, but I needed the universal RTMP ingest option, as I have my own RTMP servers in the cloud.

[Screenshot: Options]

Setting this stuff up was trivial:

[Screenshot: Generic RTMP setup]

The system pretty much worked like a charm, with a single exception. During configuration, there was an error message telling me that the camera could not reach an outside host on port 1935 (the RTMP port); the error message was helpful and accurate. But later, when I lost my internet connection completely (a screw-up at my ISP, where they switched me back to DHCP), I was unable to enter the RTMP configuration page on the camera at all. I guess this is because the HTML/configuration is served from a host in the cloud. I did not dig into it, but it struck me as a bit odd.

The ACAP application is pretty expensive: $299 for a 1-camera license (at the time of writing). This is pretty steep, considering you can hook up a Raspberry Pi type device that will do the same for several cameras at less than half that price.

I am not sure why RTMP support is not built into more cameras (again, correct me if I am wrong). The “push” model of RTMP is almost perfect for cloud-based solutions, so why this is not standard fare for IP cameras is beyond me.

While Flash Video uses RTMP, RTMP can be used w/o Flash being involved at all. E.g. if I publish a stream via RTMP to EvoStream, I can view the stream via RTSP instead (and what can you do with RTSP streams and your favorite VMS?). It’s true that the reliance on TCP for RTMP introduces a bit of latency, but in practice, this is hardly a problem (if you are going to use a cloud solution, latencies are going to be pretty high compared to a local one).