Random Ramblings on RTSP

It stands for “Real Time Streaming Protocol”, and it is pretty much the de facto protocol for IP security cameras. It’s based on a “pull” principle: anyone who wants to get the feed must ask for it first.

Part of the RTSP protocol describes how the camera and the client negotiate the way the camera will send its data. You might assume that the video is sent back via the same socket as the one used for the RTSP negotiation. This is not the case.

So, in the bog-standard usage, the client will have to set up ports that it will use to receive the data from the camera. At this point, you could say that the client actually becomes a server, as it is now listening on two different ports. If you were to capture the communication, you might see something like this:

C->S:
  SETUP rtsp://example.com/media.mp4/streamid=0 RTSP/1.0
  CSeq: 3
  Transport: RTP/AVP;unicast;client_port=8000-8001

S->C:
  RTSP/1.0 200 OK
  CSeq: 3
  Transport: RTP/AVP;unicast;client_port=8000-8001;server_port=9000-9001;ssrc=1234ABCD
  Session: 12345678

If both devices are on the same LAN (and you tolerate that the client app opens two new ports in the firewall of the PC), the camera will start sending UDP packets to those two ports. The camera has no idea if the packets arrive or not (it’s UDP), so it’ll just keep spewing those packets until the client tears down the connection or the camera fails to receive a keep-alive message (usually just a dummy request via the RTSP channel).
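
A typical keep-alive is just a harmless request on the RTSP channel, something like this:

C->S:
  GET_PARAMETER rtsp://example.com/media.mp4 RTSP/1.0
  CSeq: 4
  Session: 12345678

(Some clients use OPTIONS instead; either way, the point is just to show the camera that someone is still listening.)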

But what if they’re not on the same network?

My VLC player on my home network may open up ports 8000 and 8001 locally on my PC, but the firewall in my router has no idea this happened (there are rare exceptions to this). So the VLC player says “hey, camera out there on the internet, just send video to me on port 8000”, but that isn’t going to work, because my external firewall will filter out those packets.

To solve this issue, we have RTSP over TCP.

RTSP over TCP lets the client initiate the connection, and the camera then sends the video back through that same connection, interleaved with the RTSP traffic. Most firewalls have no issue accepting data via a connection as long as it was initiated from the inside (UDP hole punching takes advantage of this). RTSP over HTTP is an additional layer to handle hysterical system admins who only tolerate “safe” HTTP traffic through their firewalls.
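
With interleaving, the SETUP exchange looks something like this (a sketch; the channel numbers take the place of the port pairs):

C->S:
  SETUP rtsp://example.com/media.mp4/streamid=0 RTSP/1.0
  CSeq: 3
  Transport: RTP/AVP/TCP;unicast;interleaved=0-1

S->C:
  RTSP/1.0 200 OK
  CSeq: 3
  Transport: RTP/AVP/TCP;unicast;interleaved=0-1
  Session: 12345678

The RTP/RTCP packets then travel back over the same TCP connection as small framed chunks (each one prefixed with a ‘$’, a channel number, and a length), so no extra ports need to be opened.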

So, is that the only reason to use TCP?

Well… Hopefully, you know that UDP is unacknowledged; the sender has no idea if the client received the packet or not. This is useful for broad- and multicasting, where getting acks would be impossible. So the server happily spams the network, without a care in the world whether the packets make it or not.

TCP has ack and retransmission; in other words, the sender knows if the receiver is getting the data and will re-transmit if packets were dropped.

Now, imagine that we have two people sitting in a quiet meeting room, and one of them reads a page of text to the other. The guy reading is the camera, and the guy listening is the recorder (or VLC player).

So the reader starts reading “It was the best __ times, it was ___ worst of times”, and since this is UDP, the listener is not allowed to say “can you repeat that”. Instead, the listener simply has to make do. Since there’s not a lot of entropy in the message, we can make sense of the text even if we drop a few words here and there.

But imagine we have 10 people reading at the same time. Will the listener be able to make sense of what is being said? What about 100 readers? While this is a simplified model, this is what happens when you run every camera on UDP.

Using TCP, you kinda have the listener going “got it, got it, got it” as the reader works his way down the page. If the room is quiet, the listener will rarely have to say “can you repeat that”. In other words, the transmission will be just as fast as UDP.

If you have 10 readers, and some of them are not getting the “got it” message, they may decide to scale back a bit and read a bit more slowly. In the end, though, the listener will have a verbatim copy of what was being read, even if there are 1000 readers.

Modern video codecs are extremely efficient; h.264 and h.265 throw away almost everything that is not needed (and then some). This means that dropped packets have a much greater impact: without those missing packets, all you get is a gray blur, because that is all the receiver heard when 100 idiots were yelling on top of each other.

So TCP solves the firewall issue, and in a “quiet” environment it is just as efficient as UDP. In a noisy environment it will slow things down because of retransmissions, but isn’t that a lot better than getting a blurry mess? Wouldn’t it be better if the cameras were able to adjust their settings when the receiver can’t keep up? Isn’t that better than spewing huge amounts of HD video packets into the void, never to be seen?

In my opinion, for IP surveillance installations, you should pick RTSP over TCP, and only switch to UDP if you don’t care about losing video.

As an experiment, I set up my phone to blast UDP packets at my server to determine the packet loss on LTE, assuming it would be massive. Turns out that LTE actually has retransmission of packets on the radio layer (at least for data packets); I don’t know if it does the same for voice data.

The difference may be academic for a lot of installations as the network is actually pretty quiet, but for large/poorly architected solutions it may make a real difference.

Docker

I recently decided to take a closer look at Docker. Many of us have probably used virtualization at one point or another. I used Parallels on OSX some time ago to run Windows w/o having to reboot. Then there’s VirtualBox, VMware, and Hyper-V, which all allow you to run one OS inside another.

Docker does virtualization a bit differently.

The problem

When a piece of software has been created, it eventually needs to be deployed. If it’s an internal tool, the dev or a support tech will boot the machine, hunch over the keyboard and download and install a myriad of cryptic libraries and modules. Eventually, you’ll be able to start the wonderful new application the R&D dept. created.

Sometimes you don’t have that “luxury”, and you’ll need to create an automated (or at least semi-automated) installer. The end user will obtain it, usually via a download, and then proceed to install it on their OS (often Windows) by running the setup.exe file.

The software often uses 3rd party services and libraries that can cause the installer to balloon to several gigabytes. Sometimes the application needs a SQL server installation, but that, in turn, is dependent on some version of something else, and all of that junk needs to be packaged into the installer file – “just in case” the end user doesn’t have it installed.

A solution?

Docker offers something a bit more extreme: there’s no installer (other than Docker itself). Once Docker is running, you run a fully functional OS with all the dependencies installed. As absurd as this may sound, it actually leads to much smaller “installers”, and it ensures that you don’t have issues with services that don’t play nice with each other. A Debian image takes up about 100 MB (there’s a slim version at 55 MB).

My only gripe with Docker is that it requires Hyper-V on Windows (which means you need a Windows Pro install, AFAIK). Once you install Hyper-V, it’s bye-bye VirtualBox. Fortunately, it is almost trivial to convert your old VirtualBox VM disks to Hyper-V.

There are a couple of things I wish were a bit easier to figure out. I wanted to get a hello-world type app running (a compiled app). This is relatively simple to do, once you know how.

How?

To start a container (with Windows as host), open PowerShell, and type

docker run -it [name of preferred OS]

You can find images on Docker Hub, or just google them. If this is the first time you run an instance of the OS, docker will attempt to download the image. This can take some time (but once the image is downloaded, it will spawn a new OS in a second or so). Eventually, you’ll get a shell (that’s what the -it does), and you can use your usual setup commands (apt-get install …, make folders and so on).
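
For example, to get a shell in a Debian container (the image on Docker Hub is simply called debian):

docker run -it debian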

When you’re done, you can type “exit”.

The next step is important, because the next time you enter “docker run -it [OS]”, all your changes will have disappeared! To keep your changes, you need to commit them. Start by entering this command:

docker ps -a

You’ll then see something like this (the container ID and the auto-generated name will differ on your machine):

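CONTAINER ID   IMAGE    COMMAND   CREATED         STATUS                    PORTS   NAMES
bbd096b363af   debian   "bash"    2 minutes ago   Exited (0) 5 seconds ago          admiring_banach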

Take note of the Container ID, and enter

docker commit [Container ID] [image name]:[image tag]

For example

docker commit bbd096b363af debian_1/test:v1

If you take a look at the image list

docker images

You’ll see that debian_1/test:v1 is among them (along with the disk space used by that image).

You can now run the image with your changes by entering

docker run -it [image name]:[image tag]

You’ll have to do a commit every single time you want to save your changes. This can be a blessing (like a giant OS-level undo) and a curse (forgetting to commit before quitting), but it’s not something the end user would/should care about.

You can also save the docker image to a file, and copy it to another machine and run it there.
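
For example, using the image from the example above:

docker save -o debian_test.tar debian_1/test:v1

Copy debian_test.tar to the other machine, then:

docker load -i debian_test.tar
docker run -it debian_1/test:v1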

Granted, “Dockerization” and virtualization as a foundation are not for everyone, but for deployments that are handled by tech-savvy integrators it might be very suitable.

It sits somewhere between an appliance and traditional installed software. It’s not as simple as the former, not as problematic as the latter.

Safe, Easy, Advanced

You can only pick 2 though.

Admitting mistakes is hard; it’s so hard that people will pay good money just to be told that they are not to blame for the mistake. That someone else is responsible for their stupidity. And sometimes they’re right, sometimes not.

Anton Yelchin was only 27 when he died; he left his car in neutral on an incline. The car started rolling and killed him. Since it would be unbearable to accept that Anton simply made a mistake, lawsuits were filed.

Another plaintiff claimed that the gear lever was too complex for people to operate, and that the manufacturer was therefore liable for the damage that occurs when people don’t operate it correctly. The car had rolled over her foot, and while there were no broken bones, she was now experiencing “escalating pains” and demanded compensation. One argument was that the car did not have the same feature as a more expensive BMW.

Tragically, every year more than 30 kids are forgotten in cars and die. When I bring this up with people, everyone says “it won’t ever happen to us”, and so there’s zero motivation to spend extra on such a precaution. The manufacturers know this, and since there’s also liability risk, they are not offering it. So, every year, kids bake to death in cars. It’s a gruesome fate for the kids, but the parents will never recover either.

Is it wrong to “blame the victim”?

I think the word “blame” has too many negative connotations attached to be useful in this context. Did the person’s action/inaction cause the outcome? If the answer is a resounding yes, then sure… we can say that we “blame the victim”.

It’s obviously a gray area. If a car manufacturer decides that P should mean neutral and N should mean park, documents this in the manual, and tells the customers as they sign the contract, I still wouldn’t blame an operator for making the mistake. The question is: would a person of “normal intelligence” be more than likely to make the same mistake?

In our industry, not only are we moving the goalposts of what “normal intelligence” means; some of the most hysterical actors are also using the bad practices of the layman to argue that the equipment, therefore, can’t be used by professionals.

It’s as if it were entirely reasonable to argue that no one should drive 18-wheelers because random people bought modified trucks at suspect prices on a grey market and then went ahead and caused problems for a lot of people.

As professionals, we’re expected to have “higher intelligence” when it comes to handling the equipment. You can’t call yourself “professional” if you have to rely on some hack and his gang to educate you online or through their “university”. And you sure as hell can’t dismiss the usability of a device based on what random amateurs do with it.

So what gives? You have a bunch of people who act like amateurs but feel like “professionals” because they are paying good money for this industry’s equivalent of 4chan and got paid to install a few units here and there.

It seems to me that the hysterical chicken-littles of this industry are conflating their own audiences with what actual professionals are experiencing. E.g. if someone suggests using a non-standard port to “protect their installation”, then you know that the guy is not professional (doesn’t mean he’s not paid, just means he’s not competent).

And that’s the core of this debacle: people who are incompetent feel entitled to be called professionals, and when they make mistakes that pros would never make, it’s the fault of the equipment, and the equipment isn’t suitable for professionals either.

So, as I’ve stated numerous times, I have a Hikvision and an Axis camera sitting here on my desk. Both have default admin passwords, yet I have not been the victim of any hack – ever. The Hikvision camera has decent optics (for a surveillance camera) and provides an acceptable image at a much lower cost than the “more secure” option.

And I’ll agree that getting video from that camera to my phone, sans VPN, is not “easy” for the layman. But it doesn’t have to be. It just has to be easy for the thousands of competent integrators who know what to do and, more importantly, what not to do.

That said, the PoC of the Hikvision authentication bypass string should cause heads to roll in Hikvision’s (and Dahua’s) R&D departments. Either there was no code review (bad), or there was and they ignored it (even worse). There’s just no excuse for that kind of crap to be present in the code. Certainly not in this day and age.

Technical Innovation

This might be wonky, so if you’re not into that sort of thing, you can save yourself 5 minutes and skip this one.

Enterprise installations of software usually consist of several semi-autonomous clusters (or sites) that are arranged in some sort of hierarchy: usually a tree-like structure, but it could be a mesh or a graph.

Each cluster may have hundreds or even thousands of sensors (not just cameras) that can all be described by their state. A camera may be online, it may be recording, a PIR might be triggered or not, and so on. Basically, sensor properties come in three types: binary, scalar, and complex (an LPR reader that returns the tag and color of a car).

The bandwidth inside each cluster is usually decent (~1 Gbit), and if it isn’t, an upgrade of the network is usually a one-time expense. Sure, there might be sensors at remote locations where the bandwidth is low, but in most cases bandwidth is not an issue inside the cluster. Unfortunately, the bandwidth from the cluster to the outside world is usually not as ample.

It’s not uncommon that a user at the top of the hierarchy will see a number of clusters, and it seems reasonable that this user also wants to see the state of each of the sensors inside the clusters. This, then, requires that the state information of each sensor is relayed from the cluster to the client. The client may have a 100 Mbit downstream connection, but if the upstream path from the cluster to the client is a mere 256 Kbit/s, then getting the status data across in real time is a real problem.

In the coding world there’s an obsession with statelessness. Relying on states is hard, so is deleting memory after you’ve allocated it, and debugging binary protocols isn’t a cakewalk either, so programmers (like me) try to avoid these things. We have garbage collection (which I hate), we have wildly verbose and wasteful containers such as SOAP (which I hate) and we have polling (which I hate).

So, let’s consider a stateless design with a verbose container (but a lot terser than SOAP): the client will periodically request the status.

client:
  <request>
    <type>status</type>
    <match>all</match>
  </request>

server:
  <response>
    <sensor id="long_winded_guid_as_text" type="redundant_type_info">
      <state>ON</state>
    </sensor>
  </response>

The client request can be optimized, but it’s the server that has the biggest problem. If we look at the actual info in the packet, we are told that a sensor, with a GUID, is ON.

Let’s examine the entropy of this message.

The amount of data needed to specify the sensor depends heavily on the number of sensors in the system. If there’s just 2 sensors in total, then the identifier requires 1 bit of information (0 or 1), if there are 4 sensors, you’ll need 2 bits (00, 01, 10 and 11), and so if there are 65000 sensors, you’ll need 16 bits (2 bytes).
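
The rule is simply ceil(log2(n)); a quick sanity check in Python:

import math
print(math.ceil(math.log2(65000)))   # -> 16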

The state is binary, so you’ll need just a single bit to encode the state (0 for off, 1 for on).

However, since the state information can be of 3 different types, you’ll need a status type identifier as well. This information only needs to be encoded in the message if the type actually changes; if the status type stays the same, it could be cached on the receiving side. But in order to remain stateless, we’re going to need 2 bits to encode the type in every message (00 for binary, 01 for scalar, 10 for complex).

So, for a system with 60,000 sensors, a message might look like this:

[ sensor id          ] [ type ] [ value ]
  0000 0000 0000 0001    00       1

19 bits in total

The wasteful message is 142 bytes long, or 1136 bits, or 59 times larger, and it is actually pretty terse compared to some of the stuff you’ll see in the real world. In terms of bandwidth, the compact and “archaic” binary protocol can push 59 times as many messages through the same pipe!
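
To make that comparison concrete, here’s a back-of-the-envelope sketch in Python; the XML string is the response from above with the whitespace stripped, and the bit layout is the 16 + 2 + 1 one we just defined, padded out to 3 whole bytes:

def pack_status(sensor_id, sensor_type, value):
    # 16-bit sensor id, 2-bit type, 1-bit value = 19 bits,
    # returned as 3 bytes (5 bits of padding)
    bits = (sensor_id << 3) | (sensor_type << 1) | value
    return bits.to_bytes(3, "big")

xml_message = ('<response>'
               '<sensor id="long_winded_guid_as_text" type="redundant_type_info">'
               '<state>ON</state>'
               '</sensor>'
               '</response>')

binary_message = pack_status(sensor_id=1, sensor_type=0b00, value=1)
print(len(xml_message), "bytes vs", len(binary_message), "bytes")
# -> 112 bytes vs 3 bytes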

Now, what happens if we remove the statelessness? We could cache the status type on the receiving side, so we’d get down to 17 bits (a decent improvement), but we could also decide to not poll, and instead let the server push data only when things change.

The savings here are entirely dependent on the configuration of the system, and sensors with scalar values probably have to send their values continuously. A technique used when recording GPS data is to let the device send its position only when it has changed more than a certain threshold, or when a predefined duration has passed. That means that if you’re stationary, we only get location samples at, say, one per minute, but as you move, the sample rate increases to one per second. The same could be used for temperature, speed, and other scalar values.
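
A minimal sketch of that report-on-change idea in Python (the threshold and heartbeat values are made up for illustration):

import time

THRESHOLD = 0.5    # report if the value moved more than this
HEARTBEAT = 60.0   # ...or if this many seconds have passed since the last report

last_value = None
last_sent = 0.0

def maybe_report(value):
    # Return True if this sample should be pushed to the receiver.
    global last_value, last_sent
    now = time.monotonic()
    changed = last_value is None or abs(value - last_value) > THRESHOLD
    stale = (now - last_sent) > HEARTBEAT
    if changed or stale:
        last_value, last_sent = value, now
        return True
    return False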

So, how often does a camera transition from detecting some motion to not detecting any? Once per second? Once per minute? Once per hour? Probably all three over a 24-hour period, but on average, let’s say you have one transition every 5 minutes. If you’re running a 1 Hz polling rate vs. a sensor that reports state changes only, you’re looking at a 300x reduction in bandwidth.

In theory we’re now about 17000 times more efficient using states and binary protocols.

In practice, the TCP overhead makes it quite a bit less. We might also add a bit of goo to the binary protocol to make it easier to parse, and that’s the other area where things improve: processing speed.

Let’s loosen up the binary protocol a bit by byte-aligning each element. This increases the payload to 32 bits (still a ~10,000x improvement over the polled XML).

[ sensor id          ] [ type     ] [ value    ]
  0000 0000 0000 0001    0000 0001    0000 0001

It is now trivial, and extremely fast to update the sensor value on the receiving side. Scalar values might be encoded as 32 bit floats, while complex statuses will take up an arbitrary amount of data depending on the sensor.

The point is that sometimes, going “backwards” is actually the low hanging fruit, and sometimes there’s a lot of it to pluck.

The design of a system should take the environment into consideration as well. If you’re writing a module for a client where you have ample bandwidth, and time to market matters, it would be most appropriate to use wasteful but fast tools. This means that as part of a development contract, the baseline environment must be described. Failure to do so will eventually lead to problems when the client comes back, angry that the system you wrote in 2 days doesn’t work over a dial-up connection.

CamStreamer

Axis does not support RTMP as a transmission protocol in their firmware (correct me if I am wrong about this). This seems like an omission, as RTMP offers a pretty good way for the camera to initiate an outbound connection to an RTMP service somewhere outside the network. I believe this is how DropCam works (again, an assumption).

As most probably know (or should know), an Axis camera is basically a small Linux computer with a camera attached. Axis will let you download a development environment, which you can then use to compile applications for the camera. This is called ACAP.

Before compiling an RTMP library for ACAP, I decided to check if someone else had done this. Turns out CamStreamer had done so, and I decided to give it a whirl.

Setup is extremely simple, and the UI looks modern and clean. It has support for some of the most popular streaming services, but I needed the RTMP ingest server (Universal) as I have my own RTMP servers in the cloud.


Setting this stuff up was trivial.


The system pretty much worked like a charm, with a single exception. During configuration, there was an error message telling me that the camera could not reach an outside host on port 1935 (the RTMP port). The error message was helpful and accurate. But later, I lost my internet connection completely (a screw-up at my ISP, where they switched me back to DHCP), and I was unable to enter the RTMP configuration page on the camera at all. I guess this is because the HTML/configuration is served from a host in the cloud. I did not dig into it, but it struck me as a bit odd.

The ACAP application is pretty expensive: $299 for a 1-camera license (at the time of writing). This is pretty steep, as you can hook up a Raspberry Pi type device that will do the same for several cameras at less than half that.
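
For instance, with ffmpeg installed on such a device, relaying a camera’s RTSP stream to an RTMP server is a one-liner (the camera and server URLs here are placeholders):

ffmpeg -rtsp_transport tcp -i rtsp://camera.local/stream1 -c copy -f flv rtmp://example.com/live/mycam

The -c copy part repackages the stream without re-encoding, which is why even a modest board can keep up with several cameras.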

I am not sure why RTMP support is not built into more cameras (again, correct me if I am wrong). The “push” model of RTMP is almost perfect for cloud-based solutions, so why this is not standard fare for IP cameras is beyond me.

While Flash Video uses RTMP, RTMP can be used w/o Flash being involved at all. E.g. if I publish a stream via RTMP to EvoStream, I can view the stream via RTSP instead (and what can you do with RTSP streams and your favorite VMS?). It’s true that the reliance on TCP for RTMP introduces a bit of latency, but in practice, this is hardly a problem (if you are going to use a cloud solution, latencies are going to be pretty high compared to a local one).

The Parts of an IP Camera

To understand where the IP camera market is headed, I think it’s important to understand how one of these things is put together.

Like most high-tech devices, each product is really an amalgamation of parts from different manufacturers. In fact, many products are the result of tight, but perhaps unappreciated, collaboration between several (sometimes competing) companies. I’d recommend listening to Freakonomics’ rundown of the “I, Pencil” essay (starts 7 minutes in).

So, an IP camera is not a pencil, but the analogy holds: pencil manufacturers don’t make every single part of the pencil; instead, they purchase the parts (graphite, brass, paint and so on), and every manufacturer puts the pencil together following roughly the same pattern.

And, so, when it comes to IP cameras, they too are composed of parts that are available to everyone who wants to start making cameras.

You’ll need a couple of things: A lens, a sensor, some circuitry and some code.

You’re not going to start making your own lenses or sensors, are you? Probably not, so you’ll get the lenses from a lens maker (and they may even outsource their manufacturing process even further), and the sensor from either Sony or Canon.

You’re not going to design your own CPU either (unless you’re Axis). Today, you’d be better off grabbing an ARM platform and using that to drive the sensor and interface. The other advantage is that ARM is well supported in the software world, so you’re already halfway there.

Now that you have the basics, you need to write some code to get it all working together. If you went the ARM route, it’s pretty simple to get a Linux kernel running. Well… “simple” depends on your level of skill, but finding a few geeks who can do this shouldn’t take long. So you grab the Linux kernel, add Apache or perhaps GoAhead, and you can add gStreamer too (do check the link, it is a great presentation by Axis). The next thing you know, you have a jumble of cables and breadboards, burns on your fingers from the soldering iron, you haven’t seen your kids in 4 days, and the smell is getting a little hard to stomach.

On top of that, you need to wrap this in an enclosure. There are regulations to follow, tests that need to be carried out, and so on. Then you have the nightmare of maintaining all those pieces of code, and trust me: if you wrote everything yourself, it would take even longer and be much harder to test and maintain.

What if there was a company that could do all of the above, and just stick my name on the box? After all, my company would pick the same lens, the same sensor, the same board and the same software, so why not?

I have no intention of starting production of a Raspberry Pi Zero based IP camera, but I know that I can make one for ~$40 (and that’s buying all the parts retail). Not only will this thing work as an IP camera, it can also work as a full-fledged stand-alone VMS.

In other words: if some washed-up coder in Copenhagen can build a fully functional “IP camera” for $40, you’re going to face a tough time if you’ve based your entire organization around selling your cheapest cameras for $250+ (they may be “even more good enough”, but who cares?).

Obviously, my camera is not going to be materially different from the other guy’s cameras. We’re all going to use the same bits and pieces, including software, even the damn protocols are going to be the same.

So, I think we’re going to see a race to the bottom in terms of prices. The cameras will look and perform almost identically across brands, use the same protocols, and be completely interchangeable, much to the chagrin of the incumbents, so the USP for the brands in this realm will have to be something else.

VMS Software, perhaps…

Radars

A while back I got fed up with people parking their cars right in front of my driveway, and I decided to find a solution.

A camera could work, but since I am cheap, I decided to look for something a bit more… economical. A PIR sensor wouldn’t work, because it triggers whenever there is motion, and cars and people pass by all the time. So I looked into ultrasonic sensors, and eventually radars. If the measured distance drops below a pre-defined threshold and stays there, I know to run into the street, yelling and screaming.

The inspiration came from Adafruit and Andreas Spiess, who has a great YouTube channel where you can get more information about ultrasonic sensors and radars (and about 1000 other things).

Basically, you can get an Arduino-capable board. I hope the WiFi-capable ESP8266 will work (since I have one lying around). Then get some cheap sensors from China via Alibaba, and you’re ready to experiment. At the very least, it should give you some idea of the base cost of such a device.
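
Here’s a hypothetical MicroPython sketch of the idea, for an ESP8266 with an HC-SR04 ultrasonic sensor (the pin numbers, threshold, and hold time are made-up values, and I haven’t actually wired this up):

from machine import Pin, time_pulse_us
import time

TRIG = Pin(12, Pin.OUT)   # HC-SR04 trigger pin
ECHO = Pin(14, Pin.IN)    # HC-SR04 echo pin
THRESHOLD_CM = 150        # anything closer than this is blocking the driveway
HOLD_READINGS = 30        # ~30 seconds of consecutive "blocked" readings

def distance_cm():
    # a 10 microsecond pulse on TRIG starts a measurement
    TRIG.off()
    time.sleep_us(2)
    TRIG.on()
    time.sleep_us(10)
    TRIG.off()
    # echo pulse width in microseconds; sound travels ~0.0343 cm/us,
    # and the pulse covers the round trip, hence the division by 2
    pulse = time_pulse_us(ECHO, 1, 30000)
    return pulse * 0.0343 / 2

blocked = 0
while True:
    d = distance_cm()
    # time_pulse_us returns a negative number on timeout (nothing in range)
    blocked = blocked + 1 if 0 < d < THRESHOLD_CM else 0
    if blocked >= HOLD_READINGS:
        print("Car in front of the driveway!")  # or fire off a notification
        blocked = 0
    time.sleep(1)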

Both Axis and Avigilon have launched commercial versions of miniature radars that can interface with your favorite VMS. Combined with a PTZ camera, this offers a bit more smartness than the good old PIR/PTZ combo.