Category Archives: Surveillance Software Design

The Parts of an IP Camera

To understand where the IP camera market is headed, I think it’s important to understand how one of these things are put together.

Like most high tech devices, each product is really an amalgamation of parts from different manufacturers. In fact many products are the result of tight, but perhaps unappreciated, collaboration of several (sometimes competing) companies. I’d recommend listening to Freakonomics rundown of the “I, Pencil” essay (starts 7 minutes in).

So, an IP camera is not a pencil, but just like all pencil manufacturers don’t manufacture every single part of the pencil, but instead, they purchase the parts (graphite, brass, paint and so on) and every manufacturer puts the pencil together following roughly the same pattern.

And, so, when it comes to IP cameras, they too are composed of parts that are available to everyone who wants to start making cameras.

You’ll need a couple of things: A lens, a sensor, some circuitry and some code.

You’re not going to start making your own lenses or sensors, are you? Probably not, so you’ll get the lenses from a lens maker (and they may even outsource their manufacturing process even further), and the sensor from either Sony or Canon.

You’re not going to design your own CPU either (unless you’re Axis). Today, you’d be better off grabbing an ARM platform and use that to drive the sensor and interface. The other advantage is that ARM is well supported in the software world, so you’re already halfway there.

Now that you have the basics, you need to write some code to get it all working together. If you went the ARM route, it’s pretty simple to get a linux kernel running. Well.. “simple” is depends on your level of skill, but finding a few geeks who can do this shouldn’t take long. So you grab the Linux kernel, add Apache or perhaps GoAhead, you can add gStreamer too (do check the link, it is a great presentation by Axis) . The next thing you know, you have a jumble of cables and breadboards, burns on your fingers from the soldering iron, you haven’t seen your kids in 4 days and the smell is getting a little hard to stomach.

On top of that, you need to wrap this in an enclosure. There’s regulations to follow, tests that need to be carried out and so on. Then you have the nightmare of maintaining all those pieces of code, and trust me – if you wrote everything yourself, it would take even longer and be much harder to test and maintain.

What if there was a company, that could do all of the above? And just stick my name on the box? After all, my company would pick the same lens, the same sensor, the same board and the same software, so why not do it?

I have no intention of starting production of a Raspberry Pi Zero based IP camera, but I know that I can make one for ~$40 (and that’s buying all the parts retail). Not only will this thing work as an IP camera, it can work as a full fledged stand-alone VMS.

In other words, the question is: if some washed up coder in Copenhagen can build a fully functional “IP camera” for $40, I think you’re going to face a tough time if you’ve based your entire organization around selling your cheapest cameras for $250+ (they may be “even more good enough”, but who cares?).

Obviously, my camera is not going to be materially different from the other guy’s cameras. We’re all going to use the same bits and pieces, including software, even the damn protocols are going to be the same.

So, I think we’re going to see a race to the bottom in terms of prices. The cameras will look and perform almost identically across brands, use the same protocols, and be completely interchangeable, much to the chagrin of the incumbents, so the USP for the brands in this realm will have be something else.

VMS Software, perhaps…









A while back I got fed up with people parking their cars right in front of my driveway, and I decided to find a solution.

A camera could work, but since I am cheap, I decided to look for something a bit more… economical. A PIR sensor wouldn’t work because it triggers when there is motion, and cars and people pass by all the time, so I looked into ultrasonic sensors and eventually radars. If the distance measured drops below a pre-defined threshold and stays there, I know to run into the street, yelling and screaming.

The inspiration came from Adafruit and Andreas Spiess who has a great YouTube channel where you can get more information about ultrasonic sensors and radars (and about 1000 other things).


Basically, you can get an Arduino capable board. I hope the WiFi capable ESP8266 will work (since I have one lying around). Then get some cheap sensors from China via Alibaba and you’re ready to experiment. At the very least, it should give you some idea of the base cost of such a device.

Both Axis and Avigilon have launched commercial versions of miniature radars that can interface to your favorite VMS. Combined with a PTZ camera this might be a very interesting combination that offers a bit more smartness than the good old PIR/PTZ combo.



As with IP cameras, one of the IoT challenges is how to get your controlling device (typically a phone) to talk to the IoT device in a way that does not require opening up inbound ports on your firewall.

All communication is peer to peer, so the term, when used in the context of IoT devices, is perhaps a little misleading, after all, an exposed camera sending a video stream to a phone somewhere is also “peer to peer”. Instead, P2P might be translated to “send data from A to B, even if both A and B are behind firewalls, using a middleman C” (what the hell is up with all the A, B, C these days).

On a technical level, the P2P cameras use something called UDP hole punching, which sounds a bit onymous, but there’s really nothing sneaky about it. What happens is that A connects to C, so that C now knows the external IP address of A. Likewise, B also connects to C, and now C knows the external IP address of both A and B.

This middleman, now passes the IP address of A to B, and B to A. Next step is for A to fire a volley of UDP packets towards B, while B does the same towards A.

The firewall on A’s side sees a bunch of packets travel to B’s address, and when B’s packets arrive, the firewall thinks that the UDP packets are replies to the packets that were sent from A and let’s them through.

You could accomplish the same thing by having A go to “” and email it to B, B would then do the same. Then run scripts that send UDP packets over the network, but a STUN server automates this process.

But who controls this “middle man”? Ideally, you’d be in charge of it; you’d be able to specify your own STUN-type server in the camera interface, so that you have full control of all links in the chain. In time, perhaps the camera vendors will release a protocol description and open source modules so that you can host your own middle-man.

The problem might be that you bought a nice cheap camera in the grey market. The camera is intended for the Chinese market, but comes with a “modded” firmware that enables English menus and so on. This is obviously risky. Updating a modded firmware may be impossible and brick the camera, and the manufacturer may be less inclined to support devices that have been modded. You get what you pay for, so to speak (and this blog is free!)

The modder is selling the cameras in the western markets, but the STUN server is still pointing to a server in China. This makes sense if you are a Chinese user, but it may seem very strange that your camera “calls home” to a server in China. A non-modded camera might do the same, simply because running a STUN service is cheaper, and allows the government to eavesdrop on the traffic. If you are Chinese (I am not), you could argue that you don’t trust Amazon, Microsoft or Google because they might work with the NSA. Therefore, using your own server would be preferred.

Apart from the STUN functionality, the camera may follow direction that are sent from B to C to A. This puts a lot of responsibility in the hands of the guys maintaining this server. If it is breached, a lot of cameras will then be vulnerable.

Depending on the end user, P2P may not be appropriate at all. To some users, the cost of a breach is small, compared to the hassle of installing a fully secure system it might be worth it.

While yours truly has abandoned all attempts to appear professional over the years, the truth is that most big installations have their shit together. Unfortunately the volume of DIYers and amateurish installers who don’t really know what they are doing is much bigger (in terms of headcount, not commercial volume), and if there’s one thing we all want to do, it’s to blame someone else.

Caveat Emptor.



Codecs and Client Applications

4K and H.265 is coming along nicely, but it is also part of the problems we as developers are facing these days. Users want low latency, fluid video, low bandwidth and high resolution, and they want these things on 3 types of platforms – traditional PC applications, apps for mobile devices and tablets, and a web interface. In this article, I’d like to provide some insights from a developer’s perspective.

Fluid or Low Latency

Fluid and low latency at the same time is highly problematic. To the HLS guys, 1 second is “low latency”, but to us, and the WebRTC hackers, we are looking for latencies in the sub 100 ms area. Video surveillance doesn’t always need super low latency – if a fixed camera has 2 seconds of latency, that is rarely a problem (in the real world). But as soon as any kind of interaction takes place (PTZ or 2-way audio) then you need low latency. Optical PTZ can “survive” low latency if you only support PTZ presets or simple movements, but tracking objects will be virtually impossible on a high-latency feed.

Why high latency?

A lot of the tech developed for video distribution is intended for recorded video, and not for low latency live video. The intent is to download a chunk of video, and while that plays, you download the next in the background, this happens over and over, but playing back does not commence until at least the entire first block has been read off the wire. The chunks are usually 5-10 seconds in duration, which is not a problem when you’re watching Bob’s Burgers on Netflix.

The lowest latency you can get is to simple draw the image when you receive it, but due to packetization and network latency, you’re not going to get the frames at a steady pace, which leads to stuttering which is NOT acceptable when Bob’s Burgers is being streamed.

How about WebRTC?

If you’ve ever used Google Hangouts,  then you’ve probably used WebRTC. It works great when you have a peer-to-peer connection with a good, low latency connection. The peer-to-peer part is important, because part of the design is that the recipient can tell the sender to adjust its quality on demand. This is rarely feasible on a traditional IP camera, but it could eventually be implemented. WebRTC is built into some web browsers, and it supports H.264 by default, but not H.265 (AFAIK) or any of the legacy formats.


Yes, and no. Transcoding comes at a rather steep price if you expect to use your system as if it ran w/o transcoding. The server has to decode every frame, and then re-encode it in the proper format. Some vendors transcodes to JPEG which makes it easier for the client to handle, but puts a tremendous amount of stress on the server. Not on the encoding side, but the decoding of all those streams is pretty costly.  To limit the impact on the transcoding server, you may have to alter the UI to reflect the limitation in server side resources.

Installed Applications

The trivial case is an installed application on a PC or a mobile device. Large install files are pretty annoying (and often unnecessary), but you can package all the application dependencies, and the developer can do pretty much anything they want. There’s usually a lot of disk-space available and fairly large amounts of RAM.

On a mobile device you struggle with OS fragmentation (in case of Android), but since you are writing an installed application, you are able to include all dependencies. The limitations are in computing power, storage, RAM and physical dimensions. The UI that works for a PC with a mouse is useless on a 5″ screen with a touch interface. The CPU/GPU’s are pretty powerful (for their size), but they are no-where near the processing power of a halfway decent PC. The UI has to take this into consideration as well.

“Pure” Web Clients

One issue that I have come across a few times, is that some people think the native app “uses a lot of resources”, while a web based client would somehow, magically, use fewer resources to do the same job. The native app uses 90.0% of the CPU resources to decode video, and it does so a LOT more efficient than a web client would ever be able to. So if you have a low end PC, the solution is not to demand a web client, but to lower the number of cameras on-screen.

Let me make it clear: any web client that relies on an ActiveX component to be downloaded and installed, might as well have been an installed application. ActiveX controls are compiled binaries that only run on the intended platform (IE, Windows, x86 or x64). They are usually implicitly (sometimes explicitly) left behind on the machine, and can be instantiated and used as an attack vector if you can trick a user to visit a malicious site (which is quite easy to accomplish).

The purpose of a web client is to allow someone to access the system from a random machine in the network, w/o having to install anything. An advantage is also that since there is no installer, there’s no need to constantly download and install upgrades every time you access your system. When you log in, you get the latest version of the “client”. Forget all about “better for low end” and “better performance”.


Java applets can be installed, but often setting up Java for a browser is a pain in the ass (security issues), and performance can be pretty bad.

Flash apps are problematic too, and suffer the same issues as Java applets. Flash has a decent H.264 decoder for .flv formatted streams, but no support for H.265 or legacy formats (unless you write them, from scratch.. and good luck with that 🙂 ) Furthermore, everyone with a web browser in their portfolio is trying to kill Flash due to it’s many problems.

NPAPI or other native plugin frameworks (NaCL, Pepper) did offer decent performance, but usually only worked on one or two browsers (Chrome or Firefox), and Chrome later removed support for NPAPI.

HTML5 offers the <video> tag, which can be used for “live” video. Just not low latency, and codec support is limited.

Javascript performance is now at a point (for the leading browsers) that you can write a full decoder for just about any format you can think of and get reasonable performance for 1 or 2 720p streams if you have a modern PC.


To get broad client side support (that truly works), you have to make compromises on the supported devices side. You cannot support every legacy codec and device and expect to get a decent client side experience on every platform.

As a general rule, I disregard arguments that “if it doesn’t work with X, then it is useless”. Too often, this argument gains traction, and to satisfy X, we sacrifice Y. I would rather support Y 100% if Y makes more sense. I’d rather serve 3 good dishes, than 10 bad ones. But in this industry, it seems that 6000 shitty dishes at an expensive “restaurant” is what people want. I don’t.



Tagged , , , ,

Marketing Technology

I recently saw a fun post on LinkedIn. Milestone Systems was bragging about how they have added GPU acceleration to their VMS, but the accompanying picture was from a different VMS vendor. My curiosity had the better of me, and I decided to look for the original press release. The image was gone, but the text is bad enough.

Let’s examine :

Pioneering Hardware Acceleration
In the latest XProtect releases, Milestone has harvested the advantages of the close relationships with Intel and Microsoft by implementing hardware acceleration. The processor-intensive task of decoding (rendering) video is offloaded to the dedicated graphics system (GPU) inside the processer [sic], leaving the main processor free to take on other tasks. The GPU is optimized to handle computer graphics and video, meaning these tasks will be greatly accelerated. Using the technology in servers can save even more expensive computer muscle.

“Pioneering” means that you do something before other people. OnSSI did GPU acceleration in version 1.0 of Ocularis, which is now 8 or 9 years old. Even the very old NetSwitcher app used DirectX for fast YUV conversion. Avigilon has been doing GPU acceleration for a while too, and I suspect others have as well. The only “pioneering” here is how far you can stretch the bullshit.

Furthermore, Milestone apparently needs a “close relationship” with Microsoft and Intel to use standard and publicly available quick sync tech. They could also have used FFMpeg.

We have experimented with CUDA on a high end nVidia card years ago, but came to the conclusion that the scalability was problematic, and while the CPU would show 5%, the GPU was being saturated causing stuttering video when we pushed for a lot of frames.

Using Quick sync is the right thing to do, but trying to pass it off as “pioneering” and suggesting that you have some special access to Microsoft and Intel to do trivial things is taking marketing too far.

The takeaway is that I need to remind myself to make my first gen client slow as hell, so that I can claim 100% performance improvement in v2.0.


Tagged , ,

Open Systems and Integration

Yesterday I took a break from my regular schedule and added a simple, generic HTTP event source to Ocularis. We’ve had the ability to integrate to IFTTT via the Maker Channel for quite some time. This would allow you to trigger IFTTT actions whenever an event occurs in Ocularis. Soon, you will be able to trigger alerts in Ocularis via IFTTT triggers.

For example, IFTTT has a geofence trigger, so when someone enters an area, you can pop the appropriate camera via a blank screen. The response time of IFTTT is too slow and I don’t consider it reliable for serious surveillance applications, but it’s a good illustration of the possibilities of an open system. Because I am lazy, I made a trigger based on Twitter, that way I did not have to leave the house.

Making a HTTP event source did not require any changes to Ocularis itself. It could be trivially added to previous version if one wanted to do that, but even if we have a completely open system, it doesn’t mean that people will utilize it.



Tagged ,

Camera Proxy

There’s a lot of paranoia in the industry right now, some warranted, some not. The primary issue is that when you plug something into your network you basically have to trust the vendor to not spy on you “by design” and to not provide a trivial attack vector to 3rd parties.

First things first. Perhaps you remember that CCTV means Closed Circuit Television. Pay attention to those first two words. I am pretty sure 50% or more of all “CCTV” installations are not closed at all. If your CCTV system is truly closed, there’s no way for the camera to “call home”, and it is impossible for hackers to exploit any attack vectors because there’s no access from the outside world to the camera. There are plenty of PC’s running terrible and vulnerable software out there, but as long as these systems are closed, there’s no problem. Granted, it also limits the flexibility of the system. But that’s the price you pay for security.

In the opposite end of the spectrum are cameras that are directly exposed to the internet. This is a very bad idea, and most professionals probably don’t do that. Well… some clearly do, because a quick scan of the usual sites reveal plenty of seemingly professional installations where cameras are directly accessible from the internet.

To expose a camera directly to the internet you usually have to alter the NAT tables in your router/firewall. This can be a pain in the ass for most people, so another approach is used called hole-punching. This requires a STUN server between the client sitting outside the LAN (perhaps on an LTE connection via AT&T) and the camera inside the LAN. The camera will register with the STUN server via an outbound connection. Almost all routers/firewalls allow outbound connections. The way STUN servers work, probably confuse some people, and they freak out when they see the camera making a connection to “suspicious” IP but that’s simply how things work, and not a cause for alarm.

Now, say you want to record the cameras in your LAN on a machine outside your LAN, perhaps you want an Azure VM to record the video, but how will the recorder on Azure (outside your LAN) get access to your cameras that are inside the LAN unless you set up NAT and thus expose your cameras directly to the internet?

This is where the $10 camera proxy comes in (the actual cost is higher because you’ll need an SD card and a PSU as well).

So, here’s a rough sketch of how you can do things.

  • On Azure you install your favorite VMS
  • Install Wowza or EvoStream as well

EvoStream can receive an incoming RTMP stream, and make the stream available via RTSP, it basically changes the protocol, but uses the same video packets (no transcoding). So, if you were to publish a stream at say rtmp://evostreamserver/live/mycamera, that stream will be available at rtsp://evostreamserver/mycamera. You can then add a generic RTSP camera that reads from rtsp://evostreamserver/mycamera to your VMS.

The next step is to install the proxy, you can use a very cheap Pi clone, or a regular PC.

  • Determine the RTSP address of the camera in question
  • Download FFMpeg
  • Set up FFMpeg so that it publishes the camera to EvoStream (or Wowza) on Azure

Say you have a camera that streams via rtsp://, the command looks something like this (all on one line)

ffmpeg -i rtsp://username:password@ 
-vcodec copy -f flv rtmp://evostreamserver/live/mycamera

This will make your PC grab the AV from the camera and publish it to the evostream server on Azure, but the camera is not directly exposed to the internet. The PC acts as a gateway, and it only creates an outbound connection to another PC that you control as well.

You can now access the video from the VMS on Azure, and your cameras are not exposed at all, so regardless how vulnerable they are, they will not expose any attack vectors to the outside world.

Using Azure is just an example, the point is that you want to isolate the cameras from the outside world, and this can be trivially accomplished by using a proxy.

As a side note. If cameras were deliberately spying on their users, by design, this would quickly be discovered and published. That there are bugs and vulnerabilities in firmware is just a fact of life and not proof of anything nefarious, so calm down, but take the necessary precautions.

TileMill and Ocularis

A long, long time ago, I discovered TileMill. It’s an app that lets you import GIS data, style the map and create a tile-pyramid, much like the tile pyramids used in Ocularis for maps.


There are 2 ways to export the map:

  • Huge JPEG or PNG
  • MBTiles format

So far, the only supported mechanism of getting maps into Ocularis is via a huge image, which is then decomposed into a tile pyramid.

Ocularis reads the map tiles the same way Google Maps (and most other mapping apps) reads the tiles. It simply asks for the tile at x,y,z and the server then returns the tile at that location.

We’ve been able to import Google Map tiles since 2010, but we never released it for a few reasons:

  • Buildings with multiple levels
  • Maps that are not geospatially accurate (subway maps for example)
  • Most maps in Ocularis are floor plans, going through google maps is an unnecessary extra step
  • Reliance on an external server
  • Licensing
  • Feature creep

If the app is relying on Google’s servers to provide the tiles, and your internet connection is slow, or perhaps goes offline, then you lose your mapping capability. To avoid this, we cache a lot of the tiles. This is very close to bulk download which is not allowed. In fact, at one point I downloaded many thousands of tiles, which caused our IP to get blocked on Google Maps for 24 hours.

Using MBTiles

Over the weekend I brought back TileMill, and decided to take a look at the MBTile format. It’s basically a SQLite DB file, with each tile stored as a BLOB. Very simple stuff, but how do I serve the individual tiles over HTTP so that Ocularis can use them?

Turns out, Node.js is the perfect tool for this sort of thing.

Creating a HTTP server is trivial, and opening a SQLite database file is just a couple of lines. So with less than 50 lines of code, I had made myself a MBTile server that would work with Ocularis.


A few caveats : Ocularis has the Y axis pointing down, while MBTiles have the Y axis pointing up. Flipping the Y axis is simple. Ocularis has the highest resolution layer at layer 0, MBTiles have that inverted, so the “world tile” is always layer 0.

So with a few minor changes, this is what I have.


I think it would be trivial to add support for ESRI tile servers, but I don’t really know if this belongs in a VMS client. The question is time was not better utilized by making it easy for the GIS guys to add video capabilities to their app, rather than having the VMS client attempt to be a GIS planning tool.


Tagged ,

Razor Software

It dawned on me that the VMS software, in many cases, are equivalent to razors… or pizzas, or burgers. It’s reaching the point where it’s really hard to sell anything better than just “good enough”. Sure, there are 5-blade razors with gels to soothe the skin, and there are “gourmet” burgers and pizzas (I still prefer Wendy’s, and I’ve been to a lot of burger places). So unless you can sell some sort of “experience”, when peddling easily fungible goods, you’ll be competing on price.

I recently was on vacation in Greece and I noticed that the cash register software was (apparently) DOS based, and hadn’t been updated since 2004. The service was as fast as anywhere else, and the software had no issue looking up the price of an item as quickly as the operator could scan the barcode or manually key in the item. A local hardware store here in Copenhagen has a similar system, and the Luddite discount supermarket chain, Aldi, didn’t have barcode scanners until recently.

In the good old days (of scurvy and gout), the store would simply write down what was sold, and at what price in a little black book. Back then, I doubt there were price tags on items, as it was the Quakers who introduced fixed prices.

Cash register by the National Cash Register Co., Dayton, Ohio, United States, 1915.

Cash register by the National Cash Register Co., Dayton, Ohio, United States, 1915.

The first register came about when saloon owner, James Ritty got fed up with his employees stealing from him, so he invented the cash register. A contraption he called the “Ritty’s Incorruptible Cashier”. Soon after, the invention was sold to the company that later became NCR.

NCR went on to add bookkeeping features to the cash register, so not only would you prevent the employees from stealing (it’s intended purpose), but it would also make your life easier as a shop-keeper.

Electronic cash registers was the next step. Clearly, going from a mechanical contraption to an electronic version made sense. It was cheaper, more reliable, faster and had more features.

Then came the DOS based POS systems that offered yet more flexibility, were easier to use and could interface with more printers and cash drawers.

Today, the local hipster hangout coffee shop down the street is now using an iPad with a credit card scanner attached.

As I see it, the embedded electronic register is equivalent to a DVR with some NTSC cameras. The DOS based register corresponds to the windows based IP video recorders that are fairly common today. And finally, a plug’n play video surveillance solution in the DropCam vein is peered to the iPad POS software.

My thinking is this: A tool that solves a particular, well defined problem, won’t be upgraded, at least not until the new tool offers a substantially better ROI. Adding bookkeeping to the mechanical cash made sense, but I doubt people replaced their old working mechanical registers with the slightly improved variant. New customers will buy the feature-rich variant if the price points are similar, but it’s very hard to change a substantial premium for features that really aren’t needed. And even harder to convince people to replace a working system with something more expensive, but only offers marginally useful features. I suspect that the reason I am seeing a lot of stores with DOS (or DOS-like) registers, is because the functionality is simply good enough, and there’s no meaningful ROI on upgrading to a, say, Windows 10 or iOS version.

And that brings me to VMS systems. Clearly, the analog systems were a pain in the ass. Tapes that were worn out, rendering the system useless when something did happen. Terrible low light performance and so on. The DVR systems were substantially better, causing people to upgrade. And for a while, IP based cameras were very, very expensive, but as prices came down, they started to offer a substantially better ROI than DVRs (in many respects, but not all).

The key point here is “good enough”: I don’t think we are quite there yet, but eventually we will arrive at a point where all VMS/NVRs will be pretty much plug and play. It’ll be a piece of cake to replace a camera (why the hell is this still a challenge on some VMSs?), hopefully no-one is going to spend hours tweaking resolutions, messing around with VBR and CBR, or mess around with “motion sensitivity” settings.

If I had my way, I’d let the “other guys” come up with new condiments, and ways to serve their turd-sandwiches, and instead offer cupcakes. Some people don’t like cupcakes, some prefer donuts, but I bet you that most people would pick a cupcake over any turd-sandwich. There are always people that complain, monday-morning-quarterbacks (some made a career of it), who demand some weird condiment that comes with turd-sandwich #2, and in some cases, sinking your teeth into a turd-sandwich is simply the only way to achieve some narrow goal.

I bet that once someone offers a cupcake, it’s game over for the turd-sandwich stores.

Design is About Intent

Regular readers know that I have written about this problem before, I wholeheartedly agree with John R. Moran on his observation

Delegating is by far the most subtle, pernicious, and widespread of the three evasions, particularly among tech companies. Under the guise of being “user-driven” or providing “choice,” delegators leave crucial design decisions up to the user

Design Is About Intent