4K and H.265 is coming along nicely, but it is also part of the problems we as developers are facing these days. Users want low latency, fluid video, low bandwidth and high resolution, and they want these things on 3 types of platforms – traditional PC applications, apps for mobile devices and tablets, and a web interface. In this article, I’d like to provide some insights from a developer’s perspective.
Fluid or Low Latency
Fluid and low latency at the same time is highly problematic. To the HLS guys, 1 second is “low latency”, but to us, and the WebRTC hackers, we are looking for latencies in the sub 100 ms area. Video surveillance doesn’t always need super low latency – if a fixed camera has 2 seconds of latency, that is rarely a problem (in the real world). But as soon as any kind of interaction takes place (PTZ or 2-way audio) then you need low latency. Optical PTZ can “survive” low latency if you only support PTZ presets or simple movements, but tracking objects will be virtually impossible on a high-latency feed.
Why high latency?
A lot of the tech developed for video distribution is intended for recorded video, and not for low latency live video. The intent is to download a chunk of video, and while that plays, you download the next in the background, this happens over and over, but playing back does not commence until at least the entire first block has been read off the wire. The chunks are usually 5-10 seconds in duration, which is not a problem when you’re watching Bob’s Burgers on Netflix.
The lowest latency you can get is to simple draw the image when you receive it, but due to packetization and network latency, you’re not going to get the frames at a steady pace, which leads to stuttering which is NOT acceptable when Bob’s Burgers is being streamed.
How about WebRTC?
If you’ve ever used Google Hangouts, then you’ve probably used WebRTC. It works great when you have a peer-to-peer connection with a good, low latency connection. The peer-to-peer part is important, because part of the design is that the recipient can tell the sender to adjust its quality on demand. This is rarely feasible on a traditional IP camera, but it could eventually be implemented. WebRTC is built into some web browsers, and it supports H.264 by default, but not H.265 (AFAIK) or any of the legacy formats.
Transcoding
Yes, and no. Transcoding comes at a rather steep price if you expect to use your system as if it ran w/o transcoding. The server has to decode every frame, and then re-encode it in the proper format. Some vendors transcodes to JPEG which makes it easier for the client to handle, but puts a tremendous amount of stress on the server. Not on the encoding side, but the decoding of all those streams is pretty costly. To limit the impact on the transcoding server, you may have to alter the UI to reflect the limitation in server side resources.
Installed Applications
The trivial case is an installed application on a PC or a mobile device. Large install files are pretty annoying (and often unnecessary), but you can package all the application dependencies, and the developer can do pretty much anything they want. There’s usually a lot of disk-space available and fairly large amounts of RAM.
On a mobile device you struggle with OS fragmentation (in case of Android), but since you are writing an installed application, you are able to include all dependencies. The limitations are in computing power, storage, RAM and physical dimensions. The UI that works for a PC with a mouse is useless on a 5″ screen with a touch interface. The CPU/GPU’s are pretty powerful (for their size), but they are no-where near the processing power of a halfway decent PC. The UI has to take this into consideration as well.
“Pure” Web Clients
One issue that I have come across a few times, is that some people think the native app “uses a lot of resources”, while a web based client would somehow, magically, use fewer resources to do the same job. The native app uses 90.0% of the CPU resources to decode video, and it does so a LOT more efficient than a web client would ever be able to. So if you have a low end PC, the solution is not to demand a web client, but to lower the number of cameras on-screen.
Let me make it clear: any web client that relies on an ActiveX component to be downloaded and installed, might as well have been an installed application. ActiveX controls are compiled binaries that only run on the intended platform (IE, Windows, x86 or x64). They are usually implicitly (sometimes explicitly) left behind on the machine, and can be instantiated and used as an attack vector if you can trick a user to visit a malicious site (which is quite easy to accomplish).
The purpose of a web client is to allow someone to access the system from a random machine in the network, w/o having to install anything. An advantage is also that since there is no installer, there’s no need to constantly download and install upgrades every time you access your system. When you log in, you get the latest version of the “client”. Forget all about “better for low end” and “better performance”.
Technology
Java applets can be installed, but often setting up Java for a browser is a pain in the ass (security issues), and performance can be pretty bad.
Flash apps are problematic too, and suffer the same issues as Java applets. Flash has a decent H.264 decoder for .flv formatted streams, but no support for H.265 or legacy formats (unless you write them, from scratch.. and good luck with that 🙂 ) Furthermore, everyone with a web browser in their portfolio is trying to kill Flash due to it’s many problems.
NPAPI or other native plugin frameworks (NaCL, Pepper) did offer decent performance, but usually only worked on one or two browsers (Chrome or Firefox), and Chrome later removed support for NPAPI.
HTML5 offers the <video> tag, which can be used for “live” video. Just not low latency, and codec support is limited.
Javascript performance is now at a point (for the leading browsers) that you can write a full decoder for just about any format you can think of and get reasonable performance for 1 or 2 720p streams if you have a modern PC.
Conclusion
To get broad client side support (that truly works), you have to make compromises on the supported devices side. You cannot support every legacy codec and device and expect to get a decent client side experience on every platform.
As a general rule, I disregard arguments that “if it doesn’t work with X, then it is useless”. Too often, this argument gains traction, and to satisfy X, we sacrifice Y. I would rather support Y 100% if Y makes more sense. I’d rather serve 3 good dishes, than 10 bad ones. But in this industry, it seems that 6000 shitty dishes at an expensive “restaurant” is what people want. I don’t.