Your Sourcecode is Worthless

When Google decided to do Android, they didn’t go and copy Apple’s iOS source code. They didn’t have to: Google has enough great engineers to do their own implementation of iOS’s features; the value of Apple/iOS is in the ideas and the execution.

While a lot of people have ideas, they usually lack the technical ability to bring them to fruition, and even if they have it, they might miss the elegance and finesse of a truly awesome solution. Copying source code means you are already behind the curve; you are not gaining anything. Instead, you are teaching your team that plagiarism is of higher value than innovation. Followership instead of leadership.

Teach your team to be innovative and to execute ideas well, and you can publish your source code on a public forum if you want*

*Never actually publish your source code, but THINK of it as published; that will make you run faster and become less complacent.


NVR Integration and PSIM

Axis makes cameras, and now they make access control systems and an NVR too. Should the traditional NVR vendors be concerned about it?

Clearly, the Axis NVR team is better positioned to fully exploit all the features of the cameras. Furthermore, the Axis-only NVR does not have to spend (waste?) time trying to fit all sorts of shapes into a round hole; it ONLY has to deal with Axis’s own hardware platforms. This is similar to Apple’s OS X, which runs so smoothly because the number of hardware permutations is fairly limited.

What if Sony did the same? Come to think of it, Sony already has an NVR. But it’s no stretch of the imagination that a Sony NVR would support Sony cameras better than the “our NVR does everything” type.

In fact, when you really look at the problem, an NVR is a proprietary database plus some protocol conversion. To move a PTZ camera, you need to issue different commands depending on the camera, but the end result is always the same: the camera moves up. Writing a good database engine is really, really hard, but once you have a good one, there is little left to improve. The client and administration tools, on the other hand, continue to evolve and become increasingly complex.
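To make the “protocol conversion” part concrete, here is a minimal sketch in Python. The vendor names and URL patterns are invented for illustration (real cameras each have their own CGI or API flavor), but the shape of the problem is exactly this:

    # A minimal sketch of the protocol-conversion layer in an NVR.
    # The vendor URL patterns below are hypothetical, not real APIs.
    import urllib.request

    VENDOR_MOVE_UP = {
        "vendor_a": "http://{host}/cgi-bin/ptz?action=moveup",
        "vendor_b": "http://{host}/api/ptz/tilt?direction=up",
    }

    def move_up(vendor: str, host: str) -> None:
        # Different command per camera, same end result: the camera moves up.
        url = VENDOR_MOVE_UP[vendor].format(host=host)
        urllib.request.urlopen(url, timeout=2)

Multiply this little table by every vendor, every firmware quirk and every feature beyond PTZ, and you have the bulk of an NVR’s integration work.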

Once the protocol conversion becomes trivial, any bedroom developer will be able to create a small NVR: probably very limited in functionality, but very cheap.

The cheap NVR might have a simple interface, but what if you could get an advanced interface on top of that cheap NVR? What if you could mix cheap NVRs with NVRs provided by the camera manufacturers, and then throw access control into the mix? You get the picture.

If you are an NVR vendor, supporting “foreign” NVRs is going to be an uphill battle. If Milestone decided to support Genetec, it would take Genetec five minutes to break that compatibility and leave Milestone scrambling to update their driver. Furthermore, the end user would have to pay for two licenses, and the experience would probably be terrible.

The next time an NVR vendor says “we are an open architecture”, take a look at their docs. If the docs do not describe interoperability with a foreign client, then they are not open; an ActiveX control does NOT equate to “open”. Genetec could easily support the Milestone recorders too, but it would be cheaper and easier for Genetec to simply replace existing Milestone recorders for free (like a cuckoo chick).

In this market, you cannot get a monopoly and charge a premium. The “need to have” threshold is pretty low, and if you charge too much, someone will make a new, sufficient system and underbid you. Ultimately, NVRs might come bundled with the camera for free.

So, what about PSIM? Well, we started with DVR (which is really a good enough description of what it is), but then we decided to call it an NVR to distance ourselves from the clearly inferior DVR. Then we weren’t satisfied with that, so it became VMS. Sure, we added some bells and whistles along the way (we had mapping and video wall in NetSwitcher eons ago), so now we call it PSIM. It does the same as the old DVR. I think this kind of thing is called “marketing”.

Is ONVIF a complete failure?

I like to sit on the sideline and critique the work that other people did. I tend not to participate in committees as I find them totally counterproductive.

To make a protocol a success, I think it should be simple. Now, simplicity is in the eye of the beholder, but take the original HTTP proposal: one page.

ONVIF is not just one page. It is a LOT of pages: a few concepts, but a LOT of room to interpret things differently and mess things up (and don’t forget to add developers’ tendency to make funny errors to the mix).

I am told that ONVIF is already creating problems in the real world. An NVR vendor recently announced that a new “Samsung ONVIF driver” was available. The idea that you write ONVIF drivers for particular vendors tells me that while ONVIF might not be broken per se, there are certainly problems pretty much across the board (please correct me if I am wrong).

ONVIF is really a great example of two problematic concepts: design by committee and the second-system effect. While the committee part is fairly obvious, the second-system effect might need some clarification. Most people who participated in the committee already had experience with simple HTTP-based interfaces. Now, instead of picking the BEST parts, keeping it simple and then incrementally adding functionality, they decided to make this the “mother of all rewrites”. No more simplistic HTTP; no, let’s go all out and throw the technology du jour at it. What about SOAP? Hell YEAH!! I guess the ONVIF guys could have chosen CORBA as well, because when CORBA was hot the community wanted to use CORBA for everything too.

Here’s one example of how the ONVIF committee, in my opinion, has compromised the user for the sake of the technology: events are never pushed from the camera to the NVR; instead, the NVR has to POLL for events. This may or may not be a problem, but if you want your system to respond fast, you need to poll more often. I think it is a poor design. I don’t need my app to act like a kid in the back seat asking me “are we there yet?” every 30 seconds (or every second, if you want something that seems a little closer to real time).

Now, I don’t pretend to know the motivation for this weird design, and I suppose there is some merit to it (I just don’t know what it would be). An alternative could have been an HTTP POST with some XPath and an expiration; the response would then be a multipart/mixed response, with each payload corresponding to an event.
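For illustration only, here is a rough sketch of what that could look like from the NVR side. Everything in it (the /events URL, the subscription body, the boundary string) is hypothetical; the point is that one request yields a stream of events instead of repeated polling:

    # Sketch of a push-style event subscription: POST a filter and an
    # expiration once, then read each event as one part of a long-lived
    # multipart/mixed response. All names here are made up.
    import http.client

    def subscribe(host: str, xpath: str, expires_s: int = 300):
        conn = http.client.HTTPConnection(host, timeout=expires_s + 10)
        body = f'<Subscribe expires="{expires_s}">{xpath}</Subscribe>'
        conn.request("POST", "/events", body, {"Content-Type": "application/xml"})
        resp = conn.getresponse()        # connection stays open
        boundary = b"--event-boundary"   # would come from the Content-Type header
        part = []
        for line in resp:                # each multipart part is one event
            if line.strip() == boundary:
                if part:
                    yield b"".join(part)
                part = []
            else:
                part.append(line)

    # for event in subscribe("camera.local", "//MotionAlarm"): print(event)

No “are we there yet”; the camera speaks when it has something to say.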

Don’t take this the wrong way: the ONVIF spec is a magnificent specimen of work, but I believe you have over-engineered the solution. All we needed was for you guys to agree on the same freaking URI to get video, and we’d be done. Instead, we got something so complex that I believe it will be too difficult for a lot of people to implement.

Flamebaiters

Not to be mistaken for the common internet troll, the flamebaiter will post articles and comments that are unnecessarily provocative or offensive in order to attract traffic.

Many forums have died over the years as they started to self-radicalize. As a forum attracts people who think alike and repels dissenting voices, the participants get increasingly convinced that the forum’s ideas represent the true order of things.

Every active participant, in almost any blog or forum, has an agenda. The most honest thing, in my opinion, is to state your agenda clearly. That puts your posts in the correct context.

Motion Detection on the Edge

When we design a surveillance system, we need to carefully consider how we allocate resources and distribute workloads. When you add a camera to an NVR, the most common approach is to reduce the camera to a fairly dumb “video transmitter” and let the server do the heavy lifting.

But even though the server is much, much more powerful than your humble IP camera, it is usually taxed with a lot of work. One of the tasks the server routinely carries out is what some folks call “motion detection”. The term is usually misleading, as the NVR is not really detecting motion at all. It is detecting “changes in the frame”, which could be noise, changes in lighting, a transition from color to B/W and so on; nothing related to what we understand as “motion” at all. Analytics engines look at differences too, but they are truly looking for “motion” and not JUST changes.
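To be concrete, the kind of “change detection” I am describing boils down to something like this sketch (the thresholds are arbitrary, picked for illustration):

    # NVR-style "motion detection": it flags *change*, not motion, so
    # noise, lighting shifts and day/night transitions all trigger it.
    import numpy as np

    def frame_changed(prev: np.ndarray, curr: np.ndarray,
                      pixel_thresh: int = 25, area_thresh: float = 0.01) -> bool:
        # True if more than area_thresh of the pixels changed noticeably.
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        changed = np.count_nonzero(diff > pixel_thresh)
        return changed / diff.size > area_thresh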

Looking for changes is usually “good enough”, and it does not need to be any more than that. And if looking for “change” is all you need, then you really should let your camera do the work and free up the NVR for more important things.

The reason we initially decided to analyze frames for changes was really storage. A common HDD in those days was 200-300 MB, a 640×480 frame was considered “high resolution”, and the format was always MJPEG. Naturally, the Axis 200+ could not deliver these crisp “HD” feeds at anywhere near 30 FPS; 3-5 FPS was usually all you could get. But storing this massive amount of data became a problem, so we decided to discard frames that were almost identical.

Naturally, as time passed we got higher resolutions and higher framerates, and suddenly we were able to do MPEG4 encoding on a consumer device, in real time!!! MPEG4 and H.264 actually look at two successive frames in much the same way we do on the NVR; the codec simply “throws away” the redundant information, just as we do. Except the codec throws away just the parts of the frame that are similar to the previous one, preserving only the changes: a much, much better way of doing things.

For the codec to figure out what to throw away, it must look at two successive frames. If they are very similar, it can throw away a lot; if they are very different, it needs to send almost all the pixels. On top of that, H.264 does a lot of other things before the video is sent across the network, involving, among other things, discrete cosine transformation, quantization and Huffman encoding.

It does not seem like a far stretch that the codec implementation could provide a number that tells the camera how much two frames are alike. In a primitive way, it actually does: if the frame is large in terms of bytes, we can deduce that the frames are very different; if the frames are small, they are very similar. Naturally, this is too crude; it would not work on CBR feeds, there is no windowing, and so on.
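As a sketch, the crude byte-size approach, with the missing rolling window bolted on, could look like this (the window size and factor are arbitrary):

    # Crude change detector based on compressed frame size. Useless on
    # CBR feeds, where the encoder holds the frame size roughly constant.
    from collections import deque

    class SizeBasedDetector:
        def __init__(self, window: int = 100, factor: float = 1.5):
            self.sizes = deque(maxlen=window)  # recent frame sizes in bytes
            self.factor = factor               # how far above average counts

        def is_different(self, frame_bytes: bytes) -> bool:
            avg = sum(self.sizes) / len(self.sizes) if self.sizes else None
            self.sizes.append(len(frame_bytes))
            return avg is not None and len(frame_bytes) > avg * self.factor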

Nor does it seem totally unreasonable that the codec implementation could give a “difference parameter” for each macroblock (a small 16×16 pixel block). It is important to understand that the codec is already doing the computation; we are just asking to peek at the result. Furthermore, the codec is working on the crisp, uncompressed frames that have the highest level of fidelity, before any information has been thrown away.
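The number I am asking for is essentially the sum of absolute differences (SAD) that the encoder’s motion estimation already computes per macroblock. Recomputed naively in Python, just to show the shape of the output (assuming grayscale frames):

    # One SAD value per 16x16 macroblock; inside an encoder this falls
    # out of motion estimation for free, here it is recomputed naively.
    import numpy as np

    MB = 16  # H.264 macroblock size in pixels

    def macroblock_sad(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
        h, w = curr.shape[0] // MB, curr.shape[1] // MB
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        # Collapse each 16x16 block to a single difference number.
        return diff[:h * MB, :w * MB].reshape(h, MB, w, MB).sum(axis=(1, 3))

Any macroblock whose SAD exceeds some threshold is “in motion”, and the camera had to compute the underlying differences anyway.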

In naive implementations like the one I describe here, there is not a lot to be gained from working on the raw frames in the camera, but ask any analytics vendor whether they would prefer to work on the video BEFORE or AFTER compression and the answer will uniformly be the same: BEFORE compression. So while the benefit is not huge, it is not completely without merit.

To do the detection on the NVR, the NVR has to completely reverse the process: take the Huffman symbols and expand them back into coefficients, go from the frequency domain to the spatial domain, and only then can it start to examine the frames. You can then apply all sorts of tricks: perhaps you only look at every Nth pixel, perhaps you don’t look at every frame, perhaps you get a lot of noise from too-heavy compression, perhaps you don’t. Every single trick lowers the “quality” of the detection. Perhaps the client doesn’t care, even with severe degradation of the quality, and that’s fine by me. I am focused on providing better, more efficient solutions and offering them to those who appreciate such things.

The point is this: spending a lot of resources decoding an H.264 stream to get information that could have been gathered almost for free in the camera is not my idea of efficient allocation of resources. It is like rejecting a free apple, only to ride 30 miles to the store to buy the same, exact apple, which is now slightly bruised from being transported to the store, AND takes a lot of effort to unwrap.

In time, an NVR will not need to do much; in fact, I expect an NVR to become very similar to a NAS: cheap, easy to replace, and very scalable. This will require that cameras become a little more advanced, but my experience tells me that progress doesn’t just stop. We were amazed by 640×480 at 4 FPS when I started, and just as we laugh at that today, we will laugh at NVR-side change detection 10 years from now.

I suspect that a lot of cameras do not have the fine-grained control over the encoding process that is needed here. I assume they are using off-the-shelf H.264 encoders or reference designs offered by the chip manufacturers. For such cameras, there might not be a simple way to do on-board processing, and attempting it may jeopardize the performance of the camera; for those, you will have to spring for the expensive PCs.

Start preparing for the change 🙂

Manual Gain Control on the Client

Another day, another experiment.

A question was brought up on one of the LinkedIn forums that I follow. I am no fan of Automatic Gain Control (AGC) as it is implemented today (I’m no fan of a lot of things, it seems). The reason is that a lot of AGC implementations appear to be pretty naive: they just multiply each pixel by a value determined by averaging the frame. I have not come across a system that will allow you to apply a different gain value to different parts of the frame.
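The naive AGC I am describing is essentially this (a sketch; the target level is arbitrary):

    # Naive AGC: one global gain derived from the frame average. In a
    # dark scene the gain becomes large and amplifies sensor noise.
    import numpy as np

    def naive_agc(frame: np.ndarray, target: float = 128.0) -> np.ndarray:
        gain = target / max(frame.mean(), 1.0)  # one value for the whole frame
        return np.clip(frame * gain, 0, 255).astype(np.uint8)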

AGC introduces a lot of noise into the frame. This in turn (a) causes the compression to go to shit, and (b) wreaks havoc on the motion detection. A lot of customers will pretty much be recording noise all night.

So why not do it on the client application?

Here’s a quick example I cooked up. I am using a couple of soft-buttons, and since we are using the GPU for rendering the video, the cost of doing the multiplication is almost zero (the GPU was built for this sort of thing, so why not use it?).
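In essence the experiment is just this, sketched here with numpy standing in for the GPU (the real thing is a one-line multiply in a pixel shader, which the GPU does essentially for free):

    import numpy as np

    def apply_gain(frame: np.ndarray, gain: float) -> np.ndarray:
        # Scale every pixel by a user-chosen gain, clamped to 8-bit range.
        return np.clip(frame.astype(np.float32) * gain, 0, 255).astype(np.uint8)

    # Wired to the soft-buttons: gain += 0.1 on "+", gain -= 0.1 on "-".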

Nothing fancy.

Painted into a Corner

When I first co-wrote Milestone Surveillance Lite and XXV, we had a performance problem. My PC was a Celeron 300, and the Axis 200+ was unable to stream more than a couple of frames per second. Analog matrix systems would run at full framerate (25 or 30 fps), show 9 or even 16 cameras at any given time, and have virtually zero lag for joystick control.

As the hardware became more powerful, we were able to add more cameras. Few people ran XXV (named after its ability to show 25 cameras) at full capacity, but 25 was more than 16, and more is better. People had the theoretical option to run 25 cameras, which was a good selling point; people understood the argument instantly.

Since the jump in on-screen cameras was such a good story, we went on and said: why not place 64 cameras on the screen at once? Again, few people ever ran 64, but they had the option. And again, 64 is better than 25, and it is such a simple principle to explain: more = better.

Now we can do hundreds of cameras on the screen at once. No one can make sense of what is going on, but more is better…

Right?

What would happen if we released software that went back to 16 cameras? Would anyone buy such a system? Since we’ve kept preaching that more is better, 16 must surely be vastly inferior to a 200-camera layout.

That’s a difficult sales pitch!

We’ve painted ourselves into a corner by leading the clients to believe that “more is better”: more features, more cameras, more frames per second and so on.

Which would be true if we had infinite resources.

When a company decides to spend time on A, they are NOT spending time on B. Adding one more camera driver might mean that the IP auto-detection function does not get done; spending a lot of time optimizing the decoding pipeline means NOT spending time simplifying the UI, and so on.

I think people like the idea that they CAN go to 100 cameras, just like the speedometer suggests that I can go 160 mph if I so desire.

Truth is, we never do, and we really can’t – even if we tried.

The iPhone showed the world that people will trade more for less, if things are done right. The world was awash with phones that had myriad features. Microsoft laughed at Apple: a phone with no Bluetooth, no Exchange server support, no cut-and-paste! Microsoft had long followed the strategy that five shitty features had to be better than one good one, and now a newcomer was going to do things totally differently. No chance they would succeed.

Perhaps video surveillance is different.

Why do people REALLY need a 64-camera view? Help me out here!