NVR Integrator's Toolbox

I never realized the importance of configuration tools – until now! I suppose that as a developer I never really considered how difficult it is to design a complete video surveillance installation, and relied on the old adage “when in doubt, add machines”. But where should the cameras go, and how much data will they record? Those questions are a big part of the job, and we only provide relatively simple online “calculators” that certainly do not help you visualize the entire installation.

I guess I still have a lot to learn (even after 10 years in the business 🙂 )


Required Fidelity In Proof


Video surveillance, generally speaking, is recorded* for later review. We do so for a number of obvious reasons: we want to know who dented my car in the lot, who or what caused the window to break, who stole my wallet, laptop, lawnmower and so on. In such situations we don’t (usually) expect to know the person who is digging through our pockets. And then there are situations where unexpected intrusion is not a problem. On a factory floor, for example, a stranger would quickly be noticed and apprehended by the staff, so in those cases you might be monitoring people you do know.

If you have a bad photograph from a family vacation (you know: out of focus, slightly shaken), you are still perfectly capable of identifying the guy in the red Speedos (2 sizes too small). In fact, you would probably testify in court that you know who that guy is, even if you were not present when the snapshot was taken. But what if you were presented with unclear, smudged photos of total strangers, and then asked to drive through town, locating someone who looked just like that 20×32 pixel grab I just handed you? Now we’d be hard pressed to testify with absolute certainty that we’ve got the right guy (unless he is wearing small red Speedos).

We don’t need to record video to realize that something happened; we record because we want to know what or who caused the incident. In fact, we are often motivated to go through the archives precisely because we, of our own accord, noticed that something sinister had happened. A smudged, pixelated frame of a man digging through our pockets tells us what we already know. We need enough fidelity that we can recognize the perp, even if he is a total stranger, and certainly enough fidelity that someone who knows the perp will be able to.

In more serious cases, the frame or video will be shown to a large number of people, and we then hope that someone is able to recognize the person from the lo-fi video (and is willing to spill the beans too). But in most cases, the loss of your wallet will not make it to America’s Most Wanted, and you suddenly realize that the money spent on video only provided you with information that you could have deduced otherwise (albeit after some searching of the usual places for the missing wallet, but you do the ransacking before you review the video anyhow 🙂 )

There are certainly situations where you do not need high fidelity, such as long-term surveillance of staff or of a previously identified person. At that point you already know who you are looking at, so identification is not the issue, but their actions might be.

Most systems allow you to “not record if nothing happens”. This is an extremely crude form of compression: frame 1 is almost exactly like frame 2, so we discard frame 2 completely. But think about the situation where someone fraudulently claims you are liable for something that happened to them. Is it proof to say “I have no recordings, so nothing happened”? Proving that “nothing” happened requires almost no fidelity; even a 160×120 feed from a storage room is good enough to prove that no-one entered between 5 am and 10 am. If someone DID enter, 160×120 is almost certainly not good enough to identify who (assuming it’s a stranger).
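To put rough numbers on that difference in fidelity, here is a minimal sketch that estimates how many horizontal pixels land on a face. It assumes the camera's horizontal field of view covers a known width at the subject's distance and that a face is about 0.16 m wide; both figures are illustrative assumptions, not numbers from this post.

```python
# Rough pixel-density estimate: how many pixels land on a face?
# Assumptions (for illustration only): the camera's horizontal field of view
# spans `scene_width_m` metres at the subject's distance, and a face is about
# 0.16 m wide. Lens distortion and compression artefacts are ignored.
FACE_WIDTH_M = 0.16


def pixels_per_metre(horizontal_pixels: int, scene_width_m: float) -> float:
    """Horizontal pixels available per metre of scene at the subject's distance."""
    if scene_width_m <= 0:
        raise ValueError("scene width must be positive")
    return horizontal_pixels / scene_width_m


def pixels_across_face(horizontal_pixels: int, scene_width_m: float) -> float:
    """Approximate number of horizontal pixels covering one face."""
    return pixels_per_metre(horizontal_pixels, scene_width_m) * FACE_WIDTH_M


if __name__ == "__main__":
    # The same 5 m wide storage room, seen by a 160x120 and a 1920x1080 camera.
    for width in (160, 1920):
        print(f"{width:>4} px wide: ~{pixels_across_face(width, 5.0):.1f} px across a face")
```

Under these assumptions the 160×120 feed puts about 5 pixels across a face, against roughly 61 for a 1080p camera covering the same room. That is plenty to show that someone walked in, but the sketch deliberately does not say how many pixels are "enough" to identify a stranger; that threshold is a separate judgement.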

What this means is that the recorder should not only have a “live and recorded” mode, but should also be able to change the feed when “things” happen.
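One way to picture such a recorder is a small state machine that records a cheap profile by default and switches to a high-fidelity profile for a while after each event. The sketch below is hypothetical: the profile values, the `ProfileSwitcher` class and the hold time are invented for illustration, and it assumes the camera can deliver both streams on request (the call that actually asks the camera to switch is not shown).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StreamProfile:
    width: int
    height: int
    fps: int


# Hypothetical profiles: a cheap feed that proves "nothing happened",
# and a high-fidelity feed meant for identifying strangers.
IDLE = StreamProfile(160, 120, 2)
EVENT = StreamProfile(1920, 1080, 15)


class ProfileSwitcher:
    """Chooses the recording profile based on recent events (sketch only)."""

    def __init__(self, hold_seconds: float = 10.0) -> None:
        # Keep recording at high fidelity this long after the most recent event.
        self.hold_seconds = hold_seconds
        self.last_event: Optional[float] = None

    def on_event(self, t: float) -> None:
        """Call when a motion detector, input closure, etc. fires at time t (seconds)."""
        self.last_event = t

    def profile_at(self, t: float) -> StreamProfile:
        """Profile the recorder should use at time t."""
        if self.last_event is not None and 0 <= t - self.last_event <= self.hold_seconds:
            return EVENT
        return IDLE


if __name__ == "__main__":
    switcher = ProfileSwitcher(hold_seconds=10)
    switcher.on_event(5.0)
    for t in (0.0, 12.0, 16.0):
        print(t, switcher.profile_at(t))
```

One design consequence worth noting: the switch only happens after the event is detected, so unless the recorder also keeps a short pre-event buffer at high fidelity, the first moments of an incident are stored at the idle profile.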

H.264 and MPEG-4 using variable bit rate offer some of what I am requesting: in a static environment the P-frames are relatively small. But they present other problems. We’ve discussed the long GOP problem before, and altering the frame rate and resolution of H.264 video requires a lot of computational power, so pruning is usually out of the question for these formats. Then there is browsing on low-bandwidth devices, where transcoding is needed.
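To show why pruning is so coarse, here is a minimal sketch. It assumes an I/P-only stream where each P-frame depends only on the frame before it, and represents the stream as a hypothetical list of frame types. Under that assumption the frames that can stand on their own are the I-frames, so the cheapest prune without decoding leaves one frame per GOP. (Keeping each I-frame plus its first few P-frames is also possible, but that gives bursts rather than an even frame rate; an evenly spaced subset generally means decoding and re-encoding.)

```python
from typing import List


def keyframe_indices(frame_types: List[str]) -> List[int]:
    """Indices of frames that survive an I-frame-only prune.

    Assumes an I/P-only stream where every P-frame depends on the frame
    before it, so a P-frame is useless once its predecessor is dropped.
    """
    return [i for i, kind in enumerate(frame_types) if kind == "I"]


def pruned_fps(frame_types: List[str], source_fps: float) -> float:
    """Average frame rate left after pruning to I-frames only."""
    if not frame_types:
        return 0.0
    duration_s = len(frame_types) / source_fps
    return len(keyframe_indices(frame_types)) / duration_s


if __name__ == "__main__":
    # Hypothetical stream: 25 fps with one I-frame every 25 frames, 4 seconds long.
    stream = (["I"] + ["P"] * 24) * 4
    print(keyframe_indices(stream))     # one surviving frame per GOP
    print(pruned_fps(stream, 25.0))     # 25 fps becomes 1 fps
```

With long, variable-length GOPs the surviving frame rate gets even lower and less predictable.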

What to record, when, and what you can do with the recordings is absolutely not trivial. Perhaps massive storage solutions will render the problem moot, perhaps extremely fast CPUs will, but cracking the issue now would offer an advantage to the first ones to get it right.

*Although I have encountered situations where video was not recorded at all, those have been rare, and the reasons for not recording were usually political.

Pros and Cons of Web Interfaces in Video Surveillance Applications

Wow – longest headline ever.

A very common request is a web-based interface to the video surveillance system. An often-used argument is that the end user won’t have to install anything, and that the system is readily available from a variety of platforms. After all, google.com works on Macs, PCs, my 5-year-old cell phone and my wife’s spanking new iPhone*.

Most people are probably familiar with the ActiveX controls that are needed when streaming various video formats from a camera to a web browser. While you may not think that you are “installing” anything (since the ActiveX control or plugin does not necessarily appear in the Add/Remove Programs window), you actually DID. A piece of executable code was downloaded and written to your hard drive, not unlike downloading and running a regular installer. ActiveX controls may require numerous supporting DLLs, which will be downloaded on demand. So even if the installation method is a little different for ActiveX, you are technically installing something on the machine.

ActiveX controls are platform dependent (you can’t use a Windows control on a Mac), and they present a security risk unless managed carefully. Then there are Java applets. These are sort of platform independent, but can be (and always are) a little slower than ActiveX. Adobe Flash is another option, but it won’t work on my wife’s iPhone, and the same goes for Silverlight.

Although the second part of the argument is technically true, there are some costs to bear. Getting text and static images on the screen using baseline HTML is trivial, but interactivity and streaming video are a different beast altogether. A commonly used technique is AJAX, which pretty much boils down to issuing asynchronous requests to a server using an XMLHttpRequest object. However, the way you obtain that object differs from browser to browser (older versions of IE expose it as an ActiveX object), so you need to write two different pieces of code to accomplish the same feat, on the same OS! Granted, the handling of the different browsers is well documented, and libraries exist that help the developer overcome these annoyances, but for all intents and purposes, we have just doubled the number of platforms (IE and “everybody else”). The same applies to CSS, and even PNG handling.

Some companies will happily put together a “web solution”. But if you are still pretty much locked into Windows and IE, and you STILL need to install a bunch of ActiveX controls, what’s the point? Often the web solution is a little less useful than the traditional Windows application, since the developers are limited to the browser’s capabilities, whereas the old-skool application can pull all the levers in the system.

Recently Adobe added GPU-accelerated video playback to Flash, and HTML5 is supposed to support H.264. JavaScript is now very fast on a wide range of browsers (IE 9 was shown at MIX10 and looks promising, and Chrome has always had fast JS). So perhaps a viable solution for desktop PCs and Macs will be available before too long.

*actually she has a Nokia phone, but I needed to add the iPhone in there somehow.

Standards in IP Video

To grow the pie, standards are needed; imagine if all web servers had slightly proprietary protocols, so that you needed an Apache client if you connected to an Apache server, and another for IIS servers. What if you couldn’t move the HTML from the old IIS to the new Apache? Imagine if for every new version of IIS you needed to re-write parts of the HTML, and distribute a new client to all the users.

In broadcast video we have NTSC, PAL and SECAM and a bunch of variations of these, yet TV has attained omnipresence. But those standards have been stable for many years, while it seems as if our industry has a different protocol for each camera, and even for different versions of the same camera.

There are 2 big movements to try and define a standard. One is driven by the NVR side, and the other by the big camera manufacturers. I believe the odds are with ONVIF, if they can enforce an element of dictatorship and avoid designing another camel*.

A winner must emerge for the benefit of the consumer. Even 4 different standards would be acceptable, and would allow NVR companies to focus on what matters to the end user: stability, flexibility, ease of use, and useful features that improve security and increase the efficiency of the staff. In the ideal situation, a client would simply buy a conforming camera and plug it in, with the certainty that it would “just work”.

There are roughly 3 areas of interoperability that must be addressed.

  • Discovery and Capability
  • Video and Audio formats
  • IOs

Discovery is partially addressed by UPnP, and to some extent ARP and DHCP sniffing can be helpful on LANs. But for remote cameras, a simple XML response to a request on a fixed address would suffice.
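As an illustration of how little such a response would need to contain, here is a sketch of an NVR parsing a capability document. The XML layout, element names and attribute names are invented for this example and are not taken from ONVIF or any other standard.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hypothetical capability document a camera might return when queried on a
# fixed address. The element and attribute names are invented for this sketch.
SAMPLE_RESPONSE = """<camera model="ExampleCam 100">
  <video codec="H.264" maxWidth="1920" maxHeight="1080"/>
  <video codec="JPEG" maxWidth="1280" maxHeight="720"/>
  <inputs count="2"/>
  <relays count="1"/>
</camera>"""


@dataclass
class Capabilities:
    model: str
    codecs: List[str] = field(default_factory=list)
    inputs: int = 0
    relays: int = 0


def _count(root: ET.Element, tag: str) -> int:
    """Read the 'count' attribute of a child element, defaulting to 0."""
    el = root.find(tag)
    return int(el.get("count", "0")) if el is not None else 0


def parse_capabilities(xml_text: str) -> Capabilities:
    """Turn the (hypothetical) capability XML into a Capabilities record."""
    root = ET.fromstring(xml_text)
    return Capabilities(
        model=root.get("model", "unknown"),
        codecs=[v.get("codec", "") for v in root.findall("video")],
        inputs=_count(root, "inputs"),
        relays=_count(root, "relays"),
    )


if __name__ == "__main__":
    print(parse_capabilities(SAMPLE_RESPONSE))
```

Missing elements fall back to zero, so an older or simpler camera that omits, say, the relay entry still parses.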

Video and audio formats are standardized, but some manufacturers provide proprietary versions of various codecs (Mobotix has a proprietary JPEG codec, for example). I suggested a couple of years ago that we define a reference decoder for each format (I am not sure it was ever done); if the video or audio cannot be decoded by the reference decoder, the camera should not be considered conforming.

A standardized way for cameras to notify an NVR of the closure of an input, and a way to manipulate a relay, would also be needed.
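Here is a sketch of what such I/O messages could look like: one message for "input N changed state" and one command for "set relay N". The tag and attribute names are hypothetical, invented for illustration rather than taken from any standard. The point is only that a tiny, fixed vocabulary would cover this area.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical message formats for input notifications and relay control.
# Tag and attribute names are invented for this sketch, not taken from a standard.


@dataclass(frozen=True)
class InputEvent:
    camera_id: str
    port: int
    closed: bool  # True when the contact closes, False when it opens


def encode_input_event(ev: InputEvent) -> str:
    """Camera -> NVR: report a change on an input port."""
    el = ET.Element("inputEvent", camera=ev.camera_id, port=str(ev.port),
                    state="closed" if ev.closed else "open")
    return ET.tostring(el, encoding="unicode")


def decode_input_event(xml_text: str) -> InputEvent:
    """NVR side: turn the message back into an InputEvent."""
    el = ET.fromstring(xml_text)
    if el.tag != "inputEvent":
        raise ValueError(f"unexpected message type: {el.tag}")
    return InputEvent(el.get("camera", ""), int(el.get("port", "0")),
                      el.get("state") == "closed")


def encode_relay_command(port: int, energize: bool) -> str:
    """NVR -> camera: switch a relay output on or off."""
    el = ET.Element("setRelay", port=str(port), state="on" if energize else "off")
    return ET.tostring(el, encoding="unicode")


if __name__ == "__main__":
    msg = encode_input_event(InputEvent("cam-7", 1, closed=True))
    print(msg)
    print(decode_input_event(msg))
    print(encode_relay_command(2, energize=True))
```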

On top of all this comes the need for standardized authentication and encryption schemes (trivial), but that is beyond the scope of this humble blog.

As an example, the current ONVIF core spec is 223 pages and is based on SOAP, which makes it relatively simple to use from modern languages, if a little bloated. Here is what an error response looks like:

HTTP/1.1 500 Internal Server Error
CONTENT-LENGTH: bytes in body
CONTENT-TYPE: application/soap+xml; charset="utf-8"
DATE: when response was generated
<?xml version="1.0" ?>
<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
xmlns:ter="http://www.onvif.org/ver10/error"
xmlns:xs="http://www.w3.org/2000/10/XMLSchema">
<soapenv:Body>
  <soapenv:Fault>
    <soapenv:Code>
      <soapenv:Value>fault code </soapenv:Value>
      <soapenv:Subcode>
          <soapenv:Value>ter:fault subcode</soapenv:Value>
          <soapenv:Subcode>
             <soapenv:Value>ter:fault subcode</soapenv:Value>
          </soapenv:Subcode>
      </soapenv:Subcode>
    </soapenv:Code>
    <soapenv:Reason>
      <soapenv:Text xml:lang="en">fault reason</soapenv:Text>
    </soapenv:Reason>
    <soapenv:Node>http://www.w3.org/2003/05/soap-envelope/node/ultimateReceiver</soapenv:Node>
    <soapenv:Role>http://www.w3.org/2003/05/soap-envelope/role/ultimateReceiver</soapenv:Role>
    <soapenv:Detail>
       <soapenv:Text>fault detail</soapenv:Text>
    </soapenv:Detail>
  </soapenv:Fault>
</soapenv:Body>
</soapenv:Envelope>

The long message contains just 5 chunks of useful information: the fault code (a SOAP-defined value such as env:Sender, not a number), two levels of subcode, the reason (“not supported”) and the details (“don’t do this again”). The rest is called the “envelope”, and that is one giant envelope for such a small message 🙂
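To show how small the useful part really is, here is a sketch that extracts those five fields with Python's standard ElementTree. The sample fills in the placeholders from the fault above with illustrative values; a real camera chooses its own codes and texts. The code and subcode values are QNames stored as plain text, so the sketch returns them with their prefixes (soapenv:, ter:) unresolved.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

NS = {"soapenv": "http://www.w3.org/2003/05/soap-envelope"}

# The fault from above with the placeholders filled in. The values are
# illustrative; a real camera decides which codes and texts it sends.
SAMPLE_FAULT = """<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"
    xmlns:ter="http://www.onvif.org/ver10/error">
  <soapenv:Body>
    <soapenv:Fault>
      <soapenv:Code>
        <soapenv:Value>soapenv:Sender</soapenv:Value>
        <soapenv:Subcode>
          <soapenv:Value>ter:InvalidArgVal</soapenv:Value>
          <soapenv:Subcode>
            <soapenv:Value>ter:NoProfile</soapenv:Value>
          </soapenv:Subcode>
        </soapenv:Subcode>
      </soapenv:Code>
      <soapenv:Reason>
        <soapenv:Text xml:lang="en">not supported</soapenv:Text>
      </soapenv:Reason>
      <soapenv:Detail>
        <soapenv:Text>don't do this again</soapenv:Text>
      </soapenv:Detail>
    </soapenv:Fault>
  </soapenv:Body>
</soapenv:Envelope>"""


def parse_fault(xml_text: str) -> Dict[str, Union[str, List[str]]]:
    """Pull the handful of useful fields out of a SOAP 1.2 fault."""
    root = ET.fromstring(xml_text)
    fault = root.find("soapenv:Body/soapenv:Fault", NS)
    if fault is None:
        raise ValueError("response does not contain a SOAP fault")
    code = fault.find("soapenv:Code", NS)
    if code is None:
        raise ValueError("fault has no Code element")
    # Subcodes nest inside each other, so walk down the chain.
    subcodes: List[str] = []
    node = code.find("soapenv:Subcode", NS)
    while node is not None:
        subcodes.append(node.findtext("soapenv:Value", default="", namespaces=NS).strip())
        node = node.find("soapenv:Subcode", NS)
    return {
        "code": code.findtext("soapenv:Value", default="", namespaces=NS).strip(),
        "subcodes": subcodes,
        "reason": fault.findtext("soapenv:Reason/soapenv:Text", default="", namespaces=NS).strip(),
        "detail": fault.findtext("soapenv:Detail/soapenv:Text", default="", namespaces=NS).strip(),
    }


if __name__ == "__main__":
    print(parse_fault(SAMPLE_FAULT))
```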

Even if I have my peeves about SOAP and its bloatedness, it is MILES better to have a standard than 236 different ways to get an error code from a camera.

Regardless of who defines the standard, the end user will be the winner. We will then have to compete on different parameters, and I can’t wait for that to happen.

*a camel is a horse designed by committee

Are All Problems Serious?

Has someone ever approached you with a usability problem that was categorized as “serious” and needed to be fixed asap? Other people might chime in, and now the issue is “very serious”. But more often than not, there are no real metrics to determine the actual severity of an issue. A lot of the time it is based on gut feelings, and it frequently becomes a personal and/or political issue too.

It turns out, as we probably instinctively know, that one man’s serious is another man’s benign. According to Rolf Molich:

The CUE-2 teams reported 310 different usability problems. The most frequently reported problem was reported by seven of the nine teams. Only six problems were reported by more than half of the teams, while 232 problems (75 percent) were reported only once. Many of the problems that were classified as “serious” were only reported by a single team. Even the tasks used by most or all teams produced very different results—around 70 percent of the findings for each of these common tasks were unique.

*Emphasis is mine.
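For anyone who wants to run the same kind of tally on their own usability reviews, here is a minimal sketch. It assumes the hard part is already done, namely deciding which reports from different teams describe the same problem and giving each problem an ID. The team names and problem IDs in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def uniqueness(findings: Dict[str, Set[str]]) -> Dict[str, float]:
    """Summarize how many distinct problems were reported by only one team.

    `findings` maps a team name to the set of problem IDs that team reported;
    matching reports that describe the "same" problem is assumed done already.
    """
    # Count, for every problem, how many teams reported it.
    teams_per_problem = Counter(p for problems in findings.values() for p in problems)
    distinct = len(teams_per_problem)
    once = sum(1 for n in teams_per_problem.values() if n == 1)
    return {
        "distinct": distinct,
        "reported_once": once,
        "share_once": once / distinct if distinct else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical data: three teams, five distinct problems.
    example = {"A": {"p1", "p2", "p3"}, "B": {"p1", "p4"}, "C": {"p1", "p5"}}
    print(uniqueness(example))
    # The CUE-2 figures quoted above: 232 of 310 problems reported only once.
    print(f"{232 / 310:.1%}")
```

For the CUE-2 numbers, 232 of 310 works out to about 74.8%, which the quote rounds to 75 percent.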

I speculate that this has something to do with group dynamics. Frequently you will see that an alpha-male (or female) will emerge in the group, and what he/she deems important is considered serious by everyone else. The group might exaggerate issues to appease the alpha, making things appear worse than they really are. The alpha might try to assert his dominance by elevating his observations too.

In real life, problems are often relayed through a long series of people: the end user tells his manager, the manager tells his manager, who then tells the integrator, who tells the distributor, who talks to the rep, who tells his manager, who then brings it up in a meeting with the CTO, who then tells the team lead or department manager, and then, finally, it lands with some programmer in a completely different shape.

A List Apart has the full story

The Trouble With H.264

In the last couple of years we have seen a proliferation of H.264-capable cameras. As technology improved we were able to push ever higher resolutions through IP cameras, and today 1080p video is not an uncommon request.

H.264 and its siblings (various MPEG formats) were all designed primarily for one purpose: forward streaming. This is especially true in video surveillance applications, where latency is the enemy. B-frames are out of the question, because a B-frame can reference a later frame and the encoder has to wait for that frame before encoding it, so a typical surveillance camera provides ONLY I- and P-frames. Technically, the H.264 standard does allow a bunch of tricks for bi-directional access (seeking etc.), but most cameras do not support these features yet.

In a traditional surveillance situation, 90% of all video is streamed to disk and never watched. Only a very small fraction of recorded video is ever recalled to investigate an incident. These numbers are rough estimates, and all setups vary.

But we have to store the video – “Absence of evidence is not evidence of absence”, the saying goes, so going to court claiming that “nothing happened, because I have no recordings” will always be a losing case. If H.264 offers great compression ratios, why not record in that format and save drive space?
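The drive-space side of that question is simple arithmetic. Here is a minimal sketch that converts an average bitrate into storage; the MJPEG and H.264 bitrates in the example are hypothetical averages chosen for illustration, not measured figures, and real variable-bit-rate streams swing with scene activity.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def storage_gb(bitrate_mbps: float, days: float, cameras: int = 1) -> float:
    """Disk space in GB (10**9 bytes) for continuous recording at a constant bitrate.

    Real variable-bit-rate streams fluctuate with scene activity, so treat the
    bitrate as an average.
    """
    bytes_per_second = bitrate_mbps * 1_000_000 / 8
    return bytes_per_second * SECONDS_PER_DAY * days * cameras / 1e9


if __name__ == "__main__":
    # Hypothetical averages for one 1080p camera; not measured figures.
    for label, mbps in (("MJPEG", 12.0), ("H.264", 2.0)):
        print(f"{label}: {storage_gb(mbps, days=30):,.0f} GB per camera per month")
```

With those assumed bitrates, a month of continuous recording comes to 3,888 GB per camera for MJPEG versus 648 GB for H.264, which is exactly why H.264 is so tempting as a storage format.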

Storing video in H.264 makes a lot of sense from a storage and bandwidth viewpoint, but in video processing there is another old proverb: you can have speed, size and quality; now pick two.

For H.264 we’ve picked size and quality over speed (processing speed). H.264 is a complicated format that takes considerable processing power to encode (camera) and decode (client). Decoding speed on the client is not a problem when you run 4 to 9 high-res H.264 streams, but some clients (ours included) offer views of 64 or even 100 cameras on screen at once, and users naturally expect those views to work with ANY combination of cameras.

Now, consider what happens when the operator hits “browse”, and clicks the “step reverse” button.

H.264 video consists of GOPs (Groups of Pictures). Each picture in the group depends on the previous one (again, for surveillance), so to get to the last picture you need to decode all the preceding pictures. Now multiply this by 64.

To increase H.264 compression, the GOP can be of variable length. This makes a lot of sense in surveillance situations where the scene is static, and long GOPs will provide very high compression. But it makes browsing even more stressful for the client, since it now needs to decode 50 or 60 frames per camera.
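Here is a simplified cost model for the "step reverse" case. It assumes each P-frame references only its predecessor and that the client keeps no decoded frames between steps. Under those assumptions, showing frame k of a GOP means decoding frames 0 through k, so stepping backwards through a whole GOP of n frames costs 1 + 2 + ... + n decodes. A cache that holds the decoded GOP brings that down to n.

```python
def decodes_to_show(frame_index: int) -> int:
    """Frames that must be decoded to display frame `frame_index` of a GOP.

    Frame 0 is the I-frame; each later frame is a P-frame that depends on the
    one before it, so showing frame k means decoding frames 0..k.
    """
    return frame_index + 1


def reverse_step_cost(gop_length: int, cached: bool = False) -> int:
    """Decodes needed to step backwards from the last frame of a GOP to its I-frame."""
    if cached:
        # Decode the whole GOP once and serve earlier frames from memory.
        return gop_length
    # Without a cache every step starts over from the I-frame: 1 + 2 + ... + n.
    return sum(decodes_to_show(k) for k in range(gop_length))


if __name__ == "__main__":
    gop, cameras = 60, 64
    print("per camera, no cache:", reverse_step_cost(gop))
    print("64 cameras, no cache:", cameras * reverse_step_cost(gop))
    print("64 cameras, cached:  ", cameras * reverse_step_cost(gop, cached=True))
```

For a 60-frame GOP that is 1,830 decodes per camera, or 117,120 across 64 cameras, versus 3,840 with a cache. The cache trades CPU for memory, though: one 8-bit YUV 4:2:0 1080p frame is 1920 × 1080 × 1.5 = 3,110,400 bytes, so holding a decoded 60-frame GOP for each of 64 cameras takes about 11.9 GB. Either way, the 64-camera browse case is expensive.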

With more than 25 cameras, the view needs to be treated as a special mode. In my experience, most operators use it as a selection page, where they just “pick a camera from a matrix of thumbnails”.

Perhaps what we need is a better way for operators to select cameras, and get an overview of the situation, not a new 144 camera view.

Usability in Surveillance Software

The next frontier in Video Surveillance Software is USABILITY!!


I hope that this is a lasting trend. I think it is evident that the Ocularis client puts a lot of focus on the user experience, partly because we all enjoy a good UI ourselves, but also because it makes financial sense to make things SIMPLE for the users: for us (fewer support calls) and for the end user (less training, happier employees, fewer unproductive hours).

Users finally starting to make demands
There has been a tendency for vendors to push products rather than end users pulling (requesting a particular solution). A lot of players have the same middle-of-the-road design: a tree with some devices on the left and a matrix on the right. Almost all systems use this model, so the differentiation between products boils down to frames per second, the number of supported camera vendors and, finally, a “feature fest”.

Not so simple
A simple UI is not necessarily easy to implement, and, perhaps more importantly, it is very hard to agree on a usable design during the development process. It really requires a director who is, for lack of a better term, an “arrogant a-hole”.

In large teams you will almost always find a number of people who believe that THEIR feature is absolutely critical for the application. This is where the arrogance is needed; sometimes the director must state “this is an awful idea – idea discarded!”, and then prepare to be on the receiving end of a barrage of complaints and insults about his or her intelligence.

Michelangelo is supposed to have said that creating the statue of David was easy: he simply removed all the “non-David” from the block of marble. In the design of an application we can “do anything”, so the trick is to decide what NOT to do.

Naturally, just being arrogant is not enough to make a cool product (I wish it was 🙂 )

Bells and Whistles
Creating a remarkable UI is not just about nice gradients and cornflower-blue icons (wink, wink). Google is an example of a rather dull, almost childlike look and feel, but the search engine gives you what you need, when you need it, and usually works as you EXPECT it to work. Once the functionality is in place, THEN you can add bells and whistles.

While nice graphics are secondary to a well functioning UI, they often work as a great motivator for the developers on the team. The idea that the product will look very polished and that the company VALUES aesthetics and good looks sends a signal. Even the most accomplished developer will feel less motivated if the application is “polluted” with garish colors, inconsistent icon design, strange alignments and so on.

Inherent Complexity
Some things are just complicated. An example is the creation of rules in a system. You can eliminate SOME of the complexity by offering a stringent entry system that prevents syntax errors, use of uninitialized variables etc. (all the problems associated with conventional programming), but the core complexity can never be removed: for all intents and purposes, you are asking the operator to “write a program”. And programs may “work”, yet not do what was expected. After all, every application you have ever run on a PC was built after the compiler reported “Compile complete: 0 errors”. That CERTAINLY does not mean the application works as intended.
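Here is a small illustration of that gap between "valid" and "correct", using a hypothetical rule type (not taken from any real product): "record on motion between start_hour and end_hour". The entry form's validation can reject an hour of 25, but it cannot tell that the naive check for a night shift from 22 to 6 never fires, because that window crosses midnight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeWindowRule:
    """Hypothetical rule: "record on motion between start_hour and end_hour"."""
    start_hour: int
    end_hour: int

    def validate(self) -> None:
        # A structured entry form can reject malformed input like this...
        for hour in (self.start_hour, self.end_hour):
            if not 0 <= hour <= 23:
                raise ValueError(f"hour out of range: {hour}")

    def active_naive(self, hour: int) -> bool:
        # ...but it cannot catch a logic error: for a window that crosses
        # midnight (start > end) this condition is never true.
        return self.start_hour <= hour < self.end_hour

    def active(self, hour: int) -> bool:
        if self.start_hour <= self.end_hour:
            return self.start_hour <= hour < self.end_hour
        # The window wraps past midnight, e.g. 22 -> 6.
        return hour >= self.start_hour or hour < self.end_hour


if __name__ == "__main__":
    night = TimeWindowRule(22, 6)
    night.validate()  # passes: the rule is well-formed
    print([h for h in range(24) if night.active_naive(h)])  # never fires
    print([h for h in range(24) if night.active(h)])
```

Both versions "compile with 0 errors" and pass validation; only testing against the intended behaviour reveals that the naive version silently records nothing at night.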


All in all, the future looks bright for the end-users of surveillance software, or at least – less frustrating.