Low Light Problems

A brief primer on the reasons for the noise observed in footage recorded in low light conditions.


Low light conditions in video surveillance are always a problem. When the available light diminishes, there are generally only a few knobs to turn – gain and shutter speed.

Slow Shutter Woes

Regular photography in low light conditions usually calls for a tripod and a timer. This produces wonderfully high-fidelity images of things that are stationary; unfortunately, moving objects – people and vehicles – are reduced to a blurry smear, making it extremely hard to recognize and identify people in the frame. If you need to recognize the person in the frame, you need a high shutter speed.

Grain in the Gain

If the shutter is fast (to avoid blurring), the sensor will only receive a few photons in each frame. If the sensor produces an output, say, from 0-255, where zero means “I did not detect ANY light” – in other words “black” – and 255 means “lots of light” or “white”, then we’d generally like our sensor to produce values nicely distributed between these extremes for a regular scene. Every time we sample an area of the sensor we get a slightly different count: sometimes we count 64 photons, the next frame there are 65, then 64, 62, 63, and so on. So the pixel changes from frame to frame. This variation is just random noise that we can deal with in different ways.

But when the light is dim, most of our values are low – say from 0-32 (with a couple of 255’s from distant lights and the odd diode on a piece of equipment). The noise does not go away, but since our values are now lower, the impact of counting one photon more or less is much bigger. The signal-to-noise ratio goes down.

All these low values would give you an almost black frame, so we simply multiply all our values by 8 to get them nicely spread out between 0 and 255 again. The 255’s are then saturated, but that’s not the biggest problem. Recall how we get slightly different photon counts in each frame? Those small changes are now also amplified by 8, so a pixel that counts 8, 9, 7 photons is displayed as 64, 72, 56 and so on. We also amplify the noise, and that’s why we see those grainy images.
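To make this concrete, here is a tiny simulation of a single pixel. It is a sketch only: the photon counts and the 8x gain factor are made-up illustration numbers, and a Gaussian with sigma = √mean stands in for true Poisson shot noise.

```python
import random

random.seed(1)

def sample_pixel(mean_photons, frames=5):
    # Photon arrival is Poisson-distributed; for this sketch a Gaussian
    # with sigma = sqrt(mean) is a close stand-in.
    return [max(0, round(random.gauss(mean_photons, mean_photons ** 0.5)))
            for _ in range(frames)]

bright = sample_pixel(128)  # jitters by roughly +/- 11 around 128 (SNR ~ 11)
dim = sample_pixel(16)      # jitters by roughly +/- 4 around 16 (SNR ~ 4)

# 8x digital gain stretches the dim pixel back towards 0-255, but the
# +/- 4 photon jitter becomes +/- 32 of output jitter - the visible grain.
gained = [min(255, v * 8) for v in dim]

print(bright, dim, gained)
```

Note that the bright pixel actually fluctuates by *more* photons than the dim one, but relative to its signal the fluctuation is far smaller – which is exactly why the gained-up dim pixel looks so grainy.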

Compressing noise

White noise does not compress very well. In some audio codecs we identify segments as noise and ask the receiver to “just make some noise for the next 100 ms, and use this filter to shape it”. In MPEG4 and H.264 we can get some pretty weird results, and JPEGs suddenly grow 100-400% – for frames that contain very little usable information.
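A quick way to convince yourself is to run a general-purpose lossless compressor over a flat frame and a noisy one. zlib is obviously not a video codec, so the sizes are only illustrative:

```python
import random
import zlib

random.seed(0)

# A flat grey "frame" of 10,000 pixels compresses to almost nothing...
flat = bytes([128] * 10000)

# ...while pure noise barely compresses at all.
noisy = bytes(random.randrange(256) for _ in range(10000))

print(len(zlib.compress(flat)))   # a few dozen bytes
print(len(zlib.compress(noisy)))  # roughly the original 10,000
```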

What to do about it

There are various filters and algorithms that reduce the noise. You can use spatial and temporal analysis to try to minimize noise, usually performing a sort of averaging in time, but this should be done PRIOR to compression to allow the compressor to function as intended. If a large area of a scene is – well – black, then why not just accept that it is black, and set all those pixels to zero (decimation)?
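A minimal sketch of both ideas – an exponential running average in time, followed by decimation of near-black pixels – could look like this. The weight and threshold are arbitrary illustration values, not tuned for any real camera:

```python
# Frame values are 0-255 pixel intensities; the running average damps
# frame-to-frame noise, and anything below a threshold is forced to
# true black so it compresses away.

BLACK_THRESHOLD = 8  # hypothetical cut-off for "accept that it is black"

def temporal_filter(prev_avg, new_frame, weight=0.25):
    """Exponential running average: heavier history = more smoothing."""
    return [round((1 - weight) * p + weight * n)
            for p, n in zip(prev_avg, new_frame)]

def decimate(frame, threshold=BLACK_THRESHOLD):
    """Clamp near-black pixels to exactly zero."""
    return [0 if v < threshold else v for v in frame]

# Three noisy near-black pixels jittering between 0 and 6, plus one
# real highlight at ~200 (the odd diode on a piece of equipment).
frames = [[3, 5, 2, 200], [6, 1, 4, 201], [2, 4, 0, 199]]
avg = frames[0]
for f in frames[1:]:
    avg = temporal_filter(avg, f)

print(decimate(avg))  # -> [0, 0, 0, 200]
```

The jittering dark pixels end up as stable zeros, which any compressor handles very well, while the genuine highlight survives intact.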

Other options are a little more simplistic, and expensive (throwing money at a problem usually helps): larger sensors and suitable lenses will certainly allow you to improve fidelity in low light conditions.

Some good examples of the concepts can be found here.

NVR Integrators Toolbox

I never realized the importance of configuration tools – until now! I suppose as a developer I never really considered the difficulty of designing a complete video surveillance installation, but relied on the old adage “when in doubt, add machines”. But where to place the cameras, how much data they will record and so on are a big part of these questions, and we only provide relatively simple online “calculators” that certainly do not help you visualize the entire installation.

I guess I still have a lot to learn (even after 10 years in the business 🙂 )

Required Fidelity In Proof

Video surveillance, generally speaking, is recorded* for later review. We do so for a number of obvious reasons: we want to know who dented our car in the lot, who or what caused the window to break, who stole our wallet, laptop, lawnmower and so on. In such situations we don’t (usually) expect to know the person who is digging through our pockets. And then there are situations where unexpected intrusion is not a problem – a factory floor, say, where a stranger would quickly be observed and apprehended by the staff; in those cases you might be monitoring people you do know.

If you have a bad photograph from a family vacation – you know, out of focus, slightly shaken – you are still very capable of identifying the guy in the red Speedos (two sizes too small); in fact you would probably testify in court that you know who that guy is, even if you were not present when the snapshot was taken. But what if you were presented with unclear, smudged photos of total strangers – and then asked to drive through town, locating someone who looked just like that 20×32 pixel grab I just handed you? Now we’d be hard pressed to testify with absolute certainty that we’ve got the right guy (unless he is wearing small red Speedos).

We don’t need recorded video to realize that something happened; we record because we want to know what or who caused the incident. In fact we are often motivated to go through the archives exactly because we, on our own accord, noticed that something sinister had happened. A smudged, pixelated frame of a man digging through our pockets tells us what we already know. We need enough fidelity that we can recognize the perp, even if he is a total stranger – and certainly enough fidelity that someone who knows the perp will be able to.

In more serious cases, the frame or video will be shown to a large number of people, and we then hope that someone is able to recognize the person from the lo-fi video (and is willing to spill the beans too). But in most cases, the loss of your wallet will not make it to America’s Most Wanted, and you suddenly realize that the money spent on video only provided you with information that you could have deduced otherwise (albeit after some searching of the usual places for the missing wallet – but you do the ransacking before you review the video anyhow 🙂 )

There are certainly situations where you do not need high fidelity: long-term surveillance of staff, or of a previously identified person. At that point you know who you are looking at, so identification is not the issue – but their actions might be.

Most systems allow you to “not record if nothing happens”. This is an extremely crude form of compression: frame 1 is almost exactly like frame 2, so we discard frame 2 completely. But think about the situation where someone fraudulently claims you are liable for something that happened to them. Is it proof to say “I have no recordings, so nothing happened”? Proving that “nothing” happened requires almost no fidelity – even a 160×120 feed from a storage room is good enough to prove that no-one entered between 5 am and 10 am. If someone DID enter, 160×120 is almost certainly not good enough to identify who (assuming it’s a stranger).
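That crude form of compression boils down to a frame comparison loop. A toy sketch – the thresholds are made-up numbers, not taken from any real NVR:

```python
PIXEL_DELTA = 10         # per-pixel change that counts as "different"
CHANGED_FRACTION = 0.05  # fraction of pixels that must change

def frame_changed(stored, new):
    """True if enough pixels differ from the last stored frame."""
    changed = sum(1 for s, n in zip(stored, new) if abs(s - n) > PIXEL_DELTA)
    return changed / len(stored) > CHANGED_FRACTION

def record(stream):
    kept = [stream[0]]  # always keep the first frame
    for frame in stream[1:]:
        if frame_changed(kept[-1], frame):
            kept.append(frame)
    return kept

static = [50] * 100
intruder = [50] * 80 + [200] * 20  # someone walks into one corner

# Four incoming frames, only three kept: the duplicate static frame
# is discarded, the intrusion and the return to static are not.
print(len(record([static, static, intruder, static])))  # -> 3
```

Note how the noise from the previous section works against this scheme: if the dark pixels jitter enough, every frame looks “changed” and nothing is ever discarded.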

What this means, is that the recorder should not only have a “live and recorded” mode, but also be able to change the feed when “things” happen.

H.264 and MPEG4 using variable bit rate offer some of what I am requesting. In a static environment the P-frames are relatively small, but the approach presents other problems. We’ve discussed the long GOP problem before, and it requires a lot of computational power to alter the frame rate and resolution of H.264, so pruning is usually out of the question for these formats. Then there are issues of browsing on low bandwidth devices, where transcoding is needed.

What to record, when, and what you can do with the recordings is absolutely not trivial. Perhaps massive storage solutions will render the problem moot, perhaps extremely fast CPUs will, but cracking the issue now would offer an advantage to the first ones to get it right.

*Although I have encountered situations where video was not recorded at all, those have been rare, and usually nothing is recorded for political reasons.

Pros and Cons of Web Interfaces in Video Surveillance Applications

Wow – longest headline ever.

A very common request is a web-based interface to the video surveillance system. An often used argument is that the end user won’t have to install anything, and that the system is readily available from a variety of platforms, after all – google.com works on macs, PCs, my 5 year old cell phone and my wife’s spanking new iPhone*

Most people are probably familiar with the ActiveX controls that are needed when streaming various video formats from a camera to a web browser. While you may not think that you are “installing” anything (since the ActiveX control or plugin does not necessarily appear in the Add/Remove Programs window), you actually DID. A piece of executable code was downloaded and written to your hard drive, not unlike downloading and running a regular installer. ActiveX controls may require numerous support DLLs, which will be downloaded on demand. So even if the installation method is a little different for ActiveX, you are technically installing something on the machine.

ActiveX controls are platform dependent (you can’t use a Windows control on a Mac), and they present a security risk unless managed carefully. Then there are Java applets: these are sort of platform independent, but can be (always are) a little slower than ActiveX. Adobe Flash is another option, but it won’t work on my wife’s iPhone, and the same goes for Silverlight.

Although the second part of the argument is technically true, there are some costs to bear. Getting text and static images on the screen using baseline HTML is trivial, but interactivity and streaming video is a different beast altogether. A commonly used technique is AJAX, which pretty much boils down to issuing requests asynchronously to a server using an XMLHttpRequest object – but creating that object differs from browser to browser, so you need to write two different pieces of code to accomplish the same feat — on the same OS! Granted, the handling of the different browsers is well documented, and libraries exist that help the developer overcome these annoyances, but for all intents and purposes, we have just doubled the number of platforms (IE and “everybody else”). The same applies to CSS, and even PNG handling.

Some companies will happily put together a “web solution”. But if you are still pretty much locked into Windows and IE, and you STILL need to install a bunch of ActiveX controls, what’s the point? Often the web solution is a little less useful than the traditional Windows application, since the developers are limited to the browser’s capabilities, whereas the old-skool application can pull all the levers in the system.

Recently Adobe added GPU-accelerated video playback to Flash, and HTML5 is supposed to support H.264. JavaScript is now very fast on a wide range of browsers (IE 9 was shown at MIX10 and looks promising, and Chrome has always had fast JS). So perhaps a viable solution for desktop PCs and Macs will be available before too long.

*actually she has a Nokia phone, but I needed to add the iPhone in there somehow.

Standards in IP Video

To grow the pie, standards are needed; imagine if all web servers had slightly proprietary protocols, so that you needed an Apache client if you connected to an Apache server, and another for IIS servers. What if you couldn’t move the HTML from the old IIS to the new Apache? Imagine if for every new version of IIS you needed to re-write parts of the HTML, and distribute a new client to all the users.

In broadcast video we have NTSC, PAL and SECAM, and a bunch of variations of these, yet TV has attained omnipresence. But those standards have been stable for many years, while it seems as if our industry has a different protocol for each camera – and even for different versions of the same camera.

There are two big movements trying to define a standard. One is driven by the NVR side, and the other by the big camera manufacturers. I believe the odds are with ONVIF, if they can enforce an element of dictatorship and avoid designing another camel*.

A winner must emerge for the benefit of the consumer. Even four different standards would be acceptable, and would allow NVR companies to focus on what matters to the end user: stability, flexibility, ease of use, and useful features that improve security and increase the efficiency of the staff. In the ideal situation, a client would simply buy a conforming camera and plug it in, with the certainty that it would “just work”.

There are roughly 3 areas of interoperability that must be addressed.

  • Discovery and Capability
  • Video and Audio formats
  • I/Os

Discovery is partially addressed by UPnP, and to some extent ARP and DHCP sniffing can be helpful on LANs. But for remote cameras, a simple XML response at a fixed address would suffice.

Video and Audio formats are standardized, but some manufacturers provide proprietary versions of various codecs (Mobotix has a proprietary JPEG codec, for example). I suggested a couple of years ago that we define a reference decoder for each format – I am not sure it was ever done – but if the video or audio cannot be decoded by the reference decoder, the camera should not be considered conforming.

A standardized way for cameras to notify an NVR of the closure of an input, and a way to manipulate a relay would also be needed.

On top of all this comes the need for standardized authentication and encryption schemes (trivial), but that is beyond the scope of this humble blog.

As an example, the current ONVIF core spec is 223 pages and is based on SOAP, which makes it relatively simple to use in modern languages – if a little bloated. A fault response looks roughly like this:

HTTP/1.1 500 Internal Server Error
CONTENT-LENGTH: bytes in body
CONTENT-TYPE: application/soap+xml; charset="utf-8"
DATE: when response was generated

<?xml version="1.0" ?>
<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
  <soapenv:Body>
    <soapenv:Fault>
      <soapenv:Code>
        <soapenv:Value>fault code</soapenv:Value>
        <soapenv:Subcode>
          <soapenv:Value>ter:fault subcode</soapenv:Value>
          <soapenv:Subcode>
            <soapenv:Value>ter:fault subcode</soapenv:Value>
          </soapenv:Subcode>
        </soapenv:Subcode>
      </soapenv:Code>
      <soapenv:Reason>
        <soapenv:Text xml:lang="en">fault reason</soapenv:Text>
      </soapenv:Reason>
      <soapenv:Detail>
        <soapenv:Text>fault detail</soapenv:Text>
      </soapenv:Detail>
    </soapenv:Fault>
  </soapenv:Body>
</soapenv:Envelope>

The long message contains just 5 chunks of useful information: a fault code (most likely a number), two subcodes, a reason (“not supported”) and a detail (“don’t do this again”). The rest is called the “envelope” – that is one giant envelope for such a small message 🙂
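On the plus side, unwrapping the envelope takes only a few lines in any modern language. A sketch using Python’s standard library – the fault values here are invented placeholders, inlined so the example is self-contained:

```python
import xml.etree.ElementTree as ET

# A trimmed SOAP 1.2 fault of the shape the spec excerpt above shows.
FAULT = """<?xml version="1.0"?>
<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
  <soapenv:Body>
    <soapenv:Fault>
      <soapenv:Code>
        <soapenv:Value>soapenv:Receiver</soapenv:Value>
      </soapenv:Code>
      <soapenv:Reason>
        <soapenv:Text xml:lang="en">not supported</soapenv:Text>
      </soapenv:Reason>
    </soapenv:Fault>
  </soapenv:Body>
</soapenv:Envelope>"""

NS = {"soapenv": "http://www.w3.org/2003/05/soap-envelope"}

root = ET.fromstring(FAULT)
code = root.find(".//soapenv:Code/soapenv:Value", NS).text
reason = root.find(".//soapenv:Reason/soapenv:Text", NS).text

print(code)    # soapenv:Receiver
print(reason)  # not supported
```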

Even if I have my peeves about SOAP and its bloatedness, it is MILES better to have a standard than 236 different ways to get an error code from a camera.

Regardless of who defines the standard, the end user will be the winner. We will then have to compete on different parameters, and I can’t wait for that to happen.

*a camel is a horse designed by committee

Are All Problems Serious?

Did someone ever approach you with a usability problem that was categorized as “serious” and needed to be fixed ASAP? Other people might chime in, and now the issue is “very serious”. But more often than not, there are no real metrics to determine the real severity of an issue. A lot of times it is based on gut feeling, and more often than not, it becomes a personal and/or political tool.

It turns out, as we probably instinctively know, that one man’s serious is another man’s benign. According to Rolf Molich,

The CUE-2 teams reported 310 different usability problems. The most frequently reported problem was reported by seven of the nine teams. Only six problems were reported by more than half of the teams, while 232 problems (75 percent) were reported only once. Many of the problems that were classified as “serious” were only reported by a single team. Even the tasks used by most or all teams produced very different results—around 70 percent of the findings for each of these common tasks were unique.

*Emphasis is mine.

I speculate that this has something to do with group dynamics. Frequently you will see that an alpha-male (or female) will emerge in the group, and what he/she deems important is considered serious by everyone else. The group might exaggerate issues to appease the alpha, making things appear worse than they really are. The alpha might try to assert his dominance by elevating his observations too.

In real life, problems are often relayed through a long series of people: the end user tells his manager, the manager tells his manager, he then tells his integrator, who tells the distributor, who talks to the rep, who tells his manager, who then takes it up at a meeting with the CTO, who then tells the team lead or dept. manager, and then – finally – it lands with some programmer, in a completely different shape.

A List Apart has the full story

The Trouble With H.264

In the last couple of years we have seen a proliferation of H.264-capable cameras. As technology improved we were able to push ever higher resolutions through IP cameras, and today 1080p video is not an uncommon request.

H.264 and its siblings (various MPEG formats) were all designed primarily for one purpose: forward streaming. This is especially true in video surveillance applications, where latency is the enemy and thus B-frames are out of the question; a typical surveillance camera provides ONLY I- and P-frames. Technically, the H.264 standard does allow a bunch of tricks for bi-directional access (seeking etc.), but most cameras do not support these features yet.

In a traditional surveillance situation, 90% of all video is streamed to disk and never watched. Only a very small fraction of recorded video is ever recalled to investigate an incident. These numbers are rough estimates, and all setups vary.

But we have to store the video – “Absence of evidence is not evidence of absence”, the saying goes, so going to court claiming that “nothing happened, because I have no recordings” will always be a losing case. If H.264 offers great compression ratios, why not record in that format and save drive space?

Storing video in H.264 makes a lot of sense from a storage and bandwidth viewpoint, but in video processing there is another old proverb: you can have speed, size and quality – now pick two.

For H.264 we’ve picked size and quality over speed (processing speed). H.264 is a complicated format, which takes considerable processing power to encode (in the camera) and decode (in the client). Speed on the client is not a problem when you run 4 to 9 high-res H.264 streams, but some clients (ours included) offer views of 64 or even 100 cameras on screen at once, and users naturally expect them to work with ANY combination of cameras.

Now, consider what happens when the operator hits “browse”, and clicks the “step reverse” button.

H.264 video consists of GOPs (Groups of Pictures); each picture in the group depends on the previous one (again – for surveillance), so to get to the last picture you need to decode all the preceding pictures. Now multiply this by 64.

To increase H.264 compression, the GOP can be of variable length. This makes a lot of sense in surveillance situations where the scene is static, and long GOPs will provide very high compression. But this stresses the browsing client even more, since it now needs to decode 50 or 60 frames per camera.
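The arithmetic behind that stress is simple. A back-of-envelope sketch, where the GOP lengths and camera counts are just example figures:

```python
# Worst case for one "step reverse": the wanted frame is the last in its
# GOP, so the decoder must decode the whole GOP from the I-frame onward,
# for every camera in the view.
def worst_case_decodes(gop_length, cameras):
    return gop_length * cameras

print(worst_case_decodes(15, 64))  # short, fixed GOP: 960 frame decodes
print(worst_case_decodes(60, 64))  # long GOP in a static scene: 3840
```

A single backward step can therefore cost thousands of full-frame decodes, which is why reverse playback on large camera views is so punishing.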

Above 25 camera views, the view needs to be treated as a special mode. In my experience most operators use it as a selection page, where the operator just picks a camera from a matrix of thumbnails.

Perhaps what we need is a better way for operators to select cameras, and get an overview of the situation, not a new 144 camera view.