Brad Anderson Still Got It!

Brad Anderson was showing the Ocularis system at ISC West this year, for the 3rd year in a row, I believe. The giant touch screen is an impressive demo tool, and Brad’s demos STILL put “asses in the seats”, as someone put it last year. I am always a little nervous that people will walk by, utter the dreaded “meh!” and move on to the next booth, but Brad just rocks on stage.

This was also the first year that I did not go. This allowed me to focus on some new and exciting development, and I guess it was also the right time to pass the baton to the next generation of eager developers. Every developer should go to the show at least once every 2 years, I think: partly to appreciate the gravity of bugs in the system when it’s 7.15 am the morning before the show, but also because the interaction with reps and integrators provides a unique opportunity to throw ideas around. Seeing other systems “kick ass” is also pretty good motivation.

Good show guys, perhaps I will see you next year (I never got to go to the prime rib buffet! 😦 )

Microsoft SeaDragon in Action

Back in March 2007 Blaise Aguera y Arcas demonstrated SeaDragon (and PhotoSynth) at TED. 3 years later the most common use of SeaDragon’s technology is mapping. The name has changed to DeepZoom, but the basic concept is the same as every other large image handler. I suppose the inspiration for the systems might come from mipmapping.

An impressive demonstration was made by xRez studios, in which they created a number of high resolution panoramic images of the Yosemite Valley. The demonstration can be seen here (requires Silverlight)

The original SeaDragon presentation:

Zulu Time


An NVR in New York is recording one camera in Nevada and one in Maryland. We have just left daylight saving time, and the video is being watched by an investigator in an office in Arizona.


The first thing we realize is that the location of the NVR really doesn’t matter; it is the location of the camera that is interesting. An NVR must know where the cameras are, geographically.

The second problem is that the term “2.40 am” is ambiguous when we leave daylight saving time. “Let’s meet at 2.40 am.” “Which 2.40? The first time, or the second?”
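A minimal sketch of that ambiguity, using Python's standard `zoneinfo` module (Python 3.9+, and it assumes the system has a timezone database installed). The helper name `utc_candidates` is my own. I picked Europe/Berlin and 31 October 2010 as an assumption, because there the repeated hour runs from 02:00 to 03:00, so “2.40 am” really does happen twice; in US zones the repeated hour is 01:00 to 02:00.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

BERLIN = ZoneInfo("Europe/Berlin")

def utc_candidates(year, month, day, hour, minute, tz):
    """Return the distinct UTC instants a wall-clock time maps to.

    Only meant for the "clocks go back" case; the spring-forward gap
    (times that never occur) is not handled here.
    """
    instants = []
    for fold in (0, 1):  # fold=0: first occurrence, fold=1: second occurrence
        local = datetime(year, month, day, hour, minute, tzinfo=tz, fold=fold)
        instant = local.astimezone(timezone.utc)
        if instant not in instants:
            instants.append(instant)
    return instants

# 31 October 2010 is the night the clocks went back in Germany.
for instant in utc_candidates(2010, 10, 31, 2, 40, BERLIN):
    print(instant.isoformat())
# 2010-10-31T00:40:00+00:00
# 2010-10-31T01:40:00+00:00
```

The same wall-clock time gives two instants an hour apart, which is exactly the “first time, or the second?” question.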

The third is that the investigator needs to be aware of where the cameras are. “We had a burglary at 6 p.m.” is also ambiguous unless we know where it was 6 p.m.

Military solution

For those who have been in the army, the term “zulu time” might be familiar. No DST, no timezones, no ambiguity, but not exactly user friendly 🙂 In the programming world we have some options too. Microsoft has a concept of “FILETIME”, which counts 100-nanosecond intervals since January 1, 1601; another common solution is seconds since January 1, 1970. There are a couple of other options, but the principle is that you simply count the number of time units since a fixed point in time. Most, if not all, solutions use Greenwich as the reference location.
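Since both schemes just count time units from a fixed point, converting between them is plain arithmetic. A small sketch; the function and constant names are my own, and the only real facts used are the two epochs and the 100-nanosecond tick size.

```python
# Seconds between 1601-01-01 and 1970-01-01 (both at 00:00 UTC):
# 369 years with 89 leap days = 134,774 days.
EPOCH_DIFFERENCE_SECONDS = 134_774 * 86_400  # 11,644,473,600
TICKS_PER_SECOND = 10_000_000                # one tick is 100 nanoseconds

def unix_to_filetime(unix_seconds: int) -> int:
    """Convert seconds since 1970 to 100 ns ticks since 1601."""
    return (unix_seconds + EPOCH_DIFFERENCE_SECONDS) * TICKS_PER_SECOND

def filetime_to_unix(filetime_ticks: int) -> int:
    """Convert 100 ns ticks since 1601 to whole seconds since 1970."""
    # Integer division drops any sub-second part of the FILETIME value.
    return filetime_ticks // TICKS_PER_SECOND - EPOCH_DIFFERENCE_SECONDS

print(unix_to_filetime(0))  # 116444736000000000
```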

We then present the time differently, depending on the location of the operator. By letting the operator specify their location (usually via a control panel in the OS), we know how to convert from “seconds since epoch at Greenwich” to a regular-looking date and time in Arizona. Naturally, the OS needs to know what timezone AZ is in, and that AZ does not observe DST (good for them!).
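As a rough sketch of the presentation step, here is one stored timestamp shown for two different operators. It assumes Python 3.9+ with `zoneinfo` and an installed timezone database; the `present` helper and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def present(epoch_seconds, zone_name):
    """Format a seconds-since-1970 timestamp as local time in zone_name."""
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc.astimezone(ZoneInfo(zone_name)).strftime("%Y-%m-%d %H:%M %Z")

stamp = 1_270_162_800  # 2010-04-01 23:00 UTC
print(present(stamp, "America/Phoenix"))   # 2010-04-01 16:00 MST
print(present(stamp, "America/New_York"))  # 2010-04-01 19:00 EDT
```

The stored value never changes; only the rendering depends on where the operator sits.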

By the same token, when an operator enters a time, we need to know what location the operator is referring to when they say “go to 4 pm, April 1st, 2010”. Unless we know where, we cannot make the inverse computation and get back to seconds since epoch.

There are basically 3 possibilities:

  1. The operator uses their own local time
  2. The operator uses the NVR’s local time
  3. The operator uses the camera’s local time

Is this a real problem?

In lots and lots of cases, the operator, the NVR and the cameras are in the same timezone, so there is no ambiguity, but what happens when they are not? In a large corporate environment, a central NVR cluster might record from offices all over the world. It is not difficult to imagine the annoyance when someone asks you to find an incident in Hong Kong at 3 pm local time. You find the camera in Hong Kong, and now the puzzle starts. Well, I am in New Jersey, so do I enter 3 pm (Hong Kong time), do I enter Hamburg, Germany time (location of the NVR), or do I convert 3 pm Hong Kong time into New Jersey time and enter that?
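To put numbers on the puzzle, here is a sketch that interprets the same typed-in “3 pm” in each of the three locations. The date (1 April 2010) and the `to_epoch` helper are assumptions for illustration; it needs Python 3.9+ with `zoneinfo` and an installed timezone database.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def to_epoch(year, month, day, hour, minute, zone_name):
    """Interpret a wall-clock time in zone_name; return seconds since 1970."""
    local = datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(zone_name))
    return int(local.timestamp())

zones = {
    "operator (New Jersey)": "America/New_York",
    "NVR (Hamburg)": "Europe/Berlin",
    "camera (Hong Kong)": "Asia/Hong_Kong",
}
for who, zone in zones.items():
    print(who, to_epoch(2010, 4, 1, 15, 0, zone))
# operator (New Jersey) 1270148400
# NVR (Hamburg) 1270126800
# camera (Hong Kong) 1270105200
```

The operator's and the camera's readings of “3 pm” are 12 hours apart, so guessing wrong lands you in footage from the wrong half of the day.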

Most cameras allow you to overlay the local time on the frame, so that solves part of the puzzle. The operator immediately realizes the difference between what they enter and what they see on the screen, but when there is no overlay – well – then it all gets a little confusing.

Low Light Problems

A brief primer on the reasons for the noise observed in footage recorded in low light conditions.

Low light conditions in video surveillance are always a problem. When the available light diminishes, there are generally only a few knobs to turn: gain and shutter speed.

Slow Shutter Woes

Regular photography in low light conditions usually calls for a tripod and a timer. This produces wonderfully high fidelity images of things that are stationary. Unfortunately, moving objects (people and vehicles) are reduced to a blurry smear, making it extremely hard to recognize and identify people in the frame. If you need to recognize the person in the frame, you need a high shutter speed.

Grain in the Gain

If the shutter is fast (to avoid blurring), the sensor will receive fewer photons in each frame. Say the sensor produces an output from 0 to 255, where 0 means “I did not detect ANY light”, in other words “black”, and 255 means “lots of light”, or “white”. We’d generally like our sensor to produce values nicely distributed between these extremes for a regular scene. Every time we sample an area on the sensor we get a slightly different count: sometimes we count 64 photons, the next frame there are 65, then 64, 62, 63, and so on. So the pixel changes from frame to frame. This variation is just random noise that we can deal with in different ways.

But when the light is dim, most of our values are low, say from 0-32 (with a couple of 255’s from distant lights and the odd diode on a piece of equipment). The noise does not go away, but since our values are now lower, the impact of counting one more or one less photon is much bigger. The signal to noise ratio goes down.
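A quick simulation makes the point. This is only a sketch, and it assumes the photon counts follow a Poisson distribution (a common model for this kind of counting noise, not something measured from a real sensor). The helper names are my own, and the sampler is Knuth's simple method, which is fine for small means like these.

```python
import math
import random
import statistics

def photon_count(mean, rng):
    """Draw one Poisson-distributed photon count (Knuth's method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def measured_snr(mean, samples=5000, seed=1):
    """Estimate signal-to-noise as mean count divided by its spread."""
    rng = random.Random(seed)
    counts = [photon_count(mean, rng) for _ in range(samples)]
    return statistics.mean(counts) / statistics.stdev(counts)

for mean in (64, 8):  # a well-lit pixel versus a dim one
    print(mean, round(measured_snr(mean), 1))
```

Under this assumption the ratio comes out close to the square root of the mean count, so it should land near 8 for the bright pixel and a bit under 3 for the dim one.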

All these low values would give you an almost black frame, so we simply multiply all our values by 8 to get them nicely spread out between 0 and 255 again. The 255’s are then saturated, but that’s not the biggest problem. Recall how we get slightly different photon counts in each frame? Those small changes are now also amplified by 8, so a pixel that counted 8, 9, 7, 9 becomes 64, 72, 56, 72 and so on. We also amplify the noise, and that’s why we see those grainy images.
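The gain step itself is tiny. A sketch with a made-up helper name and made-up pixel values, assuming an 8-bit output range:

```python
def apply_gain(pixels, gain=8, white=255):
    """Multiply each pixel by the gain and clip at the white level."""
    return [min(white, value * gain) for value in pixels]

dim_pixel_over_time = [8, 9, 7, 9]       # raw counts in four frames
print(apply_gain(dim_pixel_over_time))   # [64, 72, 56, 72]
print(apply_gain([40, 255]))             # [255, 255] (both saturated)
```

A wobble of one count in the raw data becomes a jump of eight in the output, and anything above 31 ends up at pure white.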

Compressing noise

White noise does not compress very well. In some audio codecs we identify segments as noise, and ask the receiver to “just make some noise for the next 100 ms, and use this filter to shape it”. In MPEG4 and H.264 we can get some pretty weird results, and JPEGs suddenly grow 100-400% – for frames that contain very little usable information.
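You can see the effect with any general-purpose compressor. The sketch below uses `zlib` from the Python standard library as a stand-in; it is not JPEG or H.264, and the frame size and noise range are arbitrary assumptions, but the principle (noise is expensive to encode) carries over.

```python
import random
import zlib

WIDTH, HEIGHT = 320, 240

def compressed_size(pixels):
    """Size in bytes of the pixel data after zlib compression (level 6)."""
    return len(zlib.compress(bytes(pixels), 6))

rng = random.Random(1)
clean = [16] * (WIDTH * HEIGHT)                     # flat, dark frame
noisy = [16 + rng.randint(-8, 8) for _ in clean]    # same frame plus noise

clean_size = compressed_size(clean)
noisy_size = compressed_size(noisy)
print(clean_size, noisy_size)
```

Both frames show the same nearly black nothing, but the noisy one needs far more bytes.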

What to do about it

There are various filters and algorithms that filter out the noise. You can use spatial and temporal analysis to try and minimize noise, usually performing a sort of averaging in time, but this should be done PRIOR to compression to allow the compressor to function as intended. If a large area of a scene is – well – black, then why not just accept that it is black, and set all those pixels to zero (decimation)?
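A bare-bones sketch of both ideas: averaging a pixel over a few frames, then forcing near-black pixels to zero. The function names, the threshold of 8 and the sample frames are all assumptions. A real filter would also have to cope with motion, since plain averaging smears anything that moves.

```python
def temporal_average(frames):
    """Average each pixel position over a list of equally sized frames."""
    return [sum(column) // len(frames) for column in zip(*frames)]

def clamp_dark(frame, floor=8):
    """Set every pixel below the floor to pure black (zero)."""
    return [0 if value < floor else value for value in frame]

frames = [            # four frames of a three-pixel "image"
    [64, 3, 200],
    [72, 0, 204],
    [56, 5, 196],
    [72, 4, 200],
]
print(clamp_dark(temporal_average(frames)))  # [66, 0, 200]
```

The flickering first pixel settles on one value and the almost-black second pixel becomes exactly black, both of which give the compressor less to do.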

Other options are a little more simplistic and expensive (throwing money at a problem usually helps): larger sensors and suitable lenses will certainly allow you to improve fidelity in low light conditions.

Some good examples of the concepts can be found here