Brad Anderson Still Got It!

Brad Anderson was showing the Ocularis System at ISC West this year – the 3rd year in a row, I believe. The giant touch screen is an impressive demo tool, and Brad’s demos STILL put “asses in the seats”, as someone put it last year. I am always a little nervous that people will just walk by, utter the dreaded “meh!”, and move on to the next booth, but Brad just rocks on stage.

This was also the first year that I did not go. This allowed me to focus on some new and exciting development, and I guess it was also the right time to pass the baton to the next generation of eager developers. Every developer should go to the show at least once every 2 years, I think. Partially to appreciate the gravity of bugs in the system when it’s 7.15 am the morning before the show, but also because the interaction with reps and integrators provides a unique opportunity to throw ideas around. Seeing other systems “kick ass” is also pretty good motivation.

Good show guys, perhaps I will see you next year (I never got to go to the prime rib buffet! 😦 )

Microsoft SeaDragon in Action

Back in March 2007 Blaise Aguera y Arcas demonstrated SeaDragon (and PhotoSynth) at TED. 3 years later the most common use of SeaDragon’s technology is mapping. The name has changed to DeepZoom, but the basic concept is the same as in every other large-image handler: pre-scale the image into a pyramid of tiles, and only fetch the tiles the viewer is actually looking at. I suppose the inspiration for these systems might come from mipmapping.
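
To make the mipmapping analogy concrete, here is a minimal sketch of the tile pyramid arithmetic – the function names are mine, and this is only the underlying idea, not the actual DeepZoom file format:

```python
import math

def pyramid_levels(width: int, height: int) -> int:
    # Halve the image until it fits in a single pixel, DeepZoom style.
    return math.ceil(math.log2(max(width, height))) + 1

def tiles_at_level(width: int, height: int, level: int, tile: int = 256) -> tuple:
    # Tile grid at a given level; level 0 is the 1x1 top of the pyramid.
    scale = 2 ** (pyramid_levels(width, height) - 1 - level)
    w, h = math.ceil(width / scale), math.ceil(height / scale)
    return math.ceil(w / tile), math.ceil(h / tile)

# A gigapixel panorama only ever streams the handful of tiles in view:
print(pyramid_levels(100_000, 50_000))       # 18 levels
print(tiles_at_level(100_000, 50_000, 17))   # (391, 196) tiles at full resolution
```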

An impressive demonstration was made by xRez studios, in which they created a number of high resolution panoramic images of the Yosemite Valley. The demonstration can be seen here (requires Silverlight).

The original SeaDragon presentation:

Zulu Time

Timezones!

An NVR in New York is recording one camera in Nevada and one in Maryland. We just left Daylight Saving Time, and the video is being watched by an investigator in an office in Arizona.

Phew…

The first thing we realize is that the location of the NVR really doesn’t matter; it is the location of the camera that is interesting. An NVR must know where the cameras are – geographically.

The second problem is that the term “2.40 am” is ambiguous when we leave Daylight Saving Time. “Let’s meet at 2.40 am” – “which 2.40? The first time, or the second?”
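
Modern date-time libraries can make this ambiguity explicit. As a minimal sketch, Python’s zoneinfo (3.9+) uses the fold attribute to pick between the two occurrences of the repeated hour; the timezone here is just an example of a zone that observes DST:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

tz = ZoneInfo("Europe/Copenhagen")  # DST ended 2010-10-31: 03:00 fell back to 02:00

# The wall clock read 2.40 twice that night; "fold" selects which one we mean.
first = datetime(2010, 10, 31, 2, 40, tzinfo=tz)           # fold=0: before the fall-back
second = datetime(2010, 10, 31, 2, 40, fold=1, tzinfo=tz)  # fold=1: after the fall-back

print(first.utcoffset(), second.utcoffset())  # 2:00:00 vs 1:00:00 - an hour apart in real time
```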

The third is that the investigator needs to be aware of where the cameras are. “We had a burglary at 6 p.m.” is also ambiguous unless we know where it was 6 p.m.

Military solution

For those who have been in the army, the term “Zulu time” might be familiar. No DST, no timezones, no ambiguity, but not exactly user friendly 🙂 In the programming world we have some options too – Microsoft has a concept of “FILETIME”, which counts 100-nanosecond intervals since January 1, 1601; another common solution is seconds since January 1, 1970; and of course there are a couple of other options, but the principle is that you simply count the number of time units since a fixed point in time. Most, if not all, solutions use Greenwich as the reference location.
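
Converting between the two counts is straightforward – a minimal sketch, where the constant is simply the number of seconds between the 1601 and 1970 epochs:

```python
EPOCH_DIFF_SECONDS = 11_644_473_600   # seconds from 1601-01-01 to 1970-01-01
TICKS_PER_SECOND = 10_000_000         # FILETIME ticks are 100 nanoseconds

def filetime_to_unix(filetime: int) -> float:
    # Windows FILETIME (100 ns ticks since 1601) to Unix seconds (since 1970).
    return filetime / TICKS_PER_SECOND - EPOCH_DIFF_SECONDS

def unix_to_filetime(unix_seconds: float) -> int:
    # The inverse: Unix seconds back to FILETIME ticks.
    return int((unix_seconds + EPOCH_DIFF_SECONDS) * TICKS_PER_SECOND)
```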

We then present the time differently, depending on the location of the operator. By letting the operator specify their location (usually via a control panel in the OS), we know how to convert from “seconds since epoch at Greenwich” to a regular looking date and time in Arizona. Naturally, the OS needs to know what timezone AZ is in, and that AZ does not observe DST (good for them!).

By the same token, when an operator enters a time, we need to know what location the operator is referring to when they say “go to 4 pm, April 1st, 2010”. Unless we know where, we cannot make the inverse computation and get back to seconds since epoch.
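
A minimal sketch of both directions, again using Python’s zoneinfo – the function names and the date format are my own:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def epoch_to_display(epoch_seconds: float, operator_tz: str) -> str:
    # Render a "seconds since epoch at Greenwich" timestamp in the operator's local time.
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc.astimezone(ZoneInfo(operator_tz)).strftime("%Y-%m-%d %H:%M:%S %Z")

def display_to_epoch(local_string: str, operator_tz: str) -> float:
    # The inverse: an operator-entered local time back to epoch seconds.
    naive = datetime.strptime(local_string, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=ZoneInfo(operator_tz)).timestamp()

# "Go to 4 pm, April 1st, 2010" as seen from Arizona (no DST, UTC-7 year round):
t = display_to_epoch("2010-04-01 16:00:00", "America/Phoenix")
print(t, epoch_to_display(t, "America/Phoenix"))
```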

There are basically 3 possibilities:

  1. The operator uses their own local time
  2. The operator uses the NVR’s local time
  3. The operator uses the camera’s local time

Is this a real problem?

In lots and lots of cases, the operator, the NVR and the cameras are in the same timezone, so there is no ambiguity, but what happens when they are not? In a large corporate environment, a central NVR cluster might record from offices all over the world. It is not difficult to imagine the annoyance when someone asks you to find an incident in Hong Kong, at 3 pm local time. You find the camera in Hong Kong, and now the puzzle starts. Well – I am in New Jersey, so do I enter 3 pm (Hong Kong time), do I enter Hamburg, Germany time (the location of the NVR), or do I convert 3 pm Hong Kong time into New Jersey time and enter that?

Most cameras allow you to overlay the local time on the frame, so that solves part of the puzzle. The operator immediately realizes the difference between what they enter and what they see on the screen – but when they don’t, well, then it all gets a little confusing.

Low Light Problems

A brief primer on the reasons for the noise observed in footage recorded in low light conditions.

Low light conditions in video surveillance are always a problem. When the available light diminishes, there are generally only a few knobs to turn – gain and shutter speed.

Slow Shutter Woes

Regular photography in low light conditions usually calls for a tripod and a timer. This produces wonderfully high fidelity images of things that are stationary; unfortunately, moving objects – people and vehicles – are reduced to a blurry smear, making it extremely hard to recognize and identify people in the frame. If you need to recognize the person in the frame, you need a high shutter speed.

Grain in the Gain

If the shutter is fast (to avoid blurring), the sensor will only receive a few photons in each frame. If the sensor produces an output from, say, 0-255, where zero means “I did not detect ANY light”, in other words “black”, and 255 means “lots of light”, or “white”, then we’d generally like our sensor to produce values nicely distributed between these extremes for a regular scene. Every time we sample an area on the sensor we get a slightly different count; sometimes we count 64 photons, the next frame there are 65, then 64, 62, 63, and so on. So the pixel changes from frame to frame. This variation is just random noise that we can deal with in different ways.

But when the light is dim, most of our values are low – say from 0-32 (with a couple of 255’s from distant lights and the odd diode on a piece of equipment). The noise does not go away, but since our values are now lower, the impact of counting one more or one less photon is much bigger. The signal to noise ratio goes down.

All these low values would give you an almost black frame, so we simply multiply all our values by 8 to get them nicely spread out between 0 and 255 again. The 255’s are then saturated, but that’s not the biggest problem. Recall how we get slightly different photon counts in each frame? Those small changes are now also amplified by 8, so a pixel that counted 8, 9 and 7 photons now reads 64, 72 and 56. We also amplify the noise, and that’s why we see those grainy images.
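
A minimal simulation of the effect, assuming NumPy is available – the numbers are illustrative, not from any particular sensor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dim scene: the "true" signal averages ~20 photons per pixel per frame.
signal = np.full((480, 640), 20.0)

# Photon arrival is Poisson distributed, so consecutive frames of a
# perfectly static scene still differ slightly pixel by pixel.
frame = rng.poisson(signal).astype(np.float64)

# 8x digital gain stretches 0-32 out toward 0-255 ...
gained = np.clip(frame * 8.0, 0, 255)

# ... and amplifies the frame-to-frame variation by the same factor.
print("raw std:   ", frame.std())    # ~sqrt(20), about 4.5 counts
print("gained std:", gained.std())   # about 36 counts: the grain we see
```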

Compressing noise

White noise does not compress very well. In some audio codecs we identify segments as noise, and ask the receiver to “just make some noise for the next 100 ms, and use this filter to shape it”. In MPEG4 and H.264 we can get some pretty weird results, and JPEGs suddenly grow 100-400% – for frames that contain very little usable information.
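
You can see this for yourself with a quick experiment, assuming NumPy and Pillow are installed – the exact growth depends on quality settings and content:

```python
import io
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)

def jpeg_size(pixels: np.ndarray) -> int:
    # Encode a grayscale frame as JPEG and report the compressed size.
    buf = io.BytesIO()
    Image.fromarray(pixels.astype(np.uint8), mode="L").save(buf, format="JPEG", quality=80)
    return buf.getbuffer().nbytes

flat = np.full((480, 640), 16.0)                               # a clean, near-black frame
noisy = np.clip(flat + rng.normal(0, 12, flat.shape), 0, 255)  # same frame plus gain noise

print("flat :", jpeg_size(flat), "bytes")
print("noisy:", jpeg_size(noisy), "bytes")  # typically several times larger
```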

What to do about it

There are various filters and algorithms that filter out the noise. You can use spatial and temporal analysis to try to minimize noise, usually performing a sort of averaging in time, but this should be done PRIOR to compression to allow the compressor to function as intended. If a large area of a scene is – well – black, then why not just accept that it is black, and set all those pixels to zero (decimation)?
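
A minimal sketch of both ideas, assuming a static low-light clip held in a NumPy array – a real filter would of course avoid averaging across areas with motion:

```python
import numpy as np

def denoise_before_compression(frames: np.ndarray, black_level: int = 8) -> np.ndarray:
    # frames has shape (n, height, width): a short window of consecutive frames.
    averaged = frames.mean(axis=0)        # temporal averaging suppresses random noise
    averaged[averaged < black_level] = 0  # decimation: accept that near-black IS black
    return averaged.astype(np.uint8)
```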

Other options are simpler, but more expensive (throwing money at a problem usually helps): larger sensors and suitable lenses will certainly allow you to improve fidelity in low light conditions.

Some good examples of the concepts can be found here.

NVR Integrators Toolbox

I never realized the importance of configuration tools – until now! I suppose as a developer I never really considered the difficulty of designing a complete video surveillance installation, but relied on the old adage “when in doubt, add machines”. But where to place the cameras, how much data they will record, and so on are a big part of the design, and we only provide relatively simple online “calculators” that certainly do not help you visualize the entire installation.
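
To be fair, the arithmetic behind those storage calculators is simple enough – a back-of-the-envelope sketch with parameter names of my own choosing, not from any vendor tool; the hard part is everything the calculator doesn’t show you:

```python
def storage_gb(cameras: int, bitrate_mbps: float, hours_per_day: float, retention_days: int) -> float:
    # Rough storage requirement in gigabytes for a constant-bitrate estimate.
    seconds = hours_per_day * 3600 * retention_days
    bits = cameras * bitrate_mbps * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000

# 16 cameras at 4 Mbit/s, recording around the clock, kept for 30 days:
print(f"{storage_gb(16, 4.0, 24, 30):,.0f} GB")  # about 20 TB
```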

I guess I still have a lot to learn (even after 10 years in the business 🙂 )

Required Fidelity In Proof

Video surveillance, generally speaking, is recorded* for later review. We do so for a number of obvious reasons: we want to know who dented my car in the lot, who or what caused the window to break, who stole my wallet, laptop, lawnmower and so on. In such situations we don’t (usually) expect to know the person who is digging through our pockets. And then there are situations where unexpected intrusion is not a problem. This could be a factory floor where a stranger would quickly be observed and apprehended by the staff; in those cases you might be monitoring people you do know.

If you have a bad photograph from a family vacation – you know, out of focus, slightly shaken – you are still very capable of identifying the guy in the red Speedos (2 sizes too small); in fact you would probably testify in court that you know who that guy is, even if you were not present when the snapshot was taken. But what if you were presented with unclear, smudged photos of total strangers – and then asked to drive through town, locating someone who looked just like that 20×32 pixel grab that I just handed you? Now we’d be hard pressed to testify with absolute certainty that we’ve got the right guy (unless he is wearing small red Speedos).

We don’t need to record video to realize that something happened; we record because we want to know what or who caused the incident. In fact we are often motivated to go through the archives exactly because we, of our own accord, noticed that something sinister had happened. A smudged, pixelated frame of a man digging through our pockets tells us what we already know. We need enough fidelity that we can recognize the perp, even if he is a total stranger, and certainly enough fidelity that someone who knows the perp will be able to.

In more serious cases, the frame or video will be shown to a large number of people, and we then hope that someone is able to recognize the person from the lo-fi video (and is willing to spill the beans too). But in most cases, the loss of your wallet will not make it to America’s Most Wanted, and you suddenly realize that the money spent on video only provided you with information that you could have deduced otherwise (albeit after some searching of the usual places for the missing wallet – but you do the ransacking before you review the video anyhow 🙂 )

There are certainly situations where you do not need high fidelity. Take long term surveillance of staff or of a previously identified person: at that point you do know who you are looking at, so identification is not the issue, but their actions might be.

Most systems allow you to “not record if nothing happens”. This is an extremely crude form of compression: frame 1 is almost exactly like frame 2, so we discard frame 2 completely. But think about the situation where someone fraudulently claims you are liable for something that happened to them. Is it proof to say “I have no recordings, so nothing happened”? Proving that “nothing” happened requires almost no fidelity – even a 160×120 feed from a storage room is good enough to prove that no-one entered between 5 am and 10 am. If someone DID enter, 160×120 is almost certainly not good enough to identify who (assuming it’s a stranger).

What this means is that the recorder should not only have a “live and recorded” mode, but also be able to change the feed when “things” happen.
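
A minimal sketch of such a trigger, assuming NumPy and grayscale frames – the thresholds are invented for illustration and would need tuning per scene:

```python
import numpy as np

MOTION_THRESHOLD = 12    # per-pixel difference that counts as "change"
CHANGED_FRACTION = 0.01  # at least 1% of pixels must change

def something_happened(previous: np.ndarray, current: np.ndarray) -> bool:
    # Crude frame differencing: is frame 2 meaningfully unlike frame 1?
    diff = np.abs(current.astype(np.int16) - previous.astype(np.int16))
    return (diff > MOTION_THRESHOLD).mean() > CHANGED_FRACTION

# A recorder loop could then switch profiles instead of merely dropping frames:
# a low fidelity 160x120 feed while nothing happens, full resolution when it does.
```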

H.264 and MPEG4 using variable bit rate offer some of what I am requesting. In a static environment the P-frames are relatively small, but this presents other problems. We’ve discussed the long GOP problem before, and it requires a lot of computational power to alter the frame rate and resolution of H.264, so pruning is usually out of the question for these formats. Then there are issues of browsing on low bandwidth devices, where transcoding is needed.

What to record, when, and what you can do with the recordings is absolutely not trivial. Perhaps massive storage solutions will render the problem moot, perhaps extremely fast CPUs will, but cracking the issue now would offer an advantage to the first ones to get it right.

*Although I have encountered situations where video was not recorded at all, those have been rare, and usually the lack of recording was for political reasons.

Pros and Cons of Web Interfaces in Video Surveillance Applications

Wow – longest headline ever.

A very common request is a web-based interface to the video surveillance system. An often-used argument is that the end user won’t have to install anything, and that the system is readily available from a variety of platforms; after all – google.com works on Macs, PCs, my 5-year-old cell phone and my wife’s spanking new iPhone*

Most people are probably familiar with the ActiveX controls that are needed when streaming various video formats from a camera to a web browser. While you may not think that you are “installing” anything (since the ActiveX or plugin does not necessarily appear in the Add/Remove programs window), you actually DID. A piece of executable code was downloaded and written to your hard drive, not unlike downloading and running a regular installer. ActiveX controls may require numerous supporting DLLs, which will be downloaded on demand. So even if the installation method is a little different for ActiveX, you are technically installing something on the machine.

ActiveX controls are platform dependent (you can’t use a Windows control on a Mac), and they present a security risk unless managed carefully. Then there are Java Applets. These are sort of platform independent, but can be (always are) a little slower than ActiveX. Adobe Flash is another option, but it won’t work on my wife’s iPhone; the same goes for Silverlight.

Although the second part of the argument is technically true, there are some costs to bear; although getting text and static images on the screen using baseline HTML is trivial, interactivity and streaming video is a different beast altogether. A commonly used technique is AJAX, which pretty much boils down to issuing requests asynchronously to a server using an XML object, but the XML object differs from browser to browser, so you need to write two different pieces of code to accomplish the same feat — on the same OS! Granted, the handling of the different browsers is well documented, and libraries exist that help the developer overcome these annoyances, but for all intents and purposes, we have just doubled the number of platforms (IE and “everybody else”). The same applies to CSS, and even PNG handling.

Some companies will happily put together a “web solution”. But if you are still pretty much locked into Windows and IE, and you STILL need to install a bunch of ActiveX controls, what’s the point? Often the web solution is a little less useful than the traditional Windows application, since the developers are limited to the browser’s capabilities, whereas the old-skool application can pull all the levers in the system.

Recently Adobe added GPU accelerated video playback to Flash, and HTML5 is supposed to support H.264. Javascript is now very fast on a wide range of browsers (IE 9 was shown at MIX10 and looks promising, and Chrome has always had fast JS). So perhaps a viable solution for desktop PCs and Macs will be available before too long.

*actually she has a Nokia phone, but I needed to add the iPhone in there somehow.