About Vivotek SmartStream

Vivotek has a technology they call “SmartStream”; the video below shows the principle.

Vivotek also has a clip that shows SmartStream in action, with a little bandwidth meter running (seek to 1:20).

Notice how the bandwidth drops. What’s less obvious (when watching the video via YouTube) is that the video outside of the ROI is compressed more, which means worse quality – but we don’t care much about the little carousel, so that’s all fine.

I love the concept, and some time ago, I did a very, very simple, unscientific test of adaptive compression on JPEG images. The idea is the same – you are interested in some areas of the frame, while others are not that interesting. So why use the same compression across the entire frame? Why not use more bits on areas that are interesting, and fewer on areas that are not? In an apples-to-apples comparison, adaptive compression simply works. The frame, all else being equal, takes up less space on your drive, or, alternatively, you can have better quality for the relevant parts of the frame (around people for example), while having worse quality in the areas that are not interesting (the floor, walls etc).

That’s the theory of course. It’s a pretty good theory. It’s like saying if I have 100 boxes filled with bricks, they might weigh more than 100 empty boxes. But just to make sure, someone should test the assertion, and check to see if the theory is valid in practice. Now, if someone charged me an annual fee to test if empty boxes were truly lighter than boxes filled with bricks, I’d probably pass on that offer. But perhaps it’s a little more complicated than just validating the theory (it’s valid – trust me).

What if I had 100 boxes with bricks and I then hired someone to remove bricks from box 20 to 40 (as I don’t care for purple bricks). How would I know that my new hire had actually removed the bricks? What if my employee simply disregarded my commands and tricked me into loading filled boxes onto my truck, when 20% of them should be empty? I might be willing to pay someone to verify that my “brick removal staff” were, in fact, removing bricks.

It is – however – relevant to understand that this technique doesn’t always give you dramatic savings. To get the most from adaptive compression, you should consider your scene content.

Here’s why:

Let’s consider a completely static scene. Nothing is moving, the light is fixed (indoor), and we are just looking down a hallway.


In this case, adaptive compression is not giving us much benefit. To understand why, you need to understand a little bit about H.264 encoding. I am guessing you know this – although a high profile blogger seems to equate the DCT quantization parameter with compression, so, just to be on the safe side, let’s go through this – simplified. Basically it’s all about throwing away stuff that “looks similar”.

In this case, the following steps are of interest:

  1. cut the frame into a bunch of small squares
  2. look for similarities between each square and the surroundings in previous frames
  3. if similarity is found, congratulations, you have a motion vector
  4. otherwise just subtract the pixel values in the square from the pixel values in the previous frame

Now if the scene is static, we don’t get to use step 3. So that leaves us with just step 4. What happens when you subtract two virtually identical matrices? You guessed it, you get a lot of zeroes. If you’ve got great lighting, a good sensor, great optics and so on, you get mostly zeroes. If you have a matrix of mostly zeroes (perhaps a few ones and twos here and there), and you then compress it, the encoder will simply output “use same as before”, which is basically the smallest possible unit. Apart from the I-frame, the P-frames will be mostly empty boxes, and this will happen in both scenarios.
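To make step 4 a bit more tangible, here is a tiny sketch in Python. It is not how a real H.264 encoder works internally; the 4x4 blocks and their pixel values are made up for illustration, and all it does is subtract one block from another and count what is left over.

```python
def residual(current, previous):
    """Subtract the previous block from the current one, pixel by pixel."""
    return [
        [c - p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def count_nonzero(block):
    """Number of residual values that are not zero."""
    return sum(1 for row in block for value in row if value != 0)


# Hypothetical 4x4 luma block from the previous frame.
previous = [
    [120, 121, 119, 120],
    [122, 120, 121, 119],
    [118, 120, 122, 121],
    [120, 119, 120, 120],
]

# Static scene: the same block again, with one pixel off by one (sensor noise).
static = [row[:] for row in previous]
static[2][1] += 1

# Busy scene: something new has moved into the block.
busy = [
    [40, 45, 200, 210],
    [42, 44, 205, 215],
    [90, 95, 100, 105],
    [92, 96, 101, 104],
]

static_residual = residual(static, previous)
busy_residual = residual(busy, previous)

print(count_nonzero(static_residual))  # 1
print(count_nonzero(busy_residual))    # 16
```

The static block leaves a single non-zero value to encode, while the busy block leaves all sixteen. That is the "empty box" effect, with or without a ROI.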

So what if you have a playground and the sky above? You would intuitively draw a ROI around the playground, and expect great results. But again, in both cases you get “empty boxes” for the sky since there is no difference between consecutive frames (ROI or no ROI), so you are not getting much benefit from the technique in that case either.

What about the I-frame? 

Again, it depends on the content of the scene. I-frames are similar to JPEG (but offer slightly better compression than JPEG) – so you might know that if there are large areas of the same color, then those areas are compressed quite efficiently, in contrast to areas with a lot of intricate details that do not compress as well. So if the area outside the ROI is mostly the same color, you won’t see much benefit there either.
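The flat-versus-detailed effect is easy to see with any compressor. The sketch below uses Python's zlib as a stand-in. That is an assumption made purely for convenience: zlib is a general-purpose compressor, not JPEG and not H.264 intra coding, and the two 64x64 "images" are synthetic. The tendency is the same, though.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 64

# A flat "wall": every pixel has the same gray value.
flat = bytes([128] * (WIDTH * HEIGHT))

# A "detailed" area: pseudo-random pixel values, seeded for repeatability.
rng = random.Random(0)
detailed = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

flat_size = len(zlib.compress(flat))
detailed_size = len(zlib.compress(detailed))

print(f"flat: {flat_size} bytes, detailed: {detailed_size} bytes")
```

The flat area shrinks to a tiny fraction of its raw size, while the noisy area barely shrinks at all. Compressing an already cheap flat area even harder leaves very little to save.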

What’s the point of this, then?

Where the adaptive compression shines is when your ROI is excluding areas that have a lot of motion. If you look at the video at the top of this post, you’ll see that the area excluded a) has an area with motion, and b) has a lot of detail in it.

So what about Vivotek’s claims about savings?

Just as I simplified a bunch of things in this post in order to make it comprehensible and to the point, an ad will make a generalized claim. Sometimes there’s a little asterisk that says “individual results may vary”, and in this case, perhaps Vivotek should have had a disclaimer. Vivotek demonstrates the bandwidth saving in the video for everyone to see. Vivotek’s simplification of the message is no worse than a blog claiming that the H.264 compression level is simply a question of picking a number between 0 and 51 (I think the latter is a lot worse).

I think Vivotek is pushing the industry in the right direction, and that innovative features like SmartStream make the market so much more interesting than merely bumping the resolution or frame-rate.

Even if there are some caveats.

Default Passwords and ONVIF

Before you judge me borderline insane, in this post, I am talking about FACTORY DEFAULT passwords, for example : mobotix

My darling ONVIF, you’ve come of age and I tried to woo you. But you turned me down. Again.

A while back I decided to take the plunge, and have some fun with ONVIF. Axis knows I’ve been very happy with ONVIF’s older, and more mature, sister VAPIX. VAPIX is damn nice. But there’s a certain allure to ONVIF, the young, promiscuous rebel, and I wanted to see if I could tame her too.

ONVIF provides a handy mechanism for detecting ONVIF cameras on the local subnet. Easy peasy.  Got all the cameras in a jiffy. Next step was to get some attributes about each camera. And suddenly the approachable darling turned out to be an outright bitch.

Usually, using a web service is a one-two-three step process. Very simple, which is important if you want any sort of penetration. Unfortunately, the camera in question decided that I wasn’t worthy of a response. Usually, I would have given up, but I was in a fighting mood, so after a few hours of searching high and low, I found a piece of code that would allow me to authenticate properly with the camera. That, in my opinion, is fail #1. I doubt that there would be any way for me to figure out what the hell was wrong by looking at the authentication failure error code, and it’s not as if the ONVIF site makes it clear either. Now that I spent a day looking for it, I am going to be an asshole and not share the solution until my own thing is good and done.

A small part of my problems is that I used the root account to access the camera. The root user (built in to Axis cameras) is not an “ONVIF user”. I can – apparently – create an ONVIF user by using the root credentials and some ONVIF WSDL, but I haven’t tried that yet. My workflow would then be: detect cameras, then connect to the camera to get caps using some user-supplied credentials (say onvif_user:1234). Now that may fail, because the user hasn’t been created yet, so I will now have to use the VAPIX root account (which the user also has to supply the password for) to create the onvif_user account. THEN I will be able to finally do ONVIF. But it’s a damn mess from the user perspective. Especially because it’s a really bad idea to have the same root password on all the cameras.

It seems to me that the lack of an ONVIF default user is a problem.

Ideally, you’d plug in your ONVIF cameras, and the DHCP server gives them an IP with a long lease. We then find the cameras on the network using the default credentials. Once you decide to import a camera, the NVR server should change the camera’s password and store that in an encrypted file. This way the cameras are easy to install, and you maintain security.

The way it works now is too cumbersome and error prone, and it doesn’t scale too well. I don’t want my users to fiddle with spreadsheets for each installation.

I’ve created a small page where you can, if you like, see and add default credentials for various cameras.

List of default usernames and passwords

Let’s work together and make ONVIF viable.

How I Created A Simple Axis Camera Health Monitor

Having video surveillance is great, but you need to make sure the cameras are running 24/7. Naturally, a camera can’t notify you when it dies, so to check if it is alive, you need to monitor the camera periodically. But doing so is a pain in the ass, and so with a little web server coding I was able to set up a monitor for a camera in the office in a day or so.

I’ve got an old PC running Ubuntu sitting in my basement. It serves as a LAMP stack (Linux, Apache, MySQL and PHP), as well as doing a few other chores for me. With the server in place I cooked up a small project called “Sentinel”. Using a few lines of PHP, SQL and a CRON script it’s pretty easy to do.

My tired work-horse

Basically, I’ve created a table that holds “beacons” on the MySQL server. Periodically, I sweep the beacon table and check the age of the newest beacon. If the beacon is too old, I send out an email. The script to create tables and store beacons is outside the scope of this post, but I’ll share a few simple ideas. To sweep the beacon table periodically and send emails I used CRON with curl – this way I could write my beacon check in PHP (using PHP/PEAR I can send emails). I created a CRON entry that uses curl to open a web page every hour. When the server gets the request, it will execute the PHP script, and thus I get periodic execution of PHP code. Perhaps there’s an easier way, but this worked out fine.
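For the curious, the sweep can be sketched in a few lines. This is not the actual Sentinel code, which is PHP against MySQL. It is a Python sketch using an in-memory SQLite database so it can run anywhere, and the table layout, the column names and the two-hour threshold are all assumptions made for the example.

```python
import sqlite3
import time

MAX_AGE_SECONDS = 2 * 60 * 60  # assumed threshold: two missed hourly beacons


def stale_systems(conn, now, max_age=MAX_AGE_SECONDS):
    """Return (sysid, age_in_seconds) for systems whose newest beacon is too old."""
    rows = conn.execute(
        "SELECT sysid, MAX(received_at) FROM beacons GROUP BY sysid ORDER BY sysid"
    ).fetchall()
    return [(sysid, now - newest) for sysid, newest in rows if now - newest > max_age]


# Hypothetical schema: one row per beacon received.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE beacons (sysid TEXT, comment TEXT, received_at INTEGER)")

now = int(time.time())
conn.executemany(
    "INSERT INTO beacons VALUES (?, ?, ?)",
    [
        ("office-cam", "ok", now - 10 * 60),      # fresh
        ("lobby-cam", "ok", now - 5 * 60 * 60),   # stale
        ("lobby-cam", "ok", now - 6 * 60 * 60),   # older still
    ],
)

for sysid, age in stale_systems(conn, now):
    # The real script sends an email here; printing keeps the sketch self-contained.
    print(f"ALERT: {sysid} last seen {age // 60} minutes ago")
```

One caveat with this approach: a system that has never sent a single beacon has no row in the table, so it needs to be registered some other way before it can be reported as missing.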

I created a simple dashboard, that allows me to see the beacons in a simple and quick manner.


The first 3 systems are not getting any beacons, but the 4th one is fine, and has been fine for a while. To dump a beacon on my server, all the camera needs to do is to periodically open a URL with some parameters (sysid and comment). Some of the Axis cameras support this quite nicely.

First, set up a recipient that is an HTTP recipient


Then, set up a recurring schedule

Recurring Schedule

Then, create a rule

Create Rule

Rule Setup

I’m not sure which other cameras support this functionality, but I’m sure some do. The point is that once you start digging around in the toolbox, you can accomplish pretty cool things with a bit of effort. Sure, this isn’t commercialized, and it’s a hassle to configure hundreds of cameras this way. There ARE commercial tools available out there that will let you do what I describe here.
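The beacon itself is nothing more than an HTTP GET with two parameters, so anything that can open a URL can send one. Here is a small Python sketch of both ends. The endpoint address is made up (the real Sentinel URL isn't given here); only the parameter names sysid and comment come from the setup described above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, for illustration only.
BEACON_ENDPOINT = "http://sentinel.example/beacon.php"


def beacon_url(sysid, comment):
    """Build the URL a camera (or any script) would open to drop a beacon."""
    return BEACON_ENDPOINT + "?" + urlencode({"sysid": sysid, "comment": comment})


def parse_beacon(url):
    """Server-side view: pull sysid and comment back out of the query string."""
    query = parse_qs(urlsplit(url).query)
    return query["sysid"][0], query["comment"][0]


url = beacon_url("office-cam", "front door")
print(url)                # http://sentinel.example/beacon.php?sysid=office-cam&comment=front+door
print(parse_beacon(url))  # ('office-cam', 'front door')
```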

Camera Thumbnails

In the previous version of the administrator tool, we relied heavily on camera thumbnails. In the newest version, we have opted for a more compact tree control. We experimented with thumbnails in the tree, but the UI started to look more like an abstract painting. Sometimes you need a little visual reminder though, so we added a thumbnail panel.

In the lower left corner of any camera picker control, you will see a little triangle. The triangle pops the panel that shows a thumbnail, the camera label, and any comment you may have associated with the camera.

Here’s how it works.

Live Layouts vs. Forensic Layouts

In our client, if you hit browse, you get to see the exact same view, only in browse mode. The assumption is that the layout of the live view is probably the same as the one you want when you are browsing through video.

I am now starting to question if that was a wise decision.

When clients ask for a 100 camera view, I used to wonder why. No-one can look at 100 cameras at the same time. But then I realized that they don’t actually look at them the way I thought. They use the 100 camera view as a “selection panel”. They scan across the video and if they see something out of place they maximize that feed.

I am guessing here, but I suspect that in live view, you want to see “a little bit of everything” to get a good sense of the general state of things. When something happens, you need a more focused view – suddenly you go from 100 cameras to 4 or 9 cameras in a certain area.

Finally, when you go back and do forensic work, the use pattern is completely different. You might be looking at one or two cameras at a time, zooming in and navigating to neighboring cameras quite often.

Hmmm… I think we can improve this area.

Analog vs IP Video

In the good old days, you had to have a guy walk in front of an automobile, ringing a bell to warn pedestrians that a car was coming (at walking pace). The car was unreliable, difficult to operate, and one false move and it became an instant death trap. Horses, on the other hand, were easy to replace, they all fit in the old barn, and they all ate pretty much the same food.

The car analogy is not to say that IP is the car of today compared to a horse; the analogy is that when cars first came about, it took a long time before they became as homogeneous as they are today. Back then it was very difficult to see cars as a viable alternative to the horse. The point I was trying to make is that we are busy designing a better car, while others are convinced that Horse 2.0 is the way to go.

So is IP really better than Analog?

A DVR is analog – right?

Well… Let’s define the terms a little, to avoid the semantic confusion. In this discussion, “analog” refers to the transmission mechanism from the camera to the recording device. “Analog” means that the camera sends an analog, uncompressed NTSC/PAL signal directly to the recording device, whereas an IP camera captures the image, compresses it and sends it via an IP network to the recording device.

The recording device may be labelled “DVR” or “NVR”, but in most cases the internal components are roughly the same. A DVR usually comes with a framegrabber card preinstalled that allows the recorder to capture the analog video and store it in digital form on a storage medium (is this starting to sound like lawyer-speak?). Likewise, an NVR may be retrofitted with a framegrabber card too, and thus the DVR and NVR become almost indistinguishable. Therefore, the discussion is not about DVR vs NVR, but rather analog vs. digital transmission of video.

If you already have cabling in place, or if the placement of the cameras is such that you can’t cluster the cameras, then the cabling part of it is equal. But if you can do clustering, it is extremely cheap to do with IP cameras. Also, as Mark Schweitzer pointed out in the LinkedIn forum, IP comes with a built-in upstream channel, so if you ever need to replace a fixed camera with a PTZ, you do not need any additional cabling. Wireless transmission can also be achieved and you can monitor a camera (or a cluster of cameras) via the internet. As far as I can tell, HDcctv cannot piggyback off cat5, but needs new HD-SDI cabling (I’m sure Todd Rockoff can clarify the cabling requirements). I don’t know if SDTI is commonly used in HDcctv installations either.

Image Quality
Analog comes in a few flavors; the most common is NTSC/PAL (and their variations). NTSC has a resolution of 486 lines, which in many cases is too low to identify faces unless they are very close to the camera. IP allows you to pick cameras that fit the purpose; cheap, low res cameras for overview, more expensive HD cameras for details and so on. If you so desire, you can easily replace one camera with another of higher or lower resolution. HDcctv seems to allow 1080p (2MP) as the maximum resolution. I think the increased flexibility of IP makes it the winner overall.

Live viewing
For those who look at video, live, and respond to it, low latency and fluidity (high framerate) are important. High resolution less so. The reason is that you do not need crisp video to see that someone is fighting on the street; it is only when you go back to investigate and later go to court that the high resolution is real important (identification of license plates etc). Some IP cameras allow you to run 2 streams at the same time. One that gets recorded and one which is called up on demand when you want to view live video. Naturally, you are constrained by the bandwidth available to you as well as the ping-time. If you are cabling like you would an analog installation, you would have no problems with latency or quality at all.

On the other hand, if your IP camera does not provide dual streaming capabilities, then what you record is what you get to see live. This means that if you run a low framerate and high compression then your live view will reflect that. Then again, you can always replace the camera with a better one that supports dual streams.

Even if the transmission of analog video is lossless, the recorder will compress the video using the same compression technology as the IP camera. Any compression artifact introduced by the IP camera will also be introduced by the recorder as it compresses the video. However, while the IP solution may provide playback video in substantially higher resolution than the live video feed (which has an emphasis on framerate and latency), the analog solution can never provide higher quality than the live feed. What you see live is the highest “quality” that you will ever see.

Again, apples to apples, if IP is deployed using analog cabling conventions, you can certainly get the same live/playback quality as an analog system (e.g. VGA/30fps). Furthermore, your recorded video may be of much better fidelity than your live video (e.g. 1080p/10fps). This makes it possible to identify faces and license plates on the recorded video as you are conducting an investigation. It must be stressed that this is not always possible to achieve; bandwidth constraints are always a factor that must be taken into consideration. You should never expect to get 5MP, high quality, 30 fps recordings over a dial-up internet connection.
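A quick back-of-the-envelope calculation shows how hopeless the dial-up case is. The Python sketch below rests on a few assumptions: 2592x1944 as a typical 5MP resolution, 12 bits per pixel for uncompressed YUV 4:2:0 video, a nominal 56 kbit/s modem and a 100 Mbit/s LAN for comparison. It only computes how much compression would be needed for the stream to fit the pipe, not what a given codec can actually deliver.

```python
def raw_bitrate(width, height, fps, bits_per_pixel=12):
    """Uncompressed bitrate in bits per second (12 bpp corresponds to YUV 4:2:0)."""
    return width * height * fps * bits_per_pixel


def required_compression_ratio(width, height, fps, link_bps):
    """How many times smaller than raw the stream must be to fit the link."""
    return raw_bitrate(width, height, fps) / link_bps


DIAL_UP_BPS = 56_000             # nominal 56k modem (assumption)
FAST_ETHERNET_BPS = 100_000_000  # 100 Mbit/s LAN (assumption)

# 2592x1944 is a common 5 MP sensor resolution.
for name, link in [("dial-up", DIAL_UP_BPS), ("100 Mbit LAN", FAST_ETHERNET_BPS)]:
    ratio = required_compression_ratio(2592, 1944, 30, link)
    print(f"{name}: needs about {ratio:,.0f}:1 compression")
```

Under these assumptions the LAN needs roughly 18:1, which is routine, while dial-up needs more than 32,000:1, which no amount of clever encoding will give you at "high quality".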

IP cameras are like a BYOB party. Everyone brings their own beer, so you just need to consider how many people will physically fit in your pad. IP cameras already do the compression for you, so the recorder simply needs to pipe the data to the hard-drives. In an analog system, the recording device is handing out the beer as people arrive. The host might have enough beer for 16 people, but once you are out of beer the party stops. Furthermore, some IP cameras will allow you to do motion detection, or even video analytics directly on the camera (this technology is still in its infancy though). Obviously, a system based on analog transmission can also scale, by adding more/stronger/faster recorders. In terms of scalability I think IP ekes out the advantage, but not by much.

I think the prudent investment is not in more barns for horses or investing heavily in horse 2.0 (which requires special pavement to achieve the speed of a car).

With that said – Happy Holidays folks

Preallocation of Storage

What is the principal argument against pre-allocating (formatting) the storage for the video database?

One problem that I am aware of is if you need to pre-allocate space for each camera. A camera with very little motion might record for 100 days in a 100 GB allocation, while a busy one might have just 1 day. Change the parameters and it gets real hard to figure out what a reasonable size should be.

But say that you pre-format the total storage you need for the entire system, and then let all the cameras share the storage on a FIFO basis. This way, all cameras would have roughly the same amount of time recorded in the database.
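A small simulation illustrates the point. This is a sketch in Python, not how any particular video database is implemented: the store is just a fixed number of equally sized blocks, and the class name, block counts and write rates are invented for the example.

```python
from collections import deque


class SharedFifoStore:
    """Fixed-capacity store shared by all cameras; the oldest block is dropped first."""

    def __init__(self, capacity_blocks):
        # A deque with maxlen discards the oldest entry automatically when full.
        self.blocks = deque(maxlen=capacity_blocks)

    def write(self, camera, timestamp):
        self.blocks.append((camera, timestamp))

    def oldest_per_camera(self):
        """Earliest timestamp still in the store for each camera."""
        oldest = {}
        for camera, timestamp in self.blocks:
            oldest.setdefault(camera, timestamp)
        return oldest


store = SharedFifoStore(capacity_blocks=100)

# Hypothetical load: the busy camera writes 9 blocks per hour, the quiet one writes 1.
for hour in range(50):
    for _ in range(9):
        store.write("busy", hour)
    store.write("quiet", hour)

print(store.oldest_per_camera())  # {'busy': 40, 'quiet': 40}
```

Both cameras end up with the last ten hours on disk. With a fixed 50-block allocation per camera instead, the busy camera would hold fewer than six hours while the quiet one kept all fifty.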

My decidedly unscientific tests show that writing a large block of data to a contiguous area on the disk is much faster than writing to a file that is scattered across the platters. Disk drives now have large caches and command queuing, but these mechanisms were designed for desktop use, and not a torrent of video data being written and deleted over and over again.

Some people balk at the idea that you pre-format the disk for reasons I simply do not understand. If you have a 100 TB storage system, I would expect that you’d want to use the full capacity of the disk. There are no points awarded for having 20% of the disk empty, so why do people feel that pre-allocation is bad?

Any takers?

Standards in IP Video

To grow the pie, standards are needed; imagine if all web servers had slightly proprietary protocols, so that you needed an Apache client if you connected to an Apache server, and another for IIS servers. What if you couldn’t move the HTML from the old IIS to the new Apache? Imagine if for every new version of IIS you needed to re-write parts of the HTML, and distribute a new client to all the users.

In broadcast video we have NTSC, PAL and SECAM and a bunch of variations of these, yet TV has attained omnipresence. But the standards have been stable for many years, while it seems as if our industry has a different protocol for each camera, and even different versions of the same camera.

There are 2 big movements to try and define a standard. One is driven by the NVR side, and the other by the big camera manufacturers. I believe the odds are with ONVIF, if they can enforce an element of dictatorship and avoid designing another camel*.

A winner must emerge for the benefit of the consumer. Even 4 different standards would be acceptable, and allow NVR companies to focus on what matters to the end user: stability, flexibility, ease of use, and useful features that improve security and increase the efficiency of the staff. In the ideal situation, a client would simply buy a conforming camera and plug it in, with the certainty that it would “just work”.

There are roughly 3 areas of interoperability that must be addressed.

  • Discovery and Capability
  • Video and Audio formats
  • IOs

Discovery is partially addressed by UPnP, and to some extent ARP and DHCP sniffing can be helpful on LANs. But for remote cameras, a simple XML response to a fixed address would suffice.

Video and Audio formats are standardized, but some manufacturers provide proprietary versions of various codecs (Mobotix has a proprietary JPEG codec for example). I suggested a couple of years ago that we define a reference decoder for each format – I am not sure it was ever done – but if the video or audio cannot be decoded by the reference decoder the camera should not be considered conforming.

A standardized way for cameras to notify an NVR of the closure of an input, and a way to manipulate a relay would also be needed.

On top of all this comes the need for standardized authentication and encryption schemes (trivial), but that is beyond the scope of this humble blog.

As an example, the current ONVIF core spec is 223 pages and is based on SOAP, which makes it relatively simple to use in modern languages, if a little bloated:

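HTTP/1.1 500 Internal Server Error
CONTENT-LENGTH: bytes in body
CONTENT-TYPE: application/soap+xml; charset="utf-8"
DATE: when response was generated

<?xml version="1.0" ?>
<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
  <soapenv:Body><soapenv:Fault>
    <soapenv:Code>
      <soapenv:Value>fault code</soapenv:Value>
      <soapenv:Subcode>
        <soapenv:Value>ter:fault subcode</soapenv:Value>
        <soapenv:Subcode>
          <soapenv:Value>ter:fault subcode</soapenv:Value>
        </soapenv:Subcode>
      </soapenv:Subcode>
    </soapenv:Code>
    <soapenv:Reason><soapenv:Text xml:lang="en">fault reason</soapenv:Text></soapenv:Reason>
    <soapenv:Detail><soapenv:Text>fault detail</soapenv:Text></soapenv:Detail>
  </soapenv:Fault></soapenv:Body>
</soapenv:Envelope>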

The long message contains just 5 chunks of useful information: a fault code (in SOAP 1.2 a name such as Sender or Receiver rather than a number), two subcodes, a reason (“not supported”) and details (“don’t do this again”). The rest is called “envelope” – that is one giant envelope for such a small message 🙂
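To be fair, digging those five chunks out again is not much work. Here is a sketch in Python using only the standard library. The XML in it uses the placeholder values from the listing above, wrapped in the standard SOAP 1.2 Body/Fault/Code/Reason/Detail structure. That wrapping is my assumption about what a complete message looks like, not something captured from a real camera.

```python
import xml.etree.ElementTree as ET

SOAP_NS = {"env": "http://www.w3.org/2003/05/soap-envelope"}

# Placeholder fault message, not a capture from a real device.
FAULT = """<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
  <soapenv:Body><soapenv:Fault>
    <soapenv:Code>
      <soapenv:Value>fault code</soapenv:Value>
      <soapenv:Subcode>
        <soapenv:Value>ter:fault subcode</soapenv:Value>
        <soapenv:Subcode>
          <soapenv:Value>ter:fault subcode</soapenv:Value>
        </soapenv:Subcode>
      </soapenv:Subcode>
    </soapenv:Code>
    <soapenv:Reason><soapenv:Text xml:lang="en">fault reason</soapenv:Text></soapenv:Reason>
    <soapenv:Detail><soapenv:Text>fault detail</soapenv:Text></soapenv:Detail>
  </soapenv:Fault></soapenv:Body>
</soapenv:Envelope>"""


def parse_fault(xml_text):
    """Extract code, subcodes, reason and detail from a SOAP 1.2 fault."""
    fault = ET.fromstring(xml_text).find("env:Body/env:Fault", SOAP_NS)
    code = fault.find("env:Code", SOAP_NS)
    subcodes = []
    node = code.find("env:Subcode", SOAP_NS)
    while node is not None:  # subcodes nest inside each other
        subcodes.append(node.findtext("env:Value", namespaces=SOAP_NS).strip())
        node = node.find("env:Subcode", SOAP_NS)
    return {
        "code": code.findtext("env:Value", namespaces=SOAP_NS).strip(),
        "subcodes": subcodes,
        "reason": fault.findtext("env:Reason/env:Text", namespaces=SOAP_NS).strip(),
        "detail": fault.findtext("env:Detail/env:Text", namespaces=SOAP_NS).strip(),
    }


print(parse_fault(FAULT))
```

The parser matches on the namespace URI rather than the soapenv prefix, so it would keep working if a device picked a different prefix for the same namespace.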

Even if I have my peeves about SOAP and its bloatedness, it is MILES better to have a standard than 236 different ways to get an error code from a camera.

Regardless of who defines the standard, the end user will be the winner. We will then have to compete on different parameters, and I can’t wait for that to happen.

*a camel is a horse designed by committee