Vivotek has a technology they call “SmartStream”; the video below shows the principle.
Vivotek also has a clip that shows SmartStream in action, with a little bandwidth meter running (seek to 1:20).
Notice how the bandwidth drops. What’s less obvious (when watching the video on YouTube) is that the video outside the ROI (region of interest) is compressed more, which means worse quality there. But we don’t care much about the little carousel, so that’s fine.
I love the concept. Some time ago I did a very, very simple, unscientific test of adaptive compression on JPEG images. The idea is the same: you are interested in some areas of the frame, while others are not that interesting. So why use the same compression across the entire frame? Why not spend more bits on the interesting areas and fewer on the rest? In an apples-to-apples comparison, adaptive compression simply works. All else being equal, the frame takes up less space on your drive; alternatively, you can have better quality in the relevant parts of the frame (around people, for example) and worse quality in the areas that are not interesting (the floor, walls, etc.).
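To make “more bits here, fewer bits there” concrete: an H.264 encoder divides each frame into 16x16 macroblocks, and an ROI scheme can be thought of as a per-macroblock map of quantization parameters (QP), with a lower QP (finer quantization, more bits) inside the ROI and a higher QP outside. This is a minimal Python sketch under that assumption. It is not how Vivotek implements SmartStream (I don’t know their internals), and the function name `qp_map`, the base QP and both offsets are made up for illustration.

```python
from typing import List, Tuple

def qp_map(width_mb: int, height_mb: int, roi: Tuple[int, int, int, int],
           base_qp: int = 28, roi_offset: int = -4,
           background_offset: int = 6) -> List[List[int]]:
    """Hypothetical per-macroblock QP map.

    roi is (x0, y0, x1, y1) in macroblock units, end-exclusive.
    Inside the ROI the QP is lowered (more bits); outside it is raised (fewer bits).
    The default QP and offsets are illustrative, not taken from any product.
    """
    x0, y0, x1, y1 = roi
    qmap = []
    for y in range(height_mb):
        row = []
        for x in range(width_mb):
            inside = x0 <= x < x1 and y0 <= y < y1
            qp = base_qp + (roi_offset if inside else background_offset)
            row.append(max(0, min(51, qp)))  # clamp to the H.264 QP range 0..51
        qmap.append(row)
    return qmap

if __name__ == "__main__":
    for row in qp_map(4, 3, (1, 1, 3, 2)):
        print(row)
```

For a 1920x1080 stream the map would be 120 x 68 macroblocks (1080 / 16 rounds up to 68).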
That’s the theory, of course. It’s a pretty good theory. It’s like saying that if I have 100 boxes filled with bricks, they might weigh more than 100 empty boxes. But just to make sure, someone should test the assertion and check whether the theory holds in practice. Now, if someone charged me an annual fee to test whether empty boxes were truly lighter than boxes filled with bricks, I’d probably pass on that offer. But perhaps it’s a little more complicated than just validating the theory (it’s valid, trust me).
What if I had 100 boxes with bricks, and I hired someone to remove the bricks from boxes 20 to 40 (as I don’t care for purple bricks)? How would I know that my new hire had actually removed the bricks? What if my employee simply ignored my instructions and tricked me into loading full boxes onto my truck, when 20% of them should be empty? I might be willing to pay someone to verify that my “brick removal staff” were, in fact, removing bricks.
It is, however, important to understand that this technique doesn’t always give you dramatic savings. To get the most from adaptive compression, you should consider your scene content.
Let’s consider a completely static scene. Nothing is moving, the light is fixed (indoor), and we are just looking down a hallway.
In this case, adaptive compression does not give us much benefit. To understand why, you need to understand a little bit about H.264 encoding. I am guessing you know this, although a high-profile blogger seems to equate the DCT quantization parameter with compression, so just to be on the safe side, let’s go through it (simplified). Basically, it’s all about throwing away stuff that “looks similar”.
In this case, the following steps are of interest:
1. cut the frame into a bunch of small squares
2. look for similarities between each square and its surroundings in previous frames
3. if a similar area is found, congratulations, you have a motion vector
4. otherwise, just subtract the pixel values in the square from the pixel values in the previous frame
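The four steps can be sketched in a few lines of Python. This is a toy block matcher, assuming grayscale frames stored as lists of lists, 4x4 blocks and a +/-2 pixel search window; real H.264 uses 16x16 macroblocks with sub-partitions, sub-pixel motion and a transform on the residual. One simplification differs from the list: the sketch always keeps the best match, so a block that did not move comes out with a zero motion vector plus a residual, which amounts to step 4. The names `sad`, `encode_block` and `encode_frame` are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]
Block = List[List[int]]

def sad(cur: Frame, prev: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in prev."""
    return sum(
        abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )

def encode_block(cur: Frame, prev: Frame, bx: int, by: int,
                 n: int = 4, search: int = 2) -> Tuple[Tuple[int, int], Block]:
    """Steps 2-4 for one block: find the best match in prev,
    return (motion vector, residual)."""
    h, w = len(prev), len(prev[0])
    # Start with the co-located block (zero motion) so static content keeps (0, 0).
    best_cost, best_dx, best_dy = sad(cur, prev, bx, by, 0, 0, n), 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate positions that fall outside the previous frame.
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(cur, prev, bx, by, dx, dy, n)
            if cost < best_cost:
                best_cost, best_dx, best_dy = cost, dx, dy
    # Residual: current block minus the matched block (all zeroes for a perfect match).
    residual = [
        [cur[by + y][bx + x] - prev[by + best_dy + y][bx + best_dx + x] for x in range(n)]
        for y in range(n)
    ]
    return (best_dx, best_dy), residual

def encode_frame(cur: Frame, prev: Frame, n: int = 4,
                 search: int = 2) -> Dict[Tuple[int, int], Tuple[Tuple[int, int], Block]]:
    """Step 1: cut the frame into n x n blocks, then encode each one."""
    return {
        (bx, by): encode_block(cur, prev, bx, by, n, search)
        for by in range(0, len(cur) - n + 1, n)
        for bx in range(0, len(cur[0]) - n + 1, n)
    }

if __name__ == "__main__":
    # Content shifted one pixel to the right between the two frames.
    prev = [[x + 20 * y for x in range(8)] for y in range(8)]
    cur = [[(x - 1) + 20 * y for x in range(8)] for y in range(8)]
    print(encode_block(cur, prev, 4, 0)[0])  # (-1, 0)
```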
Now, if the scene is static, we don’t get to use step 3, so that leaves us with just step 4. What happens when you subtract two virtually identical matrices? You guessed it: you get a lot of zeroes. If you’ve got great lighting, a good sensor, great optics and so on, you get mostly zeroes. If you have a matrix of mostly zeroes (perhaps a few ones and twos here and there) and you then compress it, the encoder will simply output “use same as before”, which is basically the smallest possible unit. Apart from the I-frame, the P-frames will be mostly empty boxes, and this happens with or without an ROI.
So what if you have a playground with the sky above? You would intuitively draw an ROI around the playground and expect great results. But again, the sky produces “empty boxes” in both cases, since there is no difference between consecutive frames (ROI or no ROI), so you are not getting much benefit from the technique in that case either.
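Here is what “mostly zeroes” means once quantization enters the picture. This minimal sketch assumes a static block whose only change between frames is sensor noise of +/-1, and it quantizes the raw residual directly with a plain uniform quantizer (a real H.264 encoder transforms the residual first and quantizes the coefficients). The function names and the two step sizes, a fine one standing in for the ROI and a coarse one for the background, are hypothetical. When every quantized value is zero, the block can be signalled as skipped, roughly what H.264 calls a P_Skip macroblock, whichever step size was chosen.

```python
from typing import List, Union

Block = List[List[int]]

def quantize(residual: Block, qstep: int) -> Block:
    """Uniform quantizer; int() truncates toward zero, so |v| < qstep becomes 0."""
    return [[int(v / qstep) for v in row] for row in residual]

def code_block(residual: Block, qstep: int) -> Union[str, Block]:
    """Return "skip" if nothing survives quantization, else the quantized levels."""
    levels = quantize(residual, qstep)
    if all(lv == 0 for row in levels for lv in row):
        return "skip"
    return levels

# A static 4x4 block: the current frame equals the previous one plus +/-1 noise.
previous = [[100 + 3 * x + 5 * y for x in range(4)] for y in range(4)]
noise = [[(x + 2 * y) % 3 - 1 for x in range(4)] for y in range(4)]
current = [[p + e for p, e in zip(prow, erow)] for prow, erow in zip(previous, noise)]
residual = [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(current, previous)]

for qstep in (2, 8):  # hypothetical "ROI" and "background" step sizes
    print(qstep, code_block(residual, qstep))  # prints "2 skip", then "8 skip"
```

Both step sizes end up at the same smallest possible output, which is why an ROI buys nothing on a static background.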
What about the I-frame?
Again, it depends on the content of the scene. I-frames are similar to JPEG (but offer slightly better compression than JPEG), so you might know that large areas of the same color are compressed quite efficiently, while areas with a lot of intricate detail do not compress as well. So if the area outside the ROI is mostly the same color, you won’t see much benefit there either.
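The flat-versus-detailed point can be checked with the transform JPEG uses: an 8x8 two-dimensional DCT followed by quantization. This is a minimal sketch, assuming a single quantizer step of 16 for every coefficient (JPEG actually uses a per-frequency quantization table, and H.264 intra coding uses integer 4x4/8x8 transforms plus intra prediction instead), with a made-up formula standing in for “intricate detail”. The number of coefficients that survive quantization serves as a rough proxy for how many bits a block needs.

```python
import math
from typing import List

N = 8
Block = List[List[float]]

def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an N x N block (the JPEG-style transform)."""
    def c(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = c(u) * c(v) * s
    return out

def nonzero_levels(block: Block, qstep: float = 16) -> int:
    """Quantize every DCT coefficient with one uniform step and count the survivors."""
    return sum(1 for row in dct2(block) for coef in row if round(coef / qstep) != 0)

flat = [[128] * N for _ in range(N)]  # e.g. a patch of blue sky or a painted wall
detailed = [[(x * 37 + y * 91 + x * y * 17) % 256 for x in range(N)] for y in range(N)]

print(nonzero_levels(flat), nonzero_levels(detailed))
```

The flat block keeps only its DC coefficient, so its count is 1, while the detailed block keeps many more coefficients at the same quantizer step.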
What’s the point of this, then?
Adaptive compression shines when your ROI excludes areas that have a lot of motion. If you look at the video at the top of this post, you’ll see that the excluded area a) contains motion, and b) has a lot of detail in it.
So what about Vivotek’s claims about savings?
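A toy comparison makes the difference between the playground-and-sky scene and the carousel scene explicit. This sketch assumes a crude cost model (the number of residual values that survive quantization, so zero means the block is skipped), a fine step inside the ROI and a coarse step outside, and hand-made residuals: small +/-1 noise for a static background and larger values for moving, detailed content. All names and numbers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Block = List[List[int]]
Pos = Tuple[int, int]

def block_cost(residual: Block, qstep: int) -> int:
    """Crude bit-cost proxy: residual values that survive truncating quantization."""
    return sum(1 for row in residual for v in row if int(v / qstep) != 0)

def frame_cost(blocks: Dict[Pos, Block], roi: Set[Pos], q_roi: int, q_background: int) -> int:
    """Sum block costs, using the fine step inside the ROI and the coarse step elsewhere."""
    return sum(block_cost(res, q_roi if pos in roi else q_background)
               for pos, res in blocks.items())

def roi_savings(blocks: Dict[Pos, Block], roi: Set[Pos],
                q_fine: int = 2, q_coarse: int = 16) -> int:
    """Cost with one uniform fine step minus cost with ROI-based steps."""
    uniform = frame_cost(blocks, roi, q_fine, q_fine)
    adaptive = frame_cost(blocks, roi, q_fine, q_coarse)
    return uniform - adaptive

noise = [[(x + y) % 3 - 1 for x in range(4)] for y in range(4)]              # static: -1..1
motion = [[(5 * x + 11 * y) % 23 - 11 for x in range(4)] for y in range(4)]  # moving detail: -11..11

roi = {(0, 0)}                                     # the playground, always moving
sky_scene = {(0, 0): motion, (4, 0): noise}        # static sky outside the ROI
carousel_scene = {(0, 0): motion, (4, 0): motion}  # moving carousel outside the ROI

print(roi_savings(sky_scene, roi), roi_savings(carousel_scene, roi))  # prints "0 13"
```

The static sky costs nothing with or without the ROI, so the savings are zero; only when the excluded block carries motion and detail does the coarser background step remove anything.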
Just as I simplified a bunch of things in this post to make it comprehensible and to the point, an ad will make a generalized claim. Sometimes there’s a little asterisk that says “individual results may vary”, and in this case, perhaps Vivotek should have had such a disclaimer. Still, Vivotek demonstrates the bandwidth saving in the video for everyone to see. Vivotek’s simplification of the message is no worse than a blog claiming that the H.264 compression level is simply a question of picking a number between 0 and 51 (I think the latter is a lot worse).
Vivotek is pushing the industry in the right direction, and innovative features like SmartStream make the market so much more interesting than merely bumping the resolution or frame rate.
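For the record, here is roughly what that number between 0 and 51 does. In H.264 the quantization parameter (QP) selects a quantizer step size, which doubles every time QP increases by 6; how many bits a frame then costs still depends on the content, as the static-scene and I-frame discussion above shows. This is a minimal sketch of the nominal QP-to-step mapping; the function name `qstep` is mine, not from any library.

```python
# Nominal quantizer step sizes for QP 0..5; every +6 in QP doubles the step.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)

def qstep(qp: int) -> float:
    """Nominal H.264 quantizer step size for a QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must be between 0 and 51")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)

for qp in (22, 28, 34, 51):
    print(qp, qstep(qp))  # steps 8.0, 16.0, 32.0, 224.0
```

The same QP therefore means the same step size everywhere, but a flat wall and a spinning carousel quantized with that step end up costing very different numbers of bits.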
Even if there are some caveats.