In the last couple of years we have seen a proliferation of H.264-capable cameras. As technology improved we were able to push ever higher resolutions through IP cameras, and today 1080p video is not an uncommon request.
H.264 and its siblings (various MPEG formats) were all designed primarily for one purpose: forward streaming. This is especially true in video surveillance applications, where latency is the enemy and B frames are therefore out of the question. A typical surveillance camera provides ONLY I and P frames. Technically, the H.264 standard does allow a bunch of tricks for bi-directional access (seeking etc.), but most cameras do not support these features yet.
In a traditional surveillance situation, 90% of all video is streamed to disk and never watched. Only a very small fraction of recorded video is ever recalled to investigate an incident. These numbers are rough estimates, and all setups vary.
But we have to store the video – “Absence of evidence is not evidence of absence”, the saying goes, so going to court claiming that “nothing happened, because I have no recordings” will always be a losing case. If H.264 offers great compression ratios, why not record in that format and save drive space?
Storing video in H.264 makes a lot of sense from a storage and bandwidth viewpoint, but in video processing there is an old proverb: you can have speed, size and quality, now pick two.
For H.264 we’ve picked size and quality over speed (processing speed). H.264 is a complicated format, which takes considerable processing power to encode (camera) and decode (client). Speed on the client is not a problem when you run 4 to 9 high-resolution H.264 streams, but some clients (ours included) offer views of 64 or even 100 cameras on screen at once, and operators naturally expect those views to work with ANY combination of cameras.
Now, consider what happens when the operator hits “browse”, and clicks the “step reverse” button.
H.264 video consists of GOPs (Groups of Pictures). In a surveillance stream with only I and P frames, each picture in the group depends on the previous one, so to get to the last picture you need to decode all the preceding pictures in that GOP. Now multiply this by 64.
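To make the cost of a single reverse step concrete, here is a minimal sketch for one camera. It assumes a stream of only I and P frames where every P frame depends on the frame just before it. The function name `frames_to_decode` and the keyframe positions are made up for illustration and do not come from any particular client or library.

```python
from bisect import bisect_right


def frames_to_decode(target, keyframes):
    """Return how many frames must be decoded to display frame `target`.

    `keyframes` is a sorted list of I-frame indices. Every other frame is
    assumed to be a P frame that depends on the frame directly before it.
    """
    # Find the last I frame at or before the target frame.
    pos = bisect_right(keyframes, target) - 1
    if pos < 0:
        raise ValueError("no I frame at or before the target frame")
    # Decode the I frame, then every P frame up to and including the target.
    return target - keyframes[pos] + 1


# Hypothetical stream with an I frame every 30 frames.
keyframes = [0, 30, 60]
print(frames_to_decode(60, keyframes))  # 1: frame 60 is itself an I frame
print(frames_to_decode(59, keyframes))  # 30: one step back costs a whole GOP
```

Stepping back from an I frame is the worst case: the previous picture is the last P frame of the preceding GOP, so the whole group has to be decoded to show it.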
To increase H.264 compression, the GOP can be of variable length. This makes a lot of sense in surveillance situations where the scene is static and long GOPs provide very high compression. But it stresses the browser even more, since it may now need to decode 50 or 60 frames per camera for a single step.
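As a rough back-of-the-envelope check, the worst-case work for one reverse step across a whole grid can be sketched as below. The 64-camera grid and the 60-frame GOP come from the figures above. The 30-frame GOP used for comparison is an assumption, and `decodes_per_reverse_step` is a hypothetical helper name.

```python
def decodes_per_reverse_step(gop_lengths):
    """Worst-case frame decodes for one reverse step across all cameras.

    Assumes the worst case for every camera at once: the requested picture
    is the last P frame of its GOP, so the entire GOP must be decoded.
    """
    return sum(gop_lengths)


short_gops = [30] * 64  # assumed: 64 cameras, 30-frame GOPs
long_gops = [60] * 64   # 64 cameras, 60-frame GOPs

print(decodes_per_reverse_step(short_gops))  # 1920
print(decodes_per_reverse_step(long_gops))   # 3840
```

With variable GOP lengths the list would simply hold a different value per camera. The point is that the total grows with both the number of cameras and the length of each GOP.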
Above 25 camera views, the view needs to be treated as a special mode. It is my experience that most operators use it as a selection page, where the operator just “picks a camera from a matrix of thumbnails”.
Perhaps what we need is a better way for operators to select cameras, and get an overview of the situation, not a new 144 camera view.