Facts and Folklore in the IP Video Industry

A while ago, I argued that just because JPEGs took up more storage space, it did not mean that JPEG offered superior quality (and certainly not if you compare H.264 to MJPEG at the same bitrate).

I now find that some people are assuming that high GPU utilization automatically means better video performance, and that all you have to do is fire up GPU-Z and you’ll know whether the application is using the GPU for decoding.

There are some who will capitalize on the collective ignorance of the layman and the ignorant “professional”. I suppose there’s always a buck to be made doing that. And a large number of people who ought to know better are not going to help educate the masses, as doing so would dispel the (mistaken) perception that their offering is superior.

Before we start with the wonkishness, let’s consider the following question: What are we trying to achieve? The way I see it, any user of a video surveillance system simply wants to be able to see their cameras, with the best possible utilization of the resources available. They are not really concerned if a system can hypothetically show 16 simultaneous 4K streams because a) they don’t have 4K cameras and b) they don’t have a screen big enough to show 16 x 4K feeds.

So, as an example, let’s assume that 16 cameras are shown on a 1080p screen. Each viewport (or pane) is going to use (1920/4) * (1080/4) pixels at most, which is around 130,000 pixels per camera.

A 1080p camera delivers around 2,000,000 pixels per frame, so 15 out of every 16 pixels are never actually shown. They are captured, compressed, sent across the network, decompressed, and then we throw away roughly 94% of the pixels.
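
A quick sketch of the arithmetic (nothing here is specific to any camera or VMS; it is just the 4×4 layout described above):

```python
# Back-of-the-envelope: how many decoded pixels actually reach the screen
# when 16 full-HD streams are squeezed into a 4x4 grid on a 1080p monitor.
screen_w, screen_h = 1920, 1080
grid = 4                                               # 4x4 layout = 16 panes

pane_pixels = (screen_w // grid) * (screen_h // grid)  # 480 * 270 = 129,600
camera_pixels = 1920 * 1080                            # 2,073,600 per frame

shown = pane_pixels / camera_pixels
print(f"pixels shown per camera: {pane_pixels:,} of {camera_pixels:,}")
print(f"thrown away: {1 - shown:.1%}")                 # -> 93.8%
```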

Does that make sense to you?

A better choice is to configure multiple profiles for the cameras and serve the profile that matches the client the best. So, if you have a 1080p camera, you might have 3 profiles: a 1080p@15fps, a 720p@8fps and a CIF@4fps. If you’re showing the camera in a tiny 480 by 270 pane, why would you send the 1080p stream, putting undue stress on the network as well as on the client CPU/GPU? Would it not be better to pick the CIF stream and switch to the other streams if the user picks a different layout?
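
As a minimal sketch of that selection logic: the profile list is just the example above, and the “closest pixel count” rule is one hypothetical heuristic, not the algorithm of any particular VMS.

```python
# Request the profile whose pixel count best matches the viewport it will be
# drawn in, and re-request whenever the user switches layout.
# Profile names and resolutions are simply the examples from the text.
PROFILES = [                       # (name, width, height, fps)
    ("CIF@4fps",     352,  288,  4),
    ("720p@8fps",   1280,  720,  8),
    ("1080p@15fps", 1920, 1080, 15),
]

def pick_profile(pane_w: int, pane_h: int) -> str:
    """Return the profile whose resolution is closest to the pane size."""
    target = pane_w * pane_h
    return min(PROFILES, key=lambda p: abs(p[1] * p[2] - target))[0]

print(pick_profile(480, 270))      # pane in a 16-way grid -> CIF@4fps
print(pick_profile(960, 540))      # pane in a 4-way grid  -> 720p@8fps
print(pick_profile(1920, 1080))    # full screen           -> 1080p@15fps
```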

In other words, a well-designed system will rarely need to decode more than the number of pixels available on the screen. Sure, there are exceptions, but 90% of all installations would never even need to discuss GPU utilization, as a bog-standard PC (or tablet) is more than capable of handling the load. We’re past the point where a cheap PC is the bottleneck. More often than not, it is the operator who is being overwhelmed with information.

Furthermore, heavily optimized applications often have odd quirks. I ran a small test pitting Quicksync against Cuvid; the standard Quicksync implementation simply refused to decode the feed, while Cuvid worked just fine. Then there’s the challenge of simply enabling Quicksync on a system with a discrete GPU and dealing with odd scalability issues.

GPU usage metrics

As a small test, I wrote the WPF equivalent of “hello, world”. There’s no video decoding going on, but since WPF uses the GPU to do compositing on the screen, you’d expect the GPU utilization to be visible in GPU-Z, and as you can see below, that is also the case:

The GPU load:

  • No app (baseline): 3-7%
  • Letting it sit: 7-16%
  • Resizing the app: 20%

This app, which performs no video decoding whatsoever, uses the GPU to draw a white background, some text, and a green box on the screen, so just running a baseline app will show a bit of GPU usage. Does that mean that the app has better video decoding performance than, say, VLC?

If I wrote a terrible H.264 decoder in BASIC and embedded it in the WPF application, an ignorant observer might deduce that my junk WPF app was faster than VLC, because it showed higher GPU utilization while VLC showed almost none.

As a curious side-note, VLC did not show any “Video Engine Load” in GPU-Z, so I don’t think VLC uses Cuvid at all. To provide an example of Cuvid/OpenGL, I wrote a small test app that does use Cuvid. The Video Engine Load is at 3-4% for this 4CIF@30fps stream.

[Screenshot: GPU-Z showing the Video Engine Load for the Cuvid test app]

It reminds me of arguments I had 8 years ago, when people said that application X was better than application Y because X showed 16 cameras using only 12% CPU, while Y was at 100%. The problem with the argument was that Y was decoding and displaying 10x as many frames as X; basically, X was throwing away 9 out of 10 frames. It did so because it couldn’t keep up: it determined that it was skipping frames and switched to a keyframe-only mode.
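
Running the rough numbers from that anecdote makes the point: normalize by the frames actually decoded, and the “12% vs 100%” comparison flips.

```python
# Rough numbers from the anecdote: X shows 16 cameras at 12% CPU but keeps
# only 1 in 10 frames; Y pegs the CPU at 100% but decodes every frame.
frames_y = 10.0                  # arbitrary unit: frames Y decodes per camera
frames_x = frames_y / 10         # X throws away 9 out of 10

efficiency_x = frames_x / 12     # frames decoded per % CPU
efficiency_y = frames_y / 100

print(f"X: {efficiency_x:.3f} frames per CPU%")  # ~0.083
print(f"Y: {efficiency_y:.3f} frames per CPU%")  # 0.100 -> Y does more per CPU%
```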

Anyway, back to working on the world’s shittiest NVR…

Worldwide Hack

Cameras have vulnerabilities, some easier to exploit than others. Unless you have some sort of mental defect, this is hardly news. This old fart wrote about it in 2013/2014, but it still affects a lot of people.

If you’re a bit slow in the head, you might want to take your hard-earned cash and give it to some sociopathic megalomaniac who thinks he’s the savior of the world, and feel helpless and vulnerable as you cower under the threat of the “big unknown”.

A recent hyped headline exclaims:

“WORLDWIDE HACK”

But, you know, with this new-fangled internet, it’s pretty easy to do something “worldwide”; any script kiddie in their mother’s basement can hit every single IP that is exposed to the internet if they want. “Worldwide” don’t mean diddly squat these days. Unless you’re living in the ’80s, desperately trying to get your damn VCR fixed so you can watch those old tapes you kept.

Now, cameras, NVRs, and DVRs with shitty security, straight to the internet? Bad fucking idea. Doesn’t mean that people don’t do it. Like drinking 2 gallons of Coke and wolfing down junk food for lunch and dinner is a bad idea, yet millions of people (actually worldwide) do it.

So you can make an easy buck selling subscriptions that place the blame squarely on the Coke and pizza for the obesity epidemic. After all, who doesn’t like to be absolved of their sins and to point the finger at everyone else? “The magazine says I am not to blame”, and then you can continue your gluttony uninhibited.

A wise person would not expect Coke or Papa John’s to spend millions of dollars showing the bad effects of poor dietary choices. They’ll continue to show fit girls and boys enjoying a Coke and pizza responsibly, but the bulk of their income is certainly not derived from people with a BMI < 20.

While I understand the desire to believe that “easy” equals “correct”, it never ceases to amaze me that people don’t take any precautions. Maybe my mistake is that I am underestimating how gullible people really are (and my sociopath nemesis isn’t).

While this big, nasty, “worldwide” attack is taking place, I still haven’t seen anyone hack my trusty old Hikvision camera sitting here on my desk… must be a coincidence that I wasn’t hit.

CPU vs GPU

I think some of the incumbents are going in the wrong direction, while I am a little envious of some that I think got it 100% right.

In the old days, things were simple and cameras were really dumb. Today, cameras are often quite clever, yet hordes of VMS salespeople are trying to make them dumb again, driving the whole industry backward to the detriment of the end-users. Eventually, though, I think people will wake up and realize what’s going on.

The truth is that you can run a VMS on a $100 hardware platform (excluding storage). Yet, if you are keeping up on the latest news, it seems that you need a $3000 monster PC with a high-end GPU to drive it. In the grand scheme of things (cost of cameras, cabling, and VMS licenses) the $2,900 difference is peanuts, but it bothers me nonetheless. It bothers me because it suggests a piss-poor use of the available resources.

[Image: a Raspberry Pi, a $40 VMS-capable PC]

As I have stated quite a few times, the best way to detect motion is to use a PIR sensor, but if you insist on doing any sort of image analysis, the best way to do it is on the camera. The camera has access to the uncompressed frame in its optimal format, and it brings its own CPU resources to the party. If you move the motion detection to the camera, your $100 platform will never have to decode a single video frame and can focus on what the VMS should be doing: reading, storing, and relaying data.

In contrast, you can let the camera throw away a whole bunch of information as it compresses the frame, then send the frame across the network (dropping a few packets for good measure) to a PC that is sweating bullets because it must now decompress each and every frame: MPEG formats are all-or-(almost)-nothing, so there is no “decode every 4th frame” option here. The decompressed frame now contains compression artifacts, which make accurate analysis more difficult. Transmission across the network can also mean the frames don’t arrive at a steady pace, which causes further problems for video analytics engines.
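
To make the all-or-(almost)-nothing point concrete, here is a minimal sketch with a hypothetical 30-frame GOP: because every P-frame depends on the frame before it, “show every 4th frame” still forces you to decode nearly the whole stream.

```python
# Hypothetical GOP (not tied to any particular codec implementation): one
# keyframe (I) followed by predicted frames (P) that each depend on the
# previous frame.

def frames_to_decode(gop, target_index):
    """Indices that must be decoded so that frame `target_index` can be shown."""
    start = target_index
    while gop[start] != "I":     # walk back to the most recent keyframe
        start -= 1
    return range(start, target_index + 1)

gop = ["I"] + ["P"] * 29         # 30-frame GOP, e.g. one I-frame per second at 30 fps

# Trying to display only every 4th frame still decodes almost everything.
wanted = range(0, len(gop), 4)
decoded = {i for w in wanted for i in frames_to_decode(gop, w)}
print(f"wanted {len(wanted)} frames, had to decode {len(decoded)} of {len(gop)}")
# -> wanted 8 frames, had to decode 29 of 30
```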

[Image: a frame corrupted by missing packets. Look at all that motion! Let’s sound the alarm.]

VMS vendors now say they have a “solution” to the PC getting crushed under the insane workload required to do any sort of meaningful video analysis: move everything to a GPU, they say, and it’s kinda true. If you bring up the task manager in Windows, your CPU utilization will now be lower, but crank up GPU-Z and you (should) see the GPU buckling under the load. One might ask if it would not have been cheaper to get a $350 octa-core Ryzen CPU instead of a $500 GPU.

[Screenshot: GPU-Z]

Some will say that if the integrator has to spend 2 days setting up on-camera (edge) motion detection, it might be cheaper if they just spring for the super PC and do everything on that. This assumes that the server-side setup can actually be done quicker than setting it up on the camera. I’d wager that a lot of motion detection systems are not really necessary, and in other cases the VMS motion detection is simply not as good as the edge-based detection, which in some tragic instances completely invalidates the system and renders it worthless as people and objects magically teleport from one frame to the next.

When You Are “Hacked”

Sometime in 2014, I received a database dump from a high-profile industry site. I received the file from an anonymous file-sharing site via a Twitter user who quickly disappeared. The database contained user names, mail addresses, password hashes (SHA1), the salt used, the IP address used to access the site, and the approximate geographical location (IP geolocation lookup, nothing nefarious).

I had canceled my subscription in January 2014, and the breach happened later than that. I don’t believe I received a notification of a breach of the database. Many others did, but I absolutely would remember if I had received one: in part because I discussed the breach with a former employee at the blog, and in part because I was in possession of said DB.

A user reached out to me, seemingly puzzled as to why I would be annoyed by not receiving a notification: seeing as I was no longer a member, why would I care that my credentials were leaked? No one would be able to log into the site using my account anyways.

Here’s the issue I have with that. I happen to have different passwords for different things, but a lot of people do not. A lot of people use the same password for many different things. Case in point: say the dump contains a user with the email address someuser@gmail.com, and someone cracks the salted SHA1 hash with a wordlist and recovers the password. Do you think there’s a fair chance that the same password would work if they try to log into the mail account at Gmail? Sure, it’s bad to reuse passwords, but do people do it? You bet.
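
To illustrate why a leaked dump of salted SHA1 hashes is still dangerous, here is a minimal sketch of a wordlist attack. The salt-plus-password concatenation order and the sample values are assumptions for illustration, not the actual scheme used by the site in question.

```python
# A salt defeats precomputed rainbow tables, but a plain wordlist run against
# a fast hash like SHA1 is still trivial. The (salt + password) order below is
# an assumption; real sites vary, and the values are made up for illustration.
import hashlib

def sha1_hex(salt, password):
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()

def crack(leaked_hash, salt, wordlist):
    """Try each candidate password against the leaked hash."""
    for candidate in wordlist:
        if sha1_hex(salt, candidate) == leaked_hash:
            return candidate
    return None

salt = "x9f2"                                 # hypothetical leaked salt
leaked = sha1_hex(salt, "letmein123")         # stand-in for a real dump entry
wordlist = ["password", "123456", "letmein123", "qwerty"]

print(crack(leaked, salt, wordlist))          # -> letmein123
```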

So, when your site is breached, I think you have an obligation to inform everyone affected by the breach, regardless of whether they are current members or not. I would imagine anyone in the security industry would know this.