A while ago, I argued that just because JPEGs took up more storage space, it did not mean that JPEG offered superior quality (and certainly not if you do compare H.264 to MJPEG at the same bitrate).
I now find that some people are assuming that high GPU utilization automatically means better video performance and that all you have to do is fire up GPU-Z and you’ll know if the decoder is using the GPU for decoding.
There are some that will capitalize on the collective ignorance of the layman and ignorant “professional”. I suppose there’s always a buck to be made doing that. And a large number of people that ought to know better are not going to help educate the masses, as it would effectively remove any (wrong) perception of the superiority of their offering.
Before we start with the wonkishness, let’s consider the following question: What are we trying to achieve? The way I see it, any user of a video surveillance system simply wants to be able to see their cameras, with the best possible utilization of the resources available. They are not really concerned if a system can hypothetically show 16 simultaneous 4K streams because a) they don’t have 4K cameras and b) they don’t have a screen big enough to show 16 x 4K feeds.
So, as an example, let’s assume that 16 cameras are shown on a 1080p screen. Each viewport (or pane) is going to use (1920/4) * (1080/4) pixels (at most), that’s around 130.000 pixels per camera.
A 1080p camera delivers 2.000.000 pixels, so 15 out of every 16 pixels are never actually shown. They are captured, compressed, sent across the network, decompressed, and then we throw away 93% of the pixels.
Does that make sense to you?
A better choice is to configure multiple profiles for the cameras and serve the profile that matches the client the best. So, if you have a 1080p camera, you might have 3 profiles; a 1080p@15fps, a 720p@8fps and a CIF@4fps. If you’re showing the camera in a tiny 480 by 270 pane, why would you send the 1080p stream, putting undue stress on the network as well as on the client CPU/GPU? Would it not be better to pick the CIF stream and switch to the other streams if the user picks a different layout?
In other words; a well-designed system will rarely need to decode more than the number of pixels available on the screen. Surely, there are exceptions, but 90% of all installations would never even need to discuss GPU utilization as a bog standard PC (or tablet) is more than capable of handling the load. We’re past the point where a cheap PC is the bottleneck. More often than not, it is the operator who is being overwhelmed with information.
Furthermore, heavily optimized applications often have odd quirks. I ran a small test pitting Quicksync against Cuvid; the standard Quicksync implementation simply refused to decode the feed, while Cuvid worked just fine. Then there’s the challenge of simply enabling Quicksync on a system with a discrete GPU and dealing with odd scalability issues.
GPU usage metrics
As a small test, I wrote the WPF equivalent of “hello, world”. There’s no video decoding going on, but since WPF uses the GPU to do compositing on the screen, you’d expect the GPU utilization to be visible in GPU-Z, and as you can see below, that is also the case:
The GPU load:
- no app (baseline) : 3-7%
- Letting it sit: 7-16%
- Resizing the app: 20%
This app, that performs no video decoding what-so-ever, uses the GPU to draw a white background, some text, and a green box on the screen, so just running a baseline app will show a bit of GPU usage. Does that mean that the app has better video decoding performance than, say VLC?
If I wrote a terrible H.264 decoder in BASIC and embedded it in the WPF application, an ignorant observer might deduce that the junk WPF app I wrote was faster than VLC, because it had higher GPU utilization, whereas VLC did not.
As a curious side-note, VLC did not show any “Video Engine Load” in GPU-Z, so I don’t think VLC uses Cuvid at all. To provide an example of Cuvid/OpenGL, I wrote a small test app that does use Cuvid. The Video Engine Load is at 3-4% for this 4CIF@30fps stream.
It reminds me of arguments I had 8 years ago when people said that application X was better than application Y because X showed 16 cameras using only 12% CPU, while Y was at 100%. The problem with the argument was that Y was decoding and displaying 10x as many frames as X. Basically X was throwing away 9 out of 10 frames. It did so, because it couldn’t keep up, determined that it was skipping frames and instead switched to a keyframe-only mode.
Anyway, back to working on the worlds shittiest NVR….