Magical “GPU” Based Video Decoder

I was recently alerted to an article that described a magical video decoding engine. The site has a history of drawing odd conclusions from its observations, so naturally, I was a bit skeptical about the claims that were relayed to me by a colleague. Basically, the CPU load dropped dramatically, and the GPU load stayed the same. This sounded almost too good to be true, so I did some casual tests here (again).


Test setup

I am not thrilled about downloading a 2 GB installer that messes up my PC when I uninstall it, and running things in a VM would not be an honest test. Nor am I about to buy a new Intel PC to test this out (my next PC will be a Ryzen-based system), so all tests are done with readily available tools: FFmpeg and GPU-Z. I believe that Intel wrote the QSV version of the H.264 decoder, so I guess it’s as good as it gets.

Tests were done on an old 3770K with 32 GB RAM, running Windows 7, with a dedicated GeForce 670 GPU. The 3770K comes with the Intel HD Graphics 4000 integrated graphics solution, which supports Quick Sync.

Terminology

In the nerd-world, a GPU usually means a discrete GPU; an NVIDIA GeForce or AMD Radeon dedicated graphics card. Using the term “GPU support” is too vague, because different vendors have different support for different things. E.g. NVIDIA has CUDA and their NVENC/NVDEC codecs, and some things can be done with pixel shaders that work on all GPUs. (Our decoding pipeline uses this approach and works on integrated as well as discrete GPUs, which is why I use the term GPU-accelerated decoding without embarrassment.)

However, when you rely on (or are testing) something very specific, like Intel Quick Sync, then that’s the term you should use. If you say GPU support, the reader might be led to believe that a faster NVIDIA card will get a performance boost (since the NVIDIA card is much, much faster than the integrated GPU that hosts Quick Sync). This would not be the case. A newer generation of Intel CPU would offer better performance, and it would not work at all on AMD chips with a dedicated GPU (or AMD’s APU solution). Likewise, if you use CUDA in OpenCV, say “CUDA support” to avoid confusion.

Results

Usually, when I benchmark stuff, I run the item under test at full capacity. E.g. if I want to test, say, the CPU-based H.264 decoder in FFmpeg against the Intel Quick Sync decoder, I will ask the system to decode the exact same clip as fast as possible.

So, let’s decode a 720p clip using the CPU only, and see what we get.
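In FFmpeg terms, the command looks something like this (a sketch of the kind of invocation used, not my exact command line; clip_720p.mp4 is a placeholder for the test clip, and -f null discards the decoded frames so that only decoding speed is measured):

    # CPU-only H.264 decode, as fast as possible; -benchmark prints timing stats
    ffmpeg -benchmark -i clip_720p.mp4 -f null -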

CPU

The clip only takes a few seconds to decode, but if you look at the task manager, you can see that the CPU went to 100%. That means that we are pushing the 3770K to its capacity.

[Screenshot: CPU-only decoding frame rate]

Now, let’s test Quick Sync.
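Same clip, same idea, but with the Quick Sync decoder selected explicitly (again a sketch; h264_qsv is FFmpeg’s Quick Sync H.264 decoder):

    # Decode via Intel Quick Sync instead of the CPU
    ffmpeg -benchmark -c:v h264_qsv -i clip_720p.mp4 -f null -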

QSV

Not as fast as the CPU-only run, but we could run CPU decoding at the same time and get more in aggregate… in this test, we got ~580 fps:

[Screenshot: Quick Sync decoding frame rate]

So we are getting ~200 fps less than the CPU-only method. Fortunately, the CPU is no longer taxed to 100%. We’re only at 10% CPU use while the QSV decoder is doing its thing:

[Screenshot: CPU load during Quick Sync decoding]

Magic!!!

But surprisingly, neither is the GPU. In fact, the GPU load is at 0%:

[Screenshot: GPU load during Quick Sync decoding]

However, if you look at the GPU power draw, you can see increased draw in a few places (2.6 W at the spikes). Those are the places where the test is being run. You can also see that the GPU clock increases to meet the demand for processing power.

If there is no load on the GPU, why does it “only” deliver ~600 fps? Why is the load not at 100%? I think the reason is that the GPU load reading in GPU-Z does not reflect the stress on the dedicated Quick Sync circuitry, which is running at full capacity. I can make the GPU load graph increase by moving a window onto the screen driven by the Intel HD Graphics 4000 “GPU”, so GPU-Z is working as intended.

I should say that I was able to increase performance by running 2 concurrent decoding sessions, getting to ~800 fps, but beyond that, more sessions just lower the frame rate, and eventually the CPU is saturated as well.
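If you want to reproduce that, something like this should do it (a sketch in Unix-style shell syntax for brevity; on Windows you would simply launch the sessions from separate command prompts):

    # Two concurrent Quick Sync decoding sessions, run in parallel
    ffmpeg -benchmark -c:v h264_qsv -i clip_720p.mp4 -f null - &
    ffmpeg -benchmark -c:v h264_qsv -i clip_720p.mp4 -f null - &
    wait  # let both finish, then add up the reported speeds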

Grief

To enable Quick Sync on my workstation, which has a dedicated NVIDIA GeForce 670 card on Windows 7, I had to enable a “virtual” screen and allow Windows to extend the display to this screen (which I can’t see, because I only have one 4K monitor). I also had to enable the integrated GPU in the BIOS, so it was not exactly plug and play.

Conclusion

I stand by my position: yes, add GPU decoding to the mix, but the user should rely on edge-based detection combined with dedicated sensors (any integrator worth their salt will be able to install a PIR detector and hook it up in just a few minutes). This allows you to run your VMS on extremely low-end hardware, and the scalability is much better than moving a bottleneck to a place where it’s harder to see.


Marketing Technology

I recently saw a fun post on LinkedIn. Milestone Systems was bragging about how they have added GPU acceleration to their VMS, but the accompanying picture was from a different VMS vendor. My curiosity got the better of me, and I decided to look for the original press release. The image was gone, but the text is bad enough.

Let’s examine:

Pioneering Hardware Acceleration
In the latest XProtect releases, Milestone has harvested the advantages of the close relationships with Intel and Microsoft by implementing hardware acceleration. The processor-intensive task of decoding (rendering) video is offloaded to the dedicated graphics system (GPU) inside the processer [sic], leaving the main processor free to take on other tasks. The GPU is optimized to handle computer graphics and video, meaning these tasks will be greatly accelerated. Using the technology in servers can save even more expensive computer muscle.

“Pioneering” means that you do something before other people. OnSSI did GPU acceleration in version 1.0 of Ocularis, which is now 8 or 9 years old. Even the very old NetSwitcher app used DirectX for fast YUV conversion. Avigilon has been doing GPU acceleration for a while too, and I suspect others have as well. The only “pioneering” here is how far you can stretch the bullshit.

Furthermore, Milestone apparently needs a “close relationship” with Microsoft and Intel to use standard, publicly available Quick Sync tech. They could also have used FFmpeg.

We experimented with CUDA on a high-end NVIDIA card years ago, but came to the conclusion that the scalability was problematic: while the CPU would show 5%, the GPU was being saturated, causing stuttering video when we pushed a lot of frames.

Using Quick Sync is the right thing to do, but trying to pass it off as “pioneering” and suggesting that you have some special access to Microsoft and Intel to do trivial things is taking marketing too far.

The takeaway is that I need to remind myself to make my first gen client slow as hell, so that I can claim 100% performance improvement in v2.0.

[Image: Keep calm and ignore bullshit]

VR and Surveillance

Nauseated and sweaty, I remove my VR goggles. I feel sick, and I need to lie down. Resident Evil 7 works great in VR because things can sneak up on you from behind. You have to actually turn your head to see what was making that noise behind you.

On a monitor, I can do a full panoramic dewarp from several cameras at once, and the only nausea I experience is from eating too many donuts too fast. There’s no “behind”, and I have a superhuman ability to see in every direction, from several locations, at once. A friend of mine who played computer games competitively (before it was a thing) used the maximum FOV available to gain an advantage over mere humans.

[Image: panorama]

One feature that might be of some use is the virtual video wall. It’s very reminiscent of the virtual desktop apps that are already available.

And I am not even sure about the gaming aspect of VR. In the gaming world, people are already wondering if VR is dead or dying. Steam stats seem to suggest that it is, and when I went to try the Vive in the local electronics supermarket, the booth was deserted and surrounded by boxes and gaming chairs. Apparently you could book a trial run, but the website to do so was slow, convoluted, and filled with ads.

Time will tell if this takes off. I am not buying into it yet.


Facebook vs ZeniMax

John Carmack is arguably a genius, and when Facebook lost to ZeniMax, he vented his frustration on Facebook. When programmers meet lawyers, the programmer usually ends up frustrated. When someone argues that “doing nothing can be considered doing something”, you start wondering if you are the only sane person in the room, or if you are being gaslit by someone in a suit.

I think John Carmack failed to realize just how far-reaching non-compete covenants can be. In some states, an employment contract can contain elements that severely limit your ability to work in related industries, and – perhaps surprisingly – the company often owns everything you create while under contract, even if it was made in your spare time. In many cases, the company does the right thing and lets you own that indie game you wrote on weekends and nights, but when Facebook buys a company for $2 billion, someone might catch the smell of sweet, sweet moolah.

Here’s how I see it.

In April 2012, John Carmack is working for ZeniMax and engages with Palmer Luckey regarding the initial HMD. It seems to me that John Carmack probably thought that what he did in his spare time was of no concern to ZeniMax, and that he was free to have fun with VR since ZeniMax was not interested in any of it.

At QuakeCon 2012 (August), Palmer Luckey is on stage with both Michael Abrash and John Carmack, talking about VR. Carmack, at this point in time, is clearly a ZeniMax employee, and I have a very hard time believing that Carmack didn’t work on VR-related research at this point.

In August 2013, John Carmack leaves ZeniMax, joins Oculus, and starts working full-time on VR. ZeniMax doesn’t seem to care. Perhaps they expected that Oculus would soon crash and burn (it probably would have without Facebook intervening).

Less than a year later, in July 2014, two years after Palmer Luckey and John Carmack exchanged a few words on a message board, Oculus is worth $2 billion to Facebook (according to Mark Zuckerberg, they are now $3.5 bn in the hole with this acquisition).

John Carmack says that not a single line of ZeniMax code was used, and while that may be technically true, you could say that ZeniMax funded the research that figured out the 700 ways not to make a lightbulb; Carmack then moves to Oculus, bangs the code together, and the rest is history.

It’s pretty easy to convince someone who is not a programmer that code was copied. It’s pretty easy to find a technically competent person who will say that code was copied, even though all programmers know that a lot of code looks kind of similar. Rendering a quad using OpenGL looks pretty much exactly the same in all apps, but is it copyright infringement?
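To illustrate the point, here is a minimal sketch (legacy immediate-mode OpenGL, not anyone’s actual shipping code) of how virtually every application draws a textured quad:

    #include <GL/gl.h>

    /* Draw a textured quad covering the viewport. Code like this looks
       nearly identical in every application that does it. */
    void draw_quad(void)
    {
        glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f); /* bottom-left  */
        glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f); /* bottom-right */
        glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f); /* top-right    */
        glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  1.0f); /* top-left     */
        glEnd();
    }

Put two codebases containing that function side by side, and a lay jury will see “copied” code; a programmer will see the canonical way to do it.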

Time will tell if Facebook/Oculus wins the appeal. I think the current verdict is fair (the initial $6 bn claim was idiotic).

Armageddon

Oh, video surveillance industry, I have failed ye. And I apologize. I did my best.

The false prophet is constantly preaching to his obedient and subservient flock. Tail wagging, eyes wide open, listening to the dog-whistle playing tunes of fear, uncertainty, and doubt.

All we can do is sit back and watch as the industry gets destroyed by consuming the vile soup consisting of equal parts arrogance and ignorance, served up by his highness.

I shall never forget the time, about 13 years ago, when a store manager asked why the hell it had to be so advanced. He fondly remembered his VCR that had a nice red button and it just worked. Plug in the camera, and you had video. It was that simple.

Pretty much anyone could install these systems. Video quality was shit and tapes wore out, but it was simple and most people could operate it. Once we moved to IP we fucked it all up. It became a nightmare to install and operate, and you had to have a degree in engineering to make sense of any of it.

In this complex world, some people are now shitting their pants over the ownership of a technology company by a government entity. Perhaps I am wrong. Maybe the encopresis is not related to the new gospel, but is a more permanent state of affairs, who knows? But I am starting to notice the smell.

We’re past reasoning here. We’re past the point where the accuser delivers the proof; instead, the accused now has to prove his innocence. Apparently, the Court of Oyer and Terminer has been established, and our present-day version of Thomas Newton is presenting his evidence for all to see – “The coat is cut or torn in two ways”.

There’s a reason why, in civilized societies, the accused does not carry the burden of proving their innocence – it’s damn near impossible to do so. Proving guilt, on the other hand, provided there is any, may be hard, but certainly not impossible. So far, I have yet to see more compelling evidence than oddly torn coats.

Perhaps the leap from analog and coax cables to IP and CAT5 is a leap too far for some people, and in the whirlwind of technobabble, they desperately grasp for something to hold on to. Perhaps in time they will find out that they are clinging to the branches of an old, but potent, poison ivy that has spread all over the garden.

I’m not willing to pass judgment on any camera manufacturer right now. I am willing to accept that people make mistakes. NASA burned up the Mars Climate Orbiter because someone at Lockheed Martin used imperial units! People “carelessly” installed software that contained OpenSSL, which in turn was vulnerable to the Heartbleed bug, and I could go on.

Maybe I am the ignorant one. Maybe I am not “connecting the dots”. I do see the dots, and I do see how someone is trying to make you connect them. But without evidence, I am not going to draw that line. I do have ample evidence that “the flock” are ignorant fools, so I am judging members of that flock by association (fairly or not 🙂 )

Sony IPELA Backdoor

Numerous sites now report that a backdoor has been found in several Sony IPELA cameras. 

You can update the firmware, but as the self-proclaimed Messiah of IP video says: “Firmware is updated all the time, just like on a PC, and a backdoor could be injected at any point during this process” (I am still not sure if this is an attempt at humor or evidence of gross incompetence).

From the Reddit post on the backdoor, you can find a link to a site that lists a lot of decrypted firmware files. These decrypted files can be scanned for vulnerabilities, just like SEC Consult did.