Facts and Folklore in the IP Video Industry

A while ago, I argued that just because JPEGs took up more storage space, it did not mean that JPEG offered superior quality (and certainly not if you compare H.264 to MJPEG at the same bitrate).

I now find that some people assume that high GPU utilization automatically means better video performance, and that all you have to do is fire up GPU-Z and you'll know whether the decoder is using the GPU for decoding.

There are some who will capitalize on the collective ignorance of the layman and the ignorant "professional". I suppose there's always a buck to be made doing that. And a large number of people who ought to know better are not going to help educate the masses, as doing so would remove any (wrong) perception of the superiority of their offering.

Before we start with the wonkishness, let's consider the following question: What are we trying to achieve? The way I see it, any user of a video surveillance system simply wants to be able to see their cameras, with the best possible utilization of the resources available. They are not really concerned with whether a system can hypothetically show 16 simultaneous 4K streams, because a) they don't have 4K cameras, and b) they don't have a screen big enough to show 16 x 4K feeds.

So, as an example, let's assume that 16 cameras are shown on a 1080p screen. Each viewport (or pane) is going to use at most (1920/4) * (1080/4) pixels, which is around 130,000 pixels per camera.

A 1080p camera delivers about 2,000,000 pixels per frame, so 15 out of every 16 pixels are never actually shown. They are captured, compressed, sent across the network, decompressed, and then roughly 94% of the pixels are thrown away.
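To make the arithmetic concrete, here is a tiny sketch (plain Python; the only inputs are the numbers already mentioned above):

  # 1080p source shown in a 16-up layout on a 1080p monitor
  pane_w, pane_h = 1920 // 4, 1080 // 4    # 480 x 270 per pane
  shown = pane_w * pane_h                  # 129,600 pixels actually drawn
  delivered = 1920 * 1080                  # 2,073,600 pixels decoded per frame
  print(f"{shown:,} of {delivered:,} pixels shown -> {1 - shown / delivered:.1%} discarded")
  # prints: 129,600 of 2,073,600 pixels shown -> 93.8% discarded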

Does that make sense to you?

A better choice is to configure multiple profiles for the cameras and serve the profile that best matches the client. So, if you have a 1080p camera, you might have three profiles: 1080p@15fps, 720p@8fps and CIF@4fps. If you're showing the camera in a tiny 480 by 270 pane, why would you send the 1080p stream, putting undue stress on the network as well as on the client CPU/GPU? Would it not be better to pick the CIF stream and switch to the other streams if the user picks a different layout?
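To illustrate, a very simple selection policy could look like the sketch below (the profile names and the "closest pixel count" rule are just assumptions for the example, not a description of any particular VMS):

  PROFILES = {
      "1080p@15fps": (1920, 1080),
      "720p@8fps":   (1280, 720),
      "CIF@4fps":    (352, 288),
  }

  def pick_profile(pane_w: int, pane_h: int) -> str:
      # Serve the profile whose pixel count is closest to the pane being drawn.
      pane_pixels = pane_w * pane_h
      return min(PROFILES, key=lambda p: abs(PROFILES[p][0] * PROFILES[p][1] - pane_pixels))

  print(pick_profile(480, 270))    # CIF@4fps    (tiny pane in a 16-up layout)
  print(pick_profile(960, 540))    # 720p@8fps   (4-up layout on a 1080p screen)
  print(pick_profile(1920, 1080))  # 1080p@15fps (single camera, full screen)

The exact rule matters less than the principle: the client asks for the stream that matches what it is about to draw, and switches when the layout changes.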

In other words: a well-designed system will rarely need to decode more than the number of pixels available on the screen. Sure, there are exceptions, but 90% of all installations would never even need to discuss GPU utilization, as a bog-standard PC (or tablet) is more than capable of handling the load. We're past the point where a cheap PC is the bottleneck. More often than not, it is the operator who is being overwhelmed with information.

Furthermore, heavily optimized applications often have odd quirks. I ran a small test pitting Quicksync against Cuvid; the standard Quicksync implementation simply refused to decode the feed, while Cuvid worked just fine. Then there’s the challenge of simply enabling Quicksync on a system with a discrete GPU and dealing with odd scalability issues.

GPU usage metrics

As a small test, I wrote the WPF equivalent of “hello, world”. There’s no video decoding going on, but since WPF uses the GPU to do compositing on the screen, you’d expect the GPU utilization to be visible in GPU-Z, and as you can see below, that is also the case:

The GPU load:

  • No app running (baseline): 3-7%
  • App running, sitting idle: 7-16%
  • Resizing the app: 20%

This app, which performs no video decoding whatsoever, uses the GPU to draw a white background, some text, and a green box on the screen, so just running a baseline app will show a bit of GPU usage. Does that mean that the app has better video decoding performance than, say, VLC?

If I wrote a terrible H.264 decoder in BASIC and embedded it in the WPF application, an ignorant observer might deduce that my junk WPF app was faster than VLC, simply because it showed higher GPU utilization than VLC did.

As a curious side-note, VLC did not show any "Video Engine Load" in GPU-Z, so I don't think VLC uses Cuvid at all. To provide an example of Cuvid/OpenGL, I wrote a small test app that does use Cuvid. The Video Engine Load is at 3-4% for this 4CIF@30fps stream.

[Screenshot: GPU-Z video engine load while running the Cuvid test app]

It reminds me of arguments I had 8 years ago, when people said that application X was better than application Y because X showed 16 cameras using only 12% CPU, while Y was at 100%. The problem with the argument was that Y was decoding and displaying 10x as many frames as X. Basically, X was throwing away 9 out of 10 frames. It did so because it couldn't keep up: it determined that it was skipping frames and switched to a keyframe-only mode.
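For the curious, here is a sketch of that fallback behaviour (hypothetical code, but it shows why the raw CPU numbers say nothing on their own):

  def frames_to_decode(frames, keeping_up):
      # frames is a list of 'I' (keyframe) / 'P' (delta frame) markers
      if keeping_up:
          return frames                       # decode and show everything (app Y)
      return [f for f in frames if f == "I"]  # keyframe-only fallback (app X)

  stream = (["I"] + ["P"] * 9) * 3            # one keyframe for every 10 frames
  print(len(frames_to_decode(stream, True)))  # 30 frames decoded and shown
  print(len(frames_to_decode(stream, False))) # 3 frames -- 9 out of 10 thrown away

Lower CPU, sure – but only because 90% of the video never reaches the screen.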

Anyway, back to working on the world's shittiest NVR…


Worldwide Hack

Cameras have vulnerabilities, some easier to exploit than others. Unless you have some sort of mental defect, this is hardly news. This old fart wrote about it in 2013/2014, but it still affects a lot of people.

If you’re a bit slow in the head, you might want to take your hard earned cash, and give it to some sociopathic megalomaniac who thinks he’s the savior of the world, and feel helpless and vulnerable as you cower under the threat of the “big unknown”.

A recent hyped headline exclaims:

“WORLDWIDE HACK”

But, you know, with this new-fangled internet, it’s pretty easy to do something “worldwide”; any script kiddie in their mother’s basement can hit every single IP that is exposed to the internet if they want. “Worldwide” don’t mean diddly squat these days. Unless you’re living in the 80’s, desperately trying to get your damn VCR fixed, so you can watch those old tapes you kept.

Now, cameras, NVRs and DVRs with shitty security, straight onto the internet? Bad fucking idea. That doesn't mean people don't do it. Drinking 2 gallons of Coke and wolfing down junk food for lunch and dinner is a bad idea too, yet millions of people (actually worldwide) do it.

So you can make an easy buck selling subscriptions that place the blame squarely on the Coke and pizza for the obesity epidemic. After all, who doesn't like to be absolved of their sins and point the finger at everyone else? "The magazine says I am not to blame", and then you can continue your gluttony uninhibited.

A wise person would not expect Coke or Papa John's to spend millions of dollars showing the bad effects of poor dietary choices. They'll continue to show fit girls and boys enjoying a Coke and a pizza responsibly, but the bulk of their income is certainly not derived from people with a BMI < 20.

While I understand the desire to believe that "easy" equates to "correct", it never ceases to amaze me that people don't take any precautions. Maybe my mistake is that I am underestimating how gullible people really are (and my sociopath nemesis isn't).

While this big, nasty, “worldwide” attack is taking place, I still haven’t seen anyone hack my trusty old Hikvision camera sitting here on my desk… must be a coincidence that I wasn’t hit.

CPU vs GPU

I think some of the incumbents are going in the wrong direction, while I am a little envious of some that I think got it 100% right.

In the old days, things were simple and cameras were really dumb. Today, cameras are often quite clever, but hordes of VMS salespeople are trying to make them dumb again, driving the whole industry backward to the detriment of the end users. Eventually, though, I think people will wake up and realize what's going on.

The truth is that you can run a VMS on a $100 hardware platform (excluding storage). Yet, if you are keeping up with the latest news, it seems that you need a $3000 monster PC with a high-end GPU to drive it. In the grand scheme of things (the cost of cameras, cabling and VMS licenses) the $2,900 difference is peanuts, but it bothers me nonetheless. It bothers me because it suggests a piss-poor use of the available resources.

[Photo: a Raspberry Pi – a $40 VMS-capable PC]

As I have stated quite a few times, the best way to detect motion is to use a PIR sensor, but if you insist on doing any sort of image analysis, the best place to do it is on the camera. The camera has access to the uncompressed frame in its most optimal format, and it brings its own CPU resources to the party. If you move the motion detection to the camera, your $100 platform will never have to decode a single video frame, and it can focus on what the VMS should be doing: reading, storing and relaying data.
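A minimal sketch of that division of labour (the names and interfaces are made up for illustration): the recorder appends the compressed stream to disk and indexes the motion events the camera reports, and never touches a decoder.

  import queue

  def record(packets: queue.Queue, events: queue.Queue, path: str) -> list:
      """Store the compressed stream as-is and index camera-side motion events."""
      motion_index = []
      with open(path, "ab") as storage:
          while True:
              chunk = packets.get()
              if chunk is None:                      # sentinel: camera disconnected
                  break
              storage.write(chunk)                   # no decoding, just append the bytes
              while not events.empty():
                  motion_index.append(events.get())  # timestamp reported by the camera
      return motion_index

  pkts, evts = queue.Queue(), queue.Queue()
  pkts.put(b"\x00\x00\x00\x01...")  # in a real system this would come from an RTSP session
  evts.put(1475000000.0)            # motion event timestamp, as reported by the camera
  pkts.put(None)
  print(record(pkts, evts, "cam-01.h264"))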

In contrast, you can let the camera throw away a whole bunch of information as it compresses the frame, then send the frame across the network (dropping a few packets for good measure) to a PC that is sweating bullets because it must now decompress each and every frame – MPEG-style formats are all-or-(almost)-nothing, so there is no "decode every 4th frame" option here. The decompressed frame now contains compression artifacts, which make accurate analysis more difficult. The trip across the network also means the frames may not arrive at a steady pace, which causes further problems for video analytics engines.
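To see why "decode every 4th frame" isn't an option, consider a simplified GOP of keyframes ('I') and delta frames ('P') – real streams with B-frames and reference reordering are even less forgiving:

  def frames_required(frame_types, n):
      # To show frame n you must decode everything back to the most recent keyframe,
      # because each P-frame is only a delta on top of the frames before it.
      last_keyframe = max(i for i in range(n + 1) if frame_types[i] == "I")
      return list(range(last_keyframe, n + 1))

  gop = (["I"] + ["P"] * 7) * 2
  print(frames_required(gop, 7))   # [0, 1, 2, 3, 4, 5, 6, 7] -- the whole GOP so far
  print(frames_required(gop, 8))   # [8] -- cheap only if you land exactly on a keyframe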

[Image: decoding artifacts caused by missing packets]
Look at all that motion! Let’s sound the alarm.

VMS vendors now say they have a "solution" to the PC getting crushed under the insane workload required to do any sort of meaningful video analysis. Move everything to a GPU, they say – and it's kinda true. If you bring up the Task Manager in Windows, your CPU utilization will now be lower, but crank up GPU-Z and you (should) see the GPU buckling under the load. One might ask whether it would not have been cheaper to get a $350 octa-core Ryzen CPU instead of a $500 GPU.

[Screenshot: GPU-Z sensor readout]

Some will say that if the integrator has to spend 2 days setting up the cameras' edge detection, it might be cheaper to just spring for the super PC and do everything on that. This assumes that the server-side setup can actually be done quicker than setting it up on the camera. I'd wager that a lot of motion detection setups are not really necessary, and in other cases the VMS motion detection is simply not as good as the edge-based detection, which in some tragic instances completely invalidates the system and renders it worthless, as people and objects magically teleport from one frame to the next.


When You Are “Hacked”

Sometime in 2014, I received a database dump from a high-profile industry site. I received the file from an anonymous file-sharing site via a Twitter user who quickly disappeared. The database contained usernames, email addresses, password hashes (SHA1), the salt used, the IP address used to access the site, and the approximate geographical location (IP geolocation lookup – nothing nefarious).

I had canceled my subscription in January 2014, and the breach happened later than that. I don't believe I received a notification of the breach. Many others did receive one, and I absolutely would remember if I had – in part because I discussed the breach with a former employee at the blog, and in part because I was in possession of said DB.

A user reached out to me, seemingly puzzled as to why I would be annoyed at not receiving a notification – seeing as I was no longer a member, why would I care that my credentials were leaked? No one would be able to log into the site using my account anyway.

Here's the issue I have with that. I happen to have different passwords for different things – but a lot of people do not. A lot of people use the same password for many different things. Case in point: say the dump contains a user with the email address someuser@gmail.com, and someone cracks the hash (the salt defeats precomputed rainbow tables, but a fast hash like SHA1 still falls quickly to a per-user dictionary attack). Do you think there's a likelihood that the same password would work if they try to log into the mail account at Gmail? Sure, it's bad to reuse passwords, but do people do it? You bet.
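For illustration, here is roughly what that attack looks like against a salted SHA1 dump. The hashing construction and all the values below are assumptions for the example – the point is simply that a fast hash plus a weak, reused password falls in milliseconds:

  import hashlib

  def sha1_salted(password: str, salt: str) -> str:
      # Assumed scheme: SHA1(salt + password); the site's actual construction isn't known.
      return hashlib.sha1((salt + password).encode()).hexdigest()

  # One leaked row (fabricated values): email, salt, password hash
  email, salt = "someuser@gmail.com", "a1b2c3"
  stored_hash = sha1_salted("hunter2", salt)

  # Per-user dictionary attack: try common passwords against the leaked hash.
  for candidate in ["123456", "password", "letmein", "hunter2"]:
      if sha1_salted(candidate, salt) == stored_hash:
          print(f"{email}: '{candidate}' -- now try the same password on Gmail")
          break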

So, when your site is breached, I think you have an obligation to inform everyone affected by the breach – regardless of whether they are current members or not. I would imagine anyone in the security industry would know this.

Mirai, mirai on the wall

Update: Mirai now has an evil sister 

Mirai seems to be the talk of the town, so I can’t resist, I have to chime in.

Let me start by stating that last week’s Dyn attack was predictable.

But I think it is a mistake to single out one manufacturer when others demonstrably have the same issues. Both Axis and NUUO recently had full-disclosure documents released that included the wonderful vulnerability "remote code execution" – which is exactly what you need to create the type of DDoS attack that hit Dyn last week.

There's not much we can do right now. The damage has been done, and the botnet army is out there, probably growing every day, and any script kiddie worth their salt will be able to direct the wrath of the botnet towards one or more internet servers that they dislike for one reason or another. When they do, it won't make any headlines; it will just be another site that goes offline for a few days.

The takeaway (if you are naive) is that you just need a decent password and HW from reputable sources. While I would agree with both in principle, the truth is that in many cases it is useless advice. For example, if you look at the Axis POC, you'll see that the attacker doesn't need to know the password at all. (I did some probing, and I am not sure about that now.)

The impact?

Impact
++++++
The impact of this vulnerability is that taking into account the busybox that runs behind (and with root privileges everywhere. in all the binaries and scripts) is possible to execute arbitrary commands, create backdoors, performing a reverse connection to the machine attacker, use this devices as botnets and DDoS amplification methods… the limit is the creativity of the attacker.

And look, the affected cameras/encoders are based on BusyBox – a popular platform, and therefore a juicy target (see BashLite and ShellShock).

In other words:

[Image: zombie]

I am not suggesting that Axis is better or worse than anyone else, the point is that even the best manufacturers trip up from time to time. It can’t be avoided, but thinking that it’s just a matter of picking a decent password is not helping things.

My recommendation is to not expose anything to the internet unless you have a very, very good reason to do so. If you have a reason, then you should wrap your entire network in a VPN, so that you are only able to connect to the cameras via the VPN, and not via the public Internet.

My expectation is that no-one is going to do that, but instead they will change the password from “admin” to something else, and then go back to sleep.

On Rationalization and UX

A while ago I read about a rationalization experiment conducted by Tim Wilson at the University of Virginia. Two groups were asked to pick a poster to take home as a gift. One group was asked to explain why they liked the poster before taking it home; the other could just pick one and go home.

After a while, they were asked for feedback. How did they like their posters now?

The group that was asked for an explanation hated theirs, while the others were still happy with their choice. This is a little surprising (it was to me, at least). Why would explaining your choice affect it so profoundly?

I think this behaviour is important to keep in mind when we talk to our clients.

A potential client may be reviewing 4 different bids, and while they are pretty similar, the client decides that he likes #2. At some point, the client may have to rationalize why he picked #2. If there are very obvious and compelling reasons to go with #2, that presents no problem. But what if all 4 are very similar, and the reason the client wants #2 is really that he, quite simply, just likes it better? He can't explain his choice by just jotting down "I like #2 better", so he will rationalize.

And so, unless there are very clear, objective advantages of #2 over #4, we are entering that dangerous territory. The Poster Test (and other variations of the same thing) shows that people simply make things up. It's not that people consciously think "oh, let me just make up a story" – they genuinely feel that they are making a rational argument.

In the realm of software development, requirement specifications quite often deal exclusively with what one might call itemizable features – "open a file", "delete a user", "send an email" – and very little effort is put into discussing the "feel" of the application. When bad managers sit in a management meeting, the focus is on "Product X does Y, we need to do Y too, and then Z to be better", and the assumption is that the user doesn't care one bit about how the app feels. Almost no feedback exists complaining about how an application "feels" (it does exist, there is just not a lot of it), and it is very rare for managers to take that sort of thing to heart.

At Prescienta.com, the "feel" is important. I believe that if the product feels robust and thoughtful, the customer will happily make up a story about why some small unique feature in our product is crucial. "Feel" is about timing, animations, and non-blocking activities; it's about consistency and meaningful boundaries. Some companies seem to understand this and consistently move towards improving the user experience; others give up, forget, or lose the people who have the ability to create pleasing user experiences, and instead let their product degrade into a jumble of disjointed UX ideas, inconsistent graphical language and bad modal activities (hmm… a bit of introspection here?)

A well-designed application takes the technical limitations into account – a good example is Dropcam, where the audio delay is so large that the UI is designed around it. The push-to-talk metaphor is a perfect way to handle the latency. If the UI were designed to work like a phone (which has almost no latency), there would be a massive gap between what the UI suggests and what the technology can deliver.

Another example, a bit closer to home, is the way you move cameras around in privilege groups. The goal here was to design the UI to ensure that the user did not get the idea that they could add the same camera to two different groups. If the UI suggests that you can do so, you may be puzzled as to why you can't. I decided to prevent multi-group membership to avoid contradictory privileges (e.g. in one group you are allowed to use PTZ, in another it is forbidden – so which is correct?).
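A sketch of that design choice (hypothetical names, not the actual implementation): if membership is modelled as "camera → group" rather than "group → set of cameras", a camera simply cannot end up in two groups, and the contradictory-privilege question never arises.

  privileges = {
      "operators": {"ptz": True},
      "viewers":   {"ptz": False},
  }

  membership = {}  # camera_id -> group name; one entry per camera, by construction

  def assign(camera_id: str, group: str) -> None:
      if group not in privileges:
          raise ValueError(f"unknown group: {group}")
      membership[camera_id] = group            # implicitly removes any previous membership

  def can_use_ptz(camera_id: str) -> bool:
      group = membership.get(camera_id)
      return group is not None and privileges[group]["ptz"]

  assign("cam-01", "operators")
  assign("cam-01", "viewers")                  # moves the camera; it can't be in both groups
  print(can_use_ptz("cam-01"))                 # False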