Last week I trolled Carlton Purvis via Twitter. I hereby apologize. Carlton Purvis suggested discussing whether higher educational requirements for police officers would lower the number of incidents of excessive force and corruption. I honestly felt that Carlton was linking bad behaviour (in the police force) with lack of education – an idea that I find pretty offensive. Riled up, I decided to troll Carlton Purvis by asking if he was suggesting that people with lower educational accomplishments had a propensity toward this sort of behaviour. That was a stretch. A discussion is – well – a discussion.
I got riled up because attributing bad behavior to religion, race, gender, age or lack of education is something I consider dangerous and ignorant. The Milgram experiment shows how far people are willing to go when prodded by an authority. Philip Zimbardo was behind the Stanford prison experiment and has done a lot of research on this topic; he also defended Ivan “Chip” Frederick after the Abu Ghraib atrocities. We may snuggle up, turn on American Idol, take a sip of our coffee, and think that we could never do the things that those guards did, and that these people are somehow very different from us. The truth is that circumstance is a motherf….r: put in the wrong place at the wrong time, most of us will do some f….d up things. My stance is that “evil” (for lack of a better word) is something we are all capable of. If that is the case, we should instead be looking at the systems that seem to breed and nourish perverted behavior.
If we, as individuals, stand up to immoral behavior, other people will follow. In the Milgram experiment, participants who were exposed to other participants protesting the experiment were much more likely to also refuse to carry out the unsavory actions. If people who speak up against bad behavior are persecuted by their peers, or by society, then the cancer will spread and become more difficult to remove later on.
71% of Canadians believe grainy security video can be enhanced in a lab
A while back I was approached by a company that had some surveillance footage that they hoped I could improve. This was a fraud case, and while there were no issues identifying the suspect (access control logs also helped), it was impossible to determine what he was carrying as he left the building. The footage was low resolution, black and white, heavily compressed and shot outdoors at night. When “enhancing” footage, you have to be careful that you are not “manipulating” the footage, prodding the result in a certain desired direction. In the end, all I could do was to remove a bit of noise, normalize the contrast and brightness, and that was it.
Enhancing is possible, but probably not to the extent that people might believe. One of the more impressive feats is removal of blur – caused by either lack of focus or motion blur. To the layman there’s not much difference between the kind of blur you see when you upscale an image and out-of-focus blur; however, one type of blur can be enhanced, the other can’t (or can it?).
The same applies if you have an object in the distance covering, say 32 x 64 pixels. If you crop this area and upscale it 10x (using bilinear filtering), you’ll get a blurry 320 x 640. Even if the full source image was 29MP, there’s no way to magically turn the 32 x 64 crop into a 320 x 640 razor sharp image (like you often see in the movies). What counts is the number of real, captured pixels.
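As a sketch of why upscaling cannot add detail, here is a pure-Python bilinear upscaler (the pixel values are made up): every output pixel is just a weighted average of source pixels that were already there, so no new information can appear.

```python
def bilinear_upscale(img, factor):
    """Upscale a 2-D list of pixel values; every new pixel is a
    weighted average of the nearest source pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h * factor):
        sy = y / factor                 # map back into source coordinates
        y0 = min(int(sy), h - 1)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * factor):
            sx = x / factor
            x0 = min(int(sx), w - 1)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out

tiny = [[10, 200], [50, 120]]      # a 2 x 2 "crop"
big = bilinear_upscale(tiny, 4)    # an 8 x 8 result
# No value in `big` can fall outside the range of the source pixels:
flat = [p for row in big for p in row]
print(min(flat) >= 10 and max(flat) <= 200)  # True
```

The output is bigger and smoother, but razor-sharp detail can’t materialize, because every new value is bounded by the pixels we actually captured.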
The example of single-image super resolution works with images where the source image already has a substantial amount of detail for the algorithm to work with. In almost all examples you can actually identify a person, or read the letters, in the source image already. Further enhancement is hardly necessary for ID purposes.
Vivotek has a technology they call “SmartStream”; a video below shows the principle.
Vivotek also has a clip that shows a SmartStream in action, with a little bandwidth meter running (seek to 1:20)
Notice how the bandwidth drops. What’s less obvious (when watching via YouTube) is that the video outside of the ROI is compressed more, which means worse quality – but we don’t care much about the little carousel, so that’s all fine.
I love the concept, and some time ago, I did a very, very simple, unscientific test of adaptive compression on JPEG images. The idea is the same – you are interested in some areas of the frame, while others are not that interesting. So why use the same compression across the entire frame – why not use more bits on areas that are interesting, and less on areas that are not. In an apples-to-apples comparison, adaptive compression simply works. The frame, all else being equal, takes up less space on your drive, or – alternatively, you can have better quality for the relevant parts of the frame (around people for example), while having worse quality in the areas that are not interesting (the floor, walls etc).
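This isn’t Vivotek’s implementation – just a toy illustration of the adaptive idea, using coarse quantization outside a (hypothetical) ROI and zlib as a stand-in for a real video encoder:

```python
import random
import zlib

random.seed(1)
W, H = 64, 64
# A synthetic frame with detail everywhere (random pixel values)
frame = [[random.randrange(256) for _ in range(W)] for _ in range(H)]

def encode(frame, roi, coarse_step):
    """Keep full precision inside the ROI, quantize the rest coarsely,
    then hand the result to zlib. roi = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = roi
    out = bytearray()
    for y, row in enumerate(frame):
        for x, p in enumerate(row):
            if x0 <= x < x1 and y0 <= y < y1:
                out.append(p)                               # interesting: keep detail
            else:
                out.append(p // coarse_step * coarse_step)  # boring: fewer levels
    return zlib.compress(bytes(out), 9)

uniform = encode(frame, (0, 0, W, H), 1)        # same quality everywhere
adaptive = encode(frame, (16, 16, 48, 48), 64)  # full quality only in the ROI
print(len(adaptive) < len(uniform))             # True - the adaptive frame is smaller
```

Fewer levels outside the ROI means less entropy, which means fewer bits – the same trade a real encoder makes when it raises the quantization outside the ROI.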
That’s the theory of course. It’s a pretty good theory. It’s like saying that 100 boxes filled with bricks weigh more than 100 empty boxes. But just to make sure, someone should test the assertion, and check to see if the theory is valid in practice. Now, if someone charged me an annual fee to test if empty boxes were truly lighter than boxes filled with bricks, I’d probably pass on that offer. But perhaps it’s a little more complicated than just validating the theory (it’s valid – trust me).
What if I had 100 boxes of bricks and then hired someone to remove the bricks from boxes 20 through 40 (as I don’t care for purple bricks)? How would I know that my new hire had actually removed them? What if my employee simply disregarded my instructions and tricked me into loading filled boxes onto my truck, when 20% of them should be empty? I might be willing to pay someone to verify that my “brick removal staff” were, in fact, removing bricks.
It is – however – relevant to understand that this technique doesn’t always give you dramatic savings. To get the most from adaptive compression, you should consider your scene content.
Let’s consider a completely static scene. Nothing is moving, the light is fixed (indoor), and we are just looking down a hallway.
In this case, adaptive compression doesn’t give us much benefit. To understand why, you need to understand a little bit about H.264 encoding. I am guessing you already know this – although a high-profile blogger seems to equate the DCT quantization parameter with compression – so, just to be on the safe side, let’s go through it, simplified. Basically it’s all about throwing away stuff that “looks similar”.
In this case, the following steps are of interest:
1. cut the frame into a bunch of small squares
2. look for similarities between each square and the surroundings in previous frames
3. if similarity is found, congratulations, you have a motion vector
4. otherwise just subtract the pixel values in the square from the pixel values in the previous frame
Now if the scene is static, we don’t get to use step 3. So that leaves us with just step 4. What happens when you subtract two virtually identical matrices? You guessed it: you get a lot of zeroes. If you’ve got great lighting, a good sensor, great optics and so on, you get mostly zeroes. If you have a matrix of mostly zeroes (perhaps a few ones and twos here and there), and you then compress it, the encoder will simply output “use same as before”, which is basically the smallest possible unit. Apart from the I-frame, the P-frames will be mostly empty boxes, and this will happen with or without adaptive compression.
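The subtraction step can be sketched in a few lines of Python (synthetic frames, with a made-up sensor-noise model):

```python
import random
import zlib

random.seed(0)
# Two consecutive frames of a static scene: identical except for sensor noise
prev = [random.randrange(256) for _ in range(64 * 64)]
curr = [min(255, p + random.choice([0, 0, 0, 0, 1])) for p in prev]

# Step 4: subtract the current frame from the previous one (modulo 256)
diff = bytes((c - p) % 256 for c, p in zip(curr, prev))

zeroes = diff.count(0)
print(zeroes / len(diff) > 0.7)  # True - the difference is mostly zeroes
# A buffer of mostly zeroes compresses far better than the raw frame:
print(len(zlib.compress(diff)) < len(zlib.compress(bytes(prev))))  # True
```

The raw frame is essentially incompressible noise, while the frame-to-frame difference collapses to almost nothing – which is exactly why static scenes produce tiny P-frames.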
So what if you have a playground and the sky above? You would intuitively draw an ROI around the playground, and expect great results. But again, in both cases you get “empty boxes” for the sky, since there is no difference between consecutive frames (ROI or no ROI), so you are not getting much benefit from the technique in that case either.
What about the I-frame?
Again, it depends on the content of the scene. I-frames are similar to JPEG (but offer slightly better compression) – so you might know that if there are large areas of the same color, those areas compress quite efficiently, in contrast to areas with a lot of intricate detail, which do not compress as well. So if the area outside the ROI is mostly the same color, you won’t see much benefit there either.
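A quick stand-in demonstration (zlib instead of a real I-frame encoder, synthetic data): a flat “sky” compresses to almost nothing, while a busy texture barely compresses at all.

```python
import random
import zlib

random.seed(0)
n = 64 * 64
flat_sky = bytes([180]) * n                              # large area of one colour
detail = bytes(random.randrange(256) for _ in range(n))  # intricate texture

print(len(zlib.compress(flat_sky)))  # a few dozen bytes at most
print(len(zlib.compress(detail)))    # about the raw size (4096), or slightly more
```

So if what’s outside your ROI is already a flat wall or an overcast sky, the encoder was going to compress it away regardless, and the ROI buys you very little.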
What’s the point of this, then?
Where adaptive compression shines is when your ROI excludes areas that have a lot of motion. If you look at the video at the top of this post, you’ll see that the excluded area a) has motion in it, and b) has a lot of detail.
So what about Vivotek’s claims about savings?
Just as I simplified a bunch of things in this post, to keep it comprehensible and to the point, an ad will make a generalized claim. Sometimes there’s a little asterisk that says “individual results may vary”, and in this case, perhaps Vivotek should have had a disclaimer. Vivotek demonstrates the bandwidth saving in the video for everyone to see. Vivotek’s simplification of the message is no worse than a blog claiming that H.264 compression levels are simply a question of picking a number between 0 and 51 (I think the latter is a lot worse).
Vivotek is pushing the industry in the right direction, and innovative features like SmartStream make the market so much more interesting than merely bumping the resolution or frame rate.
One of my most read blog posts is the one where I pose the question Is ONVIF a complete failure? The protocol itself is – in principle – fine (if everyone implemented it to perfection). Back when ONVIF started, SOAP was da shiznit, so obviously ONVIF is based on SOAP. Today SOAP is considered pretty n00b, and AJAX/JSON is considered the way to go. I wish ONVIF had started out simple – a few well-defined HTTP URIs with a clearly defined response format would have been great. But it didn’t pan out like that.
I suppose that most professional integrators already know that picking random cameras and hoping that their NVR will support them (well) via ONVIF is a little haphazard. Granted, it shouldn’t be that way. If I get a device that says it has HDMI support, my expectation is that it will work with my television, which also has an HDMI connector. And I wish the same applied to IP cameras.
So I hope that integrators pick cameras that they know work (well) with the NVR/VMS software that they pick. I think a lot of integrators go for a combination of camera and VMS where the VMS natively supports the camera’s proprietary protocol (like VAPIX for Axis or SUNAPI for Samsung-Techwin). Under the bonnet, the VMS may use parts of ONVIF to work the camera, but the specific camera type usually shows up in some dropdown in the VMS administration module.
So who uses ONVIF?
I must admit I don’t know. I know that occasionally someone inquires about it, but I never really dug into the details. Don’t get me wrong – the idea that you could buy an ONVIF camera from anyone, and not have to rely on the VMS manufacturer to provide support for the camera, is what we strive for – and some glorious day, we’ll look back at the bad old days when it was plug and pray.
But there’s another option that most people seem to be using: generic RTSP support.
A lot of installations have no need for PTZ at all. They have a bunch of fixed cameras, and if those cameras stream via RTSP, most VMSs will support them. Granted, obtaining the RTSP URI for your camera is not exactly user-friendly, but for the seasoned/motivated amateur, RTSP is actually a pretty decent way of getting your feed recorded in your favorite VMS.
It does cut out a lot of the simplicity of a VMS’s proprietary support – setting the resolution, the compression etc. will have to be done via the camera’s web interface, which differs wildly across manufacturers – but it might be a price one is willing to pay. Especially in the low-end segment, where a lot of DIY types are messing around with cameras that are cheap and offer surprisingly good image quality, pairing them with their favorite VMS.
For the foreseeable future, I think most/all high-end installs will rely on the proprietary protocols, but I hope and pray that within a few years we’ll be able to pick pretty much any camera, and have it work, out of the box, with almost any popular VMS.
I used to read a lot of car magazines, and I watch a lot of Top Gear on Netflix. In Top Gear they have a segment called “The News”. In this segment, the hosts look at press releases and snapshots of upcoming cars. They may say that the new Citroen “looks great, and comes with a 256 HP engine”. It is understood that “looks great” is the subjective opinion of the host, while 256 HP is (presumably) a fact. The viewer may agree or disagree with the statement that the car looks good, but it’s hard to disagree with the 256 HP claim. This part of the show is not called “reviews”. They don’t start the segment by saying “we are reviewing the new Citroen DS” and then talk about a press release they’ve read, or pictures they’ve seen of the car.
“Reviews” are when the hosts actually drive the cars. As they do so, they are again mixing opinion and fact. For example, “the dash is cheap hard plastic”. Again, it is understood that “cheap” is a subjective statement and “hard plastic” is factual. Since the show is on television, the viewer has a chance to judge for themselves whether they agree or not. They are reassured that Jeremy Clarkson actually drove the car, felt the plastic, and that he has driven thousands of other cars before, so he has a great set of references to compare the car under test against.
Here’s a guy actually reviewing a car
It would be considered an extreme lapse of journalistic integrity if Top Gear hired an ex-engineer from Ford to come on the show, didn’t disclose that this was an ex-Ford employee, and then had the engineer review not only a Ford Focus, but also a Mazda 3 and a Suzuki Swift. It’s not hard to imagine that the ex-Ford engineer could be quite biased when reviewing the Ford. If the engineer was fired from his job he might give the Ford a bad review, or perhaps he wants to lend a hand to his old mates at Ford. Either way, such behaviour would probably never be accepted – by the show’s editors or the viewers.
Even though Top Gear doesn’t purchase the cars they test, I still have no reason to believe that they are dishonest in their reviews. They have posted plenty of bad reviews on the show – the Tesla Roadster and the Zenvo come to mind, primarily because both companies were very vocal about the bad reviews. Tesla even took Top Gear to court, shouting about libel and malicious falsehood (they lost – twice). Naturally, Tesla is not going to let Top Gear test any more cars, but neither party seems to have suffered long-term over the feud. And even though the Corvettes and Vipers are usually ridiculed on the show, I would still give both cars a test drive if I had the money. Me standing around, claiming that a Corvette is a bad car based on a review on Top Gear, would be ridiculous.
I doubt that Top Gear would have many viewers if all they did was offer comments on press releases as “reviews”, and have ex-car-manufacturer employees test drive not only their own creations, but also the competition’s. If someone said that they based their decision to buy a car on an ex-Ford employee’s comments on a Ford, or Jeremy Clarkson’s comments on a press release, I would call that person naive. If the person paid for that sort of “advice” I would use a much stronger term.
The Heartbleed vulnerability is just mind-boggling in scope. The culprit(*) is hiding in a library called “openssl”. It’s a handy little library that makes it a lot easier for a programmer to do SSL, and everybody is using it. The reason is that SSL is very complex; writing an SSL module from scratch would take ages, and be extremely error prone. So instead we piggyback off what others have done, and save a lot of time and effort.
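The actual bug was a missing bounds check in OpenSSL’s C heartbeat handler; here is a toy Python re-creation of the same class of bug, with made-up “memory” contents standing in for whatever happened to sit next to the request buffer:

```python
# Simulated process memory: the request buffer sits right next to secrets
memory = bytearray(b"PING" + b"secret_session_key_12345" + b"\x00" * 16)

def heartbeat_broken(payload: bytes, claimed_len: int) -> bytes:
    memory[0:len(payload)] = payload
    # BUG: echoes back claimed_len bytes without checking len(payload)
    return bytes(memory[0:claimed_len])

def heartbeat_fixed(payload: bytes, claimed_len: int) -> bytes:
    if claimed_len > len(payload):  # the missing bounds check
        return b""                  # silently discard the malformed request
    memory[0:len(payload)] = payload
    return bytes(memory[0:claimed_len])

leak = heartbeat_broken(b"hat", 20)  # ask for far more than we sent
print(b"secret" in leak)             # True - adjacent memory leaks out
print(heartbeat_fixed(b"hat", 20))   # b'' - the request is rejected
```

The attacker never “breaks in” – the server politely hands over whatever happened to be in memory next to the heartbeat payload, which can include session keys and passwords.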
In the development community, there’s a general consensus that “rolling your own XYZ” when there’s already a well-tested XYZ available, is a pretty bad idea. But there are still some developers who want to write their own stuff, and scoff at anything not written by themselves. Almost all developers have, at one time or another, been seduced by a promising library, only to discover that said library had a fatal flaw that they just couldn’t fix. So it’s always a balancing act – I feel a little uneasy every single time someone adds a new 3rd party library to our app. Mainly because I have no control over the quality of that module – perhaps it leaks memory, perhaps it crashes, and if it does, it’s my responsibility to come up with a solution(**). The only problem is that I may not have access to the source code of the module, and thus I simply have to relay the bug to the 3rd party, who may or may not appreciate the gravity of the situation. So, depending on the complexity, I’m one of those who roll their own more frequently than I probably should.
The problem with Heartbleed goes beyond just your server. A great number of client apps use openssl, and thus they may be susceptible to a Heartbleed attack. This makes Heartbleed one of the potentially most damaging bugs I’ve ever seen.
I am wondering if this is somewhat similar to inbreeding. A lot of apps have the same little strand of DNA in them, and while that little piece gives us a lot of advantages, it comes with a deadly flaw. Since an entire ecosystem is carrying the exact same piece of DNA around, they are all vulnerable. Had we had more diversity, perhaps the issue would not have been so serious. Most people are going to be fine – but there is a group of users who visit awful websites who might be targeted by hackers. I think there will be easier ways of getting into regular users’ machines than through Heartbleed – some people will actively download and execute a file if the filename and promise are tempting enough.
For example : Here’s a little executable I made, that will show you all of a certain blogs users and passwords obtained via the heartbleed vulnerability: Heartbleed.exe
** Intel had a very fast JPEG decoder (IJL). Unfortunately it had a very small memory leak (8-16 bytes per frame) – this made XProtect Lite (Milestone’s first NVR) crash after a few days. John Blem (the CTO and co-founder of Milestone) went home, hacked the library and came back with a solution in a few days. I will never fully understand how he did it.
For security applications, I don’t rely on a camera’s low light performance to provide great footage. I recently installed a camera in my basement, and naturally I ran a few tests to ensure that I would get good footage if anyone were to break in. After some experimentation, I would get either:
Black frames, no motion detected
Grainy, noisy footage triggering motion recording all night
Ghosts moving around
The grainy images were caused by the AGC. It simply multiplies the pixel values by some factor, which amplifies everything – including noise. Such imagery does not compress well, so the frames take up a lot of space in the DB, and may falsely trigger motion detection. Cranking up the compression to solve the size problem is a bad solution, as it will simply make the video completely useless. I suppose that some motion detection algorithms can take the noise into consideration and provide more robust detection, but you may not have that option on your equipment.
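A tiny sketch (made-up pixel values and gain factor) of why AGC makes dark footage grainy: the gain multiplies the noise by exactly the same factor as the signal.

```python
import random

random.seed(0)
signal = 20   # true value of a dark pixel
gain = 8      # hypothetical AGC multiplier

# Ten noisy readings of the same dark pixel
raw = [signal + random.randint(-3, 3) for _ in range(10)]
amplified = [min(255, p * gain) for p in raw]

def spread(xs):
    return max(xs) - min(xs)

print(spread(raw))         # noise of just a few grey levels
print(spread(amplified))   # the same noise, multiplied by the gain
print(spread(amplified) == spread(raw) * gain)  # True - nothing hit the 255 ceiling
```

The image gets brighter, but the pixel-to-pixel flicker grows by the same factor – and that flicker is what bloats the encoded stream and sets off naive motion detection.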
The black frames were due to the shutter time being too fast and AGC being turned off. On the other hand, if I lower the shutter speed, I get blurry people (ghosts) unless the burglar is standing still, facing the camera, for some duration (which I don’t want to rely upon).
A word of caution
Often you’ll see demonstrations of AGC where you have a fairly dark image on one side, and a much better image on the other, usually accompanied by some verbiage about how Automatic Gain Control works its “magic”. Notice that in many cases the subject in the scene is stationary, and you don’t know the settings for the reference frame. It might be that the shutter speed is set to 1/5th of a second – and a 1/5th of a second shutter speed is way too slow for most security applications, leading to motion blur.
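A quick back-of-envelope calculation (all the numbers are assumed, illustrative values) shows why a 1/5th of a second shutter is hopeless for moving subjects:

```python
walking_speed = 1.5      # metres per second - casual walking pace
shutter = 1 / 5          # seconds - the slow "demo" shutter speed
pixels_per_metre = 200   # depends on lens, distance and resolution

travel = walking_speed * shutter   # metres moved while the shutter is open
blur_px = travel * pixels_per_metre

print(round(travel, 2))  # 0.3 m of movement during a single exposure
print(round(blur_px))    # 60 pixels of smear - a ghost, not a face
```

Sixty pixels of smear is more than enough to turn a face into a blob, which is exactly the “ghosts” from the basement test above.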
During installation of the cameras, there are two common pitfalls:
1. Leave shutter and gain settings to “Auto”
2. Manually set the shutter speed with someone standing (still) in front of the camera
#1 will almost certainly cause problems in low light conditions, as the camera turns the shutter waaay down, leading to adverse ghosting. #2 is better, but your pal should be moving around, and you should look at single frames from the video when determining the quality. This applies to live demos too: always make sure that the camera is capturing a scene with some motion, and request an exported still frame, to make sure you can make out the features properly.
A low tech solution
What I prefer is for visible light to turn on when something is amiss. This alerts my neighbours, and hopefully it causes the burglar to abort his mission. I also have quite a few PIR (Passive InfraRed) sensors in my house. They detect motion like a champ, in total darkness (they are – in a sense – a 1-pixel FLIR camera), and they don’t even trigger when our cat is on the prowl.
So, if the PIR sensors trigger, I turn on the light. Hopefully, that scares away the burglar. And since the light is on, I get great shots, without worrying about AGC or buying an expensive camera.
The cheapest DIY PIR sensors are around $10; you’ll then need some additional gear to wire it all together, but if you are nerdy enough to read this blog, I am pretty sure you already have some wiring in the house to control the lighting too.
So – it’s useless – right?
Well, it depends on the application. I am trying to protect my belongings and my house from intruders; that’s the primary goal. I’d prefer it if the cameras never, ever recorded a single frame. But there are many other types of applications where low light cameras come in handy. If you can’t use visible light, and the camera is good enough to save you from deploying an infrared source, then that might be your ROI. All else being equal, I’d certainly choose a camera with great low light capabilities over one that is worse, but rarely are things equal. The better camera is more expensive, perhaps it has lower resolution and so on.
Finally a word on 3D noise reduction
Traditionally, noise reduction was done by looking at adjacent pixels. Photoshop has a filter called “despeckle” which will remove some of the noise in a frame. It does so by looking at other pixels in the frame, which gives us 2 dimensions (vertical and horizontal). By looking at frames in the past, we can add a 3rd dimension – time – to the mix (hence 3D). If the camera is stationary, the algorithm tries to determine if a change of pixel value between frames is caused by true change, or by noise. Depending on a threshold, the change is either discarded as noise, or kept as a true change. You might also hear terms such as spatiotemporal, meaning “space and time” – basically another way of expressing the same thing as 3D noise reduction.
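A minimal per-pixel sketch of the temporal part of the idea (the threshold and pixel values are made up, and real implementations are far more sophisticated):

```python
import random

random.seed(0)
THRESHOLD = 5  # hypothetical tuning value

def denoise_temporal(prev_out, curr_in):
    """3D (temporal) noise reduction sketch: small changes versus the
    previous output are treated as noise and discarded; large changes
    are kept as true scene change."""
    return [c if abs(c - p) > THRESHOLD else p
            for p, c in zip(prev_out, curr_in)]

scene = [100] * 8                                   # a static scene
noisy = [p + random.randint(-3, 3) for p in scene]  # sensor noise only
moved = noisy[:]
moved[3] = 220                                      # one real change

print(denoise_temporal(scene, noisy) == scene)  # True - the noise is suppressed
print(denoise_temporal(scene, moved)[3])        # 220 - the real change survives
```

Pick the threshold too low and the noise flickers through; too high and slow, genuine changes get eaten – which is why this is a tuning knob and not magic.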
I don’t know why we are throwing away one of the dimensions when we talk about imagers. I was quite happy just getting the pixel count. 10MP resolution – great – that’s twice as many pixels as a 5MP camera. Easy. I never, ever felt that talking about camera resolutions in abbreviations instead of the number of pixels was an improvement. Is 4CIF better than CIF, or better than SIF? To figure this out, I (because I am dumb) will need to look up what the hell SIF actually means. Don’t get me wrong – I know that when we talk about 1080p it is implied that we are talking about a 1920 x 1080 image. So the “1080” part comes from the vertical resolution. Now we have 4K, but the 4K describes the horizontal resolution (3840 × 2160). We could just call it 2MP and 8MP and folks would (hopefully) grasp that there are 4 times as many pixels in 4K as in 1080p, or – if people just LOVE the weird CIF/SIF names – we can also call it UHD, and then sit and wait for 8K/FUHD. This sort of inconsistent naming and meaningless abbreviations drive me (even more) insane.
I am sure there’s some rationale that I just don’t get.
A few days ago, news spread that a type of Hikvision DVR had been hacked and essentially turned into zombies – doing the bidding of their new masters. This sort of thing is not new in the hacker community. A while ago, I asked if video surveillance systems were next, pointing to the Shodan search engine, which was – at the time – focusing on SCADA systems and the like, and recently the black-hat video of some IP cameras being hacked showed up in my twitter feed.
How does it work?
Most modern devices that have internet capabilities are based on some existing OS or kernel. It could be embedded Linux, Windows CE, Windows XP Embedded, QNX or a bunch of other alternatives, although the kernel is modified quite heavily to suit the device’s capabilities. Once the kernel runs on the device, you can write applications for it, or use existing apps that will run on the kernel you’ve chosen. A common app is called “curl”, which grabs “stuff” off the internet using HTTP(S); you could also add an RTSP server to your device – if it is based on Linux you might use gStreamer (I believe Axis uses gStreamer today), and so on. Basically, the device is not that different from your PC – it has an OS/kernel and then some “apps” that run in the background.
The first thing a hacker will try to do is get “root access”. The term “root” means the super, master administrator who can do everything on a *nix box. To get root access, the hacker might try to extract the password by feeding the HTTP server some carefully crafted URLs, or they might examine the firmware (like in the video above), but sometimes things are a lot easier: sometimes people put things on the internet and leave the DEFAULT root password in place. For the Hikvision DVRs the password was 123456, and naturally this is public knowledge. Axis, very cleverly I might add, no longer has a default password! But MOST systems do. You can imagine how dangerous this is – yet every day people put things on the Internet with the default password enabled, or using simple passwords such as 123456 or PASSWORD.
So the hacker writes a small app that will run on the kernel that the Hikvision DVR comes with. He/she then goes to the shodanhq.com server and searches for Hikvision servers (or runs a scan via some other method). Once the list of servers is retrieved, you go through them one by one, trying to log in with 123456. If you get in, great; if not, go to the next server.
Now, I don’t know the hack in question in detail, so I don’t know the steps taken to upload AND schedule running the custom code, even with root access. But somehow the hacker got the binary uploaded and the kernel started executing the code. The little app mines bitcoins (which might explain why some dirty nerds have millions of dollars tied up in MtGox accounts), and it also tries to hack into Synology devices.
So is Hikvision or Synology to blame?
NO, I don’t think it’s fair to blame them for the poor choices their customers make. One of our customers used to reject the idea of having an RJ45 connector exposed to the outside world, let alone exposing a video surveillance system to the Internet. The Internet is a slum, and when you put “stuff” on the Internet, bad people will come by and try to break in. Not changing the default password is a terrible thing to do – if you lived in a slum, would you leave your doors unlocked, or perhaps install this sort of thing
Am I a Target?
Do you have a server open to the internet? If so, yes, you too are a target – and people will try to break in. It’s a constant battle to keep people out.
What can I do?
Assuming you HAVE to expose a device to the internet you can observe the following:
Use good passwords
Use non-standard ports
Try to keep up to date on firmware and security patches
Check on your systems every so often
None of these steps will make your system totally safe, but they are like locking your doors and windows properly before leaving your house. With enough commitment, someone will most likely still find a way in.