HomeKit Flaw

https://9to5mac.com/2017/12/07/homekit-vulnerability/

Does shipping a vulnerability like this mean you shouldn’t trust HomeKit or smart home products going forward? The reality is that bugs in software happen. They always have, and pending some breakthrough in software development methods, they likely always will. The same is true for physical hardware, which can be flawed and need a recall. The difference is that software can be fixed over-the-air without a full recall.*

*Unless it’s a Chinese IP camera, then all “mistakes” are deliberate backdoors put in place by the government.


Facts and Folklore in the IP Video Industry

A while ago, I argued that just because JPEGs take up more storage space, it does not mean that JPEG offers superior quality (and certainly not if you compare H.264 to MJPEG at the same bitrate).

I now find that some people assume that high GPU utilization automatically means better video performance, and that all you have to do is fire up GPU-Z to see whether the decoder is using the GPU.

There are some who will capitalize on the collective ignorance of the layman and the ignorant “professional”. I suppose there’s always a buck to be made doing that. And a large number of people who ought to know better are not going to help educate the masses, as it would effectively remove any (wrong) perception of the superiority of their own offering.

Before we start with the wonkishness, let’s consider the following question: What are we trying to achieve? The way I see it, any user of a video surveillance system simply wants to be able to see their cameras, with the best possible utilization of the resources available. They are not really concerned if a system can hypothetically show 16 simultaneous 4K streams because a) they don’t have 4K cameras and b) they don’t have a screen big enough to show 16 x 4K feeds.

So, as an example, let’s assume that 16 cameras are shown on a 1080p screen. Each viewport (or pane) is going to use (1920/4) * (1080/4) pixels at most; that’s around 130,000 pixels per camera.

A 1080p camera delivers roughly 2,000,000 pixels per frame, so 15 out of every 16 pixels are never actually shown. They are captured, compressed, sent across the network, decompressed, and then we throw away roughly 94% of the pixels.
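Spelled out:

1920 × 1080         = 2,073,600 pixels delivered per frame
(1920/4) × (1080/4) = 480 × 270 = 129,600 pixels shown per pane
2,073,600 / 129,600 = 16, so only 1 out of every 16 delivered pixels ever reaches the screen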

Does that make sense to you?

A better choice is to configure multiple profiles for the cameras and serve the profile that best matches the client. So, if you have a 1080p camera, you might have 3 profiles: 1080p@15fps, 720p@8fps and CIF@4fps. If you’re showing the camera in a tiny 480 by 270 pane, why would you send the 1080p stream, putting undue stress on the network as well as on the client CPU/GPU? Would it not be better to pick the CIF stream, and switch to the other streams if the user picks a different layout?
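As a rough sketch of the idea (hypothetical code, not from any particular VMS; the profile list mirrors the example above, and a real system would also weigh frame rate, bandwidth and switching hysteresis):

// hypothetical profile list; resolutions are assumptions for illustration
const profiles = [
  { name: 'CIF@4fps',    pixels: 352 * 288 },
  { name: '720p@8fps',   pixels: 1280 * 720 },
  { name: '1080p@15fps', pixels: 1920 * 1080 }
];

// pick the profile whose pixel count is closest to the pane's pixel count
function pickProfile(paneWidth, paneHeight) {
  const panePixels = paneWidth * paneHeight;
  return profiles.reduce((best, p) =>
    Math.abs(p.pixels - panePixels) < Math.abs(best.pixels - panePixels) ? p : best);
}

console.log(pickProfile(480, 270).name);   // CIF@4fps    (16-pane layout)
console.log(pickProfile(1920, 1080).name); // 1080p@15fps (full screen)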

In other words: a well-designed system will rarely need to decode more than the number of pixels available on the screen. Sure, there are exceptions, but 90% of all installations would never even need to discuss GPU utilization, as a bog-standard PC (or tablet) is more than capable of handling the load. We’re past the point where a cheap PC is the bottleneck. More often than not, it is the operator who is being overwhelmed with information.

Furthermore, heavily optimized applications often have odd quirks. I ran a small test pitting Quicksync against Cuvid; the standard Quicksync implementation simply refused to decode the feed, while Cuvid worked just fine. Then there’s the challenge of simply enabling Quicksync on a system with a discrete GPU and dealing with odd scalability issues.

GPU usage metrics

As a small test, I wrote the WPF equivalent of “hello, world”. There’s no video decoding going on, but since WPF uses the GPU to do compositing on the screen, you’d expect the GPU utilization to be visible in GPU-Z, and as you can see below, that is also the case:

The GPU load:

  • no app (baseline) : 3-7%
  • Letting it sit: 7-16%
  • Resizing the app: 20%

This app, which performs no video decoding whatsoever, uses the GPU to draw a white background, some text, and a green box on the screen, so just running a baseline app will show a bit of GPU usage. Does that mean that the app has better video decoding performance than, say, VLC?

If I wrote a terrible H.264 decoder in BASIC and embedded it in the WPF application, an ignorant observer might deduce that my junk WPF app was faster than VLC, simply because it showed higher GPU utilization than VLC did.

As a curious side-note, VLC did not show any “Video Engine Load” in GPU-Z, so I don’t think VLC uses Cuvid at all. To provide an example of Cuvid/OpenGL, I wrote a small test app that does use Cuvid. The Video Engine Load is at 3-4% for this 4CIF@30fps stream:

[Screenshot: GPU-Z showing the Video Engine Load of the Cuvid test app]

It reminds me of arguments I had 8 years ago when people said that application X was better than application Y because X showed 16 cameras using only 12% CPU, while Y was at 100%. The problem with the argument was that Y was decoding and displaying 10x as many frames as X. Basically, X was throwing away 9 out of 10 frames. It did so because it couldn’t keep up: it determined that it was skipping frames and switched to a keyframe-only mode instead.

Anyway, back to working on the world’s shittiest NVR…

 

World’s Shittiest NVR pt. 4.

We now have a circular list of video clips in RAM and a way to be notified when something happens; what we need now is a way to move the clips from RAM to permanent storage when an event occurs.

In part 1 we set up FFmpeg to write to files in a loop; the files were called 0.mp4, 1.mp4 … up to 9.mp4, each file representing 10 seconds of video. We can’t move the file that FFmpeg is currently writing to, so we’ll do the following instead: we will copy the previous file that FFmpeg completed, and we’ll keep doing that for a minute or so. This means the file covering the 10 seconds before the event gets copied to permanent storage. Then, when the file that was being written while the event happened is closed, we’ll copy that file over, then the next, and so on.
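To illustrate with a made-up timeline (10-second files, event at t=25s):

t =  0-10s : 0.mp4 written and closed
t = 10-20s : 1.mp4 written and closed
t = 20-30s : 2.mp4 being written – the event fires at t=25s
             -> copy 1.mp4 right away (the pre-buffer)
t = 30s    : 2.mp4 closes -> copy 2.mp4 (contains the event)
t = 40s    : 3.mp4 closes -> copy 3.mp4, and so on while the counter runs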

We’ll use a node module called “chokidar”, so, cd to your working directory (where the SMTP server code resides) and type:

npm install chokidar

Chokidar lets you monitor files or directories and gives you an event when a file has been altered (in our case, when FFmpeg has added data to the file). Naturally, if you start popping your own files onto the RAM disk and editing those files, you’ll screw up this delicate/fragile system (read the title for clarification).

So, for example, if my RAM disk is x:\, we can do this to determine which is the newest complete file:

const chokidar = require('chokidar');

// state: the file FFmpeg is currently writing to, and the last one it completed
var currentlyModifiedFile = null;
var lastFileCreate = null;

chokidar.watch('x:\\.', {ignored: /(^|[\/\\])\../}).on('all', (event, path) => {

    // we're only interested in files being written to
    if ( event != "change" )
      return;

    // are we writing to a new file?
    if ( currentlyModifiedFile != path )
    {
      // now we have the last file created
      lastFileCreate = currentlyModifiedFile;
      currentlyModifiedFile = path;
    }
});

Now, there’s a slight snag that we need to handle: Node.js’s built-in file-system module can’t easily copy files from one device (the RAM disk) to another (the HDD), so to make things easy, we grab an extension library called “fs-extra”.

Not surprisingly:

npm install fs-extra

So, when the camera tries to send an email, we’ll set a counter to some value. We’ll then periodically check if the value is greater than zero. If it is indeed greater than zero, then we’ll copy over the file that FFmpeg just completed and decrement the counter by one.

If the value reaches 0 we won’t copy any files, and just leave the counter at 0.

Assuming you have a nice large storage drive on e:\, and the directory you’re using for permanent storage is called “nvr”, we’ll set it up so that we copy from the RAM drive (x:\) to the HDD (e:\nvr). If your drive is different (it most likely is), edit the code to reflect that change – it should be obvious what you need to change.

Here’s the complete code:

const smtp = require ( "simplesmtp");
const chokidar = require('chokidar');
const fs = require('fs-extra');

// some variables that we're going to need
var currentlyModifiedFile = null;
var lastFileCreate = null;
var lastCopiedFile = null;
var flag_counter = 0;
var file_name_counter = 0;

// fake SMTP server
smtp.createSimpleServer({SMTPBanner:"My Server"}, function(req) {
    req.accept();

    // copy files for the next 50 seconds (5 files)
    flag_counter = 10;
}).listen(6789);

// function that will be called every 5 seconds
// tests to see if we should copy files

function copyFiles ( )
{ 
  if ( flag_counter > 0 ) 
  { 
     // don't copy files we have already copied; this will happen 
     // because we check the copy condition 2x faster than files 
     // are being written (also guard against the case where no 
     // file has completed yet) 
     if ( lastFileCreate && lastCopiedFile != lastFileCreate ) 
     { 
        // copy the file to HDD 
        fs.copy (lastFileCreate, 'e:/nvr/' + file_name_counter + ".mp4", function(err) {     
           if ( err ) console.log('ERROR: ' + err); 
        });

        // files will be named 0, 1, 2 ... n 
        file_name_counter++;

        // store the name of the file we just copied 
        lastCopiedFile = lastFileCreate; 
     }
     
     // decrement so that we are not copying files  
     // forever 
     flag_counter--; 
  } 
  else 
  { 
     // the counter reached 0; clear the marker so the 
     // next event starts a fresh copy run 
     lastCopiedFile = null; 
  }
}

// set up a watch on the RAM drive, ignoring the . and .. files
chokidar.watch('x:\\.', {ignored: /(^|[\/\\])\../}).on('all', (event, path) => {
  // we're only interested in files being written to  
  if ( event != "change")  return;
   
  // are we writing to a new file?  
  if ( currentlyModifiedFile != path )  
  {  
     // now we have the last file created  
     lastFileCreate = currentlyModifiedFile;  
     currentlyModifiedFile = path;  
  }
});

// call the copy file check every 5 seconds from now on
setInterval ( copyFiles, 5 * 1000 );
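If you save the whole thing as, say, nvr.js (any filename will do), you can run it alongside FFmpeg with:

node nvr.js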

So far, we’ve written about 70 lines of code in total, downloaded ImDisk, FFmpeg, node.js and a few modules (simplesmtp, chokidar and fs-extra), and we now have a pre-buffer fully in RAM and a way to store things permanently. All detection is done by the camera itself, so the amount of CPU used is very, very low.

This is the UI so far:

[Screenshot: the recording folders in Explorer]

In the next part, we’ll take a look at how we can get FFmpeg and nginx-rtmp to allow us to view the cameras on our phone, without exposing the camera directly to the internet.

 

 

World’s Shittiest NVR pt. 3.

Our wonderful NVR is now basically a circular buffer in RAM, but we’d like to do a few things when motion (or anything else) occurs.

Many cameras support notification by email when things happen; while getting an email is nice enough, it’s not really what we want. Instead, we’ll (ab)use the mechanism as a way for the camera to notify our “NVR”.

First, we need a “fake” SMTP server, so that the camera will think that it is talking to a real one and attempt to send an actual email. When we receive the request to send the email we’ll simply do something else. An idea would be to move the temporary file on the RAM drive to permanent storage, but first, we’ll see if we can do the fake SMTP server in a few lines of code.

Start by downloading and installing node.js. Node.js allows us to run JavaScript code and to tap into a vast library of modules that we can use via npm (which used to stand for “Node Package Manager”).

Assuming you’ve got node installed, we’ll open a command prompt and test that node is properly installed by entering this command:

node -v

You should now see the version number of node in the console window. If this worked, we can move on.

Let’s make a folder for our fake SMTP server first; let’s pretend you’ve made a folder called c:\shittynvr. In the command prompt, cd to that directory, and we’re ready to enter a few more commands.

We’re not going to write an entire fake SMTP server from scratch, instead, we’ll be using a library for node. The library is called simplesmtp. It is deprecated and has been superseded by something better, but it’ll work just fine for our purpose.

To get simplesmtp, we’ll enter this command in the prompt:

npm install simplesmtp

You should see the console download some stuff and spew out some warnings and messages; we’ll ignore those for now.

We now have node.js and the simplesmtp library, and we’re now ready to create our “event server”.

Create a text file called “smtp.js”, add this code to the file, and save it.

const smtp = require ( "simplesmtp");
smtp.createSimpleServer({SMTPBanner:"My NVR"}, function(req){
  req.pipe(process.stdout);
  req.accept();
  
  // we can do other stuff here!!!

}).listen(6789);
console.log ( "ready" );

We can now start our SMTP server by typing

node smtp.js

Windows may ask you if you want to allow the server to open a port; if you want your camera to send events to your PC, you’ll need to approve. If you are using a different firewall of some sort, you’ll need to allow incoming traffic on port 6789.

We should now be ready to receive events via SMTP.

The server will run as long as you keep the console window open, or until you hit CTRL+C to stop it and return to the prompt.

The next step is to set up the camera to send emails when things happen. When you enter the SMTP setup for your camera, you’ll need to enter the IP address of your PC and specify the port 6789. How you set up your camera to send events via email varies with manufacturers, so consult your manual.

Here’s an example of the output I get when I use a Hikvision camera. I’ve set it up so that it sends emails when someone tries to access the camera with the wrong credentials:

[Console output from the fake SMTP server showing the camera’s email]

Next time, we’ll look at moving files from temporary RAM storage to disk.

World’s Shittiest NVR pt. 2.

In pt. 1 we set up FFmpeg to suck video out of your affordable Hikvision camera. I hope your significant other was more impressed with this feat than mine was.

The issue with constantly writing to the drive is that most of the time, nothing happens, so why even commit it to disk? It obviously depends on the application, but if you’re sure your wonderful VMS will not be stolen or suffer an outage at the time of a (real) incident, you can simply keep things in RAM.

So, how do we get FFmpeg to store in RAM? Well … Enter the wonderful world of the RAM disk.

ImDisk Virtual Disk Driver is a tool that allows us to set up a RAM drive. Once you’ve downloaded the tool, you can create a disk using this command:

imdisk -a -s 512M -m X: -p "/fs:ntfs /q /y"

Do you remember how I said that I had an x: drive? Total lie. It was a RAM drive the whole time!

The command shown creates a 512-megabyte NTFS drive backed by RAM. This means that if the computer shuts down (before committing to physical HDD) the data is gone. On the other hand, it’s insanely fast and it does not screw up your HDD.

When we restart FFmpeg, it will now think that it is writing to an HDD, but in reality, it’s just sticking the data into RAM. To the OS, the RAM disk is a legit hard drive, so we can read/write/copy/move files to and from the disk.
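When you’re done experimenting, the RAM drive can be detached again with imdisk’s -d (detach) flag; check the imdisk guide linked at the end of this post if your version behaves differently:

imdisk -d -m X: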

In part 3, we’ll set up node.js to respond to events.

Oh, and here’s a handy guide to imdisk.

World’s Shittiest NVR pt. 1

In this tutorial, we’ll examine what it takes to create a very shitty NVR on a Windows machine. The UI will be very difficult to use, but it will support as many cameras as you’d like, for as long as you like.

The first thing we need to do is to download FFmpeg.

Do you have it installed?

OK, then we can move on.

Create a directory on a disk that has a few gigabytes to spare. On my system, I’ve decided that the x: drive is going to hold my video. So, I’ve created a folder called “diynvr” on that drive.

Note the IP address of your camera, as well as the make and model, and use Google to find the RTSP addresses of the camera’s streams. Many manufacturers (wisely) use a common format for all their cameras. Others do not. Use Google (or Bing if you’re crazy).

Axis:
rtsp://[camera-ip-address]/axis-media/media.amp

Hanwha (SUNAPI):
rtsp://[camera-ip-address]/profile[#]/media.smp

Some random Hikvision camera:
rtsp://[camera-ip-address]/Streaming/Channels/[#]

Now we’re almost ready for the world’s shittiest NVR.

The first thing we’ll do is to open a command prompt (I warned you that this was shitty). We’ll then CD into the directory where you’ve placed the FFmpeg files (just to make it a bit easier to type out the commands).

And now – with a single line, we can make a very, very shitty NVR (we’ll make it marginally better at a later time, but it will still be shit).

ffmpeg -i rtsp://[...your rtsp url from google goes here...] -c:v copy -f segment -segment_time 10 -segment_wrap 10 x:/diynvr/%d.mp4

So, what is going on here?

We tell FFmpeg to pull video from the address; that’s this part:

-i rtsp://[...your rtsp url from google goes here...]

we then tell FFmpeg not to do anything with the video format (i.e. keep it H.264, don’t mess with it):

-c:v copy

FFmpeg should save the data in segments, with a duration of 10 seconds, and wrap around after 10 segments (100 seconds in all):

-f segment -segment_time 10 -segment_wrap 10

It should be fairly obvious how you change the segment duration and the number of segments in the “database” so that you can store something a little more useful than just 100 seconds of video (there’s an example after the breakdown below).

And finally, store the files in mp4 containers at this location:

 x:/diynvr/%d.mp4

the %d part means that the segment number is used as the filename, so we’ll get files named 0.mp4, 1.mp4 … up to and including 9.mp4.
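For example, 30-second segments wrapping after 120 files would give you a rolling hour of video (30 × 120 = 3,600 seconds), using the same paths as above:

ffmpeg -i rtsp://[...your rtsp url from google goes here...] -c:v copy -f segment -segment_time 30 -segment_wrap 120 x:/diynvr/%d.mp4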

So, now we have a little circular buffer with 100 seconds of video. If anyone breaks into my house, I just need to scour through the files to find the swine. I can open the files using any video player that can play mp4 files. I use VLC, but you might prefer something else. Each file is 10 seconds long, and with thumbnails, in Explorer, you have a nice little overview of the entire “database”.

In the next part we will improve on these two things:

  • Everything gets stored
  • Constant writing to HDD

Oh, and if your camera is password protected (it should be), you can pass in the credentials in the rtsp url like so:

rtsp://[username]:[password]@[ the relevant url for your camera]

 

Random Ramblings on RTSP

RTSP stands for “Real Time Streaming Protocol”, and it is pretty much the de facto protocol for IP security cameras. It’s based on a “pull” principle: anyone who wants the feed must ask for it first.

Part of the RTSP protocol describes how the camera and the client exchange information about how the camera sends its data. You might assume that the video is sent back via the same socket as the one used for the RTSP negotiation. This is not the case.

So, in the bog standard usage, the client will have to set up ports that it will use to receive the data from the camera (one port for the RTP media stream, one for RTCP control messages). At this point, you could say that the client actually becomes a server, as it is now listening on two different ports. If you were to capture the communication, you might see something like this:

C->S:
 SETUP rtsp://example.com/media.mp4/streamid=0 RTSP/1.0
 CSeq: 3
 Transport: RTP/AVP;unicast;client_port=8000-8001

S->C:
 RTSP/1.0 200 OK
 CSeq: 3
 Transport: RTP/AVP;unicast;client_port=8000-8001;server_port=9000-9001;ssrc=1234ABCD
 Session: 12345678

If both devices are on the same LAN (and you tolerate that the client app opens two new ports in the firewall of the PC), the camera will start sending UDP packets to those two ports. The camera has no idea if the packets arrive or not (it’s UDP), so it’ll just keep spewing those packets until the client tears down the connection or the camera fails to receive a keep-alive message (usually just a dummy request via the RTSP channel).
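That keep-alive is typically just a GET_PARAMETER (or OPTIONS) request on the RTSP control channel; a representative example, reusing the session from the capture above:

C->S:
 GET_PARAMETER rtsp://example.com/media.mp4 RTSP/1.0
 CSeq: 4
 Session: 12345678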

But what if they’re not on the same network?

My VLC player on my home network may open up port 8000 and 8001 locally on my PC, but the firewall in my router has no idea this happened (there are rare exceptions to this). So the VLC player says “hey, camera out there on the internet, just send video to me on port 8000”, but that isn’t going to work because my external firewall will filter out those packets.

To solve this issue, we have RTSP over TCP.

RTSP over TCP lets the client initiate the connection, and the camera then sends its data back through that same connection, interleaved with the RTSP traffic. Most firewalls have no issue accepting data via a connection as long as it was initiated from the inside (UDP hole punching takes advantage of this). RTSP over HTTP is an additional layer to handle hysterical system admins who only tolerate “safe” HTTP traffic through their firewalls.
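In interleaved mode, the client’s SETUP asks for the RTP/RTCP data to be multiplexed onto the existing RTSP connection as numbered channels; a representative example, mirroring the earlier capture:

C->S:
 SETUP rtsp://example.com/media.mp4/streamid=0 RTSP/1.0
 CSeq: 3
 Transport: RTP/AVP/TCP;unicast;interleaved=0-1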

So, is that the only reason to use TCP?

Well… Hopefully, you know that UDP is non-ack; the sender has no idea if the client received the packet or not. This is useful for broad- and multicasting, where getting acks from every receiver would be impractical. So the server happily spams the network, without a care in the world as to whether the packets make it or not.

TCP has ack and retransmission; in other words, the sender knows if the receiver is getting the data and will re-transmit if packets were dropped.

Now, imagine that we have two people sitting in a quiet meeting room, one of them reads a page of text to the other. The guy reading is the camera and the guy listening is the recorder (or VLC player).

So the reader starts reading “It was the best __ times, it was ___ worst of times“, and since this is UDP, the listener is not allowed to say “can you repeat that“. Instead, the listener simply has to make do. Since there’s not a lot of entropy in the message, we can make sense of the text even if we drop a few words here and there.

But imagine we have 10 people reading at the same time. Will the listener be able to make sense of what is being said? What about 100 readers? While this is a simplified model, this is what happens when you run every camera on UDP.

Using TCP, you kinda have the listener going “got it, got it, got it” as the reader works his way down the page. If the room is quiet, the listener will rarely have to say “can you repeat that”. In other words, the transmission will be just as fast as UDP.

If you have 10 readers, and some of them are not getting the “got it” message, they may decide to scale back a bit and read a bit more slowly. In the end, though, the listener will have a verbatim copy of what was being read, even if there are 1000 readers.

Modern video codecs are extremely efficient; H.264 and H.265 throw away almost everything that is not needed (and then some). This means that if you drop packets, the impact is much greater: without those missing packets, all you get is a gray blur, because that is all the receiver heard when 100 idiots were yelling over each other.

So TCP solves the firewall issue, and in a “quiet” environment it is just as efficient as UDP. In a noisy environment, it will slow things down because of retransmissions, but is that not a lot better than getting a blurry mess? Would it not be better if the cameras could adjust their settings when the receiver can’t keep up? Isn’t that better than spewing a huge amount of HD video packets into the void, never to be seen?

In my opinion, for IP surveillance installations, you should pick RTSP over TCP, and only switch to UDP if you don’t care about losing video.

As an experiment, I set up my phone to blast UDP packets at my server to determine the packet loss on LTE, assuming it would be massive. It turns out that LTE actually retransmits packets at the radio layer (at least for data packets); I don’t know if it does the same for voice.

The difference may be academic for a lot of installations as the network is actually pretty quiet, but for large/poorly architected solutions it may make a real difference.