The (Real) Problem With Cybersecurity

Having been in the sausage-factory for a long time, I’d like to share some thought about what I think is a problem when it comes to cyber-security.

Contrary to popular belief programmers are humans; we make stupid mistakes, some are big, some are small. Some days we are lazy, others we are energetic and highly motivated. So (exploitable) bugs inevitably creep in, this is just a fact of life.

The first step in writing robust and secure code is in the design and architecture. The next step is to have developers with good habits and skills, and finally you run a good selection of automated tests on the modules that make up the product.

But consider that a VMS typically runs on an OS that the vendor has very little control over, or uses a database (SQL server, mySQL, MaxDB etc) that is also outside the manufacturers reach. Furthermore, the VMS itself uses libraries from 3rd parties, and with good reason too. Open Source libraries are often much better tested and under an extreme amount of scrutiny compared to a VMS doing a homemade concoction reviewed by just a few peers, but they too have bugs (just fewer than the homebrew stuff usually).

Inevitably, someone finds a way to break the system, and when it happens, it’s a binary event. The product is now insecure. You can argue all you want that the other windows and doors are super-secure, but if the back door is open – who cares about the lock on the window?

To be fair, if the rest of the building is locked down well, then fixing the broken door may be a smaller event.

Contrast to a system that is insecure by design. Where fixing the security issues requires changes to the architecture. We’re no longer talking about replacing a broken lock, but upheaval of the entire foundation. An end-user doesn’t know if the cracks are due to a fundamental issue, or something that just needs a bit of plaster and paint.

And this brings me to the real issue.

Say a developer politely asks demand that resources are allocated to fixing these issues, what do you imagine will happen? In some companies, I assume that a task-force is assembled to estimate the severity of the issue, resources are then allocated to fix the issue. A statement is issued so that people know to apply the patch (they’re not going to do it, but it’s the right thing to do). This is what a healthy company ought to do. A sick company would make the following statement: “no-one has complained about this issue, and – actually – we have to make money”.

A good way to make yourself unpopular (as a programmer) is to respond by saying that if the issue IS discovered, you can forget about making any money. Your market will be limited to installations who really don’t care about security. The local Jiffy-Lube who replaced their VHS based recorder with a DVR that just sits on a dusty shelf may truly not care. The system is not exposed in any way – it is a CCTV (Closed, being the operative word here). They’re fine. And the root password is written on a post-it note stuck on the monitor. But what about a power plant? What about a bank? an airport?

You might imagine that an honest coder with integrity would resign on the spot, but this doesn’t solve the problem. Employees are often gagged by NDAs and non-disparagement clauses, and while disclosure of security flaws is clearly protected by the first amendment, it is generally a bad idea to talk about these things. The company may suffer heavy losses and you are putting (unsuspecting) customers at risk by making these things public. The threat of legal action and the asymmetry (a single person vs a corporation) ensures that flaws rarely surface.

It’s also conceivable that the dumbass programmer, is wrong about the risk of a bug/design issue. A developer may think that a trivial bypass of privilege checks is “dangerous”, but customers might genuinely not care.

Who knows? During the Black Hat convention in 2013, IP cameras from different manufacturers were shown to be hopelessly unsafe. Didn’t seem to make any difference.

I referenced this talk in an earlier post as well.

4 years later, cybersecurity is all the rage, and perhaps people do care – but from what I can tell, it’s just a few SJWs who crave the spotlight who pretend to care. Whether the crazy accusations have merit is irrelevant, all that matters is that viewers tune in, and the show will get increasingly grotesque to keep people entertained. And if the freaks-show is not bringing in the crowds, you can always turn it into a sort of “anonymous facebook” where people can back-stab each other – like the bitchiest teenage girls used to treat each other.

What the industry probably needs to do, is to pay professional penetration testers to go to work on the systems out there. I’m not talking about the kind of shitty automated tests that are being done today. They are far, far from being sufficient. You need people like Craig Heffner in the video to go to town to get to the bottom.

Happy hacking.

Monolith

20 years ago, the NVR we wrote was a monolith. It was a single executable, and the UI ran directly on the console. Rendering the UI, doing (primitive) motion detection and storing the video was all done within the same executable. From a performance standpoint, it made sense; to do motion detection we needed to decode the video, and we need to decode the video to render it on the screen, so decoding the video just once made sense. We’d support up to a mind-blowing 5 cameras per recorder. As hardware improved, we upped the limit to 25, in Roman numerals, 25 is XXV, and hence the name XProtect XXV (people also loved X’s back then – fortunately, we did not support 30 cameras).

Image result for rock

I’m guessing that the old monolith would be pretty fast on today’s PC, but it’s hard/impossible to scale beyond a single machine. Supporting 1000 cameras is just not feasible with the monolithic design. That said, if your system is < 50 cameras, a monolith may actually simpler, faster and just better, and I guess that’s why cheap IP recorders are so popular.

You can do a distributed monolith design too; that’s where you “glue” several monoliths together. The OnSSI Ocularis system does this; it allows you to bring in a many autonomous monoliths and let the user interact with them via one unified interface. This is a fairly common approach. Instead of completely re-designing the monolith, you basically allow remote control of the monolith via a single interface. This allows a monolith to scale to several thousand cameras across many monoliths.

One of the issues of the monolithic design is that the bigger the monolith, the more errors/bugs/flaws you’ll have. As bugs are fixed, all the monoliths must be updated. If the monolith consists of a million lines, chances are that the monolith will have a lot of issues, and fixes for these issues introduce new issues and so on. Eventually, you’re in a situation where every day you have a new release that must be deployed to every machine running the code.

The alternative to the monolith is the service based architecture. You could argue that the distributed monolith is service based; except the “service” does everything. Ideally, a service based design ties together many different services that have a tightly defined responsibility.

For example; you could have the following services: configuration, recorder, privileges, alarms, maps, health. The idea being that each of these services simply has to adhere to an interface contract. How the team actually implements the functionality is irrelevant. If a faster, lighter or more feature rich recorder service comes along, it can be added to the service infrastructure as long as it adheres to the interface. Kinda like ONVIF?

This allows for a two-tiered architectural approach. The “city planner” who plans out what services are needed and how they communicate, and the “building architect” who designs/plans what goes into the service. Smaller services are easier to manage, and thus, hopefully, do not require constant updates. To the end user though, the experience may actually be the same (or even worse). Perhaps patch 221 just updates a single service, but the user has to take some action. Whether patch 221 updates a monolith or a service doesn’t make much difference to the end-user.

Just like cities evolve over time, so does code and features. 100 years ago when this neighborhood was built, a sewer pipe was installed with the house. Later, electricity was added, it required digging a trench and plugging it into the grid. Naturally, it required planning and it was a lot of work, but it was done once, and it very rarely fails. Services are added to the city, one by one, but they all have to adhere to an interface contract. Electricity comes in at 50 Hz, and 220V at the socket, and the sockets are all compatible. It would be a giant mess if some providers used 25 Hz, some 100 Hz, some gave 110V, some 360V etc. There’s not a lot of room for interpretation here; 220V 50 Hz is 220V 50 Hz. If the spec just said “AC” it’s be a mess. Kinda like ONVIF?.

Image result for wire spaghetti

In software, the work to define the service responsibilities, and actually validate that services adhere to the interface contract is often overlooked. One team does a proprietary interface, another uses WCF, a third uses HTTPS/JSON, and all teams think that they’re doing it right, and everyone else is wrong. 3rd parties have to juggle proprietary libraries that abstract the communication with the service or deal with several different interface protocols (never mind the actual data). So imagine a product that has 20 different 3rd party libraries, each with bugs and issues, and each of those 3rd parties issue patches every 6 months. That’s 40 times a year that someone has to make decide to update or not; “Is there anything in patch 221 that pertains to my installation? Am I using a service that is dependent on any of those libraries” and so on.

This just deals with the wiring of the application. Often the UI/UX language differs radically between teams. Do we drag/drop things, or hit a “transfer” button. Can we always filter lists etc. Once again, a “city planner” is needed. Someone willing to be the  a-hole, when a team decide that deviating from the UX language is just fine, because this new design is so much better.

I suppose the problem, in many cases, is that many people think this is the fun part of the job, and everyone has an opinion about it. If you’re afraid of “stepping on toes”, then you might end up with a myriad of monoliths glued together with duct-tape communicating via a cacophony of protocols.

OK, this post is already too long;

Monoliths can be fine, but you probably should try to do something service based. You’re not Netflix or Dell, but service architecture means a more clearly defined purpose of your code, and that’s a good thing. But above all, define and stick to one means of communication, and it should not be via a library.

 

The Singleton Anti-Pattern

In programming, the whole idea is to avoid re-inventing the wheel, and re-use as much as possible. Some clever coders discovered that there were some mechanism that were used over and over again. For example, the “producer/consumer” mechanism, whereby one or more threads are “producers” and one or more threads are “consumers”. Instead of coders figuring out how to do this properly over and over again, a group of people decided to write a book that described how to solve some of these problems. “Design Patterns: Elements of Reusable Object-Oriented Software” they called it. In the business, the authors became known as the “Gang of Four”.

One of the patterns they described is a “Singleton“: A singleton is essentially a global object, that is instantiated when needed. The idea being that the user doesn’t need to know when, or how, the underlying object is created/destroyed, they can just use it, and all parts of the code then shares the same object. Isn’t that cool. It’s like global variables were suddenly being endorsed in a book, and by some clever people too!!

There are cases (rare, constrained) where a global variable makes sense; it makes sense when the physical properties that the software is trying to model, matches with a single object. E.g. a singular file on a disk or a specific camera in a network. It’s perfectly appropriate to model these objects as global, because there truly is only one of them.

Let’s consider a log mechanism. There may be several things that are logging data, but if all that data goes into just one file, then it’s OK to use a singleton for the file, but certainly not for the log abstractions. If there are three or four different modules that are all logging to the same file, then those modules must have their own logger instance, and the various instances that are made, can then write to the same file using the singleton.

A primitive class diagram could look like this:

             Module A -> Log A 
Parent  ->                        -> Singleton File
             Module B -> Log B

When you are acutely aware of this composition, you should eventually realize that each logger instance must add some identifier when it writes to the disk. Otherwise you get a log file that looks like this

File Open
File Open
File Write Failed
File Write Succeeded
File Close
File Close

What you want, in the file, is this

Module A: File Open
Module B: File Open
Module B: File Write Failed
Module A: File Write Succeeded
Module B: File Close
Module A: File Close

This appears to solve the problem; except there’s a caveat. Say someone writes an app that creates two instances of the parent module. Since the log file is a singleton, all log data is written to the same file. This, in turn, means that two instances of the parent will also write to the same file.

Consider this diagram

                              Module A -> Log A
                 Parent ->               
                              Module B -> Log B
Aggregator  ->                                       -> Singleton File
                              Module A -> Log A
                 Parent ->
                              Module B -> Log B

We are now in hell.

Module A: File Open
Module B: File Open
Module B: File Write Failed
Module A: File Open
Module B: File Write Failed
Module A: File Write Succeeded
Module B: File Close
Module A: File Write Succeeded
Module A: File Close

This issue is relatively easy to fix, and it’s still valid to have a requirement that there is just one log file (might be better to create one per parent, but that’s a matter of taste).

But what about issues where things like username, password, preferences etc. are stored in a singleton that contains “user info”. In that case, when the aggregator sets the username, the username change applies to ALL modules, regardless of where they reside in the aggregator tree. It’s therefore impossible for the aggregator to set a different username for Parent 1 and Parent 2. The aggregator, therefore, breaks.

Essentially, the coder might as well have said “let’s make the username a global variable”. 99% of all coders will object when they hear that (or “goto”). But 50% of all coders remain silent when the same pattern is described using the “singleton” moniker.

The morale of the story: don’t use singletons. Not even if you think you know what you are doing. Because if you think you know what you are doing, then you almost certainly do not.

 

Looping Canned Video For Demos

Here’s a few simple(?) steps to stream pre-recorded video into your VMS.

First you need to install an RTMP server that can do RTMP to RTSP conversion. You can use Evostream, Wowza or possibly Nimblestreamer.  Nginx-rtmp won’t work as it does not support RTSP output.

Then get FFMpeg (windows users can get it here).

Find or create the canned video that you want to use, and store it somewhere accessible.

In this example, I have used a file called R1.mp4 and my RTMP server (Evostream) is located at 192.168.0.109. The command used is this:

ffmpeg -re -stream_loop -1 -i e:\downloads\r1.mp4 -c copy -fflags +genpts -f flv rtmp://192.168.0.109/live/r1

Once this is streaming (and you can verify using VLC and opening the RTMP url you provided), you can go to your VMS and add a generic RTSP camera.

For Evostream, the RTSP output is on a different port, and has a slightly different format, so in the recorder I add:

rtsp://192.168.0.109:5544/r1

Other RTMP servers may have a slightly different transform of the URL, so check the manual.

I now have a video looping into the VMS and I can run tests and benchmarks on the exact same feed w/o needing an IP camera.

 

 

Listening to Customers

In 2011, BlackBerry peaked with a little more than 50 million devices sold. The trajectory had an impressive ~50% CAGR from 2007 where the sales were around 10 million devices. I am sure the board and chiefs were pleased and expected this trend to continue. One might expect that ~250 million devices were to be sold in 2016 if the CAGR could be sustained. Even linear growth would be fairly impressive.

Today, in 2017, BlackBerry commands a somewhat unimpressive 0.0% of the smartphone market.

There was also Nokia. The Finnish toilet-paper manufacturer pretty much shared the market with Ericsson in Scandinavia and was incredibly popular in many other regions. If I recall correctly, they sold more devices than any other manufacturer in the world. But they were the McDonalds of mobile phones: Cheap and simple (nothing wrong with that per se). They did have some premium phones, but perhaps they were just too expensive, too clumsy or maybe too nerdy?

ngage
Talking on a Nokia N-Gage phone

Nokia cleverly tricked Microsoft into buying their phone business, and soon after the Microsoft gave up on that too (having been a contender in the early years with Windows CE/Mobile).

I am confident that BlackBerry was “listening to their customers”. But perhaps they didn’t listen to the market. Every single customer at BlackBerry would state that they preferred the physical keyboard and the naive UI that BlackBerry offered. So why do things differently? Listen to your customers!

If BlackBerry was a consulting agency, then sure, do whatever the customer asks you to. If you’re selling hot-dogs, and the customer asks for more sauerkraut, then add more sauerkraut, even if it seems revolting to you. But BlackBerry is not selling hotdogs or tailoring each device to each customer. They are making a commodity that goes in a box and is pulled off a shelf by someone in a nice shirt.

As the marginally attached customers are exposed to better choices (for them), they will opt for those, and in time, as the user base dwindles, you’re left with “fans”. Fans love the way you do things, but unless your fan base is growing, you’re faced with the very challenging task of adding things your fans may not like. Employees that may be prostrate bowed but not believing, will leave and eventually you’ll have a group of flat-earth preachers evangelizing to their dwindling flock.

It might work as a small, cooky company that makes an outsider device, but it sure cannot sustain the amount of junk that you tag on over the years. Eventually that junk will drag the company under.

Or, perhaps BlackBerry was a popular hotdog stand, in a town where people just lost the appetite for hotdogs and had a craving for juicy burgers and pizza (or strange hotdogs)

Taste and Craftsmanship

I think it would be unfair to say that blue is a better color than green, or that flat design is better than skeumorphic. And I don’t think we’d appreciate flat design w/o having been through skeumorphic design first (and poor, misguided attempts at that too).

The website craigslist.com is #50 in the world (#10 in the US), and I think most designers will agree that it looks pretty plain. Amazon.com is #11 and sports a pretty chaotic design. So it would seem that a design doesn’t make or break a product; a product that works really well can succeed in spite of not being pretty. I suppose the design just has to be appropriate. I wonder if craigslist would have been where it is today, if Craig Newmark had tried to keep up with the design trends over the years. I think what’s key for craigslist is that the design looks almost bohemian, which may resonate quite well with the self-image of its users.

But craftsmanship is something slightly different. Poor craftsmanship can be a dealbreaker, and I believe it carries bigger weight than aesthetic preferences. Even if craigslist looks rather plain, it does follow some fairly well-established design rules. It would appear that the craftsmanship is good – the designers know what they are doing, and while you may not be particularly attracted to the site (the taste), the design feels deliberate and done with/on  purpose.

craigslist

As an example of the polar opposite, take a look at Yahoo! classified registry. I don’t know what those pages are for, but it seems as if Yahoo! has no taste. They’ve just mashed a bunch of random ingredients into a very tasteless pie. I wonder if they are maintaining it at all – it sure looks like they’ve abandoned it a while ago. I think the Yahoo page is an example of no taste, and bad craftsmanship. The “New” icon seems completely out of place, and it looks pretty bad.

Yahoo

 

So, I think it’s safe to say that Yahoo! have failed. A poorly designed page, with no meaningful purpose (that I can see). This sort of thing must be avoided at all cost. Yet it seems that a lot of companies end up with something similar to Yahoo!’s abomination. A lot of times, as developers discover a new technique, they can’t wait to use it – somewhere – anywhere. When I was first dabbling in WPF I did a (terrible) mirror floor effect. I immediately popped it in the administrator app. As it was pretty cool, no-one told me to remove it. It still sits there. Inappropriate and annoying to me. And as time passes, my preferences change. I went from skeumorphic to flat, but I never had the time to redo all the assets. As a result, I have a mix of both styles.

So, I understand why design rot happens, and I am pretty sure that I know how to remedy the situation, but I can’t prove that theres a ROI on cleaning it up. People who didn’t buy our product, are not going to complain about the lack of consistency in the design, and people who did, probably don’t care enough to complain. Thus, the problem appears smaller than it might be. Furthermore, a new, overhauled design, may not sit well with our existing customers, and there is clearly a tendency to want to do everything vastly different a second time around (the second system effect).

 

One Auga Per Ocularis Base*

In the Ocularis ecosystem, Heimdall is the component that takes care of receiving, decoding and displaying video on the screen. The functionality of Heimdall is offered through a class called Auga. So, to render video, you need to create an Auga object.

Ocularis was designed with the intent of making it easy for a developer to get video into their own application. Initially it was pretty simple – instantiate an Auga instance, and pass in a url, and viola, you had video. But as we added support for a wider range of NVRs, things became a little more complex. Now you need to instantiate an OCAdapter, log into an Ocularis Base Server. Then, pass the cameras to Auga via SetCameraIDispatch and then you can get video. The OCAdapter in turn, depends on a few NVR drivers. So deployment became more complex too.

One of the most common problems that I see today, is that people instantiate one OCAdapter, and one Auga instance per camera. This causes all sorts of problems; each instance counts as one login (which is a problem on login-restricted systems), every instance consumes memory and memory for fonts and other graphics are not shared between the instances. In many ways, I should have anticipated this type of use, but on the other hand, the entire Ocularis Client is using Heimdall/Auga as if it was a 3rd party component, and that seems to work pretty well (getting a little dated to look at, but hey…)

Heimdall also offers a headless mode. We call it video-hooks, and it allows you to instantiate an Auga object, and get decoded frames via a callback, or a DLL, instead of having Auga draw it on the screen. The uses for this are legion, I’ve used the video-hooks to create a web-interface, and until recently we used it for OMS to, video analytics can use the hooks to get live video in less than 75 lines of code . Initially the hooks only supported live video, but it now supports playback of recorded video too. But even when using Auga for hooks, should you ever only create one Auga instance per Ocularis Base. One Auga instance can easily stream from multiple cameras.

However, while Heimdall is named after a God, it does not have magical capabilities. Streaming from 16 * 5 MP * 30 fps will tax the system enormously – even on a beefy machine. One might be tempted to say “Well, the NVR can record it, so why can’t Auga show it?”. Well, the NVR does not have to decode every single frame completely to determine the motion level, but Auga has to decode everything, fully, all the way to the pixel format you specify when you set up the hook. If you specify BGR as your expected pixel format, we will give you every frame as a fully decoded BGR frame at 5MP. Unfortunately, there is no way to decode only every second or third frame. You can go to I-Frame only decoding (we do not support that right now), but that lowers the framerate to whatever the I-frame interval would be, typically 1 fps.

If you are running Auga in regular mode, you can show multiple cameras by using the LoadLayoutFromString function. It allows you to create pretty much any layout that you can think of, as you define the viewports via a short piece of text. Using LoadLayoutFromString  (account reqd.) you do not have to handle maximizing of viewports etc. all that is baked into Auga already. Using video hooks, you can set up (almost) any number of feeds via one Auga instance.

sdk_loadlayout

Granted, there are scenarios where making multiple Augas makes sense – certainly, you will need one per window (and hence the asterisk in the headline), and clearly if you have multiple processes running, you’d make one instance per process.

I’ll talk about the Direct3D requirement in another post.