Conway’s Law

I was re-watching this video about the (initially failed) conversion of an online store from a monolithic design into a microservice-based architecture. During the talk, Conway’s Law is mentioned. It’s one of those laws that you really should keep in mind when building software.

“organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”
— M. Conway

The concept was beautifully illustrated by a conversation I had recently. I was explaining why I disliked proprietary protocols, and hated the idea of having to rely on a binary library as the interface to a server. If a server uses HTTPS/JSON as its external interface, I can talk to it using any of a large number of libraries, of my choice, on different platforms (*nix, Windows). I can trivially test things using a common web browser. If there is a bug in one of those libraries, I can switch to another library, or fix the error myself (if the library is OSS). Basically, I become the master of my own destiny.
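To make that concrete, here is a minimal sketch in Python using nothing but the standard library. Everything in it is an assumption for illustration: the `/status` endpoint, the JSON fields, and the names `StatusHandler`, `fetch_status` and `demo` are hypothetical, and the stand-in server speaks plain HTTP on localhost rather than HTTPS so that the example runs without certificates.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a server with an open HTTP/JSON interface."""

    def do_GET(self):
        # Answer every GET with a small JSON document.
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the demo quiet; the default implementation logs to stderr.
        pass


def fetch_status(base_url):
    """Client side: any HTTP library would do, here it is plain urllib."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url + "/status", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Start the stand-in server on a free local port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), StatusHandler)
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        port = server.server_address[1]
        return fetch_status(f"http://127.0.0.1:{port}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The client is a handful of lines and depends on nothing vendor-specific. If `urllib` misbehaved, I could swap in another HTTP client, or point a browser at the same URL, without touching the server.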

If, on the other hand, there is a bug in the library provided to me, required to speak some bizarre proprietary protocol, then I have to wait for the vendor/organizational unit to provide a bug-fixed version of the library. It’s also much harder to determine whether the issue is in the server or in the library, because I may have no visibility into what’s inside the library, and I can’t trivially use a different means of testing the server.

But here’s the issue: the bug in the communication library that is affecting my module might not be seen as a high-priority issue by the unit in charge of said library. The author might have left, and a fix might take considerable time. All of this dramatically slows down progress and increases the time it takes to deliver a solution to a problem.


The strange thing is this: the idea that all communication has to pass through a single library, making the library critically important (but slowing things down), was mirrored 100% in the way the company communicated internally. Instead of encouraging cross-team communication, there was an insistence that all communication pass through a single point of contact.

Basically, the crux is this: if the product is weird, take a look at the organization first. It might just be that the product is the result of a sub-optimal organizational structure.


Crashing a Plane

Ethiopian Airlines Flight 961 crashed into the Indian Ocean. It had been hijacked en route from Addis Ababa to Nairobi. The hijackers wanted to go to Australia. The captain warned that the plane only had enough fuel for the scheduled flight and would never make it to Australia. The hijackers disagreed. The 767-200ER had a maximum flight time of 11 hours, enough to make it to Australia, they argued. 125 people died when the plane finally ran out of fuel and the pilots had to attempt an emergency landing on water.

Korean Air Flight 801 was under the command of the very experienced Captain Park Yong-chul. During heavy rain, the Captain erroneously thought that the glideslope instrument landing system was operational, when in fact it wasn’t. The Captain sent the plane into the ground about 5 km from the airport, killing 228 people.

In the case of Ethiopian Airlines, there’s no question that the people in charge of the plane (the hijackers) had no idea what they were doing. Their ignorance, and distrust of the crew, ultimately caused their demise. I am certain that up until the last minute, the hijackers believed they knew what they were doing.

For Korean Air 801, the crew was undoubtedly competent. The Captain had 9000 hours logged, and during the failed approach, we can safely assume that he felt that he knew what he was doing. In fact, he might have been so good that everyone else stopped second-guessing Captain Park even though their instruments were giving them readings that told them something was seriously wrong. Only the 57-year-old flight engineer Nam Suk-hoon, with 13000 hours logged, dared speak up.

I think there’s an analogy here: we see companies crash due to gross incompetence, inexperience and failure to listen to experienced people, but we also see companies die (or become zombies) because they have become so experienced that they feel they can’t make any fatal mistakes. Anyone suggesting they are short on approach is ignored. The “naysayers” can then leave the plane on their own, get thrown out for not being on board with the plan, or meet their maker when the plane hits the ground.

Yahoo comes to mind; witness this horror-show:

[Image: Yahoo’s bad decisions]

The people making these mistakes were not crazed hijackers with an insane plan. These were people in expensive suits, with many, many years of experience. They all had a large swarm of people doing their bidding and showing them Excel sheets and PowerPoint presentations from early morning to late evening. Yet, they managed to crash the plane into the ground.

So, I guess the moral is this: if you’re reading the instruments, and they all say that you’re going to crash into the ground, then maybe, just maybe, the instruments are showing the true state of things. If the Captain refuses to acknowledge the readings and dismisses the reports, then the choices are pretty clear.

The analogy’s weakness is that in most cases, no one dies when the Captain sends the product in the wrong direction. The “passengers” (customers) will just get up from their seats and step into another plane or mode of transportation, and (strangely) in many cases the Captain and crew will move on and take over the controls of another plane. We can only hope that the new plane will stay in the air until it safely reaches its intended destination.