Agile is like Communism

Communism can work. For a short duration, and with a limited number of like-minded participants, real communism can work (or at least appear to work). In most other cases, communism just doesn’t pan out.


When faced with the long list of failed communist experiments, hardliners will always say “well, that was not real communism”. Which is true. But when you consider the nature of man, there really are just two options: “bad communism” or “no communism”. I prefer the latter.

Same goes for Agile.

Observing a jelled team that is firing on all cylinders, you’ll see that dogmatic adherence to “process” is not enforced, that there is a lot of informal communication (on technical topics), and that tasks are broken down into manageable chunks with a clear scope. The team can quickly adapt to changes in its environment simply because it is agile. Wouldn’t it, then, be nice if we could write down how these guys are doing things, and then apply it to everyone writing software?

Here’s where reality sets in.

Some people are simply not fit to write code, and some people are not fit to write specs.

It doesn’t really matter what process you follow: inept coders and managers will never be agile.

But they can do Agile.

I suppose the rationale is that the group eventually acknowledges that it is not being productive. Perhaps it has gone through the Dead Sea effect for some time, and there is increasing frustration with delays, shipped defects and surprising side effects discovered late in the cycle.

There are two options: a) we are simply incompetent, or b) there’s something wrong with our process. Most teams pick option b).

Agile’s pitch is that bad productivity is simply due to the wrong process. And this is true; for competent teams, the wrong type and amount of bureaucracy slows things down. Limiting needless paperwork speeds things up. But it requires competent and honest people and an appropriate type of project. You don’t find a cure for cancer just by doing a bunch of epics, sprints and retrospectives.

The bad team then picks up Agile, but never bothers to read the manifesto, and the concept is applied indiscriminately to all types of projects.

Informal inquiries and communication are shunned, and the team instead insists on strict adherence to “process”, because deviation from the process is “what led to disaster the last time”, the argument goes. The obvious contradiction between the stated principles of Agile and a team that refuses ad-hoc communication while insisting on “following process” is often completely lost on bad teams.

The web is overflowing with disaster stories of Agile gone wrong (and now I just added one to the growing pile), just as history books overflow with stories of communism gone wrong. And for every story, there’s one where an Agile proponent explains why they just weren’t doing Agile the right way, or that a different kind of Agile is needed, like in this piece, where a comment then reads:

This insane wishy-washy process-worshipping religion is __BULLSHIT__ of the highest order. What you really need is a competent team that isn’t sabotaged by over-eager, incompetent management and hordes of process-masturbators every step of the way.

The Agile process will not fix problems that are due to incompetence. Competent, jelled teams are probably already agile. Spend more time identifying what value each member brings to the team. Keep score. Cull the herd.


The Singleton Anti-Pattern

In programming, the whole idea is to avoid re-inventing the wheel and to re-use as much as possible. Some clever coders discovered that there were some mechanisms that were used over and over again. For example, the “producer/consumer” mechanism, whereby one or more threads are “producers” and one or more threads are “consumers”. Instead of coders figuring out how to do this properly over and over again, a group of people decided to write a book that described how to solve some of these problems: “Design Patterns: Elements of Reusable Object-Oriented Software”, they called it. In the business, the authors became known as the “Gang of Four”.

One of the patterns they described is the “Singleton”: a singleton is essentially a global object that is instantiated when needed. The idea being that the user doesn’t need to know when, or how, the underlying object is created/destroyed; they can just use it, and all parts of the code then share the same object. Isn’t that cool? It’s like global variables were suddenly being endorsed in a book, and by some clever people too!!
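In its simplest form, a singleton looks something like this (a minimal C# sketch; the Settings class and its Username property are made-up names for illustration, and this lazy version is not thread-safe):

using System;

class Settings
{
  // The single, process-wide instance, created on first use.
  private static Settings instance;

  // A private constructor ensures nobody else can create one.
  private Settings() { }

  public static Settings Instance
  {
    get
    {
      if (instance == null)
        instance = new Settings();
      return instance;
    }
  }

  public string Username { get; set; }
}

Any code, anywhere in the process, can now write Settings.Instance.Username and get the same object.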

There are cases (rare, constrained) where a global variable makes sense: when the physical thing that the software is trying to model matches a single object, e.g. a singular file on a disk or a specific camera in a network. It’s perfectly appropriate to model these objects as global, because there truly is only one of them.

Let’s consider a log mechanism. There may be several things that are logging data, but if all that data goes into just one file, then it’s OK to use a singleton for the file, but certainly not for the log abstractions. If three or four different modules are all logging to the same file, then those modules must each have their own logger instance, and the various instances can then write to the same file through the singleton.

A primitive class diagram could look like this:

             Module A -> Log A 
Parent  ->                        -> Singleton File
             Module B -> Log B

When you are acutely aware of this composition, you eventually realize that each logger instance must add some identifier when it writes to the disk. Otherwise you get a log file that looks like this:

File Open
File Open
File Write Failed
File Write Succeeded
File Close
File Close

What you want, in the file, is this:

Module A: File Open
Module B: File Open
Module B: File Write Failed
Module A: File Write Succeeded
Module B: File Close
Module A: File Close
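A sketch of that composition in C# (the class and member names are mine, invented for illustration) could look like this:

using System;

class LogFile
{
  // The singleton: one instance, because there truly is only one file.
  public static readonly LogFile Instance = new LogFile();
  private LogFile() { }

  public void Write(string line)
  {
    // A real implementation would append to the file on disk.
    Console.WriteLine(line);
  }
}

class Logger
{
  private readonly string moduleName;
  public Logger(string moduleName) { this.moduleName = moduleName; }

  // Each module gets its own Logger; only the file is shared.
  public void Log(string message)
  {
    LogFile.Instance.Write(moduleName + ": " + message);
  }
}

Module A constructs new Logger("Module A"), Module B constructs new Logger("Module B"), and the prefixes keep the shared file readable.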

This appears to solve the problem; except there’s a caveat. Say someone writes an app that creates two instances of the parent module. Since the log file is a singleton, all log data is written to the same file. This, in turn, means that two instances of the parent will also write to the same file.

Consider this diagram:

                              Module A -> Log A
                 Parent ->               
                              Module B -> Log B
Aggregator  ->                                       -> Singleton File
                              Module A -> Log A
                 Parent ->
                              Module B -> Log B

We are now in hell.

Module A: File Open
Module B: File Open
Module B: File Write Failed
Module A: File Open
Module B: File Write Failed
Module A: File Write Succeeded
Module B: File Close
Module A: File Write Succeeded
Module A: File Close

This issue is relatively easy to fix, and it’s still valid to have a requirement that there is just one log file (might be better to create one per parent, but that’s a matter of taste).

But what about cases where things like username, password, preferences etc. are stored in a singleton that contains “user info”? In that case, when the aggregator sets the username, the change applies to ALL modules, regardless of where they reside in the aggregator tree. It’s therefore impossible for the aggregator to set a different username for Parent 1 and Parent 2. The aggregator, therefore, breaks.
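To make the failure concrete, here is a sketch (UserInfo and Parent are made-up names) of what the aggregator runs into:

using System;

class UserInfo
{
  // One set of credentials for the entire process.
  public static readonly UserInfo Instance = new UserInfo();
  private UserInfo() { }
  public string Username { get; set; }
}

class Parent
{
  // Intended as "this parent's user", but the state is process-global.
  public Parent(string username) { UserInfo.Instance.Username = username; }
  public string WhoAmI() { return UserInfo.Instance.Username; }
}

class Aggregator
{
  static void Main()
  {
    Parent p1 = new Parent("alice");
    Parent p2 = new Parent("bob");
    // Prints "bob": constructing p2 silently clobbered p1's username.
    Console.WriteLine(p1.WhoAmI());
  }
}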

Essentially, the coder might as well have said “let’s make the username a global variable”. 99% of all coders will object when they hear that (or “goto”). But 50% of all coders remain silent when the same pattern is described using the “singleton” moniker.

The moral of the story: don’t use singletons. Not even if you think you know what you are doing. Because if you think you know what you are doing, then you almost certainly do not.


Cost of Error

When I got my first computer, the language it offered was BASIC. Ask any good programmer, and they’ll tell you that BASIC is a terrible language, but it got worse: my next language was 68K assembler on the Commodore Amiga, and with blatant disregard for what Commodore was telling us, I never used the BIOS calls. Instead, I bought the Amiga Hardware Reference Manual and started hitting the metal directly. During my time in school I was taught Pascal and C++, and eventually I picked up a range of other languages.

What I’ve learned over the years is that the cost of a tool depends on two things: the time it takes to implement something, and (often overlooked) the time it takes to debug when something goes wrong.

Take “Garbage Collection”, for example. The idea is that you will not have memory leaks because there’s no “new/delete” or “malloc/free” pair to match up; the GC knows when you are done with something you allocated and frees it when needed. This, surely, must be the best way to do things. You can write a bunch of code and have no worries that your app will leak memory and crash. After all, the GC cleans up.

But there are some caveats. I’ve created an app that will leak memory and eventually crash.

using System;

namespace LeakOne
{
  class Program
  {
    class referenceEater
    {
      public delegate void Evil();
      public Evil onEvil;

      // A finalizer that does actual work: every instance must pass
      // through the finalization queue before its memory can be reclaimed.
      ~referenceEater() {
        Console.WriteLine("referenceEater finalizer");
      }
    }

    class junk
    {
      public void noShit() { }

      // Creates 100,000 short-lived objects, each carrying a finalizer.
      public void leak() {
        for (int i = 0; i < 100000; i++) {
          referenceEater re = new referenceEater();
        }
      }

      ~junk() {
      }
    }

    static void Main(string[] args) {
      for (int i = 0; i < 1000000; i++) {
        junk j = new junk();
        j.leak();
      }
    }
  }
}

What on earth could cause this app to leak?

The answer is the innocent-looking “Console.WriteLine” statement in the referenceEater finalizer. Finalizers run on a dedicated GC thread, and because Console.WriteLine takes a bit of time, the main thread creates referenceEater objects faster than the finalizer thread can retire them. In other words, a classic producer/consumer problem, leading to a leak, and eventually a crash.
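One heavy-handed way to confirm that the finalization queue is the culprit (a diagnostic sketch, not a recommendation) is to throttle the producer and let the finalizer thread catch up:

static void Main(string[] args) {
  for (int i = 0; i < 1000000; i++) {
    junk j = new junk();
    j.leak();
    // Periodically force a collection and block until the finalizer
    // thread has drained its queue. If the "leak" disappears, the
    // finalization backlog was the problem.
    if (i % 100 == 0) {
      GC.Collect();
      GC.WaitForPendingFinalizers();
    }
  }
}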

Running this app, the leak is fairly apparent just by looking at Task Manager. On my laptop it only takes 5-10 minutes for the app to crash (in 32-bit mode), but in 64-bit mode the app would probably run for days, slowing things down for days, until eventually causing a crash.

It’s a bitch to debug, because the memory usage over a loop is expected to rise until the GC kicks in. So you get a see-saw pattern that you need to watch for a fairly long time to determine, without a doubt, that you have a leak. To make matters worse, the leak may show up on busy systems but not on a development machine that has more cores or is less busy. It’s basically a nightmare.

[Figure: memory usage over time, showing the see-saw pattern]

There are other ways for .NET apps to leak. A good example is forgetting to unsubscribe from a delegate, which means that instead of matching new/delete, you now have to match subscription and unsubscription. Another is fragmentation of the Large Object Heap (not really a leak, but it will cause memory use to grow, and ultimately kill the app).
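The delegate variant looks innocent, which is exactly the problem. A minimal sketch (Publisher and Subscriber are made-up names):

using System;

class Publisher
{
  // A long-lived object, e.g. an application-scope service.
  public event EventHandler SomethingHappened;
}

class Subscriber
{
  public Subscriber(Publisher p)
  {
    // Subscribing stores a delegate in the publisher, and that delegate
    // holds a reference back to this subscriber...
    p.SomethingHappened += OnSomethingHappened;
  }

  void OnSomethingHappened(object sender, EventArgs e) { }

  // ...so until someone calls
  // p.SomethingHappened -= OnSomethingHappened, the publisher keeps
  // every Subscriber ever created alive.
}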

The C++ code I have, I can test for leaks by just creating a loop: run it a million times, and when it’s done we should have exactly the same amount of memory as before the loop.
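A rough C# analogue of that loop test is possible if you force a full collection before measuring (an approximate sketch; GC.GetTotalMemory(true) blocks while the GC runs, but the numbers are still only indicative):

long before = GC.GetTotalMemory(forceFullCollection: true);

for (int i = 0; i < 1000000; i++) {
  junk j = new junk();
  j.leak();
}

long after = GC.GetTotalMemory(forceFullCollection: true);
// With no leak, 'after' should be close to 'before'; a difference that
// keeps growing across repeated runs points at a leak.
Console.WriteLine("Delta: " + (after - before));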

I am not proposing that we abandon garbage collection, or that everything should be written in C++; not even by a long shot. As an example, the stress test for our web server (written in C++) was done using node.js. It took less than 2 hours to put together, and I don’t really care if the test app leaks (it doesn’t). There are a myriad of places where I think C# and garbage collection are appropriate. I use C# to make COM objects that get spawned and killed by IIS, and it’s a delight to write those without having to worry about the many macros that would be required to do the same in C++.

And with great care and attention, C# apps need never leak. But the reality is that programmers will make mistakes, and that cost has to be taken into consideration when trying to determine what tool is appropriate for what task.