LinkedIn’s user database was leaked online a while ago, and now the usernames and passwords are for sale on the dark net. LinkedIn was not the first, nor will it be the last, to have its user database stolen and published online.
Companies go to great lengths to protect their databases from outside intrusion; but what about the disgruntled employee? Your best friend had root access to the system, but when things went sour, he had already made a copy of the database (just in case), and so when he was finally let go, he decided to post the database snapshot online.
In the terrible old days, the database would contain the username and password in plaintext, so you’d see something like this:
User | Pass
----------+-------
Morten | 123456
Then someone thought to encrypt the passwords, but the disgruntled employee has access to the key and the algo used, so he could just post that information online as well. A tell-tale sign of such a (horrific) implementation is that the system sends your password to your email address if you forget it, instead of the usual reset password email that you get today.
These days we do salted hashes to make it hard to determine the passwords if the database is compromised. The basic idea is that you no longer store the password, but instead store the result of a hash function. When the user provides their password, the system combines the password with another value (the salt) and computes the resulting hash value. This value is then compared to the value stored in the database; if they match, the user gets access. Since the user database contains the RESULT of a computation, there is “no way” to deduce the original password from the hash value stored in the database.
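In code, the store-and-compare flow could look roughly like the sketch below. The function names are made up for illustration, and PBKDF2 with SHA-256 and 100,000 iterations is my own assumption; it stands in for whatever hash function a real system uses.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, not taken from any real system


def make_record(password: str) -> tuple:
    """Return (salt, hash): this is what gets stored instead of the password."""
    salt = os.urandom(16)  # fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def check_password(attempt: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the hash from the attempt and the stored salt, then compare."""
    digest = hashlib.pbkdf2_hmac("sha256", attempt.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, stored)  # constant-time comparison


salt, stored = make_record("123456")
print(check_password("123456", salt, stored))  # True
print(check_password("654321", salt, stored))  # False
```

Note that the password itself never ends up in the record, only the salt and the computed hash.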
Here is a very simple example:
Say we only allow 4 digits as our password, and our hash simply adds the digits together. This is a TERRIBLE hashing function, but it will serve as an example.
So the user has a pin of 1234, and for this user we have made a salt value of 3333.
So the user enters 1234, we then append the salt, so we get 12343333, then hash that value (1+2+3+4+3+3+3+3 = 22), so we store 22 in the database.
User | Salt | Hash
----------+------+-----
Morten | 3333 | 22
Now, say someone tries to gain access to my account and enters 1111 as the PIN. The system then computes the hash (1+1+1+1+3+3+3+3 = 16). The computed hash (16) is compared to the value in the database (22), and since they don’t match, the user is denied access.
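The toy scheme is small enough to write out in full. This is just a sketch of the example above; the names toy_hash and login are mine.

```python
def toy_hash(pin: str, salt: str) -> int:
    """Append the salt to the PIN and add up all the digits (deliberately terrible)."""
    return sum(int(digit) for digit in pin + salt)


# What the database holds for the user: the salt and the hash, not the PIN.
stored_salt = "3333"
stored_hash = toy_hash("1234", stored_salt)


def login(pin: str) -> bool:
    """Grant access only if the recomputed hash matches the stored one."""
    return toy_hash(pin, stored_salt) == stored_hash


print(stored_hash)    # 22
print(login("1234"))  # True
print(login("1111"))  # False, the hash comes out as 16
```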
The observant reader will have realized that if I know the salt, it is simple to find a PIN that gives the desired result (22). 4321 would work and grant us access, so even without the exact same PIN we get in just the same.
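To see just how bad the digit sum is, we can simply try every possible PIN against the known salt and target. A quick sketch (toy_hash is my name for the digit-sum function described above):

```python
def toy_hash(pin: str, salt: str) -> int:
    """Append the salt to the PIN and add up all the digits."""
    return sum(int(digit) for digit in pin + salt)


salt, target = "3333", 22

# Try all 10,000 four-digit PINs and keep the ones that produce the stored hash.
all_pins = [f"{n:04d}" for n in range(10_000)]
matches = [pin for pin in all_pins if toy_hash(pin, salt) == target]

print(len(matches))       # 282
print("4321" in matches)  # True
```

So 282 of the 10,000 possible PINs open the account, not just the real one.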
This is primarily because the hash function is so terrible, so you do not use idiotic hash functions, but pick one that is generally accepted as being “good”. MD5 was widely used for years, but it is fast and has known weaknesses, so deliberately slow functions such as bcrypt, scrypt or Argon2 are the usual recommendation for passwords today. When you use a good hash function, the likelihood of two PINs giving the same hash is very small. This means that you have to try a lot of combinations to find one where function(guess, known_salt) == hash. But computers are happy to try a lot of combinations, and the number of combinations can be greatly reduced by using “rainbow tables”. Strictly speaking, a rainbow table is a precomputed hash lookup, which a per-user salt defeats; what is meant here is a list of known, often used passwords, usually retrieved from other hacks. While this doesn’t give us all the passwords, it usually breaks enough accounts for whatever purpose is needed. We might also want to target just one account in the DB, and try a much wider range of passwords.
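The guessing loop itself is tiny. Here is a sketch under some loud assumptions: SHA-256 over password plus salt stands in for the real hash function, and the leaked row and the word list are invented for the example.

```python
import hashlib


def salted_hash(password: str, salt: str) -> str:
    """Assumed scheme: SHA-256 of the password with the salt appended."""
    return hashlib.sha256((password + salt).encode()).hexdigest()


# Hypothetical leaked row: the attacker sees the salt and the hash, not the password.
leaked_salt = "3333"
leaked_hash = salted_hash("letmein", leaked_salt)

# Hypothetical list of commonly used passwords.
common_passwords = ["123456", "password", "qwerty", "letmein", "dragon"]


def crack(salt, target_hash, candidates):
    """Return the first candidate where function(guess, known_salt) == hash."""
    for guess in candidates:
        if salted_hash(guess, salt) == target_hash:
            return guess
    return None


print(crack(leaked_salt, leaked_hash, common_passwords))  # letmein
```

A real list would hold millions of entries, but the loop stays the same. A deliberately slow hash function makes every pass through it more expensive.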
Someone might argue that since I don’t have the hashing algo used, I will be unable to set up my little script to run through the rainbow table, but the algo could have been leaked as well. Even if it wasn’t, there might be known accounts in the database (my own, for example), where I can try different hash functions against a known username, password and salt until the result matches the stored hash.
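Working out the algo from a known account is just another loop. In this sketch the candidate algorithms, the two ways of combining password and salt, and the "leaked" row are all assumptions made up for illustration.

```python
import hashlib

# An account where I know the password, e.g. one I registered myself.
known_password, known_salt = "123456", "3333"

# Pretend this hash came from the leaked row for that account.
leaked_hash = hashlib.sha256((known_salt + known_password).encode()).hexdigest()


def identify(password, salt, target):
    """Try a few algorithms and salt placements until one reproduces the hash."""
    combos = (("password+salt", password + salt), ("salt+password", salt + password))
    for name in ("sha1", "sha256", "sha512"):
        for label, data in combos:
            if hashlib.new(name, data.encode()).hexdigest() == target:
                return name, label
    return None


print(identify(known_password, known_salt, leaked_hash))  # ('sha256', 'salt+password')
```

Once the scheme is identified on one known account, it can be applied to every other row in the database.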
If you are in the “same password everywhere” camp, then you are SOL. Even if the blog comes clean and discloses that they have been compromised, it is too late. Changing the password on the blog does not remove the password in the leaked database.