When you’re dealing with user passwords, the site owner does not (or should not) want to know the actual password the user enters. A very good reason for not wanting to know the actual password is that if the database is leaked, in one way or the other, whoever gets hold of the database won’t know the passwords.
What you do instead, is to save a hash of the password. A hash is a kind of checksum, so that for a given input, you get a fixed length output value. A very simple (and useless) hash is to just add the ascii-values of each char together, taking modulo 256 of the sum.
So, when the user enters “23456” as his password, you calculate the sum of the ascii-values ( 50 + 51 + 52 + 53 + 54 ) = 260, take the modulo 256 of 260, we get “4” as the checksum, we then store the “4” in the database, instead of the “23456” string.
Username Password-Hash Morten 4
The next time, the user comes along, we do the same calculation, and compare the results. If the result is “4”, the user gets access. If the user, by a mistake, entered “23457” we’d get the hash of “5” and we’d reject the user.
But wait a minute, the astute reader yells out, banging his fist on the table: “the password “65432” would give the exact same hash!”. And the reader would be correct. This is known as collision, and is why some hashes are good, and others are bad. In my idiot hash algorithm, it is completely trivial for someone to find a combination of letters that give “4” as the output.
Instead of a useless hash, you might use some that are considered “good”. The SHA family of hashes are generally considered the gold standard of hashes. Bitcoin, for example, uses SHA-256 to provide Proof of Work.
So, let’s say we pick SHA-1, and we store the output of that. In that case we’d get
SHA1( "23456" ) = c24d0a1968e339c3786751ab16411c2c24ce8a2e
But what if two users have the same password? Consider these two rows in the db
Username Password-Hash Morten c24d0a1968e339c3786751ab16411c2c24ce8a2e John c24d0a1968e339c3786751ab16411c2c24ce8a2e
If I know the password of “Morten”, I also now know the password of “John”.
What you do, instead, is to add a “salt”. The salt is just a random sequence of strings that I append to the password before computing the hash. For example, with salts, we’d get
Username Password-Hash Salt Morten ac5740fde13da84ba4a5266ce9e9b7d697e0622b xyz John c96bbd1bed0ffa3f5a0098ce7ee568ce6e9d496c abc
Now it’s not apparent that Morten and John both use “23456” as passwords, and we’re almost done….
Brute Force Attacks
In my bad hashing algorithm, it’s pretty clear that you can’t guess my password from knowing the hash value “4”. On one hand it means that other users, might easily get access to an account on the site, but on the other hand, it also means that they can’t guess the original password.
This means that even if they could get access to an account on the site using the (false) password “65432”, they probably wouldn’t be able to access the users gmail account, because it uses a different security model, and in that case the “65432” just wouldn’t work.
However, if the site uses SHA1, the chance of collision is quite low, which means that if I can find an input string that gives the same SHA1 there’s a very good chance that I found the actual original password, and if the user reused the password on other sites, I might be able to gain access to those too.
Brute force attacks don’t actually try every conceivable input, instead it uses something known as “rainbow tables”. These are tables of the most common passwords, As GPU’s improve, the number of SHA1 computations we can make in a given time increases. For example, a new NVidia GTX 1080 TI will do 11374.1 MH/s
In other words, SHA1 salted passwords are not safe, and they are certainly not safe if they occur in a rainbow table somewhere, and there is plenty of software available to accomplish this, hashcat being one.
What To Do?
If you’re reading this, you probably aren’t using either “password” or “123456” as passwords (or are you John?). Furthermore, if you are using “123456”, then you probably are also using the same password for multiple logins, and chances are your other accounts have already been breached.
Two-factor authentication goes a long way to remedy the situation; if the user logs in from a new IP (meaning outside a 255.255.255.0 subnet of known verified ips), then the 2nd stage kicks in, and you need to authenticate over email or SMS. However, this doesn’t work too well if the user uses “123456” all over the place, because the attacker might already have access to your email account (especially if the email account is your username on the site).
Enforcing “good” passwords on the site is also an improvement. You don’t have to go nuts on the requirements that it has to have special chars, upper and lower case etc. The site could actually generate the password for you. This way it would be unique, and hard to guess, if the site was breached, only the site would be affected, and if the site is just a TMZ style rag, the damage would not be too bad.