Which cryptographic or other algorithm should we use for password storage? This is a highly controversial question with different opinions from the field of cryptography and application security. Let’s start with a few basic requirements. Essentially, we want to be able to, incredibly quickly, verify if a users password is correct. However, even when attackers somehow steal our database or database data, we do not want them to be able to reverse the password, even when supercomputing capabilities are available, within a reasonable amount of time. The best password storage strategies prevent even the website administrator from knowing their users passwords. The best (and to my knowledge only) threat model created on Password Storage is from John Steven. We challenge all other cryptographers or application security professionals who disagree with Johns assertions to publish a competing threat model, or to critique his work in some scientific and publishable way.
That aside, the whole purpose of strong password storage is to prevent attackers from discovering your users passwords even when your database data and password ciphertext or hash is stolen. Once your password data is stolen, attackers can, using high performance computing, attempt to discover your users passwords “offline” using dictionary attacks, brute force attacks, and rainbow table attacks (ie: pre computed databases of hashes).
A brute force attack against a password storage system occurs when an attacker walks through a list of every possible password, creates the ciphertext/hash for each potential password, and compares that ciphertext/hash to the passwords stored in your database. Although this seems like an inefficient attack (attempting to walk through every password 10 characters or under considering upper case, lower case, numbers and basic non alphanumeric characters would take approximately 90 ^ 10 attempts or 3.4867844e+19 attempts to cover the entire password space), attackers have access to both large lists of known passwords in use today, specialized password cracking algorithms, and high performance GPU password cracking rigs that can attempt many billions of password attempts per second against your password storage mechanism with modest resources.
A dictionary attack against password storage occurs when:
-
An attacker has stolen your stored password ciphertext or hash
-
The attacker understands your password storage algorithm
-
The attacker attempts to discover users passwords by using a large list of commonly used password and dictionary words, generating the ciphertext/hash for that password, and then attempts to compare that ciphertext/hash with your stored ciphertext/hash
-
Once the attacker finds a match, they have discovered the users password for that match.
Attackers also use large pre-computed lists of hashes called rainbow tables. In 2012, LinkedIn was compromised and had their password data stolen. At the time LinkedIn used the md5 algorithm for password storage, a dangerously bad idea. Hackers were able to reverse over 90% of LinkedIn users passwords within the first day after the compromise.
So what kind of password storage is resistant to an attack of this nature? Well first of all, if you do not enforce a strong password policy, any password storage mechanism is useless. We discuss good password storage mechanism above.
The first go-to algorithm for password storage is the PBKDF2 NIST Standard KDF (Key Derivation Function). This function is both one-way and can be configured to process slowly — on purpose. Why would we want to use a slow algorithm for password storage? GPU cracking rigs and other supercomputing resources can attempt many billions of hashes per second. When using a purposely slow algorithm with proper configuration settings (ie: work factor or iteration count), those billions of attempts per seconds becomes mere thousands of attempts per second on the same hardware. What used to take days or weeks to crack, now will takes months, years or more to crack on the same hardware. No password storage mechanism is perfect. The best we can hope for is to buy time so we can recover from the data loss incident. Slow (adaptive) algorithms help us in this endeavor.
Another option for password storage is the scrypt adaptive algorithm. Not only is scrypt slow but it can also be memory hardened (ie: scrypt can be configured to consume a certain amount of memory when executed, limiting attacker hardware from parallelizing their attempt to crack your password).
Several Security libraries use loops of hashing algorithms for password storage, something we highly discourage. Our recommendations of PBKDF2 and scrypt are formal KDF’s — Key Derivation Functions — and are made to take a password and (slowly) churn out a key. THESE are the best ways to store a password since they are standard algorithms. Looping through fast hashes is — old school — and non standard cryptography. We can do better.
Even though PBKDF2 and scrypt are good initial choices for password storage for most types of websites, when was the last time your were told to slow down your code? These algorithms can require an extensive amount of hardware when attempting to support a large number of concurrent logins. In fact, the Django open source python content management system suffered from a denial of service vulnerability because it supported unlimited length passwords and the PBKDF2 algorithm for password support! Do not make the same mistake. Although you should support long passwords, and PBKDF2 is a reasonable first-pick for password storage, do not support too long passwords.
Call-out: Be sure to add unique per-user salt of at least 32 characters in length to your password before storing it. Also be sure to make the iteration account as large as possible without harming your users experience. A good rule of thumb is to set your algorithms iteration count or work factor so that password storage computation takes ½ second on your production server and to increase that value over time.
So what do you do when you need to handle scale and purposely resource-intensive algorithms? You can certainly throw a lot of hardware at the problem and call it a day. But what if you want to have secure password storage, and be able to handle a large number of concurrent logins with efficient use of hardware? That is where a HMAC comes in, the keyed-hash message authentication code. When proper design is implemented, HMAC’s can accomplish secure password storage at massive scale with efficient use of hardware. Most companies that need massive scale (big banks, etc.) already have a good key management infrastructure in place. An HMAC is actually a simple algorithm. It essentially loops through a hash twice and inserts a key in the second step. The strength of an HMAC is governed by the size of the key and the ability to keep the key in secret. HMAC’s are essentially as fast as a hashing algorithm and will not be reversible if the key is managed properly. However, if the key is stolen or lost, the whole system is broken (since the problem is reduced to a hash). When building a password storage system in your database, the KDF’s are a good first choice. However, we have seen large companies that use scrypt for their password storage system consume 10% of their server farm resources just for password processing during peak load. Yea, crazy. So by moving them to an HMAC solution — possibly even making custom hardware for their servers — we radically save them money on hardware utilization while still providing a secure password storage system. Yeah, baby!
HMAC engineering is a LOT more difficult to engineer for password storage. You need to isolate the processing of the password to truly protect the key. Consider a webservice that takes a string and spits out the HMAC hash without ever exposing the key to your application code. This is isolated so that even if your database, filesystem and web server are all compromised — your HMAC magic webservice and therefore your password storage mechanism is NOT compromised. Even if your entire database is compromised, at least the other websites that your user uses (i.e.: password reuse) is not also compromised.
Regardless of which algorithm you choose for password storage, be sure to add a salt to your password before protecting it. If two users have the same password, then their password storage ciphertext will be the same. When users first register for an account on your website, or when you first create a users account, create a 32 character random value and add it to your user table. Then, when you protect your users password, be sure to add the salt before the password before you protect it with PBKDF2, scrypt or your specialized HMAC setup.
We especially want to thank John Steven for his incredible work and research on password storage. For more information, please check out the Password Storage Cheat Sheet by John Steven.