In our Cyber Security Consulting Practice we get deep into the authentication weeds, helping clients ensure their authentication systems are up to date with the latest best practices. So it was with interest that we saw this recent post by Ian Maddox on that very subject via the Google Cloud Platform Blog.

One of the key concepts the article covers is password hashing and (after some edits reflecting comments made by readers on the original text) it offers sound advice on factors such as the choice of hash function. However, the author has assumed, perhaps not unreasonably, that the reader already has a good grasp of why we hash passwords and why the computational cost of the hash function (both of computing it and of inverting it) is important. This led to a discussion in the office as to where developers who weren’t already knowledgeable could look to acquire that kind of background understanding.

As is often the case, the best material is to be found from the time when the subject itself was first introduced. Authors fresh from making discoveries or inventing new techniques, writing for an audience eager to hear about the new thing, tend to do a better job of explaining it. It turns out that the idea of hashing passwords for storage goes back (at least) to Unix Version 7 – one of the first Operating Systems to implement hashing – and a paper written way back in 1978 by key Unix developers Robert Morris and Ken Thompson still provides a great introduction to the subject.

As it dates from the days before even Postscript existed, the paper is a little hard to access on today’s Internet. You can however read it in the original Troff formatting in this large PDF containing the second half of Volume 2 of the Unix V7 Manual (look for “pdf page” 244), or as a separate article in this version published a little later in the CACM.

Re-reading their paper in light of today’s password storage landscape, we can see that the need for “salting” was understood and implemented in Unix V7. The authors were also concerned, as we are today, that the computational cost of the hash function should be sufficiently high to deter dictionary attacks. Interestingly, they were worried that off-the-shelf DES encryption chips could be used to build a massively parallel “hash cracker” machine, and so deliberately changed their hash function such that it wasn’t compatible with the known DES chips. An attacker would therefore need to fabricate a custom chip. One wonders what they would make of today’s Bitcoin mining hardware! (For the uninitiated, mining Bitcoin requires fast evaluation of hash functions.)
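To make the two ideas from the Morris and Thompson paper concrete, here is a minimal sketch of salting plus a deliberately expensive hash. It does not reproduce the Unix V7 scheme: it swaps in PBKDF2 from Python's standard library as a modern stand-in, and the function names, the 16-byte salt and the iteration count are all assumptions chosen for illustration rather than recommendations from either article.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for illustration only; tune it for your own hardware.
DEFAULT_ITERATIONS = 200_000


def hash_password(
    password: str,
    salt: Optional[bytes] = None,
    iterations: int = DEFAULT_ITERATIONS,
) -> Tuple[bytes, bytes, int]:
    """Return (salt, digest, iterations) for storage alongside the account."""
    if salt is None:
        # A fresh random salt per password means identical passwords
        # produce different stored digests.
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest, iterations


def verify_password(
    password: str, salt: bytes, digest: bytes, iterations: int
) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, digest)
```

The salt stops an attacker from hashing a dictionary once and matching it against every stored entry, while the iteration count makes each individual guess more expensive for the attacker than it is inconvenient for a single legitimate login.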

Comments on Hacker News brought another, somewhat “competing” article by former Googler Mike Hearn to light. His article is intriguing because it describes the thinking behind the techniques implemented by Google themselves relating to authentication and user identity. That thinking in many ways diverges from current industry practice and the reasons for that divergence are certainly thought-provoking. For example, he suggests considering not using passwords at all, on the basis that if a password depends on the user’s email for its security (through password reset and recovery) then why not just email a one-time service access token? He says not to use “secret questions” because they’re, well, not so secret after all. Top tip: 20% of users report “Pizza” as their favorite food. He also doesn’t like CAPTCHAs (because solving them is too easy for bots), although the “find the road signs” puzzle Google makes us solve in lieu feels a little too difficult for some humans. There are many more interesting, counter-fashionable but well-reasoned recommendations in the article: well worth a read.
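The "email a one-time access token instead of asking for a password" idea can be sketched roughly as follows. This is not Mike Hearn's or Google's implementation: the `LoginTokenStore` class, the 15-minute lifetime and the in-memory dictionary are hypothetical choices made for illustration, and actually sending the email is left out.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

# Assumed lifetime for an emailed token; purely illustrative.
TOKEN_TTL_SECONDS = 15 * 60


class LoginTokenStore:
    """Hypothetical in-memory store of pending one-time login tokens."""

    def __init__(self) -> None:
        # Maps sha256(token) -> (email, expiry time). Only the hash is kept,
        # so a leaked store does not reveal usable tokens.
        self._pending: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def _key(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, email: str, now: Optional[float] = None) -> str:
        """Create a token to be emailed to the user (sending is out of scope)."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[self._key(token)] = (email, now + TOKEN_TTL_SECONDS)
        return token

    def redeem(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Return the email for a valid token, or None. Tokens are single-use."""
        now = time.time() if now is None else now
        # pop() removes the entry, so a second redemption always fails.
        entry = self._pending.pop(self._key(token), None)
        if entry is None:
            return None
        email, expires_at = entry
        return email if now <= expires_at else None
```

In this sketch the security of the login rests entirely on control of the mailbox, which is the same dependency a password reset flow already has.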

Turning back to Ian Maddox’s article, he makes the useful observation that “identity” and “account” should not be strongly interconnected within an authentication system. The following example illustrates the problem this addresses. Consider the case where a user creates an account using Facebook Login. Then at a later date the user signs up “native” with their email and password. It should be simple and easy to associate this new set of credentials with the existing account so that the user can now use either Facebook Login or email/password authentication, without needing to migrate or re-create their account data.

Ian also includes an interesting discussion on session time-to-live. It is surprising how many sites and services never expire authentication tokens, creating in effect an infinitely long-lived session. One reason he cites for this is that handling session expiry gracefully (e.g. saving current session state) can be tricky. Taking the contrary position in this debate, however, Mike Hearn says that sessions should never expire, on the basis that attackers typically attack sooner rather than later, and well…because it’s hard.
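One way to picture the separation of “identity” and “account” described in Ian Maddox’s article is a small sketch in which several login identities point at a single account record. The `AccountDirectory` class, its method names and the `(provider, subject)` key are hypothetical, invented here for illustration; neither article prescribes this structure.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Account:
    """The user's data; deliberately knows nothing about how they log in."""
    account_id: int
    display_name: str


class AccountDirectory:
    """Hypothetical in-memory mapping from login identities to accounts."""

    def __init__(self) -> None:
        self._accounts: Dict[int, Account] = {}
        # (provider, subject) -> account_id, e.g. ("facebook", "fb-123")
        # or ("password", "alice@example.com").
        self._identities: Dict[Tuple[str, str], int] = {}
        self._next_id = itertools.count(1)

    def create_account(
        self, display_name: str, provider: str, subject: str
    ) -> Account:
        account = Account(next(self._next_id), display_name)
        self._accounts[account.account_id] = account
        self.link_identity(account.account_id, provider, subject)
        return account

    def link_identity(self, account_id: int, provider: str, subject: str) -> None:
        """Attach another identity to an existing account.

        Assumption: the caller has already verified that the user controls
        both the existing account and the new identity.
        """
        if account_id not in self._accounts:
            raise KeyError(f"unknown account {account_id}")
        owner = self._identities.get((provider, subject))
        if owner is not None and owner != account_id:
            raise ValueError("identity already linked to another account")
        self._identities[(provider, subject)] = account_id

    def find_account(self, provider: str, subject: str) -> Optional[Account]:
        account_id = self._identities.get((provider, subject))
        return None if account_id is None else self._accounts[account_id]
```

With this shape, the user from the example signs up through Facebook Login, later links an email/password identity, and both routes resolve to the same `Account` without any account data being migrated.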

If you’re involved in the design and implementation of authentication systems these two articles are well worth reading; and if you’re not, they’re still worth a look to get some idea of what goes on when you type your email and password and hit “login”.