Programmer's Ranch: hashing

Hello again! :)

We're celebrating! :D Today, Programmer's Ranch turned one year old, and although I've turned most of my attention to an interesting spare-time project for the time being, I wanted to mark this occasion with a new article. And some cake.

Right, and today's article is about hashing. We've seen in "C# Security: Securing Passwords by Salting and Hashing" that a hash function transforms an input string into a totally different piece of data (a hash):

If you make even a slight change to the input, such as changing the first character from uppercase to lowercase, you get a totally different output:

Also, if you use a decent hash function (i.e. not MD5), it is normally not possible to get the input string from the hash.

In today's article, we're going to use hashes for something much simpler than securing passwords. We're going to hash the content of files, and then use that hash to check whether the file changed. Since I haven't been very impressed with SharpDevelop 5 Beta, I'm going to ditch it and use Visual Studio 2013 instead. You can use whatever you like - SharpDevelop, Visual Studio Express for Desktop, or maybe even MonoDevelop.

Create a new Console Application, and add the following at the top:

using System.Security.Cryptography;

This will allow you to use a variety of hash functions, which all derive from the HashAlgorithm class.

We'll also need a little helper function to convert our hashes from a byte array to a string, so that they may be displayed in hex in the command line. We'll use the following, which is a modified version of the Hash() method from "C# Security: Securing Passwords by Salting and Hashing":

public static string ToHexString(byte[] bytes)

{

StringBuilder sb = new StringBuilder();

foreach (byte b in bytes)

sb.Append(b.ToString("x2").ToLower());

return sb.ToString();

}

Now, let's create a text file in the same folder as our .sln file and name it "test.txt", and put the following lyrics from the Eagles' "Hotel California" in it:

So I called up the Captain,

"Please bring me my wine"

He said, "We haven't had that spirit here since nineteen sixty nine"

And still those voices are calling from far away,

Wake you up in the middle of the night

Just to hear them say...

Let's read that file into memory. First, we need to add the following:

using System.IO;

We can now read the contents of the file into a string:

string fileContents = File.ReadAllText(@"../../../test.txt");

...and quite easily compute the hash of those contents:

using (HashAlgorithm hashAlgorithm = SHA256.Create())

{

byte[] plainText = Encoding.UTF8.GetBytes(fileContents);

byte[] hash = hashAlgorithm.ComputeHash(plainText);

Console.WriteLine(ToHexString(hash));

}

Console.ReadLine();

Note that I'm using SHA256 as the hash function this time - it's a lot more robust than MD5. If you check the documentation for the HashAlgorithm class, you can find a bunch of different hash algorithms you can use. As it is, we get the following output:

Now, let's see what happens if your little toddler manages to climb onto your keyboard and modify the file. Let's remove the first character in the file (the initial "S") - that might be within a toddler's ability - and save the file. When we rerun the program, the output is quite different:

And here we have already seen how hashing gives us a mean to verify a file's integrity, or in other words, check whether it has been tampered with. In fact, popular Linux distributions such as Ubuntu distribute MD5 hashes for the files they release, so that the people who can download them can check that they are really downloading the file they wanted, and not some weird video of goats yelling like humans:

So let's actually see this in action. After downloading an Ubuntu distribution, let's change the filename to that of the Ubuntu file we downloaded, and the hash algorithm to MD5:

string fileContents = File.ReadAllText(@"../../../../ubuntu-14.04-desktop-amd64.iso");

using (HashAlgorithm hashAlgorithm = MD5.Create())

Now, let's try to compute a hash of the Ubuntu file:

Oops! We tried to read a ~1GB file into memory, and that's a pretty stupid thing to do. Unless you've got a pretty awesome computer, you'll see the memory usage spike until you get an OutOfMemoryException, as above. And even if you do have a pretty awesome computer, you shouldn't load an entire massive file just to perform an operation on its contents.

In one of my first articles here, "C#: Working with Streams", I explained how you could read a file bit by bit (e.g. line by line) and work on those parts without having to have the entire file in memory at any one time. And quite conveniently, the hash algorithms have a variant of the ComputeHash() method that takes a stream as a parameter.

So let's change our code as follows:

static void Main(string[] args)

{

using (FileStream fs = File.OpenRead(@"../../../../ubuntu-14.04-desktop-amd64.iso"))

using (HashAlgorithm hashAlgorithm = MD5.Create())

{

byte[] hash = hashAlgorithm.ComputeHash(fs);

Console.WriteLine(ToHexString(hash));

}

Console.ReadLine();

}

And let's run it:

There are a few things to note from the output:

It computes pretty quickly, despite the fact that it's going through a ~1GB file.
Memory levels remain at a pretty decent level (in fact the memory used by the program is negligible).
The output matches the first hash in the list of hashes on the Ubuntu webpage (in the background of the above screenshot).

Wonderful! :) In this first anniversary article, we revisited the concept of hashing, and learned the following:

There are several different hash algorithms provided by .NET that you can use, including MD5, SHA256, and others.
A hash gives you a way to verify whether a file has been tampered with.
Streaming provides the ability to process large files quickly and with very little memory overhead.

Thank you so much for reading, and please check back for more interesting articles here at Programmer's Ranch! :)

Hello and welcome, dear readers! :)

This article deals with storing passwords securely... usually in a database, but to keep things simple, we'll just use a C# dictionary instead. As part of this article, we'll cover two interesting techniques called salting and hashing. These topics can sometimes be challenging to understand - in fact you can see from my question about salting on StackOverflow that it had taken me a while to understand the benefits of salting, but it doesn't have to be that way. I am writing this article to hopefully make this fascinating subject easy to understand.

Right, so let's get to business. Create a new Console Application using SharpDevelop or whichever IDE you prefer. Add the following near the top, so that we can use dictionaries:

using System.Collections.Generic;

Just inside your class Program, before your Main() method, add the following dictionary to store our users and their corresponding passwords (see "C# Basics: Morse Code Converter Using Dictionaries" if this seems in any way new to you):

public static Dictionary<String, String> users = new Dictionary<String, String>()
        {
            { "johnny", "password" },
            { "mary", "flowers" },
            { "chuck", "roundhousekick" },
            { "larry", "password123" }
        };

It is now pretty simple to add a method that can check whether a given username and password result in a successful login:

public static bool Login(String username, String password)
        {
            if (users.ContainsKey(username) && users[username] == password)
                return true;
            else
                return false;
        }

This code first checks that the username actually exists in the dictionary, and then checks whether the corresponding password matches.

We can now test this code by replacing the contents of Main() with the following code:

public static void Main(string[] args)
        {
            Console.Write("Username: ");
            String username = Console.ReadLine();

            Console.Write("Password: ");
            Console.ForegroundColor = ConsoleColor.Black;
            String password = Console.ReadLine();
            Console.ResetColor();

            bool loggedIn = Login(username, password);
            if (loggedIn)
                Console.WriteLine("You have successfully logged in!");
            else
                Console.WriteLine("Bugger off!");

            Console.ReadLine();
        }

Notice that when requesting the password, we're setting the console's text colour to black. The console's background colour is also black, so the password won't show as you type, fending off people trying to spy it while looking over your shoulder.

Press F5 to try it out:

Awesome - we have just written a very simple login system.

The problem with this system is that the passwords are stored as clear text. If we imagine for a moment that our usernames and passwords were stored in a database, then the actual passwords can easily be obtained by a hacker gaining illegal access to the database, or any administrator with access to the database. We can see this by writing a simple method that shows the users' data, simulating what a hacker would see if he managed to breach the database:

public static void Hack()
        {
            foreach (String username in users.Keys)
                Console.WriteLine("{0}: {1}", username, users[username]);
        }

We can then add the following code just before the final Console.ReadLine() in Main() to test it out:

Console.WriteLine();
            Hack();

This gives us all the details, as we are expecting:

This isn't a nice thing to have - anyone who can somehow gain access to the database can see the passwords. How can we make this better?

Hashing

One way is to hash the passwords. A hash function is something that takes a piece of text and transforms it into another piece of text:

A hash function is one-way in the sense that you can use it to transform "Hello" to "8b1a9953c4611296a827abf8c47804d7", but not the other way around. So if someone gets his hands on the hash of a password, it doesn't mean that he has the password.

Another property of hash functions is that their output changes considerably even with a very small change in the input. Take a look at the following, for instance:

You can see how "8b1a9953c4611296a827abf8c47804d7" is very different from "5d41402abc4b2a76b9719d911017c592". The hashes bear no relationship with each other, even though the passwords are almost identical. This means that a hacker won't be able to notice patterns in the hashes that might allow him to guess one password based on another.

One popular hashing algorithm (though not the most secure) is MD5, which was used to produce the examples above. You can find online tools (such as this one) that allow you to compute an MD5 hash for any string you want.

In order to use MD5 in our code, we'll need to add the following statement near the top of our program code:

using System.Security.Cryptography;

At the beginning of the Program class, we can now create an instance of the MD5 class to use whenever we need:

private static MD5 hashFunction = MD5.Create();

If you look at the intellisense for MD5, you'll see that it has a ComputeHash() method, which returns an array of byte, rather than a String:

We're going to do some String work, so add the following near the top:

using System.Text;

Let's write a little helper method to hash our passwords, using Strings for both input and output:

public static String Hash(String input)
        {
            // code goes here
        }

In this method, the first thing we need to do is convert the input String to a byte array, so that ComputeHash() can work with it. This is done using the System.Text.Encoding class, which provides several useful members for converting between Strings and bytes. In our case we can work with the ASCII encoding as follows:

  byte[] inputBytes = Encoding.ASCII.GetBytes(input);

We can then compute the hash itself:

byte[] hashBytes = hashFunction.ComputeHash(inputBytes);

Since we don't like working with raw bytes, we then convert it to a hexadecimal string:

StringBuilder sb = new StringBuilder();
foreach(byte b in hashBytes)
                sb.Append(b.ToString("x2").ToLower());

The "x2" bit converts each byte into two hexadecimal characters. If you think about it for a moment, hexadecimal digits are from 0 to f (representing 0-15 in decimal), which fit into four bits. But each byte is eight bits, so each byte is made up of two hex digits.

Anyway, after that, all we need to do is return the string, so here's the entire code for the method:

public static String Hash(String input)
        {
  byte[] inputBytes = Encoding.ASCII.GetBytes(input);
            byte[] hashBytes = hashFunction.ComputeHash(inputBytes);

            StringBuilder sb = new StringBuilder();
            foreach(byte b in hashBytes)
                sb.Append(b.ToString("x2").ToLower());

            return sb.ToString();
        }

We can now change our database to use hashed passwords:

public static Dictionary<String, String> users = new Dictionary<String, String>()
        {
            { "johnny", Hash("password") },
            { "mary", Hash("flowers") },
            { "chuck", Hash("roundhousekick") },
            { "larry", Hash("password123") }
        };

In this way, we aren't storing the passwords themselves, but their hashes. For example, we're storing "5f4dcc3b5aa765d61d8327deb882cf99" instead of "password". That means we don't store the password itself any more (if you ever signed up to an internet forum or something, and it told you that your password can be reset but not recovered, you now know why). However, we can hash any input password and compare the hashes.

In our Login() method, we now change the line that checks username and password as follows:

if (users.ContainsKey(username) && users[username] == Hash(password))

Let's try this out (F5):

When the user types "johnny" as the username and "password" as the password, the password is hashed, giving us "5f4dcc3b5aa765d61d8327deb882cf99". Since the passwords were also stored as hashes in our database, it matches. In reality our login is doing the same thing as it was doing before - just that we added a hash step (a) when storing our passwords and (b) when receiving a password as input. Ultimately the password in our database and that entered by the user both end up being hashes, and will match if the actual password was the same.

How does this help us? As you can see from the hack output (last four lines in the screenshot above), someone who manages to breach the database cannot see the passwords; he can only get to the hashes. He can't login using a hash, since that will in turn be hashed, producing a completely different value that won't match the hash in the database.

Although hashing won't make the system 100% secure, it's sure to give any potential hacker a hard time.

Salting

You may have noticed that in the example I used, I had some pretty dumb passwords, such as "password" and "password123". Using a dictionary word such as "flowers" is also not a very good idea. Someone may be able to gain access to one of the accounts by attempting several common passwords such as "password". These attempts can be automated by simple programs, allowing hackers to attempt entire dictionaries of words as passwords in a relatively short period of time.

Likewise, if you know the hash for common passwords (e.g. "5f4dcc3b5aa765d61d8327deb882cf99" is the hash for "password"), it becomes easy to recognise such passwords when you see the expected hash. Hackers can generate dictionaries of hashes for common passwords, known as rainbow tables, and find hashes for common words used as passwords.

We can combat such attacks by a process known as salting. When we compute our hashes, we add some string that we invent. This means changing the first line of our Hash() function as follows:

byte[] inputBytes = Encoding.ASCII.GetBytes("chuck" + input);

Both the database password and the one entered by the user will be a hash of "chuck" concatenated with the password itself. When the user tries to login, it will still work, but look at what happens now:

The login worked, but the hashes have changed because of the salt! This means that even for a password as common as "password", a hacker cannot identify it from the hash, making rainbow tables much less effective.

Summary

This article described how to store passwords securely. It started off by doing the easiest and worst thing you can do: store them as clear text. A hash function was subsequently introduced, to transform the passwords into text from which the password cannot be retrieved. When a user logs in, the hash of the password he enters is compared with the password hash stored in the database.

Finally, the hashes were salted, by adding an arbitrary piece of text to them, in order to transform the hashes into different values that can't be used to identify common passwords.

I hope this made password security a little easier to understand. Please come back again, and support us by sharing this article with your friends, buying games from GOG.com, or any of the other ways described in the "Support the Ranch" page.

Programmer's Ranch

Gigi Labs

Friday, May 2, 2014

C# Security: Computing File Hashes

Monday, November 11, 2013

C# Security: Securing Passwords by Salting and Hashing

Hashing

Salting

Summary