Gigi Labs

Please follow Gigi Labs for the latest articles.

Friday, May 2, 2014

C# Security: Computing File Hashes

Hello again! :)

We're celebrating! :D Today, Programmer's Ranch turned one year old, and although I've turned most of my attention to an interesting spare-time project for the time being, I wanted to mark this occasion with a new article. And some cake.



Right, and today's article is about hashing. We've seen in "C# Security: Securing Passwords by Salting and Hashing" that a hash function transforms an input string into a totally different piece of data (a hash):


If you make even a slight change to the input, such as changing the first character from uppercase to lowercase, you get a totally different output:


Also, if you use a decent hash function (i.e. not MD5), it is normally not possible to get the input string from the hash.

In today's article, we're going to use hashes for something much simpler than securing passwords. We're going to hash the content of files, and then use that hash to check whether the file changed. Since I haven't been very impressed with SharpDevelop 5 Beta, I'm going to ditch it and use Visual Studio 2013 instead. You can use whatever you like - SharpDevelop, Visual Studio Express for Desktop, or maybe even MonoDevelop.

Create a new Console Application, and add the following at the top:

using System.Security.Cryptography;

This will allow you to use a variety of hash functions, which all derive from the HashAlgorithm class.

We'll also need a little helper function to convert our hashes from a byte array to a string, so that they may be displayed in hex in the command line. We'll use the following, which is a modified version of the Hash() method from "C# Security: Securing Passwords by Salting and Hashing":

        public static string ToHexString(byte[] bytes)
        {
            StringBuilder sb = new StringBuilder();
            foreach (byte b in bytes)
            sb.Append(b.ToString("x2").ToLower());

            return sb.ToString();
        }

Now, let's create a text file in the same folder as our .sln file and name it "test.txt", and put the following lyrics from the Eagles' "Hotel California" in it:

So I called up the Captain,
"Please bring me my wine"
He said, "We haven't had that spirit here since nineteen sixty nine"
And still those voices are calling from far away,
Wake you up in the middle of the night
Just to hear them say...

Let's read that file into memory. First, we need to add the following:

using System.IO;

We can now read the contents of the file into a string:

            string fileContents = File.ReadAllText(@"../../../test.txt");

...and quite easily compute the hash of those contents:

            using (HashAlgorithm hashAlgorithm = SHA256.Create())
            {
                byte[] plainText = Encoding.UTF8.GetBytes(fileContents);
                byte[] hash = hashAlgorithm.ComputeHash(plainText);
                Console.WriteLine(ToHexString(hash));
            }

            Console.ReadLine();

Note that I'm using SHA256 as the hash function this time - it's a lot more robust than MD5. If you check the documentation for the HashAlgorithm class, you can find a bunch of different hash algorithms you can use. As it is, we get the following output:


Now, let's see what happens if your little toddler manages to climb onto your keyboard and modify the file. Let's remove the first character in the file (the initial "S") - that might be within a toddler's ability - and save the file. When we rerun the program, the output is quite different:


And here we have already seen how hashing gives us a mean to verify a file's integrity, or in other words, check whether it has been tampered with. In fact, popular Linux distributions such as Ubuntu distribute MD5 hashes for the files they release, so that the people who can download them can check that they are really downloading the file they wanted, and not some weird video of goats yelling like humans:


So let's actually see this in action. After downloading an Ubuntu distribution, let's change the filename to that of the Ubuntu file we downloaded, and the hash algorithm to MD5:

            string fileContents = File.ReadAllText(@"../../../../ubuntu-14.04-desktop-amd64.iso");

            using (HashAlgorithm hashAlgorithm = MD5.Create())

Now, let's try to compute a hash of the Ubuntu file:


Oops! We tried to read a ~1GB file into memory, and that's a pretty stupid thing to do. Unless you've got a pretty awesome computer, you'll see the memory usage spike until you get an OutOfMemoryException, as above. And even if you do have a pretty awesome computer, you shouldn't load an entire massive file just to perform an operation on its contents.

In one of my first articles here, "C#: Working with Streams", I explained how you could read a file bit by bit (e.g. line by line) and work on those parts without having to have the entire file in memory at any one time. And quite conveniently, the hash algorithms have a variant of the ComputeHash() method that takes a stream as a parameter.

So let's change our code as follows:

        static void Main(string[] args)
        {
            using (FileStream fs = File.OpenRead(@"../../../../ubuntu-14.04-desktop-amd64.iso"))
            using (HashAlgorithm hashAlgorithm = MD5.Create())
            {
                byte[] hash = hashAlgorithm.ComputeHash(fs);
                Console.WriteLine(ToHexString(hash));
            }

Console.ReadLine();
        }

And let's run it:


There are a few things to note from the output:
  • It computes pretty quickly, despite the fact that it's going through a ~1GB file.
  • Memory levels remain at a pretty decent level (in fact the memory used by the program is negligible).
  • The output matches the first hash in the list of hashes on the Ubuntu webpage (in the background of the above screenshot).
Wonderful! :) In this first anniversary article, we revisited the concept of hashing, and learned the following:
  • There are several different hash algorithms provided by .NET that you can use, including MD5, SHA256, and others.
  • A hash gives you a way to verify whether a file has been tampered with.
  • Streaming provides the ability to process large files quickly and with very little memory overhead.

Thank you so much for reading, and please check back for more interesting articles here at Programmer's Ranch! :)