Gigi Labs


Friday, May 2, 2014

C# Security: Computing File Hashes

Hello again! :)

We're celebrating! :D Today, Programmer's Ranch turned one year old, and although I've turned most of my attention to an interesting spare-time project for the time being, I wanted to mark this occasion with a new article. And some cake.



Right, and today's article is about hashing. We've seen in "C# Security: Securing Passwords by Salting and Hashing" that a hash function transforms an input string into a totally different piece of data (a hash):


If you make even a slight change to the input, such as changing the first character from uppercase to lowercase, you get a totally different output:


Also, if you use a decent hash function (i.e. not MD5), it is normally not possible to get the input string from the hash.
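Since the point about small input changes is easier to believe when you try it, here is a minimal sketch that hashes a string with SHA256 and returns the result as hex. The class name HashDemo and the sample inputs are just illustrative choices, not something from the earlier article.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashDemo
{
    // Returns the SHA-256 hash of a string (encoded as UTF-8) as lowercase hex.
    public static string Sha256Hex(string input)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(input));

            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("x2"));

            return sb.ToString();
        }
    }
}
```

If you print HashDemo.Sha256Hex("Hello") and HashDemo.Sha256Hex("hello") from Main(), you should see two 64-character hex strings that have nothing obvious in common, even though the inputs differ by a single letter's case.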

In today's article, we're going to use hashes for something much simpler than securing passwords. We're going to hash the content of files, and then use that hash to check whether the file changed. Since I haven't been very impressed with SharpDevelop 5 Beta, I'm going to ditch it and use Visual Studio 2013 instead. You can use whatever you like - SharpDevelop, Visual Studio Express for Desktop, or maybe even MonoDevelop.

Create a new Console Application, and add the following at the top:

using System.Security.Cryptography;

This will allow you to use a variety of hash functions, which all derive from the HashAlgorithm class.

We'll also need a little helper function to convert our hashes from a byte array to a string, so that they may be displayed in hex in the command line. We'll use the following, which is a modified version of the Hash() method from "C# Security: Securing Passwords by Salting and Hashing":

        // Requires "using System.Text;" at the top (for StringBuilder)
        public static string ToHexString(byte[] bytes)
        {
            StringBuilder sb = new StringBuilder();
            foreach (byte b in bytes)
                sb.Append(b.ToString("x2")); // "x2" already produces lowercase hex

            return sb.ToString();
        }

Now, let's create a text file in the same folder as our .sln file and name it "test.txt", and put the following lyrics from the Eagles' "Hotel California" in it:

So I called up the Captain,
"Please bring me my wine"
He said, "We haven't had that spirit here since nineteen sixty nine"
And still those voices are calling from far away,
Wake you up in the middle of the night
Just to hear them say...

Let's read that file into memory. First, we need to add the following:

using System.IO;

We can now read the contents of the file into a string:

            string fileContents = File.ReadAllText(@"../../../test.txt");

...and quite easily compute the hash of those contents:

            using (HashAlgorithm hashAlgorithm = SHA256.Create())
            {
                byte[] plainText = Encoding.UTF8.GetBytes(fileContents);
                byte[] hash = hashAlgorithm.ComputeHash(plainText);
                Console.WriteLine(ToHexString(hash));
            }

            Console.ReadLine();

Note that I'm using SHA256 as the hash function this time - it's a lot more robust than MD5. If you check the documentation for the HashAlgorithm class, you can find a bunch of different hash algorithms you can use. As it is, we get the following output:


Now, let's see what happens if your little toddler manages to climb onto your keyboard and modify the file. Let's remove the first character in the file (the initial "S") - that might be within a toddler's ability - and save the file. When we rerun the program, the output is quite different:


And here we have already seen how hashing gives us a means to verify a file's integrity, or in other words, check whether it has been tampered with. In fact, popular Linux distributions such as Ubuntu distribute MD5 hashes for the files they release, so that people who download them can check that they are really downloading the file they wanted, and not some weird video of goats yelling like humans:


So let's actually see this in action. After downloading an Ubuntu distribution, let's change the filename to that of the Ubuntu file we downloaded, and the hash algorithm to MD5:

            string fileContents = File.ReadAllText(@"../../../../ubuntu-14.04-desktop-amd64.iso");

            using (HashAlgorithm hashAlgorithm = MD5.Create())

Now, let's try to compute a hash of the Ubuntu file:


Oops! We tried to read a ~1GB file into memory, and that's a pretty stupid thing to do. Unless you've got a pretty awesome computer, you'll see the memory usage spike until you get an OutOfMemoryException, as above. And even if you do have a pretty awesome computer, you shouldn't load an entire massive file just to perform an operation on its contents.

In one of my first articles here, "C#: Working with Streams", I explained how you could read a file bit by bit (e.g. line by line) and work on those parts without having to have the entire file in memory at any one time. And quite conveniently, the hash algorithms have a variant of the ComputeHash() method that takes a stream as a parameter.

So let's change our code as follows:

        static void Main(string[] args)
        {
            using (FileStream fs = File.OpenRead(@"../../../../ubuntu-14.04-desktop-amd64.iso"))
            using (HashAlgorithm hashAlgorithm = MD5.Create())
            {
                byte[] hash = hashAlgorithm.ComputeHash(fs);
                Console.WriteLine(ToHexString(hash));
            }

            Console.ReadLine();
        }

And let's run it:


There are a few things to note from the output:
  • It computes pretty quickly, despite the fact that it's going through a ~1GB file.
  • Memory levels remain at a pretty decent level (in fact the memory used by the program is negligible).
  • The output matches the first hash in the list of hashes on the Ubuntu webpage (in the background of the above screenshot).
Wonderful! :) In this first anniversary article, we revisited the concept of hashing, and learned the following:
  • There are several different hash algorithms provided by .NET that you can use, including MD5, SHA256, and others.
  • A hash gives you a way to verify whether a file has been tampered with.
  • Streaming provides the ability to process large files quickly and with very little memory overhead.
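
Comparing the output against a published hash by eye gets tedious, so here is a rough sketch of how the check itself could be done in code. FileVerifier and Matches() are hypothetical names I'm making up for illustration; the only framework calls involved are File.OpenRead(), HashAlgorithm.ComputeHash() and BitConverter.ToString().

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileVerifier
{
    // Streams the file through the given hash algorithm and compares the
    // result against an expected hex string, ignoring case.
    public static bool Matches(string path, HashAlgorithm algorithm, string expectedHex)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] hash = algorithm.ComputeHash(fs);

            // BitConverter.ToString() gives "AB-CD-EF...", so strip the dashes
            string actualHex = BitConverter.ToString(hash).Replace("-", "");

            return string.Equals(actualHex, expectedHex.Trim(),
                StringComparison.OrdinalIgnoreCase);
        }
    }
}
```

You would call it with something like FileVerifier.Matches(path, MD5.Create(), publishedHash), where publishedHash is the string copied from the download page (and ideally you'd wrap the MD5 instance in a using block, as in the code above).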

Thank you so much for reading, and please check back for more interesting articles here at Programmer's Ranch! :)

Saturday, November 16, 2013

C# EF: Setting Connection Strings at Runtime with Entity Framework 5.0, Database First, VS2012

Hi everyone! :)

This article deals with how to solve the problem of building and setting an Entity Framework connection string at runtime, based on a database-first approach (i.e. you have generated an Entity Data Model based on an existing database). You are expected to be familiar with ADO .NET and the Entity Framework. The first part of the article deals with setting up an Entity Data Model and simple interactions with it; this should appeal to all readers. The second part deals with the custom connection string issue, and will be helpful only to those who have actually run into that problem.

We're going to be using Visual Studio 2012, and Entity Framework 5.0. Start a new Console Application so that you can follow along.

Setting up the database


You can use whatever database you want, but in my case I'm using SQL Server Compact edition (SQLCE). If you're using something else and already have a database, you can just skip over this section.

Unlike many of the more popular databases such as SQL Server and MySQL, SQLCE is not a server and stores its data in a file with an .sdf extension. This file can be queried and updated using regular SQL, but is not designed to handle things like many concurrent users - something which isn't a problem in our case. Such file-based databases are called embedded databases.

If you have Visual Studio, then you most likely already have SQLCE installed. Look for it in "C:\Program Files (x86)\Microsoft SQL Server Compact Edition\v4.0". Under the Desktop or Private folders you'll find a file called System.Data.SqlServerCe.dll which we need to interact with SQLCE. Add a reference to it from your Console Application.

Now, we're going to create the database and a simple one-table schema. We'll use good old ADO.NET for that. We'll create the database only if it doesn't exist already. First, add the following usings at the top of Program.cs:

using System.IO;
using System.Data.SqlServerCe;

In Main(), add the following:

            String filename = "people.sdf";
            String connStr = "Data Source=" + filename;

Since SQLCE works with just a file, we can create a basic connection string using just the name of the file we're working with.

The following code actually creates the database and a single table called person.

            try
            {
                // create database if it doesn't exist already

                if (!File.Exists(filename))
                {
                    // create the actual database file

                    using (SqlCeEngine engine = new SqlCeEngine(connStr))
                    {
                        engine.CreateDatabase();
                    }

                    // create the table schema

                    using (SqlCeConnection conn = new SqlCeConnection(connStr))
                    {
                        conn.Open();

                        String sql = @"create table person(
                                       id int identity not null primary key,
                                       name nvarchar(20),
                                       surname nvarchar(30)
                                   );";

                        using (SqlCeCommand command = new SqlCeCommand())
                        {
                            command.CommandText = sql;
                            command.Connection = conn;
                            int result = command.ExecuteNonQuery();
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex);
            }

We first use SqlCeEngine to create the database file. Then we use ADO .NET to create the person table. Each row will have an auto-incrementing id (primary key), as well as a name and surname. Note that SQLCE does not support the varchar type, so we have to use nvarchar (Unicode) instead.

If you now build and run the application, you should find a people.sdf file in the bin\Debug folder. We'll use that to create an Entity Data Model for the Entity Framework.

Creating the Data Model


Right click on the project and select Add -> New Item...:


From the Data category, select ADO.NET Entity Data Model. You can choose a name for it, or just leave it as the default Model1.edmx; it doesn't really matter.


Click the Add button. This brings up the Entity Data Model Wizard.


The "Generate from database" option is the default selection, so just click Next.


Hit the New Connection... button to bring up the Connection Properties window.


If SQL Server Compact 4.0 is not already selected as the Data source, click the Change... button and select it from the Change Data Source window:


Back in the Connection Properties window, click the Browse... button and locate the people.sdf file in your bin\Debug folder that we generated in the previous section. Leave the Password field empty, and click Test Connection. If all is well, you'll get a message box saying that the test succeeded.

Once you click OK, the Entity Data Model Wizard should become populated with a connection string and a default name for the model:


When you click Next, you'll get the following message:


Just click Yes and get on with it. In the next step, import the person table into your model by ticking the checkbox next to it:


Click Finish. The files for your model are added to the project. You may also get the following warning:


You don't have to worry about it. Just click OK. If you click Cancel instead, you won't have the necessary autogenerated code that you need for this project.

Interacting with the database using the Entity Framework


After the database-creation code from the first section, and before the end of the try scope, add the following code:

                // interact with the database

                using (peopleEntities db = new peopleEntities())
                {
                    db.people.Add(new person() { name = "John", surname = "Smith" });
                    db.SaveChanges();

                    foreach (person p in db.people)
                    {
                        Console.WriteLine("{0} {1} {2}", p.id, p.name, p.surname);
                    }
                }

Here, we create an instance of our entity context (peopleEntities) and then use it to interact with the database. We add a new row to the person table, and then commit the change via db.SaveChanges(). Finally, we retrieve all rows from the table and display them.

Also, add the following at the end of the Main() method so that we can see the output:

            Console.ReadLine();

Run the program by pressing F5. The output shows that a row was indeed added:


The Entity Framework knows where to find the database because it has a connection string in the App.config file:

  <connectionStrings>
    <add name="peopleEntities" connectionString="metadata=res://*/Model1.csdl|res://*/Model1.ssdl|res://*/Model1.msl;provider=System.Data.SqlServerCe.4.0;provider connection string=&quot;data source=|DataDirectory|\people.sdf&quot;" providerName="System.Data.EntityClient" />
  </connectionStrings>

This might be good enough in some situations, but other times, we might want to build such a connection string in code and ask the Entity Framework to work with it. One reason might be that the connection string contains a password, and you want to obtain it from an encrypted source. The following two sections illustrate how this is done.

Building a raw Entity Framework connection string


Let's start out by commenting out the connection string in the App.config file:

  <connectionStrings>
    <!--
    <add name="peopleEntities" connectionString="metadata=res://*/Model1.csdl|res://*/Model1.ssdl|res://*/Model1.msl;provider=System.Data.SqlServerCe.4.0;provider connection string=&quot;data source=|DataDirectory|\people.sdf&quot;" providerName="System.Data.EntityClient" />
    -->
  </connectionStrings>

If you try running the program now, you'll get a nice exception.

Now, what we want to do is add the connection string into our code and pass it to the entity context (the peopleEntities). So before our Entity Framework code (which starts with using (peopleEntities...), add the following:

                String entityConnStr = @"metadata=res://*/Model1.csdl|res://*/Model1.ssdl|res://*/Model1.msl;provider=System.Data.SqlServerCe.4.0;provider connection string=&quot;data source=|DataDirectory|\people.sdf&quot;";

If you now try to pass this connection string to the peopleEntities constructor, you'll realise that you can't. You can see why if you expand Model1.edmx and then Model1.Context.tt in Solution Explorer, and finally open the Model1.Context.cs file:


The peopleEntities class has only a parameterless constructor, and it calls the constructor of DbContext with the connection string name defined in App.config. The DbContext constructor may accept a connection string instead, but we have no way of passing it through peopleEntities directly.

While you could add another constructor to peopleEntities, it is never a good idea to modify autogenerated code. If you regenerate the model, any code you add would be lost. Fortunately, however, peopleEntities is a partial class, which means we can add implementation to it in a separate file (see this question and this other question on Stack Overflow).

Add a new class and name it peopleEntities. Add the following at the top:

using System.Data.Entity;

Implement the class as follows:

    public partial class peopleEntities : DbContext
    {
        public peopleEntities(String connectionString)
            : base(connectionString)
        {

        }
    }

We can now modify our instantiation of peopleEntities to use our connection string as defined in code:

using (peopleEntities db = new peopleEntities(entityConnStr))

Since we are using a partial class defined in a separate file, any changes to the model will cause the autogenerated peopleEntities to be recreated, but will not touch the code we added in peopleEntities.cs.

When we run the program, we now get a very nice exception (though different from what we got right after commenting out the connection string in App.config):


Apparently this happens because of the &quot; values, which are necessary in XML files but cause problems when supplied in a String literal in code. We can replace them with single quotes instead, as follows:

                String entityConnStr = @"metadata=res://*/Model1.csdl|res://*/Model1.ssdl|res://*/Model1.msl;provider=System.Data.SqlServerCe.4.0;provider connection string='data source=|DataDirectory|\people.sdf'";

If we run the program now, it works fine, and a new row is added and retrieved:


Using EntityConnectionStringBuilder


You'll notice that the connection string we've been using is made up of three parts: metadata, provider, and the provider connection string that we normally use with ADO.NET.

We can use a class called EntityConnectionStringBuilder to provide these values separately and build a connection string. Using this approach avoids the problem with quotes illustrated at the end of the previous section.

First, remove or comment out the entityConnStr variable we have been using so far.

Then add the following near the top of Program.cs:

using System.Data.EntityClient;

Finally, add the following code instead of the connection string code we just removed:

                EntityConnectionStringBuilder csb = new EntityConnectionStringBuilder();
                csb.Metadata = "res://*/Model1.csdl|res://*/Model1.ssdl|res://*/Model1.msl";
                csb.Provider = "System.Data.SqlServerCe.4.0";
                csb.ProviderConnectionString = "data source=people.sdf";
                String entityConnStr = csb.ToString();

When you run the program, it should work just as well:
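
EntityConnectionStringBuilder derives from the more general DbConnectionStringBuilder, and it is the builder that takes care of quoting the nested provider connection string. The following is only a sketch of that idea using the base class directly, so that it runs without the Entity Framework; ConnectionStringDemo and its two methods are hypothetical names, and in a real project you would stick with EntityConnectionStringBuilder as shown above.

```csharp
using System;
using System.Data.Common;

public static class ConnectionStringDemo
{
    // Builds an Entity Framework style connection string from its three parts.
    // The builder decides how the nested value needs to be quoted.
    public static string Build(string metadata, string provider, string providerConnStr)
    {
        DbConnectionStringBuilder csb = new DbConnectionStringBuilder();
        csb["metadata"] = metadata;
        csb["provider"] = provider;
        csb["provider connection string"] = providerConnStr;
        return csb.ConnectionString;
    }

    // Parses a connection string and returns the nested provider connection string.
    public static string ExtractProviderConnectionString(string connectionString)
    {
        DbConnectionStringBuilder csb = new DbConnectionStringBuilder();
        csb.ConnectionString = connectionString;
        return Convert.ToString(csb["provider connection string"]);
    }
}
```

Printing the result of Build() lets you see what quoting the builder chose, and passing that result to ExtractProviderConnectionString() should give back the original "data source=..." value untouched.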


Summary


This article covered quite a bit of ground:
  • We first used the SqlCeEngine and ADO.NET to create a SQL Server Compact database.
  • We then created an Entity Data Model for this database.
  • We added some code to add rows and retrieve data using the Entity Framework.
  • We provided the Entity Framework with a raw connection string in code. To do this, we used a partial class to add a new constructor to the entity context class that can pass the connection string to the parent DbContext. We also observed a problem with using &quot; in the connection string, and solved it by using single quotes instead.
  • We used EntityConnectionStringBuilder to build a connection string from its constituent parts, and in doing so completely avoided the &quot; problem.
I hope you found this useful. Feel free to leave any feedback in the comments below. Check back for more articles! :)

Friday, September 27, 2013

C#: Understanding Recursion with Directory Traversal

Greetings, and welcome to this brand new article at Programmer's Ranch! :)

In this article, we're going to talk about recursion. This technique is often considered an alternative to iteration (i.e. loops), and is useful for a wide variety of situations ranging from computing factorials to clearing empty areas in Minesweeper:


Since factorials are boring, and Minesweeper is a bit complex for this easy tutorial, we're going to look at the filesystem in order to learn about recursion. For example, take a look at the folder for the solution I just created in SharpDevelop:


See, the filesystem is actually a tree data structure. Each folder can contain other files and folders, which can in turn contain other files and folders, and so on. It isn't easy to work with things like trees using loops, but with recursion it just comes naturally. Let's see how.

After creating a new console application, add the following at the top, which we need to interact with the filesystem:

using System.IO;

The first thing we want to do is get the current directory where the executable will be running. We do this by using Directory.GetCurrentDirectory(). Let's try that out:

            String currentDir = Directory.GetCurrentDirectory();
            Console.WriteLine(currentDir);
            Console.ReadKey(true);

...and here's what we get:


Now, we want to navigate up to the first CsRecursion folder, which is the solution folder. From there we'll be able to list the contents of all the subfolders. To do this, we create an instance of DirectoryInfo:

            DirectoryInfo dir = new DirectoryInfo(currentDir);

This allows us to get to the parent folder:

            dir = dir.Parent.Parent.Parent;

...and this is what we have so far:


Right, now about listing the folder and subfolder contents. Let's add a method to do that:

        public static void ListContents(DirectoryInfo dir)
        {
           
        }

In this method, we first want to list all the files in that folder. We can do this using DirectoryInfo.GetFiles(), or else using the static Directory.GetFiles() method, which is easier and works directly with file paths (Strings):

            foreach (String file in Directory.GetFiles(dir.FullName))
                Console.WriteLine(file);

Okay, now all we need is to do the same thing for all subfolders. It turns out that our ListContents() method can actually call itself, and pass in each subdirectory as a parameter:

            foreach (DirectoryInfo subdir in dir.GetDirectories())
                ListContents(subdir);

When a method calls itself, it's called recursion. In this case we say we are recursing into subdirectories.

Let's just change Main() so that it calls our ListContents() method:

        public static void Main(string[] args)
        {
            String currentDir = Directory.GetCurrentDirectory();
            DirectoryInfo dir = new DirectoryInfo(currentDir);
            dir = dir.Parent.Parent.Parent;
            ListContents(dir);
           
            Console.ReadKey(true);
        }

...and voilà:


As you can see, there's a very small amount of code, and recursion is a perfect fit for this kind of thing, because the same method can work on folders at different levels of the filesystem.
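
Printing straight to the console is fine for a demo, but the same recursive shape works just as well if you want to keep the results. Here's a small variation, as a sketch only: DirectoryLister and CollectFiles() are names I've invented for this example, and it gathers the file paths into a list instead of writing them out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DirectoryLister
{
    // Adds the full path of every file under dir to results,
    // recursing into each subfolder in turn.
    public static void CollectFiles(DirectoryInfo dir, List<string> results)
    {
        foreach (FileInfo file in dir.GetFiles())
            results.Add(file.FullName);

        // Recursive step: a folder with no subfolders makes no further calls
        foreach (DirectoryInfo subdir in dir.GetDirectories())
            CollectFiles(subdir, results);
    }
}
```

Note that this sketch does no error handling, so a folder you don't have permission to read would throw an exception partway through.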

There's an important concept about recursion you need to be aware of, that might not be so evident in this example: the stopping condition. If a method calls itself and has no way to stop calling itself, you get a sort of infinite loop which actually ends in a stack overflow (in short, every nested method call uses up some space on the call stack, and that space is limited). Therefore, a recursive function always needs a way to stop calling itself.

If you're doing factorials, recursion stops when n=1. If you're computing a Fibonacci sequence, n=0 and n=1 are the stopping conditions. In our case, recursion stops when a folder has no further subfolders. In Minesweeper, recursion stops when there are no more adjacent blank squares (you either hit an edge or a number).
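
Boring or not, factorials and Fibonacci numbers do make the stopping conditions easy to spot, so here is a minimal sketch of both. RecursionDemo is just a name I picked for the example, and the check for negative input is my own addition to keep the recursion from running away.

```csharp
using System;

public static class RecursionDemo
{
    // n! computed recursively; n <= 1 is the stopping condition.
    public static long Factorial(int n)
    {
        if (n < 0)
            throw new ArgumentOutOfRangeException("n", "n must not be negative");
        if (n <= 1)
            return 1;

        return n * Factorial(n - 1);
    }

    // Fibonacci numbers; n = 0 and n = 1 are the two stopping conditions.
    public static long Fibonacci(int n)
    {
        if (n < 0)
            throw new ArgumentOutOfRangeException("n", "n must not be negative");
        if (n == 0)
            return 0;
        if (n == 1)
            return 1;

        return Fibonacci(n - 1) + Fibonacci(n - 2);
    }
}
```

Without the early return statements, each method would keep calling itself with ever smaller values of n until the stack overflows.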

Anyhow, as we have seen in this article, recursion is a great technique to use when your data has divergent paths (most notably when dealing with trees). There are a few other interesting applications of recursion that I'll probably be writing about in the future, so stay tuned! :)

Sunday, September 22, 2013

C#: Mocking and Dependency Injection for Unit Testing a File Sorting Program

Hullo folks! :)

In yesterday's article, "C#: Unit Testing with SharpDevelop and NUnit", we learned about unit tests: what they are, why we use them, and how we write and execute them. The article ended with a number of insights about them. One of these insights was that it is not easy to write unit tests for code that relies on databases, the network, or other external resources.

In today's article, we're going to address this problem by learning about mocking and dependency injection. These might sound like big buzz-words, but you'll see in this article that they're really nothing special. To see this in action, we'll write a small program that loads a file from the hard disk and sorts it alphabetically, line by line.

This article is a bit on the advanced side, so ideally you should know your OOP and also be familiar with basic unit testing (e.g. from yesterday's article) before reading it.

Start off by creating a new Console Application in SharpDevelop. When this is done, add a new class (right click on project in Projects window, Add -> New Item...) and name it Sorter. At the top, add the following to allow us to read files and use lists:

using System.Collections.Generic;
using System.IO;

Next, add a member variable in which we can store the lines in the file:

private String[] lines;

Add a constructor in Sorter that takes the name of the file to sort, and loads it into the variable we just declared:

        public Sorter(String filename)
        {
            this.lines = File.ReadAllLines(filename);
        }

Now, add a method that actually does the sorting:

        public String[] GetSortedLines()
        {
            List<String> sortedLines = new List<String>(lines);
            sortedLines.Sort();
            return sortedLines.ToArray();
        }

This way of sorting is not the only one and not necessarily the best, but I chose it because it's simple and doesn't change the lines variable, just in case you want to keep it in its original unsorted format for whatever reason.

Let's try and write a test for this code. Add a new class called SorterTest (as I've already mentioned in yesterday's article, people usually put tests into their own project, but I'm trying to teach one concept at a time here). After adding a reference to nunit.framework.dll (check yesterday's article in case you're lost), set up the SorterTest.cs file as follows:

        [Test]
        public void GetSortedLinesTest()
        {
            Sorter sorter = new Sorter("test.txt");
            String[] s = sorter.GetSortedLines();
          
            Assert.AreEqual("Gates, Bill", s[0]);
            Assert.AreEqual("Norris, Chuck", s[1]);
            Assert.AreEqual("Torvalds, Linus", s[2]);
            Assert.AreEqual("Zuckerberg, Mark", s[3]);
        }

Create a file in your bin\Debug folder called test.txt and put the following in it:

Zuckerberg, Mark
Norris, Chuck
Gates, Bill
Torvalds, Linus

Open the Unit Tests window in SharpDevelop (View -> Tools -> Unit Tests) and run it. You can see that the test passes.

Great.

Actually, this is the wrong way of writing unit tests for this kind of thing. We have a dependency on the filesystem. What would happen if that file suddenly disappears? As a matter of fact, we are supposed to be testing the sorting logic, not whether the file is available or not.

In order to do this properly, we're going to have to refactor our code. We need to take our file loading code out of there. Create a new class called FileLoader and add the following at the top:

using System.IO;

...and then set up FileLoader as follows:

    public class FileLoader
    {
        private String[] lines;
      
        public String[] Lines
        {
            get
            {
                return this.lines;
            }
        }
      
        public FileLoader(String filename)
        {
            this.lines = File.ReadAllLines(filename);
        }
    }

In Sorter, remove the constructor as well as the using System.IO; and the lines variable. Instead, we'll pass our FileLoader as a parameter:

        public String[] GetSortedLines(FileLoader loader)
        {
            List<String> sortedLines = new List<String>(loader.Lines);
            sortedLines.Sort();
            return sortedLines.ToArray();
        }

This is called dependency injection: instead of creating the dependency (in our case a file) from within the Sorter class, we pass it as a parameter. This allows us to substitute the dependency for a fake (known as a mock). To do this, we'll need to take advantage of polymorphism (see "C# OOP: Abstract classes, fruit, and polymorphism"). Create an interface (when adding a new item, instead of Class, specify Interface) and name it IFileLoader:


An interface is similar to an abstract class: it cannot be instantiated, and it declares methods and/or properties that must be implemented by the classes that implement the interface. Unlike an abstract class, however, the interface itself provides no implementation for any of them. It is used as a contract, saying that any class implementing the interface must implement its methods/properties. In our case, IFileLoader will be this:

    public interface IFileLoader
    {
        String[] Lines
        {
            get;
        }
    }

We then specify that FileLoader implements IFileLoader, using the same syntax we would use for inheriting from a class:

public class FileLoader : IFileLoader

FileLoader already has the necessary Lines property, so we're fine. Next, we change the type of the parameter in Sorter.GetSortedLines() from FileLoader to the interface:

public String[] GetSortedLines(IFileLoader loader)

This allows us to pass, as a parameter, any class that implements IFileLoader. So we can create a class, MockFileLoader, that provides a hardcoded list of names:

    public class MockFileLoader : IFileLoader
    {
        private String[] lines = { "Zuckerberg, Mark", "Norris, Chuck", "Gates, Bill", "Torvalds, Linus" };
      
        public String[] Lines
        {
            get
            {
                return this.lines;
            }
        }
    }

We can now rewrite our unit test like this:

        [Test]
        public void GetSortedLinesTest()
        {
            IFileLoader loader = new MockFileLoader();
            Sorter sorter = new Sorter();
            String[] s = sorter.GetSortedLines(loader);
          
            Assert.AreEqual("Gates, Bill", s[0]);
            Assert.AreEqual("Norris, Chuck", s[1]);
            Assert.AreEqual("Torvalds, Linus", s[2]);
            Assert.AreEqual("Zuckerberg, Mark", s[3]);
        }

If you run the unit test, you'll find that it works just like before, just that this time our unit test isn't dependent on any file and can run just fine without one:


In this article, we have seen how to create mock classes that emulate the functionality of our normal classes, but can replace them in unit tests when dependencies exist. To facilitate this, both the mock class and the normal class implement a common interface, and an instance of this interface is passed as a parameter to the method being tested. This is called dependency injection and allows us to control what is being passed to the method.
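
To see all the pieces side by side, here is a condensed sketch of the same design in a single file. IFileLoader, FileLoader and Sorter follow the article; InMemoryLoader is a hypothetical variation on MockFileLoader that takes its lines through the constructor, so that each test can supply its own data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public interface IFileLoader
{
    string[] Lines { get; }
}

// Real implementation: reads the lines from a file on disk.
public class FileLoader : IFileLoader
{
    private readonly string[] lines;

    public FileLoader(string filename)
    {
        this.lines = File.ReadAllLines(filename);
    }

    public string[] Lines { get { return this.lines; } }
}

// Hypothetical stand-in for tests: the lines are passed in directly.
public class InMemoryLoader : IFileLoader
{
    private readonly string[] lines;

    public InMemoryLoader(params string[] lines)
    {
        this.lines = lines;
    }

    public string[] Lines { get { return this.lines; } }
}

public class Sorter
{
    // Works with any IFileLoader, so it never touches the filesystem itself.
    public string[] GetSortedLines(IFileLoader loader)
    {
        List<string> sortedLines = new List<string>(loader.Lines);
        sortedLines.Sort();
        return sortedLines.ToArray();
    }
}
```

In the actual program you would call new Sorter().GetSortedLines(new FileLoader("test.txt")), while a unit test would pass something like new InMemoryLoader("Norris, Chuck", "Gates, Bill") instead.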

It is not always easy to write mocks. First, as this article has shown, code may need to be refactored in order to isolate dependencies and support dependency injection. Secondly, mocking complex classes (e.g. an IMAP server) can take a great deal of effort and might not necessarily be worth it. Use your own judgement to decide whether you need unit tests in such situations.

Thanks for reading, and be sure to visit here often to read more articles that might be useful.

Tuesday, May 21, 2013

C# Network Programming: Simple HTTP Client

Hi! :)

In yesterday's article, HTTP Requests in Wireshark, we used Wireshark to observe the messages sent and received by web browsers when downloading a webpage.

In today's article, we're going to do that ourselves, in code! :D More specifically, we will write a simple client that connects to a web server and downloads a webpage.

Although the HTTP requests sent by a web browser might seem a little complicated, HTTP Made Really Easy shows that it really takes a very short request to retrieve a web page. For example, the following simple request can download the homepage of Programmer's Ranch:


GET / HTTP/1.1
Host: www.programmersranch.com


Note that the above includes a double newline which is essential for the request to be interpreted by the server (refer to yesterday's article).

Start a new SharpDevelop project, and include the necessary libraries for I/O and network programming:

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

We first declare a String to contain the HTTP request, as follows:

            String request = @"GET / HTTP/1.1
Host: www.programmersranch.com
Connection: Close

";

This is a special kind of String. The @ before the starting quotes shows that it is a verbatim string literal. This means that newlines are included in the string exactly as typed, so we can use it to write multiline strings. I have also included a Connection: Close header field, so that the server will automatically close the connection once it has sent back all the data - this makes it easier for us to know when we have received everything. Finally, note the double-newline at the end of the request, which is important.
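The @-prefixed string above picks up whatever line endings your source file happens to be saved with. HTTP formally ends each line with a carriage return and line feed (\r\n), so if you would rather spell that out, here is a minimal sketch that builds the same request with explicit escapes. The names RequestBuilder and BuildRequest are hypothetical, made up purely for this illustration:

```csharp
using System;

public static class RequestBuilder
{
    // Hypothetical helper (not part of the original program):
    // builds the same minimal GET request, one header per line,
    // with each line explicitly terminated by \r\n.
    public static string BuildRequest(string host)
    {
        return "GET / HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: Close\r\n"
             + "\r\n"; // the blank line that marks the end of the request
    }
}
```

Calling RequestBuilder.BuildRequest("www.programmersranch.com") should give you a string that you can send in the same way as the request variable.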

Now, this is all the code we need:

            using (TcpClient client = new TcpClient("programmersranch.com", 80))
            using (StreamWriter writer = new StreamWriter(client.GetStream()))
            using (StreamReader reader = new StreamReader(client.GetStream()))
            using (StreamWriter outputFile = File.CreateText("webpage.html"))
            {
                writer.Write(request);
                writer.Flush();
             
                String line = String.Empty;
                while ((line = reader.ReadLine()) != null)
                {
                    outputFile.WriteLine(line);
                }
             
                Console.WriteLine("Webpage has been written to webpage.html");
            }
         
            Console.Write("Press any key to continue . . . ");
            Console.ReadKey(true);

Here we're using TcpClient in order to connect to the website we want, and we are using port 80 since this is HTTP. We also declare a StreamWriter and StreamReader using the TcpClient's stream, so we can easily send data to and receive data from the server. Finally, we open a file called webpage.html to which we will write the received data. Since webpages tend to be quite long nowadays, this is better than writing it to the console window.

Note how the multiple using statements allow us to open and work with several resources, and they are automatically closed at the end.

In the body of the using statements, the first thing we do is send out the HTTP request. We then flush the stream to ensure that the request is actually sent (remember the words of wisdom from an earlier article: streams and toilets must always be flushed).

Then, we receive the response from the server, line by line, and we write that line to the output file (webpage.html). When there is no more data to receive, reader.ReadLine() returns null, and the loop ends.

When you run this program...


...you will find the new file webpage.html in the folder where SharpDevelop puts your compiled executable (normally under bin\Debug in the folder where your source code is):


You can then open the file with your favourite text editor (Notepad++ is a good one) to view the full HTTP response:


You'll notice that the response includes the HTTP header (at the top) and the webpage's HTML, separated by a double-newline. As an exercise, try discarding the HTTP header, leaving only the HTML webpage.
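If you want a hint for the exercise, one possible approach is sketched below: read and throw away lines until the first empty one, then treat whatever follows as the webpage. HttpResponseHelper and ReadBody are hypothetical names used only for this sketch, and it assumes the server sends the body as-is (it does not decode chunked transfer encoding, which some servers use):

```csharp
using System;
using System.IO;

public static class HttpResponseHelper
{
    // Hypothetical helper (not part of the original program):
    // skips the status line and header fields, then returns
    // everything that follows the first blank line.
    public static string ReadBody(TextReader reader)
    {
        string line;
        // The header ends at the first empty line (or at end of input).
        while ((line = reader.ReadLine()) != null && line.Length > 0)
        {
            // Header line: discard it.
        }
        // Whatever remains is the body of the response.
        return reader.ReadToEnd();
    }
}
```

Since StreamReader is a TextReader, you could pass the program's reader to ReadBody() and write the returned string to outputFile instead of copying every line.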

Wonderful! :) In this article we have seen how easy it is to communicate with servers out there, and particularly how easy it is to download a webpage. If you want to learn about HTTP, HTTP Made Really Easy is a great place to start. You can also read an old blog post called "HTTP Communication: A Closer Look" which I had written about certain insights I observed while working on my BSc's Final Year Project. Finally, for the full details of the protocol, there's no better place than RFC 2616, the specification that defines HTTP/1.1.

We will do more network programming here in the future, so check back for more! :)