Gigi Labs

Tuesday, May 21, 2013

C# Network Programming: Simple HTTP Client

Hi! :)

In yesterday's article, HTTP Requests in Wireshark, we used Wireshark to observe the messages sent and received by web browsers when downloading a webpage.

In today's article, we're going to do that ourselves, in code! :D More specifically, we will write a simple client that connects to a web server and downloads a webpage.

Although the HTTP requests sent by a web browser might seem a little complicated, HTTP Made Really Easy shows that it really takes a very short request to retrieve a web page. For example, the following simple request can download the homepage of Programmer's Ranch:


GET / HTTP/1.1
Host: www.programmersranch.com


Note that the request ends with a blank line (a double newline). This is essential: it tells the server that the request headers are complete (refer to yesterday's article).

Start a new SharpDevelop project, and include the necessary libraries for I/O and network programming:

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

We first declare a String to contain the HTTP request, as follows:

            String request = @"GET / HTTP/1.1
Host: www.programmersranch.com
Connection: Close

";

This is a special kind of String. The @ before the opening quote makes it a verbatim string literal: escape sequences are not processed and line breaks are kept as part of the string, so we can write a multiline string directly. I have also included a Connection: Close header field, so that the server closes the connection once it has sent back all the data. This makes it easier for us to know when we have received everything. Finally, note the blank line at the end of the request, which is important because it marks the end of the headers.
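One caveat with the @-quoted string: the line breaks inside it are whatever line endings the source file was saved with, whereas HTTP expects each line to end with a carriage return and line feed (\r\n). Many servers tolerate a bare \n, but if you want to be explicit, the same request could be built as in the minimal sketch below. The names RequestBuilder and BuildGetRequest are made up for illustration and are not part of the program in this article.

```csharp
using System;
using System.Text;

public static class RequestBuilder
{
    // Hypothetical helper: builds a minimal GET request with explicit CRLF line endings.
    public static string BuildGetRequest(string host, string path)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("GET " + path + " HTTP/1.1\r\n");
        sb.Append("Host: " + host + "\r\n");
        sb.Append("Connection: Close\r\n");
        sb.Append("\r\n"); // the blank line marks the end of the headers
        return sb.ToString();
    }
}

public static class RequestDemo
{
    public static void Main()
    {
        // Same host and path as the request used in this article
        String request = RequestBuilder.BuildGetRequest("www.programmersranch.com", "/");
        Console.Write(request);
    }
}
```

The resulting string can be passed to writer.Write() in exactly the same way as the @-quoted version.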

Now, this is all the code we need:

            // Connect to the web server on port 80 (HTTP)
            using (TcpClient client = new TcpClient("programmersranch.com", 80))
            using (StreamWriter writer = new StreamWriter(client.GetStream()))
            using (StreamReader reader = new StreamReader(client.GetStream()))
            using (StreamWriter outputFile = File.CreateText("webpage.html"))
            {
                // Send the request; Flush() pushes it out of the writer's buffer
                writer.Write(request);
                writer.Flush();

                // Copy the response to the file until the server closes the connection
                String line = String.Empty;
                while ((line = reader.ReadLine()) != null)
                {
                    outputFile.WriteLine(line);
                }

                Console.WriteLine("Webpage has been written to webpage.html");
            }

            Console.Write("Press any key to continue . . . ");
            Console.ReadKey(true);

Here we're using TcpClient in order to connect to the website we want, and we are using port 80 since this is HTTP. We also declare a StreamWriter and StreamReader using the TcpClient's stream, so we can easily send data to and receive data from the server. Finally, we open a file called webpage.html to which we will write the received data. Since webpages tend to be quite long nowadays, this is better than writing it to the console window.

Note how the multiple using statements allow us to open and work with several resources, and they are automatically closed at the end.

In the body of the using statements, the first thing we do is send out the HTTP request. We then flush the stream to ensure that the request is actually sent (remember the words of wisdom from an earlier article: streams and toilets must always be flushed).
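To see why the flush matters, here is a small sketch that uses a MemoryStream in place of the network stream, so we can check how many bytes have actually reached the underlying stream. The class and method names (FlushDemo, MeasureFlush) are invented for this illustration. With the default StreamWriter settings, a short write like this one is expected to sit in the writer's buffer until Flush() is called.

```csharp
using System;
using System.IO;

public static class FlushDemo
{
    // Hypothetical helper: returns the number of bytes in the underlying
    // stream before and after calling Flush() on the writer.
    public static long[] MeasureFlush(string text)
    {
        using (MemoryStream stream = new MemoryStream())
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.Write(text);
            long before = stream.Length; // text is still in the writer's buffer
            writer.Flush();
            long after = stream.Length;  // text has now been pushed to the stream
            return new long[] { before, after };
        }
    }

    public static void Main()
    {
        long[] lengths = MeasureFlush("GET / HTTP/1.1\r\n");
        Console.WriteLine("Bytes before flush: " + lengths[0]);
        Console.WriteLine("Bytes after flush: " + lengths[1]);
    }
}
```

An alternative is to set the writer's AutoFlush property to true, which flushes after every write. For a single request like ours, one explicit Flush() is enough.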

Then, we receive the response from the server, line by line, and we write that line to the output file (webpage.html). When there is no more data to receive, reader.ReadLine() returns null, and the loop ends.

When you run this program, you will find the new file webpage.html in the folder where SharpDevelop puts your compiled executable (normally under bin\Debug in the folder where your source code is).


You can then open the file with your favourite text editor (Notepad++ is a good one) to view the full HTTP response.


You'll notice that the response includes the HTTP header (at the top) and the webpage's HTML, separated by a double-newline. As an exercise, try discarding the HTTP header, leaving only the HTML webpage.
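If you get stuck on the exercise, here is one possible approach, as a rough sketch rather than the definitive solution: keep reading lines until the first empty one (the end of the header), and only then start writing to the output. The names HeaderDemo and CopyBody are hypothetical, and the response in Main is a made-up example standing in for the data read from the server.

```csharp
using System;
using System.IO;

public static class HeaderDemo
{
    // Hypothetical helper: copies only the body of an HTTP response,
    // skipping the status line and the header fields.
    public static void CopyBody(TextReader reader, TextWriter output)
    {
        String line;

        // The header section ends at the first empty line.
        while ((line = reader.ReadLine()) != null && line.Length > 0)
        {
            // Header line: ignore it.
        }

        // Everything after the empty line is the body.
        while ((line = reader.ReadLine()) != null)
        {
            output.WriteLine(line);
        }
    }

    public static void Main()
    {
        // Made-up response text, used here instead of a real network stream
        String response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\r\n</html>\r\n";
        using (StringReader reader = new StringReader(response))
        {
            CopyBody(reader, Console.Out);
        }
    }
}
```

Since StreamReader is a TextReader and the StreamWriter for webpage.html is a TextWriter, the same method could be called with the reader and outputFile from the program above. Note that if the server replies using chunked transfer encoding, the body will still contain chunk-size lines, which this sketch does not handle.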

Wonderful! :) In this article we have seen how easy it is to communicate with servers out there, and particularly how easy it is to download a webpage. If you want to learn about HTTP, HTTP Made Really Easy is a great place to start. You can also read an old blog post called "HTTP Communication: A Closer Look", which I wrote about some insights I picked up while working on my BSc's Final Year Project. Finally, for the full details of the protocol, there's no better place than RFC 2616, the official HTTP/1.1 standard at the time of writing (it has since been superseded by RFC 9110 and RFC 9112).

We will do more network programming here in the future, so check back for more! :)
