Programmer's Ranch: network programming


Tuesday, May 21, 2013

C# Network Programming: Simple HTTP Client

Hi! :)

In yesterday's article, HTTP Requests in Wireshark, we used Wireshark to observe the messages sent and received by web browsers when downloading a webpage.

In today's article, we're going to do that ourselves, in code! :D More specifically, we will write a simple client that connects to a web server and downloads a webpage.

Although the HTTP requests sent by a web browser might seem a little complicated, HTTP Made Really Easy shows that it really takes a very short request to retrieve a web page. For example, the following simple request can download the homepage of Programmer's Ranch:

GET / HTTP/1.1
Host: www.programmersranch.com

Note that the above includes a double newline which is essential for the request to be interpreted by the server (refer to yesterday's article).

Start a new SharpDevelop project, and include the necessary libraries for I/O and network programming:

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

We first declare a String to contain the HTTP request, as follows:

String request = @"GET / HTTP/1.1
Host: www.programmersranch.com
Connection: Close

";

This is a special kind of String. The @ before the opening quote marks it as a verbatim string literal. This means that the newlines you type in the source code are kept in the string, so we can use it to write multiline strings. I have also included a Connection: Close header field, so that the server will automatically close the connection once it has sent back all the data - this makes it easier for us to know when we have received everything. Finally, note the double newline at the end of the request, which is important.
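One thing to keep in mind: the line endings inside a string like this are whatever your editor saved in the source file, while HTTP expects each line to end with CR LF. If that ever becomes a problem, a minimal sketch like the one below spells the line endings out explicitly. The RequestBuilder class and BuildGetRequest method are names I made up for illustration; they are not part of the program in this article.

```csharp
using System;

static class RequestBuilder
{
    // Builds a minimal HTTP/1.1 GET request with explicit CR LF line endings.
    // (Hypothetical helper, not part of the original program.)
    public static string BuildGetRequest(string host, string path)
    {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: Close\r\n"
             + "\r\n"; // the blank line that ends the request headers
    }

    static void Main()
    {
        string request = BuildGetRequest("www.programmersranch.com", "/");
        Console.Write(request);

        // Ordinal comparison checks the raw characters, with no culture rules involved.
        Console.WriteLine(request.EndsWith("\r\n\r\n", StringComparison.Ordinal));
    }
}
```

The string returned by BuildGetRequest can be passed to writer.Write() in exactly the same way as the request variable.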

Now, this is all the code we need:

using (TcpClient client = new TcpClient("programmersranch.com", 80))
using (StreamWriter writer = new StreamWriter(client.GetStream()))
using (StreamReader reader = new StreamReader(client.GetStream()))
using (StreamWriter outputFile = File.CreateText("webpage.html"))
{
    writer.Write(request);
    writer.Flush();

    String line = String.Empty;
    while ((line = reader.ReadLine()) != null)
    {
        outputFile.WriteLine(line);
    }

    Console.WriteLine("Webpage has been written to webpage.html");
}

Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);

Here we're using TcpClient in order to connect to the website we want, and we are using port 80 since this is HTTP. We also declare a StreamWriter and StreamReader using the TcpClient's stream, so we can easily send data to and receive data from the server. Finally, we open a file called webpage.html to which we will write the received data. Since webpages tend to be quite long nowadays, this is better than writing it to the console window.

Note how the multiple using statements allow us to open and work with several resources, and they are automatically closed at the end.

In the body of the using statements, the first thing we do is send out the HTTP request. We then flush the stream to ensure that the request is actually sent (remember the words of wisdom from an earlier article: streams and toilets must always be flushed).

Then, we receive the response from the server, line by line, and we write that line to the output file (webpage.html). When there is no more data to receive, reader.ReadLine() returns null, and the loop ends.

When you run this program...

...you will find the new file webpage.html in the folder where SharpDevelop puts your compiled executable (normally under bin\Debug in the folder where your source code is):

You can then open the file with your favourite text editor (Notepad++ is a good one) to view the full HTTP response:

You'll notice that the response includes the HTTP header (at the top) and the webpage's HTML, separated by a double-newline. As an exercise, try discarding the HTTP header, leaving only the HTML webpage.
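If you get stuck on the exercise, here is one possible approach, as a rough sketch: skip every line up to and including the first blank line, and copy whatever comes after it. The HeaderStripper class and CopyBody method are hypothetical names of my own, and the response in Main is a made-up sample rather than real server output. Also bear in mind that a server may send the body using chunked transfer encoding, in which case the body will still contain chunk sizes that this sketch does not handle.

```csharp
using System;
using System.IO;

static class HeaderStripper
{
    // Skips lines up to and including the first blank line (the end of the
    // HTTP header), then copies the remaining lines (the body) to the output.
    // (Hypothetical helper; assumes the body is not chunked or compressed.)
    public static void CopyBody(TextReader reader, TextWriter output)
    {
        string line;

        // Header section: read and discard until the blank line (or end of data).
        while ((line = reader.ReadLine()) != null && line.Length > 0)
        {
        }

        // Body: copy everything that is left.
        while ((line = reader.ReadLine()) != null)
        {
            output.WriteLine(line);
        }
    }

    static void Main()
    {
        // Made-up sample response, just to exercise the method.
        string response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\r\n</html>\r\n";

        using (StringReader reader = new StringReader(response))
        {
            CopyBody(reader, Console.Out);
        }
    }
}
```

In the program above, you would pass the network reader and outputFile to CopyBody instead of the StringReader and Console.Out.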

Wonderful! :) In this article we have seen how easy it is to communicate with servers out there, and particularly how easy it is to download a webpage. If you want to learn about HTTP, HTTP Made Really Easy is a great place to start. You can also read an old blog post called "HTTP Communication: A Closer Look" which I had written about certain insights I observed while working on my BSc's Final Year Project. Finally, for the full details of the protocol, there's RFC 2616, which was the official HTTP/1.1 standard when this article was written (it has since been obsoleted by newer RFCs, most recently RFC 9110 and RFC 9112).

We will do more network programming here in the future, so check back for more! :)

Monday, May 20, 2013

HTTP Requests in Wireshark

Hi everyone! :)

In yesterday's article, Network Programming: Networking Theory, we discussed what happens when a message is sent over a network, and when it is received. Today, we're going to see a practical example of that, by observing the HTTP requests sent by a web browser.

The first thing you should do is download Wireshark. This program will allow you to monitor network traffic going into and out of your PC. After installing it, run it, and you will see the following main screen:

Click on "Capture Options" and tick the checkbox next to the network interface listed. The network interface is basically a network card or, more commonly, the networking hardware on your motherboard. Wireshark can monitor traffic passing through the Ethernet port.

Click the "Start" button to start capturing packets. Immediately, you will start seeing stuff going in and out of your PC. You will know whether it's incoming or outgoing depending on whether your PC's IP address is in the "Source" or "Destination" column (in the screenshot below, my IP address is hidden):

In the "Filter" field at the top, type "http" and press ENTER. This filter allows you to concentrate on a specific type of network traffic - in this case, we are focusing on HTTP traffic which is used by web browsers.

In the Capture menu, Restart capturing, since there is a lot of traffic that doesn't interest us. From a web browser, visit http://www.programmersranch.com/. Soon after, Stop capturing in Wireshark from the Capture menu.

You can now find various HTTP requests to various parts of the page at programmersranch.com, including the page itself and various images. The screenshot above shows the HTTP request for the main page. You can expand the sections towards the middle of the window to view more detail about various parts of the transmission. In this case, I've expanded the HTTP section, where you can see the whole HTTP request. You can do the same for TCP, IP, etc.

When you click on a particular section (such as HTTP), the relevant part of the hex view (at the bottom of the window) is highlighted. This is useful because it sometimes shows you things that you might otherwise miss. In particular, you'll notice that the last four bytes have the hex values 0d 0a 0d 0a. In decimal, this becomes 13 10 13 10, which are the ASCII codes for CR LF CR LF (carriage return, line feed, carriage return, line feed). In short, the HTTP request ends with two line breaks: the first ends the last header line, and the second forms a blank line. That blank line is important because it tells the server where the request headers end, and HTTP requests won't work without it.
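You can check those byte values for yourself in a few lines of C#. This is just a small sketch (the class name is my own) that converts the four characters to ASCII bytes and prints them in hex and in decimal:

```csharp
using System;
using System.Text;

static class CrLfBytes
{
    static void Main()
    {
        // CR LF CR LF, as found at the end of the HTTP request.
        byte[] bytes = Encoding.ASCII.GetBytes("\r\n\r\n");

        Console.WriteLine(BitConverter.ToString(bytes)); // prints 0D-0A-0D-0A
        Console.WriteLine(string.Join(" ", bytes));      // prints 13 10 13 10
    }
}
```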

You should also be able to find the HTTP response coming from the server, which contains the HTML arriving at your browser (shown above).

Finally, in Wireshark you can right click on a particular transmission and select "Follow TCP Stream":

This allows you to view all the relevant requests and responses on the same connection without having to find the packets one by one:

Be aware, however, that following a TCP stream like this will change the filter from http to something else. This means that you won't be seeing all incoming HTTP packets. Be sure to change the filter back in order to continue viewing HTTP traffic.

Very well. You now know how to use Wireshark to sniff packets going into and out of your PC. In code, you can create the same messages and send them out over a socket in order to achieve the same behaviour that browsers, email clients, etc. have. In tomorrow's article, we will be working with HTTP in code. So stick around. :)

Saturday, May 18, 2013

Network Programming: Networking Theory

Hi all! :)

In yesterday's article (C# Network Programming: Echo Client/Server), we saw a simple example of a client and server communicating together by means of a simple protocol. However, a lot of questions remained unanswered.

Today we're going to learn a bit more about how the internet actually works, and that will help understand network programming better.

Let's say we have the setting above: the laptop on the left is connected to the server on the right. As we have seen yesterday, the laptop (client) must know the server's IP address and port in order to connect to it. The IP address and port together form an endpoint or socket.

The client must also have an endpoint of its own, to form such a connection. But since it is a client, the port is assigned automatically by the operating system - which is why we normally don't see it in network programming.
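If you are curious to see the client's endpoint, here is a small sketch that makes it visible. It starts a throwaway listener on the local machine, connects to it, and prints the port that the operating system picked for the client. The class and method names are my own, and asking for port 0 on the listener is simply a convenient way to get any free port for the demo.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

static class EphemeralPortDemo
{
    // Connects to a throwaway local listener and returns the client's own port.
    // (Hypothetical helper written for this illustration.)
    public static int GetClientPort()
    {
        // Port 0 asks the operating system for any free port.
        TcpListener server = new TcpListener(IPAddress.Loopback, 0);
        server.Start();
        try
        {
            int serverPort = ((IPEndPoint)server.LocalEndpoint).Port;

            using (TcpClient client = new TcpClient("127.0.0.1", serverPort))
            {
                // The client never chose a port; the operating system assigned one.
                IPEndPoint local = (IPEndPoint)client.Client.LocalEndPoint;
                IPEndPoint remote = (IPEndPoint)client.Client.RemoteEndPoint;

                Console.WriteLine("Client endpoint: {0}", local);
                Console.WriteLine("Server endpoint: {0}", remote);
                return local.Port;
            }
        }
        finally
        {
            server.Stop();
        }
    }

    static void Main()
    {
        GetClientPort();
    }
}
```

Run it a few times and you should see the client's port change from one run to the next, while the IP address stays at 127.0.0.1.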

A port can be anything between 0 and 65535, but the first 1024 are reserved for standard services (such as HTTP or email), so we normally use ports 1024 onwards for our custom programs. By using different ports, a single computer (i.e. a single IP address) may have several different incoming and outgoing connections at the same time.

The internet works a little bit like the postal system. If you want to send someone a letter, you normally put it in an envelope, and write the person's address on the envelope. The postal system will then find a way to deliver your letter. In the case of the internet, this happens mostly thanks to TCP/IP (TCP over IP). The IP protocol (which is where IP addresses come from) takes care of routing a message from one computer to another - it can pass through several other routers/servers on the way.

While IP can find a route between the sending and receiving computers, another protocol (TCP or UDP) must be used to deliver the message to the correct application on the destination computer (using ports). While TCP is normally preferred because it allows reliable message delivery, UDP is simpler and useful in certain applications (e.g. video streaming, where the loss of a little bit of data is better than waiting for it to be retransmitted).

IP, TCP and UDP are part of a bigger picture called the OSI model, which categorises internet protocols. It looks something like this:

OSI Model Layer	Data Unit	Examples
Application	Application Data	HTTP, email, etc
Presentation
Session
Transport	Segments	TCP or UDP
Network	Packets/Datagrams	IP
Data Link	Frames	Ethernet
Physical	Bits/Bytes	Wired or Wireless

Let's read this table top-down. Your web browser can download web pages by sending an HTTP request to Google. This HTTP request passes to the transport layer, where a TCP header is added containing the source and destination port and other stuff. The result is passed to the network layer, where an IP header is added containing the source and destination IP address among other stuff. This is then wrapped into frames based on the Ethernet protocol, which works using hardware MAC addresses. Finally, the frames are converted into electrical signals (representing bits and bytes) and sent out over the wire.

At the receiving end, this works in the opposite direction (bottom-up). The bits and bytes received over the wire are assembled into Ethernet frames. From these Ethernet frames, one or more IP packets are extracted. The IP header is removed, and the result is passed to the transport layer. The TCP header is removed, and from the information therein, the original HTTP request can be forwarded to the appropriate port on the receiving machine. It is then up to the application listening on that particular port (in this case a web server) to deal with the request appropriately (in this case by sending back an HTTP response).

Okay. So this was just a very brief summary of how the internet works... but you should at least realise the difference between the World Wide Web and the Internet. If you remember that the WWW is based on HTTP and port 80, you will realise that it is just one of countless services on the internet.

In tomorrow's article, we will see a practical example of how all this works.

C# Network Programming: Echo Client/Server

Hola! :)

In today's article we're going to learn about network programming. That means you can have two (or more) machines talking to each other.

I have been doing network programming since 2007, and I can tell you it is awesome! :D This was one of my early projects:

This was a pacman game over a Google Maps setting when Google Android was still in its infancy. You can do a lot of cool stuff when computers interact with each other.

Today we'll write two small programs and have them communicate with each other. In order to do network programming, you will need to use the following libraries:

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

After adding the above in a new console application project, put in the following code:

IPAddress ip = IPAddress.Any;
int port = 18000;
TcpListener server = new TcpListener(ip, port);
server.Start();
TcpClient client = server.AcceptTcpClient();

In network programming, you normally have a server, and any number of clients. The clients can connect to the server because they know its IP address and port. The IP address is a number identifying the machine (such as 192.168.5.185), and the port is a number used to connect to a particular server program (e.g. HTTP servers use port 80; SSH servers use port 22).

In the code above, we are simply starting a server and setting it to listen for connections on port 18000. We don't need to pick a specific IP address, since the server runs on this machine - so we set it to IPAddress.Any, which means the server accepts connections arriving on any of the machine's network interfaces. A TcpListener is an actual server object: it allows us to accept connections from other machines and work with them. The TcpListener is started and then waits for a client to connect to it. When this happens, we obtain a TcpClient object. We can then talk to this client by obtaining its NetworkStream:

NetworkStream stream = client.GetStream();

We can use this network stream the same way we did with files:

using (StreamReader reader = new StreamReader(stream))
using (StreamWriter writer = new StreamWriter(stream))
{
    String line = reader.ReadLine();
    Console.WriteLine("Client said: {0}", line);
    writer.WriteLine(line);
}

What we do here is wait for a line of text to arrive from the client that connected earlier, and store it in the line variable. After showing what we received, we use the StreamWriter to send back the same line of text.

If you press F5 now, all you get is a blank window: the program isn't doing anything while waiting for a connection.

Start a new console application for the client. Again, make sure you are using the correct libraries:

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

Now, add code to connect to the server:

TcpClient client = new TcpClient("127.0.0.1", 18000);

The IP address 127.0.0.1 is special and means you are connecting on the same machine. If you are running the server on a different machine, you will need to change the IP address in the code above.

Next, we obtain the client's NetworkStream, as we did earlier for the server:

NetworkStream stream = client.GetStream();

We can now use it to talk to the server:

using (StreamReader reader = new StreamReader(stream))
using (StreamWriter writer = new StreamWriter(stream))
{
    Console.WriteLine("Write something to send to server:");
    String input = Console.ReadLine();
    writer.WriteLine(input);
    writer.Flush();

    String response = reader.ReadLine();
    Console.WriteLine("Server said: {0}", response);
}

Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);

When the user types something and presses ENTER, it is stored in the input variable. We then use writer.WriteLine(input) to send the input to the server.

The writer.Flush() is very important. If you leave it out, the program will be stuck and send nothing to the server. Like most input/output (I/O), network streams are buffered. That means they usually wait to have a certain amount of data before actually sending it out. The Flush() call forces the data to be sent.

Always remember: streams and toilets must always be flushed.
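To see the buffering in action without involving the network at all, here is a small sketch that writes to a MemoryStream instead of a NetworkStream. The class and method names are my own. It assumes the StreamWriter defaults (no automatic flushing, and a buffer much bigger than one short line), so nothing should reach the underlying stream until Flush() is called.

```csharp
using System;
using System.IO;

static class FlushDemo
{
    // Returns the number of bytes in the underlying stream before and after Flush().
    // (Hypothetical helper written for this illustration.)
    public static long[] Measure()
    {
        using (MemoryStream stream = new MemoryStream())
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.WriteLine("hello");
            long before = stream.Length; // still sitting in the writer's buffer

            writer.Flush();
            long after = stream.Length;  // now pushed out to the stream

            return new long[] { before, after };
        }
    }

    static void Main()
    {
        long[] result = Measure();
        Console.WriteLine("Before Flush: {0} bytes", result[0]);
        Console.WriteLine("After Flush: {0} bytes", result[1]);
    }
}
```

With a NetworkStream the idea is the same: until the writer is flushed, the text is waiting in memory and the other side has nothing to read.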

When the server sends back its response, we store it in the response variable, and show it in the console window. If you run this program now, you get the following exception:

Well duh, that's because the server is not running. So go back to your first (server) program and leave it running. Then, run the second (client) program:

Amazing! :D You have just managed to make two programs talk to each other. If you haven't already, try putting the server on one machine and the client on another (don't forget to change the IP address in the client).

What we have done here is an example of a simple protocol. A protocol consists of the rules by which computers talk to each other. In this case:

Client connects to server.
Client sends a line of text to server.
Server sends back that same line of text.

This is called an echo protocol, because the server echoes what the client says. Something of this sort is actually a standard echo protocol (RFC862) intended mostly for debugging.
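The three steps above can also be tried out in a single program, which is handy for quick experiments. The sketch below is my own arrangement, not the two-program setup from this article: it runs the server side on a background thread, uses port 0 so that the operating system picks a free port, and all the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading;

static class EchoRoundTrip
{
    // Server side: accept one client, read one line, send the same line back.
    static void ServeOnce(TcpListener server)
    {
        using (TcpClient client = server.AcceptTcpClient())
        using (StreamReader reader = new StreamReader(client.GetStream()))
        using (StreamWriter writer = new StreamWriter(client.GetStream()))
        {
            string line = reader.ReadLine();
            writer.WriteLine(line);
            writer.Flush();
        }
    }

    // Client side: connect, send a line, and return whatever the server answers.
    public static string Echo(string message)
    {
        TcpListener server = new TcpListener(IPAddress.Loopback, 0);
        server.Start();
        int port = ((IPEndPoint)server.LocalEndpoint).Port;

        Thread serverThread = new Thread(() => ServeOnce(server));
        serverThread.Start();

        string response;
        using (TcpClient client = new TcpClient("127.0.0.1", port))
        using (StreamReader reader = new StreamReader(client.GetStream()))
        using (StreamWriter writer = new StreamWriter(client.GetStream()))
        {
            writer.WriteLine(message);   // step 2: client sends a line
            writer.Flush();
            response = reader.ReadLine(); // step 3: server sends it back
        }

        serverThread.Join();
        server.Stop();
        return response;
    }

    static void Main()
    {
        Console.WriteLine("Server said: {0}", Echo("hello"));
    }
}
```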

Naturally, what we did here is very simple. Many standard protocols, such as IMAP (used for email), can get very complicated. Also, you'll notice that a new server must be run in order to handle each new client. We'll deal with this another time. Finally, if you have been following the above code carefully, you'll notice that I didn't Flush() the stream in the server program, even though I was writing data to it. That's because the StreamWriter is automatically closed at the end of the using statement. When that happens, any data still in its buffer is flushed, so we don't need to do that in code.

As you can see, it is very easy to write network programs in C# (not so much in other languages, such as C). Stick around, because there is much more to learn about network programming, and I will be writing several other articles on the topic that go into more detail and show you how to do certain things (e.g. download email or webpages).

Gigi Labs