Gigi Labs

Sunday, October 6, 2013

C#: Extracting Zipped Files with SharpZipLib

Hello there! :)

In this article, we're going to learn how to extract files from a .zip file using a library called SharpZipLib.
We will also learn how to work with third-party libraries that provide functionality we don't get by default: SharpZipLib is not part of the .NET Framework, and is developed by an independent team.

To get started, we need to create a new Console Application in SharpDevelop (or Visual Studio, if you prefer), and also download SharpZipLib from its website. When you extract the contents of the .zip file you downloaded, you'll find different versions of ICSharpCode.SharpZipLib.dll for different .NET versions. That's the library that will allow us to work with .zip files, and we need the one in the net-20 folder (the SharpZipLib download page says that's the one for .NET 4.0, which this article is based on).

Now, to use ICSharpCode.SharpZipLib.dll, we need to add a reference to it from our project. We've done this before in my article "C#: Unit Testing with SharpDevelop and NUnit". You need to right-click on the project and select "Add Reference":


In the window that comes up, select the tab called ".NET Assembly Browser", since we want to reference a third party .dll file. Open the ICSharpCode.SharpZipLib.dll file, click the "OK" button in the "Add Reference" dialog, and you're ready to use SharpZipLib.


In fact, you will now see it listed among the project's references in the Projects window (on the right in the screenshot below):


Great, now how do we use it?

This is where our old friend, Intellisense, comes in. You might recall how we used it to discover things you can do with strings in one of my earliest articles here, "C# Basics: Working with Strings". This applies equally well here: as you begin to type your using statement to import the functionality you need from the library, Intellisense suggests the namespace for you:


Now SharpZipLib has a lot of functionality that allows us to work with .zip files, .tar.gz files and more. In our case we just want to experiment with .zip files, so we're fine with the following:

using ICSharpCode.SharpZipLib.Zip;

That thing is called a namespace: it contains a set of classes. If you type it into your Main() method and type in a dot (.) after it, you'll get a list of classes in it:


A namespace is used to categorise a set of related classes so that they can't be confused with other classes with the same name. Java's Vector class (a kind of resizable array) is a typical example. If you create your own Vector class to represent a mathematical vector, then you might run into a naming conflict. However, since Java's Vector lives in java.util (Java calls its namespaces "packages"), its full name is actually java.util.Vector, and the two classes can be told apart.

This works the same way in C#. The List class you've been using all along is actually called System.Collections.Generic.List<T>. We usually don't want to have to write all that, which is why we put in a using statement at the top with the namespace.
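To make the naming conflict concrete, here is a minimal sketch. The Containers and Maths namespaces and their Vector classes are made up purely for illustration; only System.Collections.Generic.List<T> is a real framework type.

```csharp
using System;

namespace Containers
{
    // Hypothetical class: stands in for a resizable array.
    public class Vector
    {
        public string Describe() { return "a resizable array"; }
    }
}

namespace Maths
{
    // Hypothetical class: stands in for a mathematical vector.
    public class Vector
    {
        public string Describe() { return "a mathematical vector"; }
    }
}

public static class NamespaceDemo
{
    // Both classes are called Vector, but their full names differ,
    // so the compiler always knows which one we mean.
    public static string DescribeBoth()
    {
        Containers.Vector a = new Containers.Vector();
        Maths.Vector b = new Maths.Vector();
        return a.Describe() + " / " + b.Describe();
    }

    public static void Main()
    {
        Console.WriteLine(DescribeBoth());

        // A framework class used by its full name: no using statement needed.
        System.Collections.Generic.List<int> numbers =
            new System.Collections.Generic.List<int>();
        numbers.Add(42);
        Console.WriteLine(numbers.Count);
    }
}
```

If we added using Containers; at the top, we could write just Vector for the first class, but we would still need Maths.Vector for the second.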

When we're working with a new namespace, however, typing the full name and using Intellisense allows us to discover what that namespace contains, without the need to look at documentation. In the screenshot above, we can already guess that ZipFile is probably the class we need to work with .zip files.

Intellisense also helps us when working with methods, constructors and properties:


I suppose you get the idea by now. Let's finally actually get something working. To try this out, I'm going to create a zip file with the following structure:

+ test1.txt
+ folder
    + test2.txt
    + test3.txt

I've used WinRAR to create the zip file, but you can use anything you like. I named it "zipfile.zip" and put it in C:\ (you might need administrator privileges to put it there... otherwise put it wherever you like). Now, we can easily obtain a list of files and folders in the .zip file with the following code:

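        public static void Main(string[] args)
        {
            // In a verbatim string (@"..."), backslashes are not escaped,
            // so a single backslash is what we want here.
            using (ZipFile zipFile = new ZipFile(@"C:\zipfile.zip"))
            {
                // Each entry is a file or a folder inside the .zip file.
                foreach (ZipEntry entry in zipFile)
                {
                    Console.WriteLine(entry.Name);
                }
            }

            // Keep the console window open until Enter is pressed.
            Console.ReadLine();
        }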

This gives us:


We use the using keyword to close the .zip file once we're done - something we've been doing since my article "C#: Working with Streams". You realise you need this whenever you see either a Dispose() or a Close() method in Intellisense. We are also looping over the zipFile itself - you realise you can do a foreach when you see a GetEnumerator() method in Intellisense. Each iteration over the zipFile gives us a ZipEntry instance, which contains information about one item in the .zip file. As you can see in the output above, entries comprise not just files, but also folders.
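If you're curious why Dispose() and GetEnumerator() are the hints to look for, the sketch below writes out by hand roughly what using and foreach do for us. FakeArchive is a made-up stand-in for ZipFile so that the example runs without any .zip file; it is not part of SharpZipLib.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for ZipFile: it can be disposed and enumerated.
public sealed class FakeArchive : IDisposable, IEnumerable<string>
{
    private readonly List<string> names =
        new List<string> { "test1.txt", "folder/" };

    public bool IsDisposed { get; private set; }

    public void Dispose() { IsDisposed = true; }

    public IEnumerator<string> GetEnumerator() { return names.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class UsingForeachDemo
{
    // Approximately what "using (archive) foreach (string name in archive)"
    // boils down to when written out by hand.
    public static List<string> ListByHand(FakeArchive archive)
    {
        List<string> seen = new List<string>();
        try
        {
            IEnumerator<string> enumerator = archive.GetEnumerator();
            try
            {
                while (enumerator.MoveNext())
                    seen.Add(enumerator.Current);
            }
            finally
            {
                enumerator.Dispose();
            }
        }
        finally
        {
            // This is the part that "using" guarantees, even if an exception is thrown.
            archive.Dispose();
        }
        return seen;
    }

    public static void Main()
    {
        FakeArchive archive = new FakeArchive();
        foreach (string name in ListByHand(archive))
            Console.WriteLine(name);
        Console.WriteLine(archive.IsDisposed);
    }
}
```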

Since we want to extract files, folders are of no interest to us. We can use the IsFile property to deal only with files:

                    if (entry.IsFile)
                        Console.WriteLine(entry.Name);

In order to extract the files, I'm going to change the code as follows:

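        public static void Main(string[] args)
        {
            // Single backslash: this is a verbatim string.
            using (ZipFile zipFile = new ZipFile(@"C:\zipfile.zip"))
            {
                foreach (ZipEntry entry in zipFile)
                {
                    // Skip folder entries; we only extract files.
                    if (entry.IsFile)
                    {
                        Console.WriteLine("Extracting {0}", entry.Name);

                        // Disposing the reader also closes the underlying stream.
                        Stream stream = zipFile.GetInputStream(entry);
                        using (StreamReader reader = new StreamReader(stream))
                        {
                            // Drop any folder part, e.g. "folder/test2.txt" -> "test2.txt".
                            string filename = entry.Name;
                            if (filename.Contains("/"))
                                filename = Path.GetFileName(filename);

                            // Text-based copy: fine for our .txt files, but it would
                            // corrupt binary files such as images.
                            using (StreamWriter writer = File.CreateText(filename))
                            {
                                writer.Write(reader.ReadToEnd());
                            }
                        }
                    }
                }
            }

            Console.ReadLine();
        }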

Note that I also added the following to work with File and Path:

using System.IO;

Extracting files involves a bit of work with streams. The zipFile's GetInputStream() method gives you a stream for a particular entry (file in the .zip file), which you can then read with a StreamReader as if you're reading a normal file.

I added a bit of code to handle cases when files are in folders in the .zip file - I am finding them by looking for the "/" directory separator in the entry name, and then extracting only the filename using Path.GetFileName(). [In practice you might have files with the same name in different folders, so you'd need to actually recreate the folders and put the files in the right folders, but I'm trying to keep things simple here.]
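For the curious, here is a rough sketch of what recreating the folders might look like. It uses the same ZipFile, ZipEntry and GetInputStream() we have already seen; the FolderAwareExtractor class, its method names and the "extracted" output folder are my own choices for illustration. The check in ResolveDestination() is an extra precaution against entry names such as "../evil.txt" that would otherwise land outside the output folder.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class FolderAwareExtractor
{
    // Maps an entry name such as "folder/test2.txt" to a full path under targetRoot.
    // Throws if the entry would end up outside targetRoot.
    public static string ResolveDestination(string targetRoot, string entryName)
    {
        string root = Path.GetFullPath(targetRoot);
        string separator = Path.DirectorySeparatorChar.ToString();
        if (!root.EndsWith(separator, StringComparison.Ordinal))
            root += separator;

        // Entry names use "/" regardless of the operating system.
        string relative = entryName.Replace('/', Path.DirectorySeparatorChar);
        string destination = Path.GetFullPath(Path.Combine(root, relative));

        // Case-insensitive comparison, which suits Windows file systems.
        if (!destination.StartsWith(root, StringComparison.OrdinalIgnoreCase))
            throw new InvalidOperationException(
                "Entry escapes the target folder: " + entryName);

        return destination;
    }

    public static void ExtractAll(string zipPath, string targetRoot)
    {
        using (ZipFile zipFile = new ZipFile(zipPath))
        {
            foreach (ZipEntry entry in zipFile)
            {
                if (!entry.IsFile)
                    continue;

                string destination = ResolveDestination(targetRoot, entry.Name);

                // Create "folder" (and any parents) before writing into it.
                Directory.CreateDirectory(Path.GetDirectoryName(destination));

                using (Stream input = zipFile.GetInputStream(entry))
                using (FileStream output = File.Create(destination))
                {
                    // Stream.CopyTo() is available from .NET 4.0 onwards;
                    // it copies raw bytes, so binary files survive intact.
                    input.CopyTo(output);
                }
            }
        }
    }

    public static void Main()
    {
        // Hypothetical paths: adjust to wherever your .zip file lives.
        ExtractAll(@"C:\zipfile.zip", "extracted");
    }
}
```

With the test .zip file from earlier, this should leave test1.txt directly in the extracted folder and test2.txt and test3.txt in extracted\folder.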

Finally, we read the contents of the entry using reader.ReadToEnd(), and write it to an appropriately named text file. If you run this program and go to your project's bin\Debug folder in Windows Explorer, you should see the test1.txt, test2.txt and test3.txt files with their proper contents. [Again, the proper way to deal with streams is to read chunks into a buffer and then write the file from it, but I'm using reader.ReadToEnd() for the sake of simplicity.]
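As a minimal sketch of that chunk-by-chunk approach, the helper below copies any stream to any other stream through a fixed-size buffer. The BufferedCopy class and CopyInChunks() method are names I made up; the example uses MemoryStream so that it runs on its own, but the same method should work with the stream from GetInputStream() and a FileStream from File.Create().

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferedCopy
{
    // Reads up to bufferSize bytes at a time from input and writes them to output.
    // Returns the total number of bytes copied.
    public static long CopyInChunks(Stream input, Stream output, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;

        // Read() returns 0 once the end of the stream is reached.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first "read" bytes of the buffer are valid.
            output.Write(buffer, 0, read);
            total += read;
        }

        return total;
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("contents of test1.txt");

        using (MemoryStream input = new MemoryStream(data))
        using (MemoryStream output = new MemoryStream())
        {
            // A tiny buffer forces several iterations of the loop.
            long copied = CopyInChunks(input, output, 4);

            Console.WriteLine("Copied {0} bytes", copied);
            Console.WriteLine(Encoding.UTF8.GetString(output.ToArray()));
        }
    }
}
```

Because this works on bytes rather than text, it never holds the whole file in memory and it doesn't care whether the file is text or binary.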

Excellent! In this article, we have learned to list and extract files from a .zip file. We also learned why namespaces are important. But most importantly, we have looked at how to reference third party .dlls and discover how to use them based only on hints from Intellisense and our own experience. In fact, the above code was written without consulting any documentation whatsoever, solely by observing the Intellisense for SharpZipLib. While it is usually easier to just find an example on the internet (possibly in some documentation), you'll find that this is a great skill to have when documentation is not readily available.

If you found this interesting, be sure to check back for future articles in which I will be covering other useful topics!
