Linux: Using tar

  • In the UNIX world, combining many files together into a single file is referred to as "archiving" and the granddaddy utility for this is the venerable "tape archiver", known as tar. The tar utility started life as a means of taking a file system and preparing it for writing to tape, typically for use as a backup. This functionality is roughly analogous to the Zip utility on Windows, but without the compression element. Tape drives typically do their own compression, so software compression is often skipped; when it is needed, the UNIX modular approach suggests that we then compress separately, as a second step.

    Jargon: The resulting file made by the tar utility is known as a tarball.

    Using the tar command, we use the -c flag to denote a Create operation and the -x flag to denote eXtraction. Almost universally we add the -f flag, followed by a file name, because we want to work with an archive file rather than a tape device. And it is common for -v to be added to increase verbosity so that we can see what the command is doing. The tar utility is a massive, feature-rich utility with a great number of options. But nearly all everyday uses look the same, so we will cover that essential use here and delve into advanced uses of tar at a later time. For now, we will use tar simply as a means to ingest files and output them as a single file, which is very handy.

    Unlike the compression utilities, tar does not do in-place archiving, meaning the original files remain when we are done; this process is non-destructive. However, unlike a compression operation that reduces disk usage by taking a large file off of our disk and replacing it with a smaller, compressed copy, an archiving operation makes a second copy of the data on disk and so can easily fill up a disk that does not have enough free space on it.
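
    To make the non-destructive behaviour concrete, here is a minimal sketch. The scratch directory, the project directory and the notes.txt file are hypothetical names invented for the illustration, and the -C flag (change into a directory before archiving) is not otherwise covered in this article.

```shell
# Hypothetical scratch area: mktemp -d creates a fresh temporary directory.
work=$(mktemp -d)
mkdir "$work/project"
echo "hello" > "$work/project/notes.txt"

# Archive the directory. -C changes into $work first, so the archive
# stores the relative path "project" rather than an absolute one.
tar -cf "$work/project.tar" -C "$work" project

# The original file is still present alongside the new archive.
ls "$work/project/notes.txt" "$work/project.tar"

# Both the originals and the archive now consume disk space (sizes in KB).
du -sk "$work/project" "$work/project.tar"
```

    After this runs, the data exists twice: once in the original directory and once inside the tarball.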

    The standard tar command usage looks like this:

    # tar -cvf /tmp/archivedfile.tar /directory/to/archive
    # tar -xvf /tmp/archivedfile.tar

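    The first command takes the directory located at /directory/to/archive and archives it to the single file named /tmp/archivedfile.tar. The second command extracts the contents of that archive. As with most UNIX commands, tar itself does not require any particular file extension for these operations; the .tar suffix is a convention that tells people, and some other tools, what the file contains, so it is worth following.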
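
    The create and extract pair can be tried safely as a round trip in temporary directories. This is only a sketch: the directory and file names are made up for the example, and it also uses two flags not covered above, -t (list the contents of an archive) and -C (change directory before operating).

```shell
src=$(mktemp -d)     # hypothetical source location
dest=$(mktemp -d)    # hypothetical restore location
mkdir -p "$src/data/sub"
echo "one" > "$src/data/a.txt"
echo "two" > "$src/data/sub/b.txt"

# Create the archive from the data directory.
tar -cf "$src/data.tar" -C "$src" data

# List what the archive contains without extracting anything.
tar -tf "$src/data.tar"

# Extract into the restore location.
tar -xf "$src/data.tar" -C "$dest"

# Confirm that the restored tree matches the original.
diff -r "$src/data" "$dest/data" && echo "round trip OK"
```

    Listing an archive with -t before extracting is a cheap way to see where its files will land.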

    Now let's set the theory aside and try a real-world example. It is not uncommon to use tar to grab an archive of the /etc configuration directory. So let's do that and see what we get:

    # tar -cvf /tmp/etc_backup.tar /etc

    When you run a command like this with the -v flag you are going to get a lot of output. Note that /etc can only be fully archived by the root user, because some of the files in it are not readable by ordinary users. I will put a full example into the comments to show what a full output looks like for reference.

    It would then be common to compress the resulting tar file to make the entire thing smaller, mimicking the behaviour of the Zip utility on Windows. But this is purely optional.

    It was, in fact, so common for tar to be combined with utilities such as gzip that support for them was "built into" tar to make the process easier. Today we have the -z flag to leverage the gzip utility, -j to leverage bzip2 and -J to leverage xz. For the most part, however, people only use -z commonly.
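
    As a rough illustration of what -z buys us, the sketch below archives the same directory twice, once without and once with gzip compression, and prints both sizes. The logs directory and app.log file are hypothetical, and the deliberately repetitive content compresses far better than typical real data would.

```shell
work=$(mktemp -d)
mkdir "$work/logs"

# Write 5000 identical lines; highly repetitive text compresses very well.
i=0
while [ "$i" -lt 5000 ]; do
    echo "the same line over and over"
    i=$((i + 1))
done > "$work/logs/app.log"

# The same directory archived without and with gzip compression.
tar -cf  "$work/logs.tar" -C "$work" logs
tar -czf "$work/logs.tgz" -C "$work" logs

plain=$(wc -c < "$work/logs.tar")
gz=$(wc -c < "$work/logs.tgz")
echo "plain: $plain bytes, gzipped: $gz bytes"
```

    The same pattern should work with -j or -J in place of -z, provided bzip2 or xz is installed on the system.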

    So our initial command examples would be then modified to look like this:

    # tar -czvf /tmp/archivedfile.tgz /directory/to/archive
    # tar -xzvf /tmp/archivedfile.tgz

    Notice that a compressed tarball has a .tgz extension (also commonly written .tar.gz) to let users know that it is both tarred and gzipped. You can uncompress a compressed tarball separately simply by using gunzip to turn the compressed .tgz file into an uncompressed .tar file.
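
    The separate decompression step can be sketched as follows. The site directory and index.html are hypothetical names for the example. Note that gunzip works in place, so the .tgz file is replaced by the .tar file rather than kept next to it.

```shell
work=$(mktemp -d)
mkdir "$work/site"
echo "<p>hi</p>" > "$work/site/index.html"

# Build a gzip-compressed tarball.
tar -czf "$work/site.tgz" -C "$work" site

# gunzip recognises the .tgz suffix and leaves site.tar in its place.
gunzip "$work/site.tgz"
ls "$work"

# The result is a plain tarball that tar can read without the -z flag.
tar -tf "$work/site.tar"
```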

    The tar utility is one of the staples of UNIX systems and is very powerful and flexible. You will use it often.

    Part of a series on Linux Systems Administration by Scott Alan Miller