Linux:5 Files and Directories

Files and Directories

This chapter discusses the basic tools for manipulating files and directories--tools that are among the most essential on a Linux system.

A file is a collection of data that is stored on disk and that can be manipulated as a single unit by its name.

A directory is a file that acts as a folder for other files. A directory can also contain other directories (subdirectories); a directory that contains another directory is called the parent directory of the directory it contains.

A directory tree includes a directory and all of its files, including the contents of all subdirectories. (Each directory is a "branch" in the "tree.") A slash character alone (`/') is the name of the root directory at the base of the directory tree hierarchy; it is the trunk from which all other files or directories branch.

To represent a directory's place in the file hierarchy, specify all of the directories between it and the root directory, using a slash (`/') as the delimiter to separate directories. So the directory `dict' as it appears in the preceding illustration would be represented as `/usr/dict'.

Each user has a branch in the `/home' directory for their own files, called their home directory. The hierarchy in the previous illustration has two home directories: `joe' and `jon', both subdirectories of `/home'.

When you are in a shell, you are always in a directory on the system, and that directory is called the current working directory. When you first log in to the system, your home directory is the current working directory.

Whenever specifying a file name as an argument to a tool or application, you can give the slash-delimited path name relative to the current working directory. For example, if `/home/joe' is the current working directory, you can use work to specify the directory `/home/joe/work', and work/schedule to specify `schedule', a file in the `/home/joe/work' directory.

Every directory has two special files whose names consist of one and two periods: `..' refers to the parent of the current working directory, and `.' refers to the current working directory itself. If the current working directory is `/home/joe', you can use `.' to specify `/home/joe' and `..' to specify `/home'. Furthermore, you can specify the `/home/jon' directory as ../jon.
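
The relative names described above can be tried out safely in a scratch directory. The following is only a sketch: the `home/joe' and `home/jon' directories are created under a temporary directory (the variable name `demo' is an assumption made for this example), and the prompt and RET are left out so that the lines can be run as a script.

```shell
demo=$(mktemp -d)                    # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT           # remove it when the shell exits
mkdir -p "$demo/home/joe/work" "$demo/home/jon"
touch "$demo/home/joe/work/schedule"

cd "$demo/home/joe" || exit 1
here=$(cd . && pwd)                  # `.' is the current working directory
parent=$(cd .. && pwd)               # `..' is its parent
jon_dir=$(cd ../jon && pwd)          # a sibling, reached through the parent
ls work/schedule                     # a file named relative to the current directory
printf '%s\n' "$here" "$parent" "$jon_dir"
```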

Another way to specify a file name is to give a slash-delimited list of all of the directory branches from the root directory (`/') down to the file. This unique, specific path from the root directory to a file is called the file's full path name. (When referring to a file that is not a directory, this is sometimes called the absolute file name.)

You can specify any file or directory on the system by giving its full path name. A file can have the same name as other files in different directories on the system, but no two files or directories can share a full path name. For example, user joe can have a file `schedule' in his `/home/joe/work' directory and a file `schedule' in his `/home/joe/play' directory. While both files have the same name (`schedule'), they are contained in different directories, and each has a unique full path name---`/home/joe/work/schedule' and `/home/joe/play/schedule'.

However, you don't have to type the full path name of a tool or application in order to start it. The shell keeps a list of directories, called the path, where it searches for programs. If a program is "in your path," or in one of these directories, you can run it simply by typing its name.

By default, the path includes `/bin' and `/usr/bin'. For example, the who command is in the `/usr/bin' directory, so its full path name is /usr/bin/who. Since the `/usr/bin' directory is in the path, you can type who to run /usr/bin/who, no matter what the current working directory is.
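
To see the path and how the shell resolves a name against it, something like the following should work in most shells. It uses ls rather than who only because ls is certain to be installed; the variable name `ls_path' is an assumption of this sketch.

```shell
# Print each directory in the path on its own line.
printf '%s\n' "$PATH" | tr ':' '\n'

# Ask the shell which file it would run for a given name.
ls_path=$(command -v ls)
echo "ls is $ls_path"

# Running a tool by its full path name works from any directory.
"$ls_path" / >/dev/null && echo "ran $ls_path by its full path name"
```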

The following table describes some of the standard directories on Linux systems.

DIRECTORY - DESCRIPTION

/ - The ancestor of all directories on the system; all other directories are subdirectories of this directory, either directly or through other subdirectories.

/bin - Essential tools and other programs (or binaries).

/dev - Files representing the system's various hardware devices. For example, you use the file `/dev/cdrom' to access the CD-ROM drive.

/etc - Miscellaneous system configuration files, startup files, etcetera.

/home - The home directories for all of the system's users.

/lib - Essential system library files used by tools in `/bin'.

/proc - Files that give information about current system processes.

/root - The home directory of the superuser, whose username is root. (In the past, the home directory for the superuser was simply `/'; later, `/root' was adopted for this purpose to reduce clutter in `/'.)

/sbin - Essential system administrator tools, or system binaries.

/tmp - Temporary files.

/usr - Subdirectories with files related to user tools and applications.

/usr/X11R6 - Files relating to the X Window System, including those programs (in `/usr/X11R6/bin') that run only under X.

/usr/bin - Tools and applications for users.

/usr/dict - Dictionaries and word lists (slowly being outmoded by `/usr/share/dict').

/usr/doc - Miscellaneous system documentation.

/usr/games - Games and amusements.

/usr/info - Files for the GNU Info hypertext system.

/usr/lib - Libraries used by tools in `/usr/bin'.

/usr/local - Local files--files unique to the individual system--including local documentation (in `/usr/local/doc') and programs (in `/usr/local/bin').

/usr/man - The online manuals, which are read with the man command.

/usr/share - Data for installed applications that is architecture-independent and can be shared between systems. A number of subdirectories with equivalents in `/usr' also appear here, including `/usr/share/doc', `/usr/share/info', and `/usr/share/icons'.

/usr/src - Program source code for software compiled on the system.

/usr/tmp - Another directory for temporary files.

/var - Variable data files, such as spool queues and log files.

Naming Files and Directories

File names can consist of upper- and lowercase letters, numbers, periods (`.'), hyphens (`-'), and underscores (`_'). File names are also case sensitive---`foo', `Foo' and `FOO' are all different file names. By convention, most file names are written in all lowercase letters.

Linux does not force you to use file extensions, but it is convenient and useful to give files proper extensions, since they will help you to identify file types at a glance. You can have files with multiple extensions, such as `long.file.with.many.extensions', and you can have files with none at all, such as `myfile'. A JPEG image file, for example, does not have to have a `.jpg' or `.jpeg' extension, and program files do not need a special extension to make them work.

The file name before any file extensions is called the base file name. For example, the base file name of `house.jpeg' is `house'.
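
The shell itself can split a name into its base file name and extension. This sketch uses standard parameter expansion and the basename tool; the variable names are assumptions made for the example.

```shell
name=house.jpeg
base=${name%.*}      # remove the last period and what follows it: house
ext=${name##*.}      # remove everything up to the last period: jpeg
echo "$base $ext"

# With several extensions, %.* removes only the last one; %%.* removes them all.
long=long.file.with.many.extensions
echo "${long%.*}"    # long.file.with.many
echo "${long%%.*}"   # long

# basename strips leading directories and a given extension.
basename /home/joe/house.jpeg .jpeg    # house
```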

Some commonly used file extensions are shown in the following table, including extensions for text and graphics files.

EXTENSION - DESCRIPTION

.txt or .text - Plain, unformatted text.

.tex - Text formatted in the TeX or LaTeX formatting language.

.ltx or .latex - Text formatted in the LaTeX formatting language (neither is as common as just using `.tex').

.gz - A file compressed with gzip.

.sgml - SGML ("Standard Generalized Markup Language") format.

.html - HTML ("Hypertext Markup Language") format.

.xml - XML ("Extensible Markup Language") format.

Making an Empty File

You may sometimes want to create a new, empty file as a kind of "placeholder." To do so, give the name that you want to use for the file as an argument to touch.

  • To create the file `a_fresh_start' in the current directory, type:

    $ touch a_fresh_start RET
  • To create the file `another_empty_file' in the `work/completed' subdirectory of the current directory, type:

    $ touch work/completed/another_empty_file RET

This tool "touches" the files you give as arguments. If a file does not exist, it creates it; if the file already exists, it changes the modification timestamp on the file to the current date and time, just as if you had used the file.

NOTE: Often, you make a file when you edit it, such as when in a text or image or sound editor; in that case, you don't need to make the file first.
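
Both behaviors of touch can be checked in a scratch directory. In this sketch the `-t' option (which sets an explicit timestamp) is used only to backdate the files so that the later change is visible; the file `reference' and the variable names are assumptions of the example.

```shell
demo=$(mktemp -d)                         # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

touch a_fresh_start                       # no such file yet, so it is created
[ -f a_fresh_start ] && [ ! -s a_fresh_start ] && echo "created, and empty"

touch -t 200101010000 a_fresh_start       # backdate it to 1 Jan 2001
touch -t 200201010000 reference           # a second file dated 1 Jan 2002
before=$(find a_fresh_start -newer reference)   # empty: the file is older
touch a_fresh_start                       # existing file: only the timestamp changes
after=$(find a_fresh_start -newer reference)    # now lists a_fresh_start
echo "before: '$before'  after: '$after'"
```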

Making a Directory

Use mkdir ("make directory") to make a new directory, giving the path name of the new directory as an argument. Directory names follow the same conventions as used with other files--that is, no spaces, slashes, or other unusual characters are recommended.

  • To make a new directory called `work' in the current working directory, type:

    $ mkdir work RET
  • To make a new directory called `work' in the `/tmp' directory, type:

    $ mkdir /tmp/work RET

Making a Directory Tree

Use mkdir with the `-p' option to make a subdirectory and any of its parents that do not already exist. This is useful when you want to make a fairly complex directory tree from scratch, and don't want to have to make each directory individually.

  • To make the `work/completed/2001' directory--a subdirectory of the `completed' directory, which in turn is a subdirectory of the `work' directory in the current directory--type:

    $ mkdir -p work/completed/2001 RET

This makes a `2001' subdirectory in the directory called `completed', which in turn is in a directory called `work' in the current directory; if the `completed' or the `work' directories do not already exist, they are made as well (if you know that `work' and `completed' both exist, the above command works fine without the `-p' option).
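
The difference the `-p' option makes can be seen by trying the command both ways in an empty scratch directory. This is a sketch only; the variable `plain' simply records whether the first attempt succeeded.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Without -p the parent directories must already exist, so this fails.
if mkdir work/completed/2001 2>/dev/null; then
    plain=ok
else
    plain=failed
fi
mkdir -p work/completed/2001       # makes work, work/completed, and 2001
mkdir -p work/completed/2001       # repeating it is harmless
echo "plain mkdir: $plain"
```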

Changing Directories

Use cd to change the current working directory; give the name of the new directory as an argument.

  • To change the current working directory to `work', a subdirectory in the current directory, type:

    $ cd work RET
  • To change to the current directory's parent directory, type:

    $ cd .. RET

You can also give the full path name of a directory.

  • To change the current working directory to `/usr/doc', type:

    $ cd /usr/doc RET

This command makes `/usr/doc' the current working directory.

Changing to Your Home Directory

With no arguments, cd makes your home directory the current working directory.

  • To make your home directory the current working directory, type:

    $ cd RET

Changing to the Last Directory You Visited

To return to the last directory you were in, use cd and give `-' as the directory name. For example, if you are in the `/home/mrs/work/samples' directory, and you use cd to change to some other directory, then at any point while you are in this other directory you can type cd - to return the current working directory to `/home/mrs/work/samples'.

  • To return to the directory you were last in, type:

    $ cd - RET
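
The shell remembers the previous directory in the variable OLDPWD, which is what `cd -' uses. A short sketch, with directory names invented for the example:

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/work/samples" "$demo/elsewhere"

cd "$demo/work/samples" || exit 1
first=$PWD
cd "$demo/elsewhere" || exit 1
echo "previous directory: $OLDPWD"
cd - >/dev/null || exit 1          # `cd -' also prints the directory it returns to
back=$PWD
echo "now in: $back"
```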

Getting the Name of the Current Directory

To determine what the current working directory is, use pwd ("print working directory"), which lists the full path name of the current working directory.

  • To determine what the current working directory is, type:

    $ pwd RET
    /home/mrs
    $

In this example, pwd output the text `/home/mrs', indicating that the current working directory is `/home/mrs'.

Listing Directories

Use ls to list the contents of a directory. It takes as arguments the names of the directories to list. With no arguments, ls lists the contents of the current working directory.

  • To list the contents of the current working directory, type:

    $ ls RET
    apple   cherry  orange
    $

In this example, the current working directory contains three files: `apple', `cherry', and `orange'.

  • To list the contents of `work', a subdirectory in the current directory, type:

    $ ls work RET
  • To list the contents of the `/usr/doc' directory, type:

    $ ls /usr/doc RET

You cannot discern file types from the default listing; directories and executables are indistinguishable from all other files. Using the `-F' option, however, tells ls to place a `/' character after the names of subdirectories and a `*' character after the names of executable files.

  • To list the contents of the directory so that directories and executables are distinguished from other files, type:

    $ ls -F RET
    repeat* test1   test2   words/
    $

In this example, the current directory contains an executable file named `repeat', a directory named `words', and some other files named `test1' and `test2'.
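
A listing like the preceding one can be reproduced in a scratch directory. The file names follow the example above; note that when its output is not going to a terminal, ls writes one name per line.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

touch repeat test1 test2
chmod +x repeat                    # make `repeat' executable
mkdir words
listing=$(ls -F)
echo "$listing"
```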

Another way to list the contents of directories--and one I use all the time, when I'm in X and when I also want to look at image files in those directories--is to use Mozilla or some other Web browser as a local file browser. Use the prefix file:/ to view local files. Alone, it opens a directory listing of the root directory; file:/home/joe opens a directory listing of user joe's home directory, file:/usr/local/src opens the local source code directory, and so on. Directory listings will be rendered in HTML on the fly in almost all browsers, so you can click on subdirectories to traverse to them, and click on files to open them in the browser.

Yet another way to list the contents of directories is to use a "file manager" tool, of which there are at least a few on Linux; the most popular of these is probably the "Midnight Commander," or mc.

The ls tool is one of the most often used file commands on Unix-like systems. The following subsections describe some commonly used options for controlling which files ls lists and what information about those files ls outputs.

Listing File Attributes

Use ls with the `-l' ("long") option to output a more extensive directory listing--one that contains each file's size in bytes, last modification time, file type, and ownership and permissions.

  • To output a verbose listing of the `/usr/doc/bash' directory, type:

    $ ls -l /usr/doc/bash RET
    total 72
    -rw-r--r--   1 root    root     13744 Oct 19 22:57 CHANGES.gz
    -rw-r--r--   1 root    root      1816 Oct 19 22:57 COMPAT.gz
    -rw-r--r--   1 root    root     16398 Oct 19 22:57 FAQ.gz
    -rw-r--r--   1 root    root      2928 Oct 19 22:57 INTRO.gz
    -rw-r--r--   1 root    root      4751 Oct 19 22:57 NEWS.gz
    -rw-r--r--   1 root    root      1588 Oct 19 22:57 POSIX.NOTES.gz
    -rw-r--r--   1 root    root      2718 Oct 19 22:57 README.Debian.gz
    -rw-r--r--   1 root    root     19596 Oct 19 22:57 changelog.gz
    -rw-r--r--   1 root    root      1446 Oct 19 22:57 copyright
    drwxr-xr-x   9 root    root      1024 Jul 25  1997 examples
    $

This command outputs a verbose listing of the files in `/usr/doc/bash'. The first line of output gives the total amount of disk space, in 1024-byte blocks, that the files take up (in this example, 72). Each subsequent line displays several columns of information about one file.

The first column displays the file's type and permissions. The first character in this column specifies the file type; the hyphen (`-') is the default and means that the file is a regular file. Directories are denoted by `d', and symbolic links are denoted by `l'. The remaining nine characters of the first column show the file permissions. The second column lists the number of hard links to the file. The third and fourth columns give the names of the user and group that the file belongs to. The fifth column gives the size of the file in bytes, the sixth column gives the date and time of last modification, and the last column gives the file name.
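
To make the column layout concrete, the following sketch splits two lines of the sample listing into their fields. The function name `describe' is invented for this example, and the approach is only illustrative: it breaks on file names that contain spaces, so it is not a robust way to process ls output.

```shell
describe() {
    # Split one `ls -l' line on whitespace into $1, $2, ... $9.
    set -- $1
    case $1 in
        d*) kind="directory" ;;
        l*) kind="symbolic link" ;;
        -*) kind="regular file" ;;
        *)  kind="other" ;;
    esac
    perms=${1#?}                   # the nine permission characters
    echo "$9: $kind, permissions $perms, $2 link(s), owner $3, group $4, $5 bytes, modified $6 $7 $8"
}

describe '-rw-r--r--   1 root    root     13744 Oct 19 22:57 CHANGES.gz'
describe 'drwxr-xr-x   9 root    root      1024 Jul 25  1997 examples'
```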

Listing Directories Recursively

Use the `-R' option to list a directory recursively, which outputs a listing of that directory and all of its subdirectories.

  • To output a recursive directory listing of the current directory, type:

    $ ls -R RET
    play    work
    
    play:
    notes
    
    work:
    notes
    $

In this example, the current working directory contains two subdirectories, `work' and `play', and no other files. Each subdirectory contains a file called `notes'.

  • To list all of the files on the system, type:

    $ ls -R / RET

This command recursively lists the contents of the root directory, `/', and all of its subdirectories. It is common to combine this with the attribute option, `-l', to output a verbose listing of all the files on the system:

    $ ls -lR / RET

NOTE: You can't list the contents of some directories on the system if you don't have permission to do so.

Listing Newest Files First

Use the `-t' option with ls to sort a directory listing so that the newest files are listed first.

  • To list all of the files in the `/usr/tmp' directory sorted with newest first, type:

    $ ls -t /usr/tmp RET
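
The sort order is easy to verify with files whose timestamps are known. In this sketch the file names are invented, and `touch -t' is used to give two of them dates in the past.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

touch -t 200101010000 oldest       # 1 Jan 2001
touch -t 201001010000 middle       # 1 Jan 2010
touch newest                       # now

ls -t                              # newest, middle, oldest
first=$(ls -t | head -n 1)
last=$(ls -t | tail -n 1)
echo "first: $first, last: $last"
```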

Listing Hidden Files

By default, ls does not output files whose names begin with a period character (`.'). To reduce clutter, many applications "hide" configuration files in your home directory by giving them names that begin with a period; these are called dot files, or sometimes "hidden" files. As mentioned earlier, every directory has two special dot files: `..', the parent directory, and `.', the directory itself.

To list all contents of a directory, including these dot files, use the `-a' option.

  • To list all files in the current directory, type:

    $ ls -a RET

Use the `-A' option to list almost all files in the directory: it lists all files, including dot files, with the exception of `..' and `.'.

  • To list all files in the current directory except for `..' and `.', type:

    $ ls -A RET
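
The three kinds of listing can be compared side by side. This sketch assumes a scratch directory holding one ordinary file and one dot file, both invented for the example.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch notes .config

plain=$(ls)                        # dot files are left out
all=$(ls -a)                       # includes .config as well as `.' and `..'
almost=$(ls -A)                    # includes .config but not `.' or `..'
printf 'ls:\n%s\n\nls -a:\n%s\n\nls -A:\n%s\n' "$plain" "$all" "$almost"
```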


Listing Directories in Color

Use ls with the `--color' option to list the directory contents in color; files appear in different colors depending on their type. Some of the default color settings include displaying directory names in blue, text files in white, executable files in green, and links in turquoise.

NOTE: It's common practice to create a command alias that substitutes `ls --color' for `ls', so that typing just ls outputs a color listing.
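
Such an alias might be defined as follows. Where the line belongs depends on your shell; for bash, a startup file such as `~/.bashrc' is the usual place, but that location is an assumption here.

```shell
# Make plain `ls' run `ls --color'.
# (GNU ls also accepts --color=auto, which uses color only when
# the output is going to a terminal.)
alias ls='ls --color'

alias ls                           # show the current definition
```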


Listing Directory Tree Graphs

Use tree to output an ASCII text tree graph of a given directory tree.

  • To output a tree graph of the current directory and all its subdirectories, type:

    $ tree RET
    .
    |-- projects
    |   |-- current
    |   `-- old
    |       |-- 1
    |       `-- 2
    `-- trip
        `-- schedule.txt
    
    4 directories, 3 files
    $

In the preceding example, a tree graph is drawn showing the current directory, which contains the two directories `projects' and `trip'; the `projects' directory in turn contains the directories `current' and `old'.

To output a tree graph of a specific directory tree, give the name of that directory tree as an argument.

  • To output a tree graph of your home directory and all its subdirectories, type:

    $ tree ~ RET

To output a graph of a directory tree containing directory names only, use the `-d' option. This is useful for outputting a directory tree of the entire system, or for getting a picture of a particular directory tree.

  • To output a tree graph of the entire system to the file `tree', type:

    $ tree -d / > tree RET
  • To peruse a tree graph of the `/usr/local' directory tree, type:

    $ tree -d /usr/local |less RET

Additional Directory Listing Options

The ls tool has many options to control the files listed and the information given for each file; the following table describes some of them. (The options are case sensitive.)

OPTION - DESCRIPTION

--color - Colorize the names of files depending on their type.

-R - Produce a recursive listing.

-a - List all files in a directory, including hidden, or "dot," files.

-d - List directories by name instead of listing their contents.

-f - Do not sort directory contents; list them in the order they are written on the disk.

-l - Produce a verbose listing.

-r - Sort directory contents in reverse order.

-s - Output the size--as an integer in 1K blocks--of each file to the left of the file name.

-t - Sort output by timestamp instead of alphabetically, so the newest files are listed first.

NOTE: You can combine any of these options; for example, to list the contents of a directory sorted newest first, and display all attributes, use `-lt'. To recursively list all hidden files and display all attributes, use `-lRa'. It doesn't matter what order you put the options in--so `-lRa' is the same as, say, `-alR'.

Copying Files and Directories

Use cp ("copy") to copy files. It takes two arguments: the source file, which is the existing file to copy, and the target file, which is the file name for the new copy. cp then makes an identical copy of the source file, giving it the specified target name. If a file with the target name already exists, cp overwrites it. It does not alter the source file.

  • To copy the file `my-copy' to the file `neighbor-copy', type:

    $ cp my-copy neighbor-copy RET

This command creates a new file called `neighbor-copy' that is identical to `my-copy' in every respect except for its name, owner, group, and timestamp--the new file has a timestamp that shows the time when it was copied. The file `my-copy' is not altered.

Use the `-p' ("preserve") option to preserve all attributes of the original file, including its timestamp, owner, group, and permissions.

  • To copy the file `my-copy' to the file `neighbor-copy', preserving all of the attributes of the source file in the target file, type:

    $ cp -p my-copy neighbor-copy RET

This command copies the file `my-copy' to a new file called `neighbor-copy' that is identical to `my-copy' in every respect except for its name.
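
The effect of `-p' on the timestamp can be checked with find, whose `-newer' test lists only files modified more recently than a reference file. In this sketch the names `plain-copy' and `kept-copy' are invented, and the original is backdated so that the difference shows.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

echo "draft" > my-copy
touch -t 200101010000 my-copy      # backdate the original to 1 Jan 2001
cp my-copy plain-copy              # timestamp becomes the time of copying
cp -p my-copy kept-copy            # timestamp is carried over from my-copy

newer=$(find plain-copy kept-copy -newer my-copy)
echo "newer than the original: $newer"
cmp my-copy plain-copy && cmp my-copy kept-copy && echo "contents identical"
```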

To copy a directory along with the files and subdirectories it contains, use the `-R' option--it makes a recursive copy of the specified directory and its entire contents.

  • To copy the directory `public_html', and all of its files and subdirectories, to a new directory called `private_html', type:

    $ cp -R public_html private_html RET

When used alone, the `-R' option does not retain the original timestamps or ownership of the files it copies. To recursively copy a directory while keeping symbolic links as links and retaining permissions, ownership, and timestamps, use the `-a' ("archive") option. This is useful for making a backup copy of a large directory tree.

  • To make an archive copy of the directory tree `public_html' to the directory `private_html', type:

    $ cp -a public_html private_html RET
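
An archive copy can be inspected to confirm what was kept. The files placed in `public_html' below are invented for this sketch.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

mkdir public_html
echo "hello" > public_html/index.html
touch -t 200101010000 public_html/index.html
ln -s index.html public_html/home.html     # a symbolic link inside the tree

cp -a public_html private_html

[ -L private_html/home.html ] && echo "the link is still a link"
stale=$(find private_html/index.html -newer public_html/index.html)
[ -z "$stale" ] && echo "the timestamp was preserved"
```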

Moving Files and Directories

Use the mv ("move") tool to move, or rename, a file or directory to a different location. It takes two arguments: the name of the file or directory to move followed by the path name to move it to. If you move a file to a directory that contains a file of the same name, the file is overwritten.

  • To move the file `notes' in the current working directory to `../play', type:

    $ mv notes ../play RET

This command moves the file `notes' in the current directory to `play', a subdirectory of the current working directory's parent. If a file `notes' already exists in `play', that file is overwritten. If the subdirectory `play' does not exist, this command instead moves `notes' into the parent directory and renames it `play'.

To move a file or directory that is not in the current directory, give its full path name as an argument.

  • To move the file `/usr/tmp/notes' to the current working directory, type:

    $ mv /usr/tmp/notes . RET

This command moves the file `/usr/tmp/notes' to the current working directory. To move a directory, give the path name of the directory you want to move and the path name to move it to as arguments.

  • To move the directory `work' in the current working directory to `play', type:

    $ mv work play RET

This command moves the directory `work' in the current directory to the directory `play'. If the directory `play' already exists, mv puts `work' inside `play'---it does not overwrite directories.

Renaming a file is the same as moving it; just specify as arguments the file to rename followed by the new file name.

  • To rename the file `notes' to `notes.old', type:

    $ mv notes notes.old RET
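
Whether mv moves something into a directory or renames it depends on whether the target already exists as a directory. The following sketch walks through each case; the name `archive' is invented for the example.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir work play
touch work/notes

mv work/notes play                 # `play' is a directory: notes goes inside it
mv play/notes play/notes.old       # same directory, new name: a rename
mv work play                       # `play' exists, so work becomes play/work
mv play archive                    # `archive' does not exist: play is renamed
find . | sort
```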

Changing File Names to Lowercase

To change the uppercase letters in a file name to lowercase (or vice versa), use chcase. It takes as arguments the files whose names it should change.

  • To change the file names of all of the files in the current directory to lowercase letters, type:

    $ chcase * RET

Use the `-u' option to change file names to all uppercase letters.

  • To change file names of all of the files with a `.dos' extension in the `~/tmp' directory to all uppercase letters, type:

    $ chcase -u ~/tmp/*.dos RET

By default, chcase does not rename directories; use the `-d' option to rename directories as well as other files. The `-r' option recursively descends into any subdirectories and renames those files, too.

  • To change all of the files and subdirectory names in the current directory to all lowercase letters, type:

    $ chcase -d * RET
  • To change all of the files and subdirectory names in the current directory to all uppercase letters, and descend recursively into all subdirectories, type:

    $ chcase -d -r -u * RET
  • To change all of the files in the current directory to all lowercase letters, and descend recursively into all subdirectories (but do not change any directory names), type:

    $ chcase -r * RET

Renaming Multiple Files with the Same Extension

To give a different file name extension to a group of files that share the same file name extension, use chcase with the `-x' option for specifying a Perl expression; give the patterns to match the source and target files as a quoted argument.

For example, you can rename all file names ending in `.htm' to end in `.html' by giving `s/htm/html/' as the expression to use.

  • To rename all of the files in the current directory with a `.htm' extension to `.html', type:

    $ chcase -x 's/htm/html/' '*.htm' RET

By default, chcase will not overwrite files; so if you want to rename `index.htm' to `index.html', and both files already exist in the current directory, the above example will do nothing. Use the `-o' option to specify that existing files may be overwritten.

  • To rename all of the files in the current directory with a `.htm' extension to `.html' and overwrite any existing files, type:

    $ chcase -o -x 's/htm/html/' '*.htm' RET

NOTE: Renaming multiple files at once is a common request.
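
If chcase is not installed, a plain shell loop can do a similar job. This is a different technique from the chcase recipe above, offered only as a sketch: it replaces just the trailing `.htm' and, like chcase without `-o', skips any file whose new name is already taken. The file names are invented.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch index.htm about.htm contact.htm contact.html

for f in *.htm; do
    [ -e "$f" ] || continue        # the pattern matched nothing
    new=${f%.htm}.html             # change only the trailing extension
    if [ -e "$new" ]; then
        echo "skipping $f: $new already exists"
    else
        mv -- "$f" "$new"
    fi
done
ls
```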

Removing Files and Directories

Use rm ("remove") to delete a file and remove it from the system. Give the name of the file to remove as an argument.

  • To remove the file `notes' in the current working directory, type:

    $ rm notes RET

To remove a directory and all of the files and subdirectories it contains, use the `-R' ("recursive") option.

  • To remove the directory `waste' and all of its contents, type:

    $ rm -R waste RET

To remove an empty directory, use rmdir; it removes the empty directories you specify. If you specify a directory that contains files or subdirectories, rmdir reports an error.

  • To remove the directory `empty', type:

    $ rmdir empty RET
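
The difference between rmdir and `rm -R' shows up as soon as a directory has something in it. A sketch, using the names from the examples above plus an invented file `junk':

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir empty waste
touch waste/junk notes

rm notes                           # remove a single file
rmdir empty                        # works: the directory is empty
if rmdir waste 2>/dev/null; then
    result=removed
else
    result=refused                 # rmdir will not remove a directory with contents
fi
rm -R waste                        # removes the directory and everything in it
echo "rmdir waste: $result"
```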

Removing a File with a Strange Name

Files with strange characters in their names (like spaces, control characters, beginning hyphens, and so on) pose a problem when you want to remove them. There are a few solutions to this problem.

One way is to use tab completion to complete the name of the file. This works when the name of the file you want to remove has enough characters to uniquely identify it so that completion can work.

  • To use tab completion to remove the file `No Way' in the current directory, type:

    $ rm NoTAB Way RET

In the above example, after TAB was typed, the shell filled in the rest of the file name (` Way').

When a file name begins with a control character or other strange character, specify the file name with a file name pattern that uniquely identifies it (see Specifying File Names with Patterns for tips on building file name patterns). Use the `-i' option to verify the deletion.

  • To delete the file `^Acat' in a directory that also contains the files `cat' and `dog', type:

    $ rm -i ?cat RET
    rm: remove `^Acat'? y RET
    $

In the above example, the expansion pattern `?cat' matches the file `^Acat' and no other files in the directory. The `-i' option was used because, in some cases, no unique pattern can be made for a file--for example, if this directory also contained a file called `1cat', the above rm command would also attempt to remove it; with the `-i' option, you can answer n to it.

These first two methods will not work with files that begin with a hyphen character, because rm will interpret such a file name as an option; to remove such a file, use the `--' option--it specifies that what follows are arguments and not options.

  • To remove the file `-cat' from the current directory, type:

    $ rm -- -cat RET
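
Two further techniques are worth knowing: giving the file a path that does not begin with a hyphen (such as `./-dog'), and quoting a name that contains spaces. The sketch below tries them alongside `--' in a scratch directory; the file `-dog' is invented for the example.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch ./-cat ./-dog "No Way"       # ./ keeps touch from reading the names as options

rm -- -cat                         # everything after `--' is a file name
rm ./-dog                          # a path that does not start with a hyphen
rm "No Way"                        # quotes keep the space inside one argument
left=$(ls -A)
[ -z "$left" ] && echo "all three files removed"
```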

A Safe Way to Remove a File

Once a file is removed, it is permanently deleted and there is no command you can use to restore it; you cannot "undelete" it. (Although if you can unmount the filesystem that contained the file immediately after you deleted the file, a wizard might be able to help reconstruct the lost file by using grep to search the filesystem device file.)

A safer way to remove files is to use del, which is simply an alias to rm with the `-i' option; if your system does not define this alias, you can create it yourself. The `-i' option specifies for rm to run in interactive mode and confirm the deletion of each file. It may be good practice to get in the habit of using del all the time, so that you don't make an accidental slip and rm an important file.

NOTE: Question 3.6 in the Unix FAQ (see `/usr/doc/FAQ/unix-faq-part3') discusses this issue, and gives a shell script called can that you can use in place of rm---it puts files in a "trashcan" directory instead of removing them; you then periodically empty out the trashcan with rm.
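
Both ideas can be sketched in a few lines. The alias is as described above. The `trash' function, however, is a hypothetical helper written for this example and is not the can script from the FAQ; the TRASH_DIR location is likewise an assumption.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

alias del='rm -i'                  # ask before each removal

TRASH_DIR=$demo/trashcan           # assumed location; any directory will do
trash() {
    # Move the named files into the holding directory instead of deleting them.
    mkdir -p "$TRASH_DIR" && mv -- "$@" "$TRASH_DIR"/
}

touch important.txt
trash important.txt                # the file is moved, not destroyed
ls "$TRASH_DIR"
mv "$TRASH_DIR/important.txt" .    # so it can still be brought back
```

A helper this simple overwrites an earlier file of the same name in the holding directory, so a version for daily use would need to handle that case.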

Giving a File More than One Name

Links are special files that point to other files; when you act on a file that is a link, you act on the file it points to. There are two kinds of links: hard links and symbolic links. A hard link is another name for an existing file; there is no difference between the link and the original file name. So if you make a hard link from file `foo' to file `bar', and then remove file `bar', the file's contents remain available under the name `foo'; the data is removed from the disk only when the last hard link to it is removed. Each file has at least one hard link, which is the original file name itself. Directories always have at least two hard links--the directory name itself (which appears in its parent directory) and the special file `.' inside the directory. Likewise, when you make a new subdirectory, the parent directory gains a new hard link for the special file `..' inside the new subdirectory.

A symbolic link (sometimes called a "symlink" or "soft link") passes most operations--such as reading and writing--to the file it points to, just as a hard link does. However, a symlink is a separate file that refers to the original by name: if you remove a symlink, you remove only the symlink itself, and not the original file; and if you remove the original file, the symlink is left pointing to a file that no longer exists.

Use ln ("link") to make links between files. Give as arguments the name of the source file to link from and the name of the new file to link to. By default, ln makes hard links.

  • To create a hard link from `seattle' to `emerald-city', type:

    $ ln seattle emerald-city RET

This command makes a hard link from an existing file, `seattle', to a new file, `emerald-city'. You can read and edit file `emerald-city' just as you would `seattle'; any changes you make to `emerald-city' are also written to `seattle' (and vice versa). If you remove the file `emerald-city', the file `seattle' is not affected; the file's contents are deleted only when both names have been removed.

To create a symlink instead of a hard link, use the `-s' option.

  • To create a symbolic link from `seattle' to `emerald-city', type:

    $ ln -s seattle emerald-city RET

After running this command, you can read and edit `emerald-city'; any changes you make to `emerald-city' will be written to `seattle' (and vice versa). But if you remove the file `emerald-city', the file `seattle' will not be removed.
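
What happens when one of the names is removed is easiest to see by experiment. This sketch uses the file names from the examples, plus `soft-city' (invented here) for the symbolic link so that both kinds of link can be tried in one directory.

```shell
demo=$(mktemp -d)                  # scratch directory for this sketch
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

echo "rain" > seattle
ln seattle emerald-city            # hard link: a second name for the same file
set -- $(ls -l seattle)
links=$2                           # the link count in the second column is now 2
echo "more rain" >> emerald-city   # a change made through one name...
cat seattle                        # ...is visible through the other
rm seattle                         # removes one name only
hard=$(cat emerald-city)           # the contents are still reachable

echo "rain" > seattle
ln -s seattle soft-city            # symbolic link: refers to the name `seattle'
rm seattle
if [ -L soft-city ] && [ ! -e soft-city ]; then
    soft=dangling                  # the link remains, but its target is gone
fi
echo "links: $links, symlink: $soft"
```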

Specifying File Names with Patterns

The shell provides a way to construct patterns, called file name expansions, that specify a group of files. You can use them when specifying file and directory names as arguments to any tool or application.

The following table lists the various file expansion characters and their meaning.

CHARACTER - DESCRIPTION

* - The asterisk matches a series of zero or more characters, and is sometimes called the "wildcard" character. For example, * alone matches all file names (except hidden files, whose names begin with a period), a* matches all file names that consist of an `a' character followed by zero or more characters, and a*b matches all file names that begin with an `a' character and end with a `b' character, with any (or no) characters in between.

? - The question mark matches exactly one character. Therefore, ? alone matches all file names with exactly one character, ?? matches all file names with exactly two characters, and a? matches any file name that begins with an `a' character and has exactly one character following it.

[list] - Square brackets match one character in list. For example, [ab] matches exactly two file names: `a' and `b'. The pattern c[io] matches `ci' and `co', but no other file names.

~ - The tilde character expands to your home directory. For example, if your username is joe and therefore your home directory is `/home/joe', then `~' expands to `/home/joe'. You can follow the tilde with a path to specify a file in your home directory--for example, `~/work' expands to `/home/joe/work'.

Brackets also have special meaning when used in conjunction with other characters, as described by the following table.

CHARACTER - DESCRIPTION

- - A hyphen as part of a bracketed list denotes a range of characters to match--so [a-m] matches any of the lowercase letters from `a' through `m'. To match a literal hyphen character, use it as the first or last character in the list. For example, a[-b]c matches the files `a-c' and `abc'.

! - Put an exclamation point at the beginning of a bracketed list to match all characters except those listed. For example, a[!b]c matches all files that begin with an `a' character, end with a `c' character, and have any one character, except a `b' character, in between; it matches `aac', `a-c', `adc', and so on.

You can combine these special expansion characters in any combination, and you can specify more than one pattern as multiple arguments. The following examples show file expansion in action using commands described in this chapter.

  • To list all files in the `/usr/bin' directory that have the text `tex' anywhere in their name, type:

    $ ls /usr/bin/*tex* RET
  • To copy all files whose names end with `.txt' to the `doc' subdirectory, type:

    $ cp *.txt doc RET
  • To output a verbose listing of all files whose names end with either a `.txt' or `.text' extension, sorting the list so that newer files are listed first, type:

    $ ls -lt *.txt *.text RET
  • To move all files in the `/usr/tmp' directory whose names consist of the text `song' followed by an integer from 0 to 9 and a `.cdda' extension, placing them in a directory `music' in your home directory, type:

    $ mv /usr/tmp/song[0-9].cdda ~/music RET
  • To remove all files in the current working directory that begin with a hyphen and have the text `out' somewhere else in their file name, type:

    $ rm -- -*out* RET
  • To concatenate all files whose names consist of an `a' character followed by two or more characters, type:

    $ cat a??* RET
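To see what a pattern expands to before handing it to a tool such as rm, you can give it to echo first. The sketch below tries several of the patterns from this section in a scratch directory; the file names are made up for the example, and LC_ALL=C is set only so that the shell sorts the matches in a predictable order.

```shell
#!/bin/sh
# Sketch: let echo show what each file name pattern expands to.
LC_ALL=C; export LC_ALL     # predictable sort order for the matches
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a ab abc a-c adc song1.cdda song12.cdda notes.txt

one_char=$(echo a?)             # `a' plus exactly one character
not_b=$(echo a[!b]c)            # any middle character except `b'
hyphen_or_b=$(echo a[-b]c)      # a literal hyphen or `b' in the middle
one_digit=$(echo song[0-9].cdda)  # exactly one digit after `song'
two_or_more=$(echo a??*)        # `a' plus two or more characters

echo "a?             -> $one_char"
echo "a[!b]c         -> $not_b"
echo "a[-b]c         -> $hyphen_or_b"
echo "song[0-9].cdda -> $one_digit"
echo "a??*           -> $two_or_more"

cd / && rm -rf "$dir"
```

Note that `song[0-9].cdda' does not match `song12.cdda', because a bracketed list stands for exactly one character.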

Browsing Files

You can view and peruse local files in a Web browser, such as the text-only browser lynx or the graphical Mozilla browser for X.

The lynx tool is very good for browsing files on the system--give the name of the directory to browse, and lynx will display a listing of available files and directories in that directory.

You can use the cursor keys to browse and press RET on a subdirectory to traverse to that directory; lynx can display plain text files, compressed text files, and files written in HTML; it's useful for browsing system documentation in the `/usr/doc' and `/usr/share/doc' directories, where many software packages come with help files and manuals written in HTML.

  • To browse the system documentation files in the `/usr/doc' directory, type:

    $ lynx /usr/doc RET


With Mozilla and some other browsers you must precede the full path name with the `file://' URL scheme--so the `/usr/doc' directory would be `file:///usr/doc' (the third slash is the beginning of the path name itself). With lynx, just give a local path name as an argument.

  • To browse the system documentation files in the `/usr/doc' directory in Mozilla, type the following in Mozilla's Location window:

    file:///usr/doc

Groups, file ownership, and access permissions are Linux features that enable users to share files with one another. But even if you don't plan on sharing files with other users on your system, familiarity with these concepts will help you understand how file access and security work in Linux.

Groups and How to Work in Them

A group is a set of users, created to share files and to facilitate collaboration. Each member of a group can work with the group's files and make new files that belong to the group. The system administrator can add new groups and give users membership to the different groups, according to the users' organizational needs. For example, a system used by the crew of a ship might have groups such as galley, deck, bridge, and crew; the user captain might be a member of all the groups, but user steward might be a member of only the galley and crew groups.

On a Linux system, you're always a member of at least one group: your login group. On many systems (Debian among them), you are the only member of this group, and its group name is the same as your username.

Let's look at how to manage your group memberships.

Listing the Groups a User Belongs To

To list a user's group memberships, use the groups tool. Give a username as an argument, and groups outputs a line containing that username followed by all of the groups the user is a member of. With no arguments, groups lists your own username and group memberships.

  • To list your group memberships, type:

    $ groups RET
    steward galley crew
    $

In this example, three groups are output: steward (the user's login group), galley, and crew.

  • To list the group memberships of user blackbeard, type:

    $ groups blackbeard RET
    blackbeard : blackbeard
    $

In this example, the command outputs the given username, blackbeard, followed by the name of one group, blackbeard, indicating that user blackbeard belongs to only one group: his login group.

Listing the Members of a Group

Debian: `members'


To list the members of a particular group, use the members tool, giving the name of the particular group as an argument.

  • To output a list of the members of the galley group, type:

    $ members galley RET
    captain steward pete
    $

In this example, three usernames are output, indicating that these three users are the members of the galley group.

File Ownership

Every file belongs to both a user and a group--usually to the user who created it and to the group the user was working in at the time (which is almost always the user's login group). File ownership determines the type of access users have to particular files.

Determining the Ownership of a File

To find out which user and group own a particular file, use ls with the `-l' option to list the file's attributes. The name of the user who owns the file appears in the third column of the output, and the name of the group that owns the file appears in the fourth column.

For example, suppose the verbose listing for a file called `cruise' looks like this:

-rwxrw-r--      1 captain   crew        8,420 Jan 12 21:42 cruise

The user who owns this file is captain, and the group that owns it is crew.

NOTE: When you create a file, it normally belongs to you and to your login group, but you can change its ownership, as described in the next recipe. You normally own all of the files in your home directory.

Changing the Ownership of a File

You can't give away a file to another user, but other users can make copies of a file that belongs to you, provided they have read permission for that file. When you make a copy of another user's file, you own the copy.

You can also change the group ownership of any file you own. To do this, use chgrp; it takes as arguments the name of the group to transfer ownership to and the names of the files to work on. You must be a member of the group you want to give ownership to.

  • To change the group ownership of file `cruise' to bridge, type:

    $ chgrp bridge cruise RET

This command transfers group ownership of `cruise' to bridge; the file's group access permissions now apply to the members of the bridge group.

Use the `-R' option to recursively change the group ownership of directories and all of their contents.

  • To give group ownership of the `maps' directory and all the files it contains to the bridge group, type:

    $ chgrp -R bridge maps RET

Controlling Access to Files

Each file has permissions that specify what type of access to the file users have. There are three kinds of permissions: read, write, and execute. You need read permission for a file to read its contents, write permission to write changes to it, and execute permission to run it as a program. (Whether you can remove a file depends on the permissions of the directory that contains it, as described below.)

Normally, users have write permission only for files in their own home directories. Only the superuser has write permission for the files in important directories, such as `/bin' and `/etc'--so as a regular user, you never have to worry about accidentally writing to or removing an important system file.

Permissions work differently for directories than for other kinds of files. Read permission for a directory means that you can see the files in the directory; write permission lets you create, move, or remove files in the directory; and execute permission lets you use the directory name in a path.

If you have read permission but not execute permission for a directory, you can only read the names of files in that directory--you can't read their other attributes, examine their contents, write to them, or execute them. With execute but not read permission for a directory, you can read, write to, or execute any file in the directory, provided that you know its name and that you have the appropriate permissions for that file.

Each file has separate permissions for three categories of users: the user who owns the file, all other members of the group that owns the file, and all other users on the system. If you are a member of the group that owns a file, the file's group permissions apply to you (unless you are the owner of the file, in which case the user permissions apply to you).

When you create a new file, it has a default set of permissions--usually read and write for the user, and read for the group and all other users. (On some systems, the default permissions are read and write for both the user and group, and read for all other users.)

The file access permissions for a file are collectively called its access mode. The following sections describe how to list and change file access modes, including how to set the most commonly used access modes.

NOTE: The superuser, root, can always access any file on the system, regardless of its access permissions.

See Info file `fileutils.info', node `File permissions', for more information on file permissions and access modes.

Listing the Permissions of a File

To list a file's access permissions, use ls with the `-l' option. File access permissions appear in the first column of the output, after the character for file type.

For example, consider the verbose listing of the file `cruise':

-rwxrw-r--      1 captain   crew        8,420 Jan 12 21:42 cruise

The first character (`-') is the file type; the next three characters (`rwx') specify permissions for the user who owns the file; and the next three (`rw-') specify permissions for all members of the group that owns the file except for the user who owns it. The last three characters in the column (`r--') specify permissions for all other users on the system.

All three permissions sections have the same format, indicating from left to right, read, write, and execute permission with `r', `w', and `x' characters. A hyphen (`-') in place of one of these letters indicates that permission is not given.

In this example, the listing indicates that the user who owns the file, captain, has read, write, and execute permission, and the group that owns the file, crew, has read and write permission. All other users on the system have only read permission.

Changing the Permissions of a File

To change the access mode of any file you own, use the chmod ("change mode") tool. It takes two arguments: an operation, which specifies the permissions to grant or revoke for certain users, and the names of the files to work on.

To build an operation, first specify the category or categories of users as a combination of the following characters:

CHARACTER - CATEGORY

u - The user who owns the file.

g - All other members of the file's group.

o - All other users on the system.

a - All users on the system; this is the same as `ugo'.

Follow this with the operator denoting the action to take:


OPERATOR - ACTION

+ - Add permissions to the user's existing permissions.

- - Remove permissions from the user's existing permissions.

= - Make these the only permissions the user has for this file.

Finally, specify the permissions themselves:

CHARACTER - PERMISSION

r - Set read permission.

w - Set write permission.

x - Set execute permission.

For example, use `u+w' to add write permission to the existing permissions for the user who owns the file, and use `a+rw' to add both read and write permissions to the existing permissions of all users. (You could also use `ugo+rw' instead of `a+rw'.)
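The effect of each operator is easiest to follow by watching the mode column of `ls -l' change. The following sketch applies a few operations in turn to a scratch file; the helper function `mode' is a hypothetical convenience defined here, not a standard tool, and the comment beside each step shows the mode that results from the previous one.

```shell
#!/bin/sh
# Sketch: watch a file's access mode change as chmod operations are applied.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch cruise

# Hypothetical helper: print the file type and nine permission characters
# from the first column of `ls -l'.
mode() { ls -l "$1" | cut -c1-10; }

chmod a=r cruise;  m1=$(mode cruise)   # only read, for everyone: -r--r--r--
chmod u+w cruise;  m2=$(mode cruise)   # add write for the owner: -rw-r--r--
chmod a+rw cruise; m3=$(mode cruise)   # add read/write for all:  -rw-rw-rw-
chmod go-w cruise; m4=$(mode cruise)   # remove write, group/other: -rw-r--r--
chmod go= cruise;  m5=$(mode cruise)   # no group/other access:   -rw-------

printf '%s\n' "$m1" "$m2" "$m3" "$m4" "$m5"

cd / && rm -rf "$dir"
```

Because every operation here names its categories explicitly, the results do not depend on the permissions the file happened to have when it was created.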

Write-Protecting a File

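If you revoke users' write permissions for a file, they can no longer write changes to the file. This effectively "write-protects" a file, preventing accidental changes to it. (Removing a file is a change to the directory that contains it, so it is controlled by that directory's write permission.) A write-protected file is sometimes called a "read only" file.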

To write-protect a file so that no users other than yourself can write to it, use chmod with `go-w' as the operation.

  • To write-protect the file `cruise' so that no other users can change it, type:

    $ chmod go-w cruise RET


Making a File Private

To make a file private from all other users on the system, use chmod with `go=' as the operation. This revokes all group and other access permissions.

  • To make the file `cruise' private from all users but yourself, type:

    $ chmod go= cruise RET

Making a File Public

To allow anyone with an account on the system to read and make changes to a file, use chmod with `a+rw' as the operation. This grants read and write permission to all users, making the file "public." When a file has read permission set for all users, it is called world readable, and when a file has write permission set for all users, it is called world writable.

  • To make the file `cruise' both world readable and world writable, type:

    $ chmod a+rw cruise RET


Making a File Executable

An executable file is a file that you can run as a program. To change the permissions of a file so that all users can run it as a program, use chmod with `a+x' as the operation.

  • To give execute permission to all users for the file `myscript', type:

    $ chmod a+x myscript RET

NOTE: Often, shell scripts that you obtain or write yourself do not have execute permission set, and you'll have to do this yourself.
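As a small illustration of that note, the sketch below writes a two-line script, checks that it is not yet executable, grants execute permission, and then runs it. It assumes the scratch directory that mktemp creates is on a file system that allows programs to be run from it.

```shell
#!/bin/sh
# Sketch: a freshly written script is not executable until chmod says so.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Write a tiny script; new files are normally created without execute permission.
printf '#!/bin/sh\necho "hello from myscript"\n' > myscript

if [ -x myscript ]; then before=executable; else before=not-executable; fi

chmod a+x myscript          # give execute permission to all users
output=$(./myscript)        # now the script can be run by name

echo "before chmod: $before"
echo "script output: $output"

cd / && rm -rf "$dir"
```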

Finding Files

Sometimes you will need to find files on the system that match given criteria, such as name and file size. This chapter will show you how to find a file when you know only part of the file name, and how to find a file whose name matches a given pattern. You will also learn how to list files and directories by their size and to find the locations of commands.

NOTE: When you want to find files in a directory whose contents match a particular pattern, search through the files with grep.

See Info file `find.info', node `Top', for more information on finding files.

Finding All Files That Match a Pattern

The simplest way to find files is with GNU locate. Use it when you want to list all files on the system whose full path name matches a particular pattern--for example, all files with the text `audio' somewhere in their full path name, or all files ending with `ogg'; locate outputs a list of all files on the system that match the pattern, giving their full path name. When specifying a pattern, you can use any of the file name expansion characters; put such a pattern between single quotes so that the shell passes it to locate unchanged instead of expanding it against the files in the current directory.

  • To find all the files on the system that have the text `audio' anywhere in their name, type:

    $ locate audio RET
  • To find all the files on the system whose file names end with the text `ogg', type:

    $ locate '*ogg' RET
  • To find all hidden "dotfiles" on the system, type:

    $ locate /. RET

NOTE: locate searches are case sensitive by default; use the `-i' option to ignore case.

Sometimes, a locate search will generate a lot of output. Pipe the output to less to peruse it.


Finding Files in a Directory Tree

Use find to find specific files in a particular directory tree, specifying the name of the directory tree to search, the criteria to match, and--optionally--the action to perform on the found files. (Unlike most other tools, you must specify the directory tree argument before any other options.)

You can specify a number of search criteria, and format the output in various ways; the following sections include recipes for the most commonly used find commands, as well as a list of find's most popular options.


Finding Files in a Directory Tree by Name

Use find to find files in a directory tree by name. Give the name of the directory tree to search through, and use the `-name' option followed by the name you want to find.

  • To list all files on the system whose file name is `top', type:

    $ find / -name top RET

This command will search all directories on the system to which you have access; if you don't have execute permission for a directory, find will report that permission is denied to search the directory.

The `-name' option is case sensitive; use the similar `-iname' option to match names regardless of case.

  • To list all files on the system whose file name is `top', regardless of case, type:

    $ find / -iname top RET

This command would match any files whose name consisted of the letters `top', regardless of case--including `Top', `top', and `TOP'.

Use file expansion characters to find files whose names match a pattern. Give these file name patterns between single quotes.

  • To list all files on the system whose names begin with the characters `top', type:

    $ find / -name 'top*' RET
  • To list all files whose names begin with the three characters `top' followed by exactly three more characters, type:


    $ find / -name 'top???' RET
  • To list all files whose names begin with the three characters `top' followed by five or more characters, type:

    $ find / -name 'top?????*' RET
  • To list all files in your home directory tree that end in `.tex', regardless of case, type:

    $ find ~ -iname '*.tex' RET
  • To list all files in the `/usr/share' directory tree with the text `farm' somewhere in their name, type:

    $ find /usr/share -name '*farm*' RET

Use `-regex' in place of `-name' to search for files whose names match a regular expression, or a pattern describing a set of strings.

  • To list all files in the current directory tree whose names have either the string `net' or `comm' anywhere in their file names, type:

    $ find . -regex '.*\(net\|comm\).*' RET

NOTE: The `-regex' option matches the whole path name, relative to the directory tree you specify, and not just file names.
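The name-matching options can be tried safely on a small scratch tree. The following sketch assumes GNU find (for `-iname' and for the `\|' alternation in `-regex') and a case-sensitive file system; the file and directory names are invented for the example.

```shell
#!/bin/sh
# Sketch: -name, -iname, quoted patterns, and -regex on a scratch tree.
LC_ALL=C; export LC_ALL     # predictable order from sort
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p docs/network
touch top docs/Top docs/topics docs/network/comm.txt docs/paper.TEX

# Join sorted output lines into one space-separated line for display.
join_lines() { sort | paste -s -d ' ' -; }

exact=$(find . -name top | join_lines)           # case-sensitive exact name
anycase=$(find . -iname top | join_lines)        # same name, any case
prefix=$(find . -name 'top*' | join_lines)       # quoted pattern
tex=$(find . -iname '*.tex' | join_lines)        # extension, any case
regex=$(find . -regex '.*\(net\|comm\).*' | join_lines)  # whole path must match

echo "-name top      : $exact"
echo "-iname top     : $anycase"
echo "-name 'top*'   : $prefix"
echo "-iname '*.tex' : $tex"
echo "-regex         : $regex"

cd / && rm -rf "$dir"
```

The `-regex' search lists the `network' directory as well as the file inside it, because the expression is matched against the whole path name rather than just the base file name.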

Finding Files in a Directory Tree by Size

To find files of a certain size, use the `-size' option, following it with the file size to match. The file size takes one of three forms: when preceded with a plus sign (`+'), it matches all files greater than the given size; when preceded with a hyphen or minus sign (`-'), it matches all files less than the given size; with neither prefix, it matches all files whose size is exactly as specified. (The default unit is 512-byte blocks, which you can also give explicitly with `b'; follow the size with `k' to denote kilobytes or `c' to denote bytes.)

  • To list all files in the `/usr/local' directory tree that are greater than 10,000 kilobytes in size, type:

    $ find /usr/local -size +10000k RET
  • To list all files in your home directory tree less than 300 bytes in size, type:

    $ find ~ -size -300c RET
  • To list all files on the system whose size is exactly 42 512-byte blocks, type:

    $ find / -size 42 RET

Use the `-empty' option to find empty files--files whose size is 0 bytes. (It also matches directories that contain no files.) This is useful for finding files that you might not need, and can remove.

  • To find all empty files in your home directory tree, type:

    $ find ~ -empty RET

NOTE: To find the largest or smallest files in a given directory, output a sorted listing of that directory.
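The size units are easy to mix up, so it can help to try them on files of known size. This sketch assumes GNU find and a head that accepts `-c' to copy a given number of bytes; the file names are invented for the example.

```shell
#!/bin/sh
# Sketch: -size with byte (`c') and kilobyte (`k') units, plus -empty.
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
cd "$dir" || exit 1

: > empty                         # 0 bytes
head -c 100 /dev/zero > small     # 100 bytes
head -c 5000 /dev/zero > medium   # 5000 bytes

under_300=$(find . -type f -size -300c | sort | paste -s -d ' ' -)
over_2k=$(find . -type f -size +2k)      # sizes are rounded up to whole kilobytes
empties=$(find . -type f -empty)

echo "less than 300 bytes : $under_300"
echo "more than 2 kilobytes: $over_2k"
echo "empty               : $empties"

cd / && rm -rf "$dir"
```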


Finding Files in a Directory Tree by Modification Time

To find files last modified during a specified time, use find with the `-mtime' or `-mmin' options; the argument you give with `-mtime' specifies the number of 24-hour periods, and with `-mmin' it specifies the number of minutes.


  • To list the files in the `/usr/local' directory tree that were last modified between 24 and 48 hours ago, type:

    $ find /usr/local -mtime 1 RET
  • To list the files in the `/usr' directory tree that were modified exactly five minutes ago, type:

    $ find /usr -mmin 5 RET

To specify a range of time, precede the number you give with either a plus sign (`+') to match times that are greater than the given argument, or a hyphen or minus sign (`-') to match times that are less than the given argument. (Because find discards any fraction of a day, `-mtime +1' matches files that were modified at least two days ago.)

  • To list the files in the `/usr/local' directory tree that were modified within the past 24 hours, type:

    $ find /usr/local -mtime -1 RET
  • To list the files in the `/usr' directory tree that were modified within the past five minutes, type:

    $ find /usr -mmin -5 RET

Include the `-daystart' option to measure time from the beginning of the current day instead of 24 hours ago. Give it before the time options it should affect, because it applies only to the options that follow it on the command line.

  • To list all of the files in your home directory tree that were modified yesterday, type:

    $ find ~ -daystart -mtime 1 RET
  • To list all of the files in the `/usr' directory tree that were modified more than one year ago, type:

    $ find /usr -daystart -mtime +365 RET
  • To list all of the files in your home directory tree that were modified from two to four days ago, type:

    $ find ~ -daystart -mtime +1 -mtime -5 RET

In the preceding example, the combined options `-mtime +1' and `-mtime -5' matched files that were modified more than one and fewer than five days ago--that is, two, three, or four days ago.

To find files newer than a given file, give the name of that file as an argument to the `-newer' option.

  • To find files in the `/etc' directory tree that are newer than the file `/etc/motd', type:

    $ find /etc -newer /etc/motd RET

To find files newer than a given date, use the trick described in the find Info documentation: create a temporary file in `/tmp' with touch whose timestamp is set to the date you want to search for, and then specify that temporary file as the argument to `-newer'.

  • To list all files in your home directory tree that were modified after May 4 of the current year, type:

    $ touch -t 05040000 /tmp/timestamp RET
    $ find ~ -newer /tmp/timestamp RET

In this example, a temporary file called `/tmp/timestamp' is written; after the search, you can remove it.

NOTE: You can also find files that were last accessed a number of days after their status was last changed by giving that number as an argument to the `-used' option. This is useful for finding files that get little use--files matching `-used +100', say, were last accessed more than 100 days after that change.
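To experiment with the time options without waiting for files to age, you can set modification times by hand. The following sketch assumes GNU touch (whose `-d' option accepts phrases such as `3 days ago') and GNU find; the file names are invented for the example.

```shell
#!/bin/sh
# Sketch: -mtime ranges and -newer on files with hand-set timestamps.
dir=$(mktemp -d)
cd "$dir" || exit 1

touch today
touch -d '3 days ago' three-days     # GNU touch date syntax (an assumption)
touch -d '10 days ago' ten-days

past_day=$(find . -type f -mtime -1)               # modified less than a day ago
two_to_four=$(find . -type f -mtime +1 -mtime -5)  # two, three, or four days ago
newer=$(find . -type f -newer three-days)          # modified after `three-days'

echo "within the past day    : $past_day"
echo "two to four days ago   : $two_to_four"
echo "newer than three-days  : $newer"

cd / && rm -rf "$dir"
```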

Finding Files in a Directory Tree by Owner

To find files owned by a particular user, give the username to search for as an argument to the `-user' option.

  • To list all files in the `/usr/local/fonts' directory tree owned by the user warwick, type:

    $ find /usr/local/fonts -user warwick RET

The `-group' option is similar, but it matches group ownership instead of user ownership.

  • To list all files in the `/dev' directory tree owned by the audio group, type:

    $ find /dev -group audio RET


Running Commands on the Files You Find

You can also use find to execute a command you specify on each found file, by giving the command as an argument to the `-exec' option. If you use the string `{}' in the command, this string is replaced with the file name of the current found file when the command executes. Mark the end of the command with a semicolon (`;'). Put both strings between single quotes, as in the following examples, so that the shell passes them to find unchanged.

  • To find all files in the `~/html/' directory tree with an `.html' extension, and output lines from these files that contain the string `organic', type:

    $ find ~/html/ -name '*.html' -exec grep organic '{}' ';' RET

In this example, the command grep organic file is executed for each file that find finds, with file being the name of each file in turn.

To have find pause and confirm execution for each file it finds, use `-ok' instead of `-exec'.

  • To remove files from your home directory tree that were last accessed more than one year after their status was last changed, pausing to confirm before each removal, type:

    $ find ~ -used +365 -ok rm '{}' ';' RET
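The `-exec' recipe above can be tried on a scratch tree before pointing it at real files. In this sketch the directory layout and file contents are invented for the example; the second search adds grep's `-l' option, which prints the names of matching files instead of the matching lines.

```shell
#!/bin/sh
# Sketch: run grep on each file that find locates.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir -p html/recipes
echo '<p>organic carrots</p>' > html/index.html
echo '<p>plain flour</p>'     > html/recipes/bread.html
echo 'organic notes'          > html/notes.txt      # not an .html file

# grep runs once per found file, with {} replaced by that file's name.
matches=$(find html -name '*.html' -exec grep organic '{}' ';')
files=$(find html -name '*.html' -exec grep -l organic '{}' ';')

echo "matching lines: $matches"
echo "matching files: $files"

cd / && rm -rf "$dir"
```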

Finding Files by Multiple Criteria

You can combine many of find's options to find files that match multiple criteria.

  • To list files in your home directory tree whose names begin with the string `top', and that are newer than the file `/etc/motd', type:

    $ find ~ -name 'top*' -newer /etc/motd RET
  • To compress all the regular files in your home directory tree that are larger than two million bytes, and that are not already compressed with gzip (having a `.gz' file name extension), type:

    $ find ~ -type f -size +2000000c \! -name '*.gz' -exec gzip '{}' ';' RET

The following tables describe many other options you can use with find. The first table lists and describes find's general options for specifying its behavior. As you will see, find can take many different options; see its man page or its info documentation for all of them.

OPTION - DESCRIPTION

-daystart - Use the beginning of today rather than 24 hours previous for time criteria.

-depth - Search the subdirectories before each directory.

-help - Output a help message and exit.

-maxdepth levels - Specify the maximum number of directory levels to descend in the specified directory tree.

-mount or -xdev - Do not descend directories that have another disk mounted on them.

-version - Output the version number and exit.

The following table lists and describes find's options for specifying which files to find.

Specify the numeric arguments to these options in one of three ways: preceded with a plus sign (`+') to match values greater than the given argument; preceded with a hyphen or minus sign (`-') to match values less than the given argument; or give the number alone to match exactly that value.

OPTION - DESCRIPTION

-amin minutes - Time in minutes since the file was last accessed.

-anewer file - File was accessed more recently than file.

-atime days - Time in days since the file was last accessed.

-cmin minutes - Time in minutes since the file was last changed.

-cnewer file - File was changed more recently than file.

-ctime days - Days since the file was last changed.

-empty - File is empty.

-group group - Name of the group that owns file.

-iname pattern - Case-insensitive file name pattern to match (`report' matches the files `Report', `report', `REPORT', etc.).

-ipath pattern - Full path name of file matches the pattern pattern, regardless of case (`./r*rt' matches `./records/report' and `./Record-Labels/ART').

-iregex regexp - Path name of file, relative to specified directory tree, matches the regular expression regexp, regardless of case (`.*/t.p' matches `./TIP' and `./top').

-links links - Number of links to the file.

-mmin minutes - Number of minutes since the file's data was last changed.

-mtime days - Number of days since the file's data was last changed.

-name pattern - Base name of the file matches the pattern pattern.

-newer file - File was modified more recently than file.

-path pattern - Full path name of file matches the pattern pattern (`./r*rt' matches `./records/report').

-perm access mode - File's permissions are exactly access mode.

-regex regexp - Path name of file, relative to specified directory tree, matches the regular expression regexp.

-size size - File uses size space, in 512-byte blocks. Append `c' to size for bytes or `k' for kilobytes.

-type type - File is type type, where type can be `d' for directory, `f' for regular file, or `l' for symbolic link.

-user user - File is owned by user.

The following table lists and describes find's options for specifying what to do with the files it finds.

OPTION - DESCRIPTION

-exec command - Specifies a command, terminated by a semicolon, to be executed on each matching file. To specify the current file name as an argument to the command, use `{}'.

-ok command - Like `-exec' but prompts for confirmation before executing the command.

-print - Outputs the name of found files to the standard output, each followed by a newline character so that each is displayed on a line of its own. On by default.

-printf format - Use "C-style" output (the same as used by the printf function in the C programming language), as specified by string format.

The following table describes the variables that may be used in the format string given to the `-printf' option.

VARIABLE - DESCRIPTION

\a - Ring the system bell (called the "alarm" on older systems).

\b - Output a backspace character.

\f - Output a form feed character.

\n - Output a newline character.

\r - Output a carriage return.

\t - Output a horizontal tab character.

\\ - Output a backslash character.

%% - Output a percent sign character.

%b - Output file's size, rounded up in 512-byte blocks.

%f - Output base file name.

%h - Output the leading directories of file's name.

%k - Output file's size, rounded up in 1K blocks.

%s - Output file's size in bytes.
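A format string usually combines several of these variables. The following sketch assumes GNU find, since `-printf' is a GNU extension, and prints the size in bytes, the leading directories, and the base name of each regular file in a scratch tree; the file names are invented for the example.

```shell
#!/bin/sh
# Sketch: build a small report with find's -printf option.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir maps
printf 'north\n' > maps/chart      # 6 bytes
printf 'hello'   > cruise          # 5 bytes

# %s = size in bytes, %h = leading directories, %f = base file name.
report=$(find . -type f -printf '%s %h %f\n' | sort -n)
smallest=$(printf '%s\n' "$report" | head -n 1)
largest=$(printf '%s\n' "$report" | tail -n 1)

printf '%s\n' "$report"

cd / && rm -rf "$dir"
```

Without the `\n' at the end of the format string, find would run all of the entries together on one line.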

Finding Files in Directory Listings

The following recipes show how to find the largest and smallest files and directories in a given directory or tree by listing them by size. They also show how to find the number of files in a given directory.


Finding the Largest Files in a Directory

To find the largest files in a given directory, use ls to list its contents with the `-S' option, which sorts files in descending order by their size (normally, ls outputs files sorted alphabetically). Include the `-l' option to output the size and other file attributes.

  • To list the files in the current directory, with their attributes, sorted with the largest files first, type:

    $ ls -lS RET

NOTE: Pipe the output to less to peruse it.


Finding the Smallest Files in a Directory

To list the contents of a directory with the smallest files first, use ls with both the `-S' and `-r' options, which reverses the sorting order of the listing.

  • To list the files in the current directory and their attributes, sorted from smallest to largest, type:

    $ ls -lSr RET
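The two sort orders can be compared on files of known size. This sketch leaves out the `-l' option so that only the names are listed, and assumes a head that accepts `-c' to copy a given number of bytes; the file names are invented for the example.

```shell
#!/bin/sh
# Sketch: list files largest first (-S) and smallest first (-S with -r).
dir=$(mktemp -d)
cd "$dir" || exit 1
head -c 10   /dev/zero > small
head -c 500  /dev/zero > medium
head -c 9000 /dev/zero > large

# When its output goes to a pipe, ls writes one name per line.
largest_first=$(ls -S | paste -s -d ' ' -)
smallest_first=$(ls -Sr | paste -s -d ' ' -)

echo "largest first : $largest_first"
echo "smallest first: $smallest_first"

cd / && rm -rf "$dir"
```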


Finding the Smallest Directories

To output a list of directories sorted by their size--the size of the files they contain--use du and sort. The du tool outputs the size in kilobytes of each directory in the first column of its output, in no particular order of size; with the `-S' option, each size counts only the files directly in that directory, and not those in its subdirectories. Give the directory tree you want to output as an argument, and pipe the output to sort with the `-n' option, which sorts its input numerically.

  • To output a list of the subdirectories of the current directory tree, sorted in ascending order by size, type:

    $ du -S . | sort -n RET

Finding the Largest Directories

Use the `-r' option with sort to reverse the listing and output the largest directories first.

  • To output a list of the subdirectories in the current directory tree, sorted in descending order by size, type:

    $ du -S . | sort -nr RET

  • To output a list of the subdirectories in the `/usr/local' directory tree, sorted in descending order by size, type:

    $ du -S /usr/local | sort -nr RET
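As a rough illustration of the du and sort pipeline, the sketch below builds a throwaway tree with one small and one large subdirectory and picks out the largest. The directory names are hypothetical, and the use of `/dev/urandom' for filler data is an assumption about the system.

```shell
# Throwaway tree: "small" holds about 4 KB, "big" holds about 400 KB.
dir=$(mktemp -d)
mkdir -p "$dir/small" "$dir/big"
head -c 4096 /dev/urandom > "$dir/small/a"
head -c 409600 /dev/urandom > "$dir/big/b"

# du -S prints "size<TAB>directory", counting only the files directly in
# each directory; sort -nr orders the lines numerically, largest first.
du -S "$dir" | sort -nr

# Keep just the directory name from the first (largest) line.
biggest=$(du -S "$dir" | sort -nr | head -n 1 | cut -f 2)
biggest_name=$(basename "$biggest")
echo "largest directory: $biggest_name"

rm -r "$dir"
```

On a real tree, piping the sorted list to `head' in this way is a convenient way to see only the few largest directories.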

Finding the Number of Files in a Listing

To find the number of files in a directory, use ls and pipe the output to `wc -l', which outputs the number of lines in its input.

  • To output the number of files in the current directory, type:

    $ ls | wc -l RET
         19
    $

In this example, the command outputs the text `19', indicating that there are 19 files in the current directory.

Since ls does not list hidden files by default, the preceding command does not count them. Use ls's `-A' option to count dot files as well.

  • To count the number of files--including dot files--in the current directory, type:

    $ ls -A | wc -l RET
         81
    $

This command outputs the text `81', indicating that there are 81 files, including hidden files, in the current directory.
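The difference between the two counts is easy to check in a scratch directory. This is a minimal sketch with made-up file names; it assumes mktemp is available.

```shell
# Two ordinary files and one dot file in a scratch directory.
dir=$(mktemp -d)
touch "$dir/one" "$dir/two" "$dir/.hidden"

visible=$(ls "$dir" | wc -l)    # dot files are not listed, so not counted
all=$(ls -A "$dir" | wc -l)     # -A lists dot files too (but not . and ..)
echo "visible: $visible, including dot files: $all"

rm -r "$dir"
```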

To list the number of files in a given directory tree, and not just a single directory, use find instead of ls, giving the special find predicate `\! -type d' to exclude the listing (and therefore, counting) of directories.


  • To list the number of files in the `/usr/share' directory tree, type:

    $ find /usr/share \! -type d | wc -l RET

  • To list the number of files and directories in the `/usr/share' directory tree, type:

    $ find /usr/share | wc -l RET

  • To list the number of directories in the `/usr/share' directory tree, type:

    $ find /usr/share -type d | wc -l RET
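The counts that find produces can be checked against a small tree whose contents are known in advance. In this sketch the directory and file names are invented for illustration; note that `-type d' selects directories only, and that find includes the starting directory itself in its output.

```shell
# A tree with three directories (the top one, docs, docs/drafts)
# and three ordinary files.
dir=$(mktemp -d)
mkdir -p "$dir/docs/drafts"
touch "$dir/readme" "$dir/docs/a" "$dir/docs/drafts/b"

files=$(find "$dir" \! -type d | wc -l)   # everything that is not a directory
dirs=$(find "$dir" -type d | wc -l)       # directories only, including the top
total=$(find "$dir" | wc -l)              # files and directories together
echo "files: $files, directories: $dirs, total: $total"

rm -r "$dir"
```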

Finding Where a Command Is Located

Use which to find the full path name of a tool or application from its base file name; when you give the base file name as an argument, which outputs the absolute file name of the command that would have run had you typed it. This is useful when you are not sure whether or not a particular command is installed on the system.

  • To find out whether perl is installed on your system, and, if so, where it resides, type:

    $ which perl RET
    /usr/bin/perl

In this example, which output `/usr/bin/perl', indicating that the perl binary is installed in the `/usr/bin' directory.

NOTE: This is also useful for determining "which" binary would execute, should you type the name, since some systems may have different binaries of the same file name located in different directories. In that case, you can use which to find which one would execute.
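In a shell script, the same kind of lookup can decide whether a tool is present before it is used. The sketch below uses the shell's built-in `command -v', which performs a comparable search of the path; which could be substituted on systems where it is installed. The name `no-such-tool-xyz' is made up to show the failure case.

```shell
# Report where each tool lives, or note that it is missing.
found=""
missing=""
for tool in sh no-such-tool-xyz; do
    # command -v prints the path and succeeds only if the tool is found.
    if path=$(command -v "$tool"); then
        echo "$tool: $path"
        found="$found $tool"
    else
        echo "$tool: not installed"
        missing="$missing $tool"
    fi
done
```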


Managing Files

File management tools include those for splitting, comparing, and compressing files, making backup archives, and tracking file revisions. Other management tools exist for determining the contents of a file, and for changing its timestamp.

Determining File Type and Format

When we speak of a file's type, we are referring to the kind of data it contains, which may include text, executable commands, or some other data; this data is organized in a particular way in the file, and this organization is called its format. For example, an image file might contain data in the JPEG image format, or a text file might contain unformatted text in the English language or text formatted in the TeX markup language.

The file tool analyzes files and indicates their type and--if known--the format of the data they contain. Supply the name of a file as an argument to file and it outputs the name of the file, followed by a description of its format and type.

  • To determine the format of the file `/usr/doc/HOWTO/README.gz', type:


    $ file /usr/doc/HOWTO/README.gz RET
    /usr/doc/HOWTO/README.gz: gzip compressed data, deflated, original
    filename, last modified: Sun Apr 26 02:51:48 1998, os: Unix
    $


This command reports that the file `/usr/doc/HOWTO/README.gz' contains data that has been compressed with the gzip tool.

To determine the original format of the data in a compressed file, use the `-z' option.

  • To determine the format of the compressed data contained in the file `/usr/doc/HOWTO/README.gz', type:

    $ file -z /usr/doc/HOWTO/README.gz RET
    /usr/doc/HOWTO/README.gz: English text (gzip compressed data, deflated,
    original filename, last modified: Sun Apr 26 02:51:48 1998, os: Unix)
    $

This command reports that the data in `/usr/doc/HOWTO/README.gz', a compressed file, is English text.

NOTE: Currently, file differentiates among more than 100 different data formats, including several human languages, many sound and graphics formats, and executable files for many different operating systems.


Changing File Modification Time

Use touch to change a file's timestamp without modifying its contents. Give the name of the file to be changed as an argument. The default action is to change the timestamp to the current time.

  • To change the timestamp of file `pizzicato' to the current date and time, type:

    $ touch pizzicato RET

To specify a timestamp other than the current system time, use the `-d' option, followed by the date and time that should be used enclosed in quote characters. You can specify just the date, just the time, or both.

  • To change the timestamp of file `pizzicato' to `17 May 1999 14:16', type:

    $ touch -d '17 May 1999 14:16' pizzicato RET

  • To change the timestamp of file `pizzicato' to `14 May', type:

    $ touch -d '14 May' pizzicato RET

  • To change the timestamp of file `pizzicato' to `14:16', type:

    $ touch -d '14:16' pizzicato RET

NOTE: When only the date is given, the time is set to `0:00'; when no year is given, the current year is used.

See Info file `fileutils.info', node `Date input formats', for more information on date input formats.
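To confirm that a timestamp was set as intended, the file's modification time can be read back. This sketch assumes the GNU versions of touch and date (the `-r' option of GNU date prints a file's modification time) and writes the date in ISO form, which GNU touch also accepts; the file is created in a scratch directory.

```shell
dir=$(mktemp -d)

# Set the timestamp to 17 May 1999, 14:16 (touch creates the file if needed).
touch -d '1999-05-17 14:16' "$dir/pizzicato"

# Read the modification time back in the same layout.
stamp=$(date -r "$dir/pizzicato" '+%Y-%m-%d %H:%M')
echo "$stamp"    # prints 1999-05-17 14:16

rm -r "$dir"
```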


Splitting a File into Smaller Ones

It's sometimes necessary to split one file into a number of smaller ones. For example, suppose you have a very large sound file in the near-CD-quality MPEG audio layer 3 ("MP3") format. Your file, `large.mp3', is 4,394,422 bytes in size, and you want to transfer it from your desktop to your laptop, but your laptop and desktop are not connected on a network--the only way to transfer files between them is by floppy disk. Because this file is much too large to fit on one floppy, you use split.

The split tool copies a file, chopping up the copy into separate files of a specified size. It takes as optional arguments the name of the input file (using standard input if none is given) and the file name prefix to use when writing the output files (using `x' if none is given). The output files' names will consist of the file prefix followed by a group of letters: `aa', `ab', `ac', and so on--the default output file names would be `xaa', `xab', and so on.

Specify the number of lines to put in each output file with the `-l' option, or use the `-b' option to specify the number of bytes to put in each output file. To specify the output files' sizes in kilobytes or megabytes, use the `-b' option and append `k' or `m', respectively, to the value you supply. If neither `-l' nor `-b' is used, split defaults to using 1,000 lines per output file.

  • To split `large.mp3' into separate files of one megabyte each, whose names begin with `large.mp3.', type:

    $ split -b1m large.mp3 large.mp3. RET

This command creates five new files whose names begin with `large.mp3.'. The first four files are one megabyte in size, while the last file is 200,118 bytes--the remaining portion of the original file. No alteration is made to `large.mp3'.

You could then copy these five files onto four floppies (the last file fits on a floppy with one of the larger files), copy them all to your laptop, and then reconstruct the original file with cat.

  • To reconstruct the original file from the split files, type:

    $ cat large.mp3.* > large.mp3 RET
    $ rm large.mp3.* RET

In this example, the rm tool is used to delete all of the split files after the original file has been reconstructed.
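Before deleting the pieces, it can be worth confirming that the reassembled file matches the original. The following sketch goes through the whole round trip on a small scale; the file names are hypothetical, and random filler from `/dev/urandom' stands in for real data.

```shell
dir=$(mktemp -d)
head -c 2500 /dev/urandom > "$dir/big.bin"     # 2,500 bytes of filler

# Split into 1,000-byte pieces named big.bin.aa, big.bin.ab, big.bin.ac.
split -b 1000 "$dir/big.bin" "$dir/big.bin."
pieces=$(ls "$dir"/big.bin.* | wc -l)
last_size=$(wc -c < "$dir/big.bin.ac")         # the remainder: 500 bytes

# The shell expands the pattern in sorted order, so cat joins the
# pieces in the order they were written.
cat "$dir"/big.bin.* > "$dir/rebuilt.bin"

# cmp -s is silent and succeeds only if the two files are identical.
if cmp -s "$dir/big.bin" "$dir/rebuilt.bin"; then intact=yes; else intact=no; fi
echo "pieces: $pieces, rebuilt intact: $intact"

rm -r "$dir"
```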

Comparing Files

There are a number of tools for comparing the contents of files in different ways; these recipes show how to use some of them. These tools are especially useful for comparing passages of text in files, but that's not the only way you can use them.


Determining Whether Two Files Differ

Use cmp to determine whether or not two text files differ. It takes the names of two files as arguments, and if the files contain the same data, cmp outputs nothing. If, however, the files differ, cmp outputs the byte position and line number in the files where the first difference occurs.

  • To determine whether the files `master' and `backup' differ, type:

    $ cmp master backup RET
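cmp also reports its result through its exit status (0 when the files are the same, 1 when they differ), which is handy in scripts. A minimal sketch, using scratch copies of files named `master' and `backup':

```shell
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/master"
cp "$dir/master" "$dir/backup"

# Identical files: no output, exit status 0.
cmp "$dir/master" "$dir/backup"
same_status=$?

# Change the second line of the copy; cmp now prints where the first
# difference occurs (byte 5, line 2 here) and exits with status 1.
printf 'one\nTWO\n' > "$dir/backup"
cmp "$dir/master" "$dir/backup"
diff_status=$?

echo "status when identical: $same_status, when different: $diff_status"
rm -r "$dir"
```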


Finding the Differences between Files

Use diff to compare two files and output a difference report (sometimes called a "diff") containing the text that differs between two files. The difference report is formatted so that other tools (namely, patch) can use it to make a file identical to the one it was compared with.

To compare two files and output a difference report, give their names as arguments to diff.

  • To compare the files `manuscript.old' and `manuscript.new', type:

    $ diff manuscript.old manuscript.new RET

The difference report is output to standard output; to save it to a file, redirect the output to the file to save to:

    $ diff manuscript.old manuscript.new > manuscript.diff RET


In the preceding example, the difference report is saved to a file called `manuscript.diff'.

The difference report is meant to be used with commands such as patch, in order to apply the differences to a file. See Info file `diff.info', node `Top', for more information on diff and the format of its output.
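To get a feel for what a difference report looks like, it helps to compare two tiny files that differ in a single line. This sketch uses made-up contents in a scratch directory and diff's default output format.

```shell
dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$dir/manuscript.old"
printf 'alpha\nBETA\ngamma\n' > "$dir/manuscript.new"

# diff exits with status 1 when the files differ, 0 when they are the same.
report=$(diff "$dir/manuscript.old" "$dir/manuscript.new")
status=$?

echo "$report"
# The report reads:
#   2c2
#   < beta
#   ---
#   > BETA
# meaning line 2 of the first file must change to become line 2 of the second.

first_line=$(printf '%s\n' "$report" | head -n 1)
rm -r "$dir"
```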

To better see the difference between two files, use sdiff instead of diff; instead of giving a difference report, it outputs the files in two columns, side by side, separated by spaces. Lines that differ in the files are separated by `|'; lines that appear only in the first file end with a `<', and lines that appear only in the second file are preceded with a `>'.


  • To peruse the files `laurel' and `hardy' side by side on the screen, with any differences indicated between columns, type:

    $ sdiff laurel hardy | less RET

To output the difference between three separate files, use diff3.

  • To output a difference report for files `larry', `curly', and `moe', and output it in a file called `stooges', type:

    $ diff3 larry curly moe > stooges RET

Patching a File with a Difference Report

To apply the differences in a difference report to the original file compared in the report, use patch. It takes as arguments the name of the file to be patched and the name of the difference report file (or "patchfile"). It then applies the changes specified in the patchfile to the original file. This is especially useful for distributing different versions of a file--small patchfiles may be sent across networks more easily than large source files.

  • To update the original file `manuscript.old' with the patchfile `manuscript.diff', so that it becomes identical to `manuscript.new', type:

    $ patch manuscript.old manuscript.diff RET


Compressed Files

File compression is useful for storing or transferring large files. When you compress a file, you shrink it and save disk space. File compression uses an algorithm to change the data in the file; to use the data in a compressed file, you must first uncompress it to restore the original data (and original file size).

The following recipes explain how to compress and uncompress files.

Compressing a File

Use the gzip ("GNU zip") tool to compress files. It takes as an argument the name of the file or files to be compressed; it writes a compressed version of the specified files, appends a `.gz' extension to their file names, and then deletes the original files.

  • To compress the file `war-and-peace', type:

    $ gzip war-and-peace RET

This command compresses the file `war-and-peace', putting it in a new file named `war-and-peace.gz'; gzip then deletes the original file, `war-and-peace'.

Decompressing a File

To access the contents of a compressed file, use gunzip to decompress (or "uncompress") it.

Like gzip, gunzip takes as an argument the name of the file or files to work on. It expands the specified files, writing the output to new files without the `.gz' extensions, and then deletes the compressed files.


  • To expand the file `war-and-peace.gz', type:

    $ gunzip war-and-peace.gz RET

This command expands the file `war-and-peace.gz' and puts it in a new file called `war-and-peace'; gunzip then deletes the compressed file, `war-and-peace.gz'.
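The full round trip can be tried on a scratch file to see both the size reduction and the removal of the original. This is a sketch only: the sample text is invented, and a reference copy is kept purely so the restored file can be compared against it.

```shell
dir=$(mktemp -d)

# A repetitive text file compresses well; keep a copy for comparison.
yes 'a line of sample text' | head -n 2000 > "$dir/war-and-peace"
cp "$dir/war-and-peace" "$dir/reference"
before=$(wc -c < "$dir/war-and-peace")

# gzip writes war-and-peace.gz and removes the uncompressed file.
gzip "$dir/war-and-peace"
if [ -e "$dir/war-and-peace" ]; then original_present=yes; else original_present=no; fi
after=$(wc -c < "$dir/war-and-peace.gz")

# gunzip restores the file and removes the .gz version.
gunzip "$dir/war-and-peace.gz"
if cmp -s "$dir/war-and-peace" "$dir/reference"; then restored=yes; else restored=no; fi

echo "bytes before: $before, compressed: $after, restored intact: $restored"
rm -r "$dir"
```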

NOTE: You can view a compressed text file without uncompressing it by using zless. This is useful when you want to view a compressed file but do not want to write changes to it.


File Archives

An archive is a single file that contains a collection of other files, and often directories. Archives are usually used to transfer or make a backup copy of a collection of files and directories--this way, you can work with only one file instead of many. This single file can be easily compressed as explained in the previous section, and the files in the archive retain the structure and permissions of the original files.

Use the tar tool to create, list, and extract files from archives. Archives made with tar are sometimes called "tar files," "tar archives," or--because all the archived files are rolled into one--"tarballs."

The following recipes show how to use tar to create an archive, list the contents of an archive, and extract the files from an archive. Two common options used with all three of these operations are `-f' and `-v': to specify the name of the archive file, use `-f' followed by the file name; use the `-v' ("verbose") option to have tar output the names of files as they are processed. While the `-v' option is not necessary, it lets you observe the progress of your tar operation.

NOTE: The name of this tool comes from "tape archive," because it was originally made to write the archives directly to a magnetic tape device. It is still used for this purpose, but today, archives are almost always saved to a file on disk.

See Info file `tar.info', node `Top', for more information about managing archives with tar.


Creating a File Archive

To create an archive with tar, use the `-c' ("create") option, and specify the name of the archive file to create with the `-f' option. It's common practice to use a name with a `.tar' extension, such as `my-backup.tar'.

Give as arguments the names of the files to be archived; to create an archive of a directory and all of the files and subdirectories it contains, give the directory's name as an argument.

  • To create an archive called `project.tar' from the contents of the `project' directory, type:

    $ tar -cvf project.tar project RET

This command creates an archive file called `project.tar' containing the `project' directory and all of its contents. The original `project' directory remains unchanged.

Use the `-z' option to compress the archive as it is being written. This yields the same output as creating an uncompressed archive and then using gzip to compress it, but it eliminates the extra step.

  • To create a compressed archive called `project.tar.gz' from the contents of the `project' directory, type:

    $ tar -zcvf project.tar.gz project RET

This command creates a compressed archive file, `project.tar.gz', containing the `project' directory and all of its contents. The original `project' directory remains unchanged.

NOTE: When you use the `-z' option, you should specify the archive name with a `.tar.gz' extension and not a `.tar' extension, so the file name shows that the archive is compressed. This is not a requirement, but it serves as a reminder and is the standard practice.

Listing the Contents of an Archive

To list the contents of a tar archive without extracting them, use tar with the `-t' option.

  • To list the contents of an archive called `project.tar', type:

    $ tar -tvf project.tar RET

This command lists the contents of the `project.tar' archive. Using the `-v' option along with the `-t' option causes tar to output the permissions and modification time of each file, along with its file name--the same format used by the ls command with the `-l' option.

Include the `-z' option to list the contents of a compressed archive.

  • To list the contents of a compressed archive called `project.tar.gz', type:

    $ tar -ztvf project.tar.gz RET
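Creating and listing can be tried together on a small scratch tree. In this sketch the `project' directory and its contents are invented for illustration; the archive is created from inside the scratch directory so that the names stored in it begin with `project/'.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/project/notes"
echo 'draft' > "$dir/project/a.txt"
echo 'todo'  > "$dir/project/notes/b.txt"

# Create a compressed archive of the project directory.
(cd "$dir" && tar -zcf project.tar.gz project)

# List it: one line per archived directory or file.
tar -ztf "$dir/project.tar.gz"
entries=$(tar -ztf "$dir/project.tar.gz" | wc -l)
echo "entries in archive: $entries"

rm -r "$dir"
```

The listing contains four entries: the two directories and the two files inside them.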

Extracting Files from an Archive

To extract (or unpack) the contents of a tar archive, use tar with the `-x' ("extract") option.

  • To extract the contents of an archive called `project.tar', type:

    $ tar -xvf project.tar RET

This command extracts the contents of the `project.tar' archive into the current directory.

If an archive is compressed, which usually means it will have a `.tar.gz' or `.tgz' extension, include the `-z' option.


  • To extract the contents of a compressed archive called `project.tar.gz', type:

    $ tar -zxvf project.tar.gz RET

NOTE: If there are files or subdirectories in the current directory with the same name as any of those in the archive, those files will be overwritten when the archive is extracted. If you don't know what files are included in an archive, consider listing the contents of the archive first.

Another reason to list the contents of an archive before extracting them is to determine whether the files in the archive are contained in a directory. If not, and the current directory contains many unrelated files, you might confuse them with the files extracted from the archive.

To extract the files into a directory of their own, make a new directory, move the archive to that directory, and change to that directory, where you can then extract the files from the archive.
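The steps just described might look like the following in practice. This is a sketch under assumed names: the archive `project.tar' is built on the spot in a scratch directory, and `unpacked' is a hypothetical name for the new directory.

```shell
dir=$(mktemp -d)
mkdir "$dir/project"
echo 'draft' > "$dir/project/a.txt"
(cd "$dir" && tar -cf project.tar project)

# Make a new directory, move the archive into it, change to it, extract.
mkdir "$dir/unpacked"
mv "$dir/project.tar" "$dir/unpacked/"
(cd "$dir/unpacked" && tar -xf project.tar)

# The extracted files now sit apart from everything else.
content=$(cat "$dir/unpacked/project/a.txt")
echo "extracted file contains: $content"

rm -r "$dir"
```

The parentheses run the cd and tar commands in a subshell, so the working directory of the calling shell is left as it was.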


Tracking Revisions to a File

The Revision Control System (RCS) is a set of tools for managing multiple revisions of a single file.

To store a revision of a file so that RCS can keep track of it, you check in the file with RCS. This deposits the revision of the file in an RCS repository--a file that RCS uses to store all changes to that file. RCS makes a repository file with the same file name as the file you are checking in, but with a `,v' extension appended to the name. For example, checking in the file `foo.text' with RCS creates a repository file called `foo.text,v'.

Each time you want RCS to remember a revision of a file, you check in the file, and RCS writes to that file's RCS repository the differences between the file and the last revision on record in the repository.

To access a revision of a file, you check out the revision from RCS. The revision is obtained from the file's repository and is written to the current directory.

Although RCS is most often used with text files, you can also use it to keep track of revisions made to other kinds of files, such as image files and sound files.

Another revision control system, Concurrent Versions System (CVS), is used for tracking collections of multiple files whose revisions are made concurrently by multiple authors. While more complex than RCS, it is very popular for managing free software projects on the Internet. See Info file `cvs.info', node `Top', for information on using CVS.


Checking In a File Revision

When you have a version of a file that you want to keep track of, use ci to check in that file with RCS.

Type ci followed by the name of a file to deposit that file into the RCS repository. If the file has never before been checked in, ci prompts for a description to use for that file; each subsequent time the file is checked in, ci prompts for text to include in the file's revision log. Log messages may contain more than one line of text; type a period (`.') on a line by itself to end the entry.

For example, suppose the file `novel' contains this text:

This is a tale about many things, including a long voyage across
America.


  • To check in the file `novel' with RCS, type:

    $ ci novel RET
    novel,v  <--  novel
    enter description, terminated with single '.' or end of file:
    NOTE: This is NOT the log message!
    >> The Great American Novel. RET
    >> . RET
    $

This command deposits the file in an RCS repository file called `novel,v', and the original file, `novel', is removed. To edit or access the file again, you must check out a revision of the file from RCS with which to work.

Whenever you have a new revision that you want to save, use ci as before to check in the file. This begins the process all over again.

For example, suppose you have checked out the first revision of `novel' and changed the file so that it now looks like this:

This is a very long tale about a great many things, including my long
voyage across America, and back home again. 


  • To deposit this revision in RCS, type:

    $ ci novel RET
    novel,v  <--  novel
    new revision: 1.2; previous revision: 1.1
    enter log message, terminated with single '.' or end of file:
    >> Second draft. RET
    >> . RET
    $

If you create a subdirectory called `RCS' (in all uppercase letters) in the current directory, RCS recognizes this specially named directory instead of the current directory as the place to store the `,v' revision files. This helps reduce clutter in the directory you are working in.

If the file you are depositing is a text file, you can have RCS insert a line of text, every time the file is checked out, containing the name of the file, the revision number, the date and time in the UTC (Coordinated Universal Time) time zone, and the user ID of the author. To do this, put the text `$Id$' at a place in the file where you want this text to be written. You only need to do this once; each time you check the file out, RCS replaces this string in the file with the header text.

For example, this chapter was written to a file, `managing-files.texinfo', whose revisions were tracked with RCS; the `$Id$' string in this file currently reads:

Checking Out a File Revision

Use co to check out a revision of a file from an RCS repository.

To check out the latest revision of a file that you intend to edit (and to check in later as a new revision), use the `-l' (for "lock") option. Locking a revision in this fashion prevents overlapping changes from being made to the file should another revision be accidentally checked out before this revision is checked in.

  • To check out the latest revision of the file `novel' for editing, type:

    $ co -l novel RET

This command checks out the latest revision of file `novel' from the `novel,v' repository, writing it to a file called `novel' in the current directory. (If a file with that name already exists in the current directory, co asks whether or not to overwrite the file.) You can make changes to this file and then check it in as a new revision.

You can also check out a version of a file as read only, where changes cannot be written to it. Do this to check out a version to view only and not to edit.

To check out the current version of a file for examination, type co followed by the name of the file.

  • To check out the current revision of file `novel', but not permit changes to it, type:

    $ co novel RET

This command checks out the latest revision of the file `novel' from the RCS repository `novel,v' (either from the current directory or in a subdirectory named `RCS').

To check out a version other than the most recent version, specify the version number to check out with the `-r' option. Again, use the `-l' option to allow the revision to be edited.

  • To check out revision 1.14 of file `novel', type:

    $ co -l -r1.14 novel RET

NOTE: Before checking out an old revision of a file, remember to check in the latest changes first, or they may be lost.

Viewing a File's Revision Log

Use rlog to view the RCS revision log for a file--type rlog followed by the name of a file to list all of the revisions of that file.

  • To view the revision log for file `novel', type:

$ rlog novel RET

RCS file: novel,v
Working file: novel
head: 1.2
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 2;     selected revisions: 2
description:
The Great American Novel.
----------------------------
revision 1.2
date: 1991/06/21 19:03:58;  author: leo;  state: Exp;  lines: +2 -2
Second draft.
----------------------------
revision 1.1
date: 1991/06/20 15:31:44;  author: leo;  state: Exp;
Initial revision
====================================================================
$

This command outputs the revision log for the file `novel'; it lists information about the RCS repository, including its name (`novel,v') and the name of the actual file (`novel'). It also shows that there are two revisions--the first, which was checked in to RCS on 20 June 1991, and the second, which was checked in to RCS the next day, on 21 June 1991.