Sunday, December 6, 2009

Different Kinds of Shells

The great-grandfather of all shells is /bin/sh, called simply sh or the Bourne Shell, named after its developer, Stephen Bourne. When it was first introduced in the mid-1970s, it was almost a godsend, as it allowed programmable interaction with the operating system. This is the "standard" shell that you will find on every version of UNIX (at least all those I have seen). Although many changes have been made to UNIX, sh has remained basically unchanged.

All the capabilities of "the shell" I've talked about so far apply to sh. Anything I've talked about that sh can do, the others can do as well. So rather than going on about what sh can do (which I already did), I am going to talk about the characteristics of some other shells.

Later, I am going to talk about the C-Shell, which kind of throws a monkey wrench into this entire discussion. Although the concepts are much the same between the C-Shell and other shells, the constructs are often quite different. On the other hand, the other shells are extensions of the Bourne Shell, so the syntax and constructs are basically the same.

Be careful here. This is one case in which I have noticed that the various versions of Linux are different. Not every shell is in every version. Therefore, the shells I am going to talk about may not be in your distribution. Have no fear! If there is a feature that you really like, you can either take the source code from one of the other shells and add it or you can find the different shells all over the Internet, which is much easier.

Linux includes several different shells and we will get into the specifics of many of them as we move along. In addition, many different shells are available as public domain, shareware, or commercial products that you can install on Linux.

As I mentioned earlier, environment variables are set up for you as you are logging in or you can set them up later. Depending on the shell you use, the files used and where they are located is going to be different. Some variables are made available to everyone on the system and are accessed through a common file. Others reside in the user's home directory.

Normally, the files residing in a user's home directory can be modified. However, a system administrator may wish to prevent users from doing so. Often, menus are set up in these files to either make things easier for the user or to prevent the user from getting to the command line. (Often users never need to get that far.) In other cases, environment variables that shouldn't be changed need to be set up for the user.

One convention I will be using here is how I refer to the different shells. I will often say "bash" to refer to the Bourne-Again Shell as a concept and not specifically to the program /bin/bash, just as I will use "sh" to refer to the Bourne Shell as an abstract entity and not specifically to the program /bin/sh.

Why the Bourne-Again Shell? Well, this shell is compatible with the Bourne Shell, but has many of the same features as both the Korn Shell (ksh) and C-Shell (csh). This is especially important to me as I flail violently when I don't have a Korn Shell.

Most of the issues I am going to address here are detailed in the appropriate man-pages and other documents. Why cover them here? Well, in keeping with one basic premise of this book, I want to show you the relationships involved. In addition, many of the things we are going to look at are not emphasized as much as they should be. Often, users will go for months or years without learning the magic that these shells can do.

Only one oddity really needs to be addressed: the behavior of the different shells when moving through symbolic links. As I mentioned before, symbolic links are simply pointers to files or directories elsewhere on the system. If you change directories into symbolic links, your "location" on the disk is different than what you might think. In some cases, the shell understands the distinction and hides from you the fact that you are somewhere else. This is where the problem lies.

Although the concept of a symbolic link exists in most versions of UNIX, it is a relatively new aspect. As a result, not all applications and programs behave in the same way. Let's take the directory /usr/spool as an example. Because it contains a lot of administrative information, it is a useful and commonly accessed directory. It is actually a symbolic link to /var/spool. If we are using ash as our shell, when we do a cd /usr/spool and then pwd, the system responds with: /var/spool. This is where we are "physically" located, despite the fact that we did a cd /usr/spool. If we do a cd .. (to move up to our parent directory), we are now located in /var. All this seems logical. This is also the behavior of csh and sh on some systems.

If we use bash, things are different. This time, when we do a cd /usr/spool and then pwd, the system responds with /usr/spool. This is where we are "logically". If we now do a cd .., we are located in /usr. Which of these is the "correct" behavior? Well, I would say both. There is nothing to define what the "correct" behavior is. Depending on your preference, either is correct. I tend to prefer the behavior of bash. However, the behavior of ash is also valid.
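You can see both views side by side in bash, whose built-in pwd accepts -L (logical) and -P (physical) options. A minimal sketch, assuming a writable temporary directory:

```shell
# Create a real directory and a symbolic link to it, then compare
# the logical and physical views of our current location.
dir=$(mktemp -d)
mkdir "$dir/real"
ln -s "$dir/real" "$dir/link"
cd "$dir/link"
pwd -L    # logical view:  ends in /link
pwd -P    # physical view: ends in /real
```

The logical view reflects the path you typed with cd; the physical view resolves the symbolic link.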

Pipes and Redirection

Perhaps the most commonly used character is "|", which is referred to as the pipe symbol, or simply pipe. This enables you to pass the output of one command to the input of another. For example, say you would like to do a long directory listing of the /bin directory. If you type ls -l /bin and then press Enter, the names flash by much too fast for you to read. When the display finally stops, all you see is the last twenty entries or so.

If instead we ran the command

ls -l | more

the output of the ls command will be "piped through more". In this way, we can scan through the list a screenful at a time.

In our discussion of standard input and standard output in Chapter 1, I talked about standard input as being just a file that usually points to your terminal. In this case, standard output is also a file that usually points to your terminal. The standard output of the ls command is changed to point to the pipe, and the standard input of the more command is changed to point to the pipe as well.

The way this works is that when the shell sees the pipe symbol, it asks the kernel to create a pipe: an unnamed channel, buffered in the kernel, that connects the two processes. (On Linux it is not a temporary file taking up space on the hard disk.) Because both the terminal and the pipe are seen as files from the perspective of the operating system, all we are saying is that the system should use different files for standard input and standard output.

Under Linux (as well as other UNIX dialects), there exist the concepts of standard input, standard output, and standard error. When you log in and are working from the command line, standard input is taken from your terminal keyboard and both standard output and standard error are sent to your terminal screen. In other words, the shell expects to be getting its input from the keyboard and showing the output (and any error messages) on the terminal screen.

Actually, the three (standard input, standard output, and standard error) are references to files that the shell automatically opens. Remember that in UNIX, everything is treated as a file. When the shell starts, the three files it opens are usually the ones pointing to your terminal.

When we run a command like more on a file, it gets its input from that file and displays it to the screen. Although it may appear that the standard input is coming from that file, the standard input (referred to as stdin) is still the keyboard. This is why, when the file is large enough that more stops after each screenful, you can continue by pressing either the Spacebar or the Enter key: standard input is still the keyboard.

As it is running, more is displaying the contents of the file to the screen. That is, it is going to standard output (stdout). If you try to do a more on a file that does not exist, the message

file_name: No such file or directory

shows up on your terminal screen as well. However, although it appears to be in the same place, the error message was written to standard error (stderr). (I'll show how this differs shortly.)

One pair of characters that is used quite often, "<" and ">", also deals with stdin and stdout. The more common of the two, ">", redirects the output of a command into a file. That is, it changes standard output. An example of this would be ls /bin > myfile. If we were to run this command, we would have a file (in my current directory) named myfile that contained the output of the ls /bin command. This is because stdout is the file myfile and not the terminal. Once the command completes, stdout returns to being the terminal. What this looks like graphically we see in the figure below.


Now, we want to see the contents of the file. We could simply say more myfile, but that wouldn't explain about redirection. Instead, we input

more < myfile

This tells the more command to take its standard input from the file myfile instead of from the keyboard or some other file. (Remember, even when stdin is the keyboard, it is still seen as a file.)

What about errors? As I mentioned, stderr appears to be going to the same place as stdout. A quick way of showing that it doesn't is by using output redirection and forcing an error. If we wanted to list two directories and have the output go to a file, we would run this command:

ls /bin /jimmo > /tmp/junk

We then get this message:

/jimmo not found

However, if we look in /tmp, there is indeed a file called junk that contains the output of the ls /bin portion of the command. What happened here was that we redirected stdout into the file /tmp/junk. It did this with the listing of /bin. However, because there was no directory /jimmo (at least not on my system), we got the error /jimmo not found. In other words, stdout went into the file, but stderr still went to the screen.

If we want to get the output and any error messages to go to the same place, we can do that. Using the same example with ls, the command would be:

ls /bin /jimmo > /tmp/junk 2>&1

The new part of the command is 2>&1, which says that file descriptor 2 (stderr) should go to the same place as file descriptor 1 (stdout). By changing the command slightly

ls /bin /jimmo > /tmp/junk 2>/tmp/errors

we can tell the shell to send any errors someplace else. You will find quite often in shell scripts throughout the system that the file that error messages are sent to is /dev/null. This has the effect of ignoring the messages completely. They are neither displayed on the screen nor sent to a file.

Note that this command does not work as you would think:

ls /bin /jimmo 2>&1 > /tmp/junk

The reason is that we redirect stderr to the same place as stdout before we redirect stdout. So, stderr goes to the screen, but stdout goes to the file specified.
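A quick sketch of the difference, assuming (as in the example above) that /jimmo does not exist on your system:

```shell
# stdout is redirected into the file first, then stderr is pointed
# at the same place, so the error message lands in the file too.
ls /bin /jimmo > /tmp/junk 2>&1 || true   # ls exits non-zero: /jimmo is missing
grep -c jimmo /tmp/junk                   # the error text is in the file

# Reversing the order points stderr at the terminal *before* stdout
# is redirected, so the error message appears on the screen instead:
ls /bin /jimmo 2>&1 > /tmp/junk || true
```

The || true is only there so the failing ls does not abort a script running with set -e.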

Redirection can also be combined with pipes like this:

sort < names | head

or

ps | grep sh > ps.save

In the first example, the standard input of the sort command is redirected to point to the file names. Its output is then passed to the pipe. The standard input of the head command (which takes the first ten lines) also comes from the pipe. This would be the same as the command

sort names | head

which we see here:


In the second example, the ps command (process status) is piped through grep and all of the output is redirected to the file ps.save.

If we want to redirect stderr, we can. The syntax is similar:

command 2> file

It's possible to input multiple commands on the same command line. This can be accomplished by using a semi-colon (;) between commands. I have used this on occasion to create command lines like this:

man bash | col -b > man.tmp; vi man.tmp; rm man.tmp

This command redirects the output of the man-page for bash into the file man.tmp. (The pipe through col -b is necessary because of the way the man-pages are formatted.) Next, we are brought into the vi editor with the file man.tmp. After I exit vi, the command continues and removes my temporary file man.tmp. (After about the third time of doing this, it got pretty monotonous, so I created a shell script to do this for me. I'll talk more about shell scripts later.)

Quotes

One last issue that causes its share of confusion is quotes. In Linux, there are three kinds of quotes: double-quotes ("), single-quotes ('), and back-quotes (`), also called back-ticks. On most US keyboards, the single-quotes and double-quotes are on the same key, with the double-quotes accessed by pressing Shift and the single-quote key. Usually this key is on the right-hand side of the keyboard, next to the Enter key. The back-quote is usually in the upper left-hand corner of the keyboard, next to the 1.

To best understand the difference between the behavior of these quotes, I need to talk about them in reverse order. I will first describe the back-quotes, or back-ticks.

The shell interprets anything enclosed inside back-ticks to mean "the output of the command inside the back-ticks." This is referred to as command substitution, as the output of the command inside the back-ticks is substituted for the command itself. This is often used to assign the output of a command to a variable. As an example, let's say we wanted to keep track of how many files are in a directory. From the command line, we could say

ls | wc

The wc command gives me a word count, along with the number of lines and number of characters. The | is a "pipe" symbol that is used to pass the output of one command through another. In this example, the output of the ls command is passed or piped through wc. Here, the command might come up as:

7 7 61

However, once the command is finished and the value has been output, we can only get it back again by rerunning the command. Instead, if we said:

count=`ls |wc`

The entire line of output would be saved in the variable count. If we then say echo $count, we get

7 7 61

showing me that count now contains the output of that line. If we wanted, we could even assign a multi-line output to this variable. We could use the ps command, like this

trash=`ps`

then we could type in

echo $trash

which gives us:

PID TTY TIME CMD 29519 pts/6 00:00:00 bash 12565 pts/6 00:00:00 ps

This is different from the output that ps would give when not assigned to the variable trash:

PID TTY TIME CMD
29519 pts/6 00:00:00 bash
12564 pts/6 00:00:00 ps
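The difference comes from quoting, not from ps itself: without double-quotes around $trash, the shell splits the value into words and echo joins them with single spaces, so the newlines are lost. A small sketch:

```shell
# Build a value containing an embedded newline, then echo it
# both unquoted and quoted.
trash=$(printf 'line one\nline two')
echo $trash      # unquoted: the newline is collapsed to a space
echo "$trash"    # quoted:   the newline is preserved
```

This is why you will often see variables wrapped in double-quotes in shell scripts.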


The next kind of quote, the single-quote ('), tells the system not to do any expansion at all. Let's take the example above, but this time, use single-quotes:

count='ls |wc'

If we were to now type

echo $count

we would get

ls |wc

And what we got was exactly what we expected. The shell did no expansion and simply assigned the literal string "ls | wc" to the variable count. This even applies to the variable operator "$." For example, if we simply say

echo '$LOGNAME'

what comes out on the screen is

$LOGNAME

No expansion is done at all and even the "$" is left unchanged.

The last kind of quote is the double-quote. This has partially the same effect as single-quotes, but to a limited extent. If we include something inside of double-quotes, everything loses its special meaning except for the variable operator ($), the back-slash (\), the back-tick (`), and the double-quote itself. Everything else takes on its literal meaning. For example, we could say

echo "`date`"

which gives us

Wed Feb 01 16:39:30 PST 1995

This is a round-about way of getting the date, but it is good for demonstration purposes. Plus, I often use this in shell scripts when I want to log something and keep track of the date. Remember that the back-tick first expands the command (by running it) and then the echo echoes it to the screen.
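A log entry in a shell script might be written like this (the log file name here is just an illustration):

```shell
# Append a time-stamped line to a log file; the back-ticks run date
# and substitute its output into the double-quoted string.
logfile=/tmp/backup.log
echo "Backup started: `date`" >> "$logfile"
tail -1 "$logfile"    # show the line we just logged
```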

That pretty much wraps up the quote characters. For details on other characters that have special meaning to the shell check out the section on regular expressions. You can get more details from any number of references books on Linux or UNIX in general (if you need it). However, the best way to see what's happening is to try a few combinations and see if they behave as you expect.

Previously, I mentioned that some punctuation marks have special meaning, such as *, ?, and [ ]. In fact, most of the other punctuation marks have special meaning, as well. We'll get into more detail about them in the section on basic shell scripting.

It may happen that you forget to close the quotes, and you end up on a new line that starts with (typically) a greater-than symbol >. This is the secondary prompt (PS2) and is simply telling you that your previous line continues. You can continue the line and then close the quotes later, like this:

VAR="Now is the time for all good admins
> to come to the aid of their operating system."

It is as if you wrote the entire line at once.
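We can verify this: the embedded newline becomes part of the variable's value (remember to double-quote the variable when echoing it, so the newline survives):

```shell
# The assignment spans two physical lines; the newline is stored
# inside the variable.
VAR="Now is the time for all good admins
to come to the aid of their operating system."
echo "$VAR"             # prints both lines
echo "$VAR" | wc -l     # counts two lines
```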

Sometimes it is necessary to include literal quotes in your output. This is a problem because the shell interprets the quotes before passing the value along. To get around this, you need to "escape" or "protect" the quotes using a back-slash (\), like this:

echo \"hello, world\"
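Single-quotes work just as well here, since double-quotes have no special meaning inside them; both of these print the quotes literally:

```shell
echo \"hello, world\"      # escaped with back-slashes
echo '"hello, world"'      # protected by single-quotes
```

Both commands print "hello, world", quotes included.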

Regular Expressions and Metacharacters

Often, the arguments that you pass to commands are file names. For example, if you wanted to edit a file called letter, you could enter the command vi letter. In many cases, typing the entire name is not necessary. Built into the shell are special characters that it will use to expand the name. These are called metacharacters.

The most common metacharacter is *. The * is used to represent any number of characters, including zero. For example, if we have a file in our current directory called letter and we input

vi let*

the shell would expand this to

vi letter

Or, if we had a file simply called let, this would match as well.

Instead, what if we had several files called letter.chris, letter.daniel, and letter.david? The shell would expand them all out to give me the command

vi letter.chris letter.daniel letter.david

We could also type in vi letter.da*, which would be expanded to

vi letter.daniel letter.david

If we only wanted to edit the letter to chris, we could type it in as vi *chris. However, if there were two files, letter.chris and note.chris, the command vi *chris would have the same results as if we typed in:

vi letter.chris note.chris

In other words, no matter where the asterisk appears, the shell expands it to match every name it finds. If my current directory contained files with matching names, the shell would expand them properly. However, if there were no matching names, file name expansion couldn't take place and the file name would be taken literally.

For example, if there were no file name in our current directory that began with letter, the command

vi letter*

could not be expanded and we would end up editing a new file called (literally) letter*, including the asterisk. This would not be what we wanted.
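You can watch this happen with echo, which simply prints whatever arguments the shell hands it. A sketch in an empty scratch directory:

```shell
cd "$(mktemp -d)"     # an empty scratch directory
echo letter*          # no match: the pattern is passed literally
touch letter.chris
echo letter*          # now it expands to the file name
```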

What if we had a subdirectory called letters? If it contained the three files letter.chris, letter.daniel, and letter.david, we could get to them by typing

vi letters/letter*

This would expand to:

vi letters/letter.chris letters/letter.daniel letters/letter.david

The same rules for path names with commands also apply to files names. The command

vi letters/letter.chris

is the same as

vi ./letters/letter.chris

which is the same as

vi /home/jimmo/letters/letter.chris

This is because the shell is doing the expansion before it is passed to the command. Therefore, even directories are expanded. And the command

vi le*/letter.*

could be expanded as both letters/letter.chris and lease/letter.joe, or any similar combination.

The next wildcard is ?. This is expanded by the shell as one, and only one, character. For example, the command vi letter.chri? is the same as vi letter.chris. However, if we were to type in vi letter.chris? (note that the "?" comes after the "s" in chris), the result would be that we would begin editing a new file called (literally) letter.chris?. Again, not what we wanted. This wildcard could be used if, for example, there were two files named letter.chris1 and letter.chris2. The command vi letter.chris? would be the same as

vi letter.chris1 letter.chris2

Another commonly used metacharacter is actually a pair of characters: [ ]. The square brackets are used to represent a list of possible characters. For example, if we were not sure whether our file was called letter.chris or letter.Chris, we could type in the command as: vi letter.[Cc]hris. So, no matter if the file was called letter.chris or letter.Chris, we would find it. What happens if both files exist? Just as with the other metacharacters, both are expanded and passed to vi. Note that in this example, vi letter.[Cc]hris appears to be the same as vi letter.?hris, but it is not always so.

The list that appears inside the square brackets does not have to be an upper- and lowercase combination of the same letter. The list can be made up of any letter, number, or even punctuation. (Note that some punctuation marks have special meaning, such as *, ?, and [ ], which we will cover shortly.) For example, if we had five files, letter.chris1-letter.chris5, we could edit all of them with vi letter.chris[12435].

A nice thing about this list is that if it is consecutive, we don't need to list all possibilities. Instead, we can use a dash (-) inside the brackets to indicate that we mean a range. So, the command

vi letter.chris[12345]

could be shortened to

vi letter.chris[1-5]

What if we only wanted the first three and the last one? No problem. We could specify it as

vi letter.chris[1-35]

This does not mean that we want files letter.chris1 through letter.chris35! Rather, we want letter.chris1, letter.chris2, letter.chris3, and letter.chris5. All entries in the list are seen as individual characters.
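Again, echo shows exactly what the shell passes along. A sketch using empty files created in a scratch directory:

```shell
cd "$(mktemp -d)"
touch letter.chris1 letter.chris2 letter.chris3 letter.chris5
echo letter.chris[1-35]    # the first three and the last one
```

The pattern matches letter.chris1 letter.chris2 letter.chris3 letter.chris5, and nothing named letter.chris35 is involved.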

Inside the brackets, we are not limited to just numbers or just letters; we can use both. The command vi letter.chris[abc123] has the potential for editing six files: letter.chrisa, letter.chrisb, letter.chrisc, letter.chris1, letter.chris2, and letter.chris3.

If we are so inclined, we can mix and match any of these metacharacters any way we want. We can even use them multiple times in the same command. Let's take as an example the command

vi *.?hris[a-f1-5]

Should they exist in our current directory, this command would match all of the following:

letter.chrisa note.chrisa letter.chrisb note.chrisb letter.chrisc
note.chrisc letter.chrisd note.chrisd letter.chrise note.chrise
letter.chris1 note.chris1 letter.chris2 note.chris2 letter.chris3
note.chris3 letter.chris4 note.chris4 letter.chris5 note.chris5
letter.Chrisa note.Chrisa letter.Chrisb note.Chrisb letter.Chrisc
note.Chrisc letter.Chrisd note.Chrisd letter.Chrise note.Chrise
letter.Chris1 note.Chris1 letter.Chris2 note.Chris2 letter.Chris3
note.Chris3 letter.Chris4 note.Chris4 letter.Chris5 note.Chris5

Also, any of these names without the leading letter or note would match. Or, if we issued the command:

vi *.d*

these would match

letter.daniel note.daniel letter.david note.david

Remember, I said that the shell expands the metacharacters only with respect to the name specified. This obviously works for file names, as I described above. However, it works for command names as well.

If we were to type dat* and there was nothing in our current directory that started with dat, we would get a message like

dat*: not found

However, if we were to type /bin/dat*, the shell could successfully expand this to be /bin/date, which it would then execute. The same applies to relative paths. If we were in / and entered ./bin/dat* or bin/dat*, both would be expanded properly and the right command would be executed. If we entered the command /bin/dat[abcdef], we would get the right response as well because the shell tries all six letters listed and finds a match with /bin/date.

An important thing to note is that the shell expands as long as it can before it attempts to interpret a command. I was reminded of this fact by accident when I input /bin/l*. If you do an

ls /bin/l*

you should get the output:

-rwxr-xr-x 1 root root 22340 Sep 20 06:24 /bin/ln
-r-xr-xr-x 1 root root 25020 Sep 20 06:17 /bin/login
-rwxr-xr-x 1 root root 47584 Sep 20 06:24 /bin/ls

At first, I expected each one of the files in /bin that began with an "l" (ell) to be executed. Then I remembered that expansion takes place before the command is interpreted. Therefore, the command that I input, /bin/l*, was expanded to be

/bin/ln /bin/login /bin/ls

Because /bin/ln was the first command in the list, the system expected that I wanted to link the two files together (what /bin/ln is used for). I ended up with the error message:

/bin/ln: /bin/ls: File exists

This is because the system thought I was trying to link the file /bin/login to /bin/ls, which already existed. Hence the message.

The same thing happens when I input /bin/l? because the /bin/ln is expanded first. If I issue the command /bin/l[abcd], I get the message that there is no such file. If I type in

/bin/l[a-n]

I get:

/bin/ln: missing file argument

because the /bin/ln command expects two file names as arguments and the only thing that matched is /bin/ln.

I first learned about this aspect of shell expansion after a couple of hours of trying to extract a specific subdirectory from a tape that I had made with the cpio command. Because I made the tape using absolute paths, I attempted to restore the files as /home/jimmo/data/*. Rather than restoring the entire directory as I expected, it did nothing. It worked its way through the tape until it got to the end and then rewound itself without extracting any files.

At first I assumed I made a typing error, so I started all over. The next time, I checked the command before I sent it on its way. After half an hour or so of whirring, the tape was back at the beginning. Still no files. Then it dawned on me that I hadn't told cpio to overwrite existing files unconditionally. So I started it all over again.

Now, those of you who know cpio realize that this wasn't the issue either. At least not entirely. When the tape got to the right spot, it started overwriting everything in the directory (as I told it to). However, the files that were missing (the ones that I really wanted to get back) were still not copied from the backup tape.

The next time, I decided to just get a listing of all the files on the tape. Maybe the files I wanted were not on this tape. After a while it reached the right directory and lo and behold, there were the files that I wanted. I could see them on the tape, I just couldn't extract them.

Well, the first idea that popped into my mind was to restore everything. That's sort of like fixing a flat tire by buying a new car. Then I thought about restoring the entire tape into a temporary directory where I could then get the files I wanted. Even if I had the space, this still seemed like the wrong way of doing things.

Then it hit me. I was going about it the wrong way. The solution was to go ask someone what I was doing wrong. I asked one of the more senior engineers (I had only been there less than a year at the time). When I mentioned that I was using wildcards, it was immediately obvious what I was doing wrong (obvious to him, not to me).

Let's think about it for a minute. It is the shell that does the expansion, not the command itself (as when I ran /bin/l*). The shell interprets the command as starting with /bin/l. Therefore, I get a listing of all the files in /bin that start with "l". With cpio, the situation is similar.

When I first ran it, the shell interpreted the files (/home/jimmo/data/*) before passing them to cpio. Because I hadn't told cpio to overwrite the files, it did nothing. When I told cpio to overwrite the files, it only did so for the files that it was told to. That is, only the files that the shell saw when it expanded /home/jimmo/data/*. In other words, cpio did what it was told. I just told it to do something that I hadn't expected.

The solution is to find a way to pass the wildcards to cpio. That is, the shell must ignore the special significance of the asterisk. Fortunately, there is a way to do this. By placing a back-slash (\) before the metacharacter, you remove its special significance. This is referred to as "escaping" that character.

So, in my situation with cpio, when I referred to the files I wanted as /home/jimmo/data/\*, the shell passed the arguments to cpio as /home/jimmo/data/*. It was then cpio that expanded the * to mean all the files in that directory. Once I did that, I got the files I wanted.

You can also protect the metacharacters from being expanded by enclosing the entire expression in single-quotes. This is because it is the shell that first expands wildcards before passing them to the program. Note also that if the wildcard cannot be expanded, the entire expression (including the metacharacters) is passed as an argument to the program. Some programs are capable of expanding the metacharacters themselves.
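Both forms of protection look like this (echo stands in for cpio here, just to show what the program actually receives):

```shell
# In a scratch directory with one file, so the unprotected
# asterisk has something to match.
cd "$(mktemp -d)"
touch somefile
echo *       # expanded by the shell: prints somefile
echo \*      # back-slash escape:     prints *
echo '*'     # single-quotes:         prints *
```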

As in other places, the exclamation mark (!) has a special meaning. (That is, it is also a metacharacter.) When creating a regular expression, the exclamation mark is used to negate a set of characters. For example, if we wanted to list all files that did not have a number at the end, we could do something like this
ls *[!0-9]

This is certainly faster than typing this
ls *[a-zA-Z]

However, this second example does not mean the same thing. In the first case, we are saying we do not want numbers. In the second case, we are saying we only want letters. There is a key difference because in the second case we do not include the punctuation marks and other symbols.
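A sketch of the difference, with hypothetical file names in a scratch directory:

```shell
cd "$(mktemp -d)"
touch file1 file2 notes save_
echo *[!0-9]     # anything not ending in a digit: notes save_
echo *[a-zA-Z]   # only names ending in a letter:  notes
```

The name save_ ends in an underscore, so the negated class catches it but the letter-only class does not.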

Another symbol with special meaning is the dollar sign ($). This is used as a marker to indicate that something is a variable. I mentioned earlier in this section that you could get access to your login name environment variable by typing:

echo $LOGNAME

The system stores your login name in the environment variable LOGNAME (note: no "$"). The system needs some way of knowing that when you input this on the command line, you are talking about the variable LOGNAME and not the literal string LOGNAME. This is done with the "$". Several variables are set by the system. You can also set variables yourself and use them later on. I'll get into more detail about shell variables later.

So far, we have been talking about metacharacters used for matching the names of files. However, metacharacters can often be used in the arguments to certain commands. One example is the grep command, which is used to search for strings within files. The name grep comes from Global Regular Expression Print (or Parser). As its name implies, it has something to do with regular expressions. Let's assume we have a text file called documents, and we wish to see if the string "letter" exists in that text. The command might be

grep letter documents

This will search for and print out every line containing the string "letter." This includes such things as "letterbox," "lettercarrier," and even "love-letter." However, it will not find "Letterman," because we did not tell grep to ignore upper- and lowercase (using the -i option). To do so using regular expressions, the command might look like this

grep [Ll]etter documents

Now, because we specified to look for either "L" or "l" followed by "etter," we get both "letter" and "Letterman." We can also specify that we want to look for this string only when it appears at the beginning of a line using the caret (^) symbol. For example

grep ^[Ll]etter documents

This searches for all strings that start with the "beginning-of-line," followed by either "L" or "l," followed by "etter." Or, if we want to search for the same string at the end of the line, we would use the dollar sign to indicate the end of the line. Note that in front of a variable name, the dollar sign marks the variable to be expanded, whereas at the end of a regular expression, it indicates the end of the line. Confused? Let's look at an example. Let's define a string like this:

VAR=^[Ll]etter

If we echo that string, we simply get ^[Ll]etter. Note that this includes the caret at the beginning of the string. When we do a search like this

grep $VAR documents

it is equivalent to

grep ^[Ll]etter documents

Now, if we write the same command like this

grep $VAR$ documents

This says to find the string defined by the VAR variable (^[Ll]etter), but only if it is at the end of the line. Here we have an example where the dollar sign has both meanings. If we then take it one step further:

grep ^$VAR$ documents

This says to find the string defined by the VAR variable, but only if it takes up the entire line. In other words, the line consists only of the beginning of the line (^), the string defined by VAR, and the end of the line ($).
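To see all of this in action, we can build a small sample file ourselves (the file name documents and its contents are made up for this demonstration):

```shell
# Create a three-line sample file
cat > documents <<'EOF'
letter to a friend
the Letterman show
my love-letter
EOF

grep '[Ll]etter' documents    # matches all three lines
grep '^[Ll]etter' documents   # matches only: letter to a friend
grep 'letter$' documents      # matches only: my love-letter
```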

Here I want to sidestep a little. When you look at the expression $VAR$, it might be confusing to some people. Further, if you were to combine a variable with other characters, you may end up with something you do not expect, because the shell includes those characters as part of the variable name. To prevent this, it is a good idea to enclose the variable name in curly braces, like this:

${VAR}$

The curly braces tell the shell exactly what belongs to the variable name. I try to always enclose the variable name in curly braces to ensure that there is no confusion. You also need the curly braces when combining variables, like this:

${VAR1}${VAR2}
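A short example shows why the curly braces matter (the variable name VAR is arbitrary):

```shell
VAR=letter
echo "$VARs"     # the shell looks for a variable named VARs: prints an empty line
echo "${VAR}s"   # the braces end the name at VAR: prints letters
```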

Often you need to match a series of repeated characters, such as spaces, dashes, and so forth. Although you could simply use the asterisk to specify any number of that particular character, you can run into problems on both ends. First, maybe you want to match a minimum number of that character. This is easily solved by repeating the character a certain number of times before you use the wildcard. For example, the expression

====*

would match at least three equal signs. Why three? Well, we have explicitly put in three equal signs, and the wildcard applies to the fourth. Because the asterisk means zero or more, it could mean zero, and therefore the expression might match only three.

The next problem occurs when we want to limit the maximum number of characters that are matched. If you know exactly how many to match, you could simply use that many characters. But what do you do if you have a minimum and a maximum? For this, you enclose the range in curly braces: {min,max}. For example, to specify at least 5 and at most 10, it would look like this: {5,10}. Keep in mind that curly braces have a special meaning for the shell, so we need to escape them with a backslash when using them on the command line. So, let's say we wanted to search a file for all runs of digits between 5 and 10 digits long. We might have something like this:

grep "[0-9]\{5,10\}" FILENAME

This might seem a little complicated, but it would be far more complicated to write a regular expression that searches for each combination individually.

As we mentioned above, to define a specific number of a particular character you could simply input that character the desired number of times. However, try counting 17 periods on a line or 17 lower-case letters ([a-z]). Imagine trying to type in this combination 17 times! You could specify a range with a maximum of 17 and a minimum of 17, like this: {17,17}. Although this would work, you could save yourself a little typing by simply including just the single value. Therefore, to match exactly 17 lower-case letters, you might have something like this:

grep "[a-z]\{17\}" FILENAME

If we want to specify a minimum number of times, without a maximum, we simply leave off the maximum, like this:

grep "[a-z]\{17,\}" FILENAME

This would match a pattern of at least 17 lower-case letters.
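Here is a small demonstration you can try yourself (the file name FILENAME and its contents are invented for this sketch):

```shell
# Lines of 4, 5, 10, and 11 digits
cat > FILENAME <<'EOF'
1234
12345
1234567890
12345678901
EOF

grep -c "[0-9]\{5,10\}" FILENAME     # 3: lines containing a run of at least 5 digits
grep -c "^[0-9]\{5,10\}$" FILENAME   # 2: the whole line must be 5 to 10 digits
```

Note that the 11-digit line still matches the unanchored expression, because it contains a 10-digit run; only the anchored version rules it out.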

Another problem occurs when you are trying to parse text that is not in English. If you were looking for all letters in an English text, you could use something like this: [a-zA-Z]. However, this would not include German letters, like ä, Ö, ß, and so forth. To do so, you would use the expressions [:lower:], [:upper:] or [:alpha:] for the lower-case letters, upper-case letters or all letters, respectively, regardless of the language. (Note that this assumes that national language support (NLS) is configured on your system, which it normally is for newer Linux distributions.)

Other expressions include:

[:alnum:] - Alphanumeric characters.
[:cntrl:] - Control characters.
[:digit:] - Digits.
[:graph:] - Printable characters, excluding the space.
[:print:] - Printable characters, including the space.
[:punct:] - Punctuation.
[:space:] - Whitespace characters.

One very important thing to note is that the brackets are part of the expression. Therefore, if you want to include more in a bracket expression, you need to make sure you have the correct number of brackets. For example, if you wanted to match any number of alphanumeric or punctuation characters, you might have an expression like this: [[:alnum:][:punct:]]*.
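A brief sketch with a made-up file shows the double brackets in use:

```shell
cat > sample <<'EOF'
hello, world!
12345

EOF

grep -c '[[:digit:]]' sample            # 1: only the second line contains a digit
grep -c '[[:alnum:][:punct:]]' sample   # 2: the empty line matches neither class
```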

Another thing to note is that in most cases, regular expressions are expanded as much as possible. For example, let's assume I was parsing an HTML file and wanted to match the first tag on the line. You might think to try an expression like this: "<.*>". This says to match any number of characters between the angle brackets. This works if there is only one tag on the line. However, if you have more than one tag, this expression matches everything from the first opening angle bracket to the last closing angle bracket, with everything in between.
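You can see this greedy behavior, and a common way around it, with grep's -o option, which prints only the matched text (the HTML fragment here is invented):

```shell
line='<b>bold</b> and <i>italic</i>'
echo "$line" | grep -o '<.*>'      # greedy: prints the entire span <b>bold</b> and <i>italic</i>
echo "$line" | grep -o '<[^>]*>'   # prints <b>, </b>, <i>, </i> on separate lines
```

The trick in the second expression is to match any number of characters that are not a closing angle bracket, so each match must stop at the first one it finds.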

There are a number of rules defined for regular expressions, the understanding of which helps avoid confusion:

1. A non-special character is equivalent to that character.
2. Any special character preceded by a backslash (\) is equivalent to itself.
3. A period specifies any single character.
4. An asterisk specifies zero or more copies of the preceding character.
5. When used by itself, an asterisk specifies everything or nothing.
6. A range of characters is specified within square brackets ([ ]).
7. The beginning of the line is specified with a caret (^) and the end of the line with a dollar sign ($).
8. If included within square brackets, a caret (^) negates the set of characters.

Interpreting the Command

When you input a command line, the shell needs to be able to interpret it correctly in order to know exactly what to do. Maybe you have multiple options or redirect the output to a file. In any event, the shell goes through several steps to figure out what needs to be done.

One question I had was, "In what order does everything get done?" We have shell variables to expand, maybe an alias or function to process, "real" commands, pipes and input/output redirection. There are a lot of things that the shell must consider when figuring out what to do and when.

For the most part, this is not very important. Commands do not get so complex that knowing the evaluation order becomes an issue. However, on a few occasions I have run into situations in which things did not behave as I thought they should. By evaluating the command myself (as the shell would), it became clear what was happening. Let's take a look.

The first thing that gets done is that the shell figures out how many commands there are on the line. (Remember, you can separate multiple commands on a single line with a semicolon.) This process determines how many tokens there are on the command line. In this context, a token could be an entire command or it could be a control word such as "if." Here, too, the shell must deal with input/output redirection and pipes.

Once the shell determines how many tokens there are, it checks the syntax of each token. Should there be a syntax error, the shell will not try to start any of the commands. If the syntax is correct, it begins interpreting the tokens.

First, any alias you might have is expanded. Aliases are a way for some shells to allow you to define your own commands. If any token on the command line is actually an alias that you have defined, it is expanded before the shell proceeds. If it happens that an alias contains another alias, they are both expanded before continuing with the next step.

The next thing the shell checks for is functions. Like the functions in programming languages such as C, a shell function can be thought of as a small subprogram. Check the other sections for details on aliases and functions.

Once aliases and functions have all been completely expanded, the shell evaluates variables. Finally, it expands any wildcards into file names. This is done according to the rules we talked about previously.

After the shell has evaluated everything, it is still not ready to run the command. It first checks to see if the first token represents a command built into the shell or an external one. If it's not internal, the shell needs to go through the search path.

At this point, it sets up the redirection, including the pipes. These obviously must be ready before the command starts because the command may be getting its input from someplace other than the keyboard and may be sending it somewhere other than the screen. The figure below shows how the evaluation looks graphically.


Image - Steps in interpreting command line input. (interactive)

This is an oversimplification. Things happen in this order, though many more things occur in and around the steps than I have listed here. What I am attempting to describe is the general process that occurs when the shell is trying to interpret your command.

Once the shell has determined what each command is and each command is an executable binary program (not a shell script), the shell makes a copy of itself using the fork() system call. This copy is a child process of the shell. The copy then uses the exec() system call to overwrite itself with the binary it wants to execute. Keep in mind that even though the child process is executing, the original shell is still in memory, waiting for the child to complete (assuming the command was not started in the background with &).

If the program that needs to be executed is a shell script, the program that is created with fork() and exec() is another shell. This new shell starts reading the shell script and interprets it, one line at a time. This is why a syntax error in a shell script is not discovered when the script is started, but rather when the erroneous line is first encountered.

Understanding that a new process is created when you run a shell script helps to explain a very common misconception under UNIX. When you run a shell script and that script changes directories, your original shell knows nothing about the change. This confuses a lot of people who are new to UNIX as they come from the DOS world, where changing the directory from within a batch file does change the original shell. This is because DOS does not have the same concept of a process as UNIX does.

Look at it this way: The sub-shell's environment has been changed because the current directory is different. However, this is not passed back to the parent. Like "real" parent-child relationships, only the children can inherit characteristics from their parent, not the other way around. Therefore, any changes to the environment, including directory changes, are not noticed by the parent. Again, this is different from the behavior of DOS .bat files.

You can get around this by either using aliases or shell functions (assuming that your shell has them). Another way is to use the dot command in front of the shell script you want to execute. For example:

. myscript

<--NOTICE THE DOT!

This script will be interpreted directly by the current shell, without forking a sub-shell. If the script makes changes to the environment, it is this shell's environment that is changed.
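You can demonstrate the difference yourself with a throwaway script (the name changedir is arbitrary):

```shell
cd /tmp
echo 'cd /' > changedir   # a one-line script that only changes directory
sh changedir              # runs in a sub-shell...
pwd                       # ...so we are still in /tmp
. ./changedir             # interpreted by the current shell
pwd                       # now prints /
```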

You can use this same functionality if you ever need to reset your environment. Normally, your environment is defined by the start-up files in your home directory. On occasion, things get a little confused (maybe a variable is changed or removed) and you need to reset things. You can use the dot command to do so. For example, with either sh or ksh, you can write it like this:

. $HOME/.profile

<--NOTICE THE DOT!

Or, using a function of bash you can also write

. ~/.profile

<--NOTICE THE DOT!

This uses the tilde (~), which I haven't mentioned yet. Under many shells, you can use the tilde as a shortcut to refer to a particular user's home directory; by itself, it refers to your own.

If you have csh, the command is issued like this:

source $HOME/.login


Some shells keep track of your last directory in the OLDPWD environment variable. Whenever you change directories, the system saves your current directory in OLDPWD before it changes you to the new location.

You can use this by simply entering cd $OLDPWD. Because the variable $OLDPWD is expanded before the cd command is executed, you end up back in your previous directory. Although this has more characters than just popd, it's easier because the system keeps track of your position, current and previous, for you. Also, because it's a variable, you can access it in the same way that you can access other environment variables.

For example, if there were a file in your old directory that you wanted to copy to your current one, you could do so by entering:

cp $OLDPWD/file_name ./

However, things are not as difficult as they seem. Typing in cd $OLDPWD is still a bit cumbersome. It is a lot fewer characters to type popd, like in the csh. Why isn't there something like that in ksh or bash? There is. In fact, it's much simpler. When I first found out about it, the adjective that first came to mind was "sweet." To change to your previous directory, simply type "cd -".
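A short session shows both forms in action:

```shell
cd /tmp
cd /
echo $OLDPWD   # prints /tmp
cd -           # back to the previous directory (cd - also prints it)
pwd            # prints /tmp
```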

Permissions

All this time we have been talking about finding and executing commands, but there is one issue that I haven't mentioned. That is the concept of permissions. To access a file, you need to have permission to do so. If you want to read a file, you need to have read permission. If you want to write to a file, you need to have write permission. If you want to execute a file, you must have execute permission.

Permissions are set on a file using the chmod command or when the file is created (the details of which I will save for later). You can read the permissions on a file by using either the l command or ls -l. At the beginning of each line will be ten characters, which can either be dashes or letters. The first position is the type of the file, whether it is a regular file (-), a directory (d), a block device file (b), and so on. Below are some examples of the various file types.

Image - Various file types. (interactive)

- - regular file
c - character device
b - block device
d - directory
p - named pipe
l - symbolic link

We'll get into the details of these files as we move along. If you are curious about the format of each entry, you can look at the ls man-page.

The next nine positions are broken into three groups. Each group consists of three characters indicating the permissions. They are, in order, read(r), write(w), and execute(x). The first set of characters indicates what permissions the owner of the file has. The second set of characters indicates the permissions for the group of that file. The last set of characters indicates the permissions for everyone else.

If a particular permission is not given, a dash (-) will appear here. For example, rwx means all three permissions have been given. In our example above, the symbolic link /usr/bin/vi has read, write, and execute permissions for everyone. The device nodes /dev/tty1 and /dev/hda1 have permissions rw- for the owner and group, meaning only read and write, but not execute permissions have been given. The directory /bin has read and execute permissions for everyone (r-x), but only the owner can write to it (rwx).

For directories, the situation is slightly different than for regular files. If you do not have read permission on a directory, you cannot read its contents, that is, you cannot list the files in it. If you do not have write permission on a directory, you cannot write to it, which means that you cannot create a new file in that directory. Execute permission on a directory means that you can search it, that is, traverse it to reach the files inside. If the execute bit is not set on a directory but the read bit is, you can see what files are in the directory but cannot access any of them or even change into that directory. If you have execute permission but no read permission, you can change into the directory and access files whose names you already know, but you cannot list its contents.

Write permission on a directory also has an interesting side effect. Because you need to have write permission on a directory to create a new file, you also need to have write permission to remove an existing file. Even if you do not have write permission on the file itself, if you can write to the directory, you can erase the file.

At first this sounds odd. However, remember that a directory is nothing more than a file in a special format. If you have write permission to a directory-file, you can remove the references to other files, thereby removing the files themselves.
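You can convince yourself of this with a quick experiment (the directory and file names are arbitrary):

```shell
mkdir -p /tmp/demo_dir
touch /tmp/demo_dir/victim
chmod 444 /tmp/demo_dir/victim   # read-only: no write permission on the file itself
rm -f /tmp/demo_dir/victim       # succeeds anyway, because the directory is writable
```

The -f simply keeps rm from asking for confirmation before removing the write-protected file.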

If we were to set the permissions for all users so that they could read, write, and execute a file, the command would look like this:

chmod 777 filename

You can also use symbolic permissions to accomplish the same thing. We use the letters u, g, and o to specify the user (owner), group, and others for this file, respectively. The permissions are then r for read, w for write, and x for execute. So to set the permissions so that the owner can read and write a file, the command would look like this:

chmod u=rw filename

Note that in contrast to the absolute numbers, setting the permissions symbolically is additive. So, in this case, we would just change the user's permissions to read and write, but the others would remain unchanged. If we changed the command to this

chmod u+w filename

we would be adding write permission for the user of that file. Again, the permissions for the others would be unchanged.

To make the permissions for the group and others to be the same as for the user, we could set it like this

chmod go=u filename

which simply means "change the mode so that the permissions for the group and others equals the user." We also could have set them all explicitly in one command, like this

chmod u=rw,g=rw,o=rw filename

which has the effect of setting the permissions for everyone to read and write. However, we don't need to write that much.

Combining the commands, we could have something that looks like this:

chmod u=rw,go=u filename

This means "set the permissions for the user to read and write, then set the permissions for group and others to be equal to the user."
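Trying this on a scratch file (the name demo_file is arbitrary) shows the effect:

```shell
touch demo_file
chmod u=rw,go=u demo_file   # user gets rw, then group and others get the same
ls -l demo_file             # the permissions show as -rw-rw-rw-
```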

Note that each of these changes is done in sequence. So be careful what changes are made. For example, let's assume we have a file that is read-only for everyone. We want to give everyone write permission for it, so we try

chmod u+w,gu=o filename

This is a typo because we meant to say go=u. The effect is that we added write permission for the user, but then set the permissions on the group and user to the same as others.

We might want to try adding the write permissions like this:

chmod +w filename

This works on some systems, but not on the Linux distributions that I have seen. According to the man-page, this will not change those permissions where the bits in the UMASK are set. (More on this later. See the chmod man-page for details.)

To get around this, we use "a" to specify all users. Therefore, the command would be

chmod a+w filename

There are a few other things that you can do with permissions. For example, you can set a program to change the UID of the process when the program is executed. Some programs need to run as root to access other files. Rather than giving the user the root password, you can set the program so that when it is executed, the process runs as the owner of the file, in this case root. This is a Set-UID, or SUID, program. If you instead want a program to run with a particular group ID, you make it a Set-GID, or SGID, program. Both are set with the s option to chmod, like this

chmod u+s program

or

chmod g+s program

There are a few other special cases, but I will leave it up to you to check out the chmod man-page if you are interested.

When you create a file, the access permissions are determined by your file creation mask. This is defined by the UMASK variable and can be set using the umask command. One thing to keep in mind is that this is a mask. That is, it masks out permissions rather than assigning them. If you remember, permissions on a file can be set using the chmod command and a three-digit value. For example

chmod 600 letter.john

explicitly sets the permissions on the file letter.john to 600 (read and write permission for the user and nothing for everyone else). If we create a new file, the permissions might be 660 (read/write for user and group). This is determined by the UMASK. To understand how the UMASK works, you need to remember that the permissions are octal values, which are determined by the permissions bits. Looking at one set of permissions we have
bit: 2 1 0
value: 4 2 1
symbol: r w x

which means that if the bit with value 4 is set (bit 2), the file can be read; if the bit with value 2 is set (bit 1), the file can be written to; and if the bit with value 1 is set (bit 0), the file can be executed. If multiple bits are set, their values are added together. For example, if bits 2 and 1 are set (read/write), the value is 4+2=6. Just as in the example above, if all three are set, we have 4+2+1=7. Because there are three sets of permissions (owner, group, others), the permissions are usually used in triplets, just as in the chmod example above.

The UMASK value masks out the bits. The permissions that each position in the UMASK masks out are the same as the file permissions themselves. So, the left-most position masks out the owner permission, the middle position the group, and the right most masks out all others. If we have UMASK=007, the permissions for owner and group are not touched. However, for others, we have the value 7, which is obtained by setting all bits. Because this is a mask, all bits are unset. (The way I remember this is that the bits are inverted. Where it is set in the UMASK, it will be unset in the permissions, and vice versa.)

The problem many people have is that the umask command does not force permissions, but rather limits them. For example, if we had UMASK=007, we could assume that any file created has permissions of 770. However, this depends on the program that is creating the file. If the program creates a file with permissions 777, the umask will mask out the last bits and the permissions will, in fact, be 770. However, if the program creates permissions of 666, the last bits are still masked out, and the new file will have permissions of 660, not 770. Some programs, like the C compiler, do generate files with the execution bit (bit 0) set. However, most do not. Therefore, setting UMASK=007 does not force creation of executable programs, unless the program creating the file sets the execute bit itself.

Let's look at a more complicated example. Assume we have UMASK=047. If our program creates a file with permissions 777, then our UMASK does nothing to the first digit, but masks out the 4 from the second digit, giving us 3. Then, because the last digit of the UMASK is 7, this masks out everything, so the permissions here are 0. As a result, the permissions for the file are 730. However, if the program creates the file with permissions 666, the resulting permissions are 620. The easy way to figure out the effect of the UMASK is to subtract the UMASK from the default permissions that the program sets. (Note that all negative values become 0.)
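You can verify the arithmetic with a scratch file (masked_file is an arbitrary name; most programs, like touch, create files asking for permissions 666):

```shell
umask 047
rm -f masked_file
touch masked_file   # touch asks for 666; the umask masks out 047
ls -l masked_file   # shows -rw--w----, that is, 620
```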

As I mentioned, one way the UMASK is set is through the environment variable UMASK. You can change it anytime using the umask command. The syntax is simply

umask value

Here, the value can either be numeric (e.g., 007) or symbolic. Note that with the symbolic notation, you specify the permissions to keep rather than those to mask out. For example, to set the umask to 047 using the symbolic notation, we have

umask u=rwx,g=wx,o=

This has the effect of removing no permissions from the user, removing read permission from the group, and removing all permissions from others.

Being able to change the permissions on a file is often not enough. What if the only person who should be able to change a file is not the owner? Simple! You change the owner. This is accomplished with the chown command, which has the general syntax:

chown new_owner filename

Where "new_owner" is the name of the user account we want to set as the owner of the file, and "filename" is the file we want to change. In addition, you can use chown to change not only the owner, but the group of the file as well. This has the general syntax:

chown new_owner:new_group filename

Another useful trick is the ability to set the owner and group to the same ones as another file. This is done with the --reference= option, which is set to the name of the file you are referencing. If you want to change just the group, you can use the chgrp command, which has the same basic syntax as chown. Note that both chgrp and chmod can also take the --reference= option. Further, all three of these commands take the -R option, which recursively changes the permissions, owner, or group.
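A small sketch (file names arbitrary; note that changing a file's owner normally requires root privileges, so the chown line is only shown as a comment):

```shell
touch ref_file target_file
chgrp --reference=ref_file target_file   # target_file now has the same group as ref_file
# As root, you might change owner and group together, for example:
# chown jimmo:users target_file
```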

Saturday, December 5, 2009

Shell Variables

The shell's environment is all the information that the shell will use as it runs. This includes such things as your command search path, your logname (the name you logged in under), and the terminal type you are using. Collectively, they are referred to as your environment variables and individually, as the "so-and-so" environment variable, such as the TERM environment variable, which contains the type of terminal you are using.

When you log in, most of these are set for you in one way or another. (The mechanism that sets all environment variables is shell-dependent, so we will talk about it when we get to the individual shells.) Each environment variable can be viewed by simply typing echo $VARIABLE. For example, if I type

echo $LOGNAME

I get:

jimmo

Typing

echo $TERM

I get:

ansi

In general, variables that are pre-defined by the system (e.g. PATH, LOGNAME, HOME) are written in capital letters. Note that this is not a requirement as there are exceptions.

Note that shell variables are only accessible from the current shell. In order for them to be accessible to child processes (i.e., sub-processes), they must be made available using the export command. In the system-wide shell configuration file or "profile" (/etc/profile), many variables, such as PATH, are exported. More information on processes can be found in the section on processes in the chapter "Introduction to Operating Systems".

It is very common that users' shell prompts are defined by the system. For example, you might have something that looks like this:
PS1='\u@\h:\w> '

What this does is set the first-level prompt variable PS1 to include the username, hostname, and the current working directory. This ends up looking something like this:

jimmo@linux:/tmp>

Adding \A to display the time, we end up with something that looks like this:

10:09 jimmo@linux:/tmp>

Variable Meaning
\u Username
\h Hostname
\H The fully-qualified hostname
\w Current working directory
\d Date
\t Current time in 24-hour HH:MM:SS format
\T Current time in 12-hour HH:MM:SS format
\@ Current time in 12-hour am/pm format
\A Current time in 24-hour HH:MM format
\l Basename of the shell's terminal device
\e Escape character
\n Newline
\r Carriage return

One way of using the escape character in your prompt is to send a terminal control sequence. This can be used, for example, to change the prompt so that the time is shown in red:
PS1='\e[31m\A\e[0m \u@\h:\w> '

Which then looks like this:

10:09 jimmo@linux:/tmp>

Directory Paths

As we discussed in the section on the search path, you can often start programs simply by inputting their name, provided they lie in your search path. You could also start a program by referencing it through a relative path, the path in relation to your current working directory. To understand the syntax of relative paths, we need to backtrack a moment. As I mentioned, you can refer to any file or directory by specifying the path to that directory. Because they have special significance, there is a way of referring to either your current directory or its parent directory. The current directory is referenced by "." and its parent by ".." (often referred to in conversation as "dot" and "dot-dot").

Because directories are separated from files and other directories by a /, a file in the current directory could be referenced as ./file_name and a file in the parent directory would be referenced as ../file_name. You can reference the parent of the parent by just tacking on another ../, and then continue on to the root directory if you want. So the file ../../file_name is in a directory two levels up from your current directory. This slash (/) is referred to as a forward slash, as compared to a back-slash (\), which is used in DOS to separate path components.

When interpreting your command line, the shell interprets everything up to the last / as a directory name. If we were in the root (upper-most) directory, we could access date in one of several ways. The first two, date and /bin/date, we already know about. Knowing that ./ refers to the current directory means that we could also get to it like this: ./bin/date. This is saying relative to our current directory (./), look in the bin subdirectory for the command date. If we were in the /bin directory, we could start the command like this: ./date. This is useful when the command you want to execute is in your current directory, but the directory is not in your path. (More on this in a moment.)

We can also get the same results from the root directory by starting the command like this: bin/date. If there is a ./ at the beginning, the shell knows that everything is relative to the current directory. If the command begins with a /, the system knows that everything is relative to the root directory. If there is no slash at the beginning, the system reads until it gets to the end of the command or encounters a slash, whichever comes first. If it encounters a slash (as in our example), it translates the first component as a subdirectory of the current directory. So executing the command bin/date is translated the same as ./bin/date.

Let's now assume that we are in our home directory, /home/jimmo (for example). We can obviously access the date command simply as date because it's in our path. However, to access it by a relative path, we could say ../../bin/date. The first ../ moves up one level into /home. The second ../ moves up another level to /. From there, we look in the subdirectory bin for the command date. Keep in mind that throughout this whole process, our current directory does not change. We are still in /home/jimmo.

Searching your path is only done for commands. If we were to enter vi file_name (vi is a text editor) and there was no file called file_name in our current directory, vi would start editing a new file. If we had a subdirectory called text where file_name was, we would have to access it either as vi ./text/file_name or vi text/file_name. Of course, we could access it with the absolute path of vi /home/jimmo/text/file_name.

When you input the path yourself (either a command or a file), the shell interprets each component of the pathname before passing it to the appropriate command. This allows you to come up with some pretty convoluted pathnames if you so choose. For example:

cd /home/jimmo/data/../bin/../../chuck/letters

This example would be interpreted as first changing into the directory /home/jimmo/data/, moving back up to the parent directory (..), then into the subdirectory bin, back into the parent and its parent (../../) and then into the subdirectory chuck/letters. Although this is a pretty contrived example, I know many software packages that rely on relative paths and end up with directory references similar to this example.
Current directory        Target directory              Relative path    Absolute path
/data/home/jimmo/letter  /data/home/jimmo/letter/dave  ./dave or dave   /data/home/jimmo/letter/dave
/data/home/jimmo/letter  /data/home/jimmo              ../              /data/home/jimmo
/data/home/jimmo/letter  /data/home                    ../..            /data/home
/data/home/jimmo/letter  /tmp                          ../../../../tmp  /tmp
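You can verify these relationships directly in the shell. Here is a small, self-contained sketch that builds a throwaway copy of the tree under /tmp (the directory names are only for illustration):

```shell
# Build a small directory tree to play with.
mkdir -p /tmp/pathdemo/data/home/jimmo/letter

# Start in the "current directory" from the table.
cd /tmp/pathdemo/data/home/jimmo/letter

# Move two levels up with a relative path ...
cd ../..
pwd    # prints /tmp/pathdemo/data/home

# ... or reach the same place with an absolute path.
cd /tmp/pathdemo/data/home/jimmo
pwd    # prints /tmp/pathdemo/data/home/jimmo
```

Note that cd actually changes the current directory, unlike simply referring to a file by a relative path.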

The Search Path

It may happen that you know there is a program with a particular name on the system, but when you try to start it from the command line, you are told that the file is not found. Because you just ran it yesterday, you assume it has been removed or that you have misremembered the spelling.

The most common reason for this is that the program you want to start is not in your search path. Your search path is a predefined set of directories in which the system looks for the program you type in from the command line (or is started by some other command). This saves time because the system does not have to look through every directory trying to find the program. Unfortunately, if the program is not in one of the directories specified in your path, the system cannot start the program unless you explicitly tell it where to look. To do this, you must specify either the full path of the command or a path relative to where you are currently located.

Let's look at this issue for a minute. Think back to our discussion of files and directories. I mentioned that every file on the system can be referred to by a unique combination of path and file name. This applies to executable programs as well. By inputting the complete path, you can run any program, whether it is in your path or not.

Let's take a program that is in everyone's path, like date (at least it should be). The date program resides in the /bin directory, so its full path is /bin/date. If you wanted to run it, you could type in /bin/date, press Enter, and you might get something that looks like this:

Sat Jan 28 16:51:36 PST 1995

However, because date is in your search path, you need to input only its name, without the path, to get it to run.

One problem that regularly crops up for users coming from a DOS environment is that the only place UNIX looks for commands is in your search path. Under DOS, the first place the system looks is your current directory, even if that directory is not in your path. UNIX makes no such exception: it looks only in your path.

For most users, this is not a problem, as the current directory is included in the path by default, so the shell will still be able to execute something in your current directory. Root does not have the current directory in its path, and in fact, this is the way it should be. If you want to include the current directory in root's path, make sure it is the last entry in the path so that all "real" commands are found before any other command that a user might try to "force" on you. In fact, I suggest that every user put such entries only at the end of their path.

Assume a malicious user created a "bad" program called more in his or her directory. If root were to run more in that user's directory, it could have potentially disastrous results. (Note that the current directory normally appears at the end of the search path, so even if there were a program called more in the current directory, the one in /bin would probably get executed first. However, you can see how this could cause problems for root.) To figure out exactly which program is actually being run, you can use the (what else?) which command.
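For example, you can ask which where a given name would be found; bash's built-in type command is even more informative, since it also reports aliases and shell builtins:

```shell
which more    # prints the full path of the "more" that would be run
type more     # bash builtin; also reveals aliases and shell builtins
```

The exact paths printed depend on your system and your PATH setting.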

Newer versions of bash can be configured to complete not only commands automatically when you input just part of the name, but also arguments to the command, as well as directory names. See the bash man-page for more details.

Commands can also be started by including a directory path, whether or not they are in your search path. You can use relative or absolute paths, usually with the same result. Details on this can be found in the section on directory paths.
One very important environment variable is the PATH variable. Remember that the PATH tells the shell where it needs to look when determining what command it should run. One of the things the shell does to make sense of your command is to find out exactly what program you mean. This is done by looking for the program in the places specified by your PATH variable.

Although it is more accurate to say that the shell looks in the directories specified by your PATH environment variable, it is commonly said that the shell "searches your path." Because this is easier to type, I am going to use that convention here.

If you were to specify a path in the command name, the shell does not use your PATH variable to do any searching. That is, if you issued the command bin/date, the shell would interpret that to mean that you wanted to execute the command date that was in the bin subdirectory of your current directory. If you were in / (the root directory), all would be well and it would effectively execute /bin/date. If you were somewhere else, the shell might not be able to find a match.

If you do not specify any path (that is, the command does not contain any slashes), the system will search through your path. If it finds the command, great. If not, you get a message saying the command was not found.

Let's take a closer look at how this works by looking at my path variable. From the command line, if I type

echo $PATH

I get

/usr/local/bin:/bin:/usr/bin:/usr/X11/bin:/home/jimmo/bin:/:.

WATCH THE DOT!

If I type in date, the shell first looks in /usr/local/bin. Because date isn't there, it moves on to /bin, where the command resides, so it is executed as /bin/date. If I type in vi, the shell looks in /usr/local/bin and /bin, doesn't find it, then looks in /usr/bin, where it does find vi. Now I type in getdev. (This is a program I wrote to translate major device numbers into the driver name. Don't worry if you don't know what a major number is. You will shortly.) The shell looks in /usr/local/bin and doesn't find it. It then looks in /bin. Still not there. It then tries /usr/bin and /usr/X11/bin and still can't find it. When it finally gets to /home/jimmo/bin, it finds the getdev command and executes it. (Note that because I wrote this program, you probably won't have it on your system.)

What would happen if I had not yet copied the program into my personal bin directory? Well, if the getdev program is in my current directory, the shell finds a match with the last "." in my path. (Remember that the "." is translated into the current directory, so the program is executed as ./getdev.) If that final "." was missing or the getdev program was somewhere else, the shell could not find it and would tell me so with something like

getdev: not found
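The search the shell performs can be sketched as a loop over the colon-separated entries in $PATH, stopping at the first directory that contains an executable with the right name. This is not the real implementation, just the same idea expressed in shell:

```shell
# Rough sketch of the shell's PATH search for a command name.
cmd=date
found=
old_ifs=$IFS
IFS=:                        # split $PATH on colons
for dir in $PATH; do
    if [ -x "$dir/$cmd" ]; then
        found="$dir/$cmd"    # first match wins
        break
    fi
done
IFS=$old_ifs
echo "${found:-$cmd: not found}"
```

Running this with cmd=date on a typical system prints something like /bin/date; with a name that is nowhere in the path, it prints the "not found" message, just as the shell does.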

One convention that I (as well as most other authors) will use is that commands we talk about will be in your path unless we specifically say otherwise. Therefore, to access them, all you need to do is input the name of the command without the full path.

The Shell

As I mentioned in the section on introduction to operating systems, the shell is essentially a user's interface to the operating system. The shell is a command line interpreter, just like those found on other operating systems. In Windows, you open a "command window" or "DOS box" to input commands, and this is nothing other than a command line interpreter. Through it, you issue commands that are interpreted by the system to carry out certain actions. Often, the state where the system is sitting at a prompt, waiting for you to type input, is referred to (among other things) as being at the shell prompt or at the command line.

For many years before the invention of graphical user interfaces, such as X (the X Window System, for purists), the only way to input commands to the operating system was through a command line interpreter, or shell. In fact, shells themselves were thought of as wondrous things during the early days of computers because, prior to them, users had no direct way to interact with the operating system.

Most shells, be they under DOS, UNIX, VMS, or other operating systems, have the same input characteristics. To get the operating system to do anything, you must give it a command. Some commands, such as the date command under UNIX, do not require anything else to get them to work. If you type in date and press Enter, that's what appears on your screen: the date.

Some commands need something else to get them to work: an argument. Some commands, like mkdir (used to create directories), work with only one argument, as in mkdir directory_name. Others, like cp (to copy files), require multiple arguments, as in

cp file1 file2

In many cases, you can pass flags to commands to change their behavior. These flags are generally referred to as options. For example, if you wanted to create a series of sub-directories without creating every one individually, you could run mkdir with the -p option, like this:

mkdir -p one/two/three/four

In principle, anything added to the command line after the command itself is an argument to that command. The convention is that an option changes the behavior, whereas an argument is acted upon by the command. Let's take the mkdir command as an example:

mkdir dir_name

Here we have a single argument which is the name of the directory to be created. Next, we add an option:

mkdir -p sub_dir/dir_name

The -p is an option. Note that, using the terminology just discussed, some arguments are optional and some options are required. That is, with some commands, such as tar, you must always have an option, while some commands, like date, don't always need an argument.

Generally, options are preceded by a dash (-), whereas arguments are not. I've said it before and I will say it again, nothing is certain when it comes to Linux or UNIX, in general. By realizing that these two terms are often interchanged, you won't get confused when you come across one or the other. I will continue to use option to reflect something that changes the command's behavior and argument to indicate something that is acted upon. In some places, you will also see arguments referred to as "operands". An operand is simply something on which the shell "operates", such as a file, directory or maybe even simple text.

Each program or utility has its own set of arguments and options, so you will have to look at the man-pages for the individual commands. You can call these up from the command line by typing in

man <command>

where <command> is the name of the command you want information about. Also, if you are not sure what the command is called, many Linux versions have the whatis command, which will give you a brief description. There is also the apropos command, which searches through the man-pages for the words you give as arguments. Therefore, even if you don't know the name of the command, you can still find it.

Arguments (whether they are options or operands) which are enclosed in square brackets ([ ]) are optional. In some cases, there are optional components to the optional arguments, so you may end up having brackets within brackets.

An ellipsis (...) indicates that the preceding argument can be repeated. For example, the ls command can take multiple file or directory names as arguments, as well as multiple options. Therefore, you might have a usage message that looks like this:

ls [OPTION] ... [FILE] ...

This tells us that no options are required, but if you want, you can use several. It also tells us that no file name is required, but if you want, you can list several.
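For example, all of the following invocations match that usage line:

```shell
ls                    # no options, no files
ls -l                 # one option, no files
ls -l -t /etc /tmp    # several options and several directories
```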

Words that appear in angle brackets (< >), or possibly in italics in printed form, indicate that the word is a placeholder, as in the example below:

man <command>

Many commands require that an option appear immediately after the command and before any arguments. Others have options and arguments interspersed. Again, look at the man-page for the specifics of a particular command.

Often, you just need a quick reminder as to what the available options are and what their syntax is. Rather than going through the hassle of calling up the man-page, a quick way is to get the command to give you a usage message. As its name implies, a usage message reports the usage of a particular command. I normally use -? as the option to force the usage message, as I cannot think of a command where -? is a valid option. Your system may also support the --help (two dashes) option. More recent versions of the various commands will typically give you a usage message if you use the wrong option. Note that fewer and fewer commands support the -?.

To make things easier, the letter used for a particular option is often related to the function it serves. For example, the -a option to ls says to list "all" files, even those that are "hidden". On older versions of both Linux and UNIX, options typically consisted of a single letter, using both upper and lowercase. Although this meant you could have 52 different options, it made remembering them difficult if there were multiple functions that all began with the same letter. Multiple options can either be given separately, each preceded by a dash, or combined. For example, both of these commands are valid and have the exact same effect:

ls -a -l

ls -al

In both cases, you get a long listing that also includes all of the hidden files.

Newer versions of commands typically allow both single-letter options and "long options" that use full words. For example, the long equivalent of -a is --all. Note that long options are preceded by two dashes because -all would otherwise be indistinguishable from -a combined with two -l options.

Although it doesn't happen too often, you might end up in a situation where one of the arguments to your command starts with a dash (-), for example a file name. Since options typically start with a dash, the command cannot figure out that it is an argument and not a long string of options. Let's assume that some application had created a file called "-jim". If I wanted to do a simple listing of the file, I might try this:

ls -jim

However, since ls first tries to figure out what options are being used before it shows you the listing, it thinks that these are all options and gives you the error message:

ls: invalid option -- j
Try `ls --help' for more information.

You can solve this problem with some commands by using two dashes to tell the command that what follows is actually an argument. So to get the listing in the previous example, the command might look like this:

ls -- -jim
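You can watch the whole cycle of creating, listing, and removing such a file with the -- trick (using /tmp here merely as a scratch directory):

```shell
cd /tmp
touch -- -jim    # "--" is also needed just to create the awkward name
ls -- -jim       # lists the file instead of complaining about options
rm -- -jim       # and removes it again
```

The -- convention is supported by most, but not all, commands; it tells the option parser that everything which follows is an operand.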

Shells and Utilities

Most UNIX users are familiar with "the shell"; it is where you input commands and get output on your screen. Often, the only contact users have with the shell is logging in and immediately starting some application. Some administrators, however, have modified the system to the point where users never even see the shell, or in extreme cases, have eliminated the shell completely for the users.

Because the Linux GUI has become so easy to use, it is possible to go for quite a long time without having to input commands at a shell prompt. If your only interaction with the operating system is logging into the GUI and starting applications, most of this site can only serve to satisfy your curiosity. Obviously, if all you ever do is start a graphical application, then understanding the shell is not all that important. However, if you are like most Linux users, understanding the basic workings of the shell will do wonders to improve your ability to use the system to its fullest extent.

Up to this point, we have referred to the shell as an abstract entity. In fact, in most texts, it is usually referred to as simply "the shell", although there are many different shells that you can use, and there is always a program that must be started before you can interact with "the shell". Each has its own characteristics (or even quirks), but all behave in the same general fashion. Because the basic concepts are the same, I will avoid talking about specific shells until later.

In this chapter, we are going to cover the basic aspects of the shell. We'll talk about how to issue commands and how the system responds. Along with that, we'll cover how commands can be made to interact with each other to provide you with the ability to make your own commands. We'll also talk about the different kinds of shells, what each has to offer, and some details of how particular shells behave.

Accessing Disks

For the most part, you need to tell Linux what to do. This gives you a lot of freedom, because it does exactly what you tell it, but people new to Linux often bring a number of preconceptions from Windows. One thing you need to do yourself is tell the system to mount devices like hard disks and CD-ROMs.
Typically Linux sees the CD-ROMs the same way it does hard disks, since they are usually all on the IDE controllers. The device /dev/hda is the master device on the first controller, /dev/hdb is the slave device on the first controller, /dev/hdc is the master on the second controller and /dev/hdd is the slave device on the second controller.

To mount a filesystem/disk you use the mount command. Assuming that your CD-ROM is the master device on the second controller you might mount it like this:

mount DEVICE DIRECTORY

mount /dev/hdc /media/cdrom

Sometimes /media/cdrom does not exist, so you might want to try this.

mount /dev/hdc /mnt

Sometimes the system already knows about the CD-ROM device, so you can leave off one component or the other:

mount /media/cdrom


mount /dev/hdc

You can then cd into /media/cdrom and you are on the CD.

Details on this can be found in the section on hard disks and file systems.

Friday, December 4, 2009

When Things Go Wrong

Until you become very accustomed to using Linux you're likely to make mistakes (which also happens to people who have been working with Linux for a long time). In this section, we'll be talking about some common mistakes and problems that occur when you first start using Linux.

Usually when you make mistakes, the system will let you know in some way. When using the command line, it tells you in the form of error messages. For example, if you try to execute a command that does not exist, the system may report something like this:

bash: some_command: command not found

Such an error might occur if the command exists, but it does not reside in a directory in your search path. You can find more about this in the section on directory paths.

The system may still report an error even if it can execute the command, for example when the command acts on a file that does not exist. The more command displays the contents of a file; if the file you want to look at does not exist, you might get the error:

some_file: No such file or directory

In the first example, the error came from your shell as it tried to execute the command. In the second case, the error came from the more command as it encountered the error when trying to access the file.

In both of these cases, the problem is pretty obvious. In other cases, you are not always sure. Often you include such commands within shell scripts and want to change the flow of the script based on the success or failure of the program. When a command ends, it provides its "exit code" or "return code" in the special variable $?. So after a command fails, running this command will show you its exit code:

echo $?
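A quick demonstration: a failing command sets a non-zero exit code, a successful one sets 0 (the exact non-zero value depends on the command):

```shell
ls /this_does_not_exist    # fails: "No such file or directory"
echo $?                    # prints a non-zero code
ls /tmp > /dev/null        # succeeds
echo $?                    # prints 0
```

Keep in mind that $? is overwritten by every command, including echo itself, so save it in a variable if you need it later.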

Note that it is up to the program to both provide the text message and the return code. Sometimes you end up with a text message that does not make sense (or there is no text at all), so all you get is the return code, which is probably even less understandable. To make a translation between the return code and a text message, check the file /usr/include/asm/errno.h.

You need to be aware that errors on one system (i.e. one Linux distribution) are not necessarily errors on other systems. For example, if you forget the space in this command, some distributions will give you an error: ls-l

However, on SUSE Linux, this will generate the same output as if you had not forgotten the space. This is because ls-l is defined as an alias for the command ls -l. As the name implies, an alias is a way of referring to something by a different name. For details, take a look at the section on aliases.

It has happened before that I have done a directory listing and seen a particular file, but when I tried to remove it, the system told me the file did not exist. The most likely explanation would be that I misspelled the filename, but that wasn't it. What can happen is that a control character ends up becoming part of the filename. This typically happens with the backspace, as it is not defined as the same character on every system. Often the backspace is CTRL-H, but you might create a file on a system with a different backspace key and end up with a literal CTRL-H in the filename. When the system displays the file name, the backspace backs up one character before the display continues. For example, your ls output might show you this file:

jimmo

However, trying to erase it, you get an error message. To see any "non-printable" characters, you can use the -q option to ls. This might show you:

jimmoo?

This says that the file name actually contains two o's and a trailing backspace. Since the backspace erases the last 'o' in the display, you do not see it when the file name is displayed normally.
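You can reproduce this situation safely; the printf escape \b produces the CTRL-H (backspace) character, and the shell's quoting gets you out of trouble again:

```shell
cd /tmp
# Create a file whose name ends in a literal backspace.
touch "$(printf 'jimmoo\b')"
ls -q | grep jim     # with -q the backspace shows up as "?": jimmoo?
# Quoting the same expansion lets us remove the file exactly.
rm -- "$(printf 'jimmoo\b')"
```

On GNU systems, ls -b is another option worth knowing; it shows non-printable characters as escape sequences instead of question marks.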

Sometimes you lose control of programs and they seem to "run away". In other cases, a program may seem to hang and freeze your terminal. Although this can be caused by a bug in the software or a flaky piece of hardware, oftentimes the user has made a mistake he was not even aware of. This can be extremely frustrating for the beginner, since you do not even know how you got yourself into the situation, let alone how to get out.

When I first started learning UNIX (even before Linux was born), I would start programs and quickly see that I needed to stop them. I knew I could stop a program with some combination of the control key and some other letter. In my rush, I would press the control key and many different letters in sequence. On some occasions, the program would simply stop and go no further. On other occasions, the program would appear to stop, but I would later discover that it was still running. What had happened was that I hit a combination that did not stop the program but did something else.

In the first example, where the program would stop and go no further, I had "suspended" the program. In essence, I'd put it to sleep and it would wait for me to tell it to start up again. This is typically done by pressing CTRL-S. This feature can obviously be useful in the proper circumstance, but when it is unexpected and you don't know what you did, it can be very unnerving. To put things right, you resume the command with CTRL-Q.

In the second example, where the program seemed to have disappeared, I had also suspended the program but had at the same time put it in the "background". This special feature of UNIX shells dates from the time before graphical interfaces were common. It was a great waste of time to start a program and then have to wait for it to complete when all you were interested in was the output, which you could simply write to a file. Instead, you put the program in the background, and the shell returned to the prompt, ready for the next command. It's sometimes necessary to do this after a command has already started, which you do by pressing CTRL-Z; this suspends the program and returns you to the prompt. You then issue the bg command, which resumes the previous command in the background. (This is all part of "job control", which is discussed in another section.)
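A short illustration of the pieces involved, with sleep standing in for any long-running command:

```shell
sleep 100 &    # "&" starts the command in the background right away
jobs           # lists background jobs, one of them being "sleep 100 &"
kill %1        # stop it by job number (or bring it back with "fg %1")
```

In an interactive shell, CTRL-Z followed by bg gets a running foreground command to this same state.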

To stop the program, what I actually wanted to do was "interrupt" it. This is typically done with CTRL-C.

What this actually does is to send a signal to the program, in this case an interrupt signal. You can define which signal is sent when you press any given combination of keys. We talk about this in the section on terminal settings.

When you put a command in the background that sends output to the screen, you need to be careful about running other programs in the meantime. What can happen is that the output gets mixed together, making it difficult to see which output belongs to which command.

There have been occasions where I have issued a command and the shell jumps to the next line, then simply displays a greater-than symbol (>). What this often means is that the shell does not think you are done with the command. This typically happens when you are enclosing something on the command line in quotes and forget to close the quotes. For example, if I wanted to search for my name in a file, I would use the grep command. If I were to do it like this:

grep James Mohr filename.txt

I would get an error message saying that the file "Mohr" did not exist.

To issue this command correctly I would have to include my name inside quotes, like this:

grep "James Mohr" filename.txt

However, if I forgot the final quote, the shell would not think the command was done and would perceive the Enter key that I pressed as part of the command. What I would need to do here is interrupt the command, as we discussed previously. Note that this can also happen if you use single quotes. Since the shell does not see any difference between a single quote and an apostrophe, you need to be careful with what you type. For example, if I wanted to print the phrase "I'm Jim", I might be tempted to do it like this:

echo I'm Jim

However, the system does not understand contractions and thinks I have not finished the command.

As we will discuss in the section on pipes and redirection, you can send the output of a command to a file. This is done with the greater than symbol (>). The generic syntax looks like this:

command > filename

This can cause problems if the command you issue expects more arguments than you gave it. For example, say I were searching the contents of a file for occurrences of a particular phrase:

grep phrase > filename

What would happen is that the shell would drop down to the next line and simply wait, forever or until you interrupted the command. The reason is that the grep command can also take its input from the keyboard. It is waiting for you to type in text before it begins searching; if it finds the phrase you are looking for, it writes the matching line into the file. If that's not what you want, the solution here is also to interrupt the command. Alternatively, you can enter the end-of-file character (CTRL-D), which tells grep to stop reading input.
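This behavior is easy to see if you supply the input yourself through a pipe, so that the command finishes on its own:

```shell
cd /tmp
# grep with no file argument reads standard input; here it comes
# from echo rather than the keyboard.
echo "a line containing the phrase" | grep phrase > filename
cat filename    # the matching line was written to the file
rm filename
```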

One thing to keep in mind is that you can put a program in the background even if the shell does not support job control. In that case, it is impossible to bring the command back to the foreground in order to interrupt it, so you need to do something else. As we discussed earlier, Linux provides a tool to display the processes you are currently running: the ps command. Simply typing ps on the command line might give you something like this:

PID TTY TIME CMD
29518 pts/3 00:00:00 bash
30962 pts/3 00:00:00 ps

The PID column in the ps output shows the process identifier of each process.

When run in the foreground, a child process will continue to do its job until it is finished and then report back to its parent when it is done. A little housecleaning is done, and the process disappears from the system. Sometimes, however, the child doesn't end like it is supposed to. One case is when it becomes a "runaway" process. There are a number of causes of runaway processes, but essentially it means that the process is no longer needed but does not disappear from the system.

The result is often that the parent cannot end either. In general, a parent should not end until all of its children are done (though there are cases where this is desired). If such processes continue to run, they take up resources and can even bring the system to a standstill.

In cases where you have "runaway" processes, or any other time a process is running that you need to stop, you can send the process a signal to stop execution if you know its PID. This is done with the kill command, and the syntax is quite simple:

kill <PID>

By default, the kill command sends a termination signal to that process. Unfortunately, there are some cases where a process can ignore that termination signal. However, you can send a much more urgent "kill" signal like this:

kill -9 <PID>

Where "9" is the number of the SIGKILL or kill signal. In general, you should first try to use signal 15 or SIGTERM. This sends a terminate singal and gives the process a chance to end "gracefully". You should also look to see if the process you want to stop has any children.

For details on what other signals can be sent and the behavior in different circumstances look at the kill man-page or simply try kill -l:

1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL
5) SIGTRAP 6) SIGABRT 7) SIGBUS 8) SIGFPE
9) SIGKILL 10) SIGUSR1 11) SIGSEGV 12) SIGUSR2
13) SIGPIPE 14) SIGALRM 15) SIGTERM 17) SIGCHLD
18) SIGCONT 19) SIGSTOP 20) SIGTSTP 21) SIGTTIN
22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO
30) SIGPWR 31) SIGSYS 35) SIGRTMIN 36) SIGRTMIN+1
37) SIGRTMIN+2 38) SIGRTMIN+3 39) SIGRTMIN+4 40) SIGRTMIN+5
41) SIGRTMIN+6 42) SIGRTMIN+7 43) SIGRTMIN+8 44) SIGRTMIN+9
45) SIGRTMIN+10 46) SIGRTMIN+11 47) SIGRTMIN+12 48) SIGRTMIN+13
49) SIGRTMIN+14 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8
57) SIGRTMAX-7 58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4
61) SIGRTMAX-3 62) SIGRTMAX-2 63) SIGRTMAX-1 64) SIGRTMAX

Keep in mind that sending signals to a process is not just to kill a process. In fact, sending signals to processes is a common way for processes to communicate with each other. You can find more details about signals in the section on interprocess communication.

In some circumstances, it is not easy to kill processes by their PID. For example, if something starts dozens of other processes, it is impractical to try to input all of their PIDs. To solve this problem, Linux has the killall command, which takes the command name instead of the PID. You can also use the -i, --interactive option to be asked interactively whether each process should be killed, or the -w, --wait option to wait for all killed processes to die. Note that if a process ignores the signal, or if it is a zombie, then killall may end up waiting forever.

There have been cases where I have frantically tried to stop a runaway program and repeatedly pressed CTRL-C. The result is that the terminal gets into an undefined state in which it does not react properly to input when you press the various keys. For example, pressing the Enter key may not bring you to a new line (which it normally should). If you try executing a command, it's possible the command is not executed properly because the system has not identified the Enter key correctly. You can return your terminal to a "sane" condition by inputting:

stty sane Ctrl-J

The Ctrl-J character is the line feed character and is necessary as the system does not recognize the enter key.

It has happened to me a number of times that the screen saver was activated and it was as if the system had simply frozen. There were no error messages, no keys worked, and the machine did not even respond across the network (telnet, ping, etc.). Unfortunately, the only thing to do in this case is to turn the computer off and then on again.

On the other hand, you can prevent these problems in advance. The most likely cause is that Advanced Power Management (APM) is having problems. In this case, you should disable APM within the system BIOS. Some machines also have something called "hardware monitoring". This can cause problems as well and should be disabled.

Problems can also be caused by the Advanced Programmable Interrupt Controller (APIC). This can be deactivated by changing the boot string used by either LILO or GRUB; adding "noapic" to your boot line disables it.

All the pictures and material on this blog are assumed to be taken from public domain. The copyright (if any) of these pictures and articles belongs to their orginal publisher / photographer / copyright holder as the case may be. We claim no ownership to them. If anybody has reservations / objection on the use of these material/images or find any copy-righted material on this site, then please e-mail us with the details of copy right etc. In case, the objections is found to be appropriate, the offensive material / pictures will be removed from this site immediately.