LINUX 4 FRESHER: Interpreting the Command

When you input a command-line, the shell needs to be able intepret it correctly in order to know what exactly to do. Maybe you have multiple options or redirect the output to a file. In any event the shell goes through several steps to figure out that needs to be done.

One question I had was, "In what order does everything get done?" We have shell variables to expand, maybe an alias or function to process, "real" commands, pipes and input/output redirection. There are a lot of things that the shell must consider when figuring out what to do and when.

For the most part, this is not very important. Commands do not get so complex that knowing the evaluation order becomes an issue. However, on a few occasions I have run into situations in which things did not behave as I thought they should. By evaluating the command myself (as the shell would), it became clear what was happening. Let's take a look.

The first thing that gets done is that the shell figures out how many commands there are on the line. (Remember, you can separate multiple commands on a single line with a semicolon.) This process determines how many tokens there are on the command line. In this context, a token could be an entire command or it could be a control word such as "if." Here, too, the shell must deal with input/output redirection and pipes.

Once the shell determines how many tokens there are, it checks the syntax of each token. Should there be a syntax error, the shell will not try to start any of the commands. If the syntax is correct, it begins interpreting the tokens.

First, any alias you might have is expanded. Aliases are a way for some shells to allow you to define your own commands. If any token on the command line is actually an alias that you have defined, it is expanded before the shell proceeds. If it happens that an alias contains another alias, they are both expanded before continuing with the next step.

The next thing the shell checks for is functions. Like the functions in programming languages such as C, a shell function can be thought of as a small subprogram. Check the other sections for details on aliases and functions.

Once aliases and functions have all been completely expanded, the shell evaluates variables. Finally, it uses any wildcards to expand them to file names. This is done according to the rules we talked about previously.

After the shell has evaluated everything, it is still not ready to run the command. It first checks to see if the first token represents a command built into the shell or an external one. If it's not internal, the shell needs to go through the search path.

At this point, it sets up the redirection, including the pipes. These obviously must be ready before the command starts because the command may be getting its input from someplace other than the keyboard and may be sending it somewhere other than the screen. The figure below shows how the evaluation looks graphically.

Image - Steps in interpreting command line input. (interactive)

This is an oversimplification. Things happen in this order, though many more things occur in and around the steps than I have listed here. What I am attempting to describe is the general process that occurs when the shell is trying to interpret your command.

Once the shell has determined what each command is and each command is an executable binary program (not a shell script), the shell makes a copy of itself using the fork() system call. This copy is a child process of the shell. The copy then uses the exec() system call to overwrite itself with the binary it wants to execute. Keep in mind that even though the child process is executing, the original shell is still in memory, waiting for the child to complete (assuming the command was not started in the background with &).

If the program that needs to be executed is a shell script, the program that is created with fork() and exec() is another shell. This new shell starts reading the shell script and interprets it, one line at a time. This is why a syntax error in a shell script is not discovered when the script is started, but rather when the erroneous line is first encountered.

Understanding that a new process is created when you run a shell script helps to explain a very common misconception under UNIX. When you run a shell script and that script changes directories, your original shell knows nothing about the change. This confuses a lot of people who are new to UNIX as they come from the DOS world, where changing the directory from within a batch file does change the original shell. This is because DOS does not have the same concept of a process as UNIX does.

Look at it this way: The sub-shell's environment has been changed because the current directory is different. However, this is not passed back to the parent. Like "real" parent-child relationships, only the children can inherit characteristics from their parent, not the other way around. Therefore, any changes to the environment, including directory changes, are not noticed by the parent. Again, this is different from the behavior of DOS .bat files.

You can get around this by either using aliases or shell functions (assuming that your shell has them). Another way is to use the dot command in front of the shell script you want to execute. For example:

. myscript

<--NOTICE THE DOT!

This script will be interpreted directly by the current shell, without forking a sub-shell. If the script makes changes to the environment, it is this shell's environment that is changed.

You can use this same functionality if you ever need to reset your environment. Normally, your environment is defined by the start-up files in your home directory. On occasion, things get a little confused (maybe a variable is changed or removed) and you need to reset things. You can you the dot command to do so. For example, with either sh or ksh, you can write it like this:

. $HOME/.profile

<--NOTICE THE DOT!

Or, using a function of bash you can also write

. ~/.profile

<--NOTICE THE DOT!

This uses the tilde (~), which I haven't mentioned yet. Under many shells, you can use the tilde as a shortcut to refer to a particular users home directory.

If you have csh, the command is issued like this:

source $HOME/.login

<--NOTICE THE DOT!

Some shells keep track of your last directory in the OLDPWD environment variable. Whenever you change directories, the system saves your current directory in OLDPWD before it changes you to the new location.

You can use this by simply entering cd $OLDPWD. Because the variable $OLDPWD is expanded before the cd command is executed, you end up back in your previous directory. Although this has more characters than just popd, it's easier because the system keeps track of my position, current and previous, for you. Also, because it's a variable, I can access it in the same way that I can access other environment variables.

For example, if there were a file in your old directory that you wanted to move to your current one, you could do this by entering:

cp $OLDPWD/ ./

However, things are not as difficult as they seem. Typing in cd $OLDPWD is still a bit cumbersome. It is a lot less characters to type in popd -like in the csh. Why isn't there something like that in the ksh or bash? There is. In fact, it's much simpler. When I first found out about it, the adjective that first came to mind was "sweet." To change directories to your previous directory, simply type "cd -".

LINUX 4 FRESHER

Sunday, December 6, 2009

Interpreting the Command

No comments:

Post a Comment

Followers

Blog Archive