Sunday, December 6, 2009

Interpreting the Command

When you input a command-line, the shell needs to be able intepret it correctly in order to know what exactly to do. Maybe you have multiple options or redirect the output to a file. In any event the shell goes through several steps to figure out that needs to be done.

One question I had was, "In what order does everything get done?" We have shell variables to expand, maybe an alias or function to process, "real" commands, pipes and input/output redirection. There are a lot of things that the shell must consider when figuring out what to do and when.

For the most part, this is not very important. Commands do not get so complex that knowing the evaluation order becomes an issue. However, on a few occasions I have run into situations in which things did not behave as I thought they should. By evaluating the command myself (as the shell would), it became clear what was happening. Let's take a look.

The first thing that gets done is that the shell figures out how many commands there are on the line. (Remember, you can separate multiple commands on a single line with a semicolon.) This process determines how many tokens there are on the command line. In this context, a token could be an entire command or it could be a control word such as "if." Here, too, the shell must deal with input/output redirection and pipes.

Once the shell determines how many tokens there are, it checks the syntax of each token. Should there be a syntax error, the shell will not try to start any of the commands. If the syntax is correct, it begins interpreting the tokens.

First, any alias you might have is expanded. Aliases are a way for some shells to allow you to define your own commands. If any token on the command line is actually an alias that you have defined, it is expanded before the shell proceeds. If it happens that an alias contains another alias, they are both expanded before continuing with the next step.

The next thing the shell checks for is functions. Like the functions in programming languages such as C, a shell function can be thought of as a small subprogram. Check the other sections for details on aliases and functions.

Once aliases and functions have all been completely expanded, the shell evaluates variables. Finally, it uses any wildcards to expand them to file names. This is done according to the rules we talked about previously.

After the shell has evaluated everything, it is still not ready to run the command. It first checks to see if the first token represents a command built into the shell or an external one. If it's not internal, the shell needs to go through the search path.

At this point, it sets up the redirection, including the pipes. These obviously must be ready before the command starts because the command may be getting its input from someplace other than the keyboard and may be sending it somewhere other than the screen. The figure below shows how the evaluation looks graphically.


Image - Steps in interpreting command line input. (interactive)

This is an oversimplification. Things happen in this order, though many more things occur in and around the steps than I have listed here. What I am attempting to describe is the general process that occurs when the shell is trying to interpret your command.

Once the shell has determined what each command is and each command is an executable binary program (not a shell script), the shell makes a copy of itself using the fork() system call. This copy is a child process of the shell. The copy then uses the exec() system call to overwrite itself with the binary it wants to execute. Keep in mind that even though the child process is executing, the original shell is still in memory, waiting for the child to complete (assuming the command was not started in the background with &).

If the program that needs to be executed is a shell script, the program that is created with fork() and exec() is another shell. This new shell starts reading the shell script and interprets it, one line at a time. This is why a syntax error in a shell script is not discovered when the script is started, but rather when the erroneous line is first encountered.

Understanding that a new process is created when you run a shell script helps to explain a very common misconception under UNIX. When you run a shell script and that script changes directories, your original shell knows nothing about the change. This confuses a lot of people who are new to UNIX as they come from the DOS world, where changing the directory from within a batch file does change the original shell. This is because DOS does not have the same concept of a process as UNIX does.

Look at it this way: The sub-shell's environment has been changed because the current directory is different. However, this is not passed back to the parent. Like "real" parent-child relationships, only the children can inherit characteristics from their parent, not the other way around. Therefore, any changes to the environment, including directory changes, are not noticed by the parent. Again, this is different from the behavior of DOS .bat files.

You can get around this by either using aliases or shell functions (assuming that your shell has them). Another way is to use the dot command in front of the shell script you want to execute. For example:

. myscript

<--NOTICE THE DOT!

This script will be interpreted directly by the current shell, without forking a sub-shell. If the script makes changes to the environment, it is this shell's environment that is changed.

You can use this same functionality if you ever need to reset your environment. Normally, your environment is defined by the start-up files in your home directory. On occasion, things get a little confused (maybe a variable is changed or removed) and you need to reset things. You can you the dot command to do so. For example, with either sh or ksh, you can write it like this:

. $HOME/.profile

<--NOTICE THE DOT!

Or, using a function of bash you can also write

. ~/.profile

<--NOTICE THE DOT!

This uses the tilde (~), which I haven't mentioned yet. Under many shells, you can use the tilde as a shortcut to refer to a particular users home directory.

If you have csh, the command is issued like this:

source $HOME/.login

<--NOTICE THE DOT!

Some shells keep track of your last directory in the OLDPWD environment variable. Whenever you change directories, the system saves your current directory in OLDPWD before it changes you to the new location.

You can use this by simply entering cd $OLDPWD. Because the variable $OLDPWD is expanded before the cd command is executed, you end up back in your previous directory. Although this has more characters than just popd, it's easier because the system keeps track of my position, current and previous, for you. Also, because it's a variable, I can access it in the same way that I can access other environment variables.

For example, if there were a file in your old directory that you wanted to move to your current one, you could do this by entering:

cp $OLDPWD/ ./

However, things are not as difficult as they seem. Typing in cd $OLDPWD is still a bit cumbersome. It is a lot less characters to type in popd -like in the csh. Why isn't there something like that in the ksh or bash? There is. In fact, it's much simpler. When I first found out about it, the adjective that first came to mind was "sweet." To change directories to your previous directory, simply type "cd -".

No comments:

Post a Comment

Privacy policy

At http://linux4fresher.blogspot.com/, the privacy of our visitors is of extreme importance to us. This privacy policy document outlines the types of personal information is received and collected by http://linux4fresher.blogspot.com/ and how it is used.

Log Files
Like many other Web sites, http://linux4fresher.blogspot.com/ makes use of log files. The information inside the log files includes internet protocol ( IP ) addresses, type of browser, Internet Service Provider ( ISP ), date/time stamp, referring/exit pages, and number of clicks to analyze trends, administer the site, track user’s movement around the site, and gather demographic information. IP addresses, and other such information are not linked to any information that is personally identifiable.

Cookies and Web Beacons
http://linux4fresher.blogspot.com/ does use cookies to store information about visitors preferences, record user-specific information on which pages the user access or visit, customize Web page content based on visitors browser type or other information that the visitor sends via their browser.

DoubleClick DART Cookie
Google, as a third party vendor, uses cookies to serve ads on http://linux4fresher.blogspot.com/.
Google's use of the DART cookie enables it to serve ads to users based on their visit to http://linux4fresher.blogspot.com/ and other sites on the Internet.
Users may opt out of the use of the DART cookie by visiting the Google ad and content network privacy policy at the following URL - http://www.google.com/privacy_ads.html

Some of our advertising partners may use cookies and web beacons on our site. Our advertising partners include ....
Google Adsense


These third-party ad servers or ad networks use technology to the advertisements and links that appear on http://linux4fresher.blogspot.com/ send directly to your browsers. They automatically receive your IP address when this occurs. Other technologies ( such as cookies, JavaScript, or Web Beacons ) may also be used by the third-party ad networks to measure the effectiveness of their advertisements and / or to personalize the advertising content that you see.

http://linux4fresher.blogspot.com/ has no access to or control over these cookies that are used by third-party advertisers.

You should consult the respective privacy policies of these third-party ad servers for more detailed information on their practices as well as for instructions about how to opt-out of certain practices. http://linux4fresher.blogspot.com/'s privacy policy does not apply to, and we cannot control the activities of, such other advertisers or web sites.

If you wish to disable cookies, you may do so through your individual browser options. More detailed information about cookie management with specific web browsers can be found at the browsers' respective websites.

Disclaimer :
All the pictures and material on this blog are assumed to be taken from public domain. The copyright (if any) of these pictures and articles belongs to their orginal publisher / photographer / copyright holder as the case may be. We claim no ownership to them. If anybody has reservations / objection on the use of these material/images or find any copy-righted material on this site, then please e-mail us with the details of copy right etc. In case, the objections is found to be appropriate, the offensive material / pictures will be removed from this site immediately.