Using the terminal, we can send simple text commands to our computer to do things like navigate through directories or copy a file. The terminal is a program that opens a graphical window and lets you interact with the shell. The shell is a user interface for accessing an operating system’s services, and most often the user interacts with it through a command-line interface (CLI). The shell is often described as the outermost layer of an operating system, in direct contrast to the kernel (the innermost layer). The shell is the part we humans can interact with.
A shell does three main things:
- Initialize: In this step, a typical shell would read and execute its configuration files. These change aspects of the shell’s behavior.
- Interpret: Next, the shell reads commands from standard input (typically what you type at the prompt) and executes them.
- Terminate: After its commands are executed, the shell executes any shutdown commands, frees up any memory, and terminates.
Most of the wizardry takes place in the Interpret part, which is implemented as an “infinite” loop. The loop does three things:
- Read: Read the command from standard input.
- Parse: Separate the command string into a program and arguments.
- Execute: Run the parsed command.
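The Read, Parse, Execute loop can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not how any particular shell is written: shell_loop, execute_fn and echo_command are hypothetical names, the 64-slot argument array is an arbitrary limit, and the Execute step is reduced to a callback that only prints what it would run.

```c
#define _XOPEN_SOURCE 700 /* expose getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: receives the NULL-terminated argument array. */
typedef void (*execute_fn)(char **argv);

/* Hypothetical stand-in for the Execute step: just report the command. */
void echo_command(char **argv)
{
    printf("would run: %s\n", argv[0]);
}

/* Read lines from `input` until end of file.
 * Returns the number of non-empty commands passed to `execute`. */
int shell_loop(FILE *input, execute_fn execute)
{
    char *line = NULL;
    size_t capacity = 0;
    int handled = 0;

    while (getline(&line, &capacity, input) != -1) { /* Read */
        char *argv[64];
        int argc = 0;
        char *tok = strtok(line, " \t\n");

        while (tok != NULL && argc < 63) {           /* Parse */
            argv[argc++] = tok;
            tok = strtok(NULL, " \t\n");
        }
        argv[argc] = NULL;

        if (argc == 0)
            continue;                                /* blank line */

        execute(argv);                               /* Execute */
        handled++;
    }

    free(line);
    return handled;
}
```

Calling shell_loop(stdin, echo_command) would read commands typed at the keyboard until end of input (Ctrl-D).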
One of the most frequently used shell commands is ls, which lists the files and directories in a directory. Let's look at its syntax before breaking it down.
By default, when we run the ls command without any arguments, it prints the files and directories in the current working directory, leaving out hidden entries (names that begin with a dot).
$ ls
Just type ls and press the Enter key. The directory's contents will be shown, as seen in the image below.
By default, the ls command lists files and directories sorted by name, in alphabetical or numeric order. Pretty straightforward so far, but there is a lot happening behind the scenes. Let's see another example, one that takes an argument [file name]:
$ ls *.c
To list all files with a specific extension in a directory, pass the asterisk followed by the file extension as an argument to the ls command, as shown above.
The -l (long listing format) option
By default, the ls command displays only the names of the files and directories. To get more information, use the -l option to display the output in long listing format:
$ ls -l *.c
With the -l option (long listing format), you will see file information like the following:
-rwxr--r-- 1 root root 241 Jul 7 07:31 0-puts_recursion.c
In the above example:
- File type (-): The first character shows the file type; a hyphen means a regular file.
- File permissions (rwxr--r--): The user can read, write and execute the file, while the group and others can only read it.
- Number of links (1): The number of hard links to this file.
- Owner and group (root root): These two fields specify the owner and the group of the file.
- Size (241): The size of the file in bytes.
- Last modified date and time (Jul 7 07:31): The last modification date and time of the file.
- File name (0-puts_recursion.c): The last field shows the name of the file.
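To make the first two fields concrete, here is a small C sketch that turns a file's mode bits into the ten-character string shown in the listing. It is only an illustration: mode_to_string is a hypothetical helper, not the code ls actually uses, and it handles only regular files, directories and symbolic links.

```c
#define _XOPEN_SOURCE 700 /* expose the POSIX file-type macros */
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: render mode bits the way a long listing shows them.
 * `out` must have room for 11 bytes (10 characters plus the '\0'). */
void mode_to_string(mode_t mode, char out[11])
{
    strcpy(out, "----------");

    /* First character: file type (only the common types are handled). */
    if (S_ISDIR(mode))
        out[0] = 'd';
    else if (S_ISLNK(mode))
        out[0] = 'l';

    /* Next nine characters: permissions for user, group and others. */
    if (mode & S_IRUSR) out[1] = 'r';
    if (mode & S_IWUSR) out[2] = 'w';
    if (mode & S_IXUSR) out[3] = 'x';
    if (mode & S_IRGRP) out[4] = 'r';
    if (mode & S_IWGRP) out[5] = 'w';
    if (mode & S_IXGRP) out[6] = 'x';
    if (mode & S_IROTH) out[7] = 'r';
    if (mode & S_IWOTH) out[8] = 'w';
    if (mode & S_IXOTH) out[9] = 'x';
}
```

In a real program the mode would come from the st_mode field filled in by stat() or lstat(); a regular file with permissions 0744 gives the -rwxr--r-- string from the example.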
I hope you now have an understanding of the ls -l Linux command. Now let's take a look behind the scenes.
The * (star/asterisk) expansion
The asterisk or star wildcard (*) matches any sequence of characters, including none. When it is placed before the '.c' extension, as in our case, the pattern matches only file names that end with '.c', that is, C files. The shell itself performs this expansion: it replaces *.c with the list of matching file names before ls runs, so ls receives those names as its arguments.
Notice how in the previous example, only the files ending with .c are displayed.
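The matching rule itself can be tried out with the POSIX fnmatch function. The sketch below is an assumption-laden illustration rather than what a shell really does: filter_names is a hypothetical helper that works on a list of names already in memory, whereas a real shell also has to read the directory and sort the results (the glob function does both).

```c
#define _XOPEN_SOURCE 700
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: copy into `out` the names that match `pattern`.
 * `out` must have room for `count` pointers.
 * Returns the number of matching names. */
size_t filter_names(const char *pattern, const char *names[],
                    size_t count, const char *out[])
{
    size_t matched = 0;

    for (size_t i = 0; i < count; i++) {
        /* FNM_PERIOD: a leading '.' in a name must be matched explicitly,
         * so "*.c" does not pick up hidden files. */
        if (fnmatch(pattern, names[i], FNM_PERIOD) == 0)
            out[matched++] = names[i];
    }
    return matched;
}
```

With the pattern "*.c", a name such as 0-puts_recursion.c is kept, while notes.txt and .hidden.c are not.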
Break the command into tokens
We enter the command as a single string, but to execute it the shell needs the different parts of the line separated. So the shell tokenizes it: it splits the line into words and puts them into an array of strings.
One can implement this behaviour with functions such as getline, to read the line, and strtok, to split it.
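A minimal sketch of the splitting step with strtok might look like this. split_line is a hypothetical name, and the sketch assumes that words are separated only by spaces, tabs and newlines; quoting and escaping, which a real shell has to handle, are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on spaces, tabs and newlines.
 * Stores at most max_tokens - 1 pointers in `tokens`, followed by a NULL
 * terminator, and returns the number of tokens stored. */
size_t split_line(char *line, char *tokens[], size_t max_tokens)
{
    size_t count = 0;
    char *tok;

    if (max_tokens == 0)
        return 0;

    tok = strtok(line, " \t\n");
    while (tok != NULL && count + 1 < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    tokens[count] = NULL; /* same convention as the argv array */
    return count;
}
```

Note that strtok writes '\0' bytes into the line, so the tokens point into the original buffer, which must stay alive while they are in use.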
Check for aliases
Bash aliases allow you to set a shortcut command for a longer command. A bash alias has the following structure:
alias [alias_name]="[command_to_alias]"
The definition always starts on a new line with the alias keyword, followed by the alias name you want to use as the shortcut, an equal sign, and the command to run. Under the hood, when the entered command is an alias, the shell substitutes the actual command, with its arguments, before executing it.
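One simple way to picture the alias check is a table lookup on the first word of the command. The sketch below is an assumption, not bash's implementation: struct alias and lookup_alias are hypothetical names, and the table is a plain array searched linearly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table entry: shortcut name and the command it stands for. */
struct alias {
    const char *name;
    const char *value;
};

/* Return the command aliased by `word`, or NULL if `word` is not an alias. */
const char *lookup_alias(const struct alias *table, size_t count,
                         const char *word)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, word) == 0)
            return table[i].value;
    }
    return NULL;
}
```

If the table held an entry created by alias ll="ls -l", looking up ll would return "ls -l", and the shell would then tokenize that text in place of the original word.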
Check builtins
First, let's clarify what builtins are. Builtins are commands implemented inside the shell itself. They are executed directly within the shell's own process, without starting a new one, which makes them faster than external programs. If the shell determines that the user input is not a builtin, by comparing the command with an array of builtin names, it moves on to the next step: searching the PATH.
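The comparison against an array of builtin names could look like the sketch below. The names builtin_names and is_builtin are hypothetical, and the list is only a small illustrative subset of what a real shell provides.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative subset of builtin names; the NULL marks the end of the array. */
static const char *const builtin_names[] = {"cd", "exit", "alias", NULL};

/* Return 1 if `command` is in the builtin list, 0 otherwise. */
int is_builtin(const char *command)
{
    for (size_t i = 0; builtin_names[i] != NULL; i++) {
        if (strcmp(command, builtin_names[i]) == 0)
            return 1;
    }
    return 0;
}
```

For ls the function returns 0, so the shell goes on to look for an external program.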
Find the command in the PATH
The shell copies the environment, which is an array of strings (in C, a double pointer). One of these strings is the PATH: a line starting with “PATH=” followed by the directories where executable files are located, separated by colons. The “PATH=” prefix has to be removed, and the rest is then parsed, or tokenized, into another array of strings with one candidate directory each. The shell then concatenates each directory with the command name and checks the access permissions of the resulting file. Here it becomes clear that a command is actually the name of an executable file, that is, of a program to be executed. This check is repeated until a file matching the name with the right permissions is found or the array is exhausted, which is marked by a NULL pointer.
In our ls case, the shell looks for a program file named ls in the possible directories stored in the PATH.
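The search can be sketched in C as follows. This is a simplified illustration: find_in_path is a hypothetical helper that takes the PATH value (with the “PATH=” prefix already removed) as a parameter, and because it splits with strtok it skips empty entries, which a real shell treats as the current directory.

```c
#define _XOPEN_SOURCE 700 /* expose strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: look for `command` in each directory of `path_value`
 * (a colon-separated list). Returns 1 and leaves the full path in `out`
 * if an executable regular file is found, otherwise returns 0
 * (the contents of `out` are then not meaningful). */
int find_in_path(const char *command, const char *path_value,
                 char *out, size_t out_size)
{
    char *copy = strdup(path_value); /* strtok modifies its input */
    int found = 0;

    if (copy == NULL)
        return 0;

    for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        struct stat st;
        int len = snprintf(out, out_size, "%s/%s", dir, command);

        if (len < 0 || (size_t)len >= out_size)
            continue; /* candidate path does not fit in the buffer */

        /* Must exist, be a regular file, and be executable by us. */
        if (stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0) {
            found = 1;
            break;
        }
    }

    free(copy);
    return found;
}
```

A caller would typically pass getenv("PATH") as path_value; on many systems the search for ls ends at /bin/ls or /usr/bin/ls.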
Call the program ls with all the file names ending with .c as parameters
Calling, or executing, a program involves the following syscalls:
fork/execve/wait
fork creates a child process. When this function is called, the operating system duplicates the currently running process, and both copies run concurrently. To tell them apart, the original process is called the “parent”, and the new one is called the “child”. fork() returns 0 to the child process, and it returns to the parent the process ID number (PID) of its child. So in practice, the only way for a new process to start is for an existing one to duplicate itself.
execve does the actual execution of the program. Normally one does not want to run two copies of the same program, so the child calls one of the exec family of functions, which replaces the program currently running in the calling process (here, the child's copy of the shell) with an entirely new one (here, ls). This means that when you call exec, the operating system stops running your program, loads the new program, and starts it in the same process. A process never returns from an exec() call (unless there’s an error).
wait suspends the parent process until the child process, which is executing the program, finishes.
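Put together, the three calls could be combined as in the sketch below. It is a minimal illustration, not the code of any real shell: run_command is a hypothetical helper, it uses waitpid (a variant of wait that waits for one specific child), and the exit code 127 for a failed execve follows a common shell convention.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` with the given arguments
 * and environment. Returns the child's exit status, or -1 if fork or
 * waitpid failed or the child did not exit normally. */
int run_command(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;               /* fork failed, no child was created */

    if (pid == 0) {
        /* Child: replace this copy of the shell with the new program. */
        execve(path, argv, envp);
        _exit(127);              /* only reached if execve failed */
    }

    /* Parent: block until this particular child has finished. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For our example, path would be the full path found in the PATH step, and argv would hold "ls", "-l", and the file names produced by expanding *.c, followed by a NULL pointer.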
As a side note, builtins do not go through this process; they are executed within the parent process. For example, a call to exit within a child process would only exit the child, but we need it to exit the actual shell, which is why it has to be called from the parent process.
When ls is done, print the prompt
The prompt is just the character or characters that signal that the shell is ready to take a new command. It is printed to stdout, also known as standard output, which is the default stream where a process can write output. stdout is defined by the POSIX standard, and its file descriptor number is 1.
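Printing the prompt can be as small as one write call on that file descriptor. In this sketch print_prompt is a hypothetical helper, and the descriptor is passed as a parameter only so that the function is easy to try out; a shell would pass STDOUT_FILENO, which is the constant 1.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `prompt` to the file descriptor `fd`.
 * Returns the number of bytes written, or -1 on error. */
ssize_t print_prompt(int fd, const char *prompt)
{
    return write(fd, prompt, strlen(prompt));
}
```

Calling print_prompt(STDOUT_FILENO, "$ ") shows the familiar dollar sign and a space, after which the shell goes back to reading the next command.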