Chaining commands together
Administration
- I haven’t heard from anyone that this time doesn’t work, so I am going to move forward with this time slot.
- The questions will be similar to the homework and attendance questions.
- In general I would recommend doing it in the class docker container to ensure compatibility.
- I will be in my Zoom room for the duration of the exam period.
Chaining commands together
Unix philosophy
- Make each program do one thing well.
- Expect the output of every program to become the input to another, as yet unknown, program.
— Doug McIlroy
An added note: a lot of the semantics in this lecture will change depending on the shell you are using (sh, bash, zsh, dash, fish), but the concepts should remain the same.
Piping (`|`)
We’ve already talked about this a bit in the class, and even used it in some assignments; piping is the most basic form of connecting two commands together.
Generally, when a command runs, it prints its output to your terminal. This occurs because the command’s `stdout` is connected to the terminal by default. This doesn’t have to be the case though. With a pipe, we can send the `stdout` of one command to the `stdin` of the following command (and the `stderr` too, with `|&`, which we will see later).
A really common use that we have seen again and again in this class is piping into `grep`.
$ cat file.txt | grep word
Nothing changes from the perspective of the `cat` command; all it knows is that it is writing data to its `stdout`. This redirection occurs at the shell level: the OS provides the mechanisms to make the redirection possible, but the shell is what connects one command’s `stdout` to the next command’s `stdin`, which is all opaque to the commands themselves.
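One detail worth verifying for yourself: a plain `|` only carries `stdout`, not `stderr`. A quick sketch (the exact error wording depends on your `ls` implementation):
$ ls /nonexistent | wc -l
ls: cannot access '/nonexistent': No such file or directory
0
The error message went straight to the terminal because it was written to `stderr`, while `wc` received nothing on its `stdin` and counted 0 lines.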
Data Streams: `stdin`, `stdout`, and `stderr`
There are 3 primary data streams for any terminal command: `stdin`, `stdout`, and `stderr`.
- `stdin` is how some commands receive their input data. If a command is able to receive its information via a pipe (`|`), then it reads from `stdin`.
- `stdout` is the channel on which most commands output their information. It is usually connected to your current terminal window, which is why you are able to see it when running a command.
- `stderr` is like `stdout` in that it is an output channel of information, but it is usually only written to when an error occurs with the command that was run. It also tends to write to the terminal.
Each of these data streams has a corresponding file in the `/dev/` directory:
- `/dev/stdin`
- `/dev/stdout`
- `/dev/stderr`
which can be used to manually manipulate the data streams.
In the `/dev/` directory there are also device files for each of the `tty`s in use by the different terminal sessions you have open, as well as other useful files such as `/dev/null`, which you can think of as a black hole: any data written to it is discarded.
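A quick sketch of both ideas: `tty` prints which `/dev/` device file your current terminal session is connected to, and redirecting into `/dev/null` makes output vanish (the exact device path will differ on your machine):
$ tty
/dev/pts/0
$ echo "goodbye" > /dev/null    # nothing prints anywhere; the data is discarded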
redirection
basic redirection (`>` and `>>`)
The most common form of redirection is `>`. This is the basic redirection that will send `stdout` (but not `stderr`) to a file of your choosing. You can view this with the following command; notice that the `stdout` gets put in the file but `stderr` still prints to the terminal.
$ curl https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt > only-stdin.txt
You can see the text about download progress is not in the file, but the text from the file is. This is because the download progress text is written to `stderr`, and therefore was not redirected.
`>>` is exactly the same as `>` except it appends to the file if it exists, rather than overwriting it.
$ echo "hi" > file.txt          # file.txt contains the text "hi"
$ echo "replace" > file.txt     # file.txt contains the text "replace", and nothing else
$ echo "addition" >> file.txt   # file.txt contains the text "replace" and "addition"
`stderr` redirection
Referring back to the previous example, sometimes you don’t want the `stderr` output cluttering your terminal. In cases like that, you can specifically redirect the `stderr` output somewhere else. Notice that now the download progress information is gone:
$ curl https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt 2> /dev/null > only-stdin.txt
$ curl -s https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt > only-stdin.txt    # curl's -s flag also helps with this
You can redirect `stdout`, `stderr`, or both, with the following syntax:
- `1>` (`>` shorthand) - redirect stdout
- `2>` - redirect stderr
- `&>` - redirect both stdout and stderr
file descriptor redirection
You can also refer to the other standard file descriptors. For example, you can redirect `stderr` to wherever `stdout` is pointing, without having to know where that is; `stdout` could be redirected to another command’s `stdin`, to a new file, or to `/dev/null`.
- `2>&1` - direct stderr to go to the same location as stdout
- `|&` - pipe both `stdout` and `stderr` to the next command
  - equivalent to `2>&1 |`
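One subtlety with `2>&1`: redirections are processed left to right, so the order matters. A sketch reusing the earlier curl example (`both.txt` is just a placeholder file name):
$ curl https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt > both.txt 2>&1    # both streams land in both.txt
$ curl https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt 2>&1 > both.txt    # stderr still hits the terminal
In the second command, `2>&1` copies `stdout`’s target at that moment (the terminal), before `>` re-points `stdout` at the file.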
input redirection
We just covered output redirection, but there is also input redirection. This can be similar to piping, but has notable differences.
If we wanted to grep the contents of a file, one way would be to do:
$ cat file.txt | grep word
$ grep word file.txt
Let’s focus on the first case. We want to call the `cat` command just to get the data into `grep` - that seems wasteful. Instead, we can use input redirection to pass the data to `grep` on `stdin` without having to call `cat` first.
$ grep word < file.txt
We were able to pass the contents of the file on `stdin` without needing to call `cat` to get it to write to its `stdout` first.
I find this particularly useful when using `gdb` on binaries that require `stdin` input; it allows you to define the data that you want the binary to read on `stdin`, in a file. Then, when the binary is ready to read from `stdin`, there is already data there waiting for it.
$ gdb ./prog
(gdb) run < input_file
$ gdb -ex='run < input_file' --args ./prog # one-liner of the same idea
This method pairs particularly well with a `.gdbinit` file.
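For instance, a minimal `.gdbinit` sketch (the breakpoint location and input file name here are hypothetical) that sets up the same redirected run every time you launch gdb in that directory; note that recent gdb versions may ask you to whitelist local `.gdbinit` files with `add-auto-load-safe-path`:
# .gdbinit - commands run automatically at gdb startup
break main          # stop at main before any input is read
run < input_file    # start the program with stdin fed from input_file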
Here-doc and Here-strings
https://en.wikipedia.org/wiki/Here_document#Overview
The key difference from here documents is that, in here documents, the delimiters are on separate lines; the leading and trailing newlines are stripped. Unlike here documents, here strings do not use delimiters.
$ wc -l <<EOF
> these
> are all
> separate
> lines
> EOF
Here-documents
Work well with chaining
$ wc -l << EOF | xargs seq | xargs mkdir
> this
> is
> a
> multiline
> string
> EOF
# makes directories 1 2 3 4 5
Here-strings
- Similar to here-docs, but there is no “delimiter”, just a quoted string, which starts on the same line.
- Still works perfectly fine with chaining, but is more confusing to read (an unmatched quote leading into the rest of the pipeline).
- Better for small inputs: `bc <<< '2^7'`
  - Note that `bc` only reads from stdin, and does not take command-line arguments; otherwise you would have to write `echo 2^7 | bc` or just open `bc` directly.
$ wc -l <<< "this
> is
> a
> multiline
> string" | xargs seq | xargs mkdir
# makes directories 1 2 3 4 5
xargs
Sometimes, commands don’t read from `stdin`; they only read the values that are passed to them on the command-line. `xargs` solves this problem by reading from `stdin` to get the pipe data, and then repeatedly calling the command that you entered. You can verify that the command is called repeatedly in separate instances with the following command:
$ echo -e "1\n2\n3" | xargs -I {} date "+{} %s%N"
As I hinted in the previous example, `xargs` allows you to specify, with `-I`, a placeholder for where the value from `stdin` will get placed in the following command. This is particularly useful when you need the values to be placed in a specific location in the next command. For example, at a certain place in a format string, or between two flags:
$ echo -e "1\n2\n3" | xargs -I {} date "+{} %s%N"
Some notable commands, like `rm`, don’t read from `stdin` at all, so you need `xargs` to bridge the gap:
$ find ... | rm          # doesn’t work
$ find ... | xargs rm    # passes the data as command-line arguments rather than via stdin
For complex `xargs` commands that require subshells, piping, or other operations that execute with higher precedence, consider using `bash -c` to wrap the command and ensure the `xargs -I{}` string substitution occurs before running any command:
$ seq 2 5 | xargs -I{} bash -c 'printf "%s - %s\n" {} $(dig +short site{}.com | head -n 1)'
Command substitution
There are times where you don’t want to pass in information over `stdin` or via a file, but you do want to include the output of one command inside another. In situations like that, it is convenient to use command substitution.
$ ps -u $(whoami) -U $(whoami) # limits the ps output to processes owned by the current user
This is actually how most shell scripts capture the output of commands into variables:
$ VAR=$(cmd)
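To put that in context, here is a minimal script sketch (the variable names and commands are just for illustration) built around command substitution:
#!/bin/bash
# store command output in variables via $()
me=$(whoami)
count=$(ls | wc -l)
echo "$me has $count entries in the current directory"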
Process substitution
Process substitution is useful when you are using a command that requires files as inputs, but you need to perform some operation on the input files first. You could make a `.tmp` file, perform some operations, and delete it later, but that requires making a file. It would be nice if you could perform some operations on the file before sending it to the command, and still pass it in as a file descriptor.
Consider this example, where you have two files. Assume that file1 and file2 are files with lists of words in them. Because we sort them first, we can get a more reasonable understanding of how different the files actually are and what is missing.
$ diff <(sort file1.txt) <(sort file2.txt)
$ comm <(sort file1.txt) <(sort file2.txt)    # show what’s common between these files rather than what’s different
`<()` returns a file descriptor which points to the contents written to `stdout` by the commands inside.
There is also an equivalent version in the other direction. It allows you to pass `>()` in the spot where a command would be expecting a file name. The shell handles this, generates a fd, and sends the data that would have gone into that file to the `stdin` of the commands inside `>()` instead.
This is a bit of a contrived example, but it shows what it can do and how to use it.
$ head -n 5 file.txt > >(grep "word")
`>` expects a file to write to. Rather than passing in a filename, we pass in `>()`: it stands in the place of where a file would be. Rather than the command writing to that fd, the output that would have been written is piped into the command inside the process substitution, and it gets run.
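For a less contrived pattern, `>()` pairs naturally with `tee`, which writes its input to every file name it is given; with process substitution, those “files” can be running commands (`some-command` and the file names here are placeholders):
$ some-command | tee >(gzip > log.gz) >(grep ERROR > errors.txt) > /dev/null
Here one copy of the output is compressed while another is filtered, and the copy `tee` would normally echo to the terminal is discarded into `/dev/null`.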
Utilizing device files
There are times when some commands only write to a file, and won’t write to stdout; `cp` is a poor example, but it fits the purposes of this demonstration. One way to get around this is to use output process substitution:
$ cp some-file.txt >(cat)
but another way would be to use the `/dev/stdout` file:
$ cp some-file.txt /dev/stdout # this also goes to stdout!
It’s worth noting that this is an incredibly contrived example, and just `cat`ing the file would make much more sense and have the same effect, but it gets my point across of how you might interact with the input and output stream device files.
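The same trick works in the input direction: `/dev/stdin` can stand in where a command insists on a file name (`greeting.txt` here is a placeholder):
$ echo "hello" | diff /dev/stdin greeting.txt    # compare piped data against a file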
Homework
Submit your own set of commands that use all of the following:
- pipes - `|`
- input redirection - `<`
- output redirection - `>`
- `xargs`
- command substitution - `$()`
- process substitution input - `<()`, creating a fd
- process substitution output - `>()`, writing to a command’s stdin rather than a file
You can have these commands do whatever you want. When you submit the command, include a summary of what this command or pipeline of commands does and what problem it solves.
When using `<()`, you can’t do something like `diff <(cat file.txt)` instead of just doing `diff file.txt`, because no meaningful change was made; however, if you use something more substantial than `cat`, that would be fine.
I couldn’t think of an assignment that would force you to use all of these, so the best I can do is encourage you to use them on your own and think of use-cases. I know you can, but try not to use ChatGPT; if everyone uses ChatGPT or the provided examples from the man pages, everyone is going to have the same solutions, which will be super boring to grade.