Chaining commands together
Administration
- I haven’t heard from anyone that this time doesn’t work, so I am going to move forward with this time slot.
- The questions will be similar to the homework and attendance questions.
- In general I would recommend doing it in the class docker container to ensure compatibility.
- I will be in my Zoom room for the duration of the exam period.
Chaining commands together
Unix philosophy
- Make each program do one thing well.
- Expect the output of every program to become the input to another, as yet unknown, program.
— Doug McIlroy
An added note: a lot of the semantics in this lecture will change depending on the shell you are using (sh, bash, zsh, dash, fish), but the concepts should remain the same.
Piping (`|`)
We’ve already talked about this a bit in the class, and even used it in some assignments; piping is the most basic form of connecting two commands together.
Generally, when a command runs, it prints its output to your terminal. This occurs because the command’s `stdout` is connected to the terminal by default. This doesn’t have to be the case though. With a pipe, we can send the `stdout` of one command to the `stdin` of the following command (and the `stderr` too, with `|&`, which we will see later).
A really common use that we have seen again and again in this class is piping into `grep`.
$ cat file.txt | grep word
Nothing changes from the perspective of the `cat` command; all it knows is that it is writing data to its `stdout`. This redirection occurs at the shell level: the OS provides the mechanisms to make the redirection possible, but the shell is what connects one command’s `stdout` to the next command’s `stdin`, which is all opaque to the commands themselves.
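One detail worth verifying for yourself: a plain `|` only carries `stdout`, not `stderr`. A quick sketch (the exact error wording depends on your `ls` implementation):
$ ls /nonexistent | wc -l
ls: cannot access '/nonexistent': No such file or directory
0
The error message went straight to the terminal because it was written to `stderr`, while `wc` received nothing on its `stdin` and counted 0 lines.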
Data Streams: `stdin`, `stdout`, and `stderr`
There are 3 primary data streams for any terminal command: `stdin`, `stdout`, and `stderr`.
- `stdin` is how some commands receive their input data. If a command is able to receive its information via a pipe (`|`), then it reads from `stdin`.
- `stdout` is the channel on which most commands output their information. It is usually connected to your current terminal window, which is why you are able to see it when running a command.
- `stderr` is like `stdout` in that it is an output channel of information, but it is usually only written to when an error occurs with the command that was run. It also tends to write to the terminal.
Each of these data streams has a corresponding file in the `/dev/` directory:
- `/dev/stdin`
- `/dev/stdout`
- `/dev/stderr`
which can be used to manually manipulate the data streams.
In the `/dev/` directory there are also device files for each of the `tty`s in use by the different terminal sessions you have open, as well as other useful files such as `/dev/null`, which you can think of as a black hole: any data written to it is discarded.
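A quick sketch of both ideas: `tty` prints which `/dev/` device file your current terminal session is connected to, and redirecting into `/dev/null` makes output vanish (the exact device path will differ on your machine):
$ tty
/dev/pts/0
$ echo "goodbye" > /dev/null    # nothing prints anywhere; the data is discarded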
redirection
basic redirection (`>` and `>>`)
The most common form of redirection is `>`. This is the basic redirection that will send `stdout` (but not `stderr`) to a file of your choosing. You can view this with the following command; notice that the `stdout` gets put in the file but `stderr` still prints to the terminal.
$ curl https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt > only-stdin.txt
You can see the text about download progress is not in the file, but the text from the file is. This is because the download progress text is written to `stderr`, and therefore was not redirected.
`>>` is exactly the same as `>` except it appends to the file if it exists, rather than overwriting it.
$ echo "hi" > file.txt          # file.txt contains the text "hi"
$ echo "replace" > file.txt     # file.txt contains the text "replace", and nothing else
$ echo "addition" >> file.txt   # file.txt contains the text "replace" and "addition"
`stderr` redirection
Referring back to the previous example, sometimes you don’t want the `stderr` output cluttering your terminal. In cases like that, you can specifically redirect the `stderr` output somewhere else. Notice that now the download progress information is gone:
$ curl https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt 2> /dev/null > only-stdin.txt
$ curl -s https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt > only-stdin.txt    # curl's -s flag also helps with this
You can redirect `stdout`, `stderr`, or both, with the following syntax:
- `1>` (`>` shorthand) - redirect stdout
- `2>` - redirect stderr
- `&>` - redirect both stdout and stderr
file descriptor redirection
You can also refer to the other standard file descriptors. For example, you can redirect `stderr` to wherever `stdout` is pointing, without having to know where that is; `stdout` could be redirected to another command’s `stdin`, to a new file, or to `/dev/null`.
- `2>&1` - direct stderr to go to the same location as stdout
- `|&` - pipe both `stdout` and `stderr` to the next command
  - equivalent to `2>&1 |`
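One subtlety with `2>&1`: redirections are processed left to right, so the order matters. A sketch reusing the earlier curl example (`both.txt` is just a placeholder file name):
$ curl https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt > both.txt 2>&1    # both streams land in both.txt
$ curl https://intro-to-cmdline-tools.jtledon.com/parsetext/jledon.txt 2>&1 > both.txt    # stderr still hits the terminal
In the second command, `2>&1` copies `stdout`’s target at that moment (the terminal), before `>` re-points `stdout` at the file.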
input redirection
We just covered output redirection, but there is also input redirection. This can be similar to piping, but has notable differences.
If we wanted to grep the contents of a file, one way would be to do:
$ cat file.txt | grep word
$ grep word file.txt
Let’s focus on the first case. We want to call the `cat` command just to get the data into `grep` - that seems wasteful. Instead, we can use input redirection to pass the data to `grep` on `stdin` without having to call `cat` first.
$ grep word < file.txt
We were able to pass the contents of the file on `stdin` without needing to call `cat` to get it to write to its `stdout` first.
I find this particularly useful when using `gdb` on binaries that require `stdin` input; it allows you to define the data that you want the binary to read on `stdin`, in a file. Then, when the binary is ready to read from `stdin`, there is already data there waiting for it.
$ gdb ./prog
(gdb) run < input_file
$ gdb -ex='run < input_file' --args ./prog # one-liner of the same idea
This method pairs particularly well with a `.gdbinit` file.
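For instance, a minimal `.gdbinit` sketch (the breakpoint location and input file name here are hypothetical) that sets up the same redirected run every time you launch gdb in that directory; note that recent gdb versions may ask you to whitelist local `.gdbinit` files with `add-auto-load-safe-path`:
# .gdbinit - commands run automatically at gdb startup
break main          # stop at main before any input is read
run < input_file    # start the program with stdin fed from input_file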
Here-doc and Here-strings
https://en.wikipedia.org/wiki/Here_document#Overview
The key difference from here documents is that, in here documents, the delimiters are on separate lines; the leading and trailing newlines are stripped. Unlike here documents, here strings do not use delimiters.
$ wc -l <<EOF
> these
> are all
> separate
> lines
> EOF
Here-documents
Work well with chaining
$ wc -l << EOF | xargs seq | xargs mkdir
> this
> is
> a
> multiline
> string
> EOF
# makes directories 1 2 3 4 5
Here-strings
- Similar to here-docs, but there is no “delimiter”, just a quoted string, which starts on the same line.
- Still works perfectly fine with chaining, but is more confusing to read (an unmatched quote leading into the rest of the pipeline).
- Better for small inputs: `bc <<< '2^7'`
  - Note that `bc` only reads from stdin, and does not take command-line arguments; otherwise you would have to write `echo 2^7 | bc` or just open `bc` directly.
$ wc -l <<< "this
> is
> a
> multiline
> string" | xargs seq | xargs mkdir
# makes directories 1 2 3 4 5
xargs
Sometimes, commands don’t read from `stdin`; they only read the values that are passed to them on the command-line. `xargs` solves this problem by reading from `stdin` to get the pipe data, and then repeatedly calling the command that you entered. You can verify that the command is called repeatedly in separate instances with the following command:
$ echo -e "1\n2\n3" | xargs -I {} date "+{} %s%N"
As I hinted in the previous example, `xargs` allows you to specify, with `-I`, a placeholder for where the value from `stdin` will get placed in the following command. This is particularly useful when you need the values to be placed in a specific location in the next command. For example, at a certain place in a format string, or between two flags:
$ echo -e "1\n2\n3" | xargs -I {} date "+{} %s%N"
Some notable commands, like `rm`, don’t read from `stdin` at all, so you need `xargs` to bridge the gap:
$ find ... | rm          # doesn’t work
$ find ... | xargs rm    # passes the data as command-line arguments rather than via stdin
For complex `xargs` commands that require subshells, piping, or other operations that execute with higher precedence, consider using `bash -c` to wrap the command and ensure the `xargs -I{}` string substitution occurs before running any command:
$ seq 2 5 | xargs -I{} bash -c 'printf "%s - %s\n" {} $(dig +short site{}.com | head -n 1)'
Command substitution
There are times where you don’t want to pass in information over `stdin` or via a file, but you do want to include the output of one command inside another. In situations like that, it is convenient to use command substitution.
$ ps -u $(whoami) -U $(whoami) # limits the ps output to processes owned by the current user
This is actually how most shell scripts capture the output of commands into variables:
$ VAR=$(cmd)
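To put that in context, here is a minimal script sketch (the variable names and commands are just for illustration) built around command substitution:
#!/bin/bash
# store command output in variables via $()
me=$(whoami)
count=$(ls | wc -l)
echo "$me has $count entries in the current directory"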
Process substitution
Process substitution is useful when you are using a command that requires files as inputs, but you need to perform some operation on the input files first. You could make a `.tmp` file, perform some operations, and delete it later, but that requires making a file. It would be nice if you could perform some operations on the file before sending it to the command, and still pass it in as a file descriptor.
Consider this example, where you have two files. Assume that file1 and file2 are files with lists of words in them. Because we sort them first, we can get a more reasonable understanding of how different the files actually are and what is missing.
$ diff <(sort file1.txt) <(sort file2.txt)
$ comm <(sort file1.txt) <(sort file2.txt)    # show what’s common between these files rather than what’s different
`<()` returns a file descriptor which points to the contents written to `stdout` by the commands inside.
There is also an equivalent version in the other direction. It allows you to pass `>()` in the spot where a command would be expecting a file name. The shell handles this, generates a fd, and sends the data that would have gone into that file to the `stdin` of the commands inside `>()` instead.
This is a bit of a contrived example, but it shows what it can do and how to use it.
$ head -n 5 file.txt > >(grep "word")
`>` expects a file to write to. Rather than passing in a filename, we pass in `>()`: it stands in the place of where a file would be. Rather than the command writing to that fd, the output that would have been written is piped into the command inside the process substitution, and it gets run.
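For a less contrived pattern, `>()` pairs naturally with `tee`, which writes its input to every file name it is given; with process substitution, those “files” can be running commands (`some-command` and the file names here are placeholders):
$ some-command | tee >(gzip > log.gz) >(grep ERROR > errors.txt) > /dev/null
Here one copy of the output is compressed while another is filtered, and the copy `tee` would normally echo to the terminal is discarded into `/dev/null`.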
Utilizing device files
There are times when some commands only write to a file, and won’t write to stdout; `cp` is a poor example, but it fits the purposes of this demonstration. One way to get around this is to use output process substitution:
$ cp some-file.txt >(cat)
but another way would be to use the `/dev/stdout` file:
$ cp some-file.txt /dev/stdout # this also goes to stdout!
It’s worth noting that this is an incredibly contrived example, and just `cat`ing the file would make much more sense and have the same effect, but it gets my point across of how you might interact with the input and output stream device files.
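The same trick works in the input direction: `/dev/stdin` can stand in where a command insists on a file name (`greeting.txt` here is a placeholder):
$ echo "hello" | diff /dev/stdin greeting.txt    # compare piped data against a file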
Homework
Submit your own set of commands that use all of the following:
- pipes - `|`
- input redirection - `<`
- output redirection - `>`
- `xargs`
- command substitution - `$()`
- process substitution input - `<()`, creating a fd
- process substitution output - `>()`, writing to a command’s stdin rather than a file
You can have these commands do whatever you want. When you submit the command, include a summary of what this command or pipeline of commands does and what problem it solves.
When using `<()`, you can’t do something like `diff <(cat file.txt)` instead of just doing `diff file.txt`, because no meaningful change was made; however, if you use something more substantial than `cat`, that would be fine.
I couldn’t think of an assignment that would force you to use all of these, so the best I can do is encourage you to use them on your own and think of use-cases. I know you can, but try not to use ChatGPT; if everyone uses ChatGPT or the provided examples from the man pages, everyone is going to have the same solutions, which will be super boring to grade.