Ralf's Dillema

The reason to keep shell scripts safe

A production system broke because a bash script failed to switch into the correct directory and filled up a limited disk instead.
Services could no longer write the files they exchange with other services to disk, and users were mad.

What exactly happened

A script was called from a cron job. It was supposed to copy files from the local disk to an NFS share.
To do so, it was supposed to switch into the NFS directory and create gzipped versions of files older than a day.
But the NFS mount was not available, so the script failed to switch into the directory and created the gzipped files in the current directory instead.
Once the disk was full, the script could no longer create gzipped files. But as soon as some space was freed, it carried on creating them and filled up the disk again.
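
The failure mode can be reproduced in a few lines. This is only a sketch of the pattern, not the original script: the temporary directory stands in for the limited local disk, and `nfs-not-mounted` is a made-up name for the missing mount.

```bash
#!/bin/bash
# Hypothetical stand-ins: workdir = local disk, target = NFS mount that is not there.
workdir=$(mktemp -d)
target="$workdir/nfs-not-mounted"   # deliberately never created

echo "some data" > "$workdir/report.txt"
cd "$workdir" || exit 1

# The unguarded pattern: cd fails, but nothing stops the script.
cd "$target" 2>/dev/null
cd_status=$?

# The archive is written relative to the current directory,
# which is still the local disk and not the share.
gzip -c "$workdir/report.txt" > report.txt.gz

echo "cd exit code: $cd_status"
ls "$workdir"
```

The listing shows `report.txt.gz` next to the original file, on the disk that was supposed to be relieved.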

How to prevent this

The script was written a long time ago, and I don’t know why it was done this way.

Exit on error

No one ever thought that the NFS might not be available. Instead of needing to build a check for every possible error, we can use the set -e (or set -o errexit) option to exit the script on any error.

#!/bin/bash
set -e
# OR
set -o errexit

# do stuff

# If a command is allowed to fail, append `|| true`
# to ignore its error.
command || true

errexit will make the script exit if any command returns a non-zero exit code. This could be compared to an unhandled exception in a programming language.
This makes issues easier to spot. Without this option, the script continues after a failed command, and if the last command succeeds, the script exits with a zero exit code.

If we had implemented this option, the script would have failed and the disk would not have been filled up, ~~and my Friday would have been saved~~.
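
To see the effect without risking a disk, the failing `cd` can be run in a child bash. This is a minimal sketch, and `/nonexistent/nfs` is an assumed path that stands in for the missing mount.

```bash
#!/bin/bash
# The snippet runs in a child bash, so its exit does not end this demo.
# /nonexistent/nfs is a hypothetical path standing in for the missing mount.
output=$(bash -c '
  set -o errexit
  cd /nonexistent/nfs
  echo "compressing files"   # never reached, the failed cd ends the script
' 2>/dev/null)
status=$?

echo "exit status: $status"
echo "output: '$output'"
```

The child exits with a non-zero status and prints nothing. Keep in mind that errexit ignores failures in places where the exit code is being tested, for example in an `if` condition or on the left side of `&&` and `||`.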

Subshells and functions

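For functions and subshells we also need set -E (or set -o errtrace). With it, a trap on ERR is inherited by shell functions, command substitutions and subshells, so an error inside them is reported there too.

#!/bin/bash
set -E
# OR
set -o errtrace

# Runs for every failing command, also inside functions and subshells
trap 'echo "error on line $LINENO" >&2' ERR

function foo {
  # do stuff
  :  # no-op placeholder, a function body must not be empty
}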
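
What errtrace changes is whether a trap on ERR is inherited by functions. The sketch below runs the same snippet twice in a child bash, once without and once with the option. The function name `fail` is made up for this example.

```bash
#!/bin/bash
# `fail` is a hypothetical function: a command inside it fails,
# but the function itself returns 0.
snippet='
trap "echo ERR trap fired" ERR
fail() { false; return 0; }
fail
'

# Without errtrace the trap is not inherited by the function.
without=$(bash -c "$snippet")

# With errtrace the failing command inside the function triggers the trap.
with=$(bash -c "set -o errtrace; $snippet")

echo "without errtrace: '$without'"
echo "with errtrace:    '$with'"
```

Only the second run prints the trap message, because only there the trap is active inside `fail`.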

Variables

We can use the set -u (or set -o nounset) option to exit the script on unset variables.

#!/bin/bash
set -u
# OR
set -o nounset

WillBeUnset="Unset"
testVariable="$willBeUnset is unset"
echo "$testVariable"

In this case the script will exit with an error, because the variable willBeUnset is not set, as the variable name is set in PascalCase but called in camelCase.
With the nounset option, the script exits with an exit code of 1 and prints the following error message: willBeUnset: unbound variable.
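
With nounset enabled, optional settings need an explicit default. A small sketch, where `RETENTION_DAYS` and `TARGET_DIR` are made-up variable names:

```bash
#!/bin/bash
set -o nounset

# RETENTION_DAYS and TARGET_DIR are hypothetical names used only in this sketch.
unset RETENTION_DAYS TARGET_DIR

# Optional setting: fall back to a default when unset or empty.
retention_days="${RETENTION_DAYS:-1}"
echo "retention: $retention_days day(s)"

# Required setting: test for it explicitly, ${VAR:-} is safe under nounset.
if [ -z "${TARGET_DIR:-}" ]; then
  target_state="missing"
else
  target_state="set"
fi
echo "TARGET_DIR is $target_state"
```

For required settings, `${TARGET_DIR:?TARGET_DIR must be set}` is a shorter alternative that aborts the script with the given message.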

Pipefail

We can use the set -o pipefail option to make errors in pipes visible.

#!/bin/bash
set -o pipefail

# do stuff
cat /missing/file | command

With pipefail, a pipe returns a non-zero exit code if any command in it fails, even if the last command succeeds. Without it, only the exit code of the last command counts. pipefail alone does not stop the script; combined with errexit, the failed pipe makes the script exit.

This was one of the reasons why the script misbehaved. It created the gzipped files through a pipe, tried this for every file, and failed once the disk was full.
But as soon as some space was freed, the script carried on creating gzipped files and filled up the disk again.
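
The difference shows up in the exit code of the pipe. The following sketch compares both settings in a child bash, assuming `/missing/file` does not exist:

```bash
#!/bin/bash
# /missing/file is assumed not to exist. cat fails, gzip succeeds on empty input.

# Default: only the exit code of the last command (gzip) counts.
bash -c 'cat /missing/file 2>/dev/null | gzip -c > /dev/null'
without=$?

# pipefail: the failing cat decides the exit code of the whole pipe.
bash -c 'set -o pipefail; cat /missing/file 2>/dev/null | gzip -c > /dev/null'
with=$?

echo "without pipefail: $without"
echo "with pipefail:    $with"
```

Without pipefail the pipe reports success although nothing was read. With pipefail the failure of `cat` becomes the result of the pipe, and errexit can act on it.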

Debugging

We can use the set -x (or set -o xtrace) option to print the commands and their arguments as they are executed.

#!/bin/bash
set -x
# OR
set -o xtrace

# do stuff
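
The trace is written to stderr, and each line starts with the value of `PS4`, which is `+ ` by default. As a sketch, the prefix can be extended with the line number. The variable `name` and the message are made up for this example.

```bash
#!/bin/bash
# Run a traced snippet in a child bash and capture its stderr.
trace=$(bash -c '
  PS4="+ line \$LINENO: "   # expanded again for every traced command
  set -o xtrace
  name="world"
  echo "hello $name" > /dev/null
' 2>&1)

# Show what xtrace printed
echo "$trace"
```

Each traced command appears with its variables already expanded, which makes it easy to see what was actually executed.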

Conclusion

If you are working with shell scripts, you should use safety options to prevent critical errors.
To keep your bash script short you can use set -Eeuo pipefail to enable all of these options at once.

#!/bin/bash
set -Eeuo pipefail
# To enable debugging, add the x option too

# do stuff
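
Applied to the incident described at the beginning, a guarded version could look roughly like this. It is a sketch and not the original script: `TARGET_DIR` is a made-up variable, and the temporary directory is only a fallback so that the example runs anywhere. The single-letter form `-E` is used for errtrace.

```bash
#!/bin/bash
set -Eeuo pipefail

# Report where the script stopped before it exits
trap 'echo "error on line $LINENO" >&2' ERR

# TARGET_DIR is a hypothetical setting for the NFS directory.
# The temporary directory is only a fallback for trying the sketch out.
target_dir="${TARGET_DIR:-$(mktemp -d)}"

# If the directory is not available, errexit stops the script right here.
cd "$target_dir"

# Compress files older than a day, skipping files that are already gzipped
find . -type f -mtime +1 ! -name '*.gz' -exec gzip {} +

echo "done in $target_dir"
```

If the `cd` fails, nothing after it runs, so no archive can end up in the wrong directory.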

Maybe this will save your ass one day.

Keep Shell Scripts safe