A production system broke because a bash script failed to switch into the correct directory and filled up a limited disk.
Services could no longer write the files they exchange with other services to disk, and users were mad.
The script was called from a cronjob. It was supposed to copy files from the local disk to an NFS share.
It was then supposed to switch into the NFS directory and create gzipped versions of files older than a day.
Yet the NFS mount was not available, so the script failed to switch into the directory, kept running, and created the gzipped files in the current directory on the local disk.
As the disk filled up, the script failed to create the gzipped files - but as soon as the disk freed up, the script continued to create the gzipped files and filled up the disk again.
The script was written a long time ago, and I don’t know why it was done this way.
No one ever thought that the NFS might not be available. Instead of building a check for every possible error, we can use the set -e (or set -o errexit) option to exit the script on any error.
#!/bin/bash
set -e
# OR
set -o errexit
# do stuff
# If some commands are allowed to fail, we can use the `|| true`
# option to ignore the error.
# ignore errors
command || true
errexit will make the script exit if a command returns a non-zero exit code, unless that command is tested as a condition, for example in an if statement or in front of || or &&. This could be compared to an unhandled exception in a programming language. If we had enabled this option, the failed directory change would have stopped the script, the disk would not have been filled up, and my Friday would have been saved.
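To make the difference visible, here is a minimal sketch that runs the same two commands with and without errexit. The path /nonexistent/nfs-share is made up and only stands in for the missing NFS mount; the variable names are mine as well.

```bash
#!/bin/bash
# Hypothetical path that does not exist; it stands in for the unavailable NFS mount.
missing_dir="/nonexistent/nfs-share"

# Without errexit the failed cd is ignored and the next command still runs.
out_plain=$(bash -c 'cd "$1" 2>/dev/null; echo "continued"' _ "$missing_dir")

# With errexit the failed cd ends the child script, so nothing is printed.
status_errexit=0
out_errexit=$(bash -c 'set -e; cd "$1" 2>/dev/null; echo "continued"' _ "$missing_dir") || status_errexit=$?

echo "plain:   '$out_plain'"
echo "errexit: '$out_errexit' (exit code $status_errexit)"
```

The first child script keeps going after the failed cd, which is what happened in production. The second one stops right there with a non-zero exit code.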
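For functions or subshells we can additionally use set -E (or set -o errtrace). It does not exit anything on its own. It makes a trap on ERR get inherited by shell functions, command substitutions and subshells, so an error in there can be reported as well.
#!/bin/bash
set -o errtrace
# report the failing line; with errtrace this trap also fires inside functions
trap 'echo "error on line $LINENO" >&2' ERR
# function
function foo {
  # do stuff
  false  # a failing command triggers the ERR trap
}
foo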
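To see what errtrace actually changes, the following sketch runs the same snippet twice with bash -c, once without and once with the option. The function foo and the message "trapped" are made up for this example.

```bash
#!/bin/bash
# Hypothetical snippet: foo hides a failing command behind a successful one.
snippet='trap "echo trapped" ERR; foo() { false; true; }; foo'

# Without errtrace the ERR trap is not inherited by foo, so nothing is printed.
out_without=$(bash -c "$snippet")

# With errtrace the trap also fires for the failing command inside foo.
out_with=$(bash -c "set -o errtrace; $snippet")

echo "without errtrace: '$out_without'"
echo "with errtrace:    '$out_with'"
```

Because foo ends with a successful command, the caller never sees the error. Only the inherited trap makes it visible.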
We can use the set -u (or set -o nounset) option to exit the script on unset variables.
#!/bin/bash
set -u
# OR
set -o nounset
WillBeUnset="Unset"
testVariable="$willBeUnset is unset"
echo "$testVariable"
In this case the script will exit with an error, because the variable willBeUnset is not set: the variable name is defined in PascalCase but referenced in camelCase.
With the nounset option, the script exits with an exit code of 1 and prints the following error message: willBeUnset: unbound variable.
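If a variable is allowed to be unset, nounset does not force us to drop the option. A small sketch, assuming a default value is acceptable: the ${name:-default} expansion is standard shell syntax, while the value "fallback" is just a placeholder of mine.

```bash
#!/bin/bash
# Referencing the unset variable under nounset aborts the child script.
status_unset=0
out_unset=$(bash -c 'set -u; echo "value: $willBeUnset"' 2>/dev/null) || status_unset=$?

# ${willBeUnset:-fallback} supplies a default, so nounset does not trigger.
out_default=$(bash -c 'set -u; echo "value: ${willBeUnset:-fallback}"')

echo "unset:   '$out_unset' (exit code $status_unset)"
echo "default: '$out_default'"
```

This keeps the typo protection for every other variable and makes the optional ones explicit.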
We can use the set -o pipefail option to let a pipe fail if any command in it fails.
#!/bin/bash
set -o pipefail
# do stuff
cat /missing/file | command
pipefail changes the exit code of a pipe. If any command in a pipe returns a non-zero exit code, the exit code of the rightmost failing command will be used as the return code for the whole pipe, even if the last command succeeds. On its own this does not exit the script; combined with errexit, the script exits as soon as the pipe fails. This was one of the reasons why the script misbehaved. It was supposed to create gzipped versions of piped files, tried this for every file, and failed after some time.
But as soon as the disk freed up, the script continued to create the gzipped files and filled up the disk again.
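The effect on the exit code is easy to check with a pipe of false and true. This is only a sketch; the variable names and the word "reached" are made up.

```bash
#!/bin/bash
# Default behaviour: only the exit code of the last command (true) counts.
status_default=0
bash -c 'false | true' || status_default=$?

# With pipefail the failing command on the left decides the exit code.
status_pipefail=0
bash -c 'set -o pipefail; false | true' || status_pipefail=$?

# Together with errexit the script stops at the failing pipe.
out_strict=$(bash -c 'set -eo pipefail; false | true; echo "reached"') || true

echo "default: $status_default, pipefail: $status_pipefail, after pipe: '$out_strict'"
```

Without pipefail the failure of the first command is silently lost, and errexit never gets a chance to react to it.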
We can use the set -x (or set -o xtrace) option to print the commands and their arguments as they are executed.
#!/bin/bash
set -x
# OR
set -o xtrace
# do stuff
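The trace is written to stderr, and by default every traced line starts with a + sign. A small sketch that captures only the trace of a made-up two-line script:

```bash
#!/bin/bash
# Run a tiny hypothetical script with xtrace; keep stderr (the trace), drop stdout.
trace=$(bash -c 'set -x; name="world"; echo "hello $name"' 2>&1 >/dev/null)

# Show what bash traced: one line per executed command, after expansion.
echo "$trace"
```

Since the variables are already expanded in the trace, it is a quick way to see which paths a script really works on.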
If you are working with shell scripts, you should use safety options to prevent critical errors.
To keep your bash script short you can use set -Eeuo pipefail to enable all options at once. The -E is the short form of -o errtrace.
#!/bin/bash
set -Eeuo pipefail
# To enable debugging, add the x option too
# do stuff
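Put together, a job like the one from the incident could look roughly like the sketch below. This is not the original script, which I cannot show here. The function name archive_old_files, the error message and the find expression are my assumptions, and -maxdepth is supported by GNU and BSD find but is not part of POSIX.

```bash
#!/bin/bash
# Hypothetical archiving job. The body runs in a subshell, so the options
# and the cd do not leak into the calling shell.
archive_old_files() (
  set -Eeuo pipefail
  # Report where it failed before errexit ends the subshell.
  trap 'echo "archive_old_files failed on line $LINENO" >&2' ERR

  # If the directory is missing, errexit stops here and nothing gets
  # written to the current directory.
  cd "$1"

  # Compress regular files older than one day that are not gzipped yet.
  find . -maxdepth 1 -type f -mtime +0 ! -name '*.gz' -exec gzip -- {} +
)
```

It would be called as a plain command, for example archive_old_files /mnt/nfs/archive, where the path is again made up. Calling it inside an if or in front of || would make bash ignore errexit within the function, so that is something to avoid here.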
Maybe this will save your ass one day.