No Backslash Support

grep and sh (the shell) do not support the otherwise common shortcuts \t (tab) or \n (newline). Instead they silently treat them as literal characters: grep matches the letter t or the letter n, and the shell keeps a quoted "\n" as a backslash followed by the letter n.

I've been bitten by this more times than I like to recall. If you rely on input that uses delimiters other than spaces, I recommend that you don't write shell scripts, but use a decent scripting language (Perl, Python, Ruby or even PHP).

There are actually two issues:

  1. The shell and a lot of Unix tools do not support \t (tab) or \n (newline)
  2. It is very hard to specify delimiters other than spaces or newlines
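The first issue can be made visible with a small sketch. It assumes a POSIX sh; the variable names are only illustrative. The printf utility, unlike the shell's own quoting, does interpret \t, so it can be used to produce a real tab:

```shell
#!/bin/sh
# Inside double quotes the shell keeps \t as two characters: a backslash and a t.
literal="\t"

# printf interprets the escape in its format string and emits one real tab.
tab=$(printf '\t')

echo "length of literal: ${#literal}"
echo "length of tab:     ${#tab}"
```

The first length is 2 and the second is 1, which shows that only printf produced an actual tab character.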

Delimiter (IFS)

By default the shell uses any whitespace (space, tab or newline) as delimiter.

This introduces a bug in the following script: it fails for file names with spaces in them.

#!/bin/sh
# Loop through all files
for file in `ls -1 $HOME`
do
    # Do something
    echo "File $file"
done

You need to tell the shell to only use the newline as delimiter. You can do so by setting the $IFS variable to a newline. Since the shell does not understand \n, the following is wrong:

IFS="\n"  # this fails: IFS is now a backslash and the letter n

Instead, the simplest portable way to do this is to put a literal newline in your script:

IFS="
"  # this works
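A literal newline is easy to damage when a script is reformatted or re-indented. As a sketch of an alternative, printf can generate the newline. Command substitution strips trailing newlines, so a sentinel character is appended and removed again. The names nl, names and count are illustrative and not part of the original script:

```shell
#!/bin/sh
# Build a one-character newline variable without a literal newline in the source.
# $(...) strips trailing newlines, so add a sentinel "x" and cut it off afterwards.
nl=$(printf '\nx')
nl=${nl%x}

OIFS="$IFS"
IFS="$nl"

# Hypothetical sample data: two names, one of them containing a space.
names='first file.txt
second.txt'

count=0
for name in $names
do
    count=$((count + 1))
    echo "Item $count: $name"
done

# restore original delimiter
IFS="$OIFS"
```

With IFS set this way the loop runs twice, and the name containing a space stays in one piece.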

Here is the full script:

# Set the item separator to newline only (to support file names with spaces)
OIFS="$IFS"
IFS='
'
# Loop through all files
for file in `ls -1 $HOME`
do
    # Do something
    echo "File $file"
done
# restore original delimiter
IFS="$OIFS"
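For this particular task there is a way around IFS altogether: let the shell expand a glob instead of splitting the output of ls. The following is a minimal sketch, not the script above. It uses a scratch directory created with mktemp -d (widely available, but not required by POSIX) as a stand-in for $HOME, and the file names are made up:

```shell
#!/bin/sh
# Sketch: loop over a glob instead of parsing ls output.
# Each match is a single word, even if it contains spaces, so IFS stays untouched.
dir=$(mktemp -d)    # hypothetical scratch directory standing in for $HOME
touch "$dir/plain.txt" "$dir/with space.txt"

count=0
for file in "$dir"/*
do
    # If the directory is empty the pattern is left unexpanded; skip it.
    [ -e "$file" ] || continue
    count=$((count + 1))
    # Strip the directory part to print just the file name, as ls -1 would.
    echo "File ${file##*/}"
done

rm -r "$dir"
```

Like plain ls, the glob skips hidden files. The quotes around "$file" matter: without them the name would be split on spaces again.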

<rant>If you think this is ugly or convoluted for a script that just loops through all files, please stop writing shell scripts and learn a script language. You will not regret turning your back on the rotten pile of junk that is called shell scripting.</rant>

grep does not support \t

grep does not support \t or \n: it will simply match the letter t or the letter n. grep -E does support \w though.

Given the file

1	line one
2	line two
3	line three

grep -E --colour "^.+\t" will match only the lines that contain a letter t after the first character:

2	line two
3	line three

The --colour parameter lets you quickly debug the regular expression.

To specify the tab, insert a literal tab in your expression:

grep -E --colour "^.+	"

You can insert a tab character on the command line by pressing control-V followed by the tab key.
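In a script, a literal tab is invisible and easily turned into spaces by an editor. A sketch of an alternative, assuming a POSIX sh, is to let printf produce the tab and interpolate it into the pattern. The sample input repeats the file from above plus one made-up line without a tab:

```shell
#!/bin/sh
# printf understands \t even though grep and the shell's quoting do not.
tab=$(printf '\t')

# Count the lines that have at least one character followed by a real tab.
# The fourth input line contains the letter t but no tab, so it must not match.
matches=$(printf '1\tline one\n2\tline two\n3\tline three\nno tab here\n' |
    grep -c -E "^.+$tab")

echo "Lines with a tab: $matches"
```

Only the three tab-separated lines are counted.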

grep and sed are line-based

It is near impossible to do anything which is not line-based with Unix tools. Despite the name, sed (the stream editor) does not operate on a character-stream, but operates on lines.

For example, I have not found a good way to use a regular expression that contains a newline with Unix tools, such as one that matches a line which is followed by another line.

grep does not understand "\n" (it will interpret this as the letter n), but even inserting a literal newline fails:

Using control-V followed by Enter:

grep ".^M." somefile.txt

does not match any line (I expected it to match the last character of a line, and the first character of the next line). Note that the ^M inserted this way is a carriage return, not a newline.

Using a literal newline in a script:

#!/bin/sh
grep -E --colour '.
.' somefile.txt

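matches all lines in the file (it seems equivalent to grep -E '.'). The reason is that grep treats a newline in the pattern as a separator between alternative patterns, so this is a list of two patterns, each of which is just '.'.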
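One workaround I can sketch is awk, which is still line-based but can remember the previous line. This is not a true multi-line regular expression; it tests two consecutive lines with two separate expressions. The patterns and the variable prev are only illustrative, and the input repeats the sample file from the grep section:

```shell
#!/bin/sh
# Print every line ending in "one" that is directly followed by a line
# starting with "2", together with that following line.
# prev holds the previous input line; it is empty before the first line.
result=$(printf '1\tline one\n2\tline two\n3\tline three\n' |
    awk 'prev ~ /one$/ && /^2/ { print prev; print }
         { prev = $0 }')

echo "$result"
```

This prints the first two lines of the sample and nothing else, because only that pair satisfies both expressions.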