No Backslash Support
grep and sh (the shell) do not support the otherwise common shortcuts \t (tab) or \n (newline). Instead, they treat them as ordinary characters: grep matches the literal letter t or letter n, and the shell keeps a double-quoted "\n" as a backslash followed by the letter n.
I've been bitten by this more times than I like to recall. If you rely on input that uses delimiters other than spaces, I recommend that you don't write shell scripts, but use a decent scripting language (Perl, Python, Ruby or even PHP).
There are actually two issues:
- The shell and a lot of Unix tools do not support \t (tab) or \n (newline)
- It is very hard to specify delimiters other than spaces or newlines
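One partial workaround is to let printf produce the characters, since printf does interpret \t and \n in its format string. The following is only a minimal sketch, assuming a POSIX shell; the variable names TAB and NL are made up for illustration.

```shell
#!/bin/sh
# printf interprets \t in its format string, so this stores a real tab.
TAB=$(printf '\t')

# Command substitution strips trailing newlines, so a plain
# $(printf '\n') would be empty. Append a sentinel "x" and remove it again.
NL=$(printf '\nx')
NL=${NL%x}

# Use the variables wherever a literal tab or newline is needed.
printf 'left%sright\n' "$TAB"
```

The sentinel trick is needed only for the newline; a trailing tab survives command substitution.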
Delimiter (IFS)
By default the shell uses whitespace (space, tab and newline) as the delimiter.
This introduces a bug in the following script: it fails for file names with spaces in them.
#!/bin/sh
# Loop through all files
for file in `ls -1 $HOME`
do
    # Do something
    echo "File $file"
done
You need to tell the shell to only use the newline as delimiter. You can do so by setting the $IFS variable to a newline. Since the shell does not understand \n, the following is wrong:
IFS="\n" # this fails
Instead, the only way you can do this is by using a literal newline in your script:
IFS="
" # this works
Here is the full script:
# Set the item separator to newline only (to support file names with spaces)
OIFS="$IFS"
IFS='
'
# Loop through all files
for file in `ls -1 $HOME`
do
    # Do something
    echo "File $file"
done
# restore original delimiter
IFS="$OIFS"
<rant>If you think this is ugly or convoluted for a script that just loops through all files, please stop writing shell scripts and learn a script language. You will not regret turning your back on the rotten pile of junk that is called shell scripting.</rant>
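For the specific case of looping over files, a shell glob avoids the IFS juggling, because the results of pathname expansion are not split into words again. This is a different technique from the IFS approach above, shown only as a sketch; the function name list_files is hypothetical. Like ls -1, the glob skips hidden files.

```shell
#!/bin/sh
# Hypothetical helper: print "File <name>" for every entry in directory $1.
list_files() {
    for file in "$1"/*
    do
        # If the directory is empty the pattern stays unexpanded; skip it.
        [ -e "$file" ] || continue
        # Strip the directory part so only the file name is printed.
        printf 'File %s\n' "${file##*/}"
    done
}

# Usage: list_files "$HOME"
```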
grep does not support \t
grep does not support \t or \n: it will simply match the letter t or the letter n. grep -E does support \w though.
Given the file
1 line one
2 line two
3 line three
grep -E --colour "^.+\t" will match
2 line two
3 line three
The --colour parameter lets you quickly debug the regular expression.
To specify the tab, insert a literal tab in your expression:
grep -E --colour "^.+ "
You can insert a tab character on the command line by pressing control-V tab.
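In a script, where typing control-V is awkward and a literal tab is easy to lose when editing, printf can generate the tab for you. This is a minimal sketch assuming a POSIX shell; the variable TAB and the function match_tab_lines are made-up names.

```shell
#!/bin/sh
# printf understands \t, and command substitution strips only trailing
# newlines, so the tab character survives.
TAB=$(printf '\t')

# Hypothetical helper: print the lines of file $1 that contain a tab
# preceded by at least one other character.
match_tab_lines() {
    grep -E "^.+$TAB" "$1"
}

# Usage: match_tab_lines somefile.txt
```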
grep and sed are line-based
It is nearly impossible to do anything that is not line-based with Unix tools. Despite the name, sed (the stream editor) does not operate on a character stream, but on lines.
For example, I have not found a good way to use a regular expression that contains a newline with Unix tools, such as a regular expression that matches a line which is followed by another line.
grep does not understand "\n" (it will interpret this as the letter n), but even inserting a literal newline fails:
Using control-V followed by Enter (which inserts ^M, a carriage return rather than a newline):
grep ".^M." somefile.txt
does not match any line (I expected it to match the last character of a line, and the first character of the next line)
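Using a literal newline in a script:
#!/bin/sh
grep -E --colour '.
.' somefile.txt
matches all lines in the file (it seems equivalent to grep -E '.'), because grep treats a newline inside the pattern as a separator between alternative patterns.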
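One workaround that stays within the standard tools is to let awk remember the previous line and test two lines at a time. This is not a general multi-line regular expression, only a sketch for the "line followed by another line" case; the function name match_across_lines and its argument order are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every pair of consecutive lines in file $3
# where the first line ends with regex $1 and the next line starts with regex $2.
match_across_lines() {
    awk -v end="$1" -v start="$2" '
        # From the second line on, test the previous and the current line.
        NR > 1 && prev ~ (end "$") && $0 ~ ("^" start) {
            print prev
            print $0
        }
        # Remember the current line for the next iteration.
        { prev = $0 }
    ' "$3"
}

# Usage: match_across_lines 'a' 'b' somefile.txt
```

The two halves of the pattern are anchored separately because awk, like grep, reads its input one line at a time.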