Saturday, 6 August 2011

Sed - UNIX Tool

In this post I want to discuss sed (stream editor). Sed is a Unix utility that parses text and implements a programming language for applying transformations to that text. It was developed by Lee E. McMahon of Bell Labs between 1973 and 1974, and it is available today for most operating systems.

Sed has several commands, but the most essential is the substitution command: s. The substitution command replaces text that matches a regular expression with a new value. By default it changes only the first match on each line.

A simple example is changing “day” in the “old” file to “night” in the “new” file.

sed 's/day/night/' <old > new

I must emphasize that sed changes exactly what you tell it to. So if you run

echo Sunday | sed 's/day/night/'

the output would be the word “Sunnight”, because sed found the string “day” in the input.

There are four parts to this substitute command:

s             Substitute command
/../../       Delimiter
day           Regular expression search pattern
night         Replacement string
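
The behaviour of the s command can be checked without creating any files. The following is only a small sketch: the sample text and the variable names are made up for this illustration. It shows that, without any flags, s changes only the first match on each line.

```shell
# Sample input invented for this illustration; the first line contains "day" twice.
input='Sunday and Monday
a sunny day'

# Without flags, s replaces only the first match on each line.
result=$(printf '%s\n' "$input" | sed 's/day/night/')
printf '%s\n' "$result"
# prints:
# Sunnight and Monday
# a sunny night
```

"Monday" is left alone because the first "day" on that line was already found inside "Sunday".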

The slash as a delimiter

The character after the s is the delimiter. It is conventionally a slash, but it can be almost anything you want. If you want to change a pathname that contains a slash, say /usr/local/bin to /common/bin, you could use the backslash to quote the slash:

sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new

We can also use an underscore, a colon, or the “|” character instead of a slash as the delimiter:

sed 's_/usr/local/bin_/common/bin_' <old >new
sed 's:/usr/local/bin:/common/bin:' <old >new
sed 's|/usr/local/bin|/common/bin|' <old >new
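
To confirm that the choice of delimiter does not change the result, the sketch below runs the same substitution three ways on one sample line. The sample line and the variable names are assumptions made for this example.

```shell
# Hypothetical sample line containing the path to rewrite.
line='PATH=/usr/local/bin:/bin'

# Same substitution with three different delimiters.
with_slash=$(printf '%s\n' "$line" | sed 's/\/usr\/local\/bin/\/common\/bin/')
with_bar=$(printf '%s\n' "$line" | sed 's|/usr/local/bin|/common/bin|')
with_underscore=$(printf '%s\n' "$line" | sed 's_/usr/local/bin_/common/bin_')

printf '%s\n' "$with_slash" "$with_bar" "$with_underscore"
# prints PATH=/common/bin:/bin three times
```

Whichever delimiter you pick, it must not appear unescaped inside the pattern or the replacement.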
        
Using & as the matched string

Sometimes you want to search for a pattern and add some characters, such as parentheses, around the text you found. It is easy to do this if you are looking for a particular string:

sed 's/abc/(abc)/' <old >new

This won't work if you don't know exactly what you will find. The solution is the special character "&", which stands for the text the pattern matched.

sed 's/[a-z]*/(&)/' <old >new

You can have any number of "&" in the replacement string. For example, you could duplicate the first number on a line:

% echo "123 abc" | sed 's/[0-9]*/& &/'
123 123 abc

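There is a catch: "[0-9]*" matches zero or more digits, so it can match the empty string at the start of a line. If the input was "abc 123", the output would be " abc 123": only a space is added at the front, and the number is not duplicated. A better way is to make sure the pattern matches at least one digit: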

% echo "123 abc" | sed 's/[0-9][0-9]*/& &/'
123 123 abc
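
The difference between the two patterns only shows up when the line does not start with a digit. The sketch below feeds "abc 123" to both versions; the variable names are made up for this illustration, and the brackets in the output are there only to make the leading space visible.

```shell
# "[0-9]*" can match the empty string at the very start of the line,
# so "& &" inserts just a single space there.
zero_or_more=$(echo 'abc 123' | sed 's/[0-9]*/& &/')

# "[0-9][0-9]*" needs at least one digit, so it skips ahead to "123".
one_or_more=$(echo 'abc 123' | sed 's/[0-9][0-9]*/& &/')

printf '[%s]\n' "$zero_or_more" "$one_or_more"
# prints:
# [ abc 123]
# [abc 123 123]
```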

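The original sed did not support the "+" metacharacter, which means "one or more matches". GNU sed supports it in extended regular expressions, which are enabled with the -E option (-r also works in GNU sed). So the above could also be written as

% echo "123 abc" | sed -E 's/[0-9]+/& &/'
123 123 abc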
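
A quick way to see how "+" is treated is to run the same input through three variants. This is a sketch that assumes a sed accepting the -E option (GNU sed and the BSD versions do); the variable names are invented for the example.

```shell
# Portable basic regular expression: one digit followed by zero or more digits.
basic=$(echo '123 abc' | sed 's/[0-9][0-9]*/& &/')

# Extended regular expression: "+" means one or more.
extended=$(echo '123 abc' | sed -E 's/[0-9]+/& &/')

# Without -E, an unescaped "+" is an ordinary character, so nothing matches here.
literal=$(echo '123 abc' | sed 's/[0-9]+/& &/')

printf '%s\n' "$basic" "$extended" "$literal"
# prints:
# 123 123 abc
# 123 123 abc
# 123 abc
```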

Using \1 to keep part of the pattern

The "\1" is the first remembered pattern, and the "\2" is the second remembered pattern. Sed has up to nine remembered patterns. If you want to keep the first word of a line and delete the rest of the line, mark the important part with escaped parentheses, "\(" and "\)":

sed 's/\([a-z]*\).*/\1/'

"[a-z]*" matches zero or more lower case letters, and ".*" matches the rest of the line. Therefore, if you type

echo abcd123 | sed 's/\([a-z]*\).*/\1/'

this will output "abcd" and delete the numbers.

If you want to switch two words around, you can remember two patterns and change their order:

sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'

You may want to insist that words have at least one letter by using

sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/'
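
The remembered patterns are easy to try out on short strings. The sketch below uses sample inputs and variable names chosen only for this illustration.

```shell
# Keep only the leading lower case letters; ".*" swallows the rest of the line.
first_word=$(echo 'abcd123' | sed 's/\([a-z]*\).*/\1/')

# Swap two words by emitting the remembered patterns in reverse order.
swapped=$(echo 'hello world' | sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/')

printf '%s\n' "$first_word" "$swapped"
# prints:
# abcd
# world hello
```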

If you want to eliminate duplicated words, you can try:

sed 's/\([a-z]*\) \1/\1/'

If you want to detect duplicated words, you can use

sed -n '/\([a-z][a-z]*\) \1/p'

If you wanted to reverse the first three characters on a line, you can use

sed 's/^\(.\)\(.\)\(.\)/\3\2\1/'
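
The duplicate-word and reversal commands can be tried the same way. This is a sketch with made-up sample text and variable names. The last case shows a limitation worth knowing: the pattern is not tied to word boundaries, so the "is" at the end of "this" followed by " is" also counts as a repeat.

```shell
# Remove a repeated word.
deduped=$(echo 'the the quick fox' | sed 's/\([a-z][a-z]*\) \1/\1/')

# Print only the lines that contain a repeated word.
flagged=$(printf 'no repeats here\nit was was late\n' | sed -n '/\([a-z][a-z]*\) \1/p')

# Reverse the first three characters of the line.
reversed=$(echo 'abcdef' | sed 's/^\(.\)\(.\)\(.\)/\3\2\1/')

# Caveat: "this is" is reported too, because "is is" occurs inside it.
false_hit=$(echo 'this is fine' | sed -n '/\([a-z][a-z]*\) \1/p')

printf '%s\n' "$deduped" "$flagged" "$reversed" "$false_hit"
# prints:
# the quick fox
# it was was late
# cbadef
# this is fine
```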

/g - Global replacement

Let's place parentheses around words on a line. Instead of using a pattern like "[A-Za-z]*", which won't match words like "won't", we will use a pattern, "[^ ]*", that matches everything except a space. The following will put parentheses around the first word:

sed 's/[^ ]*/(&)/' <old >new

If you want it to make changes for every word, add a "g" after the last delimiter. Because "[^ ]*" can also match the empty string, use the work-around "[^ ][^ ]*", which requires at least one character:

sed 's/[^ ][^ ]*/(&)/g' <old >new
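
The effect of the g flag is easiest to see side by side. The sketch below uses a sample sentence and variable names invented for this illustration.

```shell
# Without g, only the first word is wrapped.
first_only=$(echo "won't you stay" | sed 's/[^ ][^ ]*/(&)/')

# With g, every run of non-space characters is wrapped.
every_word=$(echo "won't you stay" | sed 's/[^ ][^ ]*/(&)/g')

printf '%s\n' "$first_only" "$every_word"
# prints:
# (won't) you stay
# (won't) (you) (stay)
```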

/1, /2, etc. Specifying which occurrence

If you want to modify a particular pattern that is not the first one on the line, you could use "\(" and "\)" to mark each pattern, and use "\1" to put the first pattern back unchanged. This next example keeps the first word on the line but deletes the second:

sed 's/\([a-zA-Z]*\) \([a-zA-Z]*\) /\1 /' <old >new

There is an easier way to do this. You can add a number after the last delimiter to indicate that you only want to change that particular occurrence of the pattern. For example, this deletes the second word:

sed 's/[a-zA-Z]* //2' <old >new

You can combine a number with the g (global) flag. For instance, if you want to leave the first word alone, but change the second, third, etc. to DELETED, use /2g (this combination is a GNU sed feature; POSIX leaves its behaviour unspecified):

sed 's/[a-zA-Z]* /DELETED /2g' <old >new
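
Here is a small sketch of both numeric forms on one sample line. The sample line and variable names are made up, and the second command assumes GNU sed, since other implementations may reject a number combined with g.

```shell
line='one two three four'

# Delete only the second "word followed by a space".
second_removed=$(echo "$line" | sed 's/[a-zA-Z]* //2')

# GNU sed: change the second match and every match after it.
rest_marked=$(echo "$line" | sed 's/[a-zA-Z]* /DELETED /2g')

printf '%s\n' "$second_removed" "$rest_marked"
# prints (with GNU sed):
# one three four
# one DELETED DELETED four
```

The last word, "four", is untouched because it is not followed by a space and so never matches the pattern.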

/p - print

If you use the optional argument "-n" ("sed -n"), sed will not print any lines by default. When the "-n" option is used, the "p" flag will cause the modified line to be printed. The following prints only the lines that match the pattern, much like grep:

sed -n 's/pattern/&/p' <file
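
The interaction between -n and the p flag can be seen with three short lines of input. This is a sketch; the fruit names and variable names are arbitrary.

```shell
# With -n, only lines where the substitution succeeded are printed.
changed=$(printf 'apple\nbanana\ncherry\n' | sed -n 's/an/AN/p')

# Without -n, every line is printed once, and the changed line a second time.
doubled=$(printf 'apple\nbanana\ncherry\n' | sed 's/an/AN/p')

printf '%s\n' "$changed" '---' "$doubled"
# prints:
# bANana
# ---
# apple
# bANana
# bANana
# cherry
```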

Write to a file with /w filename

You can specify a file that will receive the modified data. An example is the following, which writes all lines that start with an even number followed by a space to the file even:

sed -n 's/^[0-9]*[02468] /&/w even' <file
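
To try the w flag without cluttering the current directory, the sketch below writes the file into a temporary directory. The directory, the sample lines and the variable names are assumptions made for this example.

```shell
# Work in a throwaway directory so no file is left behind.
tmpdir=$(mktemp -d)

# Lines starting with an even number followed by a space are written to "even".
# Because of -n, nothing is printed to standard output.
printf '12 apples\n7 pears\n30 plums\n' |
    sed -n "s/^[0-9]*[02468] /&/w $tmpdir/even"

even_lines=$(cat "$tmpdir/even")
printf '%s\n' "$even_lines"
# prints:
# 12 apples
# 30 plums

rm -r "$tmpdir"
```

Everything after the w up to the end of the command is taken as the file name, so the w flag has to come last.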

So far, I have only used one substitute command. If you need to make two changes, and you don't want to read the manual, you could pipe together multiple sed commands:

sed 's/BEGIN/begin/' <old | sed 's/END/end/' >new

Multiple commands with the -e option

One method of combining multiple commands is to use a -e before each command:

sed -e 's/a/A/' -e 's/b/B/' <old >new
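
The pipeline and the -e form give the same result, as the sketch below shows with a made-up input line. The last command illustrates that the commands run in order on each line, so a later command sees the output of an earlier one.

```shell
# Two sed processes joined by a pipe.
piped=$(echo 'BEGIN middle END' | sed 's/BEGIN/begin/' | sed 's/END/end/')

# One sed process with two commands.
combined=$(echo 'BEGIN middle END' | sed -e 's/BEGIN/begin/' -e 's/END/end/')

# Order matters: the second command rewrites what the first one produced.
chained=$(echo 'a' | sed -e 's/a/b/' -e 's/b/c/')

printf '%s\n' "$piped" "$combined" "$chained"
# prints:
# begin middle end
# begin middle end
# c
```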

sed -f scriptname

If you have a large number of sed commands, you can put them into a file and use

sed -f sedscript <old >new

where sedscript could look like this:

# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
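
The script file can be tried end to end as follows. This is only a sketch: it writes the script to a temporary file instead of a file named sedscript, and the sample sentence and variable names are invented for the illustration.

```shell
# Write the sed script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
EOF

# Run the script against one line of input.
shouted=$(echo 'sed is a stream editor' | sed -f "$script")
printf '%s\n' "$shouted"
# prints: sEd Is A strEAm EdItOr

rm "$script"
```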

sed in shell script

If you have many commands and they won't fit neatly on one line, you can break up the line using a backslash:

sed -e 's/a/A/g' \
-e 's/e/E/g' \
-e 's/i/I/g' \
-e 's/o/O/g' \
-e 's/u/U/g' <old >new

