In this part we cover the following topics
sed
The name sed stands for stream editor. Sometimes you may think of it as a substitution editor, because its most typical use is text substitution. Developed in 1973–74 by Lee E. McMahon of Bell Labs, in the early years of Unix, it is one of the indispensable tools we can't live without. According to the general Unix philosophy [Unix philosophy]
- Write programs that do one thing and do it well.
- Write programs to work together.
- Write programs to handle text streams, because that is a universal interface.
it is a simple utility that allows us to make text transformations. Simplicity is the source of its power, and today it runs on all major operating systems.
To a novice user, the syntax of sed may look cryptic, which is true of many command-line tools. However, once we get familiar with it, we can solve many complex tasks with just a few lines of code. sed is a line-oriented stream editor that performs editing operations on text coming from standard input or a file. It edits line by line and in a non-interactive way, which means that we make all of the editing decisions when we call the editor. This may seem confusing or unintuitive to those who were raised in the WYSIWYG culture, but it is a very powerful and fast way to transform text.
sed reads text, line by line, from a source (standard input or a file) into an internal buffer called the pattern space. Each line read starts a new cycle. To the pattern space, sed applies one or more operations specified on the command line or in a sed script.
s, the substitute command, is what sed is best known for. In its simplest form it looks like this
    sed s/[PATTERN]/[REPLACEMENT]/ [INPUT_FILE]
which may be used as shown below
    MBAPF:textdataprocessing fulmanp$ echo foo bar > text.txt
    MBAPF:textdataprocessing fulmanp$ cat text.txt
    foo bar
    MBAPF:textdataprocessing fulmanp$ echo foo bar | sed s/foo/FOO/
    FOO bar
    MBAPF:textdataprocessing fulmanp$ sed s/foo/FOO/ text.txt
    FOO bar
    MBAPF:textdataprocessing fulmanp$ sed s/foo/FOO/ < text.txt
    FOO bar
    MBAPF:textdataprocessing fulmanp$ sed s/foo/FOO/ < text.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    FOO bar
Remarks:
- Only the first occurrence of foo on a line is changed. sed changes exactly what we tell it to.
    MBAPF:textdataprocessing fulmanp$ sed s/oo/OO/ < text.txt
    fOO bar
- To avoid any problems in the future, it is recommended to use quotes even in this case.
    MBAPF:textdataprocessing fulmanp$ sed s/foo/FOO/ < text.txt
    FOO bar
    MBAPF:textdataprocessing fulmanp$ sed 's/foo/FOO/' < text.txt
    FOO bar
- The slash / is used by default as a delimiter, which may lead to very confusing statements
    MBAPF:textdataprocessing fulmanp$ echo /path/to/some/location | sed 's/some\/location/other\/location\/possible\/to\/use/'
    /path/to/other/location/possible/to/use
Notice that without quotes we will get an error
    MBAPF:textdataprocessing fulmanp$ echo /path/to/some/location | sed s/some\/location/other\/location\/possible\/to\/use/
    sed: 1: "s/some/location/other/l ...": bad flag in substitute command: 'o'
We can use any other character, such as an underscore _
    MBAPF:textdataprocessing fulmanp$ echo /path/to/some/location | sed 's_some/location_other/location/possible/to/use_'
    /path/to/other/location/possible/to/use
or a colon
    MBAPF:textdataprocessing fulmanp$ echo /path/to/some/location | sed 's:some/location:other/location/possible/to/use:'
    /path/to/other/location/possible/to/use
As long as it's not in the string we are looking for, anything goes, even letters
    MBAPF:textdataprocessing fulmanp$ echo abcd | sed 'szbzsz'
    ascd
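The first remark is easier to see on a line that contains several matches. The sketch below uses a made-up sample line and compares the plain substitution with the g flag, which appears later in this text and replaces every match on the line.

```shell
# Hypothetical sample line with three matches of "foo".
line='foo foo foo'

# Without a flag, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/foo/FOO/')

# With the g flag, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/foo/FOO/g')

printf '%s\n' "$first_only"
printf '%s\n' "$all_matches"
```

The first command should print FOO foo foo and the second FOO FOO FOO.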
We can refer to the text we have found in the replacement string using the ampersand character &
    MBAPF:textdataprocessing fulmanp$ echo 'foo bar foo' | sed 's/foo [a-z]* foo/& &/'
    foo bar foo foo bar foo
Because the pattern [a-z]* matches zero or more letters, it also matches the empty string at the very beginning of the line, which may lead to some problems
    MBAPF:textdataprocessing fulmanp$ echo '1 a' | sed 's/[a-z]*/b/'
    b1 a
so it is better to use a pattern that is sure to match a nonempty string
    MBAPF:textdataprocessing fulmanp$ echo '1 a' | sed 's/[a-z][a-z]*/b/'
    1 b
& is very useful if we want to reuse the matched text in the replacement but we don't know exactly what we will find
    MBAPF:textdataprocessing fulmanp$ echo 'foo 123
    > bar 345' | sed 's/[0-9][0-9]*/& &/'
    foo 123 123
    bar 345 345
Sometimes, instead of referring to the whole match, we need to refer to a part of it. In such a case \1 is the first remembered pattern, \2 is the second remembered pattern, and so on. There can be up to nine of them: from \1 to \9.
For example, if we want to keep the second word of a line, and delete all others, we can try this
    MBAPF:textdataprocessing fulmanp$ echo 'ab cd ef gh' | sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\).*/\2/'
    cd
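Remembered patterns can also be reordered in the replacement. As a small sketch with a made-up input line, the following swaps the first two words by writing \2 before \1.

```shell
# Each \( ... \) group remembers one word; the replacement
# emits the groups in reverse order.
swapped=$(echo 'hello world' | sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/')

printf '%s\n' "$swapped"
```

This should print world hello.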
A nice example of using remembered patterns is the detection of duplicated words.
    sed -n '/\([a-z][a-z]*\) \1/p'
This command will print lines with duplicated words
    MBAPF:textdataprocessing fulmanp$ echo 'ab cd
    > cd cd ef
    > cd ef
    > cd ef ef' | sed -n '/\([a-z][a-z]*\) \1/p'
    cd cd ef
    cd ef ef
If you are curious what n and p are responsible for, please keep reading the next subsection.
-n option and p flag
By default, sed prints every line. If it makes a substitution, the new (substituted) text is printed instead of the old one. When the -n option is used, sed will not, by default, print any lines. The p flag will cause the modified line to be printed.
    MBAPF:textdataprocessing fulmanp$ echo 'ab cd
    > cd ef
    > ef gh
    > gh ij' | sed -n 's/ef/&/p'
    cd ef
    ef gh
Here is a sed replacement for the grep command
    sed -n 's/[PATTERN]/&/p' [FILE]
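To check the claim about grep, the sketch below runs both tools on the same made-up input and compares the results. It assumes a grep that is available on the PATH and a pattern that has the same meaning in both tools.

```shell
# Sample input, one record per line.
input=$(printf 'ab cd\ncd ef\nef gh\ngh ij')

# -n suppresses the default output; the p flag prints only the
# lines on which the substitution succeeded.
with_sed=$(printf '%s\n' "$input" | sed -n 's/ef/&/p')

# grep prints the lines that match the same pattern.
with_grep=$(printf '%s\n' "$input" | grep 'ef')

printf '%s\n' "$with_sed"
```

Both variables should hold the same two lines, cd ef and ef gh.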
Notice that there is also a p command, which also prints data. The p command used alone
    sed 'p'
will duplicate every line
    MBAPF:textdataprocessing fulmanp$ echo 'ab cd
    > ef gh' | sed 'p'
    ab cd
    ab cd
    ef gh
    ef gh
For example, if we want to double every empty line, we can use the following pattern
    sed '/^$/ p'
as shown below
    MBAPF:textdataprocessing fulmanp$ echo 'ab cd
    > 
    > ef gh
    > 
    > ij kl' | sed '/^$/ p'
    ab cd
    
    
    ef gh
    
    
    ij kl
With the p command we have a simpler replacement for the grep command
    MBAPF:textdataprocessing fulmanp$ echo 'ab cd
    > cd ef
    > ef gh
    > gh ij' | sed -n '/ef/ p'
    cd ef
    ef gh
-e command
-e stands for --expression and allows us to combine multiple commands in one call.
Without -e we have to use pipes
    sed 's/b/B/' < alphabet01.txt | sed 's/g/G/' > res.txt
as we did in the following example
    MBAPF:textdataprocessing fulmanp$ echo 'ab bc cd de
    > ef fg gh hi' > alphabet01.txt
    MBAPF:textdataprocessing fulmanp$ cat alphabet01.txt
    ab bc cd de
    ef fg gh hi
    MBAPF:textdataprocessing fulmanp$ sed 's/b/B/' < alphabet01.txt | sed 's/g/G/' > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    aB bc cd de
    ef fG gh hi
which looks clumsy. It is much better to write it as
    MBAPF:textdataprocessing fulmanp$ sed -e 's/b/B/' -e 's/g/G/' < alphabet01.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    aB bc cd de
    ef fG gh hi
or
    MBAPF:textdataprocessing fulmanp$ sed -e 's/b/B/; s/g/G/' < alphabet01.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    aB bc cd de
    ef fG gh hi
If we have many commands and they won't fit neatly on one line, we can break up the line using, as usual in Unix commands, a backslash \
    MBAPF:textdataprocessing fulmanp$ sed -e 's/b/B/' \
    > -e 's/g/G/' < alphabet01.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    aB bc cd de
    ef fG gh hi
A third option is to use scripts, which are covered in the next subsection.
Note that commands are executed one after another, in the same order as they are specified
    MBAPF:textdataprocessing fulmanp$ echo 'a b c' | sed -e 's/a/b/g' -e 's/b/B/g'
    B B c
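That the order matters can be seen by swapping the two expressions. This is only a small sketch reusing the input from the example above.

```shell
# Original order: a becomes b first, then every b becomes B.
forward=$(echo 'a b c' | sed -e 's/a/b/g' -e 's/b/B/g')

# Swapped order: the existing b becomes B first, then a becomes b,
# and that new b is no longer touched.
swapped_order=$(echo 'a b c' | sed -e 's/b/B/g' -e 's/a/b/g')

printf '%s\n' "$forward"
printf '%s\n' "$swapped_order"
```

The first result should be B B c and the second b B c.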
-f command
We can put commands into a file
    MBAPF:textdataprocessing fulmanp$ echo 's/b/B/
    > s/g/G/' > sed_script_01.txt
    MBAPF:textdataprocessing fulmanp$ cat sed_script_01.txt
    s/b/B/
    s/g/G/
and use the -f option to call it
    MBAPF:textdataprocessing fulmanp$ sed -f sed_script_01.txt < alphabet01.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    aB bc cd de
    ef fG gh hi
Another way of executing sed as a script is to use an interpreter script. Having an executable file (for example after chmod +x) that contains
    #!/usr/bin/sed -f
    s/b/B/
    s/g/G/
we can call it as
    MBAPF:textdataprocessing fulmanp$ ./sed_script_02.txt < alphabet01.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    aB bc cd de
    ef fG gh hi
If your sed is located somewhere other than /usr/bin/sed, you can find it with the whereis command
    MBAPF:textdataprocessing fulmanp$ whereis sed
    /usr/bin/sed
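The -f workflow can also be tried without leaving files behind. The sketch below is one possible way to do it: it writes the script into a temporary directory created with mktemp (the file name upper.sed is made up for this example) and feeds the sample data through a pipe.

```shell
# Throwaway directory for the script file.
workdir=$(mktemp -d)

# The sed script: one command per line, comment on the first line.
cat > "$workdir/upper.sed" <<'EOF'
# uppercase the first b and the first g on each line
s/b/B/
s/g/G/
EOF

# Run the script file against two sample lines.
result=$(printf 'ab bc cd de\nef fg gh hi\n' | sed -f "$workdir/upper.sed")
printf '%s\n' "$result"

rm -rf "$workdir"
```

The output should match the res.txt shown above: aB bc cd de and ef fG gh hi.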
Why do we use ./[FILENAME] to execute a file in Unix? Why not just enter it like other commands: gcc, ls, etc.?
In Unix and related operating systems, the dot . denotes the current directory. Since we want to run a file in our current directory and that directory is not in our $PATH, we need the ./ "prefix" to tell the shell where the executable is. So ./[FILENAME] means: run the executable called FILENAME that is in this (current) directory.
This answer raises another question: why is ./ not in our $PATH?
It's for security reasons. Imagine that we are looking in someone else's home directory and type just gcc or, even simpler and more natural, ls. We want to be sure that we are running the real one, not a malicious version left there by someone, which may erase all our files.
Having . as the last entry in our $PATH is a little bit safer, but there are other attacks which make use of that. An easy one is to exploit common typos, like sl or ls-l, which I personally type several times every day. Another is to find a common command that happens not to be installed on this system, for example vim.
You may ask: why do we need ./ at the start instead of simply .?
There is an answer to this question as well. / is the path separator in Unix, so we use it to separate the (current) directory . from the following [FILENAME]. Without it we have .[FILENAME], which is a valid hidden file name in its own right.
Of course we can also call sed from a shell script. Having a file that contains
    #!/bin/sh
    sed -e 's/b/B/' \
    -e 's/g/G/'
we execute it as
    MBAPF:textdataprocessing fulmanp$ ./sed_script_03.txt < alphabet01.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    aB bc cd de
    ef fG gh hi
w flag and w command
With this flag we can specify a file to which the data will be saved. It does not sound unusual; after all, we can save data simply by redirecting the output with the > character. The interesting thing is that we can have ten files open with one instance of sed. This allows us to split up a stream of data into separate files. We will show this in Working with multiple files substitution.
Besides the w flag, there is also the w command. Both work the same way and differ slightly in syntax. Here is an example that writes only the lines that start with #
    MBAPF:textdataprocessing fulmanp$ echo '# comment 1
    > no comment 1
    > # comment 2
    > no comment 2
    > # comment 3
    > no comment 3
    > # comment 4
    > no comment 4' > comments.txt
    MBAPF:textdataprocessing fulmanp$ cat comments.txt
    # comment 1
    no comment 1
    # comment 2
    no comment 2
    # comment 3
    no comment 3
    # comment 4
    no comment 4
- as a flag
    MBAPF:textdataprocessing fulmanp$ sed -n 's/^#/&/w res.txt' < comments.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    # comment 1
    # comment 2
    # comment 3
    # comment 4
- as a command
    MBAPF:textdataprocessing fulmanp$ sed -n '/^#/ w res.txt' < comments.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    # comment 1
    # comment 2
    # comment 3
    # comment 4
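Splitting a stream into separate files, as mentioned above, might look like the following sketch. The output file names are made up for this example, the files are created in a temporary directory, and the ! character (described in the part about restrictions) negates the address.

```shell
# Work in a throwaway directory so no files are left behind.
workdir=$(mktemp -d)

# One pass over the input: lines starting with # go to one file,
# all other lines (address negated with !) go to another file.
printf '# comment 1\nno comment 1\n# comment 2\nno comment 2\n' |
  (cd "$workdir" && sed -n -e '/^#/ w comments_only.txt' -e '/^#/! w rest.txt')

comments_only=$(cat "$workdir/comments_only.txt")
rest=$(cat "$workdir/rest.txt")
printf '%s\n' "$comments_only"
printf '%s\n' "$rest"

rm -rf "$workdir"
```

Each w command is given in its own -e expression because the file name extends to the end of the expression.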
r command
If there is a command for writing, there should be a command for reading. And here it is. The following will insert the contents of the hlines.txt file
    MBAPF:textdataprocessing fulmanp$ cat hlines.txt
    ==================
after each line starting with the # character (/^#/ is a pattern restriction; you can read about it in Restrictions)
    MBAPF:textdataprocessing fulmanp$ sed '/^#/ r hlines.txt' < comments.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    # comment 1
    ==================
    no comment 1
    # comment 2
    ==================
    no comment 2
    # comment 3
    ==================
    no comment 3
    # comment 4
    ==================
    no comment 4
d command
This command
- deletes the current pattern space,
- reads in the next line,
- puts the new line into the pattern space,
- aborts the current command,
- and starts execution at the first sed command. This is called starting a new cycle.
To delete every line starting with the # character (/^#/ is a pattern restriction; you can read about it in Restrictions)
    MBAPF:textdataprocessing fulmanp$ sed '/^#/ d' < comments.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    no comment 1
    no comment 2
    no comment 3
    no comment 4
n command
This command prints the pattern space, then replaces it with the next line of input and continues with the next command.
    MBAPF:textdataprocessing fulmanp$ echo $'1\n2\n3' | sed 's/./A/; s/./B/'
    B
    B
    B
    MBAPF:textdataprocessing fulmanp$ echo $'1\n2\n3' | sed 's/./A/; n; s/./B/'
    A
    B
    A
That is why we can say that n excludes the commands that precede it from being applied to the line that was just pulled in.
= command
The = command prints the current line number to standard output.
    MBAPF:textdataprocessing fulmanp$ sed '/^#/ =' < comments.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    1
    # comment 1
    no comment 1
    3
    # comment 2
    no comment 2
    5
    # comment 3
    no comment 3
    7
    # comment 4
    no comment 4
    MBAPF:textdataprocessing fulmanp$ sed -n '$=' comments.txt
    8
Since the = command only prints to standard output, we cannot print the line number on the same line as the pattern. We need to edit multi-line patterns to do this; more in the Working with multiple lines section.
I flag
With this flag sed is case insensitive.
q command
The q command quits sed. This command is most useful when we want to abort the editing after some condition is reached. For example, we can use it to print the first 2 lines of input (the number of displayed lines is given right before the q command; you can read about it in Restrictions)
    MBAPF:textdataprocessing fulmanp$ echo 'ab
    > cd
    > ef
    > gh' | sed 2q
    ab
    cd
comments
Comments are lines where the first non-white character is a #. Be aware that on some systems, sed can have only one comment, and it must be the first line of the script.
{[COMMANDS]}
We can group a set of commands to trigger them by a single restriction match (see also Restrictions).
    MBAPF:textdataprocessing fulmanp$ echo '1
    > 2
    > 3
    > 4' | sed -n '
    > 2 {
    > s/2/*/
    > p
    > }'
    *
The above example also works in a one-line version
    MBAPF:textdataprocessing fulmanp$ echo '1
    > 2
    > 3
    > 4' | sed -n '2{s/2/*/;p;}'
    *
If the last semicolon is omitted, an error (on macOS) is reported
    MBAPF:textdataprocessing fulmanp$ echo '1
    > 2
    > 3
    > 4' | sed -n '2{s/2/*/;p}'
    sed: 1: "2{s/2/*/;p}": extra characters at the end of p command
Sometimes we don't want to execute sed on every line. For some reason we want to restrict its operational range to a subset of lines, not all of them as it is by default.
- Restrict by line number
The simplest restriction applies to a single line number. If we want to restrict a substitution to line 2, we should add a 2 before the command
    MBAPF:textdataprocessing fulmanp$ echo 'a1 b1 c2
    > a2 b2 c2
    > a3 b3 c3
    > a4 b4 c4
    > a5 b5 c5
    > a6 b6 c6' > abc_6_lines.txt
    MBAPF:textdataprocessing fulmanp$ cat abc_6_lines.txt
    a1 b1 c2
    a2 b2 c2
    a3 b3 c3
    a4 b4 c4
    a5 b5 c5
    a6 b6 c6
    MBAPF:textdataprocessing fulmanp$ sed '2 s/a/A/' < abc_6_lines.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    a1 b1 c2
    A2 b2 c2
    a3 b3 c3
    a4 b4 c4
    a5 b5 c5
    a6 b6 c6
We can also specify a range of line numbers by inserting a comma between the numbers. To restrict a substitution to the first 3 lines, we can use
    MBAPF:textdataprocessing fulmanp$ sed '1,3 s/a/A/' < abc_6_lines.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    A1 b1 c2
    A2 b2 c2
    A3 b3 c3
    a4 b4 c4
    a5 b5 c5
    a6 b6 c6
Using the special character $, which means the last line in the file, we can restrict a substitution to the range from the 4th line to the end of the file
    MBAPF:textdataprocessing fulmanp$ sed '4,$ s/a/A/' < abc_6_lines.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    a1 b1 c2
    a2 b2 c2
    a3 b3 c3
    A4 b4 c4
    A5 b5 c5
    A6 b6 c6
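Line-number restrictions combine naturally with the -n option and the p command to cut a slice out of the input. The following is a small sketch on made-up six-line input.

```shell
# Six sample lines.
input=$(printf 'a1\na2\na3\na4\na5\na6')

# Print only lines 2 to 4.
middle=$(printf '%s\n' "$input" | sed -n '2,4p')

# Print from line 5 to the last line ($).
tail_part=$(printf '%s\n' "$input" | sed -n '5,$p')

# Print only the last line.
last=$(printf '%s\n' "$input" | sed -n '$p')

printf '%s\n' "$middle" "$tail_part" "$last"
```

Single quotes around the sed script keep the shell from interpreting the $ character.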
Note that line numbers are cumulative if several files are used.
    MBAPF:textdataprocessing fulmanp$ cat abc_3_lines.txt
    a7 b7 c7
    a8 b8 c8
    a9 b9 c9
    MBAPF:textdataprocessing fulmanp$ sed '6,7 s/a/A/' abc_6_lines.txt abc_3_lines.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    a1 b1 c2
    a2 b2 c2
    a3 b3 c3
    a4 b4 c4
    a5 b5 c5
    A6 b6 c6
    A7 b7 c7
    a8 b8 c8
    a9 b9 c9
With GNU sed we can also start at some line (6) and then operate on the next few lines (1)
    MBAPF:textdataprocessing fulmanp$ sed '6,+1 s/a/A/' abc_6_lines.txt abc_3_lines.txt
On macOS we get an error
    MBAPF:textdataprocessing fulmanp$ sed '6,+1s/a/A/' abc_6_lines.txt abc_3_lines.txt
    sed: 1: "6,+1s/a/A/": expected context address
- Patterns
We can restrict sed activity by a regular expression
    MBAPF:textdataprocessing fulmanp$ sed '/a[1,3,5]/ s/b/B/' < abc_6_lines.txt
    a1 B1 c2
    a2 b2 c2
    a3 B3 c3
    a4 b4 c4
    a5 B5 c5
    a6 b6 c6
As with substitution, we can specify the pattern delimiter. If the expression starts with a backslash \, the next character is the delimiter
    MBAPF:textdataprocessing fulmanp$ sed '\_a[1,3,5]_ s/b/B/' < abc_6_lines.txt
    a1 B1 c2
    a2 b2 c2
    a3 B3 c3
    a4 b4 c4
    a5 B5 c5
    a6 b6 c6
- Ranges by patterns
We can specify two regular expressions as the range
    MBAPF:textdataprocessing fulmanp$ sed '/b3/,/b5/ s/[a-z]//g' < abc_6_lines.txt
    a1 b1 c2
    a2 b2 c2
    3 3 3
    4 4 4
    5 5 5
    a6 b6 c6
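A common use of a pattern range is cutting a block out of a larger text. The sketch below assumes made-up BEGIN and END marker lines and prints everything from the first marker to the second, markers included.

```shell
# Sample text with a marked block in the middle.
input=$(printf 'header\nBEGIN\nbody 1\nbody 2\nEND\nfooter')

# The range starts at the line matching ^BEGIN$ and ends at the
# next line matching ^END$; -n and p print only that range.
block=$(printf '%s\n' "$input" | sed -n '/^BEGIN$/,/^END$/p')

printf '%s\n' "$block"
```

This should print the four lines BEGIN, body 1, body 2 and END.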
We can combine line numbers and regular expressions
    MBAPF:textdataprocessing fulmanp$ sed '1,/b5/ s/[a-z]//g' < abc_6_lines.txt
    1 1 2
    2 2 2
    3 3 3
    4 4 4
    5 5 5
    a6 b6 c6
- Reversing the restriction
Sometimes we want to perform an action on every line except those that match a regular expression, or those outside of a range of addresses. In such a case we can use the ! character to invert the address restriction
    MBAPF:textdataprocessing fulmanp$ sed -n '1,3 s/a/A/ p' < abc_6_lines.txt
    A1 b1 c2
    A2 b2 c2
    A3 b3 c3
    MBAPF:textdataprocessing fulmanp$ sed -n '1,3! s/a/A/ p' < abc_6_lines.txt
    A4 b4 c4
    A5 b5 c5
    A6 b6 c6
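The inverted restriction works with any command, not only with substitution. As a sketch on made-up input, the following keeps only the comment lines by deleting every line that does not start with #, and then does the opposite with p.

```shell
input=$(printf '# comment 1\nno comment 1\n# comment 2\nno comment 2')

# Delete every line that does NOT match /^#/.
only_comments=$(printf '%s\n' "$input" | sed '/^#/!d')

# Print every line that does NOT match /^#/.
no_comments=$(printf '%s\n' "$input" | sed -n '/^#/!p')

printf '%s\n' "$only_comments"
printf '%s\n' "$no_comments"
```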
a
append (add) a line
    MBAPF:textdataprocessing fulmanp$ sed '
    > /^#/ a\
    > ### ### ###
    > ' < comments.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    # comment 1
    ### ### ###
    no comment 1
    # comment 2
    ### ### ###
    no comment 2
    # comment 3
    ### ### ###
    no comment 3
    # comment 4
    ### ### ###
    no comment 4
i
insert a line
    MBAPF:textdataprocessing fulmanp$ sed '
    > /^#/ i\
    > ### ### ###
    > ' < comments.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    ### ### ###
    # comment 1
    no comment 1
    ### ### ###
    # comment 2
    no comment 2
    ### ### ###
    # comment 3
    no comment 3
    ### ### ###
    # comment 4
    no comment 4
c
change a line
    MBAPF:textdataprocessing fulmanp$ sed '
    > /^#/ c\
    > ### ### ###
    > ' < comments.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    ### ### ###
    no comment 1
    ### ### ###
    no comment 2
    ### ### ###
    no comment 3
    ### ### ###
    no comment 4
All three commands allow us to add more than one line. It is enough to end each line with a \
    MBAPF:textdataprocessing fulmanp$ sed '
    > /^#/ c\
    > ###
    > 
    > ###
    > ' < comments.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    ###
    no comment 1
    ###
    no comment 2
    ###
    no comment 3
    ###
    no comment 4
    MBAPF:textdataprocessing fulmanp$ sed '
    > /^#/ c\
    > ###\
    > \
    > ###\
    > ' < comments.txt > res.txt
    MBAPF:textdataprocessing fulmanp$ cat res.txt
    ###
    
    ###
    no comment 1
    ###
    
    ###
    no comment 2
    ###
    
    ###
    no comment 3
    ###
    
    ###
    no comment 4
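The same commands can be tied to line-number restrictions. As a sketch with made-up HEADER and FOOTER lines, the following inserts a line before the first input line and appends one after the last, using the backslash-newline form shown above.

```shell
# Insert a line before line 1. The text starts on the line
# after the backslash.
with_header=$(printf 'x\ny\n' | sed '1i\
HEADER')

# Append a line after the last line ($).
with_footer=$(printf 'x\ny\n' | sed '$a\
FOOTER')

printf '%s\n' "$with_header"
printf '%s\n' "$with_footer"
```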
Working with multiple lines may seem a little bit foggy, so please make sure that you understand everything that has been written so far. We start this topic by explaining three new commands used in multiple-line patterns: N, D, and P. We will do this in relation to the well-known single-line commands n, d, and p.
N command
As we know, the n command
- prints out the current pattern space,
- empties the current pattern space,
- and reads in the next line of input.
In contrast, the N command
- does not print out the current pattern space,
- does not empty the pattern space,
- appends a newline character to the pattern space,
- reads in the next line and appends it to the pattern space.
Compare the following examples
    MBAPF:textdataprocessing fulmanp$ echo $'1\n2\n3' | sed 's/./A/; s/./B/;'
    B
    B
    B
    MBAPF:textdataprocessing fulmanp$ echo $'1\n2\n3' | sed 's/./A/; n; s/./B/;'
    A
    B
    A
    MBAPF:textdataprocessing fulmanp$ echo $'1\n2\n3' | sed 's/./A/; N; s/./B/;'
    B
    2
    MBAPF:textdataprocessing fulmanp$ echo $'1\n2\n3\n4' | sed 's/./A/; N; s/./B/;'
    B
    2
    B
    4
If we want to match three particular lines we can do this as follows
    MBAPF:textdataprocessing fulmanp$ echo 'a 1 a
    > b 2 b
    > c 3 c
    > d 4 d
    > e 5 e' | sed '
    > /2/ {
    > N
    > /3/ {
    > N
    > /4/ {
    > N
    > s/2.*3.*4/join: 2+3+4/
    > }
    > }
    > }
    > '
    a 1 a
    b join: 2+3+4 d
    e 5 e
P command
As we know, the p command
- simply prints the entire pattern space;
- doesn't change the pattern space.
In contrast, the P command
- prints only the first part of the pattern space, up to the newline character;
- doesn't change the pattern space.
    MBAPF:textdataprocessing fulmanp$ echo '1 2 3' | sed -n 's/ /\
    > /g; P;'
    1
Notice that if we want to match a newline in a pattern, we have to write literally \n, while if we want to output a newline in the replacement, we have to insert a literal newline character (this is the behaviour of the macOS sed used here; GNU sed also accepts \n in the replacement). This explains why we cannot write
    echo '1 2 3' | sed -n 's/ /\n/g; P;'
Don't forget the escape character \ before the newline, otherwise we get an error message
    MBAPF:textdataprocessing fulmanp$ echo '1 2 3' | sed -n 'p; s/ /
    > /g; D;'
    sed: 1: "p; s/ /
    /g; D;": unescaped newline inside substitute pattern
D command
As we know, the d command
- deletes the current pattern space,
- reads in the next line,
- puts the new line into the pattern space,
- aborts the current command,
- and starts execution at the first sed command.
In contrast, the D command
- deletes the first portion of the pattern space, up to the newline character, leaving the rest of the pattern space untouched,
- stops the current command,
- does not print the current pattern space,
- and starts the command cycle over again.
    MBAPF:textdataprocessing fulmanp$ echo '1 2 3' | sed -n 'p; s/ /\
    > /g; D;'
    1 2 3
    2
    3
    3
All three commands used together allow us to remove the last 3 lines from a file
    MBAPF:textdataprocessing fulmanp$ echo '1
    > 2
    > 3
    > 4
    > 5
    > 6' | sed '1 {N;N;}
    > N
    > P
    > D'
    1
    2
    3
Another example is to print each line preceded by its line number
    MBAPF:textdataprocessing fulmanp$ cat abc_6_lines.txt
    a1 b1 c2
    a2 b2 c2
    a3 b3 c3
    a4 b4 c4
    a5 b5 c5
    a6 b6 c6
    MBAPF:textdataprocessing fulmanp$ sed '/.*/ =' < abc_6_lines.txt | sed 'N;s/\n/: /'
    1: a1 b1 c2
    2: a2 b2 c2
    3: a3 b3 c3
    4: a4 b4 c4
    5: a5 b5 c5
    6: a6 b6 c6
or display input data in two columns
    MBAPF:textdataprocessing fulmanp$ echo '1
    > 2
    > 3
    > 4
    > 5' | sed -En '
    > $!N
    > s/(.*)\n/\1          \
    > /
    > s/(.{10}).*\n/\1/
    > p'
    1         2
    3         4
    5
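The core of the last two examples is the same idea: pull the next line into the pattern space with N and then rewrite the embedded newline. A minimal sketch of just that step, on made-up input, joins each pair of lines with a comma.

```shell
# $!N appends the next line on every line except the last one,
# so an odd final line is simply printed as it is.
pairs=$(printf '1\n2\n3\n4\n5\n' | sed '$!N;s/\n/,/')

printf '%s\n' "$pairs"
```

This should print 1,2 then 3,4 and finally 5 on its own line.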
Besides the pattern space, the buffer we have talked about so far, whose contents can be modified and sent to the output stream, there is one more buffer: the hold buffer or hold space. This buffer can be used to keep a copy of the data from the pattern space for future use. There are five commands we can use while working with the hold buffer.
x command
This command exchanges the pattern space with the hold buffer.
h command
The h command copies the pattern buffer into the hold buffer. The pattern buffer is left unchanged.
H command
The H command allows us to combine several lines in the hold buffer. It works like the N command: lines preceded by \n are appended to the buffer. This way we can save several lines in the hold buffer, and print them only if a particular pattern is found later.
g command
Works like the h command but in the reverse direction: from the hold space to the pattern space.
G command
Works like the H command but in the reverse direction: from the hold space to the pattern space.
    MBAPF:textdataprocessing fulmanp$ echo '1 2
    > 2 3
    > 3 4
    > 4 5' | sed -n '
    > h
    > s/3//
    > :print
    > p'
    1 2
    2 
     4
    4 5
    MBAPF:textdataprocessing fulmanp$ echo '1 2
    > 2 3
    > 3 4
    > 4 5' | sed -n '
    > h
    > s/3//
    > g
    > :print
    > p'
    1 2
    2 3
    3 4
    4 5
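A classic illustration of G and h working together is printing the input in reverse line order. The sketch below uses made-up three-line input; it accumulates the lines in the hold space and prints the result only at the last line.

```shell
# 1!G  on every line but the first, append the hold space
#      (all previous lines, already reversed) to the pattern space
# h    copy the result back to the hold space
# $p   on the last line, print the accumulated text
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

This should print c, b and a, each on its own line.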
To control the flow of execution, sed provides looping and branching commands.
- loops
A loop in sed works similarly to a classic goto statement. We can jump to the line marked by a label and continue executing the remaining commands. A label is a name placed after a colon
    :loop
    :start
    :end
To jump to a specific label, we use the b command followed by the label name. If the label name is omitted, the jump is to the end of the script.
    MBAPF:textdataprocessing fulmanp$ echo '1 2
    > 2 3
    > 3 4
    > 4 5' | sed -n '
    > /3/!b print
    > s/^/# /
    > :print
    > p'
    1 2
    # 2 3
    # 3 4
    4 5
- branches
A branch can be created using the t command. With this command sed jumps to the label only if the previous substitute command was successful.
To complete a task similar to the one from the loops part we can write (unfortunately we still have to use the b command, which makes this version rather pointless, because the b command alone is enough to complete the task)
    MBAPF:textdataprocessing fulmanp$ echo '1 2
    > 2 3
    > 3 4
    > 4 5' | sed -n '
    > h
    > s/3//
    > t comment
    > b print
    > :comment
    > g
    > s/^/# /
    > :print
    > p'
    1 2
    # 2 3
    # 3 4
    4 5
Another example may be more useful
    MBAPF:textdataprocessing fulmanp$ echo '1
    > 1 2
    > 1 2 3
    > 1 2 3 4' | sed -E '
    > :start
    > s/^(.{1,8})$/\1*/
    > t start'
    1********
    1 2******
    1 2 3****
    1 2 3 4**
Here the -E command line option is used to enable extended regular expressions; on systems other than macOS the -r option may be needed instead.
Sometimes we may see lines of code as strange as these
    ... some code ...
    t label
    :label
    ... some code ...
At first sight, the branch here seems useless, since it jumps to the instruction that would have been executed anyway. However, if we read the definition of the t command carefully, we will see that it branches only if there was a substitution since the start of the current cycle or since the previous t command was executed. In other words, the t instruction has the side effect of clearing the substitution flag. This is exactly the purpose of the code fragment above: it is a trick to avoid false positives when using several substitution commands.
    MBAPF:textdataprocessing fulmanp$ echo '100 abc
    > 101 def
    > 102 ghi
    > 103 jkl' | sed -n '
    > h
    > s/\(^[0-9]*[02468] \)/\1/
    > t even
    > s/\(^[0-9]*[13579] \)/\1/
    > t odd
    > b
    > :even
    > g
    > w res_even.txt
    > b
    > :odd
    > g
    > w res_odd.txt
    > '
    MBAPF:textdataprocessing fulmanp$ cat res_even.txt
    100 abc
    102 ghi
    MBAPF:textdataprocessing fulmanp$ cat res_odd.txt
    101 def
    103 jkl
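Another loop built from a label and the t command is the well-known one-liner that inserts thousands separators into a number. It is shown here only as a sketch on a made-up value; the label and the branch are passed as separate -e expressions so that the label name ends cleanly on every sed.

```shell
# Each pass puts a comma before the last group of three digits
# that is preceded by another digit; t a repeats the substitution
# for as long as it keeps succeeding.
grouped=$(echo '1234567' | sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 't a')

printf '%s\n' "$grouped"
```

This should print 1,234,567.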