sed


The name sed stands for stream editor. You may also think of it as a substitution editor, because its most typical use is text substitution. Created in the early days of Unix (during 1973–74 by Lee E. McMahon of Bell Labs), it is one of those indispensable tools we can't live without. In line with the general Unix philosophy [Unix philosophy]

  • Write programs that do one thing and do it well.
  • Write programs to work together.
  • Write programs to handle text streams, because that is a universal interface.

it is a simple utility that allows us to transform text. Simplicity is the source of its power, and today it runs on all major operating systems.
To a novice user, the syntax of sed may look cryptic, as is true of many command-line tools. However, once we get familiar with it, we can solve many complex tasks with just a few lines of code.

sed is a line-oriented stream editor that performs editing operations on text coming from standard input or a file. It edits line by line and non-interactively, which means that we make all of the editing decisions at the moment we call the editor. This may seem confusing or unintuitive for those who were raised in the WYSIWYG culture, but it is a very powerful and fast way to transform text.

sed reads text, line by line, from a source (the standard input stream or a file) into an internal buffer called the pattern space. Each line read starts a new cycle. To the pattern space sed applies one or more operations, which are specified on the command line or in a sed script.


Basic usage

The substitute command, s, is what sed is best known for. In its simplest form it is written s/pattern/replacement/: sed looks for pattern in each line and replaces the first match with replacement.
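
A minimal sketch of the simplest form, assuming made-up sample lines that contain foo (the variable names are only for illustration):

```shell
# Replace the first "foo" on the line with "bar"; the second "foo" stays.
first_only=$(printf 'foo and foo\n' | sed 's/foo/bar/')
printf '%s\n' "$first_only"    # bar and foo

# sed does not care about word boundaries: "foo" inside "seafood" matches too.
inside_word=$(printf 'seafood\n' | sed 's/foo/bar/')
printf '%s\n' "$inside_word"   # seabard
```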

Remarks:

  • Only the first occurrence of foo was changed.
  • sed changes exactly what we tell it to, even when the match is only part of a longer word.
  • To avoid problems in the future, it is recommended to quote the command even in this simple case.
  • The slash / is used as the delimiter by default, which may lead to very confusing statements when the pattern or the replacement contains slashes

    Notice that without quotes we will get an error

    We can use any other character, like an underscore _

    or a colon :

    As long as it's not in the string we are looking for, anything goes, even letters
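
The delimiter remarks can be sketched as follows, assuming we want to turn the made-up path /usr/local/bin into /common/bin:

```shell
path='/usr/local/bin'

# Default delimiter: every slash in the pattern and replacement must be escaped.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local\/bin/\/common\/bin/')

# An underscore or a colon as the delimiter keeps the command readable.
with_underscore=$(printf '%s\n' "$path" | sed 's_/usr/local/bin_/common/bin_')
with_colon=$(printf '%s\n' "$path" | sed 's:/usr/local/bin:/common/bin:')

printf '%s\n' "$with_slash" "$with_underscore" "$with_colon"
```

All three commands print /common/bin.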

We can refer to the text we have found in the replacement string using the ampersand character &

Because the pattern [a-z]* matches zero or more letters, it also matches the empty string, which may lead to some surprises

so it is better to use a pattern that is guaranteed to match a nonempty string

& is very useful when we want to reuse the matched text in the replacement but don't know in advance exactly what we will find
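
A small sketch of the ampersand and of the empty-match pitfall, assuming the sample line "123 abc":

```shell
# [a-z]* matches the empty string at the very start of the line,
# so the parentheses end up around nothing.
empty_match=$(printf '123 abc\n' | sed 's/[a-z]*/(&)/')
printf '%s\n' "$empty_match"   # ()123 abc

# [a-z][a-z]* needs at least one letter, so & holds the word we meant.
nonempty_match=$(printf '123 abc\n' | sed 's/[a-z][a-z]*/(&)/')
printf '%s\n' "$nonempty_match"   # 123 (abc)
```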

Sometimes, instead of referring to the whole match, we need to refer to only a part of it. We mark such a part by enclosing it in \( and \); in the replacement, \1 is then the first remembered pattern, \2 is the second remembered pattern, and so on. Up to nine patterns can be remembered: from \1 to \9.

For example, if we want to keep the second word of a line, and delete all others, we can try this
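
One possible way to do it, assuming words consist of lowercase letters and are separated by single spaces:

```shell
# Skip the first word, remember the second one in \1, swallow the rest of the line.
second_word=$(printf 'one two three\n' | sed 's/^[a-z]* \([a-z]*\).*/\1/')
printf '%s\n' "$second_word"   # two
```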

A nice example of using remembered patterns is detecting duplicated words.

This command will print lines with duplicated words
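
A sketch of such a command, assuming lowercase words separated by a single space. Note that without word boundaries it can also match inside words (for example "this is" contains "is is"), so treat it as an illustration only:

```shell
# \1 in the address must repeat exactly what \( \) matched just before the space.
duplicates=$(printf 'a quick quick fox\nnothing doubled here\n' |
  sed -n '/\([a-z][a-z]*\) \1/p')
printf '%s\n' "$duplicates"   # a quick quick fox
```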

If you are curious what -n and p are responsible for, please keep reading: they are explained in the next subsection.


Options and flags

  • -n option and p flag
    By default, sed prints every line. If it makes a substitution, the new text (substituted) is printed instead of the old one. When the -n option is used sed will not, by default, print any lines. The p flag will cause the modified line to be printed.

    Here is a sed replacement for the grep command

    Notice that there is also a p command, which prints the pattern space as well. The p command used alone

    will duplicate every line

    For example, if we want to double every empty line, we can use the following pattern

    as shown below

    With the p command we have a simpler replacement for the grep command
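
The behaviour of -n, the p flag and the p command can be sketched like this, assuming a made-up three-line input and the pattern an:

```shell
input='apple
banana
cherry'

# p flag: with -n, only lines on which the substitution succeeded are printed.
flag_grep=$(printf '%s\n' "$input" | sed -n 's/an/&/p')

# p command restricted by a pattern: a simpler grep replacement.
command_grep=$(printf '%s\n' "$input" | sed -n '/an/p')

# p command without -n: every line is printed twice.
doubled=$(printf 'x\ny\n' | sed 'p')

# Double only the empty lines.
doubled_empty=$(printf 'a\n\nb\n' | sed '/^$/p')

printf '%s\n' "$flag_grep" "$command_grep" "$doubled" "$doubled_empty"
```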
  • -e option
    -e stands for --expression and allows us to combine multiple commands in one call.
    Without the -e option we have to use pipes

    as we did in the following example

    which looks clumsy. It is much better to write it as

    or

    If we have many commands and they don't fit neatly on one line, we can break the line using, as usual in Unix commands, a backslash \

    A third option is to use scripts, which are covered next.


    Note that commands are executed one after another in the same order as they are specified
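
A short sketch showing that the order of commands matters, assuming the sample line abc:

```shell
# s/a/b/ runs first (abc -> bbc), then s/b/c/ changes the first b (bbc -> cbc).
forward=$(printf 'abc\n' | sed -e 's/a/b/' -e 's/b/c/')

# The same two commands in a single expression, separated by a semicolon.
single_expr=$(printf 'abc\n' | sed 's/a/b/; s/b/c/')

# Reversed order: abc -> acc -> bcc.
reversed=$(printf 'abc\n' | sed -e 's/b/c/' -e 's/a/b/')

printf '%s\n' "$forward" "$single_expr" "$reversed"
```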

  • -f option
    We can put commands into a file

    and use the -f option to call it
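
A minimal sketch, assuming a throwaway script file created with mktemp (in practice the file would have a meaningful name):

```shell
# Write two substitute commands into a script file, one per line.
script_file=$(mktemp)
cat > "$script_file" <<'EOF'
s/a/A/
s/b/B/
EOF

# Run the commands stored in the file against the input.
from_file=$(printf 'abc\n' | sed -f "$script_file")
printf '%s\n' "$from_file"   # ABc

rm -f "$script_file"
```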

    Another way of executing sed as a script is to use an interpreter script. Having a file that contains

    we can call it as


    If your sed is located somewhere other than /usr/bin/sed, you can find it with the whereis command



    Why do we use ./[FILENAME] to execute a file in Unix? Why not just enter its name, as we do with other commands such as gcc or ls?

    In Unix and related operating systems, dot . denotes the current directory. Since we want to run a file in our current directory and that directory is not in our $PATH, we need the ./ "prefix" to tell the shell where the executable is. So, ./[FILENAME] means: run the executable called FILENAME that is in this (current) directory.

    This answer raises another question: why is ./ not in our $PATH?

    It's for security reasons. Imagine that we are looking in someone else's home directory and type just gcc or, even simpler and more natural, ls. When doing this, we want to be sure that we are running the real command, not a malicious version left there by someone, which may erase all our files.

    Having . as the last entry in our $PATH is a little bit safer, but there are other attacks which make use of that. An easy one is to exploit common typos, like sl or ls-l, which I personally type several times every day. Another is to find a common command that happens not to be installed on the system, for example vim.

    You may ask: Why do we need ./ at the start instead of simply .?

    There is an answer to this question, too. / is the path separator in Unix, so we use it to separate the (current) directory . from the following [FILENAME]. Without it we would have .[FILENAME], which is a valid hidden file name in its own right.

    Why do we use "./" to execute a file?

    Of course we can call sed as a shell script. Having a file that contains

    we execute it as

  • w flag and w command
    With this flag we can specify a file to which the data will be saved. This does not sound unusual; after all, we can save data simply by redirecting the output with the > character. The interesting thing is that we can have ten files open within one instance of sed (ten is the number POSIX guarantees; an implementation may allow more). This allows us to split up a stream of data into separate files. We will show this in the Working with multiple files section.

    Besides the w flag, there is also the w command. Both work the same way and differ only a little in syntax. Here is an example that writes only the lines that start with #

    • as a flag
    • as a command
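
Both variants can be sketched as follows, assuming a temporary directory and made-up file names. The file name after w extends to the end of the command line, so nothing else may follow it:

```shell
work_dir=$(mktemp -d)
printf '# title\nbody\n# note\n' > "$work_dir/in.txt"

# As a flag: the substitution s/^#/&/ changes nothing, but it succeeds
# only on lines starting with #, and those lines are written to the file.
sed -n "s/^#/&/w $work_dir/flag.txt" "$work_dir/in.txt"

# As a command: the pattern restriction /^#/ selects the lines to write.
sed -n "/^#/w $work_dir/cmd.txt" "$work_dir/in.txt"

flag_lines=$(cat "$work_dir/flag.txt")
cmd_lines=$(cat "$work_dir/cmd.txt")
printf '%s\n' "$flag_lines" "$cmd_lines"

rm -rf "$work_dir"
```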
  • r command
    If there is a command for writing, there should be a command for reading. And here it is. The following will insert the contents of the hlines.txt file

    after each line starting with the # character (/^#/ is a pattern restriction -- you can read about it in Restrictions)
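
A sketch of the r command, assuming hlines.txt lives in a temporary directory and contains one made-up line:

```shell
work_dir=$(mktemp -d)
printf 'inserted line\n' > "$work_dir/hlines.txt"

# After every line that starts with #, output the contents of hlines.txt.
with_insert=$(printf '# heading\ntext\n' | sed "/^#/r $work_dir/hlines.txt")
printf '%s\n' "$with_insert"

rm -rf "$work_dir"
```

The output consists of "# heading", "inserted line" and "text", in that order.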
  • d command
    This command

    • deletes the current pattern space,
    • reads in the next line,
    • puts the new line into the pattern space,
    • aborts the current command,
    • and starts execution at the first sed command. This is called starting a new cycle.

    To delete every line starting with the # character (/^#/ is a pattern restriction -- you can read about it in Restrictions)
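
For instance, assuming a made-up input in which comments start with #:

```shell
# Lines starting with # are deleted; all other lines are printed as usual.
no_comments=$(printf '# comment\ncode\n# another\nmore code\n' | sed '/^#/d')
printf '%s\n' "$no_comments"
```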

  • n command
    This command prints the pattern space (unless the -n option is used), then replaces it with the next line of input and continues with the next command.

    That is why we can say that n excludes the commands that precede it from being applied to the line that was just pulled in.
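
A small sketch of this effect, assuming an input of four identical lines:

```shell
# s/a/X/ is applied to lines 1 and 3 only: n prints the changed line and
# pulls in the next one, which is never seen by the substitution before n.
alternating=$(printf 'a\na\na\na\n' | sed 's/a/X/;n')
printf '%s\n' "$alternating"

# n followed by d: print a line, fetch the next one and delete it.
odd_lines=$(printf '1\n2\n3\n4\n' | sed 'n;d')
printf '%s\n' "$odd_lines"
```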

  • = command
    The = command prints the current line number to standard output.

    Since the = command only prints to standard output, we cannot print the line number on the same line as the pattern. We need to edit multi-line patterns to do this -- more in Working with multiple lines section.
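
Two typical uses of =, sketched on made-up input:

```shell
# Print the numbers of the lines that contain foo.
foo_lines=$(printf 'x\nfoo\ny\nfoo\n' | sed -n '/foo/=')
printf '%s\n' "$foo_lines"

# Print the number of the last line, that is, count the lines.
line_count=$(printf 'x\ny\nz\n' | sed -n '$=')
printf '%s\n' "$line_count"   # 3
```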

  • I flag
    With this flag the pattern is matched case-insensitively. It is an extension (available in GNU sed) rather than a part of POSIX sed.
  • q command
    The q command quits sed. It is most useful when we want to stop editing once some condition is reached. For example, we can use it to print the first 2 lines of input (the number of lines to display is placed right before the q command -- you can read about it in Restrictions)
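
A sketch of this usage, assuming a four-line input:

```shell
# Lines 1 and 2 are printed as usual; on line 2 the q command stops sed.
first_two=$(printf '1\n2\n3\n4\n' | sed '2q')
printf '%s\n' "$first_two"
```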

  • comments
    Comments are lines where the first non-whitespace character is a #. Be aware that on some systems sed can have only one comment, and it must be the first line of the script.
  • {[COMMANDS]}
    We can group a set of commands so that they are triggered by a single restriction match (see also Restrictions).

    The above example should also work in a one-line version

    If the last semicolon is omitted, an error is reported (on macOS)
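
Grouping can be sketched as follows, assuming we want to apply two substitutions to lines 2 and 3 only:

```shell
# Multi-line version: one command per line inside the braces.
grouped=$(printf 'ab\nab\nab\n' | sed '2,3{
s/a/A/
s/b/B/
}')

# One-line version: note the semicolon before the closing brace.
grouped_one_line=$(printf 'ab\nab\nab\n' | sed '2,3{s/a/A/;s/b/B/;}')

printf '%s\n' "$grouped" "$grouped_one_line"
```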


Restrictions

Sometimes we don't want to execute a sed command on every line. Instead of processing all lines, which is the default, we want to restrict its operational range to some subset of them.

  • Restrict by line number
    The simplest restriction is a single line number. If we want to restrict a substitution to line 2, we should add a 2 before the command

    We can also specify a range on line numbers by inserting a comma between the numbers. To restrict a substitution to the first 3 lines, we can use

    Using the special character $ which means the last line in the file, we can restrict a substitution to the range from 3rd line to the end of file
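
The three line-number restrictions above can be sketched like this, assuming an input of four identical lines:

```shell
lines='a
a
a
a'

# Only line 2.
only_second=$(printf '%s\n' "$lines" | sed '2s/a/X/')

# Lines 1 to 3.
first_three=$(printf '%s\n' "$lines" | sed '1,3s/a/X/')

# From line 3 to the last line ($).
third_to_end=$(printf '%s\n' "$lines" | sed '3,$s/a/X/')

printf '%s\n' "$only_second" "$first_three" "$third_to_end"
```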

    Note that line numbers are cumulative if several files are used.

    With GNU sed we can also use the form first,+N: start at some line (6) and then operate on it and on the next N lines (1)

    On macOS we have an error
  • Patterns
    We can restrict sed activity by a regular expression

    As with substitution, we can specify pattern delimiter. If the expression starts with a backslash \, the next character is the delimiter
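
Both forms of a pattern restriction, sketched on made-up input:

```shell
conf='# path=/usr/bin
path=/usr/bin'

# The substitution is applied only to lines starting with #.
only_comments=$(printf '%s\n' "$conf" | sed '/^#/s/usr/opt/')
printf '%s\n' "$only_comments"

# \, makes the comma the delimiter of the restriction, so the slashes
# inside the regular expression need no escaping.
by_path=$(printf '/usr/bin/a\n/opt/bin/a\n' | sed '\,^/usr/,s/a$/b/')
printf '%s\n' "$by_path"
```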
  • Ranges by patterns
    We can specify two regular expressions as the range

    We can combine line numbers and regular expressions
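
A sketch of both kinds of ranges, assuming the markers BEGIN and END in a made-up text:

```shell
text='keep
BEGIN
drop
END
keep too'

# Delete everything from the line matching BEGIN to the line matching END.
outside=$(printf '%s\n' "$text" | sed '/BEGIN/,/END/d')
printf '%s\n' "$outside"

# A line number combined with a pattern: from line 1 to the first END.
after_end=$(printf '%s\n' "$text" | sed '1,/END/d')
printf '%s\n' "$after_end"
```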
  • Reversing the restriction
    Sometimes we want to perform an action on every line except those that match a regular expression, or those outside of a range of addresses. In such a case we can use the ! character to invert the address restriction
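
Two short sketches of an inverted restriction, on made-up input:

```shell
# Delete every line that does NOT start with #.
comments_only=$(printf '# note\ncode\n' | sed '/^#/!d')
printf '%s\n' "$comments_only"   # # note

# Substitute everywhere except on lines 2 to 3.
outside_range=$(printf 'a\na\na\na\n' | sed '2,3!s/a/X/')
printf '%s\n' "$outside_range"
```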


Adding, changing, inserting

  • a append (add) a line
  • i insert a line
  • c change a line

All three commands allow us to add more than one line. It's enough to end each line except the last with a \
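
The three commands can be sketched as follows, using the traditional form in which the text starts on the line after a\, i\ or c\ (the input lines are made up):

```shell
# a: add a line after the matching line.
appended=$(printf 'one\ntwo\n' | sed '/one/a\
after one')

# i: insert a line before the matching line.
inserted=$(printf 'one\ntwo\n' | sed '/two/i\
before two')

# c: replace the matching line.
changed=$(printf 'one\ntwo\n' | sed '/two/c\
TWO')

# More than one line: every added line except the last ends with a backslash.
two_added=$(printf 'one\ntwo\n' | sed '/one/a\
first added\
second added')

printf '%s\n' "$appended" "$inserted" "$changed" "$two_added"
```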


Working with multiple lines

Working with multiple lines may seem a little bit foggy, so please make sure that you understand everything that has been written so far. We start this topic by explaining three new commands used in multi-line patterns: N, D, and P. We will do this in relation to the well-known n, d, and p single-line commands.

  • N command
    As we know, the n command

    • prints out the current pattern space,
    • empties the current pattern space,
    • and reads in the next line of input.

    In contrast, the N command

    • does not print out the current pattern space,
    • does not empty the pattern space,
    • appends a new line character to the pattern space,
    • reads in the next line and appends it to the pattern space.

    Compare the following examples

    If we want to match three particular lines, we can do it as follows
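
A sketch of the N command on made-up input; the \n in the pattern matches the new line character that N put between the lines:

```shell
# Join every pair of lines: N appends the next line, s replaces the \n.
pairs=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')
printf '%s\n' "$pairs"

# Match three particular consecutive lines: collect them with N;N first.
three_lines=$(printf 'one\ntwo\nthree\nfour\n' |
  sed '/one/{N;N;s/one\ntwo\nthree/1 2 3/;}')
printf '%s\n' "$three_lines"
```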

  • P command
    As we know, the p command

    • simply prints the entire pattern space;
    • the command doesn't change the pattern space.

    In contrast, the P command

    • prints only the first part of the pattern space, up to the newline character;
    • the command doesn't change the pattern space.


    Notice that if we want to match a new line in a pattern, we have to write \n literally, while if we want to print a new line, we have to insert a literal new line character. This explains why we cannot write

    Don't forget the escape character \ before the literal new line, to avoid an error message
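
The difference between p and P, and a literal new line in a replacement, can be sketched like this (the input is made up):

```shell
# p prints the whole two-line pattern space collected by N.
whole_pairs=$(printf '1\n2\n3\n4\n' | sed -n 'N;p')

# P prints only the part up to the first new line character.
first_of_pairs=$(printf '1\n2\n3\n4\n' | sed -n 'N;P')

# A new line in the replacement: a backslash followed by a literal new line.
split_line=$(printf 'a b\n' | sed 's/ /\
/')

printf '%s\n' "$whole_pairs" "$first_of_pairs" "$split_line"
```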

  • D command
    As we know, the d command

    • deletes the current pattern space,
    • reads in the next line,
    • puts the new line into the pattern space,
    • aborts the current command,
    • and starts execution at the first sed command.

    In contrast, the D command

    • deletes the first portion of the pattern space, up to the new line character, leaving the rest of the pattern space untouched,
    • stops the current command,
    • does not print the current pattern space,
    • and starts the command cycle over again, without reading a new line of input if anything is left in the pattern space.

All three commands used together allow us to remove the last 3 lines from a file
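
One well-known way to do this is sketched below, assuming a five-line input. The script keeps a window of four lines in the pattern space, and b jumps to the label a (see Control flow):

```shell
# :a      label
# $d      on the last line, delete the whole window (the last 3 lines)
# N       append the next line to the window
# 2,3ba   while the window is still being filled (lines 2 and 3), loop
# P;D     print and remove the oldest line, then restart without reading
without_tail=$(printf '1\n2\n3\n4\n5\n' |
  sed -e ':a' -e '$d;N;2,3ba' -e 'P;D')
printf '%s\n' "$without_tail"
```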


Another example is to print each line number followed by the line itself

or to display the input data in two columns
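
Both tasks can be sketched with N, assuming short made-up input and " | " as the column separator:

```shell
# The first sed prints each line number on its own line before the line;
# the second sed joins every number with the line that follows it.
numbered=$(printf 'a\nb\n' | sed = | sed 'N;s/\n/ /')
printf '%s\n' "$numbered"

# Two columns: $!N appends the next line unless we are on the last line.
columns=$(printf 'a\nb\nc\nd\n' | sed '$!N;s/\n/ | /')
printf '%s\n' "$columns"
```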


The hold buffer

Besides the pattern space, the buffer we have talked about so far, whose contents can be modified and sent to the output stream, there is one more buffer: the hold buffer or hold space. This buffer can be used to keep a copy of the data from the pattern space for future use. There are five commands we can use while working with the hold buffer.

  • x command
    This command exchanges the pattern space with the hold buffer.
  • h command
    The h command copies the pattern buffer into the hold buffer. The pattern buffer is left unchanged.
  • H command
    The H command allows us to combine several lines in the hold buffer. It works like the N command: a \n is appended to the hold buffer, followed by the contents of the pattern space. This way we can save several lines in the hold buffer, and print them only if a particular pattern is found later.
  • g command
    Works like the h command but in the reverse direction: it copies the hold space into the pattern space.
  • G command
    Works like the H command but in the reverse direction: it appends a \n and the contents of the hold space to the pattern space.
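
Two classic uses of the hold buffer, sketched on made-up input:

```shell
# Reverse the order of lines: on every line but the first, G appends what
# was collected so far; h saves the result; the last line prints everything.
reversed_lines=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')
printf '%s\n' "$reversed_lines"

# Print the line preceding each match: h remembers every line, and on a
# match g brings the remembered (previous) line back to be printed.
line_before=$(printf 'a\ntarget\nb\n' | sed -n '/target/{g;1!p;};h')
printf '%s\n' "$line_before"   # a
```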


Control flow

To control the flow of execution, sed provides looping and branching commands.

  • loops
    A loop in sed works similarly to a classic goto statement. We can jump to the line marked by a label and continue executing the commands from there. A label is a name placed after a colon :

    To jump to a specific label, we can use the b command followed by the label name. If the label name is omitted, sed jumps to the end of the script.
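
A sketch of a loop, assuming we want to join all input lines into one; the label name a is arbitrary:

```shell
# :a    label
# N     append the next line to the pattern space
# $!ba  if this is not the last line, jump back to the label a
# s     finally replace all collected new line characters with spaces
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
printf '%s\n' "$joined"   # a b c
```

Each part is passed in a separate -e because in some sed implementations a label extends to the end of the line.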
  • branches
    A branch can be created using the t command. With this command sed jumps to the label only if the previous substitute command was successful.

    To complete a task similar to the one from the loops part we can write (unfortunately, we have to use the b command, which makes this example somewhat artificial, because the b command alone is enough to complete the task)

    Another example may be more useful

    Here the -E command line option is used to enable extended regular expressions; on systems other than macOS the -r option may be the correct one.
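
A sketch of a t loop with extended regular expressions, assuming we want to add thousands separators to a made-up number:

```shell
# Each pass inserts one comma in front of a group of three digits;
# ta repeats the substitution for as long as it keeps succeeding.
with_commas=$(printf '1234567\n' |
  sed -E -e ':a' -e 's/([0-9]+)([0-9]{3})/\1,\2/' -e 'ta')
printf '%s\n' "$with_commas"   # 1,234,567
```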


    Sometimes we may see such strange lines of code

    At first sight, the branch here seems useless, since it jumps to the instruction that would have been executed anyway. However, if we read the definition of the t command carefully, we will see that it branches only if there was a substitution since the start of the current cycle or since the previous test command was executed. In other words, the test instruction has the side effect of clearing the substitution flag. This is exactly the purpose of the code fragment above: it is a trick to avoid false positives when using several substitution commands.
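
The effect can be sketched as follows; the labels reset and done and the marker text are made up. The script should add a marker only to lines in which no b was replaced:

```shell
# Without the reset, the successful s/a/A/ makes "t done" jump even though
# s/b/B/ failed, so the marker is skipped: a false positive.
no_reset=$(printf 'a\n' |
  sed -e 's/a/A/' -e 's/b/B/' -e 't done' -e 's/$/ (no b)/' -e ':done')

# "t reset" jumps to the very next line and clears the substitution flag,
# so "t done" now reacts to s/b/B/ only.
with_reset=$(printf 'a\n' |
  sed -e 's/a/A/' -e 't reset' -e ':reset' \
      -e 's/b/B/' -e 't done' -e 's/$/ (no b)/' -e ':done')

printf '%s\n' "$no_reset" "$with_reset"
```

The first command prints A, the second one prints A (no b).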


Working with multiple files


Sources, examples, tutorials