In this part we cover the following topics:

awk

awk is not an abbreviation for "awkward", as many people think after the first few minutes they spend with this tool. In fact, it is an elegant and simple language. The word awk is derived from the initials of the language's three developers: A. Aho, B. W. Kernighan and P. Weinberger.
awk is an excellent filter and report writer, and in many cases it is easier to use awk than "conventional" programming languages like C or Python. We may wonder what distinguishes awk from sed.
sed is a stream editor. It works with streams of characters on a per-line basis. It uses pattern matching and address matching to take actions. It has a primitive programming language that includes goto-style loops and simple conditionals. There are essentially only two "variables": pattern space and hold space. Mathematical operations are barely possible, while string functions are not available at all. sed can be used when there are patterns in the text. For example, we could replace all the negative numbers in some text that are in the form "minus sign followed by a sequence of digits" (e.g., "-123.45") with their absolute values (e.g., "123.45").
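As a sketch (not from the original text), such a replacement could look like this in sed with extended regular expressions, assuming the numbers match the simple pattern described:

```shell
# Replace negative numbers such as "-123.45" with their absolute values:
# match a minus sign followed by digits (with an optional decimal part)
# and keep only the captured digits, dropping the sign.
echo "balance: -123.45 and -7" | sed -E 's/-([0-9]+(\.[0-9]+)?)/\1/g'
# prints: balance: 123.45 and 7
```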
awk is oriented toward delimited fields on a per-line basis. There is complete support for variables and one-dimensional associative arrays. There are some mathematical operations as well as some very basic string functions. It has C-style printf, allows us to define our own functions and has programming constructs including conditionals (if/else) and loops (for, while and do/while). It also uses pattern matching to take actions. awk can be used when the "text" has a rows-and-columns structure. For example, we could sum all negative values from the second column.
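Summing all negative values from the second column could be sketched as a one-liner (the sample data here is invented for illustration):

```shell
# Add $2 to the running sum only when it is negative,
# then print the total after all input has been read.
printf '%s\n' 'a -2 x' 'b 5 y' 'c -3 z' \
  | awk '$2 < 0 { sum += $2 } END { print sum }'
# prints: -5
```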
AWK follows a simple read, execute, and repeat workflow given below:
1. Execute commands from the BEGIN block.
2. Read a line from the input stream.
3. Execute commands on the previously read line.
4. If it's not the end of file, go to step 2.
5. Execute commands from the END block.
Looking at this workflow it should be clear why awk
is a perfect tool to generate simple formatted reports.
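A minimal sketch of this workflow as a report: a header printed in BEGIN, one action per line, and a summary in END (the data is invented for illustration):

```shell
# BEGIN runs once before any input is read, the middle block runs
# once per input line, and END runs once after the last line.
printf '%s\n' 10 20 30 | awk '
BEGIN { print "values:" }
      { total += $1; print "  " $1 }
END   { print "total: " total }'
```

This prints a "values:" header, each of the three numbers, and finally "total: 60".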
With the awk scripting language, we can:
- use variables;
- use string and most arithmetic operators we know from C language;
- use control flow and loops.
To be more precise, in awk we can use a lot of elements well known from the classic C language:
- the printf function for pretty printing, with
  - escape sequences,
  - format specifiers,
  - minimum field width specifiers,
  - left justification,
  - field precision value specifiers.
We can also send output to a named file instead of the standard output, with the following format:

printf([FORMAT], [ARGUMENTS]) > [OUTPUT_FILE]

- Flow control with next and exit. We can leave an awk script using the exit command. The second command, next, also changes the flow of the program: it stops processing of the current line; the program reads in the next line and starts executing the commands again on that new line.
- Numerical functions: cos, exp, int, log, sin, sqrt (some versions also provide atan, rand, srand).
- String functions:

index(string, search)
length(string)
split(string, array, separator)
substr(string, position)
substr(string, position, max)

- if for control flow; while and for for loops.
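A few of these string functions in action (a quick sketch; note that positions in awk are 1-based):

```shell
echo "hello world" | awk '{
    print index($0, "world")     # position of the substring: 7
    print length($0)             # 11
    print substr($0, 1, 5)       # hello
    n = split($0, parts, " ")    # n == 2, parts[2] == "world"
    print n, parts[2]
}'
```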
Below, a list and the syntax of all awk commands is given:

if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
for ( variable in array ) statement
break
continue
{ [ statement ] ... }
variable = expression
print [ expression-list ] [ > expression ]
printf format [ , expression-list ] [ > expression ]
next
exit
The awk command is used like this:

awk options program file
awk refers to the rows and columns as records and fields. Note that traditional awk implementations give us access only to the first 99 fields in a single line.
In awk there are two kinds of variables:
- User defined. A user defined variable is one we create.
- Positional. A positional variable is not a special variable, but a function triggered by the dollar sign $.
User defined variables can be defined before script execution and used throughout the execution of the script:

MBAPF:textdataprocessing fulmanp$ awk -v foo=3 'BEGIN {print foo}'
3
In this example:
- the -v option assigns a value to a variable; it allows assignment before the program execution,
- foo=3 is a definition of the foo variable,
- BEGIN {print foo} is a BEGIN block with only one command: print, intended to print the value of the foo variable.
They can also be defined inside one of the "regular" blocks:

MBAPF:textdataprocessing fulmanp$ awk 'BEGIN {foo=7; print foo}'
7
Positional variables allow us to access specified fields of the currently processed line. The variable $0 refers to the entire line that awk reads in. Having a data file as given below
MBAPF:textdataprocessing fulmanp$ echo '11 12 13
> 21 22 23
> 31 32 33
> 41 42 43' > data01.txt
MBAPF:textdataprocessing fulmanp$ cat data01.txt
11 12 13
21 22 23
31 32 33
41 42 43
we can write
MBAPF:textdataprocessing fulmanp$ awk '{print $0}' data01.txt
11 12 13
21 22 23
31 32 33
41 42 43
to print all of them.
Variables of the form $[POSITIVE NATURAL] address the POSITIVE NATURAL-th field of our data (remember the limit of 99 fields in a single line):
MBAPF:textdataprocessing fulmanp$ awk '{print $1,$3}' data01.txt
11 13
21 23
31 33
41 43
Notice that the last two examples can also be completed with the commands
MBAPF:textdataprocessing fulmanp$ awk '{print}' data01.txt
11 12 13
21 22 23
31 32 33
41 42 43
MBAPF:textdataprocessing fulmanp$ awk '{$2=""; print}' data01.txt
11  13
21  23
31  33
41  43
Notice that in the second case the results are similar, but not identical: the number of spaces between the values varies. There are two reasons for this. First, the actual number of fields does not change: setting a positional variable to an empty string does not delete the field; it is still there, but its contents have been deleted. Second, this is how AWK outputs the entire line: it places one field separator between each pair of fields, so the two separators surrounding the empty field end up side by side. As a result we get two spaces.
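We can verify that the field count is untouched by printing NF after clearing a field (a small check, not part of the original examples):

```shell
# NF is still 3 after $2 is emptied; rebuilding $0 puts one OFS
# between each pair of fields, hence the double space.
echo '11 12 13' | awk '{$2=""; print NF, "["$0"]"}'
# prints: 3 [11  13]
```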
A useful variable related to fields is NF (the number of fields):
MBAPF:textdataprocessing fulmanp$ awk '{print $NF}' data01.txt
13
23
33
43
MBAPF:textdataprocessing fulmanp$ awk '{print $(NF-1)}' data01.txt
12
22
32
42
With NF it is easy to print the last element of every line even if the number of fields differs from line to line:
MBAPF:textdataprocessing fulmanp$ echo '11
> 21 22
> 31 32 33
> 42 42 43 44' > data02.txt
MBAPF:textdataprocessing fulmanp$ cat data02.txt
11
21 22
31 32 33
42 42 43 44
MBAPF:textdataprocessing fulmanp$ awk '{print $NF}' data02.txt
11
22
33
44
Another "counter", the NR
(the number of records), tells us the number of records, or the line number. With this we can work with certain lines
MBAPF:textdataprocessing fulmanp$ awk '{if(NR>2){print $0}}' data01.txt
31 32 33
41 42 43
With awk we can process many different text files and, quite obviously, not all of them use whitespace as a field separator. We can easily change the field separator to any other character using the -F command line option.
MBAPF:textdataprocessing fulmanp$ echo '11 12,13,14 15
> 21 22,23,24 25
> 31 32,33,34 35' > data03.txt
MBAPF:textdataprocessing fulmanp$ cat data03.txt
11 12,13,14 15
21 22,23,24 25
31 32,33,34 35
MBAPF:textdataprocessing fulmanp$ awk '{print $3}' data03.txt
15
25
35
MBAPF:textdataprocessing fulmanp$ awk -F, '{print $3}' data03.txt
14 15
24 25
34 35
However, there is a way to do this without the command line option: instead, the variable FS can be set.
MBAPF:textdataprocessing fulmanp$ awk 'BEGIN{FS=","} {print $3}' data03.txt
14 15
24 25
34 35
Notice that if FS is not defined in the BEGIN block, the result is different:
MBAPF:textdataprocessing fulmanp$ awk '{FS=","; print $3}' data03.txt
15
24 25
34 35
The explanation for this is quite clear once we realize how awk works. It processes the file line by line: first it reads the whole line and splits it into fields, and only then executes the commands on it. If we change the field separator before a line is read, the change affects how that line is split. If we change it after the line has been read, the fields of the current line are not recomputed; the new separator only takes effect from the next line.
Consider the following examples
MBAPF:textdataprocessing fulmanp$ awk '{print $2 $3}' data01.txt
1213
2223
3233
4243
MBAPF:textdataprocessing fulmanp$ awk '{print $2, $3}' data01.txt
12 13
22 23
32 33
42 43
In the first case, the two positional parameters are concatenated and output without a space. In the second case, two fields are printed, and the output field separator is placed between them. By default this separator is a single space, but we can change it by modifying the variable OFS.
MBAPF:textdataprocessing fulmanp$ awk '{OFS=":"; print $2, $3}' data01.txt
12:13
22:23
32:33
42:43
awk reads one line (called a record in awk) at a time, and breaks up the line into fields. We can change awk's definition of a line by setting the RS variable. Note that if we set it to an empty string, awk switches to "paragraph mode", in which records are separated by blank lines.
MBAPF:textdataprocessing fulmanp$ awk 'BEGIN{RS=" "} {print ">"$0"<"}' data01.txt
>11<
>12<
>13
21<
>22<
>23
31<
>32<
>33
41<
>42<
>43
<
The default output record separator is a newline. It can be set to any sequence of characters with the ORS variable.
MBAPF:textdataprocessing fulmanp$ awk 'BEGIN{RS=" "} {print $0}' data01.txt | awk 'BEGIN{ORS="::"} {print $0}' > res.txt
MBAPF:textdataprocessing fulmanp$ cat res.txt
11::12::13::21::22::23::31::32::33::41::42::43::::
Notice that while ORS can be a sequence of characters like :: in the example above, RS traditionally takes only one character; if a longer string is given, only its first character is used, which is why RS="22" below behaves exactly like RS="2" (GNU awk, by contrast, treats a multi-character RS as a regular expression):
MBAPF:textdataprocessing fulmanp$ awk 'BEGIN{RS="2"} {print ">"$0"<"}' res.txt
>11::1<
>::13::<
>1::<
><
>::<
>3::31::3<
>::33::41::4<
>::43::::<
MBAPF:textdataprocessing fulmanp$ awk 'BEGIN{RS="22"} {print ">"$0"<"}' res.txt
>11::1<
>::13::<
>1::<
><
>::<
>3::31::3<
>::33::41::4<
>::43::::<
In awk we can use one-dimensional associative arrays. Associativity is good news, because it reduces coding time and makes difficult problems much simpler. Let's write a simple program which counts the number of word occurrences in a file. First we create a file
MBAPF:textdataprocessing fulmanp$ echo '11
> 22
> 22
> 33
> 33
> 33' > data04.txt
MBAPF:textdataprocessing fulmanp$ cat data04.txt
11
22
22
33
33
33
then we can count words with the following script, saved under the name script01.awk
{
    username[$1]++;
}

END {
    for (i in username) {
        print i":"username[i];
    }
}
Up to now we have given the awk program directly on the command line. Fortunately, awk provides the ability to read the program from a file with the -f parameter, which we will use now.
MBAPF:textdataprocessing fulmanp$ awk -f script01.awk data04.txt
22:2
11:1
33:3
Using pipes with the sort and head commands, we can select the two most frequent words this way
MBAPF:textdataprocessing fulmanp$ awk -f script01.awk data04.txt | sort -r | head -n 2
33:3
22:2
Imagine now that we have a file profits.txt
with our profits from the programs we sell in iTunes
January app01 10
February app02 2
February app03 15
March app02 3
April app01 7
May app03 12
May app03 5
May app02 8
June app01 10
July app01 20
August app02 5
August app03 15
September app03 4
October app02 15
November app01 8
December app02 17
December app02 3
December app03 5
December app01 9
and we want to calculate the most profitable month, the most profitable application and the most profitable application in every month.
- The most profitable month. The script profits_m.awk:

{
    data[$1] += $3;
}

END {
    for (i in data) {
        print i, data[i];
    }
}

MBAPF:textdataprocessing fulmanp$ awk -f profits_m.awk profits.txt | sort -nrk 2,2
December 34
May 25
July 20
August 20
February 17
October 15
June 10
January 10
November 8
April 7
September 4
March 3

- The most profitable application. The script profits_a.awk:

{
    data[$2] += $3;
}

END {
    for (i in data) {
        print i, data[i];
    }
}

MBAPF:textdataprocessing fulmanp$ awk -f profits_a.awk profits.txt | sort -nrk 2,2
app01 64
app03 56
app02 53

- The most profitable application in every month. To simplify our solution, two scripts will be used. First profits_ma_1.awk:

{
    data[$1":"$2] += $3;
}

END {
    for (i in data) {
        print i, data[i];
    }
}

and a second, profits_ma_2.awk (note that substr positions in awk are 1-based):

{
    if ($2 > data[substr($1,1,3)]) {
        data[substr($1,1,3)] = $2;
        name[substr($1,1,3)] = $1
    }
}

END {
    map["Jan"] = 1
    map["Feb"] = 2
    map["Mar"] = 3
    map["Apr"] = 4
    map["May"] = 5
    map["Jun"] = 6
    map["Jul"] = 7
    map["Aug"] = 8
    map["Sep"] = 9
    map["Oct"] = 10
    map["Nov"] = 11
    map["Dec"] = 12
    for (i in data) {
        print map[i], i, substr(name[i], index(name[i],":")+1), data[i];
    }
}

We can call them as shown below:

MBAPF:textdataprocessing fulmanp$ awk -f profits_ma_1.awk profits.txt | awk -f profits_ma_2.awk | sort -nk 1,1
1 Jan app01 10
2 Feb app03 15
3 Mar app02 3
4 Apr app01 7
5 May app03 17
6 Jun app01 10
7 Jul app01 20
8 Aug app03 15
9 Sep app03 4
10 Oct app02 15
11 Nov app01 8
12 Dec app02 20
So far we have used only two patterns: the special words BEGIN and END, without even calling them patterns. Patterns seem to be indispensable in a text processing tool, as we saw in the sed part. When we realize how they are used, we can conclude that patterns work like conditions in an environment that has no conditional statement. But we do have conditions in awk, so strictly speaking we don't need patterns: we can duplicate any of them with an if statement.
A pattern (or condition) is simply an abbreviated test. If the condition is true, the action is performed. All relational tests can be used as a pattern.
MBAPF:textdataprocessing fulmanp$ awk '{if(NR<=3){print}}' profits.txt
January app01 10
February app02 2
February app03 15
If we prefer, we can change the if statement into a condition, which shortens the code
MBAPF:textdataprocessing fulmanp$ awk 'NR<3 {print}' profits.txt
January app01 10
February app02 2
Besides conditional tests, we can also use regular expressions. Printing all lines that contain the sequence # comment from a file pattern.txt
# comment 1 line 1
# comment 1 line 2
begin
block 1 line 1
block 1 line 2
block 1 line 3
end
# comment 2 line 1
# comment 2 line 2
begin
block 2 line 1
block 2 line 2
block 2 line 3
end
is possible with the following command
MBAPF:textdataprocessing fulmanp$ awk '{if($0 ~ /# comment/) {print}}' pattern.txt
# comment 1 line 1
# comment 1 line 2
# comment 2 line 1
# comment 2 line 2
or more briefly
MBAPF:textdataprocessing fulmanp$ awk '$0 ~ /# comment/ {print}' pattern.txt
# comment 1 line 1
# comment 1 line 2
# comment 2 line 1
# comment 2 line 2
Truth be told, this type of test is so common that awk allows a third, even shorter format
MBAPF:textdataprocessing fulmanp$ awk '/# comment/ {print}' pattern.txt
# comment 1 line 1
# comment 1 line 2
# comment 2 line 1
# comment 2 line 2
Tests can be combined with the and (&&), or (||) and not (!) operators. Parentheses can also be added if we want to change the order of operators or to make a complex statement clearer.
A very useful variant of pattern is called the comma separated pattern and takes the form
/[TRIGGER_ACTION_PATTERN]/,/[STOP_ACTION_PATTERN]/ [ACTION]
This form defines, in one line, the condition to turn the action on and the condition to turn it off. That is, when a line containing TRIGGER_ACTION_PATTERN is seen, the ACTION is performed. Every line afterwards is also processed by ACTION, until a line containing STOP_ACTION_PATTERN is seen. That line is processed too, as the last one.
The following prints all lines between line containing begin
and another line containing end
.
MBAPF:textdataprocessing fulmanp$ awk '/begin/,/end/ {print}' pattern.txt
begin
block 1 line 1
block 1 line 2
block 1 line 3
end
begin
block 2 line 1
block 2 line 2
block 2 line 3
end
The following prints all lines between 4 and 6 (inclusively):
MBAPF:textdataprocessing fulmanp$ awk '(NR==4),(NR==6) {print}' pattern.txt
block 1 line 1
block 1 line 2
block 1 line 3
Note that we can have several patterns in a script and each one is independent of the others.
MBAPF:textdataprocessing fulmanp$ awk '/block/ {print}
> /block 1/ {print}' pattern.txt
block 1 line 1
block 1 line 1
block 1 line 2
block 1 line 2
block 1 line 3
block 1 line 3
block 2 line 1
block 2 line 2
block 2 line 3
In
awk
we define functions according to the following general format
function [NAME]([ARGUMENT1], ..., [ARGUMENTN]) {
    [BODY]
}
Consider the following function addLeadingDots, saved as function.awk
function addLeadingDots(line, lineLength) {
    currentLineLength = length(line)
    diff = lineLength - currentLineLength
    str = ""
    if (diff > 0) {
        for (i = 0; i < diff; i++) {
            str = str"."
        }
    }
    return str line
}

{
    print addLeadingDots($0, 11)
}
On executing this code, we get the following result
MBAPF:textdataprocessing fulmanp$ cat data02.txt
11
21 22
31 32 33
42 42 43 44
MBAPF:textdataprocessing fulmanp$ awk -f function.awk data02.txt
.........11
......21 22
...31 32 33
42 42 43 44
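One caveat worth adding (a common awk convention, not covered in the original text): variables used inside a function body, like str, diff and i above, are global unless they are declared as extra parameters. The usual idiom is to list them after the real arguments, separated by extra spaces, and never pass them at call sites:

```shell
# str, diff and i are extra parameters, so each call gets fresh
# local copies instead of sharing globals with the rest of the script.
printf '%s\n' 11 '21 22' | awk '
function addLeadingDots(line, lineLength,    str, diff, i) {
    diff = lineLength - length(line)
    str = ""
    for (i = 0; i < diff; i++) str = str "."
    return str line
}
{ print addLeadingDots($0, 8) }'
# prints:
# ......11
# ...21 22
```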