Small tools – Tutorials

In this part we look at a number of very simple tools. Unlike sed or awk, all of them are easy to learn and use, which is why we collected them in one chapter. We discuss only the most frequently used options; for more details please refer to the man pages.

In this part we cover the following topics

cut
dd ???
grep
head
join
less ???
nl ???
od ???
paste
seq ???
sort
split
tail
tee ???
tr
uniq ???
wc ???

cut

The cut command is a UNIX command-line utility that cuts sections from each line of input and writes the result to standard output. It can select parts of a line by byte position (-b), by character (-c), or by field (-f, with -d to specify a delimiter other than the default tab character). A range must be provided in each case; it consists of one of N, N-M, N- (N to the end of the line), or -M (beginning of the line to M), where N and M are counted from 1 (there is no zeroth value). Several ranges can be combined with commas, as in 1,4-6,10-.
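
The four range forms are easiest to compare on one fixed string. The following is a minimal sketch; the sample string and the variable names are made up for illustration.

```shell
# Sample input: ten single-byte characters, so byte and character
# positions coincide.
s='abcdefghij'

single=$(printf '%s\n' "$s" | cut -b 3)       # N:   only byte 3
span=$(printf '%s\n' "$s" | cut -b 3-5)       # N-M: bytes 3 through 5
to_end=$(printf '%s\n' "$s" | cut -b 8-)      # N-:  byte 8 to the end of the line
from_start=$(printf '%s\n' "$s" | cut -b -2)  # -M:  beginning of the line to byte 2

printf '%s %s %s %s\n' "$single" "$span" "$to_end" "$from_start"
```

This should print c cde hij ab, one result per range form.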

Below is a list of the available options (help and version are skipped, as they are present in most UNIX commands)

-b, --bytes=RANGE Select only the bytes from each line as specified in RANGE. RANGE specifies a byte, a set of bytes, or a range of bytes, as described above.
-c, --characters=RANGE Select only the characters from each line as specified in RANGE.
-d, --delimiter=DELIM Use character DELIM instead of a tab for the field delimiter.
-f, --fields=RANGE Select only the fields from each line as specified in RANGE. Also print any line that contains no delimiter character, unless the -s option is specified.
--complement Complement the set of selected bytes, characters or fields.
-s, --only-delimited Do not print lines not containing delimiters.
--output-delimiter=STRING Use STRING as the output delimiter string. The default is to use the input delimiter.

cut -- usage examples

To cut by byte position

MacBook-Air-Piotr:small fulmanp$ echo 'foo:bar:baz' | cut -b 1,4-6,10-
f:baaz

To cut by character
Where the input stream is character based, -c can be a better option than selecting by bytes with -b, as characters are often more than one byte long. In the following example the Polish letter Ą (Latin Capital Letter A with Ogonek) has the Unicode code point U+0104, which is encoded in UTF-8 as two bytes (c4 and 84).

MacBook-Air-Piotr:small fulmanp$ echo 'Ą' > data02.txt
MacBook-Air-Piotr:small fulmanp$ ls -l
total 16
-rw-r--r--  1 fulmanp  staff  60 22 lis 23:50 data01.txt
-rw-r--r--  1 fulmanp  staff   3 23 lis 21:47 data02.txt
MacBook-Air-Piotr:small fulmanp$ cat data02.txt 
Ą
MacBook-Air-Piotr:small fulmanp$ od -t x1 data02.txt 
0000000    c4  84  0a                                                    
0000003

By using the -c option the character can be correctly selected along with any other characters that are of interest.

MacBook-Air-Piotr:small fulmanp$ echo 'aĄb' | cut -b 2
?
MacBook-Air-Piotr:small fulmanp$ echo 'aĄb' | cut -c 2
Ą

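This option seems to work incorrectly on Linux, where GNU cut has traditionally treated -c the same as -b, so a two-byte character has to be selected as two positions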

tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -b 2
�
tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -c 2
�
tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -c 2-3
Ą

--complement does not work on MacOS, but should work on Linux

tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -c 2 --complement
a�b
tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -c 2-3 --complement
ab

To cut based on a delimiter (to cut by field)

MacBook-Air-Piotr:small fulmanp$ echo 'a,b,c,d,e,f' | cut -d ',' -f 1,4-
a,d,e,f
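
The -s option matters when some input lines contain no delimiter at all. A small sketch of the difference (the two-line sample input is invented for illustration):

```shell
# Two lines: one with commas, one without any delimiter.
input='a,b,c
no delimiter here'

# Without -s the line lacking a delimiter is passed through unchanged.
with_undelimited=$(printf '%s\n' "$input" | cut -d ',' -f 2)

# With -s such lines are suppressed.
only_delimited=$(printf '%s\n' "$input" | cut -d ',' -s -f 2)

printf 'without -s:\n%s\n' "$with_undelimited"
printf 'with -s:\n%s\n' "$only_delimited"
```

Without -s both b and the undelimited line are printed; with -s only b remains.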

--output-delimiter does not work on MacOS, but should work on Linux

MacBook-Air-Piotr:small fulmanp$ echo 'a,b,c,d,e,f' | cut -d ',' -f 1,4- --output-delimiter=":"
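
Where --output-delimiter is not available, one possible workaround is to translate the delimiter after cutting. This is only a sketch and assumes the delimiter never occurs inside a field value:

```shell
# Select fields with cut, then translate the remaining delimiters.
# Assumption: the comma never occurs inside a field value.
result=$(echo 'a,b,c,d,e,f' | cut -d ',' -f 1,4- | tr ',' ':')
echo "$result"
```

This should print a:d:e:f. Note that tr only handles single-character replacements, unlike the STRING accepted by --output-delimiter.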

grep

The name grep comes from the ed command g/re/p (globally search for a regular expression and print the matching lines), but it is easier to think of grep as the search command for Unix systems. It is used to search for text strings or, more generally, regular expressions within one or more files or an input stream.

grep is a simple tool, but it has a lot of options. Listing all of them here would be pointless, as our goal is not to copy the man pages. The easiest way to learn grep is by example, so that is what we do next.

grep -- usage examples

For all of the examples, we’ll be using the following test file named data03.txt.

MacBook-Air-Piotr:small fulmanp$ echo "01: foo
> 02: Foo
> 03: fOo
> 04: foO
> 05: FoO
> 06: bar
> 07: Bar
> 08: bAr
> 09: baR
> 10: BAR
> 11: foo bar" > data03.txt
MacBook-Air-Piotr:small fulmanp$ cat data03.txt 
01: foo
02: Foo
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
09: baR
10: BAR
11: foo bar

Search for a string in one or more files

MacBook-Air-Piotr:small fulmanp$ grep foo data03.txt
01: foo
11: foo bar

Case-insensitive search for a string (with the -i option)

MacBook-Air-Piotr:small fulmanp$ grep -i foo data03.txt
01: foo
02: Foo
03: fOo
04: foO
05: FoO
11: foo bar

Search for strings matching a regular expression

MacBook-Air-Piotr:small fulmanp$ grep '[fF]oo' data03.txt 
01: foo
02: Foo
11: foo bar
MacBook-Air-Piotr:small fulmanp$ grep '^....[A-Z]' data03.txt 
02: Foo
05: FoO
07: Bar
10: BAR

Invert the match with the -v option

MacBook-Air-Piotr:small fulmanp$ grep -v '[fF]oo' data03.txt
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
09: baR
10: BAR

Search for multiple patterns (note the use of egrep in this case)

MacBook-Air-Piotr:small fulmanp$ egrep '^....F|R$' data03.txt
02: Foo
05: FoO
09: baR
10: BAR

Show matching line numbers

MacBook-Air-Piotr:small fulmanp$ egrep -n '^....F|R$' data03.txt
2:02: Foo
5:05: FoO
9:09: baR
10:10: BAR

Display matching filenames

MacBook-Air-Piotr:small fulmanp$ egrep -l '^....F|R$' *
data03.txt
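
The egrep examples above can also be written with grep -E, which is the form specified by POSIX; recent GNU grep releases print a warning that the egrep name is obsolescent. A minimal sketch that recreates the test file in a scratch directory (the workdir variable is just an illustration):

```shell
# Recreate the test file used in this section in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '01: foo' '02: Foo' '03: fOo' '04: foO' '05: FoO' \
  '06: bar' '07: Bar' '08: bAr' '09: baR' '10: BAR' '11: foo bar' \
  > "$workdir/data03.txt"

# grep -E enables extended regular expressions, so | means alternation:
# an F in the fifth column, or an R at the end of the line.
matches=$(grep -E '^....F|R$' "$workdir/data03.txt")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

The output should be the same four lines (02, 05, 09 and 10) that egrep printed.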

Lines before and after grep match

MacBook-Air-Piotr:small fulmanp$ grep 'F.O' data03.txt 
05: FoO
MacBook-Air-Piotr:small fulmanp$ grep -B 2 -A 4 'F.O' data03.txt 
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
09: baR

Highlighting the search using --color option

MacBook-Air-Piotr:small fulmanp$ grep --color -B 2 -A 4 'F.O' data03.txt
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
09: baR

FoO should be highlighted in some way; on my terminal it is red.

Counting the lines that match

MacBook-Air-Piotr:small fulmanp$ grep 'F' data03.txt
02: Foo
05: FoO
MacBook-Air-Piotr:small fulmanp$ grep -c 'F' data03.txt
2
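
Note that -c counts matching lines, not individual matches. To count every occurrence, one common approach is to combine -o with wc -l. The sketch below assumes a grep that supports -o (GNU and BSD grep do, but it is not part of POSIX); the sample text is invented.

```shell
sample='foo foo
bar
foo'

# -c counts matching lines: lines 1 and 3 match.
line_count=$(printf '%s\n' "$sample" | grep -c 'foo')

# -o prints every match on its own line, so wc -l counts occurrences.
# tr strips the padding that some wc implementations add.
match_count=$(printf '%s\n' "$sample" | grep -o 'foo' | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"
```

This should report 2 matching lines but 3 matches.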

That was a short tour of typical grep usage. More options can be found in the documentation.

head

head is a program on Unix systems used to display the beginning of a text file or a stream of data (by default it prints the first 10 lines). The general command syntax is typical and there are just a few options.

-c [-]K print the first K bytes of each file; with the leading -, print all but the last K bytes of each file.
-n [-]K print the first K lines instead of the first 10; with the leading -, print all but the last K lines of each file.
-q never print headers giving file names.
-v always print headers giving file names.

K may have a multiplier suffix

b 512,
kB 1000, K 1024,
MB 1000*1000, M 1024*1024,
GB 1000*1000*1000, G 1024*1024*1024,
and so on for T, P, E, Z, Y.
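
The -q and -v options listed above are easier to understand once the default behavior is known: when given more than one file, head labels each part with a header. A minimal sketch (file names and contents are made up):

```shell
workdir=$(mktemp -d)
printf 'a1\na2\n' > "$workdir/a.txt"
printf 'b1\nb2\n' > "$workdir/b.txt"

# With more than one file, head prints a "==> name <==" header before
# each part. Running inside the directory keeps the headers short.
listing=$(cd "$workdir" && head -n 1 a.txt b.txt)
printf '%s\n' "$listing"

rm -rf "$workdir"
```

Where supported, -q suppresses these headers and -v forces them even for a single file.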

The complementary command to head is tail.

head -- usage examples

Mac-mini-Piotr:small fulmanp$ head data03.txt 
01: foo
02: Foo
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
09: baR
10: BAR
Mac-mini-Piotr:small fulmanp$ head -n 3 data03.txt 
01: foo
02: Foo
03: fOo

-n option with negative values does not work in MacOS

Mac-mini-Piotr:small fulmanp$ head -n -3 data03.txt 
head: illegal line count -- -3
Mac-mini-Piotr:small fulmanp$ head --lines=-3 data03.txt 
head: illegal option -- -
usage: head [-n lines | -c bytes] [file ...]

but works in Linux

tdp@tdp-VirtualBox:~$ head -n 3 data03.txt
01: foo
02: Foo
03: fOo
tdp@tdp-VirtualBox:~$ head -n -3 data03.txt
01: foo
02: Foo
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
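
Where head does not accept a negative count, the same effect can be approximated by counting the lines first. The helper below is a hypothetical sketch (drop_last is not a standard command); it reads the file twice, so it needs a regular file rather than a pipe.

```shell
# drop_last N FILE: print FILE without its last N lines
# (hypothetical helper name, not a standard command).
drop_last() {
  total=$(wc -l < "$2")
  keep=$((total - $1))
  if [ "$keep" -gt 0 ]; then
    head -n "$keep" "$2"
  fi
}

workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$workdir/five.txt"
trimmed=$(drop_last 3 "$workdir/five.txt")
printf '%s\n' "$trimmed"
rm -rf "$workdir"
```

For the five-line sample this should print only lines 1 and 2.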

To print the first K bytes instead of lines, use the -c option. The output below does not end with a newline, so the next prompt follows it directly.

Mac-mini-Piotr:small fulmanp$ head -c 3 data03.txt
01:Mac-mini-Piotr:small fulmanp$

01:Mac-mini-Piotr:small fulmanp$ cat data02.txt 
Ą
Mac-mini-Piotr:small fulmanp$ head -c 2 data02.txt 
ĄMac-mini-Piotr:small fulmanp$ head -c 3 data02.txt 
Ą
Mac-mini-Piotr:small fulmanp$ head -c 1 data02.txt 
?Mac-mini-Piotr:small fulmanp$

-v option does not work in MacOS

?Mac-mini-Piotr:small fulmanp$ head -c 1 -v data02.txt 
head: illegal option -- v
usage: head [-n lines | -c bytes] [file ...]

but works in Linux

tdp@tdp-VirtualBox:~$ head -c 1 data02.txt
�tdp@tdp-VirtualBox:~$ head -c 1 -v data02.txt
==> data02.txt <==
�tdp@tdp-VirtualBox:~$

join

The join command combines two files based on matching fields found in the lines of each file. Using join is quite straightforward, and it can save a lot of time and effort. To join two files, both must share a common join field. The default join field is the first field, delimited by blanks (space or tab). join expects both files to be sorted on the join field before joining.

The most frequently used options include

-1 FIELD Join on this FIELD of file 1.
-2 FIELD Join on this FIELD of file 2.
-t CHAR Use CHAR as input and output field separator.
-o FORMAT Use FORMAT while constructing output line.
-j FIELD Equivalent to -1 FIELD -2 FIELD.
-i Ignore differences in case when comparing fields.
-a FILENUM Also, print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2.

join -- usage examples

The basic usage of join is without any options: all that is required is to specify two files as arguments. Given two files data06_A.txt and data06_B.txt with the following content

Mac-mini-Piotr:small fulmanp$ cat data06_A.txt
1 A
2 B
3 CMac-mini-Piotr:small fulmanp$ cat data06_B.txt
1 AA
2 BB
3 CCMac-mini-Piotr:small fulmanp$

the result is as below

Mac-mini-Piotr:small fulmanp$ join data06_A.txt data06_B.txt
1 A AA
2 B BB
3 C CC
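
Because join expects sorted input, unsorted files have to be sorted on the join field first. A minimal sketch; the file names and sample rows are invented for illustration:

```shell
workdir=$(mktemp -d)
printf '3 C\n1 A\n2 B\n' > "$workdir/a.txt"
printf '2 BB\n3 CC\n1 AA\n' > "$workdir/b.txt"

# Sort both inputs on the join field (field 1) before joining.
sort -k 1,1 "$workdir/a.txt" > "$workdir/a.sorted"
sort -k 1,1 "$workdir/b.txt" > "$workdir/b.sorted"

joined=$(join "$workdir/a.sorted" "$workdir/b.sorted")
printf '%s\n' "$joined"

rm -rf "$workdir"
```

sort and join should run under the same locale so that both agree on the ordering of the join field.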

Choosing field
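When the default join field (the first one) does not match, we can change the default behavior and join the files on other fields. For the files data06_AA.txt and data06_BB.txt with the following content, a plain join prints nothing, while -1 2 -2 1 joins field 2 of the first file with field 1 of the second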

Mac-mini-Piotr:small fulmanp$ cat data06_AA.txt 
A 1
B 2
C 3Mac-mini-Piotr:small fulmanp$ cat data06_BB.txt 
1 AA
2 BB
3 CCMac-mini-Piotr:small fulmanp$ join data06_AA.txt data06_BB.txt 
Mac-mini-Piotr:small fulmanp$ join -1 2 -2 1 data06_AA.txt data06_BB.txt 
1 A AA
2 B BB
3 C CC

Overriding default join format

Mac-mini-Piotr:small fulmanp$ join -o 1.2 -o 2.2 -1 2 -2 1 data06_AA.txt data06_BB.txt
1 AA
2 BB
3 CC

On Linux the following version (without multiple -o) should work

tdp@tdp-VirtualBox:~$ join -o 1.2 2.2 -1 2 -2 1 data06_AA.txt data06_BB.txt
1 AA
2 BB
3 CC

Dealing with non-pairable lines

Mac-mini-Piotr:small fulmanp$ cat data06_AAA.txt 
1 A
2 B
3 C
4 D
5 EMac-mini-Piotr:small fulmanp$ cat data06_BBB.txt 
1 AA
2 BB
3 CC
6 FFMac-mini-Piotr:small fulmanp$ join -a 1 data06_AAA.txt data06_BBB.txt 
1 A AA
2 B BB
3 C CC
4 D
5 E
Mac-mini-Piotr:small fulmanp$ join -a 2 data06_AAA.txt data06_BBB.txt 
1 A AA
2 B BB
3 C CC
6 FF
Mac-mini-Piotr:small fulmanp$ join -a 1 -a 2 data06_AAA.txt data06_BBB.txt 
1 A AA
2 B BB
3 C CC
4 D
5 E
6 FF
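
The -t option from the option list is not exercised in the examples above. A minimal sketch with comma-separated files (the file names and rows are made up):

```shell
workdir=$(mktemp -d)
printf '1,A\n2,B\n3,C\n' > "$workdir/left.csv"
printf '1,AA\n2,BB\n3,CC\n' > "$workdir/right.csv"

# -t ',' sets the comma as both the input and the output field separator.
joined_csv=$(join -t ',' "$workdir/left.csv" "$workdir/right.csv")
printf '%s\n' "$joined_csv"

rm -rf "$workdir"
```

The result should be 1,A,AA and so on, with the comma preserved in the output.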

nl

In theory, nl numbers the lines in a file. In practice it does much more.

nl -- usage examples
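
As a starting point, the sketch below contrasts the default numbering with -b a. It is only an illustration with invented sample text, not a full tour of nl.

```shell
sample='first

third'

# Default: only non-empty lines receive a number.
default_numbered=$(printf '%s\n' "$sample" | nl)

# -b a numbers all lines; -w 1 and -s ' ' shrink the number column to
# one digit followed by a single space, to keep the output compact.
all_numbered=$(printf '%s\n' "$sample" | nl -b a -w 1 -s ' ')

printf '%s\n' "$default_numbered"
printf '%s\n' "$all_numbered"
```

In the first listing the empty line stays unnumbered, so third becomes line 2; in the second it is numbered like any other line, so third becomes line 3.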

paste

The paste command merges the corresponding lines of multiple files side-by-side.

paste -- usage examples

To display the contents of data06_AAA.txt and data06_BBB.txt, side-by-side, with the corresponding lines of each file separated by a tab we can use paste command in the following way

Mac-mini-Piotr:small fulmanp$ cat data06_AAA.txt 
1 A
2 B
3 C
4 D
5 EMac-mini-Piotr:small fulmanp$ cat data06_BBB.txt 
1 AA
2 BB
3 CC
6 FFMac-mini-Piotr:small fulmanp$ paste data06_AAA.txt data06_BBB.txt 
1 A	1 AA
2 B	2 BB
3 C	3 CC
4 D	6 FF
5 E

With -d we can change the delimiter placed between the pasted lines

Mac-mini-Piotr:small fulmanp$ paste -d : data06_AAA.txt data06_BBB.txt
1 A:1 AA
2 B:2 BB
3 C:3 CC
4 D:6 FF
5 E:

With the -s option paste processes one file at a time instead of in parallel, that is, it merges the files sequentially. It reads all the lines from a single file and merges them into a single line, with the original lines separated by tabs. The resulting lines, one per file, are separated by newlines.

On MacOS the result is odd

Mac-mini-Piotr:small fulmanp$ paste -s data06_AAA.txt data06_BBB.txt 
1 A	2 B	3 C	4 D	5 E1 AA	2 BB	3 CC	6 FF

while on Linux it seems to be correct

tdp@tdp-VirtualBox:~$ paste -s data06_AAA.txt data06_BBB.txt
1 A 2 B 3 C 4 D 5 E
1 AA 2 BB 3 CC 6 FF

The effect of -s is much clearer for one-column files

Mac-mini-Piotr:small fulmanp$ cat d1.txt 
1
2
3
4
Mac-mini-Piotr:small fulmanp$ cat d2.txt 
A
B
C
D
Mac-mini-Piotr:small fulmanp$ paste d1.txt d2.txt 
1	A
2	B
3	C
4	D
Mac-mini-Piotr:small fulmanp$ paste -s d1.txt d2.txt 
1	2	3	4
A	B	C	D
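
A frequent use of -s together with -d is turning a column of values into a single delimited line. A small sketch reading from standard input:

```shell
# Join the lines of standard input with commas. The explicit - operand
# names standard input, which some paste implementations require.
csv=$(printf '%s\n' 1 2 3 4 | paste -s -d ',' -)
echo "$csv"
```

This should print 1,2,3,4.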

sort

The sort command rearranges the lines in a text file so that they are sorted numerically and alphabetically.

By default, the rules for sorting are

Lines starting with a number will appear before lines starting with a letter.
Lines starting with a letter that appears earlier in the alphabet will appear before lines starting with a letter that appears later in the alphabet.
Lines starting with a lowercase letter will appear before lines starting with the same letter in uppercase.
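
These rules depend on the current locale. As a sketch of the difference, forcing the C locale makes sort compare raw byte values, which puts uppercase ASCII letters before lowercase ones (the sample letters are invented):

```shell
letters='b
A
a
B'

# In the C locale sort compares raw byte values, so every uppercase
# ASCII letter comes before every lowercase one.
c_order=$(printf '%s\n' "$letters" | LC_ALL=C sort | paste -s -d ' ' -)
echo "$c_order"
```

This should print A B a b. Setting LC_ALL=C is also a simple way to get the same ordering on different systems.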

sort has many options; please refer to the man pages to get to know all of them. Only some of the most common examples are given below.

sort -- usage examples

Consider the following data07.txt file

MacBook-Air-Piotr:small fulmanp$ cat data07.txt 
4 a b a:c
2 a b b:a
2 a a a:a
1 b b b:b
3 a b a:c

To sort the lines in this file alphabetically, use the following command

MacBook-Air-Piotr:small fulmanp$ sort data07.txt 
1 b b b:b
2 a a a:a
2 a b b:a
3 a b a:c
4 a b a:c

We can use -o option to save sorting result in a file

MacBook-Air-Piotr:small fulmanp$ sort -o data07_sorted.txt data07.txt 
MacBook-Air-Piotr:small fulmanp$ cat data07_sorted.txt 
1 b b b:b
2 a a a:a
2 a b b:a
3 a b a:c
4 a b a:c

To sort the lines in reverse order

MacBook-Air-Piotr:small fulmanp$ sort -r data07.txt
4 a b a:c
3 a b a:c
2 a b b:a
2 a a a:a
1 b b b:b

Checking for sorted order

MacBook-Air-Piotr:small fulmanp$ sort -c data07.txt 
sort: data07.txt:2: disorder: 2 a b b:a
MacBook-Air-Piotr:small fulmanp$ sort -c data07_sorted.txt 
MacBook-Air-Piotr:small fulmanp$

Sorting based on selected fields of data
Normally, sort decides how to order lines based on the entire line: it compares every character from the first character in a line to the last one. Even leading whitespace matters

MacBook-Air-Piotr:small fulmanp$ cat data09.txt 
a
 b
c
  d
MacBook-Air-Piotr:small fulmanp$ sort data09.txt 
  d
 b
a
c

To ignore leading blanks, use the -b option

MacBook-Air-Piotr:small fulmanp$ sort -b data09.txt 
a
 b
c
  d

If we want sort to compare only a limited part of every line, we can specify which fields to compare using the -k option (fields are separated by whitespace unless we specify another character with the -t option).

MacBook-Air-Piotr:small fulmanp$ sort -k 3 data07.txt 
2 a a a:a
3 a b a:c
4 a b a:c
2 a b b:a
1 b b b:b

Keep in mind that -k 3 means sort starting with column 3 rather than sort based (only) on column 3. With -k 3 the sort key begins at column 3 and extends to the end of the line, spanning all the fields in between. If we want to sort based only on column 3, we should specify the starting field as well as the ending field

MacBook-Air-Piotr:small fulmanp$ sort -k 3,3 data07.txt 
2 a a a:a
1 b b b:b
2 a b b:a
3 a b a:c
4 a b a:c
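
The -t option mentioned earlier combines naturally with a bounded key. The sketch below sorts the same sample data on the part after the colon; LC_ALL=C is added here only to make the result independent of the locale.

```shell
workdir=$(mktemp -d)
printf '%s\n' '4 a b a:c' '2 a b b:a' '2 a a a:a' '1 b b b:b' '3 a b a:c' \
  > "$workdir/data07.txt"

# -t ':' makes the colon the field separator, so field 2 is the single
# letter after the colon. LC_ALL=C keeps the comparison byte-based.
by_suffix=$(LC_ALL=C sort -t ':' -k 2,2 "$workdir/data07.txt")
printf '%s\n' "$by_suffix"

rm -rf "$workdir"
```

Lines whose keys compare equal are ordered by a final comparison of the whole line, which is why 3 a b a:c comes before 4 a b a:c.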

We can do even more, and specify a start and end position by character within a field

MacBook-Air-Piotr:small fulmanp$ sort data07.txt 
1 b b b:b
2 a a a:a
2 a b b:a
3 a b a:c
4 a b a:c
MacBook-Air-Piotr:small fulmanp$ sort -k 4 data07.txt 
2 a a a:a
3 a b a:c
4 a b a:c
2 a b b:a
1 b b b:b
MacBook-Air-Piotr:small fulmanp$ sort -k 4.3 data07.txt 
2 a a a:a
2 a b b:a
1 b b b:b
3 a b a:c
4 a b a:c

We may also sort the contents of a file based upon more than one column

MacBook-Air-Piotr:small fulmanp$ cat data07.txt 
4 a b a:c
2 a b b:a
2 a a a:a
1 b b b:b
3 a b a:c
MacBook-Air-Piotr:small fulmanp$ sort data07.txt 
1 b b b:b
2 a a a:a
2 a b b:a
3 a b a:c
4 a b a:c

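Because the following does not seem to sort based on the first character of field 4 (without -b, the blank preceding a field counts as part of that field, so position 4.1 is that blank)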

MacBook-Air-Piotr:small fulmanp$ sort -k 4.1,4.1 data07.txt 
1 b b b:b
2 a a a:a
2 a b b:a
3 a b a:c
4 a b a:c

we can try

MacBook-Air-Piotr:small fulmanp$ sort -k 4.1,4.2 data07.txt 
2 a a a:a
3 a b a:c
4 a b a:c
1 b b b:b
2 a b b:a

MacBook-Air-Piotr:small fulmanp$ sort -k 4.1,4.2 -k 2 data07.txt 
2 a a a:a
3 a b a:c
4 a b a:c
2 a b b:a
1 b b b:b
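
An alternative to widening the key to 4.1,4.2 is to tell sort to skip the leading blank of the field with -b, so that position 4.1 refers to the first visible character. A minimal sketch on the same sample data, with LC_ALL=C added for a locale-independent result:

```shell
workdir=$(mktemp -d)
printf '%s\n' '4 a b a:c' '2 a b b:a' '2 a a a:a' '1 b b b:b' '3 a b a:c' \
  > "$workdir/data07.txt"

# -b skips leading blanks when locating key positions, so 4.1,4.1 is
# exactly the first non-blank character of field 4.
by_first_char=$(LC_ALL=C sort -b -k 4.1,4.1 "$workdir/data07.txt")
printf '%s\n' "$by_first_char"

rm -rf "$workdir"
```

The three lines whose fourth field starts with a should come first, followed by the two starting with b.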

To sort the contents numerically

MacBook-Air-Piotr:small fulmanp$ cat data08.txt 
20
11
10
2
1
21MacBook-Air-Piotr:small fulmanp$ sort data08.txt 
1
10
11
2
20
21
MacBook-Air-Piotr:small fulmanp$ sort -n data08.txt 
1
2
10
11
20
21

Remove duplicates with -u option

MacBook-Air-Piotr:small fulmanp$ sort data07.txt 
1 b b b:b
2 a a a:a
2 a b b:a
3 a b a:c
4 a b a:c
MacBook-Air-Piotr:small fulmanp$ sort -u data07.txt 
1 b b b:b
2 a a a:a
2 a b b:a
3 a b a:c
4 a b a:c
MacBook-Air-Piotr:small fulmanp$ sort -k 1,1 -u data07.txt 
1 b b b:b
2 a b b:a
3 a b a:c
4 a b a:c

Sort using human readable numbers

MacBook-Air-Piotr:small fulmanp$ cat data10.txt
1M
2G
3K
MacBook-Air-Piotr:small fulmanp$ sort -h data10.txt
3K
1M
2G

Merge already sorted files

MacBook-Air-Piotr:small fulmanp$ cat data11_1.txt
1
3
5
MacBook-Air-Piotr:small fulmanp$ cat data11_2.txt
2
4
6
MacBook-Air-Piotr:small fulmanp$ sort -m data11_2.txt data11_1.txt 
1
2
3
4
5
6

split

The split command is used to split a file into pieces. By default a large file is divided into a set of smaller files of 1000 lines each, named with the default prefix x followed by the suffixes aa, ab, ac, etc. (so the full file names are xaa, xab, xac, etc.).

Typically split accepts the following options

-a N use suffixes of length N (default 2)
-b SIZE put SIZE bytes per output file
-C SIZE put at most SIZE bytes of lines per output file
-d use numeric suffixes instead of alphabetic
-l NUMBER put NUMBER lines per output file
-x use hex suffixes instead of alphabetic
-n CHUNKS generate CHUNKS output files

SIZE is an integer optionally followed by one of the following multipliers: KB = 1000 bytes, K = 1024 bytes, MB = 1000*1000 bytes, M = 1024*1024 bytes, and so on for G, T, P, E, Z, Y.

CHUNKS may be

N split into N files based on size of input
K/N output Kth of N to stdout
l/N split into N files without splitting lines/records
l/K/N output Kth of N to stdout without splitting lines/records
r/N like l but use round robin distribution
r/K/N likewise but only output Kth of N to stdout

On MacOS another option is also available

-p PATTERN The file is split whenever an input line matches PATTERN, which is interpreted as an extended regular expression. The matching line will be the first line of the next output file.

split -- usage examples

Create dummy files

Two files with random human readable bytes

MacBook-Air-Piotr:small fulmanp$ base64 /dev/urandom | head -c $((1024)) > 1024_B.txt
MacBook-Air-Piotr:small fulmanp$ base64 /dev/urandom | head -c $((1000)) > 1000_B.txt
MacBook-Air-Piotr:small fulmanp$ ls -l
total 208
-rw-r--r--  1 fulmanp  staff  1000  5 gru 23:52 1000_B.txt
-rw-r--r--  1 fulmanp  staff  1024  5 gru 23:52 1024_B.txt

File with a line foo bar repeated 256 times

MacBook-Air-Piotr:small fulmanp$ cat - > data13.txt 
foo bar
[Press Ctrl+D]
MacBook-Air-Piotr:small fulmanp$ for i in {1..8}; do cat data13.txt data13.txt > tmp.txt && mv tmp.txt data13.txt; done
MacBook-Air-Piotr:small fulmanp$ wc data13.txt 
     256     512    2048 data13.txt

The wc command used above displays the number of lines, words, and bytes contained in input file.

More information about generating dummy files can be found in the articles How To Quickly Generate A Large File On The Command Line (With Linux) and How To Create Files Of A Certain Size In Linux.

Split a file into pieces with a custom number of lines

MacBook-Air-Piotr:small fulmanp$ split data13.txt 
MacBook-Air-Piotr:small fulmanp$ ls -l | grep xa
-rw-r--r--  1 fulmanp  staff  2048  6 gru 00:26 xaa
MacBook-Air-Piotr:small fulmanp$ wc xa*
     256     512    2048 xaa
MacBook-Air-Piotr:small fulmanp$ split -l 16 data13.txt
MacBook-Air-Piotr:small fulmanp$ ls -l | grep xa
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xaa
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xab
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xac
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xad
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xae
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xaf
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xag
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xah
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xai
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xaj
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xak
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xal
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xam
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xan
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xao
-rw-r--r--  1 fulmanp  staff   128  6 gru 00:24 xap
MacBook-Air-Piotr:small fulmanp$ wc xa*
      16      32     128 xaa
      16      32     128 xab
      16      32     128 xac
      16      32     128 xad
      16      32     128 xae
      16      32     128 xaf
      16      32     128 xag
      16      32     128 xah
      16      32     128 xai
      16      32     128 xaj
      16      32     128 xak
      16      32     128 xal
      16      32     128 xam
      16      32     128 xan
      16      32     128 xao
      16      32     128 xap
     256     512    2048 total

Split a file into pieces of a given size in bytes

MacBook-Air-Piotr:small fulmanp$ split -b 100 1000_B.txt 
MacBook-Air-Piotr:small fulmanp$ wc xa*
       0       1     100 xaa
       0       1     100 xab
       0       1     100 xac
       0       1     100 xad
       0       1     100 xae
       0       1     100 xaf
       0       1     100 xag
       0       1     100 xah
       0       1     100 xai
       0       1     100 xaj
       0      10    1000 total

Create files with numeric suffixes instead of alphabetic ones
Unfortunately the -d option did not work with the split shipped with MacOS used here; it should work on Linux (GNU split)

MacBook-Air-Piotr:small fulmanp$ split -l 16 -d data13.txt
???
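
Where -d is not available, numeric suffixes can still be obtained by renaming the pieces after splitting. The following is only a sketch under assumptions: the input file data.txt, the prefix part_ and the target names num_NN are made up for illustration and are not part of the examples above.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 48 lines, so that split -l 16 yields exactly 3 pieces.
seq 1 48 > data.txt

# Alphabetic suffixes: part_aa, part_ab, part_ac.
split -l 16 data.txt part_

# Rename the pieces, in glob (alphabetical) order, to num_00, num_01, ...
i=0
for piece in part_*; do
    mv "$piece" "$(printf 'num_%02d' "$i")"
    i=$((i + 1))
done

renamed=$(ls num_*)
echo "$renamed"
```

Because the shell expands part_* in sorted order, the numeric names keep the original order of the pieces.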

Create files with a custom prefix

MacBook-Air-Piotr:small fulmanp$ split -l 16 data13.txt data13_ 
MacBook-Air-Piotr:small fulmanp$ wc data13_*
      16      32     128 data13_aa
      16      32     128 data13_ab
      16      32     128 data13_ac
      16      32     128 data13_ad
      16      32     128 data13_ae
      16      32     128 data13_af
      16      32     128 data13_ag
      16      32     128 data13_ah
      16      32     128 data13_ai
      16      32     128 data13_aj
      16      32     128 data13_ak
      16      32     128 data13_al
      16      32     128 data13_am
      16      32     128 data13_an
      16      32     128 data13_ao
      16      32     128 data13_ap
     256     512    2048 total

Divide a file into a given number of chunks
Unfortunately the -n option did not work with the split shipped with MacOS used here; it should work on Linux (GNU split)

MacBook-Air-Piotr:small fulmanp$ wc 1024_B.txt 
       0       1    1024 1024_B.txt
MacBook-Air-Piotr:small fulmanp$ md5 1024_B.txt 
MD5 (1024_B.txt) = 1ac07d0da8a3f019b4a4d15e26668113
MacBook-Air-Piotr:small fulmanp$ split -n 4 1024_B.txt 
???
merge with
cat xa* > 1024_B_2.txt
MacBook-Air-Piotr:small fulmanp$ md5 1024_B_2.txt
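
Where -n is not available, the same split-and-merge round trip can be sketched with -b. This is an illustration under assumptions: the file is generated on the spot as a stand-in for 1024_B.txt, the prefix chunk_ is arbitrary, and cmp is used instead of md5 because the name of the checksum tool differs between systems.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in input: 1024 bytes, all 'B', without any newline.
head -c 1024 /dev/zero | tr '\0' 'B' > 1024_B.txt

# 1024 / 4 = 256, so -b 256 gives the four equal chunks that -n 4 would give.
split -b 256 1024_B.txt chunk_

# The shell expands chunk_* in sorted order, so cat restores the original order.
cat chunk_* > 1024_B_2.txt

chunk_count=$(ls chunk_* | wc -l | tr -d ' ')
if cmp -s 1024_B.txt 1024_B_2.txt; then
    roundtrip=ok
else
    roundtrip=differs
fi
echo "chunks: $chunk_count, round trip: $roundtrip"
```

If the merged file is byte-for-byte identical to the original, cmp -s exits with status 0 and the checksums of both files must agree as well.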

Create files with a custom suffix length

MacBook-Air-Piotr:small fulmanp$ split -l 16 -a 5 data13.txt
MacBook-Air-Piotr:small fulmanp$ wc xa*
      16      32     128 xaaaaa
      16      32     128 xaaaab
      16      32     128 xaaaac
      16      32     128 xaaaad
      16      32     128 xaaaae
      16      32     128 xaaaaf
      16      32     128 xaaaag
      16      32     128 xaaaah
      16      32     128 xaaaai
      16      32     128 xaaaaj
      16      32     128 xaaaak
      16      32     128 xaaaal
      16      32     128 xaaaam
      16      32     128 xaaaan
      16      32     128 xaaaao
      16      32     128 xaaaap
     256     512    2048 total

tail

The tail command is a command-line utility for printing the last part of files. By default, tail prints the last ten lines of each file it is given. Compared to head, tail has a few more options and one very useful feature: it can follow a file and print new data as it is appended, which makes it suitable for real-time monitoring of file changes.

The general syntax is as follows

tail [-F | -f | -r] [-q] [-b number | -c number | -n number] [file ...]

-c [+|-]K Output the last K bytes. A number with a leading plus sign (+) is counted from the beginning of the input. A number with a leading minus sign (-) or no explicit sign is counted from the end of the input.
-n [+|-]K Output the last K lines, instead of the default last 10. A leading + or - sign has the same meaning as described for -c.
-f or --follow[={name|descriptor}] Output appended data as the file grows. This option causes tail to loop forever, checking for new data at the end of the file(s). When new data appears, it is printed. If we follow more than one file, a header is printed to indicate which file's data is being printed. If the file shrinks instead of growing, tail lets us know with a message. If we specify name, the file with that name is followed, regardless of its file descriptor. If we specify descriptor, the same file is followed, even if it is renamed; this is the default behavior.
-f, --follow, and --follow=descriptor are equivalent.
--retry Keep trying to open a file even when it is or becomes inaccessible; useful when following by name, i.e., with --follow=name.
-F Same as --follow=name --retry.
-q Never output headers giving file names.
-v Always output headers giving file names.
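
The effect of the header options can be sketched as follows. The file names first.txt and second.txt and their contents are assumptions made up for this illustration.

```shell
# Work in a scratch directory so that no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'a:1\na:2\n' > first.txt
printf 'b:1\nb:2\n' > second.txt

# With more than one file, tail prints a "==> name <==" header before each part.
with_headers=$(tail -n 1 first.txt second.txt)
echo "$with_headers"

# -q suppresses the headers, so only the selected lines remain.
without_headers=$(tail -q -n 1 first.txt second.txt)
echo "$without_headers"
```

The -q form is convenient when the output of several files is passed on to another command that should see only data lines.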

Again, as for head, K may have a multiplier suffix

b 512,
kB 1000, K 1024,
MB 1000*1000, M 1024*1024,
GB 1000*1000*1000, G 1024*1024*1024,
and so on for T, P, E, Z, Y.

The complementary command to tail is head.
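
Since tail -n +K starts at line K and head -n N stops after N lines, the two commands can be combined to cut out a range of lines. A minimal sketch, in which the input generated by seq is an assumption chosen only for illustration:

```shell
# Lines 3..5 of an 11-line input:
# tail -n +3 starts the output at line 3, head -n 3 keeps three lines of it.
middle=$(seq 1 11 | tail -n +3 | head -n 3)
echo "$middle"
```

Here the output consists of the lines 3, 4 and 5.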

tail -- usage examples

Mac-mini-Piotr:small fulmanp$ cat data03.txt 
01: foo
02: Foo
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
09: baR
10: BAR
11: foo bar
Mac-mini-Piotr:small fulmanp$ tail data03.txt 
02: Foo
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
09: baR
10: BAR
11: foo bar
Mac-mini-Piotr:small fulmanp$ tail -c 35 data03.txt 
8: bAr
09: baR
10: BAR
11: foo bar
Mac-mini-Piotr:small fulmanp$ tail -c +35 data03.txt 
: FoO
06: bar
07: Bar
08: bAr
09: baR
10: BAR
11: foo bar
Mac-mini-Piotr:small fulmanp$ tail -c -35 data03.txt 
8: bAr
09: baR
10: BAR
11: foo bar
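
The meaning of the sign is easier to see on a short input without newlines. A small sketch, where the ten-letter string is an assumption used only for illustration:

```shell
# No sign (or a minus sign): count from the end, so the last 3 bytes are printed.
last_three=$(printf 'abcdefghij' | tail -c 3)
echo "$last_three"

# Plus sign: count from the beginning, so the output starts at byte number 3.
from_third=$(printf 'abcdefghij' | tail -c +3)
echo "$from_third"
```

The first command prints hij, the second prints cdefghij: with +3 the first two bytes are skipped, not the first three.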

Mac-mini-Piotr:small fulmanp$ tail -n 2 data03.txt 
10: BAR
11: foo bar
Mac-mini-Piotr:small fulmanp$ tail -n +2 data03.txt 
02: Foo
03: fOo
04: foO
05: FoO
06: bar
07: Bar
08: bAr
09: baR
10: BAR
11: foo bar
Mac-mini-Piotr:small fulmanp$ tail -n -2 data03.txt 
10: BAR
11: foo bar

Mac-mini-Piotr:small fulmanp$ touch data04.txt
Mac-mini-Piotr:small fulmanp$ touch data05.txt
Mac-mini-Piotr:small fulmanp$ ls -l
total 24
-rw-r--r--  1 fulmanp  staff  60 22 lis 23:50 data01.txt
-rw-r--r--  1 fulmanp  staff   3 23 lis 21:47 data02.txt
-rw-r--r--  1 fulmanp  staff  92 24 lis 10:50 data03.txt
-rw-r--r--  1 fulmanp  staff   0 24 lis 21:59 data04.txt
-rw-r--r--  1 fulmanp  staff   0 24 lis 21:59 data05.txt
Mac-mini-Piotr:small fulmanp$ tail -f data04.txt data05.txt 

==> data04.txt <==

==> data05.txt <==

==> data04.txt <==
a:1
a:2

==> data05.txt <==
b:1
b:2

==> data04.txt <==
a:3
^C

Last login: Sat Nov 24 21:01:45 on ttys002
Mac-mini-Piotr:small fulmanp$ ls -l
total 24
-rw-r--r--  1 fulmanp  staff  60 22 lis 23:50 data01.txt
-rw-r--r--  1 fulmanp  staff   3 23 lis 21:47 data02.txt
-rw-r--r--  1 fulmanp  staff  92 24 lis 10:50 data03.txt
-rw-r--r--  1 fulmanp  staff   0 24 lis 21:59 data04.txt
-rw-r--r--  1 fulmanp  staff   0 24 lis 21:59 data05.txt
Mac-mini-Piotr:small fulmanp$ echo "a:1" > data04.txt 
Mac-mini-Piotr:small fulmanp$ echo "a:2" > data04.txt 
Mac-mini-Piotr:small fulmanp$ echo "b:1" > data05.txt 
Mac-mini-Piotr:small fulmanp$ echo "b:2" > data05.txt 
Mac-mini-Piotr:small fulmanp$ echo "a:3" > data04.txt 
Mac-mini-Piotr:small fulmanp$ mv data04.txt data04_modified.txt 
Mac-mini-Piotr:small fulmanp$ echo "a:4" > data04.txt

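The same, but with option -F instead of -f. Because -F follows the file by name, the line a:4 written to the newly created data04.txt after the rename is printed as well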

Mac-mini-Piotr:small fulmanp$ touch data04.txt data05.txt
Mac-mini-Piotr:small fulmanp$ ls -l
total 24
-rw-r--r--  1 fulmanp  staff  60 22 lis 23:50 data01.txt
-rw-r--r--  1 fulmanp  staff   3 23 lis 21:47 data02.txt
-rw-r--r--  1 fulmanp  staff  92 24 lis 10:50 data03.txt
-rw-r--r--  1 fulmanp  staff   0 24 lis 22:05 data04.txt
-rw-r--r--  1 fulmanp  staff   0 24 lis 22:05 data05.txt
Mac-mini-Piotr:small fulmanp$ tail -F data04.txt data05.txt 

==> data04.txt <==

==> data05.txt <==

==> data04.txt <==
a:1
a:2

==> data05.txt <==
b:1
b:2

==> data04.txt <==
a:3
a:4
^C

Mac-mini-Piotr:small fulmanp$ ls -l
total 24
-rw-r--r--  1 fulmanp  staff  60 22 lis 23:50 data01.txt
-rw-r--r--  1 fulmanp  staff   3 23 lis 21:47 data02.txt
-rw-r--r--  1 fulmanp  staff  92 24 lis 10:50 data03.txt
-rw-r--r--  1 fulmanp  staff   0 24 lis 22:05 data04.txt
-rw-r--r--  1 fulmanp  staff   0 24 lis 22:05 data05.txt
Mac-mini-Piotr:small fulmanp$ echo "a:1" >> data04.txt
Mac-mini-Piotr:small fulmanp$ echo "a:2" >> data04.txt
Mac-mini-Piotr:small fulmanp$ echo "b:1" >> data05.txt
Mac-mini-Piotr:small fulmanp$ echo "b:2" >> data05.txt
Mac-mini-Piotr:small fulmanp$ echo "a:3" >> data04.txt
Mac-mini-Piotr:small fulmanp$ mv data04.txt data04_modified.txt 
Mac-mini-Piotr:small fulmanp$ echo "a:4" >> data04.txt

tr

The tr command is used to translate specified characters into other characters. It can also be used to delete specified characters or to squeeze repeated characters.

In contrast to many command line programs, tr does not accept file names as arguments (i.e., as input data). Instead, it reads only from standard input, which can be redirected from a file or piped from the output of another program; it writes to standard output.

The general syntax of tr is

tr [options] set1 [set2]

In particular, on MacOS we have

MacBook-Air-Piotr:small fulmanp$ tr
usage: tr [-Ccsu] string1 string2
       tr [-Ccu] -d string1
       tr [-Ccu] -s string1
       tr [-Ccu] -ds string1 string2

The first argument, set1, lists the characters in the text to be replaced or removed. The second, set2, lists the characters that are substituted for the characters listed in the first argument. If both set1 and set2 are specified and the -d option is not, the command replaces each character in set1 with the character at the same position in set2. Since input characters from set1 are mapped to the corresponding characters of set2, it is reasonable for both sets to have equal length. If this is not the case, no error is generated, but two rules are applied to make them equal

If set2 is shorter than set1, the last character of set2 is repeated until set2 is as long as set1 (compare the character range example in the usage examples, where 'a-z' is mapped to '#@!').

If the length of set2 exceeds the length of set1, the excess characters in set2 are ignored.
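
Both rules can be checked with two short commands. This is a sketch of the behaviour of GNU and BSD (MacOS) tr; other implementations may treat sets of unequal length differently. The input strings are assumptions chosen for illustration.

```shell
# set2 ('xy') is shorter than set1 ('a-f'): its last character, y,
# is reused for c, d, e and f.
padded=$(echo 'abcdef' | tr 'a-f' 'xy')
echo "$padded"

# set2 ('xyz') is longer than set1 ('ab'): the extra z is never used,
# and c is left alone because it is not in set1.
truncated=$(echo 'abc' | tr 'ab' 'xyz')
echo "$truncated"
```

The first command prints xyyyyy and the second prints xyc.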

More precisely, each set can be specified in several ways, not only by listing single characters

Enumeration of characters, as in (see example below)

tr '{}' '()' < infile > outfile

Character ranges, as in (see example below)

tr 'A-Z' 'a-z' < infile > outfile

Using POSIX character classes. Each consists of a word (or abbreviation) surrounded by colons and then enclosed in a set of square brackets. So the sequence [:class:] represents all characters belonging to the defined character class, and class names are
- alnum alphanumeric characters,
- alpha alphabetic characters,
- cntrl control (non-printing) characters,
- digit numeric characters,
- graph graphic characters,
- lower lower-case alphabetic characters,
- print printable characters,
- punct punctuation characters,
- space whitespace characters,
- upper upper-case characters,
- xdigit hexadecimal digits 0-9, A-F and a-f.
They can be used like in (see example below)

tr '[:upper:]' '[:lower:]' < infile > outfile

Classes can be combined to form a more complex set, for example '[:lower:][:digit:]' (see example below)

We can also mix all of the above methods (see example below).
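
The three ways of writing a set can be sketched on inline input. The input strings are assumptions made up for this illustration; the sets are quoted so that the shell passes them to tr unchanged, and -d (delete) is used in the last two commands so that only one set is needed.

```shell
# A single class on each side: upper-case letters become lower-case.
lowered=$(echo 'Hello World 42' | tr '[:upper:]' '[:lower:]')
echo "$lowered"

# Two classes combined into one set: delete lower-case letters and digits.
stripped=$(echo 'Hello42World' | tr -d '[:lower:][:digit:]')
echo "$stripped"

# A range, a class and an enumerated character mixed in one set.
mixed=$(echo 'a1-b2_c3' | tr -d 'a-c[:digit:]_')
echo "$mixed"
```

The commands print hello world 42, HW and a single - respectively.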

Typically tr accepts three options

-c Converts the set to the complement of the listed characters, i.e., operations apply to characters not in the given set.
-d Delete characters in the first set from the output.
-s Squeeze multiple occurrences of the characters listed in the last operand (either set1 or set2) in the input into a single instance of the character. This occurs after all deletion and translation is completed.

On MacOS two more options (-C and -u) are available. Note that -c has a slightly different meaning there; -C on MacOS corresponds to -c on Linux

-C Complement the set of characters in set1.
-c Same as -C but complement the set of values in string1.
-u Guarantee that any output is unbuffered.
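
A compact sketch of the three common options on inline input (the input strings are assumptions chosen for illustration):

```shell
# -d: delete every character that belongs to the set.
deleted=$(echo 'a1b2c3' | tr -d '[:digit:]')
echo "$deleted"

# -s: squeeze each run of spaces into a single space.
squeezed=$(echo 'aaa   bbb' | tr -s ' ')
echo "$squeezed"

# -c combined with -d: delete everything that is NOT in the set
# (this includes the trailing newline produced by echo).
digits_only=$(echo 'a1b2c3' | tr -cd '[:digit:]')
echo "$digits_only"
```

The commands print abc, aaa bbb and 123. Note that -s leaves the run of a characters untouched, because only the space is listed in the set.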

tr -- usage examples

We will use the following test file data12.txt

foo   bar 1  2   3
oof:rab:3:2:1
123foobar
FOO   BAR 1  2   3

MacBook-Air-Piotr:small fulmanp$ cat data12.txt 
foo   bar 1  2   3
oof:rab:3:2:1
123foobar
FOO   BAR 1  2   3

Replace : with -

MacBook-Air-Piotr:small fulmanp$ tr : - < data12.txt > res.txt
MacBook-Air-Piotr:small fulmanp$ cat res.txt 
foo   bar 1  2   3
oof-rab-3-2-1
123foobar
FOO   BAR 1  2   3

Alternatively, we can use a pipe

MacBook-Air-Piotr:small fulmanp$ cat data12.txt | tr : - > res.txt
MacBook-Air-Piotr:small fulmanp$ cat res.txt 
foo   bar 1  2   3
oof-rab-3-2-1
123foobar
FOO   BAR 1  2   3

Replace using enumeration of characters (replace more than one character)

MacBook-Air-Piotr:small fulmanp$ tr 'fob' '#@!' < data12.txt
#@@   !ar 1  2   3
@@#:ra!:3:2:1
123#@@!ar
FOO   BAR 1  2   3

Replace using character ranges

MacBook-Air-Piotr:small fulmanp$ tr 'a-z' '#' < data12.txt
###   ### 1  2   3
###:###:3:2:1
123######
FOO   BAR 1  2   3
MacBook-Air-Piotr:small fulmanp$ tr 'a-z' '#@!' < data12.txt
!!!   @#! 1  2   3
!!!:!#@:3:2:1
123!!!@#!
FOO   BAR 1  2   3

Delete specified characters

MacBook-Air-Piotr:small fulmanp$ tr -d fo < data12.txt
   bar 1  2   3
:rab:3:2:1
123bar
FOO   BAR 1  2   3

Squeeze repetition of characters

MacBook-Air-Piotr:small fulmanp$ tr [:space:] '?' < data12.txt 
foo???bar?1??2???3?oof:rab:3:2:1?123foobar?FOO???BAR?1??2???3?MacBook-Air-Piotr:small fulmanp$ 
MacBook-Air-Piotr:small fulmanp$ tr -s [:space:] '?' < data12.txt 
foo?bar?1?2?3?oof:rab:3:2:1?123foobar?FOO?BAR?1?2?3?MacBook-Air-Piotr:small fulmanp$
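
In the session above [:space:] is passed unquoted. This works as long as no file in the current directory has a one-character name matching the shell pattern [:space:] (one of : s p a c e); otherwise the shell replaces the argument with that file name before tr sees it. The following sketch demonstrates the difference; the scratch directory and the file named a are assumptions created only for this demonstration.

```shell
# Work in a scratch directory that contains a single file named "a".
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a

# Unquoted: the shell expands [:space:] to the matching file name "a",
# so tr is actually run as: tr a '?'
unquoted=$(printf 'a b\n' | tr [:space:] '?')
echo "$unquoted"

# Quoted: tr receives the class itself; the space and the newline become '?'.
quoted=$(printf 'a b\n' | tr '[:space:]' '?')
echo "$quoted"
```

Quoting the sets, as in '[:space:]', avoids this dependence on the contents of the current directory.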

Complement the sets

MacBook-Air-Piotr:small fulmanp$ tr : - < data12.txt
foo   bar 1  2   3
oof-rab-3-2-1
123foobar
FOO   BAR 1  2   3
MacBook-Air-Piotr:small fulmanp$ tr -c : - < data12.txt
----------------------:---:-:-:-------------------------------MacBook-Air-Piotr:small fulmanp$ 
MacBook-Air-Piotr:small fulmanp$ tr -C : - < data12.txt
----------------------:---:-:-:-------------------------------MacBook-Air-Piotr:small fulmanp$

Using POSIX character classes and mixed set specification

MacBook-Air-Piotr:small fulmanp$ tr [:digit:] - < data12.txt
foo   bar -  -   -
oof:rab:-:-:-
---foobar
FOO   BAR -  -   -
MacBook-Air-Piotr:small fulmanp$ tr [:digit:]r - < data12.txt
foo   ba- -  -   -
oof:-ab:-:-:-
---fooba-
FOO   BAR -  -   -
MacBook-Air-Piotr:small fulmanp$ tr [:digit:]r[:lower:] - < data12.txt
---   --- -  -   -
---:---:-:-:-
---------
FOO   BAR -  -   -

Difference between -c and -C (in the session below both options give identical results; who can explain this?)

MacBook-Air-Piotr:small fulmanp$ ls -l | grep data02.txt 
-rw-r--r--  1 fulmanp  staff   3 23 lis 21:47 data02.txt
MacBook-Air-Piotr:small fulmanp$ cat data02.txt 
Ą
MacBook-Air-Piotr:small fulmanp$ od -t x1 data02.txt 
0000000    c4  84  0a                                                    
0000003
MacBook-Air-Piotr:small fulmanp$ tr -c 'Ą' '#' < data02.txt > res.txt
MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt 
0000000    c4  84  23                                                    
0000003
MacBook-Air-Piotr:small fulmanp$ tr -C 'Ą' '#' < data02.txt > res.txt
MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt 
0000000    c4  84  23                                                    
0000003
MacBook-Air-Piotr:small fulmanp$ tr -c '\x84' '#' < data02.txt > res.txt
MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt 
0000000    23  23                                                        
0000002
MacBook-Air-Piotr:small fulmanp$ tr -C '\x84' '#' < data02.txt > res.txt
MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt 
0000000    23  23                                                        
0000002
MacBook-Air-Piotr:small fulmanp$ tr -c '\xc4' '#' < data02.txt > res.txt
MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt 
0000000    23  23                                                        
0000002
MacBook-Air-Piotr:small fulmanp$ tr -C '\xc4' '#' < data02.txt > res.txt
MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt 
0000000    23  23                                                        
0000002
MacBook-Air-Piotr:small fulmanp$
