In this part we discuss a handful of very simple tools. Unlike sed or awk, all of them are easy to learn and use, which is why we collected them in one chapter. We cover only the most frequently used options; for more details please refer to the man pages.
In this part we cover the following tools
- cut
- dd ???
- grep
- head
- join
- less ???
- nl ???
- od ???
- paste
- seq ???
- sort
- split
- tail
- tee ???
- tr
- uniq ???
- wc ???
cut
The cut command is a UNIX command-line utility that cuts sections from each line of input and writes the result to standard output. It can select parts of a line by byte position (-b), by character (-c), or by field (-f, with -d to specify a delimiter other than the default tab character). In each case a range must be provided, consisting of one of N, N-M, N- (N to the end of the line), or -M (beginning of the line to M), where N and M are counted from 1 (there is no zeroth value). Several ranges can be combined with commas.
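As a quick illustration of the range forms described above, the following sketch applies each of them to the same six-character string. The sample string and the variable names are made up for this example.

```shell
# Sample input; the string and variable names are illustrative only.
line='abcdef'

first_two=$(printf '%s\n' "$line" | cut -c -2)    # -M : start of line up to position 2
middle=$(printf '%s\n' "$line" | cut -c 3-4)      # N-M: positions 3 through 4
rest=$(printf '%s\n' "$line" | cut -c 5-)         # N- : position 5 to the end of the line
picked=$(printf '%s\n' "$line" | cut -c 1,3,5-6)  # several ranges joined with commas

printf '%s %s %s %s\n' "$first_two" "$middle" "$rest" "$picked"
```

Because the sample contains only single-byte characters, -b would give the same results here as -c.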
Below is a list of all usable options (except help and version, which are skipped because most UNIX commands provide them)
-b, --bytes=RANGE
    Select only the bytes from each line as specified in RANGE. RANGE specifies a byte, a set of bytes, or a range of bytes, as described above.
-c, --characters=RANGE
    Select only the characters from each line as specified in RANGE.
-d, --delimiter=DELIM
    Use character DELIM instead of a tab for the field delimiter.
-f, --fields=RANGE
    Select only the fields from each line as specified in RANGE. Also print any line that contains no delimiter character, unless the -s option is specified.
--complement
    Complement the set of selected bytes, characters or fields.
-s, --only-delimited
    Do not print lines not containing delimiters.
--output-delimiter=STRING
    Use STRING as the output delimiter string. The default is to use the input delimiter.
cut -- usage examples
- To cut by byte position
    MacBook-Air-Piotr:small fulmanp$ echo 'foo:bar:baz' | cut -b 1,4-6,10-
    f:baaz
- To cut by character
  When the input stream is character based, -c can be a better option than selecting bytes with -b, because characters are often more than one byte long. In the following example the Polish letter Ą (Latin Capital Letter A with Ogonek) has the Unicode code point U+0104, which UTF-8 encodes in two bytes (c4 and 84).
    MacBook-Air-Piotr:small fulmanp$ echo 'Ą' > data02.txt
    MacBook-Air-Piotr:small fulmanp$ ls -l
    total 16
    -rw-r--r-- 1 fulmanp staff 60 22 lis 23:50 data01.txt
    -rw-r--r-- 1 fulmanp staff 3 23 lis 21:47 data02.txt
    MacBook-Air-Piotr:small fulmanp$ cat data02.txt
    Ą
    MacBook-Air-Piotr:small fulmanp$ od -t x1 data02.txt
    0000000 c4 84 0a
    0000003
  With the -c option the character can be selected correctly, along with any other characters of interest.
    MacBook-Air-Piotr:small fulmanp$ echo 'aĄb' | cut -b 2
    ?
    MacBook-Air-Piotr:small fulmanp$ echo 'aĄb' | cut -c 2
    Ą
  On Linux this option seems to work incorrectly: in the session below -c behaves like -b, so the two-byte character has to be selected with the range 2-3
    tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -b 2
    �
    tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -c 2
    �
    tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -c 2-3
    Ą
  --complement does not work on MacOS, but should work on Linux
    tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -c 2 --complement
    a�b
    tdp@tdp-VirtualBox:~$ echo 'aĄb' | cut -c 2-3 --complement
    ab
- To cut based on a delimiter (to cut by field)
    MacBook-Air-Piotr:small fulmanp$ echo 'a,b,c,d,e,f' | cut -d ',' -f 1,4-
    a,d,e,f
  --output-delimiter does not work on MacOS, but should work on Linux
    MacBook-Air-Piotr:small fulmanp$ echo 'a,b,c,d,e,f' | cut -d ',' -f 1,4- --output-delimiter=":"
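To round off the field-based examples, here is a minimal sketch of how -d, -f and -s interact. The passwd-style record and all variable names are invented for illustration; only options described above are used.

```shell
# A passwd-style record; the sample line is made up for illustration.
record='alice:x:1001:1001:Alice Example:/home/alice:/bin/sh'

# -d sets the field delimiter, -f selects fields 1 and 7.
login_and_shell=$(printf '%s\n' "$record" | cut -d ':' -f 1,7)

# A line without the delimiter is passed through unchanged ...
kept=$(printf 'no delimiter here\n' | cut -d ':' -f 1)
# ... unless -s is given, in which case it is suppressed.
dropped=$(printf 'no delimiter here\n' | cut -d ':' -s -f 1)

printf '%s\n' "$login_and_shell"
```

The selected fields are joined with the input delimiter, which is why a colon separates the two values in the first result.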
grep
The name grep comes from the ed editor command g/re/p (globally search for a regular expression and print the matching lines), but it is easier to think of grep as the search command for Unix systems. It is used to search for text strings or, more generally, regular expressions within one or more files or an input stream.
grep is a simple tool, yet it has a lot of options. Listing all of them here would be pointless, as our goal is not to copy the man pages. I think it's easiest to learn grep from examples, so this is what I'm going to show next.
grep -- usage examples
For all of the examples, we'll be using the following test file named data03.txt.
    MacBook-Air-Piotr:small fulmanp$ echo "01: foo
    > 02: Foo
    > 03: fOo
    > 04: foO
    > 05: FoO
    > 06: bar
    > 07: Bar
    > 08: bAr
    > 09: baR
    > 10: BAR
    > 11: foo bar" > data03.txt
    MacBook-Air-Piotr:small fulmanp$ cat data03.txt
    01: foo
    02: Foo
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
    10: BAR
    11: foo bar
- Search for a string in one or more files
    MacBook-Air-Piotr:small fulmanp$ grep foo data03.txt
    01: foo
    11: foo bar
- Case-insensitive (with the -i option) search for a string
    MacBook-Air-Piotr:small fulmanp$ grep -i foo data03.txt
    01: foo
    02: Foo
    03: fOo
    04: foO
    05: FoO
    11: foo bar
- Search for a string matching a regular expression
    MacBook-Air-Piotr:small fulmanp$ grep '[fF]oo' data03.txt
    01: foo
    02: Foo
    11: foo bar
    MacBook-Air-Piotr:small fulmanp$ grep '^....[A-Z]' data03.txt
    02: Foo
    05: FoO
    07: Bar
    10: BAR
- Reverse the meaning with the -v option
    MacBook-Air-Piotr:small fulmanp$ grep -v '[fF]oo' data03.txt
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
    10: BAR
- Search for multiple patterns (note the use of egrep, which is equivalent to grep -E, in this case)
    MacBook-Air-Piotr:small fulmanp$ egrep '^....F|R$' data03.txt
    02: Foo
    05: FoO
    09: baR
    10: BAR
- Show matching line numbers
    MacBook-Air-Piotr:small fulmanp$ egrep -n '^....F|R$' data03.txt
    2:02: Foo
    5:05: FoO
    9:09: baR
    10:10: BAR
- Display matching filenames
    MacBook-Air-Piotr:small fulmanp$ egrep -l '^....F|R$' *
    data03.txt
- Lines before and after a grep match
    MacBook-Air-Piotr:small fulmanp$ grep 'F.O' data03.txt
    05: FoO
    MacBook-Air-Piotr:small fulmanp$ grep -B 2 -A 4 'F.O' data03.txt
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
- Highlighting the match using the --color option
    MacBook-Air-Piotr:small fulmanp$ grep --color -B 2 -A 4 'F.O' data03.txt
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
  FoO should be highlighted in some way. On my terminal it's red.
- Counting the matching lines
    MacBook-Air-Piotr:small fulmanp$ grep 'F' data03.txt
    02: Foo
    05: FoO
    MacBook-Air-Piotr:small fulmanp$ grep -c 'F' data03.txt
    2
That was a short tour of typical grep usage. More options can be found in the documentation.
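The options shown above combine well in scripts. The sketch below recreates a file like data03.txt in a temporary directory and uses -c, -i, -E and -n together. The -q option (print nothing, only set the exit status) is not covered in the text above and is included here as an extra; the variable names are illustrative.

```shell
# Recreate a test file like data03.txt in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' '01: foo' '02: Foo' '03: fOo' '04: foO' '05: FoO' \
              '06: bar' '07: Bar' '08: bAr' '09: baR' '10: BAR' \
              '11: foo bar' > "$workdir/data03.txt"

# Count the lines matching "foo" regardless of case.
foo_count=$(grep -c -i 'foo' "$workdir/data03.txt")

# grep -E is the standard spelling of egrep; -n adds line numbers.
upper_hits=$(grep -E -n '^....F|R$' "$workdir/data03.txt")

# -q prints nothing and only sets the exit status, which is handy in scripts.
if grep -q 'bar' "$workdir/data03.txt"; then
    has_bar=yes
else
    has_bar=no
fi

printf '%s\n%s\n%s\n' "$foo_count" "$upper_hits" "$has_bar"
rm -r "$workdir"
```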
head
head is a program on Unix systems that displays the beginning of a text file or a stream of data (by default it prints the first 10 lines). The general command syntax is typical and there are just a few options.
-c [-]K
    Print the first K bytes of each file; with a leading -, print all but the last K bytes of each file.
-n [-]K
    Print the first K lines instead of the first 10; with a leading -, print all but the last K lines of each file.
-q
    Never print headers giving file names.
-v
    Always print headers giving file names.
K may have a multiplier suffix
- b 512,
- kB 1000, K 1024,
- MB 1000*1000, M 1024*1024,
- GB 1000*1000*1000, G 1024*1024*1024,
- and so on for T, P, E, Z, Y.
The complementary command for head is tail.
head -- usage examples
    Mac-mini-Piotr:small fulmanp$ head data03.txt
    01: foo
    02: Foo
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
    10: BAR
    Mac-mini-Piotr:small fulmanp$ head -n 3 data03.txt
    01: foo
    02: Foo
    03: fOo
The -n option with negative values does not work on MacOS
    Mac-mini-Piotr:small fulmanp$ head -n -3 data03.txt
    head: illegal line count -- -3
    Mac-mini-Piotr:small fulmanp$ head --lines=-3 data03.txt
    head: illegal option -- -
    usage: head [-n lines | -c bytes] [file ...]
but works on Linux
    tdp@tdp-VirtualBox:~$ head -n 3 data03.txt
    01: foo
    02: Foo
    03: fOo
    tdp@tdp-VirtualBox:~$ head -n -3 data03.txt
    01: foo
    02: Foo
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
The -c option counts bytes, not characters. When the selected bytes do not end with a newline the next prompt follows the output directly, and a multibyte character such as Ą can be cut in half
    Mac-mini-Piotr:small fulmanp$ head -c 3 data03.txt
    01:Mac-mini-Piotr:small fulmanp$ cat data02.txt
    Ą
    Mac-mini-Piotr:small fulmanp$ head -c 2 data02.txt
    ĄMac-mini-Piotr:small fulmanp$ head -c 3 data02.txt
    Ą
    Mac-mini-Piotr:small fulmanp$ head -c 1 data02.txt
    ?Mac-mini-Piotr:small fulmanp$
The -v option does not work on MacOS
    Mac-mini-Piotr:small fulmanp$ head -c 1 -v data02.txt
    head: illegal option -- v
    usage: head [-n lines | -c bytes] [file ...]
but works on Linux
    tdp@tdp-VirtualBox:~$ head -c 1 data02.txt
    �tdp@tdp-VirtualBox:~$ head -c 1 -v data02.txt
    ==> data02.txt <==
    �tdp@tdp-VirtualBox:~$
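Since a negative line count is not accepted by every head implementation, one possible workaround is to compute the number of lines to keep with wc and pass a positive count. This is only a sketch: the sample file, the variable names and the choice of wc are assumptions of this example, and it presumes the file has more than drop lines.

```shell
# Build an 11-line sample file (contents are illustrative).
sample=$(mktemp)
printf '%s\n' 01 02 03 04 05 06 07 08 09 10 11 > "$sample"

drop=3                          # number of trailing lines to omit
total=$(wc -l < "$sample")      # total number of lines in the file
keep=$((total - drop))          # number of leading lines that remain

# A plain positive count is accepted by head on both systems shown above.
all_but_last=$(head -n "$keep" "$sample")

printf '%s\n' "$all_but_last"
rm "$sample"
```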
join
The join command combines two files based on matching content found in the lines of each file. Using join is quite straightforward, and it can save lots of time and effort. To join two files, both must share an identical join field. The default join field is the first field, delimited by blanks (space or tab). join expects both files to be sorted on the join field before joining.
The most frequently used options include
-1 FIELD
    Join on this FIELD of file 1.
-2 FIELD
    Join on this FIELD of file 2.
-t CHAR
    Use CHAR as input and output field separator.
-o FORMAT
    Use FORMAT while constructing the output line.
-j FIELD
    Equivalent to -1 FIELD -2 FIELD.
-i
    Ignore differences in case when comparing fields.
-a FILENUM
    Also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2.
join -- usage examples
- The basic usage of join is without any options. All that is required is to specify two files as arguments. Given the two files data06_A.txt and data06_B.txt with the following content
    Mac-mini-Piotr:small fulmanp$ cat data06_A.txt
    1 A
    2 B
    3 C
    Mac-mini-Piotr:small fulmanp$ cat data06_B.txt
    1 AA
    2 BB
    3 CC
  the result is as below
    Mac-mini-Piotr:small fulmanp$ join data06_A.txt data06_B.txt
    1 A AA
    2 B BB
    3 C CC
- Choosing the join field
  When the default (first) join field does not match, we can change the default behavior and join the files on other fields. For the files data06_AA.txt and data06_BB.txt with the following content, a plain join prints nothing, while joining field 2 of the first file with field 1 of the second gives the expected result
    Mac-mini-Piotr:small fulmanp$ cat data06_AA.txt
    A 1
    B 2
    C 3
    Mac-mini-Piotr:small fulmanp$ cat data06_BB.txt
    1 AA
    2 BB
    3 CC
    Mac-mini-Piotr:small fulmanp$ join data06_AA.txt data06_BB.txt
    Mac-mini-Piotr:small fulmanp$ join -1 2 -2 1 data06_AA.txt data06_BB.txt
    1 A AA
    2 B BB
    3 C CC
- Overriding the default output format
    Mac-mini-Piotr:small fulmanp$ join -o 1.2 -o 2.2 -1 2 -2 1 data06_AA.txt data06_BB.txt
    1 AA
    2 BB
    3 CC
  On Linux the following version (without multiple -o) should work
    tdp@tdp-VirtualBox:~$ join -o 1.2 2.2 -1 2 -2 1 data06_AA.txt data06_BB.txt
    1 AA
    2 BB
    3 CC
- Dealing with non-pairable lines
    Mac-mini-Piotr:small fulmanp$ cat data06_AAA.txt
    1 A
    2 B
    3 C
    4 D
    5 E
    Mac-mini-Piotr:small fulmanp$ cat data06_BBB.txt
    1 AA
    2 BB
    3 CC
    6 FF
    Mac-mini-Piotr:small fulmanp$ join -a 1 data06_AAA.txt data06_BBB.txt
    1 A AA
    2 B BB
    3 C CC
    4 D
    5 E
    Mac-mini-Piotr:small fulmanp$ join -a 2 data06_AAA.txt data06_BBB.txt
    1 A AA
    2 B BB
    3 C CC
    6 FF
    Mac-mini-Piotr:small fulmanp$ join -a 1 -a 2 data06_AAA.txt data06_BBB.txt
    1 A AA
    2 B BB
    3 C CC
    4 D
    5 E
    6 FF
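All files in the examples above were already sorted on the join field. When that is not the case, the inputs can be sorted first. The following is a minimal sketch under that assumption; the file names and sample data are invented for illustration.

```shell
workdir=$(mktemp -d)

# Two unsorted sample files keyed on the first field (made-up data).
printf '%s\n' '3 C' '1 A' '2 B' > "$workdir/left.txt"
printf '%s\n' '2 BB' '3 CC' '1 AA' > "$workdir/right.txt"

# join expects both inputs sorted on the join field, so sort them first.
sort -k 1,1 "$workdir/left.txt"  > "$workdir/left.sorted"
sort -k 1,1 "$workdir/right.txt" > "$workdir/right.sorted"

joined=$(join "$workdir/left.sorted" "$workdir/right.sorted")

printf '%s\n' "$joined"
rm -r "$workdir"
```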
nl
In theory, nl numbers the lines in a file. In practice it does much more.
nl -- usage examples
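As a small, hedged illustration of nl: the sketch below uses three standard options, -b (which lines get a number: a for all lines, t for non-empty lines only, the default), -w (width of the number column) and -s (separator printed after the number). The input text and variable names are made up for this example.

```shell
# -b a numbers every line, including the empty one in the middle.
numbered_all=$(printf 'foo\n\nbar\n' | nl -b a -w 1 -s ': ')

# -b t (the default) numbers only non-empty lines, so "bar" gets number 2.
last_default=$(printf 'foo\n\nbar\n' | nl -b t -w 1 -s ': ' | tail -n 1)

printf '%s\n' "$numbered_all"
printf '%s\n' "$last_default"
```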
paste
The paste command merges the corresponding lines of multiple files side by side.
paste -- usage examples
- To display the contents of data06_AAA.txt and data06_BBB.txt side by side, with the corresponding lines of each file separated by a tab, we can use the paste command in the following way
    Mac-mini-Piotr:small fulmanp$ cat data06_AAA.txt
    1 A
    2 B
    3 C
    4 D
    5 E
    Mac-mini-Piotr:small fulmanp$ cat data06_BBB.txt
    1 AA
    2 BB
    3 CC
    6 FF
    Mac-mini-Piotr:small fulmanp$ paste data06_AAA.txt data06_BBB.txt
    1 A    1 AA
    2 B    2 BB
    3 C    3 CC
    4 D    6 FF
    5 E
- With -d we can change the delimiter placed between the pasted lines
    Mac-mini-Piotr:small fulmanp$ paste -d : data06_AAA.txt data06_BBB.txt
    1 A:1 AA
    2 B:2 BB
    3 C:3 CC
    4 D:6 FF
    5 E:
- With the -s option paste processes one file at a time instead of in parallel, i.e. it merges the files sequentially. It reads all the lines of a single file and merges them into a single line, with the original lines separated by tabs. The resulting lines, one per file, are separated by newlines. On MacOS the result is odd
    Mac-mini-Piotr:small fulmanp$ paste -s data06_AAA.txt data06_BBB.txt
    1 A    2 B    3 C    4 D    5 E1 AA    2 BB    3 CC    6 FF
  while on Linux it seems to be correct
    tdp@tdp-VirtualBox:~$ paste -s data06_AAA.txt data06_BBB.txt
    1 A    2 B    3 C    4 D    5 E
    1 AA    2 BB    3 CC    6 FF
  The -s option is much clearer for one-column files
    Mac-mini-Piotr:small fulmanp$ cat d1.txt
    1
    2
    3
    4
    Mac-mini-Piotr:small fulmanp$ cat d2.txt
    A
    B
    C
    D
    Mac-mini-Piotr:small fulmanp$ paste d1.txt d2.txt
    1    A
    2    B
    3    C
    4    D
    Mac-mini-Piotr:small fulmanp$ paste -s d1.txt d2.txt
    1    2    3    4
    A    B    C    D
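paste can also read standard input when a file name is given as -. The sketch below relies on that, together with the -s and -d options described above; the sample data and variable names are illustrative.

```shell
# -s joins all input lines into one line; -d picks the delimiter.
sum_expr=$(printf '%s\n' 1 2 3 4 | paste -s -d '+' -)

# Naming "-" twice makes paste read standard input alternately,
# so consecutive lines end up paired on one output line.
pairs=$(printf '%s\n' a b c d | paste -d ',' - -)

printf '%s\n%s\n' "$sum_expr" "$pairs"
```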
sort
The sort command rearranges the lines of a text file so that they are sorted, alphabetically or numerically.
By default, the rules for sorting are as follows (the exact order depends on the current locale)
- Lines starting with a number will appear before lines starting with a letter.
- Lines starting with a letter that appears earlier in the alphabet will appear before lines starting with a letter that appears later in the alphabet.
- Lines starting with a lowercase letter will appear before lines starting with the same letter in uppercase.
sort has many options; please refer to the man pages to get to know all of them. Below only some of the most common use cases are shown.
sort -- usage examples
Consider the following data07.txt file
    MacBook-Air-Piotr:small fulmanp$ cat data07.txt
    4 a b a:c
    2 a b b:a
    2 a a a:a
    1 b b b:b
    3 a b a:c
- To sort the lines in this file alphabetically, use the following command
    MacBook-Air-Piotr:small fulmanp$ sort data07.txt
    1 b b b:b
    2 a a a:a
    2 a b b:a
    3 a b a:c
    4 a b a:c
  We can use the -o option to save the sorted result in a file
    MacBook-Air-Piotr:small fulmanp$ sort -o data07_sorted.txt data07.txt
    MacBook-Air-Piotr:small fulmanp$ cat data07_sorted.txt
    1 b b b:b
    2 a a a:a
    2 a b b:a
    3 a b a:c
    4 a b a:c
- To sort the lines in reverse order
    MacBook-Air-Piotr:small fulmanp$ sort -r data07.txt
    4 a b a:c
    3 a b a:c
    2 a b b:a
    2 a a a:a
    1 b b b:b
- Checking for sorted order
    MacBook-Air-Piotr:small fulmanp$ sort -c data07.txt
    sort: data07.txt:2: disorder: 2 a b b:a
    MacBook-Air-Piotr:small fulmanp$ sort -c data07_sorted.txt
    MacBook-Air-Piotr:small fulmanp$
- Sorting based on selected fields of data
  Normally, sort decides how to order lines based on the entire line: it compares every character from the first character in a line to the last one. Even leading whitespace matters. In data09.txt the lines are indented by different amounts (the indentation is not visible in the listing below)
    MacBook-Air-Piotr:small fulmanp$ cat data09.txt
    a
    b
    c
    d
    MacBook-Air-Piotr:small fulmanp$ sort data09.txt
    d
    b
    a
    c
  To ignore leading blanks, use the -b option
    MacBook-Air-Piotr:small fulmanp$ sort -b data09.txt
    a
    b
    c
    d
  If we want sort to compare a limited subset of every line, we can specify which fields to compare using the -k option (fields are defined as anything separated by whitespace unless we specify another character with the -t option).
    MacBook-Air-Piotr:small fulmanp$ sort -k 3 data07.txt
    2 a a a:a
    3 a b a:c
    4 a b a:c
    2 a b b:a
    1 b b b:b
  Keep in mind that -k 3 means "sort starting with column 3" rather than "sort based (only) on column 3". If -k 3 is used, the sort key begins at column 3 and extends to the end of the line, spanning all the fields in between. If we want to sort based only on column 3 we should specify the starting field as well as the ending field
    MacBook-Air-Piotr:small fulmanp$ sort -k 3,3 data07.txt
    2 a a a:a
    1 b b b:b
    2 a b b:a
    3 a b a:c
    4 a b a:c
  We can do even more, and specify the start and end position by character within a field
    MacBook-Air-Piotr:small fulmanp$ sort data07.txt
    1 b b b:b
    2 a a a:a
    2 a b b:a
    3 a b a:c
    4 a b a:c
    MacBook-Air-Piotr:small fulmanp$ sort -k 4 data07.txt
    2 a a a:a
    3 a b a:c
    4 a b a:c
    2 a b b:a
    1 b b b:b
    MacBook-Air-Piotr:small fulmanp$ sort -k 4.3 data07.txt
    2 a a a:a
    2 a b b:a
    1 b b b:b
    3 a b a:c
    4 a b a:c
  We may also sort the contents of a file based on more than one column. For reference, here are the file and its default sort order once more
    MacBook-Air-Piotr:small fulmanp$ cat data07.txt
    4 a b a:c
    2 a b b:a
    2 a a a:a
    1 b b b:b
    3 a b a:c
    MacBook-Air-Piotr:small fulmanp$ sort data07.txt
    1 b b b:b
    2 a a a:a
    2 a b b:a
    3 a b a:c
    4 a b a:c
  The following does not seem to sort on the first character of field 4
    MacBook-Air-Piotr:small fulmanp$ sort -k 4.1,4.1 data07.txt
    1 b b b:b
    2 a a a:a
    2 a b b:a
    3 a b a:c
    4 a b a:c
  The reason is that, with the default separator, a field includes the blank that precedes it, so position 4.1 is that blank and all keys compare equal. Extending the key by one character gives the expected result
    MacBook-Air-Piotr:small fulmanp$ sort -k 4.1,4.2 data07.txt
    2 a a a:a
    3 a b a:c
    4 a b a:c
    1 b b b:b
    2 a b b:a
  Next, we add a second key that starts at column 2
    MacBook-Air-Piotr:small fulmanp$ sort -k 4.1,4.2 -k 2 data07.txt
    2 a a a:a
    3 a b a:c
    4 a b a:c
    2 a b b:a
    1 b b b:b
- To sort the contents numerically
    MacBook-Air-Piotr:small fulmanp$ cat data08.txt
    20
    11
    10
    2
    1
    21
    MacBook-Air-Piotr:small fulmanp$ sort data08.txt
    1
    10
    11
    2
    20
    21
    MacBook-Air-Piotr:small fulmanp$ sort -n data08.txt
    1
    2
    10
    11
    20
    21
- Remove duplicates with the -u option
    MacBook-Air-Piotr:small fulmanp$ sort data07.txt
    1 b b b:b
    2 a a a:a
    2 a b b:a
    3 a b a:c
    4 a b a:c
    MacBook-Air-Piotr:small fulmanp$ sort -u data07.txt
    1 b b b:b
    2 a a a:a
    2 a b b:a
    3 a b a:c
    4 a b a:c
    MacBook-Air-Piotr:small fulmanp$ sort -k 1,1 -u data07.txt
    1 b b b:b
    2 a b b:a
    3 a b a:c
    4 a b a:c
- Sort using human readable numbers
    MacBook-Air-Piotr:small fulmanp$ cat data10.txt
    1M
    2G
    3K
    MacBook-Air-Piotr:small fulmanp$ sort -h data10.txt
    3K
    1M
    2G
- Merge already sorted files
    MacBook-Air-Piotr:small fulmanp$ cat data11_1.txt
    1
    3
    5
    MacBook-Air-Piotr:small fulmanp$ cat data11_2.txt
    2
    4
    6
    MacBook-Air-Piotr:small fulmanp$ sort -m data11_2.txt data11_1.txt
    1
    2
    3
    4
    5
    6
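Two further key-related techniques can be sketched briefly. With the default separator a field includes the blank in front of it, so the -b option is needed if position 4.1 should mean the first visible character of column 4. The -t option sets an explicit separator, and an n appended to a key makes that key numeric. The example recreates data07.txt in a temporary directory; LC_ALL=C is an assumption added here to make the order reproducible, and the second data set is made up.

```shell
workdir=$(mktemp -d)
printf '%s\n' '4 a b a:c' '2 a b b:a' '2 a a a:a' '1 b b b:b' '3 a b a:c' \
    > "$workdir/data07.txt"

# LC_ALL=C makes the ordering byte-based and therefore reproducible.
# -b skips the blanks in front of a field, so 4.1 is the first visible
# character of column 4 rather than the separator before it.
by_col4_first_char=$(LC_ALL=C sort -b -k 4.1,4.1 "$workdir/data07.txt")

# -t sets an explicit field separator; the trailing n makes the key numeric.
by_number=$(printf '%s\n' 'b:10' 'a:2' 'c:1' | LC_ALL=C sort -t ':' -k 2,2n)

printf '%s\n\n%s\n' "$by_col4_first_char" "$by_number"
rm -r "$workdir"
```

Lines whose keys compare equal are ordered by comparing the whole line, which is why the three lines with a in column 4 appear in ascending order of their first column.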
split
The split command splits a file into pieces. By default a large file is divided into a set of smaller files of 1000 lines each, named with the prefix x followed by the suffixes aa, ab, ac, etc. (so the full file names are xaa, xab, xac, etc.).
Typically split accepts the following options
-a N
    Use suffixes of length N (default 2).
-b SIZE
    Put SIZE bytes per output file.
-C SIZE
    Put at most SIZE bytes of lines per output file.
-d
    Use numeric suffixes instead of alphabetic.
-l NUMBER
    Put NUMBER lines per output file.
-x
    Use hex suffixes instead of alphabetic.
-n CHUNKS
    Generate CHUNKS output files.
SIZE is an integer optionally followed by one of the following multipliers: KB = 1000 bytes, K = 1024 bytes, MB = 1000*1000 bytes, M = 1024*1024 bytes, and so on for G, T, P, E, Z, Y.
CHUNKS may be
N
    Split into N files based on the size of the input.
K/N
    Output the Kth of N to stdout.
l/N
    Split into N files without splitting lines/records.
l/K/N
    Output the Kth of N to stdout without splitting lines/records.
r/N
    Like l but use round robin distribution.
r/K/N
    Likewise but only output the Kth of N to stdout.
On MacOS another option is also available
-p PATTERN
    The file is split whenever an input line matches PATTERN, which is interpreted as an extended regular expression. The matching line will be the first line of the next output file.
split -- usage examples
- Create dummy files
  - Two files with random human readable bytes
        MacBook-Air-Piotr:small fulmanp$ base64 /dev/urandom | head -c $((1024)) > 1024_B.txt
        MacBook-Air-Piotr:small fulmanp$ base64 /dev/urandom | head -c $((1000)) > 1000_B.txt
        MacBook-Air-Piotr:small fulmanp$ ls -l
        total 208
        -rw-r--r-- 1 fulmanp staff 1000 5 gru 23:52 1000_B.txt
        -rw-r--r-- 1 fulmanp staff 1024 5 gru 23:52 1024_B.txt
  - File with the line foo bar repeated 256 times
        MacBook-Air-Piotr:small fulmanp$ cat - > data13.txt
        foo bar
        [Press Ctrl+D]
        MacBook-Air-Piotr:small fulmanp$ for i in {1..8}; do cat data13.txt data13.txt > tmp.txt && mv tmp.txt data13.txt; done
        MacBook-Air-Piotr:small fulmanp$ wc data13.txt
        256 512 2048 data13.txt
    The wc command used above displays the number of lines, words, and bytes contained in the input file. Very nice information about generating dummy files can be found in How To Quickly Generate A Large File On The Command Line (With Linux) and How To Create Files Of A Certain Size In Linux.
- Split a file into pieces with a custom number of lines
    MacBook-Air-Piotr:small fulmanp$ split data13.txt
    MacBook-Air-Piotr:small fulmanp$ ls -l | grep xa
    -rw-r--r-- 1 fulmanp staff 2048 6 gru 00:26 xaa
    MacBook-Air-Piotr:small fulmanp$ wc xa*
    256 512 2048 xaa
    MacBook-Air-Piotr:small fulmanp$ split -l 16 data13.txt
    MacBook-Air-Piotr:small fulmanp$ ls -l | grep xa
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xaa
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xab
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xac
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xad
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xae
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xaf
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xag
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xah
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xai
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xaj
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xak
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xal
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xam
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xan
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xao
    -rw-r--r-- 1 fulmanp staff 128 6 gru 00:24 xap
    MacBook-Air-Piotr:small fulmanp$ wc xa*
    16 32 128 xaa
    16 32 128 xab
    16 32 128 xac
    16 32 128 xad
    16 32 128 xae
    16 32 128 xaf
    16 32 128 xag
    16 32 128 xah
    16 32 128 xai
    16 32 128 xaj
    16 32 128 xak
    16 32 128 xal
    16 32 128 xam
    16 32 128 xan
    16 32 128 xao
    16 32 128 xap
    256 512 2048 total
- Split a file into pieces with a custom number of bytes
    MacBook-Air-Piotr:small fulmanp$ split -b 100 1000_B.txt
    MacBook-Air-Piotr:small fulmanp$ wc xa*
    0 1 100 xaa
    0 1 100 xab
    0 1 100 xac
    0 1 100 xad
    0 1 100 xae
    0 1 100 xaf
    0 1 100 xag
    0 1 100 xah
    0 1 100 xai
    0 1 100 xaj
    0 10 1000 total
- Create files with a numeric suffix instead of an alphabetic one
  Unfortunately this option doesn't work on MacOS; it should work on Linux
    MacBook-Air-Piotr:small fulmanp$ split -l 16 -d data13.txt
    ???
- Create files with a customized prefix
    MacBook-Air-Piotr:small fulmanp$ split -l 16 data13.txt data13_
    MacBook-Air-Piotr:small fulmanp$ wc data13_*
    16 32 128 data13_aa
    16 32 128 data13_ab
    16 32 128 data13_ac
    16 32 128 data13_ad
    16 32 128 data13_ae
    16 32 128 data13_af
    16 32 128 data13_ag
    16 32 128 data13_ah
    16 32 128 data13_ai
    16 32 128 data13_aj
    16 32 128 data13_ak
    16 32 128 data13_al
    16 32 128 data13_am
    16 32 128 data13_an
    16 32 128 data13_ao
    16 32 128 data13_ap
    256 512 2048 total
- Divide a file into chunks
  Unfortunately this option doesn't work on MacOS; it should work on Linux
    MacBook-Air-Piotr:small fulmanp$ wc 1024_B.txt
    0 1 1024 1024_B.txt
    MacBook-Air-Piotr:small fulmanp$ md5 1024_B.txt
    MD5 (1024_B.txt) = 1ac07d0da8a3f019b4a4d15e26668113
    MacBook-Air-Piotr:small fulmanp$ split -n 4 1024_B.txt
    ???
    merge with
    cat xa* > 1024_B_2.txt
    MacBook-Air-Piotr:small fulmanp$ md5 1024_B_2.txt
- Create files with a custom suffix length
    MacBook-Air-Piotr:small fulmanp$ split -l 16 -a 5 data13.txt
    MacBook-Air-Piotr:small fulmanp$ wc xa*
    16 32 128 xaaaaa
    16 32 128 xaaaab
    16 32 128 xaaaac
    16 32 128 xaaaad
    16 32 128 xaaaae
    16 32 128 xaaaaf
    16 32 128 xaaaag
    16 32 128 xaaaah
    16 32 128 xaaaai
    16 32 128 xaaaaj
    16 32 128 xaaaak
    16 32 128 xaaaal
    16 32 128 xaaaam
    16 32 128 xaaaan
    16 32 128 xaaaao
    16 32 128 xaaaap
    256 512 2048 total
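A complete split-and-reassemble round trip can be sketched with the portable -l option only. The text above compares md5 checksums; this sketch uses cmp instead, which is a substitution made here because md5 and md5sum differ between systems. The part_ prefix and the variable names are illustrative.

```shell
workdir=$(mktemp -d)

# 256 copies of the line "foo bar", like data13.txt in the text.
i=0
while [ "$i" -lt 256 ]; do
    echo 'foo bar'
    i=$((i + 1))
done > "$workdir/data13.txt"

# Split into 16-line pieces named part_aa, part_ab, ...
split -l 16 "$workdir/data13.txt" "$workdir/part_"
piece_count=$(ls "$workdir"/part_* | wc -l | tr -d ' ')

# The shell expands part_* in sorted order, so cat restores the original.
cat "$workdir"/part_* > "$workdir/rejoined.txt"
if cmp -s "$workdir/data13.txt" "$workdir/rejoined.txt"; then
    roundtrip=identical
else
    roundtrip=different
fi

printf '%s pieces, round trip: %s\n' "$piece_count" "$roundtrip"
rm -r "$workdir"
```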
tail
The tail command is a command-line utility for printing the last part of files. By default tail returns the last ten lines of each file that it is given. Compared to head, tail has a few more options and one very useful feature: it can monitor changes to a file in real time.
The general syntax is as follows
    tail [-F | -f | -r] [-q] [-b number | -c number | -n number] [file ...]
-c [+|-]K
    Output the last K bytes. Numbers with a leading plus (+) sign are relative to the beginning of the input. Numbers with a leading minus (-) sign or no explicit sign are relative to the end of the input.
-n [+|-]K
    Output the last K lines, instead of the default last 10. A leading + or - sign has the same meaning as described for -c.
-f or --follow[={name|descriptor}]
    Output appended data as the file grows. This option causes tail to loop forever, checking for new data at the end of the file(s). When new data appears, it is printed. If we follow more than one file, a header is printed to indicate which file's data is being printed. If the file shrinks instead of growing, tail lets us know with a message. If we specify name, the file with that name is followed, regardless of its file descriptor. If we specify descriptor, the same file is followed, even if it is renamed. This is the default behavior: -f, --follow, and --follow=descriptor are equivalent.
--retry
    Keep trying to open a file even when it is or becomes inaccessible; useful when following by name, i.e., with --follow=name.
-F
    Same as --follow=name --retry.
-q
    Never output headers giving file names.
-v
    Always output headers giving file names.
Again, as for head, K may have a multiplier suffix
- b 512,
- kB 1000, K 1024,
- MB 1000*1000, M 1024*1024,
- GB 1000*1000*1000, G 1024*1024*1024,
- and so on for T, P, E, Z, Y.
The complementary command for tail is head.
tail -- usage examples
    Mac-mini-Piotr:small fulmanp$ cat data03.txt
    01: foo
    02: Foo
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
    10: BAR
    11: foo bar
    Mac-mini-Piotr:small fulmanp$ tail data03.txt
    02: Foo
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
    10: BAR
    11: foo bar
    Mac-mini-Piotr:small fulmanp$ tail -c 35 data03.txt
    8: bAr
    09: baR
    10: BAR
    11: foo bar
    Mac-mini-Piotr:small fulmanp$ tail -c +35 data03.txt
    : FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
    10: BAR
    11: foo bar
    Mac-mini-Piotr:small fulmanp$ tail -c -35 data03.txt
    8: bAr
    09: baR
    10: BAR
    11: foo bar

    Mac-mini-Piotr:small fulmanp$ tail -n 2 data03.txt
    10: BAR
    11: foo bar
    Mac-mini-Piotr:small fulmanp$ tail -n +2 data03.txt
    02: Foo
    03: fOo
    04: foO
    05: FoO
    06: bar
    07: Bar
    08: bAr
    09: baR
    10: BAR
    11: foo bar
    Mac-mini-Piotr:small fulmanp$ tail -n -2 data03.txt
    10: BAR
    11: foo bar
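Two common idioms follow directly from the + form of -n and from combining tail with head. The sketch below shows both; the small table and the variable names are made up for this example.

```shell
# Sample table with a header row (made-up data).
table=$(printf '%s\n' 'id name' '1 foo' '2 bar' '3 baz')

# tail -n +2 starts the output at line 2, i.e. it skips the header.
body=$(printf '%s\n' "$table" | tail -n +2)

# head and tail combined extract a line range: keep the first 3 lines,
# then the last 2 of those, which gives lines 2 to 3 of the input.
lines_2_to_3=$(printf '%s\n' "$table" | head -n 3 | tail -n 2)

printf '%s\n\n%s\n' "$body" "$lines_2_to_3"
```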
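A file can be followed in the same way with option -F instead of -f; as described above, -F follows the file by name and keeps retrying when it becomes inaccessible.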
tr
The tr command is used to translate specified characters into other characters. It can also be used to delete specified characters or to squeeze repeated characters.
In contrast to many command-line programs, tr does not accept file names as arguments (i.e., as input data). It reads only from standard input, which can be redirected from a file or piped from another program, and it writes to standard output.
The general syntax of tr is
    tr [options] set1 [set2]
In particular, on MacOS we have
    MacBook-Air-Piotr:small fulmanp$ tr
    usage: tr [-Ccsu] string1 string2
           tr [-Ccu] -d string1
           tr [-Ccu] -s string1
           tr [-Ccu] -ds string1 string2
The first argument, set1, lists the characters in the text to be replaced or removed. The second, set2, lists the characters that are to be substituted for the characters listed in the first argument. If both set1 and set2 are specified and the -d option is not, the command replaces each character in set1 with the character at the same position in set2. Because input characters in set1 are mapped to the corresponding characters in set2, it is reasonable for both sets to have equal length. If this is not the case, no error is generated, but two rules are applied to make them equal
- If the length of set2 exceeds the length of set1, the excess characters in set2 are ignored.
- If the length of set2 is less than the length of set1, set2 is extended to the length of set1 by repeating its last character as many times as necessary.
More precisely, both sets can be specified not only character by character but in several ways
- Enumeration of characters (see example below)
    tr '{}' '()' < infile > outfile
- Character ranges (see example below)
    tr 'A-Z' 'a-z' < infile > outfile
- POSIX character classes. Each consists of a class name surrounded by colons and then enclosed in a set of square brackets, so the sequence [:class:] represents all characters belonging to the character class named class. The class names are
    alnum    alphanumeric characters,
    alpha    alphabetic characters,
    cntrl    control (non-printing) characters,
    digit    numeric characters,
    graph    graphic characters,
    lower    lower-case alphabetic characters,
    print    printable characters,
    punct    punctuation characters,
    space    whitespace characters,
    upper    upper-case characters,
    xdigit   hexadecimal characters 0-9, A-F, a-f.
  They can be used as follows (see example below)
    tr '[:upper:]' '[:lower:]' < infile > outfile
  Classes can be combined to form a more complex set, for example '[:lower:][:digit:]' (see example below).
We can also mix all of the above methods (see example below).
Typically tr accepts three options
-c
    Complement the set: operations apply to the characters not in the given set.
-d
    Delete characters in the first set from the output.
-s
    Squeeze multiple occurrences of the characters listed in the last operand (either set1 or set2) in the input into a single instance of the character. This occurs after all deletion and translation is completed.
On MacOS two more options (-C, -u) are available (however, the -c option has a different meaning there; -C on MacOS = -c on Linux)
-C
    Complement the set of characters in set1.
-c
    Same as -C but complement the set of values in string1.
-u
    Guarantee that any output is unbuffered.
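The three common options are often combined. The following sketch uses -c together with -s to put each word of a sentence on its own line, then lower-cases the words and removes duplicates, and finally uses -d on its own. The input strings and variable names are made up, and sort -u is used only to drop the repeated word.

```shell
text='Foo bar, foo!'

# -c complements set1 (everything that is not a letter) and -s squeezes
# the resulting runs of newlines, so each word ends up on its own line.
words=$(printf '%s\n' "$text" | tr -cs '[:alpha:]' '\n')

# Lower-case the words, then let sort -u remove the duplicate.
unique_words=$(printf '%s\n' "$words" | tr '[:upper:]' '[:lower:]' | sort -u)

# -d deletes: strip every digit from a string.
no_digits=$(printf 'a1b22c333\n' | tr -d '[:digit:]')

printf '%s\n%s\n' "$unique_words" "$no_digits"
```

The character classes are quoted so that the shell cannot mistake the square brackets for a file name pattern.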
tr -- usage examples
We will use the following test file data12.txt
    MacBook-Air-Piotr:small fulmanp$ cat data12.txt
    foo   bar 1  2   3
    oof:rab:3:2:1
    123foobar
    FOO   BAR 1  2   3
- Replace : with a -
    MacBook-Air-Piotr:small fulmanp$ tr : - < data12.txt > res.txt
    MacBook-Air-Piotr:small fulmanp$ cat res.txt
    foo   bar 1  2   3
    oof-rab-3-2-1
    123foobar
    FOO   BAR 1  2   3
  Alternatively we can use a pipe
    MacBook-Air-Piotr:small fulmanp$ cat data12.txt | tr : - > res.txt
    MacBook-Air-Piotr:small fulmanp$ cat res.txt
    foo   bar 1  2   3
    oof-rab-3-2-1
    123foobar
    FOO   BAR 1  2   3
- Replace using enumeration of characters (replace more than one character)
    MacBook-Air-Piotr:small fulmanp$ tr 'fob' '#@!' < data12.txt
    #@@   !ar 1  2   3
    @@#:ra!:3:2:1
    123#@@!ar
    FOO   BAR 1  2   3
- Replace using character ranges
    MacBook-Air-Piotr:small fulmanp$ tr 'a-z' '#' < data12.txt
    ###   ### 1  2   3
    ###:###:3:2:1
    123######
    FOO   BAR 1  2   3
    MacBook-Air-Piotr:small fulmanp$ tr 'a-z' '#@!' < data12.txt
    !!!   @#! 1  2   3
    !!!:!#@:3:2:1
    123!!!@#!
    FOO   BAR 1  2   3
- Delete specified characters
    MacBook-Air-Piotr:small fulmanp$ tr -d fo < data12.txt
       bar 1  2   3
    :rab:3:2:1
    123bar
    FOO   BAR 1  2   3
- Squeeze repetition of characters (the newlines are translated too, so the prompt follows the output directly)
    MacBook-Air-Piotr:small fulmanp$ tr '[:space:]' '?' < data12.txt
    foo???bar?1??2???3?oof:rab:3:2:1?123foobar?FOO???BAR?1??2???3?MacBook-Air-Piotr:small fulmanp$
    MacBook-Air-Piotr:small fulmanp$ tr -s '[:space:]' '?' < data12.txt
    foo?bar?1?2?3?oof:rab:3:2:1?123foobar?FOO?BAR?1?2?3?MacBook-Air-Piotr:small fulmanp$
- Complement the sets
    MacBook-Air-Piotr:small fulmanp$ tr : - < data12.txt
    foo   bar 1  2   3
    oof-rab-3-2-1
    123foobar
    FOO   BAR 1  2   3
    MacBook-Air-Piotr:small fulmanp$ tr -c : - < data12.txt
    ----------------------:---:-:-:-------------------------------MacBook-Air-Piotr:small fulmanp$
    MacBook-Air-Piotr:small fulmanp$ tr -C : - < data12.txt
    ----------------------:---:-:-:-------------------------------MacBook-Air-Piotr:small fulmanp$
- Using POSIX character classes and mixed set specification
    MacBook-Air-Piotr:small fulmanp$ tr '[:digit:]' - < data12.txt
    foo   bar -  -   -
    oof:rab:-:-:-
    ---foobar
    FOO   BAR -  -   -
    MacBook-Air-Piotr:small fulmanp$ tr '[:digit:]r' - < data12.txt
    foo   ba- -  -   -
    oof:-ab:-:-:-
    ---fooba-
    FOO   BAR -  -   -
    MacBook-Air-Piotr:small fulmanp$ tr '[:digit:]r[:lower:]' - < data12.txt
    ---   --- -  -   -
    ---:---:-:-:-
    ---------
    FOO   BAR -  -   -
- Difference between -c and -C (who can explain this?)
    MacBook-Air-Piotr:small fulmanp$ ls -l | grep data02.txt
    -rw-r--r-- 1 fulmanp staff 3 23 lis 21:47 data02.txt
    MacBook-Air-Piotr:small fulmanp$ cat data02.txt
    Ą
    MacBook-Air-Piotr:small fulmanp$ od -t x1 data02.txt
    0000000 c4 84 0a
    0000003
    MacBook-Air-Piotr:small fulmanp$ tr -c 'Ą' '#' < data02.txt > res.txt
    MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt
    0000000 c4 84 23
    0000003
    MacBook-Air-Piotr:small fulmanp$ tr -C 'Ą' '#' < data02.txt > res.txt
    MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt
    0000000 c4 84 23
    0000003
    MacBook-Air-Piotr:small fulmanp$ tr -c '\x84' '#' < data02.txt > res.txt
    MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt
    0000000 23 23
    0000002
    MacBook-Air-Piotr:small fulmanp$ tr -C '\x84' '#' < data02.txt > res.txt
    MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt
    0000000 23 23
    0000002
    MacBook-Air-Piotr:small fulmanp$ tr -c '\xc4' '#' < data02.txt > res.txt
    MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt
    0000000 23 23
    0000002
    MacBook-Air-Piotr:small fulmanp$ tr -C '\xc4' '#' < data02.txt > res.txt
    MacBook-Air-Piotr:small fulmanp$ od -t x1 res.txt
    0000000 23 23
    0000002
    MacBook-Air-Piotr:small fulmanp$
  In these tests -c and -C give identical results. A likely explanation for the last four runs is that tr understands octal escapes (\ooo) but not \xHH, so '\x84' and '\xc4' denote the literal characters x, 8, 4 and x, c, 4. None of them occurs in the file, so both input characters (Ą and the newline) belong to the complement and each is replaced by a single #.