Part of GNU Coreutils, the cut command, like paste and join, operates on fields. It prints selected parts (fields or sections) of lines.
Things to keep in mind:
Works on a text stream line by line
Prints selected sections/parts/fields
One of the -b, -c or -f options must be used
The default delimiter is TAB
The delimiter must be a single character
Consecutive delimiters are not merged, so they need to be consolidated into one beforehand
Can cut multiple fields or a range of fields
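To see the -d option and the single-character rule in action, here is a minimal sketch on a hypothetical colon-separated line (the sample text is ours, not from the article). The error check relies on GNU cut rejecting a multi-character delimiter with a non-zero exit status.

```shell
# Hypothetical colon-separated sample line
line='root:x:0:0:root:/root:/bin/bash'

# Select fields 1 and 7 with ':' as the delimiter
echo "$line" | cut -d ':' -f 1,7

# A two-character delimiter is rejected: cut prints an error and exits non-zero
if ! echo "$line" | cut -d '::' -f 1 2>/dev/null; then
  echo 'multi-character delimiter rejected'
fi
```

Note that the output keeps the input delimiter between the selected fields (root:/bin/bash).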
Generate sample data and save it in the file tab.txt:
$ file=tab.txt && \
for i in $(seq 1 9); do \
for j in $(seq 1 9); do \
if [ "$j" == "9" ]; then \
echo -e "$i$j"; \
else \
echo -en "$i$j\t"; \
fi; \
done; \
done > "$file" && cat "$file"
11	12	13	14	15	16	17	18	19
21	22	23	24	25	26	27	28	29
31	32	33	34	35	36	37	38	39
41	42	43	44	45	46	47	48	49
51	52	53	54	55	56	57	58	59
61	62	63	64	65	66	67	68	69
71	72	73	74	75	76	77	78	79
81	82	83	84	85	86	87	88	89
91	92	93	94	95	96	97	98	99
Fields on each line are separated by a tab character.
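Because TABs are invisible in a terminal, the fields can look glued together. A small sketch to check them, assuming GNU cat (its -A option shows each TAB as ^I and each line end as $). To run on its own, the block rebuilds tab.txt more compactly with seq -s.

```shell
# Rebuild tab.txt: 9 lines of 9 TAB-separated fields (11..19, 21..29, ..., 91..99)
tab=$(printf '\t')
for i in 1 2 3 4 5 6 7 8 9; do
  seq -s "$tab" "${i}1" "${i}9"
done > tab.txt

# Show TABs as ^I and line ends as $ (GNU cat)
cat -A tab.txt | head -n 2
```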
cut refuses to run without one of its selection options:
$ cut tab.txt
cut: you must specify a list of bytes, characters, or fields
Try 'cut --help' for more information.
These options are:
bytes: -b or --bytes=LIST
characters: -c or --characters=LIST
fields: -f or --fields=LIST
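A quick sketch of -c and -b on a hypothetical word; both accept the same LIST syntax as -f, and with plain ASCII input they give the same result.

```shell
word='coreutils'   # hypothetical input

echo "$word" | cut -c 1-4     # characters 1 to 4, prints: core
echo "$word" | cut -b 5-      # byte 5 to the end of the line, prints: utils
echo "$word" | cut -c 1,3,5   # characters 1, 3 and 5, prints: cru
```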
Requiring an option is very odd: an option should be optional, and a default should be provided. But let’s focus on the most commonly used one: fields.
Cut the first and the ninth fields with the default delimiter TAB:
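The command for this step is missing from the text; a minimal sketch of it follows. To stay self-contained, it rebuilds tab.txt with seq -s, and it also creates the mixed.txt used below. The layout of mixed.txt is an assumption inferred from the outputs that follow: tab.txt with the TABs removed from its last line only.

```shell
# Rebuild tab.txt: 9 lines of 9 TAB-separated fields (11..19, 21..29, ..., 91..99)
tab=$(printf '\t')
for i in 1 2 3 4 5 6 7 8 9; do
  seq -s "$tab" "${i}1" "${i}9"
done > tab.txt

# Cut the first and the ninth fields; TAB is the default delimiter
cut -f1,9 tab.txt

# Assumed mixed.txt: tab.txt with every TAB removed from the last line ($) only
sed '$ s/\t//g' tab.txt > mixed.txt
```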
By default, cut prints any line that contains no delimiter character unchanged:
$ cut -f1 mixed.txt
11
21
31
41
51
61
71
81
919293949596979899
Or use the -s (--only-delimited) option to omit those lines:
$ cut -sf 1 mixed.txt
11
21
31
41
51
61
71
81
But a better approach is to cleanse the data beforehand.
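One way to cleanse the data first is to drop the lines without a delimiter before cutting. A minimal sketch, assuming mixed.txt is tab.txt with the TABs removed from its last line (the sed line that builds it is ours); here the grep filter gives the same result as cut -s.

```shell
# Rebuild tab.txt and the assumed mixed.txt (no TAB on the last line)
tab=$(printf '\t')
for i in 1 2 3 4 5 6 7 8 9; do
  seq -s "$tab" "${i}1" "${i}9"
done > tab.txt
sed '$ s/\t//g' tab.txt > mixed.txt

# Keep only the lines that contain a TAB, then cut the first field
grep "$tab" mixed.txt | cut -f1
```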
What about consecutive TAB characters in the file? Insert an extra TAB into mixed.txt:
$ sed -i 's/\(1.\)\t/\1\t\t/' mixed.txt && cat $_
11		12	13	14	15	16	17	18	19
21	22	23	24	25	26	27	28	29
31	32	33	34	35	36	37	38	39
41	42	43	44	45	46	47	48	49
51	52	53	54	55	56	57	58	59
61	62	63	64	65	66	67	68	69
71	72	73	74	75	76	77	78	79
81	82	83	84	85	86	87	88	89
919293949596979899
An empty field is still a field:
$ cut -sf2 mixed.txt

22
32
42
52
62
72
82
Therefore, the drawback is that cut does not treat consecutive delimiters as one. The data must be cleansed first, squeezing consecutive delimiters into a single one:
$ sed -i 's/\t\+/\t/g' mixed.txt && cat $_
11	12	13	14	15	16	17	18	19
21	22	23	24	25	26	27	28	29
31	32	33	34	35	36	37	38	39
41	42	43	44	45	46	47	48	49
51	52	53	54	55	56	57	58	59
61	62	63	64	65	66	67	68	69
71	72	73	74	75	76	77	78	79
81	82	83	84	85	86	87	88	89
919293949596979899
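As an alternative to sed, tr -s squeezes runs of a repeated character into one. A sketch on a hypothetical file squeeze.txt; unlike sed -i, it leaves the file untouched and cleans the data only inside the pipeline.

```shell
# Hypothetical file with runs of TABs between some fields
printf '11\t\t12\t13\n21\t22\t\t\t23\n' > squeeze.txt

# Without cleansing, field 2 of the first line is empty
cut -f2 squeeze.txt

# tr -s squeezes each run of TABs into one TAB before cut sees the data
tr -s '\t' < squeeze.txt | cut -f2
```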
Multiple fields can be cut:
$ cut -f1,3,5,7,9 tab.txt
11	13	15	17	19
21	23	25	27	29
31	33	35	37	39
41	43	45	47	49
51	53	55	57	59
61	63	65	67	69
71	73	75	77	79
81	83	85	87	89
91	93	95	97	99
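Note that the order in the LIST does not matter: cut prints the selected fields in the order they appear in the input, so -f9,1 behaves like -f1,9. A sketch of that, plus awk as one way to actually reorder fields (the awk program is our illustration, not part of cut):

```shell
# Rebuild tab.txt: 9 lines of 9 TAB-separated fields
tab=$(printf '\t')
for i in 1 2 3 4 5 6 7 8 9; do
  seq -s "$tab" "${i}1" "${i}9"
done > tab.txt

# LIST order is ignored: both commands print fields 1 and 9, in input order
cut -f1,9 tab.txt | head -n 1
cut -f9,1 tab.txt | head -n 1

# To really reorder fields, use another tool, e.g. awk with TAB as input/output separator
awk -F '\t' -v OFS='\t' '{ print $9, $1 }' tab.txt | head -n 1
```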
Cut a range:
$ cut -f3-5 tab.txt
13	14	15
23	24	25
33	34	35
43	44	45
53	54	55
63	64	65
73	74	75
83	84	85
93	94	95
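Single fields and ranges can be combined in one LIST, and overlapping selections are printed only once. A small sketch on a rebuilt tab.txt:

```shell
# Rebuild tab.txt: 9 lines of 9 TAB-separated fields
tab=$(printf '\t')
for i in 1 2 3 4 5 6 7 8 9; do
  seq -s "$tab" "${i}1" "${i}9"
done > tab.txt

# Mix single fields and ranges in one LIST
cut -f1-2,5,8- tab.txt | head -n 1

# Overlapping ranges: fields 2 and 3 are printed once, not twice
cut -f1-3,2-4 tab.txt | head -n 1
```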
Cut up to or from a field:
$ cut -f -3 tab.txt
11	12	13
21	22	23
31	32	33
41	42	43
51	52	53
61	62	63
71	72	73
81	82	83
91	92	93
$ cut -f7- tab.txt
17	18	19
27	28	29
37	38	39
47	48	49
57	58	59
67	68	69
77	78	79
87	88	89
97	98	99
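GNU cut also has a --complement option that inverts the selection, which is handy for dropping a range instead of keeping it. A sketch, assuming GNU coreutils (the option is not in POSIX):

```shell
# Rebuild tab.txt: 9 lines of 9 TAB-separated fields
tab=$(printf '\t')
for i in 1 2 3 4 5 6 7 8 9; do
  seq -s "$tab" "${i}1" "${i}9"
done > tab.txt

# Print every field except 4 to 6 (GNU extension)
cut --complement -f4-6 tab.txt | head -n 1
```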
When cutting multiple fields, the output fields are separated by the same delimiter as the input (the one given with -d, or TAB by default). GNU cut can change it with --output-delimiter=STRING; alternatively, pipe the output to another program: