Get Started with the Cut Command of GNU Coreutils

Part of GNU Coreutils, the cut command, like paste and join, operates on fields. It prints selected parts (fields or sections) of each line.

Things to keep in mind:

  • Works on a text stream line by line
  • Prints sections/parts/fields
  • One of the -c, -b or -f options must be used
  • The default delimiter is TAB
  • The delimiter must be a single character
  • Consecutive delimiters are not merged and need to be consolidated into one beforehand
  • Can cut multiple fields or a range of fields

Generate sample data and save it in the file tab.txt:

$ file=tab.txt && \
for i in $(seq 1 9); do \
for j in $(seq 1 9); do \
if [ "$j" == "9" ]; then \
echo -e "$i$j"; \
else \
echo -en "$i$j\t"; \
fi; \
done; \
done > $file && cat $file
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
41 42 43 44 45 46 47 48 49
51 52 53 54 55 56 57 58 59
61 62 63 64 65 66 67 68 69
71 72 73 74 75 76 77 78 79
81 82 83 84 85 86 87 88 89
91 92 93 94 95 96 97 98 99

The fields on each line are separated by a tab character.

There is a required option:

$ cut tab.txt
cut: you must specify a list of bytes, characters, or fields
Try 'cut --help' for more information.

These options are:

  • bytes: -b or --bytes=LIST
  • characters: -c or --characters=LIST
  • fields: -f or --fields=LIST

This is odd: an option should be optional, so a default should be provided. But let’s focus on the most commonly used one: fields.
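Before moving on to fields, here is a minimal sketch of the other two selectors. The sample string and the variable names are made up for illustration; for plain ASCII input, -c and -b select the same positions.

```shell
# Made-up sample line, used only for illustration.
line='abcdefghi'

# -c selects characters by position: characters 2 through 4.
chars=$(printf '%s\n' "$line" | cut -c 2-4)
echo "$chars"

# -b selects bytes by position: the first three bytes.
bytes=$(printf '%s\n' "$line" | cut -b 1-3)
echo "$bytes"
```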

Cut the first and the ninth fields with the default delimiter TAB:

$ cut -f 1 tab.txt
11
21
31
41
51
61
71
81
91
$ cut -f 9 tab.txt
19
29
39
49
59
69
79
89
99

Use space as the delimiter:

$ cp tab.txt space.txt && sed -i 's/\t/ /g' $_ && cat $_
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
41 42 43 44 45 46 47 48 49
51 52 53 54 55 56 57 58 59
61 62 63 64 65 66 67 68 69
71 72 73 74 75 76 77 78 79
81 82 83 84 85 86 87 88 89
91 92 93 94 95 96 97 98 99

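With the default TAB delimiter, cut finds no fields in this file and prints every line unchanged, so we must choose a different delimiter via the -d or --delimiter=DELIM option: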

$ cut -f 1 $_
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
41 42 43 44 45 46 47 48 49
51 52 53 54 55 56 57 58 59
61 62 63 64 65 66 67 68 69
71 72 73 74 75 76 77 78 79
81 82 83 84 85 86 87 88 89
91 92 93 94 95 96 97 98 99
$ cut -f 1 -d ' ' $_
11
21
31
41
51
61
71
81
91
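The same option works for any single-character delimiter, not just a space. A minimal sketch, assuming a made-up colon-separated record (the record and the variable names are illustrative, not part of the sample files above):

```shell
# Hypothetical colon-separated record, made up for illustration.
record='alice:x:1000:1000:/home/alice'

# Select the fifth field using ':' as the delimiter.
home=$(printf '%s\n' "$record" | cut -d ':' -f 5)
echo "$home"
```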

The delimiter must be a single character:

$ cut -f 1 -d '\s' $_
cut: the delimiter must be a single character
Try 'cut --help' for more information.
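cut does not interpret escape sequences in the delimiter, so '\s' (and likewise '\t') counts as two characters. If an explicit TAB delimiter is wanted, one possible approach is to let the shell produce the literal character first. A sketch, with a made-up input line and illustrative variable names:

```shell
# A literal TAB character, produced portably with printf.
tab=$(printf '\t')

# Made-up line: three tab-separated fields, the second containing a space.
row=$(printf 'a\tb c\td')

# Pass the literal TAB to -d; the space inside field 2 is left alone.
second=$(printf '%s\n' "$row" | cut -d "$tab" -f 2)
echo "$second"
```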

Files containing mixed delimiters (tab and space):

$ cp tab.txt mixed.txt && sed -i 's/\(9.\)\t/\1 /g' $_ && cat $_
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
41 42 43 44 45 46 47 48 49
51 52 53 54 55 56 57 58 59
61 62 63 64 65 66 67 68 69
71 72 73 74 75 76 77 78 79
81 82 83 84 85 86 87 88 89
91 92 93 94 95 96 97 98 99

By default, cut -f prints in full any line that contains no delimiter character:

$ cut -f 1 mixed.txt
11
21
31
41
51
61
71
81
91 92 93 94 95 96 97 98 99

Use the -s or --only-delimited option to omit those lines:

$ cut -sf 1 mixed.txt
11
21
31
41
51
61
71
81

But the better approach is to clean the data beforehand.
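One possible way to do that cleansing on the fly is to normalize every separator to a TAB before cutting. This is only a sketch using tr, on a made-up line; the original does not prescribe a tool, and the approach assumes spaces never appear inside a field.

```shell
# Made-up line that mixes a tab and spaces as separators.
mixed=$(printf '91\t92 93 94')

# Translate every space to a tab, then cut on the default TAB delimiter.
third=$(printf '%s\n' "$mixed" | tr ' ' '\t' | cut -f 3)
echo "$third"
```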

What about consecutive TAB characters in the file?

$ sed -i 's/\(1.\)\t/\1\t\t/' mixed.txt && cat $_
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
41 42 43 44 45 46 47 48 49
51 52 53 54 55 56 57 58 59
61 62 63 64 65 66 67 68 69
71 72 73 74 75 76 77 78 79
81 82 83 84 85 86 87 88 89
91 92 93 94 95 96 97 98 99

An empty field is still a field. The first line now has an empty second field, so cut prints an empty line for it:

$ cut -sf 2 mixed.txt

22
32
42
52
62
72
82

The drawback, therefore, is that cut cannot handle multiple delimiters in a row. The data must be cleaned first to reduce consecutive delimiters to a single one:

$ sed -i 's/\t\+/\t/g' mixed.txt && cat $_
11 12 13 14 15 16 17 18 19
21 22 23 24 25 26 27 28 29
31 32 33 34 35 36 37 38 39
41 42 43 44 45 46 47 48 49
51 52 53 54 55 56 57 58 59
61 62 63 64 65 66 67 68 69
71 72 73 74 75 76 77 78 79
81 82 83 84 85 86 87 88 89
91 92 93 94 95 96 97 98 99
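If the file should not be modified in place, the same squeezing can be sketched with tr -s in a pipeline. The input line below is made up for illustration, and the variable names are not from the original:

```shell
# Made-up line with two consecutive tabs between the first two values.
row=$(printf '11\t\t12\t13')

# Without squeezing, field 2 is the empty string between the two tabs.
raw=$(printf '%s\n' "$row" | cut -f 2)

# tr -s squeezes each run of tabs into one, so field 2 becomes 12.
squeezed=$(printf '%s\n' "$row" | tr -s '\t' | cut -f 2)
echo "raw=[$raw] squeezed=[$squeezed]"
```

Note that squeezing also removes fields that are intentionally empty, so it only suits data where an empty field carries no meaning.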

Multiple fields can be cut:

$ cut -f 1,3,5,7,9 tab.txt
11 13 15 17 19
21 23 25 27 29
31 33 35 37 39
41 43 45 47 49
51 53 55 57 59
61 63 65 67 69
71 73 75 77 79
81 83 85 87 89
91 93 95 97 99

Cut a range:

$ cut -f 3-5 tab.txt
13 14 15
23 24 25
33 34 35
43 44 45
53 54 55
63 64 65
73 74 75
83 84 85
93 94 95

Cut up to or from a field:

$ cut -f -3 tab.txt
11 12 13
21 22 23
31 32 33
41 42 43
51 52 53
61 62 63
71 72 73
81 82 83
91 92 93
$ cut -f 7- tab.txt
17 18 19
27 28 29
37 38 39
47 48 49
57 58 59
67 68 69
77 78 79
87 88 89
97 98 99
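Single fields and ranges can be combined in one LIST. The sketch below uses one made-up row that mirrors the first line of tab.txt; it also shows that cut emits fields in input order regardless of the order in the list, and that GNU cut's --complement option selects everything except the listed fields.

```shell
# One made-up tab-separated line with nine fields, mirroring a row of tab.txt.
row=$(printf '11\t12\t13\t14\t15\t16\t17\t18\t19')

# A LIST may mix single fields and ranges, separated by commas.
mixed_list=$(printf '%s\n' "$row" | cut -f 1,3-5,9)

# Fields are printed in input order, so "3,1" gives the same result as "1,3".
reordered=$(printf '%s\n' "$row" | cut -f 3,1)

# --complement (GNU extension) keeps every field except those listed.
rest=$(printf '%s\n' "$row" | cut --complement -f 2-8)

printf '%s\n%s\n%s\n' "$mixed_list" "$reordered" "$rest"
```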

When cutting multiple fields, the output fields are separated by the input delimiter (the one given with -d, or TAB by default). To change the output delimiter, GNU cut offers the --output-delimiter=STRING option; alternatively, pipe the result to another program:

$ cut -f 3-5 tab.txt | sed 's/\t/ /g'
13 14 15
23 24 25
33 34 35
43 44 45
53 54 55
63 64 65
73 74 75
83 84 85
93 94 95
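GNU cut also provides an --output-delimiter=STRING option, so a similar result can be sketched without sed. The row below is made up to mirror the first line of tab.txt; other cut implementations may not support this option.

```shell
# One made-up tab-separated line with nine fields, mirroring a row of tab.txt.
row=$(printf '11\t12\t13\t14\t15\t16\t17\t18\t19')

# Select fields 3 to 5 and join them with a space instead of the input TAB.
out=$(printf '%s\n' "$row" | cut -f 3-5 --output-delimiter=' ')
echo "$out"
```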