Part of GNU Coreutils, the cut command, like paste and join, operates on fields. It prints selected parts (bytes, characters, or fields) of each input line.
Things to keep in mind:
Works on a text stream line by line
Prints sections/parts/fields of each line
One of the -b, -c, or -f options must be used
The default delimiter is TAB
The delimiter must be a single character
Consecutive delimiters are not merged, so consolidate them into one beforehand
Multiple fields or a range of fields can be cut
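To make the single-character delimiter rule concrete, here is a minimal sketch. The sample line and the variable names are made up for illustration; only the -d and -f options come from cut itself.

```shell
# Hypothetical sample line using ':' as the delimiter.
line='alpha:beta:gamma'

# -d sets the single-character delimiter; -f picks the field.
second=$(printf '%s\n' "$line" | cut -d ':' -f 2)
echo "$second"

# A multi-character delimiter is rejected, so cut exits with a non-zero status.
if printf '%s\n' "$line" | cut -d '::' -f 2 2>/dev/null; then
  multi_ok=yes
else
  multi_ok=no
fi
echo "multi-character delimiter accepted: $multi_ok"
```

The exact wording of the error message differs between cut implementations, so the sketch only checks the exit status.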
Generate sample data and save it in the file tab.txt:
$ file=tab.txt && \
for i in $(seq 9); do \
  for j in $(seq 9); do \
    if [ "$j" = "9" ]; then \
      echo -e "$i$j"; \
    else \
      echo -en "$i$j\t"; \
    fi; \
  done; \
done > "$file" && cat "$file"
11	12	13	14	15	16	17	18	19
21	22	23	24	25	26	27	28	29
31	32	33	34	35	36	37	38	39
41	42	43	44	45	46	47	48	49
51	52	53	54	55	56	57	58	59
61	62	63	64	65	66	67	68	69
71	72	73	74	75	76	77	78	79
81	82	83	84	85	86	87	88	89
91	92	93	94	95	96	97	98	99
The fields on each line are separated by a tab character.
There is a required option:
$ cut tab.txt
cut: you must specify a list of bytes, characters, or fields
Try 'cut --help' for more information.
These options are:
bytes: -b or --bytes=LIST
characters: -c or --characters=LIST
fields: -f or --fields=LIST
This is odd: an option should be optional, so a default should be provided. But let’s focus on the most commonly used one: fields.
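For completeness, here is a small sketch of the other two selectors, -c and -b. The sample string is hypothetical and pure ASCII, so characters and bytes coincide.

```shell
# Hypothetical ASCII sample; one character is one byte here.
sample='abcdefgh'

# -c selects by character position: characters 1 to 3.
first_three=$(printf '%s\n' "$sample" | cut -c 1-3)

# -b selects by byte position: bytes 2 and 4.
picked=$(printf '%s\n' "$sample" | cut -b 2,4)

echo "$first_three"
echo "$picked"
```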
Cut the first and the ninth fields with the default delimiter TAB:
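A minimal sketch of that selection, assuming a single row shaped like a line of tab.txt (the row variable is made up for illustration):

```shell
# One hypothetical row with nine TAB-separated fields.
row=$(printf '11\t12\t13\t14\t15\t16\t17\t18\t19')

# No -d option, so the default TAB delimiter applies.
first_and_ninth=$(printf '%s\n' "$row" | cut -f 1,9)
printf '%s\n' "$first_and_ninth"
```

Running the same selection against the whole file would be cut -f1,9 tab.txt.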
When selecting fields, cut prints in full any line that contains no delimiter character:
$ cut -f1 mixed.txt
11
21
31
41
51
61
71
81
919293949596979899
Use the -s or --only-delimited option to omit those lines:
$ cut -sf 1 mixed.txt
11
21
31
41
51
61
71
81
But the better approach is to cleanse the data beforehand.
What about consecutive TAB characters in the file?
$ sed -i 's/\(1.\)\t/\1\t\t/' mixed.txt && cat $_
11		12	13	14	15	16	17	18	19
21	22	23	24	25	26	27	28	29
31	32	33	34	35	36	37	38	39
41	42	43	44	45	46	47	48	49
51	52	53	54	55	56	57	58	59
61	62	63	64	65	66	67	68	69
71	72	73	74	75	76	77	78	79
81	82	83	84	85	86	87	88	89
919293949596979899
An empty field is still a field:
$ cut -sf2 mixed.txt
22
32
42
52
62
72
82
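The same behaviour can be reproduced on a single line. This is only a sketch with a made-up input: two TABs in a row produce an empty second field, and the value that looks like it should be second is actually third.

```shell
# Hypothetical line: "a", an empty field, "b", "c".
line=$(printf 'a\t\tb\tc')

second=$(printf '%s\n' "$line" | cut -f 2)
third=$(printf '%s\n' "$line" | cut -f 3)

# Brackets make the empty field visible.
printf 'field 2: [%s]\n' "$second"
printf 'field 3: [%s]\n' "$third"
```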
Therefore, the drawback is that cut does not treat consecutive delimiters as one. Cleanse the data first to reduce consecutive delimiters to a single one:
$ sed -i 's/\t\+/\t/g' mixed.txt && cat $_
11	12	13	14	15	16	17	18	19
21	22	23	24	25	26	27	28	29
31	32	33	34	35	36	37	38	39
41	42	43	44	45	46	47	48	49
51	52	53	54	55	56	57	58	59
61	62	63	64	65	66	67	68	69
71	72	73	74	75	76	77	78	79
81	82	83	84	85	86	87	88	89
919293949596979899
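If the file should not be edited in place, one alternative to the sed command is to squeeze repeated delimiters in the pipeline with tr -s. This is a sketch on a hypothetical line, not the approach used above.

```shell
# Hypothetical line with a run of three TABs between "a" and "b".
messy=$(printf 'a\t\t\tb\tc')

# tr -s squeezes each run of TABs into a single TAB.
squeezed=$(printf '%s\n' "$messy" | tr -s '\t')

# After squeezing, "b" is the second field again.
second=$(printf '%s\n' "$squeezed" | cut -f 2)
printf '%s\n' "$second"
```

Keep in mind that squeezing also discards fields that are intentionally empty, so it only suits data where empty fields carry no meaning.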
Multiple fields can be cut:
$ cut -f1,3,5,7,9 tab.txt
11	13	15	17	19
21	23	25	27	29
31	33	35	37	39
41	43	45	47	49
51	53	55	57	59
61	63	65	67	69
71	73	75	77	79
81	83	85	87	89
91	93	95	97	99
Cut a range:
$ cut -f3-5 tab.txt
13	14	15
23	24	25
33	34	35
43	44	45
53	54	55
63	64	65
73	74	75
83	84	85
93	94	95
Cut up to or from a field:
$ cut -f -3 tab.txt
11	12	13
21	22	23
31	32	33
41	42	43
51	52	53
61	62	63
71	72	73
81	82	83
91	92	93
$ cut -f7- tab.txt
17	18	19
27	28	29
37	38	39
47	48	49
57	58	59
67	68	69
77	78	79
87	88	89
97	98	99
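When cutting multiple fields, the output fields are separated by the same delimiter as the input (set with -d, or TAB by default). Changing the output delimiter is not really the job of cut: GNU cut does offer --output-delimiter=STRING, but the portable approach is to pipe the output to another program such as tr.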
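Here is a sketch of both ways to change the output delimiter, using a made-up row. The --output-delimiter option is specific to GNU cut, so treat that line as an assumption about your environment; the tr pipeline should work anywhere.

```shell
# Hypothetical row with three TAB-separated fields.
row=$(printf '11\t12\t13')

# Portable: let tr translate the TAB that cut emits into a comma.
with_tr=$(printf '%s\n' "$row" | cut -f 1,3 | tr '\t' ',')

# GNU cut only: ask cut itself to use a different output delimiter.
with_flag=$(printf '%s\n' "$row" | cut -f 1,3 --output-delimiter=',')

echo "$with_tr"
echo "$with_flag"
```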
I would like to add a custom domain to a module other than the default one in Google App Engine Managed VMs. For example, I have the following sub-domains:
Custom domain names | SSL support | Record type | Data | Alias
The domain api.example.com points to the default module, and I would like the other domain, admin.example.com, to point to the admin module. However, merely adding the custom domain in the settings points both api.example.com and admin.example.com to the same module: the default one. So how do you route traffic for the custom domain to the admin module? The answer is the dispatch file. But first we need to understand the concept of modules in Google App Engine.
The above chart illustrates the architecture of a Google App Engine application:
Application: “An App Engine application is made up of one or more modules”. [1]
Module: “Each module consists of source code and configuration files. The files used by a module represents a version of the module. When you deploy a module, you always deploy a specific version of the module.” [1] “All module-less apps have been converted to contain a single default module.” [1] Every application has a single default module.
Version: “A particular module/version will have one or more instances.”
Instance: “Each instance runs its own separate executable.”
Another concept is resource sharing:
“App Engine Modules let developers factor large applications into logical components that can share stateful services and communicate in a secure fashion.” [1]
“Stateful services (such as Memcache, Datastore, and Task Queues) are shared by all modules in an application.” [1]
Remember that Google Cloud Platform is project based. If a web application and a mobile application both make requests to a third application, the API, then there should be three projects. In this case, however, api and admin share the same services, such as the same database and file storage, so we should put them in the same application as separate modules.
With that in mind, how do we route requests to a specific module? “Every module, version, and instance has its own unique URI (for example, v1.my-module.my-app.appspot.com). Incoming user requests are routed to an instance of a particular module/version according to URL addressing conventions and an optional customized dispatch file.” [1] Since a custom domain is used, we have to use the customized dispatch file. This is done by creating a dispatch file, dispatch.yml, that routes requests based on URL patterns.
# Dispatch
# ========
---
dispatch:
  - url: '*/favicon.ico'
    module: default
  - url: 'admin.example.com/*'
    module: admin
The application field is not necessary. If you include it, you will get:
WARNING: The [application] field is specified in file [/home/chao/example/dispatch.yml].
This field is not used by gcloud and should be removed.
Glob characters (such as the asterisk) can be used in URL patterns, and up to 10 routing rules are supported.
Deploying the dispatch file is simple:
$ gcloud preview app deploy dispatch.yml
It should take almost no time. Once it is ready, requests sent to admin.example.com will be routed to the admin module.
When you have a release for the first time, how do you name your release version based on SemVer? 1.0.0, 0.1.0 or 0.0.1?
I like to start off with 0.1.0. Version 0.0.0 means there is nothing, and going from nothing to something is a breaking change, hence version 0.1.0.
I was planning to build two different Docker images: one for production and one for testing. However, the more I coded, the more frustrated I became. I want something simple, and more images mean more complexity:
Having multiple images means more configuration files to manage and update.
There are more things to do when working with third-party tools. For example, Google App Engine Managed VMs deployment only recognizes the Dockerfile at the application root out of the box.
The learning curve is steeper for new developers.
Keeping the application structure simple is important. If you need multiple Dockerfiles, then your application is too complex. The difference between npm install --production and npm install isn’t large, but using a single install saves you the time and effort of managing multiple Dockerfiles. There is no reason to have more than one Dockerfile. Just use a different command, such as npm test, when running tests.
When provisioning a new machine in Google Compute Engine with Vagrant, I spotted the following error messages when updating the system:
Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
E: Unable to lock directory /var/lib/apt/lists/
It appears that I simply lost the race to grab the lock during the APT update. Without the latest index, I am not able to upgrade the packages. A simple solution is to wait it out:
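One way to wait it out is a small retry loop. The sketch below is my own illustration rather than the original provisioning script: the retry function, its arguments, and the attempt and delay values are all hypothetical.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Hypothetical helper.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Intended use on the provisioned machine (not run here):
#   retry 10 5 sudo apt-get update
```

This relies on apt-get exiting with a non-zero status when it cannot take the lock, so the loop simply tries again after the delay.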
When working with Google Cloud Datastore, I would like to design each kind (collection or table) with its own schema and index files:
$ tree
.
├── index.yml
└── schema.yml
But not every kind needs entries in the index.yml file. So, what to do? Will the index creation command accept an empty, zero-byte index.yml file? Let’s give it a try:
Strings are likely the most frequently used property type. When working with strings, the most important question to ask is how many bytes can be stored in a property/field. According to the table in the properties and value types documentation:
Up to 1500 bytes if property is indexed, up to 1MB otherwise
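Note that the limit is in bytes, not characters, which matters for non-ASCII text. A quick way to check a value from the shell is sketched below; the sample value is hypothetical, and the 1500 figure is the indexed-property limit quoted above.

```shell
# Hypothetical value: "héllo" written with octal escapes so the bytes are
# explicit (é is two bytes in UTF-8).
value=$(printf 'h\303\251llo')

# wc -c counts bytes; tr strips the padding some wc versions add.
bytes=$(printf '%s' "$value" | wc -c | tr -d ' ')

# Limit for an indexed string property, as quoted above.
limit=1500
if [ "$bytes" -le "$limit" ]; then
  fits=yes
else
  fits=no
fi
echo "$bytes bytes, fits in an indexed property: $fits"
```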
Let’s give it a try:
// String Property Type
// ====================
//
// Investigate the string property type of Google Cloud Datastore.
//
// Dependencies tested:
//
// - `gcloud@v0.21.0`
// - `lorem-ipsum@1.0.3`
'use strict';
// Define dependencies.
var params = {
  projectId: 'myproj',
  keyFilename: '/path/to/key/file.json',
  namespace: 'test'
};
var dataset = require('gcloud').datastore.dataset(params);