Coding

Wrapping Line by Line JSON to an Array

Some utilities print JSON objects one per line. Line-by-line JSON is not a single valid JSON document, so it needs to be wrapped in brackets as an array before it can be parsed as one.

For example, here are a few lines of JSON objects:

$ for i in $(seq 3); do echo '{"id":'$i'}'; done
{"id":1}
{"id":2}
{"id":3}

To combine them into a single array, we can use jq's -s, --slurp option:

$ for i in $(seq 3); do echo '{"id":'$i'}'; done | jq -cM -s
[{"id":1},{"id":2},{"id":3}]

The option reads the entire input stream into one large array. This works until the input is too large to hold in memory, because jq produces no output while it is slurping up the input. If the input is line-by-line JSON and all you want is to wrap it in an array (with or even without the trailing newline), awk can do it as a stream:

$ for i in $(seq 3); do echo '{"id":'$i'}'; done |\
awk 'BEGIN { printf "["; getline; printf "%s", $0 } { printf ",%s", $0 } END { print "]" }'
[{"id":1},{"id":2},{"id":3}]

This version prints the trailing newline, and it solves the problem of a very large input: data simply streams out as it comes in. Try it with an endless while loop.

jq has another option named --stream, which at first seems to do the same thing. But it emits path and leaf-value events rather than wrapping the inputs in an array, --slurp overrides --stream, and the option itself is already quite complicated.

In conclusion, you can use jq -s to wrap line-by-line JSON into an array. If the input may be very large, use the awk example above.
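The same streaming idea can be sketched in Node.js with a generator. This is only an illustration: `wrapLines` is a hypothetical helper name, and each non-blank input line is assumed to hold one complete JSON value.

```javascript
// Hypothetical helper (not part of jq or awk): lazily wrap an iterable of
// JSON lines into the pieces of one JSON array, emitting output as each
// line arrives instead of buffering the whole input.
function* wrapLines(lines) {
  let first = true;
  yield '[';
  for (const line of lines) {
    if (line.trim() === '') continue; // skip blank lines
    yield first ? line : ',' + line;
    first = false;
  }
  yield ']\n';
}

// Example with an in-memory list of lines.
const lines = ['{"id":1}', '{"id":2}', '{"id":3}'];
const output = Array.from(wrapLines(lines)).join('');
console.log(output.trim()); // [{"id":1},{"id":2},{"id":3}]
```

In a real script the lines could come from the readline module reading process.stdin, with each yielded piece written straight to process.stdout.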

Gitignore: Excluding Everything Except a Specific Directory

I have the following directory structure:

$ tree -aF src/
src/
├── github.com/
│   ├── git/
│   │   └── git/
│   │       └── file
│   └── realguess/
│       ├── git/
│       │   └── file
│       └── hello/
│           └── hello.go
└── .gitignore

6 directories, 4 files

My intention is to track only the projects in src/github.com/realguess/, so I added the following .gitignore file:

$ cat src/.gitignore
/github.com/
!/github.com/realguess/

But the directory is ignored:

$ git check-ignore -v github.com/realguess/
src/.gitignore:1:/github.com/ github.com/realguess/

It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined.[^1]

Uh-oh! The directory github.com/ is excluded, so it is not possible to re-include github.com/realguess/ (for pattern matching, a directory is just another file). To fix it, add a wildcard to the end of the parent directory pattern:

$ cat src/.gitignore
/github.com/*
!/github.com/realguess/

With the wildcard, the pattern matches every file (including hidden ones) and every directory directly under github.com/, but the github.com/ directory itself, the parent of github.com/realguess/, is not excluded. The negation pattern can then re-include github.com/realguess/.

Double check it:

$ git check-ignore -v github.com/realguess/; echo $?
1

The debugging command prints nothing and returns a non-zero status code, which indicates that the directory is not ignored.

Listing Tags in Natural Sort of Version Numbers

Using the Node.js repository as an example:

$ git remote -v
origin https://github.com/nodejs/node.git (fetch)
origin https://github.com/nodejs/node.git (push)

If we would like to list all tags with v0.12 versions, we could do:

$ git tag -l 'v0.12.*'
v0.12.0
v0.12.1
v0.12.10
v0.12.11
v0.12.12
v0.12.13
v0.12.14
v0.12.15
v0.12.2
v0.12.3
v0.12.4
v0.12.5
v0.12.6
v0.12.7
v0.12.8
v0.12.8-rc.1
v0.12.9

However, the order is lexicographic: v0.12.10 comes right after v0.12.1, where v0.12.2 should be.

To fix it, we use the sort command with the following option:

-V, --version-sort
natural sort of (version) numbers within text

Thus:

$ git tag -l 'v0.12.*' | sort --version-sort
v0.12.0
v0.12.1
v0.12.2
v0.12.3
v0.12.4
v0.12.5
v0.12.6
v0.12.7
v0.12.8
v0.12.8-rc.1
v0.12.9
v0.12.10
v0.12.11
v0.12.12
v0.12.13
v0.12.14
v0.12.15
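If the tag list has to be sorted inside a Node.js script instead, a natural-order comparator can be sketched as below. The function names are hypothetical, and this is a simplified comparison that compares digit runs as numbers; it is not the exact algorithm GNU sort uses, although it gives the same order for the tags shown here.

```javascript
// Split a tag into runs of digits and runs of non-digits,
// e.g. "v0.12.10" -> ["v", "0", ".", "12", ".", "10"].
function versionParts(tag) {
  return tag.match(/\d+|\D+/g) || [];
}

// Compare two tags part by part: digit runs numerically, others as text.
// When one tag is a prefix of the other, the shorter tag sorts first.
function compareVersions(a, b) {
  const pa = versionParts(a);
  const pb = versionParts(b);
  const n = Math.min(pa.length, pb.length);
  for (let i = 0; i < n; i++) {
    const x = pa[i];
    const y = pb[i];
    if (/^\d/.test(x) && /^\d/.test(y)) {
      const diff = Number(x) - Number(y);
      if (diff !== 0) return diff;
    } else if (x !== y) {
      return x < y ? -1 : 1;
    }
  }
  return pa.length - pb.length;
}

const tags = ['v0.12.0', 'v0.12.1', 'v0.12.10', 'v0.12.2', 'v0.12.8', 'v0.12.8-rc.1', 'v0.12.9'];
console.log(tags.slice().sort(compareVersions).join('\n'));
```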

Streaming HTTP Request Directly to Response in Node.js

This is a Node.js starter script that streams an HTTP request directly into the response:

require('http').createServer((req, res) => {
  req.pipe(res); // Pipe request directly to response
}).listen(3000);

It behaves almost like an echo: you get back whatever you sent. For example, use HTTPie to make a request to the above server:

$ echo foo | http --verbose --stream :3000 Content-Type:text/plain
POST / HTTP/1.1
Accept: application/json, */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 4
Host: localhost:3000
User-Agent: HTTPie/0.9.6
Content-Type: text/plain
foo
HTTP/1.1 200 OK
Connection: keep-alive
Transfer-Encoding: chunked
foo

We can also set the Content-Type response header, so that the client knows the media type of the echoed body once all chunks are assembled:

require('http').createServer((req, res) => {
  // Set headers before the first chunk of the body is written
  if (req.headers['content-type']) {
    res.setHeader('Content-Type', req.headers['content-type']);
  }
  req.pipe(res); // Pipe request directly to response
}).listen(3000);

The response should have the Content-Type field as below:

HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: text/plain
Transfer-Encoding: chunked
foo

Notice that instead of the usual Content-Length header, the response carries Transfer-Encoding: chunked. Chunked is the default transfer encoding for Node.js HTTP responses:

Sending a ‘Content-length’ header will disable the default chunked encoding.[^1]

About transfer encoding:

Chunked transfer encoding is a data transfer mechanism in version 1.1 of the Hypertext Transfer Protocol (HTTP) in which data is sent in a series of “chunks”. It uses the Transfer-Encoding HTTP header in place of the Content-Length header, which the earlier version of the protocol would otherwise require. Because the Content-Length header is not used, the sender does not need to know the length of the content before it starts transmitting a response to the receiver. Senders can begin transmitting dynamically-generated content before knowing the total size of that content. … The size of each chunk is sent right before the chunk itself so that the receiver can tell when it has finished receiving data for that chunk. The data transfer is terminated by a final chunk of length zero.[^2]
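The wire format described in the quote can be sketched in a few lines. `encodeChunked` is a hypothetical helper written for illustration only; Node.js does this framing internally, so application code never needs it.

```javascript
// Encode a list of body pieces the way HTTP/1.1 chunked transfer encoding
// puts them on the wire: "<size in hex>\r\n<data>\r\n" for each chunk,
// followed by a zero-length chunk that ends the body.
function encodeChunked(pieces) {
  const parts = [];
  for (const piece of pieces) {
    const data = Buffer.from(piece);
    if (data.length === 0) continue; // a zero-length chunk would end the body
    parts.push(Buffer.from(data.length.toString(16) + '\r\n'), data, Buffer.from('\r\n'));
  }
  parts.push(Buffer.from('0\r\n\r\n')); // final chunk, no trailers
  return Buffer.concat(parts);
}

// The echoed body "foo\n" from the example above is 4 bytes long.
console.log(JSON.stringify(encodeChunked(['foo\n']).toString()));
// "4\r\nfoo\n\r\n0\r\n\r\n"
```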

With the above starter script, you can now attach transform streams to manipulate the request body and stream it back as a chunked response.
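As a minimal sketch of that idea, the following puts an upper-casing transform between the request and the response. The helper names are hypothetical, and the body is assumed to be text in a single-byte encoding such as ASCII, because a multi-byte character split across two chunks would not survive `toString()`.

```javascript
const { Transform } = require('stream');
const http = require('http');

// Hypothetical transform logic for illustration: upper-case one chunk.
function upperCase(chunk) {
  return chunk.toString().toUpperCase();
}

// Build a fresh Transform per request; streams cannot be reused.
function createUpperCaseStream() {
  return new Transform({
    transform(chunk, encoding, callback) {
      callback(null, upperCase(chunk));
    }
  });
}

// Same echo server as above, with the transform in the middle of the pipe.
// The server is only created here, not started.
const server = http.createServer((req, res) => {
  req.pipe(createUpperCaseStream()).pipe(res);
});

// To try it out: server.listen(3000);
```

With the server listening, the HTTPie request shown earlier should come back as FOO instead of foo, still with Transfer-Encoding: chunked.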

Settings:

$ node --version
v6.3.1
$ http --version
0.9.6

[^1]: HTTP, Node.js API Docs

[^2]: Chunked transfer encoding, Wikipedia

Escaping in JSON with Backslash

Escape characters are part of the syntax for many programming languages, data formats, and communication protocols. For a given alphabet an escape character’s purpose is to start character sequences (so named escape sequences), which have to be interpreted differently from the same characters occurring without the prefixed escape character.[^2]

JSON or JavaScript Object Notation is a data interchange format. It has an escape character as well.

In many programming languages such as C, Perl, and PHP and in Unix scripting languages, the backslash is an escape character, used to indicate that the character following it should be treated specially (if it would otherwise be treated normally), or normally (if it would otherwise be treated specially).[^3]

JavaScript also uses backslash as an escape character. JSON is based on a subset of the JavaScript Programming Language, so it uses backslash as its escape character too:

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.[^1]

A character can be:

  • Any Unicode character except " or \ or control character
  • \"
  • \\
  • \/
  • \b
  • \f
  • \n
  • \r
  • \t
  • \u + four-hex-digits

Only a few characters can be escaped in JSON. If the escaped character is not one of those listed:

$ cat data.json
"\a"

parsing fails with a SyntaxError[^4]:

$ node -e 'console.log(require("./data.json"))'
module.js:561
    throw err;
    ^
SyntaxError: /home/chao/tmp/js/data.json: Unexpected token a in JSON at position 2
    at Object.parse (native)
    at Object.Module._extensions..json (module.js:558:27)
    at Module.load (module.js:458:32)
    at tryModuleLoad (module.js:417:12)
    at Function.Module._load (module.js:409:3)
    at Module.require (module.js:468:17)
    at require (internal/module.js:20:19)
    at [eval]:1:13
    at ContextifyScript.Script.runInThisContext (vm.js:25:33)
    at Object.exports.runInThisContext (vm.js:77:17)
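The same rule can be checked directly with JSON.parse, without a data file. This is a small sketch; `tryParse` is a hypothetical helper, and the exact error message varies between Node.js versions, so only the error name is reported.

```javascript
// Try to parse a JSON text and report whether it is accepted.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// Backslashes are doubled because these are JavaScript string literals:
// '"\\n"' is the four-character JSON text "\n".
const samples = ['"\\n"', '"\\/"', '"\\u0041"', '"\\a"'];
for (const text of samples) {
  console.log(text, '->', JSON.stringify(tryParse(text)));
}
```

The first three samples use escapes from the list above and parse to a newline, a slash, and the letter A. The last one, \a, is rejected.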

The First Release Version

When you have a release for the first time, how do you name your release version based on SemVer? 1.0.0, 0.1.0 or 0.0.1?

I like to start off with 0.1.0, because version 0.0.0 means there is nothing. Going from nothing to something is a breaking change, hence version 0.1.0.

Changelog Starter

A changelog starter to be consistent with my coding style:

  • Two header levels only with setext style.
  • Use equal signs for first-level headers and dashes for second-level headers.
  • Each release is a section, hence use second-level header.
  • Each category in a release is not a section, hence use emphasis with double asterisks.
Changelog
=========

Guideline:

1. Record notable changes.
2. List releases in reversed chronological order, newest on top.
3. Write all dates in `YYYY-MM-DD` format.
4. Follow [SemVer] format for each release.
5. Create a new release by copying the **Example** section (on the bottom)
   to the top and renaming it **Unreleased**.
6. Keep an **Unreleased** section on top for tracking any changes in order to
   minimize required effort.
7. Change _Unreleased_ to a release name with the date when ready.

_The guideline attempts to follow [Keep a CHANGELOG][1]._

[SemVer]: http://semver.org/
[1]: http://keepachangelog.com/


0.1.0 / 2015-01-01
------------------

**Security**

- Inform users to upgrade for vulnerabilities.

**Removed**

- Remove a deprecated feature in this release.

**Deprecated**

- Deprecate a feature in the upcoming release.

**Changed**

- Change an existing functionality.

**Added**

- Add a new feature.

**Fixed**

- Fix a bug.


Example / YYYY-MM-DD
--------------------

**Security**

- Inform users to upgrade for vulnerabilities.
- Each sentence must use the present tense and end with a period.

**Removed**

- Remove a deprecated feature in this release.
- Basic Authentication is dropped in favor of Digest Authentication.

**Deprecated**

- Deprecate a feature in the upcoming release.
- Basic Authentication will be replaced by Digest Authentication in the next
  release.

**Changed**

- Change an existing functionality.
- User session data is added to the user object upon initialization.

**Added**

- Add a new feature.
- A new endpoint `/users` is added to handle user CRUD.

**Fixed**

- Fix a bug.
- Add the default case to handle unknown cases in user group assigning.

Restart Upstart Instances on System Reboot

A single Upstart job can have multiple instances running:

$ sudo start my-job port=4000
$ sudo start my-job port=4001
$ sudo start my-job port=4002

However, when the operating system reboots, a job with multiple instances will fail to start, because the instance information is not provided to the job. We can fix this problem by adding a for loop in the script section:

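start on (local-filesystems and net-device-up IFACE!=lo)
stop on shutdown

script
  # exec would replace the shell and end the loop after the first instance,
  # so run each instance in the background and wait for all of them.
  # Assumption: the job reads its port from the PORT environment variable.
  for i in `seq 4000 4002`
  do
    PORT=$i /path/to/my/job &
  done
  wait
end script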

With this Upstart job, we do not need to provide instance information:

$ sudo start my-job

Therefore, the job will start automatically when the system reboots.

Synchronous Console Methods

In Node v0.10.x, the console functions are synchronous when the destination is a terminal or a file (to avoid lost messages in case of premature exit) and asynchronous when it’s a pipe (to avoid blocking for long periods of time). See console. We can test it with the following script: