Streaming HTTP Request Directly to Response in Node.js

This is a Node.js starting script to stream HTTP request directly into response:

1
2
3
require('http').createServer((req, res) => {
req.pipe(res); // Pipe request directly to response
}).listen(3000);

It behaves almost like an echo, you get back whatever you sent. For example, use HTTPie to make a request to the above server:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
$ echo foo | http --verbose --stream :3000 Content-Type:text/plain
POST / HTTP/1.1
Accept: application/json, */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 4
Host: localhost:3000
User-Agent: HTTPie/0.9.6
Content-Type: text/plain
foo
HTTP/1.1 200 OK
Connection: keep-alive
Transfer-Encoding: chunked
foo

We can also add the Content-Type response header to echo back what the entire media type is after assembling all chunks.

1
2
3
4
5
6
7
require('http').createServer((req, res) => {
req.pipe(res); // Pipe request directly to response
if (req.headers['content-type']) {
res.setHeader('Content-Type', req.headers['content-type']);
}
}).listen(3000);

The response should have the Content-Type field as below:

1
2
3
4
5
6
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: text/plain
Transfer-Encoding: chunked
foo

Notice that instead of usual Content-Length in the response header, we’ve got Transfer-Encoding: chunked. The default transfer encoding for Node.js HTTP is chunked:

Sending a ‘Content-length’ header will disable the default chunked encoding.[^1]

About transfer encoding:

Chunked transfer encoding is a data transfer mechanism in version 1.1 of the Hypertext Transfer Protocol (HTTP) in which data is sent in a series of “chunks”. It uses the Transfer-Encoding HTTP header in place of the Content-Length header, which the earlier version of the protocol would otherwise require. Because the Content-Length header is not used, the sender does not need to know the length of the content before it starts transmitting a response to the receiver. Senders can begin transmitting dynamically-generated content before knowing the total size of that content. … The size of each chunk is sent right before the chunk itself so that the receiver can tell when it has finished receiving data for that chunk. The data transfer is terminated by a final chunk of length zero.[^2]

With the above starting script, now you can attach some transform streams to manipulate the request and stream back in chunked response.

Settings:

1
2
3
4
$ node --version
v6.3.1
$ http --version
0.9.6

[^1]: HTTP, Node.js API Docs

[^2]: Chunked transfer encoding, Wikipedia

Installing Caddy 0.9.x on Ubuntu/Debian System

Install Caddy via its installer script on Ubuntu/Debian system:

1
2
3
4
5
6
7
$ curl -s https://getcaddy.com/ | sudo bash
Downloading Caddy for linux/amd64...
https://caddyserver.com/download/build?os=linux&arch=amd64&arm=&features=
Extracting...
Putting caddy in /usr/local/bin (may require password)
Caddy 0.9.1 (+e8e5595)
Successfully installed

This is different from the Download page, where you get to select additional features (see the &features= URL query parameter).

1
2
$ which caddy
caddy is /usr/local/bin/caddy

Get the installed version:

1
2
$ caddy --version
Caddy 0.9.1 (+e8e5595)

Get help:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
$ caddy -h
Usage of caddy:
-agree
Agree to the CA's Subscriber Agreement
-ca string
URL to certificate authority's ACME server directory (default "https://acme-v01.api.letsencrypt.org/directory")
-conf string
Caddyfile to load (default "Caddyfile")
-cpu string
CPU cap (default "100%")
-email string
Default ACME CA account email address
-grace duration
Maximum duration of graceful shutdown (default 5s)
-host string
Default host
-http2
Use HTTP/2 (default true)
-log string
Process log file
-pidfile string
Path to write pid file
-plugins
List installed plugins
-port string
Default port (default "2015")
-quic
Use experimental QUIC
-quiet
Quiet mode (no initialization output)
-revoke string
Hostname for which to revoke the certificate
-root string
Root path of default site (default ".")
-type string
Type of server to run (default "http")
-version
Show version

Run Caddy locally:

1
2
3
4
$ caddy
Activating privacy features... done.
http://:2015
WARNING: File descriptor limit 1024 is too low for production servers. At least 8192 is recommended. Fix with "ulimit -n 8192".

A file descriptor is simply a number that the operating system assigns to an open file to keep track of it. Caddy’s primary goal is to be an easy-to-use static file web server. Having high file descriptor limit means it can open more files to serve users at the same time.

1
2
3
$ ulimit -Sn && ulimit -Hn
1024
4096

The current system is too low in both soft and hard limits. But since it’s not in production, warning can be ignored.

Make sure the server working:

1
2
3
4
5
6
7
8
$ http :2015
HTTP/1.1 404 Not Found
Content-Length: 14
Content-Type: text/plain; charset=utf-8
Server: Caddy
X-Content-Type-Options: nosniff
404 Not Found

Response header X-Content-Type-Options: nosniff prevents MIME based attacks, it tells the browser to respect the response content type, not to override.

Status code 404 means working, but just lacks an index file. Let’s create one:

Accessing Upwork JSON Data without the API

Upwork, formerly Elance-oDesk, is the world’s largest freelancing marketplace. I’m interested to know what types of jobs they are in the platform, and how many. For a lazy programmer, browsing each job category and clicking on each link, and copying those numbers is not the way to go. I need to automate this. There is an API. But before diving into the API documentation, let’s see if there is another way (“Rule of Diversity”).

Before continuing, a word of warning, this is prohibited:

Using any robot, spider, scraper, or other automated means to access the Site for any purpose without our express written permission or collecting or harvesting any personally identifiable information, including Account names, from the Site;[^2]

After poking around the web app, it communicates with its backend by using JSON data exchange format via the URL: https://www.upwork.com/o/jobs/browse/url. However, if accessing the URL directly, it will respond with 404 page not exist error. Something is missing.

Well, the web app is able to successfully make the request, so this is not difficult to tackle. Just use the process of elimination from the working request, it will reveal the required information.

After a couple tries, just need to add the request header: X-Requested-With: XMLHttpRequest, then the JSON response with the status code 200 will be returned:

1
2
3
4
5
6
7
8
9
$ http --verbose https://www.upwork.com/o/jobs/browse/url \
X-Requested-With:XMLHttpRequest
GET /o/jobs/browse/url HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: www.upwork.com
User-Agent: HTTPie/0.9.6
X-Requested-With: XMLHttpRequest

The default sort is by creation time in descending order, so you don’t need to add the query parameters: sort==create_time+desc (HTTPie).

Let’s load the response data into Node.js and perform a quick analysis:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
$ node
> data = require('./upwork.json')
{ url: '/o/jobs/browse/',
searchResults:
{ q: '',
paging: { total: 87654, offset: 0 },
spellcheck: { corrected_queries: [] },
jobs:
[ [Object],
[Object],
[Object],
[Object],
[Object],
[Object],
[Object],
[Object],
[Object],
[Object] ],
smartSearch: { downloadTeamApplication: false },
facets:
{ jobType: [Object],
workload: [Object],
duration: [Object],
clientHires: [Object],
contractorTier: [Object],
categories: [Object],
previousClients: [Object],
subcategories: [] },
isSearchWithEmptyParams: true,
subcategories: [],
currentQuery: {},
rssLink: '/ab/feed/jobs/rss?api_params=1&q=',
atomLink: '/ab/feed/jobs/atom?api_params=1&q=',
queryParsedParams: [],
pageTitle: 'Freelance Jobs - Upwork' } }

The property searchResults.paging.total is the total number of jobs available:

1
2
> data.searchResults.paging
{ total: 87654, offset: 0 }

But, the number is different from the web app, a lot less, 50% less jobs found. Is that because the request is not recognized as a logged-in user? Let’s find out.

Installing jq from Source

Packages built in both Ubuntu and Debian packages lack behind, therefore, to get the latest version of jq, build from source.

There are a few prerequisites to install:

  • GCC
  • Make
  • Autotools

Both GCC and Make are usually installed if you do development, but not Autotools. Luckily, this is easy to fulfill:

1
$ sudo apt-get install automake

Install from source:

1
2
3
4
$ sudo git clone https://github.com/stedolan/jq.git
$ cd jq
$ sudo git checkout jq-1.5
$ sudo ./configure && sudo make && sudo make install

The installed path is at:

1
2
$ which jq
jq is /usr/local/bin/jq

However, this gives me an unexpected tag:

1
2
$ jq --version
jq-1.5-dirty

Will Docker Container Restart Pick Up Updated Image?

When a Docker image has been updated, will restarting the running container via docker restart pick up the change? Educated guess will be no, because like restarting a process, the memory is still retained. The best way to find out is to give a try.

Let’s start with a Dockerfile:

1
2
3
# Version Foo
FROM debian:8.5
CMD while true; do echo foo; sleep 5; done

The command will keep printing foo every 5 seconds.

Create the image:

1
2
3
4
5
6
7
8
9
$ docker build -t example .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM debian:8.5
---> 1b088884749b
Step 2 : CMD while true; do echo foo; sleep 5; done
---> Running in 38fdeb15f629
---> 6a56a50ef254
Removing intermediate container 38fdeb15f629
Successfully built 6a56a50ef254

Notice the image ID starting with 6a56.

Start the container:

1
2
$ docker run -d --name example example
dac42e7194e4ec2bdca8e24db29a3333ae2f422d316e341c5cb1499034a4357b

Check the log:

1
2
3
$ docker logs example
foo
foo

This is expected output.

Inspect the container:

1
$ docker inspect example

The important field is the corresponding image, which matches to the previous built image:

1
2
3
4
5
{
...
"Image": "sha256:6a56a50ef254bb1d07117b0a0750ef81fafe9735ab3b0f2b0a14511f38d5b83d"
...
}

Now update the Dockerfile:

1
2
3
# Version Bar
FROM debian:8.5
CMD while true; do echo bar; sleep 5; done

This time it prints bar instead of foo.

Rebuild the image:

1
2
3
4
5
6
7
8
9
$ docker build -t example .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM debian:8.5
---> 1b088884749b
Step 2 : CMD while true; do echo bar; sleep 5; done
---> Running in 7fc297e12005
---> a6c04345afb9
Removing intermediate container 7fc297e12005
Successfully built a6c04345afb9

Now we have a different image. The image ID is different: a6c0. But the old image is still there:

1
2
3
4
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
example latest a6c04345afb9 24 seconds ago 125.1 MB
<none> <none> 6a56a50ef254 3 minutes ago 125.1 MB

Restart the container:

1
2
$ docker restart example
example

Got bar? No still foo all the way with the log. And when you inspect the container, it still uses the old image.

So, docker restart will not pick up the changes from updated image, it will still use the old image built previously. Therefore, the correct way is to drop the container entirely and run it again:

1
2
3
4
$ docker stop example && docker rm example && docker run -d --name example example
example
example
55cec9110fed0257060673a085a08f143003336b1720894f43c6ac5a22104935

The log shows the correct message:

1
2
3
$ docker logs example
bar
bar

Inspecting the container, now it has the correct image:

1
2
3
4
5
6
$ docker inspect example
{
...
"Image": "sha256:a6c04345afb953ab392241f56c04f72110c772a6ee3a36e248c1ffd03f81b7d6"
...
}

And don’t forget to delete the old image.

Settings:

1
2
$ docker --version
Docker version 1.12.0, build 8eab29e

Fixing Authorization Failure in AWS CLI by Synchronizing the Clock

Running into an error when executing an AWS command:

1
2
3
4
$ aws ec2 describe-instances
An error occurred (AuthFailure) when calling the DescribeInstances operation: AWS
was not able to validate the provided access credentials

From the error message, it appears to be an error with access credentials. But after updating to a new credential, and even updated the AWS package, the error still persisted. After trying out other commands, there was an error message containing “signature not yet current” with timestamps. So, the actual problem was due to inaccurate local clock. Hence, the solution is to sync the local date and time by polling the Network Time Protocol (NTP) server:

1
$ sudo ntpdate pool.ntp.org

ntpdate can be run manually as necessary to set the host clock, or it can be run from the host startup script to set the clock at boot time. This is useful in some cases to set the clock initially before starting the NTP daemon ntpd. It is also possible to run ntpdate from a cron script. However, it is important to note that ntpdate with contrived cron scripts is no substitute for the NTP daemon, which uses sophisticated algorithms to maximize accuracy and reliability while minimizing resource use. Finally, since ntpdate does not discipline the host clock frequency as does ntpd, the accuracy using ntpdate is limited.[^1]

From the description, we can learn that we can make things even easier by installing NTP package:

1
$ sudo apt-get install -y ntp

Network Time Protocol daemon and utility programs NTP, the Network Time Protocol, is used to keep computer clocks accurate by synchronizing them over the Internet or a local network, or by following an accurate hardware receiver that interprets GPS, DCF-77, NIST or similar time signals.[^2]

Verify the installation and execution:

1
2
$ ps -e | grep ntpd
4964 ? 00:00:00 ntpd

with the environment:

1
2
$ aws --version
aws-cli/1.10.53 Python/2.7.6 Linux/3.13.0-92-generic botocore/1.4.43

[^1]: $ man nptdate
[^2]: $ apt-cache show ntp

Creating a Data Volume Container in Dockerfile

Create a Docker data volume container in Dockerfile is unbelievably simple, just use the VOLUME instruction:

1
2
FROM debian:8.5
VOLUME ["/data"]

The instruction creates a mount point and attach the volumes from native host or other containers.

Build the data container:

1
2
3
4
5
6
7
8
9
$ docker build -t data .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM debian:8.5
---> 1b088884749b
Step 2 : VOLUME /data
---> Running in 5511f34a489c
---> 7b723b2b3d13
Removing intermediate container 5511f34a489c
Successfully built 7b723b2b3d13

The built size is just about 125.1 MB.

1
$ docker create --name data data

The first data is the name of the container, the second data is the name of Docker image.

To attach the data volume container to another, we use --volumes-from option:

1
$ docker run -it --rm --name foo --volumes-from=data debian:8.5 /bin/bash

If there’re initial data to copy, then add the COPY instruction:

1
2
3
FROM debian:8.5
VOLUME ["/data"]
COPY . /data

Settings:

1
2
$ docker --version
Docker version 1.11.2, build b9f10c9

Escaping in JSON with Backslash

Escape characters are part of the syntax for many programming languages, data formats, and communication protocols. For a given alphabet an escape character’s purpose is to start character sequences (so named escape sequences), which have to be interpreted differently from the same characters occurring without the prefixed escape character.[^2]

JSON or JavaScript Object Notation is a data interchange format. It has an escape character as well.

In many programming languages such as C, Perl, and PHP and in Unix scripting languages, the backslash is an escape character, used to indicate that the character following it should be treated specially (if it would otherwise be treated normally), or normally (if it would otherwise be treated specially).[^3]

JavaScript also uses backslash as an escape character. JSON is based on a subset of the JavaScript Programming Language, therefore, JSON also uses backslash as the escape character:

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.[^1]

A character can be:

  • Any Unicode character except " or \ or control character
  • \"
  • \\
  • \/
  • \b
  • \f
  • \n
  • \r
  • \t
  • \u + four-hex-digits

Only a few characters can be escaped in JSON. If the character is not one of the listed:

1
2
$ cat data.json
"\a"

it returns a SyntaxError[^4]:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
$ node -e 'console.log(require("./data.json"))'
module.js:561
throw err;
^
SyntaxError: /home/chao/tmp/js/data.json: Unexpected token a in JSON at position 2
at Object.parse (native)
at Object.Module._extensions..json (module.js:558:27)
at Module.load (module.js:458:32)
at tryModuleLoad (module.js:417:12)
at Function.Module._load (module.js:409:3)
at Module.require (module.js:468:17)
at require (internal/module.js:20:19)
at [eval]:1:13
at ContextifyScript.Script.runInThisContext (vm.js:25:33)
at Object.exports.runInThisContext (vm.js:77:17)

Getting the Version of the Latest Release

What’s the latest release of Docker?

Its homepage doesn’t tell you anything. Have to poke around, click on a few links, may or may not get you what you want. If there’s a quick way, even better a CLI method, that will be great.

Couple things we can do. First, when installing docker, we use the URL https://get.docker.com/. It has a path that will return an installation instruction with the version number:

1
2
3
4
5
$ curl https://get.docker.com/builds/
# To install, run the following command as root:
curl -sSL -O https://get.docker.com/builds/Linux/x86_64/docker-1.11.2.tgz && sudo tar zxf docker-1.11.2.tgz -C /
# Then start docker in daemon mode:
sudo /usr/local/bin/docker daemon

There is another way. Well, there is always another way. Docker project is hosted in GitHub, we can use this URL:

1
https://github.com/docker/docker/releases/latest

which will be redirected to the latest release:

1
https://github.com/docker/docker/releases/tag/v1.11.2

Since it’s a redirect, we can use HTTP HEAD method without download the entire response body:

1
2
3
4
5
6
7
8
9
$ curl --silent --head https://github.com/docker/docker/releases/latest
HTTP/1.1 302 Found
Server: GitHub.com
Content-Type: text/html; charset=utf-8
Status: 302 Found
Cache-Control: no-cache
Vary: X-PJAX
Location: https://github.com/docker/docker/releases/tag/v1.11.2
Vary: Accept-Encoding

Extract and process the value of the Location field will get us what we are looking for.

Let’s construct a simple command to obtain such an information:

1
2
3
4
5
6
7
8
9
$ curl \
--silent \
--head \
--url https://github.com/docker/docker/releases/latest | \
grep \
--regexp=^Location | \
cut \
--delimiter=/ \
--fields=8

or:

1
2
3
$ curl -sI https://github.com/docker/docker/releases/latest | \
grep ^Location | \
cut -d / -f 8

Both commands will return v1.11.2.

By using GitHub, not only we can get the latest stable release version of Docker, we can also obtain other projects. In fact, if the project was hosted in GitHub, and it was tagged properly with the releases, you can use this method to obtain the version. However, if it’s not properly tagged, such as Node.js, you need to find another way.

Node.js Installation Methods July 2016 Edition

Follow up with the previous blog on various methods to install Node.js, here is the updated one with Node.js v4.x and v6.x versions.

Install Node.js v4.x via NodeSource[^1] setup script:

1
2
$ curl -sL https://deb.nodesource.com/setup_4.x | sudo -E bash -
$ sudo apt-get install -y nodejs

sudo -E option indicates to the security policy that the user wishes to preserve their existing environment variables[^2].

With Docker and Dockerfile, sudo should be removed, because root:

1
2
RUN curl -sL https://deb.nodesource.com/setup_4.x | bash - && \
apt-get install -y nodejs

Install Node.js v6.x via the same method:

1
2
$ curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
$ sudo apt-get install -y nodejs

The above installation method has been tested on:

  • Debian 8.5
  • Node.js v4.4.7 and v6.2.2

In summary, the methods to install Node.js covered are:

  • Package manager
  • Nodesource script
  • From the source

[^1]: NodeSource provides binary distribution setup and support scripts.
[^2]: See man sudo.