
Avoiding Dependency Reinstalls When Building a Node.js Docker Image

When I first started writing a Dockerfile to build a Node.js app, I would do:

WORKDIR /app
COPY . /app
RUN npm install

Sure, this is very simple, but every time a source file changes, the entire dependency tree gets re-installed. The only time the dependencies actually need to be rebuilt is when package.json changes.

One trick is to put package.json elsewhere, install the dependencies there, then move them back:

WORKDIR /app
ADD package.json /tmp
RUN cd /tmp && npm install
COPY . /app
RUN mv /tmp/node_modules /app/node_modules

However, if there are many dependencies with lots of files, the last step will take a long time.

How about using a symbolic link to shorten the time?

WORKDIR /app
ADD package.json /tmp
RUN cd /tmp && npm install
COPY . /app
RUN ln -s /tmp/node_modules /app/node_modules

Well, it works, but not every npm package is happy about it. Some will complain about the symlink, resulting in errors.

So, how can we reuse the cache as much as possible, and:

  1. Not reinstall dependencies on every source change
  2. Not use symbolic links

Here is a solution: just move the COPY statement down:

# Set the working directory, which creates the directory.
WORKDIR /app
# Install dependencies.
ADD package.json /tmp
RUN cd /tmp && npm install
# Move the dependency directory back to the app.
RUN mv /tmp/node_modules /app
# Copy the content into the app directory. The previously added "node_modules"
# directory will not be overridden.
COPY . /app

With this configuration, if package.json stays the same, only the last cache layer will be rebuilt when the source code changes.
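
For reference, a complete Dockerfile following this pattern could look roughly like the one below. This is only a sketch: the node:6 base image and the npm start command are assumptions, not part of the original setup.

FROM node:6
# Set the working directory, which creates the directory.
WORKDIR /app
# Install dependencies in a separate layer, so Docker can cache it.
ADD package.json /tmp
RUN cd /tmp && npm install
# Move the dependency directory back into the app.
RUN mv /tmp/node_modules /app
# Copy the source last, so only this layer is rebuilt on source changes.
COPY . /app
# Assumed default command for the app.
CMD ["npm", "start"]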

Creating a Data Volume Container in Dockerfile

Creating a Docker data volume container in a Dockerfile is unbelievably simple: just use the VOLUME instruction:

FROM debian:8.5
VOLUME ["/data"]

The instruction creates a mount point and marks it as holding externally mounted volumes from the native host or other containers.

Build the data container:

$ docker build -t data .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM debian:8.5
---> 1b088884749b
Step 2 : VOLUME /data
---> Running in 5511f34a489c
---> 7b723b2b3d13
Removing intermediate container 5511f34a489c
Successfully built 7b723b2b3d13

The size of the built image is just about 125.1 MB.

$ docker create --name data data

The first data is the name of the container; the second data is the name of the Docker image.
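
To double check that the volume was actually created and mounted, docker inspect with a Go template can be used (a quick verification step, not from the original post; the format flag works with Docker 1.11):

$ docker inspect -f '{{ .Mounts }}' data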

To attach the data volume container to another container, we use the --volumes-from option:

$ docker run -it --rm --name foo --volumes-from=data debian:8.5 /bin/bash
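
Anything written under /data in one container is then visible to every other container mounting the same volume. A minimal sketch, assuming the data container created above:

$ docker run --rm --volumes-from=data debian:8.5 bash -c 'echo hello > /data/greeting'
$ docker run --rm --volumes-from=data debian:8.5 cat /data/greeting
hello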

If there is initial data to copy, add the COPY instruction:

FROM debian:8.5
VOLUME ["/data"]
COPY . /data

Settings:

$ docker --version
Docker version 1.11.2, build b9f10c9

Node.js Installation Methods July 2016 Edition

Following up on the previous blog post about the various methods to install Node.js, here is the updated edition with Node.js v4.x and v6.x.

Install Node.js v4.x via NodeSource[^1] setup script:

$ curl -sL https://deb.nodesource.com/setup_4.x | sudo -E bash -
$ sudo apt-get install -y nodejs

The sudo -E option indicates to the security policy that the user wishes to preserve their existing environment variables[^2].
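
A quick way to see what -E does (an illustrative example, not from the original post; assumes the default env_reset policy in sudoers):

$ export GREETING=hello
$ sudo bash -c 'echo $GREETING'      # empty, environment was reset
$ sudo -E bash -c 'echo $GREETING'   # prints "hello", environment preserved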

With Docker and a Dockerfile, sudo should be removed, because the build already runs as root:

RUN curl -sL https://deb.nodesource.com/setup_4.x | bash - && \
    apt-get install -y nodejs

Install Node.js v6.x via the same method:

$ curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
$ sudo apt-get install -y nodejs

The above installation method has been tested on:

  • Debian 8.5
  • Node.js v4.4.7 and v6.2.2

In summary, the methods to install Node.js covered are:

  • Package manager
  • NodeSource setup script
  • From source (see the sketch below)
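
For completeness, a from-source install roughly follows the usual configure-and-make steps. A sketch only, using the v6.2.2 release tarball; adjust the version as needed:

$ curl -O https://nodejs.org/dist/v6.2.2/node-v6.2.2.tar.gz
$ tar xzf node-v6.2.2.tar.gz
$ cd node-v6.2.2
$ ./configure
$ make
$ sudo make install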

[^1]: NodeSource provides binary distribution setup and support scripts.
[^2]: See man sudo.

Use a Single Dockerfile

I was planning to build two different Docker images: one for production and one for test. However, the more I coded, the more frustrated I became. I want something simple, and more images means more complexity:

  1. Having multiple images means more configuration files to manage and update.
  2. More work when integrating with third-party tools; for example, Google App Engine Managed VMs deployment only recognizes the application root-level Dockerfile out of the box.
  3. Steeper learning curve for new developers.

Keeping our application structure simple is important. If you need multiple Dockerfiles, your application is probably too complex. The difference between npm install --production and npm install isn't much, but a single image saves you the time and effort of managing multiple Dockerfiles. There is no reason to have more than one Dockerfile; just use a different command, such as npm test, when running tests.
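
As a concrete sketch, a single image built from a Dockerfile like the one in the first section can cover both cases; only the command changes (the my-app tag is a placeholder):

$ docker build -t my-app .
# Production: run the image's default CMD.
$ docker run -d my-app
# Test: same image, different command.
$ docker run --rm my-app npm test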