
Avoiding Dependency Reinstalls When Building a NodeJS Docker Image

When I first started writing Dockerfiles for NodeJS apps, I would do this:

WORKDIR /app
COPY . /app
RUN npm install

Sure, this is very simple, but every time a source file changes, the entire dependency tree is reinstalled, because the changed COPY layer invalidates the cache of every layer after it. The dependencies only need to be reinstalled when package.json changes.

One trick is to copy package.json elsewhere, install the dependencies there, and then move them back:

WORKDIR /app
ADD package.json /tmp
RUN cd /tmp && npm install
COPY . /app
RUN mv /tmp/node_modules /app/node_modules

However, if there are many dependencies with lots of files, the last step takes a long time.

How about using a symbolic link to shorten the time?

WORKDIR /app
ADD package.json /tmp
RUN cd /tmp && npm install
COPY . /app
RUN ln -s /tmp/node_modules /app/node_modules

Well, it works, but not every NPM package is happy about it. Some complain about the symlink, which results in errors.

So, how can we reuse the cache as much as possible, while:

  1. Not reinstalling dependencies
  2. Not using a symbolic link

Here is a solution: move the dependencies into place first, and move the COPY statement down to the end:

# Set the working directory, which creates the directory.
WORKDIR /app
# Install dependencies.
ADD package.json /tmp
RUN cd /tmp && npm install
# Move the dependency directory back to the app.
RUN mv /tmp/node_modules /app
# Copy the content into the app directory. The previously added "node_modules"
# directory is kept, as long as the build context has no node_modules of its
# own (exclude it with .dockerignore).
COPY . /app

With this configuration, as long as package.json stays the same, only the last layer is rebuilt when the source code changes.
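
To see why the order of the statements matters, here is a toy model of layer caching. It is a simplified sketch, not Docker's actual implementation: it assumes that each layer's cache key depends on the parent layer's key, the instruction text, and the content of any copied files. The function names and the two-file build context are made up for illustration.

```javascript
const crypto = require('crypto');

// Hash helper used to derive cache keys.
function sha(text) {
  return crypto.createHash('sha256').update(text).digest('hex');
}

// Toy model: a layer's key depends on its parent's key, the instruction
// text, and (for ADD/COPY) the content of the files it copies.
function layerKeys(steps, files) {
  let parent = '';
  return steps.map((step) => {
    const copied = (step.copies || [])
      .map((name) => name + '=' + files[name])
      .join('\n');
    parent = sha(parent + '\n' + step.instruction + '\n' + copied);
    return parent;
  });
}

// Count how many layers get a new key, and so must be rebuilt.
function rebuiltLayers(steps, before, after) {
  const oldKeys = layerKeys(steps, before);
  const newKeys = layerKeys(steps, after);
  return newKeys.filter((key, i) => key !== oldKeys[i]).length;
}

// Hypothetical build context: one manifest and one source file.
const before = { 'package.json': '{"dependencies":{}}', 'index.js': 'v1' };
const after = { ...before, 'index.js': 'v2' }; // only the source changed

const naive = [
  { instruction: 'WORKDIR /app' },
  { instruction: 'COPY . /app', copies: ['package.json', 'index.js'] },
  { instruction: 'RUN npm install' },
];

const reordered = [
  { instruction: 'WORKDIR /app' },
  { instruction: 'ADD package.json /tmp', copies: ['package.json'] },
  { instruction: 'RUN cd /tmp && npm install' },
  { instruction: 'RUN mv /tmp/node_modules /app' },
  { instruction: 'COPY . /app', copies: ['package.json', 'index.js'] },
];

console.log(rebuiltLayers(naive, before, after));     // 2: COPY and npm install
console.log(rebuiltLayers(reordered, before, after)); // 1: only the final COPY
```

In this model, a source change in the naive ordering invalidates the COPY layer and everything after it, including the install. In the reordered version only the final COPY gets a new key.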

Why Use Exact Versions in NPM Package Dependencies

The versions most frequently used in NPM package dependencies are caret ranges, written ^version. npm install also writes caret ranges by default when it saves a dependency.

Caret range:

Allows changes that do not modify the left-most non-zero digit in the [major, minor, patch] tuple. In other words, this allows patch and minor updates for versions 1.0.0 and above, patch updates for version 0.x >=0.1.0, and no updates for version 0.0.x.
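
The rule quoted above can be sketched in a few lines. This is a minimal illustration, not the semver package's implementation: it only handles plain x.y.z versions (no pre-release tags or partial ranges), and the function names are my own.

```javascript
// Parse "1.2.3" into [1, 2, 3]. Pre-release tags are not handled.
function parse(version) {
  return version.split('.').map(Number);
}

// Compare two [major, minor, patch] tuples: negative, zero, or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Does `version` satisfy the caret range `^base`?
function satisfiesCaret(version, range) {
  const base = parse(range.replace(/^\^/, ''));
  const candidate = parse(version);
  if (compare(candidate, base) < 0) return false; // never below the base

  // Everything up to and including the left-most non-zero digit must match.
  const pivot = base.findIndex((digit) => digit !== 0);
  const fixed = pivot === -1 ? 3 : pivot + 1;
  for (let i = 0; i < fixed; i++) {
    if (candidate[i] !== base[i]) return false;
  }
  return true;
}

console.log(satisfiesCaret('3.5.2', '^3.4.3')); // true: minor update allowed
console.log(satisfiesCaret('4.0.0', '^3.4.3')); // false: major changed
console.log(satisfiesCaret('0.2.9', '^0.2.3')); // true: patch update on 0.x
console.log(satisfiesCaret('0.0.4', '^0.0.3')); // false: no updates on 0.0.x
```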

Most of the time, using caret ranges as dependency versions works perfectly fine. But once in a while, there are bugs. In one project, JS-YAML was declared with the caret range ^3.4.3 and installed:

{
  "dependencies": {
    "js-yaml": "^3.4.3"
  }
}

Time passed. When npm install was run again on a freshly cloned copy of the code base, version 3.5.2 was installed. Since version 3.5.0, JS-YAML throws an error when safe loading a document with duplicate keys. If there are no duplicate keys in the YAML file, this is fine. However, one of our files had them. The correct approach is to actually fix the duplicate keys, but that requires additional back-and-forth work. You’ve heard this before: “Hey, it worked when I installed it.”

The point here is to use exact versions, by dropping the caret character, instead of letting the package manager decide:

{
  "dependencies": {
    "js-yaml": "3.4.3"
  }
}

This avoids the problem described above. We can then update outdated NPM packages manually, when we choose to.

Don’t let the machine decide. You do!
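
To find which dependencies still use a range, a small check along these lines might help. It is only a sketch: the function name and the sample manifest are hypothetical, and it treats anything other than a plain x.y.z string as a loose range.

```javascript
// Matches only a plain exact version such as "3.4.3".
const EXACT = /^\d+\.\d+\.\d+$/;

// Return "name@spec" for every dependency whose version spec is not exact.
function findLooseRanges(manifest) {
  const sections = ['dependencies', 'devDependencies'];
  const loose = [];
  for (const section of sections) {
    const deps = manifest[section] || {};
    for (const [name, spec] of Object.entries(deps)) {
      if (!EXACT.test(spec)) loose.push(name + '@' + spec);
    }
  }
  return loose;
}

// Hypothetical manifest for illustration only.
const manifest = {
  dependencies: { 'js-yaml': '^3.4.3', lodash: '4.17.21' },
  devDependencies: { gulp: '~3.9.0' },
};

console.log(findLooseRanges(manifest)); // [ 'js-yaml@^3.4.3', 'gulp@~3.9.0' ]
```

In a real project the manifest would come from reading package.json. New dependencies can also be saved without the caret from the start by passing --save-exact to npm install.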

Update Outdated NPM Packages

NPM provides a command to check for outdated packages, npm outdated. Its help page describes it:

$ npm help outdated

However, by default, the command checks the entire dependency tree: not only the modules specified in package.json, but also the dependencies of those modules. If we only care about the top-level packages, we can add the --depth option to show just those:

$ npm outdated --depth 0

This is similar to listing installed packages:

$ npm list --depth 0

With this option, the command prints only the top level instead of the nested dependency tree. The option is similar to tree -L 1, but zero-indexed instead of one-indexed.

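Another interesting thing about the outdated command is the color coding in the output: red marks a package that has a newer version matching the semver range in package.json, and yellow marks a package whose newer version falls outside that range.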
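
For scripting, the same report is available as JSON via npm outdated --json. The sketch below assumes that output is an object keyed by package name, with current, wanted and latest fields. The classify and summarize helpers and the sample data are hypothetical and only illustrate the distinction between an update inside the declared range and one outside it.

```javascript
// Classify one entry of an `npm outdated --json` style report.
function classify(entry) {
  if (entry.current !== entry.wanted) return 'update within range';
  if (entry.wanted !== entry.latest) return 'newer version outside range';
  return 'up to date';
}

// Produce one "name: status" line per package in the report.
function summarize(report) {
  return Object.keys(report).map(
    (name) => name + ': ' + classify(report[name])
  );
}

// Hypothetical sample shaped like the JSON report; versions are illustrative.
const sample = {
  'js-yaml': { current: '3.4.3', wanted: '3.5.2', latest: '3.5.2' },
  'example-lib': { current: '1.0.0', wanted: '1.0.0', latest: '2.0.0' },
};

summarize(sample).forEach((line) => console.log(line));
// js-yaml: update within range
// example-lib: newer version outside range
```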

Gulp Task Alias

The Gulp API does not have a method for creating a task alias. But there are currently two ways to do it:

  1. Create a task alias by dependency
  2. Create a task alias by duplication

Dependency

Gulp's dependency system can be used to create an alias of a task:

gulp.task('task', function(){});
gulp.task('alias', ['task']);

Here, the task named alias is an alias of the task named task. When the tasks are listed, the output shows the dependency tree:

$ gulp --tasks
Tasks for ~/gulpfile.js
├── task
└─┬ alias
  └── task

When running the tasks, the dependency is obvious:

$ gulp alias
Starting 'task'...
Finished 'task' after 49 μs
Starting 'alias'...
Finished 'alias' after 6.83 μs

Duplication

Another way to create a task alias is by duplicating the task definition:

var task = function () {};    
gulp.task('task' , task);
gulp.task('alias', task);

But done this way, there is no explicit relationship indicating that one task is an alias of another, as you can see from the task list:

$ gulp --tasks
Tasks for ~/gulpfile.js
├── task
└── alias

To Gulp, these are two different tasks. But when the alias task is run, the console log is cleaner than with the dependency approach:

$ gulp alias
Starting 'alias'...
Finished 'alias' after 45 μs
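
The difference between the two logs can be reproduced with a toy task registry. This is not Gulp and does not use Gulp's API. It is a minimal stand-in, with made-up names, that runs a task's dependencies before the task itself and logs each one, which is enough to show why the dependency-based alias reports two tasks and the duplicated one reports a single task.

```javascript
// Toy stand-in for a task registry; this is NOT Gulp, only a model of it.
function createRegistry() {
  const tasks = {};
  const log = [];

  // Mirrors the call shapes used above: task(name, fn) or task(name, deps).
  function task(name, depsOrFn, maybeFn) {
    const deps = Array.isArray(depsOrFn) ? depsOrFn : [];
    const fn =
      typeof depsOrFn === 'function' ? depsOrFn : maybeFn || function () {};
    tasks[name] = { deps, fn };
  }

  // Run dependencies first, then the task itself, logging each one.
  function run(name) {
    const { deps, fn } = tasks[name];
    deps.forEach((dep) => run(dep));
    log.push("Starting '" + name + "'...");
    fn();
    log.push("Finished '" + name + "'");
  }

  return { task, run, log };
}

// Alias by dependency: running the alias also reports the original task.
const byDependency = createRegistry();
byDependency.task('task', function () {});
byDependency.task('alias', ['task']);
byDependency.run('alias');

// Alias by duplication: both names share one function, so only the
// name that was run is reported.
const byDuplication = createRegistry();
const work = function () {};
byDuplication.task('task', work);
byDuplication.task('alias', work);
byDuplication.run('alias');

console.log(byDependency.log.length);  // 4 log lines
console.log(byDuplication.log.length); // 2 log lines
```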

Gulp is still evolving, so the API for creating a task alias might change. At the moment, I prefer the second option. Alias and dependency are two different concepts. A dependency should remain a dependency. An alias is just another name for running the same task.