Decrypting Password Protected PDF Files

Receiving password protected PDF files? If you get annoyed typing the password every time, decrypt the file and save it into a new one:

$ qpdf --password=secret --decrypt infile.pdf outfile.pdf

QPDF is a command-line tool and library for transforming PDF files, an alternative to pdftk.
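
To confirm the output file is no longer protected, qpdf can also report a file's encryption status. With the version noted below, it should print something like:

$ qpdf --show-encryption outfile.pdf
File is not encrypted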

Notes:

qpdf 7.1.1

Identifying Duplicate Files in the Current Directory

Here are some tools to find duplicate files:

  • duff: Quickly find duplicate files
  • fdupes: Finds duplicate files in a given set of directories

But we can also just cobble one together with a few commonly used CLI tools:

$ find -type f | xargs -n 1 -I {} md5sum '{}' | sort | \
awk '{if(k==$1){printf("%s\n%s\n",v,$2)}else{print("")};k=$1;v=$2}' | uniq

The command finds all files in the current directory, computes the MD5 checksum of each file, and sorts the lines (checksum first, then file name). The awk script remembers the previous checksum (k) and file name (v), and prints both file names whenever two adjacent checksums match. Finally, the result runs through uniq to obtain the final list of duplicates.
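
As an aside, GNU uniq can do the grouping by itself: compare only the first 32 characters (the length of an MD5 checksum) and print all repeated lines in blank-line separated groups. A shorter sketch, assuming GNU coreutils, that keeps the checksums in the output:

$ find -type f | xargs -n 1 -I {} md5sum '{}' | sort | uniq -w32 --all-repeated=separate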

Put it into a shell script (find-duplicates.sh), along with some dummy files, to test the command and the above mentioned tools:

#!/bin/sh
apt-get update && apt-get install -y duff fdupes
cd /tmp && seq 3 | xargs -I {} bash -c 'touch file{}; date > date{}; echo $RANDOM > rand{}'
# Print the commands as they run (matches the trace in the output below).
set -x
find -type f | xargs -n 1 -I {} md5sum '{}' | sort | \
awk '{if(k==$1){printf("%s\n%s\n",v,$2)}else{print("")};k=$1;v=$2}' | uniq
duff -r .
fdupes -r .

Execute the script in a disposable environment:

$ cat find-duplicates.sh | docker run -i --rm debian:9.6

Output:

Ign:1 http://cdn-fastly.deb.debian.org/debian stretch InRelease
Get:2 http://security-cdn.debian.org/debian-security stretch/updates InRelease [94.3 kB]
Get:3 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease [91.0 kB]
Get:4 http://cdn-fastly.deb.debian.org/debian stretch Release [118 kB]
Get:5 http://security-cdn.debian.org/debian-security stretch/updates/main amd64 Packages [467 kB]
Get:6 http://cdn-fastly.deb.debian.org/debian stretch Release.gpg [2434 B]
Get:7 http://cdn-fastly.deb.debian.org/debian stretch-updates/main amd64 Packages [5152 B]
Get:8 http://cdn-fastly.deb.debian.org/debian stretch/main amd64 Packages [7089 kB]
Fetched 7868 kB in 1s (4119 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following NEW packages will be installed:
duff fdupes
0 upgraded, 2 newly installed, 0 to remove and 3 not upgraded.
Need to get 52.4 kB of archives.
After this operation, 146 kB of additional disk space will be used.
Get:1 http://cdn-fastly.deb.debian.org/debian stretch/main amd64 fdupes amd64 1:1.6.1-1+b1 [21.2 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian stretch/main amd64 duff amd64 0.5.2-1.1+b2 [31.2 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 52.4 kB in 0s (110 kB/s)
Selecting previously unselected package fdupes.
(Reading database ... 6498 files and directories currently installed.)
Preparing to unpack .../fdupes_1%3a1.6.1-1+b1_amd64.deb ...
Unpacking fdupes (1:1.6.1-1+b1) ...
Selecting previously unselected package duff.
Preparing to unpack .../duff_0.5.2-1.1+b2_amd64.deb ...
Unpacking duff (0.5.2-1.1+b2) ...
Setting up duff (0.5.2-1.1+b2) ...
Setting up fdupes (1:1.6.1-1+b1) ...
+ xargs -n 1 -I '{}' md5sum '{}'
+ sort
+ find -type f
+ awk '{if(k==$1){printf("%s\n%s\n",v,$2)}else{print("")};k=$1;v=$2}'
+ uniq
./file1
./file2
./file3
./date1
./date2
./date3
+ duff -r .
3 files in cluster 1 (0 bytes, digest da39a3ee5e6b4b0d3255bfef95601890afd80709)
./file3
./file1
./file2
3 files in cluster 2 (29 bytes, digest c20aeceea6a6b4bb1903d9124ea69223da08c69c)
./date2
./date1
./date3
+ fdupes -r .
./date2
./date1
./date3
./file3
./file1
./file2

Looks similar: all three approaches report the same duplicates.

However, those tools provide additional options, such as following symbolic links, and are worth installing.

Using Two Different Phones for 2FA

When setting up two-factor authentication (2FA), there is always an option to print backup codes in case you lose your phone. But if you have a spare phone, you can use it as the backup authenticator device.

During the 2FA setup process, scan the QR code on two different phones. Both phones will show identical codes. This is a property of the time-based OTP (one-time password) algorithm: each code is computed from the shared secret in the QR code, a Unix time to start counting from, and an interval for computing the next set of codes (usually 30 seconds).
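
You can verify this away from the phones, too. For example, with oathtool from the oath-toolkit package (a sketch; SECRET stands for the base32 secret encoded in the QR code):

$ oathtool --totp -b SECRET

Run at the same moment on machines with synchronized clocks, it prints the same six-digit code the phones display.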

In fact, you can use three, four, or as many devices as you want. When done, set a strong password for the phone, turn it off, and store it somewhere safe. It’s better than a piece of paper, because if your safe is broken into, there is still another layer of protection.

BTW, when Google asks you “What kind of phone do you have?”, you can pick Android and still use the authenticator app on an iPhone.

Get codes from the Authenticator app

Therefore: 2FA, two different phones, two locations, no LastPass, no Authy, no cloud-based backup.

sudo -s or sudo -i

sudo allows users to run programs with the security privileges of another user (the superuser or other users). Of the supported options, what’s the difference between -s and -i?

Both options run an interactive shell if no command is specified:

$ sudo -s
$ sudo -i

The difference is that when using -i:

sudo attempts to change to that user’s home directory before running the shell. The command is run with an environment similar to the one a user would receive at log in. - man sudo

Let’s assume the current directory is at /:

$ sudo -s pwd
/
$ sudo -i pwd
/root

Changing the directory is not attempted when using the -s option.

sudo is commonly used to elevate privileges to execute a command as the superuser, and this is usually done in place rather than in the superuser’s home directory, such as:

$ sudo -s chown $USER:$GROUP file

Therefore, when running as the superuser, use -s:

$ sudo -s

When running as another user, use -i:

$ sudo -u foo -i
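
For example, assuming another user foo exists with home directory /home/foo:

$ sudo -u foo -i pwd
/home/foo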

Amazon API Gateway No Integration Defined for Method

Encountered the following problem when deploying with the Serverless Framework (v1.27.3) to Amazon API Gateway:

CloudFormation - CREATE_IN_PROGRESS - AWS::ApiGateway::Deployment - ApiGatewayDeployment
CloudFormation - CREATE_FAILED - AWS::ApiGateway::Deployment - ApiGatewayDeployment
An error occurred: ApiGatewayDeployment - No integration defined for method

The problem was not a bug on the Serverless side; it originated from a manually created resource that had no integration defined for its POST method:

$ aws apigateway get-integration --rest-api-id xxxxxxxxxx --resource-id 0xxxxx --http-method post
An error occurred (NotFoundException) when calling the GetIntegration operation: No integration defined for method

After removing the resource, the deployment worked again.

In conclusion, if a resource has a method with no integration defined, the REST API cannot be deployed. Either removing the resource or creating an integration will resolve the problem.
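
If the resource should stay, one way to satisfy the deployment is to attach a placeholder integration, for instance a mock one (a sketch; the IDs are placeholders):

$ aws apigateway put-integration --rest-api-id xxxxxxxxxx \
--resource-id 0xxxxx --http-method POST --type MOCK \
--request-templates '{"application/json": "{\"statusCode\": 200}"}'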

The following script searches a REST API for resources whose methods have no integration defined:

#!/bin/sh
#
# Search Amazon API Gateway for resources that have no integration defined for
# method.
API_NAME=example.com
REST_API_ID=$(aws apigateway get-rest-apis --query='items[?name==`'$API_NAME'`].id | [0]' --output=text)
RESOURCES=$(aws apigateway get-resources --rest-api-id=$REST_API_ID --query='items[*].id' --output=text)
for resource in $RESOURCES
do
  for method in $(aws apigateway get-resource \
    --query='resourceMethods && keys(resourceMethods)' \
    --output=text \
    --rest-api-id=$REST_API_ID \
    --resource-id=$resource)
  do
    if [ "$method" != "None" ]
    then
      aws apigateway get-integration \
        --rest-api-id=$REST_API_ID \
        --resource-id=$resource \
        --http-method=$method > /dev/null
      if [ $? -ne 0 ]
      then
        aws apigateway get-resource \
          --rest-api-id=$REST_API_ID \
          --resource-id=$resource
      fi
    fi
  done
done


jq: Filter JSON Array by Value

Print the JSON objects in an array whose key equals a specified value with jq:

$ echo '[{"key":"foo"},{"key":"bar"}]' | jq '.[] | select(.key == "foo")'
{
  "key": "foo"
}
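
To avoid hard-coding the value, jq can take it from the shell with the --arg option:

$ echo '[{"key":"foo"},{"key":"bar"}]' | jq --arg k foo '.[] | select(.key == $k)'
{
  "key": "foo"
}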

Reference:

$ jq --version
jq-1.5

Avoiding Dependencies Reinstall When Building NodeJS Docker Image

When I first started writing a Dockerfile for building a NodeJS app, I would do:

WORKDIR /app
COPY . /app
RUN npm install

Sure, this is very simple, but every time a source file changes, the entire dependency tree gets reinstalled. The only time the dependencies need to be rebuilt is when package.json changes.

One trick is to move package.json elsewhere, build the dependencies, then move them back:

WORKDIR /app
ADD package.json /tmp
RUN cd /tmp && npm install
COPY . /app
RUN mv /tmp/node_modules /app/node_modules

However, if there are a lot of dependent packages with lots of files, the last step will take a long time.

How about using a symbolic link to shorten the time?

WORKDIR /app
ADD package.json /tmp
RUN cd /tmp && npm install
COPY . /app
RUN ln -s /tmp/node_modules /app/node_modules

Well, it works, but not every NPM package is happy about it. Some will complain about the symlink, resulting in errors.

So, how can we reuse the build cache as much as possible, and:

  1. Do not need to reinstall dependencies
  2. Do not use symbolic link

Here is a solution: just move the COPY statement down:

# Set the working directory, which creates the directory.
WORKDIR /app
# Install dependencies.
ADD package.json /tmp
RUN cd /tmp && npm install
# Move the dependency directory back to the app.
RUN mv /tmp/node_modules /app
# Copy the content into the app directory. The previously added "node_modules"
# directory will not be overridden.
COPY . /app

With this configuration, if package.json stays the same, only the last cache layer is rebuilt when the source code changes.
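
One caveat: if the build context itself contains a node_modules directory, COPY . /app will copy it over the freshly installed one. A .dockerignore file keeps it out of the context, a minimal sketch:

node_modules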

Wrapping Line by Line JSON to an Array

Some utilities print out JSON data (objects) line by line. Line-by-line JSON is not really a valid JSON format; it needs to be wrapped in brackets as an array.

For example, here are a few lines of JSON objects:

$ for i in $(seq 3); do echo '{"id":'$i'}'; done
{"id":1}
{"id":2}
{"id":3}

To combine them into a single array, we can use jq's -s, --slurp option:

$ for i in $(seq 3); do echo '{"id":'$i'}'; done | jq -cM -s
[{"id":1},{"id":2},{"id":3}]

The option reads the entire input stream into a large array. This is fine until the input stream becomes too large to process, because jq does not produce any output while slurping up the input. If the input is line-by-line JSON, and all you want to do is wrap it into an array (with or even without a trailing newline), you can simply do the following:

$ for i in $(seq 3); do echo '{"id":'$i'}'; done |\
awk 'BEGIN { printf "["; getline; printf "%s", $0 } { printf ",%s", $0 } END { print "]" }'
[{"id":1},{"id":2},{"id":3}]

This prints the trailing newline. And it solves the problem of a very large input: data simply streams out as it comes in. Try it with an infinite while loop.
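
For example, a sketch that generates one record per second forever (fflush() is added so awk does not buffer the output when writing to a pipe):

$ while true; do date +'{"ts":%s}'; sleep 1; done | \
awk 'BEGIN { printf "["; getline; printf "%s", $0; fflush() } { printf ",%s", $0; fflush() } END { print "]" }'

The opening bracket and each record appear immediately; the closing bracket never comes, because the stream never ends.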

There is another option in jq named --stream, and it seems to do the same thing. But --slurp overrides --stream, and the option itself is already quite complicated.

In conclusion, you can use jq -s to wrap line-by-line JSON into an array. If you want to be safe with large inputs, just use the awk example above.

Gitignore: Excluding Everything Except a Specific Directory

I have the following directory structure:

$ tree -aF src/
src/
├── github.com/
│   ├── git/
│   │   └── git/
│   │       └── file
│   └── realguess/
│       ├── git/
│       │   └── file
│       └── hello/
│           └── hello.go
└── .gitignore

6 directories, 4 files

My intention is to keep only the projects in src/github.com/realguess/. So, I have added the following gitignore file:

$ cat src/.gitignore
/github.com/
!/github.com/realguess/

But the directory is ignored:

$ git check-ignore -v github.com/realguess/
src/.gitignore:1:/github.com/ github.com/realguess/

It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined.[^1]

Uh-oh! The directory github.com/ is excluded. Therefore, it is not possible to re-include the file github.com/realguess (a directory is a file). To fix it, just add the wildcard to the end of the parent directory:

$ cat src/.gitignore
/github.com/*
!/github.com/realguess/

By adding the wildcard character, the pattern matches every file (including hidden ones) and directory under the github.com/ directory, but the directory itself, as the parent of github.com/realguess/, is no longer excluded. Then, we use the negation pattern to re-include github.com/realguess/.

Double check it:

$ git check-ignore -v github.com/realguess/; echo $?
1

The debugging command returns a non-zero status code, which indicates that the directory is no longer ignored.
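
Meanwhile, the sibling directories are still matched by the wildcard pattern, so they remain ignored:

$ git check-ignore -v github.com/git/
src/.gitignore:1:/github.com/* github.com/git/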

Registering a Domain with Amazon Route 53 via CLI

Registering a domain with Amazon Route 53 via the CLI is very straightforward; the only complicated part is generating an acceptable JSON structure, then entering the JSON string on the command line without meddling from the shell.

The command is:

$ aws route53domains register-domain

The documentation can be found in the AWS CLI Command Reference or by issuing:

$ aws route53domains register-domain help

Most of the command-line options require simple string values, but some options, like --admin-contact, require structured data, which is complex to enter on the command line. The easiest way is to forget all other options and use --cli-input-json to get everything in JSON.

First let’s generate the skeleton or template to use:

$ aws route53domains register-domain --generate-cli-skeleton

That should output something like:

{
    "DomainName": "",
    "IdnLangCode": "",
    "DurationInYears": 0,
    "AutoRenew": true,
    "AdminContact": {
        "FirstName": "",
        "LastName": "",
        "ContactType": "",
        "OrganizationName": "",
        "AddressLine1": "",
        "AddressLine2": "",
        "City": "",
        "State": "",
        "CountryCode": "",
        "ZipCode": "",
        "PhoneNumber": "",
        "Email": "",
        "Fax": "",
        "ExtraParams": [
            {
                "Name": "",
                "Value": ""
            }
        ]
    },
    "RegistrantContact": {
        "FirstName": "",
        "LastName": "",
        "ContactType": "",
        "OrganizationName": "",
        "AddressLine1": "",
        "AddressLine2": "",
        "City": "",
        "State": "",
        "CountryCode": "",
        "ZipCode": "",
        "PhoneNumber": "",
        "Email": "",
        "Fax": "",
        "ExtraParams": [
            {
                "Name": "",
                "Value": ""
            }
        ]
    },
    "TechContact": {
        "FirstName": "",
        "LastName": "",
        "ContactType": "",
        "OrganizationName": "",
        "AddressLine1": "",
        "AddressLine2": "",
        "City": "",
        "State": "",
        "CountryCode": "",
        "ZipCode": "",
        "PhoneNumber": "",
        "Email": "",
        "Fax": "",
        "ExtraParams": [
            {
                "Name": "",
                "Value": ""
            }
        ]
    },
    "PrivacyProtectAdminContact": true,
    "PrivacyProtectRegistrantContact": true,
    "PrivacyProtectTechContact": true
}

Fill in the fields, leave out the ones not needed:

{
    "DomainName": "example.com",
    "DurationInYears": 1,
    "AutoRenew": true,
    "AdminContact": {
        "FirstName": "Joe",
        "LastName": "Doe",
        "ContactType": "PERSON",
        "AddressLine1": "101 Main St",
        "City": "New York",
        "State": "NY",
        "CountryCode": "US",
        "ZipCode": "10001",
        "PhoneNumber": "+1.8888888888",
        "Email": "[email protected]"
    },
    "RegistrantContact": {
        "FirstName": "Joe",
        "LastName": "Doe",
        "ContactType": "PERSON",
        "AddressLine1": "101 Main St",
        "City": "New York",
        "State": "NY",
        "CountryCode": "US",
        "ZipCode": "10001",
        "PhoneNumber": "+1.8888888888",
        "Email": "[email protected]"
    },
    "TechContact": {
        "FirstName": "Joe",
        "LastName": "Doe",
        "ContactType": "PERSON",
        "AddressLine1": "101 Main St",
        "City": "New York",
        "State": "NY",
        "CountryCode": "US",
        "ZipCode": "10001",
        "PhoneNumber": "+1.8888888888",
        "Email": "[email protected]"
    },
    "PrivacyProtectAdminContact": true,
    "PrivacyProtectRegistrantContact": true,
    "PrivacyProtectTechContact": true
}

Now let’s register, running the file through jq to clean up extra blanks and end-of-line characters, and then piping to the xargs command, using the newline character as the delimiter instead of the default blank:

$ cat input.json | jq -cM . | \
xargs -d '\n' aws route53domains register-domain --cli-input-json
{
    "OperationId": "00000000-0000-0000-0000-00000000000"
}
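
Before registering, it may be worth confirming the domain is actually available; the same CLI has a command for that, which should return something like:

$ aws route53domains check-domain-availability --domain-name example.com
{
    "Availability": "AVAILABLE"
}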