Update GitHub Pages Deprecated IP Addresses with Zone Apex Domain

Recently, I received a warning email from GitHub:

GitHub Pages recently underwent some improvements (https://github.com/blog/1715-faster-more-awesome-github-pages) to make your site faster and more awesome, but we’ve noticed that realguess.net isn’t properly configured to take advantage of these new features. While your site will continue to work just fine, updating your domain’s configuration offers some additional speed and performance benefits. Instructions on updating your site’s IP address can be found at https://help.github.com/articles/setting-up-a-custom-domain-with-github-pages#step-2-configure-dns-records, and of course, you can always get in touch with a human at [email protected]. For the more technical minded folks who want to skip the help docs: your site’s DNS records are pointed to a deprecated IP address.

What are the improvements to GitHub Pages? Here are the two major ones [1]:

Pages are served via CDN (Content Delivery Network)
DoS (Denial of Service) protection

However, my site wasn’t properly configured to take advantage of the speed and performance improvements, because its “DNS records are pointed to a deprecated IP address”.

Which IP addresses did GitHub Pages use before that have now been deprecated?

The domain realguess.net is a custom zone apex domain (also called a bare, naked, or root domain). The domain blog.realguess.net, by contrast, is not a zone apex domain but a subdomain. “A custom subdomain will not be affected by changes in the underlying IP addresses of GitHub’s servers.” [2] But I am not using a subdomain: I configured the zone apex domain to point to what are now deprecated IP addresses. “If you are using an A record that points to 207.97.227.245 or 204.232.175.78, you will need to update your DNS settings, as we no longer serve Pages directly from those servers.” [3] So these IP addresses are deprecated, and I need to update the current DNS record from:


$ dig realguess.net +nostats +nocomments +nocmd
;realguess.net.                 IN      A
realguess.net.          86400   IN      A       204.232.175.78

to the new ones that username.github.io points to:

$ dig realguess.github.io +nostats +nocomments +nocmd
;realguess.github.io.           IN      A
realguess.github.io.    3600    IN      CNAME   github.map.fastly.net.
github.map.fastly.net.  28      IN      A       199.27.73.133

And the new IP addresses for the A records are:

192.30.252.153
192.30.252.154
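To check whether a domain still points at one of the deprecated addresses without reading dig output by hand, here is a minimal Node.js sketch. The two deprecated addresses are the ones quoted from GitHub above; `findDeprecated` and `checkDomain` are hypothetical helper names, and `checkDomain` needs network access because it resolves the domain's A records with Node's `dns.promises.resolve4`.

```javascript
// Sketch: does a zone apex domain still use a deprecated GitHub Pages IP?
const dns = require('dns').promises;

// Deprecated addresses, as quoted from the GitHub help page above.
const DEPRECATED_PAGES_IPS = ['207.97.227.245', '204.232.175.78'];

// Pure helper: return the subset of `addresses` that are deprecated.
function findDeprecated(addresses) {
  return addresses.filter((ip) => DEPRECATED_PAGES_IPS.includes(ip));
}

// Resolve the domain's A records (requires network access) and report.
async function checkDomain(domain) {
  const addresses = await dns.resolve4(domain);
  return { domain, addresses, deprecated: findDeprecated(addresses) };
}
```

For example, `checkDomain('realguess.net').then(console.log)` would have reported `204.232.175.78` in `deprecated` before the DNS update, and an empty list afterwards.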

Using a subdomain would be a better solution, since I would not need to care about GitHub Pages IP address changes in the future. But I don’t expect these addresses to change very often, so I will stick with my zone apex domain.

Apr 27, 2014

Understand Google Trends Search Term and Topic

“Google Trends is a public web facility of Google Inc., based on Google Search, that shows how often a particular search-term is entered relative to the total search-volume across various regions of the world, and in various languages.” - Google Trends - Wikipedia

How you type your search term in Google Trends determines the results you will see. As described on the Google Trends help page, there are four different formats:

A B
“A B”
A + B
A - B

Here are a few examples that illustrate the differences, all using the following fixed parameters:

Location: United States
Time range: 2013
Categories: All categories
Type: Web Search

A B

Search term format A B is the most common format:

Must contain both words
Words can be in any order
Other words can be included, such as A B C
“No misspellings, spelling variations, synonyms, plural or singular versions” - Google Trends Help

Let’s start by comparing three search terms ‘tennis shoes’, ‘tennis’ and ‘shoes’:

Search term  | Avg
------------------
tennis shoes | 1
tennis       | 20
shoes        | 89

This illustrates the first point above: both words must be in the search query. Otherwise, ‘tennis shoes’ would not be so much lower than either word alone.

The order of the words does not matter. This can be checked by simply comparing ‘tennis shoes’ and ‘shoes tennis’:

Search term  | Avg
------------------
tennis shoes | 73
shoes tennis | 73

Another example is ‘Barack Obama’ vs. ‘Obama Barack’:

Search term  | Avg
------------------
Barack Obama | 26
Obama Barack | 26

The third point can be demonstrated in conjunction with the format "A B" (shown later).

The fourth point is actually the most important, because variation matters. For example, comparing ‘tennis shoes’, ‘tennis shoe’, and ‘tenis shoes’:

Search term  | Avg
------------------
tennis shoes | 74
tennis shoe  | 11
tenis shoes  |  2

Or, ‘ugg’ and ‘uggs’:

Search term  | Avg
------------------
ugg          | 24
uggs         | 25

Here, people who search for either ‘ugg’ or ‘uggs’ on Google most likely mean the same thing. So the actual trend should include both search terms.

“A B”

The format "A B" means exact search:

Order is important
No additional words included

Here is a comparison of three search terms, ‘"tennis shoes"’, ‘"shoes tennis"’ and ‘tennis shoes’:

Search term    | Avg
--------------------
"tennis shoes" | 69
"shoes tennis" | NA
tennis shoes   | 74

Some people do search for ‘"shoes tennis"’ exactly, but their volume is too small compared to the other two.

This example also shows that, because the A B format allows additional words (such as A B C), its volume is higher than that of the "A B" format.

A + B

Search term format A + B is an OR operation. For example, compare ‘tennis’, ‘shoes’, ‘tennis + shoes’, and ‘shoes + tennis’:

Search term    | Avg
--------------------
tennis         | 13
shoes          | 58
tennis + shoes | 71
shoes + tennis | 71

The sum of tennis and shoes (13 + 58 = 71) equals both tennis + shoes and shoes + tennis.

A - B

A - B search term format excludes searches containing B. The search words can be A C, but not A B. For example, ‘tennis’ and ‘tennis - shoes’:

Search term    | Avg
--------------------
tennis         | 35
tennis - shoes | 32

The search term tennis can include shoes, so its volume is higher than that of tennis - shoes, which excludes shoes.
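To make the four formats concrete, here is a toy model in Node.js that decides whether a single raw query would be counted for a given search term, following the rules as listed in this post. It is not Google's implementation (the real matching and the normalization to a 0 to 100 scale are not public); `matchesTerm`, `volume` and the sample queries are hypothetical and made up for illustration.

```javascript
// Toy model of the four search-term formats, following the rules listed above.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

function matchesTerm(term, query) {
  const t = term.trim();
  const words = tokenize(query);

  // A + B: OR, counted if either side matches.
  if (t.includes(' + ')) {
    return t.split(' + ').some((part) => matchesTerm(part, query));
  }
  // A - B: must match A and must not contain any word of B.
  if (t.includes(' - ')) {
    const [include, ...excludes] = t.split(' - ');
    const banned = excludes.flatMap(tokenize);
    return matchesTerm(include, query) && !banned.some((w) => words.includes(w));
  }
  // "A B": exact search, same words in the same order and nothing else.
  if (t.length > 1 && t.startsWith('"') && t.endsWith('"')) {
    return words.join(' ') === tokenize(t.slice(1, -1)).join(' ');
  }
  // A B: all words present, any order, extra words allowed. Tokens are
  // compared exactly, so misspellings and plural/singular forms do not match.
  return tokenize(t).every((w) => words.includes(w));
}

// Number of raw queries that a search term would count.
function volume(term, queries) {
  return queries.filter((q) => matchesTerm(term, q)).length;
}

// Made-up raw queries, for illustration only.
const queries = ['tennis shoes', 'shoes tennis', 'red tennis shoes', 'tennis',
  'shoes', 'tenis shoes', 'tennis shoe', 'tennis racket'];

for (const term of ['tennis shoes', '"tennis shoes"', 'tennis', 'shoes',
  'tennis + shoes', 'tennis - shoes']) {
  console.log(term.padEnd(16), volume(term, queries));
}
```

With these made-up queries the counts are 3, 1, 6, 5, 8 and 3. Note that a strict OR counts the three queries containing both words only once, so tennis + shoes (8) is less than tennis plus shoes (6 + 5 = 11); the Trends table above adds up exactly, and this toy model does not settle whether Trends sums volumes or the overlap there is simply small.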

Special Characters

“Currently, search terms that contain special characters, such as an apostrophe, single quotes, and parentheses, will be ignored. For example, if you type women's tennis world ranking, you get results for womens tennis world ranking.” - Google Trends Help

Topics

Without accounting for variations such as misspellings, synonyms or special characters, we cannot truly capture the user’s search intention. As explained in the blog post An easier way to explore topics and entities in Google Trends, when we search for rice, are we looking for Rice University or the rice we eat? And how do we count all the variations when looking up Gwyneth Paltrow, such as Gwen Paltro or Lead actress in Iron Man?

That’s why Google Trends introduced topics, which use the semantics of a search: not only its variations, but its actual meaning. Google Trends topics are still in beta: “Measuring search interest in topics is a beta feature which quickly provides accurate measurements of overall search interest. To measure search interest for a specific query, select the ‘search term’ option.” And “when you measure interest in a search topic (Tokyo - Capital of Japan) our algorithms (Google Trends) count many different search queries that may relate to the same topic (東京, Токио, Tokyyo, Tokkyo, Japan Capital, etc). When you measure interest in a search query (Tokyo - Search term), our systems will count only searches including that string of text (“Tokyo”).”

Here is an example for the retail company Nordstrom, comparing the search term and the topic (the dotted line is the topic and the solid line is the search term):

Here is another example of Laura Mercier Cosmetics:

The topic algorithm is likely to count both ‘Laura Mercier Cosmetics’ and ‘Laura Mercier’:

Another good example is ‘Apple’ (fruit), ‘Apple Inc.’ (consumer electronics company) and ‘apple’ (search term):

Indeed, a search for apple means more than just the maker of the iPhone and iPad:

Term                   | Avg
-----------------------------
Apple (fruit)          | 13
Apple Inc. (company)   | 47
apple (search term)    | 55

We can build our own topic by using the A + B format to include all variations. For example, the topic Flip-flops (Garment) closely resembles the sum of flip flop and flip flops (see the OR operator):

For some search terms, if we do not use a topic, or we omit variations, we might get the wrong impression of the search trend. For example, comparing flip flops and Bahamas: at the peak, the number of people searching for flip flops is almost the same as the number searching for Bahamas:

Conclusion

Using search terms in Google Trends has its limitations, because they do not capture all variations such as misspellings, spelling variations, synonyms, plural or singular versions, or special characters. More importantly, the semantics of the search term are not captured. That’s why Google introduced topics to improve the results.

Google Trends has a limited number of topics. For search terms without a corresponding topic, we need a taxonomy system that includes all the variations in order to capture the true semantics of the search.

Apr 26, 2014

Install SSH Public Key to All AWS EC2 Instances

I’ve got a new laptop, and I need to install its SSH public key on all my AWS EC2 instances to enable password-less login. I can use ssh-copy-id to install the public key one instance at a time, but I can also do it all at once:

$ aws ec2 describe-instances --output text \
  --query 'Reservations[*].Instances[*].{IP:PublicIpAddress}' \
  | while read host; do \
  ssh-copy-id -i /path/to/key.pub $USER@$host; done

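When querying PublicIpAddress directly, some IP addresses in the text output were cluttered onto a single line (presumably the instances of one reservation, printed tab-separated). So I use {IP:PublicIpAddress} instead, which prints one address per line.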
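To see what the --query expression selects, here is a plain JavaScript sketch of the same projection, applied to a trimmed-down, made-up describe-instances response that contains only the fields used here. `projectIps`, `ipList` and `sample` are hypothetical; in practice the filtering is done by the CLI's JMESPath engine, not by this code.

```javascript
// Plain-JavaScript equivalent of the JMESPath expression
//   Reservations[*].Instances[*].{IP:PublicIpAddress}
function projectIps(response) {
  // Reservations[*] and Instances[*] walk every instance of every reservation;
  // {IP:PublicIpAddress} builds a new object with a single IP key.
  // JMESPath yields null for a missing field, mirrored here.
  return response.Reservations.map((reservation) =>
    reservation.Instances.map((instance) => ({ IP: instance.PublicIpAddress || null })));
}

// Flatten the nested list to one address per entry, as the while-read loop
// expects, and drop instances without a public IP (e.g. stopped ones).
function ipList(response) {
  return projectIps(response).flat().map((o) => o.IP).filter((ip) => ip !== null);
}

// Made-up response: one reservation with two instances, one with a stopped instance.
const sample = {
  Reservations: [
    { Instances: [{ PublicIpAddress: '203.0.113.10' }, { PublicIpAddress: '203.0.113.11' }] },
    { Instances: [{ State: { Name: 'stopped' } }] },
  ],
};

console.log(JSON.stringify(projectIps(sample)));
console.log(ipList(sample)); // [ '203.0.113.10', '203.0.113.11' ]
```

The nested result (one inner list per reservation) is why two instances launched together can end up side by side in the text output.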

For a non-standard port:

$ ssh-copy-id -i /path/to/key.pub "$USER@$host -p $port"

The only problem is that, if the key has already been installed, this might add a duplicate key to the ~/.ssh/authorized_keys file on the remote instance. One way to solve this is to test the login from the new machine and list only the IP addresses that the new machine cannot access yet:

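$ aws ec2 describe-instances --output text \
  --query 'Reservations[*].Instances[*].{IP:PublicIpAddress}' \
  | while read host; do ssh -n -q -o BatchMode=yes $USER@$host exit \
  || echo $host; done > instances.txt

Here -n stops ssh from reading the loop's remaining input (otherwise it can swallow the rest of the host list), and -o BatchMode=yes makes the login fail instead of prompting for a password when the key is not installed yet.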

Now, back on the old machine, install the new public key on all of them at once:

$ cat instances.txt | while read host; \
  do ssh-copy-id -i /path/to/key.pub $USER@$host; done
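Another way to avoid duplicates is to make the install step idempotent: only append the key if the same key is not already present. Here is a small Node.js sketch of that check on the contents of an authorized_keys file. `keyId` and `addKeyIfMissing` are hypothetical helpers; they compare the key type and base64 blob, ignore the trailing comment, and assume lines without an options prefix (such as command="...").

```javascript
// Identify a key by "type blob", ignoring the comment (e.g. user@laptop).
function keyId(line) {
  const fields = line.trim().split(/\s+/);
  return fields.length >= 2 ? fields[0] + ' ' + fields[1] : null;
}

// Return authorized_keys contents with publicKey appended, unless already present.
function addKeyIfMissing(authorizedKeys, publicKey) {
  const wanted = keyId(publicKey);
  const present = authorizedKeys
    .split('\n')
    .filter((line) => line.trim() && !line.trim().startsWith('#'))
    .map(keyId);
  if (present.includes(wanted)) return authorizedKeys; // already installed
  const sep = authorizedKeys === '' || authorizedKeys.endsWith('\n') ? '' : '\n';
  return authorizedKeys + sep + publicKey.trim() + '\n';
}

// Made-up keys for illustration.
const existing = 'ssh-rsa AAAA1 old@box\n';
console.log(addKeyIfMissing(existing, 'ssh-rsa AAAA1 another-comment') === existing); // true
console.log(addKeyIfMissing(existing, 'ssh-ed25519 BBBB2 new@laptop'));
```

Running the same install twice leaves the file unchanged the second time, which is the property the host-filtering step above is trying to achieve.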

Apr 23, 2014

Obtain Username from Custom Domain in Tumblr

Tumblr users frequently use custom domains for their blogs. If you need to know the Tumblr username behind a custom domain, just grab the value of the X-Tumblr-User header field:

$ curl -Is blog.foursquare.com | grep -i x-tumblr-user
X-Tumblr-User: foursquare

Another example:

$ curl -Is blog.birchbox.com | grep -i x-tumblr-user
X-Tumblr-User: birchbox

Now you can use the username to make some API calls to Tumblr.
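The same lookup can be done from Node.js before making those API calls. This is a sketch: `tumblrUser` and `lookupTumblrUser` are hypothetical helpers, and whether these blogs still serve the header today (rather than redirecting to HTTPS or moving off Tumblr) is not guaranteed, in which case the lookup resolves to null. Node lower-cases incoming header names, so the header is read as `x-tumblr-user`.

```javascript
const http = require('http');

// Node lower-cases incoming header names, so look up 'x-tumblr-user'.
function tumblrUser(headers) {
  return headers['x-tumblr-user'] || null;
}

// Send a HEAD request (like `curl -I`) and resolve with the username, or null.
function lookupTumblrUser(host) {
  return new Promise((resolve, reject) => {
    const req = http.request({ host, method: 'HEAD', path: '/' }, (res) => {
      res.resume(); // discard any body so the socket is released
      resolve(tumblrUser(res.headers));
    });
    req.on('error', reject);
    req.end();
  });
}
```

Usage: `lookupTumblrUser('blog.foursquare.com').then(console.log)`.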

Apr 11, 2014

Nota Bene

Every day there are new things to learn, even just from reading the comments in /etc/apt/sources.list:

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.

“N.B.” is an abbreviation for nota bene, a Latin expression meaning “note well”. N.B. every detail matters.

Apr 2, 2014

Difference between require and fs in Loading Files

Node has a simple module loading system based on require:

A module prefixed with '/' is an absolute path to the file. For example, require('/home/marco/foo.js') will load the file at /home/marco/foo.js.

A module prefixed with './' is relative to the file calling require(). That is, circle.js must be in the same directory as foo.js for require('./circle') to find it.

Without a leading '/' or './' to indicate a file, the module is either a “core module” or is loaded from a node_modules folder.

http://nodejs.org/api/modules.html#modules_file_modules

We can also use require to load JSON files by simply specifying the file path, which can be relative or absolute. Given the following file structure:

.
├── app.js
├── data
│   └── file.json
└── lib
    └── util.js

Let’s say the root path of the application is /home/marco/myapp. Use a relative path from the file lib/util.js:

var json = require('../data/file.json');

This will load the JSON content from data/file.json, because the module system knows that the path is relative to the calling file lib/util.js. But file system functions do not treat relative paths the same way:

Relative path to filename can be used, remember however that this path will be relative to process.cwd(). - File System

process.cwd() returns the current working directory, not the root directory of the application nor the directory of the calling file. For example:

var data = fs.readFileSync('../data/file.json');

If your current working directory is /home/marco/myapp, the application root directory, you will get the following error message:

fs.js:410
  return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
                 ^
Error: ENOENT, no such file or directory '../data/file.json'
    at Object.fs.openSync (fs.js:410:18)
    at Object.fs.readFileSync (fs.js:276:15)

Because process.cwd() is /home/marco/myapp, reading ../data/file.json relative to this directory makes Node look for /home/marco/data/file.json, which does not exist. Therefore, the error is thrown.

The difference between require and fs in loading files is that if the relative path is used in require, it is relative to the file calling the require. If the relative path is used in fs, it is relative to process.cwd().
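The two rules can be seen side by side by resolving the same relative path against the two different bases with `path.resolve`. This sketch uses the hypothetical layout from above and POSIX paths so the output does not depend on the platform; it only computes paths and does not reproduce require's full module resolution (file extensions, node_modules lookup).

```javascript
// Resolve the same relative path against two different bases.
const path = require('path').posix;

// Hypothetical layout from the example above.
const appRoot = '/home/marco/myapp';          // assumed process.cwd()
const callerDir = path.join(appRoot, 'lib');  // __dirname inside lib/util.js
const rel = '../data/file.json';

// require(): relative to the directory of the calling file.
const requireTarget = path.resolve(callerDir, rel);
// fs.readFileSync(): relative to process.cwd().
const fsTarget = path.resolve(appRoot, rel);

console.log('require ->', requireTarget); // /home/marco/myapp/data/file.json
console.log('fs      ->', fsTarget);      // /home/marco/data/file.json
```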

To fix the error in file system functions, we should always use an absolute path. One thing we can do is:

var data = fs.readFileSync(__dirname + '/../data/file.json');

__dirname is “the name of the directory that the currently executing script resides in” [dirname], and it has no trailing slash. The resulting path looks odd, and even though it works, plain string concatenation does not normalize the path and might not be portable to all systems.

The correct solution here is to use path.join from Path:

var data = fs.readFileSync(path.join(__dirname, '../data/file.json'));

The path module “contains utilities for handling and transforming file paths”. The path.join function will call path.normalize to “normalize a string path, taking care of .. and . paths” [normalize], also see function normalizeArray in https://github.com/joyent/node/blob/master/lib/path.js on how the paths are normalized.
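The effect of that normalization is easy to see by comparing string concatenation with `path.join`. The directory below is a hypothetical stand-in for `__dirname` inside lib/util.js, and `path.posix` is used so the output is the same on every platform; in real code plain `path.join` picks the platform's separator.

```javascript
const { join, normalize } = require('path').posix;

// Stand-in for __dirname inside lib/util.js (hypothetical layout).
const dir = '/home/marco/myapp/lib';

// Plain concatenation keeps the '..' segment as-is.
const concatenated = dir + '/../data/file.json';
// path.join inserts separators and normalizes '.' and '..' segments.
const joined = join(dir, '../data/file.json');

console.log(concatenated);                       // /home/marco/myapp/lib/../data/file.json
console.log(joined);                             // /home/marco/myapp/data/file.json
console.log(normalize(concatenated) === joined); // true
```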

In conclusion:

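var fs = require('fs');
var path = require('path');
// Use `require` to load a JSON file (path relative to this file).
var json = require('../data/file.json');
// Use `fs` to load a JSON file (path made absolute with __dirname).
var data = fs.readFileSync(path.join(__dirname, '../data/file.json'));
var jsonFromFs = JSON.parse(data.toString());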

Mar 29, 2014

Passing Environment Variables from Cron to Shell Script

On a Debian system with Vixie cron, environment variables can be defined in the crontab:

MESSAGE=Hello World
* * * * * /bin/sh ~/script.sh >> /tmp/script.log

Now, is the same environment variable MESSAGE passed to the shell script ~/script.sh, which is scheduled to run every minute? Let’s give it a try by adding the following line to script.sh:

echo "ECHO: ${MESSAGE}"

Then follow the log with tail -f /tmp/script.log:

ECHO: Hello World
ECHO: Hello World
ECHO: Hello World

So the shell script picks up the environment variables defined in the crontab: cron places them in the environment of the job it starts, and child processes inherit that environment. This is really convenient.
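Cron itself is hard to demonstrate in a few lines, but the mechanism it relies on, a child process inheriting environment variables from its parent, is not. This Node.js sketch starts /bin/sh with MESSAGE added to its environment, roughly as cron does for a crontab entry, and runs the same echo line as script.sh. `runWithEnv` is a hypothetical helper; cron is not involved.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a shell snippet with extra environment variables,
// similar to how cron starts a job with the variables defined in the crontab.
function runWithEnv(script, extraEnv) {
  const result = spawnSync('/bin/sh', ['-c', script], {
    env: { ...process.env, ...extraEnv }, // the child inherits this environment
    encoding: 'utf8',
  });
  if (result.error) throw result.error;
  return result.stdout;
}

// Same line as in script.sh; MESSAGE comes only from the env passed above.
process.stdout.write(runWithEnv('echo "ECHO: ${MESSAGE}"', { MESSAGE: 'Hello World' }));
```

This prints `ECHO: Hello World`, matching the log output above.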

Mar 29, 2014

Amazon AWS Command Line Interface (CLI)

This is a brief guide to installing the AWS Command Line Interface (CLI) on Ubuntu Linux.

The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts. [2]

The key word here is unified: one tool to run all Amazon AWS services.

Install

The installation procedure applies to Ubuntu Linux with Zsh and Bash.

Install pip, a Python package manager:

$ sudo apt-get install python-pip

Install awscli:

$ sudo pip install awscli

Install autocompletion:

$ which aws_zsh_completer.sh
/usr/local/bin/aws_zsh_completer.sh
$ source aws_zsh_completer.sh

Add this line to ~/.zshrc as well.

Or for Bash ~/.bashrc:

$ printf '\n# Enable AWS CLI autocompletion\n' >> ~/.bashrc
$ echo 'complete -C aws_completer aws' >> ~/.bashrc
$ source ~/.bashrc

Test installation:

$ which aws
/usr/local/bin/aws
$ aws help

Test autocompletion:

$ aws <tab><tab>

You should see a list of all available AWS commands.

Usage

Before using aws-cli, you need to tell it about your AWS credentials. There are three ways to specify AWS credentials:

Environment variables
Config file
IAM Role

Using a config file is preferred. It is a simple INI-format file stored at ~/.aws/config. You can symlink an existing file to that location, or just tell awscli where to find it:

$ export AWS_CONFIG_FILE=/path/to/config_file

On an EC2 instance, it is even better to use an IAM role:

The final option for credentials is highly recommended if you are using aws-cli on an EC2 instance. IAM Roles are a great way to have credentials installed automatically on your instance. If you are using IAM Roles, aws-cli will find them and use them automatically. [4]

The default output is in JSON format. Other formats are tab-delimited text and ASCII-formatted table. For example, using --query filter and table output:

$ aws ec2 describe-instances --query 'Reservations[*].Instances[*].{ \
  ID:InstanceId, TYPE:InstanceType, ZONE:Placement.AvailabilityZone, \
  SECURITY:SecurityGroups[0].GroupId, KEY:KeyName, VPC:VpcId, \
  STATE:State.Name}' --output table

This will print a nice looking table of all EC2 instances.

The command line options also accept JSON. When passing in large blocks of data, however, referring to a JSON file is much easier. Both a local file (with the file:// prefix) and a remote URL can be used.

Upgrade

Check the installed and the latest versions:

$ pip search awscli

Upgrade AWS CLI to the latest version:

$ sudo pip install --upgrade awscli


Mar 22, 2014

Writing Custom Yeoman Generator

To write a custom Yeoman generator, you need to install generator-generator first.

(generator-generator) Scaffolds out a new basic Yeoman generator with some sensible defaults.

So it creates a basic structure for writing your own custom generators. In other words, it is a generator for writing generators.

First, install generator-generator:

$ sudo npm install -g generator-generator

Then generate some basic files for writing your own generator by issuing:

$ yo generator

You can use generator-generator to write both a custom generator and a custom sub-generator. What is a sub-generator? For example, the directive and service generators of the Angular generator are sub-generators.

Follow the naming convention generator-[name].

Create a directory for your new generator:

$ mkdir -p ~/generators/generator-hello && cd $_

Make sure to enter the newly created directory (cd $_); otherwise, all generated files will be placed in the current directory. Then scaffold your generator:

$ yo generator

As of the versions used at the time of writing, the generator creates the following directories and files:

.
├── app
│   ├── index.js
│   └── templates
├── .editorconfig
├── .gitattributes
├── .gitignore
├── .jshintrc
├── node_modules
│   ├── .bin
│   ├── chalk
│   ├── mocha
│   └── yeoman-generator
├── package.json
├── README.md
├── test
│   ├── test-creation.js
│   └── test-load.js
└── .travis.yml
8 directories, 10 files

You might want to make your initial commit:

$ git init && git add '*' && git commit -m 'Initial commit'

While in development, make it accessible globally:

$ sudo npm link

This is similar to:

$ sudo npm install -g [generator-name]

Either way, the generator will be discoverable from /usr/local/lib/node_modules.

There isn’t much going on there yet, so we need to customize our own generator. The main file is ./app/index.js:


var yeoman = require('yeoman-generator');
var HelloGenerator = yeoman.generators.Base.extend({
  // init, askFor, app and projectfiles are defined here
});
module.exports = HelloGenerator;

The HelloGenerator extends Base generator. There are four properties (functions) that it has extended: init, askFor, app, and projectfiles.

All these functions will be invoked by Yeoman one by one in order (the order is important), meaning that init will be called first to perform initialization, and then askFor will be called to get user prompts, then app and projectfiles will be called afterward to build the application.

However, having four properties is not required; it just brings more structure and modularity. In fact, you can merge all four functions into a single one.

The method name does not matter. Just don’t name it prompt because you will override the existing property inherited from yeoman.generators.Base.

If we need to declare a “private” property, just prefix its name with an underscore to make it a hidden property, e.g. _dontAskFor: function(){}, so that Yeoman does not run it as a step; or, if you only use it in the same file, you can simply define a module-level function myFunc().

Personally, I don’t think it is necessary to break the generator into many functions; three are enough: initialization, prompting the user for input, and generating files.

When choosing property names, note that the Base object contains many properties that should not be overridden, such as ["append", "help", "log", "prompt", "remote"]. This limits the choice of property names. For example, I was thinking about using prompt instead of askFor, but prompt is an inherited property. Therefore, it is a good idea to keep fewer properties in HelloGenerator; three are enough. All these properties are accessible from the context this, through which you can reach everything from the Base object to the newly defined _dontAskFor hidden property.
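To make the run order and the underscore convention concrete, here is a toy model in plain JavaScript. It is not Yeoman's implementation (the real run loop also supports asynchronous steps via this.async()); `runQueue` and the `hello` object are hypothetical. It calls each method defined on the generator's prototype in definition order, skips names starting with an underscore, and those "private" methods stay callable through this. The step list is stored in `steps` rather than `log`, since log is one of the inherited Base properties mentioned above.

```javascript
// Toy run loop: call prototype methods in definition order, skipping "_" names.
function runQueue(generator) {
  const proto = Object.getPrototypeOf(generator);
  const called = [];
  for (const name of Object.keys(proto)) {
    if (name.startsWith('_') || typeof proto[name] !== 'function') continue;
    proto[name].call(generator);
    called.push(name);
  }
  return called;
}

// Hypothetical generator shaped like HelloGenerator, built without Yeoman.
const hello = Object.create({
  init() { this.steps = ['init']; },
  askFor() { this.name = this._defaultName(); this.steps.push('askFor'); },
  app() { this.steps.push('app: ' + this.name); },
  projectfiles() { this.steps.push('projectfiles'); },
  _defaultName() { return 'hello'; }, // "private": not run as a step
});

console.log(runQueue(hello)); // [ 'init', 'askFor', 'app', 'projectfiles' ]
console.log(hello.steps);     // [ 'init', 'askFor', 'app: hello', 'projectfiles' ]
```

Because init runs first, later steps can rely on state it sets up, which is why the order of the properties matters.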

Time to test the custom generator:

$ mkdir -p /tmp/hello && cd $_ && yo hello

Mar 14, 2014

Traveling from the Big Easy to Hollywood via Amtrak Sunset Limited

If 30 hours on the train was long, how about 50 more? This is the second part of the train journey from New York to Los Angeles. Traveling from New York to New Orleans took about 30 hours, with one night on the train; this trip, from New Orleans to Los Angeles, is about 50 hours, with two nights spent on the train.

The Route

The train travels from New Orleans, Louisiana to Los Angeles, California, passing through notable cities such as Houston, San Antonio, El Paso, and Tucson. Fewer states are involved compared to the first leg of the trip: only Louisiana, Texas, New Mexico, Arizona and California.

If you look at the Amtrak system map, New Orleans is a hub for three long-distance Amtrak routes: the Sunset Limited, the Crescent, and the City of New Orleans (between Chicago and New Orleans). The next hub is San Antonio. The Sunset Limited used to extend to Jacksonville, Florida, but the service was suspended after Hurricane Katrina.

The train runs close to the U.S.-Mexico border, sometimes very close, as in El Paso, where I was able to see the Mexican side from the train.

The Train

The train on this route is a Superliner, a double-decker. We stayed on the upper level, on the left side, where we could see the border between the United States and Mexico.

The Superliner has a sightseeing lounge car, with windows spanning both decks on each side of the car, and you can turn the chairs to face the windows. This makes for a really comfortable view of the scenery along the route.

The Room

The Superliner roomette we booked still has two beds: one is made by pulling the seats together, and the other pulls down from the ceiling. But unlike the roomette on the last leg, it has no sink or toilet, so we had to use the shared restroom at the end of the car. There is also a shower room. Fancy taking a shower on a moving train?

Our Superliner Roomette is ideal for one or two passengers, with two comfortable reclining seats on either side of a big picture window. At night, the seats convert to a comfortable bed, and an upper berth folds down from above. Roomettes are located on both upper and lower levels of our double-decker Superliner train cars.

No in-cabin toilet or shower; restrooms, showers nearby in same train car

The View

Leaving New Orleans, the train crossed the Mississippi River on the Huey P. Long Bridge, leaving the Superdome and the Big Easy behind us and heading into the bayous of Louisiana.

Riding across the Huey P. Long Bridge leaving New Orleans

Bayous of Louisiana:


The first stop in Texas is Beaumont, TX.

In the South, trucks dominate:

The Conclusion

The biggest problem was Internet access.

☑ Travel from the East Coast to the West Coast of the United States by train

Where will the journey take me next?

realguess

Don't be afraid to try!
