Cloud

Sep 24, 2015

Add a Custom Domain to the Specific Module of Google App Engine Managed VMs

I would like to add a custom domain to a module other than the default one in Google App Engine Managed VMs. For example, I have the following sub-domains:

Custom domain names | SSL support | Record type | Data                 | Alias
------------------- | ----------- | ----------- | -------------------- | -----
api.example.com     | none        | CNAME       | ghs.googlehosted.com | api
admin.example.com   | none        | CNAME       | ghs.googlehosted.com | admin

listed in the application console:

https://console.developers.google.com/project/[project]/appengine/settings/domains

The domain api.example.com points to the default module, and I would like the other domain, admin.example.com, to point to the admin module. However, merely adding the custom domains in the settings points both api.example.com and admin.example.com to the same module: the default module. So how do we route the traffic for the custom domain to the admin module? The answer is in the dispatch file. But first we need to understand the concept of modules in Google App Engine.

Google App Engine Module Hierarchy
Source: Google App Engine Docs

The above chart illustrates the architecture of a Google App Engine application:

Application: “An App Engine application is made up of one or more modules”. [1]
Module: “Each module consists of source code and configuration files. The files used by a module represents a version of the module. When you deploy a module, you always deploy a specific version of the module.” [1] “All module-less apps have been converted to contain a single default module.” [1] Every application has a single default module.
Version: “A particular module/version will have one or more instances.”
Instance: “Each instance runs its own separate executable.”

Another concept is resource sharing:

“App Engine Modules let developers factor large applications into logical components that can share stateful services and communicate in a secure fashion.” [1]
“Stateful services (such as Memcache, Datastore, and Task Queues) are shared by all modules in an application.” [1]

Remember that Google Cloud Platform is project based. If a web application and a mobile application both make requests to another application, the API, then there should be three projects. In this case, however, api and admin share the same services, such as the same database and file storage, so we should put them together in the same application as separate modules.

With that in mind, how do we route requests to a specific module? “Every module, version, and instance has its own unique URI (for example, v1.my-module.my-app.appspot.com). Incoming user requests are routed to an instance of a particular module/version according to URL addressing conventions and an optional customized dispatch file.” [1] Since a custom domain is used, we have to use the customized dispatch file. This is done by creating a dispatch file, dispatch.yml, that routes requests based on URL patterns:

# Dispatch
# ========
---
dispatch:
  - url: '*/favicon.ico'
    module: default
  - url: 'admin.example.com/*'
    module: admin

The application field is not necessary. If you include it, you will get a warning:

WARNING: The [application] field is specified in file [/home/chao/example/dispatch.yml]. This field is not used by gcloud and should be removed.

Glob characters (such as the asterisk) can be used in the URL patterns, and a dispatch file supports up to 10 routing rules.
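To make the matching behavior concrete, here is a minimal sketch of first-match glob routing in JavaScript. It is not App Engine's implementation: the globToRegExp and route helpers are hypothetical, and the sketch assumes that rules are tried in order, that the first match wins, and that unmatched requests fall back to the default module.

```javascript
'use strict';

// Rules copied from the dispatch file above.
const rules = [
  { url: '*/favicon.ico', module: 'default' },
  { url: 'admin.example.com/*', module: 'admin' }
];

// Hypothetical helper: turn a glob pattern into an anchored RegExp,
// where `*` stands for any run of characters.
function globToRegExp(glob) {
  const escaped = glob
    .split('*')
    .map(function (part) {
      return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('.*');
  return new RegExp('^' + escaped + '$');
}

// Hypothetical helper: return the module of the first matching rule.
// `hostAndPath` is the request URL without the scheme.
function route(hostAndPath) {
  for (const rule of rules) {
    if (globToRegExp(rule.url).test(hostAndPath)) {
      return rule.module;
    }
  }
  // Assumption: requests that match no rule go to the default module.
  return 'default';
}

console.log(route('admin.example.com/users'));       // admin
console.log(route('api.example.com/users'));         // default
console.log(route('admin.example.com/favicon.ico')); // default
```

The last call shows why rule order matters in this sketch: the favicon rule comes first, so it wins even for the admin host.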

Deploying the dispatch file is simple:

$ gcloud preview app deploy dispatch.yml

The deployment takes almost no time. Once it is ready, requests sent to admin.example.com are routed to the admin module.

Sep 15, 2015

Scaffold an Explicit Document to Avoid Error When Creating Indexes in Google Cloud Datastore

When working with Google Cloud Datastore, I would like to design each kind (collection or table) with its own schema and index files:

$ tree
.
├── index.yml
└── schema.yml

But not every kind needs something in its index.yml file. So, what to do? Will the index creation command accept an empty, zero-byte index.yml file? Let’s give it a try:

$ gcloud preview datastore create-indexes index.yml
ERROR: (gcloud.preview.datastore.create-indexes) An error occurred while parsing file: [/home/chao/kind/index.yml]
The file is empty

No, it does not like it. Then, what’s the minimum required? The answer is an explicit document with an empty document content:

---

The three dashes form a directives end marker line. YAML uses three dashes to separate directives from document content.

When creating indexes with the file, it will proceed without errors or warnings:

$ gcloud preview datastore create-indexes index.yml
You are about to update the following configurations:
- myproj/index  From: [/home/chao/kind/index.yml]
Do you want to continue (Y/n)?

Therefore, to avoid the error when scaffolding an index.yml file, use an explicit document with empty document content.

Sep 14, 2015

Google Cloud Datastore Starter: Dataset

Google Cloud Datastore dataset starter, part of a starter collection:

// Google Cloud Datastore Starter: Dataset
// =======================================
'use strict';
// Define a dataset.
var params = {
  projectId: 'MY_PROJECT',
  keyFilename: '/path/to/key/file.json',
  namespace: 'MY_NAMESPACE'
};
var dataset = require('gcloud').datastore.dataset(params);
// Define an entity (including both key and data).
var kind = 'MY_KIND';
var key = dataset.key(kind);
var data = [
  {
    name: 'title',
    value: 'Google Cloud Datastore Starter',
    excludeFromIndexes: false
  }
];
// var data = { title: 'Google Cloud Datastore Starter' }; // Simple version
var entity = {
  key: key,
  data: data
};
// Save a single entity.
dataset.save(entity, function (err) {
  if (err) {
    throw err;
  }
});

Sep 14, 2015

Understand the Limitation of the String Property Type of Google Cloud Datastore

Google Cloud Datastore supports a variety of data types for property values:

Integers
Floating-point numbers
Strings
Dates
Binary data

Strings are likely the most frequently used. When working with strings, the most important question to ask is how many bytes can be inserted into the property/field. According to the table listed in properties and value types:

Up to 1500 bytes if property is indexed, up to 1MB otherwise
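Note that the limit is expressed in bytes, not characters, so multi-byte text reaches it sooner. As a minimal sketch, assuming the 1500-byte figure quoted above and assuming the limit is counted on the UTF-8 encoding of the string, a hypothetical helper fitsIndexedLimit could check a value before saving it:

```javascript
'use strict';

// Limit for an indexed string property, as quoted above.
const INDEXED_LIMIT_BYTES = 1500;

// Hypothetical helper: true if `str` fits into an indexed string property,
// assuming the limit applies to the UTF-8 encoded length.
function fitsIndexedLimit(str) {
  return Buffer.byteLength(str, 'utf8') <= INDEXED_LIMIT_BYTES;
}

console.log(fitsIndexedLimit('a'.repeat(1500)));      // true
console.log(fitsIndexedLimit('a'.repeat(1501)));      // false
// 751 characters, but 1502 bytes in UTF-8.
console.log(fitsIndexedLimit('\u00e9'.repeat(751)));  // false
```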

Let’s give it a try:

// String Property Type
// ====================
//
// Investigate the string property type of Google Cloud Datastore.
//
// Dependencies tested:
//
// - `gcloud@v0.21.0`
// - `lorem-ipsum@1.0.3`
'use strict';
// Define dependencies.
var params = {
  projectId: 'myproj',
  keyFilename: '/path/to/key/file.json',
  namespace: 'test'
};
var dataset = require('gcloud').datastore.dataset(params);
var kind = 'String';
var lorem = require('lorem-ipsum');
// Save an entity to the datastore.
function run(params) {
  var size = params.size;
  var index = params.index;
  var str = lorem({ count: size }).substr(0, size);
  var key = dataset.key(kind);
  var data = [
    {
      name: 'title',
      value: str,
      excludeFromIndexes: !index
    }
  ];
  var entity = {
    key: key,
    data: data
  };
  dataset.save(entity, function (err) {
    // Log requested size, actual byte length, index flag, and result.
    console.log(size, Buffer.byteLength(str), index,
      err ? err.code + ' ' + err.message : '200 OK');
  });
}
// Explanation of fields:
//
// - `size`: Total number of bytes to produce
// - `index`: Whether to index the string field
//
// In-line comment indicates the expected result.
[
  { size: 1500, index: true  },             // OK
  { size: 1501, index: false },             // OK
  { size: 1501, index: true  },             // Error
  { size: 1024 * 1024 - 92, index: false }, // OK
  { size: 1024 * 1024 - 91, index: false }, // Error
].forEach(run);

Don’t ask me where the 91 or 92 comes from. Apparently, the actual limit is close to 1 MB, but not exactly 1 MB. The result of the test script:

Jun 29, 2015

Google Cloud Pub/Sub Starters for gcloud Version v0.15.x

Google Cloud Pub/Sub starters/examples/references for Node.js gcloud version v0.15.x.

Usage:

$ PROJECT=my-project KEYFILENAME=/path/to/keyfile.json node file.sh

Jan 7, 2015

Use CLI s3api to Retrieve Metadata of Amazon S3 Object

In the Amazon S3 CLI, there are only a handful of commands:

cp ls mb mv rb rm sync website

We can use the cp command to retrieve an S3 object:

$ aws s3 cp s3://mybucket/myfile.json myfile.json

But if only the metadata of the object, such as ETag or Content-Type, is needed, the S3 CLI does not have a command for that.

Enter the S3API CLI. Its commands can retrieve not only S3 objects but also their associated metadata. For example, to retrieve an S3 object, similar to aws s3 cp (get-object requires an output file as its last argument):

$ aws s3api get-object --bucket mybucket --key myfile.json myfile.json

If you need just the metadata of the object, use head-object, which retrieves the metadata without the object itself, like the HTTP HEAD method:

Dec 27, 2014

Revoke and Grant Public IP Addresses to Amazon EC2 Instances Via AWS Command Line Interface (CLI)

If you work from place to place, such as from one coffee shop to another, and you need access to your Amazon EC2 instances, you probably don’t want to allow traffic from all IP addresses. You can use EC2 Security Groups to allow the IP addresses of those locations. But once you move on to a different location, you want to delete the IP address of the previous one. Doing this manually, over and over again, quickly becomes cumbersome. Here is a command line method that quickly removes all other locations and allows traffic only from your current location.

The steps are:

Revoke all existing sources to a particular port
Grant access to the port only from the current IP address

Assume the following:

Profile: default
Security group: mygroup
Protocol: tcp
Port: 22

First, revoke access to the port from all IP addresses:

$ aws ec2 describe-security-groups \
    --profile default \
    --group-names mygroup \
    --query 'SecurityGroups[0].IpPermissions[?ToPort==`22`].IpRanges[].CidrIp' | \
  jq -r '.[]' | \
  xargs -n 1 aws ec2 revoke-security-group-ingress \
    --profile default \
    --group-name mygroup \
    --protocol tcp \
    --port 22 \
    --cidr

The aws ec2 describe-security-groups command before the first pipe returns JSON-formatted data, filtered via a JMESPath query, which the AWS CLI supports. For example:

[
  "XXX.XXX.XXX.XXX/32",
  "XXX.XXX.XXX.XXX/32"
]

The jq command simply converts the JSON array into one string per line, which xargs takes in and loops through, revoking one IP address at a time.
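The same loop can be pictured in JavaScript. This is only an illustrative sketch: buildRevokeArgs is a hypothetical helper that takes a JSON array like the one shown above and produces one argument list per address, mirroring what xargs -n 1 does. It does not call AWS, and the sample addresses are made up.

```javascript
'use strict';

// Hypothetical helper: build one `aws ec2 revoke-security-group-ingress`
// argument list per CIDR block, as `xargs -n 1` does in the pipeline above.
function buildRevokeArgs(cidrs, options) {
  return cidrs.map(function (cidr) {
    return [
      'ec2', 'revoke-security-group-ingress',
      '--profile', options.profile,
      '--group-name', options.group,
      '--protocol', options.protocol,
      '--port', String(options.port),
      '--cidr', cidr
    ];
  });
}

// Sample result of the describe-security-groups query (made-up addresses).
const cidrs = JSON.parse('["203.0.113.10/32", "198.51.100.7/32"]');

buildRevokeArgs(cidrs, {
  profile: 'default',
  group: 'mygroup',
  protocol: 'tcp',
  port: 22
}).forEach(function (args) {
  // Print the command that would be run for each address.
  console.log('aws ' + args.join(' '));
});
```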

After this step, all originally allowed IP addresses are revoked. The next step is to grant access to the port from a single IP address:

Nov 28, 2014

Delete All Messages in an Amazon SQS Queue via AWS CLI

Amazon SQS or Simple Queue Service is a fast, reliable, scalable, fully managed message queuing service. There is also AWS CLI or Command Line Interface available to use with the service.

If you have a lot of messages in a queue, this command will show the approximate number:

$ aws sqs get-queue-attributes \
  --queue-url $url \
  --attribute-names ApproximateNumberOfMessages

Where $url is the URL to the Amazon SQS queue.

There is no command to delete all messages yet, but you can chain a few commands together to make it work:

Oct 25, 2014

Use CNAME for Resolving Private and Public IP Address in Amazon EC2

“A security group acts as a virtual firewall that controls the traffic for one or more instances.” [1] Amazon EC2 Security Groups are capable of controlling traffic not just from an IP address, but also from all EC2 instances that belong to a specific security group. I want to allow an instance belonging to one security group to access an instance belonging to another security group via a custom domain name (subdomain.example.com).

But when I configured the subdomain via Amazon Route 53, I misconfigured it by assigning an A record with an IP address (the Elastic IP address of the instance). I should have used a CNAME record and assigned the public DNS name (the public hostname of the EC2 instance). The public DNS name resolves to the private IP address when queried from within EC2 and to the public IP address from outside, so traffic between the two instances stays on private addresses, where rules that reference another security group apply.

May 24, 2014

Amazon S3 Delimiter and Prefix

Amazon S3 is an inexpensive online file storage service, and there is a JavaScript SDK for it. A few things puzzled me when using the SDK:

How to use parameters Delimiter and Prefix?
What is the difference between CommonPrefixes and Contents?
How to create a folder/directory with JavaScript SDK?

To retrieve objects in an Amazon S3 bucket, the operation is listObjects. It does not return the content of each object, only the key and metadata such as the size and owner of the object.

To make a call to get a list of objects in a bucket:

s3.listObjects(params, function (err, data) {
  // ...
});

Where the params can be configured with the following parameters:

Bucket
Delimiter
EncodingType
Marker
MaxKeys
Prefix

But what are Delimiter and Prefix? And how do we use them?

Let’s start by creating some objects in an Amazon S3 bucket similar to the following file structure. This can be easily done by using the AWS Console.

.
├── directory
│   ├── directory
│   │   └── file
│   └── file
└── file
2 directories, 3 files

In Amazon S3, the objects are:

directory/
directory/directory/
directory/directory/file
directory/file
file

One thing to keep in mind is that Amazon S3 is not a file system. There is not really the concept of file and directory/folder. From the console, it might look like there are 2 directories and 3 files. But they are all objects. And objects are listed alphabetically by their keys.

To make it a little clearer, let’s invoke the listObjects method. Only the Bucket parameter is required:

params = {
  Bucket: 'example'
};

The response data passed to the callback function contains:

{ Contents: 
   [ { Key: 'directory/',
       LastModified: ...,
       ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
       Size: 0,
       Owner: [Object],
       StorageClass: 'STANDARD' },
     { Key: 'directory/directory/',
       LastModified: ...,
       ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
       Size: 0,
       Owner: [Object],
       StorageClass: 'STANDARD' },
     { Key: 'directory/directory/file',
       LastModified: ...,
       ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
       Size: 0,
       Owner: [Object],
       StorageClass: 'STANDARD' },
     { Key: 'directory/file',
       LastModified: ...,
       ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
       Size: 0,
       Owner: [Object],
       StorageClass: 'STANDARD' },
     { Key: 'file',
       LastModified: ...,
       ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
       Size: 0,
       Owner: [Object],
       StorageClass: 'STANDARD' } ],
  CommonPrefixes: [],
  Name: 'example',
  Prefix: '',
  Marker: '',
  MaxKeys: 1000,
  IsTruncated: false }

If this were a file system, you might expect:

directory/
file

But it is not, because a bucket does not work like a folder or a directory, where only the immediate children are shown. The objects inside the bucket are laid out flat and alphabetically.

In UNIX, a directory is a file, but in Amazon S3, everything is an object, and can be identified by key.

So, how do we make Amazon S3 behave more like a folder or a directory? Or how do we list just the first level inside the bucket?

To make it work like a directory, you have to use Delimiter and Prefix. Delimiter is what you use to group keys. It does not have to be a single character; it can be a string of characters. And Prefix limits the response to keys that begin with the specified prefix.

Delimiter

Let’s start by adding the following delimiter:

params = {
  Bucket: 'example',
  Delimiter: '/'
};

You will get something more like a listing of a directory:

{ Contents: 
   [ { Key: 'file' } ],
  CommonPrefixes: [ { Prefix: 'directory/' } ],
  Name: 'example',
  Prefix: '',
  MaxKeys: 1000,
  Delimiter: '/',
  IsTruncated: false }

There is a directory, directory/, and a file, file. What happened is that all of the following objects except file were grouped by the delimiter /:

directory/
directory/directory/
directory/directory/file
directory/file
file

So, the result is:

directory/
file

This feels more like a listing of a directory or folder. But if we change Delimiter to i, we get no Contents, just the prefixes:

{ Contents: [],
  CommonPrefixes: [ { Prefix: 'di' }, { Prefix: 'fi' } ],
  Name: 'example',
  Prefix: '',
  MaxKeys: 1000,
  Delimiter: 'i',
  IsTruncated: false }

All keys can be grouped into two prefixes: di and fi. Therefore, Amazon S3 is not a file system, but might act like one if using the right parameters.

As I have mentioned, Delimiter does not need to be a single character. With Delimiter: '/directory':

{ Contents: 
   [ { Key: 'directory/' },
     { Key: 'directory/file' },
     { Key: 'file' } ],
  CommonPrefixes: [ { Prefix: 'directory/directory' } ],
  Name: 'example',
  Prefix: '',
  MaxKeys: 1000,
  Delimiter: '/directory',
  IsTruncated: false }

If we recall the bucket structure:

directory/
directory/directory/
directory/directory/file
directory/file
file

Both directory/directory/ and directory/directory/file are grouped into a common prefix: directory/directory, due to the common grouping string /directory.

Let’s try another one with Delimiter: 'directory':

{ Contents: 
   [ { Key: 'file' } ],
  CommonPrefixes: [ { Prefix: 'directory' } ],
  Name: 'example',
  Prefix: '',
  MaxKeys: 1000,
  Delimiter: 'directory',
  IsTruncated: false }

Okay, one more. Let’s try ry/fi:

{ Contents: 
   [ { Key: 'directory/' },
     { Key: 'directory/directory/' },
     { Key: 'file' } ],
  CommonPrefixes: 
   [ { Prefix: 'directory/directory/fi' },
     { Prefix: 'directory/fi' } ],
  Name: 'example',
  Prefix: '',
  MaxKeys: 1000,
  Delimiter: 'ry/fi',
  IsTruncated: false }

So, remember that Delimiter is just providing a grouping functionality for keys. If you want it to behave like a file system, use Delimiter: '/'.

Prefix

Prefix is much easier to understand. It is a filter that limits the response to keys that begin with the specified string.

With the same structure:

directory/
directory/directory/
directory/directory/file
directory/file
file

Let’s set the Prefix parameter to directory:

{ Contents: 
   [ { Key: 'directory/' },
     { Key: 'directory/directory/' },
     { Key: 'directory/directory/file' },
     { Key: 'directory/file' } ],
  CommonPrefixes: [],
  Name: 'example',
  Prefix: 'directory',
  MaxKeys: 1000,
  IsTruncated: false }

How about directory/:

{ Contents: 
   [ { Key: 'directory/' },
     { Key: 'directory/directory/' },
     { Key: 'directory/directory/file' },
     { Key: 'directory/file' } ],
  CommonPrefixes: [],
  Prefix: 'directory/' }

In this bucket, the prefixes directory and directory/ return the same result, because every key that begins with directory also begins with directory/.

If we try something slightly different, Prefix: 'directory/d':

{ Contents: 
   [ { Key: 'directory/directory/' },
     { Key: 'directory/directory/file' } ],
  CommonPrefixes: [],
  Prefix: 'directory/d' }

Putting it all together with both Delimiter: 'directory' and Prefix: 'directory':

{ Contents: 
   [ { Key: 'directory/' },
     { Key: 'directory/file' } ],
  CommonPrefixes: [ { Prefix: 'directory/directory' } ],
  Prefix: 'directory',
  Delimiter: 'directory' }

First, list the keys prefixed by directory:

directory/
directory/directory/
directory/directory/file
directory/file

Group them by the delimiter directory, searching only the part of each key after the prefix directory:

directory/directory

The resulting Contents are:

directory/
directory/file

and the CommonPrefixes are:

directory/directory

Maybe changing Delimiter to i could give a better perspective:

{ Contents: 
   [ { Key: 'directory/' } ],
  CommonPrefixes: [ { Prefix: 'directory/di' }, { Prefix: 'directory/fi' } ],
  Prefix: 'directory',
  Delimiter: 'i' }

as:

directory/               # key to show
directory/directory/     # group to 'directory/di'
directory/directory/file # group to 'directory/di'
directory/file           # group to 'directory/fi'
file                     # ignored due to prefix

One advantage of using Amazon S3 over listing a directory is that you don’t need to worry about nested directories, because everything is flattened. So, you can loop through all the keys just by specifying the Prefix property.
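The grouping rules can be summarized in a few lines of JavaScript. This is only a local sketch that reproduces the results shown above for the example bucket; groupKeys is a hypothetical helper, not part of the SDK, and it ignores everything else listObjects does (paging with Marker and MaxKeys, metadata, and so on).

```javascript
'use strict';

// Keys of the example bucket, in the order S3 lists them.
const keys = [
  'directory/',
  'directory/directory/',
  'directory/directory/file',
  'directory/file',
  'file'
];

// Hypothetical helper that mimics how Prefix and Delimiter shape the result.
function groupKeys(keys, params) {
  const prefix = params.Prefix || '';
  const delimiter = params.Delimiter || '';
  const contents = [];
  const commonPrefixes = [];

  keys.forEach(function (key) {
    // Prefix acts as a filter.
    if (!key.startsWith(prefix)) { return; }

    // Look for the delimiter only in the part of the key after the prefix.
    const index = delimiter ? key.indexOf(delimiter, prefix.length) : -1;
    if (index === -1) {
      contents.push(key);
      return;
    }

    // Everything up to and including the delimiter is the common prefix.
    const common = key.slice(0, index + delimiter.length);
    if (commonPrefixes.indexOf(common) === -1) {
      commonPrefixes.push(common);
    }
  });

  return { Contents: contents, CommonPrefixes: commonPrefixes };
}

// Contents: file; CommonPrefixes: directory/
console.log(groupKeys(keys, { Delimiter: '/' }));
// Contents: directory/; CommonPrefixes: directory/di and directory/fi
console.log(groupKeys(keys, { Prefix: 'directory', Delimiter: 'i' }));
```

Passing the prefix length to indexOf is the important detail: the delimiter is searched for only after the prefix, which is why Prefix: 'directory' with Delimiter: 'directory' still leaves directory/ in Contents.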

Directory/Folder

If you use “Create Folder” in the AWS console, you can create a directory/folder and upload a file into it. In effect, you are creating two objects with the following keys:

directory/
directory/file

If you use the following command to upload a file, the directory object is not created:

$ aws s3 cp file s3://example/directory/file

This is because Amazon S3 is not a file system but a key/value store. If you call the listObjects method, you will see just one object. For the same reason, you cannot copy a local directory:

$ aws s3 cp directory s3://example/directory
upload failed: aws/ to s3://example/directory [Errno 21] Is a directory: u'/home/chao/tmp/directory/'

But we can use the JavaScript SDK to create a directory/folder:

s3.putObject({ Bucket: 'example', Key: 'directory/' }, function (err, data) {
  if (err) { return console.error(err); }
  console.log(data);
});

Note that you must use directory/ with the trailing slash, not the key without it. Otherwise, the object is just a regular file, not a directory.

realguess

Don't afraid to try!