jq: Filter JSON Array by Value
Print the JSON object in an array where the object key is equal to the specified value with jq:
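For example, assuming a file data.json holding an array of objects keyed by name (both the filename and the field are just for illustration):
$ cat data.json
[{"name":"foo","id":1},{"name":"bar","id":2}]
$ jq '.[] | select(.name == "foo")' data.json
{
  "name": "foo",
  "id": 1
}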
Some utilities print out JSON data (objects) line by line. Line-by-line JSON is not really a valid JSON format; it needs to be wrapped in brackets as an array.
For example, here are a few lines of JSON objects:
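Something like (with made-up fields):
{"id":1,"name":"foo"}
{"id":2,"name":"bar"}
{"id":3,"name":"baz"}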
To combine them into a single array, we can use jq's -s, --slurp option:
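Assuming those lines are saved in lines.json, a sketch (the -c flag is only there to keep the result on one line):
$ cat lines.json | jq -s -c '.'
[{"id":1,"name":"foo"},{"id":2,"name":"bar"},{"id":3,"name":"baz"}]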
The option reads the entire input stream into a large array. This is good until the input stream is too large to process, because jq does not produce any output while slurping up the input. If the input is line-by-line JSON, and all you want to do is wrap it into an array (with or even without the trailing newline), we can simply do the following:
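One possible sketch with awk, which prints the opening bracket, a comma before every record after the first, and the closing bracket, while streaming:
$ cat lines.json | awk 'BEGIN { print "[" } NR > 1 { print "," } { printf "%s", $0 } END { print "\n]" }'
[
{"id":1,"name":"foo"},
{"id":2,"name":"bar"},
{"id":3,"name":"baz"}
]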
This is with the newline printed. And it will solve the problem of a very large input. Data will simply stream out as it comes in. Try it with a forever while loop.
There is another option in jq named --stream, and it seems to do the same thing. But the --slurp option overrides --stream, and the --stream option itself is already quite complicated.
In conclusion, you can use jq -s to wrap line-by-line JSON into an array. If you want to be safe with a very large input, just use the awk example above.
Registering a domain with Amazon Route 53 via the CLI is very straightforward. The only complicated part is generating the acceptable JSON structure, and then entering the JSON string on the command line without the shell meddling with it.
The command is:
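Namely, the register-domain subcommand under the route53domains service:
$ aws route53domains register-domain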
The documentation can be found in the AWS CLI Command Reference or by issuing:
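That is, using the CLI's built-in help:
$ aws route53domains register-domain help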
Most of the command line options require simple string values, but some options like --admin-contact require structured data, which is complex to enter on the command line. The easiest way is to forget all the other options and use --cli-input-json to supply everything in JSON.
First let’s generate the skeleton or template to use:
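The AWS CLI can do this with its --generate-cli-skeleton option:
$ aws route53domains register-domain --generate-cli-skeleton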
That should output something like:
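Abbreviated here, since the full skeleton contains many more contact fields:
{
    "DomainName": "",
    "DurationInYears": 0,
    "AutoRenew": true,
    "AdminContact": {
        "FirstName": "",
        "LastName": "",
        "ContactType": "",
        "Email": "",
        ...
    },
    "RegistrantContact": { ... },
    "TechContact": { ... },
    "PrivacyProtectAdminContact": true,
    "PrivacyProtectRegistrantContact": true,
    "PrivacyProtectTechContact": true
}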
Fill in the fields and leave out the ones not needed.
Now let's register, running the JSON through jq to clean up extra blanks and end-of-line characters, and then piping it to the xargs command, using the newline character as the delimiter instead of the default blank:
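A sketch, assuming the completed skeleton is saved in a file named register-domain.json (the filename is arbitrary):
$ cat register-domain.json | jq -c -M '.' | xargs -d '\n' aws route53domains register-domain --cli-input-json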
The jq packages in both Ubuntu and Debian lag behind; therefore, to get the latest version of jq, build it from source.
There are a few prerequisites to install: GCC, Make, and Autotools. Both GCC and Make are usually installed if you do development, but Autotools often is not. Luckily, this is easy to fulfill:
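On Debian or Ubuntu, for example:
$ sudo apt-get install -y autoconf automake libtool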
Install from source:
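Roughly the usual Autotools dance, cloning from the upstream repository:
$ git clone https://github.com/stedolan/jq.git
$ cd jq
$ autoreconf -i
$ ./configure
$ make
$ sudo make install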
The installed path is at:
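With the default prefix, that is:
$ which jq
/usr/local/bin/jq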
However, this gives me an unexpected tag.
Amazon SQS, or Simple Queue Service, is a fast, reliable, scalable, fully managed message queuing service. The AWS CLI, or Command Line Interface, is also available to use with the service.
If you have a lot of messages in a queue, this command will show the approximate number:
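That is the get-queue-attributes subcommand, asking for the ApproximateNumberOfMessages attribute:
$ aws sqs get-queue-attributes --queue-url $url --attribute-names ApproximateNumberOfMessages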
Where $url is the URL to the Amazon SQS queue.
There is no command to delete all messages yet, but you can chain a few commands together to make it work:
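For example, a sketch that receives a batch of messages and deletes them by receipt handle (repeat until the approximate number drops to zero):
$ aws sqs receive-message --queue-url $url --max-number-of-messages 10 | \
  jq -r '.Messages[].ReceiptHandle' | \
  while read -r handle; do
    aws sqs delete-message --queue-url $url --receipt-handle "$handle"
  done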
I have created a jq Docker image based on BusyBox with automated builds. BusyBox is really really small in size, so the jq image I have created is also very small, just a little over 6 MB.
Here are the source code and image repositories: https://github.com/realguess/docker-jq and https://registry.hub.docker.com/u/realguess/jq/.
Building a Docker image is pretty straightforward; just follow the instructions:
https://docs.docker.com/userguide/dockerrepos/#automated-builds
I have also created a tag v1.4 in my GitHub repository to match the release of the jq binary. This should also be reflected in the Docker registry. After a couple of tries, here are the build details for adding both the latest and 1.4 tags:
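Roughly (the exact layout on the Docker Hub build settings page may differ):
Type      Name      Dockerfile Location    Docker Tag Name
Branch    master    /                      latest
Tag       v1.4      /                      1.4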
The first two columns match git branch and tag, and the last column reflects Docker tags. See https://github.com/realguess/docker-jq/tags and https://registry.hub.docker.com/u/realguess/jq/tags/manage/.
This is done by starting an automated build with a type of tag instead of branch. Everything is done via the Docker Hub website.
If I pull down this repository:
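For example, pulling all tags:
$ docker pull -a realguess/jq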
It should give me two image layers with two different image IDs, which makes sense, as the latest commit is usually not the same as the tagged one.
After a while, the index should be built, and I can search it via:
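Using the search command:
$ docker search jq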
The image is listed as AUTOMATED, but the description is missing here in the search command.
There are two types of descriptions: a short description and a full description. The full description will be generated automatically from the README.md file. I thought the short description could also be generated, from README-short.txt; however, this is not the case. You can add it in the settings page, for example:
https://registry.hub.docker.com/u/realguess/jq/settings/
Automated builds are triggered automatically from GitHub and Bitbucket repositories. Once a commit is pushed to either repository, a trigger is sent to Docker Hub, and an automated build starts.
In the previous post, I wrote about how to split a large JSON file into multiple parts, but that was limited to the default behavior of mongoexport, where each line in the output file represents a JSON string. If you have to deal with a large JSON file, such as one generated with the --jsonArray option in mongoexport, you need to parse the file incrementally, or stream it.
I have downloaded a large JSON data set (about 144 MB) from Data.gov. If you try to read the entire data set into memory:
> var json = require('./data.json')
Killed
The process is not able to handle it. Using streaming is necessary. And luckily, our command line JSON processing tool, jq, supports streaming.
The parts we are interested in are encapsulated in an array under the data property of the data set. We are going to split each element of the array into its own file.
Don't try to use the -f option in jq to read the file from the command line; it will read everything into memory. Instead, do cat data.json | jq.
$ mkdir parts
$ cat data.json | jq -c -M '.data[]' | \
while read line; do echo $line > parts/$(date +%s%N).json; done
The entire data set is piped into jq to filter and compress each array element. Each element is printed on one line, and each line is saved into its own JSON file, using the UNIX timestamp plus nanoseconds as the filename. All pieces are saved into the parts/ directory.
But there is one problem with embedded JSON strings, which has to do with echo and backslashes. For example, when echoing the following string:
{"name":"{\"first\":\"Foo\",\"last\":\"Foo\"}","username":"foo","id":1}
It will be printed as invalid JSON:
{"name":"{"first":"Foo","last":"Foo"}","username":"foo","id":1}
Backslashes are stripped. To fix this problem, we can simply double the backslashes:
$ cat data.json | jq -c -M '.data[]' | sed 's/\\"/\\\\"/g' | \
while read line; do echo $line > parts/$(date +%s%N).json; done
You can even try to curl the remote JSON file instead of using cat on the downloaded file. But you might want to try a smaller file first, because, on my slow machine, it took nearly an hour to finish splitting into 678,733 parts:
real 49m35.780s
user 2m42.888s
sys 6m48.048s
To take it a little bit further, the next step is to decide how many lines or array elements to write into a single file.
What is the best command line tool to process JSON?
Hmm… Okay, let’s try different command line JSON processing tools with the following use case to decide which one is the best to use.
Here is the use case: JSON | filter | shell. A program outputs JSON data, pipes it into a command line JSON processing tool to filter the data, and then sends it to a shell command to do more work.
Here is a snippet of sample JSON data:
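Say, a small array of user objects (the fields are made up for illustration):
[
  {
    "id": 1,
    "username": "foo"
  },
  {
    "id": 2,
    "username": "bar"
  }
]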
Or as a one-liner in data.json:
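That is, the same array compacted onto a single line:
[{"id":1,"username":"foo"},{"id":2,"username":"bar"}]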
The command line JSON processor should filter each element of the array and convert it into its own line:
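With the sample data above:
{"id":1,"username":"foo"}
{"id":2,"username":"bar"}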
The result will be piped line by line as input into a shell script, echo.bash:
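A trivial sketch of such a script, which just prefixes whatever line it receives:
#!/usr/bin/env bash
# Read one JSON object per line from stdin and do some work with it.
while read -r line; do
  echo "Processing: $line"
done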
The final output should be:
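With the echo.bash sketch above:
Processing: {"id":1,"username":"foo"}
Processing: {"id":2,"username":"bar"}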
Before we start looking for existing tools, let's see how difficult it is to write a custom solution.
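Here is a naive sketch in Node (call it split.js): read the whole array from stdin, strip the outer brackets, and split on the object boundaries. It only handles flat data like the sample above; nested structures or braces inside strings will break it.
#!/usr/bin/env node
// split.js: a naive splitter for a flat, one-line JSON array.
process.stdin.setEncoding('utf8');
var input = '';
process.stdin.on('data', function (chunk) { input += chunk; });
process.stdin.on('end', function () {
  // Drop the outer brackets, then split on "},{" boundaries.
  var body = input.trim().replace(/^\[/, '').replace(/\]$/, '');
  var parts = body.split(/\},\s*\{/);
  parts.forEach(function (part, i) {
    var line = part;
    if (i > 0) line = '{' + line;                 // restore the opening brace
    if (i < parts.length - 1) line = line + '}';  // restore the closing brace
    console.log(line);
  });
});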
Perform a test run:
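With the hypothetical split.js and the sample data.json from above:
$ cat data.json | node split.js
{"id":1,"username":"foo"}
{"id":2,"username":"bar"}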
Well, it works. In essence, we are writing a simple JSON parser. Unless you want to keep the footprint small, you don't want to write another JSON parser. And why bother reinventing the wheel? Let's start looking at the existing solutions.
Let’s start with the tools from NPM registry:
$ npm search json command
Here are a few candidates that appear to match from the description:
Not a lot of choices, and the modules are not very active. This might be because there is already a really good solution, jq, which has 2,493 stars and 145 forks, and the last update was 6 days ago.
jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. - jq
Instead of NPM install, do:
$ sudo apt-get -y install jq
Since we don't need color or pretty-printing, just line-by-line output, here is the command chain:
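A sketch, using -c for compact one-line output and -M for monochrome:
$ cat data.json | jq -c -M '.[]' | bash echo.bash
Processing: {"id":1,"username":"foo"}
Processing: {"id":2,"username":"bar"}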
jq can do much more than the example just shown. It has zero runtime dependencies, and it is flexible enough to deal with not just arrays but objects as well.
jq is clearly the winner here, with the fewest dependencies, the most functionality, and the most popularity, as well as comprehensive documentation.