Elasticsearch is a wonderful database for performing full-text search on rich documents at terabyte-scale.
It’s already pretty easy to talk to Elasticsearch. You can
- use the HTTP-based REST API via command-line tools like curl, your favorite HTTP library, or even your browser’s URL bar
- use the interface built on Apache Thrift
- use the native Java classes
What was missing was a command-line shell that lets you directly inspect Elasticsearch’s “filesystem” or “database schema”, run queries, and in general muck about. I got sick of writing things like
$ curl -s -X GET "http://localhost:9200/_status" | ruby -rjson -e 'puts JSON.parse($stdin.read)["indices"]["my_index"]["docs"]["num_docs"]'
How about
$ es /_status --only=indices.my_index.docs.num_docs
Install the gem:
$ sudo gem install elasticshell
which installs a program ‘es’ that you can run from the command line to start Elasticshell. Try
$ es --help
right now to see that everything is properly installed. You’ll also see a brief survey of Elasticshell’s startup options.
To start an Elasticshell session, just run
$ es
Elasticshell will automatically try to connect to a local Elasticsearch database running on the default port. You can modify this with the startup options. Type help at any time to get some contextual help from Elasticshell.
Within Elasticshell, there are three variables whose values affect behavior. These variables are reflected in the default prompt, for example:
GET /my_index/my_type$
This prompt tells us three things:
- The default HTTP verb we’re using for requests is GET.
- The default API “scope” we’re in is /my_index/my_type. If the shell is connected to an Elasticsearch server and the scope exists, it will be colored green. Otherwise it’s yellow.
- Elasticshell will print raw responses from the database – this is the $ at the end of the prompt. If we were in pretty-print mode, this would become $$.
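As a rough illustration of how those three variables map onto the prompt, here is a minimal Ruby sketch. The method name build_prompt is hypothetical and is not taken from Elasticshell's source; the green/yellow coloring is left out.

```ruby
# Hypothetical helper (not Elasticshell's actual code): assemble the prompt
# from the default verb, the current scope, and the pretty-print flag.
def build_prompt(verb, scope, pretty)
  marker = pretty ? '$$' : '$' # one $ for raw output, two for pretty output
  "#{verb} #{scope}#{marker} "
end

puts build_prompt("GET", "/my_index/my_type", false)
puts build_prompt("GET", "/", true)
```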
Elasticshell will try to connect to the Elasticsearch hosts passed with the --servers option during startup. At any other time, you can connect to servers by issuing the connect command:
GET /$ connect http://192.168.1.10:9200 http://192.168.1.11:9200 http://192.168.1.12:9200 http://192.168.1.13:9200
Scopes are defined by the Elasticsearch REST API. Some scopes like /_cluster or /_nodes are static and present for all Elasticsearch clusters. Other scopes like /my_index/my_type depend upon the particular cluster.
Use the cd built-in to move between scopes:
GET /$ cd /blog/comments
GET /blog/comments$ cd ..
GET /blog$ cd /blog/entries
GET /blog/entries$ cd
GET /$
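The scope changes in that session behave like ordinary path resolution. The following is only a sketch of that idea using Ruby's standard Pathname class; change_scope is a hypothetical name, and Elasticshell may resolve scopes differently (for instance, it can also check whether the scope exists on the server).

```ruby
require "pathname"

# Hypothetical helper: resolve a cd target against the current scope.
# With no target, cd returns to the root scope, as in the session above.
def change_scope(current, target = nil)
  return "/" if target.nil? || target.empty?
  # An absolute target replaces the current scope; ".." moves one level up.
  Pathname.new(current).join(target).cleanpath.to_s
end

puts change_scope("/", "/blog/comments")
puts change_scope("/blog/comments", "..")
puts change_scope("/blog/entries")
```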
The ls command will show the contents of a given scope:
GET /$ ls
blog _cluster _nodes _status
but the ll command gives more output:
GET /$ ll
i 1/1/0 5 3.3kb blog
s _cluster
s _nodes
- _status
Here you see that blog
- is an index (the i in the first column)
- has 1 total shard, 1 successful shard, and 0 failed shards
- has 5 documents
- occupies 3.3kb of space on disk
And, because of the s in the first column, _cluster and _nodes are scopes – you can cd into them.
Finally, _status is a request: you can’t cd into it, but you can issue it.
If you first cd into the index and then run ll, you’ll see different output:
GET /$ cd /blog/
GET /blog$ ll
m comments
m entries
- _aliases
- _search
- _stats
- _status
_aliases and so on are just more requests you can make, but comments and entries are mappings (they have an m in the first column).
You can change Elasticshell’s default HTTP verb by typing the new verb on its own. Here a PUT request to /my_new_index is made in two steps:
GET /$ PUT
PUT /$ /my_new_index
You can also do this on a per-request basis:
GET /$ PUT /my_new_index
GET /$
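One way to picture the difference between the two forms is a small line parser. This is a sketch under assumptions: parse_line is a hypothetical name, and the list of recognized verbs is a guess rather than Elasticshell's actual list.

```ruby
# Hypothetical parser: returns [verb for this request, path or nil, new default verb].
def parse_line(line, default_verb)
  verbs = %w[GET POST PUT DELETE HEAD] # assumed set of verbs
  words = line.split
  if verbs.include?(words.first)
    verb = words.shift
    # A bare verb changes the default and sends nothing.
    return [verb, nil, verb] if words.empty?
    # A verb followed by a path overrides the default for this request only.
    [verb, words.first, default_verb]
  else
    [default_verb, words.first, default_verb]
  end
end

p parse_line("PUT", "GET")
p parse_line("PUT /my_new_index", "GET")
```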
Typing pretty at any time will toggle Elasticshell’s pretty-printing of responses on or off.
GET /$ pretty
GET /$$
The extra $-sign means it’s pretty…
Scopes are fine for organizing the API but to get anything done you’ll have to send a request.
Each scope has different fixed, named requests, as per the Elasticsearch API documentation. Within a scope, tab-complete on the first word to see a list of possible commands. Hit enter after a command to see output from Elasticsearch.
Here’s a command to get the status for the cluster:
GET /$ _status
Here’s a command to get the health of the cluster:
GET /$ cd _cluster
GET /_cluster$ health
which you could also have issued like this
GET /$ _cluster/health
Commands will also accept a query string, as in this example of a search through my_index:
GET /my_index$ _search?q=foo+AND+bar
In the above example the query foo AND bar was passed via the query string part of a URL. Passing a more complex query requires that we put the query in the body of the request. If you’re willing to forgo using spaces you can do this right on the same line:
GET /my_index$ _search {"query":{"query_string":{"query":"foo"}}}
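If you generate these request lines from a script, both forms above can be produced with Ruby's standard library. This is only a sketch; the two method names are hypothetical. URI.encode_www_form encodes spaces as +, and JSON.generate emits no whitespace between tokens, so the body stays a single word as long as the query text itself has no spaces.

```ruby
require "json"
require "uri"

# Query-string form, e.g. _search?q=foo+AND+bar
def query_string_request(terms)
  "_search?" + URI.encode_www_form(q: terms)
end

# Same-line body form, e.g. _search {"query":{...}}
def body_request(terms)
  body = JSON.generate({ "query" => { "query_string" => { "query" => terms } } })
  "_search " + body
end

puts query_string_request("foo AND bar")
puts body_request("foo")
```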
But if you want more expressiveness you can either name a file (with tab-completion) that contains the body you want:
# in /tmp/query.json
{ "query": { "query_string": { "query": "foo AND bar" } } }
followed by
GET /my_index$ _search /tmp/query.json
Or you can do it cat-style, pasting the query into the shell, by using the - character:
GET /my_index$ _search -
{ "query": { "query_string": { "query": "foo AND bar" } } }
Don’t forget to use Ctrl-D to send an EOF and flush the query input.
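The three ways of supplying a body (inline, from a file, or pasted after -) can be summarized in a few lines of Ruby. This is an illustrative sketch, not Elasticshell's implementation; read_body and its input parameter are hypothetical.

```ruby
require "stringio"

# Hypothetical helper: turn the word after the request into a request body.
def read_body(arg, input = $stdin)
  return "" if arg.nil?                     # no body given
  return input.read if arg == "-"           # cat-style: read until EOF (Ctrl-D)
  return File.read(arg) if File.file?(arg)  # a file that holds the body
  arg                                       # otherwise treat it as an inline body
end

pasted = StringIO.new('{ "query": { "query_string": { "query": "foo AND bar" } } }')
puts read_body("-", pasted)
puts read_body('{"query":{"query_string":{"query":"foo"}}}')
```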
You can redirect the output from a request in a variety of ways.
The simplest is to redirect to a file:
GET /my_index$ _search /tmp/query_with_lots_of_output.json > /data/output_file.json
Or you can try sending it to Ruby itself:
GET /my_index$ _search /tmp/query_that_needs_filtering.json | puts response["hits"]["hits"].first["body"]
Everything to the right of the | is executing within a Ruby process that has the response and request variables in scope. You can be even more free by just piping without any Ruby code, which will leave you in a Ruby (Pry) shell with the same binding.
GET /my_index$ _search /tmp/query_that_needs_interaction.json |
>> response
=> {"took"=>1, "timed_out"=>false, ... }
>> request
=> {:verb=>"GET", :path=>"/_search", :query_options=>{}, :body=>""}
Hit CTRL-D to get out of this new interactive Ruby shell and back to Elasticshell.
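To make the idea of “the same binding” concrete, here is a minimal sketch of evaluating piped Ruby code with response and request in scope. The method name pipe_to_ruby and the sample data are made up for illustration; Elasticshell's own mechanism may differ.

```ruby
# Hypothetical helper: run a snippet of Ruby with `response` and `request`
# visible as local variables, roughly like the code to the right of the |.
def pipe_to_ruby(code, response, request)
  # The method's parameters are locals, so this binding exposes both of them.
  eval(code, binding)
end

# Made-up sample data for illustration only.
response = { "took" => 1, "timed_out" => false }
request  = { verb: "GET", path: "/_search" }

puts pipe_to_ruby('response["took"]', response, request)
puts pipe_to_ruby('request[:verb]', response, request)
```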
Instead of running Elasticshell interactively, you can exit after running only a single command by feeding the request path directly to the es script. For example
$ es /_cluster/health
$ es --scope=/_cluster health
$ es --verb=GET /_cluster/health
all work like you think they do.
The --only option can also be passed a .-separated hierarchical list of keys to slice into the resulting object. This is useful when trying to drill into a large amount of data returned by Elasticsearch. The example from the start of this file is relevant again here:
$ es /_status --only=indices.my_index.docs.num_docs
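Conceptually, slicing with a key path amounts to walking the parsed JSON one key at a time. The sketch below shows that idea in plain Ruby; slice_response is a hypothetical helper and the sample document is made up to mirror the /_status example.

```ruby
require "json"

# Hypothetical helper: follow a "."-separated key path into a parsed response.
def slice_response(response, only)
  only.split(".").reduce(response) do |node, key|
    # Yield nil once a key is missing or the current node is not a Hash.
    node.is_a?(Hash) ? node[key] : nil
  end
end

# Made-up sample shaped like the /_status response used above.
status = JSON.parse('{"indices":{"my_index":{"docs":{"num_docs":5}}}}')
puts slice_response(status, "indices.my_index.docs.num_docs")
```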