In part 1, we saw how to display the number of docs and their sizes per node. In real life this helps, but we need more than that.

The classic way to organise your data in Elasticsearch is to use one index per day by appending the current date to the index name. You end up with hundreds of indexes, which are created via an index template so that the object mapping is set properly.

At this point, it is easy to determine how many docs each individual index has and how much disk space it uses with a plugin like Cerebro. But it is painful to sum over all the indexes matching a pattern in order to get the values we want.

So basically, the operations that must be done are (if you use index templates):

  • Read all the index templates
  • Scan all the indexes and add the size and number of docs to an aggregated index name (the index name without the date part)
  • Push the results to Elasticsearch in order to be able to create dashboards of index trends

We will add these functions to our current Python code available here.

Reading the templates:

The Elasticsearch Python bindings do not include any function that can be used to retrieve the index template list. The only way to get it is to query the REST API directly.

curl -XGET 'localhost:9201/_template?pretty'

Result:

{
  "elksupervisor" : {
    "order" : 0,
    "template" : "elastic_stat*",
    "settings" : { },
    "mappings" : {
      "stat" : {
...

The important part is the "template" field, which can be used to match indexes like:

elastic_stat-2016.12.16
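To illustrate, here is a minimal sketch of reading the templates over REST and matching an index name against a template pattern. The function names, the urllib-based call and the fnmatch-based matching are assumptions for illustration, not the article's actual code:

```python
# Sketch: fetch the index templates via the REST API and extract the
# wildcard patterns from their "template" field.
import fnmatch
import json
from urllib.request import urlopen


def get_template_patterns(elastic_url="http://localhost:9201"):
    """Return the wildcard patterns declared in the index templates."""
    with urlopen(elastic_url + "/_template") as resp:
        templates = json.load(resp)
    return [body["template"] for body in templates.values()]


def matching_pattern(index_name, patterns):
    """Return the first template pattern matching the index name, or None."""
    for pattern in patterns:
        if fnmatch.fnmatch(index_name, pattern):
            return pattern
    return None
```

With the template shown above, `matching_pattern("elastic_stat-2016.12.16", ["elastic_stat*"])` returns `"elastic_stat*"`, which serves as the aggregated index name.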

Scanning all the indexes:

Unfortunately, the Python bindings do not have any nice function to read the index properties either, so we will use a REST call again:

curl -XGET 'localhost:9201/_stats?pretty'

After that it becomes easy to build the aggregated indexes, and the last step is simply to push the values via a bulk insert.
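The aggregation step can be sketched as follows, operating on the parsed JSON returned by the `_stats` call. The function name and structure are assumptions; the `"total"` / `"docs"` / `"store"` fields are the ones the stats API actually returns:

```python
import fnmatch


def aggregate_stats(stats, patterns):
    """Aggregate per-index stats into one entry per template pattern.

    stats: the parsed JSON from GET /_stats.
    patterns: wildcard patterns taken from the index templates.
    Indexes matching no pattern are kept under their own name.
    """
    totals = {}
    for name, data in stats["indices"].items():
        key = next((p for p in patterns if fnmatch.fnmatch(name, p)), name)
        entry = totals.setdefault(key, {"docs": 0, "size": 0})
        # "total" includes replica shards; "primaries" would exclude them
        entry["docs"] += data["total"]["docs"]["count"]
        entry["size"] += data["total"]["store"]["size_in_bytes"]
    return totals
```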

The documents will have the following format:

{
  "_index": "elastic_stat-2016.12.10",
  "_type": "indice",
  "_id": "AVjoSOxr2ZICcxbIuQK7",
  "_score": null,
  "_source": {
    "docs": 519828,
    "@timestamp": 1481365836000,
    "name": "yourindicename",
    "size": 230198634
  },
  "fields": {
    "@timestamp": [
      1481365836000
    ]
  },
  "sort": [
    1481365836000
  ]
}
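Building the bulk payload for documents in this format can be sketched as below. The NDJSON construction follows the bulk API contract; the function name and the way the totals dict is shaped are assumptions for illustration:

```python
# Sketch: turn the aggregated totals into the NDJSON body expected by
# POST /_bulk (one action line followed by one source line per doc).
import json
import time


def build_bulk_body(totals, index_name):
    """totals: {aggregated_name: {"docs": int, "size": int}}."""
    now = int(time.time() * 1000)  # epoch millis, as in the doc format
    lines = []
    for name, entry in totals.items():
        lines.append(json.dumps({"index": {"_index": index_name,
                                           "_type": "indice"}}))
        lines.append(json.dumps({"@timestamp": now,
                                 "name": name,
                                 "docs": entry["docs"],
                                 "size": entry["size"]}))
    # the bulk API requires a trailing newline
    return "\n".join(lines) + "\n"
```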

In a few days, you will be able to build nice dashboards showing the size and number of docs per aggregated index.

Note that the current code counts all docs, so if you are using replicas, the total number of docs will include the replica copies.

The full source code is available here.

A Docker version of this application is available here.

Docker-Compose:

##############################
  elasticsupervisor:
    image: snuids/elasticsupervisor:v0.2d
    container_name: elasticsupervisor
    links:
      - esnode1
    environment:
      - ELASTIC_ADDRESS=esnode1:9200
      - PYTHONUNBUFFERED=0