Using Docker to build a simple Elasticsearch infrastructure.

Preparing the infrastructure

We will create an Elasticsearch infrastructure using Docker. We will only focus on two parts of the stack:

  • Elasticsearch, which is the database engine
  • Kibana, which is a nice dashboarding tool

In order to create our stack, we will use docker-compose. Create a text file and save it as docker-compose.yml in a folder on your computer.

Paste the following text in the file:

#COMPOSE ELK
############
esnode1:
  image: snuids/elasticwithplugins:v2.4
  environment:
    - ES_JAVA_OPTS=-Xmx1g -Xms1g
  ports:
    - "9201:9200"
  container_name: esnode1

######
kibana:
  image: kibana:4.6.1
  ports:
    - "5601:5601"
  environment:
    - ELASTICSEARCH_URL=http://esnode1:9200
  container_name: kibana
  links:
    - esnode1

Type the following command in a shell:

docker-compose up -d
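
You can check that both containers are up and running:

docker-compose ps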

You can now access Elasticsearch here: http://localhost:9201

And Kibana here: http://localhost:5601
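
To confirm that the node answers, you can also query the Elasticsearch root endpoint from Python (a minimal sketch, written in Python 2 to match the loader script below):

import urllib2
import json

# The root endpoint returns basic cluster information as JSON.
res = urllib2.urlopen('http://localhost:9201').read()
info = json.loads(res)
print info["version"]["number"]  # should print 2.4.x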

Note that we did not use the official Elasticsearch release but our own image (snuids/elasticwithplugins:v2.4), which includes the wonderful Kopf and HQ plugins.

You can access Kopf here: http://localhost:9201/_plugin/kopf

And HQ here: http://localhost:9201/_plugin/hq

Loading some data

In order to load some data, we will use a small Python program that retrieves real-time station availability from the Villo! public bike rental open data service of the city of Brussels.

from elasticsearch import Elasticsearch
import urllib2
import json
import time

# Connect to the Elasticsearch node exposed by docker-compose on port 9201.
client = Elasticsearch(hosts=['127.0.0.1:9201'])

def fetch_villo():
    url = 'http://opendata.bruxelles.be/api/records/1.0/search/?dataset=stations-villo-disponibilites-en-temps-reel&rows=1000&facet=banking&facet=bonus&facet=status&facet=contract_name'

    res = urllib2.urlopen(url).read()
    # Strip stray "\u0" escape fragments from the raw payload before parsing it.
    res = res.replace("\u0", "")
    data = json.loads(res)

    bulk_body = ""

    for station in data["records"]:
        # Replace separators in station names (see the note at the end of this post).
        station["fields"]["name"] = station["fields"]["name"].replace(" ", "_").replace("-", "_").replace("/", "_")
        jsondata = json.dumps(station)
        # Upsert into the "latest state" index: the record id keeps one document per station.
        bulk_body += '{ "index" : { "_index" : "villowithid", "_type" : "station","_id":"' + station["recordid"] + '"} }\n'
        bulk_body += jsondata + '\n'
        # Append to the history index: without an _id, every fetch adds new documents.
        bulk_body += '{ "index" : { "_index" : "histvillo", "_type" : "station"} }\n'
        bulk_body += jsondata + '\n'

    print "Bulk ready."
    client.bulk(body=bulk_body)
    print "Bulk gone."

# Fetch the stations ten times, thirty seconds apart.
for i in range(10):
    print '*' * 80
    fetch_villo()
    time.sleep(30)
    print '*' * 80
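
Save the script (the file name is up to you; fetch_villo.py is just an example) and run it with the Python 2 interpreter:

python fetch_villo.py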

Basically, this program calls a REST service over HTTP that returns all the stations with various parameters (a sample record is sketched after the list) such as:

  • The number of bikes available
  • The number of rented bikes
  • The name of the station
  • Whether there is a banking system at the station
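
One record in the response looks roughly like this; apart from recordid and fields.name, which the script above relies on, the field names are assumptions about the open data schema:

station = {
    "recordid": "<the unique record id>",
    "fields": {
        "name": "001 - DE BROUCKERE",
        "available_bikes": 12,        # assumed field name
        "available_bike_stands": 9,   # assumed field name
        "banking": "True",            # assumed field name
        "status": "OPEN"              # assumed field name
    }
}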

It then constructs a bulk load for Elasticsearch. A bulk load is basically multiple inserts in a single HTTP POST. This is mandatory in order to get decent performance: making one HTTP POST per record is by far too slow.
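
The bulk body is newline-delimited JSON: each action line is immediately followed by the document to index. For one station, the script emits four lines (with <recordid> standing for the actual record id):

{ "index" : { "_index" : "villowithid", "_type" : "station", "_id" : "<recordid>" } }
{ "recordid": "<recordid>", "fields": { "name": "001___DE_BROUCKERE", ... } }
{ "index" : { "_index" : "histvillo", "_type" : "station" } }
{ "recordid": "<recordid>", "fields": { "name": "001___DE_BROUCKERE", ... } }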

We are constructing two indexes, one without history (villowithid) and one with history (histvillo). In order to update a document instead of creating a new one, we simply add an _id to the record.

When an _id is added to a record, the record is updated if it already exists or created otherwise.
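
You can see this behaviour in isolation by indexing the same _id twice; the second call overwrites the first document instead of adding a second one (a minimal sketch, the demo index name is arbitrary):

from elasticsearch import Elasticsearch

client = Elasticsearch(hosts=['127.0.0.1:9201'])

# Same _id twice: the first call creates the document, the second overwrites it.
client.index(index="demo", doc_type="station", id="station-1", body={"bikes": 5})
client.index(index="demo", doc_type="station", id="station-1", body={"bikes": 7})

print client.get(index="demo", doc_type="station", id="station-1")["_source"]
# -> {u'bikes': 7}: still a single document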

It is easy to check that the indexes were created via Kopf, as shown in the following picture:

[Screenshot: the two indexes shown in Kopf]

In order to use the data of each index in Kibana, the indexes must be loaded via the “Settings” tab of Kibana, starting with the histvillo* index pattern.

[Screenshot: configuring an index pattern in Kibana’s Settings tab]

Repeat the same operation with the villowithid* index. There are now two indexes available in Kibana.

Creating the dashboards

It is now possible, via the Visualise option, to construct a first visualisation.

Pick a Vertical Bar Chart in the visualise menu of Kibana and choose the villowithid* index.

Fill in the parameters on the left-hand side to match the following screenshot:

[Screenshot: vertical bar chart settings]

Save your visualisation via the upper right floppy icon.

Make another visualisation and choose the Pie chart option.

[Screenshot: pie chart settings]

Save your visualisation and open the Dashboard option. Click on the “plus” button to add your saved visualisations.

Once added, you can use the upper right part of the screen to change the refresh rate, click on a slice of the pie chart to filter the data in the other graph, etc.

[Screenshot: the finished dashboard]

In the next post, we will see why we added the strange line below and how to prepare Elasticsearch so that we can avoid replacing all the white spaces with underscores:

station["fields"]["name"]=station["fields"]["name"].replace(" ","_").replace("-","_").replace("/","_")