Using Docker to build a simple Elasticsearch infrastructure.
Preparing the infrastructure
We will create an Elasticsearch infrastructure using Docker. We will only focus on two parts of the stack:
- Elasticsearch, which is the database engine
- Kibana, which is a nice dashboarding tool
In order to create our stack, we will use docker-compose. Open a text editor and save an empty file as docker-compose.yml in a folder on your computer.
Paste the following text into the file:
#COMPOSE ELK ############
esnode1:
  image: snuids/elasticwithplugins:v2.4
  environment:
    - ES_JAVA_OPTS=-Xmx1g -Xms1g
  ports:
    - "9201:9200"
  container_name: esnode1
######
kibana:
  image: kibana:4.6.1
  ports:
    - "5601:5601"
  environment:
    - ELASTICSEARCH_URL=http://esnode1:9200
  container_name: kibana
  links:
    - esnode1
Type the following command in a command shell:
docker-compose up -d
Now you can access Elasticsearch here: http://localhost:9201
And Kibana here: http://localhost:5601
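To quickly check that the node answers (a minimal sketch, assuming the port mapping from the docker-compose.yml above), you can query the root endpoint and print the cluster name and version:

import urllib2
import json

# Query the root endpoint of the node exposed on port 9201 by docker-compose.
info = json.loads(urllib2.urlopen('http://localhost:9201').read())
print info["cluster_name"], info["version"]["number"]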
Note that we did not use the official Elasticsearch release but our own image (snuids/elasticwithplugins:v2.4), which includes the wonderful Kopf and HQ plugins.
You can access Kopf here: http://localhost:9201/_plugin/kopf
And HQ here: http://localhost:9201/_plugin/hq
Filling some data
In order to fill in some data, we will use a small Python program that retrieves real-time availability information from the Villo! public bike-rental service, published on the Brussels open data portal.
from datetime import datetime
from elasticsearch import Elasticsearch
import urllib2
import json
import time

# Connect to the Elasticsearch node exposed by docker-compose on port 9201.
client = Elasticsearch(hosts=['127.0.0.1:9201'])

def fetch_villo():
    url = 'http://opendata.bruxelles.be/api/records/1.0/search/?dataset=stations-villo-disponibilites-en-temps-reel&rows=1000&facet=banking&facet=bonus&facet=status&facet=contract_name'
    h = urllib2.urlopen(url)
    res = h.read()
    # Strip stray escape sequences from the raw text before parsing the JSON.
    res = res.replace("\u0", "")
    data = json.loads(res)
    bulk_body = ""
    for station in data["records"]:
        # Replace spaces, dashes and slashes in the station name by underscores.
        station["fields"]["name"] = station["fields"]["name"].replace(" ", "_").replace("-", "_").replace("/", "_")
        jsondata = json.dumps(station)
        # Index with an explicit _id: the document is updated if it already exists.
        bulk_body += '{ "index" : { "_index" : "villowithid", "_type" : "station","_id":"' + station["recordid"] + '"} }\n'
        bulk_body += jsondata + '\n'
        # Index without _id: a new document is created every time (history).
        bulk_body += '{ "index" : { "_index" : "histvillo", "_type" : "station"} }\n'
        bulk_body += jsondata + '\n'
    print "Bulk ready."
    client.bulk(body=bulk_body)
    print "Bulk gone."

# Fetch the stations ten times, thirty seconds apart.
for i in range(0, 10):
    print '*' * 80
    fetch_villo()
    time.sleep(30)
    print '*' * 80
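To run the script (saved here as fetch_villo.py, a filename chosen for this post), install the Python Elasticsearch client first; a 2.x client release should be used to match the 2.4 server:

pip install "elasticsearch>=2.0.0,<3.0.0"
python fetch_villo.py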
Basically, this software calls a REST service over HTTP that returns all the stations with various parameters such as:
- The number of bikes available
- The number of rented bikes
- The name of the station
- Whether there is a banking (payment) terminal at the station
- …
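If you are curious about the exact field names, a minimal sketch like the one below (reusing the same open data URL as the program, but asking for a single record) prints the fields of the first station; the field list is simply whatever the dataset exposes.

import urllib2
import json

# Same dataset as the program above, but only ask for one record.
url = ('http://opendata.bruxelles.be/api/records/1.0/search/'
       '?dataset=stations-villo-disponibilites-en-temps-reel&rows=1')
data = json.loads(urllib2.urlopen(url).read())

# Show which fields the service returns for a station.
print sorted(data["records"][0]["fields"].keys())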
The program then constructs a bulk load for Elasticsearch. A bulk load is basically multiple inserts in a single HTTP POST. This is mandatory in order to get decent performance: making an HTTP POST per record is far too slow.
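For reference, the body of a bulk request is newline-delimited JSON: an action line naming the target index and type, followed by the document itself. A hand-written two-document bulk (illustrative station values, reusing the client created in the program above) looks like this:

# Each document is preceded by an action line; both documents travel in one POST.
bulk_body  = '{ "index" : { "_index" : "histvillo", "_type" : "station" } }\n'
bulk_body += '{ "fields" : { "name" : "some_station", "available_bikes" : 4 } }\n'
bulk_body += '{ "index" : { "_index" : "histvillo", "_type" : "station" } }\n'
bulk_body += '{ "fields" : { "name" : "another_station", "available_bikes" : 7 } }\n'
client.bulk(body=bulk_body)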
We are constructing two indexes, one without history and one with history. In order to update the document and not create a new one, we simply add an _id to the record.
When an _id is added to a record, the record is updated if it already exists or created otherwise.
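Here is a small sketch of that behaviour (the index and type come from the program above; the id and field values are made up for the example): indexing twice with the same _id leaves a single, updated document.

# The first write creates the document, the second one overwrites it.
client.index(index="villowithid", doc_type="station", id="station-demo",
             body={"available_bikes": 3})
client.index(index="villowithid", doc_type="station", id="station-demo",
             body={"available_bikes": 5})

doc = client.get(index="villowithid", doc_type="station", id="station-demo")
print doc["_source"]   # still one document, now with available_bikes = 5
print doc["_version"]  # 2: the second call was an update, not a new document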
It is easy to check that the indexes were created via Kopf, as shown in the following picture:
In order to use the data of each index in Kibana, the indexes must be loaded via the “Settings” option of Kibana.
Repeat the same operation with the villowithid* index. Now there are two available indexes in Kibana.
Creating the dashboards
It is now possible via the Visualise option to construct a first dashboard.
Pick a Vertical Bar Chart in the visualise menu of Kibana and choose the villowithid* index.
Fill the left part of the parameters in order to match the following screenshot:
Save your visualisation via the upper right floppy icon.
Make another visualisation and choose the Pie Panel option.
Save your visualisation and use the dashboard option. Click on the “plus” button to add your saved visualisations.
Once added, you can use the upper right part of the screen to change the refresh rate. Click on a part of the pie panel to filter the data shown in the other graphs, etc.
In the next post we will see why we added the strange line below and how to prepare Elasticsearch so that we can avoid replacing all the whitespace with underscores.
station["fields"]["name"]=station["fields"]["name"].replace(" ","_").replace("-","_").replace("/","_")