Docker is a convenient way to try out new technologies without the hassle of installing them.

In this post, we will write a simple Docker Compose file that starts a three-node Cassandra cluster.

Docker-Compose file

The first step is, of course, to have Docker and Docker Compose installed on your system.

A newer version of this file, using version 3 of the Compose file format, is available here.

Create a text file named docker-compose.yml and paste the following text into it with your preferred text editor.

version: '2'
services:

  ###############################
  cassandra0:
    image: cassandra
    container_name: cassandra0
    ports:
      - "9042:9042"
      - "7199:7199"

  ###############################
  cassandra1:
    image: cassandra
    container_name: cassandra1
    ports:
      - "9142:9042"
    links:
      - cassandra0:seed
    environment:
      - CASSANDRA_SEEDS=seed

  ###############################
  cassandra2:
    image: cassandra
    container_name: cassandra2
    ports:
      - "9242:9042"
    links:
      - cassandra0:seed
    environment:
      - CASSANDRA_SEEDS=seed

This Compose file defines three containers. The cassandra0 container acts as the main seed node. The other two link to it under the alias “seed” and pass that alias to Cassandra through the CASSANDRA_SEEDS environment variable. Each node's CQL port (9042) is published on a different host port: 9042 for cassandra0, 9142 for cassandra1, and 9242 for cassandra2.

Start the containers via the following command:

docker-compose up -d
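
The command returns once the containers are started, but Cassandra itself needs some additional time before it answers requests. The sketch below shows one way to wait for it from Python by polling nodetool inside a container. The names wait_for and nodetool_ok are hypothetical helpers written for this post, and the attempt count and delay are arbitrary assumptions, not recommended values.

```python
import subprocess
import time


def wait_for(check, attempts=30, delay=5.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        # Pause between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay)
    return False


def nodetool_ok(container="cassandra0"):
    """Return True when `nodetool status` exits successfully in the container."""
    try:
        result = subprocess.run(
            ["docker", "exec", container, "nodetool", "status"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The docker executable could not be started.
        return False
    return result.returncode == 0
```

Calling wait_for(nodetool_ok) blocks until cassandra0 responds or the attempts are exhausted. A successful exit code only shows that the node answers nodetool, so the node status still deserves a look afterwards.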

Checking the installation

Check the status of the first node via the following command:

docker exec cassandra0 nodetool status

You should get the same results as the ones shown in the following screenshot:

[Screenshot: Cassandra2.jpg]

Repeating the same command with the other container names (cassandra1, cassandra2) should give the same results.

The important part of the output is the two-letter code at the beginning of each node line. It should be UN: U for Up and N for Normal.
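
The same check can be automated. The sketch below parses text in the layout of the nodetool status output and reports whether every node is UN. The sample text is illustrative only (addresses, loads, and host IDs are made up), and node_states and all_up_and_normal are hypothetical helper names. It assumes each node line starts with the two-letter code followed by the node address.

```python
import re

# Illustrative sample only: addresses, loads and host IDs are made up.
SAMPLE_STATUS = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID   Rack
UN  172.18.0.2  103.2 KiB   256     66.7%             aaaa-1    rack1
UN  172.18.0.3  98.7 KiB    256     65.1%             bbbb-2    rack1
DN  172.18.0.4  101.4 KiB   256     68.2%             cccc-3    rack1
"""

# A node line starts with a status letter (U/D) and a state letter (N/L/J/M),
# followed by whitespace and the node address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def node_states(status_text):
    """Map each node address to its two-letter status/state code."""
    states = {}
    for line in status_text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            states[match.group(3)] = match.group(1) + match.group(2)
    return states


def all_up_and_normal(status_text):
    """True when at least one node is listed and every node reports UN."""
    states = node_states(status_text)
    return bool(states) and all(code == "UN" for code in states.values())
```

In practice the text would come from running docker exec cassandra0 nodetool status and capturing its output. With the sample above, the third node is reported as DN, so all_up_and_normal returns False.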

Creating a keyspace via Python

Use the following Python program to create a keyspace and a table inside it. It requires the DataStax Python driver (pip install cassandra-driver).

import logging

from cassandra.cluster import Cluster

# Log to the console with timestamps.
log = logging.getLogger()
log.setLevel('INFO')
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
log.addHandler(handler)

KEYSPACE = "mykeyspace"

def createKeySpace():
    # Host port 9142 is mapped to the CQL port of cassandra1.
    cluster = Cluster(contact_points=['127.0.0.1'], port=9142)
    session = cluster.connect()

    log.info("Creating keyspace...")
    try:
        # IF NOT EXISTS lets the program run again without failing.
        session.execute("""
            CREATE KEYSPACE IF NOT EXISTS %s
            WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '2' }
            """ % KEYSPACE)

        log.info("setting keyspace...")
        session.set_keyspace(KEYSPACE)

        log.info("creating table...")
        # mykey is the partition key, col1 a clustering column.
        session.execute("""
            CREATE TABLE IF NOT EXISTS mytable (
                mykey text,
                col1 text,
                col2 text,
                PRIMARY KEY (mykey, col1)
            )
            """)
    except Exception as e:
        log.error("Unable to create keyspace")
        log.error(e)
    finally:
        # Release the driver's connections.
        cluster.shutdown()

createKeySpace()

You can use the cqlsh tool included in each container to check your keyspace with the following command:

docker exec -it cassandra0 cqlsh

You can then use the command “describe mykeyspace” to get details about the newly created keyspace.

[Screenshot: CassandraCQLSH.jpg]

Let’s insert some data

We will again use Python to insert a few records into our table. Add the following function to the program.

def insertData(number):
    cluster = Cluster(contact_points=['127.0.0.1'], port=9142)
    session = cluster.connect()

    log.info("setting keyspace...")
    session.set_keyspace(KEYSPACE)

    # Prepare the statement once and reuse it for every row.
    prepared = session.prepare("""
    INSERT INTO mytable (mykey, col1, col2)
    VALUES (?, ?, ?)
    """)

    for i in range(number):
        # Log progress every 100 rows.
        if i % 100 == 0:
            log.info("inserting row %d" % i)
        session.execute(prepared.bind(("rec_key_%d" % i, 'aaa', 'bbb')))

    cluster.shutdown()

insertData(1000)

The freshly inserted data can be read back with a function similar to the one below, which logs every hundredth row and the total row count:

def readRows():
    cluster = Cluster(contact_points=['127.0.0.1'], port=9142)
    session = cluster.connect()

    log.info("setting keyspace...")
    session.set_keyspace(KEYSPACE)

    rows = session.execute("SELECT * FROM mytable")
    log.info("key\tcol1\tcol2")
    log.info("---------\t----\t----")

    count = 0
    for row in rows:
        # Log every hundredth row; all three columns are text, so join() works.
        if count % 100 == 0:
            log.info('\t'.join(row))
        count += 1

    log.info("Total")
    log.info("-----")
    log.info("rows %d" % count)

    cluster.shutdown()

readRows()
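
Joining a row with '\t'.join(row) works here because all three columns of mytable are text. For a table with numeric or empty columns, the values would need converting first. The sketch below shows one way to do that. The Row type is only a stand-in for the rows returned by the driver, which behave like named tuples by default as far as I know, and format_row is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in for a driver row; the field names mirror the mytable columns.
Row = namedtuple("Row", ["mykey", "col1", "col2"])


def format_row(row, separator="\t"):
    """Join the values of a row, converting non-text values with str()."""
    return separator.join("" if value is None else str(value) for value in row)
```

Inside the loop of readRows, log.info(format_row(row)) could then replace the direct join.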

By stopping and starting one of the three containers, it is easy to check that the system remains available even when one of its nodes is shut down. For example, docker stop cassandra2 stops the third node and docker start cassandra2 brings it back. Change the port in the Python program from 9142 to 9042 or 9242 to communicate with a particular node.
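
Why the cluster stays available can be worked out with a little arithmetic. The keyspace uses a replication factor of 2, so every row lives on two of the three nodes, and stopping one node leaves at least one copy of each row reachable. Whether a request then succeeds depends on how many replicas must answer. The sketch below is a simplified model of that rule for the ONE, QUORUM, and ALL consistency levels only. The function names are hypothetical, and the model ignores everything else Cassandra does (timeouts, hinted handoff, and so on).

```python
def replicas_required(consistency, replication_factor):
    """Number of replicas that must answer for a request to succeed."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % consistency)


def request_can_succeed(consistency, replication_factor, replicas_down):
    """True when enough replicas of a row are still up."""
    alive = replication_factor - replicas_down
    return alive >= replicas_required(consistency, replication_factor)
```

With a replication factor of 2 and one replica down, request_can_succeed("ONE", 2, 1) is True, while request_can_succeed("QUORUM", 2, 1) is False because a quorum of two replicas needs both of them. The programs in this post do not set a consistency level, so this assumes they run with a level that needs a single replica.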