Getting started with Apache Camel

A valuable integration framework that is very helpful for ETL.

Introduction

Extracting, transforming and loading data (ETL) has always been a painful task. Every system has its own quirks and protocols. Many frameworks have tackled this problem, and a few of them are now very mature.

On the open source side, there are:

On the commercial side, there are:

  • Mule
  • IBM ESB

We will focus here on Apache Camel. Note that Camel is a library, not a standalone application, so the first step is to write a small program that uses the Camel library to perform the ETL.

Note that Apache ServiceMix integrates the Camel library in a way similar to what we are doing here. It is powerful but a little more complex for new users. We may cover this product in a future post.

Getting Started

Camel is written in Java. Our goal here is not to write our ETL tools in Java but to write integration patterns using the XML DSL available in Camel. So you do not need to know Java to use Camel, but you do need a basic understanding of IT in order to build the integration application.

Note that a fully dockerized version is available on Docker Hub for those who don’t want to know how the application is built.

Getting a Java IDE

We suggest downloading the NetBeans IDE, as it is probably the easiest one for beginners to use.

Downloading the application skeleton

You can directly download the application skeleton here.

The folder includes the following files:

  • pom.xml: The Maven build configuration file
  • CamelRunner.java: The Java code used to start the process
  • log4j.properties: A basic configuration for the logger
  • config1.xml: A basic ETL configuration file

You can use the NetBeans IDE to open the project directly.

Analyzing the pom file

A Maven pom file basically describes how to build a Java application. It handles critical parts of the build, such as resolving library dependencies, which could be a headache without such a system.

There are two important things in this pom file. First, the Camel version is defined as a property at the top of the file. It lets you pick a particular version of Camel without downloading anything manually: change the version number and Maven will take care of all the dependencies. (This pom.xml uses version 2.18.0.)

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!-- AMA 18 Amr 2016: Initial release -->

<modelVersion>4.0.0</modelVersion>
<groupId>com.pi2s</groupId>
<artifactId>CamelRunnerMaven</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>

<name>CamelRunner</name>
<url>http://maven.apache.org</url>

<properties>
<camel.version>2.18.0</camel.version>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
</properties>

Second, the pom lists the dependencies that must be linked. These are declared inside the dependencies tag.

<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-core</artifactId>
  <version>${camel.version}</version>
</dependency>

<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-script</artifactId>
  <version>${camel.version}</version>
</dependency>

<dependency>
  <groupId>org.apache.camel</groupId>
  <artifactId>camel-jdbc</artifactId>
  <version>${camel.version}</version>
</dependency>

This list is very important, as it determines which components and keywords we will be able to use in our integration file later.

Analyzing the Java file

package camelrunner;

import java.io.File;
import java.nio.charset.Charset;
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.FileSystemXmlApplicationContext;

/**
* @author snuids
*/

public class CamelRunner extends TimerTask{

static final String version = "1.0.0";
static final Logger logger = LoggerFactory.getLogger("CamelRunner");
static ApplicationContext context;
private File configFile;
private long lastFileUpdateTime;

public static void main(String[] args) throws Exception {

logger.info("Version:" + version + " on java:" + System.getProperty("java.version"));
logger.info("SYNTAX: java -jar xxx.jar /PATH/TO/thefile.xml");
logger.info("Default Charset=" + Charset.defaultCharset());

String current = new java.io.File( "." ).getCanonicalPath();
logger.info("Current dir:"+current);

if(args.length<=0)
{
logger.error("No context file specified.");
return;
}

File configFile = new java.io.File( args[0]);
logger.info("Opening:"+configFile.getAbsolutePath());
if(!configFile.exists())
{
logger.error("File not found.");
return;
}

// Loading the context file
try {
logger.info("Creating context...");
context = new FileSystemXmlApplicationContext("file:"+configFile.getAbsolutePath());
logger.info("Done.");
} catch (Exception espring) {
logger.error("Error while starting spring:" + espring.getMessage(), espring);
}

if (context == null) {
logger.error("Unable to load spring context");
return;
}

logger.info("Main finished");

Timer timer = new Timer();
// repeat the check every second
timer.schedule( new CamelRunner(configFile), new Date(), 1000 );

}

public CamelRunner(File configFile)
{
this.configFile=configFile;
lastFileUpdateTime=configFile.lastModified();
}

@Override
public void run()
{
logger.info("Checking File.");
if(configFile.lastModified()!=lastFileUpdateTime)
{
logger.info("File changed. Exiting.");
System.exit(0);
}
}
}

The code is easy to follow: it basically opens a Spring XML context file and starts it. The rest of the code simply checks every second whether the XML file has changed and stops the application if a change is detected. This will be useful when the application is deployed inside a Docker container.
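The change check described above can be isolated in a few lines. The following is a minimal sketch using only the JDK; the class name ConfigChangeDetector and the temporary file are assumptions made for illustration and are not part of the downloaded skeleton.

```java
import java.io.File;

/** Minimal sketch of the change check performed in CamelRunner.run(). */
class ConfigChangeDetector {
    private final File configFile;
    private final long initialTimestamp;

    ConfigChangeDetector(File configFile) {
        this.configFile = configFile;
        // Remember the modification time seen at start-up.
        this.initialTimestamp = configFile.lastModified();
    }

    /** True once the modification time differs from the one recorded at start-up. */
    boolean hasChanged() {
        return configFile.lastModified() != initialTimestamp;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the real configuration file.
        File file = File.createTempFile("config", ".xml");
        file.deleteOnExit();

        ConfigChangeDetector detector = new ConfigChangeDetector(file);
        System.out.println("Changed before edit: " + detector.hasChanged());

        // Simulate an edit by moving the modification time forward.
        file.setLastModified(file.lastModified() + 5000);
        System.out.println("Changed after edit: " + detector.hasChanged());
    }
}
```

Because File.lastModified() returns 0 for a missing file, deleting the configuration file also counts as a change with this approach.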

Analyzing the log4j.properties

This is a very simple file that lets the user configure how logging should behave. In this configuration, everything at level INFO and above is logged to the console.

# Define the root logger with the console appender

log4j.rootLogger = INFO,console

# Console appender
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=[%-5p] %d %c - %m%n
log4j.appender.console.threshold=INFO
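To make the conversion pattern more concrete, here is a rough plain-Java sketch of how [%-5p] %d %c - %m%n lays out a line. It does not use log4j itself: PatternSketch and its format method are hypothetical names, and the date format assumes log4j's default ISO8601 style for %d.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Plain-Java imitation of the layout pattern, for illustration only. */
class PatternSketch {

    static String format(String level, Date when, String loggerName, String message) {
        // %d with no explicit format: date, time and milliseconds after a comma.
        String timestamp =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS", Locale.ROOT).format(when);
        // %-5p pads the level to 5 characters, left-aligned; %c is the logger
        // name; %m is the message; %n is the platform line separator.
        return String.format(Locale.ROOT, "[%-5s] %s %s - %s%n",
                level, timestamp, loggerName, message);
    }

    public static void main(String[] args) {
        System.out.print(format("INFO", new Date(), "CamelRunner", "Default Charset=UTF-8"));
        System.out.print(format("ERROR", new Date(), "CamelRunner", "File not found."));
    }
}
```

The padding is why INFO appears as "[INFO ]" with a trailing space in the console output, while a five-letter level such as ERROR fills the brackets exactly.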

Analyzing the config1.xml file

This is where the ETL takes place. Basically, Camel executes routes. A route has a starting point and one or more ending points. For example, a starting point could be a timer and an ending point a logger, as shown below.

<route>
  <from uri="timer://hello?fixedRate=true&amp;period=5000"/>
  <setBody>
    <constant>Hello World !!!</constant>
  </setBody>
  <to uri="log:Hi"/>
</route>
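For readers who prefer to reason in code, the route can be pictured as the plain-Java loop below. This is only an analogy and not Camel's API: HelloRouteSketch and its Endpoint interface are hypothetical, and the initial delay of zero is an arbitrary choice.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Plain-Java analogy of the timer -> setBody -> log route (not Camel code). */
class HelloRouteSketch {

    /** Hypothetical stand-in for an ending point: something that receives a body. */
    interface Endpoint {
        void send(String body);
    }

    /** Starts the "route": a fixed-rate timer that produces a body and hands it on. */
    static Timer start(long periodMillis, final Endpoint endpoint) {
        Timer timer = new Timer("hello", true);
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                String body = "Hello World !!!"; // plays the role of <setBody><constant>
                endpoint.send(body);             // plays the role of <to uri="..."/>
            }
        }, 0, periodMillis);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        // period=5000 in the route corresponds to 5000 ms here.
        Timer timer = start(5000, new Endpoint() {
            @Override
            public void send(String body) {
                System.out.println("Hi - " + body);
            }
        });
        Thread.sleep(12000); // let the timer fire a few times
        timer.cancel();
    }
}
```

With Camel, none of this plumbing has to be written: the XML route declares the same flow and the library supplies the timer and the logger.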

Running the application

Start the NetBeans IDE and choose the Open Project option.


Right-click the project and pick the Properties option. This opens the project properties window.

Fill in the following fields in the “Run” section:

  • Main Class: camelrunner.CamelRunner
  • Arguments: configs/config1.xml
  • VM Options: -Dlog4j.configuration=file:log4j.properties

Click the Run button and your first Camel application will start.

Running the application from the command line

In order to run it from the command line, the application must be packaged. Hit the build button in the IDE. The build produces a jar that is stored in the target directory. The lib folder contains all the libraries linked with the application and must stay next to the jar.

snuids@AMAMBookPro2 ~/Desktop/SimpleCamelRunner/target $ java -Dlog4j.configuration=file:../log4j.properties -jar CamelRunner.jar ../configs/config1.xml
[INFO ] 2016-10-15 15:27:11,353 CamelRunner - Version:1.0.0 on java:1.7.0_51
[INFO ] 2016-10-15 15:27:11,353 CamelRunner - SYNTAX: java -jar xxx.jar /PATH/TO/thefile.xml
[INFO ] 2016-10-15 15:27:11,354 CamelRunner - Default Charset=UTF-8

Note the .. in the paths to the log4j properties file and to the config file: since the command is run from the target directory, it goes back up to the parent folder.
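If the relative paths are confusing, the small sketch below shows how they resolve when the working directory is the target folder. The directory /home/user/SimpleCamelRunner is a made-up location and RelativePathSketch is a hypothetical helper; only the two relative paths come from the command above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Shows where ../ paths end up when resolved from a working directory. */
class RelativePathSketch {

    static Path resolveFrom(String workingDir, String relativePath) {
        // resolve() appends the relative path; normalize() collapses the ".." step.
        return Paths.get(workingDir).resolve(relativePath).normalize();
    }

    public static void main(String[] args) {
        // Hypothetical project location; the jar is started from its target folder.
        String target = "/home/user/SimpleCamelRunner/target";

        System.out.println(resolveFrom(target, "../log4j.properties"));
        System.out.println(resolveFrom(target, "../configs/config1.xml"));
    }
}
```

Both files end up being read from the project folder, one level above target, which is where the skeleton keeps them.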

Where to go next

You should dockerize your application in order to ease its deployment as explained in this post.