Creating a federated data service with Pentaho Kettle
- Kettle (PDI) 5: download here [Not for production use]
- You are familiar with Pentaho Kettle (PDI)
- You are familiar with the Linux command line
What is the goal?We have data sitting around in various disparate databases, files, etc. By creating a simple Kettle transformation which joins all these data together, we can provide a data service to various applications via a JDBC connection. This way, the application does not have to implement any logic on how to deal with all these disparate data sources, but instead only connect to the one Kettle data source. These applications can send standard SQL statements to our data service (with some restrictions), which in turn will retrieve the data from all the various disconnected data sources, join them together and return a result set.
This Kettle feature is fairly new and still in development, but it holds a lot of potential.
Configure the Kettle transformationI created a very simple transformation which gets some stock data about lenses with prices in GBP (For simplicity sake I use a Data Grid step. In real world scenarios this would be a Database Input step). We get the current conversion rate from a web service and use this rate to convert our GBP prices to EUR. The transformation looks like this:
You can download the transformation from here.
Note the yellow database icon on the top right hand corner of the Output (Select Values) step. This indicates that this step is used as Service step. This can be configured in the Transformation Properties by pressing CTRL+T:
You also have the option to catch the service data in the local memory.
Perform a preview on the last step (named Output):
This is basically the dataset which we want to be able to query from other applications.
Configure CarteIf you don’t already have a configuration file in the PDI root directory, create one:
And paste this xml in there (please adjust the path to the ktr file):
Save and close.
Let’s start the server now passing the config file as the only argument:
sh carte.sh carte-config.xml
Query service data from an applicationOnce the server has started successfully, you can access the service by any client of your choice as long as they support JDBC. Examples of clients are Mondrian, Squirrel, Pentaho Report Designer, Jaspersoft iReport, BIRT, and many many more.
For simplicity sake, we will just query the data service directly from Kettle:
- Click on the View tab.
- Right click on Database Connections and choose New Connection Wizard.
- Enter the following details:
|Driver||Kettle Thin JDBC Driver (org.pentaho.di.core.jdbc.ThinDriver)|
Then click the Test button. Kettle should be able to successfully connect to our data service.
Finally, click OK.
Next we just want to execute a simple SQL query. In the View tab, in Database connections, right click on the connection name you just created and choose SQL Editor and insert the following query and click execute:
SELECT * FROM lensStock WHERE price_gbp > 100
Note that the table name is the service name that we configured earlier on in the carte-config.xml.
The returned dataset will look like this:
Some other applications ask for a JDBC connection string, which looks like this: