Monday, November 1, 2010

PDI Kettle Plugins

Pentaho Data Integration Plugins 

Agile BI

This extremly useful plugin can be downloaded from the Pentaho website (plugin for PDI 4, plugin for PDI 4.1 RC1). [Due to the fact that this plugin is not open source, PDI doesn't have it installed by default. Pentaho was so kind to make it available for the community version for free.]
  1. Unzip the file into the data-integration/plugins/spoon directory. It will create a folder in there named agile-bi. Start spoon and the new capabilities will automatically be available.
  2. Once you have done this, fire up Spoon, create a transformation which outputs the data to a completely denormalized table. 
  3. Once there is data in this table, right click on the table output step, choose Model. 
  4. In the model view, you can click the "Auto populates model with default dimensions and measures" icon. If this doesn't do a decent job generating your model, you can always change it manually. 
  5. Once you have properly prepared your model, save it and return to the data integration perspective. 
  6. Right click on the table output step again and choose  Visualize > Analyzer. In this perspective you can fully dig into your data and discover any problems. For example, my data set has a country data point. The values are supposed to be full country names, but I realize that somehow in my data "AR" shows up instead of "Argentina". So I can go back to the data integration perspective, do the necessary changes to the transformation, save it, run it again, go back to the Analyzer, refresh the data and I can see that now all my country values are valid. This is an absolute time saver and very efficient approach to quality checking your data. 

There are a couple of outer things you can do with this plugin as well, i.e. create a report with the wizard known from the Report Designer.

Kettle Franchising Factory

This is a very interesting project to give you a framework for ETL development. The project description reads as follows:

The Kettle Franchising Factory (KFF) adds on top of the existing kettle platform the necessary tools to open multiple data integration restaurants in a rapid, flexible and organised way. KFF allows you to deploy a large series of data integration solutions (multi-customer, multi-solution) in a fully standardized way.

KFF is composed of:
Kettle plugins
re-usable transformations/jobs
logging/scheduling framework
standards
naming conventions
best practices for set-up
directory structures


I hope that we see further development on it. You can find a presentation about it here and download it here.

Kettle Cookbook

Another very promising project initiated by Roland Bouman (the co-author of the excellent Pentahos Solution books): This is not really a plugin, but a job that auto generates a documentation based on the description you added to your steps, jobs, etc. So there are no excuses any more not to create a documentation! Have look here for more info.

Pentaho Report Output Step

Matt Casters made this step available: It allows you to pass data points to a PRPT (Pentaho Report). You can specify where the report template is located, to which directory and in which format the report should be outputted and also specify report parameters. You can use this in simple scenarios where you just want to output a single report and an more complex fashion for report bursting i.e..
Please find more information about it here.

Excel 2007 XLSX Output Step

Slawo was so kind to provide this step which will be of much use if you are mainly working with newer versions of MS Office. You can find more info here.

As you see, plugins can add some interesting features to Kettle which facilitate our work enormously!

2 comments:

  1. Thank you for all the useful info! :)

    ReplyDelete
  2. Glad to hear that you liked it! I just added your blog to my Google reader.

    ReplyDelete