Thursday, January 2, 2014

Building a Data Mart with Pentaho Data Integration (Video Course)

I have been an enthusiastic follower of the Pentaho open source business intelligence movement for many years. At the beginning of 2013 I was asked to create a video tutorial/course on populating a star schema with Pentaho Kettle. This was my first foray into video tutorials. The course is now available on the Packt website.

To me, the most interesting part of this project was finding an open source columnar database. Certainly, I could have gone down the road of using a standard row-oriented one, but having worked on projects that made use of commercial columnar databases, I understood their advantages quite well. To my surprise, the landscape of open source columnar databases was quite small. There has been a revival of sorts in the Hadoop world with Impala etc. (using dedicated file formats), but at that time this was probably a bit too cutting-edge. The tutorial required a database that had been established for some time and was easy to install, so I chose MonetDB. This is the same database that Kettle uses for Instaview. It also gave me the opportunity to discuss bulk loading and talk about some of the advantages of columnar databases.

Creating these videos was not quite as easy as I initially anticipated. I actually spent quite a lot of time on this project, and at the end of 2013 I re-recorded most of the video sessions to fix some pronunciation problems (although I've lived in the UK for 9 years now, I can't quite hide my roots ;)) and reworked all the files to run on PDI v4.4 (initially I was working with a trunk version of PDI v5).

I do hope that these videos give the viewer a nice introduction to this exciting topic. As I mention at the beginning of the course, this is not an introduction to Pentaho Kettle in general; I do assume that the viewer already has some basic Pentaho Kettle knowledge. Furthermore, I decided to focus only on the Linux command line, but it shouldn't be all too difficult for the viewer to translate everything to a Windows or Mac OS X environment. Is this course perfect? I don't think so, but for my first foray into the world of video tutorials I do hope it is worthwhile and teaches the viewer a few tips and tricks.

Lastly, I want to thank my reviewers for their support and honest feedback, Unnati at Packt Publishing for handling the administrative side, and finally Brandon Jackson for his help, support and work on some bugs related to the MonetDB bulk loader!


  1. Congratulations Diethard! I have purchased my copy. Looking forward to watching it.