Monday, January 30, 2012

Pentaho Data Integration (Kettle) Big Data features go Open Source

I just read some pretty exciting news: all the Big Data features in Pentaho Data Integration (Kettle) will be made available in the open source version. This means you now have free access to all the Hadoop, Cassandra, HBase and MongoDB steps and job entries, a move which will certainly increase the popularity of this ETL tool even further. The Kettle GUI allows you to easily create transformations and jobs which import, transform, export and otherwise process your data.
You can find a very interesting tutorial on how to design a MapReduce job with Kettle here.

Saturday, January 21, 2012

Comparison of resource sharing features in open source reporting tools

In large-scale reporting projects, sharing resources is key to efficiency. This article compares the three most popular open source reporting tools to understand to what extent their feature sets support the various resource files that are likely to be shared.

As an example, consider maintaining style definitions in a centralized global file. Instead of having to change a particular style in every report, the change can simply be made once in the global file. The clear advantages are: (1) it saves a lot of time, and (2) it ensures consistency: every style is defined the same way for all reports, which is far less error-prone than changing the style in each report individually.

Apart from styles, there are several other report properties which can be maintained in global external files to achieve a similar effect.



Sunday, January 8, 2012

Book review: Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing: Delivering the Promise of Business Intelligence

A high percentage of classic waterfall model business intelligence projects fail. One of the main reasons for this is that the waterfall model is sequential: simply speaking, you plan first, then develop, then test and so on, and then the whole project is supposed to be finished. BI projects normally span several months. During this time, requirements change, the priority of the requirements might change, the client's understanding of what BI actually is changes, etc. A classic waterfall model cannot easily accommodate these changing requirements over time, as planning was done in the first step only.
With "Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing: Delivering the Promise of Business Intelligence", Ken Collier introduces agile methodology to BI projects. Readers new to the agile methodology will find a detailed introduction (e.g. What are user stories? How do I conduct an agile project?) and learn about the iterative cycles (sprints) which allow feedback-driven development, as well as various other approaches like test-driven development, continuous integration and much more. Ken Collier also introduces the Message Driven Warehouse, which among other benefits makes it possible to implement new requirements easily and quickly.
In a nutshell, it is one of the best books on BI that I read last year, and I highly recommend it.