I just read some pretty exciting news: All the Big Data features in Pentaho Data Integration (Kettle) will be made available in the open source version. This means you have now free access to all the Hadoop, Cassandra, HBase, MongoDB steps and job entries, a move which will certainly increase the popularity of this ETL tool even further. The Kettle GUI allows you to easily create transformations and jobs which import, transform, export etc your data.
You can find a very interesting tutorial on how to design a MapReduce job with Kettle here.
Monday, January 30, 2012
Saturday, January 21, 2012
Comparison of resource sharing features in open source reporting toolsIn large scale reporting projects sharing resources is key for efficiency. This article tries to compare the three most popular open source tools to understand to which extend their feature set supports the various resource files that are likely to be shared.
As an example: Maintain style definitions in a centralized global file. Instead of having to change a particular style in all the reports, this can simple be done in one global file. The clear advantages are: (1) It saves a lot of time and (2) consistency: every style is defined the same way for all reports and it is not as error prone as changing the style in every report.
Apart from styles there are several other report properties which can be maintained in global external files to achieve a similar effect.
Sunday, January 8, 2012
Book review: Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing: Delivering the Promise of Business Intelligence
With "Agile Analytics: A Value-Driven Approach to Business Intelligence and Data Warehousing: Delivering the Promise of Business Intelligence" Ken Collier introduces agile methodology to BI projects. Readers new to the agile methodology will find a detailed introduction (i.e. What are user stories? How do I conduct an agile project?), learn about the iterative cycles (sprints) which allow feedback driven development, various other approaches like test driven development, continuous integration and much more. Ken Collier also introduces the Message Driven Warehouse which among other benefits allows to easily and quickly implement new requirements.
In a nutshell, it is one of the best books on BI which I read last year, so I can only highly recommend it.