Open source business intelligence tutorials: Pentaho, Talend, Jasper Reports, BIRT and more.
Topics: Data Integration, Data Warehousing, Data Modeling, BI Server Setup, OLAP, Reporting, Dashboarding, Master Data Management and many more.
Sunday, December 12, 2010
Saturday, November 27, 2010
How to Set Up Pentaho Community Build Framework
- Introduction
- Pentaho BI Server Setup with CBF
- Java, Tomcat and Ant
- Subversion
- Database
- CBF Build XML
- Download a BI Server Build
- Create build.properties
- Start Hypersonic DB
- Create the build
- Ubuntu (Linux) Setup
- Using Applications > Ubuntu Software Center
- Using Terminal
- Create the .ant Directory
- XMLTask.jar
- CBF Build XML
- Download a BI Server Build
- Create build.properties
- Create patches
- Mac OS X Setup
Introduction
- Easily upgrade to a new version of the BI Server without going through a long setup process
- Easily create environment-specific versions of your BI Server without going through a long setup process
Pentaho BI Server Setup with CBF
Introducing Community Build Framework
We are in the lucky position that Pedro Alves from Webdetails offers the freely available CBF (Community Build Framework), which makes setting up and upgrading the BI Server for various environments a relatively easy task. I will try to walk you through the setup for the most popular operating systems.
The main folder/file structure of CBF looks like this (simplified):
- CBF
  - Pentaho BI Server source code folder(s): Each version of the source code is stored in its own folder. These folders are left completely untouched. Changes to the original source can be made via patches.
  - Project folder(s): For each project you can create a separate folder (e.g. project-nike, project-puma, ...) which holds the project-specific configurations and files.
    - config
      - build.properties file, which tells Ant what we want to build and where. A minimal config contains the path to your Java compiler, solutions and web server (among others).
      - build-env.properties (optional), where env is a placeholder for any environment. So you could name it build-prod.properties, build-test.properties etc. This is very neat, as you can have global settings in build.properties and environment-specific ones in the matching build-env.properties. When we use Ant to build our BI Server, Ant will load build.properties and then use the specific build-env.properties.
    - patches: All files that need some kind of modification are copied here from the source code (maintaining nearly the same folder structure), and constant values are replaced with tokens of the form @tokenname@. The tokens are specified in build.properties.
    - solution: This folder holds your reports, xactions etc. (same as the Pentaho solutions folder on a standard BI Server)
  - Target-Build: All source code files will be copied to this folder to build your BI Server. All patches will be applied to this BI Server.
  - Target-Dist: The BI Server will be copied to this folder after everything has been built.
  - build.xml: Note that not every build.xml works with every version of the BI Server. When downloading, make sure you choose the version of build.xml that supports the version of your BI Server.
Note that the Pentaho BI Server source code folder(s), target-build and target-dist are shared among all projects. target-build and target-dist are handled by Ant, so you don't have to do anything about them.
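The interplay between build.properties and build-env.properties described in the list above can be pictured with a short sketch. This is an illustration in Python, not CBF's own code (CBF relies on Ant for this). The parser only handles simple key = value lines, the property names and values are made up for the example, and it assumes that the environment-specific file wins where both files define the same key.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines; skips blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_properties(global_text, env_text=""):
    """Merge global settings with environment-specific ones (assumed to win)."""
    merged = parse_properties(global_text)
    merged.update(parse_properties(env_text))
    return merged


# Hypothetical file contents, for illustration only.
GLOBAL = """
# build.properties
tomcat.path = /opt/tomcat
BASE_URL = http://localhost:8080/pentaho
"""
PROD = """
# build-prod.properties
BASE_URL = http://bi.example.com/pentaho
"""

settings = effective_properties(GLOBAL, PROD)
print(settings["BASE_URL"])
```

Settings that are the same everywhere stay in the global file, while each environment file only lists what differs.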
Note: Currently the main folder structure has to be created mostly manually.
Create the following folder structure to start with. This is common to all operating systems. We will discuss the OS-specific steps shortly.
- Pentaho
  - CBF
    - bi-server-source
    - project-tutorial
      - config
      - patches
      - solution
    - target-build
    - target-dist
The folder for the source code will be created automatically once we download the code with SVN.
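If you prefer not to create these folders by hand, a few lines of Python can do it on any operating system. This is only a convenience sketch: the nesting is my reading of the list above, the function name is made up, and the demo writes into a temporary directory rather than your real Pentaho folder.

```python
from pathlib import Path
import tempfile

# Sub-folders as listed in the tutorial, relative to the Pentaho folder.
CBF_FOLDERS = [
    "CBF/bi-server-source",
    "CBF/project-tutorial/config",
    "CBF/project-tutorial/patches",
    "CBF/project-tutorial/solution",
    "CBF/target-build",
    "CBF/target-dist",
]


def create_cbf_skeleton(pentaho_root):
    """Create the starting folder structure below pentaho_root."""
    root = Path(pentaho_root)
    created = []
    for relative in CBF_FOLDERS:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    return created


# Demo in a throwaway directory; point this at your real Pentaho folder instead.
demo_root = Path(tempfile.mkdtemp()) / "Pentaho"
for folder in create_cbf_skeleton(demo_root):
    print(folder.relative_to(demo_root))
```

To use it for real, call create_cbf_skeleton with the location you chose, for example D:\Pentaho on Windows.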
In general, the below is what a folder structure could look like after completion of the setup. Ours will look slightly different.
The pentaho directory is where we will copy a recent build of the Pentaho BI Server. The idea is to leave this directory untouched and make all changes in the project-client directory. The below is an example structure, which depends on your specific project needs:
Pedro Alves explains:
- "All changes that would normally go to pentaho/* are placed under "patches" directory (project-client/patches/).
- The CBF ant script will pick up the files in the project-client/patches/ directory, scan for tokens and replace the tokens with the variables defined inside the project-client/config/build.properties files, and copy the files to the top level directory of the entire project (In this example MyProjectDir [remark: here we change it to CBF]).
- It's not recommended to patch anything under pentaho/*; sources changes are patched in to target-build/* and all other changes are made by patching the final directory, target-dist."
Windows setup
Java, Tomcat and Ant
- Java: JDK [Java Standard Edition; short Java SE] (JRE is not enough as Ant needs JDK to work properly). Read installation instructions. Install it in D:\Pentaho
- Tomcat: At the time of writing, v6 was the most recent one. The location of the Tomcat directory is not important; it can be placed almost anywhere. The location of Tomcat will be set inside the CBF build.properties file. I installed Tomcat in D:\Pentaho
- Ant: Download Apache Ant 1.8.1 and place it in D:\Pentaho. Installation notes: here.
Set environment variables
Using the GUI
Set the JAVA_HOME environment variable to the directory where you installed JDK:
- Click on New under the System variables section.
- Type JAVA_HOME in the variable name field.
- Type D:\Program Files\Java\jdk1.6.0_21 in the variable value field
Set the ANT_HOME environment variable to the directory where you installed Ant:
- Click on New under the System variables section.
- Type ANT_HOME in the variable name field.
- Type C:\ant in the variable value field.
Set the PATH environment variable to include the bin directories of Ant and Java:
- Find the PATH environment variable in the list. If PATH is not listed, click on New under the System variables section.
- Type %ANT_HOME%\bin;%JAVA_HOME%\bin;
Using Command Line
Check Ant and Java are working
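Running ant -version and javac -version in a command prompt is the most direct check. If you would rather have a small script, the following Python sketch reports whether the two environment variables are set and whether the tools can be found on the PATH. The function name and the wording of the checks are my own and not part of CBF.

```python
import os
import shutil


def check_build_tools(env=None, which=shutil.which):
    """Report whether JAVA_HOME/ANT_HOME are set and ant/javac are on PATH."""
    env = os.environ if env is None else env
    return {
        "JAVA_HOME set": bool(env.get("JAVA_HOME")),
        "ANT_HOME set": bool(env.get("ANT_HOME")),
        "javac on PATH": which("javac") is not None,
        "ant on PATH": which("ant") is not None,
    }


for check, ok in check_build_tools().items():
    print(f"{check}: {'OK' if ok else 'MISSING'}")
```

Open a new command prompt before running it, as already open windows do not pick up changed environment variables.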
Creating the .ant directory
For our work we will need an optional Ant task that is not available with the default setup. In your Ant directory you will find a file called build.xml.
- system - store in Ant's lib directory
- user - store in the user's home directory
- optional - store in Ant's source code lib/optional directory, used if building Ant source code
XMLTask.jar
Download XMLTask.jar from here and move the jar file to the .ant\lib\ folder.
Subversion
Install one of the distributions. Here is a list of options; choose one that suits you:
- CollabNet Subversion Edge 1.2.2
- http://www.sliksvn.com/en/download and a client like http://tortoisesvn.net/downloads or subclipse (an Eclipse plugin). For some additional info have a look here: http://www.codinghorror.com/blog/2008/04/setting-up-subversion-on-windows.html
- Hudson is also an option as it includes subversion
- Or download http://www.cygwin.com/ and issue the Linux command.
Database
Let's get started with Hypersonic DB; later on you can change to the database of your choice.
- Download HSQL 2.0 http://sourceforge.net/projects/hsqldb/files/ and save it in a convenient folder
- Start the DB Server by double clicking on runServer.bat (in D:\Pentaho\hsqldb-2.0.0\hsqldb\bin)
CBF Build XML
Download a BI Server Build
Create build.properties
You can find an example of a build.properties file here. Make sure that on Windows you use double backslashes in your paths!
BASE_URL = put your URL here
Example:
So basically, the file stays exactly the same; you only replace some values with tokens (highlighted in red).
The tokens @solution.deploy.path@ and @BASE_URL@ are defined in project-client/config/build.properties or project-client/config/build-client.properties. The CBF Ant script replaces them and copies the revised web.xml, with the tokens replaced, to the top-level directory (in this example CBF).
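To make the token mechanism more tangible, here is a minimal Python sketch of the idea. It is not the CBF Ant script: the property values and the web.xml fragment are invented for illustration, and leaving unknown tokens untouched is a choice made for this sketch.

```python
import re

TOKEN = re.compile(r"@([A-Za-z0-9_.\-]+)@")


def replace_tokens(text, properties):
    """Replace each @name@ with properties[name]; leave unknown tokens as-is."""
    def substitute(match):
        name = match.group(1)
        return properties.get(name, match.group(0))
    return TOKEN.sub(substitute, text)


# Hypothetical values and a hypothetical web.xml fragment, for illustration.
properties = {
    "solution.deploy.path": "/home/user/Pentaho/CBF/project-tutorial/solution",
    "BASE_URL": "http://localhost:8080/pentaho/",
}
patched = """<param-value>@solution.deploy.path@</param-value>
<param-value>@BASE_URL@</param-value>"""

print(replace_tokens(patched, properties))
```

The patched file therefore stays generic, and the same patch can produce a different result for every environment.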
Start Hypersonic DB
Go to D:\Pentaho\hsqldb-2.0.0\hsqldb\bin and start runServer.bat.
Create the build
In your command line tool, go to the CBF folder and issue ant -Dproject=tutorial -p, which lists the targets that you can pass to the build.xml. Find an extract below:
D:\Pentaho\CBF>ant -Dproject=tutorial -p
Now issue the following:
dist-clean will delete any previous builds.
Ubuntu (Linux) Setup
- Open terminal (Applications > Accessories > Terminal).
- Issue javac -version to see if you have a recent JDK installed.
The next section describes the setup for users who want to avoid working with command line as much as possible. Users familiar with the Terminal, please jump to "Using Terminal".
Using Applications > Ubuntu Software Center
- In Ubuntu Software Center search for "openjdk-6-jdk" and click to install it [OPEN]
- Search for Ant. "Java based build tool like Make, Ant" will show up. Click the install button.
- Search for subversion. "Advanced version control system, subversion" will show up. Click the install button. This package includes the subversion client (svn), tools to create a Subversion repository (svnadmin) and to make a repository available over a network (svnserve). The fastest way to proceed now is to use the Terminal. Follow these instructions, skipping the first step as we have already installed subversion. Your svn repository should then reside in /usr/local/svn/repos.
- Search for subversion and choose one of the clients, like Subcommander
- Search for Tomcat. At the time of this writing, Tomcat 6 was the current version. Install it.
- Download a hypersonic database from here. Go to Places > Home Folder, click File > Create Folder and name it "Pentaho". Unzip the HSQLDB file in the Downloads folder and move the unzipped folder to the newly created "Pentaho" folder. [/home/diethardsteiner/Pentaho]
- You can also download MySQL if you want (some install info you can find here). We will not cover setting up the environment with MySQL here, but you can later on progress to include MySQL in your environment.
- While still in the "Pentaho" folder, go to File > Create Folder and name it "CBF". Download CBF's build.xml from the CBF Wiki page, extract it and move it to the newly created "CBF" folder. Select the file, hit F2 and rename it to build.xml. [/home/diethardsteiner/Pentaho/CBF]
Using Terminal
- Install Java: sudo apt-get install openjdk-6-jdk
- Install ant: sudo apt-get install ant
- Install subversion: sudo apt-get install subversion. Good documentation on how to set up subversion on Ubuntu can be found here. Follow these instructions. Your svn repository should then reside in /usr/local/svn/repos. Additional info can be found here.
- Install Tomcat: sudo apt-get install tomcat6
- Download a hypersonic database from here. Create a folder: mkdir $HOME/Pentaho. Move the unzipped folder into this directory.
Create the .ant Directory
Ant is located in /usr/share/ant. Follow the instructions mentioned in the Windows section: click here.
XMLTask.jar
CBF Build XML
Download a BI Server Build
See here
Create build.properties
See here
Create patches
See here
Mac OS X Setup
Subversion is included in Mac OS X Leopard and Snow Leopard. Have a look at this tutorial on how to get it running.
Kettle: Handling Dates with Regular Expression
- The string must start with 4 numbers. We enclose the definition by brackets to create the first capturing group. Note: I added #1. This is a comment and helps to mark the capturing groups for easy reference.
- Next we say that a dash can follow or not. This is our 2nd capturing group.
- I guess you get the idea for the remaining capturing groups. In the end we make sure that nothing else follows, hence we use the dollar sign.
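The pattern itself is not shown above, so here is a hypothetical reconstruction that follows the description step by step. It is written for Python's re module in verbose mode, which allows the #1, #2, ... comments. The date layout (year, month, day) and the function name are assumptions made for this sketch.

```python
import re

# Hypothetical pattern reconstructed from the description; not the original.
DATE_PATTERN = re.compile(
    r"""
    ^(\d{4})   #1 the string must start with 4 digits (year)
    (-?)       #2 a dash may follow, or not
    (\d{2})    #3 two digits (month)
    (-?)       #4 again an optional dash
    (\d{2})    #5 two digits (day)
    $          # nothing else may follow
    """,
    re.VERBOSE,
)


def normalize_date(value):
    """Return the date as YYYY-MM-DD, or None if the input does not match."""
    match = DATE_PATTERN.match(value)
    if match is None:
        return None
    year, _, month, _, day = match.groups()
    return f"{year}-{month}-{day}"


print(normalize_date("20101127"))
print(normalize_date("2010-11-27"))
print(normalize_date("2010-11-27 10:00"))
```

Because the dashes sit in their own capturing groups, they can simply be ignored when the parts are put back together in one consistent format.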
Friday, November 26, 2010
Review "Pentaho Kettle Solutions"
A short review of the "Pentaho Kettle Solutions" book
Matt Casters, Roland Bouman and Jos van Dongen's Kettle bible was released about 3 months ago and I finally managed to finish reading it (600+ pages!!!).
- Answer 1: Both. These books are very different in what they are trying to bring across. They are not really overlapping, so it makes sense reading both.
- Answer 2: If you want to have a quick start in a practical step by step fashion, get "Pentaho 3.2 Data Integration: Beginner's Guide"
- Answer 3: If you want to understand the bigger picture, then go for "Pentaho Kettle Solutions".
Thursday, November 18, 2010
Pentaho Kettle Data Input: Pivoted Data
Friday, November 12, 2010
Using regular expressions with Pentaho Data Integration (Kettle)
Monday, November 1, 2010
PDI Kettle Plugins
Pentaho Data Integration Plugins
Agile BI
- Unzip the file into the data-integration/plugins/spoon directory. It will create a folder in there named agile-bi. Start spoon and the new capabilities will automatically be available.
- Once you have done this, fire up Spoon, create a transformation which outputs the data to a completely denormalized table.
- Once there is data in this table, right click on the table output step, choose Model.
- In the model view, you can click the "Auto populates model with default dimensions and measures" icon. If this doesn't do a decent job generating your model, you can always change it manually.
- Once you have properly prepared your model, save it and return to the data integration perspective.
- Right click on the table output step again and choose Visualize > Analyzer. In this perspective you can fully dig into your data and discover any problems. For example, my data set has a country data point. The values are supposed to be full country names, but I realize that somehow in my data "AR" shows up instead of "Argentina". So I can go back to the data integration perspective, do the necessary changes to the transformation, save it, run it again, go back to the Analyzer, refresh the data and I can see that now all my country values are valid. This is an absolute time saver and very efficient approach to quality checking your data.
Kettle Franchising Factory
The Kettle Franchising Factory (KFF) adds on top of the existing kettle platform the necessary tools to open multiple data integration restaurants in a rapid, flexible and organised way. KFF allows you to deploy a large series of data integration solutions (multi-customer, multi-solution) in a fully standardized way.
KFF is composed of:
- Kettle plugins
- re-usable transformations/jobs
- logging/scheduling framework
- standards
- naming conventions
- best practices for set-up
- directory structures
Kettle Cookbook
Pentaho Report Output Step
Matt Casters made this step available: it allows you to pass data points to a PRPT (Pentaho Report). You can specify where the report template is located, to which directory and in which format the report should be output, and also specify report parameters. You can use this in simple scenarios where you just want to output a single report, or in a more complex fashion, e.g. for report bursting.
Tuesday, September 28, 2010
Mondrian MDX and Schema Validation Differences between PRD and Schema Workbench
When discussing this topic on the Mondrian developer mailing list, Julian Hyde commented the following:
"It looks like PRD is using mondrian to validate formulas. I suspect that it is an earlier version of Mondrian, which had weaker validation rules. I don't recall why we made the change, but people will log bugs that MDX succeeds in SSAS and fails in mondrian, and we will (rightly) change mondrian."
As Thomas pointed out in the comment below, have a look at the mondrian.properties file located in the PRD folder report-designer\resources. You can find various settings there like this one:
mondrian.olap.elements.NeedDimensionPrefix=true
This seems to be the one that stopped my "not so accurate" MDX queries from running. I do not recommend changing this setting, though; instead, write precise MDX queries and make sure that the calculated members in your Schema have the complete reference as well.
UPDATE 2010/10/04:
Don't change this one: It's important that your Schema and MDX has properly defined syntax.
Now open psw-ce-3.2.0.13661\schema-workbench\mondrian\properties and add the above highlighted properties from the PRD properties file (if these properties already exist, amend them so that they are set exactly the same way).
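Editing the file by hand is quick enough, but the "add it, or amend it if it already exists" logic can also be sketched in a few lines of Python. Which properties were highlighted cannot be recovered here, so the one property quoted above is used as a stand-in. The Schema Workbench file content and the function name are made up for this example.

```python
def sync_properties(target_text, wanted):
    """Return target_text with each key in `wanted` set to the given value.

    Existing 'key=value' lines are amended in place; missing keys are appended.
    """
    remaining = dict(wanted)
    lines = []
    for line in target_text.splitlines():
        key, sep, _ = line.partition("=")
        name = key.strip()
        if sep and not name.startswith("#") and name in remaining:
            line = f"{name}={remaining.pop(name)}"
        lines.append(line)
    lines.extend(f"{name}={value}" for name, value in remaining.items())
    return "\n".join(lines) + "\n"


# The property named in the post; the Schema Workbench content is made up.
wanted = {"mondrian.olap.elements.NeedDimensionPrefix": "true"}
workbench = (
    "# mondrian.properties (hypothetical content)\n"
    "mondrian.olap.elements.NeedDimensionPrefix=false\n"
)

print(sync_properties(workbench, wanted))
```

With both tools reading the same settings, a query or schema that validates in one should behave the same way in the other.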