Thursday, December 10, 2009

Rethinking the Pentaho Report Designer Layout

Rethinking the Pentaho Report Designer Layout

The Pentaho Report Designer (PRD) has evolved to a very feature-rich product. In this article I want to point out one problem that I still have with this product from a usability point of view.

Imagine following scenario: You are about to create a new report for your CEO. In a nutshell the report should have first a summary cross-tab with the essential KPIs and then below some more detailed product data in a standard table (let's keep it simple). 

Now currently you would create the product report/table in the body of your main report and then you would add a subreport to the header, which would reference the cross-tab which lives in a separate reporting file. Now that might all have a technical reason why PRD is set up like this, but from a usability point of view, it is just not an ideal solution. This is a simple example, imaging if we would have to include more subreports in the main report. Another problem with subreports is that you don't really see in the main report if the layout of the subreport is in harmony with the rest of the main report. You have to execute the report in order to see this, then go back to the subreport, execute it again, check if it is fine and if not you have to go back again.

Now it is all easy to sit here and write these lines if you are not involved in developing PRD. I am writing this article as a constructive criticism because I am a big fan of PDR and want to see it become the best report designer. For me, from a user perspective, it would be way easier if PRD offered report type elements that you could drag and drop onto the canvas, just as you would do it now in PRD 3.5 with labels, charts etc. This way you would not be force straight away to follow a specific given structure (Imagine the case where I would only have one chart in a main report in the header and the reporting body/details would be completely unused).

So let's say we would have report elements like "Crosstab Report","Classic table report", etc that you just drag and drop onto the canvas when you need them. Within those elements you define all the necessary settings and you create all the necessary other elements. We are doing this in the same window (we don't have to go to a different window). We control all our data connections in one place and we see all our design in one place.

Over the time Pentaho might introduce various other report type elements, that you then can just drag and drop on the canvas as well. Overall I think that this approach would facilitate designing a report. PRD 3.5 was a huge step forward and I am positive that we will see great new features with the next versions.

Friday, December 4, 2009

Pentaho Reporting 3.6 Milestone 1 is out!

Great news over on the Pentaho Community Forum! The first milestone of PRD 3.6 has been released. Have a look at the release notes here.

There have been several bug fixes plus the addition of some great new features. I just want to highlight some that are important for me:

  • Added OLAP4J (Advanced) and Mondrian (Advanced) datasources. These datasources work exactly as the SQL (Advanced) datasource by allowing the query to be computed by a formula.
  • Formulas can be used in parameters now. There are two formula types: Display-value computation for lists, and for all parameters a post-processor.
  • Parameter-Support added to OLAP4J (except for members and sets)

Download it now from SourceForge and give it a try!

Wednesday, December 2, 2009

Exporting Characters as UTF-8 from Ke...

Exporting characters as UTF-8 from Kettle

Recently I took over a project for our Russian office, which strangely enough is part of the UK & International region of the company I am working for. This was the first time I was exposed to handling Cyrillic characters.

Basically there are following points to take into concern:
  • Make sure your MySQL table uses the UTF-8 encoding
  • Make sure that in the database connection details in Kettle following options are set: characterEncoding=utf8, characterSetResult=utf8,useUnicode=true.
  • Once you ran the ETL process and populated, don't worry if MySQL Query Studio displays the characters as an array of pipes (in this case the Cyrillic fonts are not installed). If you see question marks, well then, something is still wrong.

Tuesday, November 3, 2009

Using Parameters in Pentaho Report Designer

You wonder what all this fuss is about? Or are you one of the many who have been struggling to use parameters in your data queries?

Using parameters with Pentaho Report Designer 3.5 (PRD) and later versions is not a trivial topic. The main problem is, that for each data source the parameter definition is different. Using parameters with standard database queries is fairly easy, using it with MDX, MQL queries and other data sources is a bit more an advance topic.

We will start off with discussing the easiest implementation and then we will move on to discuss solutions for MQL and MDX queries.

How to define a parameter

PRD 3.5 is a major improvement over previous versions of this software. One of the new features is, that you can define parameters within your report, which means, that you do not have to use Xactions any more. (You can still use Xactions for more complex logic, but generally speaking, defining the parameters within your report will make your life easier).

Within the data tab in PRD you can see a section called "Parameters". Right click on it and select "Add parameter". In the dialog you can define your parameter name, define the input type (drop down menu etc.), add a data source to populate the input dialog and make the parameter mandatory. This is all fairly straight forward and easy to set up.

Note: You have to set up the parameter before you reference it in your data queries.

Additional info about the new parameter dialog

Thomas has a very informative article on his blog about various parameter settings. I have just copied this info into here for a better overview. This section covers settings that were not available in Pentaho Report Designer 3.5, but only later releases.

::start quotation::

Hidden Parameters

By selecting a tiny little check-box, parameters can now be excluded from the UI-generation. The parameter itself still behaves like any other parameter and thus takes part in the formula-calculations and is part of the validation.

Parameter Validation can be loosened

When this flag is set to false, the validator will accept other values than the ones that have been specified in a list-parameter. With relaxed validation, you can easily write WHERE conditions, that accept partial matches or contain “LIKE” clauses. For the sake of security, we none-the-less default this feature to strict-checks.


Every List-Parameter can now have computed display-texts. This way, you are no longer limited by what your data source can produce, and even a simple data-source like an XML file or a static table-datasource can produce better looking prompts.

Post-Processing Formula

A post-processing formula processes the parameter-input before it gets validated or passed into the report. The post-processing can either be defined on the parameter itself or can be defined as an extra field by using a hidden parameter. The computed value will still be validated by the engine and thus using formula’s does not necessarily mean you have to fear that illegal input comes through. Formulas allow you to access runtime information, like the current user’s roles or the user’s name via the ENV(..) function.

::end quotation::

Using parameters with database queries

From PRD 3 to PRD 3.5 the way you define your parameters in a SQL query has slightly changed.
You define a parameter within a SQL query like this: ${parametername}. Make sure you do not use enclosing single or double quotation marks, as PRD knows already how to display the parameter.


start_date <= ${my_start_date}

Using parameters with Metadata/MQL queries

The easiest way to define the parameter is by using the MQL query builder. Add the data point to the filter and write {VARIABLENAME} in the value field (replace VARIABLENAME with your actual variable name; make sure it is enclosed with curly brackets). Both the parameter and the condition will then be created for you.

If you want to define your parameter in the MQL directly, use following approach:
A parameter for an MQL query can be define as follows:


<condition>[TABLE.COLUMN] = [param:MY_PARAMETER]</condition>

Note: There are no enclosing brackets or a $ for the parameter_name, but the whole reference must have enclosing square brackets!

Let's have a look now at how to use date parameters in MQL. Normally, if you are using a date in MQL, you would have to use DATEVALUE("2009-11-01"), but it seems that if you are using a data parameter, DATEVALUE() shouldn't be used. As Thomas (Taqua) points out "'Normally' probably refers to 'in the old days'. But in the old days, Metadata had no parametrization at all, and all the parametrization that happened in the BI-server was a simple string-manipulation exercise on the XML code that represents the query. Thats why you in reality only passed in strings and then had to convert it via a DATEVALUE function. (And of course, its just a horrible way to do parametrization, as you now have to be careful that your strings do not break the XML or even rewrite the XML for fun and profit.)"

Define your date parameter called i.e. my_date, choose the date picker input and set the type to date. 

In your MQL you reference the paramter like this:

[Registration Time.BC_SN_LT_TIME_DIMENSION_THE_DATE] &gt;=[param:my_date]

I would also suggest to have a look at the MQL syntax Pentaho Wiki documentation, which is very informative!

NEW: Report Desinger 3.6 (release notes)
Added OLAP4J (Advanced) and Mondrian (Advanced) datasources. These datasources work exactly as the SQL (Advanced) datasource by allowing the query to be computed by a formula.
Formulas can be used in parameters now. There are two formula types: Display-value computation for lists, and for all parameters a post-processor.
Parameter-Support added to OLAP4J (except for members and sets)

Using parameters with Mondrian/OLAP/MDX queries

How to pass on a single value

Imagine you want to pass on a single value to the member of a simple dimension like [Login Date].[Date].  You can use a function called parameter, which basically looks like this: parameter("parametername", dimension, member default value). 

The member default value will be used (you guessed it) in case no parameter value is available. This is very useful if you are still designing your report in PRD.

Here is an example of using this approach:

LastPeriods(6,Parameter("my_date",[Login Date],[Login Date].[2009-09-12]))
[Measures].[Distinct Users]
from [Logins]

Since PRD 3.6 I've been also using the standard variable approach of ${variable} successfully:

MEMBER [Measures].[7daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(7.0),
MEMBER [Measures].[6daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(6.0),
MEMBER [Measures].[5daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(5.0),
MEMBER [Measures].[4daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(4.0),
MEMBER [Measures].[3daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(3.0),
MEMBER [Measures].[2daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(2.0),
MEMBER [Measures].[1daysago] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(1.0),
SET [Period] AS
(CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(7.0) : CurrentDateMember([Date.Date Digits],
"[""Date.Date Digits""]\.[""Date Digits""]\.[yyyy-mm-dd]").Lag(1.0))
MEMBER [Measures].[Period] AS
MEMBER [Capsule].[Total Live Capsules] AS
(NOT ([Capsule].[Capsule].CurrentMember.Name MATCHES ".*test.*")))),([Measures].CurrentMember))
NON EMPTY (UNION(({Filter({[Capsule].[Capsule].Members},
(NOT ([Capsule].[Capsule].CurrentMember.Name MATCHES ".*test.*")))} * {[Creative].Children}),
({[Capsule].[Total Live Capsules]} * {[Creative]}),
({[Site].[${VAR_PARTNER_NAME}].[UNKNOWN].[${VAR_CLIENT_NAME}]} * {[Bucket Type].[TREATMENT]} * {[Ad Type].[Ad Type].[Retargeted]})

How to pass on multiple values

It is very likely that you have to populate nearly the whole dimension, i.e. [Login Date.Monthly Calendar].[Monthly Calendar All].[Year].[Quarter].[Month].[Day] which would look like this: [Login Date.Monthly Calendar].[Monthly Calendar All].[2009].[Q3].[Jul].[2]

Now it gets pretty interesting: Download the latest built of PRD from here, as this will not work in PRD 3.5. The latest built includes a post-processing formula option for a parameter. Once you have the latest version installed, create a new report.

Add a parameter called my_date, choose Date Picker as type and String as format type. Now we have to write an open formula for the Post-Processing Formula entry (which is within the parameter definition dialog).

I would advise you to do this step by step, outside of the parameter definition dialog, by defining several open formula functions to extract the year, month, quarter and day part of the date and to prepare all of them separately accordingly. Drag and drop each formula function onto the report header and run your report to see if the results are correct. Once you are sure that all the results are correct, add another open formula function and concatenate all the other formula functions and additional strings that will make your final parameter. Drag and drop this final formula again on the report header and test the output.

In my case, the formula looked like this (and this must be the most impressive open formula that I have ever written):

=("[Login Date.Monthly Calendar].[Monthly Calendar All].[" & YEAR([my_date]) & "].[" & "Q" & IF(MOD(MONTH([my_date]);3 )=0; MONTH([my_date])/3; INT(MONTH([my_date])/3)+1) & "].[" & IF(MONTH([my_date])=1;"Jan";IF(MONTH([my_date])=2;"Feb";IF(MONTH([my_date])=3;"Mar";IF(MONTH([my_date])=4;"Apr";IF(MONTH([my_date])=5;"May";IF(MONTH([my_date])=6;"Jun";IF(MONTH([my_date])=7;"Jul";IF(MONTH([my_date])=8;"Aug";IF(MONTH([my_date])=9;"Sep";IF(MONTH([my_date])=10;"Oct";IF(MONTH([my_date])=11;"Nov";"Dec"))))))))))) & "].[" & DAY([my_date]) & "]")

So now that we are sure that the formula is working, we can go back to our parameter and insert this whole formula into the Post-Processing Formula field. Note: You have to pass the formula itself, and not reference the open formula function!

Now add your data source and include the parameter. 

In my case, the query looks like this:

Parameter("my_date",[Login Date.Monthly Calendar],[Login Date.Monthly Calendar].[Monthly Calendar All].[2009].[Q3].[Jul].[4]) ON ROWS,
[Measures].[Sessions] ON COLUMNS

That's it. I am happy that this finally works within PRD! 

The below info was kindly provided by Thomas (His blog you can find here)

More info

The following info was taken from Thomas Morgner's blog

After a million or so complaints about the complexity of building parametrized queries with the PARAMETER function within MDX, today I indulged to the demands of the crowd.

Beginning with Milestone 2 of the Pentaho Report-Designer 3.6,we now support parameter injection via the${parameter} syntax. Parameter values injected via that syntax will not be checked in any way, so it is the designer's responsibility to ensure that everything is quoted correctly to cause no harm or to break the query. With great powers comes great responsibility. 

The ${parameter} syntax for MDX is not just a toString() conversion. It follows the MessageFormat syntax and thus allows to format Date and Number objects properly before inserting them into the MDX query. An extended format rule allows to produce quoted MDX-string literals by specifying the subformat <string. These strings start and end with a double-quote and all double-quote characters found in the original string get escaped according to the MDX grammar.

So now I can finally answer the question on how to parametrize a Date-Axis from a Date-parameter. To produce a member string like [2009].[10].[4] from a parameter called dateparam use[${dateparam,date,"yyyy"}].[${dateparam,date,"MM"}].[${dateparam,date,"dd"}] in your MDX query.

I still haven't found out how to do the same with the PARAMETER function.

Support for the PARAMETER function will remain there (as in theory it is a good idea to have prepared/explicit parameter).

You can test this functionality with either the latest CI build or with the upcoming Milestone 2 of the Report-Designer 3.6.

This part was originally published by Thomas Morgner on the Mondrian mailing list:

String parameters are passed into Mondrian without any postprocessing

by the reporting engine (if the parameter is a java.lang.String, otherwise we will throw a very visible exception). You are using a StrToMember which probably does not like the "{..}" syntax of the sets. Try a StrToSet instead, if you intend to pass a set.

You can also skip the parameter function and use the classical ${...} parametrization syntax (PRD 3.6 or newer), which performs a simple replacement on the MDX string. To prevent code injections you can specify a subformat along with those parameters: ${param} will include the parameter without any modification.

${param,string} will quote the parameter value as MDX string (slap double quotes around it and escape all inner double quote chars)

${param,formattype,formatstyle} will treat the parameter as a message format request, so that you can format numbers, dates and so on properly.

For the grammar of formattype and formatstyle have a look at

How to create a parameter for a set (posted by Bill W.): 

WITH SET [selectedset] AS ${CSVClient}
select [selectedset] ON COLUMNS,
 {[Measures].[Count of Employees], [Measures].[Count of Declined], [Measures].[Count No Record], [Measures].[Count Requested But No Vaccination Record], [Measures].[Count Vaccinated], [Measures].[Count Vaccinated Elsewhere], [Measures].[% Vaccinated], [Measures].[% Declined], [Measures].[% No Record]} ON ROWS
from [Monthly Influenza Summary]
where ([Monthly Period].[${SelectedYear}])

I am using PRD parameters to get the set of customers in a multi-select, then I turn that set into a formatted array which I use in the query. The basics are there now.


  • also uses the ${param} syntax to have parameters. (Same syntax as JDBC)

Reflection/Java Method Call

  • parameters are given as plain names, no special encoding needed.
  • parameter are separated by comma
  • if the parameter-name itself contains a comma or double-quotes, quote the parameter-name by
    • enclosing it in double quotes
    • any inner double-quote character must be quoted by another double quote. (Parameter name : Joe"A" becomes "Joe""A""")

Pentaho Data Integration (Kettle)

You can only use a Kettle (PDI) data source in version that was released after Pentaho Report Designer 3.5. 
  • Parameter are declared explicitly
  • PDI only accepts strings, so if the parameter is no string, then it will be converted via "toString()"
  • KTR file needs to declare the parameter and arguments


  • Scripts have a "dataRow" variable that implements org.pentaho.reporting.engine.classic.core.DataRow.
  • Scripts also have access to 
    • "configuration" (org.pentaho.reporting.libraries.base.config.Configuration)
    • "resourceBundleFactory" (org.pentaho.reporting.engine.classic.core.ResourceBundleFactory)
    • "resourceManager" (org.pentaho.reporting.libraries.resourceloader.ResourceManager)
    • "contextKey" (org.pentaho.reporting.libraries.resourceloader.ResourceKey)

This article would not have been possible without the help of various members in the Pentaho forum, which you can find here.

Referencing parameters in the message field

Using parameters in message fields works the following way:

Year: $(parameter)

If you want to format the parameter in a special way, you can specify the format like this:

Year: $(parameter, number, $ #,###.##)

Tuesday, October 27, 2009

Send Pentaho Report By Email

Send Pentaho Report By Email

The Pentaho Report Designer 3.5 brought along some important improvements. If you set up Xactions before that sent out reports by email, there are some slight changes now that you want to be aware of.

In my case I prepared a report that uses a MDX query as source, which automatically selects the last 5 days of data. The report doesn't use an input, so it is fairly easy to integrate into an Xaction.

My idea is to have basically a simplified version of the report in the email body and the full report with a chart as attachment. For the report, that I intend to have in the email body, I have to make sure that the CSS is not an external file. You can do this by going to File/Configururation in the Report Designer 3.5 and setting InlineStyle to true in output-table-html.

I set up following steps in the Xaction (Pentaho Report Designer 3.5):
  • Message template for email subject name. Define an output name.
  • Message template for file attachment name. Define an output name.
  • Message template for email text. Define an output name.
  • Pentaho Report XLS: In this step we create the Excel file for the attachment. Tag "Use Content Repository". Define "Report XLS" as output-name. Make sure you set the output-format to application/
  • Pentaho Report HTML: Set to output-format to text/html. Tag "Use Content Repository". We will use this output for the email html body. Define "Report-HTML" as output-name.
  • Now drag and drop both Report-HTML and Report-XLS into the output folder in the report-outputs area. For each of them, add a output destination. At this point it doesn't matter which destination you choose.
  • Now we have to edit the XML directly, hence change to the XML view. Look for the outputs that you defined earlier, and if you defined i.e. a file destination beforehand, then replace the <file></file> by <contentrepo></contentrepo>.
  • Now add the email step. Fill out the fields and reference the parameters that we define earlier on where applicable. In example: Add a parameter to HTML message: "Report-HTML".

Friday, October 9, 2009

Current date function on Mondrian

Current date function on Mondrian
Pentaho BI Suite - Mondrian

I create quite a lot of operational dashboard views like "How many users did we acquire through which kind of channels in the last 7 days on a daily level?". In the past my solution was to write some Javascript within an xaction to define the date variables and then to pass on these variables as parameters to an MDX query. The results set of the MDX query would then be passed on to a JPivot component. While you can fairly easily implement this in an xaction, it is probably not the most convenient way to do this.
Ideally your OLAP server would offer this kind of functionality and as I only found out recently, Mondrian has actually been supporting this for a while. Mondrian offers a function called CurrentDateMember() that basically gets the current date for you and maps it to your time dimension. As everybody's time dimension is not set up the same way, CurrentDateMember() solves the problem by providing a formatting string, that can do the mapping for you.
But it doesn't stop there: In case you have a time dimension that is less than daily, you can specify BEFORE, AFTER or EXACT as the last argument of the function and it will figure out your time dimension details respectively.

So in a glance CurrentDateMember() has following arguments:
  1. your time dimension, i.e. [Login Time]
  2. the mapping format, i.e. '["Login Time"]\.[yyyy]\.["Q"q]\.[m]\.[d]', which will resolve to [Login Time].[2009].[Q4].[10].[7]
  3. mapping method (optional): BEFORE, AFTER or EXACT

The mapping format uses the VB format syntax, which is used for MDX in general as well. As you can see in the example above, you have to use \\ before dots and quotes are need to escape the formatting process.

Formatting Strings as used by Mondrian (which differs in some cases to the ones mentioned in the VB Docu):

yyyy Year
q Quarter
m Month
y Day of year d Day
w Weekday ww Week
h Hour
n Minute s Second

By using CurrentDateMember(argument1, argument2).Lag(3) you can go 3 siblings back (in example it could give you the date of 3 days ago). If you want to go 3 days ahead in time, specify Lag(-3).

Please find some working examples below:

Mapping [2009-10-12] date format:

select {CurrentDateMember([Login Date], '[yyyy-mm-dd]').Lag(3.0)} ON COLUMNS,
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
{[Measures].[Distinct Users]} For some reason, although the documentation clearly states "MM" for
month number with a leading 0, only "mm" will work. It is a good idea
to check the mondrian.log in case you experience errors, as you will
see there if the translation works (i.e.: if the Mondrian log shows
[2009-MM-09], you know that the month number was not translated).

The above example was for a dimension with one hierarchy only. Please find below an example with a dimension with more than one hierarchy:

CurrentDateMember([Date].[Date],'[Date]\.[Date]\.[yyyy-mm-dd]').Lag(357.0) ON COLUMNS,
NON EMPTY {Hierarchize({[Measures].[Sales]})} ON ROWS
FROM [Sales]

Mapping [2009].[4].[October].[12] date format:
select {CurrentDateMember([Login Time Monthly], '[yyyy]\.[q]\.[mmmm]\.[d]').Lag(3)} ON COLUMNS,
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
{[Measures].[Distinct Users]} Mapping [2009].[41].[6] date format:

select {CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]\.[w]').Lag(3)} ON COLUMNS,
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
{[Measures].[Distinct Users]}

So when should you now mention the dimension in the formating string? If you avoid mentioning it, Mondrian will have to search for the existence of the various members within all the dimensions, which takes processing time. If you have only one date dimension, then you could theoretically not mention the dimension string, if you have more, then there is no way around it.

Imagine now, we want to have a monthly and weekly summary of the last 6 periods. So how do we approach this?

Last 6 weeks:select {
CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]').Lag(6) :
CurrentDateMember([Login Time Weekly], '["Login Time Weekly"]\.[yyyy]\.[ww]')
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
{[Measures].[Distinct Users]}

Please keep in mind that here our time dimension looks like this: [year].[week].[day]

Last 6 months:

select {
CurrentDateMember([Login Time Monthly], '["Login Time Monthly"]\.[yyyy]\.[q]\.[mmmm]').Lag(6) :
CurrentDateMember([Login Time Monthly], '["Login Time Monthly"]\.[yyyy]\.[q]\.[mmmm]')
{[Login Channel].[All Login Channels]}  ON ROWS
from [Logins]
{[Measures].[Distinct Users]}

Please keep in mind that here our time dimension looks like this: [year].[quarter].[month].[day]

I hope that this tutorial showed you the power of CurrentDateTime(). It is a very useful function, especially if you have to do a lot of analysis across time. It tooks me quite some time to use this function correctly (especially as there are not many examples), so I hope you can implement it within 5 minutes.

One final example: The VB format string for day of week is w. This will return 1 for Sunday - which is based on the US week format. So for those living in Europe, where the week starts on a Monday, how do you handle this? Just add 1 to the lag function:

CurrentDateMember([Date.Weekly Calendar],'["Date.Weekly Calendar"]\.[yyyy]\.[ww]\.[w]').Lag(3+1)

Related info:

Wednesday, October 7, 2009

How to change the Pentaho Login Screen

How remove the sample users from the login box
Pentaho BI Server - Configuration

Have you ever wondered how to get rid of the sample users in the Pentaho BI Server login box? Tom Barber pointed out a simple solution recently on the Pentaho forum that was originally posted by Paul Stoellenberger:

You can edit the login properties of mantle in:

there is an option:

change that to

That's easy enough, if you know it :)

Thursday, October 1, 2009

Converting binary data type to string in...

Converting binary data type to String in Kettle

Sometimes you might come across data that is store in in binary form in a table. To convert this data you chose one of the following approaches:

Convert directly using SQL in the database input step

One quick method would be to use the CONVERT or CAST function (test which one works best for you):

CONVERT(prodCode USING latin1) AS prodCode

CAST(prodCode AS SIGNED) AS prodCode

Select Values ... step

Go to the Meta-data tab, choose your binary field and set Binary to Normal to Y.
Note: You can only use one tab at a time in the Select Values ... step!

Modified Java Script Value Step

If you have to import the binary data into Kettle, you can use this approach:

var string = new, "UTF-8");

Setting up an "All" Parameter

Setting up an "All" Parameter

This short tutorial is fairly similar to the last one, let's say it's an improved version:
We work again with Pentaho Report Designer 3.5. Imagine you want to give your users the possibility to choose either one of the parameter values or all. So how can we implement this with a MySQL query? 

The approach is as follows:

Set up the query to for the parameter values like this:

"All Countries" AS country_name

We do the same for the channels query. Please keep in mind that UNION will remove any duplicates. If you are sure that you have no duplicates, you can use UNION ALL to improve the performance.

Set up a new parameter called "country_name", tick "mandatory", choose the above query as the source, set the type to string.

Now change the main query that feeds the report like this:

(cc.channel_name = ${channel_name} OR "All Channels" = ${channel_name}) AND
(country_name = ${country_name} OR "All Countries" = ${country_name}) AND
the_date >= ${start_date} AND
the_date <= ${end_date}
GROUP BY 1,2,3,4

Save everything and run the report ... you should see the all options now in your pull down menu:

As some of you might use Oracle as well, a user named "zulu" from the pentaho forum pointed out that:
"Not sure if this helps you now, but depending on your SQL dialect, a
NULL (meaning nothing) is treated differently to a "NULL" string.

In Oracle your predicate could be:
WHERE (${media} IS NULL OR media=${media}).
Oracle applies "lazy" logic, so if your Media parameter is not
completed by the user (meaning it "IS NULL"), the condition will not
even check the "media=${media}" part.

Just a remark from my side: My original post included a query like this one "(media = ${media} OR 'All Media' = ${media})"
You can find the original post here.

Update 2012-05-30: Somebody asked me how to achieve the same with mulitselect parameters. Here is the approach that works for me (It's a different dataset, but you get the idea):

(city IN (${CITY}) OR "All cities" IN (${CITY}))

Note: I defined the CITY parameter as type STRING.

Tuesday, September 29, 2009

Pentaho Report Designer 3.5: Not mand...

How to set up not mandatory parameters

When creating parameters in Pentaho Report Designer 3.5, you can definie if the paramater is mandatory or not.

Imagine we have a query like this one to feed our report:






channel = ${channel} AND

media = ${media}


I define two parameters for this report: channel and media. The end user will be able to choose a value for these parameters once they call the report. In the query you define the parameter like this: ${parameter_name}. This is fairly easy so far, but how do we configure the report if we want to allow the users to choose either one media value or non (=showing all media types)?

This is not difficult to implement, if you know how to do it. Basically, you have to change your query to this:






channel = ${channel} AND

(media = ${media} OR 0 = IFNULL(${media},0)


Basically, if the user doesn't choose any value for media, the media value will be set to NULL (as in "no value"). So I changed the query to include "(media = ${media} OR 0 = IFNULL(${media},0)", which basically says: If there is a media value, apply it, if not, return all the media types (=no filter). So depending on which part is TRUE, the query result will change!

Now this is a fairly wonderful way to implement this. The only problems are that the Null value/No filter will be only available when you call the report in the first go, once you changed the Media to "TV" in example, the drop down menu (or whatever display type you chose) doens't offer a NULL value any more. Also, it would be great to be able to rename the NULL value for the end user individually, in example to "All Media Types".

Monday, September 28, 2009

Pentaho Kettle: Using Full Outer Join

Today I came across a special use case for an outer join. I was planning to set up an ETL process that imports yesterday's revenue summary by summarizing the input of two tables.

The main problem was, that we might not have made any revenue yesterday at all, or there might be only revenue in one table but not the other.

Basically, if revenue figures exist in both tables, the process would be following:
- input data from table 1: date, service name, revenue_cars
- input data from table 2: date, service name, revenue_carparts

- join the tables, so that the output looks like: date, service name, revenue_cars, revenue_carparts
The problem is, as mentioned, that not every day data will be available in both tables. So I would have to implement a logic like this one:

  • IF input data table 1 NOT EMPTY AND input data table 2 EMPTY THEN output only data of table 1.
  • IF input data table 2 NOT EMPTY AND input data table 1 EMPTY THEN output only data of table 2.
  • IF input data table 1 NOT EMPTY AND input data table 2 NOT EMPTY THEN JOIN everything and output it.

All this can be achieved by using a FULL OUTER JOIN.

So your process in Kettle should be as follows:

  1. Drag and drop two "table input" steps into the working area and define the settings. Our query looks like this:For table1:
    COUNT(*) AS count
    GROUP BY 1,2
    For table2:
    COUNT(*) AS count,
    SUM(rev) AS rev_sum
    GROUP BY 1,2
  2. For each of them add a sort step and connect them with the table input. Make sure you sort the relevant fields in ascending order (in our case this is date and service_name).
  3. Drag and drop a Join step on the working area, connect them with the two sort steps and then define the two sort steps as input.

In this screenshot you can also see a Group By Step. This one is not in use at all. But in case you want to check if there is any output of this step, the group by step has an option called "Always give back a result row", which, when ticked, will set the aggregation to 0 (Zero). This might be useful in some cases. This is different to the Detect Empty Stream step, which only gives back a completely empty row in any case.

Ok, so now we have the FULL OUTER JOIN working. Have a look at the output by exporting it into a text or Excel file. You will see that in case both result sets didn't match, the number fields (count and rev_sum) are blank. As we want to use these fields later on in calculations, we have to assign a default value in case the number field is null.

Insert a If field value is null step, connect it to the previous step and double click on it afterwards. Tick "Select fields" and in the fields table add  count and rev_sum and assign a default value of 0 (Zero) to them.

Another thing that you will also have recognized by looking at the output is that we have now two additional fields: service_name_1 and date_1.

So now we have to do a check. In case there are matching values in both results, both

  • date and data_1 and
  • service_name and service_name_1
will be populated. If it is not in both, only one of them will be populated. So we are going to create a new variable called service_name_new and date_new that are going to replace the old ones in a "Modified Java Script Value" Step. Drag and drop this step on the working area and connect it with the Merge Join Step. Douple click on the Modified Java Script Value  Step and insert following script plus tick "Compatibility Mode":

if (service_name.isNull()) {
var service_name_new=service_name_1;
var service_name_new=service_name;

if (date.isNull()) {
var date_new=date_1;
var date_new=date;

Add service_name_new and date_new as output fields and define the type. Then add a Select values Step and add all the fields, but not service_name_1, service_name, date, date_1. Add another step to export it into Excel in example. Our whole process should look like this now (please ignore that some connection have the copy sign, this will not be the case with your process):

In this small tutorial we have now set up a nicely working FULL OUTER JOIN. I hope that you can implement this in your ETL processes now without any problems.

Saturday, September 26, 2009

Full Review of "Pentaho Reporting 3.5 for Java Developers"

This is the first book available that describes the functionality of the Pentaho Report Designer in detail. Will Gorman mainly focused on integrating the reports in Java applications and not so much on using it in combination with the Pentaho BI Server.
Overall, it is an excellent book. It guides you through various tutorials on how to create more complex reports. All the tutorials are fairly easy to follow. Some chapters of the book give a very detailled description of the available functionality and settings, which serves as an excellent reference.
Conclusion: I really enjoyed reading this book. From my point of view Will Gorman provides an interesting overview of Pentaho Reporting. If you are one of those users that are not familiar with Java and just want to get a general overview of Pentaho Reporting, then this book will also be a good ressource as well (just skip some of the pages then that describe the Java integration).
I hope that there will be a Part II of this book, as I certainly would have liked to know more about adding OLAP and Metadata sources to the report and parameterizing them.

Wednesday, September 23, 2009

New books arrived

After coming back from my business trip to South Africa I found a package of new books on my office desk. I finally received a copy of "Pentaho Reporting 3.5 for Java Developers", "Pentaho Solutions" and "The Data Warehouse Toolkit - The Complete Guide To Dimensional Modeling".
I think I am going to start with the Pentaho Reporting book, so that I can post a review here quite soon. It was a tough choice, as all the books sound very interesting.

Thursday, September 17, 2009

Review "Pentaho Reporting 3.5 for Java Developers" Books Chapter 6

Packt Publishing Ltd sent me yesterday the chapter 6 "Including Charts and Graphics in Reports" of the recently published "Pentaho Reporting 3.5 for Java Developers" Book to review. Excited as I was, I had to read it on the same day!
"Pentaho Reporting 3.5 for Java Developers" was written by Will Gorman, a long time Pentaho developer. I found chapter 6 easy to read, so I would say it is ideal for users that are new to Pentaho Reporting as well as those who have already a bit of experience with it. It's also great that the book covers the very latest version of the software, which brings a lot of major improvements.
The book is not only a valuable reference but offers easy-to-follow examples as well. It is definitely worth the investment, as it will save you hours of searching for the right info in forums, blogs and the wiki, which can sometimes be a bit frustrating.
Now let's have a look at chapter 6: Will kicks off by introducing all the different types of data sources (category, x-y, x-y-z and times series datasets, ...) that can be used in Pentaho Reporting and offers nice examples.
Next, all the chart rendering properties are discussed in detail, which serves as a great reference. This is followed by an introduction to all the various chart types that are available in Pentaho Reporting. In the first example you create a new report and load data from a HSQLDB, which you use as a data source for an area chart. Will speaks you through the chart properties that have to be set.
Bar, line, pie, ring, multipie, waterfall, bar-line, bubble, scatter, xy, scatter and radar charts are discussed next. Included is also a very nice example of a pie chart that resets itself with each group header (very useful!). The bubble chart, which uses 3 dimensions, is also very interesting.
The last page of this chapter describes how to insert static and dynamic images into your report.

Overall, I found this chapter very easy to read. It was a great idea to include examples as well and not only to concentrate on a description of the properties. All in all, I can strongly recommend this book based on this chapter and I am looking forward to reviewing the whole book, which should arrive in my post box next week. By the way, you can already buy the book here.

Wednesday, September 16, 2009

Review coming soon ...

Today I received a nice email from Packt Publishing Ltd asking me if I wanted to write a review about the new Pentaho Reporting 3.5 for Java Developers Book.
I've quite some books to read currently about OLAP and Data Warehouse Design, but I'll try to speed up to provide a review about this exciting new book in the next few months. So stay tuned!

Saturday, September 12, 2009

New Books on Pentaho

There hasn't been much literature around about Pentaho, at least not any that I know of. So far most people would search for info in the pentaho forum or wiki. But recently things have changed. There is now whole book about setting up Pentaho with MySQL called "Pentaho Solutions", which you can order from Amazon. This one tries to cover basically every component of Pentaho, which should be a great support for first time users.
Another book was published recently as well, called "Pentaho Reporting 3.5 for Java Developers" by Will Gorman, a long term Pentaho developer. This book will provide you everything you need to know about Pentaho Reporting 3.5. I've already ordered this books and I am quite looking forward to read them!

Friday, July 24, 2009

The 10 Seconds Pentaho Metadata Editor Tutorial

Quick Step By Step Guide

You can find the main Pentaho Wiki doc here.
This is an updated (2013-08-19) and extended version of the original tutorial which I posted here a few years ago. It will take you more than 10 seconds now to read through it, but instructions should be more precise now than before.

Specifying JNDI and adding the JDBC driver

  1. Define a JDNI Connection. You have to use the same JNDI connection as you specified on the BI Server (in the Pentaho Administration Console).The JDBC details can be specified in the file in the following directory:<pme-root-dir>/simple-jndi

    Add the following (amend to your requirements):
    The first part before the forward slash is the JDNI name (so in this case dwh).
  2. Check if the required JDBC driver is installed under<pme-root-dir>/libext/JDBC
    If not, download the JDBC driver from your vendor’s website and copy the jar file into this folder.

Importing the physical tables

  1. Start PME by running the following command in the PME root directory:sh ./
  2. Click on File > Save and specify a good name for your metadata model.
  3. Right click on Connections on the top left hand side and choose New Connection .... Define your connection details, make sure you choose JNDI in the Access selection and specify the same JNDI name in the Settings section as you originally specified in the file.
    Some interesting options are found in the
    Advanced section: If you work with a database like PostgreSQL or Oracle that support proper schemata, you can define the default schema to use here. Also, if you have columns of type boolean you can also enable support for them:

    After entering all the details, click the
    Test button to make sure that your connection details are correct. Once you get a successful return message, click OK.
  4. Right click on the database connection you used created choose Import Tables:
  5. Expand the database node (dwh in the screenshot below) so that you can see all the imported tables:
  6. Specify table properties: Double click on each table to specify various settings. In example specify if it is a fact or dimensional table. For measures configure as well the aggregation type. If you want to add calculated fields, it's time to do so now: Double click on the respective table. Once the window is open, click on the + icon. Give the field a new name (i.e. PC_Amount_of_users). Define the aggregation type (i.e. Count). Define Data Type. If you don't know the length and precision yet, set it to -1. Define the formula (if you have just a simple count or sum, then only write the name of the column in there). That's it. (Field type can stay on "other").
    In the formula field you can use database specific functions as well (i.e. "YEAR(date)"). In this case you have to click on "Is formula exact?".
    You can add other properties like text alignment or date mask by clicking on the + icon.

Understanding Is the Formula Exact?

You can create add columns which are based on native SQL fragments, in example:
((CURRENT_DATE - start_date)/30)::int + 1
You specify this in the Formula Value field:
If you tick Is the Formula Exact? this basically means that the PME engine will not try to interpret this SQL fragment but instead push it directly to the database.

The disadvantage of this approach is that you might end up using functions which are specific to your database, so the model will not be that easily portable to other DBs (in case you ever have to migrate it).

Another common use case is to add a measure in case your raw data table doesn’t have one:

Defining a business model

  1. Right click on Business Model and select New Business Model. Name it etc.
  2. Drag and drop the tables onto the main working area.
  3. Double click on the table and go to Model Descriptor. If the type is incorrectly set (or not applicable for this model) click on the overwrite icon and define the respective table type (fact or dimension). Click OK.
  4. In order to create relationships between tables, select the two tables while pressing down the CTRL key and then right click on the last table and choose New Relationship:
    Another way to do this, although not that convenient, is to right click on the work area and choose
    New Relationship:

Create a business view

  1. Once the business tables and relationships are established, we can create the business view. Right click on Business View and select New Category. An easier way to do this is to choose Tools > Manage Categories (or right click the Category Editor icon in the toolbar). This will bring up the Category Editor dialog: Just click the + icon to add new categories. Define Categories, i.e. Date, Measures, Countries etc. Categories are basically buckets that help you organize the various business columns.
  2. Next we want to assign business columns to each category. If you created your categories in the tree view, right click on Business View and choose Manage Categories. Once in the Category Editor, use the arrows to move the fields into the categories.

Testing the metadata model

Now that the main metadata model is defined ... it is time to test the model. Click on the Query Builder icon in the toolbar and run some test queries. You can check the generated SQL by clicking on the SQL icon.

Publish the metadata model to the Pentaho BI Server

If testing is successful, publish the model to the BI server (Click on File > Publish To Server ...). The final metadata model is saved as an XMI file. On the BI Server, there can be only one XMI file per solution folder. Make sure that  

  • You have an account on the BI server
  • The publisher password on the BI Server is set up and you know it.
  • You know the name of the solution folder that you should publish the model to.
  • The URL to publish to is something like http://localhost:8080/pentaho/RepositoryFilePublisher
    Make sure you have
    RepositoryFilePublisher at the end of this URL!

Tips and tricks

Make use of Concepts

In the toolbar you can find the Concept Editor icon. Concepts are pretty much like CSS style definitions. Concepts can be assigned to each data point and are used in the final reports as default formats. One of the most important properties is probably Mask for Number or Date, which allows you to enter standard formatting strings (e.g. #,###.00).

To assign a Concept simply right click on the data point and choose Assign Parent Concept.

Referencing the same table more than once

In case one dimensional table is referenced more than once by your fact table, just drop the dimensional table several times into the business view and rename each of them. Then create separate relationships for all of them.

How to create formulas

Take a look at Pentaho Wiki for an introduction.

How to implement data security

  1. Click on your business model. Go to Tools > Security and import the roles from the Pentaho BI Server by entering following URL (amend if necessary):http://localhost:8080/pentaho/ServiceAction
    This will allow you to restrict the data for certain roles. If the connection works ok, you will see an XML extract of the roles definition.
  2. Go to your Business Model (this is one hierarchy below Business Models and has a brown briefcase symbol next to it) and right click, choose Edit. It is important that this is implemented on this level as otherwise it won't work.
  3. In the Metadata Security section add all the users/groups that you want to allow access. Assign the Update right to users that can save and edit their ad-hoc reports.
In the Data Constraint section add the users/groups and specify their access rights. If you want to restrict the access to a certain value of a specific column (row level security), specify it as follows: In this example we restrict the access to the country ITALY:[BT_SN_LT_COUNTRIES_SN_LT_COUNTRIES.BC_SN_LT_COUNTRIES_COUNTRY_NAME]="ITALY"
Also, if one user/group should have access to everything, you have to set the constraint to TRUE().