Diethard Steiner on Business Intelligence: 2012

Tuesday, November 20, 2012

Database Version Control: Sooner than later! (NeXtep)

Working in business intelligence usually involves creating schemas for at least the data warehouse and data marts. Usually the DDL is written using a text editor or some kind of GUI, svn or git or similar is used as standard version control, but effective versioning of this ddl is quite often not happening. Why do I say “effective” versioning - well in the case of database development you will need a tool that can generate the DDL for the delta of two versions, so that you can easily upgrade your databases.
If you are working in a project which involves properly set up development, testing and production environments, database version control should be mandatory. The good news is that there are actually some open source tools out there which can help you with that. The bad news is, that none of them can be considered in active development and none of them probably found the recognition and widespread usage that their original developers set out for.
About a year ago I tried to find some open source projects for database version control, the two tools I could find at that time were dbmaintain and neXtep. Judging by their release history one might think that these projects are dormant. So I kind of gave up on this topic. This year then at the Pentaho User Meeting in Amsterdam Edwin Weber actually mentioned neXtep in his presentation and I had a brief conversation with him afterwards about his experience with this particular tool. He mentioned that although there were no new releases, there were still patches submitted for that tool. So I kind of thought, let’s give this a try. I finally came around to take a look at neXtep just a week ago.
After registering you can download the neXtep Studio, which is Eclipse based. Setting it up is fairly straightforward and the interface is fairly intuitive as well. What I was not expecting is that the documentation was that good. It seems like the people behind this project spent really a considerable amount of time in creating what I would call an excellent documentation, which you can find here.
There is also a forum where I even got some answers to my questions! If someone fancies adding support for additional databases, some documentation can be found here and here.
The aim of this post is not to walk you through the tool, as their Wiki really is the best place to find that insight, it is just a way of saying: “Hey, there is this impressive tool out there for database version control, start using it and give them some support!”.
I think, that, although it is very often neglected, database version control is a very important topic. Version your database sooner than later!

Some notes on the installation:
I had some problems with setting up the neXtep repository on

PostgreSQL 9.2: Two columns could not be created. See here on how to solve this problem.
MySQL 5.5: The installer started hanging when performing the final upgrade of the repository. I just killed the process and started neXtep Studio again after which it successfully upgraded the repository. I couldn’t create any user then, but luckily (in this case) the password is stored as plain text in the db table. Run:
SELECT * FROM nextep_repo.REP_USERS;

How to use neXtep’s database version control

This is an extremely simplified setup: Let’s imagine we already have one table in our PostgreSQL database (users with MySQL, MS SQL or Oracle databases please adjust the SQL respectively):

CREATE DATABASE nextep;
\connect nextep
CREATE SCHEMA test;
CREATE TABLE
test.user
(
firstname VARCHAR(50)
, lastname VARCHAR(50)
)
;

We decide to start with database version control, because things are going to get much more complex pretty soon once the project starts rolling.
On the terminal (in my case on Ubuntu) navigate to the neXtep install directory and start the Studio (this takes into account that you have already set up the neXtep repository and users):

./neXtep

When the Shared Workspace selection dialog comes up, choose New Workspace. Then provide a name, description and select PostgreSQL as database vendor. Make sure you then select Import structure from an existing database (reverse engineering):

Now provide all the database connection details and test the connection:

Click Next. You will be asked to for the DB password again (make sure you tick to save it this time).
Now pay special attention to the Version Navigator on the left hand side of the Studio:

Right click on the module, the second element in the hierarchy, Nextep Test, and choose commit:

In this case we will not change the version number (but we will do so in future). Add a good description and then click commit. Now you will see that module as well as the table name have a lock symbol attached.

Next mark the table name and click on the Check Out button (or alternatively right click on the table name and choose Check Out):

A new dialog comes up, called Problem Solver Wizard, just click Finish. Next provide a good description of the changes you are about to make. You will see that neXtep automatically increase the patch number, which is just what we want in this case. We also want to work on the main branch:

Click Ok.
Notice that the lock symbols disappeared in the Version Navigator and that the version number increased.

In the Version Navigator double click on the user table. You will now see a new Table Overview coming up. Let’s now add a new column called user_id. Click on the Columns tab at the bottom of the Table Overview, then click on Add:

Add following columns:

userid: SERIAL

employeeno: VARCHAR(20); description: alpanumeric code

Next move the userid column up so that it is the first column in the table.

Click on the Primary Keys tab and a new primary key called PK_userid and assign the userid column:

We also want to pay attention on the error message neXtep brought up: We forgot to define the userid column as NOT NULL, so we go right back to the Columns tab and change the userid accordingly:

As we have implemented all the required changes, we are ready to commit them. Click on the Commit and lock this version button. You have now one more time the chance to adjust the description, then click commit.
Note that the version number increased again:

Then click on the Synchronize DB button:

This opens the Synchronization perspective.
The Synchronization pane gives you an exact overview of the changed since the last applied version:

It highlights that we added the employeeno and userid columns as well as a primary key called PK_userid.
In the main work area you can see the DDL neXtep generated to upgrade our target database from version 1.0.0.0 to 1.0.0.1:

As this looks all quite fine for us, we click on the Execute button:

And all these changes will be applied to our target database. Watch the Console view for any progress messages.

It’s always good practice to click the Synchronize button again just to make sure that our model matches the DB model. As we can see in our case, something is not quite alright:

What basically happened is that PostgreSQL created its own sequence for the userid and also changed the data type to Integer. This is fine for us, so we can just incorporate these changes. Click on the Repository reverse synchronization from database information button and then click Execute:

Now we want to ensure that everything is properly in sync again, so we click the Synchronize database button again. Now everything is fine (click the Shows/Hides unchanged items button as well):

Go back to the Database Design perspective and have a look if the changes were applied. We can see that a new sequence was added (in the Version Navigator)

and that the userid column details changed:

To be fair, we could have set it up like this in the first place, but now at least you got a bit more exposed to the synchronization options.

How to create a branch

This is now our first release, which will be delivered to the test environment. Let’s create a new branch test to commit these changes there:
Click on the module Nextep Test and choose Commit. Provide an activity description and click on Iteration. Then click on new and create a Test branch:

Click Next followed by Finish.

Note that the version tree gets automatically updated:

We could now package and deliver this model (available via a right click action on the module name). neXtep also supplies a command line tool called neXtep Installer (more info here) which is able to deploy database deliveries generated by the neXtep Studio.

How to merge

Obviously we also want to have all these improvements in the main branch, so we can use the merge action. First we have to go back to the last version of the MAIN trunk. We do this by creating a new workspace.

Choose Workspace > Create new workspace and make sure you select the Explicitly define the versions of the modules option.

In the View rules definition wizard choose the latest version of the main trunk and click Add module to view:

Click Finish.

As you can see in the Version Navigator our new workspace is based on the latest main trunk version:

If you now click on Workspace > Change Workspace you will see that you can now switch between the main and test trunk versions by switching between these two workspaces (ok, we should have named it in a better fashion):

Workspace: Versions are not bound to a particular workspace. A workspace is just a space where you define the set of elements you need to work on, and you can create as many workspaces as you need. [source: neXtep site]

Click on Cancel.

Now we want to merge the latest version of the Test branch with this one, so that we have all the changes over here for a good base to add future developments:

Right click on the module Nextep Test and choose Merge to ..., which brings up the Merge Wizard. Click on the latest version of the Test branch in the diagram:

Click Next and you will see a graphical representation of the changes you requested:

Now you will see a detailed overview of the merge results (note that the screenshot might not 100% match with yours because I added a few other things, but you get the idea):

Click Ok.

Summary

We walked through various scenarios that you might come across in your daily work with database versioning - like creating dedicated branches, merging branches. As you have seen neXtep Studio is an extremely promising open source product! Database versioning is extremely important, so start today with it! You will not look back!

Friday, November 9, 2012

Stylesheets with Pentaho Report Designer

Stylesheets are the second big feature being integrated into PRD this year (the other one being crosstabs). I submitted a feature request for stylesheets some months ago (PRD-3823, PRD3834), simply because working on big projects requires stylesheets: You don’t want to waste hours changing every single report in case your client wants to have a style amended; you don’t want to recreate styles over and over again when you create new reports etc. I did not expect this feature to arrive anytime soon, so, quite surprisingly, as of today they are available in the latest trunk snapshot even with full CSS3 support (download here) (posted by Thomas here)! My very simple sample report can be downloaded here.
It is possible to store styles internally or externally. External stylesheets will have the extension .prptstyle.
To me stylesheets are an import part of report design and I am sure many other users will welcome this new feature. So let’s have a look at how to create and use stylesheets in PRD:

Once you have the trunk snapshot downloaded (note: do not use for production!) open PRD and create a new report, add a query and a few fields:

How to create internal styles

If you want to use internal styles, choose Format > Edit Style Definition.

This option is only useful when you intend to create a specific design for one specific report. I highly recommend you make use of an external stylesheet for bigger projects, which we will discuss next.

How to create an external stylesheet

In this case we want to create an external stylesheet, choose Window > Style Definition Editor.

Make sure you save the external stylesheet straight away and while you are working:

Add a new style rule:

Note: The name of the style-rule is very important, it’s not just an arbitrary name!

Examples:

Style Rule Name	Explanation
label[name="title"]	The style will be automatically applied to all labels with the name title. The title for the label element can be defined in the Attributes pane:
group-header label[name="title"]	This style will only be applied to a label with the name title in the group header.
report-header band[name=”subtitle”] label	This style will only be applied to a label which is located in the band of the subtitle.
details-header label	This style will be applied to all labels in the details header
itemband text-field	This style will be applied to all text fields in the details band
number-field[field="Sales"]	This style will be applied to all number-fields which represent the data field Sales:
#header	This CSS id selector style (more info here) will be applied to a unique element with the id in the Attributes pane set to ‘header’: Note: Do not prefix the id with a # in the attribute value!
.center	This CSS class selector style (more info here) will be applied to multiple elements which have the style-class in the Attributes pane set to center: Note: You must not prefix center with a dot in the attribute value!
.label.center	This CSS class selector style (more info here) will be applied to multiple elements which have the style-class in the Attributes pane set to center and are labels. Note: The attribute value has to be set to only “center”!
report-footer label.notice	This style will be applied to the label element with the class attribute set to notice in the report footer.

As you can see, the style rule name is a combination of various components.

Please find below a list of important report bands:
report-header
page-header
group-header
details-header
itemband
no-data-band
details-footer
page-footer
group-footer
report-footer
watermark

Important report elements:
label
text-field
number-field
date-field
message
band
sub-report

How to reference an external stylesheet

In the Structure pane select Master Report.
In the Attributes pane click the select button for the style-sheet-reference:

Click the Select button again:

Next set the Filter to Resources and then select your stylesheet. Click OK:

How to assign styles to elements

Only if you used CSS class and id selector compliant style rule names, you must specify them
respectively in the style-class and id attributes in the Attributes pane.
Specifying a class:

Specifying an id:

Note: Only specify the name without the dot or hash prefix!

Here a report with various styles applied (not only id and class selectors):

Working with the stylesheet XML file

If you know what you are doing, you can directly open the *.prptstyle file in your favourite text editor and amend it there:

Note: You must be familiar with XML and must understand the structure of the stylesheet. This practice is not recommended for anyone not familiar with it.

Thursday, November 1, 2012

Pentaho Report Designer: How to configure error logging

Configure logging

By default PRD outputs the log to the console and to a file.

The log output file can be found in (Linux):

~/.pentaho/logs/prd.log

To configure the log settings amend the log4j.xml file in the PRD/resources folder. See here for more details.

To get a more detailed logging, change this section

<root>

<appender-ref ref="CONSOLE"/>

<appender-ref ref="FILE"/>

</root>

<root>

<appender-ref ref="CONSOLE"/>

<appender-ref ref="FILE"/>

</root>

Here are all possible values listed (see here for more details):

ALL

TRACE

DEBUG

FATAL

ERROR

INFO

OFF

To follow the log simply run (Usually this is not necessary as on Linux you have the terminal window open from which you started PRD, so the log will be display there):

tail -f ~/.pentaho/logs/prd.log

If you want to adjust the location where the log file is stored simple change following section:

Wednesday, October 31, 2012

Experimenting with new layout features in PRD

Note: This article will cover features that are still in development. We take a first look at them, do not expect them to be feature complete and do not use them in production.

All layouting strategies mentioned in this blog post are available for download here.

To experiment with the new features yourself, download the latest PRD build from http://ci.pentaho.com/

Place data rows next to each other

The standard detail layout places one row below the other:

A new feature allows us to actually place each row next to each other:

Our sample dataset:

We set up Region as a group. Our report looks like this:

The report preview looks like this:
Compare the preview with the dataset above: You will see that a new details row was created for each group, and that the data rows were placed next to each other.
You can implement this the following way:

Simply select the Details Body in the Structure tab and in the Style tab set layout to row:

Make the group header part of the details section

And here is how the preview of the report looks:

And here is how you can implement this layout:

Set the layout type as shown above for each band. Here an example for the Group:
Note: Currently you cannot set the elements within the group header to 100%, but it will be possible in future. I implemented a workaround by using a formula to dynamically increase the size of the header element to whatever height the details band and header are consuming.

Table Layout

Although the table layout has been available in the trunk version of PRD for some time (see Thomas’ original blog post), there has hardly anything been written about it.

The table layout available currently is mainly aimed at driving the crosstabs - nevertheless we can have a look at experimenting a bit with it. There is also no wizard or easy GUI available for to create a table [but there is one for crosstabs], so it is not that straightforward to create a table. Also it helps, if you are familiar with the HTML table layout.

Note also, that although you can create tables via the GUI, it is currently not possible to adjust the column widths individually. You can give the first column a specific width and all the other columns will automatically have the same width applied.

As you see, it is a start of having something similar to a table available within PRD. Current efforts of the PRD development team are mainly focused on finishing the cross tabs functionality and any table specific features will have to come after this. This doesn’t stop you though from submitting requests to jira.pentaho.org.

So now lets have a look how a table can be possibly created (info kindly provided by Thomas):

table-header, table-body and table-footer are all variations of a table-body. You will also need table-rows and table-cells to make it work properly.

I would do it in the following way:
group: table
group-header: table-header
+ subband for table-row
(child elements will be automatically wrapped into a table-cell)
details-body: table-body
itemband: table-row
group-footer: table-body

(table-footer is not implemented. But if it were, then the table requires it to follow directly after the table-header. See HTML standard for reference.)

I managed to get it working the following way:

I added the extra band element with a table-cell layout to get this working, which normally should not be necessary - this a known bug.

The report looks like this in preview:

I hope you enjoyed seeing some new layouting features of PRD in action. Although they are not ready yet for production, it at least gives you some new ideas what you will be able to do in future.