Friday, July 24, 2009

The 10 Seconds Pentaho Metadata Editor Tutorial


Quick Step By Step Guide

You can find the main Pentaho Wiki doc here.
This is an updated (2013-08-19) and extended version of the original tutorial which I posted here a few years ago. It will take you more than 10 seconds now to read through it, but instructions should be more precise now than before.

Specifying JNDI and adding the JDBC driver



  1. Define a JDNI Connection. You have to use the same JNDI connection as you specified on the BI Server (in the Pentaho Administration Console).The JDBC details can be specified in the jdbc.properties file in the following directory:<pme-root-dir>/simple-jndi

    Add the following (amend to your requirements):
    dwh/type=javax.sql.DataSource
    dwh/driver=com.mysql.jdbc.Driver
    dwh/url=jdbc:mysql://localhost:3306/pentaho
    dwh/user=root
    dwh/password=
    The first part before the forward slash is the JDNI name (so in this case dwh).
  2. Check if the required JDBC driver is installed under<pme-root-dir>/libext/JDBC
    If not, download the JDBC driver from your vendor’s website and copy the jar file into this folder.

Importing the physical tables



  1. Start PME by running the following command in the PME root directory:sh ./metadata-editor.sh
  2. Click on File > Save and specify a good name for your metadata model.
  3. Right click on Connections on the top left hand side and choose New Connection .... Define your connection details, make sure you choose JNDI in the Access selection and specify the same JNDI name in the Settings section as you originally specified in the jdbc.properties file.
    Some interesting options are found in the
    Advanced section: If you work with a database like PostgreSQL or Oracle that support proper schemata, you can define the default schema to use here. Also, if you have columns of type boolean you can also enable support for them:

    After entering all the details, click the
    Test button to make sure that your connection details are correct. Once you get a successful return message, click OK.
  4. Right click on the database connection you used created choose Import Tables:
  5. Expand the database node (dwh in the screenshot below) so that you can see all the imported tables:
  6. Specify table properties: Double click on each table to specify various settings. In example specify if it is a fact or dimensional table. For measures configure as well the aggregation type. If you want to add calculated fields, it's time to do so now: Double click on the respective table. Once the window is open, click on the + icon. Give the field a new name (i.e. PC_Amount_of_users). Define the aggregation type (i.e. Count). Define Data Type. If you don't know the length and precision yet, set it to -1. Define the formula (if you have just a simple count or sum, then only write the name of the column in there). That's it. (Field type can stay on "other").
    In the formula field you can use database specific functions as well (i.e. "YEAR(date)"). In this case you have to click on "Is formula exact?".
    You can add other properties like text alignment or date mask by clicking on the + icon.

Understanding Is the Formula Exact?

You can create add columns which are based on native SQL fragments, in example:
((CURRENT_DATE - start_date)/30)::int + 1
You specify this in the Formula Value field:
If you tick Is the Formula Exact? this basically means that the PME engine will not try to interpret this SQL fragment but instead push it directly to the database.

The disadvantage of this approach is that you might end up using functions which are specific to your database, so the model will not be that easily portable to other DBs (in case you ever have to migrate it).

Another common use case is to add a measure in case your raw data table doesn’t have one:

Defining a business model


  1. Right click on Business Model and select New Business Model. Name it etc.
  2. Drag and drop the tables onto the main working area.
  3. Double click on the table and go to Model Descriptor. If the type is incorrectly set (or not applicable for this model) click on the overwrite icon and define the respective table type (fact or dimension). Click OK.
  4. In order to create relationships between tables, select the two tables while pressing down the CTRL key and then right click on the last table and choose New Relationship:
    Another way to do this, although not that convenient, is to right click on the work area and choose
    New Relationship:

Create a business view


  1. Once the business tables and relationships are established, we can create the business view. Right click on Business View and select New Category. An easier way to do this is to choose Tools > Manage Categories (or right click the Category Editor icon in the toolbar). This will bring up the Category Editor dialog: Just click the + icon to add new categories. Define Categories, i.e. Date, Measures, Countries etc. Categories are basically buckets that help you organize the various business columns.
  2. Next we want to assign business columns to each category. If you created your categories in the tree view, right click on Business View and choose Manage Categories. Once in the Category Editor, use the arrows to move the fields into the categories.

Testing the metadata model

Now that the main metadata model is defined ... it is time to test the model. Click on the Query Builder icon in the toolbar and run some test queries. You can check the generated SQL by clicking on the SQL icon.

Publish the metadata model to the Pentaho BI Server

If testing is successful, publish the model to the BI server (Click on File > Publish To Server ...). The final metadata model is saved as an XMI file. On the BI Server, there can be only one XMI file per solution folder. Make sure that  


  • You have an account on the BI server
  • The publisher password on the BI Server is set up and you know it.
  • You know the name of the solution folder that you should publish the model to.
  • The URL to publish to is something like http://localhost:8080/pentaho/RepositoryFilePublisher
    Make sure you have
    RepositoryFilePublisher at the end of this URL!

Tips and tricks

Make use of Concepts

In the toolbar you can find the Concept Editor icon. Concepts are pretty much like CSS style definitions. Concepts can be assigned to each data point and are used in the final reports as default formats. One of the most important properties is probably Mask for Number or Date, which allows you to enter standard formatting strings (e.g. #,###.00).

To assign a Concept simply right click on the data point and choose Assign Parent Concept.

Referencing the same table more than once

In case one dimensional table is referenced more than once by your fact table, just drop the dimensional table several times into the business view and rename each of them. Then create separate relationships for all of them.

How to create formulas

Take a look at Pentaho Wiki for an introduction.

How to implement data security



  1. Click on your business model. Go to Tools > Security and import the roles from the Pentaho BI Server by entering following URL (amend if necessary):http://localhost:8080/pentaho/ServiceAction
    This will allow you to restrict the data for certain roles. If the connection works ok, you will see an XML extract of the roles definition.
  2. Go to your Business Model (this is one hierarchy below Business Models and has a brown briefcase symbol next to it) and right click, choose Edit. It is important that this is implemented on this level as otherwise it won't work.
  3. In the Metadata Security section add all the users/groups that you want to allow access. Assign the Update right to users that can save and edit their ad-hoc reports.
In the Data Constraint section add the users/groups and specify their access rights. If you want to restrict the access to a certain value of a specific column (row level security), specify it as follows: In this example we restrict the access to the country ITALY:[BT_SN_LT_COUNTRIES_SN_LT_COUNTRIES.BC_SN_LT_COUNTRIES_COUNTRY_NAME]="ITALY"
Also, if one user/group should have access to everything, you have to set the constraint to TRUE().

11 comments:

  1. Hi Diethard Steiner

    Thx. for the quick learn. UNfortunately, I am struck at step 15; because, I could not locate the MQL Query builder.

    Can you help? I've a demo tomorrow morning. vonage.naveen@gmail.com

    Thanks Naveen

    ReplyDelete
  2. After completing the business view[s], how does one modify them when changes are required. For example, in adding a derived column to a Fact Table and wishing to change a [SUM] to an [AVERAGE]. The fields to be changed are all greyed out and inaccessible. Does one have to delete the derived column and redo the whole thing?

    ReplyDelete
  3. My apologies but my previous comment regarding modification of business view properties used an inappropriate example regarding [SUM] and [AVERAGE]. This property is available for change. Suppose instead that there was a requirement to change a 'calculation' formula. How do I get to modify this property?

    ReplyDelete
  4. Hi

    Without MQL Query builder, I tried publishing to pentaho BI server which would fail saying 'check userid/password could not publish'

    When I exported as XMI and deployed it under /pentaho-solutions/samples/metadata.xmi... the definition was not picked by BI server on restart...

    Any help?

    thanks
    Naveen

    ReplyDelete
  5. Hi,
    I'm trying to create a new column in the metadata editor calculating only the year for a column with date format to no avail. Is it really possible to use the "YEAR()" function? I'm using "YEAR(C_FECHA)" and it just doesn't work.

    Thanks,
    Eder

    ReplyDelete
  6. Same problem with SQL function as for example DATE() ...

    ReplyDelete
  7. I found this useful comment from another article which has helped me......
    for those who still encounter failures while publishing, watch-out the tomcat logs. in my case, the query was over-passing mysql default tuning capabilities. changing max_allowed_packet to 32m in mysql.cnf solved the problem...
    Comment: Posted by emmanuel delmasure at Jan 04, 2010 06:00

    ReplyDelete
  8. "In case you are using the same dimensional table for more fact tables, just drop the dimensional table several times into the business view and rename each of them. Then create separate relationships for all of them."

    And how do I create the business view if I have two dim tables? Suposse I've a sales table link to a dim_costumer_1, and a visits table link to dim_costumer_2. How do I create the customer category?

    ReplyDelete
  9. Hi: I´m just beginning with to work with metadata editor, although I´ve some experience with kettle and the schema workbench
    I started a small proyect to begin with, imported tables, made the relationships and buisness views with no problems when I tried to export to xmi file. Then I tried to increase the number of tables, and when I imported some of them, and tried to export to xmi, I got the error:

    "javax.jmi.reflect.InvalidObjectException: Object with MOFID 4CB5D5D1-E2FA-11E1-B2B0-C2849F45AA77:00000000000CCD8E no longer exists, class: org.pentaho.pms.cwm.pentaho.meta.businessinformation.CwmDescription$Impl"

    So I thought the problem was in these new tables, but I restarted the Metadata Editor, and then SOME of the tables that where causing problems no longer did, and some of the ones that where working just fine did not any more
    I did this a lot of times, restarting the program, restarting windows, using a different database, etc... and the error seems to be totaly random.
    I´m using win7, mySql 5.5 with a 8mb, 4.2Mhz PC
    the details of the error are:

    at org.netbeans.lib.jmi.xmi.SAXReader.read(SAXReader.java:62)
    at org.pentaho.pms.core.CWM.importFromXMI(CWM.java:1433)
    at org.pentaho.pms.core.CWM.importFromXMIString(CWM.java:1440)
    at org.pentaho.pms.core.CWM.exportToXMI(CWM.java:1368)
    at org.pentaho.pms.ui.MetaEditor.exportToXMI(MetaEditor.java:710)
    at org.pentaho.pms.ui.MetaEditor.exportToXMI(MetaEditor.java:665)
    at org.pentaho.pms.ui.MetaEditor$15.handleEvent(MetaEditor.java:535)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.pms.ui.MetaEditor.readAndDispatch(MetaEditor.java:851)
    at org.pentaho.pms.ui.MetaEditor.main(MetaEditor.java:3589)

    Thanks in advance for the help

    El Multiversista

    ReplyDelete
    Replies
    1. Hi,
      I suggest you create a JIRA ticket for this on jira.pentaho.com. This is the best way to get a bug resolved.

      Delete