Open source business intelligence tutorials: Pentaho, Talend, Jasper Reports, BIRT and more.
Topics: Data Integration, Data Warehousing, Data Modeling, BI Server Setup, OLAP, Reporting, Dashboarding, Master Data Management and many more.
Friday, June 27, 2014
Moving Blog: Goodbye Blogger - Hello Github Pages
Thursday, June 12, 2014
Setting a variable value dynamically in a Pentaho Data Integration job
On some occasions you might have to set a variable value dynamically in a job so that you can pass it on to, for example, the Execute SQL Script job entry. In this blog post we will take a look at how to create an integer representation of the date 30 days ago. And we want to achieve this without using an additional transformation!
The way to achieve this in a simple fashion on the job level is to use the Evaluate JavaScript job entry [Pentaho Wiki]. While this job entry is not really intended to do this, it currently offers the easiest way to accomplish just this. Just add this job entry to your Kettle job and paste the following JavaScript:
date = new java.util.Date();
date.setDate(date.getDate()-30); //Go back 30 full days
var date_tk_30_days_ago = new java.text.SimpleDateFormat("yyyyMMdd").format(date);
parent_job.setVariable("VAR_DATE_TK_30_DAYS_AGO", date_tk_30_days_ago);
true; // remember that this job entry has to return true or false
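The same date arithmetic can be sketched in plain JavaScript, without the Java classes. This is only an illustration of the calculation: dateKeyDaysAgo is a hypothetical helper name and not part of Kettle, and inside the job entry you would still have to set the variable and return true as shown above.

```javascript
// Plain-JavaScript sketch of the calculation (no Java classes needed).
// dateKeyDaysAgo is a hypothetical helper name, not part of Kettle.
function dateKeyDaysAgo(now, days) {
  // Copy the input so the caller's Date object is not modified
  var d = new Date(now.getTime());
  // setDate() rolls over month and year boundaries automatically
  d.setDate(d.getDate() - days);
  var yyyy = String(d.getFullYear());
  var mm = ('0' + (d.getMonth() + 1)).slice(-2); // months are zero-based
  var dd = ('0' + d.getDate()).slice(-2);
  return yyyy + mm + dd;
}

// Example: 30 days before 12 June 2014 gives "20140513"
var example = dateKeyDaysAgo(new Date(2014, 5, 12), 30);
```

Passing the reference date in as an argument (instead of always using the current date) makes the helper easy to check against a known day.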
To test this, let's add a Log job entry:
Add this log message to the job entry settings:
The date 30 days ago was: ${VAR_DATE_TK_30_DAYS_AGO}
And then run the job. You should see something similar to this:
Certainly you could just pass the value as parameter from the command line to the job, but on some occasions it is more convenient to create the value dynamically inside the job.
Software used:
- pdi-4.4.0-stable
Friday, May 30, 2014
Pentaho CDE: Create your custom table
Pentaho Dashboards CDE: Create your custom table
You want to implement something in your dashboard that is not covered by the out-of-the-box dashboard components? Luckily, with Pentaho CDE the world is open: CDE makes use of the standard web technologies (CSS, JavaScript, HTML), so theoretically you can implement whatever is in the realm of these technologies. Obviously you will need some basic knowledge of these technologies (setup is not as easy any more as filling out some config dialogs), but the possibilities are endless. In this post I’ll briefly talk you through how to source some data and then to create a custom table with it (which you can easily do with one of the CDE components as well, but that’s not the point here … imagine what else you could do):
In CDE, register a Datasource. For example, create a sql over sqlJndi datasource, provide a Name (e.g. qry_generic_select), choose SampleData for JNDI and specify the following query:
SELECT customername, customernumber, phone FROM customers
- In the component section, add a Query Component. This component is most commonly used for displaying simple results, like one number in a dashboard (e.g. max temperature). Here we will use this component to retrieve a bigger result set.
- Click on Advanced Properties.
- For the Datasource property specify the datasource you created in step 1 (e.g. qry_generic_select)
- Provide a name for the Result Var, e.g. select_result, which is the name used in the functions below. This is the variable that will hold the output data of your datasource.
Write a Post Execution function, for example:
function() { document.getElementById('test').innerHTML = JSON.stringify(select_result); }
We will only use this function for now to test if the query is working. Later on we will change it.
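To give a rough idea of what such a test function writes into the page, the sketch below stringifies a made-up result set. It assumes the Result Var holds an array of rows, each row being an array of values, which is the shape the functions in this post rely on. The sample values are invented for illustration and are not taken from SampleData.

```javascript
// Made-up sample rows in the shape the post's code relies on:
// an array of rows, each row an array of values
// (customername, customernumber, phone).
var select_result = [
  ['Customer A', 1, '555-0100'],
  ['Customer B', 2, '555-0101']
];

// The whole result set, one row, and one single value
var wholeResult = JSON.stringify(select_result);
var firstRow = JSON.stringify(select_result[0]);
var firstValue = JSON.stringify(select_result[0][0]);
```

Note that JSON.stringify keeps the quotes around string values, so a single string value is displayed with double quotes around it.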
- The setup so far should look like this:
- In the Layout Panel create a basic structure which should at least have one column. Name the column test as we referenced it already in our JavaScript function.
- Preview your dashboard (partial screenshot):
Let’s change the Post Execution function to return only the first record:
function() { document.getElementById('test').innerHTML = JSON.stringify(select_result[0]); }
And the preview looks like this:
Let’s change the Post Execution function to return only the first entry from the first record:
function() { document.getElementById('test').innerHTML = JSON.stringify(select_result[0][0]); }
And the preview looks like this:
Let’s extend our Post Execution function to create a basic table:
function() {
  var myContainer = document.getElementById('test');
  var myTable = document.createElement('table');
  var myTr = document.createElement('tr');
  var myTd = document.createElement('td');
  myContainer.appendChild(myTable).appendChild(myTr).appendChild(myTd).innerHTML = select_result[0][0];
}
Do a preview and make use of your browser’s developer tools to see the generated HTML:
- Ok, now that this is working, let’s add some very basic design. Click on Settings in the main CDE menu:
Choose bootstrap from the Dashboard Type pull down menu: Click Save.
Back to the Post Execution function of the Query Component: Now we want to make this a bit more dynamic: Every data row must be enclosed by <tr> and within each data row each data value must be enclosed by <td>. We also have to add the <tbody> element to make a proper table. And we will apply the Bootstrap Striped Table design:
// Simple function preparing the table body
function() {
  var myContainer = document.getElementById('test');
  var myTable = document.createElement('table');
  var myTBody = document.createElement('tbody');
  var myTr = document.createElement('tr');
  var myTd = document.createElement('td');
  //myTable.id = 'table1';
  myTable.className = 'table table-striped';
  myContainer.appendChild(myTable).appendChild(myTBody);
  for(var i = 0; i < select_result.length; i++) {
    myContainer.lastChild.lastChild.appendChild(myTr.cloneNode());
    for(var j = 0; j < select_result[i].length; j++) {
      myText = document.createTextNode(select_result[i][j]);
      myContainer.lastChild.lastChild.lastChild.appendChild(myTd.cloneNode()).appendChild(myText);
    }
  }
}
You can find a text version of this JavaScript code a bit further down as well in case you want to copy it.
- Do a preview now and you will see that we have a basic table now:
Note: In case you are creating this dashboard as part of a Sparkl plugin and you are having trouble seeing the Bootstrap styles applied (and are sure that the problem is not within your code), try to preview the dashboard from within your Sparkl project endpoint listing (which seems to work better for some unknown reason):
One important thing missing is the header. Let’s source this info now. The Query Component provides the following useful properties, which you can access within the Post Execution function:
this.metadata
this.queryInfo
this.resultset
To get an idea of what exactly is available within the metadata object, you can use, for example, this statement inside the Post Execution function:
document.getElementById('test').innerHTML = JSON.stringify(this.metadata);
Which reveals the following:
This is the function preparing the full table (header and body):
// function preparing the full table (header and body)
function() {
  var myContainer = document.getElementById('test');
  var myTable = document.createElement('table');
  var myTHead = document.createElement('thead');
  var myTh = document.createElement('th');
  var myTBody = document.createElement('tbody');
  var myTr = document.createElement('tr');
  var myTd = document.createElement('td');
  //myTable.id = 'table1';
  myTable.className = 'table table-striped';
  //document.getElementById('test').innerHTML = JSON.stringify(this.metadata);
  myMetadata = this.metadata;
  myContainer.appendChild(myTable).appendChild(myTHead).appendChild(myTr);
  for(var s = 0; s < myMetadata.length; s++){
    myHeaderText = document.createTextNode(myMetadata[s]['colName']);
    myContainer.lastChild.lastChild.lastChild.appendChild(myTh.cloneNode()).appendChild(myHeaderText);
  }
  myContainer.lastChild.appendChild(myTBody);
  for(var i = 0; i < select_result.length; i++) {
    myContainer.lastChild.lastChild.appendChild(myTr.cloneNode());
    for(var j = 0; j < select_result[i].length; j++) {
      myText = document.createTextNode(select_result[i][j]);
      myContainer.lastChild.lastChild.lastChild.appendChild(myTd.cloneNode()).appendChild(myText);
    }
  }
}
- And the preview looks like this:
Voilà, our custom Bootstrap table is finished. This is not to say that you have to create a table this way in CDE: This was just an exercise to demonstrate a bit of the huge amount of flexibility that CDE offers. Take this as a starting point for something even better.
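As one possible variation on the exercise above, the table can also be assembled as an HTML string, which is easy to try outside the browser. This is only a sketch: buildTableHtml and escapeHtml are hypothetical helper names, and the inputs are assumed to have the same shape as in the post (metadata entries with a colName property, rows as arrays of values).

```javascript
// Escape characters that have a special meaning in HTML
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: metadata is an array of objects with a colName
// property, rows is an array of arrays (one inner array per data row).
function buildTableHtml(metadata, rows) {
  // One <th> per column name
  var header = metadata.map(function (col) {
    return '<th>' + escapeHtml(col.colName) + '</th>';
  }).join('');
  // One <tr> per row, one <td> per value
  var body = rows.map(function (row) {
    var cells = row.map(function (value) {
      return '<td>' + escapeHtml(value) + '</td>';
    }).join('');
    return '<tr>' + cells + '</tr>';
  }).join('');
  return '<table class="table table-striped">' +
    '<thead><tr>' + header + '</tr></thead>' +
    '<tbody>' + body + '</tbody></table>';
}
```

Inside a Post Execution function this could then be used along the lines of document.getElementById('test').innerHTML = buildTableHtml(this.metadata, select_result); the escaping step matters here because the values are inserted as markup rather than as text nodes.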
Wednesday, March 26, 2014
Pentaho CDE and Bootstrap: The essential getting started info
Why Bootstrap
How to use it with CDE
Configure your CDE Dashboard to use Bootstrap
Generate your standard layout
- With Bootstrap, the total span size of the page is 12 columns (as opposed to Blueprint, which has 24 columns). This means that if your page has only one column, the span size should be 12. If your page has 2 columns, the span size should be 6 for each of them and so on (calc: 12 / no of cde columns).
- Within each column nest an HTML element: This one will hold the Bootstrap HTML snippet. In the most simple form it will look like this:
Make sure to provide a name for the HTML object in the Properties panel (otherwise, the styles do not show up properly in the preview - at least in my case this happened).
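The span calculation mentioned above (12 / number of CDE columns) can be written down as a tiny helper. This is just a sketch of the arithmetic for equally wide columns; spanSize is a hypothetical name and not part of CDE or Bootstrap.

```javascript
// Bootstrap's grid is 12 columns wide (Blueprint uses 24)
var BOOTSTRAP_GRID_COLUMNS = 12;

// Hypothetical helper: span size for each of columnCount equally wide columns
function spanSize(columnCount) {
  if (!Number.isInteger(columnCount) || columnCount < 1 ||
      BOOTSTRAP_GRID_COLUMNS % columnCount !== 0) {
    throw new RangeError('12 cannot be split evenly into ' + columnCount + ' columns');
  }
  return BOOTSTRAP_GRID_COLUMNS / columnCount;
}

var oneColumn = spanSize(1);    // 12
var twoColumns = spanSize(2);   // 6
var threeColumns = spanSize(3); // 4
```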
Get code from Bootstrap website
Amend Bootstrap HTML
- Copy the Bootstrap HTML snippet into the HTML panel
- Copy this reference into the CSS panel: @import url('http://getbootstrap.com/dist/css/bootstrap.css')
- Adjust the HTML. Do at least the following:
- Provide a proper title.
- Add an id attribute to the content div. This way we can later on reference it when we want to assign a chart component.
- Delete the default content text.
- Then click the Run button to get a preview:
Add amended Bootstrap HTML to CDE Layout Structure
Create your data sources
Create your components
Last Example
- Adjust the Layout Structure of your dashboard to make room for a button.
- Let’s copy the HTML snippet for the standard button. Jump over to JSFiddle and adjust it until you are happy with the preview. Remember to add a dedicated id attribute.
- Copy the HTML snippet again and paste it into the HTML properties field of your HTML element in the Layout Structure panel.
- Save and Preview your dashboard.
Again, a very easy example. I guess now you are certainly interested in creating some more challenging dashboards with Pentaho CDE and Bootstrap!
Friday, February 28, 2014
Having problems starting Pentaho Kettle Spoon on Linux? Here are some solutions ...
Quite often, Pentaho Kettle Spoon - the GUI for designing transformations and jobs - starts up just fine on Linux OSes. Sometimes though, there might be some dependencies to install or special flags to set.
When starting Pentaho Kettle on Fedora I came across this nasty error message:
spoon.sh: line 166: 10487 Aborted (core dumped) "$_PENTAHO_JAVA"
On other systems I also got this error message:
Matt Casters recommends installing libwebkitgtk instead of xulrunner.
sudo yum install webkitgtk.x86_64
Update 2014-07-20:
It turns out on Fedora 20 you do have to install xulrunner, but not via yum. Victor Sosa provided some instructions on this Jira case which I copy here for reference:
1) download the xulrunner 1.9.2 from here: http://ftp.mozilla.org/pub/mozilla.org/xulrunner/nightly/2012/03/2012-03-02-03-32-11-mozilla-1.9.2/xulrunner-1.9.2.28pre.en-US.linux-x86_64.tar.bz2.
[ ... and copy it to a directory of your choice. Extract it.]
2) change this line in the spoon.sh
The only change you need is
OPT="$OPT -Dorg.eclipse.swt.browser.DefaultType=mozilla -Dorg.eclipse.swt.browser.XULRunnerPath=/opt/xulrunner-1.9.2"
Did you get any other error messages when starting Spoon and found a solution for it? Please comment below and I'll add it to this blog post so that we have a good resource for trouble shooting.
Matt Casters:
FYI, the package to install on Ubuntu is usually libwebkitgtk-1.0-0 (as documented). I'm sure it's the same on Fedora. I would avoid all that xulrunner stuff if possible.
For those of us on Kubuntu there are bugs in theme oxygen-gtk so best switch to another theme like Ambiance or turn off a bunch of fancy-shmancy animations with oxygen-settings.
21/05/2014:
If you're having Spoon problems on Linux/OSX after an upgrade, try upgrading swt.jar from http://archive.eclipse.org/eclipse/downloads/drops4/R-4.3.2-201402211700/
Sunday, February 23, 2014
Sparkl: Create your own app for the Pentaho BI/BA Server
Installing Sparkl
Initial App Setup
Creating the dashboard
- Create a row called passwordTextRow.
- With passwordTextRow still marked, click on the Add HTML icon.
- Mark the HTML row and add this HTML snippet on the right hand side:
<p>Specify your new password:</p>
- Then add a new row called passwordInputRow. For this row, add two columns, one called passwordInputColumn and the other one passwordSubmitColumn. The layout should now look like this:
- Save the dashboard and switch to the Component Panel.
- Create a parameter: Generic > Simple Parameter. Call it passwordParameter.
- Add a button: Others > Button Component. Call it passwordSubmitButton. For Label specify Submit and for HtmlObject passwordSubmitColumn (just press CTRL+Space to retrieve the values of the available HtmlObjects):
- Add an Input field: Select > TextInput Component. Name it passwordTextInput, assign the Parameter passwordParameter and the HtmlObject passwordInputColumn to it:
- Now switch to the Datasource Panel. Remove the SQL dummy query.
- From the left hand side open MYPASSWORDCHANGE Endpoints and choose mypasswordchangerendpoint Endpoint. For this datasource specify the name myPasswordChangerDS in the Properties section:
- Switch back to the Components Panel. In the Components area select the Button Component. Click on Advanced Properties.
- For Action Parameters specify passwordParameter as [["passwordParameter"],["passwordParameter"]] and
- For Action Datasource specify myPasswordChangerDS. At the time of this writing there were considerations about moving the datasource property to main properties area (instead of advanced properties), so this might have changed by the time you read this.
- For Listeners specify passwordParameter
- Save the dashboard.
- Let’s see how our Sparkl plugin looks so far. Choose Tools > MyPasswordChanger:
And you should see something like this:
Preparing the Kettle transformation
- Create a new biserver-user with admin rights which we can use just for authentication purposes.
- Fire up Spoon and open mypasswordchangerendpoint.ktr
- Amend the transformation to look like the one shown in the screenshot below:
- Double click on Generate Rows. Set the Limit to 1. Create three fields:
Field | Type | Value
url | String | http://localhost:8080/pentaho/api/userroledao/updatePassword
user | String | your username
password | String | your password
- Double click on Add XML. Into the Output Value field type userXML and into the Root XML element field user:
- Next click on the Fields tab. Configure as shown below:
- Double click on the REST client step. Configure as outlined below:
- General:
- URL name field: url
- HTTP method: PUT
- Body field: userXML
- Application Type: XML
- Authentication
- HTTP Login: admin-rest
- HTTP password: test123 (note: use the details of the specially created rest admin user here!)
- Finally configure the logging step to output the essential information.
- Run the transformation.
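To make the idea of the request body more concrete, here is a rough sketch of the kind of XML document that ends up in the userXML field. The root element user comes from the Add XML settings above; the child element names userName and password are assumptions, since the Fields tab configuration is only shown in a screenshot. buildUserXml and escapeXml are hypothetical helper names and not part of Kettle.

```javascript
// Escape characters that are not allowed verbatim in XML text
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: builds a <user> document.
// The element names userName and password are assumptions.
function buildUserXml(userName, password) {
  return '<user>' +
    '<userName>' + escapeXml(userName) + '</userName>' +
    '<password>' + escapeXml(password) + '</password>' +
    '</user>';
}

var body = buildUserXml('your username', 'your password');
```

The REST client step would then send a document like this as the body of the PUT request to the updatePassword URL.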
- username: CPK server side parameter (currently logged in user)
- password: supplied by CDE dashboard
- URL: IP and port
- authentication username
- authentication password
How to pass a standard parameter from a dashboard to a Kettle job or transformation
- Open mypasswordchangerendpoint.ktr in Spoon.
- Right click on the canvas and choose Transformation settings.
- Specify a new parameter called passwordParameter. As you might have guessed, this parameter name has to be exactly the same as defined in the dashboard:
- Change the transformation to look like this one:
Disable the hop from the original Generate Rows step to Add XML. Add a new Generate Rows and a Get Variables step. For now the new Generate Rows should only supply the url and user values. Right now we just want to test if the password parameter is passed on properly. Open up the Get Variables step and create a new field called password which references the variable ${passwordParameter}:
This setup enables us to use the passwordParameter originating from the dashboard in our Kettle transformation stream.
- Just for testing purposes write the password field to the log (so that we see that the value is actually passed on). Change the Write to Log step config accordingly.
Server-side Parameters
Kettle Properties
Sparkl server side parameters
Parameter | Description
cpk.plugin.id | the plugin ID
cpk.solution.system.dir | the pentaho solution system dir (full path)
cpk.plugin.dir | the plugin dir (full path)
cpk.plugin.system.dir | the plugin system dir (full path, this isn't used very often though and it might become deprecated)
cpk.webapp.dir | webapp dir (full path)
cpk.session.username | session username
cpk.session.roles | session roles (string with session authorities separated by commas)
Amending our transformation
- Remove any fields from the new Generate rows step. It should be blank:
- Open up the Get Variables step and configure it as shown in the screenshot … add the username variable referencing cpk.session.username:
- Adjust the Add XML step:
- Change the config of the REST Client step so that it no longer reads the URL from the field; instead key it into the URL config field like this:
${VAR_PENTAHO_BISERVER_URL}/api/userroledao/updatePassword
- Also, in the Authentication section reference the parameters we just set up: ${VAR_PENTAHO_BISERVER_USER} and ${VAR_PENTAHO_BISERVER_PW} respectively.
- Save and then restart the server.
- Then test again the dashboard and watch the Tomcat log.
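If you are wondering what the ${...} references in the URL field turn into, the following sketch mimics the substitution in plain JavaScript. It is not Kettle's actual implementation: substituteVariables is a hypothetical helper, and the base URL value is only an example taken from the localhost URL used earlier in this post.

```javascript
// Hypothetical helper: replaces ${NAME} with the value from the variables map.
// References without a matching variable are left untouched.
function substituteVariables(template, variables) {
  return template.replace(/\$\{([^}]+)\}/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : match;
  });
}

// Example value only; use the URL of your own BI server
var variables = { VAR_PENTAHO_BISERVER_URL: 'http://localhost:8080/pentaho' };

var resolvedUrl = substituteVariables(
  '${VAR_PENTAHO_BISERVER_URL}/api/userroledao/updatePassword',
  variables
);
// resolvedUrl is now http://localhost:8080/pentaho/api/userroledao/updatePassword
```

Leaving unknown references in place makes a missing variable easy to spot in the Tomcat log, because the literal ${...} text shows up in the request URL.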