Wednesday, March 21, 2012

Talend Open Studio: Retrieve PostgreSQL error messages


Retrieve PostgreSQL error messages

When using the PostgresqlOutput component, sometimes you will only see “Call getNextException to see the cause” in the error log in case something goes wrong:
So how do we get a detailed error message?
The solution is to switch off the batch mode. Simply click on the Advanced settings of the PostgresqlOutput component and untick Batch size:


If you cannot see Use Batch Size, scroll all the way down!

Now save your job and rerun it: You should now get a useful error message:


Now you have the right info to solve the problem! Once the problem is solved, don't forget to switch on the batch mode again.

Sunday, March 11, 2012

PostgreSQL: Auto generating a sample dataset (II)

Some time ago I presented a simple solution to generate date dimension data in PostgreSQL using standard SQL. The below approach is another take on this and probably a little bit easier to read then the previous approach.
This time I made use of the GENERATE_SERIES function. As a simple example, we first use a Common Table Expression to generate a dataset which holds 10 successive dates and then in the main query we extract some date information:


CREATE TABLE
test.dwh.dim_date
(
date_tk BIGINT
, date DATE
, day SMALLINT
, month SMALLINT
, quarter SMALLINT
, year INT
, iso8601_year INT
, iso8601_week SMALLINT
, iso8601_dow SMALLINT
)
;


INSERT INTO
test.dwh.dim_date
WITH date_series AS (
SELECT
DATE(GENERATE_SERIES(DATE '2012-01-01', DATE '2014-01-10','1 day')) AS date
)
SELECT
CAST(TO_CHAR(date,'YYYYMMDD') AS INT) AS date_tk
, date 
, EXTRACT(DAY FROM date) AS day
, EXTRACT(MONTH FROM date) AS month
, EXTRACT(QUARTER FROM date) AS quarter
, EXTRACT(YEAR FROM date) AS year
-- the ISO 8601 year that the date falls in
-- Each ISO year begins with the Monday of the week containing the 4th of January, 
-- so in early January or late December the ISO year may be different from the Gregorian year.
, EXTRACT(ISOYEAR FROM date) AS iso8601_year
-- The number of the week of the year that the day is in. By definition (ISO 8601), weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.
-- In the ISO definition, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, 
-- and for late-December dates to be part of the first week of the next year. 
-- For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005, 
-- while 2012-12-31 is part of the first week of 2013. It's recommended to use the isoyear field together with week to get consistent results.
, EXTRACT(WEEK FROM date) AS iso8601_week
-- The day of the week as Monday(1) to Sunday(7)
-- This is identical to psql dow except for Sunday (which is 0). This matches the ISO 8601 day of the week numbering.
, EXTRACT(ISODOW FROM date) AS iso8601_dow
FROM 
date_series
;


Thursday, March 1, 2012

Talend open sources Big Data features

Just a month ago I reported here that Pentaho open sourced their Big data features in their data integration tool (Kettle). And yesterday Talend revealed on their blog that they are about to release a new Talend Open Studio for Big Data. This version will natively support Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. Moreover, Talend Open Studio for Big Data will be bundled with Hortonwork's Apache Hadoop distribution.
These are exciting times for data integration experts and companies alike. First of all, this means more choice in terms of open source data integration tools. Secondly, competition is always good and vital to a product's future development.