
Tuesday, August 27, 2013

Pentaho PostgreSQL Bulk Loader: How to fix a Unicode error

When using the Pentaho PostgreSQL Bulk Loader step, you might come across the following error message in the log:

INFO  26-08 13:04:07,005 - PostgreSQL Bulk Loader - ERROR {0} ERROR:  invalid byte sequence for encoding "UTF8": 0xf6 0x73 0x63 0x68
INFO  26-08 13:04:07,005 - PostgreSQL Bulk Loader - ERROR {0} CONTEXT:  COPY subscriber, line 2
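You can decode the bytes from the error message by hand to see what probably happened. In ISO-8859-1 (Latin-1), 0xf6 is "ö" and 0x73 0x63 0x68 are "sch". In UTF-8, however, 0xf6 can never appear at all. So the data most likely reached PostgreSQL encoded as Latin-1 and not as UTF-8. This is a minimal sketch that assumes the `iconv` utility is installed. It writes 0xf6 as octal `\366` because POSIX `printf` does not guarantee hex escapes:

```shell
# The bytes from the error message: 0xf6 0x73 0x63 0x68 (0xf6 is octal 366)
# Decoded as ISO-8859-1 (Latin-1), 0xf6 is "ö" and the rest is "sch"
latin1=$(printf '\366sch' | iconv -f ISO-8859-1 -t UTF-8)
echo "$latin1"

# Validating the same bytes as UTF-8 fails, just like the COPY command did
if printf '\366sch' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  utf8_status="valid UTF-8"
else
  utf8_status="not valid UTF-8"
fi
echo "$utf8_status"
```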


This is not a problem with Pentaho Kettle itself, but quite likely with the default encoding of your Unix/Linux environment. To check which locale (and therefore which encoding) is currently the default, run:


$ echo $LANG
en_US
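Keep in mind that LANG is only the fallback. If LC_ALL or LC_CTYPE is set, it takes precedence for the character encoding. The `locale` command shows the effective settings, so it gives a more reliable picture. `locale charmap` prints only the encoding name; on glibc systems this is "UTF-8" for a UTF-8 locale. This is a short sketch and assumes the standard `locale` utility:

```shell
# Print the effective setting of every locale category;
# LC_ALL, if set, overrides both LANG and the individual LC_* variables
locale

# Print only the character encoding of the current locale
locale charmap
```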


In this case, we can clearly see that the locale does not use UTF-8, the encoding the bulk loader relies on.


To fix this, set the LANG variable to a UTF-8 locale, for example:


$ export LANG=en_US.UTF-8
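The export only helps if that locale is actually installed. On typical glibc systems, an unknown locale makes programs silently fall back to the default "C" locale. This is a minimal check that assumes glibc-style locale names, where `locale -a` usually lists the locale as "en_US.utf8". If it is missing, you need to generate it first, for example on Ubuntu with `sudo locale-gen en_US.UTF-8`:

```shell
export LANG=en_US.UTF-8

# List installed locales; glibc usually reports this one as "en_US.utf8"
if locale -a | grep -qiE '^en_US\.utf-?8$'; then
  echo "en_US UTF-8 locale is installed"
else
  echo "en_US UTF-8 locale not found; it has to be generated first"
fi

# Should now print UTF-8, unless LC_ALL or LC_CTYPE still override LANG
locale charmap
```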


Note: This setting only lasts for the current shell session. To make it permanent, add the export line to ~/.bashrc (or the equivalent startup file of your shell) so that every future shell session picks it up.
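One way to make the change permanent is to append the export line to your bash startup file, but only if it is not already there. This sketch assumes bash and `~/.bashrc`. Users of other shells would target their own startup file, such as `~/.zshrc`:

```shell
# Line to persist; adjust the locale name if you use a different one
line='export LANG=en_US.UTF-8'
# Target startup file for bash
rc="$HOME/.bashrc"

# Append the line only if it is not already present (-s hides the error
# grep would print if the file does not exist yet)
grep -qsxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
```

The `grep` check means that running the snippet a second time does not add a duplicate line.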

Run the transformation again, starting Spoon, Pan or Kitchen from the same shell so that it inherits the new setting. The process should now complete without the encoding error.