Wednesday, December 2, 2009

Exporting characters as UTF-8 from Kettle




Recently I took over a project for our Russian office, which, strangely enough, is part of the UK & International region of the company I am working for. This was the first time I had to handle Cyrillic characters.

There are three points to take into account:
  • Make sure your MySQL table uses the UTF-8 character set.
  • Make sure that the following options are set in the database connection details in Kettle: characterEncoding=utf8, characterSetResults=utf8, useUnicode=true.
  • Once you have run the ETL process and populated the table, don't worry if MySQL Query Studio displays the characters as a row of pipes (in this case the Cyrillic fonts are not installed). If you see question marks, then something is still wrong.
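
To make the connection options and the question-mark symptom more concrete, here is a small Python sketch. It is only an illustration, not how Kettle works internally: the host, port and database name are placeholders, the helper names build_jdbc_url and simulate_lossy_export are made up for this example, and the Latin-1 conversion is just an assumed stand-in for whatever step in a misconfigured pipeline drops the Cyrillic characters.

```python
from urllib.parse import urlencode

# The three connection options from the list above.
# Host, port and database name used further down are placeholders.
UTF8_OPTIONS = {
    "useUnicode": "true",
    "characterEncoding": "utf8",
    "characterSetResults": "utf8",
}


def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Return a MySQL JDBC URL with the UTF-8 options appended (hypothetical helper)."""
    return f"jdbc:mysql://{host}:{port}/{database}?{urlencode(UTF8_OPTIONS)}"


def simulate_lossy_export(text: str) -> str:
    """Mimic a step that squeezes text through Latin-1 (an assumption for illustration).

    Latin-1 has no Cyrillic letters, so each one is replaced by "?".
    Once that has happened, the original characters cannot be recovered.
    """
    return text.encode("latin-1", errors="replace").decode("latin-1")


if __name__ == "__main__":
    print(build_jdbc_url("localhost", 3306, "sales"))
    print(simulate_lossy_export("Москва"))  # prints "??????"
```

The difference between the two symptoms matters: pipes or boxes usually mean the data is intact and only the display font is missing, whereas question marks mean the characters were already replaced somewhere between the source and the table.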

5 comments:

  1. Hi!

    I just want to add that when working with text files, you can set the encoding using a combobox located in the "Content" tab of the configuration dialogs of the text input and output steps.

  2. Thanks a lot Roland for pointing this out! Much appreciated! I still have to find some time to continue reading your excellent "Pentaho Solutions" book (http://www.amazon.co.uk/Pentaho-Solutions-Business-Intelligence-Warehousing/dp/0470484322/ref=sr_1_1?ie=UTF8&s=books&qid=1259838283&sr=8-1). I hope I get some holidays around Xmas, then I'll try to do this.

  3. Thanks for the support, Diethard ;)

    Don't hesitate to email me in case you have some questions about the book.

    Kind regards,

    Roland.

  4. Thanks a lot Roland!
    Best regards,
    Diddy

  5. Where exactly can I set the above settings in the database connection? I would appreciate your help with the steps.
    Thanks
