Open source business intelligence tutorials: Pentaho, Talend, Jasper Reports, BIRT and more.
Topics: Data Integration, Data Warehousing, Data Modeling, BI Server Setup, OLAP, Reporting, Dashboarding, Master Data Management and many more.
Saturday, August 10, 2013
Going Agile: Sqitch Database Change Management
Going Agile: Sqitch Database Change Management
You have your database scripts under a dedicated version control and change management system, right? If not, I recommend doing this now.
While there have been handful of open source projects around which focus on DB script versioning and change management control, none has really gained a big momentum and a lot of them are dormant.
But there is a new player on the ground! A light at the end of the db change management tunnel - so to speak. David Wheeler has been working on Sqitch over the last year and the results are very promising indeed! Currently the github projects shows 7 other contributors, so let’s hope this project gets a strong momentum! Also a new github project for a Sqitch GUI was just founded.
Why I like Sqitch:
You can run all the commands from the command line and get very good feedback.
Everything seems quite logical and straightforward: It’s easy to get to know the few main commands and in a very short amount of time you are familiar with the tool.
You can use your choice of VCS.
It works very well.
Supported DBs are currently MySQL, Oracle, SQLite and PostgreSQL. CUBRID support is under way.
So what do we want to achieve? So what do we want to achieve? Bring all DDL, stored procedures etc under version control. This is what Git is very good for (or your choice of CVS). Keep track of the (order of) changes we applied to the database, verify that they are valid, be able to revert them back to a specific state if required. Furthermore, we want to deploy these changes (up to a specific state) to our test and production databases. This is was Sqitch is intended for:
The below write-up are my notes partially mixed with David’s ones.
Below I will only discuss the setup for PostgreSQL.
# make sure you have build tools install (make etc)
sudo apt-get install built-essential # sqitch requires Locals support (asks for Locale::TextDomain and Locale::Messages) sudo apt-get install libintl-perl
# make sure pg_config is installed
sudo apt-get install libpq-dev
# RHEL (Fedora, CentOs, etc) use instead:
# sudo yum install postgresql-devel
# Note where the psql include dir is located. You might be asked for it on installing sqitch
# Note the version of psql. You might be asked for it on installing sqitch
# install sqitch (adjust for your DB)
sudo cpan App::Sqitch DBD::Pg
Once installation is finished, check out the man page:
$ man sqitch
Within your git project directory, create a dedicated folder:
This is quite PostgreSQL specific. For other dbs use something like this:
SELECT 1/COUNT(*) FROM information_schema.schemata WHERE schema_name = 'staging';
Now test if you can deploy the script and revert it:
Try to deploy the changes:
The general command looks like this:
$ sqitch -d <dbname> deploy
As we have already specified a default db in the config file, we only have to run the following:
$ sqitch deploy
Adding metadata tables to dwh
Deploying changes to dwh
+ create_stg_schema .. ok
Note the plus sign in the feedback which means this change was added.
When you run deploy for the very first time, Sqitch will create maintenance tables in a dedicated schema automatically for you. These tables will (among other things) store in which “version” the DB is.
Check the current deployment status of database dwh:
(Important: If you are working with PostgreSQL, make sure you add your password to ~/.pgpass and then comment the password out in sqitch.conf beforehand otherwise this will not work.)
Or bundle them up, copy the bundle to your prod server and deploy it there:
$ sqitch bundle
Distribute the bundle
On the prod server:
$ cd bundle
$ sqitch -d dwhprod deploy
A future version of Sqitch will have better support for target DBs (see here).
Using Sqitch with an existing project (where some ddl already exists)
Sometimes you take over a project and want to bring the existing DDL under version control and change management.
Thanks to David for providing details on this:
The easiest option is to export the existing DDL and store it in one deploy file. For the revert file you could use a statement like this then:
DROP $schema CASCADE;
Let’s assume we call this change “firstbigchange”:
The first time you do a deploy to the existing database with Sqitch, do it twice: once with --log-only to apply your first big change, and then, from then on, without:
$ sqitch deploy --log-only --to firstbigchange
$ sqitch deploy --mode change
The --long-only option has Sqitch do everything in the deploy except actually run deploy scripts. It just skips it, assumes it worked successfully, and logs it. You only want to do this --to that first big dump change, as after that you of course want Sqitch to actually run deploy scripts.
Using more than one DB
DS: Currently it seems like there is a Sqitch version for each of these dbs. What if I was working on a project that used two different dbs installed on the same server and I wanted to use Sqitch for both of them (especially for dev I have more than one db installed on the same server/pc)?
DW: You can have more than one plan and accompanying files in a single project by putting them into subdirectories. They would then be effectively separate Sqitch projects in your SCM. The sqitch.conf file at the top can be shared by them all, though, which is useful for setting up separate database info for them ([core.pg] and [core.mysql] sections, for example).
If you are starting a new project, you would do it like this:
$ sqitch --engine pg --top-dir pg init myproject
$ sqitch --top-dir mysql init myproject
Then you have sqitch.plan, deploy, revert, and verify in pg/, and sqitch.plan deploy, revert, and verify in mysql/. To add a new change, add it twice: