Part 2: Data Warehousing Tips and Tricks

Ask and you shall receive: http://face.centosprime.com/rdb-w/?p=68 linked to my previous post on the Data Warehousing Tips and Tricks session (http://sheeri.net/archives/204) with the comment, “I need to learn more about MERGE TABLES and INSERT … ON DUPLICATE KEY UPDATE“.

So here’s a bit more:

The manual pages for the MERGE storage engine:
http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html
and
http://dev.mysql.com/doc/refman/5.0/en/merge-table-problems.html

MySQL Forums for the MERGE talbe are at:
http://forums.mysql.com/list.php?93

In a nutshell, a MERGE table is really a set of pointers to similarly-schema’d MyISAM tables. So if you have the same table schema multiple times (ie, partition per day, so you have tables named 2007_04_27_Sales, 2007_04_26_Sales, etc) you’d use a MERGE table to link them all together and then you can run a query on the MERGE table and it will query all the tables that the MERGE table points to.

As for INSERT . . . ON DUPLICATE KEY UPDATE —

MySQL gives many ways to deal with INSERTs and unique/primary keys. If you do an INSERT and the primary key you are trying to insert is already in the table, MySQL will give an error. Ways to deal with this:

1) Try & catch errors in the application code.

2) Use INSERT IGNORE INTO . . . this will insert a new record if a record with the key does not exist. If it does exist, nothing happens. Simply add the word “IGNORE” into your INSERT query after INSERT and before INTO.

3) Use REPLACE INTO . . .this will insert a new record if a record with the key does not exist. If a record does exist, MySQL will *delete* the record and then INSERT your record. This can cause problems when you just want to update part of a row, and not insert the whole row again. And it changes timestamps and auto-increment numbers, which may not be a desired result. Simply change the word “INSERT” in your query to “REPLACE”.

4) Use INSERT . . .ON DUPLICATE KEY UPDATE. The syntax is the regular INSERT statement, and at the end add ON DUPLICATE KEY UPDATE [expression]. For instance,

INSERT INTO tbl (id,name,thing) VALUES (154,'sheeri','book') ON DUPLICATE KEY UPDATE thing='book';

and what makes it easier, if you have variables or whatever in your VALUES, you can actually set the update statement to say “just use the value I wanted to insert, OK?” as in the following:

INSERT INTO tbl (id,name,thing) VALUES (154,'sheeri','book') ON DUPLICATE KEY UPDATE thing=VALUES(thing);

Manual page:
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

Hope this helps!