OpenSQLCamp Videos online!

OpenSQLCamp was a huge success! I took videos of most of the sessions (we had only 3 video cameras for 4 rooms, so 2 sessions were not recorded). Unfortunately, I was busy with administrative work for OpenSQLCamp during the opening keynote and the first 15 minutes of session organizing, and by the time I got to the planning board it was already full, so I was not able to give a session.

Presentation: Drizzle is not MySQL with Changes by Brian Aker

Another video from the recent OpenSQLCamp in Portland, Oregon. Earlier today I uploaded the Lightning Talk videos. Due to the holiday, I am unsure when the rest of the videos will be ready; when they are, I will do one blog post featuring them all.

However, I have had several requests for this specific video, so here is Brian Aker speaking about Drizzle.

The slides are up at http://www.slideshare.net/brianaker/drizzle-opensql-camp, and here’s the video:

(Note: I will not do a post for each video, but since this one is up and ready, I figured I’d post it before I leave for the holiday.)

OpenSQLCamp Lightning Talk Videos

OpenSQLCamp was a huge success! Not many folks have blogged about what they learned there; if you missed it, all is not lost. We did take videos of most of the sessions (we had only 3 video cameras for 4 rooms, so 2 sessions were not recorded).

All the videos have been processed, and I am working on uploading them to YouTube and filling in details for the video descriptions. Not all the videos are up yet; right now, all the lightning talks are up.



I am working on uploading the rest of the sessions, so look for them next week, though I may have one or two up very soon.

Free 10-day trial of Safari Books Online

That’s right — get your free 10-day trial! All the information I know is here:

http://bit.ly/37E9ld

But the basics are: No access to Rough Cuts or Downloads, for new subscribers only. It’s one of those “sign up and if you do not cancel after 10 days, we bill you” — and at $42.99 a month, that’s not a mistake you want to make. Must sign up by Nov. 24th.

To sign up now: https://ssl.safaribooksonline.com/tryitfree

I was asked to send this information along, so I am. Now’s your chance to skim High Performance MySQL, among other high-quality books!

Testing TokuDB – Faster and smaller for large tables

For the past two months, I have been running tests on TokuDB in my free time. TokuDB is a storage engine put out by Tokutek. TokuDB uses fractal tree indexes instead of B-tree indexes to improve performance, which is dramatically noticeable when dealing with large tables (over 100 million rows).

For those who like the information “above the fold”, here is a table with results from a test comparing InnoDB and TokuDB. All the steps are explained in the post below, if you want more details, but here’s the table:

Action | InnoDB | TokuDB
Importing ~40 million rows | 119 min 20.596 sec | 69 min 1.982 sec
INSERTing again, ~80 million rows total | 5 hours 13 min 52.58 sec | 56 min 44.56 sec
INSERTing again, ~160 million rows total | 20 hours 10 min 32.35 sec | 2 hours 2 min 11.95 sec
Size of table on disk | 42 Gb | 15 Gb
COUNT(*) query with GROUP BY | 58 min 10.11 sec | 5 min 3.21 sec
DELETE query | 2 hours 46 min 18.13 sec | 1 hour 14 min 57.75 sec
Size of table on disk | 42 Gb | 12 Gb
OPTIMIZE TABLE | 1 day 2 hours 19 min 21.96 sec | 21 min 4.41 sec
Size of table on disk | 41 Gb | 12 Gb
TRUNCATE TABLE | 1 min 0.13 sec | 0.27 sec
Size of table on disk | 41 Gb | 193 Mb (after waiting 60 seconds before doing an ls -l)
OPTIMIZE TABLE | 23.88 sec | 0.03 sec
Size of table on disk | 176 Kb | 193 Mb

Installing TokuDB is not quite as easy as plugging in a storage engine. TokuDB requires a patch to the MySQL source code, so you can either patch the source code yourself or download an already-patched version from Tokutek that contains TokuDB as well. I used the already-patched version of MySQL from Tokutek, and it was no different than setting up a regular MySQL install — install, configure and go.

On disk, a table using the TokuDB storage engine is different from both InnoDB and MyISAM. It has a .frm file, as all MySQL tables do. In addition, there is a directory which contains a main.tokudb file with the data, a status.tokudb file (I believe with an action queue), and a key-KEYNAME.tokudb file for each index:


# ls -1
testtoku.frm
testtoku.tokudb


# ls -1 */*
testtoku.tokudb/key-DATE_COLLECTED.tokudb
testtoku.tokudb/key-HASHCODE.tokudb
testtoku.tokudb/main.tokudb
testtoku.tokudb/status.tokudb

A bit of playing around shows that we cannot get much from these files. This is unlike MyISAM tables, where you can see the data in the table by running the “strings” command on the data file:


# cd testtoku.tokudb
# file *
key-DATE_COLLECTED.tokudb data
key-HASHCODE.tokudb data
main.tokudb data
status.tokudb data


# strings *
tokudata
tokuleaf
x^cd
fdac
tokudata
tokuleaf
x^cd
bN fda

For a basic test I compared bulk insertion, simple querying and deletes with TokuDB and InnoDB. I did not use any special features of TokuDB. I started with an SQL file produced by mysqldump that was 2.0 Gb in size and held 19 million rows, and performed some simple tests on it. The table has a signed INT as a primary key, and the goal of this test was to see how easy it would be to delete test data. “Test data” is defined as any set of more than 10,000 rows sharing the same value of a particular field (HASHCODE, defined as VARCHAR(32)).

0) imported 19,425,235 rows

1) SELECT COUNT(*),HASHCODE FROM test[engine] GROUP BY HASHCODE HAVING COUNT(*)>10000;

2) DELETE FROM PRIMARY_KEY_HASH WHERE HASHCODE IN ([list of id's]); This deleted about 3.3% of the records in the table (647,732 rows)

3) OPTIMIZE TABLE test[engine] – to defragment
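As a scaled-down illustration of steps 1 and 2, here is a sketch using SQLite as a stand-in for MySQL; the table name, column names, and the threshold are illustrative, not taken from the actual test:

```python
import sqlite3

# Find over-represented HASHCODE values, then delete all their rows --
# the same GROUP BY ... HAVING / DELETE ... IN pattern as steps 1 and 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, hashcode TEXT)")
# 5 rows share "aaa" (the over-represented value); 2 rows share "bbb".
rows = [("aaa",)] * 5 + [("bbb",)] * 2
conn.executemany("INSERT INTO test (hashcode) VALUES (?)", rows)
# Step 1: threshold of 4 mimics the post's "more than 10,000 rows" cutoff
dupes = [r[0] for r in conn.execute(
    "SELECT hashcode FROM test GROUP BY hashcode HAVING COUNT(*) > 4")]
# Step 2: DELETE ... WHERE hashcode IN ([list])
conn.execute("DELETE FROM test WHERE hashcode IN (%s)" %
             ",".join("?" * len(dupes)), dupes)
remaining = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(remaining)  # 2
```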

Tests were done on an Amazon EC2 instance (AMI ID ami-2547a34c, a 64-bit Fedora machine) using the m1.xlarge size (16 Gb RAM).

Action | InnoDB | TokuDB
Importing over 19 million rows | 33 min 2.107 sec | 31 min 24.793 sec
Size of table on disk | 4.4 Gb | 2.4 Gb
COUNT(*) query with GROUP BY | 8.64 sec | 29.28 sec
DELETE query | 26.06 sec | 2 min 19.51 sec
Size of table on disk | 4.4 Gb | 1.9 Gb
OPTIMIZE TABLE | 35 min 15.04 sec | 1 min 20.42 sec
Size of table on disk | 4.3 Gb | 1.2 Gb

InnoDB performed exceedingly well because the InnoDB buffer pool was sized larger than the data (12 Gb buffer pool vs. 4.4 Gb table), and the data import caused the buffer pool to have all the data and indexes already cached when the queries were run. Even so, TokuDB only fared slightly worse than InnoDB in overall performance.

The most interesting part of the table, for me, is that there is no need to defragment the table. Even though the size on disk does decrease after the OPTIMIZE TABLE, the Tokutek folks explained that there is a queue of work to be done (such as defragmentation) that is processed automatically, and OPTIMIZE TABLE simply works through the rest of that queue. This is why the size of the table on disk was already reduced even before the OPTIMIZE TABLE was run; if I had waited a minute or so before running OPTIMIZE TABLE, the work would have been completed automatically and the OPTIMIZE TABLE would have had nothing left to do.

(Specifically, I was told: “The fractal tree is a dynamic data structure which may rearrange itself when queries run. In addition, since the fractal tree is periodically checkpointed, there may be more than one version of the data changed since the last checkpoint was taken in the underlying file.” I was also pointed to a blog post about quantifying fragmentation effects.)

The table shows that for smaller amounts of data (fewer than 100 million rows), TokuDB is about 5% faster for inserts, but somewhat slower for even simple queries and deletes. There is no need to defragment TokuDB, which saves a lot of time in the long run.

As TokuDB is recommended for tables larger than 100 million rows, let’s see this same test with a large amount of data. This time we started with an import of 39,334,901 rows, a 4.0 Gb file produced by mysqldump. However, since we want more than 100 million rows, after the import we did 2 inserts to produce almost 160 million records:

INSERT INTO test[engine] (HASHCODE, [other non-primary key fields]) SELECT HASHCODE, [other non-primary key fields] FROM test[engine];
# after this there are almost 80 million records (78,669,802)

INSERT INTO test[engine] (HASHCODE, [other non-primary key fields]) SELECT HASHCODE, [other non-primary key fields] FROM test[engine];
# after this there are almost 160 million records (157,339,604)

Action | InnoDB | TokuDB
Importing ~40 million rows | 119 min 20.596 sec | 69 min 1.982 sec
INSERTing again, ~80 million rows total | 5 hours 13 min 52.58 sec | 56 min 44.56 sec
INSERTing again, ~160 million rows total | 20 hours 10 min 32.35 sec | 2 hours 2 min 11.95 sec
Size of table on disk | 42 Gb | 15 Gb
COUNT(*) query with GROUP BY | 58 min 10.11 sec | 5 min 3.21 sec
DELETE query | 2 hours 46 min 18.13 sec | 1 hour 14 min 57.75 sec
Size of table on disk | 42 Gb | 12 Gb
OPTIMIZE TABLE | 1 day 2 hours 19 min 21.96 sec | 21 min 4.41 sec
Size of table on disk | 41 Gb | 12 Gb
TRUNCATE TABLE | 1 min 0.13 sec | 0.27 sec
Size of table on disk | 41 Gb | 193 Mb (after waiting 60 seconds before doing an ls -l)
OPTIMIZE TABLE | 23.88 sec | 0.03 sec
Size of table on disk | 176 Kb | 193 Mb
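The row-doubling INSERT … SELECT trick used above can be sketched with SQLite as a stand-in for MySQL (the table and column names are illustrative): re-selecting every non-primary-key column lets the primary key be regenerated, so each pass doubles the row count.

```python
import sqlite3

# Each INSERT ... SELECT that omits the primary key doubles the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, hashcode TEXT)")
conn.executemany("INSERT INTO test (hashcode) VALUES (?)",
                 [("h%d" % i,) for i in range(10)])
conn.execute("INSERT INTO test (hashcode) SELECT hashcode FROM test")  # 10 -> 20
conn.execute("INSERT INTO test (hashcode) SELECT hashcode FROM test")  # 20 -> 40
total = conn.execute("SELECT COUNT(*) FROM test").fetchone()[0]
print(total)  # 40
```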

Clearly, TokuDB beats InnoDB on almost every one of these measurements. And I did not even use any of the special features of TokuDB: no extra indexes were added!

One great aspect of TokuDB is that it gives you approximate statistics on how many rows have been inserted:


mysql> show processlist\G
*************************** 1. row ***************************
Id: 3
User: root
Host: localhost
db: test
Command: Query
Time: 1
State: Inserted about 4000 rows
Info: INSERT INTO `testtoku` VALUES (14600817,NULL,'c40325fb0406ccf2ad3e3c91aa95a6f2','000bxi504',

The “State” value is per query, so in doing a bulk insert with many rows, I saw this number go up and down. However, it’s very useful nonetheless.

SHOW TABLE STATUS shows that the row statistics are exact (as with MyISAM; InnoDB metadata is approximate):


mysql> select count(*) from testtoku;
+----------+
| count(*) |
+----------+
| 18431319 |
+----------+
1 row in set (15.01 sec)


mysql> SHOW TABLE STATUS\G
*************************** 1. row ***************************
Name: testtoku
Engine: TokuDB
Version: 10
Row_format: Dynamic
Rows: 18431319
Avg_row_length: 0
Data_length: 0
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: 19425236
Create_time: NULL
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)

TokuDB does have that “action queue” I mentioned before — the statistics are exact, but may not be up-to-date if there are still actions to be performed. However, any statement that touches every record will perform all the actions left in the queue — so after statements like OPTIMIZE TABLE and even SELECT COUNT(*) FROM tbl, the statistics are up-to-date.

Just in case anyone wants it, here is the my.cnf used for both InnoDB and TokuDB tests:


[mysqld]
datadir = /mnt/mysql/data
port = 3306
socket = /tmp/mysql.sock
skip-locking
key_buffer_size = 384M
max_allowed_packet = 1M
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
thread_concurrency = 8
server-id = 1
innodb_data_home_dir = /mnt/mysql/data/
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /mnt/mysql/data/
innodb_buffer_pool_size = 12G
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_file_per_table

Video: Building a MySQL Slave and Keeping it in Sync

Last night at the Boston MySQL User Group I presented on how to get a consistent snapshot to build a slave, how to use mk-table-checksum to check for differences between masters and slaves on an ongoing basis, and how to use tools such as mk-table-sync and mysqldump to sync the data if there are any discrepancies.

The slides are online at http://technocation.org/files/doc/slave_sync.pdf.

The video can be watched on YouTube at http://www.youtube.com/watch?v=Un0wqYKmbWY or directly in your browser with the embedded player below:

MySQL and hardware information

People often ask “what’s the best hardware to run a database on?” And the answer, of course, is “it depends”. With MySQL, though, you can get good performance out of almost any hardware.

If you need *great* performance, and you have active databases with a large data set, here are some statistics on real life databases — feel free to add your own.

We define “large data set” as over 100 Gb, mostly because smaller data sets have an easier time with the available memory on a machine (even if it’s only 8 Gb), and backups are less intrusive. InnoDB Hot Backup and Xtrabackup are not really “hot” backups; they are “warm” backups, because copying the data files puts load on the machine, and on large, active servers we have found that this load impacts query performance. As for how active a database is, we’ve found “active” equates to a peak production load of over 3,000 queries per second on a transactional database (that is, normal production load gets the server to over 3,000 queries per second at peak times), and a flat average of over 500 queries per second if there are definite quiet and peak times, or if the server is used for reporting or a combined reporting/transactional load. This flat average should be taken over a period of a week or more.

We’re only showing the hardware here, not the configurations, for our three busiest/largest environments. All the configurations shown here have 2 machines, for an active primary and either an active secondary (for read-only queries) or a passive secondary (quiet until needed):

A music distribution company runs the following for primary production (data size in the 360-380GB range):
2x Sun SunFire X4600 M2 servers using 4x dual-core Opteron 8220 with 32GB of RAM, attached to a Hitachi DF600F SAN. The SunFire X4600 is scalable to 8 sockets (32 cores) and 512GB of RAM.

A national post office for a G8 nation has 1.2T of data (and growing fast!).
The primary site has 2 machines connected to the same LUNs in an HA setup. We have an ‘active’ and a ‘passive’ master configured to kick in if the other node fails (only one mounts the LUNs with the data). Both of these servers have 4x quad-core Intel Xeon processors and 16G of RAM each.

An online marketing firm has ~600GB of data:
2x Dell PowerEdge R710 with 36GB of RAM and two Intel Xeon L5520 CPUs (quad-core); each server supports up to 144GB of RAM and a max of 2 sockets.
Storage: a combination of local storage (logs, etc.) and a Dell PowerVault MD3000 direct-attached storage array (shared).

What are your details?

Video and Slides: How InnoDB works

This presentation was done by Sheeri Cabral of The Pythian Group and went into how to use SHOW ENGINE INNODB STATUS to get more information about your InnoDB tables, foreign keys and transactions. This is a great presentation for learning how InnoDB works.

It also went through how to use SHOW ENGINE INNODB STATUS to tune several InnoDB variables:

innodb_adaptive_hash_index
innodb_commit_concurrency
innodb_concurrency_tickets
innodb_file_io_threads
innodb_log_buffer_size
innodb_max_purge_lag
innodb_sync_spin_loops
innodb_thread_concurrency
innodb_thread_sleep_delay

The slides can be downloaded from:

http://technocation.org/files/doc/ShowEngineInnoDBStatus.pdf

(note that the slides open up to the middle section, which has slides + notes, but if you just want the slides without notes, that starts on page 1 of the PDF)

The video can be watched below, or directly on YouTube at http://www.youtube.com/watch?v=ocdjspoLM58

Dynamic General and Slow Query Log Before MySQL 5.1

This is a hack, but it is a good one. I was looking at some machines on a new client, and they had the general log turned on. I was surprised, because it was a fairly busy server, and they had had many problems with the server a few months ago. I thought perhaps they had turned on the general log to diagnose a problem and forgotten to turn it off, or something similar.

When I looked at the log on disk, I saw it was a symlink to /dev/null. They were running MySQL 5.0, so I immediately realized that it was a hack to have a general log they could “turn on” without having to restart mysqld.

On a different server, I saw the same link to /dev/null done with a slow query log.
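The mechanics of the trick can be demonstrated in a scratch directory. This is a sketch with illustrative paths, not something to run against a production datadir; on a real server you would use the configured log path and run FLUSH LOGS after swapping the file in:

```python
import os
import tempfile

# Demonstrate the /dev/null log trick with illustrative paths.
logdir = tempfile.mkdtemp()
log = os.path.join(logdir, "general.log")

os.symlink(os.devnull, log)       # logging "off": writes are discarded
with open(log, "a") as f:
    f.write("SELECT 1;\n")        # simulated general-log write, vanishes

os.remove(log)                    # "turn on": replace symlink with a real file
open(log, "w").close()
with open(log, "a") as f:
    f.write("SELECT 1;\n")        # now the write is captured on disk

captured = sum(1 for _ in open(log))
print(captured)  # 1
```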

The drawback to doing it this way is that MySQL still has the overhead of writing to the log file. The I/O overhead is greatly reduced because the writes go to /dev/null, but there is still overhead from other resources such as RAM and CPU.

But if you are in a similar position — where you frequently need to turn the general log or slow query log on temporarily for debugging purposes — this tip just might help.
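In case it helps, here is a sketch of the hack in shell. The paths and my.cnf settings are illustrative; on a live pre-5.1 server you would also run FLUSH LOGS after each swap so mysqld closes the old file handle and reopens the new one.

```shell
# Demonstration in a scratch directory; on a real server DATADIR would be
# the MySQL datadir (e.g. /var/lib/mysql) and my.cnf would contain
# something like:  log = /var/lib/mysql/general.log
DATADIR=$(mktemp -d)

# "Off" state: the log name is a symlink to /dev/null, so writes vanish
ln -sfn /dev/null "$DATADIR/general.log"
echo "SELECT 1;" >> "$DATADIR/general.log"   # discarded

# "On": swap in a real file; on a live server, follow with FLUSH LOGS
# so mysqld reopens the log and starts writing to the new file
rm "$DATADIR/general.log"
touch "$DATADIR/general.log"
echo "SELECT 1;" >> "$DATADIR/general.log"   # now captured
# mysql -e 'FLUSH LOGS'

# "Off" again: restore the symlink (and FLUSH LOGS once more)
# ln -sfn /dev/null "$DATADIR/general.log"
```

The whole point is that the log file name configured at server startup never changes, so no restart is needed; only the thing behind that name does.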


Taste test: Innobackup vs. Xtrabackup

Firstly, I have to thank my co-workers Singer Wang and Gerry Narvaja for doing a lot of the work that resulted in this comparison.

After running both InnoDB Hot Backup and Xtrabackup, we have found that there is a measurable but not large difference between the resources that Xtrabackup and InnoDB Hot Backup consume.

Xtrabackup:

  • Free

  • takes 1.1% longer (2 min during a 3-hour backup)

  • uses 1.4% more space (1G more in a 70G backup; this was for uncompressed backups)

  • uses 1.115% more CPU overall, split as 0.12% user, 0.66% nice, 0.025% system, 0.31% more iowait, 0% more steal

InnoDB Hot Backup:

  • Commercial

  • uses slightly fewer resources (see the Xtrabackup numbers above)
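As a quick sanity check, the quoted percentages are internally consistent; this is just arithmetic on the numbers above, no MySQL required:

```shell
awk 'BEGIN {
  printf "extra time:  %.1f%%\n", 2 / 180 * 100    # 2 min of a 3-hour backup
  printf "extra space: %.1f%%\n", 1 / 70 * 100     # 1G of a 70G backup
  printf "cpu split:   %.3f%%\n", 0.12 + 0.66 + 0.025 + 0.31
}'
```

The CPU split (user + nice + system + iowait + steal) sums to exactly the 1.115% overall figure.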

The server has 32G of RAM and 8 dual-core AMD Opteron(tm) 8220 processors, with a speed of 2800 MHz (2.734 GHz) and a cache size of 1024 KB, and is also used as a reporting server. We ran Xtrabackup on odd days and InnoDB Hot Backup on even days. Both tools are stable and have responsive developers, and both projects have developers who know the internals of MySQL.

The clients of ours that use InnoDB Hot Backup (including the client this test was run on) already have purchased indefinite licenses for it. For those clients, there is not a big reason to switch to Xtrabackup.

For companies that are not using a hot backup solution yet, we recommend both InnoDB Hot Backup and Xtrabackup, and they end up choosing what best fits their environment. Some companies shun InnoDB Hot Backup because of its price tag. Some companies are wary of Xtrabackup because it has been less than a year since the release was announced, and feel better paying money to a company for their backup solution. I did not put the “feelings” in the lists above, because they are dependent on the environment — I have the same trust in the skill and commitment of the developers of Xtrabackup and the developers of InnoDB Hot Backup. Other such feelings might be “supporting open source” or “using the backup software that is developed by a partner organization.”

We felt it was easier to set up InnoDB Hot Backup than Xtrabackup, but that may be because we have been using InnoDB Hot Backup for years. However, neither setup is too complex — it is not necessary to have a senior-level DBA set up the backups, no matter which software you choose.
