Backups

This article is somewhat long. Interestingly, it does not actually cover my entire talk, as there is much to talk about besides the mechanics of each backup option. I wonder what I’d need to do to make this into a white paper or an article?

The backup presentation was finished last night. I may decide to go back and put some extra stuff in there, but that would be syntax and code and stuff. The logic is all in there, and the notes have been printed. I will post the slides (in .pdf and .swf (flash, the file is very small that way) formats) after the talk on Monday, as I may yet revise them.

I am very excited about one slide in particular, and I’ll share it here. It’s really a slide that I end with, but I feel as though it’s a great starting point as well as a summary point. I haven’t seen this information encapsulated this way before, so here goes:

Comparison Table of MySQL Backup Methods

Method No Locking DDL Snapshot Remote Free All Engines All Tables Text File Recover
Corruption
# No
SELECT . . .
INTO OUTFILE
Engine
Dependent
No No Yes Yes Yes No Yes Yes 3-4
mysqldump No Option No Yes Yes Yes Yes Yes Yes 2-3
Replication Yes No Yes Yes Yes Yes Yes No No 3
OS level copy No Yes No No Yes No Yes No Yes 5
mysqlhotcopy Yes Yes Yes No Yes No Yes No Yes 3
InnoDB Hot
Backup
Yes Yes Yes No No No Yes No Yes 4

The table is set up so the “No” answers are ‘bad’ and the “Yes” answers are ‘good’. The last column contains the # of “No”s in the row. Right off the bat, we see why mysqldump is probably the most used backup tool — it has the fewest “No”s.

“No locking” — A backup routine should interfere as little as possible with the actual database. If you lock a row or table, even for just reading, you’re blocking others from writing it. Transactional engines like InnoDB and BDB offer transactions to help get around this, but many databases have a mixture of transactional and non-transactional storage engines.

“DDL” — Backups should be able to completely restore a database. If you do not have the Data Definition Language (ie, CREATE TABLE statements), you will not be able to completely restore a database.

“Snapshot” — To use a backup for point-in-time recovery or to build a new replication slave, a snapshot is needed. It is possible to do a point-in-time recovery without a snapshot, but it involves checking the binary logs for possible duplicate statements.

“Remote” — Does the backup need to be run on the OS where the server lies? Due to security in companies or the use of an ISP, it is not always possible for the DBA to have the level of access needed to run programs on the database server.

“Free” — Self-explanatory. Note that the backup options that are free come packaged with MySQL, with the exception of OS-level copy, which is in all OS systems (ignoring embedded systems for now).

“All Engines” — Does the backup option deal with all storage engines? Some tools or commands are engine-specific.

“All Tables” — Can the tool easily backup all the tables without external looping code?

“Text file” — A text file backup is advantageous for a two reasons: Corruption can be easily detected, partial backups (ie, of one table) can be done, and SQL can be standardized for migration. Text files are usually larger than data files, but compression of text is excellent.

“Recover Corruption” — A “Yes” here means the backup option can be used to recover from corruption.

A brief discussion about the options:

SELECT . . . INTO OUTFILE is not used often, because it does not copy DDL. Code is needed to loop through tables, and there will be table-level locking for MyISAM tables. If the SELECT statement is put into a transaction, there will not be a problem for BDB and InnoDB tables.

In MySQL 5.0 it is possible to copy the information about a table from the INFORMATION_SCHEMA database by using SELECT . . .
INTO OUTFILE
. I have not tested if inserting data into the INFORMATION_SCHEMA tables is an equal and complete substitution for DDL. Instinct says “no”.

It’s my opinion that mysqldump is the most widely used backup tool. It has the fewest “No”s of all of them, and the features it does not have can be worked around.

Note that mysqldump does do DDL by default, but it’s easy to get an incomplete DDL statement. If you turn off –opt, and don’t put –create-options, you will end up with CREATE TABLE statements that are standard SQL. This may sound good, but remember that storage engines are MySQL specific, so your CREATE TABLE statements will end up using the default storage engine, which may not be what you want upon restoration.

mysqldump can do a snapshot if the option to lock all the tables is used. However, it is usually not feasible to lock every database on the server while the backup is running, because a backup can take minutes.

Replication is also widely used, but it requires another instance. And to reduce single points of failure, that means more hardware needs to be bought. It may be difficult to justify hardware simply to do nothing unless a backup is needed.

One of the features of replication is also a detriment. The standard way to set up a replication slave is to take a snapshot of the master, import it to the slave, and then use binary logs from there. Another way is to use LOAD DATA FROM MASTER, but that only works for MyISAM tables. It is possible to alter tables and data on the slave before or during replication, without replication failing. This is a handy feature for having a write-only master with InnoDB tables and a read-only slave with MyISAM tables (for example, with fulltext searching). However, this also means that the DDL backup is not necessarily to be trusted.

That mistrust can be extended to the data for the same reasons. Replication may not be a reliable backup if DML is being run on the slave server. As well, if the master becomes corrupt, it is likely that corruption will spread to the replication server.

The two most widely-used storage engines, MyISAM and InnoDB, are OS independent (explanation in the MySQL manual for MyISAM and InnoDB). This means that they can simply be copied on the OS-level. The biggest gotcha is that in order to get a snapshot, the entire database has to be locked. And unlike mysqldump, there is no option to an OS-level copy that will lock the tables. Using this is bulkier than mysqldump , and it cannot be used for all storage engines as mysqldump can.

Two solutions have been devised to combat those problems — mysqlhotcopy and InnoDB Hot Backup. mysqlhotcopy is for MyISAM tables while InnoDB Hot Backup is for InnoDB tables. The other difference is that the InnoDB Hot Backup is not free. However, InnoDB Hot Backup does offer a free perl script that will use InnoDB Hot Backup to take a snapshot of the InnoDB tables, and then use MySQL commands to get a snapshot of the other engine types.

Most people will choose a combination of backup methods — many use both replication and mysqldump, or replication and mysqlhotcopy and InnoDB Hot Backup. There is no clear-cut answer, and mostly it depends on your environment and what the backup will be used for.

Ideally there would be one tool that performed all of the functions listed in the chart. The chart is not complete, either — just a listing of my opinion of the most common desired features. There are other desired features not listed, for example replication is an “immediate hot backup” whereas all the other tools require manually importing or starting the MySQL server with new data files, etc. I am definitely interested in what folks have to say on the subject.

Why didn’t I think of this? Tom Limoncelli was nice enough to create a livejournal feed for this site — http://www.livejournal.com/users/sheericom_feed/

Direct link to add as a ‘friend’ if you’re logged in — http://www.livejournal.com/friends/add.bml?user=sheericom_feed

Why didn’t I think of this? Tom Limoncelli was nice enough to create a livejournal feed for this site — http://www.livejournal.com/users/sheericom_feed/

Direct link to add as a ‘friend’ if you’re logged in — http://www.livejournal.com/friends/add.bml?user=sheericom_feed

The MySQL User Group meeting on Monday, January 9th will focus on backups — I will give a 30-45 minute presentation on backups and we will then delve into a discussion — how folks within the group are backing up, how they’re not backing up, how they’d like to back up. If there’s time, we’ll do a “lessons learned from backup horror stories”.

Afterwards, we’ll head a few doors down to Boston Beer Works for more chatting.

You can see a video of last month’s presentation by Philip Antionades of MySQL AB on the new features in MySQL 5.0 at Google Video:

http://video.google.com/videosearch?q=Mysql+5.0+by+Philip+Antoniades

(we will video this month’s presentation but not the discussion)

More details, including the event location and directions, are at: http://mysql.meetup.com/137/events/

Any questions, comments, etc should be sent to me. We’re currently looking for a space better than North Station — something around MIT might be nice, and we’re trying to get an MIT grad student/faculty/staff member/organization to sponsor it. (We want a location that’s good for people driving AND taking the T, and when the Garden has an event, it’s not feasible to park there). If you can help out, please let me know.

(You may forward this announcement to other groups, lists, blogs, whatever.)

I’m quite excited about the presentation, as I’m writing down a lot of information from different sources that really wants to be together, but as far as I can tell is not yet, at least not in a public place.
Why didn’t I think of this? Tom Limoncelli was nice enough to create a livejournal feed for this site — http://www.livejournal.com/users/sheericom_feed/

Direct link to add as a ‘friend’ if you’re logged in — http://www.livejournal.com/friends/add.bml?user=sheericom_feed

The MySQL User Group meeting on Monday, January 9th will focus on backups — I will give a 30-45 minute presentation on backups and we will then delve into a discussion — how folks within the group are backing up, how they’re not backing up, how they’d like to back up. If there’s time, we’ll do a “lessons learned from backup horror stories”.

Afterwards, we’ll head a few doors down to Boston Beer Works for more chatting.

You can see a video of last month’s presentation by Philip Antionades of MySQL AB on the new features in MySQL 5.0 at Google Video:

http://video.google.com/videosearch?q=Mysql+5.0+by+Philip+Antoniades

(we will video this month’s presentation but not the discussion)

More details, including the event location and directions, are at: http://mysql.meetup.com/137/events/

Any questions, comments, etc should be sent to me. We’re currently looking for a space better than North Station — something around MIT might be nice, and we’re trying to get an MIT grad student/faculty/staff member/organization to sponsor it. (We want a location that’s good for people driving AND taking the T, and when the Garden has an event, it’s not feasible to park there). If you can help out, please let me know.

(You may forward this announcement to other groups, lists, blogs, whatever.)

I’m quite excited about the presentation, as I’m writing down a lot of information from different sources that really wants to be together, but as far as I can tell is not yet, at least not in a public place.
This article is somewhat long. Interestingly, as there is much to talk about besides the mechanics of each backup option. I wonder what I’d need to do to make this into a white paper or an article?

The backup presentation was finished last night. I may decide to go back and put some extra stuff in there, but that would be syntax and code and stuff. The logic is all in there, and the notes have been printed. I will post the slides (in .pdf and .swf (flash, the file is very small that way) formats) after the talk on Monday, as I may yet revise them.

I am very excited about one slide in particular, and I’ll share it here. It’s really a slide that I end with, but I feel as though it’s a great starting point as well as a summary point. I haven’t seen this information encapsulated this way before, so here goes:

Comparison Table of MySQL Backup Methods

Method No Locking DDL Snapshot Remote Free All Engines All Tables Text File Recover
Corruption
# No
SELECT . . .
INTO OUTFILE
Engine
Dependent
No No Yes Yes Yes No Yes Yes 3-4
mysqldump No Option No Yes Yes Yes Yes Yes Yes 2-3
Replication Yes No Yes Yes Yes Yes Yes No No 3
OS level copy No Yes No No Yes No Yes No Yes 5
mysqlhotcopy Yes Yes Yes No Yes No Yes No Yes 3
InnoDB Hot
Backup
Yes Yes Yes No No No Yes No Yes 4

The table is set up so the “No” answers are ‘bad’ and the “Yes” answers are ‘good’. The last column contains the # of “No”s in the row. Right off the bat, we see why mysqldump is probably the most used backup tool — it has the fewest “No”s.

“No locking” — A backup routine should interfere as little as possible with the actual database. If you lock a row or table, even for just reading, you’re blocking others from writing it. Transactional engines like InnoDB and BDB offer transactions to help get around this, but many databases have a mixture of transactional and non-transactional storage engines.

“DDL” — Backups should be able to completely restore a database. If you do not have the Data Definition Language (ie, CREATE TABLE statements), you will not be able to completely restore a database.

“Snapshot” — To use a backup for point-in-time recovery or to build a new replication slave, a snapshot is needed. It is possible to do a point-in-time recovery without a snapshot, but it involves checking the binary logs for possible duplicate statements.

“Remote” — Does the backup need to be run on the OS where the server lies? Due to security in companies or the use of an ISP, it is not always possible for the DBA to have the level of access needed to run programs on the database server.

“Free” — Self-explanatory. Note that the backup options that are free come packaged with MySQL, with the exception of OS-level copy, which is in all OS systems (ignoring embedded systems for now).

“All Engines” — Does the backup option deal with all storage engines? Some tools or commands are engine-specific.

“All Tables” — Can the tool easily backup all the tables without external looping code?

“Text file” — A text file backup is advantageous for a two reasons: Corruption can be easily detected, partial backups (ie, of one table) can be done, and SQL can be standardized for migration. Text files are usually larger than data files, but compression of text is excellent.

“Recover Corruption” — A “Yes” here means the backup option can be used to recover from corruption.

A brief discussion about the options:

SELECT . . . INTO OUTFILE is not used often, because it does not copy DDL. Code is needed to loop through tables, and there will be table-level locking for MyISAM tables. If the SELECT statement is put into a transaction, there will not be a problem for BDB and InnoDB tables.

In MySQL 5.0 it is possible to copy the information about a table from the INFORMATION_SCHEMA database by using SELECT . . .
INTO OUTFILE
. I have not tested if inserting data into the INFORMATION_SCHEMA tables is an equal and complete substitution for DDL. Instinct says “no”.

It’s my opinion that mysqldump is the most widely used backup tool. It has the fewest “No”s of all of them, and the features it does not have can be worked around.

Note that mysqldump does do DDL by default, but it’s easy to get an incomplete DDL statement. If you turn off –opt, and don’t put –create-options, you will end up with CREATE TABLE statements that are standard SQL. This may sound good, but remember that storage engines are MySQL specific, so your CREATE TABLE statements will end up using the default storage engine, which may not be what you want upon restoration.

mysqldump can do a snapshot if the option to lock all the tables is used. However, it is usually not feasible to lock every database on the server while the backup is running, because a backup can take minutes.

Replication is also widely used, but it requires another instance. And to reduce single points of failure, that means more hardware needs to be bought. It may be difficult to justify hardware simply to do nothing unless a backup is needed.

One of the features of replication is also a detriment. The standard way to set up a replication slave is to take a snapshot of the master, import it to the slave, and then use binary logs from there. Another way is to use LOAD DATA FROM MASTER, but that only works for MyISAM tables. It is possible to alter tables and data on the slave before or during replication, without replication failing. This is a handy feature for having a write-only master with InnoDB tables and a read-only slave with MyISAM tables (for example, with fulltext searching). However, this also means that the DDL backup is not necessarily to be trusted.

That mistrust can be extended to the data for the same reasons. Replication may not be a reliable backup if DML is being run on the slave server. As well, if the master becomes corrupt, it is likely that corruption will spread to the replication server.

The two most widely-used storage engines, MyISAM and InnoDB, are OS independent (explanation in the MySQL manual for MyISAM and InnoDB). This means that they can simply be copied on the OS-level. The biggest gotcha is that in order to get a snapshot, the entire database has to be locked. And unlike mysqldump, there is no option to an OS-level copy that will lock the tables. Using this is bulkier than mysqldump , and it cannot be used for all storage engines as mysqldump can.

Two solutions have been devised to combat those problems — mysqlhotcopy and InnoDB Hot Backup. mysqlhotcopy is for MyISAM tables while InnoDB Hot Backup is for InnoDB tables. The other difference is that the InnoDB Hot Backup is not free. However, InnoDB Hot Backup does offer a free perl script that will use InnoDB Hot Backup to take a snapshot of the InnoDB tables, and then use MySQL commands to get a snapshot of the other engine types.

Most people will choose a combination of backup methods — many use both replication and mysqldump, or replication and mysqlhotcopy and InnoDB Hot Backup. There is no clear-cut answer, and mostly it depends on your environment and what the backup will be used for.

Ideally there would be one tool that performed all of the functions listed in the chart. The chart is not complete, either — just a listing of my opinion of the most common desired features. There are other desired features not listed, for example replication is an “immediate hot backup” whereas all the other tools require manually importing or starting the MySQL server with new data files, etc. I am definitely interested in what folks have to say on the subject.

10 thoughts on “Backups”

  1. It interesting and a great resource but two issue was there
    * single-transaction also avoids locking issues. As you said, it only works for InnoDB tables. If you want your MyISAM tables to be consistent, you have to use –lock-tables. However, –lock-tables and –single-transaction are mutually exclusive.
    * Article does not cover ” What to Back UP,and When” .

    Petter Smith
    http://www.hdrconline.com

Comments are closed.