The Care and Feeding of MySQL Tables

Our site went from weekly crashes during our two busiest nights to not even peeping this week (during the two busiest nights), and the only thing we changed was that we did some table maintenance. We hadn’t done table maintenance at least as long as I’ve been around, which is 6 months. We are a site with high volumes of both reads and writes. This article will talk about the care and feeding of tables; feel free to use this for justification to have a maintenance window, or even permission to run table maintenance statements.

MySQL uses a cost-based optimizer to best translate the written query into what actually happens. This means when you write:

SELECT foo FROM t1 INNER JOIN t2 USING (commonField);

The optimizer looks at the statistics for tables t1 and t2 and decides which is better:
1) To go through each item in t1, looking for a matching “commonField” in t2
or
2) To go through each item in t2, looking for a matching “commonField” t1

If t1 is very large and t2 is very small, it makes sense to follow plan #2. This is a simplified example, of course.

From the documentation:
ANALYZE TABLE analyzes and stores the key distribution for a table. During the analysis, the table is locked with a read lock. This statement works with MyISAM, BDB, and InnoDB tables.

If the key distribution is off, the optimizer will be using incorrect (out-of-date) information. Therefore, the optimizations it makes will not be…well, optimal.

ANALYZE TABLE takes a very short amount of time — less than a second for even a million rows. I tested with InnoDB and MyISAM, but I’d guess that BDB is the same. Our database of 14G took less than a minute to analyze all 112 tables in 3 datases.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

CHECK TABLE check tables and views for incorrectly closed tables, incorrect or deleted links, and verified checksums for the rows. It can also check for full consistency.

This takes a bit — checking our tables for everything (but not checking for full consistency, as it takes longer) took 11 minutes (14G, 112 tables in 3 databases). Next month I will run a CHECK EXTENDED and see how long this takes.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/check-table.html

And the daddy of them all:
OPTIMIZE TABLE can be used on MyISAM, BDB and InnoDB tables. In MyISAM tables, it repairs deleted or split rows, updates index statistics, and sorts the index pages. For InnoDB and BDB, OPTIMIZE TABLE maps to ALTER TABLE and just rebuilds the index, thereby getting rid of defragmentation, corruption and incorrect statistics.

Documentation:
http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

This took 36 minutes on our (14G, 112 tables in 3 databases) server.

From the documentation:
In most setups, you need not run OPTIMIZE TABLE at all. Even if you do a lot of updates to variable-length rows, it is not likely that you need to do this more than once a week or month and only on certain tables.

This morning, we backed up our data. Then we ran ANALYZE TABLE on all of our tables first. Sure, OPTIMIZE TABLE performs the same function, but OPTIMIZE TABLE takes a long time, and we wanted the benefit of ANALYZE TABLE right away. Plus, if anything failed, at least the table’s index statistics are up-to-date. Then we ran CHECK TABLE, and then OPTIMIZE TABLE on each table. We did this while running live, and as I said, the entire process took less than an hour.

Actually, it took 40 minutes, because once the script I had running to CHECK TABLE was halfway through, I started the OPTIMIZE TABLE script. I specifically set the tables to run through in alphabetical order, so there was no chance of the scripts trying to run on the same table. I will not do that again, I will just run them serially for safety’s sake.

Our site went from weekly crashes during our two busiest nights to not even peeping this week (during the two busiest nights), and the only thing we changed was that we did some table maintenance. We hadn’t done table maintenance at least as long as I’ve been around, which is 6 months. We are a site with high volumes of both reads and writes. This article will talk about the care and feeding of tables; feel free to use this for justification to have a maintenance window, or even permission to run table maintenance statements.

MySQL uses a cost-based optimizer to best translate the written query into what actually happens. This means when you write:

SELECT foo FROM t1 INNER JOIN t2 USING (commonField);

The optimizer looks at the statistics for tables t1 and t2 and decides which is better:
1) To go through each item in t1, looking for a matching “commonField” in t2
or
2) To go through each item in t2, looking for a matching “commonField” t1

If t1 is very large and t2 is very small, it makes sense to follow plan #2. This is a simplified example, of course.

From the documentation:
ANALYZE TABLE analyzes and stores the key distribution for a table. During the analysis, the table is locked with a read lock. This statement works with MyISAM, BDB, and InnoDB tables.

If the key distribution is off, the optimizer will be using incorrect (out-of-date) information. Therefore, the optimizations it makes will not be…well, optimal.

ANALYZE TABLE takes a very short amount of time — less than a second for even a million rows. I tested with InnoDB and MyISAM, but I’d guess that BDB is the same. Our database of 14G took less than a minute to analyze all 112 tables in 3 datases.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

CHECK TABLE check tables and views for incorrectly closed tables, incorrect or deleted links, and verified checksums for the rows. It can also check for full consistency.

This takes a bit — checking our tables for everything (but not checking for full consistency, as it takes longer) took 11 minutes (14G, 112 tables in 3 databases). Next month I will run a CHECK EXTENDED and see how long this takes.

Documentation: http://dev.mysql.com/doc/refman/5.0/en/check-table.html

And the daddy of them all:
OPTIMIZE TABLE can be used on MyISAM, BDB and InnoDB tables. In MyISAM tables, it repairs deleted or split rows, updates index statistics, and sorts the index pages. For InnoDB and BDB, OPTIMIZE TABLE maps to ALTER TABLE and just rebuilds the index, thereby getting rid of defragmentation, corruption and incorrect statistics.

Documentation:
http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

This took 36 minutes on our (14G, 112 tables in 3 databases) server.

From the documentation:
In most setups, you need not run OPTIMIZE TABLE at all. Even if you do a lot of updates to variable-length rows, it is not likely that you need to do this more than once a week or month and only on certain tables.

This morning, we backed up our data. Then we ran ANALYZE TABLE on all of our tables first. Sure, OPTIMIZE TABLE performs the same function, but OPTIMIZE TABLE takes a long time, and we wanted the benefit of ANALYZE TABLE right away. Plus, if anything failed, at least the table’s index statistics are up-to-date. Then we ran CHECK TABLE, and then OPTIMIZE TABLE on each table. We did this while running live, and as I said, the entire process took less than an hour.

Actually, it took 40 minutes, because once the script I had running to CHECK TABLE was halfway through, I started the OPTIMIZE TABLE script. I specifically set the tables to run through in alphabetical order, so there was no chance of the scripts trying to run on the same table. I will not do that again, I will just run them serially for safety’s sake.

Comments are closed.