InnoDB logfiles

The unsung heroes of InnoDB are the logfiles. They are what makes InnoDB automatic crash recovery possible.

Database administrators of other DBMS may be familiar with the concept of a “redo” log. When data is changed, affected data pages are changed in the innodb_buffer_pool. Then, the change is written to the redo log, which in MySQL is the InnoDB logfile (ib_logfile0 and ib_logfile1). The pages are marked as “dirty”, and eventually get flushed and written to disk.

If MySQL crashes, there may be data that is changed that has not been written to disk. Those data pages were marked as “dirty” in the innodb_buffer_pool, but after a MySQL crash the innodb_buffer_pool no longer exists. However, they were written to the redo log. On crash recovery, MySQL can read the redo log (InnoDB log files) and apply any changes that were not written to disk.

That is the basic functionality of the InnoDB log files. Given this, let’s look at some of the different parameters and their ramifications.

innodb_log_files_in_group is set with a default of 2. The logfiles are written in a circular manner — ib_logfile0 is written first, and when it has reached its maximum size, then ib_logfile1 will be written to.

innodb_log_file_size is the size of each log file in the log group. The total, combined size of all the log files has to be less than 4 Gb (according to the MySQL manual). Because the logfiles contain changes in the buffer pool that have not been written to disk, the total, combined size of all the log files should not be more than the innodb_buffer_pool_size.

If all the log files in the group are full of changes that have not been written to disk, MySQL will start to flush dirty pages from the InnoDB buffer pool, writing the changes to disk. If the log files are small, changes will be written to disk more often, which can cause more disk I/O.

When InnoDB does a crash recovery, it reads the log files. If the log files are large, it will take longer to recover from a crash. If innodb_fast_shutdown is set to 0, the log files are purged when MySQL shuts down — larger files mean a longer shutdown time. The default for innodb_fast_shutdown is 1, which means that the log files are not purged before a shutdown. Starting in MySQL 5.0.5, you can set it to 2, which simulates a crash, and at the next startup InnoDB will do a crash recovery.

innodb_flush_log_at_trx_commit controls how often the log files are written to. A value of 0 causes the log files to be written and flushed to disk once per second. The default is 1, which causes the log buffer to be written and flushed to disk after every transaction commit. The value can also be set to 2, which causes the log buffer to be written after every transaction commit and flushes the log files to disk once per second. A value of 2 means that MySQL might think that some changes are written to the log file, but do not persist in the log file after an operating system crash, because the log file was not flushed to disk before a crash.

Note that some filesystems are not honest about flushing to disk, so even though you may have the default value of 1, your system may be acting as if it has a value of 2. Setting this parameter to 2 means that there will be less I/O, at the cost of not being able to recover data from a crash.

innodb_flush_method changes how InnoDB opens and flushes data and log files. See the manual for details; the end result is a tradeoff in I/O performance versus whether or not an operating system crash would leave the InnoDB log files in an inconsistent state.

innodb_log_buffer_size is the write buffer for InnoDB log files. The larger the buffer is, the less often the log files are written to. This can save I/O.

The unsung heroes of InnoDB are the logfiles. They are what makes InnoDB automatic crash recovery possible.

Database administrators of other DBMS may be familiar with the concept of a “redo” log. When data is changed, affected data pages are changed in the innodb_buffer_pool. Then, the change is written to the redo log, which in MySQL is the InnoDB logfile (ib_logfile0 and ib_logfile1). The pages are marked as “dirty”, and eventually get flushed and written to disk.

If MySQL crashes, there may be data that is changed that has not been written to disk. Those data pages were marked as “dirty” in the innodb_buffer_pool, but after a MySQL crash the innodb_buffer_pool no longer exists. However, they were written to the redo log. On crash recovery, MySQL can read the redo log (InnoDB log files) and apply any changes that were not written to disk.

That is the basic functionality of the InnoDB log files. Given this, let’s look at some of the different parameters and their ramifications.

innodb_log_files_in_group is set with a default of 2. The logfiles are written in a circular manner — ib_logfile0 is written first, and when it has reached its maximum size, then ib_logfile1 will be written to.

innodb_log_file_size is the size of each log file in the log group. The total, combined size of all the log files has to be less than 4 Gb (according to the MySQL manual). Because the logfiles contain changes in the buffer pool that have not been written to disk, the total, combined size of all the log files should not be more than the innodb_buffer_pool_size.

If all the log files in the group are full of changes that have not been written to disk, MySQL will start to flush dirty pages from the InnoDB buffer pool, writing the changes to disk. If the log files are small, changes will be written to disk more often, which can cause more disk I/O.

When InnoDB does a crash recovery, it reads the log files. If the log files are large, it will take longer to recover from a crash. If innodb_fast_shutdown is set to 0, the log files are purged when MySQL shuts down — larger files mean a longer shutdown time. The default for innodb_fast_shutdown is 1, which means that the log files are not purged before a shutdown. Starting in MySQL 5.0.5, you can set it to 2, which simulates a crash, and at the next startup InnoDB will do a crash recovery.

innodb_flush_log_at_trx_commit controls how often the log files are written to. A value of 0 causes the log files to be written and flushed to disk once per second. The default is 1, which causes the log buffer to be written and flushed to disk after every transaction commit. The value can also be set to 2, which causes the log buffer to be written after every transaction commit and flushes the log files to disk once per second. A value of 2 means that MySQL might think that some changes are written to the log file, but do not persist in the log file after an operating system crash, because the log file was not flushed to disk before a crash.

Note that some filesystems are not honest about flushing to disk, so even though you may have the default value of 1, your system may be acting as if it has a value of 2. Setting this parameter to 2 means that there will be less I/O, at the cost of not being able to recover data from a crash.

innodb_flush_method changes how InnoDB opens and flushes data and log files. See the manual for details; the end result is a tradeoff in I/O performance versus whether or not an operating system crash would leave the InnoDB log files in an inconsistent state.

innodb_log_buffer_size is the write buffer for InnoDB log files. The larger the buffer is, the less often the log files are written to. This can save I/O.