Liveblogging: Architecture of Maria By Monty Widenius

Architecture of Maria: A New Storage Engine with a Transactional Design

Goals of Maria:

  • ACID compliant
  • MVCC, transactional
  • default non-transactional and default transactional storage engine for MySQL
  • MyISAM replacement, including temporary table use
  • Storage engine good for data warehousing.
  • Allow any length transactions to take place
  • all indexes should have equal speed (clustered indexes are not in the plan)
  • log shipping — incremental backups just by copying the logs
  • used as a standalone library
  • fast count(*)
  • allow copying of Maria tables between different Maria servers
  • Better blob handling (than MyISAM) — no memory copying, or extra memory used for blobs on INSERT/UPDATE
  • BLOBs are allocated in big sequential blocks means that there is less fragmentation over time.
  • BLOBs are stored so Maria can be easily extended to have access to any part of a BLOB with a single fetch in the future

Maria is actually faster than MyISAM.
Will support READ COMMITTED and REPEATABLE READ (was a specific question).

“You should be able to upgrade Maria and have it just work, without ever having to backup and restore as part of the upgrade.”

Why do we need 2 storage engines, Maria and Falcon?
More storage engines = more scenarios to be able to handle. Falcon good for lots of memory, and shorter queries, Maria is to take care of long transactions and data warehousing side, and also to be an overall “good” engine for most purposes (at least as good as InnoDB).

Project plan — MySQL 5.1 and higher. Maria strives to be bug-free.

Maria 1.0 = “crash safe”, released in Jan 2008. Cacheable row format.
Maria 1.5 = “concurrent insert/select”, Apr/May 2008. Will be merged into the MySQL 6.0 release. Non-transactional.

  • External/internal row CHECK/REPAIR/compressing
  • Different row formats — fixed size, dynamic length, compressed, page
  • Fast checksum of a table
  • maria_check

MySQL 6.0 (includes Maria)
Maria 2.0 = “Transactional and ACID” alpha, Q4 2008
Maria 3.0 = “High concurrency + Online backup” alpha, Q1 2009; GA, Q2 2009

Advantages of Maria compared to MyISAM

  • Data, index and statistics are crash safe. On crash, everything will rollback to the state of the start of the statement or last LOCK TABLES commands.
  • Can replay everything from the log. Still need to backup the .frm files though.
  • LOAD INDEX can skip index blocks for unwanted indexes
  • most parts of Maria have unit tests.
  • Supports all MyISAM formats and a new PAGE format, where data is stored in pages — this is:

    • Only crashsafe, transactional row format for Maria
    • Versioned, so you can have concurrent insert/select
    • Cached by page cache so will give noticeable speed improvement on systems with poor data caching

  • MyISAM is non-transactional only, Maria can support transactional (crash-safe) and non-transactional formats
  • Maria uses big log files by default — 1G
  • Log control file = maria_log_control
  • Log files = maria_log.?????
  • If a log file is already in use by a transaction, create a new one, otherwise, reuse it. SHOW LOGS will show which logfiles exist, which ones are in use, and you can PURGE the logs.
  • Maria uses 8K pages by default (MyISAM uses 1K pages by default. This should be faster on static size indexes but slower on variable length keys, until Maria 2.0 where there will be a directory for index pages.

Before Apr/May release of Maria 1.5

  • Statistics and indexes versioned (data is already versioned)
  • FULLTEXT (boolean mode) and R-TREE indexes need to be made crash-safe
  • Parallel recovery

Disadvantages, not likely to be fixed

  • No external locking
  • One page size for both index and data
  • only one page cache
  • index number and checksums require an extra 5 bytes per page
  • doesn’t support RAID
  • Minimum data file size for BLOCK format is 16K, with 8K pages
  • Storage of rows < 25 bytes are not efficient for the PAGE format

XDB indexes — group records, and get the min and max for each group (groups are not sorted), and save that in a separate block. This means that stats like min and max are very fast. This is something that will help data warehousing queries.


Best quote from Monty: “You don’t steal ideas in open source. You respect people.”

Comments are closed.