Mark Callaghan speaks at the New England Database Summit about how data manageability is more important than performance.
Peak performance is thrown out; the 95th-98th percentile is what matters. Variance shouldn't be large.
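A toy illustration of the point (all numbers made up): one stall per hundred queries barely moves the mean, but it dominates the 99th percentile.

```python
import math

# 98 fast queries and 2 stalled ones: the mean looks fine, the tail does not.
samples_ms = [1] * 98 + [250] * 2

def percentile(data, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * N)."""
    ordered = sorted(data)
    return ordered[math.ceil(pct / 100 * len(ordered)) - 1]

print(f"mean: {sum(samples_ms) / len(samples_ms):.1f} ms")  # ~6 ms, looks fine
print(f"p95:  {percentile(samples_ms, 95)} ms")             # 1 ms, still fine
print(f"p99:  {percentile(samples_ms, 99)} ms")             # 250 ms, the stall shows
```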
Data manageability is measured as the rate of interrupts per server for the operations team. The rate of server growth is much bigger than the rate of new hires for the systems teams. A lot of the db team is from the University of Wisconsin-Madison!
Why MySQL? Because it was there when he came. Mark and his ops/engineering peers made it scale 10x. He likes MySQL for OLTP; InnoDB is "an amazing piece of software."
They can get 500,000 qps using a cached workload, which is on par with memcached.
What Facebook really does is OLTP for the social graph. The workload: secondary indexes, index-only queries, small joins (though most queries use one table), multi-row transactions; the majority of the workload does not need SQL or the optimizer. They do both physical and logical backups.
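As a concrete illustration of the index-only pattern (the table, columns, and index below are hypothetical, not Facebook's actual schema):

```python
# Hypothetical schema illustrating an index-only query: the secondary index
# ix_friend_created contains every column the query touches, so InnoDB can
# answer it from the index alone, without reading base rows.
SCHEMA = """
CREATE TABLE friend_edge (
    user_id   BIGINT NOT NULL,
    friend_id BIGINT NOT NULL,
    created   INT    NOT NULL,
    PRIMARY KEY (user_id, friend_id),
    KEY ix_friend_created (friend_id, created)
);
"""

# EXPLAIN reports "Using index" for this: the WHERE column, the ORDER BY
# column, and the selected columns are all in ix_friend_created.
INDEX_ONLY_QUERY = """
SELECT friend_id, created
  FROM friend_edge
 WHERE friend_id = 42
 ORDER BY created DESC
 LIMIT 50;
"""
```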
Most of this does not require SQL [blogger's note – they built Cassandra]. Why is the grass greener on the other side? Automated replacement of failed nodes, less downtime on schema changes and/or fewer schema changes, multi-master, better compression, etc.
Circa 2010: 13 million queries per second, 4 ms reads, 5 ms writes, 38 GB peak network traffic per second, etc.
Why so many servers? Big data and a high query rate. They add servers to add IOPS, so they're interested in compression and flash, which get them more IOPS. If they do remain on disk, write-optimized databases are interesting too. About 10 people are on the db team, which is very small for a company that size.
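A back-of-the-envelope version of that capacity math (every number below is made up for illustration):

```python
# If servers are added for IOPS rather than CPU, anything that raises the IOPS
# per server (flash) or lowers the IOPS needed (compression) shrinks the fleet.
required_iops = 2_000_000
disk_iops_per_server = 1_000      # a handful of spindles
flash_iops_per_server = 20_000    # flash changes the math entirely

print(required_iops // disk_iops_per_server)   # 2000 servers on disk
print(required_iops // flash_iops_per_server)  # 100 servers on flash
```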
How to scale MySQL? Fix stalls to make use of capacity, and improve efficiency to use fewer queries and less data. Fixing stalls doesn't make MySQL faster; it makes it less slow.
[blogger’s note – I stopped taking notes here because this is a rehash of the “How Facebook Does MySQL” talk that has been done over and over…]
[restarted when he started talking about data manageability again]
How Facebook got its data manageable.
pylander – sheds load during a query pileup: kills duplicate queries and limits the number of queries from specific accounts. A take-off on Highlander: there can be only one.
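Not the real pylander (that tool is internal), but a minimal sketch of the duplicate-killing idea, using the real PyMySQL client with placeholder connection details:

```python
import collections
import pymysql

def kill_duplicate_queries(conn, min_dupes=2):
    """KILL all but the longest-running copy of each identical in-flight query."""
    with conn.cursor() as cur:
        cur.execute("SHOW FULL PROCESSLIST")
        # Row layout: Id, User, Host, db, Command, Time, State, Info
        rows = [r for r in cur.fetchall() if r[7]]  # Info is None for idle threads
    by_sql = collections.defaultdict(list)
    for row in rows:
        by_sql[row[7]].append(row)                  # group by exact SQL text
    for sql, dupes in by_sql.items():
        if len(dupes) < min_dupes:
            continue
        dupes.sort(key=lambda r: r[5], reverse=True)  # oldest copy first
        for row in dupes[1:]:                       # keep one, shed the rest
            with conn.cursor() as cur:
                cur.execute(f"KILL {int(row[0])}")

conn = pymysql.connect(host="db-host", user="admin", password="***")
kill_duplicate_queries(conn)
```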
dogpile – collects data during a query pileup: grabs perf counters and the list of running queries, then generates an HTML page with the interesting results.
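The real dogpile is internal too, but a minimal version of the same snapshot-and-report idea might look like this (PyMySQL and the output path are my assumptions):

```python
import html
import pymysql

def snapshot_to_html(conn, out_path="/tmp/pileup_report.html"):
    """Dump status counters and the running-query list into a simple HTML page."""
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL STATUS")        # server-wide perf counters
        counters = cur.fetchall()
        cur.execute("SHOW FULL PROCESSLIST")     # what is running right now
        queries = cur.fetchall()
    with open(out_path, "w") as f:
        f.write("<h1>MySQL pileup snapshot</h1><h2>Status counters</h2><table>")
        for name, value in counters:
            f.write(f"<tr><td>{html.escape(name)}</td>"
                    f"<td>{html.escape(str(value))}</td></tr>")
        f.write("</table><h2>Running queries</h2><pre>")
        for row in queries:
            f.write(html.escape(str(row)) + "\n")
        f.write("</pre>")

snapshot_to_html(pymysql.connect(host="db-host", user="admin", password="***"))
```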
Online schema change tool, for frequent schema changes, especially adding indexes. It briefly locks the table to set up triggers that track changes, copies the data to a new table with the desired schema, replays the tracked changes onto the new table, then briefly locks the table again to rename the new table into place.
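The SQL skeleton of that flow might look roughly like this; the names are invented, and the production tool also handles chunking, retries, and conflict cases. Each numbered comment matches a step in the prose above.

```python
OSC_SKETCH = """
-- 1. Briefly lock `t` and install triggers that log every concurrent change.
CREATE TABLE t_deltas (/* change-log columns elided */);
CREATE TRIGGER t_osc_ins AFTER INSERT ON t
    FOR EACH ROW INSERT INTO t_deltas VALUES (/* ... */);
-- (matching AFTER UPDATE / AFTER DELETE triggers elided)

-- 2. Create the shadow table with the desired schema, e.g. a new index.
CREATE TABLE t_new LIKE t;
ALTER TABLE t_new ADD INDEX ix_new (some_col);

-- 3. Copy existing rows across, chunk by chunk, while traffic continues.
INSERT INTO t_new SELECT * FROM t WHERE pk BETWEEN 1 AND 10000;

-- 4. Replay the logged deltas onto t_new, repeating until caught up.

-- 5. Briefly lock again and atomically swap the tables.
RENAME TABLE t TO t_old, t_new TO t;
"""
```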
Manageability is a work in progress — working on:
– Make InnoDB compression work for OLTP
– Faker – a prefetching tool for replication slaves. The replay workload is: read a page, make some modification, write the page. The bottleneck can be disk reads: replay is done by a single thread, while transactions on the master run concurrently. Faker uses multiple threads to replay upcoming transactions in "fake changes" mode – no undo, no rollback, read-only – fetching into the buffer pool the pages each transaction will need. It captures about 70% of replication disk reads; they're working on fixes to get that up to 80-90%. (A sketch of the idea follows this list.)
– Auto replacement – automatically replace failed and unhealthy MySQL servers.
– Auto resharding – sharding is easy, re-sharding is hard (a miniature example follows below).
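Faker itself is implemented inside InnoDB, but the prefetch idea can be approximated in userland, much as the older mk-slave-prefetch tool did: rewrite statements the replay thread will soon execute into SELECTs and run them from a thread pool, so the pages are already warm when the single replay thread arrives. A minimal sketch, with `run_query` left as an injected callable:

```python
import re
from concurrent.futures import ThreadPoolExecutor

def to_prefetch_select(stmt):
    """Rewrite an UPDATE/DELETE into a SELECT that touches the same pages."""
    m = re.match(r"UPDATE\s+(\w+)\s+SET\s+.+?\s+(WHERE\s+.+)", stmt, re.I | re.S)
    if m:
        return f"SELECT 1 FROM {m.group(1)} {m.group(2)}"
    m = re.match(r"DELETE\s+FROM\s+(\w+)\s+(WHERE\s+.+)", stmt, re.I | re.S)
    if m:
        return f"SELECT 1 FROM {m.group(1)} {m.group(2)}"
    return None  # other statement types are skipped in this sketch

def prefetch(upcoming_statements, run_query, workers=8):
    """Warm the buffer pool by running the rewritten SELECTs concurrently.

    run_query(sql) should execute one query on its own connection.
    """
    selects = (s for s in map(to_prefetch_select, upcoming_statements) if s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Results are discarded; the side effect of reading the pages
        # into the buffer pool is the point.
        list(pool.map(run_query, selects))
```

And a miniature of why re-sharding is hard: with naive modulo placement, adding a single shard relocates most keys (consistent-hashing schemes exist precisely to shrink this number).

```python
# With hash(key) % num_shards placement, growing from 4 to 5 shards moves
# most keys; the 100,000 keys here are purely illustrative.
keys = range(100_000)
moved = sum(1 for k in keys if k % 4 != k % 5)
print(f"{moved / len(keys):.0%} of keys change shards")  # prints "80%"
```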
Open issues in manageability:
– Diagnose why one host is slow while others are not.
– …and some more.