Mozilla DB News Friday 21 September: The Calm Before the Storm

Next week I will be at Nagios World in St. Paul and then going directly to San Francisco for MySQL Connect and Oracle OpenWorld (I have ACE Director responsibilities for OOW), so this week was frantically finishing up a lot of projects before finalizing my talks. All but one of the MySQL 3rd quarter goals have been met (there are 2 Postgres goals that are still in progress, but our Postgres DBA is not going to conferences next week so there’s plenty of time for those.

And of course, there are the spring conferences to submit to. I have already submitted conference proposals to Confoo and RMOUG Training Days, and am working on proposals for SCALEx11 and Percona Live (calls for papers close Saturday October 12th!).

And somewhere in all that, the database team managed to get a bunch of stuff done this week, even though it was a short week for me (due to the Jewish New Year, and also my birthday!):

  • Tweaked our puppet configuration for our backup servers – our backup servers are running several instances of MySQL through sockets only, and some of the puppet configs assumed a running MySQL on port 3306.
  • Dealt with a Mozilla Labs experiment running crazy queries against Bugzilla, twice.
  • Assessed a problem with one of our development clusters, where several databases could not be dropped due to MySQL crashing. The undo log was corrupt on the master and one slave, but not another slave. We could not determine the cause of the corruption, though we believe a sync on Sept 6th caused the corruption to spread from the master to the one slave. The solution was to restore from the non-corrupt slave (which was also our backup slave, so that was pretty easy). We could have also set innodb_crash_recovery and exported and imported, but restoring from hot backup was faster.
  • Opened network flows to get slow query logs copied from the new version check databases for the Addons cluster, and updated SSH keys in puppet.
  • Upgraded one of our staging databases from MySQL 5.0 to Percona 5.1, added it to puppet, and converted it from innodb_file_per_table. (phew!)
  • Extracted data from Bugzilla about who is watching Mozilla Localizations components.
  • Debugged a DHCP issue on one of our Addons staging cluster machines.
  • Started compressing binary logs on our backup server as space was starting to get a bit low.
  • Debugged an issue where one of the machines in our Support dev/staging database cluster fell off the internet due to an IP conflict.
  • Replaced some text in a Bugzilla bug when the profanity filter failed to obscure some profanity.
  • Came up with a process to regularly extract data from an appliance that has MySQL embedded in it, and did a one-off of the process to get a database started.
  • Gave a developer an SQL backup of the production Buildbot database for analysis.

Next week is sure to be hectic, with two conferences. Happy autumn for those in the Northern Hemisphere, and happy spring for those in the Southern Hemisphere!

Next week I will be at Nagios World in St. Paul and then going directly to San Francisco for MySQL Connect and Oracle OpenWorld (I have ACE Director responsibilities for OOW), so this week was frantically finishing up a lot of projects before finalizing my talks. All but one of the MySQL 3rd quarter goals have been met (there are 2 Postgres goals that are still in progress, but our Postgres DBA is not going to conferences next week so theres plenty of time for those.

And of course, there are the spring conferences to submit to. I have already submitted conference proposals to Confoo and RMOUG Training Days, and am working on proposals for SCALEx11 and Percona Live (calls for papers close Saturday October 12th!).

And somewhere in all that, the database team managed to get a bunch of stuff done this week, even though it was a short week for me (due to the Jewish New Year, and also my birthday!):

  • Tweaked our puppet configuration for our backup servers our backup servers are running several instances of MySQL through sockets only, and some of the puppet configs assumed a running MySQL on port 3306.
  • Dealt with a Mozilla Labs experiment running crazy queries against Bugzilla, twice.
  • Assessed a problem with one of our development clusters, where several databases could not be dropped due to MySQL crashing. The undo log was corrupt on the master and one slave, but not another slave. We could not determine the cause of the corruption, though we believe a sync on Sept 6th caused the corruption to spread from the master to the one slave. The solution was to restore from the non-corrupt slave (which was also our backup slave, so that was pretty easy). We could have also set innodb_crash_recovery and exported and imported, but restoring from hot backup was faster.
  • Opened network flows to get slow query logs copied from the new version check databases for the Addons cluster, and updated SSH keys in puppet.
  • Upgraded one of our staging databases from MySQL 5.0 to Percona 5.1, added it to puppet, and converted it from innodb_file_per_table. (phew!)
  • Extracted data from Bugzilla about who is watching Mozilla Localizations components.
  • Debugged a DHCP issue on one of our Addons staging cluster machines.
  • Started compressing binary logs on our backup server as space was starting to get a bit low.
  • Debugged an issue where one of the machines in our Support dev/staging database cluster fell off the internet due to an IP conflict.
  • Replaced some text in a Bugzilla bug when the profanity filter failed to obscure some profanity.
  • Came up with a process to regularly extract data from an appliance that has MySQL embedded in it, and did a one-off of the process to get a database started.
  • Gave a developer an SQL backup of the production Buildbot database for analysis.

Next week is sure to be hectic, with two conferences. Happy autumn for those in the Northern Hemisphere, and happy spring for those in the Southern Hemisphere!

MySQL Connect: A Guide for DBAs

Last week I posted a MySQL Connect Guide for Developers. This week I am focusing on DBAs. The conference is about 2/3 administration/maintenance talks and about 1/3 development, with some overlap of course. Gerry and I did a lot of recommendations in OurSQL Episode 104, but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So here’s a guide to MySQL Connect for administrators, with times. Note that these are handpicked from what I think administrators would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am – 12:30 pm
There is a session if you want to learn What’s New In MySQL 5.6. Everyone thinks that there will be a new 5.6 release out (though we are all wondering if it will be a DMR, beta or release candidate release), so this will be a great session to go to, to learn about any new features released.

MySQL Optimizer Overview by Olav Sandstå. Get in depth as to how the optimizer works, so you have the knowledge to tune your server and queries.

There’s also Ronald Bradford’s session on Lessons from Managing 500+ MySQL Instances. Ronald always has great tips and tricks to make administering MySQL less painful and avoid problems.

1:00 – 2:00 pm
If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

Even if you are not a beginner, you are sure to learn some great Replication Tips and Tricks from Mats Kindahl. Mats will present a bag of useful tips and tricks related to the MySQL 5.5 GA and MySQL 5.6 development milestone releases, including multisource replication, using logs for auditing, handling filtering, examining the binary log, using relay slaves, splitting the replication stream, and handling failover.

I am personally excited for Rick’s Rules of Thumb by Rick James. Rick is always a great speaker and I learn so much from him!

2:30 – 3:30 pm
Backups are the single most important maintenance tool for MySQL. Hema Sridharam and Svetlana Smirnova present Save your Data: How to Make MySQL Backups. There are several tools to perform MySQL backups, including mysqldump, Oracle’s MySQL Enterprise Backup, third-party applications, and OS methods.

Henrik Ingo will speak about Evaluating MySQL High-Availability Alternatives, including replication, MySQL Cluster, DRBD, Tungsten and Galera. He will speak about the trade-offs of each method and why you might want to use each one, so you can decide what’s best for your environment.

And of course there’s Peter Zaitsev’s Optimizing MySQL Configuration, which is not-to-be missed!

4 – 5 pm
I am personally interested in Patrick Galbraith’s Database Resources On Demand, covering the concept of DBaaS, how an organization can use it, and what it means for DBAs for management and for developers in how they use database resources. Among the other topics it addresses is how open source technologies such as OpenStack provide an infrastructure that can be used in DBaaS.

Lately, solid-state disk drives have been getting a lot of attention. Vadim Tkachenko will speak about MySQL and Solid-State Drives: Usage and Tuning, covering SSD internals and how they affect database performance.

If you want to get your hands dirty with MySQL Cluster, join the Hands-on Labs by Santo Leto called Get Started with MySQL Cluster, where you will learn by doing: install, configure, administer, and access MySQL Cluster.

5:30 – 6:30 pm
If you want to know what replication features are coming up in MySQL 5.6, make sure to check out Lars Thalmann talking about Enabling the New Generation of Web and Cloud Services with MySQL 5.6 Replication. This session showcases the new replication features, including group commit and multithreaded slave for high performance, crash-safe slaves and failover utilities for high availability, global transaction identifiers and annotated row-based replication [RBR] for flexibility/usability, and event checksums for data integrity.

I have to promote my own session! I’m presenting Google-Hacking MySQL, which takes an in-depth look at using white-hat Google hacking techniques to show you what the “bad guys” can do. White-hat google hacking is the good kind of hacking, where you have permission. You will learn about the following hacking strategies and how they are done: SQL injection, cross-site scripting (XSS), cross-site request forgery (CSRF), gateway vulnerabilities, and social engineering—all without violating Google’s terms of service.

If you want to migrate your systems to MySQL, there is a presentation by Sergio Andres De La Cruz Rodriguez about Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

6:30 – 8:30 pm – MySQL Connect Reception in the Continental Ballroom

And there are Birds of a Feather session (BoFs) too!

Sunday, September 30th
8:30 am – 9:30 am
MySQL Perspectives Keynote featuring Twitter‘s DB Team manager Jeremy Cole, PayPal‘s Chief Architect Daniel Austin, Verizon Wireless‘ IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well-worth having to wake up early to get to the venue at 8:30 am.

There will be a special panel after that, but it has not been announced yet, so it looks like there is a hole in the schedule (but there is no hole!)

10:15 am – 11:15 am
Most of us have a shortage of DBAs. If you are in that situation, you probably have more money than time on your hands, so check out Rob Young’s talk on Optimizing Security, Performance, and Availability with MySQL Enterprise Edition. Yes, MySQL Enterprise Edition costs money, but it is much easier to buy it than to hire a top-notch DBA, which are rare in this world.

If you are a beginner DBA, attend the Hands on Lab entitled Focus on MySQL Replication, taught by Sven Sandberg and Luis Soares During this hands-on lab, you will learn how to get started, how replication works, and the best practices and tools. You will also learn about architecture, advanced replication configurations and some of the new features in the MySQL 5.6 development milestone releases. This session goes until 12:45, so you have a good 2.5 hours for the hands-on work.

Calvin Sun talks about Better Availability with InnoDB Online Operations, including online operations for schema changes such as add index, drop foreign key, and rename column.

11:45 – 12:45
Linas Virbalas and Robert Hodges of Continuent talk about Replicating from MySQL to Oracle Database and Back Again.

If you are into MySQL Security, Joro Kodinov will present MySQL Security: Past and Present. Since the description includes MySQL 5.6 security features, I would have called it “Past, Present and Future”.

Thinking of deploying, or already deployed, Galera? Then do not miss Seppo Jaakola’s talk on Galera Cluster Best Practices.

1:15 – 2:15 pm
Inaam Rana presents InnoDB Performance Tuning, which includes the newer features in MySQL 5.5 and the upcoming features in MySQL 5.6, including unique InnoDB architectural elements for performance and how to tune InnoDB to achieve better performance.

There is a world of tools designed to help make MySQL administration easier, so check out Charles Bell’s hands-on lab about MySQL Utilities where you can experiment with the tools.

My particular favorite for this time slot is Oystein Gravlen’s Query Performance Comparison of MySQL 5.5 and MySQL 5.6. I cannot wait to see how much improvement there is, and why!

2:45 pm -3:45 pm
Profiling with the Performance Schema, given by Mark Leith, will teach how to set up and use Performance Schema to perform everyday profiling and performance monitoring tasks, such as: finding problem queries; researching blocked hosts; profiling I/O usage; analyzing resource usage by schema, table, or user; or tracing a session to see exactly where it spends its time.

Alexander Rubin will talk about critical performance tuning information during In-Depth Query Optimization for MySQL.

Personally I’m not a fan of use cases, which are sessions like “how X company does Y with MySQL”, but since I’m giving one entitled Database Scaling at Mozilla, I should probably promote it. And I will note that these sessions are usually well-attended – I guess people want to see how the big players do things, even though they are only appropriate for about 5% of the DBAs out there. I will note that Mozilla has relatively small databases and high traffic, so our needs are similar to more DBAs out there, and hopefully our solutions will work for them.

4:15 – 5:15 pm
Grant McAlister of Amazon.com presents Durability Is Key: How to Protect Your Data from Corruption where he describes the differences between logical and physical corruption in MySQL and shows how to best protect your MySQL database from both types of corruption.

My former coworkers, Francisco Bordenave and Marco Tusa of Pythian, are presenting on Scaling MySQL with Multimaster Synchronous Replication, where they explain how they investigated and designed an architecture based on MySQL to support an application that served shops around the globe and to scale out and scale in, based on sales seasons.

Jonathon Coombes presents the hands-on lab MySQL Security: Authentication and Audit, a hands-on lab that starts with an introduction to the authentication plug-in API and how it works, then tries an example HTTP authentication plug-in. The lab takes you thorough setting up a Pluggable Authentication Module (PAM) plug-in to access the server OS user definitions. Then you will walk through the MySQL audit plug-in API and how it works, and experiment with the Oracle audit log plug-in and various events it can log. Participants will build and experiment with their own plug-in that forwards MySQL events to the OS logging APIs (syslogd on Linux and Windows Event Log on Windows).

5:45 – 6:45 pm
Tokutek’s Bradley Kuzsmaul defines big data as “several times as large as main memory”. If you have big data, check out his talk on
Solving the Challenges of Big Databases with MySQL.

Luis Soares teaches about using replication for high availability in Scaling for the Web and Cloud with MySQL Replication

Or, get information about fulltext search with Sphinx from the horse’s mouth – Andrew Aksynoff talks about Full-Text Search with MySQL and Sphinx.

From 7 – 9 pm on Sunday, there is the Taylor Street Open House, which is JavaOne’s opening event and our closing event.

It’s going to be an amazing event with tons of technical content. I feel like I have written a lot here, but these are simply the sessions I’m having trouble choosing between, or wish I could go to. There are tons more sessions than what I’ve written about!

Last week I posted a MySQL Connect Guide for Developers. This week I am focusing on DBAs. The conference is about 2/3 administration/maintenance talks and about 1/3 development, with some overlap of course. Gerry and I did a lot of recommendations in OurSQL Episode 104, but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So here’s a guide to MySQL Connect for administrators, with times. Note that these are handpicked from what I think administrators would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am – 12:30 pm
There is a session if you want to learn What’s New In MySQL 5.6. Everyone thinks that there will be a new 5.6 release out (though we are all wondering if it will be a DMR, beta or release candidate release), so this will be a great session to go to, to learn about any new features released.

MySQL Optimizer Overview by Olav Sandstå. Get in depth as to how the optimizer works, so you have the knowledge to tune your server and queries.

There’s also Ronald Bradford’s session on Lessons from Managing 500+ MySQL Instances. Ronald always has great tips and tricks to make administering MySQL less painful and avoid problems.

1:00 – 2:00 pm
If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

Even if you are not a beginner, you are sure to learn some great Replication Tips and Tricks from Mats Kindahl. Mats will present a bag of useful tips and tricks related to the MySQL 5.5 GA and MySQL 5.6 development milestone releases, including multisource replication, using logs for auditing, handling filtering, examining the binary log, using relay slaves, splitting the replication stream, and handling failover.

I am personally excited for Rick’s Rules of Thumb by Rick James. Rick is always a great speaker and I learn so much from him!

2:30 – 3:30 pm
Backups are the single most important maintenance tool for MySQL. Hema Sridharam and Svetlana Smirnova present Save your Data: How to Make MySQL Backups. There are several tools to perform MySQL backups, including mysqldump, Oracle’s MySQL Enterprise Backup, third-party applications, and OS methods.

Henrik Ingo will speak about Evaluating MySQL High-Availability Alternatives, including replication, MySQL Cluster, DRBD, Tungsten and Galera. He will speak about the trade-offs of each method and why you might want to use each one, so you can decide what’s best for your environment.

And of course there’s Peter Zaitsev’s Optimizing MySQL Configuration, which is not-to-be missed!

4 – 5 pm
I am personally interested in Patrick Galbraith’s Database Resources On Demand, covering the concept of DBaaS, how an organization can use it, and what it means for DBAs for management and for developers in how they use database resources. Among the other topics it addresses is how open source technologies such as OpenStack provide an infrastructure that can be used in DBaaS.

Lately, solid-state disk drives have been getting a lot of attention. Vadim Tkachenko will speak about MySQL and Solid-State Drives: Usage and Tuning, covering SSD internals and how they affect database performance.

If you want to get your hands dirty with MySQL Cluster, join the Hands-on Labs by Santo Leto called Get Started with MySQL Cluster, where you will learn by doing: install, configure, administer, and access MySQL Cluster.

5:30 – 6:30 pm
If you want to know what replication features are coming up in MySQL 5.6, make sure to check out Lars Thalmann talking about Enabling the New Generation of Web and Cloud Services with MySQL 5.6 Replication. This session showcases the new replication features, including group commit and multithreaded slave for high performance, crash-safe slaves and failover utilities for high availability, global transaction identifiers and annotated row-based replication [RBR] for flexibility/usability, and event checksums for data integrity.

I have to promote my own session! I’m presenting Google-Hacking MySQL, which takes an in-depth look at using white-hat Google hacking techniques to show you what the “bad guys” can do. White-hat google hacking is the good kind of hacking, where you have permission. You will learn about the following hacking strategies and how they are done: SQL injection, cross-site scripting (XSS), cross-site request forgery (CSRF), gateway vulnerabilities, and social engineering—all without violating Google’s terms of service.

If you want to migrate your systems to MySQL, there is a presentation by Sergio Andres De La Cruz Rodriguez about Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

6:30 – 8:30 pm – MySQL Connect Reception in the Continental Ballroom

And there are Birds of a Feather session (BoFs) too!

Sunday, September 30th
8:30 am – 9:30 am
MySQL Perspectives Keynote featuring Twitter‘s DB Team manager Jeremy Cole, PayPal‘s Chief Architect Daniel Austin, Verizon Wireless‘ IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well-worth having to wake up early to get to the venue at 8:30 am.

There will be a special panel after that, but it has not been announced yet, so it looks like there is a hole in the schedule (but there is no hole!)

10:15 am – 11:15 am
Most of us have a shortage of DBAs. If you are in that situation, you probably have more money than time on your hands, so check out Rob Young’s talk on Optimizing Security, Performance, and Availability with MySQL Enterprise Edition. Yes, MySQL Enterprise Edition costs money, but it is much easier to buy it than to hire a top-notch DBA, which are rare in this world.

If you are a beginner DBA, attend the Hands on Lab entitled Focus on MySQL Replication, taught by Sven Sandberg and Luis Soares During this hands-on lab, you will learn how to get started, how replication works, and the best practices and tools. You will also learn about architecture, advanced replication configurations and some of the new features in the MySQL 5.6 development milestone releases. This session goes until 12:45, so you have a good 2.5 hours for the hands-on work.

Calvin Sun talks about Better Availability with InnoDB Online Operations, including online operations for schema changes such as add index, drop foreign key, and rename column.

11:45 – 12:45
Linas Virbalas and Robert Hodges of Continuent talk about Replicating from MySQL to Oracle Database and Back Again.

If you are into MySQL Security, Joro Kodinov will present MySQL Security: Past and Present. Since the description includes MySQL 5.6 security features, I would have called it “Past, Present and Future”.

Thinking of deploying, or already deployed, Galera? Then do not miss Seppo Jaakola’s talk on Galera Cluster Best Practices.

1:15 – 2:15 pm
Inaam Rana presents InnoDB Performance Tuning, which includes the newer features in MySQL 5.5 and the upcoming features in MySQL 5.6, including unique InnoDB architectural elements for performance and how to tune InnoDB to achieve better performance.

There is a world of tools designed to help make MySQL administration easier, so check out Charles Bell’s hands-on lab about MySQL Utilities where you can experiment with the tools.

My particular favorite for this time slot is Oystein Gravlen’s Query Performance Comparison of MySQL 5.5 and MySQL 5.6. I cannot wait to see how much improvement there is, and why!

2:45 pm -3:45 pm
Profiling with the Performance Schema, given by Mark Leith, will teach how to set up and use Performance Schema to perform everyday profiling and performance monitoring tasks, such as: finding problem queries; researching blocked hosts; profiling I/O usage; analyzing resource usage by schema, table, or user; or tracing a session to see exactly where it spends its time.

Alexander Rubin will talk about critical performance tuning information during In-Depth Query Optimization for MySQL.

Personally I’m not a fan of use cases, which are sessions like “how X company does Y with MySQL”, but since I’m giving one entitled Database Scaling at Mozilla, I should probably promote it. And I will note that these sessions are usually well-attended – I guess people want to see how the big players do things, even though they are only appropriate for about 5% of the DBAs out there. I will note that Mozilla has relatively small databases and high traffic, so our needs are similar to more DBAs out there, and hopefully our solutions will work for them.

4:15 – 5:15 pm
Grant McAlister of Amazon.com presents Durability Is Key: How to Protect Your Data from Corruption where he describes the differences between logical and physical corruption in MySQL and shows how to best protect your MySQL database from both types of corruption.

My former coworkers, Francisco Bordenave and Marco Tusa of Pythian, are presenting on Scaling MySQL with Multimaster Synchronous Replication, where they explain how they investigated and designed an architecture based on MySQL to support an application that served shops around the globe and to scale out and scale in, based on sales seasons.

Jonathon Coombes presents the hands-on lab MySQL Security: Authentication and Audit, a hands-on lab that starts with an introduction to the authentication plug-in API and how it works, then tries an example HTTP authentication plug-in. The lab takes you thorough setting up a Pluggable Authentication Module (PAM) plug-in to access the server OS user definitions. Then you will walk through the MySQL audit plug-in API and how it works, and experiment with the Oracle audit log plug-in and various events it can log. Participants will build and experiment with their own plug-in that forwards MySQL events to the OS logging APIs (syslogd on Linux and Windows Event Log on Windows).

5:45 – 6:45 pm
Tokutek’s Bradley Kuzsmaul defines big data as “several times as large as main memory”. If you have big data, check out his talk on Solving the Challenges of Big Databases with MySQL.

Luis Soares teaches about using replication for high availability in Scaling for the Web and Cloud with MySQL Replication

Or, get information about fulltext search with Sphinx from the horse’s mouth – Andrew Aksynoff talks about Full-Text Search with MySQL and Sphinx.

From 7 – 9 pm on Sunday, there is the Taylor Street Open House, which is JavaOne’s opening event and our closing event.

It’s going to be an amazing event with tons of technical content. I feel like I have written a lot here, but these are simply the sessions I’m having trouble choosing between, or wish I could go to. There are tons more sessions than what I’ve written about!

Mozilla DB News, 14 Sept – GOOOOAAAALLLLLLs, and a lot of Bugzilla work

In 11 days I will be heading to NagiosWorld, and from there I go straight to MySQL Connect, and that is the end of September. So this week we have been focusing on wrapping up our 3rd quarter goals (and making new goals for the 4th quarter of 2012). We have also seemingly done a bunch of work on different aspects of the Bugzilla database clusters. This week, the database team has:

  • Built out new Bugzilla production database cluster in a failover data center, including monitoring.
  • Analyzed a full days’ worth of general and slow query logs for each database in the Bugzilla cluster so we can make optimization recommendations. Optimization recommendations are a goal for the 4th quarter, so this is pre-work so we can get a good metric for the goal.
  • Successfully implemented consistency checks for Bugzilla using pt-table-checksum, and a Nagios check from PalominoDB that we updated (the patch has been sent back to PalominoDB).
  • Upgraded the Bugzilla staging cluster to MariaDB 5.5.
  • Converted all of the Affiliates tables to UTF-8 instead of latin1. character-set-server has been set to utf8 for a while, but there were several legacy tables causing problems.
  • Created a new read-only slave cluster for Addons, so we can use it for version checking. This will be useful in our eventual change to the Mozilla Marketplace, which will house apps for our mobile platform, Firefox OS.
  • Assisted other folks on the Systems team in their goal to not have any users on any machines that are not in puppet. The DB team only had a few non-standard users on a few machines, but we did our part. 1200 machines were audited and I’m proud of my Systems teammates for getting through it all – we now have no non-puppetized users!
  • Rebuilt and patched the database servers holding our internal Puppet Dashboard data. The servers were deprecated last month, we were looking for a Puppet Dashboard replacement, but realized that there was no other appropriate dashboard for puppet, so we are back to using Puppet Dashboard.
  • Converted one of our development database clusters to use innodb_file_per_table and put it into puppet (most of our clusters are innodb_file_per_table at this point).
  • Upgraded one of our multi-use database clusters to use Percona’s patched MySQL 5.1 and put it into puppet.
  • Fixed some slow query log copies that were using a root account to use a less-privileged (but still in LDAP) account for slow query log copies.
  • Made staging and dev databases and users for Input, Mozilla’s primary user feedback application.
  • Fixed a replication issue on one of our multi-use database clusters – someone was trying to add foreign keys that already existed on the slave.
  • Created a new database for a new imaging service on Buildbot, Mozilla’s continuous integration tool.
  • Finished updating the MySQL ACL’s to include our failover load balancer, and to exclude the ACL’s from the previous data center that we finished moving from in May. This was a q3 goal; Postgres ACL’s still need to be looked at.
  • Pushed a large Mozilla rapid beta (aka “Mobeta”) update on our Postgres databases.
  • Helped debug an issue where the database behind basket, our newsletter subscription service, did not seem to be updating (it was a code issue, not a db issue).
  • Fixed an issue where a large transaction were causing our backup server to stop replicating a service. The transaction was more than 20G in size, but was on a database that was set with replicate_ignore_database, so it could be safely ignored.
  • Exported a list of Mozillian e-mails to help organize the community around Thunderbird.
  • Fixed an issue where replication on a backup server died due to “out of resources”.
  • Made a bug to check consistency for the Addons and Support database clusters using pt-table-checksum. This is a q4 goal.
  • Made bugs to get rid of MySQL 5.0 for a dozen clusters (yes, there are exactly 12 clusters still on MySQL 5.0 that we manage). 1 definitely can be decommissioned, 2 others may be decommissioned or may be upgraded to Percona’s patched MySQL 5.1, and the rest will be upgraded. These are q4 goals, so if all goes well, hopefully by the end of the year, the databases that the DB team is responsible for will not have any MySQL 5.0 servers. (next step, upgrade to MariaDB 5.5 in 2013!)

In 11 days I will be heading to NagiosWorld, and from there I go straight to MySQL Connect, and that is the end of September. So this week we have been focusing on wrapping up our 3rd quarter goals (and making new goals for the 4th quarter of 2012). We have also seemingly done a bunch of work on different aspects of the Bugzilla database clusters. This week, the database team has:

  • Built out new Bugzilla production database cluster in a failover data center, including monitoring.
  • Analyzed a full days’ worth of general and slow query logs for each database in the Bugzilla cluster so we can make optimization recommendations. Optimization recommendations are a goal for the 4th quarter, so this is pre-work so we can get a good metric for the goal.
  • Successfully implemented consistency checks for Bugzilla using pt-table-checksum, and a Nagios check from PalominoDB that we updated (the patch has been sent back to PalominoDB).
  • Upgraded the Bugzilla staging cluster to MariaDB 5.5.
  • Converted all of the Affiliates tables to UTF-8 instead of latin1. character-set-server has been set to utf8 for a while, but there were several legacy tables causing problems.
  • Created a new read-only slave cluster for Addons, so we can use it for version checking. This will be useful in our eventual change to the Mozilla Marketplace, which will house apps for our mobile platform, Firefox OS.
  • Assisted other folks on the Systems team in their goal to not have any users on any machines that are not in puppet. The DB team only had a few non-standard users on a few machines, but we did our part. 1200 machines were audited and I’m proud of my Systems teammates for getting through it all – we now have no non-puppetized users!
  • Rebuilt and patched the database servers holding our internal Puppet Dashboard data. The servers were deprecated last month, we were looking for a Puppet Dashboard replacement, but realized that there was no other appropriate dashboard for puppet, so we are back to using Puppet Dashboard.
  • Converted one of our development database clusters to use innodb_file_per_table and put it into puppet (most of our clusters are innodb_file_per_table at this point).
  • Upgraded one of our multi-use database clusters to use Percona’s patched MySQL 5.1 and put it into puppet.
  • Fixed some slow query log copies that were using a root account to use a less-privileged (but still in LDAP) account for slow query log copies.
  • Made staging and dev databases and users for Input, Mozilla’s primary user feedback application.
  • Fixed a replication issue on one of our multi-use database clusters – someone was trying to add foreign keys that already existed on the slave.
  • Created a new database for a new imaging service on Buildbot, Mozilla’s continuous integration tool.
  • Finished updating the MySQL ACL’s to include our failover load balancer, and to exclude the ACL’s from the previous data center that we finished moving from in May. This was a q3 goal; Postgres ACL’s still need to be looked at.
  • Pushed a large Mozilla rapid beta (aka “Mobeta”) update on our Postgres databases.
  • Helped debug an issue where the database behind basket, our newsletter subscription service, did not seem to be updating (it was a code issue, not a db issue).
  • Fixed an issue where a large transaction were causing our backup server to stop replicating a service. The transaction was more than 20G in size, but was on a database that was set with replicate_ignore_database, so it could be safely ignored.
  • Exported a list of Mozillian e-mails to help organize the community around Thunderbird.
  • Fixed an issue where replication on a backup server died due to “out of resources”.
  • Made a bug to check consistency for the Addons and Support database clusters using pt-table-checksum. This is a q4 goal.
  • Made bugs to get rid of MySQL 5.0 for a dozen clusters (yes, there are exactly 12 clusters still on MySQL 5.0 that we manage). 1 definitely can be decommissioned, 2 others may be decommissioned or may be upgraded to Percona’s patched MySQL 5.1, and the rest will be upgraded. These are q4 goals, so if all goes well, hopefully by the end of the year, the databases that the DB team is responsible for will not have any MySQL 5.0 servers. (next step, upgrade to MariaDB 5.5 in 2013!)

MySQL Connect Guide for Developers

MySQL Connect is a new conference with a lot of good technical content. In the past, it has been helpful to have “guides” of MySQL conferences, so in this post I will give my guide to MySQL Connect for Developers. Gerry and I did a lot of recommendations in OurSQL Episode 103, but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So here’s a guide to MySQL Connect for developers, with times. Note that these are handpicked from what I think developers would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am -12:30 pm
You are in luck if you are, or want to be, a Java developer, because there is a hands-on lab for Developing Applications with MySQL and Java with Mark Matthews. Hands-on labs are where you learn by doing, so this is not to be missed if you want to learn how to develop scalable Java applications.

On the internals side, there are a few good optimizer talks that would benefit developers. One is Olav Sandstå’s MySQL Optimizer Overview – you will learn how MySQL chooses the optimal path, and by learning how MySQL does that, you can write better queries.

Oracle’s Geir Høydalsvik presents What’s New in MySQL Server 5.6? which explains the new features and performance enhancements in the 5.6 MySQL Server release candidate. I have a feeling they will have a new release version by MySQL Connect (another DMR? beta? RC?) and we will hear about all the new features in that release, too.

1 – 2 pm
Manyi Lu will present an Overview of New Optimizer Features in MySQL 5.6, talking about multi-range read, index condition pushdown, batched key access, and the new EXPLAIN features.

If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

As you probably know, InnoDB is the default storage engine for Oracle’s MySQL as of MySQL Release 5.5. It provides the standard ACID-compliant transactions, row-level locking, multiversion concurrency control, and referential integrity. InnoDB also implements several innovative technologies to improve its performance and reliability. Chunsen Sung presents 10 Things You Should Know About InnoDB, which is a brief history of InnoDB; its main features; and some recent enhancements for better performance, scalability, and availability.

2:30-3:30 pm
Alexander Rubin from Oracle has a session on the New MySQL Full-text Search Features and Solutions, including the new InnoDB FULLTEXT search*.

If you develop with Python, Geert Vanderkelen has a session about Developing Python Applications with MySQL Utilities and MySQL Connector/Python.

4:00 – 5:00 pm
Heard all the hype about MySQL Cluster? See for yourself what it can do in a Hands-on Lab presented by Santo Leto, Get Started With MySQL Cluster.

Or, head on over to a talk featuring Oracle and Amazon engineers talking about Using MySQL in the Cloud. The Amazon folks know their subject matter when it comes to cloud computing!

5:30 – 6:30 pm
If you are a developer interested in performance, Mark Matthews presents MySQL Enterprise’s Monitor for Developers – the description says you will “learn how to resolve potential performance and scalability issues revealed via performance graphs and query analyzer data from the Monitor feature of MySQL Enterprise Edition and apply them to your own application development. In addition, you will learn how to extend the Monitor feature with your own application-specific performance metrics to extend your visibility into performance issues into the deployment and operations realm.”

If you are interested in migrating to MySQL from Microsoft SQL Server, do not miss Sergio De La Cruz Rodriguez’s talk on Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

Or if you are a more security-focused developer, head on over to my own talk on Google-Hacking MySQL

6:30 – 8:30 pm – MySQL Connect Reception in the Continental Ballroom

7:00 – 8:00 pm – Birds of a Feather talks – informal discussion sessions
I think these might be relevant to developers:
Python
MySQL Community
Query Optimizations

And it is only the end of the first day! The exhibit hall is open 9:30 am – 1:30 pm, and 7 – 9 pm.

Sunday, September 30th
8:30 am – 9:30 am
MySQL Perspectives Keynote featuring Twitter‘s DB Team manager Jeremy Cole, PayPal‘s Chief Architect Daniel Austin, Verizon Wireless‘ IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well-worth having to wake up early to get to the venue at 8:30 am.

10:15 – 11:15 am
If you develop with PHP, don’t miss Johannes Schlüter presenting Current State of PHP and MySQL, which will explain some of the relevant MySQL changes in PHP 5.4.

I am personally quite interested in the provocatively titled Big Data Is a Big Scam (Most of the Time) by PayPal‘s Chief Architect Daniel Austin. This is one of the more unique talks at this conference, so I will just paste the description here: “This session challenges the conventional wisdom and tries to dispel some of the myths about big data, NoSQL, and everything. When do you need a NoSQL system? How do you choose one from another amid the hype? And how do you know when to stick to your RDBMS and resist becoming a follower of big data fashion? Come and hear what you need to know about your options and how to make wise decisions about solutions to your big data problems.”

11:45 am – 12:45 pm
These days, you can’t talk about performance without talking about NoSQL. Andrew Morgan and John Duncan will present Developing High-Throughput Services with NoSQL APIs to InnoDB and MySQL Cluster, which explains how to maintain all the advantages of existing relational databases while providing blazing-fast performance for simple queries. This session describes the memcached connectors and examines some use cases for how MySQL and memcached fit together in application architectures. It does the same for the newest MySQL Cluster native connector, an easy-to-use, fully asynchronous connector for Node.js.

Everybody is security-conscious, with good reason. Joro Kodinov talks about MySQL Security: Past and Present, giving an overview of MySQL security in the past, present and future versions (with MySQL 5.6).

1:15 – 2:15 pm
Mats Kindahl will talk about Sharding with PHP and MySQL for distributing writes over a cluster, including static and dynamic sharding schemes.

Into NoSQL? Check out Ligaya Turmelle’s presentation on A Journey into NoSQLand: MySQL’s NoSQL Implementation. She will discuss how the memcached API is being used to hook directly into the InnoDB and MySQL Cluster (NDB) storage engines — skipping the MySQL server completely.

Want to know what those new optimizer features could look like for you? Don’t miss Øystein Grøvlen’s talk on Query Performance Comparison of MySQL 5.5 and MySQL 5.6. You’ll see comparisons using the DBT-3 benchmark, and explanations of which queries perform better, and why, in MySQL 5.6.

2:45 – 3:45 pm
Oracle ACE Director Ronald Bradford brings his talk on Improving Performance with Better Indexes to MySQL Connect. So often there is a problem in production that a simple index fixes, and developers should be indexing optimally from the start.

I am a tried-and-true commandline lover, but I know many developers love IDE’s and their GUI interfaces. If you are one of those folks, you might want to check out Alfredo Kojima’s Getting the Most out of MySQL with MySQL Workbench. MySQL Workbench can be used to write and debug SQL queries, manage data and schemas, create databases from scratch, or maintain existing ones through graphical enhanced entity-relationship (EER) models and the advanced SQL Editor functionality, which is all available in the free community edition of MySQL Workbench.

Alexander Rubin will talk about In-Depth Query Optimization for MySQL, which includes tips on how to explain and optimize your slow queries and how to make your reporting queries execute much faster than before. He will also discuss MySQL optimizer internals, show some benchmark results, and how to use the new performance_schema in MySQL 5.6 to monitor queries.

4:15 – 5:15 pm
If you like to know about upcoming features, do not miss Oracle’s Evgeny Potemkin session on Powerful EXPLAIN in MySQL 5.6. MySQL 5.6 offers several new additions that give more-detailed information about the query plan and make it easier to understand at the same time – including structured EXPLAIN in JSON format, EXPLAIN for INSERT/UPDATE/DELETE, and optimizer trace.

If you develop with MySQL on Windows, the Windows Experience Group is for you. Learn about the current and future features of the MySQL Windows Installer and Connector/Net, as well as the improvements for the MySQL server itself on Windows.

5:45 – 6:45 pm
If you use do text searching with MySQL, Sphinx‘s own Andrew Aksonoff has a session on Full-text search with MySQL and Sphinx*.

Bradley Kuszmaul of Tokutek presents Solving the Challenges of Big Databases with MySQL. I love that Bradleys defines what a “big database” is – “more than ten times as large as main memory”.

From 7 – 9 pm on Sunday, there is the Taylor Street Open House.

Next week I will write a blog post about MySQL Connect for DBAs. Watch for it!

* Actually, we just did a podcast series including the basics of FULLTEXT search on MyISAM and how FULLTEXT search works on InnoDB in 5.6.

MySQL Connect is a new conference with a lot of good technical content. In the past, it has been helpful to have guides of MySQL conferences, so in this post I will give my guide to MySQL Connect for Developers. Gerry and I did a lot of recommendations in OurSQL Episode 103, but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So heres a guide to MySQL Connect for developers, with times. Note that these are handpicked from what I think developers would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am -12:30 pm
You are in luck if you are, or want to be, a Java developer, because there is a hands-on lab for Developing Applications with MySQL and Java with Mark Matthews. Hands-on labs are where you learn by doing, so this is not to be missed if you want to learn how to develop scalable Java applications.

On the internals side, there are a few good optimizer talks that would benefit developers. One is Olav Sandstås MySQL Optimizer Overview you will learn how MySQL chooses the optimal path, and by learning how MySQL does that, you can write better queries.

Oracles Geir Høydalsvik presents What’s New in MySQL Server 5.6? which explains the new features and performance enhancements in the 5.6 MySQL Server release candidate. I have a feeling they will have a new release version by MySQL Connect (another DMR? beta? RC?) and we will hear about all the new features in that release, too.

1 2 pm
Manyi Lu will present an Overview of New Optimizer Features in MySQL 5.6, talking about multi-range read, index condition pushdown, batched key access, and the new EXPLAIN features.

If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

As you probably know, InnoDB is the default storage engine for Oracle’s MySQL as of MySQL Release 5.5. It provides the standard ACID-compliant transactions, row-level locking, multiversion concurrency control, and referential integrity. InnoDB also implements several innovative technologies to improve its performance and reliability. Chunsen Sung presents 10 Things You Should Know About InnoDB, which is a brief history of InnoDB; its main features; and some recent enhancements for better performance, scalability, and availability.

2:30-3:30 pm
Alexander Rubin from Oracle has a session on the New MySQL Full-text Search Features and Solutions, including the new InnoDB FULLTEXT search*.

If you develop with Python, Geert Vanderkelen has a session about Developing Python Applications with MySQL Utilities and MySQL Connector/Python.

4:00 5:00 pm
Heard all the hype about MySQL Cluster? See for yourself what it can do in a Hands-on Lab presented by Santo Leto, Get Started With MySQL Cluster.

Or, head on over to a talk featuring Oracle and Amazon engineers talking about Using MySQL in the Cloud. The Amazon folks know their subject matter when it comes to cloud computing!

5:30 6:30 pm
If you are a developer interested in performance, Mark Matthews presents MySQL Enterprise’s Monitor for Developers the description says you will learn how to resolve potential performance and scalability issues revealed via performance graphs and query analyzer data from the Monitor feature of MySQL Enterprise Edition and apply them to your own application development. In addition, you will learn how to extend the Monitor feature with your own application-specific performance metrics to extend your visibility into performance issues into the deployment and operations realm.

If you are interested in migrating to MySQL from Microsoft SQL Server, do not miss Sergio De La Cruz Rodriguezs talk on Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

Or if you are a more security-focused developer, head on over to my own talk on Google-Hacking MySQL

6:30 8:30 pm MySQL Connect Reception in the Continental Ballroom

7:00 8:00 pm Birds of a Feather talks informal discussion sessions
I think these might be relevant to developers:
Python
MySQL Community
Query Optimizations

And it is only the end of the first day! The exhibit hall is open 9:30 am 1:30 pm, and 7 9 pm.

Sunday, September 30th
8:30 am 9:30 am
MySQL Perspectives Keynote featuring Twitters DB Team manager Jeremy Cole, PayPals Chief Architect Daniel Austin, Verizon Wireless IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well-worth having to wake up early to get to the venue at 8:30 am.

10:15 11:15 am
If you develop with PHP, dont miss Johannes Schlüter presenting Current State of PHP and MySQL, which will explain some of the relevant MySQL changes in PHP 5.4.

I am personally quite interested in the provocatively titled Big Data Is a Big Scam (Most of the Time) by PayPals Chief Architect Daniel Austin. This is one of the more unique talks at this conference, so I will just paste the description here: This session challenges the conventional wisdom and tries to dispel some of the myths about big data, NoSQL, and everything. When do you need a NoSQL system? How do you choose one from another amid the hype? And how do you know when to stick to your RDBMS and resist becoming a follower of big data fashion? Come and hear what you need to know about your options and how to make wise decisions about solutions to your big data problems.

11:45 am 12:45 pm
These days, you cant talk about performance without talking about NoSQL. Andrew Morgan and John Duncan will present Developing High-Throughput Services with NoSQL APIs to InnoDB and MySQL Cluster, which explains how to maintain all the advantages of existing relational databases while providing blazing-fast performance for simple queries. This session describes the memcached connectors and examines some use cases for how MySQL and memcached fit together in application architectures. It does the same for the newest MySQL Cluster native connector, an easy-to-use, fully asynchronous connector for Node.js.

Everybody is security-conscious, with good reason. Joro Kodinov talks about MySQL Security: Past and Present, giving an overview of MySQL security in the past, present and future versions (with MySQL 5.6).

1:15 2:15 pm
Mats Kindahl will talk about Sharding with PHP and MySQL for distributing writes over a cluster, including static and dynamic sharding schemes.

Into NoSQL? Check out Ligaya Turmelles presentation on A Journey into NoSQLand: MySQL’s NoSQL Implementation. She will discuss how the memcached API is being used to hook directly into the InnoDB and MySQL Cluster (NDB) storage engines — skipping the MySQL server completely.

Want to know what those new optimizer features could look like for you? Dont miss Øystein Grøvlens talk on Query Performance Comparison of MySQL 5.5 and MySQL 5.6. Youll see comparisons using the DBT-3 benchmark, and explanations of which queries perform better, and why, in MySQL 5.6.

2:45 3:45 pm
Oracle ACE Director Ronald Bradford brings his talk on Improving Performance with Better Indexes to MySQL Connect. So often there is a problem in production that a simple index fixes, and developers should be indexing optimally from the start.

I am a tried-and-true commandline lover, but I know many developers love IDEs and their GUI interfaces. If you are one of those folks, you might want to check out Alfredo Kojimas Getting the Most out of MySQL with MySQL Workbench. MySQL Workbench can be used to write and debug SQL queries, manage data and schemas, create databases from scratch, or maintain existing ones through graphical enhanced entity-relationship (EER) models and the advanced SQL Editor functionality, which is all available in the free community edition of MySQL Workbench.

Alexander Rubin will talk about In-Depth Query Optimization for MySQL, which includes tips on how to explain and optimize your slow queries and how to make your reporting queries execute much faster than before. He will also discuss MySQL optimizer internals, show some benchmark results, and how to use the new performance_schema in MySQL 5.6 to monitor queries.

4:15 5:15 pm
If you like to know about upcoming features, do not miss Oracles Evgeny Potemkin session on Powerful EXPLAIN in MySQL 5.6. MySQL 5.6 offers several new additions that give more-detailed information about the query plan and make it easier to understand at the same time including structured EXPLAIN in JSON format, EXPLAIN for INSERT/UPDATE/DELETE, and optimizer trace.

If you develop with MySQL on Windows, the Windows Experience Group is for you. Learn about the current and future features of the MySQL Windows Installer and Connector/Net, as well as the improvements for the MySQL server itself on Windows.

5:45 6:45 pm
If you use do text searching with MySQL, Sphinxs own Andrew Aksonoff has a session on Full-text search with MySQL and Sphinx*.

Bradley Kuszmaul of Tokutek presents Solving the Challenges of Big Databases with MySQL. I love that Bradleys defines what a big database is more than ten times as large as main memory.

From 7 9 pm on Sunday, there is the Taylor Street Open House.

Next week I will write a blog post about MySQL Connect for DBAs. Watch for it!

* Actually, we just did a podcast series including the basics of FULLTEXT search on MyISAM and how FULLTEXT search works on InnoDB in 5.6.

Video: An elephant, an ant colony and a ferret walk into a database…

Ryan Betts and Ning Shi of VoltDB give a talk and demo of VoltDB at the Boston MySQL User Group on March 12, 2012. They presented a use-case based overview of what VoltDB is and the types of problems it helps to solve. They discussed some newer VoltDB features, including the WAN replication beta functionality and command logging (a durable write-ahead log).

Richard Laskey did an amazing job with the video and audio. I am absolutely awestruck that you can actually *see* the demo – usually small text is unreadable! My hats off to Richard, who does all this as a volunteer.

The

Ryan Betts and Ning Shi of VoltDB give a talk and demo of VoltDB at the Boston MySQL User Group on March 12, 2012. They presented a use-case based overview of what VoltDB is and the types of problems it helps to solve. They discussed some newer VoltDB features, including the WAN replication beta functionality and command logging (a durable write-ahead log).

Richard Laskey did an amazing job with the video and audio. I am absolutely awestruck that you can actually *see* the demo – usually small text is unreadable! My hats off to Richard, who does all this as a volunteer.

The Posted on Categories UncategorizedLeave a comment on Video: An elephant, an ant colony and a ferret walk into a database…

Slides from “MariaDB: A MySQL Replacement?”

Monday evening, Max Mether from SkySQL came to the Boston MySQL User Group and presented “MariaDB: A MySQL Replacement?” We did take video and will link to it when it’s uploaded. That usually takes a while, so I figured I would link to the PDF slides, which Max has graciously shared.

Note that on slide 26, “All highly speculative” is a bit misleading when taken out of context. We know that MariaDB is working on all those things (including a Cassandra storage engine and Multi-source replication) because their worklogs and bug tracking system are 100% open, so that’s not the “highly speculative” part. We just do not know if they will be in the next production release of MariaDB.

Monday evening, Max Mether from SkySQL came to the Boston MySQL User Group and presented “MariaDB: A MySQL Replacement?” We did take video and will link to it when it’s uploaded. That usually takes a while, so I figured I would link to the PDF slides, which Max has graciously shared.

Note that on slide 26, “All highly speculative” is a bit misleading when taken out of context. We know that MariaDB is working on all those things (including a Cassandra storage engine and Multi-source replication) because their worklogs and bug tracking system are 100% open, so that’s not the “highly speculative” part. We just do not know if they will be in the next production release of MariaDB.

Mozilla DB News, Fri 7 Sep – checksums

Well, it is getting closer and closer to the end of the 3rd quarter of 2012, and so we are starting to hunker down and make sure our goals get met. One of those goals is to start running regular checksums on our databases, so that when doing a switchover (a manual “failover” for maintenance purposes) or when taking backups we can ensure we have maintained the same level of data integrity that the master has. And, of course, monitor the output of those checksums with Nagios. So this week has seen a lot of fiddling with pt-table-checksum to get the appropriate values….without further ado, here’s the list of what the database team did this week:

  • Found out the hard way that slow_query_log=ON is silently ignored by MySQL, and that the syntax to turn on the slow query logs is slow_query_log=1 (even when log_output and slow_query_log_file are set properly).
  • Updated the Graphs database to be informed about Talos/Mountain Lion.
  • Fixed the problem where a getpersonas table locked up intermittently. The table locked up because it is MyISAM, but it had a FULLTEXT index on it. As it turns out, that FULLTEXT index was replaced by Elastic Search, so there was no problem dropping the FULLTEXT index and converting the table to InnoDB.
  • Fixed a problem where cron was not running on one of our backup servers.
  • Researched and debugged a problem with thimble having lost data. Unfortunately, that data was not recoverable.
  • Created development and stage database environments for the Bouncer application.
  • Converted all the Affiliates tables that had a default character set of latin1 to have a default character set of utf8 in dev, stage and production (character_set_server=utf8 has been set on that server for a while, but it was after those tables were created).
  • Deleted some spam comments from a Bugzilla bug.
  • As mentioned above, got checksums working, specifically for that Bugzilla database cluster.
  • Downloaded, updated, tested and submitted a patch back to the owners for PalominoDB’s Nagios check for pt-table-checksum table results, and queued the Nagios check up to be put into monitoring next week.
  • Rebuilt a metrics server for staging purposes.
  • Converted a development server to use innodb_file_per_table.
  • Checked the size of the Tinderboxpushlog database after last month’s defragmentation to see how much larger it became.
  • Used pt-query-digest to parse 40G (about 1/3 of what we have total) of general and slow query logs from one day of Bugzilla logging to come up with some metrics for which Bugzilla queries we are going to optimize in Q4 2012.
  • Tuned the innodb buffer pool on an auxiliary database for Addons database cluster to fix a bug.

These next few weeks will be exciting and fast-paced!

Well, it is getting closer and closer to the end of the 3rd quarter of 2012, and so we are starting to hunker down and make sure our goals get met. One of those goals is to start running regular checksums on our databases, so that when doing a switchover (a manual “failover” for maintenance purposes) or when taking backups we can ensure we have maintained the same level of data integrity that the master has. And, of course, monitor the output of those checksums with Nagios. So this week has seen a lot of fiddling with pt-table-checksum to get the appropriate values….without further ado, here’s the list of what the database team did this week:

  • Found out the hard way that slow_query_log=ON is silently ignored by MySQL, and that the syntax to turn on the slow query logs is slow_query_log=1 (even when log_output and slow_query_log_file are set properly).
  • Updated the Graphs database to be informed about Talos/Mountain Lion.
  • Fixed the problem where a getpersonas table locked up intermittently. The table locked up because it is MyISAM, but it had a FULLTEXT index on it. As it turns out, that FULLTEXT index was replaced by Elastic Search, so there was no problem dropping the FULLTEXT index and converting the table to InnoDB.
  • Fixed a problem where cron was not running on one of our backup servers.
  • Researched and debugged a problem with thimble having lost data. Unfortunately, that data was not recoverable.
  • Created development and stage database environments for the Bouncer application.
  • Converted all the Affiliates tables that had a default character set of latin1 to have a default character set of utf8 in dev, stage and production (character_set_server=utf8 has been set on that server for a while, but it was after those tables were created).
  • Deleted some spam comments from a Bugzilla bug.
  • As mentioned above, got checksums working, specifically for that Bugzilla database cluster.
  • Downloaded, updated, tested and submitted a patch back to the owners for PalominoDB’s Nagios check for pt-table-checksum table results, and queued the Nagios check up to be put into monitoring next week.
  • Rebuilt a metrics server for staging purposes.
  • Converted a development server to use innodb_file_per_table.
  • Checked the size of the Tinderboxpushlog database after last month’s defragmentation to see how much larger it became.
  • Used pt-query-digest to parse 40G (about 1/3 of what we have total) of general and slow query logs from one day of Bugzilla logging to come up with some metrics for which Bugzilla queries we are going to optimize in Q4 2012.
  • Tuned the innodb buffer pool on an auxiliary database for Addons database cluster to fix a bug.

These next few weeks will be exciting and fast-paced!

OurSQL, Time and Commitment

Producing a podcast takes a lot of time. The first 26 episodes of OurSQL: The MySQL Database Community Podcast were produced solely by me, but the podcasts were not regularly recurring. Those 26 episodes occurred over a period of 2 years and 3 months, which is less than one episode per month. Doing it all myself took a lot of time. I loved podcasting, and still do, but I could not do it all myself.

I thought long and hard during a 22-month hiatus. For almost 2 years, I reconciled my love for podcasting with the insane amount of work that doing a podcast solo took. I realized that what I wanted was:

– a co-host to keep me accountable. Too often I would set out with good intentions of “today I will podcast” that produced nothing because I did not prioritize doing the podcast.

– an audio engineer to do the post-processing. Post-processing was, for me, a very nitpicky process, and I would fiddle with the audio for hours to get the sentences to sound OK. And I did not enjoy the process. Which means I needed money, to hire an audio engineer.

The planets aligned and in November 2010 I caught the attention of some folks at the Oracle Technology Network, after I fiercely shot down some myths about a controversial issue (so much so that twitter blocked me for a while because I tweeted so much, responding to the controversy posted by too many folks). After the controversy died down, Justin Kestelyn called to thank me and said, “If there’s anything we can help you with, let us know.”

And I said, “Well, actually, there is this one thing….”. Justin secured the funding for an audio engineer, and by mid-December, I had found a co-host (the wonderful Sarah Novotny) and an audio engineer (the amazing Rich Goyette). The podcast was up and running again. Since December 16, 2010, we have produced weekly episodes, with a week off here and there, but no more long pauses between podcasts.

Even with all this assistance, it still takes a lot of time to produce a podcast. Usually I spend 3 hours research and writing for each podcast, plus another 1.5 to 2 hours to record the podcast and another 45-60 minutes to listen to the finished podcast, write up and publish the show notes, and promote the new episode on Facebook, Twitter and Google Plus. That’s 5-6 hours per week. Some episodes are interviews, which usually take a little longer to record, but less time to research. So let’s say, on average, I spend 4 hours per week doing a podcast, consistently, since December 2010.

This year, the podcast hit two milestones, only one of which was planned. In April, we won a MySQL Community Award, and in July, we hit our 100th episode (episode 99) and episode 100, which we celebrated by interviewing Randall Munroe of XKCD.

Winning the Community Award was extremely moving for me. I had been upset with myself for a while, because there were a few personal goals I had not prioritized (and thus had not met), and I felt that reflected that I was bad at making a commitment to meet those goals. When we won the Community Award, I realized that I had not prioritized some personal goals, but I certainly committed my time well and doing a lot that helped others.

Reaching the 100th episode milestone did not seem like a big deal, until we won the community award. Even with all the help I have sought out and been given, each episode represents several hours of my time. Having produced 100 episodes means having put in several hundred hours of my time. That’s a LOT of time. So it is no surprise that I have not been able to meet all of my personal goals. 4 hours a week, consistently, is a huge commitment, and one that I am proud of.

It has been a rocky road for the podcast, but with the help of the Oracle Technology Network, Sarah Novotny, Gerry Narvaja and Rich Goyette, I feel like the podcast is an institution now – if I had to stop podcasting for some reason, it would continue on without me. Feeling that I have left a legacy that helps folks learn about databases is an amazing feeling.

Producing a podcast takes a lot of time. The first 26 episodes of OurSQL: The MySQL Database Community Podcast were produced solely by me, but the podcasts were not regularly recurring. Those 26 episodes occurred over a period of 2 years and 3 months, which is less than one episode per month. Doing it all myself took a lot of time. I loved podcasting, and still do, but I could not do it all myself.

I thought long and hard during a 22-month hiatus. For almost 2 years, I reconciled my love for podcasting with the insane amount of work that doing a podcast solo took. I realized that what I wanted was:

– a co-host to keep me accountable. Too often I would set out with good intentions of “today I will podcast” that produced nothing because I did not prioritize doing the podcast.

– an audio engineer to do the post-processing. Post-processing was, for me, a very nitpicky process, and I would fiddle with the audio for hours to get the sentences to sound OK. And I did not enjoy the process. Which means I needed money, to hire an audio engineer.

The planets aligned and in November 2010 I caught the attention of some folks at the Oracle Technology Network, after I fiercely shot down some myths about a controversial issue (so much so that twitter blocked me for a while because I tweeted so much, responding to the controversy posted by too many folks). After the controversy died down, Justin Kestelyn called to thank me and said, “If there’s anything we can help you with, let us know.”

And I said, “Well, actually, there is this one thing….”. Justin secured the funding for an audio engineer, and by mid-December, I had found a co-host (the wonderful Sarah Novotny) and an audio engineer (the amazing Rich Goyette). The podcast was up and running again. Since December 16, 2010, we have produced weekly episodes, with a week off here and there, but no more long pauses between podcasts.

Even with all this assistance, it still takes a lot of time to produce a podcast. Usually I spend 3 hours research and writing for each podcast, plus another 1.5 to 2 hours to record the podcast and another 45-60 minutes to listen to the finished podcast, write up and publish the show notes, and promote the new episode on Facebook, Twitter and Google Plus. That’s 5-6 hours per week. Some episodes are interviews, which usually take a little longer to record, but less time to research. So let’s say, on average, I spend 4 hours per week doing a podcast, consistently, since December 2010.

This year, the podcast hit two milestones, only one of which was planned. In April, we won a MySQL Community Award, and in July, we hit our 100th episode (episode 99) and episode 100, which we celebrated by interviewing Randall Munroe of XKCD.

Winning the Community Award was extremely moving for me. I had been upset with myself for a while, because there were a few personal goals I had not prioritized (and thus had not met), and I felt that reflected that I was bad at making a commitment to meet those goals. When we won the Community Award, I realized that I had not prioritized some personal goals, but I certainly committed my time well and doing a lot that helped others.

Reaching the 100th episode milestone did not seem like a big deal, until we won the community award. Even with all the help I have sought out and been given, each episode represents several hours of my time. Having produced 100 episodes means having put in several hundred hours of my time. That’s a LOT of time. So it is no surprise that I have not been able to meet all of my personal goals. 4 hours a week, consistently, is a huge commitment, and one that I am proud of.

It has been a rocky road for the podcast, but with the help of the Oracle Technology Network, Sarah Novotny, Gerry Narvaja and Rich Goyette, I feel like the podcast is an institution now – if I had to stop podcasting for some reason, it would continue on without me. Feeling that I have left a legacy that helps folks learn about databases is an amazing feeling.

CodeConnexx Is Coming

CodeConnexx is a 2-day, 1-track conference being held in Indianapolis, Indiana Nov 8-9th. It is only $99 for 2 days; the first day is all tech, and the second day focuses on soft topics like how to manage geeks, working together, mentoring and contributing to open source. The best part about this conference is they truly understand work/life balance and the importance of gender diversity. There is *free* childcare for conference attendees, and 64% of the speakers are female.

I am proud and honored to be doing a tech session on “MySQL Query Optimization with EXPLAIN” and a life skills session on “The Art of Cat Herding: How to Manage Geeks”. And I am also exceedingly proud that Mozilla has sponsored the childcare at CodeConnexx. Will I see you there?

Register today!

CodeConnexx is a 2-day, 1-track conference being held in Indianapolis, Indiana Nov 8-9th. It is only $99 for 2 days; the first day is all tech, and the second day focuses on soft topics like how to manage geeks, working together, mentoring and contributing to open source. The best part about this conference is they truly understand work/life balance and the importance of gender diversity. There is *free* childcare for conference attendees, and 64% of the speakers are female.

I am proud and honored to be doing a tech session on “MySQL Query Optimization with EXPLAIN” and a life skills session on “The Art of Cat Herding: How to Manage Geeks”. And I am also exceedingly proud that Mozilla has sponsored the childcare at CodeConnexx. Will I see you there?

Register today!

Should You Use GROUP BY/ORDER BY NULL By Default?

Edited to add: Thanks to Roland Bouman for pointing out Bug 30477 created Aug 2007 that addresses this issue. I am glad I am not the only one who thinks implicit overhead is bad!

At Northeast PHP a few weeks ago, an audience member came up to me after my talk about indexing and asked about ORDER BY NULL for optimal queries. I have to say, I was surprised, as I had not heard about using ORDER BY NULL. In a nutshell, apparently when MySQL does a GROUP BY, there is an implicit ORDER BY the same fields, which adds extra overhead for the mere purpose of returning the values in order of the GROUP BY fields.

I knew about the implicit ORDER BY, but I thought that was required for the GROUP BY, and made the GROUP BY faster. After all, it’s easier to group like items together if they are already sorted, right?

However, every single source I have researched seems to imply that, no, it is just overhead and completely unnecessary unless you really do want the results returned in the same order as the GROUP BY. For example, if you query with GROUP BY last_name and do not care about having the rows returned in lexical order of last names, you would use GROUP BY last_name ORDER BY NULL.

This leads me to believe that by default, whenever doing a GROUP BY, it is a good idea to use ORDER BY NULL. I have not seen that as a piece of advice that is generally given out in talks, either. So maybe there is something I am not understanding properly? I would love to know what everyone thinks.

Here is the result of my research:

The manual at http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html says:

By default, MySQL sorts all GROUP BY col1, col2, … queries as if you specified ORDER BY col1, col2, … in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL.

High Performance MySQL, 3rd Edition says the following on page 246, in the section on “Optimizing GROUP BY and DISTINCT”:

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns.

A random blog post extols the joy of “no more filesorts” in his EXPLAINs at http://www.subelsky.com/2008/05/order-by-null-kills-mysql-filesorts.html.

My question is – why wait until you see a “Using filesort” in your EXPLAIN plan? If the overhead is only used to order the results, and that is not desired, why not just use ORDER BY NULL by default, whenever using a GROUP BY query?

Or is it possible that yes, whenever developers write GROUP BY, they probably want the results returned in the order of the ORDER BY?

Edited to add: Thanks to Roland Bouman for pointing out Bug 30477 created Aug 2007 that addresses this issue. I am glad I am not the only one who thinks implicit overhead is bad!

At Northeast PHP a few weeks ago, an audience member came up to me after my talk about indexing and asked about ORDER BY NULL for optimal queries. I have to say, I was surprised, as I had not heard about using ORDER BY NULL. In a nutshell, apparently when MySQL does a GROUP BY, there is an implicit ORDER BY the same fields, which adds extra overhead for the mere purpose of returning the values in order of the GROUP BY fields.

I knew about the implicit ORDER BY, but I thought that was required for the GROUP BY, and made the GROUP BY faster. After all, it’s easier to group like items together if they are already sorted, right?

However, every single source I have researched seems to imply that, no, it is just overhead and completely unnecessary unless you really do want the results returned in the same order as the GROUP BY. For example, if you query with GROUP BY last_name and do not care about having the rows returned in lexical order of last names, you would use GROUP BY last_name ORDER BY NULL.

This leads me to believe that by default, whenever doing a GROUP BY, it is a good idea to use ORDER BY NULL. I have not seen that as a piece of advice that is generally given out in talks, either. So maybe there is something I am not understanding properly? I would love to know what everyone thinks.

Here is the result of my research:

The manual at http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html says:

By default, MySQL sorts all GROUP BY col1, col2, … queries as if you specified ORDER BY col1, col2, … in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL.

High Performance MySQL, 3rd Edition says the following on page 246, in the section on “Optimizing GROUP BY and DISTINCT”:

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns.

A random blog post extols the joy of “no more filesorts” in his EXPLAINs at http://www.subelsky.com/2008/05/order-by-null-kills-mysql-filesorts.html.

My question is – why wait until you see a “Using filesort” in your EXPLAIN plan? If the overhead is only used to order the results, and that is not desired, why not just use ORDER BY NULL by default, whenever using a GROUP BY query?

Or is it possible that yes, whenever developers write GROUP BY, they probably want the results returned in the order of the ORDER BY?