sheeri – Page 10 – Sheeri Dot Org

Mozilla DB News Friday 21 September: The Calm Before the Storm

Next week I will be at Nagios World in St. Paul and then going directly to San Francisco for MySQL Connect and Oracle OpenWorld (I have ACE Director responsibilities for OOW), so this week was spent frantically finishing up a lot of projects before finalizing my talks. All but one of the MySQL 3rd quarter goals have been met (there are 2 Postgres goals that are still in progress, but our Postgres DBA is not going to conferences next week, so there’s plenty of time for those).

And of course, there are the spring conferences to submit to. I have already submitted conference proposals to Confoo and RMOUG Training Days, and am working on proposals for SCALEx11 and Percona Live (calls for papers close Saturday October 12th!).

And somewhere in all that, the database team managed to get a bunch of stuff done this week, even though it was a short week for me (due to the Jewish New Year, and also my birthday!):

Tweaked our puppet configuration for our backup servers – our backup servers are running several instances of MySQL through sockets only, and some of the puppet configs assumed a running MySQL on port 3306.
Dealt with a Mozilla Labs experiment running crazy queries against Bugzilla, twice.
Assessed a problem with one of our development clusters, where several databases could not be dropped due to MySQL crashing. The undo log was corrupt on the master and one slave, but not another slave. We could not determine the cause of the corruption, though we believe a sync on Sept 6th caused the corruption to spread from the master to the one slave. The solution was to restore from the non-corrupt slave (which was also our backup slave, so that was pretty easy). We could also have set innodb_force_recovery, exported the data, and re-imported it, but restoring from the hot backup was faster.
Opened network flows to get slow query logs copied from the new version check databases for the Addons cluster, and updated SSH keys in puppet.
Upgraded one of our staging databases from MySQL 5.0 to Percona 5.1, added it to puppet, and converted it from innodb_file_per_table. (phew!)
Extracted data from Bugzilla about who is watching Mozilla Localizations components.
Debugged a DHCP issue on one of our Addons staging cluster machines.
Started compressing binary logs on our backup server as space was starting to get a bit low.
Debugged an issue where one of the machines in our Support dev/staging database cluster fell off the internet due to an IP conflict.
Replaced some text in a Bugzilla bug when the profanity filter failed to obscure some profanity.
Came up with a process to regularly extract data from an appliance that has MySQL embedded in it, and did a one-off of the process to get a database started.
Gave a developer an SQL backup of the production Buildbot database for analysis.
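
As a rough illustration of the binary log compression item in the list above, here is a minimal Python sketch. It is not the script the team actually used. It assumes the logs follow the common `mysql-bin.NNNNNN` naming, that the newest log may still be in use and should be left alone, and that nothing else needs to read the uncompressed files; the function name and the directory passed to it are hypothetical.

```python
import gzip
import re
import shutil
from pathlib import Path

# Assumption: binary logs are named "<basename>.<6-digit sequence>",
# e.g. mysql-bin.000123. Adjust the pattern for a different basename.
BINLOG_PATTERN = re.compile(r"^mysql-bin\.\d{6}$")


def compress_old_binlogs(log_dir, keep_newest=1):
    """Gzip all but the newest `keep_newest` binary logs in log_dir.

    Returns the list of .gz paths that were created.
    """
    # Zero-padded sequence numbers sort correctly as plain strings.
    logs = sorted(
        p for p in Path(log_dir).iterdir()
        if p.is_file() and BINLOG_PATTERN.match(p.name)
    )
    # The newest log(s) may still be written to, so skip them.
    candidates = logs[:-keep_newest] if keep_newest > 0 else logs
    compressed = []
    for src in candidates:
        dst = src.with_name(src.name + ".gz")
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        # Remove the original only after the gzip copy has been written.
        src.unlink()
        compressed.append(dst)
    return compressed
```

Calling `compress_old_binlogs("/path/to/binlogs")` from a scheduled job would shrink the older logs while leaving the index file and the most recent log untouched. On a server that is actively replicating, compressing logs still listed in the binary log index would need more care than this sketch takes.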

Next week is sure to be hectic, with two conferences. Happy autumn for those in the Northern Hemisphere, and happy spring for those in the Southern Hemisphere!

MySQL Connect: A Guide for DBAs

Last week I posted a MySQL Connect Guide for Developers. This week I am focusing on DBAs. The conference is about 2/3 administration/maintenance talks and about 1/3 development, with some overlap of course. Gerry and I did a lot of recommendations in OurSQL Episode 104, but that was before the schedule itself was up, so now I can present a session-by-session list of talks for administrators who are building their schedules.

So here’s a guide to MySQL Connect for administrators, with times. Note that these are handpicked from what I think administrators would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am – 12:30 pm
There is a session if you want to learn What’s New In MySQL 5.6. Everyone thinks that there will be a new 5.6 release out (though we are all wondering if it will be a DMR, beta or release candidate release), so this will be a great session to go to, to learn about any new features released.

MySQL Optimizer Overview by Olav Sandstå. Get an in-depth look at how the optimizer works, so you have the knowledge to tune your server and queries.

There’s also Ronald Bradford’s session on Lessons from Managing 500+ MySQL Instances. Ronald always has great tips and tricks to make administering MySQL less painful and avoid problems.

1:00 – 2:00 pm
If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands-on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

Even if you are not a beginner, you are sure to learn some great Replication Tips and Tricks from Mats Kindahl. Mats will present a bag of useful tips and tricks related to the MySQL 5.5 GA and MySQL 5.6 development milestone releases, including multisource replication, using logs for auditing, handling filtering, examining the binary log, using relay slaves, splitting the replication stream, and handling failover.

I am personally excited for Rick’s Rules of Thumb by Rick James. Rick is always a great speaker and I learn so much from him!

2:30 – 3:30 pm
Backups are the single most important maintenance tool for MySQL. Hema Sridharam and Svetlana Smirnova present Save your Data: How to Make MySQL Backups. There are several tools to perform MySQL backups, including mysqldump, Oracle’s MySQL Enterprise Backup, third-party applications, and OS methods.

Henrik Ingo will speak about Evaluating MySQL High-Availability Alternatives, including replication, MySQL Cluster, DRBD, Tungsten and Galera. He will speak about the trade-offs of each method and why you might want to use each one, so you can decide what’s best for your environment.

And of course there’s Peter Zaitsev’s Optimizing MySQL Configuration, which is not to be missed!

4 – 5 pm
I am personally interested in Patrick Galbraith’s Database Resources On Demand, covering the concept of DBaaS, how an organization can use it, and what it means for DBAs for management and for developers in how they use database resources. Among the other topics it addresses is how open source technologies such as OpenStack provide an infrastructure that can be used in DBaaS.

Lately, solid-state disk drives have been getting a lot of attention. Vadim Tkachenko will speak about MySQL and Solid-State Drives: Usage and Tuning, covering SSD internals and how they affect database performance.

If you want to get your hands dirty with MySQL Cluster, join the Hands-on Lab by Santo Leto called Get Started with MySQL Cluster, where you will learn by doing: install, configure, administer, and access MySQL Cluster.

5:30 – 6:30 pm
If you want to know what replication features are coming up in MySQL 5.6, make sure to check out Lars Thalmann talking about Enabling the New Generation of Web and Cloud Services with MySQL 5.6 Replication. This session showcases the new replication features, including group commit and multithreaded slave for high performance, crash-safe slaves and failover utilities for high availability, global transaction identifiers and annotated row-based replication [RBR] for flexibility/usability, and event checksums for data integrity.

I have to promote my own session! I’m presenting Google-Hacking MySQL, which takes an in-depth look at using white-hat Google hacking techniques to show you what the “bad guys” can do. White-hat Google hacking is the good kind of hacking, where you have permission. You will learn about the following hacking strategies and how they are done: SQL injection, cross-site scripting (XSS), cross-site request forgery (CSRF), gateway vulnerabilities, and social engineering, all without violating Google’s terms of service.

If you want to migrate your systems to MySQL, there is a presentation by Sergio Andres De La Cruz Rodriguez about Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

6:30 – 8:30 pm – MySQL Connect Reception in the Continental Ballroom

And there are Birds of a Feather sessions (BoFs) too!

Sunday, September 30th
8:30 am – 9:30 am
MySQL Perspectives Keynote featuring Twitter’s DB Team manager Jeremy Cole, PayPal’s Chief Architect Daniel Austin, and Verizon Wireless’ IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well worth having to wake up early to get to the venue at 8:30 am.

There will be a special panel after that, but it has not been announced yet, so it looks like there is a hole in the schedule (but there is no hole!).

10:15 am – 11:15 am
Most of us have a shortage of DBAs. If you are in that situation, you probably have more money than time on your hands, so check out Rob Young’s talk on Optimizing Security, Performance, and Availability with MySQL Enterprise Edition. Yes, MySQL Enterprise Edition costs money, but it is much easier to buy it than to hire top-notch DBAs, who are rare in this world.

If you are a beginner DBA, attend the Hands-on Lab entitled Focus on MySQL Replication, taught by Sven Sandberg and Luis Soares. During this hands-on lab, you will learn how to get started, how replication works, and the best practices and tools. You will also learn about architecture, advanced replication configurations and some of the new features in the MySQL 5.6 development milestone releases. This session goes until 12:45, so you have a good 2.5 hours for the hands-on work.

Calvin Sun talks about Better Availability with InnoDB Online Operations, including online operations for schema changes such as add index, drop foreign key, and rename column.

11:45 am – 12:45 pm
Linas Virbalas and Robert Hodges of Continuent talk about Replicating from MySQL to Oracle Database and Back Again.

If you are into MySQL Security, Joro Kodinov will present MySQL Security: Past and Present. Since the description includes MySQL 5.6 security features, I would have called it “Past, Present and Future”.

Thinking of deploying, or already deployed, Galera? Then do not miss Seppo Jaakola’s talk on Galera Cluster Best Practices.

1:15 – 2:15 pm
Inaam Rana presents InnoDB Performance Tuning, which includes the newer features in MySQL 5.5 and the upcoming features in MySQL 5.6, including unique InnoDB architectural elements for performance and how to tune InnoDB to achieve better performance.

There is a world of tools designed to help make MySQL administration easier, so check out Charles Bell’s hands-on lab about MySQL Utilities where you can experiment with the tools.

My particular favorite for this time slot is Øystein Grøvlen’s Query Performance Comparison of MySQL 5.5 and MySQL 5.6. I cannot wait to see how much improvement there is, and why!

2:45 pm – 3:45 pm
Profiling with the Performance Schema, given by Mark Leith, will teach how to set up and use Performance Schema to perform everyday profiling and performance monitoring tasks, such as: finding problem queries; researching blocked hosts; profiling I/O usage; analyzing resource usage by schema, table, or user; or tracing a session to see exactly where it spends its time.

Alexander Rubin will talk about critical performance tuning information during In-Depth Query Optimization for MySQL.

Personally I’m not a fan of use cases, which are sessions like “how X company does Y with MySQL”, but since I’m giving one entitled Database Scaling at Mozilla, I should probably promote it. And I will note that these sessions are usually well-attended – I guess people want to see how the big players do things, even though they are only appropriate for about 5% of the DBAs out there. I will also note that Mozilla has relatively small databases and high traffic, so our needs are similar to those of more DBAs out there, and hopefully our solutions will work for them.

4:15 – 5:15 pm
Grant McAlister of Amazon.com presents Durability Is Key: How to Protect Your Data from Corruption where he describes the differences between logical and physical corruption in MySQL and shows how to best protect your MySQL database from both types of corruption.

My former coworkers, Francisco Bordenave and Marco Tusa of Pythian, are presenting on Scaling MySQL with Multimaster Synchronous Replication, where they explain how they investigated and designed an architecture based on MySQL to support an application that served shops around the globe and to scale out and scale in, based on sales seasons.

Jonathon Coombes presents MySQL Security: Authentication and Audit, a hands-on lab that starts with an introduction to the authentication plug-in API and how it works, then tries an example HTTP authentication plug-in. The lab takes you through setting up a Pluggable Authentication Module (PAM) plug-in to access the server OS user definitions. Then you will walk through the MySQL audit plug-in API and how it works, and experiment with the Oracle audit log plug-in and various events it can log. Participants will build and experiment with their own plug-in that forwards MySQL events to the OS logging APIs (syslogd on Linux and Windows Event Log on Windows).

5:45 – 6:45 pm
Tokutek’s Bradley Kuszmaul defines big data as “several times as large as main memory”. If you have big data, check out his talk on Solving the Challenges of Big Databases with MySQL.

Luis Soares teaches about using replication for high availability in Scaling for the Web and Cloud with MySQL Replication.

Or, get information about fulltext search with Sphinx from the horse’s mouth – Andrew Aksyonoff talks about Full-Text Search with MySQL and Sphinx.

From 7 – 9 pm on Sunday, there is the Taylor Street Open House, which is JavaOne’s opening event and our closing event.

It’s going to be an amazing event with tons of technical content. I feel like I have written a lot here, but these are simply the sessions I’m having trouble choosing between, or wish I could go to. There are tons more sessions than what I’ve written about!

MySQL Connect Guide for Developers

MySQL Connect is a new conference with a lot of good technical content. In the past, it has been helpful to have “guides” of MySQL conferences, so in this post I will give my guide to MySQL Connect for Developers. Gerry and I did a lot of recommendations in OurSQL Episode 103, but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So here’s a guide to MySQL Connect for developers, with times. Note that these are handpicked from what I think developers would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
11:30 am – 12:30 pm
You are in luck if you are, or want to be, a Java developer, because there is a hands-on lab for Developing Applications with MySQL and Java with Mark Matthews. Hands-on labs are where you learn by doing, so this is not to be missed if you want to learn how to develop scalable Java applications.

On the internals side, there are a few good optimizer talks that would benefit developers. One is Olav Sandstå’s MySQL Optimizer Overview – you will learn how MySQL chooses the optimal path, and by learning how MySQL does that, you can write better queries.

Oracle’s Geir Høydalsvik presents What’s New in MySQL Server 5.6? which explains the new features and performance enhancements in the 5.6 MySQL Server release candidate. I have a feeling they will have a new release version by MySQL Connect (another DMR? beta? RC?) and we will hear about all the new features in that release, too.

1 – 2 pm
Manyi Lu will present an Overview of New Optimizer Features in MySQL 5.6, talking about multi-range read, index condition pushdown, batched key access, and the new EXPLAIN features.

If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands-on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

As you probably know, InnoDB is the default storage engine for Oracle’s MySQL as of MySQL Release 5.5. It provides the standard ACID-compliant transactions, row-level locking, multiversion concurrency control, and referential integrity. InnoDB also implements several innovative technologies to improve its performance and reliability. Chunsen Sung presents 10 Things You Should Know About InnoDB, which is a brief history of InnoDB; its main features; and some recent enhancements for better performance, scalability, and availability.

2:30 – 3:30 pm
Alexander Rubin from Oracle has a session on the New MySQL Full-text Search Features and Solutions, including the new InnoDB FULLTEXT search*.

If you develop with Python, Geert Vanderkelen has a session about Developing Python Applications with MySQL Utilities and MySQL Connector/Python.

4:00 – 5:00 pm
Heard all the hype about MySQL Cluster? See for yourself what it can do in a Hands-on Lab presented by Santo Leto, Get Started With MySQL Cluster.

Or, head on over to a talk featuring Oracle and Amazon engineers talking about Using MySQL in the Cloud. The Amazon folks know their subject matter when it comes to cloud computing!

5:30 – 6:30 pm
If you are a developer interested in performance, Mark Matthews presents MySQL Enterprise’s Monitor for Developers – the description says you will “learn how to resolve potential performance and scalability issues revealed via performance graphs and query analyzer data from the Monitor feature of MySQL Enterprise Edition and apply them to your own application development. In addition, you will learn how to extend the Monitor feature with your own application-specific performance metrics to extend your visibility into performance issues into the deployment and operations realm.”

If you are interested in migrating to MySQL from Microsoft SQL Server, do not miss Sergio De La Cruz Rodriguez’s talk on Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

Or if you are a more security-focused developer, head on over to my own talk on Google-Hacking MySQL.

6:30 – 8:30 pm – MySQL Connect Reception in the Continental Ballroom

7:00 – 8:00 pm – Birds of a Feather talks – informal discussion sessions
I think these might be relevant to developers:
Python
MySQL Community
Query Optimizations

And it is only the end of the first day! The exhibit hall is open 9:30 am – 1:30 pm, and 7 – 9 pm.

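Sunday, September 30th
8:30 am – 9:30 am
MySQL Perspectives Keynote featuring Twitter’s DB Team manager Jeremy Cole, PayPal’s Chief Architect Daniel Austin, and Verizon Wireless’ IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well worth having to wake up early to get to the venue at 8:30 am.

10:15 – 11:15 am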
If you develop with PHP, don’t miss Johannes Schlüter presenting Current State of PHP and MySQL, which will explain some of the relevant MySQL changes in PHP 5.4.

I am personally quite interested in the provocatively titled Big Data Is a Big Scam (Most of the Time) by PayPal‘s Chief Architect Daniel Austin. This is one of the more unique talks at this conference, so I will just paste the description here: “This session challenges the conventional wisdom and tries to dispel some of the myths about big data, NoSQL, and everything. When do you need a NoSQL system? How do you choose one from another amid the hype? And how do you know when to stick to your RDBMS and resist becoming a follower of big data fashion? Come and hear what you need to know about your options and how to make wise decisions about solutions to your big data problems.”

11:45 am – 12:45 pm
These days, you can’t talk about performance without talking about NoSQL. Andrew Morgan and John Duncan will present Developing High-Throughput Services with NoSQL APIs to InnoDB and MySQL Cluster, which explains how to maintain all the advantages of existing relational databases while providing blazing-fast performance for simple queries. This session describes the memcached connectors and examines some use cases for how MySQL and memcached fit together in application architectures. It does the same for the newest MySQL Cluster native connector, an easy-to-use, fully asynchronous connector for Node.js.

Everybody is security-conscious, with good reason. Joro Kodinov talks about MySQL Security: Past and Present, giving an overview of MySQL security in the past, present and future versions (with MySQL 5.6).

1:15 – 2:15 pm
Mats Kindahl will talk about Sharding with PHP and MySQL for distributing writes over a cluster, including static and dynamic sharding schemes.

Into NoSQL? Check out Ligaya Turmelle’s presentation on A Journey into NoSQLand: MySQL’s NoSQL Implementation. She will discuss how the memcached API is being used to hook directly into the InnoDB and MySQL Cluster (NDB) storage engines — skipping the MySQL server completely.

Want to know what those new optimizer features could look like for you? Don’t miss Øystein Grøvlen’s talk on Query Performance Comparison of MySQL 5.5 and MySQL 5.6. You’ll see comparisons using the DBT-3 benchmark, and explanations of which queries perform better, and why, in MySQL 5.6.

2:45 – 3:45 pm
Oracle ACE Director Ronald Bradford brings his talk on Improving Performance with Better Indexes to MySQL Connect. So often there is a problem in production that a simple index fixes, and developers should be indexing optimally from the start.

I am a tried-and-true command-line lover, but I know many developers love IDEs and their GUIs. If you are one of those folks, you might want to check out Alfredo Kojima’s Getting the Most out of MySQL with MySQL Workbench. MySQL Workbench can be used to write and debug SQL queries, manage data and schemas, create databases from scratch, or maintain existing ones through graphical enhanced entity-relationship (EER) models and the advanced SQL Editor functionality, which is all available in the free community edition of MySQL Workbench.

Alexander Rubin will talk about In-Depth Query Optimization for MySQL, which includes tips on how to explain and optimize your slow queries and how to make your reporting queries execute much faster than before. He will also discuss MySQL optimizer internals, show some benchmark results, and explain how to use the new performance_schema in MySQL 5.6 to monitor queries.

4:15 – 5:15 pm
If you like to know about upcoming features, do not miss Oracle’s Evgeny Potemkin’s session on Powerful EXPLAIN in MySQL 5.6. MySQL 5.6 offers several new additions that give more-detailed information about the query plan and make it easier to understand at the same time – including structured EXPLAIN in JSON format, EXPLAIN for INSERT/UPDATE/DELETE, and optimizer trace.

If you develop with MySQL on Windows, the Windows Experience Group is for you. Learn about the current and future features of the MySQL Windows Installer and Connector/Net, as well as the improvements for the MySQL server itself on Windows.

5:45 – 6:45 pm
If you do text searching with MySQL, Sphinx’s own Andrew Aksyonoff has a session on Full-text search with MySQL and Sphinx*.

Bradley Kuszmaul of Tokutek presents Solving the Challenges of Big Databases with MySQL. I love that Bradley defines what a “big database” is – “more than ten times as large as main memory”.

From 7 – 9 pm on Sunday, there is the Taylor Street Open House.

Next week I will write a blog post about MySQL Connect for DBAs. Watch for it!

* Actually, we just did a podcast series including the basics of FULLTEXT search on MyISAM and how FULLTEXT search works on InnoDB in 5.6.

MySQL Connect is a new conference with a lot of good technical content. In the past, it has been helpful to have guides of MySQL conferences, so in this post I will give my guide to MySQL Connect for Developers. Gerry and I did a lot of recommendations in OurSQL Episode 103, but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So heres a guide to MySQL Connect for developers, with times. Note that these are handpicked from what I think developers would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

On the internals side, there are a few good optimizer talks that would benefit developers. One is Olav Sandstås MySQL Optimizer Overview you will learn how MySQL chooses the optimal path, and by learning how MySQL does that, you can write better queries.

Oracles Geir Høydalsvik presents What’s New in MySQL Server 5.6? which explains the new features and performance enhancements in the 5.6 MySQL Server release candidate. I have a feeling they will have a new release version by MySQL Connect (another DMR? beta? RC?) and we will hear about all the new features in that release, too.

1 2 pm
Manyi Lu will present an Overview of New Optimizer Features in MySQL 5.6, talking about multi-range read, index condition pushdown, batched key access, and the new EXPLAIN features.

2:30-3:30 pm
Alexander Rubin from Oracle has a session on the New MySQL Full-text Search Features and Solutions, including the new InnoDB FULLTEXT search*.

If you develop with Python, Geert Vanderkelen has a session about Developing Python Applications with MySQL Utilities and MySQL Connector/Python.

4:00 5:00 pm
Heard all the hype about MySQL Cluster? See for yourself what it can do in a Hands-on Lab presented by Santo Leto, Get Started With MySQL Cluster.

Or, head on over to a talk featuring Oracle and Amazon engineers talking about Using MySQL in the Cloud. The Amazon folks know their subject matter when it comes to cloud computing!

5:30-6:30 pm
If you are a developer interested in performance, Mark Matthews presents MySQL Enterprise’s Monitor for Developers. The description says you will learn how to resolve potential performance and scalability issues revealed via performance graphs and query analyzer data from the Monitor feature of MySQL Enterprise Edition and apply them to your own application development. In addition, you will learn how to extend the Monitor feature with your own application-specific performance metrics to extend your visibility into performance issues into the deployment and operations realm.

If you are interested in migrating to MySQL from Microsoft SQL Server, do not miss Sergio De La Cruz Rodriguez’s talk on Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

Or if you are a more security-focused developer, head on over to my own talk on Google-Hacking MySQL.

6:30-8:30 pm: MySQL Connect Reception in the Continental Ballroom

7:00-8:00 pm: Birds of a Feather talks (informal discussion sessions)
I think these might be relevant to developers:
Python
MySQL Community
Query Optimizations

And it is only the end of the first day! The exhibit hall is open 9:30 am-1:30 pm, and 7-9 pm.

Sunday, September 30th
8:30-9:30 am
MySQL Perspectives Keynote featuring Twitter’s DB Team manager Jeremy Cole, PayPal’s Chief Architect Daniel Austin, and Verizon Wireless IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well worth having to wake up early to get to the venue at 8:30 am.

10:15-11:15 am
If you develop with PHP, don’t miss Johannes Schlüter presenting Current State of PHP and MySQL, which will explain some of the relevant MySQL changes in PHP 5.4.

I am personally quite interested in the provocatively titled Big Data Is a Big Scam (Most of the Time) by PayPal’s Chief Architect Daniel Austin. This is one of the more unique talks at this conference, so I will just paste the description here: This session challenges the conventional wisdom and tries to dispel some of the myths about big data, NoSQL, and everything. When do you need a NoSQL system? How do you choose one from another amid the hype? And how do you know when to stick to your RDBMS and resist becoming a follower of big data fashion? Come and hear what you need to know about your options and how to make wise decisions about solutions to your big data problems.

11:45 am-12:45 pm
These days, you can’t talk about performance without talking about NoSQL. Andrew Morgan and John Duncan will present Developing High-Throughput Services with NoSQL APIs to InnoDB and MySQL Cluster, which explains how to maintain all the advantages of existing relational databases while providing blazing-fast performance for simple queries. This session describes the memcached connectors and examines some use cases for how MySQL and memcached fit together in application architectures. It does the same for the newest MySQL Cluster native connector, an easy-to-use, fully asynchronous connector for Node.js.

1:15-2:15 pm
Mats Kindahl will talk about Sharding with PHP and MySQL for distributing writes over a cluster, including static and dynamic sharding schemes.

Into NoSQL? Check out Ligaya Turmelle’s presentation on A Journey into NoSQLand: MySQL’s NoSQL Implementation. She will discuss how the memcached API is being used to hook directly into the InnoDB and MySQL Cluster (NDB) storage engines, skipping the MySQL server completely.

Want to know what those new optimizer features could look like for you? Don’t miss Øystein Grøvlen’s talk on Query Performance Comparison of MySQL 5.5 and MySQL 5.6. You’ll see comparisons using the DBT-3 benchmark, and explanations of which queries perform better, and why, in MySQL 5.6.

2:45-3:45 pm
Oracle ACE Director Ronald Bradford brings his talk on Improving Performance with Better Indexes to MySQL Connect. So often there is a problem in production that a simple index fixes, and developers should be indexing optimally from the start.

I am a tried-and-true command-line lover, but I know many developers love IDEs and their GUIs. If you are one of those folks, you might want to check out Alfredo Kojima’s Getting the Most out of MySQL with MySQL Workbench. MySQL Workbench can be used to write and debug SQL queries, manage data and schemas, create databases from scratch, or maintain existing ones through graphical enhanced entity-relationship (EER) models and the advanced SQL Editor functionality, which is all available in the free community edition of MySQL Workbench.

4:15-5:15 pm
If you like to know about upcoming features, do not miss Oracle’s Evgeny Potemkin’s session on Powerful EXPLAIN in MySQL 5.6. MySQL 5.6 offers several new additions that give more-detailed information about the query plan and make it easier to understand at the same time, including structured EXPLAIN in JSON format, EXPLAIN for INSERT/UPDATE/DELETE, and optimizer trace.

5:45-6:45 pm
If you do text searching with MySQL, Sphinx’s own Andrew Aksonoff has a session on Full-text search with MySQL and Sphinx*.

Bradley Kuszmaul of Tokutek presents Solving the Challenges of Big Databases with MySQL. I love that Bradley defines what a big database is: more than ten times as large as main memory.

From 7-9 pm on Sunday, there is the Taylor Street Open House.

Next week I will write a blog post about MySQL Connect for DBAs. Watch for it!

* Actually, we just did a podcast series including the basics of FULLTEXT search on MyISAM and how FULLTEXT search works on InnoDB in 5.6.

Should You Use GROUP BY/ORDER BY NULL By Default?

Edited to add: Thanks to Roland Bouman for pointing out Bug 30477 created Aug 2007 that addresses this issue. I am glad I am not the only one who thinks implicit overhead is bad!

At Northeast PHP a few weeks ago, an audience member came up to me after my talk about indexing and asked about ORDER BY NULL for optimal queries. I have to say, I was surprised, as I had not heard about using ORDER BY NULL. In a nutshell, apparently when MySQL does a GROUP BY, there is an implicit ORDER BY the same fields, which adds extra overhead for the mere purpose of returning the values in order of the GROUP BY fields.

I knew about the implicit ORDER BY, but I thought that was required for the GROUP BY, and made the GROUP BY faster. After all, it’s easier to group like items together if they are already sorted, right?

However, every single source I have researched seems to imply that, no, it is just overhead and completely unnecessary unless you really do want the results returned in the same order as the GROUP BY. For example, if you query with GROUP BY last_name and do not care about having the rows returned in lexical order of last names, you would use GROUP BY last_name ORDER BY NULL.
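To make the "sorting is not required for grouping" point concrete, here is a minimal sketch in Python. It is only an illustration of the general idea, not a description of how MySQL implements GROUP BY internally, and the sample last names are made up. A hash table can collect the groups in a single pass; sorting the result afterwards is a separate, optional step.

```python
from collections import defaultdict


def group_counts_unsorted(last_names):
    """Count rows per last name with a hash table; no sorting involved."""
    counts = defaultdict(int)
    for last_name in last_names:
        counts[last_name] += 1
    return dict(counts)


def group_counts_sorted(last_names):
    """Same counts, returned in lexical order of the key (an extra sort step)."""
    counts = group_counts_unsorted(last_names)
    return dict(sorted(counts.items()))


# Hypothetical sample data standing in for a last_name column.
rows = ["Smith", "Cabral", "Smith", "Bouman", "Cabral", "Smith"]

unsorted_result = group_counts_unsorted(rows)
sorted_result = group_counts_sorted(rows)

# Both contain the same groups and counts; only the order of the keys differs.
print(unsorted_result)
print(sorted_result)
```

The groups and counts are identical either way, which is the intuition behind skipping the sort when the order does not matter.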

This leads me to believe that by default, whenever doing a GROUP BY, it is a good idea to use ORDER BY NULL. I have not seen that as a piece of advice that is generally given out in talks, either. So maybe there is something I am not understanding properly? I would love to know what everyone thinks.

Here is the result of my research:

The manual at http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html says:

By default, MySQL sorts all GROUP BY col1, col2, … queries as if you specified ORDER BY col1, col2, … in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL.

High Performance MySQL, 3rd Edition says the following on page 246, in the section on “Optimizing GROUP BY and DISTINCT”:

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns.

One blog post extols the joy of “no more filesorts” in the author’s EXPLAIN output at http://www.subelsky.com/2008/05/order-by-null-kills-mysql-filesorts.html.

My question is – why wait until you see a “Using filesort” in your EXPLAIN plan? If the overhead is only used to order the results, and that is not desired, why not just use ORDER BY NULL by default, whenever using a GROUP BY query?
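If you did want to try "ORDER BY NULL by default", one way to experiment is a small helper in application code. The sketch below is an assumption-laden illustration: the function name `add_order_by_null` and the `people` table are hypothetical, and the keyword matching is deliberately naive (it does not understand subqueries, comments, or string literals, and it leaves queries with a LIMIT clause alone because ORDER BY has to come before LIMIT).

```python
import re


def add_order_by_null(query):
    """Append ORDER BY NULL to a GROUP BY query that has no ORDER BY.

    Naive keyword matching only; queries that already have ORDER BY,
    have no GROUP BY, or contain LIMIT are returned unchanged.
    """
    stripped = query.strip().rstrip(";").rstrip()
    has_group_by = re.search(r"\bGROUP\s+BY\b", stripped, re.IGNORECASE)
    has_order_by = re.search(r"\bORDER\s+BY\b", stripped, re.IGNORECASE)
    has_limit = re.search(r"\bLIMIT\b", stripped, re.IGNORECASE)
    if has_group_by and not has_order_by and not has_limit:
        return stripped + " ORDER BY NULL"
    return stripped


# Hypothetical table and column names, matching the last_name example above.
grouped = "SELECT last_name, COUNT(*) FROM people GROUP BY last_name"
ordered = "SELECT last_name, COUNT(*) FROM people GROUP BY last_name ORDER BY last_name"

print(add_order_by_null(grouped))
print(add_order_by_null(ordered))
```

Comparing the EXPLAIN output of the original and rewritten queries is the way to confirm whether the filesort actually goes away for a given query.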

Or is it possible that yes, whenever developers write GROUP BY, they probably want the results returned in the order of the GROUP BY fields?

Slides From MySQL, Geek Management and Good Ideas for DBA Talks in Trinidad and Guatemala

In the past two days I was fortunate enough to speak to two different groups of people at the Ministry of Science, Technology and Tertiary Education in the Parliament building in Port of Spain, Trinidad – yes, I spoke at Parliament! The PDF slides for my talk are available: The Art of Cat Herding: How to Manage Geeks and Ideas for DBA’s – not “best practices”, but ideas you may or may not want to implement.

I was in Trinidad as part of the Oracle Technology Network tour of Latin America (North leg). I also spent time in Cali, Colombia and Quito, Ecuador (including visiting the Equator!). Today I am in Guatemala, and I will give talks on more MySQL-specific subjects: MySQL Security and Get Rid of Cron Scripts Using MySQL Events. Tomorrow I travel to Honduras, on Sunday to Costa Rica, and then I go home, which I haven’t seen since the 24th of June. I will have spent 43 hours on a plane in one month, and I am excited to finally go home next week…but I still have a few more countries on the tour!

Mozilla IT Musings from Sheeri

What is an “unauthenticated user”?

Every so often we have a client worrying about unauthenticated users. For example, as part of the output of SHOW PROCESSLIST they will see:

+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| Id  | User                 | Host               | db   | Command | Time | State | Info             |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| 235 | unauthenticated user | 10.10.2.74:53216   | NULL | Connect | NULL | login | NULL             |
| 236 | unauthenticated user | 10.120.61.10:51721 | NULL | Connect | NULL | login | NULL             |
| 237 | user                 | localhost          | NULL | Query   | 0    | NULL  | show processlist |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+

Who are these unauthenticated users, how do they get there, and why aren’t they authenticated?

The client-server handshake in MySQL is a 4-step process. Those familiar with mysql-proxy already know these steps, as there are four functions that a Lua script in mysql-proxy can override. The process is useful to know for figuring out exactly where a problem is when something breaks.

Step 1: Client sends connect request to server. There is no information here (as far as I can tell). However, it does mean that if you try to connect to a host and port of a mysqld server that is not available, you will get:

ERROR 2003 (HY000): Can't connect to MySQL server on '[host]' (111)

Step 2: The server assigns a connection and sends back a handshake, which includes the server’s mysqld version, the thread id, the server host and port, the client host and port, and a “scramble buffer” (for salting authentication, I believe).

It is during Step 2 that the connections show up in SHOW PROCESSLIST. They have not been authenticated yet, but they are connected. If there are issues with authentication, connections will be stuck at this stage. Most often, stuck connections are due to DNS not resolving properly, which the skip-name-resolve option will help with.

Step 3: Client sends authentication information, including the username, the password (salted and hashed) and default database to use. If the client sends an incorrect packet, or does not send authentication information within connect_timeout seconds, the server considers the connection aborted and increments its Aborted_connects status variable.

Step 4: Server sends back whether the authentication was successful or not. If the authentication was not successful, mysqld increments its Aborted_connects status variable and sends back an error message:

ERROR 1045 (28000): Access denied for user 'user'@'host' (using password: [YES/NO])
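To see how many connections are sitting between Step 2 and Step 3 at a given moment, you can count the "unauthenticated user" rows in the processlist. The following is a rough sketch that parses the ASCII table as the mysql client prints it, using the sample output from above. The function names are my own, and the parsing assumes no cell contains a pipe character, which a real Info column might.

```python
SAMPLE = """\
+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| Id  | User                 | Host               | db   | Command | Time | State | Info             |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| 235 | unauthenticated user | 10.10.2.74:53216   | NULL | Connect | NULL | login | NULL             |
| 236 | unauthenticated user | 10.120.61.10:51721 | NULL | Connect | NULL | login | NULL             |
| 237 | user                 | localhost          | NULL | Query   | 0    | NULL  | show processlist |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+
"""


def parse_processlist(text):
    """Turn the ASCII table into a list of dicts keyed by column name."""
    lines = [line.strip() for line in text.splitlines()]
    table = [
        [cell.strip() for cell in line.strip("|").split("|")]
        for line in lines
        if line.startswith("|")  # skip the +-----+ border lines
    ]
    if not table:
        return []
    header, data = table[0], table[1:]
    return [dict(zip(header, cells)) for cells in data]


def stuck_at_login(processes):
    """Connections that have completed Step 2 but have not authenticated yet."""
    return [p for p in processes if p["User"] == "unauthenticated user"]


processes = parse_processlist(SAMPLE)
stuck = stuck_at_login(processes)
for p in stuck:
    print(p["Id"], p["Host"])
```

A handful of these rows at any instant is normal on a busy server; a count that keeps growing is the signal to look at name resolution or the client side.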

Hope this helps!

Video: Who is the Dick on My Site Keynote

I have already blogged about this keynote at http://www.pythian.com/blogs/948/liveblogging-who-is-the-dick-on-my-site.

If you are interested in actually seeing the video, the 286 MB .wmv file can be downloaded at http://technocation.org/videos/original/mysqlconf2008/2008_04_17_panelDick.wmv and played through your browser by clicking the “play” link at http://tinyurl.com/55c5ps. This is not to be missed!

From the official conference description:

Much of the data in a database is about people. Identity 2.0 technologies will lower the friction for people to provide and easily move data about themselves online.

This fast paced keynote will offer a background on Identity 2.0, discuss current roadblocks and future opportunities, and explore the potential impacts these will have on databases.

———–
I have already blogged about this keynote at https://sheeri.org/liveblogging-who-is-the-dick-on-my-site/
Colin Charles also blogged about it
Do not miss this keynote! See it on youtube.

Liveblogging: Who is the Dick on My Site?

Identity 2.0: A world that’s simple, safe and secure.

Who is the Dick on My Site? by Dick Hardt (Sxip Identity Corporation)

Quotes:
“Really, data is about people. It’s really identity data.”

“Identity helps you predict behavior.”

“Identity is who you are.”

“Identity is also what you like.”

“Identity enables you to uniquely identify somebody.”

“There are things that other people say about you, too.”

“Modern identity is about photo IDs so you can prove your identity.”

“Identity is a complicated issue….Everyone has a different idea of what it is.”

Identity transactions are:

party identification (who)
authorization (permission)
profile exchange (info about that person)
NOT record matching

Identity transactions can be:
verbal
but it’s unverified
need trust
How do you verify?
ID, subject matches credential, assuming the feature that only the one person can use that ID.

Photo ID is asymmetrical in trust, because the issuing organization (province of British Columbia) doesn’t know when the ID is being used, so there’s some privacy.

What is digital identity?
sometimes, site registration.
definitely a hassle, could be simpler
unverified, fewer trust cues than verbal
Interesting point: searching del.icio.us shows you what other people think you are.

How do you prove to a website who you are? It’s not what you give to the site, but what the site knows about you! If you have a good eBay rating, can you take that over to Craigslist?

What we want in Identity 2.0 is a way to make identity user-centric, not site-centric, so a person can move their identity around.

How do we solve this? You have a trusted agent that can give information to relying parties — a relying party is any site that the user wants to share information. The agent does not need to trust the relying party, the sites don’t need to trust the agent. The relying party does need to trust the agent (“issuer”), but that’s it. This is how OpenID works.

Identity data isn’t just data, it’s data about a person.

Why does identity matter?

“The future has arrived, it is just not evenly distributed yet.” - William Gibson

More and more apps are becoming distributed (ie, Google). Biometrics are becoming prevalent. There’s a lot of device convergence — a phone can pay for things, etc.

There are “digital natives” and “digital immigrants” — natives grew up with the computer, with the internet. An immigrant has an accent — “digital camera” for an immigrant, “camera” for a native.

Identity 2.0 predictions:
minimal passwords — the agent makes it simpler
rich portable profiles — don’t need to keep re-writing the profile information over and over
portable credentials — digital driver’s licence, prove attributes digitally
agency/delegation — an assistant can book a flight for you, or one site can get
reputation services — like blogosphere, page rank, great contributor to wikis or open source. Similar to credit rating.
identity services — disposable e-mail, one-time tokens, such as one-time payments, one-time phone numbers, all this stuff can help reduce spam and protect privacy.
State of user-centric identity:
functionality — there is nothing out there that’s functional out there for what we need
industry — many organizations are working together, that wouldn’t normally – Grade: A
standards — needs more work – Grade: C
interop — standards not quite there, but folks are making it work – Grade: B
deployment — there’s a start, but more needed – Grade: C
utilization — nominal – Grade: D probably should be F

vitamins — should take, but don’t
painkillers — don’t want to take, but do
viagra — want to take, probably shouldn’t
Identity 2.0 is still at the vitamin stage. There’s no pain.

CHAR() vs. VARCHAR()

So, a little gotcha:

The CHAR() and VARCHAR() types are different types. MySQL silently converts any CHAR() fields to VARCHAR() when creating a table with at least 1 VARCHAR() field.

http://dev.mysql.com/doc/refman/5.0/en/silent-column-changes.html

If any column in a table has a variable length, the entire row becomes variable-length as a result. Therefore, if a table contains any variable-length columns (VARCHAR, TEXT, or BLOB), all CHAR columns longer than three characters are changed to VARCHAR columns. This does not affect how you use the columns in any way; in MySQL, VARCHAR is just a different way to store characters. MySQL performs this conversion because it saves space and makes table operations faster.

However, that’s not entirely accurate, because according to the manual page at http://dev.mysql.com/doc/refman/5.0/en/char.html:

As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values.

If you have a field such as name, and require it to not be blank, you probably have some function testing it before it goes into the database. However, most languages are perfectly happy that “ ” isn’t blank. When it gets put into the database, however, it becomes blank if your column is a VARCHAR and trailing spaces are removed (which, per the manual, is the case before MySQL 5.0.3). This means folks may be able to get around your requirement of a non-blank field, and actually store a blank value in the database (as opposed to storing a space or series of spaces).
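One way to guard against this in the application is to trim whitespace before testing for blankness. Here is a minimal sketch; the function names are hypothetical, and it assumes the value arrives as a Python string or None.

```python
def naive_is_blank(value):
    """The kind of check that lets a whitespace-only name through."""
    return value is None or value == ""


def is_blank(value):
    """Treat None, the empty string, and whitespace-only strings as blank."""
    return value is None or value.strip() == ""


for candidate in ["Sheeri", "", "   ", None]:
    print(repr(candidate), naive_is_blank(candidate), is_blank(candidate))
```

The stricter check rejects the whitespace-only value before it ever reaches the column, so it no longer matters whether the server keeps or removes trailing spaces.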