MySQL Connect Guide for Developers

MySQL Connect is a new conference with a lot of good technical content. In the past, it has been helpful to have “guides” of MySQL conferences, so in this post I will give my guide to MySQL Connect for Developers. Gerry and I did a lot of recommendations in OurSQL Episode 103, but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So here’s a guide to MySQL Connect for developers, with times. Note that these are handpicked from what I think developers would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am -12:30 pm
You are in luck if you are, or want to be, a Java developer, because there is a hands-on lab for Developing Applications with MySQL and Java with Mark Matthews. Hands-on labs are where you learn by doing, so this is not to be missed if you want to learn how to develop scalable Java applications.

On the internals side, there are a few good optimizer talks that would benefit developers. One is Olav Sandstå’s MySQL Optimizer Overview – you will learn how MySQL chooses the optimal path, and by learning how MySQL does that, you can write better queries.

Oracle’s Geir Høydalsvik presents What’s New in MySQL Server 5.6? which explains the new features and performance enhancements in the 5.6 MySQL Server release candidate. I have a feeling they will have a new release version by MySQL Connect (another DMR? beta? RC?) and we will hear about all the new features in that release, too.

1 – 2 pm
Manyi Lu will present an Overview of New Optimizer Features in MySQL 5.6, talking about multi-range read, index condition pushdown, batched key access, and the new EXPLAIN features.

If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

As you probably know, InnoDB is the default storage engine for Oracle’s MySQL as of MySQL Release 5.5. It provides the standard ACID-compliant transactions, row-level locking, multiversion concurrency control, and referential integrity. InnoDB also implements several innovative technologies to improve its performance and reliability. Chunsen Sung presents 10 Things You Should Know About InnoDB, which is a brief history of InnoDB; its main features; and some recent enhancements for better performance, scalability, and availability.

2:30-3:30 pm
Alexander Rubin from Oracle has a session on the New MySQL Full-text Search Features and Solutions, including the new InnoDB FULLTEXT search*.

If you develop with Python, Geert Vanderkelen has a session about Developing Python Applications with MySQL Utilities and MySQL Connector/Python.

4:00 – 5:00 pm
Heard all the hype about MySQL Cluster? See for yourself what it can do in a Hands-on Lab presented by Santo Leto, Get Started With MySQL Cluster.

Or, head on over to a talk featuring Oracle and Amazon engineers talking about Using MySQL in the Cloud. The Amazon folks know their subject matter when it comes to cloud computing!

5:30 – 6:30 pm
If you are a developer interested in performance, Mark Matthews presents MySQL Enterprise’s Monitor for Developers – the description says you will “learn how to resolve potential performance and scalability issues revealed via performance graphs and query analyzer data from the Monitor feature of MySQL Enterprise Edition and apply them to your own application development. In addition, you will learn how to extend the Monitor feature with your own application-specific performance metrics to extend your visibility into performance issues into the deployment and operations realm.”

If you are interested in migrating to MySQL from Microsoft SQL Server, do not miss Sergio De La Cruz Rodriguez’s talk on Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

Or if you are a more security-focused developer, head on over to my own talk on Google-Hacking MySQL

6:30 – 8:30 pm – MySQL Connect Reception in the Continental Ballroom

7:00 – 8:00 pm – Birds of a Feather talks – informal discussion sessions
I think these might be relevant to developers:
Python
MySQL Community
Query Optimizations

And it is only the end of the first day! The exhibit hall is open 9:30 am – 1:30 pm, and 7 – 9 pm.

Sunday, September 30th
8:30 am – 9:30 am
MySQL Perspectives Keynote featuring Twitter‘s DB Team manager Jeremy Cole, PayPal‘s Chief Architect Daniel Austin, Verizon Wireless‘ IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well-worth having to wake up early to get to the venue at 8:30 am.

10:15 – 11:15 am
If you develop with PHP, don’t miss Johannes Schlüter presenting Current State of PHP and MySQL, which will explain some of the relevant MySQL changes in PHP 5.4.

I am personally quite interested in the provocatively titled Big Data Is a Big Scam (Most of the Time) by PayPal‘s Chief Architect Daniel Austin. This is one of the more unique talks at this conference, so I will just paste the description here: “This session challenges the conventional wisdom and tries to dispel some of the myths about big data, NoSQL, and everything. When do you need a NoSQL system? How do you choose one from another amid the hype? And how do you know when to stick to your RDBMS and resist becoming a follower of big data fashion? Come and hear what you need to know about your options and how to make wise decisions about solutions to your big data problems.”

11:45 am – 12:45 pm
These days, you can’t talk about performance without talking about NoSQL. Andrew Morgan and John Duncan will present Developing High-Throughput Services with NoSQL APIs to InnoDB and MySQL Cluster, which explains how to maintain all the advantages of existing relational databases while providing blazing-fast performance for simple queries. This session describes the memcached connectors and examines some use cases for how MySQL and memcached fit together in application architectures. It does the same for the newest MySQL Cluster native connector, an easy-to-use, fully asynchronous connector for Node.js.

Everybody is security-conscious, with good reason. Joro Kodinov talks about MySQL Security: Past and Present, giving an overview of MySQL security in the past, present and future versions (with MySQL 5.6).

1:15 – 2:15 pm
Mats Kindahl will talk about Sharding with PHP and MySQL for distributing writes over a cluster, including static and dynamic sharding schemes.

Into NoSQL? Check out Ligaya Turmelle’s presentation on A Journey into NoSQLand: MySQL’s NoSQL Implementation. She will discuss how the memcached API is being used to hook directly into the InnoDB and MySQL Cluster (NDB) storage engines — skipping the MySQL server completely.

Want to know what those new optimizer features could look like for you? Don’t miss Øystein Grøvlen’s talk on Query Performance Comparison of MySQL 5.5 and MySQL 5.6. You’ll see comparisons using the DBT-3 benchmark, and explanations of which queries perform better, and why, in MySQL 5.6.

2:45 – 3:45 pm
Oracle ACE Director Ronald Bradford brings his talk on Improving Performance with Better Indexes to MySQL Connect. So often there is a problem in production that a simple index fixes, and developers should be indexing optimally from the start.

I am a tried-and-true commandline lover, but I know many developers love IDE’s and their GUI interfaces. If you are one of those folks, you might want to check out Alfredo Kojima’s Getting the Most out of MySQL with MySQL Workbench. MySQL Workbench can be used to write and debug SQL queries, manage data and schemas, create databases from scratch, or maintain existing ones through graphical enhanced entity-relationship (EER) models and the advanced SQL Editor functionality, which is all available in the free community edition of MySQL Workbench.

Alexander Rubin will talk about In-Depth Query Optimization for MySQL, which includes tips on how to explain and optimize your slow queries and how to make your reporting queries execute much faster than before. He will also discuss MySQL optimizer internals, show some benchmark results, and how to use the new performance_schema in MySQL 5.6 to monitor queries.

4:15 – 5:15 pm
If you like to know about upcoming features, do not miss Oracle’s Evgeny Potemkin session on Powerful EXPLAIN in MySQL 5.6. MySQL 5.6 offers several new additions that give more-detailed information about the query plan and make it easier to understand at the same time – including structured EXPLAIN in JSON format, EXPLAIN for INSERT/UPDATE/DELETE, and optimizer trace.

If you develop with MySQL on Windows, the Windows Experience Group is for you. Learn about the current and future features of the MySQL Windows Installer and Connector/Net, as well as the improvements for the MySQL server itself on Windows.

5:45 – 6:45 pm
If you use do text searching with MySQL, Sphinx‘s own Andrew Aksonoff has a session on Full-text search with MySQL and Sphinx*.

Bradley Kuszmaul of Tokutek presents Solving the Challenges of Big Databases with MySQL. I love that Bradleys defines what a “big database” is – “more than ten times as large as main memory”.

From 7 – 9 pm on Sunday, there is the Taylor Street Open House.

Next week I will write a blog post about MySQL Connect for DBAs. Watch for it!

* Actually, we just did a podcast series including the basics of FULLTEXT search on MyISAM and how FULLTEXT search works on InnoDB in 5.6.

MySQL Connect is a new conference with a lot of good technical content. In the past, it has been helpful to have guides of MySQL conferences, so in this post I will give my guide to MySQL Connect for Developers. Gerry and I did a lot of recommendations in OurSQL Episode 103, but that was before the schedule itself was up, so now I can present a list of session-by-session talks for developers who are building their schedules.

So heres a guide to MySQL Connect for developers, with times. Note that these are handpicked from what I think developers would be interested in. There are many more sessions than the ones listed here, so head on over to the Schedule Builder to build your own schedule:

Saturday, September 29th:
9-10:30 am
MySQL Connect Keynote: The State of the Dolphin by Tomas Ulin, VP and Edward Screven, Chief Corporate Architect, both of Oracle. I am pretty excited to see where Oracle is taking MySQL next!

11:30 am -12:30 pm
You are in luck if you are, or want to be, a Java developer, because there is a hands-on lab for Developing Applications with MySQL and Java with Mark Matthews. Hands-on labs are where you learn by doing, so this is not to be missed if you want to learn how to develop scalable Java applications.

On the internals side, there are a few good optimizer talks that would benefit developers. One is Olav Sandstås MySQL Optimizer Overview you will learn how MySQL chooses the optimal path, and by learning how MySQL does that, you can write better queries.

Oracles Geir Høydalsvik presents What’s New in MySQL Server 5.6? which explains the new features and performance enhancements in the 5.6 MySQL Server release candidate. I have a feeling they will have a new release version by MySQL Connect (another DMR? beta? RC?) and we will hear about all the new features in that release, too.

1 2 pm
Manyi Lu will present an Overview of New Optimizer Features in MySQL 5.6, talking about multi-range read, index condition pushdown, batched key access, and the new EXPLAIN features.

If you are a beginner, you will want to attend the Hands-on Lab Getting started with MySQL presented by Gillian Gunson and Alfredo Kojima, to learn the MySQL architecture, how to install and configure the MySQL server, and how to query and back up the database. You will also learn about error messages, accounts, datatypes, simple SQL statements and how to import data into and export it from the MySQL server. And remember, you are doing this all in front of a computer, because this is a hands-on lab. This hands on lab runs from 1-3:30 pm, so there is plenty of time to learn and do a lot!

As you probably know, InnoDB is the default storage engine for Oracle’s MySQL as of MySQL Release 5.5. It provides the standard ACID-compliant transactions, row-level locking, multiversion concurrency control, and referential integrity. InnoDB also implements several innovative technologies to improve its performance and reliability. Chunsen Sung presents 10 Things You Should Know About InnoDB, which is a brief history of InnoDB; its main features; and some recent enhancements for better performance, scalability, and availability.

2:30-3:30 pm
Alexander Rubin from Oracle has a session on the New MySQL Full-text Search Features and Solutions, including the new InnoDB FULLTEXT search*.

If you develop with Python, Geert Vanderkelen has a session about Developing Python Applications with MySQL Utilities and MySQL Connector/Python.

4:00 5:00 pm
Heard all the hype about MySQL Cluster? See for yourself what it can do in a Hands-on Lab presented by Santo Leto, Get Started With MySQL Cluster.

Or, head on over to a talk featuring Oracle and Amazon engineers talking about Using MySQL in the Cloud. The Amazon folks know their subject matter when it comes to cloud computing!

5:30 6:30 pm
If you are a developer interested in performance, Mark Matthews presents MySQL Enterprise’s Monitor for Developers the description says you will learn how to resolve potential performance and scalability issues revealed via performance graphs and query analyzer data from the Monitor feature of MySQL Enterprise Edition and apply them to your own application development. In addition, you will learn how to extend the Monitor feature with your own application-specific performance metrics to extend your visibility into performance issues into the deployment and operations realm.

If you are interested in migrating to MySQL from Microsoft SQL Server, do not miss Sergio De La Cruz Rodriguezs talk on Migrating from Microsoft SQL Server to MySQL: The New MySQL Migration Tool.

Or if you are a more security-focused developer, head on over to my own talk on Google-Hacking MySQL

6:30 8:30 pm MySQL Connect Reception in the Continental Ballroom

7:00 8:00 pm Birds of a Feather talks informal discussion sessions
I think these might be relevant to developers:
Python
MySQL Community
Query Optimizations

And it is only the end of the first day! The exhibit hall is open 9:30 am 1:30 pm, and 7 9 pm.

Sunday, September 30th
8:30 am 9:30 am
MySQL Perspectives Keynote featuring Twitters DB Team manager Jeremy Cole, PayPals Chief Architect Daniel Austin, Verizon Wireless IT Director and DB Architect/Engineer, Ash Kanagat and Shivinder Singh, who will share their experiences and perspectives. I think this is going to be fascinating, and well-worth having to wake up early to get to the venue at 8:30 am.

10:15 11:15 am
If you develop with PHP, dont miss Johannes Schlüter presenting Current State of PHP and MySQL, which will explain some of the relevant MySQL changes in PHP 5.4.

I am personally quite interested in the provocatively titled Big Data Is a Big Scam (Most of the Time) by PayPals Chief Architect Daniel Austin. This is one of the more unique talks at this conference, so I will just paste the description here: This session challenges the conventional wisdom and tries to dispel some of the myths about big data, NoSQL, and everything. When do you need a NoSQL system? How do you choose one from another amid the hype? And how do you know when to stick to your RDBMS and resist becoming a follower of big data fashion? Come and hear what you need to know about your options and how to make wise decisions about solutions to your big data problems.

11:45 am 12:45 pm
These days, you cant talk about performance without talking about NoSQL. Andrew Morgan and John Duncan will present Developing High-Throughput Services with NoSQL APIs to InnoDB and MySQL Cluster, which explains how to maintain all the advantages of existing relational databases while providing blazing-fast performance for simple queries. This session describes the memcached connectors and examines some use cases for how MySQL and memcached fit together in application architectures. It does the same for the newest MySQL Cluster native connector, an easy-to-use, fully asynchronous connector for Node.js.

Everybody is security-conscious, with good reason. Joro Kodinov talks about MySQL Security: Past and Present, giving an overview of MySQL security in the past, present and future versions (with MySQL 5.6).

1:15 2:15 pm
Mats Kindahl will talk about Sharding with PHP and MySQL for distributing writes over a cluster, including static and dynamic sharding schemes.

Into NoSQL? Check out Ligaya Turmelles presentation on A Journey into NoSQLand: MySQL’s NoSQL Implementation. She will discuss how the memcached API is being used to hook directly into the InnoDB and MySQL Cluster (NDB) storage engines — skipping the MySQL server completely.

Want to know what those new optimizer features could look like for you? Dont miss Øystein Grøvlens talk on Query Performance Comparison of MySQL 5.5 and MySQL 5.6. Youll see comparisons using the DBT-3 benchmark, and explanations of which queries perform better, and why, in MySQL 5.6.

2:45 3:45 pm
Oracle ACE Director Ronald Bradford brings his talk on Improving Performance with Better Indexes to MySQL Connect. So often there is a problem in production that a simple index fixes, and developers should be indexing optimally from the start.

I am a tried-and-true commandline lover, but I know many developers love IDEs and their GUI interfaces. If you are one of those folks, you might want to check out Alfredo Kojimas Getting the Most out of MySQL with MySQL Workbench. MySQL Workbench can be used to write and debug SQL queries, manage data and schemas, create databases from scratch, or maintain existing ones through graphical enhanced entity-relationship (EER) models and the advanced SQL Editor functionality, which is all available in the free community edition of MySQL Workbench.

Alexander Rubin will talk about In-Depth Query Optimization for MySQL, which includes tips on how to explain and optimize your slow queries and how to make your reporting queries execute much faster than before. He will also discuss MySQL optimizer internals, show some benchmark results, and how to use the new performance_schema in MySQL 5.6 to monitor queries.

4:15 5:15 pm
If you like to know about upcoming features, do not miss Oracles Evgeny Potemkin session on Powerful EXPLAIN in MySQL 5.6. MySQL 5.6 offers several new additions that give more-detailed information about the query plan and make it easier to understand at the same time including structured EXPLAIN in JSON format, EXPLAIN for INSERT/UPDATE/DELETE, and optimizer trace.

If you develop with MySQL on Windows, the Windows Experience Group is for you. Learn about the current and future features of the MySQL Windows Installer and Connector/Net, as well as the improvements for the MySQL server itself on Windows.

5:45 6:45 pm
If you use do text searching with MySQL, Sphinxs own Andrew Aksonoff has a session on Full-text search with MySQL and Sphinx*.

Bradley Kuszmaul of Tokutek presents Solving the Challenges of Big Databases with MySQL. I love that Bradleys defines what a big database is more than ten times as large as main memory.

From 7 9 pm on Sunday, there is the Taylor Street Open House.

Next week I will write a blog post about MySQL Connect for DBAs. Watch for it!

* Actually, we just did a podcast series including the basics of FULLTEXT search on MyISAM and how FULLTEXT search works on InnoDB in 5.6.

Should You Use GROUP BY/ORDER BY NULL By Default?

Edited to add: Thanks to Roland Bouman for pointing out Bug 30477 created Aug 2007 that addresses this issue. I am glad I am not the only one who thinks implicit overhead is bad!

At Northeast PHP a few weeks ago, an audience member came up to me after my talk about indexing and asked about ORDER BY NULL for optimal queries. I have to say, I was surprised, as I had not heard about using ORDER BY NULL. In a nutshell, apparently when MySQL does a GROUP BY, there is an implicit ORDER BY the same fields, which adds extra overhead for the mere purpose of returning the values in order of the GROUP BY fields.

I knew about the implicit ORDER BY, but I thought that was required for the GROUP BY, and made the GROUP BY faster. After all, it’s easier to group like items together if they are already sorted, right?

However, every single source I have researched seems to imply that, no, it is just overhead and completely unnecessary unless you really do want the results returned in the same order as the GROUP BY. For example, if you query with GROUP BY last_name and do not care about having the rows returned in lexical order of last names, you would use GROUP BY last_name ORDER BY NULL.

This leads me to believe that by default, whenever doing a GROUP BY, it is a good idea to use ORDER BY NULL. I have not seen that as a piece of advice that is generally given out in talks, either. So maybe there is something I am not understanding properly? I would love to know what everyone thinks.

Here is the result of my research:

The manual at http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html says:

By default, MySQL sorts all GROUP BY col1, col2, … queries as if you specified ORDER BY col1, col2, … in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL.

High Performance MySQL, 3rd Edition says the following on page 246, in the section on “Optimizing GROUP BY and DISTINCT”:

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns.

A random blog post extols the joy of “no more filesorts” in his EXPLAINs at http://www.subelsky.com/2008/05/order-by-null-kills-mysql-filesorts.html.

My question is – why wait until you see a “Using filesort” in your EXPLAIN plan? If the overhead is only used to order the results, and that is not desired, why not just use ORDER BY NULL by default, whenever using a GROUP BY query?

Or is it possible that yes, whenever developers write GROUP BY, they probably want the results returned in the order of the ORDER BY?

Edited to add: Thanks to Roland Bouman for pointing out Bug 30477 created Aug 2007 that addresses this issue. I am glad I am not the only one who thinks implicit overhead is bad!

At Northeast PHP a few weeks ago, an audience member came up to me after my talk about indexing and asked about ORDER BY NULL for optimal queries. I have to say, I was surprised, as I had not heard about using ORDER BY NULL. In a nutshell, apparently when MySQL does a GROUP BY, there is an implicit ORDER BY the same fields, which adds extra overhead for the mere purpose of returning the values in order of the GROUP BY fields.

I knew about the implicit ORDER BY, but I thought that was required for the GROUP BY, and made the GROUP BY faster. After all, it’s easier to group like items together if they are already sorted, right?

However, every single source I have researched seems to imply that, no, it is just overhead and completely unnecessary unless you really do want the results returned in the same order as the GROUP BY. For example, if you query with GROUP BY last_name and do not care about having the rows returned in lexical order of last names, you would use GROUP BY last_name ORDER BY NULL.

This leads me to believe that by default, whenever doing a GROUP BY, it is a good idea to use ORDER BY NULL. I have not seen that as a piece of advice that is generally given out in talks, either. So maybe there is something I am not understanding properly? I would love to know what everyone thinks.

Here is the result of my research:

The manual at http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html says:

By default, MySQL sorts all GROUP BY col1, col2, … queries as if you specified ORDER BY col1, col2, … in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL.

High Performance MySQL, 3rd Edition says the following on page 246, in the section on “Optimizing GROUP BY and DISTINCT”:

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns.

A random blog post extols the joy of “no more filesorts” in his EXPLAINs at http://www.subelsky.com/2008/05/order-by-null-kills-mysql-filesorts.html.

My question is – why wait until you see a “Using filesort” in your EXPLAIN plan? If the overhead is only used to order the results, and that is not desired, why not just use ORDER BY NULL by default, whenever using a GROUP BY query?

Or is it possible that yes, whenever developers write GROUP BY, they probably want the results returned in the order of the ORDER BY?

Slides From MySQL, Geek Management and Good Ideas for DBA Talks in Trinidad and Guatemala

In the past two days I was fortunate enough to speak to two different groups of people at the Ministry of Science, Technology and Tertiary Education in the Parliament building in Port of Spain, Trinidad yes, I spoke at Parliament! The PDF slides for my talk are available: The Art of Cat Herding: How to Manage Geeks and Ideas for DBAs not best practices, but ideas you may or may not want to implement.

I was in Trinidad as part of the Latin America Oracle Technology Network tour of Latin America (North leg). I also spent time in Cali, Colombia and Quito, Ecuador (including visiting the Equator!). Today I am in Guatemala, and I will give talks on more MySQL-specific subjects: MySQL Security and Get Rid of Cron Scripts Using MySQL Events. Tomorrow I travel to Honduras, and on Sunday is Costa Rica, then I go home, which I havent seen since the 24th of June. I will have spent 43 hours on a plane in one month, and I am excited to finally go home next week but I still have a few more countries on the tour!

What is an “unauthenticated user”?

Every so often we have a client worrying about unauthenticated users. For example, as part of the output of SHOW PROCESSLIST they will see:

+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| Id  | User                 | Host               | db   | Command | Time | State | Info             |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| 235 | unauthenticated user | 10.10.2.74:53216   | NULL | Connect | NULL | login | NULL             |
| 236 | unauthenticated user | 10.120.61.10:51721 | NULL | Connect | NULL | login | NULL             |
| 237 | user                 | localhost          | NULL | Query   | 0    | NULL  | show processlist |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+

Who are these unauthenticated users, how do they get there, and why aren’t they authenticated?

The client-server handshake in MySQL is a 4-step process. Those familiar with mysql-proxy already know these steps, as there are four functions that a Lua script in mysql-proxy can override. The process is useful to know for figuring out exactly where a problem is when something breaks.
(more…)

Every so often we have a client worrying about unauthenticated users. For example, as part of the output of SHOW PROCESSLIST they will see:

+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| Id  | User                 | Host               | db   | Command | Time | State | Info             |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| 235 | unauthenticated user | 10.10.2.74:53216   | NULL | Connect | NULL | login | NULL             |
| 236 | unauthenticated user | 10.120.61.10:51721 | NULL | Connect | NULL | login | NULL             |
| 237 | user                 | localhost          | NULL | Query   | 0    | NULL  | show processlist |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+

Who are these unauthenticated users, how do they get there, and why aren’t they authenticated?

The client-server handshake in MySQL is a 4-step process. Those familiar with mysql-proxy already know these steps, as there are four functions that a Lua script in mysql-proxy can override. The process is useful to know for figuring out exactly where a problem is when something breaks.

Step 1: Client sends connect request to server. There is no information here (as far as I can tell). However, it does mean that if you try to connect to a host and port of a mysqld server that is not available, you will get

ERROR 2003 (HY000): Can't connect to MySQL server on '[host]' (111)

Step 2: The server assigns a connection and sends back a handshake, which includes the server’s mysqld version, the thread id, the server host and port, the client host and port, and a “scramble buffer” (for salting authentication, I believe).

It is during Step 2 where the connections show up in SHOW PROCESSLIST. They have not been authenticated yet, but they are connected. If there are issues with authentication, connections will be stuck at this stage. Most often stuck connections are due to DNS not resolving properly, which the skip-name-resolve option will help with.

Step 3: Client sends authentication information, including the username, the password (salted and hashed) and default database to use. If the client sends an incorrect packet, or does not send authentication information within connect_timeout seconds, the server considers the connection aborted and increments its Aborted_connects status variable.

Step 4: Server sends back whether the authentication was successful or not. If the authentication was not successful, mysqld increments its Aborted_connects status variable and sends back an error message:

ERROR 1045 (28000): Access denied for user 'user'@'host' (using password: [YES/NO])

Hope this helps!

Video: Who is the Dick on My Site Keynote

I have already blogged about this keynote at http://www.pythian.com/blogs/948/liveblogging-who-is-the-dick-on-my-site.

If you are interested in actually seeing the video, the 286 Mb .wmv file can be downloaded at http://technocation.org/videos/original/mysqlconf2008/2008_04_17_panelDick.wmv and played through your browser by clicking the “play” link at http://tinyurl.com/55c5ps. This is not to be missed!

From the official conference description:

Much of the data in a database is about people. Identity 2.0 technologies will lower the friction for people to provide and easily move data about themselves online.

This fast paced keynote will offer a background on Identity 2.0, discuss current roadblocks and future opportunities, and explore the potential impacts these will have on databases.

———–
I have already blogged about this keynote at https://sheeri.org/liveblogging-who-is-the-dick-on-my-site/
Colin Charles also blogged about it
Do not miss this keynote! See it on youtube.

Liveblogging: Who is the Dick on My Site?

Identity 2.0: A world that’s simple, safe and secure.

Who is the Dick on My Site? by Dick Hardt (Sxip Identity Corporation)

Quotes:
“Really, data is about people. It’s really identity data.”

“Identity helps you predict behavior.”

“Identity is who you are.”

“Identity is also what you like.”

“Identity enables you to uniquely identify somebody.”

“There are things that other people say about you, too.”

“Modern identity is about photo IDs so you can prove your identity.”

“Identity is a complicated issue….Everyone has a different idea of what it is.”

Identity transactions are:

  • party identification (who)
  • authorization (permission)
  • profile exchange (info about that person)
  • NOT record matching

Identity transactions can be: (more…)

Identity 2.0: A world that’s simple, safe and secure.

Who is the Dick on My Site? by Dick Hardt (Sxip Identity Corporation)

Quotes:
“Really, data is about people. It’s really identity data.”

“Identity helps you predict behavior.”

“Identity is who you are.”

“Identity is also what you like.”

“Identity enables you to uniquely identify somebody.”

“There are things that other people say about you, too.”

“Modern identity is about photo IDs so you can prove your identity.”

“Identity is a complicated issue….Everyone has a different idea of what it is.”

Identity transactions are:
party identification (who)
authorization (permission)
profile exchange (info about that person)
NOT record matching
Identity transactions can be:
verbal
but it’s unverified
need trust
How do you verify?
ID, subject matches credential, assuming the feature that only the one person can use that ID.

Photo ID is asymmetrical in trust, because the issuing organization (province of British Columbia) doesn’t know when the ID is being used, so there’s some privacy.

What is digital identity?
sometimes, site registration.
definitely a hassle, could be simpler
unverified, fewer trust cues than verbal
Interesting point — searching de.li.cio.us shows you what other people think you are.

How do you prove to a website who you are? It’s not what you give to the site, but what the site knows about you! If you have a good eBay rating, can you take that over to Craigslist?

What we want in Identity 2.0 is a way to make identity user-centric, not site-centric, so a person can move their identity around.

How do we solve this? You have a trusted agent that can give information to relying parties — a relying party is any site that the user wants to share information. The agent does not need to trust the relying party, the sites don’t need to trust the agent. The relying party does need to trust the agent (“issuer”), but that’s it. This is how OpenID works.

Identity data isn’t just data, it’s data about a person.

Why does identity matter?

“The future has arrived, it is just not evenly distributed yet.” William Gibson

More and more apps are becoming distributed (ie, Google). Biometrics are becoming prevalent. There’s a lot of device convergence — a phone can pay for things, etc.

There are “digital natives” and “digital immigrants” — natives grew up with the computer, with the internet. An immigrant has an accent — “digital camera” for an immigrant, “camera” for a native.

Identity 2.0 predictions:
minimal passwords — the agent makes it simpler
rich portable profiles — don’t need to keep re-writing the profile information over and over
portable credentials — digital driver’s licence, prove attributes digitally
agency/delegation — an assistant can book a flight for you, or one site can get
reputation services — like blogosphere, page rank, great contributor to wikis or open source. Similar to credit rating.
identity services — disposable e-mail, one-time tokens, such as one-time payments, one-time phone numbers, all this stuff can help reduce spam and protect privacy.
State of user-centric identity:
functionality — there is nothing out there that’s functional out there for what we need
industry — many organizations are working together, that wouldn’t normally – Grade: A
standards — needs more work – Grade: C
interop — standards not quite there, but folks are making it work – Grade: B
deployment — there’s a start, but more needed – Grade: C
utilization — nominal – Grade: D probably should be F

vitamins — should take, but don’t
painkillers — don’t want to take, but do
viagra — want to take, probably shouldn’t
Identity 2.0 is still at the vitamin stage. There’s no pain.

CHAR() vs. VARCHAR()

So, a little gotcha:

The CHAR() and VARCHAR() types are different types. MySQL silently converts any CHAR() fields to VARCHAR() when creating a table with at least 1 VARCHAR() field.

http://dev.mysql.com/doc/refman/5.0/en/silent-column-changes.html

If any column in a table has a variable length, the entire row becomes variable-length as a result. Therefore, if a table contains any variable-length columns (VARCHAR, TEXT, or BLOB), all CHAR columns longer than three characters are changed to VARCHAR columns. This does not affect how you use the columns in any way; in MySQL, VARCHAR is just a different way to store characters. MySQL performs this conversion because it saves space and makes table operations faster.

However, that’s not entirely accurate. Because according to the manual page at http://dev.mysql.com/doc/refman/5.0/en/char.html:

As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values.

If you have a field such as name, and require it to not be blank, you probably have some function testing it before it goes into the database. However, most languages are perfectly happy that ” ” isn’t blank. When it gets put into the database, however, it becomes blank if your column is a VARCHAR. Which means folks may be able to get beyond your requirement of a blank field, and actually store a blank field in the database (as opposed to storing a space or series of spaces).

The CHAR() and VARCHAR() types are different types. MySQL silently converts any CHAR() fields to VARCHAR() when creating a table with at least 1 VARCHAR() field.

http://dev.mysql.com/doc/refman/5.0/en/silent-column-changes.html

If any column in a table has a variable length, the entire row becomes variable-length as a result. Therefore, if a table contains any variable-length columns (VARCHAR, TEXT, or BLOB), all CHAR columns longer than three characters are changed to VARCHAR columns. This does not affect how you use the columns in any way; in MySQL, VARCHAR is just a different way to store characters. MySQL performs this conversion because it saves space and makes table operations faster.

However, that’s not entirely accurate. Because according to the manual page at http://dev.mysql.com/doc/refman/5.0/en/char.html:

As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values.

If you have a field such as name, and require it to not be blank, you probably have some function testing it before it goes into the database. However, most languages are perfectly happy that ” ” isn’t blank. When it gets put into the database, however, it becomes blank if your column is a VARCHAR. Which means folks may be able to get beyond your requirement of a blank field, and actually store a blank field in the database (as opposed to storing a space or series of spaces).