Video: How to Stop Hating MySQL

(Note: updated with the presentation video on 11/15/2008)

At LISA 2008, I gave a presentation entitled “How to Stop Hating MySQL: Fixing Common Mistakes and Myths”.

The presentation slides can be downloaded as a PDF at:

http://technocation.org/files/doc/stophatingmysql.pdf

View the video online at http://www.youtube.com/watch?v=DylLYQlzuKo

Here are some notes and links I referred to:

Technocation, Inc containing free videos, a MySQL podcast (currently on hiatus) and blog posts about MySQL.

Why you want to switch to MySQL 5.1

How MySQL Uses Memory

Query Cache Configuration

EXPLAIN manual page

EXPLAIN cheatsheet

(Note: updated with the presentation video on 11/15/2008)

At LISA 2008, I gave a presentation entitled “How to Stop Hating MySQL: Fixing Common Mistakes and Myths”.

The presentation slides can be downloaded as a PDF at:

http://technocation.org/files/doc/stophatingmysql.pdf

View the video online at http://www.youtube.com/watch?v=DylLYQlzuKo

Here are some notes and links I referred to:

Technocation, Inc containing free videos, a MySQL podcast (currently on hiatus) and blog posts about MySQL.

Why you want to switch to MySQL 5.1

How MySQL Uses Memory

Query Cache Configuration

EXPLAIN manual page

EXPLAIN cheatsheet

Max VARCHAR Size

The maximum length of a VARCHAR is only restricted by the maximum row length. In most storage engines, the maximum row length is the maximum allowed by MySQL, which is 65,535 bytes. Only the NDB storage engine has a different maximum value.

The VARCHAR data type has some overhead. If the length of VARCHAR is less than 255, one byte per row is used to store the actual length of the string. If the length of VARCHAR is greater than 255, the overhead cost of storing the string length is two bytes per row. Thus, the maximum length of a VARCHAR should be 65,533 bytes.

However, that is not the case:

mysql> CREATE TABLE max_len_varchar(fld VARCHAR(65533) CHARSET latin1);
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs

mysql> CREATE TABLE max_len_varchar(fld VARCHAR(65532) CHARSET latin1);
Query OK, 0 rows affected (0.16 sec)

Why is that? I believe that is because there is also a 1-byte overhead per row, though I have not confirmed that. There is some allusion to that on the Forge Internals page……but I have not confirmed that it is the case.

(one of the reasons I believe this is that there’s a different error message when the character set is multi-byte:

mysql> CREATE TABLE max_len_varchar(fld VARCHAR(16384) CHARSET utf8);
ERROR 1074 (42000): Column length too big for column 'fld' (max = 16383); use BLOB or TEXT instead

This actually gives the maximum length you can use. Note that the maximum usable length still seems to be missing 1 byte from the theoretical max of 65,535:

mysql> SELECT 16383*4+2;
+-----------+
| 16383*4+2 |
+-----------+
|     65534 |
+-----------+
1 row in set (0.00 sec)

)

(and yes, you can use TEXT and BLOB fields if you actually want a variable-length string with more than 65,532 bytes, that’s not really the point of this article)

The maximum length of a VARCHAR is only restricted by the maximum row length. In most storage engines, the maximum row length is the maximum allowed by MySQL, which is 65,535 bytes. Only the NDB storage engine has a different maximum value.

The VARCHAR data type has some overhead. If the length of VARCHAR is less than 255, one byte per row is used to store the actual length of the string. If the length of VARCHAR is greater than 255, the overhead cost of storing the string length is two bytes per row. Thus, the maximum length of a VARCHAR should be 65,533 bytes.

However, that is not the case:

mysql> CREATE TABLE max_len_varchar(fld VARCHAR(65533) CHARSET latin1);
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs

mysql> CREATE TABLE max_len_varchar(fld VARCHAR(65532) CHARSET latin1);
Query OK, 0 rows affected (0.16 sec)

Why is that? I believe that is because there is also a 1-byte overhead per row, though I have not confirmed that. There is some allusion to that on the Forge Internals page……but I have not confirmed that it is the case.

(one of the reasons I believe this is that there’s a different error message when the character set is multi-byte:

mysql> CREATE TABLE max_len_varchar(fld VARCHAR(16384) CHARSET utf8);
ERROR 1074 (42000): Column length too big for column 'fld' (max = 16383); use BLOB or TEXT instead

This actually gives the maximum length you can use. Note that the maximum usable length still seems to be missing 1 byte from the theoretical max of 65,535:

mysql> SELECT 16383*4+2;
+-----------+
| 16383*4+2 |
+-----------+
|     65534 |
+-----------+
1 row in set (0.00 sec)

)

(and yes, you can use TEXT and BLOB fields if you actually want a variable-length string with more than 65,532 bytes, that’s not really the point of this article)

Happy Birthday to Wonderful Community Organizers!

Happy birthday to Giuseppe Maxia, a wonderful community organizer for MySQL.

And happy birthday to Leslie Hawthorn, Google’s geek herder.

Thank you for making life better for open source volunteers like me!

Happy birthday to Giuseppe Maxia, a wonderful community organizer for MySQL.

And happy birthday to Leslie Hawthorn, Google’s geek herder.

Thank you for making life better for open source volunteers like me!

Virtualizing MySQL

I had so much to say in response to a recent post asking about virtualization from Jennifer Glore that I realized it was long enough to be a blog post.

It really depends on what you’re looking to do. Many companies don’t have the money and staff to have an in-house data center with proper power and network redundancy; others don’t want the depreciation associated with owning computer hardware (even if they leased space in a data center, they’d have to buy equipment to put in it).

Some reasons to virtualize:
1) you need a fresh machine and cannot wait to order a new one or re-purpose an older one.
2) your need for machines/services fluctuates (and again, re-purposing takes time). This need can be as broad as employee desktops or as specialized as needing extra machines for a qa cycle.
3) you own resources that are not being utilized to the fullest extent — virtualization can sometimes make more or better use of these resources.
4) you need to easily re-create an exact environment and do not want to worry about hardware differences.

Here at The Pythian Group, we have clients using many different types of virtualization.

A few clients are using MySQL on Amazon’s EC2 platform. The biggest advantage is also one of the biggest disadvantages — before EBS (elastic block storage) was offered, the threat of a reboot wiping the filesystem clean meant that we really had to ensure that we had:

1) redundancy
2) a catalog of what was needed on the machine — everything from users to perl modules. This can be done either by using a machine image, documenting a setup and recreating the machine manually, or via automation. By using tools like CFEngine or Puppet to control machine configurations, our clients have the added benefit of more standardized installations and layouts. As well, pushing a change (say, adding a new hire’s public key into an authorized_keys file) is made much much more easy.

The downside is, of course, the work to set up tools such as this. However, they are fairly common best practices, and are almost always good to implement.

The clients that are using Amazon EC2 are happy with the service they get; One client I work with especially closely moved to EC2 because they had a bad experience with their hosting provider. I cannot speak to pricing, but I do know that being able to just *have* another machine up and running in minutes is very useful … periodic work such as load testing and qa cycles work really well.

We also have many clients who “virtualize” MySQL by running more than one instance on a server. A few clients have a replication slave that has 2 instances — one as a read-only reporting instance, and another as an instance to backup, doing a cold backup — stopping the instance, copying the files, starting the instance back up. Other clients have 4-5 MySQL instances running on one machine, for developers to have individual database instances to write code against and as backups for 4-5 different physical machines.

I had so much to say in response to a recent post asking about virtualization from Jennifer Glore that I realized it was long enough to be a blog post.

It really depends on what you’re looking to do. Many companies don’t have the money and staff to have an in-house data center with proper power and network redundancy; others don’t want the depreciation associated with owning computer hardware (even if they leased space in a data center, they’d have to buy equipment to put in it).

Some reasons to virtualize:
1) you need a fresh machine and cannot wait to order a new one or re-purpose an older one.
2) your need for machines/services fluctuates (and again, re-purposing takes time). This need can be as broad as employee desktops or as specialized as needing extra machines for a qa cycle.
3) you own resources that are not being utilized to the fullest extent — virtualization can sometimes make more or better use of these resources.
4) you need to easily re-create an exact environment and do not want to worry about hardware differences.

Here at The Pythian Group, we have clients using many different types of virtualization.

A few clients are using MySQL on Amazon’s EC2 platform. The biggest advantage is also one of the biggest disadvantages — before EBS (elastic block storage) was offered, the threat of a reboot wiping the filesystem clean meant that we really had to ensure that we had:

1) redundancy
2) a catalog of what was needed on the machine — everything from users to perl modules. This can be done either by using a machine image, documenting a setup and recreating the machine manually, or via automation. By using tools like CFEngine or Puppet to control machine configurations, our clients have the added benefit of more standardized installations and layouts. As well, pushing a change (say, adding a new hire’s public key into an authorized_keys file) is made much much more easy.

The downside is, of course, the work to set up tools such as this. However, they are fairly common best practices, and are almost always good to implement.

The clients that are using Amazon EC2 are happy with the service they get; One client I work with especially closely moved to EC2 because they had a bad experience with their hosting provider. I cannot speak to pricing, but I do know that being able to just *have* another machine up and running in minutes is very useful … periodic work such as load testing and qa cycles work really well.

We also have many clients who “virtualize” MySQL by running more than one instance on a server. A few clients have a replication slave that has 2 instances — one as a read-only reporting instance, and another as an instance to backup, doing a cold backup — stopping the instance, copying the files, starting the instance back up. Other clients have 4-5 MySQL instances running on one machine, for developers to have individual database instances to write code against and as backups for 4-5 different physical machines.

IMPORTANT: Registration for OpenSQL Camp

OpenSQL Camp is a free unconference happening Friday, Nov. 14th through Sunday, Nov. 16th in Charlottesville, Virginia, USA. More details can be found at http://www.opensqlcamp.org.

Whether or not you have already registered for OpenSQL Camp by putting your name here:

http://www.opensqlcamp.org/index.php?title=Events/2008/AttendeeList

you should be aware of very important information.
(more…)

OpenSQL Camp is a free unconference happening Friday, Nov. 14th through Sunday, Nov. 16th in Charlottesville, Virginia, USA. More details can be found at http://www.opensqlcamp.org.

Whether or not you have already registered for OpenSQL Camp by putting your name here:

http://www.opensqlcamp.org/index.php?title=Events/2008/AttendeeList

you should be aware of very important information.
(more…)

Liveblogging: A Five-Step Framework for Achieving the Strategic Value of Cloud Computing

I took part in a webinar on cloud computing today, including some of the top names in cloud computing services. As Pythian has some MySQL clients using cloud computing, I was particularly interested…

I was interested by the many levels of what cloud computing means, including such categorizations as Facebook apps being a part of the cloud. I think many of us consider cloud computing to mean “virtual infrastructure as a service” and overlook some pretty robust cloud computing that’s already out there, such as “application components as a service” and “software [platform] as a service”.

Following are my notes:

“Our objective today is to cut through some of the noise associated with ‘cloud’ and get to a real world approach for getting some serious value from the cloud.”
(more…)

I took part in a webinar on cloud computing today, including some of the top names in cloud computing services. As Pythian has some MySQL clients using cloud computing, I was particularly interested…

I was interested by the many levels of what cloud computing means, including such categorizations as Facebook apps being a part of the cloud. I think many of us consider cloud computing to mean “virtual infrastructure as a service” and overlook some pretty robust cloud computing that’s already out there, such as “application components as a service” and “software [platform] as a service”.

Following are my notes:

“Our objective today is to cut through some of the noise associated with ‘cloud’ and get to a real world approach for getting some serious value from the cloud.”
(more…)

What is an “unauthenticated user”?

Every so often we have a client worrying about unauthenticated users. For example, as part of the output of SHOW PROCESSLIST they will see:

+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| Id  | User                 | Host               | db   | Command | Time | State | Info             |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| 235 | unauthenticated user | 10.10.2.74:53216   | NULL | Connect | NULL | login | NULL             |
| 236 | unauthenticated user | 10.120.61.10:51721 | NULL | Connect | NULL | login | NULL             |
| 237 | user                 | localhost          | NULL | Query   | 0    | NULL  | show processlist |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+

Who are these unauthenticated users, how do they get there, and why aren’t they authenticated?

The client-server handshake in MySQL is a 4-step process. Those familiar with mysql-proxy already know these steps, as there are four functions that a Lua script in mysql-proxy can override. The process is useful to know for figuring out exactly where a problem is when something breaks.
(more…)

Every so often we have a client worrying about unauthenticated users. For example, as part of the output of SHOW PROCESSLIST they will see:

+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| Id  | User                 | Host               | db   | Command | Time | State | Info             |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+
| 235 | unauthenticated user | 10.10.2.74:53216   | NULL | Connect | NULL | login | NULL             |
| 236 | unauthenticated user | 10.120.61.10:51721 | NULL | Connect | NULL | login | NULL             |
| 237 | user                 | localhost          | NULL | Query   | 0    | NULL  | show processlist |
+-----+----------------------+--------------------+------+---------+------+-------+------------------+

Who are these unauthenticated users, how do they get there, and why aren’t they authenticated?

The client-server handshake in MySQL is a 4-step process. Those familiar with mysql-proxy already know these steps, as there are four functions that a Lua script in mysql-proxy can override. The process is useful to know for figuring out exactly where a problem is when something breaks.

Step 1: Client sends connect request to server. There is no information here (as far as I can tell). However, it does mean that if you try to connect to a host and port of a mysqld server that is not available, you will get

ERROR 2003 (HY000): Can't connect to MySQL server on '[host]' (111)

Step 2: The server assigns a connection and sends back a handshake, which includes the server’s mysqld version, the thread id, the server host and port, the client host and port, and a “scramble buffer” (for salting authentication, I believe).

It is during Step 2 where the connections show up in SHOW PROCESSLIST. They have not been authenticated yet, but they are connected. If there are issues with authentication, connections will be stuck at this stage. Most often stuck connections are due to DNS not resolving properly, which the skip-name-resolve option will help with.

Step 3: Client sends authentication information, including the username, the password (salted and hashed) and default database to use. If the client sends an incorrect packet, or does not send authentication information within connect_timeout seconds, the server considers the connection aborted and increments its Aborted_connects status variable.

Step 4: Server sends back whether the authentication was successful or not. If the authentication was not successful, mysqld increments its Aborted_connects status variable and sends back an error message:

ERROR 1045 (28000): Access denied for user 'user'@'host' (using password: [YES/NO])

Hope this helps!

MySQL Magazine Fall 2008 Issue Available!

Get it while it’s hot! The Fall 2008 issue of the MySQL Magazine is now available at http://www.mysqlzine.net. Issue 6 is chock full of 16 pages good stuff, including:

  • “Decision Table”-Driven Development by Jonathan Levin
  • Part I of a series on Transaction Time Validity in MySQL by Peter Brawley
  • An Overview of Zmanda Recovery Manager by Pythian’s own Gerry Narvaja
  • Keith Murphy, editor has a note about Drizzle.

Download the PDF directly or go to the MySQL Magazine page to download any and all of the 6 issues.

Get it while it’s hot! The Fall 2008 issue of the MySQL Magazine is now available at http://www.mysqlzine.net. Issue 6 is chock full of 16 pages good stuff, including:

  • “Decision Table”-Driven Development by Jonathan Levin
  • Part I of a series on Transaction Time Validity in MySQL by Peter Brawley
  • An Overview of Zmanda Recovery Manager by Pythian’s own Gerry Narvaja
  • Keith Murphy, editor has a note about Drizzle.

Download the PDF directly or go to the MySQL Magazine page to download any and all of the 6 issues.

Open Source and the Economy

I was reading Savio Rodrigues’ post, The economy and open source, in which he responds to Andrew Keen’s thoughts that a bad economy will see fewer open source contributions.

Now, Keen feels that people will contribute less during bad financial times:

The hungry and cold unemployed masses aren’t going to continue giving away their intellectual labor on the Internet in the speculative hope that they might get some “back end” revenue. “Free” doesn’t fill anyone’s belly; it doesn’t warm anyone up.

I know several volunteer open source developers — I consider this to be a “role” that someone plays. A person may be *employed* as an open source developer (say, working at Sun on MySQL) but may also contribute to another open source project off-the-clock. I would consider that person to be a volunteer open source developer for the off-the-clock project.

At any rate…very few people volunteer as some kind of investment with a monetary return. (more…)

I was reading Savio Rodrigues’ post, The economy and open source, in which he responds to Andrew Keen’s thoughts that a bad economy will see fewer open source contributions.

Now, Keen feels that people will contribute less during bad financial times:

The hungry and cold unemployed masses aren’t going to continue giving away their intellectual labor on the Internet in the speculative hope that they might get some “back end” revenue. “Free” doesn’t fill anyone’s belly; it doesn’t warm anyone up.

I know several volunteer open source developers — I consider this to be a “role” that someone plays. A person may be *employed* as an open source developer (say, working at Sun on MySQL) but may also contribute to another open source project off-the-clock. I would consider that person to be a volunteer open source developer for the off-the-clock project.

At any rate…very few people volunteer as some kind of investment with a monetary return. (more…)

mysqlbinlog –server-id before MySQL 5.1? awk to the rescue!

Recently I had an interesting issue crop up. Due to an unfortunate migration incident in which involved master/master replication and not checking to see if replication was caught up, we ended up with an infinite replication loop of a number of SQL statements. awk helped immensely in the aftermath cleanup.

The basics of the replication infinite loop were (more…)

Recently I had an interesting issue crop up. Due to an unfortunate migration incident in which involved master/master replication and not checking to see if replication was caught up, we ended up with an infinite replication loop of a number of SQL statements. awk helped immensely in the aftermath cleanup.

The basics of the replication infinite loop were (more…)