Should You Use GROUP BY/ORDER BY NULL By Default?

Edited to add: Thanks to Roland Bouman for pointing out Bug 30477 created Aug 2007 that addresses this issue. I am glad I am not the only one who thinks implicit overhead is bad!

At Northeast PHP a few weeks ago, an audience member came up to me after my talk about indexing and asked about ORDER BY NULL for optimal queries. I have to say, I was surprised, as I had not heard about using ORDER BY NULL. In a nutshell, apparently when MySQL does a GROUP BY, there is an implicit ORDER BY the same fields, which adds extra overhead for the mere purpose of returning the values in order of the GROUP BY fields.

I knew about the implicit ORDER BY, but I thought that was required for the GROUP BY, and made the GROUP BY faster. After all, it’s easier to group like items together if they are already sorted, right?

However, every single source I have researched seems to imply that, no, it is just overhead and completely unnecessary unless you really do want the results returned in the same order as the GROUP BY. For example, if you query with GROUP BY last_name and do not care about having the rows returned in lexical order of last names, you would use GROUP BY last_name ORDER BY NULL.

This leads me to believe that by default, whenever doing a GROUP BY, it is a good idea to use ORDER BY NULL. I have not seen that as a piece of advice that is generally given out in talks, either. So maybe there is something I am not understanding properly? I would love to know what everyone thinks.

Here is the result of my research:

The manual at http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html says:

By default, MySQL sorts all GROUP BY col1, col2, … queries as if you specified ORDER BY col1, col2, … in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL.

High Performance MySQL, 3rd Edition says the following on page 246, in the section on “Optimizing GROUP BY and DISTINCT”:

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns.

A random blog post extols the joy of “no more filesorts” in his EXPLAINs at http://www.subelsky.com/2008/05/order-by-null-kills-mysql-filesorts.html.

My question is – why wait until you see a “Using filesort” in your EXPLAIN plan? If the overhead is only used to order the results, and that is not desired, why not just use ORDER BY NULL by default, whenever using a GROUP BY query?

Or is it possible that yes, whenever developers write GROUP BY, they probably want the results returned in the order of the ORDER BY?

Edited to add: Thanks to Roland Bouman for pointing out Bug 30477 created Aug 2007 that addresses this issue. I am glad I am not the only one who thinks implicit overhead is bad!

At Northeast PHP a few weeks ago, an audience member came up to me after my talk about indexing and asked about ORDER BY NULL for optimal queries. I have to say, I was surprised, as I had not heard about using ORDER BY NULL. In a nutshell, apparently when MySQL does a GROUP BY, there is an implicit ORDER BY the same fields, which adds extra overhead for the mere purpose of returning the values in order of the GROUP BY fields.

I knew about the implicit ORDER BY, but I thought that was required for the GROUP BY, and made the GROUP BY faster. After all, it’s easier to group like items together if they are already sorted, right?

However, every single source I have researched seems to imply that, no, it is just overhead and completely unnecessary unless you really do want the results returned in the same order as the GROUP BY. For example, if you query with GROUP BY last_name and do not care about having the rows returned in lexical order of last names, you would use GROUP BY last_name ORDER BY NULL.

This leads me to believe that by default, whenever doing a GROUP BY, it is a good idea to use ORDER BY NULL. I have not seen that as a piece of advice that is generally given out in talks, either. So maybe there is something I am not understanding properly? I would love to know what everyone thinks.

Here is the result of my research:

The manual at http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html says:

By default, MySQL sorts all GROUP BY col1, col2, … queries as if you specified ORDER BY col1, col2, … in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL.

High Performance MySQL, 3rd Edition says the following on page 246, in the section on “Optimizing GROUP BY and DISTINCT”:

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don’t care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the results in the desired direction by the clause’s columns.

A random blog post extols the joy of “no more filesorts” in his EXPLAINs at http://www.subelsky.com/2008/05/order-by-null-kills-mysql-filesorts.html.

My question is – why wait until you see a “Using filesort” in your EXPLAIN plan? If the overhead is only used to order the results, and that is not desired, why not just use ORDER BY NULL by default, whenever using a GROUP BY query?

Or is it possible that yes, whenever developers write GROUP BY, they probably want the results returned in the order of the ORDER BY?

Comments are closed.