Why I Use ONLY_FULL_GROUP_BY
MySQL uses the concept of SQL_MODE to “define what SQL syntax MySQL should support and what kind of data validation checks it should perform”. This post is about one of those modes, ONLY_FULL_GROUP_BY, and why I use it.
Roland Bouman wrote a great post a few years ago that debunks some myths about using GROUP BY in MySQL. His post has a lot of detail and examples, and does a very good job detailing the way GROUP BY works in MySQL with and without ONLY_FULL_GROUP_BY enabled. I recommend that you go and read that post now. Among other things, Roland points out one case where query performance is improved by not using a full GROUP BY. The post is several years old, but the performance difference is still present today in MySQL 5.6.
That post doesn’t make any strong recommendation on using ONLY_FULL_GROUP_BY, so why do I use it? For me, it makes it safer to run dynamically generated report queries. I execute a variety of dynamic reports that select one or more metrics for a specific time period over a specific set of dimensions. In order for the report to return accurate data, all of the dimensional columns must be in both the SELECT clause and the GROUP BY clause. I trust myself to write good SQL by hand, but since I have code generating dynamic SQL I value the extra protection provided by this SQL mode. If MySQL throws an error due to a partial GROUP BY I can catch it with a functional test, rather than trying to catch a more subtle error, namely incorrect report data.
To illustrate the problem I am trying to avoid, I will execute a reporting query on the sakila database. Let’s say I want to see the 10 most popular language/category combinations for the films in the sakila database. I could use a query like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | |
That looks pretty good. Now imagine my query was written by a query builder in code, and due to a bug the category column was not added to the GROUP BY clause. By default MySQL will still execute the query and give me a result, but the result is misleading:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | |
That makes it look like every film in the sakila database is a Documentary, which is obviously not accurate.
To avoid this I will set the SQL_MODE. When setting the SQL_MODE you need to be careful not to remove any existing SQL_MODE values. In my case I will append ONLY_FULL_GROUP_BY to the 2 SQL_MODE values I am already using:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | |
Now that I’ve set the SQL_MODE, I will try my query again:
1 2 3 4 5 6 7 8 9 10 | |
That’s much better. In my case I would rather get an error than get bad data, so this is the result I want.
