Query Reviews, Part 1: Overview
One task that can really help reduce future problems is to do a periodic query review. I’ve been using pt-query-digest to do this since 2010 (back when it was part of Maatkit, as mk-query-digest!), and while I have presented the idea several times at conferences, I have never blogged about it.
I am going to share a secret with you – I blog not just to share information with YOU, but to share information with ME. Future me. This comes up because I am working on a query review at Salesforce.com for one of our busiest shards. I went to look up a blog post on using pt-query-digest, because it would be a handy cheat sheet…and…I never wrote it. So, here goes!
This is the first post in what will be a series, because it is a long topic and there are several tools that can be used these days for the analysis.
What is a query review?
At its simplest, it’s a review of queries. In this context, a query review is where you proactively review the performance of ALL successful queries sent to a server.
Why should you do a query review?
A query review can find potentially problematic queries BEFORE they are a problem. As an example, you can easily find queries that do not use an index, and add indexes before the tables become so large that those queries cause trouble.
Another example is the case where you have a query that does the same thing over and over, many times per second – for example, a query that counts the rows in the sessions table every time a user hits a page, to show “x people online now”. That query could be rewritten so it does not do a count every time – maybe do the count once every minute or 5 minutes, store the result in a “user count” table, and have each page hit query that table instead. Or use an intermediate cache.
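As a sketch of that rewrite (the table and column names here are made up for illustration, and the mysql client is assumed to read credentials from ~/.my.cnf): a cron job refreshes a one-row cache table once a minute, and page hits read that row instead of counting sessions.

```bash
#!/bin/sh
# Hypothetical cron job (* * * * *) that refreshes a cached "users online"
# count once a minute. Page hits then run a cheap single-row SELECT against
# user_count instead of a COUNT(*) against the whole sessions table.
mysql myapp -e "
  UPDATE user_count
     SET people_online = (SELECT COUNT(*) FROM sessions),
         updated_at    = NOW()
   WHERE id = 1;"
```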
Another reason to do a query review is that a sample of a problem query can be difficult to track down after the fact; a review captures samples of everything while things are still healthy.
Who should do a query review?
A query review is not for a junior DBA. You need some knowledge of how to optimize queries and how indexing works, and a sense of when an index is valuable and when an index is necessary. For example, a query not using an index on a text field in the “countries” table is much less of a concern than a query not using an index on an integer in the “customers” table, because countries won’t grow to be huge but hopefully your customers table will.
What does a query review look like?
The first requirement for a query review is gathering “all” the queries. This can be done in many ways:
- general log – logs all queries as they are sent to the server (see the first sketch after this list)
- Pros
- Built into all versions and forks of MySQL.
- Gets ALL queries sent to the server, even ones that have an error. You can see if there are lots of queries with syntax errors, which can help find code/ORM bugs.
- Gets ALL attempted logins, even failed ones, so it’s useful for security and technical-debt audits.
- You can turn it on dynamically since MySQL 5.1 (whatever version you are running today almost certainly supports this).
- Cons
- Because the logging happens when the query is sent, there is no indication whether the query completed successfully, or how long it took.
- Write intensive to disk
- Grows at a large rate
- MySQL overhead
- slow query log with long_query_time set to 0 – gets all *successful* queries (see the second sketch after this list)
- Pros
- Built into all versions and forks of MySQL.
- Can be turned on dynamically (since MySQL 5.1, same as the general log).
- Gets lots of information, including lock timing, query execution timing, rows returned, whether the query was successful or not.
- Cons
- Does not get ALL attempted queries – only some errors show up (queries with syntax errors, for example, never execute and are never logged)
- Does not get failed logins
- Write intensive to disk
- Grows at a large rate
- MySQL overhead
- tcpdump and other traffic sniffers, such as wireshark and the sniffers built into programs like MONyog (see the third sketch after this list)
- Pros
- Built into or readily available on every platform
- Gets all MySQL traffic, including attempted logins and queries with syntax errors
- Gets information like execution time, users and hosts.
- No additional MySQL overhead
- Cons
- Must be root to run tcpdump
- Write intensive to disk
- Grows at a large rate
- PERFORMANCE_SCHEMA, pt-query-digest --processlist (see the last sketch after this list), proxies, audit logs
- Rather than rehash what other people have written, I’ll refer you to Baron of VividCortex’s post on the subject, which in turn links to Percona’s post.
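As promised above, here are some sketches. First, the general log: turning it on and off dynamically, with no server restart (the log file path is just an example):

```bash
# Send the general log to a file and enable it on the fly.
# general_log and general_log_file are dynamic variables in MySQL 5.1+.
mysql -e "SET GLOBAL log_output = 'FILE';
          SET GLOBAL general_log_file = '/var/log/mysql/general.log';
          SET GLOBAL general_log = ON;"

# It is write-intensive and grows quickly, so turn it off once you have
# collected enough traffic for the review:
mysql -e "SET GLOBAL general_log = OFF;"
```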
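Second, the slow query log with long_query_time set to 0, followed by a pt-query-digest run over the capture (again, paths are examples; note that a GLOBAL long_query_time only affects connections opened after the change):

```bash
# Log every successful query by dropping the slow-log threshold to 0.
mysql -e "SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
          SET GLOBAL long_query_time = 0;
          SET GLOBAL slow_query_log = ON;"

# After the capture window, put the threshold back (10 is the server
# default) and summarize the log:
mysql -e "SET GLOBAL long_query_time = 10;"
pt-query-digest /var/log/mysql/slow.log > slow-digest.txt
```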
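Third, the tcpdump approach. The pt-query-digest documentation shows a capture along these lines; the interface, packet count, and port are values you would tune for your environment:

```bash
# Capture full MySQL packets as hex (run as root), then parse the dump.
tcpdump -s 65535 -x -nn -q -tttt -i any -c 10000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt > tcpdump-digest.txt
```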
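Finally, a sketch of pt-query-digest --processlist, which comes up again in the comments below. The host and user are placeholders; remember that fast queries will be under-represented because they finish between polls:

```bash
# Poll SHOW FULL PROCESSLIST (ten times a second by default) and report
# on the queries seen; low overhead, but misses short-running queries.
pt-query-digest --processlist h=db1.example.com,u=monitor --ask-pass \
    > processlist-digest.txt
```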
Well, that’s a lot of words for one blog post, so I’ll end part 1: overview here. Future posts in the series will cover how to use the tools to do a query review.
Comments

In theory, tcpdump will capture all queries, but on very high-traffic servers the sniffer often cannot keep up with all the packets, and there can be rare issues parsing the capture. IMHO the slow query log with long_query_time=0 is your best bet; run it for as long as you can tolerate the overhead. One other method you did not include is pt-query-digest against the processlist (--processlist), which does not capture nearly everything (short-running queries often finish before the poll sees them), but it gives you access to your longer-running queries without as much overhead as some of the logging methods mentioned.
It’s true that network sniffing can lead to lost packets (in fact, pt-query-digest’s documentation includes an example of how to mitigate this, courtesy of Devananda van der Veen), but what I like most about that approach is that:
- It does not require any changes to mysqld.
- It gives you response time data as seen from the application. For MySQL this is usually not a problem, but my experience with MongoDB’s profiler is that its response time resolution is useless for diagnosing very fast queries, while network sniffing gives a much more accurate picture.
- It logs some things the slow query log does not, like warnings. I have found VividCortex’s warning logging quite useful when troubleshooting.
As for --processlist, I fully agree. I have used it several times when I can’t change anything on the server and want at least an idea of the queries with the poorest execution time at the moment. Provided one is aware that one is working with incomplete information, it can be a useful method too.