So, at midnight I got a call from customer service saying our site was slow. I narrowed it down to one of our auxiliary databases, which seems to have gotten wedged right around midnight. Queries that normally take less than 4 seconds started taking longer and longer, climbing past 5 seconds and then past 30 seconds in the span of a minute or so.
In the moment, I thought killing off all the queries would be a good move, so I ran my kill script:

for i in `/usr/bin/mysql -u user -pPass -e 'show full processlist' | grep appuser | cut -f1`
do
  mysql -u user -pPass -e "kill $i"
done
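For anyone curious what that pipeline actually extracts, here is the same grep|cut stage run offline against canned processlist output (the rows, hosts, and thread IDs below are made up for illustration; the real script reads them from a live server):

```shell
# Canned (hypothetical) output of `mysql -e 'show full processlist'`:
# the batch client prints tab-separated rows with a header line.
processlist='Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo\n12\tappuser\tapp1:45678\tmydb\tQuery\t45\tSending data\tNULL\n13\troot\tlocalhost\tNULL\tSleep\t0\tNULL\tNULL\n14\tappuser\tapp2:45679\tmydb\tQuery\t31\tclosing tables\tNULL'

# The same grep|cut stage the kill script uses, minus the live server:
# keep only the appuser rows and take the first (Id) column.
ids=$(printf '%b' "$processlist" | grep appuser | cut -f1)

# Each of these IDs would get its own `kill <id>` statement.
echo "$ids"
```

Note the pipeline matches "appuser" anywhere in the row, so a query whose text happens to contain that string would also be swept up.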
This attempts to kill every mysql connection owned by appuser. I ran it a few times, and it didn't work. So I fell back on a trick I learned from bringing our site down for maintenance: sometimes there are straggling connections to mysql, so what I do is change the app user's password by directly manipulating the mysql.user table and flushing privileges.
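For reference, the trick looks roughly like this (a sketch assuming the old mysql.user table layout with a Password column; both password strings below are placeholders, not our real credentials):

```sql
-- Lock the application out by scrambling its password:
UPDATE mysql.user SET Password = PASSWORD('scrambled-temp') WHERE User = 'appuser';
FLUSH PRIVILEGES;

-- ...wait for the connections to drain, then restore the real password:
UPDATE mysql.user SET Password = PASSWORD('real-app-password') WHERE User = 'appuser';
FLUSH PRIVILEGES;
```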
Within 10 seconds, all the connections from the appuser were gone, and when I put the correct password back and flushed privileges, everything came back normal. Queries started taking their usual amount of time.
Why is it that the queries refused to be killed by KILL, and yet changing the password for the user they were running as killed them off? Some had been running more than 45 seconds, in various states of "Sending data" and "closing tables". Nothing had been running much longer than 60 seconds, so it doesn't seem like one big query was wedging things.
Oh, and what happened at midnight to cause this? No clue. The only thing we run at that time is a PURGE MASTER LOGS, which we do every hour, since we fill up a 1.1G binary log file every 20 minutes or so. This database backs a particularly write-heavy application and also runs reports, so we optimize the tables every week (Wednesday at 2 am). I've changed the optimization to daily, since when I ran it manually this morning it only took about 20 seconds.
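For the record, the two maintenance statements in question look like this (the binlog file name and table name here are hypothetical placeholders; the real jobs are driven from cron):

```sql
-- Hourly: drop binary logs up to (not including) the named file:
PURGE MASTER LOGS TO 'mysql-bin.000123';

-- Weekly, now daily: defragment and re-sort the heavily written tables:
OPTIMIZE TABLE some_report_table;
```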
Anyone have an idea about why changing the password worked so quickly when kill did not?