A Fail-fast First Test of MySQL Backup Integrity
This morning I was planning to check the integrity of a snapshot of a MySQL database that runs on EC2 using EBS, and I accidentally stumbled upon a fast way to quickly find certain types of problems with a backup.
My plan was to use CHECK TABLE on all of the tables to verify the integrity of the backup. The simplest way to run CHECK TABLES on an entire database is like this:
1
| |
Rather than using that command, I decided to generate a bunch of individual CHECK TABLE statements dynamically by running a query on the information_schema.tables table. Why do it that way? It gives me the flexibility to check a specific schema first, or check the smallest or most frequently updated tables first, etc.
So I ran this command to generate my CHECK TABLE script:
1 2 3 4 5 | |
And I planned to execute the script like this:
1
| |
However, when I ran the first query it crashed the MySQL server!
1 2 3 4 5 6 | |
Of course that error message doesn’t always mean that mysqld crashed. Sometimes it just means your session was killed. But in this case I checked, and mysqld did indeed crash.
It was a harmless crash since it was on a passive server, but it pointed out a problem with the backup so I had to find out exactly what went wrong so I could fix it.
I checked the error log and found the source of the crash:
1 2 3 4 5 6 7 8 9 10 11 12 | |
I’ve omitted the rest of the error output (including the backtrace) for brevity, but it’s clear which table caused the problem: test.promo_history_old.
After confirming that table is no longer needed I dropped it and re-ran the information_schema query successfully so I could move on with the rest of my integrity check.
In the future I’ll run a quick fail-fast test on information_schema before doing the more time-consuming check of running CHECK TABLE on all of the tables. I think this should be sufficient:
1
| |
Of course I’ll still run CHECK TABLE on all of the tables, but the information_schema test will help me find some problems faster.
