Fast-forwarding a Delayed MySQL Replica
Last week I needed to fast-forward a delayed MySQL replica in order to recover some data. I use pt-slave-delay to keep this replica X days behind the master, and I needed to fast-forward it to be about X-2 days behind the master instead.
I wanted to catch the replica up to a precise moment in time, and it was important that it not go past that point until I was done recovering the data I needed.
I knew there was a command to do this, but I couldn’t remember the syntax. Since I figured pt-slave-delay was implemented using this command, I just opened the perl script and searched around until I found the syntax I was looking for. Here it is:
1 2 | |
Once I got the right syntax I needed to know which log_name and log_pos values to use.
I got the log_name by SSHing to the master DB, listing the binary log directory, and finding the oldest log with an mtime greated than the time I was looking for (If there is a way to get a listing of binary logs based on time range in the MySQL CLI, please let me know in the comments!).
Once I had log_name, I used the mysqlbinlog tool to find the log_pos I was looking for. In my experience start-datetime is inclusive, and stop-datetime is exclusive, so to get all of the binary logging from 2014-02-04 16:18:30 I use that as start-datetime and add 1 second to stop-datetime. For example:
1 2 3 | |
Then I sorted through the output of that until I found the most recent SQL statement I wanted to execute on my delayed replica. One of the comments preceding that statement has an end_log_pos value, which I used as my log_pos.
In order to fast-forward the replica I needed to stop running pt-slave-delay. Otherwise it would not let replication proceed past X days behind the master.
Once I had those values, here’s the process I followed to fast-forward the replica:
- Disable monit for
pt-slave-delayso it wouldn’t restart automatically - Stop
pt-slave-delay - Run
STOP SLAVEon the delayed replica - Run
START SLAVE UNTIL MASTER_LOG_FILE = 'bin-log.000386', MASTER_LOG_POS = 950592556on the delayed replica - Wait for replication to catch up to that point
- Recover the data I need
- Run
STOP SLAVEon the delayed replica (just to be safe) - Enable monit for
pt-slave-delay - Start
pt-slave-delay(or just let monit start it)
At that point the replica will stay in that state for about 2 days, at which point pt-slave-delay will again enforce the pre-defined replication delay of X days.
