In yesterday’s post I was fairly prescriptive about how to do an upsert in MySQL. I’m aware that there are other ways to do an upsert in MySQL, but I intentionally emphasized INSERT ... ON DUPLICATE KEY UPDATE over REPLACE because I think it’s almost always the better choice.
Here are my thoughts on REPLACE. I’ll start with the good stuff.
The appeal of REPLACE
REPLACE is a natural complement to INSERT and INSERT IGNORE. A plain INSERT gives you expected SQL INSERT behavior. If you try to insert a duplicate row you get an error. For example:
1234567891011
mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update |
+----------+------------+-----------+---------------------+
| 50 | NATALIE | HOPKINS | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)
mysql> insert into sakila.actor (actor_id, first_name, last_name, last_update)
-> values (50,'NATALIE','SMITH','2013-09-30 14:11:36');
ERROR 1062 (23000): Duplicate entry '50' for key 'PRIMARY'
INSERT IGNORE is a “first write wins” implementation. The syntax is exactly the same as INSERT, you just add the IGNORE keyword. If you try to insert a duplicate row the insert is simply ignored, but without an error. For example:
REPLACE is a “last write wins” implementation. The syntax is exactly the same as INSERT, you just use the REPLACE command instead of INSERT. If you try to insert a duplicate row the existing row is replaced by the new row. For example:
12345678910111213141516171819202122
mysql> set foreign_key_checks = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update |
+----------+------------+-----------+---------------------+
| 50 | NATALIE | HOPKINS | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)
mysql> replace into sakila.actor (actor_id, first_name, last_name, last_update)
-> values (50,'NATALIE','SMITH','2013-09-30 14:11:36');
Query OK, 2 rows affected (0.00 sec)
mysql> select * from sakila.actor where actor_id = 50;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update |
+----------+------------+-----------+---------------------+
| 50 | NATALIE | SMITH | 2013-09-30 14:11:36 |
+----------+------------+-----------+---------------------+
1 row in set (0.00 sec)
The downside of REPLACE
The interface of REPLACE has its advantages, but its implementation can cause problems. REPLACE is implemented as a DELETE followed by an INSERT. One disadvantage of that implementation is its performance. In every scenario I can imagine in MySQL it is faster to update an existing row than to delete one row and insert another one.
Beyond the performance implications, executing a DELETE followed by an INSERT can cause more problems when foreign key constraints are involved. For example if I try yesterday’s upsert using REPLACE I get an error:
123
mysql> replace into sakila.actor (actor_id, first_name, last_name, last_update)
-> values (50,'NATALIE','SMITH','2013-09-27 12:34:56');
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (sakila.film_actor, CONSTRAINT fk_film_actor_actor FOREIGN KEY (actor_id) REFERENCES actor (actor_id) ON UPDATE CASCADE)
The error occurred because there are child rows in the film_actor table that reference the row in the actor table that I am updating, and the delete is disallowed by the foreign key. As a workaround, I can temporarily disable foreign key checks before executing my REPLACE command:
That workaround is fine for this particular case, in part because the relevant foreign key is defined with ON DELETE RESTRICT, but what about foreign keys defined with ON DELETE CASCADE? For those REPLACE is even more problematic. Here’s a quick schema change to allow me to test it:
123456789
mysql> alter table sakila.film_actor
-> drop foreign key fk_film_actor_actor;
Query OK, 5462 rows affected (0.11 sec)
Records: 5462 Duplicates: 0 Warnings: 0
mysql> alter table sakila.film_actor
-> add constraint fk_film_actor_actor foriegn key (actor_id) references actor (actor_id) on delete cascade on update cascade;
Query OK, 5462 rows affected (0.11 sec)
Records: 5462 Duplicates: 0 Warnings: 0
Now I’ll run my REPLACE command again, this time counting the relevant rows in the child table before and after the REPLACE:
1234567891011121314151617181920212223
mysql> select count(*)
-> from sakila.film_actor
-> where actor_id = 50;
+----------+
| count(*) |
+----------+
| 32 |
+----------+
1 row in set (0.00 sec)
mysql> replace into sakila.actor (actor_id, first_name, last_name, last_update)
-> values (50,'NATALIE','SMITH','2013-09-27 12:34:56');
Query OK, 2 rows affected (0.00 sec)
mysql> select count(*)
-> from sakila.film_actor
-> where actor_id = 50;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
Oops. All I wanted to do was update one row in the actor table, and I ended up inadvertantly deleting 32 related rows from the film_actor table!
I can get into similar trouble when using REPLACE on a table with a DELETE trigger, since the REPLACE will execute an implicit DELETE and thereby fire the relevant trigger(s).
Since REPLACE can be used with any unique index (not just primary keys), it can also cause the auto-increment primary key value to change for a given row. For example:
12345678910111213141516171819202122232425262728
mysql> create table user_account (
-> id int auto_increment primary key,
-> username varchar(50) not null,
-> unique key username (username)
-> );
Query OK, 0 rows affected (0.03 sec)
mysql> replace into user_account (username) values ('ike');
Query OK, 1 row affected (0.00 sec)
mysql> select * from user_account;
+----+----------+
| id | username |
+----+----------+
| 1 | ike |
+----+----------+
1 row in set (0.00 sec)
mysql> replace into user_account (username) values ('ike');
Query OK, 2 rows affected (0.00 sec)
mysql> select * from user_account;
+----+----------+
| id | username |
+----+----------+
| 2 | ike |
+----+----------+
1 row in set (0.00 sec)
In summary, given its implementation I think it is usually not a good idea to use the MySQL REPLACE syntax. The one exception for me is when loading multiple rows from a flat file into a MySQL OLAP database using LOAD DATA INFILE. In that case I am bulk-loading data into MySQL from an external ETL process and I truly want a “last write wins” command, so I use LOAD DATA INFILE ... REPLACE. This use case works because none of the downsides I mentioned above are relevant. There are no foreign keys referencing my target table, no triggers, and I am not using auto-increment columns.