In a recent post I talked about how to do an upsert in MySQL. Doing an upsert in Cassandra 1.2 using CQL 3 is more straightforward. Why? Because in Cassandra every INSERT (and every UPDATE) is actually an upsert. Cassandra is a distributed database that avoids reading before a write, so an INSERT or UPDATE sets the column values you specify regardless of whether the row already exists. This means inserts can update existing rows, and updates can create new rows. It also means it’s easy to accidentally overwrite existing data, so keep that in mind.
Read on for some examples that illustrate Cassandra’s upsert behavior. For the sake of consistency I’ll use the same row from my related MySQL post.
Now I INSERT my test row and read it back to verify the data:
1234567
cqlsh:test> insert into actor (actor_id, first_name, last_name, last_update)
... values (50,'NATALIE','HOPKINS','2006-02-15 04:34:33');
cqlsh:test> select * from actor where actor_id = 50;
actor_id | first_name | last_name | last_update
----------+------------+-----------+--------------------------
50 | NATALIE | HOPKINS | 2006-02-15 04:34:33+0000
I could update the last_name and last_update values in that row using a CQL UPDATE, but since this post is about upserts I’ll do it with an INSERT instead:
1234567
cqlsh:test> insert into actor (actor_id, first_name, last_name, last_update)
... values (50,'NATALIE','SMITH','2013-09-27 12:34:56');
cqlsh:test> select * from actor where actor_id = 50;
actor_id | first_name | last_name | last_update
----------+------------+-----------+--------------------------
50 | NATALIE | SMITH | 2013-09-27 12:34:56+0000
Now that I’ve covered updating an existing row using an INSERT, how about inserting a non-existant row using UPDATE? I’ll delete the row first and verify it’s gone, then do the UPDATE and SELECT to verify the behavior:
123456789101112
cqlsh:test> delete from actor where actor_id = 50;
cqlsh:test> select * from actor where actor_id = 50;
cqlsh:test> update actor
... set first_name = 'NATALIE',
... last_name = 'HOPKINS',
... last_update = '2006-02-15 04:34:33'
... where actor_id = 50;
cqlsh:test> select * from actor where actor_id = 50;
actor_id | first_name | last_name | last_update
----------+------------+-----------+--------------------------
50 | NATALIE | HOPKINS | 2006-02-15 04:34:33+0000
If the row already exists I can update specific columns and leave others alone by only naming the columns I want to update. In this case any columns that I do not specify will keep their existing values. For example:
1234567
cqlsh:test> insert into actor (actor_id, last_name, last_update)
... values (50,'SMITH','2013-09-27 12:34:56');
cqlsh:test> select * from actor where actor_id = 50;
actor_id | first_name | last_name | last_update
----------+------------+-----------+--------------------------
50 | NATALIE | SMITH | 2013-09-27 12:34:56+0000