How to Do an Upsert in Cassandra

In a recent post I talked about how to do an upsert in MySQL. Doing an upsert in Cassandra 1.2 using CQL 3 is more straightforward. Why? Because in Cassandra every INSERT (and every UPDATE) is actually an upsert. Cassandra is a distributed database that avoids reading before a write, so an INSERT or UPDATE sets the column values you specify regardless of whether the row already exists. This means inserts can update existing rows, and updates can create new rows. It also means it’s easy to accidentally overwrite existing data, so keep that in mind.

Read on for some examples that illustrate Cassandra’s upsert behavior. For the sake of consistency I’ll use the same row from my related MySQL post.

First I create the actor table in Cassandra:

cqlsh:test> CREATE TABLE actor (
        ...   actor_id int,
        ...   first_name varchar,
        ...   last_name varchar,
        ...   last_update timestamp,
        ...   PRIMARY KEY (actor_id)
        ... ); 

Now I INSERT my test row and read it back to verify the data:

cqlsh:test> insert into actor (actor_id, first_name, last_name, last_update) 
        ... values (50,'NATALIE','HOPKINS','2006-02-15 04:34:33');
cqlsh:test> select * from actor where actor_id = 50;

 actor_id | first_name | last_name | last_update
----------+------------+-----------+--------------------------
       50 |    NATALIE |   HOPKINS | 2006-02-15 04:34:33+0000

I could update the last_name and last_update values in that row using a CQL UPDATE, but since this post is about upserts I’ll do it with an INSERT instead:

cqlsh:test> insert into actor (actor_id, first_name, last_name, last_update) 
        ... values (50,'NATALIE','SMITH','2013-09-27 12:34:56');
cqlsh:test> select * from actor where actor_id = 50;

 actor_id | first_name | last_name | last_update
----------+------------+-----------+--------------------------
       50 |    NATALIE |     SMITH | 2013-09-27 12:34:56+0000

Now that I’ve covered updating an existing row using an INSERT, how about inserting a non-existant row using UPDATE? I’ll delete the row first and verify it’s gone, then do the UPDATE and SELECT to verify the behavior:

cqlsh:test> delete from actor where actor_id = 50;
cqlsh:test> select * from actor where actor_id = 50;
cqlsh:test> update actor
        ... set first_name = 'NATALIE',
        ...   last_name = 'HOPKINS',
        ...   last_update = '2006-02-15 04:34:33'
        ... where actor_id = 50;
cqlsh:test> select * from actor where actor_id = 50;

 actor_id | first_name | last_name | last_update
----------+------------+-----------+--------------------------
       50 |    NATALIE |   HOPKINS | 2006-02-15 04:34:33+0000

If the row already exists I can update specific columns and leave others alone by only naming the columns I want to update. In this case any columns that I do not specify will keep their existing values. For example:

cqlsh:test> insert into actor (actor_id, last_name, last_update) 
        ... values (50,'SMITH','2013-09-27 12:34:56');
cqlsh:test> select * from actor where actor_id = 50;

 actor_id | first_name | last_name | last_update
----------+------------+-----------+--------------------------
       50 |    NATALIE |     SMITH | 2013-09-27 12:34:56+0000

Nov 1st, 2013

Cassandra, PlanetCassandra

Comments

How to Do an Upsert in Cassandra

Posted by Ike Walker

Comments