Understanding and Handling Race Conditions at DynamoDB

For serverless development, DynamoDB is close to an ideal database solution. It’s managed, highly available, and scales on demand. But AWS can’t take everything off your hands: you still need to think about your usage patterns. A common mistake is to forget about possible race conditions caused by writes that are based on stale reads.

In this article, I’ll introduce you to some common approaches for avoiding inconsistent data caused by conflicting writes to a single document.

Even if your service is not experiencing a lot of write operations, it’s worth thinking about possible concurrency issues early on.

What are Race Conditions?

Let’s start with Wikipedia’s definition of a race condition:

A race condition or race hazard is the condition of an electronics, software, or other system where the system’s substantive behavior is dependent on the sequence or timing of other uncontrollable events.

To rephrase: timing and ordering influence the determinism of our system. Transferred to the web sphere, this means our API operations can end up with different results when their order and timing change, even though they submit the same payloads.

Let’s create an imaginary example for this. It would never be implemented this way, but it works well for demonstration purposes:

  • We’re saving votes per candidate in a list in a DynamoDB document.

  • A Lambda function behind an API Gateway submits those votes.

If we receive concurrent votes for the same candidate from different voters, we can get into trouble when the operations interleave in a certain order.

DynamoDB and Lambda Race Condition Sequence Diagram

The second voting request comes in before the first one submits its write operation. Because each Lambda instance handles only a single request at a time, another instance is used, and it receives the same query result for candidate Z, containing only a vote by A.

Both instances will try to update the document by extending the list with their own voter, so the final result depends on the timing of the write operations.

What’s even worse: both paths, either A finishing before B or vice versa, will result in corrupted data, because either way we lose one vote.
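The lost vote can be reproduced without AWS at all. The following is a minimal sketch that replaces the table with an in-memory dictionary and plays the two requests in the problematic order by hand. The helpers `read` and `write` and the voter names "B" and "C" are assumptions made for illustration, not part of any DynamoDB API.

```python
import copy

# In-memory stand-in for the DynamoDB table (hypothetical, illustration only).
table = {"Z": {"key": "Z", "votedBy": ["A"]}}


def read(key):
    # Return a copy, like a query result held by one Lambda instance.
    return copy.deepcopy(table[key])


def write(item):
    # Unconditional overwrite of the whole document.
    table[item["key"]] = copy.deepcopy(item)


# Both instances read before either one writes.
seen_by_first = read("Z")
seen_by_second = read("Z")

# Each instance extends its own, already stale, copy of the list.
seen_by_first["votedBy"].append("B")
seen_by_second["votedBy"].append("C")

write(seen_by_first)
write(seen_by_second)  # silently overwrites the first write

print(table["Z"]["votedBy"])  # ['A', 'C']
```

Swapping the two `write` calls leaves `['A', 'B']` instead, so one vote is lost regardless of which instance finishes last.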

Handling Race Conditions

There are three common strategies: Optimistic Locking, Database Transactions, and Update Expressions. Optimistic Locking allows us to detect conflicting operations that would result in an inconsistent state. Database transactions ensure that a set of operations is executed together in an atomic "all-or-nothing" operation. Update Expressions can modify certain fields of a complex document, allowing us to do fine-grained, consistent update operations.

By using these strategies, we can ensure that our data remains consistent in a distributed system or any multi-node service with possibly concurrent write operations.

Optimistic Locking

Let’s take a step back: instead of trying to solve the problem right away, we first just detect conflicting operations that would end up in an inconsistent state.

DynamoDB Mapper and its Optimistic Locking allow us to verify that we’re updating the item we’re expecting by using a dedicated version field.

Looking back at our example, we’d extend our document with a new version field.

{  
  "key": "Z",  
  "votedBy": ["A"],  
  "version": 1  
}

The mapper automatically assigns an initial version to the field annotated with @DynamoDBVersionAttribute when we create a new document. Every subsequent write to that item checks that the stored version number still matches the one we read, internally using a ConditionExpression. If it does, the version is incremented automatically, which guarantees that no other write happened between our read and our write.
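To make the version check more tangible, here is a rough sketch of the kind of conditional write such a mapper could send on our behalf. It only builds the request parameters as a plain dictionary; the function name `build_versioned_put` and the table name `votes` are assumptions, and the exact request a real mapper generates may differ.

```python
def build_versioned_put(table_name, item, expected_version):
    """Build put_item parameters that only succeed if the stored version matches.

    `item` is a plain dict with "key", "votedBy" and "version" fields, as in the
    example document. `expected_version` is the version we read earlier.
    """
    new_version = expected_version + 1
    return {
        "TableName": table_name,
        "Item": {
            "key": {"S": item["key"]},
            "votedBy": {"L": [{"S": voter} for voter in item["votedBy"]]},
            "version": {"N": str(new_version)},
        },
        # The write is rejected unless the stored version is the one we read.
        "ConditionExpression": "#version = :expected",
        "ExpressionAttributeNames": {"#version": "version"},
        "ExpressionAttributeValues": {":expected": {"N": str(expected_version)}},
    }


params = build_versioned_put(
    "votes",
    {"key": "Z", "votedBy": ["A", "B"], "version": 1},
    expected_version=1,
)
print(params["ConditionExpression"])  # #version = :expected
```

With boto3 installed, these parameters could be passed to the low-level client as `put_item(**params)`. If another write bumped the version in the meantime, the condition fails and nothing is stored.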

Optimistic Locking in DynamoDB

If there was an intermediate write, our expected version won’t match and we’ll receive a ConditionalCheckFailedException. In our introductory example, that is what would happen to the write operation by instance B.

We could now resolve the conflict manually by reloading the current state of the document and merging our changes. This is easy in our case, as we only have a single attribute. With a complex document with multiple fields, it can range from non-trivial to complex, because we also need to keep track of the changes we made in our current process.
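The reload-and-merge step can be sketched as a small retry loop. This is a simulation under stated assumptions: `VersionedTable`, `ConditionalCheckFailed`, and `add_vote` are hypothetical stand-ins that mimic the version check in memory, and the voter names "B" and "C" are made up for the example.

```python
import copy


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class VersionedTable:
    """In-memory table that enforces a version check on every write."""

    def __init__(self, items):
        self._items = {item["key"]: copy.deepcopy(item) for item in items}

    def get(self, key):
        return copy.deepcopy(self._items[key])

    def put(self, item):
        stored = self._items[item["key"]]
        if stored["version"] != item["version"]:
            raise ConditionalCheckFailed(item["key"])
        updated = copy.deepcopy(item)
        updated["version"] += 1
        self._items[item["key"]] = updated


def add_vote(table, candidate, voter, max_attempts=3):
    """Reload and re-apply our change until the versioned write succeeds."""
    for _ in range(max_attempts):
        item = table.get(candidate)
        if voter not in item["votedBy"]:
            item["votedBy"].append(voter)
        try:
            table.put(item)
            return item["votedBy"]
        except ConditionalCheckFailed:
            continue  # someone else wrote first: reload and merge again
    raise RuntimeError("gave up after too many conflicting writes")


table = VersionedTable([{"key": "Z", "votedBy": ["A"], "version": 1}])

# The second request reads first, then the first request completes its write.
stale = table.get("Z")
add_vote(table, "Z", "B")

stale["votedBy"].append("C")
try:
    table.put(stale)
except ConditionalCheckFailed:
    # The stale write is rejected, so we retry from the current state.
    add_vote(table, "Z", "C")

print(table.get("Z"))  # {'key': 'Z', 'votedBy': ['A', 'B', 'C'], 'version': 3}
```

Merging is trivial here because the only change is "append one voter". With several modified fields, the retry would have to re-apply each of them against the reloaded document.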

We can keep optimistic locking for ensuring consistency in general, but rely on DynamoDB’s built-in update operations to make our specific update atomic.

Database Transactions

DynamoDB supports transactions to ensure that a set of operations is coupled together and only executed as a whole, which gives us back our determinism.

Transactions are limited to either a set of read operations or a set of write operations. So in our case, this does not help, as we need to ensure that arbitrary operations on a single document are guaranteed to end up in a consistent state.

Update Expressions

With Update Expressions, we can modify certain fields of a complex document. They allow us, for example, to add or remove elements from a list, which is exactly what we need in our use case.

Generally, there are four different operations:

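  • SET: adds one or several attributes to an item, or overwrites them if they already exist.

  • REMOVE: removes one or several attributes from an item.

  • ADD: adds a new attribute with its value or, if the attribute already exists, adds to a number or adds elements to a set.

  • DELETE: removes one or more elements from a set.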

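For our use case, we need SET together with list_append, because votedBy is a list and ADD only works on numbers and sets. Amazon also generally recommends using SET rather than ADD. Let’s quickly define our update operation for instance B without using a particular framework, solely with the AWS CLI: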

# Atomically append voter B to the votedBy list of candidate Z.
aws dynamodb update-item \
    --table-name votes \
    --key '{"key":{"S":"Z"}}' \
    --update-expression "SET #field = list_append(#field, :value)" \
    --expression-attribute-names '{"#field": "votedBy"}' \
    --expression-attribute-values '{":value": {"L": [{"S": "B"}]}}'

The same update can also be expressed with the DynamoDB Data Mapper for Node.

As mentioned before, an update can also carry a ConditionExpression that checks for a certain state of the document, just like the implementation of Optimistic Locking uses the version field to ensure that there are no conflicting operations between the read and the write operation.
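As a sketch of how an atomic append and a state check could be combined, the snippet below builds the parameters for a single update request. The function name `build_vote_update` and the "voter has not voted yet" condition are assumptions chosen for illustration. It also assumes the votedBy attribute already exists on the item, since list_append needs an existing list.

```python
def build_vote_update(table_name, candidate, voter):
    """Build update_item parameters that append a voter exactly once."""
    return {
        "TableName": table_name,
        "Key": {"key": {"S": candidate}},
        # Append on the server side instead of overwriting the whole list.
        "UpdateExpression": "SET #field = list_append(#field, :value)",
        # Reject the update if this voter is already in the list.
        "ConditionExpression": "NOT contains(#field, :voter)",
        "ExpressionAttributeNames": {"#field": "votedBy"},
        "ExpressionAttributeValues": {
            ":value": {"L": [{"S": voter}]},
            ":voter": {"S": voter},
        },
    }


params = build_vote_update("votes", "Z", "B")
print(params["UpdateExpression"])  # SET #field = list_append(#field, :value)
```

With boto3 installed, the dictionary could be passed to the low-level client as `update_item(**params)`. Because the append happens inside DynamoDB, two concurrent requests for different voters no longer overwrite each other.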

Conclusion

Ensuring consistency in a distributed system — or any multi-node service — with possibly concurrent write operations is a non-trivial task. With DynamoDB, we’re getting a toolbox of helpers which allow us to detect conflicts by using Optimistic Locking and do fine-grained, consistent update operations with the help of Update Expressions.