Transaction Isolation in App Engine

Max Ross, Software Engineer
May 13, 2008

Introduction

According to Wikipedia, the transaction isolation level of a database management system "defines how/when the changes made by one operation become visible to other concurrent operations." The goal of this article is to explain transaction isolation in the Google App Engine Datastore. After reading this article you should have a better understanding of how concurrent reads and writes behave.

Read Committed

Of the four isolation levels typically supported by databases (Serializable, Repeatable Read, Read Committed, Read Uncommitted), the datastore's isolation level most closely resembles Read Committed. Entities retrieved from the datastore by queries or get()s will consist only of committed data. A retrieved entity will never contain partially committed data, i.e., some properties from before a commit and some from after. The interaction between queries and transactions is a bit more subtle, though, and in order to understand it we need to look at the commit() process in more depth.

The commit() process

A commit() passes through two milestones. Milestone A is the point at which all changes to the entity have been applied. Milestone B is the point at which the changes to the indices for that entity have been applied; this is also the point at which commit() returns.



A request that looks up an updated entity by its key at a time after Milestone A is guaranteed to see the latest version of that entity. However, if a concurrent request executes a query whose predicate (the 'where clause' for you SQL/GQL fans out there) is not satisfied by the pre-update entity but is satisfied by the post-update entity, the entity will only be part of the result set if the query executes after the commit() operation has reached Milestone B. In other words, during brief windows it is possible for a result set to not include an entity whose properties, according to the result of a lookup by key, satisfy the query predicate, and it is also possible for a result set to include an entity whose properties, again according to the result of a lookup by key, fail to satisfy the query predicate. Note that while a query cannot take into account modifications that are in between Milestone A and Milestone B when deciding which entities to return, once a query decides to return a specific entity it will always return the Milestone A version of the entity.
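The behavior described above can be sketched with a small in-memory model. This is only an illustration, not the datastore's implementation or API: the class ToyDatastore, its method names, and the single indexed property "value" are all hypothetical. The one idea it tries to capture is that a query selects keys from the index but returns whatever version of the entity is currently stored.

```python
class ToyDatastore:
    """Hypothetical in-memory model of the two commit milestones."""

    def __init__(self):
        self.entities = {}  # key -> properties, what a lookup by key reads
        self.index = {}     # key -> indexed value, what a query filters on

    def milestone_a(self, key, value):
        """Apply the change to the entity itself."""
        self.entities[key] = {"value": value}

    def milestone_b(self, key):
        """Bring the index row in line with the stored entity."""
        self.index[key] = self.entities[key]["value"]

    def get(self, key):
        return dict(self.entities[key])

    def query_greater_than(self, threshold):
        # Selection is driven by the index; the data returned is the
        # current (Milestone A) version of each selected entity.
        keys = sorted(k for k, v in self.index.items() if v > threshold)
        return [(k, self.get(k)) for k in keys]


store = ToyDatastore()
store.milestone_a("e1", 5)
store.milestone_b("e1")          # initial write fully committed

store.milestone_a("e1", 50)      # update reaches Milestone A only
seen_by_key = store.get("e1")
seen_by_query = store.query_greater_than(10)

store.milestone_b("e1")          # update reaches Milestone B
seen_after_b = store.query_greater_than(10)

print(seen_by_key)    # {'value': 50}
print(seen_by_query)  # []
print(seen_after_b)   # [('e1', {'value': 50})]
```

Between the two milestones the lookup by key already sees 50, while the query still filters on the old index row and leaves the entity out.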

Examples

We've provided a general explanation of how concurrent updates and queries interact, but if you're like me you typically find it easier to get your head around these concepts by working through concrete examples. Let's walk through a few. We'll start with some simple examples and then finish up with the more interesting ones.

Let's say we have an application that stores Person entities. A Person has the following attributes:

  • Name
  • Height

This application supports the following operations:

  • updatePerson()
  • getTallPeople(), which returns all people over 72 inches tall.

We have two Person entities in the datastore:

  • Adam, who is 68 inches tall.
  • Bob, who is 73 inches tall.

Example 1 - Making Adam Taller

Suppose an application receives two requests at essentially the same time. The first request updates the height of Adam from 68 inches to 74 inches. A growth spurt! The second request calls getTallPeople(). What does getTallPeople() return?

The answer depends on the relationship between the two commit() milestones triggered by Request 1 and the getTallPeople() query executed by Request 2. Suppose it looks like this:

  • Request 1, put()
  • Request 2, getTallPeople()
  • Request 1, put()-->commit()
  • Request 1, put()-->commit()-->Milestone A
  • Request 1, put()-->commit()-->Milestone B

In this scenario getTallPeople() will only return Bob. Why? Because the update to Adam that increases his height has not yet been committed, so the change is not yet visible to the query we issue in Request 2.

Now suppose it looks like this:

  • Request 1, put()
  • Request 1, put()-->commit()
  • Request 1, put()-->commit()-->Milestone A
  • Request 2, getTallPeople()
  • Request 1, put()-->commit()-->Milestone B

In this scenario the query executes before Request 1 reaches Milestone B, so the updates to the Person indices have not yet been applied. As a result, getTallPeople() only returns Bob. This is an example of a result set that excludes an entity whose properties satisfy the query predicate.
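As a rough illustration of Example 1, the sketch below replays both orderings against plain dictionaries. The helper names (fresh_store, get_tall_people) and the dictionary layout are assumptions made for this sketch and do not reflect how the datastore stores entities or indices. The third result, a query issued after Milestone B, is not one of the scenarios above; it is included only because it follows from the commit() description.

```python
def fresh_store():
    """Return (entities, height_index) holding Adam and Bob, fully committed."""
    entities = {"Adam": {"height": 68}, "Bob": {"height": 73}}
    height_index = {name: e["height"] for name, e in entities.items()}
    return entities, height_index


def get_tall_people(entities, height_index, min_height=72):
    """Toy query: pick names from the index, return current entity data."""
    names = sorted(n for n, h in height_index.items() if h > min_height)
    return [(n, entities[n]["height"]) for n in names]


# First ordering: the query runs before the commit has started.
entities, index = fresh_store()
before_commit = get_tall_people(entities, index)

# Second ordering: the query runs between Milestone A and Milestone B.
entities, index = fresh_store()
entities["Adam"]["height"] = 74          # Milestone A: entity updated
between_a_and_b = get_tall_people(entities, index)

index["Adam"] = 74                       # Milestone B: index updated
after_b = get_tall_people(entities, index)

print(before_commit)    # [('Bob', 73)]
print(between_a_and_b)  # [('Bob', 73)]
print(after_b)          # [('Adam', 74), ('Bob', 73)]
```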

Example 2 - Making Bob Smaller (sorry Bob)

In this example we'll have Request 1 do something different. Instead of increasing Adam's height from 68 inches to 74 inches, it will reduce Bob's height from 73 inches to 65 inches. Once again, what does getTallPeople() return? Suppose it looks like this:

  • Request 1, put()
  • Request 2, getTallPeople()
  • Request 1, put()-->commit()
  • Request 1, put()-->commit()-->Milestone A
  • Request 1, put()-->commit()-->Milestone B

In this scenario getTallPeople() will return only Bob. Why? Because the update to Bob that decreases his height has not yet been committed, so the change is not yet visible to the query we issue in Request 2.

Now suppose it looks like this:

  • Request 1, put()
  • Request 1, put()-->commit()
  • Request 1, put()-->commit()-->Milestone A
  • Request 1, put()-->commit()-->Milestone B
  • Request 2, getTallPeople()

In this scenario getTallPeople() will return no one. Why? Because the update to Bob that decreases his height has reached Milestone B, so both the entity and its indices reflect the change by the time we issue our query in Request 2.

Now suppose it looks like this:

  • Request 1, put()
  • Request 1, put()-->commit()
  • Request 1, put()-->commit()-->Milestone A
  • Request 2, getTallPeople()
  • Request 1, put()-->commit()-->Milestone B

In this scenario the query executes before Milestone B, so the updates to the Person indices have not yet been applied. As a result, getTallPeople() still returns Bob, but the height property of the Person entity that comes back is the updated value: 65. This is an example of a result set that includes an entity whose properties fail to satisfy the query predicate.
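The three orderings in Example 2 can be replayed with a small schedule-driven sketch. The function run() and the step labels "A", "B", and "Q" are hypothetical names chosen for this illustration; the dictionaries only stand in for the entity store and the height index and are not how the datastore is implemented.

```python
def run(schedule):
    """Replay steps: 'A' (Milestone A), 'B' (Milestone B), 'Q' (the query)."""
    entities = {"Adam": 68, "Bob": 73}   # name -> height, as a get() sees it
    height_index = dict(entities)        # index rows the query filters on
    result = None
    for step in schedule:
        if step == "A":
            entities["Bob"] = 65                     # entity updated
        elif step == "B":
            height_index["Bob"] = entities["Bob"]    # index catches up
        elif step == "Q":
            # Select by index, then return the current entity version.
            matches = sorted(n for n, h in height_index.items() if h > 72)
            result = [(n, entities[n]) for n in matches]
    return result


print(run(["Q", "A", "B"]))  # [('Bob', 73)]
print(run(["A", "B", "Q"]))  # []
print(run(["A", "Q", "B"]))  # [('Bob', 65)]
```

The last line corresponds to the surprising case: Bob is selected through the stale index row, yet the height that comes back is the updated one.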

Conclusion

As you can see from the above examples, the transaction isolation level of the Google App Engine Datastore is pretty close to Read Committed. There are of course meaningful differences, but now that you understand these differences and the reasons behind them you should be in a better position to make intelligent, datastore-related design decisions in your applications.