According to Wikipedia, the transaction isolation level of a database management system "defines how/when the changes made by one operation become visible to other concurrent operations." The goal of this article is to explain transaction isolation in the Google App Engine Datastore. After reading this article you should have a better understanding of how concurrent reads and writes behave.
Of the four isolation levels typically supported by databases (Serializable, Repeatable Read, Read Committed, Read Uncommitted), the datastore's isolation level most closely resembles Read Committed. Entities retrieved from the datastore by queries or get()s will consist only of committed data. A retrieved entity will never have partially committed data, i.e. some from before a commit and some from after. The interaction between queries and transactions is a bit more subtle, though, and in order to understand it we need to look at the commit() process in more depth.
A commit() consists of two milestones: the point at which changes to an entity have been applied and the point at which changes to indices for that entity have been applied. Let's call the first point Milestone A, and the second point, when commit() returns, Milestone B. By the time we reach Milestone A, all changes to the entity have been applied. By the time we reach Milestone B, the changes to the entity's indices have been applied.
A request that looks up an updated entity by its key at a time after Milestone A is guaranteed to see the latest version of that entity. However, if a concurrent request executes a query whose predicate (the 'where clause' for you SQL/GQL fans out there) is not satisfied by the pre-update entity but is satisfied by the post-update entity, the entity will only be part of the result set if the query executes after the commit() operation has reached Milestone B. In other words, during brief windows it is possible for a result set to not include an entity whose properties, according to the result of a lookup by key, satisfy the query predicate, and it is also possible for a result set to include an entity whose properties, again according to the result of a lookup by key, fail to satisfy the query predicate. Note that while a query cannot take into account modifications that are in between Milestone A and Milestone B when deciding which entities to return, once a query decides to return a specific entity it will always return the Milestone A version of the entity.
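The two milestones can be mimicked with a small in-memory model. This is only a sketch for building intuition, not the App Engine API: the class ToyDatastore, its methods, the key "k1", and the property name "value" are all invented for illustration. The model assumes a query picks keys from the index and then returns the current entity for each key, which is the behavior described above.

```python
class ToyDatastore:
    """Toy model of the two commit milestones (hypothetical, not the real API)."""

    def __init__(self):
        self.entities = {}  # key -> properties, what a lookup by key sees
        self.index = {}     # key -> indexed value, what a query scans

    def write_entity(self, key, properties):
        """Milestone A: the entity change is applied, the index is not."""
        self.entities[key] = dict(properties)

    def write_index(self, key):
        """Milestone B: the index now reflects the entity's current value."""
        self.index[key] = self.entities[key]["value"]

    def get(self, key):
        """Lookup by key: always returns the latest applied entity."""
        return self.entities[key]

    def query_greater_than(self, threshold):
        """Query: choose keys from the index, then return current entities."""
        matching_keys = [k for k, v in self.index.items() if v > threshold]
        return [self.entities[k] for k in matching_keys]


store = ToyDatastore()
store.write_entity("k1", {"value": 1})
store.write_index("k1")                    # initial state, fully committed

store.write_entity("k1", {"value": 10})    # update reaches Milestone A
by_key = store.get("k1")                   # lookup by key sees the new value
before_b = store.query_greater_than(5)     # index is stale, entity is missed
store.write_index("k1")                    # update reaches Milestone B
after_b = store.query_greater_than(5)      # entity is now found

store.write_entity("k1", {"value": 2})     # a second update reaches Milestone A
stale_hit = store.query_greater_than(5)    # stale index still matches the key
```

In this sketch, before_b is empty even though by_key already satisfies the predicate, and stale_hit contains an entity whose current value no longer satisfies it. These are the two brief windows described above.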
We've provided a general explanation of how concurrent updates and queries interact, but if you're like me you typically find it easier to get your head around these concepts by working through concrete examples. Let's walk through a few. We'll start with some simple examples and then finish up with the more interesting ones.
Let's say we have an application that stores Person entities. A Person has the following attributes: a name and a height in inches.
This application supports the following operations:
updatePerson()
getTallPeople(), which returns all people over 72 inches tall.
We have 2 Person entities in the datastore: Adam, who is 68 inches tall, and Bob, who is 73 inches tall.
Suppose an application receives two requests at essentially the same time. The first request updates the height of Adam from 68 inches to 74 inches. A growth spurt! The second request calls getTallPeople(). What does getTallPeople() return?
The answer depends on the relationship between the two commit() milestones triggered by Request 1 and the getTallPeople() query executed by Request 2. Suppose it looks like this:
Request 1: put()
Request 2: getTallPeople()
Request 1: put() --> commit()
Request 1: put() --> commit() --> Milestone A
Request 1: put() --> commit() --> Milestone B
In this scenario getTallPeople() will only return Bob. Why? Because the update to Adam that increases his height has not yet been committed, so the change is not yet visible to the query we issue in Request 2.
Now suppose it looks like this:
Request 1: put()
Request 1: put() --> commit()
Request 1: put() --> commit() --> Milestone A
Request 2: getTallPeople()
Request 1: put() --> commit() --> Milestone B
In this scenario the query executes before Request 1 reaches Milestone B, so the updates to the Person indices have not yet been applied. As a result, getTallPeople() only returns Bob. This is an example of a result set that excludes an entity whose properties satisfy the query predicate.
In this example we'll have Request 1 do something different. Instead of increasing Adam's height from 68 inches to 74 inches, it will reduce Bob's height from 73 inches to 65 inches. Once again, what does getTallPeople() return? Suppose it looks like this:
Request 1: put()
Request 2: getTallPeople()
Request 1: put() --> commit()
Request 1: put() --> commit() --> Milestone A
Request 1: put() --> commit() --> Milestone B
In this scenario getTallPeople() will return only Bob. Why? Because the update to Bob that decreases his height has not yet been committed, so the change is not yet visible to the query we issue in Request 2.
Now suppose it looks like this:
Request 1: put()
Request 1: put() --> commit()
Request 1: put() --> commit() --> Milestone A
Request 1: put() --> commit() --> Milestone B
Request 2: getTallPeople()
In this scenario getTallPeople() will return no one. Why? Because the update to Bob that decreases his height has been committed by the time we issue our query in Request 2.
Now suppose it looks like this:
Request 1: put()
Request 1: put() --> commit()
Request 1: put() --> commit() --> Milestone A
Request 2: getTallPeople()
Request 1: put() --> commit() --> Milestone B
In this scenario the query executes before Milestone B, so the updates to the Person indices have not yet been applied. As a result, getTallPeople() still returns Bob, but the height property of the Person entity that comes back is the updated value: 65. This is an example of a result set that includes an entity whose properties fail to satisfy the query predicate.
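One way an application might guard against this last case is to re-check the predicate against the entities that actually come back. The sketch below is an assumption about how such a check could look, not something the datastore provides: the helper recheck() and the dictionary representation of a Person are hypothetical, and the result list is hard-coded to match the scenario above.

```python
def recheck(results, predicate):
    """Keep only returned entities whose current properties satisfy the predicate."""
    return [entity for entity in results if predicate(entity)]


# Hypothetical result of getTallPeople() issued between Milestone A and
# Milestone B of Bob's update: Bob is returned with his new height.
query_results = [{"name": "Bob", "height": 65}]

# Re-apply the "over 72 inches" predicate to the returned entities.
tall_people = recheck(query_results, lambda person: person["height"] > 72)
```

Here tall_people ends up empty, because Bob's returned height of 65 fails the re-check. Note that a filter like this can only drop entities that were returned and no longer match. It cannot recover an entity that the query missed, such as Adam in the earlier example.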
As you can see from the above examples, the transaction isolation level of the Google App Engine Datastore is pretty close to Read Committed. There are of course meaningful differences, but now that you understand these differences and the reasons behind them you should be in a better position to make intelligent, datastore-related design decisions in your applications.