
Queries and Indexes

Every datastore query uses an index, a table that contains the results for the query in the desired order. An App Engine application defines its indexes in a configuration file named index.yaml. The development web server automatically adds suggestions to this file as it encounters queries that do not yet have indexes configured. You can tune indexes manually by editing the file before uploading the application.

The index-based query mechanism supports most common kinds of queries, but it does not support some queries you may be used to from other database technologies. Restrictions on queries, and their explanations, are described below.

Introducing Queries

A query retrieves entities from the datastore that meet a set of conditions. The query specifies an entity kind, zero or more conditions based on entity property values (sometimes called "filters"), and zero or more sort order descriptions. When the query is executed, it fetches all entities of the given kind that meet all of the given conditions, sorted in the order described.

The datastore API provides two interfaces for preparing and executing queries: the Query interface, which uses methods to prepare the query, and the GqlQuery interface, which uses a SQL-like query language called GQL to prepare the query from a query string. These interfaces are described in more detail in Creating, Getting and Deleting Data: Getting Entities Using a Query and the corresponding reference pages.

from google.appengine.ext import db

class Person(db.Model):
  first_name = db.StringProperty()
  last_name = db.StringProperty()
  city = db.StringProperty()
  birth_year = db.IntegerProperty()
  height = db.IntegerProperty()

# The Query interface prepares a query using instance methods.
q = Person.all()
q.filter("last_name =", "Smith")
q.filter("height <", 72)
q.order("-height")

# The GqlQuery interface prepares a query using a GQL query string.
q = db.GqlQuery("SELECT * FROM Person " + 
                "WHERE last_name = :1 AND height < :2 " +
                "ORDER BY height DESC",
                "Smith", 72)

# The query is not executed until results are accessed.
results = q.fetch(5)
for p in results:
  print "%s %s, %d inches tall" % (p.first_name, p.last_name, p.height)

Introducing Indexes

The App Engine datastore maintains an index for every query an application intends to make. As the application makes changes to datastore entities, the datastore updates the indexes with the correct results. When the application executes a query, the datastore fetches the results directly from the corresponding index.

An application has an index for each combination of kind, filter property and operator, and sort order used in a query. Consider the example query from above:

SELECT * FROM Person WHERE last_name = "Smith"
                       AND height < 72
                     ORDER BY height DESC

The index for this query is a table of keys for entities of the kind Person, with columns for the values of the last_name and height properties. The index is sorted first by last_name, then by height in descending order.

Two queries of the same form but with different filter values use the same index. For example, the following query uses the same index as the query above:

SELECT * FROM Person WHERE last_name = "Jones"
                       AND height < 63
                     ORDER BY height DESC

The datastore executes a query using the following steps:

  1. The datastore identifies the index that corresponds with the query's kind, filter properties, filter operators, and sort orders.
  2. The datastore starts scanning the index at the first entity that meets all of the filter conditions using the query's filter values.
  3. The datastore continues to scan the index, returning each entity, until it finds the next entity that does not meet the filter conditions, or until it reaches the end of the index.
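The steps above can be sketched in plain Python. This is a minimal illustration, not the datastore's actual implementation: the sample data, the in-memory list standing in for the index, and the names index_rows, sort_keys and scan are all assumptions made for the example.

```python
import bisect

# Hypothetical sample data: (key, last_name, height).
people = [
    ("k1", "Smith", 70),
    ("k2", "Smith", 74),
    ("k3", "Jones", 60),
    ("k4", "Smith", 65),
    ("k5", "Jones", 66),
]

# Stand-in for the index on (last_name, height DESC). Storing the
# negated height lets an ordinary ascending sort produce descending
# height order within each last name.
index_rows = sorted((last, -height, key) for key, last, height in people)
sort_keys = [(last, neg_height) for last, neg_height, _ in index_rows]


def scan(last_name, max_height):
    """Return keys for last_name = ... AND height < ..., tallest first."""
    # Step 2: jump to the first row that meets both filter conditions.
    start = bisect.bisect_right(sort_keys, (last_name, -max_height))
    results = []
    # Step 3: read consecutive rows until one no longer matches.
    for last, _, key in index_rows[start:]:
        if last != last_name:
            break
        results.append(key)
    return results


smiths = scan("Smith", 72)
joneses = scan("Jones", 63)
```

Both calls read the same sorted rows and differ only in where the scan starts and stops, which is why queries of the same form can share one index.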

An index table contains columns for every property used in a filter or sort order. The rows are sorted by the following aspects, in order:

  • ancestors
  • property values used in equality or IN filters
  • property values used in inequality filters
  • property values used in sort orders

Note: For the purposes of indexes, IN filters are handled like = filters, and != filters are handled like the other inequality filters.

This puts all results for every possible query that uses this index in consecutive rows in the table.

This mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of queries you may be used to from other database technologies. See Restrictions on Queries, below.

Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:

db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2", "abc", u"abc" + u"\ufffd")

This matches every MyModel entity with a string property prop that begins with the characters abc. The Unicode string u"\ufffd" is the character U+FFFD, which sits near the top of the Basic Multilingual Plane and sorts after practically every character used in ordinary text, so it serves as an upper bound. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
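To see why a range works as a prefix match, here is a small sketch that applies the same two bounds to an ordinary sorted list of strings. It assumes plain Python string comparison as a stand-in for index ordering, uses U+FFFD as the upper-bound character, and the helper name prefix_range is hypothetical.

```python
def prefix_range(prefix):
    """Return (lower, upper) bounds for the two inequality filters."""
    return prefix, prefix + u"\ufffd"


values = [u"ab", u"abc", u"abcd", u"abc zebra", u"abd", u"b"]
low, high = prefix_range(u"abc")

# Equivalent of: WHERE prop >= low AND prop < high
matches = [v for v in sorted(values) if low <= v < high]
```

Every value in matches starts with abc, and values such as ab or abd fall outside the range.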

Entities Without a Filtered Property Are Never Returned by a Query

An index only contains entities that have every property referred to by the index. If an entity does not have a property referred to by an index, the entity will not appear in the index, and will never be a result for a query that uses the index.

Note that the App Engine datastore makes a distinction between an entity that does not possess a property and an entity that possesses the property with a null value (in Python, None). If you want every entity of a kind to be a potential result for a query, you can use a data model that assigns a default value (such as None) to the properties used by filters in the query.
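The difference between a missing property and a property set to None can be illustrated with a small sketch. It models entities as plain dictionaries rather than datastore entities; the sample data and the helpers index_rows and query_equal are hypothetical.

```python
# Hypothetical entities, keyed by entity key.
entities = {
    "k1": {"first_name": "Ada", "city": "London"},
    "k2": {"first_name": "Bob", "city": None},  # property set to None
    "k3": {"first_name": "Cy"},                 # property not set at all
}


def index_rows(entities, prop):
    """Return (key, value) rows for entities that have the property."""
    return [(key, e[prop]) for key, e in sorted(entities.items()) if prop in e]


def query_equal(entities, prop, value):
    """Keys of entities whose indexed value equals the filter value."""
    return [key for key, v in index_rows(entities, prop) if v == value]


indexed_keys = [key for key, _ in index_rows(entities, "city")]
none_matches = query_equal(entities, "city", None)
```

In this sketch k2 appears in the index for city and can be found by filtering on None, while k3 never appears in that index at all.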

Text and Blob Values Are Not Indexed

Properties with values of types Text or Blob (such as those declared with TextProperty or BlobProperty) are not included in indexes, and so are not findable by queries. To filter on short string values, use str or unicode values (declared with StringProperty).

As a consequence of not indexing these property values, a query with a filter or sort order on a property will never match an entity whose value for the property is a Text or Blob. Properties with such values behave as if the property is not set with regard to query filters and sort orders.

Defining Indexes With index.yaml

App Engine builds indexes for several simple queries by default. For other queries, the application must specify the indexes it needs in a configuration file named index.yaml. If the application running under App Engine tries to perform a query for which there is no corresponding index (either provided by default or described in index.yaml), the query will fail.

App Engine provides automatic indexes for the following forms of queries:

  • queries using only equality, IN and ancestor filters
  • queries using only inequality filters (which can only be on a single property)
  • queries with only one sort order, ascending

Other forms of queries require their indexes to be specified in index.yaml, including:

  • queries with a descending sort order
  • queries with multiple sort orders
  • queries with one or more inequality filters on a property and one or more equality or IN filters over other properties
  • queries with inequality filters and ancestor filters

The development web server (dev_appserver.py) makes managing index.yaml easy: Instead of failing to execute a query that does not have index configuration and requires it, the development web server adds an index definition to the file that would allow the query to succeed.

If your local testing of your application calls every possible query the application will make (every combination of kind, ancestor, filter and sort order), the generated entries will represent a complete set of indexes. If your testing might not exercise every possible query form, you can review and adjust the index definitions in this file before uploading the application.

Tip: If dev_appserver.py is started with the --require_indexes option, index.yaml generation is disabled and queries that require index configuration that isn't present will raise an error. Test your application with this option to verify that all the required index configuration is present.

The index.yaml file describes each index table, including the kind, the properties needed for the query filters and sort orders, and whether or not the query uses an ancestor clause (either Query.ancestor() or a GQL ANCESTOR IS clause). The properties are listed in the order they are to be sorted: properties used in equality or IN filters first, followed by the property used in inequality filters, then the properties used in sort orders, each with its direction.

Consider once again the following example query:

SELECT * FROM Person WHERE last_name = "Smith"
                       AND height < 72
                     ORDER BY height DESC

If the application executed only this query (and possibly other queries similar to this one but with different values for "Smith" and 72), the index.yaml file would look like this:

indexes:
- kind: Person
  properties:
  - name: last_name
  - name: height
    direction: desc
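The property ordering used in an index definition can be sketched as a small Python function. This is an illustration only: the function name index_properties and its argument shapes are hypothetical, and it assumes that a property with no explicit sort order is listed as ascending.

```python
def index_properties(equality_props, inequality_prop, sort_orders):
    """Return [(name, direction)] in index order.

    equality_props: property names used in equality or IN filters
    inequality_prop: the property used in inequality filters, or None
    sort_orders: list of (name, direction) pairs, in query order
    """
    directions = dict(sort_orders)
    ordered = list(equality_props)
    if inequality_prop is not None:
        ordered.append(inequality_prop)
    ordered.extend(name for name, _ in sort_orders)

    seen = set()
    result = []
    for name in ordered:
        if name not in seen:  # list each property only once
            seen.add(name)
            # Assumption: no explicit sort order means ascending.
            result.append((name, directions.get(name, "asc")))
    return result


example = index_properties(["last_name"], "height", [("height", "desc")])
```

For the example query, this yields last_name first and then height in descending order, matching the definition shown above.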

When an entity is created or updated, every appropriate index is updated as well. The number of indexes that apply to an entity affects the time it takes to create or update the entity.

For more information on the syntax of index.yaml, see Configuring Indexes.

Restrictions on Queries

The nature of the index query mechanism imposes a few restrictions on what a query can do.

Filtering Or Sorting On a Property Requires That the Property Exists

A query filter condition or sort order for a property also implies a condition that the entity have a value for the property.

A datastore entity is not required to have a value for a property that other entities of the same kind have. A filter on a property can only match an entity with a value for the property. Entities without a value for a property used in a filter or sort order are omitted from the index built for the query.

No Filter That Matches Entities That Do Not Have a Property

It is not possible to perform a query for entities that are missing a given property. One alternative is to create a fixed (modeled) property with a default value of None, then create a filter for entities with None as the property value.

Inequality Filters Are Allowed On One Property Only

A query may only use inequality filters (<, <=, >=, > and !=) on one property across all of its filters.

For example, this GQL query is allowed:

SELECT * FROM Person WHERE birth_year >= :min
                       AND birth_year <= :max

However, this GQL query is not allowed, because it uses inequality filters on two different properties in the same query:

SELECT * FROM Person WHERE birth_year >= :min_year
                       AND height >= :min_height     # ERROR

Filters can combine equal (=) comparisons for different properties in the same query, including queries with one or more inequality conditions on a property. This is allowed:

SELECT * FROM Person WHERE last_name = :last_name
                       AND city = :city
                       AND birth_year >= :min_year

The query mechanism relies on all results for a query to be adjacent to one another in the index table, to avoid having to scan the entire table for results. A single index table cannot represent multiple inequality filters on multiple properties while maintaining that all results are consecutive in the table.
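As a rough illustration of this restriction, the following sketch checks a list of filters before a query is built. It is not part of the datastore API; the function name inequality_property and the (property, operator) pair format are assumptions made for the example.

```python
INEQUALITY_OPERATORS = ("<", "<=", ">", ">=", "!=")


def inequality_property(filters):
    """Return the single property used in inequality filters, or None.

    filters is a list of (property_name, operator) pairs. Raises
    ValueError if inequality filters touch more than one property.
    """
    props = sorted(set(p for p, op in filters if op in INEQUALITY_OPERATORS))
    if len(props) > 1:
        raise ValueError("inequality filters on multiple properties: %s"
                         % ", ".join(props))
    return props[0] if props else None


# Two inequality filters on the same property: allowed.
allowed = inequality_property([("birth_year", ">="), ("birth_year", "<=")])

# Equality filters on other properties do not count against the limit.
mixed = inequality_property(
    [("last_name", "="), ("city", "="), ("birth_year", ">=")])

# Inequality filters on two different properties: rejected.
try:
    inequality_property([("birth_year", ">="), ("height", ">=")])
    rejected = False
except ValueError:
    rejected = True
```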

Properties In Inequality Filters Must Be Sorted Before Other Sort Orders

If a query has both a filter with an inequality comparison and one or more sort orders, the query must include a sort order for the property used in the inequality, and the sort order must appear before sort orders on other properties.

This GQL query is not valid, because it uses an inequality filter and does not order by the filtered property:

SELECT * FROM Person WHERE birth_year >= :min_year
                     ORDER BY last_name              # ERROR

Similarly, this GQL query is not valid because it does not order by the filtered property before ordering by other properties:

SELECT * FROM Person WHERE birth_year >= :min_year
                     ORDER BY last_name, birth_year  # ERROR

This GQL query is valid:

SELECT * FROM Person WHERE birth_year >= :min_year
                     ORDER BY birth_year, last_name

To get all results that match an inequality filter, a query scans the index table for the first matching row, then returns all consecutive results until it finds a row that doesn't match. For the consecutive rows to represent the complete results set, the rows must be ordered by the inequality filter before other sort orders.
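The ordering rule can be expressed as a short check. This is a sketch under stated assumptions, not datastore code: sort_orders_valid is a hypothetical helper that takes the property used in inequality filters (or None) and the sort-order properties in the order given.

```python
def sort_orders_valid(inequality_prop, sort_props):
    """Return True if the sort orders satisfy the inequality rule."""
    if inequality_prop is None or not sort_props:
        return True  # the rule only applies when both are present
    # The inequality property must be the first sort order.
    return sort_props[0] == inequality_prop


missing = sort_orders_valid("birth_year", ["last_name"])
wrong_order = sort_orders_valid("birth_year", ["last_name", "birth_year"])
valid = sort_orders_valid("birth_year", ["birth_year", "last_name"])
```

The three calls correspond to the three GQL examples above: the first two are rejected and the third is accepted.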

Sort Orders and List Properties

Due to the way list properties are indexed, the sort order for list values is unusual:

  • If the entities are sorted by a list property in ascending order, the value used for ordering is the smallest element in the list.
  • If the entities are sorted by a list property in descending order, the value used for ordering is the greatest element in the list.
  • Other elements in the list do not affect the sort order, nor does the list's length.
  • In the case of a tie, the key of the entity is used as the tie-breaker.

This sort order has the unusual consequence that [1, 9] comes before [4, 5, 6, 7] in both ascending and descending order.
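A short sketch makes this behavior concrete. It models entities as a dictionary from entity key to list value; the helper names are hypothetical, and leaving ties in ascending key order for both directions is an assumption of the sketch.

```python
def list_sort_value(values, descending=False):
    """Value that represents a list property in the sort order."""
    return max(values) if descending else min(values)


def sort_by_list_property(entities, descending=False):
    """entities maps entity key -> list value; returns keys in order."""
    # Sort by key first so the stable sort below leaves ties in key order.
    # (How ties are ordered in a descending query is an assumption here.)
    by_key = sorted(entities.items())
    ordered = sorted(
        by_key,
        key=lambda item: list_sort_value(item[1], descending),
        reverse=descending,
    )
    return [key for key, _ in ordered]


entities = {"a": [1, 9], "b": [4, 5, 6, 7]}
ascending = sort_by_list_property(entities)
descending = sort_by_list_property(entities, descending=True)
```

Entity a, with the value [1, 9], comes first in both results: 1 is the smallest element when ascending, and 9 is the greatest when descending.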

One important caveat is queries with both an equality filter and a sort order on a list property. In those queries, the sort order is disregarded. For non-list properties, this is a simple optimization. Every result would have the same value for the property, so the results do not need to be sorted further.

However, list properties may have additional values. Since the sort order is disregarded, the query results may be returned in a different order than if the sort order were applied. (Restoring the dropped sort order would be expensive and require extra indexes, and this use case is rare, so the query planner leaves it off.)

Big Entities and Exploding Indexes

As described above, every property (that doesn't have a Text or Blob value) of every entity is added to at least one index table, including a simple index provided by default, and any indexes described in the application's index.yaml file that refer to the property. For an entity that has one value for each property, App Engine stores a property value once in its simple index, and once for each time the property is referred to in a custom index. Each of these index entries must be updated every time the value of the property changes, so the more indexes that refer to the property, the more time it will take for a put() that updates the property to succeed.

To prevent the update of an entity from taking too long, the datastore limits the number of index entries that a single entity can have. The limit is large, and most applications will not notice. However, there are some circumstances where you might encounter the limit. For example, an entity with very many single-value properties can exceed the index entry limit.

Properties with multiple values, such as a list value or a property declared with ListProperty, store each value as a separate entry in an index. An entity with a single property with very many values (a very long list) can exceed the index entry limit.

Custom indexes that refer to multiple properties with multiple values can get very large with only a few values. To completely record such properties, the index table must include a row for every combination of values across the properties in the index. For example, the following index (described in index.yaml syntax) includes the x and y properties for entities of the kind MyModel:

indexes:
- kind: MyModel
  properties:
  - name: x
  - name: y

The following code creates an entity with 2 values for the property x and 2 values for the property y:

class MyModel(db.Expando):
  pass

e2 = MyModel()
e2.x = ['red', 'blue']
e2.y = [1, 2]
e2.put()

To accurately represent these values, the index must store 8 property values: two each for the built-in indexes on x and y, and one for each of the four combinations of an x value with a y value in the custom index. With many list values, this can mean an index must store very many index entries for a single entity. You could call an index that refers to multiple properties with multiple values an "exploding index," because it can get very large with just a few values.
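The arithmetic can be checked with a small sketch. It follows the counting used in the text (one built-in entry per value of each property, plus one custom index row per combination of values) and models the entity as a plain dictionary; the helper names are hypothetical.

```python
import itertools


def as_list(value):
    """Treat a single value as a one-element list."""
    return value if isinstance(value, list) else [value]


def custom_index_rows(entity, props):
    """One row per combination of values across the indexed properties."""
    return list(itertools.product(*(as_list(entity[p]) for p in props)))


def count_index_entries(entity, custom_indexes):
    """Built-in entries (one per value) plus rows in each custom index."""
    builtin = sum(len(as_list(v)) for v in entity.values())
    custom = sum(len(custom_index_rows(entity, p)) for p in custom_indexes)
    return builtin + custom


e2 = {"x": ["red", "blue"], "y": [1, 2]}
total = count_index_entries(e2, [["x", "y"]])

# Twenty values in each list: the custom index alone needs 400 rows.
big = {"x": list(range(20)), "y": list(range(20))}
big_total = count_index_entries(big, [["x", "y"]])
```

The custom index grows with the product of the list lengths, while the built-in entries grow only with their sum.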

If a put() would result in a number of index entries that exceeds the limit, the call will fail with a BadRequestError exception. If you create a new index that would contain a number of index entries that exceeds the limit for any entity when built, queries against the index will fail, and the index will appear in the "Error" state in the Admin Console.

To handle "Error" indexes, first remove them from your index.yaml file and run appcfg.py vacuum_indexes. Then, either reformulate the index definition and corresponding queries or remove the entities that are causing the index to "explode." Finally, add the index back to index.yaml and run appcfg.py update_indexes.

You can avoid exploding indexes by avoiding queries that would require a custom index using a list property. As described above, this includes queries with descending sort orders, multiple sort orders, a mix of equality and inequality filters, and ancestor filters.