Every datastore query uses an index, a table that contains the results for the query in the desired order. An App Engine application defines its indexes in a configuration file named index.yaml. The development web server automatically adds suggestions to this file as it encounters queries that do not yet have indexes configured. You can tune indexes manually by editing the file before uploading the application.
The index-based query mechanism supports most common kinds of queries, but it does not support some queries you may be used to from other database technologies. Restrictions on queries, and their explanations, are described below.
A query retrieves entities from the datastore that meet a set of conditions. The query specifies an entity kind, zero or more conditions based on entity property values (sometimes called "filters"), and zero or more sort order descriptions. When the query is executed, it fetches all entities of the given kind that meet all of the given conditions, sorted in the order described.
The datastore API provides two interfaces for preparing and executing queries: the Query interface, which uses methods to prepare the query, and the GqlQuery interface, which uses a SQL-like query language called GQL to prepare the query from a query string. These interfaces are described in more detail in Creating, Getting and Deleting Data: Getting Entities Using a Query and the corresponding reference pages.
    from google.appengine.ext import db

    class Person(db.Model):
        first_name = db.StringProperty()
        last_name = db.StringProperty()
        city = db.StringProperty()
        birth_year = db.IntegerProperty()
        height = db.IntegerProperty()

    # The Query interface prepares a query using instance methods.
    q = Person.all()
    q.filter("last_name =", "Smith")
    q.filter("height <", 72)
    q.order("-height")

    # The GqlQuery interface prepares a query using a GQL query string.
    q = db.GqlQuery("SELECT * FROM Person " +
                    "WHERE last_name = :1 AND height < :2 " +
                    "ORDER BY height DESC",
                    "Smith", 72)

    # The query is not executed until results are accessed.
    results = q.fetch(5)
    for p in results:
        print "%s %s, %d inches tall" % (p.first_name, p.last_name, p.height)
The App Engine datastore maintains an index for every query an application intends to make. As the application makes changes to datastore entities, the datastore updates the indexes with the correct results. When the application executes a query, the datastore fetches the results directly from the corresponding index.
An application has an index for each combination of kind, filter property and operator, and sort order used in a query. Consider the example query from above:
SELECT * FROM Person WHERE last_name = "Smith" AND height < 72 ORDER BY height DESC
The index for this query is a table of keys for entities of the kind Person, with columns for the values of the height and last_name properties. The index is sorted by height in descending order.
Two queries of the same form but with different filter values use the same index. For example, the following query uses the same index as the query above:
SELECT * FROM Person WHERE last_name = "Jones" AND height < 63 ORDER BY height DESC
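The datastore executes a query using the following steps:
1. The datastore identifies the index that corresponds with the query's kind, filter properties, filter operators, and sort orders.
2. The datastore starts scanning the index at the first entity that meets all of the filter conditions using the query's filter values.
3. The datastore continues to scan the index, returning each entity, until it finds the next entity that does not meet the filter conditions, or until it reaches the end of the index.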
An index table contains columns for every property used in a filter or sort order. The rows are sorted by the following aspects, in order:
- ancestors
- property values used in equality or IN filters
- property values used in inequality filters
- property values used in sort orders

Note: For the purposes of indexes, IN filters are handled like = filters, and != filters are handled like the other inequality filters.
This puts all results for every possible query that uses this index in consecutive rows in the table.
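The scan described here can be imitated with a sorted list. The following is a minimal sketch, not the datastore's actual implementation: the in-memory `index_rows` list, the sample `people` data, and the `scan` helper are all hypothetical names introduced for illustration, and the bound calculation assumes integer heights.

```python
import bisect

# Hypothetical sample data: (key, last_name, height). Not real datastore entities.
people = [
    ("key1", "Smith", 70),
    ("key2", "Smith", 74),
    ("key3", "Jones", 60),
    ("key4", "Smith", 65),
    ("key5", "Jones", 68),
]

# Stand-in for an index table: rows sorted by last_name (the equality filter),
# then by height descending. Negating height lets an ascending tuple sort
# produce descending heights.
index_rows = sorted((last_name, -height, key) for key, last_name, height in people)


def scan(last_name, max_height):
    """Keys for: last_name = X AND height < max_height ORDER BY height DESC."""
    # height < max_height is the same as -height >= -(max_height - 1)
    # when heights are integers (an assumption of this sketch).
    start = bisect.bisect_left(index_rows, (last_name, -max_height + 1))
    results = []
    for row_last_name, _neg_height, key in index_rows[start:]:
        if row_last_name != last_name:
            break  # the first non-matching row ends the scan
        results.append(key)
    return results


smiths = scan("Smith", 72)
joneses = scan("Jones", 63)
```

Both calls read one contiguous run of rows from the same sorted list, which is why two queries of the same form with different filter values can share an index.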
This mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of queries you may be used to from other database technologies. See Restrictions on Queries, below.
Tip: Query filters do not have an explicit way to match just part of a string value, but you can fake a prefix match using inequality filters:
    db.GqlQuery("SELECT * FROM MyModel WHERE prop >= :1 AND prop < :2", u"abc", u"abc" + u"\ufffd")

This matches every MyModel entity with a string property prop that begins with the characters abc. The unicode string u"\ufffd" (encoded in UTF-8 as the bytes "\xEF\xBF\xBD") stands in for the largest possible Unicode character. When the property values are sorted in an index, the values that fall in this range are all of the values that begin with the given prefix.
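The range trick can be checked outside the datastore with ordinary string comparison. This is a rough sketch only: `prefix_bounds` and `prefix_match` are hypothetical helpers, the upper bound uses U+FFFD (the character whose UTF-8 encoding is the byte string mentioned above), and plain Python comparison is assumed to be close enough to the index's string ordering for these sample values.

```python
def prefix_bounds(prefix):
    """Return (low, high) so that low <= value < high selects values with the prefix."""
    # U+FFFD is used as a stand-in for "a character that sorts after any
    # character likely to follow the prefix".
    return prefix, prefix + u"\ufffd"


def prefix_match(values, prefix):
    """Filter values the way the two inequality filters would."""
    low, high = prefix_bounds(prefix)
    return [v for v in sorted(values) if low <= v < high]


values = [u"ab", u"abc", u"abcd", u"abcz", u"abd", u"b"]
matches = prefix_match(values, u"abc")
```

Because both filters apply to the same property, this form of query stays within the one-inequality-property restriction.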
An index only contains entities that have every property referred to by the index. If an entity does not have a property referred to by an index, the entity will not appear in the index, and will never be the result for the query that uses the index.
Note that the App Engine datastore makes a distinction between an entity that does not possess a property and an entity that possesses the property with a null value (in Python, None). If you want every entity of a kind to be a potential result for a query, you can use a data model that assigns a default value (such as None) to the properties used by filters in the query.
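The difference between a missing property and a None value can be illustrated with plain dictionaries. This is a simplified model, not datastore code: entities are represented as dicts, and `indexed_keys` and `query_equal` are hypothetical helpers.

```python
# Hypothetical entities modeled as dicts. Entity "c" has no "city" property
# at all, while entity "b" has the property set to None.
entities = [
    {"key": "a", "city": "Paris"},
    {"key": "b", "city": None},
    {"key": "c"},
]


def indexed_keys(entities, prop):
    """Keys of entities that would appear in an index on prop."""
    return [e["key"] for e in entities if prop in e]


def query_equal(entities, prop, value):
    """Keys of entities matching 'prop = value', considering only indexed entities."""
    return [e["key"] for e in entities if prop in e and e[prop] == value]


in_index = indexed_keys(entities, "city")
none_city = query_equal(entities, "city", None)
```

Entity "c" never appears in the index, so no filter on city can return it, while entity "b" can be found by filtering for None.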
Properties with values of types Text or Blob (such as with the TextProperty or BlobProperty models) are not included in indexes, and so are not findable by queries. To filter on short string values, use str or unicode values (the StringProperty model).
As a consequence of not indexing these property values, a query with a filter or sort order on a property will never match an entity whose value for the property is a Text or Blob. Properties with such values behave as if the property is not set with regard to query filters and sort orders.
App Engine builds indexes for several simple queries by default. For other queries, the application must specify the indexes it needs in a configuration file named index.yaml. If the application running under App Engine tries to perform a query for which there is no corresponding index (either provided by default or described in index.yaml), the query will fail.
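App Engine provides automatic indexes for the following forms of queries:
- queries using only equality, IN and ancestor filters
- queries using only inequality filters (which can only be of a single property)
- queries with only one sort order on a property, either ascending or descending

Other forms of queries require their indexes to be specified in index.yaml, including:
- queries with multiple sort orders
- queries with a sort order on keys in descending order
- queries with one or more inequality filters on a property and one or more equality or IN filters over other properties
- queries with inequality filters and ancestor filters

The development web server (dev_appserver.py) makes managing index.yaml easy: instead of failing to execute a query that requires index configuration and does not have it, the development web server adds an index definition to the file that would allow the query to succeed.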
If your local testing of your application calls every possible query the application will make (every combination of kind, ancestor, filter and sort order), the generated entries will represent a complete set of indexes. If your testing might not exercise every possible query form, you can review and adjust the index definitions in this file before uploading the application.
Tip: If dev_appserver.py is started with the --require_indexes option, index.yaml generation is disabled and queries that require index configuration that isn't present will raise an error. Test your application with this option to verify that all the required index configuration is present.
index.yaml describes each index table, including the kind, the properties needed for the query filters and sort orders, and whether or not the query uses an ancestor clause (either Query.ancestor() or a GQL ANCESTOR IS clause). The properties are listed in the order they are to be sorted: properties used in equality or IN filters first, followed by the property used in inequality filters, then the query results sort orders and their directions.
Consider once again the following example query:
SELECT * FROM Person WHERE last_name = "Smith" AND height < 72 ORDER BY height DESC
If the application executed only this query (and possibly other queries similar to this one but with different values for "Smith" and 72), the index.yaml file would look like this:
    indexes:
    - kind: Person
      properties:
      - name: last_name
      - name: height
        direction: desc
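The property ordering rule (equality or IN filter properties first, then the inequality property, then the remaining sort orders) can be expressed as a small function. This is an illustrative sketch, not how the development server generates index.yaml: `index_properties` and `render_index` are hypothetical helpers, and ancestor handling is left out.

```python
def index_properties(equality_props, inequality_prop=None, sort_orders=()):
    """Return [(name, direction)] in the order an index definition lists them.

    sort_orders is a sequence of (name, "asc" or "desc") pairs.
    """
    directions = dict(sort_orders)
    props = [(name, "asc") for name in equality_props]
    listed = set(equality_props)
    if inequality_prop is not None and inequality_prop not in listed:
        # The inequality property takes its direction from a matching sort order.
        props.append((inequality_prop, directions.get(inequality_prop, "asc")))
        listed.add(inequality_prop)
    for name, direction in sort_orders:
        if name not in listed:
            props.append((name, direction))
            listed.add(name)
    return props


def render_index(kind, props):
    """Render one index entry in index.yaml style, omitting the default direction."""
    lines = ["- kind: %s" % kind, "  properties:"]
    for name, direction in props:
        lines.append("  - name: %s" % name)
        if direction == "desc":
            lines.append("    direction: desc")
    return "\n".join(lines)


person_props = index_properties(["last_name"], "height", [("height", "desc")])
person_yaml = render_index("Person", person_props)
```

For the example query, this yields last_name followed by height in descending order, matching the definition shown above.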
When an entity is created or updated, every appropriate index is updated as well. The number of indexes that apply to an entity affects the time it takes to create or update the entity.
For more information on the syntax of index.yaml, see Configuring Indexes.
The nature of the index query mechanism imposes a few restrictions on what a query can do.
A query filter condition or sort order for a property also implies a condition that the entity have a value for the property.
A datastore entity is not required to have a value for a property that other entities of the same kind have. A filter on a property can only match an entity with a value for the property. Entities without a value for a property used in a filter or sort order are omitted from the index built for the query.
It is not possible to perform a query for entities that are missing a given property. One alternative is to create a fixed (modeled) property with a default value of None, then create a filter for entities with None as the property value.
A query may only use inequality filters (<, <=, >=, > and !=) on one property across all of its filters.
For example, this GQL query is allowed:
SELECT * FROM Person WHERE birth_year >= :min AND birth_year <= :max
However, this GQL query is not allowed, because it uses inequality filters on two different properties in the same query:
SELECT * FROM Person WHERE birth_year >= :min_year AND height >= :min_height # ERROR
Filters can combine equal (=) comparisons for different properties in the same query, including queries with one or more inequality conditions on a property. This is allowed:
SELECT * FROM Person WHERE last_name = :last_name AND city = :city AND birth_year >= :min_year
The query mechanism relies on all results for a query being adjacent to one another in the index table, which avoids having to scan the entire table for results. A single index table cannot represent inequality filters on multiple properties while keeping all results consecutive in the table.
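An application can check this restriction before issuing a query. The sketch below is a hypothetical pre-check, not part of the datastore API: filters are modeled as (property, operator) pairs, and `inequality_properties` and `uses_single_inequality_property` are names invented for this example.

```python
INEQUALITY_OPERATORS = frozenset(["<", "<=", ">", ">=", "!="])


def inequality_properties(filters):
    """Sorted names of properties that appear in an inequality filter."""
    return sorted(set(prop for prop, op in filters if op in INEQUALITY_OPERATORS))


def uses_single_inequality_property(filters):
    """True if inequality filters touch at most one property."""
    return len(inequality_properties(filters)) <= 1


# The three GQL examples above, reduced to (property, operator) pairs.
range_on_one_property = [("birth_year", ">="), ("birth_year", "<=")]
two_inequality_properties = [("birth_year", ">="), ("height", ">=")]
equality_plus_inequality = [("last_name", "="), ("city", "="), ("birth_year", ">=")]
```

Only the second filter list fails the check, since it places inequality filters on both birth_year and height.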
If a query has both a filter with an inequality comparison and one or more sort orders, the query must include a sort order for the property used in the inequality, and the sort order must appear before sort orders on other properties.
This GQL query is not valid, because it uses an inequality filter and does not order by the filtered property:
SELECT * FROM Person WHERE birth_year >= :min_year ORDER BY last_name # ERROR
Similarly, this GQL query is not valid because it does not order by the filtered property before ordering by other properties:
SELECT * FROM Person WHERE birth_year >= :min_year ORDER BY last_name, birth_year # ERROR
This GQL query is valid:
SELECT * FROM Person WHERE birth_year >= :min_year ORDER BY birth_year, last_name
To get all results that match an inequality filter, a query scans the index table for the first matching row, then returns all consecutive results until it finds a row that doesn't match. For the consecutive rows to represent the complete result set, the rows must be ordered by the property used in the inequality filter before any other sort orders.
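The sort order rule can also be checked ahead of time. This is a hypothetical helper written for illustration, not a datastore API: sort orders are given as property names in the style of Query.order(), where a leading "-" marks descending order.

```python
def sort_orders_valid(inequality_property, sort_orders):
    """True if the sort orders are compatible with the inequality filter.

    inequality_property is the property used in inequality filters, or None.
    sort_orders is a list of property names, optionally prefixed with "-".
    """
    if inequality_property is None or not sort_orders:
        return True  # the rule only applies when both are present
    first = sort_orders[0].lstrip("-")
    return first == inequality_property


# The three GQL examples above, all filtering on birth_year >= :min_year.
missing_order = sort_orders_valid("birth_year", ["last_name"])
wrong_position = sort_orders_valid("birth_year", ["last_name", "birth_year"])
valid_order = sort_orders_valid("birth_year", ["birth_year", "last_name"])
```

A query with an inequality filter and no sort orders passes the check, because the rule applies only when at least one sort order is present.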
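Due to the way list properties are indexed, the sort order for list values is unusual:
- If the entities are sorted by a list property in ascending order, the value used for ordering is the smallest value in the list.
- If the entities are sorted by a list property in descending order, the value used for ordering is the greatest value in the list.
- Other values in the list do not affect the sort order, nor does the number of values.

This sort order has the unusual consequence that [1, 9] comes before [4, 5, 6, 7] in both ascending and descending order.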
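The [1, 9] versus [4, 5, 6, 7] result can be reproduced with a small model. The sketch assumes that an ascending sort orders each entity by the smallest value in its list and a descending sort by the greatest value; `order_by_list_property` and the sample data are hypothetical, and tie-breaking is ignored.

```python
# Hypothetical entities: key -> values of a list property.
list_values = {
    "a": [1, 9],
    "b": [4, 5, 6, 7],
}


def order_by_list_property(list_values, descending=False):
    """Return keys in the order a sort on the list property would produce."""
    if descending:
        # Descending: each entity is positioned by its greatest value.
        return sorted(list_values, key=lambda k: max(list_values[k]), reverse=True)
    # Ascending: each entity is positioned by its smallest value.
    return sorted(list_values, key=lambda k: min(list_values[k]))


ascending = order_by_list_property(list_values)
descending = order_by_list_property(list_values, descending=True)
```

Entity "a" comes first both times: 1 is smaller than 4 in the ascending case, and 9 is greater than 7 in the descending case.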
One important caveat is queries with both an equality filter and a sort order on a list property. In those queries, the sort order is disregarded. For non-list properties, this is a simple optimization. Every result would have the same value for the property, so the results do not need to be sorted further.
However, list properties may have additional values. Since the sort order is disregarded, the query results may be returned in a different order than if the sort order were applied. (Restoring the dropped sort order would be expensive and require extra indexes, and this use case is rare, so the query planner leaves it off.)
As described above, every property (that doesn't have a Text or Blob value) of every entity is added to at least one index table, including a simple index provided by default, and any indexes described in the application's index.yaml file that refer to the property. For an entity that has one value for each property, App Engine stores a property value once in its simple index, and once for each time the property is referred to in a custom index. Each of these index entries must be updated every time the value of the property changes, so the more indexes that refer to the property, the more time it will take for a put() that updates the property to succeed.
To prevent the update of an entity from taking too long, the datastore limits the number of index entries that a single entity can have. The limit is large, and most applications will not notice. However, there are some circumstances where you might encounter the limit. For example, an entity with very many single-value properties can exceed the index entry limit.
Properties with multiple values, such as using a list value or a ListProperty model, store each value as a separate entry in an index. An entity with a single property with very many values (a very long list) can exceed the index entry limit.
Custom indexes that refer to multiple properties with multiple values can get very large with only a few values. To completely record such properties, the index table must include a row for every permutation of the values of every property for the index. For example, the following index (described in index.yaml syntax) includes the x and y properties for entities of the kind MyModel:
    indexes:
    - kind: MyModel
      properties:
      - name: x
      - name: y
The following code creates an entity with 2 values for the property x and 2 values for the property y:
    from google.appengine.ext import db

    class MyModel(db.Expando):
        pass

    e2 = MyModel()
    e2.x = ['red', 'blue']
    e2.y = [1, 2]
    e2.put()
To accurately represent these values, the index must store 8 property values: two each for the built-in indexes on x and y, and one for each permutation of x and y in the custom index. With many list values, this can mean an index must store very many index entries for a single entity. You could call an index that refers to multiple properties with multiple values an "exploding index," because it can get very large with just a few values.
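The growth can be estimated with simple arithmetic. The sketch below follows the accounting used in this section (one built-in entry per property value, plus one custom index row per combination of values); `custom_index_entries` and `total_index_entries` are hypothetical helpers, and the real datastore may count entries differently.

```python
def custom_index_entries(entity, index_props):
    """Rows one custom index needs: the product of the value counts."""
    count = 1
    for prop in index_props:
        count *= len(entity[prop])
    return count


def total_index_entries(entity, custom_indexes):
    """Built-in entries (one per value) plus rows for each custom index."""
    builtin = sum(len(values) for values in entity.values())
    custom = sum(custom_index_entries(entity, props) for props in custom_indexes)
    return builtin + custom


# The entity from the example above: 2 values for x and 2 values for y.
small = {"x": ["red", "blue"], "y": [1, 2]}
small_total = total_index_entries(small, [["x", "y"]])

# Ten values in each list: the custom index alone needs 10 * 10 rows.
large = {"x": list(range(10)), "y": list(range(10))}
large_total = total_index_entries(large, [["x", "y"]])
```

The built-in entries grow with the sum of the list lengths, while the custom index grows with their product, which is what makes the index "explode."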
If a put() would result in a number of index entries that exceeds the limit, the call will fail with a BadRequestError exception. If you create a new index that would contain a number of index entries that exceeds the limit for any entity when built, queries against the index will fail, and the index will appear in the "Error" state in the Admin Console.
To handle "Error" indexes, first remove them from your index.yaml file and run appcfg.py vacuum_indexes. Then, either reformulate the index definition and corresponding queries or remove the entities that are causing the index to "explode." Finally, add the index back to index.yaml and run appcfg.py update_indexes.
You can avoid exploding indexes by avoiding queries that would require a custom index using a list property. As described above, this includes queries with descending sort orders, multiple sort orders, a mix of equality and inequality filters, and ancestor filters.