Why Prepared Statements are important and how to use them "properly"
Databases have a tough job. They accept SQL queries from many clients
concurrently and execute the queries as efficiently as possible against
the data. Processing statements can be an expensive operation but
databases are now written in such a way so that this overhead is
minimized. However, these optimizations need assistance from the
application developers if we are to capitalize on them. This article
shows you how the correct use of PreparedStatements can significantly
help a database perform these optimizations.
How does a database execute a statement?
Obviously, don't expect alot of detail here; we'll only examine the
aspects important to this article. When a database receives a
statement, the database engine first parses the statement and looks
for syntax errors. Once the statement is parsed, the database needs
to figure out the most efficient way to execute the statement. This
can be computationally quite expensive. The database checks what
indexes, if any, can help, or whether it should do a full read of
all rows in a table. Databases use statistics on the data to figure
out what is the best way. Once the query plan is created then it
can be executed by the database engine.
It takes CPU power to do the access plan generation. Ideally, if we
send the same statement to the database twice, then we'd like the
database to reuse the access plan for the first statement. This uses
less CPU than if it regenerated the plan a second time.
Statement Caches
Databases are tuned to do statement caches. They usually include
some kind of statement cache. This cache uses the statement itself
as a key and the access plan is stored in the cache with the
corresponding statement. This allows the database engine to
reuse the plans for statements that have been executed previously.
For example, if we sent the database a statement such as "select a,b
from t where c = 2", then the computed access plan is cached. If we
send the same statement later, the database can reuse the previous
access plan, thus saving us CPU power.
Note however, that the entire statement is the key. For example, if
we later sent the statement "select a,b from t where c = 3", it would
not find an access plan. This is because the "c=3" is different from
the cached plan "c=2". So, for example:
- For(int I = 0; I < 1000; ++I)
{
PreparedStatement ps = conn.prepareStatement("select a,b from t where c = " + I);
ResultSet rs = Ps.executeQuery();
Rs.close();
Ps.close();
}
Here the cache won't be used. Each iteration of the loop sends a
different SQL statement to the database. A new access plan is computed
for each iteration and we're basically throwing CPU cycles away using
this approach. However, look at the next snippet:
-
- PreparedStatement ps = conn.prepareStatement("select a,b from t where c = ?");
For(int I = 0; I < 1000; ++I)
{
ps.setInt(1, I);
ResultSet rs = ps.executeQuery();
Rs.close();
}
ps.close();
Here it will be much more efficient. The statement sent to the database
is parameterized using the '?' marker in the sql. This means every
iteration is sending the same statement to the database with different
parameters for the "c=?" part. This allows the database to reuse the
access plans for the statement and makes the program execute more
efficiently inside the database. This basically let's your application
run faster or makes more CPU available to users of the database.
PreparedStatements and J2EE servers
Things can get more complicated when we use a J2EE server. Normally,
a prepared statement is associated with a single database connection.
When the connection is closed, the preparedstatement is discarded.
Normally, a fat client application would get a database connection
and then hold it for its lifetime. It would also create all prepared
statements eagerly or lazily. Eagerly means that they are all
created at once when the application starts. Lazily means that they
are created as they are used. An eager approach gives a delay when
the application starts but once it starts then it performs optimally.
A lazy approach gives a fast start but as the application runs, the
prepared statements are created when they are first used by the
application. This gives an uneven performance until all statements
are prepared but the application eventually settles and runs as fast
as the eager application. Which is best depends on whether you need
a fast start or even performance.
The problem with a J2EE application is that it can't work like this. It
only keeps a connection for the duration of the request. This means
that it must create the prepared statements every time the request is
executed. This is not as efficient as the fat client approach where the
prepared statements are created once, rather than on every request.
J2EE vendors have noticed this and designed connection pooling to avoid
this performance disadvantage.
When the J2EE server gives your application a connection, it isn't
giving you the actual connection; you're getting a wrapper. You can
verify this by looking at the name of the class for the connection
you are given. It won't be a database JDBC connection, it'll be a
class created by your application server. Normally, if you called
close on a connection then the jdbc driver closes the connection.
We want the connection to be returned to the pool when close is
called by a J2EE application. We do this by making a proxy jdbc
connection class that looks like a real connection. It has a
reference to the actual connection. When we invoke any method on
the connection then the proxy forwards the call to the real connection.
But, when we call methods such as close instead of calling close on
the real connection, it simply returns the connection to the
connection pool and then marks the proxy connection as invalid
so that if it is used again by the application we'll get an exception.
Wrapping is very useful as it also helps J2EE application server
implementers to add support for prepared statements in a sensible way.
When an application calls Connection.prepareStatement, it is returned a
PreparedStatement object by the driver. The application then keeps the
handle while it has the connection and closes it before it closes the
connection when the request finishes. However, after the connection
is returned to the pool and later reused by the same, or another
application, , then ideally, we want the same PreparedStatement to
be returned to the application.
J2EE PreparedStatement Cache
J2EE PreparedStatement Cache is implemented using a cache inside the
J2EE server connection pool manager. The J2EE server keeps a list of
prepared statements for each database connection in the pool. When an
application calls prepareStatement on a connection, the application
server checks if that statement was previously prepared. If it was,
the PreparedStatement object will be in the cache and this will be
returned to the application. If not, the call is passed to the jdbc
driver and the query/preparedstatement object is added in that
connections cache.
We need a cache per connection because that's the way jdbc drivers
work. Any preparedstatements returned are specific to that connection.
If we want to take advantage of this cache, the same rules apply as
before. We need to use parameterized queries so that they will match
ones already prepared in the cache. Most application servers will allow
you to tune the size of this prepared statement cache.
Summary
In conclusion, we should use parameterized queries with prepared
statements. This reduces the load on the database by allowing it
to reuse access plans that were already prepared. This cache is
database-wide so if you can arrange for all your applications to
use similar parameterized SQL, you will improve the efficiency of
this caching scheme as an application can take advantage of prepared
statements used by another application. This is an advantage of an
application server because logic that accesses the database should
be centralized in a data access layer (either an OR-mapper, entity
beans or straight JDBC).
Finally, the correct use of prepared statements also lets you
take advantage of the prepared statement cache in the application
server. This improves the performance of your application as the
application can reduce the number of calls to the JDBC driver
by reusing a previous prepared statement call. This makes it
competitive with fat clients efficiency-wise and removes the
disadvantage of not being able to keep a dedicated connection.
If you use parameterized prepared statements, you improve the
efficiency of the database and your application server hosted code.
Both of these improvements will allow your application to improve its
performance.
PRINTER FRIENDLY VERSION
|