Enterprise Java Community: Cayenne: Being Productive with Object Relational Mapping

Introduction

This article presents Cayenne - a fast, scalable and easy-to-learn open source Object Relational Mapping (ORM) framework. Cayenne is a rare kind of Java open source project: it is not just an "edit your own XML" runtime library, but an integrated suite that includes modeling and deployment tools. This article shows how to quickly create an ORM application with Cayenne and discusses its core features and design principles. For detailed information on Cayenne, visit http://objectstyle.org/cayenne/.

What Is Cayenne

Cayenne was started in 2001 as an open source community-driven project. It is a powerful production-quality ORM. A partial list of Cayenne features includes:

"Lazy" relationships and incremental fetching of data.
Object queries, including in-memory sorting and filtering of objects with Cayenne expression language.
Isolation of object graph changes between user sessions.
Committing all created, modified or deleted objects with a single method call.
Distributed cache.
Automatic ordering of DML operations to satisfy database integrity constraints.
Combining multiple physical data sources into a single virtual data source.
Database independence with adapters for all major databases.
Multiple strategies for automated primary key generation.
... and many more

The Cayenne "trademark" is its ease of use, quick learning, and developer productivity. The idea that very much defined Cayenne as a product was that no user would want to memorize yet another XML format, let alone manually edit the mapping files. While one can work with Cayenne mapping files in vi or a favorite XML editor, there is rarely a need to do so. Cayenne comes with CayenneModeler - a Swing GUI tool that supports all mapping and deployment operations. CayenneModeler makes the learning process very intuitive for newcomers and increases the productivity of advanced users. It helps to visualize the structure of the DB schema and object layer, and it also supports reusable parameterized queries and fast prototyping.

Committing to building a full-scale GUI for Cayenne wasn't an easy decision, as it placed a serious burden on the development team. While Java Swing is an undeniably powerful technology, it also happens to be very developer-unfriendly. However, the time invested in the tools paid off extremely well in the increased usability of Cayenne. Cayenne breaks the common stereotype that a server-side open source product can only be either a library or a command-line application. This places it in the same league as commercial tools such as TopLink and EOF.

At the same time, Cayenne takes a very pragmatic approach to the development process, recognizing that there are a number of repetitive operations that can and should be scripted outside the GUI. For example, when making changes to the mapping in CayenneModeler, users often forget to update persistent Java classes to match the latest mapping. Another example is deploying an application in multiple environments (e.g. development, testing, production). It is error-prone and unproductive to configure deployment settings via the GUI every time a new release goes out. For these cases Cayenne provides Ant tasks for integration into build scripts.

Cayenne Primer

Creating a Project

We will use a simple art catalog database to demonstrate various Cayenne features. A database schema for the catalog is shown on the left.

A typical Cayenne project consists of mapping and database access objects (both kinds are described in XML files built using CayenneModeler). A project contains DataMaps and DataNodes organized into DataDomains. A DataMap is a repository of all mapping information about the database and object layers. A DataNode is an abstraction of a single physical database. Different databases can be mixed in the same project, e.g. MySQL and Oracle DataNodes can coexist. DataNodes do not have to be JDBC-compliant or relational (though normally they are both). It is possible to have DataNodes for other types of data, such as flat files or LDAP. A DataDomain groups one or more DataNodes and DataMaps into a virtual object-relational data source. Internally it performs routing of queries to the appropriate connection. As you can see, a Cayenne application can map to a very complex physical layout of database servers.
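
The routing idea can be pictured with a small plain-Java sketch. It is not Cayenne code: `DomainRoutingSketch`, `link` and `nodeFor` are hypothetical names invented for illustration, and the real DataDomain works with DataMaps and DataNodes rather than strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of domain-level routing: each entity name is linked
// to the name of the node (physical database) that stores it.
class DomainRoutingSketch {
    private final Map<String, String> nodeByEntity = new HashMap<>();

    // Records which node serves the given entity.
    void link(String entityName, String nodeName) {
        nodeByEntity.put(entityName, nodeName);
    }

    // Picks the node for a query from the query root, here just an entity name.
    String nodeFor(String queryRoot) {
        String node = nodeByEntity.get(queryRoot);
        if (node == null) {
            throw new IllegalArgumentException("No node mapped for " + queryRoot);
        }
        return node;
    }
}
```

The point of the sketch is only that application code names what it wants (the entity), while the choice of physical database is a lookup made on its behalf.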

It takes only a few simple steps to create a persistent object layer from an existing database:

Start CayenneModeler and create a new project.
Click "DataDomain" button to create a new DataDomain.
Select this DataDomain and create a child DataNode by clicking "DataNode" button.
On the DataNode configuration screen select "DriverDataSourceFactory" for "DataSource Factory", select DB adapter for the target database (choices are self-explanatory) and enter connection information.
From the menu select "Tools -> Reengineer Database Schema" and follow the wizard. After this step is complete, there should be a new DataMap in the project, containing all reengineered DB tables and "guessed" Java classes for them.
Select this DataMap and choose "Tools -> Generate Classes" from the menu to create persistent classes.

All these steps take about 2 minutes and produce a fully functional set of Java classes "enabled" for persistence with Cayenne and ready to be used in applications. Cayenne persistent classes are POJOs ("Plain Old Java Objects") and do not use any bytecode "enhancements". The default class generation template (which is based on Velocity and is easily customizable) produces the following subclass/superclass pair for the ARTIST table (Painting and Gallery classes generated for the other two tables are not shown):

/** Class _Artist was generated by Cayenne.
  * It is probably a good idea to avoid changing this class manually, 
  * since it may be overwritten next time code is regenerated. 
  * If you need to make any customizations, please use subclass. 
  */
public class _Artist extends org.objectstyle.cayenne.CayenneDataObject {

    public static final String ARTIST_NAME_PROPERTY = "artistName";
    public static final String DATE_OF_BIRTH_PROPERTY = "dateOfBirth";
    public static final String PAINTING_ARRAY_PROPERTY = "paintingArray";

    public static final String ARTIST_ID_PK_COLUMN = "ARTIST_ID";

    public void setArtistName(String artistName) {
        writeProperty("artistName", artistName);
    }
    public String getArtistName() {
        return (String)readProperty("artistName");
    }

    public void setDateOfBirth(java.util.Date dateOfBirth) {
        writeProperty("dateOfBirth", dateOfBirth);
    }
    public java.util.Date getDateOfBirth() {
        return (java.util.Date)readProperty("dateOfBirth");
    }

    public void addToPaintingArray(org.objectstyle.art.Painting obj) {
        addToManyTarget("paintingArray", obj, true);
    }
    public void removeFromPaintingArray(org.objectstyle.art.Painting obj) {
        removeToManyTarget("paintingArray", obj, true);
    }
    public java.util.List getPaintingArray() {
        return (java.util.List)readProperty("paintingArray");
    }
}

public class Artist extends _Artist {

}

The superclass contains all the property "set" and "get" methods and as a rule is never edited manually. The subclass is generated with no contents and is never modified by the class generation procedure on subsequent runs. This approach (also known as the generation gap pattern) makes it safe to regenerate classes any number of times without conflicts between generated and custom code. As a result, each subclass becomes an ideal place to include custom logic related to the application domain model and also business logic. We should note that mixing business logic with the domain model is a controversial topic, but one would be surprised to discover how many applications can do just fine without an extra "business layer" on top of persistent objects.
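
As a rough illustration of where custom code goes, here is a minimal sketch of the generation gap pattern. To stay self-contained it does not use Cayenne at all: `SketchDataObject` is a hypothetical map-backed stand-in for CayenneDataObject, and `isBornBefore` is an invented helper, not something the class generator produces.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CayenneDataObject: stores properties in a map.
class SketchDataObject {
    private final Map<String, Object> values = new HashMap<>();

    protected Object readProperty(String name) {
        return values.get(name);
    }

    protected void writeProperty(String name, Object value) {
        values.put(name, value);
    }
}

// Plays the role of the generated superclass: accessors only, never edited by hand.
class _SketchArtist extends SketchDataObject {
    public void setArtistName(String artistName) {
        writeProperty("artistName", artistName);
    }
    public String getArtistName() {
        return (String) readProperty("artistName");
    }
    public void setDateOfBirth(Date dateOfBirth) {
        writeProperty("dateOfBirth", dateOfBirth);
    }
    public Date getDateOfBirth() {
        return (Date) readProperty("dateOfBirth");
    }
}

// Plays the role of the hand-written subclass: custom logic survives regeneration.
class SketchArtist extends _SketchArtist {
    // Invented domain helper: true if the artist was born before the given date.
    public boolean isBornBefore(Date date) {
        Date born = getDateOfBirth();
        return born != null && born.before(date);
    }
}
```

Regenerating the superclass after a mapping change would replace only `_SketchArtist`; the helper in `SketchArtist` stays untouched.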

Writing Application Code

All operations related to persistence are performed via instances of DataContext. Users almost never have to deal with the potentially complex structure of DataDomains and DataNodes. Instead, DataContext provides a facade to the rest of the Cayenne access layer. It isolates uncommitted changes made to objects from the changes made by users of other DataContexts. In multi-user applications DataContext is usually placed in the session scope, keeping ongoing changes private to a session.
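
The isolation described above can be pictured with a deliberately simplified sketch. None of it is Cayenne API: `SharedStoreSketch` and `SessionContextSketch` are hypothetical classes, and a plain map of strings stands in for both the database and the object graph. Only the method names commitChanges and rollbackChanges are borrowed from DataContext.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared store, standing in for the database.
class SharedStoreSketch {
    final Map<String, String> committed = new HashMap<>();
}

// Hypothetical stand-in for a session-level context with private pending edits.
class SessionContextSketch {
    private final SharedStoreSketch store;
    private final Map<String, String> pending = new HashMap<>();

    SessionContextSketch(SharedStoreSketch store) {
        this.store = store;
    }

    // Edits are kept locally until committed.
    void set(String key, String value) {
        pending.put(key, value);
    }

    // A context sees its own uncommitted edits first, then committed data.
    String get(String key) {
        return pending.containsKey(key) ? pending.get(key) : store.committed.get(key);
    }

    // Flushes all local edits to the shared store in one call.
    void commitChanges() {
        store.committed.putAll(pending);
        pending.clear();
    }

    // Discards local edits in memory; the shared store is not touched.
    void rollbackChanges() {
        pending.clear();
    }
}
```

Two contexts created over the same store do not see each other's edits until one of them commits, which is the behavior a session-scoped context is meant to provide.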

Here are a few common examples of how an application obtains a DataContext. Command-line and Swing applications can use a simple static method to get a new instance:

import org.objectstyle.cayenne.access.DataContext;
...
DataContext context = DataContext.createContext();

Web applications usually register a special listener in web.xml, so that a DataContext is automatically inserted into the session scope:

<!-- Inside web.xml -->
<listener>
   <listener-class>
      org.objectstyle.cayenne.conf.WebApplicationListener
   </listener-class>
</listener>

// WebApp Java Code
import org.objectstyle.cayenne.conf.BasicServletConfiguration;
import org.objectstyle.cayenne.access.DataContext;
import javax.servlet.http.HttpServletRequest;
...
HttpServletRequest r;

// retrieve DataContext stored in the session by the WebApplicationListener
DataContext context = BasicServletConfiguration.getDefaultContext(r.getSession());

The most common operations with persistent objects are:

Querying the database to get a list of objects matching a certain criteria.
Obtaining related objects.
Modifying object properties.
Deleting persistent objects.
Creating new persistent objects.
Committing all changes back to the database.

This is where Cayenne helps tremendously in reducing the amount of code and making developers productive. The following examples show how little code is needed to do all of the things above.

// Fetch all artists who worked in the 20th century.
// Note how easy it is to build property expressions.
Expression qualifier = 
   Expression.fromString("paintingArray.year between 1901 and 2000");
SelectQuery query = new SelectQuery(Artist.class, qualifier);
List artists = context.performQuery(query);

if(artists.size() > 0) {
  Artist firstArtist = (Artist) artists.get(0);

  // "Simple" properties are accessible via generated "get" methods:
  System.out.println("First Artist Name: " + firstArtist.getName());

  // Collection properties, such as the list of paintings, can be 
  // obtained just like simple properties, by calling a "get" method:
  List paintings = firstArtist.getPaintingArray();

  if(paintings.size() > 0) {
     Painting firstPainting = (Painting) paintings.get(0);
     System.out.println("First Painting Name: " + firstPainting.getName());
  }
}

// Catalog a new artist, Pablo Picasso, and his paintings
Artist picasso = (Artist) context.createAndRegisterNewObject(Artist.class);
picasso.setName("Pablo Picasso");

Painting selfPortrait = (Painting) context.createAndRegisterNewObject(Painting.class);
selfPortrait.setName("Self-portrait");
selfPortrait.setYear(new Integer(1907));

Painting theDream = (Painting) context.createAndRegisterNewObject(Painting.class);
theDream.setName("The Dream");
theDream.setYear(new Integer(1932));

// Set artist on both paintings; as a side effect this will automatically
// add these paintings to the Artist's paintings collection.
selfPortrait.setToArtist(picasso);
theDream.setToArtist(picasso);

// Final step - commit changes.
// All three objects will be stored in the DB in one method call
context.commitChanges();

Building Blocks

Expressions: Navigating Object Graph

One of the cornerstones of Cayenne is a concise and readable expression language for object and database graph traversal and SQL-like conditions. It is used in a number of places, namely by the query API, and also as a standalone object graph navigation framework. The language is self-explanatory to anyone familiar with the JavaBeans properties concept. For example, "name", "paintingArray.toGallery.name" and "paintingArray.year > 2002" are all valid expressions. Since some of the table columns may not be mapped by the object layer, and therefore can't be scripted as object properties, Cayenne allows creating expressions using database column and relationship names (note that Cayenne mapping provides named relationships between tables). A "db:" prefix is used with the property path to distinguish database "properties" from Java ones. The examples above can be written as "db:NAME", "db:paintingArray.toGallery.NAME" and "db:paintingArray.YEAR > 2002". Note that "db:" expressions are rarely used directly, and were originally intended for internal use. As community experience shows, though, sometimes they are indispensable.

As mentioned already, expressions are abstract and independent from the rest of the framework. This makes them very helpful in manipulating JavaBeans, regardless of whether you work with objects mapped in Cayenne or just regular Java objects:

1. Reading object property values.

Expression exp = Expression.fromString("name");

Object object = ...;

// 'object' is any kind of Java object that has "getName()" method.
Object objectName = exp.evaluate(object);

2. Filtering collections in-memory.

// this particular expression does some DB-style 
// pattern matching (SQL LIKE) on object properties
Expression filter = Expression.fromString("name like 'A%'");

List objects = ...;
List startWithA = filter.filterObjects(objects);

3. Sorting collections in-memory.

// For ordering purposes, expressions are wrapped in an Ordering instance,
// that provides ordering direction in addition to the property specification,
// and also implements java.util.Comparator
Ordering comparator = new Ordering("toArtist.name", Ordering.ASC);

List paintings = ...;

// sorts Paintings collection by artist name in ascending order
Collections.sort(paintings, comparator);

Expressions are dynamic and can contain named parameters prefixed with "$". Parameter substitution (resolving an expression) is done using the expWithParameters(..) method:

Expression exp = Expression.fromString("toArtist.name like $namePattern or year = $year");
...
Map parameters = new HashMap();
parameters.put("namePattern", "X%");
parameters.put("year", new Integer(1954));

// 'anotherExp' will be: "toArtist.name like 'X%' or year = 1954"
Expression anotherExp = exp.expWithParameters(parameters);

If some parameters are missing from the map, the method is smart enough to strip unused parts of the template expression:

Expression exp = Expression.fromString("toArtist.name like $namePattern or year = $year");
...
Map parameters = new HashMap();
parameters.put("namePattern", "X%");

// since "year" is missing, "or year = $year" part of the expression
// is automatically stripped, producing "toArtist.name like 'X%'"
Expression anotherExp = exp.expWithParameters(parameters);
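
To make the pruning behavior concrete, here is a much simplified plain-Java sketch. It does not parse expressions and is not how Cayenne implements expWithParameters(..): `ParameterPruningSketch` and its `resolve` method are hypothetical, and the template is reduced to a list of "or" branches, each needing one parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ParameterPruningSketch {
    // Each entry maps a condition prefix such as "year =" to the parameter it needs.
    // Branches whose parameter is absent are dropped; the rest are joined with "or".
    static String resolve(Map<String, String> branches, Map<String, Object> parameters) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, String> branch : branches.entrySet()) {
            String parameterName = branch.getValue();
            if (parameters.containsKey(parameterName)) {
                kept.add(branch.getKey() + " " + literal(parameters.get(parameterName)));
            }
        }
        return String.join(" or ", kept);
    }

    // Strings are quoted, other values are printed as-is.
    private static String literal(Object value) {
        return (value instanceof String) ? "'" + value + "'" : String.valueOf(value);
    }
}
```

Passing an ordered map with the branches "toArtist.name like" and "year =" reproduces the two cases shown above: with both parameters present the result keeps both branches, and with only "namePattern" present the "year" branch disappears.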

Queries

Cayenne does not attempt to create its own SQL dialect and force users to program in it, since it would be redundant and inherently limited (so there is no "CQL"). Instead it provides a Java query API based in part on Cayenne expressions. The most often used query is SelectQuery. It is intended to get objects (and also "raw" untyped data) from the database via DataContext. SelectQuery contains the information needed for Cayenne to find the target database, generate SQL, execute it in the most efficient manner, and return the results in the specified format. It is important to note that SelectQueries are "first class citizens" in Cayenne mapping, and can be built and configured using CayenneModeler and stored in XML. The "anatomy" of SelectQuery is the following:

1. Query Root. Each query has a "root", which is normally a Java class. It serves multiple purposes. It tells Cayenne what source table or view to use for the query. It specifies what type of objects is expected to be returned back. And finally it works as a key to determine which DataNode (and underlying physical database) should be used for the query. Root is the only required part of a valid query.

SelectQuery query = new SelectQuery(Artist.class);

2. Query Qualifier. Qualifier is a Cayenne expression describing what rows should be included in the result. When DataContext executes a query, the qualifier is translated to the WHERE clause of the SELECT statement, using the target database SQL dialect. Expressions are discussed in detail above.

Expression qualifier = Expression.fromString("toArtist.name like 'X%'");
SelectQuery query = new SelectQuery(Painting.class, qualifier);

3. Query Ordering. As the name implies, Ordering specifies how the result should be sorted. It is translated to the ORDER BY clause. SelectQuery can have multiple orderings.

SelectQuery query = new SelectQuery(Artist.class);
query.addOrdering("name", true);
query.addOrdering("dateOfBirth", true);

// the above is just a short form for explicitly instantiating Orderings:
// query.addOrdering(new Ordering("name", Ordering.ASC));
// query.addOrdering(new Ordering("dateOfBirth", Ordering.ASC));

4. Query Prefetching. Most web and desktop applications work with tabular views of data. Each row in such a view displays information from a "master" object and sometimes from a "detail" object related to the "master". For example, a painting catalog search results table may have the following columns: "Painting Name", "Artist", "Gallery". In this example each Painting object is a "master", and the related Gallery and Artist objects are "details". If Galleries and Artists for all Paintings are not fully resolved in memory by the time search results are displayed, they will be fetched one by one as the view is rendered. The application may end up issuing hundreds if not thousands of extra queries just to process a single page. To solve this problem, SelectQuery can be configured to use one or more "prefetches". A "prefetch" is a Cayenne expression showing how to navigate from master to detail.

SelectQuery query = new SelectQuery(Painting.class);
query.addPrefetch("toArtist");
query.addPrefetch("toGallery");

Another example - prefetching can be done on collections (to-many relationships) and can span multiple relationships.

SelectQuery query = new SelectQuery(Artist.class);
query.addPrefetch("paintingArray");
query.addPrefetch("paintingArray.toGallery");
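
The cost that prefetching avoids can be seen in a small simulation. The sketch below is plain Java, not Cayenne: `CountingArtistStore` is a hypothetical in-memory "database" that counts the queries it serves, and for simplicity it does no caching, so every lazy lookup counts as one query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory "database" that counts how many queries it serves.
class CountingArtistStore {
    private final Map<Integer, String> artistNamesById = new HashMap<>();
    int queryCount = 0;

    void put(int id, String name) {
        artistNamesById.put(id, name);
    }

    // One query per artist: what lazy resolution does while rendering rows.
    String fetchOne(int id) {
        queryCount++;
        return artistNamesById.get(id);
    }

    // One query for many artists: the effect a prefetch aims for.
    Map<Integer, String> fetchMany(Set<Integer> ids) {
        queryCount++;
        Map<Integer, String> result = new HashMap<>();
        for (Integer id : ids) {
            result.put(id, artistNamesById.get(id));
        }
        return result;
    }
}

class PrefetchSketch {
    // Renders the artist name for each painting, resolving artists one by one.
    static List<String> renderLazily(int[] paintingArtistIds, CountingArtistStore store) {
        List<String> rows = new ArrayList<>();
        for (int artistId : paintingArtistIds) {
            rows.add(store.fetchOne(artistId));
        }
        return rows;
    }

    // Same output, but all needed artists are loaded up front in one query.
    static List<String> renderWithPrefetch(int[] paintingArtistIds, CountingArtistStore store) {
        Set<Integer> ids = new HashSet<>();
        for (int artistId : paintingArtistIds) {
            ids.add(artistId);
        }
        Map<Integer, String> prefetched = store.fetchMany(ids);
        List<String> rows = new ArrayList<>();
        for (int artistId : paintingArtistIds) {
            rows.add(prefetched.get(artistId));
        }
        return rows;
    }
}
```

For N paintings by N different artists the lazy version issues N detail queries while the prefetching version issues one, and both produce the same rows.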

5. Various Hints. SelectQuery has a number of properties that allow further customization of its processing by DataContext.

SelectQuery query = new SelectQuery(Artist.class);

// resolve results page by page, 50 at a time
query.setPageSize(50);

// do not fetch more than 1000 records
query.setFetchLimit(1000);

// instead of Artist objects, fetch untyped Maps for each row
query.setFetchingDataRows(true);

Transactions

Cayenne completely frees users from any transaction management. DataContext.commitChanges() is the only method that users normally care about. It demarcates a moment when all local changes made to objects within a given DataContext are flushed down a JDBC Connection. Depending on a setting configured in CayenneModeler, the result will be either committing the changes into one or more data sources (yes, Cayenne can commit to more than one database at once), or simply executing a bunch of generated PreparedStatements, and letting the container commit them via its preferred mechanism.

Cayenne has support for explicit transactions via Transaction class. I still have to see a case where one would bother to use it, but then I've been pretty successful so far in talking my customers out of using EJB on their projects :-).

What Else Is in Cayenne?

A few more features that deserve mentioning here, without any particular order:

1. Distributed Caching. DataDomain has an object caching facility that serves all its DataContexts and can be distributed across JVMs via multiple mechanisms. Out of the box, distributed cache updates can be done via JavaGroups and JMS, both configurable via CayenneModeler.

2. Batching and Operation Sorting. Cayenne uses JDBC batching for all DML operations if the underlying driver can support it. This results in significant performance improvements for massive data modifications. Cayenne also automatically sorts generated INSERT/DELETE/UPDATE DML to satisfy database integrity constraints. The algorithm used is topological sorting of directed acyclic graphs applied to the graph of database tables. It is implemented using the excellent Ashwood graph library (just like Cayenne, Ashwood is an open source project at ObjectStyle.org). Sorting is extremely important when a given database does not implement deferred constraint checking, or if constraint parameters are not controlled by Java developers.
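
The ordering idea can be sketched with a generic topological sort (Kahn's algorithm) over a table dependency graph. This is an illustration only: it does not use Ashwood and is not Cayenne's implementation, `TableOrderSketch` is a hypothetical class, and the foreign keys used in the usage note are assumed from the relationships mentioned in this article.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableOrderSketch {
    // dependsOn maps each table to the tables its foreign keys reference.
    // Returns an order in which referenced tables come before referencing ones.
    static List<String> insertOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> unmet = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            unmet.putIfAbsent(e.getKey(), 0);
            dependents.putIfAbsent(e.getKey(), new ArrayList<>());
            for (String parent : e.getValue()) {
                unmet.putIfAbsent(parent, 0);
                dependents.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
                unmet.merge(e.getKey(), 1, Integer::sum);
            }
        }

        // Start with tables that reference nothing.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String table = ready.remove();
            order.add(table);
            // A dependent becomes ready once all tables it references are placed.
            for (String child : dependents.get(table)) {
                if (unmet.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (order.size() != unmet.size()) {
            throw new IllegalStateException("cyclic foreign keys; cannot order tables");
        }
        return order;
    }

    // Deletes run in the opposite direction: referencing rows go first.
    static List<String> deleteOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>(insertOrder(dependsOn));
        Collections.reverse(order);
        return order;
    }
}
```

Assuming PAINTING has foreign keys to ARTIST and GALLERY, insertOrder places PAINTING after both of them, and deleteOrder places it first.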

3. Inheritance and Advanced Mapping. Cayenne supports inheritance for persistent classes that are using the same base table. It also supports "flattened" relationships that transparently span one or more "join" tables between source and destination tables.

4. Optimistic Locking. CayenneModeler allows selecting any number of object attributes and relationships to be used for optimistic locking. DataContext.commitChanges() will throw an exception with detailed failure information whenever a row update fails.
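
The general technique behind optimistic locking can be shown with a small in-memory sketch: an update carries the value the session originally read, and it fails if the row has changed since. This is a generic illustration, not Cayenne code; `OptimisticLockSketch` is hypothetical, and the NAME column in the comment is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class OptimisticLockSketch {
    // Hypothetical single-column "table": primary key -> artist name.
    private final Map<Integer, String> namesById = new HashMap<>();

    void insert(int id, String name) {
        namesById.put(id, name);
    }

    String read(int id) {
        return namesById.get(id);
    }

    // Mimics "UPDATE ARTIST SET NAME = ? WHERE ARTIST_ID = ? AND NAME = ?"
    // and returns the number of rows updated (0 or 1).
    int updateIfUnchanged(int id, String expectedOldName, String newName) {
        String current = namesById.get(id);
        if (current == null || !current.equals(expectedOldName)) {
            return 0;
        }
        namesById.put(id, newName);
        return 1;
    }

    // Commit-like step: zero updated rows is reported as a lock failure.
    void commit(int id, String snapshotName, String newName) {
        if (updateIfUnchanged(id, snapshotName, newName) == 0) {
            throw new IllegalStateException("optimistic lock failure for row " + id);
        }
    }
}
```

If two sessions read the same row and both try to commit, the first commit succeeds and the second one, still holding the old snapshot, fails instead of silently overwriting the first change.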

5. Lazy Collections and "hollow" objects. Cayenne is "lazy" in resolving to-one or to-many relationships (such behavior can be overridden using prefetching, as discussed above). Normally, when a single object is returned from a relationship (e.g. by calling painting.getToArtist()), that object is "hollow" (unless it happened to be cached previously), meaning that it was created without a database fetch and doesn't have all its properties resolved yet. Whenever a user calls a getter or a setter on such an object, it will be fully resolved via a query.
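
A "hollow" object can be pictured as an object that knows its identity but defers loading everything else until first use. The sketch below is plain Java under that assumption: `HollowArtistSketch` is hypothetical, and a `Supplier` stands in for the database query that Cayenne would run.

```java
import java.util.function.Supplier;

class HollowArtistSketch {
    private final int id;
    private final Supplier<String> nameFetcher; // stands in for a database query
    private String name;
    private boolean resolved = false;

    HollowArtistSketch(int id, Supplier<String> nameFetcher) {
        this.id = id;
        this.nameFetcher = nameFetcher;
    }

    // The identity is known without any fetch.
    int getId() {
        return id;
    }

    boolean isHollow() {
        return !resolved;
    }

    // Both accessors resolve the object on first use, and only once.
    String getName() {
        resolveIfNeeded();
        return name;
    }

    void setName(String newName) {
        resolveIfNeeded();
        this.name = newName;
    }

    private void resolveIfNeeded() {
        if (!resolved) {
            name = nameFetcher.get();
            resolved = true;
        }
    }
}
```

Creating the object costs nothing; the fetch happens on the first getter or setter call, and later calls reuse the resolved state.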

How Does Cayenne Compare to XYZ ORM Framework?

After a period of denial, ignorance and suspicion towards the ORM technology, it is quickly becoming mainstream in Java. And it turns out there are quite a few robust solutions to choose from. Other frameworks known to the authors (those include EOF, TopLink, and to a lesser extent JDO-based ones and Hibernate) do their job well, follow the same basic design concept, and have more than enough features to beat JDBC-based DAOs any day of the week. Unfortunately, popular feature-for-feature comparisons of such frameworks provide as much information about the substance as nutrition labels do about the taste of food. So this is a bit like choosing a dish in a restaurant - it's all about the flavor.

Two things defining "the hot flavor of Cayenne" are CayenneModeler and DataContext. It has already been mentioned that by providing a GUI Modeler we want to make our users productive, while keeping them sane. None of the other open source frameworks have consistent and integrated modeling tools.

DataContext does to Cayenne API what CayenneModeler has done to the modeling process - it makes it extremely easy to use. The philosophy behind DataContext is: "Use single object copy per session. Do any number of edits. Flush the whole thing." This frees users from many inconveniences found in other frameworks. DataContext is a true facade for the rest of Cayenne. Its user API is trivial. In fact most applications call just these three methods:

DataContext.performQuery(..)
DataContext.commitChanges()
DataContext.rollbackChanges() - mind you, this is done in memory

DataContext is "disconnected" and uses the Cayenne runtime for connectivity, yet it can transparently work with multiple databases at once. The underlying DataNode will provide a connection only when the actual operation is started, and then it knows how to commit and close it. As a result, the connection pool is used very efficiently. DataContext is serializable, together with all its objects, and can span multiple transactions. No transaction management is required at the code level. There is no such thing as committing just one object (and trying to figure out what related objects will be stored as a result). There is no need to do error-prone object graph cloning, TopLink style, when jumping from unit of work to session and back. All this makes DataContext an ideal session-level container for persistent objects.

Conclusion

This article showed the core Cayenne concepts and its main advantages from the point of view of a Java developer. The advantages are no small ones - quick learning, model visualization, ease of use, powerful mapping and runtime abstractions. You can get more information about Cayenne from our website and friendly user mailing list. Commercial support and training are readily available directly from the authors.

About the Authors

Andrei (aka Andrus) Adamchik is one of the founders and the main developer of Cayenne. He has six years of experience developing enterprise applications for a number of companies ranging from logistics to finance, to media and entertainment. He was an advocate of Object Relational technology long before it became well known and accepted by the Java "mainstream". Currently Andrus is the CEO of ObjectStyle LLC, a New York and Atlanta based software consulting company.

Eric Schneider is principal partner of the New York City based software engineering firm Central Park Software, Inc. Eric has been developing large-scale enterprise applications for over eight years, helping clients in various industries including financial, medical, e-commerce, educational, and sports entertainment. Eric has been a contributor to the Cayenne project since early 2003.
