April 2004
Introduction
If you ask someone who uses code generation, chances are that they swear by
it. Code generation saves time and effort and can greatly improve the
maintainability of a system. Letting your computer write code for you
is so compelling that it's hard to imagine why any developer wouldn't
embrace it. Yet, code generation remains a bit of a black art,
practiced by a few and met with skepticism and distrust by the rest. If
you haven't yet been bitten by the code generation bug, let me explain
why I think code generation matters to Java developers.
What is code generation?
Code generation can mean a number of things, but here I'm referring to
the act of having a computer program generate Java source files,
ones that you would have otherwise needed to write by hand, which are
compiled as part of your project. With this style of code generation
(known as active code generation, for those keeping score), the code
generator owns the generated code and the programmers don't edit it
directly.
There are many ways to think about this type of code generation. I find
it useful to mentally model it as template expansion. A template is an
outline of the final product with the details left to be filled in,
sort of like a JSP page. Input data is fed into a template, to fill in
the blanks, and the result is a customized source file based on the
input data. I'm not trying to suggest that all code generators use
templates or even that somehow that is the preferred way to implement a
code generator. It's just a simple and easily understood abstract model
to help us think about code generation. In this model, the template
encapsulates code and logic that we are trying to re-use across all of
our generated classes.
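To make the template model concrete, here is a minimal sketch of template expansion in Java. Everything in it is an assumption made for illustration: the TemplateExpander class, the ${...} placeholder syntax and the bean template are hypothetical, and real generators are usually far more elaborate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy generator: fills ${...} placeholders in a template.
class TemplateExpander {

    // The template holds the code shared by every generated class.
    static final String BEAN_TEMPLATE =
        "public class ${className} {\n" +
        "    private ${type} ${field};\n" +
        "\n" +
        "    public ${type} get${Field}() {\n" +
        "        return ${field};\n" +
        "    }\n" +
        "}\n";

    // Replace each ${key} in the template with its value from the input data.
    static String expand(String template, Map<String, String> data) {
        String result = template;
        for (Map.Entry<String, String> entry : data.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The input data: the details that differ from class to class.
        Map<String, String> data = new LinkedHashMap<>();
        data.put("className", "Customer");
        data.put("type", "String");
        data.put("field", "name");
        data.put("Field", "Name");
        // Prints the source of a Customer class with a getName() accessor.
        System.out.println(expand(BEAN_TEMPLATE, data));
    }
}
```

Changing the getter logic in the template changes it in every class the generator produces, which is the kind of re-use described above.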
The source code and the resulting class files might be
indistinguishable from good old-fashioned cut-and-paste coding, but
from the programmer's perspective the code can be treated as a single
unit. When we test a generated class, we are testing all the classes,
at least partially. When bugs are found, they can be fixed in the
generator and the fix goes in across the whole system. Consider the
benefit of a completely consistent generated subsystem. Naming is
consistent. Error handling is consistent. Logging is consistent. When
we understand how one generated class works, we understand all of them.
Code generation lets you leverage code across a code base in ways you
otherwise might not be able to.
Code generation as re-use
I find the "code generation as re-use" argument quite compelling. It's
hard to think of a virtue that has been more revered by coders of every
platform, system and language orientation than code re-use. From simple
subroutines and modules to modern object-oriented techniques, the
methods have varied, but the idea of writing code once and re-using it
has been fundamental to both the art and science of coding.
We usually think of re-use in terms of language and runtime
mechanisms. We put code in shared libraries to be used by any number of
callers. We use inheritance to share code with subclasses and use
interfaces to allow our classes to fit into and re-use existing
infrastructures. Today's hot aspect-oriented programming (AOP)
techniques are an attempt to extract orthogonal concerns from classes,
allowing them to be written once and re-used.
Given our predisposition to strive for re-use in any form, it seems odd
that many developers don't give much consideration to the value of code
generation. Part of the problem is that code generation can look like a
hack. The re-use techniques provided by language and runtime systems
are deemed acceptable, while code generation is seen as a primitive
pre-processing hack that we should try to avoid.
There may be a bit of justification for that view. Code generation is
often the most valuable in those dark, messy corners of the systems we
are working with, in places where our language and tools seem
inadequate. Take EJB development, for example. I don't want to bury (or
praise) EJB here, but it's hard to deny that EJBs are messy and
complicated. XDoclet is a code generation tool that has found its niche
in relieving at least some of the pain associated with EJB development.
But even here, XDoclet is often seen as a necessary evil in keeping the
EJB monster at bay rather than a positive use of code generation.
Don't think that because the examples of code generation you see are
often band-aids around inflexible or poorly designed systems that code
generation is necessarily a sign of a poorly designed system. Code
generation is a valid technique that provides a valuable type of
re-use.
Code generation as high level language
I'll make another argument in favor of code generation as a relevant
modern technique: generating code is like writing code in a
higher-level language. Think back to our abstract model of code
generation. Our template is expanded based on some high level data we
feed into it, and to change the code, we merely change the input. In
essence, we are writing code in a high level language compiled by the code
generator.
Is Java necessarily the best language to express the concepts you need
to express? What if a UML type model is the best high level way to
express your business objects? Maybe a few Javadoc comments on your
classes are a better way to express certain details of your
system? Sometimes the best way to program a system isn't with the
language it is implemented in.
My first experience with source level code generation was with a parser
generator called yacc (yet another compiler-compiler). Yacc takes as
its input a grammar for a language: rules like "an assignment statement
is a variable name followed by an equals sign and then an
expression". From this high level grammar, it generates C code that
implements a parser for that language.
Very few people would consider writing a parser by hand for anything
but the simplest of expressions. We want to work at a more abstract
level, but we don't want to lose the benefits of a statically compiled
implementation.
That's the power of code generation. You can work at a higher, more
domain-specific level without having to leave your general-purpose
language environment.
A code generation case study
I worked on a project that migrated a large business model from hand
coded business objects to business objects generated from a UML model.
There were strict requirements for security, persistence and
extensibility. With the hand coded objects, each class worked slightly
differently. As much as we tried to stick to well-defined conventions,
each class looked slightly different. When a change needed to be made
across the whole system, the developers had to apply it by hand to
every class. The potential for bugs was enormous and the
developers spent a lot of time just maintaining the system and keeping
the model, the code and the relational mappings in sync.
When a code generator was added to the system, everything changed. The
UML model became king and all the logic for the various aspects of the
system was coded once in the code generator. Writing the code generator
took a lot of effort, and maintaining it was far from free. But the
cost of maintaining the generator was much less than the cost of
maintaining the entire hand coded business model.
Code generation was a huge win in this system. But were there other
options? An AOP system could have provided consistent security,
persistence, and logging layers across the entire system. A managed
component system like EJB could have provided a similar set of layers:
transactions, security, persistence, and so on. Both of these are
interesting options, but neither quite fit our needs the way code
generation did.
A third possibility would have been to move to a completely dynamic
abstract business object system. I call this the framework approach.
Instead of implementing the business model as static Java objects, we
could invoke generic methods for getting and setting properties and
invoking business methods on abstract objects. The framework would
figure out, based on the properties of those dynamic objects, how to
apply all the layers of the system in a consistent way.
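As a rough illustration of the framework approach, a dynamic business object might look something like the sketch below. The DynamicObject class and its audit log are hypothetical; the log simply stands in for whatever layers (security, persistence, logging) a real framework would apply.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dynamic business object: properties live in a map, not in fields.
class DynamicObject {
    private final String type;
    private final Map<String, Object> properties = new HashMap<>();
    // Stand-in for a cross-cutting layer that the framework applies uniformly.
    private final List<String> auditLog = new ArrayList<>();

    DynamicObject(String type) {
        this.type = type;
    }

    // Generic getter: every property access passes through the same layer.
    Object get(String property) {
        auditLog.add("get " + type + "." + property);
        return properties.get(property);
    }

    // Generic setter: the same layer applies to every object type.
    void set(String property, Object value) {
        auditLog.add("set " + type + "." + property);
        properties.put(property, value);
    }

    List<String> auditLog() {
        return auditLog;
    }
}
```

A caller would write something like customer.set("name", "Alice") rather than customer.setName("Alice"). The layer is written once, just as it would be in a generator, but it runs at runtime instead of being expanded into source code.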
The framework approach is surprisingly similar to code generation. All
the decisions the code generator made at compile time about the
behavior of the objects could be made in a similar way at runtime by
the framework. And, the runtime framework could do many things not
possible at compile time. But compile time generation has its own
benefits too. Having source code pass through the watchful eyes of the
compiler helps find problems faster, especially problems stemming from
changes to the original input sources. Also, having the generated
source code available can speed up debugging.
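The compiler's role can be illustrated with a small, hypothetical comparison. GeneratedCustomer stands in for a class a generator might emit, and the map stands in for a dynamic object; neither comes from the project described here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class of the kind a generator might emit for a Customer entity.
class GeneratedCustomer {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

class CompileTimeVersusRuntime {

    // Runtime lookup: a misspelled property name compiles and yields null.
    static Object lookup(Map<String, Object> dynamicCustomer, String property) {
        return dynamicCustomer.get(property);
    }

    public static void main(String[] args) {
        Map<String, Object> dynamicCustomer = new HashMap<>();
        dynamicCustomer.put("name", "Alice");
        // The typo "nmae" goes unnoticed until this line runs and returns null.
        System.out.println(lookup(dynamicCustomer, "nmae"));

        GeneratedCustomer customer = new GeneratedCustomer();
        customer.setName("Alice");
        System.out.println(customer.getName());
        // customer.getNmae();  // would not compile: the compiler catches the typo
    }
}
```

If a property is renamed in the input model, regenerated code breaks every stale caller at compile time, while the map lookup keeps compiling and fails only when it is exercised.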
The goal isn't to say that code generation is necessarily better than
the other possible approaches. In some cases it is, and in other cases
it isn't. What I do want to say is that code generation should be
considered on equal footing with these other techniques. Although code
generation is often looked down upon as a dirty hack, it is really a
powerful high level way to reuse code across a large system.
Conclusion
Many developers turn their noses up when code generation is mentioned,
but given all the benefits of code generation it's hard to understand
why. I strongly encourage using code generation wherever it is
applicable on a project. The rewards are well worth the effort, and the
occasional upturned nose.
About the Author
Norman Richards has ten years of software development experience, and has worked with code generation for much of that time. He is an avid XDoclet user and evangelist. Norman lives in Austin, Texas.
Norman is the co-author of XDoclet in Action.