Brian Jones: Open XML Formats

I'm Brian Jones, a program manager in Office. I've been working on the XML functionality and file formats in Office for about 6 years now. In this blog, I'll mainly focus on XML in Office and the Open XML File Formats coming in the 2007 Microsoft Office system.
Difficult decisions between loose conformance and true interoperability

Rick Jelliffe had a great post earlier this week discussing the problems you get when you allow for really loose conformance. We've had discussions around this issue in Ecma for the past 9 months. On the one hand, we don't want people to feel like they have to implement the entire standard to be compliant. We want people to benefit from the work we've done, and to choose which parts of the standard they want to use. We also want people to use the standard as a launching pad for their own innovations. But as Rick points out, this can be problematic:

The ability to create extensions or subsets willy nilly is the antithesis of a standard. It is the difference between "I bought product X because it says it supports standard Y but it often fails and I am pissed" and "I bought product X because it says it has partial support for standard Y and I accept it doesn't work completely."

 

The approach we've settled on for conformance (you can read it in the 1.4 working draft of the spec) is to let you use as much or as little of the spec as you want, but if you claim conformance to any part of the spec, you must fully conform to that part. If you don't fully conform to a part, you can still be conformant as long as you state which things you did not implement. Here is the "Goals" description from the spec:

The goal of this clause is to define conformance, and to provide interoperability guidelines in a way that fosters broad and innovative use of the Office Open XML file format, while maximizing interoperability and preserving investment in existing files and applications (§4). By meeting this goal, this Standard benefits the following audiences:

  • Developers that design, implement, or maintain Office Open XML applications.
  • Developers that interact programmatically with Office Open XML applications.
  • Governmental or commercial entities that procure Office Open XML applications.
  • Testing organizations that verify conformance of specific Office Open XML applications to this Standard. (Note that this Standard does not include a test suite.)
  • Educators and authors who teach about Office Open XML applications.

So based on those goals, and the nature of the spec, we identified the following issues:

  1. The application domain encompasses a range of possible consumers (§4) and producers (§4) so broad that defining specific application behaviors would restrict innovation. For example, stipulating visual layout would be inappropriate for a consumer that extracts data for machine consumption, or that renders text in sound. Another example is that restricting capacity or precision runs the risk of diluting the value of future advances in hardware.
  2. Commonsense user expectations regarding the interpretation of an Office Open XML package (§4) play such an important role in that package's value that a purely syntactic definition of conformance would fail to effect a useful level of interoperability. For example, such a definition would admit an application that reads a package, and then writes it in a manner that, though syntactically valid, differs arbitrarily from the original.
  3. Legitimate operations on a package include deliberate transformations, making blanket change prohibitions inappropriate in the conformance definition. For example, collapsing spreadsheet formulas to their calculated values, or converting complex presentation graphics to static bitmaps, could be correct for an application whose published purpose is to perform those operations. Again, commonsense user expectation makes the difference.
  4. Existing files and applications exercise a broad range of formats and functionality that, if required by the conformance definition, would add an impractical amount of bulk to the Standard and could inadvertently obligate new applications to implement a prohibitive amount of functionality. This issue is caused by the breadth of currently available functionality and is compounded by the existence of legacy formats.
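Issue 2 is easy to see in a toy example. The sketch below is only an illustration: the `sheet` and `cell` element names are made up for the example and are not taken from the spec, and a plain well-formedness check stands in for real syntactic validation. Both documents pass the syntactic check, yet the second one has silently changed the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only (not real Office Open XML).
ORIGINAL = "<sheet><cell ref='A1'>42</cell></sheet>"
# What a careless application might write back: still well-formed, data changed.
REWRITTEN = "<sheet><cell ref='A1'>7</cell></sheet>"


def is_well_formed(xml_text: str) -> bool:
    """Stand-in for a purely syntactic conformance check."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def cell_values(xml_text: str) -> dict:
    """Extract the content a user actually cares about: cell ref -> value."""
    root = ET.fromstring(xml_text)
    return {cell.get("ref"): cell.text for cell in root.iter("cell")}


if __name__ == "__main__":
    print(is_well_formed(ORIGINAL), is_well_formed(REWRITTEN))
    print(cell_values(ORIGINAL) == cell_values(REWRITTEN))
```

A syntax-only definition of conformance would accept the rewritten document, which is why the spec also has to say something about semantics.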

The next important thing to get clear on is what the standard actually specifies. Section 2.3 of Part 1 defines this:

To address the issues listed above, this Standard constrains both syntax and semantics, but it is not intended to predefine application behavior. Therefore, it includes, among others, the following three types of information:

  1. Schemas and an associated validation procedure for validating document syntax against those schemas. (The validation procedure includes un-zipping, locating files, processing the extensibility elements and attributes, and XML Schema validation.)
  2. Additional syntax constraints in written form, wherever these constraints cannot feasibly be expressed in the schema language.
  3. Descriptions of element semantics. The semantics of an element refers to its intended interpretation by a human being.
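The first steps of the validation procedure in item 1 can be sketched roughly like this. It is a minimal illustration, not the procedure the spec defines: it only un-zips a package, locates the XML parts by file extension, and checks that each one parses. Processing of extensibility elements and XML Schema validation are left out, since the Python standard library has no schema validator. The names `check_package` and `build_sample_package`, and the part names in the sample, are assumptions made for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_package(data: bytes) -> dict:
    """Return {part name: True/False} for well-formedness of each XML part.

    Hypothetical helper: covers only un-zipping, locating files and parsing.
    Extensibility processing and XML Schema validation are NOT implemented.
    """
    results = {}
    with zipfile.ZipFile(io.BytesIO(data)) as package:   # un-zip
        for name in package.namelist():                  # locate files
            if not name.endswith((".xml", ".rels")):
                continue                                 # skip non-XML parts
            try:
                ET.fromstring(package.read(name))        # parse the XML
                results[name] = True
            except ET.ParseError:
                results[name] = False
    return results


def build_sample_package() -> bytes:
    """Build a tiny in-memory ZIP with one good and one broken XML part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<document><body/></document>")
        package.writestr("word/broken.xml", "<document><body></document>")
    return buffer.getvalue()


if __name__ == "__main__":
    for part, ok in check_package(build_sample_package()).items():
        print(part, "well-formed" if ok else "NOT well-formed")
```

A real validator would continue from here by resolving the extensibility markup and then validating each part against the schemas, followed by the written constraints in item 2 that the schema language cannot express.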

From there we defined document conformance and application conformance, and listed some interoperability guidelines.

It's actually interesting how much time it took for the TC to really get comfortable with this section of the spec. It's extremely important to get right, though, and Rick's post reminded me of that.

-Brian

Published Friday, September 08, 2006 11:11 AM by BrianJones

Comments

# Friday thoughts @ Friday, September 08, 2006 2:57 PM

Some really interesting things to note for the week:
New blog on Math in Office – Murray Sargent who...

Brian Jones: Open XML Formats

# re: Difficult decisions between loose conformance and true interoperability @ Friday, September 08, 2006 8:35 PM

I didn't really have much time to go through the spec in detail, focusing on just the citation and bib stuff, but I submitted some comments to the ECMA list that were in fact in part about extensibility.

If I understand right (and correct me if I'm wrong), right now the bibliographic schema is fixed; e.g. you cannot add foreign properties, nor use values for the source type that are not in the schema. Moreover, because you use the totally flat model, it means any new kinds of bibliographic sources almost by definition will need new properties. A bad combination that will limit what users and developers can do with it.

If I had that right, then, this is where you err too much with strictness. Tweak the model just a bit to be a little more relational, add some rules for extension, and then you give a better balance of control vs. extensibility, something which I think is essential for this use case.

Bruce

# re: Difficult decisions between loose conformance and true interoperability @ Saturday, September 09, 2006 3:00 PM

Brian: Your comments had me read Rick's post again.  The follow-on discussion comments are interesting and I added my perspective at the end: http://www.oreillynet.com/xml/blog/2006/09/freaked_out_by_odfs_definition.html#comment-82811

I am very attentive to conformance and how up-/down-level and mixed usages are handled.  I'm still pondering my way through TC45 Working Draft 1.4 Part 5 but I certainly agree that this takes a great deal of effort.

The treatment in the OASIS ODF Specification 1.0 is thin and disappointing and very much in contrast with the results that are claimed as assured by simply adopting ODF.  

I think that the way applications of the Application Office Open XML specifications are employed to accomplish document interchange will depend a great deal on what conformance accountability is required in the specification.  I suspect that we need something that allows communities of users to profile feature-usage conditions for their interchange requirements and have those conditions explicitly enforced and filtered for.

I am looking at the Markup Compatibility sub-standard to see if it provides an avenue for that kind of thing.  We'll see.

orcmid
