Brian Jones: Open XML Formats

I'm Brian Jones, a program manager in Office. I've been working on the XML functionality and file formats in Office for about 6 years now. In this blog, I'll mainly focus on XML in Office and the Open XML File Formats coming in the 2007 Microsoft Office system.
Quick question for ODF experts

I must be reading this wrong, but in the ODF spec for tables it says the following:

"This chapter describes the table structure that is used for tables that are embedded within text documents and for spreadsheets."

Could it really be the case that ODF doesn't allow tables in presentations? I know that OpenOffice's presentation application "Impress" doesn't allow for native tables, but I had assumed the ODF format wouldn't have the same restrictions as OpenOffice.

Could someone more familiar with ODF help clear this up for me?

-Brian

Published Thursday, July 20, 2006 5:54 PM by BrianJones

Comments

# re: Quick question for ODF experts @ Thursday, July 20, 2006 10:55 PM

Native table support in presentations is slated for the 1.2 version of the spec. The recommendation until then is to use an embedded spreadsheet object.

Chris Nokleberg

# re: Quick question for ODF experts @ Friday, July 21, 2006 2:17 AM

I believe it's slated for a future version. One reason should be apparent: presentation differences. It's fairly easy in a word-processing document to handle a table that's too long and/or too wide for the page, likewise a spreadsheet that's too large for the current window. Now, what do you do on a slide when it contains a table that's too wide and/or too long for a single screen? You can make an argument for scaling, but it becomes infeasible after about 2-3x. You can make an argument for scrolling, but you can make equally good arguments against it. Generating an error when you try to create it sounds good, but reasonable people might not agree with that. Truncation... is probably the only near-universally bad option. It obviously needs some more discussion to figure out how to accommodate everyone, and how to express that in the XML without being hard to work with, staying consistent and not breaking anything fatally when trying to do something else.

But, as I said, we do know how to handle tables in word-processing documents and spreadsheets. It's hard to make the argument that well-understood and useful portions of the spec should be put on indefinite hold while more obscure portions are hammered out, so ODF isn't.

Todd Knarr

# re: Quick question for ODF experts @ Friday, July 21, 2006 5:18 AM

An open standard is defined by an open process as well as an open specification fully implementable by all. You can openly read OASIS' mailing list and see for yourself: http://lists.oasis-open.org/archives/office/

Of course you already do this in order to highlight every progress on the specification as a current shortcoming (Sic).

ECMA TC45, on the other hand, is a closed process. There are no open mailing lists, just very brief summaries with no details on the actual work (if any) on the specification, like this: http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50.htm

Of course we - the independent developers - can read this blog and your point of view, and we do, but we can't judge the progress, the discussions, the points of views and the rationale behind the choices made. Is there any hope that you will actually listen to the questions raised on the newly published draft? E.g. http://www.robweir.com/blog/

As I said an open standard is defined by an open process as well as an open specification fully implementable by all. Quick question: Could it be the case that ECMA TC45 is failing on both counts and thus OpenXML is not an open standard?

RequiredName

# re: Quick question for ODF experts @ Friday, July 21, 2006 6:11 AM

Thanks for this kind question.
Sure; OpenDocument allows tables in presentations.

OpenDocument allows tables in presentations encoded in the following way:
<draw:frame>
 <draw:object xlink:href="Link to OpenDocument spreadsheet table" />
</draw:frame>
For convenience the next version of OpenDocument will also allow tables in presentations to be encoded as follows:
<draw:frame>
<table:table>
....
</table:table>
</draw:frame>


Talking about tables I have a kind question for the “ECMA Open XML” experts:

         Why do you need so many table models in your spec?

I have counted three different table models so far, but I’ve only quickly scanned the 4000-page spec. Would it not be better to use one table model, as done e.g. in OpenDocument?

florian
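
As an illustrative aside, the two encodings florian describes can be sketched in Python with the standard xml.etree.ElementTree module. This is only a minimal sketch: the helper names frame_by_reference and frame_in_place and the "./Object 1" link target are made up for the example, the in-place form is the one described above as coming in a later version of the spec, and the output is a fragment, not a complete or validated ODF document.

```python
import xml.etree.ElementTree as ET

# OpenDocument and XLink namespace URIs.
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK = "http://www.w3.org/1999/xlink"


def frame_by_reference(href):
    """Table "by reference": a frame that links to an embedded spreadsheet object."""
    frame = ET.Element(f"{{{DRAW}}}frame")
    ET.SubElement(frame, f"{{{DRAW}}}object", {f"{{{XLINK}}}href": href})
    return frame


def frame_in_place(rows):
    """Table "in place": the table element sits directly inside the frame."""
    frame = ET.Element(f"{{{DRAW}}}frame")
    table = ET.SubElement(frame, f"{{{TABLE}}}table")
    for row in rows:
        tr = ET.SubElement(table, f"{{{TABLE}}}table-row")
        for value in row:
            cell = ET.SubElement(tr, f"{{{TABLE}}}table-cell")
            # Cell text is wrapped in a paragraph element.
            ET.SubElement(cell, f"{{{TEXT}}}p").text = value
    return frame


if __name__ == "__main__":
    # "./Object 1" is a hypothetical link target used only for illustration.
    print(ET.tostring(frame_by_reference("./Object 1"), encoding="unicode"))
    print(ET.tostring(frame_in_place([["Q1", "Q2"], ["10", "12"]]), encoding="unicode"))
```

In both cases a consumer ends up at a table:table element; the difference in this sketch is only whether it is reached through a link or found inline.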

# re: Quick question for ODF experts @ Friday, July 21, 2006 9:42 AM

Probably for the same reason they have two, extremely similar, but slightly different ways of storing hyperlinks in Excel.  One in SpreadsheetML and one in DrawingsML.  The structures are almost exactly alike, but the target of the hyperlink is different enough to make it annoying to figure out if the link is to the current document or an external one.

MS is not big on sharing code?  I've seen several cases where they've reinvented the wheel.  Sometimes it makes one want to tell the developers working on the different parts of the applications to, you know, talk to each other from time to time.

On the other hand, I shouldn't really complain.  The new hyperlink structures are better than their old IHlink interface used in the binary.

Also, I can hazard a guess as to why there are at least two table models in the spec.  The concept of a table in Excel is quite different from the concept of a table in Word.  In Word you need to store all the heights, widths, cell merges, positioning, nesting, text, etc. that go with the table.  In Excel, it's mostly about formatting (borders, fills, fonts).  The table doesn't control the size or position of cells in the grid; the cells (or really the rows and columns) themselves handle that.

A

# re: Quick question for ODF experts @ Friday, July 21, 2006 9:58 AM

OK, this helps clear things up. Thanks.

To quickly respond to the tables question, this really becomes clear once you have a better understanding of the different applications. Users of a wordprocessing document have different requirements for tables than users of a spreadsheet. The same is the case for presentations. Some of these differences are just properties on the structure, but other differences get down into the base structures of the tables. You need the separate models so that you can efficiently respond to changes in the industry and customer demand. You don't want to hold off on improving the table model in presentations, for example, just because it conflicts with what you are trying to do in a spreadsheet.
Now, that said, we have focused on sharing as much of the underlying structures and design principles as possible, but at the end of the day, a table in a spreadsheet may contain 500,000 rows of data, and it doesn't really make sense for that structure to be identical to the structure of a table in a presentation which is primarily used for very small amounts of data.

--- ODF folks ---

Chris and Todd, thanks for replying so quickly with this. I had assumed if it wasn't already in the spec, it was something that would eventually get added. I understand that defining structures like tables in a presentation and formulas in a spreadsheet is hard and complex. Wouldn't it be more accurate though to say that the current form of the spec is more like a version 0.5? I know that OpenOffice doesn't support tables in presentations, but the more widely used presentation technologies like PowerPoint definitely do, and I can't imagine how a spec could be complete if it leaves out things like tables in a presentation and formulas in a spreadsheet.

RequiredName, I could have looked through the mailing lists to find my answer, but I figured it would be quicker just to ask on my blog. I have a number of ODF experts who read this blog so I knew I'd get an answer pretty fast.

Florian, embedding a spreadsheet in a presentation is not nearly the same as having native table support.

-Brian



BrianJones

# re: Quick question for ODF experts @ Friday, July 21, 2006 10:33 AM

>Florian, embedding a spreadsheet in a presentation is not nearly the same as having native table support.

That’s true Brian if you have three different table models. Since OpenDocument shares a common model the difference between the two options I outlined are just marginal. One is table declaration “by reference” and the other is a table declaration “in place”.

Hope this helps clarification,

~Florian

florian

# re: Quick question for ODF experts @ Friday, July 21, 2006 12:53 PM

Florian, if you are linking to a spreadsheet object, how do you determine which application to load it with? Are you saying that a Presentation program should have the ability itself to load and render a spreadsheet object? Do they share the same style definitions and other resources?

Does this mean that if I'm building a Presentation program, I have to support everything that is in a spreadsheet such as pivot tables and formulas?

Don't you find that people creating presentations usually have different feature requests than people creating spreadsheets?

BrianJones

# re: Quick question for ODF experts @ Friday, July 21, 2006 12:56 PM

I think, Brian, that one of the things the ODF people are trying to do is avoid having multiple table models. As noted, tables have slightly different requirements in different applications. It'd be nice, though, if they all used the same basic model with perhaps a few attributes specific to the usage in question. That simplifies any code that works with tables since large portions will in fact be common to all usages and only the parts that vary need to be coded to a specific variation. Note that one of the goals is something you noted: you don't want to hold up development of one thing just because of conflicts with another. MS chose to do that by using different models so one can be changed without affecting the others. ODF is choosing to use just one model, but to be careful about the base design and to avoid hamstringing, so that extensions can be done without breaking existing parts. This requires more up-front thought to avoid hamstringing, but once the first variation is done you end up with a large body of commonality that future extensions can take advantage of (e.g. a word processor program can in fact read a presentation table natively and get it mostly right without having to understand presentation tables explicitly).

I think a large part of your confusion is a radical difference in spec development methodologies. The waterfall model is to get the entire spec correct before releasing anything. The open-standards world (of which Unix and Open Source are subsets) takes the attitude of "Get the parts you understand well right and get them out there so you can get feedback, then move on to hashing out the parts you don't understand as well. Just make sure you think through the future parts enough to avoid making decisions now that'll hamstring you later.". Basically the word-processing portion of ODF is fully done, so it gets released as 1.0. Spreadsheet formulas may then be added and 1.1 released (which'll include all of 1.0 and be compatible with it). Native tables in presentations then get added in 1.2 (which'll include all of 1.1 and be compatible with it). People who don't care about spreadsheets or presentations, just word-processing documents, don't have to wait for parts of the spec that they, well, don't care about to be finished, and 1.1 and 1.2 can gain from any feedback about 1.0 that comes in before they're finished.

Todd Knarr

# re: Quick question for ODF experts @ Friday, July 21, 2006 1:46 PM

Thanks Todd, I actually do understand that approach and I personally don't have any problems with it. I do think that this is the cause though of some of the problems people are seeing with ODF files. For instance, the spreadsheet format is super slow, and I think it's because they had to make it compatible with the text table model. The stuff that we do in SpreadsheetML around super efficient XML and string usage is something you'd only want to do if you knew that your application could have tables with thousands of rows (something you'd never get in a wordprocessing or presentation application). If you look at the open source application Gnumeric, it opens Open XML files much faster than it opens ODF files.

All that said, I have no problems with the folks working on the OASIS spec. I know how much work it is and I respect it. The issue that I have a problem with is that there are organizations like the ODF Alliance who are pushing on governments to mandate ODF for document exchange. It would seem like this is putting the cart in front of the horse though since as you point out the spec really isn't done yet. Version 1.2 is slated for OASIS approval next fall (2007), and probably another 6 months to go through ISO. At that point it sounds like they will have some of the more glaring issues (like tables, formulas, etc.) covered. But to actually make policies around version 1.0 just doesn't sound like a good idea.

-Brian

BrianJones

# re: Quick question for ODF experts @ Friday, July 21, 2006 3:10 PM

Brian,

thanks very much for your quick answers and the insights you gave into the ECMA work. I learned that you guys are expecting heavy changes to the table models in “ECMA XML” in the future. This is why you want to keep the table models separate. That’s interesting to know!
So you are expecting “ECMA XML” to change in the future, right? Why then such comments about an “unfinished” OpenDocument specification?

Regarding your initial question: “Yes of course; OpenDocument has support for tables in presentations and the encoding will even be improved --- following an open process”.

~Florian

P.S. I'm really interested to hear what changes you expect for tables that make you keep the models separate. Will they change dramatically? But I understand if you consider this to be a business secret and can't talk about it.
Thank you so much for your sincere answers.

florian

# re: Quick question for ODF experts @ Friday, July 21, 2006 3:11 PM

Thanks for this kind question.
Sure; OpenDocument allows tables in presentations.

OpenDocument allows tables in presentations encoded in the following way:
<draw:frame>
 <draw:object xlink:href="Link to OpenDocument spreadsheet table" />
</draw:frame>
For convenience the next version of OpenDocument will also allow tables in presentations to be encoded as follows:
<draw:frame>
<table:table>
....
</table:table>
</draw:frame>


Talking about tables I have a kind question for the “ECMA Open XML” experts:

         Why do you need so many table models in your spec?

I have counted three different table models so far, but I’ve only quickly scanned the 4000-page spec. Would it not be better to use one table model, as done e.g. in OpenDocument?

florian

# re: Quick question for ODF experts @ Friday, July 21, 2006 3:58 PM

I think any slowness in spreadsheets isn't due to the ODF format, Brian. At least the OO.o spreadsheet program doesn't use the XML as an internal representation (as far as I know), so the XML syntax shouldn't affect anything beyond loading and saving (where speed isn't horribly critical since you're waiting on I/O a lot anyway). I suspect this separation between external storage format and internal working representation isn't the way MSOffice applications work (Office's XML formats, old and new, both look to my eye to be a straightforward XML encoding of a COM object hierarchy).

There's another rule that ODF is following: never optimize a general-purpose spec for a specific purpose. That kind of optimization leads to hamstringing: the optimizations tend to make it hard if not impossible to use for any other purpose, and you have to either suffer or create Yet Another Format for new purposes. The latter tends to be a headache down the road as more variations get introduced and formats for them multiply and other applications need to have more and more formats added to understand other applications.

Todd Knarr

# re: Quick question for ODF experts @ Friday, July 21, 2006 4:40 PM

--- Florian,

Did you mean to post that same comment twice?

I understand that you can have a spreadsheet embedded in a presentation. That is actually far different than native table support in a presentation though.

Of course the Ecma spec is going to evolve. I don't know how it will evolve, that's up to Ecma TC45. It's not like we're going to finish up version 1.0 and then just call it quits. As the different companies participating bring new ideas to the group we'll look at adding those into the spec. That's how you evolve software.

The problem with ODF is that version 1.1 isn't adding new functionality that wasn't thought of at the point of version 1.0. Version 1.1 and version 1.2 of ODF are actually still adding functionality like tables in presentations and formulas in spreadsheets. Those types of technologies have existed for over a decade, and it's probably a bit dishonest to claim that a spec is complete and ready for everyone to use when it's missing such key pieces. Once we hit version 1.2 we'll have to revisit this.

--- Todd,

I'm sorry, but I'm going to have to disagree with you on a few of your points there. First off, the ODF spreadsheet format is extremely slow to load. There have been a number of articles discussing this (http://blogs.zdnet.com/Ou/?p=196).

Second off, you should absolutely optimize your spec around specific purposes. Trying to design something completely generic doesn't do any good other than create a really simple lowest common denominator format that doesn't quite work in any specific situation. If you are building a format to represent spreadsheets, then you need to look at how spreadsheets are used. Go and tell the accountant or research analyst that the reason their spreadsheet file is so slow to open is that it was designed so that it could also be easily opened in a wordprocessing application and a presentation application. That's ridiculous. It's fine as a low level interchange format, but not for a full fledged default file format. You need to look at how people use spreadsheets and design around that. The same is true for wordprocessors and for presentations.

Now, if you look at the spec, there are a number of similar structures for tables between the three applications. This is especially the case between PresentationML and WordprocessingML because in those cases the user scenarios are often very similar.

-Brian

BrianJones

# re: Quick question for ODF experts @ Friday, July 21, 2006 5:25 PM

I had problems with the server. I got a "SERVER BUSY" message and then both comments appeared. Sorry. Using Mozilla; perhaps that's the problem ;-).
Thanks for taking so much time laying out your opinion (flavored with some politics :-)). Really would have liked to work together with you guys in the OpenDocument TC.

~Florian

florian

# re: Quick question for ODF experts @ Friday, July 21, 2006 5:41 PM

Hey Florian, I actually get that same error every once in a while too. So it's not just your browser. I've actually had times where I write up a big comment and when I submit I get that message (I usually lose the comment at that point though).

Sorry about the politics, there's so much out there at this point it's hard to avoid completely. :-)

-Brian

BrianJones

# re: Quick question for ODF experts @ Friday, July 21, 2006 6:38 PM

Brian,

I know you keep loving to state how "super-slow" ODF spreadsheets are. Do you have any numbers, other than George Ou's poor article?

Comparing Excel loading files to OpenOffice.org is hardly a fair comparison of the basic speed of the format: I'm sure Excel loads files extremely quickly no matter what they are.

It would be interesting if you actually had some numbers to back up your claims. Most XML parsers achieve a pretty decent speed, and parsing time is usually a really small proportion of the overall document load time.

Alex

# re: Quick question for ODF experts @ Friday, July 21, 2006 7:22 PM

Hi Alex,  
There are only a few things in this world that I love, and talking about spreadsheet performance definitely isn't one of them.

I assume that your mention of parsing times is in reference to my post about smaller tag length. Remember that as I mentioned in my post, that was just one of the many things we did to improve the performance of SpreadsheetML. The shared string table and shared formulas have a much larger positive impact on the load times than the tag length.

Now, I agree with you that comparing OpenOffice to Excel isn't really fair. The key is to find an application that supports both formats and compare them there.

I tried that out with Gnumeric, using a smaller version of the file George Ou used, and Gnumeric opened Open XML files faster than ODF files. That's not the best example, because Gnumeric's support for Open XML is still in its early phases (at least it was the last I checked), but it's still a pretty good initial measuring point.

-Brian

BrianJones

# re: Quick question for ODF experts @ Saturday, July 22, 2006 6:50 AM

Gnumeric's support for OpenDocument is in its early stages too, though. It also depends on how the application's support has been developed - e.g., within Microsoft Office, ODF can never, ever beat OpenXML simply because the plugin translates the ODF into OpenXML first.

I think we agree that parsing longer tags isn't necessarily much of a win, shared string support I don't know about. It would be interesting to look at numbers there too - it seems to me like that's kinda trying to optimise the parsing stage again.

A smaller filesize is a win in two ways: getting the file off the disk in the first place (less IO), and then having less to parse later. Now, I would suspect that the file size saving isn't going to be huge, because the optimisation of strings is something pretty straight-forward that Zip does for you already. With quick harddrives, I would think that even if the difference were somehow tens of megs or something, that's still not a huge amount of time added on to the load time.

So we're virtually back again to "How quickly does my XML parser parse?", and I guess "How much memory do I need to store my document tree?".

Given the different ways you can parse and store information coming out of XML files, it's so implementation-dependent that I think it's very hard to sensibly call one format faster than the other.

That's not to say I don't think Excel will probably blow most ODF apps away: I'm sure it will be an extremely fast program, especially since the file format has been designed for it. I'm just not sure it says much more than "Excel is very quick" - eg., I would be interested to see if Gnumeric loads OXML anything like as quickly as Excel.

Alex
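
For anyone who wants to test Alex's point about Zip rather than argue it, here is a rough way to measure it in Python. This is a sketch under loud assumptions: the <c>, <v> and <si> tags are simplified stand-ins and not either format's real schema, the sample strings are invented, and zlib's DEFLATE is used as a proxy for what a Zip container does to each part. It reports sizes only and says nothing about parse time or memory.

```python
import zlib


def inline_xml(cells):
    """Every cell carries its own copy of the string."""
    return "".join(f"<c><v>{value}</v></c>" for value in cells).encode("utf-8")


def shared_xml(cells):
    """Distinct strings are stored once; cells carry an index into that list."""
    table, index_of, refs = [], {}, []
    for value in cells:
        if value not in index_of:
            index_of[value] = len(table)
            table.append(value)
        refs.append(index_of[value])
    strings = "".join(f"<si>{s}</si>" for s in table)
    body = "".join(f"<c><v>{i}</v></c>" for i in refs)
    return (strings + body).encode("utf-8")


def sizes(data):
    """Return (raw byte count, DEFLATE-compressed byte count)."""
    return len(data), len(zlib.compress(data))


if __name__ == "__main__":
    # Invented sample data: two labels repeated many times.
    cells = ["Northern Region Sales", "Southern Region Sales"] * 500
    for name, builder in (("inline", inline_xml), ("shared", shared_xml)):
        raw, packed = sizes(builder(cells))
        print(f"{name}: raw={raw} bytes, compressed={packed} bytes")
```

Swapping in data with few or no repeated strings shows how much the result depends on the content.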

# re: Quick question for ODF experts @ Saturday, July 22, 2006 3:18 PM


Brian Jones said "The shared string table and shared formulas have a much larger positive impact on the load times than the tag length. "

Gratuitous claim. Whether that's effective is directly related to the content. If you have no duplicate strings, or no strings by the way, then shared strings buy you nothing.

Are we in 2006, Brian? How come Excel's new XML is just BIFF surrounded by angle brackets? How come developers will have to manipulate indexes (for instance to add a shared string) if XML is a first-class citizen? From a modern programming angle, shared strings shouldn't exist because they are just an artefact of optimization. As long as it was buried inside BIFF and well taken care of by the Excel run-time, everything was fine. But now that this is getting exposed directly in every developer's face, this becomes a hassle. While shared strings are optional, I don't understand why they exist in the first place in the new XML. Maybe you can give a clue or two?

Mike
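
To make the index bookkeeping under discussion concrete, here is a minimal Python sketch of the general shared string table idea: each distinct string is stored once and cells hold an index into that list. The function names build_shared_strings and resolve are invented for the example, and this is not the SpreadsheetML schema itself, only the technique in its simplest form.

```python
def build_shared_strings(cells):
    """Return (table, refs): the distinct strings in first-seen order,
    and one index into that table per input cell."""
    table = []
    index_of = {}
    refs = []
    for value in cells:
        if value not in index_of:
            index_of[value] = len(table)
            table.append(value)
        refs.append(index_of[value])
    return table, refs


def resolve(table, refs):
    """Turn the indexes back into the original cell strings."""
    return [table[i] for i in refs]


if __name__ == "__main__":
    repeated = ["East", "West", "East", "East"]
    table, refs = build_shared_strings(repeated)
    print(table, refs)

    # With no duplicates the table is as long as the cell list,
    # so the indirection saves nothing.
    unique = ["a", "b", "c"]
    table, refs = build_shared_strings(unique)
    print(len(table) == len(unique))
```

The second case is the one Mike raises: with no duplicate strings, the table buys nothing and the developer still has to maintain the indexes.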

# re: Quick question for ODF experts @ Saturday, July 22, 2006 3:31 PM


Finishing my previous post, shared strings actually bear a long standing design flaw. They are used both to anticipate the storing of possible duplicate strings, and to store formatting runs (aka rich strings).

It's important to understand that your development environment makes a strong difference. If you are using Automation, you are totally insulated from all this, and thus can concentrate on your business. If you are directly accessing the xml, you have to deal with all of this, which is error prone (document corruption will be rampant), and shared strings is just one example where indexes and other things have to be handled by the developer.

Why, oh why? Accessing the xml (in fact, a nice acronym for the serialized BIFF surrounded by angle brackets) is just masochism. Don't get me wrong, Custom XML has virtues, but that's a totally different topic (and a Office 2003 feature by the way).

Mike

# re: Quick question for ODF experts @ Sunday, July 23, 2006 3:55 PM

Hi Brian.  Your question stopped me because of something I recalled from the early days when the OASIS 1.0 specification was first available.  There was a claim that ODF had this great uniformity and reusability of elements across document applications.  I thought that was great, but then I couldn't understand why there are so many file extensions (and Zip-embedded MIME types).  So, I figure, maybe that variety is just a hint as to what application should open the top level, with it likely that there's an integrated application underneath those different views into documents.  That was my thought.

So today I checked the specification to see what is going on with that.  There is a very useful (Non-Normative) Appendix D on Core Features Sets in the ODF 1.0 Specification.  It has this marvelous first paragraph:

"The OpenDocument specification does not specify which elements and attributes conforming application must, should, or may support. The intention behind this is to insure that the OpenDocument specification can be used by as many implementations as possible, even if these applications do not support some or many of the elements and attributes defined in this specification.  Viewer applications for instance may not support all editing relates elements and attributes (like change tracking), other application may support only the content related elements and attributes, but none of the style related ones."

This is not completely problematic, but it reaffirms that there is no minimal core to ODF, not even a level that one could declare being honored for purposes of interchange.  I really do hope this will be addressed.

Meanwhile, the table in that Appendix makes it clear that table elements are not part of the (non-normative) Core Features Set for Presentation documents.

On the other hand, the "strict" schema in Appendix A does not appear to differentiate among type of document at all, which is moderately marvelous.  I can't find presentation as a distinct XML format at all.  However it is a distinct "document type" and section 2.3 has this interesting description:

"All document types share the same content elements, but different document types place different restrictions on which elements may occur, and in what combinations. The document content is typically framed by a prelude and epilogue, which contain additional information for a specific type of document, like form data or variable declarations."

My first thought at this point was "Lordy, this thing is way more broken than I could have possibly imagined."  How are they ever going to accomplish interchange and preservation of documents in civil government in ways that allow them to have multiple suppliers?  How are State and local level IT organizations going to be able to deal with the reality of what is about to happen?  It may take the National Archivist to step in, and they've got enough thankless tasks.

I'm sure that I'm over-reacting, but I remember a senior software architect (Dick Wilson) who taught me an important principle, sometime around 1975 or so: Don't start out taking architectural positions that you might have to take back later.  Start with constraints that you can always preserve and, if possible, broaden later when you're sure about how things are working out.  I guess I should call this Dick Wilson's "When the horse is out the barn door" principle.

It will be valuable to see how the ODF development process manages to preserve validity of ODF 1.0 documents as some sort of profiling or other agreement is done as ODF 1.x moves forward.  It appears that it would be worthwhile to have an ODF-I group whose task is not unlike the WS-I effort to create profiles for interoperability.

Dennis E. Hamilton

# re: Quick question for ODF experts @ Monday, July 24, 2006 9:51 AM

Come on Brian, must you always spin these details? Re: this in particular:

"Wouldn't it be more accurate though to say that the current form of the spec is more like a version 0.5? "

This is a ridiculously over-the-top comment. Sure tables in presentations ought to be there, and if MS had been involved with the TC, they would have been. You could have raised the issue, proposed a solution, and it would have been there.

AFAIK, apps like KOffice and OOo don't support this functionality in presentations, so it's not any huge surprise the developers didn't consider it a top priority. But the accessibility SC raised the issue, and it will be addressed shortly enough, along with formulas, and the metadata stuff I'm involved in.

You can nitpick all you want, but we all know these are imperfect and evolving specs. For another view from the other side, read Rob Weir's blog.

http://www.robweir.com/blog/

He works for IBM and has a stake in all this too, but I think he has some good points and suggestions about OXML.

Bruce

# re: Quick question for ODF experts @ Monday, July 24, 2006 5:16 PM

"Wouldn't it be more accurate though to say that the current form of the spec is more like a version 0.5?"

Brian, you are being too kind. Looks more like version 0.2 to me.

No tables in presentations, no formulas in spreadsheets, no conformance criteria... Every day it seems that Sun and IBM are really trying to pull a fast one on several governments worldwide. Please keep it coming, someone needs to expose this.

Fernando

# re: Quick question for ODF experts @ Tuesday, July 25, 2006 2:16 AM

It is spin indeed. MSXML has 4000 pages of documentation! Mostly documenting the way that MS Office does things. It seems to be a spec that no one is going to try to reimplement in an office application (especially one competing with MS Office), and it is certainly not designed with simplicity or readability in mind. So many of the benefits of XML as an interchange document will not be achieved. It is a step in the right direction, but as a customer I would be better served by a solid single Office XML format rather than replaying the format wars. It's just not in Microsoft's (short-term) interest to do that.

Oscar

# re: Quick question for ODF experts @ Tuesday, July 25, 2006 2:21 AM

Fernando, as discussed in previous posts in this blog, the design goals were significantly different. RTF hasn't got all the features of MS Word, but it works as a (virus-free) exchange document and is far more stable than MS Word docs as a positive side effect. It also probably serves what 95% of users of Word do with it. Also, it would be a good thing to let governments and citizens decide for themselves, and certainly not to spread misinformation. There should be a blog about statements from MS about competitors that are plain FUD or deliberately incorrect.

Oscar

# re: Quick question for ODF experts @ Tuesday, July 25, 2006 1:08 PM

Oscar, yes the goal for a standard would be to allow free document exchange. Would you say that ODF as it is today satify this goal? Clearly not.

"There should be a blog about statements from MS about competitors that are plain FUD or deliberately incorrect."

Are you suggesting that Microsoft should follow IBM's lead and create its own version of Groklaw?

Fernando

# re: Quick question for ODF experts @ Tuesday, July 25, 2006 2:39 PM

Oscar, your comment that Microsoft is "Mostly documenting the way that MS Office does things." has helped me clarify my understanding of the contentious issues about document format standards.

To my mind, the ODF folks have not properly appreciated the significance of the fact that a document format needs to be judged, at least in part, in relation to the *capabilities* of the application that created it.  Once that is appreciated, it becomes possible to introduce two important concepts needed as part of the tools for judging a document standard: *Completeness* and *Fidelity*.

Completeness means that the document format can represent all the different aspects of the document (structure, presentation).  For example, if a presentation application allows for tables in its presentations, then to be complete, the document format needs to be able to represent that there are tables in the presentation.  Fidelity goes further, and requires that the representation be 100% faithful to the original document's structure and presentation (as expressed in the application that originally created the document).

The importance of completeness and fidelity is best understood by considering one important use of a document format: archiving.  An archived document needs to be able to be changed back at any time (through viewers or document translation software) into a 100% faithful representation of the original.

To see why, consider the case of a government or company that wants to archive a legal document.  The meaning of that legal document can be affected by things you might consider unimportant for other documents, e.g., placement of commas or use of italics.  If the archived document is in a document format that is not faithful to the original application, then lawsuits might result because someone could claim they were damaged by the fact that the archived document did not accurately represent the intentions of the original document.

Now it becomes understandable why Microsoft is "Mostly documenting the way that MS Office does things." They don't just want an open document format.  They also want completeness and fidelity with respect to documents produced in Microsoft Office applications.

There is nothing wrong or sneaky about this attitude.  In fact, ODF supporters *also* want completeness and fidelity with respect to a small number of other office applications that they are associated with (not necessarily legally or organizationally, but emotionally and in terms of mindset).  Those office applications are clones of a small subset of Microsoft Office's capabilities -- the most commonly used parts, of course.  Now, ODF by their own admission is probably a few years away from completeness or fidelity with respect to these particular applications.  But when they do achieve it, they will rightly be able to claim that the enhanced ODF provides a complete and faithful representation for documents produced by those *specific* office applications, and it is therefore safe to archive them in ODF format, or to interchange documents between these specific office applications.

That does *not* mean that the future ODF will be able to completely and faithfully represent documents produced in Microsoft Office applications. (The claim on their web site that they can do this is incorrect.)  The reason is practical: To be complete and faithful to Microsoft Office documents, the ODF folks would have to enhance ODF so much that it would be a virtual clone of OpenXML from Microsoft.

Nowadays, the overwhelming majority of office documents are created in Microsoft Office (except for PDF documents).  I hold the common view that this is not going to change in the foreseeable future.  That means that in real life, ODF will never be able to properly represent anything other than a very small fraction of office documents.  That also means a fairly marginal future for ODF (in my opinion).

Ian Easson

# re: Quick question for ODF experts @ Tuesday, July 25, 2006 5:17 PM

Ian,

First of all thanks for your comment.

I don't consider myself part of the "ODF folks". I use and appreciate both MS Office and Open Office in Windows. Because I use both (some to a higher degree than others) I do not believe the statement that "the applications are clones of a small subset of MS Office capabilities". Both have advantages and disadvantages. Excel is clearly faster and more developed than Calc, but I actually use and prefer Writer as an application to MS Word.

Going back to your comment, are you claiming that current versions of Word do provide a complete and faithful representation of Word documents created in previous versions of the application and/or on other platforms (e.g. Apple)? This is not the case in my experience, and if you look around on the web there are plenty of similar experiences. On the other hand, the current version of Excel has heavily and repeatedly corrupted critical spreadsheets on a crash, and I have only been able to recover them with Calc.

In fact I doubt any program can claim complete and faithful representation of a document other than presentation-only formats like PDF (and possibly the MS version of PDF) or a printout, unless you open them on exactly the same computer that produced them.

What I want is to be in control of my data now and in the future, and being able to edit it with the program of my choice. I can take a JPEG picture with any digital camera and manipulate it with the software that better suits me. This is also the case for HTML but definitely not for MS Office documents.

Your last paragraph is very telling of the conflict: "In real life ODF will never be able to represent anything other than a very small fraction of office documents". I take it you mean MS Office documents. It seems that to you the value is in the format itself; to me as a user it is in what the format does or doesn't enable me to do.

Fernando: if by free document exchange you mean gratis, then yes, it enables me to share documents freely (i.e. I think the standard version of Office for normal users is overpriced). If you mean that the current version of ODF enables me to share documents, I'd have to speak about specific implementations/programs, and about who I want to share documents with. I take your point that ODF is not complete as a spec (e.g. formulas), but at the same time the new version of Office is not a market reality either, and it won't be for a while in terms of the number of documents in that format.

Oscar

# re: Quick question for ODF experts @ Tuesday, July 25, 2006 6:06 PM

If the purpose of a document format is fidelity, then fidelity to what?  If I save a document in WordML, is fidelity judged by how well it holds up under Word, or any program?  And which version of Word?
If I save documents in WordML but do not save a version of Word that reads it well, I do not have fidelity.  The format without the program is not complete.
My big gripe with all of the Microsoft XML work is that they have not removed their products from the format, so far.  Word compensates for invalid WordML.  It does not tell you this, it just does it.  One of the reasons that WordML to PDF translators aren't doing better than they are is due to this marriage of format to program.
I would applaud any group that would produce a format with true fidelity - meaning that the program was not required to get it.

Rick

# Table models in file formats @ Tuesday, July 25, 2006 10:03 PM

In my post last week about the lack of table support in ODF, some folks were curious as to why the Ecma...

Brian Jones: Open XML Formats

# re: Quick question for ODF experts @ Wednesday, July 26, 2006 9:18 AM

Rick, you seem to have misinterpreted some of what I said.  But I certainly agree with your last sentence.

You said: "If the purpose of a document format is fidelity..."  I said it was one of "two important concepts needed as part of the tools for judging a document standard".

You ask "then fidelity to what?".  I already answered that by saying "to the original document's structure and presentation (as expressed in the application that originally created the document)."  I did not mean to imply by that last phrase that it had to be the *original* application that viewed the document, although I can see how it might have been misinterpreted the way it was worded.  Let me explain further.

What I have in mind (and so does Microsoft, because they have stated so) is that with an open document specification, *anyone* is free to develop a viewer application which would then be able to render the document in full fidelity to what it looked like in the original application (e.g., Microsoft Office or OpenOffice or whatever.)  That is an answer to Microsoftphobes who either:
a) Don't want to buy Microsoft Office to be able to view documents produced by it; or
b) Believe that Microsoft is going to screw them up by changing document formats in the future, rendering all OpenXML documents unviewable or inaccessible. (Why they think Microsoft would be so stupid I have no idea!); or
c) Believe that Microsoft will disappear off the face of the earth (as they passionately hope it will), leaving OpenXML documents unviewable or inaccessible except to those people who have purchased Microsoft Office.

The model I have in mind (and although they haven't explicitly said so, I believe Microsoft has it in mind, too) is the (Adobe Acrobat, Adobe Reader) one.  In that model, you have an original application (Acrobat) that allows the user to produce files in an open document format (PDF).  You also have the viewer program (Reader) that renders in *full* fidelity the contents of the original, without the need for the Acrobat program that originally produced it.  Also, since the PDF format is published, *anyone* can produce alternate Reader-like viewers (and they have!).  Now just substitute the words "Acrobat" -> "Microsoft Office" and "Reader" -> "OpenXML Viewer", and you see what I mean.

It doesn't have to be Microsoft that makes an "OpenXML Viewer".  They might do it, they might not; it's irrelevant.  What the ECMA *does* need to come up with is conformance criteria/tools, so that OpenXML viewers can get certified as conforming to the standard.

Continuing the model further:
- Anyone today can write programs that manipulate PDF files and incorporate them into a business process.  Microsoft has stated that they want and expect this to be the case for OpenXML documents.
- There are various versions (1.0, 2.0, etc.) of the PDF specification, and the PDF viewers and processing programs are able to handle the version issue.  Likewise, the ECMA OpenXML specification they are currently completing will be version 1.0. There *will* be future versions.  That will not break things, any more than going to enhanced versions of the PDF specification broke programs.  Those programs just got enhanced to handle new versions of the specification.

You raised the question of which version of Office I was talking about.  Since Microsoft has said it will be releasing free add-ons for Office 2003, XP, etc. that will handle OpenXML, the question is moot.  The legacy binary ".doc" format doesn't enter into the discussion.  (Of course, Microsoft would be stupid not to release a *batch* converter!)

You said that "Word compensates for invalid WordML. It does not tell you this, it just does it."  That's news to me.  Do you have a reference?  If so, Word 2007 needs to have an additional mode in which it strictly interprets the file according to the standard.  That will be needed especially after external programs get written which modify or create WordML documents.

Finally, you said: "I would applaud any group that would produce a format with true fidelity - meaning that the program was not required to get it."  Then you should applaud the ECMA in a few months.  As for ODF, hold your applause for a few years, according to their own statements.

Ian Easson

# re: Quick question for ODF experts @ Wednesday, July 26, 2006 2:52 PM

Ian - didn't mean to misinterpret.

We have been working with WordML since its inception in a commercial product.  I have also had a few conversations with Brian (and others) about Microsoft's direction with future Office XML.  I will admit to being one of those that want MS to succeed in making a very good XML format.  I am less worried about it becoming an international standard, but I see how it will help.

My gripe with MS has been in how it interprets WordML.  The example I can give you is a simple one, but still telling.  It is possible to state a size for a header or footer that is not big enough to hold the text and/or graphics placed in it.  Now, give this WordML to multiple viewers, including Word.  Some will show the header and correctly show that it is wrong.  Some will refuse to show the file at all, saying the header is broken.  Word will show the header as it should have been had it been done correctly.  Which was correct?

With this next go, I would like something more rigid.  If it was the fault of WordML being too ambiguous, then I want the ambiguity removed.  If some form of XML tester existed, you could at least ask "Is this OK?", sort of like lint used to be for the C language.  There's no question that the XML schema will be complex enough to warrant one.
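To make the "Is this OK?" idea concrete, here is a minimal sketch of a lint-style check for the header/footer case described above. It assumes a made-up, heavily simplified XML layout (the element and attribute names `header`, `footer`, `line`, `declaredHeight` and `height` are hypothetical and are NOT real WordML), and it only illustrates the kind of cross-field rule such a tester could report instead of silently compensating.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for illustration only; this is NOT real WordML.
SAMPLE = """
<document>
  <header declaredHeight="20">
    <line height="12"/>
    <line height="12"/>
  </header>
  <footer declaredHeight="30">
    <line height="12"/>
  </footer>
</document>
"""


def lint_regions(xml_text):
    """Return warnings for headers/footers whose content exceeds the declared height."""
    root = ET.fromstring(xml_text.strip())
    warnings = []
    for region in root:
        if region.tag not in ("header", "footer"):
            continue
        declared = int(region.get("declaredHeight"))
        # Sum the heights of the lines placed inside this region.
        needed = sum(int(line.get("height")) for line in region.findall("line"))
        if needed > declared:
            warnings.append(
                f"{region.tag}: content needs {needed} but only {declared} is declared"
            )
    return warnings


if __name__ == "__main__":
    for warning in lint_regions(SAMPLE):
        print(warning)
```

A checker like this reports the mismatch and leaves the decision to the author, rather than rendering the header "as it should have been".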

Rick

# re: Quick question for ODF experts @ Wednesday, July 26, 2006 5:34 PM

I know nothing about the details you're talking about Rick (though I've had it confirmed to me by people who would know), but I wonder if any ambiguity in the schemas is more a limitation of XML Schema? There are a lot of practical, real world, constraints that simply can't be expressed with the language. For those cases, you either need to use Schematron alongside XML Schema, or use a better schema language, like RELAX NG.

Bruce

# Spin Spin Sugar @ Thursday, July 27, 2006 9:27 AM

OK, forgive the random Sneaker Pimps reference and I promise we will move off this topic of ODF politics...

Brian Jones: Open XML Formats

# re: Quick question for ODF experts @ Saturday, July 29, 2006 8:01 AM

Ian,

I don't consider myself a "Microsoftphobe" just because I want a wide choice of software available. I probably would call myself "promiscuous". :-). Anyway, to your list I would add:
d) Believe their data is their own and do not want to be restricted in the application they use to edit it.

Oscar

# What is Rob Weir (and IBM's) Agenda with the OOXML Bashing? @ Tuesday, January 23, 2007 1:35 PM

Dare Obasanjo aka Carnage4Life

# Dare Obasanjo aka Carnage4Life - What is Rob Weir (and IBM's) Agenda with the OOXML Bashing? @ Tuesday, January 23, 2007 1:35 PM

PingBack from http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=ac147359-114a-42d2-9ec4-64aa599dec58

Dare Obasanjo aka Carnage4Life - What is Rob Weir (and IBM's) Agenda with the OOXML Bashing?