I had a few people point me at a couple of IBM blogs today (Bob Sutor and Rob Weir), and I have to admit I was a little disappointed to see that they are really working hard to continue to push negative views of the Office Open XML formats. Basically they want to position it in such a way that there is a winner and a loser, and it's no surprise that they think the winner should be the one they've put all their resources behind (ODF). It's definitely a strong "us vs. them" mentality that you also see a lot in politics these days. I admit I've pushed back in the other direction at times and had some criticisms of the Open Document format, but those have always been in response to folks who asked why we couldn't use ODF as the default format for Microsoft Office. I had always stated that we needed a format that could fully support all of the features our customers used, and when the ODF folks snapped back saying that I wasn't providing enough concrete examples, I decided to start providing specific problems. I've never said the world can't use ODF; I've just said that the Office Open XML formats are also necessary. I feel like some of these folks have watched Highlander one too many times (hence the title of this post). I would never make the claim that the HTML format means that ODF isn't necessary, and I certainly don't believe that ODF means that Office Open XML isn't necessary.

The latest criticism from Bob and Rob is that the Open XML formats don't use MathML, and instead define a separate XML syntax for a Math Presentation format. Rob even displayed a bit of a flair for the dramatic, titling his post "Math you can't use", and Bob followed up with "Making bad choices, over and over again." Well, thankfully this isn't really true, and to be honest, if posts like that aren't considered 'FUD' I don't know what is. Every piece of the Office Open XML format is being fully defined in Ecma, and we've even built XSLTs that will transform from MathML and back. In addition to that, we've worked closely with different companies out there that already support MathML to make sure we are compatible with their solutions. We support MathML on the clipboard, so you can paste a MathML equation into Word. Here is the latest version of the XSLT that takes the Office Open XML format for Math and transforms it into MathML (http://jonesxml.com/resources/omml2mml.xsl), and here is the XSLT that goes in the opposite direction (http://jonesxml.com/resources/mml2omml.xsl). Anyone who has Beta 2 of Office 2007 should already have these on their machine under "Program Files\Microsoft Office\Office12".

Just in case folks aren't sure what I'm talking about, this is all about the presentation form of MathML. The math is never actually calculated, only displayed. Also note that this is different from the discussions around functions, which are a large part of the SpreadsheetML specification. Unlike the spreadsheet functions, the Math support is all around scenarios like academic papers that need to use formulas as part of the information they are presenting.
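
To make the "displayed, never calculated" point a bit more concrete, here is a rough sketch in Python (standard library only) that writes out Presentation MathML for the fraction (a + b) / 2. The function name is made up for illustration, and this is not how Word produces or consumes MathML. The element names (mfrac, mrow, mi, mo, mn) are standard Presentation MathML, and they only describe layout: nothing in the markup says the expression should be evaluated.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def presentation_fraction(left, right, denominator):
    """Build Presentation MathML for (left + right) / denominator.

    Hypothetical helper for illustration only. The markup describes how the
    fraction is laid out; it carries no instruction to compute anything.
    """
    math = ET.Element("math", {"xmlns": MATHML_NS})
    frac = ET.SubElement(math, "mfrac")      # a fraction: numerator, then denominator
    top = ET.SubElement(frac, "mrow")        # groups the numerator into one unit
    ET.SubElement(top, "mi").text = left     # identifier
    ET.SubElement(top, "mo").text = "+"      # operator
    ET.SubElement(top, "mi").text = right    # identifier
    ET.SubElement(frac, "mn").text = denominator  # number
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(presentation_fraction("a", "b", "2"))
```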

I remember a few years ago having a discussion with Murray Sargent, who was one of the key folks behind the new math support in Office 2007. He also had worked on the MathML 2.0 standards body before it was dissolved, and we talked about whether or not we could use MathML for the formats. He obviously was very familiar with the MathML format, and the conclusion was that we unfortunately couldn't use MathML in our new default XML formats. We found that while MathML works great for isolated math islands, it didn't give us everything we needed at the document-level. Although MathML does have space for annotations so we could have extended it, that would not have worked well with document-level features like comments, track changes, Word styles, etc. The equation support in Word 2007 is actually very impressive, and if you haven't taken a look yet I strongly suggest you give it a try.

We did agree though that we should fully support MathML as an interoperability language between apps, which is why we can read and write Presentation MathML on the clipboard (leveraging those XSLTs).

This is just another example of the difficult decisions we had to make when building these new formats. Of course we would have loved to have just used MathML, as it was already fully designed and documented. It would have been much easier, but it would have also meant we would have to either cut back the functionality, or extend it in ways that made it less usable. If you ever used the HTML formats from prior versions of Office, you've seen that when you take a format that was designed for other purposes and add extensions so that it can represent your files, you often end up with a rather complex and unmanageable result. So instead, we used MathML as a guide, and tried to leverage as much of the design as we could. We had to make sure we could support our features though and not let the format put the end user in a bad state. Most of our users don't care the least bit about XML and XML formats, and if moving to the new file formats meant things like tracked changes wouldn't work on the equations, then folks would have chosen to stick with the binary formats instead. So we instead have an XML format that supports all of the features, and that format is fully documented and free for anyone to use. Not a bad deal in my view. I can't say enough how proud those of us who worked on the formats are. It's such an important change in the world of Office documents.

-Brian

This is the fifth post by Zeyad Rajabi, who owns the XHTML output from Word's new blogging feature. In earlier posts, Zeyad discussed a general overview of the XHTML, details on XHTML compliance, how we map styles to semantics, and bullets and numbering. Today Zeyad is discussing the ways in which we output tables.

Today I will be talking about tables in our blogging feature. Similarly to other supported Word features, our XHTML output for tables is not full fidelity. Word 2007 provides you with a rich editing experience that allows you to create a multitude of different types of tables. Our blogging feature only supports a handful of the possible types of tables that can be created in Word 2007.

Below is a table that enumerates all the table related HTML elements and the possible attributes and CSS properties that can be output for those elements.

 

HTML Elements   HTML Attributes                    CSS Properties
table           border, style                      border-collapse, background
colgroup        -                                  -
col             style                              width
tbody           valign                             -
tr              style                              background, height
td              colspan, rowspan, style, valign    background, border-bottom, border-left, border-right, border-top, padding-bottom, padding-left, padding-right, padding-top

 

I think the easiest introduction to our XHTML output for tables is to show a few examples.

Example 1:

This example is the default table style for Word.

 One  Two
 Three  Four

 

Our XHTML output for the table can be found here.
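
The linked output isn't reproduced here, so as a stand-in, here is a rough Python sketch (standard library only) that emits a table using only the elements, attributes, and CSS properties listed earlier. The function name, the column width, and the specific style values are assumptions picked for illustration; this is not the markup Word actually generates.

```python
import xml.etree.ElementTree as ET


def build_xhtml_table(rows, col_width="100px"):
    """Return XHTML for a simple grid, restricted to the listed markup.

    Hypothetical helper: the attribute values are illustrative guesses,
    not Word's real output.
    """
    table = ET.Element("table", {"border": "1", "style": "border-collapse:collapse"})
    colgroup = ET.SubElement(table, "colgroup")
    for _ in rows[0]:
        # One <col> per column, sized through the CSS width property.
        ET.SubElement(colgroup, "col", {"style": "width:" + col_width})
    tbody = ET.SubElement(table, "tbody", {"valign": "top"})
    cell_style = (
        "border-top:solid 1px;border-bottom:solid 1px;"
        "border-left:solid 1px;border-right:solid 1px;"
        "padding-left:7px;padding-right:7px"
    )
    for row in rows:
        tr = ET.SubElement(tbody, "tr")
        for text in row:
            td = ET.SubElement(tr, "td", {"valign": "top", "style": cell_style})
            td.text = text
    return ET.tostring(table, encoding="unicode")


if __name__ == "__main__":
    print(build_xhtml_table([["One", "Two"], ["Three", "Four"]]))
```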

Example 2:

This example shows shading applied to cells vs. rows.

February
M    T    W    T    F    S    S
                              1
2    3    4    5    6    7    8
9    10   11   12   13   14   15
16   17   18   19   20   21   22
23   24   25   26   27   28

 

Our XHTML output for the table can be found here.

Example 3:

This example shows the difference in fidelity when publishing from Word:

In Word In Browser

 

Our XHTML output for the table can be found here.

Example 4:

This example shows different borders being applied to different cells:

August 2006
Sun  Mon  Tue  Wed  Thu  Fri  Sat
                         1    2
3    4    5    6    7    8    9
10   11   12   13   14   15   16
17   18   19   20   21   22   23
24   25   26   27   28   29   30
31


Our XHTML output for the table can be found here.

Comments are welcome

This question has come up a few times, most recently over on the OpenXMLDeveloper site (http://openxmldeveloper.org/forums/477/ShowThread.aspx#477)

The challenge a lot of folks have is that they want to generate a WordprocessingML document using pre-existing content. Often that content is in other formats, like HTML. This is also the case if you have folks entering rich content in a web form or some other type of HTML control, and then you want to use that content to generate a WordprocessingML document. While there are tools out there that will transform from HTML into WordprocessingML, this is also easily achievable using the altChunk element.

You can place one or more XHTML files as separate parts in the ZIP package, and give each the proper content type. Then create a relationship to each one from the document.xml part. Once you've done that, you can place the altChunk element (which is a block-level element) into the content of the document, and reference the relationship ID that you used to point at the XHTML part. You also have the option to specify whether you want the styles to be merged with the document, or if you want it to maintain the source formatting.

So, for example, you could have the following:

<document>
  <body>
    <p><r><t>Here is some WordprocessingML followed by some XHTML:</t></r></p>
    <altChunk r:id="rel7"/>
    <p><r><t>Here is some more WordprocessingML</t></r></p>
  </body>
</document>

The relationship type is: http://schemas.openxmlformats.org/officeDocument/2006/relationships/afChunk

The content type for html is: application/html

With the example above, the content of the HTML file that was referenced by the altChunk tag would show up directly inline after the first paragraph. Now, you should note that this is an import only feature. Once the file is opened, the XHTML content is merged with the rest of the file, and when you save, it will be represented with wordprocessingML rather than XHTML.
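
For anyone who wants to try the packaging steps end to end, here is a rough Python sketch using only the standard library. It is a sketch under assumptions, not a reference implementation: the function name and the part name chunk.html are made up, the namespace URIs are the ones from the published Open XML standard (which this post predates), and the relationship type and content type are taken as written above, so a given version of Word may expect slightly different values.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs from the published standard (an assumption; the post predates it).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# Values as quoted in the post; a real consumer may require different ones.
ALT_CHUNK_REL_TYPE = R_NS + "/afChunk"
HTML_CONTENT_TYPE = "application/html"


def build_docx_with_html(html, rel_id="rel7"):
    """Return the bytes of a minimal package whose body references an HTML part."""
    content_types = (
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        f'<Default Extension="html" ContentType="{HTML_CONTENT_TYPE}"/>'
        '<Override PartName="/word/document.xml" ContentType="application/'
        'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
        '</Types>'
    )
    package_rels = (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{R_NS}/officeDocument" '
        'Target="word/document.xml"/>'
        '</Relationships>'
    )
    document = (
        f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}"><w:body>'
        '<w:p><w:r><w:t>Here is some WordprocessingML followed by some XHTML:'
        '</w:t></w:r></w:p>'
        f'<w:altChunk r:id="{rel_id}"/>'  # block-level pointer to the HTML part
        '<w:p><w:r><w:t>Here is some more WordprocessingML</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    # The relationship lives next to document.xml and maps rel_id to the HTML part.
    document_rels = (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="{rel_id}" Type="{ALT_CHUNK_REL_TYPE}" '
        'Target="chunk.html"/>'
        '</Relationships>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", content_types)
        package.writestr("_rels/.rels", package_rels)
        package.writestr("word/document.xml", document)
        package.writestr("word/_rels/document.xml.rels", document_rels)
        package.writestr("word/chunk.html", html)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_docx_with_html("<html><body><p>Imported content</p></body></html>")
    with open("altchunk-demo.docx", "wb") as handle:
        handle.write(data)
```

The only things that tie the pieces together are the relationship ID (used once in document.xml and once in the relationships part) and the content type registered for the HTML part.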

This was something I really wanted us to support with the 2003 XML formats when we did the cfChunk work. The cfChunk is extremely useful, and the altChunk builds off of it.

-Brian

There are quite a few online labs for Office, but I thought I'd point out this one in particular: Programmatic Manipulation of the Microsoft Office Open XML Formats Virtual Lab

I have to admit that I haven't tried this one out myself yet, so I'll be really interested to get feedback from folks who give it a try. I remember being asked to give feedback on it awhile ago, and they gave me a basic description of what it would cover. That was actually a good while ago though, but I do remember that it looked pretty good. If there is content missing, let me know.

If you like it, you should check out this link for a list of other Office labs: http://msdn.microsoft.com/virtuallabs/office/default.aspx Notice that there are a few labs that show you more about the XML support in Office 2003 as well.

-Brian

I still get folks asking me questions about the licensing of the Open XML formats from time to time, and it seems there is a lot of misinformation out there. It's actually been well over 7 months since we made the move away from licensing the formats and instead just provided a general commitment to not enforce any IP behind the formats. The legal term for this new commitment is CNS (covenant not to sue). This allows anyone to develop against our formats without having to worry about patents, and it's irrevocable (meaning it can't be changed in the future).

The CNS is available up here (http://www.microsoft.com/office/xml/covenant.mspx). Recently we took an additional step to help people who don't want to deal with parsing legal documents, and actually asked an outside law firm (Baker & McKenzie) to do a study for us on both the standardization as well as the CNS. I think any of you folks who've been frightened by some of the FUD that has been spread about the Ecma Office Open XML formats should take a look: (Link)

Some good takeaways I wanted to call out were:

  • "In this case, the CNS is a unilateral statement to the world about Microsoft's future behaviour towards the enforcement of its patent rights contained in the Schema. While the covenant governs Microsoft's future behavior, it is retrospective in effect, applying to any past uses of the Schema that may have been in actual or potential breach of the terms of the preceding Patent License."

     

  • "By stating that the covenant is 'irrevocable', Microsoft has protected users against a change in company policy at any point in the future."

     

  • "The CNS is therefore considerably more favourable to a person relying on it, than any form of patent licence because it does not impose positive restrictions on beneficiaries' activities as a condition of relying on it."

     

  • "Microsoft's CNS is similar to a covenant issues by Sun Microsystems Inc., in September 2005, in respect of any patents that it hold in respect of the Open Document Format ('ODF') for Office Applications (OpenDocument) v1.0 Specification ('Sun's Covenant')."

     

  • "The CNS does not affect users' rights to create their own applications using the Schema specifications. For example, there are no restrictions in the CNS that would prohibit third parties from incorporating the standard into applications they create and distribute in source code form, or for other hardware or operating-system platforms. Such applications, developed by third parties, will generally be subject to separate legal agreements, licences and covenants that the developers of those applications may impose, such as Sun's Covenant in respect of ODF. "

     

  • "Any such restrictions will be determined by the development and licensing practices of the third-party developer, not by Microsoft; and this will be as true for applications developed under the ODF standard as it is for applications incorporating the Open XML Schema standard."

Have a great weekend everyone!

-Brian

Here's a post from Eric White where he provides some code samples for using XLinq to parse a WordprocessingML document: http://blogs.msdn.com/ericwhite/archive/2006/08/01/685535.aspx

Here's the description of what Eric was trying to accomplish:

"Recently, I had a problem where there wasn't a code testing harness that would do exactly what I wanted. I want to grab my code snippet directly from my word document, compile it, run it, and validate the output.

In more technical terms, I want to parse some WordML to grab text formatted with a given style. Further, I want to put a comment on the first line of the formatted text, and be able to grab the comment. The comment will contain the metadata that tells how to compile and run the code.

My word docs are stored in WordML (which is XML). My experiment was to see how easy it would be to pick apart the WordML using XLinq. This is the result.

First, I needed to see what the WordML looked like. If you open a WordML file, it is saved without any indenting, making it difficult to see the element tags, and the structure of the document. So I used the following program to indent the file: ..."
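
Eric's samples use XLinq, and you should read his post for the real thing. Just to give a feel for the core idea (pull out the text of every paragraph that carries a given style), here is a rough equivalent in Python using only the standard library. The function name and the "Code" style name are made up for illustration, and the sketch assumes the Word 2003 WordprocessingML namespace and its w:p / w:pPr / w:pStyle / w:t structure.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace (assumed to match the documents Eric describes).
W = "http://schemas.microsoft.com/office/word/2003/wordml"


def paragraphs_with_style(wordml, style):
    """Return the text of each paragraph whose paragraph style equals `style`."""
    root = ET.fromstring(wordml)
    found = []
    # iter() walks the whole tree, so paragraphs nested in wrapper elements are included.
    for paragraph in root.iter("{%s}p" % W):
        style_element = paragraph.find("{%s}pPr/{%s}pStyle" % (W, W))
        if style_element is None or style_element.get("{%s}val" % W) != style:
            continue
        # A paragraph's text is the concatenation of all its w:t runs.
        found.append("".join(t.text or "" for t in paragraph.iter("{%s}t" % W)))
    return found


SAMPLE = (
    '<w:wordDocument xmlns:w="%s"><w:body>'
    '<w:p><w:r><w:t>Some ordinary prose.</w:t></w:r></w:p>'
    '<w:p><w:pPr><w:pStyle w:val="Code"/></w:pPr>'
    '<w:r><w:t>Console.</w:t></w:r><w:r><w:t>WriteLine("hi");</w:t></w:r></w:p>'
    '</w:body></w:wordDocument>'
) % W

if __name__ == "__main__":
    for line in paragraphs_with_style(SAMPLE, "Code"):
        print(line)
```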

-Brian

There are now a handful of code snippets from Wouter available over on the openxmldeveloper.org site that you should check out. He's been pulling these together over the past couple weeks, and they are really looking good.

Here's what he has up there so far:

It sounds like his next step will be to get a bit more detailed on WordprocessingML tables. The OpenXMLDeveloper folks are also asking anyone who has ideas for additional snippets they'd like to see to send in suggestions: http://openxmldeveloper.org/contact.aspx

-Brian

There is a new article up on the OpenXMLDeveloper.org site showing how to build a basic PresentationML file from text entered into a control. It lets you create numerous slides and specify the title and body for each slide. Check it out: http://openxmldeveloper.org/archive/2006/08/01/424.aspx

There have been some similar scripts out there that folks have posted for creating a WordprocessingML document, but this is the first one I've seen for creating PresentationML documents. Very cool.

-Brian

Well, it was all the way back in March when we had the Office Developer's conference out here in Redmond. It was a lot of fun, as we actually had a track dedicated to the Open XML formats. There were three separate presentations and we were able to get into some pretty good details.

Now, almost 5 months later, we finally have the videos from the conference available online. They are all pretty raw, and definitely unedited, but I figured you might still be interested in taking a look. Here are the three file format talks if you are interested:

  1. FF301—New Office XML File Formats – This is an overview of the file formats that I gave to kick off the File Format track. It's a little over an hour.
  2. FF302—New Office XML File Formats (Schemas) – This talk goes into the three different formats (WordprocessingML; SpreadsheetML; and PresentationML). I have to apologize up front for butchering the presentationML talk. Shawn unfortunately had to cancel his trip up at the last minute and so I subbed in for him. :-)
  3. FF303—New Office XML File Formats (Solution Development) – In this talk, Kevin goes into some of the tools that will be provided to help make programming against the formats a bit easier. This is about 3 months prior to all the code snippets being uploaded though, so there's definitely a lot more content now than when Kevin first gave the talk.

Another interesting talk was: Word 2007 XML Programmability: Data/View Separation and Rich Eventing for Custom XML Solutions - For those of you interested in learning more about content controls and custom XML mapping in Word, you should check out Tristan's talk. He talks about how to set up XML mappings, as well as the powerful additions made to the Object Model for content controls.

-Brian

This is pretty cool: http://www.developwithoutborders.com/Default.aspx

There are a lot of ways that we get involved with charities at Microsoft. I really love how easy they make it for us to pick from just about any charitable organization out there and specify a certain percentage of our pay that we want to go to that charity. For instance, with just a few clicks on the giving site I can easily give directly to the specific Boys and Girls club where my younger brother works. On top of that, Microsoft matches everything that we give, so I can pretty quickly make an impact on the charities that mean most to me. While I take a lot of pride in my contributions, I always feel like I should figure out ways to give more, and it's always difficult to choose the different organizations.

Well the link I provided above is a pretty cool way that you can help out your favorite charity, and as a result you could win additional awards for them. There is a contest that is going on from now until October 1, and the prizes include hardware, software, service contracts, and of course a lot of potential PR. If you or any of your friends help out with charities you know that PR alone is sometimes one of the most important pieces. Even if you don't win though, it's a great chance to help out.

If you decide to participate, and you build a solution that leverages the new file formats or custom defined schema support, let me know. I'd love to take a look!

-Brian

OK, forgive the random Sneaker Pimps reference and I promise we will move off this topic of ODF politics we've had the past week or two, but I wanted to call out something that Stephen McGibbon pointed out to me today. He mentioned this blog post he made on Monday entitled Spinning out of Control. Stephen pointed out that in the press release for the ISO approval of ODF, the following statement was made:

Billions of existing office documents will be able to be converted to the XML standard format with no loss of data, formatting, properties, or capabilities. This will facilitate document contents access, search, use, integration and development in new and innovative ways.

Now, I'm not sure if this was just an exaggeration, or if they meant that ideally in future versions of ODF it will be the case. It's clear though that as the spec stands now, it's not the case. There are clearly a number of areas either left unspecified, or specified to a more limited level than what people are already doing today in their documents. I'm not talking about future innovations, but basics that have been around for years. I know that pushers of ODF like to say this is just FUD, but really it's just a fact. Look at the spec. If the goal is to guarantee perfect fidelity with the existing base of Microsoft Office documents (which would be implied by the "billions of documents" statement), then there is still a long way to go.

Now, maybe fidelity with the existing base of Microsoft Office documents was a non goal. In reading through the newsgroups, it's pretty clear that the initial goal of ODF was mainly targeted around fidelity with the existing OpenOffice 1.1 format that was created by Sun. This is stated pretty clearly by David Faure who is a voting member on the OASIS Open Document Technical Committee:

The format is heavily based on the requirements, constraints, and experiences of *Sun* customers and KOffice users and developers though, and nothing says that those requirements are totally different. But for sure we didn't target *Microsoft*'s customers. The art of implying something without actually saying so...

"Almost no material changes" is certainly exaggerated, but yes, ODT is mostly bsaed on OO-1.1, it wasn't completely redesigned;

I think the key here is for everyone to just be clear on the goals. The ODF format is based on Sun's StarOffice, and Open XML was based on the Microsoft Office formats. Both have the goals of being open, both have been submitted to standards bodies, and both have a commitment from the donating companies (Sun and Microsoft) that there will be no licensing restrictions and anyone is allowed to freely use the formats. A big difference though is that the ODF folks took a slightly different approach as far as when to declare draft 1.0 complete. There are even features that OpenOffice supports that aren't yet defined clearly in the spec. The Ecma draft on the other hand pretty clearly defines everything, which then allows people to implement as much or as little of it as they want.

A recent statement that really left me scratching my head around this though was made by Gary Edwards up on Stephen's blog post. You may remember Gary as the guy who was under the impression that there was a mythical binary key in the Office XML formats. Gary is a member on the ODF Foundation and has been talking a lot about the add-in they built to open and save ODF in Microsoft Office. I still haven't had a chance to look at the add-in, as it's been kept pretty secret, but Gary has really promised a lot. Here is what he said on Stephen's blog about ODF not being full fidelity with the existing base of documents:

You're wrong. The OpenDocument Foundation plug-in will deliver near perfect fidelity for ODF documents produced by MSOffice. Our fidelity is near identical to the fidelity achieved when converting MS binaries to MOOX.

Maybe you need to pay more attention to the trials going on in Massachusetts. Oh, that's right. Microsoft isn't participating in those trials. Based on the piss poor fidelity of your translator project, i wouldn't participate either if that was the best i could do.

The truth is that it doesn't matter to us if it's billions of documents or ten documents. If that document can be loaded into any version of M$Office from 1997 to 2007, we can convert it with near perfect fidelity. At least as good as your own conversion within MSOffice to MOOX.

Perhaps you need to worry more about your own credibility than that of the ODF Community. We're doing just fine thank you,

Oh yeah, one other thing. Accessibility add ons to MSOffice work just fine with ODF. There is no performance differential between ODF and MOOX within MSOffice worth worrying about. There is no differential in how accessiblitiy applications are handled. So what was your complaint again?

~ge~

I really don't understand this. First off maybe he isn't aware that the translator project we announced is currently in a very early prototype stage and is completely open source. It will continue to improve over the coming months. I understand people usually expect stuff that we announce to be further along, but we wanted this to be done in the open so anyone could comment and contribute.

I also thought that everyone was in agreement that the ODF format was not yet to a point where it could fully represent the existing base of Office documents, but Gary seems to say their tool can somehow get around this limitation. I don't know how deep Gary has looked into this, but it's simply not possible unless he and the ODF Foundation have already added significant extensions to the ODF standard. I haven't seen these new extensions documented anywhere. The OASIS ODF technical committee says it's still over a year away from defining spreadsheet functions and tables in presentations, and there is no mention of solutions to the international numbering issues or even simple things like character highlighting.

Gary also doesn't seem to understand the performance problems with ODF. It has nothing to do with performance once the file is loaded. The problems are with how long it takes to read and write ODF files since they decided to use a generic table model to represent full spreadsheets.

So, while I think the ODF spec is a great representation of the OpenOffice file format, it's just not anywhere close to the Ecma spec in terms of representing Microsoft Office documents. And since we already have billions of documents in that format and hundreds of millions of customers, we absolutely have to keep our focus on the Ecma spec for now. We are also helping to build transformations between the two formats, which really helps to show the beauty of working with documented, open, XML formats.

-Brian

In my post last week about the lack of table support in ODF, some folks were curious as to why the Ecma Open XML formats have three different table models. I explained that when you are designing a file format, you need to examine closely the target user scenarios of the applications that will use those formats.

Obviously the use cases for a table in a spreadsheet are different from those around tables in presentations or wordprocessing documents. For instance, it's not much of a stretch to imagine a table of data in a spreadsheet with 50,000 rows and 200 columns. That would never happen in a presentation though. A table in a presentation is much more heavily focused on the layout and formatting of the data (same case with a wordprocessing document).

I think a great example of why you often need different table models would be the ODF spec itself. It was too difficult to map their existing table model to the presentation format, so instead of working through that issue they left it out of the spec. Otherwise the spec would have just stated that the table model applied to all three formats. As it stands right now, the only way to get a table in a presentation is to embed a spreadsheet. The plan is that in V 1.2 (which is still over a year out) they will have support for spreadsheet formulas and presentation tables. One could argue over whether it would have been better to actually finish the spec before submitting it to ISO and creating organizations like the ODF Alliance, whose purpose is to push for policies that mandate ODF, but maybe that's just me ;-).

Looking at the table models, I do think the ODF guys made a big mistake in the design of their spreadsheet format. They chose to make the table model for wordprocessing documents and spreadsheet documents the exact same (but it looks like it's still different from HTML or CALS). Now I do understand that this level of commonality is the nirvana for most folks and I also had the goal of making the Open XML formats as close to this ideal as possible (it's something we actually looked really hard at doing when we first started working on the Open XML formats). The problem is that for the same reason you often need different user interfaces in the different applications, you also need a different file format at times. Sure there are plenty of concepts that can be reused (such as basic formatting), but a spreadsheet grid is different from a presentation table. Otherwise you're stuck with a format that sells everyone a bit short as it's the greatest common factor of all the applications, and isn't optimized for the unique customer scenarios.

If you've looked at the Ecma spec, you can see that we had to diverge in the table design for the Office Open XML formats. The use of tables in wordprocessing documents and presentations is very similar, and as a result the table models in those formats are very similar. In a spreadsheet though, you have to account for much larger sets of data, and at that point the efficiency with which you write out that information can have a much more significant impact on the amount of time it takes to actually parse the files.

So, is the SpreadsheetML format super easy? Well, that depends on who you ask. For people that have developed against the old binary formats, things will be unbelievably easier and more reliable. But for folks who've primarily used table models like HTML, there will be a bit of a learning curve. That's why the file format documentation that we're doing in Ecma is so important. It will empower anyone to program against these files. We could have gone with a simpler, more verbose table model, but that would have been to the detriment of every user out there. Most people don't care about their file format; they just want things to work. As I said in an earlier post, we had to take the training wheels off, but we're going to be there with you as you learn to ride on your own.

-Brian

I must be reading this wrong, but in the ODF spec for tables it says the following:

"This chapter describes the table structure that is used for tables that are embedded within text documents and for spreadsheets."

Could it really be the case that ODF doesn't allow tables in presentations? I know that OpenOffice's presentation application "Impress" doesn't allow for native tables, but I had assumed the ODF format wouldn't have the same restrictions as OpenOffice.

Could someone more familiar with ODF help clear this up for me?

-Brian

There are a number of great sources of information on the new Office Open XML formats out there. I like to think that this blog is one of those sources, but there is also the OpenXMLDeveloper.org community, and the Ecma TC45 site. Another great source of information though that I always forget to mention is actually the official MSDN site for Office XML development: http://msdn.microsoft.com/office/tool/xml/default.aspx

There is a lot of good content up there, and it will continue to grow as we move closer to RTM. One article that you guys might want to check out is: “Walkthrough: Word 2007 XML Format” written by Erika Ehrli. It shows you how to build your own WordprocessingML document from scratch, as well as provides an overview of things like the Open Packaging Conventions (parts, relationships, content types), and the custom XML data store.

-Brian

Doug Mahugh has another post on programmatically generating a basic Office Open XML file. This latest post shows how to create a simple SpreadsheetML file:

This post covers the code for a CreateXlsx program that creates a simple Open XML spreadsheet from scratch using the .NET Framework 3.0 packaging API (System.IO.Packaging), as well as two of the Open XML code snippets that are available on MSDN.  Full source code for this sample is provided in the attached ZIP file.

While Doug's code uses System.IO.Packaging, you could also do the same thing with any XML and ZIP library. There was an example up on OpenXMLDeveloper.org that demonstrated how to manipulate the files using Java code. Since the Open Packaging Conventions are going to be part of the Ecma spec, it shouldn't take much time for folks to build tools like System.IO.Packaging to make it that much easier to develop on top of the file formats.
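
To illustrate the "any XML and ZIP library" point, here is a rough sketch of the same idea in Python using only the standard library. This is not Doug's CreateXlsx code. The function names are made up, and the namespaces and content types are the ones from the published standard, which may differ from what the current beta expects.

```python
import io
import zipfile
from xml.sax.saxutils import escape

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
CT_PREFIX = "application/vnd.openxmlformats-officedocument.spreadsheetml."


def column_letter(index):
    """Convert a zero-based column index to a letter reference (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def sheet_xml(rows):
    """Serialize rows of numbers and strings as a worksheet part."""
    out = ['<worksheet xmlns="%s"><sheetData>' % MAIN_NS]
    for r, row in enumerate(rows, start=1):
        out.append('<row r="%d">' % r)
        for c, value in enumerate(row):
            ref = "%s%d" % (column_letter(c), r)
            if isinstance(value, (int, float)):
                out.append('<c r="%s"><v>%s</v></c>' % (ref, value))
            else:
                # Inline strings avoid needing a separate shared strings part.
                out.append('<c r="%s" t="inlineStr"><is><t>%s</t></is></c>'
                           % (ref, escape(str(value))))
        out.append("</row>")
    out.append("</sheetData></worksheet>")
    return "".join(out)


def create_xlsx(rows):
    """Return the bytes of a minimal one-sheet spreadsheet package."""
    parts = {
        "[Content_Types].xml": (
            '<Types xmlns="%s">'
            '<Default Extension="rels" '
            'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Override PartName="/xl/workbook.xml" ContentType="%ssheet.main+xml"/>'
            '<Override PartName="/xl/worksheets/sheet1.xml" '
            'ContentType="%sworksheet+xml"/>'
            '</Types>' % (CT_NS, CT_PREFIX, CT_PREFIX)),
        "_rels/.rels": (
            '<Relationships xmlns="%s"><Relationship Id="rId1" '
            'Type="%s/officeDocument" Target="xl/workbook.xml"/></Relationships>'
            % (PKG_REL_NS, REL_NS)),
        "xl/workbook.xml": (
            '<workbook xmlns="%s" xmlns:r="%s"><sheets>'
            '<sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>'
            % (MAIN_NS, REL_NS)),
        "xl/_rels/workbook.xml.rels": (
            '<Relationships xmlns="%s"><Relationship Id="rId1" '
            'Type="%s/worksheet" Target="worksheets/sheet1.xml"/></Relationships>'
            % (PKG_REL_NS, REL_NS)),
        "xl/worksheets/sheet1.xml": sheet_xml(rows),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml in parts.items():
            package.writestr(name, xml)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("demo.xlsx", "wb") as handle:
        handle.write(create_xlsx([["Item", "Count"], ["Widgets", 3]]))
```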

-Brian
