Brian Jones: Open XML Formats

Sample code for generating a SpreadsheetML file

posted Tuesday, July 18, 2006 10:57 AM by BrianJones

Doug Mahugh has another post on programmatically generating a basic Office Open XML file. This latest post shows how to create a simple SpreadsheetML file:

This post covers the code for a CreateXlsx program that creates a simple Open XML spreadsheet from scratch using the .NET Framework 3.0 packaging API (System.IO.Packaging), as well as two of the Open XML code snippets that are available on MSDN. Full source code for this sample is provided in the attached ZIP file.

While Doug's code uses System.IO.Packaging, you could also do the same thing with any XML and ZIP library. There was an example up on OpenXMLDeveloper.org that demonstrated how to manipulate the files using Java code. Since the Open Packaging Conventions are going to be part of the Ecma spec, it shouldn't take much time for folks to build tools like System.IO.Packaging to make it that much easier to develop on top of the file formats.
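To make the "any XML and ZIP library" point concrete, here is a minimal sketch in Python using only the standard library's zipfile module. It is not Doug's CreateXlsx code. The function names (build_xlsx, sheet_xml, column_letters, relationships) are made up for illustration, and the namespace and content-type strings are assumed to be those of the final Ecma-376 standard, which may differ from the Beta 2 drafts.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Assumption: namespaces and content types follow the final Ecma-376 standard;
# the Beta 2 drafts discussed in this post may have used different values.
DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
PKG = "http://schemas.openxmlformats.org/package/2006"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
SML_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml"

CONTENT_TYPES = (
    f'{DECL}<Types xmlns="{PKG}/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    f'<Override PartName="/xl/workbook.xml" ContentType="{SML_TYPE}.sheet.main+xml"/>'
    f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{SML_TYPE}.worksheet+xml"/>'
    "</Types>"
)

WORKBOOK = (
    f'{DECL}<workbook xmlns="{MAIN}" xmlns:r="{DOC_REL}">'
    '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>'
)


def relationships(rel_type, target):
    """Build a relationships part holding a single relationship, rId1."""
    return (
        f'{DECL}<Relationships xmlns="{PKG}/relationships">'
        f'<Relationship Id="rId1" Type="{DOC_REL}/{rel_type}" Target="{target}"/>'
        "</Relationships>"
    )


def column_letters(index):
    """Convert a zero-based column index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    letters = ""
    index += 1
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def sheet_xml(rows):
    """Write numbers as plain values and everything else as inline strings."""
    out = [f'{DECL}<worksheet xmlns="{MAIN}"><sheetData>']
    for r, row in enumerate(rows, start=1):
        out.append(f'<row r="{r}">')
        for c, value in enumerate(row):
            ref = f"{column_letters(c)}{r}"
            if isinstance(value, (int, float)):
                out.append(f'<c r="{ref}"><v>{value}</v></c>')
            else:
                text = escape(str(value))
                out.append(f'<c r="{ref}" t="inlineStr"><is><t>{text}</t></is></c>')
        out.append("</row>")
    out.append("</sheetData></worksheet>")
    return "".join(out)


def build_xlsx(rows):
    """Return the bytes of a one-sheet package: content types, rels, and parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", relationships("officeDocument", "xl/workbook.xml"))
        package.writestr("xl/workbook.xml", WORKBOOK)
        package.writestr(
            "xl/_rels/workbook.xml.rels",
            relationships("worksheet", "worksheets/sheet1.xml"),
        )
        package.writestr("xl/worksheets/sheet1.xml", sheet_xml(rows))
    return buffer.getvalue()
```

Writing the result of build_xlsx([["Item", "Qty"], ["Widgets", 3]]) to a file with an .xlsx extension should give a workbook with one sheet. Inline strings keep the sketch short; a real generator would more likely use a shared string table.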

-Brian

1 Comments
Filed Under: Office 2007, Excel, 3rd Party Tools

Politics behind standardization

posted Friday, July 14, 2006 6:00 PM by BrianJones

Stephen McGibbon has an interesting blog post based on his observations of the politics behind standardization. He's been involved in a lot of the ODF and Open XML discussions and started looking deeper into the reasons behind ODF going through ISO even though it was not yet feature complete.

I think there was a good amount of pressure to wrap things up and say that version 1.0 was ready to go. It will be interesting to see how much further they are able to get with version 1.2. It sounds like there will be a draft of that ready for review next summer and approved by OASIS in fall 2007 (http://lists.oasis-open.org/archives/office/200605/msg00005.html). I'm sure they will be able to make a lot of progress over the next year or so.

As I talked about earlier this week, it doesn't make sense for us (Microsoft) to join the ODF committee in OASIS. In the original post some folks felt that I didn't really clearly provide reasons why this was the case, but lower down in the comments I took another stab at trying to explain the reasons and I think most folks felt it helped to clear things up. Here is what I said (for those of you that don't feel like reading through comments).

"Bruce, Wouter, Simon,
I'm sorry if it appeared like I didn't answer the question. I thought I had actually made it clear why we weren't participating directly in the OASIS committee, but let me try to clear it up.

We ultimately need to prioritize our standardization efforts, and as the Ecma Office Open XML spec is clearly further along in meeting the goal of full interoperability with the existing set of billions of Office documents, that is where our focus is. The Ecma spec is only a few months away from completion, while the OASIS committee has stated they believe they have at least another year before they are even able to define spreadsheet formulas. If the OASIS Open Document committee is having trouble meeting the goal of compatibility with the existing set of Office documents, then they should be able to leverage the work done by Ecma as the draft released back in the spring is already very detailed and the final draft should be published later this year.

To be clear, we have taken a 'hands off' approach to the OASIS technical committees because: a) we have our hands full finishing a great product (Office 2007) and contributing to Ecma TC45, and b) we do not want in any way to be perceived as slowing down or working against ODF. We have made this clear during the ISO consideration process as well. The ODF and Open XML projects have legitimate differences of architecture, customer requirements and purpose. This Translator project and others will prove that the formats can coexist with a certain tolerance, despite the differences and gaps.

No matter how well-intentioned our involvement might be with ODF, it would be perceived to be self-serving or detrimental to ODF and might come from a different perception of requirements. We have nothing against the different ODF committees' work, but just recognize that our presence and input would tend to be misinterpreted and an inefficient use of valuable resources. The Translator project we feel is a good productive 'middle ground' for practical interoperability concerns to be worked out in a transparent way for everyone, rather than attempting to swing one technical approach and set of customer requirements over to the other.

-Brian"

Now, all that said, I think there are still plenty of ways we can help out the OASIS folks with the ODF format. The entire translator project is open source, so the conversion will be completely transparent and everyone will have the ability to benefit from what we discover as the transformations are built. In addition to that, as I've looked through our Ecma documentation, I've also been looking at the ODF spec as a point of comparison. As I come across areas that are either missing, or just not fully specified, I'll be sure to point them out on my blog. That should help them in creating a list of areas to improve.

I think we will really see some good discussions over the summer and fall. The Ecma spec is really getting close to completion, and we of course still have a large number of ways for the public to comment on the spec. Now with the open source translator project we'll all be able to clearly follow along with how the two formats compare and how you can go back & forth between the two.

BTW, for those interested in providing feedback on the Ecma spec, the main way to comment is via this e-mail address (mailto:ecmatc45feedback@ecma-international.org). You can also provide comments here on my blog and I'll pass them on. Of course the best approach though would be to join us in Ecma!

-Brian

7 Comments
Filed Under: Office 2007

New project for converting WordprocessingML to HTML

posted Thursday, July 13, 2006 10:53 AM by BrianJones

It looks like there is a new project starting up on the OpenXMLDeveloper.org community for transforming WordprocessingML into HTML using XSLT. The first article posted here (http://openxmldeveloper.org/articles/333.aspx) starts off just mapping into basic paragraph and text formatting, and it's super simple and straightforward.

I've been told that Sanjay plans to have some future updates to this soon that will start to include other functionality. It would be great to see who else is interested in getting involved here.
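For a rough idea of what a basic paragraph and text-formatting mapping involves, here is a small sketch. It is written in Python rather than XSLT, so it is not the code from the article. The function names are hypothetical, the namespace is assumed to be the one from the final Ecma-376 standard, and only paragraphs, bold, and italic are handled.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumption: the WordprocessingML namespace from the final Ecma-376 standard.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def run_to_html(run):
    """Map one w:r run to HTML, honoring only bold (w:b) and italic (w:i).

    Simplification: the mere presence of w:b or w:i is treated as "on";
    a w:val attribute that switches the property off is ignored here.
    """
    text = "".join(t.text or "" for t in run.findall("w:t", NS))
    html = escape(text)
    props = run.find("w:rPr", NS)
    if props is not None:
        if props.find("w:i", NS) is not None:
            html = f"<em>{html}</em>"
        if props.find("w:b", NS) is not None:
            html = f"<strong>{html}</strong>"
    return html


def document_to_html(document_xml):
    """Turn every w:p paragraph into a <p> element, one per line."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iter(f"{{{W}}}p"):
        body = "".join(run_to_html(r) for r in p.findall("w:r", NS))
        paragraphs.append(f"<p>{body}</p>")
    return "\n".join(paragraphs)


SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    "<w:r><w:t>Plain and </w:t></w:r>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>"
    "</w:p></w:body></w:document>"
)

print(document_to_html(SAMPLE))  # <p>Plain and <strong>bold</strong></p>
```

An XSLT version would express the same mapping as templates matching w:p and w:r; the structure of the transformation is the same either way.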

-Brian

1 Comments
Filed Under: Word, Office 2007, 3rd Party Tools

Word XHTML - Bullets and Numbering

posted Wednesday, July 12, 2006 3:34 AM by BrianJones

This is the fourth post by Zeyad Rajabi, who owns the XHTML output from Word's new blogging feature. In earlier posts, Zeyad discussed a general overview of the XHTML, details on XHTML compliance, and how we map styles to semantics. Today Zeyad is discussing the ways in which styles have been directly tied to specific XHTML tags.

Today will be a short post about lists in our blogging feature. Word 2007 provides you with a rich editing experience that allows you to create a multitude of different types of lists, from simple one-level lists and multi-level lists to custom-defined bulleted and numbered lists.

Given the time and resource constraints for our blogging feature, we decided to take a simpler route with lists. Our blogging feature only outputs two types of lists: unordered and ordered lists (we do not support definition lists). That is, we are only relying on the <ul> and <ol> HTML elements to render the look of lists, which leaves the rendering entirely up to the host browser.

For this release of the blogging feature we are not going to output the following CSS properties:

list-style
list-style-image
list-style-position
list-style-type

Not outputting these CSS properties limits the fidelity our blogging feature can support compared to the full power of the bullets and numbering feature in Word 2007.

Word 2007 allows you to define custom list styles, such as using the strings “Heading 1” and “Heading 2” to depict different levels in a list. Given that we only rely on the <ul> and <ol> HTML elements and not the CSS properties mentioned above, the blogging feature supports far fewer kinds of lists than Word 2007 does.

Sample Lists

Below is a collection of some example lists and the corresponding HTML output.

Simple flat numbered list

1. item 1
2. item 2
3. item 3

HTML:

<ol>
   <li>item 1</li>
   <li>item 2</li>
   <li>item 3</li>
</ol>

Simple flat bulleted list

- item 1
- item 2
- item 3

HTML:

<ul>
   <li>item 1</li>
   <li>item 2</li>
   <li>item 3</li>
</ul>

Nested bulleted and numbered lists

- Level 1 item 1
  - Level 2 item 1
  - Level 2 item 2
- Level 1 item 2
  1. Level 2 item 1
  2. Level 2 item 2

HTML:

<ul>
   <li>Level 1 item 1
      <ul>
         <li>Level 2 item 1</li>
         <li>Level 2 item 2</li>
      </ul>
   </li>
   <li>Level 1 item 2
      <ol>
         <li>Level 2 item 1</li>
         <li>Level 2 item 2</li>
      </ol>
   </li>
</ul>

Multilevel List

- level 1
  - level 2
    - level 3

HTML:

<ul>
   <li>level 1
      <ul>
         <li>level 2
            <ul>
               <li>level 3</li>
            </ul>
         </li>
      </ul>
   </li>
</ul>

Nested paragraphs

Item 1
Some text.
Item 2
Some text.

HTML:

<ul>
   <li>Item 1
      <p>Some text.</p>
   </li>
   <li>Item 2
      <p>Some text.</p>
   </li> 
</ul>

Nested paragraphs (w/o spaces)

Item 1
Some text
Item 2
Some text

HTML:

<ul>
   <li style="margin-top:0px;margin-bottom:0px">Item 1
      <p style="margin-top:0px;margin-bottom:0px">Some text.</p> 
   </li> 
   <li style="margin-top:0px;margin-bottom:0px">Item 2
      <p style="margin-top:0px;margin-bottom:0px">Some text.</p> 
   </li> 
</ul>
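The samples above all follow one pattern: every list becomes a plain <ul> or <ol>, and a nested list sits inside its parent <li>. As a rough illustration of that pattern (not Word's actual implementation), here is a short Python sketch. The render_list function and its tuple-based input format are invented for this example.

```python
from html import escape


def render_list(kind, items, indent=0):
    """Render a list using only <ul>/<ol> and <li>, with no list-style CSS.

    kind is "ul" or "ol". Each item is a (text, sublist) pair, where sublist
    is either None or another (kind, items) pair nested inside that item.
    """
    pad = "   " * indent
    lines = [f"{pad}<{kind}>"]
    for text, sublist in items:
        if sublist is None:
            lines.append(f"{pad}   <li>{escape(text)}</li>")
        else:
            sub_kind, sub_items = sublist
            lines.append(f"{pad}   <li>{escape(text)}")
            # The nested list is emitted inside the still-open <li>.
            lines.append(render_list(sub_kind, sub_items, indent + 2))
            lines.append(f"{pad}   </li>")
    lines.append(f"{pad}</{kind}>")
    return "\n".join(lines)


nested = ("ul", [
    ("Level 1 item 1", ("ul", [("Level 2 item 1", None), ("Level 2 item 2", None)])),
    ("Level 1 item 2", ("ol", [("Level 2 item 1", None), ("Level 2 item 2", None)])),
])

print(render_list(*nested))
```

Running it on the nested value reproduces the markup shown under "Nested bulleted and numbered lists". Because the bullet or number style is never written out, the browser's defaults decide how each level looks.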

Comments are welcome

Any comments or questions are welcome.

18 Comments
Filed Under: Word, Office 2007, Word HTML

More information on the Open XML translator and some questions answered

posted Tuesday, July 11, 2006 12:48 AM by BrianJones

There were a lot of great comments from last week's announcement about the creation of an open source project to transform between the Ecma Office Open XML formats and the OASIS OpenDocument format. Rather than respond to all the comments and questions directly, I thought it would be better just to write up another post to address the general themes people have raised.

Here are the main questions:

Will the translator only work with Office 2007?
Aren't there licensing differences that make ODF and Open XML incompatible?
Will the functionality be easy to find in the UI?
Doesn't this move contradict what you've been saying about Office not supporting ODF?
Will the Ecma Office Open XML formats still be the default in Office 2007?
Why don't you join OASIS and help improve where they are lacking?

What versions of Office will this work with?

Well, first you should all remember that we are making the new Open XML formats backward compatible and providing free updates to Office 2000, XP, and 2003 that will allow all three of those versions to consume and generate files in the Open XML format. The new tool, which is now an open project up on SourceForge, will convert from the Open XML format into ODF, which means that you can use this tool in combination with the free updates to read and write ODF in all those earlier versions of Office as well.

Aren't there licensing differences between ODF and Open XML?

Actually, this misunderstanding is the unfortunate result of a really strong push by folks who I don't believe quite understand the Open XML story. There are a handful of folks who blog a lot (primarily ODF supporters) who aren't up to speed on the latest policies around the Open XML formats.

Let's address this first misunderstanding. The formats are available without any licensing restrictions. Any IP (patents, etc.) that Microsoft may have behind the formats does not apply to folks who want to implement the formats, because Microsoft made a legal commitment to not enforce that IP. If you hear people complaining about licensing issues, they probably just aren't up to date.

Secondly, the formats are no longer owned by Microsoft; they are owned by Ecma International. They are fully documented and the spec is free to download. A large number of organizations (British Library; Apple; Novell; Microsoft; BP; Intel; etc.) have worked on ensuring that the documentation allows for cross platform implementation.

Will the functionality be easy to find in the UI?

Look for yourself, here's a screenshot of the current prototype:

It's directly exposed in the UI. We're even going to make it really easy to initially discover the download. We already need to do this for XPS and PDF, so we'll also do it for ODF. There will be a menu item directly on the file menu that takes you to a site where you can download different interoperability formats (like PDF, XPS, and now ODF).

Heck, if you wanted to be even more hardcore, the Office object model allows you to capture the save event. So if you wanted to you could make it so that anytime you hit save you always used the ODF format, just by capturing the save event and overriding it. I'm not expecting folks to do that, but it does show just how extensible Office really is.

Doesn't this move contradict what you've been saying about Office not supporting ODF?

I've been pretty clear that I thought third parties would come along and build ODF support into Office if there was interest. That was shown to be the case pretty early on, as there have been a couple of different projects announced over the past year. Ironically, one of the most high profile projects was announced by the OpenDocument Foundation but it has turned out to be pretty secretive and closed, which seems to go against all the goals of "openness". I've had folks ask me how they can get hold of it, but as far as I can tell only a select group of folks have been given access. I saw a quote saying they still hadn't decided if they wanted to charge for it or not, so that may still be holding things up.

With all the mystery around projects like that, we had a number of governments ask us to get involved and actually choose a project to back, as they wanted to know that if any of their constituents used the ODF format, they would be able to view those files.

I think this project is a great example of the openness of both of these formats. We are now going to have an open source implementation that everyone can use. It will of course be freely available to anyone, and will really help show how to use the two architectures of ODF and Open XML.

Will the Ecma Open XML format still be the default for Office 2007?

Yes, this is definitely still the case. While this new translator will help people read and write the ODF format in Office, it will also help make it clear to all why the Open XML format was necessary. The Open XML formats were designed to be 100% backward compatible with the existing set of Office binary formats, and that was really a goal that we can't compromise on. If we went with an XML format that resulted in data loss or poor performance, then the only people that would use it would be folks who actually cared about that specific file format. Since most of our users don't really care about file formats, we needed to create an XML file format that we knew everyone could use, otherwise most people would have just gone back to using the old binary formats, and that doesn't help anyone.

While the ODF format is great in terms of being an open XML format, it's lacking in a number of functional areas that make it not a realistic option for Office to use as a default format. For instance, the format for ODF spreadsheets is much less efficient than Open XML's spreadsheet format. I have a few posts talking about this (and I plan to cover it in greater detail as we move forward):

There are also a whole host of areas that are left unspecified in the spec (such as spreadsheet formulas), which would have meant we'd either need to extend the format, or wait for it to catch up (and it sounds like they are more than a year out for formulas in particular). There are a number of blog posts out there talking about the incompatibilities between the various applications that have implemented ODF, and a lot of that is due to the lack of clarity on some features in the spec. Look at this comment from the OpenDocument Foundation talking about KOffice's ODF support:

"Our tests show that OpenOffice and KOffice have some problems opening each other's OpenDocument files. Also, support for drawings is a bit incomplete."

The Ecma Open XML format is significantly further along in all of these areas, just look at the differences in the documentation of numbering formats, formulas, etc. The draft of the Ecma spec released back in the spring has over 160 pages on spreadsheet formulas; the ODF spec only has 1 page.

I don't want to be critical of ODF because I think it's great to see applications use open XML formats for their storage. I'm calling attention to these points because I think a lot of folks have mistakenly assumed that once there is a standardized office format, everything is set and you don't need another one. Unfortunately that's not the case, and I want everyone to understand why we couldn't use ODF as our default format. I have no problem with multiple XML standards for documents and I think this is definitely a case where an alternative is necessary. If a single XML file format were the way to go, then we would have just stopped with XHTML (or maybe DocBook).

Most of our customers actually do understand this, and contrary to the news being spread (primarily by people excited about the ODF format), most governments have not adopted policies around ODF exclusively but instead around open formats in general. Most of those governments have also expressed that once the Office Open XML format is approved by Ecma, it would also be viewed as an open format.

For example, the Belgian government is currently being described as "mandating ODF", but that's actually not the case. They even made a public statement last week, after we made the translator announcement, that clarified this. Here's a small blurb from that:

"The government’s choice for ODF is clear, but not exclusive." ..."If the OpenXML file format (Microsoft’s own contribution in the domain of open standard file formats) receives ISO approval as a standard, then this format will also be eligible for use in the administration of the Belgian government."

Why don't we join the OASIS technical committee to help them along?

I had a few folks asking this question (and saw it on a few other blogs as well). The standardization of the Ecma Office Open XML formats is really moving along well, but there is still a bit more work to do here to nail things down. If you've read through the latest draft (all 4000 pages), you've probably noticed how comprehensive a spec it really is. For example, there are over 160 pages on how spreadsheet formulas work, as compared to 1 page in the ODF spec. The ODF spec still has a lot of catching up to do, and according to this post they are still more than a full year from just getting in line on some of the basics (like formulas) that have existed in office documents for decades.

The Ecma Office Open XML spec on the other hand serves as a great base in terms of fully standardizing an XML format that is capable of representing the billions of Office documents that exist today. Once that's done, we (as a community) can then move forward and start to enhance it with new innovations. It's maintained by Ecma, and anyone can join and participate in the standard.

I think that anyone interested in helping to drive the future of office file formats should join us in Ecma and take advantage of the powerful framework for document formats that is being delivered. As I already pointed out, spreadsheet formulas, for example, are already close to being fully documented. The same is the case for all the international features and functionality (like the various numbering styles I'd mentioned before). If you don't have the time to participate directly in the working group, you can instead send direct feedback here: mailto:ecmatc45feedback@ecma-international.org

-Brian

24 Comments
Filed Under: Office 2007

Open XML Translator project announced (ODF support for Office)

posted Wednesday, July 05, 2006 10:03 PM by BrianJones

Today we are announcing the creation of the Open XML Translator project that will help translate between the Office Open XML formats and the OpenDocument format. We've talked a lot about the value the Open XML formats bring, and one of them of course is the ability to filter it down into other formats. While we still aren't seeing a strong demand for ODF support from our corporate or consumer customers, it's now a bit different with governments. We've had some governments request that we help build solutions so that they can use ODF for certain situations, so that's why we are creating the Open XML Translator project. I think it's going to be really beneficial to a number of folks and for a number of reasons.

There has been a push in Microsoft for better interoperability and this is another great step in that direction. We already have the PDF and XPS support for Office 2007 users that unfortunately had to be separated out of the product and instead offered as a free download. There will be a menu item in the Office applications that will point people to the downloads for XPS, PDF, and now ODF. So you'll have the ability to save to and open ODF files directly within Office (just like any other format).

For me, one of the really cool parts of this project is that it will be open source and located up on SourceForge, which means everyone will have the ability to see how to leverage the open architectures of both the Office Open XML formats and ODF. We're developing the tools with the help of Clever Age (based in France) and a few other folks like Aztecsoft (based in India) and Dialogika (based in Germany). There should actually be a prototype of the first translator (for Word 2007) posted up on SourceForge later on today (http://sourceforge.net/projects/odf-converter). It's going to be made available under the BSD license, and anyone can provide feedback, submit bugs, and of course directly contribute to the project. The Word tool should be available by the end of this year, with the Excel and PPT versions following in 2007.

There are a few other key points I want to call out:

Choice - It's always great to offer choices to customers, and as most people are aware, we have already built a number of formats into Office to meet different customer scenarios. The Open XML formats that are going to be the default in Office 2007 are going to be the most important in my mind. It's the Open XML formats that allow us to build the ODF support (and will open doors to a number of other formats as well). The PDF and XPS functionality would be another example of new formats we're providing this release.
Great example of Open XML development - The project is going to be an open source project located up on SourceForge, so that means anyone has the opportunity to take a look and see how it's done. This should help folks see what challenges are involved mapping from Open XML into ODF, and what tradeoffs will need to be made. We had a tool for the WordprocessingML format from Word 2003 that let you transform it into HTML, but it didn't go the other way. I think this new tool will be another great example of what you can do with these formats.
Interoperability - We've really been focusing on this a lot in the past year. I talked last month about our push towards interoperability by design. There is now a letter from Chris Capossela called "A Foundation for the New World of Documents" that's located up on the interoperability site that I'd encourage you to check out if you're interested in learning more (http://www.microsoft.com/interop).
Big challenges ahead - There are definitely going to be some challenges in this project, but I think that the approach of making it an open process will really help us achieve the best results. One area I'm going to be interested to follow is how to map features that aren't specified in the ODF spec. OpenOffice has actually made the decision to extend the spec in ways that don't actually appear to be allowed (like with numbering formats), and I'm not sure if that's the right way to go. I've seen a lot of problems when moving documents from OpenOffice to KOffice for example, and I'm sure these divergences from the spec don't help out. Is the right thing to extend in the same ways OpenOffice did, or is it best to wait for OASIS to release the next version of the spec and hope that it specifies some of those missing features? Nobody wants a format that's constantly changing, so if you do decide to extend the format like OpenOffice did, what happens when ODF 2.0 comes out and it specifies that feature differently from how OpenOffice did it? What about features that aren't in ODF or in OpenOffice? Should we create new extensions ourselves or just lose that information? It's going to be fun working with everyone to figure this stuff out.

Another cool piece of this is that it will also work in older versions of Office. This is because the tools leverage the Open XML support, and we're providing free updates to previous versions of Office that allow them to read and write Open XML. It's another great benefit of leveraging the Open XML formats for the tool.

So, this should be an interesting 2nd half to the year. We have the Ecma Open XML spec progressing rapidly; Office 2007 coming closer to shipping; and now an open source project to leverage the Open XML formats for interoperability. Sounds like fun... well at least to those of us who care about file formats!

-Brian

64 Comments
Filed Under: Office 2007, 3rd Party Tools

TechEd presentation on Office Open XML available for download

posted Wednesday, July 05, 2006 1:00 PM by BrianJones

For those of you who weren't able to attend my talk at TechEd 2006 on the file formats, you can actually view it online here: http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032297834&EventCategory=5&culture=en-US&CountryCode=US

The quality of the video isn't great, but it actually does a decent job capturing the demos. The background for the PPT slides doesn't come through too clearly though (but it's not really that big of a deal). You'll also have to deal with my monotone presentation style, but hopefully the content helps where I fall short. :-)

It looks like you need to provide a bit of information before you can download the video which is pretty lame, but for now I haven't found a way around that. Maybe I'll just take the video and have them post it up on OpenXMLDeveloper.org site so that you don't have to go through the extra hassle.

-Brian

4 Comments
Filed Under: Office 2007, Conferences

Sample code for creating a DOCX file

posted Thursday, June 29, 2006 1:08 PM by BrianJones

Doug Mahugh had a post a couple days ago where he provides some basic code that will generate a WordprocessingML document: http://blogs.msdn.com/dmahugh/archive/2006/06/27/649007.aspx

Some folks have been asking about examples of how to create a basic document with the required relationships, content types, and parts. Doug had actually blogged on Monday that he would put up examples for all three document types, and this is the first of those examples.
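For readers who want to see the three required pieces side by side, here is a minimal sketch in Python. It is not Doug's code, which uses System.IO.Packaging. The build_docx function is a made-up name, and the namespaces and content types are assumed to match the final Ecma-376 standard rather than the Beta 2 drafts.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Assumption: values below follow the final Ecma-376 standard.
DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# 1. Content types: declares what kind of data each part holds.
CONTENT_TYPES = (
    f'{DECL}<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# 2. Relationships: the package-level relationship that points at the main part.
ROOT_RELS = (
    f'{DECL}<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_docx(paragraphs):
    """Return the bytes of a package with one part: the main document."""
    # 3. Parts: here just word/document.xml, one run of text per paragraph.
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    document = f'{DECL}<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>'

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Saving the bytes returned by build_docx(["Hello, Open XML"]) under a .docx name should produce a one-paragraph document. Anything beyond plain text (styles, images, headers) would add more parts plus a relationships part for word/document.xml.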

Doug is very active on the OpenXMLDeveloper.org site, and it's great to see him pulling some examples together like this. Thanks Doug!

-Brian

0 Comments
Filed Under: Office 2007

Writing managed solutions for Office 2007

posted Tuesday, June 27, 2006 1:47 PM by BrianJones

Patrick Smith has a post up on Dave Gainer's blog talking about the updated PIA story for Office 2007: (2007 Microsoft Office System Primary Interop Assemblies)

-Brian

0 Comments

Back from Sapporo - tons of progress in Ecma

posted Monday, June 26, 2006 5:21 PM by BrianJones

Well, it has definitely been a pretty hectic couple of weeks, and it's going to take me a while to get caught up. I was in Boston two weeks ago for TechEd, and in Sapporo last week for Ecma meetings. Both were great trips, but it's nice to be home. The meetings in Sapporo were extremely productive, and you can actually read all about it in the status report filed by Tom Ngo from NextPage (http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50.htm).

Some of the key things I wanted to call attention to:

U.S. Library of Congress joins Ecma TC45 - This was really great news. We've already benefited significantly from the participation of Adam Farquhar from the British Library, and I'm really excited to have the Library of Congress on board too. Like the British Library, the Library of Congress cares deeply about archival and is particularly interested in the long term accessibility of the formats.
Progress on conformance definition - We've spent a lot of time debating how to best define conformance to allow for good interoperability while at the same time making it super easy to use just portions of the specification. We resolved a number of issues and I think we're really in a good spot here.
Progress on WordprocessingML issues - We've made a lot of progress working on the initial WordprocessingML documentation and are now able to drill into the various issues logged by the various members of the technical committee. I think everyone was excited as we were able to start closing down some of the older issues.
Java WordprocessingML to HTML converter - Toshiba gave us a demo of a WordprocessingML to HTML converter they've written in Java. I always get excited when I see tools built on top of the new formats. It's really one of the biggest differences between the old formats and the new. We'll see a lot more 3rd party solutions that were either not possible, or incredibly difficult with the old binary formats.
Schema visualization - Representatives from BP, StatOil, and Essilor went over some ideas for making the documentation and schemas easier to visualize. There are about 4000 pages of documentation right now, and we really want to figure out ways to make it easier to consume.

It really was a great few days, but I wish I'd had more time to explore the area. I lived in Okinawa, Japan throughout most of Junior High and High School, and this was my first time back since then. I really enjoyed Sapporo. The food was great, and of course you can't beat being that close to the Sapporo brewery. Toshiba was an outstanding host.

-Brian

24 Comments
Filed Under: Office 2007

Code snippets for working with the Open XML formats

posted Saturday, June 17, 2006 6:13 PM by BrianJones

This is my first time on an airplane with wifi, so I'm pleased to bring this news to you from somewhere over the Pacific.

I've been promising for the past year that one of the big things we're going to do this time around that we didn't do much of with the 2003 XML formats was to provide a whole bunch of examples. The openxmldeveloper.org site is going to be the best place for people to share their experiences and code, but it's also really important that we at Microsoft give examples of how to do various things.

With the 2003 formats, we had every element and attribute documented, but we didn't do a great job of showing how to actually use the formats. This time around, we want to provide examples that will provide good prescriptive guides on how to do various things with the files. We came up with a huge list of what we thought people would like to see, and it was pretty hard to narrow it down.

We now have the first set of examples, and they all work against the Beta 2 version of the file formats (they will also be updated to match the final versions once they are finished in Ecma). You can go grab them up here: http://www.microsoft.com/downloads/details.aspx?FamilyID=8d46c01f-e3f6-4069-869d-90b8b096b556&displaylang=en

The examples leverage the WinFX System.IO.Packaging interface, but they could also be mapped to function with other tools (just like the Java examples up on openxmldeveloper). You'll probably notice that the examples are some of the more basic ones that we could think of, but it made sense to use these as the starting point. We will most likely start building some more complex ones as well that leverage one or more of the initial examples as building blocks.
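To make the "other tools" point a bit more concrete, here is a minimal sketch in Python that uses nothing but a ZIP library and an XML parser. It is not one of the published snippets. It assumes the package and WordprocessingML namespaces from the later published standard (Beta 2 files may use different ones), and the function names are made up for illustration. It builds a tiny package in memory, follows the root relationship to find the main part, and reads the paragraph text back out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: namespaces as in the published standard; Beta 2 files may differ.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DOC_REL = ("http://schemas.openxmlformats.org/officeDocument/2006/"
           "relationships/officeDocument")
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = f"""<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

ROOT_RELS = f"""<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{DOC_REL}" Target="word/document.xml"/>
</Relationships>"""

DOCUMENT = f"""<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="{W_NS}">
  <w:body><w:p><w:r><w:t>Hello from a plain ZIP library</w:t></w:r></w:p></w:body>
</w:document>"""


def build_package() -> bytes:
    """Write the three parts into an in-memory ZIP and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml", CONTENT_TYPES)
        pkg.writestr("_rels/.rels", ROOT_RELS)
        pkg.writestr("word/document.xml", DOCUMENT)
    return buf.getvalue()


def main_part_name(package: bytes) -> str:
    """Follow the root relationships to the main document part."""
    with zipfile.ZipFile(io.BytesIO(package)) as pkg:
        rels = ET.fromstring(pkg.read("_rels/.rels"))
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == DOC_REL:
            return rel.get("Target")
    raise ValueError("no officeDocument relationship found")


def paragraph_texts(package: bytes) -> list[str]:
    """Return the text of each paragraph in the main document part."""
    part = main_part_name(package)
    with zipfile.ZipFile(io.BytesIO(package)) as pkg:
        doc = ET.fromstring(pkg.read(part))
    return ["".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
            for p in doc.iter(f"{{{W_NS}}}p")]


if __name__ == "__main__":
    print(paragraph_texts(build_package()))
```

Writing the bytes from `build_package()` to a file with a .docx extension would be the next step. Whether a given build of Word opens it depends on the namespaces that build expects, so treat the constants above as placeholders to adjust.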

There are 40 examples overall, and I'd love to hear what you think. Also let me know if there are other things you'd like to see us add to the list. Kevin Boske, who is on the programmability team for Office, was tasked with pulling these together, and I'm really appreciative of the work that he and Ken Getz did on these. Kevin already blogged on this earlier today, and there is also mention of them up on OpenXMLDeveloper.org.

-Brian

8 Comments
Filed Under: Office 2007

More on the PDF support in Office

posted Friday, June 16, 2006 4:31 PM by BrianJones

In the comments on my latest post about the legal issues we’re currently dealing with from Adobe around our PDF support in Office 2007, a number of folks were wondering when Adobe would provide their side of the story. Well, while I was down at TechEd, there was a press release from Adobe that you can view here. I just got back today and was pointed at the official Microsoft response. You should take a look; I think there are some really good clarifying statements.

Adobe said that they view PDF as an open standard that is freely available without any restrictions or royalties required. That’s really great, and it was why we felt there would be no problems when we started the work at the beginning of the project to support PDF output. That’s also what had led to my initial confusion around why our built-in support had become a problematic issue for Adobe. Someone in the comments even posted this interview at Wharton with Bruce Chizen:

Knowledge@Wharton: One of the other things Microsoft has announced is the ability to save as PDF in Office 12. This means that, once that happens, non-Adobe technologies are creating PDF in MacOS X, in StarOffice, and on Windows in Office [applications]. Isn't this a challenge to one of your major revenue streams?

Chizen: Maybe. But we don't think so. First of all, it's somewhat flattering that Microsoft has validated a document format that is not theirs, but one that is Adobe's -- which suggests that their customers were demanding that it do so. We had anticipated for many years that the revenue we achieve around PDF creation would, at some point in time, go away. It's an open standard! There are many clones out in the marketplace today that create PDF and compete with Acrobat. What we have done over the last five years is added functionality beyond PDF creation in our product line-up. If you look at Acrobat today it is much more than just simple PDF creation. In fact, we have a product, called Acrobat Elements, that just does PDF creation, and it represents a relatively tiny piece of our overall revenue -- less than one percent. Most customers choose to buy the more feature-rich products, Acrobat Standard and Acrobat Pro, which do annotations, digital signatures, web capture [and so on]. And many customers are buying LiveCycle, the server products for mission-critical workflows. That suggests to me that even though PDF creation will become free with products like Microsoft Office, our revenue streams will continue on. In fact, with more PDFs being created from Microsoft Office, it gives us an opportunity to take those PDFs and do more with them, like building mission-critical workflows around them.

From the latest public statement from Adobe, it appears that they are concerned that Microsoft would one day “extend” the PDF specifications. It looks like this is the root of the problem, and I’m hoping it's just a misunderstanding. We don’t have any plans to extend PDF, and if you think about it... doing so would serve no purpose. We’re only a producer, not a consumer. All we care about is that it’s easy for our users to export PDF, and that the PDF we export looks great in the main PDF viewers out there (otherwise no one would use the feature). I work with the team that built the PDF support and they did an amazing job. It was a lot of work, and they paid extremely close attention to the spec, and even spent a lot of time trying to decide which internal features (such as bookmarks and TOCs in Word) it would make sense to map to the proper PDF constructs. As I’ve said before, the output we provide is far more powerful than what you would get with just a printer driver, as there is an inherent awareness of the structure of the file, and not just the presentation of it. Adobe is actually a participant in the Office 2007 beta program, and if there is any place where they think we haven’t followed the spec properly, we would love to hear about it right away. You should all have the ability to download the Beta, so let me know if there is anywhere that you think we’ve either diverged from or extended the spec.

I know that it is common for folks implementing a format to extend it in order to support whatever extra features that application has. Those of you who are developers know that the way you differentiate your product is to innovate and design new powerful features that deliver value to your customers – it’s good to continue improving. This is not the case, though, for our PDF support: we have not extended the format and have no intention of ever doing so (even though as far as I can tell there is nothing in the PDF spec that limits third party extensions and Adobe has never tried to stop that until now). We’ve publicly stated that we will not extend the spec, and I’m hoping that as long as we can be clear on that then Adobe will change their mind about wanting us to remove the support.

-Brian

26 Comments
Filed Under: Office 2007, PDF

Interoperability by design

posted Wednesday, June 14, 2006 5:41 AM by BrianJones

Today we announced the formation of a new customer council focused on interoperability (how to make technologies work better together). I'm sure you've noticed over time that Microsoft has made a strong commitment to work towards better interoperability, and this is a big step forward in achieving that goal. I personally have focused on interoperability issues for about the past 6 years or so in working on extensible technologies like the object model and both the HTML and XML file formats. It's something I've always viewed as a key piece of our product design, and it's exciting to see more momentum building around this.

Pulling a quote from the press release:

"The council, hosted by Muglia, will meet twice a year in Redmond, Wash. The council will have direct contact with Microsoft executives and product teams so it can focus on interoperability issues that are of greatest importance to customers, including connectivity, application integration and data exchange. Council members will include chief information officers (CIOs), chief technology officers (CTOs) and architects from leading corporations and governments. Representatives from Société Générale, LexisNexis, Kohl's Department Stores, Denmark's Ministry of Finance, Spain's Generalitat de Catalunya and Centro Nacional de Inteligencia (CNI), and the states of Wisconsin and Delaware have joined as founding members."

As I said, we’ve been committed to the idea of "interoperability by design" for quite some time now, but the actual "interoperable by design" initiative was kicked off by Bill Gates last winter (Feb '05). We've heard numerous times from our customers that interoperability is a "key IT priority." When we design our products we look at how they will interact with a large selection of other products and with a wide variety of hardware. We have very large testing matrices in place to help ensure they work. This new customer council will help us in huge ways, though, as it will be able to identify some real-life issues that we hadn't yet thought of (or prioritized highly enough). As we identify new issues, we can then look at solving those as well.

You see a lot of folks talk about interoperability, but often they just don't mean the same thing. From our perspective it's something we want to build directly into the products so that it just works. Another approach that companies have taken is to talk about it from the perspective of building specific "projects" where consulting is done (for a fee of course <g/>) to wire together a number of separate bits. I've also seen that often companies will talk about interoperability when it comes to areas that they aren't really competitive in, but want to be. This often leads them to push towards less functional and innovative technologies in an attempt to level the playing field. This is a far different approach from what we are talking about, and I want to make sure there isn't any confusion. A couple of key talking points around this announcement that I really liked are that we're producing "people-ready" and "value-returning" interop solutions, and that this new council will help us to be even more successful in doing that.

The work we're doing in Ecma is obviously a great example of the "interoperable by design" concept. We've taken a product where one of the key complaints was that the file format was not documented, and not only moved to use open technologies (ZIP and XML), but we're working with a bunch of other companies (including some competitors) to make it a fully documented international standard.

If you want to learn more about interoperability at Microsoft, you should check out the interoperability site: http://www.microsoft.com/interoperability

-Brian

24 Comments
Filed Under: Office 2007

Learn more about Word 2007's support for separating data from presentation

posted Monday, June 12, 2006 2:51 PM by BrianJones

If you're heading out to TechEd this week like I am, you should definitely plan on attending Tristan Davis' talk on Thursday afternoon that covers the new functionality in Word 2007 for custom XML solutions.

This talk goes into great detail on the true power of XML in Office applications. XML file formats are obviously important, but the really exciting stuff isn't what you can do with the wordprocessing schemas; it's what you can do with the support for your own schemas. People want their office documents to seamlessly interoperate with business processes and solutions, and custom schema support is the way you can achieve it. With schemas like Open XML and ODF, you are generating wordprocessing documents, spreadsheets, and presentations. With custom defined schema support, you can take it to the next level and instead create invoices, trip reports, product specifications, research reports, pitchbooks, reviews, articles, resumes, applications, etc., etc., etc. There are no limits to the types of documents you can create, as you have the ability to define the schema.

I blogged earlier this year on both the importance of custom defined schemas as well as the new content controls in Word 2007. There is also a new article up on openxmldeveloper that shows some more examples of how to drop your own XML into a .docx file and map the values into the surface of the document.
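As a rough illustration of what "dropping your own XML into a .docx file" means at the package level, here is a small Python sketch. Everything specific in it is an assumption made for the example: the `urn:example:invoice` namespace, the invoice fields, the `customXml/item1.xml` part name, and the placeholder ZIP that stands in for a real document. It only shows the storage side. The relationship and data-binding markup that map values onto the document surface are left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

INVOICE_NS = "urn:example:invoice"      # hypothetical custom schema namespace
CUSTOM_PART = "customXml/item1.xml"     # assumed part name for the custom data

# Made-up business data expressed in the custom schema.
INVOICE_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<invoice xmlns="{INVOICE_NS}">
  <customer>Contoso</customer>
  <total currency="USD">1250.00</total>
</invoice>"""


def stand_in_package() -> bytes:
    """A placeholder ZIP standing in for a real .docx (not a valid document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("word/document.xml", "<placeholder/>")
    return buf.getvalue()


def add_part(package: bytes, name: str, data: str) -> bytes:
    """Copy every existing part into a new ZIP and append one more part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            dst.writestr(info, src.read(info.filename))
        dst.writestr(name, data)
    return out.getvalue()


def read_invoice(package: bytes) -> dict:
    """Pull the business data back out of the custom XML part."""
    with zipfile.ZipFile(io.BytesIO(package)) as pkg:
        root = ET.fromstring(pkg.read(CUSTOM_PART))
    total = root.find(f"{{{INVOICE_NS}}}total")
    return {
        "customer": root.findtext(f"{{{INVOICE_NS}}}customer"),
        "total": total.text,
        "currency": total.get("currency"),
    }


if __name__ == "__main__":
    package = add_part(stand_in_package(), CUSTOM_PART, INVOICE_XML)
    print(read_invoice(package))
```

The point of the sketch is that a back-end process can read or replace the invoice data without touching the wordprocessing markup at all, because the data lives in its own part under its own schema.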

Here is the description of Tristan's talk where he'll show a number of examples as well as dig into the ways you can leverage Word 2007 to build powerful solutions:

OFC335 Microsoft Office Word 2007 XML Programmability: True Data/View Separation and Rich Eventing for Custom XML

Day/Time: Thursday, June 15 4:30 PM - 5:45 PM Room: 257 AB
Speaker(s): Tristan Davis

Microsoft Office Word 2007 brings a data model that allows data and presentation to be managed separately, extending the structured document concept introduced in Word 2003. This includes significant investments in the support for custom XML data in the new Office Open XML file formats, as well as rich access to that data from within the application. Developers can work directly against the XML data via XML mappings to the Word document, or via embedded InfoPath solutions in the Document Information Panel. In this session, we introduce these new capabilities, then dive into the functionality of the Office XML data store (which provides custom XML storage), and how it can be leveraged to build solutions that will strongly tie Word documents to your business processes.

Track(s): Office System
Session Type(s): Breakout Session
Session Level(s): 300

Hope to see you all there.

-Brian

2 Comments
Filed Under: Word, Office 2007, Conferences

TechEd Boston

posted Friday, June 09, 2006 7:56 AM by BrianJones

Who else is going to be heading out to Boston next week for Tech Ed 2006? I'll be out there for a couple days in the middle of the week. I'm presenting Wednesday from 5:30-6:45, so be sure to swing by and say hi.

Here's the information on my session:

OFC324 Microsoft Office Open XML Formats

Day/Time: Wednesday, June 14 5:30 PM - 6:45 PM Room: 253 ABC

Speaker(s): Brian Jones

Learn about the huge change that will affect the role Office documents can now play in business processes and solutions. Previously the binary formats had meant Office documents were treated more like a "black box," but that is no longer the case, as these open formats allow documents to serve as a first class source of data as they travel through workflow and other business processes. Document content can now directly integrate with systems new and old. Generation of documents based on business data for up-to-date and accurate rich content is now possible throughout your own solutions. This session delves into schemas, solution code, and numerous examples.

Track(s): Office System

Session Type(s): Breakout Session

Session Level(s): 300

I hope I'll get a chance to see you all out there. It should be a lot of fun!

-Brian

3 Comments
Filed Under: Office 2007, Conferences
