
Sorry to have taken so long between blog posts. I was off in London all last week for the latest face to face meeting of Ecma TC45 at the British Library. It was a great meeting and we made a ton of progress. Unfortunately, I haven't had any time to blog though, and I have to go back on the road next week (San Francisco and New York). The amount of interest in the Open XML formats is really exciting, but it's also a lot of work :-)

Over at the OpenXMLDeveloper.org site, there is a new article that shows how to generate a lightweight WordprocessingML editor with a web front end. This is a continuation of the first article that I referenced a couple weeks ago. I really love the tools that folks are starting to pull together. We saw a couple really impressive demos out in London, and now there is this new article up on the OpenXMLDeveloper site. Another thing that's pretty cool is that some of the other Office developers who've been working on the file formats for years now have started to participate in the discussions up on the OpenXMLDeveloper site. The community has been really great so far, and I expect that after Beta 2 for Office 2007 arrives we'll see a whole lot more activity.

-Brian

Well, I’m off to the next Ecma TC45 face to face meeting where we’ll continue to make progress in the Open XML format standardization effort. This time we’re meeting out in London. Adam Farquhar of the British Library is the Vice-Chair of TC45, and he’s been gracious enough to host the meeting. It will be really great having this event at the British Library, where so many invaluable documents are preserved. It really makes you think about true long term interoperability and longevity of file formats.

It’s been a few years since I’ve been out to the U.K. but unfortunately this trip will have to be all business. I’m not able to bring my wife with me this time so I'm aiming to get back home as soon as I can. I’ll try to blog while I’m out there in London, but I’m not sure how often I’ll actually be online.

-Brian

You can create your very own Open XML editor! Over on openxmldeveloper.org, there is a new article that shows how you can quickly generate an extremely simple editor for creating WordprocessingML files. When I say "extremely simple" I mean it though. I'm talking about a plain text control where you can insert text, and from that generate a Word document. No rich formatting, no pictures, no tables... just plain text. I love to see tools like this cropping up though, especially given that we aren't even at Beta 2 yet for Office 2007. Here's a link to the article: http://openxmldeveloper.org/articles/OpenXMLDocFromDotNet.aspx

The next step with this will be to move over to using a rich text control, so that you can actually generate more complicated documents while still just using a basic web page as the front end. I think this is another great example of how these Open XML formats will really change the role that Office plays in business processes. The more platforms and environments in which Office documents can play a role, the more powerful a platform the Office system becomes. That was one of the big motivations for opening up the file formats in the first place.

Here is a screenshot of the text editor:

And here is a screenshot of the resulting Word document:

If anyone plans on letting me know that you could do the same thing with a plain text file, don't bother :-). I realize this is pretty basic but I think this is a great start, and there are already some folks over in the OpenXMLDeveloper community who've talked about taking this to the next level. I'd also love to see one for PresentationML and SpreadsheetML. Like I said... it's a start.

I should actually try to dig up some similar tools I built back when we first started pushing for full blown XML support in Word. We were still working on Office XP, and on the side I was working with a developer on prototyping XML I/O using a converter on top of Word. To show the different things you could do outside of the application once you had an XML format, I built a web front end where you could read through a Word document, navigating it based on the TOC. It had a few basic DHTML controls that let you add comments to various regions of the document at the same time other folks were editing it. You could also check out content at the paragraph level, rather than the document level, to make edits. It could transform the documents into HTML, WML, and VoiceML, which helped to give a preview of the possibilities when different devices have the ability to read and write the formats. That was about 6 years ago though so I'm not sure if I can dig it up...

-Brian

I've had a few folks ask me about other industry standard schemas out there like HR-XML, XBRL, DocBook, etc. The out of the box schemas that are supported are WordprocessingML, SpreadsheetML, and PresentationML. As I've discussed numerous times though, there is also support for custom defined schemas, which means you can take your own schemas and work with them in the applications.

So, if you want to work with another schema, you can take advantage of the custom defined schema support. Using that in combination with the Open XML formats allows you to build some extremely powerful solutions.

-Brian

 

 

I've really dropped the ball here over the past several months. I'd been meaning to post some example PresentationML files and give everyone a walkthrough of the format, but I keep falling behind. Sorry folks.

Today, I'd like to give you a basic overview of the PresentationML format. I'll try to keep it fairly high level today, and then hopefully get into more details in future posts. Let's first start with the basic architecture behind PresentationML. Presentations are naturally componentized:

  • Presentations contain Slides
  • Slides contain Shapes
  • Shapes contain Content

In addition to all the design goals that we had behind the Open XML formats, PresentationML was also optimized around slide re-use scenarios. It's common practice for folks to reuse slides from multiple presentations when they are creating a presentation of their own. We call this the "slide library" scenario, and we wanted to make sure that PresentationML was designed to accommodate it. Because of that, it's pretty easy to grab a single slide, and also quickly get at all the other resources that slide uses, so you can pull it out of the larger presentation without losing information.

If you are not already familiar with the open packaging conventions that we use to structure all Open XML files, you should read up a bit. In PresentationML, the file is broken into a collection of parts, which makes for a very robust and versatile format. Here is a diagram of the basic parts that make up a PresentationML file, and how they are related to each other (the "Presentation" part in the middle is the start part):

 

"Presentation" part (the root node)

The primary start part or root node of a presentation is usually called "presentation.xml", although if you are familiar with the open packaging conventions, you know that the part name is not significant, and instead it's the relationships and content types that really determine how the file is interpreted.

The presentation part contains information about the presentation itself. It contains the following structural information:

  • Slide lists ( e.g., slides, masters, IDs, custom shows, etc. ) - While the contents for the various slides are stored in separate parts, the actual ordering information for the slides is stored in the presentation part.
  • Slide sizes (note that this applies to all slides)

In addition to the structural information, the presentation part also contains the following properties:

  • Text Properties ( e.g., embedded font list, Kinsoku settings, etc. )
  • Save Properties ( e.g., flags for embedding fonts, compressing pictures, etc. )
  • Editor Properties ( e.g., flags for using Right-to-Left mode, etc. )
  • Content Properties ( e.g., first slide number for footers, etc. )

Example of editing the "presentation" part

Here's a quick example of how you can modify the "presentation" part to change the order of your slides. Grab the following basic PresentationML file (*note* this file will work with Beta 1 technical refresh, and should also work with Beta 2 when that comes out):  http://jonesxml.com/labs/presentationML1/SlideReorder.pptx

(If you don't have a copy of the beta, here is an equivalent file in the old binary format so you can see what it would look like when opened: http://jonesxml.com/labs/presentationML1/SlideReorder.ppt)

Let's crack the SlideReorder.pptx file open and take a look at what's inside:

  1. You can use any number of methods to get to the start part, but for simplicity's sake, let's just add a ".zip" to the end of the file name and open it using a ZIP tool (I'm just using the Windows shell).
  2. Navigate to the "ppt/presentation.xml" part, which is the start part (the way you tell the start part of course is by opening the "_rels/.rels" part and from there you'll see a pointer to the presentation part).
  3. If you are using a ZIP tool that lets you directly edit the files within, open the "presentation.xml" part in an XML editor or text editor (if you can't edit it directly, just copy it out, and then make the edits). I prefer using an XML editor that lets you "pretty print" the files, otherwise they are a bit difficult to read through (see this post for more info on that).
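If you'd rather do those three steps from a script than by hand, here is a minimal Python sketch using only the standard library. The function names (find_start_part, pretty_part) are made up for illustration, and the "officeDocument" relationship type URI is an assumption: it follows the same pattern as the relationship types shown in this post, but a beta-era file may use a different value, so check your own "_rels/.rels" part.

```python
import zipfile
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Namespace used by package relationship (.rels) parts.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Assumed relationship type that points at the start part (verify against your file).
OFFICE_DOCUMENT = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)


def find_start_part(package, rel_type=OFFICE_DOCUMENT):
    """Return the start part's name by reading _rels/.rels, not by guessing the name.

    `package` is a path or a file-like object holding the ZIP package.
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") == rel_type:
            # Targets in the package-level .rels are relative to the package root.
            return rel.get("Target").lstrip("/")
    raise ValueError("no relationship of the requested type in _rels/.rels")


def pretty_part(package, part_name):
    """Return one part of the package as indented ("pretty printed") XML text."""
    with zipfile.ZipFile(package) as zf:
        return minidom.parseString(zf.read(part_name)).toprettyxml(indent="    ")
```

For the sample file, something like print(pretty_part("SlideReorder.pptx", find_start_part("SlideReorder.pptx"))) should dump the start part in readable form. There's no need to rename the file to ".zip" first, since the ZIP reader doesn't care about the extension.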

Here is what the XML for "presentation.xml" looks like:

<p:presentation xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/3/main">
    <p:sldMasterIdLst>
        <p:sldMasterId r:id="rId1"/>
    </p:sldMasterIdLst>
    <p:notesMasterIdLst>
        <p:notesMasterId r:id="rId5"/>
    </p:notesMasterIdLst>
    <p:handoutMasterIdLst>
        <p:handoutMasterId r:id="rId6"/>
    </p:handoutMasterIdLst>
    <p:sldIdLst>
        <p:sldId id="256" r:id="rId2"/>
        <p:sldId id="257" r:id="rId3"/>
        <p:sldId id="258" r:id="rId4"/>
    </p:sldIdLst>
    <p:sldSz cx="9144000" cy="6858000" type="screen"/>
    <p:notesSz cx="6858000" cy="9144000"/>
</p:presentation>

And the relationship file for the presentation.xml part looks like this:

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps" Target="viewProps.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide2.xml"/>
    <Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps" Target="presProps.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide1.xml"/>
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideMaster" Target="slideMasters/slideMaster1.xml"/>
    <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/handoutMaster" Target="handoutMasters/handoutMaster1.xml"/>
    <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/notesMaster" Target="notesMasters/notesMaster1.xml"/>
    <Relationship Id="rId10" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/tableStyles" Target="tableStyles.xml"/>
    <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slide" Target="slides/slide3.xml"/>
    <Relationship Id="rId9" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
</Relationships>
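To see how the two listings above fit together, here is a small Python sketch that walks the slide list in order and looks up each r:id in the relationship file to find the part that holds the slide. The function name slides_in_order and the "ppt" base folder default are my own choices for illustration, and the PresentationML namespace is simply the one from the listing above, so it may need adjusting for other builds.

```python
import posixpath
import xml.etree.ElementTree as ET

# Namespaces copied from the listings above (the "p" one is specific to this sample).
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/3/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def slides_in_order(presentation_xml, rels_xml, base_dir="ppt"):
    """Return slide part names in presentation order.

    The order comes from <p:sldIdLst> in the presentation part; the
    relationship file only maps each r:id to a target part.
    """
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{RELS_NS}}}Relationship")
    }
    ordered = []
    for sld in ET.fromstring(presentation_xml).iter(f"{{{P_NS}}}sldId"):
        rid = sld.get(f"{{{R_NS}}}id")
        # Relationship targets are relative to the folder of the source part.
        ordered.append(posixpath.normpath(posixpath.join(base_dir, targets[rid])))
    return ordered
```

Notice that the order of the Relationship elements never enters into it. The rels file above lists rId3 before rId2, and that has no effect on which slide comes first.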

Now, if you want to reorder the slides, you can either modify the relationship file, or modify the presentation.xml file. Let's leave the rels file alone, and instead just change the order in the presentation.xml file. Modify it so that it now looks like this:

<p:presentation xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/3/main">
    <p:sldMasterIdLst>
        <p:sldMasterId r:id="rId1"/>
    </p:sldMasterIdLst>
    <p:notesMasterIdLst>
        <p:notesMasterId r:id="rId5"/>
    </p:notesMasterIdLst>
    <p:handoutMasterIdLst>
        <p:handoutMasterId r:id="rId6"/>
    </p:handoutMasterIdLst>
    <p:sldIdLst>

        <p:sldId id="258" r:id="rId4"/>
        <p:sldId id="257" r:id="rId3"/>
        <p:sldId id="256" r:id="rId2"/>

    </p:sldIdLst>
    <p:sldSz cx="9144000" cy="6858000" type="screen"/>
    <p:notesSz cx="6858000" cy="9144000"/>
</p:presentation>

Now, if you open the .pptx file, you should see that we've simply reversed the order of the slides. It's a very basic example, but I think it serves as a pretty good first look at the structure of PresentationML.
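The same edit can be scripted. Below is a rough Python sketch, assuming the start part lives at "ppt/presentation.xml" as it does in this sample (in general you would locate it through the relationships rather than by name). The helper names reverse_slide_order and rewrite_part are hypothetical, and the "p" namespace is the one from the listing above. A ZIP entry can't be edited in place with the standard library, so the sketch copies every part into a new package and swaps in the modified one.

```python
import zipfile
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/3/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Keep the familiar "p" and "r" prefixes when the XML is written back out.
ET.register_namespace("p", P_NS)
ET.register_namespace("r", R_NS)


def reverse_slide_order(presentation_xml):
    """Reverse the <p:sldId> children of <p:sldIdLst>; returns UTF-8 bytes."""
    root = ET.fromstring(presentation_xml)
    id_list = root.find(f"{{{P_NS}}}sldIdLst")
    if id_list is None:
        raise ValueError("presentation part has no sldIdLst")
    id_list[:] = list(id_list)[::-1]
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def rewrite_part(src, dst, part_name, transform):
    """Copy the package from src to dst, passing one part through transform."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == part_name:
                data = transform(data)
            zout.writestr(item.filename, data)
```

A call along the lines of rewrite_part("SlideReorder.pptx", "Reversed.pptx", "ppt/presentation.xml", reverse_slide_order) would then write out a copy with the slides reversed ("Reversed.pptx" is just an example output name). One caveat: ElementTree may rename prefixes for namespaces that aren't registered here, which is harmless to an XML parser but changes how the file looks in a text editor.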

Go ahead and play around with that a bit, and let me know if you have any questions. The next concept I'll cover is the slide content.

-Brian

Here's another example of folks getting ready to take advantage of the Open XML formats for their business solutions. The newly announced BioIT alliance (http://www.medadnews.com/News/Index.cfm?articleid=328651) was formed to help connect the pharmaceutical, biotechnology, hardware and software industries. As you can imagine, Open XML formats can play a huge role here. Check out this quote:  

"Through the BioIT Alliance, we are working closely with Microsoft to increase data access across our instrument systems and data analysis software tools using Ecma Open Office XML," said Catherine M. Burzik, president of Applied Biosystems. "This format enables life science companies to access data using the familiar Microsoft Office Excel(R) interface, providing them with the insight they need to make decisions more quickly."

As I said, this is yet another example of how these new Open XML formats really change the game when it comes to interacting with Office applications. In this case, they can use the SpreadsheetML format to automatically generate data in a much richer, interactive format with no vendor lock-in or worry about long term archivability. This particular organization was founded by the following members: Accelrys Software, Affymetrix, Amylin Pharmaceuticals, Applied Biosystems and The Scripps Research Institute.

I'm expecting that after Beta 2 ships, we'll see more and more of these examples up on the openxmldeveloper.org site.

-Brian

I've had a few folks ask me about the XML format from Word 2003, and whether or not it would be supported in Word 2007. I mentioned this back in the fall, but in case you missed it let me repeat that the Word 2003 format will continue to be supported in Word 2007.

There are a ton of folks out there who have already built solutions on top of the Word 2003 XML format, and those will continue to work. Everyone can decide for themselves whether they want to port those solutions forward into the new Open XML format, or keep them in the 2003 XML format. The new Open XML format is largely based on the Word 2003 XML format, so you'll see a lot of similarities.

One of the benefits of the Open XML format over the Word 2003 format is the number of Office versions that will support it. As you all know by now, the new Open XML formats will work in Office 2000, XP, 2003, and 2007; while the Word 2003 XML format will work in Word 2003, and 2007.

The XML formats in Word 2007 will be:

  1. Word document (.docx) - This is the default, and it's in the Open XML format
  2. Word macro-enabled document (.docm) - This is the macro-enabled version of the Open XML format
  3. Word XML document (.xml) - This is a single XML file that is a serialized version of the Open XML format
  4. Word 2003 XML document (.xml) - This is the exact same as the XML format that Word 2003 supported

We also will continue to support opening anyone else's XML files as well, just like we did in Word 2003. Here's an entry I made back in the summer about opening your own XML files in Word: http://blogs.msdn.com/brian_jones/archive/2005/08/16/452478.aspx

-Brian

I just saw that Document Sciences has also joined the OpenXMLDeveloper.org site (story here). It's been a pretty fun week seeing all the folks joining the community. It's pretty surprising that after only a week we already have over 250 registered users and over 40 companies.

I'm really excited about the value this site will bring to folks, especially once the public Beta 2 comes out in the next few months. There are already a number of good articles and discussions up there, and it's only been a week :-)

-Brian

 

As Doug mentions here, there is a brief video where a few of us were talking last week about the creation of the OpenXMLDeveloper.org site. It's something we'd been thinking about doing since I first started blogging last summer. At the time there just wasn't enough information out there to justify the site, but now that we have Beta 1 and Beta 1TR out the door, as well as the first draft of the Ecma spec, I think there is plenty to discuss!

I've got a few more hours before the UW game, which will hopefully be the start of a great weekend! I hope everyone has a great weekend, and thanks again to everyone who came out to the developer conference this week, we all had a great time.

-Brian

I always forget how much fun these developer conferences can be. On Tuesday, we had an entire afternoon focused on the Open XML formats. There were three presentations focusing on different aspects of the formats, followed by a couple hours of food and drinks at the conference. After that, a bunch of us headed over to a poolhall in Bellevue to play some pool, drink some beer, and talk about the new openxmldeveloper.org community.

My first presentation on Tuesday was pretty much an overview of everything that we're doing with the Open XML formats, and we had a huge turnout. We were in a really big conference room, which was actually made up of two rooms with the separation wall removed to form one large room (as you can see from the picture):

The crowd just seemed to keep growing as we first talked about the big picture of why XML is so important; then drilled into the details of the various schemas; and closed with Kevin Boske's talk about developing on top of the formats to allow for easy server based consumption and generation of Office files. I had tons of people come up to me after the talks who were just blown away by what we've now made possible, and they couldn't wait to start working with the Beta. This is going to be so much fun over the coming years! I've never seen so much excitement this early in the product cycle. A lot of the folks had already used Office 2003's XML support, and that has helped increase the level of awareness and knowledge.

It's so much fun to get a chance to talk to all of you. I've had so many great questions over the past couple days, and I'm sorry I won't be able to make it out to the "ask the experts" table today at lunch (we have our weekly Ecma TC-45 phone conference from 1-3). I'll try to make it over there at some point today though. I was really impressed with how many people already have an intimate knowledge of the formats just from working with the Beta 1. Now that the technical refresh for Beta 1 is out, people should be able to do even more with the XML support.

I had some folks come up who already had large production level systems using the WordprocessingML support from Office 2003. They had some good questions around how they could best migrate these solutions into the new Open XML formats. As I told them, the old XML formats from Office 2003 will still be supported in Office 2007, so they don't have to change if they don't want to. The new Open XML formats though are largely based on those earlier schemas, so migrating forward to use the new formats shouldn't be too difficult (and we'll provide much more documentation around this once we get closer to solidifying the Open XML formats in Ecma).

-Brian

 

There exist today billions of documents in the Office binary formats. With the move to the standard Open XML formats as the default for Office 2007; the free updates allowing Office 2000, XP, and 2003 to also support those formats; and the participation of other software companies like Apple in the Ecma standardization efforts, it's only a matter of time before there are billions of files in the Open XML format. You have the opportunity to be one of the first to learn, develop with, and provide solutions for these formats that will soon be everywhere (which is probably why you've been reading my blog)!

It's the first day of the Office Developer's Conference out here in Redmond, and I couldn't think of a better time to announce the formation of the openxmldeveloper.org community. I've been waiting for us to get this together for a long time now, and announcing it in conjunction with the Office Developer Conference gives a great opportunity to get a lot of the existing Office developers involved. There is a huge growing community of folks developing solutions on top of Office, and the XML support we've been rolling out over the past 6 years has really helped build that community. I already mentioned last year that we found there were about a million folks around the world developing on top of Office, and of those people, 1/3 were leveraging the XML functionality. That means there were 330,000 developers leveraging the Office XML support, and that number continues to grow larger and larger!

The group is completely free and open to anyone. There will be tons of content, code snippets, even free tools for working with the files. I'm especially excited about getting a broad range of folks that work on all different platforms. The more diverse the community, the more interesting and valuable the discussions will be. As time goes on, I hope we will see tools for working on all different kinds of platforms.

Ever since I started blogging last summer, it was clear that we needed to get a community organized to help fuel technical discussions around the Open XML formats. There is just no way I can stay on top of all the questions people have been sending, and I'm really excited to have a site that supports more active discussions. I'll obviously continue to blog, but now my blog won't be the only place you can go to get information or have discussions about the Open XML formats.

There are already 39 organizations from all over the world involved in the community. Here's a list of the founding members (hope I didn't miss anyone):

It should be a lot of fun over the coming years as more and more people start building cross-platform solutions for these formats. Already today there is an article on the site that shows how to create a Word document from scratch in Java, while another article shows a real-world example of using the WinFX System.IO.Packaging API to work with embedded documents. I hope I'll see you all join me there as some of the first members of this new community.

-Brian

This should be an awesome week. We have a ton of folks out here in Redmond for the Microsoft Office Developers Conference. If you didn't receive an invitation, but would still like to follow what's going on, you can sign up for the live webcast tomorrow morning of Bill Gates and Kurt DelBene's presentation: http://go.microsoft.com/fwlink/?LinkId=63190

We're going to have a track tomorrow afternoon focusing specifically on the file formats. For those of you coming out for the conference, check out the following talks:

  1. FF301 Open XML Formats (Overview) - Brian Jones - In this first session I'll give an overview of the formats and drill into some of the ways we are already taking advantage of them for solutions within Microsoft.
  2. FF302 Open XML Formats (Schemas) - Shawn Villaron; Tristan Davis; Chad Rothschiller - The 2nd hour will have separate drill downs into the schemas for Word, PPT, and Excel files. We'll probably spend about 20 minutes on each.
  3. FF303 Open XML Formats (Solutions) - Kevin Boske - In this final session, Kevin will show a number of examples of how to build solutions on top of the Open XML formats. He has a number of really sweet demos.

I'll also aim to be at the "ask the experts" lunches Tuesday - Thursday so swing by the File Format table and say hi if you get a chance.

-Brian

 

Since Beta 1, the Office team has been hard at work refining the new UI. Yesterday they revealed the latest work, and for those of you on Beta 1, you'll notice that things are much more crisp. If you haven't seen any of the screenshots yet, you should check out Jensen's post: http://blogs.msdn.com/jensenh/archive/2006/03/09/547281.aspx

Jensen has a bunch of screenshots and if you haven't been to his site yet, I highly recommend it. He puts me to shame with the amount of content he posts (he posts a new article every day). Here's one of the screenshots from the new Excel UI (notice that a different skin is applied too):

If you're interested in finding out more information about Office 2007, as well as signing up for the upcoming Beta 2, you should visit the office preview site: http://www.microsoft.com/office/preview

-Brian

Kevin Boske has another post on using the System.IO.Packaging assembly to work with the new Office Open XML formats. He shows how you can quickly open the package and navigate the relationships to find a specific part, then remove that part. http://blogs.msdn.com/kevinboske/archive/2006/02/22/537439.aspx

This is great if you want to clean up a file. Let's say you want to always remove comments from a document before it's posted on the web. Or maybe you want to remove the VBA from a macro-enabled document before users have access to it. These new open formats give you a lot more control over documents entering and leaving your environment.

Kevin also stresses a point I've brought up before, and that's related to part names. It's important to remember that in Office we rely on the relationships to navigate the packages. We never reference a part directly (always via relationships), and as a result you can't rely on part names when you are inspecting a file. You need to use the content types part and the relationship files.

-Brian

Here's a question I got from someone wanting to store alternative formats in the files:

Want to know if alternative formats can also be stored in the same office XML zip format. E.g., Is it possible that the XML file format also stores the 2003 binary format as an alternative? Or a pdf version of it along?

This would provide the user another level of backup if some part of the XML format is corrupt.

Does the current Open XML schema allow such inclusions?

I have access to the beta 1, but I could not create such documents. Do you see it as a possibility in near long term (office 12 GM)

If yes, how would the Office UI react to opening such documents?

This is actually an issue we thought through a lot at the beginning of the project. One of the advantages of the packaging model we chose was that it's easy to add additional content to the files. This could be used in a number of different scenarios. One potential use would be to do something similar to the binder (binder was an old feature in earlier versions of Office that let you take multiple files and bind them up into one "project file"). Another case (which is what this question was about) is the ability to take just one document, but embed multiple representations so that viewers could choose which format they best support.

For Office 2007 at least, we aren't planning to support either of these scenarios. That said, there is nothing in the format to prevent a solution provider from extending Office to output alternate representations. Office would have no problem opening the file either as long as the proper relationships were set up in the package. For example, someone could capture the save event, and in addition to saving the file, programmatically do a save to PDF and put that output into the package as well.

-Brian
