a 'mooh' point

clearly an IBM drone

OpenXml SDK released as OSS

Yesterday I was notified that the OpenXml SDK had been released as an Open Source Project by Microsoft.

Back in summer 2008 I attended a workshop in Redmond, WA regarding the future support of OOXML and ODF in Microsoft Office. I remember sitting down with one of the PMs at the time – talking to him about what a wonderful idea it would be to release the OpenXml SDK as OSS. I also remember how frustrating it was to be told – between the lines – that “it ain’t gonna happen”.

Now – almost exactly 6 years later – they have finally listened (btw, I am in no way trying to take credit for “making” Microsoft open-source the OpenXml SDK – it was completely their decision to do it). But it does seem to confirm a trend at Microsoft, where the cash cows (Office, Windows, the server products etc.) are kept closed, but the tooling around them, the stuff that ties them all together, is with increasing frequency being released as open source.

The OpenXml SDK is released under the auspices of “MS Open Tech” – in other words, Doug Mahugh and friends. Eric White has been an integral part of making this happen. Kudos to all of them from here :-).

The source code is available on GitHub and is free for everyone to look at and download. The license is Apache 2.0. It remains to be seen whether they will accept pull requests, but I cannot imagine why they would not.

Now, I haven’t had the time to dig into the code in much detail yet, but I will do this in the following weeks. One thing I will look deeply into is the .Validate()-method of the toolkit. It validates the content of the OOXML-document being worked on – or at least it should, but anyone who has tried to run a document through e.g. my validator on http://29500.idippedut.dk or Alex Brown’s at https://code.google.com/p/officeotron/ will have found out that a document, even with a “clean” result from .Validate(), is not necessarily valid according to the schemas of OOXML. It turns out that it does not validate against the spec – it validates against the supported functionality of Microsoft Office. Now, that is a completely valid (no pun intended) approach from the SDK, since most people working with OOXML at the end of the day need interoperability with Microsoft Office.

But now with the SDK being released to a larger number of developers, I guess it would be appropriate to expand or “fix” the validation-method. One possible improvement could be to allow validation against a range of XML schemas. Another would be to allow validation after having processed the document by applying MCE (Markup Compatibility and Extensibility) to the content. A third improvement would be to remove the dependency on WindowsBase.dll (and thereby System.IO.Packaging). I have a theory that the reason why the OpenXml SDK is not available on Windows Phone is this exact dll, and it would be nice to be able to manipulate OOXML-documents in memory on WP.
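
To illustrate why the packaging dependency is not strictly needed just to read a document: an OOXML file is a ZIP-based package, so its parts can be opened in memory with any ZIP library. The following is a minimal sketch in Python using only the standard library. It is not how the SDK does it, the function names are hypothetical, and the demo package is deliberately incomplete (it would not open in Word).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in an OPC package.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def list_parts(package_bytes):
    """Return the sorted entry names of an OOXML package held in memory."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        return sorted(pkg.namelist())


def default_content_types(package_bytes):
    """Map file extensions to content types from [Content_Types].xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        root = ET.fromstring(pkg.read("[Content_Types].xml"))
    return {
        d.get("Extension"): d.get("ContentType")
        for d in root.findall(f"{{{CT_NS}}}Default")
    }


def build_demo_package():
    """Build a tiny, incomplete package in memory (hypothetical demo data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr(
            "[Content_Types].xml",
            f'<Types xmlns="{CT_NS}">'
            '<Default Extension="xml" ContentType="application/xml"/>'
            "</Types>",
        )
        pkg.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


if __name__ == "__main__":
    data = build_demo_package()
    print(list_parts(data))
    print(default_content_types(data))
```

Nothing here touches the file system, which is the property that matters for a constrained platform. Relationships, MCE and schema validation are all left out.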

We’ll see what will happen to it in the future – what would you like to have changed in the SDK?

Comments (7) -

Doug Mahugh

Jesper, I think you're entirely within your rights to claim some credit for this move! Your feedback in 2008 has come up in more than one of the discussions that led to this release. :-) Interesting thoughts on validation possibilities, and regarding pull requests, yes indeed we're looking forward to those.

Jesper Lund Stocholm

Hi Doug,

Well, I thank you for your kind words :-).

I'll be looking thru the code in the near future ... can't wait to get to play with it. I also see, that you have received your first pull request. Be sure to kick Chris out of bed and have him make the merge.

/Jesper

Robert te Kaat

Jesper,

The validation feature has helped me a lot in the past when trying to find out why Word would not open my generated document. However, with the release of the later Word-versions and the OpenXmlSDK to go with it, the validation has become rather 'useless'.

Two major bugs are related to tables:
- The validator finds a lot of styling issues for tables (illegal attributes, etc)
- While manipulating documents I sometimes end up with table cells without content, not even a paragraph. This is illegal and Word will refuse to open the document. The validator however, does not find the problem.
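
The empty-cell case in the second bullet can be checked outside the SDK. A minimal sketch in Python follows. It is not the SDK's validation logic, and it assumes the main document part (typically word/document.xml) has already been read as XML text. `count_empty_cells` and `SAMPLE` are hypothetical names, and "empty" is taken to mean a `w:tc` element with no child other than `w:tcPr`.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_empty_cells(document_xml):
    """Count w:tc elements that have no child other than w:tcPr."""
    root = ET.fromstring(document_xml)
    empty = 0
    for cell in root.iter(f"{{{W_NS}}}tc"):
        # Cell properties do not count as content.
        content = [c for c in cell if c.tag != f"{{{W_NS}}}tcPr"]
        if not content:
            empty += 1
    return empty


# Hypothetical sample: one cell with a paragraph, two cells without content.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:tbl><w:tr>'
    "<w:tc><w:p/></w:tc>"
    "<w:tc><w:tcPr/></w:tc>"
    "<w:tc/>"
    "</w:tr></w:tbl></w:body></w:document>"
)

if __name__ == "__main__":
    print(count_empty_cells(SAMPLE))
```

This only flags cells with no content at all. It does not check that a cell ends with a paragraph.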

I'm not saying you need to fix these issues as well, but that would be pretty nice! ;-)

Regards,
Robert

Eric White

Hi Robert,

I'm one of the maintainers of the SDK - great suggestions!

One thing to pay attention to - the SDK has three options for validation - you can validate for Word 2007, Word 2010, and Word 2013.  For later versions of Word, you need to appropriately set the Word version for validation.

Regarding the validation of tables that contain no content in cells, that is a good catch.  I have added it to my list of enhancements to the validation portion of the SDK.  There are a few cases where the SDK misses issues that make the document invalid, and I definitely want to augment the existing checks.

Feel free to submit issues like this on GitHub.

Just FYI - my first task is to build a comprehensive cross-platform test harness / suite so that we can validate that the SDK going forward works the same as previous versions, and so that we can validate that the SDK on Linux / Mono works the same as under Windows.  While this isn't the most sexy "feature", it is super vital to the project in the future.  I'm not going to make much in the way of changes to the SDK until this task is completed.  :-)

Cheers, Eric

Jesper Lund Stocholm

Hi Eric,

Are there any high-level documents available describing how e.g. validation is performed? Like which steps are taken and in which order?

I am thinking about incorporating the validation of the SDK in my OOXML-validator - that validates against the ISO-standard and not any specific version of Microsoft Office.

Eric White

Hi Jesper,

At the moment, there are no good high-level documents available for how validation is performed.  However, this is a critical piece of information - certainly is on the short list of things to do.

But as I mentioned, first on my list is to create a test suite to validate the existing functionality of the SDK.  Other things on the short list are looking at releasing the productivity tool and other pieces of supporting code.  We also have a couple of critical bugs to address - the SDK is not thread-safe, which leads to problems with deploying on high-activity web sites.

There is an awful lot to do, so we have to prioritize.  I really appreciate, though, the importance of clarifying and enhancing the validation functionality.

-Eric

Robert te Kaat

Hi Eric,

Thank you for the comprehensive answer!

Regarding the empty table cells: I created a document generator based on content controls (similar to the approach suggested by Gray Knowlton: blogs.technet.com/.../...ith-content-controls.aspx). Basically the content controls are placeholders and are replaced during the generation process with content. In table cells, the content controls are the only contents of the cell, and thus require special handling. My product (docati.com) supports other constructs as well (foreach, if (condition)) which may not render anything at all and require the same special handling. Assuming more people took the approach suggested by Gray Knowlton, others have probably run into this issue as well.

My need for validation is just that I want to make sure the document opens in Word itself. Currently a document created by Word itself, without messing with it programmatically, doesn't validate. This makes the validation not usable yet. But I'll keep an eye on the GitHub project and register all issues there!

Keep up the great work!
