a 'mooh' point

clearly an IBM drone

OpenXml SDK released as OSS

Yesterday I was notified that the OpenXml SDK had been released as an Open Source Project by Microsoft.

Back in summer 2008 I attended a workshop in Redmond, WA regarding the future support of OOXML and ODF in Microsoft Office. I remember sitting down with one of the PMs at the time – talking to him about what a wonderful idea it would be to release the OpenXml SDK as OSS. I also remember how frustrating it was to be told – between the lines – that “it ain’t gonna happen”.

Now – almost exactly 6 years later, they have finally listened (btw, I am in no way trying to take credit for “making” Microsoft OSS the OpenXml SDK – it was completely their decision to do it). But it does seem to confirm a trend in Microsoft – where the revenue cows (Office, Windows, and Servers etc.) are kept closed, but the tooling around them, the stuff that ties them all together – is with increasing frequency being released as open source.

The OpenXml SDK is released under the auspices of “MS Open Tech” – in other words: Doug Mahugh and friends. Eric White has been an integral part of making this happen. Kudos to all of them from here :-).

The source code is available on GitHub and is free for everyone to look at and download. The license is Apache 2.0. It remains to be seen whether they will accept pull requests, but I cannot imagine why they should not.

Now, I haven’t had the time to dig into the code in much detail yet, but I will do this in the following weeks. One thing I will look deeply into is the .Validate()-method of the toolkit. It validates the content of the OOXML-document being worked on – or at least it should. Anyone who has tried to run a document through e.g. my validator on http://29500.idippedut.dk or Alex Brown’s at https://code.google.com/p/officeotron/ will have found out that a document – even with a “clean” result from .Validate() – is not necessarily valid according to the schemas of OOXML. It turns out that it does not validate against the spec – it validates against the supported functionality of Microsoft Office. Now, that is a completely valid (no pun intended) approach for the SDK, since most people working with OOXML at the end of the day need interoperability with Microsoft Office.

But now with the SDK being released to a larger number of developers, I guess it would be appropriate to expand or “fix” the validation-method. One possible improvement could be to allow validation against a range of XML schemas. Another would be to allow validation after having processed the document by applying MCE to the content. A third improvement would be to remove the dependency on WindowsBase.dll (and thereby System.IO.Packaging). I have a theory that the reason why the OpenXml SDK is not available on Windows Phone is this exact dll, and it would be nice to be able to manipulate OOXML-documents in memory on WP.

We’ll see what will happen to it in the future – what would you like to have changed in the SDK?

ODF 1.2 in ISO (PAS-submission)

The other day a document landed on my "desk" in the Danish Mirror committee to JTC1 SC34. It was the document “EXPLANATORY REPORT - OASIS Submission of OpenDocument v1.2 to ISO/IEC JTC 1 [JTC1 N12033]”. In other words: the ODF TC in OASIS wishes to elevate OASIS ODF 1.2 to an ISO-standard.

OASIS is a so-called "PAS Submitter" to ISO, which enables more or less direct elevation of existing standards to be an ISO-standard, but without any corresponding work on the standard itself in ISO. OASIS ODF TC has used this process for ODF 1.0 back in 2006.

So ODF will not be maintained in ISO - an agreement has been made that the work of developing and improving ODF will be done exclusively in the OASIS ODF TC - with subsequent releases submitted to ISO for approval ... what someone might refer to as "rubber-stamping".

For OOXML a different agreement was made during submission/approval of OOXML - that being that work with OOXML takes place in ISO and this is where the standard is developed and improved. That being said, a substantial amount of work with OOXML takes place in ECMA and a large amount of our work in ISO originates from ECMA's TC45 - the group where OOXML was "born". ODF 1.2 was approved in OASIS in September 2011 (and publicized in early 2012), and now three years later it has landed on our desk in ISO.

I immediately wrote to Danish Standards and told them, that I suggest that Denmark votes "yes". Technically this is not a task for the mirror committee to SC34 since the vote is on JTC1-level, but the more I think about it, the more I doubt that I made the right suggestion to Danish Standards.

Because does it add any value for anybody to - three years after approval in OASIS - ask for an ISO approval? Is there any good reason to spend time on this in OASIS and in ISO? If the argument is that there are some (governmental) institutions that require an ISO-standard level to use it in their organisations/countries - what good is it to them to wait more than three years to submit it to ISO for approval?

Will you be my friend too?

The other day I noticed that the traffic on my blog had increased dramatically. I couldn’t really understand why (I have not written anything substantial in quite some time) – all I could see was that the majority of the visitors originated from Google. The search results were mainly “Jesper Lund Stocholm”, “Alex Brown” and “OOXML” … which told me just about nothing at all.

But then I noticed that a few of them also included the terms “boycottnovell” and “techrights”. This led me to check out the feed from #boycottboy’s website – and behold – I was actually mentioned in one of his defamatory articles.

The article is this: http://techrights.org/2010/09/11/sc34-is-still-a-farce/ .

The quotes start like this:

Weir is then met by opposition from Brown’s longtime right-winger, the ODF-hostile Jesper Lund Stocholm [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. He is a known Microsoft booster and Weir’s responses to him go like this:

I wonder if I shall refer to myself from now on as “Goose” or similar. Alex is clearly the Maverick here.

Now – and this is where I envy Rob – #boycottboy quotes a conversation I had with Rob (which I clearly won, btw :-) ) – but conveniently leaves out everything I wrote. It seems to me that Rob and #boycottboy live in some sort of symbiosis – each benefitting from one another. #Boycottboy clearly regards every little thing Rob writes to him as a “badge of honor” – regardless of the content itself. And Rob very much benefits from #boycottboy’s critique-less c/p of his comments/articles … with the infamous “#boycottboy reality distortion field” applied.

I wish I had a friend like that. I wish I had a friend that would blindly recite every syllable coming from my lips – without any sanity-check at all.

I totally get why #boycottboy needs Rob – but why IBM’s chief ODF Architect (elsewhere known as “ODF’s one-man-army”) needs someone like #boycottboy is beyond me.

And you know what the silver lining is here? I actually benefit greatly (and not only in a monetary sense) from being posted on #boycottboy’s site. The increased traffic from techrights.org keeps the fire burning and it is almost always a certainty that someone will write to me to invite me to speak on document formats at conferences, to potential customers or even to political parties.

So please keep it up, you two – it helps me avoid the wife and kid going to bed hungry at night.

Moving towards OOXML(S) (update)

Some time ago I wrote about some of the enhancements of Microsoft Office in terms of how far they have made it in implementing the content of the conformance profile "Strict" or "<S>". As you might recall, I made a run-through of a list of feature areas and marked each with either a green, yellow or red traffic light. There were no red traffic lights, but some areas had a yellow marking. These were

  • "ink"
  • "legacy diagrams"
  • "groups"
  • "form controls"
  • "activeX objects".

These document types previously used VML as containing frame etc., but Microsoft Office 2010 was now supposedly using DrawingML for these. The reason for them being yellow and not red was that I simply did not know how to test these things - either because of poor Microsoft Office skills or lack of proper hardware ("ink" is used on tablet PCs and I don't have one of those at hand).

Stockholm plug-fest

When WG4 met in Stockholm a couple of months ago, I got a chance to take a look at the documents I couldn't create myself. The cool thing about participating in these meetings is that there is an abundance of different hardware and software on the laptops of the delegates, so after one of the sessions a few of us had our own little "Microsoft Office OOXML <S> interop plug-fest" and I finally had a chance to get my hands on those files.

I could have simply updated the previous article with the new information, but a couple of interesting things emerged that made me write up a new piece.

First, these are the results:

| File type | Feature | Comment | Status |
|---|---|---|---|
| DOCX | Ink Drawings | Previously used VML, now uses DrawingML | Green (success) |
| XLSX | Ink Drawings | Previously used VML, now uses DrawingML | Green (success) |
| PPTX | Ink Drawings | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | Legacy Diagrams | Previously used VML, now uses DrawingML | Green (success) |
| XLSX | Legacy Diagrams | Previously used VML, now uses DrawingML | Green (success) |
| PPTX | Legacy Diagrams | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | Drawing Shapes | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | Textboxes | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | WordArt | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | Groups | Previously used VML, now uses DrawingML | Green (success) |
| XLSX | Form Controls | Previously used VML, now uses DrawingML - except on "chart sheets" | Green (success) |
| XLSX | ActiveX Objects | Previously used VML, now uses DrawingML | Green (success) |
| PPTX | ActiveX Objects | Previously used VML, now uses DrawingML | Green (success) |
| XLSX | OLE Objects | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | ST_OnOff | Uses the new ISO-approved simple type without the values "on" and "off" | Green (success) |
| XLSX | ST_OnOff | Uses the new ISO-approved simple type without the values "on" and "off" | Green (success) |
| PPTX | ST_OnOff | Uses the new ISO-approved simple type without the values "on" and "off" | Green (success) |
| XLSX | ISO-dates | Can persist dates in ISO-8601 format and avoids the "evil" serial dates | Red (failure) |

The trained eye will notice that all the yellow lights have been replaced by green lights. In other words, the list above shows that even though Microsoft Office 2010 does not write <S>, the developers in Redmond have clearly made some significant progress.

There are a couple of interesting points about the technicalities of the files I looked at.

Predicting the future is difficult

The files containing "legacy diagrams" stand out, because of the way Microsoft Office 2010 breaks compatibility with e.g. Microsoft Office 2003 and earlier versions. The thing is - when loading a PPT-file with a legacy diagram from e.g. Microsoft Office 2003 the diagram will be in VML-format. When it is loaded in Microsoft Office 2010, modified and saved again - it won't save the diagram in VML. It just won't. The diagram will be saved all right - but now using DrawingML instead of VML. So this is essentially a case where interoperability with this "legacy" application is hurt since Microsoft Office 2003 has no idea what to do with the DrawingML it loads.

MCE to the rescue

For all the other files, MCE once again steps up.

If we look at the file containing the ink notations, (some of) the markup will look like this:

[code:xml]<mc:AlternateContent>
  <mc:Choice Requires="wpi">
    <w:drawing>
      <wp:anchor>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
          <a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" />
        </a:graphic>
      </wp:anchor>
    </w:drawing>
  </mc:Choice>
  <mc:Fallback>
    <w:pict>
      <v:shapetype>
        <v:stroke joinstyle="miter" />
        <v:formulas>
          <v:f eqn="if lineDrawn pixelLineWidth 0" />
        </v:formulas>
      </v:shapetype>
      <v:shape id="Ink 15">
        <v:imagedata r:id="rId6" o:title="" />
      </v:shape>
    </w:pict>
  </mc:Fallback>
</mc:AlternateContent>[/code]

(I have modified and trimmed the real XML for easier reading - especially the VML was really, really ugly)

So with this approach you can actually have the best of two worlds - the new and the old without losing information. That is - if you know MCE, of course. This again shows what a great tool the alternate content blocks (ACB) of MCE are for this task. They allow you to innovate while still making it possible to ensure some sort of compatibility with earlier programs that did not know of the new technology.
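To make the selection rule concrete, here is a minimal Python sketch of how a consumer could resolve such a block: take the first mc:Choice whose required namespaces are all understood, otherwise fall back to mc:Fallback. It is an illustration under assumptions, not a full MCE processor: it ignores mc:Ignorable, mc:MustUnderstand and mc:ProcessContent, it assumes prefixes are never redeclared within a part, and the function names and the binding of the wpi prefix in the sample are mine.

```python
import io
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Assumed binding for the "wpi" prefix, taken from the graphicData uri in the post.
INK_NS = "http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
ALTERNATE_CONTENT = f"{{{MC_NS}}}AlternateContent"


def parse_with_prefixes(xml_text):
    """Parse XML and also return the prefix -> namespace map declared in it."""
    prefixes = {}
    root = None
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # the first "start" event is the root element
    return root, prefixes


def select_branch(alternate, prefixes, understood):
    """Return the children of the first usable Choice, else those of Fallback."""
    for choice in alternate.findall(f"{{{MC_NS}}}Choice"):
        required = choice.get("Requires", "").split()
        if required and all(prefixes.get(p) in understood for p in required):
            return list(choice)
    fallback = alternate.find(f"{{{MC_NS}}}Fallback")
    return list(fallback) if fallback is not None else []


def resolve_alternate_content(parent, prefixes, understood):
    """Replace every mc:AlternateContent below parent with its selected branch."""
    pending = list(parent)
    resolved = []
    while pending:
        child = pending.pop(0)
        if child.tag == ALTERNATE_CONTENT:
            # Splice the selected children in place; they are examined again,
            # so nested alternate content blocks are resolved as well.
            pending[0:0] = select_branch(child, prefixes, understood)
        else:
            resolve_alternate_content(child, prefixes, understood)
            resolved.append(child)
    parent[:] = resolved


def resolve_document(xml_text, understood):
    root, prefixes = parse_with_prefixes(xml_text)
    resolve_alternate_content(root, prefixes, set(understood))
    return root


SAMPLE = f"""<w:body xmlns:w="{W_NS}" xmlns:mc="{MC_NS}" xmlns:wpi="{INK_NS}">
  <mc:AlternateContent>
    <mc:Choice Requires="wpi"><w:drawing/></mc:Choice>
    <mc:Fallback><w:pict/></mc:Fallback>
  </mc:AlternateContent>
</w:body>"""
```

Calling `resolve_document(SAMPLE, [])` mimics an older consumer and leaves only the `w:pict` fallback, while passing `[INK_NS]` keeps the `w:drawing` branch. A schema validator could then be run on the resolved tree instead of on the raw markup.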

And what about them dates?

The even more trained eye will have noticed that the green traffic light for ISO-dates in SpreadsheetML has been downgraded to a flashing red traffic sign.

The first test I did with Microsoft Office 2010 CTP1 had a small check-box in the "backstage" area that would allow dates in spreadsheets to be persisted in ISO-8601 format. With RTM of Microsoft Office 2010 this check-box is gone so we are now back to using serial dates again.

It would be easy to bash Microsoft for removing this check-box, and I am sure that many will. But the truth is that the removal of this feature is due to activities in WG4 where we maintain OOXML.

As some of you may recall, the introduction of ISO-dates in OOXML was done in Geneva at the BRM in those hectic days we spent there. The trouble with introducing the ISO-dates in OOXML was that it looked really, really good on paper - but it sucked in real life.

The reason is that we "forgot" to change the namespace name for ISO OOXML-files, so documents conforming to ISO OOXML<T> share namespace name with documents conforming to ECMA-376. This has enormous consequences for spreadsheets and those applications designed to support ECMA-376 but not necessarily ISO OOXML. At the second F2F of WG4 in Prague in 2009, we had a demonstration of how bad it was - not a single application would interpret these new dates correctly and - what was perhaps even worse - they did not display any warnings to the user.
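To show what the two representations look like side by side, here is a small Python sketch converting between a serial date and its ISO-8601 form. It assumes the 1900 date system with whole days only, where serial 1 is 1900-01-01 and serial 60 stands for the non-existent 1900-02-29 inherited from older spreadsheet applications. The function names are made up for the illustration.

```python
from datetime import date, timedelta

# Assumption: 1900 date system. Serials up to 59 count from 1899-12-31;
# serials from 61 onwards are shifted by one day because serial 60 is
# reserved for 1900-02-29, a day that never existed.
EARLY_EPOCH = date(1899, 12, 31)
LATE_EPOCH = date(1899, 12, 30)


def serial_to_iso(serial):
    """Convert a whole-day serial date to an ISO-8601 date string."""
    if serial < 1:
        raise ValueError("serial dates start at 1 (1900-01-01)")
    if serial == 60:
        raise ValueError("serial 60 is 1900-02-29, which is not a real date")
    epoch = EARLY_EPOCH if serial < 60 else LATE_EPOCH
    return (epoch + timedelta(days=serial)).isoformat()


def iso_to_serial(iso_text):
    """Convert an ISO-8601 date string to a whole-day serial date."""
    day = date.fromisoformat(iso_text)
    if day < date(1900, 1, 1):
        raise ValueError("dates before 1900-01-01 have no serial value")
    epoch = EARLY_EPOCH if day < date(1900, 3, 1) else LATE_EPOCH
    return (day - epoch).days


print(serial_to_iso(40179))         # 2010-01-01
print(iso_to_serial("2010-01-01"))  # 40179
```

An application written for ECMA-376 expects the number in a date cell. Handed the string form under the very same namespace, it has no signal that the rules have changed.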

We have been discussing this a lot in WG4, and in the end we decided to start the work to remove usage of ISO-dates from Part 4. This correction of a BRM decision was not easy to agree on (AFAIR it has not been finally approved yet) and the removal of the "Save-as ISO-dates" feature in Microsoft Office 2010 is probably in anticipation of this pending removal of ISO-dates from <T>. I think it might be important to note that this removal was not due to "pleasing Microsoft". In fact - they had already implemented support for this in CTP1. We are removing ISO-dates from Part 4 due to problems with everybody else.

I always like to give credit where credit is due, and I think this is one of those cases. Microsoft has clearly worked with - and listened to - the standardisation community and has chosen to remove a feature they had already implemented.

So what's next?

Well, Doug Mahugh recently wrote about the approach of Microsoft when dealing with OOXML<S>. Amongst other things he wrote that Microsoft Office 2010 will have read-support for OOXML<S> but that "a small number of optional features" will still be lost (that's just new-speak meaning "we haven't implemented support for all of Part 1"). I asked him what that list consisted of, and Doug said they'd provide answers when they have them. I hope that list will come soon.

As I have mentioned earlier, CIBER Denmark A/S (the company I work for) is not in the "productivity-suite-business" - but we develop solutions that work with these suites be that Microsoft Office, OpenOffice.org, iWork or others. Having read-support for OOXML<S> in Microsoft Office 2010 helps us a great deal, because we can now start trimming our code to target OOXML<S> instead of OOXML<T>. We think that adds great value to us and our customers. But we need a definitive list of the areas where we can expect Microsoft Office 2010 to ignore our markup. If we can't have that we are forced to go the safe-route and keep producing OOXML<T>-files and we'd hate to do that. But without a list from Microsoft, we feel that our hands are tied behind our backs.

So please, Microsoft - give us the list ASAP. Otherwise the uncertainty of what Microsoft Office will ignore is too great a risk for us to start producing strict files, and your read-support for OOXML<S> is more or less useless to us.

Correct according to spec or implementation?

In the recent SC34 WG4-meeting in Stockholm, validators quickly became the talk of the town - so to speak. As I am sure you all know, Alex Brown made the office-o-tron some time ago - a validator targeting both ODF and OOXML in their ISO-editions. A few weeks ago I myself made a validator - but mine only targets OOXML in its "latest-improved-and-approved-transitional-version". Alex Brown's is written in Java and mine is written in C#.

Anyways - both Alex and I had some lengthy discussions with Microsoft about our validators and the errors they report. The thing is - there is a bug in the OOXML-specification dealing with how to specify relationship type for the part containing document properties like "author", "created on", "last modified on" etc. This part is a "central" part in OOXML, and to the best of my knowledge, there is not a single implementation out there that doesn't use this part for storing these so-called "core properties".

If you have tried to validate an OOXML-file in my validator, you'd probably have encountered this error:

Checking relationshiptype http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties ...

RelationshipType http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties is not valid. It should have been http://schemas.openxmlformats.org/officedocument/2006/relationships/metadata/core-properties.

In OOXML the "glue" tying the document and its various parts together is "relationship types". So for a given media-type (content type), a relationship type has to be used to properly register it in the package. A few relationship types are defined for the common parts of OOXML documents, i.e. for wordprocessing files, for spreadsheets, for presentations, for headers, footers etc. Some of these are defined in Part 1, section 15 and this is where the bug is. It is obviously a typo, and it has already been included in our list of fixes for the next batch.
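A check like the one producing the error above can be sketched in a few lines. This is a Python illustration, not the validator's actual C# code: the allow-list is a hypothetical one-entry stand-in for the relationship types listed in the specification, with the value copied from the validator output quoted above, and the function name is made up.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical, deliberately tiny allow-list. A real validator would load
# every relationship type defined by the specification.
SPEC_RELATIONSHIP_TYPES = {
    "http://schemas.openxmlformats.org/officedocument/2006/relationships/metadata/core-properties",
}


def check_relationship_types(rels_xml, allowed=SPEC_RELATIONSHIP_TYPES):
    """Return (relationship id, type) pairs whose type is not in the allow-list."""
    root = ET.fromstring(rels_xml)
    unknown = []
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        rel_type = rel.get("Type", "")
        if rel_type not in allowed:
            unknown.append((rel.get("Id"), rel_type))
    return unknown


# A relationship part using the ".../package/2006/..." type that the
# validator output above flags.
SAMPLE_RELS = f"""<Relationships xmlns="{RELS_NS}">
  <Relationship Id="rId2"
      Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties"
      Target="docProps/core.xml"/>
</Relationships>"""

for rel_id, rel_type in check_relationship_types(SAMPLE_RELS):
    print(f"RelationshipType {rel_type} ({rel_id}) is not valid.")
```

Because the check is a plain lookup against what the specification lists, a typo in the specification surfaces as an error on every document that uses the de facto value.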

The trick is - this has rather drastic consequences - at least from a validation point of view. Because a typo in this area will affect almost every implementation of OOXML that persists these basic data-chunks.

The thing is ... each and every document created by Microsoft Office will likely fail due to this bug in the specification.

So what are you gonna do?

Well, we discussed several different approaches.

One was simply to correct my validator to not report this error. I don't really like this idea, since it opens a flood gate of other scenarios where small corrections should take place. Also, if I did want to go down that road, it would require a strategy for handling these things since I wouldn't want to correct any one error based on what Microsoft Office does - being an IBM drone and all. As of yet, I haven't been able to come up with such a strategy.

A second was to report warnings instead of errors in areas where "known bugs" were already in our to-do list of future corrections. I am not sure I like this either since it makes the validator almost impossible to maintain and it muddies the results since no-one will be able to figure out if a warning was simply a warning or a "down-graded error".

A third option is to do nothing.

I like that.

If you have tried to validate the same document using my validator and Alex's you'd probably have noticed that Alex's validator emits many more errors than mine. This is due to the fact that I use the schemas with the first batch of corrections (the so-called COR1-set). I'll update the schemas whenever the next batch of corrections is approved by either SC34 or JTC1. Alex's validator uses the schemas that were in the originally approved version of ISO/IEC 29500:2008. So my validator is already pretty "graceful" as it is.

Another reason that I like the idea of "doing nothing" is that it emphasizes a crucial point: A document should be valid according to the spec and not according to whatever implementation one considers "reference". There are other standards out there where we have a strange mixture of behaviour defined in the specification and behaviour buried in a "reference implementation". I don't know about you - but I'd rather have the spec be "the truth" than several gigs of source-code from whatever implementation is the pet app du jour at the moment.

Additionally, this shows us that all the implementations that handle this have failed in terms of feeding their experiences back to the standardisation organisation maintaining the specification. They will all have encountered this issue - but failed to report it ... unless, of course

  • they haven't looked in the spec at all [0]
  • they haven't bothered to validate their documents

The puzzling thing is - Alex and Gareth discovered this bug in January 2010 and Alex's validator has been reporting this error for months now. I guess the answer to why none of the implementers of OOXML has reported this bug is ... blowing in the wind.

So what I am trying to say is this: My validator stays the way it is - validating documents according to the spec. If any vendor discovers a problem that is clearly an error in the spec, they should prioritize notifying us about it so we can correct it (which we will).

[0] Truth be told, prioritizing "make it work with the most important implementation" is not unheard of. I myself, when I created my first ODF-files, didn't look in the ODF-spec. I reverse-engineered ODF-documents created by OOo since I only cared about whether OOo would eat it or not. Other implementations insist on not "supporting OOXML" but "supporting Microsoft Office output".

Validating OOXML documents

The question of “is this a valid document?” is tricky. At the end of the day it comes down to the description in the conformance clauses of the specification of the document being considered. The conformance clauses of OOXML are listed in Section 2 of Part 1 and Section 2 of Part 4. There are also conformance clauses for Part 2 and Part 3, but they are not really relevant for this post.

The basic requirements for “document validity” can be summarized in these two points:

  • The markup must conform to the schemas of the specification
  • The markup must conform to any semantic and syntactic constraints of the specification

The first bullet is the easy one to check – because all you need to do is to validate the markup. The second bullet is much harder and it is almost impossible to automatically perform such a validation.

But since the first requirement is so easy to test, one could argue that at the bare minimum, a document producer MUST be able to create documents that are valid to the corresponding schemas.

To be able to test this, I implemented an OOXML document schema validator.

It turned out that I should have done this from “day one” of my work with OOXML, because trying to implement a validator revealed a lot of information about how the document is structured and put together – knowledge that really comes in handy when trying to implement a document generator.

My approach was this:

  • Implement a (as much as possible) generic tool to validate documents
  • Use the latest, approved version of the schemas
  • Implement a web front-end to allow anyone to use it from anywhere.
  • Open-source the stuff

Originally I based the validator on OpenXML SDK 2.0, but during the implementation I realized that first of all it seemed a bit too “Microsoft Office dependent”. Secondly I could not get access to all the necessary information in the OPC-package that I needed to validate, since the SDK hides some of this information (and rightly so, if you ask me), and thirdly it turned out that I didn’t need it at all. OpenXML SDK is based on System.IO.Packaging and since I use .Net to implement this, I found System.IO.Packaging itself a much better tool for the job.

What does it do?

The validator performs these tasks:

  1. It checks if the media type (MIME type) of each part is listed in the specification
  2. It checks if the relationship-type of the relationship file is listed in the specification
  3. It checks if each part referenced exists in the package in the correct location
  4. It checks the content (markup) of each part against the transitional schemas of the specification
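As a rough illustration of tasks 1 and 3, here is a Python sketch that inspects a package held in memory. It is not the validator's code (that is C# on top of System.IO.Packaging). The set of accepted media types is supplied by the caller, only the root relationship part `_rels/.rels` is followed, and the function names are made up. Task 4 is left out since the Python standard library has no XSD validator.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def media_type_of(part_name, content_types_root):
    """Look up a part's media type: an Override wins, then Default by extension."""
    for override in content_types_root.findall(f"{{{CT_NS}}}Override"):
        if override.get("PartName", "").lower() == ("/" + part_name).lower():
            return override.get("ContentType")
    extension = part_name.rpartition(".")[2].lower() if "." in part_name else ""
    for default in content_types_root.findall(f"{{{CT_NS}}}Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


def inspect_package(package_bytes, known_media_types):
    """Return a list of human-readable problems found in the package."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = {n for n in package.namelist() if not n.endswith("/")}
        content_types = ET.fromstring(package.read("[Content_Types].xml"))

        # Task 1: every part needs a media type that the caller accepts.
        for name in sorted(names - {"[Content_Types].xml"}):
            media_type = media_type_of(name, content_types)
            if media_type not in known_media_types:
                problems.append(f"{name}: unexpected media type {media_type!r}")

        # Task 3: every internal target of the root relationships must exist.
        if "_rels/.rels" in names:
            rels = ET.fromstring(package.read("_rels/.rels"))
            for rel in rels.findall(f"{{{RELS_NS}}}Relationship"):
                if rel.get("TargetMode") == "External":
                    continue
                target = rel.get("Target", "").lstrip("/")
                if target not in names:
                    problems.append(f"missing part {target}")
    return problems
```

Given the bytes of a .docx file and a set of media types taken from the specification, `inspect_package` returns an empty list when both checks pass. Relationship parts other than the root one would need the same treatment, with targets resolved relative to their source part.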

What doesn’t it do?

The validator does not do the following things:

  • Support validation of documents containing extensions using MCE
  • Support documents in files with extensions not being either “docx”, “xlsx” or “pptx”.
  • Support validation against the strict schemas of OOXML
  • Support validation of the “root” package entry, being the file [Content_Types].xml

Other tools:

As you probably know Alex Brown has made the “office-o-tron”, which is a SAX/Java-based document validator. The differences between this tool and mine are summarized here:

| Task | Office-o-tron | OOXML Validator |
|---|---|---|
| Validates OOXML documents | x | x |
| Validates OOXML <T> documents | x | x |
| Validates OOXML <S> documents | | |
| Validates against ISO/IEC 29500:2008 | x | |
| Validates against ISO/IEC 29500:2008 COR 1 | | x |
| Supports MCE | | |
| Inspects package of document (ZIP container) | | x |
| Validates ODF documents | x | |

The only major difference (OOXML-wise) is really that office-o-tron validates against the core, base schemas of ISO/IEC 29500:2008 whereas the OOXML Validator validates against the set of schemas with the first set of approved corrigenda (COR1). Whenever a new set has been approved as either an amendment or a corrigendum, the schema sets will be updated accordingly.

I'll update this article with some of the details revealed during the creation of the validator - until then, have fun.

:-)

Final nail in the coffin for the "highlander approach"

On January 29th the Danish politicians finally got their act together and did something about open document formats. After almost 3 years of debate and endless foot-dragging, an agreement was finally reached on that Friday.

The agreement made had this content:

For use in the public sector in Denmark, a document format can be used, if it is on the list of approved document formats. To get on the list, the document format must comply with these rules:

  • It has to be completely documented and be publicly available
  • It has to be implementable by anyone without financial, political or legal limitations on either implementation or utilization
  • It has to have been approved by an internationally recognised standardisation organisation, such as ISO, and standardised and maintained in an open forum with a transparent process.
  • It must be demonstrated that the standard can be implemented by everyone directly in its entirety on several platforms.

If you ask me, this list is pure rubbish. Apparently a deal was made on that Friday morning literally minutes before an open hearing in Parliament and this bears all signs of a job done in too much haste.

(of course, all this happened when I was away on a family weekend, but as you can imagine the blogosphere went crazy and twitter buzzed like a hive of bees with the gents of "big blue" and "big red" taking swings at each other)

Devil in the details

The problem is that it is written all over it that this list will be taken very literally and we are going to continue to have to discuss stupid details with stupid people - instead of getting to work to start giving value to our customers.

The problems pertain to item 1) and item 4).

Item 1 is actually not that big of a deal, but it is an example of a requirement that cannot be verified. Because what does "completely documented" mean? Does it mean that a mere list of all elements and attributes is enough to give a "thumbs-up"? Does it mean that a single occurrence of the phrase "... is application defined" provides automatic rejection? Now, I agree with the idea behind this, coz' shit has to be documented, but this item should be removed or altered to provide real meaning.

And what about item 4) ?

Well, the problem with this is that no document standard of today can be said to comply with this requirement - thereby making the list Ø. The only way a document format can be said to comply with this would be to have 2 independent applications, each claiming to be implementing the specification in "its entirety". And still we wouldn't be able to actually prove it. We would, at best, be able to show that with high likelihood the applications do actually implement the specification "in its entirety".

Two to go ...

So that basically leaves us with two requirements. The only requirement we should think of adding would be "It has to be relevant in the market" ... ODA, anyone?

The silver lining

But do not fret - it is not all bad. No, because the agreement effectively puts the final nail in the coffin for the "there can be only one document format"-line of thought. The Danish parliament has turned its back on any exclusivity with regards to document formats and has turned its focus to "open standards". This is no doubt a positive move, because now it doesn't make any sense any more to argue "which one is bigger (or, smaller)" or over who got to the playground first.

With this decision Denmark follows other countries like Norway, Belgium and Holland where the notion of "open standards" is also the center of thought - and who have also discarded the idea of "value can only come if we only have one document format". This is fantastic - and I applaud our politicians on making this decision - even though some of the details lacked consideration.

:-)

Moving towards OOXML(S)

Some time ago I wrote a bit about what Microsoft had managed to get into Microsoft Office 2010 CTP1 (or, I wrote about the stuff I had tested). As you might recall, the results were rather slim, so I wrote to Microsoft to hear if that was really it. It has been the fear of many that Microsoft will never, ever care at all about the strict conformance clause of ISO/IEC 29500, and my tests clearly were a sign that they were right. Heck, some even mentioned that "the only choice for Microsoft is to avoid adding new BRM features in their OOXML files".

On the other hand I have always regarded big companies like Novell, IBM, Oracle etc. as rather simplistic in their development cycles - that they'll always choose the path of least resistance. Microsoft is no different here, and moving towards strict and side-tracking nasty legacy stuff like VML etc. is clearly an attempt to make the development path easier in the future.

The list I got was this:

| File type | Feature | Comment | Status |
|---|---|---|---|
| DOCX | Ink Drawings | Previously used VML, now uses DrawingML | Yellow (warning) |
| XLSX | Ink Drawings | Previously used VML, now uses DrawingML | Yellow (warning) |
| PPTX | Ink Drawings | Previously used VML, now uses DrawingML | Yellow (warning) |
| DOCX | Legacy Diagrams | Previously used VML, now uses DrawingML | Yellow (warning) |
| XLSX | Legacy Diagrams | Previously used VML, now uses DrawingML | Yellow (warning) |
| PPTX | Legacy Diagrams | Previously used VML, now uses DrawingML | Yellow (warning) |
| DOCX | Drawing Shapes | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | Textboxes | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | WordArt | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | Groups | Previously used VML, now uses DrawingML | Yellow (warning) |
| XLSX | Form Controls | Previously used VML, now uses DrawingML - except on "chart sheets" | Yellow (warning) |
| XLSX | ActiveX Objects | Previously used VML, now uses DrawingML | Yellow (warning) |
| PPTX | ActiveX Objects | Previously used VML, now uses DrawingML | Yellow (warning) |
| XLSX | OLE Objects | Previously used VML, now uses DrawingML | Green (success) |
| DOCX | ST_OnOff | Uses the new ISO-approved simple type without the values "on" and "off" | Green (success) |
| XLSX | ST_OnOff | Uses the new ISO-approved simple type without the values "on" and "off" | Green (success) |
| PPTX | ST_OnOff | Uses the new ISO-approved simple type without the values "on" and "off" | Green (success) |
| XLSX | ISO-dates | Can persist dates in ISO-8601 format and avoids the "evil" serial dates | Green (success) |

(The last four were added by me and didn't appear on the list from Microsoft)

Now, "someone" once wrote to me that you shouldn't make any decisions based on what Microsoft says they will do - you should wait until they actually act. I couldn't agree more, so I tried to test the list I received. I have tested the lines marked with a green traffic light by first creating a document in Microsoft Office 2007 to verify the usage of e.g. VML, and then creating the same document in Microsoft Office 2010 Beta [']. The lines marked with yellow traffic lights have not been tested by me, since I frankly don't have the Office-skills to create such a file (what the hell is an "Ink Drawing", btw?). If anyone can test this, I'd be happy to update the list. Also, regarding the lines about ST_OnOff, I have tried to create files that would contain the "bad" On/Off-values, but I haven't succeeded in this. That is not the same as deterministically verifying that it cannot be done, so again - if you can create a file in Microsoft Office 2010 with the bad values, send it to me and I'll update the list.

So getting back to "don't trust Microsoft as far as you can throw them", this is in no way a definitive list. The list is based on Microsoft Office 2010 Beta, and much can happen until final RTM - both in the right direction with even more things being fixed, but also in the wrong direction with things being pulled off the list again (WinFS, anyone?). But for those of us not implementing complete Office suites but "merely" interacting with the ecosystem by generating files, this is undoubtedly good news. Add to this that Microsoft confirmed a few TC-calls ago in WG4 that, pending the current AMD1-ballot, Microsoft would add the new namespaces of strict files to the white-list of known namespaces in Office 2010. This effectively means that Microsoft Office will be able to load (some) strict files, and if you just happen to generate PPTX-files with embedded objects, you'll likely never again have to generate markup like this:

[code:xml]<w:object w:dxaOrig="15" w:dyaOrig="15">
    <v:shapetype
        id="_x0000_t75"
        coordsize="21600,21600"
        o:spt="75"
        o:preferrelative="t"
        path="m@4@5l@4@11@9@11@9@5xe"
        filled="f"
        stroked="f">
        <v:stroke joinstyle="miter"/>
        <v:formulas>
            <v:f eqn="if lineDrawn pixelLineWidth 0"/>
            <v:f eqn="sum @0 1 0"/>
            <v:f eqn="sum 0 0 @1"/>
            <v:f eqn="prod @2 1 2"/>
            <v:f eqn="prod @3 21600 pixelWidth"/>
            <v:f eqn="prod @3 21600 pixelHeight"/>
            <v:f eqn="sum @0 0 1"/>
            <v:f eqn="prod @6 1 2"/>
            <v:f eqn="prod @7 21600 pixelWidth"/>
            <v:f eqn="sum @8 21600 0"/>
            <v:f eqn="prod @7 21600 pixelHeight"/>
            <v:f eqn="sum @10 21600 0"/>
        </v:formulas>
        <v:path
            o:extrusionok="f"
            gradientshapeok="t"
            o:connecttype="rect"/>
        <o:lock v:ext="edit" aspectratio="t"/>
    </v:shapetype>
    <v:shape
        id="_x0000_i1025"
        type="#_x0000_t75"
        style="width:.75pt;height:.75pt"
        o:ole="">
        <v:imagedata r:id="rId4" o:title=""/>
    </v:shape>
    <o:OLEObject
        Type="Embed"
        ProgID="opendocument.WriterDocument.1"
        ShapeID="_x0000_i1025"
        DrawAspect="Content"
        ObjectID="_1327745060"
        r:id="rId5"/>
</w:object>[/code]

['] I have tried, in vain, to get my hands on the latest pre-release edition of Microsoft Office 2010 ... so much for being a drone, when you can't get your hands on the latest bits :-(

 

Microsoft Office 2010 Beta, ODF and leap-year-bug

Some time ago I did some tests of Excel in Microsoft Office 2010 (CTP). The tests were about OOXML - but a test of the ODF-support was missing.

One of the things that is in OOXML but missing from ODF is the leap-year-bug ... although most of us probably don't miss it all that much. The leap-year-bug is the good ol' Lotus 1-2-3 bug that treated 1900 as a leap year. As a consequence, calculations that combine dates in the range from January 1st 1900 to February 28th 1900 with dates after this period will be off by one day.

Since Microsoft Office supports (a subset of) ODF, I thought it'd be fun to look at how Excel 2010 handles the leap-year-bug.

The first thing to do is to show how the leap-year-bug is handled by Excel:

So adding a day to February 28th 1900 will result in the non-existent date February 29th 1900, and if you subtract the dates February 27th 1900 and March 2nd 1900 (you'd expect a value of 3) you actually get a value of 4.
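The arithmetic is easy to reproduce outside Excel. Here is a minimal sketch in Python; the helper name `excel_1900_serial` is mine (it is not an Excel or library function), and it assumes the 1900 date system where serial 1 is January 1st 1900 and serial 60 is the fictitious February 29th 1900.

```python
from datetime import date

def excel_1900_serial(d: date) -> int:
    """Serial number of a real calendar date in the 1900 date system (hypothetical helper)."""
    days = (d - date(1899, 12, 31)).days  # January 1st 1900 maps to 1
    # From March 1st 1900 onwards every date is pushed one step by the
    # phantom February 29th 1900, which occupies serial 60.
    return days + 1 if d >= date(1900, 3, 1) else days

feb27 = excel_1900_serial(date(1900, 2, 27))
mar2 = excel_1900_serial(date(1900, 3, 2))

spreadsheet_diff = mar2 - feb27                               # what the buggy serials give
calendar_diff = (date(1900, 3, 2) - date(1900, 2, 27)).days   # what the calendar says

print(spreadsheet_diff, calendar_diff)
```

The serial-based difference comes out one larger than the calendar difference, because the two dates sit on either side of the phantom day.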

So what will happen if you save this spreadsheet in ODF-format and open it again in Excel? You might expect that - since it was round-tripped through a format not supporting the leap-year-bug, the calculations would now be correct.

... but you'd be wrong. The result is exactly the same:

As I was, you might be wondering how the hell that was possible. But a simple inspection of the markup generated by Microsoft Excel 2010 reveals the answer:

[code:xml]<table:table-cell
  office:value-type="date"
  office:date-value="1900-02-29T00:00:00"
  table:formula="msoxl:=A2+1">
  <text:p>29-02-1900</text:p>
</table:table-cell>[/code]

A quick-and-dirty conclusion to this would be that Microsoft Excel 2010 violates not only ODF but also xsd:dateTime, since February 29th 1900 is not a valid xsd:dateTime. However, an inspection of ODF reveals that this is not the case. Microsoft Office claims conformance to ODF 1.1, and ODF 1.1 states the following about the value-space of the attribute "office:date-value" (Section 16.1, p 702):

A dateOrDateTime value is essentially an [xmlschema-2] date and time value with an optional time component. In other words, it may contain either a date, or a date and time value.

So strictly (*giggle*) speaking, Microsoft Office 2010 does not violate ODF 1.1.

However - specifying an invalid date in an attribute that might contain xsd:dates is not very smart, dear Microsoft. Those of us wanting to use standard libraries to process the content of an ODF-document will likely get unpredictable results when trying to parse this invalid date. Heck, even .Net's DateTime.Parse()-method throws an exception when trying to parse this value.
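It is not just .Net, by the way. As a small sketch, Python's standard library rejects the value as well; `is_real_datetime` is a name I made up for the illustration, while `datetime.fromisoformat` is the actual standard-library call doing the work.

```python
from datetime import datetime

def is_real_datetime(value: str) -> bool:
    """Return True if the standard library accepts the ISO 8601 value (hypothetical helper)."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        # Raised for dates that do not exist, such as February 29th 1900
        return False

print(is_real_datetime("1900-02-28T00:00:00"))
print(is_real_datetime("1900-02-29T00:00:00"))
```

Any consumer doing something like this on the `office:date-value` attribute will have to special-case the value written by Excel.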

Also, ODF TC has tightened up the prose in ODF 1.2 and it is now:

A dateOrDateTime value is either an [xmlschema-2] date value or an [xmlschema-2] dateTime value.

So Microsoft Office 2010 might not violate it now - but it will when ODF 1.2 comes out.

Extending ODF

Microsoft could always opt for extending ODF using the extension mechanism (adding elements and attributes in a foreign namespace). So Microsoft could choose to add their own attribute to the <office:spreadsheet>-element, saying something like

[code:xml]<office:spreadsheet mso:EnableLeapYear="true"/>[/code]

The problem with this approach is that it comes into conflict with the new conformance clauses of ODF, where a clear distinction between "normal" documents and "extended" documents is made. Procurement-wise it is a big no-no to only support extended documents (look what happened in Denmark!), and Microsoft risks that some government somewhere decides not to use Microsoft Office due to lack of conformance to the "normal" conformance clause of ODF 1.2.

Thus, Microsoft needs to find another solution ...

Configuration to the rescue!

Luckily for Microsoft (and we all know how picky they are wrt "preserving functionality" etc), there is a fully compliant way out of this while still preserving the leap-year-bug in spreadsheets - regardless of persistence format.

As you probably know, the so-called config-item-sets are a gold-mine of endless possibilities. Originally (until ODF 1.1) the purpose of these elements and attributes was to store application-specific settings, like (and this is a quote from ODF 1.1) "document settings, for example a default printer or view settings, for example zoom level". In ODF 1.2, all bets are off and there are no restrictions on the usage of the elements. The config-item-set elements were never meant to be an extension mechanism (according to the ODF TC co-chair from Sun/ORACLE - go figure), but OpenOffice.org uses them extensively - in fact, when creating a "blank" text-document, spreadsheet or presentation in OpenOffice.org, a total of 228 settings (76 for text documents, 66 for spreadsheets and 86 for presentations), none of which are described in ODF, are defined in the settings.xml-files of the packages. Somehow ODF TC has not found it necessary to include usage of config-item-sets in the "extended conformance clause", so a document can claim 100% conformance to ODF 1.2 "normal documents" while throwing dozens of settings into config-item-set elements. So the solution to claim conformance to ODF while enabling the leap-year-bug is simply

[code:xml]<config:config-item-set config:name="mso:spreadsheet-settings">
  <config:config-item
    config:name="EnableLeapYearBug"
    config:type="boolean">true</config:config-item>
</config:config-item-set>[/code]
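To show how little a consumer would need in order to pick this up, here is a rough sketch in Python. Keep in mind that the set name "mso:spreadsheet-settings" and the item "EnableLeapYearBug" are my own inventions from the markup above, not anything Microsoft Office writes, and `leap_year_bug_enabled` is just a name for the illustration. The namespace URI is the regular ODF config namespace.

```python
import xml.etree.ElementTree as ET

# The ODF namespace for config:* elements and attributes
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"

# Hypothetical settings fragment, mirroring the markup above
SETTINGS = f"""
<config:config-item-set xmlns:config="{CONFIG_NS}"
                        config:name="mso:spreadsheet-settings">
  <config:config-item config:name="EnableLeapYearBug"
                      config:type="boolean">true</config:config-item>
</config:config-item-set>
"""

def leap_year_bug_enabled(settings_xml: str) -> bool:
    """Look for the (made-up) EnableLeapYearBug item anywhere in the settings."""
    root = ET.fromstring(settings_xml)
    name_attr = f"{{{CONFIG_NS}}}name"
    for item in root.iter(f"{{{CONFIG_NS}}}config-item"):
        if item.get(name_attr) == "EnableLeapYearBug":
            # Tolerate whitespace around the boolean text
            return (item.text or "").strip() == "true"
    return False  # setting absent: behave like a normal calendar

print(leap_year_bug_enabled(SETTINGS))
```

An application that doesn't know the setting simply ignores it, which is exactly why this route slips past the conformance clauses.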

This should be combined with this markup for the specific cell

[code:xml]<table:table-cell
  office:value-type="float"
  office:value="60"
  table:formula="msoxl:=A2+1">
  <!--<text:p>29-02-1900</text:p>-->
</table:table-cell>[/code]

(and you don't really need the bit I have commented out).
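The reason the commented-out bit is optional is that a consumer can work out the display text from the serial value and the setting alone. A minimal sketch of that, assuming the 1900 date system and a dd-mm-yyyy display format; `display_serial` and its `leap_year_bug` parameter are hypothetical names of mine:

```python
from datetime import date, timedelta

def display_serial(serial: int, leap_year_bug: bool) -> str:
    """Render a date serial as dd-mm-yyyy (hypothetical helper)."""
    if leap_year_bug:
        if serial == 60:
            return "29-02-1900"  # the date that never existed
        if serial > 60:
            serial -= 1          # skip the phantom day to reach the real calendar
    return (date(1899, 12, 31) + timedelta(days=serial)).strftime("%d-%m-%Y")

print(display_serial(60, leap_year_bug=True))
print(display_serial(60, leap_year_bug=False))
```

With the setting on, the value 60 shows up as the phantom date; with it off, the same value is simply the next real day.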

I don't know about you, but I find this just darn right fantastic!

:-)

test

[code:xml]<o:OLEObject
    Type="Embed"
    ProgID="AVIFile"
    ObjectID="_1219561732" r:id="rId5"
/>[/code]