a 'mooh' point

clearly an IBM drone

Will you be my friend too?

The other day I noticed that the traffic on my blog had increased dramatically. I couldn’t really understand why (I have not written anything substantial in quite some time) – all I could see was that the majority of the visitors originated from Google. The search results were mainly “Jesper Lund Stocholm”, “Alex Brown” and “OOXML” … which told me just about nothing at all.

But then I noticed that a few of them also included the term “boycottnovell” and “techrights”. This lead me to check out the feed from #boycottboy’s website – and behold – I was actually mentioned in one of his defamatory articles.

The article is this: http://techrights.org/2010/09/11/sc34-is-still-a-farce/ .

The quotes start like this:

Weir is then met by opposition from Brown’s longtime right-winger, the ODF-hostile Jesper Lund Stocholm [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. He is a known Microsoft booster and Weir’s responses to him go like this:

I wonder if I shall refer to myself from now on as “Goose” or similar. Alex is clearly the Maverick here.

Now – and this is where I envy Rob – #boycottboyquotes a conversation I had with Rob (which I clearly won, btw :-) ) – but conveniently leaves out everything I wrote. It seems to me that Rob and #boycottboy live in some sort of symbiosis – each benefitting from on another. #Boycottboy clearly regards every little thing Rob writes to him as a “badge of honor” – regardless of the content itself. And Rob very much benefits from #boycottboys critique-less c/p of his comments/articles … with the infamous “#boycottboy reality distortion field” applied.

I wish I had a friend like that. I wish I had a friend that would blindly recite every syllable coming from my lips – without any sanity-check at all.

I totally get why #boycottboy needs Rob – but why IBM’s chief ODF Architect (elsewhere known as “ODF’s one-man-army”) needs someone like #boycottboy is beyond me.

And you know what the silver lining is here? I actually benefit greatly (and not only in a monetary sense) from being posted on #boycottboy’s site. The increased traffic from techrights.org keeps the fire burning and it is almost always a certainty that someone will write to me to invite me to speak on document formats at conference, potential customers or even political parties.

So please keep it up, you two – it helps me avoid the wife and kid going to bed hungry at night.

Moving towards OOXML(S) (update)

Some time ago I wrote about some of the enhancements of Microsoft Office in terms of how far they have made it in implementing the content of the conformance profile "Strict" or "<S>". As you might recall, I made a run-through of a list of feature areas and marked each with either a green, yellow or red traffic light. There were no red traffic lights, but some areas had a yellow marking. These were

  • "ink"
  • "legacy diagrams"
  • "groups"
  • "form controls"
  • "activeX objects".

These document types previously used VML as containing frame etc, but Microsoft Office 2010 was now supposedly using DrawingML for these. The reason for them being yellow and not red was that I did simply not know how to test these things - either because of poor Microsoft Office skills or lack of proper hardware ("ink" is used on tablet PC's and I don't have one of those at hand).

Stockholm plug-fest

When WG4 met in Stockholm a couple of months ago, I got a chance to take a look at the documents I couldn't create myself. The cool thing about participating in these meetings is that there is an abundance of different hardware and software on the laptops of the delegates, so after one of the sessions a few of us had our own little "Microsoft Office OOXML <S> interop plug-fest" and I finally had a chance to get my hands on those files.

I could have simply updated the previous article with the new information, but a couple of interesting thing emerged that made me write up a new piece.

First, the results are this:

File typeFeatureComment 
DOCX Ink Drawings Previously used VML, now uses DrawingML
Success, green traffic light
XLSX Ink Drawings Previously used VML, now uses DrawingML Success, green traffic light
PPTX Ink Drawings Previously used VML, now uses DrawingML Success, green traffic light
DOCX Legacy Diagrams Previously used VML, now uses DrawingML Success, green traffic light
XLSX Legacy Diagrams Previously used VML, now uses DrawingML Success, green traffic light
PPTX Legacy Diagrams Previously used VML, now uses DrawingML Success, green traffic light
DOCX Drawing Shapes Previously used VML, now uses DrawingML Success, green traffic light
DOCX Textboxes Previously used VML, now uses DrawingML Success, green traffic light
DOCX WordArt Previously used VML, now uses DrawingML Success, green traffic light
DOCX Groups Previously used VML, now uses DrawingML Success, green traffic light
XLSX Form Controls Previously used VML, now uses DrawingML - except on "chart sheets" Success, green traffic light
XLSX ActiveX Objects Previously used VML, now uses DrawingML Success, green traffic light
PPTX ActiveX Objects Previously used VML, now uses DrawingML Success, green traffic light
XLSX OLE Objects Previously used VML, now uses DrawingML Success, green traffic light
DOCX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Success, green traffic light
XLSX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Success, green traffic light
PPTX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Success, green traffic light
XLSX ISO-dates Can persist dates in ISO-8601 format and avoids the "evil" serial dates. Failure, red traffic light

The trained eye will notice that all the yellow lights have been replaced by green lights - In other words, the list above clearly shows that even though Microsoft Office 2010 does not write <S>, the developers in Redmond have clearly made some significant progress.

There are a couple of interesting points about the technicalities of the files I looked at.

Predicting the future is difficult

The files containing "legacy diagrams" stand out, because of the way Microsoft Office 2010 breaks compatibility with e.g. Microsoft Office 2003 and earlier versions. The thing is - when loading a PPT-file with a legacy diagram from e.g. Microsoft Office 2003 the diagram will be in VML-format. When it is loaded in Microsoft Office 2010, modified and saved again - it won't save the diagram in VML. It just won't. The diagram will be saved all right - but now using DrawingML instead of VML. So this is essentially a case where interoperability with this "legacy" application is hurt since Microsoft Office 2003 has no idea what to do with the DrawingML it loads.

MCE to the rescue

For all the other files, MCE once again steps up.

If we look at the file containing the ink notations, (some of) the markup will look like this:

  <mc:Choice Requires="wpi">
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
        <a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingInk">
      <v:shapetype >
        <v:stroke joinstyle="miter" />
          <v:f eqn="if lineDrawn pixelLineWidth 0" />
      <v:shape id="Ink 15">
        <v:imagedata r:id="rId6" o:title="" />

(I have modified and trimmed the real XML for easier reading - especially the VML was really, really ugly)

So with this approach you can actually have the best of two worlds - the new and the old without losing information. That is - if you know MCE, of course. This again shows what a great tool alternating content blocks (ACB) of MCE are for this task. It allows you to innovate while still making it possible to ensure some sort of compatibility with earlier programs that did not know of the new technology.

And what about them dates?

The even more trained eye will have noticed that the green traffic light for ISO-dates in SpreadsheetML has been downgraded to a flashing red traffic sign.

The first test I did with Microsoft Office 2010 CTP1 had a small check-box in the "backstage" area that would allow dates in spreadsheets to be persisted in ISO-8601 format. With RTM of Microsoft Office 2010 this check-box is gone so we are now back to using serial dates again.

It would be easy to hit on Microsoft for removing this check-box, and I am sure that many will. But the truth is that the removal of this feature is due to activities in WG4 where we maintain OOXML.

As some of you may recall, the introduction of ISO-dates in OOXML was done in Geneva at the BRM in those hectic days we spent there. The trouble with introducing the ISO-dates in OOXML was that it looked really, really good on paper - but it sucked in real life.

The reason is that we "forgot" to change namespace name for ISO OOXML-files so documents conforming to ISO OOXML<T> share namespace name with documents conforming to ECMA 376. This has has enormous consequences for spreadsheets and those applications designed to support ECMA-376 but not necessarily ISO OOXML. At the second F2F of WG4 in Prague in 2009, we had a demonstration of how bad is was - not a single application would interpret these new dates correctly and - what was perhaps even worse - they did not display any warnings to the user.

We have been discussing this a lot in WG4, and in the end we decided to start the work to remove usage of ISO-dates from Part 4. This correction of a BRM decision was not easy to agree on (AFAIR it has not been finally approved as of yet) and the removal of the "Save-as ISO-dates" feature in Microsoft Office 2010 is propably in anticipation of this pending removal of ISO-dates from <T>. I think it might be important to note that this removal was not due to "pleasing Microsoft". In fact - they had already implemented support for this in CTP1. We are removing ISO-dates from Part 4 due to problems with everybody else.

I always like to give credit where credit is due, and I think this is one of those cases. Microsoft has clearly worked with - and listened to - the standardisation community and has chosen to remove a feature they had already implemented.

So what's next?

Well, Doug Mahugh recently wrote about the approach of Microsoft when dealing with OOXML<S>. Amongst other things he wrote that Microsoft Office 2010 will have read-support for OOXML<S> but that "a small number of optional features" will still be lost (that's just new-speak meaning "we haven't implemented support for all of Part 1"). I asked him what that list consisted of, and Doug said they'd provide anwsers when they have them. I hope that list will come soon.

As I have mentioned earlier, CIBER Denmark A/S (the company I work for) is not in the "productivity-suite-business" - but we develop solutions that work with these suites be that Microsoft Office, OpenOffice.org, iWork or others. Having read-support for OOXML<S> in Microsoft Office 2010 helps us a great deal, because we can now start trimming our code to target OOXML<S> instead of OOXML<T>. We think that adds great value to us and our customers. But we need a definitive list of the areas where we can expect Microsoft Office 2010 to ignore our markup. If we can't have that we are forced to go the safe-route and keep producing OOXML<T>-files and we'd hate to do that. But without a list from Microsoft, we feel that our hands are tied behind our backs.

So please, Microsoft - give us the list ASAP. Otherwise the uncertainty of what Microsoft Office will ignore is to great a risk for us to start producing strict files and your read-support for OOXML<S> is more or less useless to us.

Correct according to spec or implementation?

In the recent SC34 WG4-meeting in Stockholm, validators quickly became the talk of the town - so to speak. As I am sure you all know, Alex Brown made the office-o-tron some time ago - a validator targeting both ODF and OOXML in their ISO-editions. A few weeks ago I myself made a validator - but mine only targets OOXML in its "latest-improved-and-approved-transitional-version". Alex Brown's is written in Java and mine is written in C# .

Anyways - both Alex and I had some lengthy discussions with Microsoft about our validators and the errors they report. The thing is - there is a bug in the OOXML-specification dealing with how to specify relationship type for the part containing document properties like "author", "created on", "last modified on" etc. This part is a "central" part in OOXML, and to the best of my knowledge, there is not a single implementation out there that doesn't use this part for storing these so-called "core properties".

If you have tried to validate an OOXML-file in my validator, you'd probably have encountered this error:

Checking relationshiptype http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties ...

RelationshipType http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties is not valid. It should have been http://schemas.openxmlformats.org/officedocument/2006/relationships/metadata/core-properties.

In OOXML the "glue" tying the document and its various parts together is "relationship types". So for a given media-type (content type), a relationship type has to be used to properly register it in the package. A few relationship types are defined for the common parts of OOXML documents, i.e. for wordprocessing files, for spreadsheets, for presentations, for headers, footers etc. Some of these are defined in Part 1, section 15 and this is where the bug is. It is obviously a typo, and it has already been included in our list of fixes for the next batch.

The trick is - this has rather drastic consequences - at least from a validation point of view. Because a typo in this area will affect almost every implementation of OOXML that persists these basic data-chunks.

The thing is ... each and every document created by Microsoft Office will likely fail due to this bug in the specification.

So what are you gonna do?

Well, we discussed several different approaches.

One was simply to correct my validator to not report this error. I don't really like this idea, since it opens a flood gate of other scenarios where small corrections should take place Also, if I did want to go down that road, it should require a strategy for handling these things since I wouldn't want to correct any one error based on what Microsoft Office does - being an IBM drone and all. As of yet, I haven't been able to come up with such a strategy.

A second was to report warnings instead of errors in areas where "known bugs" were already in our to-do list of future corrections. I am not sure I like this either since it makes the validator almost impossible to maintain and it muddens the results since no-one will be able to figure out if a warning was simply a warning or a "down-graded error".

A third option is to do nothing.

I like that.

If you have tried to validate the same document using my validator and Alex's you'd probably have noticed that Alex's validator emits many more errors than mine. This is due to the fact that I use the schemas with the first batch of corrections (the so-called COR1-set). I'll update the schemas whenever the next batch of corrections are approved by either SC34 or JTC1. Alex's validator uses the schemas that was in the originally approved version of ISO/IEC 29500:2008. So my validator is already pretty "graceful" as it is.

Aonther reason that I like the idea of "doing nothing" is that it emphasizes a crucial point: A document should be valid according to the spec and not according to whatever implementation one considers "reference". There are other standards out there where we have a strange mixture of behaviour defined in the specification and behaviour buried in a "reference implementation". I don't know about you - but I'd rather have the spec be "the truth" than several gigs of source-code from whatever implementation is the pet app-de-jour at the moment.

Additionally, this shows us that all the implementations that handle this have failed in terms of feeding their experiences back to the standardisation organisation maintaining the specification. They will all have encountered this issue - but failed to report it ... unless, of course

  • they haven't looked in the spec at all [0]
  • they haven't bothered to validate their documents

The puzzling thing is - Alex and Gareth discovered this bug in January 2010 and his validator has been reporting this error for months now. I guess the answer to why neither of the implementers of OOXML has reported this bug is ... blowing in the wind.

So what I am trying to say is this: My validator stays the way it is - validating documents according to the spec. If any vendor discover a problem that is clearly an error in the spec, they should prioritize notifying us about it so we can correct it (which we will).



[0] Truth be told, prioritizing "make it work with most important implementation" is not the un-heard of. I myself, when I created my first ODF-files, didn't look in the ODF-spec. I reverse-engineered ODF-documents created by OOo since I only cared about whether OOo would eat it or not. Other implementations insist on not "supporting OOXML" but "supporting Microsoft Office output".

Validating OOXML documents

The question of “is this a valid document?” is tricky. At the end of the day it comes down to the description in the conformance clauses of the specification of the document being considered. The conformance clauses of OOXML is listed in Section 2 of Part 1 and Section 2 of Part 4. There are also conformance clauses for Part 2 and Part 3, but they are not really relevant for this post.

The basic requirements for “document validity” can be summarized in these two points:

  • The markup must correspond/obey to the schemas of the specification
  • The markup must correspond/obey to any semantic and syntactic constraints of the specification

The first bullet is the easy one to check – because all you need to do is to validate the markup. The second bullet is much harder and it is almost impossible to automatically perform such a validation.

But since the first requirement is so easy to test, one could argue that at the bare minimum, a document producer MUST be able to create documents that are valid to the corresponding schemas.

To be able to test this, I implemented an OOXML document schema validator.

It turned out, that I should have done this from “day one” of my work with OOXML, because trying to implement a validator revealed a lot of information about how the document is structured and put together – a knowledge that really comes nicely in hand when trying to implement a document generator.

My approach was this:

  • Implement a (as much as possible) generic tool to validate documents
  • Use the latest, approved version of the schemas
  • Implement a web front-end to allow anyone to use it from anywhere.
  • Open-source the stuff

Originally I based the validator on OpenXML SDK 2.0, but during the implementation I realized that first of all it seemed a bit too “Microsoft Office dependant”. Secondly I could not get access to all the necessary information in the OPC-package that I needed to validate since the SDK hides some of this information (and rightly so, if you ask me) and thirdly it turned out that I didn’t need it at all. OpenXML SDK is based on System.IO.Packaging and since I use .Net to implement this, I found this a much better tool for the job.

What does it do?

The validator performs these tasks:

  1. It checks if the media types (MIME types) of each part is listed in the specification
  2. It checks if the relationship-type of the relationship file is listed in the specification
  3. It checks if each part referenced exists in the package in the correct location
  4. It checks the content (markup) of each part against the transitional schemas of the specification

What doesn’t it do?

The validator does not do the following things:

  • Support validation of documents containing extensions using MCE
  • Support documents in files with extensions not being either “docx”, “xlsx” or “pptx”.
  • Support validation against the strict schemas of OOXML
  • Support validation of the “root” package entry, being the file [Content_Types].xml

Other tools:

As you probably know Alex Brown has made the “office-o-tron”, which is a SAX/Java-based document validator. The differences between this tool and mine are summarized here:

Task Office-o-tron OOXML Validator
Validates OOXML documents x x
Validates OOXML <T> documents x x
Validates OOXML <S> documents
Validates against ISO/IEC 29500:2008 x
Validates against ISO/IEC 29500:2008 COR 1
Supports MCE
Inspects package of document (ZIP container) x
Validates ODF documents x


The only major difference (OOXML-wise) is really that office-o-tron validates against the core, base schemas of ISO/IEC 29500:2008 whereas the OOXML Validator validates against the set of schemas with the first set of approved corrigenda (COR1). Whenever a new set has been approved as either an amendment or a corrigendum, the schema sets will be updates accordingly.

I'll update this article with some of the details revealed during the creation of the validator - until then, have fun.


Final nail in the coffin for the "highlander approach"

On January 29th the Danish politicians finally got their acts together and did something about open document formats. After almost 3 years of debate and endless dragging of their feet - a consensus and agreement was finally made on that Friday.

The agreement made had this content:

For use in the public sector in Denmark, a document format can be used, if it is on the list of approved document formats. To get on the list, the document format must comply with these rules:

  • It has to be completely documented and be publicly available
  • It has to be implementable by anyone without financial-, political- and legal limitations on neither implementation nor utilization
  • It has to have been approved by an internationally recognised standardisation organisation, such as ISO, and standardised and maintained in an open forum with an transparent process.
  • It must be demonstrated that the standard can be implemented by everyone directly in its entirety on several platforms.

If you ask me, this list is pure rubbish. Apparently a deal was made on that Friday morning literally minutes before an open hearing in Parliament and this bears all signs of a job done in too much haste.

(of course, all this happened when I was away on family weekend, but as you can imagine the blog-sphere went crazy and twitter buzzed like a hive of bees with the gent's of "big blue" and "big red" taking swings at each other)

Devil in the details

The problem is that it is written all over it that this list will be taken very literally and we are going to continue to have to discuss stupid details with stupid people - instead of getting to work to start giving value to our customers.

The problems pertain to item 1) and item 4).

Item 1 is actually not that big of a deal, but it is an example of a requirement that cannot be verified. Because what does "completely documented mean"? Does it mean that a mere list of all elements and attributes is enough to give a "thumbs-up"? Does it mean that a single ocurrence of the phrase "... is application defined" provides automatic rejection? Now, I agree with the idea behind this, coz' shit has to be documented but this item should be removed or altered to provide real meaning.

And what about item 4) ?

Well the problem with this is that no document standard of today can be said to comply with this requirement - thereby making the list Ø. The only way a document format can be said to comply with this would be to have 2 independant applications, each claiming to be implementing the specification in "its entirety". And still we wouldn't be able to actually prove it. We would, at best, be able to show that with high likelyhood the applications do actually implement the specification "in its entirety".

Two to go ...

So that basically leaves us with two requirements. The only requirement we should think of adding would be "It has to be relevant in the market" ... ODA, anyone?

The silver lining

But do not fret - it is not all bad. No, because the agreement effectively puts the final nail in the coffin for the "there can be only one document format"-line of thought. The Danish parliament has has turned its back on any exclusivity with regards to document formats and has turned its focus to "open standards". This is no doubt a positive move, because now it doesn't make any sense any more to argue "which one is bigger (or, smaller)" or over who got to the playground first.

With this decision Denmark follows other countries like Norway, Belgium and Holland where the notion of "open standards" is also the center of thought - and who have also discarded the idea of "value can only come if we only have one document format". This is fantastic - and I applaud our politicians on making this decision - even though some of the details lacked consideration.


Moving towards OOXML(S)

Some time ago I wrote a bit about what Microsoft Office had managed to get into Microsoft Office 2010 CTP1 (or, I wrote about the stuff I had tested). As you might recall, the results were rather slim, so I wrote to Microsoft to hear, if that was really it. It has been the fear of many that Microsoft will never, ever care at all about the strict conformance clause of ISO/IEC 29500, and my tests clearly was a sign that they were right. Heck, some even mentioned that "the only choice for Microsoft is to avoid adding new BRM features in their OOXML files".

On the other hand I have always regarded big companies like Novell, IBM, ORACLE etc as rather simplistic in their development cycles - that they'll always choose the path of least resistance. Microsoft is in no different here, and moving towards strict and side-tracking nasty. legacy stuff like VML etc is clearly an attempt to make the developmentpath easier in the future.

The list I got was this:

File typeFeatureComment 
DOCX Ink Drawings Previously used VML, now uses DrawingML
Advarsel, gult trafiklys
XLSX Ink Drawings Previously used VML, now uses DrawingML Advarsel, gult trafiklys
PPTX Ink Drawings Previously used VML, now uses DrawingML Advarsel, gult trafiklys
DOCX Legacy Diagrams Previously used VML, now uses DrawingML Advarsel, gult trafiklys
XLSX Legacy Diagrams Previously used VML, now uses DrawingML Advarsel, gult trafiklys
PPTX Legacy Diagrams Previously used VML, now uses DrawingML Advarsel, gult trafiklys
DOCX Drawing Shapes Previously used VML, now uses DrawingML Succes, grønt trafiklys
DOCX Textboxes Previously used VML, now uses DrawingML Succes, grønt trafiklys
DOCX WordArt Previously used VML, now uses DrawingML Succes, grønt trafiklys
DOCX Groups Previously used VML, now uses DrawingML Advarsel, gult trafiklys
XLSX Form Controls Previously used VML, now uses DrawingML - except on "chart sheets" Advarsel, gult trafiklys
XLSX ActiveX Objects Previously used VML, now uses DrawingML Advarsel, gult trafiklys
PPTX ActiveX Objects Previously used VML, now uses DrawingML Advarsel, gult trafiklys
XLSX OLE Objects Previously used VML, now uses DrawingML Succes, grønt trafiklys
DOCX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Succes, grønt trafiklys
XLSX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Succes, grønt trafiklys
PPTX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Succes, grønt trafiklys
XLSX ISO-dates Can persist dates in ISO-8601 format and avoids the "evil" serial dates. Succes, grønt trafiklys

(The last four was addeed by me and didn't appear on the list from Microsoft)

Now, "someone" once wrote to me that you shouldn't make any decisions based on what Microsoft Office says they will do - you should wait until they actually do act. I couldn't agree more, so I tried to test the list I received. I have tested the lines marked with a green traffic light by first creating a document in Microsoft Office 2007 to verify the usage of e.g. VML and then I created the same document in Microsoft Office 2010 Beta [']. The lines marked with yellow traffic lights have not been tested by me, since I frankly don't have the Office-skills to create a file (what the hell is an "Ink Drawing, btw?). If anyone can test this, I'd be happy to update the list. Also, regarding the lines about ST_OnOff, I have tried to create files that would contain the "bad" On/Off-values, but I haven't succeeded in this. That is not the same as deterministically verifying that it cannot be done, so again - if you can create a file in Microsoft Office 2010 with the bad values, send it to me and I'll update the list.

So getting back to "don't trust Microsoft as far as you can throw them", this is in no way a definitive list. The list is based on Microsoft Office 2010 Beta, and much can happen until final RTM - both in the right direction with even more things being fixed, but also in the wrong direction with things being pulled off the list again (WinFS, anyone?). But for those of us not implementing complete Office suites but "merely" interacting with the ecosystem by generating files, this is undoubtly good news. Add to this that Microsoft confirmed a few TC-calls ago in WG4, that pending the current AMD1-ballot, Microsoft would add the new namespaces of strict files to the white-list of known namespaces in Office 2010. This effectively means that Microsoft Office will be able to load (some) strict files, and if you just happen to generate PPTX-files with embedded objects, you'll likely never again have to generate markup like this:

[code:xml]<w:object w:dxaOrig="15" w:dyaOrig="15">
        <v:stroke joinstyle="miter"/>
                <v:f eqn="if lineDrawn pixelLineWidth 0"/>
                <v:f eqn="sum @0 1 0"/>
                <v:f eqn="sum 0 0 @1"/>
                <v:f eqn="prod @2 1 2"/>
                <v:f eqn="prod @3 21600 pixelWidth"/>
                <v:f eqn="prod @3 21600 pixelHeight"/>
                <v:f eqn="sum @0 0 1"/>
                <v:f eqn="prod @6 1 2"/>
                <v:f eqn="prod @7 21600 pixelWidth"/>
                <v:f eqn="sum @8 21600 0"/>
                <v:f eqn="prod @7 21600 pixelHeight"/>
                <v:f eqn="sum @10 21600 0"/>
        <o:lock v:ext="edit" aspectratio="t"/>
        <v:imagedata r:id="rId4" o:title=""/>

['] I have tried, in vain, to get my hands on the latest pre-release edition of Microsoft Office 2010 ... so much for being a drone, when you can't get your hands on the latest bits Frown


Microsoft Office 2010 Beta, ODF and leap-year-bug

Some time ago I did some tests of Excel in Microsoft Office 2010 (CTP). The test was around OOXML - but test of ODF-support was missing.

One of the things ODF is missing but is in OOXML is the leap-year-bug ... although most of propably don't miss it all that much. The leap-year-bug is the good ol' Lotus 1-2-3 bug that treated 1900 as a leap year. As a consequence of that, calculations based on dates in the range from January 1st 1900 and February 28th 1900 with dates after this period will be off with one day.

Since Microsoft Office supports (a subset of) ODF, I thought it'd be fun to look at how Excel 2010 handles the leap-year-bug.

The first thing to do is to show how the leap-year-bug is handled by Excel:

So adding a day to February 28th 1900 will result in the non-existing date February 29th 1900, and if you subtract the dates February 27th 1900 and March 2nd 1900 (you'd expect the a value of 3) you actually get a value of 4.

So what will happen if you save this spreadsheet in ODF-format and open it again in Excel? You might expect that - since it was round-tripped through a format not supporting the leap-year-bug, the calculations would now be correct.

... but you'd be wrong. The result is excatly the same:

As I was, you might be wondering how the hell that was possible. But a simple inspection of the markup generated by Microsoft Excel 2010 reveals the answer:







A quick-and-dirty conclusion to this would be that Microsoft Excel 2010 violates not only ODF but also xsd:datetime, since February 29th is not a valid xsd:datetime. However, an inspection of ODF reveals that this is not the case. Microsoft Office claims conformance to ODF 1.1. and ODF 1.1 states the following about the value-space of the attribute "office:date-value" (Section 16.1 , p 702) :

A dateOrDateTime value is essentially an [xmlschema-2] date and time value with an optional time component. In other words, it may contain either a date, or a date and time value.

So strictly (*giggle*) speaking, Microsoft Office 2010 does not violate ODF 1.1 .

However - specifying an invalid date in an attribute that might contain xsd:dates is not very smart, dear Microsoft. Those of us wanting to use standard libraries to process the content of an ODF-document will likely get unpredictable results when trying to parse this invalid date. Heck, even .Net's DateTime.Parse()-method throws an exception when trying to parse this value.

Also, ODF TC has tightened up the prose in ODF 1.2 and it is now:

A dateOrDateTime value is either an [xmlschema-2] date value or an [xmlschema-2] dateTime value.

So Microsoft Office 2010 might not violate it now - but it will when ODF 1.2 comes out.

Extending ODF

Microsoft could always opt for extending ODF using the extension mechanism (to add elements and attributes using a foreign namespace). So Microsoft could chose to add their own attribute to the <office:spreadsheet>-element saying something like

[code:xml]<office:spreadsheet mso:EnableLeapYear="true"/>[/code]

The problem with this approach is that is comes into conflict with the new conformance clauses of ODF where a clear distinction between "normal" documents and "extended" documents is made. Procurement-wise it is a big no-no only to support extended documents (look what happened in Denmark!) and Microsoft risks that some government somewhere decides not to use Microsoft Office due to lack of conformance to the "normal" conformance clause of ODF 1.2.

Thus, Microsoft needs to find another solution ...

Configuration to the rescue!

Luckily for Microsoft (and we all know how picky they are wrt "preserving functionality" etc), there is a fully compliant way out of this while still preserving the leap-year-bug in spreadsheets - regardless of persistance format.

As you probably know that so-called config-item-sets are a gold-mine of endless possibilities. Originally (until ODF 1.1) the purpose of these elements and attributes were to store application specific settings, like (and this is a quote from ODF 1.1) "document settings, for example a default printer or view settings, for example zoom level". In ODF 1.2, all bets are off and there are no restrictions to the usage of the elements. The config-item-set elements were never meant to be an extension mechanism (by ODF TC co-chair from Sun/ORACLE - go figure), but OpenOffice.org uses them extensively - in fact, when creating a "blank" text-document, spreadsheet or presentation in OpenOffice.org, a total of 228 (76 for text documents, 66 for spreadsheets and 86 for presentations) settings (of which non are described in ODF) are defined in the the settings.xml-file of the packages. Somehow ODF TC has not found it necessary to include usage of config-item-sets in the "extended conformance clause", so a document can claim 100% conformance to ODF 1.2 "normal documents" while throwing dozens of settings into config-item-set elements. So the solution to claim conformance to ODF while enabling the leap-year-bug is simply

[code:xml]<config:config-item-set  config:name="mso:spreadsheet-settings">






This should be combined with this markup for the specific cell







(and you don't really need the bit I have commented out).

I don't know about you, but I find this just darn right fantastic!


I hate automatic spam filtering

Every time you use a tool, you make a (conscious) decision to trust that tool to do what it says. This is true for closed source software and open source software alike (especially if you don't have the skills to plow through the source of whatever tool you use).

Most people use a generic blogging platform for their pieces of information that they wish to share with the world. Popular tools are Blogger by Google or WordPress. These tools often come with spam filtering and all sorts of other features intended to make your life as blog-owner more easy.

The catch you sometimes find yourself in, is that they don't work as they should. Spam is let through and non-spam is deleted. That can be a real pain in the ass - especially if you are not notified by this.


on Rob's blog I have been writing a bit with him and Luc. Rob moderates his blog (as most people, including me). We kind of have to - because blog spam is annoying and it disturbs not only the owner, but also everyone participating in the conversation if mail notification is enabled.

(as a funny side-note, I once promised never, ever to moderate my blog ... well, colour me stupid)


I think Rob has become a victim of too agressive spam filtering - it sems that two of my posts have been lost in cyber space - even though I have tried to repost them several times. Maybe it's my (grammer-error prone) English (or, "Dænglish", as we call it here) tricking something - I dunno.

Luckily I have an archive of the stuff I write (because Rob's blog is certainly not the only one suffering from this)

So to preserve our common digital legacy, here are my two posts that were erronously caught:


Hi Rob,

<i>I suppose at some point Microsoft will approach us with a list of suggested additions to OOXML. That is the prerogative of any vendor or any national body.</i>
So if I coined this differently to e.g.

"I suppose at some point ECMA will approach us with a list of suggested additions to OOXML. That is the prerogative of any liaison or any national body."

... that would make your day?

I'd be happy to make that correction - just say the word.

"In any case, aside from being inaccurate, your comment is off topic. No more, please."

Well, it is your blog, so feel free to censor whatever you want. My point was to confirm parts of what Luc was saying - that indeed some of the extensions Microsoft has made to OOXML will likely be added to the standard.

How that can be OT is beyond me.


Hi Luc,

<i>I agree, but then we must have a formal commitment by Microsoft that they will implement ISO29500 Strict within the coming 12 to 24 months latest.</i>

Yes, and I'd personally encourage them to make such a statement. But it is really out of scope of WG4 to do anything about it.

<i>I would recommend everybody to not invest one cent of their money or one second of their time improving it: it is lost time and money.</i>

Well, there are several tools one can use to push Microsoft to implement S - "not participating" is perhaps the least effective of those.


Excel 2010 (Microsoft Office 2010 CTP TO-do list (01)

I have been looking at how Excel 2010 has implemented various features using ISO/IEC 29500-4:2008 - also known as "OOXML Transitional".


ISO/IEC comes in two variants, a "transitional" (T) and a "strict (S). Transitional is the one containing all the legacy stuff such as VML, legacy digest algorithms, leap year bug etc. S does not contain these things and is therefore considered "more pure". T is practically identical to the document format submitted to ISO/IEC by ECMA - also known as "ECMA-376". A document conforming to ECMA-376 will therefore also conform to T, since the schemas are practically identical.

T is currently a superset of S, which means that "T includes everything in S". This has the effect that within a T document, a vendor can take advantage of new features in S while still being in "the comfort zone of T". T is therefore considered by some as providing a “graceful migration path to S”, meaning that vendors can change their existing T-compliant code, on a case-by-case basis, to gradually support the features of S instead of those of T.

Update 2009-11-17: By reading in the spec today, I realised that I was wrong in saying that the leap-year bug was not in S - indeed it is. I appologize for the confusion. Thank you, Jens Hørlück for pointing this out to me.

As you know, Microsoft is claiming "IS29500 compliance" for their latest incarnation of Microsoft Office. OOXML is a big and complex document format, and even though OOXML was not published until waaaay after "feature freeze" of Office 2010, I thought it'd be interesting to take a look at how Excel has reached compliance.

When looking at the markup generated by Excel 2010, it is important to remember the background information above, since you'd expect to find traces of this "graceful migration" in the markup already.

Conformance class

The DIS-process added a new attribute to the root elements of the documents. This element would specify the conformance level of the document. As to not invalidate existing documents, the default value for this attribute was “transitional”. The other possible value is "strict".

So when creating a new spreadsheet using Excel 2010, which markup does it create?

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <sheet name="Sheet1" sheetId="1" r:id="rId1”>

(condensed for easier reading)

The above is the markup for the root element of a SpreadsheetML spreadsheet.

So the conformance class attribute is omitted – making the document a T-document. I know that it is basically nonsense to add an attribute with the default value to implement support for this behavior, but for those of us that like looking at markup in text editors, I’d recommend Microsoft to include the conformanceClass-attribute with the appropriate value. As it is a no-brainer to implement for a consumer – it should be a no-brainer to implement for a producer.

Also – the existence of the conformanceClass-attribute is the only (or, best, anyway) indicator that a T document was created according to the schemas of ISO/IEC 29500 and not those of ECMA-376.

(and yes, I know that they are practically identical, but adding the attribute would be a clear indication that Microsoft wishes to produce markup according to ISO/IEC 29500 and not ECMA-376 1st Ed.)

Microsoft, please fix!


Support for ISO-dates in SpreadsheetML just might be the biggest issue we faced in the DIS-process. It was hugely controversial and as such, a large amount of time was used at the BRM to figure out a solution.

Excel 2010 allows a user to manually chose the option to persist dates in ISO-8601 format. This is done through the "Settings-dialogue" as

So let us see what Excel 2010 does with dates. I opened the application and in a cell I wrote the date 28-02-1900. I then unzipped the XLSX-file and found this markup in the Worksheet-part:


  <row r="1" >
    <c r="A1" t="d">

So the date is actually persisted in the file using ISO-8601 notation and the cell is correctly typed with a t=”d”-declaration.

Now, depending on your point of view, this is a good thing, because it takes us further from the hell-hole I have previously written about with respect to “date-typing-of-numbers” in ECMA-376.

Thank you, Microsoft – good for you!


Leap-year bug

So … with the “Save-as-ISO- 8601-dates”-setting to “yes, please” as previously described, I added a formula to the spreadsheet above in cell B1. The formula simply was “=A1+1”.

Anyone up for guessing the result?

Well, the result was this:

That is funny because of two things:

  1. I set the “Save-as-ISO Dates”-setting to “yes”, so I was not expecting the leap-year-bug to still be used.
  2. I am pretty sure that the date-representation “1900-02-29” is not a valid ISO-date.

So I looked at bit at the markup generated for this spreadsheet.

The result was:


  <row r="1" >
    <c r="A1" t="d">
    <c r="B1">


Ahem … dear Microsoft … WTF?

So you have persisted the first date as an ISO-date,

but the result of calculating a value based on it – you persist the result as a serial date? In which universe would this make sense?

(I should note that whenever doing calculations resulting in dates not being invalid, the date is correctly persisted by Excel 2010 as an ISO-8601 date)

But – as I was writing this I thought “wasn’t there something about an at-BRM-added attribute called “dateCompatibility”? This was the attribute used to determine if the date-system used should support the leap-year-bug or not. And truth be told, the attribute is not used – making the dateCompatibility default to “true” – hence honoring the leap-year-bug.

But – this could simply be a glitch of Excel2010, so I added the attribute myself to force Excel2010 to discard date compatibility. I’d then suspect the result of adding 1 to the date of 1900-02-28 would be 1900-03-01.

The markup was this:

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook dateCompatibility="false">
    <sheet name="Sheet1" sheetId="1" r:id="rId1”>

Result: still 1900-02-29

I this calls for a few conclusive words before I move on:

Microsoft, I think you should stop using ISO-8601 dates in SpreadsheetML in T because of the arguments provided by my fellow WG4-expert Gareth Horton. But if you insist on using ISO-8601-dates in SpreadsheetML-T, you should do it right,. If you ask me, a user manually choosing to have Excel2010 use ISO-8601 dates, is a clear indication that the user does not want to deal with the leap-year-bug. The user has even decided to continue despite your warning that he or she might lose precision. So when a user performs this specific action, set the damn dateCompatibility-attribute to “false” and persist ALL dates as ISO-8601 dates.

You have a golden opportunity here to ditch the leap-year-bug for all new documents. If you act wisely and add the conformance attribute (so consuming applications can distinguish the files from ECMA-files) and setting the dateCompatibility-attribute to “false” when persisting dates as ISO-8601 dates, you’d have done really, really good. You have the by far biggest implementation of OOXML – so the vendors will follow your lead. Any application adding support for ISO-8601 dates in transitional documents in their existing ECMA-376-compliant code, will be able to kno

w what to do - and they'll do it the way you do it.

Now, I know you would like to demonstrate to all of us that you want to go towards “the S way”, but you need to do it right. Mixing ISO-8601 dates with serial dates simply to be able to maintain the leap-year-bug is not the right thing to do.

I have absolutely no idea of how the internals

of either the calculation instructions in Excel works – or even the Excel-team itself, but the way you have “added” ISO-8601 dates indicates to me, that you have changed barely anything except for the persistance-mechanism for dates.

I’m sure you’d argue that there is no longer time to change the inner workings of Excel2010, and you might very well be right about that. If that is indeed the case, I suggest you remove the option of saving dates as ISO-8601-dates from Excel2010 as soon as possible to avoid doing any more harm to the ecosystem around OOXML that we both share deep concerns for.


At the BRM the biggest pile of things added to OOXML was where a feature was previously only possible with the use of VML. Of these include adding comments to cells in spreadsheets. The comments themselves are stored in a “Comment part” of the OPC-package, but the display of the comment was done using VML. Luckily, at the BRM markup was added to allow DrawingML to be used instead.

So I used the spreadsheet from before and added a comment in cell B2 (the one with the crappy leap-year-bug result)


Anyone wanna guess if this comment is displayed using VML or DrawingML?

Yes, you guessed it … somewhere in the back of the document is a VML-fragment containing the “box” containing my comment.

[code:xml]<xml xmlns:v="urn:schemas-microsoft-com:vml"

 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1"/>

 </o:shapelayout><v:shapetype id="_x0000_t202" coordsize="21600,21600" o:spt="202"
  <v:stroke joinstyle="miter"/>
  <v:path gradientshapeok="t" o:connecttype="rect"/>
 </v:shapetype><v:shape id="_x0000_s1025" type="#_x0000_t202" style='position:absolute;

  visibility:hidden;mso-wrap-style:tight' fillcolor="#ffffe1" o:insetmode="auto">
  <v:fill color2="#ffffe1"/>
  <v:shadow on="t" color="black" obscured="t"/>
  <v:path o:connecttype="none"/>

  <v:textbox style='mso-direction-alt:auto'>
   <div style='text-align:left'></div>
  <x:ClientData ObjectType="Note">


    2, 15, 0, 2, 4, 55, 3, 16</x:Anchor>


If this was a tweet on Twitter, I’d finish it off using the tag #fail.

Document protection

And finally – what about document protection? “Document protection” is the mechanism in OOXML to allow applications to open documents in “read-only-mode”, to protect worksheets from editing etc. These are not really protecting the document as such, because a hash of the password is simply saved – but everything is still in clear text. Document protection was one of the areas where OOXML was fundamentally changed, allowing more robust algorithms to be used – and not only the weak “legacy-algorithm” previously supported by Office 2007.

So in Excel2010 I chose to protect the active sheet and looked at the markup again. Again, I was hoping to find some of the new markup, but again I was let down. Excel2010 still uses the weak algorithm and it still uses the legacy markup to persist it.

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>


  <sheetProtection password="8985" sheet="1" objects="1" scenarios="1" />

This is not only an annoyance for those not working on the Windows-platform – it is also an annoyance for those of us working with e.g. .Net development. If Excel2010 used SHA1, SHA256 or one of the other modern algorithms supported directly by .Net framework, we’d be able to use it straight out of the box. Now all of us still need to add additional code/assemblies to our code – simply to protect a whorksheet.

At the end …

I must admit that I am a bit disappointed. I know that barely a year has gone since the publication of ISO/IEC 29500, but I had expected more of the things we discussed at the BRM to be implemented. At the very least I was expecting some of the more “easy ones” to have been implemented – like the conformanceClass-attribute, VML in comments and document protection. It would have been a nice token of “good faith”.

Now, we are basically left with nothing. Do note, however, that the observations above are by no means a comprehensive test. These represent simply a few of the thoughts on the top of my head while writing this.

Next time: WordpressingML

PS: I have been trying to find a list of the stuff from the BRM that was implemented in Office 2010 (like ISO-8601 dates), but I have been unsuccessful. So if anyone can point me towards such a list, I'd be happy to include those results above.

Mummy, Tom from school is an idiot

Back in the day when I started blogging, I showed a friend of mine a blog post I had written. He noticed the large number of links to other bloggers in the piece, and he asked me (tongue-in-cheek) if those links were some kind of geeky way of saying "I love you" to the people I linked to.

Well, to some extend, he was correct. But that is not the only way people communicate varm feelings to each other - without saying it directly. If you look at kids in (pre)school, when they start to have feelings for the opposite sex, the message to everyone else is typically "No, I don't like Tom at all ... he's a total idiot!". So parents (I presume) quickly learn that sometimes name-calling is a token of love and affection.

So ... Alex and I have made a little game. As you might now, name-calling is quite the way to do stuff - at least if you are in some way opposed to this whole "oh oh xml thingy". Classics are "drone" and "$hill" and more exotic examples are "nazi" or "saboteur". The current score is listed in the table at the right of this page.

The rules are quite simple:

  • Each mention of either of us with a special name scores one point
  • If one of us are mentioned with name in the title of the article/post, each score for each mention in the article is doubled

The prize is beer in Paris for the next WG4-meeting in the beginning of December.

This is all fine ... except for one thing ... I am loosing miserably. I had a good thing going for a few months (and I even was so cockey to suggest a wager on beer, because I "knew" I was winning"). But somehow Mr. Fox has gained momentum and with the latest bashing of Alex, we are now tied.

So Roy, Pam ... give me a hand here ... think about all the frustration I might have caused you (and you families) and put it down in writing ... otherwise I'll be the one buying beer in Paris and Alex will be winning ... and none of us want that to happen, do we?

PS: Roy, I love you too ...