a 'mooh' point

clearly an IBM drone

Moving towards OOXML(S) (update)

Some time ago I wrote about some of the enhancements of Microsoft Office in terms of how far they have made it in implementing the content of the conformance profile "Strict" or "<S>". As you might recall, I made a run-through of a list of feature areas and marked each with either a green, yellow or red traffic light. There were no red traffic lights, but some areas had a yellow marking. These were

  • "ink"
  • "legacy diagrams"
  • "groups"
  • "form controls"
  • "activeX objects".

These document types previously used VML as containing frame etc, but Microsoft Office 2010 was now supposedly using DrawingML for these. The reason for them being yellow and not red was that I did simply not know how to test these things - either because of poor Microsoft Office skills or lack of proper hardware ("ink" is used on tablet PC's and I don't have one of those at hand).

Stockholm plug-fest

When WG4 met in Stockholm a couple of months ago, I got a chance to take a look at the documents I couldn't create myself. The cool thing about participating in these meetings is that there is an abundance of different hardware and software on the laptops of the delegates, so after one of the sessions a few of us had our own little "Microsoft Office OOXML <S> interop plug-fest" and I finally had a chance to get my hands on those files.

I could have simply updated the previous article with the new information, but a couple of interesting thing emerged that made me write up a new piece.

First, the results are this:

File typeFeatureComment 
DOCX Ink Drawings Previously used VML, now uses DrawingML
Success, green traffic light
XLSX Ink Drawings Previously used VML, now uses DrawingML Success, green traffic light
PPTX Ink Drawings Previously used VML, now uses DrawingML Success, green traffic light
DOCX Legacy Diagrams Previously used VML, now uses DrawingML Success, green traffic light
XLSX Legacy Diagrams Previously used VML, now uses DrawingML Success, green traffic light
PPTX Legacy Diagrams Previously used VML, now uses DrawingML Success, green traffic light
DOCX Drawing Shapes Previously used VML, now uses DrawingML Success, green traffic light
DOCX Textboxes Previously used VML, now uses DrawingML Success, green traffic light
DOCX WordArt Previously used VML, now uses DrawingML Success, green traffic light
DOCX Groups Previously used VML, now uses DrawingML Success, green traffic light
XLSX Form Controls Previously used VML, now uses DrawingML - except on "chart sheets" Success, green traffic light
XLSX ActiveX Objects Previously used VML, now uses DrawingML Success, green traffic light
PPTX ActiveX Objects Previously used VML, now uses DrawingML Success, green traffic light
XLSX OLE Objects Previously used VML, now uses DrawingML Success, green traffic light
DOCX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Success, green traffic light
XLSX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Success, green traffic light
PPTX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Success, green traffic light
XLSX ISO-dates Can persist dates in ISO-8601 format and avoids the "evil" serial dates. Failure, red traffic light

The trained eye will notice that all the yellow lights have been replaced by green lights - In other words, the list above clearly shows that even though Microsoft Office 2010 does not write <S>, the developers in Redmond have clearly made some significant progress.

There are a couple of interesting points about the technicalities of the files I looked at.

Predicting the future is difficult

The files containing "legacy diagrams" stand out, because of the way Microsoft Office 2010 breaks compatibility with e.g. Microsoft Office 2003 and earlier versions. The thing is - when loading a PPT-file with a legacy diagram from e.g. Microsoft Office 2003 the diagram will be in VML-format. When it is loaded in Microsoft Office 2010, modified and saved again - it won't save the diagram in VML. It just won't. The diagram will be saved all right - but now using DrawingML instead of VML. So this is essentially a case where interoperability with this "legacy" application is hurt since Microsoft Office 2003 has no idea what to do with the DrawingML it loads.

MCE to the rescue

For all the other files, MCE once again steps up.

If we look at the file containing the ink notations, (some of) the markup will look like this:

[code:xml]<mc:AlternateContent>
  <mc:Choice Requires="wpi">
    <w:drawing>
      <wp:anchor>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
        <a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingInk">
      </wp:anchor>
    </w:drawing>
  </mc:Choice>
  <mc:Fallback>
    <w:pict>
      <v:shapetype >
        <v:stroke joinstyle="miter" />
        <v:formulas>
          <v:f eqn="if lineDrawn pixelLineWidth 0" />
        </v:formulas>
      </v:shapetype>
      <v:shape id="Ink 15">
        <v:imagedata r:id="rId6" o:title="" />
      </v:shape>
    </w:pict>
  </mc:Fallback>
</mc:AlternateContent>[/code]

(I have modified and trimmed the real XML for easier reading - especially the VML was really, really ugly)

So with this approach you can actually have the best of two worlds - the new and the old without losing information. That is - if you know MCE, of course. This again shows what a great tool alternating content blocks (ACB) of MCE are for this task. It allows you to innovate while still making it possible to ensure some sort of compatibility with earlier programs that did not know of the new technology.

And what about them dates?

The even more trained eye will have noticed that the green traffic light for ISO-dates in SpreadsheetML has been downgraded to a flashing red traffic sign.

The first test I did with Microsoft Office 2010 CTP1 had a small check-box in the "backstage" area that would allow dates in spreadsheets to be persisted in ISO-8601 format. With RTM of Microsoft Office 2010 this check-box is gone so we are now back to using serial dates again.

It would be easy to hit on Microsoft for removing this check-box, and I am sure that many will. But the truth is that the removal of this feature is due to activities in WG4 where we maintain OOXML.

As some of you may recall, the introduction of ISO-dates in OOXML was done in Geneva at the BRM in those hectic days we spent there. The trouble with introducing the ISO-dates in OOXML was that it looked really, really good on paper - but it sucked in real life.

The reason is that we "forgot" to change namespace name for ISO OOXML-files so documents conforming to ISO OOXML<T> share namespace name with documents conforming to ECMA 376. This has has enormous consequences for spreadsheets and those applications designed to support ECMA-376 but not necessarily ISO OOXML. At the second F2F of WG4 in Prague in 2009, we had a demonstration of how bad is was - not a single application would interpret these new dates correctly and - what was perhaps even worse - they did not display any warnings to the user.

We have been discussing this a lot in WG4, and in the end we decided to start the work to remove usage of ISO-dates from Part 4. This correction of a BRM decision was not easy to agree on (AFAIR it has not been finally approved as of yet) and the removal of the "Save-as ISO-dates" feature in Microsoft Office 2010 is propably in anticipation of this pending removal of ISO-dates from <T>. I think it might be important to note that this removal was not due to "pleasing Microsoft". In fact - they had already implemented support for this in CTP1. We are removing ISO-dates from Part 4 due to problems with everybody else.

I always like to give credit where credit is due, and I think this is one of those cases. Microsoft has clearly worked with - and listened to - the standardisation community and has chosen to remove a feature they had already implemented.

So what's next?

Well, Doug Mahugh recently wrote about the approach of Microsoft when dealing with OOXML<S>. Amongst other things he wrote that Microsoft Office 2010 will have read-support for OOXML<S> but that "a small number of optional features" will still be lost (that's just new-speak meaning "we haven't implemented support for all of Part 1"). I asked him what that list consisted of, and Doug said they'd provide anwsers when they have them. I hope that list will come soon.

As I have mentioned earlier, CIBER Denmark A/S (the company I work for) is not in the "productivity-suite-business" - but we develop solutions that work with these suites be that Microsoft Office, OpenOffice.org, iWork or others. Having read-support for OOXML<S> in Microsoft Office 2010 helps us a great deal, because we can now start trimming our code to target OOXML<S> instead of OOXML<T>. We think that adds great value to us and our customers. But we need a definitive list of the areas where we can expect Microsoft Office 2010 to ignore our markup. If we can't have that we are forced to go the safe-route and keep producing OOXML<T>-files and we'd hate to do that. But without a list from Microsoft, we feel that our hands are tied behind our backs.

So please, Microsoft - give us the list ASAP. Otherwise the uncertainty of what Microsoft Office will ignore is to great a risk for us to start producing strict files and your read-support for OOXML<S> is more or less useless to us.

Moving towards OOXML(S)

Some time ago I wrote a bit about what Microsoft Office had managed to get into Microsoft Office 2010 CTP1 (or, I wrote about the stuff I had tested). As you might recall, the results were rather slim, so I wrote to Microsoft to hear, if that was really it. It has been the fear of many that Microsoft will never, ever care at all about the strict conformance clause of ISO/IEC 29500, and my tests clearly was a sign that they were right. Heck, some even mentioned that "the only choice for Microsoft is to avoid adding new BRM features in their OOXML files".

On the other hand I have always regarded big companies like Novell, IBM, ORACLE etc as rather simplistic in their development cycles - that they'll always choose the path of least resistance. Microsoft is in no different here, and moving towards strict and side-tracking nasty. legacy stuff like VML etc is clearly an attempt to make the developmentpath easier in the future.

The list I got was this:

File typeFeatureComment 
DOCX Ink Drawings Previously used VML, now uses DrawingML
Advarsel, gult trafiklys
XLSX Ink Drawings Previously used VML, now uses DrawingML Advarsel, gult trafiklys
PPTX Ink Drawings Previously used VML, now uses DrawingML Advarsel, gult trafiklys
DOCX Legacy Diagrams Previously used VML, now uses DrawingML Advarsel, gult trafiklys
XLSX Legacy Diagrams Previously used VML, now uses DrawingML Advarsel, gult trafiklys
PPTX Legacy Diagrams Previously used VML, now uses DrawingML Advarsel, gult trafiklys
DOCX Drawing Shapes Previously used VML, now uses DrawingML Succes, grønt trafiklys
DOCX Textboxes Previously used VML, now uses DrawingML Succes, grønt trafiklys
DOCX WordArt Previously used VML, now uses DrawingML Succes, grønt trafiklys
DOCX Groups Previously used VML, now uses DrawingML Advarsel, gult trafiklys
XLSX Form Controls Previously used VML, now uses DrawingML - except on "chart sheets" Advarsel, gult trafiklys
XLSX ActiveX Objects Previously used VML, now uses DrawingML Advarsel, gult trafiklys
PPTX ActiveX Objects Previously used VML, now uses DrawingML Advarsel, gult trafiklys
XLSX OLE Objects Previously used VML, now uses DrawingML Succes, grønt trafiklys
DOCX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Succes, grønt trafiklys
XLSX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Succes, grønt trafiklys
PPTX ST_OnOff Uses the new ISO-approved simple type without the values "on" and "off" Succes, grønt trafiklys
XLSX ISO-dates Can persist dates in ISO-8601 format and avoids the "evil" serial dates. Succes, grønt trafiklys

(The last four was addeed by me and didn't appear on the list from Microsoft)

Now, "someone" once wrote to me that you shouldn't make any decisions based on what Microsoft Office says they will do - you should wait until they actually do act. I couldn't agree more, so I tried to test the list I received. I have tested the lines marked with a green traffic light by first creating a document in Microsoft Office 2007 to verify the usage of e.g. VML and then I created the same document in Microsoft Office 2010 Beta [']. The lines marked with yellow traffic lights have not been tested by me, since I frankly don't have the Office-skills to create a file (what the hell is an "Ink Drawing, btw?). If anyone can test this, I'd be happy to update the list. Also, regarding the lines about ST_OnOff, I have tried to create files that would contain the "bad" On/Off-values, but I haven't succeeded in this. That is not the same as deterministically verifying that it cannot be done, so again - if you can create a file in Microsoft Office 2010 with the bad values, send it to me and I'll update the list.

So getting back to "don't trust Microsoft as far as you can throw them", this is in no way a definitive list. The list is based on Microsoft Office 2010 Beta, and much can happen until final RTM - both in the right direction with even more things being fixed, but also in the wrong direction with things being pulled off the list again (WinFS, anyone?). But for those of us not implementing complete Office suites but "merely" interacting with the ecosystem by generating files, this is undoubtly good news. Add to this that Microsoft confirmed a few TC-calls ago in WG4, that pending the current AMD1-ballot, Microsoft would add the new namespaces of strict files to the white-list of known namespaces in Office 2010. This effectively means that Microsoft Office will be able to load (some) strict files, and if you just happen to generate PPTX-files with embedded objects, you'll likely never again have to generate markup like this:

[code:xml]<w:object w:dxaOrig="15" w:dyaOrig="15">
    <v:shapetype
        id="_x0000_t75"
        coordsize="21600,21600"
        o:spt="75"
        o:preferrelative="t"
        path="m@4@5l@4@11@9@11@9@5xe"
        filled="f"
        stroked="f">
        <v:stroke joinstyle="miter"/>
            <v:formulas>
                <v:f eqn="if lineDrawn pixelLineWidth 0"/>
                <v:f eqn="sum @0 1 0"/>
                <v:f eqn="sum 0 0 @1"/>
                <v:f eqn="prod @2 1 2"/>
                <v:f eqn="prod @3 21600 pixelWidth"/>
                <v:f eqn="prod @3 21600 pixelHeight"/>
                <v:f eqn="sum @0 0 1"/>
                <v:f eqn="prod @6 1 2"/>
                <v:f eqn="prod @7 21600 pixelWidth"/>
                <v:f eqn="sum @8 21600 0"/>
                <v:f eqn="prod @7 21600 pixelHeight"/>
                <v:f eqn="sum @10 21600 0"/>
            </v:formulas>
            <v:path
                o:extrusionok="f"
                gradientshapeok="t"
                o:connecttype="rect"/>
        <o:lock v:ext="edit" aspectratio="t"/>
    </v:shapetype>
    <v:shape
        id="_x0000_i1025"
        type="#_x0000_t75"
        style="width:.75pt;height:.75pt"
        o:ole="">
        <v:imagedata r:id="rId4" o:title=""/>
    </v:shape>
    <o:OLEObject
        Type="Embed"
        ProgID="opendocument.WriterDocument.1"
        ShapeID="_x0000_i1025"
        DrawAspect="Content"
        ObjectID="_1327745060"
        r:id="rId5"/>
</w:object>[/code]

['] I have tried, in vain, to get my hands on the latest pre-release edition of Microsoft Office 2010 ... so much for being a drone, when you can't get your hands on the latest bits Frown

 

Microsoft Office 2010 Beta, ODF and leap-year-bug

Some time ago I did some tests of Excel in Microsoft Office 2010 (CTP). The test was around OOXML - but test of ODF-support was missing.

One of the things ODF is missing but is in OOXML is the leap-year-bug ... although most of propably don't miss it all that much. The leap-year-bug is the good ol' Lotus 1-2-3 bug that treated 1900 as a leap year. As a consequence of that, calculations based on dates in the range from January 1st 1900 and February 28th 1900 with dates after this period will be off with one day.

Since Microsoft Office supports (a subset of) ODF, I thought it'd be fun to look at how Excel 2010 handles the leap-year-bug.

The first thing to do is to show how the leap-year-bug is handled by Excel:

So adding a day to February 28th 1900 will result in the non-existing date February 29th 1900, and if you subtract the dates February 27th 1900 and March 2nd 1900 (you'd expect the a value of 3) you actually get a value of 4.

So what will happen if you save this spreadsheet in ODF-format and open it again in Excel? You might expect that - since it was round-tripped through a format not supporting the leap-year-bug, the calculations would now be correct.

... but you'd be wrong. The result is excatly the same:

As I was, you might be wondering how the hell that was possible. But a simple inspection of the markup generated by Microsoft Excel 2010 reveals the answer:

[code:xml]<table:table-cell

  office:value-type="date"

  office:date-value="1900-02-29T00:00:00"

  table:formula="msoxl:=A2+1"

  >
  <text:p>29-02-1900</text:p>

  </table:table-cell>[/code]

A quick-and-dirty conclusion to this would be that Microsoft Excel 2010 violates not only ODF but also xsd:datetime, since February 29th is not a valid xsd:datetime. However, an inspection of ODF reveals that this is not the case. Microsoft Office claims conformance to ODF 1.1. and ODF 1.1 states the following about the value-space of the attribute "office:date-value" (Section 16.1 , p 702) :

A dateOrDateTime value is essentially an [xmlschema-2] date and time value with an optional time component. In other words, it may contain either a date, or a date and time value.

So strictly (*giggle*) speaking, Microsoft Office 2010 does not violate ODF 1.1 .

However - specifying an invalid date in an attribute that might contain xsd:dates is not very smart, dear Microsoft. Those of us wanting to use standard libraries to process the content of an ODF-document will likely get unpredictable results when trying to parse this invalid date. Heck, even .Net's DateTime.Parse()-method throws an exception when trying to parse this value.

Also, ODF TC has tightened up the prose in ODF 1.2 and it is now:

A dateOrDateTime value is either an [xmlschema-2] date value or an [xmlschema-2] dateTime value.

So Microsoft Office 2010 might not violate it now - but it will when ODF 1.2 comes out.

Extending ODF

Microsoft could always opt for extending ODF using the extension mechanism (to add elements and attributes using a foreign namespace). So Microsoft could chose to add their own attribute to the <office:spreadsheet>-element saying something like

[code:xml]<office:spreadsheet mso:EnableLeapYear="true"/>[/code]

The problem with this approach is that is comes into conflict with the new conformance clauses of ODF where a clear distinction between "normal" documents and "extended" documents is made. Procurement-wise it is a big no-no only to support extended documents (look what happened in Denmark!) and Microsoft risks that some government somewhere decides not to use Microsoft Office due to lack of conformance to the "normal" conformance clause of ODF 1.2.

Thus, Microsoft needs to find another solution ...

Configuration to the rescue!

Luckily for Microsoft (and we all know how picky they are wrt "preserving functionality" etc), there is a fully compliant way out of this while still preserving the leap-year-bug in spreadsheets - regardless of persistance format.

As you probably know that so-called config-item-sets are a gold-mine of endless possibilities. Originally (until ODF 1.1) the purpose of these elements and attributes were to store application specific settings, like (and this is a quote from ODF 1.1) "document settings, for example a default printer or view settings, for example zoom level". In ODF 1.2, all bets are off and there are no restrictions to the usage of the elements. The config-item-set elements were never meant to be an extension mechanism (by ODF TC co-chair from Sun/ORACLE - go figure), but OpenOffice.org uses them extensively - in fact, when creating a "blank" text-document, spreadsheet or presentation in OpenOffice.org, a total of 228 (76 for text documents, 66 for spreadsheets and 86 for presentations) settings (of which non are described in ODF) are defined in the the settings.xml-file of the packages. Somehow ODF TC has not found it necessary to include usage of config-item-sets in the "extended conformance clause", so a document can claim 100% conformance to ODF 1.2 "normal documents" while throwing dozens of settings into config-item-set elements. So the solution to claim conformance to ODF while enabling the leap-year-bug is simply

[code:xml]<config:config-item-set  config:name="mso:spreadsheet-settings">
  <config:config-item

    config:name="EnableLeapYearBug"

    config:type="boolean"

  >

    true

  </config:config-item>
</config:config-item-set>[/code]

This should be combined with this markup for the specific cell

[code:xml]<table:table-cell

  office:value-type="float"

  office:-value="60"

  table:formula="msoxl:=A2+1"

  >
  <!--<text:p>29-02-1900</text:p>-->

  </table:table-cell>[/code]

(and you don't really need the bit I have commented out).

I don't know about you, but I find this just darn right fantastic!

Smile

Excel 2010 (Microsoft Office 2010 CTP TO-do list (01)

I have been looking at how Excel 2010 has implemented various features using ISO/IEC 29500-4:2008 - also known as "OOXML Transitional".

Background:

ISO/IEC comes in two variants, a "transitional" (T) and a "strict (S). Transitional is the one containing all the legacy stuff such as VML, legacy digest algorithms, leap year bug etc. S does not contain these things and is therefore considered "more pure". T is practically identical to the document format submitted to ISO/IEC by ECMA - also known as "ECMA-376". A document conforming to ECMA-376 will therefore also conform to T, since the schemas are practically identical.

T is currently a superset of S, which means that "T includes everything in S". This has the effect that within a T document, a vendor can take advantage of new features in S while still being in "the comfort zone of T". T is therefore considered by some as providing a “graceful migration path to S”, meaning that vendors can change their existing T-compliant code, on a case-by-case basis, to gradually support the features of S instead of those of T.

Update 2009-11-17: By reading in the spec today, I realised that I was wrong in saying that the leap-year bug was not in S - indeed it is. I appologize for the confusion. Thank you, Jens Hørlück for pointing this out to me.

As you know, Microsoft is claiming "IS29500 compliance" for their latest incarnation of Microsoft Office. OOXML is a big and complex document format, and even though OOXML was not published until waaaay after "feature freeze" of Office 2010, I thought it'd be interesting to take a look at how Excel has reached compliance.

When looking at the markup generated by Excel 2010, it is important to remember the background information above, since you'd expect to find traces of this "graceful migration" in the markup already.

Conformance class

The DIS-process added a new attribute to the root elements of the documents. This element would specify the conformance level of the document. As to not invalidate existing documents, the default value for this attribute was “transitional”. The other possible value is "strict".

So when creating a new spreadsheet using Excel 2010, which markup does it create?

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook>
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1”>
  </sheets>
</workbook>[/code]

(condensed for easier reading)

The above is the markup for the root element of a SpreadsheetML spreadsheet.

So the conformance class attribute is omitted – making the document a T-document. I know that it is basically nonsense to add an attribute with the default value to implement support for this behavior, but for those of us that like looking at markup in text editors, I’d recommend Microsoft to include the conformanceClass-attribute with the appropriate value. As it is a no-brainer to implement for a consumer – it should be a no-brainer to implement for a producer.

Also – the existence of the conformanceClass-attribute is the only (or, best, anyway) indicator that a T document was created according to the schemas of ISO/IEC 29500 and not those of ECMA-376.

(and yes, I know that they are practically identical, but adding the attribute would be a clear indication that Microsoft wishes to produce markup according to ISO/IEC 29500 and not ECMA-376 1st Ed.)

Microsoft, please fix!

ISO-dates

Support for ISO-dates in SpreadsheetML just might be the biggest issue we faced in the DIS-process. It was hugely controversial and as such, a large amount of time was used at the BRM to figure out a solution.

Excel 2010 allows a user to manually chose the option to persist dates in ISO-8601 format. This is done through the "Settings-dialogue" as

So let us see what Excel 2010 does with dates. I opened the application and in a cell I wrote the date 28-02-1900. I then unzipped the XLSX-file and found this markup in the Worksheet-part:

[code:xml]<sheetData>

  <row r="1" >
    <c r="A1" t="d">
      <v>1900-02-28</v>
    </c>
  </row>
</sheetData>[/code]

So the date is actually persisted in the file using ISO-8601 notation and the cell is correctly typed with a t=”d”-declaration.

Now, depending on your point of view, this is a good thing, because it takes us further from the hell-hole I have previously written about with respect to “date-typing-of-numbers” in ECMA-376.

Thank you, Microsoft – good for you!

Smile

Leap-year bug

So … with the “Save-as-ISO- 8601-dates”-setting to “yes, please” as previously described, I added a formula to the spreadsheet above in cell B1. The formula simply was “=A1+1”.

Anyone up for guessing the result?

Well, the result was this:

That is funny because of two things:

  1. I set the “Save-as-ISO Dates”-setting to “yes”, so I was not expecting the leap-year-bug to still be used.
  2. I am pretty sure that the date-representation “1900-02-29” is not a valid ISO-date.

So I looked at bit at the markup generated for this spreadsheet.

The result was:

[code:xml]

<sheetData>
  <row r="1" >
    <c r="A1" t="d">
      <v>1900-02-28</v>
    </c>
    <c r="B1">

      <f>A1+1</f>
      <v>60</v>
    </c>
  </row>
</sheetData>[/code]

Ahem … dear Microsoft … WTF?

So you have persisted the first date as an ISO-date,

but the result of calculating a value based on it – you persist the result as a serial date? In which universe would this make sense?

(I should note that whenever doing calculations resulting in dates not being invalid, the date is correctly persisted by Excel 2010 as an ISO-8601 date)

But – as I was writing this I thought “wasn’t there something about an at-BRM-added attribute called “dateCompatibility”? This was the attribute used to determine if the date-system used should support the leap-year-bug or not. And truth be told, the attribute is not used – making the dateCompatibility default to “true” – hence honoring the leap-year-bug.

But – this could simply be a glitch of Excel2010, so I added the attribute myself to force Excel2010 to discard date compatibility. I’d then suspect the result of adding 1 to the date of 1900-02-28 would be 1900-03-01.

The markup was this:

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook dateCompatibility="false">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1”>
  </sheets>
</workbook>[/code]

Result: still 1900-02-29

I this calls for a few conclusive words before I move on:

Microsoft, I think you should stop using ISO-8601 dates in SpreadsheetML in T because of the arguments provided by my fellow WG4-expert Gareth Horton. But if you insist on using ISO-8601-dates in SpreadsheetML-T, you should do it right,. If you ask me, a user manually choosing to have Excel2010 use ISO-8601 dates, is a clear indication that the user does not want to deal with the leap-year-bug. The user has even decided to continue despite your warning that he or she might lose precision. So when a user performs this specific action, set the damn dateCompatibility-attribute to “false” and persist ALL dates as ISO-8601 dates.

You have a golden opportunity here to ditch the leap-year-bug for all new documents. If you act wisely and add the conformance attribute (so consuming applications can distinguish the files from ECMA-files) and setting the dateCompatibility-attribute to “false” when persisting dates as ISO-8601 dates, you’d have done really, really good. You have the by far biggest implementation of OOXML – so the vendors will follow your lead. Any application adding support for ISO-8601 dates in transitional documents in their existing ECMA-376-compliant code, will be able to kno

w what to do - and they'll do it the way you do it.

Now, I know you would like to demonstrate to all of us that you want to go towards “the S way”, but you need to do it right. Mixing ISO-8601 dates with serial dates simply to be able to maintain the leap-year-bug is not the right thing to do.

I have absolutely no idea of how the internals

of either the calculation instructions in Excel works – or even the Excel-team itself, but the way you have “added” ISO-8601 dates indicates to me, that you have changed barely anything except for the persistance-mechanism for dates.

I’m sure you’d argue that there is no longer time to change the inner workings of Excel2010, and you might very well be right about that. If that is indeed the case, I suggest you remove the option of saving dates as ISO-8601-dates from Excel2010 as soon as possible to avoid doing any more harm to the ecosystem around OOXML that we both share deep concerns for.

VML

At the BRM the biggest pile of things added to OOXML was where a feature was previously only possible with the use of VML. Of these include adding comments to cells in spreadsheets. The comments themselves are stored in a “Comment part” of the OPC-package, but the display of the comment was done using VML. Luckily, at the BRM markup was added to allow DrawingML to be used instead.

So I used the spreadsheet from before and added a comment in cell B2 (the one with the crappy leap-year-bug result)

.

Anyone wanna guess if this comment is displayed using VML or DrawingML?

Yes, you guessed it … somewhere in the back of the document is a VML-fragment containing the “box” containing my comment.

[code:xml]<xml xmlns:v="urn:schemas-microsoft-com:vml"

 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel">
 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1"/>

 </o:shapelayout><v:shapetype id="_x0000_t202" coordsize="21600,21600" o:spt="202"
  path="m,l,21600r21600,l21600,xe">
  <v:stroke joinstyle="miter"/>
  <v:path gradientshapeok="t" o:connecttype="rect"/>
 </v:shapetype><v:shape id="_x0000_s1025" type="#_x0000_t202" style='position:absolute;
  margin-left:120.75pt;margin-top:1.5pt;width:126pt;height:55.5pt;z-index:1;

  visibility:hidden;mso-wrap-style:tight' fillcolor="#ffffe1" o:insetmode="auto">
  <v:fill color2="#ffffe1"/>
  <v:shadow on="t" color="black" obscured="t"/>
  <v:path o:connecttype="none"/>

  <v:textbox style='mso-direction-alt:auto'>
   <div style='text-align:left'></div>
  </v:textbox>
  <x:ClientData ObjectType="Note">

   <x:MoveWithCells/>
   <x:SizeWithCells/>

   <x:Anchor>
    2, 15, 0, 2, 4, 55, 3, 16</x:Anchor>

   <x:AutoFill>False</x:AutoFill>
   <x:Row>0</x:Row>
   <x:Column>1</x:Column>
  </x:ClientData>
 </v:shape></xml>[/code]

If this was a tweet on Twitter, I’d finish it off using the tag #fail.

Document protection

And finally – what about document protection? “Document protection” is the mechanism in OOXML to allow applications to open documents in “read-only-mode”, to protect worksheets from editing etc. These are not really protecting the document as such, because a hash of the password is simply saved – but everything is still in clear text. Document protection was one of the areas where OOXML was fundamentally changed, allowing more robust algorithms to be used – and not only the weak “legacy-algorithm” previously supported by Office 2007.

So in Excel2010 I chose to protect the active sheet and looked at the markup again. Again, I was hoping to find some of the new markup, but again I was let down. Excel2010 still uses the weak algorithm and it still uses the legacy markup to persist it.

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

<worksheet>

  <sheetData/>
  <sheetProtection password="8985" sheet="1" objects="1" scenarios="1" />
</worksheet>[/code]

This is not only an annoyance for those not working on the Windows-platform – it is also an annoyance for those of us working with e.g. .Net development. If Excel2010 used SHA1, SHA256 or one of the other modern algorithms supported directly by .Net framework, we’d be able to use it straight out of the box. Now all of us still need to add additional code/assemblies to our code – simply to protect a whorksheet.

At the end …

I must admit that I am a bit disappointed. I know that barely a year has gone since the publication of ISO/IEC 29500, but I had expected more of the things we discussed at the BRM to be implemented. At the very least I was expecting some of the more “easy ones” to have been implemented – like the conformanceClass-attribute, VML in comments and document protection. It would have been a nice token of “good faith”.

Now, we are basically left with nothing. Do note, however, that the observations above are by no means a comprehensive test. These represent simply a few of the thoughts on the top of my head while writing this.

Next time: WordpressingML

PS: I have been trying to find a list of the stuff from the BRM that was implemented in Office 2010 (like ISO-8601 dates), but I have been unsuccessful. So if anyone can point me towards such a list, I'd be happy to include those results above.