Losing data the silent way - ISO8601-dates

by jlundstocholm 11. April 2009 23:52

In Prague we spent quite some time discussing how to deal with the fact that applications supporting ECMA-376 1st Ed. not necessarily support ISO/IEC 29500:2008 strict as well. Our talks revolved primarily around how major implementations dealt with the modified functionality of the elements <cell> and <v> in SpreadsheetML now that ISO-dates are allowed as content of the <v>-element. But “dates in spreadsheets” is not the only place where changes occurred. Changes were also made to other areas, including

  • Object embedding
  • Comments in spreadsheets
  • Hash-functions for document protection

This will be the first post in a series of posts evolving around how IS29500 differs from ECMA-376 and how existing applications behave when encountering a document with new content. What I will do here is to create some sample documents and load them in the applications I have access to that supports OOXML the best. In my case these are Microsoft Office 2007 SP2, OpenOffice.org 3.0.1 and NeoOffice for Mac and Apple iWorks. If you want to contribute and you have access to other applications, please let me know the result and I’ll update the article with your findings. If you have access to Microsoft Office 2007 SP1, I'd really like to know. When the series is done I’ll post a bit about MCE and how it might help overcome some of the problems I have highlighted (if we’ll get to change the namespace for the strict edition of IS29500 schemas)

I should also note that as the series progresses, the examples I make will increase in complexity. A consequence of this will be that my examples will be more of a “magic-8-ball-type prediction” than “simple examples of IS29500-strict documents”. Since there is not a single application out there supporting IS29500-strict, the examples will be my “qualified guesses” to how applications might interpret IS29500-strict when they implement it.

ISO-8601 dates in SpreadsheetML

Let me first touch upon the problem with dates in SpreadsheetML since this was the problem we talked about the most. Gareth Horton from the UK national body hand-crafted a spreadsheet document with these new dates. I have modified his example a bit to better illustrate the point. Files are found at the bottom of this post.

In the original submission to ISO dates were persisted in SpreadsheetML as “Julian numbers” (serial representation) and subsequently formatted as dates using number format styles.

[code=xml]<sheetData>
  <row r="1">
    <c r="A1" s="1">
      <v>39904</v>
    </c>
  </row>
  <row r="2">
    <c r="A2" s="1">
      <v>39905</v>
    </c>
  </row>
(…)
  <row r="10">
    <c r="A10" s="1">
      <v>39913</v>
    </c>
  </row>
</sheetData>[/code]

So the above would create a column with 10 rows displaying the dates from April 1st to April 10th.

Let’s change one of the cells to contain a date persisted in ISO-8601 format.

[code=xml]<row r="9">
  <c r="A9" s="1" t="d">
    <v>2009-04-09T01:02:03.04Z</v>
  </c>
</row>[/code]

So the cell contains an ISO-8601 date and it is formatted using the same number format as the other cells. I have added a bit of additional data to the spreadsheet to illustrate the problem with using formulas on these values.

Result

The interesting thing to investigate iswhat happens when this cell is loaded in a popular OOXML-supporting application. Note here that the existing corpus of implementations supporting OOXML supports the initial edition of OOXML, ECMA-376 1st Ed.So they would have no way to look into the specification and see what to do with a cell containing an ISO/IEC 8601 date value.

Microsoft Excel 2007 SP2

As you can see Excel 2007 screws up the content of the cell. And on top of that, should you try to manipulate the content of the cells with formulas, they are also basically useless. The trouble? Well, you are not notified that Excel 2007 does not know how to handle the content of the cell, so chances are that you’ll never find out – until you find yourself in a position where there are real consequences to the faulty data and kittens are killed.

OpenOffice 3.0.1 Calc

 

 

The result here is almost the same. Data is lost and the user is not notified.

NeoOffice for Mac

 

Again we see the same result. This is not so strange, since the latest version of NeoOffice shares the same code base as OOo 3.0.1 so behavious should be the same.

iWorks 09 Numbers

 



Wow, so for iWorks on the Mac, the user is actually warned that something went wrong. Only trouble is - it does not warn you that the content of the cell is not valid - it informs you that the system cannot find the font "Calibri".

Conclusion

It is pretty hard to conclude enything but "this sucks!". None of the applications warn the user that they have lost data - and they all do exactly that - loose data.

Original file: Book1.xlsx (8.82 kb)

Modified file: 

book2.xlsx (8.22 kb)

Comments

4/12/2009 1:10:15 AM #

trackback

Trackback from Where is there an end of it?

SC 34 Meetings, Prague, Days 2, 3

Where is there an end of it? |

4/12/2009 1:30:01 AM #

Rob Weir

Surely none of these Ecma-376 1st edition applications write out dates in ISO format, right?  So the only time this would ever happen is when someone hand-modifies the XML.

This would become more of a problem if some vendors decide to implement ISO/IEC 29500 while others remain on Ecma-376 1st edition.  At that point, the issue is obvious: if you try to read a document that contains markup that your application does not understand, then you may lose data.

I know that some have suggested namespace changes to solve this problem, but it really doesn't address the real issue.  Even if you had only a single standard, say only Ecma-376 1st edition, there would still be occasions for data loss if different vendors implemented a different subset of the standard.  In other words, the problem is not feature differences between different versions of the standard.  The problem is that there is no defined set of features which an OOXML application must implement to achieve interoperability.   If you solve that problem, and also clearly differentiate schema versions via an attribute or namespace, then this issue should be manageable.  

Of course, you still have the problem that the text of OOXML, of any version of it, is irrelevant.  What matters is what Microsoft Office does in practice.  There is no market for a conformant ISO/IEC 29500 application and there never will be.  But there is a market for Microsoft Office-compatible applications.  A source of interoperability woes will be those who do not realize this and try to implement what the OOXML specification allows rather than follow what MS Office actually does.  Your examples show that all of the vendors understand this.

Did you really think otherwise? For example, do you think that Excel will really support dates before 1900?  Or calculations with correct leap year calculations?  Or SVG graphics?  Or any of the other dozens of "concessions" made at the BRM?

Rob Weir United States |

4/12/2009 2:04:59 AM #

jlundstocholm

Hi Rob,

Surely none of these Ecma-376 1st edition applications write out dates in ISO format, right?

I think you are correct in your assumption that no application writes out ISO-dates yet (since there are no applications out there supporting IS29500). The point (or mine, anyways) is that changing the namespace for strict files and OPC will give implementers a solid way of indicating the conformance-level of the documents they create. It will also give implementers a tool that will most likely crash existing implementations when they encounter XML-schemas they are not familiar with. This is not necessarily the case when using a version/conformance attribute only.

Of course, you still have the problem that the text of OOXML, of any version of it, is irrelevant. What matters is what Microsoft Office does in practice.

Yes, I agree. This is the same problem you guys have with OOo. It doesn't really matter what the spec says - what matters is the quirks of OOo and the extent to which it implements ODF. Any market with a single big player (as OOo with ODF and MSO with OOXML) will suffer from this.


A source of interoperability woes will be those who do not realize this and try to implement what the OOXML specification allows rather than follow what MS Office actually does. Your examples show that all of the vendors understand this.


Eeeh ... I am not sure I agree. My examples show that the vendors have implemented OOXML according to the spec and that the behaviour when encountering faulty data varies a bit from application to application. Actually, none of the applications behave like Microsoft Office when it comes to dealing with "corrupt" data. Microsoft Office tries to guess the correct value - the other applications simply disregard it.

For example, do you think that Excel will really support (...) SVG graphics? Or any of the other dozens of "concessions" made at the BRM?

Was this on the table at the BRM? Your memory seems to be a bit better than mine - did we really discuss SVG in Geneva?

jlundstocholm Denmark |

4/12/2009 8:45:19 AM #

Doug Mahugh

Was there a change at the BRM to how leap years are handled?  I don't recall that one, either.

Interesting analysis, Jesper.  Looking forward to more in this series.

Doug Mahugh United States |

4/13/2009 3:58:29 AM #

Esni

Just to be clear: MS Office 2007 SP2 is the one with the ODF support?


Actually, none of the applications behave like Microsoft Office when it comes to dealing with "corrupt" data. Microsoft Office tries to guess the correct value - the other applications simply disregard it.



Microsoft Office looks as it interpreted the data correct, but did not - and it did not show the usual #ERROR in the calculation field.

OOo/Neoo showd a blank, and delivered a bogus value even further from the correct one.

And iWorks is !Works in this case


To Doug: I think prior to the BRM we had the two classic Lotus/Excel formats (1900 and 1904 epochs)

At the BRM those were both extended back in time, and the ISO8601 support was added.

And I think that only the Lotus/Excel  (1900) formats has the leap year bug, which the BRM insisted should still be there

That is what I gather from reports etc.

/esni

Esni Denmark |

4/13/2009 9:55:23 AM #

Rob Weir

Jesper, the point is that none of the vendors currently write ISO 8601 format dates and none will until MS Office does, if they every do. That is the dynamic at play when a single vendor has 95%+ of the market share in products that implement a standard.  It makes the standard, as well as the committee that maintains it, irrelevant.  The only thing that matters is the de facto standard of what Microsoft Office actually reads and writes. No vendor is going to bother to write code to support a feature that MS Office does not support.  Why would they?

The situation with ODF is a bit different.  Although OOo was a strong influence on the earliest ODF work, you should note that this was 6 years ago. Today the TC has participation from many ODF vendors, including IBM, Sun, Novell, Google, KOffice and Microsoft, as well as non-vendor parties, including adopters. The implementors of ODF are also diverse and none has the dominating market share and influence over ODF that MS Office has with OOXML.  

As for SVG, you are correct that this was not discussed at the BRM.  But it was dealt with in one of the hundreds of items that were approved without discussion on that last day.

Rob Weir United States |

4/14/2009 1:09:39 AM #

Gareth Horton

Hi Rob,

Why don't we all give up and go home then, since the standard and the committee that maintains it are irrelevant? Luckily there aren't any active standards regarding mainframes, since participation therein truly would be an utter waste of time, by that measure!

That is IBM's position on Open XML is it? Completely cede all control to Microsoft on the dominant document format worldwide without a whimper?

I would have thought that the Cognos folks might benefit from helping shape or clean up the SpreadsheetML side of the house at least, since a decent amount of their development effort is based on Excel and Office integration per se. They could bring a lot to the table there.  I suppose you could try getting them to at least support ODF first.

Maybe you could also get them to change their definition of 'Office' too.

www.cognos.com/.../reporting.html

(Look in the 'Complete reporting coverage' section.)

I am under no illusions that Microsoft are going to lie down and simply become a passenger in the ISO maintenance process, but why should they?

It would, of course, be good to have more vendors in the process that have recently heavily invested in Microsoft Office integration, such as SAP and many other big ERP vendors, the BI vendors (with the political handbag-swinging around office suites, it makes most sense for IBM to have participants from this area, w.r.t Excel)

By the way, we have already written a bunch of code that supports features that MS Office does not support - that's the beauty of custom XML. Office can read it and consume it in it's way, we can read and consume it in our way.  I know that's not exactly what you were getting at though Wink  


Gareth

Gareth Horton United Kingdom |

4/14/2009 8:29:58 AM #

Rob Weir

Gareth, Looking at the membership of SC34, at the NB level, it appears that many (perhaps most?) NB's that were involved in the approval of DIS 29500 in fact have given up and gone home.   You don't see much participation from Cote-d'Ivoire lately, do you?  So it appears that I'm not the only one coming to the conclusion that OOXML is a lost cause.  If SC34 was relevant, its membership would be increasing, not decreasing, I would think.

So rather than worry whether one little old company participates or not, why not worry why a significant portion of SC34's membership has disappeared?

-Rob

Rob Weir United States |

4/14/2009 7:16:27 PM #

Gareth Horton

Rob,

I think we both know the once-in-a-lifetime situation that OOXML created, you can't project that onto the general relevance of SC34 or the utility of participation in OOXML maintenance.

With all due respect to Cote D'Ivoire (and their ilk), they are not likely to be a rich source of leading experts that can both provide valuable input on defects and improvements. Not only that, but also have the political nous to make things happen when operating in the committee situation.

I know you just wanted to be disarming, but IBM is probably the anti-example of a 'little old company'.  There aren't many companies outside IBM with the amount of experts with the technical chops and the soft skills to be very valuable -with the right motives of course.  If Sun is subsumed into the IBM juggernaut, the pool becomes even larger.

The other issue affecting falling membership is the economy, of course.  It was a not a simple decision to get Datawatch involved in the standards process, as incurring extra costs, especially those not directly involved in revenue generation is not popular anywhere.  Once again, IBM are immersed in cash, so more than most are in a position to participate.

We will review our participation after a year, so at that point I might have a better idea whether the whole thing is irrelevant or not.

Gareth

Gareth Horton United Kingdom |

4/15/2009 12:04:16 AM #

hAl

It is rather pathetic that OOXML does not contain versioning.
It would be very easy for vendor to quickly implement an upgrade which recognizes versions and can then show either
* supporting that version   (straightforward opening of the document)
* converting to supported version (use converter and then opening supported converted document)
* offer limited support with clear warning that data might be lost or give unexpected behaviour due to version differences and lack of support for the version in the document.
* not supporting that version (not opening the document)

hAl |

4/15/2009 4:01:53 AM #

Alex Brown

@hAl

Yup - for me the key thing (from a user perspective) is that it is crystal clear what is happening ...

- Alex.

Alex Brown United Kingdom |

4/19/2009 7:22:50 PM #

007b2b

So rather than worry whether one little old company participates or not, why not worry why a significant portion of SC34's membership has disappeared?

007b2b People's Republic of China |

4/21/2009 1:05:52 AM #

pingback

Pingback from nonaka.eu

Novità importanti per OOXML – IS29500 – nonaka

nonaka.eu |

4/23/2009 7:13:24 AM #

trackback

Trackback from Doug Mahugh

Miscellaneous Links, 04/22/2009

Doug Mahugh |

7/14/2009 6:33:13 AM #

pingback

Pingback from answerspluto.com

list of urls 4 « Answers Pluto

answerspluto.com |

8/16/2009 12:00:38 PM #

pingback

Pingback from iamosx.com

Losing data the silent way - ISO8601-dates | I AM OSX

iamosx.com |

10/23/2009 3:02:02 AM #

pingback

Pingback from aristippus303.wordpress.com

Why do we need serial dates in the Transitional form of IS 29500? « document∩database

aristippus303.wordpress.com |

6/6/2010 10:58:35 AM #

pingback

Pingback from iamosx.com

a 'mooh' point | Losing data the silent way - ISO8601-dates | I AM OSX

iamosx.com |

Comments are closed