Conformance of ODF-documents

by jlundstocholm 30. April 2008 18:13

Ever since the now infamous article by Alex Brown, the blogosphere has been filled with interpretations of the, really not so surprising, results: that the OOXML document containing the original ECMA-376 spec does not conform to IS 29500.

The, really not so surprising, conclusions have been "Office 2007 does not even produce valid OOXML" followed closely by statements like "This shows that Microsoft Office 2007 should not be allowed since it does not produce valid OOXML".

Hmmm ... ok.

As some of you might remember, I participated in some lab tests of OOXML/ODF interop in the fall of 2007. Basically I sat in a small room with guys from IBM, Microsoft, Novell and the Danish National IT- and Telecom Agency, sifting through documents, converting them and examining the generated XML. The documents we worked on were supplied by different parts of the Danish public sector, who were told to use some of their existing documents as the basis for the parts of the tests they participated in. So these were real-world documents.

One of the things we tested was whether the documents complied with their respective specs. The original OOXML documents we tested were all compliant with the ECMA-376 spec ... but it was a different case with the ODF documents. So the other day I tried to validate all the original ODF documents supplied to us.

The results are illustrated in the table below:

File name | Generator | Conclusion
DFFE_Afgået svar til Jane Doe.odt | OpenOffice.org/2.3 | not valid
DFFE_SJ_(1) - 15-06-2007 Foreløbig Høring om forslag.odt | OpenOffice.org/2.0 | valid
GRIBSKOV_bek-281(BS).odt | OpenOffice.org/2.0 | valid
GRIBSKOV_Standardbrev ifm ITST pilotprojekt.odt | OpenOffice.org/2.2 | valid
GRIBSKOV_Udkast til Forslag til Lokalplan.odt | OpenOffice.org/2.1 | not valid
ITST standardbrev ODT.odt | OpenOffice.org/2.0 | valid
ITST Testdokument ODT.odt | OpenOffice.org/2.2 | not valid
RM Kursusmateriale.odt | OpenOffice.org/2.0 | not valid
RM Standardbrev 2s.odt | OpenOffice.org/2.3 | not valid

The table lists the file name of the original document, the application that generated it (taken from the META-file in the ODF-package) and whether the document passed validation.
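For readers who want to reproduce the "Generator" column, here is a minimal Python sketch of how that value could be read out of an ODF package. It assumes the package is an ordinary zip archive with a meta.xml at its root that holds a meta:generator element; the function name read_generator is made up for this illustration, and this is not necessarily how the values in the table were obtained.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used for metadata elements in ODF packages.
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def read_generator(odf_path):
    """Return the text of <meta:generator> from meta.xml, or None if absent.

    read_generator is a hypothetical helper name. odf_path may be a file
    name or a file-like object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(odf_path) as package:
        # Assumption: meta.xml sits at the root of the package.
        root = ET.fromstring(package.read("meta.xml"))
    element = root.find(f".//{{{META_NS}}}generator")
    return element.text if element is not None else None
```

Calling read_generator("some-document.odt") on a local file returns the raw generator string, which may carry more detail (platform, build number) than the short form shown in the table.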

The overall conclusion was:

Application | Creates consistently valid ODF?
OpenOffice.org/2.0 | No
OpenOffice.org/2.1 | No
OpenOffice.org/2.2 | No
OpenOffice.org/2.3 | No

So should we demand that OOo not be used at all? Of course not, but we should keep the pressure on the OOo-team to fix their code ... just as we should with Microsoft and Microsoft Office.

Comments

5/1/2008 11:46:01 AM #

Rob Brown

Hi Jesper,

Are those documents publicly available? What was your validation method? And what were the specific failures?

My concern here is that "we should keep the pressure on the OOo-team to fix their code" is a good aim, but pointing out specific failures and expected results is a whole lot more valuable than saying "Your code produces invalid ODF, fix it!" :)

Rob Brown New Zealand |

5/1/2008 2:48:16 PM #

Ron House

The difference, in case you genuinely don't know it, is that ODF is a genuine standard designed by combining the insights of many different stakeholders, and OOo have stated that any failure to conform is regarded as a bug in the product. OTOH, Odious Office XML was designed to reflect exactly and only the features of the Microsoft Office format, bugs and all, and Microsoft have stated that they make no promise to remain faithful even to that self-serving standard in future releases. Get it now?

Ron House Australia |

5/1/2008 6:42:46 PM #

hAl

@Ron House.
On this blog it was already shown that OpenOffice uses a modified version of MathML 1.x, whereas the OpenDocument spec states that MathML in an OpenDocument file should be MathML 2.0.
I would not call that a bug but a deliberate choice not to conform. And as Sun/OOo is indeed a big stakeholder in the development of ODF, not having, for instance, the MathML 2.0 schema included makes it a lot easier to pass validation while not conforming. So to suggest that every non-conformance in OOo to the OpenDocument standard is a bug seems incorrect. What is worse: in the test of the OOXML documents, each conformed to Ecma 376, which was the format version they were created in. Being made in that currently used format version, the documents naturally did not conform to a yet unpublished ISO version.
For OpenDocument, however, it is not certain whether a document made by the main supporting implementation conforms to any version of the standard, let alone to the ISO ODF version.

I have just been checking the OpenOffice site for a couple of minutes, but I am unable to find which version of ODF is supported by which version of OpenOffice, which does not make the issue any easier.

hAl |

5/1/2008 6:55:58 PM #

hAl

@Jesper
Did you validate against the ISO ODF v1.0 schema version or the OASIS ODF 1.1 schema version?

hAl |

5/1/2008 7:45:24 PM #

jlundstocholm

Rob,

Most of the documents are not available: we were told not to make them public because the organisations that supplied them did not own the copyright to some of the content.

However, I agree with you that the details are valuable and I will write an article with these details in a couple of days.

Do also note that the aim of this post was not to point out specific errors in ODF-implementations. The aim was to illustrate the hypocrisy of demanding that Microsoft Office 2007 be denied access to markets requiring OOXML-compliance, since OOo does not meet those criteria itself ... and as far as I know no one has demanded that OOo be pulled from the market.

:)

jlundstocholm Denmark |

5/2/2008 12:29:58 AM #

jlundstocholm

Ron,

I'm sorry ... but I fail to see any relevance to my post in your comment.

About Microsoft not supporting OOXML in their own products, please look at www.microsoft.com/.../ChrisCapOpenLetter.mspx where Chris Capossela, Senior Vice President, Microsoft Office, stated that

Microsoft has been afforded a wonderful opportunity as a result of this process. We've listened to the global community and learned a lot, and we are committed to supporting the Open XML specification that is approved by ISO/IEC in our products.

Of course you are free to regard this as just another marketing bluff, but saying that

Microsoft have stated that they make no promise to remain faithful even to that self-serving standard in future releases

is simply not true.

Get it now? ;)

jlundstocholm Denmark |

5/2/2008 12:30:54 AM #

jlundstocholm

hAl,

I used the OpenDocumentFellowship validator and confirmed the result afterwards. As I told Rob, I will post another article in a few days with more details.

jlundstocholm Denmark |

5/2/2008 12:42:46 AM #

jlundstocholm

hAl,

You can check the schema-notation in e.g. content.xml in the ODF-package to see which edition of ODF the specific version of OOo produces.

(they all - up to 2.4 - produce ODF 1.0 markup)

jlundstocholm Denmark |

5/2/2008 12:44:52 AM #

jlundstocholm

To the people over in comp.linux.os.advocacy reading this blog:

I just noticed a funny little detail. Please compare the content of Ron's post with the one in groups.google.co.uk/.../8cf6b936aa80fbcb by "Linonut".

Maybe they are related?

:)

jlundstocholm Denmark |

5/2/2008 1:45:13 PM #

Rob Brown

Hi Jesper,

Your intention may be to illustrate hypocrisy, but your testing is far more valuable than that. I'll look forward to seeing further details when you get around to it. I'd urge you to take the extra step and send your test results to the people responsible for OOo / KOffice / MSOffice / whoever. Hell, I've already referred to one of your blog posts in a KOffice bug report!

In case you're interested, I have no interest in MS Office being denied access to any markets. It's reasonably important to me that people not be forced to buy MS Office to communicate with governments etc, but I don't know if that's ever actually been the case and I don't see it becoming the case in the future.

Rob Brown New Zealand |

5/4/2008 6:53:58 AM #

next_ghost

You're wrong about OO.o 2.x generating ODF 1.0. My documents written in OO.o 2.3.1-r1 have <office:document-content office:version="1.1">. My older documents which have office:version="1.0" pass validation against ODF 1.0 schema.
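The check described here, looking at the office:version attribute on the root element of content.xml, can be sketched in a few lines of Python. This is only an illustration: it assumes content.xml sits at the root of the package, and read_office_version is a name invented for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the office:* elements and attributes in ODF.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def read_office_version(odf_path):
    """Return the office:version attribute of content.xml's root element.

    read_office_version is a hypothetical helper name. Returns None when
    the attribute is not present. odf_path may be a file name or a
    file-like object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(odf_path) as package:
        root = ET.fromstring(package.read("content.xml"))
    # ElementTree addresses namespaced attributes as "{namespace}name".
    return root.get(f"{{{OFFICE_NS}}}version")
```

The returned string says which edition of the schema the producing application claims to target, which is the edition a validator should then be pointed at.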

next_ghost Czech Republic |

5/4/2008 6:09:24 PM #

jlundstocholm

next_ghost,

Yes - you are correct. I was confused by all the 1.0-suffixes in the namespace declarations in the XML-files in the ODF-package.

Furthermore - I just saw that Rob had started a Wiki for ODF-validation at wiki.oasis-open.org/.../How_to_Validate_an_ODF_document (see the comment section of his previous post at www.robweir.com/.../...lidation-for-dummies.html). I think this is really great news and I think we should have the same for OOXML on e.g. openxmldeveloper.org or the ECMA-page directly.

:)

jlundstocholm Denmark |

5/4/2008 7:26:42 PM #

hAl

Noteworthy is that Rob Weir, in his article on validating, suggests that people not use the official OASIS RelaxNG DTD compatibility because it was not submitted to ISO and is therefore wrong to use.

Strange that he thinks it is ok for OpenOffice to use a version of ODF not submitted to ISO, then?

I also wonder if his colleagues in OASIS like it when he dismisses part of their work because it was not submitted to ISO and then himself promotes the use of ODF versions that were not submitted to ISO.

hAl |

5/4/2008 8:42:14 PM #

jlundstocholm

hAl,

Personally I like that the discussions have turned towards conformance and not comparing technicalities between the two document formats. But we need to somehow be able to do more in terms of conformance than mere schema-validation. The trick is - schema validation only gets us some of the way. There are other constraints in both ODF and OOXML that are not reflected in the schemas.

Rob told Alex Brown that he should have consulted the ODF TC about some of the problems he encountered instead of simply blogging about them. Well, I actually tried to get help previously, which led to one of my posts: idippedut.dk/post/2008/02/A-cry-for-help.aspx . I wrote an email to the ODF mail list at lists.oasis-open.org/.../threads.html but never got an answer.

... maybe I have become a "persona non grata" in ODF-circles.

:)

One of the things I couldn't get my head around was the naming of the files in the ODF-package. I have been able to deduce the requirements, e.g. that the main file in an ODF-package has to be called "content.xml", but I cannot see this reflected in the manifest-schemas. It is extremely important to discuss these things when talking "validity" as well, otherwise validation could show "no errors" - but since the package structure is messed up, no one will be able to read the damn document.
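To make the point concrete, a package-level check that sits outside schema validation could look roughly like the Python sketch below. It rests on assumptions: that "content.xml" must exist at the package root (the requirement deduced above, not one read out of a schema), that the manifest lives at META-INF/manifest.xml, and that check_package is simply a name invented for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the manifest:* elements and attributes in ODF packages.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def check_package(odf_path):
    """Return a list of package-structure problems (empty if none found).

    check_package is a hypothetical helper. It checks structure only and
    does not validate any XML against the ODF schemas.
    """
    problems = []
    with zipfile.ZipFile(odf_path) as package:
        names = set(package.namelist())
        # Assumption taken from the text: the main file must be content.xml.
        if "content.xml" not in names:
            problems.append("content.xml is missing")
        if "META-INF/manifest.xml" not in names:
            problems.append("META-INF/manifest.xml is missing")
            return problems
        manifest = ET.fromstring(package.read("META-INF/manifest.xml"))
    for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        # Skip the root entry "/" and directory entries such as "Pictures/".
        if path is None or path.endswith("/"):
            continue
        if path not in names:
            problems.append(
                f"manifest lists {path}, but the package does not contain it"
            )
    return problems
```

A document can pass every schema check and still come back from a function like this with a non-empty list, which is exactly the gap between "valid" and "readable".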

jlundstocholm Denmark |

5/4/2008 9:00:51 PM #

next_ghost

hAl,

Rob Weir wrote that ODF doesn't conform to ID/IDREF constraints so there's no point in checking it. It has nothing to do with RelaxNG DTD compatibility being submitted to ISO or not.

next_ghost Czech Republic |

5/5/2008 3:29:59 AM #

Rob Weir

Albert,

The problem with Relax NG DTD Compatibility is not that it was from OASIS.  Even if it was an ISO standard, it still is the case that ODF does not claim that its schema conforms to Relax NG DTD Compatibility.  That doesn't mean that ID/IDREF semantics don't apply to ODF document instances.  It just means that you'll need to verify these constraints in application logic rather than using Relax NG DTD Compatibility.

Jesper has a good point that conformance is a much broader issue than document validity.  When you are dealing with very small XML schemas, like purchase orders and transaction systems, then validity may give you 100% of what you need.  But the more complex your format, the more that conformance, and interoperability, depend on other things.  That is why only 0.7% of web pages are valid XHTML/HTML, but the web still works pretty well.

It is a simple matter of large systems design.  The way we deal with complexity, in human designed systems, is with robustness, fault-tolerance, etc.  No large, complex engineering system will work if it relies on perfection in all its parts.  But that is what XML validity requires -- perfection.  If I snapped my fingers and required that all web browsers reject web pages that were not 100% conformant to the HTML/CSS/EcmaScript, etc., standards then the web would collapse and we would never recover it.  That doesn't mean that we don't continually try to improve.  But it does mean that we need to design systems for a world of multiple players, multiple vendors with multiple applications, which will sometimes have application bugs, and for users that will sometimes make user errors.

Oh, Jesper, I recommend that you try signing up for the ODF TC's comment list.  The link is on our home page.  I know we have ODF user and ODF developer lists, but they get almost zero traffic.  The critical mass of discussion is on the comments list.  Re-post your question there and you'll be much more likely to get a response.  Most of the TC members subscribe to that list.

Rob Weir United States |

5/5/2008 4:38:07 AM #

hAl

I think someone rather opposes Rob's view on not validating the ID/IDREF constraints.

www.griffinbrown.co.uk/.../PermaLink.aspx

hAl |

5/6/2008 3:18:26 AM #

Alex Brown

Rob Weir wrote:

> ODF does not claim that its schema conforms to Relax NG DTD Compatibility.

It does, actually. It references that very document and uses its "attribute default value feature" without which it is not possible to validate ODF instances correctly.

- Alex.

Alex Brown United Kingdom |

5/7/2008 7:50:30 AM #

Rob Weir

Alex, to be specific, as I was on my blog, ODF does not claim conformity with the ID/IDREF checking part of Relax NG DTD Compatibility.  As you know very well, there are three different features described in the Relax NG DTD Compatibility specification, and the specification clearly states that conformance can be claimed against these features independently.  You also know that the ODF 1.0 Standard claims conformance with only the default attribute value portion.  This is allowed.  We did it.  And jing validates ODF documents fine with the ID/IDREF part of Relax NG DTD Compatibility disabled, and Sun's Multi-Schema Validator works fine with my test ODF document in its default mode of operation.

Rob Weir United States |

5/7/2008 6:25:20 PM #

Ron House

jlundstocholm, you can stop your wiseacre "Maybe they are related?" comments about the similarity between my post and linonut's on c.o.l.a: the simple reason his post is identical to mine is plagiarism. To make it plainer: he copied my post here word for word and didn't credit it. I am exactly who I say I am, a real person known by my real name. I never post anonymously. So please don't make any more covert personal attacks by innuendo.

Ron House Australia |

5/7/2008 6:32:48 PM #

jlundstocholm

Dude, I am not attacking you personally ... plagiarism where something is copied word for word without a reference to the original poster is just not something you see every day. I simply couldn't help but think that you were Linonut's alter ego (or the reverse).

I'm glad we cleared that up :)

jlundstocholm Denmark |

6/12/2008 12:47:49 AM #

Doug Mahugh

Jesper, have you had a chance to get back to the details of the validation methodology that Rob Brown asked about above?  It would be interesting to see exactly how that worked if possible.

Doug Mahugh United States |
