a 'mooh' point

clearly an IBM drone

Official complaint on OOXML-procedures in Denmark

I just wanted to let you in on a bit of information here from sunny Copenhagen.

Denmark has joined Norway in the strange sense that the Danish NSB (Dansk Standard) has received an official complaint regarding the Danish vote on March 29th 2008. I am sure the news will spread to the rest of the blogosphere soon, so be the first to get the information here (my translation):

The Municipality of Aarhus, which was a member of the OOXML-committee in Dansk Standard, has now complained about the "Yes"-vote in the ISO-approval [of OOXML]. The reason: No one knows the real content of the specification. [...] "It is strange to vote 'Yes' to a standard that could still be filled with flaws and defects. In principle, it might say that the Moon is made of Swiss cheese, and they voted 'Yes' to that", explains Jens Kjellerup to Computerworld.

Check it out here (in Danish): http://www.computerworld.dk/art/45835?a=fp_2&i=514

Document translation sucks (When Rob is right, he's right)

It is very seldom that I read one of Rob's posts and think "That is just so true" - but yesterday was one of those occasions. I was reading through his latest post about loading different documents in a couple of applications, and I couldn't help but smile when I got to the part where Rob made some observations about possible reasons for the poor load times of ODF-files in Microsoft Office 2003:

What is a file filter? It is like 1/2 of a translator. Instead of translating from one disk format to another disk format, it simply loads the disk format and maps it into an application-specific memory model that the application logic can operate directly on. This is far more efficient than translation. This is the untold truth that the layperson does not know. But this is how everyone does it. That is how we support formats in SmartSuite. That is how OpenOffice does it. And that is how MS Office does it for the file formats they care about. In fact, that is the way that Novell is now doing it now, since they discovered that the Microsoft approach is doomed to performance hell.


I have been trying to pitch my idea of "document format channels" for some time now. The basic idea is not to do translations between formats but to support the feature sets of both formats in the major applications.

I remember when I participated in the interop-work for the Danish Government in Fall 2007 and we tried to say something clever about the disappointing results we saw from translation. We heard rumours of Novell skipping the XSLT-translation of ODF to OOXML (and vice versa) and instead extending the internal object model of Novell's edition of OpenOffice.org. This was where the idea was born.

The idea was to round-trip documents in the format they were born and not to attempt translation (also, how the hell do you translate e.g. a digital signature between an ODF-file and an OOXML-file?).  What triggered the "vision" was that 1) the formats are not fully compatible and 2) translation sucks. In every interop-session I have attended and in every piece of interop-work I have participated in, there has been one, crystal clear conclusion:

When you translate, you lose information.

Essentially, translation is a poor-man's document consumption, because if you lose information when translating - why would you do it? As Rob so correctly points out - when Microsoft chooses to use translators to enable "support" for ODF in their Microsoft Office suites, it's really another way of saying: "We don't really care about ODF". The same thing naturally goes for OpenOffice.org (and spin-offs). When they insist on implementing just import filters for OOXML and use translators to do so - they are saying exactly the same: "We don't really care about OOXML". In both cases what they are communicating to their users is really

We don't care that you lose information - you'll just have to settle for half of the correct solution

It's the same message I hear when some of my colleagues come to me and say: "Jesper, I finished the piece of code you wanted me to do". Sometimes I am blessed with conversations like:

Colleague: I finished the code piece
Jesper: Cool - does it work all right?
Colleague: Eh well, it compiles just fine ...

Is that good enough?

(and with this friendly post, I can only hope "someone" will accept the LinkedIn-invitation I sent in February just before the BRM in Geneva ... or maybe I should try Diigo instead?)


Do you license your blog-content?

A few weeks back I attended an IT-architecture conference in Aarhus, Denmark and one of the sessions I participated in was about licensing your software with OSS-licensing. It was originally about software licensing, but at the end of the session, the speaker asked the audience:

How many of you are bloggers?

A few of us raised our hands. Then he asked:

How many of you have thought about how you license your blog entries?

Well, I for one didn't have a clue. Then the other day I noticed a small image at the bottom of Rick Jelliffe's posts saying "Some rights reserved". It linked to Creative Commons, and that kinda got the ball rollin'. I read about the different license models and came to the conclusion that the license most applicable to me and the content I put online is the "Attribution" model. This is the least restrictive of the Creative Commons licenses, and its summary says:

This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licenses offered, in terms of what others can do with your works licensed under Attribution.

One reason I chose this is that it grants everyone the right to use my work commercially. Say I made an argument in a post that Rob Weir liked so much that he wanted to quote me on his blog. Even though I am not a lawyer, I fear that he might not publish it if it were under a "non-commercial" license (IBM being a commercial company and all). So to be sure that most of you will be able to use the work I publish here, I chose the "Attribution" license for my entries.

What about the rest of you - have you thought of this? 

Challenge (Part II)

A tongue-in-cheek challenge for Mr. Rob Weir.

[code=xml]<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
  office:version="1.1">
  <office:body>
    <office:spreadsheet>
      <table:table table:name="Sheet1" table:protected="true" table:protection-key="8A45FB0C33667F9E33ECA007FCE4F6684DC5F242">
        <table:table-column />
        <table:table-row >
          <table:table-cell office:value-type="float" office:value="10">
            <text:p>10</text:p>
          </table:table-cell>
        </table:table-row>
        <table:table-row >
          <table:table-cell office:value-type="string">
            <text:p>
              Dear Rob Weir. Please prove by this example that ODF is an "interoperable"
              document format and tell me how a consuming application should determine if the
              user should be allowed to modify the document. I do not think that it is.
              In fact I think that your statements that ODF is a document format that
              provides interoperability are brash, irresponsible and indefensible
              pieces of bombast that you should retract.
            </text:p>
          </table:table-cell>
        </table:table-row>
      </table:table>
    </office:spreadsheet>
  </office:body>
</office:document-content>[/code] 

(and yes, one of the reasons for this post is to show off the cool syntax highlighter of this blog engine)
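To make the question in the spreadsheet above a bit more concrete, here is a minimal Python sketch of what a consuming application can actually read from that markup. The function name `protected_tables` is my own invention for illustration; only the namespace URI and the attribute names are taken from the example itself.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the example document above.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def protected_tables(content_xml):
    """Return (table name, protection key) for every table marked as protected."""
    root = ET.fromstring(content_xml)
    found = []
    for table in root.iter(f"{{{TABLE_NS}}}table"):
        if table.get(f"{{{TABLE_NS}}}protected") == "true":
            found.append((
                table.get(f"{{{TABLE_NS}}}name"),
                table.get(f"{{{TABLE_NS}}}protection-key"),
            ))
    return found
```

For the document above this yields the sheet name and a key of 40 hex digits, i.e. 20 bytes. That is all the consumer gets: the markup itself does not say how the key was derived from the password, so the application cannot check a password typed in by the user against it without guessing.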


And could you guys please stop the bickering and let's move on to something a bit more interesting? 

What is conformance, really?

The OOXML/ODF-blogosphere has been in a frenzy the last couple of weeks after a couple of posts made by yours truly and Alex Brown that were picked up by Rob Weir. I don't want to get into the technical details here - you should catch up on the conversations taking place in the comment sections of their respective blogs.

But I do want to talk a bit about conformance - because conformance should be much more than schema-validation. To be able to have a clear perspective, we need to look in the two specifications for how conformance is described.

ODF 1.0 (IS 26300):

Conformance is described in section 1.5 

Documents that conform to the OpenDocument specification MAY contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within this specification and are called foreign elements and attributes.

So this means that the only requirement for a document to have an "ODF-conformant" sticker slapped on it is that it validates against the ODF schema. If the document contains elements or attributes not defined in ODF 1.0, they should be marked with their own namespaces. This is actually all there is to say about conformance of individual documents in ODF 1.0.

The section further describes conformance requirements for consuming and producing applications:

Conforming applications either MUST read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place, or MUST write documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place.

So this section describes requirements to how foreign elements are handled when writing and reading ODF documents.
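The "remove foreign elements and attributes before validation" step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names are made up, and I treat every namespace that does not start with the OpenDocument URN prefix as foreign. A real implementation would also have to allow the other namespaces the ODF schema itself uses (XLink, Dublin Core and so on), and would run the actual schema validation afterwards.

```python
import xml.etree.ElementTree as ET

# Assumption: only namespaces under this prefix count as "defined by ODF" here.
ODF_NS_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"


def namespace_of(qname):
    """ElementTree stores qualified names as '{namespace}local'."""
    return qname[1:].split("}", 1)[0] if qname.startswith("{") else ""


def is_foreign(qname, known=(ODF_NS_PREFIX,)):
    """A name is foreign if it has a namespace that is not a known one.

    Names without any namespace are left alone in this sketch.
    """
    ns = namespace_of(qname)
    return ns != "" and not ns.startswith(known)


def strip_foreign(elem):
    """Remove foreign attributes and foreign child elements, recursively."""
    for name in [a for a in elem.attrib if is_foreign(a)]:
        del elem.attrib[name]
    for child in list(elem):
        if is_foreign(child.tag):
            elem.remove(child)
        else:
            strip_foreign(child)
    return elem
```

Whatever is left after `strip_foreign` is what the schema validator should see.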

OOXML 1.0 (IS 29500):

The conformance clauses for OOXML were (drastically) changed at the BRM. Conformance in OOXML is described with more details and most specifically it contains conformance clauses for the OOXML-package itself, the so-called "OPC-package".

As with ODF, an OOXML 1.0 document is conformant if it adheres to the schema described in the standard.

More specifically it says in Part 1 section 2.4

Document conformance is purely syntactic; it involves only Items 1 and 2 in §2.3 above.

  • A conforming document shall conform to the schema (Item 1 above) and any additional syntax constraints (Item 2).

Now, this is already more difficult to "put down on paper" than the ODF equivalent, because "Item 1" and "Item 2" are described in Part 1 section 2.3 as

  1. Schemas and an associated validation procedure for validating document syntax against those schemas. (The validation procedure includes un-zipping, locating files, processing the extensibility elements and attributes, and XML Schema validation.)
  2. Additional syntax constraints in written form, wherever these constraints cannot feasibly be expressed in the schema language.
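The first steps of the validation procedure in Item 1 (un-zipping and locating files) can be sketched in Python as below. This is a rough illustration, not a validator: the function name is hypothetical, the relationships namespace and the `_rels/.rels` location are the OPC conventions as I know them, and the XML Schema validation itself is left out (the sketch only checks that the main part is well-formed XML).

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the OPC namespace for relationship parts.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def locate_main_part(package):
    """Unzip the package, follow the root relationships and return the
    name of the main document part, or None if it cannot be found."""
    with zipfile.ZipFile(package) as zf:
        rels = ET.fromstring(zf.read("_rels/.rels"))
        for rel in rels.iter(f"{{{RELS_NS}}}Relationship"):
            # Match on the tail of the type URI so the sketch does not
            # depend on one particular relationship-type namespace.
            if rel.get("Type", "").endswith("/officeDocument"):
                target = rel.get("Target").lstrip("/")
                ET.fromstring(zf.read(target))  # well-formedness check only
                return target
    return None
```

Only after this point would a real validator process the extensibility elements and run the located part through the schemas.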


As a side-note, Item 2 above was the exact reason Stéphane Rodriguez' example with the broken Calculation Chain was actually a non-conforming OOXML-document, but that's a completely different story.

Moreover OOXML describes a few "conformance classes", specifically "Wordprocessing", "Spreadsheet" and "Presentation"-classes. The intent here is to be able to claim conformance to parts of the OOXML-spec.

And just as ODF contained requirements for applications, so does OOXML. But it takes conformance a bit wider. Since there is an "Item 1" and "Item 2" above, there is also an "Item 3". This was modified at the BRM and now says:

3. Descriptions of element semantics. The semantics of an element refers to its intended interpretation by a human being. 

In section 2.5 of Part 1 it now says:

Application conformance incorporates both syntax and semantics; it involves items 1, 2 and 3 in §2.3 above.

So a conforming application also has to abide by the semantics of the specification of elements and attributes. In layman's terms this could be described as "A conforming application has to treat content faithfully with respect to the specification of it". So it basically tells applications not to make their own interpretation of the elements they encounter as they traverse the XML-tree.

Now, I know that this is just a crude introduction to conformance of ODF- and OOXML-documents, but I think it is important to get the ball rolling and to give everyone a feeling of the complexity of the concept.

Thoughts, anyone? 

Conformance of ODF-documents

Ever since the now infamous article by Alex Brown the blogosphere has been filled with interpretations of the, really not so surprising, results - that the OOXML document containing the original ECMA-376 spec does not conform to IS 29500.

The, really not so surprising, conclusions have been "Office 2007 does not even produce valid OOXML" followed closely by statements like "This shows that Microsoft Office 2007 should not be allowed since it does not produce valid OOXML".

Hmmm ... ok.

As some of you might remember, I participated in some lab tests with OOXML/ODF interop in Fall 2007. Basically I sat in a small room with guys from IBM, Microsoft, Novell and some guys from the Danish National IT- and Telecom Agency sifting through documents, converting them and examining the resulting XML generated. The documents we worked on were supplied by different parts of the Danish public sector. They were basically told to use some of their existing documents as basis for the parts of the tests they participated in. So these documents were real-world-documents.

One of the things we tested was to see if the documents were in compliance with their respective specs. The original OOXML-documents we tested were all compliant to the ECMA-376 spec ... but it was a different case with the ODF-documents. So the other day I tried to validate all the sent-in original ODF-documents supplied to us.

The results are illustrated in the table below:

File name | Generator | Conclusion
DFFE_Afgået svar til Jane Doe.odt | OpenOffice.org/2.3 | not valid
DFFE_SJ_(1) - 15-06-2007 Foreløbig Høring om forslag.odt | OpenOffice.org/2.0 | valid
GRIBSKOV_bek-281(BS).odt | OpenOffice.org/2.0 | valid
GRIBSKOV_Standardbrev ifm ITST pilotprojekt.odt | OpenOffice.org/2.2 | valid
GRIBSKOV_Udkast til Forslag til Lokalplan.odt | OpenOffice.org/2.1 | not valid
ITST standardbrev ODT.odt | OpenOffice.org/2.0 | valid
ITST Testdokument ODT.odt | OpenOffice.org/2.2 | not valid
RM Kursusmateriale.odt | OpenOffice.org/2.0 | not valid
RM Standardbrev 2s.odt | OpenOffice.org/2.3 | not valid

The table contains information about the file name of the original document, the application that generated it (from the META-file in the ODF-package) and if the document passed the test.
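For those who want to repeat the exercise: reading the generator out of the package is straightforward. Below is a small Python sketch, with a function name of my own choosing, that opens the ODF package as a zip file and reads the `meta:generator` element from `meta.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def odf_generator(package):
    """Return the text of meta:generator from meta.xml, or None if absent."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("meta.xml"))
    node = root.find(f".//{{{META_NS}}}generator")
    return node.text if node is not None else None
```

Calling `odf_generator("some-file.odt")` on each of the files gives the values in the "Generator" column (the file name here is just a placeholder).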

Overall conclusion of this was:

Application | Creates consistently valid ODF?
OpenOffice.org/2.0 | No
OpenOffice.org/2.1 | No
OpenOffice.org/2 | No
OpenOffice.org/2.3 | No

So should we demand that OOo not be used at all? Of course not, but we should keep the pressure on the OOo-team to fix their code ... just as we should with Microsoft and Microsoft Office.
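The per-application conclusion follows mechanically from the per-file results in the first table. As a small sketch (the data is copied from that table, the function name is my own), a generator "creates consistently valid ODF" only if every file it produced validated:

```python
from collections import defaultdict

# (generator, valid?) for each tested file, in the order of the table above.
RESULTS = [
    ("OpenOffice.org/2.3", False),
    ("OpenOffice.org/2.0", True),
    ("OpenOffice.org/2.0", True),
    ("OpenOffice.org/2.2", True),
    ("OpenOffice.org/2.1", False),
    ("OpenOffice.org/2.0", True),
    ("OpenOffice.org/2.2", False),
    ("OpenOffice.org/2.0", False),
    ("OpenOffice.org/2.3", False),
]


def consistently_valid(results):
    """Map each generator to True only if all of its files were valid."""
    by_generator = defaultdict(list)
    for generator, valid in results:
        by_generator[generator].append(valid)
    return {g: all(v) for g, v in sorted(by_generator.items())}
```

With the nine files above, every generator has at least one invalid file, so none of them comes out as consistently valid.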

Custom XML in ODF (XForms) Part 1

For some time now I have had an urge to see how to do "Custom XML" or "Custom Schemas" in ODF. Saturday evening after the BRM in Geneva I was sitting in the bar at the Kempinski Hotel at the lake - naturally talking tech-stuff. We talked about the usual ODF/OOXML-stuff and touched upon the subject of Custom XML. I was told that ODF would not have Custom XML capabilities since the ODF TC thought it was good enough to do it with XForms.

Cool, I thought ... I need to test this.

For this first test I have used the UI of OpenOffice.org 2.4 to create an XForm-enabled document with some basic data in it. I will dig deeper into the technicalities later, but the OOo UI will do for now. I have been searching high and low for tutorials on XForms and their usage in ODF, and finally I found this article by J. David Eisenberg on xml.org. The article is from 2006. I have made a simpler document for this test - available here: xforms.odt (8,98 kb).

I have created a small form to enable the user to type in some basic data, e.g. "name", "phone" and "email".


The idea is to be able to map the typed-in data to an XML-structure. In my case the structure is this:

<xforms:instance id="clubData">
    <club>
        <name />
        <contact>
            <name />
            <phone />
            <email />
            <city />
        </contact>
    </club>
</xforms:instance>

I have set up the document to do more or less what the original article did so let's look at what is really persisted in the ODF-package. An XForm is basically a connection between "input fields" like "text boxes", "radio buttons" and "drop-down menus" and some XML in the document. Look at the content of a part of the content.xml-file below (some details have been removed to enhance readability):




So 1) puts a control (input field) next to the text "Club name" with control-id "control1". This control is further defined in 2) where the XForms "bind"-attribute 3) tells the application to bind the  contents of the control to the XML specified with the XPath expression in 4).
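To illustrate the idea of binding a control to a node in the instance data, here is a small Python sketch. It is not how OpenOffice.org does it internally, and the binding paths are hypothetical stand-ins of my own for the XPath expressions in the document. It simply writes the value of each "control" into the node its path points at, using the limited path syntax that Python's ElementTree supports.

```python
import xml.etree.ElementTree as ET

# The instance data from the form above, without the xforms:instance wrapper.
INSTANCE = (
    "<club><name /><contact><name /><phone /><email /><city /></contact></club>"
)


def apply_bindings(instance_xml, bindings):
    """Write each bound value into the instance node its path selects.

    `bindings` maps a path relative to the <club> root to the text typed
    into the corresponding control.
    """
    root = ET.fromstring(instance_xml)
    for path, value in bindings.items():
        node = root.find(path)
        if node is None:
            raise KeyError(f"no instance node matches {path!r}")
        node.text = value
    return root
```

For example, `apply_bindings(INSTANCE, {"name": "Chess Club", "contact/phone": "555-0100"})` returns an instance where the club name and the contact's phone number are filled in and the rest is left empty.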

It's really cool and nicely set up.

But what about persistence of the data entered in the form fields? Well, you add a button and attach an action to it. I called my button "Persist". What this action does is defined with the XForms "Submission" element.



In short the above describes that the content of the form fields should be persisted in a file on my local hard drive. Other methods could be to post it to a webserver or URI somewhere. This is very similar to how InfoPath works.

But at this moment I have two outstanding issues - and here I could use the help of you guys:

  1. I would really like to persist the data in the ODF-package - but I cannot get my head around making OOo do it. Is it at all possible?
  2. When I click the button nothing happens - the data is not saved to disk. What am I missing here?


This is my first attempt to work with XForms in ODF, and to me it really seems kind of nifty.

So the guy tossing down beers at the bar in the hotel was kinda right - it is possible to do some kind of CustomXML-embedding in ODF using XForms. I also think, however, that it doesn't make a whole lotta sense to compare XForms with the CustomXML-implementation in OOXML - especially if it is not possible to use XForms actions to save the data directly in the package. In this case it seems to me that XForms should be compared to InfoPath instead.

 

So guys, what do you think? What are your experiences with XForms in ODF?


Formulas in ODF-supporting applications

Some time ago I noticed that Fredrik e. Nielsen had posted a link in a Norwegian debate to a website comparing spreadsheet formula interop using ODF. The article is from 2005 comparing formula interop between OOo Calc 1.9.117 and KSpread from KOffice 1.4.1. The article is interesting since it highlights one of the more serious problems with lacking spreadsheet formula definitions in ODF. Some of the pictures in the article are missing and because the article is 2.5 years old, I thought it'd be interesting to take it for a spin again and see what has happened since 2005 in terms of interop between the two major ODF-implementations. I have done exactly the same as in the original article and have additionally added a bit of research to see where the problem really lies.

What did I do?

Well, on my brand new Ubuntu 8.04 installation I installed KSpread 1.6.3 in addition to OOo 2.4 that came pre-installed with the system.

I created a spreadsheet using OOo 2.4 Calc with the data from the original article (formula OOo.ods (7.58 kb))

 



I then tried to open it using KSpread. This is what it looked like:

 



As in the original article I modified the formula to fit in KSpread and the result was:

The file is available here: formula KSpread.ods (5.44 kb)

When saving this file and opening it in OOo again, this was the result:

So there has actually (pheew) been some improvement in spreadsheet formula interop for applications using ODF Spreadsheets since 2005 ... thank God! At least now OOo is able to show the formula created by KSpread.

To take a deeper look at the cause of the problems, I added some information to the original spreadsheet. Since OOo can read the formulas from KSpread, I have opened the file using KSpread to demonstrate the problem:



The file is available here: formula OOo exp.ods (9.30 kb)
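If you want to see for yourself what each application actually writes, the formulas can be read straight out of the .ods package. Here is a minimal Python sketch (the function name is my own) that lists the value of the `table:formula` attribute for every cell that has one:

```python
import zipfile
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"


def list_formulas(package):
    """Return the raw table:formula strings found in content.xml."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    attr = f"{{{TABLE_NS}}}formula"
    return [
        cell.get(attr)
        for cell in root.iter(f"{{{TABLE_NS}}}table-cell")
        if cell.get(attr) is not None
    ]
```

Running this on the file saved by OOo and on the file saved by KSpread and comparing the two lists shows how each application spells the same formula.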

So what should we conclude from this very basic test? Well, you tell me ... but at least, when someone next time tells you, that lacking formula spec in ODF is not a practical problem but only a theoretical problem ... please tell them that they are wrong.

(D)IS 29500 ISO process F.A.Q.

Due to the still overwhelming interest in the now-completed ISO DIS 29500 process, ISO has created a small F.A.Q. to answer some of the more frequently asked questions.

My excerpts from the F.A.Q. are listed here:

Q: How could a 6,000-page document be fast-tracked?

Because the information technology (IT) sector is fast-moving, the joint technical committee ISO/IEC JTC 1, Information technology, introduced the "fast track" process for the adoption as ISO/IEC standards of documents originating from the IT sector on which substantial development has already taken place.

(...)

The number of pages of a document is not a criterion cited in the JTC 1 Directives for refusal. It should be noted that it is not unusual for IT standards to run to several hundred, or even several thousand pages.

ISO/IEC 29500 has spent a total of 15 months being processed within the ISO/IEC system, from its submission in December 2006 to the deadline of 29 March 2008 approving it.

Q:  Why would ISO and IEC allow two standards for the same subject?

(...)

In this particular case, some claim that the Open Document Format (ODF), which is also an ISO/IEC standard (ISO/IEC 26300) and ISO/IEC 29500 are competing solutions to the same problem, while others claim that ISO/IEC 29500 provides additional functionalities, particularly with regard to legacy documents.

The ability to have both as International Standards was something that needed to be decided by the market place. ISO and IEC and their national members provided the JTC 1 infrastructure that facilitated such a decision by the market players.

Q: What about hidden patent issues?

(...)

Microsoft, the holder of patents involved in the implementation of ISO/IEC 29500, has made such a declaration to ISO and IEC. If, after publication of the standard, it is determined that licenses to all required patents are not so available, one option would be to withdraw the International Standard.

Q: What about contradictions with other ISO and IEC Standards?

(...)

A number of such claimed contradictions were identified during the one-month JTC 1 fast-track review period, prior to its release for voting and comment. The submitter, Ecma International, responded to these comments at the end of the review period.

Some of these comments were reflected in national body comments on the fast-track Draft International Standard (DIS). These comments, e.g. the non-alignment with ISO 8601, Data elements and interchange formats – Information interchange – Representation of dates and times, were dealt with in the ballot resolution meeting (BRM).

It is possible that others may still remain, but these can be taken care of during the maintenance of the standard.  In all cases, the final decision on whether there are contradictions and how to resolve them rests with the national members of ISO and IEC.

Q: Will ISO and IEC review how ISO/IEC 29500 was adopted?

We reviewed the process before it started, all the while during its course and afterwards as well. While the voting on ISO/IEC 29500 has attracted exceptional publicity, it needs to be put in context. ISO and IEC have collections of more than 17 000 and 7 000 successful standards respectively, these being revised and added to every month. This suggests that the standards development process is credible, works well and is delivering the standards needed, and widely implemented, by the market. (...)

Object-embedding in OOXML with Microsoft Office 2007

(updated 2008-04-14, added links to external resources) 

Now that the ISO-vote and approval of OOXML is done with, it is time to continue the coverage of implementing OOXML as well as ODF – this time about OOXML, Microsoft Office 2007 and embedded objects.

As I have previously said, there are always quirks when it comes to implementations of any standard in large applications. I have covered a few of these already regarding mathematical content [0], [1] and it is no different with regards to object embedding. I should say that a source of inspiration to this article was Stéphane Rodriguez’ article about binary Parts of an OOXML-file (OPC-package).

Now, embedding objects in an OOXML-file is pretty straight-forward: Simply add the object somewhere in the package and make a reference to the location and specify what kind of file you are embedding. This is very similar to how it is done in ODF.

(note: the specific schema-fragments defining how to do this were dealt with and changed at the BRM, so I will not include these until the final version of IS 29500 is released. I will update this article according to the revised spec).

As I have noted earlier, interoperability happens at application-level, so it is worth pondering a bit on how the specification is implemented in the major implementations of it. So let’s see how Microsoft Office acts when embedding objects.

What I did was this: 

I used Microsoft Office 2007, created a text-document and I embedded an object in it – in this case an OpenOffice.org Calc Spreadsheet. The spreadsheet is also inspired by one of Stéphane Rodriguez’ articles, the infamous “OOXML is defective by design”.

 

The object is inserted and displayed in the document. When activating the object, I can edit it as if it was in OOo Calc itself. Actually it is OOo Calc itself. It is invoked using OLE and as a side-note it shows a cool thing about OLE – or similar other object linking techniques. Microsoft Office 2007 does not know anything about OpenOffice.org, yet it is still able to invoke the application and edit the embedded object.

 

Ok – now let’s look at the OOXML-file created. In the file document.xml the following fragment is located:


The <v:shape>-element is part of the nasty VML-dependency that luckily was dealt with at the BRM. This will be replaced by DrawingML in the final IS 29500. The <o:OLEObject>-element specifies the type of the embedded object (“opendocument.CalcDocument.1”) and the location of it (“rId5”). There is really nothing platform dependent here in the OOXML-markup. What is more interesting, though, is looking at the Calc-object after it is embedded. By navigating through the relationship-model of the OPC-package, the embedded object is located.
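Navigating the relationship-model boils down to looking up the relationship id in the relationships part that belongs to the document part. A Python sketch of that lookup is shown below. The function name is hypothetical, and the `_rels/<part name>.rels` naming and the relationships namespace are the OPC conventions as I understand them, so treat it as an illustration rather than a reference implementation.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the OPC namespace for relationship parts.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def resolve_relationship(package, part, rel_id):
    """Return the package path that `rel_id` of `part` points to, or None."""
    folder, _, name = part.rpartition("/")
    rels_path = f"{folder}/_rels/{name}.rels" if folder else f"_rels/{name}.rels"
    with zipfile.ZipFile(package) as zf:
        rels = ET.fromstring(zf.read(rels_path))
    for rel in rels.iter(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Id") == rel_id:
            # Targets are relative to the folder of the source part.
            target = posixpath.join(folder, rel.get("Target"))
            return posixpath.normpath(target).lstrip("/")
    return None
```

Something like `resolve_relationship("test.docx", "word/document.xml", "rId5")` then gives the name of the part holding the embedded object, which can be read from the zip file like any other entry (the .docx file name is just a placeholder).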

 

One might think that this file was simply the Calc-file renamed, but sadly this is not so. This file is actually the Calc-file wrapped in an OLE2 Compound file (“CF”). The CF-file is basically a stream wrapper which allows a number of streams to be persisted in a file as well as information about these streams. Using one of the many CF-viewers you can get the data of the wrapped file itself as well as the persisted information of it, here “com.sun.star.comp.Calc.SpreadsheetDocument _   Embedded Object _   opendocument.CalcDocument.1”.
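A quick way to see the difference for yourself is to look at the first bytes of the extracted part. An ODF file is a zip package and starts with the zip local-file signature, whereas an OLE2 Compound file starts with its own well-known 8-byte signature. The little Python sketch below (function name and return labels are my own) tells the two apart:

```python
# Well-known file signatures ("magic numbers").
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 Compound file
ZIP_MAGIC = b"PK\x03\x04"                       # zip local file header


def sniff_container(data):
    """Classify a byte string by its leading signature."""
    if data.startswith(OLE2_MAGIC):
        return "ole2-compound-file"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

Had the Calc-file simply been renamed, the embedded part would be sniffed as "zip". What Microsoft Office 2007 writes is sniffed as "ole2-compound-file", with the Calc-file sitting inside one of its streams.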

 

 

Technically this is really not a big deal – there are well-known ways to manipulate these files on all platforms and most programming languages and extracting the required data should really be a no-brainer. OpenOffice.org is licensed under LGPL, so you can use the source-code from this to figure out how to do it on the platforms supported by OpenOffice.org. It is also pretty evident why Microsoft Office 2007 works this way. Microsoft Office 2007 is the latest incarnation of the Microsoft Office Suite – a suite that has depended on this file format since at least 1999 … and of course on OLE itself as well. So if you want to implement a document consumer, this is simply something to be aware of when consuming OOXML-files.

From the perspective of a developer, however, this is really annoying. I would definitely opt for Microsoft Office 2007 embedding the objects simply as the objects they are – and not wrapping them in a CF-wrapper. This is how it is done in OpenOffice.org. Granted, this suite does other weir(d) things like renaming the files and not being entirely clear how to embed all object types, but the objects are embedded as they are (unless they are OpenDocument objects). This is a benefit to me as a developer when examining OOXML-files, because I can simply extract the object in question from the document package and verify the file.

So this might be the first post-vote modification to IS 29500:

 

When embedding objects an application shall not modify or wrap the embedded object in any way before embedding it in the package. When a document consumer encounters an embedded object, this shall not be converted to another object type without knowledge-based confirmation by the user.

 

This (or similar wording in standard-lingo) would prevent Microsoft Office from wrapping objects in CF-wrappers, but it would also prevent applications like OpenOffice.org on SUSE from converting embedded Excel-objects to Calc-spreadsheets. FYI, this kills interop too.

A final request: Microsoft, please, as you must already be implementing the changes from the BRM for Office 2007, would you be so kind to make this change to the application as well? It should really be a no-brainer, and if there should be any requirements in your code for the CF-files, feel free to load the objects, wrap them in an in-memory CF-file and take it from there.
