a 'mooh' point

clearly an IBM drone

Get the ball rollin'

The latest couple of weeks have been almost as quiet as the couple of weeks after the BRM in Geneva. Most bloggers have cut down on their posts regarding ODF and OOXML (including your's truly) and have apparently gone back to their day-jobs to do a bit of Summer cleaning before the Summer vacations. There has, however, been some interesting development in some of the corners of the ODF/OOXML blogsphere.

First,

ODF TC has launched the preliminary work to form another ODF TC to focus on interoperability between ODF-enabled applications. I am one of the "lurkers" on the mail list, and I encourage everyone to either eavesdrop on the conversations taken place or actively participate in the debate. As noted before, conformance and interoperability is so much more than schema validation and the initiative is highly valuable. The topics being discussed are conformance, interoperability, IPR, AcidTests and much, much more.

Second,

Microsoft has joined ODF TC in OASIS. By Paul's message to the maillist of ODF OIIC  it seems that Microsoft has already been admitted into ODF TC. Microsoft is now listed as one of the "Sponsor Level"-members along with Adobe, Google Inc, IBM, Intel Corporation, Microsoft Corporation, Novell, and Sun Microsystems. (PS: Why is there an asterix after the names of Google and Novell?). This would mean that, as far as I get the legal mumbo-jumbo, that Microsoft will include ODF in its OSP. According to the content of the list as I write this, it has not yet been included. This also reminds me, that Microsoft said that they would modify the OSP for OOXML and specify that it also covers any future versions of OOXML that Microsoft participated in producing. We have still to see the modified text of the OSP, so Microsoft please "encourage" your legal team to finish up the wording.

and third,

(and by far, most importantly) 

Today is the day of the off-line beer drinking of the participants in the Danish debates around ODF/OOXML. We decided that it would be fun to have an off-line meeting between those of us that know each other by (blog)name only. It takes place at 17.00 this afternoon at BrewPub in Copenhagen. If you have a chance, feel free come by and share a beer with the rest of us.

Smile

No reason anymore to mandate anything but ODF?

Yesterday the news broke about Microsoft adding support for ODF in Microsoft Office 2007 SP2. Within minutes the news spread like fire on the hills of Malibu, California and blog-entries started to pop up everywhere - even Brian Jones has apparently returned from Winter hibernation and has made his first blog post in almost 6 weeks. Welcome back to the party, Brian.

Smile

The Denmark IT-news sphere was not hesitant on the keyboard as well and ComputerWorld Denmark posted an article yesterday evening and the competition on version2.dk followed up on the news this morning. I myself got the information from Luc Bollen in his comment in the article I wrote on document translation (and why it sucks). I was sitting under a maple-tree (or some other wooden artifact) having a beer with a friend after a fabulous sushi-dinner and could do absolutely nothing about it.

Dammit!

Well, the reactions to Microsoft's move has actually been surprisingly positive. Even the ADD-bunch at noooxml.org said "If this is an honest attempt to play nice, it is a very welcome move" and even IBM has been quite positive - prompting Bob Sutor to turn the axe on Apple saying: "Hey, Apple, what about you? Let’s see you do this in iWork!". Simply starting to beat on someone else reminds me of the John Wayne quote "A day without blood is like a day without sunshine".

But what is missing from the reactions?

OSP coverage of ODF

One of the side-effects of Microsoft joining OASIS ODF TC is that ODF will likely be included in the list of specifications covered by Microsoft's Open Specification Promise (OSP). The list of specificationshas not yet been updated, but I would expect it to be updated soon - or at least when they officially join the ODF TC. When you think about all the fuss around IPR in this Spring, it is quite surprising that noone has picked up on this. It rams a huge stick through the FUD about the OSP not being applicable for GPL-licensed software. Now the OSP covers ODF as well and thereby the native document format of OpenOffice.org [LGPL 3.0 license] and (I think) OpenOffice Novell Edition.

But why OOXML, then?

A lot of people are now spinning information about this move pulling the rug under OOXML and that ODF should be mandated everywhere - but nothing could be further from the truth. The reason why we approved OOXML still stands and the incompatible feature-sets of OOXML and ODF did not suddenly become compatible. There are still stuff in OOXML that cannot be persisted in ODF and vice versa. The backwards compatibility to the content in the existing corpus of binary documents is still a core value of OOXML and this incompatibility of ODF has not dissapeared. You will still loose information and functionality when you choose to persist an OOXML-file in ODF ... just as you would when persisting it to old WordPerfect formats. Insisting that having ODF-support in Microsoft Office (12 SP2) makes the need for OOXML go away is a moot point - since I am sure no one would argue to replace OOXML with TXT - simply because TXT is a supported format in Microsoft Office.

Microsoft steps up to the task at hand

Some quite extraordinary news emerged from the Redmond, WA, headquarters of Microsoft today. In summary, they announced that

  1. Microsoft will join OASIS ODF TC
  2. Microsoft will include ODF in their list of specifications covered by the Open Specification Promise (OSP)
  3. Microsoft will include full, native support for ODF 1.1 in Microsoft Office 14 and in Microsoft Office 12 SP2 - scheduled for Q2 2009. Microsoft Office 12 SP" will have built-in support for the three most widely used ISO-standards for document formats, e.g. OOXML, ODF and PDF.


My initial reaction when I heard it was "Wow . that's amazing". I am sure a lot of people will react "It's too little, too late", though, but let me use a couple of bytes to describe why I think it is a good move by Microsoft.

Microsoft joins OASIS ODF TC

Well, Microsoft has been widely criticised for not joining OASIS a few years ago. I think it is a bogus claim, but never the less; it has been on the minds of quite a lot of people. Novell has had a seat in both ECMA TC45 and OASIS ODF TC for some time now, and it is my firm belief that both consortia has benefited by this. The move by Microsoft to join OASIS ODF TC will likely have a similar effect. One of the most frequent requests in the standardisation of OOXML was to increase the feature-overlap of ODF and OOXML. This is quite difficult to accomplish (effectively) without knowing what the features of the other document format is (going to be). By Microsoft participating on both committees (and IBM will hopefully consider joining ECMA TC45) harmonization (or "enlargement of the feature-overlap") will likely occur at a quicker pace.

This also means that the worries some of us have had about Microsoft's future involvement in standardisation work around document formats has been toned down a bit. Microsoft is now actively participating in this work in ECMA, in ISO and also in OASIS. I think this is really good news. Not good news for Microsoft - but good news for those of us that are working with document formats every day.

Microsoft will cover ODF with OSP

One of the most difficult, non-technical, discussions during the standardisation of OOXML was legal aspects. It was discussions about different wordings in Sun's CNS, IBM's ISP and Microsoft's OSP (Jesus Christ, guys, pick ONE single acronym, already!) and the possible impact on implementers of ODF and OOXML. One of the aspects of the discussion that never really surfaced was that if IBM has software patents covering ODF - some of them quite possibly cover parts of OOXML as well. But the ISP of IBM does not mention OOXML - it only mentions ODF. This leaves me as a developer in quite a legal pickle, because by implementing OOXML I am covered by the OSP - but I am not covered by IBM's ISP (and vice versa). To me as a developer, Microsoft's coverage of ODF in their OSP is a good move, because it should remove all legal worries I might have around stepping into SW-patent covered territory.

ODF support in Microsoft Office

Microsoft will finally deliver on requests for native ODF-support for ODF in Microsoft Office. Microsoft will support ODF 1.1 in Microsoft Office 12 SP2 and also have built-in support for PDF and XPS (these are currently only available as a separate download).

Denmark is one of the countries where both ODF and OOXML have been approved for usage in the public sector. This is currently bringing quite a bit of complexity to the daily work of information workers since there are not many (if any) applications offering high fidelity, native support for both formats. They hence rely on translators like ODF-Converter or similar XSLT-based translators. It's a bad, but currently necessary, choice. The usage of translators for document conversion has been widely criticised, amongst others by Rob and I, and the built-in support for ODF in Microsoft Office is a great step in the right direction.

As with everything Microsoft does, we need a healthy amount of scepticism as to which extend they will deliver on their promises. However, I truly believe that the moves by Microsoft here are good news - regardless of the scepticism. An old proverb says "don't count your chickens before they hatch" - and this applies perfectly here. We will have to wait and see what will eventually happen - but so far . it looks good.

Beer, beer, beer ... bed, bed, bed ... (Danish)

I den danske del af ODF/OOXML-blogsfæren har vi et stykke tid talt om, at det kunne være skægt at mødes til en pilsner et eller andet sted i København. De fleste af os kender jo kun hinanden fra vores respektive navne og gravatars, og det er i hvert fald min erfaring, at man er lidt langsommere på aftrækkeren, når man har fået sat et ansigt på navnet.

Jeg indbyder derfor til en omgang øl (eller lignende, det kunne også være noget uden alkoholisk indhold) 

torsdag d. 12. juni 2008 kl. 17:00  

Jeg finder ud af et sted at være inden længe - men hvis du har et bud på et sted, så sig til. Jeg hælder selv til ScrollBar på ITU i Ørestaden.

Tilmelding sker i kommentartråden herunder. 

Challenge (Part II)

A tongue-in-cheek challenge for Mr. Rob Weir.

[code=xml]<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
  office:version="1.1">
  <office:body>
    <office:spreadsheet>
      <table:table table:name="Sheet1" table:protected="true" table:protection-key="8A45FB0C33667F9E33ECA007FCE4F6684DC5F242">
        <table:table-column />
        <table:table-row >
          <table:table-cell office:value-type="float" office:value="10">
            <text:p>10</text:p>
          </table:table-cell>
        </table:table-row>
        <table:table-row >
          <table:table-cell office:value-type="string">
            <text:p>
              Dear Rob Weir. Please prove by this example that ODF is an "interoperable"
              document format and tell me how a consuming application should determine if the
              user should be allowed to modify the document. I do not think that it is.
              In fact I think that your statements that ODF is a document format that
              provides interoperability are brash, irresponsible and indefensible
              pieces of bombast that you should retract.
            </text:p>
          </table:table-cell>
        </table:table-row>
      </table:table>
    </office:spreadsheet>
  </office:body>
</office:document-content>[/code] 

(and yes, one of the reasons for this post is to show off the cool syntax highlighter of this blog engine)

Wink

And could you guys please stop the bickering and let's move on to something a bit more interesting? 

What is conformance, really?

The OOXML/ODF-blogsphere has been in a frenzy the last couple of weeks after a couple of posts made by yours truly and Alex Brown that was picked up by Rob Weir. I don't want to get into the technical details here - you should catch up on the conversations taking place in the comment sections of their respective blogs.

Bu I do want to talk a bit about conformance - because conformance should be much more than schema-validation. To be able to have a clear perspective, we need to look in the two specifications for how conformance is described.

ODF 1.0 (IS 26300):

Conformance is described in section 1.5 

Documents that conform to the OpenDocument specification MAY contain elements and attributes not specified within the OpenDocument schema. Such elements and attributes must not be part of a namespace that is defined within this specification and are called foreign elements and attributes.

So this means that the only requirements for a document to have an "ODF-conformant" sticker slapped on it is to be able to validate against the ODF schema. If the document contains elements or attributes not defined in ODF 1.0, they should be marked with their own namespaces. This is actually all there is to say about conformance of individual documents in ODF 1.0 .

The section further describes conformance requirements for consuming and producing applications:

Conforming applications either MUST read documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place, or MUST write documents that are valid against the OpenDocument schema if all foreign elements and attributes are removed before validation takes place.

So this section describes requirements to how foreign elements are handled when writing and reading ODF documents.

OOXML 1.0 (IS 29500):

The conformance clauses for OOXML were (drastically) changed at the BRM. Conformance in OOXML is described with more details and most specifically it contains conformance clauses for the OOXML-package itself, the so-called "OPC-package".

As with ODF, an OOXML 1.0 document is conformant if it adheres to the schema described in the standard.

More specifically it says in Part 1 section 2.4

Document conformance is purely syntactic; it involves only Items 1 and 2 in §2.3 above.

  • A conforming document shall conform to the schema (Item 1 above) and any additional syntax constraints (Item 2).

Now, this is already more difficult to "put down on paper" than the ODF-equivilant. Because "Item 1" and "Item 2" are described in Part 1 section 2.3 as

  1. Schemas and an associated validation procedure for validating document syntax against those schemas. (The validation procedure includes un-zipping, locating files, processing the extensibility elements and attributes, and XML Schema validation.)
  2. Additional syntax constraints in written form, wherever these constraints cannot feasibly be expressed in the schema language.


As a side-note, Item 2 above was the exact reason Stepháne Rodriguez' example with the broken Calculation Chain was actually a non-conforming OOXML-document, but that's a completely different story.

Moreover OOXML describes a few "conformance classes", specifically "Wordprocessing", "Spreadsheet" and "Presentation"-classes. The intent here is to be able to claim conformance to parts of the OOXML-spec.

And just as ODF contained requirements for applications, so does OOXML. But it takes conformance a bit wider. Since there is an "Item 1" and "Item 2" above, there is also an "Item 3". This was modified at the BRM and now says:

3. Descriptions of element semantics. The semantics of an element refers to its intended interpretation by a human being. 

In section 2.5 of Part 1 it now says:

Application conformance incorporates both syntax and semantics; it involves items 1, 2 and 3 in §2.3 above.

So a conforming application also has to abide by the semantics of the specification of elements and attributes. In lay-man's terms this could be described as "A conforming application has to treat content faithfully with respect to the specification of it". So it basically tells applications not to make their own interpretation of the elements it encounter as it traverses the XML-tree.

Now, I know that this is just a crude introduction to conformance of ODF- and OOXML-documents, but I think it is important to get the ball rolling and to give everyone a feeling of the complexity of the concept.

Thoughts, anyone? 

Conformance of ODF-documents

Ever since the now infamous article by Alex Brown the blogsphere has been filled with interpretations of the, really not so surprising, results - that the OOXML document with the original ECMA-376 spec does not conform to IS 29500.

The, really not so surprising, conclusions have been "Office 2007 does not even produce valid OOXML" followed closely by statements like "This shows that Microsoft Office 2007 should not be allowed since it does not produce valid OOXML".

Hmmm ... ok.

As some of you might remember, I participated in some lab tests with OOXML/ODF interop in Fall 2007. Basically I sat in a small room with guys from IBM, Microsoft, Novell and some guys from the Danish National IT- and Telecom Agency sifting through documents, converting them and examining the resulting XML generated. The documents we worked on were supplied by different parts of the Danish public sector. They were basically told to use some of their existing documents as basis for the parts of the tests they participated in. So these documents were real-world-documents.

One of the things we tested was to see if the documents were in compliance with their respective specs. The original OOXML-documents we tested were all compliant to the ECMA-376 spec ... but it was a different case with the ODF-documents. So the other day I tried to validate all the sent-in original ODF-documents supplied to us.

The results are illustrated in the table below:

File name

Generator

Konklusion

DFFE_Afgået svar til Jane Doe.odt

OpenOffice.org/2.3

not valid

DFFE_SJ_(1) - 15-06-2007 Foreløbig Høring om forslag.odt

OpenOffice.org/2.0

valid

GRIBSKOV_bek-281(BS).odt

OpenOffice.org/2.0

valid

GRIBSKOV_Standardbrev ifm ITST pilotprojekt.odt

OpenOffice.org/2.2

valid

GRIBSKOV_Udkast til Forslag til Lokalplan.odt

OpenOffice.org/2.1

not valid

ITST standardbrev ODT.odt

OpenOffice.org/2.0

valid

ITST Testdokument ODT.odt

OpenOffice.org/2.2

not valid

RM Kursusmateriale.odt

OpenOffice.org/2.0

not valid

RM Standardbrev 2s.odt

OpenOffice.org/2.3

not valid

The table contains information about the file name of the original document, the application that generated it (from the META-file in the ODF-package) and if the document passed the test.

Overall conclusion of this was:

Application

Creates consistantly valid ODF?

OpenOffice.org/2.0

 

OpenOffice.org/2.1

 

OpenOffice.org/2

OpenOffice.org/2.3

 

So should we demand that OOo not be used at all? Of course not, but we should keep the pressure on the OOo-team to fix their code ... just as we should with Microsoft and Microsoft Office.

Custom XML in ODF (XForms) Part 1

For some time now I have had an urge to see how to do "Custom XML" or "Custom Schemas" in ODF. Saturday evening after the BRM in Geneva I was sitting in the bar at the Kempinsky Hotel at the lake - naturally talking tech-stuff. We talked about the usual ODF/OOXML-stuff and touched upon the subject of Custom XML. I was told that ODF would not have Custom XML capabilities since the ODF TC thought it was good enough to do it with XForms.

Cool, I thought ... I need to test this.

For this first test I have used the UI of OpenOffice.org 2.4 to create an XForm-enabled document with some basic data in it. I will dig deeper into the technicalities later but OOo UI will do for now. I have been searchning high and low for tutorials on XForms and their usage in ODF, and finally I found this article by J. David Eisenberg on xml.org The article is from 2006. I have made a more simple document for this test - avaliable here: xforms.odt (8,98 kb).

I have created a small form to enable the user to type in some basic data, e.g. "name", "phone" and "email".


The idea is to be able to map the typed-in data to a XML-structure. In my case the structure is this:

<xforms:instance id="clubData">
    <club>
        <name />
        <contact>
            <name />
            <phone />
            <email />
            <city />
        </contact>
    </club>
</xforms:instance>

I have set up the document to do more or less what the original article did so let's look at what is really persisted in the ODF-package. An XForm is basically a connection between "input fields" like "text boxes", "radio buttons" and "drop-down menus" and some XML in the document. Look at the content of a part of the content.xml-file below (some details have been removed to enhance readability):




So 1) puts a control (input field) next to the text "Club name" with control-id "control1". This control is further defined in 2) where the XForms "bind"-attribute 3) tells the application to bind the  contents of the control to the XML specified with the XPath expression in 4).

It's really cool and nicely set up.

But what about persistance of the data entered in the form fields? Well, you add a button and attach an action to it. I called my button "Persist". What this action does is defined with the XForms "Submission" element.



In short the above describes that the content of the form fields should be persisted in a file on my local hard drive. Other methods could be to post it to a webserver or URI somewhere. This is very similar to how InfoPath works.

But at this moment I have two outstanding issues - and here I could use the help of you guys:

  1. I would really like to persist the data in the ODF-package - but I cannot get my head around making OOo doing it. Is it at all possible?
  2. When I click the button nothing happens - the data is not saved to disk. What am I missing here?


This is my first attempt to work with XForms in ODF, and to me it really seems kind of nifty.

So the guy tossing down beers at the bar in the hotel was kindda right - it is possible to do some kind of CustomXML-embedding in ODF using XForms. I also think, however, that it doesn't make a whole lotta sense to compare XForms with the CustomXML-implementation in OOXML - especially if it is not possible to use XForms actions to save the data directly in the package. In this case it seems to me that XForms should be compared to InfoPath instead.

 

So guys, what do you think? What are your experiences with XForms in ODF?

Smile

Formulas in ODF-supporting applications

Some time ago I noticed that Fredrik e. Nielsen had posted a link in a Norwegian debate to a website comparing spreadsheet formula interop using ODF. The article is from 2005 comparing formula interop between OOo Calc 1.9.117 and KSpread from KOffice 1.4.1. The article is interesting since it highlights one of the more serious problems with lacking spreadsheet formula definitions in ODF. Some of the pictures in the article are missing and because the article is 2.5 years old, I thought it'd be interesting to take it for a spin again and see what has happened since 2005 in terms of interop between the two major ODF-implementations. I have done exactly the same as in the original article and have additionally added a bit of research to see where the problem really lies.

What did I do?

Well, on my brand new ubuntu 8.0.4 installation I installed KSpread 1.6.3 in addition to OOo 2.4 that came pre-installed with the system.

I created a spreadsheet using OOo 2.4 Calc with the data from the original article (formula OOo.ods (7.58 kb)

 



I then tried to open it using KSpread. This is what it looked like:

 



As in the original article I modified the formula to fit in KSpread and the result was:

The file s available here: formula KSpread.ods (5.44 kb)

When saving this file and opening it in OOo again, this was the result:

So there has actually (pheew) been some improvement in spreadsheet formula interop for applications using ODF Spreadsheets since 2005 ... thank God! At least now OOo is able to show the formula created by KSpread.

To take a more deep look into what was the cause of the problems, I added some information to the original spreadshee. The result is here: Since OOo can read the formulas from KSpread, I have opened the file using KSpread to demonstrate the problem:



The file is available here:(formula OOo exp.ods (9.30 kb)

So what should we conclude from this very basic test? Well, you tell me ... but at least, when someone next time tells you, that lacking formula spec in ODF is not a practical problem but only a theoretical problem ... please tell them that they are wrong.

Now I get it - ODF and MathML

(see updated content below) 

Some time ago I wrote an article about ODF and usage of mathematical content by MathML. One of the quirks I couldn't get my head around was why the XML-file containing the MathML-document had to be named "content.xml". I used the phrasing

This was a bit more tricky, since somehow it seems that the mathical formula can only be contained in a file called "content.xml" - otherwise OpenOffice.org simply shuts down.

Well ... the answer came to me almost in a dream - or at least in the evening in one of those semi-awake-states. Basically the answer lies in the answer I never got on my question on which parts in an ODF-poackage are mandatory and if there are any parts with pre-defined names. The ODF-specification  section 9.3.3 says for embedding objects:

  • The xlink:href attribute links to the object representation, as follows:
    • For objects that have an XML representation, the link references the sub package of the object. The object is contained within this sub page exactly as it would as it is a document of its own.
    • For objects that do not have an XML representation, the link references a sub stream of the package that contains the binary representation of the object.

Now, in ODF MathML clearly has an XML-representation and MathML is also a "real" "OpenDocument representation". So a MathML is stored as a "sub package" within the ODF-package itself. And that brings me back to my original question.  You see, even though a piece of MathML is not a OpenDocument file per se, it still has to be embedded as an entire ODF-package (without the ZIP-structure). Section 2.1 clearly states this as

A document root element is the primary element of a document in OpenDocument format. It contains the entire document. All types of documents, for example, text documents, spreadsheets, and drawing documents use the same types of document root elements. The OpenDocument format supports the following two ways of document representation: 

  • As a single XML document.
  • As a collection of several subdocuments within a package (see section 17), each of which stores part of the complete document. Each subdocument has a different document root andsstores a particular aspect of the XML document. For example, one subdocument contains the style information and another subdocument contains the content of the document. All types of documents, for example, text and spreadsheet documents, use the same document and subdocuments definitions.

And since an ODF-package requires the main part to be called "content.xml" the MathML-file needs to be called "content.xml" as well. There is also no manifest file in the sub-package to tell the name of the package part - hence the requirement to have a fixed part name for the main part.  I wish this information was more clearly described in ODF and not simply implied in the text.

... did I mention that I prefer the relationship-model of OPC?

Smile

(update  2008-05-16)

A bit down in the comment track of this post I promised to make an ODT-file with MathML embedded inline as opposed to the "regular" OOo-way of embedding it as a seperate object. Today I finally got around to doing it. It was actually really easy - I just took the embedded MathML-object from the ODF-package and pasted in into the correct location in the content.xml-file. A good thing is that with this approach you don't have to worry about specifying a DOCTYPE (the OOo-dependancy), so I would say this is highly recommendable. The XML looks like this:

[code=xml]<draw:frame
  draw:name="Objekt1"
  text:anchor-type="as-char"
  svg:width="2.972cm"
  svg:height="1.138cm"
  draw:z-index="0">
  <draw:object>
    <math:math>
      <math:mrow>
        <math:mtext>cos</math:mtext>
        <math:mo>(</math:mo>
        <math:mfrac>
          <math:mi>pi</math:mi>
          <math:mn>4</math:mn>
        </math:mfrac>
        <math:mo>)</math:mo>
        = <math:mo>(</math:mo>
        <math:mfrac>
          <math:msqrt>
            <math:mn>2</math:mn>
          </math:msqrt>
          <math:mn>2</math:mn>
        </math:mfrac>
        <math:mo>)</math:mo>
      </math:mrow>
    </math:math>
</draw:object>
</draw:frame>[/code]

When opened in OOo (2.4 DA) the result looks like this:

 


Only remaining quirk is the missing "equals-sign", but I haven't had time to dig into those details yet.

If anyone can help and contribute here, that would be great.