a 'mooh' point

clearly an IBM drone

Is "interoperability" a transitive characteristic?

Way back when I was a math major at university, we were taught about "operations on sets". A set could simply be "the natural numbers", which could be defined as all positive integers plus the number 0. An operation on this set could be addition of numbers, multiplication of numbers and so forth. An operation can have a lot of characteristics, e.g. "commutative", "associative" or "transitive". An "associative" operator means that you can group the operands any way you want, and a "commutative" operator means that you can change the order of the operands. Confused? Well, it's not that complex when you think about it. The mathematical operator "addition" is an "associative" operator (or "relation") since (1+2) + 3 = 6 and 1 + (2+3) = 6. The operator "divide" is not associative since (1/2) / 3 = 1/6 whereas 1 / (2/3) = 3/2. Addition is also a commutative operator since you can change the order of the numbers being added together. This is evident since 1+2+3 = 6 and 3+2+1 = 6. In contrast, "subtraction" is not a commutative operator since 1-2-3 = -4 whereas 3-2-1 = 0.

The transitive characteristic is a bit different from this, and the "everyday equivalent" would be when we infer something. So think of transitivity as a mathematical formulation of what we do when we infer.

The relation "is greater than" is transitive - as is "is equal to". Basically, a relation (such as "is greater than") being transitive means that if A > B and B > C, then A > C.
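To make the three properties concrete, here is a minimal sketch that brute-force checks them over a small sample of numbers. The helper names and the "differs by exactly 1" relation are made up for illustration, and a check over three values is of course only a spot check, not a proof.

```python
from fractions import Fraction
from itertools import product

def is_associative(op, values):
    """True if (a op b) op c == a op (b op c) for every triple in values."""
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(values, repeat=3))

def is_commutative(op, values):
    """True if a op b == b op a for every pair in values."""
    return all(op(a, b) == op(b, a) for a, b in product(values, repeat=2))

def is_transitive(rel, values):
    """True if rel(a, b) and rel(b, c) always imply rel(a, c)."""
    return all(rel(a, c)
               for a, b, c in product(values, repeat=3)
               if rel(a, b) and rel(b, c))

# Exact fractions avoid floating-point noise in the division example.
sample = [Fraction(n) for n in (1, 2, 3)]

add_is_associative = is_associative(lambda a, b: a + b, sample)      # (1+2)+3 == 1+(2+3)
divide_is_associative = is_associative(lambda a, b: a / b, sample)   # (1/2)/3 != 1/(2/3)
add_is_commutative = is_commutative(lambda a, b: a + b, sample)
subtract_is_commutative = is_commutative(lambda a, b: a - b, sample)
greater_is_transitive = is_transitive(lambda a, b: a > b, sample)
equal_is_transitive = is_transitive(lambda a, b: a == b, sample)
# Hypothetical counter-example: 1~2 and 2~3 hold, but 1~3 does not.
neighbour_is_transitive = is_transitive(lambda a, b: abs(a - b) == 1, sample)

print(add_is_associative, divide_is_associative)
print(add_is_commutative, subtract_is_commutative)
print(greater_is_transitive, equal_is_transitive, neighbour_is_transitive)
```

The last relation is the interesting one for what follows: a relation can hold between A and B and between B and C without holding between A and C.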

Transitivity popped into my mind the other day when I was pondering interoperability between implementations of document formats.

Ever since Rob's ingenious article "Update on OpenOffice.org Calc ODF interoperability", I haven't been able to get it out of my head.

 


Microsoft Office 2007 - now with ODF-support

On October 22nd a long-awaited email popped into my mailbox - news of the release of the first beta of Microsoft Office 2007 SP2. The reason I was longing to get my hands on this piece of software (and I have, in vain, tried to squeeze every single Microsoft employee I could to get it earlier) was not that it is a Service Pack for my current office application. Nor is it that I should now expect a more stable software package, because I am not troubled by instability in my everyday work with Microsoft Office.

My interest is caused by the fact that Microsoft Office 2007 SP2 includes support for ODF 1.1, and to be frank, it is not really because Microsoft has now chosen to support ODF natively in Microsoft Office - I am sure most would agree with me that they should have supported ODF a loooong time ago.

No, what will be interesting to see will be what it will mean for interoperability via ODF.

It's the standards, stupid

It has long been an open secret that you were walking on eggshells when exchanging ODF-documents between ODF-supporting applications that are not somehow based on/cloned from OpenOffice. Of course it is possible to exchange "BUI-documents" (yes, it is an acronym I have invented for this. It means Bold, Underline and Italics and represents rather simple documents without too much fancy stuff in them), but the best experience is when using OO spin-offs.

This makes perfect sense. When using the same program, you will get the fewest problems. This is in essence the text-book/Page1 elevator pitch for Microsoft Office sales people.

And this is exactly why ODF-support in Microsoft Office 2007 is interesting - it is the first major productivity application not based on OpenOffice that promises native ODF-support.

Now some people seem to think that as long as you use an open standard like ODF, PDF or OOXML, "interoperability" is somehow included. It is as if they are trying to apply some sort of Kantian "Ding an sich"-thinking when they argue that achieved interoperability is somehow an intrinsic, guaranteed feature of an open standard. The funny thing is that every time I hear these arguments I try (and fail, rather) to find a nice way of saying that they have understood squat of the problem and that they should try to work seriously with the subject at hand before speaking so bluntly about it.

The truth is of course somewhat different and this is why I genuinely applaud the work done with the OIIC in OASIS. The truth is that an open standard enables or facilitates good interoperability and that this potential is bigger for an open standard than for a closed standard. It is clear that both ODF and OOXML provide for better interoperability than the proprietary binary DOC-formats, but conversely the binary DOC-formats are also proof that fairly good interoperability is possible when using non-open document formats. The world is not - once again - black/white, because it is clear that an open standard is not a requirement for interoperability - but it certainly helps a lot.

My point here is:

Interoperability is not created by the standards. It is created in the applications based on the standard

All applications have bugs/quirks

This is the reason this is not about the standards - rather, it's about the applications. We are now in the situation that we have two big players supporting ODF (to a varying degree). But they will probably do it in different ways. We no longer have the luxury of the major ODF-producing/consuming applications being built on the same engine. My expectation is therefore that we will experience interoperability-problems with the ODF-applications, because Microsoft Office will likely do some things differently than the OpenOffice-clones (while still complying with the ODF-spec).

This is why I asked Microsoft these two questions when I attended the first DII workshop in late July 2008 (they recently held another one but I did not attend).

1. How have you handled the possibility of using application specific settings in ODF?

As you know ODF has (and now also OOXML after BRM #¤"¤%¤#¤#&"#¤#"¤#¤%, thank you very much!) the so-called "config-item-set"-elements, which are used by the current ODF-implementations to store application specific behaviour. The problem with these elements and attributes is that they are not specified in the ODF spec, so there is really no obvious way to figure out what to do with the binary printer-blob that Lotus Symphony stores in ODF-documents produced by it. The short reply from Microsoft was: "We don't use it" and if you open the settings.xml-file in the ODF-package, it is empty. This is all fine and dandy - the only problem is that you risk losing information when exchanging documents.

2. How have you handled known bugs, features in other, major ODF-applications?

All applications have bugs - including ODF-supporting applications, so my question was perfectly legitimate. Again the answer was: "We don't handle it". With this answer Microsoft gets in line with all the other application manufacturers that don't handle their competitors' bugs. There is e.g. a "bug" in KSpread's implementation of formulas (specifically the LOG-function). This is not handled by OpenOffice.org - even though it is fairly well known. The consequence is that strange things might happen when exchanging spreadsheets between KSpread and OOo Calc.

It didn't really matter before, 'cause not that many people use KSpread - but this picture is about to change with ODF-support in Microsoft Office 2007.

The bigger picture

If you will allow me to use one of my favorite, stupid expressions, then let's for a moment "step into the helicopter to see the bigger picture".

Because I believe that Microsoft's implementation of ODF will mean interoperability-problems using ODF-files in the short term. But I also think that it will mean better ODF-support on a broad scale - in the long run.

I have previously dealt with the MathML-support of OpenOffice.org which is slightly buggy. The ODF-spec says this about mathematical content:

Mathematical content is represented by MathML 2.0

And that's it.

As you might remember, the problems with OOo's MathML-support are due to the fact that OpenOffice.org requires a DOCTYPE-declaration in the MathML-object to display it. Also it seems that OOo will only display a certain kind of MathML. I have documented this in a previous post, but the short story here is that a simple mathematical equation in an ODF-document created using Microsoft Office 2007 SP2 will not display in OOo 3.0 or Lotus Symphony 1.0. The ODF-file is perfectly valid and so is the MathML-fragment (tested using jing and the RelaxNG-schemas for ODF 1.1 and MathML as well as the MathML-tool from W3C, Amaya).

This example serves to illustrate my point: Microsoft's implementation of ODF will mean better support for ODF in the long run, because it forces existing problems in the applications to surface - and they can then be fixed.

And a small note for the trigger-happy ones: This is not due to the fact that Microsoft has implemented ODF - rather, it is due to the fact that we will now have a new, major implementation of ODF to exchange documents with.

The problems described above have probably existed for years but no one has noticed since most people use some kind of OpenOffice-clone for creation and display of ODF-documents. Now, on the other hand, errors in the applications (including in Microsoft Office) will be very obvious and the pressure to fix them will be much bigger. I also predict that Microsoft will have to speed up the release cycle of updates to their productivity-applications supporting ODF - at least when it comes to hotfixes of known problems. I don't think anyone will settle for bi-annual service packs for fixing trivial errors with big impact on productivity and interoperability.

Only remaining question now is: when will SP2 make it into Microsoft Office 2007? When it snows in Seattle?

(btw, I watched Grey's Anatomy yesterday, and according to them, it does snow in Seattle from time to time!)

DII ODF workshop - the good stuff

... continued from DII-workshop in Redmond - round-table discussions.

So - let's get down to what was the real purpose of going to Redmond - apart from the great breakfast I had at Lowell's in Farmer's market in Seattle - to test the pre-alpha version of Microsoft Office 2007 SP2 and its ODF-support.

(let me start by apologizing for the late post, but I lost my USB-drive with my test-files on it, and I didn't find it until a few days ago)

I have already listed some of the findings of the day in my previous post, so I'll try to get into more detail here.

What did I do?

Well, we would have some hands-on time with the latest build of Microsoft Office 2007 SP2 (apparently directly from a developer's machine) so I brought a bunch of documents I have worked on before - some of them were from the application interop-work I took part in during fall 2007 for the Danish National IT- and Telecommunication Agency. Others I have created myself. I performed the following steps for each file:

  1. Load the ODF-file in OpenOffice.org 2.4
  2. Create a PDF-file of the document using a PDF printer driver (CutePDF)
  3. Load the ODF-file in Microsoft Office 2007 SP2
  4. Do a "Save as ODF" and prefix the original filename with "MSO". According to the Microsoft project managers I talked to, this would ensure I actually saved a version of the ODF-file that had been processed by the internal object model of Microsoft Office 2007 SP2.
  5. Create a PDF-file of the document using a PDF printer driver (CutePDF)

Below I have listed for each document the following data:

Original file: somefile
Original file | New file
Generator: SomeApplication (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

For each I will include some tech remarks on interesting subjects - if any.

There are a couple of things to note on a general level before we get started. Microsoft has chosen to implement ODF "by the book" in the sense that they have not paid much attention to bugs or "features" in competing applications. This has the peculiar effect that perfectly legitimate ODF-files produced by Microsoft Office 2007 SP2 might not work properly in competing applications. For more general ideas of what they did, you should check out Dennis Hamilton's post from the workshop. It is by far the most comprehensive of the ones posted since last week.

Original file: Testfile_03.odt
Original file | New file
Generator: OpenOffice.org/2.4$Win32 (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

Remarks

This file is an ODT-file with an embedded ODS-spreadsheet. Loading this file into Microsoft Office shows a nice red cross and no spreadsheet. An inspection of the ODT-file shows that the content is pretty much preserved including the embedded ODS-spreadsheet. But when looking at the manifest file, the following appears:

[code=xml]<manifest:file-entry
 manifest:media-type="application/x-openoffice-gdimetafile;windows_formatname=&quot;GDIMetaFile&quot;"
 manifest:full-path="ObjectReplacements/Object 1"
/>[/code]

It is the location of the graphical representation of the embedded spreadsheet. The media-type seems to be an old StarView Metafile format (confirm, anyone?) and Microsoft Word doesn't understand this image format - hence the red cross. This example highlights one of the points of bad interoperability: Small errors can cause big problems. Everything but the missing image is preserved, yet this "small" error still makes the document useless.
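If you want to check your own files for this, here is a small sketch that scans a manifest for entries using the GDIMetaFile media-type. It assumes you have already pulled META-INF/manifest.xml out of the package as a string; the function name and the trimmed-down sample manifest are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the ODF package manifest.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
GDI_PREFIX = "application/x-openoffice-gdimetafile"

def find_gdimetafile_entries(manifest_xml: str) -> list[str]:
    """Return the full-path of every manifest entry with a GDIMetaFile media-type."""
    root = ET.fromstring(manifest_xml)
    hits = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        media_type = entry.get(f"{{{MANIFEST_NS}}}media-type", "")
        if media_type.startswith(GDI_PREFIX):
            hits.append(entry.get(f"{{{MANIFEST_NS}}}full-path", ""))
    return hits

# Hypothetical, trimmed-down manifest modelled on the entry shown above.
SAMPLE_MANIFEST = """\
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="application/x-openoffice-gdimetafile;windows_formatname=&quot;GDIMetaFile&quot;" manifest:full-path="ObjectReplacements/Object 1"/>
 <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="content.xml"/>
</manifest:manifest>
"""

gdi_entries = find_gdimetafile_entries(SAMPLE_MANIFEST)
print(gdi_entries)
```

Any path it reports is a replacement image that an application without support for this format will presumably show as a broken picture.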

Original file: Testfile_07.odt
Original file | New file
Generator: OpenOffice.org/2.0$Win32 (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

Remarks

This file is included in the "Self-assessment"-package from the Danish National IT- and Telecom Agency. Loading the letter into Microsoft Office 2007 initially appears to produce an identical file, but even though the content itself is preserved, there are still areas with problems.

  1. There is a border around the logo image in the header
  2. The height of the header is not completely preserved
  3. The "right margin" (which is really a stretched text box) is gone since the text box is wrapped around the text instead of being preserved in its full length
  4. Page numbering is gone on the last page

A funny note: if you load the file generated by Microsoft Office 2007 in OOo 2.4, it loads perfectly fine as the original document. This suggests that the problems encountered by loading it in Microsoft Office 2007 are not problems with converting ODF to the internal object model of Microsoft Office 2007 but instead problems in the layout engines.

Original file: Testfile_08.odt
Original file | New file
Generator: OpenOffice.org/2.2$Win32 (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

Remarks

This is another document from the Self-assessment package. It contains a few different features: a TOC, colored text, text boxes, a drawing, an embedded spreadsheet as well as some change tracking. The generated document is kind of messy. The content has been "shuffled" around and again we have the problem with Microsoft Office 2007 SP2 not understanding the GDIMetafile image format. The embedded objects themselves are fine - the graphical representation of them is not.

Original file: Testfile_10.odt
Original file | New file
Generator: Jesper Lund Stocholm (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

Remarks

This file is another one of my own files that I have created earlier. It contains a mathematical formula in MathML. When loading it in Microsoft Office 2007 SP2, the mathematical formula simply disappears. I am kind of lost on the reason for this. It is not the DOCTYPE-declaration used by OOo (see next file for those details) so maybe it is the construction of my ODT-file that poses an issue for them.

Original file: Testfile_11.odt
Original file | New file
Generator: OpenOffice.org/2.4$Win32 (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

Remarks

This file is almost identical to the one above - but it is generated by OOo 2.4 instead of me and carries all the styling and configuration that comes with it. Here the file and the mathematical content loads just fine. But an interesting thing happens when saving it again. The MathML-fragment is slightly altered from

[code=xml]<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
 <math:semantics>
  <math:mrow>
   <math:mi>cos</math:mi>
   <math:mrow>
    <math:mrow>
     <math:mo math:stretchy="false">(</math:mo>
     <math:mfrac>
      <math:mo math:stretchy="false">π</math:mo>
      <math:mn>4</math:mn>
     </math:mfrac>
     <math:mo math:stretchy="false">)</math:mo>
    </math:mrow>
    <math:mo math:stretchy="false">=</math:mo>
    <math:mfrac>
     <math:msqrt>
      <math:mn>2</math:mn>
     </math:msqrt>
     <math:mn>2</math:mn>
    </math:mfrac>
   </math:mrow>
  </math:mrow>
  <math:annotation math:encoding="StarMath 5.0">
    cos({%pi} over {4} ) = {sqrt{2} } over {2}
  </math:annotation>
 </math:semantics>
</math:math>[/code]

to

[code=xml]<?xml version="1.0" encoding="UTF-8"?>
<mml:math
  xmlns:mml="http://www.w3.org/1998/Math/MathML"
  xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
  <mml:mi mathvariant="normal">c</mml:mi>
  <mml:mi mathvariant="normal">o</mml:mi>
  <mml:mi mathvariant="normal">s</mml:mi>
  <mml:mo>(</mml:mo>
  <mml:mfrac>
    <mml:mrow>
      <mml:mi>π</mml:mi>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>4</mml:mn>
    </mml:mrow>
  </mml:mfrac>
  <mml:mo>)</mml:mo>
  <mml:mo>=</mml:mo>
  <mml:mfrac>
    <mml:mrow>
      <mml:msqrt>
        <mml:mn>2</mml:mn>
      </mml:msqrt>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>2</mml:mn>
    </mml:mrow>
  </mml:mfrac>
</mml:math>[/code]

The clever reader will notice that the semantic annotations used by OOo are removed from the MathML-fragment. The MathML is in general altered a bit, but the changes are not that big - most of them are visual things related to styling. The problem is that this MathML is un-consumable for OOo. The MathML-fragment produced by Microsoft Office 2007 SP2 is valid MathML (validated using Amaya), but even when I add the required !DOCTYPE, it still won't load in OOo.

Original file: Testfile_13.odt
Original file | New file
Generator: OpenOffice.org/2.0$Win32 (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

Remarks

(file has been removed at the request of the originator of the file)

This file is a bit more complex, and as with Testfile_08 it consists of a lot of different parts. Key issues here are failure to read GDIMetaFiles, borders around images, errors in visual presentation of numbered/bulleted lists and lines being much thicker than in the original file. There is really nothing new in this file - it just confirms the problems identified with Testfile_08.

Original file: Testfile_14.odt
Original file | New file
Generator: OpenOffice.org/2.3$Win32 (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

Remarks

This file is one of those template-files that are used a lot almost everywhere. You know, someone has created a "standard" document with correct header, footer and images, and this file is then distributed in the organisation. The conversion is actually almost error-free. There is a slight error with respect to border around images and rendering of them, but that is just about it.

Original file: Testfile_20.ods
Original file | New file
Generator: OpenOffice.org/2.4$Win32 (PDF) | Generator: Microsoft Office 2007 SP2 (PDF)

Remarks

(both PDF-files have been created by OOo 2.4/Win32)

I created the file above to illustrate what would happen when working with spreadsheets. I used the infamous CEILING-function, but I was at that time not aware that Microsoft Office 2007 SP2 would throw out formulas from "unknown namespaces". Hence there is very little change - only the visible number of decimals after having been through Microsoft Office 2007 SP2 has been reduced to two. If you look in the XML generated, you will find one interesting thing, though:

[code=xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<office:document-content
  xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
  xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
  xmlns:msoxl="http://schemas.microsoft.com/office/excel/formula"
  >
  (...)
</office:document-content>[/code]

Can you see it?

Conclusions

Well, the investigation above was based on about 20 files, primarily text documents (and one spreadsheet). Some of them were created by me and some were created by various parts of the public sector in Denmark. I have only looked at about half of the files, but a few other files are also available should you wish to play with them yourself. You can get them here: public.zip (3,02 mb).

Validation

I have made some effort to validate the ODF-files generated by Microsoft Office 2007 SP2. I downloaded the RelaxNG ODF 1.1-schemas from OASIS' website and used JING to perform the schema-validation. Since there is a known bug in the schemas, I have used JING with the "-i" flag set. Validating the structure of the package itself is a bit tricky (as reported by Rick Jelliffe) and I have not done that. I have done a schema-validation on the files "content.xml" and "styles.xml" based on the thought that these are the most complex files in the package. The result of the validation is that all files generated by Microsoft Office 2007 SP2 are valid ODF 1.1-files. I piped the result of the validation into an output file available here for your viewing pleasure: output.txt (1,92 kb).
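For anyone wanting to repeat the exercise, the two steps (unpack "content.xml" and "styles.xml", then hand them to JING with "-i") can be sketched roughly as below. This is not the script I used; the schema file name and the jing.jar location are placeholders you would have to adjust, and the demo only builds the command line instead of running Java.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# The two package parts that were schema-validated.
PARTS = ("content.xml", "styles.xml")

def extract_parts(odf_bytes: bytes, out_dir: Path) -> list[Path]:
    """Write content.xml and styles.xml from an ODF package (a zip file) into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        names = set(package.namelist())
        for part in PARTS:
            if part in names:
                target = out_dir / part
                target.write_bytes(package.read(part))
                written.append(target)
    return written

def jing_command(schema: Path, xml_files: list[Path], jing_jar: Path) -> list[str]:
    """Build the JING call; "-i" switches off ID/IDREF checking, as mentioned above."""
    return ["java", "-jar", str(jing_jar), "-i", str(schema), *map(str, xml_files)]

# Demo with a tiny in-memory stand-in for a real .odt package.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    demo.writestr("content.xml", "<placeholder/>")
    demo.writestr("styles.xml", "<placeholder/>")

with tempfile.TemporaryDirectory() as tmp:
    written = extract_parts(buffer.getvalue(), Path(tmp))
    # Placeholder paths: point these at your downloaded schema and jing.jar.
    command = jing_command(Path("OpenDocument-schema-v1.1.rng"), written, Path("jing.jar"))
    print(command)
```

With a real package you would pass the resulting list to subprocess.run() and pipe the output to a file.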

All in all I think Microsoft has done a pretty good job. Obviously there is still some way to go until it reaches production quality, but I was pleasantly surprised to see the big difference in conversion results compared with the results of the ODF Converter from SourceForge.net I have worked with earlier. There are a couple of things I would like to note, though:

Graphical representations of embedded objects

Microsoft Office 2007 SP2 has problems with reading the graphical representation of embedded objects if the file is created by OpenOffice. It seems that it simply doesn't support the GDIMetaFile-format used by OpenOffice (and its derivatives). I think the "nice" way to solve this would be to load the object (if supported) and render an image of it again. The dimension of the image is available in the <draw:frame>-element and could be used to determine the size of the image.

Embedded objects

I noticed that handling of embedded objects is done using a "don't touch"-approach, which means that when loading an ODF-file with an embedded object, the embedded object is simply copied and not touched by Microsoft Office 2007 SP2 (if it is not activated by the user). I think this is a good approach. Consuming applications should respect the "integrity" of the consumed package and not alter its content unless they have to.

mimetype

A funny little thing: The mimetype-file in the ODF-package is created using CAPITAL letters, i.e. the file will be called "MIMETYPE". This causes the OpenDocumentFellowship validator to fail since it cannot find the file (with non-capital letters). I have suggested to Microsoft to generate the file using non-capital letters to enhance interop and validation across platforms where some are "a bit more" case-sensitive than Windows.
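A case-sensitive check like the one a validator would presumably do is easy to sketch. The helper names below are made up, and the packages are built in memory purely for the demonstration.

```python
import io
import zipfile

def check_mimetype_entry(package: zipfile.ZipFile) -> str:
    """Report whether the package has a 'mimetype' entry with the exact lower-case name."""
    names = package.namelist()
    if "mimetype" in names:
        return "ok"
    wrong_case = [name for name in names if name.lower() == "mimetype"]
    if wrong_case:
        return f"wrong case: {wrong_case[0]}"
    return "missing"

def make_package(entry_name: str) -> zipfile.ZipFile:
    """Build a one-entry in-memory zip, standing in for a real ODF package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(entry_name, "application/vnd.oasis.opendocument.text")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)

lower_case_result = check_mimetype_entry(make_package("mimetype"))
upper_case_result = check_mimetype_entry(make_package("MIMETYPE"))
print(lower_case_result)
print(upper_case_result)
```

Zip entry names are compared as plain strings here, so "MIMETYPE" and "mimetype" are simply two different files, regardless of what the file system underneath thinks.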

config settings

Microsoft has chosen not to use the configuration elements otherwise so widely used by Lotus Symphony and OpenOffice.org. I am not sure if I think it is a good or a bad idea, but since they do not use the settings.xml-file at all, they should remove the file completely.

Are document formats silver-bullets?

A new study from the University of Illinois College of Law has made its way to cyberspace. The title is "Lost in Translation: Interoperability Issues for Open Standards - ODF and OOXML as Examples" and it was written by Rajiv Shah and Jay P. Kesan. The study takes a rather novel approach compared to the debates that have been raging through the last year or so: Is the choice of a(ny) document format a silver bullet for interoperability?

The answer in the paper is a clear "No". When discussing the various interop-studies internationally, they note

While it is widely acknowledged that there are problems with interoperability across different formats, e.g., going from ODF to OOXML, there is an assumption here that all implementations produce the same ODF or OOXML.

Their conclusion is that this is not the case. What they did was to create a number of test documents using the reference implementation for each format, OpenOffice.org for ODF and Microsoft Office 2007 for OOXML. They then opened these documents in other applications supporting these formats.

The results are rather interesting:

Results for ODF

Implementation                  Raw score   Raw score percentage   Weighted percent
OpenOffice                      151         100%                   100%
StarOffice                      149         99%                    97%
Sun plug-in for Word            142         94%                    96%
CleverAge/MS plug-in for Word   139         92%                    94%
WordPerfect                     122         81%                    86%
KOffice                         121         80%                    79%
Google Docs                     117         77%                    76%
TextEdit                        55          36%                    47%
AbiWord                         48          32%                    55%

Results for OOXML

Implementation      Raw score   Raw score percentage   Weighted percent
Office 2007         148         100%                   100%
Office 2003         148         100%                   100%
Office 2008 (Mac)   147         99%                    99%
OpenOffice          141         95%                    96%
Pages               142         96%                    95%
WordPerfect         114         77%                    84%
ThinkFree Office    101         68%                    83%
TextEdit            52          35%                    43%
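The "raw score percentage" in both tables appears to be each implementation's raw score divided by the raw score of the reference implementation (OpenOffice for ODF, Office 2007 for OOXML) and rounded to a whole percent. That is my reading of the numbers, not something I am quoting from the paper, but a quick sketch confirms that it reproduces every published value. The weighted percentages depend on weights not shown here, so they are left out.

```python
# (raw score, published raw score percentage) as listed in the two tables.
ODF_RESULTS = {
    "OpenOffice": (151, 100),
    "StarOffice": (149, 99),
    "Sun plug-in for Word": (142, 94),
    "CleverAge/MS plug-in for Word": (139, 92),
    "WordPerfect": (122, 81),
    "KOffice": (121, 80),
    "Google Docs": (117, 77),
    "TextEdit": (55, 36),
    "AbiWord": (48, 32),
}
OOXML_RESULTS = {
    "Office 2007": (148, 100),
    "Office 2003": (148, 100),
    "Office 2008 (Mac)": (147, 99),
    "OpenOffice": (141, 95),
    "Pages": (142, 96),
    "WordPerfect": (114, 77),
    "ThinkFree Office": (101, 68),
    "TextEdit": (52, 35),
}

def raw_percentage(raw_score: int, reference_score: int) -> int:
    """Raw score relative to the reference implementation, as a whole percent."""
    return round(100 * raw_score / reference_score)

def mismatches(results: dict, reference_name: str) -> list[str]:
    """Names whose published percentage differs from the recomputed one."""
    reference_score = results[reference_name][0]
    return [name for name, (raw, published) in results.items()
            if raw_percentage(raw, reference_score) != published]

odf_mismatches = mismatches(ODF_RESULTS, "OpenOffice")
ooxml_mismatches = mismatches(OOXML_RESULTS, "Office 2007")
print(odf_mismatches, ooxml_mismatches)
```

Both lists come out empty, so the percentages are at least consistent with that reading.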

They further conclude that

The final implication stems from the surprisingly good results for OOXML implementations. Critics of OOXML have argued that it was too complex and difficult to implement. While OOXML is a long and complex standard, it is possible to offer good compatibility. In fact, our results suggest that implementations of OOXML work as well as implementations of ODF. At the level of basic word-processing that we examined, neither standard had a dominant advantage over the other in terms of compatibility scores. While ODF has had a head start that has lead to more implementations, there appears no reason why OOXML cannot catch up. After all, several developers have provided independent implementations of OOXML.

... which should be interesting for those mandating usage of (an open) document format.

If nothing else this study highlights a couple of very interesting points:

  1. You don't get good interoperability simply by choosing an open document format
  2. Interoperability still has a long way to go and there is still a lot of work to be done. 

DII-workshop in Redmond - round-table discussions

... continued from DII ODF workshop catchup.

The last part of the afternoon in Redmond was a round-table discussion of standards in general; what to do with them and how to work with them in terms of handling interop with other vendors implementing the same standard. It was really interesting and it was clear that Microsoft wanted to hear our input. Everyone in the Microsoft Office "Who's who"-book was there to participate and we had a good couple of hours debating the issues at hand.

One of the really interesting guys I met there was John Head aka "Starfish". He is a Microsoft partner as well as an IBM business partner, and he really grilled Microsoft with respect to some of the decisions they had made around how the UI behaved. You should check out his thoughts on his own blog. It was clear that he had some leverage in relation to Microsoft - even though I did not agree with everything he said. 

An interesting topic was application interop. If you ask me, interop is based on standards but carried out by applications - in other words, standards do not give good interop simply by themselves. This idea was really confirmed when we talked about a thing John also mentions - how do I handle bugs in other applications? I think that it was Peter Amstein who noted that an example of this was the 1900 leap-year problem, where a decision made 20 years ago still haunts them. I couldn't agree more. But a similar example is application-specific extensions. ODF has this wonderful (read: awful) concept of "configuration item sets". These are specified in section 2.4 of ODF 1.0 and they are intended to store various application specific settings. The problem with these elements is that there are really no restrictions on how to use them. So you will end up with an application like OpenOffice.org 2.4 that puts data like this in the section:

[code=xml]<config:config-item config:name="AddParaTableSpacing" config:type="boolean">true</config:config-item>
<config:config-item config:name="AddParaSpacingToTableCells" config:type="boolean">true</config:config-item>
<config:config-item config:name="UseFormerLineSpacing" config:type="boolean">false</config:config-item>
<config:config-item config:name="AddParaTableSpacingAtStart" config:type="boolean">true</config:config-item>
<config:config-item config:name="UseFormerTextWrapping" config:type="boolean">false</config:config-item>
<config:config-item config:name="UseFormerObjectPositioning" config:type="boolean">false</config:config-item>
<config:config-item config:name="UseOldNumbering" config:type="boolean">false</config:config-item>[/code]

 

Lotus Symphony 1 even puts binary blobs into the configuration items to hold application specific printer settings:

[code=xml]<config:config-item config:name="PrinterSetup" config:type="base64Binary">
  ugD+/wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  AAAAAAAAAAAAAAAAAAWAAAAAAAAAAAAAAAIAAAAAAAAAAAA
</config:config-item>[/code]

So you now have OOo (and also Lotus Symphony, but to a lesser degree) putting in all these settings that not only directly affect the visual layout of the document but - in terms of e.g. "UseFormerLineSpacing" - specify that an application should behave as OOo 1.1. These are really "OOo Compat-elements".
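To see what a given application has stuffed into these items, you can list them with a few lines of Python. This is only a sketch: the wrapper element name and the shortened "AAAA" blob in the sample are placeholders I made up, while the item names are taken from the fragments above.

```python
import xml.etree.ElementTree as ET

# Namespace of the ODF config elements.
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"

def list_config_items(settings_xml: str) -> dict[str, tuple[str, str]]:
    """Map each config-item name to its (type, value) pair."""
    root = ET.fromstring(settings_xml)
    items = {}
    for item in root.iter(f"{{{CONFIG_NS}}}config-item"):
        name = item.get(f"{{{CONFIG_NS}}}name", "")
        item_type = item.get(f"{{{CONFIG_NS}}}type", "")
        items[name] = (item_type, (item.text or "").strip())
    return items

def opaque_items(items: dict[str, tuple[str, str]]) -> list[str]:
    """Names of items carrying binary blobs that only the producer can interpret."""
    return [name for name, (item_type, _) in items.items() if item_type == "base64Binary"]

# Hypothetical fragment: wrapper name and blob content are placeholders.
SAMPLE_SETTINGS = """\
<config:config-item-set xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0"
                        config:name="example-settings">
  <config:config-item config:name="AddParaTableSpacing" config:type="boolean">true</config:config-item>
  <config:config-item config:name="UseFormerLineSpacing" config:type="boolean">false</config:config-item>
  <config:config-item config:name="PrinterSetup" config:type="base64Binary">AAAA</config:config-item>
</config:config-item-set>
"""

items = list_config_items(SAMPLE_SETTINGS)
blobs = opaque_items(items)
print(items)
print(blobs)
```

Listing the names is the easy part. Knowing what another application is supposed to do with "UseFormerLineSpacing" is not something any amount of parsing will tell you.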

Now, the question is, what should other vendors do with these "extensions"? Well, Microsoft seems to be under a lot of pressure from organisations like the European Union to implement ODF strictly by the book, so they have chosen to ignore them (and other knowledge of bugs) completely. If you look at the settings.xml-file they actually strip it completely from content and do not use it themselves. Another example is mathematical content in text documents. As I documented some time ago, OOo has a bug requiring the MathML-fragment to include a !DOCTYPE-declaration - otherwise OOo will not display the math content. The result is that ODF with math generated by Microsoft Office will not load the math in OOo due to this OOo-bug. Is the approach chosen by Microsoft the right one? I think so for the following reasons:

  1. Otherwise the result will be an endless propagation of these settings where each implementation will need to support each and every setting from all other vendors
  2. I agree with John Head that it is good to put some pressure on OOo. It has for a long time been living relatively "low-key" in terms of criticism and market pressure and it will be good for all of us to have the quality of the application enhanced.

 

Will this hurt interop? Yes, of course it will ... but I still think it is the right decision.

Another interesting thing we discussed was extensibility - how applications should/could extend a standard. This was one of the topics where it seemed that I was disagreeing with almost the entire room. What we talked about was: What does an application do with content it does not understand? Both ODF and OOXML have mechanisms to extend the document format with foreign namespaces etc., and I got the impression that most implementations simply remove content they do not understand when roundtripping documents. Microsoft has chosen the same approach, and the argument they made was that it was imposed on them by their "Trustworthy Computing"-guys, since preserving non-understood data could be used to hide sensitive information in documents. Even though I see the problem, I still think the argument is wrong. There are tons of other places and ways to hide information in a document, and I'd prefer to have the unknown elements and attributes preserved when roundtripping.
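
To make the two roundtripping policies concrete, here is a rough sketch of "strip what you don't understand" versus "preserve it". The set of known namespaces, the `acme` namespace and the function names are all hypothetical, and this is in no way how any particular office suite does it; it only handles foreign elements, not foreign attributes.

```python
import xml.etree.ElementTree as ET

# Namespaces this imaginary application "understands" (assumption for the demo).
KNOWN_NAMESPACES = {
    "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

DEMO_DOC = (
    '<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:acme="http://example.com/acme">'
    '<text:p>Hello</text:p><acme:secret>hidden</acme:secret>'
    '</office:text>'
)


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag written as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def strip_foreign(element, known=KNOWN_NAMESPACES):
    """Remove descendant elements in unknown namespaces; return how many were removed.

    Note: text directly following a removed element (its tail) is dropped too.
    """
    removed = 0
    for child in list(element):
        if namespace_of(child.tag) not in known:
            element.remove(child)
            removed += 1
        else:
            removed += strip_foreign(child, known)
    return removed


def roundtrip(xml_text, preserve_unknown):
    """Parse and re-serialise, either keeping or stripping foreign elements."""
    root = ET.fromstring(xml_text)
    if not preserve_unknown:
        strip_foreign(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(roundtrip(DEMO_DOC, preserve_unknown=True))
    print(roundtrip(DEMO_DOC, preserve_unknown=False))
```

The preserving variant costs nothing extra here, which is part of why I would like to see it as the default behaviour.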

DII ODF workshop catch-up

Wow - it seems the entire cast of Microsoft bloggers is laying the groundwork for a whole bunch of blog-entries after the DII-workshop in Redmond the day before yesterday (Wednesday July 30th).

So OK - I'll bite.

I am back in Denmark again after a hellish evening after the workshop, where jet-lag was almost the end of me. If the cab driver had actually known the way to my hotel (he didn't, so I ended up giving directions in a town I had only been in for about 36 hours, in a country I was only visiting for the second time), I am sure I would have been catching Z's big time on my way home in the cab ... at 20.15 in the evening.

All in all it was very interesting to attend the workshop in Redmond. The day started with an introduction by Peter Amstein about the approaches and decisions Microsoft had made when working on their ODF-implementation. One of the more interesting discussions here was what Microsoft had done where there were differences between ODF and OOXML - should they be conservative or creative? An example of this was numbering/bulleted lists (one of the key areas with differences between ODF and OOXML). OOXML makes it possible to give each bullet in a list a separate colour, where ODF does not. Microsoft had chosen the conservative approach and simply removed all colours from the bullets, but Patrick Durusau noted that he thought it was possible to use styling to do it in ODF - the only downside was that, to his knowledge, this particular way of doing it was not supported by any ODF-implementation. I guess sometimes you are screwed regardless of what you do.

After this followed various project managers that demonstrated how their part of Microsoft Office supported ODF and talked about the remaining work. They each had complex documents that they showed us their work on and the result of saving them (OOXML-files) to ODF. I remember working with conversion using the SourceForge translator in Fall 2007, and I was really impressed by the fidelity of the conversions we saw.

Key points from this part were:

  • Shapes are converted without much loss of fidelity
  • Metadata is converted (Dublin Core)
  • Fields are converted
  • Headers/footers are converted
  • TOC etc are converted
  • SmartArt is converted to shapes
  • Images are converted including cropping etc
  • New 3D-shapes are reduced to the closest possible OpenDocument Drawing shape
  • Old 3D-shapes are converted
  • Formulas in spreadsheets are implemented by the ECMA-376 spec with Microsoft's namespace
  • Mathematical content is converted from/to OMML and MathML
  • When loading an ODF-spreadsheet
  • Tables in OOXML-presentations are converted to shapes thereby making a "virtual table" since ODF does not support tables in presentations
  • Conversion of embedded objects is not fully supported
  • CustomXML is converted to "flat Xml" with content controls being discarded
  • When loading a spreadsheet from ODF with a formula-namespace other than Microsoft's, just the values are converted and the formulas are discarded
  • Animations are converted in presentations
  • The concept of "master pages" is not converted to ODF
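
The formula behaviour in the list above can be sketched roughly like this. It is my own illustration of the described policy, not Microsoft's code, and the prefix names ("msoxl" for Microsoft's namespace, "oooc" for the one OOo writes) are assumptions, as are the function names.

```python
def split_formula(formula_attr):
    """Split a table:formula value like 'prefix:=EXPR' into (prefix, '=EXPR').

    A value that starts directly with '=' has no prefix.
    """
    if formula_attr.startswith("="):
        return "", formula_attr
    prefix, separator, rest = formula_attr.partition(":=")
    if not separator:
        return "", formula_attr
    return prefix, "=" + rest


def load_cell(formula_attr, cached_value, understood_prefixes=("msoxl",)):
    """Keep the formula only if its namespace prefix is understood.

    Otherwise fall back to the cached value stored alongside it in the file.
    """
    prefix, expression = split_formula(formula_attr)
    if prefix in understood_prefixes:
        return {"value": cached_value, "formula": expression}
    return {"value": cached_value, "formula": None}


if __name__ == "__main__":
    print(load_cell("msoxl:=SUM(A1:A2)", 3.0))
    print(load_cell("oooc:=SUM([.A1:.A2])", 3.0))
```

The second call shows the lossy case: the number survives, but the cell is no longer "live" after the roundtrip.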

The rest of the day consisted of hands-on labs (stay tuned for the tech-stuff from this part of the day) and a round-table discussion in the afternoon. I will talk about these parts in a couple of posts at the beginning of next week.

I saw that Doug Mahugh will post the material presented to us - so watch his blog for these data.

Generated by Microsoft Office 2007

As we are testing the various test ODF-files we/I have brought to the Microsoft DII workshop here in Redmond, I stumbled over the following XML-fragment:

[code=xml]<office:document-meta
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  office:version="1.1">
  <office:meta>
    <dc:generator>MicrosoftOffice/12.0 MicrosoftWord</dc:generator>
    ...
  </office:meta>
</office:document-meta>[/code]

*giggles*

A year ago - who would have thought this?

:-)
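
If you want to check which application wrote a given ODF-file, a small sketch like the following pulls the generator string out of meta.xml. As far as I know the ODF spec puts this string in a meta:generator element, while the fragment above shows dc:generator, so the sketch accepts either; the helper names and the demo package are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_generator(odf_file):
    """Return the generator string from meta.xml, or None if there is none."""
    with zipfile.ZipFile(odf_file) as package:
        root = ET.fromstring(package.read("meta.xml"))
    # Try meta:generator first, then dc:generator as in the fragment above.
    for namespace in (META_NS, DC_NS):
        node = root.find(f".//{{{namespace}}}generator")
        if node is not None and node.text:
            return node.text.strip()
    return None


def build_demo_meta_package():
    """Build an in-memory package whose meta.xml mirrors the fragment above."""
    meta = (
        f'<office:document-meta xmlns:office="{OFFICE_NS}" '
        f'xmlns:meta="{META_NS}" xmlns:dc="{DC_NS}" office:version="1.1">'
        '<office:meta>'
        '<dc:generator>MicrosoftOffice/12.0 MicrosoftWord</dc:generator>'
        '</office:meta></office:document-meta>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("meta.xml", meta)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_generator(build_demo_meta_package()))
```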

ODF-implementation in Microsoft Office 2007 SP2

So once again here I am – waiting for a connecting flight out of Frankfurt, Germany. There is about an hour and a half until my flight to Seattle, where I will attend the Microsoft ODF DII-workshop about Microsoft’s implementation of ODF in Microsoft Office 2007 SP2. I am looking forward to seeing what they have accomplished, and especially to the hands-on lab on Wednesday afternoon, where we will have the opportunity to test our own documents to see the quality of the implementation. I have therefore brought my own little “tool-box” of documents to test in the lab. Since we were not required to sign any NDAs, I will try to document my tests as well as I can and post them on my blog ASAP. I hope they will be able to contribute to the on-going discussions taking place.

Of course, there are tons of different parameters to test, and it will be impossible for us to test them all, but a few areas do indeed deserve some attention, because these have traditionally been the areas causing most trouble. A non-exhaustive list would be

  • ODF-files with embedded objects
    • MathML
    • Spreadsheets
    • Presentations
    • Binary objects
  • Inline embedding of MathML-fragments (just for the fun of it)
  • ODF Drawing (vector graphics)
  • Numbering
  • Formulas in spreadsheets
  • Handling of anchoring of graphics and other document parts
  • Conversion from OOXML to ODF
    • OMML
    • DrawingML
    • VML
    • Embedded objects
  • XForms
  • Custom XML
  • Clipboard content

The list above (apart from the last three items) nicely summarizes the problems we encountered when I participated in the work for the Danish Government (National Agency of IT and Telecommunications) in Fall 2007 on application interoperability between ODF and OOXML. I hope these tests will be able to contribute to the on-going discussions here in Denmark as well.

Another interesting thing to see will be how Microsoft has handled the various application-specific parts of ODF. How have they handled formulas in spreadsheets? How have they handled document protection of document parts? How have they handled the application-specific content in the <config-item-set>-elements of the other implementations? These are not trivial questions and they directly impact interoperability with other implementations of ODF.

I think it will be an interesting day tomorrow – and I’ll keep you posted on the progress. If you have any last-minute ideas and suggestions as to what I should test, please write me an email or simply post a comment to this post. If you have files you’d like me to test, send them to me as well. You can use the “Contact” form on this blog to do so.

No reason anymore to mandate anything but ODF?

Yesterday the news broke about Microsoft adding support for ODF in Microsoft Office 2007 SP2. Within minutes the news spread like fire on the hills of Malibu, California and blog-entries started to pop up everywhere - even Brian Jones has apparently returned from Winter hibernation and has made his first blog post in almost 6 weeks. Welcome back to the party, Brian.

:-)

The Danish IT-news sphere was not slow on the keyboard either: ComputerWorld Denmark posted an article yesterday evening, and the competition on version2.dk followed up on the news this morning. I myself got the information from Luc Bollen in his comment on the article I wrote on document translation (and why it sucks). I was sitting under a maple-tree (or some other wooden artifact) having a beer with a friend after a fabulous sushi-dinner and could do absolutely nothing about it.

Dammit!

Well, the reactions to Microsoft's move have actually been surprisingly positive. Even the ADD-bunch at noooxml.org said "If this is an honest attempt to play nice, it is a very welcome move" and even IBM has been quite positive - prompting Bob Sutor to turn the axe on Apple, saying: "Hey, Apple, what about you? Let’s see you do this in iWork!". Simply starting to beat on someone else reminds me of the John Wayne quote "A day without blood is like a day without sunshine".

But what is missing from the reactions?

OSP coverage of ODF

One of the side-effects of Microsoft joining OASIS ODF TC is that ODF will likely be included in the list of specifications covered by Microsoft's Open Specification Promise (OSP). The list of specifications has not yet been updated, but I would expect it to be updated soon - or at least when they officially join the ODF TC. When you think about all the fuss around IPR this spring, it is quite surprising that no one has picked up on this. It rams a huge stick through the FUD about the OSP not being applicable to GPL-licensed software. Now the OSP covers ODF as well and thereby the native document format of OpenOffice.org [LGPL 3.0 license] and (I think) OpenOffice Novell Edition.

But why OOXML, then?

A lot of people are now spinning this move as pulling the rug from under OOXML and saying that ODF should be mandated everywhere - but nothing could be further from the truth. The reason why we approved OOXML still stands, and the incompatible feature-sets of OOXML and ODF did not suddenly become compatible. There is still stuff in OOXML that cannot be persisted in ODF and vice versa. The backwards compatibility with the content in the existing corpus of binary documents is still a core value of OOXML, and this incompatibility of ODF has not disappeared. You will still lose information and functionality when you choose to persist an OOXML-file in ODF ... just as you would when persisting it to old WordPerfect formats. Insisting that having ODF-support in Microsoft Office (12 SP2) makes the need for OOXML go away is a moot point - since I am sure no one would argue for replacing OOXML with TXT simply because TXT is a supported format in Microsoft Office.

Microsoft steps up to the task at hand

Some quite extraordinary news emerged from the Redmond, WA, headquarters of Microsoft today. In summary, they announced that

  1. Microsoft will join OASIS ODF TC
  2. Microsoft will include ODF in their list of specifications covered by the Open Specification Promise (OSP)
  3. Microsoft will include full, native support for ODF 1.1 in Microsoft Office 14 and in Microsoft Office 12 SP2 - scheduled for Q2 2009. Microsoft Office 12 SP2 will have built-in support for the three most widely used ISO-standards for document formats, i.e. OOXML, ODF and PDF.


My initial reaction when I heard it was "Wow - that's amazing". I am sure a lot of people will react with "It's too little, too late", though, but let me use a couple of bytes to describe why I think it is a good move by Microsoft.

Microsoft joins OASIS ODF TC

Well, Microsoft has been widely criticised for not joining OASIS a few years ago. I think it is a bogus claim, but nevertheless, it has been on the minds of quite a lot of people. Novell has had a seat in both ECMA TC45 and OASIS ODF TC for some time now, and it is my firm belief that both consortia have benefited from this. The move by Microsoft to join OASIS ODF TC will likely have a similar effect. One of the most frequent requests in the standardisation of OOXML was to increase the feature-overlap of ODF and OOXML. This is quite difficult to accomplish (effectively) without knowing what the features of the other document format are (going to be). With Microsoft participating in both committees (and IBM will hopefully consider joining ECMA TC45), harmonization (or "enlargement of the feature-overlap") will likely occur at a quicker pace.

This also means that the worries some of us have had about Microsoft's future involvement in standardisation work around document formats have been toned down a bit. Microsoft is now actively participating in this work in ECMA, in ISO and also in OASIS. I think this is really good news. Not good news for Microsoft - but good news for those of us who are working with document formats every day.

Microsoft will cover ODF with OSP

One of the most difficult, non-technical, discussions during the standardisation of OOXML was the one about legal aspects. There were discussions about different wordings in Sun's CNS, IBM's ISP and Microsoft's OSP (Jesus Christ, guys, pick ONE single acronym, already!) and the possible impact on implementers of ODF and OOXML. One of the aspects of the discussion that never really surfaced was that if IBM has software patents covering ODF, some of them quite possibly cover parts of OOXML as well. But the ISP of IBM does not mention OOXML - it only mentions ODF. This leaves me as a developer in quite a legal pickle, because by implementing OOXML I am covered by the OSP - but I am not covered by IBM's ISP (and vice versa). To me as a developer, Microsoft's coverage of ODF in their OSP is a good move, because it should remove all legal worries I might have around stepping into SW-patent covered territory.

ODF support in Microsoft Office

Microsoft will finally deliver on requests for native ODF-support in Microsoft Office. Microsoft will support ODF 1.1 in Microsoft Office 12 SP2 and also have built-in support for PDF and XPS (these are currently only available as a separate download).

Denmark is one of the countries where both ODF and OOXML have been approved for usage in the public sector. This currently adds quite a bit of complexity to the daily work of information workers, since there are not many (if any) applications offering high-fidelity, native support for both formats. They hence rely on translators like ODF-Converter or similar XSLT-based translators. It's a bad, but currently necessary, choice. The usage of translators for document conversion has been widely criticised, amongst others by Rob and me, and the built-in support for ODF in Microsoft Office is a great step in the right direction.

As with everything Microsoft does, we need a healthy amount of scepticism as to what extent they will deliver on their promises. However, I truly believe that the moves by Microsoft here are good news - regardless of the scepticism. An old proverb says "don't count your chickens before they hatch" - and this applies perfectly here. We will have to wait and see what will eventually happen - but so far - it looks good.