a 'mooh' point

clearly an IBM drone

OOXML is defective #2 (depends on proprietary technologies)

A standard is not "free enough" if implementation of it depends on existance of a proprietary technology on the specific platform. Ideally it should be possible simply to buy the specification and implement it without any other financial requirements.

This is where OOXML fails.

OOXML heavily depends on "Object Linking and Embedding Technology" also known as "OLE-technology". Section 9.3.3 of the specification deals with how objects are embedded in the file format. The section is divided in two where the first section specifies how to embed documents otherwise defined in this standard. These documents are defined as

  • Formulas
  • Charts
  • Spreadsheets
  • Text documents
  • Drawings
  • Presentations

This is one of the clear cases where it is obvious that Microsoft continiously tries to preserve their main cash-cow: The Microsoft Office eco system! OOXML not only depends on Microsoft's proprietary technology OLE, the specification itself also makes it more easy to embed it's own "cousins" than any other file format. Talk about "first class citizens" of OOXML!

The section goes on telling us about binary objects:

Objects that do not have an XML representation. These objects only have a binary representation [...] (see [OLE]).

WTF? Once again a reference and requirement to use proprietary technologies like OLE! What if I want to embed my own JLSObjectType? What if I want to embed some object from the Linux-world like Bonobo-elements or KParts? The schema-elements only emphasizes my point:

<draw:object/> and <draw:object-ole/>

Are you also puzzled by this? Well, I don't blame you. To wrap up - we can embed "our own documents" and we can embed everything else. There are even two seperate elements from the draw-namespace that specifies this for us: <draw:object/> and <draw:object-ole/>. The entire schema-fragment is included here for your pleasure:

<define name="draw-object">
    <element name="draw:object">
        <ref name="draw-object-attlist"/>
            <ref name="common-draw-data-attlist"/>
            <ref name="office-document"/>
            <ref name="math-math"/>

<define name="draw-object-ole">
    <element name="draw:object-ole">
        <ref name="draw-object-ole-attlist"/>
            <ref name="common-draw-data-attlist"/>
            <ref name="office-binary-data"/>

This is yet another example of Microsoft on one hand claiming "openness" and with the other hand forcing everyone to use their own proprietary, undocumented technology.

But we're not done:

The embedded object is referenced through an XLink attribute in the enclosing frame-element. The behaviour is described as (bold typeface is my addition, /JLS):

The xlink:href attribute links to the object representation, as follows:

  • For objects that have an [OO]XML representation, the link references the sub package of the object. The object is contained within this sub page exactly as it would as it is a document of its own.
  • For objects that do not have an XML representation, the link references a sub stream of the package that contains the binary representation of the object.

Wow - wait a minute: Is this it? Don't you think a bit of clarification would be in order?

The fileformat for the physical file is a Zip-archive with a number of files and folders in it. But this archive also contains a "TOC"-list of the files and the mime type of the entire package. The latter is not an XML-file - where do I put this? Where do I put the TOC-file? What if my spreadsheet contains an image? Since the image is not in XML-format (it's binary) ... would my entire spreadsheet qualify as having "an XML representation"? And did you notice the part "the link references a sub stream of the package that contains the binary representation of the object."? A stream? Binary representation? Again totally unspecified behaviour and noone will ever be able to implement this apart from Microsoft and Microsoft Office 2007.

Microsoft had a good chance to specify this properly in the beginning. They could have made an open format to enable competition or a format that would stiffle competetion. So what does Microsoft do? Yup, the anti-competitive choice. Anyone surprised?

Interoperability - between what?

What is interoperability, really?

Well, when it comes to document formats, some people seems to think that interoperability is the ability to transform one format to another. That high-fidelity interoperability can only be achieved when it is possible to perform a complete translation/conversion of format X to format Y.

The basic problem for this premis is that if you were able to do this conversion, it would be the same as being able to make a 1-1 mapping between the functionality and features of format X and format Y (and vice versa). However - this effectively means that format X is actually just a permutation of format Y ... making format X and format Y the same format (pick up your favorite book on mathematical topology to see the details).

When it comes to ODF and OOXML, the case is pretty clear - the two formats are not the same. Sure - they can both define bold text,  but there are quite a few differences between the formats. A list of some of them can be found at the ODF-Converter website. I think that the list is the best argument for not being able to do a complete conversion of ODF to OOXML (and back). This was also one of the conclusions of the Frauenhofer/DIN-work in Germany, where they concluded that a full 1-1 mapping between the two formats could not be done.

The key question here is: Is interoperability diminshed by this fact?

If you ask Rob's posse, they will almost certainly say "Yes". They will say something like "Microsoft chose not to make OOXML interoperable with the existing ISO-standard ODF and therefore OOXML is a blow to interoperability".

If you ask me, I will say "No". I will say no because the term "interoperability" has been hijacked by the anti-OOXML-lobby in much the same way the SVG-namespace was hijacked by ODF TC. I will say "No" because interoperability means something radically different. The meaning is not rocket sciency, really ... and usually most people agree with the basis definition of interoperability. A few of those are:

Computer Dictionaly online: 


The ability of software and hardware on multiple machines from multiple vendors to communicate.



the ability of two or more systems or components to exchange information and to use the information that has been exchanged

US e-Government Act of 2002:


ability of different operating and software systems, applications, and services to communicate and exchange data in an accurate, effective, and consistent manner.

If you also look at the enormous list from Google you will see, that none of the definitions talk about the ability to convert formats. Instead they talk about communication between machines, platforms and networks. This is very close to my definition of interoperability when it comes to document formats.

The interoperability gained by using a specific document format is based on the possibility of implementing the format on any kind of platform, in any kind of software using any kind of operatingsystem. It is based on how well and consice and clear the language of the specification of the format is and it depends of howwell thought out the specification is.

It has nothing, nothing, nothing to do with the possibility of converting the format to any other format. 

A cry for help

Working almost everyday with implementing solutions that support ODF and OOXML I am naturally tasked (or more appropriately: challenged) with ambiguities in the forementioned specifications. At first glance ODF has an appealing simpleness and form, and reading the specification is almost like reading a book in natural "prose". However - the easiness to read sadly comes at the expense of clear language. So - as always when implementing any specifications, you need to have somewhere to go to ask your technical questions regarding how to implement the damn thing or questions about how to read the devil.

And therein lies my problem:

Where do I go to get answers to get these questions about ODF? Where is the website for ODF-development?

I have tried the forums at opendocument.xml.org -but the groups there are almost dead.

I have tried the maillist for the OpenDocument TC, but it is also almost dead.

So please help me - where do I go?

Update: I almost forgot - I have also prowled the Danish blogsphere where the ongoing battle between OpenXml and ODF usually takes place, but noone has been able to give me any pointers to where they usually get their information about implementing ODF.

(or have I been so heavily stigmatised by being pro-choice that noone wants to help me?)


OOXML is defective #1 (Pasword hashing)

OOXML has been accused of being rushed through not even the writing itself but also certification in both ECMA and ISO. It's a quick accusation to make but sometimes it can be really tricky to figure out if a statement is true or false. But you know, sometimes you stumple over something that really shows you that the specification was rushed through not only preliminary editing but also certification in ISO.

The one thing I noticed in was password hashing. As with other document formats, document protection can be defined in multiple ways. There is of course protection of the document itself but most document formats also allow protection of specific parts of the document or even read-only protection of the document. The way it's usually done is to ask the user for a password, hash it and store it in the document. When the document is opened the next time, the user is prompted for a password, and if it matches the stored value - the protection of the document (or parts of it) is released.

Now, this is defined, amongst other places, in section 4.4.1 (Section attributes) where it deals with protection of sections. The text says:

A section can be protected, which means that a user can not edit the section. The text:protected attribute indicates whether or not a section is protected. The user interface must enforce the protection attribute if it is enabled.

This is more or less what I wrote above. It also says:

A user can use the user interface to reset the protection flag, unless the section is further protected by a password. In this case, the user must know the password in order to reset the protection flag. The text:protection-key attribute specifies the password that protects the section. To avoid saving the password directly into the XML file, only a hash value of the password is stored.

And that's it.

WTF? Nothing more? Nothing about how to specify the hashing algorithm? Nothing about how to specify initialization vectors, prepending of zeroes ... nothing?

But wait - what if we look in the schema itself - maybe it's just the descriptive text that is a bit ... ahem ... limited. Ok - the schema says:

<define name="sectionAttr" combine="interleave">
  <attribute name="text:protection-key">
   <ref name="string"/>

Dammit - nothing here either. Notice also that it is not possible to store the way the hash-value is persisted. Is it a bit-sequence? A Hex'ed bit-sequence? A Base64-sequence? Nothing!

But wait again - let's look into the file of an actual document with read-only protection. Let's see what is stored in the document. Well, the XML-fragment lists as:


Any clever suggestions for me as an ocument consumer to what to do with this value? This is truly amazing. One one hand the authors talk about their document format being able to provide true and pure interoperability ... but they haven''t specified something as common as document protection. I wonder how they can claim this with a straight face. Interoperability is certainly not enabled by limiting the details of the specification to as little as this ... but maybe they just hope noone will use this feature and thereby have "interoperability by rejection".

I cannot help to wonder: who in their right mind would put up a suggestion for standardisation of a document format that was unspecified in such a central feature as "document protection". This must be one of those places where

Ratification trumphs perfection 

Yeah, well ...

Do your math - ODF and MathML

When I studied at DTU (Technical University of Denmark) I basically lived in the Department of Mathematics. I did my bachelor project there and I did my thesis there. I think it would be fair to say that math is really in my blood (or was).

Of course - in those days we wrote our equations in LaTeX (not the suit) and I remember how we laughed diabolically at our co-students that did their papers in e.g. Microsoft Word and had to use the really, really annoying "Equation Editor" (shudder). I remember how we also laughed at the students that did pictures and graphs in e.g. Adobe PhotoShop or Visio (before it was aquired by Microsoft, afaik), coz everybody knew that it had to be done using xFig ... the program with the worst possible UI ever ... at least in those days.

For the purpose of these articles (an article about Microsoft Office 2007 and OMML will follow shortly) I dug into my thesis and looked at how math was displayed using LaTeX. I created a "reference equation" to use when trying to display some math in either ODF or OOXML. The test equation I made was this:

    \cos\Big(\fraq{\pi}{4}\Big) = \Big(\fraq{\sqrt{2}}{2}\Big)

For those of you not speaking LaTeX fluently - you should consult the "Not so short introduction to LaTeX" chapter 3 - or simply behold the equation below:


In ODF mathematical notations are done using MathML (section 12.5) - a W3C-standard for displaying mathematical content. The mathematical content is embedded in the ODF-package as an object and as far as I can see, it is not possible to use MathML inline in the content of the paragraphs of the document itself. I have earlier talked about ODF being vague and this is imo one of the places where some clarity could help.

But - learning MathML is like learning a new language ... it doesn't really make sense in the beginning. So I started to poke around a bit on the W3C-website in search of some tools or tutorials that would help me figure ot what MathML is all about. I eventually found a W3C tool called Amaya. It's a MathML/SVG-tool developed by W3C and I used this tool to create the MathML for the base equation above. In Amaya it looks like this:



The interesting part, of course, it the MathML created by Amaya. The MathML (slightly modified, but validated) is listed below

<?xml version="1.0" encoding="utf-8" ?>
<math xmlns="http://www.w3.org/1998/Math/MathML">

If you look at the XML, it is pretty easy to identify the different parts of the equation.

So - in theory I should be able to put this into an ODF-document and it would be displayed when opening the document using OpenOffice.org - the reference implementation of ODF. 

Let's see


Step 1

Create an ODF-document using OpenOffice.org with an mathematical formula embedded.

Now, this was the easy part. I cannot figure out how to insert a regular "Pi"-sign in the formula, but the formula looks just fine. The file is available here: math.odt (9,72 kb). It looks like this:



Step 2

Clean the file for all the disturbing crap that the application puts in per default

This was a bit more tricky, since somehow it seems that the mathical formula can only be contained in a file called "content.xml" - otherwise OpenOffice.org simply shuts down. Also, I have removed alle meta-data, styling, extra namespace-declarations, embedded thumbnails and graphical representation of the formula. The cut-down ODT-file is available here: math-minimal.odt (1,43 kb). The visual representation is completely like the original file. 

Step 3

Inspect the MathML in the application created MathML-file

The MathML created by OpenOffice.org looks like this: 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
        <math:mfenced math:open="" math:close="">
            <math:mi math:fontstyle="italic">pi</math:mi>
        <math:mo math:stretchy="false">=</math:mo>
        <math:mfenced math:open="" math:close="">
    <math:annotation math:encoding="StarMath 5.0">cos left ( pi over 4 right )  = left (sqrt{2}  over 2  right )</math:annotation>

There are a couple of things to note about this. Firstly, I don't understand the namespace declaration as

"<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">

The doctype should not matter at all - and why they chose to use a "DTD Modified W3C MathML 1.01" is beyond me. I'm not saying it's an error - I just don't get it. Enlighten me, pleze.  Secondly the MathML created looks different from the MathML created my Amaya. However - just as the same paragraph can be presented in all sorts of way using HTML and the same equation can be presented in different ways (e.g. sin2(x) + cos2(x) = 1 is basically the same as a2 + b2 = c2), the same equation can be created in an endless myriad of ways using MathML. Thirdly there are two distinct ways where the OOo MathML is different from the MathML of Amaya. Notice how it uses the <mfenced>-element to make a parenthesis instead of <mo>)</mo>. There is really no difference - however I tend to think that using the <mfenced>-element is slightly more sophisticated than the <mo>-element, but it's just a personal belief. Also, look at the usage of the <semantics> and <annotation>-elements. This is actually really cool. The <semantics>-elements are used to provide "meaning" to the MathML-markup and the content in the <annotation>-elements directly maps the MathML markup to the corresponding expression tree. Also, OpenOffice.org allows you to type in the annotation directly, thereby enabling some of the ease of writing LaTeX directly by hand.

Step 4

Validate the MathML-file using W3C-validator or Amaya 

The picture below shows the content.xml loaded and displayed in Amaya. The green dot in the bottom right corner indicates that the MathML is valid. I have also made a test with embedding the MathML in a HTML-document and validated it against the W3C-validator and the result is the same.




Step 5

Insert the MathML created by Amaya into the ODT-file and open the file using OpenOffice.org

Now, I have previously created the formula using Amaya and I just have to inject it into the ODT-file. I did and the file is available here: mathml-minimal-error.odt (1,23 kb). The result is, however, not as I expected



Ok - but as you might have noticed, all elements in the OOo MathML-file were namespace-prefixed, so maybe this will do the trick. I tried this as well but with the same result. File is available here: mathml-minimal-nsprefix-error.odt (1,24 kb).

Final step

Figure out what the hell is wrong 

I finally figured out what is wrong with the way OpenOffice.org handles MathML-content. It turns out that if I took the Amaya MathML (without ns-prefix) and inserted the MathML into the original content.xml-file but preserved the DOCTYPE-declaration, it works almost as expected. File is available here: mathml-minimal-inject-succes.odt (1,30 kb).

Well, some error are introduced. The Π-character is not displayed and the equation is displayed in bold. Also the equal-sign has disappeared as well.

Just for the fun of it I took the MathML-file generated by OpenOffice.org and removed the <semantics>-element as well as the <annotation>-element. File is available her: mathml-minimal-inject-no-semantics.odt (1,35 kb). The result when opening it in OpenOffice.org is .. well ... sad:

I have absolutely no idea of why it displays it like this. Removing the <semantics>-element and <annotation>-element should have no effect on the visual representation of the equation.


Well, I don't really know what to conclude. Most of the things I have shown above are imo due to errors in the implementation of OpenOffice.org where MathML is clearly not implemented correctly sufficiently. It seems that there are some unwritten rules to how MathML is supposed to be used when working with it in OpenOffice.org, but they seem rather unclear and weird to me.

But how OpenOffice.org behaves is really not important to me - some implementations of ODF are better than others, and maybe other implementations do a better job at displaying MathML. The point should be how the specification says it should be used. Luckily the ODF-spec only talks about how MathML is used in a single place - section 12.5 Mathematical Content. It says that "Mathematical content is represented by MathML 2.0 (see [MathML])". The RelaxNG-snippet provided also tells us that you can put everything into a "math area", <math:math>:

<?xml version="1.0" encoding="UTF-8" ?>
<define name="math-math">
    <element name="math:math">
        <ref name="mathMarkup" />
<!-- To avoid inclusion of the complete MathML schema, anything -->
<!-- is allowed within a math:math top-level element -->
<define name="mathMarkup">
                <anyName />
            <text />
                <anyName />
                <ref name="mathMarkup" />

So basically, all bets are off. I can only begin to wonder how other implementations of ODF use MathML.

And a small appetizer:

As soon as I get the time for it, I'll write an article as this one with Office 2007 and OMML. I will investigate how to markup mathematical content using OMML and I will also try to use the XSL-files provided by Microsoft in Office 2007 to create XSLT-translations of my base equation from OMML to MathML and vice versa.

... stay tuned ... 


Embrace and extend - SVG in ODF revisited

One of the attack-vectors on OOXML has been the lack of reuse of existing standards. Specifically it lands directly in the discussion of DrawingML vs. SVG and OOML vs. MathML ... both of which are relatively interesting subjects. The argument has been why Microsoft chose not to reuse SVG and created DrawingML instead - and likewise with MathML and OMML.

Now, some of the arguments for reusing existing standards are:

  • Reuse of other people's code
    As a programmer, I love this - there is nothing more satisfying than being able to reuse something that others have made an effort to produce
  • Increase quality
    If something is an existing standard, someone else has propably reviewed it and the worst bugs have likely been removed.
  • Brain cycle reuse
    If you reuse some work already defined, you will propably be able to find someone in your organization that has skills in this area - and you avoid the costs of re-educating them to use a new tool.

So, with respect to ODF, it has tried to reuse as many standards as possible, so e.g. mathematical content is done using MathML and vector graphics are supposedly done using SVG. Microsoft has chosen a different path where they have created new formats for their formats, så mathematical content is done using OMML (Office Math Markup Language) and vector graphics are done using DrawingML.

A couple of weeks ago I heard some rumours that ODF had not actually only used SVG as vector graphics format but also even extended it beyond the standardized format. My initial response was that it had to be wrong information. One of the corner stones of ODF is namely that it reuses existing standards and that there is a "clean cut" between ODF and the standard it utilizes. This way I would be able to buy/aquire some library that supports SVG and simply incorporate it in my product implementing ODF. But if the referenced standard is extended - I will either experience less functionality due to extensions not being parts of the standard or I could experience crashing code when I try to pass the extended format to the external library - at least if it performs e.g. DTD/schema validation and finds out that invalid elements are present in the input.

So what did I do?

Basically I started by doing a random text-search in the ODF-spec for occurences of "[SVG]". One of the first things that caught my attention was the paragraph in section 1.3 Namespaces, Table 2 where it says:

Prefix Description
For elements and attributes that are compatible to elements or attributes defined in [SVG].
urn:oasis:names:tc:opendocument:xmlns: svg-compatible:1.0

The term "compatible to elements or attributes" seems quite odd to me, since it should not be necessary to specify this if the referenced standard is not extended. I did another quick search and I stumpled over these sections of the specification:

  • 14.14.2 SVG Gradients
  • 15.13.13 Line Join

Let me quickly walk through the contents of each section.

14.14.2 SVG Gradients

The contents of section 14.14.2 says, amongst other things.

In addition to the gradients specified in section 14.14.1, gradient may be defined by the SVG gradient elements <linarGradient> and <radialGradient> as specified in §13.2 of [SVG].


Now, the section goes on as

The following rules apply to SVG gradients if they are used in documents in OpenDocument format:

  • The gradients must get a name. It is specified by the draw:name attribute.
  • For <linarGradient>, only the attributes gradientTransform, x1, y1, x2, y2 and spreadMethod will be evaluated.
  • For <radialGradient>, only the attributes gradientTransform, cx, cy, r, fx, fy and spreadMethod will be evaluated.
  • The gradient will be calculated like having a gradientUnits of objectBoundingBox, regardless what the actual value of the attribute is.
  • The only child element that is evaluated is <stop>.
  • For <stop>, only the attributes offset, stop-color and stop-opacity will be evaluated.

 So, to be able to determine if ODF is only referencing SVG, we need to look at section 13.2 in SVG spec. It says:

<!ELEMENT %SVG.linearGradient.qname; %SVG.linearGradient.content; >
<!-- end of SVG.linearGradient.element -->]]>
<!ENTITY % SVG.linearGradient.attlist "INCLUDE" >
<!ATTLIST %SVG.linearGradient.qname;
    x1 %Coordinate.datatype; #IMPLIED
    y1 %Coordinate.datatype; #IMPLIED
    x2 %Coordinate.datatype; #IMPLIED
    y2 %Coordinate.datatype; #IMPLIED
    gradientUnits ( userSpaceOnUse | objectBoundingBox ) #IMPLIED
    gradientTransform %TransformList.datatype; #IMPLIED
    spreadMethod ( pad | reflect | repeat ) #IMPLIED  

So it seems that at least the attribute gradientUnits is not used in the ODF-adapted version of SVG.

If we look at <radialGradient>, we need to cross reference with the corresponding  DTD in SVG. It says:

<!ENTITY % SVG.radialGradient.extra.content "" >
<!ENTITY % SVG.radialGradient.element "INCLUDE" >
<!ENTITY % SVG.radialGradient.content
    "(( %SVG.Description.class; )*, ( %SVG.stop.qname; | %SVG.animate.qname;
    | %SVG.set.qname; | %SVG.animateTransform.qname;
    %SVG.radialGradient.extra.content; )*)"
<!ELEMENT %SVG.radialGradient.qname; %SVG.radialGradient.content; >
<!-- end of SVG.radialGradient.element -->]]>
<!ENTITY % SVG.radialGradient.attlist "INCLUDE" >
<!ATTLIST %SVG.radialGradient.qname;
    cx %Coordinate.datatype; #IMPLIED
    cy %Coordinate.datatype; #IMPLIED
    r %Length.datatype; #IMPLIED
    fx %Coordinate.datatype; #IMPLIED
    fy %Coordinate.datatype; #IMPLIED
    gradientUnits ( userSpaceOnUse | objectBoundingBox ) #IMPLIED
    gradientTransform %TransformList.datatype; #IMPLIED
    spreadMethod ( pad | reflect | repeat ) #IMPLIED

So here the attribute gradientUnits is not used as well. 

But luckily the good guys at ODF TC have solved this mystery for us - since they have decided that the value of the (non-existing) attribute gradientUnits is calculated as having a value of "objectBoundingBox", regardless of the value passed as this parameter. It's a bit odd, but I suppose it has something to do with the way the SVG-fragments positions themselves around the other objects in the document.

15.13.12 Line Join

The contents of section 15.13.13 is:

The attribute draw:stroke-linejoin specifies the shape at the corners of paths or other vector shapes, when they are stroked. The values are the same as for [SVG]'s strokelinejoin attribute, except that the attribute in addition to the values supported by SVG may have the value middle, which means that the mean value between the joints is used.

They have even been so kind to provide us with a schema fragment defining the possible usage of this feature in ODF:

<define name="style-graphic-properties-attlist" combine="interleave">
        <attribute name="draw:stroke-linejoin">

Compare this with the DTD of SVG (Appendix A.1.7 Paint Attribute Model):

<!ENTITY % SVG.stroke-linejoin.attrib
    "stroke-linejoin ( miter | round | bevel | inherit ) #IMPLIED"

So the attribute value "middle"  is indeed an addition to SVG.


You might be wondering if all this is really worth an entire article about a couple of additions/exclusions of SVG, and you kindda have a point. However, the devil lies in the details.

The modifications to SVG (even if they are minor) are bad enough as they are, because they basically kill high-fidelity interoperability when using existing SVG-libraries. When you are limiting the usage of some component (the limitations to the values of gradientUnits) you basically loose control with how existing data behaves. And when you enlarge a standard (addition of the middle-attribute of the stroke-linejoin element) you loose control with how your own data behaves when using it in other scenarios. You know, this is exactly what Microsoft did when they enlarged not only CSS but JavaScript. Maybe the memory of the ODF-founders is not that great, but I certainly remember the loads of crap-work we had to do in the late ninetees when creating web-pages to "IE5-compatible" browsers and "the rest". In fact - this nightmare still haunts us with the Microsoft additions to JavaScript.  Maybe they just thought: "If Microsoft pulled it off, so can we". I think that's a bad choice.

Also, you should note that ODF does not use SVG "as such" at all. They use fragments of SVG, i.e. elements with same names and attributes and then they fit it into the overall architecture of ODF. This is hardly "just referencing". As the paragraph says above (stroke-linejoin), the elements specifying this are not SVG-elements. They are similar to SVG-elements and even extended beyond this. I actually find it really hard to see or understand how the ODF TC can claim - with a straight face - that ODF only references SVG. I suppose that if I made my own JLSMarkup for document formats and used an element called <body> I would also be able to claim that I was reusing W3C xHTML 1.0. I just don't find it the right thing to do.

My only surprise is why this has not surfaced until now and how anyone can sit down and read in ODF (as being both pro-ODF or pro-choice) and not be just a little confused about how they could claim "just referencing existing standards", is a bit mind-baffling to me. I suppose ECMA could do the same with OOXML and claim "reusage of HTML DOM in OOXML-architecture" since a WordProcessingML-document contains both a <body>-element as well as a <p>-element.

Post scriptum

On his blog Brian Jones speculated in his last comment on the thread "Why all the secrecy?" if you could take an existing SVG-drawing, put it into an ODF-document and expect it to work. Well, just as OOXML, ODF has no limitations to what kind of data you might want to put into it, so usage of SVG in a ODF-document is indeed possible from a technical/architectural point of view. It is not a format question but an implementation-specific question. However - will it work?

ODF has several ways to embed data into the document. The two relevant means are inclusion of an SVG-drawing as an image and inclusion of an SVG-image as an object. ODF supports two ways to embed an object, as stipulated in section 9.3.3:

A document in OpenDocument format can contain two types of objects, as follows:

  1. Objects that have an OpenDocument representation. These objects are:'
    1. Formulas (represented as [MathML])
    2. Charts
    3. Spreadsheets
    4. Text documents
    5. Drawings
    6. Presentations
  2. Objects that do not have an XML representation. These objects only have a binary Representation, An example for this kind of objects OLE objects (see [OLE]).


Well, SVG is clearly XML but it is not an "OpenDocument representation" - but then again, neither is MathML, so I'll opt for using these two methods when trying to embed an SVG-drawing into a ODT-document:

  • Insert the SVG-drawing as an image
  • Insert the SVG-drawing as an XML part using the <draw:object>-element as specified in section 9.3.3 of the ODF spec.

I'll use the latest and greatest release of OOo, OpenOffice 2.3.1 DA, to try to display the files. You can see the SVG-file here: ex.svg (482,00 bytes)

Insert SVG as an image

I have created a small ODT-document and added the SVG-file to it. I have added an SVG-image to content.xml as a regular image and put the SVG-file in a folder by itself. The XML-file content.xml is displayed here below.

<?xml version="1.0" encoding="UTF-8" ?>
   <text:p >Test of insertion of SVG-image in ODT-document</text:p>
   <text:p >
      xlink:actuate="onLoad" />

As it is seen the SVG-image is simply added as a regular image using the ODF-modified version of SVG. The ODT-file is available here: test svg image.odt (1,48 kb). Anyone want to take a guess on what the result of opening this file will be?



Insert SVG as an "XML-object"

As noted above ODF allows insertion of objects with an "XML-representation" as just a text file. The construction of the ODF-package is a bit more complicated and I'd be happy if anyone could tell me if I made a mistake - and what the correct way would be. As basis for my file I have used an ODT-file with a formula in MathML embedded, an so I'll just again show the contents of the content.xml-file here below.

<?xml version="1.0" encoding="UTF-8"?>
      <text:p >Test of insertion of SVG in OOo</text:p>
      <text:p >
          draw:name="My SVG drawing [JLS]"

Again an xlink reference to the SVG-file is "simply" added to content.xml. The ODT-file is available here: test insert svg.odt (1,48 kb). Anyone want to take a guess on what the result of opening this file will be?




So it seems to recognize the SVG filetype - it just doesn't understand how to process it.

I have a feeling that I might have made an error in the manifest, so I'll include it here and hopefully someone can pinpoint if there is an error:

<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="image/svg+xml" manifest:full-path="SVG/ex.svg"/>
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.image" manifest:full-path="SVG/"/>

OOo and SVG

I have said before that the devil lies in the details - but here it actually lies right up-front. You see - OpenOffice.org does presently (version 2.3.1) not suppport SVG. It doesn't support SVG as regular images and it does not support SVG as providing vector graphics or "line art". You can import SVG-images with OOo, but it is converted to OpenDocument Draw and Open Document Draw data can be exported to SVG. The import/export is not done not using OOo itself but with a filter, that converts the SVG into the internal ODF Draw format. The feature of supporting SVG is apparently the single most requested feature in OOo, so maybe it will soon be a part of OOo. Also take a look at the "General note" on the "Unsuppoted SVG features"-page of the filter:

SVG and what's named SVG-compatible in OpenDocument is really different. Therefore, the import filter can only approximate the SVG contents.

Ooh - and incidentally - the way ODF and OOo handles SVG is exactly the same way OOXML and Microsoft Office 2007 handles MathML.


What's up with OLE?

A few weeks back I made an article about how Microsoft Office 2007 dealt with password-protection of an OPC-package, since this feature is not a part of the OOXML-specification. The answer I found was that Microsoft Office 2007 persists the password-protected file as a OLE2 Compound File ... more commonly known as a "OLE-file". I also concluded that using OLE2 Compound Files is not a problem - and certainy not an issue regarding OOXML.

Now - the whole topic around OLE has been at the front row of the worldwide debates regarding OOXML. My personal opinion is that the people jumping up and down screaming about problems with OLE ... really haven't understood what OLE is.

So let me start by making a small recap' of what it is really all about.

... there is OLE and then there is OLE 

First of all:

there is "OLE" and then there is ... "OLE"

... or put in another way:

there is the "OLE-technology" and then there is the "OLE-file"

or in a third, more correct, way:

there is the "OLE application technology"  and then there are "Compound Files".

The foremore mentioned is the technology that - on the Windows platform - enables a program to use the UI of another program ... without launching the entire application itself. I mostly use this when editing MS Visio-documents in Word but other usages of this is using an Excel spreadsheet in an MS Word application. The OLE-technology itself is a tool on the Windows-platform that all applications can - and do - use to enable "utilizing other applications in their own applications". It is here important to understand, that there is (today) nothing really revolutionary about OLE. Another similar technology on the Windows-platform is DDE and on the Linux-platform it could be KParts and Bonobo. These technologies simply enable one program to communicate with another (simply put).

But what about these OLE-files?

Well, Compound Files are actually not dependant of OLE-technology. Or put in another way: you don't need OLE-technology to read and use the contents of a Compound File. Compound Files are just files. A Compound File is a collection of persisted streams - actually much like a ZIP-archive. Most commonly it is used because it brings the ability to "utilize a file system within a file". Of course you will need to know how to use the contents of the file, be it created by OpenOffice, Corel Draw, Adobe Acrobat or any application that might store its files using Compound Files. But this is seperate from being able to read and write to the contents of a Compound File.

Ok - I will not bother you any more with this. You should check out the original article about OLE and also look into the specification of the binary formats for Microsoft Office95 - Office2007, avilable from Microsoft. It is actually quite interesting. Just remember that OLE-technology and Compound Files are not the same thing.

And now for something completely different (kindof)

In the lab-tests I have been part of for the Danish Government (National IT and Telecom Agency) we have tested OLE-interoperability. It is important since it is quite normal to embed e.g. a spreadsheet file in a Text-processing file. So it is important that the contents of the file is actually usable when receiving it and opening using another application or on another platform.In this setup we only tested Compound File interop and not interop between OOXML and ODF.

What we did was this:

We created a ODF-file using OpenOffice where we embedded a Excel-spreadsheet (binary .DOC-file) (on the Windows-platform)

We sent this file to a number of different platforms and applications

  • Windows XP using OpenOffice.org 2.3 DA
  • Windows XP using OpenOffice Novell Edition
  • Linux using OpenOffice Novell Edition
  • Linux (SLED) using IBM Lotus Notes 8

We tried to open the file and documented what happened.

Setup  What happened? 
Windows XP using OpenOffice.org 2.3 DA OpenOffice.org opened the document and correctly displayed the contents of the spreadsheet. It was possible to edit the spreadsheet and save it back into the ODF-container
Windows XP using OpenOffice Novell Edition OpenOffice Novell Edition opened the document and correctly displayed the contents of the spreadsheet. It was possible to activate the spreadsheet but only in "read-only"-mode
3 Linux using OpenOffice Novell Edition OpenOffice Novell Edition opened the document and correctly displayed the contents of the spreadsheet. It was possible to activate the spreadsheet but only in "read-only"-mode
Linux (SLED) using IBM Lotus Notes 8 Lotus Notes 8 opened the document and correctly displayed the contents of the spreadsheet. When activating the spreadsheet the user was prompted to convert the spreadsheet. When accepting this it became editable and when saving it back into the ODF-container, the spreadsheet was persisted as an Open Document Spreadsheet.

So what we saw was basically 3 different approaches to handling the embedded object. In general the Excel-object (Compound file) itself was not a problem - regardless of application and platform. All combinations had no problems with opening the file and displaying the contents - even on platforms without OLE-technology present. The difference was in the applications and their handling of the object. OpenOffice.org presented the approach that most people would expect: it allowed editing the embedded object and saving it back into the container. OpenOffice Novell Edition allowed activating the embedded object but not saving it back into the container and Lotus 8 took the approach of converting the Excel-object to an Open Document Spreadsheet.

A conclusion?

Well, we took great care not to conclude much - that was not for us to do, we merely provided the technical background for post-lab conclusions. However - the pattern emerging from the description above was similar to a pattern we saw a lot. The problems were not in incompatibility between the formats but instead in how the applications and converters dealt with the formats. We also saw no indications that any of the formats were tied to a specific platform. There were no problems with roundtripping - or to put more clearly: the problem we saw when round-tripping documents were not caused by incompatibilities between the platforms (e.g. Linux and Windows) but between different behaviour in the applications implemented on either platform.

So is this good or bad news? Well, as always, truth lies in the eyes of the beholder ... but I think it is good news. 

Where did my line go?

When we started doing our tests in the lab and started thinking about what we thought we would be seeing, we had a very clear understanding that it would not all be blue-sky conversions and that we would identify problems - some more severe than others. We were also pretty aware, that there would be areas, where conversion was just not possible.

But - I am pretty sure I speak for the rest of the group - we were quite surprised to see which areas this concerned.

On area where absolutely nothing could be converted was ... lines. Not only line art, not only complex line drawings ... but simply - lines.

Lines are done in OOXML as either VML or DrawingML and in ODF it is done using a SVG-derivative. The puzzling thing is, that this area is apparently simply left out in either of the converters. We made some simple documents (line.docx 10,47 kb) and (line.odt 6,60 kb)  [I have re-made these for this article]. When converting these files using CleverAge 1.0 on Microsoft Office 2003 and 2007, Novell OOXML Translator (on Windows and SLED) or IBM Lotus Notes 8 (on SLED), the lines are simply removed. They are not altered, they are not just hidden, they are not moved to a different location in the document ... they are just removed.

This is another example of the overall observation from our tests ... the quality of the converters are simply not good enough today. If you look at the XML in either of the files above, you will see, that even though they look different, they basically specify the same thing (start and end-point for the line drawn), so technically it should pose no problem to be able to do a better conversion.

It is often said, that the main problem with converting from ODF to OOXML (and vice versa) is incompatibilities between the formats. This example is by first glance suporting this argument, but if you dig a bit deeper into the technicality of it, is simply boils down to a problem with bad converters.

Conclusion: The world is seldom black/white ... even if people are trying to convince you so. More often, the world is grey and depressing as a rainy day. 

What is a conversion, really?

I have been part of some work for the the Danish National Telecom and IT Agency (IT- og Telestyrelsen). They have coordinated quite a few projects around the country to evaluate the usage of ODF and OOXML and possible problems with co-existance of the two document standards. The website for this work is at http://dokumentformater.oio.dk .

The basic setup for the projects and tests has been:

How does a particular department handle the two document formats and possible conversion between them?

Which problems will arise given their current software install-base?

Is it possible to provide some guidance to the departments regarding which specific features of a document format to avoid since they cause problems?

In other words it has been a rather pragmatic approach based on trying to answer the question: "Why do you experience the problems you see?"


The first thing we realized during the very first day was something quite crucial:

We were not testing compatibility between two formats - instead we were testing quality of converter-tools and compatibility between the specific format and the internal object model the format is loaded into.


Both OOXML and ODF are rather immature document formats in the market today since neither of them has a broad market penetration as such. Despite the document count on Google, ODF is not widely used and most people still save their work in .DOC-files -even though they have Microsoft Office 2007 installed. This means that conversion between them is also rather immature and this affects the quality of the converters and the results of converting between one format and another. The ODF-Converter project has an extensive list of the differences between the formats themselves and also a list of features currently not supported by the converter and similar lists exist of features not supported by the other tools used. Luckily it seems that the quality of the converters are drastically improving for each incremental new release.

We also noted that a converter is not "just a converter". It lives and breathes on the application it is installed. This was of particular interest when looking at the ODF-Converter Office Add-In and the SUN OOXML-converter. They are both add-ons to existing Office applications but the application behaviour we saw was in principle the same when using OpenOffice.org, IBM Lotus Notes 8 or OpenOffice Novell Edition.

The problem lies in the fact, that every application has an internal object model that determines how a document is persisted in memory in the application. The binary format for Microsoft Office files were essentially a binary dump of the current memory in the application and this basically counts for at lot of applications with binary file formats. Anyway - regardless of how a document is "converted" or "transformed" using another application than the originator, at the end of the day it has to be loaded into the internal object model for the receiving application. This essentially means, that unless there is a 100% air-tight 1-1-mapping of the document format and the internal object model ... information will be lost. This was one part of the problem - the other was the sequence of conversion. Take a look at the sequence listed here:

Sequence 01  Sequence 02
load original format Load original format
Convert format to new format Load original format into internal object model
load new format into internal object model (make changes)
(make changes) Persist as new document format
Persist as new document format  

It is not entirely evident that this will produce the same output, and we have seen no evidence that any of applications tested did actually have a 1-1 mapping between (any) document format and their internal object model. This also counts for Microsoft Office and its corresponding file types and OpenOffice itself. In short, this was a fact that we had to deal with in our tests.

On a funny note:

The conversion tools we used were all based on XSLT-transformation between the document formats. They are both XML-formats, so it is a good choice. However, we heard rumours that Novell would dump their OOXML-converter (based on XSLT) and develop their own converter based on the internal object model. It will be interesting to see, if it brings greater quality to the converters.

On a lighter note:

We saw in our tests that using the binary Microsoft Office file format as a middle-man when converting from OOXML to ODF (and back) actually produced the best results ... by a long shot. Having this step and using the binary Office file format as a type of "Lingua Franca", was more or less the key to "flaw-less conversion". If you stop and think about it, it makes perfect sense why we saw this. The Microsoft Office Binary file format is well established in the market (not thanks to Microsoft, but to reverse engineering) and the format has been arround for a long time. Basically, all applications can read it and all applications can write it. But why is this interesting? Well, OOXML is an XML-version of the binary Office file format, so since there are "no problems" with converting from the binary format to ODF, it should be technically relatively easy to convert from OOXML to ODF, since OOXML is a binary version of the binary file format.

It is just a matter of time ... and continious improvement of the format converters. 

Værktøjer, ODF, AODL

I de sidste par dage har jeg kigget på de tilgængelige værktøjer til generering af ODF-filer - pt. AODL. AODL er en del af ODF Toolkit til generering af ODF-dokumenter og AODL er C#-versionen af dette kit. Kildekoden til dette toolkit kan hentes fra OOs' CVS-repository.

Et ODF-dokument er jo reelt "blot" et ZIP-arkiv indeholdende en række folder og filer og det er et eller andet sted ikke noget, som er relevant for særligt mange. AODL ligger som et objektmæssigt abstraktionslag over dette faktum - og når man anvender biblioteket behøver man ikke at vide dette - altså før det tidspunkt, hvor koden ikke virker. Her er det til gengæld væsentligt at have viden om denne ZIP-fils sttruktur i forbindelse med fejlretning. Ved anvendelse af biblioteket danner man et "regnearks-objekt" (SpreadsheetDocument), hvis man vil lave et regneark, et "teksbehandlingsobjekt" (TextDocument), hvis man vil danne et tekstbehandlingsdokument etc. Man kan tilføje et afsnit (Paragraph) til et tekstbehandlingsdokument og lignende. Abstraktionsmæssigt er implementeringen altså meget tæt på den måde, som man bruger et tekstbehandlingssystem på og det er ganske intuitivt at komme igang med at lave dokumenter.

Et eksempel på dette taget fra AODL-code snippets, der viser dannelse af et tekstbehandlingsdokument:

TextDocument document = new TextDocument();
Paragraph paragraph = ParagraphBuilder.CreateStandardTextParagraph(document);
FormatedText formText = new FormatedText(document, "T1", "Some formated text!");
formText.TextStyle.TextProperties.Bold = "bold";

Ganske tilsvarende er eksempelkoden til generering af et regnearksdokument med en celle med indhold:

SpreadsheetDocument spreadsheetDocument = new SpreadsheetDocument();
Table table = new Table(spreadsheetDocument, "First", "tablefirst");
Cell cell = table.CreateCell("cell001");
cell.OfficeValueType = "string";
cell.CellStyle.CellProperties.Border = Border.NormalSolid;          
Paragraph paragraph = ParagraphBuilder.CreateSpreadsheetParagraph(spreadsheetDocument);
paragraph.TextContent.Add(new SimpleText(spreadsheetDocument, "Some text"));
table.InsertCellAt(2, 3, cell);

Jeg vil vove den påstand, at selv folk uden dyb programmeringsmæssig baggrund vil kunne give en beskrivelse af, hvad koden gør. Two thumbs up herfra!.


Selve designet af det begrebsmæssige abstraktionslag for ODF er - sagt igen - rigtigt godt. Jeg kan godt lide at arbejde med objekter med semantisk mening og arkitekturen er sådan relativt gennemført. Der anvendes polymorfi, nedarvning og objekterne er definerede via interfaces de rigtige steder. En løs gennemgang af koden indikerer, at de har lavet det rigtigt godt.

Men mest interessant - uderover objektmodellen - er jo hvordan den bruges. På objektet TextDocument findes en SaveTo()-metode med to overloads. SaveTo() er defineret i interfacet IDocument som både TextDocument() og SpreadsheetDocument() implementerer. Den mest simple af disse overloads tager en "filename"-parameter som argument. Naturligt nok gemmes indholdet af dokumentet i den specificerede fil. Den anden metode tager - udover en filename-parameter - et "IExporter"-objekt. Idéen med dette objekt er, at man kan gemme sit ODF-dokument ved at specificere en konkret exporter. Der medfølger en "PDFExporter" med AODL, og man må antage, at den gemmer ODF-dokumentet som en PDF-fil (jeg har ikke prøvet selv). Idéen er faktisk rigtigt god. Det er dermed muligt at implementere en konkret exporter - til HTML, RTF, OOXML eller hvad man måtte ønske - og så medsende den som argument til SaveTo-kaldet. Men der mangler i mine øjne en overload til SaveTo-metoden. Metoderne jeg kunne ønske mig at have i IDocument-interfacet er:

void SaveTo(System.IO.Stream stream);
void SaveTo(System.IO.Stream stream, AODL.Document.Export.IExporter iExporter);

Rationalet er, at det ikke altid er tilfældet, at man har behov for at persistere et dokument i en fysisk fil på disk. Naturligvis man kan hente den dannede fil fra disk og lave en stream af dette, men det er en lidt omvendt metodik. Skal et dokument dannes via en JIT-tankegang og sendes til fx en browser fra en web-applikation, er det et unødigt niveau af kompleksitet først at skulle danne filen på disk. Her kunne det være lækkert at kunne sende dokumentet til fx en MemoryStream og derefter til klienten. I/O-mæssigt ville dette være det mest optimale. At insistere på at danne en fil undervejs afstedkommer nemlig også yderligere problemer. Selve AODL skal nemlig konfigureres til at gemme filer på bestemte steder og i forbindelse med anvendelse af AODL på web er det ganske sandsynligt at det kommer til at give problemer med skriverettigheder til disk. Men hvis jeg lægger mine "bruger-briller" på hylden og i stedet kigger på AODL med udviklingsøjne, så kunne de næsten ikke implementere det på andre måder. Som nævnt ovenfor er et ODF-dokument reelt en fil- og folderstruktur, der er zippet i en fil. Indholdet af denne zip-fil skal jo være et eller andet sted indtil den dannes, og da der mig bekendt ikke findes metoder til at persistere en folderstruktur i hukommelsen, så har de været nødt til at lave den konkrete implementering.


Helt overordnet set er AODL et behageligt framework at arbejde med. Objektmodellen er intuitiv og let tilgængelig og da den er et OSS-projekt er kildekoden til det også fri. Det er helt klart en fordel og jeg drog netop nytte af dette, da jeg skulle finde ud af, hvordan framework'et var skruet sammen. På den negative side tæller at der i mine øjne mangler nogle muligheder for at gemme til andet end en disk. Det er også frustrerende at dokumentationen er sløset sat sammen - fx er kommentaren ved flere metoder og klasser "Summary of <some class>" - hvilket et default-kommentaren i Visual Studio 2003. Det kunne være rart med en mere gennembearbejdet dokumentation. Hvis man dykker helt ned i koden og eksemplerne herover, vil man lægge mærke til, at der efter instantiering af fx TextDocument() kaldes metoden .New(). Det synes jeg ærligt talt er en smule bekymrende, da et generelt designprincip i OOP er, at et objekt som udgangspunkt skal være klar til anvendelse efter en ctor er kaldet. At kræve at en ekstra metode afvikles er blot at introducere fejl. Fejlhåndtering i klassebiblioteket er også reelt ikke-eksisterende, da fejl blot genkastes, og det giver et overordnet indtryk af, at AODL ikke kvalitetsmæssigt er helt i top.

Min vurdering (af max 5 mulige smileys):