I just wanted to share with you a sign we noticed at the bistro we dined at yesterday.

See my Google map.

Tags : brm

It is only a matter of minutes until the BRM starts here in Geneva. It will be a tough week here with long meetings during the day and preparation in the evening (good bye Red Light District!). So to be able to concentrate fully on the task at hand, I will shut down this blog for the week ... but don't worry - I'll be back soon.

Now there are only a few days until I jump on a plane and head south to Geneva, Switzerland for the ISO/IEC SC34 Ballot Resolution Group Meeting, amongst laymen known primarily as "The BRM meeting". I cannot get my head around whether I am excited or worried about the outcome of the meeting ... thinking primarily about the enormous workload awaiting us down there. We will have to work through about 1000 unique dispositions of comments from ISO/IEC editor Rex - scattered over about 3500 comments in total. It's a daunting task indeed - not least for BRM convenor Alex Brown from BSI UK. Adding to this workload is the small detail that we will be 120 delegates dealing with it. It truly is breathtaking and I cannot help but feel like a mountain climber standing at the foot of Mount Everest waiting to start the journey upwards. I expect the days to consist of work in the BRM meeting during normal work hours and work in the evening at the hotel, sifting through the results of the day and preparing for the next.

I am also thinking quite a bit about what will actually take place in Geneva at the meetings. As I understand the ISO rules (and please note, I have been wrong before), after the BRM is done, the standard to approve is the original submission with the changes made in Geneva. In other words - if not a single disposition can be agreed upon, the standard stands as it did when it was submitted in Spring 2007. I really hope that the delegates opposing OOXML do not try to paralyze the BRM with a massive DoS attack on the process. As Alex Brown points out, it is the responsibility of the Heads of Delegation (HoD) that this does not happen, and if I look at what we have been told by the Danish HoD, it is clear to me that they actually have a lot of future credibility in standards work vested in this. If they are not able to perform in an orderly manner at the BRM, their influence in all the other work they are doing will be diminished. I hope this will keep the lid on most of the fanatic outbursts.

I am also looking forward to meeting some of the people I met in Kyoto in December 2007. Of course it is always nice to talk to people you agree with, but I sometimes get a bit bored with the "echo-chamber"-feeling of spending too much time with people of your own opinion. So I am even more looking forward to conversations with the delegates (and, yes, even the people of Open Forum Europe, who I have been told will be cheering us along in the corridors of the meeting) who are a bit more on the negative side of DIS 29500. It will be interesting to see what they think.

OOh ... and on Saturday I will go see Dinosaurs!

Wanna join?

A standard is not "free enough" if implementation of it depends on the existence of a proprietary technology on the specific platform. Ideally it should be possible simply to buy the specification and implement it without any other financial requirements.

This is where OOXML fails.

OOXML heavily depends on "Object Linking and Embedding Technology" also known as "OLE-technology". Section 9.3.3 of the specification deals with how objects are embedded in the file format. The section is divided in two where the first section specifies how to embed documents otherwise defined in this standard. These documents are defined as

*Formulas*, *Charts*, *Spreadsheets*, *Text documents*, *Drawings*, *Presentations*

This is one of the clear cases where it is obvious that Microsoft continuously tries to preserve their main cash cow: **The Microsoft Office ecosystem!** OOXML not only depends on Microsoft's proprietary technology OLE, the specification

The section goes on telling us about binary objects:

*Objects that do not have an XML representation. These objects only have a binary representation [...] (see [OLE]).*

WTF? Once again a reference and requirement to use proprietary technologies like OLE! What if I want to embed my own JLSObjectType? What if I want to embed some object from the Linux world like Bonobo elements or KParts? The schema elements only emphasize my point:

<draw:object/> and <draw:object-ole/>

Are you also puzzled by this? Well, I don't blame you. To wrap up - we can embed "our own documents" and we can embed everything else. There are even two separate elements from the draw-namespace that specify this for us: <draw:object/> and <draw:object-ole/>. The entire schema-fragment is included here for your pleasure:

<define name="draw-object">

<element name="draw:object">

<ref name="draw-object-attlist"/>

<choice>

<ref name="common-draw-data-attlist"/>

<ref name="office-document"/>

<ref name="math-math"/>

</choice>

</element>

</define>

<define name="draw-object-ole">

<element name="draw:object-ole">

<ref name="draw-object-ole-attlist"/>

<choice>

<ref name="common-draw-data-attlist"/>

<ref name="office-binary-data"/>

</choice>

</element>

</define>

This is yet another example of Microsoft on one hand claiming "openness" and with the other hand forcing everyone to use their own proprietary, undocumented technology.

But we're not done:

The embedded object is referenced through an XLink attribute in the enclosing frame-element. The behaviour is described as (bold typeface is my addition, /JLS):

*The xlink:href attribute links to the object representation, as follows:*

*For objects that have an* **[OO]***XML representation, the link references the sub package of the object. The object is contained within this sub page exactly as it would as it is a document of its own.*

*For objects that do not have an XML representation, the link references a sub stream of the package that contains the binary representation of the object.*

Wow - wait a minute: Is this it? Don't you think a bit of clarification would be in order?

The file format for the physical file is a Zip-archive with a number of files and folders in it. But this archive also contains a "TOC"-list of the files and the mime type of the entire package. The latter is not an XML-file - where do I put this? Where do I put the TOC-file? What if my spreadsheet contains an image? Since the image is not in XML-format (it's binary) ... would my entire spreadsheet qualify as having "*an XML representation*"? And did you notice the part "*the link references a sub stream of the package that contains the binary representation of the object.*"? A *stream*? *Binary representation*? Again totally unspecified behaviour, and no one will ever be able to implement this apart from Microsoft and Microsoft Office 2007.

Microsoft had a good chance to specify this properly in the beginning. They could have made an open format to enable competition or a format that would stifle competition. So what does Microsoft do? Yup, the anti-competitive choice. Anyone surprised?

What is interoperability, really?

Well, when it comes to document formats, some people seem to think that interoperability is the ability to transform one format to another. That high-fidelity interoperability can only be achieved when it is possible to perform a complete translation/conversion of format X to format Y.

The basic problem with this premise is that *if* you were able to do this conversion, it would be the same as being able to make a 1-1 mapping between the functionality and features of format X and format Y (and vice versa). However - this effectively means that format X is actually just a permutation of format Y ... making format X and format Y the same format (pick up your favorite book on mathematical topology to see the details).
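The 1-1 mapping argument can be illustrated with a toy model. This is only a sketch: it treats a format as a plain set of feature names, and the feature names are invented for the example, not taken from either specification.

```python
# Hypothetical feature sets; the names are illustrative only.
FORMAT_X = {"bold", "italic", "tables", "feature-only-in-x"}
FORMAT_Y = {"bold", "italic", "tables", "feature-only-in-y"}


def convert(document_features, target_format):
    """Keep only the features that the target format can express."""
    return document_features & target_format


def round_trip_is_lossless(document_features, source_format, other_format):
    """True if converting to the other format and back loses nothing."""
    there = convert(document_features, other_format)
    back = convert(there, source_format)
    return back == document_features


# A document using a feature that only exists in format X loses it on the way.
print(round_trip_is_lossless({"bold", "feature-only-in-x"}, FORMAT_X, FORMAT_Y))  # False
# A document restricted to the shared features survives the round trip.
print(round_trip_is_lossless({"bold"}, FORMAT_X, FORMAT_Y))  # True
```

In this model every document round-trips only when the two feature sets coincide, which is the point made above.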

When it comes to ODF and OOXML, the case is pretty clear - the two formats are not the same. Sure - they can both define **bold text**, but there are quite a few differences between the formats. A list of some of them can be found at the ODF-Converter website. I think that the list is the best argument for not being able to do a complete conversion of ODF to OOXML (and back). This was also one of the conclusions of the Fraunhofer/DIN-work in Germany, where they concluded that a full 1-1 mapping between the two formats could not be done.

The key question here is: *Is interoperability diminished by this fact?*

If you ask Rob's posse, they will almost certainly say "Yes". They will say something like "Microsoft chose not to make OOXML interoperable with the existing ISO-standard ODF and therefore OOXML is a blow to interoperability".

If you ask me, I will say "No". I will say no because the term "interoperability" has been hijacked by the anti-OOXML-lobby in much the same way the SVG-namespace was hijacked by ODF TC. I will say "No" because interoperability means something radically different. The meaning is not rocket sciency, really ... and usually most people agree with the basic definition of interoperability. A few of those are:

Computer Dictionary Online:

http://www.computer-dictionary-online.org/interoperability.htm?q=interoperability:

The ability of software and hardware on multiple machines from multiple vendors to communicate.

IEEE:

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&isnumber=4683&arnumber=182763&punumber=2267

the ability of two or more systems or components to exchange information and to use the information that has been exchanged

US e-Government Act of 2002:

ability of different operating and software systems, applications, and services to communicate and exchange data in an accurate, effective, and consistent manner.

If you also look at the enormous list from Google you will see, that none of the definitions talk about the ability to convert formats. Instead they talk about communication between machines, platforms and networks. This is very close to my definition of interoperability when it comes to document formats.

The interoperability gained by using a specific document format is based on the possibility of implementing the format on any kind of platform, in any kind of software, using any kind of operating system. It is based on how concise and clear the language of the specification is, and it depends on how well thought out the specification is.

It has nothing, nothing, nothing to do with the possibility of converting the format to any other format.

Working almost every day with implementing solutions that support ODF and OOXML, I am naturally tasked (or more appropriately: challenged) with ambiguities in the aforementioned specifications. At first glance ODF has an appealing simplicity and form, and reading the specification is almost like reading a book in natural "prose". However - the ease of reading sadly comes at the expense of clear language. So - as always when implementing any specification, you need to have somewhere to go to ask your technical questions regarding how to implement the damn thing or questions about how to read the devil.

And therein lies my problem:

Where do I go to get answers to these questions about ODF? Where is *the* website for ODF-development?

I have tried the forums at opendocument.xml.org - but the groups there are almost dead.

I have tried the mailing list for the OpenDocument TC, but it is also almost dead.

So please help me - where do I go?

Update: I almost forgot - I have also prowled the Danish blogosphere where the ongoing battle between OpenXml and ODF usually takes place, but no one has been able to give me any pointers to where they usually get their information about implementing ODF.

(or have I been so heavily stigmatised by being pro-choice that no one wants to help me?)

Tags : odf

OOXML has been accused of being rushed through, not only in the writing itself but also in certification in both ECMA and ISO. It's a quick accusation to make, but sometimes it can be really tricky to figure out if a statement is true or false. But you know, sometimes you stumble over something that really shows you that the specification was rushed through not only preliminary editing but also certification in ISO.

The one thing I noticed was password hashing. As with other document formats, document protection can be defined in multiple ways. There is of course protection of the document itself, but most document formats also allow protection of specific parts of the document or even read-only protection of the document. The way it's usually done is to ask the user for a password, hash it and store it in the document. When the document is opened the next time, the user is prompted for a password, and if its hash matches the stored value - the protection of the document (or parts of it) is released.

Now, this is defined, amongst other places, in section 4.4.1 (Section attributes) where it deals with protection of sections. The text says:

A section can be protected, which means that a user can not edit the section. The text:protected attribute indicates whether or not a section is protected. The user interface must enforce the protection attribute if it is enabled.

This is more or less what I wrote above. It also says:

A user can use the user interface to reset the protection flag, unless the section is further protected by a password. In this case, the user must know the password in order to reset the protection flag. The text:protection-key attribute specifies the password that protects the section. To avoid saving the password directly into the XML file, only a hash value of the password is stored.

And that's it.

WTF? Nothing more? Nothing about how to specify the hashing algorithm? Nothing about how to specify initialization vectors, prepending of zeroes ... nothing?

But wait - what if we look in the schema itself - maybe it's just the descriptive text that is a bit ... ahem ... limited. Ok - the schema says:

<define name="sectionAttr" combine="interleave">

<optional>

<attribute name="text:protection-key">

<ref name="string"/>

</attribute>

</optional>

</define>

Dammit - nothing here either. Notice also that it is not possible to store the way the hash-value is persisted. Is it a bit-sequence? A Hex'ed bit-sequence? A Base64-sequence? Nothing!
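To see why this matters to a consumer, here is a minimal sketch in Python. The choice of algorithms, the UTF-8 text encoding and the two output encodings are all guesses made for illustration; none of them comes from the specification, which is exactly the problem.

```python
import base64
import hashlib


def candidate_protection_keys(password):
    """Return plausible stored values for one password.

    Every algorithm/encoding combination here is an assumption; the
    specification text quoted above names none of them.
    """
    data = password.encode("utf-8")  # even the text encoding is a guess
    candidates = {}
    for algorithm in ("md5", "sha1", "sha256"):
        digest = hashlib.new(algorithm, data).digest()
        candidates[(algorithm, "hex")] = digest.hex()
        candidates[(algorithm, "base64")] = base64.b64encode(digest).decode("ascii")
    return candidates


for (algorithm, encoding), value in candidate_protection_keys("secret").items():
    print(f"{algorithm:7} {encoding:7} {value}")
```

One password gives six different strings, and a consuming application has no way of knowing which of them (if any) the producing application wrote into the attribute.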

But wait again - let's look into the file of an actual document with read-only protection. Let's see what is stored in the document. Well, the XML-fragment lists as:

<table:table

table:name="Ark1"

table:style-name="ta1"

table:protected="true"

table:protection-key="PnKGfjzdfrt6XxQxdTcQVqbmA/7Ro="

table:print="false"

>

Any clever suggestions for me as a document consumer as to what to do with this value? This is truly amazing. On one hand the authors talk about their document format being able to provide true and pure interoperability ... but they haven't specified something as common as document protection. I wonder how they can claim this with a straight face. Interoperability is certainly not enabled by limiting the details of the specification to as little as this ... but maybe they just hope no one will use this feature and thereby have "interoperability by rejection".

I cannot help but wonder: who in their right mind would put up a suggestion for standardisation of a document format that was unspecified in such a central feature as "document protection". This must be one of those places where

Yeah, well ...

Today - or was it yesterday? - Patrick Durusau issued an open letter regarding the standardization of OOXML. It is an interesting read - especially for those of us that have worked endless hours in NSBs with processing the dispositions of comments from IEC/ISO editor Rex Jaeschke. I will not dig too much into the details of the statement, since I am sure others will do so, just quietly note that it is nice once in a while to be appreciated and not only picked at because of our "lack of qualifications" and accusations of being angle-grapping, bribed, paid-for puppets only acting by the will of Microsoft.

Thank you, Patrick!

I will only quote this:

The OpenXML project has made a large amount of progress in terms of the openness of its development. Objections that do not recognize that are focusing on what they want to see and not what is actually happening with OpenXML

Ooh - and one prediction: I think the anti-OOXML-lobby will try to drop this like a hot potato. The Pro-choice side will naturally salute this - and the Pro-ODF side will quietly wait out the storm quietly mumbling "Nothing to see here, please pass along".

Yes, some of them might even use some of the skills they learned in the third part of the course they took, Hypocrisy 101.

"Talk is silver, but silence is gold"

As I promised in my latest article about ODF and MathML, I have worked a bit with the ECMA-equivalents of ODF and MathML: OOXML and OMML (Office Math ML).

A bit of introduction is probably a good idea:

In OOXML, mathematical content is structured using the internal markup language Office Math ML, or OMML for short. OMML is closely tied to the structure of WordprocessingML and the look-and-feel is very similar. In contrast to the ODF way, OMML is usually inserted *inline* in the WordprocessingML, whereas in ODF it is kept in a separate *part* of the package.

Ok - now that that is done with - let's get on with the good stuff!

As in my previous article, I'll work with the same base equation

Now, as I wrote in the other article, learning MathML is like learning a new (programming)-language, and I can tell you, it is no different with OMML. MathML arranges the mathematical elements by position whereas OMML arranges the mathematical elements by their explicit meaning, so a fraction is created in MathML as (simplified)

<math:mfrac>

<math:mi>π</math:mi>

<math:mn>4</math:mn>

</math:mfrac>

and in OMML it is created as (simplified)

<m:f>

<m:num>

<m:r>π</m:r>

</m:num>

<m:den>

<m:r>4</m:r>

</m:den>

</m:f>

So when dealing with MathML and e.g. fractions, we look at a fraction with "something at the top and something at the bottom". When dealing with OMML, we deal with "numerators" and "denominators". It is rather clear to me that any skills learned in MathML are not directly applicable to OMML - and vice versa. It took me about the same amount of time to "get" MathML as it did to "get" OMML. In both cases, I had not worked with the specific ML before. It has taken me about a day to research and write each article.

**Anyway - back to the plot.**
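The positional-versus-named difference can be made concrete with a small Python sketch. The helper `mfrac_to_omml` is hypothetical and only handles the trivial case of a fraction with two token children; it wraps each value in the `m:r`/`m:t` run structure that full OMML uses rather than the simplified form shown above.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
OMML = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def mfrac_to_omml(mfrac):
    """Map a simple MathML <mfrac> with two token children to an OMML <m:f>.

    Hypothetical helper for illustration: MathML identifies the parts by
    position (first child, second child), OMML by name (num, den).
    """
    numerator, denominator = list(mfrac)
    fraction = ET.Element(f"{{{OMML}}}f")
    for tag, child in (("num", numerator), ("den", denominator)):
        part = ET.SubElement(fraction, f"{{{OMML}}}{tag}")
        run = ET.SubElement(part, f"{{{OMML}}}r")
        text = ET.SubElement(run, f"{{{OMML}}}t")
        text.text = child.text
    return fraction


source = ET.fromstring(
    f'<math:mfrac xmlns:math="{MML}">'
    "<math:mi>π</math:mi><math:mn>4</math:mn>"
    "</math:mfrac>"
)
ET.register_namespace("m", OMML)
print(ET.tostring(mfrac_to_omml(source), encoding="unicode"))
```

Anything beyond this toy case (nested rows, radicals, styling) needs a real transformation, so treat this only as a picture of the structural difference.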

As always I work with my friend, "the minimal OOXML-file". It is an OOXML-file stripped of all the junk and cut down to the bare minimum - not even a single unused namespace declaration is left behind. You can see the minimal file here: Minimal OOXML.docx (1,16 kb).

So my task was a two-step-task: Since OOXML is rather new there is not that much information about OMML out there. So as first step I created a sample equation using Word 2007 to get a feeling of what it's all about. Then I found Part 4 of the OOXML-spec, located section 7 and started to put the OMML together. The OMML I ended with was this:

<m:oMathPara>

<m:oMath>

<m:r>

<w:rPr>

<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>

</w:rPr>

<m:t>cos</m:t>

</m:r>

<m:f>

<m:num>

<m:r>

<w:rPr>

<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>

</w:rPr>

<m:t>π</m:t>

</m:r>

</m:num>

<m:den>

<m:r>

<m:t>4</m:t>

</m:r>

</m:den>

</m:f>

<m:r>

<m:t>=</m:t>

</m:r>

<m:f>

<m:num>

<m:rad>

<m:radPr>

</m:radPr>

<m:deg/>

<m:e>

<m:r>

<m:t>2</m:t>

</m:r>

</m:e>

</m:rad>

</m:num>

<m:den>

<m:r>

<m:t>2</m:t>

</m:r>

</m:den>

</m:f>

</m:oMath>

</m:oMathPara>

I bet you are now thinking what I was thinking: what the f***? That's a lot of markup! Well, the reason why there is so much markup is that each piece of text/data in the equation is encapsulated in a *"run"*-element that enables additional styling. If all this additional markup including other property-markup is removed, the result is this:

<m:oMathPara>

<m:oMath>

cos

<m:f>

<m:num>π</m:num>

<m:den>4</m:den>

</m:f>

=

<m:f>

<m:num>

<m:rad>

<m:e>2</m:e>

</m:rad>

</m:num>

<m:den>2</m:den>

</m:f>

</m:oMath>

</m:oMathPara>

Ain't that purdy?

The OOXML-file with the equation is available here: minimal ooxml with math.docx (1,25 kb). It displays like this in Microsoft Office 2007:

Before I go into the details with converting from MathML to OMML, I think it is appropriate to pause and look at how MathML and OMML differ from each other. As I noted above there is quite a lot of "overhead" in OMML with everything being encapsulated in "runs". But there is a reason for this. The overhead enables us to do a couple of things that we cannot do with MathML.

You can put virtually everything into an OMML-formula that you can put into a normal WordprocessingML-fragment. As Murray Sargent puts it:

Word needs to allow users to embed arbitrary span-level material (basically anything you can put into a Word paragraph) in math zones and MathML is geared toward allowing only math in math zones. A subsidiary consideration is the desire to have an XML that corresponds closely to the internal format, aiding performance and offering readily achievable robustness. Since both MathML and OMML are XMLs, XSLTs can (and have) been created to convert one into the other. So it seems you can have your cake and eat it too. Thank you XML!

MathML allows some styling of the individual text fragments in the equations, but that's basically it.

To me it is really nice to work with markup for equations that is similar to the markup surrounding it. If I were to use MathML inline instead of OMML, the markup would be completely different from the markup around it. You can say that using MathML enables you to reuse any MathML-skills you might have in advance. Similarly you can say that using OMML for equations enables you to reuse the skills you have from working with WordprocessingML. It's kind of a "give-and-take"-situation.

Having the overhead enables change-tracking on the same granular level as with your regular text. You can track changes in your equations on a character-by-character basis. In Word 2007 it looks like this when I make a modification to the equation (multiply the second fraction with "2" and remove the *cosine*-function from the first fraction).

The markup enabling this is here (for removing the cosine function, where "w:del" means "delete"):

**<w:del w:id="0" w:author=" Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">**

<m:r>

<w:rPr>

<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>

</w:rPr>

<m:t>cos</m:t>

</m:r>

**</w:del>**

This is not at all possible when using MathML out-of-the-box. You cannot merge the MathML with other markup like this, and if you use MathML as it is done in ODF (i.e. not "inline") it is simply impossible (at least as far as I can see). MathML in ODF is treated as an external object, which means that it is encapsulated in an OpenDocument Draw frame. The markup for one of the files I used in the other article is like this:

<text:p text:style-name="Standard">

<draw:frame

draw:style-name="fr1"

draw:name="Objekt1"

text:anchor-type="as-char"

svg:width="2.418cm"

svg:height="1.034cm"

draw:z-index="0"

>

<draw:object

xlink:href="./MathML"

xlink:type="simple"

xlink:show="embed"

xlink:actuate="onLoad"

/>

<draw:image

xlink:href="./ObjectReplacements/MathML"

xlink:type="simple"

xlink:show="embed"

xlink:actuate="onLoad"

/>

</draw:frame>

</text:p>

If I wanted to change some text like "Display equation below" to "Disrply equation below" (add an 'r' and delete an 'a') in ODT, it would look something like this:

<text:p>

Dis<text:change-start text:change-id="ct102825880"/>

r<text:change-end text:change-id="ct102825880"/>

pl<text:change text:change-id="ct102844952"/>

y equation below

</text:p>

So registration of the changes is - as with OOXML - merged into the text being modified. I think you could mark the whole equation as "modified" in ODF by putting a <text:change-start>-element around the complete <draw:object>-element, but I am not sure it would work. Also, OpenOffice.org doesn't seem to register changes to MathML-zones at all. Using OpenOffice.org it looks like this

(I changed the denominator of the first fraction to "54")

I cannot say that there are (or are not) other areas where MathML just doesn't cut it - these were just a couple of those that I have experienced myself. I do believe, though, that the examples above warrant the simple question:

Why the hell did OASIS ODF TC decide to use MathML in the first place?

Interoperability is clearly what the young kids want these days - so let's see what we can do with mathematical content. MathML and OMML are clearly two different markup languages, but is it possible to convert between them? Fortunately it is. Microsoft Office 2007 allows copy/paste of MathML into OMML-equations and it can even export OMML to MathML. Luckily for us the logic around this is not embedded into some fancy place in Microsoft Office 2007 - it is done using simple XSLT-transformations. They have made the stylesheets OMML2MML.xsl and MML2OMML.xsl, and if you apply these to either your OMML or MathML, it is translated to the other. Just for the fun of it I tried to convert the OMML-version of the equation to MathML. All I did was to find the OMML2MML.XSL and insert a single line in the XML-file document.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

**<?xml-stylesheet type="text/xsl" href="OMML2MML.XSL"?>**

<w:document

xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"

xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"

>

<w:body>

<w:p>

<m:oMathPara>

<m:oMath>

<m:r>

<w:rPr>

<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>

</w:rPr>

<m:t>cos</m:t>

</m:r>

...

(and then I processed the file using my favorite XSLT-translator)
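The "insert a single line" step can be sketched in Python as plain string handling. The helper name `add_stylesheet_pi` is made up for this example, and whether a given XSLT processor honours the `xml-stylesheet` processing instruction depends on the processor, so this only prepares the file; it does not run the transformation.

```python
def add_stylesheet_pi(xml_text, href):
    """Insert an xml-stylesheet processing instruction after the XML declaration.

    Hypothetical helper: it works on the raw text so that the rest of the
    document is left byte-for-byte as it was.
    """
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    if xml_text.startswith("<?xml "):
        end = xml_text.index("?>") + 2  # position just after the declaration
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text


document = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"/>'
)
print(add_stylesheet_pi(document, "OMML2MML.XSL"))
```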

I'm sure - if you are a "technical" person - you have found yourself using/writing some code and just before you press "Compile" or "Run" you think: "This is sooo not gonna work". This was one of those situations for me - but you know what, it actually worked on the first try. The MathML generated is this

<?xml version="1.0" encoding="utf-8"?>

<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">

<mml:mi mathvariant="italic">cos</mml:mi>

<mml:mfrac>

<mml:mrow>

<mml:mi>π</mml:mi>

</mml:mrow>

<mml:mrow>

<mml:mn>4</mml:mn>

</mml:mrow>

</mml:mfrac>

<mml:mo>=</mml:mo>

<mml:mfrac>

<mml:mrow>

<mml:mroot>

<mml:mrow>

<mml:mn>2</mml:mn>

</mml:mrow>

<mml:mrow />

</mml:mroot>

</mml:mrow>

<mml:mrow>

<mml:mn>2</mml:mn>

</mml:mrow>

</mml:mfrac>

</mml:math>

... and it validates as well (using Amaya and changing the XML-file from a UTF-16 file to UTF-8)

Now, wouldn't it be cool if the MathML generated from the OMML could be used in an ODT-document? You know what ... it can! I took the MathML above and one of the documents I made for the ODF/MathML-article, and inserted the MathML into the MathML-zone of the ODF-package. The file is available here: minimal-mathml-omml-inject.odt (1,31 kb).
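For anyone wanting to repeat the injection without a Zip tool, here is a minimal Python sketch. It assumes the formula lives in a package member called `MathML/content.xml` (the name used in the demo is an assumption about the package layout), and the in-memory archive built below is only a stand-in, not a complete ODT.

```python
import io
import zipfile


def replace_member(package_bytes, member_name, new_content):
    """Return a copy of a Zip package with one member's content replaced."""
    source = zipfile.ZipFile(io.BytesIO(package_bytes))
    output = io.BytesIO()
    with zipfile.ZipFile(output, "w") as target:
        for info in source.infolist():
            if info.filename == member_name:
                data = new_content
            else:
                data = source.read(info.filename)
            # Keep the member order and each member's compression method, so an
            # uncompressed leading "mimetype" entry stays that way.
            target.writestr(info.filename, data, compress_type=info.compress_type)
    return output.getvalue()


# Build a tiny stand-in package in memory (member names are assumptions).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                     compress_type=zipfile.ZIP_STORED)
    package.writestr("MathML/content.xml", "<math/>",
                     compress_type=zipfile.ZIP_DEFLATED)

patched = replace_member(demo.getvalue(), "MathML/content.xml",
                         b"<math><mn>2</mn></math>")
print(zipfile.ZipFile(io.BytesIO(patched)).read("MathML/content.xml"))
```

Copying member by member, rather than appending, avoids ending up with two entries of the same name in the archive.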

The result of opening the file using OpenOffice.org:

In the words of Murray Sargent, I guess you can have your cake and eat it too after all.

When writing my post about where to get help for ODF-development I suddenly remembered that I missed a part of this article: "The quirks". Because - naturally there are quirks with using OMML with Microsoft Office 2007 ... just as there were with MathML and OpenOffice.org.

Now, if you take another look at the OMML/XML-fragment I created, there were two parts I really couldn't figure out a way to remove:

<m:oMathPara>

<m:oMath>

<m:r>

<w:rPr>

**<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>**

</w:rPr>

<m:t>cos</m:t>

</m:r>

<m:f>

<m:num>

<m:r>

<w:rPr>

**<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>**

</w:rPr>

<m:t>π</m:t>

</m:r>

</m:num>

Now, the <w:rPr>-elements should have absolutely nothing to do with the content of the <m:t>-element - or more correctly, the visibility of the text in the <m:t>-element should not depend on the existence of a <w:rPr>-element. But if the two <w:rPr>-sections are omitted, the "cos"-text as well as the π-sign are not displayed. I really have no idea why this is so, so if you do, please let me know. Maybe one of the Microsoft Office 2007-Math guys could step in here?

When I studied at DTU (Technical University of Denmark) I basically lived in the Department of Mathematics. I did my bachelor project there and I did my thesis there. I think it would be fair to say that math is really in my blood (or was).

Of course - in those days we wrote our equations in LaTeX (not the suit) and I remember how we laughed diabolically at our co-students that did their papers in e.g. Microsoft Word and had to use the really, really annoying "Equation Editor" (shudder). I remember how we also laughed at the students that did pictures and graphs in e.g. Adobe PhotoShop or Visio (before it was acquired by Microsoft, afaik), coz everybody knew that it had to be done using xFig ... the program with the worst possible UI ever ... at least in those days.

For the purpose of these articles (an article about Microsoft Office 2007 and OMML will follow shortly) I dug into my thesis and looked at how math was displayed using LaTeX. I created a "reference equation" to use when trying to display some math in either ODF or OOXML. The test equation I made was this:

\begin{equation}

\cos\Big(\frac{\pi}{4}\Big) = \Big(\frac{\sqrt{2}}{2}\Big)

\end{equation}

For those of you not speaking LaTeX fluently - you should consult the "Not so short introduction to LaTeX" chapter 3 - or simply behold the equation below:

In ODF mathematical notations are done using MathML (section 12.5) - a W3C-standard for displaying mathematical content. The mathematical content is embedded in the ODF-package as an object and as far as I can see, it is not possible to use MathML inline in the content of the paragraphs of the document itself. I have earlier talked about ODF being vague and this is imo one of the places where some clarity could help.

But - learning MathML is like learning a new language ... it doesn't really make sense in the beginning. So I started to poke around a bit on the W3C-website in search of some tools or tutorials that would help me figure out what MathML is all about. I eventually found a W3C tool called Amaya. It's a MathML/SVG-tool developed by W3C and I used this tool to create the MathML for the base equation above. In Amaya it looks like this:

The interesting part, of course, is the MathML created by Amaya. The MathML (slightly modified, but validated) is listed below

<?xml version="1.0" encoding="utf-8" ?>

<math xmlns="http://www.w3.org/1998/Math/MathML">

<mrow>

<mtext>cos</mtext>

<mo>(</mo>

<mfrac>

<mi>π</mi>

<mn>4</mn>

</mfrac>

<mo>)</mo>

<mi>=</mi>

<mo>(</mo>

<mfrac>

<msqrt>

<mn>2</mn>

</msqrt>

<mn>2</mn>

</mfrac>

<mo>)</mo>

</mrow>

</math>

If you look at the XML, it is pretty easy to identify the different parts of the equation.
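One way to check that reading is to let a few lines of Python print the element tree as an outline. This is just a sketch on a shortened copy of the markup (only the left-hand side of the equation), and `outline` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the MathML: only the left-hand side of the equation.
MATHML = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mtext>cos</mtext><mo>(</mo>
    <mfrac><mi>π</mi><mn>4</mn></mfrac>
    <mo>)</mo>
  </mrow>
</math>"""


def outline(element, depth=0):
    """Yield (depth, local tag name, text) for every element in document order."""
    tag = element.tag.split("}")[-1]  # drop the "{namespace}" prefix
    text = (element.text or "").strip()
    yield depth, tag, text
    for child in element:
        yield from outline(child, depth + 1)


for depth, tag, text in outline(ET.fromstring(MATHML)):
    print("  " * depth + tag + (f": {text}" if text else ""))
```

The indentation of the output mirrors the nesting: the fraction sits inside the row, and π and 4 sit inside the fraction as its first and second child.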

So - in theory I should be able to put this into an ODF-document and it would be displayed when opening the document using OpenOffice.org - the reference implementation of ODF.

Let's see

Now, this was the easy part. I cannot figure out how to insert a regular "Pi"-sign in the formula, but the formula looks just fine. The file is available here: math.odt (9,72 kb). It looks like this:

This was a bit more tricky, since somehow it seems that the mathematical formula can only be contained in a file called "content.xml" - otherwise OpenOffice.org simply shuts down. Also, I have removed all meta-data, styling, extra namespace-declarations, embedded thumbnails and graphical representation of the formula. The cut-down ODT-file is available here: math-minimal.odt (1,43 kb). The visual representation is completely like the original file.

The MathML created by OpenOffice.org looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
  <math:semantics>
    <math:mrow>
      <math:mi>cos</math:mi>
      <math:mrow>
        <math:mfenced math:open="" math:close="">
          <math:mfrac>
            <math:mi math:fontstyle="italic">pi</math:mi>
            <math:mn>4</math:mn>
          </math:mfrac>
        </math:mfenced>
        <math:mo math:stretchy="false">=</math:mo>
        <math:mfenced math:open="" math:close="">
          <math:mfrac>
            <math:msqrt>
              <math:mn>2</math:mn>
            </math:msqrt>
            <math:mn>2</math:mn>
          </math:mfrac>
        </math:mfenced>
      </math:mrow>
    </math:mrow>
    <math:annotation math:encoding="StarMath 5.0">cos left ( pi over 4 right ) = left (sqrt{2} over 2 right )</math:annotation>
  </math:semantics>
</math:math>

There are a couple of things to note about this. Firstly, I don't understand the DOCTYPE declaration

"<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">"

The doctype should not matter at all - and why they chose to use a "DTD Modified W3C MathML 1.01" is beyond me. I'm not saying it's an error - I just don't get it. Enlighten me, pleze. Secondly, the MathML created looks different from the MathML created by Amaya. However, just as the same paragraph can be presented in all sorts of ways using HTML, the same equation can be presented in different ways (e.g. *sin^{2}(x) + cos^{2}(x) = 1* is basically the same as

The picture below shows the content.xml loaded and displayed in Amaya. The green dot in the bottom right corner indicates that the MathML is valid. I have also made a test with embedding the MathML in an HTML-document and validated it against the W3C-validator and the result is the same.

Super!

Now, I have previously created the formula using Amaya and I just have to inject it into the ODT-file. I did, and the file is available here: mathml-minimal-error.odt (1,23 kb). The result is, however, not what I expected.

Ok - but as you might have noticed, all elements in the OOo MathML-file were namespace-prefixed, so maybe prefixing the Amaya elements in the same way will do the trick. I tried this as well but with the same result. File is available here: mathml-minimal-nsprefix-error.odt (1,24 kb).
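For a namespace-aware XML parser the prefix is only a spelling detail, which is why I did not expect it to matter. Here is a minimal Python sketch of the prefixing step using the standard library; the function names and the shortened sample formula are my own, not anything taken from OpenOffice.org or Amaya.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# A shortened, unprefixed sample in the style of the Amaya output.
UNPREFIXED = ('<math xmlns="%s"><mfrac><mn>1</mn><mn>2</mn></mfrac></math>'
              % MATHML_NS)


def add_math_prefix(xml_text):
    """Re-serialize MathML so every element carries the "math:" prefix."""
    ET.register_namespace("math", MATHML_NS)
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


def expanded_names(xml_text):
    """List the namespace-qualified element names, ignoring prefixes."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    prefixed = add_math_prefix(UNPREFIXED)
    print(prefixed)
    # The parsed element names are the same with or without the prefix.
    print(expanded_names(prefixed) == expanded_names(UNPREFIXED))
```

Both variants parse to the same namespace-qualified element names, so a consumer that treats them differently is looking at something other than the XML information itself.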

I finally figured out what is wrong with the way OpenOffice.org handles MathML-content. It turns out that if I take the Amaya MathML (without ns-prefix) and insert it into the original content.xml-file *but preserve the DOCTYPE-declaration*, it works almost as expected. File is available here: mathml-minimal-inject-succes.odt (1,30 kb).
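For comparison, a non-validating parser does not care whether the DOCTYPE is there. The Python sketch below puts the OpenOffice.org DOCTYPE in front of some unprefixed MathML and checks that the parsed elements are unchanged. The helper names and the one-element sample are assumptions for the illustration; the sketch says nothing about what OpenOffice.org does internally with the declaration.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE declaration as it appears in the OpenOffice.org content.xml.
OOO_DOCTYPE = ('<!DOCTYPE math:math PUBLIC '
               '"-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">')


def with_doctype(mathml_text):
    """Put an XML declaration and the OpenOffice.org DOCTYPE before the markup."""
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + OOO_DOCTYPE + "\n" + mathml_text)


def element_names(xml_text):
    """Parse without validation and list the element names in document order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [elem.tag for elem in root.iter()]


if __name__ == "__main__":
    body = '<math xmlns="http://www.w3.org/1998/Math/MathML"><mn>2</mn></math>'
    print(with_doctype(body))
    print(element_names(with_doctype(body)) == element_names(body))
```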

Well, some errors are introduced. The π-character is not displayed and the equation is displayed in bold. The equals sign has disappeared as well.

Just for the fun of it I took the MathML-file generated by OpenOffice.org and removed the <semantics>-element as well as the <annotation>-element. File is available here: mathml-minimal-inject-no-semantics.odt (1,35 kb). The result when opening it in OpenOffice.org is .. well ... sad:

I have absolutely no idea why it displays it like this. Removing the <semantics>-element and <annotation>-element should have no effect on the visual representation of the equation.
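The removal itself is a mechanical transformation. As a rough sketch (I did the edit by hand, so this is not the procedure used for the file above), the following Python unwraps every <semantics>-element and drops its <annotation>-children while keeping the presentation markup. The function name is hypothetical, and a <semantics>-element nested directly inside another one is not handled.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
M = "{%s}" % MATHML_NS  # namespace-qualified tag prefix used by ElementTree


def strip_semantics(xml_text):
    """Unwrap <semantics> and drop <annotation>, keeping the presentation markup."""
    ET.register_namespace("math", MATHML_NS)
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != M + "semantics":
                continue
            position = list(parent).index(child)
            kept = [node for node in child if node.tag != M + "annotation"]
            parent.remove(child)
            # Put the surviving children where the wrapper used to be.
            for offset, node in enumerate(kept):
                parent.insert(position + offset, node)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ('<math:math xmlns:math="%s"><math:semantics>'
              '<math:mrow><math:mn>2</math:mn></math:mrow>'
              '<math:annotation math:encoding="StarMath 5.0">2</math:annotation>'
              '</math:semantics></math:math>' % MATHML_NS)
    print(strip_semantics(sample))
```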

Well, I don't really know what to conclude. Most of the things I have shown above are imo due to errors in the implementation in OpenOffice.org, where MathML is clearly not implemented ~~correctly~~ sufficiently. It seems that there are some unwritten rules for how MathML is supposed to be used when working with it in OpenOffice.org, but they seem rather unclear and weird to me.

But how OpenOffice.org behaves is really not important to me - some implementations of ODF are better than others, and maybe other implementations do a better job at displaying MathML. The point should be how the specification says it should be used. Luckily the ODF-spec only talks about how MathML is used in a single place - section 12.5 Mathematical Content. It says that "Mathematical content is represented by MathML 2.0 (see [MathML])". The RelaxNG-snippet provided also tells us that you can put everything into a "math area", **<math:math>**:

<?xml version="1.0" encoding="UTF-8" ?>
<define name="math-math">
  <element name="math:math">
    <ref name="mathMarkup" />
  </element>
</define>
<!-- To avoid inclusion of the complete MathML schema, anything -->
<!-- is allowed within a math:math top-level element -->
<define name="mathMarkup">
  <zeroOrMore>
    <choice>
      <attribute>
        <anyName />
      </attribute>
      <text />
      <element>
        <anyName />
        <ref name="mathMarkup" />
      </element>
    </choice>
  </zeroOrMore>
</define>

So basically, all bets are off. I can only begin to wonder how other implementations of ODF use MathML.
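To show how little the snippet constrains, here is a hand-written Python approximation of what it accepts. This is not a RELAX NG validator, just a sketch that mirrors the pattern: the root must be <math:math>, and below it any attribute, text or element is allowed. The function name and the "banana" element are invented for the example.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def matches_math_area(xml_text):
    """Mimic the quoted pattern: only the name of the root element is constrained.

    Inside <math:math> the pattern allows any attribute, any text and any
    element, recursively, so there is nothing further to check.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well-formed XML
    return root.tag == "{%s}math" % MATHML_NS


if __name__ == "__main__":
    real = '<m:math xmlns:m="%s"><m:mn>2</m:mn></m:math>' % MATHML_NS
    bogus = ('<m:math xmlns:m="%s" xmlns:x="urn:example">'
             '<x:banana colour="yellow">not math</x:banana></m:math>' % MATHML_NS)
    print(matches_math_area(real), matches_math_area(bogus))
```

Under this reading, a math area full of non-MathML elements passes just as well as a real formula.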

As soon as I get the time for it, I'll write an article like this one about Office 2007 and OMML. I will investigate how to mark up mathematical content using OMML and I will also try to use the XSL-files provided by Microsoft in Office 2007 to create XSLT-translations of my base equation from OMML to MathML and vice versa.

... stay tuned ...
