(see updated content below)

Some time ago I wrote an article about ODF and the use of MathML for mathematical content. One of the quirks I couldn't get my head around was why the XML-file containing the MathML-document *had* to be named "content.xml". I put it like this:

This was a bit more tricky, since somehow it seems that the mathematical formula can only be contained in a file called "content.xml" - otherwise OpenOffice.org simply shuts down.

Well ... the answer came to me almost in a dream - or at least in one of those semi-awake states in the evening. Basically, it is the answer to a question I asked but never got answered: which parts of an ODF-package are mandatory, and are there any parts with pre-defined names? For embedded objects, section 9.3.3 of the ODF-specification says:

- The xlink:href attribute links to the object representation, as follows:
  - **For objects that have an XML representation, the link references the sub package of the object. The object is contained within this sub package exactly as it would as it is a document of its own.**
  - For objects that do not have an XML representation, the link references a sub stream of the package that contains the binary representation of the object.

Now, in ODF, MathML clearly has an XML-representation, and MathML is also a "real" "OpenDocument representation". So a MathML object is stored as a "sub package" within the ODF-package itself. And that brings me back to my original question. You see, even though a piece of MathML is not an OpenDocument file per se, it still has to be embedded as an entire ODF-package (without the ZIP-structure). Section 2.1 states this clearly:

*A document root element is the primary element of a document in OpenDocument format. It contains the entire document. All types of documents, for example, text documents, spreadsheets, and drawing documents use the same types of document root elements. The OpenDocument format supports the following two ways of document representation:*

- *As a single XML document.*
- *As a collection of several subdocuments within a package (see section 17), each of which stores part of the complete document. Each subdocument has a different document root and stores a particular aspect of the XML document. For example, one subdocument contains the style information and another subdocument contains the content of the document. All types of documents, for example, text and spreadsheet documents, use the same document and subdocuments definitions.*

And since an ODF-package requires the main part to be called "content.xml", the MathML-file needs to be called "content.xml" as well. There is also no manifest file in the sub-package to tell the name of the package part - hence the requirement for a fixed name for the main part. I wish this information were described more clearly in ODF instead of simply being *implied* by the text.
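To make the naming rule concrete, here is a minimal Python sketch. The helper names `embedded_object_main_part` and `read_embedded_object` are hypothetical (not part of any ODF toolkit), and the sketch simply assumes what section 9.3.3 implies: the xlink:href of a `draw:object` (e.g. "./MathML") names a sub package directory, and since that sub package has no manifest of its own, the object's XML must be found at the fixed name content.xml inside it.

```python
import io
import posixpath
import zipfile


def embedded_object_main_part(href: str) -> str:
    """Map a draw:object xlink:href such as "./MathML" to the package part
    holding the object's XML. Assumption: the sub package is laid out like
    a stand-alone ODF package, so its main part is always content.xml."""
    # normpath drops the leading "./" and any trailing "/".
    sub_package = posixpath.normpath(href)
    return posixpath.join(sub_package, "content.xml")


def read_embedded_object(package: bytes, href: str) -> str:
    """Read the XML of an embedded object straight out of a zipped package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(embedded_object_main_part(href)).decode("utf-8")


# Usage: build a tiny in-memory package with a MathML sub package.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", "<office:document-content/>")
    zf.writestr("MathML/content.xml", "<math:math/>")
print(read_embedded_object(buf.getvalue(), "./MathML"))  # <math:math/>
```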

... did I mention that I prefer the relationship-model of OPC?

(update 2008-05-16)

A bit down in the comment track of this post I promised to make an ODT-file with MathML embedded inline, as opposed to the "regular" OOo-way of embedding it as a separate object. Today I finally got around to doing it. It was actually really easy - I just took the embedded MathML-object from the ODF-package and pasted it into the correct location in the content.xml-file. A good thing about this approach is that you don't have to worry about specifying a DOCTYPE (the OOo-dependency), so I would say it is highly recommended. The XML looks like this:

[code=xml]<draw:frame
    draw:name="Objekt1"
    text:anchor-type="as-char"
    svg:width="2.972cm"
    svg:height="1.138cm"
    draw:z-index="0">
  <draw:object>
    <math:math>
      <math:mrow>
        <math:mtext>cos</math:mtext>
        <math:mo>(</math:mo>
        <math:mfrac>
          <math:mi>pi</math:mi>
          <math:mn>4</math:mn>
        </math:mfrac>
        <math:mo>)</math:mo>
        = <math:mo>(</math:mo>
        <math:mfrac>
          <math:msqrt>
            <math:mn>2</math:mn>
          </math:msqrt>
          <math:mn>2</math:mn>
        </math:mfrac>
        <math:mo>)</math:mo>
      </math:mrow>
    </math:math>
  </draw:object>
</draw:frame>[/code]

When opened in OOo (2.4 DA) the result looks like this:

The only remaining quirk is the missing equals sign, but I haven't had time to dig into the details yet.

If anyone can help and contribute here, that would be great.
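A hedged guess at the missing equals sign, not verified against OOo 2.4: in the fragment above, "=" is bare text directly inside `math:mrow`. In presentation MathML, character data belongs inside token elements such as `mi`, `mn`, `mo` or `mtext`, while layout elements like `mrow` should contain only elements, so a renderer may simply drop the stray text. Writing it as `<math:mo>=</math:mo>` would then be the fix. The Python sketch below (the function `stray_text` is hypothetical) flags such text. The namespace declaration is added to the sample because the excerpt above omits it, and `annotation` is whitelisted because OOo stores StarMath text there.

```python
import xml.etree.ElementTree as ET

# Elements allowed to contain character data directly. Everything else
# (mrow, mfrac, msqrt, ...) should only contain child elements.
TEXT_ELEMENTS = {"mi", "mn", "mo", "mtext", "ms", "annotation"}


def stray_text(xml: str) -> list[tuple[str, str]]:
    """Return (parent element, text) pairs for non-whitespace text that sits
    directly inside an element that is not expected to hold text."""
    found = []
    for elem in ET.fromstring(xml).iter():
        local = elem.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" part
        if local in TEXT_ELEMENTS:
            continue
        # elem.text precedes the first child; each child.tail follows that
        # child. Both are part of elem's own content.
        for chunk in [elem.text] + [child.tail for child in elem]:
            if chunk and chunk.strip():
                found.append((local, chunk.strip()))
    return found


fragment = (
    '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML"><math:mrow>'
    "<math:mo>)</math:mo>= <math:mo>(</math:mo>"
    "</math:mrow></math:math>"
)
print(stray_text(fragment))  # [('mrow', '=')]
```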

As I promised in my latest article about ODF and MathML, I have worked a bit with the ECMA-equivalents of ODF and MathML: OOXML and OMML (Office Math ML).

A bit of introduction is probably a good idea:

In OOXML, mathematical content is structured using the internal markup language Office Math ML, or OMML for short. OMML is closely tied to the structure of WordProcessingML, and the look-and-feel is very similar. In contrast to the ODF-way, OMML is usually inserted *inline* in the WordProcessingML, whereas in ODF the math is kept in a separate *part* of the package.

Ok - now that that is done, let's get on with the good stuff!

As in my previous article, I'll work with the same base equation

Now, as I wrote in the other article, learning MathML is like learning a new (programming) language, and I can tell you, it is no different with OMML. MathML arranges the mathematical elements by position, whereas OMML arranges them by their explicit meaning. So a fraction is created in MathML as (simplified)

[code=xml]<math:mfrac>
  <math:mi>π</math:mi>
  <math:mn>4</math:mn>
</math:mfrac>[/code]

and in OMML it is created as (simplified)

[code=xml]<m:f>
  <m:num>
    <m:r>π</m:r>
  </m:num>
  <m:den>
    <m:r>4</m:r>
  </m:den>
</m:f>[/code]

So when dealing with MathML and e.g. fractions, we look at a fraction as "something at the top and something at the bottom". When dealing with OMML, we deal with "numerators" and "denominators". It is rather clear to me that any skills learned in MathML are not directly applicable to OMML - and vice versa. It took me about the same amount of time to "get" MathML as it did to "get" OMML. In both cases, I had not worked with the specific ML before. It has taken me about a day to research and write each article.
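The difference between the two ways of thinking can be shown with a small Python sketch using the standard `xml.etree.ElementTree` module. The function `mfrac_to_omml` is hypothetical and far simpler than Microsoft's MML2OMML.XSL: it only handles an `mfrac` whose two children are plain tokens, and flattens each into a single `m:r`/`m:t` run. What it does show is the mapping itself: MathML's "first child" and "second child" become OMML's explicitly named `m:num` and `m:den`.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
OMML = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", OMML)


def m(local: str) -> str:
    """Qualified OMML tag name in ElementTree's {namespace}local form."""
    return f"{{{OMML}}}{local}"


def mfrac_to_omml(mfrac: ET.Element) -> ET.Element:
    """Map a MathML <mfrac> onto an OMML <m:f>.

    MathML identifies the parts by position (first child on top, second
    child at the bottom); OMML names them explicitly as <m:num>/<m:den>.
    Simplification: each part is flattened to its text in a single run.
    """
    top, bottom = list(mfrac)  # <mfrac> must have exactly two children
    frac = ET.Element(m("f"))
    for role, part in (("num", top), ("den", bottom)):
        run = ET.SubElement(ET.SubElement(frac, m(role)), m("r"))
        ET.SubElement(run, m("t")).text = "".join(part.itertext())
    return frac


mathml = ET.fromstring(f'<mfrac xmlns="{MML}"><mi>π</mi><mn>4</mn></mfrac>')
print(ET.tostring(mfrac_to_omml(mathml), encoding="unicode"))
```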

**Anyway - back to the plot.**

As always, I work with my friend, "the minimal OOXML-file". It is an OOXML-file stripped of all the junk and cut down to the bare minimum - not even a single unused namespace declaration is left behind. You can see the minimal file here: Minimal OOXML.docx (1,16 kb).

So my task had two steps. Since OOXML is rather new, there is not that much information about OMML out there, so as a first step I created a sample equation using Word 2007 to get a feeling for what it's all about. Then I found Part 4 of the OOXML-spec, located section 7 and started to put the OMML together. The OMML I ended up with was this:

[code=xml]<m:oMathPara>
  <m:oMath>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>cos</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:r>
          <w:rPr>
            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
          </w:rPr>
          <m:t>π</m:t>
        </m:r>
      </m:num>
      <m:den>
        <m:r>
          <m:t>4</m:t>
        </m:r>
      </m:den>
    </m:f>
    <m:r>
      <m:t>=</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:rad>
          <m:radPr/>
          <m:deg/>
          <m:e>
            <m:r>
              <m:t>2</m:t>
            </m:r>
          </m:e>
        </m:rad>
      </m:num>
      <m:den>
        <m:r>
          <m:t>2</m:t>
        </m:r>
      </m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>[/code]

I bet you are now thinking what I was thinking: what the f***? That's a lot of markup! Well, the reason why there is so much markup is that each piece of text/data in the equation is encapsulated in a *"run"*-element that enables additional styling. If all this additional markup including other property-markup is removed, the result is this:

[code=xml]<m:oMathPara>
  <m:oMath>
    cos
    <m:f>
      <m:num>π</m:num>
      <m:den>4</m:den>
    </m:f>
    =
    <m:f>
      <m:num>
        <m:rad>
          <m:e>2</m:e>
        </m:rad>
      </m:num>
      <m:den>2</m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>[/code]

Ain't that purdy?
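The simplification above can be sketched in Python as well. The function `strip_runs` is hypothetical: it replaces each `m:r` with the text of its `m:t` children and thereby drops the `w:rPr` run properties. The result is a reading aid rather than valid OMML: as far as the schema goes, `m:oMath` and `m:num` are not meant to hold bare text, and (see the quirks further down) Word 2007 seems to want the run properties anyway.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("m", M)


def strip_runs(elem: ET.Element) -> None:
    """In place: replace every <m:r> below elem by the text of its <m:t>
    children, dropping the run properties (<w:rPr>) along the way."""
    for child in list(elem):
        strip_runs(child)
    kept, leading = [], elem.text or ""
    for child in list(elem):
        if child.tag != f"{{{M}}}r":
            kept.append(child)
            continue
        text = "".join(t.text or "" for t in child.iter(f"{{{M}}}t"))
        text += child.tail or ""
        if kept:  # the text follows the previous element that stays
            kept[-1].tail = (kept[-1].tail or "") + text
        else:  # nothing before it yet: it becomes elem's leading text
            leading += text
        elem.remove(child)
    elem.text = leading or None


omath = ET.fromstring(
    f'<m:oMath xmlns:m="{M}" xmlns:w="{W}">'
    '<m:r><w:rPr><w:rFonts w:ascii="Cambria Math"/></w:rPr><m:t>cos</m:t></m:r>'
    "<m:f><m:num><m:r><m:t>π</m:t></m:r></m:num>"
    "<m:den><m:r><m:t>4</m:t></m:r></m:den></m:f>"
    "<m:r><m:t>=</m:t></m:r>"
    "</m:oMath>"
)
strip_runs(omath)
print(ET.tostring(omath, encoding="unicode"))
```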

The OOXML-file with the equation is available here: minimal ooxml with math.docx (1,25 kb). It displays like this in Microsoft Office 2007:

Before I go into the details of converting from MathML to OMML, I think it is appropriate to pause and look at how MathML and OMML differ from each other. As I noted above, there is quite a lot of "overhead" in OMML with everything being encapsulated in "runs". But there is a reason for this: the overhead enables us to do a couple of things that we cannot do with MathML.

You can put virtually anything into an OMML-formula that you can put into a normal WordprocessingML-fragment. As Murray Sargent puts it:

Word needs to allow users to embed arbitrary span-level material (basically anything you can put into a Word paragraph) in math zones and MathML is geared toward allowing only math in math zones. A subsidiary consideration is the desire to have an XML that corresponds closely to the internal format, aiding performance and offering readily achievable robustness. Since both MathML and OMML are XMLs, XSLTs can (and have) been created to convert one into the other. So it seems you can have your cake and eat it too. Thank you XML!

MathML allows some styling of the individual text fragments in the equations, but that's basically it.

To me it is really nice to work with markup for equations that is similar to the markup surrounding it. If I were to use MathML inline instead of OMML, the markup would be completely different from the markup around it. You can say that using MathML enables you to reuse any MathML-skills you might already have. Similarly, you can say that using OMML for equations enables you to reuse the skills you have from working with WordprocessingML. It's kind of a "give-and-take" situation.

Having the overhead enables change-tracking on the same granular level as with your regular text. You can track changes in your equations on a character-by-character basis. In Word 2007 it looks like this when I make a modification to the equation (multiply the second fraction by "2" and remove the *cosine*-function from the first fraction).

The markup enabling this is here (for removing the cosine function, where "w:del" means "delete"):

[code=xml]<w:del w:id="0" w:author="Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">
  <m:r>
    <w:rPr>
      <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
    </w:rPr>
    <m:t>cos</m:t>
  </m:r>
</w:del>[/code]
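Because the deletion is ordinary WordprocessingML markup wrapped around an ordinary math run, standard XML tooling can read it. A minimal Python sketch, with the hypothetical function name `deleted_math_runs`, that lists author, date and deleted math text for every `w:del`:

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def deleted_math_runs(xml: str) -> list[tuple]:
    """Return (author, date, deleted text) for every <w:del> element,
    concatenating the <m:t> text of the math runs it wraps."""
    result = []
    for deletion in ET.fromstring(xml).iter(f"{{{W}}}del"):
        text = "".join(t.text or "" for t in deletion.iter(f"{{{M}}}t"))
        result.append((deletion.get(f"{{{W}}}author"),
                       deletion.get(f"{{{W}}}date"), text))
    return result


sample = (
    f'<m:oMath xmlns:m="{M}" xmlns:w="{W}">'
    '<w:del w:id="0" w:author="Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">'
    "<m:r><m:t>cos</m:t></m:r></w:del>"
    "<m:f><m:num><m:r><m:t>π</m:t></m:r></m:num>"
    "<m:den><m:r><m:t>4</m:t></m:r></m:den></m:f>"
    "</m:oMath>"
)
print(deleted_math_runs(sample))
# [('Jesper Lund Stocholm', '2008-01-30T10:41:00Z', 'cos')]
```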

This is not at all possible when using MathML out-of-the-box. You cannot merge the MathML with other markup like this, and if you use MathML as it is done in ODF (i.e. not "inline") it is simply impossible (at least as far as I can see). MathML in ODF is treated as an external object, which means that it is encapsulated in an OpenDocument Draw frame. The markup for one of the files I used in the other article looks like this:

[code=xml]<text:p text:style-name="Standard">
  <draw:frame
      draw:style-name="fr1"
      draw:name="Objekt1"
      text:anchor-type="as-char"
      svg:width="2.418cm"
      svg:height="1.034cm"
      draw:z-index="0">
    <draw:object
        xlink:href="./MathML"
        xlink:type="simple"
        xlink:show="embed"
        xlink:actuate="onLoad"/>
    <draw:image
        xlink:href="./ObjectReplacements/MathML"
        xlink:type="simple"
        xlink:show="embed"
        xlink:actuate="onLoad"/>
  </draw:frame>
</text:p>[/code]

If I wanted to change some text like "Display equation below" to "Disrply equation below" (add an 'r' and delete an 'a') in ODT, it would look something like this:

[code=xml]<text:p>
  Dis<text:change-start text:change-id="ct102825880"/>r<text:change-end text:change-id="ct102825880"/>pl<text:change text:change-id="ct102844952"/>y equation below
</text:p>[/code]

So the registration of changes is - as with OOXML - merged into the text being modified. I think you could mark the whole equation as "modified" in ODF by placing <text:change-start>/<text:change-end>-markers around the complete <draw:object>-element, but I am not sure it would work. Also, OpenOffice.org doesn't seem to register changes to MathML-zones at all. Using OpenOffice.org it looks like this

(I changed the denominator of the first fraction to "54")
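For comparison, the ODF markers can be read with a few lines of Python. This is a hypothetical sketch (`change_markers` is not an existing API) that only handles markers sitting directly inside a `text:p`, and the change ids are shortened for readability. It collects the text between each `text:change-start`/`text:change-end` pair. A `text:change` is a point marker for a deletion, so it maps to an empty string here: the deleted text itself should be kept in the document's `text:tracked-changes` section rather than in the paragraph.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def change_markers(paragraph_xml: str) -> dict:
    """Map each change id in a <text:p> to the text its markers enclose.

    <text:change-start/> ... <text:change-end/> enclose inserted text;
    <text:change/> marks a point (a deletion) and encloses nothing.
    Simplification: only markers that are direct children of <text:p>.
    """
    paragraph = ET.fromstring(paragraph_xml)
    open_ids = []
    spans = {}
    for child in paragraph:
        kind = child.tag.rsplit("}", 1)[-1]
        change_id = child.get(f"{{{TEXT}}}change-id")
        if kind == "change-start":
            open_ids.append(change_id)
            spans[change_id] = ""
        elif kind == "change-end" and change_id in open_ids:
            open_ids.remove(change_id)
        elif kind == "change":
            spans[change_id] = ""
        # Text following this child belongs to every change still open.
        for open_id in open_ids:
            spans[open_id] += child.tail or ""
    return spans


paragraph = (
    f'<text:p xmlns:text="{TEXT}">Dis'
    '<text:change-start text:change-id="ct1"/>r'
    '<text:change-end text:change-id="ct1"/>pl'
    '<text:change text:change-id="ct2"/>y equation below</text:p>'
)
print(change_markers(paragraph))  # {'ct1': 'r', 'ct2': ''}
```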

I cannot say that there are (or are not) other areas where MathML just doesn't cut it - these were just a couple that I have experienced myself. I do believe, though, that the examples above warrant the simple question:

Why the hell did OASIS ODF TC decide to use MathML in the first place?

Interoperability is clearly what the young kids want these days - so let's see what we can do with mathematical content. MathML and OMML are clearly two different markup languages, but is it possible to convert between them? Fortunately it is. Microsoft Office 2007 allows copy/paste of MathML into OMML-equations, and it can even export OMML to MathML. Luckily for us, the logic behind this is not hidden in some fancy place in Microsoft Office 2007 - it is done using simple XSLT-transformations. Microsoft has made the stylesheets OMML2MML.XSL and MML2OMML.XSL, and if you apply these to either your OMML or your MathML, it is translated to the other. Just for the fun of it I tried to convert the OMML-version of the equation to MathML. All I did was find OMML2MML.XSL and insert a single line into the XML-file document.xml:

[code=xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- the single inserted line: -->
<?xml-stylesheet type="text/xsl" href="OMML2MML.XSL"?>
<w:document
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <m:oMathPara>
        <m:oMath>
          <m:r>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
            </w:rPr>
            <m:t>cos</m:t>
          </m:r>
          ...[/code]

(and then I processed the file using my favorite XSLT-translator)
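The processing instruction only takes effect in a tool that honours it (a browser, or an XSLT processor told to use it). Adding the line can be sketched in Python; the function name `add_stylesheet_pi` is hypothetical. Running OMML2MML.XSL itself from Python would need a third-party XSLT library such as lxml (`lxml.etree.XSLT`), since the standard library has no XSLT processor.

```python
import re


def add_stylesheet_pi(xml: str, href: str) -> str:
    """Insert <?xml-stylesheet type="text/xsl" href="..."?> right after the
    XML declaration, or at the very start if there is no declaration."""
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    # "<?xml" followed by whitespace is the declaration; "<?xml-stylesheet"
    # is not matched because "-" is not whitespace.
    decl = re.match(r"<\?xml\s[^?]*\?>", xml)
    if decl:
        return xml[: decl.end()] + "\n" + pi + xml[decl.end():]
    return pi + "\n" + xml


doc = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<w:document/>'
print(add_stylesheet_pi(doc, "OMML2MML.XSL"))
```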

I'm sure - if you are a "technical" person - you have found yourself using/writing some code and, just before you press "Compile" or "Run", thinking: "This is sooo not gonna work". This was one of those situations for me - but you know what, it actually worked on the first try. The MathML generated is this:

[code=xml]<?xml version="1.0" encoding="utf-8"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:mi mathvariant="italic">cos</mml:mi>
  <mml:mfrac>
    <mml:mrow>
      <mml:mi>π</mml:mi>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>4</mml:mn>
    </mml:mrow>
  </mml:mfrac>
  <mml:mo>=</mml:mo>
  <mml:mfrac>
    <mml:mrow>
      <mml:mroot>
        <mml:mrow>
          <mml:mn>2</mml:mn>
        </mml:mrow>
        <mml:mrow />
      </mml:mroot>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>2</mml:mn>
    </mml:mrow>
  </mml:mfrac>
</mml:math>[/code]

... and it validates as well (using Amaya, after converting the XML-file from UTF-16 to UTF-8).

Now, wouldn't it be cool if the MathML generated from the OMML could be used in an ODT-document? You know what ... it can! I took the MathML above and inserted it into the MathML-zone of one of the ODF-packages I made for the ODF/MathML-article. The file is available here: minimal-mathml-omml-inject.odt (1,31 kb).

The result of opening the file using OpenOffice.org:

In the words of Murray Sargent, I guess you can have your cake and eat it too after all.
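Swapping the MathML inside the package can also be done programmatically. A minimal Python sketch with a hypothetical `replace_part` function; it assumes the object lives in "MathML/content.xml" (matching the `./MathML` reference in the frame markup above) and writes the `mimetype` entry first and uncompressed, as the ODF package format asks for. Replacing an existing part leaves META-INF/manifest.xml untouched; adding a brand-new part would also mean listing it in the manifest, which this sketch does not do.

```python
import io
import zipfile


def replace_part(package: bytes, part: str, data: bytes) -> bytes:
    """Return a copy of a zipped ODF package with `part` replaced by `data`
    (or added, if it does not exist yet). The "mimetype" entry is written
    first and uncompressed; all other entries are deflated."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        names = src.namelist()
        if "mimetype" in names:
            dst.writestr("mimetype", src.read("mimetype"),
                         compress_type=zipfile.ZIP_STORED)
        for name in names:
            if name != "mimetype":
                dst.writestr(name, data if name == part else src.read(name))
        if part not in names:
            dst.writestr(part, data)
    return out.getvalue()


# Usage with a tiny in-memory stand-in for an ODT file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", "<office:document-content/>")
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    zf.writestr("MathML/content.xml", "<math:math/>")
new_package = replace_part(
    buf.getvalue(), "MathML/content.xml",
    b'<math xmlns="http://www.w3.org/1998/Math/MathML"/>')
```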

When writing my post about where to get help for ODF-development I suddenly remembered that I missed a part of this article: "The quirks". Because - naturally there are quirks with using OMML with Microsoft Office 2007 ... just as there were with MathML and OpenOffice.org.

Now, if you take another look at the OMML/XML-fragment I created, there were two parts I really couldn't figure out a way to remove:

[code=xml]<m:oMathPara>
  <m:oMath>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/> <!-- cannot be removed -->
      </w:rPr>
      <m:t>cos</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:r>
          <w:rPr>
            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/> <!-- cannot be removed -->
          </w:rPr>
          <m:t>π</m:t>
        </m:r>
      </m:num>[/code]

Now, the <w:rPr>-elements should have absolutely nothing to do with the content of the <m:t>-element - or more correctly, the visibility of the text in the <m:t>-element should not depend on the existence of a <w:rPr>-element. But if the two <w:rPr>-sections are omitted, the "cos"-text as well as the π-sign are not displayed. I really have no idea why this is so, so if you do, please let me know. Maybe one of the Microsoft Office 2007-Math guys could step in here?
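Until someone explains this, a pragmatic workaround is to always emit the run properties. A minimal Python sketch (the function `add_math_font` is hypothetical) that gives every `m:r` without a `w:rPr` the same Cambria Math `w:rFonts` that Word 2007 wrote in the fragment above. It assumes the element order inside `m:r` is `m:rPr`, then `w:rPr`, then `m:t`. It is untested against Word 2007 and only automates what the fragment shows.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("m", M)
ET.register_namespace("w", W)


def add_math_font(omath: ET.Element, font: str = "Cambria Math") -> int:
    """Give every <m:r> that lacks <w:rPr> a <w:rPr><w:rFonts/></w:rPr>
    naming `font` for both w:ascii and w:hAnsi. Returns the number of
    runs that were changed."""
    changed = 0
    for run in list(omath.iter(f"{{{M}}}r")):  # snapshot before editing
        if run.find(f"{{{W}}}rPr") is not None:
            continue
        rpr = ET.Element(f"{{{W}}}rPr")
        ET.SubElement(rpr, f"{{{W}}}rFonts",
                      {f"{{{W}}}ascii": font, f"{{{W}}}hAnsi": font})
        # Assumed order inside <m:r>: <m:rPr>, then <w:rPr>, then <m:t>.
        has_math_props = len(run) > 0 and run[0].tag == f"{{{M}}}rPr"
        run.insert(1 if has_math_props else 0, rpr)
        changed += 1
    return changed


omath = ET.fromstring(
    f'<m:oMath xmlns:m="{M}"><m:r><m:t>cos</m:t></m:r>'
    "<m:r><m:rPr/><m:t>4</m:t></m:r></m:oMath>"
)
print(add_math_font(omath))  # 2
```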

When I studied at DTU (Technical University of Denmark) I basically lived in the Department of Mathematics. I did my bachelor project there and I did my thesis there. I think it would be fair to say that math is really in my blood (or was).

Of course - in those days we wrote our equations in LaTeX (not the suit) and I remember how we laughed diabolically at our fellow students who did their papers in e.g. Microsoft Word and had to use the really, really annoying "Equation Editor" (shudder). I remember how we also laughed at the students who did pictures and graphs in e.g. Adobe Photoshop or Visio (before it was acquired by Microsoft, afaik), coz everybody knew that it had to be done using xFig ... the program with the worst possible UI ever ... at least in those days.

For the purpose of these articles (an article about Microsoft Office 2007 and OMML will follow shortly) I dug into my thesis and looked at how math was displayed using LaTeX. I created a "reference equation" to use when trying to display some math in either ODF or OOXML. The test equation I made was this:

\begin{equation}
\cos\Big(\frac{\pi}{4}\Big) = \Big(\frac{\sqrt{2}}{2}\Big)
\end{equation}

For those of you not speaking LaTeX fluently - you should consult the "Not so short introduction to LaTeX" chapter 3 - or simply behold the equation below:

In ODF mathematical notations are done using MathML (section 12.5) - a W3C-standard for displaying mathematical content. The mathematical content is embedded in the ODF-package as an object and as far as I can see, it is not possible to use MathML inline in the content of the paragraphs of the document itself. I have earlier talked about ODF being vague and this is imo one of the places where some clarity could help.

But - learning MathML is like learning a new language ... it doesn't really make sense in the beginning. So I started to poke around a bit on the W3C-website in search of some tools or tutorials that would help me figure out what MathML is all about. I eventually found a W3C tool called Amaya. It's a MathML/SVG-tool developed by W3C, and I used this tool to create the MathML for the base equation above. In Amaya it looks like this:

The interesting part, of course, is the MathML created by Amaya. The MathML (slightly modified, but validated) is listed below:

[code=xml]<?xml version="1.0" encoding="utf-8" ?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mtext>cos</mtext>
    <mo>(</mo>
    <mfrac>
      <mi>π</mi>
      <mn>4</mn>
    </mfrac>
    <mo>)</mo>
    <mi>=</mi>
    <mo>(</mo>
    <mfrac>
      <msqrt>
        <mn>2</mn>
      </msqrt>
      <mn>2</mn>
    </mfrac>
    <mo>)</mo>
  </mrow>
</math>[/code]

If you look at the XML, it is pretty easy to identify the different parts of the equation.

So - in theory I should be able to put this into an ODF-document and it would be displayed when opening the document using OpenOffice.org - the reference implementation of ODF.

Let's see

Now, this was the easy part. I cannot figure out how to insert a regular "Pi"-sign in the formula, but the formula looks just fine. The file is available here: math.odt (9,72 kb). It looks like this:

This was a bit more tricky, since somehow it seems that the mathematical formula can only be contained in a file called "content.xml" - otherwise OpenOffice.org simply shuts down. Also, I have removed all metadata, styling, extra namespace-declarations, embedded thumbnails and the graphical representation of the formula. The cut-down ODT-file is available here: math-minimal.odt (1,43 kb). The visual representation is identical to the original file.

The MathML created by OpenOffice.org looks like this:

[code=xml]<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">
<math:math xmlns:math="http://www.w3.org/1998/Math/MathML">
  <math:semantics>
    <math:mrow>
      <math:mi>cos</math:mi>
      <math:mrow>
        <math:mfenced math:open="" math:close="">
          <math:mfrac>
            <math:mi math:fontstyle="italic">pi</math:mi>
            <math:mn>4</math:mn>
          </math:mfrac>
        </math:mfenced>
        <math:mo math:stretchy="false">=</math:mo>
        <math:mfenced math:open="" math:close="">
          <math:mfrac>
            <math:msqrt>
              <math:mn>2</math:mn>
            </math:msqrt>
            <math:mn>2</math:mn>
          </math:mfrac>
        </math:mfenced>
      </math:mrow>
    </math:mrow>
    <math:annotation math:encoding="StarMath 5.0">cos left ( pi over 4 right ) = left (sqrt{2} over 2 right )</math:annotation>
  </math:semantics>
</math:math>[/code]

There are a couple of things to note about this. Firstly, I don't understand the DOCTYPE declaration:

"<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">"

The doctype should not matter at all - and why they chose to use a "DTD Modified W3C MathML 1.01" is beyond me. I'm not saying it's an error - I just don't get it. Enlighten me, please. Secondly, the MathML created looks different from the MathML created by Amaya. However - just as the same paragraph can be presented in all sorts of ways using HTML, the same equation can be presented in different ways (e.g. $\sin^2(x) + \cos^2(x) = 1$ is basically the same as

The picture below shows the content.xml loaded and displayed in Amaya. The green dot in the bottom right corner indicates that the MathML is valid. I have also made a test with embedding the MathML in a HTML-document and validated it against the W3C-validator and the result is the same.

Super!

Now, I had already created the formula using Amaya, so I just had to inject it into the ODT-file. I did, and the file is available here: mathml-minimal-error.odt (1,23 kb). The result is, however, not what I expected:

Ok - but as you might have noticed, all elements in the OOo MathML-file were namespace-prefixed, so maybe this will do the trick. I tried this as well but with the same result. File is available here: mathml-minimal-nsprefix-error.odt (1,24 kb).

I finally figured out what is wrong with the way OpenOffice.org handles MathML-content. It turns out that if I take the Amaya MathML (without ns-prefix) and insert it into the original content.xml-file *but preserve the DOCTYPE-declaration*, it works almost as expected. File is available here: mathml-minimal-inject-succes.odt (1,30 kb).

Well, some errors are introduced: the π-character is not displayed, the equation is displayed in bold, and the equals sign has disappeared as well.
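The edit that worked (Amaya's MathML, OOo's DOCTYPE) can be sketched in Python. The function `swap_root_keep_prolog` is hypothetical: it keeps everything before the root element of the original content.xml (the XML declaration and the DOCTYPE) and appends the root element of the new MathML. It relies on a simple regular expression and assumes the prolog contains no "<" inside comments or quoted strings.

```python
import re


def swap_root_keep_prolog(original: str, replacement: str) -> str:
    """Keep the prolog of `original` (XML declaration, DOCTYPE, comments)
    and put the root element of `replacement` after it."""

    def split_prolog(doc: str) -> tuple:
        # The root element starts at the first "<" not followed by "?" or
        # "!", i.e. the first tag that is not a PI, DOCTYPE or comment.
        match = re.search(r"<(?![?!])", doc)
        if match is None:
            raise ValueError("no root element found")
        return doc[: match.start()], doc[match.start():]

    prolog, _ = split_prolog(original)
    _, root = split_prolog(replacement)
    return prolog + root


ooo = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE math:math PUBLIC "-//OpenOffice.org//DTD Modified W3C MathML 1.01//EN" "math.dtd">\n'
    '<math:math xmlns:math="http://www.w3.org/1998/Math/MathML"/>'
)
amaya = (
    '<?xml version="1.0" encoding="utf-8" ?>\n'
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
)
print(swap_root_keep_prolog(ooo, amaya))
```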

Just for the fun of it, I took the MathML-file generated by OpenOffice.org and removed the <semantics>-element as well as the <annotation>-element. The file is available here: mathml-minimal-inject-no-semantics.odt (1,35 kb). The result when opening it in OpenOffice.org is .. well ... sad:

I have absolutely no idea why it displays it like this. Removing the <semantics>-element and the <annotation>-element should have no effect on the visual representation of the equation.
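For reference, the removal described above amounts to unwrapping `semantics`. A minimal Python sketch with a hypothetical `drop_semantics` function: it puts the first child of each `semantics` element (the presentation markup) in its place and discards the `annotation` siblings. It only reproduces the edit; it does not explain why OpenOffice.org renders the result differently.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"


def drop_semantics(math: ET.Element) -> ET.Element:
    """Replace each <semantics> element by its first child (the
    presentation markup), discarding the <annotation> siblings."""
    for parent in list(math.iter()):  # snapshot: the tree is edited below
        for index, child in enumerate(list(parent)):
            if child.tag == f"{{{MML}}}semantics" and len(child) > 0:
                presentation = child[0]
                presentation.tail = child.tail
                parent.remove(child)
                parent.insert(index, presentation)
    return math


ooo = ET.fromstring(
    f'<math:math xmlns:math="{MML}"><math:semantics>'
    "<math:mrow><math:mi>cos</math:mi></math:mrow>"
    '<math:annotation math:encoding="StarMath 5.0">cos</math:annotation>'
    "</math:semantics></math:math>"
)
drop_semantics(ooo)
```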

Well, I don't really know what to conclude. Most of the things I have shown above are imo due to errors in the implementation of OpenOffice.org where MathML is clearly not implemented ~~correctly~~ sufficiently. It seems that there are some unwritten rules to how MathML is supposed to be used when working with it in OpenOffice.org, but they seem rather unclear and weird to me.

But how OpenOffice.org behaves is really not important to me - some implementations of ODF are better than others, and maybe other implementations do a better job at displaying MathML. The point should be how the specification says it should be used. Luckily the ODF-spec only talks about how MathML is used in a single place - section 12.5 Mathematical Content. It says that "Mathematical content is represented by MathML 2.0 (see [MathML])". The RelaxNG-snippet provided also tells us that you can put everything into a "math area", **<math:math>**:

[code=xml]<?xml version="1.0" encoding="UTF-8" ?>
<define name="math-math">
  <element name="math:math">
    <ref name="mathMarkup" />
  </element>
</define>
<!-- To avoid inclusion of the complete MathML schema, anything -->
<!-- is allowed within a math:math top-level element -->
<define name="mathMarkup">
  <zeroOrMore>
    <choice>
      <attribute>
        <anyName />
      </attribute>
      <text />
      <element>
        <anyName />
        <ref name="mathMarkup" />
      </element>
    </choice>
  </zeroOrMore>
</define>[/code]

So basically, all bets are off. I can only begin to wonder how other implementations of ODF use MathML.

As soon as I get the time for it, I'll write an article like this one about Office 2007 and OMML. I will investigate how to mark up mathematical content using OMML, and I will also try to use the XSL-files provided by Microsoft in Office 2007 to translate my base equation from OMML to MathML and vice versa.

... stay tuned ...

Copyright © 2014 - Powered by BlogEngine.NET 2.9.1.0 - Theme by Farzin Seyfolahi