Do your math - OOXML and OMML (Updated 2008-02-12)

by jlundstocholm 30. January 2008 04:26

As I promised in my latest article about ODF and MathML, I have worked a bit with the ECMA equivalents of ODF and MathML: OOXML and OMML (Office Math ML).

A bit of introduction is probably a good idea:

In OOXML, mathematical content is structured using an internal markup language, Office Math ML, or OMML for short. OMML is closely tied to the structure of WordprocessingML, and the look-and-feel is very similar. In contrast to the ODF way, OMML is usually inserted inline in the WordprocessingML, whereas in ODF the math is kept in a separate part of the package.

Ok - now that that is done with - let's get on with the good stuff!

As in my previous article, I'll work with the same base equation:

cos(π/4) = √2/2

Now, as I wrote in the other article, learning MathML is like learning a new (programming) language, and I can tell you, it is no different with OMML. MathML arranges the mathematical elements by position, whereas OMML arranges them by their explicit meaning, so a fraction is created in MathML as (simplified):

<math:mfrac>
  <math:mi>π</math:mi>
  <math:mn>4</math:mn>
</math:mfrac>

and in OMML it is created as (simplified):

<m:f>
  <m:num>
    <m:r>π</m:r>
  </m:num>
  <m:den>
    <m:r>4</m:r>
  </m:den>
</m:f>

So when dealing with MathML and e.g. fractions, we look at a fraction with "something at the top and something at the bottom". When dealing with OMML, we deal with "numerators" and "denominators". It is rather clear to me that any skills learned in MathML are not directly applicable to OMML - and vice versa. It took me about the same amount of time to "get" MathML as it did to "get" OMML. In both cases, I had not worked with the specific ML before. It has taken me about a day to research and write each article.
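To make the difference concrete, here is a minimal Python sketch (standard library only) that maps an OMML fraction onto a MathML fraction. The function names are hypothetical, and it assumes plain runs whose text sits in <m:t>-elements; it is nowhere near a full converter, just an illustration of "named parts" versus "first child on top, second child at the bottom".

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
MML = "http://www.w3.org/1998/Math/MathML"

def arg_text(arg):
    """Concatenate the text of all <m:t> elements below an OMML argument."""
    return "".join(t.text or "" for t in arg.iter(f"{{{M}}}t"))

def token(text):
    """Wrap text in <mn> if it looks like a number, otherwise in <mi>."""
    tag = "mn" if text.replace(".", "", 1).isdigit() else "mi"
    el = ET.Element(f"{{{MML}}}{tag}")
    el.text = text
    return el

def omml_fraction_to_mathml(f):
    """OMML names its parts (num/den); MathML relies on child order."""
    mfrac = ET.Element(f"{{{MML}}}mfrac")
    mfrac.append(token(arg_text(f.find(f"{{{M}}}num"))))  # first child = top
    mfrac.append(token(arg_text(f.find(f"{{{M}}}den"))))  # second child = bottom
    return mfrac

omml = f"""<m:f xmlns:m="{M}">
  <m:num><m:r><m:t>π</m:t></m:r></m:num>
  <m:den><m:r><m:t>4</m:t></m:r></m:den>
</m:f>"""

mfrac = omml_fraction_to_mathml(ET.fromstring(omml))
print(ET.tostring(mfrac, encoding="unicode"))
```

Note that the mapping only works one way without extra knowledge: going from MathML back to OMML, the code would have to decide which child is the numerator purely from its position.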

Anyway - back to the plot.

As always, I work with my friend, "the minimal OOXML-file". It is an OOXML-file stripped of all the junk and cut down to the bare minimum - not even a single unused namespace declaration is left behind. You can see the minimal file here: Minimal OOXML.docx (1,16 kb).

So my task was a two-step task: since OOXML is rather new, there is not that much information about OMML out there. So as a first step I created a sample equation using Word 2007 to get a feeling for what it's all about. Then I found Part 4 of the OOXML-spec, located section 7 and started to put the OMML together. The OMML I ended up with was this:

<m:oMathPara>
  <m:oMath>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>cos</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:r>
          <w:rPr>
            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
          </w:rPr>
          <m:t>π</m:t>
        </m:r>
      </m:num>
      <m:den>
        <m:r>
          <m:t>4</m:t>
        </m:r>
      </m:den>
    </m:f>
    <m:r>
      <m:t>=</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:rad>
          <m:radPr>
          </m:radPr>
          <m:deg/>
          <m:e>
            <m:r>
              <m:t>2</m:t>
            </m:r>
          </m:e>
        </m:rad>
      </m:num>
      <m:den>
        <m:r>
          <m:t>2</m:t>
        </m:r>
      </m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>

I bet you are now thinking what I was thinking: what the f***? That's a lot of markup! Well, the reason why there is so much markup is that each piece of text/data in the equation is encapsulated in a "run"-element that enables additional styling. If all this additional markup including other property-markup is removed, the result is this:

<m:oMathPara>
  <m:oMath>
    cos
    <m:f>
      <m:num>π</m:num>
      <m:den>4</m:den>
    </m:f>
    =
    <m:f>
      <m:num>
        <m:rad>
          <m:e>2</m:e>
        </m:rad>
      </m:num>
      <m:den>2</m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>

Ain't that purdy?
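The stripping can also be done mechanically. The following Python sketch (standard library only) drops every property element and collapses each run to its bare text. The helper names are hypothetical, and the rule "anything whose name ends in Pr is a property element" is my own simplifying assumption; the result is for reading only and is not meant to be opened in Word.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]

def simplify(el):
    """Copy el without *Pr property elements and with every run collapsed to text."""
    new = ET.Element(el.tag)  # attributes are dropped on purpose
    for child in el:
        name = local_name(child.tag)
        if name.endswith("Pr") or (name == "deg" and len(child) == 0):
            continue  # skip run/radical properties and the empty degree
        if child.tag == f"{{{M}}}r":
            text = "".join(t.text or "" for t in child.iter(f"{{{M}}}t"))
            if len(new) == 0:
                new.text = (new.text or "") + text          # text before any child
            else:
                new[-1].tail = (new[-1].tail or "") + text  # text after last child
        else:
            new.append(simplify(child))
    return new

RUN_PR = '<w:rPr><w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/></w:rPr>'
omml = f"""<m:oMathPara xmlns:m="{M}" xmlns:w="{W}">
  <m:oMath>
    <m:r>{RUN_PR}<m:t>cos</m:t></m:r>
    <m:f>
      <m:num><m:r>{RUN_PR}<m:t>π</m:t></m:r></m:num>
      <m:den><m:r><m:t>4</m:t></m:r></m:den>
    </m:f>
    <m:r><m:t>=</m:t></m:r>
    <m:f>
      <m:num>
        <m:rad><m:radPr/><m:deg/><m:e><m:r><m:t>2</m:t></m:r></m:e></m:rad>
      </m:num>
      <m:den><m:r><m:t>2</m:t></m:r></m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>"""

skeleton = simplify(ET.fromstring(omml))
print(ET.tostring(skeleton, encoding="unicode"))
```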

The OOXML-file with the equation is available here: minimal ooxml with math.docx (1,25 kb). It displays like this in Microsoft Office 2007:

Why not just use MathML?

Before I go into the details with converting from MathML to OMML, I think it is appropriate to pause and look at how MathML and OMML differ from each other. As I noted above there is quite a lot of "overhead" in OMML with everything being encapsulated in "runs". But there is a reason for this. The overhead enables us to do a couple of things that we cannot do with MathML.

Everything fits

You can put virtually everything into an OMML-formula that you can put into a normal WordprocessingML-fragment. As Murray Sargent puts it:

Word needs to allow users to embed arbitrary span-level material (basically anything you can put into a Word paragraph) in math zones and MathML is geared toward allowing only math in math zones. A subsidiary consideration is the desire to have an XML that corresponds closely to the internal format, aiding performance and offering readily achievable robustness. Since both MathML and OMML are XMLs, XSLTs can (and have) been created to convert one into the other. So it seems you can have your cake and eat it too. Thank you XML!

MathML allows some styling of the individual text fragments in the equations, but that's basically it.

WordprocessingML look-and-feel is preserved

To me it is really nice to work with markup for equations that is similar to the markup surrounding it. If I were to use MathML inline instead of OMML, the markup would be completely different from the markup around it. You can say that using MathML enables you to reuse any MathML-skills you might have in advance. Similarly you can say that using OMML for equations enables you to reuse the skills you have from working with WordprocessingML. It's kind of a "give-and-take" situation.

Revision-control (change-tracking) is possible

Having the overhead enables change-tracking on the same granular level as with your regular text. You can track changes in your equations on a character-by-character basis. In Word 2007 it looks like this when I make a modification to the equation (multiply the second fraction by "2" and remove the cosine-function from the first fraction).

 

 

The markup enabling this is here (for removing the cosine function, where "w:del" means "delete"):

<w:del w:id="0" w:author=" Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">
  <m:r>
    <w:rPr>
      <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
    </w:rPr>
    <m:t>cos</m:t>
  </m:r>
</w:del>
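Because the tracked change is just ordinary markup wrapped around the math run, it can be read back with any XML tooling. Here is a minimal Python sketch (standard library only) that lists who deleted what inside an equation. The function name is hypothetical, and the second run containing "x" is a made-up placeholder to show that untouched runs are ignored.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def deleted_math_text(root):
    """Return (author, date, text) for every <w:del> that wraps math runs."""
    found = []
    for deletion in root.iter(f"{{{W}}}del"):
        text = "".join(t.text or "" for t in deletion.iter(f"{{{M}}}t"))
        if text:
            author = (deletion.get(f"{{{W}}}author") or "").strip()
            found.append((author, deletion.get(f"{{{W}}}date"), text))
    return found

fragment = f"""<m:oMath xmlns:m="{M}" xmlns:w="{W}">
  <w:del w:id="0" w:author=" Jesper Lund Stocholm" w:date="2008-01-30T10:41:00Z">
    <m:r>
      <w:rPr><w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/></w:rPr>
      <m:t>cos</m:t>
    </m:r>
  </w:del>
  <m:r><m:t>x</m:t></m:r>
</m:oMath>"""

deletions = deleted_math_text(ET.fromstring(fragment))
print(deletions)
```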

Change tracking at this level is not at all possible when using MathML out-of-the-box. You cannot merge the MathML with other markup like this, and if you use MathML as it is done in ODF (i.e. not "inline"), it is simply impossible (at least as far as I can see). MathML in ODF is treated as an external object, which means that it is encapsulated in an OpenDocument Draw frame. The markup for one of the files I used in the other article is like this:

<text:p text:style-name="Standard">
 <draw:frame
   draw:style-name="fr1"
   draw:name="Objekt1"
   text:anchor-type="as-char"
   svg:width="2.418cm"
   svg:height="1.034cm"
   draw:z-index="0"
 >
  <draw:object
    xlink:href="./MathML"
    xlink:type="simple"
    xlink:show="embed"
    xlink:actuate="onLoad"
  />
  <draw:image
    xlink:href="./ObjectReplacements/MathML"
    xlink:type="simple"
    xlink:show="embed"
    xlink:actuate="onLoad"
  />
 </draw:frame>
</text:p>

If I wanted to change some text like "Display equation below" to "Disrply equation below" (add an 'r' and delete an 'a') in ODT, it would look something like this:

<text:p>
  Dis<text:change-start text:change-id="ct102825880"/>
  r<text:change-end text:change-id="ct102825880"/>
  pl<text:change text:change-id="ct102844952"/>
  y equation below
</text:p>

So registration of the changes is - as with OOXML - merged into the text being modified. I think you could mark the whole equation as "modified" in ODF by wrapping the complete <draw:object>-element in a <text:change-start>/<text:change-end> pair, but I am not sure it would work. Also, OpenOffice.org doesn't seem to register changes to MathML-zones at all. Using OpenOffice.org it looks like this:

 

(I changed the denominator of the first fraction to "54") 

 

I cannot say that there are (or are not) other areas where MathML just doesn't cut it - these were just a couple of those that I have experienced myself. I do believe, though, that the examples above warrant the simple question:

Why the hell did OASIS ODF TC decide to use MathML in the first place?

Interoperability

Interoperability is clearly what the young kids want these days - so let's see what we can do with mathematical content. MathML and OMML are clearly two different markup languages, but is it possible to convert between them? Fortunately, it is. Microsoft Office 2007 allows copy/paste of MathML into OMML-equations, and it can even export OMML to MathML. Luckily for us, the logic around this is not embedded into some fancy place in Microsoft Office 2007 - it is done using simple XSLT-transformations. They have made the stylesheets OMML2MML.xsl and MML2OMML.xsl, and if you apply these to either your OMML or MathML, it is translated to the other. Just for the fun of it I tried to convert the OMML-version of the equation to MathML. All I did was to find the OMML2MML.XSL and insert a single line (the xml-stylesheet processing instruction) in the XML-file document.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="OMML2MML.XSL"?>
<w:document
  xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  >
  <w:body>
    <w:p>
      <m:oMathPara>
        <m:oMath>
          <m:r>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
            </w:rPr>
            <m:t>cos</m:t>
          </m:r>
...

(and then I processed the file using my favorite XSLT-translator)
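If you would rather not edit document.xml by hand, the processing instruction can be inserted programmatically. This is a minimal Python sketch using only the standard library; the function name is hypothetical, and the href assumes that OMML2MML.XSL has been copied next to the XML file. The sketch only adds the line - an XSLT processor that honours the instruction still has to do the actual transformation.

```python
from xml.dom import minidom

def add_stylesheet_pi(xml_text, href):
    """Insert <?xml-stylesheet ...?> in front of the root element."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/xsl" href="{href}"'
    )
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()

# A cut-down document.xml, just enough to show where the line ends up.
document_xml = (
    '<w:document'
    ' xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"'
    ' xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><m:oMathPara><m:oMath>'
    '<m:r><m:t>cos</m:t></m:r>'
    '</m:oMath></m:oMathPara></w:p></w:body></w:document>'
)

result = add_stylesheet_pi(document_xml, "OMML2MML.XSL")
print(result)
```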

I'm sure - if you are a "technical" person - you have found yourself using/writing some code and just before you press "Compile" or "Run" you think: "This is sooo not gonna work". This was one of those situations for me - but you know what, it actually worked on the first try. The MathML generated is this:

<?xml version="1.0" encoding="utf-8"?>
<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <mml:mi mathvariant="italic">cos</mml:mi>
  <mml:mfrac>
    <mml:mrow>
      <mml:mi>π</mml:mi>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>4</mml:mn>
    </mml:mrow>
  </mml:mfrac>
  <mml:mo>=</mml:mo>
  <mml:mfrac>
    <mml:mrow>
      <mml:mroot>
        <mml:mrow>
          <mml:mn>2</mml:mn>
        </mml:mrow>
        <mml:mrow />
      </mml:mroot>
    </mml:mrow>
    <mml:mrow>
      <mml:mn>2</mml:mn>
    </mml:mrow>
  </mml:mfrac>
</mml:math>

... and it validates as well (using Amaya and changing the XML-file from a UTF-16 file to UTF-8)

Et voilà

Now, wouldn't it be cool if the MathML generated from the OMML could be used in an ODT-document? You know what ... it can! I took the MathML above and inserted it into the MathML-zone of the ODF-package of one of the documents I made for the ODF/MathML-article. The file is available here: minimal-mathml-omml-inject.odt (1,31 kb).
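Since an ODT-file is a zip package, the injection itself boils down to replacing one part. Here is a minimal Python sketch (standard library only) of that step. The function names are hypothetical, and the part name MathML/content.xml is an assumption derived from the xlink:href="./MathML" shown earlier; check the actual package before relying on it. The demo package is built in memory and is not a complete ODT-file.

```python
import io
import zipfile

def replace_part(package_bytes, part_name, new_data):
    """Return a copy of the zip package with one part replaced.

    Entry order and per-entry compression are kept as they were.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
         zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = new_data if info.filename == part_name else src.read(info.filename)
            dst.writestr(info, data)
    return out.getvalue()

def build_demo_package():
    """Tiny stand-in for an ODT package (NOT a complete document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        # Written first and uncompressed (zipfile's default for "w" mode).
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        z.writestr("MathML/content.xml", "<math/>",
                   compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()

new_mathml = (
    '<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    '<mml:mn>2</mml:mn></mml:math>'
)
patched = replace_part(build_demo_package(), "MathML/content.xml", new_mathml)

with zipfile.ZipFile(io.BytesIO(patched)) as z:
    print(z.namelist())
```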

The result of opening the file using OpenOffice.org:

In the words of Murray Sargent, I guess you can have your cake and eat it too after all.

:-)

Update:

When writing my post about where to get help for ODF-development I suddenly remembered that I missed a part of this article: "The quirks". Because - naturally there are quirks with using OMML with Microsoft Office 2007 ... just as there were with MathML and OpenOffice.org.

Now, if you take another look at the OMML/XML-fragment I created, there were two parts I really couldn't figure out a way to remove (the two <w:rPr>-elements):

<m:oMathPara>
  <m:oMath>
    <m:r>
      <w:rPr>
        <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
      </w:rPr>
      <m:t>cos</m:t>
    </m:r>
    <m:f>
      <m:num>
        <m:r>
          <w:rPr>
            <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
          </w:rPr>
          <m:t>π</m:t>
        </m:r>
      </m:num>

Now, the <w:rPr>-elements should have absolutely nothing to do with the content of the <m:t>-element - or more correctly, the visibility of the text in the <m:t>-element should not depend on the existence of a <w:rPr>-element. But if the two <w:rPr>-sections are omitted, the "cos"-text as well as the π-sign are not displayed. I really have no idea why this is so, so if you do, please let me know. Maybe one of the Microsoft Office 2007-Math guys could step in here?
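Until the cause is known, a pragmatic workaround is to make sure every math run carries the font property. The following Python sketch (standard library only) adds a <w:rPr> with "Cambria Math" to each <m:r> that lacks one. The function name is hypothetical, and whether every run really needs the property is not something I have established; the sketch simply adds it wherever it is missing.

```python
import xml.etree.ElementTree as ET

M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def ensure_math_font(root, font="Cambria Math"):
    """Add <w:rPr><w:rFonts .../></w:rPr> to every <m:r> without a <w:rPr>.

    Returns the number of runs that were changed.
    """
    added = 0
    for run in list(root.iter(f"{{{M}}}r")):  # list(): the tree is modified below
        if run.find(f"{{{W}}}rPr") is not None:
            continue
        rpr = ET.Element(f"{{{W}}}rPr")
        ET.SubElement(rpr, f"{{{W}}}rFonts",
                      {f"{{{W}}}ascii": font, f"{{{W}}}hAnsi": font})
        # Keep an existing <m:rPr> first; otherwise the new element goes in front.
        index = 1 if len(run) and run[0].tag == f"{{{M}}}rPr" else 0
        run.insert(index, rpr)
        added += 1
    return added

sample = f"""<m:oMath xmlns:m="{M}" xmlns:w="{W}">
  <m:r><w:rPr><w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/></w:rPr><m:t>cos</m:t></m:r>
  <m:r><m:t>π</m:t></m:r>
</m:oMath>"""

root = ET.fromstring(sample)
added = ensure_math_font(root)
print(added, ET.tostring(root, encoding="unicode"))
```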

Comments

2/13/2008 1:37:55 AM #

hAl

Two very nice articles about the Math support in ODF and OOXML. I'm thinking of adding the OOXML and OMML article as a reference in the wikipedia article.

hAl |

2/13/2008 2:05:42 AM #

jlundstocholm

hAl,

Thanks for your comment - it is nice to hear that the articles are actually of some use out there.

:-)

About wikipedia: Well, feel free ... that would be awesome.

jlundstocholm Denmark |

2/22/2008 12:40:26 AM #

hAl

Btw, I am not sure if you were already using this, but here is a tool to show the OOXML source of a MS Word document side by side with the document
blogs.code-counsel.net/.../Post.aspx?ID=28

Wouter also has an explorer package tool that can be used on all types of OOXML packages to explore the package content and for instance edit the XML files in them.

hAl |

2/22/2008 7:00:36 PM #

jlundstocholm

Hoi hAl,

Yes - I know both these tools. Unfortunately Wouter has not released a new edition of the Package Explorer since August 2007, but it is very nifty. I have also installed the developer add-on for Office 2007 (Source viewer) but it didn't help me with the quirks mentioned above.

Also - I was just notified about the "Microsoft Visual Studio Tools for the Office System Power Tools"

www.microsoft.com/.../details.aspx

(now all I need is VS 2008)

;-)

jlundstocholm Denmark |

3/23/2008 1:18:06 AM #

Wouter

Silently working on it though :-) Started Code Counsel, and having a kid... :-)

Wouter Netherlands |

4/17/2008 10:44:39 AM #

Olly

Why did you leave out the parentheses of the formula when you created the OMML formula? You have them in your starting point. Any particular reason for this?

Olly Finland |

4/21/2008 6:04:43 AM #

Murray Sargent

Thanks for posting the nice articles about OMML and MathML. Couple of quick thoughts: there are bugs in the Office 2007 MathML <--> OMML XSLTs, but corrected versions will be available for downloading soon. You're right that Word 2007 needs the WordProcessingML run property specifying the math font (see <w:rFonts>...). This is a bug, or at least an annoying limitation, and hopefully will be fixed in a later release. The idea is that the default math font is specified as a document-level default math property and should be used if no other font is specified. These default math properties are specified in an <m:mathPr> element like

    <m:mathPr>
        <m:mathFont m:val="Cambria Math"/>
        <m:brkBin m:val="before"/>
        <m:brkBinSub m:val="--"/>
        <m:smallFrac m:val="off"/>
        <m:dispDef/>
        <m:lMargin m:val="0"/>
        <m:rMargin m:val="0"/>
        <m:defJc m:val="centerGroup"/>
        <m:wrapIndent m:val="1440"/>
        <m:intLim m:val="subSup"/>
        <m:naryLim m:val="undOvr"/>
    </m:mathPr>

and appear in the \word\settings.xml part of the docx file. MathML doesn't have such default properties, although it can be used with them.

Murray Sargent United States |

4/22/2008 9:39:53 PM #

jlundstocholm

Hi Murray,

Thanks for replying to my post.

As for the <w:rFonts> I did notice the text in the spec saying:


2.3.2.24 rFonts (Run Fonts)

If this element is not present, the default value is to leave the formatting applied at previous level in the style hierarchy. If this element is never applied in the style hierarchy, then the text shall be displayed in any default font which supports each type of content.


I just assumed that Microsoft Office 2007 would know how to figure out a fall-back font to display the very normal chars, "c", "o" and "s".
:-)

And about the XSL-files to convert to/from MathML/OMML - will these be available under the OSP or something similar as well? As you wrote on your blog in blogs.msdn.com/.../...-word-2007-mathematics.aspx, the XSL-files ship with Microsoft Office 2007 and are thereby - license-wise - guarded by the Microsoft Office 2007 EULA.

Will anyone be able to download them and use them for the purpose they want?

Thanks for your time.

:-)

jlundstocholm Denmark |
