a 'mooh' point

clearly an IBM drone

Struck by the Wrath of Roy "Kahn" Schestowitz

As the real work of maintaining OOXML in ISO has begun, I have had some time to ponder over events throughout the last year - starting with the BRM in Geneva in February 2008.

Being in Geneva was really hard work, negotiating all day in a 120-seat plenum while in the evening preparing suggestions in cooperation with delegates from other countries. It was fun, but hard, nevertheless. I remember sitting on my bed in the hotel room trying to sort out everything while trying to keep up with the debates happening outside our meeting room (a de facto radio silence had been initiated voluntarily by the more prominent bloggers around the world, so no information was being released to the people desperate for the slightest amount of information).

One of the tools I used was to keep track of the sites referring to my blog and one evening as I sat eating Swiss chocolate on my bed in the hotel, I noticed a new referral from Google Groups.

link

link

link

Versioning of OOXML (thank you for all the fish)

One of the most pressing matters we had to deal with in Okinawa was a question raised by quite a few people including members of the national body of Switzerland as well as hAl on the blogs of Alex Brown, Doug Mahugh and yours truly:

How can you tell if a document is generated using the original set of schemas or the new (improved) ones?

The truth is: you can’t.

Well, at least not at the moment. You can get a hint from sniffing at various parts of the document, but there is no definitive way to do it. We all agreed that we had to come up with a solution, and we discussed (at length in session as well as during breaks, dinners and sight-seeing) what to do.

Roughly speaking, there are a few ways we could do it, including

  • Changing the namespace-name of the schemas
  • Expanding the conformance attribute to indicate version of OOXML
  • Adding an optional version attribute to the root elements of the documents (WordprocessingML, SpreadsheetML and PresentationML) defaulting to the original edition of ECMA-376.

Version attribute

Let me start with the last option, since it is the easiest one to explain and understand.

ODF has a “version”-attribute in the root element of ODF-documents. It is defined in the urn:oasis:names:tc:opendocument:xmlns:office:1.0-namespace, so when creating e.g. an ODF spreadsheet using OOo 3, you will see the following xml-fragment:

[code:xml]<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" (...)
  office:version="1.2">
</office:document-content>[/code]

The above would tell you to use version 1.2 of the ODF-spec – currently being drafted by OASIS.

We could do a similar thing with OOXML, that is, having an optional version-attribute with the version number of the applied flavor of OOXML. This approach would have some clear advantages. First and foremost it would allow all the existing applications supporting OOXML to do absolutely nothing to their existing code base to continue to be able to read and process OOXML-files in ECMA-376 1st Ed format. It would also enable them to use any existing schema-validation of content and all existing files in ECMA-376 would still be perfectly valid.
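To make the idea concrete, here is a minimal Python sketch of how a consumer might read such a version marker. The ODF part uses the real office:version attribute shown above. The OOXML part is purely hypothetical: the unqualified attribute name "version" and the default value "1.0" (standing for ECMA-376 1st Ed) are my assumptions for illustration, not something that exists in the specification.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Assumption for illustration: a missing attribute means the original
# ECMA-376 1st Ed, spelled "1.0" here.
DEFAULT_OOXML_VERSION = "1.0"


def odf_version(xml_text):
    """Return the office:version value of an ODF root element, or None."""
    root = ET.fromstring(xml_text)
    return root.get("{%s}version" % OFFICE_NS)


def ooxml_version(xml_text):
    """Return the (hypothetical) version attribute of an OOXML root element.

    Existing documents carry no such attribute, so they fall back to the
    default and keep working unchanged.
    """
    root = ET.fromstring(xml_text)
    return root.get("version", DEFAULT_OOXML_VERSION)
```

The point of the default is that a document written today, without any version attribute, is still read as the original edition.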

Expanding the conformance attribute

Another thing to do would be to expand the new conformance attribute. At the BRM in Geneva a new conformance attribute was added to the root elements to display to which version of OOXML the document conforms. You will perhaps recognize this XML-fragment
[code:xml]<w:document conformance="strict">
</w:document>[/code]
We could also use this attribute and add version information to it. A way to do it would be
[code:xml]<w:document conformance="transitional-1.0">
</w:document>[/code]
for the ECMA-376 1st Ed and something else for any subsequent versions.
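A consumer would then have to pick such a combined value apart. A small Python sketch, assuming the hypothetical "class-version" layout from the example above (nothing like this is defined in the specification today):

```python
def parse_conformance(value):
    """Split a value such as 'transitional-1.0' into (class, version).

    A plain value such as 'strict' has no version part, so the version
    is reported as None.
    """
    conformance_class, separator, version = value.partition("-")
    return conformance_class, (version if separator else None)
```

Note that every existing consumer comparing the attribute against the literal "strict" or "transitional" would stop matching once a version suffix is added, which is one of the costs of overloading the attribute.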

Fixing or solving?

The problem with the two alternatives mentioned above is that they provide an immediate fix, but they are in no way panaceas for the issue of versioning. In Geneva we split up OOXML into 4 distinct parts and tried the best we could to make sure that they were “islands” within themselves. So in the original submission’s Part 2 dealing with OPC, there were dependencies on WordprocessingML (AFAIK) and these were removed. The result is that you can now refer to ISO/IEC 29500-2 should you in your implementation need a packaging format where OPC suits your needs. The basic idea was exactly this: to provide a way for other standards to be able to “plug in” to OOXML and reuse specific parts of it.

The two fixes described above provide a fix for the problem with versioning of “the document stuff”; text documents, spreadsheets and presentations – but they do nothing for Part 2 and Part 3 (under the assumption that Part 4 will not change). The trouble is - this is not only a theoretical problem. ECMA TC46 working with XPS (XML Paper Specification) has based the package format for XPS on OPC. But it is difficult for them to refer to ISO/IEC 29500-2 OPC since it is not possible to distinguish the namespace name from its predecessor ECMA-376 1st Ed. So unless we figure out a solution, they will have to refer to ECMA-376 1st Ed (and it was my impression that they’d prefer to refer to ISO OPC instead).

This is kind of annoying or maybe even embarrassing. We (the ISO process) chose to split up OOXML to allow reuse – but the first time someone knocks on our door and wishes to do exactly that – we (unless we find a solution to this problem) will have to say: “Well, we didn’t actually mean it”.

Change the namespace-name

An entirely different approach would be to change the namespace name(s) of IS29500. The original names were along the lines of

http://schemas.openxmlformats.org/package/2006/content-types
http://schemas.openxmlformats.org/package/2006/relationships
http://schemas.openxmlformats.org/spreadsheetml/2006/main
(…)

So an alternative solution would be to change the values of the namespace name. The names above could be changed to

http://schemas.openxmlformats.org/package/IS29500-2008/content-types
http://schemas.openxmlformats.org/package/IS29500-2008/relationships
http://schemas.openxmlformats.org/spreadsheetml/IS29500-2008/main

(I would have liked to use a colon as separator between the ISO project number and year, but according to http://www.w3.org/TR/REC-xml/#sec-common-syn, it seems colons are not allowed in namespace names.)

What would be the consequence of this?

The up-side

Basically, changing the namespace name would solve the problem with distinguishing between ECMA-376 1st Ed and IS29500:2008. It would be trivial to distinguish content based on either standard and it would apply to all parts of the specification. Actually, it would apply to all schemas in the specification, so it would enable someone to create a document based on ECMA-376 OPC, IS29500 WordprocessingML and ECMA-376 DrawingML (even though this is permitted in the current version of OOXML). It would also give us the chance to have a fresh start with IS29500:2008 and give us a clean slate for our further work.
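How trivial? Roughly like the Python sketch below. The 2006 namespace name is the existing one; the IS29500-2008 name is only the proposal from this post and does not exist anywhere, so treat it as hypothetical.

```python
import xml.etree.ElementTree as ET

ECMA_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
# Hypothetical: the namespace name proposed in this post, not a real one.
ISO_MAIN = "http://schemas.openxmlformats.org/spreadsheetml/IS29500-2008/main"

EDITIONS = {
    ECMA_MAIN: "ECMA-376 1st Ed",
    ISO_MAIN: "IS29500:2008",
}


def edition_of(xml_text):
    """Name the edition a part belongs to, judged by its root namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""
    return EDITIONS.get(namespace, "unknown")
```

An application that only knows the 2006 name would land in the "unknown" branch for every IS29500 part, which is exactly the breakage discussed next.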

The down-side

Changing the namespace is sadly not a silver bullet – unfortunately the free lunch comes with nausea as well. The trouble is – by changing the namespace, applications that support ECMA-376 will break if they try to load documents based on IS29500 since the namespace will be foreign to them.

The question is, though: shouldn’t they?

The purpose of XML namespaces is to identify the vocabulary of the elements of an XML-fragment. So the real question could be: are we talking about a new vocabulary when going from ECMA-376 to IS29500:2008? Are the changes from the BRM so drastic that we wouldn’t expect applications supporting ECMA-376 to be able to load documents conforming to IS29500?

Well, it was of importance to ECMA and most of the delegates at the BRM to ensure that whatever we did to change the specification did not render existing documents nonconformant. We succeeded quite well in doing just this, so one could argue that the changes were not that big. However, this just concerns the transitional schemas. If you remember, the changes in schema structure were quite big. We divided one big chunk of schemas into two categories, “strict” and “transitional” and I would indeed argue that we changed the vocabulary by doing just that. We changed it from defining a vocabulary with a complete mess of legacy-stuff and new stuff into two separate piles with one “going-forward-vocabulary” and one “going-backwards-vocabulary”. Isn’t that big enough to change the namespace name?

Do it right the first time

At the WG4-meeting I was actually advocating for a simple addition of a version attribute and solving the bigger namespace problem at a later time in a revision of OOXML, but the more I think about it, the more I am convinced this is the wrong way. We are in a position right now where there are no applications out there supporting the full set of IS29500. Not changing the namespace name will not make the problem go away – it will just postpone the issue, and if we wait, the problem will become increasingly bigger as applications surface with support for IS29500. The problem will be even bigger if you have a long list of supporting applications rather than – as now – not a single one.

The more I think about it, the more I am sure the right way to do it is

  1. Add a new version attribute to the root elements defaulting to “1.0” which would be ECMA-376 1st Ed. IS29500:2008 would have version “1.1”.
  2. Change the namespace name for IS29500 in a manner as outlined above.

Vendors in the process of implementing IS29500 will then have to add some code to their application to support this.

But – I am in no way sure I have covered all angles. Am I missing something here?

:-)

Post WG4-meetings in Okinawa

 

Last week (week 4 of 2009) we had the first face-2-face meeting in SC34/WG4 on the Japanese island of Okinawa. Since there is quite a big overlap between the participants of WG4 and those of WG5, the two groups meet at the same time and place to minimize travel costs and time away.

Quite a lot of people had chosen to take the "small" trip to Okinawa, and at roll-call the first day, a total of 22 people sat around the table in the meeting room. Of these, 6 were from ECMA and 14 represented various national bodies (3 of these were employed by Microsoft).

How's that for full disclosure, eh?

The purpose of the meeting was to get started maintaining OOXML and to discuss what to do in the future. We were also to discuss the already submitted DRs and see what we could do about these.

One of the first things I realized on that morning was that by participating in standardization in ISO (and from what I hear, also most other standardisation organisations) you need to accept following a certain number of rules. As it turns out, we are in no way free to fix problems in the spec, we are in no way free to make new additions to the spec etc. As it turns out, there are rules constraining all of these activities. So the project editor (Rex Jaeschke) took us on a lengthy trip down "ISO-regulation-lane". The idea was to give us all some knowledge of the rules and terms (as in 'nouns') used in the directives so that we would all be on the same, first page moving forward. The basis for the walk-through was a document prepared by the editor and it is available on WG4's website.

DRs

Quite a lot of DRs were submitted to WG4 before the meeting. I think the total number was about 25-30, and they ranged from fixing spelling errors to clarification of the text and schema changes. The first thing we discussed was how to categorize the DRs. The "buckets" were "defects" and "amendments", and we also discussed how to distinguish between editorial defects and technical defects. We quickly agreed that focus should initially be to verify and approve any DRs relating to decisions from Geneva that had not made it into the final text. ECMA also had quite a big batch of DRs submitted before the meetings, but since they were not submitted in time for everyone to look at them, we did not make any decisions about these - ECMA just went through them in detail and we discussed each of them.

Details we discussed were certainly of world-changing importance, such as the difference between the text fragments "nearest thousands of bytes" and "nearest thousand bytes", the allowed content of string-literals and intricate details of the xml:space-attribute in an XML-element based on the XML 1.0 specification. Still, it was quite entertaining and it was delightful to sit back and simply overhear the discussions of people that really know what they were talking about.

Comment collection form

ECMA has set up a comment collection form to submit DRs from interested national bodies. It has already been put to use by the Japanese national body and it seems to serve its purpose just fine. Hopefully it will enable us to improve the data quality of the incoming DRs. We gave feedback on the application to Doug Mahugh from ECMA and hopefully he will see to it that the suggestions are implemented (especially mine!)

:-)

We discussed at length the concept of "openness" and how we should apply it to our work, and I will cover my feelings for this in detail in a top-post a bit later.

Last minute impressions

This was my second trip to Japan and I must say that I am getting more and more excited about it for every trip. The culture is fantastic and it is a good challenge to be in a part of the world where you don't speak the language and are incapable of reading almost any signs. I did get a bit of "Lost in Translation"-feeling on my trip back (+40 hrs!), but it was really a good trip. Two thumbs up for the convener, Murata-san, who showed us how a splendid host acts and shows their guests a great time.

All in all I also think we had some productive days on Okinawa. We managed to deal with quite a few DRs and to set up work-processes for the future and I am sure we will benefit in the near future from the work we did. It was also interesting to watch the "arm-wrestling" between the national bodies and ECMA. We were on the same page in most cases, but it was interesting to be part of the discussions where we were not. It will be interesting to see how this will evolve in the future. ISO is a bit different than, say, OASIS because of the involvement of national bodies. Where the basis for most of the groups in OASIS is "vendors", it is quite orthogonal to this in ISO where this concept does not really exist. Some of you may remember Martin Bryan's angry words at the plenary in Kyoto about vendor participation and "positions" vs. "opinions" and I am looking forward to taking part in these discussions in WG4 as well as here.

 


Additional resources

Below are a couple of links that might be of interest to you

SC34 WG4 public website

SC34 website

(and for Okinawa-related activities)

Alex Brown's write-up about day 0, 1, 2 and 3-4 of the meetings

Doug Mahugh's summary of what took place

Pictures taken by the secretariat

Picture-stream from Doug Mahugh

Picture stream from Alex Brown

Picture stream from Jesper Lund Stocholm (me!)

Twitter stream from Doug Mahugh

Twitter stream from Alex Brown (notice the l33t-speek Twitter-tag Alex uses!)

Twitter stream from Jesper Lund Stocholm

Bonus for those of you waiting for the credits at the end of the movie:

The day I arrived I was met by Murata-san and Alex Brown in the lobby of the hotel. They were on their way to dinner at a restaurant called "Kalahaai" in the "American Village" of Naha. The dinner took place in a restaurant with live Japanese music from a group called "Tink Tink". Their music was really amazing. The last evening we went there again, and Shawn and I were listening completely baffled to the music and on-stage talks of the performers. It was an amazing experience to sit in the restaurant not understanding a single word they said - and still not being able to stop listening to them.



(courtesy of Doug Mahugh)

And look at this picture. Thanks to Doug's tele/wide/fish-eye-whatever-lens on his camera, I look like an absolutely mad-/maniac man! No girls were hurt during this, I should point out.


(courtesy of Doug Mahugh)

:-)

The complexity of SpreadsheetML - oh the sheer joy of it!

Having a bit of time on my hands while attending the SC34/WG4-meeting in Okinawa, I thought I'd write up a blog post I have wanted to write in quite some time.

The reason for me doing this was a requirement I am often presented with by CIBER's customers - export my data to Excel. The data they want us to export are traditionally grouped into three categories:

  • Text (strings)
  • Numbers
  • Dates

Creating cells with numbers and text is really a no-brainer in OOXML. It is a bit more complicated when it comes to dates, because dates in e.g. ISO 8601-format are not as such supported as "built-in cell data types" in SpreadsheetML. Instead, dates are presented by styling content in number-cells. This means that to be able to display a date in SpreadsheetML, you need to know "a bit" about styling in spreadsheets.

Now, as some of you remember, representation of dates in spreadsheets using OOXML is done in "serial form" meaning that dates are stored as numbers. These numbers are also known as "Julian days" - not to be mistaken with the "Julian Calendar". In even other words a date is represented as the number of days since some starting point in time.

So if I wanted to store the date "December 20th 2009" in OOXML, I would have to convert it to a "julian representation" - in this case "40167". This is really just a minor annoyance - the conversion is trivial and a no-brainer. However - the fun has not started yet.
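For the record, the conversion can be sketched in a few lines of Python. I am assuming the common 1900-based date system here, where counting from 1899-12-30 gives the same numbers as the spreadsheet applications for every date from March 1900 onwards (the odd starting day compensates for the well-known phantom leap day in 1900). The function names are mine.

```python
from datetime import date, timedelta

# Assumption: 1900-based date system. Using 1899-12-30 as day 0 matches
# the serial numbers for all dates from 1900-03-01 onwards.
EPOCH = date(1899, 12, 30)


def to_serial(day):
    """Convert a calendar date to its serial number."""
    return (day - EPOCH).days


def from_serial(serial):
    """Convert a serial number back to a calendar date."""
    return EPOCH + timedelta(days=serial)


print(to_serial(date(2009, 12, 20)))  # 40167
print(from_serial(40167))             # 2009-12-20
```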

If you look at the markup required, it would have to be like this:

[code:xml]<sheetData>
  <row r="1">
    <c r="A1">
      <v>40167</v>
    </c>
  </row>
</sheetData>[/code]

So this will give me a cell with a serial representation of 2009-12-20. However, if I open this in an OOXML-compliant application, it will display "40167". As I mentioned above, it turns out that displaying the serial representation as a "proper date" requires styling of the cell content.

The key is an attribute on the <c>-element I omitted in the example above.

[code:xml]<sheetData>
  <row r="1">
    <c r="A1" s="0">
      <v>40167</v>
    </c>
  </row>
</sheetData>[/code]

The "s"-attribute specifies the style for the given cell. The specification says this for this particular attribute:

The index of this cell's style. Style records are stored in the Styles Part.

Ok - cool so the good thing here is, that we now know what the attribute is used for. The bad thing is that we don't know anything about "how".

Styles for SpreadsheetML are described in section 3.8. The complete section is about 110 pages and it describes at length each element name and attribute but again it more answers "what" than "how".

(I just talked to another delegate about if a standard should describe both the hows and the whats, and it seems that the jury is still out on that one, so these are simply my personal observations of using the specification to solve a concrete problem).

So in figuring out how to do this, a good starting point would be to look at the list of valid child elements. These are defined as

[code:xml]<complexType name="CT_Stylesheet">
  <sequence>
    <element name="numFmts" type="CT_NumFmts" minOccurs="0" maxOccurs="1"/>
    <element name="fonts" type="CT_Fonts" minOccurs="0" maxOccurs="1"/>
    <element name="fills" type="CT_Fills" minOccurs="0" maxOccurs="1"/>
    <element name="borders" type="CT_Borders" minOccurs="0" maxOccurs="1"/>
    <element name="cellStyleXfs" type="CT_CellStyleXfs" minOccurs="0" maxOccurs="1"/>
    <element name="cellXfs" type="CT_CellXfs" minOccurs="0" maxOccurs="1"/>
    <element name="cellStyles" type="CT_CellStyles" minOccurs="0" maxOccurs="1"/>
    <element name="dxfs" type="CT_Dxfs" minOccurs="0" maxOccurs="1"/>
    <element name="tableStyles" type="CT_TableStyles" minOccurs="0" maxOccurs="1"/>
    <element name="colors" type="CT_Colors" minOccurs="0" maxOccurs="1"/>
    <element name="extLst" type="CT_ExtensionList" minOccurs="0" maxOccurs="1"/>
  </sequence>
</complexType>[/code]

The elements that should (ahem) draw attention to themselves are "cellStyles", "cellStyleXfs" and "cellXfs". So, if you want to apply formatting directly to a cell, look at e.g. the element <cellXfs> defined in section 3.8.10. It says (in abstract)

This element contains the master formatting records (xf) which define the formatting applied to cells in this workbook. These records are the starting point for determining the formatting for a cell. Cells in the Sheet Part reference the xf records by zero-based index.

The <cellXfs>-element has a child element called <xf>. The element is defined as

[code:xml]<complexType name="CT_Xf">
  <sequence>
    <element name="alignment" type="CT_CellAlignment" minOccurs="0" maxOccurs="1"/>
    <element name="protection" type="CT_CellProtection" minOccurs="0" maxOccurs="1"/>
    <element name="extLst" type="CT_ExtensionList" minOccurs="0" maxOccurs="1"/>
  </sequence>
  <attribute name="numFmtId" type="ST_NumFmtId" use="optional"/>
  <attribute name="fontId" type="ST_FontId" use="optional"/>
  <attribute name="fillId" type="ST_FillId" use="optional"/>
  <attribute name="borderId" type="ST_BorderId" use="optional"/>
  <attribute name="xfId" type="ST_CellStyleXfId" use="optional"/>
  <attribute name="quotePrefix" type="xsd:boolean" use="optional" default="false"/>
  <attribute name="pivotButton" type="xsd:boolean" use="optional" default="false"/>
  <attribute name="applyNumberFormat" type="xsd:boolean" use="optional"/>
  <attribute name="applyFont" type="xsd:boolean" use="optional"/>
  <attribute name="applyFill" type="xsd:boolean" use="optional"/>
  <attribute name="applyBorder" type="xsd:boolean" use="optional"/>
  <attribute name="applyAlignment" type="xsd:boolean" use="optional"/>
  <attribute name="applyProtection" type="xsd:boolean" use="optional"/>
</complexType>[/code]

The attribute you want here is "numFmtId". The attribute is described as "Id of the number format (numFmt) record used for this cell format".

(are we getting there soon?)

Anywho, going to the reference of numFmt will lead you to paragraph 3.8.30 numFmt (Number Format) and it will tell you, that some of the values of the attribute are implied. That's really just another way of saying "reserved values". 

 ID  formatCode
  0  General
  1  0
  2  0.00
  3  #,##0
  4  #,##0.00
  9  0%
 10  0.00%
 11  0.00E+00
 12  # ?/?
 13  # ??/??
 14  mm-dd-yy
 15  d-mmm-yy
 16  d-mmm
 17  mmm-yy
 18  h:mm AM/PM
 19  h:mm:ss AM/PM
 20  h:mm
 21  h:mm:ss
 22  m/d/yy h:mm
 37  #,##0 ;(#,##0)
 38  #,##0 ;[Red](#,##0)
 39  #,##0.00 ;(#,##0.00)
 40  #,##0.00 ;[Red](#,##0.00)
 45  mm:ss
 46  [h]:mm:ss
 47  mmss.0
 48  ##0.0E+0
 49  @


It looks like id 15 could be the one we are looking for. So I'm gonna add this number format to the xf-element's numFmtId-attribute and create this xml-fragment:

[code:xml]<cellXfs count="1">
  <xf numFmtId="15" (...)  />
</cellXfs>[/code]

Behold - it actually works. When I load this in Microsoft Office 2007, it will display this:



So what have I learned here (apart from the astounding complexity of this relatively trivial task)? Well, to display a date using SpreadsheetML, you need to know a bit about SpreadsheetML styles. You will also need to do a fair amount of digging in the specification as well as in existing OOXML-files, since I could not find this information anywhere. Luckily for you, the content of this blog is licensed under Creative Commons attribution license, so feel free to use it however you should wish to do so.

To sum it all up, you will need the following items to display a date in a cell in SpreadsheetML:

1. The cell fragment

[code:xml]<sheetData>
  <row r="1">
    <c r="A1" s="0">
      <v>40167</v>
    </c>
  </row>
</sheetData>[/code]

Notice that the cell is styled using the attribute "s" with a value of "0".

2. The style part

[code:xml]<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <cellXfs count="1">
    <xf numFmtId="15" (...) />
  </cellXfs>
</styleSheet>[/code]

Notice that index "0" of the <cellXfs>-collection has a numFmtId-attribute with the value "15" resulting in displaying the date correctly.
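Seen from the consuming side, the two items fit together roughly as in the Python sketch below: follow the cell's "s" index into <cellXfs>, read numFmtId, and only then decide that the number is a date. The lookup table is a small excerpt of the implied formats listed earlier, and the day-0 value of 1899-12-30 is my assumption of the common 1900-based date system. A real reader would also have to handle custom formats, shared strings and much more.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Excerpt of the implied number formats; the ids below are the date-like ones.
IMPLIED_FORMATS = {0: "General", 14: "mm-dd-yy", 15: "d-mmm-yy",
                   16: "d-mmm", 17: "mmm-yy"}
DATE_FORMAT_IDS = {14, 15, 16, 17}

EPOCH = date(1899, 12, 30)  # assumption: 1900-based date system


def cell_value(cell_xml, styles_xml):
    """Return (value, format code) for one <c> element and a styles part."""
    cell = ET.fromstring(cell_xml)
    styles = ET.fromstring(styles_xml)

    # Step 1: the cell's "s" attribute is a zero-based index into <cellXfs>.
    xfs = styles.findall("m:cellXfs/m:xf", NS)
    xf = xfs[int(cell.get("s", "0"))]

    # Step 2: the xf record points at a number format by id.
    num_fmt_id = int(xf.get("numFmtId", "0"))
    raw = cell.find("m:v", NS).text

    # Step 3: only the format tells us that the number is really a date.
    if num_fmt_id in DATE_FORMAT_IDS:
        return EPOCH + timedelta(days=int(raw)), IMPLIED_FORMATS[num_fmt_id]
    return raw, IMPLIED_FORMATS.get(num_fmt_id, "custom")
```

Three hops to find out that "40167" is a date, which is really the whole point of this post.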

I have created a small test file based on the walk-through above and it is available here: test_dates.xlsx (2.25 kb).

And in other news:

So, you might ask, how is this done using other document formats? Well, it turns out to be drastically less complex.

ODF

[code:xml]<table:table-row>
  <table:table-cell office:value-type="date" office:date-value="2009-12-20">
    <text:p>20-12-09</text:p>
  </table:table-cell>
</table:table-row>[/code]

OOXML IS29500

[code:xml]<sheetData>
  <row r="1">
    <c r="C4" t="d">
      <v>1976-11-22T08:30Z</v>
    </c>
  </row>
</sheetData>[/code] 

Both examples above should require no additional formatting.
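To show what "no additional formatting" means for a consumer, here is a small Python sketch that reads the date straight out of either fragment. The function names are mine, and the trailing "Z" is rewritten to "+00:00" only because older Python versions do not accept it in fromisoformat.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def parse_iso(text):
    """Parse an ISO 8601 date or date-time, accepting a trailing 'Z'."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def odf_cell_date(cell_xml):
    """Read office:date-value from an ODF <table:table-cell>."""
    cell = ET.fromstring(cell_xml)
    return parse_iso(cell.get("{%s}date-value" % OFFICE_NS))


def ooxml_cell_date(cell_xml):
    """Read the value of a SpreadsheetML <c t="d"> cell, or None otherwise."""
    cell = ET.fromstring(cell_xml)
    if cell.get("t") != "d":
        return None
    return parse_iso(cell.find("{%s}v" % MAIN_NS).text)
```

In both cases the cell itself says "I am a date", so there is no detour through a styles part.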

You might also ask, if this could have been done in any other way in OOXML? Well, as far as I read the specification, there is no way around the style-part-trouble. But you could create your own number formatting if you should wish so. I would actually prefer this angle, since it would be a step away from pre-determined (implied) values in styles and keep the package content self-contained.
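A rough Python sketch of what "your own number formatting" could look like when generating the style part. The id 164 follows the convention I have seen in existing files for the first custom format, and "yyyy-mm-dd" is just an example format code, so take both as assumptions. I have also left out fonts, fills and borders, which a real application may well insist on.

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
CUSTOM_ID = 164  # assumption: first id conventionally used for custom formats


def build_styles(format_code, num_fmt_id=CUSTOM_ID):
    """Build a minimal style part with one explicitly declared number format."""
    ET.register_namespace("", MAIN_NS)

    def q(name):
        return "{%s}%s" % (MAIN_NS, name)

    root = ET.Element(q("styleSheet"))

    # The format is declared inside the package instead of being implied.
    num_fmts = ET.SubElement(root, q("numFmts"), count="1")
    ET.SubElement(num_fmts, q("numFmt"),
                  numFmtId=str(num_fmt_id), formatCode=format_code)

    # Index 0 of <cellXfs> points at the declared format.
    cell_xfs = ET.SubElement(root, q("cellXfs"), count="1")
    ET.SubElement(cell_xfs, q("xf"),
                  numFmtId=str(num_fmt_id), applyNumberFormat="1")

    return ET.tostring(root, encoding="unicode")


print(build_styles("yyyy-mm-dd"))
```

With this, everything needed to interpret the cell is inside the package itself.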

You know, this could actually be the basis for a nice new defect report for WG4: "Remove all implied values in the specification and move them to the transitional Part 4".

Is there an end of it?

I know this was quite a lengthy post - but is it of any value at all - and would you like more of these investigative posts in the future?

:-)

What's up Japan!

Jeeez ... has it been a long time since I last wrote a blog-entry here. It's not so much that I didn't want to write something ... but I have found myself pre-occupied with other tasks at the grinding mill. It also seems to me that most of the other participants of the discussions have done the same thing - maybe with one exception.

Anywho - in other news, OASIS submitted their response to the ISO/IEC JTC1/SC34-defect report. It is in the form of the document "Open Document Format for Office Applications (OpenDocument) 1.0 Errata 01". The Danish mirror-committee to JTC1/SC34 talked about it at our meeting last Friday and we will look into the response as soon as possible. I have made a kind-of-thorough look-through of the document and I was able to confirm and accept most of the corrections. A few were a bit odd, but it's not a big deal. To me, the most important thing is to move on, have the "chapter" on ODF 1.0 in ISO/IEC closed and now concentrate on development of ODF 1.2 and "ODFNext" or whatever the latest friendly name of ODF 1.2++ is. It is my feeling that the Danish mirror committee concurs with me here, so I would expect us to approve the response from OASIS before the JTC1/SC34-plenary in Prague at the end of March 2009. If anyone in JTC1/SC34 needs a helping hand editing the response and turning it into a COR, please let me know.

One more thing, though. Can anyone tell me if the proposed changes to IS26300 in the Errata 01 document are all included in ODF 1.2?

So what about Japan?

Well, next week the first meeting of WG4 in SC34 will take place in Okinawa, Japan. The draft agenda can be seen at the Japanese SC34 website and is also listed here:

  1. Opening - 2009-01-28 10:00
  2. Roll Call of Delegates
  3. Adoption of the Agenda
  4. Overview of the JTC 1 Maintenance Process (WG 4 N0012)
  5. Defect Reports (WG 4 N 0015)
  6. Comment Collection Form
  7. Schedule for Reprints or Technical Corrigenda
  8. Accessing the SC 34/WG4 Email Archive and Document Repository (WG4 N0014)
  9. Future Meetings (F2F and Teleconferences)
  10. Any other business
  11. Closing

So we are basically going to be looking at what to do next. How will we structure our work? How will we keep the pace up and (rather importantly), how will we collect suggestions or defects from the public? I know that it is important to the Danish mirror committee that the widest possible audience will be heard, so I am looking forward to some interesting discussions here.

We will of course take a look at the defects that have already made their way to our system. There are currently about 50 single defects reported, some by ECMA and some by various national bodies. The defects range from spelling errors through decisions from Geneva not being implemented correctly to errors in the XML-schemas for OOXML. Denmark will sadly not be able to contribute at this time, due to "shortage of labour" but we still hope that we will have something by the end of March.

And finally - I have not asked for a full list of participants to the meeting just yet, but the last figure I heard was about 20 people in total. That's a lot - but still less than the 120 we were in Geneva.

:-)

PS: Is it cold in Okinawa?

ISO publishes OOXML

ISO today decided to make the ISO edition of OOXML (IS29500:2008) publicly available.

You can choose to purchase a physical print of the specification or you can download it for free at the ITTF website.

:-)

 

Finally!

 

*sigh*

Microsoft Office 2007 - now with ODF-support

On October 22nd a long awaited email popped into my mailbox - news of the release of the first beta of Microsoft Office 2007 SP2. The reason for me longing to get my hands on this piece of software (and I have, in vain, tried to squeeze each and every single Microsoft employee I could to get it earlier) was not that it is a Service Pack for my current office application. Nor is it that I should now expect a more stable software package, because I am not troubled by instability in my everyday work with Microsoft Office.

My interest is caused by the fact that Microsoft Office 2007 SP2 includes support for ODF 1.1, and to be frank, it is not really because Microsoft has now chosen to support ODF natively in Microsoft Office - I am sure most would agree with me that they should have supported ODF a loooong time ago.

No, what will be interesting to see will be what it will mean for interoperability via ODF.

It's the standards, stupid

It has long been a public secret that you were walking on eggshells when exchanging ODF-documents between ODF-supporting applications that are not somehow based on/cloned from OpenOffice. Of course it is possible to exchange "BUI-documents" (yes, it is an acronym I have invented for this. It means Bold, Underline and Italics and represents rather simple documents without too much fancy pancy stuff in it.) but the best experience is when using OO spin-offs.

This makes perfect sense. When using the same program, you will get the least amount of problems. This is in essence the text-book/Page1 elevator pitch for Microsoft Office sales people.

And this is exactly why ODF-support in Microsoft Office 2007 is interesting - it is the first major productivity application not based on OpenOffice that promises native ODF-support.

Now some people seem to think that as long as you use an open standard like ODF, PDF or OOXML, "interoperability" is somehow included. It is as if they are trying to apply some sort of Kant'ish "Das ding an sich"-thinking when they argue that achieved interoperability is somehow an intrinsic, guaranteed feature of an open standard. The funny thing is that every time I hear these arguments I always try (or fail, rather) to find a nice way of saying that they have understood squat of the problem and that they should try to work seriously with the subject at hand before speaking so bluntly about it.

The truth is of course somewhat different and this is why I genuinely applaud the work done with the OIIC in OASIS. The truth is that an open standard enables or facilitates good interoperability and that this potential is bigger for an open standard than for a closed standard. It is clear that both ODF and OOXML provide for better interoperability than the proprietary binary DOC-formats, but reversely the binary DOC-formats are also proof that fairly good interoperability is also possible when using non-open document formats. The world is not - once again - black/white, because it is clear that an open standard is not a requirement for interoperability - but it certainly helps a lot.

My point here is

Interoperability is not created by the standards. It is created in the applications based on the standard

All applications have bugs/quirks

This is the reason this is not about the standards - rather, it's about the applications. We are now in the situation that we have two big players supporting ODF (to a varying degree). But they will probably do it in different ways. We no longer have the luxury of the major ODF-producing/consuming applications being built on the same engine. My expectation is therefore that we will experience interoperability-problems with the ODF-applications, because Microsoft Office will likely do some things differently than the OpenOffice-clones (but comply with the ODF-spec at the same time).

This is why I asked Microsoft these two questions when I attended the first DII workshop in late July 2008 (they recently held another one but I did not attend).

1. How have you handled the possibility of using application specific settings in ODF?

As you know ODF has (and now also OOXML after BRM #¤"¤%¤#¤#&"#¤#"¤#¤%, thank you very much!) the so-called "config-item-set"-elements, which are used by the current ODF-implementations to store application specific behaviour. The problem with these elements and attributes is that their content is not specified in the ODF spec, so there is really no obvious way to figure out what to do with the binary printer-blob that Lotus Symphony stores in ODF-documents produced by it. The short reply from Microsoft was: "We don't use it" and if you open the settings.xml-file in an ODF-package produced by Microsoft Office, it is empty. This is all fine and dandy - only problem is that you risk losing information when exchanging documents.
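If you want to see for yourself what an application has stashed away in these elements, a few lines of Python will do. This is only a rough sketch: it assumes the package is an ordinary ZIP file with a settings.xml at the top level, and the function names and the demo settings (a fake "PrinterSetup" blob and a "ZoomFactor") are made up for illustration and not taken from any real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the ODF "config" vocabulary used in settings.xml.
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"


def list_config_items(odf_file):
    """Return (name, type, value) for every config-item in settings.xml.

    odf_file is a path or file-like object for an ODF package (a ZIP file).
    A package without settings.xml yields an empty list.
    """
    with zipfile.ZipFile(odf_file) as package:
        if "settings.xml" not in package.namelist():
            return []
        root = ET.fromstring(package.read("settings.xml"))
    return [
        (
            item.get(f"{{{CONFIG_NS}}}name"),
            item.get(f"{{{CONFIG_NS}}}type"),
            (item.text or "").strip(),
        )
        for item in root.iter(f"{{{CONFIG_NS}}}config-item")
    ]


# Hypothetical settings.xml content, invented for this demo only.
DEMO_SETTINGS = (
    '<office:document-settings'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0">'
    '<office:settings>'
    '<config:config-item-set config:name="demo:settings">'
    '<config:config-item config:name="PrinterSetup"'
    ' config:type="base64Binary">AAAA</config:config-item>'
    '<config:config-item config:name="ZoomFactor"'
    ' config:type="short">100</config:config-item>'
    '</config:config-item-set>'
    '</office:settings>'
    '</office:document-settings>'
)


def build_demo_package():
    """Build an in-memory ZIP that contains only the demo settings.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("settings.xml", DEMO_SETTINGS)
    buffer.seek(0)
    return buffer


for name, item_type, value in list_config_items(build_demo_package()):
    print(name, item_type, value)
```

Point list_config_items at a real .odt, .ods or .odp file and you get the names and types of whatever the producing application chose to store. What those values mean is a different matter, and that is exactly the problem.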

2. How have you handled known bugs, features in other, major ODF-applications?

All applications have bugs - including ODF-supporting applications, so my question was perfectly legitimate. Again the answer was: "We don't handle it". With this answer Microsoft gets in line with all the other application manufacturers that don't handle their competitors' bugs. There is e.g. a "bug" in KSpread's implementation of formulas (specifically the LOG-method). This is not handled by OpenOffice.org - even though it is fairly well known. The consequence is that strange things might happen when exchanging spreadsheets between KSpread and OOo Calc.

It didn't really matter before, 'cause not that many people use KSpread - but this picture is about to change with ODF-support in Microsoft Office 2007.

The bigger picture

If you will allow me to use one of my favorite, stupid expressions, then let's for a moment "step into the helicopter to see the bigger picture".

Because I believe that Microsoft's implementation of ODF will mean interoperability-problems using ODF-files in the short term. But I also think that it will mean better ODF-support on a broad scale - in the long run.

I have previously dealt with the MathML-support of OpenOffice.org which is slightly buggy. The ODF-spec says this about mathematical content:

Mathematical content is represented by MathML 2.0

And that's it.

As you might remember, the problems with OOo's MathML-support are due to the fact that OpenOffice.org requires a DOCTYPE-declaration in the MathML-object to display it. Also it seems that OOo will only display a certain kind of MathML. I have documented this in a previous post, but the short story here is that a simple mathematical equation in an ODF-document created using Microsoft Office 2007 SP2 will not display in either OOo 3.0 or Lotus Symphony 1.0. The ODF-file is perfectly valid and so is the MathML-fragment (tested using jing and the RelaxNG-schemas for ODF 1.1 and MathML as well as the MathML-tool from W3C, Amaya).

This example serves to illustrate my point: Microsoft's implementation of ODF will mean better support for ODF in the long run, because it forces existing problems in the applications to surface - and they can then be fixed.
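For the curious, the DOCTYPE-issue mentioned above is easy to check for. The sketch below is not what any of the applications actually do; it simply tests whether a MathML-fragment carries a DOCTYPE-declaration before its root element and whether that root element is a "math" element in the MathML namespace. The function names and the two sample fragments are my own inventions for illustration.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def has_doctype(xml_text):
    """True if a <!DOCTYPE ...> declaration precedes the root element."""
    # Drop comments and processing instructions (including the XML
    # declaration) so only a DOCTYPE or the root element can come first.
    stripped = re.sub(r"<!--.*?-->", "", xml_text, flags=re.DOTALL)
    stripped = re.sub(r"<\?.*?\?>", "", stripped, flags=re.DOTALL)
    return stripped.lstrip().startswith("<!DOCTYPE")


def is_mathml_root(xml_text):
    """True if the root element is <math> in the MathML namespace."""
    return ET.fromstring(xml_text).tag == f"{{{MATHML_NS}}}math"


# Two hypothetical fragments: the same equation with and without a DOCTYPE.
WITH_DOCTYPE = (
    '<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" '
    '"http://www.w3.org/Math/DTD/mathml2/mathml2.dtd">\n'
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
)
WITHOUT_DOCTYPE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
)

for label, fragment in (("with", WITH_DOCTYPE), ("without", WITHOUT_DOCTYPE)):
    print(label, has_doctype(fragment), is_mathml_root(fragment))
```

Both fragments are well-formed and both have a MathML root element. The only difference is the DOCTYPE-declaration, which is the kind of detail a consuming application should not stumble over.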

And a small note for the trigger-happy ones: This is not due to the fact that Microsoft has implemented ODF - it is merely due to the fact that we will now have a new, major implementation of ODF to exchange documents with.

The problems described above have probably existed for years but no one has noticed since most people use some kind of OpenOffice-clone for creation and display of ODF-documents. Now, on the other hand, errors in the applications (including in Microsoft Office) will be very obvious and the pressure to fix them will be much bigger. I also predict that Microsoft will have to speed up the release cycle of updates to their productivity-applications supporting ODF - at least when it comes to hotfixes of known problems. I don't think anyone will settle for bi-annual service packs for fixing trivial errors with big impact on productivity and interoperability.

Only remaining question now is: when will SP2 make it into Microsoft Office 2007? When it snows in Seattle?

(btw, I watched Grey's Anatomy yesterday, and according to them, it does snow in Seattle from time to time!)

JTC1/SC34 WG4 appointed Danish expert

On Friday, October 24th the Danish mirror-committee to JTC1/SC34 had its bi-monthly meeting. On the agenda was, amongst other things, assignment of participants to the newly created working groups in JTC1/SC34, WG4 and WG5.

For those of you not familiar with the establishment of these two groups, WG4 will deal with maintenance and development of OOXML. WG5 will work to "Develop principles of, and guidelines for, interoperability among documents represented using heterogeneous ISO/IEC document file formats." So the latter WG is not really about translating between document formats such as ODF and OOXML. No, it is about creating some guidelines that all (future or present) document formats could use as inspiration when designing the formats to be "interoperable".

I think the prospects of this could be really, really good and I hope as many stakeholders as possible choose to join the work. It would be great to have some kind of guidelines for interoperability comparable to the Accessibility-guidelines from W3C (those that were added to OOXML during the BRM in Geneva).

We did not get any confirmed pledges to participate from the members of the Danish committee, but I was very pleased to hear that both ORACLE Denmark as well as the Technical University of Denmark would investigate if they could join the working group.

More interesting to me was assignment of participants for Working Group 4 to develop and maintain OOXML. Not surprisingly (since most of the participants of the committee are much more "anti-OOXML" than "pro-ODF") this point of the agenda received far less attention. We have in CIBER Denmark discussed for quite some time if we should join the working group, and we have reached the conclusion that we would. We do this for the following reasons:

  1. We believe that we would be able to deliver some technical skills that would be valuable to the work around OOXML
  2. We believe that it is important that development and maintenance of OOXML is not done exclusively by ECMA under the "ISO brand" and
  3. we believe that it is important to create a Danish "foot-print" on the development of the document format

So when the committee was asked if anyone would join, CIBER stepped up to the plate. I am happy to say that both the potential commitment of ORACLE Denmark and Technical University of Denmark and the confirmed commitment from CIBER received unanimous support from the other committee members.

So now what?

Well, the first draft of the agenda for the meeting in Okinawa has been posted on the SC34-website. At present the agenda is this:

Draft agenda

  1. Opening - 2009-01-28 10:00
  2. Roll call of Delegates
  3. Adoption of the Agenda
  4. Defect Reports
  5. Any other business
  6. Closing

I think we will also talk about what to actually do in the foreseeable future both with respect to handling of defect reports and future maintenance. One of the things I will not accept (and I hope nor will the other appointed experts) is that the working group primarily spends its time on defect handling - all while ECMA works on new stuff for OOXML and eventually dumps this on our table. So we will need to establish some sort of agreement around this.

Also we will need to talk about future places to meet. Next meeting will likely be held in Prague, and I would like to somehow make sure that future meetings are held in cities near major airport hubs around the world. It will take me about 24 hours to travel from Copenhagen to Okinawa, and that travel period would be cut in half if the meeting was held in e.g. Tokyo or Kyoto. This is not a criticism of the Japanese decision to have the meeting in Okinawa, but I believe we would indirectly encourage more participation if the required travelling was not so extensive.

Oh ... and did anyone notice that I was only mentioned in the "Small news"-section of Alex Brown's recent post "More Standards news"? This really helps keeping both feet solidly on the ground and not thinking too much of myself.


IS 29500 has been sent to ITTF for publication

This email just landed in my mailbox this morning:

ISO/IEC JTC1/SC34 N1080
Final Text for ISO/IEC 29500-1, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 1: Fundamentals and Markup Language Reference
Status: This text has been submitted to ITTF for publication. It is circulated to the SC 34 members for information.
 
ISO/IEC JTC1/SC34 N1081
Final Text for ISO/IEC 29500-2, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 2: Open Packaging Conventions
Status: This text has been submitted to ITTF for publication. It is circulated to the SC 34 members for information.
 
ISO/IEC JTC1/SC34 N1082
Final Text for ISO/IEC 29500-3, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 3: Markup Compatibility and Extensibility
Status: This text has been submitted to ITTF for publication. It is circulated to the SC 34 members for information.
 
ISO/IEC JTC1/SC34 N1083
Final Text for ISO/IEC 29500-4, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 4: Transitional Migration Features
Status: This text has been submitted to ITTF for publication. It is circulated to the SC 34 members for information.
 
This will finally make it possible for the NBs of ISO to verify that the editorial instructions from the BRM have made it into the final text. I have not yet had the time to investigate and verify that the Danish changes have been implemented, but I am sure lots of blogging will take place over the next days.
 

Day one of IS29500?

On August 15th 2008 ISO/IEC gave their "Go ahead" for IS29500 by rejecting the appeals against the approval and the process leading to it. The decision was covered almost everywhere and the phrase that caused the most speculation was this:

According to the ISO/IEC rules, DIS 29500 can now proceed to publication as an ISO/IEC International Standard. This is expected to take place within the next few weeks on completion of final processing of the document, and subject to no further appeals against the decision.

(my emphasis)

So the battle was clearly not over since the decision on the appeals could itself be appealed. The question was: until when? Then on September 1st news broke that the appealing countries would not appeal the decision to reject the appeals. Since it is my understanding that only the appellants could appeal the rejection of the appeals (confused, anyone?), I suppose the case was finally closed.

But we are still waiting for the revised text from ITTF. I would imagine that they would hold the text until the period for appealing the rejection was over ... but when is that? This morning it occurred to me that if the period was 30 days, then today is the first working day after the deadline.

Could this be it then? Could today be the "Birthday" of IS29500?