a 'mooh' point

clearly an IBM drone

Excel 2010 (Microsoft Office 2010 CTP TO-do list (01)

I have been looking at how Excel 2010 has implemented various features using ISO/IEC 29500-4:2008 - also known as "OOXML Transitional".

Background:

ISO/IEC comes in two variants, a "transitional" (T) and a "strict (S). Transitional is the one containing all the legacy stuff such as VML, legacy digest algorithms, leap year bug etc. S does not contain these things and is therefore considered "more pure". T is practically identical to the document format submitted to ISO/IEC by ECMA - also known as "ECMA-376". A document conforming to ECMA-376 will therefore also conform to T, since the schemas are practically identical.

T is currently a superset of S, which means that "T includes everything in S". This has the effect that within a T document, a vendor can take advantage of new features in S while still being in "the comfort zone of T". T is therefore considered by some as providing a “graceful migration path to S”, meaning that vendors can change their existing T-compliant code, on a case-by-case basis, to gradually support the features of S instead of those of T.

Update 2009-11-17: By reading in the spec today, I realised that I was wrong in saying that the leap-year bug was not in S - indeed it is. I appologize for the confusion. Thank you, Jens Hørlück for pointing this out to me.

As you know, Microsoft is claiming "IS29500 compliance" for their latest incarnation of Microsoft Office. OOXML is a big and complex document format, and even though OOXML was not published until waaaay after "feature freeze" of Office 2010, I thought it'd be interesting to take a look at how Excel has reached compliance.

When looking at the markup generated by Excel 2010, it is important to remember the background information above, since you'd expect to find traces of this "graceful migration" in the markup already.

Conformance class

The DIS-process added a new attribute to the root elements of the documents. This element would specify the conformance level of the document. As to not invalidate existing documents, the default value for this attribute was “transitional”. The other possible value is "strict".

So when creating a new spreadsheet using Excel 2010, which markup does it create?

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook>
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1”>
  </sheets>
</workbook>[/code]

(condensed for easier reading)

The above is the markup for the root element of a SpreadsheetML spreadsheet.

So the conformance class attribute is omitted – making the document a T-document. I know that it is basically nonsense to add an attribute with the default value to implement support for this behavior, but for those of us that like looking at markup in text editors, I’d recommend Microsoft to include the conformanceClass-attribute with the appropriate value. As it is a no-brainer to implement for a consumer – it should be a no-brainer to implement for a producer.

Also – the existence of the conformanceClass-attribute is the only (or, best, anyway) indicator that a T document was created according to the schemas of ISO/IEC 29500 and not those of ECMA-376.

(and yes, I know that they are practically identical, but adding the attribute would be a clear indication that Microsoft wishes to produce markup according to ISO/IEC 29500 and not ECMA-376 1st Ed.)

Microsoft, please fix!

ISO-dates

Support for ISO-dates in SpreadsheetML just might be the biggest issue we faced in the DIS-process. It was hugely controversial and as such, a large amount of time was used at the BRM to figure out a solution.

Excel 2010 allows a user to manually chose the option to persist dates in ISO-8601 format. This is done through the "Settings-dialogue" as

So let us see what Excel 2010 does with dates. I opened the application and in a cell I wrote the date 28-02-1900. I then unzipped the XLSX-file and found this markup in the Worksheet-part:

[code:xml]<sheetData>

  <row r="1" >
    <c r="A1" t="d">
      <v>1900-02-28</v>
    </c>
  </row>
</sheetData>[/code]

So the date is actually persisted in the file using ISO-8601 notation and the cell is correctly typed with a t=”d”-declaration.

Now, depending on your point of view, this is a good thing, because it takes us further from the hell-hole I have previously written about with respect to “date-typing-of-numbers” in ECMA-376.

Thank you, Microsoft – good for you!

Smile

Leap-year bug

So … with the “Save-as-ISO- 8601-dates”-setting to “yes, please” as previously described, I added a formula to the spreadsheet above in cell B1. The formula simply was “=A1+1”.

Anyone up for guessing the result?

Well, the result was this:

That is funny because of two things:

  1. I set the “Save-as-ISO Dates”-setting to “yes”, so I was not expecting the leap-year-bug to still be used.
  2. I am pretty sure that the date-representation “1900-02-29” is not a valid ISO-date.

So I looked at bit at the markup generated for this spreadsheet.

The result was:

[code:xml]

<sheetData>
  <row r="1" >
    <c r="A1" t="d">
      <v>1900-02-28</v>
    </c>
    <c r="B1">

      <f>A1+1</f>
      <v>60</v>
    </c>
  </row>
</sheetData>[/code]

Ahem … dear Microsoft … WTF?

So you have persisted the first date as an ISO-date,

but the result of calculating a value based on it – you persist the result as a serial date? In which universe would this make sense?

(I should note that whenever doing calculations resulting in dates not being invalid, the date is correctly persisted by Excel 2010 as an ISO-8601 date)

But – as I was writing this I thought “wasn’t there something about an at-BRM-added attribute called “dateCompatibility”? This was the attribute used to determine if the date-system used should support the leap-year-bug or not. And truth be told, the attribute is not used – making the dateCompatibility default to “true” – hence honoring the leap-year-bug.

But – this could simply be a glitch of Excel2010, so I added the attribute myself to force Excel2010 to discard date compatibility. I’d then suspect the result of adding 1 to the date of 1900-02-28 would be 1900-03-01.

The markup was this:

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook dateCompatibility="false">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1”>
  </sheets>
</workbook>[/code]

Result: still 1900-02-29

I this calls for a few conclusive words before I move on:

Microsoft, I think you should stop using ISO-8601 dates in SpreadsheetML in T because of the arguments provided by my fellow WG4-expert Gareth Horton. But if you insist on using ISO-8601-dates in SpreadsheetML-T, you should do it right,. If you ask me, a user manually choosing to have Excel2010 use ISO-8601 dates, is a clear indication that the user does not want to deal with the leap-year-bug. The user has even decided to continue despite your warning that he or she might lose precision. So when a user performs this specific action, set the damn dateCompatibility-attribute to “false” and persist ALL dates as ISO-8601 dates.

You have a golden opportunity here to ditch the leap-year-bug for all new documents. If you act wisely and add the conformance attribute (so consuming applications can distinguish the files from ECMA-files) and setting the dateCompatibility-attribute to “false” when persisting dates as ISO-8601 dates, you’d have done really, really good. You have the by far biggest implementation of OOXML – so the vendors will follow your lead. Any application adding support for ISO-8601 dates in transitional documents in their existing ECMA-376-compliant code, will be able to kno

w what to do - and they'll do it the way you do it.

Now, I know you would like to demonstrate to all of us that you want to go towards “the S way”, but you need to do it right. Mixing ISO-8601 dates with serial dates simply to be able to maintain the leap-year-bug is not the right thing to do.

I have absolutely no idea of how the internals

of either the calculation instructions in Excel works – or even the Excel-team itself, but the way you have “added” ISO-8601 dates indicates to me, that you have changed barely anything except for the persistance-mechanism for dates.

I’m sure you’d argue that there is no longer time to change the inner workings of Excel2010, and you might very well be right about that. If that is indeed the case, I suggest you remove the option of saving dates as ISO-8601-dates from Excel2010 as soon as possible to avoid doing any more harm to the ecosystem around OOXML that we both share deep concerns for.

VML

At the BRM the biggest pile of things added to OOXML was where a feature was previously only possible with the use of VML. Of these include adding comments to cells in spreadsheets. The comments themselves are stored in a “Comment part” of the OPC-package, but the display of the comment was done using VML. Luckily, at the BRM markup was added to allow DrawingML to be used instead.

So I used the spreadsheet from before and added a comment in cell B2 (the one with the crappy leap-year-bug result)

.

Anyone wanna guess if this comment is displayed using VML or DrawingML?

Yes, you guessed it … somewhere in the back of the document is a VML-fragment containing the “box” containing my comment.

[code:xml]<xml xmlns:v="urn:schemas-microsoft-com:vml"

 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel">
 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1"/>

 </o:shapelayout><v:shapetype id="_x0000_t202" coordsize="21600,21600" o:spt="202"
  path="m,l,21600r21600,l21600,xe">
  <v:stroke joinstyle="miter"/>
  <v:path gradientshapeok="t" o:connecttype="rect"/>
 </v:shapetype><v:shape id="_x0000_s1025" type="#_x0000_t202" style='position:absolute;
  margin-left:120.75pt;margin-top:1.5pt;width:126pt;height:55.5pt;z-index:1;

  visibility:hidden;mso-wrap-style:tight' fillcolor="#ffffe1" o:insetmode="auto">
  <v:fill color2="#ffffe1"/>
  <v:shadow on="t" color="black" obscured="t"/>
  <v:path o:connecttype="none"/>

  <v:textbox style='mso-direction-alt:auto'>
   <div style='text-align:left'></div>
  </v:textbox>
  <x:ClientData ObjectType="Note">

   <x:MoveWithCells/>
   <x:SizeWithCells/>

   <x:Anchor>
    2, 15, 0, 2, 4, 55, 3, 16</x:Anchor>

   <x:AutoFill>False</x:AutoFill>
   <x:Row>0</x:Row>
   <x:Column>1</x:Column>
  </x:ClientData>
 </v:shape></xml>[/code]

If this was a tweet on Twitter, I’d finish it off using the tag #fail.

Document protection

And finally – what about document protection? “Document protection” is the mechanism in OOXML to allow applications to open documents in “read-only-mode”, to protect worksheets from editing etc. These are not really protecting the document as such, because a hash of the password is simply saved – but everything is still in clear text. Document protection was one of the areas where OOXML was fundamentally changed, allowing more robust algorithms to be used – and not only the weak “legacy-algorithm” previously supported by Office 2007.

So in Excel2010 I chose to protect the active sheet and looked at the markup again. Again, I was hoping to find some of the new markup, but again I was let down. Excel2010 still uses the weak algorithm and it still uses the legacy markup to persist it.

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

<worksheet>

  <sheetData/>
  <sheetProtection password="8985" sheet="1" objects="1" scenarios="1" />
</worksheet>[/code]

This is not only an annoyance for those not working on the Windows-platform – it is also an annoyance for those of us working with e.g. .Net development. If Excel2010 used SHA1, SHA256 or one of the other modern algorithms supported directly by .Net framework, we’d be able to use it straight out of the box. Now all of us still need to add additional code/assemblies to our code – simply to protect a whorksheet.

At the end …

I must admit that I am a bit disappointed. I know that barely a year has gone since the publication of ISO/IEC 29500, but I had expected more of the things we discussed at the BRM to be implemented. At the very least I was expecting some of the more “easy ones” to have been implemented – like the conformanceClass-attribute, VML in comments and document protection. It would have been a nice token of “good faith”.

Now, we are basically left with nothing. Do note, however, that the observations above are by no means a comprehensive test. These represent simply a few of the thoughts on the top of my head while writing this.

Next time: WordpressingML

PS: I have been trying to find a list of the stuff from the BRM that was implemented in Office 2010 (like ISO-8601 dates), but I have been unsuccessful. So if anyone can point me towards such a list, I'd be happy to include those results above.

Denmark votes "yes" on IS29500 COR1 and FPDAM1

I know it has been a couple of weeks, but I just wanted to share current development with you.

On September 7th (in Danish), the Danish mirror committee to ISO/IEC JTC1 SC34 met at Danish Standards in Charlottenlund. On the agenda was, amongst other things, processing of documents under ballot. The relevant documents to WG4 was these

As appointed expert from Danish Standards in WG4, I have been working hard with the other experts in WG4 on these papers and I have for each meeting in Denmark provided oversights to the mirror committee on the current work. The members of the Danish committee have access to the same set of papers that I have, so we have primarily been discussing the more controversial ones - like usage of ISO-8601 dates in transitional files, reintroducing ST_OnOff in transitional schemas and changing the namespace name for strict files. A couple of times Danish committee members have requested information on more "trivial stuff", and we have then discussed this.

At the meeting of September 7th, I gave a quick sporadic overview of the more tough parts of COR1 and AMD1 and no comments were presented. We talked a bit about general principles of the work in WG4, but that was basically that.

After this, Denmark (Danish Standards) approved the document sets for COR1 and AMD1.

Obviously I think this is great news and the chairman of the Danish committee expressed his appreciation of the work put into creating these files.

Microsoft-stacking in WG4

Traditionally, for every meeting we have in WG4, some conspiracy-theory is born on how much money the delegates received from Microsoft, how many sports-cars we each got from Microsoft or how we each had a Microsoft employee sitting on our laps dictating what we should say.

So, I thought I'd beat the usual nut jobs to it and present the attendance list myself. The minutes from the meeting will be available soon, but who wants to wait for exiting news like this?

The attendance list was this:

Name Affiliation
Employer/sponsor
Pia Lange
Host Dansk Standard
Makato Murata
WG4 Convener
International University of Japan
Sam Oh
SC34 Chair
Sungkyunkwan University
Keld Simonsen
NO HoD DKUUG?
Dave Welsh
US HoD
Microsoft
Mario Wendt
DE HoD
Microsoft
Klaus-Peter Eckert
DE Fraunhofer Fokus
Jesper Lund Stocholm
DK HoD
CIBER
Rex Jaeschke ECMA HoD, project editor Consultant
Doug Mahugh
ECMA
Microsoft
Shawn Villaron
ECMA
Microsoft
Kimmo Bergious
FI HoD
Microsoft
Alex Brown
GB HoD
Griffin Brown Digital Publishing Ltd.
Gareth Horton
GB
Datawatch
Jaeho Lee
KR HoD
University of Seoul
Jung-Jin Yang
KR
The Catholic Univeristy of Korea

So out of a total of 1416 people attending ... 56 people were in some way affiliated with Microsoft and/or ECMA. What the hell, throw the Microsoft shill Alex Brown into the pot as well - that'll make it a total of 67 people.

I don't know what to say ... I'm shocked.

Update: I have been notified that I missed two persons on the list, Dave Welch and Keld Simonsen. List and numbers have been updated accordingly. Smile

Extending OOXML

This article will have to topics - one about extending OOXML using the built-in extension mechanisms and one about extending OOXML itself.

Using built-in mechanisms

As I have written about earlier OOXML has a (fun) part containing mechanisms for extending OOXML with vendor/domain-specific extensions. That part is "Part 3 - Markup Compatibility and Extensibility". The part describes different techniques when extending OOXML - most interesting is propably the sections about "Markup Compatibility Attributes and Elements" describing ways to extend OOXML while enabling compatibility to e.g. earlier/current version of the specification.

So if you were a vendor wanting to add something to the spec - but couldn't wait for the slow ISO pace or simply needed the competitive edge of not revealing anything about future software releases to your competitors ... what could you do?

The first thing you should do is to decide if you want your new stuff to eventually make it into the spec. If you don't want that - you're done already.

Assuming you want it into the spec, here are a couple of hints to how you might approach it:

  1. Document your extensions thoroughly
  2. Present these extensions to SC34/WG4 with justification to how and why you want it into the spec
  3. Work with us to polish the nitty-gritty details that you overlooked
  4. Make sure there are no legal nor technical barriers to implementing these new features for your competitors
  5. Wait for the stuff to eventually be included in IS29500

So the real target of this is - if you haven't already guessed it - Microsoft. So to be even more specific, here's a little list of things to do for Microsoft - in case they want to extend IS29500:

You will propably have some additions to IS29500 in your implementation of Office 14. Assuming that you will at some point like these to be added to IS29500, this is what you should do:

  1. Document your extensions thoroughly. Remember, the quality of the documentation will be under the same scrutiny as the text of DIS29500 so please do it right the first time.
  2. Add the documentation of your extensions to your "Implementer's notes" on the DII-website. 
  3. Present these extensions to SC34/WG4 with justification to how and why you want it into the spec.
  4. Work with us to polish the nitty-gritty details that you overlooked.
  5. Include the extensions and the documentation for it in your OSP.
  6. Wait for the stuff to eventually be included in IS29500.

Remember, the minute the first public beta of Office 14 hits the web, the documentation of the extensions as well as inclusion in OSP should be finished. Not a month later, not a week later - on day one!

Extending IS29500 itself

There has been a lot of talk lately to how IS29500 will be extended in the future. Specifically, how - and where - will new additions be included? IS29500 is comprised of two schema sets - a strict set and a transitional set. Currently the strict set is created from the transitional set, so strict is in fact a proper subset of the transitional set.

However - there is no guarentee that this will always be so.

My gut feeling is that transitional should be preserved as the "reflection" of the existing Microsoft Office documents (until March 2008) - in other words in term with the scope of IS29500.  I think that any new stuff should be added to the strict schema set only. The term "transitional" clearly implies this. As I recall the feeling in Geneva at the BRM, the idea behind the transitional set was, that eventually it would no longer be needed and hence removed from the standard - at some point in the future. If we continue to add new features to the transitional set, we will never get to the point where we can honor the sentiment of this particular issue.

...  now at the moment, we haven't decided anything yet ... so right now anything goes.

But what are your thoughts?

Markup compatibility and extensibility (MCE)

Part 3 of ISO/IEC 29500 is the fun part and if you haven’t read it yet, you really should do so – especially if you are thinking about implementing an IS29500-document consumer. Part 3 basically consists of two distinct areas – one that deals with compatibility and one that deals with extensibility. The first area is the target of this post.

To any markup consumer and producer of a format not cast in stone it is important to be able to ensure compatibility both forwards and backwards as the format changes over time. This is where the “compatibility-thingy” comes into play.

The compatibility-features of OOXML enable markup producers to target different versions of applications supporting different versions of the specification or different features all together. The tools to do this are called “Alternate Content Elements” (ACE) and “Compatibility-rule attributes” and Part 3 is supposedly an exact remake of how compatibility and extensibility is handled in the binary Microsoft Office files.

The latter tool enables markup producers to “force” other markup producers to preserve specific content – even if it is not known to them as well as instructing markup producers to which parts of the document could safely be ignored. It can even instruct markup consumers to fail if it doesn’t understand some parts of the markup. If this sounds kind-of “SOAP-ish” to you, the attribute name “MustUnderstand” to enable just this should sound even more familiar to you.

The first tool can be thought of as sort of “a switch statement for markup”. It allows a markup producer to serve alternate versions of markup to target alternate feature-sets of different applications. The diverging markup would be listed as different “alternate content blocks” or “ACB’s”, and it is essentially an intelligent way for a markup producer to tell a consumer that “if you don’t understand this bit, use this instead”.

An interesting use case would be to use ACE to improve interoperability when making text documents with mathematical content. It has long been a public secret that interoperability with OOXML was improving day by day – but not with mathematical content. Mathematical content in OOXML (or “OMML”) has for some reason not been a top priority with implementers of OOXML, so interoperability has been really, really bad.

Now, wouldn’t it be cool if there was some way for markup producers to serve MathML as well as OMML to consuming applications? Let’s face it – most of the competition to Microsoft Office 2007 is from applications supporting ODF, and they all (to a varying degree) support MathML. So a “safe assumption” would be that “if I create an OOXML text document with OMML and send it to a different application, it probably understands MathML much better than OMML”. Wouldn’t it be cool, if you could actually do this?

Well, to the rescue comes ACE.

ACE enables exactly this use case. ACE is based on qualified elements and attributes, so as long as you can distinguish between the qualified names of the content you are dealing with, ACE is your friend.

So let’s see how this would work out.

Take a look at this equation: 

 

In Office Math ML (OMML) this is represented as:

[code:xml]<m:oMath>
  <m:r>
    <m:t>a=</m:t>
  </m:r>
  <m:f>
    <m:num>
      <m:r>
        <m:t>b</m:t>
      </m:r>
    </m:num>
    <m:den>
      <m:r>
        <m:t>c</m:t>
      </m:r>
    </m:den>
  </m:f>
</m:oMath>
[/code]

In MathML thhe formula is represented as:

[code:xml]<math:math >
  <math:mrow>
    <math:mi>a</math:mi>
      <math:mo >=</math:mo>
      <math:mfrac>
        <math:mi>b</math:mi>
        <math:mi>c</math:mi>
      </math:mfrac>
    </math:mrow>
</math:math> [/code]

(both examples have been slightly shrinked)

So how would one specify both these ways of writing mathematical content? Well, it could look like this:

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document
  xmlns:omml="http://schemas.openxmlformats.org/officeDocument/2006/math"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
 
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"

  >
  <w:body>
    <w:p>
      <omml:oMathPara>
        <mc:AlternateContent xmlns:mathml="http://www.w3.org/1998/Math/MathML">
          <mc:Choice Requires="mathml">
            <mathml:math >
              <mathml:mrow>
                <mathml:mi>a</mathml:mi>
                <mathml:mo >=</mathml:mo>
                <mathml:mfrac>
                  <mathml:mi>b</mathml:mi>
                  <mathml:mi>c</mathml:mi>
                </mathml:mfrac>
              </mathml:mrow>
            </mathml:math>
          </mc:Choice>
          <mc:Choice Requires="omml">
            <omml:oMath>
          <omml:r>
            <omml:t>a=</omml:t>
            </omml:r>
            <omml:f>
            <omml:num>
              <omml:r>
              <omml:t>b</omml:t>
            </omml:r>
            </omml:num>
            <omml:den>
              <omml:r>
                <omml:t>c</omml:t>
              </omml:r>
            </omml:den>
            </omml:f>
            </omml:oMath>
          </mc:Choice>
          <mc:Fallback>
            <!-- do whatever -->
          </mc:Fallback>
        </mc:AlternateContent>
      </omml:oMathPara>
    </w:p>
  </w:body>
</w:document>[/code]

So you simply add the compatibility-namespace the file and add the "AlternateContent"-element. This element includes a list of "choices" and possibly a fallback choice. The choices are evaluated in the sequence they appear in the list of "choices".

And the benefit? Well, you can now have your cake and eat it too. If the consuming application supports it, it will display the equation based on the mathml-fragment – otherwise it will use OMML.

This is immensely interesting and applies to all sorts of places and use cases – heck, you can even use it to gain advantage of some of the new stuff in the strict schemas of IS29500 while keeping intelligent compatibility with existing applications only supporting ECMA-376 1st Ed. Imagine the ECMA-376-way of doing dates in spreadsheets and now add the possibility of using some of the new functionality added at the BRM - and without the risk of breaking applications nor losing data.

… that is if we change the namespace of the strict schemas, of course.

Smile

Lo(o)sing data the silent way - all the rest of it

Ok - this post is going to be soooo different than what I had envisioned. I had prepared documents for "object embedding" and "document protection" but when I started testing them, I soon realized that only Microsoft Office 2007 implemented these features - at least amongst the applications I had access to. These were:

Microsoft Office 2007 SP2

OpenOffice.org 3.0.1 (Windows)

OpenOffice.org 3.0.1 (Mac OS X)

NeoOffice (Mac)

iWorks 09 (Mac)

The reason?

  • OOo3 doesn't fully support object embeddin
  • OOo3 doesnt support document protection
  • iWorks doesn't support object embedding at all
  • iWorks doesn't support document protection

So I'll just give you one example of what will happen when strict documents come into play - when applied to document protection.

Document protection is the feature that allows an application to have a user enter a password and unless another user knows of this password, he or she cannot open the document in, say, "write-mode". There is no real security to it, though, it is simply a hashed password that gets stored in the document.

This data is stored in the "settings.xml"-file in the document, and this was rather drastically changed during the ISO-process.

If you use Microsoft Office 2007 to protect your document, it will result in an XML-fragment like this:

[code:xml]<w:documentProtection
  w:edit="readOnly"
  w:enforcement="1"
  w:cryptProviderType="rsaFull"
  w:cryptAlgorithmClass="hash"
  w:cryptAlgorithmType="typeAny"
  w:cryptAlgorithmSid="4"
  w:cryptSpinCount="100000"
  w:hash="XbDzpXCrrK+zmGGBk++64G99GG4="
  w:salt="aX4wmQT0Kx6oAqUmX6RwGQ=="/>[/code]

You will have to look into the specification to figure out what it says, but basically it tells you that it created the hash using the weak algorithm specified in ECMA-376.

But as I said, this was changed during the BRM. Quite a few of the attributes are now gone for the strict schemas, and my take on a transformation of the above to the new, strict edition is this:

[code:xml]<w:documentProtection
  w:edit="readOnly"
  w:enforcement="1"
  w:algorithmName="typeAny"
  w:spinCount="100000"
  w:hashValue="XbDzpXCrrK+zmGGBk++64G99GG4="
  w:saltValue="aX4wmQT0Kx6oAqUmX6RwGQ=="/>[/code] 

'Only thing I am a bit unsure about is the value for the attribute "algorithmName", but I guess it would be "typeAny". The result? Microsoft Office 2007 detects that the document has been protected, but it cannot remove the protection again - presumably due to the new attributes added to the schemas. I thought about creating new values using e.g. SHA-256 as specified in the spec, but the chances that Microsoft Office 2007 would detect this in unknown attribute values are almost nothing, so I didn't bother doing this. Feel to play around with it yourself.

The Chase

We need a namespace change for the strict schemas - and am thinking about ALL of the strict schemas including OPC. If we don't do it this way, my estimate is that we will lose all kinds of data - and the existing applications will not (as they behave currently) inform their users of it. Making existing applications break is a tough call, but I value data/information integrity more than vendors needing to update a bit of their code.

And as for the conformance attribute? Well, the suggestion as it is currently is to enlarge the range of allowed values of this attribute. Somehow I think it makes sense to enlarge the range as well.I think it would make sense to have the values one of

  • strict
  • transitional
  • ecma-376

or something similar. Then when we make a new revision at some point in the future, we can add version numbers to them at that time. Changing the namespaces will also make it possible to use MCE to take advantage of new features of IS29500 while maintaining compatibility with existing applications supporting only ECMA-376 1ed. (more about this later)

And what should the schemas be named?

Well, they are currently like "http://schemas.openxmlformats.org/wordprocessingml/2006/main" . So an obvious choice would be "http://schemas.openxmlformats.org/wordprocessingml/JLUNDSTOCHOLM/main"

Smile

... or maybe simply "http://schemas.openxmlformats.org/wordprocessingml/main" would be better? Of course it introduces easy causes for errors for developers, so maybe "http://schemas.openxmlformats.org/wordprocessingml/iso/main" would be even better?

Losing data the silent way - ISO8601-dates

In Prague we spent quite some time discussing how to deal with the fact that applications supporting ECMA-376 1st Ed. not necessarily support ISO/IEC 29500:2008 strict as well. Our talks revolved primarily around how major implementations dealt with the modified functionality of the elements <cell> and <v> in SpreadsheetML now that ISO-dates are allowed as content of the <v>-element. But “dates in spreadsheets” is not the only place where changes occurred. Changes were also made to other areas, including

  • Object embedding
  • Comments in spreadsheets
  • Hash-functions for document protection

This will be the first post in a series of posts evolving around how IS29500 differs from ECMA-376 and how existing applications behave when encountering a document with new content. What I will do here is to create some sample documents and load them in the applications I have access to that supports OOXML the best. In my case these are Microsoft Office 2007 SP2, OpenOffice.org 3.0.1 and NeoOffice for Mac and Apple iWorks. If you want to contribute and you have access to other applications, please let me know the result and I’ll update the article with your findings. If you have access to Microsoft Office 2007 SP1, I'd really like to know. When the series is done I’ll post a bit about MCE and how it might help overcome some of the problems I have highlighted (if we’ll get to change the namespace for the strict edition of IS29500 schemas)

I should also note that as the series progresses, the examples I make will increase in complexity. A consequence of this will be that my examples will be more of a “magic-8-ball-type prediction” than “simple examples of IS29500-strict documents”. Since there is not a single application out there supporting IS29500-strict, the examples will be my “qualified guesses” to how applications might interpret IS29500-strict when they implement it.

ISO-8601 dates in SpreadsheetML

Let me first touch upon the problem with dates in SpreadsheetML since this was the problem we talked about the most. Gareth Horton from the UK national body hand-crafted a spreadsheet document with these new dates. I have modified his example a bit to better illustrate the point. Files are found at the bottom of this post.

In the original submission to ISO dates were persisted in SpreadsheetML as “Julian numbers” (serial representation) and subsequently formatted as dates using number format styles.

[code=xml]<sheetData>
  <row r="1">
    <c r="A1" s="1">
      <v>39904</v>
    </c>
  </row>
  <row r="2">
    <c r="A2" s="1">
      <v>39905</v>
    </c>
  </row>
(…)
  <row r="10">
    <c r="A10" s="1">
      <v>39913</v>
    </c>
  </row>
</sheetData>[/code]

So the above would create a column with 10 rows displaying the dates from April 1st to April 10th.

Let’s change one of the cells to contain a date persisted in ISO-8601 format.

[code=xml]<row r="9">
  <c r="A9" s="1" t="d">
    <v>2009-04-09T01:02:03.04Z</v>
  </c>
</row>[/code]

So the cell contains an ISO-8601 date and it is formatted using the same number format as the other cells. I have added a bit of additional data to the spreadsheet to illustrate the problem with using formulas on these values.

Result

The interesting thing to investigate iswhat happens when this cell is loaded in a popular OOXML-supporting application. Note here that the existing corpus of implementations supporting OOXML supports the initial edition of OOXML, ECMA-376 1st Ed.So they would have no way to look into the specification and see what to do with a cell containing an ISO/IEC 8601 date value.

Microsoft Excel 2007 SP2

As you can see Excel 2007 screws up the content of the cell. And on top of that, should you try to manipulate the content of the cells with formulas, they are also basically useless. The trouble? Well, you are not notified that Excel 2007 does not know how to handle the content of the cell, so chances are that you’ll never find out – until you find yourself in a position where there are real consequences to the faulty data and kittens are killed.

OpenOffice 3.0.1 Calc

 

 

The result here is almost the same. Data is lost and the user is not notified.

NeoOffice for Mac

 

Again we see the same result. This is not so strange, since the latest version of NeoOffice shares the same code base as OOo 3.0.1 so behavious should be the same.

iWorks 09 Numbers

 



Wow, so for iWorks on the Mac, the user is actually warned that something went wrong. Only trouble is - it does not warn you that the content of the cell is not valid - it informs you that the system cannot find the font "Calibri".

Conclusion

It is pretty hard to conclude enything but "this sucks!". None of the applications warn the user that they have lost data - and they all do exactly that - loose data.

Original file: Book1.xlsx (8.82 kb)

Modified file: 

book2.xlsx (8.22 kb)

IS 29500 has been sent to ITTF for publication

This email just landed in my mailbox this morning:

ISO/IEC JTC1/SC34 N1080
Final Text for ISO/IEC 29500-1, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 1: Fundamentals and Markup Language Reference
Status: This text has been submitted to ITTF for publication. It is circulated to the SC 34 members for information.
 
ISO/IEC JTC1/SC34 N1081
Final Text for ISO/IEC 29500-2, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 2: Open Packaging Conventions
Status: This text has been submitted to ITTF for publication. It is circulated to the SC 34 members for information.
 
ISO/IEC JTC1/SC34 N1082
Final Text for ISO/IEC 29500-3, Information technology -- Document description and processing languages -- Office Open XML File Formats -- Part 3: Markup Compatibility and Extensibility
Status: This text has been submitted to ITTF for publication. It is circulated to the SC 34 members for information.
 
ISO/IEC JTC1/SC34 N1083
Final Text for ISO/IEC 29500-4, Information technology --Document description and processing languages -- Office Open XML File Formats -- Part 4: Transitional Migration Features Due date: --
Status: This text has been submitted to ITTF for publication. It is circulated to the SC 34 members for information.
 
This will finally make it possible for the NBs of ISO to verify that the editorial instructions from the BRM has made it into the final text. I have not yet had the time to investigate and verify that the Danish changes has been implemented, but I am sure lots of blogging will take place over the next days.
 
Smile

Day one of IS29500?

On August 15th 2008 ISO/IEC gave their "Go ahead" on the appeal against the IS29500-approval and the process leading to it. The decision was covered almost everywhere and the phrase that caused the most speculation was this:

According to the ISO/IEC rules, DIS 29500 can now proceed to publication as an ISO/IEC International Standard. This is expected to take place within the next few weeks on completion of final processing of the document, and subject to no further appeals against the decision.

(my emphasis)

So the battle was clearly not over since the appeal itself could also be appealed. The question was: until when? Then on September 1st news broke that the appealing countries would not appeal the decision to overthrow the appeals. Since it is my understanding that only the appeallants could appeal overthrowing the appeal (confused, anyone?), I suppose the case was finally closed.

But we are still waiting for the revised text from ITTF. I would imagine that they would hold the text until the period for appealing the appeal-overthrow was over with ... but when is that? This morning it occured to me that if the period was 30 days - today is the first working day after the deadline.

Could this be it then? Could today be the "Birthday" of IS29500?

Should the appeals stop all work on IS 29500?

Well, you tell me. All kinds or rumours are circulating amongst SC34-members of what the consequences of the appeals of approval of IS 29500 could be. The latest rumours are dwelving on the possibe outcome that all work on IS 29500 will be suspended until appeals are sorted out. These rumours are quite possibly fed by the email from the JTC1 chair to the members of SC34. Amongst other things it said

At its recent plenary meeting in Norway, JTC 1/SC 34 established 2 ad hoc groups concerned with the collection of comments on and maintenance of ISO/IEC 29500. Adjudicating these appeals will necessarily delay ISO/IEC 29500, at a minimum, and this delay could have a significant effect on the work of these ad hoc groups. SC 34 and its ad hoc groups should take this into account when planning their future work.

I am a participant in AHG1 (maintenance of IS 29500) and the purpose of the group is to plan and define the future work onIS 29500 in SC34 in, quite possibly, SC34 Working Group 4. We will meet in London in the end of July 2007 to start our work. From the roster/mail list of the group, I see quite a lot of "big cahunas" of SC34 and indeed in the debates of the last year or so. All these people have signed up to contribute to the onwards development and maintenance of IS 29500.

Now, let's pause for a second to look at the possible outcome of the appeals and their impact on the work in e.g. AHG1:

Approval is overturned

This will mean that IS 29500 no longer exists and that the work in the two AHGs have been a waste of time. We will spend two days in London and these days could then have been used more productively on other matters - like contributing to development of ODF 1.2 .

Appeals are dismissed 

If the appeals are dismissed - we will already have started the work on IS 29500. We will already have an idea of what to do and a preliminary view of what to do in the near future.

The outcome of the appeals is currently blowing in the wind, and I would not be the one to predict the outcome. However, I will say, that I would much rather start the important work on IS 29500 maintenance as soon as possible. This is in fact the core of the worst-case scenario: we just loose two days of work. I think most will agree that IS 29500 needs proper maintenance and I would prefer to get that work started - even though the work might be wasted. If all work in the AHGs are suspended and the appeals are dismissed, we will not be able to start this important work until Spring 2009.

For those of you being worried that allowing the AHGs to proceed with their work could be interpreted as some sort of bias towards dismissing the appeals, I think you are wrong. Allowing us to start the work on a provisionary basis does not in any way impact the decision of JTC1/ITTF regarding the appeals. It simply allows us to get a head start on maintenance of IS 29500 ... nothing more.

So please, let all the distinguished people in Ad Hoc Group 1 and 2 get at chance to start the important work they have volunteered to participate in.