a 'mooh' point

clearly an IBM drone

I hate automatic spam filtering

Every time you use a tool, you make a (conscious) decision to trust that tool to do what it says. This is true for closed source software and open source software alike (especially if you don't have the skills to plow through the source of whatever tool you use).

Most people use a generic blogging platform for the information they wish to share with the world. Popular tools are Blogger by Google or WordPress. These tools often come with spam filtering and all sorts of other features intended to make your life as blog-owner easier.

The catch you sometimes find yourself in is that they don't work as they should. Spam is let through and non-spam is deleted. That can be a real pain in the ass - especially if you are not notified of this.


On Rob's blog I have been writing a bit with him and Luc. Rob moderates his blog (as most people do, including me). We kind of have to - because blog spam is annoying and it disturbs not only the owner, but also everyone participating in the conversation if mail notification is enabled.

(as a funny side-note, I once promised never, ever to moderate my blog ... well, colour me stupid)


I think Rob has become a victim of too aggressive spam filtering - it seems that two of my posts have been lost in cyberspace - even though I have tried to repost them several times. Maybe it's my (grammar-error-prone) English (or "Dænglish", as we call it here) tricking something - I dunno.

Luckily I have an archive of the stuff I write (because Rob's blog is certainly not the only one suffering from this)

So to preserve our common digital legacy, here are my two posts that were erroneously caught:


Hi Rob,

<i>I suppose at some point Microsoft will approach us with a list of suggested additions to OOXML. That is the prerogative of any vendor or any national body.</i>
So if I phrased this differently, e.g.

"I suppose at some point ECMA will approach us with a list of suggested additions to OOXML. That is the prerogative of any liaison or any national body."

... that would make your day?

I'd be happy to make that correction - just say the word.

"In any case, aside from being inaccurate, your comment is off topic. No more, please."

Well, it is your blog, so feel free to censor whatever you want. My point was to confirm parts of what Luc was saying - that indeed some of the extensions Microsoft has made to OOXML will likely be added to the standard.

How that can be OT is beyond me.


Hi Luc,

<i>I agree, but then we must have a formal commitment by Microsoft that they will implement ISO29500 Strict within the coming 12 to 24 months latest.</i>

Yes, and I'd personally encourage them to make such a statement. But it is really out of scope of WG4 to do anything about it.

<i>I would recommend everybody to not invest one cent of their money or one second of their time improving it: it is lost time and money.</i>

Well, there are several tools one can use to push Microsoft to implement S - "not participating" is perhaps the least effective of those.


Excel 2010 (Microsoft Office 2010 CTP TODO-list (01))

I have been looking at how Excel 2010 has implemented various features using ISO/IEC 29500-4:2008 - also known as "OOXML Transitional".


ISO/IEC 29500 comes in two variants, a "transitional" (T) and a "strict" (S). Transitional is the one containing all the legacy stuff such as VML, legacy digest algorithms, leap year bug etc. S does not contain these things and is therefore considered "more pure". T is practically identical to the document format submitted to ISO/IEC by ECMA - also known as "ECMA-376". A document conforming to ECMA-376 will therefore also conform to T, since the schemas are practically identical.

T is currently a superset of S, which means that "T includes everything in S". This has the effect that within a T document, a vendor can take advantage of new features in S while still being in "the comfort zone of T". T is therefore considered by some as providing a “graceful migration path to S”, meaning that vendors can change their existing T-compliant code, on a case-by-case basis, to gradually support the features of S instead of those of T.

Update 2009-11-17: Reading the spec today, I realised that I was wrong in saying that the leap-year bug was not in S - indeed it is. I apologize for the confusion. Thank you, Jens Hørlück, for pointing this out to me.

As you know, Microsoft is claiming "IS29500 compliance" for their latest incarnation of Microsoft Office. OOXML is a big and complex document format, and even though OOXML was not published until waaaay after "feature freeze" of Office 2010, I thought it'd be interesting to take a look at how Excel has reached compliance.

When looking at the markup generated by Excel 2010, it is important to remember the background information above, since you'd expect to find traces of this "graceful migration" in the markup already.

Conformance class

The DIS-process added a new attribute to the root elements of the documents. This attribute specifies the conformance class of the document. So as not to invalidate existing documents, the default value for this attribute is “transitional”. The other possible value is "strict".

So when creating a new spreadsheet using Excel 2010, which markup does it create?

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <sheet name="Sheet1" sheetId="1" r:id="rId1">

(condensed for easier reading)

The above is the markup for the root element of a SpreadsheetML spreadsheet.

So the conformance class attribute is omitted – making the document a T-document. I know that it is basically nonsense to add an attribute with the default value just to implement support for this behavior, but for those of us who like looking at markup in text editors, I’d recommend that Microsoft include the conformanceClass-attribute with the appropriate value. As it is a no-brainer to implement for a consumer – it should be a no-brainer to implement for a producer.

Also – the existence of the conformanceClass-attribute is the only (or, best, anyway) indicator that a T document was created according to the schemas of ISO/IEC 29500 and not those of ECMA-376.

(and yes, I know that they are practically identical, but adding the attribute would be a clear indication that Microsoft wishes to produce markup according to ISO/IEC 29500 and not ECMA-376 1st Ed.)
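To show just how much of a no-brainer the consumer side is, here is a minimal Python sketch. It is not taken from any real implementation: the attribute name `conformance` is my assumption about how the schema spells it (which is why it is a parameter), and the sample markup is a hypothetical, heavily condensed workbook part rather than actual Excel 2010 output.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by the workbook part.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def conformance_class(workbook_xml, attr="conformance"):
    """Return the declared conformance class of a workbook part.

    The attribute name is an assumption; when the attribute is absent,
    the default value "transitional" applies.
    """
    root = ET.fromstring(workbook_xml)
    return root.get(attr, "transitional")


# Hypothetical sample: no conformance attribute on the root element.
sample = (
    f'<workbook xmlns="{MAIN_NS}">'
    '<sheets><sheet name="Sheet1" sheetId="1"/></sheets>'
    "</workbook>"
)

print(conformance_class(sample))  # transitional
```

A producer only has to write one extra attribute on the root element to make the second branch of that lookup redundant.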

Microsoft, please fix!


Support for ISO-dates in SpreadsheetML just might be the biggest issue we faced in the DIS-process. It was hugely controversial and as such, a large amount of time was spent at the BRM figuring out a solution.

Excel 2010 allows a user to manually choose the option to persist dates in ISO-8601 format. This is done through the "Settings-dialogue".

So let us see what Excel 2010 does with dates. I opened the application and in a cell I wrote the date 28-02-1900. I then unzipped the XLSX-file and found this markup in the Worksheet-part:


  <row r="1" >
    <c r="A1" t="d">

So the date is actually persisted in the file using ISO-8601 notation and the cell is correctly typed with a t=”d”-declaration.
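For consumers, this typing makes life simple. The following Python sketch shows how date-typed cells could be picked out of a worksheet part. Note the assumptions: the fragment is a hypothetical reconstruction (not the literal Excel 2010 output), and I assume the value element holds a plain YYYY-MM-DD date, possibly followed by a time part that is ignored here.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hypothetical worksheet fragment: one date-typed cell and one plain number.
sheet_xml = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="d"><v>1900-02-28</v></c>
    </row>
    <row r="2">
      <c r="A2"><v>42</v></c>
    </row>
  </sheetData>
</worksheet>"""


def iso_date_cells(xml_text):
    """Map cell references to dates for every cell typed with t="d"."""
    root = ET.fromstring(xml_text)
    found = {}
    for cell in root.iterfind(".//m:c[@t='d']", NS):
        value = cell.find("m:v", NS)
        if value is not None and value.text:
            # Only the calendar-date part is parsed in this sketch.
            found[cell.get("r")] = date.fromisoformat(value.text[:10])
    return found


print(iso_date_cells(sheet_xml))
```

No guessing based on number formats is needed: the type attribute alone tells the consumer that the cell holds a date.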

Now, depending on your point of view, this is a good thing, because it takes us further from the hell-hole I have previously written about with respect to “date-typing-of-numbers” in ECMA-376.

Thank you, Microsoft – good for you!


Leap-year bug

So … with the “Save-as-ISO-8601-dates”-setting set to “yes, please” as previously described, I added a formula to the spreadsheet above in cell B1. The formula simply was “=A1+1”.

Anyone up for guessing the result?

Well, the result was this: 1900-02-29

That is funny because of two things:

  1. I set the “Save-as-ISO Dates”-setting to “yes”, so I was not expecting the leap-year-bug to still be used.
  2. I am pretty sure that the date-representation “1900-02-29” is not a valid ISO-date.
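Both points can be checked with a few lines of Python. This is only a sketch: `serial_to_text` is a hypothetical helper that assumes the 1900 date system with the leap-year-bug, where serial 1 is 1900-01-01 and serial 60 is the fictitious 29 February. It says nothing about how Excel is implemented internally.

```python
from datetime import date, timedelta


def serial_to_text(serial):
    """Render a 1900-date-system serial as YYYY-MM-DD, leap-year-bug included."""
    if serial == 60:
        # The day that never existed: 1900 was not a leap year.
        return "1900-02-29"
    # After the fictitious day, every serial is one ahead of the real calendar.
    offset = serial - 1 if serial < 60 else serial - 2
    return (date(1900, 1, 1) + timedelta(days=offset)).isoformat()


def is_valid_iso_date(text):
    """Return True if the text is a real calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


for serial in (59, 60, 61):
    text = serial_to_text(serial)
    print(serial, text, is_valid_iso_date(text))
```

Serial 59 and 61 come out as 1900-02-28 and 1900-03-01, both valid. Serial 60 comes out as 1900-02-29, which the date parser rejects.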

So I looked a bit at the markup generated for this spreadsheet.

The result was:


  <row r="1" >
    <c r="A1" t="d">
    <c r="B1">


Ahem … dear Microsoft … WTF?

So you have persisted the first date as an ISO-date, but the result of calculating a value based on it – you persist the result as a serial date? In which universe would this make sense?

(I should note that whenever a calculation results in a valid date, the date is correctly persisted by Excel 2010 as an ISO-8601 date)

But – as I was writing this I thought “wasn’t there something about an at-BRM-added attribute called ‘dateCompatibility’?” This was the attribute used to determine whether the date-system used should support the leap-year-bug or not. And truth be told, the attribute is not used – making dateCompatibility default to “true” – hence honoring the leap-year-bug.

But – this could simply be a glitch of Excel2010, so I added the attribute myself to force Excel2010 to discard date compatibility. I’d then expect the result of adding 1 to the date of 1900-02-28 to be 1900-03-01.

The markup was this:

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook dateCompatibility="false">
    <sheet name="Sheet1" sheetId="1" r:id="rId1">

Result: still 1900-02-29

I think this calls for a few concluding words before I move on:

Microsoft, I think you should stop using ISO-8601 dates in SpreadsheetML in T because of the arguments provided by my fellow WG4-expert Gareth Horton. But if you insist on using ISO-8601-dates in SpreadsheetML-T, you should do it right. If you ask me, a user manually choosing to have Excel2010 use ISO-8601 dates is a clear indication that the user does not want to deal with the leap-year-bug. The user has even decided to continue despite your warning that he or she might lose precision. So when a user performs this specific action, set the damn dateCompatibility-attribute to “false” and persist ALL dates as ISO-8601 dates.

You have a golden opportunity here to ditch the leap-year-bug for all new documents. If you act wisely and add the conformance attribute (so consuming applications can distinguish the files from ECMA-files) and set the dateCompatibility-attribute to “false” when persisting dates as ISO-8601 dates, you’d have done really, really well. You have by far the biggest implementation of OOXML – so the vendors will follow your lead. Any application adding support for ISO-8601 dates in transitional documents in their existing ECMA-376-compliant code will be able to know what to do - and they'll do it the way you do it.

Now, I know you would like to demonstrate to all of us that you want to go towards “the S way”, but you need to do it right. Mixing ISO-8601 dates with serial dates simply to be able to maintain the leap-year-bug is not the right thing to do.

I have absolutely no idea how the internals of either the calculation instructions in Excel or even the Excel-team itself work, but the way you have “added” ISO-8601 dates indicates to me that you have changed barely anything except for the persistence-mechanism for dates.

I’m sure you’d argue that there is no longer time to change the inner workings of Excel2010, and you might very well be right about that. If that is indeed the case, I suggest you remove the option of saving dates as ISO-8601-dates from Excel2010 as soon as possible to avoid doing any more harm to the ecosystem around OOXML that we both share deep concerns for.


At the BRM the biggest pile of things added to OOXML covered features that were previously only possible with the use of VML. These include adding comments to cells in spreadsheets. The comments themselves are stored in a “Comment part” of the OPC-package, but the display of the comment was done using VML. Luckily, at the BRM markup was added to allow DrawingML to be used instead.

So I used the spreadsheet from before and added a comment in cell B1 (the one with the crappy leap-year-bug result)


Anyone wanna guess if this comment is displayed using VML or DrawingML?

Yes, you guessed it … somewhere in the back of the document is a VML-fragment containing the “box” containing my comment.

[code:xml]<xml xmlns:v="urn:schemas-microsoft-com:vml"

 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1"/>

 </o:shapelayout><v:shapetype id="_x0000_t202" coordsize="21600,21600" o:spt="202"
  <v:stroke joinstyle="miter"/>
  <v:path gradientshapeok="t" o:connecttype="rect"/>
 </v:shapetype><v:shape id="_x0000_s1025" type="#_x0000_t202" style='position:absolute;

  visibility:hidden;mso-wrap-style:tight' fillcolor="#ffffe1" o:insetmode="auto">
  <v:fill color2="#ffffe1"/>
  <v:shadow on="t" color="black" obscured="t"/>
  <v:path o:connecttype="none"/>

  <v:textbox style='mso-direction-alt:auto'>
   <div style='text-align:left'></div>
  <x:ClientData ObjectType="Note">


    2, 15, 0, 2, 4, 55, 3, 16</x:Anchor>


If this was a tweet on Twitter, I’d finish it off using the tag #fail.
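If you want to check your own files for the same thing, a rough Python sketch like this lists the parts of a package that look like VML. It assumes that VML parts can be recognised by a `.vml` extension, which is a convention rather than a guarantee (the content types of the package would be the reliable source). The demo package and its part names are made up for illustration.

```python
import io
import zipfile


def legacy_vml_parts(xlsx_bytes):
    """Return the names of all parts in the package that end in .vml."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as package:
        return sorted(
            name for name in package.namelist() if name.lower().endswith(".vml")
        )


def build_demo_package():
    """Build a tiny in-memory zip standing in for an XLSX file (hypothetical parts)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("xl/comments1.xml", "<comments/>")
        package.writestr("xl/drawings/vmlDrawing1.vml", "<xml/>")
    return buffer.getvalue()


print(legacy_vml_parts(build_demo_package()))
```

An empty list would be the result to hope for in a package where comments are displayed using DrawingML only.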

Document protection

And finally – what about document protection? “Document protection” is the mechanism in OOXML to allow applications to open documents in “read-only-mode”, to protect worksheets from editing etc. These are not really protecting the document as such, because a hash of the password is simply saved – but everything is still in clear text. Document protection was one of the areas where OOXML was fundamentally changed, allowing more robust algorithms to be used – and not only the weak “legacy-algorithm” previously supported by Office 2007.

So in Excel2010 I chose to protect the active sheet and looked at the markup again. Again, I was hoping to find some of the new markup, but again I was let down. Excel2010 still uses the weak algorithm and it still uses the legacy markup to persist it.

[code:xml]<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>


  <sheetProtection password="8985" sheet="1" objects="1" scenarios="1" />

This is not only an annoyance for those not working on the Windows-platform – it is also an annoyance for those of us working with e.g. .Net development. If Excel2010 used SHA1, SHA256 or one of the other modern algorithms supported directly by .Net framework, we’d be able to use it straight out of the box. Now all of us still need to add additional code/assemblies to our code – simply to protect a worksheet.
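For anyone who wants to check what their own producer writes, here is a small Python sketch that classifies a sheetProtection element. It assumes that the newer markup is recognisable by the attributes algorithmName, hashValue, saltValue and spinCount. The values in the second sample are placeholders, not a real hash.

```python
import xml.etree.ElementTree as ET

# Attributes assumed to identify the newer, algorithm-agnostic markup.
MODERN_ATTRS = ("algorithmName", "hashValue", "saltValue", "spinCount")


def protection_style(element_xml):
    """Classify a sheetProtection element as 'modern', 'legacy' or 'none'."""
    element = ET.fromstring(element_xml)
    if all(element.get(name) is not None for name in MODERN_ATTRS):
        return "modern"
    if element.get("password") is not None:
        # A short hex value from the weak legacy algorithm.
        return "legacy"
    return "none"


# The element Excel 2010 wrote in my test.
legacy_sample = '<sheetProtection password="8985" sheet="1" objects="1" scenarios="1" />'

# Hypothetical element using the newer attributes; values are placeholders.
modern_sample = (
    '<sheetProtection algorithmName="SHA-256" hashValue="PLACEHOLDER" '
    'saltValue="PLACEHOLDER" spinCount="100000" sheet="1" />'
)

print(protection_style(legacy_sample))  # legacy
print(protection_style(modern_sample))  # modern
```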

At the end …

I must admit that I am a bit disappointed. I know that barely a year has passed since the publication of ISO/IEC 29500, but I had expected more of the things we discussed at the BRM to be implemented. At the very least I was expecting some of the more “easy ones” to have been implemented – like the conformanceClass-attribute, VML in comments and document protection. It would have been a nice token of “good faith”.

Now, we are basically left with nothing. Do note, however, that the observations above are by no means a comprehensive test. These represent simply a few of the thoughts off the top of my head while writing this.

Next time: WordprocessingML

PS: I have been trying to find a list of the stuff from the BRM that was implemented in Office 2010 (like ISO-8601 dates), but I have been unsuccessful. So if anyone can point me towards such a list, I'd be happy to include those results above.

Mummy, Tom from school is an idiot

Back in the day when I started blogging, I showed a friend of mine a blog post I had written. He noticed the large number of links to other bloggers in the piece, and he asked me (tongue-in-cheek) if those links were some kind of geeky way of saying "I love you" to the people I linked to.

Well, to some extent, he was correct. But that is not the only way people communicate warm feelings to each other - without saying it directly. If you look at kids in (pre)school, when they start to have feelings for the opposite sex, the message to everyone else is typically "No, I don't like Tom at all ... he's a total idiot!". So parents (I presume) quickly learn that sometimes name-calling is a token of love and affection.

So ... Alex and I have made a little game. As you might know, name-calling is quite the way to do stuff - at least if you are in some way opposed to this whole "oh oh xml thingy". Classics are "drone" and "$hill" and more exotic examples are "nazi" or "saboteur". The current score is listed in the table at the right of this page.

The rules are quite simple:

  • Each mention of either of us with a special name scores one point
  • If one of us is mentioned by name in the title of the article/post, the score for each mention in the article is doubled

The prize is beer in Paris for the next WG4-meeting in the beginning of December.

This is all fine ... except for one thing ... I am losing miserably. I had a good thing going for a few months (and I was even so cocky as to suggest a wager on beer, because I "knew" I was winning). But somehow Mr. Fox has gained momentum and with the latest bashing of Alex, we are now tied.

So Roy, Pam ... give me a hand here ... think about all the frustration I might have caused you (and your families) and put it down in writing ... otherwise I'll be the one buying beer in Paris and Alex will be winning ... and none of us wants that to happen, do we?

PS: Roy, I love you too ...


Microsoft Office 2010 CTP1 TODO-list (Teaser)

It seems about time somebody wrote a bit about how Microsoft has chosen to implement ISO/IEC 29500:2008, aka OOXML. As you might know, Microsoft claims that Microsoft Office 2010 will implement “29500” in the Transitional (T) sense. As far as my tests have shown (and they are in no way a complete and thorough application test) there are no signs that this is not in fact true. So by launch day of Microsoft Office 2010 we (as in “the world”) will have at least one big implementation of 29500.

However – the devil, as always, lies in the details. So the question is not if they have implemented 29500 – the question is how.

This will be the first post in a series looking at the details (from a format perspective) of how Microsoft has chosen to implement 29500.

[Note: Document markup for documents conforming to the Transitional conformance clause (T) and for those that conform to the Strict conformance clause (S) is virtually identical – for simple documents.]

I will maintain a list of items for Microsoft to consider as I go along. It is available at this permanent location.

Excel 2010

I will start by looking at Excel. Some of the most controversial parts of 29500 were focused on how Excel handles e.g. dates, VML, document protection etc., so this seems like a reasonable place to start. Also, the markup of SpreadsheetML is much easier to read than the markup of WordprocessingML or PresentationML (methinks), so it should be the right place to kick this off.

Stay tuned for the first article looking at how the Excel team of Microsoft Office 2010 has dealt with ISO/IEC 29500.

Denmark votes "yes" on IS29500 COR1 and FPDAM1

I know it has been a couple of weeks, but I just wanted to share current development with you.

On September 7th, the Danish mirror committee to ISO/IEC JTC1 SC34 met at Danish Standards in Charlottenlund. On the agenda was, amongst other things, processing of documents under ballot. The documents relevant to WG4 were the document sets for COR1 and FPDAM1 of IS29500.

As appointed expert from Danish Standards in WG4, I have been working hard with the other experts in WG4 on these papers and I have for each meeting in Denmark provided overviews of the current work to the mirror committee. The members of the Danish committee have access to the same set of papers that I have, so we have primarily been discussing the more controversial ones - like usage of ISO-8601 dates in transitional files, reintroducing ST_OnOff in transitional schemas and changing the namespace name for strict files. A couple of times Danish committee members have requested information on more "trivial stuff", and we have then discussed this.

At the meeting of September 7th, I gave a quick, selective overview of the tougher parts of COR1 and AMD1 and no comments were presented. We talked a bit about general principles of the work in WG4, but that was basically that.

After this, Denmark (Danish Standards) approved the document sets for COR1 and AMD1.

Obviously I think this is great news and the chairman of the Danish committee expressed his appreciation of the work put into creating these files.

Danish Competition Authority suggests: Use ODF in public sector!

Get the information straight from the horse's mouth at the DCA website. If you do not speak Danish, Google will do a rough translation for you.

I'll update this article shortly ...

... oh ... and I almost forgot ... they suggested using OOXML as well.


Norway mandates PDF and ODF as exchange-formats

Norway has mandated use of PDF and/or ODF as document exchange formats. The baseline reference list of approved standards and formats has been released in a "version 2.0"-edition where, amongst other things, ODF has been approved in edition 1.1. An excerpt of the text is:

3.2.2 Dokumentstandarder for utveksling ved e-postvedlegg

Ved utveksling av dokumenter som vedlegg i e-post fra offentlig sektor til omverdenen (innbyggere og næringsliv), skal følgende standarder benyttes: PDF 1.4 – 1.6, PDF 1.7 (ISO 32000-1) eller PDF/A (ISO 19005-1) er obligatorisk format ved utveksling av dokumenter beregnet for lesing. ODF 1.1 (Oasis Standard 1. februar 2007) er obligatorisk og skal benyttes ved utveksling av dokumenter beregnet for redigering hos mottaker etter avsending fra offentlig myndighet. På grunn av begrenset utbredelse anbefales det midlertidig å legge ved ett eller flere tilleggsformater for å sikre allmenn tilgjengelighet. I slike tilfeller skal det tydelig informeres i e-posten om at vedleggene består av samme dokument gjort tilgjengelig i flere format.

Det er viktig å være oppmerksom på at publisering av dokumenter på nyere versjoner av PDF, kan føre til at en leser med støtte for en eldre versjon ikke kan lese hele dokumentet.

Ved mottak av ferdigstilte dokumenter i e-post fra innbyggere/næringsliv, bør offentlig sektor som et minimum kunne håndtere følgende standarder:

  • PDF, alle versjoner
  • PNG (Portable Network Graphics, ISO/IEC 15948:2003)
  • JPEG (Joint Photographic Experts Group, ISO/IEC 10918-1)
  • ODF, alle versjoner

For både ferdigstilte dokumenter og dokumenter for videre bearbeiding bør offentlig sektor også kunne motta alle andre formater med stor utbredelse innenfor anvendelsesområdet, som ikke gir den offentlige myndighet en urimelig stor konverteringsbyrde. Hvilke formater som konkret kan forventes vil være forskjellig innenfor sektorer og vil endre seg over tid.

Dokumentformatet OOXML ble publisert av ISO 18. november 2008. Den er besluttet fortsatt å være under observasjon.

I think it is great news for ODF that governments around the world are upgrading their procurement requirements to take advantage of the latest edition of ODF.

My translation of (parts of) the above is:

When exchanging documents as attachments in email from the public sector to users (citizens and corporations) the following standards must be used: PDF 1.4-1.6, PDF 1.7 (ISO 32000-1) or PDF/A (ISO 19005-1) are mandatory and must be used when exchanging documents designed for reading (only, ed). ODF 1.1 (OASIS Standard, 1 February 2007) is mandatory and must be used when exchanging documents for editing purposes. Due to the limited market penetration (of ODF, ed), it is however recommended (temporarily) to attach one or more additional document formats when sending data from a public institution.


OOXML was published by ISO on November 18th 2008. It has been decided that it is still under observation.


Microsoft-stacking in WG4

Traditionally, for every meeting we have in WG4, some conspiracy-theory is born on how much money the delegates received from Microsoft, how many sports-cars we each got from Microsoft or how we each had a Microsoft employee sitting on our laps dictating what we should say.

So, I thought I'd beat the usual nut jobs to it and present the attendance list myself. The minutes from the meeting will be available soon, but who wants to wait for exciting news like this?

The attendance list was this:

Name (role), affiliation

  • Pia Lange (Host), Dansk Standard
  • Makoto Murata (WG4 Convener), International University of Japan
  • Sam Oh (SC34 Chair), Sungkyunkwan University
  • Keld Simonsen
  • Dave Welsh
  • Mario Wendt
  • Klaus-Peter Eckert (DE), Fraunhofer Fokus
  • Jesper Lund Stocholm
  • Rex Jaeschke (ECMA HoD, project editor), Consultant
  • Doug Mahugh
  • Shawn Villaron
  • Kimmo Bergious
  • Alex Brown, Griffin Brown Digital Publishing Ltd.
  • Gareth Horton
  • Jaeho Lee, University of Seoul
  • Jung-Jin Yang, The Catholic University of Korea

So out of a total of 16 people attending ... 6 people were in some way affiliated with Microsoft and/or ECMA. What the hell, throw the Microsoft shill Alex Brown into the pot as well - that'll make it a total of 7 people.

I don't know what to say ... I'm shocked.

Update: I have been notified that I missed two persons on the list, Dave Welsh and Keld Simonsen. List and numbers have been updated accordingly.

Re-introducing on/off-values to ST_OnOff in OOXML Part 4

At this very moment we are discussing re-introducing the values on/off to the simple type ST_OnOff in the transitional part of OOXML.


Some countries (including Denmark and UK) argued during the DIS29500-process that the enumeration values "on" and "off" of the simple type ST_OnOff were inappropriate since they extended the W3C Schema data type xsd:boolean. So at the BRM, these values were removed from the simple type ST_OnOff.

Now, that's all fine and dandy - only problem was that it made (according to a Microsoft estimate) 90% of all existing documents (and existing applications) non-conformant. Alex Brown demonstrated this in his article "OOXML and Office 2007 Conformance: a smoke test". Further, it went directly against the scope of IS29500:2008 which was to "represent faithfully the existing corpus of word-processing documents, spreadsheets and presentations that have been produced by Microsoft Office applications (from Microsoft Office 97 to Microsoft Office 2008, inclusive)".
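The difference between the two value spaces is small enough to sketch in a few lines of Python. This is an illustration only: `parse_on_off` and its `allow_legacy` switch are hypothetical names, with the switch standing in for "transitional schema with on/off re-introduced".

```python
# Lexical values of xsd:boolean.
XSD_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}

# The two extra enumeration values that were removed at the BRM.
LEGACY_ON_OFF = {"on": True, "off": False}


def parse_on_off(text, allow_legacy):
    """Parse an ST_OnOff value, optionally accepting the legacy on/off values."""
    table = dict(XSD_BOOLEAN)
    if allow_legacy:
        table.update(LEGACY_ON_OFF)
    try:
        return table[text]
    except KeyError:
        raise ValueError(f"not a valid ST_OnOff value: {text!r}") from None


print(parse_on_off("on", allow_legacy=True))    # True
print(parse_on_off("0", allow_legacy=False))    # False
```

With `allow_legacy=False`, an existing document that says "on" is rejected outright, which is exactly the problem with the BRM decision.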

So we have been discussing this quite a bit - because re-introducing the values on/off would effectively be reversing a BRM decision ... in other words ... politically, it is a bit of a hot potato.

You might argue that this is a prime example of how Microsoft controls SC34/WG4 and how we simply align everything to what Microsoft or Microsoft Office does - but unless you consistently opt for the sensational news, that position doesn't make very much sense. This has really nothing to do with aligning IS29500 with Microsoft Office; it has to do with aligning IS29500 with its scope.

Now, do note that we in WG4 cannot decide to alter IS29500 - this is the prerogative of the national bodies in SC34 or JTC1, so all we are doing is suggesting to the NBs that we think it is a good idea to reintroduce the two values.


WG4-meetings in Copenhagen

So ... everyone on the "who's who" list of OOXML maintenance is in Copenhagen eagerly working our way through a zillion defect reports and proposals for IS29500. The pace varies from hour to hour, but almost all of it is quite interesting (cough!).

We have quite a busy schedule in front of us for these three days in Copenhagen. The number of DRs has climbed above 250. As you can see on the statistics page of WG4, we have successfully closed about 138 of them (throughout the last few weeks) and we are working our way through the rest.

The topics for this week revolve around mundane tasks such as sorting out editorial defects, discussing technical comments and figuring out what to put in the AMD-bucket and what to put in the COR-bucket. It's all about the glamour and fancy life style here.


We are certainly living in interesting times ... and I am sure we'll get a lot done in these three days.

PS: Ooh ... and we are gonna burn a witch on Tuesday evening.