Lo(o)sing data the silent way - all the rest of it

by jlundstocholm 15. April 2009 23:32

Ok - this post is going to be soooo different than what I had envisioned. I had prepared documents for "object embedding" and "document protection" but when I started testing them, I soon realized that only Microsoft Office 2007 implemented these features - at least amongst the applications I had access to. These were:

  • Microsoft Office 2007 SP2
  • OpenOffice.org 3.0.1 (Windows)
  • OpenOffice.org 3.0.1 (Mac OS X)
  • NeoOffice (Mac)
  • iWork '09 (Mac)

The reason?

  • OOo3 doesn't fully support object embedding
  • OOo3 doesn't support document protection
  • iWork doesn't support object embedding at all
  • iWork doesn't support document protection

So I'll give you just one example of what will happen when strict documents come into play: document protection.

Document protection is the feature that lets a user enter a password; unless another user knows this password, he or she cannot open the document in, say, "write mode". There is no real security to it, though: it is simply a hashed password stored in the document.

This data is stored in the "settings.xml" part of the document, and its markup was changed rather drastically during the ISO process.

If you use Microsoft Office 2007 to protect your document, the result is an XML fragment like this:


<w:documentProtection
  w:edit="readOnly"
  w:enforcement="1"
  w:cryptProviderType="rsaFull"
  w:cryptAlgorithmClass="hash"
  w:cryptAlgorithmType="typeAny"
  w:cryptAlgorithmSid="4"
  w:cryptSpinCount="100000"
  w:hash="XbDzpXCrrK+zmGGBk++64G99GG4="
  w:salt="aX4wmQT0Kx6oAqUmX6RwGQ=="/>

You will have to look into the specification to decode it, but basically it says that the hash was created using the weak algorithm specified in ECMA-376.
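To make the fragment a little less opaque, here is a minimal sketch that reads the legacy attributes and decodes the base64 values. The SID table is only a partial one, written from my recollection of the ECMA-376 transitional schema, so treat it as an assumption; the names `describe_protection` and `qn` are hypothetical helpers. A 20-byte hash is at least consistent with SHA-1.

```python
import base64
import xml.etree.ElementTree as ET

# Transitional WordprocessingML namespace written by Office 2007.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# ASSUMPTION: partial map of cryptAlgorithmSid values as I recall them
# from ECMA-376; check the spec before relying on it.
ALGORITHM_SIDS = {1: "MD2", 2: "MD4", 3: "MD5", 4: "SHA-1", 12: "SHA-256"}

FRAGMENT = f"""<w:documentProtection xmlns:w="{W}"
  w:edit="readOnly" w:enforcement="1"
  w:cryptProviderType="rsaFull" w:cryptAlgorithmClass="hash"
  w:cryptAlgorithmType="typeAny" w:cryptAlgorithmSid="4"
  w:cryptSpinCount="100000"
  w:hash="XbDzpXCrrK+zmGGBk++64G99GG4="
  w:salt="aX4wmQT0Kx6oAqUmX6RwGQ=="/>"""


def qn(local: str) -> str:
    """Qualify a local name with the w: namespace, ElementTree style."""
    return f"{{{W}}}{local}"


def describe_protection(fragment: str) -> dict:
    """Summarize the legacy (transitional) protection attributes."""
    elem = ET.fromstring(fragment)
    sid = int(elem.get(qn("cryptAlgorithmSid")))
    return {
        "algorithm": ALGORITHM_SIDS.get(sid, f"unknown SID {sid}"),
        "spin_count": int(elem.get(qn("cryptSpinCount"))),
        "hash_bytes": len(base64.b64decode(elem.get(qn("hash")))),
        "salt_bytes": len(base64.b64decode(elem.get(qn("salt")))),
    }


print(describe_protection(FRAGMENT))
# → {'algorithm': 'SHA-1', 'spin_count': 100000, 'hash_bytes': 20, 'salt_bytes': 16}
```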

But as I said, this was changed during the BRM. Quite a few of the attributes are gone from the strict schemas, and my take on transforming the above into the new, strict edition is this:


<w:documentProtection
  w:edit="readOnly"
  w:enforcement="1"
  w:algorithmName="typeAny"
  w:spinCount="100000"
  w:hashValue="XbDzpXCrrK+zmGGBk++64G99GG4="
  w:saltValue="aX4wmQT0Kx6oAqUmX6RwGQ=="/>
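The attribute mapping I used between the two fragments can be sketched like this. It is a minimal illustration, not a conversion tool: `to_strict` and the `KEPT`/`RENAMED` tables are hypothetical names, and the `algorithmName` default is just my "typeAny" guess. Note that both elements live in the same namespace, which is exactly the problem discussed below.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

TRANSITIONAL = f"""<w:documentProtection xmlns:w="{W}"
  w:edit="readOnly" w:enforcement="1"
  w:cryptProviderType="rsaFull" w:cryptAlgorithmClass="hash"
  w:cryptAlgorithmType="typeAny" w:cryptAlgorithmSid="4"
  w:cryptSpinCount="100000"
  w:hash="XbDzpXCrrK+zmGGBk++64G99GG4="
  w:salt="aX4wmQT0Kx6oAqUmX6RwGQ=="/>"""

# Attributes copied unchanged, and attributes renamed for strict.
# cryptProviderType, cryptAlgorithmClass, cryptAlgorithmType and
# cryptAlgorithmSid have no counterpart and are simply dropped.
KEPT = ("edit", "enforcement")
RENAMED = {"cryptSpinCount": "spinCount", "hash": "hashValue", "salt": "saltValue"}


def qn(local: str) -> str:
    return f"{{{W}}}{local}"


def to_strict(old: ET.Element, algorithm_name: str = "typeAny") -> ET.Element:
    """Build a strict-style documentProtection element from a transitional one.

    algorithm_name defaults to my guess, "typeAny"; the correct value is uncertain.
    """
    new = ET.Element(qn("documentProtection"))
    for local in KEPT:
        if old.get(qn(local)) is not None:
            new.set(qn(local), old.get(qn(local)))
    new.set(qn("algorithmName"), algorithm_name)
    for old_name, new_name in RENAMED.items():
        value = old.get(qn(old_name))
        if value is not None:
            new.set(qn(new_name), value)
    return new


ET.register_namespace("w", W)  # keep the w: prefix when serializing
print(ET.tostring(to_strict(ET.fromstring(TRANSITIONAL)), encoding="unicode"))
```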
 

The only thing I am a bit unsure about is the value of the attribute "algorithmName", but I guess it would be "typeAny". The result? Microsoft Office 2007 detects that the document has been protected, but it cannot remove the protection again, presumably due to the new attributes added to the schemas. I thought about creating new values using e.g. SHA-256 as specified in the spec, but the chance that Microsoft Office 2007 would pick them up from attributes it doesn't know is close to zero, so I didn't bother. Feel free to play around with it yourself.
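Office 2007 could not lift the protection from my strict variant, but since the protection is nothing more than an element in the settings part, any tool that edits the XML can drop it (which is also why there is no real security to it). A minimal sketch, assuming the settings part (typically word/settings.xml) has already been read out of the ZIP package; `strip_protection` is a hypothetical name, and the `w:zoom` sibling is only there to show that the rest of the part survives.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the familiar w: prefix on output


def strip_protection(settings_xml: str) -> str:
    """Return the settings part with every w:documentProtection child removed."""
    root = ET.fromstring(settings_xml)
    # findall returns a list, so removing while iterating over it is safe.
    for elem in root.findall(f"{{{W}}}documentProtection"):
        root.remove(elem)
    return ET.tostring(root, encoding="unicode")


SETTINGS = (
    f'<w:settings xmlns:w="{W}">'
    '<w:documentProtection w:edit="readOnly" w:enforcement="1"'
    ' w:algorithmName="typeAny" w:spinCount="100000"'
    ' w:hashValue="XbDzpXCrrK+zmGGBk++64G99GG4="'
    ' w:saltValue="aX4wmQT0Kx6oAqUmX6RwGQ=="/>'
    '<w:zoom w:percent="100"/>'
    "</w:settings>"
)
print(strip_protection(SETTINGS))
```

Writing the modified part back into the package is left out here.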

The Chase

We need a namespace change for the strict schemas, and I am thinking about ALL of the strict schemas, including OPC. If we don't do it this way, my estimate is that we will lose all kinds of data, and existing applications (as they currently behave) will not inform their users of it. Making existing applications break is a tough call, but I value data/information integrity more than sparing vendors from updating a bit of their code.
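To illustrate the "silent" part: a reader written against the ECMA-376 attribute names finds the strict element in a namespace it already knows, so nothing tells it that it is looking at a different dialect. It still sees read-only protection, but the hash and salt quietly come back empty. A sketch only; `legacy_read_protection` is a hypothetical stand-in for such a reader.

```python
import xml.etree.ElementTree as ET

# Same namespace URI for transitional and strict: that is the problem.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

STRICT_FRAGMENT = f"""<w:documentProtection xmlns:w="{W}"
  w:edit="readOnly" w:enforcement="1" w:algorithmName="typeAny"
  w:spinCount="100000"
  w:hashValue="XbDzpXCrrK+zmGGBk++64G99GG4="
  w:saltValue="aX4wmQT0Kx6oAqUmX6RwGQ=="/>"""


def legacy_read_protection(fragment: str) -> dict:
    """Hypothetical reader that only knows the ECMA-376 attribute names."""
    elem = ET.fromstring(fragment)
    return {
        "edit": elem.get(f"{{{W}}}edit"),
        "hash": elem.get(f"{{{W}}}hash"),  # strict renamed this to hashValue
        "salt": elem.get(f"{{{W}}}salt"),  # strict renamed this to saltValue
    }


print(legacy_read_protection(STRICT_FRAGMENT))
# → {'edit': 'readOnly', 'hash': None, 'salt': None}
```

No exception, no warning: the missing values simply look like absent optional attributes.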

And as for the conformance attribute? Well, the current suggestion is to enlarge the range of allowed values for this attribute, and I think that makes sense. I would have the value be one of

  • strict
  • transitional
  • ecma-376

or something similar. Then, when we make a new revision at some point in the future, we can add version numbers to them. Changing the namespaces will also make it possible to use MCE (Markup Compatibility and Extensibility) to take advantage of new features of IS29500 while maintaining compatibility with existing applications that support only ECMA-376 1st edition (more about this later).
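The point of a closed list of values is that a consumer can reject what it does not recognize instead of guessing. A minimal sketch, assuming the attribute value has already been read from the document; `Conformance` and `parse_conformance` are hypothetical names, and a versioned value such as "strict-1.2" is deliberately rejected until a future revision adds it.

```python
from enum import Enum


class Conformance(Enum):
    """The three conformance values proposed above (illustrative names)."""
    STRICT = "strict"
    TRANSITIONAL = "transitional"
    ECMA_376 = "ecma-376"


def parse_conformance(value: str) -> Conformance:
    """Map an attribute value to a Conformance member, rejecting anything else."""
    try:
        return Conformance(value)
    except ValueError:
        raise ValueError(f"unknown conformance value: {value!r}") from None


print(parse_conformance("transitional"))  # Conformance.TRANSITIONAL
try:
    parse_conformance("strict-1.2")
except ValueError as err:
    print(err)  # unknown conformance value: 'strict-1.2'
```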

And what should the schemas be named?

Well, they are currently like "http://schemas.openxmlformats.org/wordprocessingml/2006/main". So an obvious choice would be "http://schemas.openxmlformats.org/wordprocessingml/JLUNDSTOCHOLM/main"

:-)

... or maybe simply "http://schemas.openxmlformats.org/wordprocessingml/main" would be better? Of course, that makes it easy for developers to mix the two up, so maybe "http://schemas.openxmlformats.org/wordprocessingml/iso/main" would be even better?
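Whatever the name ends up being, a distinct namespace lets a consumer decide up front whether it understands a part at all. A minimal sketch: the 2006 URI is the real one, while the ".../iso/main" entry is just one of the candidates floated above and is purely hypothetical, as are the names `KNOWN_NAMESPACES` and `edition_of`.

```python
import xml.etree.ElementTree as ET

# The 2006 URI is real; the "iso" URI is a hypothetical candidate name.
KNOWN_NAMESPACES = {
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main": "ECMA-376 / transitional",
    "http://schemas.openxmlformats.org/wordprocessingml/iso/main": "IS29500 strict (hypothetical)",
}


def edition_of(xml_text: str) -> str:
    """Return the edition for the root element's namespace, or fail loudly."""
    root = ET.fromstring(xml_text)
    # ElementTree reports qualified tags as "{namespace}local".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace not in KNOWN_NAMESPACES:
        raise ValueError(f"unsupported namespace {namespace!r}; refusing to guess")
    return KNOWN_NAMESPACES[namespace]
```

An existing application's table would contain only the 2006 entry, so a strict document would be rejected openly rather than read with data silently missing.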

Comments

4/16/2009 2:16:11 AM #

Alex Brown

I thought we were going to have "/document-freedom-day/" as a substring of the new Namespace Name...

- Alex.

Alex Brown, United Kingdom

4/16/2009 4:20:25 AM #

orlando

"I thought we were going to have "/document-freedom-day/" ..."

some day, some day :-)

orlando, Argentina

4/22/2009 7:52:41 PM #

hAl

ecma-376 is meaningless as a conformance value, as Ecma has updated its specification and will keep on doing so.

With a year indication, like ecma-376:2006, it might be acceptable.

Better, then, would be something like:
<w:Versioning
  w:Version="Ecma-376:2006"
  w:Subversion="Ecma-376:2007-errata"
  w:Conformance="Strict" />
(The Subversion value does not exist; it is for example purposes only.)

All published specs should carry that information, so that it is always obvious which spec a document is based on.
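Year-based identifiers like the ones in hAl's example are also easy for software to take apart. A minimal sketch, assuming identifiers of the form spec name, colon, four-digit year, optional qualifier; `SpecVersion` and `parse_spec_version` are hypothetical names, and ISO-style names such as "ISO/IEC 29500:2008" are not handled by this pattern.

```python
import re
from typing import NamedTuple


class SpecVersion(NamedTuple):
    spec: str       # e.g. "Ecma-376"
    year: int       # e.g. 2006
    qualifier: str  # e.g. "errata"; empty when absent


# Hypothetical pattern for "Ecma-376:2006" or "Ecma-376:2007-errata".
_PATTERN = re.compile(
    r"^(?P<spec>[A-Za-z]+-\d+):(?P<year>\d{4})(?:-(?P<qual>[A-Za-z]+))?$"
)


def parse_spec_version(text: str) -> SpecVersion:
    """Split a spec:year identifier into its parts, or raise ValueError."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError(f"not a spec:year identifier: {text!r}")
    return SpecVersion(m["spec"], int(m["year"]), m["qual"] or "")


print(parse_spec_version("Ecma-376:2007-errata"))
```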


hAl

4/22/2009 8:17:26 PM #

jlundstocholm

hAl,

To simplify - could this do the trick?

strict-1.0
transitional-1.0
ecma376-1.0

?

Newer versions could then use

strict-1.2
strict-1.3
strict-2.0

(I am not sure ecma376-1.0 and transitional-1.0 should ever change in future revisions of IS29500.)
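Sequential values like these need to be parsed before they are compared, because plain string comparison puts "strict-1.10" before "strict-1.9". A minimal sketch under that assumption; `ConformanceVersion` and `parse_version` are hypothetical names, and ordering is only meaningful between versions of the same class.

```python
from typing import NamedTuple


class ConformanceVersion(NamedTuple):
    cls: str    # "strict", "transitional" or "ecma376"
    major: int
    minor: int


def parse_version(text: str) -> ConformanceVersion:
    """Split a value such as "strict-1.2" into class and numeric version."""
    cls, _, number = text.rpartition("-")
    major, _, minor = number.partition(".")
    if not cls or not major.isdecimal() or not minor.isdecimal():
        raise ValueError(f"not a class-major.minor value: {text!r}")
    return ConformanceVersion(cls, int(major), int(minor))


versions = [parse_version(v) for v in ("strict-2.0", "strict-1.10", "strict-1.3")]
print(sorted(versions))  # strict-1.3, strict-1.10, strict-2.0, in that order
```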

jlundstocholm, Denmark

4/23/2009 7:13:26 AM #

trackback

Trackback from Doug Mahugh

Miscellaneous Links, 04/22/2009

Doug Mahugh

4/25/2009 1:17:00 AM #

hAl

I would not put "1.0"-style version numbers inside the document.
Since ISO/IEC and Ecma both use year indications in their document specs, I would think that is an obvious choice for the document as well.

People are too hung up on sequential versioning.
As a developer/user, I mostly need an easy bridge between an actual document and the format version. So why not use the actual names the format versions have been given?

Your simplification actually requires a translation table between the internal version of the document and the format version of the standard.

hAl

5/12/2009 6:17:10 AM #

TomS

Hi -- I'm not sure I agree with your statement:
"OOo3 doesnt support document protection"

I may misunderstand what you mean by document protection, but ODF supports password-protected files, fields, sections and comments.
The specific protection is applied either during "Save As...", from the Edit > Changes > Protect Records menu, or from the Properties tab of specific graphics, tables, objects or forms.

If that is not the sort of protection you meant to test, could you clarify?

TomS

TomS, United States

5/12/2009 7:36:31 AM #

jlundstocholm

Hi Tom,

I was referring to OOXML features (un)supported in the mentioned applications. That is, OOo supports document protection in ODF but not in OOXML.

I hope this clears it up.

:-)

jlundstocholm, Denmark
