Norbert Bollow's Comments on Standards

2009-11-02: Do you really want these time and date interoperability problems with OOXML spreadsheets?

Response to Gareth Horton's blog posting

Dear Mr Horton,

as you have started your recent blog post by picking on my use of "so called" in reference to ISO/IEC JTC1 SC34 WG4, it is maybe appropriate for me to briefly comment on that: While I do not recall precisely why exactly I used those words, I do think that it is justifiable to express some doubt about whether this WG deserves to be referred to with the full honor of its ISO/IEC title, even while it appears to be acting in strong disregard of both the spirit and the letter of the decisions of the BRM and the JTC1 directives. (With the point about disregard for the JTC1 directives I mainly mean the abuse of the corrigendum process.)

Whether or not you agree with that particular decision, the BRM clearly wanted to remove what is known as the leap-year bug from the storage format for OOXML. Besides the obvious desire to finally get rid of that old problem from the storage format at least, another motivation for this was the desire to extend the allowed set of date values to allow dates before January 1, 1900.

Since the leap-year bug in the "serial value" date format for OOXML spreadsheets was clearly the most widely publicized criticism of OOXML, the removal of these serial values from the storage format certainly cannot have been overlooked by the national member bodies of ISO/IEC in their review of the decisions of the BRM. Any national member bodies of ISO/IEC disagreeing with the change would have had 30 days time to change their vote to "DISAPPROVE". However, the opposite happened: Votes were changed away from "DISAPPROVE" to "ABSTAIN" or even "APPROVE". This allows the conclusion that overall at least, there was agreement with the change, and in fact (since this was the most prominent change), this change probably played a significant part in moving the overall result of the ballot from "DISAPPROVE" to "APPROVE". In my opinion, the main reason for this was not whatever the technical significance of getting rid of the leap-year bug might be, but rather that Microsoft was perceived at the BRM to be willing to accept the transition to a more standards-based, less "it is what it is because it has been that way since Lotus 1-2-3" storage format, even accepting the significant inconvenience that this transition must mean for the company and their customers.

That said, I agree with you that corrupting spreadsheets by losing or misinterpreting date information is absolutely unacceptable. The root of this problem is not that the storage format for dates is changing, but that it is possible for ISO/IEC 29500:2008 spreadsheet documents to be misinterpreted as ECMA-376:2006 spreadsheet documents. Of course, implementations of ECMA-376:2006 which were written before February 2008 cannot be expected to understand the date format of ISO/IEC 29500:2008. The correct solution to that problem would have been to change the versioning information in the storage format (for example by changing the string "2006" in the namespace names to "2008", or by adding a version attribute) and (in addition) to use a different file name extension for the new format (for example ".ixlsx" instead of ".xlsx", with the "i" standing for ISO/IEC). Shortly after the text of ISO/IEC 29500:2008 became available, I pointed out that the versioning "2006" in the namespace names had not been updated. In response Microsoft claimed that this was an oversight on the part of the Document Editor, and that it would be corrected quickly. Unfortunately, the issue is still not corrected. The change of the namespace names for the Strict conformance class only does not count, because it does not help: The problem that needs solving right now is the possibility of misinterpreting ISO/IEC 29500:2008 Transitional documents as ECMA-376:2006 documents and vice versa. This problem is not in any way reduced by a schema namespace change for the Strict conformance class only.

So, what does WG4's proposed change regarding the date-and-time representation do? It is true that the change reintroduces the "serial value" storage format only for the so-called Transitional conformance class. That the Strict conformance class is unaffected sounds nice, but it is not particularly significant, because for now, as long as Microsoft has not switched to the Strict conformance class, nobody else will be switching to the Strict conformance class either. I think that the main motivation for the proposed change is that it allows software vendors to claim ISO/IEC 29500 conformance while writing spreadsheets that can be read by implementations of ECMA-376:2006. That certainly has marketing benefits for these software vendors. However, the change creates a new kind of problem: If the proposed changes get accepted (and I do not doubt that that will happen, since Microsoft is active in every P-member's mirror committee, and is strongly opposed to voting "DISAPPROVE" to the proposed changes), there will be a large class of spreadsheet documents which conform both to ISO/IEC 29500 and to ECMA-376:2006 but with different semantics: The same serial value will refer to January 1, 1900 when the document is interpreted according to ECMA-376:2006, while it will refer to December 31, 1899 when the document is interpreted according to ISO/IEC 29500. That is a much worse form of leap-year bug than there was ever before! There is a similar problem with timezones, and that problem affects not only the distant past, but all date-and-time values: In ECMA-376:2006, date-and-time serial values do not carry any timezone information, so they refer to whatever timezone the user had in mind when entering the value. By contrast, in ISO/IEC 29500, all date-and-time serial values refer to UTC. For this reason, the interpretation of date-and-time values will in many cases be off by one or more hours.

Some further remarks:

  1. I am not saying that "ISO 8601 dates must be used everywhere without exception", just that they should be used in the storage format, like the BRM decided and the national ISO/IEC member bodies approved. There is nothing wrong with using serial values in the intermediate, computational stage of answering questions like "what date will it be in 125 days?"
  2. I am well-aware of the distinction between the Strict and the Transitional conformance class. I just think that what primarily matters in practice, for the foreseeable future at least, is how the so-called Transitional conformance class is defined. In fact it remains to be seen whether the so-called Transitional conformance class isn't going to describe the reality of how OOXML is used in practice in perpetuity, as least with regard to spreadsheets. The key question is: what is the strategy for ever reaching the point when consumers of spreadsheet documents are no longer forced by market realities to accept the problematic serial value storage format? If as per the decision of the BRM, producers of spreadsheet documents conform to the ISO/IEC standard only if they avoid generating spreadsheet documents containing dates and/or times represented as serial values, the transition to the ISO 8601 date format can be expected to actually happen over time. No credible alternative plan for reaching that objective has been proposed.
  3. Why did WG4 refuse to document the change of the date storage format in the appendix about changes? (I had submitted a Defect Report pointing out the omission of documenting this and other non-schema changes in the appendix about changes. WG4 closed that Defect Report as "invalid" on the basis of a narrow, legalistic interpretation of the BRM resolution to create that appendix.)
  4. You say that misinterpreting a date or time-and-date value as a different (but not obviously wrong) value is worse than misinterpreting it as a zero serial value or refusing to accept the spreadsheet document at all. I agree. But why then do you not reach the same conclusion as I that the proposed changes make the situation worse rather than better?
  5. I agree that the date format should be a carefully profiled subset of the ISO 8601 format, and that the current text in ISO/IEC 29500 is too vague. But it is not as broad or as vague as you represent. For example, the recurring time intervals syntax of ISO 8601 which you mention is clearly disallowed by what the text of ISO/IEC 29500:2008 says.

Best regards,
Norbert Bollow


Note: I aim to create a leading blog on the need for standards which are truly open standards and technically of excellent quality. More blog postings will come soon, but first I'm looking into choosing blog software which meets my needs well. As soon as that is installed, there will be at least an RSS feed, and possible also an Atom feed. If you would like to be informed when that is available, please drop me an email with this request.

Homepage

Benefit- oriented consulting

Golden mean between open source and profit oriented software

Norbert Bollow's Comments on Standards

Contact Adaptux

Adaptux GmbH (Zurich, Switzerland) / Tel 044 972 20 59 / Fax 044 972 20 69 / Email info2009@adaptux.com