The UK Government Statistical Service recently published its good practice guidance for releasing statistics in spreadsheets. While this advice is clearly well-intentioned*, and parts of it are good, the overall effect is to encourage the release of data in formats that are difficult for a computer to process. This is a disappointing retrograde step.
The following spreadsheet is shown in Annex A:
If you work with data, I apologise for making you look at that. If you don’t, I should explain it is the sort of spreadsheet that, if you had to work with it, would cause you to wail in anguish and despair and demand to know what you had done to deserve such a fate. To enumerate the most serious problems briefly:
- Data and metadata are mixed together willy-nilly in the same sheet.
- The meaning of the columns – the ones that have a meaning, and aren’t just blank for layout purposes – is specified by four different rows, two of which use merged cells.
- Worst of all, background colour is used to convey the reliability of each estimate: information that is not provided in any other form. If the spreadsheet is converted to CSV – which is usually the first step when doing any serious work – this vital information is lost.
The use of minus signs to denote missing data, which might be irritating in an otherwise well-designed spreadsheet, is in this context so insignificant a problem as barely to register.
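To make the contrast concrete, here is a minimal sketch of what the same kind of information might look like in a form that survives conversion to CSV. Everything in it is made up for illustration: the column names, the reliability labels (`reliable`, `use_with_caution`) and the figures are assumptions, not taken from the GSS guidance or from Annex A. The point is only that reliability becomes an explicit column rather than a background colour, and missing values become empty cells rather than minus signs.

```python
import csv
import io

# Hypothetical tidy release: one header row, one observation per row.
# All names and values below are invented for illustration.
TIDY_CSV = """region,year,estimate,reliability
North,2012,41.5,reliable
North,2013,,
South,2012,38.0,use_with_caution
South,2013,39.2,reliable
"""


def load_estimates(text):
    """Parse the tidy CSV into a list of dicts with typed values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        raw = record["estimate"]
        rows.append({
            "region": record["region"],
            "year": int(record["year"]),
            # An empty cell means "no data"; no special symbol to decode.
            "estimate": float(raw) if raw else None,
            # Reliability is ordinary data, so it survives export to CSV.
            "reliability": record["reliability"] or None,
        })
    return rows


rows = load_estimates(TIDY_CSV)

# Questions that would need colour inspection in the original layout
# become one-line filters.
cautious = [r for r in rows if r["reliability"] == "use_with_caution"]
missing = [r for r in rows if r["estimate"] is None]
```

With this layout nothing depends on formatting: the header is a single row, there are no merged cells or spacer columns to skip, and a lone `-` never has to be special-cased before the numbers can be parsed.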
This spreadsheet is presented as a “good practice example”.
It isn’t as though the authors don’t know how to make a useful spreadsheet. Their example of a “spreadsheet focused on reusability”† is pretty much spot-on. But their advice on presentation ignores usability, where it doesn’t actively sabotage it. They recognise that there is a tension when they write:
“Providing an output which reconciles the requirement for clarity of presentation with reusable data can be hard.”
But that isn’t good enough. All data should be provided in a usable form, and ‘clarity of presentation’ shouldn’t be an excuse for poor usability.
We need an alternative: a real good practice guide to releasing statistics in spreadsheets. I don’t see why it should be more than a page or so. Our attempt is online at clean-sheet.org. What have we got wrong, or missed out? Leave your suggestions in the comments below or submit a pull request to the git repository.
* Getting behind the GSS spreadsheet guidance describes the well-intentioned process that led to this unfortunate outcome.
† The guidance uses the word “reusability” for the quality we’re calling usability.