Spreadsheet data is dead; not live. Don’t try to Recover(y) it!
Whenever you copy data into spreadsheet you instantly render it 'dead'. In other words, you severed its connection to the source and in essence created copy of the real data. This may be fine if you’re doing historical analysis and trust that the data is valid. But it isn't fine if the data always changes and the source/timeliness of the data is as important as the data itself.
Here’s a great example: Recovery data. As part of the Recovery effort billions of dollars are flowing to the government agencies such as FDA, HUD, HHS, etc. These agencies get the money and distribute it. But they are also the ones responsible for reporting on the data internally, ‘upternally’ (a word I made up to denote reporting up the agency chain) and externally (to the public). Multiply all the agencies times the number of distributions times the frequency of data changes and all of a sudden the government has a major data problem.
Welcome to the ‘Decade of the Data’.
We know that this effort is being coordinated at www.recovery.gov and President Obama has appointed Vivek Kundra as Federal CIO. I’m a big Vivek fan and applaud his past efforts on data visibility work in the DC government. His zest for innovation and data transparency (such as Apps for Democracy) is inspiring in a public servant.
So like any good citizen, I went searching for the Recovery data and discovered all the weekly Excel Recovery reports. It’s great to see the data making it’s way to the public but I’m troubled by the notion of having this data put out as Excel. What happens when the data changes? Do I wait for the next spreadsheet that updates the prior spreadsheet’s data? It’s easy to see that this data is dynamic and ever-changing. Manual reconciliation just won’t do. Oh, since Excel is a manual process, mistakes in the data can easily make their way into the reports.
The problem is seductively simple: Spreadsheets are so ubiquitous and so easy to use we tend to use it for just as many wrong things as good things. Specifically, Excel is not the right tool to manage real-time data. Worse still, spreadsheet data is not governed, not secure and not easy to aggregate with lots of other spreadsheets. Excel fails to meet 3 of the ‘5Cs’ of enterprise mashups.
I think it is easy to define the exact characteristics we need for true Recovery data visibility across the entire government. And here’s why: when it comes to live data across multiple sources, the value in this data is not the individual set of numbers, but rather the ability to look at the data aggregated, sliced and diced, geographically transposed (i.e. on a map), temporally depicted (can you say timeline?), and any other way we need to see it right now. This means speed in creating and visualizing the data is just as important as access to the data.
And if you think spreadsheets, data warehouses or portals can do this, think again. My fellow blogger Chris Warner often talks about his 'Dead-and-Deadly Data Matrix’ that I think nicely lays out how other enterprise technologies fit into the ‘Live and Secure’ information management spectrum.
There’s really only one way to do this and that’s using enterprise mashup technology. Enterprise mashup platforms (EMPs) are built exactly for this purpose: to provide the fastest way to interact and visualize data from multiple disparate sources in real-time and safely. They do it in hours and days, not weeks, months or years. And EMPs plug directly into the authoritative data sources and provide the necessary security and governance to make the data available and visible internally and externally.And true EMPs let you publish the mashups as REST/XML services as well as mashup widgets (we call them 'Mashlets') that can land in your 1.0 portal, 2.0 portals (like Netvibes and iGoogle), collaboration platforms (like Sharepoint and Jive), mobile devices (like the iPhone) and even back into Excel as a live data source. And that’s exactly what users want. They want to have the data and visualization where they work, not where we want them to work.
The ‘Decade of the Data’ is definitely here. Let’s just make sure we do it the right way.



1 comments:
It's fine to publish data to spreadsheets as a mashup-driven snapshot, but it's better when you can make the mashup persistent. That means the next time you "mashup," you only get the data that's changed, your changes done "offline" are not overwritten, and the change is shown in Excel at the cell level. Check out www.boardwalktech.com
Post a Comment