Policy by the Numbers

Data for sound policymaking from Google and friends

Mapping the Ecology of Open Data Development

Thursday, October 25, 2012

Viktor Mayer-Schönberger is Professor of Internet Governance and Regulation at the Oxford Internet Institute/Oxford University. He is also a faculty affiliate of the Belfer Center for Science and International Affairs at Harvard University.

Many of us see open data as a potent tool to enhance and improve citizen empowerment and participation. The idea is not just that government data is brought to citizens in a more meaningful way. Many also hope for a rich ecosystem of open data sources and developers yielding amazing apps that provide society with novel insights.

Based on Zarino Zappia’s initial work and data collection, he and I have taken a sharp look at this emerging network of open data sources and app developers. Data on the developers of 175 open data apps were collected, including which data sets they had used. We then mapped the flow of information from initial data sources through the applications that developers had created to end-users.
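For readers who want to try this kind of mapping on their own data, the idea can be sketched in a few lines of Python. This is only an illustrative sketch and not the method or data from our paper: the app and data set names in `APP_SOURCES` are hypothetical, and the three helper functions are names invented for this example. It counts how often each data set is used, what share of apps combine more than one source, and which apps end up in the same sub-community because they are linked by shared data sets.

```python
from collections import defaultdict

# Hypothetical toy records: app name -> data sets it uses.
# These names are invented for illustration; they are not the study's data.
APP_SOURCES = {
    "transit_map": ["city_gis", "bus_timetables"],
    "park_finder": ["city_gis"],
    "budget_viz": ["treasury_spending"],
    "aid_tracker": ["treasury_spending", "aid_flows"],
    "school_scores": ["exam_results"],
}


def dataset_usage(app_sources):
    """Count how many apps draw on each data set."""
    usage = defaultdict(int)
    for sources in app_sources.values():
        for source in set(sources):
            usage[source] += 1
    return dict(usage)


def share_combining(app_sources):
    """Fraction of apps that use more than one data set."""
    combining = sum(1 for s in app_sources.values() if len(set(s)) > 1)
    return combining / len(app_sources)


def sub_communities(app_sources):
    """Group apps linked, directly or indirectly, by shared data sets."""
    # Undirected bipartite graph: app nodes on one side, data set nodes on the other.
    neighbours = defaultdict(set)
    for app, sources in app_sources.items():
        for source in sources:
            neighbours[("app", app)].add(("data", source))
            neighbours[("data", source)].add(("app", app))

    seen, groups = set(), []
    for app in app_sources:
        start = ("app", app)
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this app.
        stack, members = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            if node[0] == "app":
                members.add(node[1])
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(members)
    return groups


if __name__ == "__main__":
    print(dataset_usage(APP_SOURCES))
    print(share_combining(APP_SOURCES))
    print(sub_communities(APP_SOURCES))
```

In a sketch like this, a data set that sits in the middle of a group (here the made-up `city_gis`) is what holds otherwise unrelated apps together, and many small groups would indicate a fragmented network.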

Given the high hopes surrounding open data development, the results were somewhat sobering. The open data community that emerged from the data set we analyzed was relatively fragmented and disparate, with (as we noted in our paper), "far less participation and combination of data sources than originally hoped. Instead of a wide open playing field devoid of hierarchies we find developers and datasets alike become crucial linking points—crucial gateways for the flow of information—between sub-communities of open data development based on specific tasks or contexts."

We also found that most open data developers focused on a relatively narrow context for their applications. Thus insights they might have gained in one context (say, local mapping apps) did not transfer easily to different contexts, such as apps on economic data or development.

Most open data apps were created by individuals (71 percent), and as far as we can tell only half of these individuals belong to an easily identifiable, working open data developer community. When and where these communities did form, data sources often provided a natural conduit. Thus, perhaps unsurprisingly, apps that combine different data sources were also comparatively rare.

"The open data developer network," taken from Mayer-Schönberger/Zappia, Participation and Power: Intermediaries of Open Data

Moreover, the network of open data developers seems to replicate the tendencies towards a recreation of hierarchies and limitations on participation that research has shown to saddle the blogosphere, e-rulemaking, or (more recently) Wikipedia.

While these results may disappoint, a few words of caution are in order. First, despite our efforts, our data collection may not have captured the ecosystem comprehensively enough. Second, we may have looked at open data developers too soon (data collection took place in 2011). It is still early days, and perhaps as more data sources are added and apps gain in popularity, not only may the number of developers grow, but they may also become better connected. However, it is important to note that our initial findings were confirmed in in-depth interviews with a number of renowned open data developers.

Perhaps, though, these results also capture an important opportunity: if open data developers are not sufficiently connected with each other and the broader community, it may be because there is not yet an easy way to do so, independent of the large platforms providing data sources. Remedying that may help the open data ecosystem more than the release of further data sources or another application contest.


Jason Hare said...

Data developers and datasets should be set apart as having converging but not necessarily the same agendas. The producer should treat PSI datasets as infrastructure and something like publishing open data as a necessary but not sufficient condition for innovation. Much like the WWW 20 years ago, we will go through episodic successes and failures before we find a working model.