Rather than just open access to the (published) results of scientific research, ‘open data’ refers to the sharing of whole datasets and raw data. Published papers constitute a small, and unrepresentative, slice of the actual research going on, research that is usually funded to a large extent by the public purse. Only around half of conference presentations end up as published papers, and only half of what’s publicly funded ever results in any papers at all. Open data would mean leaving the raw datasets available for anyone to scrutinise, pick over, and work with. Simply knowing that their datasets are out there for others to check might make researchers more careful and responsible with their work. There is also enormous potential in all the data that, whether for reasons of time, lack of interest, funding, or worry about ‘negative’ findings, is put together but never used in anything published or public. Once the most interesting or lucrative finding has been pulled from a dataset, which often requires a huge outlay of resources to acquire or build, the rest gets laid to rest on someone’s computer, or simply thrown out.
There are of course concerns – what if someone manipulates the data to ‘find’ something that isn’t there or ‘prove’ something that is false or dubious? What if someone out there picks your work apart and does it better? What do we do about privacy or confidentiality concerns? And what’s the benefit to researchers of buying in to ‘open data’? Certainly, technology/infrastructure limitations are no longer a valid excuse not to share data.
This issue of incentives for researchers may be solved by working with the current citation system and a recent invention, the ‘data paper’. Instead of building academic credit solely through citations (by others) of the papers one has published, why not also allow for the citation of databases? Thus the data is shared – housed in a public repository – and the scientists who built the dataset receive credit when their data is used or cited. Some ‘data journals’ are now inviting scientists to write ‘data papers’ to accompany their datasets, with information about the data and how to use it. This might be a way of facilitating good use of shared data and of making sharing worthwhile for the researchers who put in the legwork of compiling the raw data. It may not come to pass, however, until the bodies funding the research tug on the purse strings and begin to prompt researchers to share their raw data.
- The Conversation: Funding bodies will have to force scientists to share data