Added by Maarja Toots 2019/04/30
You have scraped some interesting data, downloaded it, perhaps modified it a bit or combined it with other data, and what you now have is a really cool dataset that you would love to share with others. So here’s the question: can an organization or an individual share a dataset on opendata.riik.ee that is based on processing or combining someone else’s data?
The issue is real: not all interesting datasets are (yet) available on the portal, and some are not even on the data holder’s website. Can an eager user help a data holder publish their data? On what conditions? As an example, the need for more clarity recently came up in relation to election data, which the data holder has not released under an open license but in which there seems to be considerable public interest (see the GitHub conversation).
Since no rules or requirements yet exist to regulate the publication of secondary data, the participants in the legal issues workshop at the Open Data Forum of 18 April took the first steps in formulating some as a basis for further discussion. After a heated debate, the following proposals were made:
Question: Can secondary data be published on the portal?
The short answer: Yes.
The correct answer: Yes, but under certain conditions.
What conditions should be met?
What metadata should be mandatory?
However, this may not be enough. In the discussion, participants raised a familiar problem: you have built a brilliant service on someone else’s data, you wake up one morning, open your computer and… it’s all broken! The cause may be a change in the data collection method, the update frequency or the composition of the data, or perhaps a change in a process, regulation or law due to which the data is no longer available in the same format. Such situations may be more common if the data is collected and published not because of a long-term legal obligation but on the data provider’s own initiative. In other words, it is crucial for the provider of an open-data-driven service to know whether and for how long the data they use will continue to be published. It would therefore be extremely helpful if the data holder gave advance notice of any changes in the availability of the data. To this end, the metadata could also include information on:
All of that may be feasible if the data is published by the data owner. However, where should this information come from for data that is scraped and uploaded by another party? What if the holder of the original data does not wish the data to be published on the portal? Should publication be allowed only for data that the original data holder has released under a clear license? What if the license is not specified?
These and many other questions remain open, so this, dear readers, is where we invite you to join the discussion on GitHub! Based on your input, guidelines will be formulated to outline the data publisher's obligations and to set out clearer responsibilities for data holders, users and any intermediaries.
The Open Data Portal's content is created as part of the EU structural funds' programme 'Raising Public Awareness about the Information Society' financed through the EU Regional Development Fund. The project is implemented by Open Knowledge Estonia.