Guidelines for publishing secondary datasets

Added by Maarja Toots 2019/08/27

In a previous blog post, we asked whether it is OK to publish datasets on the portal that were created by processing someone else's data. We are happy to announce that, in addition to the answer 'yes', we now also have guidelines and tips for publishers of secondary datasets.

The new guidelines for publishing secondary datasets define secondary data as datasets created by automated processing, combining or scraping/polling of publicly accessible data. Where data holders are unable or not interested in publishing their data on the portal, these guidelines provide a basic framework for those who want to lend them a hand or add value to the original data by processing it or combining it with other datasets.

Publishers of secondary datasets have three main obligations: to make sure the data is processed in accordance with the law (including personal data protection requirements), to allow public access to the source code of the program used for processing the data, and to provide accurate and detailed metadata about the dataset. This allows users to evaluate whether and how to use the data. Note also that the guidelines are not set in stone and can be adapted based on users' feedback. Proposals for amendments and additions to the guidelines can be made on GitHub.

The guidelines can be accessed HERE. Now let's publish some data!

The Open Data Portal's content is created as part of the EU structural funds' programme 'Raising Public Awareness about the Information Society' financed through the EU Regional Development Fund. The project is implemented by Open Knowledge Estonia.

European Union Regional Development Fund
