Publish research data
How can research data be published?
Research data can be made available in subject-specific, interdisciplinary or institutional repositories. In some cases, they are also posted on an institute's website or a personal homepage.
Publication in a subject-specific repository is usually the best option: the storage format is suited to the data from the outset, and the data are presented in an appropriate subject context. If no such repository exists for a research field, an interdisciplinary or institutional repository can be used instead.
Finding suitable research data repositories
re3data provides an overview of a large number of research data repositories. Entries can be filtered by subject and by type of data, among other criteria.
More collections of research repositories:
- Overview of data repositories compiled by Nature
- List of repositories on a wiki about Open Access
- Search portal for suitable repositories (from fairsharing.org)
To make research data as findable as possible, they should be described with additional information, e.g. about the authors (see metadata).
If possible, the research data should be stored in open file formats (see file formats) and prepared in such a way that other researchers can understand and use them. Where necessary, additional documentation can be supplied as a separate file.
To enable reliable citation, repositories generally assign each published data set a unique persistent identifier automatically, usually a DOI.
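As an illustration of how a DOI is used for citation, the following minimal sketch turns a DOI into a resolvable link via the public resolver at https://doi.org/. The function name `doi_to_url`, the loose validation pattern and the example DOI are assumptions made for illustration; the pattern is a plausibility check, not a complete definition of valid DOIs.

```python
import re
from urllib.parse import quote

# Loose plausibility check (an assumption, not the full DOI syntax):
# "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver link for a DOI string."""
    cleaned = doi.strip()
    # Accept common ways of writing a DOI and strip them down to the bare DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(cleaned, safe="/")


# Placeholder DOI used purely as an example.
print(doi_to_url("doi:10.5555/12345678"))
```

A link built this way can be placed in the reference list of an article, so that readers reach the data set even if the repository later changes its own web addresses.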
Research data for scientific articles
Some publishers allow, or even require, publication of the research data underlying an article. Procedures differ considerably between publishers and must be checked on a case-by-case basis.
Selection of existing interdisciplinary repositories:
- Zenodo: repository supported by EU funds; so far free of charge for research data with low data volume; hosted at CERN, where the Large Hadron Collider data are also stored (in total about 100 PB storage space)
- Dryad: Non-profit organization that charges for publishing research data
- figshare: free publication of research data; also offers private (non-public) storage; used by various universities
- Harvard Dataverse: repository where researchers of all disciplines can publish
- Open Science Framework: repository for storing research data (including lab books, etc.), sharing it with colleagues, or making it public
- Mendeley Data: currently free publication; fees may be charged in the future, depending on file size
What should be considered when publishing?
Over time, software becomes obsolete and is replaced. Thus, file formats in which research data was originally generated also become outdated. To ensure that the data can be re-used in the future, it is recommended to use open data formats so that the data can be more easily converted to current data formats. If this is not possible, it is recommended to use formats that have become standard in the respective scientific community.
Suggestions for data formats that can be archived well:
- Textual research data: XML, TXT, HTML, PDF/A
- Tabular research data: CSV
- Databases: XML, CSV
- (large) Datasets: CSV, HDF5, CDF
- Images: TIFF, PNG, JPEG
- Audio: FLAC, WAV, MP3
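To illustrate the recommendation for tabular data, the following minimal sketch writes records to CSV using only the Python standard library. The function name `rows_to_csv` and the column names are hypothetical; real data sets will need their own columns and may need additional handling of units or missing values.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()    # the first line names the columns
    writer.writerows(rows)  # one line per record
    return buffer.getvalue()


# Hypothetical measurements, used only as an example.
measurements = [
    {"sample_id": "S1", "temperature_c": 21.5},
    {"sample_id": "S2", "temperature_c": 22.0},
]

csv_text = rows_to_csv(measurements, ["sample_id", "temperature_c"])
print(csv_text)
```

When saving the result to a file, opening it with an explicit encoding, e.g. `open(path, "w", encoding="utf-8", newline="")`, avoids platform-dependent defaults and makes the file easier to read on other systems.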
The information accompanying the research data, known as metadata, records important details about the data set.
Keywords, subject headings and further information on the thematic classification of the research data help to make the data set as easy to find as possible.
In addition, references to further research data, to related publications or to research projects should be stored, so that, for example, the related publications can be linked directly.
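The kind of information described above can be sketched as a small machine-readable record. The field names below (`title`, `creators`, `keywords`, `related_identifiers`) and the function `build_metadata` are assumptions chosen for illustration; each repository defines its own metadata schema, which should be consulted before depositing.

```python
import json


def build_metadata(title, creators, keywords, related_identifiers=()):
    """Assemble a simple metadata record and check that key fields are filled."""
    record = {
        "title": title.strip(),
        "creators": list(creators),
        "keywords": list(keywords),
        # Links to related publications, data sets or projects.
        "related_identifiers": list(related_identifiers),
    }
    # A data set without title, creators or keywords is hard to find and cite.
    missing = [name for name in ("title", "creators", "keywords") if not record[name]]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return record


# Hypothetical example values.
record = build_metadata(
    title="Soil temperature measurements 2020",
    creators=["Doe, Jane"],
    keywords=["soil", "temperature"],
    related_identifiers=[{"relation": "isSupplementTo", "identifier": "10.5555/12345678"}],
)

print(json.dumps(record, indent=2, ensure_ascii=False))
```

Storing such a record as a JSON file next to the data keeps the description readable for both people and software.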
Just like books, journal articles, etc., research data are published under licenses that set out the terms of use. Typical options are Creative Commons, Open Data Commons, the Open or Non-Commercial Government Licence, or release into the public domain (further information on an external website). To make published research data easy to re-use, it is recommended to choose a license that is as permissive as possible.
Publishing research data requires that you hold the necessary rights and that the rights of third parties are not violated, e.g. in the case of personal data or non-disclosure agreements. For example, if graphics or images have already been used in publications, the rights to them may be held by the publisher, depending on the contract.