Enhancing the visibility of (shared) data
Research takes significant time and effort, and so does obtaining the data, documenting it, archiving it and eventually sharing it in a data repository. Making the data visible is therefore essential.
There are several complementary ways to improve the visibility of data deposited in a data repository:
- Make data visible in the publication
- Create links between different platforms
- Mention the data in CVs and digital profiles
- Publish a data paper
Adding the dataset reference directly to the publication, for example in the bibliography or in a dedicated section ("data statement"), is good practice.
a. Add the dataset reference to the bibliography
The dataset reference should ideally include the authors of the dataset, the year of publication, the title of the dataset, the data repository where the data is deposited, and other essential information such as the URL or DOI needed to access it. The general pattern, followed by an example, is:
Creator (Publication year). Title [Data set]. Publisher or hosting platform. Unique identifier. Version (if appropriate). Date accessed (if appropriate).
Pouliot-Laforte, A. (2021). Impairments and sagittal kinematics of the lower limbs of children with cerebral palsy [Data set]. Université de Genève, Yareta. https://doi.org/10.26037/yareta:ghvxtm3d2naenafsu6ungo2sny
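The citation pattern above is mechanical enough to assemble programmatically, for example when preparing references for several deposits. The following is a minimal Python sketch, not a tool provided by any repository: the function name `format_dataset_citation` and its parameters are hypothetical, and the "Version" and "Accessed" labels used for the optional elements are an assumed wording.

```python
def format_dataset_citation(creators, year, title, publisher, identifier,
                            version=None, accessed=None):
    """Assemble a dataset citation following the pattern:
    Creator (Year). Title [Data set]. Publisher. Identifier. Version. Date accessed.
    """
    segments = [
        f"{creators} ({year})",   # e.g. "Pouliot-Laforte, A. (2021)"
        f"{title} [Data set]",    # the resource type is stated after the title
        publisher,                # publisher or hosting platform
        identifier,               # DOI or other persistent identifier
    ]
    # Optional elements are appended only when supplied (labels are assumptions).
    if version:
        segments.append(f"Version {version}")
    if accessed:
        segments.append(f"Accessed {accessed}")
    # No trailing period, so that a DOI at the end stays a clean, clickable URL.
    return ". ".join(segments)


citation = format_dataset_citation(
    creators="Pouliot-Laforte, A.",
    year=2021,
    title=("Impairments and sagittal kinematics of the lower limbs "
           "of children with cerebral palsy"),
    publisher="Université de Genève, Yareta",
    identifier="https://doi.org/10.26037/yareta:ghvxtm3d2naenafsu6ungo2sny",
)
print(citation)
```

Called with the values of the example above, the function reproduces that reference; journals may impose their own citation style, in which case the order and punctuation would need adjusting.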
b. Add a data availability statement
A growing number of journals offer a dedicated section called a data statement or data availability statement. It highlights the dataset citation and can supplement it with additional information, such as availability on request, availability on signing a user agreement, restricted availability with the precise reasons for it, and so on. In concrete terms, the structure is as follows:
"The [data type, e.g. sequencing / interview / …] data that support the findings of this study / generated and/or analysed during the current study are openly available / available upon request / available upon signing of a data use agreement in the [NAME] repository at [URL / DOI / Accession number / Other persistent unique identifier]. (Possibly followed by the reference pointing to the bibliography where you will have indicated the complete citation: authors of the dataset, title, etc. …)"
- "The data that support the findings of this study are openly available in the Yareta repository at https://doi.org/10.26037/yareta:yqae72143d."
- "Single-cell and bulk targeted sequencing data are accessible through the EGA database (https://www.ega-archive.org) under accession numbers EGAS00001006784 and EGAS00001006901, respectively. Other data are available upon reasonable request to the principal investigator."
A DOI can be reserved in advance and included in the data statement even if the deposit has not yet been finalised. Yareta and Zenodo, for example, offer this option, which makes it possible to submit the manuscript to the journal first and then take the time to finalise the upload of the dataset.
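The statement structure described in this section can also be filled in from a few variables. The sketch below is illustrative only: the function name `data_availability_statement`, its parameters and the `ALLOWED_ACCESS` tuple are hypothetical, it covers only the "data that support the findings of this study" variant, and the three access phrases are taken from the template.

```python
# Access phrases taken from the statement template; anything else is rejected.
ALLOWED_ACCESS = (
    "openly available",
    "available upon request",
    "available upon signing of a data use agreement",
)


def data_availability_statement(repository, identifier,
                                access="openly available", data_type=None):
    """Build a one-sentence data availability statement."""
    if access not in ALLOWED_ACCESS:
        raise ValueError(f"access must be one of {ALLOWED_ACCESS}")
    # The data type (e.g. "sequencing", "interview") is optional in the template.
    subject = f"The {data_type} data" if data_type else "The data"
    return (f"{subject} that support the findings of this study are {access} "
            f"in the {repository} repository at {identifier}.")


statement = data_availability_statement(
    repository="Yareta",
    identifier="https://doi.org/10.26037/yareta:yqae72143d",
)
print(statement)
```

With the default arguments the result matches the first example statement given in this section. More complex cases, such as several accession numbers or mixed access conditions, are better written by hand.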
In addition to the above, or if the reference to the data was not included in the article's bibliography or data statement, the link between the data and the publication can also be created or reinforced after the publication has been officially released. The idea is to let people accessing either the data or the publication easily locate the linked resource.
a. Add a link to the publication in the data repository.
In Yareta, on the deposit detail page, the DOI of the publication can be entered in the dedicated "DOI referenced by" field.
b. Add the reference to the data wherever the publication is recorded
For publications deposited in the Archive ouverte UNIGE, the location of the associated data can be indicated in step 3 when uploading or editing a publication, using the dedicated field "URL or DOI of the data set".
Finally, even if no publication results from the dataset, it is still possible to reference and highlight it:
- in one's ORCID profile, by adding a new item in the "Works" section and manually selecting the work type "Other: Data set", or by adding it via its DOI;
- in one's curriculum vitae, under the "open science" or "open data" section if there is one, as in the (new) CV template used at the Faculty of Medicine;
- in one's final scientific report, as requested by the SNSF at the end of the project.
A data paper is a publication describing a dataset and published in a peer-reviewed journal. This type of publication can also be called a data descriptor, dataset paper, database paper, etc.
Focused on the description of the dataset and its technical information, the data paper does not contain research hypotheses, a methodology to confirm or refute them, or conclusions drawn from the data analysis. The potential for reuse, however, is usually emphasised.
The figure below shows the structure of a data paper, as well as its links to the data it presents and makes visible.
Figure 1: Structure of a data paper. Source: Windpouire Esther Dzale Yeumo, Dominique L'Hostis. Open Science. Gestion et partage des données de la recherche. Journée de Formation - URFIST Paris (22/01/2015); Mise à jour - Agropolis Montpellier (01/04/15), 2015, slide 108 (updated 01/04/15). ⟨hal-02800107⟩
Some journals, whether general or discipline-specific, are dedicated solely to the publication of data papers, such as GigaScience or Scientific Data. Other journals accept this type of contribution alongside other article types.
The University of Edinburgh has compiled a list of data journals. GBIF also provides a list of journals publishing data papers related to biodiversity.