Anonymisation is the processing of research data containing personal or sensitive information so that the data can no longer be attributed, directly or indirectly, to the persons they concern. Anonymisation is an irreversible process.
Archiving research data means putting it on a secure platform for medium to long-term preservation. The retention period varies from one discipline to another, but the SNSF recommends archiving research data for a minimum period of 10 years.
It is important to distinguish between archiving and storage of research data. Indeed, these two actions do not pursue the same objectives and therefore there are dedicated and specific solutions and platforms for each of them.
Citation of data means providing a reference to the data, in the same way as for publications such as articles, reports and conference papers.
"Copyright protects the authors of literary and artistic works. It is the way in which an idea is expressed that is protected, not the idea or concept itself. Copyright protection therefore applies to the form of the work and not its content. For example, Einstein’s essay “The Foundation of the General Theory of Relativity" in the “Annals of Physics” is protected by copyright. The theory of relativity itself, however, may be freely used, just not with the same words as in Einstein’s original text."
Creative Commons or CC licences are a type of licence applicable to works specifying the conditions for their re-use and distribution. Creative Commons offers six licences to meet specific needs.
These licences can be applied to almost any type of work, for example: music, databases, photographs or educational resources. The only categories of works for which CC does not recommend its licences are software and hardware.
Source: Creative Commons
A data journal is a scientific journal dedicated to the publication of data papers.
The research data life cycle describes the different stages data go through, from their creation to their archiving. It can be represented in several ways; the UK Data Service, for example, divides the life cycle into 6 stages:
- Planning data creation
- Data collection
- Data preparation and analysis
- Publishing and sharing data
- Preparing data for preservation
- Reuse of data.
For each of these stages, actions and processes can be put in place to ensure the quality, integrity and security of research data.
A Data Management Plan (DMP) is a formal document that outlines how research data will be handled during and after a research project.
Most funding agencies now require the submission of a DMP when applying for a grant. The Swiss National Science Foundation (SNSF) introduced this requirement in autumn 2017.
A data paper is a scientific article whose primary purpose is to describe in detail one or more datasets produced during a research project, typically with the help of metadata and without going into the analysis of the datasets themselves.
Data papers can be published in "traditional journals" or in dedicated journals called data journals, and are in principle peer-reviewed.
A data repository is a space dedicated to the uploading of research data for the purpose of archiving and/or making it available for transparent or peer-to-peer re-use.
Repositories can be open or closed and can be classified into 3 main types:
- generic or multidisciplinary: open to all types of data
- disciplinary: open to data from a specific discipline or field of study
- institutional: managed by an institution and open to its members only
The San Francisco Declaration on Research Assessment (DORA) is a text published in 2013 by the American Society for Cell Biology and a group of scientific journal editors. It calls for questioning and improving the way research, scientific journals and researchers are evaluated, in particular the use of bibliometric indicators such as the Journal Impact Factor or the h-index.
The DLCM (Data Life-Cycle Management) project, a national project on the life-cycle management of research data, was launched in 2015 by 8 Swiss partner universities and funded by swissuniversities. Its aim is to provide researchers with resources to support them in the various aspects of research data management and archiving.
In the context of knowledge dissemination, an embargo is a period of time during which access to a research product is restricted and permitted only under strict conditions. Embargoes can, for example, be requested by publishers in order to reserve exclusive rights to disseminate the publications concerned and thus give exclusive access to people who have subscribed to their services.
The FAIR principles aim to enforce data sharing standards to ensure that humans and computer systems can easily find, interpret and use data.
The acronym FAIR stands for:
- FINDABLE: Data and supplementary documents have sufficiently rich metadata and a unique and persistent identifier.
- ACCESSIBLE: Metadata and data are understandable to humans and machines. The data are deposited in a reliable repository.
- INTEROPERABLE: Metadata use a formal language that is accessible, shared and applicable to all forms of knowledge representation.
- REUSABLE: The data and collections are clearly licensed for use and provide accurate information about their provenance.
A file format is the way software encodes the information contained in a file. For each type of file (images, text, audio, spreadsheet, etc.), a number of specific formats are available.
In most cases, the format of a file can be identified by the suffix, preceded by a dot, at the end of the file name.
Example: "content.txt" --> .txt indicates a text file.
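As a minimal sketch, the suffix can be read programmatically with Python's standard pathlib module. The function name and the file names below are invented for illustration. Note that the suffix is only a hint: it does not guarantee how the file is actually encoded.

```python
from pathlib import Path


def file_suffix(file_name: str) -> str:
    """Return the lower-cased suffix of a file name, including the dot."""
    return Path(file_name).suffix.lower()


print(file_suffix("content.txt"))      # .txt
print(file_suffix("survey_2021.CSV"))  # .csv
```

A file name without a suffix yields an empty string, in which case the format has to be determined by other means.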
There are two types of file formats:
- Open or free formats, which can be used by anyone because the file specifications are publicly available.
- Proprietary or closed formats, which generally only work with the vendor's software. When that software is no longer supported, files in these formats may become unreadable.
The file format greatly affects the accessibility and potential reusability of research data. For this reason, the choice of format must be made in an informed and carefully considered manner.
"The SNSF values research data sharing as a fundamental contribution to the impact, transparency and reproducibility of scientific research. In addition to being carefully curated and stored, the SNSF believes research data should be shared as openly as possible.
The SNSF therefore expects all its funded researchers
- to store the research data they have worked on and produced during the course of their research work,
- to share these data with other researchers, unless they are bound by legal, ethical, copyright, confidentiality or other clauses, and
- to deposit their data and metadata onto existing public repositories in formats that anyone can find, access and reuse without restriction."
Free formats or open formats are file formats with transparent encoding, whose technical specifications are public, accessible and usable without conditions. These formats are interoperable, i.e. files can be opened and modified by any software designed to process that type of file (text, audio, images, etc.).
Open formats should be favoured as much as possible for data preservation and sharing, since they ensure the readability and reusability of these files over time while keeping them independent of a single technology.
LIMS is an acronym that stands for Laboratory Information Management System. LIMS works by being "connected directly to scientific measuring instruments (spectrometer, MRI, scanner or electron microscope) and by capturing data at source via an interface and ensuring their management and traceability".
Nowadays, LIMS are often combined with ELNs (Electronic Laboratory Notebooks) in a single application, although historically the two were developed independently. These combined systems support the entire laboratory workflow within a single tool.
Metadata (literally data about data) is information that describes the basic characteristics of a data item, regardless of its medium (physical or digital), such as:
- Its author(s)
- Its contents
- Its creation date
- The place of capture/production
- The reason the data was generated
- How the data was created
These different specifications are called metadata fields.
The metadata therefore places the data in context, making it easier to understand, process and potentially reuse in the future.
To decide what information to include in metadata, researchers can rely on metadata standards, i.e. defined sets of fields for describing datasets, such as Dublin Core or DataCite.
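To illustrate what relying on a standard can look like in practice, the sketch below checks a dataset description against the 15 elements of the Dublin Core Metadata Element Set. The function name, the record and all its values are invented for the example; a real repository may expect a different or richer schema.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_fields(record: dict) -> set:
    """Return the field names of a record that are not Dublin Core elements."""
    return set(record) - DUBLIN_CORE_ELEMENTS


# Hypothetical dataset description (all values are invented).
record = {
    "title": "Water temperature measurements, Lake Example",
    "creator": "Doe, Jane",
    "date": "2021-06-30",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}

print(unknown_fields(record))  # set()
```

An empty result means that every field name used in the record belongs to the standard; it says nothing about whether the values themselves are complete or accurate.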
In the context of file and data organisation, naming conventions are standardised and systematic ways of naming the files produced during research, using short and descriptive names to make them easy to identify.
A naming convention is particularly important in the case of data managed within a team or laboratory.
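One possible convention, shown here purely as an assumption, is date_project_description_version.extension. The sketch below builds such names so that every team member produces them in the same way; the function name, the field order and the separators are illustrative choices, not a prescribed standard.

```python
from datetime import date


def build_file_name(project: str, description: str, created: date,
                    version: int, extension: str) -> str:
    """Build a file name following the hypothetical convention
    YYYYMMDD_project_description_vNN.ext."""
    parts = [
        created.strftime("%Y%m%d"),              # date first, so names sort chronologically
        project.lower().replace(" ", "-"),       # no spaces inside a field
        description.lower().replace(" ", "-"),
        f"v{version:02d}",                       # zero-padded version number
    ]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


print(build_file_name("lakes", "temperature", date(2021, 6, 30), 2, "csv"))
# 20210630_lakes_temperature_v02.csv
```

Putting the date first in YYYYMMDD form makes an alphabetical listing of the files chronological as well.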
The academic NAS (Network Attached Storage) is a storage space service for UNIGE researchers. It enables active research data to be stored on equipment that is easily accessible, fast and secure (authentication and integrated backup). It is suitable for data that need to be regularly consulted, exploited, modified and shared.
OpenAIRE (Open Access Infrastructure for Research in Europe) is a European project funded by Horizon 2020. It has two main areas of action: networking open science experts and drawing on their expertise to create training courses, and developing an open technical infrastructure for centralising, managing and sharing scientific publications and research data in support of European scientists.
Source: OpenAIRE
Open Research Data aims to make publicly funded research data freely and permanently accessible to researchers and citizens. This data must be FAIR (Findable, Accessible, Interoperable and Reusable) in order to be freely accessed, used, modified and shared.
Open Research Data is considered an essential element in the evolution of scientific research, particularly with regard to its transparency, reproducibility and measurement of its impact.
Open Science is an umbrella term for a set of initiatives and policies aimed at reforming the way in which scientific research is conducted, evaluated and disseminated. This movement has notably given rise to Open Access and Open Research Data. Open Science emphasizes the importance of transparency, replicability and collaboration among all stakeholders in science.
ORCID, an acronym for Open Researcher and Contributor IDentifier, is a free, international system of persistent digital identifiers. An ORCID identifier uniquely identifies a researcher and thus distinguishes him or her from other researchers, in particular from those with the same name. It can be linked to all of a researcher's outputs, such as publications, grants and other contributions.
A persistent identifier, also known as a perennial identifier, is a string of characters and/or numbers used to uniquely identify a resource, irrespective of its location and with a long-term perspective.
The best known are: DOI, URI, ORCID and ARK.
These identifiers are generally structured in 3 parts. DOIs, for example DOI: 10.13097/archive-open/unige:27916, are structured as follows:
- a prefix corresponding to the type of identifier used
- the designation of the entity which has assigned the identifier ("10.13097" for UNIGE)
- the specific name of the resource ("archive-open/unige:27916")
- Overview of PID by Espasandin et al. 2018
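The three-part structure described above can be sketched in a few lines of Python. The function name is invented, and the sketch assumes the identifier is written exactly in the "DOI: registrant/resource" form used in the example; other notations, such as a DOI embedded in a URL, would need additional handling.

```python
def split_doi(identifier: str) -> tuple:
    """Split 'DOI: <registrant>/<resource name>' into its three parts."""
    # Part 1: the label before the first colon gives the identifier type.
    label, _, rest = identifier.partition(":")
    # Parts 2 and 3: the first slash separates the assigning entity
    # from the specific name of the resource.
    registrant, _, resource = rest.strip().partition("/")
    return label.strip(), registrant, resource


print(split_doi("DOI: 10.13097/archive-open/unige:27916"))
# ('DOI', '10.13097', 'archive-open/unige:27916')
```

Only the first colon and the first slash are used as separators, because the resource name itself may contain both characters, as it does here.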
Pseudonymisation is "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person".
Source: General Data Protection Regulation
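A minimal sketch of this idea, assuming a single directly identifying field, is given below. The function name, the field names and the sample records are invented. The key table returned by the function plays the role of the "additional information": it would have to be stored separately and protected. Other fields, such as age, may still allow indirect identification, so this sketch alone does not make data safe to share.

```python
import secrets


def pseudonymise(records, id_field):
    """Replace the identifying field of each record with a random code.

    Returns the pseudonymised records and a key table mapping each code
    back to the original value.
    """
    codes = {}      # original value -> code
    key_table = {}  # code -> original value (to be kept separately)
    output = []
    for record in records:
        original = record[id_field]
        if original not in codes:
            code = "P-" + secrets.token_hex(4)
            while code in key_table:  # regenerate in the unlikely case of a clash
                code = "P-" + secrets.token_hex(4)
            codes[original] = code
            key_table[code] = original
        # Copy the record so that the input list is left unchanged.
        output.append({**record, id_field: codes[original]})
    return output, key_table


# Hypothetical participants (invented data).
participants = [
    {"name": "Jane Doe", "age": 34},
    {"name": "John Roe", "age": 51},
    {"name": "Jane Doe", "age": 34},
]
pseudonymised, key_table = pseudonymise(participants, "name")
```

The same person always receives the same code, so records can still be linked to each other, while re-identification requires access to the key table.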
Re3data or Registry of Research Data Repositories is a global directory of research data repositories launched in 2012.
It allows you to search for research data repositories using several criteria.
This directory offers colored thumbnails assessing compliance with specific criteria, particularly in terms of accessibility. This makes it possible to assess whether a repository is compliant with FAIR principles.
Source: Re3Data
"Sensitive data is a specific category of personal data containing information about :
- religious, ideological, political or trade union-related views or activities,
- health, the intimate sphere or the racial origin,
- social security measures,
- administrative or criminal proceedings and sanctions;"
The storage of research data concerns so-called active data, i.e. data that is still in use. Storage must be on secure platforms, whose contents are backed up regularly, to ensure data integrity and security.
It is important to distinguish storage from archiving of research data. Indeed, these two actions do not pursue the same objectives and therefore there are dedicated and specific solutions and platforms for each of them.
Yareta is a Data Repository developed within the framework of the national "DLCM" project of swissuniversities and the cantonal bill "Digital Infrastructure for Research".
This platform complies with the FAIR principles for the management of research data. It is therefore in line with the requirements of the funders (SNSF, Horizon 2020) for the archiving and preservation of research data.
It is available to all researchers of Geneva's Higher Education Institutions.
Source: eresearch