Creating a DMP for the SNSF

A Data Management Plan (DMP) is a formal document that outlines how research data will be handled, both during a research project and after it has ended.

Most funding agencies now require that a DMP be submitted with a grant application. The Swiss National Science Foundation (SNSF) is introducing this requirement for the next call for projects in autumn 2017.

Data Management Plan of the SNSF

Applicants for an SNSF grant must fill in the DMP form directly in their mySNF account. For your convenience, we present below the 12 questions (also available in a printable form with comments) and example answers taken from different sources [1, 2].

1 Data collection and documentation

1.1 What data will you collect, observe, generate or reuse?

Questions you might want to consider:
- What type, format and volume of data will you collect, observe, generate or reuse?
- Which existing data (yours or third-party) will you reuse?

Example of answer

This project will work with and generate three main types of raw data.

1. Images from transmitted-light microscopy of Giemsa-stained squashed larval brains.
2. Images from confocal microscopy of immunostained whole-mounted larval brains.
3. Western blot data.

All data will be stored in digital form, either in the format in which they were originally generated (e.g. Metamorph files for confocal images; Spectrum Mill files for mass spectra, with results of mass spectra analyses stored in Excel files; TIFF files for gel images; FileMaker Pro files for genetics records), or converted into digital form via scanning to create TIFF or JPEG files (e.g. western blots or other types of results).

Measurements and quantification of the images will be recorded in spreadsheets. Micrograph data is expected to total between 100GB and 1TB over the course of the project. Scanned images of western blots are expected to total around 1GB over the course of the project. Other derived data (measurements and quantifications) are not expected to exceed 10MB.
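The volume estimates in this example can be totalled to check how much storage to request. The following is a minimal sketch, assuming decimal units (1 GB = 1000 MB) and taking the upper bound of each estimate quoted above; the dictionary keys are illustrative names, not part of the SNSF form.

```python
# Upper-bound estimates from the example answer, in megabytes (decimal units assumed).
MB_PER_GB = 1_000
MB_PER_TB = 1_000_000

estimates_mb = {
    "micrographs": 1 * MB_PER_TB,         # "between 100GB and 1TB": upper bound
    "western_blot_scans": 1 * MB_PER_GB,  # "around 1GB"
    "derived_data": 10,                   # "not expected to exceed 10MB"
}


def total_gb(estimates):
    """Return the summed estimate in gigabytes."""
    return sum(estimates.values()) / MB_PER_GB


print(f"Upper-bound storage need: {total_gb(estimates_mb):.2f} GB")
```

In this example the micrographs dominate the total, so the accuracy of that single estimate matters far more than the others.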

1.2 How will the data be collected, observed or generated?

Questions you might want to consider:
- What standards, methodologies or quality assurance processes will you use?
- How will you organize your files and handle versioning?

Example of answer

All samples on which data are collected will be prepared according to published standard protocols in the field. Files will be named according to a pre-agreed convention. The dataset will be accompanied by a README file which will describe the directory hierarchy and file naming convention.

Each directory will contain an INFO.txt file describing the experimental protocol used in that experiment. It will also record any deviations from the protocol and other useful contextual information.

The microscope software stores a range of metadata (field size, magnification, lens phase, zoom, gain, pinhole diameter etc.) with each image.

This should allow the data to be understood by other members of our research group and add contextual value to the dataset should it be reused in the future.
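A pre-agreed naming convention is easier to follow when file names are built and checked by a script. The sketch below assumes a hypothetical convention, YYYYMMDD_experiment_sample_vNN.ext; the pattern and the function names are illustrative and should be adapted to whatever your group agrees on.

```python
import re
from datetime import date

# Hypothetical convention: YYYYMMDD_experiment_sample_vNN.ext
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<experiment>[a-z0-9-]+)_(?P<sample>[a-z0-9-]+)"
    r"_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(day, experiment, sample, version, ext):
    """Build a file name that follows the convention."""
    return f"{day:%Y%m%d}_{experiment}_{sample}_v{version:02d}.{ext}"


def is_valid_name(name):
    """Check whether a file name follows the convention."""
    return NAME_PATTERN.match(name) is not None


name = build_name(date(2017, 10, 1), "confocal", "brain-03", 2, "tif")
print(name)                                    # 20171001_confocal_brain-03_v02.tif
print(is_valid_name(name))                     # True
print(is_valid_name("final version (2).tif"))  # False
```

Putting the date first makes files sort chronologically, and the explicit version suffix avoids ambiguous names such as "final" or "new".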

1.3 What documentation and metadata will you provide with the data?

Questions you might want to consider:
- What information is required for users (computer or human) to read and interpret the data in the future?
- How will you generate this documentation?
- What community standards (if any) will be used to annotate the (meta)data?

Example of answer

Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format. The codebook will contain information on study design, sampling methodology, fieldwork, variable-level detail, and all information necessary for a secondary analyst to use the data accurately and effectively.
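To give an idea of what XML-tagged metadata looks like, here is a minimal sketch that writes a study title and variable-level labels. The element names are modelled on the DDI Codebook vocabulary, but the output omits namespaces and most required elements and has not been validated against the DDI schema; the study title and variables are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, variables):
    """Return an XML string with a study title and variable-level labels."""
    root = ET.Element("codeBook")

    # Study-level description: here only the title.
    study = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title

    # Variable-level description: one entry per variable with a readable label.
    data = ET.SubElement(root, "dataDscr")
    for name, label in variables.items():
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label

    return ET.tostring(root, encoding="unicode")


xml_text = build_codebook(
    "Example survey (hypothetical)",
    {"age": "Age of respondent in years", "region": "Region of residence"},
)
print(xml_text)
```

In practice a dedicated DDI editor or a schema-aware library is a safer way to produce a complete, valid codebook.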

It will be the responsibility of each researcher to annotate their data with metadata, and it will be the responsibility of the Principal Investigator to check with all participants (weekly during the field season, monthly otherwise) to ensure that data are being properly processed, documented, and stored.

All the datasets produced by the project will be published under a GNU licence.

2 Ethics, legal and security issues

2.1 How will ethical issues be addressed and handled?

Questions you might want to consider:
- What is the relevant protection standard for your data? Are you bound by a confidentiality agreement?
- Do you have the necessary permission to obtain, process, preserve and share the data? Have the people whose data you are using been informed or did they give their consent?
- What methods will you use to ensure the protection of personal or other sensitive data?

Example 1 of answer:

Personal data will be anonymized before sharing and dissemination, following the recommendations of the CNIL.

Example 2 of answer:

This project will generate data designed to study the prevalence and correlates of DSM III-R psychiatric disorders and patterns and correlates of service utilization for these disorders in a nationally representative sample of over 8000 respondents. The sensitive nature of these data will require that the data be released through a restricted use contract.

2.2 How will data access and security be managed?

Questions you might want to consider:
- What are the main concerns regarding data security, what are the levels of risk and what measures are in place to handle security risks?
- How will you regulate data access rights/permissions to ensure the security of the data?
- How will personal or other sensitive data be handled to ensure safe data storage and transfer?

Example of answer for people using the NAS:

Our data is stored on the academic NAS managed by the UNIGE IT department (DiSTIC). Access to the data is limited to rights holders (central authentication). The head of the laboratory that owns this disk space manages access himself, with the possibility of registering additional users.

2.3 How will you handle copyright and Intellectual Property Rights issues?

Questions you might want to consider:
- Who will be the owner of the data?
- Which licenses will be applied to the data?
- What restrictions apply to the reuse of third-party data?

Key points for your answer

Research data generated by UNIGE collaborators in the performance of their duties is the property of the institution.

When the data is produced in partnership with a third party, it is strongly recommended to draw up, before the research project starts and with all the parties concerned, an agreement on the use of the research data. In the absence of such a document, the researcher of the University and the third party will have to agree on the use of the data.

When researchers wish to use data produced by a third party, they must comply with the copyright license or, in the absence of such a license, obtain the prior consent of the third party.

If you wish to transfer to a company, outside of an existing research agreement, research data that may represent a commercial interest, you can contact Unitec, the technology transfer service, which can answer all your questions, assist you in the drafting of any contracts governing the transfer and remuneration of the University, and help in negotiations with third parties.

In general, and as part of its mission in knowledge development and sharing, the University encourages free dissemination of data and research results, while respecting the rights and duties of the parties (management of personal or sensitive data, for example). A license must be assigned to the data that may be shared in order to clarify the conditions associated with the use and possible transfer to third parties of such data. Creative Commons licenses, such as CC0 or CC-BY, are commonly recommended choices. For any questions about these licenses, the Research Data team is available.

A decision tree to help you choose an appropriate license is available.

3 Data storage and preservation

3.1 How will your data be stored and backed-up during the research?

Questions you might want to consider:
- What is your storage capacity and where will the data be stored?
- What are the back-up procedures?

Example of answer for people using the NAS:

Our data is stored on the academic NAS managed by the University of Geneva's IT department - the Information and Communication Technologies and Systems Division (DiSTIC). This academic NAS follows common protocols and best practices to ensure maximum security, integrity and availability. It extends over two distinct physical locations (UniDufour and Campus Biotech) and automatically performs a snapshot of files every 4 hours, with a retention of copies of 6 weeks.
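The snapshot figures above translate into concrete recovery guarantees. The sketch below works them out, assuming that every 4-hourly snapshot is kept for the full 6-week retention period; the description does not state this explicitly, so check the actual policy with your IT department.

```python
from datetime import timedelta

SNAPSHOT_INTERVAL = timedelta(hours=4)
RETENTION = timedelta(weeks=6)

# Number of restore points, assuming every snapshot is kept for the full retention period.
restore_points = RETENTION // SNAPSHOT_INTERVAL

# Worst case: a file is lost just before the next snapshot would have been taken.
max_unsaved_work = SNAPSHOT_INTERVAL

print(restore_points)    # 252
print(max_unsaved_work)  # 4:00:00
```

Under that assumption, at most four hours of changes can be lost, and a file can be restored to any snapshot from the previous six weeks.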

3.2 What is your data preservation plan?

Questions you might want to consider:
- What procedures would be used to select data to be preserved?
- What file formats will be used for preservation?

Example of answer:

We will preserve the data for 10 years on the university's servers and also deposit it in an appropriate data archive at the end of the project (see section 4.1 below). Where possible, we will store files in open archival formats, e.g. Word files converted to PDF/A or to plain text files encoded in UTF-8, and Excel files converted to CSV. Where this is not possible, we will include information on the software used and its version number.
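As an illustration of preparing tabular data for preservation, the sketch below serialises rows as CSV and encodes the result explicitly as UTF-8. It uses only the Python standard library; reading the original Excel file would need a third-party package and is not shown. The column names and values are hypothetical.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialise tabular data as CSV text with Unix line endings."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header = ["sample_id", "area_um2"]
rows = [["brain-01", 152.4], ["brain-02", 148.9]]  # hypothetical measurements

csv_text = rows_to_csv(header, rows)

# Encode explicitly so the archived bytes do not depend on system defaults.
csv_bytes = csv_text.encode("utf-8")
print(csv_text)
```

Stating the encoding and line endings explicitly keeps the archived file readable regardless of the operating system it was created on.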

 

4 Data sharing and reuse

4.1 How and where will the data be shared?

Questions you might want to consider:
- On which repository do you plan to share your data?
- How will potential users find out about your data?

Examples of answer:

Ex. 1:

The project data will be stored in the Swiss national repository developed by the University of Geneva within the Data Life Cycle Management (DLCM) project, which will be operational in the course of 2019. This will ensure data archiving and sharing is fully compliant with FAIR principles.

Ex. 2:

Datasets from this work which underpin a publication will be deposited in Enlighten: Research Data, the University of Glasgow’s institutional data repository, and made public at the time of publication. Data in the repository will be stored in accordance with funder and University data policies. Files deposited in Enlighten: Research Data will be given a Digital Object Identifier (DOI) and the associated metadata will be listed in the University of Glasgow Research Data Registry and the DataCite metadata store. The retention schedule for data in Enlighten: Research Data will be 10 years from date of deposition in the first instance, with extensions applied to datasets which are subsequently accessed. This complies with both University of Glasgow guidance and funder policies.

Enlighten: Research Data is backed by commercial digital storage which is audited on a twice-yearly basis for compliance with the ISO27001 Information Security Management standard.

The DOI issued to datasets in the repository can be included as part of a data citation in publications, allowing the datasets underpinning a publication to be identified and accessed.

Metadata about datasets held in the University Registry will be publicly searchable and discoverable and will indicate how and on what terms the dataset can be accessed.
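To show how a DOI fits into a data citation, the sketch below assembles one in the layout "Creators (Year). Title. Publisher. DOI link", which is one common layout for data citations; the journal or repository you use may prescribe a different one. All values, including the DOI, are placeholders.

```python
def format_citation(creators, year, title, publisher, doi):
    """Format a data citation: Creators (Year). Title. Publisher. DOI link."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. https://doi.org/{doi}"


citation = format_citation(
    creators=["Doe, J.", "Roe, R."],      # hypothetical authors
    year=2017,
    title="Larval brain micrographs",     # hypothetical dataset title
    publisher="Example Data Repository",  # hypothetical repository
    doi="10.1234/example",                # placeholder, not a real DOI
)
print(citation)
```

Writing the DOI as a full https://doi.org/ link lets readers reach the dataset directly from the reference list.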

4.2 Are there any necessary limitations to protect sensitive data?

Questions you might want to consider:
- Under which conditions will the data be made available (timing of data release, reason for delay if applicable)?

Example 1 of answer:

Astronomical data will be released after a one-year embargo, which gives the project team priority in exploiting the data.

Example 2 of answer:

Personal data will be anonymized before sharing and dissemination, following the recommendations of the CNIL.

Example 3 of answer:

Data will be made available under Creative Commons License CC-BY.

4.3 I will choose digital repositories that conform to the FAIR Data Principles. [CHECK BOX]

You can find certified repositories in the re3data.org registry of research data repositories.

4.4 I will choose digital repositories maintained by a non-profit organisation. [RADIO BUTTON yes/no]

--> If the answer is no: “Explain why you cannot share your data on a non-commercial digital repository.”