Filling a DMP for the SNSF
Since 2017, the SNSF has required a DMP, which must be filled in directly in the mySNF account. For your convenience, the 12 questions are presented below (also available in a printable form with comments), together with example answers taken from different sources [1, 2].
1 Data collection and documentation
Questions you might want to consider:
- What type, format and volume of data will you collect, observe, generate or reuse?
- Which existing data (yours or third-party) will you reuse?
Example of answer
This project will work with and generate three main types of raw data.
1. Images from transmitted-light microscopy of Giemsa-stained squashed larval brains.
2. Images from confocal microscopy of immunostained whole-mounted larval brains.
3. Western blot data.
All data will be stored in digital form, either in the format in which they were originally generated (i.e. Metamorph files for confocal images; Spectrum Mill files for mass spectra, with results of mass spectra analyses stored in Excel files; TIFF files for gel images; FileMaker Pro files for genetics records), or converted into digital form via scanning to create TIFF or JPEG files (e.g. western blots or other types of results).
Measurements and quantification of the images will be recorded in spreadsheets. Micrograph data is expected to total between 100 GB and 1 TB over the course of the project. Scanned images of western blots are expected to total around 1 GB over the course of the project. Other derived data (measurements and quantifications) are not expected to exceed 10 MB.
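To back volume estimates like the ones above with measured figures, the size of an existing pilot dataset can be summed per file type. The following is a minimal sketch, not part of the original example: the function names `volume_by_extension` and `human_readable` are illustrative, and decimal units (1 GB = 10**9 bytes) are assumed.

```python
import os
from collections import defaultdict


def volume_by_extension(root):
    """Return the total size in bytes per file extension under `root`."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)


def human_readable(num_bytes):
    """Format a byte count with decimal units (assumption: 1 kB = 1000 B)."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000
```

Running `volume_by_extension` on a few weeks of pilot data and scaling the result to the project duration gives a defensible order of magnitude for the DMP.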
Questions you might want to consider:
- What standards, methodologies or quality assurance processes will you use?
- How will you organize your files and handle versioning?
Example of answer
All samples on which data are collected will be prepared according to published standard protocols in the field. Files will be named according to a pre-agreed convention. The dataset will be accompanied by a README file which will describe the directory hierarchy and file naming convention.
Each directory will contain an INFO.txt file describing the experimental protocol used in that experiment. It will also record any deviations from the protocol and other useful contextual information.
Microscope images capture and store a range of metadata (field size, magnification, lens phase, zoom, gain, pinhole diameter etc.) with each image.
This should allow the data to be understood by other members of our research group and add contextual value to the dataset should it be reused in the future.
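A naming convention and per-directory INFO.txt files are easier to maintain when they are checked automatically. The sketch below is one possible way to do this and rests on assumptions that are not in the example answer: the pattern `YYYYMMDD_<experiment>_<3-digit index>.<ext>` is a hypothetical convention, the README is assumed to be named `README.txt`, and `check_dataset` is an illustrative name.

```python
import os
import re

# Hypothetical convention: YYYYMMDD_<experiment>_<3-digit index>.<ext>
NAME_PATTERN = re.compile(r"^\d{8}_[a-z0-9-]+_\d{3}\.[a-z0-9]+$")
# Documentation files are exempt from the naming convention.
EXEMPT = {"README.txt", "INFO.txt"}


def check_dataset(root):
    """Return a sorted list of problems found in the dataset under `root`."""
    problems = []
    if not os.path.isfile(os.path.join(root, "README.txt")):
        problems.append(f"missing README.txt: {root}")
    for dirpath, _dirnames, filenames in os.walk(root):
        data_files = [n for n in filenames if n not in EXEMPT]
        # Every directory that holds data files needs its own INFO.txt.
        if data_files and "INFO.txt" not in filenames:
            problems.append(f"missing INFO.txt: {dirpath}")
        for name in data_files:
            if not NAME_PATTERN.match(name):
                problems.append(f"bad file name: {os.path.join(dirpath, name)}")
    return sorted(problems)
```

An empty list means the dataset follows the assumed convention. The pattern should be replaced with whatever convention the group actually agrees on.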
Questions you might want to consider:
- What information is required for users (computer or human) to read and interpret the data in the future?
- How will you generate this documentation?
- What community standards (if any) will be used to annotate the (meta)data?
Example of answer
Metadata will be tagged in XML using the Data Documentation Initiative (DDI) format. The codebook will contain information on study design, sampling methodology, fieldwork, variable-level detail, and all information necessary for a secondary analyst to use the data accurately and effectively.
It will be the responsibility of each researcher to annotate their data with metadata, and it will be the responsibility of the Principal Investigator to check weekly (during the field season, monthly otherwise) with all participants to ensure that data are being properly processed, documented, and stored.
All the datasets produced by the project will be published under a GNU licence.
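To illustrate what tagging metadata in XML can look like, here is a minimal sketch that builds a small codebook skeleton with Python's standard library. It is an illustration under assumptions, not a complete DDI document: the namespace `ddi:codebook:2_5` and the element names (`codeBook`, `stdyDscr`, `citation`, `titlStmt`, `titl`, `dataDscr`, `var`, `labl`) follow DDI Codebook as we understand it, the study title and variables are hypothetical, and any real file should be validated against the official DDI schema, which requires more elements than shown here.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of DDI Codebook 2.5; check against the official schema.
DDI_NS = "ddi:codebook:2_5"


def build_codebook(title, variables):
    """Return a codebook skeleton as an XML string.

    `variables` is a list of (name, label) pairs.
    """
    ET.register_namespace("", DDI_NS)

    def q(tag):
        # Qualify a tag name with the DDI namespace.
        return f"{{{DDI_NS}}}{tag}"

    root = ET.Element(q("codeBook"))
    # Study-level description: only the title is filled in here.
    study = ET.SubElement(root, q("stdyDscr"))
    citation = ET.SubElement(study, q("citation"))
    title_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(title_stmt, q("titl")).text = title
    # Variable-level description: one <var> per variable with its label.
    data = ET.SubElement(root, q("dataDscr"))
    for name, label in variables:
        var = ET.SubElement(data, q("var"), name=name)
        ET.SubElement(var, q("labl")).text = label
    return ET.tostring(root, encoding="unicode")
```

For example, `build_codebook("Pilot survey", [("age", "Age of respondent in years")])` returns a string that can be saved next to the data file it describes.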
2 Ethics, legal and security issues
Questions you might want to consider:
- What is the relevant protection standard for your data? Are you bound by a confidentiality agreement?
- Do you have the necessary permission to obtain, process, preserve and share the data? Have the people whose data you are using been informed or did they give their consent?
- What methods will you use to ensure the protection of personal or other sensitive data?
Example 1 of answer:
Personal data will be anonymized before sharing and dissemination, following the recommendations of the Federal Data Protection and Information Commissioner.
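As an illustration of a first technical step toward this, the sketch below removes direct identifiers from tabular records and replaces participant IDs with random pseudonyms. It is not taken from the Commissioner's recommendations: the column names, the `participant_id` field and the function name `pseudonymize` are hypothetical. Note that this is pseudonymization only. Indirect identifiers (rare combinations of age, profession, location and so on) can still allow re-identification and need a separate assessment.

```python
import secrets

# Hypothetical names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(records, id_field="participant_id"):
    """Drop direct identifiers and replace IDs with random pseudonyms.

    Returns the cleaned records and the ID-to-pseudonym key. The key must
    be stored separately under restricted access, or destroyed.
    """
    key = {}
    cleaned = []
    for record in records:
        original_id = record[id_field]
        if original_id not in key:
            # Random rather than derived from the ID, so it cannot be recomputed.
            key[original_id] = "P" + secrets.token_hex(8)
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row[id_field] = key[original_id]
        cleaned.append(row)
    return cleaned, key
```

The same participant always receives the same pseudonym within one call, so repeated measurements remain linkable after the identifiers are removed.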
Example 2 of answer:
This project will generate data designed to study the prevalence and correlates of DSM III-R psychiatric disorders and patterns and correlates of service utilization for these disorders in a nationally representative sample of over 8000 respondents. The sensitive nature of these data will require that the data be released through a restricted use contract.
Questions you might want to consider:
- What are the main concerns regarding data security, what are the levels of risk and what measures are in place to handle security risks?
- How will you regulate data access rights/permissions to ensure the security of the data?
- How will personal or other sensitive data be handled to ensure safe data storage and transfer?
Example of answer for people using the NAS:
Our data is stored on the academic NAS managed by the UNIGE IT department (DiSTIC). Access to the data is limited to rights holders (central authentication). The head of the laboratory that owns this disk space manages access personally and can register additional users.
Questions you might want to consider:
- Who will be the owner of the data?
- Which licenses will be applied to the data?
- What restrictions apply to the reuse of third-party data?
Key points for your answer
Research data generated by UNIGE collaborators in the performance of their duties is the property of the institution.
When the data is produced in partnership with a third party, it is strongly recommended to draw up, before the research project starts and with all the parties concerned, an agreement on the use of the research data. In the absence of such a document, the researcher of the University and the third party will have to agree on the use of the data.
When researchers wish to use data produced by a third party, they must comply with the copyright license or, in the absence of such a license, obtain the prior consent of the third party.
If you wish to transfer to a company, outside of an existing research agreement, research data that may represent a commercial interest, you can contact Unitec, the technology transfer service, which can answer all your questions, assist you in drafting any contracts governing the transfer and the remuneration of the University, and help in negotiations with third parties.
In general, and as part of its mission in knowledge development and sharing, the University encourages free dissemination of data and research results, while respecting the rights and duties of the parties (management of personal or sensitive data, for example). A license must be assigned to data that may be shared, in order to clarify the conditions for their use and possible transfer to third parties. Creative Commons licenses, such as CC0 or CC BY, are commonly recommended choices. For any questions about these licenses, the Research Data team is available.
A decision tree to help you choose an appropriate license is available.
Example of answer:
Research data generated by UNIGE collaborators in the performance of their duties are the property of the institution. As the data are not subject to a contract and will not be patented, they will be released as open data under the Creative Commons CC0 license.
3 Data storage and preservation
Questions you might want to consider:
- What is your storage capacity and where will the data be stored?
- What are the back-up procedures?
Example of answer for people using the NAS:
Our data is stored on the academic NAS managed by the University of Geneva's IT department, the Information and Communication Technologies and Systems Division (DiSTIC). This academic NAS follows common protocols and best practices to ensure maximum security, integrity and availability. It extends over two distinct physical locations (UniDufour and Campus Biotech) and automatically takes a snapshot of the files every 4 hours, with copies retained for 6 weeks.
Questions you might want to consider:
- What procedures would be used to select data to be preserved?
- What file formats will be used for preservation?
Example of answer:
At the end of the project we will preserve the data and its documentation for 10 years on the university’s servers and also deposit it in an appropriate data archive (see section 4.1 below). Where possible, we will store files in open archival formats, e.g. Word files converted to PDF/A or to simple text files encoded in UTF-8, and Excel files converted to CSV. Where this is not possible, we will include information on the software used and its version number.
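Converting existing text files to UTF-8 before deposit can be scripted. The following is a minimal sketch, assuming the source encoding is known (Latin-1 is used here only as an example default); `convert_to_utf8` is an illustrative name. If the source encoding is guessed wrongly, accented characters will be silently corrupted, so a spot check of the output is advisable.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="latin-1"):
    """Re-encode a text file to UTF-8, leaving line endings unchanged."""
    # newline="" disables newline translation on both read and write.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Writing to a separate destination path keeps the original file intact until the converted copy has been checked.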
4 Data sharing and reuse
Questions you might want to consider:
- On which repository do you plan to share your data?
- How will potential users find out about your data?
Example of answer 1:
The project data will be stored in Yareta, the University of Geneva data repository, which is also available to researchers from other higher education institutions in Geneva. This will ensure that data archiving and sharing are fully compliant with the FAIR principles.
Example of answer 2:
Datasets from this work which underpin a publication will be deposited in Enlighten: Research Data, the University of Glasgow’s institutional data repository, and made public at the time of publication. Data in the repository will be stored in accordance with funder and University data policies. Files deposited in Enlighten: Research Data will be given a Digital Object Identifier (DOI) and the associated metadata will be listed in the University of Glasgow Research Data Registry and the DataCite metadata store. The retention schedule for data in Enlighten: Research Data will be 10 years from date of deposition in the first instance, with extensions applied to datasets which are subsequently accessed. This complies with both University of Glasgow guidance and funder policies.
Enlighten: Research Data is backed by commercial digital storage which is audited twice a year for compliance with the ISO27001 Information Security Management standard.
The DOI issued to datasets in the repository can be included as part of a data citation in publications, allowing the datasets underpinning a publication to be identified and accessed.
Metadata about datasets held in the University Registry will be publicly searchable and discoverable and will indicate how and on what terms the dataset can be accessed.
Questions you might want to consider:
- Under which conditions will the data be made available (timing of data release, reason for delay if applicable)?
Example 1 of answer:
Astronomical data will be released after a one-year embargo, which gives the project team priority in exploiting them.
Example 2 of answer:
Personal data will be anonymized before dissemination, following the recommendations of the Swiss Federal Data Protection and Information Commissioner.
Personal data will be anonymized before sharing and dissemination, following the recommendations of the CNIL (the French data protection authority).
You can find certified repositories in the Re3data.org catalog of repositories.
--> If the answer is no: “Explain why you cannot share your data on a non-commercial digital repository.”