Creating metadata

Metadata, literally "data about data", is information that describes the basic characteristics of a data item or dataset, regardless of the medium (physical or digital).

For example:

Its author(s)
Its content
Date of creation
The place of capture/production
The reason why the data was generated
How the data was created
etc.

These different elements are called metadata fields.

The role of metadata is therefore to place research data in the context of its creation and use, so that it can be understood, processed and potentially reused more easily, whether by its creator or by others. Metadata should be as complete as possible, should follow the standards and conventions of the discipline in question, and should be machine readable.
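As a minimal sketch of what "machine readable" can mean in practice, the fields listed above can be stored as a key-value record and serialized to JSON. The field names and values below are illustrative assumptions and do not come from any particular standard.

```python
import json

# Hypothetical metadata record: field names and values are made up for illustration.
record = {
    "authors": ["Jane Doe"],
    "content": "Hourly air temperature measurements",
    "date_created": "2021-05-14",
    "place_of_capture": "Geneva, Switzerland",
    "purpose": "Calibration of a low-cost sensor",
    "method": "Automated logger, one reading per hour",
}


def to_json(metadata: dict) -> str:
    """Serialize a metadata record to JSON with a stable key order."""
    return json.dumps(metadata, indent=2, sort_keys=True, ensure_ascii=False)


def from_json(text: str) -> dict:
    """Read a metadata record back from its JSON form."""
    return json.loads(text)


serialized = to_json(record)
print(serialized)
```

Because the record survives a round trip through JSON unchanged, any program (or person) can read it back without knowing how it was produced.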

Typology

Metadata can be classified into several main families. Several typologies coexist, such as the one proposed by the Australian National Data Service (ANDS) which distinguishes 6 families of metadata:

Descriptive Metadata

This metadata is used to facilitate the discovery, evaluation and understanding of content.

Examples include:

Title
Author / contributors
Description
Location and dates of the study
Language
Keywords
Unique identifiers (ISBN, DOI, etc.)
etc.

Technical Metadata

This metadata enables the interoperability of data across different systems and in effect ensures that it can be read by both humans and machines.

Examples:

How is the data configured?
What formats and versions of formats are used?
How is the database configured?

Provenance Metadata

This metadata describes the origin of the data and the processes it has gone through. It is necessary for a good understanding, interpretation and reuse of the data. This metadata concerns human actions as well as very technical aspects and often requires some knowledge of the research domain to be filled in.

Examples:

Where did the data come from?
Why was it collected?
Who collected it, where and when?
What instruments or technologies were used to collect the data and how were they set up?
How was the data processed?
etc.

Rights and Access Metadata

This metadata describes how the data can be accessed and used, the copyright status, the licence conditions and the rights holders.

Examples:

How can someone access the data?
Who is authorised to read or modify the data or metadata, and under what conditions?
Who is responsible or has authority over the data?
Are there any costs associated with accessing the data?
Under what licence is the data available?

Citation Metadata

This metadata contains the information necessary for the data to be properly cited by third parties.

Examples:

Creators
Year of publication
Title
Publisher
Identifier
etc.
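To illustrate how these fields combine, the sketch below assembles them into a citation string. The pattern "Creators (Year): Title. Publisher. Identifier" is one common layout and is assumed here; the function name, the sample values and the DOI are hypothetical.

```python
def format_citation(creators, year, title, publisher, identifier):
    """Build a citation of the form 'Creators (Year): Title. Publisher. Identifier'."""
    names = "; ".join(creators)  # separate several creators with semicolons
    return f"{names} ({year}): {title}. {publisher}. {identifier}"


# Hypothetical dataset; the DOI below is a made-up example.
citation = format_citation(
    creators=["Doe, Jane", "Smith, John"],
    year=2021,
    title="Air temperature dataset",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
)
print(citation)
```

If any of the five fields is missing from the metadata, a third party cannot produce a complete citation, which is why citation metadata is usually treated as a minimum requirement.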

Standards / Disciplinary metadata schemas

Determining precisely what metadata should be filled in is a difficult task, as the choice is highly dependent on the context of production and the use of the data. For this reason, initiatives have created templates with a list of elements that match the description needs of a discipline or a special purpose. These models are called metadata standards.

Examples of standards:

Dublin Core: a standard consisting, in its initial version, of 15 elements and widely used to describe digital and physical resources such as books.
Darwin Core: a standard derived from Dublin Core and developed for the specific needs of biodiversity informatics, to describe biodiversity data and facilitate its sharing.
Data Documentation Initiative (DDI): an international standard for describing data produced in the social, behavioral, economic, and health sciences. DDI standards enable data to be documented and discovered, and make it interoperable. The specifications and tools are available on the DDI website.
Digital Imaging and Communications in Medicine (DICOM): an international standard, published as ISO 12052, for medical images and their related information. It defines formats in which medical images can be exchanged with the data and quality necessary for their clinical use.
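Taking the first standard in this list as an illustration, the sketch below writes a small Dublin Core record as XML using only the Python standard library. The 15 element names and the namespace are those of the original Dublin Core element set; the wrapping "metadata" element, the function name and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def build_dc_record(fields: dict) -> ET.Element:
    """Wrap Dublin Core fields in a (hypothetical) <metadata> root element."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not one of the 15 Dublin Core elements")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Illustrative values only.
fields = {
    "title": "Air temperature dataset",
    "creator": "Doe, Jane",
    "date": "2021-05-14",
    "language": "en",
}
xml_text = ET.tostring(build_dc_record(fields), encoding="unicode")
print(xml_text)
```

Rejecting unknown element names is a design choice for this sketch: it shows how a standard restricts what may be recorded, rather than leaving each project to invent its own field names.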

Many academic disciplines have formalized specific metadata standards adapted to the needs of their communities and the reuse of their data.

On its website, the Digital Curation Centre (DCC) offers a page gathering these standards with general information for each of them, tools to implement them and use cases of data repositories that currently use them.

The FAIRsharing initiative also provides a summary table of metadata standards.

When a list of metadata fields has a defined structure and its values are more tightly constrained in terms of format or permitted options, it becomes a metadata schema.

Metadata schemas thus provide lists of elements, mandatory or optional, to be filled in, together with the precise syntax to be used. For example, a schema may require dates to be written as 2021-05-14 or 20210514.
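The date rule mentioned above can be sketched as a small validation function. It assumes a hypothetical schema that accepts exactly the two layouts shown (YYYY-MM-DD and YYYYMMDD) and nothing else; the function name is made up for the example.

```python
import re
from datetime import date

# The two layouts accepted by this hypothetical schema.
_PATTERNS = (
    re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})"),  # 2021-05-14
    re.compile(r"([0-9]{4})([0-9]{2})([0-9]{2})"),    # 20210514
)


def parse_schema_date(value: str) -> date:
    """Return a date if the value follows one of the accepted layouts."""
    for pattern in _PATTERNS:
        match = pattern.fullmatch(value)
        if match:
            year, month, day = (int(part) for part in match.groups())
            # date() raises ValueError for impossible dates such as 2021-02-30.
            return date(year, month, day)
    raise ValueError(f"{value!r} does not match YYYY-MM-DD or YYYYMMDD")


print(parse_schema_date("2021-05-14"))
print(parse_schema_date("20210514"))
```

Both calls return the same date, while a value such as "14/05/2021" is rejected. This is the practical benefit of a fixed syntax: every record can be checked and compared automatically.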

Example of a schema:

DataCite schema: a list of fields selected for their suitability for the accurate and consistent identification of a resource for citation and retrieval purposes. The fields are classified into three categories: mandatory, recommended and optional. Full documentation, with recommended usage instructions for this schema, is available on the DataCite website.
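The distinction between mandatory and other fields can be sketched as a completeness check. The six property names below are assumed to be the mandatory properties of version 4 of the DataCite schema; the official documentation remains the reference, and the flat dictionary layout and the function name are simplifications made for this example.

```python
# Assumed mandatory properties of DataCite schema version 4;
# check the official documentation before relying on this list.
MANDATORY_FIELDS = (
    "identifier",
    "creator",
    "title",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_mandatory(record: dict) -> list:
    """List the mandatory fields that are absent or empty in a record."""
    return [name for name in MANDATORY_FIELDS if not record.get(name)]


# Hypothetical, incomplete record.
draft = {
    "identifier": "10.1234/example",
    "creator": "Doe, Jane",
    "title": "Air temperature dataset",
    "publicationYear": "2021",
}
print(missing_mandatory(draft))
```

For the draft above, the check reports that "publisher" and "resourceType" still have to be filled in before the record can be considered complete.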

When and how to create metadata

As with the management of research data as a whole, metadata should be created as early as possible and over the course of the project to avoid overload at the end of the project when the research data is archived.

Metadata can be created manually or by relying on software or platforms to facilitate or automate this process. These platforms can be general or discipline-specific.
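As a minimal sketch of automated creation, some technical metadata can be read directly from a file rather than typed by hand. The function and field names below are hypothetical; the format is only guessed from the file extension, so it is an approximation rather than a verified format identification.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def extract_file_metadata(path) -> dict:
    """Collect basic technical metadata about a file from the file system."""
    file_path = Path(path)
    info = file_path.stat()
    # guess_type() relies on the extension only; it does not inspect the content.
    mime_type, _ = mimetypes.guess_type(file_path.name)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": file_path.name,
        "size_bytes": info.st_size,
        "format": mime_type or "unknown",
        "last_modified": modified.isoformat(),
    }

# Usage (with a hypothetical file name):
# print(extract_file_metadata("measurements.csv"))
```

Fields such as the purpose of the data or the collection method cannot be derived this way and still have to be entered manually, which is why both approaches are usually combined.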

The Digital Curation Centre has compiled a list of these tools.

Source: Australian National Data Service (ANDS), 2016. ANDS guide: metadata. December 2016.