The Geneva Cancer Registry (GCR) has been recording new cases of tumors in the canton of Geneva since 1970. Registries elsewhere in Switzerland and in other countries do the same for their own populations.

The disease and patient characteristics recorded (type of disease, date of diagnosis, treatment, etc.) are called "variables". These variables make it possible to study oncological diseases statistically and to draw conclusions about how they evolve as a function of various factors (population characteristics, environment, treatments, etc.). This supports research on treatments, informs prevention and screening policies and, more generally, leads to a better understanding of these diseases.

In general, the larger the observed population, the more statistically reliable the observations. It is therefore valuable to be able to combine anonymized data from several registries for the purposes of a study. Unfortunately, the data (variables) recorded by different registries are not always comparable. Registries may record different variables, or they may record variables representing the same information in different formats.

To combine these data, a unification effort is required: the so-called "source" data (the registry data) must be transformed into a "target" format shared by several registries.
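As an illustration of what such a unification involves, the sketch below maps records from two registries into one shared target format. Everything in it is hypothetical: the registries, the field names (`sex`, `gender`, `diagnosis_date`, `dx_date`), the codes, and the target layout are invented for the example and do not describe the GCR's data or any real registry.

```python
from datetime import date, datetime

# Hypothetical source records: both registries store the same information,
# but with different field names, codes, and date formats.
registry_a = [{"sex": "M", "diagnosis_date": "1998-03-14"}]
registry_b = [{"gender": 2, "dx_date": "14/03/1998"}]


def from_registry_a(rec: dict) -> dict:
    """Map a record from (hypothetical) registry A to the target format."""
    return {
        "sex": {"M": "male", "F": "female"}[rec["sex"]],
        "diagnosis_date": date.fromisoformat(rec["diagnosis_date"]),
    }


def from_registry_b(rec: dict) -> dict:
    """Map a record from (hypothetical) registry B to the target format."""
    return {
        "sex": {1: "male", 2: "female"}[rec["gender"]],
        "diagnosis_date": datetime.strptime(rec["dx_date"], "%d/%m/%Y").date(),
    }


# Once every source has its own mapping, the records can be pooled.
combined = [from_registry_a(r) for r in registry_a] + [
    from_registry_b(r) for r in registry_b
]
```

Each registry needs its own mapping function, but after the mapping all records share the same field names and value types, so a study can treat them as a single dataset.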

International initiatives exist to assist this effort and provide tools. One example is the European Health Data & Evidence Network (EHDEN), which draws on the work of the Observational Health Data Sciences and Informatics (OHDSI) community to propose a unified format for representing medical data: the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM).

For medical registry data to conform to the CDM format, software must be developed to transform the source data accordingly. This type of software is called "Extract-Transform-Load" (ETL).
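The three ETL stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the software developed for the GCR: the source columns (`patient_id`, `sex`, `birth_year`) are hypothetical, the target table keeps only three columns loosely modeled on the CDM's person table, and the gender concept IDs are assumed values that should be checked against the vocabulary release actually in use.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the real registry schema is not shown here.
SOURCE_CSV = """patient_id,sex,birth_year
1,F,1950
2,M,1962
"""

# Assumed mapping from source codes to OMOP gender concept IDs.
# 0 is used as the fallback for values without a matching concept.
GENDER_CONCEPT = {"M": 8507, "F": 8532}


def extract(text: str) -> list[dict]:
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple[int, int, int]]:
    """Transform: convert each source row into a target-format tuple."""
    return [
        (
            int(row["patient_id"]),
            GENDER_CONCEPT.get(row["sex"], 0),
            int(row["birth_year"]),
        )
        for row in rows
    ]


def load(conn: sqlite3.Connection, persons: list[tuple[int, int, int]]) -> None:
    """Load: write the transformed rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person ("
        "person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER, "
        "year_of_birth INTEGER)"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", persons)
    conn.commit()


# Run the three stages against an in-memory database.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
```

Keeping the three stages as separate functions means the registry-specific logic stays confined to `extract` and `transform`, while `load` depends only on the target format.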

SciCoS collaborated with the Geneva Cancer Registry from January 2022 to summer 2023 to develop such software, in an effort directly supported by EHDEN. A SciCoS developer worked alongside GCR epidemiologists, pooling the skills of both groups. The result is an ETL tool that transforms the variables recorded by the GCR into the CDM format, opening the door to international oncology studies facilitated by data standardization.