Timothy Frayling

Team Science

An excellent data-centric and “team science” approach to human genetic and genomic research

Data resources

The team’s programme of research makes use of the rich resources available for human genetics research, both internationally and within Switzerland. Research funders now insist that all data is openly accessible and “FAIR”. This increasingly open approach is facilitated by the move to working with data in “Secure Data Environments”, also known as “Trusted Research Environments”. This new way of working requires robust analytical pipelines and training for all team members.

Cross-cutting Data Analytics and Statistics

Alongside its research themes and projects, the team's organisation includes a cross-cutting functional workstream focused on Data Analytics and Statistics, led by Aurélie and Lauric and summarised above.

This initiative aims to give all team members strong support and practical guidelines for every aspect of data analysis and statistics. In particular, it aims to ensure that PhD students and junior postdocs have the training and support they need to develop analytical pipelines in a collaborative and reproducible way.

This type of matrix organisation, with both research and operational/functional dimensions, is unusual in academia but will become increasingly important given the huge opportunities, and considerable challenges, of working with large, complex datasets, including in trusted research environments such as DNAnexus.

Agile methodology

Led by Aurélie, the team uses principles of Agile methodology to develop and build analytical pipelines within the constraints of the trusted research environments hosting datasets. These pipelines will be used across the team's different research projects.

Reproducible analytical pipelines

Code is developed with reproducible data analysis in mind and following the FAIR principles (Findable, Accessible, Interoperable and Reusable). This includes adhering to basic good data practice, such as:

  • Keeping newly derived variables as code only, not in new copies of datasets;
  • Credit for sharing using GitHub;
  • Dissemination and version control using git and GitHub repositories;
  • Working in Trusted Research Environments / Secure Data Environments such as DNAnexus.
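
As an illustration of the first practice above, the following is a minimal Python sketch of keeping a derived variable as code rather than as a new copy of a dataset. It is not the team's actual pipeline: the field names (weight_kg, height_cm), the derived variable (bmi) and the helper names (DERIVED, with_derived) are all hypothetical and chosen only for the example.

```python
from typing import Callable, Dict, List

# One participant record; the field names here are hypothetical.
Record = Dict[str, float]


def bmi(record: Record) -> float:
    """Body mass index in kg/m^2, computed from the raw fields."""
    height_m = record["height_cm"] / 100.0
    return record["weight_kg"] / height_m ** 2


# Registry of derived variables: the definitions live in version-controlled
# code, so no second copy of the dataset is ever written to disk.
DERIVED: Dict[str, Callable[[Record], float]] = {"bmi": bmi}


def with_derived(
    records: List[Record],
    derived: Dict[str, Callable[[Record], float]] = DERIVED,
) -> List[Record]:
    """Return new in-memory records with derived variables attached.

    The input records are left unmodified.
    """
    return [
        {**r, **{name: fn(r) for name, fn in derived.items()}}
        for r in records
    ]


# Illustrative (made-up) raw data, standing in for a dataset read inside
# a trusted research environment.
raw = [{"weight_kg": 81.0, "height_cm": 180.0}]
analysis_ready = with_derived(raw)
```

Because the derivation is recomputed from the raw data each time, changing the definition of a variable is a reviewed code change in the git history, and every analysis that uses it picks up the same definition.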