Biomarker discovery and predictive modelling
We perform machine learning (ML) analyses to help you reveal patterns in complex biomedical data (e.g. bulk or single-cell transcriptomics, imaging, questionnaires) and extract scientifically relevant information. Our approach emphasises data-processing quality, protection against overfitting and the optimistic bias it causes, robust validation (cross-validation, bootstrap) and biological interpretation.
What we offer
- Predictive modelling (regularised regression, Random Forest, gradient boosting with XGBoost, neural networks), with the method chosen according to the trade-off between performance and interpretability.
- Signature extraction (short gene lists that simplify biological interpretation).
- Variable selection (initial filtering of variables that vary little across samples; embedded methods such as glmnet or sPLS).
- Multi-cohort data harmonisation (normalisation, batch effect correction, clinical annotation harmonisation).
- Model fitting and validation with safeguards against overfitting (separate training and test datasets, cross-validation, bootstrapping).
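To illustrate why variable filtering and validation have to be combined carefully, here is a minimal sketch of k-fold cross-validation in which the variation filter is recomputed inside every training fold, so the held-out samples never influence which genes are kept. It is an illustration under stated assumptions, not our production pipeline: the data are synthetic, the function names (`top_variance_genes`, `cross_validate`) are hypothetical, and a simple nearest-centroid classifier stands in for the models listed above. It uses only the Python standard library to stay self-contained, and the folds are shuffled but not stratified.

```python
import random
import statistics


def top_variance_genes(rows, n_keep):
    """Indices of the n_keep columns (genes) with the largest variance."""
    n_genes = len(rows[0])
    variances = [statistics.pvariance([r[j] for r in rows]) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: variances[j], reverse=True)[:n_keep]


def centroid(rows, genes):
    """Mean expression profile of the given samples, restricted to `genes`."""
    return [statistics.fmean(r[j] for r in rows) for j in genes]


def predict(sample, genes, centroids):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((sample[j] - m) ** 2 for j, m in zip(genes, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_validate(X, y, k=5, n_keep=5, seed=0):
    """k-fold CV; the gene filter is refitted inside every training fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Filtering uses training samples only, to avoid information leakage.
        genes = top_variance_genes([X[i] for i in train], n_keep)
        centroids = {
            label: centroid([X[i] for i in train if y[i] == label], genes)
            for label in set(y[i] for i in train)
        }
        hits = sum(predict(X[i], genes, centroids) == y[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return accuracies


# Synthetic toy data (an assumption for illustration): 60 samples, 50 genes,
# with the first 5 genes shifted upwards in class 1.
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[rng.gauss(5.0 * label if j < 5 else 0.0, 1.0) for j in range(50)] for label in y]

fold_accuracies = cross_validate(X, y)
print("per-fold accuracy:", fold_accuracies)
print("mean accuracy:", statistics.fmean(fold_accuracies))
```

Selecting genes on the full dataset before splitting would let the held-out samples shape the model and typically inflates the reported performance, which is the kind of bias the procedures above are meant to prevent.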
What we need from you
- The question and the phenotype to be predicted, with a clear definition of the ‘gold standard’.
- The data (expression, metadata/annotations) and any sharing or access constraints.
- A scientific contact to validate the choices (metrics, interpretability vs. complexity trade-offs).
Example: predictive biomarkers with PAGEpy
PAGEpy (Predictive Analysis of Gene Expression in Python) is an open-source Python program for quickly testing whether a multi-layer neural network can predict a target variable from a gene expression dataset. It integrates a train/test split, variable gene selection, and optimisation of that selection by Particle Swarm Optimisation (PSO).
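For readers unfamiliar with PSO, the sketch below shows the general idea with a basic global-best particle swarm written from scratch. It is not PAGEpy's implementation or API: the function `particle_swarm_minimise`, its parameters and the toy objective are all hypothetical and chosen for illustration. In a gene-selection setting one would presumably replace the toy objective with something like the validation loss of a model trained on the gene subset encoded by each particle, but that mapping is an assumption here.

```python
import random


def particle_swarm_minimise(objective, dim, n_particles=20, n_iter=200,
                            bounds=(-5.0, 5.0), inertia=0.7,
                            cognitive=1.5, social=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_positions = [p[:] for p in positions]  # best point seen by each particle
    best_scores = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: best_scores[i])
    global_best, global_score = best_positions[g][:], best_scores[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull towards the particle's own
                # best point, and pull towards the swarm's best point.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (best_positions[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp to the search box.
                positions[i][d] = min(high, max(low, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < best_scores[i]:
                best_positions[i], best_scores[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def shifted_sphere(x):
    """Toy objective with its minimum (value 0) at x = (1, ..., 1)."""
    return sum((v - 1.0) ** 2 for v in x)


best_x, best_f = particle_swarm_minimise(shifted_sphere, dim=2)
print("best position:", best_x)
print("best value:", best_f)
```

Each particle keeps track of the best point it has visited and is also drawn towards the best point found by the whole swarm, which lets the search proceed without gradients. That property is what makes PSO usable when the objective is a noisy validation score rather than a differentiable function.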