- Statistical Analysis of Categorical Data (Master)
- Statistical Inference (Master)
- Master of Science in Knowledge Discovery in Databases, Nantes-Lyon, France.
- Engineering Degree in Business Intelligence, Polytech'Nantes, Nantes, France.
- Master of Science (Maîtrise universitaire) in Mathematics, Nantes, France.
- Machine Learning and Data Mining methodologies for Life Course analysis
- Swarm Intelligence
- Unbalanced Data
- Cognitive Psychology
- Health Sociology
Thesis project: A knowledge discovery and management framework for mining rare life course patterns
Increasingly used in social sciences during the past decades, longitudinal analysis has recently seen new tools
emerge, in particular in the field of sequence analysis. This work has shown that data mining tools such as
association rules, decision trees and self-organizing maps can be successfully applied to extract knowledge
about life trajectories. However, a database and software framework for handling life course data as a whole is
currently lacking. A first goal of this thesis project is therefore to provide a high-level tool for manipulating and
managing life course data. The software currently in development aims at (1) securing data through automatic
tests of data consistency and of the representativeness of the initial population, and at facilitating (2) the
manipulation of life courses, (3) the transmission of datasets, (4) interoperability between methods and
(5) interoperability between datasets. In this sense, the software aims to provide a rigorous and efficient framework
for what we could call "life course mining". Within this framework, we will then design two specific mining methods. The
first one will adapt the learning process of entropy-based decision trees to unbalanced data in which some
classes occur very rarely. This case arises in particular when studying situations of vulnerability
(poor health, low income, divorce, etc.), which fortunately are usually rare, or when working with person-period data. The
second one will extend the association rule method based on the intensity of implication measure to the
mining of multi-channel sequences. Special attention will be paid to the treatment of rule redundancy. Behind
all this work, two thematic goals in health sociology are pursued: (1) better detecting and understanding
how some people fall into poor health, how some of them succeed in leaving this state
and how some nearly vulnerable people manage to preserve good health, and (2) gaining more insight into how
the Cumulative Advantage/Disadvantage model manifests itself in health trajectories.
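To illustrate why entropy-based tree learning needs adapting when some classes are rare, here is a minimal Python sketch. It is not the method developed in the thesis: the function names and the class counts are hypothetical and chosen only for illustration. It computes the Shannon entropy of a balanced and of a strongly unbalanced class distribution, then the information gain of a split that isolates every rare case.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * shannon_entropy(child) for child in children)
    return shannon_entropy(parent) - weighted


# Hypothetical class counts: [common class, rare class].
balanced = [500, 500]
unbalanced = [990, 10]  # the rare class is 1% of the sample

h_balanced = shannon_entropy(balanced)
h_unbalanced = shannon_entropy(unbalanced)

# Hypothetical split: the second child node collects all 10 rare cases,
# together with 40 cases of the common class.
split = [[950, 0], [40, 10]]
gain = information_gain(unbalanced, split)

print(f"entropy, balanced data:   {h_balanced:.4f} bits")
print(f"entropy, unbalanced data: {h_unbalanced:.4f} bits")
print(f"gain of the split:        {gain:.4f} bits")
```

Because the entropy of the unbalanced root node is already close to zero, the gain of any split is bounded by a small value. Moreover, the child node that gathers all rare cases still contains a majority of common cases, so a leaf labelled by majority vote would never predict the rare class.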
Poster presented at the NCCR LIVES Site Visit of the Review Panel, November 12, 2012
The Dataset project: handling survey data in R
Designed especially for social scientists, the project aims to provide an efficient and secure way of handling cross-sectional and longitudinal survey data and preparing them for analysis.
The software is freely available on R-Forge; to install and use it, please follow this link.
- To get started with the package easily, you can request introductory material by writing to .
- You can ask other users for help by subscribing to the Dataset-users mailing list.
- Be notified of new releases by subscribing to the Dataset-updates mailing list.
- Feel free to request a feature or a bug fix by sending an email to .