Publications
| 2011 |
| Studer, M., Ritschard, G., Gabadinho, A. & Müller, N.S. (2011), "Discrepancy Analysis of State Sequences", Sociological Methods and Research. Vol. 40(3), pp. 471-510. |
| Abstract: In this article, the authors define a methodological framework for analyzing the relationship between state sequences and covariates. Inspired by the principles of analysis of variance, this approach looks at how the covariates explain the discrepancy of the sequences. The authors use the pairwise dissimilarities between sequences to determine the discrepancy, which makes it possible to develop a series of statistical significance-based analysis tools. They introduce generalized simple and multifactor discrepancy-based methods to test for differences between groups, a pseudo-R² for measuring the strength of sequence-covariate associations, a generalized Levene statistic for testing differences in the within-group discrepancies, as well as tools and plots for studying the evolution of the differences along the time frame and a regression tree method for discovering the most significant discriminant covariates and their interactions. In addition, the authors extend all methods to account for case weights. The scope of the proposed methodological framework is illustrated using a real-world sequence data set. |
BibTeX:
@article{StuderRitschardGabadinhoMuller2011SMR,
author = {Studer, Matthias and Gilbert Ritschard and Alexis Gabadinho and Nicolas S. Müller},
title = {Discrepancy Analysis of State Sequences},
journal = {Sociological Methods and Research},
year = {2011},
volume = {40},
number = {3},
pages = {471--510},
doi = {10.1177/0049124111415372}
}
|
| Studer, M. (2011), "Inégalités de genre au doctorat: Une analyse quantitative des trajectoires d'assistanat à l'Université de Genève.", In Work in Progress en Etudes genre |
BibTeX:
@conference{Studer2011,
author = {Matthias Studer},
title = {Inégalités de genre au doctorat: Une analyse quantitative des trajectoires d'assistanat à l'Université de Genève.},
booktitle = {Work in Progress en Etudes genre},
year = {2011}
}
|
| Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2011), "Searching for typical life trajectories applied to childbirth histories", In Levy, R. & Widmer, E. (eds) Between individualisation and standardisation - Life courses and their gendering in Switzerland and beyond. (book in preparation). |
BibTeX:
@incollection{GabadinhoRitschardStuderMuller2011typical,
author = {Gabadinho, Alexis and Gilbert Ritschard and Matthias Studer and Nicolas S. Müller},
title = {Searching for typical life trajectories applied to childbirth histories},
booktitle = {Between individualisation and standardisation - Life courses and their gendering in Switzerland and beyond},
editor = {Levy, René and Eric Widmer},
year = {2011},
note = {(book in preparation)}
}
|
| Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2011), "Extracting and Rendering Representative Sequences", In Fred, A., Dietz, J.L.G., Liu, K. & Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. Series: Communications in Computer and Information Science (CCIS). Volume 128, pp. 94-106. Springer-Verlag. |
| Abstract: This paper is concerned with the summarization of a set of categorical sequences. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that together have a given percentage of sequences in their neighbourhood. The proposed heuristic for extracting the representative subset requires as main arguments a pairwise distance matrix, a representativeness criterion and a distance threshold under which two sequences are considered as redundant or, equivalently, in the neighbourhood of each other. It first builds a list of candidates using a representativeness score and then eliminates redundancy. We also propose a visualization tool for rendering the results and quality measures for evaluating them. The proposed tools have been implemented in our TraMineR R package for mining and visualizing sequence data and we demonstrate their efficiency on a real-world example from social sciences. The methods are nonetheless in no way limited to social science data and should prove useful in many other domains. |
BibTeX:
@incollection{Gabadinho_et_al2011CCIS,
author = {Gabadinho, Alexis and Gilbert Ritschard and Matthias Studer and Nicolas S. Müller},
title = {Extracting and Rendering Representative Sequences},
booktitle = {Knowledge Discovery, Knowledge Engineering and Knowledge Management},
editor = {Fred, Ana and Jan L. G. Dietz and Kecheng Liu and Joaquim Filipe},
publisher = {Springer-Verlag},
year = {2011},
series = {Communications in Computer and Information Science (CCIS)},
volume = {128},
pages = {94--106},
doi = {10.1007/978-3-642-19032-2}
}
|
| Gabadinho, A., Ritschard, G., Müller, N.S. & Studer, M. (2011), "Analyzing and visualizing state sequences in R with TraMineR", Journal of Statistical Software. Vol. 40(4), pp. 1-37. |
BibTeX:
@article{GabadinhoRitschardMullerStuder2011JSS,
author = {Gabadinho, Alexis and Gilbert Ritschard and Nicolas S. Müller and Matthias Studer},
title = {Analyzing and visualizing state sequences in R with TraMineR},
journal = {Journal of Statistical Software},
year = {2011},
volume = {40},
number = {4},
pages = {1--37},
url = {http://www.jstatsoft.org/v40/i04/}
}
|
| 2010 |
| Studer, M., Ritschard, G., Gabadinho, A. & Müller, N.S. (2010), "Discrepancy analysis of complex objects using dissimilarities", In Guillet, F., Ritschard, G., Zighed, D.A. & Briand, H. (eds) Advances in Knowledge Discovery and Management. Series: Studies in Computational Intelligence. Volume 292, pp. 3-19. Berlin: Springer. |
| Abstract: In this article we consider objects for which we have a matrix of dissimilarities and we are interested in their links with covariates. We focus on state sequences for which pairwise dissimilarities are given for instance by edit distances. The methods discussed apply however to any kind of objects and measures of dissimilarities. We start with a generalization of the analysis of variance (ANOVA) to assess the link of complex objects (e.g. sequences) with a given categorical variable. The trick is to show that discrepancy among objects can be derived from the sole pairwise dissimilarities, which permits then to identify factors that most reduce this discrepancy. We present a general statistical test and introduce an original way of rendering the results for state sequences. We then generalize the method to the case with more than one factor and discuss its advantages and limitations especially regarding interpretation. Finally, we introduce a new tree method for analyzing discrepancy of complex objects that exploits the former test as splitting criterion. We demonstrate the scope of the methods presented through a study of the factors that most discriminate Swiss occupational trajectories. All methods presented are freely accessible in our TraMineR package for the R statistical environment. |
BibTeX:
@incollection{StuderRitschardGabadinhoMuller2010akdm,
author = {Studer, Matthias and Gilbert Ritschard and Alexis Gabadinho and Nicolas S. Müller},
title = {Discrepancy analysis of complex objects using dissimilarities},
booktitle = {Advances in Knowledge Discovery and Management},
editor = {Fabrice Guillet and Gilbert Ritschard and Djamel A. Zighed and Henri Briand},
publisher = {Springer},
year = {2010},
series = {Studies in Computational Intelligence},
volume = {292},
pages = {3--19},
address = {Berlin},
doi = {10.1007/978-3-642-00580-0\_1}
}
|
| Studer, M., Müller, N.S., Ritschard, G. & Gabadinho, A. (2010), "Classer, discriminer et visualiser des séquences d'événements", In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI. (in press). |
| Abstract: This article presents a set of tools to analyze event sequences in the social sciences and visualize the results. We begin by formalizing the notion of event sequence before defining a measure of dissimilarity between these sequences to cluster them and test the links between these sequences and other variables of interest. Initially defined by Moen (2000), this measure is based on the notion of edit distance between sequences and identifies the differences in sequencing and timing of events. We propose an extension of it in order to take into account the simultaneity of events and a normalization method that guarantees the respect of the triangle inequality. In a second step, we present a set of tools to interpret the results. We thus propose two methods of viewing a set of sequences and we introduce the concept of discriminant subsequence that identifies differences in sequencing that are the most significant between groups. All the tools presented are available in the TraMineR R library. |
BibTeX:
@article{StuderMullerRitschardGabadinho2010EGC,
author = {Studer, Matthias and Nicolas S. Müller and Gilbert Ritschard and Alexis Gabadinho},
title = {Classer, discriminer et visualiser des séquences d'événements},
booktitle = {Extraction et gestion des connaissances (EGC 2010)},
journal = {Revue des nouvelles technologies de l'information RNTI},
year = {2010},
note = {(in press)}
}
|
| Studer, M. (2010), "Analyse de dispersion des séquences: présentation des méthodes et application à l'étude des carrières académiques", In Colloque international sur les parcours sociaux entre nouvelles contraintes et affirmation du sujet, Le Mans, novembre 2010 |
BibTeX:
@conference{Studer2010LeMans,
author = {Studer, Matthias},
title = {Analyse de dispersion des séquences: présentation des méthodes et application à l'étude des carrières académiques},
booktitle = {Colloque international sur les parcours sociaux entre nouvelles contraintes et affirmation du sujet, Le Mans, novembre 2010},
year = {2010}
}
|
| Müller, N.S., Studer, M., Ritschard, G. & Gabadinho, A. (2010), "Extraction de règles d'association séquentielle à l'aide de modèles de durée", In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI. (in press). |
| Abstract: Association rule mining is a thriving research field in data mining. These methods can also be applied to sequential data. Two problems arise when one wants to apply association rule mining to sequential data. First, the main criterion used to extract sequential patterns is their frequency. However, two events might be strongly associated even if they do not happen frequently. Second, association rule measures do not take into account the temporal aspect of sequential data, such as the importance of the duration between two events or the problem of censored observations. In this article, we propose a method to extract significant associations between events using duration models. Association rules are extracted from each sequential pattern observed in a set of sequences. Then, the influence on the risk that the "conclusion" event occurs after the "premise" event(s) is estimated using a proportional hazard semi-parametric duration model. This paper presents the method and a comparison with some classical association measures. |
BibTeX:
@article{MullerStuderRitschardGabadinho2010EGC,
author = {Müller, Nicolas S. and Matthias Studer and Gilbert Ritschard and Alexis Gabadinho},
title = {Extraction de règles d'association séquentielle à l'aide de modèles de durée},
booktitle = {Extraction et gestion des connaissances (EGC 2010)},
journal = {Revue des nouvelles technologies de l'information RNTI},
year = {2010},
note = {(in press)}
}
|
| Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2010), "Indice de complexité pour le tri et la comparaison de séquences catégorielles", In Extraction et gestion des connaissances (EGC 2010), Revue des nouvelles technologies de l'information RNTI. (in press). |
| Abstract: This paper introduces a complexity index for categorical state sequences. Though the index is more specifically intended for measuring the complexity of sequences describing biographical trajectories in social sciences, it applies to all kinds of ordered lists of states. The measure accounts for two distinct aspects of complexity: the complexity of the sequencing of the states, captured by the number of transitions, and the diversity of states in the sequence, measured with Shannon's entropy. |
BibTeX:
@article{GabadinhoRitschardStuderMuller2010EGC,
author = {Gabadinho, Alexis and Gilbert Ritschard and Matthias Studer and Müller, Nicolas S.},
title = {Indice de complexité pour le tri et la comparaison de séquences catégorielles},
booktitle = {Extraction et gestion des connaissances (EGC 2010)},
journal = {Revue des nouvelles technologies de l'information RNTI},
year = {2010},
note = {(in press)}
}
|
| 2009 |
| Studer, M., Ritschard, G., Gabadinho, A. & Müller, N.S. (2009), "Analyse de dissimilarités par arbre d'induction", In Extraction et gestion des connaissances (EGC 2009), Revue des nouvelles technologies de l'information RNTI. Vol. E-15, pp. 7-18. |
| Abstract: In this article we consider objects for which we have a matrix of dissimilarities and we are interested in their links with attributes. We focus on state sequences for which dissimilarities are given for instance by edit distances. The methods discussed apply however to any kind of objects and measures of dissimilarities. We start with a generalization of the analysis of variance (ANOVA) to assess the link of non measurable objects (e.g. sequences) with a given categorical variable. The trick is to show that variability among objects can be derived from the sole dissimilarities, which permits then to identify factors that most reduce this variability. We infer a general statistical test and introduce an original way of rendering the results for state sequences. We then generalize the method to the case with more than one factor and discuss its benefits and limitations especially regarding interpretation. Finally, we introduce a new tree method for general objects that exploits the former test based on dissimilarity measures as splitting criterion. We demonstrate the scope of the various methods presented through a study of the factors that most discriminate occupational trajectories. |
BibTeX:
@article{StuderRitschardGabadinhoMuller2009EGC,
author = {Studer, Matthias and Gilbert Ritschard and Alexis Gabadinho and Nicolas S. Müller},
title = {Analyse de dissimilarités par arbre d'induction},
booktitle = {Extraction et gestion des connaissances (EGC 2009)},
journal = {Revue des nouvelles technologies de l'information RNTI},
year = {2009},
volume = {E-15},
pages = {7--18}
}
|
| Ritschard, G., Studer, M. & Oris, M. (2009), "Analyse statistique implicative des transitions professionnelles dans la Genève du 19e siècle", Revue des Nouvelles Technologies de l'Information (RNTI). (forthcoming). |
| Abstract: This article revisits the implicative statistical analysis of socio-professional dynamics in Geneva during the first half of the 19th century that we presented at the ASI4 meeting (Oris and Ritschard, 2007) and compares it with a supervised analysis of the dissimilarities between transitions. The data result from the pairwise matching of 6 censuses. More precisely, we consider the socio-professional group (GSP) of the selected individuals and its change between two successive censuses. We are interested in the types of transition (stable, becoming active, ceasing activity, ...) as well as in newcomers (immigrants and births) and those who disappear (emigrants and deceased). Implicative statistical analysis gives a synthetic view of the links between these dynamics and the GSPs concerned, as well as with a number of demographic and cultural variables (sex, age, marital status, religion). In particular, it highlights polarizations around key variables. The dissimilarity analysis, for its part, makes it possible to segment the population into homogeneous groups according to demographic and cultural characteristics. Using the implication intensity to identify the typical transitions of the groups thus obtained proves a valuable aid to interpretation and provides the elements needed for comparison with the results of the implicative graph. |
BibTeX:
@article{RitschardStuderOris2009rnti,
author = {Ritschard, Gilbert and Matthias Studer and Michel Oris},
title = {Analyse statistique implicative des transitions professionnelles dans la Genève du 19e siècle},
journal = {Revue des Nouvelles Technologies de l'Information (RNTI)},
year = {2009},
note = {(forthcoming)}
}
|
| Ritschard, G., Gabadinho, A., Studer, M. & Müller, N.S. (2009), "Converting between various sequence representations", In Ras, Z. & Dardzinska, A. (eds) Advances in Data Management. Series: Studies in Computational Intelligence. Springer. (forthcoming). |
| Abstract: This chapter is concerned with the organization of categorical sequence data. We first build a typology of sequences, distinguishing for example between chronological sequences and sequences without time content. This makes it possible to identify the kind of information that the data organization should preserve. Focusing then mainly on chronological sequences, we discuss the advantages and limits of different ways of representing time-stamped event and state sequence data and present solutions for automatically converting between various formats, e.g., between horizontal and vertical presentations but also from state sequences into event sequences and vice versa. Special attention is also given to the handling of missing values in these conversion processes. |
BibTeX:
@incollection{RitschardGabadinhoStuderMuller2009DManag,
author = {Ritschard, Gilbert and Alexis Gabadinho and Matthias Studer and Nicolas S. Müller},
title = {Converting between various sequence representations},
booktitle = {Advances in Data Management},
editor = {Ras, Zbigniew and Agnieszka Dardzinska},
publisher = {Springer},
year = {2009},
series = {Studies in Computational Intelligence},
note = {(forthcoming)}
}
|
| Ritschard, G., Gabadinho, A., Müller, N.S. & Studer, M. (2009), "Mining event histories: A social science perspective", International Journal of Data Mining, Modelling and Management. Vol. 1(1), pp. 68-90. |
| Abstract: We explore how recent data mining-based tools developed in domains such as biomedicine or text mining for extracting interesting knowledge from sequence data could be applied to personal life course data. We focus on two types of approaches: survival trees that attempt to partition the data into homogeneous groups regarding their survival characteristics, i.e., the duration until a given event occurs and the mining of typical discriminating episodes. We show how these approaches may fruitfully complement the outcome of more classical event history analyses and single out some specific issues raised by their application to socio-demographic data. |
BibTeX:
@article{RitschardGabadinhoMullerStuder2009IJDMMM,
author = {Ritschard, Gilbert and Alexis Gabadinho and Nicolas S. Müller and Matthias Studer},
title = {Mining event histories: A social science perspective},
journal = {International Journal of Data Mining, Modelling and Management},
year = {2009},
volume = {1},
number = {1},
pages = {68--90},
doi = {10.1504/IJDMMM.2008.022538}
}
|
| Ritschard, G., Gabadinho, A., Müller, N.S. & Studer, M. (2009), "Données séquentielles: Concepts, principes d'analyse et pratique", Tutoriel I + II, EGC 09, Strasbourg, janvier 2009. |
BibTeX:
@misc{tutorielEGC2009,
author = {Ritschard, Gilbert and Alexis Gabadinho and Nicolas S. Müller and Matthias Studer},
title = {Données séquentielles: Concepts, principes d'analyse et pratique},
howpublished = {Tutoriel I + II, EGC 09, Strasbourg, janvier 2009},
year = {2009},
url = {http://mephisto.unige.ch/biomining/EGC_tutoriel_donnees_sequentielles.html}
}
|
| Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2009), "Summarizing Sets of Categorical Sequences", In International Conference on Knowledge Discovery and Information Retrieval, Madeira, 6-8 October, 2009. INSTICC. (forthcoming). |
| Abstract: This paper is concerned with the summarization of a set of categorical sequence data. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that together have a given percentage of sequences in their neighborhood. The goal is to yield a representative set that exhibits the key features of the whole sequence data set and permits easy, sound interpretation. We propose a heuristic for determining the representative set that first builds a list of candidates using a representativeness score and then eliminates redundancy. We also propose a visualization tool for rendering the results and quality measures for evaluating them. The proposed tools have been implemented in TraMineR, our R package for mining and visualizing sequence data, and we demonstrate their efficiency on a real-world example from social sciences. The methods are nonetheless in no way limited to social science data and should prove useful in many other domains. |
BibTeX:
@incollection{Gabadinho2009,
author = {Gabadinho, Alexis and Gilbert Ritschard and Matthias Studer and Nicolas S. Müller},
title = {Summarizing Sets of Categorical Sequences},
booktitle = {International Conference on Knowledge Discovery and Information Retrieval, Madeira, 6-8 October, 2009},
publisher = {INSTICC},
year = {2009},
note = {(forthcoming)}
}
|
| Gabadinho, A., Ritschard, G., Studer, M. & Müller, N.S. (2009), "Mining Sequence Data in R with TraMineR: A User's Guide for version 1.1". Department of Econometrics and Laboratory of Demography, University of Geneva, Geneva, 2009. |
| Abstract: Full User Guide for TraMineR v. 1.1 |
BibTeX:
@techreport{Gabadinho_et_al_2009TraMineR-UGuide,
author = {Gabadinho, Alexis and Ritschard, Gilbert and Studer, Matthias and Müller, Nicolas S.},
title = {Mining Sequence Data in R with TraMineR: A User's Guide for version 1.1},
year = {2009},
institution = {Department of Econometrics and Laboratory of Demography, University of Geneva},
address = {Geneva},
note = {(TraMineR is on CRAN, the Comprehensive R Archive Network)}
}
|
| Gabadinho, A., Müller, N.S., Ritschard, G. & Studer, M. (2009), "TraMineR: une librairie R pour l'analyse de données séquentielles", In Extraction et gestion des connaissances (EGC 2009), Revue des nouvelles technologies de l'information RNTI. Vol. E-15, pp. 7-18. |
| Abstract: Short presentation of TraMineR |
BibTeX:
@article{GabadinhoMullerRitschardStuder2009EGC,
author = {Gabadinho, Alexis and Nicolas S. Müller and Gilbert Ritschard and Studer, Matthias},
title = {TraMineR: une librairie R pour l'analyse de données séquentielles},
booktitle = {Extraction et gestion des connaissances (EGC 2009)},
journal = {Revue des nouvelles technologies de l'information RNTI},
year = {2009},
volume = {E-15},
pages = {7--18}
}
|
| 2008 |
| Studer, M., Gabadinho, A., Müller, N.S. & Ritschard, G. (2008), "Approches de type n-grammes pour l'analyse de parcours de vie familiaux", In Extraction et gestion des connaissances (EGC 2008), Revue des nouvelles technologies de l'information RNTI. Vol. E-11, II, pp. 511-522. Cépaduès. |
| Abstract: This article deals with the analysis of life courses represented as event sequences. More specifically, we examine the possibilities of exploiting n-gram-type codings of these sequences to extract knowledge from them. In fact, given the simultaneity of some events, a strict n-gram procedure, such as can be applied to texts, is not applicable here. We discuss various alternatives that ultimately turn out to be closer to frequent sequence mining. The concepts discussed are illustrated on data from the retrospective biographical survey carried out by the Swiss Household Panel in 2002. Finally, we specify in which respects the proposed approach can usefully complement other, more classical techniques for the exploratory analysis of life courses. |
BibTeX:
@article{StuderGabadinhoMullerRitschard2008EGC,
author = {Studer, Matthias and Alexis Gabadinho and Nicolas S. Müller and Gilbert Ritschard},
title = {Approches de type n-grammes pour l'analyse de parcours de vie familiaux},
booktitle = {Extraction et gestion des connaissances (EGC 2008)},
journal = {Revue des nouvelles technologies de l'information RNTI},
publisher = {Cépaduès},
year = {2008},
volume = {E-11, II},
pages = {511--522}
}
|
| Ritschard, G., Studer, M. & Pisetta, V. (2008), "Strategies in Identifying Issues Addressed in Legal Reports", In Brito, P. (ed.) COMPSTAT 2008 - Proceedings in Computational Statistics, pp. 277-288. Berlin: Springer. |
| Abstract: This paper deals with the automatic retrieval of issues reported in legal texts and presents an experience with experts' reports on the application of ILO Conventions. The aim is to provide the end user, i.e. the legal expert, with a set of rules that permits her/him to find among a predefined list of issues those addressed by any new text. Since the end user is not supposed to be able to pre-process the text, we need rules that can be directly applied on raw texts. We present the strategy followed for generating the rules in this ILO legal setting and single out a few possible changes that should significantly improve the performance of the retrieval process. Our approach consists in characterizing in a first stage a list of descriptor concepts, which are then used to get a quantitative representation of the texts. In the learning phase, using a sample of texts labeled by legal experts with the issues they actually address, we build the rules by means of induced decision trees. |
BibTeX:
@incollection{RitschardStuderPisetta2008compstat,
author = {Ritschard, Gilbert and Matthias Studer and Vincent Pisetta},
title = {Strategies in Identifying Issues Addressed in Legal Reports},
booktitle = {COMPSTAT 2008 - Proceedings in Computational Statistics},
editor = {Paula Brito},
publisher = {Springer},
year = {2008},
pages = {277--288},
address = {Berlin},
doi = {10.1007/978-3-7908-2084-3_23}
}
|
| Müller, N.S., Lespinats, S., Ritschard, G., Studer, M. & Gabadinho, A. (2008), "Visualisation et classification des parcours de vie", In Extraction et gestion des connaissances (EGC 2008), Revue des nouvelles technologies de l'information RNTI. Vol. E-11, II, pp. 499-510. Cépaduès. |
| Abstract: This article proposes a methodology for the visualization and classification of life courses. More specifically, we consider the life courses of Swiss individuals born during the first half of the 20th century, using data from the retrospective biographical survey conducted in 2002 by the Swiss Household Panel. We focus on the following life course events: leaving the parental home, the birth of the first child, the first marriage and the first divorce. Starting from the raw data on these events, we discuss their transformation into state sequences. We then present our methodology for extracting knowledge from life courses. This methodology relies on distances computed by an optimal matching algorithm. These distances are then used to classify the life courses and to visualize them with multidimensional scaling techniques. The article pays particular attention to the issues surrounding the application of these methods to life course data. |
BibTeX:
@article{MullerLespinatsRitschardStuderGabadinho2008EGC,
author = {Müller, Nicolas S. and Sylvain Lespinats and Gilbert Ritschard and Matthias Studer and Alexis Gabadinho},
title = {Visualisation et classification des parcours de vie},
booktitle = {Extraction et gestion des connaissances (EGC 2008)},
journal = {Revue des nouvelles technologies de l'information RNTI},
publisher = {Cépaduès},
year = {2008},
volume = {E-11, II},
pages = {499--510}
}
|
| Müller, N.S., Gabadinho, A., Ritschard, G. & Studer, M. (2008), "Extracting knowledge from life courses: Clustering and visualization", In Data Warehousing and Knowledge Discovery, 10th International Conference, DAWAK 2008, Turin, Italy, September 2-5. Berlin Heidelberg. Volume LNCS 5182, pp. 176-185. Springer. |
| Abstract: This article presents some of the facilities offered by our TraMineR R-package for clustering and visualizing sequence data. Firstly, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, we may use it as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. Then we present three kinds of plots for visualizing the characteristics of the obtained clusters: an aggregated plot depicting the average sequential behavior of cluster members; a sequence index plot that shows the diversity inside clusters; and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analysing sequences representing life courses and our presentation is illustrated on such a real-world data set. The material presented should also be of interest for other kinds of sequential data such as DNA analysis or web logs. |
BibTeX:
@inproceedings{MullerGabadinhoRitschardStuder2008DaWaK,
author = {Müller, Nicolas S. and Alexis Gabadinho and Gilbert Ritschard and Matthias Studer},
title = {Extracting knowledge from life courses: Clustering and visualization},
booktitle = {Data Warehousing and Knowledge Discovery, 10th International Conference, DAWAK 2008, Turin, Italy, September 2-5},
publisher = {Springer},
year = {2008},
volume = {LNCS 5182},
pages = {176--185},
address = {Berlin Heidelberg},
doi = {10.1007/978-3-540-85836-2\_17}
}
|
| 2007 |
| Studer, M., Ritschard, G., Baccaro, L., Müller, N.S. & Zighed, D.A. (2007), "Relations entre types de violation des libertés syndicales garanties par les conventions de l'OIT: une analyse de statistique implicative des résultats d'une fouille de texte", In 4èmes Rencontres Internationales Analyse Statistique Implicative (ASI4), Castellón de la Plana (España), 18-21 octubre 2007, pp. 111-122. |
| Abstract: Through an analysis of observed violations of ILO conventions, this paper aims to show how implicative analysis advantageously complements more classical exploratory analyses. More precisely, we are interested in the types of violations noted by the experts in charge of monitoring compliance with ILO Conventions No. 87 and No. 98 on trade union rights. The data are predictions obtained through learning based on text mining. We essentially compare three methods: implicative statistical analysis, correspondence analysis and the automatic clustering of individuals. We discuss the contributions of each of these methods. |
BibTeX:
@inproceedings{StuderRitschardBaccaroMullerZighed2007asi,
author = {Studer, Matthias and Gilbert Ritschard and Lucio Baccaro and Nicolas S. Müller and Djamel A. Zighed},
title = {Relations entre types de violation des libertés syndicales garanties par les conventions de l'OIT: une analyse de statistique implicative des résultats d'une fouille de texte},
booktitle = {4èmes Rencontres Internationales Analyse Statistique Implicative (ASI4), Castellón de la Plana (España), 18-21 octubre 2007},
year = {2007},
pages = {111-122}
}
|
| Studer, M., Müller, N.S. & Ritschard, G. (2007), "Understanding the KDE Social Structure through Mining of Email Archive", In 2nd Workshop on Public Data about Software Development (WoPDaSD 2007), Third International Conference on Open Source Systems (OSS), June 11-14, 2007, Limerick, Ireland |
BibTeX:
@conference{StuderMullerRitschard2007WoPDaSD,
author = {Studer, Matthias and Nicolas S. Müller and Gilbert Ritschard},
title = {Understanding the KDE Social Structure through Mining of Email Archive},
booktitle = {2nd Workshop on Public Data about Software Development (WoPDaSD 2007), Third International Conference on Open Source Systems (OSS), June 11-14, 2007, Limerick, Ireland},
year = {2007}
}
|
| Studer, M. (2007), "Community Structure, Individual Participation and the Social Construction of Merit", In The Third International Conference on Open Source Systems. Springer-Verlag. |
BibTeX:
@inproceedings{Studer2007,
author = {Matthias Studer},
title = {Community Structure, Individual Participation and the Social Construction of Merit},
booktitle = {The Third International Conference on Open Source Systems},
publisher = {Springer-Verlag},
year = {2007}
}
|
| Ritschard, G., Zighed, D.A., Baccaro, L., Georgiou, I., Pisetta, V. & Studer, M. (2007), "Mining Expert Comments on the Application of ILO Conventions on Freedom of Association and Collective Bargaining". Working Papers, 2007.02. Department of Econometrics of the University of Geneva, July, 2007. |
| Abstract: This paper explains how text mining was used within the context of a research project on social dialogue regimes, jointly undertaken by the University of Geneva, the University of Lyon 2 and the International Institute of Labour Studies of the International Labour Organisation (ILO). The research project, which was made possible through the generous support of the Geneva International Academic Network Foundation (GIAN), is seeking to provide a better understanding of the structural determinants (e.g., economic, social, cultural and institutional), as well as the socio-economic outcomes of social dialogue regimes. An important part of the research is based on the analysis of the reports of the Committee of Experts on the Application of Conventions and Recommendations (CEACR). The CEACR is one of the main bodies within the ILO supervisory system. It is responsible for supervising the application by member States of ILO Conventions and Recommendations. Its observations concerning the progress made by member States in the implementation of ratified ILO Conventions are published in its reports. Text mining was used for the purpose of extracting useful information from these reports in a semi-automatic way. This paper discusses the text mining approach that was followed, the different steps of the mining process and presents a synthetic analysis of the results obtained. |
BibTeX:
@techreport{Ritschard_et_al2007,
author = {Ritschard, Gilbert and Djamel A. Zighed and Lucio Baccaro and Irini Georgiou and Vincent Pisetta and Matthias Studer},
title = {Mining Expert Comments on the Application of ILO Conventions on Freedom of Association and Collective Bargaining},
year = {2007},
number = {2007.02},
pages = {46},
type = {Working Papers},
institution = {Department of Econometrics of the University of Geneva}
}
|
| Ritschard, G., Studer, M., Muller, N. & Gabadinho, A. (2007), "Comparing and classifying personal life courses: From time to event methods to sequence analysis", In 2nd Symposium of COST Action C34 (Gender and Well-Being). The Transmission of Well-Being: Marriage Strategies and Inheritance Systems in Europe from 17th-20th Centuries. University of Minho, Guimaraes, Portugal, April 25-28, 2007 |
| Abstract: This paper is mainly methodological. It is concerned with the different ways we may analyse personal life course data. Personal life courses are defined by a succession of events regarding living arrangement, familial life, education, professional career, health, etc. We may focus on one of these events - leaving home, marriage, first job, divorce, becoming disabled - and examine how the hazard of experiencing it evolves with time and may be affected by other factors or events. Alternatively, we may be interested, in a more holistic way, in analysing and comparing whole sequences. The paper surveys the main available methods and classifies them into a typology that distinguishes between the nature of data - time stamped events or sequences - they need and the kind of questions - descriptive or causal - they address. Both classical statistical methods and promising but less known data-mining-based approaches are discussed. The aim of the paper is to put these approaches into perspective by focusing on their specificity and the complementary views they bring on life courses. Three illustrations using data from the Swiss Household Panel will show the kind of results we may expect from some of the less known methods: The first concerns sex differences in working status mobility, the second is a survival tree analysis of the risk of divorce, while the third focuses on sex differences in the sequencing of a selection of young adults' life events. |
BibTeX:
@conference{Ritschard2007cost34,
author = {Ritschard, Gilbert and Matthias Studer and Nicolas Muller and Alexis Gabadinho},
title = {Comparing and classifying personal life courses: From time to event methods to sequence analysis},
booktitle = {2nd Symposium of COST Action C34 (Gender and Well-Being). The Transmission of Well-Being: Marriage Strategies and Inheritance Systems in Europe from 17th-20th Centuries. University of Minho, Guimaraes, Portugal, April 25-28, 2007},
year = {2007}
}
|
| Ritschard, G., Gabadinho, A., Müller, N. & Studer, M. (2007), "Innovative Data Mining Based Approaches for Life Course Analysis", In 4th International Conference of Panel Data Users in Switzerland (IPUC07), Neuchâtel, February 23-24, 2007, pp. 17. |
BibTeX:
@conference{RitschardGabadinhoMullerStuder2007ipuc,
author = {Ritschard, Gilbert and Alexis Gabadinho and Nicolas Müller and Matthias Studer},
title = {Innovative Data Mining Based Approaches for Life Course Analysis},
booktitle = {4th International Conference of Panel Data Users in Switzerland (IPUC07), Neuchâtel, February 23-24, 2007},
year = {2007},
pages = {17}
}
|
| Müller, N.S., Studer, M. & Ritschard, G. (2007), "Classification de parcours de vie à l'aide de l'optimal matching", In XIVe Rencontre de la Société francophone de classification (SFC 2007), Paris, 5 - 7 septembre 2007, pp. 157-160. |
| Abstract: This work analyses family life courses by treating them as sequences. The aim is to capture the temporal character of life courses by taking into account the duration between the events that make up a course as well as the order in which they occur. We propose to apply a sequence analysis method to data from the retrospective biographical survey of the Swiss Household Panel in order to obtain a typology of 20th-century life courses. This typology will then allow us to better grasp the changes that may have occurred in their structure. |
BibTeX:
@conference{MullerStuderRitschard2007,
author = {Müller, Nicolas S. and Matthias Studer and Gilbert Ritschard},
title = {Classification de parcours de vie à l'aide de l'optimal matching},
booktitle = {XIVe Rencontre de la Société francophone de classification (SFC 2007), Paris, 5 - 7 septembre 2007},
year = {2007},
pages = {157-160}
}
|
| 2005 |
| Studer, M. (2005), "Pratique photographique et jugement esthétique", In Ducret, A. & Schultheis, F. (eds) Pierre Bourdieu en Algérie : Un photographe de circonstance. Association des étudiants en Sociologie. |
BibTeX:
@incollection{Studer2005,
author = {Matthias Studer},
title = {Pratique photographique et jugement esthétique},
booktitle = {Pierre Bourdieu en Algérie : Un photographe de circonstance},
editor = {Ducret, André and Schultheis, Franz},
publisher = {Association des étudiants en Sociologie},
year = {2005}
}
|
| 2004 |
| Studer, M. (2004), "Gift and Free Software", The Commoner |
BibTeX:
@article{Studer2004,
author = {Matthias Studer},
title = {Gift and Free Software},
journal = {The Commoner},
year = {2004}
}
|
