Thomas Hills
New Methods in Text Analysis

Thomas Hills is a Professor of Psychology at the University of Warwick. He teaches courses in quantitative approaches to behavioral science, language, and computational social science. His publications include work in psychology, communications, education, and economics, and focus on the large-scale analysis of language, memory, and wellbeing. He is currently the Director of the Bridges Doctoral Training Centre in Mathematical and Social Sciences and the Co-Director of the Behavioural Science Global Research Priority at the University of Warwick, both of which aim to provide and develop quantitative approaches to data in the social sciences. He has received a Mid-Career Fellowship from the British Academy and a merit award from the Royal Society, and is currently a fellow of the Alan Turing Institute.

Workshop contents and objectives

The aim of this workshop is to provide participants with an understanding of how new methods in data science are being applied to large text-based data sets (language corpora and social media). This will include new methods for collecting social media data, methods for quantifying changes in text at the word and document level, and approaches to predicting behavior from these sources. These methods have been used to predict consumer views of brands or political leaders, to detect regional and historical changes in happiness, and to predict personality from language. The workshop will use case studies that draw psychological inferences from natural language across domains such as culturomics, memory, and wellbeing, in each case using millions of words of text derived from multiple sources. Students will learn the basic approaches for collecting data and quantifying text, from single words to structure at the document level.

The course will begin by providing participants with a broad overview of data science and big data applications to existing problems in text and natural language processing. Specific cases will then be taken up for a more detailed analysis of their methodological approach, and participants will work with data to replicate existing findings and investigate novel hypotheses. Finally, participants will receive guidance in developing and answering research questions of their own.

On completion of the course, participants will be able to recognize and implement many common approaches to text analysis and take the first steps towards formulating and addressing problems of their own as data scientists. Participants will also be provided with detailed information about how to follow up and learn more in their particular area of interest.

Bibliography: Preparatory readings


Participants taking this course should be familiar with basic statistical ideas and have some experience with computer programming. The course will primarily use R, but I will provide all the code. Please bring your laptop with you.