In the project's kitchen

– or behind the scenes of Big Data

Béatrice Joyeux-Prunel & Nicola Carboni

Once the strategy and its purpose are well defined, how do you go about it?

Contemporary research lives on what are called calls for projects, issued by major funding agencies, both European and national.

Our project on globalisation through images first caught the interest of Europe: the Erasmus+ agency supported three years of teaching, research and creation on the images that made Europe at the École normale supérieure in Paris, from autumn 2019 to autumn 2022. More recently, Visual Contagions won over the Swiss National Science Foundation (SNSF), which, together with the University of Geneva, is the project's main funder.

Visual Contagions is a team of a dozen people, regularly reinforced by wonderful students from the University of Geneva, the Beaux-Arts de Paris, the Haute École d'Art et de Design de Genève and the École normale supérieure in Paris.

If we add the contributors to the Artl@s project, the Visual Contagions hive of specialists, scholars and students reaches about a hundred people.

--

The team has been busy building a corpus of millions of images, setting up robust infrastructures, establishing relevant methodologies and processing chains, and interpreting the results of the research, which must constantly be compared with other scales of analysis and other types of historical sources. The project is so large that the team will continue working on it for several more years.

The ideology of the digital and the aesthetics of the laboratory encourage us to present a scientific research project in its cleanest form. But a methodology only looks like a simple recipe in hindsight.

Yet it is from mistakes that one learns the most, about one's research object as much as about one's tools and their effects.

We have therefore decided to present our difficulties, trials and failures, rather than the immaculate, tidy workbench that decorum would call for. Let's go through the dirtiest corners of the Visual Contagions kitchen.

The protocols of scientific research are deceptive. In practice, what reigns is bias, the incompleteness of the corpus, and the difficulty of describing everything and of reporting on what is actually happening.

Ingredients

In the age of Big Data and data science, it is clear that no data is transparent and that nothing is ever fully available. It is not even certain that the humanities are more affected by the lack of clean and exhaustive data than their sisters, the so-called "hard" sciences.

Between theory and reality, building a corpus can be much more difficult than expected. The team's first task was to gather, from digital libraries and archives, as many scans as possible of illustrated periodicals published since the 1880s, together with images of works of art created over the same period, on as global a scale as possible. After a year's work, the collection team had gathered over 603,966 unique items, ranging from a weekly periodical spanning fifty years to a single issue of a magazine. Just over 2,700 periodical titles were collected, published in 2,400 cities across 120 countries. As far as exhibition catalogues are concerned, the Artl@s team has so far recorded more than 5,500 exhibitions and collected more than 3,000 catalogues and, among those that could be encoded in the database, about 120,000 works exhibited between the 1860s and the 1970s.

--

Geographical distribution of the sources collected by Visual Contagions in April 2022. Total number of countries: 121.

It is not much, and yet it is huge: far more than any individual researcher could ever gather.

As for the images, some of the harvested resources were not available in a format that would spare us from storing them ourselves. When images are served according to the IIIF standard (International Image Interoperability Framework), they can be searched and analysed without being duplicated locally. This was not (yet) the case for all of our sources, so we had to process large quantities of data on our own servers. The team therefore had to convert these images to the IIIF standard, which meant ingesting more than 15,000 documents in JPEG and PDF format into the University of Geneva's infrastructure: nearly 170 gigabytes of visual data.
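To make the IIIF point concrete: under the IIIF Image API (version 3.0 is assumed here), every image is addressed by a URL that encodes the region, size, rotation, quality and format requested, so a researcher can fetch a detail or a thumbnail directly from the holding library instead of copying the full scan. The minimal Python sketch below builds such request URLs following the public specification's URI pattern. The server address and image identifier in the usage lines are hypothetical, and this is an illustration of the standard, not the Visual Contagions processing chain.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API 3.0 request URL.

    URI pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier must be URI-encoded, including any "/" it contains.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image's info.json, which describes its dimensions and tiles."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    # Hypothetical server and identifier, for illustration only.
    base = "https://iiif.example.org/iiif/3"
    page = "periodical/1925/p12"
    # Ask the remote server for a 1000x800 px crop, scaled to 500 px wide:
    # only this derivative travels over the network, the master stays put.
    print(iiif_image_url(base, page, region="0,0,1000,800", size="500,"))
    print(iiif_info_url(base, page))
```

The design choice that matters for a project of this scale is that cropping and resizing happen on the image server: analyses can request only the regions they need, which is exactly what becomes impossible when sources are not IIIF-compliant and must first be copied and converted locally.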

Nothing is possible without infrastructure, whether to study images or to circulate them. It is understandable, then, that no panoramic study of globalisation through images has yet been attempted.