Data Analysis.
From thousands of pages of illustrated printed material to the study of images in circulation.


Since January, the team has been collecting as large a corpus as possible of illustrated print items available in digital form. Thanks to the work of Céline Bélina, Thomas Gauffroy-Naudin and Barbara Topalov, the corpus now amounts to about 40,000 items (journals, magazines, complete collections, or single posters) from 49 countries. Most of the retrieved sources are IIIF-compliant, and we are working to convert the non-interoperable images to IIIF.

Our next step is to identify the images that have been reproduced the most. Thanks to Robin Champenois, we have developed a platform, VisualContagions/explore, which allows us to process our corpus. Starting from the IIIF URLs that give access to an image and its metadata, we can:

1.     Retrieve the images in the documents: all the pages concerned are isolated, without losing the metadata already associated with the medium (i.e. the date, the place of publication, and the title of the source, if applicable).

2.     Separate each image from its support (this is called segmentation).

3.     Characterize each image by a vector computed by a comparison algorithm.

4.     Compare these vectors to find similar images and group them into clusters.
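As an illustration of steps 3 and 4, here is a minimal sketch of how vectors can be compared and grouped. It does not reproduce the algorithm used by the Explore platform, which is not described here: cosine similarity, the 0.95 threshold, the function names and the toy vectors are all assumptions chosen for the example.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_by_similarity(vectors, threshold=0.95):
    """Group image ids whose vectors are linked by similarity >= threshold.

    `vectors` maps an image id to its descriptor. Clusters are the connected
    components of the "is similar to" graph, found with union-find.
    """
    parent = {image_id: image_id for image_id in vectors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if cosine_similarity(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for image_id in vectors:
        clusters.setdefault(find(image_id), []).append(image_id)
    # Largest clusters first, then alphabetical order for stable output.
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical toy descriptors; real ones would have many more dimensions.
toy_vectors = {
    "img_a": [1.0, 0.0, 0.1],
    "img_b": [0.9, 0.0, 0.1],
    "img_c": [0.0, 1.0, 0.0],
}
clusters = cluster_by_similarity(toy_vectors, threshold=0.95)
```

Comparing every pair of images is quadratic in the number of images, so this sketch only suits small test sets; a corpus of tens of thousands of items would call for an approximate nearest-neighbour index instead.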

Thanks to a group of 10 students from the "Cours transversal sur le numérique", with the assistance of Dr. Anna Scius-Bertrand, initial tests were carried out during the Spring Semester 2021 on a limited corpus of images already available in IIIF and published between 1920 and 1939. See the CTN2 report. The results of the CTN2 work were presented on May 27 and are available for replay.

In parallel, the team deployed several Jupyter notebooks for analyzing the resulting clusters: classification of the clusters by their number of images, by the number of unique cities represented, and by the date of appearance of the oldest image (developments carried out by Cédric Viaccoz, Anna Scius-Bertrand, and Béatrice Joyeux-Prunel).
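The three criteria mentioned above can be sketched in a few lines of Python. This is not the code of the project's notebooks: the record layout, the cluster ids and the sample values are hypothetical, and the criteria are read here as simple sort keys.

```python
from datetime import date

# Hypothetical records: (cluster id, city of publication, publication date).
records = [
    ("c1", "Paris", date(1925, 3, 1)),
    ("c1", "Berlin", date(1924, 11, 15)),
    ("c1", "Paris", date(1926, 1, 10)),
    ("c2", "Prague", date(1931, 6, 5)),
    ("c2", "Prague", date(1930, 2, 20)),
]


def summarize_clusters(rows):
    """Return one summary per cluster: size, unique cities, earliest date."""
    summary = {}
    for cluster_id, city, published in rows:
        entry = summary.setdefault(
            cluster_id, {"size": 0, "cities": set(), "earliest": published}
        )
        entry["size"] += 1
        entry["cities"].add(city)
        entry["earliest"] = min(entry["earliest"], published)
    return {
        cid: {
            "size": e["size"],
            "n_cities": len(e["cities"]),
            "earliest": e["earliest"],
        }
        for cid, e in summary.items()
    }


summary = summarize_clusters(records)

# One ordering per criterion.
by_size = sorted(summary, key=lambda c: summary[c]["size"], reverse=True)
by_cities = sorted(summary, key=lambda c: summary[c]["n_cities"], reverse=True)
by_first_appearance = sorted(summary, key=lambda c: summary[c]["earliest"])
```

Keeping the summary separate from the orderings makes it easy to add further criteria later, such as the number of distinct periodicals in a cluster.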

July 2021: The team is now working on extending the methodology to the whole corpus. This implies:

- converting a large part of our corpus to IIIF

- running it through the Explore platform

- linking the similar images retrieved from Explore, according to the RDF model of the project (CIDOC-CRM and CIDOC/VIR ontology).
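To make the first item concrete: once an image is served through IIIF, any page or any region of a page can be requested with a URL that follows the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such URLs. The server address, the identifier and the bounding box are hypothetical, and the `max` size keyword assumes a server implementing Image API 2.1 or 3.0 (older 2.x servers use `full`).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Slashes inside an identifier must be percent-encoded.
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


def region_from_box(x, y, width, height):
    """Format a pixel bounding box as a IIIF region parameter: x,y,w,h."""
    return f"{x},{y},{width},{height}"


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"
PAGE_ID = "journal/page_12"

full_page_url = iiif_image_url(BASE, PAGE_ID)
# A crop such as a segmentation step might produce (made-up coordinates).
cropped_url = iiif_image_url(BASE, PAGE_ID,
                             region=region_from_box(120, 80, 640, 480))
```

Because the crop is expressed in the URL itself, a segmented illustration can be referenced without storing a second copy of the image, which is convenient when linking similar images back to their source pages.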