MUSTI - Multimodal Understanding of Smells in Texts and Images

See the MediaEval 2022 webpage for information on how to register and participate.

Task Description

Smell is an underrepresented dimension of many multimedia analysis and representation tasks. The goal of Musti is to further the understanding of descriptions and depictions of smells in texts and images. In this shared task, participants are provided with multilingual texts (English, German, Italian, French) and images, from the 16th to the 20th century, that pertain to smell (i.e. selected because they evoke smells). This task’s goal is to recognise references to smells in texts and images and to connect these smell references across the different modalities.

Subtask 1: Musti Classification: Task participants develop language and image recognition technologies to predict whether a text passage and an image evoke the same smell source or not. This main mandatory task can therefore be cast as a binary classification problem.

Subtask 2: Musti detection: Task participants are asked to identify the smell source(s) common to the text passages and the images. Detecting the smell source includes detecting the person, object or place that has a specific smell or that produces an odour (e.g. plant, animal, perfume, human). In other words, the smell source is the entity or phenomenon that a perceiver experiences with his or her senses. This optional subtask can be cast as a multi-label classification problem.
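As a rough illustration of how the two subtasks could be framed as prediction targets, the sketch below represents one image-text pair with a binary label (Subtask 1) and a set of smell sources (Subtask 2). The class `MustiPair`, its field names, the `LABEL_SPACE` list and the example sentence are all hypothetical; they are not the released data format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MustiPair:
    """One image-text pair. All field names here are hypothetical."""
    text: str
    language: str                     # "en", "de", "it" or "fr"
    image_id: str
    same_smell_source: bool           # Subtask 1 target: binary label
    smell_sources: frozenset = field(default_factory=frozenset)  # Subtask 2 target


def to_multi_hot(sources, label_space):
    """Encode a set of smell sources as a 0/1 vector over a fixed label list."""
    return [1 if label in sources else 0 for label in label_space]


# Illustrative subset only; the real annotation scheme has 80+ categories.
LABEL_SPACE = ["animal", "flower", "food", "perfume"]

pair = MustiPair(
    text="She held the rose to her nose.",  # made-up example passage
    language="en",
    image_id="img-0001",                    # made-up identifier
    same_smell_source=True,
    smell_sources=frozenset({"flower"}),
)

vector = to_multi_hot(pair.smell_sources, LABEL_SPACE)
print(vector)  # [0, 1, 0, 0]
```

A multi-hot vector of this kind is one common way to feed a multi-label target to a classifier. A system could equally well predict the label set directly.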

Motivation and background

To make sense of digital (heritage) collections, it is necessary to go beyond an oculo-centric approach and to engage with their olfactory dimension as it offers a powerful and direct entry to our emotions and memories. Via the Musti task, we aim to accelerate the understanding of olfactory references in English, French, Italian and German texts and images as well as the connection between these modalities. As recent and ongoing exhibitions at Mauritshuis in The Hague, Netherlands, Museum Ulm in Ulm, Germany, and the Prado Museum in Madrid, Spain demonstrate, museums and galleries are keen to enrich museum visits with olfactory components - either for a more immersive experience or to create a more inclusive experience for differently abled museum visitors such as those with a visual impairment.

Reinterpreting historical scents is attracting attention from various research disciplines (Huber et al., 2022), in some cases leading to collaborations with perfume makers. One example is the Scent of the Golden Age candle, developed after a recipe by Constantijn Huygens in a collaboration between historians and a perfume maker.

To ensure that such enrichments are grounded in historically correct contexts, language and computer vision technologies can help museums and galleries find olfactorily relevant examples in their collections and related sources.

Target group

The task is of interest to researchers in natural language processing, computer vision, multimedia analysis, and cultural heritage.


Data

The Musti data set consists of copyright-free texts and images. It contains texts in English, German, Italian, and French that participants are to match to the images. The texts are selected from open repositories such as Project Gutenberg, Europeana, Royal Society Corpus, Deutsches Textarchiv, Gallica, and the Italian Novel Collection.

The images are selected from different archives such as RKD, Bildindex der Kunst und Architektur, Museum Boijmans, Ashmolean Museum Oxford, and Plateforme ouverte du patrimoine. The images are annotated with 80+ categories of smell objects and gestures, such as flowers, food, animals, sniffing and holding the nose. The object categories are organised in a two-level taxonomy.

The Odeuropa text and image benchmark datasets are available as training data to the participants. The image dataset consists of ~3,000 images with 20,000 associated object annotations and 600 gesture annotations.

Submissions will be evaluated on a held-out dataset of roughly 1,200 images with associated texts in the four languages.

Ground truth

The ground truth consists of images and text snippets that contain appearances or mentions of smell-related objects. If a text passage and an image evoke the same smell, the relation between them is manually labelled as positive, otherwise as negative. This dataset is distilled from the Odeuropa text and image benchmark datasets.

Evaluation methodology

Task runs will be evaluated against a gold standard consisting of image-text pairs. We will evaluate using multiple statistics as each provides a slightly different perspective on the results.

Main Task: Predicting whether an image and a text passage evoke the same smell source or not

This task will be evaluated using precision, recall and F1-measure. As multiple text passages in different languages can be linked to the same image, we will employ multiple linking scorers such as CEAF and BLANC to measure the performance across different smell reference chains.

Subtask: Identifying the common smell source(s) between the text passages and the images

For this task, precision, recall and F1-measure will be employed, as well as more fine-grained evaluation methods such as RUFES, which can accommodate multi-level taxonomies.
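To make the precision, recall and F1 measures concrete, the following minimal sketch scores binary predictions (main task) and label sets (subtask). It is not the official scorer: the function names are hypothetical, micro-averaging for the multi-label case is an assumption (the averaging scheme is not specified here), and the linking and taxonomy-aware scorers (CEAF, BLANC, RUFES) are not covered.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def score_binary(gold, predicted):
    """Main task: gold and predicted are parallel lists of booleans."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    return precision_recall_f1(tp, fp, fn)


def score_multilabel_micro(gold_sets, predicted_sets):
    """Subtask: parallel lists of label sets, micro-averaged (an assumption)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, predicted_sets):
        tp += len(gold & pred)   # labels correctly predicted
        fp += len(pred - gold)   # predicted but not in the gold set
        fn += len(gold - pred)   # in the gold set but missed
    return precision_recall_f1(tp, fp, fn)


# Toy data, invented for illustration.
binary_scores = score_binary(
    [True, True, False, False],
    [True, False, True, False],
)
multilabel_scores = score_multilabel_micro(
    [{"flower", "perfume"}, {"animal"}],
    [{"flower"}, {"animal", "food"}],
)
print(binary_scores)      # (0.5, 0.5, 0.5)
print(multilabel_scores)  # each value is approximately 0.667
```

Micro-averaging pools counts over all pairs, so frequent smell sources dominate the score. A macro average over categories would weight rare categories more heavily.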

Quest for insight

Participant information

Please contact your task organizers with any questions on these points.

B. Huber, T. Larsen, R. Spengler, and N. Boivin, “How to use modern science to reconstruct ancient scents,” Nat Hum Behav (2022).

S. Ehrich, C. Verbeek, M. Zinnen, L. Marx, C. Bembibre, and I. Leemans, “Nose-First. Towards an Olfactory Gaze for Digital Art History,” in 2021 Workshops and Tutorials-Language Data and Knowledge, LDK 2021 (pp. 1-17), September 2021, Zaragoza, Spain.

P. Lisena, D. Schwabe, M. van Erp, R. Troncy, W. Tullett, I. Leemans, L. Marx, and S. Ehrich, “Capturing the semantics of smell: The Odeuropa data model for olfactory heritage information,” in Proceedings of ESWC 2022, Extended Semantic Web Conference, May 29-June 2, 2022, Hersonissos, Greece.

S. Menini, T. Paccosi, S. Tonelli, M. van Erp, I. Leemans, P. Lisena, R. Troncy, W. Tullett, A. Hürriyetoğlu, G. Dijkstra, F. Gordijn, E. Jürgens, J. Koopman, A. Ouwerkerk, S. Steen, I. Novalija, J. Brank, D. Mladenic, and A. Zidar, “A Multilingual Benchmark to Capture Olfactory Situations over Time,” in Proceedings of LChange 2022, May 2022, Dublin, Ireland.

S. Menini, T. Paccosi, S. Tekiroğlu, and S. Tonelli, “Building a Multilingual Taxonomy of Olfactory Terms with Timestamps,” in Proceedings of Language Resources and Evaluation Conference (LREC) 2022, June 2022, Marseille, France.

S. Tonelli and S. Menini, “FrameNet-like annotation of olfactory information in texts,” in Proceedings of the 5th joint SIGHUM workshop on computational linguistics for cultural heritage, social sciences, humanities and literature, Punta Cana, Dominican Republic (online), 2021, pp. 11–20.

M. Zinnen and V. Christlein, “Annotated Image Data version 1 - Odeuropa Deliverable D2.2.”

Task organizers

Task Schedule


This task is an output of the Odeuropa project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101004469.