See the MediaEval 2021 webpage for information on how to register and participate.
Disaster-related images are complex and often evoke an emotional response, both good and bad. This task focuses on performing visual sentiment analysis on images collected from disasters across the world. The images in the provided dataset aim to provoke an emotional response, both through intentional framing and through the content itself.
Subtask 1: Single-label Image Classification The first subtask is a single-label image classification task in which the images are arranged into three classes, namely positive, negative, and neutral, with a bias towards negative samples due to the nature of the topic.
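As a point of reference only, a minimal baseline for this subtask could fine-tune a pretrained network with a class-weighted loss to counteract the bias towards negative samples. The sketch below assumes a standard `train/<class>/<image>.jpg` folder layout; the paths, model choice, and hyperparameters are illustrative and not part of the task definition.

```python
# Minimal single-label baseline sketch (illustrative only).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumed layout: train/negative, train/neutral, train/positive
train_set = datasets.ImageFolder("train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 3)  # three sentiment classes

# Class weights counteract the bias towards negative samples.
counts = torch.bincount(torch.tensor(train_set.targets), minlength=3).float()
weights = counts.sum() / (3 * counts)
criterion = nn.CrossEntropyLoss(weight=weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```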
Subtask 2: Multi-label Image Classification This is a multi-label image classification task in which participants are provided with multi-labeled images. The multi-label classification strategy, which assigns multiple labels to an image, better suits our visual sentiment classification problem and is intended to capture the correlation between different sentiments. This subtask covers seven classes, namely joy, sadness, fear, disgust, anger, surprise, and neutral.
Subtask 3: Multi-label Image Classification This subtask is also multi-label; however, a wider range of sentiment classes is covered. Going deeper into the sentiment hierarchy increases the complexity of the task. The sentiment categories covered in this subtask are anger, anxiety, craving, empathetic pain, fear, horror, joy, relief, sadness, and surprise.
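For the two multi-label subtasks, one common approach, sketched below as an illustration rather than a prescribed method, is to replace the softmax output with independent sigmoid units trained with a binary cross-entropy loss and to threshold the per-class probabilities at prediction time (seven classes for Subtask 2, ten for Subtask 3). Model, threshold, and label encoding are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # joy, sadness, fear, disgust, anger, surprise, neutral (Subtask 2); 10 for Subtask 3

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# BCEWithLogitsLoss treats each class as an independent binary decision,
# so an image can receive several sentiment labels at once.
criterion = nn.BCEWithLogitsLoss()

def predict(images, threshold=0.5):
    """Return one binary label vector per image (the threshold is a tunable assumption)."""
    with torch.no_grad():
        probs = torch.sigmoid(model(images))
    return (probs >= threshold).int()
```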
Participants are encouraged to make their code public with their submission.
As implied by the popular proverb “a picture is worth a thousand words,” visual content is an effective means of conveying not only facts but also cues about sentiments and emotions. Such cues, representing the emotions and sentiments of the photographers, may trigger similar feelings in the observer and can help in understanding visual content beyond semantic concepts in application domains such as education, entertainment, advertisement, and journalism. To this end, masters of photography have always made deliberate choices, especially in terms of scene, perspective, shooting angle, and color filtering, to let the underlying message flow smoothly to the general public. Similarly, users aiming to increase their popularity on the Internet rely on the same techniques. However, it is not fully clear how visual content evokes such emotional cues and, more importantly, how the sentiments derived from a scene by an automatic algorithm should be expressed. This opens an interesting line of research into interpreting the emotions and sentiments perceived by users viewing visual content.
The task is appropriate for researchers in machine learning, multimedia retrieval, sentiment analysis, and visual analysis.
We provide a slightly modified version of our visual sentiment analysis dataset [1], with different training and test sets, consisting of disaster-related images collected from online sources such as Google, Flickr, and Twitter.
The dataset was annotated through a crowd-sourcing study using Microworkers, where at least five different participants were assigned to annotate each image. The final tags were chosen by a majority vote among the participants assigned to each image. The study collected 10,010 responses from 2,338 participants, who came from different age groups and 98 countries. The time a participant spent on each image was also recorded, which helped filter out careless or inappropriate responses. Before the main study, two trial studies were performed to test the setup, correct errors, and improve clarity and readability.
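For readers reproducing a similar aggregation, a majority vote over per-image responses can be computed as in the sketch below; the data layout (a plain list of annotator labels per image) and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter

def majority_vote(responses):
    """Return the label chosen by most annotators for one image.

    `responses` is a list of labels from the annotators assigned to the image;
    ties are broken by the first most-common label encountered.
    """
    return Counter(responses).most_common(1)[0][0]

# Hypothetical example with five annotators:
print(majority_vote(["negative", "negative", "neutral", "negative", "positive"]))
# -> "negative"
```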
All tasks will be evaluated using standard classification metrics, with the weighted F1-score used to rank the submissions. We also encourage participants to carry out a failure analysis of their results to gain insight into why a classifier may make mistakes.
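For reference, the weighted F1-score can be computed with scikit-learn as sketched below; the toy labels are purely illustrative, and for the multi-label subtasks both predictions and ground truth are binary indicator matrices.

```python
from sklearn.metrics import f1_score

# Subtask 1: single-label predictions (toy values).
y_true = ["negative", "neutral", "negative", "positive"]
y_pred = ["negative", "negative", "negative", "positive"]
print(f1_score(y_true, y_pred, average="weighted"))

# Subtasks 2 and 3: multi-label indicator matrices (toy values, 3 of the classes shown).
y_true_ml = [[1, 0, 1], [0, 1, 0]]
y_pred_ml = [[1, 0, 0], [0, 1, 0]]
print(f1_score(y_true_ml, y_pred_ml, average="weighted"))
```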