MediaEval 2020

The MediaEval Multimedia Evaluation benchmark offers tasks that are related to multimedia retrieval, analysis, and exploration. Participation is open to interested researchers who register. MediaEval focuses specifically on the human and social aspects of multimedia, and on multimedia systems that serve users. Its tasks offer researchers the opportunity to tackle challenges that bring together multiple modalities (visual, text, music, sensor data).


The MediaEval 2020 Workshop took place 14-15 December 2020 fully online.


The MediaEval Organization

MediaEval is made possible by the efforts of a large number of task organizers, who are each responsible for organizing their own tasks. Please see the individual task pages for their names. The overall coordination is carried out by the MediaEval Logistics Committee and guided by the Community Council.

The MediaEval Logistics Committee (2020)
The MediaEval Community Council (2020)

MediaEval is grateful for the support of the ACM Special Interest Group on Multimedia.

For more information, contact m.larson (at) You can also follow us on Twitter @multimediaeval

Task List

Emotional Mario: Believable AI agents in video games

The Emotional Mario task explores the use of human emotion to improve the performance of AI-based agents playing Super Mario Bros. We provide a multimodal dataset consisting of video and sensor data to be used to complete two different subtasks.

Read more.

Emotions and Themes in Music

We invite participants to try their skills at building a classifier to predict the emotions and themes conveyed in a music recording, using our dataset of music audio, pre-computed audio features, and tag annotations (e.g., happy, sad, melancholic). All data we provide comes from Jamendo, an online platform for music under Creative Commons licenses.

Read more.

FakeNews: Corona virus and 5G conspiracy

Spontaneous and intentional digital Fake News wildfires over online social media can be as dangerous as natural fires. A new generation of data mining and analysis algorithms is required for early detection and tracking of information waves. This task focuses on the analysis of tweets around Coronavirus and 5G conspiracy theories in order to detect misinformation spreaders.

Read more.

Flood-related Multimedia

Floods are one of the most common natural disasters that occur on our planet, and the destruction they cause is enormous. In this task, the participants receive a set of Twitter posts (tweets), including text, images, and other metadata, and are asked to automatically identify which posts are truly relevant to flooding incidents in the specific area of Northeastern Italy. The ground truth labels have been created by experts in flood risk management. The ultimate aim of this task is to develop technology that will support experts in flood disaster management.

Read more.

Insight for Wellbeing: Multimodal personal health lifelog data analysis

The quality of the air that we breathe as individuals as we go about our daily lives is important for health and wellbeing. However, measuring personal air quality remains a challenge. This task investigates the prediction of personal air quality using open data or data from lifelogs. The data includes images, tags, physiological data, and sensor readings.

Read more.


The fight against colorectal cancer requires better diagnosis tools. Computer-aided diagnosis systems can reduce the chance that diagnosticians overlook a polyp during a colonoscopy. This task focuses on robust and efficient algorithms for polyp segmentation. The data consists of a large number of endoscopic images of the colon.

Read more.

NewsImages: The role of images in online news

Images play an important role in online news articles and news consumption patterns. This task aims to gain additional insight into this role. Participants are supplied with a large set of articles (including text body and headlines) and the accompanying images. The task requires participants to predict which image was used to accompany each article and also to predict frequently clicked articles on the basis of accompanying images.

Read more.

No-Audio Multimodal Speech Detection Task

Participants receive videos (top view) and sensor readings (acceleration and proximity) of people having conversations in a natural social setting and are required to detect speaking turns. No audio signal is available for use. The task encourages research on better privacy preservation during recordings made to study social interactions, and has the potential to scale to settings where recording audio may be impractical.

Read more.

Pixel Privacy: Quality Camouflage for Social Images

In this task, participants develop adversarial approaches that camouflage the quality of images. A camouflaged image appears to be unchanged, or even enhanced, to the human eye. At the same time, the image will fool a Blind Image Quality Assessment algorithm into predicting that its quality is low. Quality camouflage will help to ensure that personal photos, e.g., vacation photos depicting people, are less easily findable via image search engines.

Read more.

Predicting Media Memorability

The task requires participants to automatically predict memorability scores for videos that reflect the probability that a video will be remembered. Participants will be provided with an extensive dataset of videos with memorability annotations, related information, and pre-extracted state-of-the-art visual features.

Read more.

Scene Change: Fun faux photos

Tourist photography is due for a makeover, as people increasingly avoid travel due to environmental or safety concerns. In this task, participants create image composites: given a photo of a person, they must change the background to a popular tourist site. The special twist: a Scene Change photo must be fun without being deceptive. In other words, the photo fools you at first, but is identifiable as a composite upon closer inspection.

Read more.

Sports Video Classification

Participants are provided with a set of videos of table tennis games and are required to build a classification system that automatically labels video segments with the strokes that players can be seen using in those segments. The ultimate goal of this research is to produce automatic annotation tools for sport faculties, local clubs and associations to help coaches to better assess and advise athletes during training.

Read more.