NewsImages: Retrieval and generative AI for news thumbnails

See the MediaEval 2025 webpage for information on how to register and participate.

Task description

This year’s NewsImages challenge explores matching news articles with retrieved or generated images. Participants receive a large set of English-language articles from international publishers. Given the text of a news article, the goal of this task is to retrieve and/or generate a fitting image recommendation. In the image retrieval subtask, teams retrieve images from a larger collection. In the image generation subtask, teams use generative AI to produce an image that can be used as a thumbnail. (Teams can take part in either or both subtasks; each subtask requires a separate submission.) The subtasks are as follows:

  1. Image retrieval, where participants design and implement approaches to retrieve relevant images from an existing image pool that fit a given news headline and lead.
  2. Image generation, where participants use and develop techniques for creating appropriate visuals for news articles.
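
One common way to approach the image retrieval subtask is to embed the headline and lead, as well as every image in the pool, into a shared vector space with an open-source text-image model that runs locally (OpenCLIP, used in [4], is one example), and then rank the pool by cosine similarity. The sketch below shows only the ranking step under that assumption: the embeddings are toy three-dimensional vectors standing in for real model output, and `cosine` and `rank_images` are hypothetical helpers, not task-provided tooling.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_vec: list[float], image_vecs: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k image IDs whose embeddings are most similar to the query embedding."""
    scored = [(image_id, cosine(query_vec, vec)) for image_id, vec in image_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy stand-ins for model output (assumption): one text embedding, three image embeddings.
query = [0.9, 0.1, 0.0]  # hypothetical embedding of "headline + lead"
pool = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
}
print(rank_images(query, pool, k=2))
```

A brute-force loop like this is only practical for small pools; for a collection on the scale of YFCC100M, an approximate nearest-neighbour index would likely be needed.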

The main evaluation criterion, which determines the ranking of teams, is image fit and relevance: whether the images capture key attributes of the article text without depicting any important elements that are not present in the article. The image retrieval/generation pipeline must not rely on any third-party closed-source APIs or resources. We provide a comprehensive dataset of news articles, each with its title, lead, URL (for retrieving the full text), and an editorially assigned image.

In the final phase of this challenge, participants will take part in a crowdsourced evaluation event to rate the quality and relevance of the submissions. We provide an online interface where teams will be shown images produced by their peers. Each team will need to rate a subset of submitted images.

Furthermore, the images will be evaluated for their compliance with media policy. We are particularly interested in non-photorealistic images that do not suggest that they accurately represent real events, so as not to mislead or deceive readers.

Motivation and background

News publishers and recommender systems depend on images and thumbnails to engage readers with news articles. Technology has advanced to the point where it is possible to automatically find a matching image (image retrieval) or generate an appropriate image (generative AI) for a news article. Although these techniques present an opportunity for the news media, they also raise many technical and ethical challenges. It is critical that the image accurately matches the news article, and images should not mislead or deceive readers into assuming that they depict a real-life situation when they do not.

Online news articles are inherently multimodal. The text of an article is often accompanied by an image and/or other multimedia items. This image, however, does more than illustrate and complement the article text: it plays a critical role in capturing readers’ attention, as it is often the first thing readers see when browsing a news platform.

Research in multimedia and recommender systems generally assumes a simple relationship between images and the text they occur with. For example, in image captioning [1], the caption is often assumed to literally describe the content depicted in the image. In contrast, when images accompany news articles, the relationship is less clear [2]. Since images are often unavailable for the most recent news stories, stock images, archived photos, or even generated images are used instead. In this setting, preliminary studies have shown that users prefer AI-generated content over stock images [3, 4]. The goal of this task is to investigate these intricacies in more depth in order to understand their implications for journalism and news personalization.

Target group

This task targets researchers who are interested in investigating the retrieval and generation of images for news and the connection between images and text. This includes people working in the areas of computer vision, recommender systems, cross-modal information retrieval, as well as in the area of news analysis.

Data

This challenge uses the open-source GDELT news dataset. We create and share a subset of 8,500 news articles collected during 2022 and 2023. The news articles are all in English. Each item includes the article title, article lead, and the original image. The article text itself is not shared, but participants are free to retrieve it from the original source. We ask participants to use the Yahoo-Flickr Creative Commons 100 Million (YFCC100M) dataset to source the images for the retrieval task.
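
Because each article comes with its title, lead, and URL but not its full text, a natural starting point for both subtasks is a query string built from the headline and lead. The sketch below assumes the article metadata is distributed as CSV with the hypothetical columns `article_id`, `title`, `lead`, `url`, and `image_id`; the actual file format and field names may differ, and `NewsArticle`, `load_articles`, and `build_query` are illustrative names only.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class NewsArticle:
    article_id: str
    title: str
    lead: str
    url: str       # source URL, from which the full text may be retrieved
    image_id: str  # reference to the editorially assigned original image


def load_articles(csv_text: str) -> list[NewsArticle]:
    """Parse article records from CSV text (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        NewsArticle(
            article_id=row["article_id"],
            title=row["title"].strip(),
            lead=row["lead"].strip(),
            url=row["url"],
            image_id=row["image_id"],
        )
        for row in reader
    ]


def build_query(article: NewsArticle) -> str:
    """Join headline and lead into one query, avoiding a doubled full stop."""
    title = article.title.rstrip(". ")
    if not article.lead:
        return title
    return f"{title}. {article.lead}"


# Hypothetical sample record; example.com is a placeholder URL.
sample = """article_id,title,lead,url,image_id
a1,Storm hits coast,Thousands lose power overnight.,https://example.com/a1,img1
"""
articles = load_articles(sample)
print(build_query(articles[0]))
```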

Evaluation methodology

The generated and retrieved image submissions will be evaluated during a crowdsourced event. To that end, the organizers provide an online tool and distribute accounts to participants, who are asked to rate the fit and relevance of the images submitted by their peers. The winning team is determined by the overall average rating of its submitted images. In addition to these fit and relevance questions, the rating procedure for generated images includes a check for policy compliance. The policy applied within the scope of this challenge is based on existing AI guidelines in news media: generated image submissions should adhere to current editorial standards and must not suggest that they accurately represent real events, deceive, or mislead readers in any other way.
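
The ranking rule can be illustrated with a short sketch. It assumes each rating is recorded as a hypothetical `(team, image_id, score)` tuple on an unspecified numeric scale, and it averages over all individual ratings a team received. Averaging per-image means first would instead weight every image equally; the description above does not say which variant is used.

```python
from collections import defaultdict


def rank_teams(ratings: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Rank teams by the mean of all individual ratings their images received.

    Each rating is a (team, image_id, score) tuple; the score scale is an assumption.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for team, _image_id, score in ratings:
        totals[team] += score
        counts[team] += 1
    averages = {team: totals[team] / counts[team] for team in totals}
    # Highest average first; the first entry is the winning team under this rule.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ratings for two teams.
ratings = [
    ("team_a", "img1", 4.0),
    ("team_a", "img2", 2.0),
    ("team_b", "img3", 5.0),
    ("team_b", "img3", 4.0),
]
print(rank_teams(ratings))
```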

Two subsets of image submissions will be rated during the evaluation event: a small subset of predetermined articles (to be announced in advance by the organizers) and a larger subset of randomly selected articles. This allows participants to develop hand-crafted solutions for the predetermined articles as well as fully automated, scalable solutions for the random sample.

In keeping with MediaEval's principle of promoting reproducible research, submissions of both retrieved and generated images must include the entire workflow. For image generation, we encourage participants to use tools that can automatically embed the workflow used for generation in the image file (e.g., ComfyUI or WebUI). Furthermore, we ask participants to use open-source solutions that can be set up and run locally.
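
To illustrate what embedding the workflow in the image file means, the sketch below writes a JSON record into a PNG `tEXt` chunk and reads it back using only the Python standard library. Tools such as ComfyUI store their workflow as JSON in PNG text chunks in a similar way, but the keyword `workflow` and the record's fields here are assumptions for illustration, and `add_text_chunk`, `read_text_chunks`, and `tiny_png` are hypothetical helpers, not part of any tool's API.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk following the PNG signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        yield chunk_type, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 bytes length + 4 bytes type + data + 4 bytes CRC


def add_text_chunk(png: bytes, key: str, value: str) -> bytes:
    """Insert a tEXt chunk (Latin-1 keyword, NUL separator, Latin-1 text) before IEND."""
    payload = key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    out = PNG_SIGNATURE
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"IEND":
            out += _chunk(b"tEXt", payload)
        out += _chunk(chunk_type, data)
    return out


def read_text_chunks(png: bytes) -> dict[str, str]:
    """Collect all tEXt chunks as a keyword -> text mapping."""
    texts = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            key, _, value = data.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
    return texts


def tiny_png() -> bytes:
    """A valid 1x1 8-bit grayscale PNG, used here only for demonstration."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0, then one pixel value
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")


# Hypothetical workflow record describing how the image was generated.
workflow = {"model": "example-open-model", "prompt": "flooded street, watercolor illustration", "seed": 42}
png = add_text_chunk(tiny_png(), "workflow", json.dumps(workflow))
print(json.loads(read_text_chunks(png)["workflow"]))
```

`tEXt` values are Latin-1; `json.dumps` escapes non-ASCII characters by default, so the record survives the round trip. A UTF-8 `iTXt` chunk would be the alternative for unescaped text.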

Quest for insight

This task invites participants to further reflect on the role and potential of news images in terms of generation and retrieval. The organizers encourage participants to explore further topics related to the task. Possible research questions are:

Participant information

[1] Hossain, M. Z., Sohel, F., Shiratuddin, M. F., & Laga, H. (2019). A comprehensive survey of deep learning for image captioning. ACM Computing Surveys (CSUR), 51(6), pp. 1-36.

[2] Oostdijk, N., van Halteren, H., Bașar, E., & Larson, M. (2020). The Connection between the Text and Images of News Articles: New Insights for Multimedia Analysis. In Proceedings of The 12th Language Resources and Evaluation Conference, pp. 4343-4351.

[3] Heitz, L., Bernstein, A., & Rossetto, L. (2024). An Empirical Exploration of Perceived Similarity between News Article Texts and Images. MediaEval 2023 Working Notes Proceedings.

[4] Heitz, L., Chan, Y. K., Li, H., Zeng, K., Bernstein, A., & Rossetto, L. (2024). Prompt-based Alignment of Headlines and Images Using OpenCLIP. MediaEval 2023 Working Notes Proceedings.

Task organizers

Task schedule