Scene Change: Fun faux photos

See the MediaEval 2020 webpage for information on how to register and participate.

Task Description

The MediaEval 2020 Scene Change Task explores fun faux photos: images that fool you at first but can be identified as imitations on closer inspection. Task participants are provided with images of people (as a “foreground segment”) and are asked to change the background scene to Paris. We call this switch a “scene change”.
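
As a point of reference, the simplest conceivable scene change is plain alpha compositing of the foreground segment onto a Paris background. The following is a minimal sketch only, assuming the foreground comes with a per-pixel mask in [0, 1] and that all images are NumPy float arrays of the same size. The function name `composite` and the array layout are illustrative assumptions, not part of the task's data format, and a competitive system would likely need to do considerably more (for example, adjusting placement, scale, or color).

```python
import numpy as np


def composite(foreground: np.ndarray, mask: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Alpha-composite a foreground segment onto a background image.

    foreground, background: H x W x 3 float arrays with values in [0, 1].
    mask: H x W float array in [0, 1]; 1 marks foreground (person) pixels.
    These shapes and value ranges are assumptions made for this sketch.
    """
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")
    if mask.shape != foreground.shape[:2]:
        raise ValueError("mask must match the image height and width")

    # Add a channel axis so the mask broadcasts over the colour channels.
    alpha = mask[..., np.newaxis]

    # Keep the foreground where alpha is 1 and the background where it is 0;
    # fractional values blend the two (useful along soft segment edges).
    return alpha * foreground + (1.0 - alpha) * background
```

A naive composite like this one usually looks pasted-on even at first glance, which is why it is better read as a starting baseline than as a solution.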

Based on the dataset provided, participants are asked to develop a system that addresses the main task of creating a composite image:

Participants are encouraged to improve the systems that address the main task, by developing additional sub-systems:

Note that for this task photorealism is not a goal in and of itself. Similarly to [1], we strive for realism in the sense of acceptability, which includes enjoyability and shareability, rather than in the sense of physical accuracy. Physical accuracy is not required for acceptability: for example, it is known that impossible lighting conditions and colors in artistic work do not interfere with the viewer’s understanding of the scene and often go unnoticed [2]. We adopt the assumption that optimizing for this kind of realism captures distracting properties of the composed image, resulting in more appealing final images.

Can you tell at first glance who was in Paris? Can you tell at second glance?

Motivation and Background

The task has multiple motivations:

The task focuses on Paris both because it is a highly popular tourist destination and because a Paris Dataset [12] is available. In 2017, France was the most visited country in the world, with Paris having a total of 23.6 million hotel visits [13,14].

Target Group

The task targets (but is not limited to) people interested in art and social media, multimedia retrieval, machine learning, adversarial machine learning, privacy and computer vision.

Depending on your research interests, you might want to experiment in other directions. We have provided a recommended reading list (below) with some suggestions. You might consider using an approach based on generative adversarial networks (GANs), for instance building on the work of Lin et al. (2018). You could also try an approach similar to that of Lalonde et al. (2007), who retrieve foreground segments that match certain conditions of the background.


The data will be drawn from the ADE20k dataset [4] and the Paris dataset [12].

Evaluation Methodology

Participants submit scene change examples for all images in the test set. The scene changes are evaluated in a user study, in which study participants are randomly shown original and composed images and are asked to point out which image was originally taken at the location. The study is run twice: once time-restricted, similar to [3], and once unrestricted. A good algorithm produces shallow fakes: it yields a high error rate in the time-restricted experiment and a low error rate in the unrestricted experiment. Submissions are ranked on the difference in error rates between the two experiments.
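
The ranking criterion can be illustrated with a short sketch. It assumes that each study judgement is recorded as a boolean (True when the study participant picked the wrong image), and that a larger difference between the time-restricted and unrestricted error rates ranks higher. Both the data layout and the function names (`error_rate`, `shallow_fake_score`, `rank_submissions`) are assumptions made for illustration; the organizers' actual scoring procedure may differ in its details.

```python
from typing import Dict, List, Sequence, Tuple


def error_rate(fooled: Sequence[bool]) -> float:
    """Fraction of judgements in which the study participant was wrong."""
    if not fooled:
        raise ValueError("need at least one judgement")
    return sum(fooled) / len(fooled)


def shallow_fake_score(timed: Sequence[bool], untimed: Sequence[bool]) -> float:
    """Error rate under time restriction minus error rate without it."""
    return error_rate(timed) - error_rate(untimed)


def rank_submissions(
    results: Dict[str, Tuple[Sequence[bool], Sequence[bool]]]
) -> List[Tuple[str, float]]:
    """Sort submissions by score, largest difference first (an assumption)."""
    scores = {
        name: shallow_fake_score(timed, untimed)
        for name, (timed, untimed) in results.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical judgements: (time-restricted, unrestricted) per submission.
example = {
    "run_a": ([True, True, False, True], [False, False, False, True]),
    "run_b": ([True, False], [True, False]),
}
ranking = rank_submissions(example)
```

In this made-up example, `run_a` fools viewers in 3 of 4 time-restricted judgements but only 1 of 4 unrestricted ones, giving a difference of 0.5, while `run_b` fools viewers equally often in both settings and scores 0.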


[1] Karsch, K., Hedau, V., Forsyth, D., & Hoiem, D. (2011). Rendering synthetic objects into legacy photographs. ACM Transactions on Graphics (TOG), 30(6), 157.

[2] Cavanagh, P. (2005). The artist as neuroscientist. Nature, 434(7031), 301.

[3] Xiao, C., Zhu, J.-Y., Li, B., He, W., Liu, M., & Song, D. (2018). Spatially transformed adversarial examples. In International Conference on Learning Representations (ICLR).

[4] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., & Torralba, A. (2017). Scene parsing through ade20k dataset. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 633-641).

[5] Roy, E. A. (2018, December 06). Instacrammed: The big fib at the heart of New Zealand picture-perfect peaks. The Guardian. Retrieved from

[6] Gammon, K. (2019, March 19). #Superbloom or #poppynightmare? Selfie chaos forces canyon closure. The Guardian. Retrieved from

[7] Rogers, K. (2020, March 20). Coronavirus canceled this family’s Disney trip. They made better memories at home. CNN. Retrieved from

[8] Compton, N.B. (2020, April 8) Travel photographers are taking epic nature photos using indoor optical illusions. Washington Post. Retrieved from

[9] Jones, D. (2020, April 15) People miss flying so much they’re re-creating the airplane experience from home. Washington Post. Retrieved from

[10] Zhou, N. (2020, April 16) Coronavirus vacation: Australian family recreate 15-hour holiday flight in living room. The Guardian.

[11] Güera, D., & Delp, E. J. (2018, November). Deepfake video detection using recurrent neural networks. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 1-6). IEEE.

[12] Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2008, June). Lost in quantization: Improving particular object retrieval in large scale image databases. In 2008 IEEE conference on computer vision and pattern recognition (pp. 1-8). IEEE.

[13] UNWTO Tourism Highlights, 2017 Edition. (2017, August). Retrieved from

[14] Tourism in Paris - Key Figures - Paris tourist office. Retrieved from

Lalonde, J. F., Hoiem, D., Efros, A. A., Rother, C., Winn, J., & Criminisi, A. (2007). Photo clip art. ACM transactions on graphics (TOG), 26(3), 3.

Lin, C. H., Yumer, E., Wang, O., Shechtman, E., & Lucey, S. (2018, March). ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9455-9464).

For more insight on the state of the art in segmentation, you could take a look at the winner of COCO 2018. The slides of the winner’s presentation can be found here: Furthermore there are also industry solutions that offer segmentation, such as and

Task Organizers

Zhuoran Liu, Radboud University, Netherlands, z.liu (at)
Martha Larson, Radboud University, Netherlands

Task Schedule

The workshop will be held online.