Sports Video Classification

See the MediaEval 2020 webpage for information on how to register and participate.

Task Description

Participants are provided with a set of videos of table tennis games and are required to build a classification system that automatically labels video segments with the strokes that players can be seen using in those segments. The ultimate goal of this research is to produce automatic annotation tools for sport faculties, local clubs and associations to help coaches to better assess and advise athletes during training.

Action detection and classification is one of the main challenges in visual content analysis and mining. Sport video analysis has been a very popular research topic, due to the variety of application areas, ranging from analysis of athletes’ performances to multimedia intelligent devices with user-tailored digests. Datasets focused on sports activities, or datasets including a large number of sport activity classes, are now available, and many research contributions are benchmarked on those datasets. A large amount of work is also devoted to fine-grained classification through the analysis of sport gestures using motion capture systems. However, body-worn sensors and markers could disturb the natural behavior of sports players. Furthermore, motion capture devices are not always available to potential users, be it a university faculty or a local sport team. Giving end-users the possibility to monitor their physical activities in ecological conditions with simple equipment remains a challenging issue.

This task offers researchers an opportunity to compare their approaches to fine-grained sports video annotation by testing them on the task of recognizing strokes in table tennis videos. The low inter-class variability makes the task more difficult than on general-purpose datasets such as UCF-101 and DeepMind Kinetics.

Target Group

The task is of interest to researchers in the areas of machine learning (classification), visual content analysis, computer vision and sport performance. We explicitly encourage researchers working specifically on computer-aided analysis of sport performance to participate.


Our focus is on recordings made with widespread and cheap video cameras, e.g. GoPro. We use a dataset specifically recorded in a sport faculty facility and continuously extended by students and teachers. This dataset consists of player-centred videos recorded in natural conditions without markers or sensors. It comprises 20 table tennis strokes, and a rejection class can be built upon them. The problem is hence a typical research topic in the field of video indexing: for a given recording, we need to label the video by recognizing each stroke appearing in the whole video.

Evaluation Methodology

Twenty stroke classes are considered according to the rules of table tennis. This taxonomy was designed with professional table tennis teachers. We are working on videos recorded at the Faculty of Sports of the University of Bordeaux. The players filmed are students, and the teachers supervise the exercises conducted during the recording sessions. The recordings are markerless and allow the players to perform in natural conditions. In each video file, every table tennis stroke is delimited by temporal borders, which are supplied in an XML file. For each test video, the participants are invited to produce an XML file in which each stroke is labeled according to the given taxonomy. Submissions will be evaluated in terms of per-class accuracy and global accuracy.
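The two evaluation measures can be illustrated with a minimal sketch. It assumes that global accuracy is the fraction of all strokes labeled correctly and that per-class accuracy is the fraction of strokes of each ground-truth class labeled correctly; the official evaluation script may differ. The function name and the stroke names below are hypothetical and are not taken from the task taxonomy.

```python
from collections import Counter


def stroke_accuracies(true_labels, predicted_labels):
    """Return (global_accuracy, per_class_accuracy) for paired label lists.

    Assumption: per-class accuracy is computed over the strokes whose
    ground-truth label is that class.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one labeled stroke is required")

    # Number of strokes per ground-truth class.
    total_per_class = Counter(true_labels)
    # Number of correctly labeled strokes per ground-truth class.
    correct_per_class = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    per_class = {
        label: correct_per_class[label] / total
        for label, total in total_per_class.items()
    }
    global_accuracy = sum(correct_per_class.values()) / len(true_labels)
    return global_accuracy, per_class


# Hypothetical stroke names, for illustration only.
truth = ["serve_forehand", "serve_forehand",
         "offensive_backhand", "defensive_forehand"]
preds = ["serve_forehand", "offensive_backhand",
         "offensive_backhand", "defensive_forehand"]

overall, by_class = stroke_accuracies(truth, preds)
print(overall)                      # 0.75
print(by_class["serve_forehand"])   # 0.5
```

Reporting both measures matters when classes are unevenly represented: a system can reach a high global accuracy while still failing on rarely performed strokes.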

Crisp Project

Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, and Julien Morlier. 2020. Fine grained sport action recognition with siamese spatio-temporal convolutional neural networks. Multimedia Tools and Applications (19 Apr 2020).

Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud Péteri, and Julien Morlier. 2019. Optimal choice of motion estimation methods for fine-grained action classification with 3D convolutional networks. In ICIP 2019. IEEE, 554–558.

Gül Varol, Ivan Laptev, and Cordelia Schmid. 2018. Long-Term Temporal Convolutions for Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40, 6 (2018), 1510–1517.

Joao Carreira and Andrew Zisserman. 2017. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. CoRR abs/1705.07750 (2017).

Chunhui Gu, Chen Sun, Sudheendra Vijayanarasimhan, Caroline Pantofaru, David A. Ross, George Toderici, Yeqing Li, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and Jitendra Malik. 2017. AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. CoRR abs/1705.08421 (2017).

Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR abs/1212.0402 (2012).

Task Organizers

You can email us directly at (at)

Jenny Benois-Pineau, Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, F-33400, Talence, France (jenny.benois-pineau (at)
Pierre-Etienne Martin, Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, F-33400, Talence, France (pierre-etienne.martin (at)
Renaud Péteri, MIA, University of La Rochelle, La Rochelle, France (renaud.peteri (at)
Boris Mansencal, Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, F-33400, Talence, France (boris.mansencal (at)
Jordan Calandre, MIA, University of La Rochelle, La Rochelle, France
Julien Morlier, IMS, University of Bordeaux, Talence, France
Laurent Mascarilla, MIA, University of La Rochelle, La Rochelle, France

Task Schedule

The workshop will be held online. Exact dates to be announced.