See the MediaEval 2023 webpage for information on how to register and participate.
Position and action detection and classification are among the main challenges in visual content analysis and mining. Sport video analysis has been a very popular research topic, owing to the variety of application areas, ranging from analysis of athletes' performances and rehabilitation to multimedia intelligent devices with user-tailored digests. This year we propose a series of 6 tasks, each divided into 2 subtasks covering two sports, table tennis and swimming. These tasks are a follow-up to the 2022 Sport Task and SwimTrack.
Task 1 - athlete position detection
Subtask 1.1 (table tennis) - to detect the 2 or 4 players (depending on whether the match is singles or doubles) and track them throughout the video, in particular during doubles matches where players overlap frequently, from videos recorded from various angles (e.g., side, corner).
Subtask 1.2 (swimming) - to detect up to 8 swimmers in the pool from static videos recorded from the side of the pool.
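To make the expected output of Task 1 concrete, here is a minimal sketch of per-frame person detection followed by naive IoU-based linking across frames. The detector choice (a pretrained torchvision Faster R-CNN), the score and IoU thresholds, and the absence of any re-identification are illustrative assumptions, not an official baseline or the task's data format.

```python
# Illustrative sketch only: per-frame person detection + greedy IoU linking.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def detect_players(frame_rgb, score_thr=0.7):
    """Return [x1, y1, x2, y2] boxes for the COCO 'person' class (label 1)."""
    with torch.no_grad():
        out = detector([to_tensor(frame_rgb)])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thr)
    return out["boxes"][keep].tolist()

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def track(frames, iou_thr=0.3):
    """Greedily link detections to the previous frame's boxes by IoU."""
    tracks, next_id, history = {}, 0, []
    for frame in frames:
        assigned = {}
        for box in detect_players(frame):
            best_id, best_iou = None, iou_thr
            for tid, prev in tracks.items():
                if tid not in assigned and iou(box, prev) > best_iou:
                    best_id, best_iou = tid, iou(box, prev)
            if best_id is None:
                best_id, next_id = next_id, next_id + 1
            assigned[best_id] = box
        tracks = assigned
        history.append(assigned)
    return history  # one {track_id: box} dict per frame
```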
Task 2 - stroke detection
Subtask 2.1 (table tennis) - to detect when a player is performing a stroke (i.e., a ball hit with the racket) from close-up videos.
Subtask 2.2 (swimming) - to detect each time a swimmer completes a stroke cycle, for each swimming style (for freestyle, backstroke, and butterfly, each time the swimmer's right hand enters the water; for breaststroke, each time the head is at its highest point).
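One minimal way to think about Task 2 is event detection on a 1-D per-frame signal: assuming some per-frame stroke or motion score produced by any model, events can be reported at its peaks. The score curve below is synthetic and the thresholds are arbitrary illustrative values, not part of the task definition.

```python
# Illustrative sketch only: temporal event detection from a per-frame score.
import numpy as np
from scipy.signal import find_peaks

def detect_events(frame_scores, fps, min_gap_s=0.5, min_height=0.5):
    """Return event timestamps (seconds) at local maxima of the score."""
    scores = np.asarray(frame_scores, dtype=float)
    peaks, _ = find_peaks(scores,
                          height=min_height,              # minimum peak score
                          distance=int(min_gap_s * fps))  # minimum spacing in frames
    return peaks / fps

# Example with a synthetic periodic signal (stand-in for a real score curve).
t = np.arange(0, 10, 1 / 25)                            # 10 s at 25 fps
fake_scores = 0.5 + 0.5 * np.sin(2 * np.pi * 1.2 * t)   # ~1.2 cycles per second
print(detect_events(fake_scores, fps=25))
```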
Task 3 - motion classification
Subtask 3.1 (table tennis) - to classify strokes from trimmed videos in which only one stroke is present. There are 3 categories of strokes: serves, forehands, and backhands, with 6 serve classes and 5 classes each for forehands and backhands, giving a total of 16 stroke classes plus one non-stroke class.
Subtask 3.2 (swimming) - to classify the swimming style (Freestyle, Backstroke, Breaststroke, Butterfly).
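For the classification subtasks, a simple starting point is a 3D-CNN clip classifier with one output per class (16 stroke classes plus non-stroke for Subtask 3.1, or 4 style classes for Subtask 3.2). The backbone, clip shape, and random input below are illustrative assumptions, not the task's official baseline.

```python
# Illustrative sketch only: a 3D-CNN clip classifier.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

NUM_CLASSES = 17                      # 16 stroke classes + 1 non-stroke (Subtask 3.1)

model = r3d_18(weights=None)          # optionally load Kinetics-pretrained weights
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# A trimmed clip as a tensor: (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 16, 112, 112)
logits = model(clip)
print(logits.argmax(dim=1))           # predicted class index for the clip
```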
Task 4 - field/table registration
Subtask 4.1 (table tennis) - to detect the table position for a given video frame.
Subtask 4.2 (swimming) - to detect the pool position for a given video frame.
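Registration in Task 4 amounts to estimating a homography between points in the image (e.g., table or pool corners) and their coordinates in a metric template. In the sketch below, the corner pixel values are made up for illustration; only the 2.74 m x 1.525 m table dimensions are real.

```python
# Illustrative sketch only: image-to-template homography estimation.
import numpy as np
import cv2

# Four table corners as seen in the image (pixels), hypothetical values.
image_pts = np.array([[412, 310], [870, 305], [955, 520], [330, 530]], dtype=np.float32)

# Corresponding corners in a top-down template of a 2.74 m x 1.525 m table (metres).
world_pts = np.array([[0, 0], [2.74, 0], [2.74, 1.525], [0, 1.525]], dtype=np.float32)

# With only 4 exact correspondences a direct fit suffices; RANSAC helps
# when more, noisier point correspondences are available.
H, _ = cv2.findHomography(image_pts, world_pts)

def to_table_coords(x, y):
    """Map an image pixel into table coordinates with the estimated homography."""
    p = H @ np.array([x, y, 1.0])
    return p[:2] / p[2]

print(to_table_coords(650, 420))
```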
Task 5 - sound detection
Subtask 5.1 (table tennis) - to detect when the ball hits the table or the racket.
Subtask 5.2 (swimming) - to detect the buzzer sound. In swimming races, the start is signalled by a buzzer that informs swimmers they can start.
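Task 5 can be approached as audio onset detection: ball/racket impacts and the start buzzer both produce sharp changes in the soundtrack. The sketch below uses off-the-shelf onset detection from librosa; distinguishing bounce, racket hit, and buzzer would still require a classification step, and the audio file name is hypothetical.

```python
# Illustrative sketch only: locate sharp audio events as onsets.
import librosa

def detect_audio_events(audio_path):
    """Return onset times (seconds) detected in the audio file."""
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    return librosa.onset.onset_detect(y=y, sr=sr, units="time")

# print(detect_audio_events("match_audio.wav"))  # hypothetical file name
```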
Task 6 - score and results extraction
Subtask 6.1 (table tennis) - to recognise the score of the match. In table tennis, the score can be embedded in the broadcast video or shown by referees on scoreboards. When the score is embedded in the video stream, the players' names are also displayed.
Subtask 6.2 (swimming) - to recognise race results. During swimming competitions, the results of each race are displayed on digital boards; the goal is to recognise the characters on these boards to extract the results.
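Task 6 is essentially text recognition on scoreboard or results-board regions. A minimal sketch with off-the-shelf OCR follows; the crop coordinates and file names are hypothetical, and a real submission would first localise the board and likely need a more robust recogniser.

```python
# Illustrative sketch only: OCR on a cropped board region of a frame.
import cv2
import pytesseract

def read_board(frame_bgr, box):
    """OCR a (x1, y1, x2, y2) region of a video frame and return its text."""
    x1, y1, x2, y2 = box
    crop = frame_bgr[y1:y2, x1:x2]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    return pytesseract.image_to_string(gray)

# frame = cv2.imread("frame_000123.png")           # hypothetical frame
# print(read_board(frame, (50, 600, 500, 700)))    # hypothetical board region
```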
The task is of interest to researchers in the areas of machine learning (classification), visual content analysis, computer vision, and sport performance. We explicitly encourage participation from researchers working on computer-aided analysis of sport performance.
Our focus is on recordings made both with widespread, inexpensive video cameras (e.g., GoPro) and with high-quality cameras (e.g., Blackmagic 4K).
Each video has been manually annotated by experts. For event-based annotations, we annotated the moments in the video that are relevant to the event. For positions, we annotated key and intermediate positions of the athletes and relied on interpolation for the remaining frames.
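As an illustration of the interpolation mentioned above, positions between annotated keyframes can be filled in linearly; the keyframe indices and coordinates below are made-up values, not taken from the dataset.

```python
# Illustrative sketch only: linear interpolation between annotated keyframes.
import numpy as np

key_frames = np.array([0, 12, 30])          # annotated frame indices (hypothetical)
key_x = np.array([420.0, 455.0, 512.0])     # annotated x positions in pixels
key_y = np.array([300.0, 298.0, 310.0])     # annotated y positions in pixels

all_frames = np.arange(key_frames[0], key_frames[-1] + 1)
x = np.interp(all_frames, key_frames, key_x)   # interpolated x for every frame
y = np.interp(all_frames, key_frames, key_y)   # interpolated y for every frame

print(list(zip(all_frames[:5], x[:5], y[:5])))
```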
Each task has its own evaluation methodology, which will be provided once the dataset is released.
Please contact the task organizers by email if you have questions (see below).
Contact: romain.vuillemot@ec-lyon.fr