SwimTrack: Swimmers and Stroke Rate Detection in Elite Race Videos

See the MediaEval 2022 webpage for information on how to register and participate.

Task Description

SwimTrack is a series of five multimedia tasks related to the analysis of swimming videos recorded during elite competitions. The tasks involve video, image, and audio analysis and can be tackled independently. When solved together, however, they form a grand challenge: providing sports federations and coaches with novel methods to assess and enhance swimmers’ performance, in particular with respect to stroke rate and stroke length analysis. The five proposed tasks are as follows:

Swimmer Position Detection. In this task, participants are asked to estimate swimmers’ positions in a swimming pool. The input consists of videos in which between one and ten swimming lanes, and hence swimmers, are occupied.
Stroke Rate Detection. Here, participants have to identify swimming stroke events, which are needed to calculate the stroke rate during a race. For Freestyle, Backstroke, and Butterfly, a stroke is triggered when the swimmer’s right hand enters the water. For Breaststroke, a stroke is triggered when the head is at its highest point. Strokes are counted from the end of the underwater phase until the swimmer finishes the race, also excluding the underwater phases that follow turns in races longer than 50m. For this reason, we will provide video clips of cropped swimmers that exclude the underwater phases following dives and turns.
Camera Registration. The races are filmed from the side of the pool or from the stands. Because of the perspective projection onto the image plane, the pool does not appear rectangular and is only partially visible. To compensate for this effect, one can use a homography to create a virtual top view of the pool. In this task, participants have to find the (absolute) homography matrix corresponding to each frame provided in the dataset.
Characters Recognition of Score Boards. The result of each race is displayed on a scoreboard showing the race time, swimmer names, and sometimes additional information (e.g., reaction time). Such scoreboards are usually displayed on a physical LCD screen on the swimming pool wall, or as a digital overlay in the TV broadcast. In this task, the objective is to extract the swimmers’ names, lane numbers, and race results (times) from screenshots of such boards.
Sound detection. Every swimming race starts with a buzzer sound (preceded by the iconic “on your mark” command). Participants have to estimate when this sound occurs in audio files extracted from live videos. A file may or may not contain a buzzer sound, and the buzzer may occur at any time during the recording. This task is far from trivial, as the sound is captured from a long distance and there may be a large amount of background noise.
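For the Camera Registration task, a homography is a 3x3 matrix that maps image pixels to top-view coordinates up to a homogeneous scale factor. The following is a minimal sketch, not the organizers’ method: it estimates such a matrix from four point correspondences with a direct linear transform (fixing the bottom-right entry to 1) and then projects points into the top view. The pixel positions and the 50 m x 25 m pool coordinates in the usage part are hypothetical; since the pool is often only partially visible, any four identifiable, non-collinear landmarks could play the role of the corners.

```python
# Minimal sketch (an assumption, not the official approach): estimate a homography
# H mapping image pixels to pool coordinates in meters from four correspondences,
# then project points into a virtual top view. Pure Python, no NumPy.


def solve_linear(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. three collinear points)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography_from_4_points(src, dst):
    """Direct linear transform with h33 fixed to 1; src/dst are four (x, y) pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, x, y):
    """Project image point (x, y) with H and divide by the homogeneous coordinate."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


if __name__ == "__main__":
    # Hypothetical pixel positions of four pool landmarks in one frame.
    image_points = [(100, 300), (1800, 320), (1900, 900), (20, 880)]
    # Assumed top-view coordinates in meters for a 50 m x 25 m pool.
    pool_points = [(0, 0), (50, 0), (50, 25), (0, 25)]
    H = homography_from_4_points(image_points, pool_points)
    print(apply_homography(H, 960, 600))  # an image point mapped to pool meters
```

With more than four correspondences, a least-squares or robust estimator would usually be preferred; four points are used here only to keep the linear system square.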

Motivation and background

Swimming has a long tradition of being analyzed (e.g., race times, lap times, rankings) thanks to official timing devices. However, there is little information at a more detailed level, i.e., within laps or on swimmers’ speed and real-time motion, except in manually annotated datasets. The goal of the SwimTrack-v1 challenge is to push the envelope of systems that track swimmers’ motion accurately and reliably during elite competitions. The current state of the art in multi-object tracking is limited by the unusual nature of swimmers’ motion and by the heavy noise generated by the water. This first version of the challenge is divided into five independent tasks, each with its own input data, output format, and evaluation metric.

Target group

This task targets computer vision and machine learning scientists, researchers, and students with a particular interest in processing sports-related multimodal content.

Data

The dataset consists of swimming videos recorded during national and international competitions. The videos cover all four swimming styles (Freestyle, Backstroke, Breaststroke, Butterfly) as well as Medley, both genders (female, male), and the principal race lengths (50m, 100m, 200m, 400m) in 50m-long swimming pools. They cover all swimming phases (e.g., standing, diving, underwater, turn, and finish). Camera views range from static wide-angle shots to zoomed and moving shots, captured with various camera types (GoPro 8, Blackmagic Pocket 6K, and Panasonic HC-V750). Resolutions range from HD to 4K, and frame rates vary across recordings from 25 fps to 50 fps. Despite these differences, all videos are provided in the same MP4 format, encoded with the same compression algorithm. We will make available multimedia data from 10 competitions, recorded continuously from a fixed spot in the stands, with or without pan and zoom.

Evaluation methodology

Participants’ proposals will be evaluated once submissions are open on our website. The website will dynamically calculate the score on the TEST dataset of each task. If all tasks have been addressed, a general “grand challenge” score will also be calculated. As stated in the introduction, however, we will limit the number of times participants can submit solutions. The metrics for each task are as follows:

Swimmer Position Detection. Results will be evaluated by measuring the overlap between the participants’ answers and the ground truth using the Average Precision at an IoU threshold of 0.25 (AP25): if a ground-truth box is overlapped by an estimated box with an Intersection over Union (IoU) ratio greater than 0.25, it is counted as positive; otherwise, it is counted as negative. AP25 is the ratio #Positives / (#Positives + #Negatives) computed over the whole dataset.
Stroke Rate Detection. Evaluation will be performed using the commonly used Off-By-One Accuracy (OBOA) [2], i.e., the proportion of videos in the dataset whose number of strokes is estimated correctly, up to a tolerated error of one stroke.
Camera Registration. The precision of the projection is measured using the IoU between the ground-truth top view and the estimated one. Two metrics will be used: IoU_part, which compares only the visible parts of the pool in the top views, and IoU_whole, which uses the whole pool, i.e., including the parts that lie outside the camera’s field of view.
Characters Recognition of Score Boards. The precision of swimmer names will be measured using the edit distance between the prediction and the ground truth, while race results will be compared using the mean absolute error (MAE), i.e., the average absolute time difference between the prediction and the ground truth.
Sound detection. Evaluation will be based on a precision-recall curve obtained by counting correct and missed detections while varying the tolerated absolute time difference between the predicted moments and the ground truth.
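The Swimmer Position Detection metric can be sketched as follows, reading its description literally: each ground-truth box counts as a positive if at least one predicted box in the same frame overlaps it with IoU > 0.25, and as a negative otherwise. Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples keyed by a hypothetical frame identifier. Unlike COCO-style AP, this literal reading ignores confidence scores and does not penalize predicted boxes that match nothing, so the official evaluation script may differ.

```python
# Sketch of AP25 as worded in the task description (not the official script).


def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ap25(gt_by_frame, pred_by_frame, threshold=0.25):
    """#Positives / (#Positives + #Negatives) over all ground-truth boxes."""
    positives = negatives = 0
    for frame_id, gt_boxes in gt_by_frame.items():
        preds = pred_by_frame.get(frame_id, [])
        for gt in gt_boxes:
            # A ground-truth box is positive if any prediction overlaps it enough.
            if any(iou(gt, p) > threshold for p in preds):
                positives += 1
            else:
                negatives += 1
    total = positives + negatives
    return positives / total if total else 0.0


if __name__ == "__main__":
    gt = {"frame_0001": [(0, 0, 10, 10), (20, 0, 30, 10)]}
    pred = {"frame_0001": [(1, 0, 11, 10), (28, 0, 38, 10)]}
    print(ap25(gt, pred))  # 0.5: the second box only reaches IoU = 20/180
```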
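The Off-By-One Accuracy used for Stroke Rate Detection reduces to a simple count. This sketch assumes both ground truth and predictions are dicts mapping a hypothetical clip identifier to a stroke count, and treats a clip with no prediction as a miss (also an assumption).

```python
# Sketch of Off-By-One Accuracy (OBOA) for stroke counts.


def off_by_one_accuracy(true_counts, pred_counts):
    """Fraction of clips whose predicted stroke count is within one of the truth."""
    if not true_counts:
        raise ValueError("no ground-truth clips")
    hits = 0
    for clip_id, true_n in true_counts.items():
        pred_n = pred_counts.get(clip_id)
        # Missing predictions count as misses (an assumption of this sketch).
        if pred_n is not None and abs(pred_n - true_n) <= 1:
            hits += 1
    return hits / len(true_counts)


if __name__ == "__main__":
    truth = {"clip_a": 20, "clip_b": 15, "clip_c": 30, "clip_d": 12}
    preds = {"clip_a": 21, "clip_b": 15, "clip_c": 27}
    print(off_by_one_accuracy(truth, preds))  # 0.5
```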
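The two scoreboard metrics can be sketched as a Levenshtein edit distance on names and a mean absolute error on race times. The time format is an assumption: strings such as "1:02.34" (minutes:seconds) or "25.31" (seconds only); the answer format expected by the organizers may differ, and no case or accent normalization is applied to names here.

```python
# Sketch of the scoreboard metrics: edit distance on names, MAE on race times.


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def race_time_to_seconds(text):
    """Parse "m:ss.cc" or "ss.cc" into seconds (assumed scoreboard format)."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def time_mae(true_times, pred_times):
    """Mean absolute difference in seconds between paired race times."""
    if len(true_times) != len(pred_times) or not true_times:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [abs(race_time_to_seconds(t) - race_time_to_seconds(p))
             for t, p in zip(true_times, pred_times)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    print(levenshtein("DUPONT", "DUP0NT"))  # 1 (letter O read as digit 0)
    print(time_mae(["1:02.34", "25.31"], ["1:02.44", "25.31"]))
```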
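For Sound detection, one point of the precision-recall curve can be computed per tolerated time difference. This sketch assumes ground truth and predictions are dicts mapping an audio file name (hypothetical) to a list of buzzer times in seconds, empty when a file has no buzzer, and matches each prediction to at most one ground-truth buzzer, greedily, by smallest time difference. The matching rule and the handling of empty denominators are assumptions, not the official protocol.

```python
# Sketch of tolerance-based precision/recall for buzzer detection.


def count_matches(true_times, pred_times, tol):
    """Greedy one-to-one matching of predictions to true events within tol seconds."""
    unmatched = list(true_times)
    matches = 0
    for p in sorted(pred_times):
        candidates = [t for t in unmatched if abs(p - t) <= tol]
        if candidates:
            unmatched.remove(min(candidates, key=lambda t: abs(p - t)))
            matches += 1
    return matches


def precision_recall(gt_by_file, pred_by_file, tol):
    tp = fp = fn = 0
    for name in set(gt_by_file) | set(pred_by_file):
        gt = gt_by_file.get(name, [])
        pred = pred_by_file.get(name, [])
        m = count_matches(gt, pred, tol)
        tp += m
        fp += len(pred) - m  # predictions with no buzzer within tol
        fn += len(gt) - m    # missed buzzers
    # Convention (an assumption): an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pr_curve(gt_by_file, pred_by_file, tolerances):
    """One (tolerance, precision, recall) point per tolerated time difference."""
    return [(tol, *precision_recall(gt_by_file, pred_by_file, tol))
            for tol in tolerances]


if __name__ == "__main__":
    gt = {"race_a.wav": [12.0], "race_b.wav": [], "race_c.wav": [3.0]}
    pred = {"race_a.wav": [12.3], "race_b.wav": [40.0], "race_c.wav": [5.0]}
    for tol, p, r in pr_curve(gt, pred, [0.1, 0.5, 2.0]):
        print(f"tol={tol:.1f}s precision={p:.2f} recall={r:.2f}")
```

Widening the tolerance can only turn misses into matches, so both precision and recall are non-decreasing along the curve in this sketch.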

Quest for insight

Here are several research questions related to this challenge that participants can strive to answer in order to go beyond just looking at the evaluation metrics:

Is stroke rate constant within or between laps?
To increase their speed, do swimmers increase their stroke rate or their stroke length?
How do swimmers change their swimming strategy to cope with various constraints (e.g., fatigue, tight competition, bad start)?
Are there typical stroke rate profiles among swimmers, e.g., can swimmers be categorized based on their stroke rate?
How do a swimmer’s stroke rate and stroke length profiles evolve over their career?
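To start on the first two questions, stroke rates can be derived from detected stroke events. This sketch assumes events are given as frame indices and converts them to seconds with each recording’s frame rate, since rates vary between 25 and 50 fps. Under the stroke definitions of the Stroke Rate Detection task (right-hand entry, or highest head position in Breaststroke), two consecutive events bound one full stroke cycle. Stroke length is then speed divided by stroke frequency; the distance and duration of a segment would have to come from elsewhere, e.g., position detection combined with camera registration (an assumption).

```python
# Sketch: stroke rate (strokes per minute) and stroke length from stroke events.


def frames_to_seconds(frame_indices, fps):
    """Convert frame indices to timestamps in seconds."""
    return [f / fps for f in frame_indices]


def instantaneous_stroke_rates(times_s):
    """Stroke rate of each cycle between consecutive events, in strokes/min."""
    return [60.0 / (t1 - t0) for t0, t1 in zip(times_s, times_s[1:])]


def mean_stroke_rate(times_s):
    """Average stroke rate over a segment: (#events - 1) cycles / elapsed time."""
    if len(times_s) < 2:
        raise ValueError("need at least two stroke events")
    return 60.0 * (len(times_s) - 1) / (times_s[-1] - times_s[0])


def stroke_length(distance_m, duration_s, rate_spm):
    """Distance per stroke cycle in meters: speed (m/s) / stroke frequency (1/s)."""
    return (distance_m / duration_s) / (rate_spm / 60.0)


if __name__ == "__main__":
    events = [0, 60, 120, 180, 236]  # hypothetical frame indices at 50 fps
    times = frames_to_seconds(events, fps=50)
    print(instantaneous_stroke_rates(times))  # per-cycle profile within a lap
    print(mean_stroke_rate(times))
```

Plotting the per-cycle rates against time within each lap is one simple way to check whether stroke rate stays constant or drifts.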

Participant information

Please contact your task organizers with any questions on these points.

Signing up: Fill in the registration form and fill out and return the usage agreement.
Making your submission: To be announced (check the task README).
Preparing your working notes paper: Instructions on preparing your working notes paper can be found in MediaEval 2022 Working Notes Paper Instructions.

References and recommended reading

[1] Nicolas Jacquelin, Romain Vuillemot, and Stefan Duffner. 2021. Detecting Swimmers in Unconstrained Videos with Few Training Data. 8th Workshop on Machine Learning and Data Mining for Sports Analytics (Sept. 2021).

[2] T. F. H. Runia, C. G. M. Snoek, and A. W. M. Smeulders. 2018. Real-World Repetition Estimation by Div, Grad and Curl. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9009–9017.

[3] Timothy Woinoski, Alon Harell, and I. Bajić. 2020. Towards Automated Swimming Analytics Using Deep Neural Networks. ArXiv (2020).

Task organizers

Nicolas Jacquelin, École Centrale de Lyon, LIRIS
Théo Jaunet, INSA-Lyon, LIRIS
Romain Vuillemot, École Centrale de Lyon, LIRIS
Stefan Duffner, INSA-Lyon, LIRIS

Contact: romain.vuillemot (at) ec-lyon.fr

Task Schedule

July 31st 2022: Data release
November 2022: Runs due and results returned. Exact dates to be announced.
28 November 2022: Working notes paper
12-13 January 2023: 13th Annual MediaEval Workshop, co-located with MMM 2023 in Bergen, Norway, and also online.