Medical Multimedia Task - Transparent Tracking of Spermatozoa

See the MediaEval 2023 webpage for information on how to register and participate.

Task description

The 2023 Medico task tackles the challenge of tracking sperm cells in video recordings of spermatozoa. The provided development dataset contains 20 videos which have frame-by-frame bounding box annotations, each one 30 seconds long, as well as a set of sperm characteristics (hormone levels, fatty acid data, etc.), anonymized study participant data, and motility and morphology data aligned with WHO guidelines. The goal is to inspire task participants to track individual sperm cells in real time and integrate various data sources to predict standard measurements used for sperm quality assessment, specifically the motility (movement) of spermatozoa (living sperm cells).

We hope that this task will motivate the multimedia community to assist in the advancement of computer-assisted reproductive health and to devise innovative methods for analyzing multimodal datasets. In addition to effective analysis, the efficiency of the algorithms is crucial due to the real-time nature of the sperm assessment, which necessitates immediate feedback.

For the task, we will provide a dataset of videos and other data from 20 different patients. Based on this data, the participants will be asked to solve the following three subtasks:

Sperm detection and tracking : This task aims to automatically localize and track all sperm cells in a given video. In order to develop medical applications in the real world, this task focuses on both the prediction accuracy and efficiency (i.e., processing time) of proposed solutions. Particularly, for a given video of microscopic sperm examinations in which sperms are manually annotated by experts, participants are required to detect sperm cells on all video frames and track them. Tracking should be performed by predicting bounding box coordinates with the similar format to the bounding box coordinates provided with the development datasets.
Efficient detection and tracking: This task is very similar to Task 1 but has a larger focus on the efficiency of the system and not just the final predictions. Therefore, frames per second is an important factor to measure. To evaluate the efficiency of solutions, participants also need to report FPS and FLOPS of batch size of 1 during the inference state. To achieve the goal, participants expectedly develop methods with high prediction accuracy and fast inference times.
Prediction of motility: in terms of the percentage of progressive and non-progressive spermatozoa is the second task. The prediction needs to be performed sample wise resulting in one value per sample per predicted attribute. Sperm tracking or bounding boxes predicted in task 1 are required to use to solve the task. Motility is the ability of an organism to move independently, and where a progressive spermatozoon is able to “move forward”, a non-progressive would move in circles without any forward progression.
(Experimental) Predicting motility using graph data structures: We provide graph data extracted from the original VISEM-Tracking dataset. In this task, we ask participants to use these graph data structures as input to a model to predict motility level of sperm samples.

Motivation and background

Manual evaluation of a sperm sample using a microscope is time-consuming and requires costly experts who have extensive training. In addition, the validity of manual sperm analysis becomes unreliable due to limited reproducibility and high inter-personnel variations due to the complexity of tracking, identifying, and counting sperms in fresh samples. The existing computer-aided sperm analyzer systems are not working well enough for application in a real clinical setting due to unreliability caused by the consistency of the semen sample. Therefore, we need to research new methods for automated sperm analysis.

Target group

Through our broad team, we can actively invite people from multiple communities to submit solutions to the proposed task. We strongly believe that a significant fraction of multimedia researchers can contribute to the medical scenario. Therefore, we hope that many people are interested and involved on a personal level supporting a decision to work on the task and try out their ideas. To ensure that young researchers succeed, we will also provide mentoring for students that want to tackle the task (undergraduate and graduate levels are very welcome).

Data

VISEM [2] contains data from 85 male participants aged 18 years or older. For this task, we have selected only 30 seconds video clips from selected 20 videos. For each participant, we include a set of measurements from a standard semen analysis, a video of live spermatozoa, a sperm fatty acid profile, the fatty acid composition of serum phospholipids, study participants-related data, and WHO analysis data. Every frame of videos have the corresponding bounding box coordinates of sperms. Each video has a resolution of 640x480 and runs at 50 frames-per-second. The dataset contains in total six CSV files (five for data and one which maps video IDs to study participants’ IDs), a description file, and folders containing the videos and bounding box data. The name of each video file contains the video’s ID, the date it was recorded, and a small optional description. Then, the end of the filename contains the code of the person who assessed the video. Furthermore, VISEM contains five CSV files for each of the other data provided, a CSV file with the IDs linked to each video, and a text file containing descriptions of some of the columns of the CSV files. One row in each CSV file represents a participant. The provided CSV files are:

semen_analysis_data: The results of standard semen analysis.
fatty_acids_spermatozoa: The levels of several fatty acids in the spermatozoa of the participants.
fatty_acids_serum: The serum levels of the fatty acids of the phospholipids (measured from the blood of the participant).
sex_hormones: The serum levels of sex hormones measured in the blood of the participants.
study_participant_related_data: General information about the participants such as age, abstinence time, and Body Mass Index (BMI).
videos: Overview of which video file belongs to what participant.

In addition to the main dataset, VISEM-Tracking [1], we provide an additional graph dataset which was extracted from the VISEM-Tracking. More details about this graph dataset can be found here: https://huggingface.co/datasets/SimulaMet-HOST/visem-tracking-graphs.

Ground truth

The ground truth data provided in this task were prepared by expert computer scientists and verified by domain experts. Tracking ground truth uses the YOLO format while the motility ground truth is a CSV file containing the motility values.

Evaluation methodology

For Task 1 and Task 2 will be evaluated using standard detection and tracking metrics. For detection, this includes precision, recall, mAP@50, and mAP@50-95. For tracking we use Jonathan Luiten’s TrackEval library, which includes HOTA and other MOT evaluation metrics. Efficiency will be evaluated based on the number of samples that can be processed per second. Task 1 will only focus on the prediction metrics, while Task 2 will be weighted by the speed of the system.

For Task 3 and Task 4, we can use Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE) to evaluate the predictions.

The evaluation scripts are available here: https://github.com/LouisDo2108/MediaEval2023-Medico-EvalScript

Submission Instructions

Task 1:

All files of Task 1, as detailed below, should be compressed into a single .zip file and uploaded to the designated section for Task 1 on the submission form.

source_code
   	|- code_and_checkpoints
|- README.txt (must explain how to run your model to detect sperms on a new video)
	|- run.sh (shell script file to run your models for new video inputs (.mp4))

predictions
|- <test_video_ id>
	|- labels
             |- <video id>_frame_0.txt
             |- <video id>_frame_1.txt
             |- <video id>_frame_2.txt
              ...
 
    |-labels_ftid (optional) # labels with unique feature IDs to track them via multiple frames
             |- <video id>_frame_0.txt with tracking IDs.
             |- <video id>_frame_1.txt with tracking IDs.
             |- <video id>_frame_2.txt with tracking IDs.
             ...
     	|- <video id>.mp4 (showing sperm detection information)
  	|- <video id>_tracking.mp4 (showing sperm tracking information) - optional
	|- <video id>_detection.json
	|- tracking (same structure as this folder “VISEM_Tracking_Train_v4\trackeval_MOT\trackers\mot_challenge\MOT17-test\<any_name_you_want>\data” in the provided example  file (https://drive.google.com/file/d/1nSsQbAMxCmZoLeEAQwVLYVbQA7zq2WQG/view?usp=sharing))

|- …

The ‘predictions’ folder should also contain the result JSON from the detection subtask and the optional tracking subtask as described in https://github.com/LouisDo2108/MediaEval2023-Medico-EvalScript#subtask-1-sperm-detection-and-tracking. The .txt files are also required in the case the JSONs are ill-formatted. This is an example format for the

[
    {
        "bbox": [
            404.25,
            260.75,
            19.0,
            18.5
        ],
        "category_id": 0,
        "image_id": 1,
        "score": 0.84277
    },
    ...
]

Make sure the “image_id” matches the annotations/Train.json or annotations/Val.json in the provided files. We provide an example detection_example_prediction.json in the example data.

Task 2

All files for Task 2, as described below, should be compressed into a single .zip file and uploaded to the designated section for Task 2 on the submission form. We will execute your final model using our hardware resources; therefore, please ensure that your run.sh script functions correctly.

source_code
   	|- code_and_checkpoints
|- README.txt (must explain how to run your model to detect sperms on a new video)
	|- run.sh (shell script file to run your models for new video inputs (.mp4))

predictions
|- <test_video_ id>
	|- labels
             |- <video id>_frame_0.txt
             |- <video id>_frame_1.txt
             |- <video id>_frame_2.txt
              ...
 
    |-labels_ftid (optional) # labels with unique feature IDs to track them via multiple frames
             |- <video id>_frame_0.txt with tracking IDs.
             |- <video id>_frame_1.txt with tracking IDs.
             |- <video id>_frame_2.txt with tracking IDs.
             ...
     	|- <video id>.mp4 (showing sperm detection information)
  	|- <video id>_tracking.mp4 (showing sperm tracking information) - optional
|- ...

Task 3

We will compare your results with a ground truth file similar to ‘semen_analysis_data_Train.csv.’ Therefore, you are required to predict the following: Progressive Motility (%), Non-Progressive Sperm Motility (%), and Immotile Sperm (%). Please refer to the CSV file to locate these specific columns. Note that the sum of these three values should equal 100%. All files for Task 3, as described below, should be compressed into a single .zip file and uploaded to the designated section for Task 3 on the submission form.

– source_code
	|– code_and_checkpoints
	|– README.txt (must explain how to run your model to predict motility level of a new video)
	|– run.sh (shell script file to run your models for new video inputs (.mp4)) # must work with test video files

– motility_predictions.csv

--------------
Sample format
--------------
ID, Progressive motility (%), Non progressive sperm motility (%), Immotile sperm (%)
1, 25, 75, 25
2, 45, 35, 20
…

The motility_predictions.csv should have the exact columns as the ground-truth and contains only the ID of the test set.

Task 4

This task is experimental in nature. You are required to utilize your prediction models from either Task 1, Task 2, or both. Graph structures can be prepared using the predicted bounding boxes. Sample source code for generating these graphs can be accessed here: https://github.com/vlbthambawita/visem-tracking-graphs . An example video graph structure is available here: https://huggingface.co/datasets/SimulaMet-HOST/visem-tracking-graphs/tree/main/spatial_threshold_0.1/11 . Additional details about the graphs using development data can be found here: https://huggingface.co/datasets/SimulaMet-HOST/visem-tracking-graphs . All files related to Task 4, as detailed below, should be compressed into a single .zip file and uploaded to the designated section for Task 4 on the submission form.

– source_code
	|– code_and_checkpoints
	|- graph_data_stuctures
	|– README.txt (must explain how to run your model to predict motility level of a new video using graph stuctures)
	|– run.sh (shell script file to run your models for new video graph structures) 
– motility_predictions.csv

--------------
Sample format
--------------
ID, Progressive motility (%), Non progressive sperm motility (%), Immotile sperm (%)
1, 25, 75, 25
2, 45, 35, 20
…

Google form for submission: https://forms.gle/5EYy2zVrjhbh9ZzU8

Quest for insight

How accurate are deep learning methods for identifying sperms in a fresh sample?
Will continued tracking of sperm help to analyze the motility level of sperm samples?
How do we calculate the average speed of moving sperms, and how to track the fastest one among many moving sperms?
How can we convince doctors about the accuracy, reliability, and trustworthiness of the output of Deep Learning methods?
Comprehensive description of pre- and post-processing methods.
Explain why specific processing strategies were chosen and what insights were used to consider them.

Participant information

Please contact your task organizers with any questions on these points.

Signing up: Fill in the registration form and fill out and return the usage agreement.
Making your submission: See the submission requirements in Medicos official GitHub repository.
Preparing your working notes paper: Instructions on preparing you working notes paper can be found in MediaEval 2023 Working Notes Paper Instructions.

References and recommended reading

[1] Thambawita, V., Hicks, S.A., Storås, A.M. et al. VISEM-Tracking, a human spermatozoa tracking dataset. Sci Data 10, 260 (2023). https://doi.org/10.1038/s41597-023-02173-4

[2] Trine B. Haugen, Steven A. Hicks, Jorunn M. Andersen, Oliwia Witczak, Hugo L. Hammer, Rune Borgli, Pål Halvorsen, and Michael Riegler. 2019. VISEM: a multimodal video dataset of human spermatozoa. The Proceedings of the 10th ACM Multimedia Systems Conference (MMSys ‘19). Association for Computing Machinery, New York, NY, USA, 261–266. DOI:https://doi.org/10.1145/3304109.3325814

[3] Hicks, S.A., Andersen, J.M., Witczak, O. et al. Machine Learning-Based Analysis of Sperm Videos and Participant Data for Male Fertility Prediction. Sci Rep 9, 16770 (2019). https://doi.org/10.1038/s41598-019-53217-y

[4] Thambawita, V., Halvorsen, P., Hammer, H., Riegler, M., & Haugen, T. B. (2019). Stacked dense optical flows and dropout layers to predict sperm motility and morphology. arXiv preprint arXiv:1911.03086.

[5] Thambawita, V., Halvorsen, P., Hammer, H., Riegler, M., & Haugen, T. B. (2019). Extracting temporal features into a spatial domain using autoencoders for sperm video analysis. arXiv preprint arXiv:1911.03100.

Task organizers

Organizers

Vajira Thambawita, vajira (at) simula.no, SimuaMet
Steven Hicks, steven (at) simula.no, SimulaMet
Andrea Storås andrea (at) simula.no, SimulaMet
Michael Riegler, michael (at) simula.no, SimulaMet
Pål Halvorsen, paalh (at) simula.no, SimulaMet

Co-organizers

Tuan-Luc Huynh, htluc (at) selab.hcmus.edu.vn, University of Science - VNUHCM
Hai-Dang Nguyen, nhdang (at) selab.hcmus.edu.vn, University of Science - VNUHCM
Minh-Triet Tran, tmtriet (at) selab.hcmus.edu.vn, University of Science - VNUHCM
Trung-Nghia Le, ltnghia (at) selab.hcmus.edu.vn, University of Science - VNUHCM

Task schedule

31st August 2023: Development data release
15th October 2023: Test data release
15th November 2023: Runs due. Exact dates to be announced.
8 December 2023: Results returned
15 December 2023: Working notes paper
1-2 February 2024: 14th Annual MediaEval Workshop, Collocated with MMM 2024 in Amsterdam, Netherlands and also online.