See the MediaEval 2023 webpage for information on how to register and participate.
The 2023 Medico task tackles the challenge of tracking sperm cells in video recordings of spermatozoa. The provided development dataset contains 20 videos, each 30 seconds long, with frame-by-frame bounding box annotations, as well as a set of sperm characteristics (hormone levels, fatty acid data, etc.), anonymized study-participant data, and motility and morphology data aligned with WHO guidelines. The goal is to inspire task participants to track individual sperm cells in real time and to integrate various data sources to predict standard measurements used for sperm quality assessment, specifically the motility (movement) of spermatozoa (living sperm cells).
We hope that this task will motivate the multimedia community to assist in the advancement of computer-assisted reproductive health and to devise innovative methods for analyzing multimodal datasets. In addition to effective analysis, the efficiency of the algorithms is crucial due to the real-time nature of the sperm assessment, which necessitates immediate feedback.
For the task, we will provide a dataset of videos and other data from 20 different patients. Based on this data, the participants will be asked to solve the following four subtasks:
Task 1: Sperm detection and tracking in the provided videos.
Task 2: Sperm detection and tracking with an emphasis on efficiency, where the speed of the system is part of the evaluation.
Task 3: Prediction of sperm motility (progressive, non-progressive, and immotile percentages) from the videos and accompanying data.
Task 4: Graph-based prediction of sperm motility using graph structures prepared from the bounding boxes.
Manual evaluation of a sperm sample using a microscope is time-consuming and requires costly experts with extensive training. In addition, manual sperm analysis suffers from limited reproducibility and high inter-observer variation, owing to the complexity of tracking, identifying, and counting sperm cells in fresh samples. Existing computer-aided sperm analyzer systems do not work well enough for real clinical settings because the variable consistency of semen samples makes them unreliable. Therefore, new methods for automated sperm analysis need to be researched.
Through our broad team, we can actively invite people from multiple communities to submit solutions to the proposed task. We strongly believe that a significant fraction of multimedia researchers can contribute to this medical scenario, and we hope that many will take a personal interest in the task and try out their ideas. To help young researchers succeed, we will also provide mentoring for students who want to tackle the task (undergraduate and graduate levels are very welcome).
VISEM [2] contains data from 85 male participants aged 18 years or older. For this task, we have selected 30-second clips from 20 of the videos. For each participant, we include a set of measurements from a standard semen analysis, a video of live spermatozoa, a sperm fatty acid profile, the fatty acid composition of serum phospholipids, study-participant-related data, and WHO analysis data. Every video frame has corresponding bounding box coordinates for the sperm cells. Each video has a resolution of 640x480 and runs at 50 frames per second. The name of each video file contains the video’s ID, the date it was recorded, and an optional short description; the end of the filename contains the code of the person who assessed the video. The dataset further contains six CSV files in total (five with data and one that maps video IDs to study participants’ IDs), a text file describing some of the columns of the CSV files, and folders containing the videos and bounding box data. One row in each CSV file represents a participant. The provided CSV files cover the semen analysis data, the sperm fatty acid profiles, the fatty acid composition of the serum phospholipids, the study participant-related data, and the WHO analysis data.
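As a quick orientation, the tabular data can be joined per participant with pandas. A minimal sketch; only semen_analysis_data_Train.csv is a real file name (it appears in the Task 3 description below), while the mapping file name and the 'ID' column are placeholders:

import pandas as pd

# Only semen_analysis_data_Train.csv is a real file name (see Task 3);
# the mapping file name and the 'ID' column are assumptions.
semen = pd.read_csv("semen_analysis_data_Train.csv")
id_map = pd.read_csv("videos_to_participants.csv")  # hypothetical name

# One row per participant in each CSV, so joining on the participant ID
# links every video to its semen-analysis measurements.
merged = id_map.merge(semen, on="ID", how="left")
print(merged.head())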
In addition to the main dataset, VISEM-Tracking [1], we provide an additional graph dataset extracted from VISEM-Tracking. More details about this graph dataset can be found here: https://huggingface.co/datasets/SimulaMet-HOST/visem-tracking-graphs.
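As a rough illustration of how such graphs can be built from bounding boxes, the sketch below connects detections within one frame whose centers lie closer than a spatial threshold (the folder name spatial_threshold_0.1 in the graph dataset suggests a threshold of 0.1 in normalized coordinates). The exact construction is defined in the linked repository; the function below is an assumption, not the published pipeline.

import networkx as nx

def frame_graph(boxes, spatial_threshold=0.1):
    # boxes: list of (x_center, y_center, width, height) tuples in
    # normalized [0, 1] coordinates, as in YOLO-format label files.
    g = nx.Graph()
    for i, (cx, cy, w, h) in enumerate(boxes):
        g.add_node(i, cx=cx, cy=cy, w=w, h=h)
    # Connect detections whose centers are closer than the threshold.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            dx = boxes[i][0] - boxes[j][0]
            dy = boxes[i][1] - boxes[j][1]
            if (dx * dx + dy * dy) ** 0.5 < spatial_threshold:
                g.add_edge(i, j)
    return g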
The ground truth data provided in this task were prepared by expert computer scientists and verified by domain experts. Tracking ground truth uses the YOLO format while the motility ground truth is a CSV file containing the motility values.
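For reference, a YOLO-format label file contains one line per object. Assuming the usual convention (check the column order against the provided label files), each line holds the class ID followed by the normalized center coordinates and box size:

# <class_id> <x_center> <y_center> <width> <height>, all in [0, 1]
0 0.6316 0.5432 0.0297 0.0385
0 0.1123 0.2210 0.0312 0.0401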
Task 1 and Task 2 will be evaluated using standard detection and tracking metrics. For detection, these include precision, recall, mAP@50, and mAP@50-95. For tracking, we use Jonathan Luiten’s TrackEval library, which includes HOTA and other standard multiple-object tracking (MOT) metrics. Efficiency will be evaluated as the number of samples that can be processed per second. Task 1 will focus only on the prediction metrics, while Task 2 will also be weighted by the speed of the system.
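The official evaluation scripts are linked below; for local sanity checks, the detection metrics can also be reproduced with the pycocotools library, assuming COCO-style ground truth (annotations/Val.json from the provided files) and predictions in the detection JSON format shown in the submission section:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth in COCO format and predictions as a list of
# {"bbox", "category_id", "image_id", "score"} entries.
gt = COCO("annotations/Val.json")
dt = gt.loadRes("detection_example_prediction.json")

ev = COCOeval(gt, dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()  # prints AP@[.50:.95], AP@.50, recall, etc.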
Task 3 and Task 4 will be evaluated using Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE).
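For reference, a minimal NumPy implementation of both metrics (the official scripts linked below are authoritative; note that MAPE is undefined for zero-valued targets):

import numpy as np

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def mape(y_true, y_pred):
    # Assumes no zero targets; add an epsilon if that can occur.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100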
The evaluation scripts are available here: https://github.com/LouisDo2108/MediaEval2023-Medico-EvalScript
All files for Task 1, as detailed below, should be compressed into a single .zip file and uploaded to the designated section for Task 1 on the submission form.
source_code
|- code_and_checkpoints
|- README.txt (must explain how to run your model to detect sperm cells in a new video)
|- run.sh (shell script file to run your models for new video inputs (.mp4))
predictions
|- <test_video_id>
|- labels
|- <video id>_frame_0.txt
|- <video id>_frame_1.txt
|- <video id>_frame_2.txt
...
|- labels_ftid (optional) # labels with unique tracking IDs so sperm can be followed across multiple frames
|- <video id>_frame_0.txt with tracking IDs.
|- <video id>_frame_1.txt with tracking IDs.
|- <video id>_frame_2.txt with tracking IDs.
...
|- <video id>.mp4 (showing sperm detection information)
|- <video id>_tracking.mp4 (showing sperm tracking information) - optional
|- <video id>_detection.json
|- tracking (same structure as this folder “VISEM_Tracking_Train_v4\trackeval_MOT\trackers\mot_challenge\MOT17-test\<any_name_you_want>\data” in the provided example file (https://drive.google.com/file/d/1nSsQbAMxCmZoLeEAQwVLYVbQA7zq2WQG/view?usp=sharing))
|- …
The ‘predictions’ folder should also contain the result JSON from the detection subtask and, optionally, from the tracking subtask, as described in https://github.com/LouisDo2108/MediaEval2023-Medico-EvalScript#subtask-1-sperm-detection-and-tracking. The .txt files are also required in case the JSONs are ill-formatted. This is an example format for the detection JSON:
[
    {
        "bbox": [
            404.25,
            260.75,
            19.0,
            18.5
        ],
        "category_id": 0,
        "image_id": 1,
        "score": 0.84277
    },
    ...
]
Make sure the “image_id” values match those in annotations/Train.json or annotations/Val.json from the provided files. We provide an example detection_example_prediction.json in the example data.
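Because the .txt files serve as a fallback when the JSONs are ill-formatted, it can be convenient to generate the JSON directly from the per-frame label files. A minimal sketch, assuming prediction lines of the form 'class cx cy w h confidence' in normalized coordinates and COCO-style [x, y, width, height] pixel boxes as in the example above; mapping file names to image IDs is left to you and must match the annotations:

import json
from pathlib import Path

W, H = 640, 480  # video resolution given in the dataset description

def yolo_to_detection_json(label_dir, out_path, image_id_of):
    # image_id_of: callable mapping a label file name to the image_id
    # used in annotations/Train.json or annotations/Val.json (assumed).
    results = []
    for txt in sorted(Path(label_dir).glob("*.txt")):
        for line in txt.read_text().splitlines():
            cls, cx, cy, w, h, conf = map(float, line.split())
            results.append({
                "bbox": [(cx - w / 2) * W, (cy - h / 2) * H, w * W, h * H],
                "category_id": int(cls),
                "image_id": image_id_of(txt.name),
                "score": conf,
            })
    Path(out_path).write_text(json.dumps(results, indent=4))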
All files for Task 2, as described below, should be compressed into a single .zip file and uploaded to the designated section for Task 2 on the submission form. We will execute your final model using our hardware resources; therefore, please ensure that your run.sh script functions correctly.
source_code
|- code_and_checkpoints
|- README.txt (must explain how to run your model to detect sperm cells in a new video)
|- run.sh (shell script file to run your models for new video inputs (.mp4))
predictions
|- <test_video_id>
|- labels
|- <video id>_frame_0.txt
|- <video id>_frame_1.txt
|- <video id>_frame_2.txt
...
|- labels_ftid (optional) # labels with unique tracking IDs so sperm can be followed across multiple frames
|- <video id>_frame_0.txt with tracking IDs.
|- <video id>_frame_1.txt with tracking IDs.
|- <video id>_frame_2.txt with tracking IDs.
...
|- <video id>.mp4 (showing sperm detection information)
|- <video id>_tracking.mp4 (showing sperm tracking information) - optional
|- ...
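Since Task 2 is weighted by the speed of the system (measured as samples processed per second), it is worth benchmarking your pipeline the same way before submitting. A minimal sketch, where detect_frame stands in for your own per-frame inference function:

import time
import cv2  # OpenCV, for reading the .mp4 input

def frames_per_second(video_path, detect_frame):
    # Measure end-to-end throughput of detect_frame on one video.
    cap = cv2.VideoCapture(video_path)
    n, start = 0, time.perf_counter()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        detect_frame(frame)
        n += 1
    cap.release()
    return n / (time.perf_counter() - start)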
We will compare your results with a ground truth file similar to ‘semen_analysis_data_Train.csv’. You are therefore required to predict the following: Progressive Motility (%), Non-Progressive Sperm Motility (%), and Immotile Sperm (%); please refer to the CSV file to locate these specific columns. Note that the sum of these three values should equal 100%. All files for Task 3, as described below, should be compressed into a single .zip file and uploaded to the designated section for Task 3 on the submission form.
– source_code
|– code_and_checkpoints
|– README.txt (must explain how to run your model to predict the motility level of a new video)
|– run.sh (shell script file to run your models for new video inputs (.mp4)) # must work with test video files
– motility_predictions.csv
--------------
Sample format
--------------
ID, Progressive motility (%), Non progressive sperm motility (%), Immotile sperm (%)
1, 25, 50, 25
2, 45, 35, 20
…
The motility_predictions.csv file should have exactly the same columns as the ground truth and contain only the IDs of the test set.
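Because the three percentages must sum to 100%, it can help to normalize raw model outputs before writing the CSV. A minimal pandas sketch using the column names from the sample format above:

import pandas as pd

def write_predictions(ids, raw, out_path="motility_predictions.csv"):
    # raw: (n, 3) array-like of unnormalized scores per test-set ID.
    df = pd.DataFrame(raw, columns=[
        "Progressive motility (%)",
        "Non progressive sperm motility (%)",
        "Immotile sperm (%)",
    ])
    df = df.div(df.sum(axis=1), axis=0) * 100  # enforce the 100% constraint
    df.insert(0, "ID", ids)
    df.to_csv(out_path, index=False)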
This task is experimental in nature. You are required to utilize your prediction models from Task 1, Task 2, or both. Graph structures can be prepared from the predicted bounding boxes (see the construction sketch near the dataset description above). Sample source code for generating these graphs can be accessed here: https://github.com/vlbthambawita/visem-tracking-graphs . An example video graph structure is available here: https://huggingface.co/datasets/SimulaMet-HOST/visem-tracking-graphs/tree/main/spatial_threshold_0.1/11 . Additional details about the graphs built from the development data can be found here: https://huggingface.co/datasets/SimulaMet-HOST/visem-tracking-graphs . All files related to Task 4, as detailed below, should be compressed into a single .zip file and uploaded to the designated section for Task 4 on the submission form.
– source_code
|– code_and_checkpoints
|- graph_data_structures
|– README.txt (must explain how to run your model to predict the motility level of a new video using graph structures)
|– run.sh (shell script file to run your models for new video graph structures)
– motility_predictions.csv
--------------
Sample format
--------------
ID, Progressive motility (%), Non progressive sperm motility (%), Immotile sperm (%)
1, 25, 50, 25
2, 45, 35, 20
…
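As one possible, purely hypothetical starting point for learning from these graphs, a small PyTorch Geometric regressor could map node features (e.g., box centers and sizes) to the three motility percentages; nothing below is a prescribed architecture:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class MotilityGNN(torch.nn.Module):
    # Toy graph regressor: per-node box features in, three motility
    # percentages out. Purely illustrative.
    def __init__(self, in_dim=4, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 3)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)  # one vector per graph
        return F.softmax(self.head(x), dim=-1) * 100  # sums to 100%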
Google form for submission: https://forms.gle/5EYy2zVrjhbh9ZzU8
Please contact your task organizers with any questions on these points.
[1] Thambawita, V., Hicks, S.A., Storås, A.M. et al. VISEM-Tracking, a human spermatozoa tracking dataset. Sci Data 10, 260 (2023). https://doi.org/10.1038/s41597-023-02173-4
[2] Trine B. Haugen, Steven A. Hicks, Jorunn M. Andersen, Oliwia Witczak, Hugo L. Hammer, Rune Borgli, Pål Halvorsen, and Michael Riegler. 2019. VISEM: a multimodal video dataset of human spermatozoa. In Proceedings of the 10th ACM Multimedia Systems Conference (MMSys ‘19). Association for Computing Machinery, New York, NY, USA, 261–266. https://doi.org/10.1145/3304109.3325814
[3] Hicks, S.A., Andersen, J.M., Witczak, O. et al. Machine Learning-Based Analysis of Sperm Videos and Participant Data for Male Fertility Prediction. Sci Rep 9, 16770 (2019). https://doi.org/10.1038/s41598-019-53217-y
[4] Thambawita, V., Halvorsen, P., Hammer, H., Riegler, M., & Haugen, T. B. (2019). Stacked dense optical flows and dropout layers to predict sperm motility and morphology. arXiv preprint arXiv:1911.03086.
[5] Thambawita, V., Halvorsen, P., Hammer, H., Riegler, M., & Haugen, T. B. (2019). Extracting temporal features into a spatial domain using autoencoders for sperm video analysis. arXiv preprint arXiv:1911.03100.
Organizers
Co-organizers