See the MediaEval 2026 webpage for information on how to register and participate.
Given a tweet, determine whether it contains an implicit premise, an implicit conclusion, or neither. This is a three-class classification task.
Each tweet receives one of three labels: implicit_premise, implicit_conclusion, or none. An implicit premise is a supporting assumption left unstated that the argument relies on. An implicit conclusion is a claim that follows from the stated premises but is never explicitly made. When neither component is missing, the label is none.
Tweets in the train and dev sets are each annotated by five independent annotators; those in the test set by three. Individual annotator labels — prior to any majority vote — are provided alongside the data, making it possible to treat disagreement as signal rather than noise.
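The individual annotator labels can be aggregated in more than one way. The sketch below is illustrative only: it assumes each tweet's annotations arrive as a plain list of label strings (the actual file format and field names are not specified here), and the function names are hypothetical. It computes a majority-vote label and a soft label distribution that preserves disagreement.

```python
from collections import Counter

LABELS = ("implicit_premise", "implicit_conclusion", "none")


def soft_label(annotations):
    """Return the share of annotators that chose each label."""
    counts = Counter(annotations)
    total = len(annotations)
    return {label: counts[label] / total for label in LABELS}


def majority_label(annotations):
    """Return the most frequent label, or None when the top labels tie."""
    ranked = Counter(annotations).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: resolution is left to the caller (assumption)
    return ranked[0][0]


# Hypothetical annotations from five annotators for one tweet
votes = [
    "implicit_premise",
    "implicit_premise",
    "none",
    "implicit_premise",
    "implicit_conclusion",
]
print(majority_label(votes))
print(soft_label(votes))
```

A soft label of this kind can serve as a training target in place of a single hard label, which is one way of treating disagreement as signal.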
Participants are invited to complete two tasks. They may choose to complete Task 1 alone, but Task 2 can only be attempted after Task 1 has been completed.
Task 1: “Enthymeme Detection” — Detecting the absence or presence of enthymemes in tweets (three-class classification)
Task 2: “Proposition Generation” — For each tweet classified as containing an implicit argument, generate the text of the missing proposition. Task 2 requires prior completion of Task 1, as the predicted label is part of the input.
The input is the tweet text together with its Task 1 label (implicit_premise or implicit_conclusion). The generated sentence should be concise and declarative: it should make the unstated assumption or conclusion fully explicit, as if completing the argument.
Example:
If the tweet contains the following text: “Deterring the plans of illegal people smugglers is essential to controlled immigration. We should support all plans to stop them.”
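The full argument can be reconstructed as:
Premise (stated): “Deterring the plans of illegal people smugglers is essential to controlled immigration.”
Premise (implicit): “Controlled immigration is desirable.”
Conclusion (stated): “We should support all plans to stop them.”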
In this example, the system should output: “Controlled immigration is desirable.”
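Because the Task 1 label is part of the Task 2 input, a system needs some way to pass predictions from one task to the other. The following is a minimal sketch of that hand-off. The `GenerationInput` class, the `build_task2_inputs` function, and the second tweet are hypothetical and used only for illustration; the official data format may differ.

```python
from dataclasses import dataclass

IMPLICIT_LABELS = {"implicit_premise", "implicit_conclusion"}


@dataclass
class GenerationInput:
    tweet: str
    predicted_label: str  # Task 1 output, passed along as Task 2 input


def build_task2_inputs(tweets, predicted_labels):
    """Keep only tweets whose Task 1 prediction is an implicit label."""
    return [
        GenerationInput(tweet, label)
        for tweet, label in zip(tweets, predicted_labels)
        if label in IMPLICIT_LABELS
    ]


tweets = [
    "Deterring the plans of illegal people smugglers is essential to "
    "controlled immigration. We should support all plans to stop them.",
    "A hypothetical tweet with a fully explicit argument.",
]
predictions = ["implicit_premise", "none"]

task2_inputs = build_task2_inputs(tweets, predictions)
print(len(task2_inputs))
print(task2_inputs[0].predicted_label)
```

Tweets predicted as none are dropped here, so any Task 1 error carries over into Task 2.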
Participating teams will write short working-notes papers that are published in the workshop proceedings (optional). We welcome two types of papers: first, conventional benchmarking papers, which describe the methods that teams use to address the task (enthymeme detection and implicit proposition generation) and analyze the results across the constrained and open runs; and second, “Quest for Insight” papers, which address a research question aimed at gaining deeper understanding of implicit argumentation, but do not necessarily present complete task results. Example questions for “Quest for Insight” papers include: How do different annotators interpret implicit premises? What linguistic features best signal the presence of enthymemes?
Enthymemes—arguments with missing components (premises or conclusions)—represent a fundamental challenge in understanding persuasive discourse and argumentation. These implicit arguments are particularly prevalent in social media contexts, where they serve as powerful means of persuasion (Lombardi Vallauri et al., 2020). By leaving key premises/conclusions unstated, enthymemes lead readers to perceive the implicit content as their own reasoning (Reboul, 2011), making them especially effective rhetorical devices.
The significance of detecting and reconstructing enthymemes extends beyond theoretical interest in argumentation theory. Enthymemes facilitate deceptive argumentation and manipulation, and help in spreading disinformation (Lombardi Vallauri et al., 2020). Understanding how implicit premises operate in controversial political discourse is therefore crucial for developing tools to combat misinformation and promote critical thinking.
The task of enthymeme detection can be framed as a binary classification problem: determining whether a given text contains an implicit argument or not. This simple formulation is interesting for several reasons. First, it provides a foundational step for more complex argument mining pipelines—before attempting to reconstruct missing propositions, systems must first identify where implicit argumentation occurs. Second, binary classification allows for systematic investigation of what linguistic and discourse characteristics signal the presence of enthymemes, enabling both interpretable models and empirical validation of theoretical claims about argumentation structure.
However, developing computational systems for enthymeme detection and reconstruction presents considerable challenges. The task is inherently interpretative, involving natural language inference and semantic interpretation where high human disagreement is common (Plank et al., 2014; Aroyo & Welty, 2015). Language tasks of this nature involve interpretation, multiple plausible answers, and indirect meanings (Pavlick & Kwiatkowski, 2019), and relying on a single “correct” label ignores rich variation in human judgments (Uma et al., 2021).
Our approach explicitly acknowledges the interpretative nature of the task by collecting labels from multiple independent annotators per instance (five for the train and dev sets, three for the test set), a design choice that lets us treat human label variation as signal rather than noise. This resource builds on an existing dataset of tweets (Flaccavento et al., 2025) and provides the first annotated dataset specifically designed for investigating enthymemes in controversial political discourse, enabling research into how the discourse characteristics of enthymemes can improve their detection with NLP methods.
This task is relevant to anyone interested in text analysis. We expect it to attract people working in areas such as natural language processing, argument mining, computational linguistics, misinformation detection, and social media analysis, from both academic and industrial settings.
We especially welcome interdisciplinary teams, including researchers from argumentation theory, philosophy, rhetoric, communication studies, political science, and computational social science, as these perspectives are essential for understanding how implicit argumentation influences persuasion, shapes political discourse, and affects the processes by which audiences interpret and reason about controversial topics. Explicit structural modeling, linguistic feature-based approaches, and rule-based systems of all sorts are encouraged.
The dataset consists of tweets annotated by multiple annotators, who judged whether or not each tweet contains an enthymeme. Where an enthymeme is present, the annotators also propose a reconstruction of its implicit and explicit propositional content and argument structure. The tweets are a subset of the tropes dataset by Flaccavento et al. (2025), selected to balance tweets on the topics of immigration in the UK and the COVID-19 vaccine.
Participants are encouraged to visit the Enthymemes Dataset Portal for full documentation on dataset construction, annotation guidelines, the argumentation framework underlying the task, and an interactive explorer to browse and search the data.
The data will be released in three parts (train, dev, and test sets).
⚠️ Participants should be aware that the data contains language hurtful towards immigrants and should be ready for this when reading the data.
Task 1: Performance is measured with macro F1-score. Evaluation is conducted in two modes: a 3-class mode using the full label set (implicit_premise, implicit_conclusion, and none), and a 2-class (merged) mode where implicit_premise and implicit_conclusion are collapsed into a single implicit label, reducing the task to a binary distinction. The 2-class (merged) mode is the primary metric used for ranking in the official evaluation.
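The two evaluation modes can be illustrated with a small, dependency-free sketch. It is not the official scorer. It assumes that labels are merged before scoring and that a class with no gold and no predicted instances scores 0; the official evaluation may handle such cases differently. The function names and the toy labels are hypothetical.

```python
LABELS_3 = ("implicit_premise", "implicit_conclusion", "none")
LABELS_2 = ("implicit", "none")


def merge(label):
    """Collapse both implicit labels into a single 'implicit' class."""
    return "none" if label == "none" else "implicit"


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no gold or predicted items scores 0
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy gold and predicted labels for four tweets
gold = ["implicit_premise", "implicit_conclusion", "none", "none"]
pred = ["implicit_conclusion", "implicit_conclusion", "none", "implicit_premise"]

f1_3class = macro_f1(gold, pred, LABELS_3)
f1_2class = macro_f1([merge(g) for g in gold], [merge(p) for p in pred], LABELS_2)
print(round(f1_3class, 4))
print(round(f1_2class, 4))
```

In this toy case the merged score is higher than the 3-class score, because confusing implicit_premise with implicit_conclusion is no longer counted as an error once the two labels are collapsed.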
Task 2: The generated propositions will be evaluated in two ways. First, BERTScore will be used to compare the reconstructions provided by the annotators with the propositions generated by the participants. Second, a subset of the test set will be sampled and evaluated by hand by experienced human annotators.
Data will be made available as of the 1st of March.
[1] Aroyo, L., & Welty, C. (2015). Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine, 36(1), 15–24.
[2] Flaccavento, A., Peskine, Y., Papotti, P., Torlone, R., & Troncy, R. (2025, January). Automated detection of tropes in short texts. In Proceedings of COLING 2025: 31st International Conference on Computational Linguistics.
[3] Plank, B., Hovy, D., & Søgaard, A. (2014). Linguistically debatable or just plain wrong? In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Vol. 2: Short Papers, pp. 507–511). Association for Computational Linguistics.
[4] Reboul, A. (2011). A relevance-theoretic account of the evolution of implicit communication. Studies in Pragmatics, 13(1).
[5] Uma, A., Fornaciari, T., Hovy, D., Paun, S., Plank, B., & Poesio, M. (2021). Learning from disagreement: A survey. Journal of Artificial Intelligence Research, 72, 1385–1470.
[6] Lombardi Vallauri, E., Baranzini, L., Cimmino, D., Cominetti, F., Coppola, C., & Mannaioli, G. (2020). Implicit argumentation and persuasion: A measuring model. Journal of Argumentation in Context, 9, 95–123.