Häffner and P. von der Maase Present Active Learning Framework at ISA 2026

Machine Learning Developer Sonja Häffner presents recent work on the PRIO-based EdAttack project at ISA 2026.

Columbus, USA: On 23 March, Sonja Häffner and Simon Polichinel von der Maase presented their paper, “From Scarcity to Signal – Combining LLMs, Synthetic Data and an Active Learning Framework for Rare Event Detection”, on the panel “New Data in International Relations” at the International Studies Association’s (ISA) annual convention in Columbus, Ohio. The paper is part of the EdAttack project, led by Gudrun Østby at PRIO, which involves several members of the VIEWS team.

Paper abstract:
Building structured event datasets from text is slow, expensive, and still relies largely on manual labour, especially when the events of interest are rare. While recent advances in NLP, particularly large language models (LLMs), have led to more automated approaches, many of these rely on simple prompting strategies that struggle to capture complex, domain-specific nuances and are often not reproducible. In practice, human expertise remains essential.

In this paper, we present a semi-automated pipeline to construct an event dataset from newspaper articles, focusing on attacks on education as a rare and underreported event type. The pipeline consists of two main components: an event extraction step that structures information from text, and an active learning framework that prioritizes the most informative samples for human annotation. To address extreme class imbalance, we incorporate synthetic data to augment rare event classes. This allows us to make more efficient use of limited labeling resources while improving model performance. Overall, our approach combines LLM-based extraction with human-in-the-loop learning to produce more reliable and scalable event data.
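
For readers new to active learning, the idea of prioritizing "the most informative samples for human annotation" can be illustrated with uncertainty sampling, one common selection strategy. The sketch below is not the authors' implementation, and the abstract does not say which selection criterion the pipeline uses. The function names, article ids, and probabilities are hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` items the model is least sure about.

    `predictions` maps an item id to the model's class probabilities.
    Higher entropy means a less confident prediction.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical classifier output per article:
# [P(no attack), P(attack on education)]
predictions = {
    "article_1": [0.98, 0.02],
    "article_2": [0.55, 0.45],
    "article_3": [0.80, 0.20],
    "article_4": [0.50, 0.50],
}

# With an annotation budget of two, the two most ambiguous articles
# are sent to human annotators first.
queue = select_for_annotation(predictions, budget=2)
print(queue)  # ['article_4', 'article_2']
```

In a full loop, the newly labeled articles would be added to the training set, the model retrained, and the selection repeated, so that annotation effort goes to the cases where a human label is expected to help most.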

Browse the slides from their presentation

Learn more about the EdAttack project and the team behind it