Predicting fatalities using newspaper text

Abstract:

We submit two entries to the challenge, one with country-level and one with grid-cell-level predictions, building on our established ingredients and revised methodologies. Our approach combines topics derived from summarizing a corpus of over 6 million newspaper articles with historical conflict data in a Random Forest framework. For the grid-cell-level predictions, we use the locations detected in our corpus. Because of data availability and structure, we use distinct strategies to generate the country-level and grid-cell-level sample forecasts. At the country level, we sample predicted errors, based on the predicted probability of conflict and the predicted number of fatalities, using a Tweedie distribution. At the grid-cell level, by contrast, we draw samples from percentiles obtained through a Quantile Forest Regression.
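
As an illustration of the grid-cell-level step, the sketch below shows one common way to draw samples from a set of predicted percentiles: inverse-CDF sampling with linear interpolation between them. The abstract does not specify how the draws are made, so the function name, the percentile levels, the example values, and the rounding to non-negative counts are all assumptions made for illustration, not the authors' implementation.

```python
import random
from bisect import bisect_right


def sample_from_percentiles(levels, values, n_samples, seed=0):
    """Draw samples by inverse-CDF interpolation between predicted percentiles.

    levels : increasing probability levels, e.g. [0.0, 0.5, 0.9, 0.99, 1.0]
    values : predicted fatalities at those levels (hypothetical model output)
    """
    rng = random.Random(seed)

    # Enforce non-decreasing quantile values (guards against quantile crossing).
    mono, running = [], float("-inf")
    for v in values:
        running = max(running, v)
        mono.append(running)

    draws = []
    for _ in range(n_samples):
        u = rng.uniform(levels[0], levels[-1])
        # Find the bracket [levels[i], levels[i + 1]] that contains u.
        i = min(bisect_right(levels, u) - 1, len(levels) - 2)
        span = levels[i + 1] - levels[i]
        w = (u - levels[i]) / span if span > 0 else 0.0
        x = mono[i] + w * (mono[i + 1] - mono[i])
        # Assumption: fatalities are reported as non-negative integer counts.
        draws.append(max(0, round(x)))
    return draws


# Hypothetical percentiles for a single grid cell and month.
levels = [0.0, 0.5, 0.9, 0.99, 1.0]
values = [0, 0, 2, 15, 40]
samples = sample_from_percentiles(levels, values, n_samples=1000, seed=0)
```

Interpolating between percentiles keeps every draw inside the predicted range, and the heavy mass at zero in the lower percentiles carries over to the samples.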

Authors:

Alexandra Málaga, Hannes Mueller, Christopher Rauh, and Benjamin Seimon