fatalities002

fatalities002 is the second iteration of the fatalities model. Since its release in early 2023, it has generated monthly predictions for impending state-based conflict across the world up to three years in advance.

Scope and Coverage


Predicted outcome #1

Number of fatalities

Point predictions for the number of fatalities per country-month and PRIO-GRID-month.

Predicted outcome #2

Probability of conflict

Predicted probability of at least 25 battle-related deaths (BRDs) per country-month and at least 1 BRD per PRIO-GRID-month.
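A minimal sketch of how the binary outcome relates to fatality counts, assuming the thresholds stated above. The level keys `"cm"` (country-month) and `"pgm"` (PRIO-GRID-month) and the function `binary_target` are hypothetical names chosen for illustration.

```python
# Hypothetical thresholds per level of analysis, taken from the outcome definition.
THRESHOLDS = {
    "cm": 25,   # at least 25 battle-related deaths per country-month
    "pgm": 1,   # at least 1 battle-related death per PRIO-GRID-month
}


def binary_target(fatalities: int, level: str) -> int:
    """Return 1 if the fatality count meets the level's threshold, else 0."""
    if fatalities < 0:
        raise ValueError("fatality counts cannot be negative")
    return int(fatalities >= THRESHOLDS[level])


print(binary_target(30, "cm"), binary_target(24, "cm"), binary_target(1, "pgm"))
```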

Predicted type(s) of violence

State-based conflict

Per the UCDP definition: inter- or intrastate armed conflicts over government or territory, in which at least one of the warring parties is directly affiliated with the government of a state.

Country-level coverage

Global

The country level of analysis is based on the Gleditsch & Ward (1999) list of independent states, combined with the GIS dataset CShapes that specifies the geographic coverage of the included countries.

Sub-national coverage

Africa + Middle East (0.5°)

The sub-national level of analysis is derived from PRIO-GRID 2.0, a spatial grid of square cells that jointly cover all areas of the world at a resolution of 0.5 × 0.5 decimal degrees, approximately 55 × 55 km at the equator.
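The cell numbering below is a sketch assuming the PRIO-GRID convention as we understand it: 720 columns by 360 rows, with cell 1 at the south-west corner (-90°, -180°) and IDs increasing west to east, then south to north. The function names are hypothetical, and the kilometre figure uses the approximate length of one degree of longitude at the equator (about 111.32 km), which shrinks with the cosine of latitude.

```python
import math

CELL_DEG = 0.5                   # cell size in decimal degrees
N_COLS = int(360 / CELL_DEG)     # 720 columns spanning longitude -180..180
N_ROWS = int(180 / CELL_DEG)     # 360 rows spanning latitude -90..90
KM_PER_DEG_EQUATOR = 111.32      # approx. km per degree of longitude at the equator


def cell_id(lat: float, lon: float) -> int:
    """Map a coordinate to a 1-based cell ID (assumed south-west origin, row-major)."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinate out of range")
    # Clamp so that lat = 90 or lon = 180 fall into the last row/column.
    row = min(int((lat + 90) // CELL_DEG), N_ROWS - 1)
    col = min(int((lon + 180) // CELL_DEG), N_COLS - 1)
    return row * N_COLS + col + 1


def approx_cell_width_km(lat: float) -> float:
    """East-west width of a cell at a given latitude (spherical approximation)."""
    return CELL_DEG * KM_PER_DEG_EQUATOR * math.cos(math.radians(lat))


print(cell_id(-90, -180), cell_id(89.9, 179.9), round(approx_cell_width_km(0), 2))
```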

Lead time

1-36 months

The model generates predictions for each month in a rolling 3-year window.
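As an illustration of the rolling window, the sketch below pairs each step ahead (1 to 36) with the calendar month it targets, given the last month of observed data. The function names and the example date are hypothetical; the sketch works with calendar dates rather than any internal VIEWS month index.

```python
from datetime import date


def add_months(d: date, n: int) -> date:
    """Return the first day of the month n months after d's month."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def forecast_window(last_observed: date, max_steps: int = 36) -> list[tuple[int, date]]:
    """Pair each step ahead s = 1..max_steps with the month it predicts."""
    return [(s, add_months(last_observed, s)) for s in range(1, max_steps + 1)]


window = forecast_window(date(2023, 1, 1))
print(window[0], window[-1])  # steps 1 and 36: February 2023 and January 2026
```

Each monthly update shifts `last_observed` forward by one month, so the whole 36-month window rolls forward with it.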

Update schedule

Monthly

The model generates new predictions each month, based on the most recently available input data.

Codebase

Open code

Source code and additional documentation of this model are available under a CC-BY-NC 4.0 license.

Model documentation series

The Fatalities Models

Paper series documenting the iterative development of the conflict prediction models known as the fatalities models, complete with change histories recording progression through model versions.

Models

Levels of analysis & Dependent variables

Partitioning & Time shifting

Ensembling & Calibration

How Are the Forecasts Generated?

What data informs the model?

Input data (predictors)

The model is informed by publicly available (open-source) time-series data on hundreds of conflict-related indicators, ranging from quickly changing drivers such as conflict history and political context, to slowly changing structural drivers such as democracy indices, the strength of political institutions, demography, and development indices. The model also makes use of climate data and indicators on societal vulnerability to climate extremes, as well as natural and social geography such as terrain type, proximity to natural resources, and distance to urban areas and country borders.

Browse the input data catalogue for the fatalities002 model

Data harmonization

As the raw datasets are ingested into the VIEWS database, the indicators undergo a series of post-processing procedures. To match the units and levels of analysis used by the VIEWS models, sub-national data is aggregated to PRIO-GRID cells, and country-level data is matched to the country identifiers used by the VIEWS system. This harmonizes the input data and allows users to query the database on the country identifier(s) of their choice, for example country names, ISO codes, Gleditsch-Ward country IDs, or the internally assigned VIEWS country IDs. Similarly, all data is aggregated to a monthly or annual temporal resolution.
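The harmonization step can be sketched as a crosswalk from many identifier formats to one canonical key, followed by a monthly sum. In this hypothetical example, Gleditsch-Ward codes serve as the canonical key (475 for Nigeria, 501 for Kenya) purely for illustration; the VIEWS system uses its own internally assigned IDs, and the names `ID_CROSSWALK` and `to_country_month` are invented here. The same pattern applies when point events are aggregated to PRIO-GRID cells.

```python
from collections import defaultdict
from datetime import date

# Hypothetical crosswalk: every accepted identifier (name, ISO alpha-3, GW code)
# resolves to one canonical key. GW codes are used only for illustration.
ID_CROSSWALK = {
    "Nigeria": 475, "NGA": 475, 475: 475,
    "Kenya": 501, "KEN": 501, 501: 501,
}


def to_country_month(events):
    """Sum event-level fatalities into {(gw_code, year, month): total}."""
    totals = defaultdict(int)
    for country, day, deaths in events:
        gw = ID_CROSSWALK[country]  # raises KeyError for an unmatched identifier
        totals[(gw, day.year, day.month)] += deaths
    return dict(totals)


events = [
    ("NGA", date(2023, 3, 2), 10),
    ("Nigeria", date(2023, 3, 20), 5),
    (501, date(2023, 4, 1), 3),
]
print(to_country_month(events))  # {(475, 2023, 3): 15, (501, 2023, 4): 3}
```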

Feature Engineering

Why categorize data into feature sets? Categorizing input variables into feature sets is part of the standard data organization routines in VIEWS and greatly facilitates model development. Amongst other benefits, it allows us to call upon a pre-determined set of features, maintained in a single location, when training our models. This minimizes the risk of human error when compiling the input datasets and simplifies maintenance of the model documentation.
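A minimal sketch of the single-location idea, assuming a plain dictionary registry. The set names and column names are hypothetical placeholders, not the actual VIEWS feature sets; the point is that models request features by set name, and missing columns raise an error instead of silently shrinking the input.

```python
# Hypothetical registry: the one place that defines which columns each set holds.
FEATURE_SETS = {
    "conflict_history": ["ged_sb", "ged_ns", "ged_os"],
    "political_institutions": ["vdem_polyarchy", "vdem_liberal"],
    "development": ["wdi_gdp_pc", "wdi_infant_mortality"],
}


def select_features(row: dict, set_name: str) -> dict:
    """Return only the columns of a registered feature set.

    Raises KeyError for an unknown set or for columns missing from the row,
    so compilation mistakes surface before training.
    """
    columns = FEATURE_SETS[set_name]
    missing = [c for c in columns if c not in row]
    if missing:
        raise KeyError(f"columns missing for {set_name!r}: {missing}")
    return {c: row[c] for c in columns}


row = {"ged_sb": 3, "ged_ns": 0, "ged_os": 1, "wdi_gdp_pc": 1200.0}
print(select_features(row, "conflict_history"))
```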

As a final step, the harmonized indicators are grouped into feature sets based on their overall theme and/or the data provider(s) from which they are derived. The feature sets are documented in the GitHub repository viewsforecasting, together with notes on any transformations applied to the original variables. These transformations, which include temporal and spatial lags, imputation of missing data, and decay functions, among others, are in turn described in a dedicated Jupyter notebook.
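A few of the transformations named above can be sketched on a single monthly series. This is a simplified, hypothetical version: the half-life of 12 months, the threshold of one fatality, and the choice to give months with no prior event a decay value of 0 are assumptions for illustration, not the settings documented in viewsforecasting.

```python
from typing import Optional


def lag(series: list, k: int) -> list:
    """Temporal lag (k >= 0): the value k months earlier, None where no history exists."""
    return ([None] * k + list(series))[: len(series)]


def time_since(series: list, threshold: float = 1) -> list[Optional[int]]:
    """Months since the series last reached `threshold` (None before the first such month)."""
    out, last = [], None
    for t, x in enumerate(series):
        if x >= threshold:
            last = t
        out.append(None if last is None else t - last)
    return out


def decay(months_since: list[Optional[int]], halflife: float = 12.0) -> list[float]:
    """Exponential decay 2**(-t / halflife): 1.0 in an event month, 0.5 one half-life later.

    Assumption: months with no prior event get 0.0.
    """
    return [0.0 if t is None else 2 ** (-t / halflife) for t in months_since]


fatalities = [0, 5, 0, 0, 30]
print(lag(fatalities, 1))                     # [None, 0, 5, 0, 0]
print(time_since(fatalities))                 # [None, 0, 1, 2, 0]
print(decay(time_since(fatalities), 1))       # [0.0, 1.0, 0.5, 0.25, 1.0]
```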

How does the model work?

From feature sets to sub-models

Once the input data has been ingested, post-processed, and organized into feature sets, each feature set is combined with a machine learning algorithm that learns patterns from a subset (partition) of the feature-set data and uses them to predict future outcomes.

These sub-models, or constituent models as they are more commonly called, are specialized in capturing both linear and non-linear patterns within their own theme of conflict drivers. As such, they approach the prediction challenge not only from a distinct thematic perspective but also, through their choice of algorithm, from a different methodological angle.

What machine learning algorithms does the model use?

The fatalities002 model currently employs four different machine learning algorithms to train the sub-models: random forests, gradient boosting, Markov models, and hurdle models. They are discussed in the technical paper on the first iteration of the fatalities model.
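Of these algorithms, the hurdle model has a particularly compact combination rule: one stage estimates the probability that any fatalities occur, a second estimates the expected count given that some occur, and the point prediction is their product, E[y] = P(y > 0) · E[y | y > 0]. The sketch below shows only that rule, with a deliberately naive "fit" that uses the share of non-zero months and the mean of non-zero months. The actual sub-models train classifiers and regressors on the feature sets (possibly on a transformed outcome scale), so treat the names and the fitting step as hypothetical.

```python
from statistics import mean


def hurdle_expectation(p_positive: float, conditional_mean: float) -> float:
    """Two-part (hurdle) point prediction: E[y] = P(y > 0) * E[y | y > 0]."""
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be a probability")
    if conditional_mean < 0:
        raise ValueError("conditional mean of a count cannot be negative")
    return p_positive * conditional_mean


def fit_null_hurdle(history: list[float]) -> tuple[float, float]:
    """Naive two-part fit without predictors: share of non-zero months
    and mean of the non-zero months."""
    if not history:
        raise ValueError("history must not be empty")
    positives = [y for y in history if y > 0]
    p = len(positives) / len(history)
    m = mean(positives) if positives else 0.0
    return p, m


p, m = fit_null_hurdle([0, 0, 0, 10, 30])
print(p, m, hurdle_expectation(p, m))  # 0.4 20 8.0
```

Without predictors, the product reproduces the plain sample mean (here 8), which is a quick sanity check that the two stages decompose the expectation correctly.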

Combining sub-models into ensembles: learning from the ‘wisdom of the crowd’

Much like a crowd tends to be wiser than the individuals composing it, prediction models informed by a number of smaller, specialized sub-models tend to be more robust and to generate more accurate predictions than single models.

The sub-models above are therefore combined into two ensembles, one for each level of analysis. This practice, known as model ensembling, is a core tenet of the VIEWS model.

Two different ensembling techniques are employed by the fatalities002 model:

The country-level ensemble model combines the predictions from each of the sub-models using a genetic algorithm. It assigns different weights to the contribution from each model in order to maximise predictive performance. The weight distribution is optimized for each month ahead that the model seeks to predict, allowing the ensemble to place more emphasis on sub-models that capture quickly changing factors such as conflict history when predicting one or a few months into the future, and give sub-models that capture structural factors more weight when forecasting several years ahead.
The sub-national ensemble model, in turn, uses a simple unweighted average of the sub-model results for each month it seeks to predict. At this level of analysis, it has proven to be as effective as the more complex genetic algorithm used at the country level of analysis.
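A minimal sketch of both ensembling techniques, assuming mean squared error as the fitness criterion and weights constrained to be non-negative and sum to one. The genetic algorithm here (truncation selection, crossover by averaging two parents, Gaussian mutation) is a generic textbook variant chosen for illustration; the operators, loss, and parameters of the VIEWS implementation may differ, and all function names are hypothetical.

```python
import random


def combine(weights, preds):
    """Weighted average: preds[m][i] is sub-model m's prediction for unit i."""
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]


def mse(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def normalize(w):
    """Clip negative weights to zero and rescale so the weights sum to one."""
    w = [max(x, 0.0) for x in w]
    total = sum(w)
    return [x / total for x in w] if total > 0 else [1.0 / len(w)] * len(w)


def genetic_weights(preds, obs, pop_size=40, generations=100, sigma=0.1, seed=0):
    """Search for ensemble weights that minimise MSE with a simple genetic algorithm."""
    rng = random.Random(seed)
    k = len(preds)

    def fitness(w):
        return mse(combine(w, preds), obs)

    # Seed the population with the unweighted average plus random candidates.
    population = [[1.0 / k] * k] + [
        normalize([rng.random() for _ in range(k)]) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        # Truncation selection; survivors carry over unchanged (elitism).
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover by averaging two parents, then Gaussian mutation.
            child = [(x + y) / 2 + rng.gauss(0.0, sigma) for x, y in zip(a, b)]
            children.append(normalize(child))
        population = parents + children
    return min(population, key=fitness)


obs = [0, 10, 20, 5, 0, 40]
preds = [[0, 10, 20, 5, 0, 40], [5, 5, 5, 5, 5, 5], [1, 12, 18, 6, 2, 35]]
weights = genetic_weights(preds, obs)
print(weights, mse(combine(weights, preds), obs))
```

To mirror the country-level setup, `genetic_weights` would be run once per step ahead, with that step's predictions and outcomes, yielding a separate weight vector for each forecast horizon. The sub-national ensemble corresponds to `combine([1 / k] * k, preds)`, the unweighted average that also seeds the initial population; because the best candidates always survive, the optimized weights can never score worse than that average on the data they were fitted to.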

The ensembling techniques above are motivated and described at length in the technical report on the first iteration of the fatalities model.

Data infrastructure

The fatalities models are built on a dedicated data infrastructure called VIEWS3, the third iteration of the back-end system and database supporting the VIEWS system. While parts of the database are restricted to comply with the user licenses of our data providers, the codebase, like our model documentation, is available under an open-source license in a series of GitHub repositories that ensure full transparency of our work. The VIEWS3 infrastructure is provided alongside the command-line interface viewser, which allows users to interact with the VIEWS3 back-end system remotely over the web.