Generating the ViEWS forecasts

Overview of the methodology and forecasting models
ViEWS applies a “divide and conquer” strategy to the forecasting problem. Three outcomes (fatal state-based conflict, non-state conflict, and one-sided violence) are analyzed separately at two levels of analysis, country-month (cm) and PRIO-GRID-month (pgm), all of which are allowed to inform each other.

For each type of violence, the system generates a monthly probabilistic assessment of the likelihood that 25 or more battle-related deaths will occur in a given country and month over a rolling three-year window, and the predicted risk that at least one such fatality will occur per 0.5×0.5 decimal degree PRIO-GRID cell (approximately 55×55 km each) and month. The system also generates a combined forecast of the risk that the thresholds above will be reached in a given country-month or PRIO-GRID-month from any of the three types of violence.

The sections below outline how the ViEWS forecasts are generated, present an overview of the forecasting models, and introduce the model repository.
Spatial coverage: Africa

Outcomes:
 state-based, non-state, and one-sided violence

Spatial units of analysis: 
countries and PRIO-GRID cells

Temporal unit of analysis: 
calendar months

Forecasting window: 
36 months

Update schedule:
 monthly

Identifying and weighting the predictors of conflict

Building the constituent models

Informing the ViEWS forecasts are advanced models that compile, analyse, and evaluate historical time-series data from 1989 up until the month prior to each run of the forecasting system – the monthly production of the ViEWS forecasts. The data cover a multitude of variables that decades of peace research have shown to correlate with political violence, or – conversely – with the lack thereof.

Variables that share a common theme, such as conflict history or different measures of the strength of political institutions, are grouped together into so-called “constituent models”, which are trained and fitted independently.

Within the constituent models, each theme of variables is fed into a number of so-called random forest algorithms – machine learning algorithms that learn from historical observations in order to generate forecasts for the future. The algorithms use a subset of the available data to identify predictors that perform particularly well in predicting conflict for a later subset of the same data. This procedure is repeated multiple times, generating a list of the predictors in each theme that perform well over and over again – even taking into account the prevalence of non-linear relationships and interactive effects amongst the pool of predictors. Along with a calibration procedure, the result from this exercise is used to determine the relative weight that is placed on each variable when generating the constituent model forecasts, and (if needed) to weed out variables that have no bearing on the results.

Forecasting violence with the “wisdom of the crowds”

Compiling the model ensembles

Once the thematic constituent models are trained and fitted, they are combined into broader models known as “ensembles” – a key tenet of the ViEWS system. Much like a crowd is wiser than the single individuals composing it, broader models that make use of forecasts from a number of smaller and specialized models are known to generate more accurate predictions. In addition to the benefits of incorporating multiple themes of conflict predictors and thus becoming more comprehensive forecasting models, ensembles are less sensitive to overfitting and more robust to new data.

The ViEWS forecasts are currently generated by means of two such model ensembles: one that incorporates forecasts from constituent models trained specifically to predict conflict at the country level, and one that is trained for geographically refined locations spanning approximately 55×55 km each (0.5×0.5 degrees). Both ensembles use calendar months as the temporal unit of analysis. They are known as the country-month (cm) ensemble and the PRIO-GRID-month (pgm) ensemble, and each contains a list of constituent models that are interpretable on their own and that have been shown to improve the predictive performance of either one of the two ensembles. 16 models currently meet these criteria for the cm ensemble, and 12 for the pgm ensemble. An overview of these is presented in the model section below; they are described in depth in ViEWS’ Special Data Feature in Journal of Peace Research.

Estimating the model weights

Just as the individual conflict predictors are evaluated in order to single out the most important variables in the constituent model forecasts, the constituent models themselves undergo a weighting procedure upon incorporation into the final ensembles.

Over the course of the research project, ViEWS has tested a number of different techniques to this end. Up until February 2020, simple unweighted model averaging emerged as the preferred solution for both levels of analysis, as this method produced similar results to more complex weighting alternatives. This means that the final ensemble forecasts were estimated as a simple average of the forecasts generated by each of the included constituent models.
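As a minimal sketch of what unweighted model averaging amounts to, the Python snippet below averages predicted probabilities across constituent models for each unit-month. The function name and the probability values are illustrative assumptions, not taken from the ViEWS source code; only the model names are borrowed from the model overview.

```python
from statistics import fmean


def unweighted_ensemble(constituent_forecasts):
    """Return the simple average of the constituent forecasts per unit-month.

    constituent_forecasts maps a model name to a list of predicted
    probabilities, one per unit-month, all listed in the same order.
    (Hypothetical helper, written for illustration only.)
    """
    series = list(constituent_forecasts.values())
    if len({len(s) for s in series}) != 1:
        raise ValueError("all constituent forecasts must cover the same unit-months")
    # zip(*series) groups the predictions for the same unit-month together.
    return [fmean(values) for values in zip(*series)]


# Made-up probabilities for three unit-months from three constituent models.
forecasts = {
    "cflong": [0.10, 0.80, 0.30],
    "acled_violence": [0.20, 0.60, 0.30],
    "vdem_global": [0.30, 0.70, 0.30],
}
ensemble = unweighted_ensemble(forecasts)
```

Every constituent model carries the same weight here, which is what distinguishes this approach from weighted alternatives such as EBMA.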

Following the introduction of a new infrastructure that provides more data for model weighting, ViEWS has however shifted to Ensemble Bayesian Model Averaging (EBMA) for the country-month (cm) level. EBMA allows for inclusion of more models that specialize in subsets of the data, in addition to broader ones, resulting in more accurate forecasts. At the geographic (pgm) level, unweighted model averaging continues to be used, since the EBMA procedure does not yet improve the performance of the forecasting system enough to justify a change.

The two procedures above are discussed at length in the Special Data Feature in Journal of Peace Research. Additional information is also found in Appendix D to that article, available on our publications page.
 

Computing the forecasts

To compute the forecasts, ViEWS makes use of two strategies: dynamic simulation (ds) and one-step-ahead modeling. The former builds on the procedures discussed in Hegre et al. (2013) and Hegre et al. (2016), where it is discussed at length. Both strategies are also discussed in ViEWS’ 2021 Special Data Feature in Journal of Peace Research and its appendices.

Read the 2021 Special Data Feature

ViEWS2020: Revising and evaluating the ViEWS political Violence Early-Warning System

Håvard Hegre, Curtis Bell, Michael Colaresi, Mihai Croicu, Frederick Hoyles, Remco Jansen, Maxine Ria Leis, Angelica Lindqvist-McGowan, David Randahl, Espen Geelmuyden Rød, and Paola Vesco
Journal of Peace Research, Vol 58, Issue 3, 2021
Abstract
This article presents an update to the ViEWS political Violence Early-Warning System. This update introduces (1) a new infrastructure for training, evaluating, and weighting models that allows us to more optimally combine constituent models into ensembles, and (2) a number of new forecasting models that contribute to improve overall performance, in particular with respect to effectively classifying high- and low-risk cases. Our improved evaluation procedures allow us to develop models that specialize in either the immediate or the more distant future. We also present a formal, ‘retrospective’ evaluation of how well ViEWS has done since we started publishing our forecasts from July 2018 up to December 2019. Our metrics show that ViEWS is performing well when compared to previous out-of-sample forecasts for the 2015–17 period. Finally, we present our new forecasts for the January 2020–December 2022 period. We continue to predict a near-constant situation of conflict in Nigeria, Somalia, and DRC, but see some signs of decreased risk in Cameroon and Mozambique.

The forecasting models

An overview of the models currently informing the ViEWS forecasts at the country and sub-national levels of analysis. For detailed descriptions, please see the 2021 Special Data Feature in Journal of Peace Research with appendices above, or the model specifications available in the ViEWS source code. 
OVERVIEW

The country-level models 

The model ensemble trained to predict conflict at the country level consists of 16 smaller forecasting models that are divided into five key themes. 

For additional model specifications, please see the model and feature lists in our source code on GitHub, or the text-based model descriptions with feature importances in the online appendix to our 2021 Special Data Feature in Journal of Peace Research. 

Peace and security

Models informed by numerous measures of conflict and protest history, with data sourced from UCDP, ACLED, and the International Crisis Group. 
All models (9)
The UCDP conflict history model (cflong)
Features capturing different aspects of conflict history per country, as defined and sourced from the UCDP, including time since the last fatal event, which type of violence occurred, and which fatality thresholds were reached (at least 1, 25, 100, or 500 deaths).  

The 25 BRDs onset model
 (onset_24_25_all)
A model trained to predict onset of conflict, as recorded by the UCDP. Onset is defined as the first month that a country reaches or exceeds 25 battle-related deaths (BRDs) over a rolling 24-month window. The model captures all features informing the country-level models.

The country dummy model 
(cdummies)
A model consisting of dummy variables based on a random forest variant of a random effects model (a type of regression model that assumes that the intercepts and/or some of the explanatory variables are random). 

The neighbour history model 
(neibhist)
A model capturing the conflict history in neighbouring countries using a subset of the features from the cflong model. Sourced from UCDP. 

The dynamic simulation models
(ds_25, ds_dummy)
Conflict history models sourced from UCDP that make use of dynamic simulations to generate predictions; one trained on the incidence of conflict with at least one battle-related death (BRD), one using the incidence of at least 25 BRDs, and one using the incidence of 500+ BRDs in a given month from state-based, non-state and one-sided violence together. Sourced from UCDP.  

The ACLED violence model
 (acled_violence)
Variables capturing the recent history of political violence as defined by the UCDP, sourced from ACLED. 

The ACLED protest model
 (acled_protest)
Variables capturing the recent history of protests in each country, sourced from the ACLED dataset.  

The ICG Crisis Watch model
(icgcw)
A model informed by monthly warnings issued by the International Crisis Group’s Crisis Watch. 

Governance

Models capturing the strength of political institutions coupled with comprehensive assessments of levels of democracy. Data is sourced from REIGN and V-Dem. 
All models (3)
The REIGN coups model (reign_coups)
A governance model predominantly informed by the predicted probability of coups from CoupCast (REIGN).  

The global REIGN model
(reign_global)
A global governance model informed by features derived from the monthly Rulers, Election, and Irregular Governance (REIGN) dataset, e.g. information on elections, leader traits, political regime tenures, and coups.  

The political institutions (V-Dem) model 
 (vdem_global)
A political institutions model informed by the Varieties of Democracy (V-Dem) dataset, which describes the political institutions of a country. Key features include physical integrity as a proxy for freedom from political killings and torture by the government, freedom of domestic movement, and indicators for rule of law and access to justice.

Development

Models capturing demographic data from IIASA, as well as development data from the World Development Indicators.  
All models (2)
The demography model (demog)
A development model capturing data on the Shared Socioeconomic Pathways (SSP) that represent socio-economic scenarios consistent with different climate mitigation and adaptation challenges. Data sourced from the IIASA dataset.

The World Development Indicators (WDI) model 
(wdi_global)
A development model broadly capturing the level of development by country, including the quality of infrastructure, economic growth, national debt, education, unemployment, gender equality, health care and provision, agricultural dependence, migration flows, and country size. Sourced from the World Bank’s World Development Indicators.

Climate

A drought model informed by the REIGN dataset.  
All models (1)
The REIGN drought model (reign_drought)
A climate model informed by the precipitation variable built into the REIGN dataset. 

Multi-feature

A multi-feature model trained on global data. 
All models (1)
The global model (all_global)  
A global model informed by all features that are fed into the country-level models, capturing interactions and non-linearities between the different predictors.
OVERVIEW

The sub-national level models 

Still informed by the country-level data (and vice versa), 11 models have been trained specifically to pick up on local variabilities in order to offer more geographically precise predictions of fatal conflict. They have been divided into three key themes.

For additional information, please see the model and feature specifications in our source code on GitHub, or the text-based model descriptions with feature importances in the online appendix to our 2021 Special Data Feature in Journal of Peace Research.

Conflict history

Models informed by various measures of conflict history, including levels of violence and the temporal and spatial proximity to the most recent fatal event.
All models (7)
The geographic UCDP conflict history model (hist_legacy)
A model tracing the conflict history of each geographic grid-cell and its adjacent locations, as incidences of conflict are more likely in locations that have experienced conflict in the past. Sourced from UCDP.

The space-time conflict history model
(sptime)
A geographic-level conflict history model that captures the time since, and distance to, episodes of violence. Sourced from the UCDP.

The 1 and 100 BRDs onset models
(onset24_1_all, onset24_100_all)
Models trained to predict onset of conflict with at least one, or at least 100, battle-related deaths (BRD) in a given geographic location. Onset is defined as the first time a specific grid cell, or its neighbours, reaches the given threshold over a 24-month sliding window. The models use the feature set from the all_themes model, coupled with fatality estimates and conflict event counts related to the sptime model, and a subset of data from the Standardized Precipitation Evapotranspiration Index (SPEI), a water balance index computed from both precipitation and temperature data.

The XGBoost model
(all_gxgb)
A Gradient Boosting Machine (GBM) model using the feature set from the all_themes model, coupled with fatality estimates and conflict event counts related to the sptime model, and a subset of data from the Standardized Precipitation Evapotranspiration Index (SPEI), a water balance index computed from both precipitation and temperature data.

The dynamic simulation models
(ds_jpr2020_dummy, ds_jpr2020_greq_25)
Conflict history models making use of dynamic simulations to generate their forecasts. Trained on the incidence of conflict with at least 1 or at least 25 battle-related deaths (BRDs) in a given month from state-based, non-state, and one-sided violence together. Sourced from UCDP.

Human and natural geography

Models capturing terrain, distance to natural resources, human geography, and local development indicators, sourced from PRIO-GRID.
All models (2)
The natural geography model (pgd_natural)
A natural geography model capturing the spatial distance to exploitable resources such as diamonds and petroleum deposits, as well as data on the main type of land in the given area: cultivated areas, barren, forest, mountains, savanna, shrub, pasture, and urban areas. Sourced from PRIO-GRID.

The social geography model
(pgd_social)
A social geography model capturing a set of human geography features that may affect conflict, such as the distance to the capital, the nearest urban center, and the national border. It also captures grid-level population density and development variables such as local GDP, infant mortality rate, and the share of excluded ethnic groups in each location. Sourced from PRIO-GRID.

Multi-feature, cross-level

A multi-feature model trained on global data, and a cross-level model.
All models (2)
The all themes model (allthemes)
A broad model informed by outcome-specific features from all sub-national models, capturing interactions between different features. 

The cross-level model 
(crosslevel)
A cross-level model that allows the country and sub-national levels of analysis to inform one another.

Model specifications (ViEWS2)

Specifications of all models currently informing the ViEWS forecasts
Detailed information about the current models, such as the model estimators used for each model and the complete model feature list, can be found in our GitHub repository for ViEWS2. Key elements from the repository are linked below.
Please note that the forthcoming release of ViEWS (ViEWS3) contains a brand new infrastructure with new naming procedures. This page will be updated accordingly once it is launched. 

Terminology

Terminology in the model repository

col_outcome

The col_outcome attribute specifies the outcome that a given model serves to predict, i.e. whether a given threshold of fatalities from state-based (sb), non-state (ns), or one-sided (os) violence is surpassed. The outcomes are variations of those recorded in the UCDP-GED dataset. The most common is a dummy encoding of whether 25 or more fatalities will occur from a certain form of violence, to be evaluated against the GED "best estimate" category. This outcome is indicated by the greq_25_ged_best prefix.

cols_features and colsets

The cols_features (column features) attribute specifies the sets of data, or – more specifically – the sets of data columns, that inform a given model. In the ViEWS database, all data entries relating to a given variable are collected in the same data column. Such columns contain either raw data on the given input variable – aggregated to ViEWS' levels of analysis – or data that have been processed by means of a specific modeling strategy. Data columns that share a common theme or data source are then grouped together into sets of columns (colsets), a selection of which each model makes use of. It is this selection of colsets that the cols_features attribute specifies.

How colsets and col_features are named

colsets are named based on their respective themes and/or data sources. Those that are fully derived from the REIGN, V-Dem, or UCDP-GED datasets contain the prefixes reign_, vdem_, or ged_, respectively. Thematically organised colsets with several different data sources are instead named based on their common denominators, such as economy, gender or regime change.

col_features undergo a more structured specification. They are named by combining abbreviations or acronyms that depict their respective components. As a general rule, column names are constructed as follows: [f]_[parameter_1]_[parameter_2]_[col], where:

	f: transformation function name 
	parameter_1: value (optional) 
	parameter_2: value (optional) 
	col: source column 
Transformation functions

The various transformation functions applied by ViEWS are listed below:

  • delta_col: Time delta, computed as col - tlag_1(col).
  • greq_value_col: Greater or equal dummy encoder.
  • smeq_value_col: Smaller or equal dummy encoder.
  • in_range_low_high_col: Dummy encoder for whether col lies in the range from low to high.
  • tlag_time_col: Time lag.
  • tlead: Time lead.
  • ma_time_col: Moving average over time
  • cweq_value_col: Count while col equals value.
  • time_since_col: Time since column != 0. Implemented as time-lag of 1 of count while col equals 0.
  • decay_halflife_col: Exponential decay function.
  • mean_col: Time-invariant mean of col.
  • ln_col: Natural log of col.
  • demean_col: De-meaned values of col. Is col - mean(col).
  • rollmax_window_col: Rolling max of time window.
  • onset_possible_col: Onset is possible if no event occurred in the preceding window time periods.
  • onset_window_col: Onset is 1 if onset is possible and an event occurred, i.e. 1 for the first event in the time window.
  • sum_cols: Sum of columns.
  • product: Product of columns.
  • spdist_col: Spatial distance to closest cell or country where col == 1.
  • stdist_k_tscale_col: Space-time distance to closest k cells or countries where col == 1.
  • splag_first_last_col: Spatial lag. Sum of col for all neighboring geographic units from first to last order neighbor. For example, splag_1_1_ged_dummy_sb is the sum of ged_dummy_sb in immediate neighbors, and splag_1_2_ged_dummy_sb is the sum of ged_dummy_sb in neighboring geographies and their neighbors. splag_2_2 would give a hollow ring of just the neighbors' neighbors, excluding direct neighbors.
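To make the temporal transformations above more concrete, the following Python sketch applies a handful of them to the time series of a single unit (one country or grid cell). It is a literal reading of the one-line descriptions in the list, not the ViEWS implementation. The function signatures, the use of None for undefined values, and the treatment of the start of the series (no events assumed before the first observation) are all assumptions made for illustration.

```python
def tlag(time, col):
    """Time lag: the value of col `time` periods earlier (None where undefined)."""
    return [col[i - time] if i - time >= 0 else None for i in range(len(col))]


def delta(col):
    """Time delta: col - tlag_1(col)."""
    return [None if prev is None else cur - prev
            for cur, prev in zip(col, tlag(1, col))]


def greq(value, col):
    """Greater-or-equal dummy encoder."""
    return [1 if x >= value else 0 for x in col]


def cweq(value, col):
    """Count while col equals value: length of the current run of `value`."""
    out, run = [], 0
    for x in col:
        run = run + 1 if x == value else 0
        out.append(run)
    return out


def time_since(col):
    """Time since col != 0, as a time lag of 1 of count-while-col-equals-0."""
    return tlag(1, cweq(0, col))


def onset_possible(window, col):
    """1 if no event (col != 0) occurred in the preceding `window` periods."""
    return [0 if any(col[max(i - window, 0):i]) else 1 for i in range(len(col))]


def onset(window, col):
    """1 if onset is possible and an event occurred in the current period."""
    return [1 if p and x else 0 for p, x in zip(onset_possible(window, col), col)]


# Made-up monthly fatality counts for one unit.
fatalities = [0, 30, 0, 0, 5, 0]

lagged = tlag(1, fatalities)               # [None, 0, 30, 0, 0, 5]
change = delta(fatalities)                 # [None, 30, -30, 0, 5, -5]
dummy_25 = greq(25, fatalities)            # [0, 1, 0, 0, 0, 0]
since = time_since(fatalities)             # [None, 1, 0, 1, 2, 0]
onsets = onset(3, greq(1, fatalities))     # [0, 1, 0, 0, 0, 0]
```

In the last line, the event in the fifth month is not counted as an onset because another event occurred within the preceding three months.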

The most common transformation functions are the "greater than or equal to" dummy encoder (greq) and various time lags (tlag). The naming convention is that the transform name and parameters are prepended to the column name. transform_a(transform_b(col, params_b), params_a) is for example named transform_a_params_a_transform_b_params_b_col.

Parameters

Parameters are added where the transformation functions require further specifications, such as numerical thresholds for the "greater than or equal to" dummy encoders. When needed, these values are added immediately after the transformation function acronyms.

Source columns

The source columns (col), in turn, are with a few exceptions copied from the original data source. For example, ged_best_sb is a UCDP variable referring to the best estimate (best) of the number of fatalities from state-based (sb) violence in a given time period, as recorded in their GED (ged) dataset.

A variable constructed by the "greater than or equal to" dummy encoder, the source variable ged_best_sb, and 1 fatality as the threshold and property value, would thus become greq_1_ged_best_sb.
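The naming convention can be sketched as a small Python helper that prepends the transformation name and its parameters to the source column. The helper column_name is hypothetical and exists only for illustration; the resulting names follow the [f]_[parameter_1]_[parameter_2]_[col] pattern described above.

```python
def column_name(f, col, *params):
    """Build a column name as [f]_[parameter_1]_[parameter_2]_[col].

    f is the transformation function name, params are its optional
    parameter values, and col is the source column (which may itself
    be the name of an already transformed column).
    """
    return "_".join([f, *(str(p) for p in params), col])


# The "greater than or equal to" dummy encoder with a threshold of 1 fatality.
name = column_name("greq", "ged_best_sb", 1)

# Nesting: a 12-period time lag of the column above, i.e.
# transform_a(transform_b(col, params_b), params_a).
nested = column_name("tlag", name, 12)

# A two-parameter transformation: first- to first-order spatial lag.
spatial = column_name("splag", "ged_dummy_sb", 1, 1)
```

Here name evaluates to greq_1_ged_best_sb, nested to tlag_12_greq_1_ged_best_sb, and spatial to splag_1_1_ged_dummy_sb.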

Ensemble composition

Frequently updated lists of the constituent models currently included in the two model ensembles.

Constituent model specifications

Specifications of all models developed by ViEWS. For a list of the active models in each ensemble, please see Ensemble composition above. 

Model features (variables)

The colsets or “column sets” informing one or more of the constituent models developed by ViEWS. For more information, see Terminology above. 

Data sources

Data informing the models

The input data used in ViEWS are transported into tables in our database, where they are organised by theme and/or data source and prefixed accordingly. The individual sources are described below with their corresponding acronyms in parenthesis.

ACLED (acled_)

ACLED is the Armed Conflict Location & Event Data project. ViEWS recodes ACLED into approximations of UCDP GED categories of violence. ACLED thus exposes 8 primary columns in the ViEWS data:

acled_count_pr: Protest event count
acled_count_sb: State-based violence event count
acled_count_ns: Non-state violence event count
acled_count_os: One-sided violence event count
acled_fat_pr: Protest fatality count
acled_fat_sb: State-based violence fatality count
acled_fat_ns: Non-state violence fatality count
acled_fat_os: One-sided violence fatality count
acled_dummy_[pr, sb, ns, os] are dummy encodings of acled_count_[pr, sb, ns, os].

FVP (fvp_)

A country-year dataset compiled for another project, combining data from V-Dem, WDI, and EPR. Columns prefixed prop_ are from EPR. Columns prefixed ssp2 are from SSP. Auto, demo, electoral, etc. are from V-Dem.

GED (ged_)

The main outcome of ViEWS comes from UCDP-GED. Six main columns are exposed from GED:

ged_best_sb: Best estimate of fatalities for state-based violence.
ged_best_ns: Best estimate of fatalities for non-state violence
ged_best_os: Best estimate of fatalities for one-sided violence
ged_count_sb: Number of events for state-based violence
ged_count_ns: Number of events for non-state violence
ged_count_os: Number of events for one-sided violence

The transform ged_dummy_[sb, ns, os] is a dummy encoding of ged_count_[sb, ns, os].

ICGCW (icgcw_)

The International Crisis Group has an online conflict tracker at https://www.crisisgroup.org/crisiswatch. This is scraped, and updates are encoded in 5 columns:

icgcw_alerts: Appeared in an alert
icgcw_deteriorated: Situation deteriorated
icgcw_improved: Situation improved
icgcw_opportunities: Opportunity spotted
icgcw_unobserved: Country doesn't appear

PRIO-GRID (pgdata_)

PRIO-GRID data is fetched from the PRIO-GRID API at https://grid.prio.org/#/apidocs. For the full codebook, see https://grid.prio.org/#/codebook. 41 columns are exposed from PRIO-GRID with their original names retained. Where a column exists in both a yearly (_y) and a static (_s) version, the two are sometimes combined by taking their MAX().

REIGN (reign_)

The Rulers, Elections, and Irregular Governance (REIGN) dataset. For details, see https://oefdatascience.github.io/REIGN.github.io/.

SPEI (spei_)

SPEI Global Drought Monitor. For details, see https://spei.csic.es/map/maps.html.

VDEM (vdem_)

Varieties of Democracy. Version 10 is currently loaded. For the codebook, see https://www.v-dem.net/en/data/data-version-10/. Columns are loaded from the Country-Year: V-Dem Full+Others file. Columns ending in the following suffixes are currently not included due to memory constraints:

_codehigh
_codelow
_ord
_sd
_mean
_nr
_osp
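As an illustration of how such a suffix-based exclusion could be applied when loading the file, the Python sketch below drops every column whose name ends in one of the suffixes listed above. The helper keep_column and the example column names are assumptions made for illustration; the actual ViEWS loader may work differently.

```python
# Suffixes of V-Dem columns that are currently not loaded (from the list above).
EXCLUDED_SUFFIXES = ("_codehigh", "_codelow", "_ord", "_sd", "_mean", "_nr", "_osp")


def keep_column(name):
    """Return True if the column name does not end in an excluded suffix."""
    # str.endswith accepts a tuple and checks each suffix in turn.
    return not name.endswith(EXCLUDED_SUFFIXES)


# Illustrative column names only.
columns = ["v2x_polyarchy", "v2x_polyarchy_codehigh", "v2x_polyarchy_sd", "v2clrspct_osp"]
kept = [name for name in columns if keep_column(name)]
```

With the example names above, only v2x_polyarchy is kept.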

WDI (wdi_)

World Bank World Development Indicators. Updated as of May 2020. Downloaded from http://databank.worldbank.org/data/download/WDI_csv.zip. For details, see https://databank.worldbank.org/source/world-development-indicators.