VIEWS2 methodology
For each type of violence, the VIEWS2 system generated a monthly probabilistic assessment of the likelihood that 25 or more battle-related deaths would occur in a given country and month over a rolling three-year window, as well as the predicted risk that at least one such fatality would occur per PRIO-GRID cell (0.5×0.5 decimal degrees, approximately 55×55 km each) and month. The system also generated a combined forecast of the risk that these thresholds would be reached in a given country-month or PRIO-GRID-month by any of the three types of violence.
The sections below outline how the VIEWS2 forecasts were generated, present an overview of the forecasting models, and guide you through the model repository.
Outcomes: state-based, non-state, and one-sided violence
Spatial units of analysis: countries and PRIO-GRID cells
Temporal unit of analysis: calendar months
Forecasting window: 36 months
Update schedule: monthly
Identifying and weighting the predictors of conflict
Building the constituent models
Variables that share a common theme, such as conflict history or different measures of the strength of political institutions, are grouped together into so-called “constituent models”, which are trained and fitted independently.
Within the constituent models, each theme of variables is fed into a number of so-called random forest algorithms – machine learning algorithms that learn from historical observations in order to generate forecasts for the future. The algorithms use a subset of the available data to identify predictors that perform particularly well in predicting conflict for a later subset of the same data. They repeat this multiple times, generating a list of the predictors in each theme that consistently perform well – even when taking into account the prevalence of non-linear relationships and interactive effects amongst the pool of predictors. Along with a calibration procedure, the result of this exercise is used to determine the relative weight that is placed on each variable when generating the constituent model forecasts, and (if needed) to weed out variables that have no bearing on the results.
Forecasting violence with the “wisdom of the crowds”
Compiling the model ensembles
The ViEWS forecasts are currently generated by means of two such model ensembles: one that incorporates forecasts from constituent models trained specifically to predict conflict at the country level, and one that is trained for geographically refined locations spanning approximately 55×55 km each (0.5×0.5 degrees). Both ensembles use calendar months as the temporal unit of analysis. They are known as the country-month (cm) ensemble and the PRIO-GRID-month (pgm) ensemble, and each contains a list of constituent models that are interpretable on their own and that have been shown to improve the predictive performance of the ensemble in question. 16 models currently meet these criteria for the cm ensemble, and 12 for the pgm ensemble. An overview of these is presented in the model section below, and they are described in depth in the ViEWS Special Data Feature in Journal of Peace Research.
Estimating the model weights
Over the course of the research project, ViEWS has tested a number of different techniques to this end. Up until February 2020, simple unweighted model averaging emerged as the preferred solution for both levels of analysis, as this method produced similar results to more complex weighting alternatives. This means that the final ensemble forecasts were estimated as a simple average of the forecasts generated by each of the included constituent models.
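Unweighted model averaging as described above can be sketched in a few lines of Python. This is a minimal illustration and not the ViEWS implementation: the function name, the model names, and the probabilities are all hypothetical.

```python
from statistics import fmean


def unweighted_ensemble(forecasts):
    """Average the predicted probabilities of all constituent models.

    forecasts maps a model name to a list of probabilities, one per unit
    (e.g. country-month), with every list in the same unit order.
    """
    lengths = {len(probs) for probs in forecasts.values()}
    if len(lengths) != 1:
        raise ValueError("all models must forecast the same units")
    # zip(*...) regroups the values so each tuple holds one unit's forecasts.
    return [fmean(unit) for unit in zip(*forecasts.values())]


# Hypothetical probabilities for three country-months from three models.
example = {
    "model_a": [0.10, 0.60, 0.02],
    "model_b": [0.20, 0.80, 0.04],
    "model_c": [0.30, 0.70, 0.06],
}
ensemble = unweighted_ensemble(example)
```

Every model contributes equally here, regardless of how well it has performed in the past.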
Following the introduction of a new infrastructure that provides more data for model weighting, ViEWS has however shifted to Ensemble Bayesian Model Averaging (EBMA) for the country-month (cm) level. EBMA allows for the inclusion of more models that specialize in subsets of the data, in addition to broader ones, resulting in more accurate forecasts. At the geographic (pgm) level, unweighted model averaging continues to be used, since the EBMA procedure does not yet improve the performance of the forecasting system enough to justify a change.
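The idea behind performance-based weighting can be illustrated with a strongly simplified sketch. It treats the ensemble as a mixture of the constituent models and estimates the mixture weights with a basic expectation-maximization loop on past binary outcomes. This is an assumption-laden illustration, not the ViEWS procedure: it leaves out any calibration of the individual forecasts that a full EBMA procedure may apply, and all names and data are hypothetical.

```python
def ebma_weights(forecasts, outcomes, iterations=200, eps=1e-6):
    """Estimate mixture weights for binary forecasts (simplified EM sketch).

    forecasts: one list of probabilities per model, aligned with outcomes.
    outcomes: observed 0/1 outcomes for the calibration period.
    """
    n_models, n_obs = len(forecasts), len(outcomes)
    # Likelihood of each observed outcome under each model, clipped away from 0.
    lik = [
        [max(eps, p if y == 1 else 1.0 - p) for p, y in zip(model, outcomes)]
        for model in forecasts
    ]
    weights = [1.0 / n_models] * n_models  # start from equal weights
    for _ in range(iterations):
        sums = [0.0] * n_models
        for i in range(n_obs):
            mix = sum(weights[k] * lik[k][i] for k in range(n_models))
            for k in range(n_models):
                # E-step: share of observation i attributed to model k.
                sums[k] += weights[k] * lik[k][i] / mix
        # M-step: the new weight is the mean share across observations.
        weights = [s / n_obs for s in sums]
    return weights


def weighted_ensemble(forecasts, weights):
    """Combine the model forecasts using the estimated weights."""
    n_obs = len(forecasts[0])
    return [
        sum(w * model[i] for w, model in zip(weights, forecasts))
        for i in range(n_obs)
    ]


# Hypothetical calibration data: six past country-months and two models.
outcomes = [1, 0, 1, 0, 0, 1]
sharp_model = [0.9, 0.1, 0.8, 0.2, 0.1, 0.7]
vague_model = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
weights = ebma_weights([sharp_model, vague_model], outcomes)
combined = weighted_ensemble([sharp_model, vague_model], weights)
```

In this toy data the sharper model fits every past outcome better, so it receives the larger weight. With equal weights the second function reduces to unweighted averaging.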
The two procedures above are discussed at length in the Special Data Feature in Journal of Peace Research. Additional information is also found in Appendix D to that article, available on our publications page.
Computing the forecasts
Read the 2021 Special Data Feature
ViEWS2020: Revising and evaluating the ViEWS political Violence Early-Warning System
The forecasting models in use 2020-2021
The country-level models
For additional model specifications, please see the model and feature lists in our source code on GitHub, or the text-based model descriptions with feature importances in the online appendix to our 2021 Special Data Feature in Journal of Peace Research.
Peace and security
Features capturing different aspects of conflict history per country, as defined and sourced from the UCDP, including time since the last fatal event, which type of violence occurred, and which fatality thresholds were reached (at least 1, 25, 100, or 500 deaths).
The 25 BRDs onset model (onset_24_25_all)
A model trained to predict onset of conflict, as recorded by the UCDP. Onset is defined as the first month that a country reaches or exceeds 25 battle-related deaths (BRDs) over a rolling 24-month window. The model captures all features informing the country-level models.
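One possible reading of this onset definition is sketched below: a month counts as an onset if it reaches the threshold and no month in the preceding window did. This interpretation, the function name, and the fatality counts are assumptions made for illustration. The example uses a three-month window to keep the data short.

```python
def onset_flags(monthly_deaths, threshold=25, window=24):
    """Flag months where the threshold is reached after a quiet window."""
    flags = []
    for t, deaths in enumerate(monthly_deaths):
        reached = deaths >= threshold
        # Months inside the look-back window (fewer at the start of the series).
        previous = monthly_deaths[max(0, t - window):t]
        quiet = all(d < threshold for d in previous)
        flags.append(1 if reached and quiet else 0)
    return flags


# Hypothetical monthly battle-related deaths for one country.
deaths = [0, 30, 40, 0, 0, 0, 26, 0]
flags = onset_flags(deaths, threshold=25, window=3)
```

The second violent month (40 deaths) is not an onset because the previous month already reached the threshold, whereas the month with 26 deaths is, since three quiet months precede it.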
The country dummy model (cdummies)
A model consisting of dummy variables based on a random forest variant of a random effects model (a type of regression model that assumes that the intercepts and/or some of the explanatory variables are random).
The neighbour history model (neibhist)
A model capturing the conflict history in neighbouring countries using a subset of the features from the cflong model. Sourced from UCDP.
The dynamic simulation models (ds_25; ds_dummy)
Conflict history models that make use of dynamic simulations to generate predictions; one trained on the incidence of conflict with at least one battle-related death (BRD), one using the incidence of at least 25 BRDs, and one using the incidence of 500+ BRDs in a given month from state-based, non-state, and one-sided violence together. Sourced from UCDP.
The ACLED violence model (acled_violence)
Variables capturing the recent history of political violence as defined by the UCDP, sourced from ACLED.
The ACLED protest model (acled_protest)
Variables capturing the recent history of protests in each country, sourced from the ACLED dataset.
The ICG Crisis Watch model (icgcw)
A model informed by monthly warnings issued by the International Crisis Group’s Crisis Watch.
Governance
A governance model predominantly informed by the predicted probability of coups from CoupCast (REIGN).
The global REIGN model (reign_global)
A global governance model informed by features derived from the monthly Rulers, Election, and Irregular Governance (REIGN) dataset, e.g. information on elections, leader traits, political regime tenures, and coups.
The political institutions (V-Dem) model (vdem_global)
A political institutions model informed by the Varieties of Democracy (V-Dem) dataset, which describes the political institutions of a country. Key features include physical integrity as a proxy for freedom from political killings and torture by the government, freedom of domestic movement, and indicators for rule of law and access to justice.
Development
A development model capturing data on the Shared Socioeconomic Pathways (SSP) that represent socio-economic scenarios consistent with different climate mitigation and adaptation challenges. Data sourced from the IIASA dataset.
The World Development Indicators (WDI) model (wdi_global)
A development model broadly capturing the level of development by country, including the quality of infrastructure, economic growth, national debt, education, unemployment, gender equality, health care and provision, agricultural dependence, migration flows, and country size. Sourced from the World Bank’s World Development Indicators.
Climate
A climate model informed by the precipitation variable built into the REIGN dataset.
Multi-feature
A global model informed by all features that are fed into the country-level models, capturing interactions and non-linearities between the different predictors.
The sub-national level models
For additional information, please see the model and feature specifications in our source code on GitHub, or the text-based model descriptions with feature importances in the online appendix to our 2021 Special Data Feature in Journal of Peace Research.
Conflict history
A model tracing the conflict history of each geographic grid-cell and its adjacent locations, as incidences of conflict are more likely in locations that have experienced conflict in the past. Sourced from UCDP.
The space-time conflict history model (sptime)
A geographic-level conflict history model that captures the time since, and distance to, episodes of violence. Sourced from the UCDP.
The 1 and 100 BRDs onset models (onset24_1_all, onset24_100_all)
Models trained to predict onset of conflict with at least one, or at least 100, battle-related deaths (BRD) in a given geographic location. Onset is defined as the first time a specific grid cell, or its neighbours, reaches the given threshold over a 24-month sliding window. The models use the feature set from the all_themes model, coupled with fatality estimates and conflict event counts related to the sptime model, and a subset of data from the Standardized Precipitation Evapotranspiration Index (SPEI), a water balance index computed from both precipitation and temperature data.
The XGBoost model (all_gxgb)
A Gradient Boosting Machine (GBM) model using the feature set from the all_themes model, coupled with fatality estimates and conflict event counts related to the sptime model, and a subset of data from the Standardized Precipitation Evapotranspiration Index (SPEI), a water balance index computed from both precipitation and temperature data.
The dynamic simulation models (ds_jpr2020_dummy, ds_jpr2020_greq_25)
Conflict history models making use of dynamic simulations to generate their forecasts. Trained on the incidence of conflict with at least 1 or at least 25 battle-related deaths (BRDs) in a given month from state-based, non-state, and one-sided violence together. Sourced from UCDP.
Human and natural geography
A natural geography model capturing the spatial distance to exploitable resources such as diamonds and petroleum deposits, as well as data on the main type of land in the given area: cultivated areas, barren, forest, mountains, savanna, shrub, pasture, and urban areas. Sourced from PRIO-GRID.
The social geography model (pgd_social)
A social geography model capturing a set of human geography features that may affect conflict, such as the distance to the capital, the nearest urban center, and the national border. It also captures grid-level population density and development variables such as local GDP, infant mortality rate, and the share of excluded ethnic groups in each location. Sourced from PRIO-GRID.
Multi-feature, cross-level
A broad model informed by outcome-specific features from all sub-national models, capturing interactions between different features.
The cross-level model (crosslevel)
A cross-level model that allows the country and sub-national levels of analysis to inform one another.
Model specifications (VIEWS2)
Terminology
col_outcome
The col_outcome attribute specifies the outcome that a given model serves to predict, i.e. whether a given threshold of fatalities from state-based (sb), non-state (ns), or one-sided (os) violence is reached or surpassed. The outcomes are variations of those recorded in the UCDP-GED dataset. The most common is the dummy encoder of whether 25 or more fatalities will occur from a certain form of violence, to be evaluated against the GED "best estimate" category. This outcome is indicated by the greq_25_ged_best prefix.
cols_features and colsets
The cols_features (column features) attribute specifies the sets of data, or more specifically the sets of data columns, that inform a given model. In the ViEWS database, all data entries relating to a given variable are collected in the same data column. Such columns contain either raw data on the given input variable, aggregated to ViEWS' levels of analysis, or data that have been processed by means of a specific modeling strategy. Data columns that share a common theme or data source are then grouped together into sets of columns (colsets), a selection of which each model makes use of. It is this selection of colsets that the cols_features attribute specifies.
How colsets and col_features are named
colsets are named based on their respective themes and/or data sources. Those that are fully derived from the REIGN, V-Dem, or UCDP-GED datasets will contain the prefixes reign_, vdem_, or ged_, respectively. Thematically organised colsets with several different data sources will instead be named based on their common denominators, such as economy, gender, or regime change.
col_features undergo a more structured specification. They are named by combining abbreviations or acronyms that depict their respective components. As a general rule, column names are constructed as follows: [f]_[parameter_1]_[parameter_2]_[col], where:
f: transformation function name
parameter_1: value (optional)
parameter_2: value (optional)
col: source column
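The naming rule above can be expressed as a small helper. This is only a sketch: the function column_name is hypothetical and not part of the ViEWS code base.

```python
def column_name(f, col, *parameters):
    """Join a transformation name, its optional parameters, and the source column."""
    parts = [f, *(str(p) for p in parameters), col]
    return "_".join(parts)


# No parameters, one parameter, and two parameters.
log_name = column_name("ln", "ged_best_sb")
dummy_name = column_name("greq", "ged_best_sb", 25)
lag_name = column_name("splag", "ged_dummy_sb", 1, 1)
```

The three calls produce ln_ged_best_sb, greq_25_ged_best_sb, and splag_1_1_ged_dummy_sb.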
Transformation functions
The various transformation functions applied by ViEWS are listed below:
delta_col: Time delta of col - tlag_1(col).
greq_value_col: Greater or equal dummy encoder.
smeq_value_col: Smaller or equal dummy encoder.
in_range_low_high_col: Dummy encoder for col falling within the range from low to high.
tlag_time_col: Time lag.
tlead: Time lead.
ma_time_col: Moving average over time.
cweq_value_col: Count while col equals value.
time_since_col: Time since col != 0. Implemented as a time lag of 1 of the count while col equals 0.
decay_halflife_col: Exponential decay function.
mean_col: Time-invariant mean of col.
ln_col: Natural log of col.
demean_col: De-meaned values of col, i.e. col - mean(col).
rollmax_window_col: Rolling max over the time window.
onset_possible_col: Onset is possible if no event occurred in the preceding window time steps.
onset_window_col: Onset is 1 if onset is possible and an event occurred, i.e. 1 for the first event in the time window.
sum_cols: Sum of columns.
product: Product of columns.
spdist_col: Spatial distance to the closest cell or country where col == 1.
stdist_k_tscale_col: Space-time distance to the closest k cells or countries where col == 1.
splag_first_last_col: Spatial lag. Sum of col for all neighboring geographic units from neighbor order first to neighbor order last. splag_1_1_ged_dummy_sb is thus the sum of ged_dummy_sb in immediate neighbors, and splag_1_2_ged_dummy_sb is the sum of ged_dummy_sb in neighboring geographies and their neighbors. splag_2_2 would give a hollow circle of just the neighbors' neighbors, excluding direct neighbors.
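The spatial lag is the least intuitive of these transformations, so a minimal sketch on a small grid may help. It assumes that a neighbor of order k is any cell exactly k steps away when diagonal moves are allowed, which is one way to obtain the "hollow circle" described above. ViEWS may define neighbor order differently. The function and the data are hypothetical.

```python
def splag(grid, first, last):
    """Sum the values of all neighbors from order `first` to order `last`."""
    n_rows, n_cols = len(grid), len(grid[0])
    result = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            total = 0
            for dr in range(-last, last + 1):
                for dc in range(-last, last + 1):
                    order = max(abs(dr), abs(dc))  # ring this offset belongs to
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if first <= order <= last and inside:
                        total += grid[rr][cc]
            result[r][c] = total
    return result


# Hypothetical ged_dummy_sb values: a single event in the centre cell.
ged_dummy_sb = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
splag_1_1 = splag(ged_dummy_sb, 1, 1)
splag_1_2 = splag(ged_dummy_sb, 1, 2)
splag_2_2 = splag(ged_dummy_sb, 2, 2)
```

The cell next to the event picks it up in splag_1_1 and splag_1_2 but not in splag_2_2, while the corner cell, two steps away, picks it up in splag_1_2 and splag_2_2 only.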
The most common transformation functions are the "greater than or equal to" dummy encoder (greq) and various time lags (tlag). The naming convention is that the transform name and parameters are prepended to the column name. For example, transform_a(transform_b(col, params_b), params_a) is named transform_a_params_a_transform_b_params_b_col.
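The sketch below applies a few of the transformations to a single time series and stores each result under the name the convention would give it. The implementations are simplified stand-ins written for illustration, not the ViEWS functions, and the fatality counts are hypothetical.

```python
def greq(col, value):
    """Dummy encoder: 1 where the value is greater than or equal to `value`."""
    return [1 if x >= value else 0 for x in col]


def tlag(col, time):
    """Shift the series `time` steps forward; None where no earlier value exists."""
    return [col[t - time] if t >= time else None for t in range(len(col))]


def cweq(col, value):
    """Count while col equals value: length of the current run of matches."""
    counts, run = [], 0
    for x in col:
        run = run + 1 if x == value else 0
        counts.append(run)
    return counts


# Hypothetical monthly fatalities for one unit.
columns = {"ged_best_sb": [0, 30, 0, 0, 12, 0]}

# greq(col, 25) is named greq_25_col.
columns["greq_25_ged_best_sb"] = greq(columns["ged_best_sb"], 25)

# tlag(greq(col, 25), 1) is named tlag_1_greq_25_col.
columns["tlag_1_greq_25_ged_best_sb"] = tlag(columns["greq_25_ged_best_sb"], 1)

# Time since, following the description above: time lag of 1 of the count
# while col equals 0, i.e. tlag(cweq(col, 0), 1).
columns["tlag_1_cweq_0_ged_best_sb"] = tlag(cweq(columns["ged_best_sb"], 0), 1)
```

Reading a nested name from left to right gives the outermost transformation first, which mirrors how the function calls are written.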
Parameters
Parameters are added where the transformation functions require further specifications, such as numerical thresholds for the "greater than or equal to" dummy encoders. When needed, these values are added immediately after the transformation function acronyms.
Source columns
The source columns (col), in turn, are with a few exceptions copied from the original data source. ged_best_sb, for example, is a UCDP variable referring to the best estimate (best) of the number of fatalities from state-based (sb) violence in a given time period, as recorded in the GED (ged) dataset.
A variable constructed from the "greater than or equal to" dummy encoder, the source variable ged_best_sb, and 1 fatality as the threshold value would thus become greq_1_ged_best_sb.
Ensemble composition
Constituent model specifications
Model features (variables)
Data sources
The input data used in ViEWS are loaded into tables in our database, where they are organised by theme and/or data source and prefixed accordingly. The individual sources are described below with their corresponding prefixes in parentheses.
ACLED (acled_)
ACLED is the Armed Conflict Location & Event Data project. ViEWS recodes ACLED into approximations of the UCDP GED categories of violence. ACLED thus exposes 8 primary columns in the ViEWS data:
acled_count_pr: Protest event count
acled_count_sb: State-based violence event count
acled_count_ns: Non-state violence event count
acled_count_os: One-sided violence event count
acled_fat_pr: Protest fatality count
acled_fat_sb: State-based violence fatality count
acled_fat_ns: Non-state violence fatality count
acled_fat_os: One-sided violence fatality count
acled_dummy_[pr, sb, ns, os] are dummy encodings of the corresponding acled_count_ columns.
FVP (fvp_)
A country-year dataset compiled for another project, combining data from V-Dem, WDI, and EPR. Columns prefixed prop_ are from EPR. Columns prefixed ssp2 are from SSP. The auto, demo, electoral, etc. columns are from V-Dem.
GED (ged_)
The main outcome of ViEWS comes from UCDP-GED. Six main columns are exposed from GED:
ged_best_sb: Best estimate of fatalities for state-based violence
ged_best_ns: Best estimate of fatalities for non-state violence
ged_best_os: Best estimate of fatalities for one-sided violence
ged_count_sb: Number of events for state-based violence
ged_count_ns: Number of events for non-state violence
ged_count_os: Number of events for one-sided violence
The transform ged_dummy_[sb, ns, os] is a dummy encoding of ged_count_[sb, ns, os].
ICGCW (icgcw_)
The International Crisis Group has an online conflict tracker at https://www.crisisgroup.org/crisiswatch. This is scraped and updates are encoded in 5 columns:
icgcw_alerts: Appeared in an alert
icgcw_deteriorated: Situation deteriorated
icgcw_improved: Situation improved
icgcw_opportunities: Opportunity spotted
icgcw_unobserved: Country doesn't appear
PRIO-GRID (pgdata_)
PRIO-GRID data is fetched from the PRIO-GRID API at https://grid.prio.org/#/apidocs. For the full codebook, see https://grid.prio.org/#/codebook. 41 columns are exposed from PRIO-GRID with their original names retained. Where a column exists in both a yearly (_y) and a static (_s) version, the two are sometimes combined by taking their MAX().
REIGN (reign_)
The Rulers, Elections, and Irregular Governance (REIGN) dataset. For details, see https://oefdatascience.github.io/REIGN.github.io/.
SPEI (spei_)
SPEI Global Drought Monitor. For details, see https://spei.csic.es/map/maps.html.
VDEM (vdem_)
Varieties of Democracy. Version 10 is currently loaded. For the codebook, see https://www.v-dem.net/en/data/data-version-10/. Columns are loaded from the Country-Year: V-Dem Full+Others file. Columns ending in the following suffixes are currently not included due to memory constraints:
_codehigh
_codelow
_ord
_sd
_mean
_nr
_osp
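A filter of this kind could look like the sketch below. The suffix list is taken from the text above, while the function and the column names are hypothetical and only serve to show the mechanism.

```python
# Suffixes listed above as excluded due to memory constraints.
EXCLUDED_SUFFIXES = ("_codehigh", "_codelow", "_ord", "_sd", "_mean", "_nr", "_osp")


def keep_column(name):
    """Return True unless the column name ends in an excluded suffix."""
    # str.endswith accepts a tuple and matches if any suffix fits.
    return not name.endswith(EXCLUDED_SUFFIXES)


# Hypothetical column names used only for illustration.
candidate_columns = [
    "example_index",
    "example_index_codehigh",
    "example_index_sd",
    "other_indicator",
    "other_indicator_osp",
]
loaded_columns = [name for name in candidate_columns if keep_column(name)]
```

Only example_index and other_indicator pass the filter in this example.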
WDI (wdi_)
World Bank World Development Indicators. Updated as of May 2020. Downloaded from http://databank.worldbank.org/data/download/WDI_csv.zip. For details, see https://databank.worldbank.org/source/world-development-indicators.