Evaluation

An overview of the evaluation procedures in place to assess the predictive performance of the ViEWS system
The ViEWS system is evaluated in a number of ways. Most importantly, detailed out-of-sample evaluations are conducted in-house each year with full transparency. The system can also be evaluated externally, as all source code and replication data are publicly available. The ViEWS team furthermore collaborates with a number of other forecasting projects, renowned researchers and research institutes, traditional country experts, IGOs and NGOs, further strengthening the scope and performance of the ViEWS system. The sections below discuss this in more detail.

Rigorous in-house evaluation

Full transparency about the system’s predictive performance
Detailed out-of-sample evaluations are conducted in-house each year as new UCDP-GED data on ViEWS’ outcomes are released. Since model performance is multidimensional, ViEWS relies on a suite of metrics for these procedures. This offers a more complete picture of model performance, and lowers the risk of favouring models that perform well in one aspect (for example correctly classifying the absence of conflict) over others (for example correctly classifying conflict). An overview of the metrics is presented in the section below. Results from the last two annual evaluations can be found in ViEWS’ 2019 and 2021  articles in Journal of Peace Research, in the latter of which we also discuss the procedures used to determine what models go into the final forecasting ensembles.

Key evaluation metrics

AUPR

AUPR (area under the precision-recall curve) summarizes the trade-off between precision and recall (the true positive rate) across all classification thresholds. Precision is measured as the proportion of predicted conflict onsets that are correct, so AUPR rewards models for getting conflicts right when they do predict them. Since only a small percentage of observations experience conflict, it is more difficult to predict conflict correctly than to predict its absence correctly. AUPR is therefore a more demanding measure than AUROC.
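The AUPR described above can be sketched in a few lines of pure Python. This is a minimal illustration (the average-precision formulation, which steps through observations ranked by predicted score and ignores the complication of tied scores); in practice a library routine such as scikit-learn's `average_precision_score` would be used. The function name and inputs here are illustrative, not part of the ViEWS codebase.

```python
def aupr(y_true, y_score):
    """Area under the precision-recall curve, computed as average
    precision: sum over ranked observations of
    (recall_n - recall_{n-1}) * precision_n."""
    # Rank observations by predicted score, highest risk first.
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    total_pos = sum(y_true)
    tp = fp = 0
    prev_recall, ap = 0.0, 0.0
    for i in order:
        if y_true[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)   # share of predicted onsets that are correct
        recall = tp / total_pos      # share of actual onsets detected so far
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

A model that ranks every conflict onset above every non-onset scores 1.0; ranking mistakes among the highest-scored observations are penalized heavily, which is why AUPR is demanding when positives are rare.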

Since we are more interested in predicting instances of political violence than the absence of such, ViEWS gives priority to the AUPR over the AUROC, as the former rewards models more for accurately predicting conflict than for predicting its absence.

AUROC

AUROC (area under the receiver operating characteristic curve) summarizes the trade-off between the true positive rate and the false positive rate of predictions across all classification thresholds.

The goal is to maximize true positives relative to false positives. In other words, the measure rewards models for increasing detection of actual conflict (true positives) relative to “false alarms” (false positives).

A model that predicts perfectly has an AUROC value of 1, while a model that cannot distinguish positive from negative cases has a value of 0.5 (equivalent to a coin toss).
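The coin-toss interpretation above follows from an equivalent rank-based definition: AUROC is the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. A minimal pure-Python sketch of that formulation (scikit-learn's `roc_auc_score` is the usual library routine; the function below is illustrative only):

```python
def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney formulation: the probability that a
    random positive outranks a random negative, counting ties as 0.5."""
    pos = [s for s, y in zip(y_score, y_true) if y]
    neg = [s for s, y in zip(y_score, y_true) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A model that assigns every observation the same score "wins" exactly half of the pairwise comparisons and scores 0.5, matching the coin-toss baseline.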


BRIER SCORE

The Brier score measures the accuracy of probabilistic predictions. It favours sharp, accurate probabilistic predictions (near 0 or 1), in contrast to the AUPR and AUROC, which depend only on the relative ordering of the forecasts. The Brier score is particularly useful for distinguishing models that perform similarly or inconsistently on the AUPR and AUROC scores.
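For binary outcomes, the Brier score is simply the mean squared difference between the forecast probability and the observed outcome, so a sketch is one line (scikit-learn provides the same computation as `brier_score_loss`):

```python
def brier_score(y_true, y_prob):
    """Mean squared error of probabilistic forecasts against binary
    outcomes; 0 is a perfect score, and lower is better."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)
```

Unlike AUPR and AUROC, rescaling the forecasts changes this score: a model that hedges every prediction at 0.5 scores 0.25 regardless of how well it ranks observations, which is why the Brier score rewards sharp, well-calibrated probabilities.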

CONFUSION MATRIX

A confusion matrix tabulates the performance of a model by actual class (did we observe conflict or not) and predicted class (did we predict conflict or not). When looking at binary outcomes, this becomes a two-by-two table with true positives, false positives, false negatives, and true negatives.
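The two-by-two tabulation described above amounts to counting observations in each actual/predicted combination. A minimal sketch (scikit-learn's `confusion_matrix` returns the same counts as an array; the dictionary layout here is illustrative):

```python
def confusion_counts(y_true, y_pred):
    """Tabulate binary predictions against binary outcomes."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for y, p in pairs if y and p),          # predicted conflict, observed conflict
        "fp": sum(1 for y, p in pairs if not y and p),      # predicted conflict, none observed
        "fn": sum(1 for y, p in pairs if y and not p),      # missed conflict
        "tn": sum(1 for y, p in pairs if not y and not p),  # correctly predicted absence
    }
```

Note that producing the matrix from probabilistic forecasts requires choosing a classification threshold first; the counts, and hence derived rates such as precision and recall, change with that threshold.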

DIVERSE MODEL CALIBRATION METRICS

A system that produces probabilistic forecasts should also be well calibrated to the actual data: when the model suggests that there is an X percent chance of an event, do events happen approximately X percent of the time? Calibration can be gauged visually using calibration plots, in which forecasts are binned on the x-axis and the observed frequency of events within each bin is plotted on the y-axis. A perfectly calibrated model follows the 45-degree line. Calibration can also be gauged over time by plotting the actual versus predicted frequency of events in a given time interval.
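The binning step behind such a calibration plot can be sketched as follows. This is a minimal illustration with equal-width bins (scikit-learn offers `calibration_curve` for the same purpose; the function and bin scheme here are assumptions, not the ViEWS implementation):

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Group forecasts into equal-width probability bins and return
    (mean forecast, observed event frequency) per non-empty bin.
    A well-calibrated model has the two values close in every bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top bin
        bins[idx].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b),  # x-axis: mean forecast in the bin
         sum(y for _, y in b) / len(b))  # y-axis: observed frequency
        for b in bins if b
    ]
```

Plotting these pairs against the diagonal reproduces the calibration plot described above; points below the diagonal indicate over-forecasting and points above it under-forecasting.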

ViEWS follows the guidelines of Colaresi and Mahmood (2017), who suggest an iterative loop whereby model representations are built from domain knowledge, their parameters computed, their performance critiqued, and then the successes and particularly the failures of the previous models inform a new generation of model representations. Crucial to this machine learning-inspired workflow are visual tools, such as model criticism and biseparation plots, that allow researchers to inspect patterns captured by some models and ensembles but missed by others. We also expand on these tools, looking at mistakes in geographic context.

Install ViEWS

To install ViEWS on your local machine and replicate the ViEWS forecasts and/or evaluation procedures, please visit our GitHub repository. 

External assessments of the ViEWS system

Publicly available source code and replication data
In addition to rigorous in-house evaluations, the ViEWS team welcomes external assessments of the forecasting system. 

All ViEWS source code and all input and output data are available to the public free of charge. The source code, along with an installation guide, is readily available in ViEWS’ GitHub repository. Replication data and/or pre-computed forecasts can be shared upon request to views@pcr.uu.se. Select data (due to the size thereof) are also available through our Resources. Forecasts are available through the API, interactive dashboard, and our publications.

Collaborations

ViEWS collaborates with a number of other forecasting projects and renowned researchers across the globe – results from which are iteratively incorporated into the ViEWS system.

Comparison to other forecasting systems and expert assessments

In addition to in-house and external evaluations, the ViEWS team engages and collaborates with research teams and traditional country experts across the world.  

The country expert survey

ViEWS engages with 70+ traditional country experts participating in a survey-based sub-project to ViEWS. The head-hunted experts respond to a quarterly online questionnaire on the conflict dynamics and conflict risks in their countries of expertise. These data will be fed into a forthcoming public survey dataset, and the explicit forecasts thereof will serve as another benchmark against which to compare and contrast the data-driven ViEWS forecasts.

The annual forecasting workshop

Each year, the ViEWS team invites renowned scholars and practitioners to a workshop on the topic of conflict forecasting – most recently in the form of a forecasting competition culminating in a forthcoming special issue showcasing the results.