Evaluation
Rigorous in-house evaluation
Key evaluation metrics
AUPR
Since we are more interested in predicting instances of political violence than their absence, ViEWS gives priority to the AUPR over the AUROC, as the former rewards models more for accurately predicting conflict than for accurately predicting its absence.
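As an illustration, AUPR can be estimated with scikit-learn's average_precision_score. This is a minimal sketch with hypothetical labels and scores, not ViEWS code or data:

```python
from sklearn.metrics import average_precision_score

# Hypothetical data: 1 = observed conflict, 0 = no conflict.
y_true = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
# Hypothetical predicted probabilities of conflict.
y_score = [0.10, 0.20, 0.80, 0.30, 0.60, 0.90, 0.05, 0.40, 0.15, 0.70]

# Average precision summarizes the precision-recall curve and is
# scikit-learn's standard estimator of the area under it (AUPR).
aupr = average_precision_score(y_true, y_score)
print(f"AUPR: {aupr:.3f}")
```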
AUROC
The goal is to maximize true positives relative to false positives. In other words, the measure rewards models for increasing detection of actual conflict (true positives) relative to “false alarms” (false positives).
A model that predicts perfectly has an AUROC of 1, while a model that cannot distinguish true positives from false positives has an AUROC of 0.5 (equivalent to a coin toss).
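Continuing the hypothetical example above, AUROC can be computed with scikit-learn's roc_auc_score:

```python
from sklearn.metrics import roc_auc_score

# Same hypothetical labels and predicted probabilities as above.
y_true = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y_score = [0.10, 0.20, 0.80, 0.30, 0.60, 0.90, 0.05, 0.40, 0.15, 0.70]

# AUROC equals the probability that a randomly drawn conflict case
# receives a higher score than a randomly drawn non-conflict case.
auroc = roc_auc_score(y_true, y_score)
print(f"AUROC: {auroc:.3f}")  # 1 = perfect, 0.5 = coin toss
```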
BRIER SCORE
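The Brier score is the mean squared difference between predicted probabilities and observed binary outcomes, so lower values are better: 0 for a perfect forecaster, and 0.25 for an uninformative constant forecast of 0.5. A minimal sketch with scikit-learn, again on hypothetical data:

```python
from sklearn.metrics import brier_score_loss

y_true = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y_score = [0.10, 0.20, 0.80, 0.30, 0.60, 0.90, 0.05, 0.40, 0.15, 0.70]

# Mean squared error between forecast probabilities and outcomes.
brier = brier_score_loss(y_true, y_score)
print(f"Brier score: {brier:.3f}")
```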
CONFUSION MATRIX
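A confusion matrix cross-tabulates predicted against observed classes once probabilistic forecasts have been thresholded into binary predictions. A sketch with scikit-learn; the 0.5 cut-off below is purely illustrative, not a ViEWS choice:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y_score = [0.10, 0.20, 0.80, 0.30, 0.60, 0.90, 0.05, 0.40, 0.15, 0.70]

# Threshold probabilities into binary predictions (illustrative 0.5 cut-off).
y_pred = [1 if p >= 0.5 else 0 for p in y_score]

# Rows are actual classes, columns predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```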
DIVERSE MODEL CALIBRATION METRICS
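Calibration metrics ask whether predicted probabilities match observed frequencies: among all cases assigned, say, a 20% risk, did conflict in fact occur about 20% of the time? One common diagnostic is a reliability (calibration) curve; this sketch uses scikit-learn's calibration_curve on simulated, well-calibrated forecasts:

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Simulate forecasts whose outcomes occur with roughly the
# forecast probability, i.e. a well-calibrated model.
rng = np.random.default_rng(0)
y_score = rng.uniform(0, 1, 1000)
y_true = rng.uniform(0, 1, 1000) < y_score

# Bin the forecasts and compare mean prediction to observed
# frequency per bin; points near the diagonal indicate good calibration.
frac_pos, mean_pred = calibration_curve(y_true, y_score, n_bins=10)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"mean predicted {mp:.2f} -> observed frequency {fp:.2f}")
```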
ViEWS follows the guidelines of Colaresi and Mahmood (2017), who suggest an iterative loop whereby model representations are built from domain knowledge, their parameters are computed, their performance is critiqued, and the successes, and particularly the failures, of the previous models inform a new generation of model representations. Crucial to this machine-learning-inspired workflow are visual tools, such as model criticism and biseparation plots, which allow researchers to inspect patterns that are captured by some models and ensembles but missed by others. We also expand on these tools by examining mistakes in geographic context.
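As a rough illustration of the idea behind such plots (a sketch in the spirit of Colaresi and Mahmood (2017), not their implementation), one can rank cases by predicted probability and mark observed conflicts, so that conflicts assigned low probabilities stand out as a model's misses:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical forecasts and outcomes for illustration only.
rng = np.random.default_rng(1)
y_score = rng.uniform(0, 1, 200)
y_true = rng.uniform(0, 1, 200) < y_score**2

# Rank cases from lowest to highest predicted risk; observed
# conflicts (red) at the low end are the model's misses.
order = np.argsort(y_score)
colors = ["red" if y else "grey" for y in y_true[order]]
plt.scatter(np.arange(len(order)), y_score[order], c=colors, s=12)
plt.xlabel("Cases ranked by predicted probability")
plt.ylabel("Predicted probability")
plt.title("Sketch of a model-criticism-style plot")
plt.show()
```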