Evaluating Spatial Predictions
The Analytics X Prize evaluates entries by computing the RMSE between the predicted proportion of homicides per zip code and the actual proportion of homicides per zip code. RMSE is a standard way of comparing the predictive quality of models, but here it suffers from coarse resolution: predictions are scored only at the zip code level.
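As a rough illustration of the RMSE comparison described above, here is a minimal sketch. The function name `rmse` and the three-zip-code proportions are made up for the example and are not taken from the contest or its data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical proportions of homicides for three zip codes (each list sums to 1).
predicted = [0.5, 0.3, 0.2]
actual = [0.4, 0.4, 0.2]

score = rmse(predicted, actual)
print(round(score, 4))
```

A lower score means the predicted proportions are closer to the actual ones, but the score says nothing about where inside a zip code the homicides occurred.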
I would like to discuss what I think is a more intuitive method for evaluating spatial predictive models. The problem of predicting crime can be thought of as a resource allocation problem. Law enforcement would like to answer the question, “If we were to surveil X percent of the area in our jurisdiction, what percentage Y of crime would we prevent?” If we assume that police presence and surveillance in an area where a crime is going to occur always prevents that crime, then surveilling 100% of the area would stop 100% of the crime. Of course, this is impossible, hence the resource allocation perspective on the problem. A modified ROC plot that we call a Surveillance Plot is an intuitive visual display of the effectiveness of a spatial predictive model. The following evaluation plot was produced from a model built on Philadelphia homicide data up to November 2009 and then evaluated against data for the month of December 2009.
The x-axis represents the percentage of the area that is surveilled. The y-axis represents the percentage of homicides that occurred in the surveilled area. So, if law enforcement surveilled the 20% of the area that the predictive model ranks as most threatened, they would, under the assumption above, prevent 62.5% of homicides.
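The points on such a plot can be computed by ranking areas from most to least threatened and accumulating the homicides they contain. The sketch below is only one way to do this, not the code used to produce the plot above. It assumes equal-area grid cells, and the name `surveillance_curve` and the five-cell data are hypothetical.

```python
def surveillance_curve(scores, homicides):
    """Return (fraction of area surveilled, fraction of homicides covered) points.

    scores    -- predicted threat score per cell (higher means more threatened)
    homicides -- observed homicide count per cell in the evaluation period
    Assumes every cell covers the same area.
    """
    if len(scores) != len(homicides):
        raise ValueError("scores and homicides must have the same length")
    total = sum(homicides)
    if total == 0:
        raise ValueError("no homicides in the evaluation period")

    # Surveil cells in order of decreasing predicted threat.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    points = [(0.0, 0.0)]
    covered = 0
    for rank, cell in enumerate(order, start=1):
        covered += homicides[cell]
        points.append((rank / len(scores), covered / total))
    return points


# Hypothetical five-cell example.
scores = [0.9, 0.1, 0.4, 0.7, 0.2]
homicides = [2, 0, 1, 1, 0]

curve = surveillance_curve(scores, homicides)
for area, covered in curve:
    print(f"{area:.0%} of area surveilled -> {covered:.0%} of homicides covered")
```

In this toy example, surveilling the top-ranked 20% of the area covers 50% of the homicides. A model that ranks cells no better than chance would track the diagonal on average, so the further the curve bows above the diagonal, the more useful the model is for allocating surveillance.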
This kind of evaluation scales with the resolution of the prediction. The model I am using breaks Philadelphia into a grid of cells that are 56 ft by 59 ft. A single cell is therefore the smallest incremental unit by which I can evaluate the resulting prediction. The resolution can be tuned to a level appropriate for the resource allocation problem that the law enforcement community faces.