This report by Team Klus, prepared for the National Institute of Justice's (NIJ's) Challenge to develop a recidivism prediction tool, describes the development of a model for predicting first-year recidivism that is designed to minimize the potential for racial and gender bias.
The data were provided by NIJ in conjunction with the Georgia Department of Community Supervision and describe the demographics, criminal history, and other characteristics of parolees in Georgia from January 2013 through December 2015. The data were limited to parolees whose race was identified as either Black or White; parolees of Hispanic, Asian, Native American, or other racial or ethnic groups were not included. Although the team engaged in some feature engineering, the data were not augmented with any external sources.

One of the most time-consuming tasks in developing this model was cleaning the data to make them usable for modeling. Key components of this task were re-leveling categorical variables, collapsing categories, and, in some cases, transforming categorical variables into ordinal variables to achieve a more parsimonious model.

Exploratory data analysis was the most important component of the model development strategy for this challenge. The NIJ data contained 49 candidate predictors, each of which was evaluated by assessing its relationship to the outcome of interest, i.e., recidivism within 1 year of release. The candidate predictors were grouped into six broad categories: demographics, general risk factors, criminal history, parole history, drug use, and employment. The impact of these variables on recidivism, by race and gender, is reported.
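To make the cleaning and exploratory steps more concrete, the following is a minimal Python sketch, not the team's actual code. It assumes a hypothetical count variable `Prior_Felony`, a hypothetical binary outcome `Recidivism_Year1`, and race and gender stored as string labels; these column names, the category labels, and the "3 or more" cut-off are illustrative assumptions, not the coding used in the NIJ data. The sketch collapses sparse categories, converts the collapsed variable to an ordinal code, and then tabulates first-year recidivism rates within each race-by-gender cell, which is one simple way to examine a predictor's relationship to the outcome separately by race and gender.

```python
from collections import defaultdict

# Hypothetical raw levels of a prior-felony-arrest count variable (assumed,
# not the NIJ coding). Sparse upper levels are collapsed into "3 or more".
PRIOR_FELONY_COLLAPSE = {str(k): "3 or more" for k in range(3, 10)}
PRIOR_FELONY_COLLAPSE["10 or more"] = "3 or more"

# Ordered levels after collapsing; the list position becomes the ordinal code.
PRIOR_FELONY_LEVELS = ["0", "1", "2", "3 or more"]


def collapse(value, mapping):
    """Collapse sparse categories; labels absent from `mapping` pass through."""
    return mapping.get(value, value)


def to_ordinal(value, levels):
    """Map an ordered categorical label to its integer rank (0, 1, 2, ...)."""
    return levels.index(value)


def clean(record):
    """Return a copy of one record with the hypothetical prior-felony
    variable collapsed and converted to an ordinal integer."""
    out = dict(record)
    collapsed = collapse(record["Prior_Felony"], PRIOR_FELONY_COLLAPSE)
    out["Prior_Felony"] = to_ordinal(collapsed, PRIOR_FELONY_LEVELS)
    return out


def recidivism_rate_by_group(records, feature, groups=("Race", "Gender")):
    """First-year recidivism rate for each (feature level, race, gender) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [recidivists, parolees]
    for r in records:
        cell = (r[feature],) + tuple(r[g] for g in groups)
        counts[cell][0] += int(r["Recidivism_Year1"])
        counts[cell][1] += 1
    return {cell: hits / n for cell, (hits, n) in counts.items()}


# Toy records for illustration only; these are NOT values from the NIJ data.
raw = [
    {"Prior_Felony": "0", "Race": "BLACK", "Gender": "M", "Recidivism_Year1": False},
    {"Prior_Felony": "7", "Race": "BLACK", "Gender": "M", "Recidivism_Year1": True},
    {"Prior_Felony": "10 or more", "Race": "BLACK", "Gender": "M", "Recidivism_Year1": False},
    {"Prior_Felony": "2", "Race": "WHITE", "Gender": "F", "Recidivism_Year1": True},
]
cleaned = [clean(r) for r in raw]
rates = recidivism_rate_by_group(cleaned, "Prior_Felony")
print(rates)
# {(0, 'BLACK', 'M'): 0.0, (3, 'BLACK', 'M'): 0.5, (2, 'WHITE', 'F'): 1.0}
```

Encoding the collapsed variable as a single ordinal integer lets a model estimate one coefficient for the trend across levels instead of one per category, which is the kind of parsimony the collapsing and ordinal transformations aim for. With real data, very small race-by-gender cells produce unstable rates, so cell sizes would need to be checked before comparing groups.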