This is Team MattMarifelSora’s submission to the National Institute of Justice’s (NIJ’s) Recidivism Forecasting Challenge.
The team applied data processing and machine learning techniques to predict how likely individuals were to recidivate. This included applying hierarchical Bayesian target encoding and training models that are known to perform well on binary and multiclass classification problems involving tabular data. Following standard practice in machine learning competitions, the team combined predictions from many models into an ensemble to boost its score. The team used gradient boosted decision trees via the XGBoost and LightGBM libraries and created a custom multilayer perceptron (MLP) with skip connections using the PyTorch library. In addition, the team used the dreamquark implementation of TabNet, a modern neural network architecture that uses attention mechanisms to selectively focus on input features. The team also tried NODE and SVM models; however, they performed comparatively worse and were not included in the team’s pipeline. Efforts to reduce racial bias in predicting recidivism were complicated by bias in initial arrests, since arrest data persist in the data that inform recidivism predictions.
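To illustrate two of the ideas mentioned above, the following is a minimal sketch, not the team's actual pipeline. It assumes a simplified form of target encoding in which each category's mean outcome is shrunk toward the global mean (a common empirical-Bayes approximation, swapped in here for the full hierarchical Bayesian encoding), and a plain weighted average as the ensembling step. The function names `smoothed_target_encode` and `average_ensemble` and the `prior_weight` parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def smoothed_target_encode(categories, targets, prior_weight=10.0):
    """Map each category to a shrunken mean of a binary target.

    Assumed formula (illustrative, not the team's exact method):
        encoded = (sum_of_targets + prior_weight * global_mean)
                  / (count + prior_weight)
    Rare categories are pulled strongly toward the global mean.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + prior_weight * global_mean) / (counts[cat] + prior_weight)
        for cat in counts
    }


def average_ensemble(model_predictions, weights=None):
    """Combine per-model probability lists into one weighted average per row."""
    if weights is None:
        weights = [1.0] * len(model_predictions)
    total = sum(weights)
    n_rows = len(model_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_predictions)) / total
        for i in range(n_rows)
    ]


# Hypothetical toy data: a categorical feature and a binary recidivism label.
encoding = smoothed_target_encode(
    ["a", "a", "a", "a", "b"], [1, 1, 1, 1, 0], prior_weight=1.0
)
blended = average_ensemble([[0.2, 0.8], [0.4, 0.6]])
```

In practice, an encoding of this kind is usually fitted out of fold, so that a row's own label does not leak into its encoded feature, and ensemble weights are typically tuned on held-out data.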