Description of original award (Fiscal Year 2023, $166,500)
The legal and technical AI communities have developed concepts of fairness largely independently. Consequently, it remains unclear how technical progress on AI fairness can be incorporated into existing sources of law that are relevant to downstream high-stakes applications such as recidivism risk assessment for bail, probation, sentencing, or parole. From the legal side, we propose to map the individual versus group fairness dichotomy onto US sources of law and to identify the challenges of translating a high-level constitutional concept like “equal protection” into operationalizable fairness standards for criminal law, selecting permissible features, and guarding against over-reliance on AI. Three technical research questions follow:
RQ1: At which technical stage should individual versus group fairness be enforced?
RQ2: How much scrutiny and which fairness criteria does each feature warrant?
RQ3: How does AI reliance relate to the fairness of criminal risk assessment by humans?
For RQ1, we propose semi-structured interviews with domain experts, followed by both deductive and inductive coding, to gain qualitative insight into whether and why individual or group fairness should be enforced at each technical component of a four-stage criminal risk assessment pipeline: dataset pre-processing, algorithm design, output post-processing, and result analysis.
For RQ2, we give a conceptual extension of the case law-based framework of continuous scrutiny ranges for features. Our extension adds three discrete scrutiny thresholds: a third-highest threshold (exclusion from model inputs), a second-highest threshold (group parity required), and a highest threshold (feature ignored in the individual similarity function). The pass conditions of these thresholds relate to procedural fairness, group outcome fairness, and individual outcome fairness, respectively. With this conceptual framework, we propose quantitative and qualitative human studies to place disputed features relative to those thresholds, drawing on both algorithmic innovations and semi-structured expert interviews.
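As a minimal illustration of how the three discrete thresholds could sit on a continuous scrutiny scale, the Python sketch below makes several assumptions that are not part of the proposal: scrutiny is normalized to [0, 1], the cut-off values are placeholders, and requirements are cumulative (a feature that crosses a higher threshold also triggers the requirements of the lower ones). The names `Requirements` and `requirements_for` are hypothetical.

```python
from dataclasses import dataclass

# Placeholder cut-offs on an assumed [0, 1] scrutiny scale.
# The proposal does not specify numeric values.
T_EXCLUDE = 0.4  # third-highest threshold: exclusion from model inputs
T_PARITY = 0.6   # second-highest threshold: group parity required
T_IGNORE = 0.8   # highest threshold: ignored in the individual similarity function


@dataclass(frozen=True)
class Requirements:
    exclude_from_inputs: bool    # tied to procedural fairness
    require_group_parity: bool   # tied to group outcome fairness
    ignore_in_similarity: bool   # tied to individual outcome fairness


def requirements_for(scrutiny: float) -> Requirements:
    """Map a continuous scrutiny score to the requirements it triggers.

    Assumes requirements accumulate as scrutiny increases.
    """
    if not 0.0 <= scrutiny <= 1.0:
        raise ValueError("scrutiny must lie in [0, 1]")
    return Requirements(
        exclude_from_inputs=scrutiny >= T_EXCLUDE,
        require_group_parity=scrutiny >= T_PARITY,
        ignore_in_similarity=scrutiny >= T_IGNORE,
    )
```

Under these assumptions, a feature scored at 0.7 would be excluded from model inputs and subject to group parity, but would still be allowed to influence the individual similarity function.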
For RQ3, we propose an AI-assisted bail decision human study to characterize, per participant, the relationship between reliance metrics (appropriate reliance, over-reliance, and under-reliance) and individual or group fairness metrics. The goal is to spot any concerning patterns, e.g., over-reliance above a certain level may severely affect the fairness of humans’ decisions.
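The per-participant quantities in such a study could be computed along the following lines. This is a sketch under stated assumptions, not the study's actual protocol: decisions are binary, reliance is operationalized in one common way (over-reliance means following an incorrect AI recommendation, under-reliance means overriding a correct one), and group fairness is measured as a demographic parity gap between two groups. The `Trial` record and both function names are hypothetical.

```python
from typing import NamedTuple, Sequence


class Trial(NamedTuple):
    ai: int      # AI recommendation (assumed encoding: 1 = detain, 0 = release)
    human: int   # participant's final decision, same encoding
    truth: int   # ground-truth outcome label, same encoding
    group: str   # demographic group of the defendant


def reliance_rates(trials: Sequence[Trial]) -> dict:
    """Fractions of one participant's trials showing each reliance type."""
    n = len(trials)
    if n == 0:
        raise ValueError("no trials")
    # Over-reliance: the participant followed an incorrect AI recommendation.
    over = sum(t.human == t.ai and t.ai != t.truth for t in trials)
    # Under-reliance: the participant overrode a correct AI recommendation.
    under = sum(t.human != t.ai and t.ai == t.truth for t in trials)
    # With binary decisions, every remaining trial is appropriate reliance.
    return {"appropriate": (n - over - under) / n, "over": over / n, "under": under / n}


def parity_gap(trials: Sequence[Trial], group_a: str, group_b: str) -> float:
    """Absolute difference in the participant's positive-decision rate between groups."""
    def rate(group: str) -> float:
        decisions = [t.human for t in trials if t.group == group]
        if not decisions:
            raise ValueError(f"no trials for group {group!r}")
        return sum(decisions) / len(decisions)
    return abs(rate(group_a) - rate(group_b))
```

Computing both quantities for each participant yields paired observations (for example, over-reliance rate against parity gap) that can then be examined for the kind of concerning pattern described above.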
Across the three RQs, we expect several concrete short-term outcomes, such as publications at fairness-oriented, human-centric, or law-informed computer science conferences (e.g., FAccT, AIES, CHI, ICAIL). In terms of broader social impact, our work will systematize the process of evaluating the fairness of recidivism risk assessment models across US jurisdictions and answer the normative questions about which demographic features are relevant.