One of the first steps in a rape investigation[1] is the responding officer’s written report. What the officer includes and how it is worded can affect the case.
In a study sponsored by the National Institute of Justice (NIJ) that used cross-disciplinary research, data scientists applied machine learning techniques to nearly two decades’ worth of police reports on rape cases.
The data scientists used advanced computational power to help social scientists study how officer sentiment (meaning opinions and subjectivity) about victims’ credibility may affect key procedural decisions down the line, such as whether to prosecute a rape case.
The study, conducted by a team of scholars from Case Western Reserve University, Cleveland State University, and Texas A&M University, aimed to identify linguistic “signaling” of officers’ views or biases in their narratives of rape reports.
The research team evaluated narratives in more than 5,600 police reports of rape in one large urban jurisdiction from 1993 to 2011. Using sentiment analysis, a form of natural language processing that screens for words or phrases carrying underlying emotions (such as dissatisfaction, happiness, or doubt), researchers detected and interpreted evidence of officer emotion or bias in the narratives. (See “Natural Language Processing: AI Dives Into Human Narratives” and “Scaling Qualitative Analysis.”)
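To make the idea of screening a narrative for sentiment concrete, here is a minimal sketch of a lexicon-based scorer. It is an illustration only and not the method, model, or word list used in the study: the `POLARITY` and `SUBJECTIVE` word lists, the scores attached to them, and the function names are all hypothetical. The sketch counts the words in a narrative, averages the polarity of any lexicon words it finds, and reports the share of words that are opinion or hedging markers.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1]. Illustration only;
# this is not the lexicon or model used in the study.
POLARITY = {
    "credible": 0.8,
    "cooperative": 0.6,
    "consistent": 0.5,
    "uncooperative": -0.6,
    "inconsistent": -0.5,
    "doubtful": -0.7,
}

# Hypothetical hedging or opinion words treated as markers of subjectivity.
SUBJECTIVE = {"appeared", "seemed", "allegedly", "supposedly", "claimed"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_narrative(text):
    """Return word count, mean polarity of lexicon hits, and subjectivity rate."""
    tokens = tokenize(text)
    hits = [POLARITY[t] for t in tokens if t in POLARITY]
    # Mean polarity over matched words; 0.0 (neutral) when nothing matches.
    polarity = sum(hits) / len(hits) if hits else 0.0
    subjective = sum(1 for t in tokens if t in SUBJECTIVE)
    # Share of all tokens that are subjectivity markers.
    subjectivity = subjective / len(tokens) if tokens else 0.0
    return {"words": len(tokens), "polarity": polarity, "subjectivity": subjectivity}


if __name__ == "__main__":
    sample = "The witness seemed doubtful and gave an inconsistent statement."
    print(score_narrative(sample))
```

A fixed word list like this cannot handle negation, context, or sarcasm, which is one reason large-scale studies typically rely on trained language models rather than simple lookups.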
The study demonstrates the power of algorithm-based technology to help social and behavioral sciences address pressing social issues. Cross-disciplinary research, like this joint effort of data scientists and social scientists, is a key priority of NIJ Director Nancy La Vigne.
Report Language Matters
The research yielded important new insights on the relationship between police incident reports and case outcomes. In considering these findings, it is important to note that, overall, the reports did not contain high levels of sentiment. Findings from the study include:
- Longer police reports were highly predictive of more investigative activity and successful case outcomes such as convictions.
- Incident report length was associated with the reporting officer’s view of the assault victim’s credibility, with longer reports linked to higher perceptions of credibility.
- The rape cases with the most successful prosecution outcomes tended to have significant positive officer sentiment and subjectivity in the police incident reports compared to reports with negative or neutral (non-significant) officer sentiment.
- Successfully prosecuted cases were often those in which officer reports used the terms “rape” and “unlawful” as well as the criminal statute number for rape, signals that indicated the officer believed the victim was credible.
- In the more successful rape prosecution cases, initial police incident reports contained details of investigative procedures and case facts.
- Black victims tended to be subjects of police reports with fewer words and with words that were more negative or subjective compared to reports on non-Black victims (almost all of whom were white).
- Reports in which the suspect was not named were, on average, 100 words shorter.
- Cases found not to be supported by the evidence or law and cases closed because of a lack of victim engagement had reports with fewer words and more negative wording.
- Phrases most predictive of a case proceeding to prosecution included references to prosecutorial involvement or arresting, charging, or naming a suspect.
The researchers also noted that, overall, reports did not change much in terms of sentiment levels or report length over the two-decade study period.
Implications for Best Practices
The researchers identified several important implications of the collaborative study for best practices, including that:
- Officers should write detailed incident reports in rape investigations, including a narrative provided by the victim that conveys the trauma of rape.
- Rape reports should document not just what happened to the victim or what they did or did not remember. They should include as many details about the rape as possible, such as the victim’s fears and thoughts and what they heard, saw, smelled, tasted, etc.
- Victims should receive better support throughout the entire criminal justice process, beginning even before the report is written.
- Law enforcement agencies should prioritize improved report writing.
- Officers should minimize the number of irrelevant and nonfactual statements and observations, especially in reference to the victim.
Researchers highlighted some examples of factually unsupported “signaling” statements from officer incident reports, including:
- “. . . observed no bruises, contusions on the female nor were her clothes disheveled. At times during the interview, she smirked as if it was funny, but she did show signs that she was in pain or discomfort.”
- “Juvenile has had sex in the past.”
- “Victim is a known prostitute and crack cocaine abuser.”
The researchers observed, with respect to these statements from different police incident reports:
The report writer does not provide detail as to why there were no bruises and disheveled clothes, why a victim’s prior sexual history or being a “known prostitute” [is] mentioned or relevant. . . However, without that important next statement qualifying why the factual statement is pertinent to the investigation, a human likely reads this as signaling — disbelieving the victim’s statements and/or blaming a victim for what happened to them.
Conclusion
This cross-disciplinary research broke new ground by using machine learning analysis to gain insights into the significance of police incident report language in rape investigations. The study identified how certain words could signal officer attitudes regarding victim credibility and possibly foreshadow assault case outcomes.
Importantly, the NIJ-supported research enabled researchers to leverage the nuances of qualitative (or narrative-based) data on a scale previously seen only in quantitative (or numbers-driven) studies.
About This Article
The research described in this article was funded by NIJ award 2018-VA-CX-0002, awarded to Case Western Reserve University. This article is based on the grantee report “Using Sentiment Analysis and Topic Modeling in Assessing the Impact of Police Signaling on Investigative and Prosecutorial Outcomes in Sexual Assault Reports,” by Rachel Lovell, Joanna Klingenstein, Jiaxin Du, Laura Overman, Danielle Sabo, Danielle Flannery, and Xinyue Ye.
Sidebar: Natural Language Processing: AI Dives Into Human Narratives
Machine learning is an artificial intelligence (AI) application that mimics the human brain’s ability to learn from experience. From the criminal justice perspective, a critical function of machine learning is pattern recognition. Self-learning algorithms use datasets to understand how to identify people from images, complete intricate computational and robotics tasks, detect medical conditions from complex scans, and understand online purchasing habits.
Natural language processing is a branch of AI, drawing heavily on machine learning, designed to enable computers to process language the way that humans do. Although computers have surpassed humans in data-driven calculations, it was long believed that computers could not master qualitative tasks such as analyzing or creating narratives. Recent developments in AI, however, establish that machines can perform qualitative tasks.
Quantitative research collects and analyzes numerical data to measure variables. Qualitative research collects non-numerical data to gain insights on a subject. It generally measures views and attributes rather than hard numbers. Qualitative research adds a human voice and narrative to research, creating a human context for research findings.
Sidebar: Scaling Qualitative Analysis
Natural language processing enabled the investigators to, in the words of their report, “leverage the nuance of qualitative research on a scale previously seen only in quantitative assessments.”
Media coverage of, public interest in, and concern over AI products that perform human cognitive tasks, such as producing academic essays that mimic human composition, underscore how much and how quickly natural language processing has evolved.