Replication Studies are Vital to Science We Can Count On

Date

December 21, 2020

Science is a cornerstone of the justice system, and reliability is a cornerstone of science. Justice tools we take for granted today, such as crime mapping software, DNA identification of suspects, and automated fingerprint systems, are products of well-tested science. Extraordinary tools are now ordinary features of our justice system because the science has persuaded the courts and law enforcement that those tools, in the end, are highly reliable.

As a facilitator of the science that supports our nation’s justice system,[1] the National Institute of Justice also acts as a caretaker of the reliability of justice research. Each year, NIJ articulates leading social science and forensic science research needs and then funds high-quality proposals from science teams to perform those studies. NIJ shepherds that work to conclusion, for the benefit of justice institutions and those they serve.

One core function of NIJ, as the scientific research, development, and evaluation arm of the Department of Justice, is initiating scientifically sound evaluations of the effectiveness of new justice system models, methods, and programs. Through its CrimeSolutions.ojp.gov resource, NIJ also evaluates and rates the effectiveness of various criminal justice, juvenile justice, and victimization-related tools and approaches, posting the results online. Knowing whether a crime-fighting resource is rated as “effective,” “promising,” or “no effect” can have high utility for an agency or official weighing the merits of using or maintaining that resource.

The dependability of the science NIJ nurtures has been a point of emphasis for this agency throughout its half-century history. Now, with the scientific enterprise accelerating as never before, in volume and complexity, NIJ has redoubled its commitment to fostering science as reliable as it is relevant.

One measure of that commitment is NIJ’s push in recent years for wider use of randomized controlled trials, or RCTs, the gold standard of evaluation. RCTs divide subjects into equivalent treatment and control groups to reliably and precisely measure an experimental treatment’s effects. NIJ’s recent grant solicitations have placed a premium on research proposals built around RCTs, wherever feasible.
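The random assignment step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not NIJ's or any grantee's procedure: the function names (`assign_groups`, `rate`, `estimated_effect`), the fixed seed, and the use of a simple yes/no outcome such as re-arrest are all assumptions made for the example.

```python
import random


def assign_groups(subject_ids, seed=0):
    """Randomly split subjects into treatment and control groups.

    Hypothetical helper: shuffles the IDs with a seeded generator so the
    assignment is reproducible, then cuts the list in half.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def rate(outcomes, group):
    """Share of a group whose outcome (e.g., re-arrest) is recorded as True."""
    return sum(1 for s in group if outcomes[s]) / len(group)


def estimated_effect(outcomes, treatment, control):
    """Difference in outcome rates: treatment group minus control group."""
    return rate(outcomes, treatment) - rate(outcomes, control)
```

Because chance alone decides who lands in each group, the two groups should be comparable on average, so a difference in outcome rates can be attributed to the treatment rather than to preexisting differences. A real evaluation would also report the statistical uncertainty around that difference.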

Research Replication — Double-Checking Impactful Science

A vital indicator of the reliability of scientific research is its replicability, that is, the capacity to achieve consistent results across different studies designed to answer the same research questions. NIJ is inviting new replication research in its fiscal year 2021 research solicitations.

On the social sciences side, 2021 NIJ solicitations will welcome proposals to replicate seminal studies in several research areas. Replication research is vital for building evidence for what works. To that end, solicitations will also encourage study proposals to revisit programs rated “promising” or “effective” in the CrimeSolutions repository, by applying rigorous RCTs where less rigorous quasi-experimental evaluations had initially generated those ratings.

On the forensic science side, NIJ’s 2021 solicitations will welcome proposals for replication studies of published results that have the potential to significantly influence forensic science policy and practice, or that have already had such an impact but have not been independently verified. These solicitations will instruct applicants proposing replication research to offer strong justification for why the original results might be called into question, or why the potential consequences for forensic laboratories acting on those results are significant enough to warrant replication.

Replication research can reinforce and validate existing studies. Just as importantly, well-designed replication research can uncover critical errors in the original work, errors that, if unrecognized, could mislead other researchers, practitioners, and policymakers. According to a recent comprehensive report by the National Academies on reproducibility and replicability in science, common causes of non-replicability, that is, the inability to replicate earlier research results, include one or more of the following factors compromising the original work:[2]

Publication bias
Misaligned incentives
Inappropriate statistical inference
Study design
Incomplete reporting of a study
Errors in analysis

A rigorous NIJ-sponsored multi-site randomized controlled trial (RCT) evaluation of a popular probation reform model underscored the value of scientific reexamination of existing methods. The original reform program, pioneered in Hawaii and called Hawaii’s Opportunity Probation with Enforcement (HOPE),[3] was designed to use swift, certain, and fair sanctioning for even relatively minor probation violations, in order to keep individuals on probation in line, drug-free, and out of prison. The 2009 single-site RCT funded by NIJ found, after one year, that members of the treatment group were less likely to be arrested for a new crime, less likely to use drugs, and less likely to have their probation revoked than those on regular probation.[4] Eventually, 28 states, one Indian nation, and one Canadian province adopted a form of the model. The approach seemed to have a lot of promise, but the question remained: Could the results be replicated?

So NIJ funded a multi-site RCT, called the HOPE Demonstration Field Experiment (DFE), that attempted to replicate the original HOPE effects in four locations in the states of Arkansas, Massachusetts, Oregon, and Texas.[5] Overall, this large-scale study found that the HOPE model was not associated with significant reductions in arrests and was unlikely to yield cost savings. The HOPE DFE clearly established that the model was no more effective than conventional probation programs in terms of rates of re-arrest, re-conviction, and revocation of probation.[6] Other jurisdictions will now have the benefit of that evaluation when deciding whether or how to reform their own probation system policies and practices.

As the volume of research rapidly expands, so does the opportunity for error. In 2016, nearly 2.3 million research articles were published, according to the National Science Foundation.[7] The explosion of computing power in the 21st century, often now coupled with databases of unprecedented magnitude, has greatly increased the proportion of research driven by big data. That trend may exacerbate the risk of misleading conclusions stemming from any number of factors, from incorrect assumptions, to mistakes at the data input stage, to application of flawed algorithms.

In key respects, replication research can be more beneficial in the social sciences than in other fields. Social science research commonly relies on statistical inference from human subjects data. Strict experimental controls in these studies are often costly or infeasible. In the physical and life sciences, in contrast, controlled experiments are common and prior results are often independently confirmed as an essential step of new original research.

Some areas of research where the hard sciences and social sciences overlap, however, may be especially amenable to replication. The measurement of forensic examiner performance is one such area. An influential 2016 report by the President’s Council of Advisors on Science and Technology (PCAST) called for studies of examiner error rates, so-called “black box” studies, in the forensic pattern disciplines.[8] It also specifically called for replication of those studies, so that the results could be understood with greater confidence.

Replication Advances Solutions to Crime

For justice communities, the essential value of replication research, along with other measures of scientific reliability, is finding out what works and thereby advancing solutions to crime. A replication study that validates existing research could, for example, improve the rating of a new law enforcement method from “promising” to “effective.”

The quest for reliability cannot mask the uncertainty inherent in all science. A lack of scientific transparency, however, should not be allowed to contribute to that uncertainty. As the National Academies report pointed out, researchers should be transparent about all known factors influencing their methods, results, and conclusions. Transparency elevates the field generally. Specifically, it helps replication researchers down the line fully comprehend the limits of the original work and the task at hand.

Going forward, large-scale meta-analyses may emerge as stronger measures of research reliability than individual, one-to-one replication studies. Whatever the mechanism, advancing justice through science must include ensuring research reliability.

Notes