Program evaluation is essential to ensuring that prison systems adopt effective programs and policies. The “gold standard” method for evaluating outcomes of programs and policies is the randomized controlled trial (RCT), a type of scientific experiment featuring random assignment of individuals to either a treatment or control group in order to precisely measure the treatment’s impact.
This article presents an overview of the RCT design, as well as its use and importance in a correctional setting. The takeaway from this article is that, while conducting RCT evaluations can be met with skepticism and challenges, where feasible RCTs deliver superior, more reliable evaluations of the impact of policies and programs.
A singular advantage of RCTs over other evaluation methods is their ability to reliably establish a causal link between a program or policy and an outcome.
When evaluating the impact of a particular program or policy, an evaluator is typically attempting to draw that link between the program or policy (X) and a specific outcome (Y), independent of any other external influences (Z). This is referred to as causal inference. In order to develop a strong causal link between program or policy X and outcome Y, at least three criteria must be satisfied:
- X must precede Y temporally;
- X and Y must “covary” together — that is, whenever X changes, Y changes in correlation to X; and
- There can be no other factor Z that explains the relationship between X and Y.
As an example, if the impact of a drug treatment program on the rate of drug relapse is being explored, the drug treatment (X) must be provided before the period of time that drug relapse (Y) is measured, there must be a relationship between receiving the drug treatment program (X) and whether or not drug relapse (Y) occurs, and there cannot be variation between those who do and do not receive the drug treatment (X) in other factors that affect drug relapse (Y) — such as a person’s internal motivation to change (Z).
Establishing a good comparison group is essential to estimating the causal impact of a program or policy. The goal is to compare those receiving the treatment program or policy of interest to a comparison group that looks identical to the treatment group, with the only difference being that one group receives the treatment and the other does not.
While various statistical options exist for establishing a comparison group, the RCT is the strongest design because it best establishes groups that are comparable on all known/measurable and unknown/unmeasurable factors. It accomplishes this by assigning eligible individuals at random either to a treatment group (those who will receive the program or policy) or to a control group (those who will not). Through random assignment, the treatment and control groups are essentially identical on all observed and unobserved factors. That is why the RCT design is so appealing and why it is often referred to as the “gold standard” of evaluation methods.
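The assignment mechanism itself is simple. As an illustrative sketch (not drawn from the article; the function and variable names are hypothetical), random assignment of an eligible pool into treatment and control groups might look like this in Python:

```python
import random

def randomly_assign(eligible_ids, seed=None):
    """Split a pool of eligible individuals at random into
    treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)        # seeding makes the assignment reproducible and auditable
    pool = list(eligible_ids)
    rng.shuffle(pool)                # a random ordering carries no selection pattern
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (treatment group, control group)

treatment, control = randomly_assign(range(1, 101), seed=2024)
print(len(treatment), len(control))  # two groups of 50
```

Because chance alone decides membership, the two groups are balanced in expectation on every factor, measured or not — the property that quasi-experimental alternatives can only try to approximate statistically.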
Why Not Use a Quasi-experimental Alternative to the RCT?
When evaluating a program or policy, the alternative to an RCT is an observational study, in which the evaluator must rely on a retrospective look at individuals who have already participated in the program or policy and try to identify a suitable comparison group of those who did not participate. This approach is difficult because program participants often self-select into treatment, or treatment is assigned based on predetermined criteria. Differing outcomes following treatment may therefore be due to preexisting differences between the groups rather than the treatment itself. This phenomenon is referred to as selection bias.
Evaluators can choose from a number of alternative statistical methods for creating comparable groups and addressing selection bias. These methods are referred to as quasi-experimental designs. For several reasons, however, none of these methods can produce estimates of the causal impact of a program or policy that are as unbiased and consistent as the estimates produced by an RCT design.
First, quasi-experimental designs allow the evaluator to address only observed selection biases. Only the RCT can also address unobserved biases, such as differences in internal motivation to change. Second, even if very little unobserved selection bias existed, the statistical methods required to address observed biases in high-quality quasi-experimental designs are sophisticated and complex. Applying these methods correctly, with a reasonable level of credibility, requires considerable statistical knowledge and experience. In other words, it is easy to get things wrong using quasi-experimental methods. Third, it is fairly well documented that quasi-experimental methods tend to exaggerate the size of the effects found in criminal justice programs or policies.[1] That distortion produces misleading results on which programs or policies work and by how much.
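The first and third points can be made concrete with a toy simulation (all numbers are invented and purely illustrative). Suppose an unobserved trait such as motivation raises both the chance of entering treatment and the outcome itself; a naive observational comparison then overstates a program whose true effect is fixed, while randomization recovers it:

```python
import random

rng = random.Random(0)
TRUE_EFFECT = 1.0   # the program's actual impact (known only because we simulate it)
N = 20_000

def outcome(treated, motivation):
    # Motivation (unobserved by the evaluator) improves the outcome on its own.
    return TRUE_EFFECT * treated + 2.0 * motivation + rng.gauss(0, 1)

def estimated_effect(assign_rule):
    """Difference in mean outcomes between treated and untreated groups."""
    treated, untreated = [], []
    for _ in range(N):
        m = rng.random()                  # each person's hidden motivation
        t = assign_rule(m)
        (treated if t else untreated).append(outcome(t, m))
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

# Observational study: the motivated disproportionately self-select into treatment.
naive = estimated_effect(lambda m: rng.random() < m)
# RCT: a coin flip that ignores motivation entirely.
rct = estimated_effect(lambda m: rng.random() < 0.5)
print(f"true: {TRUE_EFFECT}, observational: {naive:.2f}, RCT: {rct:.2f}")
```

In this setup the observational estimate lands well above the true effect of 1.0 (analytically, the hidden motivation term adds a bias of about two-thirds), while the RCT estimate sits close to the true effect — no matter that motivation was never measured.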
Therefore, to the degree that evaluation research in correctional contexts relies on quasi-experimental designs, it is likely exaggerating the true impact of programs and policies. This may lead policymakers to commit too many resources to programs and policies that do not work or that have minimal impact.
A Word of Caution on Evidence-based Practices
Another strength of randomized controlled trials is setting a scientifically sound foundation for evidence-based practices (EBPs), a common term in correctional settings. It has become popular for policymakers to claim that they are implementing EBPs. In theory, this is a good thing. In practice, however, EBPs are only as good as the quality of the evidence behind them. In some cases, the evidence base for EBPs is thin, exaggerated, or weak. For example, some so-called EBPs are indeed based on dozens of studies documenting their effectiveness. However, a closer look reveals that most of those studies used weaker quasi-experimental designs with all of the limitations previously mentioned, including exaggerating the true impact. A program or policy should not be declared an EBP only on the basis of the number of evaluations finding a positive impact. Rather, programs and policies should be judged based on both the number and the quality of the evaluations. RCTs establish quality to a degree that quasi-experimental designs cannot.
Adopting programs or policies that have been labeled as EBPs as a result of research conducted elsewhere can also be a way to avoid evaluating the impact of a jurisdiction’s own programs or policies, as actually implemented in that jurisdiction. For instance, prison staff may decide to adopt a specific drug treatment program that has received the EBP label due to its positive findings from evaluations in other prisons or jurisdictions, and then claim that there is no need to evaluate it locally because it is already known to be an EBP. The problem with this logic becomes evident when a program or policy labeled as an EBP does not actually work when it is transplanted to a context different from the one in which it was originally evaluated (e.g., in a different prison or among a different population), or when it is implemented differently. Conversely, programs or policies that are not labeled as EBPs may actually be effective in prisons or other specific environments, or when implemented in a certain way. It is crucially important to evaluate programs locally rather than rely on evaluation results from other jurisdictions.
Common Objections to Conducting RCTs in Corrections
It is not unusual for the proposed use of an RCT evaluation design in a correctional setting to encounter one or more common objections. The first is that it is unethical to assign participants to a program or policy on a random basis. Practitioners often say they are concerned that if an RCT evaluation is conducted, someone in need of a program will be denied it as a result of being randomly assigned to the control group. For instance, an individual soon to be released from prison may be withheld from a new reentry services program because he or she was randomly assigned to a control group. However, this reasoning assumes that solid evidence already establishes the program’s effectiveness in producing its intended impacts. In reality, the true impact of a program or policy is often not known.
The reason that an RCT evaluation is proposed in the first place is that there is uncertainty as to the impact of the program or policy in question. If it were already known with a high degree of certainty that a program or policy worked, then it would be easier to argue that it is unethical to randomly hold people back from receiving it. There have been instances, however, of well-intentioned programs that turned out to make people worse off. In such cases, random assignment of individuals to a control group would actually have made them better off. If the impact of a program is not known and program resources are limited, then the fairest way to assign someone to the program is through random assignment (like a lottery system).
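Under limited capacity, the lottery just described can double as the study’s random assignment. A minimal sketch (hypothetical names; the pool size and slot count are invented):

```python
import random

def lottery(eligible_ids, slots, seed=None):
    """Fill a fixed number of program slots by lottery; everyone
    not drawn forms a natural control group."""
    rng = random.Random(seed)
    pool = list(eligible_ids)
    winners = rng.sample(pool, slots)               # participants, chosen purely by chance
    drawn = set(winners)
    waitlist = [p for p in pool if p not in drawn]  # control group
    return winners, waitlist

participants, control = lottery(range(1, 121), slots=40, seed=11)
print(len(participants), len(control))  # 40 slots filled, 80 on the waitlist
```

Nothing extra is asked of staff: the allocation rule that is already the fairest way to ration scarce slots is also the rule that makes a rigorous evaluation possible.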
A second common objection to an RCT evaluation is that conducting RCTs is expensive and slow. A traditional RCT could cost thousands of dollars to conduct and take several years to complete. Fortunately, RCT evaluations do not need to follow this traditional model. Alternative RCT evaluation models are emerging that involve rapid-cycle testing and the use of existing staff to minimize cost and time. One example of this new model comes from an organization at New York University named BetaGov, whose mission is to help policymakers and government agencies identify problems, develop innovative solutions, and test them rapidly using rigorous research methods. BetaGov has helped government organizations in criminal justice and other public sector areas conduct dozens of rapid RCT evaluations; the typical BetaGov RCT evaluation concludes in three to six months. The model is drawn from the private sector, which has long relied on simple, pragmatic RCTs to improve efficiency and performance.
RCT Evaluations in Policing
The field of corrections has lagged behind the field of policing in embracing the use of RCTs to evaluate programs, practices, and policies; RCTs generally caught on earlier, and with more traction, in policing. This section provides an example of one bold policing experiment that should stimulate correctional agencies to further embrace RCT evaluation designs.
The Kansas City Preventive Patrol Experiment was an RCT designed to test the assumption that the presence of police officers in marked cars reduced the likelihood of a crime being committed. The experiment took different police beats in Kansas City and randomized them to varying patrol routines: (1) no patrol routine but only reactive calls from residents, (2) a normal level of patrol, or (3) two to three times the normal level of patrol. The study found that the rate at which crime was reported did not vary across the different patrolling routines, nor did citizen perceptions of crime vary across the routines. This groundbreaking study in part moved modern American policing away from random preventive patrolling to more proactive and targeted patrolling.
Examples like this can motivate the field of corrections to catch up with the policing field in embracing RCTs, a practice that can move corrections forward significantly.
An Example from One State Prison Jurisdiction: Pennsylvania
Although the corrections field in general has lagged behind in adopting RCTs, there are exceptions. Over the past 15 years, the Pennsylvania Department of Corrections (PA DOC) has conducted several program evaluations using an RCT design, including evaluations of a reentry program, a life skills program, a therapeutic community program, a medication-assisted treatment program for incarcerated persons with an opioid use problem, and a post-release community relocation program. Until 2015, PA DOC followed a traditional model for conducting RCT studies. While this model worked well for the department in certain cases, it also suffered from the limitations noted above, including concerns about cost and duration.
In 2015, PA DOC, with free support from BetaGov, started using a rapid-cycle model for conducting RCT evaluations and experimentation around three agency goals:
- Reducing in-prison violence;
- Reducing the use of restrictive housing; and
- Improving staff wellness.
All staff at every level of the agency were invited to submit ideas for experimenting with new programs, practices, and policies around these three goals. Since 2015, more than 200 trial ideas have been submitted, and at least three dozen RCT evaluations have been completed.
Trials tested practices such as varying rates of pat searches; providing visitors with notification of the consequences of bringing in contraband; use of colored bed sheets for bed linens as an alternative to the traditional white bed sheets; aromatherapy; a swift and certain inmate discipline system in response to minor misconduct; an anxiety-reduction “chill plan” program for incarcerated females; use of virtual reality as an incentive for good behavior; the introduction of an intelligence officer staff position at the prison; unit dogs; suicide prevention training; and crisis intervention team training for working with incarcerated persons who are mentally ill.
PA DOC still uses the traditional RCT evaluation design for larger interventions that take more time to evaluate. Currently, the department is conducting several large-scale RCT evaluations, including an evaluation of providing Pell Grants for funding in-prison college courses, an evaluation of a program for teaching financial management skills, and an evaluation of providing those who are released from prison with overdose-reversing naloxone kits before release.
If corrections professionals are interested in understanding the true causal impact of various policies and programs, the RCT evaluation design provides the strongest model for doing so. Correctional programs and policies should be evaluated locally, rather than justified solely by evidence from other jurisdictions. Such evaluations do not need to be expensive or drawn out over long periods of time.
Ethical objections to RCTs, typically on the grounds that control group members are denied treatment benefits, often fail to consider that a program’s benefits are commonly unknown until its scientific evaluation is complete. Experience demonstrates that most RCTs are ethical by design and in practice. In the end, there is no general ethical or logistical barrier preventing RCTs in correctional environments.
The field of corrections has lagged behind other criminal justice fields (such as policing) in embracing RCT designs, but this can change. Correctional departments should commit to fostering a learning organization, where the strongest possible evidence is generated for making decisions about what programs, policies, and practices to use or not use.
[note 1] Weisburd, David, Cynthia M. Lum, and Anthony Petrosino, “Does research design affect study outcomes in criminal justice?” The Annals of the American Academy of Political and Social Science 578 (2001): 50-70, doi:10.1177/000271620157800104.