A Century of Ballistics Comparison Giving Way to Virtual 3D Methods

New systems allow high-definition scans of bullets and cartridge cases to be shared and compared virtually.
Date Published: March 23, 2022

The scene is repeated endlessly on television crime shows, be it fictional stories of murder and mayhem or docudramas of actual police work. Cartridge cases or bullets from a crime scene are sent to a forensics lab for testing and, moments later in TV time, a suspect’s gun is linked to the crime. The evidence is overwhelming, and the shooter is arrested.

Confirming that a particular gun fired a particular bullet seems straightforward. The bullet, spinning its way down a gun’s rifled barrel, picks up marks specific to that barrel. And the marks left on a cartridge case when a gun fires have long been thought to be unique to that weapon. Match the grooves on the bullets or the marks on the cartridge case, and that is solid evidence they were fired by the same gun.

Except it isn’t.

For the more than 100 years that forensic ballistics comparisons have been done, investigators have gone to court confident that a “match” in rifling marks on the bullets or firearm toolmarks on cartridge cases was conclusive evidence. That changed in 2009 with a National Academy of Sciences (NAS) report about forensic analysis that concluded, “sufficient studies have not been done to understand the reliability and reproducibility of the methods.”

In 2016, a report from the President’s Council of Advisors on Science and Technology (PCAST) reinforced the NAS finding. After reviewing the studies done in the seven years since the NAS report, the authors found that “the current evidence still falls short of the scientific criteria for foundational validity.”

The limits on comparison evidence were made clear in a 2020 U.S. District Court case in the District of Columbia. A judge considering weapons charges set limitations on what the prosecutor’s firearms expert could say about the ballistics evidence. “He will not use terms such as ‘match’,” the judge ordered. “He will not state his expert opinion with any level of statistical certainty,” and, the judge concluded, he cannot use the phrases, “to the exclusion of all other firearms” or “to a reasonable degree of scientific certainty.”

Creating the Database

The forensic science community has responded to the challenges with research, much of it supported by the National Institute of Justice. Over the past decade the Institute has supported efforts to establish a statistical basis for firearm toolmark comparisons and move from 2D to 3D comparisons. Much of the research has been in collaboration with the National Institute of Standards and Technology (NIST), which in 2016 created the NIST Ballistics Toolmark Research Database (NBTRD).

To create the research database, NIST mechanical engineer Xiaoyu Alan Zheng attended forensics and law enforcement conferences and asked police departments and other agencies to test-fire their reference firearms collections and send him the bullets and cartridge cases, along with information about the guns and ammunition used. Zheng scanned the samples using a high-resolution 3D microscope, creating a virtual model of the toolmarks found on the bullets and cartridge cases. The 3D maps provide a high level of detail, and unlike conventional 2D microscopy images, the 3D images are not affected by lighting conditions, which allows for more objective comparisons.

The Database as a Research Catalyst

The open access database, largely funded by the National Institute of Justice, allows researchers to download firearms comparison data acquired from NIST and other researchers and upload their own data, steadily increasing the size and value of the database. As the database grows, more information is available for developing and validating algorithms that quantify the similarity between firearm toolmarks. That helps address the “foundational uncertainty” cited by the judge in the 2020 case.

“We made it a centralized research hub of images of firearm toolmarks on cartridge cases and bullets,” said Zheng. Without the database, researchers were extremely limited in the size and diversity of the samples they had access to. That limitation was a roadblock for the development and testing of quantitative toolmark similarity metrics, as well as the statistical methods needed for establishing the evidentiary validity of comparisons. Without the database, a robust study would have been cost-prohibitive, Zheng said. “The database is an effort to remove that initial roadblock from researchers and give them the ability to acquire data from a large variety of firearms.”

Advantage of 3D Imaging

A key advantage of 3D imaging is that it allows high-definition scans of the actual surface topography of a sample with high repeatability, which is important for database searches during the investigative phase. Although 3D images are now routinely used to rank samples in a database against crime scene evidence, 3D imaging has not yet replaced the use of traditional optical comparison microscopy for confirming whether two samples were actually fired from the same firearm. With a traditional 2D light microscope, an examiner views two samples simultaneously and subjectively concludes whether they were fired from the same firearm. The examiner typically adjusts the lighting conditions to highlight similarities and differences in the toolmarks. These changes in lighting matter because they can dramatically affect the 2D images of toolmarks, which undermines computer-based comparisons that require repeatable measurement data, Zheng said.

With 3D images, the data is a direct measurement and digital representation of the surface topography, Zheng said. Compared to traditional 2D images, this 3D data is more repeatable, is not sensitive to lighting conditions, and can be compared by both algorithms and examiners, he said. And virtual comparison microscopy software allows an examiner to compare 2D renderings of 3D surface data on a computer in a manner similar to traditional comparison microscopy.
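To make the idea of an algorithmic comparison concrete, here is a minimal sketch in Python. It assumes each sample has been reduced to a one-dimensional list of surface heights, a simplification of the full 3D topography, and it uses normalized cross-correlation as the similarity score. The article does not say which metric the researchers use, so the metric, the function names, and the sample values are all illustrative assumptions.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equal-length height profiles."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0:
        # A perfectly flat profile carries no toolmark detail to compare.
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def best_alignment_score(a, b, max_shift):
    """Highest correlation found while sliding one profile past the other.

    Two scans of the same mark rarely start at the same point, so the
    profiles are compared over their overlap at each candidate shift.
    """
    if len(a) != len(b) or len(a) - max_shift < 2:
        raise ValueError("profiles too short for the requested shift")
    best = -1.0
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            part_a, part_b = a[shift:], b[:len(b) - shift]
        else:
            part_a, part_b = a[:len(a) + shift], b[-shift:]
        best = max(best, correlation(part_a, part_b))
    return best


# Hypothetical height values (arbitrary units), invented for illustration.
profile = [0.0, 1.2, -0.4, 2.1, 0.3, -1.5, 0.8, 1.9, -0.7, 0.2]
# The same mark, scanned starting two samples later and padded at the end.
shifted = profile[2:] + [0.0, 0.0]

score = best_alignment_score(profile, shifted, max_shift=3)
```

Because the input is measured height rather than reflected light, the same surface should give the same score regardless of how it was lit, which is the repeatability Zheng describes.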

Moving from 2D to 3D

However, the transition from comparing samples with a light comparison microscope to comparing 3D topographic images has been gradual, partly due to the need for expensive new equipment, Zheng said. The move to 3D is taking place because the advantages are significant, and standards are being established by NIST and the forensic firearms community.

See a list of Cadre Research Labs awards from NIJ.

A leading developer of the high-definition 3D forensic firearms imaging systems is the Chicago-based Cadre Research Labs, founded by Ryan Lilien. The laboratory’s research, supported by a series of National Institute of Justice grants, has pioneered the development of 3D virtual comparison of bullets and cartridge casings using state-of-the-art scanning technology. Lilien, Cadre’s chief science officer, said that there has been a move toward 3D virtual microscopy over the past several years because recording the three-dimensional surface of a piece of evidence allows for accurate examination of the virtual surface in place of the original surface, either at the time of the initial exam, or at a later time and place.

The time element is important, Lilien noted, because traditionally “if I wanted to compare a cartridge case found at a crime scene today with a cartridge that was from a crime scene a year ago, I’d have to go down to the evidence archives and get access to that original specimen.” If a sample has to be mailed to another jurisdiction, there are chain of custody issues and the need for official approval of the evidence transfer, he said.

With 3D scanning, he said, “you greatly simplify the process.” Examiners measure an “accurate three-dimensional surface, and if you want to access a historic sample you scanned a year ago, when a previous crime occurred, you just double click on the case file.”

The Value of Virtual Comparison

Because the scan is a virtual file, when it is sent across the country to another law enforcement agency, the originating agency doesn’t have to worry about chain of custody issues or damage to the original evidence, he said. “It makes it very easy to access historical evidence, even from another jurisdiction.”

Working with virtual scans also allows a second examiner to independently verify the conclusions of the original examiner. “You can truly make the verification process blind,” Lilien said. Because the bullets or casings are virtual files, “you can hide all of the work of the first examiner when the second person is looking at it. Using the traditional tools, you had an envelope with test fires or other evidence items. When the second examiner gets it there may be sticky notes left in the file, or little marks on cartridge cases the original examiner put on with a marker.”

None of that occurs with virtual exams, he said, because the verification examiner starts with a clean file. And, importantly, the verification can be done remotely.

“Because it doesn’t have to be done at the same site,” Lilien said, “a lab with a backlog of cases can go to another lab, maybe in a different part of the state, that has a little extra capacity. An examiner at the second site could log in electronically and help reduce the backlog.”

Barriers to 3D Scanning

The time and effort involved in implementing new 3D systems are significant and can make some agencies reluctant to change. “The case we’re trying to make right now for 3D is that once you get through these initial hurdles, there is a cost savings on the back end, especially with the idea that examiners can easily pull up casework anywhere they are and conduct their analysis,” Zheng said. He hopes that as NIST continues to educate the firearms and toolmark community on the advantages and the reliability of 3D data, the resistance to the new technology will fade.

As is common with most new technology, cost is an issue. Typical comparison microscope systems used in traditional 2D analysis cost from $50,000 to $80,000, but a 3D instrument “could cost anywhere from $100,000 to $250,000, and of course you are going to need to train your people to use it, and develop a plan for deployment, validation, and quality control,” Zheng said. NIST is working with several forensic laboratories to develop best practice guides for the 3D systems. “We have to be able to document and ensure that the data you’re using for casework is accurate.”

From Research to Casework

The NIST Ballistics Toolmark Research Database is a research tool; as Zheng notes, none of its data is going to be used in actual casework. For use in casework, NIST is collaborating with the FBI and the Netherlands Forensic Institute to develop methods and reference data for statistical approaches to characterize the evidentiary strength or uncertainty of a comparison result.

NIST mechanical engineer Hans Soons noted that a similarity score between two pieces of evidence by itself is often meaningless. “A comparison score needs context,” he said. “How does it compare with scores obtained when comparing samples fired from the same firearm versus scores obtained when comparing samples fired from different firearms? To provide this context, we are developing the Reference Population Database of Firearm Toolmarks (RPDFT).”

The images in the reference population database are indexed according to class characteristics, such as caliber, firearm manufacturing method, ammunition type, and the number of lands and grooves. “There is a whole list of characteristics, where you can filter data to samples of the relevant population, meaning the population that has the same characteristics as your evidence,” Zheng said. With a large enough set of reference samples, statistical models can be generated that describe the distribution of sample comparison scores, he said. These models can then be used to generate quantitative statistical measures, such as likelihood ratios, that summarize the strength or uncertainty of the casework comparison results.
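The reasoning above can be sketched in a few lines of Python: filter the reference comparisons to the relevant population, model the scores from same-firearm and different-firearm pairs, and report how much more probable the observed score is under one model than the other. This is a sketch under stated assumptions, not the RPDFT method. The record layout, the single `caliber` filter, the choice of normal distributions, and every number below are hypothetical.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class ReferenceComparison:
    """One scored pair of reference samples (hypothetical record layout)."""
    caliber: str          # a class characteristic used for filtering
    same_firearm: bool    # ground truth known from the test fires
    score: float          # similarity score from a comparison algorithm


def likelihood_ratio(score, reference, caliber):
    """Ratio of the score's density under same-firearm and different-firearm models.

    Assumes scores in each group are roughly normally distributed, which is
    an illustrative modeling choice rather than an established result.
    """
    # Keep only comparisons that share the evidence's class characteristic.
    relevant = [r for r in reference if r.caliber == caliber]
    same = [r.score for r in relevant if r.same_firearm]
    different = [r.score for r in relevant if not r.same_firearm]
    if len(same) < 2 or len(different) < 2:
        raise ValueError("not enough reference data for this population")
    same_model = NormalDist.from_samples(same)
    different_model = NormalDist.from_samples(different)
    return same_model.pdf(score) / different_model.pdf(score)


# Invented reference data, for illustration only.
REFERENCE = [
    ReferenceComparison("9mm", True, 0.70),
    ReferenceComparison("9mm", True, 0.80),
    ReferenceComparison("9mm", True, 0.90),
    ReferenceComparison("9mm", False, 0.10),
    ReferenceComparison("9mm", False, 0.20),
    ReferenceComparison("9mm", False, 0.30),
    ReferenceComparison(".40 S&W", True, 0.15),  # excluded by the filter
]

lr_high_score = likelihood_ratio(0.80, REFERENCE, "9mm")
lr_low_score = likelihood_ratio(0.20, REFERENCE, "9mm")
```

A ratio above 1 favors the same-firearm explanation and a ratio below 1 favors different firearms. With only a handful of reference comparisons the fitted models are unreliable, which is why the size and diversity of the reference population matter.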

The ultimate goal is to be able to go into court and give a statistical statement similar to what is currently done for forensic DNA, where experts can confidently state the likelihood that two samples came from the same source with a high degree of certainty.

When will that reference population of bullets and shell casings be large enough to be valid in court?

“That’s to be determined still,” Zheng said. “We’re not going to say by next year we’ll have the entire thing figured out. This is going to be a slow process, meaning we’re going to build this reference population incrementally as we go to include more and more different types of firearms and tool marks.”

After the first stages of developing and populating the RPDFT database, “the only thing we may be able to make any kind of statistical statement about is Glocks or Rugers, because those are the only firearms we have data for,” Zheng said. “You have to build more and more of that reference population, utilizing other types of class characteristics, and at the end of the day it will allow us to be able to say we know what a match and a non-match look like.” The experience obtained with these two brands of firearms will be used to refine and further populate the database, Zheng said.

Meeting Courtroom Standards

How close is the research to meeting courtroom standards? Applications of algorithms and statistical models to casework will require more time; however, the use of virtual comparison microscopy of 3D toolmark images is already being introduced in court.

A National Institute of Justice-supported study by Lilien and Cadre Research Labs looked specifically at examiner error rates when doing virtual comparisons using a 3D scanning system. The study concluded that such systems provide examiners with “a number of functional advantages” in actual casework. In addition, the work adds to a growing consensus that 3D exams require less time and result in more accurate conclusions than traditional microscopy.

The Cadre study also examined “how inconclusive rates (in comparing bullets and cartridges) vary with scan resolution.” The study found that low resolution scans “are likely to result in higher inconclusive rates than would be achieved from high resolution scans.”

The goal is to set standards so the resolution is high enough that examiners can see relevant details and marks, but not so high that they are overwhelmed with details that aren’t important or useful.

Zheng, Soons, and Lilien all agree that it could be three to five years before the results of the 3D scans are routinely accepted by the courts. The standards for quality control, training, algorithmic comparison, and deployment validation must be agreed upon and published, Zheng said. The guardrails for responsible use of the technology have to be put in place.

Learn more from a paper by the researchers published in the Journal of Forensic Sciences.
