The most common type of DNA profiling today for criminal cases and other types of forensic uses is called "STR" (short tandem repeat) analysis.
Using DNA to distinguish between two individuals is a tricky matter, because close to 99.9 percent of our DNA is the same as everybody else's. DNA that actually codes for proteins cannot vary much without rendering the proteins ineffective. The four nucleotide bases that carry the DNA code provide instructions for assembling the amino acids in proteins by being in a precise sequence, with each three-base group coding for a specific amino acid. If that DNA base sequence is altered (or "mutated," as scientists generally say), the sequence of amino acids in the resulting protein can also be altered. As a result, because protein function derives from a specific amino acid sequence, the protein may not work.
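To make the three-base coding idea concrete, here is a minimal Python sketch. The two short sequences are made up for illustration, and the lookup table is deliberately partial: it holds only the five codons used here, not the full genetic code.

```python
# Partial codon table (DNA coding-strand codons). Only the codons used in
# this illustration are included; the real genetic code has 64 codons.
CODON_TABLE = {
    "ATG": "Met",
    "ACT": "Thr",
    "CCT": "Pro",
    "GAG": "Glu",
    "GTG": "Val",
}


def translate(dna):
    """Read a DNA sequence three bases at a time and look up each amino acid."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


# Hypothetical sequences that differ by one base (A -> T in the fourth codon).
original = "ATGACTCCTGAGGAG"
mutated = "ATGACTCCTGTGGAG"

print(translate(original))  # ['Met', 'Thr', 'Pro', 'Glu', 'Glu']
print(translate(mutated))   # ['Met', 'Thr', 'Pro', 'Val', 'Glu']
```

A single changed base swaps one amino acid for another, which is the kind of alteration that can leave a protein unable to do its job.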
Think of DNA as the "blueprint" for a house and proteins as the steel, timber, bricks and mortar from which the house will be built. A brick that is mostly sand instead of clay will crumble, and mortar with the wrong ratio of cement to aggregate will fail. Likewise, a protein with the wrong sequence of amino acids often won't function. (This analogy fails to capture the complexity of the DNA-protein system, however, because proteins are not only the "bricks" and "timber." Some "read" the "blueprint" and supervise the building, others are the "bricklayers" and "carpenters," and still others maintain the house and keep it functioning after it is built.) Non-functional or missing proteins are the basis for many genetic diseases. Differences useful for identification must therefore be found in the remaining one-tenth of one percent, which is not known to code for anything specific. Because the precise sequence of this portion of the DNA is less critical, it is quite variable, which makes it possible to use DNA to distinguish between individuals.
Among the 3 million or so DNA bases that do not code for proteins are regions with multiple copies of short repeating sequences of these bases (for example, TATT [note 2]). These sequences repeat a variable number of times in different individuals. Such regions are called "variable number short tandem repeats," and they are the basis of STR analysis. Taken together, a set of these regions can provide statistically near-irrefutable evidence of a person's identity, because the likelihood of two unrelated people having the same number of repeated sequences in every region becomes increasingly small as more regions are analyzed.
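The idea of a repeat count can be sketched in a few lines of Python. This is an illustration only, not how forensic laboratories actually measure STRs: the function name, the flanking bases and the two example sequences are all hypothetical.

```python
def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Keep stepping forward while the next chunk is another copy of the motif.
        while sequence[pos:pos + step] == motif:
            count += 1
            pos += step
        best = max(best, count)
    return best


# Hypothetical region from two people: same flanking bases, different repeat counts.
person_a = "GGC" + "TATT" * 7 + "ACG"
person_b = "GGC" + "TATT" * 10 + "ACG"

print(longest_tandem_run(person_a, "TATT"))  # 7
print(longest_tandem_run(person_b, "TATT"))  # 10
```

Here the two hypothetical individuals share the same motif but carry it 7 and 10 times, and that difference in count is what an STR profile records for each region.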
Autosomal chromosomes are those not involved in determining a person's sex, and STRs on these chromosomes are called autosomal STRs. Other STRs used for forensic purposes are called Y-STRs, which are derived solely from the male sex-determining Y chromosome. Profiles based on autosomal STRs provide far stronger statistical power than profiles based on Y-STRs, because autosomal DNA is randomly exchanged between matched pairs of chromosomes in the process of making egg and sperm cells. That's how, with billions of humans on the planet, no two people who are not identical twins are exactly alike. Profiles based on Y-STRs are statistically weaker because only males have a Y chromosome and all males get theirs from their fathers, so all males in any paternal line have nearly identical Y chromosomes. Given enough Y-STR locations, which scientists call loci, a Y-STR profile can offer substantial power to discriminate between individuals, but this type of profile is not as powerful as an autosomal STR profile.
In the United States, 13 autosomal STR loci are now accepted as the system used for forensic purposes. Given a robust crime scene DNA sample with good data for all 13 STRs, the likelihood of a person unrelated to the actual perpetrator having a perfect match for all 13 is typically around 1 in 1 billion. [note 3] By contrast, experimental work with a very robust set of 30 Y-STR loci showed a probability of about 1 in 50,000 for a perfect match. [note 4]
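A rough sense of why the odds shrink so quickly as loci are added comes from multiplying per-locus match probabilities together. The Python sketch below assumes that the loci are statistically independent and that the chance of a coincidental match at each locus is 0.2. That figure is purely hypothetical, chosen for illustration rather than taken from real population data.

```python
import math

# Hypothetical chance that an unrelated person matches at any single locus.
# This value is an assumption for illustration, not a measured frequency.
PER_LOCUS_MATCH = 0.2


def combined_match_probability(per_locus_probabilities):
    """Multiply per-locus probabilities, assuming the loci are independent."""
    return math.prod(per_locus_probabilities)


for n_loci in (1, 5, 13):
    p = combined_match_probability([PER_LOCUS_MATCH] * n_loci)
    print(f"{n_loci:2d} loci: about 1 in {1 / p:,.0f}")
```

Under these assumed values, 13 loci give odds on the order of 1 in a billion. The same multiplication does not apply across Y-STR loci, which are inherited together as a block along the paternal line rather than independently.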
About This Article
This article appeared in NIJ Journal Issue 267, March 2011, as a sidebar to the article "Extending the Time to Collect DNA in Sexual Assault Cases" by Terry Taylor.
[note 2] TATT stands for a specific string of nucleotide bases, thymine-adenine-thymine-thymine. Thymine and adenine are two of the four bases frequently found in DNA. The other two are cytosine (C) and guanine (G).
[note 3] Norrgard, K., "Forensics, DNA Fingerprinting, and CODIS," Nature Education 1(1) (2008) (accessed July 7, 2010).
[note 4] Hanson, E., and J. Ballantyne, "A Highly Discriminating 21 Locus Y-STR 'Megaplex' System Designed to Augment the Minimal Haplotype Loci for Forensic Casework," Journal of Forensic Sciences 49 (January 2004): 1-12.