Genetic differentiation between and within Northern Native American language groups: an argument for the expansion of the Native American CODIS database

Journal: Forensic Sciences Research, 2021

The goal of this study was to determine whether genetic differences correlate with language groupings among Native Americans in the United States and, if so, whether additional language families would provide a more accurate representation of current genetic diversity among tribal populations.


The National Research Council recommends that, for an ethnic sample to be used in estimating reliable random match probabilities for forensic purposes, genetic differentiation among its subgroups be lower than 3 percent of the total genetic differentiation within the sample. Native American samples in the United States’ Combined DNA Index System (CODIS) database represent four language families: Algonquian, Na-Dene, Eskimo-Aleut, and Salishan; however, at least 27 Native American language families exist in the United States, not including language isolates. In the current study, the 21 short tandem repeat (STR) loci included in the GlobalFiler® PCR Amplification Kit were used to characterize six indigenous language families, including three of the four represented in the CODIS database (i.e., Algonquian, Na-Dene, and Eskimo-Aleut), and two language isolates (Miwok and Seri), using major population genetic diversity metrics such as F statistics and Bayesian clustering analysis of genotype frequencies. Most of the genetic variation (97 percent) was found within language families rather than among them (3 percent). In contrast, when only the three language families represented in both the CODIS database and the present study were considered, 4 percent of the genetic variation occurred among the language groups. Bayesian clustering yielded a maximum posterior probability for three genetically distinct groups among the eight language families and isolates: (1) Eskimo, (2) Seri, and (3) all other language groups and isolates, thus confirming genetic subdivision among subgroups of the CODIS Native American database. This genetic structure indicates the need to represent more Native American populations, grouped by language affiliation, in the CODIS database, and to collect more robust sample sets for those language families. (publisher abstract modified)
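To make the 3 percent guideline concrete, the following minimal Python sketch computes Wright's F_ST for a single locus from allele frequencies and checks it against that threshold. It is an illustration only and not the study's method: the abstract does not specify the estimator used, and this sketch assumes a simple heterozygosity-based F_ST with equally weighted subgroups. The function names (`fst`, `within_nrc_guideline`) and the allele frequencies are hypothetical and are not data from the study.

```python
from typing import Dict, List

# Threshold taken from the abstract: among-subgroup differentiation
# should be lower than 3 percent.
NRC_THRESHOLD = 0.03


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(subpops: List[Dict[str, float]]) -> float:
    """Heterozygosity-based F_ST = (H_T - H_S) / H_T for one locus.

    Assumption: every subgroup gets equal weight (no sample-size correction).
    """
    if not subpops:
        raise ValueError("at least one subgroup is required")
    n = len(subpops)
    alleles = set().union(*subpops)
    # H_S: mean expected heterozygosity within subgroups.
    h_s = sum(expected_heterozygosity(s) for s in subpops) / n
    # H_T: expected heterozygosity of the pooled (averaged) frequencies.
    pooled = {a: sum(s.get(a, 0.0) for s in subpops) / n for a in alleles}
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def within_nrc_guideline(fst_value: float, threshold: float = NRC_THRESHOLD) -> bool:
    """True when differentiation among subgroups is lower than the threshold."""
    return fst_value < threshold


# Hypothetical allele frequencies at one STR locus for two subgroups.
group_a = {"12": 0.5, "13": 0.3, "14": 0.2}
group_b = {"12": 0.2, "13": 0.3, "14": 0.5}

value = fst([group_a, group_b])
print(round(value, 4), within_nrc_guideline(value))
```

With these made-up frequencies the among-group share is roughly 6.8 percent, so the check fails. A multi-locus analysis such as the one described above would combine information across all 21 loci rather than rely on a single locus.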

Date Published: January 1, 2021