An official website of the United States government, Department of Justice.

Guiding Interpretation: Leveraging High-Density SNP Data from Major U.S. Populations for Forensic Genetic Analyses

Award Information

Award #: GRANT14121445
Funding Category: Competitive
Status: Open
Funding First Awarded: 2024
Total funding (to date): $990,468

Description of original award (Fiscal Year 2024, $990,468)

Advancements in next-generation sequencing (NGS) technologies have enhanced forensic genetic investigations by enabling the analysis of single nucleotide polymorphisms (SNPs) through whole genome sequencing (WGS) or targeted panels. Using SNPs for challenging DNA samples or complex missing persons cases has the potential to provide DNA evidence that can be used to solve crimes, identify missing persons, or generate investigative leads through forensic investigative genetic genealogy (FIGG). U.S. demographics are unique because of the large proportion of admixture observed, so existing high-density SNP reference data, such as those in the 1000 Genomes Project (1000G), are not representative of these populations. Reliable population allele frequencies, together with knowledge about population structure and DNA marker dependencies, are crucial for accurate forensic DNA interpretation, particularly for larger forensically relevant SNP panels, but such information is limited for U.S. populations.

This project aims to advance forensic genetic SNP analysis by utilizing high-density SNP data from major U.S. populations, with a special focus on larger SNP panels used for identity testing, missing persons identification, kinship analysis, and FIGG (e.g., FORCE, Kintelligence, SNP array panels, and WGS-based SNP panels). Over 1,200 samples, previously collected from unrelated individuals with self-identified race (i.e., White or Caucasian, Black or African American, Hispanic, American Indian, Asian, and Multiracial), will be analyzed with a custom version of the Infinium Global Screening Array (GSA) SNP chip (more than 700,000 SNPs). Population genetic metrics will be explored, including allele frequency distributions, linkage disequilibrium patterns, population structure, and coancestry. These metrics will be compared with existing genome data (e.g., 1000G), and their impact on DNA evidence interpretation will be examined. The genotyping of an additional 400 samples from known related individuals (2nd to 5th degree) will enable exploration of the possibilities and limitations of kinship assessments. Such assessments are most often studied by simulation approaches, and there is a need to confirm those results with empirical data from known relatives. Additionally, access to existing autosomal short tandem repeat (STR), Y-chromosomal STR, and mitochondrial DNA (mtDNA) data, along with the newly generated SNP data, will allow SNP-SNP, SNP-STR, and STR-mtDNA dependencies within U.S. populations to be assessed, informing recommendations on combining statistics from multiple marker systems. The outcomes of this project will provide extensive SNP allele data for U.S. populations, as well as guidance to practitioners on statistical interpretation when SNP data and/or multiple marker systems are included in forensic investigations.
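
As an illustration of two of the population genetic metrics named above, the following minimal Python sketch estimates an alternate-allele frequency and a genotype-based linkage disequilibrium value (squared correlation of allele dosages) for pairs of SNPs. It is not the project's method or pipeline. The function names, the 0/1/2 dosage encoding with `None` for a missing call, and the toy genotypes are all assumptions made for this example.

```python
def alt_allele_frequency(genotypes):
    """Alternate-allele frequency from 0/1/2 dosages; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each diploid individual contributes two alleles.
    return sum(called) / (2 * len(called))


def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation of dosages at two SNPs.

    This is a genotype-based LD estimate that needs no phasing.
    Individuals missing a call at either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals called at both SNPs")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined when either SNP is monomorphic
    return (cov * cov) / (var_a * var_b)


# Hypothetical toy data: one list per SNP, one entry per individual.
snp1 = [0, 1, 2, 1, 0, 2, 1, None]
snp2 = [0, 1, 2, 1, 0, 2, 1, 1]   # matches snp1 wherever snp1 is called
snp3 = [0, 2, 0, 2, 1, 1, 1, 1]   # uncorrelated with snp1 in this toy set

print(alt_allele_frequency(snp1))
print(genotype_r2(snp1, snp2))
print(genotype_r2(snp1, snp3))
```

In practice, estimates like these would be computed per population group across the full marker set with established tools, and strongly correlated SNP pairs would indicate marker dependencies to account for before combining statistics.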

Date Created: October 18, 2024