With the increasing abundance of genetic data, the usefulness of a genetic dataset now depends in part on the possibility of productively linking it with other datasets. One issue that arises in combining multiple datasets is the record-matching problem, i.e., the identification of dataset entries that, although labeled differently in separate datasets, represent the same underlying entity. In a genetic context, record-matching involves the identification of the same individual genome across multiple datasets when unique identifiers, such as participant names, are unavailable. By using correlations among genetic markers close to one another in the genome, the method proposed in this report can succeed even when the datasets contain no overlapping markers. The authors show that the method can link a dataset similar to those used in genomic studies with another dataset containing markers used for forensics. The proposed approach can assist in maintaining backward compatibility with databases of existing forensic genetic profiles as systems move to new marker types. This approach illustrates that the privacy risks that can arise from the cross-linking of databases are inherent even for a small number of markers. 4 figures, 1 table, and 39 references.