Award Information
Description of original award (Fiscal Year 2022, $306,271)
The estimation of ancestry from skeletal remains is one of the most crucial tasks of forensic anthropology. Craniometric data are the most commonly used data source for this task. Multiple statistical and computational methods have been developed for craniometric ancestry estimation, including but not limited to linear discriminant analysis (LDA), geometric morphometrics (GM), mixture model-based clustering, and artificial intelligence (AI). While LDA remains the most commonly used method, AI or machine learning-based methods often outperform traditional statistical methods (e.g., LDA) in prediction accuracy. Currently, the state-of-the-art machine learning software for craniometric ancestry estimation is AncesTrees, which uses a traditional machine learning method, random forest. Its 6-group (European, African, Austro-Melanesian, Polynesian, Native American, East Asian) model achieves a classification accuracy of about 80% on the authors' test samples from Africa and Europe. This accuracy is based on traditional caliper measurements. Another type of craniometric data is the 3D coordinates of cranial landmarks, and a popular database of this type is 3D-ID. However, recent evaluations of 3D-ID's ancestry estimation algorithm showed that its accuracy might be lower than that of LDA with caliper measurements.
Our hypothesis is that the accuracy of craniometric ancestry estimation can be further improved using deep learning (DL). DL is a recent breakthrough in AI that has surpassed traditional machine learning methods in many areas. Its real-world applications in biology, including some from our lab, suggest that good performance does not necessarily require large data sets. The overall objective of this application is to develop deep learning models to estimate ancestry using craniometric data. We plan to pursue the following specific aims: Aim 1: Use deep learning methods to build prediction models that classify a skull of unknown ancestry into one of six groups (European, African, East Asian, African American/European-African Admix, Latin American/European-American-African Admix, Middle Eastern/European-East-Asian Admix), with associated probabilities, using the 3D-ID data set. Our models will also handle incomplete skulls and provide uncertainty estimation. Aim 2: Compare the performance of our models with the original 3D-ID algorithm using independent test data. Aim 3: Implement our models as a web service for other researchers and law enforcement agents.
The deliverables will be novel DL methods for craniometric ancestry estimation and a web service implementing those methods. They will benefit the forensic anthropology community, including law enforcement agents who use forensic anthropology to match skulls of unknown individuals to missing persons. CA/NCF