Every protein in the human body originates from a DNA sequence, which determines the proper sequence of amino acids needed to create the protein. This sequence of amino acids dictates the folding and final shape of the protein and thus the protein’s ability to perform its particular function in the cell. Although the DNA is housed in the nucleus of the cell, the molecular machinery responsible for protein assembly is housed outside the nucleus. In order for the critical instructions to exit the nucleus, they have to be transcribed to a temporary molecule called RNA which, due to its small size, can travel outside the nucleus.
Just as all the cell’s DNA is called the “genome,” all of the transcribed RNA molecules are called its “transcriptome.” Since it comes directly from DNA, the transcriptome has the potential to identify individuals, reconstruct phenotypes, and suggest ancestry, just like DNA. And because each type of cell in the body is transcribing different parts of the DNA to perform its unique functions, RNA can potentially identify tissue and fluid types, as well.
A major complication for the forensic use of the transcriptome, however, is that RNA degrades rapidly. Cellular enzymes break down RNA to regulate protein levels and recycle molecular components, and unless preserved for transcriptome analysis, biological samples have generally been considered useless for collecting RNA information. Nonetheless, identifiable transcripts can be isolated from samples many years old, and in fact, the degree of RNA degradation could be used to ascertain how old a sample is.
Robert Allen, chairman of the School of Forensic Sciences at Oklahoma State University, investigated what determines the rate of RNA decay, whether this rate differs from gene to gene, and if so, why. His team collected specimens of blood, saliva, semen, and vaginal fluids. At time points from zero to 360 days after collection, RNA was isolated from the samples and sequenced using a Next-Generation Sequencing (NGS) system. The NGS system generates and detects millions of individual copies (called “reads”) of the prepared RNA sample. All of the reads were aligned to the human genome for gene identification, and the number of reads at each gene was an indicator of the transcript’s abundance in the sample. To standardize read counts across transcripts of different lengths and to use reads as an estimate of the number of transcripts in the sample, RNA molecules were introduced at known concentrations and sequenced to provide a calibration curve.
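The calibration idea can be illustrated with a short sketch. It is not the procedure from Allen's report: it assumes that read counts are first divided by transcript length (reads per kilobase) and that a straight line on log-log axes relates that value to concentration. The spike-in values, units, and function names below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical spike-in standards: (known concentration, length in bp, read count).
# Values are invented for illustration; units are arbitrary.
SPIKE_INS = [
    (1.0, 1000, 100),
    (10.0, 500, 500),
    (100.0, 2000, 20000),
]


def reads_per_kilobase(read_count, length_bp):
    """Scale a raw read count by transcript length so long and short
    transcripts can be compared on the same footing."""
    return read_count / (length_bp / 1000.0)


def fit_calibration(spike_ins):
    """Least-squares fit of log10(concentration) = slope * log10(RPK) + intercept.

    The log-log linear form is an assumption of this sketch.
    """
    xs = [math.log10(reads_per_kilobase(reads, length))
          for _, length, reads in spike_ins]
    ys = [math.log10(conc) for conc, _, _ in spike_ins]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(read_count, length_bp, slope, intercept):
    """Convert a transcript's read count to an estimated concentration
    using the fitted calibration line."""
    rpk = reads_per_kilobase(read_count, length_bp)
    return 10 ** (slope * math.log10(rpk) + intercept)


slope, intercept = fit_calibration(SPIKE_INS)
# A hypothetical 1,500 bp transcript with 3,000 aligned reads.
print(estimate_concentration(3000, 1500, slope, intercept))
```

With these invented standards the fitted line has a slope of 1, so the example transcript comes out at about 20 in the same arbitrary units as the spike-ins. Real spike-in data would scatter around the line rather than fall exactly on it.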
What Allen discovered was that, as expected, transcripts which were more abundant in the initial sample (time zero) took longer to disappear. Interestingly, the degradation rate differed by gene and did not appear to be dependent on the length of the sequence. That is, different transcripts with similar initial concentrations could either degrade to undetectable levels within days, or last for months, depending not on the size of the transcript but on the gene it was copied from. Moreover, replicate sequencing runs at time zero matched closely, and read counts for the introduced, standard RNAs correlated well with their set concentrations, all suggesting that read counts reflected true RNA concentrations and were not an experimental artifact.
Allen discovered that the rate of RNA decay also depended on the body fluid from which the RNA originated. As expected, different fluids generally had different RNA transcripts (that is, different genes were expressed), or if the same, at different abundances. Importantly, though, even when comparable in both identity and abundance at time zero, they would disappear at different rates. For example, despite having similar concentrations in new semen and saliva samples, transcripts of the gene GAPDH disappear after six months in saliva but are still at 44% of their original levels in semen at that time.
To determine the age of a sample, Allen recommends that instead of using concentrations of the full transcriptome, or of transcripts of selected genes, forensic scientists should use transcript concentrations from categories of genes known to be short-, mid-, or long-lived in different fluids and tissues. For example, if no transcripts are found in a bloodstain for those genes whose transcripts are known to degrade after 60 days, but they are detected for genes whose transcripts are known to last 120 and 180 days, the sample could be estimated as being more than 60 but less than 120 days old. Armed with this knowledge, forensic scientists could begin to identify not just who was present at the scene of the crime, but also when the body fluid was deposited.
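The bracketing logic in the bloodstain example can be written out as a small sketch. It is a simplification rather than Allen's method: it assumes each gene category has a single known lifetime in days for the fluid in question, and that a category is simply "detected" or "not detected." The function name and input format are hypothetical.

```python
def estimate_age_window(detected_by_lifetime):
    """Bracket a sample's age from which transcript categories are still detectable.

    detected_by_lifetime maps a category's known lifetime in days to True
    (transcripts detected) or False (transcripts not detected).
    Returns (lower, upper) in days; upper is None when no category is
    detected, so there is no upper bound.
    """
    lower = 0
    upper = None
    for lifetime, detected in detected_by_lifetime.items():
        if detected:
            # Transcripts still present: the sample is younger than this lifetime.
            upper = lifetime if upper is None else min(upper, lifetime)
        else:
            # Transcripts gone: the sample is older than this lifetime.
            lower = max(lower, lifetime)
    if upper is not None and lower >= upper:
        raise ValueError("detections are inconsistent with the stated lifetimes")
    return lower, upper


# The bloodstain example from the text: 60-day genes absent,
# 120-day and 180-day genes still detected.
print(estimate_age_window({60: False, 120: True, 180: True}))
```

For the bloodstain example this yields a window of 60 to 120 days. In practice the lifetimes would have to be established separately for each body fluid, since the same gene's transcripts can persist for different lengths of time in different fluids.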
About this Article
The research described in this article was funded by NIJ grant 2014-DN-BX-K025, awarded to the Oklahoma State University Center for Health Sciences. This article is based on the grantee report, “Transcriptome sequencing of forensically relevant biological fluids and tissues to optimize degradation analysis for sample age estimation,” by Robert W. Allen, principal investigator, School of Forensic Sciences, Oklahoma State University.