The authors discuss the suitability and potential benefits of hidden Markov models as a basecalling tool and present a very simple model that matches the overall performance of PHRED in a preliminary evaluation. They cover the motivation and theoretical background for their research, model selection for DNA sequencing, the generation of training data, model training, and model implementation and results.
The authors propose hidden Markov models (HMMs) to model electropherograms from DNA sequencing equipment and to perform basecalling. They model the state emission densities using artificial neural networks and modify the Baum–Welch re-estimation procedure to perform training. Moreover, they develop a method that exploits consensus sequences to label training data, thus minimizing the need for hand labeling. The authors propose the same method for locating an electropherogram in a longer DNA sequence. They also perform a careful study of the basecalling errors and propose alternative HMM topologies that might further improve performance. Their results demonstrate the potential of these models. Based on these results, the authors conclude by suggesting further research directions. (Published Index Provided)
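To make the decoding idea concrete, the following is a minimal sketch of Viterbi decoding on a toy HMM with one hidden state per base. It is not the authors' model: the abstract describes neural-network emission densities over continuous electropherogram traces, whereas this sketch assumes each time step has already been reduced to a single dominant-channel symbol. The names `viterbi`, `START`, `TRANS`, and `EMIT`, and all probability values, are hypothetical and chosen only for illustration.

```python
import math

BASES = ["A", "C", "G", "T"]

# Hypothetical toy parameters (assumptions, not taken from the paper).
START = {b: 0.25 for b in BASES}                      # uniform start distribution
TRANS = {p: {s: 0.25 for s in BASES} for p in BASES}  # uniform transitions
# Each state mostly emits its own channel symbol, with a small error rate.
EMIT = {s: {o: (0.85 if o == s else 0.05) for o in BASES} for s in BASES}


def viterbi(observations, start_p, trans_p, emit_p):
    """Return the most likely hidden base sequence for the observations."""
    if not observations:
        return ""
    # Work in log space so long sequences do not underflow.
    first = observations[0]
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][first]) for s in BASES}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev = scores[-1]
        col, ptr = {}, {}
        for s in BASES:
            best = max(BASES, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            col[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][obs])
            ptr[s] = best
        scores.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(BASES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("GATTACA", START, TRANS, EMIT))  # prints GATTACA
```

With uniform transitions the decoder simply follows the strongest emission at each step. A realistic basecalling HMM would need several samples per base and a topology that constrains state durations, which is where the alternative topologies mentioned in the abstract would come in.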
Similar Publications
- Forensic Discrimination of Dyed Hair Color: I. UV-Visible Microspectrophotometry
- Technology-Facilitated Abuse in Intimate Partner Violence (IPV): An Exploration of Costs and Consequences, Executive Summary
- Assessing Screw Length Impact on Bone Strain in Proximal Humerus Fracture Fixation Via Surrogate Modelling