The authors discuss the suitability and potential benefits of hidden Markov models as a basecalling tool and present a very simple model that matches the overall performance of PHRED in a preliminary evaluation. The paper covers the motivation and theoretical background for the research, model selection for DNA sequencing, the generation of training data, model training, and model implementation and results.
The authors propose hidden Markov models to model electropherograms from DNA sequencing equipment and to perform basecalling. They model the state emission densities using artificial neural networks and modify the Baum–Welch re-estimation procedure to perform training. They also develop a method that exploits consensus sequences to label training data, thus minimizing the need for hand labeling, and propose the same method for locating an electropherogram in a longer DNA sequence. Finally, they carry out a careful study of the basecalling errors and propose alternative HMM topologies that might further improve performance. The results demonstrate the potential of these models, and the authors conclude by suggesting further research directions. (Published Index Provided)
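As a rough illustration of how an HMM can turn a trace into base calls, the following minimal sketch runs Viterbi decoding over a toy four-channel signal. It is not the authors' model: the abstract does not specify their topology, so the four-state layout (one state per base), the "sticky" transition probabilities, the toy intensity frames, and the names `viterbi`, `collapse`, and `FRAMES` are all assumptions made for this example. In particular, the emission score here is simply the normalized channel intensity, standing in for the neural-network emission densities described above, and no Baum–Welch training is performed.

```python
import math

BASES = "ACGT"

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path for a sequence of frames.

    log_init[s]     : log P(first state = s)
    log_trans[r][s] : log P(next state = s | current state = r)
    log_emit[t][s]  : log-likelihood of frame t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def collapse(states):
    """Merge runs of identical states into single base calls (toy rule:
    this cannot represent homopolymers such as 'AA')."""
    calls = [states[0]]
    for s in states[1:]:
        if s != calls[-1]:
            calls.append(s)
    return "".join(BASES[s] for s in calls)

# Hypothetical normalized intensities per frame, channels ordered A, C, G, T.
# Frame 3 contains a spurious C peak in the middle of a G peak.
FRAMES = [
    [0.80, 0.10, 0.05, 0.05],
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.40, 0.35, 0.15],
    [0.10, 0.05, 0.80, 0.05],
    [0.03, 0.02, 0.05, 0.90],
    [0.05, 0.05, 0.05, 0.85],
]

# Assumed parameters: uniform start, 0.7 chance of staying in the same state.
log_init = [math.log(0.25)] * 4
log_trans = [[math.log(0.7 if r == s else 0.1) for s in range(4)]
             for r in range(4)]
log_emit = [[math.log(x) for x in frame] for frame in FRAMES]

path = viterbi(log_init, log_trans, log_emit)
hmm_call = collapse(path)
naive_call = collapse([frame.index(max(frame)) for frame in FRAMES])
print("per-frame argmax:", naive_call)
print("Viterbi decoding:", hmm_call)
```

With these toy numbers, picking the strongest channel in each frame reads the spurious peak as an extra base (AGCGT), while the Viterbi path absorbs it into the surrounding G peak (AGT). A realistic basecalling HMM would need states that model peak shape and spacing, which is where the choice of topology discussed in the abstract comes in.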
Similar Publications
- Taku Eyachantognaka Owihankeya Wanica, Community Brief
- A Self-assessment Tool for Helping Identify Police Burnout Among Investigators of Child Sexual Abuse Material
- Evaluation of Cannabis Product Mislabeling: The Development of a Unified Cannabinoid LC-MS/MS Method to Analyze E-liquids and Edible Products