Review and Comparison of Computational Methods for Identifying Translation Start Sites in EST Data

Submitted: May 5, 2003

ABSTRACT
Expressed Sequence Tags (ESTs) are, next to cDNA sequences, the most direct way to identify genes in silico. EST projects have the added advantages of efficiency and scalability, and the number of novel EST sequences is accordingly growing rapidly. There has also been rapid progress in the development of methods that assess the 5’-completeness of EST sequences by identifying translation initiation sites (TIS). This paper places the challenge of EST analysis in the broader context of gene discovery, reviews the key concepts and methods for identifying translation initiation sites, and compares the performance of these methods on a dataset of expressed sequence tags. Of the methods compared, ATGpr proves effective: it demonstrates high sensitivity, specificity, and overall accuracy in identifying start sites while also rejecting incomplete sequences. Finally, avenues for future improvements in start site prediction and EST analysis are discussed.