Review and Comparison of Computational Methods for Identifying Translation Start Sites in EST Data
Submitted: May 5, 2003
ABSTRACT
Expressed Sequence Tags (ESTs) are, next to cDNA sequences, the most direct
way to identify genes in silico. EST projects have the added advantages of efficiency
and scalability. Accordingly, the number of novel EST sequences is growing rapidly.
There has also been rapid progress in the development of new methods for assessing
the 5'-completeness of EST sequences by identifying translation initiation
sites (TIS). This project assesses the challenge of EST analysis in the broader
context of gene discovery, reviews the key concepts and methods for identifying
translation initiation sites, and compares the performance of these methods
on a dataset of expressed sequence tags. Among the methods compared, ATGpr
proves effective: it demonstrates high sensitivity, specificity, and
overall accuracy in identifying start sites while
also rejecting incomplete sequences. Finally, avenues for future improvements
in start site prediction and EST analysis are discussed.