Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features

Overview of attention for article published in Algorithms for Molecular Biology, June 2016
Citations: 18 (Dimensions)
Readers: 17 (Mendeley)
Title
Identification of donor splice sites using support vector machine: a computational approach based on positional, compositional and dependency features
Published in
Algorithms for Molecular Biology, June 2016
DOI 10.1186/s13015-016-0078-4
Pubmed ID
Authors

Prabina Kumar Meher, Tanmaya Kumar Sahu, A. R. Rao, S. D. Wahi

Abstract

Identifying splice sites is essential for gene annotation. Although existing approaches achieve acceptable accuracy, there is still room for improvement. Moreover, most approaches are species-specific, so methods that work across species are needed. Each splice site sequence was transformed into a numeric vector of length 49, comprising four positional, four dependency, and 41 compositional features. These vectors were used as input to a support vector machine for prediction. With balanced training sets, the proposed approach achieved an area under the ROC curve (AUC-ROC) of 96.05, 96.96, 96.95, and 96.24 % and an area under the PR curve (AUC-PR) of 97.64, 97.89, 97.91, and 97.90 % when tested on human, cattle, fish, and worm datasets, respectively. With imbalanced training sets, the AUC-ROC was 97.21, 97.45, 97.41, and 98.06 % and the AUC-PR was 93.24, 93.34, 93.38, and 92.29 %. The proposed approach was comparable with state-of-the-art splice site prediction approaches on the benchmark NN269 dataset and other datasets, and its accuracy was consistent across species. We therefore believe that it can complement existing methods for splice site prediction. A web server named 'HSplice', based on the proposed approach, has been developed for easy prediction of 5' splice sites and is freely available at http://cabgrid.res.in:8080/HSplice.
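The two metrics reported in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `auc_roc` and `average_precision`, the toy labels, and the toy scores are all assumptions made for illustration. AUC-ROC is computed through its rank interpretation (the probability that a true site scores higher than a false one), and AUC-PR is approximated by average precision, one common step-wise estimate.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumes no tied scores between positives and negatives; ties would make
    the result depend on the sort order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Hypothetical classifier scores: 1 = true donor site, 0 = false site.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# Same positives, but every negative repeated three times (imbalanced set).
imb_labels = [1, 1, 1] + [0] * 9
imb_scores = [0.9, 0.8, 0.6] + [0.7, 0.4, 0.2] * 3

print(f"balanced:   AUC-ROC={auc_roc(labels, scores):.3f}  "
      f"AP={average_precision(labels, scores):.3f}")
print(f"imbalanced: AUC-ROC={auc_roc(imb_labels, imb_scores):.3f}  "
      f"AP={average_precision(imb_labels, imb_scores):.3f}")
```

In this toy case the AUC-ROC is the same for both sets, while the average precision falls once negatives outnumber positives. Precision-based metrics are generally sensitive to the class ratio in this way; the abstract itself does not state why its AUC-PR values differ between the balanced and imbalanced settings.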

Mendeley readers

The data shown below were compiled from readership statistics for 17 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
India 1 6%
Unknown 16 94%

Demographic breakdown

Readers by professional status Count As %
Student > Master 4 24%
Researcher 3 18%
Student > Ph. D. Student 3 18%
Student > Bachelor 2 12%
Lecturer 1 6%
Other 3 18%
Unknown 1 6%

Readers by discipline Count As %
Computer Science 7 41%
Biochemistry, Genetics and Molecular Biology 3 18%
Engineering 2 12%
Chemistry 1 6%
Agricultural and Biological Sciences 1 6%
Other 0 0%
Unknown 3 18%