MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

Overview of attention for article published in BioData Mining, December 2016
Citations

14 Dimensions

Readers on

18 Mendeley
Title
MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification
Published in
BioData Mining, December 2016
DOI 10.1186/s13040-016-0116-2
Pubmed ID
Authors

Giulia Fiscon, Emanuel Weitschek, Eleonora Cella, Alessandra Lo Presti, Marta Giovanetti, Muhammed Babakir-Mina, Marco Ciotti, Massimo Ciccozzi, Alessandra Pierangeli, Paola Bertolazzi, Giovanni Felici

Abstract

Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods. We propose a supervised method based on a genetic algorithm to identify small genomic subsequences that discriminate among different species. The method identifies multiple subsequences of bounded length with the same information power in a given genomic region. The algorithm has been successfully evaluated through its integration into a rule-based classification framework and applied to three different biological data sets: Influenza, Polyoma, and Rhino virus sequences. We discover a large number of small subsequences that can be used to identify each virus type with high accuracy and low computational time, and moreover help to characterize different genomic regions. Bounding their length to 20, our method found 1164 characterizing subsequences for all the Influenza virus subtypes, 194 for all the Polyoma viruses, and 11 for Rhino viruses. The abundance of small separating subsequences extracted for each genomic region may be an important support for quick and robust virus identification. Finally, useful biological information can be derived by the relative location and abundance of such subsequences along the different regions.
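The idea of a "species-specific subsequence" in the abstract can be illustrated with a minimal sketch. This is not the authors' genetic algorithm: it swaps in a brute-force enumeration of substrings of one fixed length k, which is only practical for toy inputs. It assumes a subsequence characterizes a class when it occurs in every sequence of that class and in no sequence of any other class, and that each class has at least one sequence. The function names (`kmers`, `discriminating_kmers`, `classify`) and the toy sequences are hypothetical.

```python
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discriminating_kmers(classes: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """For each class, keep the k-mers found in every sequence of that class
    and in no sequence of any other class (assumed definition, see lead-in)."""
    shared: Dict[str, Set[str]] = {}  # k-mers common to all sequences of a class
    seen: Dict[str, Set[str]] = {}    # k-mers occurring anywhere in a class
    for label, seqs in classes.items():
        sets = [kmers(s, k) for s in seqs]
        shared[label] = set.intersection(*sets)
        seen[label] = set.union(*sets)

    result: Dict[str, Set[str]] = {}
    for label in classes:
        others: Set[str] = set()
        for other, ks in seen.items():
            if other != label:
                others |= ks
        result[label] = shared[label] - others
    return result


def classify(seq: str, signatures: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Toy rule-based classifier: pick the class whose signature k-mers
    occur most often in seq, or None when no signature k-mer is present."""
    found = kmers(seq, k)
    hits = {label: len(found & sig) for label, sig in signatures.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


# Hypothetical toy data, not taken from the paper's data sets.
toy = {
    "A": ["ACGTACGTTT", "GGACGTACGA"],
    "B": ["TTTTGGGCCA", "CATTGGGCCT"],
}
signatures = discriminating_kmers(toy, k=4)
label = classify("AAACGTACCC", signatures, k=4)
```

On real genomes the number of candidate subsequences grows quickly with the length bound, which is one reason a search heuristic such as the genetic algorithm described above may be preferable to exhaustive enumeration.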

Mendeley readers

The data shown below were compiled from readership statistics for 18 Mendeley readers of this research output.

Geographical breakdown

Country | Count | As %
Germany | 1 | 6%
Unknown | 17 | 94%

Demographic breakdown

Readers by professional status | Count | As %
Student > Ph. D. Student | 5 | 28%
Researcher | 3 | 17%
Student > Doctoral Student | 2 | 11%
Student > Master | 2 | 11%
Professor | 1 | 6%
Other | 1 | 6%
Unknown | 4 | 22%

Readers by discipline | Count | As %
Agricultural and Biological Sciences | 4 | 22%
Computer Science | 3 | 17%
Medicine and Dentistry | 2 | 11%
Biochemistry, Genetics and Molecular Biology | 1 | 6%
Immunology and Microbiology | 1 | 6%
Other | 1 | 6%
Unknown | 6 | 33%