MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

Overview of attention for article published in BioData Mining, December 2016

Citations	14 (Dimensions)
Readers	18 (Mendeley)

Title	MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification
Published in	BioData Mining, December 2016
DOI	10.1186/s13040-016-0116-2
Pubmed ID	27980679
Authors	Giulia Fiscon, Emanuel Weitschek, Eleonora Cella, Alessandra Lo Presti, Marta Giovanetti, Muhammed Babakir-Mina, Marco Ciotti, Massimo Ciccozzi, Alessandra Pierangeli, Paola Bertolazzi, Giovanni Felici
Abstract	Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods. We propose a supervised method based on a genetic algorithm to identify small genomic subsequences that discriminate among different species. The method identifies multiple subsequences of bounded length with the same information power in a given genomic region. The algorithm has been successfully evaluated through its integration into a rule-based classification framework and applied to three different biological data sets: Influenza, Polyoma, and Rhino virus sequences. We discover a large number of small subsequences that can be used to identify each virus type with high accuracy and low computational time, and moreover help to characterize different genomic regions. Bounding their length to 20, our method found 1164 characterizing subsequences for all the Influenza virus subtypes, 194 for all the Polyoma viruses, and 11 for Rhino viruses. The abundance of small separating subsequences extracted for each genomic region may be an important support for quick and robust virus identification. Finally, useful biological information can be derived by the relative location and abundance of such subsequences along the different regions.
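
As a rough illustration of what the abstract means by subsequences that "discriminate among different species", the Python sketch below enumerates fixed-length subsequences on toy data and uses them in a simple rule-based classifier. It is not MISSEL: the abstract describes a genetic algorithm over subsequences of bounded length, whereas this sketch substitutes brute-force enumeration at a single length `k`. The function names (`kmers`, `specific_kmers`, `classify`) and the toy sequences are hypothetical and do not come from the paper.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(sequences_by_class, k):
    """For each class, keep the k-mers that occur in every sequence of
    that class and in no sequence of any other class."""
    shared = {}  # k-mers common to all sequences of a class
    seen = {}    # k-mers found in at least one sequence of a class
    for label, seqs in sequences_by_class.items():
        sets = [kmers(s, k) for s in seqs]
        shared[label] = set.intersection(*sets)
        seen[label] = set.union(*sets)

    result = {}
    for label in sequences_by_class:
        # Everything observed in the other classes is disqualified.
        others = set().union(*(seen[o] for o in seen if o != label))
        result[label] = shared[label] - others
    return result


def classify(seq, signatures, k):
    """Rule-based assignment: pick the class whose signature k-mers
    appear most often in seq, or None if no signature k-mer matches."""
    found = kmers(seq, k)
    hits = {label: len(sig & found) for label, sig in signatures.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


# Toy data (made up for illustration, not real viral sequences).
toy = {
    "A": ["ACGTACGT", "TTACGTAA"],
    "B": ["GGGCCCAA", "CCGGGCCC"],
}
signatures = specific_kmers(toy, k=4)
```

Exhaustive enumeration like this only scales to small inputs; the abstract's motivation for a heuristic search is precisely the size of real sequence collections.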

Mendeley readers

The data shown below were compiled from readership statistics for the 18 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Germany	1	6%
Unknown	17	94%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	5	28%
Researcher	3	17%
Student > Doctoral Student	2	11%
Student > Master	2	11%
Professor	1	6%
Other	1	6%
Unknown	4	22%

Readers by discipline	Count	As %
Agricultural and Biological Sciences	4	22%
Computer Science	3	17%
Medicine and Dentistry	2	11%
Biochemistry, Genetics and Molecular Biology	1	6%
Immunology and Microbiology	1	6%
Other	1	6%
Unknown	6	33%