
matK-QR classifier: a patterns based approach for plant species identification

Overview of attention for article published in BioData Mining, December 2016

Citations

12 Dimensions

Readers on Mendeley

41
Title
matK-QR classifier: a patterns based approach for plant species identification
Published in
BioData Mining, December 2016
DOI 10.1186/s13040-016-0120-6
Pubmed ID
Authors

Ravi Prabhakar More, Rupali Chandrashekhar Mane, Hemant J. Purohit

Abstract

DNA barcoding is a widely used and highly efficient approach for the rapid and accurate identification of plant species based on a short, standardized segment of the genome. The nucleotide sequences of the maturase K (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci are commonly used for plant species identification. Here, we present a new and highly efficient approach that identifies a unique set of discriminating nucleotide patterns and uses them to generate a signature (i.e., a regular expression) for plant species identification. To generate these molecular signatures, we used the matK and rbcL datasets reported by the CBOL Plant Working Group, which cover 125 plant species in 52 genera. First, we performed a Multiple Sequence Alignment (MSA) of all species and then built a Position Specific Scoring Matrix (PSSM) for each locus to estimate the percentage of discrimination among species. Next, we used the PSSM to detect Discriminating Patterns (DPs) at the genus and species levels in the matK dataset. By combining the DPs with the distances between consecutive patterns, we generated a molecular signature for each species. Finally, we compared these signatures with existing methods, namely BLASTn, Support Vector Machines (SVM), JRip (RIPPER), J48 (C4.5 algorithm), and Naïve Bayes (NB), on an NCBI GenBank matK dataset. Because matK achieved higher discrimination success than rbcL, we selected the matK gene for signature generation. We generated signatures for 60 species based on the discriminating patterns identified at the genus and species levels.
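The PSSM step can be illustrated with a short sketch. The abstract does not give the authors' scoring scheme, so the pseudocount, the uniform background frequency of 0.25, and the log2-odds form used here are assumptions. The rule for flagging a discriminating position (a base shared by every sequence of one genus or species and absent from all other groups at that column) is likewise a simplified, hypothetical stand-in for the authors' DP criterion, and the function names are illustrative only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(aligned, pseudocount=1.0, background=0.25):
    """Build a log2-odds position-specific scoring matrix from aligned DNA.

    Returns one {base: score} dict per alignment column. Gaps ('-') and other
    non-ACGT symbols are skipped when counting. The pseudocount and the uniform
    background frequency are assumptions, not values taken from the paper.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pssm


def discriminating_positions(groups):
    """Find alignment columns that separate one group from all others.

    groups maps a label (genus or species) to its aligned sequences. A column
    counts as discriminating for a label when all of that label's sequences
    share one base that no other label uses there (a simplified, hypothetical
    criterion). Returns {label: [(column, base), ...]}.
    """
    length = len(next(iter(groups.values()))[0])
    found = {label: [] for label in groups}
    for col in range(length):
        bases = {label: {seq[col] for seq in seqs} for label, seqs in groups.items()}
        for label, own in bases.items():
            if len(own) != 1:
                continue  # the group itself is variable at this column
            base = next(iter(own))
            if base not in ALPHABET:
                continue  # ignore gap-only columns
            others = set().union(*(b for other, b in bases.items() if other != label))
            if base not in others:
                found[label].append((col, base))
    return found


if __name__ == "__main__":
    # Toy alignment with three made-up species.
    groups = {"sp1": ["ACGT", "ACGT"], "sp2": ["ACTT", "ACTT"], "sp3": ["GCTA"]}
    print(discriminating_positions(groups))
    # {'sp1': [(2, 'G')], 'sp2': [], 'sp3': [(0, 'G'), (3, 'A')]}
```

In this toy example, sp2 has no position of its own, which mirrors the situation where a species cannot be separated from its neighbours at single columns and needs a longer pattern or a different locus.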
Our comparative assessment suggests that 46 of the 60 species could be correctly identified using the generated signatures, ahead of BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species), and RIPPER (3 species). As a final outcome of this study, we converted the signatures into QR codes and developed software, matK-QR Classifier (http://www.neeri.res.in/matk_classifier/index.htm), which searches for signatures in query matK gene sequences and predicts the corresponding plant species. This novel approach of employing pattern-based signatures opens new avenues for species classification. We believe that, alongside existing methods, matK-QR Classifier will be a valuable tool for molecular taxonomists, enabling precise identification of plant species.
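A signature that combines discriminating patterns with the distances between them maps naturally onto a regular expression, and classifying a query then amounts to searching for each signature. The sketch below assumes that each distance is the exact number of bases separating the end of one pattern from the start of the next, and that a query is assigned to every species whose signature occurs in it; the actual matK-QR Classifier may encode distances and resolve multiple hits differently, and all helper names and example species are hypothetical. The last function simply converts the correct-identification counts reported in the abstract into percentages of the 60 species.

```python
import re

# Correct identifications out of 60 species, as reported in the abstract.
REPORTED_CORRECT = {
    "Signatures": 46, "BLASTn": 34, "SVM": 18,
    "C4.5": 7, "NB": 4, "RIPPER": 3,
}


def build_signature(patterns):
    """Join (motif, gap) pairs into one regular expression.

    Assumption: gap is the exact number of bases between the end of a motif
    and the start of the next one; the gap of the last pair is ignored.
    """
    parts = []
    for i, (motif, gap) in enumerate(patterns):
        parts.append(re.escape(motif.upper()))
        if i < len(patterns) - 1:
            parts.append(".{%d}" % gap)
    return "".join(parts)


def classify(query, signatures):
    """Return, sorted, every species whose signature occurs anywhere in the query."""
    query = query.upper()
    return sorted(sp for sp, sig in signatures.items() if re.search(sig, query))


def success_rates(correct, total=60):
    """Convert correct-identification counts into percentages of `total` species."""
    return {method: round(100 * n / total, 1) for method, n in correct.items()}


if __name__ == "__main__":
    # Hypothetical signatures for two made-up species.
    sigs = {
        "Species A": build_signature([("ACG", 2), ("TTA", 0)]),  # ACG.{2}TTA
        "Species B": build_signature([("GGG", 1), ("CCC", 0)]),  # GGG.{1}CCC
    }
    print(classify("tACGcaTTAg", sigs))  # ['Species A']
    print(success_rates(REPORTED_CORRECT))
```

Because the gaps are fixed, a single insertion or deletion between two patterns prevents a match; a range such as `.{m,n}` would trade some specificity for tolerance to indels. By this arithmetic, 46 of 60 corresponds to roughly 76.7% for the signatures versus about 56.7% for BLASTn.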

Mendeley readers

The data shown below were compiled from readership statistics for 41 Mendeley readers of this research output.

Geographical breakdown

Country    Count   % of readers
Germany        1   2%
Unknown       40   98%

Demographic breakdown

Readers by professional status   Count   % of readers
Student > Bachelor                  10   24%
Student > Master                     6   15%
Researcher                           5   12%
Student > Ph. D. Student             5   12%
Professor > Associate Professor      3   7%
Other                                7   17%
Unknown                              5   12%

Readers by discipline                         Count   % of readers
Agricultural and Biological Sciences             13   32%
Computer Science                                  7   17%
Biochemistry, Genetics and Molecular Biology      4   10%
Engineering                                       4   10%
Environmental Science                             1   2%
Other                                             3   7%
Unknown                                           9   22%