Detection and categorization of bacteria habitats using shallow linguistic analysis

Overview of attention for article published in BMC Bioinformatics, June 2015
Mentioned by: 1 X user
Citations: 6 Dimensions
Readers on: 35 Mendeley
Title: Detection and categorization of bacteria habitats using shallow linguistic analysis
Published in: BMC Bioinformatics, June 2015
DOI: 10.1186/1471-2105-16-s10-s5
Pubmed ID
Authors: İlknur Karadeniz, Arzucan Özgür

Abstract

Information regarding bacteria biotopes is important for several research areas, including health sciences, microbiology, and food processing and preservation. One of the challenges for scientists in these domains is the huge amount of information buried in the text of electronic resources. Developing methods to automatically extract bacteria habitat relations from the text of these electronic resources is crucial for facilitating research in these areas.

We introduce a linguistically motivated rule-based approach for recognizing and normalizing names of bacteria habitats in biomedical text by using an ontology. Our approach is based on the shallow syntactic analysis of the text, which includes sentence segmentation, part-of-speech (POS) tagging, partial parsing, and lemmatization. In addition, we propose two methods for identifying bacteria habitat localization relations. The underlying assumption for the first method is that discourse changes with a new paragraph; therefore, it operates on a paragraph basis. The second method performs a more fine-grained analysis of the text and operates on a sentence basis. We also develop a novel anaphora resolution method for bacteria coreferences and incorporate it into the sentence-based relation extraction approach.

We participated in the Bacteria Biotope (BB) Task of the BioNLP Shared Task 2013. Our system (Boun) achieved the second-best performance with 68% Slot Error Rate (SER) in Sub-task 1 (Entity Detection and Categorization), and ranked third with an F-score of 27% in Sub-task 2 (Localization Event Extraction). This paper reports the system that was implemented for the shared task, including the novel methods developed and the improvements obtained after the official evaluation. The extensions include the expansion of the OntoBiotope ontology using the training set for Sub-task 1, and the novel sentence-based relation extraction method combined with anaphora resolution for Sub-task 2. These extensions resulted in promising results for Sub-task 1 with a SER of 68%, and state-of-the-art performance for Sub-task 2 with an F-score of 53%.

Our results show that a linguistically oriented approach based on the shallow syntactic analysis of the text is as effective as machine learning approaches for the detection and ontology-based normalization of habitat entities. Furthermore, the newly developed sentence-based relation extraction system with the anaphora resolution module significantly outperforms the paragraph-based one, as well as the other systems that participated in the BB Shared Task 2013.
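
The abstract reports two metrics: Slot Error Rate (lower is better) and F-score (higher is better). The sketch below shows how each is commonly computed, assuming the standard definitions: SER as substitutions plus deletions plus insertions over the number of reference slots, and F-score as the harmonic mean of precision and recall. The function names and the example counts are hypothetical and are not taken from the paper; the shared task's official scorer may weight matches differently.

```python
def slot_error_rate(substitutions, deletions, insertions, reference_slots):
    """Standard SER: (S + D + I) / N, where N is the number of reference slots.

    Lower is better; the value can exceed 1.0 when there are many insertions.
    """
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(slot_error_rate(substitutions=20, deletions=30, insertions=18,
                          reference_slots=100))
    print(f_score(precision=0.5, recall=0.5))
```

Because the two metrics point in opposite directions, a SER of 68% and an F-score of 53% cannot be compared with each other directly.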

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 35 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    35       100%

Demographic breakdown

Readers by professional status     Count    As %
Student > Ph. D. Student           5        14%
Student > Doctoral Student         5        14%
Student > Bachelor                 5        14%
Researcher                         4        11%
Professor > Associate Professor    3        9%
Other                              6        17%
Unknown                            7        20%

Readers by discipline                           Count    As %
Computer Science                                12       34%
Biochemistry, Genetics and Molecular Biology    3        9%
Agricultural and Biological Sciences            3        9%
Social Sciences                                 2        6%
Engineering                                     2        6%
Other                                           6        17%
Unknown                                         7        20%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 July 2015.
All research outputs: #18,418,919 of 22,817,213 outputs
Outputs from BMC Bioinformatics: #6,314 of 7,284 outputs
Outputs of similar age: #189,577 of 263,947 outputs
Outputs of similar age from BMC Bioinformatics: #98 of 109 outputs
Altmetric has tracked 22,817,213 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,284 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 5th percentile – i.e., 5% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 263,947 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 16th percentile – i.e., 16% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 109 others from the same source and published within six weeks on either side of this one. This one is in the 6th percentile – i.e., 6% of its contemporaries scored the same or lower than it.