
Sieve-based relation extraction of gene regulatory networks from biological literature

Overview of attention for article published in BMC Bioinformatics, October 2015

About this Attention Score

  • Average Attention Score compared to outputs of the same age

Mentioned by

3 X users

Citations

11 Dimensions

Readers on

33 Mendeley
Published in
BMC Bioinformatics, October 2015
DOI 10.1186/1471-2105-16-s16-s1
Pubmed ID
Authors

Slavko Žitnik, Marinka Žitnik, Blaž Zupan, Marko Bajec

Abstract

Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and the results of related experiments. To make these relations available in an explicit, computer-readable format, they were at first extracted manually and stored in databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. We develop a computational approach for extracting gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network of the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which extracts a different relationship type. Following the shared task, we conducted additional analysis using different system settings, which reduced the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed from mention words and their prefixes and suffixes are the most important for extraction accuracy.
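The skip-mention transformation and the slot error rate mentioned above can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that a skip distance s produces s + 1 sequences in which adjacent items were s + 1 positions apart in the original mention sequence, so that two mentions with s other mentions between them become neighbours that a linear-chain model can link. For the slot error rate it assumes relations encoded as hypothetical (agent, target, type) triples, counts a substitution when a reference pair is predicted with the wrong type, and computes SER = (S + D + I) / N with N the number of reference relations; the official BioNLP 2013 scorer may match relations differently.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")
# Hypothetical relation encoding: (agent mention, target mention, relation type).
Relation = Tuple[str, str, str]


def skip_mention_sequences(mentions: Sequence[T], skip: int) -> List[List[T]]:
    """Split one mention sequence into skip-mention sequences.

    Assumption: items that are adjacent in each returned sequence were
    skip + 1 positions apart in the input, so skip=0 returns the input
    unchanged and skip=1 pairs every mention with the one after next.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    step = skip + 1
    # One sequence per starting offset; together they cover every mention once.
    return [list(mentions[offset::step]) for offset in range(min(step, len(mentions)))]


def slot_error_rate(predicted: Set[Relation], reference: Set[Relation]) -> float:
    """SER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference network must not be empty")
    correct = predicted & reference
    missed = reference - correct      # reference relations not predicted exactly
    spurious = predicted - correct    # predicted relations not in the reference
    # A substitution: same (agent, target) pair, wrong relation type.
    missed_pairs = Counter((a, t) for a, t, _ in missed)
    spurious_pairs = Counter((a, t) for a, t, _ in spurious)
    substitutions = sum(min(n, spurious_pairs[pair]) for pair, n in missed_pairs.items())
    deletions = len(missed) - substitutions
    insertions = len(spurious) - substitutions
    return (substitutions + deletions + insertions) / len(reference)


if __name__ == "__main__":
    print(skip_mention_sequences(["m1", "m2", "m3", "m4", "m5"], 1))
    # [['m1', 'm3', 'm5'], ['m2', 'm4']]
    reference = {("A", "B", "activation"), ("C", "D", "inhibition"), ("E", "F", "activation")}
    predicted = {("A", "B", "activation"), ("C", "D", "activation")}
    print(round(slot_error_rate(predicted, reference), 2))  # 0.67
```

In the made-up example, one relation is correct, one has the wrong type (a substitution) and one is missing (a deletion), so SER = 2/3. Because insertions also count, SER can exceed 1 for a network with many spurious predictions.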
Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Linear-chain conditional random fields, along with appropriate data transformations, can be used efficiently to extract relations. The sieve-based architecture simplifies the system, as new sieves can easily be added or removed and each sieve can use the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains.
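A sieve-based architecture of the kind described above can be sketched as an ordered list of functions that each add relations to a shared document. This is a hypothetical minimal design, not the authors' system: `Document`, `run_sieves`, `trigger_sieve` and `make_lexicon_sieve` are illustrative names, and the two toy rule sieves stand in for the paper's CRF-based and rule-based sieves. The lexicon sieve shows how a later sieve can use earlier results by skipping pairs that are already linked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical relation encoding: (agent, target, relation type).
Relation = Tuple[str, str, str]


@dataclass
class Document:
    tokens: List[str]
    relations: Set[Relation] = field(default_factory=set)


# A sieve reads the document, including relations added by earlier sieves,
# and returns the relations it wants to add.
Sieve = Callable[[Document], Set[Relation]]


def run_sieves(doc: Document, sieves: List[Sieve]) -> Set[Relation]:
    """Apply sieves in order; each later sieve sees the earlier sieves' output."""
    for sieve in sieves:
        doc.relations |= sieve(doc)
    return doc.relations


def trigger_sieve(doc: Document) -> Set[Relation]:
    """Toy rule: the pattern 'X activates Y' or 'X inhibits Y' yields a relation."""
    triggers = {"activates": "activation", "inhibits": "inhibition"}
    found = set()
    for agent, word, target in zip(doc.tokens, doc.tokens[1:], doc.tokens[2:]):
        if word in triggers:
            found.add((agent, target, triggers[word]))
    return found


def make_lexicon_sieve(lexicon: Dict[Tuple[str, str], str]) -> Sieve:
    """Toy fallback: add known pairs that occur in the text but are not yet linked."""
    def sieve(doc: Document) -> Set[Relation]:
        linked = {(a, t) for a, t, _ in doc.relations}
        present = set(doc.tokens)
        return {
            (a, t, rel)
            for (a, t), rel in lexicon.items()
            if a in present and t in present and (a, t) not in linked
        }
    return sieve


if __name__ == "__main__":
    doc = Document("A activates B ; C with D".split())
    lexicon = {("A", "B"): "binding", ("C", "D"): "binding"}
    result = run_sieves(doc, [trigger_sieve, make_lexicon_sieve(lexicon)])
    print(sorted(result))
    # [('A', 'B', 'activation'), ('C', 'D', 'binding')]
```

Adding or removing a sieve only changes the list passed to `run_sieves`, which is the kind of flexibility the abstract attributes to the architecture.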

X Demographics

The data shown below were collected from the profiles of 3 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 33 Mendeley readers of this research output.

Geographical breakdown

Country: count (% of readers)
United States: 2 (6%)
Unknown: 31 (94%)

Demographic breakdown

Readers by professional status: count (% of readers)
Researcher: 6 (18%)
Student > Master: 6 (18%)
Student > Ph. D. Student: 4 (12%)
Student > Bachelor: 3 (9%)
Professor: 2 (6%)
Other: 6 (18%)
Unknown: 6 (18%)
Readers by discipline: count (% of readers)
Computer Science: 14 (42%)
Medicine and Dentistry: 4 (12%)
Business, Management and Accounting: 1 (3%)
Nursing and Health Professions: 1 (3%)
Biochemistry, Genetics and Molecular Biology: 1 (3%)
Other: 4 (12%)
Unknown: 8 (24%)
Attention Score in Context

This research output has an Altmetric Attention Score of 2. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 10 November 2015.
All research outputs: #14,827,682 of 22,831,537 outputs
Outputs from BMC Bioinformatics: #5,046 of 7,288 outputs
Outputs of similar age: #157,512 of 284,596 outputs
Outputs of similar age from BMC Bioinformatics: #102 of 157 outputs
Altmetric has tracked 22,831,537 research outputs across all sources so far. This one is in the 32nd percentile – i.e., 32% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,288 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 26th percentile – i.e., 26% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 284,596 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 41st percentile – i.e., 41% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 157 others from the same source and published within six weeks on either side of this one. This one is in the 29th percentile – i.e., 29% of its contemporaries scored the same or lower than it.
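The percentile definition used in the four comparisons above (the share of compared outputs that scored the same as or lower than this one) can be written as a small Python function. This is a sketch of the stated definition only; `attention_percentile` is a hypothetical name, the peer scores in the example are made up, and Altmetric's exact handling of ties and rounding is not described on this page.

```python
from typing import Sequence


def attention_percentile(score: float, other_scores: Sequence[float]) -> float:
    """Percentage of compared outputs whose score is the same as or lower than `score`."""
    if not other_scores:
        raise ValueError("need at least one output to compare against")
    at_or_below = sum(1 for s in other_scores if s <= score)
    return 100.0 * at_or_below / len(other_scores)


if __name__ == "__main__":
    peers = [0, 1, 2, 3, 5, 8, 2, 13, 1, 40]  # made-up scores for illustration
    print(attention_percentile(2, peers))  # 50.0
```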