Semi-supervised Learning for the BioNLP Gene Regulation Network

Overview of attention for article published in BMC Bioinformatics, June 2015

Mentioned by
1 X user

Citations
4 Dimensions

Readers on
31 Mendeley
Title
Semi-supervised Learning for the BioNLP Gene Regulation Network
Published in
BMC Bioinformatics, June 2015
DOI 10.1186/1471-2105-16-s10-s4
Pubmed ID
Authors

Thomas Provoost, Marie-Francine Moens

Abstract

The BioNLP Gene Regulation Task has attracted a diverse collection of submissions showcasing state-of-the-art systems. However, a principal challenge remains in obtaining a significant amount of recall. We argue that this is an important quality for Information Extraction tasks in this field. We propose a semi-supervised framework, leveraging a large corpus of unannotated data available to us. In this framework, the annotated data is used to find plausible candidates for positive data points, which are included in the machine learning process. As this is a method principally designed for gaining recall, we further explore additional methods to improve precision on top of this. These are: weighted regularisation in the SVM framework, and filtering out unlabelled examples based on a probabilistic rule-finding method. The latter method also allows us to add candidates for negatives from unlabelled data, a method not viable in the unfiltered approach. We replicate one of the original participant systems, and modify it to incorporate our methods. This allows us to test the extent of our proposed methods by applying them to the GRN task data. We find a considerable improvement in recall compared to the baseline system. We also investigate the evaluation metrics and find several mechanisms explaining a bias towards precision. Furthermore, these findings uncover an intricate precision-recall interaction, depriving recall of its habitual immediacy seen in traditional machine learning set-ups. Our contributions are twofold.
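The abstract's idea of weighted regularisation can be illustrated with a minimal sketch: candidate positives drawn from unlabelled data are added to the training set, but each one carries a smaller misclassification cost than a gold annotation, so a wrong candidate hurts the model less. This is not the authors' implementation. The toy features, the cost of 0.3 for candidates, the learning rate, and the function names `train_weighted_svm` and `predict` are all assumptions made for illustration, and the trainer is a plain stochastic subgradient method rather than whatever SVM solver the paper used.

```python
import random


def train_weighted_svm(X, y, cost, lam=0.01, eta=0.05, epochs=100, seed=0):
    """Linear SVM trained by stochastic subgradient descent.

    Each step follows the subgradient of
        lam/2 * ||w||^2 + cost[i] * max(0, 1 - y[i] * (w . x_i + b)),
    so cost[i] scales how strongly example i is enforced.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 regularisation shrinks the weights on every step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if y[i] * score < 1.0:
                # Hinge loss is active: step towards the example,
                # scaled by its per-example cost.
                w = [wj + eta * cost[i] * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * cost[i] * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data (hypothetical): four gold-annotated examples ...
gold_X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
gold_y = [1, 1, -1, -1]
# ... plus one candidate positive taken from unlabelled data.
cand_X = [[1.5, 2.5]]

X = gold_X + cand_X
y = gold_y + [1] * len(cand_X)
# Gold examples get full cost; candidates get a reduced, assumed cost.
cost = [1.0] * len(gold_X) + [0.3] * len(cand_X)

w, b = train_weighted_svm(X, y, cost)
predictions = [predict(w, b, x) for x in gold_X]
print(predictions)
```

Lowering the candidate cost towards 0 recovers the purely supervised model, while raising it towards 1 treats candidates like gold annotations, which is one way to picture the precision-recall trade-off the abstract discusses.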

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 31 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Unknown 31 100%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 6 19%
Student > Master 4 13%
Student > Bachelor 4 13%
Researcher 3 10%
Lecturer 3 10%
Other 6 19%
Unknown 5 16%

Readers by discipline Count As %
Computer Science 8 26%
Medicine and Dentistry 4 13%
Agricultural and Biological Sciences 2 6%
Engineering 2 6%
Social Sciences 2 6%
Other 4 13%
Unknown 9 29%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 24 July 2015.
All research outputs: #20,283,046 of 22,817,213 outputs
Outputs from BMC Bioinformatics: #6,855 of 7,284 outputs
Outputs of similar age: #219,962 of 263,947 outputs
Outputs of similar age from BMC Bioinformatics: #103 of 109 outputs
Altmetric has tracked 22,817,213 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 7,284 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 263,947 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 109 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.