
Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research

Overview of attention for article published in BMC Bioinformatics, February 2015

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (96th percentile)
  • High Attention Score compared to outputs of the same age and source (99th percentile)

Mentioned by

  • 1 blog
  • 53 X users
  • 2 Weibo users
  • 2 Facebook pages
  • 3 Wikipedia pages
  • 1 Google+ user

Citations

  • 175 Dimensions

Readers on

  • 277 Mendeley
  • 2 CiteULike
Title
Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research
Published in
BMC Bioinformatics, February 2015
DOI 10.1186/s12859-015-0472-9
Pubmed ID
Authors

Àlex Bravo, Janet Piñero, Núria Queralt-Rosinach, Michael Rautschka, Laura I Furlong

Abstract

Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identifying actionable knowledge in free-text repositories. We present the BeFree system, aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information in the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with depression, a major cause of morbidity worldwide, that are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights into the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternatives to manual curation in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications.
BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, of which only a small proportion (2%) is actually recorded in curated resources, raising several issues regarding data prioritization and curation. We propose joint analysis of text-mined data and expert-curated data as a suitable approach both to assess data quality and to highlight novel and interesting information.
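At its simplest, the joint analysis described in the abstract compares two sets of (gene, disease) pairs: those mined from text and those already in a curated resource. The Python sketch below is a generic set comparison under stated assumptions, not BeFree's pipeline or the authors' prioritization strategy. It assumes both sources have already been mapped to shared identifiers; `curated_overlap` is a hypothetical helper, and `GENE_A`, `DISEASE_X` and the like are placeholders, not real associations.

```python
"""Hypothetical sketch: compare text-mined gene-disease pairs with a curated set.

Assumption: both inputs already use shared gene and disease identifiers.
The pairs in the example are placeholders, not real associations or BeFree output.
"""

from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (gene identifier, disease identifier)


def normalise(pairs: Iterable[Pair]) -> Set[Pair]:
    """Deduplicate pairs and upper-case identifiers so case differences do not hide matches."""
    return {(g.strip().upper(), d.strip().upper()) for g, d in pairs}


def curated_overlap(mined: Iterable[Pair], curated: Iterable[Pair]) -> Tuple[float, Set[Pair]]:
    """Return the fraction of distinct mined pairs found in the curated set,
    plus the mined pairs that are absent from it (candidates for review)."""
    mined_set = normalise(mined)
    curated_set = normalise(curated)
    if not mined_set:
        return 0.0, set()
    shared = mined_set & curated_set
    novel = mined_set - curated_set
    return len(shared) / len(mined_set), novel


if __name__ == "__main__":
    mined = [("GENE_A", "DISEASE_X"), ("gene_a", "disease_x"), ("GENE_B", "DISEASE_Y"),
             ("GENE_C", "DISEASE_X"), ("GENE_D", "DISEASE_Z")]
    curated = [("GENE_A", "DISEASE_X"), ("GENE_E", "DISEASE_Y")]
    fraction, novel = curated_overlap(mined, curated)
    print(f"{fraction:.0%} of distinct mined pairs are curated")
    print(sorted(novel))
```

Pairs left in `novel` are the ones a curation team would still have to review; deciding which of them to look at first is the prioritization problem the abstract raises.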

X Demographics

The X demographics data were collected from the profiles of the 53 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 277 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Spain 4 1%
Portugal 2 <1%
Brazil 2 <1%
United States 2 <1%
France 1 <1%
Mexico 1 <1%
New Caledonia 1 <1%
Netherlands 1 <1%
Croatia 1 <1%
Other 0 0%
Unknown 262 95%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 56 20%
Researcher 50 18%
Student > Master 40 14%
Student > Bachelor 19 7%
Student > Doctoral Student 14 5%
Other 43 16%
Unknown 55 20%

Readers by discipline Count As %
Computer Science 91 33%
Agricultural and Biological Sciences 29 10%
Biochemistry, Genetics and Molecular Biology 27 10%
Medicine and Dentistry 15 5%
Engineering 10 4%
Other 42 15%
Unknown 63 23%
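Each "As %" value in the tables above is a row's count divided by the 277 Mendeley readers. The Python sketch below reproduces that column, assuming round-to-nearest with non-zero shares under 1% printed as "<1%"; the page does not state Altmetric's exact rounding rule. `as_percent` and `TOTAL_READERS` are hypothetical names, and the professional-status counts are copied from the table.

```python
"""Sketch: reproduce the "As %" column of the Mendeley reader tables.

Assumption: share = count / total readers, rounded to the nearest whole percent,
with non-zero shares below 1% shown as "<1%". Altmetric's exact rule is not stated.
"""

TOTAL_READERS = 277  # from "277 Mendeley readers" on this page


def as_percent(count: int, total: int = TOTAL_READERS) -> str:
    """Format count/total as the page does, e.g. 4 of 277 -> '1%', 2 of 277 -> '<1%'."""
    share = 100 * count / total
    if 0 < share < 1:
        return "<1%"
    return f"{round(share)}%"


if __name__ == "__main__":
    by_status = {
        "Student > Ph. D. Student": 56,
        "Researcher": 50,
        "Student > Master": 40,
        "Student > Bachelor": 19,
        "Student > Doctoral Student": 14,
        "Other": 43,
        "Unknown": 55,
    }
    # The rows should account for every reader, including "Unknown".
    assert sum(by_status.values()) == TOTAL_READERS
    for status, count in by_status.items():
        print(f"{status:<28} {count:>3} {as_percent(count)}")
```

The same function applied to the geographical and discipline counts gives the values listed in those tables as well, for example 262 unknown countries as 95%.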
Attention Score in Context

This research output has an Altmetric Attention Score of 49. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 13 November 2016.
  • All research outputs: #886,096 of 25,880,948 outputs
  • Outputs from BMC Bioinformatics: #55 of 7,761 outputs
  • Outputs of similar age: #10,684 of 270,173 outputs
  • Outputs of similar age from BMC Bioinformatics: #1 of 137 outputs
Altmetric has tracked 25,880,948 research outputs across all sources so far. Compared with these, this one has done particularly well and is in the 96th percentile: it is in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,761 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.6. This one has done particularly well, scoring higher than 99% of its peers.
Older research outputs tend to score higher simply because they have had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 270,173 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 96% of its contemporaries.
We're also able to compare this research output to 137 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 99% of its contemporaries.
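The ranks listed in this section can be related to the quoted percentiles with simple arithmetic. The Python sketch below assumes percentile = floor(100 × (1 − rank / total)); Altmetric does not give its formula on this page, so treat it as an approximation that happens to reproduce the four reported values (96, 99, 96 and 99). `percentile` and `RANKS` are hypothetical names. The same ratio explains "top 5%": 886,096 / 25,880,948 is about 3.4%.

```python
"""Sketch: turn an Altmetric rank ("#886,096 of 25,880,948") into a percentile.

Assumption: percentile = floor(100 * (1 - rank / total)). This is not Altmetric's
documented formula; it is only checked against the four ranks on this page.
"""

import math


def percentile(rank: int, total: int) -> int:
    """Whole-number percentile of an output ranked `rank` (1 = highest) out of `total`."""
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return math.floor(100 * (1 - rank / total))


# (rank, total) pairs copied from the rankings on this page.
RANKS = {
    "All research outputs": (886_096, 25_880_948),
    "Outputs from BMC Bioinformatics": (55, 7_761),
    "Outputs of similar age": (10_684, 270_173),
    "Outputs of similar age from BMC Bioinformatics": (1, 137),
}

if __name__ == "__main__":
    for context, (rank, total) in RANKS.items():
        print(f"{context}: {percentile(rank, total)}th percentile")
```

Flooring rather than rounding is the conservative choice here: it never claims a higher percentile than the rank supports.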