Altmetric – Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature


Title	Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature
Published in	BMC Bioinformatics, June 2015
DOI	10.1186/s12859-015-0609-x
Pubmed ID	26047637
Authors	Komandur Elayavilli Ravikumar, Kavishwar B. Wagholikar, Dingcheng Li, Jean-Pierre Kocher, Hongfang Liu
Abstract	Advances in next-generation sequencing technology have accelerated the pace of individualized medicine (IM), which aims to incorporate genetic/genomic information into medicine. One immediate need in interpreting sequencing data is the assembly of information about genetic variants and their corresponding associations with other entities (e.g., diseases or medications). Even with dedicated effort to capture such information in biological databases, much of it remains 'locked' in the unstructured text of biomedical publications. There is a substantial lag between publication and the subsequent abstraction of such information into databases. Multiple text mining systems have been developed, but most of them focus on sentence-level association extraction, with performance evaluated against gold standard text annotations prepared specifically for text mining systems. We developed and evaluated a text mining system, MutD, which extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse-level analysis, using a benchmark data set extracted from curated database records. MutD achieves an F-measure of 64.3% for reconstructing protein mutation-disease associations in curated database records. The discourse-level analysis component of MutD contributed a gain of more than 10% in F-measure compared with sentence-level association extraction. Our error analysis indicates that 23 of the 64 precision errors are true associations that were not captured by database curators and 68 of the 113 recall errors are caused by the absence of associated disease entities in the abstract. After adjusting for the defects in the curated database, the revised F-measure of MutD in association detection reaches 81.5%. Our quantitative analysis reveals that MutD can effectively extract protein mutation-disease associations when benchmarked against curated database records. The analysis also demonstrates that incorporating discourse-level analysis significantly improved the performance of extracting protein mutation-disease associations. Future work includes the extension of MutD to full-text articles.
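The F-measure figures quoted in the abstract combine precision and recall into a single score. The sketch below shows that calculation under the assumption that the balanced F1 variant is meant (the abstract does not say which weighting was used). The function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # No correct extractions: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # share of extracted associations that are correct
    recall = tp / (tp + fn)     # share of curated associations that were extracted
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not the paper's data).
print(f"{f_measure(tp=80, fp=20, fn=20):.3f}")
```

Under this definition, reclassifying a precision error as a true association raises the score, which is the direction of the adjustment the abstract describes.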


X Demographics

The data shown below were collected from the profiles of 5 X users who shared this research output.
As of 1 July 2024, you may notice a temporary increase in the numbers of X profiles with Unknown location.

Geographical breakdown

Country	Count	As %
Israel	1	20%
Switzerland	1	20%
Unknown	3	60%

Demographic breakdown

Type	Count	As %
Scientists	3	60%
Members of the public	1	20%
Practitioners (doctors, other healthcare professionals)	1	20%

Mendeley readers

The data shown below were compiled from readership statistics for 104 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
United States	2	2%
Spain	1	<1%
Japan	1	<1%
Unknown	100	96%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	26	25%
Student > Ph. D. Student	20	19%
Student > Master	7	7%
Student > Bachelor	7	7%
Student > Doctoral Student	5	5%
Other	21	20%
Unknown	18	17%

Readers by discipline	Count	As %
Computer Science	35	34%
Agricultural and Biological Sciences	14	13%
Biochemistry, Genetics and Molecular Biology	9	9%
Medicine and Dentistry	6	6%
Pharmacology, Toxicology and Pharmaceutical Science	4	4%
Other	12	12%
Unknown	24	23%
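The "As %" column in the Mendeley tables above appears consistent with dividing each count by the 104 readers and rounding to the nearest whole percent, with shares below one percent shown as "<1%". That rule is inferred from the figures on this page and is not a documented Altmetric method. The helper name `share_label` is hypothetical.

```python
def share_label(count: int, total: int) -> str:
    """Format a count as a whole-percent share of total (assumed display rule)."""
    pct = 100 * count / total
    if 0 < pct < 1:
        # Assumed convention: tiny non-zero shares are shown as "<1%".
        return "<1%"
    return f"{round(pct)}%"


total_readers = 104  # stated in the Mendeley readers section

# Counts copied from the "Readers by discipline" table.
disciplines = {
    "Computer Science": 35,
    "Agricultural and Biological Sciences": 14,
    "Biochemistry, Genetics and Molecular Biology": 9,
    "Medicine and Dentistry": 6,
    "Pharmacology, Toxicology and Pharmaceutical Science": 4,
    "Other": 12,
    "Unknown": 24,
}

for name, count in disciplines.items():
    print(f"{name}\t{count}\t{share_label(count, total_readers)}")
```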

Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 20 October 2015.

Context	Rank	Total outputs
All research outputs	#14,641,036	25,540,105
Outputs from BMC Bioinformatics	#4,015	7,717
Outputs of similar age	#129,204	280,931
Outputs of similar age from BMC Bioinformatics	#70	118

Altmetric has tracked 25,540,105 research outputs across all sources so far. This one is in the 42nd percentile – i.e., 42% of other outputs scored the same or lower than it.

So far Altmetric has tracked 7,717 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one is in the 47th percentile – i.e., 47% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 280,931 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 53% of its contemporaries.

We're also able to compare this research output to 118 others from the same source and published within six weeks on either side of this one. This one is in the 41st percentile – i.e., 41% of its contemporaries scored the same or lower than it.
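The percentile statements above can be roughly cross-checked against the rank and total figures reported on this page. The sketch below assumes a percentile is approximately the share of outputs ranked below this one. That is a simplification and not Altmetric's documented method: outputs with tied scores are handled differently, so the estimates can differ from the stated percentiles by about a point. The function name is hypothetical.

```python
def approx_percentile(rank: int, total: int) -> float:
    """Approximate percentile: share of outputs ranked below the given position."""
    return 100 * (total - rank) / total


# (rank, total, percentile stated on the page) for each comparison context.
contexts = {
    "All research outputs": (14_641_036, 25_540_105, 42),
    "Outputs from BMC Bioinformatics": (4_015, 7_717, 47),
    "Outputs of similar age": (129_204, 280_931, 53),
    "Outputs of similar age from BMC Bioinformatics": (70, 118, 41),
}

for name, (rank, total, stated) in contexts.items():
    estimate = approx_percentile(rank, total)
    print(f"{name}: estimated {estimate:.1f}, stated {stated}")
```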
