Report for: Improving chemical disease relation extraction with rich features and weakly labeled data


Title	Improving chemical disease relation extraction with rich features and weakly labeled data
Published in	Journal of Cheminformatics, October 2016
DOI	10.1186/s13321-016-0165-z
Pubmed ID	28316651
Authors	Yifan Peng, Chih-Hsuan Wei, Zhiyong Lu
Abstract	Due to the importance of identifying relations between chemicals and diseases for new drug discovery and improving chemical safety, there has been a growing interest in developing automatic relation extraction systems for capturing these relations from the rich and rapid-growing biomedical literature. In this work we aim to build on current advances in named entity recognition and a recent BioCreative effort to further improve the state of the art in biomedical relation extraction, in particular for the chemical-induced disease (CID) relations. We propose a rich-feature approach with Support Vector Machine to aid in the extraction of CIDs from PubMed articles. Our feature vector includes novel statistical features, linguistic knowledge, and domain resources. We also incorporate the output of a rule-based system as features, thus combining the advantages of rule- and machine learning-based systems. Furthermore, we augment our approach with automatically generated labeled text from an existing knowledge base to improve performance without additional cost for corpus construction. To evaluate our system, we perform experiments on the human-annotated BioCreative V benchmarking dataset and compare with previous results. When trained using only BioCreative V training and development sets, our system achieves an F-score of 57.51 %, which already compares favorably to previous methods. Our system performance was further improved to 61.01 % in F-score when augmented with additional automatically generated weakly labeled data. Our text-mining approach demonstrates state-of-the-art performance in disease-chemical relation extraction. More importantly, this work exemplifies the use of (freely available) curated document-level annotations in existing biomedical databases, which are largely overlooked in text-mining system development.


X Demographics

The data shown below were collected from the profiles of 5 X users who shared this research output.

Geographical breakdown

Country	Count	As %
Spain	1	20%
Switzerland	1	20%
United States	1	20%
Unknown	2	40%

Demographic breakdown

Type	Count	As %
Members of the public	3	60%
Scientists	1	20%
Science communicators (journalists, bloggers, editors)	1	20%

Mendeley readers

The data shown below were compiled from readership statistics for 69 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	69	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	23	33%
Student > Master	11	16%
Researcher	6	9%
Lecturer	4	6%
Student > Doctoral Student	4	6%
Other	4	6%
Unknown	17	25%

Readers by discipline	Count	As %
Computer Science	28	41%
Agricultural and Biological Sciences	5	7%
Engineering	3	4%
Medicine and Dentistry	2	3%
Biochemistry, Genetics and Molecular Biology	1	1%
Other	8	12%
Unknown	22	32%

Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 09 December 2016.

All research outputs: #13,822,239 of 24,143,470 outputs
Outputs from Journal of Cheminformatics: #667 of 891 outputs
Outputs of similar age: #165,170 of 325,012 outputs
Outputs of similar age from Journal of Cheminformatics: #18 of 24 outputs

Altmetric has tracked 24,143,470 research outputs across all sources so far. This one is in the 42nd percentile – i.e., 42% of other outputs scored the same or lower than it.

So far Altmetric has tracked 891 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.7. This one is in the 23rd percentile – i.e., 23% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 325,012 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 48th percentile – i.e., 48% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 24 others from the same source and published within six weeks on either side of this one. This one is in the 25th percentile – i.e., 25% of its contemporaries scored the same or lower than it.
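The percentile figures above are described as the share of other outputs that "scored the same or lower". A minimal sketch of that definition follows. It assumes a plain list of peer scores; the function name and the sample scores are hypothetical and are not Altmetric's data or implementation. How Altmetric handles ties and rounding is not stated in this report.

```python
def percentile_same_or_lower(score, other_scores):
    """Percent of other outputs whose score is the same as or lower than `score`.

    Follows the wording used in the report; this is an illustrative
    assumption, not Altmetric's actual calculation.
    """
    if not other_scores:
        raise ValueError("need at least one other output to compare against")
    same_or_lower = sum(1 for s in other_scores if s <= score)
    return 100.0 * same_or_lower / len(other_scores)


# Hypothetical peer scores, for illustration only.
peers = [1, 1, 2, 3, 5, 8, 11, 40]
result = percentile_same_or_lower(3, peers)
print(result)
```

With these made-up scores, four of the eight peers score 3 or lower, so the function returns 50.0.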
