The CHEMDNER corpus of chemicals and drugs and its annotation principles

Overview of attention for article published in Journal of Cheminformatics, January 2015

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • Good Attention Score compared to outputs of the same age (78th percentile)
  • High Attention Score compared to outputs of the same age and source (88th percentile)

Mentioned by

9 X users

Citations

191 Dimensions

Readers on

211 Mendeley
1 CiteULike
Title
The CHEMDNER corpus of chemicals and drugs and its annotation principles
Published in
Journal of Cheminformatics, January 2015
DOI 10.1186/1758-2946-7-s1-s2
Authors

Martin Krallinger, Obdulia Rabal, Florian Leitner, Miguel Vazquez, David Salgado, Zhiyong Lu, Robert Leaman, Yanan Lu, Donghong Ji, Daniel M Lowe, Roger A Sayle, Riza Theresa Batista-Navarro, Rafal Rak, Torsten Huber, Tim Rocktäschel, Sérgio Matos, David Campos, Buzhou Tang, Hua Xu, Tsendsuren Munkhdalai, Keun Ho Ryu, SV Ramanan, Senthil Nathan, Slavko Žitnik, Marko Bajec, Lutz Weber, Matthias Irmer, Saber A Akhondi, Jan A Kors, Shuo Xu, Xin An, Utpal Kumar Sikdar, Asif Ekbal, Masaharu Yoshioka, Thaer M Dieb, Miji Choi, Karin Verspoor, Madian Khabsa, C Lee Giles, Hongfang Liu, Komandur Elayavilli Ravikumar, Andre Lamurias, Francisco M Couto, Hong-Jie Dai, Richard Tzong-Han Tsai, Caglar Ata, Tolga Can, Anabel Usié, Rui Alves, Isabel Segura-Bedmar, Paloma Martínez, Julen Oyarzabal, Alfonso Valencia

Abstract

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.

X Demographics

Demographic data were collected from the profiles of the 9 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 211 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
Germany          3       1%
United States    2       <1%
Australia        1       <1%
Netherlands      1       <1%
Spain            1       <1%
Croatia          1       <1%
Unknown          202     96%

Demographic breakdown

Readers by professional status                  Count   As %
Researcher                                      40      19%
Student > Ph. D. Student                        39      18%
Student > Master                                27      13%
Student > Bachelor                              17      8%
Other                                           14      7%
Other                                           32      15%
Unknown                                         42      20%

Readers by discipline                           Count   As %
Computer Science                                79      37%
Medicine and Dentistry                          15      7%
Agricultural and Biological Sciences            12      6%
Chemistry                                       10      5%
Biochemistry, Genetics and Molecular Biology    8       4%
Other                                           37      18%
Unknown                                         50      24%
Attention Score in Context

This research output has an Altmetric Attention Score of 6. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 04 June 2019.
All research outputs: #6,225,301 of 25,079,131 outputs
Outputs from Journal of Cheminformatics: #480 of 942 outputs
Outputs of similar age: #78,177 of 364,401 outputs
Outputs of similar age from Journal of Cheminformatics: #3 of 17 outputs
Altmetric has tracked 25,079,131 research outputs across all sources so far. Compared to these, this one has done well and is in the 75th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 942 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.2. This one is in the 49th percentile – i.e., 49% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 364,401 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 78% of its contemporaries.
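The percentiles quoted in the three comparisons above can be approximated from the rank and the pool size alone. The following is a minimal sketch, not Altmetric's own method: it assumes the percentile is simply the share of the pool ranked below this output, rounded down, and it ignores tied scores. The function name percentile_from_rank is hypothetical. Small pools, where ties and the exact definition of the pool matter more, may not match this estimate.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Estimate a percentile from a 1-based rank (1 = highest score).

    Assumption: the percentile is the share of the pool ranked below
    this item, rounded down. Ties are ignored, so this is only an
    approximation of whatever Altmetric actually computes.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (100 * (total - rank)) // total


# Rank and pool size taken from the figures listed on this page.
print(percentile_from_rank(6_225_301, 25_079_131))  # all outputs -> 75
print(percentile_from_rank(480, 942))               # same source -> 49
print(percentile_from_rank(78_177, 364_401))        # similar age -> 78
```

Under these assumptions the estimates agree with the 75th, 49th and 78th percentiles stated in the text.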
We're also able to compare this research output to 17 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 88% of its contemporaries.