
Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge

Overview of attention for an article published in Genome Biology, September 2008

Mentioned by

3 Wikipedia pages

Citations

153 Dimensions

Readers on

247 Mendeley
8 CiteULike
1 Connotea
Title
Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge
Published in
Genome Biology, September 2008
DOI 10.1186/gb-2008-9-s2-s1
Pubmed ID
Authors

Martin Krallinger, Alexander Morgan, Larry Smith, Florian Leitner, Lorraine Tanabe, John Wilbur, Lynette Hirschman, Alfonso Valencia

Abstract

Genome sciences have experienced an increasing demand for efficient text-processing tools that can extract biologically relevant information from the growing amount of published literature. In response, a range of text-mining and information-extraction tools have recently been developed specifically for the biological domain. Such tools are only useful if they are designed to meet real-life tasks and if their performance can be estimated and compared. The BioCreative challenge (Critical Assessment of Information Extraction in Biology) is a collaborative initiative to provide a common evaluation framework for monitoring and assessing the state of the art of text-mining systems applied to biologically relevant problems. The Second BioCreative assessment (2006 to 2007) attracted 44 teams from 13 countries worldwide, with the aim of evaluating current information-extraction/text-mining technologies developed for one or more of the three tasks defined for this challenge evaluation. These tasks included the recognition of gene mentions in abstracts (gene mention task); the extraction of a list of unique identifiers for human genes mentioned in abstracts (gene normalization task); and the extraction of annotation-relevant information on physical protein-protein interactions (protein-protein interaction task). The 'gold standard' data used for evaluating submissions for the third task were provided by the interaction databases MINT (Molecular Interaction Database) and IntAct. The Second BioCreative assessment almost doubled the number of participants for each individual task compared with the first BioCreative assessment. An overall improvement in terms of balanced precision and recall was observed for the best submissions to the gene mention task (F score 0.87); for the gene normalization task, the best results (F score 0.81) were comparable to those obtained for similar tasks posed at the first BioCreative challenge. In the case of the protein-protein interaction task, the importance and difficulty of extracting experimentally confirmed annotations from full-text articles were explored, yielding different results depending on the step of the annotation extraction workflow. A common characteristic observed in all three tasks was that combining system outputs could yield better results than any single system. Finally, the development of the first text-mining meta-server was promoted within the context of this community challenge.
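The "balanced precision and recall" figures quoted in the abstract are F scores; in the BioCreative evaluations this is the balanced F measure, i.e. the harmonic mean of precision and recall. A minimal Python sketch follows; the function name and the example precision/recall values are illustrative, not taken from the challenge results.

def f_score(precision: float, recall: float) -> float:
    """Balanced F score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values only: a system with precision 0.89 and recall 0.85
# would score close to the 0.87 reported for the best gene mention submissions.
print(round(f_score(0.89, 0.85), 2))  # 0.87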

Mendeley readers

The data shown below were compiled from readership statistics for 247 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
United States 17 7%
Spain 5 2%
United Kingdom 5 2%
Australia 3 1%
Germany 2 <1%
Denmark 2 <1%
Netherlands 2 <1%
Switzerland 1 <1%
Brazil 1 <1%
Other 6 2%
Unknown 203 82%

Demographic breakdown

Readers by professional status Count As %
Researcher 52 21%
Student > Ph.D. Student 51 21%
Student > Master 39 16%
Professor 22 9%
Professor > Associate Professor 19 8%
Other 54 22%
Unknown 10 4%
Readers by discipline Count As %
Computer Science 75 30%
Agricultural and Biological Sciences 46 19%
Social Sciences 26 11%
Medicine and Dentistry 14 6%
Linguistics 9 4%
Other 60 24%
Unknown 17 7%
Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 30 January 2021.
All research outputs: #8,534,528 of 25,373,627 outputs
Outputs from Genome Biology: #3,489 of 4,467 outputs
Outputs of similar age: #35,135 of 95,711 outputs
Outputs of similar age from Genome Biology: #18 of 32 outputs
Altmetric has tracked 25,373,627 research outputs across all sources so far. This one is in the 43rd percentile – i.e., 43% of other outputs scored the same or lower than it.
So far Altmetric has tracked 4,467 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 27.6. This one is in the 14th percentile – i.e., 14% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 95,711 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 17th percentile – i.e., 17% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 32 others from the same source and published within six weeks on either side of this one. This one is in the 28th percentile – i.e., 28% of its contemporaries scored the same or lower than it.
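The percentile statements above all follow the same rule: an output's percentile within a comparison set is the share of that set whose Attention Score is the same or lower. A minimal Python sketch of that calculation follows; it is illustrative only, the comparison scores are made up, and Altmetric's exact tie handling and binning may differ.

def percentile_rank(score, comparison_scores):
    """Percentage of outputs in the comparison set scoring the same or lower."""
    same_or_lower = sum(1 for s in comparison_scores if s <= score)
    return 100.0 * same_or_lower / len(comparison_scores)

# Hypothetical comparison set: many unmentioned outputs (score 0) plus a few
# highly mentioned ones, which is why a score of 3 can sit mid-distribution.
scores = [0, 0, 0, 1, 2, 3, 5, 10, 27, 60]
print(percentile_rank(3, scores))  # 60.0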