FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining

Overview of attention for article published in BMC Bioinformatics, June 2018

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (86th percentile)
  • High Attention Score compared to outputs of the same age and source (90th percentile)

Mentioned by

twitter
18 X users
wikipedia
1 Wikipedia page

Readers on

mendeley
71 Mendeley
Title
FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining
Published in
BMC Bioinformatics, June 2018
DOI 10.1186/s12859-018-2211-5
Authors

John A. Bachman, Benjamin M. Gyori, Peter K. Sorger

Abstract

For automated reading of scientific publications to extract useful information about molecular mechanisms, it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. In a task involving automated reading of ∼215,000 articles using the REACH event extraction software we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g., "NF-κB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins.
Applications of FamPlex to the TRIPS/DRUM reading system and the Biocreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings. FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.
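
The two ideas in the abstract, grounding text strings to identifiers and resolving multi-level family membership down to individual genes, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy relation and grounding tables are hand-written here rather than read from the FamPlex repository, and the function names (`build_children`, `gene_level_members`, `ground`) are hypothetical, not part of any FamPlex or REACH API.

```python
from collections import defaultdict

# Toy (child, relation, parent) triples, written by hand for illustration.
# They are NOT copied from FamPlex; "isa" denotes family membership here.
RELATIONS = [
    ("AKT1", "isa", "AKT"),
    ("AKT2", "isa", "AKT"),
    ("AKT3", "isa", "AKT"),
    ("MAPK1", "isa", "ERK"),
    ("MAPK3", "isa", "ERK"),
    ("ERK", "isa", "MAPK"),  # a sub-family nested inside a larger family
]

# Toy text-string -> identifier map (hypothetical entries).
GROUNDING = {"Akt": "AKT", "PKB": "AKT", "ERK1/2": "ERK", "MAP kinase": "MAPK"}


def build_children(relations):
    """Index the triples as parent -> set of direct children."""
    children = defaultdict(set)
    for child, _relation, parent in relations:
        children[parent].add(child)
    return children


def gene_level_members(entity, children):
    """Return the leaf (gene-level) constituents of an entity.

    Recurses through sub-families; assumes the hierarchy has no cycles.
    """
    if entity not in children:
        return {entity}  # no children recorded: treat as a gene
    members = set()
    for child in children[entity]:
        members |= gene_level_members(child, children)
    return members


def ground(text):
    """Look up a text string; returns None when the string is unknown."""
    return GROUNDING.get(text)


if __name__ == "__main__":
    children = build_children(RELATIONS)
    print(ground("PKB"), sorted(gene_level_members("AKT", children)))
    print(ground("MAP kinase"), sorted(gene_level_members("MAPK", children)))
```

In this sketch a statement mined about "MAP kinase" and one mined about MAPK1 become comparable because both resolve into the same hierarchy, which is the kind of relationship resolution the abstract describes.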

X Demographics

The data shown below were collected from the profiles of 18 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 71 Mendeley readers of this research output.

Geographical breakdown

| Country | Count | As % |
| --- | --- | --- |
| Unknown | 71 | 100% |

Demographic breakdown

| Readers by professional status | Count | As % |
| --- | --- | --- |
| Student > Master | 15 | 21% |
| Other | 9 | 13% |
| Researcher | 9 | 13% |
| Student > Ph. D. Student | 7 | 10% |
| Student > Doctoral Student | 4 | 6% |
| Other | 13 | 18% |
| Unknown | 14 | 20% |

| Readers by discipline | Count | As % |
| --- | --- | --- |
| Agricultural and Biological Sciences | 14 | 20% |
| Computer Science | 14 | 20% |
| Biochemistry, Genetics and Molecular Biology | 9 | 13% |
| Business, Management and Accounting | 3 | 4% |
| Nursing and Health Professions | 2 | 3% |
| Other | 11 | 15% |
| Unknown | 18 | 25% |

Attention Score in Context

This research output has an Altmetric Attention Score of 15. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 06 November 2021.

  • All research outputs: #2,359,187 of 25,216,325 outputs
  • Outputs from BMC Bioinformatics: #574 of 7,661 outputs
  • Outputs of similar age: #46,616 of 335,916 outputs
  • Outputs of similar age from BMC Bioinformatics: #10 of 96 outputs

Altmetric has tracked 25,216,325 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 90th percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 7,661 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done particularly well, scoring higher than 92% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 335,916 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 86% of its contemporaries.
We're also able to compare this research output to 96 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 90% of its contemporaries.
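
The percentile figures quoted above can be checked against the ranks with simple arithmetic. The Python sketch below is an assumption, not Altmetric's documented method: it takes the percentile to be the share of outputs ranked at or below this one, rounded down, a convention chosen here only because it is consistent with all four rank and percentile pairs on this page. The function name `percentile_from_rank` is hypothetical.

```python
def percentile_from_rank(rank, total):
    """Percentage of outputs ranked at or below `rank`, rounded down.

    Assumed convention (not stated by Altmetric); integer arithmetic
    avoids floating-point rounding surprises.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return 100 * (total - rank + 1) // total


if __name__ == "__main__":
    # (label, rank, total) triples taken from the figures on this page.
    contexts = [
        ("All research outputs", 2_359_187, 25_216_325),
        ("Outputs from BMC Bioinformatics", 574, 7_661),
        ("Outputs of similar age", 46_616, 335_916),
        ("Outputs of similar age from BMC Bioinformatics", 10, 96),
    ]
    for label, rank, total in contexts:
        print(f"{label}: {percentile_from_rank(rank, total)}")
```

Under this convention the four contexts give 90, 92, 86 and 90, matching the percentages stated in the text.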