Title | Finding biomedical categories in Medline® |
---|---|
Published in | Journal of Biomedical Semantics, October 2012 |
DOI | 10.1186/2041-1480-3-s3-s3 |
Pubmed ID | |
Authors | Lana Yeganova, Won Kim, Donald C Comeau, W John Wilbur |
Abstract | There are several humanly defined ontologies relevant to Medline. However, Medline is a fast growing collection of biomedical documents which creates difficulties in updating and expanding these humanly defined ontologies. Automatically identifying meaningful categories of entities in a large text corpus is useful for information extraction, construction of machine learning features, and development of semantic representations. In this paper we describe and compare two methods for automatically learning meaningful biomedical categories in Medline. The first approach is a simple statistical method that uses part-of-speech and frequency information to extract a list of frequent nouns from Medline. The second method implements an alignment-based technique to learn frequent generic patterns that indicate a hyponymy/hypernymy relationship between a pair of noun phrases. We then apply these patterns to Medline to collect frequent hypernyms as potential biomedical categories. |
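
The second method in the abstract, applying hyponymy/hypernymy patterns and keeping the frequent hypernyms, can be illustrated with a minimal sketch. This is not the authors' implementation: the paper learns its patterns with an alignment-based technique, whereas the sketch assumes a single hand-written "X such as Y" pattern, treats one word on each side as the noun phrase, and uses made-up example sentences. The names `SUCH_AS`, `collect_hypernyms`, `frequent_hypernyms`, and the `min_count` threshold are all hypothetical.

```python
import re
from collections import Counter

# Assumed, hand-written pattern: "<hypernym> such as <hyponym>".
# The paper learns patterns by alignment; this fixed regex only illustrates
# the "apply patterns, then count hypernyms" step.
SUCH_AS = re.compile(r"\b([a-z][a-z-]*)\s+such\s+as\s+([a-z][a-z-]*)")


def collect_hypernyms(sentences):
    """Count how often each word fills the hypernym slot of the pattern."""
    counts = Counter()
    for sentence in sentences:
        for hypernym, _hyponym in SUCH_AS.findall(sentence.lower()):
            counts[hypernym] += 1
    return counts


def frequent_hypernyms(sentences, min_count=2):
    """Return hypernyms seen at least min_count times, most frequent first."""
    counts = collect_hypernyms(sentences)
    return [word for word, n in counts.most_common() if n >= min_count]


# Made-up sentences standing in for Medline text.
sentences = [
    "Antibiotics such as penicillin were administered.",
    "Patients received antibiotics such as vancomycin.",
    "Cytokines such as interleukin-6 were elevated.",
]
candidates = frequent_hypernyms(sentences)
print(candidates)
```

With these three sentences, "antibiotics" fills the hypernym slot twice and "cytokines" once, so only "antibiotics" passes the assumed threshold of two. A realistic version would need noun-phrase chunking and a much larger pattern set.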
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 1 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 1 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Japan | 1 | 5% |
Mexico | 1 | 5% |
French Polynesia | 1 | 5% |
Unknown | 18 | 86% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 7 | 33% |
Other | 4 | 19% |
Student > Ph. D. Student | 4 | 19% |
Professor | 1 | 5% |
Professor > Associate Professor | 1 | 5% |
Other | 0 | 0% |
Unknown | 4 | 19% |
Readers by discipline | Count | As % |
---|---|---|
Agricultural and Biological Sciences | 8 | 38% |
Computer Science | 4 | 19% |
Biochemistry, Genetics and Molecular Biology | 2 | 10% |
Engineering | 2 | 10% |
Medicine and Dentistry | 1 | 5% |
Other | 0 | 0% |
Unknown | 4 | 19% |