Title | A linear classifier based on entity recognition tools and a statistical approach to method extraction in the protein-protein interaction literature |
---|---|
Published in | BMC Bioinformatics, October 2011 |
DOI | 10.1186/1471-2105-12-s8-s12 |
Pubmed ID | |
Authors | Anália Lourenço, Michael Conover, Andrew Wong, Azadeh Nematzadeh, Fengxia Pan, Hagit Shatkay, Luis M Rocha |
Abstract | We participated, as Team 81, in the Article Classification and the Interaction Method subtasks (ACT and IMT, respectively) of the Protein-Protein Interaction task of the BioCreative III Challenge. For the ACT, we pursued an extensive testing of available Named Entity Recognition and dictionary tools, and used the most promising ones to extend our Variable Trigonometric Threshold linear classifier. Our main goal was to exploit the power of available named entity recognition and dictionary tools to aid in the classification of documents relevant to Protein-Protein Interaction (PPI). For the IMT, we focused on obtaining evidence in support of the interaction methods used, rather than on tagging the document with the method identifiers. We experimented with a primarily statistical approach, as opposed to employing a deeper natural language processing strategy. In a nutshell, we exploited classifiers, simple pattern matching for potential PPI methods within sentences, and ranking of candidate matches using statistical considerations. Finally, we also studied the benefits of integrating the method extraction approach that we have used for the IMT into the ACT pipeline. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 1 | 100% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 1 | 100% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 7% |
Germany | 1 | 3% |
Portugal | 1 | 3% |
United Kingdom | 1 | 3% |
Brazil | 1 | 3% |
Unknown | 24 | 80% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Master | 6 | 20% |
Student > Ph. D. Student | 6 | 20% |
Professor > Associate Professor | 4 | 13% |
Researcher | 3 | 10% |
Other | 1 | 3% |
Other | 3 | 10% |
Unknown | 7 | 23% |
Readers by discipline | Count | As % |
---|---|---|
Computer Science | 10 | 33% |
Agricultural and Biological Sciences | 5 | 17% |
Biochemistry, Genetics and Molecular Biology | 2 | 7% |
Business, Management and Accounting | 1 | 3% |
Economics, Econometrics and Finance | 1 | 3% |
Other | 2 | 7% |
Unknown | 9 | 30% |