Title | Text data extraction for a prospective, research-focused data mart: implementation and validation |
---|---|
Published in | BMC Medical Informatics and Decision Making, September 2012 |
DOI | 10.1186/1472-6947-12-106 |
Pubmed ID | |
Authors | Monique Hinchcliff, Eric Just, Sofia Podlusky, John Varga, Rowland W Chang, Warren A Kibbe |
Abstract | Translational research typically requires data abstracted from medical records as well as data collected specifically for research. Unfortunately, many data within electronic health records are represented as text that is not amenable to aggregation for analyses. We present a scalable open source SQL Server Integration Services package, called Regextractor, for including regular expression parsers into a classic extract, transform, and load workflow. We have used Regextractor to abstract discrete data from textual reports from a number of 'machine generated' sources. To validate this package, we created a pulmonary function test data mart and analyzed the quality of the data mart versus manual chart review. |
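
The abstract describes using regular expression parsers to turn machine-generated report text into discrete values during the transform step of an extract, transform, and load workflow. The following is a minimal Python sketch of that general idea, not the Regextractor package itself (which is a SQL Server Integration Services component). The report layout, the measure names, the `PATTERN` expression, and the `extract_rows` helper are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical machine-generated report text; the layout is an assumption,
# not a format taken from the paper.
REPORT = """\
Pulmonary Function Test
FVC: 3.42 L (85% predicted)
FEV1: 2.71 L (80% predicted)
DLCO: 18.5 mL/min/mmHg (72% predicted)
"""

# One named group per discrete field we want to load into a table.
PATTERN = re.compile(
    r"^(?P<measure>FVC|FEV1|DLCO):\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\S+)\s*"
    r"\((?P<pct>\d+)% predicted\)",
    re.MULTILINE,
)


def extract_rows(report_text):
    """Transform step: return one dict per measurement found in the text."""
    rows = []
    for match in PATTERN.finditer(report_text):
        rows.append(
            {
                "measure": match.group("measure"),
                "value": float(match.group("value")),
                "unit": match.group("unit"),
                "percent_predicted": int(match.group("pct")),
            }
        )
    return rows


rows = extract_rows(REPORT)
for row in rows:
    print(row)
```

In a real workflow, the resulting rows would be loaded into a data mart table and spot-checked against manual chart review, as the abstract describes for the validation.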
Twitter Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 4 | 44% |
United Kingdom | 2 | 22% |
India | 2 | 22% |
Unknown | 1 | 11% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Practitioners (doctors, other healthcare professionals) | 4 | 44% |
Members of the public | 4 | 44% |
Scientists | 1 | 11% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 2 | 3% |
United States | 2 | 3% |
Canada | 1 | 1% |
Unknown | 67 | 93% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 14 | 19% |
Student > Ph. D. Student | 10 | 14% |
Student > Master | 9 | 13% |
Student > Bachelor | 6 | 8% |
Student > Postgraduate | 6 | 8% |
Other | 12 | 17% |
Unknown | 15 | 21% |
Readers by discipline | Count | As % |
---|---|---|
Medicine and Dentistry | 17 | 24% |
Computer Science | 12 | 17% |
Engineering | 5 | 7% |
Agricultural and Biological Sciences | 4 | 6% |
Social Sciences | 4 | 6% |
Other | 9 | 13% |
Unknown | 21 | 29% |