Improving prevalence estimation through data fusion: methods and validation

Overview of attention for article published in BMC Medical Informatics and Decision Making, June 2015

About this Attention Score

  • Above-average Attention Score compared to outputs of the same age (56th percentile)
  • Above-average Attention Score compared to outputs of the same age and source (64th percentile)

Mentioned by

  • 6 X users
  • 1 Facebook page

Citations

  • 4 Dimensions

Readers on

  • 23 Mendeley
Title
Improving prevalence estimation through data fusion: methods and validation
Published in
BMC Medical Informatics and Decision Making, June 2015
DOI 10.1186/s12911-015-0169-z
Pubmed ID
Authors

Tomàs Aluja-Banet, Josep Daunis-i-Estadella, Núria Brunsó, Anna Mompart-Penina

Abstract

Estimation of health prevalences is usually performed with a single survey. Some attempts have been made to integrate more than one source of data. We propose here to validate this approach through data fusion. Data fusion is the process of integrating two sources of data into one combined file. It allows us to take even greater advantage of existing information collected in databases. Here, we use data fusion to improve the estimation of health prevalences for two primary health factors: cardiovascular diseases and diabetes. We use a real data fusion operation on population health, where the imputation of basic health risk factors is used to enrich a large-scale survey on self-reported health status. We propose choosing the imputation methodology for this problem through a suite of validation statistics that assess the quality of the fused data. The compared imputation techniques have been chosen from among the main imputation methodologies: k-nearest neighbor, probabilistic modeling and regression. We use the 2006 Health Survey of Catalonia, which provides a complete report of the perceived health status. In order to deal with the uncertainty problem, we compare these methodologies under the single and multiple imputation frames. A suite of validation statistics allows us to discern the strengths and weaknesses of the studied imputation methods. Multiple imputation outperforms single imputation by providing better and much more stable estimates, according to the computed validation statistics. The summarized results indicate that the probabilistic methods preserve the multivariate structure better; sequential regression methods deliver greater accuracy of imputed data; and nearest neighbor methods end up with a more realistic distribution of imputed data.

Data fusion allows us to integrate two sources of information in order to take greater advantage of the available data. Multiply imputed sequential regression models have the advantage of greater interpretability and can be used for health policy. Under certain conditions, more accurate estimates of the prevalences can be obtained using fused data (the original data plus the imputed data) than by using only the observed data.
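
To make the idea in the abstract concrete, here is a minimal sketch of nearest-neighbor data fusion under multiple imputation. It is not the authors' implementation: the record layout, the variable names (age, bmi, diabetes), the toy values, and the function names are all hypothetical, and the pooling step simply averages the point estimates across imputed copies. Donor records carry the health variable; recipient records only carry the variables common to both sources.

```python
import random
from statistics import mean


def knn_donors(recipient, donors, common, k):
    """Return the k donors closest to the recipient on the common variables."""
    def sq_dist(donor):
        return sum((recipient[v] - donor[v]) ** 2 for v in common)
    return sorted(donors, key=sq_dist)[:k]


def fuse_once(recipients, donors, common, target, k, rng):
    """One imputed copy: each recipient takes the target value of a donor
    drawn at random from its k nearest donors."""
    return [rng.choice(knn_donors(r, donors, common, k))[target]
            for r in recipients]


def fused_prevalence(recipients, donors, common, target, k=3, m=5, seed=0):
    """Prevalence of a binary target on the fused file (observed + imputed),
    averaged over m imputed copies."""
    rng = random.Random(seed)
    observed = [d[target] for d in donors]
    estimates = []
    for _ in range(m):
        imputed = fuse_once(recipients, donors, common, target, k, rng)
        estimates.append(mean(observed + imputed))
    return mean(estimates)


# Toy, made-up records (common variables rescaled to [0, 1]).
donors = [
    {"age": 0.2, "bmi": 0.3, "diabetes": 0},
    {"age": 0.3, "bmi": 0.2, "diabetes": 0},
    {"age": 0.8, "bmi": 0.9, "diabetes": 1},
    {"age": 0.9, "bmi": 0.7, "diabetes": 1},
]
recipients = [
    {"age": 0.25, "bmi": 0.25},
    {"age": 0.85, "bmi": 0.80},
]

print(fused_prevalence(recipients, donors, ["age", "bmi"], "diabetes", k=2))
```

Setting m=1 gives single imputation; larger m lets the spread of the per-copy estimates reflect the imputation uncertainty the abstract refers to. A fuller treatment would also pool variances across copies, which this sketch omits.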

X Demographics

The data shown below were collected from the profiles of 6 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 23 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United States        1     4%
Unknown             22    96%

Demographic breakdown

Readers by professional status   Count   As %
Student > Ph. D. Student             4    17%
Student > Doctoral Student           4    17%
Researcher                           3    13%
Professor                            2     9%
Librarian                            2     9%
Other                                7    30%
Unknown                              1     4%

Readers by discipline            Count   As %
Medicine and Dentistry               7    30%
Engineering                          3    13%
Mathematics                          2     9%
Social Sciences                      2     9%
Computer Science                     1     4%
Other                                5    22%
Unknown                              3    13%

Attention Score in Context

This research output has an Altmetric Attention Score of 3. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 04 April 2016.

  • All research outputs: #12,735,145 of 22,815,414 outputs
  • Outputs from BMC Medical Informatics and Decision Making: #847 of 1,988 outputs
  • Outputs of similar age: #113,712 of 264,049 outputs
  • Outputs of similar age from BMC Medical Informatics and Decision Making: #15 of 42 outputs

Altmetric has tracked 22,815,414 research outputs across all sources so far. This one is in the 43rd percentile – i.e., 43% of other outputs scored the same or lower than it.
So far Altmetric has tracked 1,988 research outputs from this source. They receive a mean Attention Score of 4.9. This one has gotten more attention than average, scoring higher than 56% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 264,049 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 56% of its contemporaries.
We're also able to compare this research output to 42 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 64% of its contemporaries.