Utilising identifier error variation in linkage of large administrative data sources

Overview of attention for article published in BMC Medical Research Methodology, February 2017

Mentioned by: 1 X user
Citations: 22 Dimensions
Readers on: 52 Mendeley
Title: Utilising identifier error variation in linkage of large administrative data sources
Published in: BMC Medical Research Methodology, February 2017
DOI: 10.1186/s12874-017-0306-8
Pubmed ID
Authors: Katie Harron, Gareth Hagger-Johnson, Ruth Gilbert, Harvey Goldstein

Abstract

Linkage of administrative data sources often relies on probabilistic methods using a set of common identifiers (e.g. sex, date of birth, postcode). Variation in data quality on an individual or organisational level (e.g. by hospital) can result in clustering of identifier errors, violating the assumption of independence between identifiers required for traditional probabilistic match weight estimation. This potentially introduces selection bias to the resulting linked dataset. We aimed to measure variation in identifier error rates in a large English administrative data source (Hospital Episode Statistics; HES) and to incorporate this information into match weight calculation. We used 30,000 randomly selected HES hospital admissions records of patients aged 0-1, 5-6 and 18-19 years, for 2011/2012, linked via NHS number with data from the Personal Demographic Service (PDS; our gold-standard). We calculated identifier error rates for sex, date of birth and postcode and used multi-level logistic regression to investigate associations with individual-level attributes (age, ethnicity, and gender) and organisational variation. We then derived: i) weights incorporating dependence between identifiers; ii) attribute-specific weights (varying by age, ethnicity and gender); and iii) organisation-specific weights (by hospital). Results were compared with traditional match weights using a simulation study. Identifier errors (where values disagreed in linked HES-PDS records) or missing values were found in 0.11% of records for sex and date of birth and in 53% of records for postcode. Identifier error rates differed significantly by age, ethnicity and sex (p < 0.0005). Errors were less frequent in males, in 5-6 year olds and 18-19 year olds compared with infants, and were lowest for the Asian ethnic group. A simulation study demonstrated that substantial bias was introduced into estimated readmission rates in the presence of identifier errors. Attribute- and organisational-specific weights reduced this bias compared with weights estimated using traditional probabilistic matching algorithms. We provide empirical evidence on variation in rates of identifier error in a widely-used administrative data source and propose a new method for deriving match weights that incorporates additional data attributes. Our results demonstrate that incorporating information on variation by individual-level characteristics can help to reduce bias due to linkage error.
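
To make the idea of attribute-specific match weights concrete, the following is a minimal Python sketch of log-ratio match weights in the general style of traditional probabilistic linkage. It is not the authors' implementation: the function names, the group labels, and every probability below are hypothetical values chosen for illustration, and summing weights across identifiers assumes the independence that the abstract notes can be violated. The sketch only shows how letting the agreement probability among true matches (m) vary by a record attribute changes the weight given to a disagreeing identifier.

```python
import math


def agreement_weight(m, u):
    """Weight when an identifier agrees: log2 of P(agree | match) / P(agree | non-match)."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """Weight when an identifier disagrees: log2 of P(disagree | match) / P(disagree | non-match)."""
    return math.log2((1.0 - m) / (1.0 - u))


def record_pair_weight(agreements, m_probs, u_probs):
    """Sum per-identifier weights for one record pair (assumes independent identifiers)."""
    total = 0.0
    for identifier, agrees in agreements.items():
        m, u = m_probs[identifier], u_probs[identifier]
        total += agreement_weight(m, u) if agrees else disagreement_weight(m, u)
    return total


# Hypothetical chance-agreement probabilities among non-matches (u).
U_PROBS = {"sex": 0.5, "date_of_birth": 0.001, "postcode": 0.0001}

# Hypothetical pooled agreement probabilities among true matches (m = 1 - error rate).
POOLED_M = {"sex": 0.999, "date_of_birth": 0.999, "postcode": 0.995}

# Hypothetical attribute-specific m-probabilities: a higher error rate is assumed for infants.
GROUP_M = {
    "infant": {"sex": 0.997, "date_of_birth": 0.997, "postcode": 0.990},
    "adult": {"sex": 0.9995, "date_of_birth": 0.9995, "postcode": 0.997},
}

# A candidate pair that agrees on sex and postcode but disagrees on date of birth.
pair = {"sex": True, "date_of_birth": False, "postcode": True}

pooled = record_pair_weight(pair, POOLED_M, U_PROBS)
infant = record_pair_weight(pair, GROUP_M["infant"], U_PROBS)

print(f"pooled weight: {pooled:.3f}")
print(f"infant-specific weight: {infant:.3f}")
```

Under these assumed values, the infant-specific weight penalises the date-of-birth disagreement less than the pooled weight does, because disagreement is less surprising in a group where errors are assumed to be more common.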

X Demographics

Data were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 52 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    52       100%

Demographic breakdown

Readers by professional status    Count    As %
Researcher                        17       33%
Student > Ph. D. Student          7        13%
Student > Master                  7        13%
Student > Bachelor                4        8%
Student > Doctoral Student        3        6%
Other                             1        2%
Unknown                           13       25%

Readers by discipline             Count    As %
Medicine and Dentistry            9        17%
Social Sciences                   4        8%
Computer Science                  4        8%
Nursing and Health Professions    4        8%
Psychology                        3        6%
Other                             10       19%
Unknown                           18       35%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 01 March 2017.
All research outputs: #20,407,586 of 22,957,478 outputs
Outputs from BMC Medical Research Methodology: #1,889 of 2,026 outputs
Outputs of similar age: #355,872 of 420,247 outputs
Outputs of similar age from BMC Medical Research Methodology: #31 of 34 outputs
Altmetric has tracked 22,957,478 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.
So far Altmetric has tracked 2,026 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.2. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 420,247 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 34 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.