Sociodemographic differences in linkage error: an examination of four large-scale datasets

Overview of attention for article published in BMC Health Services Research, September 2018
Citations: 7 (Dimensions)
Readers: 30 (Mendeley)
Title
Sociodemographic differences in linkage error: an examination of four large-scale datasets
Published in
BMC Health Services Research, September 2018
DOI 10.1186/s12913-018-3495-x
Pubmed ID
Authors

Sean Randall, Adrian Brown, James Boyd, Rainer Schnell, Christian Borgs, Anna Ferrante

Abstract

Record linkage is an important tool for epidemiologists and health planners. Record linkage studies will generally contain some level of residual record linkage error, where individual records are either incorrectly marked as belonging to the same individual, or incorrectly marked as belonging to separate individuals. A key question is whether errors in linkage quality are distributed evenly throughout the population, or whether certain subgroups will exhibit higher rates of error. Previous investigations of this issue have typically compared linked and un-linked records, which can conflate bias caused by record linkage error with bias caused by missing records (data capture errors). Four large administrative datasets were individually de-duplicated, with results compared to an available 'gold-standard' benchmark, allowing us to avoid methodological issues with comparing linked and un-linked records. Results were compared by gender, age, geographic remoteness (major cities, regional or remote) and socioeconomic status. Results varied between datasets and by sociodemographic characteristic. The most consistent findings were worse linkage quality for younger individuals (seen in all four datasets) and worse linkage quality for those living in remote areas (seen in three of four datasets). The linkage quality within sociodemographic categories varied between datasets, with the associations with linkage error reversed across different datasets due to quirks of the specific data collection mechanisms and data sharing practices. These results suggest caution should be taken both when linking younger individuals and those in remote areas, and when analysing linked data from these subgroups. Further research is required to determine the ramifications of worse linkage quality in these subpopulations on research outcomes.
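The comparison described in the abstract (de-duplication results checked against a gold-standard benchmark, then broken down by subgroup) can be illustrated with a minimal sketch. This is not the authors' method or code: the record layout `(true_id, linked_id, group)`, the function name `linkage_error_by_group`, the per-record definitions of the two error rates, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def linkage_error_by_group(records):
    """Estimate per-subgroup linkage error against a gold standard.

    records: iterable of (true_id, linked_id, group) tuples, where
    true_id is the gold-standard person identifier, linked_id is the
    identifier assigned by de-duplication, and group is a
    sociodemographic category. This layout is hypothetical.
    """
    records = list(records)
    linked_ids_per_person = defaultdict(set)   # true_id -> linked_ids seen
    persons_per_linked_id = defaultdict(set)   # linked_id -> true_ids seen
    for true_id, linked_id, _ in records:
        linked_ids_per_person[true_id].add(linked_id)
        persons_per_linked_id[linked_id].add(true_id)

    totals = defaultdict(int)
    false_matches = defaultdict(int)
    missed_matches = defaultdict(int)
    for true_id, linked_id, group in records:
        totals[group] += 1
        # False match: the record shares a linked_id with another person.
        if len(persons_per_linked_id[linked_id]) > 1:
            false_matches[group] += 1
        # Missed match: the person's records were split across linked_ids.
        if len(linked_ids_per_person[true_id]) > 1:
            missed_matches[group] += 1

    return {
        group: {
            "n": totals[group],
            "false_match_rate": false_matches[group] / totals[group],
            "missed_match_rate": missed_matches[group] / totals[group],
        }
        for group in totals
    }


# Made-up sample data, for illustration only.
sample = [
    ("p1", "c1", "young"), ("p1", "c2", "young"),  # one person split in two
    ("p2", "c3", "young"),
    ("p3", "c4", "old"), ("p4", "c4", "old"),      # two people merged
    ("p5", "c5", "old"), ("p5", "c5", "old"),      # correctly de-duplicated
]
rates = linkage_error_by_group(sample)
for group, stats in sorted(rates.items()):
    print(group, stats)
```

Measuring error per record against a benchmark, rather than comparing linked with un-linked records, keeps linkage error separate from missing-record (data capture) effects, which is the distinction the abstract draws.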

Mendeley readers

The data shown below were compiled from readership statistics for 30 Mendeley readers of this research output.

Geographical breakdown

| Country | Count | As % |
|---|---|---|
| Unknown | 30 | 100% |

Demographic breakdown

| Readers by professional status | Count | As % |
|---|---|---|
| Researcher | 10 | 33% |
| Student > Master | 6 | 20% |
| Student > Bachelor | 2 | 7% |
| Student > Ph. D. Student | 1 | 3% |
| Unknown | 11 | 37% |

| Readers by discipline | Count | As % |
|---|---|---|
| Medicine and Dentistry | 6 | 20% |
| Computer Science | 5 | 17% |
| Nursing and Health Professions | 3 | 10% |
| Psychology | 3 | 10% |
| Pharmacology, Toxicology and Pharmaceutical Science | 1 | 3% |
| Other | 0 | 0% |
| Unknown | 12 | 40% |