A step-by-step approach to improve data quality when using commercial business lists to characterize retail food environments

Overview of attention for article published in BMC Research Notes, January 2017

Mentioned by
X: 1 user

Citations
Dimensions: 27

Readers on
Mendeley: 66
Title
A step-by-step approach to improve data quality when using commercial business lists to characterize retail food environments
Published in
BMC Research Notes, January 2017
DOI 10.1186/s13104-016-2355-1
Pubmed ID
Authors

Kelly K. Jones, Shannon N. Zenk, Elizabeth Tarlov, Lisa M. Powell, Stephen A. Matthews, Irina Horoi

Abstract

Food environment characterization in health studies often requires data on the location of food stores and restaurants. While commercial business lists are commonly used as data sources for such studies, current literature provides little guidance on how to use validation study results to make decisions on which commercial business list to use and how to maximize the accuracy of those lists. Using data from a retrospective cohort study [Weight And Veterans' Environments Study (WAVES)], we (a) explain how validity and bias information from existing validation studies (count accuracy, classification accuracy, locational accuracy, as well as potential bias by neighborhood racial/ethnic composition, economic characteristics, and urbanicity) was used to determine which commercial business listing to purchase for retail food outlet data and (b) describe the methods used to maximize the quality of the data and results of this approach.

We developed data improvement methods based on existing validation studies. These methods included purchasing records from commercial business lists (InfoUSA and Dun and Bradstreet) based on store/restaurant names as well as standard industrial classification (SIC) codes, reclassifying records by store type, improving geographic accuracy of records, and deduplicating records. We examined the impact of these procedures on food outlet counts in US census tracts. After cleaning and deduplicating, the count of valid food stores was 17.5% lower than the count purchased from InfoUSA, and the count of valid restaurants was 5.6% lower than the count purchased from Dun and Bradstreet. Locational accuracy was improved for 7.5% of records by applying street addresses of subsequent years to records with post-office (PO) box addresses. In total, up to 83% of US census tracts annually experienced a change (either positive or negative) in the count of retail food outlets between the initial purchase and the final dataset.

Our study provides a step-by-step approach to purchase and process business list data obtained from commercial vendors. The approach can be followed by studies of any size, including those with datasets too large to process each record by hand, and will promote consistency in characterization of the retail food environment across studies.
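
Three of the steps named in the abstract (applying later-year street addresses to PO box records, deduplicating, and comparing tract-level counts before and after cleaning) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Record` fields, the `outlet_id` linking key, the PO box test, and the name/address matching rule are all assumptions made for illustration, and the sample records are invented.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    outlet_id: str  # hypothetical identifier linking one outlet across years
    year: int
    name: str
    address: str
    tract: str      # census tract the address falls in


def is_po_box(address: str) -> bool:
    # Assumed rule: "P.O. Box", "PO Box", "POBOX" and similar all count.
    normalized = address.upper().replace(".", "").replace(" ", "")
    return normalized.startswith("POBOX")


def backfill_po_boxes(records):
    """Give PO box records the outlet's earliest later-year street address."""
    by_outlet = {}
    for r in records:
        by_outlet.setdefault(r.outlet_id, []).append(r)
    fixed = []
    for r in records:
        if is_po_box(r.address):
            later = sorted(
                (x for x in by_outlet[r.outlet_id]
                 if x.year > r.year and not is_po_box(x.address)),
                key=lambda x: x.year,
            )
            if later:
                r = replace(r, address=later[0].address, tract=later[0].tract)
        fixed.append(r)
    return fixed


def deduplicate(records):
    """Keep the first record per (year, normalized name, normalized address)."""
    seen = set()
    unique = []
    for r in records:
        key = (r.year,
               " ".join(r.name.lower().split()),
               " ".join(r.address.lower().split()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def share_of_tracts_changed(before, after, year):
    """Fraction of tracts whose outlet count differs between two datasets.

    Only tracts appearing in either dataset are counted here; a study
    covering all US census tracts would use the full tract list instead.
    """
    b = Counter(r.tract for r in before if r.year == year)
    a = Counter(r.tract for r in after if r.year == year)
    tracts = set(b) | set(a)
    if not tracts:
        return 0.0
    return sum(1 for t in tracts if b[t] != a[t]) / len(tracts)


# Invented sample records, for illustration only.
raw = [
    Record("a", 2010, "Corner Market", "P.O. Box 12", "T1"),
    Record("a", 2011, "Corner Market", "100 Main St", "T2"),
    Record("b", 2010, "Joe's Diner", "5 Oak Ave", "T3"),
    Record("b", 2010, "joe's  diner", "5 Oak Ave", "T3"),
]
cleaned = deduplicate(backfill_po_boxes(raw))
```

In this toy example the 2010 PO box record inherits the 2011 street address and tract, and the duplicate diner record is dropped. Real business-list data would likely need fuzzier name and address matching than the exact normalized comparison used here.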

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 66 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
United States        1     2%
Unknown             65    98%

Demographic breakdown

Readers by professional status          Count   As %
Researcher                                 13    20%
Student > Master                            9    14%
Student > Bachelor                          6     9%
Student > Ph.D. Student                     5     8%
Other                                       4     6%
Other                                      16    24%
Unknown                                    13    20%

Readers by discipline                   Count   As %
Nursing and Health Professions             11    17%
Social Sciences                            11    17%
Medicine and Dentistry                      7    11%
Agricultural and Biological Sciences        3     5%
Engineering                                 3     5%
Other                                      13    20%
Unknown                                    18    27%
Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 15 May 2017.
All research outputs: #18,547,867 of 22,971,207 outputs
Outputs from BMC Research Notes: #3,035 of 4,282 outputs
Outputs of similar age: #311,311 of 421,203 outputs
Outputs of similar age from BMC Research Notes: #53 of 64 outputs
Altmetric has tracked 22,971,207 research outputs across all sources so far. This one is in the 11th percentile – i.e., 11% of other outputs scored the same or lower than it.
So far Altmetric has tracked 4,282 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.6. This one is in the 16th percentile – i.e., 16% of its peers scored the same or lower than it.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 421,203 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 15th percentile – i.e., 15% of its contemporaries scored the same or lower than it.
We're also able to compare this research output to 64 others from the same source and published within six weeks on either side of this one. This one is in the 9th percentile – i.e., 9% of its contemporaries scored the same or lower than it.
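
The percentile statements above all take the form "N% of other outputs scored the same or lower". A minimal Python sketch of that calculation follows. The function name and the sample scores are hypothetical, and this page does not state how Altmetric handles ties or rounding, so this illustrates the stated definition rather than Altmetric's actual method.

```python
def percentile_same_or_lower(score, other_scores):
    """Percent of the other outputs whose score is the same as or lower than `score`."""
    if not other_scores:
        raise ValueError("other_scores must not be empty")
    same_or_lower = sum(1 for s in other_scores if s <= score)
    return 100.0 * same_or_lower / len(other_scores)


# Invented scores for ten other outputs; three of them are <= 1.
example = percentile_same_or_lower(1, [0, 0, 1, 3, 5, 8, 12, 20, 40, 100])
```

With these invented scores, an output with an Attention Score of 1 lands at 30.0, because three of the ten comparison outputs scored 1 or lower.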