A systematic review of data mining and machine learning for air pollution epidemiology

Overview of attention for article published in BMC Public Health, November 2017
About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (85th percentile)
  • Good Attention Score compared to outputs of the same age and source (69th percentile)

Mentioned by

27 X users

Readers on

544 Mendeley readers
Title: A systematic review of data mining and machine learning for air pollution epidemiology
Published in: BMC Public Health, November 2017
DOI: 10.1186/s12889-017-4914-3
Authors: Colin Bellinger, Mohomed Shazan Mohomed Jabbar, Osmar Zaïane, Alvaro Osornio-Vargas

Abstract

Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology.

We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed.

Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geospatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology.

We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geospatial mining, and deep learning. This is further supported by new sensors and storage media that enable larger, better quality data. This suggests that many more fruitful applications can be expected in the future.

X Demographics

Demographic data were collected from the profiles of the 27 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 544 Mendeley readers of this research output.

Geographical breakdown

Country | Count | As %
Unknown | 544 | 100%

Demographic breakdown

Readers by professional status | Count | As %
Student > Ph.D. Student | 83 | 15%
Student > Master | 73 | 13%
Researcher | 61 | 11%
Student > Bachelor | 42 | 8%
Student > Doctoral Student | 27 | 5%
Other | 89 | 16%
Unknown | 169 | 31%

Readers by discipline | Count | As %
Computer Science | 112 | 21%
Engineering | 46 | 8%
Environmental Science | 39 | 7%
Medicine and Dentistry | 33 | 6%
Earth and Planetary Sciences | 16 | 3%
Other | 96 | 18%
Unknown | 202 | 37%
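
The "As %" column can be checked against the counts. The following is a minimal sketch, assuming each percentage is the count divided by the 544 total readers and rounded to the nearest whole number; the function name `shares` is hypothetical and the page does not state how Altmetric rounds.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    # Assumption: round to the nearest whole percent.
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts copied from the "Readers by discipline" breakdown above.
discipline_counts = {
    "Computer Science": 112,
    "Engineering": 46,
    "Environmental Science": 39,
    "Medicine and Dentistry": 33,
    "Earth and Planetary Sciences": 16,
    "Other": 96,
    "Unknown": 202,
}

discipline_shares = shares(discipline_counts)
for label, pct in discipline_shares.items():
    print(f"{label}: {pct}%")
```

Under this assumption the counts sum to 544 and the recomputed shares agree with the listed percentages.
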
Attention Score in Context

This research output has an Altmetric Attention Score of 12. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 25 May 2021.
All research outputs: #2,799,996 of 23,881,329 outputs
Outputs from BMC Public Health: #3,185 of 15,466 outputs
Outputs of similar age: #61,931 of 443,144 outputs
Outputs of similar age from BMC Public Health: #54 of 176 outputs
Altmetric has tracked 23,881,329 research outputs across all sources so far. Compared to these, this one has done well and is in the 88th percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 15,466 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 14.3. This one has done well, scoring higher than 79% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 443,144 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 85% of its contemporaries.
We're also able to compare this research output to 176 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 69% of its contemporaries.
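
The "scoring higher than N%" figures can be roughly related to the rankings. This is a sketch only, assuming the percentage is the share of outputs ranked below this one, rounded down; the function name `percent_outscored` is hypothetical and Altmetric's actual method is not described on this page. It reproduces the 88%, 79% and 69% figures, but gives 86% for the similar-age comparison where the page reports 85%, so the real calculation presumably differs in some detail such as the handling of tied scores.

```python
def percent_outscored(rank: int, total: int) -> int:
    """Whole-number percentage of outputs ranked below the given rank.

    Assumption: rank 1 is the highest-scoring output and the result is
    rounded down; this is not confirmed as Altmetric's method.
    """
    return (total - rank) * 100 // total


# Rankings copied from the page.
rankings = {
    "All research outputs": (2_799_996, 23_881_329),
    "Outputs from BMC Public Health": (3_185, 15_466),
    "Outputs of similar age": (61_931, 443_144),
    "Outputs of similar age from BMC Public Health": (54, 176),
}

for context, (rank, total) in rankings.items():
    print(f"{context}: higher than {percent_outscored(rank, total)}%")
```
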