Application of machine learning algorithms to PM2.5 concentration analysis in the state of São Paulo, Brazil




Air pollutants; Particulate Matter; Clustering; Association Rules; Air quality; Respiratory diseases.


Air quality monitoring data are useful in many areas of research and have varied applications, especially in studies of the relationship between air pollution, respiratory problems, and other health hazards. The main atmospheric pollutants are ozone (O3), sulfur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2), and particulate matter (PM). PM is one of the main objects of study in efforts to protect people from exposure to pollutants. This study contributes to the analysis of PM2.5 at 21 stations in the state of São Paulo monitored by the Environmental Company of São Paulo State (CETESB). It employs cluster analysis, a prominent data mining method for detecting patterns and discovering similarities, which is important for assessing air pollution, especially in a geographically vast area such as the state of São Paulo, which does not follow a single pattern. Another data mining technique, association rule learning, supports the analysis of the relationship between pollutants and meteorological variables, as it allows relationships between elements that occur together to be identified across a wide variety of data. Our objectives include determining which stations behave similarly and exploring the temporal variation of the pollutant as it relates to the dominant meteorological factors in periods of high concentration. The clustering algorithm automatically separates stations according to their monthly average PM2.5 concentrations between 2017 and 2019. The clusters of stations with the highest pollution levels essentially comprised urban centers with industrial and vehicle emissions, while those with the lowest levels were located further inland. A cyclical pattern in pollutant variation was also observed across the three years under study and in both clusters.
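The grouping of stations by their monthly PM2.5 averages can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so the alternating k-medoids procedure below is an assumption chosen for illustration. The station names, baselines, and seasonal shape are made up and are not CETESB measurements.

```python
import math
import random

# Made-up seasonal shape (Jan..Dec) added to a per-station baseline.
# These numbers are illustrative only, not CETESB measurements.
SEASONAL_SHAPE = [0, 0, 1, 2, 4, 6, 7, 6, 4, 2, 1, 0]
BASELINES = {"urban_a": 20, "urban_b": 22, "inland_a": 10, "inland_b": 11}
profiles = {
    name: [base + bump for bump in SEASONAL_SHAPE]
    for name, base in BASELINES.items()
}


def euclidean(a, b):
    """Euclidean distance between two equal-length monthly profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(profiles, k, seed=0, max_iter=100):
    """Simple alternating k-medoids; returns {medoid: [member names]}."""
    names = sorted(profiles)
    medoids = random.Random(seed).sample(names, k)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: attach each station to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for n in names:
            nearest = min(
                medoids, key=lambda m: euclidean(profiles[n], profiles[m])
            )
            clusters[nearest].append(n)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(
                    euclidean(profiles[c], profiles[o]) for o in members
                ),
            )
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


clusters = k_medoids(profiles, k=2)
groups = {frozenset(members) for members in clusters.values()}
for members in clusters.values():
    print(sorted(members))
```

With real data, each profile would hold one station's monthly averages over the study period, and the number of clusters would need to be chosen and validated rather than fixed at two.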
For the months with the highest PM2.5 concentrations, association rule learning was applied to relate air temperature, relative humidity, and wind speed to PM2.5 and carbon monoxide (CO) concentrations. The results are useful for analyzing the temporal and geographic profiles of particulate matter pollution, since they identify how the predominant meteorological factors behave in periods of higher concentration.
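The association rule step can be sketched in the same spirit. The example below enumerates itemsets by brute force and scores rules by support and confidence. It is not the authors' implementation, and the discretized labels (such as `wind=low`) and the six daily records are hypothetical values invented for illustration.

```python
from itertools import combinations

# Hypothetical daily records after discretizing each variable into
# low/high bins. These values are invented, not measured data.
transactions = [
    frozenset({"temp=low", "rh=low", "wind=low", "pm25=high", "co=high"}),
    frozenset({"temp=low", "rh=low", "wind=low", "pm25=high", "co=high"}),
    frozenset({"temp=high", "rh=low", "wind=low", "pm25=high", "co=low"}),
    frozenset({"temp=high", "rh=high", "wind=high", "pm25=low", "co=low"}),
    frozenset({"temp=low", "rh=high", "wind=high", "pm25=low", "co=low"}),
    frozenset({"temp=high", "rh=low", "wind=low", "pm25=high", "co=high"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) tuples.

    Brute-force enumeration: fine for a handful of items, but a real
    analysis would use a pruning algorithm such as Apriori.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s < min_support:
                continue
            # Try each item as a single-item consequent.
            for consequent in combo:
                antecedent = itemset - {consequent}
                conf = s / support(antecedent, transactions)
                if conf >= min_confidence:
                    rules.append(
                        (tuple(sorted(antecedent)), consequent, s, conf)
                    )
    return rules


rules = mine_rules(transactions)
for antecedent, consequent, s, conf in rules:
    print(f"{antecedent} -> {consequent}: support={s:.2f}, conf={conf:.2f}")
```

Support measures how often the items of a rule occur together, and confidence measures how often the consequent holds when the antecedent does. The thresholds used here are arbitrary and would need tuning on real data.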








How to Cite

Godoy, A. R. L., Silva, A. E. A. da, Bueno, M. C., Pozza, S. A., & Coelho, G. P. (2021). Application of machine learning algorithms to PM2.5 concentration analysis in the state of São Paulo, Brazil. Revista Brasileira De Ciências Ambientais (RBCIAMB), 56(1), 152–165.