Prediction of PM2.5 and PM10 Concentrations Using XGBoost and LightGBM Algorithms: A Case Study in Lima, Peru

Authors

DOI:

https://doi.org/10.26439/interfases2024.n020.7417

Keywords:

air pollution, air quality, meteorological data, machine learning, XGBoost, LightGBM

Abstract

Air pollution is a major problem that affects both human health and the environment, causing millions of premature deaths worldwide every year and severely degrading the planet. Fine particulate matter is especially hazardous because its particles can penetrate deep into the lungs, leading to serious health problems, including a reduction in life expectancy of more than two years. It is therefore crucial to find effective ways to monitor the levels of these pollutants in our daily surroundings. This article presents a case study conducted in the district of San Borja, Lima, Peru, in which prediction models for PM2.5 and PM10 were implemented using the XGBoost and LightGBM algorithms. Based on data from the SENAMHI portal and a correlation analysis of the variables, two training scenarios were defined. In scenario 1, the PM2.5 and PM10 models were trained using all available meteorological and pollution variables. In scenario 2, the PM2.5 model was trained without the PM10 variable, and the PM10 model without the PM2.5 variable. Both algorithms achieved high accuracy, as measured by the coefficient of determination (R²), and no statistically significant difference was found to indicate that either one outperforms the other. Furthermore, the comparison of the two scenarios showed that excluding key variables can result in significantly less accurate predictions, potentially undermining the effectiveness of environmental management strategies.
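The two training scenarios and the R² evaluation summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature column names are hypothetical (the abstract does not list the SENAMHI variables), the correlation analysis is assumed to use the Pearson coefficient, and all helper names (`scenario_features`, `pearson`, `r2_score`, `MeanBaseline`, `evaluate`) are hypothetical. `MeanBaseline` is a placeholder regressor that lets the sketch run with only the standard library; with the real packages installed, an `xgboost.XGBRegressor()` or `lightgbm.LGBMRegressor()` instance, which expose the same `fit`/`predict` interface, would be passed to `evaluate` instead.

```python
from math import sqrt
from statistics import mean

# The two particulate-matter targets named in the study.
TARGETS = ("PM2.5", "PM10")


def scenario_features(columns, target, scenario):
    """Return the predictor columns for `target` under scenario 1 or 2.

    Scenario 1: every available variable except the target itself.
    Scenario 2: also drop the other particulate-matter variable
    (PM10 when predicting PM2.5, and PM2.5 when predicting PM10).
    """
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target}")
    if scenario not in (1, 2):
        raise ValueError(f"unknown scenario: {scenario}")
    excluded = {target} if scenario == 1 else set(TARGETS)
    return [c for c in columns if c not in excluded]


def pearson(x, y):
    """Pearson correlation coefficient (assumed method for the correlation analysis)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


class MeanBaseline:
    """Hypothetical placeholder regressor that always predicts the training mean.

    It mimics the fit/predict interface of xgboost.XGBRegressor and
    lightgbm.LGBMRegressor so this sketch runs without those packages.
    """

    def fit(self, X, y):
        self.value_ = mean(y)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit `model` on the training split and return R^2 on the test split."""
    model.fit(X_train, y_train)
    return r2_score(y_test, list(model.predict(X_test)))


if __name__ == "__main__":
    # Hypothetical column names; the abstract does not list the SENAMHI variables.
    columns = ["temperature", "humidity", "wind_speed", "PM2.5", "PM10"]
    print(scenario_features(columns, "PM2.5", 1))  # ['temperature', 'humidity', 'wind_speed', 'PM10']
    print(scenario_features(columns, "PM2.5", 2))  # ['temperature', 'humidity', 'wind_speed']
```

With real data, the column lists returned by `scenario_features` would select the predictors from the SENAMHI table before fitting each model, and the R² values obtained in the two scenarios could then be compared for each target.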

Author Biographies

  • Johan Andrés Oblitas Mantilla, Universidad de Lima, Peru

    Bachelor's degree in Systems Engineering from the Universidad de Lima (top tenth of his class), where he specialized mainly in software engineering. He is currently pursuing a master's degree in Electrical and Computer Engineering (ECE) at the University of Oklahoma, where he previously completed two internships at the Advanced Radar Research Center, applying his knowledge of electromagnetics and remote sensing in collaboration with the Phased Array Antenna Research and Development group (PAARD). His experience includes material characterization, laser cutting, and electromagnetic simulation, as well as supporting the maintenance and development of PAARD's official website. His computing skills include web development; programming in Java, Python, JavaScript, and C++; database management; data analysis; and process management.

  • Edwin Jhonatan Escobedo Cárdenas, Universidad de Lima, Peru

    He holds a PhD and a master's degree in Computer Science from the Universidade Federal de Ouro Preto, Brazil, and a bachelor's degree in Computer Science and Informatics Engineering from the Universidad Nacional de Trujillo. He currently teaches in the Systems Engineering program at the Universidad de Lima and is a researcher registered in RENACYT. His research interests include computer vision, machine learning, and data science.

References

Ameer, S., Shah, M. A., Khan, A., Song, H., Maple, C., Islam, S. U., & Asghar, M. N. (2019). Comparative analysis of machine learning techniques for predicting air quality in smart cities. IEEE Access, 7, 128325–128338. https://doi.org/10.1109/ACCESS.2019.2925082

Amuthadevi, C., Vijayan, D. S., & Ramachandran, V. (2021). Development of air quality monitoring (AQM) models using different machine learning approaches. Journal of Ambient Intelligence and Humanized Computing, 13(1), 33. https://doi.org/10.1007/s12652-020-02724-2

Ayus, I., Natarajan, N., & Gupta, D. (2023). Comparison of machine learning and deep learning techniques for the prediction of air pollution: A case study from China. Asian Journal of Atmospheric Environment, 17, Article 4. https://doi.org/10.1007/s44273-023-00005-w

Bai, Y., Li, Y., Wang, X., Xie, J., & Li, C. (2016). Air pollutants concentrations forecasting using back propagation neural network based on wavelet decomposition with meteorological conditions. Atmospheric Pollution Research, 7(3), 557–566. https://doi.org/10.1016/j.apr.2016.01.004

Cordova, C. H., Portocarrero, M. N. L., Salas, R., Torres, R., Canas, P., & López-Gonzales, J. L. (2021). Air quality assessment and pollution forecasting using artificial neural networks in Metropolitan Lima-Peru. Scientific Reports, 11, Article 24232. https://doi.org/10.1038/s41598-021-03650-9

Gokul, P. R., Mathew, A., Bhosale, A., & Nair, A. T. (2023). Spatio-temporal air quality analysis and PM2.5 prediction over Hyderabad City, India using artificial intelligence techniques. Ecological Informatics, 76, Article 102067. https://doi.org/10.1016/j.ecoinf.2023.102067

Gryech, I., Ghogho, M., Elhammouti, H., Sbihi, N., & Kobbane, A. (2020). Machine learning for air quality prediction using meteorological and traffic related features. Journal of Ambient Intelligence and Smart Environments, 12(5), 379–391. https://doi.org/10.3233/AIS-200572

Liang, Y.-C., Maimury, Y., Chen, A. H.-L., & Juarez, J. R. C. (2020). Machine learning-based prediction of air quality. Applied Sciences, 10(24), Article 9151. https://doi.org/10.3390/app10249151

Liu, X., Zhao, K., Liu, Z., & Wang, L. (2023). PM2.5 concentration prediction based on LightGBM optimized by adaptive multi-strategy enhanced sparrow search algorithm. Atmosphere, 14(11), Article 1612. https://doi.org/10.3390/atmos14111612

Martín-Baos, J. Á., Rodriguez-Benitez, L., García-Ródenas, R., & Liu, J. (2022). IoT based monitoring of air quality and traffic using regression analysis. Applied Soft Computing, 115, Article 108282. https://doi.org/10.1016/j.asoc.2021.108282

Pan, B. (2018). Application of XGBoost algorithm in hourly PM2.5 concentration prediction. IOP Conference Series: Earth and Environmental Science, 113, Article 012127. https://doi.org/10.1088/1755-1315/113/1/012127

Servicio Nacional de Meteorología e Hidrología del Perú. (2024). Monitoreo de la Calidad de Aire, para Lima Metropolitana [Air quality monitoring for Metropolitan Lima]. https://www.senamhi.gob.pe/?p=calidad-del-aire-estacion&e=112194

Shakya, D., Deshpande, V., Goyal, M. K., & Agarwal, M. (2023). PM2.5 air pollution prediction through deep learning using meteorological, vehicular, and emission data: A case study of New Delhi, India. Journal of Cleaner Production, 427, Article 139278. https://doi.org/10.1016/j.jclepro.2023.139278

Sulaimon, I. A., Alaka, H., Olu-Ajayi, R., Ahmad, M., Ajayi, S., & Hye, A. (2022). Effect of traffic data set on various machine-learning algorithms when forecasting air quality. Journal of Engineering, Design and Technology, 22(3), 1030–1056. https://doi.org/10.1108/JEDT-10-2021-0554

Wang, Z., Chen, P., Wang, R., An, Z., & Qiu, L. (2023). Estimation of PM2.5 concentrations with high spatiotemporal resolution in Beijing using the ERA5 dataset and machine learning models. Advances in Space Research, 71(8), 3150–3165. https://doi.org/10.1016/j.asr.2022.12.016

World Health Organization. (2021). WHO global air quality guidelines: Particulate matter (PM2.5 and PM10), ozone, nitrogen dioxide, sulfur dioxide and carbon monoxide. https://apps.who.int/iris/handle/10665/345329

World Health Organization. (2022). Air pollution. https://www.who.int/health-topics/air-pollution#tab=tab_1

Yang, W., Deng, M., Xu, F., & Wang, H. (2018). Prediction of hourly PM2.5 using a spacetime support vector regression model. Atmospheric Environment, 181, 12–19. https://doi.org/10.1016/j.atmosenv.2018.03.015

Zhang, D., & Woo, S. S. (2020). Real time localized air quality monitoring and prediction through mobile and fixed IoT sensing network. IEEE Access, 8, 89584–89594. https://doi.org/10.1109/ACCESS.2020.2993547

Zhang, K., Yang, X., Cao, H., Thé, J., Tan, Z., & Yu, H. (2023). Multi-step forecast of PM2.5 and PM10 concentrations using convolutional neural network integrated with spatial–temporal attention and residual learning. Environment International, 171, Article 107691. https://doi.org/10.1016/j.envint.2022.107691

Published

2024-12-26

Section

Research papers

How to Cite

Prediction of PM2.5 and PM10 Concentrations Using XGBoost and LightGBM Algorithms: A Case Study in Lima, Peru. (2024). Interfases, 020, 185-208. https://doi.org/10.26439/interfases2024.n020.7417

Most read articles by the same author(s)