Articles | Volume 9, issue 2
https://doi.org/10.5194/ascmo-9-121-2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.
Forecasting 24 h averaged PM2.5 concentration in the Aburrá Valley using tree-based machine learning models, global forecasts, and satellite information
Jhayron S. Pérez-Carrasquilla
CORRESPONDING AUTHOR
Department of Atmospheric and Oceanic Science, University of Maryland, College Park, USA
Área Metropolitana del Valle de Aburrá, Proyecto Sistema de Alerta Temprana de Medellín y el Valle de Aburrá (SIATA), Medellín, Colombia
Paola A. Montoya
CORRESPONDING AUTHOR
Área Metropolitana del Valle de Aburrá, Proyecto Sistema de Alerta Temprana de Medellín y el Valle de Aburrá (SIATA), Medellín, Colombia
Escuela Ambiental, Facultad de Ingeniería, Universidad de Antioquia, Medellín, Colombia
Juan Manuel Sánchez
Área Metropolitana del Valle de Aburrá, Proyecto Sistema de Alerta Temprana de Medellín y el Valle de Aburrá (SIATA), Medellín, Colombia
Escuela Ambiental, Facultad de Ingeniería, Universidad de Antioquia, Medellín, Colombia
K. Santiago Hernández
Área Metropolitana del Valle de Aburrá, Proyecto Sistema de Alerta Temprana de Medellín y el Valle de Aburrá (SIATA), Medellín, Colombia
Escuela Ambiental, Facultad de Ingeniería, Universidad de Antioquia, Medellín, Colombia
Mauricio Ramírez
Área Metropolitana del Valle de Aburrá, Proyecto Sistema de Alerta Temprana de Medellín y el Valle de Aburrá (SIATA), Medellín, Colombia
Related authors
Anja Katzenberger, Jhayron S. Perez-Carrasquilla, Keighan Gemmell, Evgenia Galytska, Christine Leclerc, P. Punya, Indrani Roy, Arianna Varuolo-Clarke, Milica Tošić, and Nina Črnivec
EGUsphere, https://doi.org/10.5194/egusphere-2025-4744, 2025
This preprint is open for discussion and under review for Earth System Dynamics (ESD).
Short summary
Multi-model ensembles are a central approach in climate model analysis, but their use involves many complex considerations. In this work, we review relevant literature and synthesize existing studies to contribute to the development of guidelines for designing and conducting ensemble analyses. This is complemented by a collection of useful resources and a discussion of emerging trends, supported by statistics tracing the number of publications.
Maria P. Velásquez-García, K. Santiago Hernández, James A. Vergara-Correa, Richard J. Pope, Miriam Gómez-Marín, and Angela M. Rendón
Atmos. Chem. Phys., 24, 11497–11520, https://doi.org/10.5194/acp-24-11497-2024, 2024
Short summary
In the Aburrá Valley, northern South America, local emissions determine air quality conditions. However, we found that external sources, such as regional fires, Saharan dust, and volcanic emissions, increase particulate concentrations and worsen chemical composition by introducing elements like heavy metals. Dry winds and source variability contribute to seasonal influences on these events. This study assesses the air quality risks posed by such events, which can affect broad regions worldwide.
Short summary
This study uses tree-based machine learning (ML) to forecast PM2.5 in a region of complex terrain. The models show the potential to predict pollution events several hours in advance, and they integrate multiple sources of information, including in situ stations, satellite data, and deterministic model outputs. The importance analysis helps explain the processes affecting air quality in the region and highlights the relevance of external sources of pollution to PM2.5 predictability.