Forecasting the Air Quality Index Using Machine Learning Models, Bayesian Optimization, and the Development of the S-GBR Model Incorporating Seasonal Variables

Authors

    Mahtab Mahbodi, MA Student, Artificial Intelligence and Robotics Department, Qa.C., Islamic Azad University, Qazvin, Iran
    Babak Karasfi *, Department of Computer Engineering and Information Technology, Qa.C., Islamic Azad University, Qazvin, Iran (karasfi@qiau.ac.ir)

Keywords:

Air Quality Index forecasting, machine learning, hyperparameter optimization, Bayesian search, seasonal features, S-GBR model

Abstract

Air pollution is one of the most serious environmental and public health challenges facing urban communities, and accurate forecasting of the Air Quality Index (AQI) is crucial for mitigating its impacts and supporting data-driven decision-making. Because the factors that influence air quality are complex and nonlinear, machine learning methods have attracted widespread attention in recent years. A review of previous studies, however, reveals two major shortcomings. First, many models have been run with default hyperparameter values, which reduces accuracy and generalizability. Second, temporal and seasonal components have often been overlooked, even though they play a decisive role in variations in air quality. To address these shortcomings, this study proposes a framework called the Seasonal Gradient Boosting Regressor (S-GBR). In this model, Bayesian optimization is used to tune the hyperparameters, and a seasonal feature is added as an input to the Gradient Boosting Regressor algorithm. Baseline models such as Random Forest and XGBoost were also implemented and compared to determine the standing of the proposed model. The empirical findings show that the proposed model achieved a coefficient of determination (R²) of 0.9686 and significantly reduced errors, performing almost as well as the most accurate baseline model (Random Forest, with 0.9796) while outperforming XGBoost. These results indicate that combining Bayesian optimization with seasonal components can raise prediction accuracy, even under limited data conditions, to a level comparable with that obtained from rich and complex datasets. This highlights the potential of the proposed model for practical air quality monitoring and management.
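The two ingredients the abstract names, a seasonal input feature and the coefficient of determination used for evaluation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the seasonal feature was defined, so the month-to-season mapping (meteorological Northern Hemisphere seasons), the one-hot encoding, the helper names, and the `pm25` column are all assumptions made for illustration.

```python
from datetime import date

# ASSUMPTION: meteorological Northern Hemisphere seasons (Dec-Feb = winter).
# The source does not state how the seasonal feature was defined.
SEASONS = ("winter", "spring", "summer", "autumn")


def season_of(day):
    """Map a calendar date to a season name."""
    # month % 12 turns December into 0, so each season spans three indices.
    return SEASONS[(day.month % 12) // 3]


def add_season_feature(row, day):
    """Return a copy of `row` with one-hot season columns appended.

    `row` is a hypothetical dict of pollutant readings for one observation.
    """
    season = season_of(day)
    encoded = {f"season_{name}": int(name == season) for name in SEASONS}
    return {**row, **encoded}


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observation: the pollutant name and value are made up.
features = add_season_feature({"pm25": 40.0}, date(2024, 10, 3))
```

In a full pipeline, rows augmented this way would be passed to a gradient boosting regressor whose hyperparameters are tuned by Bayesian search, and `r_squared` would be computed on held-out predictions; those steps are omitted here because the abstract gives no detail about the search space or the data split.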

Published

2025-10-15

Submitted

2024-06-01

Revised

2024-08-01

Accepted

2024-08-07

How to Cite

Mahbodi, M., & Karasfi, B. (2025). Forecasting the Air Quality Index Using Machine Learning Models, Bayesian Optimization, and the Development of the S-GBR Model Incorporating Seasonal Variables. Journal of Resource Management and Decision Engineering, 1-16. https://journalrmde.com/index.php/jrmde/article/view/147
