Decoding Pollution: A Federated Learning-Based Pollution Prediction Study with Health Ramifications Using Causal Inferences
Abstract
1. Introduction
- I. The main novelty of the proposed work is a federated learning framework for pollution detection and analysis. The framework uses reward-based client selection and aggregates client updates locally before applying the FedProx method for global aggregation.
- II. The framework uses the VGG-19 model for image-based pollution prediction, the LIME tool to explain the model's predictions, and causal inference to analyze the impact of the predicted PM2.5 and PM10.
- III. The datasets used for the implementation were collected from ITO, Delhi; Knowledge Park-III, Greater Noida; Oragadam, Tamil Nadu; New town, Faridabad; several locations in Nagaland; and Mumbai. Additional data were collected from Biratnagar, Nepal, along with some images from Beijing. Together, these form a large dataset for predicting AQI, PM2.5, and PM10.
- IV. The federated learning framework attained an accuracy of 92.33%. The causal inference-based impact analysis attained an accuracy of 84% for training and 72% for testing with respect to PM2.5, and 79% for training and 74% for testing with respect to PM10. The proposed federated learning method therefore achieved higher accuracy than the previous state-of-the-art studies it was compared against.
2. Related Work
3. Federated Learning-Based Framework for Health Assessment Using Pollution Detection
3.1. Federated Learning Framework
3.1.1. Client Selection and Aggregation of Clients
Algorithm 1 Client Selection Algorithm
Input: Status of Clients, Reward Status
Output: Selecting the clients for participation
1. Initialize the Model [S = 0, R = 0] // S = Set of Clients, R = Rewards of Clients
2. S = ++; R = ++; // Status information of Clients
3. for and ranges to do [Threshold value for Clients and Rewards]
4. updates (++)
5. updates (++)
6. End for
7. Analysis and setting of the threshold value of participation
8. = ( is the Qualified Clients)
9. for i in ranges to do
10. if ( and > threshold Value) then
11. = marked Q (Q-Qualified Devices)
12. End if
13. End for
14. Qualified Devices for Participation
15. = Best Q Devices
17. = Best (0: f*n)
18. Return to and
19. End
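Several symbols in Algorithm 1 did not survive in the text above, so the following is only a minimal sketch of the idea it describes: clients whose status and reward both exceed a threshold are marked as qualified, and the best f*n of them are kept. The `Client` fields, the threshold values, and the choice to rank qualified clients by reward are assumptions made for illustration, not details confirmed by the algorithm.

```python
from dataclasses import dataclass


@dataclass
class Client:
    client_id: str
    status: float   # hypothetical availability score in [0, 1]
    reward: float   # hypothetical reward accumulated in earlier rounds


def select_clients(clients, status_threshold, reward_threshold, fraction):
    """Return the best `fraction * n` qualified clients (n = all clients)."""
    # Steps 9-13: a client qualifies when both scores exceed their thresholds.
    qualified = [
        c for c in clients
        if c.status > status_threshold and c.reward > reward_threshold
    ]
    # Steps 15-17: keep the best f*n qualified clients.
    # Assumption: "best" means highest reward.
    budget = int(fraction * len(clients))
    ranked = sorted(qualified, key=lambda c: c.reward, reverse=True)
    return ranked[:budget]


clients = [
    Client("c1", status=0.90, reward=0.80),
    Client("c2", status=0.40, reward=0.95),   # fails the status threshold
    Client("c3", status=0.80, reward=0.60),
    Client("c4", status=0.70, reward=0.30),   # fails the reward threshold
    Client("c5", status=0.95, reward=0.70),
]
selected = select_clients(clients, status_threshold=0.5,
                          reward_threshold=0.5, fraction=0.4)
print([c.client_id for c in selected])
```

With five clients and f = 0.4, the budget is two clients, so only the two highest-reward qualified clients are returned.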
3.1.2. Aggregation
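The contributions listed in the Introduction state that client updates are aggregated locally and then globally with FedProx [41]. As a rough illustration of what FedProx changes relative to plain federated averaging, the sketch below adds the proximal term (mu/2)·||w − w_global||² to each client's local objective and then averages the client models weighted by sample count. The linear model, the synthetic data, and the values of `mu`, `lr`, and `steps` are assumptions chosen to keep the example small; the paper's own model is VGG-19, and its local aggregation step is not reproduced here.

```python
import numpy as np


def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.05, steps=50):
    """Gradient descent on MSE(w) + (mu / 2) * ||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = 2.0 * X.T @ (X @ w - y) / len(y)   # gradient of the local MSE
        grad_prox = mu * (w - w_global)                # gradient of the proximal term
        w -= lr * (grad_loss + grad_prox)
    return w


def aggregate(client_weights, client_sizes):
    """Sample-size-weighted average of the client models."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])      # hypothetical ground-truth weights
w_global = np.zeros(2)
datasets = []
for n in (40, 60, 100):             # three hypothetical clients of different sizes
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    datasets.append((X, y))

for _ in range(20):                 # communication rounds
    local_models = [fedprox_local_update(w_global, X, y) for X, y in datasets]
    w_global = aggregate(local_models, [len(y) for _, y in datasets])

print(w_global)
```

The proximal term keeps each local model close to the current global model, which is what makes the method more tolerant of heterogeneous client data than unconstrained local training.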
3.2. Pollution Prediction Model
3.3. Pollution and Impact Analysis Model
- i. The federated learning infrastructure uses client selection together with local and global aggregation models.
- ii. PM2.5, PM10, and AQI are predicted using VGG19, and the predictions are explained with explainable AI (AIX) using LIME.
- iii. The correlation model is created and combined with the impact analysis model.
- iv. The causal inference model is used to analyze the impact of pollution.
- v. Within causal inference, propensity score matching (PSM) is used to map the outcome.
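To make the PSM step more concrete, here is a minimal sketch of propensity score matching on synthetic data. Everything in it is an assumption for illustration: the single confounder `age`, the binary treatment `exposed` (standing in for high PM2.5 exposure), the `symptoms` outcome, and the true effect of 2.0. It fits a logistic propensity model, matches each treated unit to the control with the nearest score, and averages the outcome differences.

```python
import numpy as np


def fit_propensity(X, t, lr=0.1, steps=2000):
    """Logistic regression P(t = 1 | X), fitted by plain gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])     # add an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta -= lr * Xb.T @ (p - t) / len(t)
    return 1.0 / (1.0 + np.exp(-Xb @ beta))


def att_by_matching(scores, t, y):
    """Average treatment effect on the treated via 1-nearest-neighbour matching."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    # For each treated unit, pick the control unit with the closest score.
    gaps = np.abs(scores[treated][:, None] - scores[control][None, :])
    matches = control[gaps.argmin(axis=1)]
    return float(np.mean(y[treated] - y[matches]))


rng = np.random.default_rng(42)
n = 2000
age = rng.normal(size=n)                       # hypothetical standardized confounder
# Hypothetical treatment: older people are exposed more often.
exposed = (rng.random(n) < 1.0 / (1.0 + np.exp(-age))).astype(int)
# Hypothetical outcome with a true exposure effect of 2.0.
symptoms = 2.0 * exposed + 1.5 * age + rng.normal(scale=0.5, size=n)

naive = symptoms[exposed == 1].mean() - symptoms[exposed == 0].mean()
scores = fit_propensity(age[:, None], exposed)
att = att_by_matching(scores, exposed, symptoms)
print(f"naive difference: {naive:.2f}, matched ATT: {att:.2f}")
```

Because exposure depends on age, the naive group difference overstates the effect, while the matched estimate should land much closer to the simulated value.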
Algorithm 2 Federated learning-based pollution prediction and impact analysis with causal inference.
Input: Client Data, Global Model, Factors influencing pollution, Outcome variable using PSM
Output: Predicted Values of PM2.5, PM10 and AQI, Explanation Using LIME, Correlation of Pollution
Algorithm Begin:
{
  Create the FL Infrastructure
  Identify the participants of clients
  updated model = global model // Clients initialized with updated global model
}
// Client selection process and aggregation
{
  if client update reward > previous reward: // Compare client updates to previous rewards
    communication status = True // Start communication for aggregation
  else:
    communication status = False
}
{
  Initialized to aggregation
  if aggregation result == "positive":
    global model = updated model // Update global model
    prediction model = global model // Update prediction model
  then for client in participants:
    client.model = updated model
}
then
{
  Updated model is transferred to the Clients
}
end
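The control flow of Algorithm 2 can be sketched for a single communication round as follows. This is a hedged illustration only: models are plain NumPy vectors rather than VGG-19 weights, aggregation is a simple average, and the meaning of a "positive" aggregation result is not defined in the algorithm, so the sketch assumes it means the aggregated model does not increase a validation loss. The names `run_round` and `evaluate` and all numeric values are hypothetical.

```python
import numpy as np


def run_round(global_model, client_updates, rewards, previous_rewards, evaluate):
    """One communication round; returns the global model and who communicated."""
    # Client selection: only clients whose reward improved start communication.
    communicating = [
        cid for cid in client_updates
        if rewards[cid] > previous_rewards[cid]
    ]
    if not communicating:
        return global_model, communicating

    # Aggregation: plain average of the communicating clients' models.
    candidate = np.mean([client_updates[cid] for cid in communicating], axis=0)

    # Assumption: the result is "positive" when the loss does not get worse.
    if evaluate(candidate) <= evaluate(global_model):
        global_model = candidate      # also serves as the prediction model
    return global_model, communicating


target = np.array([1.0, 1.0])


def evaluate(w):
    """Hypothetical validation loss: squared distance to a known target."""
    return float(np.sum((w - target) ** 2))


global_model = np.zeros(2)
client_updates = {
    "a": np.array([0.8, 1.2]),
    "b": np.array([1.2, 0.8]),
    "c": np.array([-5.0, -5.0]),      # reward dropped, so it is not aggregated
}
rewards = {"a": 0.9, "b": 0.7, "c": 0.1}
previous_rewards = {"a": 0.5, "b": 0.6, "c": 0.4}

global_model, communicating = run_round(
    global_model, client_updates, rewards, previous_rewards, evaluate)
# Broadcast: every participant starts the next round from the global model.
client_models = {cid: global_model.copy() for cid in client_updates}
print(communicating, global_model)
```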
4. Results and Discussion
4.1. Explainability of Model Predictions
4.2. Impact and Health Analysis of PM2.5 and PM10 Model
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Messan, S.; Shahud, A.; Anis, A.; Kalam, R.; Ali, S.; Aslam, M.I. Air-MIT: Air Quality Monitoring Using Internet of Things. Eng. Proc. 2022, 20, 45.
2. Beriwal, S.; John, A. A review of Various Techniques for Forecasting Pollution and Air Quality Indexing. In Proceedings of the 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 11–13 November 2021; pp. 1680–1686.
3. Gilik, A.; Ogrenci, A.S.; Ozmen, A. Air quality prediction using CNN+LSTM-based hybrid deep learning architecture. Environ. Sci. Pollut. Res. 2022, 29, 11920–11938.
4. Xing, Y.-F.; Xu, Y.-H.; Shi, M.-H.; Lian, Y.-X. The impact of PM2.5 on the human respiratory system. J. Thorac. Dis. 2016, 8, E69.
5. Available online: https://www.marlborough.govt.nz/environment/air-quality/smoke-and-smog/health-effects-of-pm10 (accessed on 14 January 2025).
6. Siddique, S.; Ray, M.R.; Lahiri, T. Effects of air pollution on the respiratory health of children: A study in the capital city of India. Air Qual. Atmosphere Health 2011, 4, 95–102.
7. Utomo, S.; John, A.; Rouniyar, A.; Hsu, H.-C.; Hsiung, P.-A. Federated Trustworthy AI Architecture for Smart Cities. In Proceedings of the 2022 IEEE International Smart Cities Conference (ISC2), Pafos, Cyprus, 26–29 September 2022; pp. 1–7.
8. Utomo, S.; John, A.; Pratap, A.; Jiang, Z.-S.; Karthikeyan, P.; Hsiung, P.-A. AIX Implementation in Image-Based PM2.5 Estimation: Toward an AI Model for Better Understanding. In Proceedings of the 2023 15th International Conference on Knowledge and Smart Technology (KST), Phuket, Thailand, 21–24 February 2023; pp. 1–6.
9. McGovern, A.; Ebert-Uphoff, I.; Gagne, D.J.; Bostrom, A. Why we need to focus on developing ethical, responsible, and trustworthy artificial intelligence approaches for environmental science. Environ. Data Sci. 2022, 1, e6.
10. Yan, R.; Liao, J.; Yang, J.; Sun, W.; Nong, M.; Li, F. Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering. Expert Syst. Appl. 2021, 169, 114513.
11. Du, S.; Li, T.; Yang, Y.; Horng, S.J. Deep Air Quality Forecasting Using Hybrid Deep Learning Framework. IEEE Trans. Knowl. Data Eng. 2019, 33, 2412–2424.
12. Zhao, Z.; Qin, J.; He, Z.; Li, H.; Yang, Y.; Zhang, R. Combining forward with recurrent neural networks for hourly air quality prediction in Northwest of China. Environ. Sci. Pollut. Res. 2020, 27, 28931–28948.
13. Samal, K.K.R.; Babu, K.S.; Das, S.K. Time Series Forecasting of Air Pollution using Deep Neural Network with Multi-output Learning. In Proceedings of the 2021 IEEE 18th India Council International Conference (INDICON), Guwahati, India, 19–21 December 2021.
14. Zou, G.; Zhang, B.; Yong, R.; Qin, D.; Zhao, Q. FDN-learning: Urban PM2.5-concentration Spatial Correlation Prediction Model Based on Fusion Deep Neural Network. Big Data Res. 2021, 26, 100269.
15. Zhou, Y.; Chang, F.-J.; Chang, L.-C.; Kao, I.-F.; Wang, Y.-S. Explore a deep learning multi-output neural network for regional multi-step-ahead air quality forecasts. J. Clean. Prod. 2018, 209, 134–145.
16. Beriwal, S.; A, J.; K, S.K. Spatial and Temporal based Pollution Forecasting using Hybrid Model. In Proceedings of the 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 9–11 May 2022; pp. 991–998.
17. Huang, Y.; Ying, J.J.-C.; Tseng, V.S. Spatio-attention embedded recurrent neural network for air quality prediction. Knowledge-Based Syst. 2021, 233, 107416.
18. Zhang, K.; Thé, J.; Xie, G.; Yu, H. Multi-step ahead forecasting of regional air quality using spatial-temporal deep neural networks: A case study of Huaihai Economic Zone. J. Clean. Prod. 2020, 277, 123231.
19. Zhang, Q.; Han, Y.; Li, V.O.K.; Lam, J.C.K. Deep-AIR: A Hybrid CNN-LSTM Framework for Fine-Grained Air Pollution Estimation and Forecast in Metropolitan Cities. IEEE Access 2022, 10, 55818–55841.
20. Wang, Q.; Liu, Y.; Pan, X. Atmosphere pollutants and mortality rate of respiratory diseases in Beijing. Sci. Total Environ. 2008, 391, 143–148.
21. Abe, K.C.; Miraglia, S.G.E.K. Health impact assessment of air pollution in São Paulo, Brazil. Int. J. Environ. Res. Public Health 2016, 13, 694.
22. Olstrup, H. An Air Quality Health Index (AQHI) with Different Health Outcomes Based on the Air Pollution Concentrations in Stockholm during the Period of 2015–2017. Atmosphere 2020, 11, 192.
23. Xu, K.; Cui, K.; Young, L.-H.; Wang, Y.-F.; Hsieh, Y.-K.; Wan, S.; Zhang, J. Air Quality Index, Indicatory Air Pollutants and Impact of COVID-19 Event on the Air Quality near Central China. Aerosol Air Qual. Res. 2020, 20, 1204–1221.
24. Abelsohn, A.; Stieb, D.M. Health effects of outdoor air pollution: Approach to counseling patients using the Air Quality Health Index. Can. Fam. Physician 2011, 57, 881–887.
25. Air Quality, Health Impacts and Burden of Disease Due to Air Pollution (PM10, PM2.5, NO2 and O3): Application of AirQ+ Model to the Camp de Tarragona County (Catalonia, Spain). Available online: https://europepmc.org/article/med/31759725 (accessed on 15 January 2025).
26. Jalili, M.; Ehrampoush, M.H.; Mokhtari, M.; Ebrahimi, A.A.; Mazidi, F.; Abbasi, F.; Karimi, H. Ambient air pollution and cardiovascular disease rate an ANN modeling: Yazd-Central of Iran. Sci. Rep. 2021, 11, 16937.
27. Available online: http://aphekom.org/web/aphekom.org/home (accessed on 15 January 2025).
28. Fei, Z.; Ryeznik, Y.; Sverdlov, A.; Tan, C.W.; Wong, W.K. An overview of healthcare data analytics with applications to the COVID-19 pandemic. IEEE Trans. Big Data 2021, 8, 1463–1480.
29. Li, L.; Fan, Y.; Tse, M.; Lin, K. A review of applications in federated learning. Comput. Ind. Eng. 2020, 149, 106854.
30. Niknam, S.; Dhillon, H.S.; Reed, J.H. Federated learning for wireless communications: Motivation, opportunities, and challenges. IEEE Commun. Mag. 2020, 58, 46–51.
31. Jiang, J.C.; Kantarci, B.; Oktug, S.; Soyata, T. Federated Learning in Smart City Sensing: Challenges and Opportunities. Sensors 2020, 20, 6230.
32. Nguyen, D.-V.; Zettsu, K. Spatially-distributed Federated Learning of Convolutional Recurrent Neural Networks for Air Pollution Prediction. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 3601–3608.
33. Abimannan, S.; A, J.; Shukla, S.; Satheesh, D. Federated Learning for Improved Air Pollution Prediction: A Combined LSTM-SVR Approach. In Proceedings of the 2023 IEEE 4th Annual Flagship India Council International Subsections Conference (INDISCON), Mysore, India, 5–7 August 2023; pp. 1–7.
34. Neo, E.X.; Hasikin, K.; Mokhtar, M.I.; Lai, K.W.; Azizan, M.M.; Razak, S.A.; Hizaddin, H.F. Towards Integrated Air Pollution Monitoring and Health Impact Assessment Using Federated Learning: A Systematic Review. Front. Public Health 2022, 10, 851553.
35. Smuha, N.A. The EU Approach to Ethics Guidelines for Trustworthy Artificial Intelligence. Comput. Law Rev. Int. 2019, 20, 97–106.
36. Liu, H.; Wang, Y.; Fan, W.; Liu, X.; Li, Y.; Jain, S.; Tang, J. Trustworthy AI: A computational perspective. ACM Trans. Intell. Syst. Technol. 2022, 14, 1–59.
37. Ho, C.W.L.; Ali, J.; Caals, K. Ensuring trustworthy use of artificial intelligence and big data analytics in health insurance. Bull. World Health Organ. 2020, 98, 263.
38. Putra, M.A.P.; Karna, N.; Alief, R.N.; Zainudin, A.; Kim, D.-S.; Lee, J.-M.; Sampedro, G.A. PureFed: An Efficient Collaborative and Trustworthy Federated Learning Framework Based on Blockchain Network. IEEE Access 2024, 1, 82413–82426.
39. Lee, W. Reward-based participant selection for improving federated reinforcement learning. ICT Express 2022, 9, 803–808.
40. Rouniyar, A.; Utomo, S.; John, A.; Hsiung, P.A. Air Pollution Image Dataset from India and Nepal. 2023. Available online: https://www.kaggle.com/datasets/adarshrouniyar/air-pollution-image-dataset-from-india-and-nepal (accessed on 15 January 2025).
41. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
42. Ayeelyan, J.; Utomo, S.; Rouniyar, A.; Hsu, H.C.; Hsiung, P.A. Federated learning design and functional models: Survey. Artif. Intell. Rev. 2025, 58, 21.
43. Li, M. Using the propensity score method to estimate causal effects: A review and practical guide. Organ. Res. Methods 2013, 16, 188–226.
44. Liu, C.; Tsow, F.; Zou, Y.; Tao, N. Particle Pollution Estimation Based on Image Analysis. PLoS ONE 2016, 11, e0145955.
45. Bo, Q.; Yang, W.; Rijal, N.; Xie, Y.; Feng, J.; Zhang, J. Particle Pollution Estimation from Images Using Convolutional Neural Network and Weather Features. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 3433–3437.
46. Wang, X.; Wang, M.; Liu, X.; Zhang, X.; Li, R. A PM2.5 concentration estimation method based on multi-feature combination of image patches. Environ. Res. 2022, 211, 113051.
47. Zhang, Q.; Fu, F.; Tian, R. A deep learning and image-based model for air quality estimation. Sci. Total. Environ. 2020, 724, 138178.
48. APCI. Available online: https://data.moenv.gov.tw/en/dataset/detail/aqx_p_488 (accessed on 10 December 2024).
49. Zhang, Q.; Tian, L.; Fu, F.; Wu, H.; Wei, W.; Liu, X. Real-Time and Image-Based AQI Estimation Based on Deep Learning. Adv. Simul. 2022, 5, 2100628.
50. Kow, P.-Y.; Hsia, I.-W.; Chang, L.-C.; Chang, F.-J. Real-time image-based air quality estimation by deep learning neural networks. J. Environ. Manag. 2022, 307, 114560.
S. No. | Pollutants | Impact of Pollutants |
---|---|---|
1 | PM2.5 | Asthma, respiratory inflammation, jeopardized lung function, and lung cancers [4]. |
2 | PM10 | Asthma, bronchitis, high blood pressure, heart attacks, and strokes [5]. |
Participants | Affected by Respiratory Problem |
---|---|
11,628 (7757 Boys and 3871 Girls) School Children | 4536 (2950 Boys and 1586 Girls) [4] |
S. No. | Hyperparameter | Value |
---|---|---|
1 | Image Size | 224 × 224 |
2 | Batch Size | 16 |
3 | Epochs | 20 |
4 | Optimizer | Adam |
5 | Dropout | 0.5 |
6 | Filter Size | 64, 128, 256, 512 |
7 | Global Servers | 1 |
8 | Federated Clients | 5 |
Parameters | Training Prediction | Testing Prediction |
---|---|---|
Epochs | 20 | 20 |
RMSE | 1.86 | 15.06 |
R2 | 0.87 | 0.83 |
F1-Score | 0.67 | 0.87 |
Accuracy | 0.92 | 0.84 |
Inference Time | 0.180 ms | 0.250 ms |
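For readers who want to check figures such as the RMSE and R2 rows above, the sketch below applies the standard definitions of both metrics to a small set of made-up measured and predicted values. The numbers are hypothetical and are not taken from the study's data.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


measured = [100, 150, 200, 250]     # hypothetical PM2.5 readings
predicted = [110, 140, 210, 240]    # hypothetical model outputs
print(rmse(measured, predicted), r_squared(measured, predicted))
```

RMSE is expressed in the units of the predicted quantity, whereas R2 is unitless, so the two should be read together when comparing models.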
Reference Number and Year | R-Square | RMSE |
---|---|---|
Liu et al. (2016) [44] | 0.68 | 40.43 |
Bo et al. (2018) [45] | 0.60 | 56.03 |
Wang et al. (2022) [46] | 0.80 | 33.07 |
Utomo et al. (2022) [8] | 0.83 | 30.10 |
Our Model without FL | 0.84 | 28.10 |
Our Model with FL | 0.85 | 25.02 |
Reference Number and Year | Dataset Details | Accuracy (%) |
---|---|---|
Zhang et al. (2020) [47] | NWNU-AQI [47] | 74.00 |
Zhang et al. (2022) [49] | NWNU-AQI [47] | 75.15 |
Kow et al. (2022) [50] | Linyuan [48] | 76.00 |
Utomo et al. (2022) [8] | Beijing [44] | 76.15 |
Our Model without FL | India and Nepal | 85.15 |
Our Model with FL | India and Nepal | 92.22 |
Model | AQI | PM2.5 | PM10 |
---|---|---|---|
Model Training Accuracy | 167 | 142 | 145 |
Model Prediction without FL | 143 | 110 | 121 |
Model with FL | 150 | 123 | 128 |
Accuracy of Model without FL | 85.6 | 84.01 | 83.12 |
Accuracy of Model with FL | 92.2 | 91.0 | 93.12 |
Location | AQI | PM2.5 | PM10 |
---|---|---|---|
Knowledge Park-III, Greater Noida | 210 | 194 | 223 |
ITO, Delhi | 240 | 170 | 130 |
New Ind town, Faridabad | 230 | 188 | 190 |
Pollutants | Client-1 | Client-2 | Client-3 | Client-4 | Client-5 |
---|---|---|---|---|---|
AQI | 225 | 248 | 240 | 180 | 210 |
PM2.5 | 185 | 248 | 190 | 160 | 190 |
PM10 | 240 | 140 | 198 | 150 | 175 |
Pollutant | Common Impact | Impact on Children | Impact on Elderly |
---|---|---|---|
PM2.5 | Asthma exacerbations, bronchitis, reduced lung function, heart attacks | Asthma attacks, bronchitis, reduced immune efficiency | Heart attacks, arrhythmia, and reduced oxygen supply |
PM10 | Coughing, wheezing, cardiovascular stress | Coughing, wheezing, and asthma attacks | Bronchitis symptoms and cardiovascular stress |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Beriwal, S.; Ayeelyan, J. Decoding Pollution: A Federated Learning-Based Pollution Prediction Study with Health Ramifications Using Causal Inferences. Electronics 2025, 14, 350. https://doi.org/10.3390/electronics14020350