Framework for Addressing Imbalanced Data in Aviation with Federated Learning
Abstract
1. Introduction
1.1. Background and Motivation
1.2. Related Works
1.2.1. Federated Learning for Imbalanced Data
1.2.2. Aviation-Specific FL Applications
1.2.3. Risk Assessment and Predictive Maintenance
1.3. Research Gap, Contributions, and Paper Structure
- Development of a federated learning-based approach that addresses data imbalance issues in aviation safety applications, including fault detection, predictive maintenance, and risk assessment.
- Integration of aviation-specific risk assessment metrics into the federated learning process to enhance the detection of rare but critical faults while maintaining high performance in routine operations.
- Implementation of an adaptive weighted aggregation mechanism that considers both data quality and operational significance, ensuring more effective collaboration among aviation stakeholders.
- Validation through extensive experiments and case studies, demonstrating the framework’s ability to improve minority-class detection, optimize maintenance scheduling, and enhance risk assessment accuracy across distributed aviation datasets.
2. Materials and Methods
2.1. Strategies for Addressing Imbalanced Aviation Data
2.2. Imbalanced Data in Aviation for Critical Fault Detection and Predictive Maintenance
2.3. Aviation Technical Support as a Service (ATSaaS) as a Tool for Imbalanced Data Integration
- ATSaaS uses the artificial intelligence of things (AIoT) to process and analyze data from aircraft systems, identifying rare anomalies that traditional systems might overlook. Early detection of critical issues, such as engine overheating or structural fatigue, lets airlines and maintenance, repair, and overhaul (MRO) organizations perform targeted maintenance and reduce the risk of in-flight failures.
- By integrating imbalanced data flows from multiple stakeholders, ATSaaS creates a unified ecosystem where airlines, MROs, manufacturers, and regulatory bodies can collaborate effectively. AIoT ensures seamless communication and real-time updates, while federated learning enables stakeholders to develop shared models without compromising data privacy.
- AIoT-enabled edge computing allows ATSaaS to process data directly on the aircraft, providing immediate alerts for anomalies or faults. This real-time capability is critical for safety and operational efficiency, as it allows airlines to address issues proactively during flight or upon landing.
- The modular architecture of ATSaaS makes it scalable across fleets of different sizes and adaptable to various operational contexts. AIoT ensures that the platform can handle the high volume and velocity of aviation data while remaining cost-effective for stakeholders.
- ATSaaS supports regulatory compliance by integrating safety audit data, incident reports, and operational anomalies. AI algorithms analyze these data to identify systemic risks and recommend corrective actions, helping stakeholders meet stringent safety standards and reduce the likelihood of incidents.
2.4. Framework for Addressing Imbalanced Data in Aviation Using FL
- Data Collection—Gathering data from Airlines, MROs, and ATSaaS platforms;
- Data Preprocessing—Handling imbalanced data, generating synthetic samples, and applying data augmentation;
- Model Training—Implementing Federated Learning (FL), anomaly detection, and predictive models;
- Evaluation and Validation—Measuring model performance, scalability, and accuracy;
- Deployment and Integration—Applying trained models for real-time predictive maintenance;
- Decision-Making and Optimization—Using insights for stakeholder collaboration, safety improvements, and operational decisions.
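The Data Preprocessing stage listed above mentions handling imbalanced data and generating additional samples. As a minimal sketch of one such step, the example below combines random oversampling (with replacement) and random under-sampling so that every class ends up with the same number of examples. It is an assumption-laden illustration rather than the procedure used in the paper: it duplicates existing samples instead of synthesizing new ones, and the function name `hybrid_resample` and the `target` parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict


def hybrid_resample(samples, labels, target, seed=0):
    """Resample so every class ends up with exactly `target` examples.

    Classes with at least `target` examples are randomly under-sampled
    (without replacement); smaller classes keep all their examples and are
    topped up by random oversampling (with replacement).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            chosen = rng.sample(xs, target)  # under-sample the larger class
        else:
            extra = rng.choices(xs, k=target - len(xs))  # oversample the minority
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy data (hypothetical): 95 "normal" readings and 5 "fault" readings.
samples = list(range(100))
labels = ["normal"] * 95 + ["fault"] * 5
bal_x, bal_y = hybrid_resample(samples, labels, target=20)
print(Counter(bal_y))
```

Plain duplication can encourage overfitting to the few minority examples, which is why synthetic generation or augmentation is often preferred in practice.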
2.5. The Role of ML in Fault Detection and Predictive Maintenance
2.6. Federated Learning as a Decentralized Approach to Machine Learning
2.7. Mathematical Framework for FL in Fault Detection, Predictive Maintenance, and Proactive Safety Management with Imbalanced Data
2.7.1. Problem Setup and Notation
2.7.2. Data Balancing/Rebalancing Strategies
- Weighting/cost-sensitive learning. Each class has an associated weight that reflects its prevalence or importance. These weights are incorporated into the local loss function.
- Oversampling. If a class is underrepresented, replicate existing samples or generate synthetic samples from that class.
- Under-sampling. For an overrepresented class, reduce the number of samples used in local training by dropping instances randomly or selectively.
- Hybrid approaches. A combination of oversampling the minority and under-sampling the majority class.
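- If a sample's label matches the class under consideration, the indicator equals 1, meaning that this sample contributes to computations related to that class;
- If the label does not match, the indicator equals 0, meaning that this sample does not contribute.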
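The weighting strategy and the indicator-based class counts described above can be illustrated together. The sketch below assumes inverse-frequency weights of the form n / (K * n_c), where n is the number of local samples, K the number of classes, and n_c the count of class c; this is one common heuristic and is an assumption here, since the exact weight formula is not reproduced in this section. The names `indicator`, `class_weights`, and `weighted_cross_entropy` are hypothetical.

```python
import math


def indicator(label, cls):
    """Return 1 if the sample belongs to `cls`, else 0."""
    return 1 if label == cls else 0


def class_weights(labels, classes):
    """Inverse-frequency weights n / (K * n_c).

    Assumes every class in `classes` appears at least once in `labels`.
    """
    n = len(labels)
    counts = {c: sum(indicator(y, c) for y in labels) for c in classes}
    return {c: n / (len(classes) * counts[c]) for c in classes}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w_y * log p(y) over the samples.

    `probs[i]` maps each class to the predicted probability for sample i.
    """
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)


# A 99% / 1% split, as in a heavily imbalanced fault-detection dataset.
labels = [0] * 99 + [1]
w = class_weights(labels, classes=[0, 1])
print(w)

# With an uninformative predictor, both classes contribute equally to the loss.
uniform = [{0: 0.5, 1: 0.5}] * len(labels)
print(weighted_cross_entropy(uniform, labels, w))
```

With these weights the single fault sample carries as much total loss as the 99 normal samples combined, so a model can no longer minimize the loss by ignoring the minority class.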
2.7.3. Local Model Training (Client-Side)
- The central server sends the current global model parameters to each participating client;
- Each client updates these parameters by minimizing a local loss function over its own local dataset.
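The two client-side steps above can be sketched as follows. This is a minimal illustration that assumes a logistic-regression model trained with plain gradient descent on a class-weighted log loss. The use case in this paper specifies a federated neural network, so the model, the learning rate, the number of epochs, and the function name `local_update` are stand-ins chosen for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(global_params, data, class_weight, lr=0.1, epochs=5):
    """Start from the server's parameters and run gradient descent locally.

    `global_params` is [bias, w_1, ..., w_d]; `data` is a list of
    (features, label) pairs with label in {0, 1}; `class_weight` maps each
    label to its loss weight. Returns new parameters; the input is not mutated.
    """
    params = list(global_params)  # copy of the parameters received from the server
    for _ in range(epochs):
        grad = [0.0] * len(params)
        for x, y in data:
            z = params[0] + sum(w * xi for w, xi in zip(params[1:], x))
            err = class_weight[y] * (sigmoid(z) - y)  # gradient of weighted log loss w.r.t. z
            grad[0] += err
            for j, xi in enumerate(x, start=1):
                grad[j] += err * xi
        params = [p - lr * g / len(data) for p, g in zip(params, grad)]
    return params


# Toy local dataset (hypothetical): one feature, faults (label 1) are rare.
data = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.9], 1)]
start = [0.0, 0.0]
new_params = local_update(start, data, class_weight={0: 1.0, 1: 3.0})
print(new_params)
```

Only `new_params` would be returned to the server; the raw records in `data` stay on the client.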
2.7.4. Federated Aggregation
2.7.5. Evaluation Metric with Imbalance Emphasis
2.7.6. General Framework
3. Results
3.1. General Methodology of the Computational Experiment
3.2. Use Case: FL for Engine Health Monitoring
3.2.1. Scenario
- Participants (Nodes): Three stakeholders (Airline A, Airline B, and an MRO) contribute local data.
- Dataset sizes:
  - Airline A has 10 000 samples (99% normal, 1% faults);
  - Airline B has 8 000 samples (98.5% normal, 1.5% faults);
  - MRO has 5 000 samples (90% normal, 10% faults from maintenance inspections).
- Model type: Federated neural network for classification (fault detection) and regression (prediction of remaining useful life, RUL).
- Feature set: Engine telemetry (temperature, pressure, vibration, fuel efficiency), maintenance history, and operational conditions.
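To make the scenario concrete, the sketch below computes sample-proportional aggregation shares and a weighted parameter average for the three nodes. It assumes standard FedAvg-style weighting by local dataset size, which may differ from the adaptive weighted aggregation proposed in this paper. The sample counts are taken from the per-stakeholder results table (normal plus fault samples), while the parameter vectors are made-up placeholders and the function name `fedavg` is hypothetical.

```python
def fedavg(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes.values())
    dim = len(next(iter(client_params.values())))
    merged = [0.0] * dim
    for name, params in client_params.items():
        share = client_sizes[name] / total
        for j, p in enumerate(params):
            merged[j] += share * p
    return merged


# Normal + fault sample counts per stakeholder.
sizes = {"Airline A": 9900 + 100, "Airline B": 7880 + 120, "MRO": 4500 + 500}
faults = {"Airline A": 100, "Airline B": 120, "MRO": 500}

total = sum(sizes.values())
shares = {k: v / total for k, v in sizes.items()}
pooled_fault_rate = sum(faults.values()) / total

# Placeholder local parameters (hypothetical, for illustration only).
params = {"Airline A": [1.0, 0.0], "Airline B": [0.0, 1.0], "MRO": [1.0, 1.0]}
global_params = fedavg(params, sizes)

print(shares)
print(f"pooled fault rate: {pooled_fault_rate:.2%}")
print(global_params)
```

Under size-proportional weighting the MRO, which holds 500 of the 720 fault examples, receives the smallest share (5 000 of 23 000 samples). This is one motivation for aggregation schemes that also account for data quality or operational significance.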
3.2.2. Fault Detection
3.2.3. Predictive Maintenance and Remaining Useful Life Prediction
3.2.4. Risk Assessment in the Aviation Ecosystem
- High risk with ;
- Moderate risk with ;
- Low risk with .
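The three risk tiers above amount to thresholding a risk score. The sketch below shows that mapping with the two cut-offs supplied by the caller, because the threshold values are not reproduced in this section. The function name `risk_tier`, the choice of inclusive lower bounds, and the example cut-offs are all hypothetical and should not be read as the paper's values.

```python
def risk_tier(score, high_cutoff, moderate_cutoff):
    """Map a risk score to a tier; a score at or above a cut-off falls in that tier."""
    if not moderate_cutoff < high_cutoff:
        raise ValueError("moderate_cutoff must be below high_cutoff")
    if score >= high_cutoff:
        return "High"
    if score >= moderate_cutoff:
        return "Moderate"
    return "Low"


# Illustrative cut-offs only; substitute the thresholds defined for the model.
for s in (0.05, 0.45, 0.92):
    print(s, risk_tier(s, high_cutoff=0.7, moderate_cutoff=0.3))
```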
3.2.5. Comparison with Baseline Methods and Statistical Validation of Results
- Centralized learning model—a conventional centralized deep learning framework, where all training data are aggregated into a single repository and trained on a shared model;
- Traditional machine learning models—standard logistic regression, support vector machines, and random forest classifiers trained on the same dataset without federated learning integration.
- Centralized learning model accuracy CI: [84.5, 85.9];
- Proposed federated learning model accuracy CI: [90.9, 92.7].
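The two intervals above can be compared directly. The sketch below is a simple arithmetic check, not the statistical procedure used in the paper: it reports each interval's midpoint and half-width and whether the intervals are disjoint. The helper names are hypothetical.

```python
def midpoint(ci):
    return (ci[0] + ci[1]) / 2


def half_width(ci):
    return (ci[1] - ci[0]) / 2


def disjoint(a, b):
    """True if the closed intervals a and b do not overlap."""
    return a[1] < b[0] or b[1] < a[0]


centralized = (84.5, 85.9)
federated = (90.9, 92.7)

for name, ci in (("centralized", centralized), ("federated", federated)):
    print(f"{name}: midpoint {midpoint(ci):.1f}, half-width {half_width(ci):.1f}")
print("disjoint:", disjoint(centralized, federated))
print(f"gap between intervals: {federated[0] - centralized[1]:.1f} points")
```

The midpoints, 85.2 and 91.8, match the accuracies reported for the two models in the comparison table, and the intervals are separated by 5.0 percentage points.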
4. Discussion
4.1. Benefits of the Proposed Approach
4.2. Risk Assessment Efficiency Improvement in FL-Based Aviation Safety Models
4.3. Challenges and Limitations of the Study
4.4. Future Research Directions
4.5. Challenges in Deploying Federated Learning in Real-World Aviation Systems
- Computational and Communication Overhead
- Data Heterogeneity and Non-Independent and Identically Distributed (Non-IID) Data
- Security and Privacy Concerns
- Integration with Existing Aviation Infrastructure
5. Conclusions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Stakeholder | Weighted Loss | Normal Samples | Fault Samples | Normal Probability | Fault Probability |
---|---|---|---|---|---|
Airline A | 0.0842 | 9 900 | 100 | 98.0% | 85.0% |
Airline B | 0.0956 | 7 880 | 120 | 97.0% | 88.0% |
MRO | 0.1245 | 4 500 | 500 | 95.0% | 92.0% |
Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | False-Positive Rate (%) |
---|---|---|---|---|---|
Centralized Learning Model | 85.2 | 81.5 | 78.6 | 79.9 | 18.4 |
Logistic Regression | 82.7 | 78.1 | 74.3 | 76.1 | 21.1 |
Support Vector Machine | 84.0 | 80.0 | 75.8 | 77.8 | 19.3 |
Random Forest | 86.5 | 83.0 | 80.2 | 81.5 | 16.8 |
Proposed FL Framework | 91.8 | 88.7 | 84.5 | 86.6 | 12.3 |
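As a consistency check on the table above, F1 can be recomputed as the harmonic mean of the reported precision and recall. Because the inputs are already rounded to one decimal place, small differences from the tabulated F1 values are expected; the sketch below only reports those differences and makes no claim about their cause.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the table.
rows = {
    "Centralized Learning Model": (81.5, 78.6, 79.9),
    "Logistic Regression": (78.1, 74.3, 76.1),
    "Support Vector Machine": (80.0, 75.8, 77.8),
    "Random Forest": (83.0, 80.2, 81.5),
    "Proposed FL Framework": (88.7, 84.5, 86.6),
}

for model, (p, r, reported) in rows.items():
    recomputed = f1_score(p, r)
    print(f"{model}: recomputed {recomputed:.2f}, reported {reported}, "
          f"difference {recomputed - reported:+.2f}")
```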
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kabashkin, I. Framework for Addressing Imbalanced Data in Aviation with Federated Learning. Information 2025, 16, 147. https://doi.org/10.3390/info16020147