Proceeding Paper

Certification of AI-Based Aviation Systems: A Methodology for Continuous Safety Assurance Across the System Life Cycle †

School of Mechanical, Industrial & Aeronautical Engineering, University of the Witwatersrand, Johannesburg 2050, South Africa
*
Author to whom correspondence should be addressed.
Presented at the 2025 SAIMechE Central Branch Conference on Mechanical Engineering and Related Disciplines, Johannesburg, South Africa, 28 October 2025.
Eng. Proc. 2026, 132(1), 7; https://doi.org/10.3390/engproc2026132007 (registering DOI)
Published: 13 May 2026

Abstract

Artificial Intelligence (AI) is emerging as a transformative enabler in aviation, with applications spanning Guidance, Navigation and Control (GNC), Air Traffic Management (ATM), and predictive maintenance. However, the adoption of AI in safety-critical domains remains constrained by the absence of established certification guidance. Traditional standards such as Aerospace Recommended Practice (ARP) 4754B, ARP4761A, DO-178C, and DO-254 assume deterministic behaviour and verifiable logic, whereas AI exhibits adaptive and non-deterministic characteristics. Regulatory initiatives, including the European Union Artificial Intelligence Act, the European Union Aviation Safety Agency (EASA) AI Roadmap 2.0, the Federal Aviation Administration (FAA) AI Safety Assurance Roadmap, and ISO/IEC Technical Report (TR) 5469:2024, signal progress but remain fragmented, exploratory, and often limited to low-level autonomous use cases. This study adopts a qualitative approach combining literature and standards analysis with expert interviews to identify gaps in post-deployment assurance, data governance, explainability, and accountability. A conceptual life cycle-oriented framework is proposed that embeds AI-specific assurance activities such as dataset validation, iterative verification, drift detection, and retraining oversight into established certification processes. The framework extends classical and emerging verification and validation models into operational service, linking machine learning constituents to system-level safety arguments and regulatory expectations to support the development of trustworthy and certifiable AI-enabled aviation systems.

1. Introduction

1.1. Background

The aerospace industry has long driven technological progress, from the Wright brothers’ pioneering flights and the National Aeronautics and Space Administration’s (NASA’s) space exploration initiatives [1] to contemporary advancements in reusable spacecraft [2], yielding substantial improvements in aviation safety and efficiency [3].
Central to this trajectory was the automation of complex human tasks, a concept formalised as “Artificial Intelligence” in the 1950s [4,5]. Recent times have witnessed accelerated AI development, propelled by exponential growth in computational capacity, innovative algorithms, and significant investments in computing architectures [6].
Today, AI encompasses a broad spectrum of models and applications [7], spurring widespread adoption across industries [8]. In aviation, AI is poised to enhance pilot decision-making, aircraft performance optimisation, ATM, and safety enhancements previously deemed unfeasible [9]. However, its integration into safety-critical domains such as GNC and Air Traffic Control (ATC) introduces profound risks, given the potential for catastrophic system failure [10].
Increasing operational complexity, driven by the proliferation of Unmanned Aerial Vehicles (UAVs) [11], escalating air traffic volumes [12], and rising demand for autonomous systems [13], exerts considerable pressure on conventional ATM infrastructures. These systems, historically effective, remain constrained by dependence on human operators, limited controller availability, and legacy protocols ill-suited to growing modern environments [14].
Amid the Fourth Industrial Revolution (Industry 4.0), AI holds transformative potential for aviation [15]. However, its adoption is impeded by inherent challenges: non-deterministic behaviour, limited explainability, and the absence of standardised certification processes [16]. These issues are compounded by the technical complexity of Machine Learning (ML) systems [17], alongside ethical considerations and security vulnerabilities associated with autonomous functionality [18].

1.2. Research Motivation

Advances in automation and intelligent control have demonstrated remarkable capability in managing complex aerospace operations. SpaceX’s recovery of the Super Heavy Booster 12 using the launch tower’s mechanical arms, known as “Mechazilla” [19,20], illustrates how embedded and automated GNC algorithms can execute safety-critical tasks under dynamic conditions.
The integration of AI into the aerospace sector presents both unprecedented opportunities and significant challenges, particularly concerning safety, reliability and regulatory assurance [21]. These developments signal a broader transition toward adaptive and data-driven systems in aviation. Yet, while such technologies promise efficiency and safety gains, they also challenge established assurance practices designed for deterministic and verifiable systems [10]. The motivation for this study arises from this emerging gap: the need to reconcile the adaptive potential of AI with the rigorous assurance and certification principles that define aviation safety.

1.3. Problem Statement

At present, most certification guidance efforts remain exploratory in nature [6,22]. This has resulted in a fragmented landscape of ad hoc approaches developed independently by regulatory authorities, research institutions, and industry stakeholders. Such fragmentation risks delaying the effective adoption of AI technologies, increasing development cost and regulatory complexity, and reducing confidence in AI-based systems.
To realise the full potential of AI, aviation requires a unified and adaptable methodology that integrates verification, validation, and certification in a holistic and life cycle-oriented manner. Such a methodology should align with existing airworthiness standards while incorporating AI-specific provisions to ensure continued safety, transparency, and trustworthiness throughout system operation.

1.4. Research Question

The critical research question guiding this study is as follows:
How can a life cycle-oriented methodology be developed to enable trustworthy verification and certification of AI-based aviation systems through continuous safety assurance across the system life cycle?

1.5. Research Objectives

To address this question, the investigation pursues the following objectives:
  • Analyse challenges and regulatory gaps in the verification and certification of AI-based aviation systems.
  • Evaluate the suitability of current regulatory guidance, roadmaps, and approaches for the certification and use of modern AI-based systems.
  • Develop a conceptual methodology that supports trustworthy verification and certification of AI-based aviation systems by embedding safety assurance across the system life cycle.
  • Explore the current AI landscape and approach toward AI certification by engaging experts through semi-structured interviews.

2. Literature Study

This section reviews the body of literature relevant to the assurance and certification of AI-based systems in aviation. The objective is to trace how established concepts of design assurance, safety assurance and life cycle management have evolved in response to the introduction of adaptive, data-driven and non-deterministic technologies. The review draws from academic and technical sources to identify how traditional assurance principles are being extended or re-interpreted to address AI-specific challenges such as explainability, data integrity and continuous learning.
Detailed discussion of specific aerospace standards, guidance material, certification processes and current regulatory roadmaps is presented in Section 4.1. These frameworks are addressed as part of the study’s findings to evaluate how well existing certification practices can accommodate AI.

2.1. Design and Safety Assurance

According to Burgess [23], design assurance is a process of disciplined engineering to achieve excellence or standardise good practices across the industry. For technology developers, regulators and the broader aerospace sector, assurance extends beyond the design phase of systems. When applied across the full system life cycle, assurance necessarily encompasses both system safety and safety assurance, also referred to as the safety life cycle.
This broader view of assurance is reinforced in the work of Nouri & Warmuth [24], who compare the structures of IEC 61508 [25] (Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems) and ISO 26262 [26] (Road Vehicles—Functional Safety). These standards form the foundation of functional safety across many domains, and their principles apply directly to aviation systems.
Safety assurance, originally defined by McDermid et al. [27], is the process of establishing “justified confidence or certainty in a system’s capabilities, including its safety”. Habli et al. [28] further extend this, stating that “safety assurance is concerned with demonstrating confidence in a system’s safety”. The demonstration and maintenance of safety confidence is achieved by systematically constructing arguments and gathering supporting evidence throughout system development and testing phases.
This process typically results in a body of evidence that is presented through design assurance compliance, providing supporting proof and formally compiling a safety case for submission to the relevant regulatory authorities [29]. In the context of AI-based systems, the construction and maintenance of safety cases are complicated by their non-deterministic behaviour and adaptive learning characteristics, which challenge traditional verification and assurance methods [30].

2.2. Life Cycle Aspects

Design assurance across the system life cycle is foundational to certification of safety-critical aviation systems. Traditional assurance frameworks, such as those outlined in ARP4754B [31] and ARP4761A [32], employ a structured, life cycle-oriented process in which each development phase is paired with corresponding verification to ensure end-to-end traceability from requirements to safety evidence. This classical V-model remains central to civil aircraft certification, supporting rigorous validation and verification practices throughout design and implementation.
However, AI and ML challenge this as their behaviour depends on training data and statistical generalisation, making requirements less absolute and verification inherently probabilistic [33]. To accommodate this, regulators have begun extending classical assurance structures. EASA introduced the W-shaped learning assurance model, shown in Figure 1, which adds a parallel branch for data preparation, model training, and validation. These activities are not present in conventional system development processes [34]. This model further emphasises that assurance must continue through operational monitoring once deployment occurs. Complementary to this, the Machine Learning Application Approval (MLEAP) project defines tangible artefacts, such as dataset documentation, generalisation metrics, and robustness verification, to strengthen evidence for certification [35].
The FAA similarly recognises the need to adapt life cycle-based assurance. Its roadmap differentiates learned AI (trained prior to deployment) from learning AI (adaptive during operation), assigning the latter additional oversight through runtime monitoring and continued operational safety evaluation [22]. In both cases, classical assurance artefacts (requirements traceability and safety cases) must be supplemented with AI-specific evidence to ensure safety assurance remains valid throughout the system life cycle. This evidence includes the following:
  • Training verification: Confirmation that the model training process has been conducted in accordance with defined objectives and safety constraints, ensuring that training data, configuration parameters, and resulting behaviour are fully documented, acceptably repeatable, and verifiable [21,35].
  • Dataset representativeness: Demonstrates that the training and validation datasets adequately reflect the diversity and variability of the operational environment. This ensures that the model generalises safely across anticipated conditions and mitigates bias or under-representation of critical cases [6,30].
  • Drift monitoring: Establishes continuous in-service surveillance of input data distributions and model outputs to detect deviations from the baseline behaviour established during training. This enables the early identification of data drift or concept drift that could degrade model performance or safety [22,36,37].
  • Operational domain definitions: Formal definition of the environmental and functional boundaries under which the AI system is qualified to operate. This includes specifying the Operational Design Domain (ODD) so that assurance arguments and certification claims remain valid only within the declared and validated context [38].
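Two of the evidence types above, drift monitoring and operational domain definitions, lend themselves to a brief illustration. The following is a minimal sketch only, not a method taken from the cited guidance: it uses the Population Stability Index (PSI) as one possible drift statistic and a simple range check as a stand-in for an ODD. The function names, the ODD parameters (altitude_ft, visibility_km), the sample data, and the 0.2 alert threshold are all assumptions chosen for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def population_stability_index(baseline: Sequence[float],
                               live: Sequence[float],
                               n_bins: int = 10) -> float:
    """PSI between a training-time baseline sample and an in-service sample."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def proportions(sample: Sequence[float]) -> list:
        counts = [0] * n_bins
        for x in sample:
            # Values outside the baseline range are clamped to the edge bins.
            idx = min(max(int((x - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    expected, actual = proportions(baseline), proportions(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def within_odd(inputs: Dict[str, float],
               odd: Dict[str, Tuple[float, float]]) -> bool:
    """True only if every ODD parameter is present and inside its declared range."""
    return all(name in inputs and low <= inputs[name] <= high
               for name, (low, high) in odd.items())


# Hypothetical data: a baseline feature sample and a shifted in-service sample.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
PSI_ALERT = 0.2  # assumed rule-of-thumb threshold, not a regulatory value

drift_alert = population_stability_index(baseline, shifted) > PSI_ALERT
odd = {"altitude_ft": (0.0, 10000.0), "visibility_km": (5.0, 50.0)}
in_domain = within_odd({"altitude_ft": 3500.0, "visibility_km": 2.0}, odd)
```

In this sketch an alert or an out-of-domain input would only raise a flag; what happens next (fallback, operator notification, retraining review) is a matter for the assurance process and is not modelled here.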
In summary, life cycle assurance for AI in aviation is progressing toward hybrid models: the V-model remains the basis, while AI-specific artefacts overlay it to maintain safety assurance from concept to deployment. A common thread emerges: practical guidance for the operational phases remains vague or reliant on developer- and operator-led processes. As detailed later in Section 4.1, these life cycle considerations support ongoing developments in aerospace assurance approaches aimed at formalising AI within established certification ecosystems.

2.3. Certification Landscape

Building on the evolving life cycle assurance concepts outlined in the previous section, the certification landscape reflects a growing convergence between regulatory initiatives and academic research aimed at ensuring the safe integration of AI in aviation. In parallel with industry and authority-led efforts, the research community has been actively developing and evaluating methodologies for AI safety assurance, seeking to establish justified confidence in AI-enabled systems. The following section reviews key academic contributions that inform this study, highlighting their methodologies, strengths, limitations, and the gaps that persist within the broader certification context.

2.3.1. Frameworks and Methodologies for AI Safety Assurance

Silva Neto et al. [30] performed a comprehensive Systematic Literature Review (SLR) to establish the state of the art in safety assurance of AI-based systems across multiple domains. From an initial pool of 5090 references, 329 publications were selected as directly relevant to AI safety assurance. The review organises this body of knowledge into five conceptual approaches applicable to aviation and other safety-critical domains:
  • Black-box testing: Extensive evaluation of AI outputs without insight into internal processes.
  • Safety envelopes: Restricting AI responses to a predefined safe set of outputs.
  • Fail-safe AI design: Incorporating fallback mechanisms to ensure safety during failures.
  • White-box analyses with Explainable AI (XAI): Using transparent models and explainability techniques to clarify AI decision-making.
  • Life cycle safety assurance: Implementing continuous safety processes throughout the AI system’s life cycle.
These approaches capture the diversity of existing methods for testing and constraining AI behaviour, yet they collectively reveal that no unified framework presently exists to ensure acceptable confidence in AI safety throughout the system life cycle.
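As an illustration of how two of these approaches, safety envelopes and fail-safe design, might be combined at runtime, the sketch below wraps a model output in a range check and substitutes a predefined fallback when the output is invalid. It is a simplified assumption-based example, not an implementation from the reviewed literature; the Envelope type, the guarded_command function, and the numeric limits are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Envelope:
    """Hypothetical safe output range with a predefined fallback command."""
    min_cmd: float
    max_cmd: float
    fallback_cmd: float


def guarded_command(model_output: float, env: Envelope) -> Tuple[float, bool]:
    """Return (command, used_fallback).

    The model output is passed through only if it is a valid number inside
    the envelope; otherwise the fallback command is issued instead.
    """
    invalid = math.isnan(model_output) or math.isinf(model_output)
    if invalid or not (env.min_cmd <= model_output <= env.max_cmd):
        return env.fallback_cmd, True
    return model_output, False


# Illustrative limits only (e.g. a pitch command in degrees).
pitch_envelope = Envelope(min_cmd=-10.0, max_cmd=15.0, fallback_cmd=0.0)
```

Whether an out-of-envelope output should be replaced by a fallback, as here, or clamped to the nearest limit is a design decision that would have to be justified in the safety argument.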
The authors identify several technical and methodological challenges that remain unresolved. Dataset adequacy and representativeness are recurring issues. Training data are often incomplete or biased from a safety perspective. The justification for model architectures and hyperparameter choices is seldom documented in a manner acceptable to regulators, and the opacity of complex models complicates both hazard analysis and verification. Although the review synthesises numerous research themes prioritised for future work, including explainability for certification, safety-focused data preparation, and assurance of AI embedded in hardware, it concludes that the lack of an integrated assurance framework persists as the most significant gap. A further limitation lies in the dominance of autonomous driving literature, which may not translate directly to aviation, where verification rigour and regulatory oversight are more stringent.
Building upon this foundation, Schnitzer et al. [21] propose the Landscape of AI Safety Concerns (LAISC), a structured methodology that strengthens assurance cases for AI-based systems by systematically addressing AI-specific insufficiencies. The framework comprises four principal elements:
  • A comprehensive list of AI Safety Concerns (AI-SCs);
  • Metrics and Mitigation Measures (M&Ms);
  • Alignment with the AI life cycle to determine when evidence should be generated;
  • Verifiable Requirements (VRs) that translate abstract concerns into testable claims.
Together, these components enable practitioners to demonstrate the absence of AI-SCs in a structured, evidence-driven way.
The methodology is applied in a case study of a driverless regional train, focusing on three critical AI-SCs: inaccurate data labels, problems with synthetic data (the “reality gap”), and lack of robustness. For each concern, the authors illustrate how to decompose it into specific goals, derive VRs, and apply appropriate metrics and mitigations. This process mirrors traditional functional safety standards such as IEC 61508, thereby supporting transparency and traceability. LAISC’s contribution lies in bridging high-level hazard identification and evidence-based argumentation. However, limitations remain. There are no universal metrics for several concerns; reliance on expert judgement introduces subjectivity in qualitative VRs, and system-level risk quantification is not addressed. Moreover, validation is confined to a single domain, and integration with regulatory artefacts such as certification plans and safety cases is not yet demonstrated. Consequently, while LAISC offers a valuable structure for AI safety reasoning, its scalability and applicability to aviation certification remain unproven.
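The decomposition from safety concern to verifiable requirement can be pictured as a small data structure. The sketch below is an illustrative reading of that idea, not the LAISC authors' tooling: the three concerns are those named in the case study, while the metric names, thresholds, life cycle phases, and evidence values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VerifiableRequirement:
    concern: str            # AI safety concern the requirement addresses
    metric: str             # name of the measured quantity (hypothetical)
    threshold: float        # acceptance limit (hypothetical)
    higher_is_better: bool  # direction of the acceptance test
    lifecycle_phase: str    # phase in which the evidence is generated

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.threshold
        return measured <= self.threshold


def evaluate(reqs: List[VerifiableRequirement],
             evidence: Dict[str, float]) -> Dict[str, str]:
    """Map each metric to 'met', 'not met', or 'no evidence'."""
    results = {}
    for req in reqs:
        if req.metric not in evidence:
            results[req.metric] = "no evidence"
        elif req.is_met(evidence[req.metric]):
            results[req.metric] = "met"
        else:
            results[req.metric] = "not met"
    return results


requirements = [
    VerifiableRequirement("inaccurate data labels", "label_error_rate",
                          0.01, False, "data preparation"),
    VerifiableRequirement("reality gap", "real_vs_synthetic_accuracy_gap",
                          0.05, False, "validation"),
    VerifiableRequirement("lack of robustness", "accuracy_under_perturbation",
                          0.95, True, "verification"),
]

# Hypothetical evidence; the robustness metric has not been measured yet.
status = evaluate(requirements, {"label_error_rate": 0.004,
                                 "real_vs_synthetic_accuracy_gap": 0.08})
```

Reporting "no evidence" separately from "not met" keeps missing measurements visible rather than treating them as passes or failures.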
Collectively, Silva Neto et al. and Schnitzer et al. lay the foundation for structured AI assurance but reveal enduring shortcomings, including limited domain adaptation for aviation, inadequate data governance mechanisms, and weak linkage between AI behavioural evidence and formal certification artefacts. These gaps underscore the need for a domain-tailored methodology capable of aligning AI-specific safety reasoning with established regulatory frameworks.

2.3.2. Trends in AI Adoption for Aviation Safety

Demir et al. [39] conduct a systematic and bibliometric literature review of AI applications in aviation safety, analysing 224 articles published between 2004 and January 2024. Their goal was to understand how AI techniques (especially ML and Deep Learning (DL)) have been used to improve aviation safety, and to identify the main methods, application areas, and research trends in this interdisciplinary field. This work provides a broad perspective on the landscape of “AI in aviation” research.
The review finds that research activity in this area has grown dramatically in recent years, with a sharp increase since 2021. In terms of methods, the vast majority of studies leverage ML or DL algorithms. Common techniques include various supervised learning models (classification and regression for predicting safety-related outcomes), neural networks and DL architectures for pattern recognition (e.g., identifying anomalies or precursors to accidents), time-series models for forecasting (such as predicting component failures or weather impacts), and optimisation approaches. There is also an exploration of human factors and pilot performance, using AI models to evaluate factors such as pilot workload or error likelihood. Strategic application areas identified in the literature include:
  • Accident analysis and prediction;
  • Flight data monitoring for anomaly detection;
  • Air Traffic Management support tools;
  • Pilot training and behaviour modelling;
  • Decision support systems for safety management;
  • Predictive maintenance for aircraft systems.
Through bibliometric mapping, Demir et al. highlight the major contributors and collaborations in this domain. Geographically, China has emerged as the leading contributor in terms of publication volume and has extensive international collaboration networks, followed by the United States and Italy as other major contributors. The analysis also shows how different research clusters have formed, for example, a cluster around predictive safety analysis using big data and ML, another around human factors and AI, and others around maintenance and component reliability.
The strength of Demir et al.’s contribution lies in its comprehensive overview and the clarity with which it presents the evolution of the field. The bibliometric approach provides an evidence-based mapping of what sub-topics are popular and how they are connected, which is valuable for new researchers entering the field. It effectively identifies where AI has made inroads in aviation safety research and who the key players are. For the purposes of this study, the review is useful in highlighting where AI techniques have been applied to aviation safety and, implicitly, where challenges might lie (for example, heavy use of black-box models like DL for critical safety tasks raises questions about how to assure those models).
However, the review by Demir et al. also has some limitations. By relying solely on the Scopus database for article retrieval, there is a chance that some relevant publications were missed, which could bias the findings. Additionally, the strong emphasis on ML and DL means that more traditional or non-AI approaches to aviation safety (or hybrid approaches) were not the focus. The authors do state that this was by design, but it means that the review does not address how those AI methods fare in terms of safety assurance. In other words, while much is known about how AI is being used to improve aviation safety, far less is understood about how the safety of AI itself is being ensured. This highlights the need for research not only on applying AI to address safety challenges, but also on ensuring that AI solutions are certifiable for use in aviation.
Overall, Demir et al. provide valuable context and confirm that interest in AI for aviation safety is booming. The trends and clusters they outline help validate that topics like anomaly detection, predictive maintenance, and human–AI interaction are seen as important. These are all areas where robust assurance will be needed if AI tools are to be trusted. Their work underscores the importance of developing assurance methodologies that can keep up with the rapid deployment of AI in various safety-critical aviation applications.

2.3.3. Emerging Engineering Models and Life Cycle Integration

After the initial literature review was completed, a relevant 2025 publication by Christensen et al. [38], titled “Formulating an Engineering Framework for Future AI Certification in Aviation”, was identified. It is included here as a recent contribution that complements the discussion of AI development processes. Christensen et al. propose an extended W-shaped process that explicitly integrates Development Operations (DevOps) principles into the life cycle.
In their framework, the EASA W-model is combined with Agile/DevOps practices (such as continuous integration and continuous deployment of AI models) to form a more iterative “AI engineering” process. The authors note that while the W-shaped process provides a rigorous structure aligned with certification needs, introducing DevOps elements can help manage the “never-ending iterative” nature of machine learning and enable continuous improvement of the model even after deployment.
In practical terms, their DevOps-integrated W-model allows for re-evaluation of the AI system using operational data and rapid update cycles: “each iteration” can improve the system’s capabilities based on in-service feedback. This reflects a trend toward Machine Learning Operations (MLOps) in safety-critical domains, where updates and learning do not stop at certification but continue through the system’s operational life [40].
The proposal is an innovative way of handling the evolving, learning nature of AI, but the authors also acknowledge its challenges. They note that continuously updating an on-aircraft AI system conflicts with the traditionally “fixed requirements” mindset of certification and with the fact that hardware in aviation cannot practically be updated as frequently as software [38]. In contrast, the present study emphasises the certification pathway: ensuring that every development iteration produces the artefacts needed to satisfy regulatory requirements within the system development life cycle. Christensen et al.’s DevOps-centric W-model informs this perspective by highlighting the value of operational feedback and continuous improvement. Monitoring and potential post-certification updates will need to form part of the long-term vision of any framework.
In summary, Christensen et al. add to the literature by blending DevOps with the W-shaped development model, indicating an industry push towards more DevOps/MLOps in aviation AI engineering.
This reference is included because it reflects evolving best practice, reinforces the relevance of the W-model approach, and contrasts DevOps-centric with certification-centric approaches, thereby clarifying where the proposed framework sits within the broader landscape of AI certification in aviation.

2.4. Synthesis of Literature

The reviewed literature shows that while traditional assurance frameworks remain foundational, they are not sufficient for certifying adaptive and data-driven systems. Design assurance provides a structured basis for establishing confidence in system safety, but the introduction of Artificial Intelligence has challenged these foundations through probabilistic behaviour, limited explainability, and dependence on dynamic data environments.
Silva Neto et al. [30] identify five principal approaches to AI safety assurance but note that these remain fragmented and lack integration with certification processes. Schnitzer et al. [21] propose the Landscape of AI Safety Concerns methodology, which offers a structure for addressing AI-specific risks, although its validation and regulatory integration are limited. Demir et al. [39] map the rapid growth of AI adoption in aviation, showing a strong focus on safety applications rather than certification readiness. Christensen et al. [38] introduce a life cycle model combining DevOps and W-model principles, highlighting the tension between continuous learning and fixed certification baselines.
Collectively, these studies provide conceptual advances but no unified framework for assuring AI systems within aviation certification constraints. Persistent gaps remain in linking AI evidence to formal assurance artefacts, managing learning and drift during operation, and integrating AI-specific verification across the system life cycle. Section 4.1 examines how existing aerospace standards and guidance frameworks respond to these challenges.

3. Method and Materials

3.1. Research Design

This study adopts a qualitative exploratory research design using a multi-method approach, including a combination of the following:
  • Study of recent aerospace developments, concept papers, roadmaps and reviews.
  • Content/document analysis of technical standards, policy documents, and academic publications to identify integration patterns, barriers, and regulatory frameworks.
  • Semi-structured expert interviews with professionals in aerospace engineering and AI research to comment on the current landscape and identify practical constraints and opportunities.

3.2. Data Collection and Analysis Procedures

Data were obtained from regulatory documents, standards, academic literature, and expert interviews. Literature sources were identified through targeted searches of peer-reviewed journals, conference proceedings, and standards bodies, with inclusion based on relevance to AI assurance, aerospace certification, and system life cycle processes. Included sources were required to be either peer-reviewed publications or authoritative guidance from recognised regulators/standards bodies, and to provide sufficient technical detail to inform assurance activities.
Standards and guidance were obtained from regulator and standard body repositories, and academic literature was retrieved via academic indexing services and publisher libraries; references were managed using standard citation management tools.
All collected data were thematically coded to identify recurring issues such as assurance strategies, traceability, explainability, verification, and regulatory gaps. A constant comparative method was applied to iteratively refine themes across all data sources, ensuring consistency between documentary evidence and expert perspectives. The resulting themes formed the basis for the study’s findings and proposed framework.

3.3. Expert Interviews

The interviews were designed to extend the preliminary themes identified in this study. Six (6) semi-structured interviews were conducted.

3.3.1. Participant Selection Criteria

Participants were identified through professional networks and selected using purposive sampling based on the following:
  • Direct involvement in aerospace system development, certification, safety assurance, or AI research;
  • Demonstrated familiarity with relevant assurance/certification frameworks (e.g., ARP4754, ARP4761, DO-178C, DO-254, or equivalent);
  • Ability to provide perspectives from either industry practice or academic research, ensuring representation from both industry and academia.

3.3.2. Participant Profile

The six interview participants represented the following professional roles:
  • Senior avionics systems engineer involved in military aircraft development programmes.
  • Certification and compliance specialist supporting DO-178C and DO-254 projects.
  • System safety practitioner with experience applying ARP4754/ARP4761 processes.
  • Aerospace project manager responsible for validation and verification management.
  • Academic researcher specialising in AI applications for aerospace systems.
  • Machine learning engineer with experience in safety-critical software environments.
Participants had between approximately 8 and 25 years of professional experience in aerospace engineering, certification, or applied AI domains.

3.3.3. Interview Conduct and Recording

Interviews followed a semi-structured format and were conducted using a discussion guide. Responses were captured through contemporaneous notes and/or transcription from recordings where available. All interview data were anonymised prior to analysis to protect participant confidentiality, consistent with the approved ethics protocol.
The guide below was used to structure discussions while allowing for participant-driven insights:
  • Regulatory frameworks and gaps: How do current standards and guidance documents address (or fail to address) AI integration?
  • Life cycle assurance: What challenges exist in assuring safety across the full life cycle of AI systems?
  • Evidence and credibility: What forms of data, testing, or validation are seen as sufficient for certification?
  • Practical implementation: What barriers (technical, organisational, and/or cultural) affect the adoption of assurance processes?
  • Future outlook: What changes in certification practice are anticipated or needed?

4. Results and Discussion

The analysis of aerospace standards, regulatory guidance and academic literature revealed several themes that will shape the future approaches to certification of AI in aviation.

4.1. Existing Standards, Guidance and Frameworks for Certification

The certification of civil aircraft and systems traditionally begins with a Type Certification (TC) application [41] submitted to the relevant regulatory authority. The development process is structured around a functional breakdown and safety assessment of the aircraft and its subsystems. This assessment is governed by ARP4761A/ED-135 (Guidelines for Conducting the Safety Assessment Process on Civil Aircraft, Systems, and Equipment) [32,42].
Once safety requirements are defined, each system function is assessed for potential hazards. Based on the severity of these hazards, Design Assurance Levels (DALs) are assigned, linking functional criticality with the level of rigour required in development and verification. Figure 2 illustrates this allocation and its relationship to the broader system development process, as described by ARP4754B/ED-79 (Guidelines for Development of Civil Aircraft and Systems) [31].
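The severity-to-DAL allocation described above can be sketched in a few lines. This is a minimal illustration only: it uses the commonly cited top-level mapping from failure condition severity to DAL A to E, and it deliberately ignores the architectural considerations (such as independence between functions) that the full allocation process takes into account. The names `Severity`, `SEVERITY_TO_DAL`, and `assign_dal` are hypothetical and are not taken from any standard.

```python
from enum import Enum


class Severity(Enum):
    # Declared from most severe to least severe; the order is used below.
    CATASTROPHIC = "Catastrophic"
    HAZARDOUS = "Hazardous"
    MAJOR = "Major"
    MINOR = "Minor"
    NO_SAFETY_EFFECT = "No Safety Effect"


# Commonly cited top-level mapping (illustrative, no architectural credit).
SEVERITY_TO_DAL = {
    Severity.CATASTROPHIC: "A",
    Severity.HAZARDOUS: "B",
    Severity.MAJOR: "C",
    Severity.MINOR: "D",
    Severity.NO_SAFETY_EFFECT: "E",
}


def assign_dal(failure_conditions):
    """Return the DAL driven by the most severe failure condition of a function."""
    if not failure_conditions:
        raise ValueError("at least one failure condition is required")
    order = list(Severity)  # definition order: most severe first
    worst = min(failure_conditions, key=order.index)
    return SEVERITY_TO_DAL[worst]


# Example: a function with one Minor and one Hazardous failure condition.
print(assign_dal([Severity.MINOR, Severity.HAZARDOUS]))
```

The design choice here is that the most severe failure condition alone drives the level, which reflects the link between functional criticality and development rigour described in the text.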
Subsequent hardware and software items are developed in accordance with guidance by Radio Technical Commission for Aeronautics (RTCA) DO-254/ED-80 (Design Assurance Guidance for Airborne Electronic Hardware) [43] and RTCA DO-178C/ED-12C (Software Considerations in Airborne Systems and Equipment Certification) [44]. Supporting guidance, also issued by RTCA, covers aspects such as environmental conditions [45] (DO-160), tool qualification [46] (DO-330), model-based development [47] (DO-331), object-oriented design [48] (DO-332), and formal methods [49] (DO-333), among others.
These standards prescribe a rigorous development process built on a V-model life cycle, where each development phase is paired with corresponding verification to ensure traceability and safety assurance. Figure 3, extracted from ARP4754A, depicts this alignment. Notably, ARP4754 is formally recognised by both EASA and FAA as an Acceptable Means of Compliance (AMC) for civil aircraft and systems development [50,51]. ARP4754 and its companion standards form the backbone of contemporary certification, providing the baseline against which additional sectoral and domain-specific frameworks are integrated.
Parallel domains, particularly airworthiness and functional safety, provide additional context to the certification landscape. Within defence aviation, the U.S. Department of Defense (DoD) issues handbooks and standards such as MIL-HDBK-516C [53], which establishes criteria for all manned and unmanned fixed and rotary-wing aircraft systems, and MIL-STD-882E [54], which establishes a structured process for system safety assessment and risk management. In parallel, cross-sector functional safety frameworks, including IEC 61508 [25] for industrial systems and ISO 26262 [26] for automotive applications, also adopt life cycle-oriented approaches. These frameworks, although originating in different sectors, have influenced modern aviation practices by reinforcing the importance of continuous assurance, hazard analysis, and traceability across the system life cycle.
However, these traditional and industry-specific (aviation) standards inherently assume deterministic systems with transparent, human-understandable logic, verified through established methods such as exhaustive testing and formal methods. AI systems, particularly those based on ML, deviate from these assumptions, introducing non-deterministic and “black-box” behaviour that challenges conventional approaches [17,55]. AI’s unpredictability complicates exhaustive testing and traditional verification, both critical for aviation certification. In light of this challenge, the Society of Automotive Engineers (SAE), in collaboration with the European Organisation for Civil Aviation Equipment (EUROCAE), is developing a new standard, ARP6983/ED-324 [56] (Process Standard for Development and Certification Approval of Aeronautical Safety-Related Products Implementing AI). It specifically targets AI-enabled systems and is intended to bridge gaps between system-level certification and AI-based item development, also known as the development of ML constituents [57].
Figure 4 illustrates the envisaged role of ARP6983 and the intended system integrator, typically airframers and suppliers. Because ARP6983 is still under development, it is not currently considered a normative reference. Nonetheless, some high-level principles may inform the direction of current works or certification methodologies.
More recently, international standardisation efforts have sought to establish functional guidance. Notably, the technical report ISO/IEC TR 5469:2024, Artificial Intelligence—Functional Safety and AI Systems, provides recommendations for integrating functional safety principles into AI system design [58]. Although influential for internal industry practices, its adoption and formalisation by aviation certification authorities remain at an early stage.
This reinforces the need to develop robust assurance methods tailored to AI. Building on both academic work and industry experience, aviation regulators such as EASA and FAA are actively developing concept papers, guidance circulars, and research programmes to formalise AI safety assurance.

4.1.1. Regulatory Direction and Requirements

From a regulatory standpoint, both the EU and the US have published legislation and directives to this effect. The European Union’s AI Act [59], adopted in 2024, classifies many applications of AI as “high-risk”, imposing requirements on transparency and prohibiting select practices, while also outlining human oversight and governance [59,60].
In the United States, Executive Order 14110 on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence [61] establishes government-wide principles for AI safety and accountability. Although broader in scope and non-sector-specific, it directs federal agencies, including the FAA, to incorporate these principles into oversight.

4.1.2. EASA AI Roadmap and Related Initiatives

EASA has been instrumental in the establishment of early regulatory concepts and efforts to prepare aviation for the integration of AI. Its AI Roadmap 2.0 [6] expands upon the initial 2020 edition, positioning trustworthiness as the central principle for AI adoption in safety-critical domains. The roadmap defines the following building blocks for trustworthy AI:
  • Trustworthiness analysis: Systematic evaluation of AI systems against principles such as safety, robustness, security, and accountability.
  • Assurance concepts: Development of methods, artefacts, and compliance evidence to demonstrate conformity with certification requirements.
  • Human factors: Addressing human-AI interaction, usability, and operational explainability to ensure safe and effective teaming.
  • Safety risk mitigation: Identification, assessment, and reduction of AI-specific hazards to maintain an acceptable level of safety.
Extended life cycle processes, including data governance, continuous monitoring, and retraining provisions, are emphasised throughout the document to ensure safety assurance is maintained during operation.
Importantly, the roadmap is closely aligned with broader European policy instruments, most notably the EU AI Act, ensuring regulatory traceability and harmonisation with risk-based classifications or prohibition measures. EASA has also issued concept papers providing early guidance for machine learning applications (Levels 1 and 2), which already inform certification projects and anticipate the transition to more advanced automation [34,36].
Complementing regulatory development, EASA actively invests in research and stakeholder collaboration. Initiatives such as the Horizon Europe MLEAP project provide a testing ground for AMC [35], while Innovation Partnership Contracts (IPCs) and Memoranda of Cooperation (MoCs) enable early engagement with industry on AI applications.
In parallel, EASA co-chairs a joint initiative between EUROCAE Working Group 114 (WG-114) and the SAE G-34 Artificial Intelligence in Aviation Committee [62]. This group, comprising over 600 participants, includes researchers, SMEs, and representatives from global regulators and authorities such as EASA, Transport Canada Civil Aviation (TCCA), Brazil’s National Civil Aviation Agency (ANAC), NASA, the U.S. DoD, and EUROCONTROL. The collaboration also involves airframe, aircraft, and UAV manufacturers, technology providers, and other stakeholders. The joint WG-114/SAE G-34 group aims to develop common standards, guidance materials, and related documents to support the certification and approval of aeronautical safety-related products incorporating AI technologies.
Figure 5 shows the planned and in-progress deliverables roadmap from EASA and WG-114/SAE G-34, indicating a promising direction towards future clarity in AI integration within aviation systems [6].
These EASA initiatives and the AI Roadmap illustrate a multi-layered approach that balances regulatory rigour with innovation support. Together, they provide a structured foundation for managing the risks and opportunities of AI adoption while promoting international harmonisation.

4.1.3. FAA AI Roadmap and Approach

In July 2024, the FAA published its first Roadmap for Artificial Intelligence Safety Assurance [22]. The roadmap provides a principle-driven strategy to assure AI safety in aviation, aligning with US Executive Order 14110 on trustworthy AI [61]. Seven guiding principles are highlighted by the roadmap [22]:
  • Work within the aviation ecosystem;
  • Focus on safety assurance and safety enhancements;
  • Avoid the personification of AI;
  • Differentiate between learned AI (static or offline trained) and learning AI (dynamic or adaptive);
  • Take an incremental approach;
  • Leverage the safety continuum (proportional assurance relative to function criticality);
  • Adopt industry consensus standards where feasible.
The FAA emphasises an incremental and scalable approach, encouraging industry-led projects with lower DAL functions to build “certification readiness” and experience. Figure 6 shows certification readiness development along with proposed DAL use cases.
A major distinction is made between learned and learning AI implementations, categorised according to whether their training set remains static or changes dynamically during operation. In practice, learned AI refers to training activities completed prior to deployment and use, whereas learning AI relies on a continuous learning mechanism. For the latter, it will not be possible to qualify each successive generation of the model; instead, the focus is to be placed on the qualification of the learning mechanism itself.
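The distinction can be made concrete with a toy example. The sketch below is an illustration under stated assumptions and not an FAA definition: a one-parameter linear model stands in for an ML constituent, and the class names `LearnedModel` and `LearningModel`, along with the simple gradient-style update rule, are hypothetical.

```python
class LearnedModel:
    """Parameters are fixed before deployment; in service the model only infers."""

    def __init__(self, weight, bias):
        self._weight = weight
        self._bias = bias

    def predict(self, x):
        return self._weight * x + self._bias


class LearningModel(LearnedModel):
    """Parameters keep changing in service through an update mechanism."""

    def __init__(self, weight, bias, learning_rate):
        super().__init__(weight, bias)
        self.learning_rate = learning_rate
        self.generation = 0  # counts how many times the model has changed

    def update(self, x, target):
        # One illustrative gradient step on squared error (an assumption).
        error = self.predict(x) - target
        self._weight -= self.learning_rate * error * x
        self._bias -= self.learning_rate * error
        self.generation += 1


learned = LearnedModel(weight=2.0, bias=0.0)
learning = LearningModel(weight=2.0, bias=0.0, learning_rate=0.1)
learning.update(x=1.0, target=3.0)

print(learned.predict(1.0))    # behaviour identical to what was verified
print(learning.predict(1.0))   # behaviour has moved after one update
print(learning.generation)
```

Every call to `update` produces a new model generation, which is why the text places the qualification emphasis on the `update` mechanism and not on any single set of parameters.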
The roadmap further identifies various actions, including [22]:
  • Industry collaboration and sharing of knowledge;
  • Workforce training and readiness;
  • Enhancing assurance of AI safety;
  • Inclusion of AI in the safety life cycle by means of monitoring and prediction of anomalous events;
  • Research and development of new assurance methods, such as exploring Overarching Properties (OPs) (intent, correctness, and innocuity) [63].
Planned milestones include the publishing of policy statements and Certification Position Papers (CPPs), as well as eventual Advisory Circulars (ACs) and consensus standards.
The FAA roadmap complements EASA initiatives: where EASA aims to codify life cycle assurance through structured artefacts (MLEAP tasks and W-model extensions), FAA focuses on embedding AI within the existing certification ecosystem, scaling up assurance with risk, and avoiding premature over-regulation. Both approaches highlight the importance of international harmonisation through G-34/WG-114 and standards development.

4.2. Thematic Analysis of Interviews

Six anonymised semi-structured interviews were thematically analysed to identify recurring issues in the certification of AI-based aviation systems. Thematic coding followed the five guiding prompts mentioned earlier. Strong convergence was observed across participants, with universal agreement on regulatory lag, data assurance, explainability, and continuous monitoring needs (see Figure 7).
  • Regulatory frameworks and gaps: All participants noted that existing standards assume deterministic behaviour and offer no specific guidance for adaptive AI systems. EASA and FAA roadmaps were viewed as necessary but remain conceptual. Several suggested interim, risk-based approaches to enable learning within certification practice.
  • Life cycle assurance: Interviewees emphasised the absence of post-deployment oversight and the need for continuous verification or re-certification of learning models. Continuous monitoring was viewed as essential to maintain safety and trust. Participant F proposed an ongoing “assurance auditor” concept, while Participant C advocated starting with learned (not learning) systems to build confidence progressively.
  • Evidence and credibility: Traditional artefacts were seen as insufficient; participants called for traceable datasets, transparent training records, and validation against real-world operational conditions. Participant D emphasised that “real-world representative data is needed, not just simulated.” Explainability was consistently highlighted as foundational for confidence and regulatory acceptance.
  • Practical implementation: Barriers identified included a lack of verification tools, limited interdisciplinary expertise, and cultural resistance within conservative engineering environments. Participant D observed a generational divide in AI acceptance. Participants stressed that human accountability must remain central and that AI cannot certify itself.
  • Future outlook: Experts anticipated a gradual integration of AI into certification processes, starting with low-criticality applications and post-production phases. Participant A described a “small victory approach”, systematically deploying AI in manageable contexts before expanding to safety-critical functions. Collaboration between regulators, researchers, and industry, alongside iterative “learning” frameworks, was seen as a key enabler toward trustworthy AI assurance.

4.3. Life Cycle-Oriented Assurance

A dominant theme across regulatory and research sources is the recognition that AI assurance must be continuous and life cycle-based. Both EASA and FAA emphasise that safety assurance for AI must extend from the earliest concept stage through to operational service [64]. The U.S. Department of Transportation’s AI assurance whitepaper reinforces this view, noting that “AI requirements, data, and safety assessment” must be addressed “throughout the AI development, deployment, and operation phases” [65]. Similarly, the FAA AI Roadmap identifies AI monitoring as part of the aircraft Continued Operational Safety (COS) programme [22,37].
These sources and the aviation sector’s long-standing safety record indicate that certification cannot end at deployment. Instead, compliance must be continuously verified during operation. For learning systems, the FAA further notes that the learning mechanism itself requires compliance monitoring [22].

4.4. Iterative and Continuous Learning

While existing frameworks acknowledge the need for iteration, they often lack full continuity. For example, the EASA W-shaped learning assurance cycle extends the traditional V-model by explicitly including dataset preparation and model training steps alongside design and verification activities. This acknowledges that AI development is not linear, but rather involves repeated cycles of data collection, training, and evaluation. However, these frameworks typically conclude at deployment and do not prescribe structured mechanisms for continuous post-deployment monitoring, retraining, and re-certification. In practice, this means that without explicit guidance, retraining and re-deployment could occur without sufficient oversight.

4.5. Traceability and Evidence

Across both regulatory guidance and industrial practice, traceability and verifiable evidence are universally highlighted as essential. Standards such as ARP4754B/ED-79B [31], DO-178C/ED-12C [44], and DO-254/ED-80 [43] require bidirectional links between requirements and implementation.
In the AI context, this extends to data: training datasets must be traced to requirements, and test results must be reproducible and recorded. EASA’s “Learning Assurance” concept [38] focuses on the safety case as the integrating artefact to provide structure and traceable evidence to justify confidence in AI behaviour. Regulators anticipate that any workable AI life cycle framework will require such verifiable artefacts for every phase and iteration, including data and model evolution, and validation outcomes.
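A bidirectional traceability check of this kind can be sketched as follows. The example is a minimal illustration and does not represent any tool prescribed by the standards: requirement and artefact identifiers such as `REQ-1` or `dataset-v1` are hypothetical, and artefacts stand in for datasets, test reports, or model records.

```python
def find_traceability_gaps(requirements, artefacts):
    """Check traceability in both directions.

    requirements: iterable of requirement IDs.
    artefacts: dict mapping an artefact ID to the set of requirement IDs it claims to trace to.
    Returns (untraced_requirements, orphan_artefacts), both sorted.
    """
    requirement_ids = set(requirements)
    covered = set()
    orphans = []
    for artefact_id, traced in artefacts.items():
        valid = set(traced) & requirement_ids  # ignore links to unknown requirements
        if not valid:
            orphans.append(artefact_id)        # artefact justifies no known requirement
        covered |= valid
    untraced = sorted(requirement_ids - covered)  # requirements with no evidence
    return untraced, sorted(orphans)


requirements = ["REQ-1", "REQ-2", "REQ-3"]
artefacts = {
    "dataset-v1": {"REQ-1"},
    "test-report-7": {"REQ-1", "REQ-2"},
    "model-card": set(),
}
print(find_traceability_gaps(requirements, artefacts))
```

Running the check on every data or model iteration is one way to keep the evidence set aligned with the requirements baseline as both evolve.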

4.6. Accountability and Oversight

Accountability emerged as a core theme. Regulators repeatedly point out that responsibility for the assurance of AI systems rests with developers, operators, and certifying organisations. The FAA explicitly warns against the “personification” of AI, stressing that all parties collectively remain accountable and that AI remains a tool under human oversight [22].
This has direct implications for future frameworks as they must specify who certifies data adequacy, who validates the model, and who monitors operations. Such explicit declarations ensure not only regulatory compliance but also organisational readiness for AI adoption in safety-critical environments.

4.7. Practical Barriers to Implementation

Beyond theoretical and policy concerns, the research identified numerous practical barriers hindering the implementation of AI in aviation. These challenges include technical, organisational, and cultural domains, reflecting that introducing AI is as much a human and institutional challenge as a technological one.

4.7.1. Data Limitations

High-quality and representative data remain a bottleneck. Obtaining sufficiently large, diverse, and validated datasets for safety-critical use cases is difficult, as hazardous or rare operational scenarios seldom occur. While synthetic data offers a partial solution, it introduces the “reality gap” problem, where simulated data may not reflect true operational variability. Without robust data, both model training and testing are limited, undermining confidence in AI behaviour under edge conditions.

4.7.2. Verification Tools and Techniques

Traditional verification and validation methods struggle to address AI’s non-deterministic and opaque characteristics. Emerging tools for adversarial testing and XAI remain immature and disconnected from trusted aerospace toolchains. Consequently, engineers lack standardised means to validate AI performance and safety bounds, making assurance labour-intensive and uncertain.

4.7.3. Skill and Knowledge Gaps

AI assurance demands interdisciplinary expertise spanning data science, software engineering, and system safety, skills that rarely co-exist within relatively small teams. The findings reveal a shortage of professionals capable of bridging all these domains. Regulators and manufacturers alike reported challenges in evaluating or implementing AI designs. This expertise gap highlights the need for dedicated training and collaborative research programmes for certifying AI systems.

4.7.4. Organisational and Cultural Resistance

Aviation’s inherent but warranted safety culture often translates into institutional conservatism. Many organisations remain reluctant to integrate AI due to fears of opaque black-box decision-making, public mistrust, or regulatory uncertainty. Interview participants noted that project leads sometimes avoid AI entirely in favour of deterministic logic to clear certification. Overcoming this requires not only technological advances in explainability but also confidence-building measures, such as incremental certification trials and regulator leadership.

4.7.5. Regulatory Uncertainty and Legal Limitation

The absence of formalised AI certification standards leads to hesitation across the industry. Organisations are wary of investing in systems that may not align with future requirements. Furthermore, unresolved liability questions regarding the accountability for AI-driven failures constrain deployment. This legal ambiguity contributes to a conservative but valid stance by limiting AI use to advisory or low-criticality functions until clearer regulatory and legal frameworks emerge.

4.7.6. Ethical and Security Concerns

Ensuring that AI models operate free from bias and are proven to be resilient against adversarial manipulation remains vital for trustworthiness. These dimensions extend the assurance challenge beyond traditional safety, aligning with the EU AI Act’s broader trustworthiness principles [59]. However, the lack of established procedures for bias testing, vulnerability assessment, and secure retraining creates further uncertainty for certification of higher-level autonomous applications.

4.8. Synthesis of Findings

These findings illustrate that while AI offers transformative potential, aviation faces significant practical challenges in achieving trustworthy, certifiable AI systems. The barriers identified emphasise that a comprehensive methodology must go beyond defining assurance requirements. It must also enable practitioners to overcome real-world implementation obstacles through structured processes, applied during the full life cycle to sustain safety assurance.
A comparative summary of the key standards, guidance documents, and academic contributions reviewed in this study is presented in Table 1. It illustrates both the strengths that support current assurance practice and the limitations/gaps that constrain the certification of AI-based systems.

5. Proposed Framework

The integration of AI into safety-critical aviation systems challenges traditional certification frameworks. In response, the authors propose a conceptual methodology that extends existing system development and certification processes to incorporate AI-specific assurance, continuous operational monitoring, and adaptive safety verification. The framework is based on established standards while embedding mechanisms to ensure trustworthiness and continued safety assurance across all phases of the system life cycle.

5.1. Framework Principles

The framework is guided by the following principles:
  • Full life cycle integration/coverage;
  • Compatibility with existing standards;
  • Continuous safety assurance.

5.1.1. Life Cycle Integration

Airworthiness certification should encompass the entire life cycle of an AI-enabled aviation system, from concept definition to decommissioning. The framework is mapped to common life cycle phases, ensuring that assurance activities are not confined to the development phase but extend through operations and disposal.

5.1.2. Compatibility with Existing Standards

The framework maintains interoperability with current aerospace guidance, notably ARP4754B/ED-79B for system development, ARP4761A/ED-135 for safety assessment, DO-178C/ED-12C for software, and DO-254/ED-80 for hardware. Rather than replacing these, it augments them with emerging AI-specific processes described by ARP6983/ED-324 and ISO/IEC TR 5469:2024. This ensures that AI components, or machine learning constituents, can be integrated into existing certification workflows without disrupting established assurance efforts.

5.1.3. Continuous Safety Assurance

Trustworthiness of AI cannot be guaranteed through one-off verification. The framework therefore introduces a continuous assurance model, integrating operational feedback, model drift detection, and retraining oversight as extensions of the existing COS guidance. These processes sustain compliance throughout the system’s operational life.
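As one possible illustration of drift detection, the sketch below compares the distribution of an input feature seen in service against the distribution recorded at training time, using the Population Stability Index (PSI). The choice of PSI, the bin count, and the 0.2 alert threshold are assumptions made for this example; the framework itself does not prescribe a particular drift metric.

```python
import math


def population_stability_index(reference, live, bins=10, eps=1e-6):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample must contain more than one distinct value")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to the edge bins
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    live_p = proportions(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))


DRIFT_THRESHOLD = 0.2  # rule-of-thumb value, an assumption of this sketch

reference = [i / 100 for i in range(100)]   # stand-in for training-time data
shifted = [v + 0.5 for v in reference]      # stand-in for drifted in-service data

print(population_stability_index(reference, reference) > DRIFT_THRESHOLD)
print(population_stability_index(reference, shifted) > DRIFT_THRESHOLD)
```

In a continuous assurance setting, an exceedance of the threshold would be recorded as evidence and passed to retraining oversight; it would not trigger an automatic model change.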

5.2. Framework Structure

Figure 8 illustrates the overall structure of the proposed life cycle integrated assurance framework. The model aligns three primary development and assurance layers—aircraft or system level, item level, and ML constituent level—across the full system life cycle, from concept and preliminary design through operations and disposal.
At the system level, processes follow ARP4754B, beginning with needs analysis, safety assessment, and allocation of functional and safety requirements. These activities define the assurance baseline for subordinate development.
At the item level, a clear categorisation is established based on ML inclusion. If a system segment or item is based on ML/AI-enabled architectures, the development process will branch to the ML constituent level. For traditional hardware and software elements, development will fall under DO-254 and DO-178C, respectively, maintaining bidirectional traceability to system-level requirements and verification objectives.
At the ML constituent level, the framework introduces a development thread based on ARP6983. It covers ML and data requirements, dataset assurance, model design, ML requirements validation, ML verification, and explainability analysis or descriptions. Assurance artefacts generated at this level, such as data lineage records and model validation/verification evidence, are to be integrated back into the higher-level processes, preserving traceability and supporting certification review. An ML constituent item is then implemented on an architecture design supported by hardware and software guidance (DO-254 and DO-178C), with ML considerations and approved practices.
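The branching between development threads described above can be summarised in a short sketch. It is only an illustration of the routing logic: `ItemKind` and `applicable_guidance` are hypothetical names, and the listing for ML constituents reflects the statement that they follow the ARP6983 thread and are then implemented with support from DO-178C and DO-254.

```python
from enum import Enum


class ItemKind(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    ML_CONSTITUENT = "ml_constituent"


def applicable_guidance(kind):
    """Return the guidance documents governing development of an item."""
    if kind is ItemKind.HARDWARE:
        return ["DO-254"]
    if kind is ItemKind.SOFTWARE:
        return ["DO-178C"]
    if kind is ItemKind.ML_CONSTITUENT:
        # ML thread first, then the implementation guidance it relies on.
        return ["ARP6983", "DO-178C", "DO-254"]
    raise ValueError(f"unknown item kind: {kind!r}")


for kind in ItemKind:
    print(kind.value, "->", applicable_guidance(kind))
```

All three branches remain subordinate to the system-level process under ARP4754B, which is why traceability back to system-level requirements is maintained regardless of the branch taken.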

5.3. Integration of Assurance Activities

The methodology establishes a multi-layered assurance ecosystem, ensuring consistency between classical and AI-specific artefacts:
  • Requirement continuity: System, item, and AI-specific requirements are linked through a unified traceability chain. This allows verification of how AI behaviour contributes to, or constrains, overall aircraft safety functions.
  • Safety assessment integration: Hazard analyses performed under ARP4761A are proposed to be extended to include data-related risks and model failure modes. Failure conditions arising from data bias, drift, or misclassification are addressed alongside hardware and software hazards.
  • Verification and validation synergy: AI verification and validation (V&V) activities, such as ML requirements validation and ML verification, are treated as standalone assurance steps within system verification and validation plans. This promotes consistency with existing means of compliance and development.
  • Iterative development: The assurance process is recursive. Data quality findings, verification anomalies, or performance degradations inform updates to both the AI model and associated safety artefacts, maintaining alignment with system-level requirements baselines.
This integrated approach aims to enable the certification authority to evaluate AI components as traceable, verifiable, and explainable elements within an established assurance framework.

5.4. Operational Monitoring and Feedback

A central contribution of this framework is its explicit extension into the operational phase, ensuring that safety assurance remains active after certification. The model introduces three interdependent layers of monitoring:
  • Level 1—Item level monitoring: Continuous surveillance of low-level model performance metrics, dataset integrity, and operational validity. This detects anomalies or data drift that may compromise safety.
  • Level 2—System-level monitoring: Integration of learned/static AI systems and human-in-the-loop monitoring for high-criticality operations, enabling intervention where automated reasoning deviates from expected behaviour.
  • Level 3—Aircraft level monitoring and notification: Aggregated operational feedback supports fleet-wide trend analysis and predictive maintenance, similar to COS concept frameworks proposed by the FAA.
These layers form a spiral monitoring model that closes the loop between operation and certification (deployment). Information from monitoring triggers retraining requests, design updates, or regulatory notifications, ensuring that AI-enabled systems maintain conformity with their certified performance envelopes.
By embedding monitoring as a continuous assurance function, the methodology transforms certification from a static approval process into a dynamic life cycle activity capable, in concept, of adapting to the evolving behaviour of learning systems.
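The feedback loop from monitoring to action can be sketched as below. The three monitoring levels and the three outcome types (retraining requests, design updates, regulatory notifications) come from the framework description; the escalation rule that links them, the `Finding` structure, and the example metric and limit are hypothetical choices made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    level: int     # 1 = item, 2 = system, 3 = aircraft/fleet
    metric: str    # monitored quantity where a higher value is worse
    value: float
    limit: float   # bound taken from the certified performance envelope


def actions_for(finding):
    """Map a monitoring finding to follow-up actions (assumed escalation rule)."""
    if finding.level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    if finding.value <= finding.limit:
        return []  # within the certified envelope, nothing to trigger
    actions = ["retraining request"]
    if finding.level >= 2:
        actions.append("design update review")
    if finding.level == 3:
        actions.append("regulatory notification")
    return actions


print(actions_for(Finding(level=1, metric="error_rate", value=0.02, limit=0.05)))
print(actions_for(Finding(level=1, metric="error_rate", value=0.08, limit=0.05)))
print(actions_for(Finding(level=3, metric="error_rate", value=0.08, limit=0.05)))
```

A real implementation would define the triggers and responsibilities together with the certification authority; the point of the sketch is only that each finding leaves a traceable decision.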

6. Conclusions

This research examined how existing certification practices can evolve to accommodate AI within safety-critical aviation systems. Through analysis of current standards, regulatory initiatives, academic literature, and expert perspectives, the study identified key limitations in applying deterministic assurance frameworks to adaptive, data-driven technologies. Traditional certification models remain essential, yet they must expand to address post-deployment oversight, traceable data governance, and explainable system behaviour.
In response, a conceptual methodology was proposed that integrates AI-specific assurance and verification activities within established aerospace certification frameworks. The framework extends standards such as ARP4754B, ARP4761A, DO-178C, and DO-254 by embedding continuous monitoring, retraining oversight, and traceable evidence generation across the entire life cycle. In doing so, it provides a pathway for aligning AI assurance principles with existing airworthiness expectations, supporting trustworthy and transparent certification outcomes.
This study intentionally presents the framework at a conceptual level, establishing a foundation for future operational validation. Subsequent research should focus on piloting the framework in representative certification contexts to refine assurance artefacts, define quantitative criteria for AI robustness, performance, and explainability, and develop supporting tools for implementation. The practical realisation of this approach will depend on regulatory alignment and the continued maturation of emerging standards such as ARP6983.
By advancing towards such validation, the proposed model can evolve into a practical methodology capable of sustaining continuous safety assurance for AI-enabled aviation systems.

Author Contributions

Writing—original draft and editing, A.S.; supervision, A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and ratified by the Ethics Committee of the University of the Witwatersrand (MIAEC 059/25, 21 July 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All data is anonymised and managed in accordance with the applicable regulations and ethics policies. Data can be made available upon request to the corresponding author.

Acknowledgments

The author would like to thank Aarti Panday for insightful guidance and the engineering professionals who participated in interviews for sharing their expertise. Special gratitude is also extended to the author’s family for their unwavering support and encouragement throughout this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AC: Advisory Circular
AI: Artificial Intelligence
AMC: Accepted Means of Compliance
ANAC: Brazil’s National Civil Aviation Agency
ARP: Aerospace Recommended Practice
ATC: Air Traffic Control
ATM: Air Traffic Management
CENELEC: European Committee for Electrotechnical Standardization
COS: Continued Operational Safety
CPP: Certification Position Papers
DAL: Design Assurance Level
DevOps: Development Operations
DL: Deep Learning
DoD: Department of Defense
EASA: European Union Aviation Safety Agency
EU: European Union
EUROCAE: European Organisation for Civil Aviation Equipment
FAA: Federal Aviation Administration
GNC: Guidance, Navigation and Control
IEC: International Electrotechnical Commission
IPC: Innovation Partnership Contracts
ISO: International Organization for Standardization
JTC: Joint Technical Committee
LAISC: Landscape of AI Safety Concerns
M&M’s: Metrics and Mitigation Measures
ML: Machine Learning
MLEAP: Machine Learning Application Approval
MLOps: Machine Learning Operations
MoC: Memoranda of Cooperation
NASA: National Aeronautics and Space Administration
ODD: Operational Design Domain
OP: Overarching Properties
RTCA: Radio Technical Commission for Aeronautics
SAE: Society of Automotive Engineers
SLR: Systematic Literature Review
TC: Type Certification
TCCA: Transport Canada Civil Aviation
TR: Technical Report
U.S.: United States
UAV: Unmanned Aerial Vehicle
V&V: Verification and Validation
VR: Verifiable Requirement
WG: Working Group
XAI: Explainable AI

References

  1. Baker, D. Scientific American Inventions From Outer Space: Everyday Uses For NASA Technology; Random House Reference and Information Publishing: New York, NY, USA, 2000.
  2. Bergan, B. Space Race 2.0: SpaceX, Blue Origin, Virgin Galactic, NASA, and the Privatization of the Final Frontier; Motorbooks International: Beverly, MA, USA, 2022.
  3. Elliott, S. Air-Travel Advancements in One Lifetime, Impressive, Final ed.; Gannett Media Corp: Evansville, IN, USA, 2005.
  4. Turing, A.M. Computing Machinery and Intelligence. Mind 1950, LIX, 433–460.
  5. McCarthy, J.; Minsky, M.; Rochester, N.; Shannon, C. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence: 1955. AI Mag. 2006, 27, 12–14.
  6. European Union Aviation Safety Agency (EASA). Artificial Intelligence Roadmap 2.0: A Human-Centric Approach to AI in Aviation; EASA: Cologne, Germany, 2023.
  7. Lu, Y. Artificial intelligence: A survey on evolution, models, applications and future trends. J. Manag. Anal. 2019, 6, 1–29.
  8. Jones, E. Digital disruption: Artificial intelligence and international trade policy. Oxf. Rev. Econ. Policy 2023, 39, 70–84.
  9. Zaoui, A.; Tchuente, D.; Wamba, S.F.; Kamsu-Foguem, B. Impact of artificial intelligence on aeronautics: An industry-wide review. J. Eng. Technol. Manag. 2024, 71, 101800.
  10. Wang, Y.; Chung, S. Artificial intelligence in safety-critical systems: A systematic review. Ind. Manag. Data Syst. 2022, 122, 442–470.
  11. Merkert, R.; Bushell, J. Managing the drone revolution: A systematic literature review into the current use of airborne drones and future strategic directions for their effective control. J. Air Transp. Manag. 2020, 89, 101929.
  12. Anonymous. US Air Travel Volumes Topping 2019 Levels-Market Talk; Dow Jones & Company Inc.: New York, NY, USA, 2023.
  13. Nemeth, C.; Holbrook, J. Resilience Engineering’s Potential for Advanced Air Mobility (AAM). In Proceedings of the 2021 International Symposium on Aviation Psychology; International Symposium on Aviation Psychology: Corvallis, OR, USA, 2021.
  14. Xiao, M.; Zhang, J.; Cai, K.; Cao, X. ATCEM: A synthetic model for evaluating air traffic complexity. J. Adv. Transp. 2016, 50, 315–325.
  15. Tafur, C.; Camero, R.; Rodríguez, D.; Rincón, J.; Saenz, E. Applications of artificial intelligence in air operations: A systematic review. Results Eng. 2025, 25, 103742.
  16. Fox, S. The ‘risk’ of disruptive technology today (A case study of aviation-Enter the drone). Technol. Soc. 2020, 62, 101304.
  17. Weissinger, L.B. AI, Complexity, and Regulation. In The Oxford Handbook of AI Governance; Bullock, J.B., Chen, Y., Himmelreich, J., Hudson, V.M., Korinek, A., Young, M.M., Zhang, B., Eds.; Oxford University Press: Oxford, UK, 2022.
  18. Elliott, D.; Soifer, E. AI Technologies, Privacy, and Security. Front. Artif. Intell. 2022, 5, 826737.
  19. Dunn, M. SpaceX Successfully Launches and Catches Starship Booster in Fifth Test Flight; Los Angeles Times Communications LLC: Los Angeles, CA, USA, 2024.
  20. SpaceX. “Starship’s Fifth Flight Test”. 2024. Available online: https://www.spacex.com/launches/mission/?missionId=starship-flight-5 (accessed on 20 April 2025).
  21. Schnitzer, R.; Kilian, L.; Roessner, S.; Theodorou, K.; Zillner, S. Landscape of AI safety concerns-A methodology to support safety assurance for AI-based autonomous systems. arXiv 2024, arXiv:2412.14020.
  22. Federal Aviation Administration (FAA). Roadmap for Artificial Intelligence Safety Assurance; FAA: Washington, DC, USA, 2024.
  23. Burgess, J.A. Design assurance-a tool for excellence. Eng. Manag. Int. 1988, 5, 25–30.
  24. Nouri, A.; Warmuth, J. IEC 61508 and ISO 26262—A Comparison Study. In Proceedings of the 2021 5th International Conference on System Reliability and Safety (ICSRS), Palermo, Italy, 24–26 November 2021.
  25. IEC 61508; Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems (E/E/PE, or E/E/PES). International Electrotechnical Commission (IEC): Geneva, Switzerland, 2010.
  26. ISO 26262; Road Vehicles-Functional Safety. International Organization for Standardization (ISO): Geneva, Switzerland, 2018.
  27. McDermid, J.A.; Jia, Y.; Habli, I. Towards a Framework for Safety Assurance of Autonomous Systems. In Proceedings of the Artificial Intelligence Safety 2019 (SafeAI 2019) Workshop, Stockholm, Sweden, 19 July 2019; CEUR Workshop Proceedings; CEUR-WS.org: Aachen, Germany, 2019; Volume 2419, pp. 1–7.
  28. Habli, I.; Lawton, T.; Porter, Z. Artificial intelligence in health care: Accountability and safety. Bull. World Health Organ. 2020, 98, 251.
  29. Wilson, S.; Kelly, T.; McDermid, J. Safety case development: Current practice, future prospects. In Proceedings of the Safety and Reliability of Software Based Systems: Twelfth Annual CSR Workshop, Bruges, Belgium, 12–15 September 1997; Springer: London, UK, 1997.
  30. Silva Neto, A.V.; Camargo, J.B.; Almeida, J.R.; Cugnasca, P.S. Safety Assurance of Artificial Intelligence-Based Systems: A Systematic Literature Review on the State of the Art and Guidelines for Future Work. IEEE Access 2022, 10, 130733–130770.
  31. ARP4754B/ED-79; Guidelines for Development of Civil Aircraft and Systems. SAE International: Warrendale, PA, USA, 2023.
  32. ARP4761A; Guidelines for Conducting the Safety Assessment Process on Civil Aircraft, Systems, and Equipment. SAE International: Warrendale, PA, USA, 2023.
  33. Luettig, B.; Akhiat, Y.; Daw, Z. ML meets aerospace: Challenges of certifying airborne AI. Front. Aerosp. Eng. 2024, 3, 1475139.
  34. European Union Aviation Safety Agency (EASA). EASA Concept Paper: First Usable Guidance for Level 1 Machine Learning Applications-A Deliverable for the EASA AI Roadmap; EASA: Cologne, Germany, 2021.
  35. European Union Aviation Safety Agency (EASA). Machine Learning Application Approval (MLEAP) Final Report; Research project final report, EASA, Horizon Europe; EASA: Cologne, Germany, 2024.
  36. European Union Aviation Safety Agency (EASA). Artificial Intelligence (AI) Concept Paper Issue 2: Guidance for Level 1 & 2 Machine Learning Applications; EASA: Cologne, Germany, 2024.
  37. Federal Aviation Administration (FAA). Continued Operational Safety (COS) Report; FAA: Washington, DC, USA, 2021.
  38. Christensen, J.M.; Stefani, T.; Anilkumar Girija, A.; Hoemann, E.; Vogt, A.; Werbilo, V.; Durak, U.; Köster, F.; Krüger, T.; Hallerbach, S. Formulating an Engineering Framework for Future AI Certification in Aviation. Aerospace 2025, 12, 482.
  39. Demir, G.; Moslem, S.; Duleba, S. Artificial Intelligence in Aviation Safety: Systematic Review and Biometric Analysis. Int. J. Comput. Intell. Syst. 2024, 17, 279.
  40. Vasudevan, V.; Abdullatif, A.; Sohag, K.; Campean, F. Certifiability Analysis of Machine Learning Systems for Low-Risk Automotive Applications. Computer 2024, 57, 45–56.
  41. Federal Aviation Administration (FAA). Order 8110.4C-Type Certification; FAA: Washington, DC, USA, 2017.
  42. Peterson, E.M. Application of SAE ARP4754A to Flight Critical Systems; NASA/CR-2015-218982; NASA, Langley Research Center: Hampton, VA, USA, 2015.
  43. RTCA/DO-254; Design Assurance Guidance for Airborne Electronic Hardware. Radio Technical Commission for Aeronautics (RTCA), Inc.: Washington, DC, USA, 2000.
  44. RTCA/DO-178C; Software Considerations in Airborne Systems and Equipment Certification. Radio Technical Commission for Aeronautics (RTCA), Inc.: Washington, DC, USA, 2011.
  45. RTCA/DO-160G; Environmental Conditions and Test Procedures for Airborne Equipment. RTCA, Inc.: Washington, DC, USA, 2010.
  46. RTCA/DO-330; Software Tool Qualification Considerations. RTCA, Inc.: Washington, DC, USA, 2011.
  47. RTCA/DO-331; Model-Based Development and Verification Supplement to DO-178C and DO-278A. RTCA, Inc.: Washington, DC, USA, 2011.
  48. RTCA/DO-332; Object-Oriented Technology and Related Techniques Supplement to DO-178C and DO-278A. RTCA, Inc.: Washington, DC, USA, 2011.
  49. RTCA/DO-333; Formal Methods Supplement to DO-178C and DO-278A. RTCA, Inc.: Washington, DC, USA, 2011.
  50. Advisory Circular 20-174; Development of Civil Aircraft and Systems. Federal Aviation Administration (FAA): Washington, DC, USA, 2011.
  51. Advisory Circular 25.1309-1B; System Design and Analysis. Federal Aviation Administration (FAA): Washington, DC, USA, 2024.
  52. ARP4754A; Guidelines for Development of Civil Aircraft and Systems. SAE International: Warrendale, PA, USA, 2010.
  53. MIL-HDBK-516C; Airworthiness Certification Criteria. U.S. Department of Defense (DoD): Washington, DC, USA, 2014.
  54. MIL-STD-882E; System Safety. U.S. Department of Defense (DoD): Washington, DC, USA, 2012.
  55. Zednik, C. Solving the black box problem: A normative framework for explainable artificial intelligence. Philos. Technol. 2021, 34, 265–288.
  56. European Union Aviation Safety Agency (EASA). EASA AI Days High-Level Conference 2024-Presentations Day 1; EASA: Cologne, Germany, 2024.
  57. ARP 6983/ED-324; G-34 Artificial Intelligence in Aviation Committee. SAE International: Warrendale, PA, USA, 2025.
  58. ISO/IEC TR 5469:2024; Artificial Intelligence-Functional Safety and AI Systems. ISO/IEC: Geneva, Switzerland, 2024.
  59. European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act); European Union: Brussels, Belgium, 2024.
  60. Edwards, L. The EU AI Act: A summary of its significance and scope. Artif. Intell. (EU AI Act) 2021, 1, 25.
  61. The White House. Executive Order 14110 of October 30, 2023: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence; The White House: Washington, DC, USA, 2023; Volume 88, pp. 75191–75231.
  62. SAE International; EUROCAE. EUROCAE WG114-SAE G34: A Joint Standardisation Initiative to Support Artificial Intelligence Revolution in Aeronautics; SAE International: Warrendale, PA, USA, 2021.
  63. Holloway, C.M. Understanding the Overarching Properties: Intent, Correctness, Innocuity; NASA/TM–2019–220292; National Aeronautics and Space Administration, Langley Research Center: Hampton, VA, USA, 2019.
  64. Federal Aviation Administration (FAA). Technical Discipline: Artificial Intelligence-Machine Learning. 2024. Available online: https://www.faa.gov/aircraft/air_cert/step/disciplines/artificial_intelligence (accessed on 23 June 2025).
  65. U.S. Department of Transportation. An Overview of AI Assurance for Transportation; U.S. Department of Transportation: Washington, DC, USA, 2024.
Figure 1. Overview of the learning assurance W-shaped process overlapping the non-AI component V-model [35].
Figure 2. Overview of aircraft certification process and standards.
Figure 3. Development processes related to aircraft certification, extracted from ARP4754A [52].
Figure 4. Envisaged role of SAE ARP6983/ED324 in supporting certification of AI-enabled aviation systems.
Figure 5. EASA AI Roadmap 2.0 timeline indicating deliverables and approval milestones [6].
Figure 6. FAA AI roadmap certification readiness development [22].
Figure 7. Thematic map showing recurring issues identified by participants.
Figure 8. Proposed life cycle-integrated assurance framework for AI-enabled aviation systems, aligning classical system development standards with emerging ML guidance and extending assurance into continuous operational monitoring.
Table 1. Select aerospace standards, guidance and research contributions applicable to AI assurance in aviation.

| Source/Framework | Strengths | Gaps for AI |
| --- | --- | --- |
| ARP4754A/B [52] | System-level development | Does not address adaptive learning |
| ARP6983/ED-324 | AI-enabled systems focus | Work in progress |
| DO-178C/DO-254 | Mature software/hardware assurance | Not suited to non-deterministic ML/AI |
| IEC 61508/ISO 26262 | Life cycle safety principles | Assume predictability |
| EU AI Act | High-risk AI governance | Generic, not aviation-specific |
| EASA AI Roadmap 2.0 | Early guidance (Level 1/2 ML) | Limited to low autonomy |
| FAA AI Roadmap, Version 1 | Early guidance, industry-led, continuous monitoring | Limited to lower DAL (currently) |
| ISO/IEC TR 5469:2024 | Aligns AI with safety life cycle | Exploratory, lacks prescriptive metrics |
| Silva Neto et al. (2022) [30] | Broad SLR, assurance themes | No unified framework |
| Schnitzer et al. (2024) [21] | LAISC links concerns to VRs | Single case study |
| Demir et al. (2024) [39] | Trends in AI safety research | Minimal focus on certification |
| Christensen et al. (2025) [38] | Novel approach to fluid nature of AI | Certification remains unclear |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Schoeman, A.; Panday, A. Certification of AI-Based Aviation Systems: A Methodology for Continuous Safety Assurance Across the System Life Cycle. Eng. Proc. 2026, 132, 7. https://doi.org/10.3390/engproc2026132007
