1. Introduction
New technologies and data-driven algorithms bring both opportunities and challenges for aircraft maintenance. Traditionally, the aircraft maintenance process consists of periodic tasks performed by mechanics at pre-determined, fixed time intervals, i.e., time-based maintenance (TBM) [1]. In recent years, however, aircraft maintenance has increasingly made use of on-board sensors, aircraft condition monitoring systems (ACMS), and data-driven predictive algorithms. These new technologies increase the level of automation of the aircraft maintenance process. For example, on-board sensors and ACMS are used to continuously monitor the health condition of aircraft systems, and the data are used to make predictions about the degradation levels of these systems. Data-driven algorithms are developed to detect damage (diagnostics) and to predict the remaining useful life (RUL) of aircraft systems (prognostics) [2,3]. Using such predictive algorithms, maintenance tasks are generated only when needed [4]. We refer to this process of using sensor data and predictive algorithms to generate maintenance tasks as data-driven predictive aircraft maintenance (PdAM).
The use of data-driven technologies for aircraft maintenance poses novel challenges. For example, the retrieval, storage, processing, and utilization of sensor data involve risks such as data loss, data corruption, data transmission delays, and inaccurate failure prediction algorithms. Furthermore, new experts who handle the data and algorithms need to be involved in the traditional aircraft maintenance process, and the manner in which these new experts interact with the existing maintenance teams may lead to new challenges. Thus, to safely implement data-driven PdAM, an analysis of the emerging challenges is required.
To the best of our knowledge, the emerging challenges of data-driven PdAM have not yet been identified and discussed. Existing studies mostly discuss challenges associated with the traditional aircraft maintenance process, TBM. In [5], the authors use an extensive safety questionnaire and show that the behavior of the maintenance personnel is a critical contributing factor to errors in aircraft maintenance. In [6], the authors show that the manner in which the maintenance personnel interact with each other and their use of hardware/software are the main contributing factors to human errors in aircraft maintenance. However, these studies do not consider the use of data-driven technologies for aircraft maintenance. Since 2018, when EASA (European Union Aviation Safety Agency) integrated aircraft health monitoring (AHM) into the regulatory basis for aircraft maintenance [7], no studies have discussed the emerging challenges of data-driven PdAM while taking into account the entire maintenance process and the interactions between maintenance personnel and the new data-driven technologies.
The aim of this paper is to discuss the emerging challenges of data-driven PdAM, based on the identification and analysis of new hazards associated with the new data-driven technologies. In general, a hazard is the intrinsic ability of an agent or situation to cause adverse effects to a target [8]. Specifically, a hazard in aviation is defined as follows:
Definition 1 (Hazard). A condition that could foreseeably cause or contribute to an aircraft accident [9]; any condition, event, or circumstance which could induce an accident [10]. In this paper, we consider hazards related to aircraft maintenance. We especially focus on the hazards associated with the adoption of new data-driven technologies, and the hazards related to the interactions between the maintenance personnel involved in new data-driven PdAM.
Traditional hazard identification methods, such as FMEA (failure mode and effects analysis) or HAZOP (hazard and operability study), look at individual process components. For each such component, potential failure modes, their causes and effects are identified [11]. However, these methods fail to capture the interactions between process components and the hazards associated with these interactions [12,13]. For aircraft maintenance, the interactions among maintenance personnel and the manner in which the personnel interact with digital systems are important contributing factors to hazards [6]. Moreover, because data-driven technologies have only recently been considered for aircraft maintenance, there is very limited data on, and experience with, data-driven PdAM.
To address the drawbacks of traditional methods and the lack of data and experience of data-driven PdAM, we apply structured brainstorming for hazard identification [13,14,15]. Brainstorming is especially suited to identifying emerging hazards associated with novel processes. For instance, brainstorming has been used to identify hazards associated with maintenance outsourcing [16] and future aviation concepts [17]. Furthermore, brainstorming is a useful method to compensate for the lack of data in hazard identification [18]. We facilitate this brainstorming using an agent-based model of the aircraft maintenance process [4], which provides an intuitive understanding of the interactions between agents. The identified hazards are validated against maintenance-related aircraft accidents and incidents reported between 2008 and 2017. Finally, in light of the identified hazards, we discuss emerging challenges for a safe implementation of data-driven PdAM.
The main contributions of this paper are as follows:
We identify the agents and their interactions in the data-driven predictive aircraft maintenance process. This agent-based model illustrates how aircraft maintenance will change when data-driven technologies and new experts are integrated into the traditional aircraft maintenance process.
We identify emerging hazards associated with data-driven predictive aircraft maintenance through a structured brainstorming session with experts. Here, the agent-based model is used to facilitate the brainstorming. We validate the identified hazards based on historical accidents and incidents related to aircraft maintenance.
Based on the analysis of the hazards, we discuss three main challenges of data-driven predictive aircraft maintenance. These challenges suggest directions for future research and development in aircraft maintenance.
The remainder of this paper is organized as follows. Section 2 introduces an agent-based model showing the stakeholders, digital systems, and their interactions in data-driven PdAM. Section 3 identifies and discusses the hazards associated with data-driven PdAM. Section 4 validates the identified hazards in the context of past aircraft accidents related to maintenance. In Section 5, we discuss the emerging challenges of data-driven PdAM based on the identified hazards. Finally, we provide conclusions in Section 6.
2. Agent-Based Model of Data-Driven Predictive Aircraft Maintenance
In this section, we model the data-driven predictive aircraft maintenance (PdAM) process using an agent-based model [4]. Here, an agent is defined as an independent entity that makes decisions based on a set of rules, interacts with other agents, and has its own goals [19,20].
The purpose of modeling the PdAM process is to facilitate brainstorming for hazard identification. The agent-based model of data-driven PdAM is first presented to the experts participating in the brainstorming to provide a solid understanding of this new aircraft maintenance process, and to trigger ideas about emerging hazards.
Table 1 and Figure 1 show the main agents of the data-driven PdAM process and the interactions between them, respectively. In particular, we consider PdAM where a new data management team is introduced into the traditional aircraft maintenance process [4]. The main agents identified for PdAM are: (i) the task generating team (TG), (ii) the task planning team (TP), (iii) the mechanics team (ME), (iv) the flight crews (CR), and (v) the data management team (DM). Among them, four agents (TG, TP, ME, and CR) are involved in both the traditional aircraft maintenance process (TBM) and the new aircraft maintenance process (PdAM), while DM is a new agent specifically supporting PdAM.
Below, we characterize the agents of the aircraft maintenance process by describing their roles and interactions with other agents. In particular, we first elaborate on the roles and interactions under traditional TBM, and then describe the changes under the new PdAM. A detailed model for each agent is given in [4].
2.1. Task Generating Team (TG)
The role of the task generating team (TG) is to define the type, due date, and method of each maintenance task. TG generates two types of tasks: periodic tasks and one-time tasks. The periodic tasks are generated based on the regulations introduced by aviation authorities such as EASA, the manuals provided by aircraft manufacturers, and the analysis of airlines’ operation data. TG integrates all this information and generates periodic tasks (type, due date, and method). Under TBM, these periodic tasks are used extensively as the primary measure to prevent failures. Apart from periodic tasks, one-time maintenance tasks are generated whenever TG receives complaints or findings from flight crews or mechanics. For example, if the flight crew observes abnormal aircraft performance during a flight, they submit a complaint to TG. Similarly, if the mechanics observe an issue during an inspection, they submit a finding to TG. Finally, TG analyzes the submitted complaints and findings and generates the tasks necessary to address these issues.
Under PdAM, TG receives additional input, such as diagnostics and remaining useful life (RUL) prognostics derived from DM’s analysis of aircraft condition data. This input is verified and analyzed by TG. When needed, TG asks TP to plan the necessary one-time tasks. For example, suppose the RUL prognostics of a brake indicate that the brake is expected to wear out within 50 flight cycles. If this is fewer than the remaining number of flight cycles before the planned periodic replacement of this brake (periodic task), then TG asks TP to move the brake replacement earlier (data-driven one-time task). In this example, TG anticipates a maintenance issue before it happens, i.e., the maintenance tasks triggered by the prognostics are predictive.
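The decision rule in the brake example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' agent model: the function name `data_driven_task_due` and the optional `margin_cycles` safety buffer are hypothetical, and the RUL is treated as a single point estimate in flight cycles.

```python
from typing import Optional


def data_driven_task_due(
    rul_cycles: float,
    cycles_to_planned_replacement: int,
    margin_cycles: float = 0.0,
) -> Optional[float]:
    """Return the due date, in flight cycles from now, of a data-driven
    one-time replacement task, or None if the planned periodic replacement
    is expected to happen before the component wears out.

    margin_cycles is a hypothetical safety buffer subtracted from the RUL.
    """
    latest_safe_cycle = float(rul_cycles) - margin_cycles
    if latest_safe_cycle < cycles_to_planned_replacement:
        # The prognostic predicts wear-out before the periodic task: bring it forward.
        return max(latest_safe_cycle, 0.0)
    # The periodic replacement comes first; no additional task is needed.
    return None


# Brake example: RUL of 50 cycles, periodic replacement planned in 120 cycles.
print(data_driven_task_due(50, 120))      # 50.0
print(data_driven_task_due(50, 30))       # None
print(data_driven_task_due(50, 120, 10))  # 40.0
```

In this sketch, TG only creates a one-time task when the prognostic undercuts the periodic schedule; otherwise the existing periodic task already covers the predicted wear-out.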
2.2. Data Management Team (DM)
The data management team (DM) is a new agent specifically introduced to support the data-driven PdAM process. DM is responsible for handling the aircraft condition data and generating diagnostics and RUL prognostics. DM first collects the condition monitoring data from the aircraft condition monitoring systems (ACMS), i.e., the sensors installed on board the aircraft. Here, DM may also integrate external databases such as weather data, airport data, and/or data shared by other airlines or maintenance organizations [4]. Data processing and validation are also part of DM’s role. With such data, DM generates diagnostics and RUL prognostics for aircraft systems and structures. In this step, various data-driven algorithms are used, depending on the characteristics of the target system, the inspection/monitoring intervals, and the redundancy of the system [21,22,23]. Finally, DM transfers the diagnostics and prognostics information to TG.
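As a concrete, deliberately simple instance of an RUL prognostic, the sketch below fits a straight line to wear measurements by ordinary least squares and extrapolates it to a wear limit. This is an assumption for illustration only: the paper does not prescribe a particular algorithm, linear extrapolation is just one of many possible techniques, and the function name `linear_rul` and the example values are hypothetical.

```python
from typing import Optional, Sequence


def linear_rul(
    cycles: Sequence[float], wear: Sequence[float], wear_limit: float
) -> Optional[float]:
    """Estimate the RUL (flight cycles after the last observation) by fitting
    wear = intercept + slope * cycle with ordinary least squares and
    extrapolating the line to wear_limit. Cycles are assumed to be sorted in
    ascending order. Returns None if no increasing wear trend is found."""
    n = len(cycles)
    if n < 2 or n != len(wear):
        raise ValueError("need at least two paired observations")
    mean_c = sum(cycles) / n
    mean_w = sum(wear) / n
    sxx = sum((c - mean_c) ** 2 for c in cycles)
    sxy = sum((c - mean_c) * (w - mean_w) for c, w in zip(cycles, wear))
    if sxx == 0:
        raise ValueError("observations must span more than one cycle count")
    slope = sxy / sxx
    if slope <= 0:
        return None  # no degradation trend: cannot extrapolate to the limit
    intercept = mean_w - slope * mean_c
    cycle_at_limit = (wear_limit - intercept) / slope
    # RUL is measured from the most recent observation and cannot be negative.
    return max(cycle_at_limit - cycles[-1], 0.0)


# Hypothetical brake wear (mm) observed every 10 cycles, wear limit 8 mm.
print(linear_rul([0, 10, 20, 30], [0.0, 1.0, 2.0, 3.0], wear_limit=8.0))
```

With these made-up observations the estimate is about 50 cycles, which is the kind of output that DM would pass to TG. A real prognostic would also need to report its uncertainty, which this sketch omits.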
During the entire process, DM uses a digital platform to collect, validate, analyze, and transfer the data and prognostics information. Examples of such platforms for monitoring the condition data of an aircraft fleet are Skywise by Airbus [24] and Airplane Health Management by Boeing [25].
2.3. Task Planning Team (TP)
The task planning team (TP) schedules the execution of maintenance tasks. The tasks are given by TG (periodic and one-time tasks), as well as by the mechanics (deferred tasks) when additional issues are observed during inspections. TP finds available time slots when the aircraft can undergo maintenance, given the flight schedule of the aircraft, the due date of each maintenance task, the availability of the mechanics, and the availability of the necessary materials and resources. Ultimately, TP generates a schedule for the maintenance tasks. A scheduled task specifies the aircraft, the target system/structure, the maintenance task type, and the mechanics who need to execute the task.
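The slot-finding step can be sketched as a greedy earliest-due-date heuristic. This is a hypothetical simplification, not the planning method of [4] or of any airline: materials, resources, and flight-schedule constraints beyond fixed ground slots are ignored, and each slot is described only by its aircraft, day, and available mechanic hours. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    task_id: str
    aircraft: str
    due_day: int
    mechanic_hours: float


@dataclass
class Slot:
    aircraft: str
    day: int
    mechanic_hours: float  # mechanic capacity available during this ground slot


def schedule_edd(tasks: List[Task], slots: List[Slot]) -> Dict[str, int]:
    """Greedy earliest-due-date planning: take tasks in order of due day and
    put each into the earliest slot of the same aircraft that is not later
    than the due day and still has enough mechanic hours.
    Returns task_id -> planned day; tasks that do not fit are left out."""
    remaining = [s.mechanic_hours for s in slots]
    slot_order = sorted(range(len(slots)), key=lambda i: slots[i].day)
    plan: Dict[str, int] = {}
    for task in sorted(tasks, key=lambda t: t.due_day):
        for i in slot_order:
            slot = slots[i]
            if (
                slot.aircraft == task.aircraft
                and slot.day <= task.due_day
                and remaining[i] >= task.mechanic_hours
            ):
                remaining[i] -= task.mechanic_hours
                plan[task.task_id] = slot.day
                break
    return plan


tasks = [
    Task("T1", "AC1", due_day=5, mechanic_hours=4),
    Task("T2", "AC1", due_day=3, mechanic_hours=3),
    Task("T3", "AC2", due_day=2, mechanic_hours=1),
]
slots = [Slot("AC1", 2, 5), Slot("AC1", 4, 5), Slot("AC2", 3, 2)]
print(schedule_edd(tasks, slots))  # {'T2': 2, 'T1': 4}
```

Here, T3 cannot be planned because the only slot of its aircraft falls after its due day; the sketch simply leaves such tasks out of the returned plan.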
Under PdAM, the role of TP does not change significantly since the tasks generated by TG using diagnostics and prognostics will be given to TP in a similar format as the non-data-driven tasks.
2.4. Mechanics Team (ME)
The mechanics team (ME) executes the scheduled tasks received from TP. Various types of maintenance tasks are executed, such as system/structure replacement, restoration, lubrication, and inspection [1]. During an inspection, ME may observe additional issues, such as an unexpected level of degradation in the aircraft structure. Following the manuals, ME reports such findings. The tasks needed to address these findings are either executed on-site (unscheduled tasks) or reported to TG for rescheduling in other maintenance slots (deferred tasks).
Similar to TP, the role of ME does not change significantly under PdAM since the task type and schedules are already specified by TG and TP.
2.5. Flight Crew (CR)
The flight crew (CR) includes the pilots and cabin crew who operate the aircraft. During a flight, CR monitors the condition of the aircraft using on-board ACMS. CR reports a complaint to TG when any abnormality is noticed. The complaints reported by CR are analyzed by TG, which may generate additional tasks to address these issues.
Given that the operation of the aircraft is not subject to changes under PdAM, the role of CR is not expected to change significantly under PdAM.
4. Validation of the Identified Hazards Using Reported Aircraft Incidents
In this section, we discuss past aircraft accidents/incidents as a means to validate the hazards identified in the brainstorming session. We first outline the chronology of the events leading to these incidents based on the official investigation reports. Using these reports, we identify hazards similar to those identified in the brainstorming session (see Table 4, Table 5 and Table 6). This analysis shows that the hazards identified in the brainstorming session are also observed in past incidents.
4.1. Nuisance False Positive Alerts Lead to Agents Ignoring a True Positive Alert
An aircraft incident reported in 2017 illustrates how the inadequate handling of alerts from ACMS can contribute to an incident [31]. On 29 April 2017 (Day 0), an aircraft was dispatched with the left air conditioning system (ACS) disabled, in accordance with the Minimum Equipment List. During the flight, cabin pressure was lost because the right ACS failed while the left ACS was disabled. The incident investigation established that the component on the right ACS had been changed 11 days before the incident (Day −11). After the aircraft returned to service at Day , the on-board aircraft health monitoring (AHM) system sent an alert message to the operator’s AHM ground-based data system and engineering department (which perform the roles of DM and TG in Figure 1). This alert message indicated that a ‘high leakage/low inflow’ condition of the cabin pressurization system had been detected. The operator assessed the message, and the necessary task was planned for Day . Thereafter, during all subsequent flights between Day and Day 0, AHM sent maintenance alert messages, but the operator took no further action.
From the investigation report of this incident, we identify the following hazards that contributed to the incident. The operator generated an inadequate task with a due date that was too late (see hazard ). More importantly, the operator did not take the continuous alerts seriously because they regarded them as a ‘nuisance’ (see hazard ). In addition, an indirect but crucial hazard is identified: the generated diagnostic results had frequently been faulty in the past (see hazards and ), and therefore the engineering department classified the true positive alert as faulty (see hazard ). Regarding these hazards, we quote from the investigation report [31]:
The operator later stated that the AHM system provides just over 1200 maintenance alerts. From experience, some maintenance alert messages are inadvertently triggered, which has led to refinements to improve the robustness of the system and reduce the level of ‘nuisance’ alerts. The operator had seen alert message 21-0209-C740 triggered ‘intermittently’ on other aircraft before and this had caused maintenance staff to question the reliability of this particular alert message.
This incident shows that it is critical to ensure the reliability of the diagnostic/prognostic algorithms and the alert systems, in order to make the agents trust the new PdAM technologies.
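One hypothetical way to keep an alert code with a nuisance reputation from being dismissed indefinitely is to track its historical precision and to escalate it anyway once it persists over several consecutive flights. The sketch below is illustrative only: the alert code `ALERT-A`, the thresholds `min_precision` and `persistence_limit`, and the escalation policy itself are assumptions, not part of the operator's or the manufacturer's AHM system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def alert_precision(history: List[Tuple[str, bool]]) -> Dict[str, float]:
    """history holds (alert_code, fault_confirmed) pairs from past follow-up
    of alerts. Returns, per code, the fraction of alerts that were confirmed."""
    raised: Dict[str, int] = defaultdict(int)
    confirmed: Dict[str, int] = defaultdict(int)
    for code, fault_confirmed in history:
        raised[code] += 1
        confirmed[code] += int(fault_confirmed)
    return {code: confirmed[code] / raised[code] for code in raised}


def should_escalate(
    code: str,
    consecutive_flights: int,
    precision: Dict[str, float],
    min_precision: float = 0.5,
    persistence_limit: int = 3,
) -> bool:
    """Escalate a historically reliable code immediately; escalate any other
    code once it has recurred on persistence_limit consecutive flights.
    Codes without history are treated as reliable (precision 1.0)."""
    reliable = precision.get(code, 1.0) >= min_precision
    persistent = consecutive_flights >= persistence_limit
    return reliable or persistent


# Hypothetical history: ALERT-A was confirmed once in five occurrences.
history = [("ALERT-A", False)] * 4 + [("ALERT-A", True)]
precision = alert_precision(history)
print(precision)                                 # {'ALERT-A': 0.2}
print(should_escalate("ALERT-A", 1, precision))  # False
print(should_escalate("ALERT-A", 3, precision))  # True
```

The persistence rule addresses the failure mode quoted above: a low-precision code is not escalated on its first occurrence, but it cannot recur on every flight without eventually being acted upon.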
4.2. Damage Not Identified by Sensors and Inspections
Several incidents have been caused by damage sustained during hard landings that was identified neither by the on-board sensors nor by inspections [32,33,34]. Generally, on-board aircraft condition monitoring systems (ACMS) indicate hard landings to the flight crew. In this case, the pilots and the mechanics conduct inspections to identify and evaluate the potential damage, following the manuals.
In 2016, an aircraft damaged by a hard landing was released without the damage being addressed [32]. Although the subsequent flight was completed uneventfully, it was later found that the aircraft had been in an unsafe condition due to the serious damage caused by the previous hard landing.
The investigation report found that the ACMS did not issue the ‘G-Load’ report to the pilots because the peak load of 3.32 g persisted for less than 1 second (see Figure 2), whereas the report is issued only when the load persists for at least 2 seconds [32]. Furthermore, the ACMS sent the ‘A15 hard landing report’ to the Maintenance Operation Center (MOC), which performs the role of DM and TG in Figure 1. However, the MOC was not able to interpret the report properly (see hazard ) or on time (see hazards and ). In addition, the subsequent inspection did not find any damage (see hazard ), and thus the aircraft was released back to service.
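The gap between a peak-based check and a duration-gated check can be illustrated with a short sketch. Only the 3.32 g peak, its duration of less than 1 second, and the 2-second persistence requirement come from the report; the 2.0 g threshold, the 8 Hz sampling rate, the load trace, and the function names are hypothetical and do not reproduce the recorded data in Figure 2.

```python
from typing import Sequence


def peak_exceedance(load_g: Sequence[float], threshold_g: float) -> bool:
    """True if any sample reaches the threshold."""
    return max(load_g) >= threshold_g


def sustained_exceedance(
    load_g: Sequence[float], dt_s: float, threshold_g: float, min_duration_s: float
) -> bool:
    """True if the load stays at or above threshold_g for at least
    min_duration_s contiguous seconds; samples are dt_s seconds apart."""
    run = 0  # number of consecutive samples at or above the threshold
    for g in load_g:
        run = run + 1 if g >= threshold_g else 0
        if run * dt_s >= min_duration_s:
            return True
    return False


DT = 0.125  # hypothetical 8 Hz sampling
# Hypothetical trace: a 3.32 g spike lasting 7 samples (0.875 s).
landing = [1.0] * 8 + [3.32] * 7 + [1.0] * 8
print(peak_exceedance(landing, 2.0))                # True
print(sustained_exceedance(landing, DT, 2.0, 2.0))  # False
```

Under these assumptions, a peak-based rule flags the trace while the duration-gated rule does not, mirroring how a short, high peak can fail to produce a report even though it may have damaged the aircraft.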
Two similar aircraft incidents occurred in 2013 and 2008 [33,34]. In both cases, the damage to the landing gear was not identified after hard landings. A common contributing factor was that the on-board ACMS did not trigger a hard-landing alert because the predefined load threshold had not been exceeded (see hazard ). For the 2008 incident, the engineers reasoned that no inspection was needed because the recorded parameters had not exceeded a predefined threshold, which is in accordance with the aircraft maintenance manual [34]. For the 2013 incident, inspections were performed despite the absence of an ACMS alert, but the damage was not identified (see hazard ). According to the investigation of the 2013 incident, other contributing factors were the poor meteorological conditions during the outdoor inspection and the use of inspection procedures that were not consistent with the aircraft maintenance manual [33].
These incidents show that the parameters and algorithms used for ACMS need to be updated continuously based on the actual operation data in order to properly identify hard landing or other abnormal events (see hazard ). In addition, the inspections carried out by mechanics need to be performed carefully, especially when there is a conflict between reports submitted by flight crews and aircraft condition monitoring systems (see hazard ).
4.3. Unidentified Damage due to Incomprehensible Data Presentation
In 2016, a helicopter lost its yaw control during landing [35]. The helicopter was equipped with a Health and Usage Monitoring System (HUMS), which monitors condition parameters such as engine vibration, rotor track balance, engine shaft balance, etc. (HUMS performs the role of ACMS for aircraft in Figure 1). One day before the incident (Day −1), during flight, HUMS recorded vibration data, including a series of exceedances related to the tail rotor pitch change shaft (TRPCS) bearing. In the routine maintenance following this flight, the HUMS data were downloaded and analyzed. During the analysis, an abnormality of the tail rotor gearbox bearing was detected, but the exceedance was not identified. During the first flight on the day of the incident (Day 0), HUMS recorded a further exceedance. However, the data were planned to be downloaded and analyzed only after the helicopter returned to base. During lift-off on the second flight of Day 0, the helicopter experienced an uncommanded yaw, which was attributed to the influence of the wind. During landing on the same flight, the helicopter completely lost yaw control and landed expeditiously and heavily. The root cause of the loss of yaw control was identified as damage to the TRPCS caused by the failed bearing. The investigation report discusses the following two contributing factors [35]:
Impending failure of the TRPCS bearing was detected by HUMS but was not identified during routine maintenance due to human performance limitations and the design of the HUMS Ground Station Human Machine Interface.
The HUMS Ground Station software in use at the time had a previously-unidentified and undocumented anomaly in the way that data could be viewed by maintenance personnel. The method for viewing data recommended in the manufacturer’s user guide was not always used by maintenance personnel.
For this incident, we identify the hazards related to unclear communication (see hazards , , , and ) and to delayed data/information sharing (see hazard ). The damage to the TRPCS was properly detected by HUMS before the incident, but it was not identified and resolved by the operator, which performs the role of TG in Figure 1 (see hazards , , and ). The first contributing factor was the design of the HUMS Ground Station Human Machine Interface. The information available through this interface needs to be zoomed in to identify the exceedance (see Figure 3), but the two engineers did not address this (see hazards , and ). As a result, a proper inspection was not conducted (see hazard ). In addition, the HUMS data were not shared online; instead, the storage card was to be brought back to the base. Thus, the exceedance recorded during the first flight was not reported (see hazard ). Moreover, the global support team, which received the HUMS data of the previous day (Day −1), identified the exceedance and contacted the operator (the global support team performs the role of DM and TG in Figure 1). However, the communication was not completed on time (see hazards and ), as the incident had already occurred by the time the support team transmitted its report.
This case shows the importance of the digital communication platform for data-driven PdAM. The digital platform should visualize the data in an intuitive manner and highlight crucial information to prevent hazards such as hazards , , , and . In addition, online data sharing is needed to prepare necessary maintenance tasks in advance (see hazard ).
The analysis above validates the hazard list identified during the brainstorming session by showing that similar hazards were encountered in actual incidents.
6. Conclusions
In this paper, we identify hazards associated with the introduction of data-driven predictive aircraft maintenance (PdAM) and discuss the emerging challenges of implementing data-driven PdAM. As a first step, the main agents of data-driven PdAM and their interactions are identified. Then, a structured brainstorming session for hazard identification is conducted with aircraft maintenance experts, each representing one of the maintenance agents. We focus on the emerging hazards associated with the adoption of new technologies, such as aircraft condition monitoring systems (ACMS), data-driven diagnostics and prognostics algorithms, and decision support systems for PdAM. As a result, 20 emerging hazards are uncovered for data-driven PdAM. Two agents, the data management team and the task generating team, are associated with the largest number of new hazards of data-driven PdAM. These hazards are validated in the context of past aircraft incidents that occurred between 2008 and 2017.
Following the analysis of the hazards, we discuss three main challenges for the safe implementation of data-driven PdAM: (i) guaranteeing the reliability of the new data-driven technologies of PdAM, (ii) designing intuitive communication platforms that facilitate communication between agents under PdAM, and (iii) building the agents’ trust in the new data-driven PdAM process. These challenges point to future research directions for the successful implementation of data-driven PdAM.