1. Introduction
Modern aircraft use two types of avionics architecture: federated avionics (FA) and integrated modular avionics (IMA) [
1,
2]. In FA architecture, each system has a dedicated processing unit, and the hardware of the processing unit is not used by any other system. Each processing unit is loosely coupled with the processing units of other functions. FA architecture assumes the use of a separate line replaceable unit (LRU) for each avionics function and interconnection of each LRU with others via point-to-point data buses such as ARINC 429. One of the drawbacks of FA architecture is that it is difficult to expand: any additional LRU requires additional cabling to the existing LRUs. FA architecture also requires long cable runs to connect widely separated LRUs, which increases weight and can lead to reliability problems [
3]. Until the late 1990s, the digital avionics of most civil aircraft, including the Boeing and Airbus families, was based on the FA architecture principle. The basis of the IMA is an open network architecture and a single computing platform, in which system functions are performed by software applications that share common computing resources. In the IMA, the computing system is a set of line replaceable modules (LRMs) such as processor modules, memory modules, network switch modules, and power modules. The design of the LRMs is based on a single standard, which ensures unification and interchangeability. Compared to the FA architecture, the transition to the IMA architecture significantly reduces the weight and cost of aircraft equipment. In the IMA architecture, much of the functionality of a traditional LRU is performed by a dedicated application. Standalone applications are hosted in common IMA modules called core processing input/output (I/O) modules (CPIOMs). Each CPIOM integrates hardware and software functions, hosts them on shared computing and storage resources, and provides an I/O interface to some conventional avionics LRUs. To connect the traditional avionics LRUs, additional LRMs, called I/O modules (IOMs), are also included in the IMA architecture. The LRMs communicate over the avionics data communication network (ADCN), which uses avionics full-duplex switched Ethernet (AFDX) compliant with the ARINC 664 standard. The IMA architecture connects all LRMs to the ADCN, with all information routed to the appropriate LRM through the AFDX switches. The Boeing 777 was the first commercial aircraft with a partially introduced IMA architecture. It uses the IMA concept to implement only some of the important functions previously performed by independent LRUs (flight control, communications control, aircraft condition monitoring). In the A380, the IMA concept was implemented for most functions [
4]. The A380 avionics comprises 30 LRMs and 22 programming functions, which are located in the CPIOMs [
5]. The ADCN communications network comprises 16 AFDX switches and corresponding cables. The switches connect 8 IOMs, 22 CPIOMs, and 50 LRUs with AFDX interfaces [
4,
5]. To reduce the number of connections from the cockpit control panel to avionics cabin computers, the controller area network (CAN) bus is used for the A380 avionics [
6]. Although Airbus began to make extensive use of the CAN bus to reduce cabling on the A380, the popular ARINC 429 bus is still used to connect the radio control panels in the cockpit to the avionics LRUs.
Thus, modern avionics includes a large number of LRMs and LRUs. In turn, each LRU/LRM consists of several printed circuit board assemblies, which are called shop replaceable units (SRUs). Modern LRUs and LRMs usually have built-in test equipment (BITE) for continuous monitoring during flight. In accordance with the design of modern avionics, the following three maintenance levels are typically considered: organizational maintenance level (O-level), intermediate maintenance level (I-level), and depot maintenance level (D-level). The O-level is flight-line maintenance, where an LRU/LRM can be removed and replaced in a short time. The spare parts at the O-level are the LRUs/LRMs. Intermediate maintenance is conducted at a dedicated workshop, where the removed LRUs/LRMs are repaired by replacing the failed SRUs. The spare parts at this maintenance level are the SRUs. Automated test equipment (ATE) is commonly used at the I-level. The failed SRUs are repaired at the D-level, where the spare parts are non-repairable electronic components.
Modern avionics systems are redundant systems. Therefore, each LRU/LRM operates up to a safe failure that is registered by the BITE during flight. The failed LRUs/LRMs are replaced by spare LRUs/LRMs at the O-level maintenance. This maintenance strategy is called a breakdown maintenance strategy. In [
7] it is stated that “intermittent faults are regarded as the most difficult class of faults to diagnose and are cited as one of the main root causes of No Fault Found.” Intermittent faults are mechanical by nature. They occur due to faults of electrical wiring, solder joints, screening braid, connectors, metal lines of integrated circuits, etc. Modern electronic components are reliable, so the intermittent discontinuity between printed circuit board components is becoming the main cost driver [
8]. Conventional ATE cannot detect intermittent faults that cause NFF for the following reasons [
8,
9]:
- (1)
Monitored object parameters are usually tested only once for a short period of time, which usually does not coincide with the intermittent fault event time;
- (2)
Digital averaging, scanning, and sampling of the test signal miss intermittent faults;
- (3)
The removed LRUs/LRMs are tested in the laboratory rather than in the operating environment where faults occur; the electrical wiring interconnect system is also tested in a static environment;
- (4)
Conventional ATE is designed to detect permanent failures, faulty components, and short circuits and breaks in electrical circuits, and is not capable of intermittent fault diagnostics;
- (5)
Intermittent faults that cause NFF do not correspond to any certain failure pattern.
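Reason (1) can be illustrated with a minimal sketch. It assumes, purely for illustration, that intermittent fault episodes start as a Poisson process and that each episode lasts a fixed time; neither assumption, nor the numbers used, comes from the cited references. Under these assumptions, an episode starting at time s overlaps a test window [t, t + w] exactly when t - d < s < t + w, so the probability that the test sees at least one episode is 1 - exp(-rate (w + d)).

```python
import math

def overlap_probability(fault_rate_per_hour, fault_duration_hours, test_window_hours):
    """P(at least one intermittent fault episode overlaps the test window).

    Illustrative assumptions: episode start times form a Poisson process,
    and every episode lasts a fixed duration d. An episode overlaps a window
    of length w iff it starts inside an interval of length w + d.
    """
    exposure_hours = test_window_hours + fault_duration_hours
    return 1.0 - math.exp(-fault_rate_per_hour * exposure_hours)

# Hypothetical numbers: one episode per 10 h on average, each lasting 1 s.
p_short_test = overlap_probability(0.1, 1 / 3600, 60 / 3600)  # 60 s bench test
p_long_watch = overlap_probability(0.1, 1 / 3600, 10.0)       # 10 h of monitoring
print(f"60 s test: {p_short_test:.4f}, 10 h monitoring: {p_long_watch:.4f}")
```

With these hypothetical values, a one-minute test almost never coincides with a fault episode, whereas continuous monitoring over many hours usually does, which is the intuition behind the first item in the list.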
The breakdown maintenance strategy may be inefficient at a high incidence of the NFF events because traditional ATE cannot detect intermittent faults and the same intermittent fault may occur again on the next flight. As indicated in [
10], the estimated level of NFF for avionics is from 20% to 50%, and, moreover, avionics components account for 80.4% of all unconfirmed failures, which, in addition, leads to 26.6% of unscheduled removals of avionics LRUs/LRMs [
11]. The main reason for unconfirmed failures of electronic LRUs/LRMs is the occurrence of intermittent faults in flight [
12]. The negative impact of unconfirmed failures on airline operations includes increased service time, disruption of flight regularity, and an increase in spare LRUs and LRMs, which ultimately leads to an increase in the life cycle cost of avionics systems. Thus, the operational practices in commercial aviation confirm the relevance of assessing the impact of intermittent faults on the life cycle costs of avionics systems as well as the option of selecting a maintenance organization that minimizes the negative impact of intermittent faults.
The growing interest in assessing the cost of avionics maintenance is manifested in a large number of publications on this topic. A stochastic model evaluating the lifecycle cost associated with the application of prognostic health management (PHM) to helicopter avionics was considered in [
13]. However, the impact of intermittent faults on the lifecycle cost of helicopter avionics was not assessed in this model. A mathematical model of a periodically monitored avionic LRU was considered in [
14]. Equations for the availability and expected maintenance costs of redundant avionics systems were derived. However, this model cannot be applied to modern avionics systems, which are continuously monitored. Moreover, the proposed model does not consider the impact of intermittent faults on the maintenance effectiveness indicators. An avionics hardware lifecycle cost model was proposed in [
15]. The model considered such drivers of the lifecycle cost as type of technology, percentage of new designs, weight, volume, design reliability, and operational reliability with respect to permanent failures. Intermittent faults were not included in the model. Different maintenance strategies related to the presence of NFF were considered in [
16]. NFF statistics are analyzed, according to which the percentage of NFF in military avionics is about 70%. The proposed cost equations are empirical and can only be used for a given percentage distribution of confirmed and unconfirmed failures. The structural-logical model of the main cost drivers of the total cost of NFF consequences was considered in [
17]. The conducted analysis shows how the most suitable drivers can be selected to represent aggregate costs due to NFF. This article shows the relevance of the task of quantifying the cost of the NFF effects, but it does not contain mathematical or other models to solve this problem. A maintenance model of a continuously tested single-unit digital system subject to revealed and unrevealed failures, and intermittent faults was considered in [
18]. The derived availability expression is valid only for systems with continuous operation. Avionics systems, by contrast, operate in alternating flights and ground periods. Therefore, the proposed mathematical model cannot be used to estimate the availability of avionics systems. An empirical model to estimate the unit cost per hour depending on the total flight time, aircraft size, load factor, average flight time, exchange rate, number of passengers per airline, and gross domestic product was proposed in [
19]. The proposed cost equation is purely economic in nature and cannot be used to select the optimal option of avionics maintenance.
The following references do not relate to mathematical models of avionics maintenance, which are the subject of this study. However, these references are important for understanding the proposed mathematical model for assessing the operational reliability of avionics LRUs/LRMs. A strategy for detecting intermittent faults in which testing is carried out at regular intervals is analyzed in [
20]. An exponential time distribution to a permanent failure and intermittent fault is assumed. The optimal testing frequency is determined, which maximizes the probability of detecting an intermittent fault upon its first appearance. This model is not suitable for assessing the operational reliability because avionics systems are monitored continuously in flight, rather than periodically. A Markov model of general-purpose reliability with three states for fault-tolerant systems with both permanent failures and intermittent faults was considered in [
21]. A comparison is made between the reliability of the duplex and redundant systems in the presence of permanent failures and intermittent faults. The proposed model can only be used for the systems with continuous operation. A reliability model for determining the optimal frequency of testing intermittent faults in the built-in pipelined processor by the criterion of the minimum cost of testing was considered in [
22]. The Markov model with continuous time and two states is used for probabilistic modeling of intermittent faults. This model can be used only for the systems with periodic testing of the state, operating in continuous mode. A model for studying the reliability of digital systems subject to both permanent failures and intermittent faults was considered in [
23]. The reliability evaluation of a digital system is based on the Markov model containing three states. This model assumes continuous operation of the digital system, and therefore cannot be used to assess the operational reliability of avionics systems. The strategy of imperfect checks to detect intermittent faults in a computer system was investigated in [
24]. The system is checked at periodic times, and a fault is detected at the next check with a certain probability. The expected cost until intermittent fault detection is determined. This model can be used only for systems with periodic testing and continuous operation. The impact of various scenarios for restoring the processor after the occurrence of intermittent faults on its performance was assessed in [
25]. To achieve this goal, the operation of a fault-tolerant multi-core processor is simulated in the presence of intermittent faults, subject to exponential and Weibull distribution. The simulation shows that 40% of the processor faults are intermittent by nature, and 60% are permanent. The results of this study can be used to select the distribution law of time to intermittent fault in digital LRUs/LRMs of avionics systems. Mathematical maintenance models of avionics LRUs subject to permanent failures and intermittent faults were considered in [
26,
27]. However, in order to determine the operational reliability indicators in these models, it is necessary to know the conditional probabilities of occurrence and non-occurrence of intermittent faults during flight. These probabilities are not easy to derive for an arbitrary probability distribution function (PDF) of time to intermittent fault.
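To make the difficulty concrete, the following is a minimal sketch of one simple reading of such a probability: the probability that an intermittent fault occurs during a flight of duration t before any permanent failure, taking the two times as independent. The function names, the rates, and the flight duration are hypothetical and are not taken from [26, 27]. For exponential times the probability has the closed form η/(λ+η)·(1 − exp(−(λ+η)t)); for other laws it generally has to be evaluated numerically as the integral of f_I(u)·S_P(u) over [0, t], where f_I is the density of time to intermittent fault and S_P is the survival function of time to permanent failure.

```python
import math

def p_intermittent_first_exponential(lam, eta, t):
    """Closed form for independent exponential times with rates lam and eta."""
    total = lam + eta
    return eta / total * (1.0 - math.exp(-total * t))

def p_intermittent_first_numeric(pdf_intermittent, sf_permanent, t, steps=10000):
    """Trapezoidal estimate of the integral of f_I(u) * S_P(u) over [0, t].

    Works for any density and survival function that are finite on [0, t].
    """
    h = t / steps
    total = 0.5 * (pdf_intermittent(0.0) * sf_permanent(0.0)
                   + pdf_intermittent(t) * sf_permanent(t))
    for k in range(1, steps):
        u = k * h
        total += pdf_intermittent(u) * sf_permanent(u)
    return total * h

# Hypothetical rates (per flight hour) and flight duration (hours).
lam, eta, flight_hours = 1e-4, 1e-3, 5.0
closed = p_intermittent_first_exponential(lam, eta, flight_hours)
numeric = p_intermittent_first_numeric(
    lambda u: eta * math.exp(-eta * u),   # exponential density of time to intermittent fault
    lambda u: math.exp(-lam * u),         # exponential survival of time to permanent failure
    flight_hours,
)
print(closed, numeric)
```

The numerical routine accepts other densities (for example a Weibull density with shape parameter of at least 1, which is finite at zero), but then no simple closed form is available for comparison.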
The aim of this article is to develop a mathematical model of the operation and maintenance of digital avionics that is convenient for practical calculations and allows the optimal maintenance organization option to be chosen. To this end, mathematical models have been developed to evaluate the operational reliability of continuously monitored LRUs/LRMs and redundant avionics systems over a finite time interval; unlike known models, they take into account the influence of both permanent failures and intermittent faults.
3. Results
In this section, we consider an example of using the mathematical model developed in
Section 2 to choose the best maintenance option for a redundant avionics system.
Let us calculate the ETMC for the air data inertial reference system (ADIRS). The ADIRS provides air data (airspeed, angle of attack, and altitude) and inertial reference information (position and attitude) to the pilots' displays, as well as to other aircraft systems such as the engines, autopilot, flight control system, and landing gear. The ADIRS consists of fault-tolerant air data inertial reference units (ADIRUs) located in the aircraft electronics rack. A typical ADIRU comprises the following SRUs: an air data computer, a multimode receiver, three digital ring laser gyros, three quartz accelerometers, and a power supply module.
The following data are used in calculation of the ETMC for different maintenance options: , , , , , , , , , , , 1000$, , , , , , , , , , , and .
The results of calculation are shown in
Table 1. Since in the third option of maintenance the IFD is used, the rate of intermittent faults in repaired ADIRUs should be significantly reduced. As described in [
37], after repairing the AN/APG-68 airborne radar of the F-16 using IFDIS, MTBUR increased more than threefold. Therefore, it was assumed in the calculations that for the third maintenance option, the rate of intermittent faults is 3 times less, i.e.,
To determine the optimal number of spare ADIRUs, a mathematical model for the operation of the warehouse of spare LRUs based on Markov chains with continuous time was developed. The optimum number of spare LRUs ensured the absence of delays in scheduled flights of aircraft.
The number of spare SRUs was calculated for a confidence probability of 0.99 for each type of SRU. According to the calculations, in the second maintenance option there should be two spare SRUs of each type in the warehouse, and in the third maintenance option only one spare SRU of each type.
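The confidence-probability criterion can be sketched as follows. This is not the authors' Markov-chain warehouse model: it assumes instead that the demand for a given SRU type during one repair turnaround is Poisson distributed, and the demand rates and turnaround time are hypothetical. The sketch picks the smallest stock for which the probability of having enough spares is at least 0.99.

```python
import math

def min_spares(demand_rate_per_hour, turnaround_hours, confidence=0.99):
    """Smallest stock s with P(Poisson demand during turnaround <= s) >= confidence."""
    mean_demand = demand_rate_per_hour * turnaround_hours
    term = math.exp(-mean_demand)      # P(demand = 0)
    cumulative = term
    spares = 0
    while cumulative < confidence:
        spares += 1
        term *= mean_demand / spares   # P(demand = spares), via the Poisson recurrence
        cumulative += term
    return spares

# Hypothetical inputs: 1000 h repair turnaround; the second demand rate is
# three times lower than the first.
spares_higher_rate = min_spares(3e-4, 1000.0)
spares_lower_rate = min_spares(1e-4, 1000.0)
print(spares_higher_rate, spares_lower_rate)
```

Under these illustrative inputs, a threefold reduction in the demand rate is enough to lower the required stock, which is the qualitative effect described in the text.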
As can be seen from
Table 1, the third maintenance option, which uses IFD at the I- and D-level maintenance, has the lowest maintenance cost and is therefore the best. Indeed, the maintenance cost in the third maintenance option is 8.3 times smaller than in the first option and more than 3.5 times smaller than for the second option.
It should also be noted that the unavailability of ADIRS is the same for all maintenance options. This means that unavailability (or availability) of the redundant avionics systems cannot be used to select the best maintenance option of avionics systems because it does not depend on the characteristics of the I- and D-level maintenance.
The results in
Table 1 clearly favor the third maintenance option for airlines. How robust these results are to changes in the rate of intermittent faults is also an important question.
Figure 5 shows the dependence of ETMC on the rate of intermittent faults for each maintenance option. As can be seen in
Figure 5, the first maintenance option is very sensitive to the rate of intermittent faults, since the ETMC increases from
$ to
$ (i.e., almost 28 times increase) when η is changed from
to
. However, when using the third maintenance option and the same change in the rate of intermittent faults, the ETMC increases from
$ to
$, i.e., only a fourfold increase. The second maintenance option is also stable with respect to changes in the intermittent fault rate; however, its ETMC is 2–3 times greater than in the third option.
Figure 6 shows the dependence of the optimal number of spare ADIRUs on the rate of intermittent faults for the first, second, and third maintenance options.
As can be seen from
Figure 6, first, to ensure the regularity of flights, a significantly larger number of spare ADIRUs is required for the first maintenance option than for the second and third options and, second, the optimal number of spare ADIRUs depends very strongly on the rate of intermittent faults for the first maintenance option and only weakly for the second and third options.
4. Discussion
In this study, a mathematical model has been developed to evaluate the availability and expected maintenance cost of continuously tested LRMs/LRUs and redundant avionics systems over a finite interval of time, taking into account the effect of both permanent failures and intermittent faults. Unlike most published studies, such as [
20,
21,
22,
23,
24,
25], the developed mathematical model considers the specifics of the architecture and application of modern avionics systems and can be used for any distribution law of the operating time to permanent failure and intermittent fault. The described model might be further developed to separate NFF events into intermittent faults and false alarms of the BITE. False alarms need separate consideration because their level is fairly high in the aviation industry, reaching 28% of all events associated with alarm signals [
38].
A number of publications point to the topical need for a cost assessment of the consequences of NFF events in electronic systems [
12,
17]. This problem is indeed topical because according to [
39], the average cost of losses due to NFF per aircraft in the US civil aviation was about
$200,000 in 2013. Similar losses exist in military aviation [
40,
41]. So, according to the US Department of Defense, three out of four (75%) weapon systems are subject to NFF-type failures [
41]. NFF costs the US Department of Defense
$2 billion to
$10 billion in losses per year [
8]. As indicated in [
8,
9], the main cause of NFF events is intermittent faults that occur in flight. Therefore, the developed mathematical model for evaluating the average cost of avionics systems maintenance, which takes into account the effect of intermittent faults, is of interest to developers of maintenance programs for avionics systems of civil and military aircraft. The proposed cost model makes it possible not only to assess the impact of intermittent faults, but also to choose the maintenance option that is practically independent of the rate of intermittent faults, thus ensuring the minimum cost of maintenance. The ideas in this paper are illustrated by an example that describes three maintenance options for the redundant avionics system ADIRS. Among these maintenance options, three-level maintenance with IFD at the second and third maintenance levels is of particular interest. Despite additional capital expenditures, this option provides the lowest average maintenance cost, which is 8.3 times less than in a single-level maintenance system and 3.5 times less than in a two-level maintenance system without IFD. This result is explained by the fact that the IFD allows SRUs with intermittent faults to be identified at the I-level and the causes of such faults to be subsequently eliminated at the D-level. As a result, the rate of intermittent faults decreases and, hence, the MTBUR increases. The performed calculations are consistent with the statistical data given in [
37].
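The link between a lower intermittent fault rate and a higher MTBUR can be sketched under a simplifying assumption that is not part of the developed model: times to permanent failure and to intermittent fault are exponential with rates λ and η, and every such event leads to an unscheduled removal, so that MTBUR = 1/(λ + η). The rate values below are hypothetical.

```python
def mtbur(permanent_rate, intermittent_rate):
    """Mean time between unscheduled removals, assuming exponential times and
    that every permanent failure or intermittent fault causes a removal."""
    return 1.0 / (permanent_rate + intermittent_rate)

# Hypothetical rates per flight hour; intermittent faults dominate removals.
lam = 1e-5
eta = 1e-3

# Ratio of MTBUR after a threefold reduction of the intermittent fault rate.
gain = mtbur(lam, eta / 3) / mtbur(lam, eta)
print(f"MTBUR gain: {gain:.2f}")
```

In this simplified picture the MTBUR gain approaches three when intermittent faults dominate the removals and is smaller when permanent failures contribute a larger share.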
Future research directions may include the development of cost models for maintenance options with different combinations of O-, I-, and D-levels of maintenance with the involvement of outsourcing service companies.