Methodology for Evaluating Risk of Visual Inspection Tasks of Aircraft Engine Blades

Risk assessment methods are widely used in aviation, but have not been demonstrated for visual inspection of aircraft engine components. The complexity in this field arises from the variety of defect types and their different manifestations at each level of disassembly. A new risk framework was designed to include contextual factors. Those factors were identified using Bowtie analysis to be criticality, severity, and detectability. This framework yields a risk metric that describes the extent to which a defect might stay undetected during the inspection task, and result in adverse safety outcomes. A simplification of the framework provides a method for go/no-go decision-making. The results of the study reveal that the defect detectability is highly dependent on specific views of the blade, and the risk can be quantified. Defects that involve material separation or removal, such as scratches, tip rub, nicks, tears, cracks, and breaking, are best shown in airfoil views. Defects that involve material deformation and change of shape, such as tip curl, dents on the leading edges, bends, and battered blades, have lower risk if edge views can be provided. This research proposes that many risk assessments may be reduced to three factors: consequence, likelihood, and a cofactor. The latter represents the industrial context, and can comprise multiple sub-factors that are application-specific. A method has been devised, including appropriate scales, for the inclusion of these into the risk assessment.


Introduction
Gas turbine aircraft engines are inspected at regular intervals, or after a known incident (e.g., bird strike). While maintenance, repair and overhaul (MRO) of engines is crucial for flight safety, it is predominantly performed by human operators who are prone to error. The International Air Transport Association (IATA) reported that maintenance and inspection errors are among the top three causes of aircraft accidents and that in 26% of the cases, a maintenance-caused event started the event chain [1,2]. According to Federal Aviation Administration (FAA) records, maintenance was involved in 27.4% of fatalities and 6.8% of incidents [1]. Furthermore, it was reported that component or structural failures are the primary root cause of maintenance-related incidents and that they most likely occur in the engine.
In aircraft engine maintenance, the first inspection made is by borescopic means, whereby a borescope is inserted into the engine (typically while on the wing) and the rotating parts are inspected. The inspector has to examine each blade for indications of damage. If a defect is found, it has to be recorded and quantified as to acceptability. The inspection has to be made in a difficult environment with several constraints, such as limited space for the operator, restricted views, restricted lighting, limited pixel resolution, boredom, distraction, and time pressure [3].
A sample borescope image is presented in Figure 1. Borescopy is the first in a sequence of several inspection activities. In a semi-manual, operator-dependent, time-consuming, and tedious procedure [4], the inspector has to identify a wide variety of defects, with different severity degrees and locations under different angles [5]. This activity informs the decision whether to let the engine continue flying, reduce the inspection interval to check for propagation of the defects over time, or remove it for maintenance. Committing the engine to a (costly) teardown process enables other means of inspection, both visual and other non-destructive testing (NDT) methods, to be applied.
Once an engine is committed to repair, it is disassembled and further visual inspections occur. While the initial borescope inspection is limited in what can be seen, once the engine is disassembled, the blades can be visually inspected individually and in better conditions. The most detailed inspection is the examination of disassembled parts. This is called the 'on-bench' or 'piece-part' inspection, and is the subject of the present paper. This inspection is done visually, and allows the blades to be individually examined from any angle, with excellent lighting, and the use of optical magnification if warranted (refer to Figure 2). At this point, the decision will be made as to whether the blade may be returned to service in its current condition, or diverted to the repair or scrapping processes. The inspection process is prone to human error and subjectivity, as well as to a lack of accuracy, reliability, consistency, and repeatability, among other factors [6-9]. Missing a defect (false negative) during an inspection task anywhere in this chain has the potential to lead to catastrophic engine failure, hence risk of damage to the engine and fuselage [10,11], as well as to severe harm to passengers or even fatalities [12,13]. On the other hand, a false positive detection may commit the engine to a needless repair process that is costly in time and money. Hence, visual inspection tasks introduce key decision points into the maintenance process, with far-reaching consequences.
Furthermore, engine maintenance is a complex, time-consuming, and expensive task and one shop visit can create costs equivalent to the engine list price [5,14-16]. This creates competitive pressure between MRO service providers, whereby the one with the lowest price and shortest turn-around time wins the order.
Increasing fleet sizes, heavy growth of the MRO industry, and a concurrent shortage of trained personnel create additional time pressure to meet the demand [17]. This is highly critical, as time pressure must not have any negative impact on the inspection quality, which affects passenger safety. It further contributes to human error and the risk of missing a critical defect.
Since 90% of inspections in aircraft maintenance are done visually [8,14,18], there is a need to understand the risk inherent in such inspection processes. In this paper, we develop a risk framework specifically for visual inspection tasks of geometrically complex parts such as blades. While the subject under examination is inspection of blades, it should be pointed out that much of the safety of aviation systems depends on the inspection vigilance of human operators during manufacture, operation, and maintenance of the technical system.

Literature Review
We briefly review the risk management literature and the related question of how risk might be determined in the specific activity of visual inspection.

Extended Risk Constructs Using n-Tuples
The term 'risk' is somewhat ambiguous as it has multiple definitions and methods for determination [19]. Thus, depending on the context, risk may be: uncertainty, potential loss, consequences, probability of an undesired event, or effect of uncertainty on objectives. It is also often a combined metric, e.g., Consequences or damage + Uncertainty, or Probability + Consequences.
Over time, the interpretation of the ISO 31000 risk management [20] concept has tended to dominate. This standard defines risk as the 'effect of uncertainty on the possibility of achieving the organization's objectives'. Furthermore, it provides a specific mechanism to determine risk. It partitions risk into two dimensions: consequence, and likelihood of that occurring. Then, these are combined into a risk metric. The combination may be done in two ways: (a) simply multiplying consequence and likelihood, if both are numeric scales, or (b) using a correlation matrix or map. Thresholds for acceptable risk are then applied, to categorise the risks and prioritise them based on acceptability, practicality, response time, enforceability, durability, cost-benefit ratio, compliance with legislation, and possible treatment. The expected efficacy of the treatments can be estimated by calculating the 'residual' risk after treatment, and this too can be evaluated for acceptability. The results are tabulated in the 'risk register'. The overall outcome of the process is that it provides a methodology whereby organisations can show due diligence towards a systematic assessment of risk, and justify rationing resources to treat the more important risks. Note that in ISO 31000, the concept of risk includes both threats and opportunities, hence the treatments are prevention of threats and capture of opportunities respectively, but here we are primarily interested in the threat component.
The ISO 31000 construct of Consequence × Likelihood has the benefit of providing a common method for the determination of risk. Nonetheless, it has limitations. In the general application of risk management, the scales are almost always subjective and highly variable between organisations [21,22]. Different analysts could even estimate different outcomes with the same scale, contributing towards the inconsistency. Even technical systems, like piece-part inspection, are prone to qualitative assessments of risk as not every quality parameter can be measured and quantified, at least not in real operational settings. Moreover, in reality, failure outcomes are not single consequences but rather a chain of progressive deterioration of the system with multiple opportunities to intervene. The method does not accommodate this: it assumes an equifinality to consequence (and likelihood) that may be unjustifiable in complex failure sequences. Finally, the two factors do not provide sufficient granularity for many engineering situations.
The literature shows diverse ways of incorporating other conditional factors. A common but highly variable approach is to extend the risk metric to encompass a third or more factors. The most common ones are vulnerability [23-26], detectability [27-30], manageability [29,31-35], and time [29,36-39]. Other more application-specific factors that have been added to the risk equations include preventability [35], layers of protection [40], resilience [41], volatility [39], experience [42], knowledge [43-45], social impact [46] and coping capacity [47,48], performance-shaping factors [49], and human factors [50]. Many of these methods start by defining the consequences and likelihood and then adding the other factors. Other methods augment this by instead focusing on the events leading up to the consequences, typically using a correlation approach such as Quality Function Deployment (QFD) and related extensions [51-54]. Invariably, the objective is to quantify the risk for a specific situation or context. A detailed list of risk equations with a contextual variable is shown in Table 1. As this shows, there is no general formulation that captures all the different approaches.

Specific Frameworks for Visual Inspection Risk
Several approaches attempt to incorporate detection in the risk assessment of failure. The most common approach is the Failure Mode and Effect Analysis (FMEA) [57,58], which analyses the components of a system and how these can fail, and assigns a consequence to each of them. On an operational level, the failure modes are typically processes, thus the resulting framework is Process Failure Mode and Effect Analysis (PFMEA) [58,59], while in the design stage, it is called Design Failure Mode and Effect Analysis (DFMEA) [58,60]. Some attempts have been made to include the criticality and detection of failure modes into the model, leading to the Failure Mode Effect and Criticality Analysis (FMECA) [57], and Failure Mode Effect and Detection Analysis (FMEDA) [61], respectively.
Risk-based inspection (RBI) is a process of developing a risk analysis scheme of inspection. It may include an assessment of the likelihood (probability) of failure due to flaws, damage, or deterioration or degradation, along with an assessment of the consequences of such failure.
A key concept is that of the Probability of Detection (POD), which was originally developed for the US Air Force focusing on turbine engine inspection [62]. The typical area of interest is cracks and flaws, for which the conventional parameters of interest are crack length and sometimes crack depth. However, there does not appear to have been any consideration of other parameters such as other defect types, image quality, inspector expertise, etc., nor has the POD concept evolved into a broader risk management framework. Probability of detection in supply chains has been included in the risk equation by Griffis and Whipple [30], but not for defect detection in manufactured parts.
While there has been some prior work on visual inspection, the literature is sparse on the application of risk frameworks to this area. For example, human factors were studied for borescope operators [63], but without quantifying the risks. Where risk assessments have been used, they have addressed the implementation of new systems, rather than the inspection decisions themselves. For example, the authors of [64] provided a risk assessment for the implementation of structural health monitoring, and similarly, the authors of [65] for a detection procedure.

Representation of Inspection Risk for Decision-Making
All tabular approaches have in common that they multiply the failure likelihood, consequence, and any additional factors. To perform the calculation, numeric scales are required at input and output [21]. Ideally, these scales would represent consequences in a robust variable (such as economic utility, though even that fails to capture all dimensions of value), likelihood as a probability, and the output risk as a number to which people could relate. In practice, there are seldom sufficient data to quantify the input variables, and instead an ordered scale (e.g., from rare to almost certain) is used to which numerical values are assigned (e.g., 1 to 5) [28]. The numbers are then used in the product function, and the resulting risk score (RS), also called risk rating (RR), risk priority number (RPN), or risk value (RV), is mapped back to a descriptive scale (e.g., from low to extreme). All these processes introduce subjective judgements [22]. This, plus the variability in scales used, makes it difficult to compare risk assessments from different organisations. Pons [21] showed that this is also problematic for safety assessments, especially considering the need to reconcile these with legislative requirements (which vary across jurisdictions).
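The tabular procedure described above can be sketched in a few lines. This is a minimal illustration only: the descriptor names, the 1-to-5 values, and the band thresholds are assumptions chosen for the example and are not taken from any of the cited standards.

```python
# Hypothetical ordinal scales (1 to 5); descriptors are illustrative only.
LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "almost certain": 5}
CONSEQUENCE = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "catastrophic": 5}

# Assumed thresholds for mapping the score back to a descriptive scale.
# Each entry is (inclusive upper bound, label).
RISK_BANDS = [(4, "low"), (9, "moderate"), (16, "high"), (25, "extreme")]


def risk_score(likelihood: str, consequence: str) -> int:
    """Product of the two ordinal scores (the basic two-factor construct)."""
    return LIKELIHOOD[likelihood] * CONSEQUENCE[consequence]


def risk_level(score: int) -> str:
    """Map a numeric risk score back to a descriptive level."""
    for upper, label in RISK_BANDS:
        if score <= upper:
            return label
    raise ValueError(f"score {score} exceeds the largest band")


if __name__ == "__main__":
    score = risk_score("possible", "major")
    print(score, risk_level(score))
```

Every constant in this sketch is a subjective choice, which is the point made in the text: two organisations with different descriptors or thresholds would assign different levels to the same event.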
An important aspect of risk management is the communication of the risks and identification of appropriate means of prevention and mitigation that address those risks. Typically, colour coding is used to highlight different risk levels, which enables a faster reception and identification of critical risks.
Most common risk matrices use a traffic light system to colour risks from low or acceptable (green), through moderate or tolerable (amber), to high or intolerable (red) [66]. Some researchers have added yellow as another colour to their risk matrix, representing low risks in risk assessments that have large values for likelihood and impact, and typically these are non-linear scales [67,68].
In New Zealand and Australia, the handbook for risk management (HB 436:2004) suggests five colours: green, blue, yellow, orange, and red, in ascending order of risk levels [69]. However, there is inconsistency with the order and allocation of these colours. While some follow the Australian and New Zealand standard (AS/NZS 4360) order [70], others follow the order of green-yellow-blue-red [71], and still others start with blue followed by green, yellow, and red [72]. Vose introduced a graded colour scheme with nine colour tones [73]. Only a few risk levels with adjacent RPNs were assigned the same colour. Although the risk heat map consists of nine shades, it contains only four different colours, i.e., green, amber, orange, and red. Since some shades are so similar, it is hard to differentiate the associated risk levels from each other, which does not support the idea of an easier risk perception. As the author stated, it adds more complexity to a tool that was meant to simplify the risk assessment process. It can be concluded that too many colours are detrimental. Vose stated that managers need to know whether to say 'Stop!' or 'Go!' based on the risk involved [73]. Another way of specifying colours is by consideration of the intended audience. In most organisations, there is a progressive escalation of communication about risk depending on the risk appetite of the organisation. Thus, small risks might be treated by the operators, with larger risks progressively escalated to supervisors, managers, executives, board, and external regulators. Hence, the notion of risk might be partitioned not so much into arbitrary colours, but into audiences and stakeholders [21]. Some risk assessment tools provide an icon-based rating, which is beneficial for colour-blind people. One such tool was used for rating the risk of bias in assessments [74]. The different risk levels are as follows, with their associated icons shown in brackets: low (+), moderate (−), serious (x), and critical (!) [74,75].

Gaps in the Body of Knowledge Regarding Visual Inspection
A two-dimensional risk matrix, while simple to understand, does not allow the complex causality to be represented. This has given rise to an extensive literature on methods that multiply consequence, likelihood, and an additional contextual factor that is application-specific. Most studies have included only a single third dimension. The work by Luquetti Dos Santos et al. presents the idea of multiple performance-shaping factors [49]. However, they did not show how these factors could be included into the risk equation. There is a general lack of understanding as to how additional factors can be added to the risk assessment, especially if there is more than one application-specific factor. There are two primary issues.
The first issue is the increasing risk score when adding (multiplying) additional factors. It is difficult to determine the relative importance or weighting of these factors. Often, they are given the same weighting as the likelihood and severity, which potentially leads to distorted results. To solve this, it is necessary to compensate for the additional factor. Thus, there is a need for a standardised approach to include one or more sub-factors into the risk equation, which should be as generic as possible to allow for its applicability to different industries.
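The inflation can be seen in a short worked example. It assumes, purely for illustration, that every factor is scored on a 1-to-5 scale and carries equal weight; the symbols $F_1$ and $F_2$ are hypothetical additional factors.

```latex
\begin{align*}
R_2 &= C \times L,                       & 1 \le R_2 &\le 5^2 = 25,\\
R_3 &= C \times L \times F_1,            & 1 \le R_3 &\le 5^3 = 125,\\
R_4 &= C \times L \times F_1 \times F_2, & 1 \le R_4 &\le 5^4 = 625.
\end{align*}
```

Under these assumptions each added factor multiplies the upper bound by five, so thresholds calibrated for $R_2$ cannot be reused for $R_3$ or $R_4$, and every added factor influences the score as strongly as consequence or likelihood.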
The second issue is that all the risk assessment methods struggle to include factors that are difficult to quantify, such as human factors. While the presence of such factors is generally acknowledged [50,76-79], it remains problematic to quantify them, or even to identify an appropriate scale.
Different risk assessment methods have been applied to the aviation industry and specifically to aircraft maintenance. These include Event Tree Analysis (ETA) [80], Bowtie Analysis [81], Maintenance Factors and Analysis Classification System (MxFACS) [11], and Failure Mode, Effect, and Criticality Analysis (FMECA) combined with Fuzzy Logic, and the 'as low as reasonably practicable' (ALARP) approach [82]. Most commonly, however, a tabular approach (risk register) is used for the risk assessment in the aviation domain, e.g., by the International Civil Aviation Organization (ICAO) [83,84], European Union Aviation Safety Agency (EASA) [85], Federal Aviation Administration (FAA) [86], or Civil Aviation Safety Authority (CASA) [87]. However, these methods are applied in basic form, i.e., multiplying the probability and severity, with no consideration of additional factors affecting this risk score. Moreover, no previous study has examined the inherent risk in visual inspection tasks of aircraft engine components, and the factors that might influence the process and hence the risk.

Research Objective
The research objective was to devise a standardised methodology for evaluating risk, in the specific area of visual inspection. The desired outcome was a generic framework for risk assessment with the following attributes: is clearly structured, can accommodate multiple application-specific factors, and can be applied to any industrial visual inspection task. The proposed framework is then applied to the specific case of visual inspection of defects in gas turbine blades.

Approach
The approach involved the lead author being embedded in MRO for the duration of the project. This experience provided contextual knowledge, and access to expert operators. Several work streams were undertaken, of an overlapping and mutually informing nature. The methodology was therefore developed around the industrial context. The concepts were refined through an iterative process of theory building and testing of face validity in the industrial context. The workflow is presented in Figure 3 and each work stream is subsequently described in more detail.

Work Stream 1: Collection of Specimen Images
A reference set of images was needed for 'type' defects. By 'type' we refer to a specimen that represents a particular category and size of visual defect. The type image presents the defining features of that defect. This is semantically similar to how type is used in biology and within taxonomies. We adopted the taxonomy of blade defects per [88]. We then examined a large number of damaged blades, categorised per the defect taxonomy. Photographs of each defect type were taken with standardised image acquisition and lighting conditions to give a consistent image quality comparable to inspection with a low-magnification glass as available on inspection workbenches in the industry. The image acquisition was done with a Nikon D5200 digital single-lens reflex (DSLR) camera (Nikon Inc., Tokyo, Japan) and a Nikkor 105 mm micro-lens, in a self-built light tent with three 6 W Superlux (Superlux, Auckland, New Zealand) light-emitting diode (LED) ring lights.

Work Stream 2: Design of Risk Framework
Having identified the relevant factors for visual inspection, it was then necessary to design a conceptual framework to include these into a risk metric.

Work Stream 3: Identification of Contextual Factors
First, it was necessary to identify the factors involved in the visual inspection task. This was achieved by observation of the inspection process and communication with industry experts. Specifically, a visual inspection task always depends on how well the defect is manifested in the view. Hence, the detectability or manifestation is a factor that needs to be included. See related work with Bowtie analysis [89]. We identified three primary factors: criticality, severity, and detectability. Criticality is the importance of the defect type to be detected before the part is released back to service. Severity in turn describes the characteristics of the defect shape and probability of propagation towards a severe outcome. The manifestation represents the detectability of the defect in the present level of inspection. There is a correlation between the three factors.

Work Stream 4: Integration of Factors into an Inspection Risk Calculation and Method Validation
The relationships between the three factors were analysed and expressed using a three-dimensional correlation matrix. Subsequently, the proposed method was applied to a case study in an MRO environment. The traditional two-factor risk equation and the proposed three-factor approach were compared by calculating the two risk scores for the selected defect samples. The most experienced industry expert then assessed the veracity of the two results for each case and determined which one best represented reality. We have high confidence in the experts' ability to detect the ground truth, since the highly regulated nature of the aviation industry ensures that there is a hierarchy of inspection seniority based on passing a personal certification process.

Defect Taxonomy and Specimen Images
The defect taxonomy was adopted from [88]. Specimen images for each defect are shown in Table 2. These represent a subset of a larger collection of images.

A Generalised Model for Cofactors in Risk Assessment
Inspection of the existing methodologies hints at a common underlying structure, whereby a consequence metric is multiplied by a likelihood metric, and then by a variety of other factors. Hence, to some extent, many of the methodologies follow an approximate ISO 31000 construct of risk, though they do not all use the consequence and likelihood metrics in precisely the same way (see Table 1).
Therefore, we propose a general scheme for an extended risk assessment, whereby the basic structure follows the ISO 31000 framework of consequence × likelihood, for continuity of interpretation. To this is appended a third 'cofactor'. Hence:
$$\text{Risk} = \text{Consequence} \times \text{Likelihood} \times \text{Cofactor}$$
We propose that the cofactor runs from halving to doubling the risk, i.e., takes the range:
$$0.5 \le \text{Cofactor} \le 2$$
The cofactor can vary according to the industry and application. Alternative terminologies might be conditional, influence, situational, impact, or correction factors.
The 'cofactor' (X) itself comprises any number of additional 'contextual' factors CF1, CF2, CF3, etc. These are determined based on the industrial context. The relationship between the cofactor and the contextual factors is determined by a correlation matrix. This calculation methodology is illustrated in Figure 4. Multiple contextual sub-factors can be assimilated in the cofactor, and for each application, it would be necessary to determine how to do this, i.e., the algorithm need not be fixed.
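One possible way of assimilating several contextual factors into a single cofactor is sketched below. The combination rule is an assumption made for this illustration and is not the correlation matrix of Figure 4: each score is normalised, the scores are averaged, and the result is interpolated geometrically between 0.5 and 2 so that a mid-scale average leaves the risk unchanged. The function name, the 1-to-3 default scale, and the convention that a higher score means higher risk are likewise hypothetical.

```python
from itertools import product

X_MIN, X_MAX = 0.5, 2.0  # cofactor range: halving to doubling the risk


def cofactor(scores, scale_min=1, scale_max=3):
    """Map contextual-factor scores to a cofactor in [X_MIN, X_MAX].

    Assumed rule (not from the source): average the normalised scores and
    interpolate geometrically, so a mid-scale average gives a neutral 1.0.
    Higher scores are taken to mean higher risk.
    """
    if not scores:
        raise ValueError("at least one contextual factor is required")
    for s in scores:
        if not scale_min <= s <= scale_max:
            raise ValueError(f"score {s} outside {scale_min}..{scale_max}")
    span = scale_max - scale_min
    t = sum((s - scale_min) / span for s in scores) / len(scores)
    return X_MIN * (X_MAX / X_MIN) ** t


# Tabulate the rule for three factors as a three-dimensional lookup,
# standing in for a correlation matrix agreed with domain experts.
LOOKUP = {cf: round(cofactor(cf), 2) for cf in product(range(1, 4), repeat=3)}

if __name__ == "__main__":
    for key in [(1, 1, 1), (2, 2, 2), (3, 3, 3)]:
        print(key, LOOKUP[key])
```

In practice the lookup values would be set by expert judgement for each application rather than by a formula; the tabulated form simply shows how any such rule can be stored and applied consistently.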
We interpret consequence as the harm or damage outcome to the overall system. Thus, the other terms for consequence are severity, impact, and loss. The likelihood describes the chances that an event results in these negative outcomes. Thus, in the example of Failure Mode and Effect Analysis (FMEA), we would interpret its severity as a type of consequence, the probability as a likelihood metric, and the detection as a cofactor for the covertness of the failure mode. The same applies to the Failure Mode Effect and Criticality Analysis (FMECA), whereby the criticality is interpreted as the cofactor.
The consequence and likelihood scales are not always continuous scales, but rather have discrete steps. Most frameworks have a five-step scale for consequence and likelihood. Hence, for consistency, we propose that the maximum product of consequence and likelihood without the cofactor shall be about 50, however that is arranged. Thus, for a context involving inspection of a defect, the risk scales would be as shown in Table 3. The consequence represents any adverse outcome that could occur if a defect stays undetected and propagates. The likelihood in turn describes the occurrence rate of such a negative outcome. The contextual factor represents variables that may influence the outcome (inspection accuracy in the present case). Ultimately, the contextual factor adjusts the risk score based on the reliability of the inspection. The resulting risk score can range from 0.5 to 100. An overview of the risk scores and associated risk levels is presented in Table 4 and visualised as a three-dimensional risk matrix in Figure 5.
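The stated bounds can be checked with a minimal sketch of the three-factor score. The function name and the argument checks are assumptions for illustration; the scale values of Table 3 are not reproduced here, so the example uses a consequence of 1 to 5 and a likelihood of 1 to 10 as one hypothetical arrangement whose maximum product is 50.

```python
def extended_risk_score(consequence, likelihood, cofactor, max_base=50):
    """Return consequence x likelihood x cofactor, enforcing the stated bounds.

    The base product may not exceed max_base (about 50 in the text) and the
    cofactor must lie between 0.5 (halving) and 2 (doubling).
    """
    if consequence < 1 or likelihood < 1:
        raise ValueError("consequence and likelihood must be at least 1")
    base = consequence * likelihood
    if base > max_base:
        raise ValueError(f"base score {base} exceeds {max_base}")
    if not 0.5 <= cofactor <= 2.0:
        raise ValueError("cofactor must lie in [0.5, 2]")
    return base * cofactor


if __name__ == "__main__":
    # Hypothetical scale arrangement: consequence 1..5, likelihood 1..10.
    print(extended_risk_score(1, 1, 0.5))    # lower bound of the range
    print(extended_risk_score(5, 10, 2.0))   # upper bound of the range
    print(extended_risk_score(3, 4, 1.0))    # neutral cofactor: two-factor score
```

With a neutral cofactor of 1 the result equals the traditional two-factor score, so the same base score of 12 becomes 6 when inspection conditions halve the risk and 24 when they double it.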

Contextual Factors in Visual Inspection
In the case of visual inspection tasks, there are multiple contextual factors that need to be identified and then assimilated into a single risk cofactor. Visual inspection of aircraft components is a complex task, since there are numerous internal and external variables, parameters, and conditions that may adversely affect the inspection. Thus, it is difficult to incorporate all factors. We therefore focused on those having the largest influence on the visual perception of the inspector, since the subsequent decision-making process is highly dependent on that perception.
We determined the contextual factors by applying a Bowtie analysis [89], which in turn relied on discussion with experts. Hence, we propose three contextual factors for visual detection:
1. The criticality of the defect if it stays undetected and the part is being released back to service, with the risk of propagating and cascading towards severe damage and harm.
2. The severity and characteristics of the defect shape.
3. The ability to detect the defect with the selected inspection method before the engine returns to service.
All three play a crucial role in the ability to detect a defect and the resulting decision-making. The factors are elaborated below, and the integration into the overall risk equation is shown subsequently. Regarding the scales for the contextual factors, we decided to use scales with scores ranging from 1 to 5 rather than −2 to +2. This is because the combination of multiple negative numbers, e.g., in a product operation, results in confusion of signs in the outcome.

Defect Class Type-Criticality Factor
The criticality level is determined by the risk of propagation during future operation and the ease of repairing the defect.
There are twelve different types of defects that can occur on engine compressor blades [88], each with its own characteristics. Some defects are more critical than others, as they can lead to more severe damage and the propagation is much quicker, which means it can cause damage even before the next routine inspection.
The approach taken was to understand the stress initiators (e.g., sharp bottoms) and stress pathways in the material based on the loading occurring in service. Since the depth of some surface defects, such as corrosion, on this type of blade is sometimes negligible and the blades may not even fail, the risk of any negative outcome other than efficiency loss is relatively low. Corrosion is also not a common type of defect found on this blade material.
The criticality scale is shown in Table 5 below. Defects with a high criticality were rated with a score of three, while critical defects were scored with two points, and less critical ones received a score of one. The proposed criticality rating (Table 6) is based on the risk of failure and the potential of resulting in catastrophic outcomes. This includes consideration of several defect attributes. For instance, the location of where the defect commonly occurs plays an important role, i.e., defects on the edges are more critical than defects on the centre of the airfoil. Furthermore, the characteristics of the defect shape are taken into account, i.e., sharp defects propagate faster to more severe defects than smooth round bottom defects. The most critical defects involve material separation, while the critical ones are characterised by material deformation. Both criticality level two and three defects are typically found on the edges, which increases the severity further (see next section). Criticality level one defects however are typically found on the airfoil, the least critical blade area. The only exception is tip rub, which is caused by elongation of the blade under centrifugal forces during operation and is hence not foreign object-related. Since it is a known effect, it has been considered in the design phase when determining the life cycle of the part and thus the criticality is considered as being low despite some material deformation and removal.
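The scoring rule described in this paragraph can be expressed as a small function. This is a sketch under assumptions: the three mechanism classes and their names are a simplification introduced here, and the assignment of individual defect types to a class (Table 6) is not reproduced, so the caller supplies the class. Only the tip rub exception is taken directly from the text.

```python
from enum import Enum


class Mechanism(Enum):
    """Assumed damage-mechanism classes, paraphrasing the text."""
    SEPARATION = "material separation"
    DEFORMATION = "material deformation"
    SURFACE = "surface damage on the airfoil"


# Scores per the text: high criticality = 3, critical = 2, less critical = 1.
SCORE_BY_MECHANISM = {
    Mechanism.SEPARATION: 3,
    Mechanism.DEFORMATION: 2,
    Mechanism.SURFACE: 1,
}

# Exception named in the text: tip rub is a known effect considered in design.
LOW_CRITICALITY_EXCEPTIONS = {"tip rub"}


def criticality_score(defect_type: str, mechanism: Mechanism) -> int:
    """Return the criticality score (1 to 3) for a defect."""
    if defect_type.strip().lower() in LOW_CRITICALITY_EXCEPTIONS:
        return 1
    return SCORE_BY_MECHANISM[mechanism]


if __name__ == "__main__":
    print(criticality_score("crack", Mechanism.SEPARATION))
    print(criticality_score("tip curl", Mechanism.DEFORMATION))
    print(criticality_score("tip rub", Mechanism.DEFORMATION))
```

Keeping the exception list separate from the mechanism rule makes it explicit where expert knowledge overrides the general pattern.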

Propagation Characteristics-Severity Factor
It should be noted that in the area of aircraft engine inspection, there is a periodic inspection process at certain intervals. Hence, the question is not so much whether an undetected defect will be released to service, but rather whether that defect might propagate to catastrophic outcomes before the next service. This complicates things because it requires that regard be given to the type of defect and its propagation characteristics.
Severity represents the potential for the defect to grow to catastrophic outcomes before the next maintenance inspection. It takes into account the size of the defect, categorised by type of defect. For example, a long crack is more likely than a small edge nick to propagate to complete engine destruction before the next inspection, and hence has higher severity. In contrast, a surface defect such as a scratch or corrosion has low severity. When assessing severity, the inspector uses their expertise and training to evaluate how severe the outcomes would be, if the defect under examination was not repaired or replaced.
We propose the concept of retained defect to represent a condition that is not treated but instead passes back into service. The defect could be retained for many reasons: because the inspector judged it to be small (that judgement could be correct or wrong), or it was not visible with the technology available (borescope resolution is limited), the part was too dirty to see it, or it was not apparent from that view. Several of these are covered by the other contextual factors, and hence there is correlation between the factors (explored later).
The proposed severity scoring scale is presented in Table 7 below. A minor defect has a low probability of progressing to severe engine failure before the next engine shop visit (score of one), while a large defect has a high severity (score of three).

Severity
Score | Severity Descriptor | Description
1 | Low severity | Retained defect will not cause an engine failure before the next shop visit (6-12,000 cycles).
2 | Moderate severity | Defect has the potential to increase and propagate towards a more severe damage and has the potential to cause engine failure during test or operation. The latter can lead to loss of engines, aircraft, and even lives.
3 | High severity | Obvious defects that can cause damage to the engine and test cell, or subsequently in service cause severe damage and harm to aircraft, engine or passengers.
The primary categorisation of severity is again by defect type, see Table 8. Note that some defects have no level one severity, as they are either (a) not visually detectable at this level, e.g., micro-cracks, or (b) progressions of other pre-existing defect types, e.g., a torn, battered, or broken blade. Likewise, for some defects, there might not be a level three, e.g., for corrosion, since the blades are made of titanium and the corrosion is merely of the surface deposits.
As the table shows, the more severe defects are visually pronounced, and have a high likelihood of being detected under favourable viewing conditions.

Detection of the Defect-Detectability Factor
Detectability refers to the extent to which the defect is visible to the inspector. The main parameters are (a) the level of disassembly, and (b) the viewing angle that the operator has of the defect.
There are different levels of inspection before and during an engine shop visit, starting with on-wing in-situ borescope inspection, followed by module inspection, and on-bench piece-part inspection. Each inspection method allows additional views of the part compared to the previous one. Hence, there is a relationship between camera view and level of disassembly. The higher the level of disassembly, the higher the part exposure, and hence the more camera views are possible. Some defects are simply not visible from certain directions or at certain stages of disassembly.
Defects vary in shape and appearance, and hence there is a need to better understand which views are beneficial for each type of defect. An unfavourable view may cause even an expert inspector to miss an important defect. The problem is even more critical when the level of expertise is low and the defect is not presented in the best possible way.

Representative Blade Views
Eight representative blade views were chosen so that all relevant areas that need to be inspected during visual in-situ inspection of engine blades are covered. The designation of those views is: leading edge 1 (LE1), leading edge 2 (LE2), concave airfoil 1 (CC1), concave airfoil 2 (CC2), trailing edge 1 (TE1), trailing edge 2 (TE2), convex airfoil 1 (CX1), and convex airfoil 2 (CX2). For better understanding, the acronyms of the relevant views and the viewing directions towards the blade are shown in Figure 6a,b below. The diagram serves only to illustrate the different views. Since the airfoil is curved and twisted, a simplified, not-to-scale representation was chosen. The acronyms are used in the following sections.
The idea of detectability is included in the literature, though not specifically applied to visual orientation as it is here. For example, Youssef et al. defined the detectability as 'the likelihood of discovering and correcting a hazard or failure mode' [28]. During inspection of parts, either in the manufacturing process or during a maintenance repair and overhaul (MRO) process, the viewing angle of either the camera or the human eye and the illumination have a major effect on the appearance of anomalies and defects. Lee et al. [90] previously discussed this phenomenon in the manufacturing process of injection-moulded parts. Detection of defects, such as cracks, scratches, or finishes, is dependent on the viewing angle and incidence of light. Likewise, Zhang et al. [91,92] emphasised the same effects during inspection of highly specular reflective (HSR) surfaces, such as those of chrome-plated parts. Reflection on engine blades is less common since these parts, made of titanium and some with a ceramic coating, are often discoloured or covered with deposits, and thus illumination might be less critical.
In a manual inspection process, the inspector can move the part relative to the light source and their eyes, but this is not possible with photographs. In borescope inspection the light source and the camera are always in line; this is a design constraint.
There is a correlation between the view that the operator has of the defect, and the type of defect. We approached this as follows.

Defect View Scale
A camera view comparison was made to determine the best views showing the defect under investigation. The evaluation of the view suitability was made based on the perception of industry experts and verified by the actual defect dimensions as visible from that particular view.
We prepared a set of photographs comprising different views of a variety of defects. These images were shown to the inspection experts (N = 2). We then asked them to rate the viewing positions on a scale from most unfavourable (3) to most favourable (1), as shown in Table 9 below. Between them, the experts had 45 years of experience. One expert did the first evaluation, and the second reviewed and approved the scores. It should be noted that the scale is inverse, i.e., a high detectability receives a low score because the risk of missing the defect is low. Likewise, a low detectability results in a high risk score. We added the level of disassembly to the table as this might help industry select the appropriate detectability score.
The verification was done using software that measures the defect size visible on the photograph (in pixels) for each view [93]. The results were then compared and a ranking was made based on the visible defect size.
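The ranking step described above can be sketched in a few lines of Python. This is only an illustration: the function name `rank_views` and the pixel values are hypothetical and are not taken from the study or from the measurement software cited in [93].

```python
def rank_views(visible_pixels: dict[str, int]) -> list[str]:
    """Order view names from the largest to the smallest visible defect size."""
    return sorted(visible_pixels, key=lambda view: visible_pixels[view], reverse=True)


# Hypothetical pixel measurements for one defect (illustrative values only).
measurements = {"LE1": 420, "LE2": 610, "CC1": 150, "CC2": 530}

print(rank_views(measurements))  # ['LE2', 'CC2', 'LE1', 'CC1']
```

A ranking of this kind can then be compared with the expert ratings for the same defect to check whether the two orderings agree.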

Correlation Between View and Type of Defects
The ratings given by the experts for each type of defect and viewing angle are presented in Table 10. It can be concluded from the scores that defects that have a stronger three-dimensional shape on the edges, such as torn or battered blades, can be detected from most angles, whereas surface defects have fewer views from which they can be seen.
Implications are that the view needs to be selected for the type of defect being sought. For example, the best perspective to detect a crack is the CX1 view, but this is relatively poor for bent or battered defects. Generally, it will be necessary to have multiple views. If only one view is possible, then the results identify it as CC2.

Inspection Risk Calculation
To recap, our objective is to determine the multiple sub-factors that make up the cofactor for the risk equation. Having identified three sub-factors for blade inspection (criticality, severity, and detectability), it is now necessary to identify the relationships between them.

Determination of the Cofactor for Blade Visual Inspection
Multiplication of the factors is the most common approach when calculating a risk score. However, other mathematical operations, such as the power law, can be applied as well [47]. In the case of four dimensions, the risk has been calculated by the volume of the pyramid [39]. The literature shows that another way to solve the problem of combining multiple scales can be to use a nomogram. This graphical method involves constructing lines between points on multiple scales, with the intersections then giving the output variable. They were once popular for sizing engineering componentry in an era before computing power, as they obviated the need for the complexity of using slide rules. A novel application of a nomogram to safety has more recently been shown by Amirshenava and Osanloo for mine closure risk [94]. Nonetheless, nomograms do require an explicit algorithm for their construction, even if that is not apparent in the graphical representation. Hence, they were not considered further here. Instead, we proceeded with a correlation matrix between the three factors.
In a first step, the relationship between criticality and severity was represented as a product operation. We justify this on the basis that each of these factors makes the other worse. With a three-level ordinal scale for each, the resulting scores range from 1 to 9 (see Figure 7). The issue that arises with this approach is that different combinations can result in the same overall score. For instance, a non-critical defect of criticality level 1 but with a level 3 severity rating would mathematically result in the same score as a highly critical defect (level 3) of minor severity (level 1). In practice, however, the two are not equivalent, as a critical defect, such as a cracked, torn, battered, or broken blade, always needs to be removed from service, independent of the severity level. This problem becomes even more apparent when adding a third dimension to the contextual factor matrix, as more combinations then yield the same score. Therefore, the scales need to be adjusted for the subsequent calculation.
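The ambiguity of equal scales can be made concrete by enumerating all combinations. The following minimal Python sketch assumes the two three-level scales described above; the variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

LEVELS = (1, 2, 3)  # equal three-level ordinal scale for both factors

# Group every (criticality, severity) pair by its product score.
pairs_by_score = defaultdict(list)
for criticality, severity in product(LEVELS, repeat=2):
    pairs_by_score[criticality * severity].append((criticality, severity))

# Keep only the scores that more than one combination can produce.
collisions = {
    score: pairs for score, pairs in pairs_by_score.items() if len(pairs) > 1
}

for score, pairs in sorted(collisions.items()):
    print(score, pairs)
```

Three of the six distinct scores (2, 3, and 6) can each be reached by two different combinations, including the level 1 × level 3 versus level 3 × level 1 case discussed above.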
The solution we propose is to adjust the weighting. Thus, the criticality, being the most important factor, is given the largest weighting, followed by the severity and detectability. Hence, criticality is rated 1 to 10, severity 1 to 5, and detectability 1 to 3. Therefore, the Contextual Factor Score becomes:
$$\text{Contextual Factor (CF) Score} = \text{Criticality (defect type)} \times \text{Severity (defect size)} \times \text{Detectability (view)} \tag{3}$$
Similar to the risk matrix, the three contextual factors influencing the inspection and decision-making process can be visualised as a three-dimensional matrix (Figure 8). The scores can range from 1 to 150. The higher the score resulting from Equation (3), the higher the influence on the inspection risk. An extended colour scheme was introduced to visualise the different levels, ranging from minor (green) through low (yellow), moderate (orange), and high (red) to extreme (burgundy). The resulting cofactor that is fed back into the generic risk equation (Equation (1)) can be retrieved from the right column of the correlation matrix shown in Table 11.
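As a minimal sketch, the contextual factor score can be computed as follows. The function name `cf_score` and the range checks are illustrative assumptions; only the scale ranges and the multiplication come from the text.

```python
# Scale maxima as stated in the text: criticality 1-10, severity 1-5, detectability 1-3.
SCALE_MAX = {"criticality": 10, "severity": 5, "detectability": 3}


def cf_score(criticality: int, severity: int, detectability: int) -> int:
    """Contextual factor score: the product of the three weighted scores."""
    scores = {
        "criticality": criticality,
        "severity": severity,
        "detectability": detectability,
    }
    for name, value in scores.items():
        if not 1 <= value <= SCALE_MAX[name]:
            raise ValueError(f"{name} must be between 1 and {SCALE_MAX[name]}, got {value}")
    return criticality * severity * detectability


print(cf_score(1, 1, 1))   # 1, the lowest possible score
print(cf_score(10, 5, 3))  # 150, the highest possible score
```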

Case Study
The proposed framework was tested and validated by applying it to a case study of high-pressure compressor (HPC) blades of gas turbine engines. This includes sample blades with different types of defects and severity levels at different inspection levels (Figure 9a-d).
First, the traditional two-factor method was applied to calculate the overall risk score (RS) for all four blade samples by multiplying the consequence and likelihood. This resulted in the following risk scores: Blade (a) and (b) RS = 8 (2 × 4), blade (c) RS = 5 (1 × 5), and blade (d) RS = 6 (2 × 3).
Subsequently, the risk scores for the same blades were determined using the new approach, which takes into account the contextual factors (criticality, severity, and detectability). The defect on the blade shown in Figure 9a,b was classified as a nick, and thus the criticality score was 5 (level 2). Its deformation is quite severe, and it therefore received a severity score of 5 (level 3). Both the criticality and severity scores are the same for images (a) and (b), since it is the same blade. However, Figure 9a shows the borescope representation, where the defect is quite difficult to detect, whereas in the piece-part representation the defect can hardly be missed (Figure 9b). Therefore, Figure 9a has a detectability score of 3 (level 3), whereas Figure 9b has a score of 1 (level 1).
Multiplying the contextual factors following Equation (3), the resulting contextual factor scores are 75 and 25 for Figure 9a,b, respectively. Applying the correlation matrix (Table 11), the cofactor is 2.0 for Figure 9a and 1.0 for Figure 9b. This shows not only that the overall risk scores differ from those of the traditional risk approach, but also that the risk can differ for the same blade and defect at different levels of inspection (borescope vs. piece-part).
The conditions shown in Figure 9c are two dents on the airfoil and therefore the criticality factor equals 1. The defect is presented in the CC1 view (perpendicular to the surface). Since this is the ideal perspective for airfoil dents, it is highly detectable and receives a detectability score of 1. The defect is of moderate size and thus equals severity level 2. The CF score is 2 and the resulting cofactor score is 0.5. This indicates that the defect does not affect the safety of aircraft operation, which was confirmed by the industry experts. If such a defect is detected during borescope inspection, the engine does not require a costly teardown or further inspection. This is the result of the proposed framework, whereas the traditional approach would have resulted in a risk score twice as high and possibly a different maintenance and disassembly decision.
In some cases, such as the blade presented in Figure 9b, the cofactor score equals 1.0 and the proposed three-factor approach results in the same risk score as the traditional two-factor method. This demonstrates that the traditional risk approach is still accurate in some cases (where the cofactor equals 1.0).
We wanted to show that the proposed inspection risk approach is also applicable to other engine blades and parts and therefore included a turbine blade in the case study (Figure 9d). The defect is a broken-off corner of the blade with a criticality score of 10 (level 3), a severity score of 5 (level 2), and a detectability score of 1 (level 1). The resulting contextual factor score is 50 (10 × 5 × 1), which translates to a cofactor score of 1.5. This leads to an overall risk score of 9 (2 × 3 × 1.5). While this particular example shows a defect type that is common on both HPC and HPT blades, there are other defects that only occur on HPT blades and that need to be added to the score lists. Nonetheless, it is possible to apply the same principles and risk equations to any other engine parts.
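The arithmetic of the case study can be retraced with a short Python sketch. The class and method names are illustrative assumptions, and the order of consequence and likelihood within each bracketed product is assumed (it does not affect the result). The cofactors are taken directly from the text rather than looked up, since the thresholds of Table 11 are not reproduced here. The three-factor score for Figure 9a is not stated explicitly in the text; it follows from the same multiplication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BladeCase:
    label: str
    consequence: int
    likelihood: int
    cofactor: float  # read from the correlation matrix (Table 11) in the paper

    def traditional_score(self) -> int:
        """Two-factor risk score: consequence x likelihood."""
        return self.consequence * self.likelihood

    def contextual_score(self) -> float:
        """Three-factor risk score: consequence x likelihood x cofactor."""
        return self.traditional_score() * self.cofactor


CASES = [
    BladeCase("Figure 9a", 2, 4, 2.0),
    BladeCase("Figure 9b", 2, 4, 1.0),
    BladeCase("Figure 9c", 1, 5, 0.5),
    BladeCase("Figure 9d", 2, 3, 1.5),
]

for case in CASES:
    print(case.label, case.traditional_score(), case.contextual_score())
```

For Figure 9d this reproduces the overall risk score of 9 given above, and for Figure 9c it gives half the traditional score, in line with the discussion of the airfoil dents.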

Go/No-Go Matrix
In some cases, a five-level scoring system is not effective and a simplified version might be required, for example in situations where a decision has to be made. This is readily accommodated by reducing the decision factor categories down to two, namely 'go' and 'no-go'. The threshold can be adjusted based on the company's risk-appetite. 'Go' means that the inspection conditions are good enough to make a sufficiently reliable decision as to whether or not to strip down the engine and perform a more detailed inspection. 'No-go', in contrast, shows that the decision cannot be made with certainty. The relationship between the decision factor score and decision output is shown in Table 12, and the resulting three-dimensional go/no-go matrix is presented in Figure 10.

Table 12. Relationship between decision factor and decision output.
Risk Score | Decision Output | Colour
1-29 | Go | Green
 | No-go | Red

Figure 10. Three-dimensional go/no-go matrix.
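A go/no-go rule of this kind reduces to a single threshold comparison, as the following Python sketch shows. It assumes that the score being classified is the 1 to 150 contextual factor product and that every score above the 'go' band counts as 'no-go'. The function name `decision` and its default bound are illustrative. The bound is a parameter because the threshold depends on the company's risk appetite.

```python
GO_UPPER_BOUND = 29  # upper end of the 'go' band (1-29) listed in Table 12


def decision(score: int, go_upper_bound: int = GO_UPPER_BOUND) -> str:
    """Classify a score as 'go' or 'no-go' using a single adjustable threshold."""
    if score < 1:
        raise ValueError("score must be at least 1")
    return "go" if score <= go_upper_bound else "no-go"


print(decision(25))                     # go
print(decision(75))                     # no-go
print(decision(25, go_upper_bound=20))  # no-go under a stricter risk appetite
```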

Optimal Viewing Perspectives
When working in an environment with time pressure, such as in aviation maintenance, it is often not practical to take several recordings of a single part, especially if there are hundreds of parts that need to be inspected. Hence, there was an interest to identify the most favourable views to capture as many defects with as few perspectives as possible.
The above method lends itself to this. The multiple views can be analysed using the risk assessment to give a score for criticality and detectability. The view with the highest score is best. This process can be repeated for the range of defect types, to determine overall scores, or scores for specific types of defect (e.g., nicks). This information may then be used to determine which views to prioritise as part of the organisational quality management system. Results for the blades in this dataset are shown in Table 13. The results show that when choosing a minimum set of views to cover all areas of the blade, CC2 and CX2 combined have the greatest potential to cover all defect types, locations, and severity levels. CC2 combined with CC1/LE2/TE2 would have given a higher total score, but would not have covered all blade regions.
In some cases, it is known that an engine has experienced a particular event, e.g., bird strike, 'lucky' coin ingestion, or volcanic ash, in which case it becomes possible, knowing the types of defects that arise, to determine which views would be most effective to view any damage.
For all defects that involve material separation or removal, such as scratches, tip rub, nicks, cracks, and breaking, an airfoil view (CC1, CC2, CX1, and CX2) is beneficial. For all defects that involve material deformation and change of shape, such as tip curl, tears, dents on the leading edges, and bent or battered blades, edge views (LE1, LE2, TE2) are beneficial. For all defects that occur on the airfoil surface, such as corrosion, scratches, and dents on the airfoil, a concave view (CC1 and CC2) and, depending on the severity, also a convex view (CX1 and CX2) can provide good detection.

Dents
The appearance and detectability of dents are highly dependent on where the defect is located on the part and on the level of severity. For example, dents on the leading edge are never entirely on the vertex of the edge. The foreign object that hit the blade continues its pathway on either the convex or the concave side of the blade. View LE1 is always good but, depending on the trajectory of the foreign object, sometimes view LE2 is the best, and at other times perspective CC2 is the best to see the damage. In contrast, the detectability of a dent on the airfoil depends strongly on its severity.

Tears
The detectability of tears depends on the location and severity. In some cases, the tear evolved in a way that it is not detectable (as a tear) but rather looks like a nick. This is because a nick often pre-exists and evolves into a tear over time. Thus, the defect may not be detectable in the TE1 view, although this is the position where the defect is closest to the camera sensor. In other cases, this is the best view. Hence, it is difficult to determine a standard view here. We decided to choose a "good view" (LE2) as a somewhat ideal view, knowing that in some cases there might be a better view, but this is the one from which the defect can always be seen. The same applies to LE1.

Nicks
The situation with nicks is similar to tears. The best view highly depends on how the defect was formed and its severity level.

Summary of Outcomes
This work provides a universal framework for risk assessments that incorporates a third dimension into the risk equation, the so-called cofactor. This factor represents the industrial context, and can comprise multiple sub-factors that are application-specific. A method has been devised, including appropriate scales, for the inclusion of these into the risk assessment. A traffic light colour scheme was applied to visualise the different risk levels.
A simplification of the framework provides a method for go/no-go decision-making, which is more relevant to industry practitioners, who need a more black-and-white approach, since any grey will cause inconsistency and potentially an incorrect decision.
In principle, the proposed method allows an unlimited number of contextual factors to be added to the risk equation. The contextual factors are summarised (normalised) into one cofactor using a correlation matrix. This is necessary to prevent the total risk score from increasing without bound and to prevent additional contextual factors from outweighing the likelihood and consequence components, i.e., the number of contextual factors does not affect the risk score.
The proposed framework was applied to a case study in the aircraft engine maintenance domain. The contextual factors relevant for the specific case of visual inspection of engine blades were identified as being criticality, severity, and detectability. Appropriate scales were devised and the new framework was validated by experts in the field.
We suggest that factors that affect the risk can have a negative but also a positive effect. For instance, under ideal inspection conditions such as the ones in on-bench inspection, the risk of missing a defect is much lower than, for example, in borescope inspection, where the environment is more difficult and thus the overall risk score needs to be higher.

Implications for Practitioners
The proposed framework was tested and validated in the specific area of visual inspection of gas turbine engine blades. The generic structure, however, allows for application of the risk framework to other inspection and quality assurance tasks within and outside the aviation sector. Likewise, it is applicable to any other process within the aviation maintenance domain and beyond. It might be of particular interest to other error-prone high-reliability organisations (HROs), including nuclear power, oil and gas, mining, or healthcare [89], as the understanding of risk and safety in those industries is of utmost importance. This generic framework not only allows for the integration of human factors into the well-known consequence × likelihood risk equation, but also supports the standardisation of risk assessment across different organisations and industries.
The go/no-go matrix can be generalised and applied to other applications outside quality assurance, since decisions have to be made in any area. For example, a company might decide whether to invest in a certain product or market based on the inherent risk and the company's risk appetite. In project management, one could think of a project that is exposed to increased risk, where the project manager needs to decide whether the project should continue as planned or be adjusted to the changed circumstances.
One of the most important aspects of risk assessment is the effective communication of the risks [95,96]. If risks cannot be communicated properly, so that stakeholders understand the need to implement new means of prevention and mitigation or reinforce existing ones, and understand which risks are most critical, then the entire risk calculation and assessment is of little use. Thus, applying a traffic light system, or a go/no-go system, although done before, can support such risk communication and make it more understandable to people throughout the organisation. Consistency of quality expectations across an organisation, and between organisations, is important in high-reliability systems.
Adding contextual factors to the risk equation requires the involvement of shop floor staff in the risk assessment process to identify the important factors, since they are the ones with the best understanding of the context. Furthermore, the quantification of those factors often relies on expert judgement, when historical data might be scarce or simply not available. Thus, it is beneficial if the context expert (in our case the visual inspection expert) can perform or at least support the risk assessment and provide an 'as good as possible' estimate. This collaboration between the risk analyst and shop floor staff might have several benefits, including increased awareness and better understanding of the inherent risks of the specific application due to improved communication between the risk analyst and workforce, as well as buy-in for changes to reduce those risks. Potentially, the generic and simple structure may even allow non-risk analysts to perform a risk assessment themselves.

Limitations
The framework proposed here would benefit from further validation. Potentially, this might take the form of a larger study whereby the ground truth was established for multiple blades (by expert assessment), and the risk determined. The detection accuracy from the confusion matrix (false positive/false negative) has not been determined and would be a useful step forward. Alternatively, it could be interesting to track one defective blade through the various work inspection stages, i.e., undertake a longitudinal study of the risk.
The time component of failure has only partly been accommodated in this framework. Some defects may change criticality over time when they propagate towards more severe defects, for example, a nick can become a tear or crack, which can cause a blade to break.
Although one benefit of the proposed framework is that there is no restriction on the number of contextual factors, this is at the same time one of its drawbacks. In this specific case study of visual inspection, three contextual factors were identified. However, in other cases, there might be more or fewer factors, or the factor scales might be different, and thus the contextual factor scores may vary. Therefore, an adjustment of the correlation matrix based on the number of contextual factors and scores is needed. Such a uniform scaling approach has not yet been devised and could be the scope of future work.
The weighting of the scales used for the three contextual factors in this research was adjusted to the specific case. As highlighted in Section 4.4.1, the use of equal and linear scales would have resulted in risk scores that deviate from reality, and thus the scales needed to be adjusted. Applying the risk framework to other industries is possible, but scales may need adjustment. The framework is flexible enough to cope with those variations in scales and the resulting contextual factor scores, provided the correlation matrix is adjusted.
The modified risk equation might be of limited use in situations where the contextual factors or the scales thereof are difficult to quantify. This could be either because of the lack of historic data or the inability of providing estimates by the industry experts.
The three-dimensional risk cube with its traffic light colour coding might be a helpful visualisation and support the communication of risks to stakeholders. While it may work well for the overall risk score comprising three factors (likelihood, consequence, and cofactor), it might not be applicable at the contextual factors level, as there might be more than three sub-factors, which would be difficult to express visually.

Future Research Opportunities
There are several directions for future research. Firstly, there is a knowledge dimension to any risk assessment [44]. We believe this aligns with the expertise and experience of the operator and hence could be included in the risk framework.
Secondly, Hameed et al. introduced a risk-based approach for optimising shutdown inspection and maintenance including human errors and human error probabilities, whereby the authors introduced performance-shaping factors (PSF) related to the performance of the human operator [97]. These include factors such as training, experience, time pressure, work memory, and work environment, among others. Potentially, these ideas might be applied to the current situation.
Thirdly, as mentioned above, sometimes the engine under examination is known to have experienced an incident such as bird strike, in which case, Bayesian methods (conditional probabilities) might be used to determine the likelihood of undetected damage given the views available for inspection.
Lastly, as explained in the Limitations Section, an adjustment of the correlation matrix might be needed if the number of contextual factors or their scale values change. This could be done by providing percentages rather than absolute numbers in the left column of the correlation matrix (Table 11). The contextual factor score levels could then be calculated by multiplying the percentages by the maximum achievable score, which equals the product of the highest value of each scale.
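A minimal Python sketch of this percentage-based scaling is given below. The function name `scaled_thresholds` and the percentage values are hypothetical and are not taken from Table 11. Only the level names, the scale maxima of the present case study (10, 5, and 3), and the rule of multiplying by the maximum achievable score come from the text.

```python
from math import prod


def scaled_thresholds(scale_maxima: list[int], level_percentages: dict[str, int]) -> dict[str, float]:
    """Convert percentage thresholds into absolute contextual factor scores."""
    # Maximum achievable score: product of the highest value of each scale.
    max_score = prod(scale_maxima)
    return {
        level: percent * max_score / 100
        for level, percent in level_percentages.items()
    }


# Hypothetical upper bounds of each level, expressed as a percentage of the maximum score.
percentages = {"minor": 5, "low": 20, "moderate": 40, "high": 70, "extreme": 100}

# Three factors with the scales used in this case study (maximum score 150).
print(scaled_thresholds([10, 5, 3], percentages))

# The same percentages applied to four hypothetical factors on equal three-level scales.
print(scaled_thresholds([3, 3, 3, 3], percentages))
```

With this approach the same percentage column could serve any number of contextual factors, since only the maximum achievable score changes.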

Conclusions
This work makes several novel contributions. The first one is of philosophical nature in that it is proposed that many risk assessments may be reduced to three factors: consequence, likelihood, and a cofactor. The latter represents the industrial context, and can comprise multiple sub-factors.
The second contribution is the identification of three factors that are relevant to visual inspection: criticality, severity, and detectability. Associated with those are a variety of scales for their measurement, and a method to combine them into the risk calculation. This generic framework has the potential to support standardisation in risk assessment to counteract the high variation among different organisations and industries. It has been tested on visual inspection of jet engine blades, but we believe that the principles are adaptable to other inspection tasks. Moreover, due to its generic structure, the risk framework might be applicable to other industries and applications as well.
Another contribution is in the form of a go/no-go matrix that has been devised to support the decision-making process, wherever a clear answer is required and a maybe is not acceptable. One finding worth mentioning is that risk scores do not always have to increase when adding additional factors, but can also decrease, depending on the positive or negative impact those factors have.
A fourth contribution is specific to blades, and the identification of the views for which defects are most visible. In practice, the usefulness of this is more likely to be in its corollary: knowing what incident an engine has experienced, what is the likelihood that any associated defects will be visible? While this paper does not completely answer this question, it provides a method by which it ought to be possible.

Data Availability Statement: The data are not publicly available due to commercial sensitivity.