Machines
  • Article
  • Open Access

12 March 2026

Research on Root Cause Analysis Method for Certain Civil Aircraft Based on Ensemble Learning and Large Language Model Reasoning

1 Shenyang Institute of Computing Technology Co., Ltd., CAS, Shenyang 110168, China
2 College of Automation, Shenyang Aerospace University, Shenyang 110136, China
3 College of Software, Northeastern University, Shenyang 110819, China
4 Tianjin Jepsen International Flight College Co., Ltd., Tianjin 300399, China
This article belongs to the Section Automation and Control Systems

Abstract

To address the challenges commonly encountered in civil aircraft operating under multi-mode, strongly coupled closed-loop control—namely scarce fault samples, pronounced distribution shift, and root-cause explanations that are easily confounded by covariates—this paper proposes a root-cause analysis method that integrates ensemble learning with constraint-guided reasoning by large language models (LLMs). First, for Full Authority Digital Engine Control (FADEC) monitoring sequences, a feature system comprising environment-normalized ratios, mechanism-informed mixing indices, and multi-scale temporal statistics is constructed, thereby improving cross-mode comparability and enhancing engineering-semantic expressiveness. Second, in the anomaly detection stage, a cost-sensitive LightGBM model is adopted and a validation-set-based adaptive thresholding strategy is introduced to achieve robust identification under highly imbalanced fault conditions. Furthermore, for Root Cause Analysis (RCA), a “computation–reasoning decoupling” framework is developed: Shapley Additive exPlanations (SHAP) are used to generate segment-level contribution evidence, while causal chains, engineering prohibitions, and structured output templates are injected into prompts to constrain the LLM, enabling it to infer root-cause candidates and produce structured explanations under mechanism-consistency constraints. Experiments on real flight data demonstrate that our method yields an anomaly detection F1-score of 0.9577 and improves overall RCA accuracy to 97.1% (versus 62.3% for a pure SHAP baseline). Practically, by translating complex high-dimensional data into actionable natural language diagnostic reports, the proposed method provides reliable and interpretable decision support for rapid RCA.

1. Introduction

As the core power system of civil aircraft, the aircraft engine has an operational reliability that directly impacts flight safety and the operational efficiency of the air transport system [1]. With the aviation industry entering a data-intensive development phase, modern civil aircraft are generally equipped with Full Authority Digital Engine Control (FADEC) systems and quick-access recorders, which continuously collect and record high-dimensional monitoring data during flight. These data cover key information such as airpath performance, mechanical conditions, and control responses, providing a data foundation for the transition from traditional scheduled maintenance to condition-based and predictive maintenance. Meanwhile, emerging paradigms such as digital-twin-enabled fault diagnosis leverage multimodal information fusion to enhance the consistency between data-driven inference and physical behavior [2]. However, aircraft engines have characteristics such as strong thermodynamic coupling, rapid changes in operating conditions, and complex closed-loop control structures. Achieving reliable fault detection, and further providing mechanistically consistent root-cause explanations, remains a key challenge in both engineering practice and academic research.
In existing aircraft engine fault diagnosis research, data-driven methods are a key route. While traditional machine learning and deep learning have improved fault identification accuracy, they still face three major challenges in real-world scenarios [3]. First, real fault samples are scarce, and class distributions are highly imbalanced, leading models trained with standard loss functions to favor the majority class, causing key faults to be missed. To alleviate fault data scarcity and improve robustness under evolving operating conditions, recent studies have explored fault data generation and stream fine-tuning strategies to adapt diagnostic models under domain shifts [4]. Second, the operating state of the engine is significantly influenced by changes in altitude, speed, and environmental parameters. Consequently, the normal operational baseline and statistical distribution of the sensor data shift across different flight modes. This makes purely data-driven models struggle to distinguish between environmental disturbances and system degradation, requiring advanced uncertainty quantification methods [5,6] and domain adaptation mechanisms, including those designed for transient gas-path fault diagnosis [7]. Third, in safety-critical maintenance scenarios, simply providing anomaly detection results is insufficient to support maintenance decisions [8]. Operators require traceable, verifiable, and physically consistent root-cause explanations with propagation paths; case-based reasoning offers a practical solution for preserving diagnostic traceability and supporting decision justification [9], yet many high-performance models still exhibit significant shortcomings in interpretability [10].
To enhance interpretability, various feature attribution methods have gradually been introduced into aircraft engine diagnostics to quantify the contribution of different features to the model’s output [11]. For instance, existing methods such as Integrated Gradients [12] evaluate feature importance by accumulating gradients along a path from a baseline to the actual input, while Deep Learning Important FeaTures (DeepLIFT) [13] allocates contribution scores by comparing neural activations against a reference state. However, these methods primarily provide numerical importance rankings based on local gradients or statistical correlations, inherently lacking the analysis of physical causal propagation chains, which makes the diagnostic results unintuitive and difficult for human engineers to comprehend. Furthermore, these attribution results can even produce systematically misleading importance patterns under common modeling settings [14]. In closed-loop control systems, there is a strong correlation and feedback coupling between control commands, setpoints, and measurements, which can lead to the phenomenon of “apparent high contribution,” where control variables that covary with faults are misinterpreted as key features, thereby obscuring the true fault source. This type of false causality weakens the diagnostic reliability and engineering usability of existing interpretability methods in complex engineering systems [11,14].
In recent years, LLMs have demonstrated strong capabilities in semantic understanding and logical reasoning, providing new possibilities for fault explanation and knowledge integration in complex systems [15,16]. Through appropriate prompt design and knowledge injection, LLMs have the potential to simulate expert reasoning processes, integrate multi-source information, and generate causal explanations; for example, causal-knowledge-enhanced LLMs have been explored for cause analysis in aerospace manufacturing settings [17]. Meanwhile, multimodal LLMs have also been explored to couple signal evidence with language-based reasoning for explainable fault diagnosis [18]. However, general-purpose LLMs still exhibit instability in numerical precision, physical consistency, and constraint satisfaction. If directly applied to raw monitoring data for diagnostic reasoning, they may produce explanations that violate engineering common sense or lead to inconsistent conclusions. Therefore, ensuring reliable numerical calculations while constraining the LLM’s reasoning capability within a verifiable and constrained physical knowledge space is the core technical challenge for applying LLMs to aircraft engine diagnostics, motivating rule- and constraint-aware reasoning mechanisms [15,19].
To address the above challenges, this paper proposes a root-cause analysis framework for certain civil aircraft engines that integrates physical mechanism-driven feature engineering, ensemble learning-based anomaly detection, and constraint-guided reasoning with LLMs. This framework decouples numerical computation from cognitive reasoning: first, an ensemble learning model is used for high-precision anomaly detection in a feature space enhanced by physical semantics, and feature attribution methods are applied to extract segment-level evidence [11]. Then, constraint-based prompt strategies containing engineering rules and causal topologies are introduced to guide the LLM to perform consistency verification and causal tracing of the attribution evidence, achieving root-cause diagnosis output that balances detection performance with mechanistic interpretability [15,16,18]. This design aligns with recent RCA studies that leverage causal knowledge maps and structured reasoning to support root-cause localization in complex industrial environments [20].
The main contributions of this paper are as follows:
  • A high-dimensional feature space guided by physical mechanisms is constructed. Feature semantic expression is enhanced through environment-normalized ratios, mechanism-informed mixing indices, and multi-scale temporal statistics, thereby mitigating the effects of operating condition changes and coupling on model performance, and establishing a correspondence between numerical features and physical semantics.
  • An anomaly detection method based on cost-sensitive ensemble learning and adaptive threshold optimization is proposed, which improves fault recall capability and controls false positives under highly imbalanced data conditions.
  • A root-cause analysis architecture that decouples computation and reasoning is designed. By injecting engineering rules and causal constraints, the LLM is guided to perform secondary reasoning on the attribution results, suppressing interference from apparent features in closed-loop systems, and transforming statistical correlation explanations into mechanistically consistent causal explanations, thus generating structured and actionable diagnostic conclusions.
The remainder of this paper is organized as follows. Section 2 introduces related research and theoretical foundations, Section 3 elaborates on the proposed methodology, Section 4 presents experimental validation and comparative analysis based on real flight data, and Section 5 summarizes the paper and discusses future research directions.

2. Background and Methodological Foundations

2.1. Dataset Description for Certain Civil Aircraft

2.1.1. Aircraft Dataset and Fault Features

In this study, operational data from a certain civil aircraft equipped with a FADEC system were collected, with a sampling frequency set at 1 Hz. Each time-series sample has a duration of approximately 12,000 s, covering three complete flight cycles. The data were then processed and structured into a standard time-series format.
For a comprehensive characterization of the aircraft’s dynamic and control responses, this study defined ten core feature variables, as summarized in Table 1. These variables include power output indicators, such as crankshaft speed and actual power, as well as target and actual values for intake and fuel pressures, control loop parameters for exhaust valve opening, and environmental and electrical state parameters related to atmospheric pressure and DC voltage. Additionally, domain experts performed binary fault labeling of the dataset based on historical operational conditions, where a value of 0 indicates normal operating conditions, and a value of 1 indicates a fault state. These labels serve as the ground truth for subsequent supervised learning models.
Table 1. Original Features of the Certain Civil Aircraft Dataset.
The experimental dataset used in this study covers 63 flight samples from 17 aircraft, including 24 fault data sets and 39 normal data sets. To support the subsequent root-cause analysis and provide more detail on the nature of the faults, the correspondence between fault samples and feature variables has been systematically summarized. As shown in Table 2, each fault data set provides the associated aircraft ID, fault data ID, specific mechanism ID (with detailed fault mechanism analysis provided in Appendix A), and the main set of abnormal variables, which are used to characterize the variable response features of different fault modes and form a clear “aircraft—fault sample—fault mechanism—fault variable” correspondence. It should be noted that Aircraft 16 and Aircraft 17 only contain normal data in this dataset, with no observed fault samples, and therefore are not listed in Table 2.
Table 2. Correspondence Between Fault Samples and Feature Variables.

2.1.2. Typical Fault Characteristics and Mechanism Analysis

To intuitively analyze the temporal characteristics of fault samples, Figure 1 presents a local segment of typical fault data, designated as Fault1. As indicated by the red-bordered area, during the high-load operation phase of the aircraft, MIAP and EVOP exhibit significant high-frequency coupled oscillations. Specifically, under normal operation, EVOP should remain stable when the target commands, such as TP and TIAP, are constant. However, a fault in the wastegate solenoid valve induces severe fluctuations in EVOP, which subsequently drive synchronous, coupled fluctuations in MIAP and MP. Although the data are sampled at 1 Hz and the oscillation period spans approximately 2 to 3 s, this oscillation is considered “high-frequency” relative to the substantial mechanical inertia of the engine, translating into a rapid power judder distinctly felt by the pilot. This abnormal fluctuation directly reflects the instability of the closed-loop control within the intake boosting system.
Figure 1. Overview of typical fault data operation.
Analyzing the physical mechanism, the intake pressure regulation is essentially a feedback control process dominated by the Electronic Control Unit (ECU). The control logic initiates with TP set by the pilot. The ECU calculates TIAP based on this command and the current AP. During the closed-loop regulation process, the system computes the real-time deviation between TIAP and the feedback MIAP. Consequently, it outputs regulation commands to change EVOP, aiming to adjust the turbocharger’s boost intensity to ensure that MIAP precisely tracks TIAP.
It is noteworthy that EVOP in the dataset does not directly represent the physical mechanical displacement of the wastegate, but rather refers to the duty cycle signal of the wastegate control solenoid valve. This regulation mechanism relies on a complex pneumatic drive link: after the solenoid valve receives the ECU command, it drives the actuator’s expansion and contraction by regulating the pressure inside the air diaphragm, thereby indirectly changing the physical position of the wastegate. The violent oscillations of MIAP and EVOP observed in Figure 1 indicate a hysteresis or dynamic mismatch in this feedback link. This causes the solenoid valve to regulate repeatedly, resulting in a hunting phenomenon, thereby failing to maintain the dynamic balance of the boost pressure.

2.2. LightGBM Anomaly Detection Model

In this study, the anomaly detection task is formulated as a supervised binary classification problem characterized by extreme class imbalance. This approach differs from traditional unsupervised methods such as Isolation Forest, Support Vector Data Description, and Autoencoders, which primarily rely on statistical deviations from the majority distribution. Instead, historical fault data annotated by domain experts are leveraged. In safety-critical civil aviation scenarios, the objective extends beyond merely detecting statistical outliers to identifying specific and mechanistically significant fault patterns while suppressing false alarms caused by environmental noise. Consequently, by employing a supervised classifier, precise decision boundaries between normal and fault states can be established.
LightGBM is an efficient ensemble learning method based on the Gradient Boosting Decision Tree (GBDT) framework. It addresses the issues of significant computational complexity and memory overhead associated with traditional GBDT, which requires traversing all samples to perform an exact split search in every iteration under high-dimensional, large-scale data conditions. By systematically optimizing feature representation, sample usage, and tree growth strategies, LightGBM significantly improves training efficiency and generalization ability while ensuring model accuracy, which has been validated across a range of large-scale engineering and scientific applications [21,22,23,24].
In the decision tree construction process, LightGBM adopts a histogram-based approximate splitting method. It discretizes continuous feature values into a finite number of bins and, during sample traversal, accumulates the corresponding first-order gradient statistics into the belonging bins. Subsequently, it searches for the optimal split point only on the discrete bin boundaries. This method reduces the computational complexity of split gain from being linearly related to the sample size to being related to the number of discrete bins, thereby effectively reducing memory access times and improving overall training efficiency; such efficiency advantages make LightGBM particularly attractive for high-dimensional, data-intensive modeling tasks [21,22].
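The histogram mechanism described above can be sketched in a few lines of Python. This is an illustrative toy, not LightGBM's actual implementation: gradients are accumulated into a fixed number of bins, and only bin boundaries are scanned as candidate split points.

```python
# Toy illustration of histogram-based split finding (not LightGBM's
# actual code): continuous values are bucketed into a fixed number of
# bins, per-bin first-order gradient sums are accumulated, and only the
# bin boundaries are scanned as candidate split points.

def histogram_split_search(values, gradients, n_bins=8):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)
        grad_sum[b] += g
        count[b] += 1
    # One pass over bin boundaries instead of one per distinct sample value.
    total_g, total_n = sum(grad_sum), sum(count)
    best_gain, best_edge = -1.0, None
    left_g, left_n = 0.0, 0
    for b in range(n_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        # Variance-gain proxy: squared gradient sums over child sizes.
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n
        if gain > best_gain:
            best_gain, best_edge = gain, lo + (b + 1) * width
    return best_edge, best_gain
```

The split search per feature thus scales with the number of bins rather than the number of distinct sample values, which is the source of the memory-access and training-time savings noted above.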
Regarding sample usage, LightGBM introduces a Gradient-based One-Side Sampling (GOSS) mechanism. The core motivation is that samples with larger gradients contribute more to the information gain. Therefore, GOSS retains all samples with large gradients while randomly sampling a smaller proportion of samples with small gradients. To correct the bias introduced to the data distribution by this non-uniform sampling, a specific weight compensation is applied to the sampled small-gradient data when calculating the split gain. This design supports efficient learning in practical settings where data volumes are large and training efficiency is critical, as reflected by recent LightGBM-based solutions in engineering decision support and data-driven modeling pipelines [23,24]. Specifically, for a feature $j$ at a candidate split point $d$, the estimated variance gain $\tilde{V}_j(d)$ is calculated as follows:
$$\tilde{V}_j(d) = \frac{1}{n} \left[ \frac{\left( \sum_{x_i \in A_l} g_i + \frac{1-a}{b} \sum_{x_i \in B_l} g_i \right)^2}{n_l^j(d)} + \frac{\left( \sum_{x_i \in A_r} g_i + \frac{1-a}{b} \sum_{x_i \in B_r} g_i \right)^2}{n_r^j(d)} \right],$$
where:
  • $\tilde{V}_j(d)$ denotes the estimated split gain for feature $j$ with threshold $d$ on the current node;
  • $n$ is the total number of samples in the current node;
  • $g_i$ represents the first-order gradient of the loss function with respect to the $i$-th sample;
  • $A$ denotes the subset of high-gradient samples (top $a \times 100\%$), and $B$ denotes the subset of small-gradient samples randomly selected with a ratio $b$ from the remaining data;
  • $A_l$, $A_r$ represent the subsets of high-gradient samples falling into the left and right child nodes based on threshold $d$;
  • $B_l$, $B_r$ represent the subsets of sampled small-gradient samples in the corresponding child nodes;
  • $n_l^j(d)$, $n_r^j(d)$ denote the effective counts (sums of weights) of samples in the left and right child nodes;
  • The coefficient $\frac{1-a}{b}$ acts as a weight amplifier for the small-gradient samples in set $B$ to compensate for the under-sampling.
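Equation (1) can be checked numerically. The sketch below (illustrative, assuming the high-gradient set A and the sampled small-gradient set B have already been drawn and split into left/right children by a candidate threshold) computes the estimated variance gain with the $(1-a)/b$ compensation weight:

```python
# Sketch of the GOSS estimated variance gain in Equation (1). Inputs are
# the first-order gradients of set A (all high-gradient samples) and
# set B (sampled small-gradient samples), already partitioned into the
# left and right children of a candidate split. Small-gradient
# contributions are amplified by (1 - a) / b to correct the bias
# introduced by under-sampling.

def goss_gain(A_left, A_right, B_left, B_right, a, b, n):
    w = (1 - a) / b                            # compensation weight for set B
    left_num = (sum(A_left) + w * sum(B_left)) ** 2
    right_num = (sum(A_right) + w * sum(B_right)) ** 2
    n_left = len(A_left) + w * len(B_left)     # effective (weighted) counts
    n_right = len(A_right) + w * len(B_right)
    return (left_num / n_left + right_num / n_right) / n
```

A useful sanity check: with a = 0 and b = 1 (no sampling, compensation weight 1), the estimate reduces to the exact variance gain over all samples.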
Regarding the tree structure growth strategy, LightGBM adopts a leaf-wise growth method with depth limitation. Unlike the traditional level-wise strategy, leaf-wise growth iteratively selects the leaf node with the highest split gain to split. Mathematically, for the $m$-th iteration, the algorithm searches for the optimal split strategy $(p_m, j_m, d_m)$ that maximizes the reduction in the global loss function $\mathcal{L}$:
$$(p_m, j_m, d_m) = \arg\max_{(p, j, d)} \left[ \mathcal{L}(T_{m-1}) - \mathcal{L}\left( T_{m-1}.\mathrm{split}(p, j, d) \right) \right],$$
where:
  • $p_m$ represents the optimal leaf node index selected at the $m$-th iteration;
  • $j_m$ represents the optimal feature index used for splitting (consistent with feature $j$ in Equation (1));
  • $d_m$ represents the corresponding optimal split threshold (consistent with threshold $d$ in Equation (1));
  • $\mathcal{L}(T)$ represents the objective function value for a tree structure $T$;
  • $T_{m-1}.\mathrm{split}(p, j, d)$ denotes the new tree structure after splitting node $p$ using feature $j$ and threshold $d$.

2.3. Principle of SHAP Feature Attribution

To enhance interpretability and quantify the contribution of input features to prediction results, this paper introduces the SHAP method. Based on Shapley values from cooperative game theory, SHAP calculates the average marginal contribution of a feature across all possible feature combinations, providing an attribution explanation that satisfies the properties of additivity, local accuracy, and consistency [25].
In the additive feature attribution framework of SHAP, for a given sample x, the model output is represented as the sum of a base value and the contributions of each feature. Let M be the total number of features; the additive explanation model is defined as:
$$G(x) = \phi_0 + \sum_{i=1}^{M} \phi_i x_i,$$
where:
  • G represents the explanation model;
  • $\phi_0$ is the average prediction value of the model on the background dataset (base value);
  • $x_i \in \{0, 1\}$ is a binary variable indicating whether the $i$-th feature is present in the current simplified coalition vector;
  • $\phi_i$ is the Shapley value corresponding to the $i$-th feature, quantifying its contribution relative to the base value.
To characterize the potential dependencies and interaction effects between features, SHAP defines feature attribution as the weighted average of the marginal gains brought by adding feature i to all subsets that do not contain it. Let F be the set of all input features. The Shapley value for feature i is defined as:
$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right],$$
where:
  • $F$ is the set of all input features, and $|F|$ denotes the total number of features;
  • $S$ represents a subset of features excluding feature $i$ ($S \subseteq F \setminus \{i\}$);
  • $f_S(x_S)$ denotes the prediction of the model restricted to the feature subset $S$;
  • The term $f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S)$ quantifies the marginal contribution of adding feature $i$ to the subset $S$;
  • The fractional term acts as a weighting factor based on the subset size $|S|$ to ensure fair attribution among feature combinations.
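The weighted-average definition can be implemented verbatim by enumerating all subsets, which is tractable only for a handful of features; the brute-force sketch below (Python, with a toy set-function standing in for the restricted model) is useful for sanity-checking attribution properties such as additivity:

```python
from itertools import combinations
from math import factorial

# Brute-force Shapley attribution, implementing the subset-sum definition
# directly. `f` is any set-function mapping a frozenset of present feature
# indices to a model output. Exhaustive enumeration is exponential in the
# number of features, which is why polynomial-time TreeSHAP is preferred
# for tree ensembles in practice.

def shapley_values(f, n_features):
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                total += weight * (f(frozenset(S) | {i}) - f(frozenset(S)))
        phi.append(total)
    return phi

# Toy additive model: feature 0 contributes 2, feature 1 contributes 1.
f = lambda S: 2.0 * (0 in S) + 1.0 * (1 in S)
phi = shapley_values(f, 2)
```

Local accuracy can be verified directly: the Shapley values sum to the difference between the full-coalition prediction and the empty-coalition base value.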
Given that the LightGBM model used in this paper belongs to the gradient boosting tree framework, this study adopts the TreeSHAP algorithm. By utilizing the path structure and split rules of decision trees, TreeSHAP accurately calculates the Shapley values in polynomial time, providing efficient and reliable quantitative support for root-cause analysis.

2.4. Prompt Engineering

Prompt engineering is a technical methodology that guides LLMs to generate outputs meeting specific task requirements by designing and optimizing input text structures. By structuring explicit task instructions, contextual cues, and constraints, this approach leverages the model’s inherent reasoning and generative capabilities without requiring parameter updates. Mechanistically, prompt engineering relies on the in-context learning capability of LLMs, mapping downstream tasks to text generation formats familiar to the model from its pre-training phase, thereby narrowing the semantic gap between general-purpose pre-trained models and specific domain applications [26].
In complex tasks involving multi-step logical reasoning, such as aviation aircraft fault diagnosis, direct question-and-answer prompting often fails to ensure the stability and interpretability of the reasoning process. Chain-of-Thought (CoT) prompting addresses this by guiding the model to explicitly output intermediate reasoning steps before generating the final conclusion. This enables the decomposition of complex root-cause analysis into several logically continuous sub-processes, enhancing the accuracy and transparency of multi-hop reasoning tasks. From a probabilistic modeling perspective, the standard inference process seeks to solve the conditional probability $P(Y \mid X)$ of the output $Y$ given the input $X$. The CoT mechanism introduces an intermediate reasoning state $Z$, decomposing the inference process into the joint modeling of $P(Z \mid X)$ and $P(Y \mid Z, X)$. This allows the model to form a rational intermediate judgment before deriving the final result [27].
Role-playing prompting guides the convergence of the model’s output distribution towards specific domain knowledge structures and expression norms by assigning a clear professional identity to the model. This strategy effectively activates domain-related knowledge acquired during pre-training, ensuring that the generated content aligns better with engineering practices in terms of terminology usage, physical mechanism descriptions, and diagnostic logic [28].
In industrial applications, combining role-playing prompts with domain constraints can significantly suppress hallucination issues in open-ended text generation. By incorporating structured prior information—such as feature attribution results, fault discrimination rules, and the physical meanings of sensors—into the prompt, prompt engineering transforms the free generation process into a constrained domain reasoning process [29]. This enables general-purpose LLMs to meet the accuracy and interpretability requirements of complex industrial fault diagnosis under zero-shot or few-shot conditions, achieving an effective mapping from numerical monitoring data to semantic diagnostic results [30].
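As an illustration of how role-playing, CoT guidance, and domain constraints can be combined, the sketch below assembles a constraint-injected prompt from precomputed SHAP evidence. All field names, rule wording, feature identifiers, and the example time segment are hypothetical, not the paper's actual template:

```python
# Hypothetical sketch of a constraint-guided RCA prompt: a professional
# role, engineering prohibitions, and precomputed SHAP evidence are
# injected as structured context so the LLM reasons inside a bounded
# knowledge space instead of recalculating or free-associating.

def build_rca_prompt(shap_evidence, causal_rules, fault_segment):
    role = "You are a civil-aircraft powerplant diagnosis engineer."
    constraints = "\n".join(f"- {r}" for r in causal_rules)
    evidence = "\n".join(
        f"- {feat}: SHAP contribution {val:+.3f}" for feat, val in shap_evidence
    )
    return (
        f"{role}\n\n"
        f"Fault segment: {fault_segment}\n\n"
        f"Attribution evidence (precomputed, do not recalculate):\n{evidence}\n\n"
        f"Engineering constraints (must not be violated):\n{constraints}\n\n"
        "Reason step by step along the causal chain, then output a structured "
        "report with fields: root_cause, propagation_path, confidence."
    )

# Illustrative inputs; feature names and the segment are invented.
prompt = build_rca_prompt(
    shap_evidence=[("EVOP_rolling_std_w5", 0.412), ("MIAP_diff", 0.305)],
    causal_rules=["EVOP oscillation drives MIAP fluctuation, never the reverse."],
    fault_segment="high-load phase, illustrative segment",
)
```

Constraining the output to named report fields is one simple way to obtain the structured, machine-checkable diagnostic conclusions described above.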

3. Proposed Root Cause Analysis Framework

To achieve interpretable RCA for certain civil aircraft, this study proposes a hierarchical framework integrating mechanism-guided feature engineering, anomaly detection models, SHAP-based feature attribution, and LLM-based reasoning agents.

3.1. Mechanism-Guided Feature Engineering

Civil aircraft are complex thermodynamic coupled systems subject to the dual constraints of the external flight environment and internal control logic. Raw sensor data often exhibits significant baseline drift due to environmental changes and lacks explicit expressions for the non-linear coupling relationships between components. To address these challenges and bridge the gap between the numerical space and the semantic space required for RCA, a high-dimensional feature space consisting of 73 dimensions is constructed. This space integrates environmental normalization, mechanism-based hybrid indicators, and multi-scale temporal statistical features.

3.1.1. Environmental Normalization Ratio Features

Within the flight envelope, drastic changes in atmospheric pressure lead to significant shifts in sensor baselines, making it difficult for algorithms to distinguish between environmentally induced numerical changes and performance degradation caused by system faults. To alleviate this issue, absolute sensor readings are converted into dimensionless ratio features relative to environmental conditions. Specifically, four key pressure and power variables are normalized against the real-time atmospheric pressure (AP). The detailed definitions of these four environmental normalization ratio features are explicitly listed in Table 3.
Table 3. Constructed Features and Their Mathematical Definitions.
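As a minimal illustration of such ratio features (the exact four normalized variables and their definitions are those of Table 3; the keys used below are assumptions for demonstration):

```python
# Minimal sketch of environment-normalized ratio features: each selected
# reading is divided by the concurrent atmospheric pressure (AP), giving
# dimensionless signals whose baseline shifts far less with altitude.
# The keys MIAP, MP, and TIAP are illustrative; Table 3 lists the actual
# four normalized variables.

def normalized_ratios(sample, keys=("MIAP", "MP", "TIAP")):
    ap = sample["AP"]  # real-time atmospheric pressure
    return {f"{k}_over_AP": sample[k] / ap for k in keys}

# The same internal engine state observed at lower ambient pressure
# yields shifted absolute readings but comparable ratios.
cruise = {"AP": 80.0, "MIAP": 112.0, "MP": 96.0, "TIAP": 112.8}
ratios = normalized_ratios(cruise)
```

Because the ratio removes the common environmental factor, a drift in the ratio is more likely to reflect system degradation than an altitude change.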

3.1.2. Mechanism-Based Hybrid Features

The raw sensor dataset of certain civil aircraft comprises only a ten-dimensional feature space, which is insufficient to fully characterize the complex nonlinear degradation and fault patterns within the aircraft’s thermodynamic system. This limitation restricts the capability of data-driven models to perceive changes in system states. To address this, leveraging domain expert knowledge and physical mechanism analysis, this study mines the coupling relationships between raw parameters and constructs nine derived features with clear physical meanings to enhance the information expression of the feature space. Based on physical attributes, these features are categorized into four groups: efficiency, control deviation, load, and comprehensive features. Their mathematical definitions are listed in Table 4.
Table 4. Mathematical Formulas for Constructed Features.
Efficiency-type features are used to characterize the energy conversion efficiency of the aircraft and the matching degree of subsystems. These include the PEI, IPR, VPE, and FIR. These indicators reflect power output characteristics from the perspectives of rotational speed, intake pressure, power supply voltage, and air-fuel mixing state, respectively. A continuous decline in these indicators typically signifies increased mechanical loss or a reduction in thermodynamic efficiency.
Control deviation features are derived by computing the residuals between the ECU’s target commands and the corresponding measured values. Specifically, PCD, FPCD, and IPCD are introduced to explicitly characterize the tracking errors within the closed-loop control system. Compared with raw sensor signals, these features more effectively expose actuator malfunctions and sensor drift, thereby enhancing the ability to accurately distinguish fault sources.
Load and comprehensive features are used to characterize the aircraft’s operational intensity and multi-mechanism coupling effects. The PLI reflects the instantaneous mechanical load level of the aircraft under different operating conditions, aiding in distinguishing between high-load operations and fault states. The OPI integrates intake pressure, rotational speed, and power information to provide a unified metric for the aircraft’s overall thermodynamic health. By introducing the aforementioned mechanism-based prior features, the information density of the training data is significantly enhanced, enabling subsequent models to recognize complex fault patterns that are difficult to manifest in the low-dimensional raw features.
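A minimal sketch of the control-deviation residuals described above (the variable pairings shown are illustrative assumptions; Table 4 gives the exact formulas):

```python
# Illustrative control-deviation features: each residual is the gap
# between an ECU target command and its measured feedback. The two
# pairings below are assumptions for demonstration; the paper's PCD,
# FPCD, and IPCD definitions are given in Table 4.

def control_deviations(sample):
    return {
        "IPCD": sample["TIAP"] - sample["MIAP"],  # intake-pressure tracking error
        "PCD": sample["TP"] - sample["MP"],       # power tracking error
    }

# A healthy closed loop keeps residuals near zero; persistent or
# oscillating residuals expose actuator malfunctions or sensor drift.
dev = control_deviations({"TIAP": 141.0, "MIAP": 139.5, "TP": 100.0, "MP": 98.0})
```

Exposing the residual directly, rather than leaving the model to subtract two strongly correlated raw signals, is what sharpens the separation between fault sources.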

3.1.3. Multi-Scale Temporal Dynamic Features

Aircraft faults often exhibit specific temporal patterns. For instance, sensor open circuits or actuator jams typically manifest as transient step changes, whereas degradation caused by wear presents as gradual trends. To capture these dynamic characteristics, differential features and sliding window statistical features are introduced.
To highlight abrupt signals and suppress slow linear drifts, the first-order difference is calculated for all original sensor series. For a feature $x$ at time $t$, the differential feature $\Delta x_t$ is defined as:
$$\Delta x_t = x_t - x_{t-1},$$
which emphasizes the high-frequency components of the signal, aiding in the detection of transient anomalies at the incipient stage of a fault.
To characterize local trends and fluctuations, we employ a sliding window mechanism. Setting the window size $w \in \{5, 10\}$, the moving average $\mu_t(w)$ and moving standard deviation $\sigma_t(w)$ are calculated as follows:
$$\mu_t(w) = \frac{1}{w} \sum_{j=0}^{w-1} x_{t-j},$$
$$\sigma_t(w) = \sqrt{\frac{1}{w} \sum_{j=0}^{w-1} \left( x_{t-j} - \mu_t(w) \right)^2}.$$
The moving average reveals medium-term trends by smoothing measurement noise, while the moving standard deviation quantifies the dispersion and fluctuation intensity of the signal. These statistics provide key evidence for identifying typical fault modes such as hunting, oscillation, and combustion instability.
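The differential and sliding-window statistics above can be sketched as follows (population standard deviation, dividing by $w$ as in the definition):

```python
from math import sqrt

# Sketch of the multi-scale temporal features: first-order differencing
# to expose step changes, and rolling mean / standard deviation over a
# window of size w to capture local trends and fluctuation intensity.

def first_difference(x):
    return [x[t] - x[t - 1] for t in range(1, len(x))]

def rolling_stats(x, w):
    means, stds = [], []
    for t in range(w - 1, len(x)):
        window = x[t - w + 1 : t + 1]
        mu = sum(window) / w
        means.append(mu)
        stds.append(sqrt(sum((v - mu) ** 2 for v in window) / w))
    return means, stds
```

A jammed actuator would show up as a single large value in the difference series, while hunting oscillations inflate the rolling standard deviation without necessarily moving the rolling mean.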

3.1.4. Summary of Feature Space

In summary, the constructed feature space comprises 73 dimensions: 10 original sensor features, 4 environmental normalization ratio features, 9 mechanism-based hybrid features, 10 differential features, and 40 sliding window statistical features (derived from the original features using two statistical metrics across two window sizes). This multi-domain representation not only enhances the robustness of the anomaly detection model under varying operating conditions but also lays a solid data foundation for the subsequent LLM-based semantic reasoning.

3.2. Anomaly Detection Framework Based on LightGBM

In the hierarchical architecture of “Anomaly Perception—Feature Attribution—Cognitive Reasoning,” the anomaly detection module bears the primary responsibility of locking onto fault segments from high-frequency massive data. Given the characteristics of aircraft monitoring data—namely high dimensionality, strict real-time requirements, and extreme imbalance between positive and negative samples—this study selects LightGBM as the core classifier. Compared to traditional boosting algorithms, LightGBM introduces Gradient-based One-Side Sampling (GOSS), which concentrates training on samples with large gradients to accelerate learning without sacrificing accuracy. Furthermore, its leaf-wise tree growth strategy, when processing high-dimensional physical feature spaces, achieves lower training error and higher detection precision than the traditional level-wise growth strategy.

3.2.1. Weighted Loss Function for Imbalanced Data

In the operational history of aircraft, normal samples occupy the vast majority, while fault samples are extremely rare. In conventional binary classification tasks, if the ratio of positive to negative samples is disparate, the standard objective function is often dominated by the majority class (normal samples). This leads the model to converge to a trivial solution where all samples are predicted as “normal,” resulting in severe missed detections of the minority class (fault samples).
To address this bias at the algorithmic level, this study introduces a cost-sensitive learning strategy and constructs a Weighted Binary Log-Likelihood Loss Function. Its formal definition is as follows:
$$L(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\left[\alpha \cdot y_i \ln(p_i) + (1 - y_i)\ln(1 - p_i)\right],$$
where:
  • N represents the total number of samples;
  • $y_i \in \{0, 1\}$ is the true label of the i-th sample, where 1 represents a fault and 0 represents normal;
  • $p_i$ denotes the probability predicted by the model that the sample is a fault;
  • $\alpha$ serves as the penalty weight coefficient for the positive (fault) class, typically set as $\alpha \approx N_{\mathrm{neg}} / N_{\mathrm{pos}}$, where $N_{\mathrm{neg}}$ and $N_{\mathrm{pos}}$ are the counts of negative and positive samples, respectively.
During the gradient boosting iteration process, this mechanism significantly amplifies the gradient residuals generated when fault samples are misclassified by introducing the weight factor α . This compels the decision tree’s splitting criterion to prioritize feature boundaries that effectively separate the minority class. Consequently, it effectively corrects the model’s tendency to overfit the majority class, significantly reducing the missed detection rate while ensuring overall accuracy.
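As a minimal sketch, the weighted log-likelihood loss can be written directly in NumPy; the function name `weighted_logloss` and the toy inputs are illustrative. In practice, LightGBM's built-in `is_unbalance` or `scale_pos_weight` options realize the same reweighting inside the gradient computation:

```python
import numpy as np

def weighted_logloss(y, p, alpha):
    """Weighted binary log-likelihood loss L(theta); fault terms are scaled by alpha."""
    eps = 1e-12
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)  # avoid log(0)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(alpha * y * np.log(p) + (1 - y) * np.log(1 - p)))
```

Increasing `alpha` inflates the penalty for misclassified fault samples, which is exactly the mechanism that amplifies the minority-class gradient residuals during boosting.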

3.2.2. Adaptive Threshold Optimization Based on Validation Set

The raw output of the LightGBM model is the posterior probability $P(y=1 \mid x) \in [0, 1]$ that a sample belongs to the “Fault” class. In traditional classification tasks, 0.5 is typically selected as the default decision threshold. However, in anomaly detection scenarios characterized by extreme imbalance, a fixed threshold of 0.5 tends to be overly conservative, leading to insufficient Recall. To achieve the optimal trade-off between false alarm rate and recall rate, this study proposes an adaptive threshold optimization strategy based on maximizing the F1-Score on the validation set.
This strategy utilizes independently held-out validation data to determine the optimal decision boundary through an exhaustive search. Let $\tau$ be the candidate threshold. For any sample $x_k$ in the validation set, its binarized predicted class $\hat{y}_k(\tau)$ is defined as:
$$\hat{y}_k(\tau) = \begin{cases} 1\ (\text{Fault}), & \text{if } P(y=1 \mid x_k) \ge \tau, \\ 0\ (\text{Normal}), & \text{if } P(y=1 \mid x_k) < \tau. \end{cases}$$
The optimization objective function for the optimal threshold $\tau^*$ is set to maximize the F1-Score on the validation set:
$$\tau^* = \arg\max_{\tau \in [0, 1]} \frac{2 \cdot \mathrm{Prec}(\tau) \cdot \mathrm{Rec}(\tau)}{\mathrm{Prec}(\tau) + \mathrm{Rec}(\tau)},$$
where:
  • $\mathrm{Prec}(\tau)$ and $\mathrm{Rec}(\tau)$ denote the Precision and Recall metrics, respectively, calculated under the specific threshold $\tau$;
  • The F1-Score serves as the harmonic mean of precision and recall, ensuring a balanced evaluation of the model’s performance on the minority class.
The specific optimization process is as follows: First, the trained LightGBM model is used to perform inference on the validation set to obtain the probability prediction vector. Subsequently, a candidate threshold grid is generated within the interval $[0, 1]$ with a fine-grained step size of 0.001, and the F1-Score is calculated for each candidate point. Finally, the $\tau^*$ corresponding to the global maximum F1-Score is selected as the fixed decision threshold for testing and online monitoring. Through this optimization process, the model no longer relies on empirical default thresholds but adaptively finds the best decision boundary based on the data distribution, thereby providing high-quality anomaly segment inputs for subsequent RCA.
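The exhaustive grid search described above can be sketched as follows; the 0.001 step mirrors the text, while the helper names (`f1_at`, `best_threshold`) are assumptions:

```python
import numpy as np

def f1_at(tau, y, p):
    """F1-Score of the binarized predictions at a candidate threshold tau."""
    pred = (p >= tau).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def best_threshold(y_val, p_val, step=0.001):
    """Grid-search the validation set for the F1-maximizing threshold tau*."""
    grid = np.arange(step, 1.0, step)
    scores = [f1_at(t, y_val, p_val) for t in grid]
    i = int(np.argmax(scores))
    return float(grid[i]), float(scores[i])
```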

3.2.3. Overall Training and Optimization Process of Anomaly Detection Model

To visually illustrate the synergistic effects of mechanism-based feature engineering, imbalanced learning strategies, and adaptive threshold optimization mechanisms during model construction, this study designs an end-to-end anomaly detection model training process, as shown in Figure 2. The process first maps raw sensor data into a physical feature space containing ratio features, hybrid features, and multi-scale temporal statistics. Subsequently, a weighted loss function is introduced during the training phase to train the LightGBM classification model, thereby effectively learning the fault patterns corresponding to anomaly samples. Finally, the classification decision threshold is searched based on the validation set to determine the decision boundary that optimizes the F1-Score. This closed-loop process enhances the model’s perception of nonlinear fault mechanisms while achieving a reasonable trade-off between false alarm rates and missed detection rates under highly imbalanced sample conditions.
Figure 2. Overall flowchart of anomaly detection model training and optimization.

3.3. Automatic Identification Logic of Fault Segments

The raw outputs of the LightGBM model represent predicted anomaly probabilities for each discrete time step. Within the onboard monitoring systems of civil aircraft, sensor data remains vulnerable to electromagnetic interference, airflow disturbances, and signal noise, which often induce spurious single-point false alarms in the model output. According to the physical principles governing mechanical degradation, anomalies arising from pneumatic leakage or component jamming characteristically manifest as continuous temporal trends rather than transient stochastic fluctuations.
To improve the robustness of the diagnostic system, filter out high-frequency noise interference, and ensure that subsequent RCA focuses on confirmed fault events, this study designs a set of automatic identification logic for fault segments based on temporal continuity constraints. The complete flowchart is shown in Figure 3.
Figure 3. Flowchart of automatic identification logic for fault segments based on temporal continuity constraints.
First, using the optimal decision threshold $\tau^*$ obtained by maximizing the F1-Score on the validation set, the continuous probability sequence output by LightGBM is converted into a discrete binary state sequence. For any sampling time step t, the system’s instantaneous state $B_t$ is defined as:
$$B_t = \mathbb{I}(P_t \ge \tau^*) = \begin{cases} 1, & P_t \ge \tau^*, \\ 0, & P_t < \tau^*, \end{cases}$$
where $P_t \in [0, 1]$ is the posterior fault probability output by the model, and $\mathbb{I}(\cdot)$ is the indicator function. When $B_t = 1$, it indicates an anomaly; otherwise, it indicates a normal state.
On this basis, to lock onto the true fault process, “effective fault segments” are defined as the set of time windows $S$ that satisfy the minimum duration constraint. The mathematical expression is:
$$S = \left\{ (t_{\mathrm{start}}, t_{\mathrm{end}}) \;\middle|\; \forall k \in [t_{\mathrm{start}}, t_{\mathrm{end}}],\ B_k = 1 \ \wedge\ (t_{\mathrm{end}} - t_{\mathrm{start}} + 1) \ge L_{\min} \right\}.$$
This formula strictly defines the composition conditions of a fault segment: every sampling point within the interval must be determined as an anomaly. The parameter $L_{\min}$ represents the minimum continuous anomaly length threshold. Based on the mechanical response characteristics of the aircraft, this study sets $L_{\min} = 10$ to effectively filter out sporadic transient noise. As illustrated in Figure 3, any anomaly sequence with a length less than $L_{\min}$ is regarded as noise and discarded.
Once the monitoring system identifies a fault segment $(t_{\mathrm{start}}, t_{\mathrm{end}}) \in S$ satisfying the above conditions, the algorithm immediately locks this time window and slices the corresponding subsequence data matrix $X_{\mathrm{fault}}$ from the dataset:
$$X_{\mathrm{fault}} = \{ x_t \mid t_{\mathrm{start}} \le t \le t_{\mathrm{end}} \},$$
where $x_t$ is the 73-dimensional feature vector at time t. This fault data matrix $X_{\mathrm{fault}}$ contains not only the value trajectories of all sensors during the fault period but also implicitly holds the parameter coupling relationships at the time of the fault. It will serve as the core input passed to the next stage for SHAP-based feature attribution calculation to analyze the deep-seated root causes inducing the continuous anomaly in this time segment.

3.4. Local SHAP Attribution and Key Feature Selection for Fault Segments

Unlike traditional global feature importance analysis, this study focuses on local interpretability, specifically performing attribution on the extracted specific fault segment $X_{\mathrm{fault}}$. Since the manifestation characteristics of different fault modes vary significantly, it is essential to dynamically calculate feature contributions for each independent fault event. For a fault segment $X_{\mathrm{fault}} = \{x_1, x_2, \ldots, x_T\}$ with a duration of T sampling points, the algorithm first calculates the SHAP value matrix for every sampling point $x_t$. To comprehensively evaluate the impact of each feature throughout the fault process, the comprehensive importance $I_i$ of the i-th feature within this fault segment is defined as the mean of the absolute SHAP values across all sampling points:
$$I_i = \frac{1}{T} \sum_{t=1}^{T} \left| \phi_i(x_t) \right|,$$
where:
  • $I_i$ represents the comprehensive importance score of feature i in the current fault segment;
  • $T$ is the total number of sampling points in the fault segment (i.e., the duration);
  • $\phi_i(x_t)$ denotes the SHAP value of feature i at time step t;
  • The use of absolute values $|\cdot|$ is crucial because certain oscillatory faults can cause feature values to fluctuate drastically in both positive and negative directions. Direct summation would lead to positive and negative contributions canceling each other out, thereby masking the anomaly’s true activity intensity.
Based on the calculated comprehensive importance $I_i$, the algorithm sorts the 73-dimensional feature space in descending order and truncates the top 10 features to construct the “Key Feature Set” $E_{\mathrm{shap}}$. This selection process not only achieves data dimensionality reduction but, more importantly, benefits from the construction of mechanism-based feature engineering. The attribution results of SHAP are no longer limited to single raw sensor readings but include composite indicators with clear physical directionality. For instance, if the hybrid feature “PCD” ranks at the top of SHAP values, it directly conveys high-level semantic information of “mismatch between control target and actual output” to the subsequent stages, rather than a simple numerical description of “high TP value” or “low MP value.”
This feature attribution mechanism, which embeds physical mechanisms, provides a mathematically rigorously filtered high-quality chain of evidence for the subsequent LLM to identify “false causes” and deduce “root causes.” Through the mechanism described above, the SHAP module successfully transforms the originally obscure high-dimensional fault data matrix X f a u l t into a list of “abnormal features” ordered by importance, explicitly pointing out the physical basis for the model’s fault determination, thereby completing the key mapping from the underlying “Data Space” to the upper-level “Feature Semantic Space.”
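The segment-level aggregation and Top-10 truncation can be sketched as follows, assuming the SHAP value matrix for the segment has already been computed (e.g., with `shap.TreeExplainer` on the trained LightGBM model); `key_features` is an illustrative name:

```python
import numpy as np

def key_features(shap_matrix, feature_names, top_k=10):
    """Rank features by mean absolute SHAP value over a fault segment.

    shap_matrix: array of shape (T, d) holding phi_i(x_t) for the segment.
    Returns the top_k (name, importance) pairs in descending order.
    """
    importance = np.abs(shap_matrix).mean(axis=0)      # I_i = (1/T) * sum_t |phi_i(x_t)|
    order = np.argsort(importance)[::-1][:top_k]       # descending sort, then truncate
    return [(feature_names[i], float(importance[i])) for i in order]
```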

3.5. Prompt Engineering Design

Although the SHAP algorithm can quantify the contribution of features to model predictions, it provides only statistical-level correlation explanations. General-purpose LLMs possess strong text generation capabilities but are prone to “machine hallucinations” when dealing with vertical industrial data—specifically, fabricating values or conducting erroneous physical logic reasoning. To address these issues, this study proposes a “Computation and Reasoning Decoupling” prompt engineering framework. The core philosophy is to delegate deterministic numerical calculations and logical judgments to traditional algorithms, while entrusting complex unstructured root cause reasoning to the LLM. This section details the construction logic of this prompt system across six dimensions: role setting, semantic reconstruction, rule injection, SHAP value transformation, causal constraints, and output specifications. A complete example of the actual prompt template integrating these elements is provided in Appendix B.

3.5.1. Expert Role Setting and Task Anchoring

In the system instruction component of the prompt, this study adopts a role-playing strategy, anchoring the LLM’s cognitive state as a “Senior Aircraft Fault Diagnosis Expert.” By defining a clear role, the aim is to activate the engineering domain knowledge distribution latent within the model, ensuring its output language style aligns with industrial technical reports and minimizing noise from general conversational modes. Specifically, the task definition explicitly requires the model to execute “RCA” rather than simple “Data Recitation.” The system instructions mandate the model to distinguish between “Symptomatic Features” and “Root Cause Features,” explicitly pointing out that anomalies in target values often reflect a failure in system response rather than a fault in the target itself. This setting establishes the fundamental cognitive tone for the subsequent reasoning process, ensuring the model remains focused on control deviations of the physical system when processing high-dimensional features.

3.5.2. Physical Semantic Reconstruction of Feature Space

To enable LLMs to comprehensively understand the physical attributes behind the features, this study establishes a feature type mapping function $M(f)$, classifying the high-dimensional feature space into ten major physical categories, as shown in Table 5.
Table 5. Classification and Physical Attributes of Features.

3.5.3. Pre-Calculation Injection of Deterministic Rules

To mitigate the probabilistic hallucinations inherent in general-purpose large language models during numerical computation and strict comparisons, this study integrates a deterministic computational layer before prompt construction. The fundamental design principle involves the decoupling of numerical calculation from semantic reasoning. By utilizing Python algorithms to execute rigorous logical evaluations of fault segments in accordance with aircraft technical specifications, the system provides objective factual grounding for the subsequent reasoning process of the model.
Since the judgment of aircraft performance indicators relies heavily on current operating conditions, a numerical quantization model for operating states is first established. According to domain expert experience, the full-power state and low-altitude environment are the most critical boundary conditions for fault detection, under which diagnostic judgments achieve maximum accuracy. Consequently, the identification of these two states is treated as deterministic rules for pre-calculation and explicit injection into the reasoning framework. Specifically, based on the average target power $\overline{TP}$ and average atmospheric pressure $\overline{AP}$ within the fault segment, the power state indicator variable $S_{\mathrm{full}}$ and the low-altitude state indicator variable $S_{\mathrm{low}}$ are defined. Here, $S_{\mathrm{full}}$ is used to determine if the aircraft is in a maximum thrust request state (set when $\overline{TP}$ exceeds 99.5%), and $S_{\mathrm{low}}$ determines if the flight altitude is below 6000 ft (corresponding to air pressure ≈ 800 mBar). Their mathematical definitions are as follows:
$$S_{\mathrm{full}} = \begin{cases} 1, & \text{if } \overline{TP} \ge 99.5, \\ 0, & \text{otherwise}, \end{cases}$$
$$S_{\mathrm{low}} = \begin{cases} 1, & \text{if } \overline{AP} \ge 800, \\ 0, & \text{otherwise}. \end{cases}$$
These two Boolean state variables, $S_{\mathrm{full}}$ and $S_{\mathrm{low}}$, constitute the necessary boundary conditions for subsequent rule judgments. Under determined operating conditions, this study establishes a deterministic judgment rule library covering power, intake, fuel, and wastegate control. This library transforms discrete aircraft technical indicators into strict logical constraints. The specific rule content, judgment logic, and physical basis are shown in Table 6.
Table 6. Deterministic Judgment Rule Library for Aircraft Systems.
Based on the rule system in Table 6, the algorithm traverses each fault segment, converting physical rules into mathematical piecewise functions for judgment. Taking the tracking performance of the intake system as an example, the state variable $J_{\mathrm{intake}}$ is defined. It is judged as abnormal (denoted as 1) if and only if the aircraft is in a low-altitude condition and the deviation between the actual intake pressure $\overline{MIAP}$ and the target value $\overline{TIAP}$ exceeds the threshold. Its formal expression is:
$$J_{\mathrm{intake}} = \begin{cases} 1, & \text{if } S_{\mathrm{low}} = 1 \ \wedge\ |\overline{MIAP} - \overline{TIAP}| > 50, \\ 0, & \text{otherwise}. \end{cases}$$
Similarly, for the wastegate opening EVOP, the judgment logic $J_{\mathrm{evop}}$ is constructed as a composite constraint function. This function dynamically adjusts the judgment interval based on the altitude state $S_{\mathrm{low}}$. When in the full power state, it is considered abnormal if the opening deviates from the $[60\%, 85\%]$ interval at low altitude, or deviates from the 99% setpoint by more than 5% at high altitude:
$$J_{\mathrm{evop}} = \begin{cases} 1, & \text{if } S_{\mathrm{full}} = 1 \ \wedge\ S_{\mathrm{low}} = 1 \ \wedge\ \overline{EVOP} \notin [60, 85], \\ 1, & \text{if } S_{\mathrm{full}} = 1 \ \wedge\ S_{\mathrm{low}} = 0 \ \wedge\ |\overline{EVOP} - 99| > 5, \\ 0, & \text{otherwise}. \end{cases}$$
Once the algorithm completes the above logical operations, the system maps the Boolean results of all state variables $J$ into natural language descriptions and injects them into the context module of the Prompt. For example, when the calculation yields $J_{\mathrm{intake}} = 1$ and $|\overline{MIAP} - \overline{TIAP}| = 62.5$, the system generates the text:
“Program Calculation Result: [Intake System] The deviation between actual intake pressure and target value is 62.5 mBar, exceeding the normal range (±50 mBar).”
Through this strategy, what the LLM acquires are mathematical facts verified by physical laws. This compels the model to conduct causal backtracking based on these established facts, fundamentally eliminating reasoning biases caused by numerical calculation errors.
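A minimal sketch of this deterministic pre-calculation layer is given below; the function and argument names are illustrative, while the thresholds follow the rules stated above:

```python
def precompute_rules(tp_mean, ap_mean, miap_mean, tiap_mean, evop_mean):
    """Evaluate deterministic rules for one fault segment and emit factual text."""
    s_full = 1 if tp_mean >= 99.5 else 0            # full-power state S_full
    s_low = 1 if ap_mean >= 800 else 0              # low-altitude state S_low (< 6000 ft)
    facts = []
    dev = abs(miap_mean - tiap_mean)
    if s_low == 1 and dev > 50:                     # J_intake rule
        facts.append(f"[Intake System] deviation between actual and target "
                     f"intake pressure is {dev:.1f} mBar, exceeding the normal range (+/-50 mBar).")
    if s_full == 1:                                 # J_evop composite rule
        if s_low == 1 and not (60 <= evop_mean <= 85):
            facts.append(f"[Wastegate] opening {evop_mean:.1f}% is outside "
                         f"the low-altitude normal range [60%, 85%].")
        elif s_low == 0 and abs(evop_mean - 99) > 5:
            facts.append(f"[Wastegate] opening {evop_mean:.1f}% deviates from "
                         f"the 99% high-altitude setpoint by more than 5%.")
    return facts
```

The returned strings are the “mathematical facts” injected into the prompt context, so the LLM reasons over pre-verified results rather than performing its own arithmetic.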

3.5.4. Semantic Transformation of SHAP Values

Raw SHAP values generated by the LightGBM model represent solely additive feature attributions without inherent semantic context. The direct input of these numerical values into large language models often induces an over-interpretation of minor fluctuations, which consequently causes the model to overlook significant diagnostic patterns in favor of irrelevant quantitative specifics. To circumvent this issue, the present study proposes a semantic transformation module designed to map SHAP values into qualitative descriptors.
This module first filters the obtained Top-K feature set, retaining the top 10 features with significant contributions. Subsequently, by combining feature names with the physical semantics defined in Table 5, it constructs structured feature description blocks. For each feature, the description text encompasses three dimensions: feature name, feature type, and a statement of relative importance.
In particular, a dynamic rule is applied to prevent causal confusion: when “Target Value Features” appear in the high-contribution list, the prompt automatically appends a cautionary note:
“Note: While the target value feature has a high contribution, this typically implies that the system failed to respond to the target, rather than a fault in the target itself. ”
This dynamic semantic completion mechanism effectively rectifies the common cognitive bias where the model equates high SHAP values with the root cause of faults, ensuring that the reasoning focuses on the response capability of the physical system.
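The semantic transformation can be sketched as a small rendering function; `feature_block` and the wording of the qualitative descriptors are illustrative, while the target-value caveat follows the dynamic rule described above:

```python
def feature_block(top_features, feature_types):
    """Render Top-K SHAP features as qualitative text, appending the target-value caveat."""
    lines, has_target = [], False
    for rank, (name, _score) in enumerate(top_features, start=1):
        ftype = feature_types.get(name, "Unknown")
        level = "high" if rank <= 3 else "moderate"   # relative importance, not raw numbers
        lines.append(f"{rank}. {name} ({ftype}): {level} relative contribution")
        has_target = has_target or ftype == "Target Value"
    if has_target:                                    # dynamic rule against causal confusion
        lines.append("Note: a high-contribution target value typically implies the system "
                     "failed to respond to the target, not a fault in the target itself.")
    return "\n".join(lines)
```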

3.5.5. Explicit Injection of Physical Causal Relationships

Aircraft are typical multi-variable strongly coupled systems, where clear physical transmission paths and control constraints exist among key parameters. Under normal operating conditions, the system follows established control laws, and variables exhibit stable synergistic changes. However, when a fault occurs, specific transmission links may experience failure, hysteresis, or directional deviation, subsequently causing anomalies in downstream states. To constrain the LLM to conduct reasoning along physically consistent paths and avoid issues such as “cross-layer jumping,” “causal inversion,” or misidentifying result variables as cause variables, this study constructs a domain-knowledge-based causal topology. This topology is explicitly injected into the prompt context in the form of structured knowledge blocks.
Combining aircraft control mechanisms with historical operational data, this paper identifies the primary transmission links and inhibitory constraint relationships between key features. Table 7 presents the causal directions of core features during normal operation, their correlation coefficients under normal conditions, and the engineering explanations for their corresponding anomaly mechanisms in fault states. It is important to emphasize that the correlation coefficient is used to characterize the degree of synergy and the reference baseline under normal conditions and is not equivalent to causal strength; the causal direction is primarily determined by the control structure and physical mechanisms.
Table 7. Causal Relationship Topology and Engineering Interpretations of Key Features.
Based on the micro-causal relationships listed above, this paper summarizes three macro-core chains governing aircraft operation. These serve as a navigation map to guide the LLM in performing “Chain-of-Thought” reasoning:
  • Wastegate Regulation Chain (TP → EVOP → MIAP → MP): This chain describes the complete process of intake control. When an anomaly occurs in this chain, it typically manifests as a failure in wastegate opening regulation, leading to abnormal intake pressure fluctuations and ultimately causing a drop in power output. The model is required to prioritize checking whether the upstream control variable EVOP has responded to the TP command.
  • Fuel Supply Chain (TFP → MFP → MP): This chain reflects the execution capability of the fuel system. Fault features manifest as the actual fuel pressure MFP failing to track the target value TFP, causing insufficient power due to fuel pressure limitations.
  • Intake Pressure Closed-Loop Chain (TIAP → MIAP): This chain focuses on the tracking performance of the intake system. A significant decrease in the correlation between the two indicates leakages or sensor biases within the intake system.
When constructing the prompt, the aforementioned causal relationship table is converted into a Markdown-formatted knowledge block and injected into the context. The system instructions explicitly require the model to follow the logical principle of “tracing from upstream features to downstream anomalies” when generating diagnostic reports. For instance, if SHAP analysis indicates an anomaly in output power MP, the model must not directly attribute it as the root cause. Instead, it must trace backwards along the path EVOP → MIAP → MP to confirm whether it is caused by wastegate control failure or insufficient intake pressure. This explicit causal injection mechanism effectively prevents the model from misjudging result variables as cause variables, ensuring the physical consistency of the fault diagnosis logic.
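Serializing the three macro chains into a Markdown knowledge block might look as follows; the dictionary layout and function name are assumptions, sketched from the chain definitions above:

```python
# The three macro causal chains summarized in the text, upstream to downstream.
CAUSAL_CHAINS = {
    "Wastegate Regulation Chain": ["TP", "EVOP", "MIAP", "MP"],
    "Fuel Supply Chain": ["TFP", "MFP", "MP"],
    "Intake Pressure Closed-Loop Chain": ["TIAP", "MIAP"],
}

def causal_knowledge_block():
    """Serialize the causal topology into a Markdown block for the prompt context."""
    lines = ["### Causal Chains (reason from upstream features to downstream anomalies)"]
    for name, chain in CAUSAL_CHAINS.items():
        lines.append(f"- **{name}**: {' -> '.join(chain)}")
    return "\n".join(lines)
```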

3.5.6. Analysis Rule Constraints and Output Format Specifications

In the generation process of LLMs, excessive degrees of freedom often lead to uncontrollable outputs and hallucination issues. To ensure the consistency, physical rationality, and reproducibility of fault diagnosis reports, this study constructs a rigorous constraint system comprising two core dimensions: “Reasoning Prohibitions” and “Output Templates.”
First, to address potential logical errors in the model, strict reasoning prohibitions are established. These prohibitions are injected into the prompt system instructions as “negative constraints,” aiming to delineate the physical boundaries of the reasoning process. For instance, target value features merely represent the controller’s intent; their deviations are typically the result of actuator response failure rather than the source of the fault. Similarly, environmental parameters belong to external boundary conditions and should not be misjudged as internal system faults. The specific types, contents, and design objectives of these prohibitions are shown in Table 8.
Table 8. Reasoning Prohibition Constraints Table.
Building upon the established reasoning boundaries, this study further prescribes a structured output format template. This template mandates the model to generate Markdown-formatted text according to a predefined hierarchical structure, ensuring that analysis results from different fault segments are fully comparable in terms of content dimensions. The standardized output structure contains four core modules: First, Operational State Definition, clarifying the power setting and altitude background at the time of the fault; Second, Root Cause Feature Locking, outputting the key physical quantities causing the fault sorted by importance; Third, Deep RCA, requiring the model to synthesize operating conditions and causal chains for logical argumentation and explicitly exclude symptomatic features; Finally, Fault Evolution Process, reconstructing the temporal path from the initial anomaly to the final performance degradation. The specific output format definitions are provided in Table 9.
Table 9. RCA Report Output Format Specifications.
By implementing the prescribed reasoning prohibitions and output specifications, this study successfully channels the stochastic generation of the large language model into a rigorous engineering diagnostic framework. This dual-constraint approach precludes analytical inconsistencies while simultaneously facilitating a qualitative transition of diagnostic results from unstructured narratives to standardized data formats.
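The four-module output structure can be enforced with a simple string template; the placeholder and function names below are illustrative, not the paper's actual template from Table 9:

```python
# Hypothetical Markdown skeleton mirroring the four prescribed report modules.
REPORT_TEMPLATE = """\
## 1. Operational State Definition
{operational_state}

## 2. Root Cause Feature Locking
{root_cause_features}

## 3. Deep Root Cause Analysis
{deep_rca}

## 4. Fault Evolution Process
{fault_evolution}
"""

def render_report(**sections):
    """Fill the fixed hierarchical structure so reports stay fully comparable."""
    return REPORT_TEMPLATE.format(**sections)
```

Because every segment's report is generated against the same skeleton, results from different fault segments remain comparable dimension by dimension.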

4. Experiments and Results

4.1. Experimental Environment and Configuration

To validate the efficacy of the hierarchical RCA framework proposed in this study—which synergizes LightGBM-based anomaly perception with LLM-driven cognitive reasoning—an experimental platform was established under local industrial simulation conditions. The specific experimental environment configurations are detailed in Table 10.
Table 10. Experimental Environment Configuration.
The algorithmic implementation utilizes Python 3.10. The anomaly detection model is constructed using LightGBM 4.0 for training and inference, while interpretability analysis employs SHAP 0.42 for feature contribution calculation and visualization. For the cognitive reasoning module, this study deploys the GLM-4-9B LLM locally via Ollama. Developed by Zhipu AI, GLM-4-9B is an open-source large language model featuring 9 billion parameters and a native context window of up to 128,000 tokens. It exhibits strong performance in logical reasoning, instruction following, and multilingual understanding, making it highly capable of parsing complex engineering data and reliably generating structured diagnostic reports. This local deployment strategy is adopted to strictly adhere to data security and privacy protection requirements in offline air-gapped scenarios, while simultaneously enhancing the stability and reproducibility of the reasoning process.

4.2. Dataset Construction and Partitioning

To validate the generalization capability of the hierarchical RCA framework proposed in this study under varying fault complexity conditions, training, validation, and test datasets were constructed based on existing fault data. Specifically, Fault17, Fault1, and Fault7 were selected as data sources for model training and threshold optimization, while Fault10, Fault9, and Fault8 constituted an independent test set for final performance evaluation. This partitioning follows the principle that “the training set covers typical patterns, while the test set covers fault datasets not involved in training and containing different complexities.” This approach aims to minimize accidental fitting to specific fault segments and objectively evaluate the engineering applicability of the method in multi-fault scenarios.
Based on the number of symptom variables and the degree of coupling, the six fault datasets can be categorized into three fault modes:
  • Single-variable Dominant Type: Represented by Fault17 and Fault10, these primarily manifest as the inability of MP to effectively track the target value, with anomalies being relatively concentrated.
  • Dual-variable Coupling Type: Represented by Fault1 and Fault9, these manifest as joint anomalies in MIAP and EVOP, reflecting the coupling influence between the control loop and pneumatic variables. The mechanism complexity is moderate.
  • Multi-variable Cascading Type: Represented by Fault7 and Fault8, these simultaneously exhibit compound anomaly features of MIAP, EVOP, and MP. These belong to fault types with longer multi-variable linkage propagation chains and more complex manifestations.
The sample size and fault feature overview of each dataset are detailed in Table 11. To improve the credibility of threshold optimization and model tuning conclusions, and to reduce class ratio fluctuations caused by one-time random partitioning, this paper employs stratified sampling within the training data to construct the training and validation sets with a ratio of 9:1. The stratification is based on the “Normal/Fault” binary labels, ensuring that the validation set maintains consistency with the overall data in class distribution. This minimizes metric bias caused by class prior shift and ensures the stability and reproducibility of threshold optimization results.
Table 11. Partitioning and Feature Overview of Each Fault Dataset.

4.3. Performance Evaluation Metrics

To comprehensively evaluate fault detection performance, Accuracy, Precision, Recall, and F1-score are adopted as evaluation metrics. Accuracy measures overall classification correctness, Precision reflects the reliability of fault predictions, Recall evaluates the capability of detecting actual faults, and the F1-score provides a balanced assessment of Precision and Recall.
The definitions of the evaluation metrics are given as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively.
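These four metrics follow directly from the confusion-matrix counts; a minimal sketch, with zero-division guards added for robustness:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp) if (tp + fp) else 0.0   # reliability of fault predictions
    rec = tp / (tp + fn) if (tp + fn) else 0.0    # coverage of actual faults
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return acc, prec, rec, f1
```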

4.4. LightGBM Model Training and Threshold Optimization

Given that aircraft fault datasets typically exhibit significant class imbalance characteristics, directly training a classifier with default parameters often causes the optimization process to bias towards the majority class. This leads to degenerate behavior where most samples are predicted as normal, resulting in an elevated missed detection rate for fault samples [31]. To enhance the recognition capability for the minority class while ensuring training efficiency and convergence stability, this paper configures key LightGBM parameters based on data scale and sample distribution characteristics [32].
The number of boosting iterations is set to n_estimators = 100, enabling the model to reach stable convergence quickly under the current data scale while controlling model complexity to reduce overfitting risks associated with an excessive number of trees. Simultaneously, the class imbalance adaptive mechanism is enabled by setting is_unbalance = True. This algorithm automatically adjusts loss weights based on the ratio of positive to negative samples in the training set, enhancing the contribution of fault class samples in gradient calculation. This prompts the learned decision boundary to focus more on minority class patterns, thereby effectively mitigating the missed detection issue caused by disparate class ratios [32]. Furthermore, to eliminate random perturbations during feature and data sampling and ensure the reproducibility of experimental conclusions, the random seed is fixed at random_state = 42 [33].
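As an illustration of the imbalance handling, LightGBM's is_unbalance option reweights classes roughly by the negative-to-positive count ratio, the same ratio that can be passed explicitly as scale_pos_weight (the two mechanisms should not be combined). The sketch below is illustrative, assuming binary 0/1 labels; it shows the parameter choices discussed above, not the authors' training script.

```python
def imbalance_weight(labels) -> float:
    """Negative/positive count ratio. is_unbalance applies a
    comparable internal reweighting; the same value can instead be
    passed explicitly as scale_pos_weight (use one, not both)."""
    n_pos = sum(labels)
    return (len(labels) - n_pos) / n_pos

# Key parameters used in this study (scikit-learn-style API names).
PARAMS = {
    "n_estimators": 100,   # stable convergence at this data scale
    "is_unbalance": True,  # upweight the minority (fault) class
    "random_state": 42,    # reproducible feature/data sampling
}

# Example: 9,000 normal vs. 1,000 fault samples.
w = imbalance_weight([0] * 9000 + [1] * 1000)  # -> 9.0
```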
After completing the LightGBM model training, an adaptive optimization of the threshold τ was implemented based on the validation set to determine the optimal binarization decision boundary τ*. As illustrated in Figure 4, the variation of the F1-Score with the threshold presents three distinct stages: In the low-threshold interval, the F1-Score climbs rapidly, indicating that increasing the threshold effectively filters out low-confidence false alarm samples. Subsequently, the curve enters a broad plateau phase, reflecting the model’s good separability and robustness for positive and negative samples. However, as τ approaches 1, overly stringent judgment conditions lead to a decline in recall, causing the F1-Score to fall back.
Figure 4. F1-Score variation curve with decision threshold on the validation set.
The intersection of the vertical and horizontal red dashed lines in the figure explicitly marks the location of this optimal threshold and its corresponding maximum F1-Score. Ultimately, this paper identifies the peak point τ* ≈ 0.6185 (corresponding to F1 ≈ 0.9876) as the final decision threshold. Compared to the default empirical threshold of 0.5, this adaptive strategy is better adapted to the real data distribution, achieving the optimal trade-off under the dual constraints of high recall and low false alarm rates required for aircraft fault diagnosis, thereby providing high-quality binarized inputs for subsequent RCA.
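The adaptive threshold search just described amounts to sweeping candidate values of τ on the validation set and keeping the F1-maximizing point; a minimal stdlib-only sketch (the grid resolution is an arbitrary choice here, not the paper's setting):

```python
def best_f1_threshold(probs, labels, steps=200):
    """Sweep decision thresholds on validation data and return
    (tau*, best F1), mirroring the adaptive strategy in Sec. 4.4."""
    def f1_at(tau):
        tp = sum(1 for p, y in zip(probs, labels) if p >= tau and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= tau and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < tau and y == 1)
        if tp == 0:
            return 0.0
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        return 2 * prec * rec / (prec + rec)
    grid = [i / steps for i in range(1, steps)]
    # max() keeps the first threshold attaining the best F1
    return max(((t, f1_at(t)) for t in grid), key=lambda x: x[1])
```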

4.5. Experimental Results and Analysis

4.5.1. Analysis of Fault Detection Performance Results and Ablation Study

To rigorously evaluate the effectiveness of the proposed mechanism-informed feature engineering strategy and to quantify the individual contribution of each feature category, a comprehensive ablation study was conducted. Specifically, a baseline model trained solely on the original 10-dimensional raw sensor data was compared with models trained on the raw features augmented by each feature group independently, as well as with the complete constructed feature space.
As described previously, the full 73-dimensional feature set consists of 10 raw sensor features, 4 environmental normalization ratio features, 9 mechanism-based hybrid features, 10 differential features, and 40 sliding-window statistical features. The averaged performance metrics over independent test sets are summarized in Table 12.
Table 12. Performance Comparison in the Feature Ablation Study.
As shown in Table 12, augmenting the raw sensor data with different feature categories leads to distinct and complementary performance improvements, thereby clarifying the functional role of each feature group in fault detection.
Ratio Features: The incorporation of environmental normalization ratio features substantially improves Recall (from 0.8783 to 0.9262) and increases the F1-score to 0.8464. This enhancement indicates that normalizing sensor signals with respect to environmental variables effectively mitigates the influence of operating condition variability, thereby improving the model’s sensitivity to fault events under fluctuating working conditions.
Hybrid Features: The inclusion of mechanism-based hybrid features primarily enhances Precision (increasing to 0.8320 relative to the baseline). This result suggests that embedding physically meaningful coupling relationships into the feature space constrains the hypothesis space of the classifier and reduces physically implausible fault predictions, thereby improving classification reliability.
Window Features: The addition of sliding-window statistical features results in a significant increase in Precision (0.9645), which represents the highest Precision among all individual feature groups. By capturing temporal statistical characteristics over multiple time scales, these features enhance robustness to transient disturbances and sensor noise, effectively reducing false positive rates.
Differential Features: Introducing differential features yields the highest Recall among the individual feature groups (0.9599). This finding demonstrates that explicitly modeling dynamic trends and rates of change substantially improves the model’s responsiveness to abrupt fault-related deviations, thereby enhancing detection capability.
When all feature categories are jointly incorporated, the complete 73-dimensional mechanism-informed feature space achieves the most robust and well-balanced overall performance. Although the overall Accuracy improves to 0.9901, the more critical improvement lies in the balanced optimization of Precision (0.9441) and Recall (0.9381), resulting in the highest F1-score (0.9381).
These results reveal a clear synergistic effect among the proposed feature categories. A model trained solely on raw sensor data—without explicit representation of physical coupling relationships or temporal dynamics—remains susceptible to misclassifying normal dynamic fluctuations as fault conditions. In contrast, the proposed feature engineering framework integrates physical constraints, temporal statistical descriptors, and dynamic trend information into a unified representation, thereby refining the decision boundaries of the classifier. Consequently, the model achieves robust fault recognition across complex operating scenarios, simultaneously minimizing false alarms and maintaining a high detection rate for true fault events.
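As an illustration of the sliding-window group, trailing-window means and standard deviations of the kind denoted elsewhere as *_mean_10 or *_std_5 can be generated as below; this is a stdlib-only sketch with hypothetical window sizes, not the authors' feature pipeline (early samples use the partial window available so the output stays aligned with the input).

```python
import statistics

def window_features(series, windows=(5, 10)):
    """Trailing-window mean/std statistics for each sample of a
    single sensor channel, one output list per (stat, window)."""
    feats = {}
    for w in windows:
        means, stds = [], []
        for i in range(len(series)):
            seg = series[max(0, i - w + 1): i + 1]
            means.append(statistics.fmean(seg))
            stds.append(statistics.pstdev(seg))  # population std
        feats[f"mean_{w}"] = means
        feats[f"std_{w}"] = stds
    return feats
```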

4.5.2. Comparative Analysis of Fault Detection Results

To validate the effectiveness of the proposed scheme (Mechanism-Informed Feature Engineering + LightGBM Ensemble Learning), it was compared with five baselines, including SVM [34], Random Forest [35], XGBoost [36], LSTM [37], and 1D-CNN [38], under the same test set and evaluation metrics.
To ensure statistical rigor, a Bootstrap resampling method was employed to calculate the 95% confidence intervals (CIs) for each metric on the combined test set. Specifically, based on the fixed true labels and the 0/1 predictions of each model, 1000 Bootstrap iterations were performed. In each iteration, 10,000 samples (comprising 8000 normal and 2000 fault samples) were drawn with replacement, and the Accuracy, Precision, Recall, and F1-Score were calculated. For each metric and model, the 2.5th and 97.5th percentiles of the 1000 computed results were taken as the lower and upper bounds of the 95% CIs. The performance summary is presented in Table 13.
Table 13. Comparison of Fault Detection Performance Across Different Models.
Overall, LightGBM achieved the best comprehensive performance, reaching an Accuracy of 0.9836 and a Recall of 0.9306. While Random Forest exhibited the highest Precision (0.9923), its Recall (0.8623) lagged behind our method, leading to a lower F1-Score of 0.9227 compared to our 0.9577. This indicates that LightGBM achieves a more favorable balance between false alarms and missed detections. The deep learning baselines (LSTM and 1D-CNN) showed inferior results, with F1-Scores significantly lower than those of the GBDT-based models.
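The percentile-bootstrap CI procedure described above can be sketched as follows (stdlib only; the resample size here follows the input length rather than the paper's fixed 10,000-sample draws, and the helper names are hypothetical):

```python
import random

def f1_score(y_true, y_pred):
    """F1 from 0/1 labels and predictions (0.0 if no true positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample (label, prediction) pairs with
    replacement, recompute F1, take the 2.5th/97.5th percentiles."""
    rng = random.Random(seed)
    n, scores = len(y_true), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```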
To rigorously verify whether the performance advantage of LightGBM is statistically significant, McNemar’s tests were conducted, with LightGBM as the baseline and compared against the other five models one by one on the same combined test set. The test statistic is defined as:
$$\chi^2 = \frac{(b - c)^2}{b + c},$$
where b represents the number of samples misclassified by LightGBM but correctly predicted by the comparative model, and c represents the number of samples correctly classified by LightGBM but misclassified by the comparative model. The results, including the test statistics and corresponding p-values, are detailed in Table 14.
Table 14. McNemar’s Test Results Comparing LightGBM with Baselines.
As shown in Table 14, for every comparison, the value of c is substantially larger than b. Consequently, the computed χ 2 statistics are very large, yielding p-values approaching 0 (all <0.001). This confirms that under identical test samples and threshold strategies, LightGBM makes significantly fewer errors than all other models. This difference is not merely a random fluctuation, but a statistically significant performance advantage at the 95% confidence level.
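Since the statistic follows a χ² distribution with one degree of freedom under the null hypothesis, the p-value can be computed without SciPy via the identity P(χ²₁ > x) = erfc(√(x/2)); a minimal sketch of the test (no continuity correction, matching the formula above):

```python
import math

def mcnemar_test(y_true, pred_base, pred_other):
    """McNemar's test on paired predictions.
    b = samples the base model gets wrong but the other gets right;
    c = samples the base model gets right but the other gets wrong."""
    b = sum(1 for t, a, o in zip(y_true, pred_base, pred_other)
            if a != t and o == t)
    c = sum(1 for t, a, o in zip(y_true, pred_base, pred_other)
            if a == t and o != t)
    chi2 = (b - c) ** 2 / (b + c)  # undefined if b + c == 0
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square(1 dof) tail
    return b, c, chi2, p
```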
Furthermore, threshold-independent performance was evaluated using Precision-Recall (PR) and Receiver Operating Characteristic (ROC) curves, as illustrated in Figure 5 and Figure 6. The three tree-based models (LightGBM, Random Forest, and XGBoost) exhibit outstanding performance in both Area Under the Curve (AUC) and Average Precision, with their curves clustering tightly together in the top tier. This indicates that their ranking ability is highly superior. In contrast, the curves for the deep learning models (1D-CNN and LSTM) and SVM are noticeably lower, with their respective Average Precision and AUC values trailing behind those of the tree models.
Figure 5. PR curves for different fault detection models.
Figure 6. ROC curves for different fault detection models.
These results suggest that tree-based architectures are fundamentally more suitable for this specific fault detection task. Although tree-based methods exhibit varying performance across different regions of the ROC and PR curves, LightGBM significantly outperforms the other two tree models at the optimal F1-score. Importantly, at this optimal threshold, LightGBM maintains a low false alarm rate, strictly satisfying the practical requirements of fault detection applications where excessive false positives cannot be tolerated. By explicitly incorporating mechanism-informed features under conditions of limited and noisy monitoring data, LightGBM efficiently models complex nonlinear decision boundaries, ultimately achieving an optimal trade-off between detection accuracy and engineering practicality.

4.5.3. Evaluation of RCA Accuracy

Subsequent to fault detection, the root cause localization capabilities of the proposed “Anomaly Perception-Cognitive Reasoning” framework are evaluated. The study benchmarks the proposed method against a traditional data-driven strategy that relies solely on SHAP-based statistical attribution for feature ranking. In contrast, our LLM-enhanced approach integrates physical constraints, engineering prohibitions, and causal chain prompts to perform mechanistic consistency verification and causal backtracking on the attribution evidence. To quantify diagnostic reliability in an engineering context, the RCA Accuracy is strictly defined as the successful identification of the expert-labeled ground truth within the algorithm’s Top-3 output features. The statistical outcomes are presented in Table 15, with comprehensive comparisons illustrated in Figure 7 and Figure 8.
Table 15. Comparison of RCA Accuracy Across Datasets.
Figure 7. Detailed comparison of root cause identification accuracy between SHAP and LLM methods.
Figure 8. Overall accuracy comparison of RCA across different datasets.
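Under the Top-3 definition above, RCA Accuracy reduces to a simple hit-rate computation over fault segments; the sketch below assumes one ranked feature list per segment and a single expert label (multi-root-cause faults are scored per feature in the paper, which this simplification does not capture).

```python
def top3_accuracy(ranked_lists, ground_truths):
    """Fraction of segments whose expert-labeled root cause appears
    among the algorithm's top three ranked features."""
    hits = sum(1 for ranked, truth in zip(ranked_lists, ground_truths)
               if truth in ranked[:3])
    return hits / len(ranked_lists)

# Example with signal names used in the paper (MP, EVOP, MIAP, TP).
acc = top3_accuracy(
    [["EVOP", "MP", "MIAP"], ["TP", "PCD", "MFP"]],
    ["MP", "MP"],
)  # first segment hits, second misses -> 0.5
```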
From Table 15 and Figure 7, it is evident that the LLM method demonstrates higher and more stable identification accuracy across different fault types and root cause features.
For the single-variable fault (Fault10), the expert-labeled root cause is MP. The accuracy of the SHAP method is 0%, whereas the LLM method achieves 100%. This discrepancy is significant and typical: in a closed-loop control system, SHAP, relying solely on correlation attribution, tends to misidentify non-fault variables such as “target commands or setpoints” as key factors, leading to a complete failure in root cause localization. In contrast, the LLM, after introducing engineering prohibitions such as “Target values cannot serve as fault sources,” can effectively eliminate symptomatic features and focus reasoning on the execution feedback loop where the fault could physically occur, thereby achieving stable identification of the true root cause.
For the dual-variable coupling fault (Fault9), involving both MIAP and EVOP features, SHAP achieves 100% identification for MIAP but drops to 85.7% for EVOP. This indicates that in coupled scenarios, although SHAP can capture the main relevant features, ranking drift may occur in multi-variable co-driven anomaly patterns. Conversely, LLM achieves 100% for both MIAP and EVOP, reflecting its consistent judgment capability for “multi-root cause joint action” in coupled faults.
For the multi-variable cascading fault (Fault8), involving MIAP, EVOP, and MP, SHAP’s accuracy is 57.1%, 92.9%, and 64.3%, respectively, showing good identification for some variables but instability for key variables. LLM achieves 100% for both MIAP and EVOP, and 78.6% for MP. Although overall superior to SHAP, the drop in MP accuracy suggests that when the fault propagation chain is long and variables simultaneously act as “propagation nodes and performance results,” some confusion may still arise between the source and intermediate links.
Further combining the overall results in Figure 8, the LLM method achieves 100% overall accuracy on Fault10 and Fault9, and 92.9% on Fault8, resulting in a total accuracy of 97.1%. In comparison, SHAP has an overall accuracy of 0% on Fault10, 92.9% on Fault9, and 71.4% on Fault8, resulting in a total accuracy of 62.3%. These results indicate that while the pure SHAP method is valuable for “explaining model discrimination basis,” what it essentially reflects is the strength of feature contribution to classification output, which belongs to correlation-level evidence. It is difficult to directly equate this to root causes in an engineering sense, especially in closed-loop systems with control strategies, setpoints, and feedback loops, where features with high contribution may merely be external manifestations or transmission paths of the fault rather than the source.
Comparatively, the LLM-enhanced method successfully fuses the statistical evidence provided by SHAP with domain knowledge. By filtering out impossible fault features through mechanistic prohibitions and backtracking inversely along the causal chains given in the prompt, it achieves a leap from “Correlation Explanation” to “Causal Candidacy” in most scenarios. Meanwhile, the LLM possesses stronger semantic integration capabilities for operational state information, allowing it to synthesize the impact of conditions such as full power status and operating condition changes on anomaly manifestations, thereby enhancing the engineering consistency and cross-dataset robustness of reasoning conclusions. It is worth noting that the identification accuracy for MP in Fault8 did not reach 100%, suggesting room for improvement. Future work can further reduce confusion between sources and propagation nodes in complex compound cascading faults by supplementing finer-grained causal topology constraints, refining the list of potential fault variables, and improving the consistency verification mechanism for reasoning paths, thereby elevating the stability and interpretability confidence of root cause localization.

4.6. Case Study Analysis

To further validate the engineering applicability of the proposed diagnostic framework—“Anomaly Perception → Root Cause Localization → Explanation & Causal Chain Output”—under scenarios of varying fault complexity, this section selects three typical fault segments as case studies. These correspond to single-variable dominant faults, dual-variable coupling faults, and multi-variable cascading faults, respectively. Each case study evaluates two RCA strategies according to uniform criteria by comparing SHAP-derived attribution results with LLM-generated cognitive reasoning outputs. This assessment prioritizes the identification of expert-labeled root causes within the top three candidates while simultaneously examining the generation of mechanistic explanations and diagnostic causal trajectories. Consequently, the analysis validates the multifaceted superiority of the proposed framework in terms of localization accuracy and interpretability utility.

4.6.1. Analysis of Single-Variable Dominant Case

Figure 9 illustrates an exemplary RCA report generated by the LLM for a typical single-variable fault segment. The report synthesizes structured content encompassing operational state definitions, root cause conclusions, key evidentiary features, and the chronological fault evolution process. This design effectively bridges the gap between statistical model evidence and engineering semantics, yielding a diagnostic narrative specifically tailored for maintenance troubleshooting.
Figure 9. Example of LLM-generated RCA report for a single-variable fault case.
Table 16 presents the comparative results of root cause localization for this segment. For the scenario where the expert-labeled root cause is MP, the Top-3 features output by the SHAP method were TP, EVOP_mean_10, and PCD. It failed to hit the root cause and could only provide correlation ranking at the feature contribution level, making it difficult to further provide mechanistic explanations and causal chains.
Table 16. Comparison of Root Cause Localization Results and Interpretability for Fault10 (Single-variable).
In contrast, the Top-3 features output by the LLM were EVOP, MP, and MIAP. This successfully covered the root cause MP and provided a causal explanation framework of “Control Variable Anomaly → Intake State Change → Insufficient Power Output.” This case illustrates a typical issue in closed-loop control systems: Target value features often exhibit high contribution in SHAP rankings, but they are more likely to reflect control commands and state requests, belonging to symptomatic information rather than the fault source. The cognitive reasoning module, however, can suppress the interference of such symptomatic features under rule constraints, focusing the diagnostic attention on the execution and feedback chains where the fault actually occurred, thereby outputting more actionable root cause conclusions.

4.6.2. Analysis of Dual-Variable Coupling Case

Figure 10 presents an example of the LLM RCA report for a typical dual-variable coupling fault segment. Consistent with the single-variable case, the report not only provides a root cause ranking but also explicitly describes the operational state and fault propagation relationships. Furthermore, it includes explanations for excluding potential interfering factors, thereby endowing the diagnostic results with enhanced interpretability and traceability.
Figure 10. Example of LLM-generated RCA report for a dual-variable coupling fault case.
Table 17 presents the comparative results for the Fault9 scenario. The expert-labeled root causes for this fault type involve two variables: MIAP and EVOP, manifesting as a coupling between intake pressure anomalies and wastegate regulation anomalies.
Table 17. Comparison of Root Cause Localization Results and Interpretability for Fault9 (Dual-variable).
The Top-3 features output by the SHAP method were TP, MIAP_std_10, and MIAP_std_5. While these features reflect intake pressure fluctuation characteristics, they failed to consistently cover EVOP. Consequently, the root cause hit performance is classified as “Partial,” and the method lacks the capability to provide a clear causal chain explanation.
In contrast, the Top-3 features output by the LLM were EVOP, MIAP, and MP. This result not only covers both root causes simultaneously but also provides a transmission chain that aligns better with engineering logic. It identifies the regulation anomaly of EVOP as the upstream factor, explains how it triggers the deviation in MIAP, and subsequently leads to a decline in MP. Additionally, the report elucidates the influence of environmental factors such as altitude, thereby enhancing the engineering credibility and usability of the diagnostic conclusion.

4.6.3. Analysis of Multi-Variable Cascading Case

Figure 11 presents an example of the LLM RCA report for a typical multi-variable cascading fault segment. Such faults feature longer propagation chains and stronger variable coupling. Relying solely on single-dimensional feature contribution ranking makes it difficult to organize a consistent set of root causes and propagation paths. The structured report, through the format of “Operational State—Evidentiary Features—Root Cause Conclusion—Process Explanation,” provides a more complete diagnostic narrative framework for complex faults.
Figure 11. Example of LLM-generated RCA report for a multi-variable cascading fault case.
Table 18 presents the comparative results for the Fault8 scenario. The expert-labeled root causes involve three variables: MIAP, EVOP, and MP. The Top-3 features output by the SHAP method were TP, MIAP_std_5, and MIAP_std_10. The hit performance is classified as “Partial,” and it failed to generate a causal chain explanation. The results overall still exhibit the issue where “target value features occupy top rankings while root cause coverage is insufficient.”
Table 18. Comparison of Root Cause Localization Results and Interpretability for Fault8 (Multi-variable Cascading).
In contrast, the Top-3 features output by the LLM were EVOP, MIAP, and MP. This successfully covers all three root causes and outputs a consistent mechanistic explanation and transmission chain. It organizes multi-source anomalies into a traceable causal structure, making it more suitable for engineering diagnosis scenarios involving complex compound cascading faults.

4.6.4. Overall Analysis

A synthesis of the findings from the three representative case studies reveals two primary conclusions. In the first instance, although SHAP provides robust statistical evidence for the interpretation of the model’s predictive logic, the approach remains inherently a correlation-based attribution technique. Consequently, high-contribution features do not necessarily correspond to engineering root causes. In closed-loop control systems, target-value features and their derivatives often dominate the top rankings, leading to incomplete coverage of the true root causes and a lack of causal explanations suitable for troubleshooting. Second, the LLM-enhanced method successfully augments SHAP evidence by introducing physical rules and constraints. It effectively suppresses symptomatic features, thereby focusing the analysis on the actual fault source. By generating structured content—encompassing operational state definitions, root cause locking, causal chain tracing, and environmental factor exclusion—it elevates the diagnostic result from a mere “list of important features” to a comprehensive “diagnostic report for engineering decision-making.”
Table 19 provides a comparative summary of the report element coverage between the two methods. It is evident that standard SHAP outputs typically lack essential engineering contexts, such as operational state definitions and environmental factor exclusion. In contrast, the LLM-generated reports successfully incorporate this critical information and construct clear causal chains, thereby significantly enhancing the readability, traceability, and engineering applicability of the diagnostic results. Taken together, the three case studies validate that the proposed framework consistently enhances the robustness of root cause localization within intricate failure modes regarding both identification accuracy and explanatory depth. By generating systematized diagnostic conclusions aligned with engineering cognition, the method establishes a solid foundation for subsequent informed maintenance decision-making.
Table 19. Comparison of Element Coverage in RCA Reports.

5. Conclusions

To address critical challenges such as high dimensionality, class imbalance, and the insufficient interpretability of traditional data-driven methods in civil aircraft monitoring data, this paper proposes and validates an RCA framework that integrates mechanism-informed feature engineering, ensemble learning-based anomaly perception, and LLM-driven cognitive reasoning. By constructing a high-dimensional feature space containing environmental ratios and physical hybrid indicators, the proposed method effectively mitigates the data distribution shift problem caused by drastic changes in flight conditions. Furthermore, combining the cost-sensitive LightGBM model with an adaptive threshold optimization strategy significantly enhances the detection recall capability for rare fault samples while maintaining a low false alarm rate.
Furthermore, this paper establishes a diagnostic architecture predicated on the decoupling of computation and reasoning. The framework employs SHAP to extract statistical feature contribution evidence, which is then refined by an LLM guided by physical causal chains and engineering prohibitions. This mechanism enables the model to perform mechanistic consistency verification and causal backtracking on the preliminary statistical results. Experimental evaluations demonstrate that, at the RCA level, the proposed cognitive reasoning mechanism effectively mitigates the interference of symptomatic features in closed-loop control systems, significantly enhancing both the identification accuracy and interpretability. These findings validate that synergizing data-driven anomaly perception with knowledge-guided cognitive reasoning constitutes a robust technical pathway for achieving trustworthy diagnosis in complex industrial systems.
While this study focuses on a specific type of civil aircraft, the proposed methodology could potentially be extended to other aircraft types and subsystems. Transferring this approach to different equipment, however, requires fulfilling certain prerequisites. First, the anomaly perception module relies on a sufficient amount of normal operational data alongside a subset of known fault samples to establish reliable decision boundaries. Second, the cognitive reasoning module heavily depends on prior domain knowledge. Adapting the LLM constraints for a new system requires an explicit understanding of its specific working mechanisms, deterministic engineering rules, and well-defined causal topologies. If these data and knowledge conditions can be met, the “computation and reasoning decoupling” architecture may serve as a useful reference for developing interpretable root cause analysis workflows in other complex industrial systems.
Despite these achievements, the proposed method still possesses certain limitations when dealing with multi-variable strong coupling and long causal chain compound faults, particularly regarding the precision of distinguishing between fault sources and intermediate propagation nodes. Additionally, the currently constructed causal topology relies primarily on prior expert knowledge; its generalization reasoning capability remains insufficient when facing unknown or evolving fault scenarios. Future research will focus on introducing dynamic causal graphs to enhance the resolution capability for complex cascading faults and exploring the combination of causal discovery methods to automatically mine latent physical constraint relationships from data. This aims to reduce reliance on manual rules and further elevate the robustness and universality of the diagnostic framework under different mechanisms and complex operating conditions.

Author Contributions

W.D.: methodology, experiments, data analysis, writing—review and editing, funding acquisition, supervision. J.D.: coding, validation, writing—original draft. H.Z.: data curation. D.Y.: supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (61903262) and the Natural Science Foundation of Liaoning Province (2024-MS-133).

Data Availability Statement

The data presented in this study are available on request from the corresponding author; the data are not publicly available due to privacy restrictions.

Conflicts of Interest

The author Haoran Zhang was employed by the company Tianjin Jepsen International Flight College Co., Ltd. Authors Wenyou Du and Dongsheng Yang were employed by the company Shenyang Institute of Computing Technology Co., Ltd., CAS. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AP: Atmospheric Pressure
AUC: Area Under the Curve
CoT: Chain-of-Thought
CREV: Crankshaft Revolution
CIs: Confidence Intervals
DCPSV: DC Power Supply Voltage
DeepLIFT: Deep Learning Important FeaTures
ECU: Electronic Control Unit
EVOP: Exhaust Valve Opening Position
FADEC: Full Authority Digital Engine Control
FIR: Fuel Intake Ratio
FPCD: Fuel Pressure Control Deviation
GBDT: Gradient Boosting Decision Tree
GOSS: Gradient-based One-Side Sampling
IPCD: Intake Pressure Control Deviation
IPR: Intake Power Ratio
LLMs: Large Language Models
MFP: Manifold Fuel Pressure
MIAP: Manifold Intake Air Pressure
MP: Measured Power
OPI: Overall Power Indicator
PCD: Power Control Deviation
PEI: Power Efficiency Index
PLI: Power Load Indicator
PR: Precision-Recall
RCA: Root Cause Analysis
ROC: Receiver Operating Characteristic
SHAP: Shapley Additive exPlanations
TFP: Target Fuel Pressure
TIAP: Target Intake Air Pressure
TP: Target Power
VPE: Voltage Power Efficiency

Appendix A. Detailed Analysis of Fault Mechanisms

To provide a deeper understanding of the nature of the faults present in the dataset, this appendix details 13 specific fault mechanisms (F1–F13) identified by domain experts.
  • F1: Intake Manifold Air Leak. The difference between TIAP and MIAP is excessively large, exceeding the normal range (under full power and altitude < 6000 ft, the error between TIAP and MIAP should be within ±50 mbar).
  • F2: Insufficient Actual Power (Due to F1 and F4). When altitude < 6000 ft and TP is set to 100%, severe EVOP fluctuations and insufficient MIAP lead to inadequate intake air, subsequently causing insufficient MP (MP should be >95% under these conditions).
  • F3: Excessive Wastegate Opening. When altitude < 6000 ft and TP is set to 100%, a stuck wastegate actuator results in insufficient physical opening. Consequently, MIAP remains lower than TIAP, causing the ECU to continuously command the wastegate solenoid to increase the opening. This drives EVOP above 85% (the normal EVOP range under these conditions is 60–85%).
  • F4: Severe Wastegate Opening Fluctuation. Under stable TIAP conditions, EVOP fluctuates violently, which in turn causes severe fluctuations in the actual intake pressure (MIAP).
  • F5: Insufficient Actual Power Due to High Intake Temperature. When altitude < 6000 ft and TP is set to 100%, blockage in the intercooler causes the intake temperature to exceed limits. This triggers high exhaust temperatures, prompting the engine protection mechanism to forcibly reduce MP to prevent engine damage (MP drops below the expected 95%).
  • F6: Insufficient Actual Power Due to F1. As in F1, an intake manifold leak causes insufficient MIAP, directly resulting in insufficient MP (<95%) under full power and altitude < 6000 ft.
  • F7: Sudden Power Drop Due to F3. Triggered by F3, the persistent MIAP shortfall causes the ECU to continuously increase the EVOP signal. The stuck actuator fails to move, and the continuous inflation by the solenoid valve eventually bursts the air diaphragm. This irreversible mechanical damage causes a total loss of wastegate control, retracting the actuator, disabling turbocharging, and leading to a sudden, severe drop in both MIAP and MP.
  • F8: Insufficient Actual Power Due to High Altitude. When flight altitude exceeds 6000 ft, the low atmospheric pressure is inherently insufficient to maintain MP > 95%. This is treated as an environmental anomaly rather than a system fault.
  • F9: Insufficient Actual Power Due to Propeller Fault. Both intake and fuel pressures are normal, but a fault in the propeller system prevents MP from reaching the required 95% under full power.
  • F10: Severe Fluctuation in Actual Power Due to F4. While TP remains stable, the MIAP fluctuations caused by F4 directly induce severe fluctuations in MP.
  • F11: Simulated Forced Landing Training. These are non-fault anomalous operations intentionally executed during flight training.
  • F12: Insufficient Actual Fuel Pressure. A fault in the fuel booster pump causes an excessive difference between TFP and MFP (during normal operation, the difference should remain within ±50 bar), resulting in a sudden drop in MP.
  • F13: Low Wastegate Opening and Excessive Intake Pressure. Due to incorrect adjustment of the wastegate actuator, EVOP remains abnormally low when TP = 100%, causing MIAP to become excessively high (violating the normal EVOP range of 60–85% and the TIAP/MIAP error margin of ±50 mbar).
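The threshold rules above can be expressed as simple segment-level checks. The following is a minimal sketch, assuming the FADEC signals sit in a pandas DataFrame whose columns follow the paper's abbreviations (TIAP, MIAP, EVOP, MP, TFP, MFP) and that the segment is in the full-power, altitude < 6000 ft operating state; the function name and flag keys are illustrative, not from the paper.

```python
import pandas as pd

# Thresholds taken from the fault definitions above (units: mbar, bar, percent).
INTAKE_ERR_MBAR = 50.0          # F1/F13: |TIAP - MIAP| tolerance below 6000 ft
EVOP_LO, EVOP_HI = 60.0, 85.0   # F3/F13: normal wastegate opening at full power
MP_MIN_PCT = 95.0               # F2/F5/F6: minimum MP at TP = 100%, < 6000 ft
FUEL_ERR_BAR = 50.0             # F12: |TFP - MFP| tolerance

def segment_rule_flags(seg: pd.DataFrame) -> dict:
    """Evaluate the numerical judgments for one fault segment
    (full-power, altitude < 6000 ft operating state assumed)."""
    return {
        "F1_intake_leak": bool((seg["TIAP"] - seg["MIAP"]).abs().mean() > INTAKE_ERR_MBAR),
        "F3_evop_high":   bool(seg["EVOP"].mean() > EVOP_HI),
        "F13_evop_low":   bool(seg["EVOP"].mean() < EVOP_LO),
        "low_power":      bool(seg["MP"].mean() < MP_MIN_PCT),
        "F12_fuel":       bool((seg["TFP"] - seg["MFP"]).abs().mean() > FUEL_ERR_BAR),
    }
```

Booleans of this kind are what the "(Normal/Abnormal)" judgment slots of the Appendix B prompt template would carry after pre-calculation.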

Appendix B. Prompt Template for LLM-Based Root Cause Analysis

# Role Setting
 
You are a senior aircraft engine fault diagnosis expert with profound knowledge of engine physics and extensive experience in fault analysis. Your task is **Root Cause Analysis (RCA)**: to identify the fundamental cause leading to the fault, rather than merely describing the symptomatic manifestations.
 
# Task
 
Based on the Top 10 features contributing most to the fault as identified by SHAP analysis, combined with engine physical knowledge and causal relationships between features, perform **Root Cause Analysis**.
 
**Core Requirements**:
1. **Distinguish Symptoms from Root Causes**: If SHAP indicates a high contribution from target value features (TP, TIAP, TFP), this is not the root cause, but a symptom. You must analyze **why the system failed to reach these targets**.
2. **Incorporate Causal Relationships**: Utilize the causal relationship chains between features to trace from upstream features to the root cause of downstream anomalies.
3. **Exclude Irrelevant Features**: Environmental value features (e.g., AP) are generally not fault root causes unless there is explicit evidence.
4. **Focus on Hybrid Features**: Control deviation features (PCD, FPCD, IPCD) directly reflect the tracking errors of the control system; performance indicator features (PEI, IPR, VPE, FIR, PLI, OPI) reflect overall system efficiency.
 
# Fault Segment Information
 
- **Fault Segment ID**: [Dynamic Injection: e.g., #1]
- **Fault Duration**: [Dynamic Injection: e.g., 120] sampling points
 
## SHAP Analysis Results (Top 10 Features with Highest Contribution)
 
**Note**:
- This SHAP ranking is **dynamically calculated for this specific fault segment**, reflecting the feature importance for this period.
- Features may include original features and derived features (ratio features, hybrid features, differential features, rolling window features, etc.).
- SHAP values reflect the feature’s contribution to the model’s fault prediction, but do not imply the feature is the root cause.
- Distinguish based on physical knowledge: Target Values (cannot be faults) vs. Actual Values (potential faults) vs. Control Variables (potential faults).
 
[Dynamic Injection: Top 10 feature list generated by the algorithm, including feature name, physical meaning, feature type (Target/Actual/Control/Environment/Performance), and SHAP contribution. Mark if it is a derived feature.]
 
[Dynamic Injection: If the system is at full power and target value features appear in the Top 10, automatically inject the following warning rule:
“Important Premise: Although target value features (e.g., TP) show high contribution in SHAP analysis, they are **not fault root causes**. This fault occurred under a full power state (TP = 100%), meaning the ECU requested maximum power output, but the system failed to achieve it. You must analyze **why the full power request could not be met**, rather than treating TP itself as the fault.”]
 
## Feature Causal Relationships
 
| Causal Relation | Normal-State Correlation | Engineering Interpretation (when violated) |
| --- | --- | --- |
| **TP -> MP** | +0.99 | Power command fails to effectively translate into output power |
| **TP -> TIAP** | +0.98 | ECU control target remains partially effective |
| **TIAP -> MIAP** | +0.99 | Actual intake pressure fails to track the target value |
| **EVOP -> MIAP** | +0.86 | Wastegate regulation direction failure |
| **TP -> EVOP** | +0.89 | ECU control over wastegate fails |
| **TFP -> MFP** | +1.00 | Instability in fuel system supply |
| **MFP -> MP** | +1.00 | Fuel pressure limits power output |
| **MIAP -> MP** | +0.99 | Insufficient or unstable intake air |
| **MP -> CREV** | +0.93 | Power fails to effectively translate into rotational speed |
| **EVOP -> MP** | +0.89 | Wastegate regulation leads to a power drop |
| **AP -> MP** | -0.65 | Power is passively limited after control failure |
| **AP -> EVOP** | -0.79 | Wastegate fails to adjust according to altitude laws |
 
**Key Understanding**:
- **Target Value Features (TP, TIAP, TFP)**: These are control targets set by the ECU and cannot malfunction themselves. If SHAP shows high contribution, it means the system failed to reach these targets. Find out **why they could not be reached**.
- **Actual Value Features (MP, MIAP, MFP, CREV, DCPSV)**: These are actual measured values and may be abnormal due to physical faults.
- **Control Variable Features (EVOP)**: The output control signal of the ECU. Anomalies may indicate actuator faults or control logic issues.
- **Environmental Value Features (AP)**: Environmental conditions. Usually not the fault cause but will affect system performance.
- **Performance Indicator Features (PEI, IPR, VPE, FIR, PLI, OPI)**: Comprehensive performance metrics reflecting overall system efficiency. Anomalies indicate a degradation in system performance.
- **Control Deviation Features (PCD, FPCD, IPCD)**: The deviations between target and actual values, directly reflecting control system tracking errors. They are critical indicators for diagnosis.
- **Ratio Features (_AP_Ratio)**: Represent the ratio of the feature to atmospheric pressure, used to eliminate baseline drift caused by environmental pressure.
- **Differential Features (_diff)**: Represent the first-order difference of the feature, used to capture abrupt change patterns in the signal.
- **Mean Features (_mean_5, _mean_10)**: Represent the average value within a sliding window, used to capture trend changes in the system.
- **Standard Deviation Features (_std_5, _std_10)**: Represent the standard deviation within a sliding window, used to capture fluctuation characteristics and oscillation intensity of the signal.
 
**Macro Causal Chains (Reasoning Navigation Map)**:
When conducting RCA, strictly follow the three physical macro chains below for top-down backtracking:
1. **Wastegate Regulation Chain (TP -> EVOP -> MIAP -> MP)**: Describes the entire intake control process. If downstream anomalies occur, prioritize checking whether the upstream control variable EVOP responded to the TP command.
2. **Fuel Supply Chain (TFP -> MFP -> MP)**: Reflects fuel system execution capability. Manifests as actual fuel pressure MFP failing to track target TFP, causing power to be limited by fuel.
3. **Intake Pressure Closed-Loop Chain (TIAP -> MIAP)**: Focuses on intake system tracking performance. A significant drop in correlation indicates leaks or sensor biases within the intake system.
 
## Engine Fault Diagnosis Knowledge Base
 
The following knowledge base is based on the technical specifications of this aircraft engine. All numerical judgments have been pre-calculated by the program:
 
### 1. Target Power (TP)/Measured Power (MP)
**Technical Specifications**:
- TP is manually set by the pilot based on flight needs, ranging from 0 to 100%.
- The engine equipped on this aircraft can maintain full power operation up to an altitude of 6000 feet (ft).
- When flight altitude is below 6000 ft and the throttle lever is at full (TP = 100%), the MP value should be >95%.
- MP value is calculated by the ECU.
**Judgment Results for Current Fault Segment** (Pre-calculated):
[Dynamic Injection: Operating status (e.g., Full Power/Partial Load, Altitude </>= 6000 ft), MP judgment result (Normal/Abnormal) and specific deviation values.]
 
### 2. Actual Intake Pressure (MIAP)/Target Intake Pressure (TIAP)
**Technical Specifications**:
- When TP is at 100%, TIAP is 2225 mbar.
- The difference between MIAP and TIAP should be within +/-50 mbar (altitude below 6000 ft).
- In low power states, a relatively fixed difference between MIAP and TIAP is normal.
- However, MIAP should match TP smoothly without major fluctuations.
**Judgment Results for Current Fault Segment** (Pre-calculated):
[Dynamic Injection: TIAP compliance status, MIAP-TIAP difference judgment result (Normal/Abnormal) and specific deviation values.]
 
### 3. Actual Fuel Pressure (MFP)/Target Fuel Pressure (TFP)
**Technical Specifications**:
- During normal operation, the difference between MFP and TFP should be within +/-50 bar.
**Judgment Results for Current Fault Segment** (Pre-calculated):
[Dynamic Injection: MFP-TFP difference judgment result (Normal/Abnormal) and specific deviation values.]
 
### 4. Wastegate Opening Position (EVOP)
**Technical Specifications**:
- When flight altitude is below 6000 ft and the throttle is at full power (TP = 100%), EVOP should range between 60% and 85%.
- During high-altitude flight (above 6000 ft), EVOP = 99% is considered normal.
**Judgment Results for Current Fault Segment** (Pre-calculated):
[Dynamic Injection: EVOP range judgment result (Normal/Abnormal) and specific deviation values.]
 
**Important Note**: All numerical judgments above have been pre-calculated by the program. Please directly use these judgment results for RCA without performing numerical comparisons again.
 
## Key Data of Fault Segment
 
**Operating Status Premise**:
[Dynamic Injection: Specific values of current TP mean, AP altitude description, and corresponding warnings (e.g., natural power drop exemption when AP < 800 mbar).]
 
**Fault Segment Statistical Values (Mean)**:
[Dynamic Injection: Specific mean values of various physical features during the fault segment.]
 
# Analysis Task
 
Please conduct **Root Cause Analysis (RCA)** and directly output the analysis results. Do not output your internal reasoning process or conditions.
 
**Important Instructions**:
- This analysis targets a continuous fault segment in the test data. **The SHAP ranking is dynamically calculated for this segment**, reflecting feature importance.
- Flight altitude is derived from AP; never report it as unknown.
- **You must use the judgment results from the knowledge base**: All numerical judgments have been pre-calculated. Use these directly; do not recalculate.
- You must combine the operating state (Full Power/Partial Load) and flight altitude for RCA.
- You must distinguish between symptoms and root causes: Target value features cannot be faults. If their contribution is high, explain why the system failed to reach them.
 
Please output strictly in the following format:
 
## Root Cause Conclusion
 
**Operational State Definition**:
- Power State: [Full Power State (TP = 100%)/Partial Load State]
- Flight Altitude: [Altitude value and state description]
 
**Root Cause Feature Locking** (Ranked by importance):
1. [Feature Name] - [Why it is the root cause, explained with operating state and altitude]
2. [Feature Name] - [Why it is the root cause, explained with operating state and altitude]

 
**Deep RCA**:
[Provide a detailed analysis of the fundamental cause. You MUST include:
1. Combine Full Power State: If TP = 100%, analyze why the system failed to meet the maximum power requirement.
2. Combine Altitude Factor: Explain how altitude affects the fault (If AP < 800 mBar, differentiate between physical altitude limits and actual faults).
3. Causal Chain Tracing: Trace from upstream features to downstream anomalies (e.g., TP -> EVOP -> MIAP -> MP).
4. Exclude Symptomatic Features: Explicitly state that target features are not root causes, but symptoms.]
 
**Fault Evolution Process**:
[Describe how the fault dynamically evolved:
1. Initial State: The fault began under [Operating State] and [Altitude].
2. Development Process: From the initial cause to final manifestations, combined with causal chains.
3. Final Manifestation: The ultimate presentation of the fault.]
 
---
**Crucial Reminders**:
- Do not output internal thought processes; output the conclusion directly.
- Target features cannot be faults. Find why the system failed to achieve them.
- Must combine operating state and altitude for RCA.
- Must trace causal chains to find the true root cause, focusing on Actual Value features and Control Variable features.
- Control Deviation features directly reflect tracking errors and are key indicators.
- Performance Indicator features reflect overall efficiency; anomalies indicate performance degradation.
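The "[Dynamic Injection]" slots of the template above are filled programmatically per fault segment. The following is a minimal sketch of that assembly step, including the conditional warning rule for target-value features at full power; the template fragment, field names, and function signature are illustrative, not the paper's implementation.

```python
# Illustrative fragment of the Appendix B template with injection slots.
TEMPLATE = (
    "## SHAP Analysis Results (Top 10 Features)\n{shap_table}\n\n"
    "**Operating Status Premise**: TP mean = {tp_mean:.1f}%, AP = {ap:.0f} mbar\n"
)

FULL_POWER_WARNING = (
    "Important Premise: target value features (e.g., TP) are not fault root "
    "causes; analyze why the full power request could not be met.\n"
)

def build_prompt(top10, tp_mean, ap):
    """Render one RCA prompt from (name, feature_type, shap_value) triples."""
    shap_table = "\n".join(
        f"{i + 1}. {name} ({ftype}): SHAP = {val:+.3f}"
        for i, (name, ftype, val) in enumerate(top10)
    )
    prompt = TEMPLATE.format(shap_table=shap_table, tp_mean=tp_mean, ap=ap)
    # Inject the warning rule only when a target-value feature ranks in the
    # Top 10 under full power, as the template specifies.
    if tp_mean >= 99.5 and any(ftype == "Target" for _, ftype, _ in top10):
        prompt += FULL_POWER_WARNING
    return prompt
```

The knowledge-base judgment results (Normal/Abnormal plus deviation values) would be injected into their respective slots in the same manner before the prompt is sent to the LLM.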

