1. Introduction
Aero-engines [1,2,3,4] are the core components of aircraft, and their operating conditions directly affect flight safety and efficiency. During operations such as takeoff, cruise, and landing, the engines encounter different aerodynamic loads, temperatures, and pressure environments, so their operating conditions are complex and variable. In-depth research on their operating characteristics under various conditions is of great significance for ensuring aviation safety and optimizing engine performance. To address the challenges of anomaly monitoring for aero-engines under different flight conditions, Sun et al. [5] proposed an evaluation method based on condition monitoring information. This method used kernel principal component analysis (KPCA) to construct the condition subspace of the aero-engine, thereby depicting the performance evolution process of the engine. For the same problem, Wang et al. [
6] developed an adaptive anomaly monitoring framework. Firstly, they used the mean and standard deviation method to preprocess the flight data, achieving the automatic division of complex flight scenarios; then, they constructed a monitoring model combining the Sparrow Search Algorithm (SSA) and Kmedoids algorithm to ensure adaptive monitoring in all flight conditions; finally, they designed specific indicators to precisely evaluate the abnormal risk via state monitoring. Through actual flight data verification, this method demonstrated higher detection accuracy and a lower false alarm rate (FAR) in the detection of sudden and progressive anomalies in aero-engines. Liu et al. [
7] proposed the Fourier graph network (OCFGNet) based on the representation of working condition characteristics, which maps the input flight sequence to a high-dimensional graph space. This space can integrate temporal and spatial dynamic features in a unified manner. The Fourier graph operator adopts a shared weight parameter mechanism, which can effectively balance information from adjacent nodes in different propagation sequences, fully capturing the spatio-temporal dependencies in the graph structure. Moreover, this operator also had high sensitivity to complex working conditions, achieving efficient attention while reducing computational complexity. Ding et al. [
8] constructed a refined time–frequency neural network (RTFNN)-interpretable model by designing a refined time–frequency convolution kernel (RTFCK) that embeds high-order phase operators, achieving efficient aggregation of fault feature information energy, significantly improving the interpretability and fault feature extraction ability of the model. Li et al. [
9] proposed the Spatio-Temporal Physical Field Intelligent Perception Method (SGM) based on diffusion generative models, providing a new approach for aero-engine combustion diagnosis and regulation, and helping to deeply understand the dynamic evolution of the complex internal physical processes of the engine under different operating conditions.
A Fourier transform infrared (FTIR) spectrometer detects the operating condition of an aero-engine by exploiting the absorption of infrared light by substances: it emits infrared light, lets the light interact with the sample, and recovers the spectrum from the measured signal through a Fourier transform, from which the composition is analyzed. Its core applications are the analysis of exhaust gas components to judge combustion efficiency and faults, the monitoring of lubricating oil condition to warn of wear and contamination, and the analysis of wear particle composition to help locate faults. This technology offers simultaneous detection of multiple components, rapid response, and non-destructive measurement. Wolak et al. [
10] compared the analytical results obtained from two devices in order to quickly assess the quality of lubricating oil using infrared spectroscopy. The assessment was based on the changes in selected physical and chemical properties of engine oil that occurred during actual operation. Changes in properties such as the degree of oxidation, nitration, and sulfonation, the carbon content, the total base number (TBN), and the additive percentage content were analyzed in terms of direction and intensity. Based on the obtained results, the statistical relationship between the two alternative devices was described. To achieve rapid determination of the gasoline induction period, Liu et al. [
11] innovatively proposed a new analytical method based on Fourier transform–attenuated total reflection infrared spectroscopy (ATR-FTIR), and constructed a dedicated analytical system integrating spectral measurement, data processing, display, and storage. This system deeply integrated the Fourier transform infrared spectrometer module with metrological software. The sample display accessory was particularly crucial, using a zinc selenide (ZnSe) 9-fold reflection ATR crystal coated with diamond film combined with a stainless steel cover with a sealing device. This not only ensured a constant optical path but also significantly improved the convenience of sample injection and cleaning. Yin et al. [
12], based on data from Fourier transform infrared (FTIR) spectroscopy, Spectral Oil Analysis (SOA), and other conventional methods, discussed the oil monitoring experiment for the propeller steering. The experiment showed that the FTIR spectroscopy method could obtain results quickly and easily through laboratory analysis, and combined with the oil analysis of the spectrometer, the complementary information was most effective for the condition monitoring of marine machinery. Mike et al. [
13] used infrared spectroscopy to analyze the antioxidant content and total acid value of synthetic turbine oil for aero-engines. Two-dimensional infrared correlation analysis was used to study and interpret the trends observed in the spectra, because acids form in the oil and antioxidant substances are depleted, which is a function of aging and engine wear. Principal component and partial least squares algorithms were used and compared to develop calibration and prediction models.
An autoencoder is an unsupervised learning model based on neural networks. Its core objective is to automatically learn a latent feature representation of the data through a compression–reconstruction process. Its structure is symmetrical, consisting of an encoder and a decoder, and it is commonly used for data dimensionality reduction, denoising, feature extraction, and generation tasks [14,15,16]. In the field of anomaly detection, the autoencoder, as a nonlinear dimensionality reduction method, demonstrates unique advantages. Sakurada et al. [
17] selected artificial data generated by the Lorenz system and real data from spacecraft telemetry as samples, processed them through the autoencoder for dimensionality reduction, and compared it with traditional linear principal component analysis (PCA) and kernel principal component analysis (kernel PCA) to deeply explore its performance characteristics. The autoencoder can sensitively capture subtle anomalies and successfully detect abnormal data points that linear methods cannot identify, significantly improving the sensitivity of anomaly detection. Additionally, by extending the autoencoder to a denoising autoencoder (Denoising Autoencoder), the model performance is further optimized, and the detection accuracy and robustness are improved. Compared with kernel principal component analysis, the autoencoder achieves nonlinear dimensionality reduction without the need for complex kernel function calculations, reducing computational complexity and improving algorithm efficiency. Gonzalez et al. [
18] employed an unsupervised variational autoencoder (VAE) to analyze a set of FTIR spectral data from multiple iron ore deposit reverse circulation (RC) drill core samples in the Pilbara region of Western Australia, in order to identify any potential anomalies. Yang et al. [
19] proposed a data-driven method based on FTIR data to predict the characteristics of crude oil. The autoencoder was used to learn a new representation form for the dimensionality reduction of FTIR data. The learned low-dimensional representation was input into SVR to predict the characteristics of crude oil.
The main contents of this work are as follows:
(1) A Fourier infrared spectrometer was utilized to conduct precise field measurements on the hot jet of a certain type of aero-engine, and the spectral data of the hot jet generated by the engine under different operating conditions was obtained.
(2) A dual-channel feature construction algorithm for spectral analysis is proposed. This algorithm consists of two branches: neighborhood integration and an autoencoder, which respectively extract the original spectral features and deep spectral features of the aero-engine hot jet, breaking through the limitations of traditional single-modal feature representation.
(3) A feature selection and fusion strategy based on physical significance is proposed. By designing a feature selection algorithm based on peak area and a cross-space feature fusion optimization mechanism, the accuracy and interpretability of clustering analysis were improved. This lays a solid data foundation for subsequent studies on hot jet characteristics comparison and fault diagnosis.
The remainder of this paper is organized into five parts.
Section 2 reviews the current state of research on the operating characteristics of aero-engines under different operating conditions and on feature extraction methods, and briefly introduces the method, contributions, and framework of this paper.
Section 3 introduces the field experiment design for the measurement of hot jet spectra of aero-engines and the structure and details of the dual-channel feature spectral analysis method.
Section 4 elaborates on our experimental content and results and conducts a detailed analysis of the experimental results.
Section 5 presents our discussion on the experimental results, analyzes the advantages and limitations of the research methods, and explores potential improvement directions.
Section 6 provides a systematic summary of the entire paper.
2. Related Work
Spectral feature extraction [20,21] is a crucial step in extracting useful information from spectral data, reducing data dimensionality, and improving the efficiency of subsequent analysis. Spectral feature extraction methods fall into two major categories: traditional methods and deep learning methods. Traditional methods either use the original spectra directly or extract features based on physical meaning. Original spectral features preserve the appearance of the raw data and offer low computational cost and strong interpretability, while statistical and physical features focus on key information to achieve effective dimensionality reduction. Both, however, are limited by high-dimensional redundancy and by a restricted ability to handle complex samples [22,23]. In the field of biomedical mass spectrometry analysis, the Morris research team proposed a method for extracting mass spectrometry data features by integrating the translation-invariant wavelet transform with peak detection on the average spectrum [
24]. This method achieved feature extraction and quantitative analysis through average spectra, enhanced signal processing capabilities using translation-invariant wavelet transform, and completed high-precision peak detection based on average spectra. The study systematically verified the effectiveness of this method through case analysis and simulation experiments, fully demonstrating the technical advantages of average spectra in peak detection. Additionally, the team innovatively constructed a computer mass spectrometry model based on physical mechanisms, providing a more theoretically supported algorithm framework for mass spectrometry analysis. He et al. [
21] used pulsed eddy current (PEC) technology to achieve non-contact, non-destructive monitoring of welding quality. They quantitatively detected porosity and crack defects in laser-welded aluminum alloy structures and built a detection system to acquire the PEC signals of different defects in laser weld seams. From these signals they calculated characteristic parameters such as the peak, peak time, fundamental amplitude, peak and rising curvature ratio, fundamental and third harmonic amplitude ratio, and marginal spectral peak, which quantitatively represent the type and size of laser welding defects. They then established a defect identification model based on a support vector machine (SVM) that takes these characteristic parameters as input to identify the type and depth of laser weld seam defects. Deep learning methods, with the help of models such as CNN [25,26], LSTM [27], and Transformer [28], can automatically learn spectral features, adaptively extract nonlinear patterns, denoise efficiently, and handle complex data. However, they have limitations such as high computational cost and strong data dependence.
Existing spectral feature extraction methods still have several key deficiencies. First, conventional methods mainly rely on peak positions and peak intensities, which makes it difficult to capture overall changes in spectral absorption and leaves them insensitive to subtle concentration differences under different operating conditions. Second, although some feature extraction methods based on deep learning achieve high classification accuracy, the features they learn have weak interpretability. Moreover, most existing algorithms cannot simultaneously retain local spectral information and mine deep nonlinear relationships, making it difficult to obtain robust and discriminative features from scarce, small-sample spectral data.
To overcome these shortcomings, this paper proposes a dual-branch feature extraction framework that integrates peak area calculation and a deep learning autoencoder. The peak area branch enhances the discrimination of concentration differences by integrating local spectral information and strengthens the quantitative expression of spectral changes. The autoencoder branch, on the other hand, adaptively extracts deep nonlinear features without the need for manual design. The combination of the two not only retains the clear physical interpretability brought by the peak area but also possesses the powerful feature expression ability of the autoencoder, thereby enabling more effective and reliable feature extraction for engine exhaust spectra under different operating conditions.
4. Experiments and Results
The overall experimental process of this paper is shown in Figure 6. The experiment is mainly divided into three parts: acquisition of hot jet spectral data of the aero-engine, feature extraction, and clustering.
4.1. Feature Construction Experiment
When processing the spectral data of an aero-engine’s hot jet acquired from outdoor field experiments, the model first preprocesses the raw spectral data. Specifically, normalization is applied to ensure the consistency and stability of the data, laying a foundation for accurate feature extraction in subsequent steps. Since the spectral resolution of the hot jet measurements is 1 cm−1, the half-width of the sliding window in the neighborhood integration branch is set to 2, giving a total window length of 5 points. This is sufficient to capture the complete shape of a local absorption peak without blurring features through an overly large window. The step size is set to 1, so that point-by-point sliding retains the local information of all wavenumber points. In essence, sliding window integration accumulates local energy: the neighborhood integral value at a given wavenumber point in the hot jet spectrum corresponds to the total radiative energy of molecular vibrational or rotational energy level transitions near that wavenumber. A higher integral value indicates a higher molecular concentration and temperature in that region, which is directly related to the physical state of the hot jet. The neighborhood integration branch integrates the brightness temperature information of the raw spectral data and extracts 15-dimensional key original spectral features from it. Meanwhile, the autoencoder branch is dedicated to extracting the deep spectral features of the aero-engine’s hot jet. Through three layers of dimensionality reduction, this branch compresses the high-dimensional spectral data into a 15-dimensional feature space, enabling more efficient subsequent analysis.
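To illustrate the autoencoder branch, the following is a minimal NumPy sketch of a symmetric autoencoder whose encoder reduces the input to a 15-dimensional code in three layers. The hidden sizes (64 and 32), the tanh activations, the learning rate, the full-batch gradient descent, and the synthetic input spectra are all assumptions made for illustration; they are not taken from the model described in this paper.

```python
import numpy as np

def init_layers(sizes, rng):
    """Create one (weights, bias) pair per pair of consecutive layer sizes."""
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Return the activations of all layers (tanh hidden layers, linear output)."""
    acts = [x]
    for k, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        acts.append(z if k == len(layers) - 1 else np.tanh(z))
    return acts

def train_autoencoder(x, code_dim=15, hidden=(64, 32), epochs=200, lr=0.1, seed=0):
    """Train a symmetric autoencoder by full-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    # Encoder: input -> 64 -> 32 -> 15 (three reductions); decoder mirrors it.
    sizes = [x.shape[1], *hidden, code_dim, *reversed(hidden), x.shape[1]]
    layers = init_layers(sizes, rng)
    losses = []
    for _ in range(epochs):
        acts = forward(x, layers)
        err = acts[-1] - x
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * err / err.size              # dLoss / d(output)
        for k in range(len(layers) - 1, -1, -1):
            W, b = layers[k]
            grad_W = acts[k].T @ grad
            grad_b = grad.sum(axis=0)
            grad = grad @ W.T                    # gradient w.r.t. acts[k]
            if k > 0:
                grad = grad * (1.0 - acts[k] ** 2)   # back through tanh
            layers[k] = (W - lr * grad_W, b - lr * grad_b)
    return layers, losses

def encode(x, layers, n_encoder_layers=3):
    """Return the code produced after the encoder layers."""
    return forward(x, layers)[n_encoder_layers]

# Synthetic stand-in for normalized spectra: 40 samples, 120 wavenumber points.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 120)
amps = rng.uniform(0.5, 1.0, size=(40, 1))
spectra = amps * np.exp(-((grid - 0.6) ** 2) / 0.005) + 0.01 * rng.normal(size=(40, 120))

layers, losses = train_autoencoder(spectra)
codes = encode(spectra, layers)   # deep spectral features, shape (40, 15)
```

In this sketch the 15-dimensional `codes` array plays the role of the deep spectral features; a practical implementation would more likely use a deep learning framework with mini-batches and a validation split.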
Under two distinct operating conditions, the spectral curves of the aero-engine hot jet exhibit a highly consistent variation trend, with significant consistency in their constituent substances. The primary difference between the two lies in the content ratio of each substance, a characteristic that provides a key basis for subsequent operating condition differentiation. Notably, the peak intensity of CO2 within the wavenumber range of 2387–2394 cm−1 is extremely high. This intense peak stands out prominently in the spectral graph and produces a significant masking effect: other characteristic peaks preceding the CO2 peak have relatively weak signal intensities, which are easily obscured when contrasted with the strong CO2 peak. This obscuration makes it challenging to accurately identify and analyze the substance information and associated characteristics represented by these weaker peaks. The neighborhood integration method demonstrates unique advantages in addressing this issue. Its implementation involves setting a local sliding window around each wavenumber point in the spectrum and summing the spectral intensity values within the window. Unlike methods that focus solely on the intensity of individual wavenumber points, this approach comprehensively considers the spectral information in the vicinity of each point. Specifically, it can integrate and amplify the information of the previously weak characteristic peaks that were masked by the strong CO2 peak. Through this integration process, even those characteristic peaks that were submerged by the strong peak (and previously difficult to distinguish) have their embedded information preserved and presented in the form of neighborhood integration values. 
Ultimately, this method enables the effective extraction of the weaker characteristic peaks that appear before the strong CO2 peak, providing a more robust means for the comprehensive and accurate analysis of the aero-engine hot jet’s spectral characteristics, and for identifying the various material components it contains.
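The sliding-window summation described above can be sketched as follows. This is a minimal NumPy illustration, assuming a half-width of 2 (window length 5) and a step of 1 as stated in the text; the function name `neighborhood_integral` and the zero-padded edge handling are illustrative assumptions, not details from the paper.

```python
import numpy as np

def neighborhood_integral(spectrum, half_width=2, step=1):
    """Sum the intensities in a window of 2*half_width+1 points around each point."""
    spectrum = np.asarray(spectrum, dtype=float)
    window = np.ones(2 * half_width + 1)
    # mode="same" keeps the output aligned with the input; the edges are
    # zero-padded, so windows at the boundaries cover fewer real points.
    integrated = np.convolve(spectrum, window, mode="same")
    return integrated[::step]

# Toy spectrum: a weak peak (index 3) next to a much stronger one (index 8).
toy = np.array([0.0, 0.1, 0.3, 0.5, 0.3, 0.1, 1.0, 5.0, 9.0, 5.0, 1.0, 0.0])
features = neighborhood_integral(toy)
```

With a 1 cm−1 point spacing, the plain sum is equivalent to a rectangle-rule integral over the window, so each output value approximates the local peak area around the corresponding wavenumber point.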
Based on the characteristic wavenumber point indices selected by the neighborhood branch module, the extracted original spectral features are presented in Figure 7. Under the two distinct operating conditions, the hot jet spectra exhibit significant characteristics at specific wavenumber positions. Specifically, a clear absorption peak is observed at the wavenumber of 2283 cm−1, a feature corresponding to compounds containing -C≡N. The presence of such cyanide-containing compounds confirms the existence of specific chemical components in the engine’s hot jet. Cyanides are relatively common in organic synthesis chemistry; their origin in the hot jet can be attributed to two main pathways: either as reaction products of certain nitrogen-containing organic components in the fuel during high-temperature combustion, or as products of further conversion of some combustion intermediates. Notably, the detection of cyanide-containing compounds holds important indicative value for evaluating three key aspects of the engine combustion process: the chemical reaction pathways involved, combustion efficiency, and pollutant generation. This information thus provides critical insights for in-depth analysis of the engine’s combustion performance and environmental impact.
A similar prominent spectral feature is also observed within the wavenumber range of 2387–2394 cm−1, which corresponds to CO2—one of the most common byproducts of combustion. During the combustion of an aero-engine’s fuel, the carbon–hydrogen components in the fuel react with oxygen in the air, generating large quantities of CO2. The absorption peaks appearing in this wavenumber range directly reflect the concentration and existing state of CO2 in the hot jet. By analyzing parameters such as the intensity of these absorption peaks, we can further assess the completeness of the engine’s combustion process. For instance, abnormal absorption peak intensity may indicate an oxygen-rich or oxygen-deficient condition during combustion—both of which can affect the engine’s performance and emission characteristics. Notably, combining the spectral characteristics of cyanide-containing compounds (at 2283 cm−1) with those of CO2 (at 2387–2394 cm−1) provides critical spectroscopic evidence for in-depth research into three core aspects of aero-engines under different operating conditions: combustion mechanisms, pollutant generation pathways, and emission patterns.
Figure 8 presents the t-SNE visualization of the feature extraction results obtained by the model. From this visualization, we can preliminarily conclude that the fused features are capable of distinguishing the hot jet spectral data of the aero-engine under different operating conditions to a certain extent. This observation confirms the effectiveness of the method that combines autoencoder-derived features with neighborhood integration-based features—specifically, it demonstrates that the method has successfully captured the discriminative information in the spectral data, which is critical for distinguishing between different operating conditions.
Figure 9 compares the 95% confidence intervals of the two types of samples at the extracted characteristic wavenumber points, showing the differences in average intensity and the statistical stability of the two types of samples at these points. The intensity trends of the two types of samples are highly synchronous: both reach a peak around 2391.2 cm−1 and then gradually decrease. This indicates that the acceleration treatment did not change the overall peak shape of the spectrum, but instead enhanced the overall absorption intensity.
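A per-wavenumber 95% confidence interval of the kind compared above could be computed as in the following sketch. It assumes a normal approximation (mean ± 1.96 standard errors); the paper does not state which interval construction was used, and the function name `mean_ci95` is hypothetical.

```python
import numpy as np

def mean_ci95(samples):
    """Return (mean, lower, upper) per feature for an (n_samples, n_features) array.

    Uses the normal approximation mean +/- 1.96 * standard error.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]
    mean = samples.mean(axis=0)
    sem = samples.std(axis=0, ddof=1) / np.sqrt(n)   # standard error of the mean
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width

# Example: three samples measured at two characteristic wavenumber points.
mean, lower, upper = mean_ci95([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

For small sample counts, a Student t quantile would give wider and more appropriate intervals than the fixed factor 1.96.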
Figure 10 shows the SHAP plots for the neighborhood branch and the autoencoder branch. The important features are mainly distributed within the range of 2283–2395 cm−1. The neighborhood branch pays more attention to the feature at 2392.1 cm−1, while the autoencoder branch focuses more on the feature at 2389.2 cm−1. This difference in focus not only supports the reliability of the 2283–2395 cm−1 range as a discriminative basis, but also reflects the complementarity of the two feature extraction strategies: the neighborhood branch focuses more on the local integral features of the spectrum, while the autoencoder branch focuses more on the abstract features encoded at a single point, providing interpretability support for the subsequent feature fusion.
4.2. Validation Experiments for the Construction of Features
After the proposed model extracts the key original spectral features and key deep spectral features from the FTIR spectral data of the aero-engine hot jet, an unsupervised clustering algorithm is employed to validate their effectiveness. The features of each sample are combined into a vector, yielding a multi-dimensional feature matrix $X = [x_1, x_2, \ldots, x_N]^{\top}$, where $N$ represents the number of samples and $x_i$ is the multi-dimensional feature vector of the $i$-th sample. The extracted multi-dimensional feature matrix is input into the Agglomerative Clustering model for clustering.
Agglomerative Clustering is a clustering algorithm based on the agglomerative hierarchical clustering method [33]. The core idea of this algorithm is to construct a clustering hierarchy through a bottom-up strategy [34]. The algorithm starts at the finest granularity, treating each data point in the dataset as an independent initial cluster, and then repeatedly merges the most similar (closest) clusters according to a pre-defined similarity criterion until a preset stopping condition is met, such as a specified number of clusters, a maximum distance threshold between clusters, or a homogeneity standard. In practical applications, similarity can be measured in various ways; common choices include Euclidean distance (measuring spatial geometric distance) and cosine similarity (evaluating the consistency of vector directions). In this study, we adopted the variance minimization-based method, commonly known as Ward linkage, to measure inter-cluster distance. This method minimizes the dispersion of data points within each cluster, thereby enhancing the compactness (i.e., tightness of data distribution within clusters) and homogeneity (i.e., consistency of data attributes within clusters) of the final clustering results.
Step 1: Initially, each sample forms its own cluster. The cost of merging two clusters $C_i$ and $C_j$ is calculated from their centroids and sample counts:

$$\Delta(C_i, C_j) = \frac{n_i n_j}{n_i + n_j} \, \lVert \mu_i - \mu_j \rVert^2$$

where $n_i$ and $n_j$ represent the numbers of samples in the two clusters, $\mu_i$ and $\mu_j$ represent the cluster centroids, and $\lVert \cdot \rVert$ represents the Euclidean distance. The sample-count weighting balances the influence of large clusters during merging, while the squared centroid distance characterizes the difference between clusters. The smaller the value, the more compact the data distribution after the two clusters are merged, indicating a better clustering effect.

Step 2: After calculating the merging cost for all cluster pairs, select the pair $(C_p, C_q)$ with the lowest cost for merging and generate a new cluster $C_{new} = C_p \cup C_q$. The new centroid is weighted by the sample sizes of the original clusters so that its position reasonably reflects the distribution of the merged data:

$$\mu_{new} = \frac{n_p \mu_p + n_q \mu_q}{n_p + n_q}$$

Step 3: Repeat Step 2, recalculating the merging costs between the remaining clusters in each iteration. After each iteration, the number of clusters decreases by 1, and the clustering structure is adjusted accordingly. When the number of clusters drops to the preset value, the algorithm terminates.
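The three steps above can be sketched directly in NumPy. This is a naive illustration of Ward-style agglomerative merging under the standard merging cost and weighted centroid update; it is not the implementation used in the experiments, and the function name `ward_agglomerative` is hypothetical. It recomputes all pairwise costs in every iteration, so it is only suitable for small sample counts.

```python
import numpy as np

def ward_agglomerative(X, n_clusters=2):
    """Bottom-up clustering that always merges the pair with the lowest Ward cost."""
    X = np.asarray(X, dtype=float)
    # Step 1: every sample starts as its own cluster.
    members = {i: [i] for i in range(len(X))}
    centroids = {i: X[i].copy() for i in range(len(X))}
    while len(members) > n_clusters:
        best = None
        keys = list(members)
        for a in range(len(keys)):
            for b in range(a + 1, len(keys)):
                p, q = keys[a], keys[b]
                n_p, n_q = len(members[p]), len(members[q])
                # Merging cost: size-weighted squared centroid distance.
                diff = centroids[p] - centroids[q]
                cost = n_p * n_q / (n_p + n_q) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, p, q)
        # Step 2: merge the cheapest pair; new centroid is the weighted mean.
        _, p, q = best
        n_p, n_q = len(members[p]), len(members[q])
        centroids[p] = (n_p * centroids[p] + n_q * centroids[q]) / (n_p + n_q)
        members[p] = members[p] + members[q]
        del members[q], centroids[q]
        # Step 3: the loop repeats until n_clusters clusters remain.
    labels = np.empty(len(X), dtype=int)
    for label, idx in enumerate(members.values()):
        labels[idx] = label
    return labels

# Two well-separated groups of three points each.
demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
demo_labels = ward_agglomerative(demo, n_clusters=2)
```

Library implementations of Ward linkage use more efficient distance-update schemes, but should merge clusters in the same order on data like this.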
GMM [35] and k-means [36] are also commonly used clustering algorithms. During the experiment, the feature vectors extracted by the proposed model were input into three clustering models: Agglomerative Clustering, Gaussian Mixture Model (GMM), and k-means. The clustering results were visualized with t-SNE (t-distributed Stochastic Neighbor Embedding), which converts the sample distance relationships in the high-dimensional feature space into a probability distribution in a low-dimensional space while preserving the local neighborhood structure of the original data as far as possible. As a result, the relative proximity of samples in the low-dimensional visualization remains largely consistent with their relationships in the high-dimensional space. The t-SNE visualization of the clustering results is presented in Figure 11.
We adopted Accuracy [37,38], Precision [39,40], Recall [41,42], and F1 score [43,44] as evaluation metrics. Accuracy is the ratio of correctly predicted samples (both true positives and true negatives) to the total number of samples. Precision is the proportion of actually positive samples among all samples predicted as positive. Recall is the proportion of correctly predicted positive samples relative to the total number of actual positive samples in the dataset. The F1 score is the harmonic mean of Precision and Recall and is widely used to evaluate binary classification models. Because the harmonic mean is dominated by the smaller of the two values, it prevents a model from masking poor performance through bias toward one type of prediction. The higher the F1 score, the better the model balances accurately identifying positive examples against missing positive examples.
Table 1 shows the results of each clustering algorithm. Agglomerative Clustering achieves the best Accuracy, Precision, and F1 score, while k-means and GMM perform relatively weakly. Figure 12 shows that the k-means algorithm performs better on the Recall metric.
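The four metrics are computed as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP represents true positive cases (where the prediction is correct and the actual outcome is positive); TN represents true negative cases (where the prediction is correct and the actual outcome is negative); FP represents false positive cases (where the prediction is incorrect and the actual outcome is negative); and FN represents false negative cases (where the prediction is incorrect and the actual outcome is positive).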
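The four metrics follow directly from the TP, TN, FP, and FN counts, as in the following NumPy sketch. Because cluster indices produced by an unsupervised algorithm are arbitrary, the sketch also includes a hypothetical helper, `align_cluster_labels`, that picks the two-cluster labeling agreeing best with the reference labels; how clusters were matched to operating conditions in the experiments is not stated, so this step is an assumption.

```python
import numpy as np

def align_cluster_labels(y_true, clusters):
    """For two clusters, return the 0/1 labeling that best matches y_true."""
    y_true = np.asarray(y_true)
    clusters = np.asarray(clusters)
    flipped = 1 - clusters
    keep = np.mean(clusters == y_true) >= np.mean(flipped == y_true)
    return clusters if keep else flipped

def binary_metrics(y_true, y_pred):
    """Compute Accuracy, Precision, Recall, and F1 from binary labels (1 = positive)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example: cluster ids are inverted relative to the reference labels.
reference = np.array([1, 1, 1, 0, 0, 0])
raw_clusters = np.array([0, 0, 1, 1, 1, 0])
scores = binary_metrics(reference, align_cluster_labels(reference, raw_clusters))
```

Here two of the three positive samples and two of the three negative samples are recovered after alignment, so all four metrics equal 2/3.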
5. Discussion
This study focused on extracting the hot jet characteristics of an aero-engine under different operating conditions. We conducted the research using an on-site measurement method. To ensure data comparability, the experimental conditions strictly controlled the measurement environmental parameters (temperature, humidity, etc.) and geometric conditions (measurement distance, angle, etc.). A Fourier Transform Infrared spectrometer was used to collect spectral data from the central area of the engine’s tail nozzle. Notably, the temperature of the engine’s tail nozzle can reach 1500–2000 K, posing significant safety risks for direct measurement. In the on-site experimental environment, to safeguard the safety of measurement personnel and ensure the stable operation of the spectrometer, the instrument was deployed on the side of the engine. A measurement distance of 127–280 m was maintained between the hot jet source and the spectrometer [
45]. While this setup of measurement angle and distance ensures operational safety, it inevitably leads to attenuation of the hot jet radiation signal. Specifically, during the transmission of the spectral signal through the atmospheric medium in the measurement path, the atmosphere exerts absorption and attenuation effects on the signal. However, in the described on-site data collection, the experimental distance was relatively short, and the temperature difference between the hot jet mixed gas and the background environment was approximately 300–400 °C. Given these conditions, the influences of the atmosphere and ambient factors are temporarily neglected in the analysis.
In the data analysis stage of the hot jet spectrum of the aero-engine, this paper focuses on the radiation brightness temperature spectrum in the 400–4000 cm−1 characteristic band. This band covers the characteristic absorption peaks of various key combustion products and contains abundant information about the combustion reaction, making it an important data window for studying the combustion process. To explore the characteristic information contained in the spectral data, the NAIDN feature extraction algorithm was designed in this paper. This algorithm adopts a dual-branch architecture that captures the global features and local details of the spectral data through different branches, achieving hierarchical extraction of spectral features. One branch uses sliding-window neighborhood integration to capture the local information around the characteristic absorption peaks in the spectrum; the other branch uses the autoencoder architecture to learn a low-dimensional embedding of the spectral data and extract its intrinsic structural features. The collaborative work of the two branches supports comprehensive extraction of complex features in the spectral data. As shown in
Figure 10, the SHAP value distribution of the neighborhood branch is wider and has larger absolute values (up to ±0.075), indicating a stronger influence on the model’s decisions. Traditional spectral feature extraction methods mostly rely on peak intensity, whereas other neural network architectures, although achieving high classification accuracy, yield features with weak interpretability. This study addresses spectral feature extraction for a single engine model under different operating conditions. Since the exhaust gas components of the same engine model are consistent, the differences lie mainly in the concentrations of the species. To capture these concentration differences more accurately, this study uses peak area as the core feature and integrates over the wavenumber neighborhood to amplify the concentration differences between operating conditions. In the experiment, we selected the half-width and step size of the sliding window so as to retain the local information of all wavenumber points while enhancing the sensitivity of the features to concentration changes.
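The neighborhood peak-area idea described above can be illustrated with a minimal sketch. It assumes a spectrum sampled on a wavenumber grid and integrates it over a sliding window with the trapezoidal rule. The function name `neighborhood_peak_area`, the window parameters, and the synthetic Gaussian band are hypothetical choices made for illustration only; they are not the settings or the implementation used in this study.

```python
import numpy as np

def neighborhood_peak_area(wavenumber, spectrum, half_width, step):
    """Integrate a spectrum over a sliding wavenumber window.

    half_width and step are expressed in samples. Both are illustrative
    parameters, not the values used in the study.
    """
    wavenumber = np.asarray(wavenumber, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    n = wavenumber.size
    centers = np.arange(0, n, step)
    areas = np.empty(centers.size)
    for k, c in enumerate(centers):
        # Clip the window at the edges of the band.
        lo = max(c - half_width, 0)
        hi = min(c + half_width, n - 1)
        x = wavenumber[lo:hi + 1]
        y = spectrum[lo:hi + 1]
        # Trapezoidal rule written out explicitly.
        areas[k] = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return wavenumber[centers], areas

# Synthetic example: one Gaussian band whose amplitude stands in for
# the concentration of a species (assumed values, for illustration).
nu = np.linspace(400.0, 4000.0, 3601)  # 1 cm^-1 spacing

def synthetic_band(amplitude):
    return amplitude * np.exp(-0.5 * ((nu - 2350.0) / 10.0) ** 2)

centers, low = neighborhood_peak_area(nu, synthetic_band(1.0), half_width=20, step=1)
_, high = neighborhood_peak_area(nu, synthetic_band(1.5), half_width=20, step=1)
```

With `step=1`, every wavenumber point keeps its own feature value. Because the area feature is linear in the band amplitude, the absolute gap between the two synthetic conditions is larger in the area feature than in the single-point peak intensity.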
After feature extraction, the Agglomerative Clustering algorithm is used to verify the extracted features. This algorithm performs bottom-up hierarchical clustering, progressively merging similar spectral features, and thereby separates the spectral data by operating condition. The clustering results show that the spectral signatures of cyanide compounds and carbon dioxide in the hot jet of the aero-engine are particularly prominent across operating conditions. Further analysis suggests that this difference may result from the combined effect of combustion thermodynamic conditions and chemical reaction kinetic pathways under different operating conditions. Specifically, thermodynamic parameters such as the air–fuel ratio, temperature, and pressure directly affect the oxidation of the fuel. The prominence of the cyanide compounds reflects the degree of fuel pyrolysis and incomplete oxidation, and their concentration changes are closely related to combustion efficiency. Carbon dioxide is the main combustion product, but the enhancement of its spectral signal depends not only on the amount generated but also on subsequent secondary reactions and physical transport processes. For example, the reverse water–gas shift reaction under high-temperature conditions causes dynamic changes in the carbon dioxide concentration, and the mixing of the hot jet with the surrounding environment also changes its spatial distribution. These findings suggest new directions for engine combustion optimization.
Adjusting the fuel injection strategy, for example by adopting stratified combustion technology, can produce a gradient distribution of fuel and enable fine control of the combustion process. Optimizing the airflow organization of the combustion chamber to improve the mixing of fuel and air can reduce the emission of incomplete combustion products and the intensity of their characteristic spectral signals, thereby supporting efficient and clean combustion of the aero-engine.
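The bottom-up clustering used earlier to verify the extracted features can be illustrated with a minimal sketch. It assumes average linkage on Euclidean distances and uses synthetic two-dimensional feature vectors for three hypothetical operating conditions. The linkage criterion, the function name `agglomerative_labels`, and the data are assumptions made for illustration; the study does not state these details. In practice, a library implementation such as scikit-learn's `AgglomerativeClustering` would normally be preferred.

```python
import numpy as np

def agglomerative_labels(features, n_clusters):
    """Bottom-up average-linkage clustering on Euclidean distances."""
    features = np.asarray(features, dtype=float)
    # Start with every sample in its own cluster.
    clusters = [[i] for i in range(len(features))]
    # Pairwise Euclidean distances between samples.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair; b > a, so index a stays valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(features), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Synthetic feature vectors for three hypothetical operating conditions,
# ten spectra each (assumed values, not measured data).
rng = np.random.default_rng(0)
condition_means = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
features = np.vstack(
    [m + 0.3 * rng.standard_normal((10, 2)) for m in condition_means]
)
labels = agglomerative_labels(features, n_clusters=3)
```

If the extracted features separate the operating conditions well, each condition ends up in a single cluster, which is the sense in which clustering serves as a check on feature quality. The double loop is quadratic in the number of clusters at each merge, which is acceptable for a small illustration but not for large data sets.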
This study extracted multiple key features from the FTIR spectral data of an aero-engine under different operating conditions, including both original spectral features and deep spectral features, and employed a feature selection and fusion strategy based on physical significance. This cross-space feature fusion retains the physical interpretability of spectral analysis while also exploring the inherent patterns of the data through deep learning. It is particularly suitable for complex hot jet scenarios influenced by multiple parameters, such as those of aero-engines, and provides a more reliable feature basis for combustion state assessment and for analyzing pollutant generation mechanisms. The current study is limited by the sample size and by the range of engine models and operating conditions that the samples cover. If hot jet multispectral data can be obtained for all operating conditions of multiple aero-engines (covering small and large fan ratios, different models, and typical operating conditions such as sea-level takeoff, cruise, and high-altitude climb), we will apply the existing feature extraction methods to explore further data features, with the aim of supporting the optimization design and intelligent diagnosis of aero-engines.