Aerodynamic Prediction and Optimization of Compressor Stators Based on Deep Learning

Zheng, Jiang; Yao, Mingming; Zhan, Kai; Lu, Qingfei

doi:10.3390/app16105062

Open Access Article


by Jiang Zheng, Mingming Yao, Kai Zhan and Qingfei Lu *

School of Aerospace, Xihua University, Chengdu 610039, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(10), 5062; https://doi.org/10.3390/app16105062

Submission received: 23 April 2026 / Revised: 14 May 2026 / Accepted: 17 May 2026 / Published: 19 May 2026

(This article belongs to the Section Fluid Science and Technology)


Abstract

The aerodynamic performance of compressor stators critically affects aircraft engine efficiency, yet traditional CFD-based evaluation and optimization suffer from high computational cost. To address this, the study develops deep learning surrogate models that predict the total pressure loss coefficient and the outlet flow angle deviation of compressor stator vanes from two geometric parameters, the stagger angle β_{y} and the leading-edge radius ratio R_rle, and one operational parameter, the attack angle α. A high-fidelity dataset of 1701 cases was generated via automated CFD simulations using the transition SST k-ω model. Among the evaluated models (standard CNN, CBAM-CNN, SS-CNN, and CNN-Transformer), SS-CNN achieved the highest accuracy, reducing the mean absolute percentage error from 3.56% to 2.03% for the loss coefficient and from 1.49% to 1.11% for the outlet angle, with substantial computational savings. These surrogate models were integrated into a multi-objective optimization framework. The optimized vane, which features a reduced leading-edge radius ratio within a stable stagger range, reduced the total pressure loss by 2.38% (from 0.0570 to 0.0556) at the design attack angle of −2.83°, while the outlet angle deviation decreased from 0.439° to 0.066° (an 85% reduction), with the outlet angle improvement concentrated near the design condition. This work demonstrates a systematic, data-driven pipeline combining parametric modeling, automated simulation, deep learning-based prediction, and rapid optimization, offering an efficient solution for intelligent compressor blade design.

Keywords:

stator blade; total pressure loss coefficient; outlet flow angle deviation; deep learning; multi-objective optimization

1. Introduction

Compressor stator vanes are critical components of modern aero-engines, whose aerodynamic performance directly affects overall efficiency and reliability. Accurate prediction and optimization of stator vane performance are therefore essential. Traditionally, these tasks rely on computational fluid dynamics (CFD) or experimental measurements. Although CFD offers high fidelity, it demands substantial computational resources and long turnaround times, which hinders rapid design iterations in engineering practice.

In recent years, data-driven methods, particularly deep learning, have shown great potential for solving complex fluid dynamics problems. Unlike physics-based modeling, deep learning extracts flow regularities from historical data, greatly accelerating predictions while maintaining acceptable accuracy. This capability has attracted growing attention in turbomachinery design. For performance prediction, Ma et al. [1] applied U-Net and 1D-CNN models to rapidly reconstruct the cascade flow field on the compressor S1 stream surface. Using design parameters as inputs, U-Net predicted the aerodynamic field distribution with an average relative error below 1%, while 1D-CNN predicted blade-surface aerodynamic parameters and flow field coefficients with errors below 1% for the pressure recovery coefficient and below 2% for the total pressure loss coefficient. Ramzanizadeh et al. [2] coupled four optimization algorithms (GA, PSO, HGAPSO, and ICA) with a least squares support vector machine (LSSVM) to model the dynamic viscosity of Al₂O₃/water nanofluid as a function of nanoparticle size, temperature, and concentration. Their approach achieved high predictive accuracy, with correlation coefficients up to 0.9871 (GA-LSSVM) and MSE values as low as 0.00854 (HGAPSO-LSSVM). Pakatchian et al. [3] called for more comprehensive axial compressor models that incorporate additional inputs such as hub/tip dimensions, chord, camber, and blade profile type.

More recent studies have applied deep neural networks directly to flow field modeling and prediction. Santos et al. [4] used a convolutional neural network (CNN) to predict porous media flow fields in under a second without numerical simulation. Sekar et al. [5] designed a modified vision transformer-based encoder–decoder network to predict transonic flow over supercritical airfoils. They introduced multilevel wavelet transformation and gradient distribution losses into the loss function, reducing the maximum error near shock regions by 50%, and improved generalizability through transfer learning (pretraining on large-scale datasets followed by fine-tuning); the model achieved a mean absolute error on the order of 1 × 10⁻⁴. Esfahanian et al. [6] optimized a two-stage axial turbine with 112 geometric parameters using a CNN, achieving a 0.83% efficiency gain and a 1.02% power increase. Niu et al. [7,8] applied random forest, multilayer perceptron (MLP), 1D-CNN, and long short-term memory networks to predict blade aerodynamic characteristics, obtaining excellent agreement with CFD results. They further integrated CNNs, physics-informed neural networks, and deep reinforcement learning to improve the lift-to-drag ratio. Wang et al. [9] combined a CNN with NSGA-II to optimize blade geometry, maximizing the power coefficient and the efficient operating range. Du et al. [10,11,12] combined numerical simulation with MLP and CNN models to predict the aerodynamic parameters of compressor double-circular-arc blades, achieving pressure coefficient errors below 0.2% and total pressure loss coefficient errors below 1.2%, and demonstrating the superiority of CNNs over fully connected networks. Bruni et al. [13] developed a modified U-Net (the C(NN)FD model) that predicted compressor tip clearance flow fields with errors below 0.01%. Wang et al. [14] established a Bezier-GAN for turbine blade profile generation and a dual-CNN for pressure/temperature field reconstruction, reducing single-blade off-design analysis time from 38.4 h to 7.68 s. Huang, Xu et al. [15,16,17] applied deep deterministic policy gradient (DDPG) and its variants combined with genetic algorithms to optimize transonic rotors, achieving a 1.01% pressure ratio increase and demonstrating that reinforcement learning can be integrated with other methods for better design.

Despite these advances, most existing studies focus on rotors or turbine blades. Systematic investigations of compressor stator vanes, especially of the coupled effects of key geometric and operational parameters such as attack angle, stagger angle, and leading-edge radius ratio, remain limited. Moreover, an end-to-end framework that seamlessly integrates parametric modeling, automated CFD simulation, deep learning-based surrogate modeling, and multi-objective optimization for stator vane design is still lacking.

To address this gap, this paper develops an integrated design methodology for compressor stator vanes and systematically evaluates several attention-enhanced CNN architectures for stator performance prediction. The main contributions are fourfold:

(1) establishing a parametric modeling and automated CFD simulation platform, and using it to generate a systematically sampled dataset of 1701 stator vane configurations spanning a wide range of geometric and operational conditions; (2) comparing multiple surrogate models (standard CNN, CBAM-CNN, SS-CNN, and CNN-Transformer) for predicting total pressure loss coefficient and outlet flow angle deviation, evaluating both prediction accuracy (RMSE, MAE, R²) and computational cost; (3) performing multi-objective optimization using NSGA-II with the best-performing surrogate to reduce total pressure loss while meeting outlet angle targets; and (4) validating the optimized vane geometry via high-fidelity CFD and analyzing the underlying aerodynamic improvement mechanisms. The proposed framework demonstrates a viable path toward efficient, intelligent stator vane design.

2. Parametric Modeling and Automated Simulation Platform

Accurate and efficient parametric geometric modeling and numerical simulation are the cornerstones of intelligent aerodynamic blade optimization. This section constructs a parametric modeling and automated simulation platform that provides high-quality, large-scale sample data for training the subsequent data-driven surrogate models.

Drawing on the core idea of Pritchard’s eleven-parameter axial turbine blade geometry model (Figure 1, left) [18], this study independently developed a corresponding nine-parameter double-circular-arc blade generation program. The program takes key parameters, such as axial chord length, stagger angle, inlet/outlet blade angles, leading-edge/trailing-edge radius ratios, and pitch, as inputs; it automatically calculates the centers and tangent points of the leading-edge and trailing-edge arcs and uses cubic spline functions to connect them into complete pressure and suction surfaces. By modifying the parameters, the blade generated with this method (Figure 1, right) can flexibly adjust its camber and thickness distribution, providing a foundation for systematically studying the influence of geometric parameters on aerodynamic performance.
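The idea of joining two tangent points with a smooth curve whose end directions are prescribed can be illustrated with a short sketch. It uses a single cubic Hermite segment per blade surface as a simpler stand-in for the paper's cubic spline; the function name `hermite_surface`, the end-point coordinates, and the surface angles are hypothetical and are not taken from the authors' program.

```python
import math

def hermite_surface(p0, p1, angle0_deg, angle1_deg, n=50):
    """Cubic Hermite curve from p0 to p1 whose end tangents follow the given angles.

    p0, p1     : (x, y) tangent points on the leading- and trailing-edge circles
    angle*_deg : surface angle at each end, measured from the x (axial) axis
    Returns n + 1 points along the surface (hypothetical stand-in for a cubic spline).
    """
    # Scale both tangent vectors by the distance between the end points so the
    # curve bends smoothly without large overshoot.
    length = math.dist(p0, p1)
    a0, a1 = math.radians(angle0_deg), math.radians(angle1_deg)
    m0 = (length * math.cos(a0), length * math.sin(a0))
    m1 = (length * math.cos(a1), length * math.sin(a1))

    points = []
    for i in range(n + 1):
        t = i / n
        # Standard cubic Hermite basis functions.
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        x = h00 * p0[0] + h10 * m0[0] + h01 * p1[0] + h11 * m1[0]
        y = h00 * p0[1] + h10 * m0[1] + h01 * p1[1] + h11 * m1[1]
        points.append((x, y))
    return points

# Hypothetical suction-surface end points (mm) and surface angles (degrees).
suction = hermite_surface((0.0, 0.0), (30.0, 14.0), 48.0, 10.0)
print(len(suction), suction[0], suction[-1])
```

The curve passes exactly through both tangent points and leaves or arrives along the prescribed angles, which is the property the leading- and trailing-edge circle construction relies on to join the arcs without a kink.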

To fully describe the two-dimensional geometry of a compressor stator vane (Figure 2, left), a set of physically meaningful design variables is selected: axial chord length C_{x}, stagger angle β_{y}, inlet/outlet blade angles β_{b_in}/β_{b_out}, inlet/outlet half-wedge angles β_{w_in}/β_{w_out}, leading-edge radius ratio R_rle (leading-edge radius divided by axial chord length), and trailing-edge radius ratio R_tle (trailing-edge radius divided by axial chord length). By adjusting these parameters, the camber, thickness distribution, and leading/trailing-edge characteristics can be flexibly controlled over a wide range.

The leading-edge shape governs how the blade responds to the attack angle and how the blade-surface boundary layer develops, making it a very important geometric parameter. The stagger angle directly influences the attack angle of the incoming flow and thus changes the flow field structure. The chord length affects blade solidity and flow turning capability and is also a necessary geometric parameter. For a fixed axial chord length, the stagger angle and the tangential chord are directly related, so only one of them needs to be varied. Therefore, this study selects the leading-edge radius ratio R_rle, the stagger angle β_{y}, and the attack angle α as variables and keeps all other blade-generation parameters fixed. The fixed blade parameters used in the experiments are listed in Table 1.
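The link between stagger angle, tangential chord, and solidity at fixed axial chord can be made concrete with a small calculation. This sketch assumes a straight chord line with the stagger angle measured from the axial direction, so that the tangential chord is C_x·tan β_{y} and the true chord is C_x / cos β_{y}; the axial chord (30 mm) and pitch (24.9088 mm) are the fixed values given for this study, while the stagger angles in the loop are hypothetical, and the paper's own angle convention may differ.

```python
import math

AXIAL_CHORD_MM = 30.0   # fixed axial chord C_x used in this study
PITCH_MM = 24.9088      # fixed blade pitch used in this study

def chord_geometry(stagger_deg, axial_chord=AXIAL_CHORD_MM, pitch=PITCH_MM):
    """Return (tangential chord, true chord, solidity) for a given stagger angle.

    Assumes the stagger angle is measured from the axial direction.
    """
    beta = math.radians(stagger_deg)
    tangential_chord = axial_chord * math.tan(beta)
    chord = axial_chord / math.cos(beta)
    solidity = chord / pitch
    return tangential_chord, chord, solidity

for stagger in (15.0, 20.0, 25.0):   # hypothetical stagger angles
    ct, c, sigma = chord_geometry(stagger)
    print(f"stagger={stagger:4.1f} deg  C_t={ct:6.2f} mm  c={c:6.2f} mm  solidity={sigma:5.3f}")
```

Because C_x is held constant, specifying β_{y} fixes both the tangential chord and the solidity, which is why the tangential chord is not treated as an independent variable.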

For platform development, this study uses Python 3.10 to integrate the parametric program with commercial software, establishing an automated simulation workflow that seamlessly links parameter input, geometric modeling, meshing, CFD solving, and result extraction, greatly improving data generation efficiency. Using this platform, samples covering the design space are generated through batch simulations, laying a solid data foundation for subsequent research.
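The batch workflow can be outlined as a loop over a full-factorial grid of the three variables. In the sketch below, every stage function (`build_geometry`, `mesh_and_solve`) is a hypothetical placeholder for the calls into the parametric program and the commercial meshing/CFD software, which the paper does not describe at the API level; the solver stub returns synthetic numbers only so that the control flow runs. The residual check mirrors the 10⁻⁶ convergence criterion described in the next paragraph, and the grid levels are hypothetical.

```python
import itertools
from dataclasses import dataclass

@dataclass
class CaseResult:
    attack_deg: float
    stagger_deg: float
    r_rle: float
    loss_coeff: float
    outlet_angle_deg: float
    converged: bool

def build_geometry(stagger_deg, r_rle):
    """Hypothetical stand-in for the parametric blade generator."""
    return {"stagger_deg": stagger_deg, "r_rle": r_rle}

def mesh_and_solve(geometry, attack_deg):
    """Hypothetical stand-in for meshing plus a CFD run.

    Returns final scaled residuals and SYNTHETIC outputs so the loop is runnable.
    """
    residuals = {"continuity": 5e-7, "x-momentum": 3e-7, "y-momentum": 3e-7,
                 "k": 1e-7, "omega": 2e-7, "intermittency": 1e-7, "retheta": 1e-7}
    loss = 0.05 + 1e-4 * attack_deg**2                       # synthetic placeholder
    outlet = 17.0 + 0.1 * attack_deg - 0.05 * (geometry["stagger_deg"] - 22.0)  # synthetic
    return residuals, loss, outlet

def is_converged(residuals, tol=1e-6):
    """All monitored scaled residuals must fall below the tolerance."""
    return all(value < tol for value in residuals.values())

def run_batch(attack_levels, stagger_levels, r_rle_levels):
    """Run every combination of the three variables and collect the results."""
    results = []
    for attack, stagger, r_rle in itertools.product(attack_levels, stagger_levels, r_rle_levels):
        geometry = build_geometry(stagger, r_rle)
        residuals, loss, outlet = mesh_and_solve(geometry, attack)
        results.append(CaseResult(attack, stagger, r_rle, loss, outlet, is_converged(residuals)))
    return results

# Hypothetical 3 x 3 x 3 grid (the actual sampling levels are not given in this section).
cases = run_batch([-4.0, -2.0, 0.0], [20.0, 22.0, 24.0], [0.01, 0.02, 0.03])
print(len(cases), sum(c.converged for c in cases))
```

Keeping the convergence flag with each case makes it easy to exclude unconverged runs before they enter the training set.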

All CFD simulations were performed using ANSYS Fluent 2024R2. A velocity inlet and a pressure outlet were specified as the boundary conditions. Convergence was assessed by monitoring the scaled residuals of continuity, momentum, and turbulence quantities, which were required to drop below 10⁻⁶.

A mesh independence study was conducted using five unstructured grids ranging from approximately 0.56 million to 1.34 million cells. The near-wall mesh was refined to maintain y⁺ ≈ 1 on all blade surfaces, satisfying the requirement of the Transition SST model. The minimum orthogonal quality exceeded 0.4 across all grids (Figure 2 right). Table 2 reports the variation in the predicted total pressure loss coefficient with grid resolution for the baseline case (α = 0.27°). The 0.66 million-cell grid was adopted for all production simulations as a balance between accuracy and computational cost.

To select an appropriate turbulence modeling approach, four widely used RANS closures—standard k-ε, Spalart–Allmaras (SA), SST k-ω, and the Transition SST model—were compared against the available experimental data. As shown in Table 3, the Transition SST model provides the best overall agreement with the experimental measurements for both the outlet flow angle and the total pressure loss coefficient. All model parameters were retained at their default Fluent settings. This model was therefore adopted for the entire dataset generation.

To validate the numerical method, steady simulations are performed under two inlet conditions and compared with experimental measurements under the same conditions (Table 4): an inlet flow angle of 45° (attack angle 0.27°) and the design inlet flow angle of 42.5° (attack angle −2.83°). The comparison metrics include outlet flow angle, outlet Mach number, and total pressure loss coefficient, supplemented by Mach number, static pressure, and total pressure contours to analyze the flow structure and loss sources (Figure 3, Figure 4 and Figure 5). The blade parameters and the experimental and simulation data for the two conditions are listed in Table 4.

From the Mach number contours (Figure 3), the main flow region remains in the subsonic range under both conditions. Near the blade surfaces, a low-Mach streak extending along the flow direction can be observed, corresponding to the development of the near-wall boundary layer, which evolves into a clear wake velocity deficit zone after the trailing edge—this is the main source of velocity non-uniformity in the passage. Comparing the two conditions, the 45° condition exhibits a wider wake zone with a more pronounced velocity deficit, indicating a thicker boundary layer or a stronger tendency toward flow separation. In contrast, the 42.5° condition shows a narrower low-speed zone and smoother Mach contours, suggesting more stable near-wall flow and better velocity field uniformity, characteristic of operation closer to the design condition.

Quantitatively, for the 45° condition, the experimental outlet Mach number is 0.4562 and the simulated value is 0.4450, a relative error of about 2.5%, showing that the simulation accurately predicts the outlet velocity level and reasonably captures the mainstream kinetic energy distribution. For the 42.5° condition, the experimental outlet Mach number is 0.4718 and the simulated value is 0.4647 (a relative error of about 1.5%), again in good agreement, further confirming the consistency of the numerical model in predicting the velocity field.
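The relative errors follow from |M_exp − M_sim| / M_exp. The short check below uses only the outlet Mach numbers reported in this paragraph; the function name is illustrative.

```python
def relative_error_percent(measured, simulated):
    """Relative error of a simulation with respect to the measured value, in percent."""
    return abs(measured - simulated) / abs(measured) * 100.0

outlet_mach = {
    "45 deg inlet (alpha = 0.27 deg)": (0.4562, 0.4450),
    "42.5 deg inlet (alpha = -2.83 deg)": (0.4718, 0.4647),
}
for name, (measured, simulated) in outlet_mach.items():
    print(f"{name}: {relative_error_percent(measured, simulated):.2f} %")
```

This yields about 2.46% for the 45° condition and about 1.50% for the 42.5° condition.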

The static pressure contours (Figure 4) clearly reveal the significant pressure difference between the suction and pressure surfaces of the blade. This pressure gradient is the fundamental mechanism driving flow turning and producing aerodynamic loading. In local regions, particularly on the suction surface, a distinct pressure drop followed by a recovery process can be observed. An excessively rapid pressure recovery intensifies the adverse pressure gradient, which may cause boundary layer thickening or even separation. Comparing the two conditions, the 45° condition exhibits a more concentrated low-pressure region and a steeper static pressure gradient, indicating more abrupt loading changes near the blade. The 42.5° condition, on the other hand, shows smoother static pressure contours and more uniform pressure recovery, reflecting more stable flow characteristics when operating closer to the design attack angle. This is consistent with the outlet flow angle prediction: the simulated value of 16.5113° at the design condition is nearly identical to the experimental measurement, confirming the close relationship between the static pressure field and the accuracy of flow turning prediction.

The total pressure contours (Figure 5) directly reflect the spatial distribution of irreversible losses in the passage. Under both conditions, a distinct low-total-pressure streak appears downstream of the trailing edge, consistent with the wake region in the Mach number contours. This indicates that losses are primarily concentrated in two regions: viscous dissipation within the near-wall boundary layer and turbulent mixing in the trailing-edge shear layer. The 45° condition exhibits a wider low-total-pressure wake band with more significant streamwise diffusion, implying stronger turbulent mixing and entropy generation, as well as poorer total pressure uniformity in the main flow. In contrast, the 42.5° condition shows a narrower total pressure deficit zone and smoother contours, suggesting that when the blade operates near the design attack angle, the losses are more localized and less diffused, resulting in better overall aerodynamic performance with lower flow energy loss.

3. Aerodynamic Performance Prediction Model

To establish a fusion neural network model for rapid prediction and optimization of stator vane aerodynamic performance, this study constructs a deep learning dataset based on the parametric modeling and automated simulation platform. All deep learning models were implemented in Python using the PyTorch 2.6.0 framework. Given the strong sensitivity and coupling of stator vane aerodynamic performance with respect to geometric and inflow conditions, this section selects the stagger angle, leading-edge radius ratio, and attack angle as key independent variables for predicting the target outputs (the total pressure loss coefficient and outlet flow angle) and provides the data foundation for the subsequent multi-objective optimization.

To enhance dataset diversity and improve model generalization, a series of image-based data augmentation techniques were applied to all 1701 blade profile images prior to data splitting. The blade images are 128 × 128 pixels. Considering the three input variables—attack angle, stagger angle, and leading-edge radius ratio—the augmentation operations, based on blade shape, are designed to reflect reasonable variations in operating conditions. Rotation is used to represent different attack angle states; scaling factors derived from normalizing the leading-edge radius ratio are used to represent small changes in the leading-edge radius ratio, which are visually subtle in the 128 × 128 images and thus made more distinguishable by scaling; the stagger angle is directly represented by the blade orientation in the image and is therefore not processed.
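The mapping from input variables to image transformations can be sketched on blade-profile coordinates before rasterization. This is a minimal sketch under stated assumptions: the profile is rotated about its centroid by the attack angle and uniformly scaled by s = 1 + k·r̂, where r̂ is the min–max normalized leading-edge radius ratio. The gain k, the rotation sign convention, the function names, and the example values are hypothetical, since the paper does not give the exact formula.

```python
import math

def normalize(value, lo, hi):
    """Min-max normalize value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

def encode_profile(points, attack_deg, r_rle, r_rle_range, gain=0.05):
    """Rotate a blade profile by the attack angle and scale it by the normalized R_rle.

    points      : list of (x, y) profile coordinates
    r_rle_range : (min, max) of R_rle over the dataset
    gain        : hypothetical scaling gain k in s = 1 + k * r_hat
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    scale = 1.0 + gain * normalize(r_rle, *r_rle_range)
    a = math.radians(attack_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Counter-clockwise rotation about the centroid, then uniform scaling.
        rx = cos_a * dx - sin_a * dy
        ry = sin_a * dx + cos_a * dy
        out.append((cx + scale * rx, cy + scale * ry))
    return out

# Hypothetical profile points (mm) and R_rle range.
encoded = encode_profile([(0.0, 0.0), (15.0, 6.0), (30.0, 14.0)],
                         attack_deg=-2.83, r_rle=0.012, r_rle_range=(0.005, 0.02))
print(encoded)
```

Keeping the scaling gain small matches the conservative augmentation magnitudes mentioned below, so that the transformation encodes the variable without distorting the profile shape.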

Following augmentation, all images were normalized by linearly scaling the pixel intensities from the original [0, 255] range to [0, 1] to improve numerical stability during subsequent model training. The target labels (total pressure loss coefficient and outlet flow angle) were min–max normalized using their respective global maximum and minimum values, thereby reducing the model’s sensitivity to the absolute magnitude of the outputs.
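Both scalings, together with the inverse needed to report predictions in physical units, fit in a few lines. The sketch uses plain lists, and the loss-coefficient values in the usage line are hypothetical.

```python
def scale_pixels(image):
    """Map 8-bit pixel intensities from [0, 255] to [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]

def fit_min_max(values):
    """Global minimum and maximum of a label over the dataset."""
    return min(values), max(values)

def to_unit(values, lo, hi):
    """Min-max scale labels to [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

def from_unit(values, lo, hi):
    """Invert the label scaling so predictions can be reported in physical units."""
    return [v * (hi - lo) + lo for v in values]

losses = [0.052, 0.061, 0.075]          # hypothetical loss coefficients
lo, hi = fit_min_max(losses)
scaled = to_unit(losses, lo, hi)
print(scaled, from_unit(scaled, lo, hi))
```

Storing `lo` and `hi` with the trained model is what allows the network's [0, 1] outputs to be mapped back to loss coefficients and outlet angles.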

The augmented and normalized dataset was then randomly shuffled and partitioned into training, validation, and test subsets following a 65–15–20% ratio, corresponding to 1106, 255, and 340 samples, respectively. The split was performed once and kept fixed across all model comparisons to ensure a fair evaluation. The test set remained strictly untouched during all stages of training and hyperparameter tuning; model selection was conducted solely based on the validation loss.
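A single seeded shuffle followed by fixed cut points reproduces a split of this kind. In this sketch the seed is a hypothetical choice, the first two subset sizes are rounded to the nearest integer, and the test set takes the remainder.

```python
import random

def split_indices(n, fractions=(0.65, 0.15, 0.20), seed=0):
    """Shuffle indices once and cut them into train/validation/test subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # fixed seed keeps the split identical across models
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]        # remainder goes to the test set
    return train, val, test

train, val, test = split_indices(1701)
print(len(train), len(val), len(test))
```

With n = 1701 this gives the 1106/255/340 subset sizes reported above.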

All models were trained using the Adam optimizer (β₁ = 0.9, β₂ = 0.999, no weight decay) with an initial learning rate of 0.001. A mini-batch size of 50 was employed, and training was conducted for a maximum of 200 epochs. A learning rate scheduler (ReduceLROnPlateau) was applied, which reduced the learning rate by a factor of 0.5 when the validation loss failed to improve for 10 consecutive epochs. Model checkpoints were saved based on the best validation loss, and the corresponding weights were used for final evaluation on the held-out test set. The batch size, initial learning rate, and maximum number of epochs were determined through preliminary experiments monitoring validation loss convergence; the selected values represent a balance between training stability and computational efficiency.
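The plateau rule and best-checkpoint tracking can be illustrated without a deep learning framework. The class below is a simplified, hypothetical re-implementation of the schedule described above (factor 0.5, patience 10), not PyTorch's `ReduceLROnPlateau` itself, which additionally applies a small relative improvement threshold by default; like PyTorch, it reduces the rate once the number of non-improving epochs exceeds the patience.

```python
class PlateauHalver:
    """Simplified plateau scheduler with best-checkpoint tracking.

    Multiplies the learning rate by `factor` once the validation loss has not
    improved for more than `patience` consecutive epochs.
    """

    def __init__(self, lr=1e-3, factor=0.5, patience=10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Update the schedule; return True if val_loss is a new best (save a checkpoint)."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return False

scheduler = PlateauHalver()
for epoch_loss in [0.9, 0.8] + [0.8] * 11:   # synthetic validation losses
    scheduler.step(epoch_loss)
print(scheduler.lr, scheduler.best)
```

In PyTorch the same configuration would typically be written as `torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0)` combined with `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)`, calling `scheduler.step(val_loss)` once per epoch.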

It should be noted that augmentation was performed on the full dataset prior to splitting; however, the augmentation magnitudes were kept conservative (small rotation angles and scaling factors) to avoid introducing excessive similarity between training and test samples.

Except for the three variable parameters mentioned above, all other geometric and boundary conditions are kept constant to ensure that performance differences in the dataset are primarily caused by the key variables, thereby improving the interpretability and generalization capability of the model. The fixed parameters in this study include an axial chord length of 30 mm, baseline blade inlet and outlet metal angles of 45.2703° and 7.5488°, respectively, a leading-edge thickness ratio (or related fixed quantity) of 0.0067, inlet and outlet endwall angles of 4°, and a blade pitch of 24.9088 mm. The ranges of the fixed and variable parameters are summarized in Table 5.

The reasons for selecting stagger angle, leading-edge radius ratio, and attack angle as input variables are as follows:

The stagger angle essentially determines the geometric turning baseline of the cascade and the nominal attack angle relationship with the incoming flow. It is a core geometric parameter affecting the blade loading distribution, outlet flow direction, and off-design adaptability (stability margin). A small change in stagger angle can cause shifts in outlet flow angle deviation and loss level, making it a necessary variable for prediction and optimization.

The attack angle represents the incidence deviation between the incoming flow and the blade geometry, serving as a key inflow parameter for describing off-design conditions. Variations in attack angle directly alter the local incidence at the leading edge, the pressure gradients on the suction and pressure surfaces, and the boundary layer state, thereby significantly affecting the total pressure loss coefficient and causing outlet flow angle deviations through changes in effective turning. Since inflow fluctuations are inevitable in actual operation, the attack angle is an indispensable variable for performance prediction and robustness analysis.

The leading-edge radius ratio controls the bluntness of the leading edge and the local acceleration process near it, playing an important modulating role in the initial development of the boundary layer, flow separation tendency, and wake thickness. Near the design point, the leading-edge radius ratio often serves as a “fine-tuning” parameter for performance. Under incidence deviations (especially at highly negative attack angles), the leading-edge radius ratio may amplify or suppress the sensitivity of losses and outlet angle deviations. Therefore, including it as a design variable helps further reduce losses and improve off-design robustness while satisfying the outlet flow angle target (16.95°).

In summary, this variable set has clear physical meaning and can effectively characterize the variation patterns of total pressure loss coefficient and outlet flow angle, making it suitable as input features for neural networks to achieve performance prediction and optimization design.

3.1. Evaluation Metrics

The evaluation metrics mainly consist of two parts: accuracy and computational cost.

Accuracy is primarily reflected by the Mean Absolute Percentage Error (MAPE), the Root Mean Square Error (RMSE), and the coefficient of determination (R²).

MAPE: MAPE is a commonly used metric for measuring model prediction errors, especially in regression tasks. It represents the average percentage error between predicted and true values, intuitively reflecting the prediction accuracy of the model.

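RMSE: RMSE is another commonly used regression metric for the error between predicted and true values. It is expressed in the same units as the target and reflects the typical magnitude of the error; because errors are squared before averaging, it penalizes large errors more heavily than MAPE or the mean absolute error. A smaller RMSE indicates that the predicted values are closer to the true values.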

R² (Coefficient of Determination): R² is an important statistic in regression analysis for evaluating goodness of fit. It indicates the proportion of the variance in the dependent variable that is explained by the model, that is, the degree of agreement between the model’s predictions and the actual observations. For a reasonable model, R² lies between 0 and 1, and the closer it is to 1, the better the fit; it can become negative on held-out data when the model performs worse than simply predicting the mean.

The mathematical formulas are as follows, where y_{i} is the true value, \hat{y}_{i} is the predicted value, \bar{y} is the mean of the true values, and n is the number of samples:

MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100\%

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}
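The three accuracy metrics can be computed directly from paired true and predicted values, as in the plain-Python sketch below; the sample values in the usage line are hypothetical. Because MAPE divides by the true value, it is undefined when a true value is zero, so it should be evaluated on de-normalized (physical) outputs rather than on min–max scaled labels, whose minimum is exactly zero.

```python
import math

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any y_true is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

true_loss = [0.057, 0.061, 0.070, 0.082]   # hypothetical loss coefficients
pred_loss = [0.058, 0.060, 0.071, 0.080]
print(mape(true_loss, pred_loss), rmse(true_loss, pred_loss), r2(true_loss, pred_loss))
```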

Computational cost is assessed through four quantities: FLOPs (floating-point operations), the total number of trainable parameters (Params), the training time per epoch (Time), and the estimated total size of the model (a measure of memory and storage requirements).

FLOPs: In deep learning, FLOPs are a standard metric of the computational complexity of a single forward pass of a model, and their total directly reflects the inherent complexity of the algorithm. Because FLOPs are hardware-independent, a smaller value indicates lower computational cost and generally faster inference, although actual latency also depends on memory access patterns and the hardware. FLOPs are therefore an important metric for designing low-complexity models.

Params: Params denote the parameters that need to be updated via backpropagation during model training and are a key metric for measuring model size and storage requirements. Params determine the storage size of the model (total bytes of parameters) and directly affect the operational efficiency on hardware devices. Params measure the static size of the model (the number of parameters that need to be stored), while FLOPs measure the dynamic computational complexity (the amount of computation required for one forward pass).

Time: Training time per epoch helps not only to monitor training progress but also to identify performance bottlenecks, compare the efficiency of different models and hyperparameter configurations, and gain a more comprehensive understanding of the training process.

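Estimated Total Size: This metric indicates the expected total memory required during data loading and model training. In common model-summary tools, it is reported as the sum of the input size, the memory for intermediate activations in the forward and backward passes, and the parameter size. It helps predict the required resources, ensures that sufficient resources are available for the program to run correctly, and guides further optimization of the model and resource allocation.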
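For a convolutional layer with k × k kernels, C_in input channels, C_out output channels, and an H_out × W_out output map, the parameter count is (k²·C_in + 1)·C_out (weights plus biases) and the multiply–accumulate (MAC) count is k²·C_in·C_out·H_out·W_out; FLOPs are often quoted as twice the MAC count, although some profiling tools report MACs directly. The sketch applies these formulas to a stack of four 3 × 3 convolutions on a 128 × 128 image. The single input channel, the channel widths, and the 2 × 2 pooling after each layer are hypothetical, since the exact layer configuration of Figure 6 is not reproduced here; fully connected layers are ignored.

```python
def conv_layer_cost(c_in, c_out, k, h_out, w_out):
    """Return (params, MACs) of a k x k convolution with bias."""
    params = (k * k * c_in + 1) * c_out
    macs = k * k * c_in * c_out * h_out * w_out
    return params, macs

def stack_cost(channels, size=128, k=3):
    """Sum costs over conv layers ('same' padding, stride 1), each followed by 2x2 pooling.

    channels: [c_in, c1, c2, ...] -- hypothetical widths.
    Pooling, activation, and fully connected costs are ignored.
    """
    total_params = total_macs = 0
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        params, macs = conv_layer_cost(c_in, c_out, k, size, size)
        total_params += params
        total_macs += macs
        size //= 2  # 2x2 max pooling halves the spatial size
    return total_params, total_macs

params, macs = stack_cost([1, 16, 32, 64, 128])
# float32 weights take 4 bytes each, which gives the parameter part of the model size.
print(f"params={params:,}  MACs={macs:,}  FLOPs~{2 * macs:,}  weights={params * 4 / 1024**2:.3f} MB")
```

Under these assumptions the conv stack has 97,152 parameters and about 59 million MACs, and most of the compute sits in the deeper layers even though their feature maps are smaller.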

3.2. Model

3.2.1. CNN

Convolutional Neural Networks (CNNs) are neural network architectures designed for spatially structured data and are particularly adept at extracting hierarchical feature patterns from inputs with geometric structure. Their basic architecture relies on three core mechanisms: local feature extraction using convolutional kernels (filters), dimensionality reduction of feature maps via pooling operations, and nonlinear mapping of the extracted features to the outputs through fully connected layers [19]. On this CNN foundation, this study further compares CBAM-CNN and develops two enhanced architectures, SS-CNN and CNN-Transformer, to improve performance through modular extensions. To meet the requirements for embedded deployment, a systematic network architecture exploration was conducted on the dataset. Each candidate network configuration was evaluated according to two metrics: computational efficiency and prediction accuracy. Through a trade-off analysis between overall performance and computational cost, a four-layer CNN architecture (Figure 6) achieved the best balance between operational efficiency and predictive capability (Table 6). The selected architecture employs 3 × 3 convolutional kernels, which significantly reduce the computational load compared with traditional large-kernel designs while maintaining performance.

This small-kernel strategy substantially lowers computational complexity while preserving the ability to identify key features such as compressor blade profiles. Through its local receptive field mechanism and hierarchical feature extraction process, the CNN effectively captures local details in images, including edges, textures, and shape features. By stacking convolutional layers, the network gradually abstracts image content at different scales, thereby preserving and representing micro-structures and local variations. However, due to the inherent limitation of the receptive field in CNNs, their ability to model global dependencies is relatively weak. Moreover, the strong sequential dependency between CNN layers limits parallel computation efficiency, which in turn restricts the model’s scalability in large-scale application scenarios.
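The receptive-field limitation can be quantified with the standard recursion r_l = r_{l−1} + (k_l − 1)·j_{l−1} and j_l = j_{l−1}·s_l, where r is the receptive field in input pixels, j the cumulative stride, k the kernel size, and s the stride. The sketch assumes each of the four 3 × 3 convolutions (stride 1) is followed by 2 × 2 max pooling with stride 2; this layout is a hypothetical reading of the four-layer design, not a confirmed detail of Figure 6.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride). Returns receptive field in input pixels."""
    r, j = 1, 1          # a single input pixel, unit cumulative stride
    for k, s in layers:
        r += (k - 1) * j  # each layer widens the field by (k - 1) input-space steps
        j *= s            # striding enlarges the step between neighbouring units
    return r

four_block_cnn = [(3, 1), (2, 2)] * 4   # conv 3x3 / stride 1, then 2x2 pool / stride 2
print(receptive_field(four_block_cnn))
```

Under these assumptions, a unit after the last block sees at most a 46 × 46 pixel window of the 128 × 128 image, so relations between distant parts of the profile, such as the leading and trailing edges, are only combined in the fully connected layers.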

3.2.2. SS-CNN

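SS-CNN is derived from CBAM-CNN by replacing its channel attention module with the SE module and retaining spatial attention only in the final layer, as detailed below.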

The Spatial Attention Module (SAM) is a key component of the Convolutional Block Attention Module (CBAM) architecture [20] (Figure 7). As an efficient attention mechanism, CBAM enhances feature discriminability in both channel and spatial dimensions, enabling targeted feature strengthening while maintaining structural consistency of the learned representations.

The architectural advantage of CBAM mainly stems from its two sequentially integrated complementary sub-modules:

Channel Attention Module (CAM): dynamically adjusts the feature response of each channel through an adaptive weighting mechanism.

Spatial Attention Module (SAM): enhances the feature representation of salient spatial regions through a contextual feature association mechanism.

Empirical evidence reported for CBAM [20] shows that the channel-first sequential configuration outperforms parallel or reversed (spatial-first) arrangements.

The distinctive capability of CBAM lies in its dual-pooling strategy, which combines max pooling and average pooling. The two pooling operations capture complementary statistics of the features, and their combination improves on attention modules that rely on a single pooling operation, such as average pooling alone.

However, preliminary tests showed that, because the dataset is not highly complex and the CBAM module has relatively high computational complexity, CBAM gave a poor fit while incurring high computational overhead. Therefore, this study simplifies CBAM, first by replacing its Channel Attention Module (CAM) with the lighter Squeeze-and-Excitation (SE) module. The SE module consists of two core steps [21]: squeeze, which compresses each channel into a single descriptor by global average pooling, and excitation, which maps these descriptors through a two-layer fully connected bottleneck with a sigmoid output to produce per-channel weights that rescale the feature map.
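The squeeze and excitation steps can be sketched on a feature map stored as a C × H × W nested list. Squeeze averages each channel to one number; excitation passes the resulting vector through a reduction layer with ReLU and an expansion layer with sigmoid to obtain one weight per channel, which then rescales that channel. The weight matrices `w1` and `w2` are hypothetical placeholders for learned parameters, and biases are omitted for brevity.

```python
import math

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation on a C x H x W nested list.

    w1: (C // r) x C weights of the reduction layer (r = reduction ratio)
    w2: C x (C // r) weights of the expansion layer
    Returns the recalibrated feature map and the per-channel weights.
    """
    # Squeeze: global average pooling per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Recalibrate: multiply every element of channel c by scale[c].
    out = [[[s * v for v in row] for row in ch] for s, ch in zip(scale, feature_map)]
    return out, scale

# Two 2 x 2 channels with hypothetical weights (reduction ratio r = 2).
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
_, weights = se_block(fmap, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
print(weights)
```

Compared with CBAM's channel attention, which feeds both average- and max-pooled descriptors through a shared MLP, SE uses only the average-pooled descriptor, which is where its lower cost comes from.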

Since CNNs are inherently biased toward capturing local features and already emphasize salient regions through max pooling, the role of the Spatial Attention Module (SAM) is primarily to evaluate the importance of local regions within the spatial domain—a function that partially overlaps with the effect of max pooling.

Therefore, to minimize computational overhead, this study integrates the SAM module only in the final layer of the network. Consistent with the sequential structure of CBAM, the SE module and the SAM module are connected in sequence, forming what we call the SS module. This design yields the SE-SAM-CNN (abbreviated as SS-CNN) model, as shown in Figure 8.
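The spatial attention step that follows SE in the SS module can be sketched in the same nested-list form. Following the CBAM design [20], the feature map is pooled across channels with both average and max pooling, the two resulting H × W maps are convolved with a k × k kernel (zero padding, same output size) and passed through a sigmoid, and the resulting mask multiplies every channel. The kernel weights are hypothetical placeholders for learned parameters; CBAM uses k = 7, while the example uses k = 3 to keep the numbers small.

```python
import math

def spatial_attention(feature_map, kernel):
    """CBAM-style spatial attention on a C x H x W nested list.

    kernel: 2 x k x k weights applied to the [average, max] pooled maps (k odd).
    Returns the reweighted feature map and the H x W attention mask.
    """
    c = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    # Pool across channels at each spatial position.
    avg = [[sum(feature_map[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(feature_map[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    pooled = [avg, mx]
    k = len(kernel[0])
    pad = k // 2
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p in range(2):              # sum over the two pooled maps
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:   # zero padding outside the map
                            acc += kernel[p][di][dj] * pooled[p][y][x]
            mask[i][j] = 1.0 / (1.0 + math.exp(-acc))
    out = [[[mask[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in feature_map]
    return out, mask
```

In the SS module, the input to this function would be the output of the SE step, so channel reweighting happens first and spatial reweighting second, mirroring CBAM's channel-first ordering.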

3.2.3. CNN-Transformer

CNN-Transformer (hereinafter referred to as CNN-T) is a hybrid model that combines a Convolutional Neural Network (CNN) with a Transformer, aiming to leverage the local perception capability of CNNs and the global modeling capability of Transformers.

Due to its excellent performance in natural language processing, the Transformer has recently gained extensive attention in the computer vision community. Dosovitskiy et al. [22] demonstrated that the Transformer also performs remarkably well in vision tasks. The Transformer is based on a self-attention mechanism and adopts an encoder–decoder architecture, where the encoder and decoder are built by stacking self-attention layers and fully connected layers. Unlike traditional Recurrent Neural Networks (RNNs) and CNNs, the Transformer completely discards recurrence and convolution, relying instead on self-attention to capture global dependencies. At the same time, it offers stronger parallelism, significantly reducing training time [23].

The output of a CNN is a 4D feature map of shape (B, C, H, W), where B is the batch size, C is the number of channels, and H and W are the height and width of the feature map, respectively. Because the Transformer operates on sequences, this feature map must be flattened before being fed to the Transformer, for example into the shape (H × W, B, C), so that each spatial position becomes one token of dimension C. The flattened feature map then serves as the input embedding for the Transformer.
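The reshaping step can be written in a few lines. The NumPy sketch below (the function name feature_map_to_sequence is hypothetical) turns each of the H × W spatial positions into one token of dimension C, giving the sequence-first layout (H × W, B, C).

```python
import numpy as np

def feature_map_to_sequence(fmap):
    """Flatten a CNN feature map (B, C, H, W) into a token sequence (H*W, B, C)."""
    B, C, H, W = fmap.shape
    # (B, C, H, W) -> (B, C, H*W): each spatial position becomes one token
    tokens = fmap.reshape(B, C, H * W)
    # (B, C, L) -> (L, B, C): sequence-first layout
    return tokens.transpose(2, 0, 1)

fmap = np.arange(2 * 4 * 3 * 5, dtype=float).reshape(2, 4, 3, 5)
seq = feature_map_to_sequence(fmap)
print(seq.shape)  # (15, 2, 4)
```

In PyTorch the equivalent would be `fmap.flatten(2).permute(2, 0, 1)`, whose sequence-first layout matches the default `batch_first=False` convention of `torch.nn.MultiheadAttention`.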

Subsequently, a multi-head attention mechanism is used to model the global relationships of the input features. Unlike a single attention head, which only focuses on one subspace at a time and cannot fully learn the diversity of the input sequence, the multi-head attention mechanism learns different dependencies of the input sequence across multiple subspaces through several parallel attention heads. Each attention head operates independently on the input sequence, allowing the model to capture information from multiple perspectives. The queries (Q), keys (K), and values (V) of the input sequence are linearly projected into multiple low-dimensional subspaces. Attention is computed independently within each subspace, and the results from all heads are then concatenated and linearly transformed to produce the final output (Figure 9, left).
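The mechanism described above (per-head linear projections, independent scaled dot-product attention in each subspace, concatenation, and a final linear map) can be sketched in NumPy as follows. It follows the standard formulation of Vaswani et al. [23]; the embedding size, head count, and random weights are assumptions, and biases, masking, and dropout are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(seq, wq, wk, wv, wo, num_heads):
    """Multi-head scaled dot-product self-attention (sketch; no bias, mask, or dropout).

    seq        : (L, B, D) token sequence, e.g. the flattened CNN feature map
    wq, wk, wv : (D, D) projections; their outputs are split into num_heads subspaces
    wo         : (D, D) linear map applied after the heads are concatenated
    """
    L, B, D = seq.shape
    assert D % num_heads == 0, "embedding size must be divisible by num_heads"
    d_head = D // num_heads

    def split_heads(t):
        # (L, B, D) -> (B, heads, L, d_head): one low-dimensional subspace per head
        return t.reshape(L, B, num_heads, d_head).transpose(1, 2, 0, 3)

    q, k, v = (split_heads(seq @ w) for w in (wq, wk, wv))
    # Attention weights are computed independently inside each head
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)     # (B, heads, L, L)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                         # (B, heads, L, d_head)
    # Concatenate the heads along the feature axis, then apply the output projection
    concat = heads.transpose(2, 0, 1, 3).reshape(L, B, D)
    return concat @ wo, weights

rng = np.random.default_rng(0)
L, B, D = 15, 2, 8
seq = rng.standard_normal((L, B, D))
wq, wk, wv, wo = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))
out, attn = multi_head_self_attention(seq, wq, wk, wv, wo, num_heads=2)
```

Each row of `attn` is a probability distribution over all tokens, which is what lets every spatial position draw on information from the whole feature map.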

We choose to add the self-attention module only between the fourth convolutional layer and the fully connected layer (Figure 9, right). Compared with adding a multi-head self-attention module after every layer or placing it in shallower layers (e.g., the first or second layer), this approach reduces computational overhead and lowers complexity.

3.3. Prediction Accuracy Analysis

From the statistical indicators and scatter plots (Figure 10 and Figure 11), the standard CNN exhibits the highest error level, with scatter points noticeably dispersed on both sides of the ideal line. At higher loss coefficient values in particular, underestimation and overestimation are more pronounced, indicating that the standard CNN inadequately fits sample segments with strong nonlinearity and rapid gradient changes. After attention mechanisms are introduced, the fitting consistency improves significantly: CBAM-CNN, SS-CNN, and CNN-Transformer all bring the scatter points closer to the ideal line, suggesting that attention mechanisms strengthen the response to critical feature combinations and thereby improve generalization and robustness. The scatter of the proposed SS-CNN is the narrowest, implying the smallest error variance and the most stable predictions across the entire range. This indicates that, given the current data scale and feature morphology, SS-CNN fits nonlinear and locally sharply varying patterns in the data more accurately without additional theoretical computational cost. The statistical indicators are as follows (Table 7 and Table 8):
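For reference, the three statistical indicators reported in Table 7 and Table 8 follow the standard definitions sketched below. The function name is hypothetical and the exact implementation used in this study is not given; MAPE is well defined here only because both the loss coefficient and the outlet flow angle (about 17°) stay away from zero.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAPE in %, R^2) for one output quantity."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    # MAPE assumes no target is zero
    mape = 100.0 * np.mean(np.abs(err / y_true))
    # Coefficient of determination: 1 - SS_res / SS_tot
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mape, r2

# Toy example with three loss-coefficient samples (illustrative values)
rmse, mape, r2 = regression_metrics([0.05, 0.06, 0.07], [0.051, 0.059, 0.07])
# r2 is approximately 0.99 here
```

RMSE is in the units of the target, whereas MAPE and R² are dimensionless, which is why MAPE is the indicator used to compare the loss and outlet angle predictions side by side.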

The CNN-based method offers a distinct advantage in geometric representation: through the local receptive fields of convolutional kernels and hierarchical feature extraction, it can automatically learn high-dimensional information such as shape details, local curvature variations, and geometric topological differences from geometric field/image representations. For blade profile families that exhibit complex geometric differences and are difficult to characterize fully with a small number of parameters, this method offers stronger representational capability and transfer potential. Furthermore, attention modules (e.g., CBAM, the SS module, or Transformer structures) can further enhance feature aggregation in key regions, facilitating the capture of local geometric features to which aerodynamic performance is sensitive.

3.4. Evaluation of Generalization

To assess whether the reported prediction accuracy reflects genuine generalization rather than local interpolation within a densely sampled parameter grid, a leave-one-attack-angle-out test was conducted for total pressure loss coefficient prediction. Specifically, all 189 samples at an attack angle of α = −2° (the grid point closest to the design attack angle of −2.83°) were held out exclusively as the test set. The remaining samples, covering the other eight attack angles (−4°, −3°, −1°, 0°, 1°, 2°, 3°, 4°), were used for training and validation with a 65%/15% split. All four models (CNN, CBAM-CNN, CNN-Transformer, and SS-CNN) were trained from scratch and evaluated on the held-out α = −2° test set. This procedure was repeated three times with different random seeds to assess the stability of the results. The detailed results are presented in Table 9:
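The split can be reproduced along the following lines. The sketch assumes that the 1701 cases form a full-factorial grid of 9 attack angles, 9 stagger angles (22° to 30° in assumed 1° steps), and 21 values of R_rle (0.0060 to 0.0120 in steps of 0.0003), which gives exactly 189 cases per attack angle; it also assumes that the remaining 1512 cases are divided between training and validation in a 65:15 ratio. Both points are interpretations, not details stated by the authors, and the function name leave_one_alpha_out is hypothetical.

```python
import numpy as np

# Assumed full-factorial grid (Table 5 ranges): 9 alpha x 9 stagger x 21 R_rle = 1701 cases
alpha = np.arange(-4, 5)                        # attack angles, deg
stagger = np.arange(22, 31)                     # stagger angles, deg (1 deg spacing assumed)
r_rle = np.round(np.linspace(0.0060, 0.0120, 21), 4)
grid = np.array(np.meshgrid(alpha, stagger, r_rle, indexing="ij")).reshape(3, -1).T

def leave_one_alpha_out(grid, held_out_alpha, train_frac=65, val_frac=15, seed=0):
    """Hold out every case at one attack angle; split the rest into train/val."""
    is_test = np.isclose(grid[:, 0], held_out_alpha)
    test_idx = np.flatnonzero(is_test)
    rest = np.random.default_rng(seed).permutation(np.flatnonzero(~is_test))
    # Assumption: the remaining cases are divided in a 65:15 ratio
    n_train = int(round(len(rest) * train_frac / (train_frac + val_frac)))
    return rest[:n_train], rest[n_train:], test_idx

train_idx, val_idx, test_idx = leave_one_alpha_out(grid, -2.0, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))
```

Changing `seed` reshuffles only the train/validation partition; the held-out attack angle stays fixed, so the three repeated runs differ in data partition as well as in network initialization.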

Several important observations emerge from Table 9. SS-CNN achieves the best mean performance on all three metrics, indicating that its architectural advantages extend to an operating condition entirely unseen during training. The mean MAPE of SS-CNN (4.04%) is approximately 20% lower than that of the baseline CNN (5.06%), and the mean R² improves from 0.650 to 0.748. Although the second SS-CNN run exhibits a higher RMSE (0.0047) and a lower R² (0.702), the overall variability (standard deviation of MAPE = 0.25%) remains small, and the best run achieves a MAPE as low as 3.81% with an R² of 0.777. CBAM-CNN and CNN-Transformer perform comparably to each other, both outperforming the baseline CNN but falling short of SS-CNN. This suggests that, under the present data regime, combining SE-style channel attention with a single spatial attention module in the final layer generalizes better to the held-out attack angle than either full CBAM or a Transformer-based hybrid.
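A quick check on Table 9 shows which standard-deviation convention the ± values appear to use. For the three SS-CNN MAPE values, the population standard deviation (NumPy's default, ddof = 0) reproduces the reported 0.25%, whereas the Bessel-corrected sample value (ddof = 1) would be about 0.30%; with only three runs, the choice is not negligible.

```python
import numpy as np

# SS-CNN MAPE values (%) from the three runs in Table 9
mape_runs = np.array([3.81, 4.38, 3.92])
mean = mape_runs.mean()
std_pop = mape_runs.std()           # ddof=0: population standard deviation (NumPy default)
std_sample = mape_runs.std(ddof=1)  # ddof=1: Bessel-corrected sample standard deviation
print(f"{mean:.2f} ± {std_pop:.2f} (ddof=0), ± {std_sample:.2f} (ddof=1)")
# prints: 4.04 ± 0.25 (ddof=0), ± 0.30 (ddof=1)
```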

Compared with the random split used in Section 3.3 (where SS-CNN achieved MAPE = 2.03% and R² = 0.936), the leave-one-attack-angle-out test shows the expected increase in prediction error for all models. Nevertheless, the SS-CNN MAPE remains at approximately 4%, and its R² of 0.75 indicates that the model retains substantial predictive capability even for an operating condition completely absent from training. A MAPE of about 4% remains practically useful for rapid aerodynamic screening in preliminary design.

3.5. Comparison with Scalar-Input Baseline Models

The prediction task in this study can also be formulated as a low-dimensional scalar regression problem, since the three input variables (α, β_y, R_rle) are well-defined numerical parameters. To benchmark the image-based models against this natural alternative, two widely used scalar-input regression models were implemented: a multilayer perceptron (MLP) with architecture 3–64–32–1 using ReLU activation and the Adam optimizer, and an XGBoost regressor with 130 trees (max_depth = 3, learning_rate = 0.12). Both models receive the three input parameters (one operational and two geometric) directly as numerical inputs and were evaluated on the same randomly shuffled 65%/15%/20% dataset split used for the image-based models in Section 3.3.
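A minimal sketch of the two scalar baselines is given below using scikit-learn. The XGBoost regressor is replaced here by scikit-learn's GradientBoostingRegressor with the same tree count, depth, and learning rate (xgboost.XGBRegressor accepts keyword arguments with the same three names); the grid spacing, synthetic target, input standardization, and max_iter value are assumptions for illustration, so the printed errors say nothing about the CFD data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data on an assumed 9 x 9 x 21 grid of (alpha, stagger, R_rle);
# in practice `y` would hold the CFD total pressure loss coefficients.
a, s, r = np.meshgrid(np.arange(-4, 5), np.arange(22, 31),
                      np.linspace(0.006, 0.012, 21), indexing="ij")
X = np.column_stack([a.ravel(), s.ravel(), r.ravel()]).astype(float)
y = 0.05 + 4e-4 * X[:, 0] ** 2 + 2e-4 * (X[:, 1] - 26) ** 2 + 0.5 * X[:, 2]  # synthetic

# Random 65% / 15% / 20% split (validation part kept aside for tuning, unused here)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.15 / 0.80,
                                                  random_state=0)

models = {
    # 3-64-32-1 MLP with ReLU and Adam; standardized inputs are an assumption
    "MLP": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                                      solver="adam", max_iter=2000, random_state=0)),
    # scikit-learn gradient boosting standing in for XGBoost (130 trees, depth 3, lr 0.12)
    "GBR": GradientBoostingRegressor(n_estimators=130, max_depth=3,
                                     learning_rate=0.12, random_state=0),
}

test_mape = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    test_mape[name] = 100.0 * np.mean(np.abs((pred - y_test) / y_test))
    print(f"{name}: test MAPE = {test_mape[name]:.2f}%")
```

Standardizing the inputs matters for the MLP because R_rle is of order 10⁻² while the stagger angle is of order 10¹; tree ensembles split on thresholds and do not need it.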

As shown in Table 10, the scalar-input models outperform the image-based models on all metrics. The MLP achieves the best overall accuracy (RMSE = 0.0018, R² = 0.9516, MAPE = 1.63%), followed closely by XGBoost (RMSE = 0.0020, R² = 0.9450, MAPE = 1.69%). This performance gap is intuitive: the scalar models receive the three input parameters as exact numerical values with no intermediate information loss, whereas the image-based models must re-infer equivalent information from pixel renderings. Subtle geometric variations, most notably the leading-edge radius ratio R_rle changing in increments of 0.0003, are barely perceptible at a resolution of 128 × 128 pixels. Furthermore, the current design space is a densely sampled, three-variable full-factorial grid with smooth aerodynamic responses, representing near-ideal conditions for scalar regression.
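The scale of the problem can be estimated with simple arithmetic. Because R_rle is normalized by the axial chord (C_x = 30 mm, Table 1), one increment of 0.0003 changes the leading-edge radius by 0.009 mm. Assuming, hypothetically, that the 128-pixel image width spans about one axial chord (the rendering field of view is not stated), this is under four hundredths of a pixel; a wider field of view would make it smaller still.

```python
# Assumption (not stated in the paper): the 128-pixel image width spans about
# one axial chord. C_x and the R_rle step are taken from the text and Table 1.
c_x_mm = 30.0        # axial chord C_x, mm
d_r_rle = 0.0003     # R_rle increment between neighboring samples
pixels = 128         # image width in pixels

d_radius_mm = d_r_rle * c_x_mm    # change in leading-edge radius, mm
pixel_mm = c_x_mm / pixels        # physical width of one pixel, mm
print(f"radius step = {d_radius_mm:.4f} mm = {d_radius_mm / pixel_mm:.3f} px")
# prints: radius step = 0.0090 mm = 0.038 px
```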

Nevertheless, these findings do not negate the value of the image-based CNN approach; rather, they clarify its appropriate application domain. When blade geometries can be compactly described by a small, fixed set of scalar parameters, regression models operating directly on those parameters (MLP, XGBoost, Gaussian processes) are the recommended practical choice. The image-based CNN framework is intended for scenarios where scalar parameterization is infeasible or insufficient—for example, non-parametric blade profiles defined by Bezier/B-spline control points, scanned coordinate sets, or configurations with sweep, lean, and end-wall contouring that require high-dimensional or variable-length geometric descriptions. In such cases, scalar models cannot be applied directly, whereas the image-based pipeline can accept any geometry that can be rendered as an image. The CNN surrogate framework thus provides a generalizable pathway for aerodynamic performance prediction when the underlying geometry cannot be compactly parameterized.

3.6. Ablation Study on Attention Mechanisms

To isolate the contributions of the channel attention (SE) and spatial attention (SAM) modules to the prediction accuracy, an ablation study was conducted under the same random 65%/15%/20% data split used in Section 3.3. Five model configurations were compared: the baseline CNN with no attention, SAM-CNN (spatial attention only), SE-CNN (channel attention only), CBAM-CNN (combining CAM and SAM), and the proposed SS-CNN (SE attention on all convolutional layers plus SAM in the final layer). Each configuration was trained three times with different random seeds, and the mean and standard deviation of the evaluation metrics on the test set are reported in Table 11.

The ablation results reveal several clear trends. First, each attention mechanism independently improves performance over the baseline CNN: SAM alone reduces MAPE from 3.68% to 2.98%, while SE alone reduces it to 2.89%. The SE module provides a marginally larger gain, suggesting that channel-wise feature recalibration is particularly effective for this aerodynamic prediction task. Second, combining both mechanisms yields the best result: SS-CNN achieves a MAPE of 2.07%, outperforming all single-module variants as well as CBAM-CNN (MAPE 2.35%). The advantage of SS-CNN over CBAM-CNN is plausibly attributable to two design choices: replacing CAM with the lighter SE module, which may reduce the risk of overfitting on the moderately sized training set, and restricting SAM to the final convolutional layer, which provides spatial gating where it is most impactful without adding complexity to earlier layers. Third, the low standard deviations across the three runs suggest that the observed performance differences are stable and reproducible. Together, these findings support the inclusion of each component of the SS-CNN architecture.

3.7. Computational Cost Analysis

From the perspective of computational cost (Table 12), the theoretical computational load and parameter count of the four CNN-based models are essentially the same: approximately 3.49 GFLOPs and about 4.29 M parameters for all, indicating that, with the same shared backbone, the attention modules do not significantly increase the theoretical computation or the number of trainable parameters. The actual inference time per epoch does differ: CNN-T is fastest (0.36 s), followed by SS-CNN (0.53 s), while CBAM-CNN and the standard CNN are similar (0.57 s). This suggests that actual speed depends not only on FLOPs but also on parallel efficiency. In terms of runtime memory overhead, CNN-T has the lowest usage (20.82 MB), CNN and SS-CNN are in the middle (28.76 MB and 29.77 MB, respectively), and CBAM-CNN has the highest (31.67 MB). These differences arise mainly from the intermediate features and temporary tensors introduced by the attention branches. Overall, at comparable computational loads, CNN-T offers advantages in inference efficiency and memory usage, while CBAM-CNN has a relatively higher runtime overhead.
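The observation that the attention modules add few parameters can be checked with a rough count. The sketch below assumes hypothetical channel widths for a four-layer backbone and the SE reduction ratio r = 16 used by Hu et al. [21]; an SE block adds two small fully connected layers per stage, and the spatial attention module adds a single 7 × 7 convolution on two pooled maps.

```python
def se_params(channels, r=16):
    """Trainable parameters of one SE block: FC C->C/r and FC C/r->C, both with bias."""
    hidden = channels // r
    return channels * hidden + hidden + hidden * channels + channels

def sam_params(kernel=7):
    """Trainable parameters of the spatial attention conv: 2 pooled maps -> 1 map, with bias."""
    return 2 * kernel * kernel + 1

# Hypothetical channel widths of a four-layer backbone (not given in the paper)
widths = [32, 64, 128, 256]
extra = sum(se_params(c) for c in widths) + sam_params()
print(extra, f"parameters = {100 * extra / 4.29e6:.2f}% of 4.29 M")
# prints: 11489 parameters = 0.27% of 4.29 M
```

Under these assumptions, the added parameters are a fraction of a percent of the backbone, consistent with the near-identical parameter counts in Table 12.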

In summary, among the image-based models, SS-CNN offers the best accuracy at a computational cost comparable to that of the other variants. It is therefore well suited to rapid surrogate modeling and optimization iteration, and to scenarios involving more complex geometric shapes, where the relationship between high-dimensional shape information and performance must be uncovered.

4. Aerodynamic Optimization Design

Unlike the conventional use of NSGA-II [24], which treats operating-condition variables and structural variables together as decision variables and uses a single operating-point indicator as the optimization objective, this paper accounts for the fact that the attack angle of a stator blade fluctuates with operating conditions in actual operation. The attack angle is therefore treated as a disturbance variable, while the stagger angle and leading-edge radius ratio are taken as design variables. Performance is evaluated over an interval of attack angles sampled from −4° to 4°. Three criteria, namely the average loss, the maximum exit angle deviation, and the deviation fluctuation, are combined into a multi-objective function. With the SS-CNN model embedded as the evaluator, NSGA-II is used to obtain a Pareto front that satisfies the requirements of low loss, small deviation, and low sensitivity, thereby improving the applicability and stability of the blade profile over the full range of attack angles.

The optimization objectives are: 1. low overall loss; 2. low overall exit angle deviation; 3. no sudden large deviation at any particular attack angle.

The optimization problem is formulated as follows. Each candidate stator geometry, defined by a stagger angle β_y and a leading-edge radius ratio R_rle, is evaluated by the SS-CNN surrogate model at nine equally spaced attack angles α ∈ {−4°, −3°, −2°, −1°, 0°, 1°, 2°, 3°, 4°}. All nine operating conditions are treated with equal importance. Three objective functions are defined:

f_{1} = \frac{1}{9} \sum_{k=1}^{9} \omega(\alpha_{k})

f_{2} = \max_{k=1,\dots,9} \left| \beta_{out}(\alpha_{k}) - 16.95^{\circ} \right|

f_{3} = \operatorname{std}_{k=1,\dots,9} \left( \beta_{out}(\alpha_{k}) - 16.95^{\circ} \right)

where ω(α_k) and β_out(α_k) are the predicted total pressure loss coefficient and outlet flow angle at attack angle α_k, respectively, and 16.95° is the target outlet flow angle.
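The three objectives can be evaluated for one candidate geometry as sketched below. The surrogate is passed in as a callable; `surrogate(alpha, stagger, r_rle)` returning the predicted loss coefficient and outlet angle is a hypothetical interface, and `toy_surrogate` is an analytic stand-in rather than the trained SS-CNN. Since the formulation does not specify which standard-deviation convention f_3 uses, the population form is assumed.

```python
import numpy as np

ALPHAS = np.arange(-4.0, 5.0)   # nine equally spaced attack angles, deg
TARGET_OUTLET = 16.95           # target outlet flow angle, deg

def objectives(stagger, r_rle, surrogate):
    """Return (f1, f2, f3) for one candidate (stagger, R_rle).

    `surrogate(alpha, stagger, r_rle) -> (omega, beta_out)` is a hypothetical
    interface standing in for the trained SS-CNN loss and outlet-angle models.
    """
    preds = np.array([surrogate(a, stagger, r_rle) for a in ALPHAS])   # (9, 2)
    omega, beta_out = preds[:, 0], preds[:, 1]
    dev = beta_out - TARGET_OUTLET
    f1 = omega.mean()           # equally weighted mean loss, (1/9) * sum
    f2 = np.abs(dev).max()      # worst-case outlet angle deviation
    f3 = dev.std()              # spread of the deviation (population std assumed)
    return f1, f2, f3

def toy_surrogate(alpha, stagger, r_rle):
    # Analytic placeholder for illustration only, not the trained model
    return 0.055 + 4e-4 * alpha ** 2, 16.9 + 0.02 * alpha

f1, f2, f3 = objectives(26.0, 0.009, toy_surrogate)
```

Because every candidate needs nine surrogate calls, replacing CFD with the surrogate is what makes a population-based search over the design variables affordable.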

NSGA-II is applied to search for the Pareto-optimal set with respect to (f_1, f_2, f_3). From the resulting front, candidate designs are filtered by enforcing f_2 ≤ 0.1° to require acceptable predicted outlet angle tracking across the entire attack-angle range, and the configuration minimizing f_1 is selected as the final optimized design.
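The post-processing rule applied to the NSGA-II result can be sketched independently of the evolutionary search itself (for which a library such as pymoo provides an implementation). Given candidate designs X and their objective values F = (f_1, f_2, f_3), the sketch extracts the non-dominated set by a direct pairwise check, keeps members with f_2 ≤ 0.1°, and returns the one with the smallest f_1. Function names and example numbers are hypothetical.

```python
import numpy as np

def non_dominated(F):
    """Boolean mask of Pareto-optimal rows of F (n, m); all objectives minimized."""
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is no worse in every objective and better in one
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask

def select_design(X, F, f2_limit=0.1):
    """Keep Pareto-optimal designs with f2 <= f2_limit, then return the one with minimum f1."""
    ok = non_dominated(F) & (F[:, 1] <= f2_limit)
    if not ok.any():
        raise ValueError("no Pareto-optimal design satisfies the f2 limit")
    idx = np.flatnonzero(ok)
    best = idx[np.argmin(F[idx, 0])]
    return X[best], F[best]

# Hypothetical candidates: columns of X are (stagger angle, R_rle), columns of F are (f1, f2, f3)
X = np.array([[26.0, 0.0080], [25.0, 0.0070], [27.0, 0.0090], [28.0, 0.0100]])
F = np.array([[0.056, 0.08, 0.03],
              [0.055, 0.15, 0.04],   # lowest f1 but violates the f2 limit
              [0.057, 0.09, 0.02],
              [0.058, 0.09, 0.03]])  # dominated by rows 0 and 2
best_x, best_f = select_design(X, F)
```

The pairwise check is O(n²) and adequate for a final NSGA-II population; the fast non-dominated sorting inside NSGA-II serves the same purpose during the search.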

The blade profiles before and after optimization are compared in Figure 12. CFD simulations were performed at nine attack angles (including the design attack angle) in the range of −4° to 4°, and the resulting total pressure loss coefficients and exit flow angles before and after optimization are compared in Table 13. The exit angle is closer to the design exit angle at negative attack angles but deviates more at positive attack angles, while the overall total pressure loss coefficient decreases by about 4% across the entire range.

The Mach number contours (Figure 13) of the optimized blade show a more continuous high-Mach region, smoother isocontours without obvious abrupt changes, and a weaker tendency toward flow separation. The static pressure contours (Figure 14) show that the low-pressure region of the optimized blade is more wall-attached and the isobars are smoother, suggesting a milder local expansion. The total pressure contours (Figure 15) show a thinner blue/cyan low-total-pressure band behind the trailing edge of the optimized blade, which stays closer to the trailing edge and diffuses more slowly downstream, indicating a reduction in total pressure loss. These flow field characteristics are consistent with the quantitative results at the design attack angle of −2.83°: the total pressure loss coefficient decreases from 0.0570 to 0.0556, and the exit angle increases from 16.51° to 16.88°.

From the perspective of total pressure loss coefficient, the optimized blade exhibits a consistent reduction across the entire tested attack-angle range (−4° to 4°), with the reduction being more pronounced at larger positive angles. At the design attack angle of −2.83°, the total pressure loss coefficient decreases from 0.0570 to 0.0556, a reduction of approximately 2.38%.

From the perspective of outlet flow angle deviation, the improvement is more selective with respect to operating condition (Figure 16 and Figure 17). Near the design attack angle (the −4° to −1° range), the outlet flow angle moves significantly closer to the target value of 16.95°. For example, at −2.83°, the deviation relative to 16.95° decreases from 0.439° to 0.066°, a reduction of approximately 85%. However, at 0° and positive attack angles, the outlet angle of the optimized blade is systematically higher, resulting in a larger deviation from 16.95° than that of the baseline. This indicates that, under the present formulation, optimizing the stagger angle and leading-edge radius ratio for low loss and improved turning near the design condition involves a trade-off: the improvement on one side of the attack-angle range comes at the cost of increased deviation on the other side.

It should be emphasized that this trade-off is a consequence of the equal-weighting strategy across the nine attack angles adopted in the multi-objective formulation, not a failure of the optimization method itself. The resulting design is best suited for applications in which the compressor stator operates predominantly near the design attack angle and its immediate neighborhood. For scenarios requiring uniformly tight outlet angle control across a wide range of off-design conditions, a weighted formulation that assigns higher importance to specific attack-angle intervals would be more appropriate.
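One possible weighted variant, given here only as an illustration and not as part of the formulation used in this study, replaces the uniform average and the unweighted maximum with nonnegative weights w_k that are larger on the attack-angle interval of interest:

f_{1}^{w} = \frac{\sum_{k=1}^{9} w_{k}\, \omega(\alpha_{k})}{\sum_{k=1}^{9} w_{k}}

f_{2}^{w} = \max_{k=1,\dots,9} w_{k} \left| \beta_{out}(\alpha_{k}) - 16.95^{\circ} \right|

Setting w_k = 1 for all k recovers the equally weighted objectives f_1 and f_2 defined in this section.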

5. Conclusions

This study focuses on the aerodynamic performance analysis and optimization design of compressor stator blades. An integrated technical route consisting of “parametric modeling—automated simulation—dataset construction—surrogate model prediction—multi-objective optimization and validation” has been established. A high-fidelity dataset covering 1701 blade profile cases was efficiently generated, and the reliability of the simulation model was verified through experiments, providing a physical and data foundation for subsequent data-driven modeling.

Based on this, a comparative study of various data-driven prediction models was carried out. A surrogate model framework including CNN, CBAM-CNN, SS-CNN, and CNN-Transformer was constructed and evaluated in terms of prediction accuracy (RMSE, MAPE, R²) and computational cost. The results show that the prediction models can significantly reduce the cost of performance evaluation while maintaining acceptable error levels, laying the groundwork for rapid iterative optimization using surrogate models in place of CFD. At the same time, the trade-offs between accuracy and efficiency for different models are clearly characterized, providing alternative pathways for high-precision regression in engineering applications.

Finally, multi-objective optimization was performed using the surrogate model and the NSGA-II algorithm. The optimized blade profile retains a stable stagger angle and adopts a reduced leading-edge radius ratio. At the design attack angle, the total pressure loss is reduced by 2.38%, and the exit flow angle deviation is reduced from 0.439° to 0.066° (an 85% reduction). These improvements are achieved primarily in the negative-to-near-design attack-angle range; at positive off-design conditions, the outlet angle deviation increases slightly. This trade-off is inherent to the equal-weighting formulation and may be adjusted in future work through condition-dependent weighting strategies.

From an engineering application perspective, the workflow presented in this paper is primarily based on numerical simulation data and surrogate models, and is mainly oriented toward subsonic operating conditions. However, as the blade Mach number increases, prediction accuracy decreases. For high-load compressor design, more attention needs to be paid to the predictive performance of traditional empirical models under high-load and supersonic inflow conditions. When conditions permit, experimental data from transonic blade profiles should be used to calibrate empirical models to broaden their applicable range [25].

Although the method adopted in this paper can improve the efficiency of performance evaluation and optimization design to some extent, its conclusions and methods are still inevitably subject to several limitations. First, enhancing model generalizability requires expanding the training samples. However, experimental data for advanced compressors are rarely publicly available, so the sample data are mainly derived from simulations. Although necessary comparisons and validations with experimental results have been performed, for strongly nonlinear phenomena such as near-wall flow, separation/re-attachment, and wake mixing, the choice of turbulence model, grid resolution, and wall treatment in the simulation itself may introduce systematic errors, thereby affecting the authenticity of the surrogate model training targets and the prediction of optimal solutions. One specific limitation in this context is the lack of experimental validation for the optimized blade. Owing to the unavailability of a low-speed cascade wind tunnel, the optimization results were verified solely by re-evaluation with the same CFD method used for dataset generation. Although the close agreement between the SS-CNN predictions and the CFD re-assessment suggests that the optimization is reliable, future work should include experimental testing to further confirm the predicted gains. Second, the effectiveness of the surrogate model depends on the coverage of the parameter space by the training samples and the rationality of their distribution. In regions with abrupt response changes or high sensitivity, there may still be issues of insufficient generalization and difficulty in quantifying uncertainty, so the robustness of the optimization results under extreme conditions or assembly deviations requires further verification. 
Finally, this paper focuses on two core indicators, the total pressure loss coefficient and the exit flow angle deviation, without systematically incorporating more complex engineering factors such as endwall effects, secondary flow structures, and consistency constraints across multiple operating conditions. In light of these limitations, future work should gradually improve the credibility and engineering applicability of the method through more abundant experimental validation data, adaptive sampling and uncertainty modeling for critical regions, and the introduction of multi-condition optimization frameworks, thereby moving the proposed workflow closer to a reusable engineering design tool.

Author Contributions

Conceptualization, methodology, software, and writing (original draft preparation): J.Z.; writing (review and editing): M.Y. and K.Z.; funding acquisition and project administration: Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

All data and code are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

$C_{x}$	axial chord length
$β_{y}$	stagger angle
$β_{b,in}$	inlet blade angle
$β_{b,out}$	outlet blade angle
R_rle	leading edge radius ratio to axial chord
R_tle	trailing edge radius ratio to axial chord
$β_{w,in}$	inlet half-wedge angle
$β_{w,out}$	outlet half-wedge angle
t	pitch
α	attack angle
CNN	Convolutional Neural Networks
CBAM	Convolutional Block Attention Module
SAM	Spatial Attention Module
CAM	Channel Attention Module
SE	Squeeze-and-Excitation module
SS	Squeeze-and-Excitation + Spatial Attention Module
NSGA-II	Non-dominated Sorting Genetic Algorithm II

References

1. Ma, Y.L.; Du, Z.; Xu, Q.Y.; Qi, J.H. Flow field reconstruction of compressor blade cascade based on deep learning methods. Aerosp. Sci. Technol. 2024, 155, 109637.
2. Ramezanizadeh, M.; Ahmadi, M.A.; Ahmadi, M.H.; Nazari, M.A. Rigorous smart model for predicting dynamic viscosity of Al₂O₃/water nanofluid. J. Therm. Anal. Calorim. 2019, 137, 307–316.
3. Pakatchian, M.R.; Ziamolki, A.; Nazari, M.A. Applications of machine learning approaches in aerodynamic aspects of axial flow compressors: A review. Front. Energy Res. 2023, 11, 1135055.
4. Santos, J.E.; Xu, D.; Jo, H.; Landry, C.J.; Prodanovic, M.; Pyrcz, M.J. PoreFlow-Net: A 3D convolutional neural network to predict fluid flow through porous media. Adv. Water Resour. 2020, 138, 103539.
5. Deng, Z.W.; Wang, J.; Liu, H.S.; Xie, H.R.; Li, B.K.; Zhang, M.; Jia, T.M.; Zhang, Y.; Wang, Z.D.; Dong, B. Prediction of transonic flow over supercritical airfoils using geometric-encoding and deep-learning strategies. Phys. Fluids 2023, 35, 075146.
6. Esfahanian, V.; Izadi, M.J.; Bashi, H.; Ansari, M.; Tavakoli, A.; Kordi, M. Aerodynamic shape optimization of gas turbines: A deep learning surrogate model approach. Struct. Multidiscip. Optim. 2024, 67, 2.
7. Niu, Y.; Zhao, K.N.; Yang, Y.J.; Yao, M.H.; Wu, Q.L.; Bai, B.; Ma, L. Integration of deep learning and computational fluid dynamics for rapid aerodynamic force prediction of compressor blades. Phys. Fluids 2024, 36, 103610.
8. Niu, Y.; Zhao, K.N.; Yao, M.H.; Wu, Q.L.; Yang, S.W.; Ma, L. Aerodynamic force prediction of compressor blade surfaces based on machine learning. Phys. Fluids 2024, 36, 083614.
9. Wang, L.Y.; Xu, J.; Luo, W.; Luo, Z.H.; Xie, J.H.; Yuan, J.P.; Tan, A.C.C. A deep learning-based optimization framework of two-dimensional hydrofoils for tidal turbine rotor design. Energy 2022, 253, 124130.
10. Du, Z.; Ma, Y.; Xu, Q.; Wu, F.; Feng, X. ResNet data-driven compressor blade profile optimization. J. Aerosp. Power 2023, 38, 1592–1603.
11. Du, Z.; Xu, Q.; Ma, Y.; Jiang, Y. Compressor blade profile performance prediction based on deep neural network. J. Aerosp. Power 2025, 40, 20240123.
12. Du, Z.; Xu, Q.; Song, Z.; Wang, H.; Ma, Y. Prediction of aerodynamic characteristics of compressor blade profiles based on deep learning. J. Aerosp. Power 2023, 38, 2251–2260.
13. Bruni, G.; Maleki, S.; Krishnababu, S.K. C(NN)FD—A Deep Learning Framework for Turbomachinery CFD Analysis. IEEE Trans. Ind. Inform. 2024, 20, 10230–10237.
14. Wang, Y.Q.; Du, Q.W.; Li, Y.Z.; Zhang, D.; Xie, Y.H. Field reconstruction and off-design performance prediction of turbomachinery in energy systems based on deep learning techniques. Energy 2022, 238, 121825.
15. Xu, X.H.; Huang, X.D.; Bi, D.F.; Zhou, M. An Intellectual Aerodynamic Design Method for Compressors Based on Deep Reinforcement Learning. Aerospace 2023, 10, 171.
16. Xu, X.H.; Huang, X.D.; Bi, D.F.; Zhou, M. A Combined Artificial-Intelligence Aerodynamic Design Method for a Transonic Compressor Rotor Based on Reinforcement Learning and Genetic Algorithm. Appl. Sci. 2023, 13, 1026.
17. Xu, X.H.; Huang, X.D.; Zhang, K.; Zhou, M. An intellectual design case of compressor airfoils based on reinforcement learning. Eng. Comput. 2023, 40, 2145–2173.
18. Pritchard, L. An eleven parameter axial turbine airfoil geometry model. In Proceedings of the Turbo Expo: Power for Land, Sea, and Air, Houston, TX, USA, 18–21 March 1985; p. V001T003A058.
19. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
20. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module; Springer: Cham, Switzerland, 2018.
21. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
22. Dosovitskiy, A. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
24. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
25. Cheng, H.; Yi, W.; Ji, L. Application research of machine learning in optimization design of high-pressure ratio centrifugal impeller. J. Eng. Thermophys. 2020, 41, 2734–2741.

Figure 1. Parametric Blade Profile Design: Eleven-parameter blade profile (left); Nine-parameter blade profile (right).

Figure 2. Stator blade and parameter schematic (left); mesh schematic (right).

Figure 3. Mach Number Contours: (left) inlet flow angle 45° (attack angle 0.27°); (right) inlet flow angle 42.5° (attack angle −2.83°).

Figure 4. Static Pressure Contours: (left) inlet flow angle 45° (attack angle 0.27°); (right) inlet flow angle 42.5° (attack angle −2.83°).

Figure 5. Total Pressure Contours: (left) inlet flow angle 45° (attack angle 0.27°); (right) inlet flow angle 42.5° (attack angle −2.83°).

Figure 6. CNN Model.

Figure 7. CBAM-CNN Model.

Figure 8. SS-CNN Model.

Figure 9. CNN-Transformer Model.

Figure 10. Prediction Results of Total Pressure Loss Coefficient.

Figure 11. Prediction Results of Outlet Flow Angle.

Figure 12. Blade Profile Comparison Before and After Optimization.

Figure 13. Mach Number Contours Before (left) and After (right) Optimization.

Figure 14. Static Pressure Contours Before (left) and After (right) Optimization.

Figure 15. Total Pressure Contours Before (left) and After (right) Optimization.

Figure 16. Loss Before and After Optimization.

Figure 17. Exit Flow Angle Before and After Optimization.

Table 1. Blade Profile Parameters.

$C_{x}$ (mm)	30
$β_{y}$ (°)	26.9596
$β_{b,in}$ (°)	45.2703
$β_{b,out}$ (°)	7.5488
R_rle	0.0083
R_tle	0.0067
$β_{w,in}$ (°)	4
$β_{w,out}$ (°)	4
t (mm)	24.9088

Table 2. Mesh independence study (attack angle 0.27°).

Grid (Cells)	Total Pressure Loss Coefficient
0.56 × 10⁶	0.0704
0.66 × 10⁶	0.0606
0.84 × 10⁶	0.0600
1.07 × 10⁶	0.0602
1.20 × 10⁶	0.0589
1.34 × 10⁶	0.0581

Table 3. Turbulence model comparison (attack angle 0.27°).

Model	Outlet Flow Angle (°)	Total Pressure Loss Coefficient
Experiment	15.8589	0.0564
k-ε	16.7198	0.0656
SA	16.9589	0.0683
SST k-ω	16.9554	0.0622
Transition SST	16.9552	0.0606

Table 4. Validation of Simulation Accuracy.

	Inlet Flow Angle 45° (Attack Angle 0.27°)			Inlet Flow Angle 42.5° (Attack Angle −2.83°)
	Experiment	Simulation	Error	Experiment	Simulation	Error
Outlet flow angle	15.8589	16.9552	6.9%	16.5113	16.5113	0%
Outlet Mach number	0.4562	0.4450	2.4%	0.4718	0.4647	1.5%
Total pressure loss coefficient	0.0564	0.0606	7.4%	0.0543	0.0570	4.98%

Table 5. Selection of Blade Profile Parameters.

$C_{x}$ (mm)	30
$β_{y}$ (°)	22~30
$β_{b,in}$ (°)	45.2703
$β_{b,out}$ (°)	7.5488
R_rle	0.0060~0.0120
R_tle	0.0067
$β_{w,in}$ (°)	4
$β_{w,out}$ (°)	4
t (mm)	24.9088
α (°)	−4~4

Table 6. Performance Metrics for Different Numbers of Layers.

Number of layers	2	3	4	5	6	7
RMSE	0.0047	0.0046	0.0037	0.0043	0.0031	0.0047
MAPE	4.91%	4.24%	3.77%	3.72%	3.41%	3.38%
GFLOPs	2.2151	2.7525	3.4931	4.3352	5.2282	6.1465
Params (M)	16.7841	8.4159	4.2932	2.4976	2.6283	6.8201
Time (s)	0.4126	0.4641	0.5777	0.6562	0.7181	0.6696
Estimated Total Size (MB)	73.97	43.66	28.76	22.29	23.00	39.11
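Table 6 trades prediction accuracy against computational cost. One way to read it is to keep only the configurations that are Pareto-optimal in RMSE and GFLOPs, where lower is better for both. This is not necessarily the selection criterion the authors used:

```python
# Table 6: number of layers, RMSE and GFLOPs
layers = [2, 3, 4, 5, 6, 7]
rmse_values = [0.0047, 0.0046, 0.0037, 0.0043, 0.0031, 0.0047]
gflops_values = [2.2151, 2.7525, 3.4931, 4.3352, 5.2282, 6.1465]


def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (both minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


points = list(zip(rmse_values, gflops_values))
# Keep configurations that no other configuration dominates
pareto = [n for n, p in zip(layers, points) if not any(dominates(q, p) for q in points)]
print(pareto)  # [2, 3, 4, 6]
```

With the tabulated values this keeps 2, 3, 4 and 6 layers. The 4-layer configuration, whose GFLOPs (3.4931) and parameter count (4.2932 M) match the CNN entry in Table 12, lies on this front, and it dominates the 5- and 7-layer configurations.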

Table 7. Prediction Accuracy of Different Models for the Total Pressure Loss Coefficient.

Metric	CNN	CBAM-CNN	SS-CNN	CNN-Transformer
RMSE	0.0035	0.0024	0.0022	0.0028
MAPE	3.56%	2.23%	2.03%	2.70%
R²	0.8402	0.9275	0.9360	0.8961

Table 8. Prediction Accuracy of Different Models for the Outlet Flow Angle.

Metric	CNN	CBAM-CNN	SS-CNN	CNN-Transformer
RMSE	0.3483	0.2601	0.2530	0.3154
MAPE	1.49%	1.12%	1.11%	1.41%
R²	0.9768	0.9870	0.9877	0.9809
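The following minimal NumPy sketch computes the three metrics reported in Tables 7 and 8 using their standard definitions. These definitions are assumed to match Section 3.1, which lies outside this excerpt. The sample arrays are made-up values that only exercise the functions; they are not data from the paper.

```python
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute percentage error in percent (y_true must contain no zeros)."""
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up loss coefficients, used only to exercise the functions
y_true = np.array([0.056, 0.060, 0.065, 0.075])
y_pred = np.array([0.057, 0.059, 0.066, 0.073])
print(rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```

Because MAPE divides by the true value, it weights errors on small loss coefficients more heavily than RMSE does. This is one reason the two metrics can rank models differently.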

Table 9. Leave-one-attack-angle-out generalization test results for all models (held-out α = −2°).

Model	Run	RMSE	R²	MAPE
CNN	1	0.0053	0.6307	5.22%
	2	0.0050	0.6614	4.92%
	3	0.0051	0.6574	5.03%
	Mean ± Std	0.0051 ± 0.0002	0.650 ± 0.014	5.06 ± 0.12%
CBAM-CNN	1	0.0043	0.7502	4.04%
	2	0.0047	0.7004	4.51%
	3	0.0044	0.7380	4.15%
	Mean ± Std	0.0045 ± 0.0002	0.729 ± 0.022	4.23 ± 0.20%
CNN-Transformer	1	0.0043	0.7558	4.17%
	2	0.0047	0.7032	4.56%
	3	0.0046	0.7239	4.27%
	Mean ± Std	0.0045 ± 0.0002	0.728 ± 0.023	4.33 ± 0.16%
SS-CNN	1	0.0041	0.7773	3.81%
	2	0.0047	0.7020	4.38%
	3	0.0042	0.7649	3.92%
	Mean ± Std	0.0043 ± 0.0003	0.748 ± 0.033	4.04 ± 0.25%
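The Mean ± Std rows in Table 9 appear to use the population standard deviation, which divides by n = 3 rather than n − 1. For example, the sample standard deviation of the CNN R² runs would be about 0.017 rather than the tabulated 0.014. The check below uses Python's standard library:

```python
import statistics

# Table 9: SS-CNN per-run results on the held-out attack angle (α = −2°)
ss_cnn_runs = {
    "RMSE": [0.0041, 0.0047, 0.0042],
    "R2": [0.7773, 0.7020, 0.7649],
    "MAPE (%)": [3.81, 4.38, 3.92],
}


def summarize(values: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation (divides by n, not n - 1)."""
    return statistics.fmean(values), statistics.pstdev(values)


for metric, values in ss_cnn_runs.items():
    mean, std = summarize(values)
    print(f"{metric}: {mean:.4f} ± {std:.4f}")

# Population vs sample standard deviation of the CNN R2 runs
cnn_r2 = [0.6307, 0.6614, 0.6574]
print(round(statistics.pstdev(cnn_r2), 3), round(statistics.stdev(cnn_r2), 3))  # 0.014 0.017
```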

Table 10. Comparison of scalar-input baselines and image-based models for total pressure loss coefficient prediction.

Model	Input Type	RMSE	R²	MAPE
XGBoost	3 scalars	0.0020	0.9450	1.69%
MLP	3 scalars	0.0018	0.9516	1.63%
CNN	128 × 128 × 3 image	0.0035	0.8402	3.56%
CBAM-CNN	128 × 128 × 3 image	0.0024	0.9275	2.23%
CNN-Transformer	128 × 128 × 3 image	0.0028	0.8961	2.70%
SS-CNN	128 × 128 × 3 image	0.0022	0.9360	2.03%

Table 11. Ablation study on attention mechanisms for total pressure loss coefficient prediction.

Model	Run	RMSE	R²	MAPE
CNN	1	0.0035	0.8402	3.56%
	2	0.0033	0.8614	3.48%
	3	0.0036	0.8373	4.00%
SAM-CNN	1	0.0027	0.9045	2.59%
	2	0.0030	0.8851	3.00%
	3	0.0032	0.8717	3.36%
SE-CNN	1	0.0031	0.8749	3.06%
	2	0.0028	0.8996	2.77%
	3	0.0028	0.8966	2.83%
CBAM-CNN	1	0.0024	0.9275	2.23%
	2	0.0026	0.9137	2.42%
	3	0.0026	0.9155	2.40%
SS-CNN	1	0.0022	0.9360	2.03%
	2	0.0022	0.9388	1.93%
	3	0.0024	0.9267	2.25%
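Table 11 compares SAM (presumably a spatial attention module), SE (squeeze-and-excitation channel attention) and CBAM, which applies channel attention followed by spatial attention. The NumPy sketch below illustrates the simplest of these: the forward pass of a standard squeeze-and-excitation block on an (H, W, C) feature map. The random weights, the reduction ratio r = 4 and the feature-map size are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np


def squeeze_excitation(x: np.ndarray, w1: np.ndarray, b1: np.ndarray,
                       w2: np.ndarray, b2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation channel attention on a feature map x of shape (H, W, C).

    w1 has shape (C, C // r) and w2 has shape (C // r, C) for a reduction ratio r.
    """
    z = x.mean(axis=(0, 1))                   # squeeze: global average pooling -> (C,)
    h = np.maximum(0.0, z @ w1 + b1)          # excitation, first FC layer + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))  # second FC layer + sigmoid -> (C,), values in (0, 1)
    return x * s                              # rescale each channel, broadcast over H and W


rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4                      # illustrative sizes, not the paper's
x = rng.standard_normal((H, W, C))
w1, b1 = 0.1 * rng.standard_normal((C, C // r)), np.zeros(C // r)
w2, b2 = 0.1 * rng.standard_normal((C // r, C)), np.zeros(C)
y = squeeze_excitation(x, w1, b1, w2, b2)
print(y.shape)  # (8, 8, 16)
```

The block only reweights channels and never changes the feature-map shape. This is why attention modules of this kind add few parameters and FLOPs, consistent with the near-identical costs in Table 12.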

Table 12. Comparison of Computational Cost Across Models.

Metric	CNN	CBAM-CNN	SS-CNN	CNN-Transformer
GFLOPs	3.4931	3.5199	3.4935	3.4921
Params (M)	4.2932	4.2966	4.2963	4.2935
Time (s)	0.57	0.56	0.53	0.36
Estimated Total Size (MB)	28.76	31.67	29.77	20.82

Table 13. Comparison of Loss and Flow Angle Before and After Optimization.

Baseline: $β_{y}$ = 26.9596°, R_rle = 0.0084; optimized: $β_{y}$ = 27.517°, R_rle = 0.006087.

Attack angle (°)	Baseline loss	Baseline outlet angle (°)	Optimized loss	Optimized outlet angle (°)
−4	0.0572	16.3505	0.0559	16.6744
−3	0.0572	16.4588	0.0555	16.7639
−2.83	0.0570	16.5113	0.0556	16.8835
−2	0.0573	16.6123	0.0560	16.9879
−1	0.0588	16.7607	0.0569	17.0459
0	0.0606	16.9552	0.0588	17.2831
1	0.0635	17.1954	0.0610	17.5937
2	0.0685	17.5050	0.0667	17.9262
3	0.0756	17.8555	0.0736	18.2483
4	0.0903	18.2818	0.0875	18.6944
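The relative changes behind Table 13 can be tabulated directly. For each attack angle, the script below computes the percentage change in the total pressure loss coefficient and the shift in outlet angle. It uses the four-decimal values printed in the table, so its percentages may differ slightly from figures computed with unrounded simulation data. For example, it gives about −2.46% at the design attack angle of −2.83°.

```python
# Table 13: attack angle (deg) -> (baseline loss, baseline outlet angle,
#                                  optimized loss, optimized outlet angle)
table13 = {
    -4.0: (0.0572, 16.3505, 0.0559, 16.6744),
    -3.0: (0.0572, 16.4588, 0.0555, 16.7639),
    -2.83: (0.0570, 16.5113, 0.0556, 16.8835),
    -2.0: (0.0573, 16.6123, 0.0560, 16.9879),
    -1.0: (0.0588, 16.7607, 0.0569, 17.0459),
    0.0: (0.0606, 16.9552, 0.0588, 17.2831),
    1.0: (0.0635, 17.1954, 0.0610, 17.5937),
    2.0: (0.0685, 17.5050, 0.0667, 17.9262),
    3.0: (0.0756, 17.8555, 0.0736, 18.2483),
    4.0: (0.0903, 18.2818, 0.0875, 18.6944),
}

summary = {}
for alpha, (loss0, angle0, loss1, angle1) in table13.items():
    loss_change = 100.0 * (loss1 - loss0) / loss0  # percent; negative means lower loss
    angle_shift = angle1 - angle0                   # degrees
    summary[alpha] = (loss_change, angle_shift)
    print(f"α = {alpha:+.2f}°: loss {loss_change:+.2f}%, outlet angle {angle_shift:+.4f}°")
```

With the tabulated values, the optimized vane has a lower loss coefficient at every attack angle from −4° to 4°, with reductions of roughly 2.3% to 3.9%. Its outlet angle is about 0.29° to 0.42° larger throughout the range.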

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Zheng, J.; Yao, M.; Zhan, K.; Lu, Q. Aerodynamic Prediction and Optimization of Compressor Stators Based on Deep Learning. Appl. Sci. 2026, 16, 5062. https://doi.org/10.3390/app16105062

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
