1. Introduction
In aircraft design, the ability to rapidly and accurately determine aerodynamic parameters is of paramount importance, as these parameters critically influence performance optimization, safety evaluation, and operational efficiency. Conventional approaches for acquiring such parameters rely primarily on flight experiments, computational fluid dynamics (CFD) simulations [1,2], and wind tunnel testing [3,4]. However, these methods are constrained by prohibitive costs, extended timelines, and technical complexities, underscoring the need for more efficient methodologies.
As computer science continues to advance, deep learning techniques based on large datasets have been widely applied across various domains [5,6]. Following this trend, many scholars have employed data-driven approaches for aerodynamic parameter prediction. Tao [7] proposed a backpropagation neural network integrated with a genetic algorithm for predicting aerodynamic parameters. Zheng [8] proposed an intelligent aerodynamic parameter identification method based on bidirectional long short-term memory (LSTM) networks. Yuan [9] used multi-task learning neural networks (MTLNNs) to directly predict the aerodynamic parameters of missiles. However, these methods are constrained by limited data sources, preventing them from utilizing the multi-source data available in flight environments to enhance prediction accuracy.
Among these developments in deep learning, the evolution of convolutional neural networks (CNNs) is particularly representative [10,11]. Yilmaz and German [12] developed a CNN approach to predict airfoil pressure distributions directly from geometric coordinates. Zhang [13] employed CNNs to learn the lift coefficients of airfoils with various shapes under multiple combinations of Mach numbers, Reynolds numbers, and angles of attack. Zhu [14] incorporated the angle of attack and Mach number as RGB primary color values into aircraft configuration images to predict the lift and drag coefficients of the aircraft. Chai [15] proposed a method based on data transformation and CNNs, establishing a nonlinear mapping relationship between ice shapes and aerodynamic parameters. However, these methods utilize only image information and lack physical constraints when applied in complex environments.
The attention mechanism allows models to selectively focus on important parts of the input data, thereby enhancing their ability to process complex information and improving model performance. Bahdanau [16] first proposed the attention mechanism and applied it to neural machine translation tasks, after which Vaswani et al. [17] introduced the transformer architecture, revolutionizing the field of deep learning. Zuo [18] introduced a deep attention network-based method for the rapid reconstruction of incompressible steady flow fields around airfoils. Wang [19] proposed a self-attention generative network called SAG-FlowNet, addressing the limitations of CNN methods in capturing global flow field features in airfoil design. However, attention mechanisms have rarely been applied to aerodynamic parameter prediction. To the best of our knowledge, no prior research applies cross-attention mechanisms to this task.
Physics-informed neural networks (PINNs) [20] integrate physical constraints into the loss function, offering advantages such as low data dependency and guaranteed physical consistency, making them particularly suitable for aircraft parameter prediction tasks characterized by limited experimental data and stringent physical constraints. Xiao [21] investigated the application principles of hard boundary constraints in PINN-based fluid flow prediction. Ren [22] extended PINNs to compressible steady aerodynamic flow learning and prediction at high Reynolds numbers, enhancing their capability in modeling complex flow scenarios. Lin [23] applied a PINN to missile aerodynamic parameter prediction with a comparative analysis against traditional MTLNNs. Liu [24] proposed an innovative airfoil shape optimization framework combining CNNs, PINNs, and deep reinforcement learning, demonstrating significant improvements in aerodynamic performance. However, the physical laws currently incorporated into PINNs remain relatively simple, and the majority of existing studies have not yet attempted to process image and state information simultaneously.
This paper proposes an innovative ISA-PINN model, which integrates image and state information using cross-attention mechanisms and a convolutional module, while incorporating a PINN as a physical constraint to ensure accurate aerodynamic parameter prediction. First, the aircraft's geometric parameters, flight conditions (angle of attack and Mach number), and aircraft configuration images serve as inputs. These are processed through multi-head cross-attention (MHCA) modules and a convolutional module, with physics-informed terms acting as loss constraints, to generate the corresponding aerodynamic parameters. The main contributions of this paper are as follows:
To the best of our knowledge, we are the first to incorporate cross-attention mechanisms into a PINN for aerodynamic parameter prediction tasks. Based on the current flight condition states and aircraft configurations, the network can simultaneously attend to information across multiple dimensions, enabling the model to accurately capture key patterns in complex fluid dynamics phenomena.
We designed an innovative architecture for processing image and state information. Through cross-attention modules and a convolutional module, our approach effectively fuses image and state information, improving prediction accuracy compared to using state information alone.
We demonstrated the effectiveness of this innovative architecture through experiments and verified the impact of different modules through ablation studies. This architecture achieves improved accuracy compared to previous methods for aerodynamic parameter prediction.
The structure of this work is arranged as follows:
Section 2 reviews related work in the field.
Section 3 details the overall architecture of the ISA-PINN framework, along with its attention mechanism module, image state fusion module, and PINN module.
Section 4 compares the effectiveness of three methods and evaluates the performance of each module, while also designing experiments to validate the optimal number of heads for the MHCA module and the generalization capability of the model.
Section 5 summarizes the completed work.
4. Results and Discussion
This section evaluates our ISA-PINN model through the following experiments: the details of the datasets used in our experiments (Section 4.1); performance assessment against experimental data and convergence metrics (Section 4.2); selection of the optimal number of heads for the MHCA modules (Section 4.3); visualization of module-specific attention patterns and feature transformations (Section 4.4); ablation studies quantifying individual component contributions (Section 4.5); and investigation of model generalization capability (Section 4.6). These results validate the effectiveness of our model for aerodynamic parameter prediction.
4.1. Dataset
To achieve accurate prediction of aircraft aerodynamic characteristics, we employed the industry-standard aerodynamic analysis software DATCOM (1997 FORTRAN 90 version) for dataset generation. DATCOM offers distinct advantages through its empirical and semi-empirical methodologies, providing rapid aerodynamic coefficient estimation across diverse flight regimes. Its proven reliability in preliminary aircraft design makes it an ideal data source, delivering consistent aerodynamic solutions while avoiding the prohibitive computational costs of high-fidelity CFD simulations. Using DATCOM, we established a comprehensive dataset of 116,000 high-fidelity samples spanning diverse aircraft configurations and their corresponding performance metrics. Each sample consists of two complementary components: a 16D numerical feature vector describing the aircraft's geometric and flight parameters (detailed in Table 1), and a corresponding 2D cross-sectional image representation (illustrated in Figure 6).
The dataset incorporates 3125 distinct aircraft configurations, with each configuration evaluated across a comprehensive flight envelope defined by eight angles of attack (α ranging from 0° to 35° in 5° increments) and five Mach numbers (Ma = 0.5, 0.8, 1.2, 2, and 2.5). This systematic sampling strategy yields 40 operating points per configuration, enabling our models to capture both subsonic and supersonic aerodynamic behaviors across various attitude conditions.
Prior to model training, numerical features were normalized using min–max scaling to mitigate the impact of different physical units, while image data underwent standardization procedures to ensure consistent input quality. The dataset was partitioned into training (90%), validation (5%), and test (5%) sets using stratified sampling based on flight-condition and configuration distributions, ensuring a representative assessment of model performance across the entire design and flight envelope space.
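A minimal preprocessing sketch is given below, assuming scikit-learn utilities; the stratification key, random seed, and function names are illustrative rather than the exact pipeline used to build the dataset.

```python
# Illustrative sketch: min-max scale the 16 numerical features and perform a
# stratified 90/5/5 split (stratification key is an assumption).
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def split_and_scale(X, y, strata, seed=42):
    """X: (N, 16) numerical features, y: (N, 6) aerodynamic targets,
    strata: per-sample labels used for stratification (e.g., configuration id)."""
    # Carve out the 10% that will become validation + test.
    X_tr, X_tmp, y_tr, y_tmp, s_tr, s_tmp = train_test_split(
        X, y, strata, test_size=0.10, stratify=strata, random_state=seed)
    # Split the remaining 10% evenly into validation and test (5% each).
    X_val, X_te, y_val, y_te = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=s_tmp, random_state=seed)
    scaler = MinMaxScaler().fit(X_tr)  # fit on training data only
    return (scaler.transform(X_tr), y_tr,
            scaler.transform(X_val), y_val,
            scaler.transform(X_te), y_te)
```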
4.2. Performance Evaluation and Comparative Analysis
We conducted a comprehensive evaluation of the ISA-PINN model using rigorous experimental protocols. The model was trained on a dataset comprising 116,000 samples over 50 epochs using a hybrid optimization approach that alternated between the Adam and L-BFGS optimizers. Network parameters were initialized using Xavier initialization, and hyperbolic tangent (tanh) activation functions were employed throughout the network architecture. To prevent overfitting, a regularization penalty was applied to all trainable parameters.
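The following sketch illustrates one way the hybrid Adam/L-BFGS schedule described above could be implemented in PyTorch; the learning rates, the alternation pattern, and the use of an L2 weight penalty are assumptions, since the exact settings are not reproduced in this excerpt.

```python
# Hedged sketch of the hybrid optimization loop (placeholder hyperparameters).
import torch
import torch.nn as nn

def init_xavier(module):
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

def train(model, loss_fn, loader, epochs=50, reg_coeff=1e-5):
    model.apply(init_xavier)
    adam = torch.optim.Adam(model.parameters(), lr=1e-3)   # placeholder lr
    lbfgs = torch.optim.LBFGS(model.parameters(), lr=0.1)  # placeholder lr
    for epoch in range(epochs):
        opt = lbfgs if epoch % 2 == 1 else adam  # assumed alternation schedule
        for batch in loader:
            def closure():
                opt.zero_grad()
                loss = loss_fn(model, batch)
                # assumed L2 weight penalty on all trainable parameters
                loss = loss + reg_coeff * sum(p.pow(2).sum() for p in model.parameters())
                loss.backward()
                return loss
            if opt is lbfgs:
                opt.step(closure)   # L-BFGS re-evaluates the closure internally
            else:
                closure()
                opt.step()
```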
Performance assessment was conducted using the root mean square error (RMSE), defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}

Additionally, the mean relative error (MRE) was used, defined as follows:

\mathrm{MRE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|y_i - \hat{y}_i\right|}{\left|y_i\right|} \times 100\%

where y_i represents the ground truth value, \hat{y}_i denotes the predicted value for the i-th sample, and N is the number of samples.
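For reference, both metrics can be computed directly from the definitions above, for example:

```python
# NumPy implementation of the two evaluation metrics defined above.
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mre(y_true, y_pred, eps=1e-12):
    # Mean relative error, reported as a percentage; eps guards against division by zero.
    return 100.0 * np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + eps))
```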
Figure 7 illustrates the comparative analysis between predicted and experimental values for six critical aerodynamic parameters across the angle-of-attack range. The model demonstrates excellent predictive capability. The quantitative assessment yields consistently low error metrics across all key aerodynamic parameters, with RMSE and MRE values, respectively, of 0.00376 and 0.90% for the normal force coefficient, 0.00322 and 0.85% for the pitching moment coefficient, 0.00412 and 0.97% for the axial force coefficient, 0.00159 and 0.74% for the center of pressure position, 0.00439 and 1.02% for the lift coefficient, and 0.00312 and 0.89% for the drag coefficient. These MRE values, all below 1.1%, confirm the model's high prediction accuracy.
Figure 8a depicts the convergence characteristics of the model through the evolution of data loss, physics-informed loss, and total loss during the training process.
Figure 8b provides deeper insight by decomposing the physics-informed loss into its constituent components, representing the three fundamental aerodynamic constraints incorporated into the model. The convergence behavior reveals that the model achieves stability after approximately 30 epochs, with the physics-informed loss terms asymptotically approaching small values, indicating successful integration of physical principles into the learning process.
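As an illustration of how the loss decomposition in Figure 8 can be assembled, the sketch below combines a data term with three physics residuals. The specific residuals shown (body-axis to wind-axis force identities and a center-of-pressure relation) are standard aerodynamic relations used here as assumptions; the paper's exact constraint set and weighting may differ.

```python
# Illustrative composition of the total loss: data term plus three physics residuals.
import torch

def physics_residuals(pred, alpha_rad):
    """pred columns (assumed order): C_N, C_m, C_A, X_cp, C_L, C_D."""
    CN, Cm, CA, Xcp, CL, CD = pred.unbind(dim=1)
    r1 = CL - (CN * torch.cos(alpha_rad) - CA * torch.sin(alpha_rad))  # lift identity
    r2 = CD - (CN * torch.sin(alpha_rad) + CA * torch.cos(alpha_rad))  # drag identity
    r3 = Xcp * CN - Cm  # center-of-pressure relation; sign convention is an assumption
    return r1, r2, r3

def total_loss(pred, target, alpha_rad, lam=1.0):
    data_loss = torch.mean((pred - target) ** 2)
    physics_loss = sum(torch.mean(r ** 2) for r in physics_residuals(pred, alpha_rad))
    return data_loss + lam * physics_loss, data_loss, physics_loss
```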
Figure 9 presents a comparative analysis of prediction accuracy for six critical aerodynamic parameters across multiple test conditions. The MTLNN provides baseline prediction capabilities, while the PINN incorporates the governing equations through additional loss terms. As illustrated in the figure, both the MTLNN and PINN models exhibit considerably larger errors at particular angles of attack than the ISA-PINN.
Figure 10 compares the error of the physical constraint across the three models at different angles of attack. The ISA-PINN model maintains the lowest constraint error throughout the entire angle range, with error values consistently below 0.05. In this set of experiments, the PINN performs poorly in the low-angle region (0°), while the MTLNN exhibits large constraint error peaks in the medium-to-high-angle regions (10° and 30°), reaching as high as 1.2. This demonstrates that the enhancement of the PINN through the ISA-PINN approach is effective. It also indicates that relying on physical information alone as a training constraint may fail to balance the data loss and the physics loss at certain points, such as in the low angle-of-attack region.
The bar chart in Figure 11 clearly demonstrates that the proposed ISA-PINN model achieves superior performance across all parameters, with an RMSE of 0.00335, compared to 0.00433 for the PINN (29.25% improvement) and 0.00573 for the MTLNN (71.04% improvement). Similarly, the MRE comparison in Figure 12 confirms this trend, with the ISA-PINN achieving an average MRE of only 0.895%, while the PINN and MTLNN show higher error rates of 1.235% and 1.678%, representing performance gaps of 37.99% and 87.49%, respectively. The ISA-PINN model demonstrates overall improvement compared to the MTLNN, and, through the additional physical constraints, it achieves significant enhancements in the prediction of the physically constrained parameters compared to the baseline PINN. Additionally, analyzing these results in conjunction with Figure 10, the incorporation of aircraft configuration images as input augments the model's capability to predict the non-physically constrained parameters.
To assess model scaling efficiency, we analyzed six training sets with sample sizes ranging from 10,000 to 116,000.
Figure 13 reveals that while accuracy improves with more training data, the benefits diminish notably beyond 50,000 samples. Increasing from 50,000 to 116,000 samples yields only 8–12% RMSE reduction while training time increases linearly (reaching 6 min per epoch at maximum size). This identifies an optimal operational point at approximately 50,000 samples, where computational cost and prediction accuracy are effectively balanced for practical applications.
4.3. The Optimal Number of Attention Heads
We employed the supervised K-means clustering algorithm to perform clustering separately on two sets of relationships: first, between flight conditions and aerodynamic parameters, and second, between aircraft configurations and aerodynamic parameters.
Figure 14 shows the silhouette scores for the clustering of flight conditions and aerodynamic parameters. It can be seen that four clusters yield the best performance with a silhouette score of 0.364, though six clusters also present a viable alternative with a comparable score of 0.353, suggesting multiple effective partitioning schemes exist for flight condition data.
Figure 15 shows the silhouette scores for the clustering of aircraft configurations and aerodynamic parameters, with four clusters providing markedly superior performance (0.205) compared to other cluster numbers. This distinct peak in the silhouette score indicates that aircraft configuration data have an intrinsic structure that is optimally captured with four clusters, while silhouette scores significantly decrease when the number of clusters exceeds six.
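A sketch of this cluster-count selection is shown below, using standard (unsupervised) K-means with silhouette scoring as a stand-in for the supervised variant employed in the paper; function and parameter names are illustrative.

```python
# Silhouette-based selection of the cluster count (illustrative sketch).
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_cluster_count(features, k_range=range(2, 9), seed=0):
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    # Return the cluster count with the highest silhouette score, plus all scores.
    return max(scores, key=scores.get), scores
```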
Further analysis of these figures reveals distinct patterns in aerodynamic behavior. In Figure 16, the radar chart demonstrates how the six aerodynamic parameters vary across different flight conditions, with Cluster 3 exhibiting notably higher values for two of the parameters. Similarly, Figure 17 illustrates the influence of aircraft configurations on aerodynamic characteristics, where Cluster 4 demonstrates a markedly broader distribution across all parameters. This clustering analysis effectively captures the fundamental relationships between flight conditions, aircraft configurations, and their corresponding aerodynamic responses, providing valuable insights for subsequent modeling efforts.
For the selection of the number of attention heads, we used the MRE as the evaluation metric. We tested the first and second MHCA modules separately; when testing one module, the number of heads for the other was set to the optimal number determined by the supervised K-means clustering algorithm. Figure 18 shows the selection process for the first MHCA module, where four heads proved to be the optimal choice: compared to using eight heads, the MRE improved from 0.89 to 0.87, indicating that dividing the flight states into four clusters is optimal. Figure 19 shows the selection process for the second MHCA module, which aligns with the clustering algorithm's result: both four and six heads emerged as optimal solutions. Owing to the complexity of aircraft configurations, increasing the number of heads beyond four does not significantly degrade performance. Unless otherwise stated, the number of attention heads is set to four for both MHCA modules in all other experiments.
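For concreteness, a minimal four-head cross-attention block of the kind used in the MHCA modules could look as follows; the embedding dimension and the assignment of queries, keys, and values are assumptions rather than the paper's exact configuration. The returned per-head weights can also be used for the attention analysis in Section 4.4.

```python
# Minimal multi-head cross-attention block (illustrative sketch).
import torch.nn as nn

class MHCABlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, state_tokens, context_tokens):
        # state_tokens:   (B, S, dim) e.g., embedded flight-condition / geometry features
        # context_tokens: (B, T, dim) e.g., embedded image features
        fused, weights = self.attn(state_tokens, context_tokens, context_tokens,
                                   need_weights=True, average_attn_weights=False)
        # Residual connection + layer norm; weights have shape (B, heads, S, T).
        return self.norm(state_tokens + fused), weights
```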
4.4. Module Effectiveness Analysis
The effectiveness of the first MHCA module is evidenced by the specialized attention patterns shown in Figure 20. The visualization reveals that each attention head autonomously learns to focus on distinct aerodynamic regimes defined by combinations of the angle of attack (α) and Mach number (Ma). Heads 1 and 2 demonstrate strong attention activation for high-angle, transonic conditions (α = 35°, Ma = 1.2), while head 3 specializes in supersonic flow regimes (Ma = 2.5). Head 4 exhibits more distributed attention across multiple subsonic and transonic conditions.
Figure 21 presents the attention weight distribution across flow regions for a moderate-angle, transonic flight condition. The radar visualization reveals distinct specialization patterns: head 3 demonstrates the strongest focus on space 1, head 1 prioritizes space 2, head 4 shows light activation in space 3, while head 2 exhibits a more balanced distribution with a slight preference for space 4. This complementary attention allocation illustrates how the multi-head mechanism effectively partitions the feature space to capture different aspects of the flow field. The average pattern (shown in red) indicates a relatively balanced distribution across all spaces.
Figure 22 demonstrates the effectiveness of the multi-head attention mechanism across different flight regimes. The radar chart reveals distinct specialization patterns for each attention head when processing aerodynamic information. Head 1 focuses primarily on space 1 and space 2, while Head 2 prioritizes space 3 with moderate attention to space 1. Head 3 exhibits strong attention specifically to space 1, and Head 4 shows a more balanced attention pattern with emphasis on spaces 2 and 4. The average attention (shown in red) indicates an overall preference toward space 1 and space 3. This specialized attention distribution highlights the module’s capability to extract and process geometry-specific features, enabling comprehensive interpretation of how different aircraft structural elements influence aerodynamic behavior.
Figure 23 illustrates feature activation evolution through our image-information fusion module. Starting from the aircraft silhouette (stage 0), the module progressively refines representations across three stages. Stage 1 highlights the primary fuselage structure, while stage 3 reveals sophisticated activation patterns around aerodynamically critical surfaces. This progression demonstrates the module’s ability to systematically extract and enhance boundary-sensitive features essential for accurate aerodynamic prediction.
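A hypothetical three-stage convolutional image branch of the kind analyzed in Figure 23 is sketched below; the channel counts, kernel sizes, and pooling layout are illustrative only, not the paper's exact settings.

```python
# Illustrative three-stage convolutional image branch with per-stage feature maps
# exposed for visualization.
import torch.nn as nn

class ImageBranch(nn.Module):
    def __init__(self, out_dim=64):
        super().__init__()
        def stage(c_in, c_out):
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                                 nn.Tanh(), nn.MaxPool2d(2))
        self.stage1 = stage(1, 16)    # coarse fuselage structure
        self.stage2 = stage(16, 32)   # intermediate features
        self.stage3 = stage(32, 64)   # boundary-sensitive activations
        self.proj = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, out_dim))

    def forward(self, img):           # img: (B, 1, H, W) cross-sectional silhouette
        f1 = self.stage1(img)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        return self.proj(f3), (f1, f2, f3)  # fused features + per-stage maps
```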
To rigorously evaluate the contribution of the PINN module, we compared the performance of our complete model against a model without the physical constraint mechanisms provided by the PINN module.
Figure 24 presents experimental values alongside predictions from both the complete ISA-PINN model and the physics-uninformed model across different angles of attack. The comparison reveals that the physics-uninformed model exhibits substantially higher error metrics for the critical stability parameters, whereas the differences are less pronounced for the integrated force coefficients. This performance disparity highlights the crucial role of the physics constraints in accurately modeling complex force and moment characteristics.
4.5. Ablation Studies
To validate the effectiveness of our proposed architecture, we conducted comprehensive ablation studies across five model configurations: a full model, a model without the first MHCA module (no first MHCA model), a model without the second MHCA module (no second MHCA model), a model without both MHCA modules (neither MHCA model), and a model without the image-information fusion module (no image model).
Figure 25 presents the training loss comparison across these configurations. The full model consistently demonstrates superior convergence behavior throughout the training process. Models with partial MHCA modules show intermediate performance, while models without both MHCA modules exhibit the highest loss values and notable instability during training, particularly in the physics constraint loss component. This indicates that the multiple patterns of data extracted by the MHCA module can effectively guide the direction of data training. This performance degradation confirms that our MHCA module significantly enhances training stability and convergence efficiency. Interestingly, the no image configuration shows competitive final performance despite slower initial convergence, suggesting that while visual features provide valuable initialization guidance, the model can still extract critical aerodynamic patterns from alternative input modalities.
Figure 26 quantifies each component's contribution through a final-loss comparison across the model variants. The full model achieves the lowest total and physics losses (0.26 and 0.02), while the model without both MHCA modules shows markedly degraded performance (0.41 and 0.09), particularly in satisfying the physical constraints. Removing an individual attention mechanism (no first MHCA model or no second MHCA model) results in a moderate performance decline. The no image variant maintains reasonable performance (0.33 total loss) despite lacking visual input, confirming that while visual features enhance the results, the MHCA modules play the critical role in ensuring physical consistency and prediction accuracy in aerodynamic flow modeling.
Table 2 shows the performance comparison of the different model architectures. From the RMSE and MRE metrics, it can be seen that the full model performs best. The first MHCA module plays a significant role in reducing the MRE, while removing the second MHCA module has an effect similar to removing both MHCA modules. This may be because the second MHCA module processes information in the middle-to-late stages of the model, where the data have already been highly feature-extracted, so the cross-attention mechanism based on the aircraft shape provides limited additional improvement. Meanwhile, the ablation studies reveal that removing individual attention components incrementally degrades accuracy, while simultaneously removing both MHCA modules causes significant training instability and convergence issues, which also indirectly demonstrates the role of the MHCA modules' data processing in stabilizing training.
Although the model without the image-information fusion module offers a 60.5% reduction in training time (from 3.8 h to 1.5 h), this comes at a substantial cost to model accuracy and stability, confirming the critical role of our attention-based fusion approach. It is worth noting that traditional CFD methods typically require tens to hundreds of hours of computation [31]. Although our ISA-PINN model increases the computation time compared to the original PINN, it still reduces the computational time by two orders of magnitude, greatly improving the efficiency of aerodynamic parameter prediction for aircraft.
4.6. Model Generalization Capability
Model generalization capability refers to the ability of a machine learning model to maintain good predictive performance when facing unseen data. A model with good generalization capability not only performs well on training data but also maintains stable prediction accuracy on test sets or in practical application scenarios. This study evaluates the generalization capability of the proposed method by comparing the performance differences between training and test sets.
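Under the assumption that the generalization error is reported as the relative increase in per-parameter MRE from the training set to the test set, it can be computed as in the sketch below (function and variable names are illustrative).

```python
# Illustrative generalization check: relative gap between training and test MRE.
import numpy as np

def generalization_gap(mre_train, mre_test, names):
    """mre_train / mre_test: arrays of per-parameter MRE values (%)."""
    gap = (mre_test - mre_train) / np.maximum(mre_train, 1e-12) * 100.0
    return dict(zip(names, gap))
```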
Figure 27 shows the comparative predictions on the training and test sets for the six aerodynamic parameters, while Figure 28 shows the RMSE and MRE of the aerodynamic parameters under generalization testing. As can be observed, due to the incorporation of physical information constraints and additional image information as auxiliary inputs, the ISA-PINN model demonstrates relatively strong generalization capability, with an average overall generalization error of 6.12%. The generalization errors are smaller for three of the parameters and larger for the other three. A possible reason for this difference is that the physical constraints may be more effective in capturing the underlying principles governing the former group, making those parameters more predictable under unseen conditions. The more complex physical relationships governing the latter group might require more training data or different model architectures to achieve the same level of generalization, or those parameters might be more sensitive to subtle variations in flow conditions that were not fully represented in the training dataset. Overall, the ISA-PINN demonstrates good generalization capability.
5. Conclusions
In this work, an ISA-PINN model was proposed for aerodynamic parameter prediction of aircraft, integrating two MHCA modules, an image-information fusion module, and PINN methodology. The model demonstrates accurate prediction capabilities across various aircraft configurations, offering a novel approach for rapid aerodynamic computation. The proposed MHCA module dynamically extracts latent features from pattern data while adjusting focus on relevant target information dimensions. The image-information fusion module integrates multi-scale geometric information from aircraft images to enhance prediction accuracy. Additionally, physics-informed components ensure predictions comply with fundamental aerodynamic principles, maintaining physical consistency despite limited training data. Experimentally, the ISA-PINN demonstrated improvements of 29.25% in RMSE and 37.99% in MRE compared to the baseline PINN. Meanwhile, a mere 6.12% increase in error on the test set validates the model’s strong generalization capability.
Future research directions include incorporating high-fidelity CFD data to improve prediction reliability and utilizing generative networks to synthesize additional high-confidence data from limited CFD samples. Additionally, the investigation of optimal attention module placement within the model architecture and the use of more complex physical constraints present an opportunity for performance enhancement. These avenues of exploration hold significant promise for advancing the capabilities and applications of aerodynamic prediction frameworks.