1. Introduction
Wind energy has become a crucial component of the global transition to renewable energy, driven by the demand for sustainable, low-carbon power generation [1]. However, wind farm performance is significantly affected by wake effects, where downstream turbines experience reduced wind speeds and increased turbulence due to the influence of upstream turbines [2]. To mitigate these effects, researchers have proposed deliberately adjusting the yaw of upstream turbines to redirect their wakes away from downstream turbines [3,4,5,6,7,8,9]. Accurate prediction of turbine wakes is therefore essential for minimizing power losses and maximizing wind farm efficiency.
High-fidelity wind turbine models typically solve the momentum equations with turbine loads represented as body forces [10,11,12,13,14,15,16,17,18,19,20]. One of the earliest and most widely used methods is the actuator disk model, which treats the rotor as a porous disk [10,11]. This approach was extended by modeling individual blades as lines [12], and later refined into the actuator surface model, which projects blade geometry onto a 2D surface for improved force distribution [18]. Further enhancements account for tower and nacelle effects using either the immersed boundary or actuator surface methods [18,21,22].
Although high-fidelity models accurately capture flow in wind turbines and farms, their computational cost often limits their use in layout optimization and control. To address this, simplified analytical wake models have been developed. Early efforts to model wakes for turbine placement optimization were carried out by Jensen [23]. This model assumes a uniform velocity profile behind the rotor that results in a top hat shape and a linearly expanding wake to account for energy entrainment. However, experimental measurements have shown that the top hat velocity profile is not realistic, and a Gaussian distribution provides a more accurate representation [24]. To overcome this limitation, Bastankhah and Porté-Agel [25] introduced an analytical model assuming a Gaussian velocity deficit in the wake. This model showed reasonable agreement with the experimental measurements of Chamorro and Porté-Agel [26] and high-fidelity simulations of Wu and Porté-Agel [16,27].
To capture the effects of yaw misalignment and wake steering, several extensions of the Gaussian wake model have been proposed. Bastankhah and Porté-Agel [28] modified the Gaussian wake model to account for the wake displacement due to yaw misalignment, which was shown to predict the wake redirection with fair accuracy for a single turbine. However, high-fidelity simulations have shown that yaw misalignment generates a counter-rotating vortex, leading to asymmetric wake deflection and secondary steering of downstream turbine wakes [6]. To account for the secondary steering, the curled wake model of Martínez-Tossas et al. [29,30] was introduced into the Gaussian wake model to compute the cross-wind velocity and obtain an effective yaw angle for the computation of the wake displacement [31].
While analytical wake models provide valuable insights, they often struggle to capture the complex flow dynamics of utility-scale wind farms. To address the limitations of conventional wake modeling, data-driven approaches can leverage high-fidelity numerical simulations and LiDAR measurements to improve wake characterization and enhance predictive accuracy. Renganathan et al. [32] developed a machine learning (ML) model trained on LiDAR measurements collected from a wind farm in Texas. Their framework employed an autoencoder convolutional neural network (ACNN) to compress high-dimensional wake flow fields into a compact latent space. The encoder was subsequently replaced by a multi-layer perceptron and a Gaussian process regressor to map input parameters directly to the latent representation and reconstruct the original measurements. This approach, referred to as the neural compression paradigm, requires abundant high-fidelity supervision—5000 labeled instances were used in their experiments.
To reduce the need for extensive high-fidelity data, Zhang et al. [33] and Santoni et al. [34] proposed methods that exploit multi-fidelity training and leverage the super-resolution capabilities of ACNNs. These models are trained to reconstruct time-averaged high-fidelity velocity fields from their low-fidelity counterparts. Specifically, Zhang et al. used a limited number of instantaneous high-fidelity snapshots as degraded input data, while Santoni et al. employed velocity fields generated by the Gauss–curl hybrid (GCH) model developed by King et al. [31] as low-fidelity inputs. Compared to the flow-to-latent mapping used in neural compression, the flow-to-flow mapping of the super-resolution paradigm is simpler and requires fewer high-fidelity examples to train.
However, super-resolution methods face two major limitations: (i) inference relies on the availability of low-fidelity simulations, which creates a bottleneck for real-time deployment; and (ii) training requires paired datasets with one-to-one correspondence between fidelity levels, which restricts data flexibility and limits applicability.
Recent works [35,36,37,38,39] have explored applying transfer learning to high-dimensional multi-fidelity modeling in other scientific domains with promising results. These approaches first train a neural network surrogate model to map input parameters to low-fidelity simulations using abundant training data, and then fine-tune it to map input parameters to high-fidelity simulations with scarce high-fidelity supervision. This study seeks to develop a novel framework for wind farm surrogate modeling based on transfer learning, aiming to replace the super-resolution paradigm and overcome the aforementioned limitations. To facilitate parameter-to-wind-flow prediction, an efficient encoding scheme is proposed to enable ACNN processing of physical parameters. While this encoding enables parameter-to-flow predictions, empirical studies (Section 3.4) show that vanilla transfer learning performs poorly in the context of wind farm mean flow modeling. It is hypothesized that this suboptimal performance arises from the heterogeneous spatial similarities between high- and low-fidelity simulations. As a solution, a more adaptive transfer learning approach is proposed—one that gradually transfers learned representations by initially focusing on regions with higher similarity and progressively adapting to regions with lower similarity. Empirical results confirm that the proposed method outperforms vanilla transfer learning and achieves an accuracy level sufficient to support downstream wind farm design and control tasks.
The developed ML model directly predicts the high-fidelity time-averaged velocity field from wind farm information, eliminating the need for simulations during inference and achieving real-time prediction. Furthermore, the model can be trained on any combination of data and can be extended to more than two levels of fidelity, without requiring correspondence between high-fidelity data and their degraded versions. This flexibility expands the range of potential training datasets.
The end-to-end network is fully differentiable, allowing for seamless integration into gradient-based control or optimization algorithms. This approach not only overcomes the computational bottlenecks of previous methods but also provides a scalable and versatile solution for time-averaged velocity prediction in complex scenarios. A comparison of training data requirements and inference characteristics between the proposed method and previous methods is provided in Table 1 and Table 2.
The main contributions of this paper are summarized as follows:
An adaptive transfer learning framework is proposed for high-dimensional multi-fidelity modeling, effectively integrating data from varying fidelity levels corresponding to distinct physical models.
A GPU-efficient scheme is designed to encode wind farm physical parameters into representations compatible with the ACNN, facilitating rapid data processing and real-time inference, thereby significantly reducing computational overhead.
By combining the adaptive transfer learning framework with the efficient encoding scheme, a surrogate model is developed for wind farm mean flow prediction. This model seamlessly integrates high-fidelity large eddy simulation (LES) data with low-fidelity engineering wake models, demonstrating effectiveness, generalizability, and extensibility.
The paper is organized as follows. Section 2 describes the proposed methodology, including efficient parameter encoding, the neural network architecture, and the adaptive transfer learning framework. Section 3 elaborates on the experimental setup, including data simulations, dataset creation, training specifics, and the obtained results. Finally, Section 4 summarizes the findings and discusses potential future research directions inspired by this work.
2. Methodology
The proposed framework incorporates a novel neural network architecture and an improved training procedure. The network architecture is based on a U-Net backbone [40], augmented with an efficient mechanism for encoding wind farm information at the input stage and two linear layers at the output for multi-task prediction. The training procedure enhances the standard transfer learning paradigm [41] and is specifically tailored to address the challenges of high-dimensional transfer scenarios.
2.1. Efficient Encoding of Wind Farm
Previous studies have demonstrated the effectiveness of ACNNs for processing fluid velocity fields. However, these architectures are primarily suited for high-dimensional mappings, such as flow-field-to-flow-field transformations, and cannot directly process physical wind farm parameters, which are represented as a series of data in the form of $\{(x_i, y_i, z_i, \gamma_i)\}_{i=1}^{N}$, where $(x_i, y_i, z_i)$ and $\gamma_i$ denote the rotor center location and yaw angle of the $i$-th turbine in a wind farm with a total of $N$ turbines. To address this limitation, a GPU-efficient encoding scheme that maps wind farm parameters to a high-dimensional space is proposed, enabling ACNN processing. This approach separates inputs into two groups—one representing wind conditions and the other describing the wind farm configuration, which are processed independently.
To approximate the free-stream flow field, the wind condition parameters are used in conjunction with the law of the wall. Specifically, the freestream velocity at height $y$ is $u_\infty(y) = u_h \left( y / y_h \right)^{\alpha}$, where $y_h$, $u_h$, and $\alpha$ are the hub height, hub-height velocity, and wind shear, respectively. Although this provides only a coarse approximation of the wind farm flow without turbines, it is computationally efficient on GPUs and is a robust representation for ACNN-based learning.
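As a minimal illustration of this step, the PyTorch sketch below broadcasts the sheared inflow profile over a regular grid; the grid dimensions, tensor layout, and the function and parameter names (encode_freestream, u_hub, y_hub, alpha) are illustrative assumptions rather than the exact implementation used in this work.

```python
import torch

def encode_freestream(u_hub, y_hub, alpha, y_coords, shape_xz):
    """Broadcast the sheared inflow profile u(y) = u_hub * (y / y_hub)**alpha
    over an (Nx, Ny, Nz) grid using only differentiable tensor operations."""
    nx, nz = shape_xz
    profile = u_hub * (y_coords / y_hub) ** alpha      # (Ny,)
    return profile.view(1, -1, 1).expand(nx, -1, nz)   # (Nx, Ny, Nz)

# Example: 128 x 64 x 96 grid, 90 m hub height, 8 m/s hub-height wind, shear 0.12
y = torch.linspace(5.0, 300.0, 64)
inflow = encode_freestream(torch.tensor(8.0), 90.0, 0.12, y, (128, 96))
print(inflow.shape)  # torch.Size([128, 64, 96])
```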
The wind turbine placement is encoded by representing each turbine as a 3D Gaussian-shaped distribution in the wind farm. Each Gaussian has its mean at the turbine center and variance determined by the yaw angle.
The encoded value at position $(x, y, z)$ of the entire simulation domain for a set of $N$ turbines located at $(x_i, y_i, z_i)$ with yaw angles $\gamma_i$ is given by:
$$
f(x, y, z) = \sum_{i=1}^{N} \frac{1}{(2\pi)^{3/2} A B C}
\exp\!\left( -\frac{\left[ (x - x_i)\cos\gamma_i + (z - z_i)\sin\gamma_i \right]^2}{2A^2}
- \frac{(y - y_i)^2}{2B^2}
- \frac{\left[ -(x - x_i)\sin\gamma_i + (z - z_i)\cos\gamma_i \right]^2}{2C^2} \right),
\quad (1)
$$
where $x$, $y$, and $z$ are the positions along the free-stream flow direction, height, and spanwise direction, respectively; $A$, $B$, and $C$ are selected based on the characteristic scale of each dimension; and $\gamma_i$ is the rotor yaw angle of turbine $i$. As the mean represents the center of the turbine, the covariance of the encoding effectively rotates the Gaussian according to the yaw direction of the turbine, allowing the representation to be both direction-aware and spatially continuous. It is worth noting that this representation is not a probability density function, as the total sum equals the number of turbines and thus depends on the configuration, rather than being normalized over space.
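To make the encoding concrete, the sketch below evaluates a sum of yaw-rotated Gaussians of this form on a 3D grid using standard PyTorch broadcasting; the grid construction, the values of the characteristic scales A, B, and C, the explicit loop over turbines (which could be vectorized), and the helper name encode_turbines are assumptions made for illustration.

```python
import math
import torch

def encode_turbines(grid_x, grid_y, grid_z, centers, yaws, A, B, C):
    """Sum of yaw-rotated 3D Gaussians, one per turbine, evaluated on a grid.

    grid_x, grid_y, grid_z: 1D coordinate tensors of sizes (Nx,), (Ny,), (Nz,)
    centers: (N, 3) tensor of rotor centers (x_i, y_i, z_i)
    yaws:    (N,)   tensor of yaw angles in radians
    """
    X, Y, Z = torch.meshgrid(grid_x, grid_y, grid_z, indexing="ij")  # (Nx, Ny, Nz)
    field = torch.zeros_like(X)
    norm = 1.0 / ((2.0 * math.pi) ** 1.5 * A * B * C)
    for (cx, cy, cz), g in zip(centers, yaws):
        dx, dy, dz = X - cx, Y - cy, Z - cz
        # Rotate the horizontal (x, z) offsets by the yaw angle of this turbine.
        xr = dx * torch.cos(g) + dz * torch.sin(g)
        zr = -dx * torch.sin(g) + dz * torch.cos(g)
        field = field + norm * torch.exp(
            -(xr**2) / (2 * A**2) - (dy**2) / (2 * B**2) - (zr**2) / (2 * C**2)
        )
    return field

# Example: three aligned turbines, the middle one yawed by 20 degrees
centers = torch.tensor([[500.0, 90.0, 400.0], [1130.0, 90.0, 400.0], [1760.0, 90.0, 400.0]])
yaws = torch.deg2rad(torch.tensor([0.0, 20.0, 0.0]))
enc = encode_turbines(torch.linspace(0, 2500, 128), torch.linspace(5, 300, 64),
                      torch.linspace(0, 800, 96), centers, yaws, A=60.0, B=60.0, C=60.0)
print(enc.shape)  # torch.Size([128, 64, 96])
```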
Figure 1c shows the encoded representation of a three-turbine system at hub height, and the zoom-in view highlights a yawed turbine. Working with object locations and poses as Gaussian distributions is a common technique in computer vision and facilitates neural network learning. Additionally, this representation aligns with the velocity deficit observed in wind turbine wakes, as demonstrated by experimental and field measurements [24,26,42]. This alignment helps simplify the learning process.
Both encoding processes are computationally efficient and parallelizable on GPUs. They involve only basic arithmetic operations, which are highly optimized in modern deep learning frameworks like PyTorch (v2.0.1) [43]. Specifically, they are implemented using standard PyTorch tensor operations—including element-wise addition, subtraction, exponentiation, and power transformation—which are GPU accelerated and fully integrated into the computational graph. The only data transfer between CPU and GPU consists of copying the physical input parameters, which is minimal in size. Once these parameters are on the GPU, all subsequent encoding computations and neural network operations are executed entirely within GPU memory without requiring additional CPU–GPU communication.
The computational graph in this context refers to the directed graph of tensor operations that defines the complete data flow—from the raw input parameters through the proposed encoding mechanism to the neural network’s outputs and ultimately to the loss function. This graph is automatically constructed by PyTorch during the forward pass, with each tensor operation registered as a node. Because the entire process—including the encoding—is built from differentiable operations, the backward pass can seamlessly compute gradients across the whole graph using automatic differentiation. While automatic differentiation facilitates neural network training, it also provides direct gradient calculation with respect to wind farm layouts during inference and can be leveraged for gradient-based wind farm design and control.
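The following sketch illustrates this differentiability by reusing the two helper functions sketched above: gradients of a scalar objective propagate through a stand-in network and through the encoding back to the yaw angles. The single-convolution stand-in model and the mean-value objective are placeholders; in the actual framework, the U-Net surrogate and a control or design objective would sit in their place.

```python
import torch

# Gradients with respect to yaw flow through both the network and the encoding.
# Grid, layout, and the stand-in model are illustrative placeholders only.
gx, gy, gz = torch.linspace(0, 2500, 64), torch.linspace(5, 300, 32), torch.linspace(0, 800, 48)
centers = torch.tensor([[500.0, 90.0, 400.0], [1130.0, 90.0, 400.0], [1760.0, 90.0, 400.0]])
yaws = torch.zeros(3, requires_grad=True)                      # control variables (radians)

enc = encode_turbines(gx, gy, gz, centers, yaws, A=60.0, B=60.0, C=60.0)
inflow = encode_freestream(torch.tensor(8.0), 90.0, 0.12, gy, (64, 48))
inputs = torch.stack([inflow, enc], dim=0).unsqueeze(0)        # (1, 2, Nx, Ny, Nz)

stand_in_model = torch.nn.Conv3d(2, 1, kernel_size=3, padding=1)
objective = stand_in_model(inputs).mean()                      # placeholder scalar objective
objective.backward()                                           # backprop through model and encoding
print(yaws.grad)                                               # d(objective)/d(yaw), per turbine
```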
2.2. Model Architecture
Figure 1 illustrates the general framework of the proposed machine learning model. After encoding the wind farm parameters into an ACNN-processable representation, they are concatenated and passed through a U-Net backbone [40]. Figure 1a shows the transformation of the wind farm representation within the U-Net architecture.
U-Net follows an encoder-decoder design with skip connections that directly transfer feature maps from corresponding layers of the encoder to the decoder. This structure helps preserve spatial information essential for capturing small-scale fluid features. The encoder extracts high-level features from input data through a series of convolutional layers, each followed by ReLU activations and max-pooling operations. These layers progressively reduce spatial dimensions while increasing feature depth, enabling the model to learn abstract representations of fluid flow patterns. The bottleneck connects the encoder and decoder, offering the deepest layer where the model consolidates global and local features. This stage captures complex interactions. The decoder gradually reconstructs the spatial resolution using upsampling layers. Skip connections from the encoder reintroduce high-resolution features, allowing the model to reconstruct detailed flow features that might otherwise be lost due to downsampling.
Finally, the U-Net output is processed by two linear prediction heads (equivalent to two $1 \times 1$ convolutional layers). The outputs from these two prediction heads are trained under the supervision of different fidelity datasets, as specified by the multitask loss in Equation (2).
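A compact sketch of this architecture is given below, following the description above (convolutional blocks with ReLU activations, max-pooling, skip connections, learnable upsampling, and two 1 x 1 prediction heads). The number of levels, channel widths, use of 3D convolutions, two-channel input (inflow and turbine encodings), and the class name MultiFidelityUNet are assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3x3 convolutions with ReLU, as in a standard U-Net level.
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class MultiFidelityUNet(nn.Module):
    """U-Net backbone f with two linear (1x1x1 convolution) heads g_L and g_H."""

    def __init__(self, in_ch=2, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, 2 * base)
        self.bottleneck = conv_block(2 * base, 4 * base)
        self.pool = nn.MaxPool3d(2)
        self.up2 = nn.ConvTranspose3d(4 * base, 2 * base, 2, stride=2)
        self.dec2 = conv_block(4 * base, 2 * base)
        self.up1 = nn.ConvTranspose3d(2 * base, base, 2, stride=2)
        self.dec1 = conv_block(2 * base, base)
        self.head_low = nn.Conv3d(base, 1, kernel_size=1)   # g_L: low-fidelity output
        self.head_high = nn.Conv3d(base, 1, kernel_size=1)  # g_H: high-fidelity output

    def forward(self, x):
        e1 = self.enc1(x)                                    # full resolution
        e2 = self.enc2(self.pool(e1))                        # 1/2 resolution
        b = self.bottleneck(self.pool(e2))                   # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head_low(d1), self.head_high(d1)

# Example forward pass on a (1, 2, 64, 32, 48) encoded wind farm representation
model = MultiFidelityUNet()
y_low, y_high = model(torch.randn(1, 2, 64, 32, 48))
print(y_low.shape, y_high.shape)  # both (1, 1, 64, 32, 48)
```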
2.3. Adaptive Transfer Learning for High-Dimensional Multi-Fidelity Modeling
In this section, a preliminary discussion on few-shot learning is given to contextualize the ML problem setting. The algorithm subsection then provides a detailed description of the method.
2.3.1. Preliminary
Multi-fidelity surrogate modeling aims to approximate complex systems governed by partial differential equations by leveraging a combination of a few high-fidelity data points and abundant low-fidelity data points. Traditional approaches such as multi-fidelity Kriging (MFK) [44,45] integrate simulations of different fidelity levels using an auto-regressive Gaussian process framework. While MFK has been widely used for low-dimensional problems, it does not scale well for high-dimensional settings due to the curse of dimensionality.
Few-shot transfer learning aims to adapt a model pre-trained on large-scale datasets to a downstream task with a limited amount of data. Such adaptation is often realized by fine-tuning. Intuitively, the fine-tuned model will perform better if the pre-trained features are more related to the downstream task. Zhou et al. [46] provide an upper bound on the test error of fine-tuning an empirical risk minimizer (ERM), which depends on the distance between the pre-trained and fine-tuned model weights. Hu et al. [47,48] explicitly define a model-agnostic method to calculate a “task distance” as a measurement of task similarity in classification problems. Zamir et al. [49] and later works [50,51,52] consider heterogeneous similarities among the pre-training tasks: they divide pre-training tasks into different subsets based on task similarity and choose the best subset of pre-training sub-tasks for each downstream task. These methods account for heterogeneous similarity in the pre-training tasks and focus on building a better pre-trained model by choosing more related pre-training sub-tasks and discarding less related ones; the downstream task is treated as a whole and compared to different subsets of pre-training tasks. In contrast, the method proposed in this work considers the similarity of the physics underlying the high- and low-fidelity simulations at different regions of the simulation domain for the same task and proposes a better fine-tuning strategy.
Multi-fidelity surrogate modeling through transfer learning pre-trains a deep neural network on low-fidelity data and fine-tunes the network on high-fidelity data. ACNNs have shown great ability as data-driven surrogate models, but the computational cost of collecting enough training data for neural-network training motivates adapting these data-driven models to multi-fidelity surrogate modeling. Transfer learning is a natural approach and has been studied for different problems [35,36,37,38,39]. Unlike passive learning, multi-fidelity active learning [53,54,55] balances information gain against computational cost and actively decides the fidelity level of the next data point to acquire. However, all of these works treat “multi-fidelity” as “multi-resolution” for experiments with high-dimensional outputs and collect data from the same algorithms run on grids of different resolutions.
In the context of wind farm mean flow modeling, low-fidelity simulations are derived from simplified models compared to high-fidelity simulations, neglecting or approximating difficult-to-compute physics. As a result, they offer significant speed improvements beyond merely using coarser grids.
A key challenge arises because the wind farm flow field comprises distinct regions—such as free stream interacting with turbines, newly formed wakes, far wakes, and wakes interacting with downstream turbines—and each of these regions is affected differently by the physics neglected in low-fidelity models. While previous studies have successfully applied vanilla transfer learning in multi-resolution settings, empirical results (Section 3.4) confirm that standard transfer learning performs poorly under these conditions.
The hypothesis is that the relationship between the pre-trained features and the high-fidelity data varies across these different regions of the flow field. In regions where critical physical processes are missing or heavily approximated in the low-fidelity simulations, the pre-trained features are poorly aligned with the true high-fidelity behavior, and fine-tuning must effectively relearn the correct features. However, in regions where the preserved physics in low-fidelity simulations still closely match the high-fidelity data, the pre-trained features remain useful for accurate prediction. Unfortunately, indiscriminate fine-tuning across all regions risks corrupting these useful representations, ultimately degrading performance in areas where transfer learning should offer benefits.
2.3.2. Algorithm
To address the issues mentioned above, an adaptive transfer learning method based on a multi-task network is proposed. The key idea is to gradually adjust model flexibility by increasing the weight on high-fidelity outputs and reducing the weight on low-fidelity outputs during training. This progressive fine-tuning allows the model to first fit highly related regions before adapting to less related ones.
Additionally, patch-level pseudo-high-fidelity data [56] are selected to stabilize training. These patch-level pseudo-data are generated during the gradual relaxation of constraints, ensuring that when the model gains flexibility, the uncertainty in well-fitted regions does not increase. This method effectively mitigates the corruption of pre-trained features, leading to more accurate and robust predictions across diverse multi-fidelity scenarios.
The model constraint and pseudo-data selection are implemented as described in Algorithm 1. Let $\mathcal{D}_L$ be the low-fidelity dataset, $\mathcal{D}_H$ be the high-fidelity dataset, $f$ be some neural network backbone, and $g_L$ and $g_H$ be two linear layers, while $g_L(f(x))$ are the low-fidelity predictions and $g_H(f(x))$ are the high-fidelity predictions. The network is trained to minimize a multi-task loss:
$$
\mathcal{L}_{\lambda} = \lambda \sum_{(x,\, y_L) \in \mathcal{D}_L} \left\| g_L(f(x)) - y_L \right\|^2
+ (1 - \lambda) \sum_{(x,\, y_H) \in \mathcal{D}_H} \left\| g_H(f(x)) - y_H \right\|^2,
\quad (2)
$$
where $\lambda \in [0, 1]$ is a constraint control parameter. When $\lambda = 1$, the model matches low-fidelity simulations, and when $\lambda = 0$, it matches high-fidelity ones. Using only $\lambda \in \{1, 0\}$ without pseudo-labeling corresponds to vanilla transfer learning. As $\lambda \to 1$, the model minimizes the low-fidelity loss subject to $g_H(f(x)) = g_H\, g_L^{+}\, y_L$, where $g_L^{+}$ is the pseudo-inverse of $g_L$, enforcing high-fidelity predictions as linear transformations of low-fidelity ones. For intermediate $\lambda$, the multi-task loss constrains high-fidelity predictions roughly within a subspace spanned by a linear transformation of low-fidelity data. In the proposed method, $\lambda$ is gradually decreased from 1 to 0, training the model to convergence at each step.
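A minimal PyTorch sketch of this multi-task objective is shown below; using the mean squared error for each term and operating on mini-batches from the two datasets are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def multitask_loss(model, batch_low, batch_high, lam):
    """Weighted two-head loss: lam * low-fidelity error + (1 - lam) * high-fidelity error.

    `model` returns (y_low_pred, y_high_pred); each batch is an (inputs, targets) pair.
    """
    x_low, y_low = batch_low
    x_high, y_high = batch_high
    pred_low, _ = model(x_low)           # supervise g_L(f(x)) with low-fidelity data
    _, pred_high = model(x_high)         # supervise g_H(f(x)) with high-fidelity data
    return lam * F.mse_loss(pred_low, y_low) + (1.0 - lam) * F.mse_loss(pred_high, y_high)
```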
To select patch-level pseudo-data, high-dimensional outputs are partitioned into patches. Let $V$ be the output space, $\mathcal{P}$ be some partition over $V$, and $|v|/|V|$ be the relative volume of a patch $v \in \mathcal{P}$ with respect to $V$. For some patch $v \in \mathcal{P}$ and some dissimilarity measurement $d(\cdot, \cdot)$, the patch-level dissimilarity is $d_v(x) = d\big(y_H\!\mid_v,\, y_L\!\mid_v\big)$, obtained by restricting the high- and low-fidelity fields to $v$. When the high-fidelity ground truth $y_H$ for $x$ is not available, the estimated $\hat{y}_H = g_H(f(x))$ is used.
As flexibility gradually increases, pseudo-data need to be collected at the patches where high- and low-fidelity simulations are less relevant. Therefore, an adaptive threshold $\tau(\lambda) = \tau_0 + (1 - \lambda)\,\tau_1$ is used, where $\lambda$ is the constraint control parameter and $\tau_0$ and $\tau_1$ are some pre-selected threshold parameters. If the estimated dissimilarity is smaller than this threshold at patch $v$ of input $x$, the pseudo-labeled sample $\big(x, \hat{y}_H\!\mid_v, v\big)$ is added to the high-fidelity training set.
Algorithm 1 Adaptive Transfer Learning
1: Input: low-fidelity dataset $\mathcal{D}_L$, high-fidelity dataset $\mathcal{D}_H$, neural network backbone $f$, linear layers $g_L$ and $g_H$, multi-task loss function $\mathcal{L}_\lambda$, dissimilarity measurement $d$, output space partition $\mathcal{P}$, threshold parameters $\tau_0$, $\tau_1$
2: for $\lambda$ decreasing from 1 to 0 do
3:   Minimize $\mathcal{L}_\lambda$ to convergence
4:   if $\lambda > 0$ then
5:     for each input $x$ do
6:       for each patch $v \in \mathcal{P}$ do
7:         if $d_v(x) < \tau(\lambda)$ then
8:           Add $\big(x, \hat{y}_H\!\mid_v, v\big)$ to $\mathcal{D}_H$
9:         end if
10:       end for
11:     end for
12:   end if
13: end for
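The pseudo-data selection step of Algorithm 1 can be sketched as follows; the non-overlapping cubic patches, the relative-error dissimilarity, the linear threshold schedule tau(lambda) = tau0 + (1 - lambda) * tau1, the default threshold values, and the helper names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def patch_dissimilarity(pred_high, pred_low, patch=16):
    """Per-patch relative difference between the estimated high- and low-fidelity
    fields; inputs are (1, 1, Nx, Ny, Nz) tensors and one value is returned per patch."""
    diff = F.avg_pool3d((pred_high - pred_low).abs(), patch)   # non-overlapping patches
    scale = F.avg_pool3d(pred_low.abs(), patch) + 1e-8
    return (diff / scale).flatten()                            # d_v(x) for every patch v

def select_pseudo_patches(model, x, lam, tau0=0.02, tau1=0.08, patch=16):
    """Keep the current high-fidelity prediction as a pseudo-label on patches whose
    estimated dissimilarity falls below tau(lam) = tau0 + (1 - lam) * tau1."""
    with torch.no_grad():
        pred_low, pred_high = model(x)
    confident = patch_dissimilarity(pred_high, pred_low, patch) < (tau0 + (1.0 - lam) * tau1)
    # Expand the per-patch flags back to a voxel-level mask representing 1_{V_x}.
    n_px, n_py, n_pz = [s // patch for s in pred_high.shape[2:]]
    mask = confident.view(1, 1, n_px, n_py, n_pz).float()
    mask = F.interpolate(mask, scale_factor=patch, mode="nearest")
    return pred_high, mask
```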
After this, patch-level pseudo-data are added to the training set. Not all data points in the high-fidelity dataset have ground truth over the entire output space $V$. The new high-fidelity training set is therefore represented as $\mathcal{D}_H = \big\{(x, y_H, V_x)\big\}$, where $V_x \subseteq V$ is the set of confident patches in $V$ for input $x$, and $V_x = V$ if the data point is from the original training set. The multi-task loss is modified as
$$
\mathcal{L}_{\lambda} = \lambda \sum_{(x,\, y_L) \in \mathcal{D}_L} \left\| g_L(f(x)) - y_L \right\|^2
+ (1 - \lambda) \sum_{(x,\, y_H,\, V_x) \in \mathcal{D}_H} \left\| \mathbb{1}_{V_x} \odot \big( g_H(f(x)) - y_H \big) \right\|^2,
\quad (3)
$$
where $\mathbb{1}_{V_x}$ represents the indicator function of the confident patches and $\odot$ denotes element-wise multiplication.
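Putting the pieces together, the sketch below implements the masked high-fidelity term and the outer loop over a decreasing lambda schedule. The schedule values, optimizer settings, the data structures (low_data as a list of (inputs, low-fidelity targets) batches, high_data as a list of (inputs, high-fidelity targets, mask) triples with an all-ones mask for real data, pool_inputs as unlabeled inputs), and the reuse of select_pseudo_patches from the previous sketch are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def masked_multitask_loss(model, batch_low, batch_high, lam):
    """Equation (3)-style objective: the high-fidelity error is evaluated only on the
    confident patches encoded by the mask 1_{V_x} (an all-ones mask for real data)."""
    x_low, y_low = batch_low
    x_high, y_high, mask = batch_high
    pred_low, _ = model(x_low)
    _, pred_high = model(x_high)
    loss_high = F.mse_loss(pred_high * mask, y_high * mask)
    return lam * F.mse_loss(pred_low, y_low) + (1.0 - lam) * loss_high

def adaptive_transfer_learning(model, low_data, high_data, pool_inputs,
                               lambdas=(1.0, 0.75, 0.5, 0.25, 0.0), steps=1000, lr=1e-4):
    """Outer loop of Algorithm 1: train at each lambda, then harvest pseudo-labels."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for lam in lambdas:                                    # gradually relax the constraint
        for _ in range(steps):                             # train to (approximate) convergence
            for batch_low, batch_high in zip(low_data, high_data):
                opt.zero_grad()
                masked_multitask_loss(model, batch_low, batch_high, lam).backward()
                opt.step()
        if lam > 0.0:                                      # collect patch-level pseudo-data
            for x in pool_inputs:
                pseudo_label, mask = select_pseudo_patches(model, x, lam)
                high_data.append((x, pseudo_label, mask))
    return model
```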
4. Conclusions
This study presents an adaptive multi-fidelity framework for high-dimensional surrogate modeling, extending traditional transfer learning to address complex scenarios where simulations of varying fidelities involve distinct physics. Applied to wind farm mean flow prediction, the framework integrates the U-Net architecture and a novel encoding scheme for wind farm physical parameters, utilizing sparse high-fidelity data alongside abundant low-fidelity data. By adaptively regulating the similarity between high- and low-fidelity predictions and enriching the training set with patch-level synthesized pseudo-high-fidelity data, the model achieves real-time high-fidelity inference with demonstrated generalizability and extensibility, surpassing traditional paradigms including neural compression and super resolution.
The performance of the model is evaluated under two different scenarios: one involving variations in wind direction and wind speed, and the other involving different yaw angles and wind speeds. For both scenarios, only three high-fidelity data points are used for training, and the model is trained for 100,000 iterations with the Adam optimizer for each value of $\lambda$. Once trained, the model requires no additional high- or low-fidelity simulations during inference, enabling real-time prediction on a single RTX A6000 GPU. The CPU workload is negligible, as it is solely used for transferring wind farm parameters to the GPU.
Results demonstrate that the proposed model closely matches high-fidelity simulations, with a small average error in close-rotor regions and no single case exceeding a modest error bound. It also successfully captures features absent in low-fidelity simulations, such as nacelle effects and the asymmetry between positive and negative yaw angles. Additionally, it exhibits strong generalizability, accurately predicting cases with physical parameters not present in any high-fidelity training data. Beyond the main results, ablation and extensibility studies were conducted to highlight the method’s advantages over vanilla transfer learning and its ability to handle varying numbers and types of fidelity sources.
The framework’s real-time inference capability and full differentiability, enabled by neural network backpropagation, offer promising avenues for wind farm control co-design optimization, integrating real-time decision making with flow modeling.
Additionally, its ability to handle diverse fidelity sources opens the possibility of leveraging extensive data from various engineering models, simulations, and LiDAR measurements to develop a robust foundational wind farm model in the future. In the broader machine learning community, foundational models—large, general-purpose models pre-trained on diverse datasets and adaptable to a wide range of downstream tasks—are transforming domains such as natural language processing and computer vision. By enabling the integration of heterogeneous wind farm data across fidelities and physics, our framework lays the groundwork for such a foundational model in wind farm modeling, offering robustness, transferability, and scalability that go beyond existing multi-fidelity approaches.