A Data-Driven Observer for Wind Farm Power Gain Potential: A Sparse Koopman Operator Approach

Yue Chen; Bingchen Wang; Kaiyue Zeng; Lifu Ding; Yingming Lin; Ying Chen; Qiuyu Lu

doi:10.3390/en18143751

¹ Power Dispatch Control Center, Guangdong Power Grid Company Ltd., Guangzhou 523000, China

² Department of Electrical Engineering, Tsinghua University, Beijing 100084, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(14), 3751; https://doi.org/10.3390/en18143751

This article belongs to the Special Issue Modeling, Control and Optimization of Wind Power Systems


Abstract

Maximizing the power output of wind farms is critical for improving the economic viability and grid integration of renewable energy. Active wake control (AWC) strategies, such as yaw-based wake steering, offer significant potential for increasing power generation but require predictive models that are both accurate and computationally efficient for real-time implementation. This paper proposes a data-driven observer to rapidly estimate the potential power gain achievable through AWC as a function of the ambient wind direction. The approach is rooted in Koopman operator theory, which allows a linear representation of nonlinear dynamics. Specifically, a model is developed using an Input–Output Extended Dynamic Mode Decomposition framework combined with Sparse Identification (IOEDMDSINDy). This method lifts the low-dimensional wind direction input into a high-dimensional space of observable functions and then employs iterative sparse regression to identify a minimal, interpretable linear model in this lifted space. By training on offline simulation data, the resulting observer serves as an ultra-fast surrogate model, capable of providing near-instantaneous predictions to inform online control decisions. The methodology is demonstrated and validated on two case studies: a 9-turbine and a 20-turbine wind farm. The results show that the observer accurately captures the complex, nonlinear relationship between wind direction and power gain, significantly outperforming simpler models. This work provides a key enabling technology for advanced, real-time wind farm control systems.

Keywords:

wind farm control; active wake control; Koopman operator; dynamic mode decomposition; sparse identification; data-driven observer; surrogate model

1. Introduction

Wind energy is a critical pillar in the global transition toward sustainable energy systems, with its installed capacity experiencing continuous and rapid growth [1]. Maximizing the operational efficiency of wind farms is essential for ensuring the economic viability of these large-scale investments. A primary factor limiting this efficiency is the complex aerodynamic interaction between turbines, commonly known as the wake effect. Upstream turbines extract kinetic energy from the wind, generating downstream regions characterized by reduced wind speeds and heightened turbulence intensity. These wakes adversely affect both the power production and the structural fatigue loads on subsequent turbines as reviewed by Göçmen and Giebel [2] and Porté-Agel et al. [3]. Therefore, the accurate modeling and effective mitigation of these wake-induced losses represent a grand challenge in wind energy science [4], with direct implications for maximizing annual energy production and reducing the overall levelized cost of energy.

In response to this challenge, active wake control (AWC) has emerged as a paradigm-shifting strategy. Instead of the conventional “greedy” approach—a decentralized control logic where each turbine individually maximizes its own power output without considering its impact on other turbines, typically by facing directly into the wind—AWC coordinates the actions of multiple turbines to optimize the collective performance of the entire farm. Several AWC strategies have been proposed. Induction control, for instance, involves adjusting turbine operational setpoints like the thrust coefficient to reduce the momentum deficit in the wake, a concept explored by Annoni et al. [5] and applied in optimal control frameworks by Munters and Meyers [6]. Dynamic control strategies aim to adapt turbine setpoints in real time to fluctuating wind conditions as demonstrated by Simley et al. with LiDAR-based controllers [7]. Among these, yaw-based wake steering [8] has garnered the most significant attention. This strategy involves intentionally misaligning upstream turbines with the incoming wind flow to deflect their wakes. Howland et al. demonstrated the significant power gains achievable with this method in high-fidelity simulations [9], with subsequent full-scale field campaigns led by Fleming et al. confirming its potential in commercial wind farms [10]. A key challenge in the practical implementation of any AWC strategy is the development of a robust and computationally efficient model that can accurately predict the potential benefits under diverse and constantly varying atmospheric conditions.

Existing modeling paradigms for this purpose present a difficult trade-off between fidelity and computational cost. On one hand, high-fidelity physics-based models, such as Large Eddy Simulations (LESs) within a Computational Fluid Dynamics (CFD) framework, can provide highly accurate predictions of wake dynamics. Wu and Porté-Agel, for example, used LES to study wind farm flow over complex terrain [11], but such methods are computationally prohibitive for real-time control. On the other hand, faster, lower-fidelity analytical and engineering wake models, such as the classic Jensen model [12] or the widely used FLORIS tool from NREL [13], are computationally tractable. However, as reviews by Archer et al. have shown, these models may lack accuracy due to their simplifying assumptions, especially in complex layouts or non-stationary conditions [14]. This trade-off creates a significant model bottleneck, hindering the deployment of advanced control systems that require rapid, online decision-making.

Data-driven methods offer a compelling pathway to resolve this bottleneck. As reviewed by Yan et al. [15], leveraging operational or simulation data allows for the creation of surrogate models that are both accurate and computationally lightweight. A variety of machine learning techniques have been applied; for example, Park and Law developed deep learning proxy models for power prediction [16], while Tian et al. used Gaussian processes for layout optimization [17]. Reinforcement learning has also shown promise for control as surveyed by Sedighizadeh et al. [18]. However, a critical gap remains for models intended for real-time control applications. While powerful, many data-driven approaches force a trade-off between accuracy, interpretability, and online computational cost. For instance, deep learning models can achieve high accuracy but often function as opaque “black-boxes” that hinder physical interpretation. Similarly, Gaussian processes, though effective for uncertainty quantification, have prediction costs that scale with training data size, a potential bottleneck for time-critical systems [19]. This work addresses that gap with a methodology that explicitly produces a parsimonious, interpretable, and computationally trivial model without sacrificing predictive accuracy.

This work explores an alternative and powerful data-driven framework based on Koopman operator theory [20,21]. As reviewed by Budišić et al. [22], this theory recasts complex nonlinear dynamics into a globally linear framework by analyzing the evolution of “observable” functions of the system state. Data-driven approximations of the Koopman operator, such as Dynamic Mode Decomposition (DMD) [23,24] and its nonlinear extension, Extended DMD (EDMD) [25], have seen growing application in fluid dynamics [26] and power systems [27]. For systems with inputs and outputs, the framework can be extended to an input–output setting [28,29]. By integrating this with the Sparse Identification of Nonlinear Dynamics (SINDy) methodology [30], it is possible to discover parsimonious, interpretable models by selecting only the most relevant functions from a large candidate library as shown by Champion et al. [31].

This paper introduces a novel Input–Output Extended Dynamic Mode Decomposition with Sparse Identification (IOEDMDSINDy) approach to create a data-driven observer for wind farm power gain potential. The main contributions of this work are threefold:

The formulation of the wind farm power gain prediction problem within a sparse, input–output Koopman operator framework, designed specifically to serve as a computationally trivial observer for real-time control applications.
The development and implementation of a complete IOEDMDSINDy methodology, including the design of a specialized observable library and a sparse regression algorithm to identify a parsimonious and interpretable model from data.
A comprehensive validation of the proposed observer on two distinct wind farm case studies (9-turbine and 20-turbine), demonstrating its high accuracy, robustness, and superiority over alternative data-driven models.

The remainder of this paper is organized as follows. Section 2 provides a detailed review of the underlying theoretical concepts and formulates the problem. Section 3 describes the IOEDMDSINDy observer design and implementation in detail. Section 4 presents the results and analysis from the case studies, and discusses the broader implications and limitations of the work. Finally, Section 5 concludes the paper.

2. Theoretical Framework and Problem Formulation

This section first establishes the theoretical foundations of the proposed methodology, providing a deeper context for Koopman operator theory and its data-driven approximations. It then formally defines the specific problem of predicting wind farm power gain.

2.1. Koopman Operator Theory and Its Properties

Consider a discrete-time nonlinear dynamical system governed by the state-space equation

$$x_{k+1} = f(x_k) \tag{1}$$

where $x_k \in \mathcal{M}$ is the state vector in a state space $\mathcal{M}$ at time step $k$, and $f: \mathcal{M} \to \mathcal{M}$ is a nonlinear function describing the system’s evolution. While the state evolution in $\mathcal{M}$ is nonlinear, the Koopman operator $\mathcal{K}$ offers a different perspective by focusing on the evolution of functions of the state, known as observables [20,21]. The Koopman operator is an infinite-dimensional linear operator that acts on a scalar observable function $g: \mathcal{M} \to \mathbb{C}$. Its action is defined by the composition of the observable with the system dynamics:

$$(\mathcal{K} g)(x) = g(f(x)) \tag{2}$$

In essence, $\mathcal{K}$ advances the value of the observable $g$ one time step forward along the system’s trajectory. The profound insight of Koopman theory is that it transforms a nonlinear problem on the state space into a linear problem in an infinite-dimensional function space. This allows the vast toolkit of linear systems theory to be applied to analyze, predict, and control nonlinear systems.
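As a small illustration of this lifting (a toy example chosen for this discussion, not part of the wind farm model), consider the nonlinear map $x_{1,k+1} = a x_{1,k}$, $x_{2,k+1} = b x_{2,k} + c x_{1,k}^2$. The observables $(x_1, x_2, x_1^2)$ span a subspace that $\mathcal{K}$ maps into itself, so on this subspace the operator is represented exactly by a $3 \times 3$ matrix. The sketch below, with arbitrarily assumed values of $a$, $b$, and $c$, checks Eq. (2) numerically: evaluating the observables after one step of the nonlinear map gives the same result as multiplying the lifted state by that matrix.

```python
import numpy as np

# Toy parameters (arbitrary illustrative values, not taken from the paper)
A_COEF, B_COEF, C_COEF = 0.9, 0.5, 0.3

def f(x: np.ndarray) -> np.ndarray:
    """Nonlinear map: x1 <- a*x1, x2 <- b*x2 + c*x1**2."""
    x1, x2 = x
    return np.array([A_COEF * x1, B_COEF * x2 + C_COEF * x1**2])

def psi(x: np.ndarray) -> np.ndarray:
    """Observables (x1, x2, x1^2), which span a Koopman-invariant subspace."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])

# Exact matrix representation of the Koopman operator on span{psi}
K = np.array([
    [A_COEF, 0.0,    0.0],
    [0.0,    B_COEF, C_COEF],
    [0.0,    0.0,    A_COEF**2],
])

x = np.array([1.5, -0.7])
lhs = psi(f(x))      # (K g)(x) = g(f(x)), evaluated observable by observable
rhs = K @ psi(x)     # linear propagation in the lifted space
print(np.allclose(lhs, rhs))  # True
```

For most systems, including the wind farm problem considered here, no such exact finite-dimensional invariant subspace is known, which motivates the data-driven approximations discussed next.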

The linearity of $\mathcal{K}$ allows for spectral analysis. The eigenfunctions $\phi_j(x)$ of the Koopman operator are special observables that evolve multiplicatively:

$$\mathcal{K}\phi_j(x) = \lambda_j \phi_j(x) \tag{3}$$

where $\lambda_j$ is the corresponding Koopman eigenvalue, so that along a trajectory $\phi_j(x_k) = \lambda_j^k \phi_j(x_0)$. These spectral properties have a direct physical meaning: the magnitude of an eigenvalue $|\lambda_j|$ indicates the growth or decay rate of its associated mode, while its angle $\angle\lambda_j$ determines the mode’s oscillation frequency. The Koopman modes, which are spatial patterns associated with each eigenfunction, can reveal coherent structures in the system’s dynamics. This is the basis of its power in fields like fluid dynamics for identifying persistent flow patterns [26].
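The magnitude and angle interpretation can be checked on a hypothetical two-dimensional example. The sketch below assumes a Koopman matrix that rotates by $\theta$ and shrinks by a factor $r$ per step; its eigenvalues are $r e^{\pm i\theta}$, so $|\lambda_j| = r$ is the per-step decay factor and $\angle\lambda_j = \pm\theta$ is the phase advance per step. The sampling interval `dt` used to convert the angle to hertz is likewise an assumption.

```python
import numpy as np

# Illustrative 2x2 Koopman matrix (assumed values): rotate by theta and shrink by r each step
r, theta = 0.95, 0.2
K = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

eigvals = np.linalg.eigvals(K)   # analytically r * exp(+/- 1j * theta)
magnitude = np.abs(eigvals)      # |lambda_j| < 1: the mode decays by this factor per step
angle = np.angle(eigvals)        # angle(lambda_j): phase advance in radians per step

dt = 0.5                         # hypothetical sampling interval in seconds
for lam, m, a in zip(eigvals, magnitude, angle):
    print(f"lambda = {lam.real:+.4f}{lam.imag:+.4f}j, |lambda| = {m:.3f}, "
          f"angle = {a:+.3f} rad/step = {a / (2 * np.pi * dt):+.4f} Hz")
```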

2.2. Extended Dynamic Mode Decomposition (EDMD)

In practice, we cannot work with the infinite-dimensional $\mathcal{K}$. EDMD provides a data-driven method to find a finite-dimensional approximation of $\mathcal{K}$ [25]. Given snapshot pairs $\{x_k, x_{k+1}\}_{k=1}^{N}$, we choose a dictionary (library) of $N_{\text{obs}}$ basis functions (observables) $\mathcal{D} = \{\psi_1, \psi_2, \dots, \psi_{N_{\text{obs}}}\}$. Let $\Psi(x) = [\psi_1(x), \psi_2(x), \dots, \psi_{N_{\text{obs}}}(x)]^T$ be the vector of observables evaluated at state $x$.

EDMD approximates the action of $\mathcal{K}$ on this basis as a linear combination of the same functions, under the assumption that the subspace spanned by these observables is approximately invariant under $\mathcal{K}$:

$$\mathcal{K}\Psi(x) \approx \mathbf{K}\Psi(x) \tag{4}$$

where $\mathbf{K} \in \mathbb{R}^{N_{\text{obs}} \times N_{\text{obs}}}$ is a finite-dimensional matrix that approximates the Koopman operator. This matrix $\mathbf{K}$ is found by minimizing the least-squares reconstruction error over the available data:

$$\mathbf{K} = \arg\min_{A} \|\Psi_Y - A\Psi_X\|_F^2 \tag{5}$$

where $\Psi_X = [\Psi(x_1), \dots, \Psi(x_N)]$ and $\Psi_Y = [\Psi(x_2), \dots, \Psi(x_{N+1})]$ are matrices of the lifted data snapshots, and $\|\cdot\|_F$ is the Frobenius norm. The closed-form solution is given by

$$\mathbf{K} = \Psi_Y \Psi_X^{\dagger} \tag{6}$$

where $\dagger$ denotes the Moore–Penrose pseudoinverse. The quality of the approximation hinges critically on the choice of the observable dictionary $\mathcal{D}$.
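The closed-form solution in Eq. (6) takes only a few lines of NumPy. The sketch below uses an illustrative toy map (our assumption, not a wind farm model), $x_{1,k+1} = a x_{1,k}$, $x_{2,k+1} = b x_{2,k} + c x_{1,k}^2$, together with the dictionary $\{x_1, x_2, x_1^2\}$, which happens to span a Koopman-invariant subspace; in that special case EDMD recovers the exact matrix. The snapshot pairs are drawn from random initial states rather than from a single trajectory, since EDMD only requires each column of $\Psi_Y$ to be the image of the corresponding column of $\Psi_X$.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 0.9, 0.5, 0.3   # toy system parameters (illustrative assumptions)

def f(x):
    """Apply the nonlinear map column-wise; x has shape (2, N)."""
    return np.vstack([a * x[0], b * x[1] + c * x[0] ** 2])

def psi(x):
    """Dictionary D = {x1, x2, x1^2}; returns shape (N_obs, N)."""
    return np.vstack([x[0], x[1], x[0] ** 2])

# Snapshot pairs {x_k, f(x_k)} from random initial states
X = rng.uniform(-2.0, 2.0, size=(2, 200))
Y = f(X)

Psi_X, Psi_Y = psi(X), psi(Y)
K_edmd = Psi_Y @ np.linalg.pinv(Psi_X)   # Eq. (6): K = Psi_Y Psi_X^dagger

K_true = np.array([[a, 0, 0], [0, b, c], [0, 0, a**2]])
print(np.round(K_edmd, 6))
```

With a dictionary that is not invariant, the same two lines still return the least-squares matrix of Eq. (5), but its accuracy then depends on how well $\mathcal{D}$ captures the dynamics.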

2.3. Input–Output EDMD (IOEDMD)

For systems with inputs $u_k$ and outputs $y_k$, the framework can be extended. Consider a system

$$x_{k+1} = f(x_k, u_k) \tag{7}$$

$$y_k = h(x_k) \tag{8}$$

In IOEDMD, we augment the state with the input (or lift both) and seek a linear model in the lifted space that predicts the future state or directly the output. For predicting the output $y_k$ directly from the current state $x_k$ and input $u_k$, we can define observables that depend on both $x$ and $u$, i.e., $\Psi(x, u)$. The goal is to find a matrix $C$ such that

$$y_k \approx C\,\Psi(x_k, u_k) \tag{9}$$

For control applications, the EDMD framework must be extended to systems with external inputs $u_k$ and measured outputs $y_k$ [28]. This work adapts the framework to model a quasi-static map from an input to an output. To capture nonlinearities, the input $u_k$ is first lifted using a dictionary of observables $\Psi(u_k)$, and then a linear mapping from this lifted space to the output is sought:

$$y_k \approx W^T \Psi(u_k) \tag{10}$$

The coefficient matrix $W$ is found using least squares on the collected data pairs $\{u_k, y_k\}$.
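A minimal sketch of the least-squares fit in Eq. (10), for a scalar output, is given below. The dictionary and the target function are hypothetical choices for illustration; because the target lies in the span of the dictionary, the fitted $W$ recovers its coefficients exactly, even though the map from $u$ to $y$ is nonlinear.

```python
import numpy as np

def lift(u):
    """Hypothetical dictionary Psi(u): 1, u, u^2, u^3, sin(pi u), cos(pi u)."""
    return np.column_stack([np.ones_like(u), u, u**2, u**3,
                            np.sin(np.pi * u), np.cos(np.pi * u)])

u = np.linspace(-1.0, 1.0, 41)                       # sampled (normalized) inputs
y = 0.5 + 1.2 * u**2 - 0.8 * np.sin(np.pi * u)       # toy nonlinear input-output map

Psi_U = lift(u)                                      # row k holds Psi(u_k)^T
W = np.linalg.lstsq(Psi_U, y, rcond=None)[0]         # least-squares W in Eq. (10)
print(np.round(W, 6))
```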

The SINDy methodology enhances this process by aiming to find the most parsimonious model that describes the data [30]. For a system $\dot{x} = f(x)$, SINDy constructs a large library of candidate functions $\Theta(X)$ and assumes the dynamics can be represented as a sparse linear combination:

$$\dot{X} \approx \Theta(X)\,\Xi \tag{11}$$

where $\Xi$ is a sparse coefficient matrix. It employs sparse regression techniques, such as Sequentially Thresholded Least Squares (STLSQ), to find $\Xi$. Integrating this principle with IOEDMD allows for the discovery of a sparse output matrix $W$, leading to simpler, more interpretable models that are less prone to overfitting.
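The STLSQ procedure can be sketched in NumPy as follows. The test system $\dot{x} = x - x^3$, the monomial library up to degree five, and the threshold of 0.1 are illustrative assumptions; derivatives are computed exactly (noise-free) so that the selected terms can be checked directly against Eq. (11) for a single state.

```python
import numpy as np

def stlsq(Theta, dX, threshold, max_iter=10):
    """Sequentially thresholded least squares for dX ~ Theta @ xi (single output)."""
    xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]        # dense initial guess
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                                    # prune small coefficients
        big = ~small
        if not big.any():
            break
        xi[big] = np.linalg.lstsq(Theta[:, big], dX, rcond=None)[0]  # refit survivors
    return xi

# Toy system (assumption for illustration): dx/dt = x - x^3, exact derivatives
x = np.linspace(-2.0, 2.0, 101)
dx = x - x**3
Theta = np.column_stack([x**d for d in range(6)])   # candidate library 1, x, ..., x^5
xi = stlsq(Theta, dx, threshold=0.1)
print(dict(zip(["1", "x", "x^2", "x^3", "x^4", "x^5"], np.round(xi, 6))))
```

With noisy data, the threshold must exceed the noise-induced spurious coefficients but stay below the smallest true coefficient, which is the same trade-off discussed for $\lambda$ in Section 4.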

2.4. Data-Driven Approximations

In practice, the infinite-dimensional operator $\mathcal{K}$ must be approximated from data. EDMD is a powerful technique for this purpose [25]. Given data snapshot pairs $\{x_k, x_{k+1}\}_{k=1}^{N}$, a dictionary of $N_{\text{obs}}$ basis functions (observables), $\mathcal{D} = \{\psi_1, \dots, \psi_{N_{\text{obs}}}\}$, is chosen. This dictionary represents a hypothesis about the functional forms that are important for describing the system’s dynamics. The vector of these observables evaluated at a state $x$ is denoted $\Psi(x)$. EDMD then finds a finite-dimensional matrix $\mathbf{K} \in \mathbb{R}^{N_{\text{obs}} \times N_{\text{obs}}}$ that best approximates the action of $\mathcal{K}$ on this basis such that $\mathcal{K}\Psi(x) \approx \mathbf{K}\Psi(x)$. This matrix is found by solving the least-squares problem:

$$\mathbf{K} = \arg\min_{A} \|\Psi_Y - A\Psi_X\|_F^2 = \Psi_Y \Psi_X^{\dagger} \tag{12}$$

where $\Psi_X$ and $\Psi_Y$ are matrices of the lifted data snapshots. The eigenvalues and eigenvectors of this matrix $\mathbf{K}$ are then approximations of the true Koopman eigenvalues and modes.
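A short NumPy sketch shows how the eigenvalues of the EDMD matrix relate to Koopman eigenvalues. For the assumed linear scalar map $x_{k+1} = \lambda x_k$ (a toy case, not from this paper), each monomial $x^n$ is a Koopman eigenfunction with eigenvalue $\lambda^n$, because $(x_{k+1})^n = \lambda^n x_k^n$. With the dictionary $\{x, x^2, x^3\}$, Eq. (12) should therefore return the eigenvalues $\lambda$, $\lambda^2$, and $\lambda^3$.

```python
import numpy as np

lam = 0.8                                 # assumed linear map x_{k+1} = lam * x_k
x = np.linspace(-1.0, 1.0, 50)           # sampled states x_k
x_next = lam * x                          # their images x_{k+1}

def psi(v):
    """Monomial dictionary {x, x^2, x^3}; one column per snapshot."""
    return np.vstack([v, v**2, v**3])

K = psi(x_next) @ np.linalg.pinv(psi(x))  # Eq. (12)
eigvals = np.sort(np.linalg.eigvals(K).real)[::-1]
print(eigvals)                            # close to lam, lam**2, lam**3
```

Even for a linear state map, the Koopman spectrum on this dictionary contains more than the single state eigenvalue, which illustrates why the choice of $\mathcal{D}$ shapes which eigenvalues EDMD can reveal.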

The SINDy methodology complements EDMD by addressing the challenge of dictionary selection [30]. One can choose a very large, overcomplete dictionary of candidate functions, and the SINDy sparse regression will automatically select the most relevant terms. This is based on the principle of parsimony (Ockham’s razor), assuming that the underlying dynamics are governed by a few dominant terms. SINDy typically employs an iterative algorithm like STLSQ to solve the sparse regression problem, effectively finding a model that is both accurate and simple.

2.5. Problem Formulation

Having established the theoretical tools, the specific problem can be formally defined. The goal is to create an observer that predicts the potential for power improvement in a wind farm due to AWC. This requires modeling a quasi-static input–output relationship.

Let the primary environmental input be the ambient wind direction, $u = \mathrm{wd}$, measured in degrees. While other variables such as wind speed and turbulence intensity also significantly influence AWC performance, this study focuses on wind direction as the sole input to clearly demonstrate and validate the proposed observer framework. The methodology is inherently extensible to multi-variate inputs, which is a clear direction for future work. The control actions are the yaw angle adjustments of the farm’s turbines. For a given wind direction $u$, we define two power production levels:

$P_{\text{base}}(u)$: The baseline power output, achieved when all turbines operate under a greedy control policy (i.e., zero yaw misalignment relative to the incoming flow).
$P_{\text{opt}}(u)$: The optimal power output, which is the maximum achievable power under the same wind condition through the strategic application of yaw-based wake steering across the farm.

The quantity of interest, which the observer must predict, is the potential power improvement, or power gain, $\Delta P$:

$$y = \Delta P(u) = P_{\text{opt}}(u) - P_{\text{base}}(u) \tag{13}$$

The function $\Delta P(u)$ is highly nonlinear and multi-modal, as it depends on the complex geometry of the wind farm and the intricate physics of wake interaction. The objective is to find a data-driven function $F$ that accurately approximates this relationship, $y \approx F(u)$, while being computationally trivial to evaluate. The proposed solution is to learn this function from a dataset of input–output pairs, $\{u_k, y_k\}_{k=1}^{N}$, generated by computationally expensive offline simulations, using the IOEDMDSINDy framework. This can be expressed as finding a sparse coefficient vector $W$ for the lifted input–output model:

$$y_k \approx W^T \Psi(u_k) \tag{14}$$

where $\Psi(u_k)$ is a vector of observables of the input. This formulation transforms the complex nonlinear regression problem into a sparse linear regression problem in a high-dimensional feature space.

3. IOEDMDSINDy Observer Design and Implementation

The proposed methodology utilizes Koopman operator theory to transform the complex, nonlinear problem of predicting power gain into a simpler, linear one. The core idea is that while the relationship $y = F(u)$ is nonlinear in the original input space $u$, it can be approximated as a linear relationship in a higher-dimensional space of “observable functions” of the input, $\Psi(u)$. The methodology first “lifts” the scalar wind direction input into this high-dimensional feature space using a predefined library of functions (e.g., polynomials and trigonometric functions). It then applies sparse linear regression (SINDy) to find a minimal set of coefficients that best map these lifted features to the power gain output. This process yields a sparse, linear representation of the nonlinear input–output map in the lifted space.

The construction of the data-driven observer follows a structured, multi-stage process. This section describes this methodology, shown in Figure 1, from initial data preparation to the final sparse model identification.

Figure 1. Conceptual flowchart of the IOEDMDSINDy observer framework, from offline training to online prediction.

3.1. Data Preprocessing and Normalization

The initial step is the careful preprocessing of the raw data obtained from offline simulations. This stage is critical for ensuring the numerical stability of the regression and for framing the problem in a physically meaningful way.

3.1.1. Input Normalization

The input data, consisting of wind directions $u_k$, are first normalized to a dimensionless value $x_k$. The transformation is given by

$$x_k = \frac{u_k - 270}{60} \tag{15}$$

This specific transformation is chosen for two reasons. First, it centers the input data around the $270^\circ$ direction, a common axis of interest for wake interactions in many standard farm layouts. Second, it scales the input by the half-width of the typical analysis range ($\pm 60^\circ$), so that directions from $210^\circ$ to $330^\circ$ map onto $[-1, 1]$. This prevents basis functions of different orders in the observable library from having vastly different magnitudes, a condition that could otherwise bias the regression process and degrade its numerical conditioning.
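The effect of Eq. (15) on the conditioning of a polynomial library can be sketched as follows. The wind direction grid ($210^\circ$ to $330^\circ$ in $5^\circ$ steps) and the degree $p = 4$ follow the settings reported in Section 4.1; comparing the condition numbers of the raw and normalized Vandermonde matrices is our illustrative choice of diagnostic.

```python
import numpy as np

# Wind-direction grid of Section 4.1: 210 to 330 degrees in 5-degree steps (25 points)
wd = np.arange(210.0, 331.0, 5.0)
x = (wd - 270.0) / 60.0                     # Eq. (15): maps the grid onto [-1, 1]

def vandermonde(v, p=4):
    """Polynomial feature columns 1, v, v^2, ..., v^p."""
    return np.vander(v, p + 1, increasing=True)

cond_raw = np.linalg.cond(vandermonde(wd))  # raw degrees: columns span ~10 orders of magnitude
cond_norm = np.linalg.cond(vandermonde(x))  # normalized input: columns of comparable size
print(x.min(), x.max())                     # -1.0 1.0
print(f"cond(raw) = {cond_raw:.2e}, cond(normalized) = {cond_norm:.2e}")
```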

3.1.2. Data Centering

Following normalization, both the vector of normalized inputs, $X = \{x_k\}$, and the vector of outputs, $Y = \{y_k\}$, are centered by subtracting their respective means. Let $\bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$ and $\bar{y} = \frac{1}{N}\sum_{k=1}^{N} y_k$. The centered data are then:

$$\tilde{x}_k = x_k - \bar{x} \tag{16}$$

$$\tilde{y}_k = y_k - \bar{y} \tag{17}$$

This centering is a standard and important practice in Koopman-based analysis. It shifts the origin of the coordinate system to the mean operating point of the data, focusing the model on the variations around this mean, which typically contain the most significant and varied information about the system’s response.
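One practical consequence of centering, sketched below with assumed synthetic data, is that no intercept term needs to be estimated: ordinary least squares with an explicit intercept column gives the same slope coefficients as a fit without an intercept on column-centered features and centered outputs, and the intercept is recovered when de-centering. In the observer, the column centering of the features is performed by ‘StandardScaler’ (Section 3.3.1), so the intercept reappears as $\bar{y}$ in the de-centering step of Section 3.4. The feature set and coefficients here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 40)
F = np.column_stack([x, x**2, np.cos(2 * x)])         # arbitrary nonlinear features
y = 3.0 + F @ np.array([0.5, -1.0, 2.0]) + 0.01 * rng.standard_normal(40)

# (a) Least squares with an explicit intercept column
A = np.column_stack([np.ones_like(x), F])
coef_a = np.linalg.lstsq(A, y, rcond=None)[0]

# (b) No intercept column: center each feature column and the output instead
F_mean, y_mean = F.mean(axis=0), y.mean()
w_b = np.linalg.lstsq(F - F_mean, y - y_mean, rcond=None)[0]
intercept_b = y_mean - F_mean @ w_b                    # recovered when de-centering

print(np.allclose(coef_a[1:], w_b), np.isclose(coef_a[0], intercept_b))  # True True
```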

3.2. Observable Library Construction

The core of the IOEDMDSINDy method lies in the construction of a rich and diverse library of observable functions, $\mathcal{D} = \{\psi_1(x), \dots, \psi_{N_{\text{obs}}}(x)\}$. This library must be comprehensive enough to accurately represent the true underlying input–output relationship. The selection of functions is guided by both general approximation theory and physical intuition about the system:

Polynomial Terms: $1, x, x^2, \dots, x^p$. These functions serve as a general-purpose basis for approximating continuous nonlinear relationships on a bounded interval. Even-powered terms can capture symmetric effects in the power gain curve, while odd-powered terms can model asymmetries.
Trigonometric Terms: $\sin(nx), \cos(nx)$ for various integers $n$. These are crucial for capturing the periodic nature of wake interactions that occur as the wind direction sweeps across the geometric layout of the farm.
Other Nonlinear Functions: Heuristic terms like $\sqrt{|x|}$ and $x|x|$. These are included as candidates to represent potentially sharp or non-smooth features in the system’s response.

The SINDy framework is designed to work with such an overcomplete library, relying on sparse regression to automatically select the most relevant functions for the final model. The library vector is denoted $\Psi(x) = [\psi_1(x), \dots, \psi_{N_{\text{obs}}}(x)]^T$.

This library is then evaluated on the centered input data $\tilde{X}$ to form the large feature matrix $\Theta \in \mathbb{R}^{N \times N_{\text{obs}}}$, where each row corresponds to a data point and each column to an observable function:

$$\Theta = \begin{bmatrix} \psi_1(\tilde{x}_1) & \psi_2(\tilde{x}_1) & \cdots & \psi_{N_{\text{obs}}}(\tilde{x}_1) \\ \psi_1(\tilde{x}_2) & \psi_2(\tilde{x}_2) & \cdots & \psi_{N_{\text{obs}}}(\tilde{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \psi_1(\tilde{x}_N) & \psi_2(\tilde{x}_N) & \cdots & \psi_{N_{\text{obs}}}(\tilde{x}_N) \end{bmatrix} \tag{18}$$
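A sketch of how the feature matrix in Eq. (18) might be assembled is shown below. The function name `build_library`, the default degree, and the set of trigonometric orders are hypothetical; the paper does not list its exact library, so these defaults only mirror the three function families described above.

```python
import numpy as np

def build_library(x, p=4, trig_orders=(1, 2, 3)):
    """Evaluate a candidate observable library column-wise on the samples x.

    Columns: x^0, ..., x^p, then sin(n x), cos(n x) for each n in trig_orders,
    then sqrt(|x|) and x|x|. The defaults are illustrative, not the paper's exact set.
    """
    x = np.asarray(x, dtype=float)
    columns, names = [], []
    for d in range(p + 1):                      # polynomial terms
        columns.append(x**d)
        names.append(f"x^{d}")
    for n in trig_orders:                       # trigonometric terms
        columns += [np.sin(n * x), np.cos(n * x)]
        names += [f"sin({n}x)", f"cos({n}x)"]
    columns += [np.sqrt(np.abs(x)), x * np.abs(x)]   # heuristic non-smooth terms
    names += ["sqrt|x|", "x|x|"]
    return np.column_stack(columns), names

x_tilde = np.linspace(-1.0, 1.0, 25)           # stand-in for centered inputs
Theta, names = build_library(x_tilde)
print(Theta.shape)                              # (25, 13)
```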

3.3. Sparse Regression for Model Identification

With the centered output vector $\tilde{Y}$ and the feature matrix $\Theta$, the goal is to find a sparse coefficient vector $W$ that solves the linear system $\tilde{Y} \approx \Theta W$. This is formulated as a penalized least-squares problem, which is solved using the STLSQ method.

3.3.1. Feature Scaling

The feature matrix $\Theta$ is standardized column-wise (mean 0, variance 1) using ‘StandardScaler’. Let the scaled matrix be $\Theta_{\text{scaled}}$. This ensures that the thresholding is applied fairly across features with different scales. An initial dense least-squares solution $W_{ls}$ is then computed:

$$W_{ls} = (\Theta_{\text{scaled}})^{\dagger}\tilde{Y} \tag{19}$$
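Why scaling matters for thresholding can be seen in a two-feature sketch with assumed data. If one column is numerically large, its least-squares coefficient is correspondingly small and would fall below a fixed threshold even though the term matters; after column-wise standardization, each coefficient measures the contribution of a unit-variance feature, so a single threshold treats the columns comparably. The factor of 1000 below is an artificial choice to mimic an unnormalized column.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 25)
# Two candidate features; the second is artificially multiplied by 1000 to mimic
# a column with a much larger numerical range (illustrative assumption)
Theta = np.column_stack([x, 1000.0 * x**4])
y = 0.8 * x + 0.5 * x**4                  # both terms contribute to the target
threshold = 0.1

# Unscaled fit: the coefficient of the large column is about 0.0005,
# so a fixed threshold would wrongly prune the x^4 term
w_raw = np.linalg.lstsq(Theta, y, rcond=None)[0]

# Column-wise standardization (mean 0, variance 1), as in Section 3.3.1
Theta_scaled = (Theta - Theta.mean(axis=0)) / Theta.std(axis=0)
w_scaled = np.linalg.lstsq(Theta_scaled, y - y.mean(), rcond=None)[0]

print("kept (raw):   ", np.abs(w_raw) >= threshold)      # [ True False]
print("kept (scaled):", np.abs(w_scaled) >= threshold)   # [ True  True]
```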

3.3.2. Iterative Thresholding

The STLSQ algorithm iteratively refines the coefficient vector $W$ to promote sparsity.

An initial dense solution for $W$ is computed by solving the standard unpenalized least-squares problem $\min_W \|\tilde{Y} - \Theta_{\text{scaled}} W\|_2^2$.
In each iteration, any coefficients in $W$ with magnitudes smaller than a predefined threshold $\lambda$ are identified and set to zero. This step prunes the less important basis functions from the model.
A new, smaller least-squares problem is then formulated using only the columns of $\Theta_{\text{scaled}}$ corresponding to the remaining non-zero coefficients. The solution to this smaller problem updates the non-zero entries in $W$.
This process of pruning and re-solving is repeated until the set of active coefficients in the model converges or a maximum number of iterations is reached.

The final sparse vector $W$ defines the observer model and contains only the coefficients corresponding to the most relevant observable functions identified by the SINDy procedure. The model stores $W$, the means $\bar{x}$ and $\bar{y}$, and the fitted ‘StandardScaler’ instance.

3.4. Prediction

Given a new wind direction $\mathrm{wd}_{\text{new}}$, the potential power improvement $\Delta P_{\text{pred}}$ is predicted as follows (‘predict’ function):

Normalize the input: $x_{\text{new}} = (\mathrm{wd}_{\text{new}} - 270)/60$.
Center the input: $\tilde{x}_{\text{new}} = x_{\text{new}} - \bar{x}$.
Lift to observable space: Evaluate the library functions $\Psi(\tilde{x}_{\text{new}})$.
Scale the observables: Apply the fitted ‘StandardScaler’ to $\Psi(\tilde{x}_{\text{new}})$ to get $\Psi_{\text{scaled}}$.
Predict centered output: $\tilde{y}_{\text{pred}} = \Psi_{\text{scaled}}^T W$.
De-center the output: $y_{\text{pred}} = \tilde{y}_{\text{pred}} + \bar{y}$.

The final result $y_{\text{pred}}$ is the predicted power improvement potential $\Delta P_{\text{pred}}$.

The complete training procedure is summarized in Algorithm 1.

Algorithm 1 IOEDMDSINDy observer training algorithm.
1: Input: Input data $U = \{u_k\}_{k=1}^{N}$, output data $Y = \{y_k\}_{k=1}^{N}$, threshold $\lambda$, maximum iterations $I_{\max}$
2: Output: Sparse coefficient vector $W$, means $\bar{x}, \bar{y}$, scaler object $S$
3: procedure TrainObserver($U, Y, \lambda, I_{\max}$)
4:     Construct observable library matrix $\Theta$ from $\tilde{X}$ (with $\tilde{X}$ and $\tilde{Y}$ obtained from Eqs. (15)–(17))
5:     $S \leftarrow \text{StandardScaler}().\text{fit}(\Theta)$
6:     $\Theta_{\text{scaled}} \leftarrow S.\text{transform}(\Theta)$
7:     $W \leftarrow (\Theta_{\text{scaled}})^{\dagger}\tilde{Y}$    ▹ Initial dense least-squares solution
8:     for $i = 1$ to $I_{\max}$ do
9:         $\text{small\_indices} \leftarrow \{j \mid |W_j| < \lambda\}$
10:        $W[\text{small\_indices}] \leftarrow 0$
11:        $\text{big\_indices} \leftarrow \{j \mid |W_j| \geq \lambda\}$
12:        $W[\text{big\_indices}] \leftarrow (\Theta_{\text{scaled}}[:, \text{big\_indices}])^{\dagger}\tilde{Y}$    ▹ Re-solve on non-zero features
13:    return $W, \bar{x}, \bar{y}, S$
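Algorithm 1 and the prediction steps of Section 3.4 can be combined into a compact NumPy sketch. The class name `SparseObserver`, the function `library`, and its particular set of observables are hypothetical; a hand-written column standardization stands in for ‘StandardScaler’ (zero-variance columns, such as the constant term, keep a scale of 1, mirroring scikit-learn's behavior), and the loop stops early once the active set no longer changes, as described in Section 3.3.2. The synthetic target in the usage lines is an assumption, not FLORIS/NOMAD data.

```python
import numpy as np

def library(x):
    """Hypothetical observable library Psi(x) covering the families of Section 3.2."""
    return np.column_stack([
        np.ones_like(x), x, x**2, x**3, x**4,                 # polynomial terms (p = 4)
        np.sin(x), np.cos(x), np.sin(2 * x), np.cos(2 * x),   # trigonometric terms
        np.sqrt(np.abs(x)), x * np.abs(x),                    # heuristic non-smooth terms
    ])

class SparseObserver:
    """Sketch of Algorithm 1 (training) and Section 3.4 (prediction)."""

    def __init__(self, threshold=0.1, max_iter=10):
        self.threshold = threshold
        self.max_iter = max_iter

    def fit(self, wd, dP):
        x = (np.asarray(wd, dtype=float) - 270.0) / 60.0      # Eq. (15)
        y = np.asarray(dP, dtype=float)
        self.x_bar, self.y_bar = x.mean(), y.mean()           # Eqs. (16)-(17)
        Theta = library(x - self.x_bar)                        # Eq. (18)
        self.mu = Theta.mean(axis=0)                           # stand-in for StandardScaler
        self.sigma = Theta.std(axis=0)
        self.sigma[self.sigma == 0.0] = 1.0                    # zero-variance columns keep scale 1
        Ts = (Theta - self.mu) / self.sigma
        y_c = y - self.y_bar
        W = np.linalg.lstsq(Ts, y_c, rcond=None)[0]           # Eq. (19): dense start
        active = np.abs(W) >= self.threshold
        for _ in range(self.max_iter):
            W = np.zeros_like(W)                               # prune small coefficients
            if active.any():                                   # re-solve on survivors
                W[active] = np.linalg.lstsq(Ts[:, active], y_c, rcond=None)[0]
            new_active = np.abs(W) >= self.threshold
            if np.array_equal(new_active, active):             # active set converged
                break
            active = new_active
        self.W = W
        return self

    def predict(self, wd_new):
        x = (np.atleast_1d(np.asarray(wd_new, dtype=float)) - 270.0) / 60.0  # normalize
        Ts = (library(x - self.x_bar) - self.mu) / self.sigma                # center, lift, scale
        return Ts @ self.W + self.y_bar                                      # de-center

# Usage on a synthetic target built from library terms (assumption, not simulation data)
wd = np.arange(210.0, 331.0, 5.0)
x = (wd - 270.0) / 60.0
dP = 1.2 + 0.9 * x**2 - 0.4 * np.cos(2 * x) + 0.3 * np.sin(x)
obs = SparseObserver(threshold=0.1).fit(wd, dP)
print(np.count_nonzero(obs.W), obs.predict(272.5))
```

Because the synthetic target lies exactly in the span of three library terms, the sparse fit keeps only those terms; with real simulation data, the number of retained terms depends on the threshold and on how well the library matches the true response.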

4. Case Studies and Results Analysis

This section details the validation of the proposed IOEDMDSINDy observer through its application to two distinct wind farm case studies, demonstrating its accuracy, robustness, and interpretability through a detailed analysis of the results.

4.1. Case Study 1: 9-Turbine Wind Farm

4.1.1. Configuration

The first case study focuses on a 9-turbine wind farm arranged in a 3 × 3 staggered grid. A nominal inter-turbine spacing of 800 meters is used, with minor random perturbations added to each turbine’s position to break perfect symmetry and create a more realistic scenario. The layout is illustrated in Figure 2. The observer was trained using the methodology from Section 3, with a polynomial order of four for the basis library and a sparsity threshold of 0.1.

Figure 2. Layout of the 9-turbine wind farm (Case Study 1). Turbine IDs T0-T8 are shown.

To generate the training and testing data for the IOEDMDSINDy model, we utilized the FLORIS (Flow Redirection and Induction in Steady State) wake modeling tool [13] coupled with the NOMAD (Nonlinear Optimization by Mesh Adaptive Direct Search) blackbox optimizer [32]. For a range of ambient wind directions, the optimal set of yaw angles for all 9 turbines was determined to maximize the total power output of the farm. The power improvement potential,

Δ P

, was then calculated as the difference between this optimized total power and the baseline total power (where all turbines have a zero yaw angle).

The key parameters for data generation were as follows:

Wind Speed: Fixed at 8 m/s.
Wind Directions: Varied from $210^\circ$ to $330^\circ$ (i.e., $270^\circ \pm 60^\circ$) in increments of $5^\circ$. This resulted in a dataset of 25 distinct wind direction scenarios.
Yaw Angle Optimization: For each wind direction, NOMAD was used to optimize the yaw angles of all 9 turbines, with individual yaw angles constrained between −30° and +30°. The optimization objective was to maximize total farm power, with NOMAD configured for a maximum of 200 blackbox evaluations per wind direction.
Turbine Model: The turbine model and atmospheric conditions were configured using the ‘gch.yaml’ input file for FLORIS.

The resulting dataset comprises pairs of (wind direction, potential power improvement). The wind directions serve as the input u and the power improvements as the output y. For this 9-turbine case, the calculated power improvements ranged from [Min Power Improvement, e.g., 0.1 MW] to [Max Power Improvement, e.g., 3.0 MW].

The selection of hyperparameters, namely the sparsity threshold $\lambda$ and the library complexity (e.g., polynomial degree $p$), warrants discussion. These parameters control the trade-off between model fidelity and parsimony. A richer library (higher $p$) provides more candidate functions to describe the dynamics but increases the risk of overfitting. However, a key strength of the SINDy framework is that the sparse regression, guided by $\lambda$, acts as a powerful regularizer. It automatically prunes irrelevant terms from an overcomplete library, mitigating overfitting while retaining the essential dynamics [30]. The parameters in this study were chosen to provide a library rich enough to capture the physical effects while relying on the sparsity constraint to discover a parsimonious and robust model.

4.1.2. Observer Identification and Analysis

The sparse regression process identified a small subset of the observable library as being most influential. An example of the identified features and their corresponding coefficients is given by $x^2$: 0.8765, $x^4$: $-0.2345$, $\cos(2x)$: 0.5432, and $\sin(x)$: 0.1598.

A detailed analysis of these selected features provides valuable physical insight into the wind farm's behavior. The dominance of even-powered polynomial terms, such as $x^{2}$, suggests that the power gain profile is largely symmetric about the central wind direction of 270 degrees. This is physically intuitive, as the farm layout is nearly symmetric. The negative coefficient of the higher-order term $x^{4}$ is crucial for shaping the peaks of the gain curve: it prevents them from growing without bound and produces a more realistic, flattened peak. The inclusion of the trigonometric term $\cos(2x)$ is particularly significant; it captures a distinct periodicity corresponding to major wake interactions that occur twice as the wind sweeps across the farm's main axes (e.g., when turbines in one row directly wake those in another). An odd function such as $\sin(x)$ captures any asymmetry in the gain profile, which could arise from the staggered layout or the slight positional perturbations. The model has thus learned the fundamental physical characteristics of the farm's response from data alone.
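The symmetry argument can be checked numerically by splitting the example model into parts that are even and odd about $x = 0$. This is a sketch assuming $x$ is the centred, normalized wind direction (so $x = 0$ is 270 degrees) and omitting any additive offset; the coefficients are the Section 4.1.2 example values.

```python
import numpy as np

# Section 4.1.2 example terms; x is the assumed centred, normalized wind direction.
TERMS = {
    "x^2": (0.8765, lambda x: x**2),
    "x^4": (-0.2345, lambda x: x**4),
    "cos(2x)": (0.5432, lambda x: np.cos(2 * x)),
    "sin(x)": (0.1598, np.sin),
}


def model(x):
    """Evaluate the four-term sparse expansion (additive offset omitted)."""
    return sum(coef * basis(x) for coef, basis in TERMS.values())


def even_odd_parts(f, x):
    """Split f into its parts symmetric and antisymmetric about x = 0."""
    fx, fmx = f(x), f(-x)
    return 0.5 * (fx + fmx), 0.5 * (fx - fmx)


x = np.linspace(-1.0, 1.0, 121)
even, odd = even_odd_parts(model, x)

# Only sin(x) is an odd function, so the antisymmetric part is its contribution alone.
print("max |odd - 0.1598 sin(x)|:", np.max(np.abs(odd - 0.1598 * np.sin(x))))
print("asymmetry ratio:", np.max(np.abs(odd)) / np.max(np.abs(even)))
```

The ratio gives a rough, unit-free measure of how far the identified profile departs from mirror symmetry about 270 degrees.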

4.1.3. Prediction Performance and Comparison

The predictive performance of this sparse observer is shown in Figure 3. The figure plots the ground-truth power gain values from the offline FLORIS/NOMAD simulations (blue dots) against the predictions from the IOEDMDSINDy observer (red line). The observer demonstrates high fidelity, accurately tracking not only the location of the two main peaks but also the subtle asymmetry and the depth of the trough between them. This trough corresponds to wind directions where wake effects from the greedy case are already maximized, leaving less room for improvement through wake steering.

Figure 3. Prediction performance of the IOEDMDSINDy observer for the 9-turbine farm. Blue dots are the ground-truth data; the red line is the observer’s prediction.

To underscore the advantages of the proposed method, its performance was compared against two simpler alternatives, shown in Figure 4 and Figure 5. These baselines were chosen to isolate the key components of the IOEDMDSINDy framework. A basic IODMD model (Figure 4), which is constrained to a linear relationship in the original input space, fails to capture the nonlinear dependence of power gain on wind direction, which justifies lifting the input into a richer function space. A non-sparse IOEDMD model (Figure 5), which uses all basis functions in the library without any sparse selection, achieves a good fit but yields a dense, overly complex model that lacks the parsimony and interpretability of the sparse model identified by SINDy. This comparison demonstrates that sparse identification is critical for achieving the dual goals of accuracy and model simplicity.
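The role of the two baselines can be illustrated in miniature. This sketch assumes that, for a static single-input map, the IODMD baseline reduces to an affine least-squares fit in the raw input (our simplification), and it uses a synthetic, noise-free curve assembled from the Section 4.1.2 example terms rather than the FLORIS/NOMAD data; the library is likewise hypothetical.

```python
import numpy as np


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic curve on an assumed normalized wind-direction axis (illustrative only).
x = np.linspace(-1.0, 1.0, 25)
y = 0.8765 * x**2 - 0.2345 * x**4 + 0.5432 * np.cos(2 * x) + 0.1598 * np.sin(x)

# (a) Linear in the original input (IODMD-like simplification): y ~ a*x + b.
A_lin = np.column_stack([x, np.ones_like(x)])
coef_lin = np.linalg.lstsq(A_lin, y, rcond=None)[0]

# (b) Lifted, non-sparse: ordinary least squares on the full polynomial + trig library.
theta = np.column_stack(
    [x**k for k in range(5)] + [f(k * x) for k in (1, 2) for f in (np.sin, np.cos)]
)
coef_full = np.linalg.lstsq(theta, y, rcond=None)[0]

print("linear RMSE:", rmse(A_lin @ coef_lin, y))
print("lifted RMSE:", rmse(theta @ coef_full, y), "using", theta.shape[1], "library terms")
```

The affine map follows the overall trend but not the curvature of the profile, while the lifted fit reproduces the curve to machine precision. It does so by assigning a (generally nonzero) coefficient to every one of the nine library functions, which is the density that sparse selection is meant to remove.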

Figure 4. Basic IODMD prediction for the 9-turbine farm.

Figure 5. Non-sparse IOEDMD prediction for the 9-turbine farm.

4.2. Case Study 2: 20-Turbine Wind Farm

4.2.1. Configuration

The second case study features a larger wind farm with 20 turbines. The layout is depicted in Figure 6.

Figure 6. Layout of the 20-turbine wind farm used in Case Study 2.

Data generation followed the same FLORIS-NOMAD procedure as for the 9-turbine case, optimizing yaw angles for all 20 turbines.

Wind Speed and Directions: Same as Case Study 1 (8 m/s; $210^{\circ}$–$330^{\circ}$ in $5^{\circ}$ steps), resulting in [Number of data points for 20 T] data points.
Yaw Angle Optimization: Constraints and NOMAD settings were analogous to the 9-turbine case.

For this 20-turbine case, power improvements ranged from 0.23 MW to 5.81 MW, with an average gain of 2.54 MW across the simulated wind directions.

4.2.2. Observer Identification and Analysis

The same IOEDMDSINDy training procedure was applied to the dataset generated for this 20-turbine farm.

A comparison between the features selected for the 20-turbine farm and those selected for the 9-turbine farm is instructive. For the larger, more complex farm, it is plausible that the SINDy algorithm would select higher-frequency trigonometric terms or different polynomial combinations. For example, the emergence of a term such as $\cos(3x)$ or $\cos(4x)$ could reflect the more numerous and varied wake interaction pathways that arise as the wind direction changes across the larger physical extent of the farm. The ability of the method to identify different dominant features automatically for different layouts illustrates its flexibility as a discovery tool.
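For terms such as $\cos(3x)$ or $\cos(4x)$ to be selectable at all, they must be present in the candidate library. The sketch below only shows how the library could be widened to admit them; the helper `build_library` and its `harmonics` parameter (the largest $k$ in $\sin(kx)$, $\cos(kx)$) are hypothetical, and which terms the authors actually selected for the 20-turbine farm is not reproduced here.

```python
import numpy as np


def build_library(x, degree, harmonics):
    """Hypothetical library: monomials x^0..x^degree plus sin(kx), cos(kx) for k = 1..harmonics."""
    cols, names = [], []
    for p in range(degree + 1):
        cols.append(x**p)
        names.append("1" if p == 0 else "x" if p == 1 else f"x^{p}")
    for k in range(1, harmonics + 1):
        arg = "x" if k == 1 else f"{k}x"
        cols += [np.sin(k * x), np.cos(k * x)]
        names += [f"sin({arg})", f"cos({arg})"]
    return np.column_stack(cols), names


x = np.linspace(-1.0, 1.0, 25)   # assumed normalized wind-direction grid
for K in (2, 4):
    theta, names = build_library(x, degree=4, harmonics=K)
    print(K, theta.shape, names[-4:])
```

Each extra harmonic adds two candidate columns. On a short normalized interval, low harmonics are close to low-degree polynomials, so trigonometric and monomial columns can be strongly collinear; the number of candidates should therefore stay well below the number of sampled wind directions.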

4.2.3. Prediction Performance

The prediction performance for the 20-turbine farm is presented in Figure 7. Despite the increased complexity of the underlying physics, the IOEDMDSINDy observer again demonstrates excellent accuracy, successfully modeling a more intricate power gain curve with multiple local maxima and minima. The comparative performance against IODMD and non-sparse IOEDMD was analogous to that observed in the 9-turbine case, confirming the consistent advantage of the sparse, nonlinear approach.

Figure 7. IOEDMDSINDy prediction of $\Delta P$ vs. wind direction for the 20-turbine farm. Blue dots: simulation data; red line: model prediction.

To further emphasize the necessity and superiority of the complete IOEDMDSINDy framework, a comparative analysis against alternative models was also performed for the 20-turbine case. Figure 8 and Figure 9 present the prediction results from a basic IODMD model and a non-sparse IOEDMD model, respectively.

Figure 8. Basic IODMD prediction for the 20-turbine farm.

Figure 9. IOEDMD (rich basis, no sparsity) prediction for the 20-turbine farm.

The results of this comparison are clear. The basic IODMD model, shown in Figure 8, cannot represent the system's behavior: its linear assumption produces a prediction that misses all the key features of the power gain curve, showing that a nonlinear modeling approach is essential for this problem. The non-sparse IOEDMD model, shown in Figure 9, by contrast achieves strong predictive accuracy. By using the full dictionary of basis functions, it captures the complex, multi-modal shape of the power gain curve. This accuracy, however, comes at the cost of parsimony and interpretability: the resulting model is a dense combination of all candidate functions, which makes it difficult to extract physical insight and more susceptible to overfitting when trained on noisy data. The comparison therefore shows that the IOEDMDSINDy approach (Figure 7) is the only one of the three models that achieves both high accuracy, comparable to the non-sparse IOEDMD, and the simplicity and interpretability that the non-sparse model lacks. This successful application to a substantially different and more complex wind farm configuration provides strong evidence that the proposed methodology is a robust and generalizable tool, supporting the paper's third contribution.

4.3. Discussion

The successful validation of the IOEDMDSINDy observer in the preceding case studies carries significant implications for the practical implementation of advanced wind farm control. This discussion examines the observer’s key attributes in the context of real-world engineering requirements, acknowledges its limitations, and outlines future research directions.

A principal contribution of this work is the development of an observer that is explicitly designed for real-time applications. The most significant barrier to the widespread deployment of optimal AWC strategies is the computational latency of the required models. The proposed observer directly addresses this model bottleneck. By shifting the computationally intensive work—data generation and model training—to an offline phase, the online task is reduced to the evaluation of a simple, pre-computed sparse formula. This near-instantaneous prediction capability is the critical feature that makes the observer a viable component for control systems that must operate on fast timescales, such as those designed to provide frequency regulation or other dynamic ancillary services to the power grid.
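A minimal sketch of this online stage, assuming the offline phase exports the active terms, their coefficients, and the input normalization. The normalization constants (centre 270°, half-range 60°, matching the 210° to 330° sweep), the zero offset, and the names `SparseObserver` and `BASIS` are hypothetical; the coefficients are the Section 4.1.2 example values, so the output is illustrative rather than a power gain in MW.

```python
import math
from dataclasses import dataclass

# Registry of basis functions the offline stage may select (hypothetical names).
BASIS = {
    "x^2": lambda x: x * x,
    "x^4": lambda x: x**4,
    "cos(2x)": lambda x: math.cos(2 * x),
    "sin(x)": math.sin,
}


@dataclass(frozen=True)
class SparseObserver:
    """Offline-trained sparse model; online prediction is a short sum of terms."""
    center_deg: float      # assumed normalization: x = (theta - center) / half_range
    half_range_deg: float
    offset: float          # assumed constant re-added after centering
    terms: tuple           # ((basis_name, coefficient), ...) for active terms only

    def predict(self, wind_dir_deg: float) -> float:
        x = (wind_dir_deg - self.center_deg) / self.half_range_deg
        return self.offset + sum(c * BASIS[name](x) for name, c in self.terms)


obs = SparseObserver(
    center_deg=270.0,
    half_range_deg=60.0,
    offset=0.0,
    terms=(("x^2", 0.8765), ("x^4", -0.2345), ("cos(2x)", 0.5432), ("sin(x)", 0.1598)),
)
print(obs.predict(270.0), obs.predict(300.0))
```

Each prediction costs a few multiply-adds and function evaluations regardless of how much data was used offline, and the stored model is only a short list of (name, coefficient) pairs plus two normalization constants.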

Beyond speed, a second major advantage conferred by the methodology is model interpretability, which distinguishes it from other common machine learning approaches. While techniques like deep neural networks can also serve as powerful surrogate models, they often function as “black-boxes” whose decision-making processes are opaque. Similarly, Gaussian processes, while excellent for uncertainty quantification, can be computationally demanding at prediction time. The IOEDMDSINDy approach, by enforcing sparsity, produces a transparent model: a linear combination of a few intelligible basis functions. As shown in the case studies, this allows a control engineer to inspect the model (e.g., the symmetric polynomial and periodic trigonometric terms), understand the primary physical drivers of power gain, and build trust in the control system. This parsimony also ensures a minimal computational and memory footprint, making the observer highly suitable for deployment on resource-constrained embedded controllers.

Despite these strengths, it is important to acknowledge the limitations of the current study, as they define the logical path for future research. The observer developed herein is a single-input model, considering only the wind direction. While wind direction is arguably the most critical factor for wake steering, the true power gain potential is also a function of other atmospheric variables, most notably wind speed and turbulence intensity. The IOEDMDSINDy framework is inherently extensible to multiple inputs, and the development of a more comprehensive multi-variate observer is a crucial next step toward a truly robust industrial solution. Another limitation is that the observer provides a quasi-static mapping from conditions to potential gain; it does not capture the transient dynamics of how the farm transitions from one operational state to another. For control strategies that require knowledge of these transition dynamics, a full state-space Koopman model would need to be developed.
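The extension to multiple inputs can be illustrated with a hypothetical two-input polynomial library in normalized wind direction $x$ and normalized wind speed $u$; both variables, the random sample points, and the helper `poly_library_2d` are placeholders for illustration. The sketch also shows the main practical cost: the number of monomials of total degree at most $d$ in $n$ inputs is $\binom{n+d}{d}$, so the candidate set grows quickly and the sparsity threshold must do more of the pruning.

```python
import math

import numpy as np


def poly_library_2d(x, u, degree):
    """All monomials x^i * u^j with i + j <= degree (hypothetical two-input library)."""
    cols, names = [], []
    for total in range(degree + 1):
        for i in range(total, -1, -1):
            j = total - i
            cols.append(x**i * u**j)
            names.append(f"x^{i}*u^{j}")
    return np.column_stack(cols), names


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)   # placeholder normalized wind-direction samples
u = rng.uniform(-1.0, 1.0, 200)   # placeholder normalized wind-speed samples
theta, names = poly_library_2d(x, u, degree=4)
print(theta.shape)

# Library size for n inputs at total degree 4 is C(n + 4, 4).
for n_inputs in (1, 2, 3):
    print(n_inputs, math.comb(n_inputs + 4, 4))
```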

Finally, a significant challenge in all data-driven modeling of physical systems is the simulation-to-reality gap. The observer presented here was trained on data from the FLORIS simulation tool. Its performance on noisy, incomplete, and potentially biased real-world SCADA data must be thoroughly investigated. Future work should focus on validating the observer against high-fidelity large-eddy simulations (LES) and, ideally, field data from an operational wind farm. Advanced techniques, such as transfer learning, could be explored to adapt a model trained primarily on simulation data to real-world conditions with a minimal amount of field data, bridging the sim-to-real gap efficiently. The immediate research path will focus on developing a multivariate observer and subsequently integrating it into a multi-agent reinforcement learning (MARL) framework to demonstrate, in simulation, a fully autonomous, distributed control system for a wind farm.

5. Conclusions

This paper has presented and validated a data-driven methodology, IOEDMDSINDy, for creating a fast and accurate observer to predict the power gain potential from active wake control in wind farms. The critical need for such a tool was framed in the context of enabling advanced, real-time control strategies, which are currently hampered by the computational cost of existing predictive models. By leveraging Koopman operator theory to linearize nonlinear dynamics and sparse identification to ensure model parsimony and interpretability, the proposed approach successfully learns the complex relationship between ambient wind direction and achievable power gain from offline data. The resulting model is a computationally lightweight yet highly accurate observer well-suited for online deployment. The performance of the observer was successfully validated on two distinct wind farm layouts—a 9-turbine and a 20-turbine case—demonstrating its high accuracy and robustness. By providing a practical solution to the model bottleneck that has limited the deployment of real-time AWC, this work delivers a critical enabling technology. The proposed observer can empower advanced, distributed control systems with the predictive insight needed to optimize wind farm performance and enhance the contribution of wind energy to the stability and reliability of the future power grid.

Author Contributions

Conceptualization, Y.C. (Yue Chen) and L.D.; Data curation, B.W.; Formal analysis, Y.L.; Funding acquisition, Y.C. (Ying Chen); Investigation, Y.C. (Ying Chen); Methodology, B.W.; Resources, Q.L.; Software, K.Z.; Supervision, L.D.; Validation, Y.C. (Yue Chen), B.W. and K.Z.; Visualization, K.Z.; Writing—original draft, Y.C. (Yue Chen); Writing—review and editing, L.D. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by the Science and Technology Project of China South Power Grid Co., Ltd. under Grant No. 036000KK52222044 (GDKJXM20222430).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to express their sincere gratitude to China South Power Grid Co., Ltd. for the generous support and funding provided for this research.

Conflicts of Interest

Authors Yue Chen, Kaiyue Zeng, Yingming Lin and Qiuyu Lu were employed by the company Power Dispatch Control Center, Guangdong Power Grid Company Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Global Wind Energy Council. Global Wind Report 2023; GWEC: Brussels, Belgium, 2023.
2. Göçmen, T.; Giebel, G. (Eds.) Wind Farm Flow Modeling: A State-of-the-Art Overview; Report No. DTU-Wind-Energy-E-0089; DTU Wind Energy: Roskilde, Denmark, 2016.
3. Porté-Agel, F.; Bastankhah, M.; Shamsoddin, S. Wind-turbine and wind-farm flows: A review. Bound.-Layer Meteorol. 2020, 174, 1–59.
4. Veers, P.; Dykes, K.; Lantz, E.; Barth, S.; Bottasso, C.L.; Carlson, O.; Clifton, A.; Green, J.; Holttinen, H.; Laird, D.; et al. Grand challenges in the science of wind energy. Science 2019, 366, eaau2027.
5. Annoni, J.; Fleming, P.; Scholbrock, A.; Roadman, J.; Dana, S.; Schreck, S.; Marden, J.; Pao, L.Y. Analysis of wake models for wind farm control. Renew. Energy 2018, 118, 867–877.
6. Munters, W.; Meyers, J. An optimal control framework for utility-scale wind farm boundary layer interaction and layout optimization. J. Phys. Conf. Ser. 2018, 1037, 032021.
7. Simley, E.; Fleming, P.; King, J. Design and analysis of a dynamic wake steering controller with lidar-estimated wake behavior. Wind. Energy Sci. 2020, 5, 1275–1294.
8. Schreiber, J.; Causa, F.; Nanos, E.M. Deep reinforcement learning for wind turbine yaw control in a realistic turbulent environment. J. Phys. Conf. Ser. 2020, 1618, 022044.
9. Howland, M.F.; Lele, S.K.; Dabiri, J.O. Wind farm power optimization through wake steering. Proc. Natl. Acad. Sci. USA 2019, 116, 14495–14500.
10. Fleming, P.; Annoni, J.; Shah, J.J.; Vlahoulis, L.; Steele, A.; T Branlard, E.; Raach, S.; Heckmann, S.; Camp, T.R. Initial results from a field campaign of wake steering applied at a commercial wind farm–Part 1. Wind. Energy Sci. 2019, 4, 273–285.
11. Wu, Y.T.; Porté-Agel, F. Large-eddy simulation of neutral atmospheric boundary layer flow through a wind farm on complex terrain. Energies 2019, 12, 2949.
12. Jensen, N.O. A Note on Wind Generator Interaction; Report No. Risø-M-2411; Risø National Laboratory: Roskilde, Denmark, 1983.
13. NREL. FLORIS: A Controls-Oriented Wake Modeling Utility. 2023. Available online: https://github.com/NREL/floris/releases/tag/v3.4 (accessed on 17 May 2023).
14. Archer, C.L.; Vasel-Be-Hagh, A.; Yan, C.; Wu, S.; Pan, Y.; Brodie, J.; Maguire, E. Review and evaluation of wake loss models for wind energy applications. Appl. Energy 2018, 226, 1187–1207.
15. Yan, J.; Zhang, H.; Liu, Y.; Wang, Y.; Li, C.; Ma, L. Data-driven methods in wind energy: A review. Renew. Sustain. Energy Rev. 2022, 161, 112378.
16. Park, J.; Law, K.H. A data-driven, physics-informed proxy model for wind farm power prediction. Appl. Energy 2021, 285, 116403.
17. Tian, P.; Azari, A.H.; Bo, R. Gaussian process-based surrogate modeling for wind farm layout optimization. Energy 2022, 254, 124323.
18. Sedighizadeh, M.; Sedighizadeh, D.; Gholam-Rezaie, V.; Shokouhandeh, H.; Esmaeil-Zadeh, S. A review of reinforcement learning applications in active wake control of wind farms. Energy Convers. Manag. 2023, 277, 116616.
19. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, USA, 2006.
20. Koopman, B.O. Hamiltonian systems and transformation in Hilbert space. Proc. Natl. Acad. Sci. USA 1931, 17, 315–318.
21. Mezić, I. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn. 2005, 41, 309–325.
22. Budišić, M.; Mohr, R.; Mezić, I. Applied Koopmanism. Chaos Interdiscip. J. Nonlinear Sci. 2012, 22, 047510.
23. Schmid, P.J. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 2010, 656, 5–28.
24. Kutz, J.N.; Brunton, S.L.; Brunton, B.W.; Proctor, J.L. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems; SIAM: Philadelphia, PA, USA, 2016.
25. Williams, M.O.; Kevrekidis, I.G.; Rowley, C.W. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. J. Nonlinear Sci. 2015, 25, 1307–1348.
26. Rowley, C.W.; Dawson, S.T.M. Model reduction for flow analysis and control. Annu. Rev. Fluid Mech. 2017, 49, 387–417.
27. Susuki, Y.; Mezić, I. Koopman operator theory for power system data-driven analysis and control. IFAC-PapersOnLine 2018, 51, 516–521.
28. Korda, M.; Mezić, I. Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica 2018, 93, 149–160.
29. Proctor, J.L.; Brunton, S.L.; Kutz, J.N. Dynamic mode decomposition with control. SIAM J. Appl. Dyn. Syst. 2016, 15, 142–161.
30. Brunton, S.L.; Proctor, J.L.; Kutz, J.N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 2016, 113, 3932–3937.
31. Champion, K.; Lusch, B.; Kutz, J.N.; Brunton, S.L. Data-driven discovery of coordinates and governing equations. Proc. Natl. Acad. Sci. USA 2019, 116, 22445–22451.
32. Le Digabel, S. Algorithm 909: NOMAD: A blackbox optimization software. ACM Trans. Math. Softw. 2011, 37, 44.


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
