Machine and Deep Learning Approaches for Wind Turbine Model Parameter Prediction Within the Framework of IEC 61400-27 Standard

Jiménez-Ruiz, Javier; Honrubia-Escribano, Andrés; Gómez-Lázaro, Emilio

doi:10.3390/electronics15051104

Open Access Article

by Javier Jiménez-Ruiz *, Andrés Honrubia-Escribano and Emilio Gómez-Lázaro

Renewable Energy Research Institute, Department of Electrical, Electronic, Automatic and Communications Engineering of ETSII-AB, University of Castilla-La Mancha (UCLM), 02071 Albacete, Spain

* Author to whom correspondence should be addressed.

Electronics 2026, 15(5), 1104; https://doi.org/10.3390/electronics15051104

Submission received: 13 February 2026 / Revised: 3 March 2026 / Accepted: 5 March 2026 / Published: 6 March 2026

(This article belongs to the Topic Advances in Wind Energy Technology: 2nd Edition)

Abstract

The increasing penetration of renewable energy sources in power systems has intensified the need for accurate modelling of generation units under transient conditions. Despite the widespread adoption of the IEC 61400-27 generic wind turbine models, their parametrization remains a critical challenge. Classical optimization-based approaches are time-consuming, prone to convergence to local minima in the high-dimensional non-convex parameter space and require substantial expert knowledge. To address this gap, this paper proposes a machine learning- and deep learning-based methodology for estimating the key mechanical parameters of Type III wind turbines. A synthetic database of 10,000 active power responses was generated using DIgSILENT PowerFactory via its Python Application Programming Interface, covering a wide range of voltage dip conditions and mechanical parameter combinations. A comparative analysis of eight machine learning and deep learning algorithms for this task is performed. Validation is performed on both the synthetic dataset and two real manufacturer-validated wind turbine models. The results demonstrate that the proposed methodology enables fast and accurate identification of the mechanical parameters of wind turbines, maintaining reliable estimation performance even in the presence of measurement noise, thereby supporting its applicability in power system stability studies.

Keywords: DIgSILENT PowerFactory; IEC 61400-27; machine learning; power systems stability; wind energy

1. Introduction

In recent years, incorporating wind energy into power systems has become a vital aspect of the global energy sector, positioning it as one of the most promising renewable energy sources. The Global Wind Energy Council reports that 117 GW of wind power capacity was added worldwide in 2024, bringing the total installed capacity to 1136 GW. Most of this growth occurred in countries such as China, the United States, India, and Germany [1]. Initiatives to combat pollution in the European Union (EU) have resulted in stringent policies targeting reductions in greenhouse gas emissions. By 2030, the EU aims to have achieved 323 GW of installed wind power capacity, with 100 GW expected to come from offshore installations [2]. Wind energy has become a primary energy source in many nations, particularly in Europe. In Spain, for instance, wind power dominated the energy mix in 2023, emerging as the renewable source with the highest share in annual electricity production, meeting 22.2% of the country’s energy demand [3] (see Figure 1, in which wind energy appears in green with red lines). To successfully incorporate a large amount of wind energy into power systems, it is essential to carefully evaluate the impact of these plants on the electrical grid. One common approach to study the influence of these generation facilities on power systems is through transient stability analyses. These analyses are crucial for understanding power system behaviour during transient disturbances and ensuring that synchronism and voltage stability are maintained, regardless of the events affecting the grid.

To conduct such analyses, Transmission System Operators (TSOs) and Distribution System Operators (DSOs) rely on precise dynamic models of wind turbines (WTs), typically supplied by WT manufacturers, for integration into power systems. However, unlike the standardized synchronous generators used in traditional power plants, the majority of existing WT models lack broad standardization [4]. This creates a significant challenge, as a large variety of WT models is required to accurately assess how the power system will perform once all wind power plants (WPPs) are in operation. Although manufacturer-provided models offer an accurate representation of WT performance, they are often restricted by non-disclosure agreements, limiting access to the necessary details for all stakeholders to simulate WT behaviour effectively. For this reason, TSOs have found the use of detailed manufacturer models to be impractical for large simulations. To address this issue, ongoing efforts have focused on developing standardized models capable of replicating the transient behaviour of WTs. These models are designed to be accessible to all stakeholders without violating the confidentiality agreements of WT manufacturers. Such standardized models are widely used in power system simulations, while detailed models are typically required by grid operators only in specific cases, such as preventing sub-synchronous interactions [5]. The models proposed by the International Electrotechnical Commission (IEC) through IEC 61400-27 standard [6] are the most widely used WT generic models in the industry, along with those proposed by the Western Electricity Coordinating Council (WECC).

The use of generic models provides a standardized framework for simulating WTs from different manufacturers. Importantly, the IEC 61400-27 models are specifically designed to represent the dynamic behaviour of any WT independently of its manufacturer, making them the most suitable tool for generalization and cross-technology analysis. However, these models often require tuning a large number of parameters, sometimes exceeding a hundred, to accurately represent the turbine’s behaviour. Adjusting these parameters is crucial to ensure that simulations align with real-world performance. Yet, obtaining these parameters can be challenging due to empirical estimation difficulties or manufacturer confidentiality restrictions. The parametrization of WT models to accurately reproduce the real behaviour of an operating turbine still relies largely on manual tuning of model parameters. This approach requires substantial expert knowledge of the underlying model [7] and is extremely time-consuming, making it impractical unless an approximate range of the true turbine parameters is already known. Alternative approaches based on classical derivative-free optimization algorithms, such as Particle Swarm Optimization, Nelder–Mead, or the Kalman Filter, are also commonly used in power system studies [8,9,10], and have also been implemented through specialized tools such as the Simulink Design Optimization Toolbox [11]. However, these methods entail high computational costs, lack scalability, and are generally limited to fine-tuning parameters around an initial guess that is already reasonably accurate (see Table 1). More fundamentally, such optimization-based techniques face intrinsic limitations when applied to highly nonlinear dynamic systems such as WTs based on a doubly fed induction generator (DFIG).
The IEC 61400-27 models define a high-dimensional parameter space with over 100 tunable parameters, and the mapping from parameter values to observable outputs, such as the active power response during a voltage dip, is strongly nonlinear and non-convex. This non-convexity leads to multiple local minima, saddle points, and flat regions that frequently trap iterative, gradient-free methods and hinder convergence to the global optimum. In addition, the sensitivity of the system response to individual parameters is highly non-uniform: while some parameters have a dominant influence on post-fault oscillations, others exhibit subtler and coupled effects. These characteristics, together with the strong interaction between mechanical and electrical subsystems, result in a particularly challenging optimization landscape. In contrast, machine learning (ML) and deep learning (DL) approaches learn a direct, nonparametric mapping from the observable post-fault active power trajectory to the underlying model parameters using a large and diverse training dataset. This allows them to implicitly capture complex nonlinear dependencies and interaction effects without requiring an explicit formulation of the system dynamics, and enables parameter inference through a single forward pass, completely avoiding iterative search procedures and their associated convergence issues.

To overcome these limitations, this paper proposes an ML-based methodology to estimate key parameters. By analysing a WT response to transient conditions, the approach identifies the parameters that best represent its post-fault behaviour, enabling accurate simulations across various scenarios. Unlike conventional parameter identification techniques, which are often time-consuming, require extensive expert knowledge, and may converge to suboptimal solutions, ML models are capable of automatically learning complex, nonlinear relationships between the input signals and the underlying parameters. This substantially reduces the manual effort required in the tuning process, while improving consistency and reproducibility. Furthermore, the proposed methodology is inherently scalable: once trained, the ML models can be reused or retrained with new data, making it possible to efficiently adapt the parameter estimation process to different turbine types, operating conditions, and fault scenarios. The paper also analyses the differences among the various ML and DL models used in the study, highlighting disparities in both model accuracy and training times. The models used in this work were selected not only because they are among the most widely used in power system analysis, but also because they are highly optimized, allowing for fast training times. This ensures that the parameter estimation process remains both efficient and practical, particularly when large datasets or multiple simulations are involved. Moreover, the use of ML provides an additional advantage: it facilitates the integration of large amounts of simulation or measurement data, which would be difficult to exploit effectively with conventional approaches.

In summary, the method proposed in this paper provides significant support for WT modelling by combining the generalization capabilities of ML with the standardized structure of IEC 61400-27 models. The result is a robust and efficient methodology for accurately parametrizing WT models, enabling realistic and scalable simulations that are independent of the turbine manufacturer. The contributions of this paper with respect to previous works can therefore be summarized as follows:

A novel ML-based methodology is proposed and validated for the parametrization of WT models compliant with the IEC 61400-27 standard.
A comprehensive review of commonly used ML and DL models in power systems is provided, emphasizing their key features and applicability.
The performance of various ML and DL techniques is evaluated in terms of accuracy and precision, using realistic simulation data.

The novelty of the present work is therefore multi-dimensional. At the problem level, this is, to the best of the authors’ knowledge, the first study to formulate the parametrization of IEC 61400-27 mechanical modules as a supervised learning problem and to validate it against manufacturer-provided real turbine models. At the methodological level, the work introduces an original end-to-end pipeline, from automated simulation-based database generation to domain-informed signal segmentation and multi-output regression, that is specifically designed around the physics of DFIG post-fault dynamics. At the practical level, the work provides a systematic and quantitative comparison of eight ML/DL architectures, demonstrating that data-driven approaches achieve substantially superior accuracy at a fraction of the computational cost. Finally, by anchoring the training data to the IEC 61400-27 generic framework, the methodology is inherently manufacturer-agnostic, enabling generalization across turbine types without the need for manufacturer-specific data or models.

The paper is structured as follows: Section 2 discusses the use of ML and DL techniques in the study of power systems, highlighting their increasing importance and practicality in solving complex problems. Section 3 discusses the type of WT studied in this paper, providing insights on its working principles. Section 4 outlines the methodology used in this study, detailing the characteristics of the WT type under investigation, as well as the procedures and algorithms employed to obtain the results. Section 5 presents the results obtained by the algorithms studied in this work, using both a synthetic testing database and a real WT model to validate the models developed. Finally, Section 6 provides a summary of the key conclusions of the study.

2. ML in Power Systems

Data has long been essential for operating and planning power systems. However, with the digitalization of these systems and advances in computing power, the use of large databases in power system operations has become standard. Large databases are particularly useful for forecasting electricity demand, which is essential for TSOs to effectively plan the operation of power systems [12]. The extensive availability of data in the modern era has enabled the widespread use of ML and DL algorithms for the operation, planning and control of power systems and their components. The use of ML algorithms in power systems has been largely driven by the development of so-called smart grids, which are electric networks that use digital devices to control power flow between generation and loads and thereby improve the efficiency of the overall power system [13]. In smart grids, ML algorithms are widely used to predict loads and energy prices, support demand-side management, control power generation, and predict grid failures before they occur, among other tasks [14]. However, ML and DL algorithms are not limited to smart grids; they are applied across a wide range of fields in electrical engineering. One of their most common applications is load forecasting, a complex task influenced by multiple factors, such as holiday periods, temperature and even the day of the week [15]. Despite its complexity, numerous models based on different architectures have been developed to predict aggregate energy demand, sometimes forecasting several days in advance [16].

Another field where ML and DL algorithms have gained significant relevance is in the prediction of wind and solar resources [17,18,19]. In recent years, substantial efforts have been made to reduce the carbon footprint of power systems, leading to the increasing integration of renewable energy sources. However, renewable generation, particularly wind and solar, faces a major challenge: its inherently variable and sometimes unpredictable nature. Accurate forecasting of these resources is therefore critical for operational planning, grid stability, and overall energy system efficiency. Regarding wind resource prediction, numerous models have been developed to capture the temporal and spatial variability of wind [20]. However, it has been observed in several studies that the quality and diversity of wind speed data, including effects from extreme weather events or measurement errors, significantly influence model accuracy [21]. Consequently, appropriate data preprocessing and quality control steps are essential to ensure reliable predictions. For solar resource prediction, variability is strongly influenced by seasonal changes, cloud cover, and local weather patterns. A wide variety of ML-based models have been developed to predict solar irradiance and power output under these conditions, demonstrating that accurate forecasts can be achieved in different locations with reasonable margins of error [22]. These advances illustrate how digitalization, combined with ML and DL techniques, supports improved energy management, enabling more efficient scheduling, dispatch, and integration of renewable energy resources into modern power systems. Overall, ML-driven forecasting represents a crucial component in enhancing the reliability, efficiency, and flexibility of grids with a high share of renewables.

The prediction of failures in both the power grid and the equipment that comprises it is another field where ML and DL algorithms have proven to be highly useful tools [23,24]. Within this field of study, a wide variety of works have addressed how to ensure the stability of the power system. As an example, Ref. [25] proposes a model to predict line trip faults, achieving an accuracy of up to 97%. Other works, such as [26], detail both the procedure for obtaining data and its preprocessing to maximize effectiveness in predicting faults that may occur in the power system. The use of these tools is of particular interest to TSOs, especially in power systems where the goal is to integrate a large amount of renewable energy without compromising the stability of the power grid [27]. It is also worth emphasizing the role that ML and DL algorithms currently play in the field of maintenance and fault prediction in WTs. Determining potential failures in a WT is a task of vital importance, as some of these failures can completely or partially hinder the proper operation of a WT [28,29].

Additionally, these types of algorithms are beginning to be used even in the study of power grid behaviour during transient events, such as short circuits or voltage dips. The use of these tools for conducting such studies has proven to be of great interest, as they are capable of rivalling the classic simulations designed for this purpose [30]. It should be noted that the study of power systems, especially during transient events, relies on the use of specific models. These models may be insufficient to describe the particular behaviour of a power system during complex transient situations. This is why combining ML and DL algorithms with traditional simulation tools can not only help improve the accuracy of these simulations but also significantly reduce simulation times [31]. The applications discussed in this section show that the use of ML and DL algorithms for the analysis of power systems is becoming increasingly important. These tools are therefore expected to be a key pillar in the transition to stable power systems free of greenhouse gas emissions.

3. Type III WTs

Type III WTs are those that utilize a doubly fed induction generator (DFIG), with the rotor connected to the grid through a power converter (typically rated at 20% to 25% of the WT’s nominal power [32]). The presence of this converter is of critical importance for the operation of this type of WT, as it allows for the injection of an excitation current into the rotor, thereby aiding in the control of the current flowing through it. This enables effective regulation of the rotor’s rotational speed. Figure 2 shows a schematic of this type of WT, including the crowbar system, which protects the WT in the event of voltage dips.

Type III WTs offer a significant number of advantages, with their primary strengths being their exceptional ability to maximize wind energy capture efficiency and their comparatively lower cost when compared with WTs employing synchronous generators. This has led DFIG-based WTs to dominate the market over the past decade (especially for onshore applications) [33]. In some countries, such as Spain, this technology is notably the most prevalent, accounting for up to 70% of all WTs in the country [34]. Due to the widespread presence of this type of WT in power systems worldwide, the development of models capable of accurately representing the behaviour of these WTs under transient conditions becomes critically important to ensure the stability of the grid to which they are connected. The IEC 61400-27 standard establishes a generic modelling framework for all main WT types, from Type I to Type IV, and provides a manufacturer-independent representation of WT dynamic behaviour. For Type III (DFIG-based) turbines, the standard specifies a generic model composed of 16 interconnected submodules, including the mechanical (drive train) module, which is implemented using a two-mass representation. This modelling choice reflects the standard’s design philosophy of capturing the dominant electromechanical dynamics of the turbine with a minimal set of physically interpretable parameters, while maintaining sufficient accuracy for power system studies. While the overall structure of the IEC 61400-27 generic model is fixed, the adjustment of its parameters is of critical importance, as different parameter sets can lead to markedly different dynamic responses. Consequently, knowing the actual values of these parameters is essential for accurately simulating the behaviour of a specific WT. 
However, these parameters are typically not publicly available, as they constitute confidential manufacturer information, which significantly complicates the accurate parametrization of WT models in practice. Among the 16 submodules defined in the standard, the mechanical module is of particular relevance, as it governs the dynamics of the drive train and strongly influences the WT response immediately following a fault. In particular, it largely determines the frequency, amplitude, and damping of the post-fault power oscillations [35]. Mismatches in these parameters may result in unrealistic oscillatory behaviour, distorted active power responses, inaccurate assessment of transient stability margins, and erroneous evaluation of fault recovery dynamics, thereby compromising power system studies and control design. The use of IEC 61400-27 generic models as the basis for the ML and DL models developed in this work provides a key advantage: since the standard is explicitly designed to be manufacturer-agnostic, the resulting data-driven models are not tied to a proprietary or turbine-specific representation. Instead, they are exposed to a broad family of physically valid DFIG responses, which enhances their ability to generalize across turbines from different manufacturers without requiring retraining from scratch. This manufacturer-independent property constitutes one of the main motivations for selecting the IEC 61400-27 framework as the foundation of the proposed methodology. To represent the drive train dynamics, various mechanical models exist, including one-mass to six-mass representations. The IEC 61400-27 standard adopts the two-mass model (shown in Figure 3), as it offers a favourable trade-off between modelling accuracy and computational simplicity. This model is defined by four key parameters: the inertia time constant of the WT rotor ($H_{wtr}$), the inertia time constant of the generator ($H_{gen}$), the drive train stiffness ($k_{drt}$), and the drive train damping ($c_{drt}$). As discussed in [36], variations in these parameters directly affect the WT response following voltage dips, making their accurate estimation essential to ensure that simulated responses reliably reproduce real post-fault behaviour. The equations representing the behaviour of the two-mass mechanical model can be expressed as:

\begin{equation}
\begin{cases}
2 H_{wtr} \dfrac{d\omega_{wtr}}{dt} = T_{aero} - c_{drt} \cdot (\omega_{wtr} - \omega_{gen}) - k_{drt} \cdot (\theta_{wtr} - \theta_{gen}) \\
2 H_{gen} \dfrac{d\omega_{gen}}{dt} = -T_{gen} + c_{drt} \cdot (\omega_{wtr} - \omega_{gen}) + k_{drt} \cdot (\theta_{wtr} - \theta_{gen}) \\
\dfrac{d\theta_{wtr}}{dt} = \omega_{wtr}, \quad \dfrac{d\theta_{gen}}{dt} = \omega_{gen}
\end{cases}
\tag{1}
\end{equation}
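To make the role of the four parameters concrete, the following minimal sketch integrates Equation (1) numerically with a fixed-step fourth-order Runge-Kutta scheme. It is only an illustration and not the implementation used in DIgSILENT PowerFactory: the function names, the numerical parameter values, the constant torques and the initial speed mismatch are all assumptions chosen for this example, and the angle equations are taken exactly as written in Equation (1).

```python
import numpy as np


def two_mass_rhs(x, h_wtr, h_gen, k_drt, c_drt, t_aero, t_gen):
    """Right-hand side of Equation (1); x = [w_wtr, w_gen, th_wtr, th_gen]."""
    w_wtr, w_gen, th_wtr, th_gen = x
    # Torque transmitted by the shaft: damping term plus stiffness term.
    t_shaft = c_drt * (w_wtr - w_gen) + k_drt * (th_wtr - th_gen)
    return np.array([
        (t_aero - t_shaft) / (2.0 * h_wtr),
        (-t_gen + t_shaft) / (2.0 * h_gen),
        w_wtr,
        w_gen,
    ])


def simulate_two_mass(x0, params, t_end=5.0, dt=1e-3):
    """Integrate the model with a fixed-step fourth-order Runge-Kutta scheme."""
    n_steps = int(round(t_end / dt))
    states = np.empty((n_steps + 1, 4))
    states[0] = x0
    for i in range(n_steps):
        x = states[i]
        k1 = two_mass_rhs(x, **params)
        k2 = two_mass_rhs(x + 0.5 * dt * k1, **params)
        k3 = two_mass_rhs(x + 0.5 * dt * k2, **params)
        k4 = two_mass_rhs(x + dt * k3, **params)
        states[i + 1] = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return states


# Hypothetical parameter values, chosen only for illustration.
params = dict(h_wtr=4.0, h_gen=1.0, k_drt=100.0, c_drt=1.5,
              t_aero=1.0, t_gen=1.0)
# Start at the equilibrium shaft twist (t_aero / k_drt) with a small
# speed mismatch between rotor and generator to excite the oscillation.
x0 = np.array([1.00, 1.02, params["t_aero"] / params["k_drt"], 0.0])
states = simulate_two_mass(x0, params)
speed_mismatch = states[:, 0] - states[:, 1]
```

With balanced torques, the mismatch between the two speeds oscillates and decays. In this sketch, raising `k_drt` increases the oscillation frequency and raising `c_drt` speeds up the decay, which is the kind of signature the learning models are expected to exploit.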

In the following sections, the methodology used to characterize the mechanical parameters of a Type III WT through the application of ML and DL algorithms is discussed. The obtained models are tested to assess their ability to accurately identify the mechanical parameters of various WTs, thereby determining their validity. In addition, the developed models are evaluated on a separate synthetic testing database to verify that they generalize rather than overfit the training data.

4. Research Methodology

As discussed in Section 1, while generic models for WT simulation are available, they often require a large number of precisely tuned parameters to accurately represent WT behaviour. However, obtaining these parameters can be challenging due to manufacturers’ confidentiality restrictions. To address this issue, this paper proposes a methodology for estimating key parameters essential for simulating WTs in transient conditions, using ML and DL algorithms, thereby facilitating the accurate modelling of real WTs. The subsequent subsections outline the research methodology, focusing on the critical aspects of this work. Figure 4 provides a graphical summary of the methodology employed to obtain the results presented in this paper.

4.1. Creation of Synthetic Database

The first step in training ML and DL algorithms capable of determining the mechanical parameters of a WT is to have a database that reflects active power responses of WTs with different mechanical parameters under voltage dip conditions. Since obtaining experimental databases that are sufficiently extensive and varied is practically impossible, the approach taken was to generate a synthetic database. To generate this synthetic database, it is first necessary to have an appropriate tool for simulating WTs using the generic models defined by the IEC 61400-27 standard. In this case, the software DIgSILENT PowerFactory was chosen, as it is one of the most widely used tools both at the level of TSOs and DSOs and in academic settings [37]. This software also includes the WT models defined according to the IEC 61400-27 standard, making it an ideal tool for use in this study. However, although DIgSILENT PowerFactory is capable of simulating WTs under transient conditions using the generic models defined in the IEC 61400-27 standard, manually creating a database that records the responses of WTs with different values of the mechanical module parameters is impractical. Nevertheless, DIgSILENT PowerFactory provides an Application Programming Interface (API) that allows users to interact with the program, thereby enabling the automation of tasks within DIgSILENT PowerFactory through the use of Python (version 3.11) scripts [38].

For the creation of the synthetic database, a Python script was developed to simulate the response of a Type III WT (modelled using the generic model defined by the IEC 61400-27 standard) in DIgSILENT PowerFactory via its API. The script generates the response of the WT to random voltage dips while randomly varying the mechanical model parameters using uniform sampling within ranges bounded by the most commonly reported values in the literature [36,39,40,41,42]. These parameter ranges are based on values reported for commercially deployed DFIG-based WTs and therefore represent physically realistic operating conditions, ensuring that no numerical instability arises from pathological parameter configurations. The simulations are performed using a time step of 1 ms, which corresponds to the standard resolution employed in RMS power system simulations. Since electromagnetic transients are inherently averaged out in RMS simulations, the relevant system dynamics, namely, the post-fault mechanical oscillations driven by rotor and generator inertia and shaft stiffness, evolve on a much slower time scale. Consequently, reducing the simulation step below 1 ms would not provide additional information relevant to the parameter estimation task, while significantly increasing computational cost.

The voltage dips used for the construction of the database were generated through a script that simulates voltage dips with characteristics representative of those typically observed in the field [43,44,45], thereby ensuring that the performed simulations accurately reflect the behaviour that can be encountered in real-world operating conditions. Figure 5 illustrates five of the voltage dips employed in the database generation, showing variations in both depth and duration. This ensures that the models to be trained are not biased toward a single type of voltage dip. The database consists of 10,000 active power responses, each corresponding to simulations with a total duration of 10 s and a simulation time step of 1 ms. The results of these simulations are stored in a database along with the values of the mechanical model parameters associated with each simulation. This database is subsequently used to train the ML and DL algorithms employed in this study after the data is processed, a point elaborated upon in the following section.
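The sampling stage of such a database generation script could be sketched as follows. This is a hedged illustration only: the numerical ranges are placeholders rather than the ranges used in the study, the uniform sampling of dip depth and duration is an assumption (the text states uniform sampling only for the mechanical parameters), and the call that would pass each case to the simulator is deliberately omitted because it depends on the PowerFactory project setup.

```python
import numpy as np

# Placeholder ranges for illustration only; NOT the ranges used in the study.
PARAM_RANGES = {
    "H_wtr": (2.0, 6.0),
    "H_gen": (0.5, 1.5),
    "k_drt": (50.0, 150.0),
    "c_drt": (0.5, 3.0),
}
# Placeholder dip characteristics (residual voltage in pu, duration in s).
DIP_RANGES = {
    "residual_voltage": (0.1, 0.9),
    "duration": (0.1, 0.5),
}
ALL_RANGES = {**PARAM_RANGES, **DIP_RANGES}


def sample_cases(n_cases, seed=0):
    """Draw n_cases independent simulation cases by uniform sampling."""
    rng = np.random.default_rng(seed)
    cases = []
    for _ in range(n_cases):
        case = {name: float(rng.uniform(low, high))
                for name, (low, high) in ALL_RANGES.items()}
        cases.append(case)
    return cases


# One dictionary per simulation; each would be handed to the simulator and
# stored together with the resulting active power response.
cases = sample_cases(10_000)
```

Fixing the seed makes the set of cases reproducible, which helps when a simulation batch has to be regenerated.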

It is important to emphasize that the use of the generic WT models defined in the IEC 61400-27 standard does not limit the generalization capability of the proposed ML/DL models. On the contrary, the purpose of the IEC 61400-27 models is precisely to provide standardized representations that capture the dynamic behaviour of WTs independently of the manufacturer. By relying on this standard, the trained models are not tied to a specific implementation or proprietary structure, but are instead exposed to dynamics that have been designed to faithfully reproduce the response of a wide range of real WTs. Therefore, training on IEC 61400-27 generic models ensures that the developed ML/DL approaches are extensible and applicable to turbines from different manufacturers, rather than being biased toward a single technology-specific representation.

4.2. Data Processing

Once the synthetic database was generated, it needed to be adapted to facilitate efficient training of the ML and DL algorithms employed in this study. As discussed in Section 4.1, the dynamic simulations were performed with a time step of 1 ms to ensure maximal numerical accuracy, resulting in 10,000 time steps per sample and a correspondingly large dataset that would substantially increase training times. To mitigate this, the signals were downsampled to a 10-ms time step, significantly reducing the dataset size without compromising critical information, as illustrated in Figure 6. This approach is justified as the study specifically addresses post-fault mechanical oscillations, predominantly dictated by the rotor and generator inertias and shaft dynamics, which manifest at comparatively low frequencies. Consequently, high-frequency components, typical of electrical transients, exert negligible influence on the mechanical response, ensuring that the downsampling process retains all information essential for accurate parameter estimation.
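A minimal sketch of the downsampling step is given below. The text does not state how the reduction from a 1 ms to a 10 ms step was implemented, so plain decimation (keeping every tenth sample, without an anti-aliasing filter) is assumed here, and the test signal is a hypothetical damped oscillation rather than simulation output.

```python
import numpy as np


def downsample(signal, factor=10):
    """Keep every `factor`-th sample (plain decimation, no filtering)."""
    return np.asarray(signal)[::factor]


dt = 1e-3                       # original simulation step: 1 ms
t = np.arange(10_000) * dt      # 10,000 samples covering 10 s
# Hypothetical post-fault-like response: slow, damped oscillation around 1 pu.
p = 1.0 + 0.05 * np.exp(-0.5 * t) * np.sin(2.0 * np.pi * 1.5 * t)

t_ds = downsample(t)            # 1,000 samples at a 10 ms step
p_ds = downsample(p)
```

A 10 ms step corresponds to a sampling rate of 100 Hz and hence a Nyquist frequency of 50 Hz. Plain decimation is adequate only if the signal content lies well below that limit, which is the assumption made in the text for the mechanical oscillations.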

Another point worth emphasizing is the segmentation of the signals carried out. Initially, the complete active power response of WTs to a voltage dip can be considered as input data for the models. However, this approach presents significant drawbacks. It is important to note that using the entire signal would neglect the influence of other model parameters on the WT’s behaviour during a voltage dip, which would be illogical. Therefore, to train the ML and DL algorithms, a segment of the system response that primarily depends on the values of the mechanical model parameters, rather than other parameters of the WT model, must be extracted. As discussed in [36], the mechanical model parameters have the greatest influence on the post-fault response of WTs. Consequently, this post-fault period of the response is used to train the ML and DL algorithms. As shown in Figure 7, the influence of the mechanical parameters is much greater in the post-fault response of the system, where other model parameters have lesser impact. This demonstrates that the post-fault response of WTs is the ideal segment of the response to feed into the algorithms latter discussed in Section 4.3. In Figure 7, the region corresponding to the post-fault response is shaded. The data within this region serve as input to the ML and DL models used in this study. To automatically identify the post-fault portion of each active power signal, a gradient-based detection script was developed. The algorithm computes the first-order finite difference of the discrete-time active power signal,

Δ P (k) = P (k + 1) - P (k)

, and identifies the fault onset index

k_{start}

as the first time step at which

Δ P (k)

falls below a negative threshold

- θ

, corresponding to the sharp power drop caused by the voltage dip. The fault clearance index

k_{end}

is then detected as the first time step after

k_{start}

at which

Δ P (k)

exceeds a positive threshold

+ θ

, indicating active power recovery following voltage restoration. The detection threshold is defined as a fixed fraction of the signal dynamic range,

θ = α \cdot (\max (P) - \min (P))

, with

α = 0.05

, ensuring robustness across simulations with different pre-fault power levels. Once

k_{end}

is identified, the post-fault segment

[k_{end}, k_{end} + N]

is extracted and stored in a new database, where N corresponds to a fixed window of 500 samples. This window length is sufficient to capture the relevant post-fault mechanical oscillations while excluding steady-state behavior, thereby improving the accuracy of the ML and DL models by focusing exclusively on dynamics influenced by the mechanical parameters and significantly reducing the input vector length, which in turn accelerates the training process.
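The detection procedure described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function name `extract_post_fault` and the rectangular toy dip are hypothetical, and the segment is taken as exactly 500 samples starting at the clearance index.

```python
def extract_post_fault(p, alpha=0.05, window=500):
    """Return (k_start, k_end, segment) for one active power signal `p`."""
    theta = alpha * (max(p) - min(p))                   # detection threshold
    dp = [p[k + 1] - p[k] for k in range(len(p) - 1)]   # first-order difference
    # Fault onset: first step where the power drops faster than -theta.
    k_start = next((k for k, d in enumerate(dp) if d < -theta), None)
    if k_start is None:
        raise ValueError("no fault onset found")
    # Fault clearance: first later step where the power rises faster than +theta.
    k_end = next((k for k in range(k_start + 1, len(dp)) if dp[k] > theta), None)
    if k_end is None:
        raise ValueError("no fault clearance found")
    return k_start, k_end, p[k_end:k_end + window]

# Toy signal on a 10-ms grid: pre-fault power, rectangular dip, recovery.
p = [1.0] * 100 + [0.2] * 50 + [1.0] * 600
k_start, k_end, segment = extract_post_fault(p)
print(k_start, k_end, len(segment))  # 99 149 500
```

Because the threshold scales with the range of each signal, the same value of α works for records with different pre-fault power levels.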

Another important point in data processing is the scaling of the mechanical parameters. The problem addressed in this study is a regression problem, since the values of four numerical parameters must be predicted from the response of a WT to a voltage dip. The loss function chosen for this task is the mean squared error (MSE), given in Equation (2) (where

Y_{i}

is the target value and

\hat{Y_{i}}

is the value predicted by the model). The ranges of the mechanical model parameters presented in Table 2 show that the parameters do not share the same order of magnitude. It is therefore of utmost importance to scale these parameters in order to improve the convergence of the algorithms used in this work; otherwise, the models are unable to correct prediction errors efficiently, which prevents accurate models from being obtained (further information on the importance of data scaling can be found in [46]). A standard scaler was used, which transforms the data based on the mean (

μ

) and the standard deviation (

σ

) of each parameter, as represented in Equation (3). Additionally, the data used in this study are divided into a training set and a testing set following an 80-20 ratio, so that overfitting can be detected on data not used for training.

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(Y_{i} - \hat{Y_{i}})}^{2}

(2)

z = \frac{x - μ}{σ}

(3)
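Equations (2) and (3) and the 80-20 split can be illustrated with a short NumPy sketch. The parameter ranges below are invented for illustration and are not those of Table 2; fitting the scaler on the training set only is an assumption, as is the helper name `inverse_scale`.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical targets: 1000 samples x 4 mechanical parameters with very
# different orders of magnitude (illustrative ranges, not those of Table 2).
Y = rng.uniform(low=[2.0, 0.5, 50.0, 0.5], high=[8.0, 1.5, 500.0, 5.0],
                size=(1000, 4))

# 80-20 split on shuffled indices.
idx = rng.permutation(len(Y))
n_train = int(0.8 * len(Y))
Y_train, Y_test = Y[idx[:n_train]], Y[idx[n_train:]]

# Equation (3): mean and standard deviation are taken from the training set
# only (an assumption), so nothing from the testing set leaks into the scaler.
mu, sigma = Y_train.mean(axis=0), Y_train.std(axis=0)
Z_train = (Y_train - mu) / sigma
Z_test = (Y_test - mu) / sigma

def mse(y, y_hat):
    """Equation (2): mean squared error."""
    return float(np.mean((y - y_hat) ** 2))

def inverse_scale(z):
    """Map scaled predictions back to physical units."""
    return z * sigma + mu

print(Z_train.mean(axis=0).round(6), Z_train.std(axis=0).round(6))
```

After scaling, each parameter contributes to the MSE on a comparable footing, which is the reason given above for scaling targets of different magnitude.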

4.3. ML and DL Model Training

As discussed in Section 2, ML and DL tools are playing an increasingly key role in the field of power systems. In this paper, some of the most common algorithms are used to determine the characteristic mechanical parameters of Type III WTs based on their response to voltage dips. By developing these models, the modelling of this type of WT is improved, ensuring that their real behaviour during grid transient conditions aligns with simulations. This, in turn, aids in more accurate and precise planning of their contribution to maintaining grid stability. In the following subsections, the ML and DL algorithms used in this paper are discussed, along with considerations regarding their application and use in the field of power systems. All models used in this paper are trained using both a CPU (Central Processing Unit), Intel(R) Core(TM) i7-5820K CPU @ 3.30 GHz, and a GPU (Graphics Processing Unit), NVIDIA T4 in order to compare the training times for each model on the two devices.

4.3.1. Gradient Boosting

Gradient boosting algorithms are some of the most widely used ML algorithms due to their high accuracy and fast training times [47], which have made them the preferred choice for dealing with problems where a large amount of data is available in tabular form. Boosting algorithms are also of special importance for the study of power systems, especially for the forecasting of electricity generation and demand, including wind and solar resource forecasting [48]; an extensive review on boosting algorithms in energy research can be found in [49]. The operational principle of this type of algorithm is to combine a series of weak predictors, which, through their combination, form a model capable of capturing complex behaviours (these kinds of algorithm are defined as ensemble learning algorithms) [50]. These algorithms are based on the iterative creation of decision trees, with each new decision tree correcting the errors of the preceding one. As described in Equation (4), this type of algorithm combines a number of weak learners (M), each correcting the errors of the previous one, to create a final model (H). Each of these models is distinct (

h_{m}

), depending on the input variables (X) and the respective weight vector (

w_{m}

), each weak learner contribution being controlled by the learning rate (v).

H (X) = \sum_{m = 1}^{M} v \cdot h_{m} (X, w_{m})

(4)
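To make Equation (4) concrete, the sketch below builds a boosted model from one-split regression stumps on a toy one-dimensional problem. It is a didactic stand-in under squared-error loss, not the procedure implemented in XGBoost, LightGBM or CatBoost; the names `fit_stump` and `fit_boosting`, the toy sine target and the values of M and v are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Weak learner h_m: one-split regression stump on a 1-D feature."""
    best = None
    for t in np.unique(x)[:-1]:                 # candidate thresholds
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda q: np.where(q <= t, lo, hi)

def fit_boosting(x, y, M=50, v=0.1):
    """Build H(X) = sum_m v * h_m(X); each h_m fits the current residual."""
    learners, pred = [], np.zeros_like(y)
    for _ in range(M):
        h = fit_stump(x, y - pred)              # correct the previous errors
        learners.append(h)
        pred = pred + v * h(x)                  # contribution scaled by v
    return lambda q: sum(v * h(q) for h in learners)

x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x)
H = fit_boosting(x, y)
print(float(np.mean((y - H(x)) ** 2)))          # training MSE after M rounds
```

A smaller learning rate v makes each stump's correction more cautious, so more weak learners M are needed to reach the same training error.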

There are numerous implementations of this type of algorithm, each with its own distinctive characteristics. In this study, three different gradient boosting packages are employed and their results compared to determine which yields the best performance. The packages chosen are the most widely used in data science, namely XGBoost, LightGBM, and CatBoost [49]. XGBoost (first proposed in [51]) is one of the most extensively used boosting algorithms today, as it has proven highly effective in solving a wide variety of problems [47]. This algorithm constructs decision trees in a level-wise (horizontal) manner, identifying the optimal splits within each decision tree. LightGBM (first proposed in [52]) is another popular algorithm, developed by Microsoft with the aim of creating a highly efficient boosting algorithm capable of building models that can handle large volumes of data in the shortest possible time without sacrificing accuracy. Although this algorithm has some similarities with XGBoost, there are significant differences in the way decision trees are constructed. The primary distinction is that LightGBM builds decision trees in a leaf-wise (vertical) manner rather than a level-wise (horizontal) one, thereby enabling the algorithm to efficiently process larger amounts of information [50]. The differences between LightGBM and XGBoost are highlighted and discussed in [53]. CatBoost (first proposed in [54]) is another frequently deployed boosting algorithm in data science. This algorithm constructs symmetric (oblivious) decision trees, meaning that the same splitting condition is applied to all nodes at a given depth of a tree. This algorithm offers significant advantages when handling categorical data; however, it is also extensively used when working exclusively with numerical data due to its optimization for execution on GPUs. An extensive technical comparative analysis of these algorithms can be found in [50].

In this paper, XGBoost, LightGBM, and CatBoost models were employed with two main objectives: first, to assess which algorithm best reproduces the active power response of WTs during voltage dips, and second, to compare the training efficiency of each method. Since the predictive capability of gradient boosting models is highly sensitive to the choice of hyperparameters, an adequate tuning process was required. To this end, a randomized search strategy was adopted [55], in which candidate hyperparameter combinations were randomly sampled from predefined ranges (Table 3). Compared to an exhaustive grid search, randomized search allows the exploration of a wider variety of configurations while significantly reducing computational cost, which is critical given the large dataset and multiple models considered. For each sampled configuration, cross-validation was performed to evaluate predictive accuracy and stability, and the best-performing candidates were further analysed to ensure consistency in both error reduction and convergence speed. The final selection of hyperparameters balanced two aspects: minimizing the prediction error with respect to measured active power signals, and keeping training times within practical limits. This systematic tuning procedure ensured that the comparison across models was fair and that the reported performance improvements could be attributed to the intrinsic characteristics of each algorithm rather than suboptimal parameter choices.
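The tuning loop can be outlined with the standard library alone. Everything specific in this sketch is hypothetical: the search space merely stands in for the ranges of Table 3, the number of folds and iterations is assumed, and `toy_score` replaces the actual training and validation of a boosting model.

```python
import random

# Hypothetical search space; the ranges actually used are those of Table 3.
SPACE = {
    "learning_rate": lambda r: 10 ** r.uniform(-3, -0.5),   # log-uniform draw
    "max_depth": lambda r: r.randint(3, 10),
    "n_estimators": lambda r: r.randrange(100, 1001, 100),
}

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold = n // k
    for i in range(k):
        val = list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
        held_out = set(val)
        yield [j for j in range(n) if j not in held_out], val

def randomized_search(score_fn, n_samples, n_iter=20, k=5, seed=0):
    """Return (best_params, best_cv_error) over n_iter random draws.

    score_fn(params, train_idx, val_idx) must return a validation error.
    """
    r = random.Random(seed)
    best = (None, float("inf"))
    for _ in range(n_iter):
        params = {name: draw(r) for name, draw in SPACE.items()}
        errors = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        cv_error = sum(errors) / len(errors)
        if cv_error < best[1]:
            best = (params, cv_error)
    return best

def toy_score(params, train_idx, val_idx):
    """Stand-in for 'fit on train_idx, return the error on val_idx'."""
    return abs(params["learning_rate"] - 0.05) + abs(params["max_depth"] - 6) / 10

best_params, best_error = randomized_search(toy_score, n_samples=1000)
print(best_params, best_error)
```

With 20 draws and 5 folds, 100 model fits are needed, whereas a grid with only five values per hyperparameter would already require 125 configurations before cross-validation.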

4.3.2. Support Vector Machine (SVM)

SVMs are a highly powerful algorithm for handling small to medium-scale datasets, capable of being applied to both regression and classification tasks interchangeably. This type of algorithm was first proposed in [56], and despite initially being met with scepticism, it has become one of the most widely used ML algorithms in the world. When applying SVMs to regression problems, the task reduces to fitting a function within a tolerance margin while simultaneously minimizing the error as much as possible. The behaviour of SVMs is primarily defined by two key parameters: the C parameter and the

ϵ

parameter. The

ϵ

parameter determines the tolerance margin for the function that fits the data used by the model. If a prediction falls within this tolerance margin, it is not penalized; if it falls outside this range, it is penalized in proportion to the excess (samples lying on or outside this tolerance band are referred to as support vectors). This can be clearly seen in the loss function, which is given in Equation (5). The C parameter, in turn, sets the trade-off between the model’s complexity and the extent to which deviations beyond the margin are tolerated. Specifically, higher values of C emphasize minimizing errors, resulting in more complex models. Conversely, lower values of C lead to simpler models that may produce larger errors but generalize better.

L_{ϵ} (Y, \hat{Y}) = \max (0, | Y - \hat{Y} | - ϵ)

(5)
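Equation (5) is simple enough to check by hand. In the sketch below, the function names and the numerical values are hypothetical and serve only to show how the tolerance band behaves.

```python
def eps_insensitive_loss(y, y_hat, eps):
    """Equation (5): zero inside the tolerance band, linear outside it."""
    return max(0.0, abs(y - y_hat) - eps)

def mean_eps_insensitive_loss(ys, y_hats, eps):
    """Average of Equation (5) over a set of predictions."""
    return sum(eps_insensitive_loss(y, yh, eps) for y, yh in zip(ys, y_hats)) / len(ys)

eps = 0.1
print(eps_insensitive_loss(1.0, 1.05, eps))  # inside the band: no penalty
print(eps_insensitive_loss(1.0, 1.30, eps))  # outside the band: about 0.2
```

Only the second prediction contributes to the loss, so small deviations within the band do not influence the fitted function.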

SVMs have been extensively applied in the field of power systems, particularly in areas such as power quality analysis [57,58], detection of voltage dips and fault causes [59], power system stability during transient events [60] and electric load forecasting [61], among others. In this study, SVMs are used to evaluate their effectiveness in predicting the parameters of the mechanical model of WTs, comparing the results with those obtained from the boosting algorithms discussed in the previous section. The hyperparameters of this model were obtained using the same process as with the gradient boosting algorithms, employing a random search, whose results were subsequently refined (Table 3).

While SVMs are powerful tools for small-to-medium scale datasets and offer strong generalization through their margin-based formulation, their representational capacity is inherently limited when the input dimensionality is very high. This motivates the exploration of more complex architectures, which are designed to learn hierarchical feature representations directly from high-dimensional sequential data without requiring manual feature engineering.

4.3.3. Neural Networks (NNs)

NNs have a long history, with the foundational concepts being introduced in the 1940s and 1950s [62], and the algorithms governing their operation being developed, expanded, and optimized during the 1980s [63]. Although NNs saw limited use for an extended period, in recent years they have gained significant prominence in both academia and industry due to their remarkable adaptability and ability to solve a wide range of problems, including time series forecasting, image classification and natural language processing, among others [64]. This versatility has made them widely used tools in power system analysis, where NNs have been applied to energy demand and generation prediction [65], wind and solar resource prediction [66], energy management [67], economic dispatch [68] and frequency analysis and control [69]. A large number of NN types exist, each with an architecture best suited to certain problems. A comprehensive review of the types of NNs and architectures most commonly used in academia and industry can be found in [70]. In this paper, four of the most widely used NN architectures are tested, namely the multi-layer perceptron (MLP), recurrent neural network (RNN), long short-term memory (LSTM) and gated recurrent unit (GRU). All the DL models used in this paper are trained for 100 epochs, using batches of 128 elements.

MLPs are among the simplest types of NNs; however, they allow for the real-time or online learning of complex, non-linear models [71]. The architecture of an MLP consists first of an input layer that receives the input data for the model. The data are then processed by passing through one or more fully connected hidden layers, with the result being returned through the output layer (see Figure 8). The training process of this type of NN is divided into two distinct stages: forward propagation and backpropagation. The weights and biases of each layer are initialised randomly before training; during forward propagation, each layer then produces its output according to Equation (6) (where

α^{l}

represents the output of each layer,

σ

represents the activation function,

W^{l}

represents the weight matrix at the layer, and

b^{l}

represents the bias vector at the layer) until the output results are obtained in the output layer. Once forward propagation is completed, the error in the output layer is calculated by comparing the difference between the predicted values and the actual values. Based on this result, the errors in each hidden layer are computed according to Equation (7), where

δ^{l}

represents each layer error. Subsequently, the gradients are calculated, and the weights and biases are updated to minimize the error. This process is repeated for all samples and for a user-defined number of iterations (epochs) to achieve the lowest possible error and, consequently, the most accurate model. In [72], a detailed explanation of the forward and backpropagation algorithms is provided, with particular emphasis on the matrix calculations involved.

α^{l} = σ (W^{l} α^{l - 1} + b^{l}) = σ (z^{l})

(6)

δ^{l} = ({(W^{l + 1})}^{T} δ^{l + 1}) ⊙ σ^{'} (z^{l})

(7)
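Equations (6) and (7) can be traced on a very small network. The sketch below is an illustration under stated assumptions: the layer sizes are toy values rather than those of the MLP used in the study, ReLU is used in the hidden layer with a linear output layer, and a single plain gradient-descent update replaces the optimizer.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Apply Equation (6) layer by layer; return pre-activations and outputs."""
    zs, acts = [], [x]
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ acts[-1] + b
        last = l == len(weights) - 1
        zs.append(z)
        acts.append(z if last else relu(z))      # linear output layer (assumption)
    return zs, acts

def backward(y, zs, acts, weights):
    """Apply Equation (7) from the output layer back; return the gradients."""
    delta = 2.0 * (acts[-1] - y) / y.size        # derivative of the MSE at the output
    grads_W, grads_b = [], []
    for l in reversed(range(len(weights))):
        grads_W.insert(0, np.outer(delta, acts[l]))
        grads_b.insert(0, delta)
        if l > 0:                                # error of the previous layer
            delta = (weights[l].T @ delta) * (zs[l - 1] > 0)
    return grads_W, grads_b

rng = np.random.default_rng(0)
sizes = [5, 8, 4]                                # toy layer sizes, not those of the paper
weights = [rng.normal(scale=0.5, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
x, y = rng.normal(size=5), rng.normal(size=4)

zs, acts = forward(x, weights, biases)
grads_W, grads_b = backward(y, zs, acts, weights)
# One plain gradient-descent update, used here only to keep the sketch short.
weights = [W - 0.01 * g for W, g in zip(weights, grads_W)]
```

The factor `(zs[l - 1] > 0)` is the derivative of ReLU, which plays the role of σ' in Equation (7).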

In the MLP used in this study, 6 hidden layers are used, with 1024, 800, 400, and 200 neurons, respectively, and the output layer consisting of 4 neurons, as the model is required to predict the values of 4 parameters. The activation function used in the hidden layers is the rectified linear unit (ReLU), which is defined according to Equation (8). The use of this activation function enables the NN to recognize nonlinear functions in a straightforward manner. The Adam optimizer was selected, as it yielded more satisfactory results, which is consistent with findings reported in other studies [73].

ReLU (x) = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}

(8)

Although MLPs are capable of adapting easily to a wide variety of problems, they may be unsuitable when the NN needs to detect patterns in time series. For such cases, the use of RNNs is far more appropriate, as they enable the detection of temporal relationships within the input data provided to the model [74]. RNNs process each element of the sequence sequentially, retaining memory of previous elements to enable their reuse when analysing subsequent elements in the sequence (a diagram of a RNN is provided in Figure 9). The behaviour of RNNs is defined based on Equation (9), in which

h_{t}

represents the hidden state,

x_{t}

represents the input and

h_{t - 1}

represents the hidden state of the previous iteration. Each recurrent neuron has two sets of weights: one for the input (

W_{ih}

) and another for the output of the previous time step (

W_{hh}

). Although RNNs are highly useful for studying time series, this type of architecture suffers from the vanishing gradient problem [75], making them less suitable for working with very long data sequences. It is for this reason that a new architecture capable of handling long time series was proposed in [76], leading to this type of model becoming widely adopted within the scientific community.

h_{t} = σ (x_{t} W_{ih}^{T} + b_{ih} + h_{t - 1} W_{hh}^{T} + b_{hh})

(9)
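The recurrence of Equation (9) and the vanishing gradient effect can be reproduced numerically. The sketch assumes tanh as the activation σ, a hypothetical hidden size of 16, and a recurrent weight matrix deliberately scaled so that its spectral norm is 0.9; trained weights need not satisfy this, so the result illustrates the mechanism rather than the behaviour of the models in this paper.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_ih, W_hh, b_ih, b_hh):
    """Equation (9) with tanh as the activation (an assumption)."""
    return np.tanh(x_t @ W_ih.T + b_ih + h_prev @ W_hh.T + b_hh)

rng = np.random.default_rng(0)
H, T = 16, 500                                  # hypothetical hidden size, 500 steps
Q, _ = np.linalg.qr(rng.normal(size=(H, H)))
W_hh = 0.9 * Q                                  # spectral norm 0.9 by construction
W_ih, b_ih, b_hh = rng.normal(size=(H, 1)), np.zeros(H), np.zeros(H)
x = np.sin(np.linspace(0.0, 20.0, T)).reshape(T, 1)   # toy one-feature sequence

h, J = np.zeros(H), np.eye(H)
for t in range(T):
    h = rnn_step(x[t], h, W_ih, W_hh, b_ih, b_hh)
    # Chain rule: multiply by d h_t / d h_{t-1} = diag(1 - h_t^2) W_hh.
    J = (np.diag(1.0 - h ** 2) @ W_hh) @ J
print(np.linalg.norm(J, 2))                     # sensitivity of h_T to the initial state
```

Each factor in the product has norm at most 0.9, so after 500 steps the sensitivity to the earliest inputs is bounded by 0.9 raised to the power 500, which is numerically negligible.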

LSTM networks were the architecture that enabled the efficient analysis of long data sequences. The structure of LSTMs is more complex than that of RNNs, as LSTM cells are composed of three types of gates: the forget gate (

f_{t}

), the input gate (

i_{t}

), and the output gate (

o_{t}

) [47]. The forget gate allows the cell to reset itself, controlling which information is retained and which is discarded. The input gate is responsible for updating the cell state, while the output gate updates the values of the hidden units (a diagram of an LSTM is provided in Figure 10). The behaviour of LSTMs is defined based on Equation (10), in which, in addition to

i_{t}

,

f_{t}

and

g_{t}

, the effect of the cell state (

c_{t}

) can be observed. A detailed explanation of these types of cells can be found in [74].

\begin{aligned} i_{t} &= σ (W_{ii} x_{t} + b_{ii} + W_{hi} h_{t - 1} + b_{hi}) \\ f_{t} &= σ (W_{if} x_{t} + b_{if} + W_{hf} h_{t - 1} + b_{hf}) \\ g_{t} &= \tanh (W_{ig} x_{t} + b_{ig} + W_{hg} h_{t - 1} + b_{hg}) \\ o_{t} &= σ (W_{io} x_{t} + b_{io} + W_{ho} h_{t - 1} + b_{ho}) \\ c_{t} &= f_{t} ⊙ c_{t - 1} + i_{t} ⊙ g_{t} \\ h_{t} &= o_{t} ⊙ \tanh (c_{t}) \end{aligned}

(10)
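A single LSTM step following Equation (10) can be written in a few lines of NumPy. The stacked weight layout (row blocks in the order i, f, g, o), the hidden size and the random toy weights are assumptions made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_i, W_h, b_i, b_h):
    """One step of Equation (10).

    W_i has shape (4H, D) and W_h has shape (4H, H); the four row blocks
    hold the i, f, g and o weights in that order (a layout assumption).
    """
    H = h_prev.shape[0]
    z = W_i @ x_t + b_i + W_h @ h_prev + b_h
    i_t = sigmoid(z[0:H])                 # input gate
    f_t = sigmoid(z[H:2 * H])             # forget gate
    g_t = np.tanh(z[2 * H:3 * H])         # candidate cell update
    o_t = sigmoid(z[3 * H:4 * H])         # output gate
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
D, H, T = 1, 8, 500                       # one input feature, toy hidden size
W_i = rng.normal(scale=0.3, size=(4 * H, D))
W_h = rng.normal(scale=0.3, size=(4 * H, H))
b_i, b_h = np.zeros(4 * H), np.zeros(4 * H)
x = np.sin(np.linspace(0.0, 20.0, T)).reshape(T, D)

h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(x[t], h, c, W_i, W_h, b_i, b_h)
print(h.shape, c.shape)
```

When the forget gate saturates at 1 and the input gate at 0, the cell state is copied unchanged from one step to the next, which is the mechanism that lets information survive over long sequences.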

Although LSTM networks are capable of achieving highly accurate results, they typically require substantial training time due to their large number of parameters. To address this issue, in [77] a new type of cell was proposed: the GRU. These cells (see Figure 11) represent a simplified version of LSTM cells, capable of delivering comparable accuracy to LSTMs while significantly reducing training time. In these cells, a reset gate (

r_{t}

), an update gate (

z_{t}

), and an additional new gate (

n_{t}

) are used. The outputs of these gates are used to compute the hidden state according to Equation (11).

\begin{aligned} r_{t} &= σ (W_{ir} x_{t} + b_{ir} + W_{hr} h_{t - 1} + b_{hr}) \\ z_{t} &= σ (W_{iz} x_{t} + b_{iz} + W_{hz} h_{t - 1} + b_{hz}) \\ n_{t} &= \tanh (W_{in} x_{t} + b_{in} + r_{t} ⊙ (W_{hn} h_{t - 1} + b_{hn})) \\ h_{t} &= (1 - z_{t}) ⊙ n_{t} + z_{t} ⊙ h_{t - 1} \end{aligned}

(11)
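Equation (11) translates into an analogous GRU step. As before, the stacked layout (row blocks in the order r, z, n) and the toy dimensions are assumptions; the parameter counts at the end only compare cells of equal hidden size.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_i, W_h, b_i, b_h):
    """One step of Equation (11).

    W_i has shape (3H, D) and W_h has shape (3H, H); the three row blocks
    hold the r, z and n weights in that order (a layout assumption).
    """
    H = h_prev.shape[0]
    gi = W_i @ x_t + b_i
    gh = W_h @ h_prev + b_h
    r_t = sigmoid(gi[0:H] + gh[0:H])                  # reset gate
    z_t = sigmoid(gi[H:2 * H] + gh[H:2 * H])          # update gate
    n_t = np.tanh(gi[2 * H:] + r_t * gh[2 * H:])      # new gate
    return (1.0 - z_t) * n_t + z_t * h_prev

rng = np.random.default_rng(0)
D, H, T = 1, 8, 500                                   # toy dimensions
W_i = rng.normal(scale=0.3, size=(3 * H, D))
W_h = rng.normal(scale=0.3, size=(3 * H, H))
b_i, b_h = np.zeros(3 * H), np.zeros(3 * H)
x = np.sin(np.linspace(0.0, 20.0, T)).reshape(T, D)

h = np.zeros(H)
for t in range(T):
    h = gru_step(x[t], h, W_i, W_h, b_i, b_h)

# Trainable parameters per cell: 3 blocks for the GRU, 4 for the LSTM.
gru_params = 3 * H * (D + H + 2)
lstm_params = 4 * H * (D + H + 2)
print(gru_params, lstm_params)  # 264 352
```

For equal hidden size, the GRU cell has three quarters of the parameters of the LSTM cell, which is consistent with the shorter training times reported for GRUs.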

NNs can be implemented either manually, by programming them from the ground up, or by using dedicated packages designed for this purpose. A wide array of libraries is currently available, with notable examples including PyTorch, TensorFlow, Keras, and Flax, among others. Each of these packages presents its own advantages and disadvantages; a review of the differences among various DL frameworks is presented in [78]. In this paper, the NNs are implemented using the PyTorch framework to illustrate how integrating conventional simulation methodologies with advanced DL approaches can improve the modeling and simulation of WTs under transient operating conditions. The PyTorch framework was selected primarily due to its high flexibility and widespread adoption in both industry and academia. In fact, this DL framework is currently the most widely used in the world [79]. For all DL models developed in this study, a consistent training configuration was adopted to ensure fair comparison and reproducibility. Specifically, the optimizer used for all models is Adam with an adaptive learning rate starting at 0.001, the loss function is the MSE, training is performed over 100 epochs, and the batch size is set to 128. These hyperparameters were selected based on preliminary experiments to ensure convergence, stability, and high predictive accuracy across all architectures. Dropout was not employed in the present study, owing to the relatively small network sizes and the stability of the training process; the chosen configuration proved sufficient to achieve robust performance in all tested scenarios.

It is important to acknowledge that the comparison between gradient boosting models and NN architectures involves structural differences that must be interpreted carefully. Tree-based ensemble methods, such as XGBoost, LightGBM, and CatBoost, are optimized through greedy, stage-wise procedures that are inherently stable and do not suffer from gradient explosion or vanishing gradient phenomena. In contrast, recurrent architectures such as RNNs and LSTMs are trained via backpropagation through time, which is susceptible to gradient instability, particularly for long input sequences. To mitigate this, gradient clipping was applied during the training of all recurrent models, with a maximum gradient norm of 1.0. Regarding the use of the Adam optimizer for all DL models, while it is true that Adam’s adaptive learning rate benefits architectures with heterogeneous gradient magnitudes (such as LSTMs) more directly than shallower networks, preliminary experiments confirmed that Adam outperformed SGD and RMSProp across all tested architectures in terms of convergence speed and final accuracy. Its use for all DL models therefore represents a consistent and empirically justified choice. The gradient boosting models, by contrast, employ their own internal optimization procedures (boosting iterations with Newton-step corrections for XGBoost and CatBoost, and histogram-based gradient steps for LightGBM), which are entirely decoupled from the Adam optimizer.
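The two optimisation ingredients mentioned above, gradient clipping with a maximum norm of 1.0 and the Adam update with a learning rate of 0.001, can be sketched in NumPy. This is a framework-independent illustration and not the PyTorch code used in the study; the function names, the default values of b1, b2 and eps, and the toy gradients are assumptions.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradients together so that their joint L2 norm <= max_norm."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads], total

def adam_step(params, grads, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and both moment estimates."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g ** 2
        m_hat = state["m"][i] / (1 - b1 ** t)         # bias-corrected mean
        v_hat = state["v"][i] / (1 - b2 ** t)         # bias-corrected variance
        new_params.append(p - lr * m_hat / (np.sqrt(v_hat) + eps))
    return new_params

params = [np.zeros(2), np.zeros((1, 2))]
grads = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]   # joint norm 5 (toy values)
state = {"t": 0, "m": [np.zeros_like(p) for p in params],
         "v": [np.zeros_like(p) for p in params]}

clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
params = adam_step(params, clipped, state)
print(norm_before, params)
```

Clipping changes only the length of the gradient, not its direction, so it limits the size of an update caused by an exploding gradient without otherwise altering the optimisation.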

4.4. Result Analysis

Once all the models have been trained, their performance is evaluated in two ways. First, the errors produced by the models on the previously created synthetic testing database are examined. For this purpose, the MSE, root mean square error (RMSE), and mean absolute error (MAE) are compared, as described by Equations (2), (12) and (13), respectively. Values of these errors close to zero indicate that the developed models closely reproduce the parameter values in the testing database.

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(Y_{i} - \hat{Y_{i}})}^{2}}

(12)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | Y_{i} - \hat{Y_{i}} |

(13)
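The three error measures of Equations (2), (12) and (13) can be computed per parameter as in the sketch below. The function name `regression_errors` and the small two-parameter example are hypothetical.

```python
import numpy as np

def regression_errors(Y, Y_hat):
    """Per-parameter MSE, RMSE and MAE; rows are samples, columns are parameters."""
    err = np.asarray(Y, dtype=float) - np.asarray(Y_hat, dtype=float)
    mse = np.mean(err ** 2, axis=0)
    return {"MSE": mse, "RMSE": np.sqrt(mse), "MAE": np.mean(np.abs(err), axis=0)}

# Two samples, two parameters: the first column is predicted exactly,
# the second is off by 1 and by 2.
Y = [[1.0, 2.0], [3.0, 4.0]]
Y_hat = [[1.0, 1.0], [3.0, 6.0]]
print(regression_errors(Y, Y_hat))
```

For the second parameter the MAE is 1.5 while the RMSE is about 1.58, which shows how the squared measures weight the larger error more heavily.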

Additionally, the proposed models are evaluated in terms of their capability to accurately predict the mechanical parameters of two WTs, whose mechanical representation is based on the generic framework defined by the IEC 61400-27 standard and implemented in DIgSILENT PowerFactory. The WT models are subjected to several random voltage dips, and the average accuracy of the proposed models in characterizing the parameter values of the mechanical model is determined. In addition, to assess the robustness of the developed models against measurement noise, the effect of external noise added to the active power signals on the accuracy of the parameter estimation is studied. This type of disturbance is representative of the uncertainties and imperfections typically observed in field measurements. For each of the two WT models under study, three scenarios are considered: noise-free signals, signals corrupted with 1% noise, and signals corrupted with 3% noise, both percentages being relative to the total range of the signal. The injected noise is implemented as additive Gaussian noise, whose standard deviation is proportional to the dynamic range of the signal (s). Specifically, for a given discrete-time signal

s = \{s_{1}, s_{2}, \dots, s_{N}\}

the signal with noise (

\tilde{s}

) is calculated using Equation (14), where the standard deviation (

σ

) is defined by Equation (15) and p denotes the percentage of noise added to the original active power signal. This formulation ensures that the perturbation level remains consistent with the scale of the signal under analysis, thereby providing a realistic representation of measurement uncertainties in practical conditions.

{\tilde{s}}_{i} = s_{i} + ϵ_{i}, \quad ϵ_{i} \sim \mathcal{N} (0, σ^{2}), \quad i = 1, \dots, N

(14)

σ = \frac{p}{100} (\max (s) - \min (s))

(15)
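Equations (14) and (15) amount to a few lines of NumPy. In this sketch the function name `add_noise`, the random seed and the damped toy oscillation are hypothetical; only the noise model itself follows the equations.

```python
import numpy as np

def add_noise(s, p, rng):
    """Equations (14)-(15): additive Gaussian noise scaled to the signal range."""
    sigma = (p / 100.0) * (s.max() - s.min())
    return s + rng.normal(0.0, sigma, size=s.shape)

rng = np.random.default_rng(0)
t = np.arange(500) * 0.01                                     # 500 samples, 10-ms step
s = 1.0 + 0.2 * np.exp(-t) * np.sin(2 * np.pi * 1.5 * t)      # toy post-fault signal

for p in (0.0, 1.0, 3.0):                                     # the three scenarios
    noisy = add_noise(s, p, rng)
    print(p, float(np.std(noisy - s)))
```

Scaling σ by the range of each signal keeps the relative perturbation level the same for turbines with different rated power.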

Finally, the training times of each model are analysed, with particular emphasis on the trade-off between training time and predictive accuracy. The two additional WTs used to validate the accuracy and robustness of the proposed models are:

A Siemens-Gamesa SG 2.1-114 WT with a rated power of 2.1 MW, whose model was provided by the manufacturer and validated by demonstrating its capability to reproduce the real turbine behaviour under transient conditions.
A Siemens-Gamesa G52 WT with a rated power of 850 kW, whose model was validated using field measurements in [45], ensuring that its dynamic response accurately reflects the real turbine performance under transient disturbances.

It is worth noting that these two WTs feature mechanical models with completely different parameter values, reflecting significant variations in their drive train characteristics. Furthermore, the parameters of the remaining submodules (electrical, control, and protection) also differ substantially between the two machines. This diversity provides a rigorous test scenario that goes beyond a single turbine configuration, thereby strengthening the validation of the proposed methodology. By successfully predicting the mechanical parameters under such varied conditions, the developed ML and DL models demonstrate their capability to generalize and effectively characterize the mechanical dynamics of any DFIG-based WT, regardless of its size, manufacturer, or control setup. Additionally, as mentioned in Section 4.3, the training times of the proposed ML and DL models are compared on both the CPU and GPU in order to determine which of the two devices is more suitable for this purpose.

5. Results and Discussion

As previously mentioned in Section 4.4, this section presents the results obtained from training the proposed models, which are detailed in Section 4.3. First, an analysis of the errors exhibited by the models when tested against a synthetic testing dataset is conducted. This dataset was not used during the training process, with the aim of evaluating the models’ ability to generalize effectively. Subsequently, the capability of the proposed models to determine the mechanical parameters of two different WT models is examined. The WT models are simulated under several random voltage dips. Finally, the training times of each model are discussed, with a focus on the relationship between training time and accuracy.

5.1. Model Error Assessment

All the models used in this study were tested using a portion of the synthetic database specifically designated for this purpose. To evaluate the predictive capability of the proposed models, the MAE, MSE and RMSE were recorded for each model and for each of the four parameters of the two-mass mechanical model. The values of these errors for each model and parameter are presented in Figure 12, Figure 13 and Figure 14. Figure 12 shows that the MAE produced by most of the models is low, with the MLP standing out as the model with the lowest average MAE. Figure 12 also makes evident that the only model unable to accurately predict the values of the mechanical parameters is the RNN, which produces errors up to seven times greater than those of the other models. Figure 13 and Figure 14 again demonstrate that the MLP model exhibits superior capability in accurately predicting the values of the mechanical model parameters, closely followed by the LSTM and GRU models. They also confirm that the RNN model is unable to accurately predict the values of the parameters of the two-mass mechanical model. The poor performance of the RNN model can be attributed to its inability to capture long-term dependencies in relatively long input sequences. The post-fault active power signals used in this study consist of 500 time steps, resulting in sequences that exceed the effective memory capacity of standard RNNs. In such networks, the hidden state is recursively updated at each time step as a function of the current input and the previous hidden state, causing gradients of the loss function with respect to early hidden states to decay exponentially through the chain rule, a phenomenon known as the vanishing gradient problem.
As a result, the network progressively loses memory of information contained in the early portion of the sequence, leading to poor convergence and inaccurate predictions. This limitation is particularly detrimental for the parameter estimation task, as the initial cycles of the post-fault oscillations, encoded in the first samples of the sequence, carry the most discriminative information about key parameters such as the rotor inertia

H_{wtr}

. In contrast, LSTM and GRU architectures incorporate gating mechanisms that allow the network to selectively retain or discard information over extended temporal horizons, enabling them to effectively capture these long-term dependencies and achieve significantly higher accuracy despite having a similar recurrent structure. An analysis of Figure 14 further shows that

H_{wtr}

is consistently estimated with the lowest error among the parameters considered. This result is physically intuitive, since the rotor inertia exerts a dominant influence on the post-fault dynamic behaviour of the WT [36], while other parameters, such as the drivetrain damping

c_{drt}

, affect the response in a more subtle and distributed manner. Moreover, the rotor inertia is approximately one order of magnitude larger than the generator inertia (see Table 2), which explains why the generator’s contribution to the observed dynamics is comparatively less pronounced and more difficult to isolate.

Based on the results presented in Figure 12, Figure 13 and Figure 14, it can be concluded that all the models are capable of accurately predicting the values of the parameters of the two-mass mechanical model using the synthetic testing database, with the exception of the RNN model. The MLP model produces the lowest errors across all parameters, closely followed by the LSTM model and the GRU model. The gradient boosting-based models exhibit very similar errors, with CatBoost showing a slightly higher accuracy compared to LightGBM and XGBoost. SVM shows an accuracy similar to the gradient boosting methods.

5.2. Predictive Capability on Real System

In the previous section, the effectiveness of the models developed in this work was evaluated using the synthetic testing database. In this section, the performance of the models in determining the mechanical parameters of both a Siemens-Gamesa SG 2.1-114 WT and a Siemens-Gamesa G52 WT is discussed. To this end, ten simulations were conducted by subjecting the model representing each WT (a model structured according to IEC 61400-27) to ten random voltage dips, with depths ranging between 0.9 and 0.25 pu and durations between 0.5 and 1 s. To evaluate the predictive capability of the models developed in this paper, heatmaps were created, which are shown in Figure 15 and Figure 16. These heatmaps represent, for each model studied, the average accuracy achieved for each of the parameters of the two-mass mechanical model corresponding to the Siemens-Gamesa SG 2.1-114 WT and the Siemens-Gamesa G52 WT. Due to the confidentiality of the data for these models, Figure 15 and Figure 16 present all the values of the mechanical parameters in per unit (pu). A value of 1 pu indicates that the predicted parameter value exactly matches the real parameter value. Values greater than 1 pu indicate that the predicted parameter is higher than the real value, while values less than 1 pu indicate that the predicted parameter is lower than the real value. It is worth emphasising that the validation performed using the two manufacturer-provided WT models serves simultaneously as a physical consistency check for the proposed methodology. The accurate prediction of mechanical parameters that successfully reproduce the transient active power behaviour of real turbines provides strong evidence that the models are not producing physically implausible estimates. Moreover, since the models were trained exclusively on data generated within the physically validated parameter ranges of Table 2, out-of-range predictions are prevented by construction.

Upon analyzing the results presented in Figure 15 and Figure 16, it is evident that all the models are capable of accurately predicting the values of the mechanical model parameters, with the exception of the RNN model (which aligns with the results obtained in the previous section). It is also observed that the parameter $H_{gen}$ is the most challenging to predict accurately, due to its relatively minor influence compared to the parameter $H_{wtr}$. Additionally, it is noteworthy that the RNN model, despite being the least accurate of all the models, is capable of determining the parameters $H_{wtr}$ and $k_{drt}$ with reasonable precision, as these parameters have the most significant influence on the post-fault behaviour of the WT compared to $H_{gen}$ and $c_{drt}$. Based on these results, it is therefore confirmed that the models developed in this work are capable of inferring the values of the mechanical model parameters accurately based on the active power response of a Type III WT during a voltage dip.

If Figure 15 and Figure 16 are analysed in detail, it can be observed that although noise has a noticeable impact on the ability of the models to accurately predict the mechanical parameter values of both WT models, the overall performance remains fairly robust. In fact, the employed models demonstrate a remarkable degree of resilience to noise, maintaining a high level of accuracy even under relatively high noise conditions. Furthermore, for both WT models under study, it becomes evident that ML approaches, particularly those based on boosting techniques, are considerably less sensitive to noise compared to DL models. This difference can be explained by the inherent structure of ensemble tree-based methods, which perform implicit averaging over multiple weak learners, naturally reducing sensitivity to small perturbations in the input data. In contrast, NNs directly propagate input variations through multiple layers, making them more susceptible to high-frequency noise unless additional regularization or filtering is applied.

In particular, the higher sensitivity of the MLP to input noise, compared to tree-based models, is a direct consequence of its architecture. In an MLP, the input signal is processed as a flat vector, and each element is connected to all neurons in the first hidden layer via trained weights. When noise is added to the signal, the perturbation at each time step propagates linearly through these weights into the first hidden layer, and then nonlinearly through subsequent layers via the ReLU activations. The cumulative effect of 500 noisy inputs, each contributing small activation perturbations, can shift the intermediate representations away from those learned during training, leading to degraded output accuracy. In contrast, tree-based models perform splits based on thresholds applied to individual features or aggregations thereof, and the ensemble averaging across many trees acts as an implicit low-pass filter that attenuates the influence of small, high-frequency noise on the final prediction.
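The argument above can be illustrated with a rough numerical experiment. The sketch below is a toy, not the authors' robustness test: the noise level, the random weights and the uniformly distributed features and thresholds are all assumptions chosen for illustration. It compares how Gaussian input noise reaches a single first-layer neuron (every one of the 500 inputs contributes, giving a perturbation with standard deviation equal to the noise level times the norm of the weight vector) with how often the same noise changes the outcome of a single threshold test of the kind used in a decision tree.

```python
import math
import random

N_INPUTS = 500  # length of the flattened input signal mentioned in the text


def dense_perturbation_std(weights, sigma, trials, rng):
    """Empirical std of the change in one neuron's pre-activation under input noise."""
    deltas = []
    for _ in range(trials):
        # The pre-activation is linear in the input, so its change is w . noise
        deltas.append(sum(w * rng.gauss(0.0, sigma) for w in weights))
    m = sum(deltas) / trials
    return math.sqrt(sum((d - m) ** 2 for d in deltas) / (trials - 1))


def split_flip_fraction(sigma, trials, rng):
    """Fraction of threshold tests (x <= t) whose outcome changes under input noise."""
    flips = 0
    for _ in range(trials):
        x, t = rng.random(), rng.random()  # hypothetical feature value and threshold
        noisy = x + rng.gauss(0.0, sigma)
        flips += (x <= t) != (noisy <= t)
    return flips / trials


rng = random.Random(0)
sigma = 0.01  # hypothetical noise standard deviation
# Hypothetical first-layer weights with a typical 1/sqrt(n) initialisation scale
weights = [rng.gauss(0.0, 1.0 / math.sqrt(N_INPUTS)) for _ in range(N_INPUTS)]

empirical = dense_perturbation_std(weights, sigma, 2000, rng)
predicted = sigma * math.sqrt(sum(w * w for w in weights))
flip = split_flip_fraction(sigma, 20000, rng)

print(f"neuron perturbation std: empirical {empirical:.5f}, sigma*||w|| {predicted:.5f}")
print(f"fraction of threshold tests flipped by the noise: {flip:.4f}")
```

In this toy setting every noisy sample shifts the neuron's pre-activation, whereas a threshold test only changes when the noise pushes a feature across its split point, which happens for a small fraction of tests when the noise is small.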

5.3. Computational Efficiency

The results presented thus far demonstrate that the proposed models are capable of inferring, with varying degrees of accuracy, the mechanical parameters of Type III WTs by analysing the active power response during a voltage dip. However, while the accuracy of the models is a critical factor, the training time of the different models is also a determining factor. As previously mentioned in Section 4.3, the models developed in this work were trained on both a CPU and a GPU. The corresponding training times for each model, for both CPU and GPU, are presented in Figure 17 and Figure 18.

Figure 17 presents the training times for the different ML models on both the CPU (blue) and the GPU (red). Examining Figure 17, it is evident that training all the models on the GPU drastically reduces the training times. For example, the SVM model takes a total of 159 s to train using the CPU, but this is reduced to 27.5 s (an 82.7% reduction in training time) when using the GPU. Figure 17 also shows a significant difference in training times between the different gradient boosting-based models. The LightGBM model can be trained in 6.8 s on the CPU and 4.23 s on the GPU, both of which are extremely short times. Reviewing Figure 12, Figure 13 and Figure 14, it is evident that LightGBM does not produce significantly higher errors compared to the other models, despite being the fastest. Therefore, in terms of efficiency, LightGBM proved to be the most suitable option in this case. On the other hand, the substantially higher training time of CatBoost compared to LightGBM and XGBoost, despite only marginal accuracy improvements, is attributable to its symmetric tree construction strategy. While this approach enhances generalization and regularization by building balanced trees in which all nodes at a given depth share the same splitting condition, it also increases the number of candidate splits evaluated at each boosting iteration. Additionally, CatBoost implements an ordered boosting scheme designed to reduce prediction bias during training, which requires maintaining separate model estimates for each data sample. This combination of oblivious tree construction and ordered boosting results in a higher per-iteration computational cost relative to the leaf-wise (LightGBM) or level-wise (XGBoost) strategies, explaining the observed training time discrepancy.

Figure 18 presents the training times for the DL models used in this paper. Figure 18 does not include the CPU training times for the RNN, LSTM and GRU models, because training these models on a CPU resulted in extremely long training times, in the order of hours, due to the sequential processing nature of these cells. From Figure 18, it can be observed that the shortest training time was that of the MLP, while the longest was that of the LSTM (due to its higher number of parameters). It is noteworthy, therefore, that the fastest DL model to train in this study (MLP) is also the one that achieves the highest accuracy. Finally, it should also be highlighted that inference times for all the evaluated models are in the order of milliseconds. Therefore, the computational burden associated with inference is negligible compared to the training phase, and does not represent a practical limitation for real-time deployment or integration into control or monitoring systems.
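The reductions quoted above follow directly from the reported training times. The short sketch below reproduces the arithmetic and adds a small wall-clock helper of the kind that could be used to time a training or inference call; the helper names `relative_reduction` and `timed` are illustrative and are not taken from the study.

```python
import time


def relative_reduction(t_cpu, t_gpu):
    """Percentage reduction in training time when moving from CPU to GPU."""
    return 100.0 * (t_cpu - t_gpu) / t_cpu


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Training times in seconds as quoted in the text
svm = relative_reduction(159.0, 27.5)
lightgbm = relative_reduction(6.8, 4.23)

print(f"SVM: {svm:.1f}% reduction (speed-up x{159.0 / 27.5:.2f})")            # 82.7%, x5.78
print(f"LightGBM: {lightgbm:.1f}% reduction (speed-up x{6.8 / 4.23:.2f})")    # 37.8%, x1.61

# Usage of the timing helper on an arbitrary stand-in workload
total, elapsed = timed(sum, range(1_000_000))
print(f"stand-in workload took {elapsed * 1e3:.2f} ms")
```

The relative gain from the GPU is therefore much larger for the SVM than for LightGBM, whose CPU training time is already only a few seconds.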

Based on these results, Figure 19 and Figure 20 were constructed to illustrate the relationship between training time and RMSE on the test set for models trained on CPU and GPU, respectively. Figure 19 includes the PowerFactory parameter identification tool (represented by the grey marker) as a benchmark for comparison with the proposed ML and DL approaches. The parameter identification module in PowerFactory is a conventional optimization-based framework widely used in industrial and academic studies to automatically estimate model parameters by minimizing the error between simulated and measured signals. It relies on derivative-free optimization routines such as the Nelder–Mead simplex algorithm to iteratively search for a parameter set that best reproduces the system dynamics. Despite being a widely adopted tool, the results shown in Figure 19 demonstrate that the traditional parameter identification approach in PowerFactory exhibits substantially inferior performance in both accuracy and computational efficiency when compared to data-driven methods. Specifically, PowerFactory achieves an RMSE of approximately 0.46, which is significantly higher than that of all the ML and DL models evaluated in this study. This lower performance is linked to the intrinsic limitations of optimization-based parameter identification when applied to highly nonlinear power system models. WT control loops involve complex interactions among mechanical and electrical subsystems, which often lead to non-convex optimization landscapes, multiple local minima, and sensitivity to initial values. As a result, algorithms such as Nelder–Mead may converge to suboptimal solutions or require extensive iteration time to reach an acceptable estimate. In contrast, ML and DL models inherently learn complex nonlinear mappings directly from data without requiring explicit mathematical formulations of the underlying physical processes. This characteristic enables superior generalization capabilities and allows the proposed models to efficiently capture parameter dependencies across a wide range of operating scenarios, disturbances, and turbine configurations.
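To make the optimization-based approach concrete, the following sketch fits two parameters of a toy signal with a minimal Nelder-Mead simplex search. It is not PowerFactory's implementation: the damped oscillation used as a stand-in for an active power response, the "true" parameter values, the starting points and the simplified contraction rule are all assumptions made for illustration.

```python
import math


def response(freq, damping, n=200, dt=0.01):
    """Toy post-fault signal: a damped oscillation (stand-in for a simulated response)."""
    return [math.exp(-damping * k * dt) * math.cos(2 * math.pi * freq * k * dt)
            for k in range(n)]


def make_cost(measured):
    """Sum of squared errors between the simulated and the 'measured' signal."""
    def cost(p):
        sim = response(p[0], p[1], n=len(measured))
        return sum((s - m) ** 2 for s, m in zip(sim, measured))
    return cost


def nelder_mead(f, x0, step=0.2, iters=200):
    """Minimal Nelder-Mead search (reflection 1, expansion 2, contraction 0.5, shrink 0.5)."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            expand = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:
                # Shrink every vertex towards the current best one
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, vert)]
                                    for vert in simplex[1:]]
    return min(simplex, key=f)


measured = response(1.5, 0.8)  # hypothetical "true" frequency (Hz) and damping (1/s)
cost = make_cost(measured)

# The result depends on the starting point, since the cost is non-convex in frequency
for start in ([1.3, 0.5], [4.0, 0.5]):
    fit = nelder_mead(cost, start)
    print(start, "->", [round(v, 3) for v in fit], "cost", round(cost(fit), 6))
```

Each run requires many evaluations of the simulated response, and a start far from the true values may stall in a local minimum, which is the behaviour the paragraph above attributes to this class of methods.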

From Figure 19, it can be observed that the MLP model achieves the highest accuracy while requiring a shorter training time than CatBoost, SVM, and the traditional PowerFactory approach. The comparison with PowerFactory is particularly revealing, as the MLP reduces RMSE by approximately 72% while requiring only about 10% of the training time. Nevertheless, in accordance with the previously analysed results, the MLP model exhibits a higher sensitivity to noise in the active power signal, such that its accuracy strongly depends on whether the signal under study has been preprocessed or filtered. Figure 19 further shows that LightGBM is the fastest model to train, while providing levels of accuracy comparable to the other ML algorithms and vastly superior to the conventional parameter identification method. Figure 20, which presents the same analysis for models trained on GPU, indicates that ML models display similar training times, with LightGBM once again standing out in terms of efficiency. By contrast, DL models achieve higher predictive accuracy at the expense of significantly longer training times. It is also noteworthy from Figure 20 that the GRU model is considerably more efficient than LSTM, as it provides very similar levels of accuracy while requiring substantially less training time. However, even the longest training time among the ML and DL approaches (LSTM at approximately 350 s on GPU) represents a substantial improvement over traditional methods in terms of the accuracy-to-training-time trade-off, further validating the efficacy of data-driven methodologies for power system parameter identification tasks.
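One way to read an accuracy versus training-time plot of this kind is to keep only the models that are not dominated, that is, those for which no other model is both at least as fast and at least as accurate. The sketch below implements that selection; the entries labelled A to D and their values are hypothetical placeholders and do not correspond to any model or result in this study.

```python
def pareto_front(points):
    """Names of entries not dominated in (training time, RMSE); lower is better for both."""
    front = []
    for name, time_s, rmse in points:
        dominated = any(
            (t <= time_s and r <= rmse) and (t < time_s or r < rmse)
            for _, t, r in points
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (training time in s, RMSE) pairs, for illustration only
candidates = [("A", 5.0, 0.20), ("B", 50.0, 0.10), ("C", 60.0, 0.15), ("D", 300.0, 0.11)]
print(pareto_front(candidates))  # ['A', 'B']
```

Entries on the front represent different compromises between speed and accuracy, so the final choice among them depends on which of the two criteria matters more for the application.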

As a final step, a summary table (Table 4) is provided to consolidate the main results obtained across all tested models. This table gathers, for each algorithm and parameter, the error metrics (MAE, MSE, RMSE) for the testing database together with the corresponding training times on CPU and GPU. The aim of this synthesis is to facilitate the reader’s understanding by presenting, in a compact form, the trade-offs between accuracy and computational efficiency observed during the study. In this way, the table highlights not only which models achieve the best parameter estimation performance, but also how the training cost varies depending on the chosen approach, thereby enabling a clearer comparison among the different methodologies analysed.
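For reference, the three error metrics reported in the table can be computed as in the sketch below, which uses their standard definitions. The sample values are hypothetical and serve only to show the calculation.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the parameter itself."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical true and predicted parameter values, for illustration only
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```

Because the MSE squares each error, a single large deviation weighs more heavily in the MSE and RMSE than in the MAE.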

6. Conclusions

The aim of this study was to predict the values of the mechanical model parameters of Type III WTs based on their active power response during voltage dip events. By capturing the turbine behaviour during faults, a model capable of reproducing its response under other operating conditions can be obtained. To this end, a structured methodology was followed, comprising the generation of a synthetic database, its processing and segmentation, the training of multiple ML and DL models, and a comprehensive evaluation of their performance. Several ML and DL techniques widely used in power systems research were considered, and the results show that most of the evaluated models are able to estimate the target parameters with good accuracy. Validation was performed using both synthetic data and two manufacturer-validated models of real WTs, yielding satisfactory results in both cases. Among the evaluated models, the MLP achieved the highest overall accuracy, while the RNN consistently underperformed, and the LightGBM model provided the best trade-off between accuracy and computational efficiency, with the shortest training times of all the evaluated approaches.

The results of this work indicate that the characterization of WTs based on the IEC 61400-27 generic models can substantially benefit from ML and DL techniques. These tools extend the applicability of standardized models by enabling fast and accurate parameter identification, thereby simplifying the simulation and operation of power systems with high penetration of renewable generation. At the same time, several limitations of the proposed approach should be acknowledged. The models are trained using RMS (phasor-domain) simulations compliant with IEC 61400-27 and are therefore directly applicable to balanced voltage dip events, which are the most commonly considered in grid code compliance studies. As RMS models do not represent electromagnetic transients or unsymmetrical faults, extending the methodology to EMT-based simulations or highly unbalanced events would require additional training data. Furthermore, the training dataset is synthetically generated within parameter ranges representative of commercially deployed turbines. While this ensures physical realism and numerical stability, model performance may degrade if applied to turbines with mechanical parameters significantly outside these ranges. Nevertheless, this limitation can be addressed by extending the synthetic database, a process that is straightforward given the automated data generation framework developed in this work. Finally, although inference times are negligible, practical deployment in SCADA or monitoring environments requires the availability of clean, synchronized active power measurements during fault events. The robustness analysis shows that the models remain accurate under noise levels consistent with typical SCADA uncertainties, although additional preprocessing may be beneficial in noisier environments.

Future work will further enhance these results through the adoption of transfer learning strategies. For instance, models trained for parameter estimation in Type III WTs could serve as a basis for accelerating training and improving accuracy for Type IV turbines, or for adapting models to different operating conditions and fault scenarios. More generally, the proposed methodology can be extended to the identification of other control or mechanical parameters, provided that these have a measurable impact on active or reactive power signals.

Author Contributions

Conceptualization and ideas: J.J.-R., A.H.-E. and E.G.-L.; Methodology: J.J.-R.; Validation, J.J.-R., A.H.-E. and E.G.-L.; Resources, A.H.-E.; Writing—original draft preparation: J.J.-R.; Visualization: J.J.-R.; Supervision: A.H.-E. and E.G.-L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the State Research Agency (Agencia Estatal de Investigación) through project PID2024-157436OB-C21, PRE2022-102783, and by the Council of Communities of Castilla-La Mancha (‘Junta de Comunidades de Castilla-La Mancha’, JCCM) through project SBPLY/23/180225/000226.

Institutional Review Board Statement

Not applicable because this study does not require ethical approval.

Informed Consent Statement

Not applicable because this study does not involve humans.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors are grateful to the State Research Agency (Agencia Estatal de Investigación) and the Council of Communities of Castilla-La Mancha for partially funding this research. In addition, the authors would like to thank the editors and reviewers for their valuable comments and constructive suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

API	Application Programming Interface
CPU	Central Processing Unit
DFIG	Doubly Fed Induction Generator
DL	Deep Learning
DSO	Distribution System Operator
EU	European Union
GPU	Graphics Processing Unit
IEC	International Electrotechnical Commission
LSTM	Long Short-Term Memory
ML	Machine Learning
MLP	Multi-Layer Perceptron
NN	Neural Network
RNN	Recurrent Neural Network
SVM	Support Vector Machine
TSO	Transmission System Operator
WECC	Western Electricity Coordinating Council
WT	Wind Turbine
WPP	Wind Power Plant


Figure 1. Annual generation of electricity in Spain.

Figure 2. Type III WT scheme.

Figure 3. Two-mass mechanical model.

Figure 4. Research methodology outline.

Figure 5. Examples of simulated voltage dips used for database generation.

Figure 6. Comparison of full and downsampled active power response.

Figure 7. Influence of the mechanical parameters on the active power response.

Figure 8. MLP diagram.

Figure 9. RNN diagram.

Figure 10. LSTM diagram.

Figure 11. GRU diagram.

Figure 12. MAE obtained for each model and mechanical parameter.

Figure 13. MSE obtained for each model and mechanical parameter.

Figure 14. RMSE obtained for each model and mechanical parameter.

Figure 15. Heatmap representing the accuracy of Gamesa SG 2.1-114 mechanical parameters.

Figure 16. Heatmap representing the accuracy of Gamesa G52 mechanical parameters.

Figure 17. Model training times for both CPU and GPU for ML models.

Figure 18. Model training times for both CPU and GPU for DL models.

Figure 19. Scatter plot of RMSE versus training time for the different models trained on CPU.

Figure 20. Scatter plot of RMSE versus training time for the different models trained on GPU.

Table 1. Methods commonly used for WT model parametrization.

Method	Advantages	Disadvantages	Data Requirements
Manual tuning	Precise fine-tuning when prior information is available	Extremely time-consuming, requires expert knowledge, and does not scale	Low (only approximate prior knowledge)
Classical optimization (PSO, Nelder–Mead, etc.)	Applicable to non-linear problems	High computational cost and need for an approximate initialization	Medium (simulation data for repeated evaluations)
Kalman Filter	Suitable for online estimation and good convergence in dynamical systems	Sensitive to initialization and statistical assumptions; high cost for large models	High (PMU or SCADA time-series data)
Proprietary tools (e.g., Simulink Design Optimization)	Direct integration in industrial software and user-friendly interface	Limited to proprietary environments and poor scalability	Medium (simulation results and initial parameter guess)

Table 2. Ranges of variation of the two-mass mechanical model parameters.

Parameter	Unit	Range
$H_{gen}$	s	[0.3–3.5]
$H_{wtr}$	s	[3.5–10.5]
$k_{drt}$	$T_{base}$	[10–100]
$c_{drt}$	$T_{base} / Ω_{base}$	[0.1–4]
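
As an illustration of how the ranges in Table 2 could be turned into parameter combinations for a synthetic database, the following is a minimal sketch. It assumes independent uniform sampling within each range, which the table itself does not confirm; the dictionary keys and the function name `sample_parameter_sets` are hypothetical labels, not PowerFactory identifiers.

```python
import random

# Ranges copied from Table 2. The keys are illustrative names only.
PARAMETER_RANGES = {
    "H_gen": (0.3, 3.5),     # generator inertia constant [s]
    "H_wtr": (3.5, 10.5),    # wind turbine rotor inertia constant [s]
    "k_drt": (10.0, 100.0),  # drive train stiffness
    "c_drt": (0.1, 4.0),     # drive train damping
}


def sample_parameter_sets(n_samples, seed=0):
    """Draw n_samples parameter sets, assuming independent uniform sampling."""
    rng = random.Random(seed)  # seeded generator for reproducibility
    return [
        {name: rng.uniform(low, high)
         for name, (low, high) in PARAMETER_RANGES.items()}
        for _ in range(n_samples)
    ]


# 10,000 matches the database size quoted in the abstract.
samples = sample_parameter_sets(10_000)
```

Each dictionary in `samples` would then be passed to one simulation run; a different sampling scheme (for example, a regular grid) would only require replacing the body of the function.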

Table 3. Hyperparameter tuning space.

Model	Parameters	Hyperparameter Space
Boosting algorithms (Xgboost, LightGBM, Catboost)	Maximum tree depth	[3, 5, 7]
	Learning rate	[0.01, 0.1, 0.2]
	Number of trees	[100, 200, 300]
	Fraction of features used per tree	[0.3, 0.5, 0.7, 0.8]
SVM	C	[0.1, 1, 10, 50, 100]
SVM	$ϵ$	[0.01, 0.1, 0.2, 0.5, 1]
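
To give a sense of the size of the search space in Table 3, the sketch below enumerates every combination of the listed values. It assumes an exhaustive grid, which the table does not state explicitly; the key names (`max_depth`, `n_trees`, `feature_fraction`, and so on) are hypothetical, since each library uses its own argument names.

```python
from itertools import product

# Values copied from Table 3. The key names are illustrative only.
BOOSTING_SPACE = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.2],
    "n_trees": [100, 200, 300],
    "feature_fraction": [0.3, 0.5, 0.7, 0.8],
}
SVM_SPACE = {
    "C": [0.1, 1, 10, 50, 100],
    "epsilon": [0.01, 0.1, 0.2, 0.5, 1],
}


def expand_grid(space):
    """Return every combination of the listed values as a list of dicts."""
    names = list(space)
    return [
        dict(zip(names, values))
        for values in product(*(space[name] for name in names))
    ]


boosting_grid = expand_grid(BOOSTING_SPACE)
svm_grid = expand_grid(SVM_SPACE)
print(len(boosting_grid), len(svm_grid))  # prints: 108 25
```

Under this assumption, each boosting algorithm would require 108 candidate configurations and the SVM 25, before multiplying by any cross-validation folds.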

Table 4. Results summary table.

Model	Parameter	MAE	MSE	RMSE	Training Time (CPU) [s]	Training Time (GPU) [s]	Model	Parameter	MAE	MSE	RMSE	Training Time (CPU) [s]	Training Time (GPU) [s]
Xgboost	$H_{gen}$	0.134	0.036	0.190	33.49	7.89	MLP	$H_{gen}$	0.019	0.002	0.046	97.42	30
	$H_{wtr}$	0.130	0.037	0.193				$H_{wtr}$	0.013	0.001	0.029
	$k_{drt}$	0.119	0.038	0.196				$k_{drt}$	0.020	0.003	0.060
	$c_{drt}$	0.145	0.039	0.197				$c_{drt}$	0.021	0.002	0.048
LightGBM	$H_{gen}$	0.111	0.025	0.159	6.86	4.23	RNN	$H_{gen}$	0.446	0.335	0.579	-	82
	$H_{wtr}$	0.108	0.024	0.156				$H_{wtr}$	0.421	0.335	0.579
	$k_{drt}$	0.100	0.027	0.164				$k_{drt}$	0.697	0.722	0.850
	$c_{drt}$	0.128	0.030	0.175				$c_{drt}$	0.376	0.244	0.494
Catboost	$H_{gen}$	0.101	0.020	0.141	137.47	23.94	LSTM	$H_{gen}$	0.063	0.011	0.105	-	343
	$H_{wtr}$	0.090	0.017	0.132				$H_{wtr}$	0.052	0.007	0.089
	$k_{drt}$	0.096	0.022	0.151				$k_{drt}$	0.064	0.014	0.118
	$c_{drt}$	0.116	0.023	0.154				$c_{drt}$	0.059	0.007	0.087
SVM	$H_{gen}$	0.088	0.028	0.168	159.57	27.53	GRU	$H_{gen}$	0.053	0.010	0.103	-	229
	$H_{wtr}$	0.049	0.011	0.109				$H_{wtr}$	0.036	0.003	0.055
	$k_{drt}$	0.083	0.031	0.178				$k_{drt}$	0.050	0.014	0.119
	$c_{drt}$	0.122	0.043	0.207				$c_{drt}$	0.046	0.004	0.068
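
For reference, the three error metrics reported in Table 4 follow their standard definitions, which can be sketched as below. The function name `regression_errors` and the example values are hypothetical; any scaling applied to the targets before computing the errors is not shown here.

```python
import math


def regression_errors(y_true, y_pred):
    """Compute MAE, MSE and RMSE between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [pred - true for true, pred in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Toy values chosen only to exercise the function.
errors = regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because RMSE is the square root of MSE, the two columns of Table 4 can be cross-checked against each other: for Xgboost and $H_{gen}$, the square root of 0.036 is approximately 0.190, which matches the reported RMSE.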
