SCNGO-CNN-LSTM-Based Voltage Sag Prediction Method for Power Systems

Sun, Lei; Xu, Yu; Bai, Jing

doi:10.3390/en19020428


by Lei Sun, Yu Xu and Jing Bai ^*

College of Electrical and Information Engineering, Beihua University, Jilin 132021, China

^* Author to whom correspondence should be addressed.

Energies 2026, 19(2), 428; https://doi.org/10.3390/en19020428

Submission received: 8 December 2025 / Revised: 9 January 2026 / Accepted: 14 January 2026 / Published: 15 January 2026


Abstract

To achieve accurate voltage sag prediction and early warning, thereby improving power quality, a hybrid voltage sag prediction framework is proposed by integrating Kernel Entropy Component Analysis (KECA) with an improved Northern Goshawk Optimization (NGO) algorithm for hyperparameter tuning of a CNN-LSTM model. First, to address the limitations of the original NGO, such as proneness to falling into local optima and high randomness of the initial population distribution, a refraction-opposition-based learning mechanism is introduced to enhance population diversity and expand the search space. Furthermore, a sine–cosine strategy (SCA) with nonlinear weight coefficients is integrated into the exploration phase to dynamically adjust the search step size, optimizing the balance between global exploration and local exploitation, thereby boosting convergence speed and accuracy. The improved algorithm (SCNGO) is then utilized to optimize the hyperparameters of the CNN-LSTM model. Second, KECA is applied to voltage-sag-related data to extract key features and eliminate redundant information, and the resulting dimensionally reduced data are fed as input to the SCNGO-CNN-LSTM model to further improve prediction performance. Experimental results demonstrate that the SCNGO-CNN-LSTM model outperforms other comparative models significantly across multiple evaluation metrics. Compared with NGO-CNN-LSTM, GWO-CNN-LSTM, and the original CNN-LSTM, the proposed method achieves a mean squared error (MSE) reduction of 53.45%, 44.68%, and 66.76%, respectively. The corresponding root mean squared error (RMSE) is decreased by 25.33%, 18.61%, and 36.92%, while the mean absolute error (MAE) is reduced by 81.23%, 77.04%, and 86.06%, respectively. These results confirm that the proposed framework exhibits superior feature representation capability and significantly improves voltage sag prediction accuracy.

Keywords:

voltage prediction; CNN-LSTM; Kernel Entropy Component Analysis; improved Northern Goshawk Optimization

1. Introduction

Electric energy is an important secondary energy source that is economical, clean, and easy to transmit and convert [1], and it has become the foundation of modern industrial production and residents’ daily lives. With the continuous expansion of power system scale and improvements in automation and intelligence levels, various industries have increasingly higher requirements for power quality, making power quality issues a research focus [2,3]. Power quality problems include steady-state and transient disturbances, among which voltage sag has emerged as a key research direction in recent years due to its high occurrence frequency, wide impact range, and strong uncertainty [4,5]. Therefore, achieving accurate voltage sag prediction is of great significance for enhancing the stability and reliability of power systems.

Voltage sag (also referred to as voltage dip) is defined as a phenomenon where the root mean square (RMS) value of voltage drops to 10–90% of the rated value within a short period ranging from 0.5 cycles to 1 min [6]. Although it is a transient event, it occurs frequently in modern power systems and is sufficient to cause malfunction or shutdown of sensitive equipment such as variable speed drives and programmable logic controllers. Its main causes include short-circuit faults in transmission and distribution networks, large motor startup, transformer energization, system operations, and lightning strikes [4], among which short-circuit faults are often regarded as the dominant factor. To address this problem, measures such as power grid reinforcement, optimization of protection coordination, and installation of compensation devices like Dynamic Voltage Restorers (DVRs) and Static Synchronous Compensators (STATCOMs) can be adopted [2]; shortening the fault clearing time is also an effective strategy [5]. However, these methods are often accompanied by limitations such as high cost, complex implementation, or only post-fault compensation characteristics. Therefore, voltage sag prediction—defined as modeling the short-term evolution trend of voltage signals based on historical data—has gained increasing attention as an analytical tool, aiming to provide support for power quality assessment.

In recent years, with the advancement of computing power and the continuous accumulation of power system monitoring data, Machine Learning (ML) [7,8] and Deep Learning (DL) [9,10] have been widely applied in the field of power quality analysis. Early studies mostly adopted traditional supervised learning methods based on expert features (e.g., time-domain, frequency-domain, and wavelet coefficients) such as SVM, KNN, and random forests [11,12,13]. While these methods perform well in scenarios with distinct features and moderate sample sizes, they suffer from limitations including strong reliance on features, high manual intervention, and insufficient noise robustness. With the rise in the end-to-end learning paradigm, deep neural networks (e.g., CNN, RNN, LSTM) have gradually replaced manual feature extraction, enabling adaptive modeling of complex nonlinear relationships. Ref. [14] proposed the use of LSTM for voltage sag classification, Ref. [15] adopted an attention mechanism combined with CNN for power quality disturbance identification, and Ref. [16] applied LSTM to power system voltage stability prediction. Several studies have utilized LSTM for modeling and predicting voltage states, explored its application potential in system operation analysis and auxiliary decision-making, and verified the effectiveness of deep learning methods in addressing voltage fluctuation issues [17]. Zhang et al. recently proposed a data-driven method for identifying voltage sag consequence states for industrial users; by performing feature learning on monitoring data, it achieves effective discrimination of voltage sag impact states, further demonstrating the application value of data-driven methods in voltage sag analysis and evaluation [18]. 
In addition, to address the residual voltage prediction problem during voltage sags, some studies have proposed data fusion-based prediction methods that improve the feasibility and accuracy of residual voltage prediction by incorporating multi-source information [19]. Meanwhile, in the broader field of power quality disturbance prediction, other deep learning architectures such as Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) have also been introduced to predict the dynamic changes in power quality indicators in distributed systems [20]. However, standalone LSTM neural networks often face challenges such as a large number of gate units, slow training speed, and relatively low stability of the prediction model. To mitigate these shortcomings, Ref. [21] proposed a hybrid CNN-LSTM model that can simultaneously capture local spatial features (accomplished by CNN) and long-term temporal dependencies (captured by LSTM), achieving superior performance compared to single models in multiple power prediction tasks. Based on measured voltage data, it verified the advantages of CNN-LSTM in power quality disturbance classification, with results showing that the model maintains high robustness even in noisy environments. To further enhance model performance, Ref. [22] introduced meta-heuristic models to optimize the hyperparameters and partial structures of CNN-LSTM (e.g., number of convolution kernels, number of LSTM units, and learning rate), achieving joint optimization and significantly improving prediction accuracy and generalization ability. Overall, most existing studies focus on network structure optimization, but pay insufficient attention to the criticality of input features. Feature redundancy not only reduces prediction accuracy but also increases model complexity.

To further improve the effectiveness and interpretability of input features, researchers have begun to explore data-driven feature extraction methods. Data-driven multivariate statistical methods, represented by Principal Component Analysis (PCA), have been widely applied in the field of industrial process condition monitoring [23]. They do not require the establishment of accurate mathematical models and offer greater simplicity and practicality compared to mechanism-based or knowledge-based modeling methods. However, PCA assumes a linear relationship between variables, while power quality data typically exhibit multivariate, highly coupled, and strongly nonlinear characteristics—this limits PCA’s performance in such scenarios. To address the nonlinearity and coupling of variables, Schölkopf et al. [24] proposed Kernel Principal Component Analysis (KPCA), which extends PCA to a high-dimensional feature space by introducing kernel functions, enabling more effective process monitoring. Nevertheless, KPCA is based on the variance maximization criterion and insufficiently reveals the density structure of data. Further, Jenssen [25] proposed Kernel Entropy Component Analysis (KECA), which measures the importance of kernel feature directions using kernel density estimation and Renyi’s quadratic entropy. By ranking features according to their kernel entropy contributions, KECA can more accurately retain the density and cluster structure information of data. Ref. [26] demonstrates that KECA outperforms PCA and KPCA in terms of quality evaluation criteria and clustering accuracy. Currently, relevant research on KECA is mainly applied in face recognition, remote sensing, noise reduction, and fault diagnosis [27,28], but its application in process condition monitoring—especially in voltage prediction and power quality analysis—remains relatively limited.

To address this gap, this paper proposes a voltage sag prediction method integrating KECA dimensionality reduction, SCNGO hyperparameter optimization, and CNN-LSTM temporal modeling. First, KECA is used to extract key features of voltage sags. Subsequently, the SCNGO algorithm (an improved version of the Northern Goshawk Optimization, NGO) is utilized to optimize the hyperparameters of the CNN-LSTM model. Finally, the KECA-SCNGO-CNN-LSTM prediction model is constructed. A case study is conducted using AC power monitoring data from the Aucma CFD-50 vaccine storage refrigerator developed by New Horizons under Global Health Labs, verifying the effectiveness and feasibility of the proposed method.

1.1. KECA Algorithm

Kernel Entropy Component Analysis (KECA) is a nonlinear dimensionality reduction method based on information entropy theory, which can explore the inherent distribution structure of high-dimensional data, extract the most informative features, and eliminate redundancy. Its core idea is to measure the importance of feature directions using Renyi’s quadratic entropy and calculate entropy contributions via kernel density estimation to determine the optimal projection directions. It aims to select feature parameters that are sensitive to the prediction model, highly discriminative, and with distinct regularity from complex original feature signals, thereby improving the accuracy of model predictions.

In the problem of voltage sag prediction, power quality features such as voltage, current, power, and frequency usually exhibit significant nonlinear coupling relationships and information redundancy. Using high-dimensional original features directly not only increases the computational complexity of the model but also may weaken the ability to express key disturbance patterns. By means of an entropy evaluation criterion based on kernel density estimation, KECA ranks the importance of feature directions, which enables it to better retain information closely related to the data distribution structure during dimensionality reduction, thereby improving the input quality and prediction stability of the subsequent CNN–LSTM model.

The KECA method selects the feature variables required to form the projection space while minimizing the Renyi entropy loss after dimensionality reduction. The input vector is expressed as:

X = [X_{1}, X_{2}, \dots, X_{P}], X_{k} \in R^{d},

where P denotes the number of voltage sag prediction features, i.e., P = 4.

The mathematical description of Renyi’s quadratic entropy, used as an information entropy index, is as follows:

M (p) = - \lg (\int p^{2} (x) d x),

(1)

where M(p) denotes Renyi’s quadratic entropy, and p(x) represents the probability density function of the input prediction data. Based on the monotonicity of the logarithmic function, Equation (1) is rewritten as:

C (p) = \int p^{2} (x) d x,

(2)

The Parzen window probability density estimation function is introduced to calculate C(p):

\hat{p} (x) = \frac{1}{N} \sum_{x_{t} \in x} k_{σ} (x, x_{t}),

(3)

where k_{σ} (x, x_{t}) is the Parzen window function, x_{t} is the center, and σ is the width parameter. A Gaussian kernel function is selected as the kernel function for KECA. Substituting Equation (3) into Equation (2), the estimate of C(p), denoted \hat{V} (p), can be derived as:

\hat{V} (p) = \int \hat{p}^{2} (x) d x = \frac{1}{N^{2}} \sum_{i = 1}^{N} \sum_{j = 1}^{N} k_{\sqrt{2} σ} (x_{i}, x_{j}) = \frac{1}{N^{2}} L^{T} K L,

(4)

K (i, j) = k_{\sqrt{2} σ} (x_{i}, x_{j}),

(5)

where K is an (N × N) kernel matrix, and L is an (N × 1) vector whose elements are all 1. The eigenvalue decomposition of the kernel matrix K is calculated by the following equation:

K = E D E^{T},

(6)

where D = diag(λ_{1}, λ_{2}, \dots, λ_{N}) denotes a diagonal matrix composed of eigenvalues, and E = (e_{1}, e_{2}, \dots, e_{N}) is the matrix of the corresponding eigenvectors. Here, λ_{i} and e_{i} are sorted in descending order based on the estimated Renyi entropy values. Substituting Equation (6) into Equation (4) gives:

\hat{V} (p) = \frac{1}{N^{2}} L^{T} K L = \frac{1}{N^{2}} L^{T} E D E^{T} L = \frac{1}{N^{2}} \sum_{i = 1}^{N} {(\sqrt{λ_{i}} e_{i}^{T} L)}^{2},

(7)

F_{i} = \sqrt{λ_{i}} e_{i}^{T} L,

(8)

Therefore, the F_{i} values can ensure that the information entropy loss before and after sample dimensionality reduction is minimized. By sorting the magnitudes of F_{i} in descending order, the top d eigenvalues and their corresponding eigenvectors with large contributions to Renyi entropy are selected to form a new eigenvector matrix E_{d}, and the corresponding eigenvalues constitute a diagonal matrix D_{d}. The result of the KECA projection transformation is:

Φ_{eca} = P_{u_{d}} Φ = D_{d}^{\frac{1}{2}} E_{d}^{T},

(9)

where Φ represents the function value of the original data mapped to the high-dimensional feature space, and u_{d} denotes the projection position with the maximum information content constructed by the selected eigenvalues and their corresponding eigenvectors. To ensure that the mapping process fully retains the information contained in the original data, the value of d should be determined by solving the following optimization problem:

Φ_{eca} = D_{d}^{\frac{1}{2}} E_{d}^{T} : \underset{λ_{1}, \dots, λ_{N}, e_{1}, \dots, e_{N}}{\min} \{\hat{V} (p) - {\hat{V}}_{d} (p)\},

(10)

The expression for the minimum value of the above equation is written as:

\underset{λ_{1}, \dots, λ_{N}, e_{1}, \dots, e_{N}}{\min} \{\hat{V} (p) - {\hat{V}}_{d} (p)\} = \frac{1}{N^{2}} \sum_{j = d + 1}^{N} Ψ_{j},

(11)

where Ψ_{j} is the j-th largest of the entropy terms {(\sqrt{λ_{i}} e_{i}^{T} L)}^{2} in Equation (7), and thus the value of d is determined. The projection result of a new test sample x_{new} on u_{d} is shown in Equation (12):

y_{new} = P_{u_{d}} Φ (x_{new}) = D_{d}^{- \frac{1}{2}} E_{d}^{T} Φ^{T} Φ (x_{new}) = D_{d}^{- \frac{1}{2}} E_{d}^{T} K (x, x_{new}),

(12)

The feature dimension of KECA is determined by a built-in automatic dimension selection criterion, aiming to retain key kernel entropy information. For the dataset in this study, KECA automatically selects 3 kernel entropy components (auto-dim = 3), which are used as the input features for the subsequent CNN–LSTM model. This setting effectively reduces the input dimension and computational complexity while ensuring information retention.

The flowchart of the KECA algorithm is shown in Figure 1.
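
The KECA steps described in this section can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names `gaussian_kernel`, `keca_fit`, and `keca_transform` are hypothetical, the argument `sigma` is assumed to be the final kernel width (any \sqrt{2} factor already applied), and the dimension d is passed in explicitly because the automatic selection criterion is not specified in the text.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def keca_fit(X, d, sigma):
    """Keep the d kernel components with the largest Renyi entropy contribution."""
    K = gaussian_kernel(X, X, sigma)           # kernel matrix, Eq. (5)
    lam, E = np.linalg.eigh(K)                 # K = E D E^T, Eq. (6)
    lam = np.clip(lam, 0.0, None)              # guard against tiny negative eigenvalues
    ones = np.ones(X.shape[0])
    contrib = lam * (E.T @ ones) ** 2          # squared F_i, the terms summed in Eq. (7)
    idx = np.argsort(contrib)[::-1][:d]        # rank by entropy contribution, not eigenvalue
    return {"X": X, "sigma": sigma, "lam": lam[idx], "E": E[:, idx], "contrib": contrib}

def keca_transform(model, X_new):
    """Project samples onto the selected components, Eq. (12)."""
    K_new = gaussian_kernel(model["X"], X_new, model["sigma"])        # shape (N, M)
    return (model["E"].T @ K_new / np.sqrt(model["lam"])[:, None]).T  # shape (M, d)
```

Projecting the training samples themselves reproduces D_{d}^{1/2} E_{d}^{T} of Equation (9), which gives a quick consistency check for the sketch.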

1.2. CNN-LSTM Prediction Method

1.2.1. Convolutional Neural Network (CNN)

For the analysis of large-scale time-series data, one-dimensional convolutional neural networks exhibit strong feature extraction capabilities and can be applied to fixed-length signal analysis. A CNN consists of multiple layers: data is input through the input layer, undergoes feature transformation and extraction via convolutional layers and pooling layers, and then all information is integrated through fully connected layers before being output by the output layer. The convolutional calculation process is shown in Equation (13).

k^{l} = f (W^{l} \cdot P^{l - 1} + b^{l}),

(13)

where k^{l} denotes the feature vector of the l-th layer; f represents the activation function; W^{l} is the weight matrix of the filter in the l-th layer; P^{l - 1} denotes the output of the (l − 1)-th layer; and b^{l} is the bias term from the (l − 1)-th layer to the l-th layer (a trainable parameter in the model used to shift the weighted sum). The pooling calculation is expressed in Equation (14):

y_{λ}^{ξ} (w) = \max {k_{i}^{l} (γ)}, γ \in k_{w},

(14)

where y_{λ}^{ξ} (w) is the element in the λ-th feature matrix of the ξ-th layer after pooling; k_{i}^{l} is the element in the i-th feature vector of the l-th layer; and k_{w} denotes the w-th pooling coverage area.
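
As a small illustration of Equations (13) and (14), the sketch below applies one 1-D filter and non-overlapping max pooling to a short signal. The ReLU activation, the filter weights, and the per-unit voltage samples are assumptions chosen for the example; they are not taken from the model or dataset of this study.

```python
def relu(v):
    """Rectified linear activation, used here as the function f."""
    return max(0.0, v)

def conv1d(signal, weights, bias, activation=relu):
    """Valid 1-D convolution (sliding dot product) plus activation, Eq. (13)."""
    k = len(weights)
    return [
        activation(sum(w * s for w, s in zip(weights, signal[i:i + k])) + bias)
        for i in range(len(signal) - k + 1)
    ]

def max_pool1d(features, size):
    """Non-overlapping max pooling over windows of the given size, Eq. (14)."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]

# Illustrative per-unit voltage samples containing a sag and its recovery.
voltage = [1.00, 0.98, 0.70, 0.65, 0.72, 0.97, 1.00]
rises = conv1d(voltage, weights=[-1.0, 1.0], bias=0.0)  # responds to rising edges only
pooled = max_pool1d(rises, size=2)
```

With this difference filter, the ReLU suppresses the falling edge of the sag, and the pooled output keeps only the strongest rise in each window.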

1.2.2. LSTM Neural Network Model

The LSTM employs three gating units to update or discard historical information, whose specific structure is illustrated in Figure 2.

1.: Forget Gate

The forget gate uses the Sigmoid function to determine which data information can be transmitted through the cell state, and judges the information based on the output from the previous moment. The expression for the gate signal of the forget gate is:

f_{t} = σ (U_{f} \cdot [h_{t - 1}, x_{t}] + b_{1}),

(15)

where f_{t} denotes the gate signal of the forget gate; U_{f} represents the weight matrix of f_{t}; b_{1} is the bias term of the forget gate; σ stands for the Sigmoid activation function with a value range of [0, 1]; h_{t - 1} is the hidden state matrix from the previous moment; and x_{t} denotes the input information at the current moment.

2.: Input Gate

The input gate uses the Sigmoid function to determine which values are used for updates; subsequently, a tanh layer generates candidate values, which are then combined to obtain new candidate states. By integrating these two steps, unnecessary information can be discarded and new information added from large volumes of data. The expressions are as follows:

C_{t} = f_{t} \cdot C_{t - 1} + i_{t} \cdot {\tilde{C}}_{t},

(16)

i_{t} = σ (U_{i} \cdot [h_{t - 1}, x_{t}] + b_{2}),

(17)

{\tilde{C}}_{t} = \tanh (U_{C} \cdot [h_{t - 1}, x_{t}] + b_{3}),

(18)

where C_{t} denotes the cell state matrix at time t; {\tilde{C}}_{t} is the candidate cell state matrix; i_{t} represents the gate signal of the input gate; U_{i} and b_{2} are the weight matrix and bias term of the input gate signal i_{t}, respectively; U_{C} is the weight matrix of the tanh function; and b_{3} is the state update bias term.

3.: Output Gate

After the Sigmoid layer produces the output gate signal, the tanh function scales the cell state to the range [−1, 1], and the two are combined by element-wise multiplication to generate the model output. The expressions are as follows:

o_{t} = σ (U_{o} \cdot [h_{t - 1}, x_{t}] + b_{4}),

(19)

h_{t} = o_{t} \cdot \tanh (C_{t}),

(20)

where o_{t} denotes the gate signal of the output gate; U_{o} and b_{4} are the weight matrix and bias term of o_{t}, respectively.

The prediction process of the CNN-LSTM model is illustrated in Figure 3.
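
The gate equations (15) to (20) can be traced with a single time step written in plain Python. This is a hedged sketch for reading the equations, not the network used in the experiments: the function names and the dictionary keys (`U_f`, `b_f`, and so on) are hypothetical, and weights are stored as nested lists so that no external library is assumed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(U, v, b):
    """Row-wise U @ v + b for plain Python lists."""
    return [sum(u * x for u, x in zip(row, v)) + bi for row, bi in zip(U, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (15)-(20)."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["U_f"], z, params["b_f"])]        # forget gate
    i = [sigmoid(a) for a in affine(params["U_i"], z, params["b_i"])]        # input gate
    c_hat = [math.tanh(a) for a in affine(params["U_C"], z, params["b_C"])]  # candidate state
    o = [sigmoid(a) for a in affine(params["U_o"], z, params["b_o"])]        # output gate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_hat)] # cell state update
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]                     # hidden state
    return h_t, c_t
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate state is zero, so the cell state is simply halved at each step; this makes a convenient sanity check.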

1.3. Improved Northern Goshawk Optimization Algorithm

1.3.1. Northern Goshawk Optimization

The Northern Goshawk Optimization (NGO) algorithm was proposed by M. Dehghani et al. [29] in 2021. It simulates the hunting behaviors of northern goshawks, mainly consisting of an exploration phase (prey identification and attack) and an exploitation phase (chase and escape).

1.: Prey identification phase

A prey is randomly selected and attacked rapidly, which can be described by the following mathematical models:

p_{i} = X_{k}, (i = 1, 2, \dots, N; k = 1, 2, \dots, i - 1, i + 1, \dots, N),

(21)

x_{i, j}^{new, p_{1}} = \{\begin{matrix} x_{i, j} + r (p_{i, j} - I x_{i, j}), & F_{p_{i}} < F_{i} \\ x_{i, j} + r (x_{i, j} - p_{i, j}), & F_{p_{i}} \geq F_{i} \end{matrix},

(22)

X_{i} = \{\begin{matrix} x_{i}^{new, p_{1}}, & F_{i}^{new, p_{1}} < F_{i} \\ x_{i}, & F_{i}^{new, p_{1}} \geq F_{i} \end{matrix},

(23)

where p_{i} denotes the position of the prey chosen by the i-th goshawk; F_{p_{i}} is its objective function value; k is a random integer within [1, N] with k ≠ i; x_{i}^{new, p_{1}} represents the new state of the i-th goshawk; x_{i, j}^{new, p_{1}} is the new state of the i-th goshawk in the j-th dimension; F_{i}^{new, p_{1}} is the corresponding fitness value; r \in [0, 1] is a random number; and I \in \{1, 2\} is a random integer. Both r and I are random numbers used to generate NGO behaviors during the search and update processes.

2.: Pursuit and escape phase

Northern goshawks are highly agile. When chasing escaping prey, they can capture it at any time and place. If a goshawk is in an attack position with radius

R

, the second phase can be expressed by the following mathematical formulas:

x_{i, j}^{new, p_{2}} = x_{i, j} + R (2 r - 1) x_{i, j},

(24)

R = 0.02 (1 - \frac{t}{T}),

(25)

X_{i} = \{\begin{matrix} x_{i}^{new, p_{2}}, & F_{i}^{new, p_{2}} < F_{i} \\ x_{i}, & F_{i}^{new, p_{2}} \geq F_{i} \end{matrix},

(26)

where t is the current iteration number; T is the maximum number of iterations; x_{i}^{new, p_{2}} denotes the new state of the i-th goshawk during the chase phase; x_{i, j}^{new, p_{2}} is the new state of the i-th goshawk in the j-th dimension during the chase phase; and F_{i}^{new, p_{2}} is the fitness value in the new state.
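
The two NGO phases can be put together in a short, self-contained sketch. It follows Equations (21) to (26) for a minimization problem, with several assumptions that are not stated in the text: the function name `ngo_minimize` is hypothetical, positions are clipped to the search bounds after each move, r is drawn per dimension, and I is drawn once per individual.

```python
import random

def ngo_minimize(f, lower, upper, pop_size=20, iters=100, seed=0):
    """Basic Northern Goshawk Optimization for minimizing f over a box."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(v, j):
        # Assumed boundary handling: clamp to [lower_j, upper_j].
        return min(max(v, lower[j]), upper[j])

    X = [[rng.uniform(lower[j], upper[j]) for j in range(dim)] for _ in range(pop_size)]
    F = [f(x) for x in X]
    for t in range(1, iters + 1):
        R = 0.02 * (1 - t / iters)                       # shrinking attack radius, Eq. (25)
        for i in range(pop_size):
            # Phase 1: prey identification and attack, Eqs. (21)-(23)
            k = rng.choice([m for m in range(pop_size) if m != i])
            I = rng.choice((1, 2))
            new = []
            for j in range(dim):
                r = rng.random()
                if F[k] < F[i]:
                    v = X[i][j] + r * (X[k][j] - I * X[i][j])
                else:
                    v = X[i][j] + r * (X[i][j] - X[k][j])
                new.append(clip(v, j))
            F_new = f(new)
            if F_new < F[i]:                             # greedy acceptance
                X[i], F[i] = new, F_new
            # Phase 2: chase and escape, Eqs. (24)-(26)
            new = [clip(X[i][j] + R * (2 * rng.random() - 1) * X[i][j], j)
                   for j in range(dim)]
            F_new = f(new)
            if F_new < F[i]:
                X[i], F[i] = new, F_new
    best = min(range(pop_size), key=F.__getitem__)
    return X[best], F[best]
```

Because a new position is accepted only when it improves the fitness, the best value returned can never be worse than the best member of the initial population.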

1.3.2. Improved Strategies

Despite its advantages of high convergence accuracy and strong stability, the NGO algorithm is still prone to falling into local minima and premature convergence when solving large-scale complex optimization problems. To address the above issues, this paper improves the NGO algorithm as follows.

1.: Adoption of Refractive Opposition-Based Learning Mechanism

The NGO algorithm tends to lose population diversity during the optimization process, which makes it easy to fall into local optima and thus leads to low convergence accuracy. To address this problem, the opposition-based learning mechanism proposed in Ref. [30] is used for population initialization. Specifically, the search range is expanded by inversely solving the current solution to find an optimal alternative solution. Meanwhile, the refraction principle [31] is introduced to reduce the probability of premature convergence in the later stage of search. Assume that the optimization range on the x-axis is [l, u], γ is the incident angle, β is the refraction angle, h is the length of the incident ray, and h^{*} is the length corresponding to the refracted ray. Then, we have:

\{\begin{matrix} \sin γ = ((l + u) / 2 - x) / h \\ \sin β = (x^{*} - (l + u) / 2) / h^{*} \end{matrix},

(27)

According to the definition of refractive index, n = \sin γ / \sin β, the refractive index n is calculated as follows:

n = \frac{h^{*} [(l + u) / 2 - x]}{h [x^{*} - (l + u) / 2]},

(28)

Let the scaling factor k = h / h^{*}. Substituting it into Equation (28) yields:

x^{*} = \frac{l + u}{2} + \frac{l + u}{2 k n} - \frac{x}{k n},

(29)

Under the conditions of n = 1 and k = 1, Equation (29) can be converted into the opposition-based learning formula:

x^{*} = l + u - x,

(30)

When extending Equation (29) to the high-dimensional space of the Northern Goshawk Optimization algorithm, setting n = 1 gives the following formula:

x_{i, j}^{*} = \frac{l_{j} + u_{j}}{2} + \frac{l_{j} + u_{j}}{2 k} - \frac{x_{i, j}}{k},

(31)

where x_{i, j} denotes the position of the i-th goshawk in the j-th dimension of the population (i = 1, 2, \dots, D; j = 1, 2, \dots, N), D is the population size, N is the number of dimensions, and x_{i, j}^{*} is the refraction-opposition position of x_{i, j}. l_{j} and u_{j} are the minimum and maximum values in the j-th dimension of the search space, respectively.
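
A minimal sketch of the refraction-opposition mapping is given below. The mapping itself follows Equations (29) to (31); the function names, the clipping step, and the rule of keeping the better half of the merged population are assumptions made for illustration, since the text does not spell out the selection rule.

```python
def refraction_opposite(x, lower, upper, k=1.0, n=1.0):
    """Refraction-opposition position of x, applied per dimension."""
    return [(l + u) / 2 + (l + u) / (2 * k * n) - xj / (k * n)
            for xj, l, u in zip(x, lower, upper)]

def init_with_opposition(population, lower, upper, fitness, k=1.0):
    """Merge a population with its opposite positions and keep the better half.

    The keep-the-better-half rule is an assumed selection scheme.
    """
    opposite = []
    for x in population:
        xo = refraction_opposite(x, lower, upper, k)
        # Assumed safeguard: k * n < 1 can push a point outside the bounds.
        opposite.append([min(max(v, l), u) for v, l, u in zip(xo, lower, upper)])
    merged = sorted(population + opposite, key=fitness)
    return merged[:len(population)]
```

For example, with l = 0, u = 10, and k = n = 1, the point x = 1 maps to 9, which is the ordinary opposition-based position l + u - x; a larger k pulls the opposite point toward the midpoint of the range.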

2.: Incorporation of the Sine–cosine Algorithm

In local optimal regions, the emergence of a large number of followers leads to the loss of population diversity, making the algorithm prone to falling into local optima. To address this issue, this paper introduces the Sine–cosine Algorithm (SCA) [32] into the position update process of the discoverers in the Northern Goshawk Optimization algorithm, thereby enhancing the search capability of the NGO algorithm. The core idea of the SCA is to leverage the oscillatory changes in the sine–cosine model for global and local optimization, so as to obtain the global optimal solution. However, the step search factor in the original SCA is defined as

r = a - \frac{t}{Iter_{max}},

which exhibits a linear decreasing trend. This is not conducive to further balancing the global and local optimization capabilities of the NGO algorithm. Therefore, we improve the step search factor and propose a novel nonlinear decreasing search factor, as follows:

r_{1} = b \cdot {(1 - {(\frac{t}{Iter_{max}})}^{η})}^{\frac{1}{η}},

(32)

where η is the adjustment coefficient, which is set to 1.1 in this paper; b is a constant with a value of 1 in this study; and t denotes the number of iterations.

3.: Introduction of Nonlinear Weight Factor

During the global search process of the NGO algorithm, the position update of the population is affected by its current position. Therefore, a nonlinear weight factor ω, defined in Equation (33), is introduced to adjust the dependence of individual position updates in the population on individual information.

ω = \frac{e^{\frac{t}{Iter_{max}}} - 1}{e - 1},

(33)

In the initial stage of the search, the small value of ω decreases the impact of the position changes in search individuals on the global optimal solution, thereby enhancing the global optimization capability. In the later stage of the optimization process, the larger value of ω effectively utilizes the correlation between the current position and individual positions, thus accelerating the convergence speed of the algorithm. The modified position update formula for the new discoverers is given as follows:

X_{i, j}^{t + 1} = \{\begin{matrix} ω \cdot X_{i, j}^{t} + r_{1} \cdot \sin r_{2} \cdot |r_{3} \cdot X_{best} - X_{i, j}^{t}|, & R_{2} < S T \\ ω \cdot X_{i, j}^{t} + r_{1} \cdot \cos r_{2} \cdot |r_{3} \cdot X_{best} - X_{i, j}^{t}|, & R_{2} \geq S T \end{matrix},

(34)

where r_{2} \in [0, 2 π] determines the movement distance, and r_{3} \in [0, 2 π].
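
The search factor of Equation (32), the weight factor of Equation (33), and the update of Equation (34) can be combined as in the sketch below. Two points are assumptions rather than statements from the text: R_{2} is treated as a uniform random number in [0, 1], and the threshold ST defaults to 0.5. The function names are likewise illustrative.

```python
import math
import random

def search_factor(t, iter_max, b=1.0, eta=1.1):
    """Nonlinear decreasing step factor r_1, Eq. (32)."""
    return b * (1 - (t / iter_max) ** eta) ** (1 / eta)

def weight_factor(t, iter_max):
    """Nonlinear weight factor omega, Eq. (33); grows from 0 to 1."""
    return (math.exp(t / iter_max) - 1) / (math.e - 1)

def sine_cosine_update(x, x_best, t, iter_max, rng, st=0.5):
    """Position update of Eq. (34) for one individual."""
    r1 = search_factor(t, iter_max)
    w = weight_factor(t, iter_max)
    use_sine = rng.random() < st            # assumed meaning of the R_2 < ST branch
    new = []
    for xj, bj in zip(x, x_best):
        r2 = rng.uniform(0, 2 * math.pi)
        r3 = rng.uniform(0, 2 * math.pi)
        osc = math.sin(r2) if use_sine else math.cos(r2)
        new.append(w * xj + r1 * osc * abs(r3 * bj - xj))
    return new
```

At t = 0 the step factor equals b and the weight is zero, so the move is driven entirely by the oscillating term; at the final iteration the step factor vanishes and the position is left essentially unchanged. At the midpoint of the run, the nonlinear factor with η = 1.1 lies above the linear schedule 1 - t / Iter_{max}.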

The above improvement strategies are mainly designed to address the common non-convex search space and premature convergence problems in the hyperparameter optimization process of the CNN-LSTM model. They enhance the search behavior of the NGO algorithm from the algorithmic level, rather than being tailored to the specific physical mechanism of voltage sags. The refraction-opposition-based learning mechanism, sine–cosine search strategy, and nonlinear weight factor serve to improve population diversity, balance global exploration and local exploitation, and enhance search stability, respectively. The three mechanisms work synergistically to improve the robustness and convergence reliability of the hyperparameter tuning process for the deep learning model.

1.4. Performance Testing and Analysis of the SCNGO Algorithm

To verify the optimization performance and stability of the SCNGO algorithm, it is compared with the NGO algorithm, Grey Wolf Optimizer (GWO) [33], and Whale Optimization Algorithm (WOA) [34]. These algorithms are evaluated using six benchmark functions of different types as algorithm performance test functions. Among them, the unimodal test functions (f₁, f₄, f₅) are used to evaluate the local search capability of the algorithms, while the multimodal test functions (f₈, f₉, f₁₁) are adopted to verify the global search capability and the ability to escape local optima of the algorithms. Specifically, a lower objective function value indicates better convergence accuracy of the algorithm; a smaller mean value of the results implies stronger capabilities of the algorithm to avoid local optima and perform global optimization; in addition, a smaller standard deviation reflects higher stability and stronger robustness of the algorithm’s optimization performance across different operating scenarios. The test functions are listed in Table 1, and the results of the benchmark test functions are presented in Table 2.

As can be seen from Table 2, SCNGO achieves better optimal values, mean values, and smaller standard deviations on the unimodal functions f_{1}, f_{3}, f_{4}, demonstrating high convergence accuracy. For the multimodal functions f_{8}, f_{9}, f_{11}, its average performance is still superior to that of other algorithms. For instance, the mean value of f_{8} is significantly lower than those of NGO and GWO, with the smallest standard deviation, indicating that SCNGO also possesses better robustness and optimization consistency in complex search spaces. In addition, for f_{9} and f_{11}, SCNGO can stably reach the theoretical optimal value, while some algorithms still exhibit deviations. Overall, SCNGO exhibits good comprehensive performance across different function types. To further compare the convergence speed and convergence characteristics of each algorithm, convergence curves of the six benchmark functions are plotted as shown in Figure 4, so as to more intuitively demonstrate the differences in their performance.

As can be seen from Figure 4, SCNGO exhibits excellent optimization performance across different types of test functions: for unimodal functions with smooth surfaces, it declines most rapidly in the early stage of iteration and reaches the stable region at the earliest time, demonstrating stronger global search efficiency. For multimodal functions with complex structures and numerous local optima, SCNGO can not only continuously reduce the fitness value and eventually achieve a lower convergence value but also maintain a stable convergence process without obvious fluctuations, thus showing higher stability and robustness. Overall, SCNGO is significantly superior to the comparison algorithms in terms of search speed, ability to escape local optima, and convergence consistency, which is highly consistent with the optimal values, mean values, and standard deviation results presented in the table. This further verifies its comprehensive advantages in complex optimization tasks.

2. KECA Combined with SCNGO-CNN-LSTM Prediction Model

The performance of deep learning models largely depends on rational hyperparameter configuration. For the CNN-LSTM network, key hyperparameters such as the number of LSTM hidden units, learning rate, and Dropout ratio directly affect the model’s feature representation capability, training convergence speed, and generalization performance. However, traditional hyperparameter selection methods based on manual experience or grid search are not only inefficient but also prone to falling into local optima. In combination with the requirements of the voltage sag prediction task, this paper introduces the improved Northern Goshawk Optimization algorithm (SCNGO) to realize automatic optimization of the CNN-LSTM hyperparameters. The main process is as follows:

1. Based on the structural characteristics of CNN-LSTM and sensitivity analysis of model performance, the three most critical hyperparameters are selected as the optimization targets, and their settings are presented in Table 3.

In Table 3, LSTM hidden units is the number of hidden units in the LSTM layer, Initial Learning Rate is the learning rate at the start of training, and Dropout is the dropout ratio used to improve the generalization ability of the model.
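The search space in Table 3 can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the names `BOUNDS`, `HyperParams`, and `decode` are hypothetical, and rounding the first coordinate to an integer number of units is an assumption about how a continuous optimizer position is mapped to a valid configuration.

```python
from dataclasses import dataclass

# Search ranges taken from Table 3: (lower bound, upper bound).
BOUNDS = {
    "lstm_units": (8, 64),
    "learning_rate": (0.0001, 0.005),
    "dropout": (0.05, 0.5),
}


@dataclass
class HyperParams:
    lstm_units: int
    learning_rate: float
    dropout: float


def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def decode(position):
    """Map a 3-element continuous position vector to valid hyperparameters.

    Assumption: the unit count is rounded to the nearest integer after clipping.
    """
    units_lo, units_hi = BOUNDS["lstm_units"]
    lr_lo, lr_hi = BOUNDS["learning_rate"]
    dr_lo, dr_hi = BOUNDS["dropout"]
    return HyperParams(
        lstm_units=int(round(clip(position[0], units_lo, units_hi))),
        learning_rate=clip(position[1], lr_lo, lr_hi),
        dropout=clip(position[2], dr_lo, dr_hi),
    )


# The optimum reported in Table 3, expressed as a position vector.
best = decode([64.0, 0.004978, 0.469])
```

Clipping keeps every candidate inside the ranges of Table 3, so a position that overshoots a bound is evaluated at the bound rather than discarded.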

2. Optimization Objective Function

SCNGO adopts the root mean square error (RMSE) of the validation set as the fitness function, aiming to minimize the prediction deviation of the model. The fitness function is defined as follows:

\mathrm{Fitness}(x) = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i}^{real} - y_{i}^{pred} (x))}^{2}},

(35)

where $y_{i}^{real}$ is the true value, $y_{i}^{pred} (x)$ is the predicted value of the CNN-LSTM model trained with the given hyperparameter combination $x$, and $N$ is the number of samples. A smaller fitness value indicates better performance of the model obtained with the corresponding hyperparameter combination.
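As a minimal sketch of the RMSE fitness described above, the function below scores one hyperparameter combination from its validation predictions. The name `rmse_fitness` is hypothetical, and the step that trains the CNN-LSTM and produces `y_pred` is assumed to happen elsewhere.

```python
import math


def rmse_fitness(y_real, y_pred):
    """Root mean square error between true and predicted validation values."""
    if not y_real or len(y_real) != len(y_pred):
        raise ValueError("y_real and y_pred must be non-empty and equally long")
    squared_error = sum((r - p) ** 2 for r, p in zip(y_real, y_pred))
    return math.sqrt(squared_error / len(y_real))


# Illustrative values only: two samples with errors of 3 and 4.
example = rmse_fitness([0.0, 0.0], [3.0, 4.0])
```

Because the square root is monotonic, minimizing RMSE and minimizing MSE select the same hyperparameter combination; RMSE only has the advantage of being expressed in the unit of the predicted voltage.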

3. SCNGO Optimization Mechanism

To improve the search performance, this paper modifies the NGO algorithm in three respects:

Initialization with Refraction-Opposition-Based Learning

It generates both the original solutions and their opposition-based solutions simultaneously, which enhances the diversity of the initial population, improves the global search capability, and reduces the risk of premature convergence.

Enhancement of the Exploration Phase with the Sine–cosine Strategy (SCA)

Leveraging the oscillatory characteristics of the sine and cosine functions, it introduces jump displacements during the search process, thereby enhancing the algorithm’s ability to escape local optima in complex spaces.

Adjustment of Search Step Size with Nonlinear Weight Factor

By introducing a dynamically decaying weight, it achieves a smooth transition from global search to local convergence, thus improving the accuracy of the final solution and the convergence speed.
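The three mechanisms above can be sketched for a one-dimensional search variable as follows. This is only an illustration under stated assumptions: the exact update equations of SCNGO are not reproduced in this section, so the code uses generic textbook forms. The opposition formula with scale factor `k` (unit refractive index), the cosine-shaped weight in `nonlinear_weight`, its limits `w_max` and `w_min`, and the amplitude constant `a` are all assumptions, and every function name is hypothetical.

```python
import math
import random


def refraction_opposite(x, low, high, k=1.0):
    """Refraction-opposition point of x in [low, high].

    Assumed form with unit refractive index; k = 1 gives the plain
    opposite point low + high - x.
    """
    return (low + high) / 2 + (low + high) / (2 * k) - x / k


def nonlinear_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Weight decaying smoothly from w_max at t = 0 to w_min at t = t_max."""
    return w_min + (w_max - w_min) * (1 + math.cos(math.pi * t / t_max)) / 2


def sine_cosine_step(x, target, t, t_max, rng, a=2.0):
    """One sine-cosine move of position x relative to a target position."""
    r1 = a * (1 - t / t_max)            # oscillation amplitude shrinks over time
    r2 = rng.uniform(0, 2 * math.pi)    # phase of the oscillation
    r3 = rng.uniform(0, 2)              # random emphasis on the target
    wave = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
    return nonlinear_weight(t, t_max) * x + r1 * wave * abs(r3 * target - x)


def init_population(size, low, high, fitness, rng):
    """Draw random points, add their opposites, keep the `size` fittest."""
    originals = [rng.uniform(low, high) for _ in range(size)]
    opposites = [
        clip_to(refraction_opposite(x, low, high), low, high) for x in originals
    ]
    return sorted(originals + opposites, key=fitness)[:size]


def clip_to(value, low, high):
    """Clamp value into [low, high]."""
    return max(low, min(high, value))


rng = random.Random(0)
population = init_population(12, -5.0, 5.0, lambda v: v * v, rng)
```

Evaluating both a point and its opposite doubles the number of fitness evaluations at initialization, which is the price paid for the wider initial coverage of the search space.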

The optimal hyperparameters obtained via SCNGO optimization are used to construct the final CNN-LSTM model, enabling the model to achieve higher prediction accuracy and better generalization performance while ensuring the convergence speed. The convergence curve is shown in Figure 5.

Optimization Process and Result Analysis

SCNGO first randomly generates 12 sets of initial hyperparameter combinations in the search space and takes RMSE as the fitness function to train and evaluate the CNN-LSTM model. As can be seen from the convergence curve, the algorithm achieves a significant performance improvement within the first 5 iterations, with the RMSE dropping rapidly from 11.906 to approximately 11.73, indicating that SCNGO can efficiently locate high-quality parameter regions in the early stage. Subsequently, the optimization trend gradually stabilizes, with minimal fluctuations between iterations and no obvious oscillations or regressions, demonstrating good global convergence and search stability. Finally, after the improvement margin becomes negligible for several consecutive iterations, the model triggers the early stopping mechanism and completes hyperparameter optimization, which avoids unnecessary iterative computations and improves the overall optimization efficiency.
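The early stopping rule described above can be sketched as a check on the history of best fitness values. The function name `should_stop` and the thresholds `patience` and `min_delta` are hypothetical; the section does not state the values actually used. In the example history, only the first value (11.906) and the plateau (11.73) come from the text; the intermediate values are illustrative.

```python
def should_stop(best_history, patience=5, min_delta=1e-3):
    """Return True when the best fitness improved by less than min_delta
    over the last `patience` iterations."""
    if len(best_history) <= patience:
        return False
    return best_history[-patience - 1] - best_history[-1] < min_delta


# RMSE of the best individual per iteration (intermediate values illustrative).
history = [11.906, 11.85, 11.79, 11.75, 11.73]
stop_early = should_stop(history)

# After several further iterations without improvement the rule triggers.
plateau = history + [11.73] * 5
stop_late = should_stop(plateau)
```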

Figure 6 shows the fitness distribution diagram. It can be seen that the individual solutions are highly concentrated near the optimal solution, indicating that SCNGO has favorable convergence performance. Meanwhile, the population can achieve superior results under different hyperparameter combinations (especially the number of LSTM units and Dropout ratio), which also reflects the robustness of the model performance. Finally, the globally optimal parameters obtained by SCNGO search are as follows: LSTM = 64, Learning Rate = 0.004978, Dropout = 0.469, corresponding to an RMSE of 11.6. The hyperparameter search and evaluation of SCNGO are mainly completed based on the training set, and the training results are shown in Figure 7.

Figure 7 presents the prediction performance of the SCNGO-optimized CNN-LSTM on the training set. On the whole, the predictions follow the voltage fluctuation trend, indicating that the model can accurately depict the dynamic changes and fluctuation characteristics of the voltage signal over time. Although there is slight local smoothing and minor lag at points of rapid change, the overall error distribution is stable with no obvious deviation, laying a solid foundation for subsequent testing.

In addition, this paper adopts Kernel Entropy Component Analysis (KECA) for dimensionality reduction of the original voltage-related features, so as to extract key features and eliminate redundant information. The features processed by KECA are input into the SCNGO-CNN-LSTM model, combining the benefits of feature extraction and hyperparameter optimization. The voltage sag prediction process of CNN-LSTM based on the joint optimization of KECA and SCNGO is shown in Figure 8.

The KECA–SCNGO–CNN–LSTM framework is a supervised learning-based, data-driven method for voltage sag prediction and analysis. Its primary objective is to characterize the evolutionary characteristics of voltage-related signals over short time scales using historical monitoring data, thereby improving prediction accuracy and feature representation capability, while it does not involve control action generation or policy update processes. Several Deep Reinforcement Learning (DRL) methods proposed in recent years (e.g., DDPG [35] and SARSA [36]) are mainly oriented toward control tasks such as voltage regulation or power compensation. Their core idea is to gradually learn control policies under the constraints of reward functions through online interaction with physical systems, which usually rely on explicit control objectives, system feedback, and real-time decision-making mechanisms. From the perspective of methodological positioning, there are differences between the prediction modeling framework adopted in this paper and the above-mentioned DRL methods in terms of technical levels and research priorities. The former focuses on enhancing the modeling capability of voltage sag evolution trends via offline data analysis under limited interaction conditions; the latter places greater emphasis on online control optimization and policy learning. Therefore, the two types of methods exhibit a certain degree of complementarity in terms of modeling objectives and evaluation dimensions. The prediction model proposed in this paper can serve as an upper-layer analysis or decision support tool, providing valuable prior information for subsequent control or regulation strategies.

3. Verification

3.1. Sample Set Construction

The data used in the experiments of this paper were collected from the Aucma CFD-50 vaccine storage refrigerators, manufactured by Qingdao Aucma Global Medical Co., Ltd., Qingdao, Shandong, China, and deployed in Kenyan hospitals under the New Horizons project of Global Health Labs. The total volume of AC power monitoring data for a single month exceeded 2.8 million records. To verify the performance of the proposed prediction model, 20,000 data records with obvious voltage fluctuations were selected as the research samples; the sampling interval was 10 ms, covering a time period of approximately 200 s. Among the features, voltage and current were calculated as root mean square (RMS) values based on a fixed sampling window, power features corresponded to active power, and frequency was the real-time measured value of the system operating frequency. All features were sampled at equal time intervals, and the model inputs were constructed via a sliding time window method to characterize the short-term dynamic evolution characteristics of electrical quantities during voltage sag events. The data were divided at an 8:2 ratio, where the first 160 s were used for model training and the remaining 40 s for model testing so as to verify the model’s temporal prediction capability for transient variation characteristics.
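The chronological 8:2 split and the sliding time window described above can be sketched as follows. This is an illustration only: the window length of 20 samples, the choice of the next voltage value as the prediction target, and the function names are assumptions, and the placeholder rows stand in for the real measurements.

```python
def chronological_split(records, train_ratio=0.8):
    """Split a time-ordered sequence without shuffling."""
    cut = round(len(records) * train_ratio)
    return records[:cut], records[cut:]


def sliding_windows(records, window, target_index=0):
    """Pair each run of `window` consecutive records with the target
    feature of the record that immediately follows it."""
    inputs, targets = [], []
    for start in range(len(records) - window):
        inputs.append(records[start:start + window])
        targets.append(records[start + window][target_index])
    return inputs, targets


# 20,000 records at a 10 ms interval cover 200 s. Each placeholder row is
# (voltage, current, power, frequency); the values are synthetic.
records = [(float(i), 0.0, 0.0, 50.0) for i in range(20000)]
train, test = chronological_split(records)               # first 160 s / last 40 s
x_train, y_train = sliding_windows(train, window=20)     # window length assumed
```

Splitting before windowing keeps every test window strictly later in time than all training data, which matches the stated goal of verifying temporal prediction capability.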

3.2. Evaluation Metrics

To evaluate the performance of the established model for voltage sag prediction, four metrics were selected: Mean Squared Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²). MSE and RMSE penalize large deviations more heavily, with RMSE expressed in the same unit as the voltage; MAE measures the average magnitude of the prediction error; and R² measures the proportion of the variance in the actual values that is explained by the predictions. The definitions of each metric are given as follows:

\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} {(E_{sp} - E_{sp}^{'})}^{2},

(36)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(E_{sp} - E_{sp}^{'})}^{2}},

(37)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |E_{sp} - E_{sp}^{'}|,

(38)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(E_{sp} - E_{sp}^{'})}^{2}}{\sum_{i = 1}^{n} {(E_{sp} - \bar{E}_{sp})}^{2}}

(39)

where n denotes the number of samples; $E_{sp}$ is the actual value of voltage; $E_{sp}^{'}$ is the predicted value; and $\bar{E}_{sp}$ is the mean value of the actual voltage values.
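The four metrics can be computed together as in the sketch below, which follows the standard definitions of MSE, RMSE, MAE, and R². The function name `regression_metrics` and the sample values are illustrative, not taken from the experiments.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R2 for two equally long sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
    }


# Predicting the mean everywhere gives R2 = 0 by construction.
baseline = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
perfect = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

Note that R² is undefined when all actual values are equal, because the total sum of squares is then zero; the sketch does not guard against that case.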

3.3. Data Preprocessing

There are significant differences in the sensitivity of various power quality characteristic parameters to disturbances during voltage sag events. Specifically, key parameters such as current, power, and frequency exhibit distinct response characteristics in the case of instantaneous voltage drops. For instance, sudden changes in current can directly reflect transient disturbances caused by load variations or short-circuit faults; power fluctuations are capable of revealing load nonlinearity and system energy changes; while frequency deviations may indicate unbalanced operating states at either the generation or load side. These features usually present strong correlation and redundancy in both the time and frequency domains. The same event may trigger similar responses across multiple parameters, resulting in an information overlap effect, which thereby impairs the effective characterization of the core features of voltage sags.

When the feature dimension is excessively high, a large amount of raw information is retained, but redundant features increase the computational complexity of the prediction model and may even degrade its recognition and prediction accuracy. Conversely, an excessively low feature dimension may omit key risk information, affecting the accurate prediction of voltage sags. To address this issue, this paper adopts Kernel Entropy Component Analysis (KECA) for feature mining and dimensionality reduction of the original voltage-related features. The kernel width parameter is set based on common empirical criteria, combined with the statistical distribution and dimensional characteristics of the input data. Moreover, KECA is mainly used for feature dimensionality reduction and redundancy suppression rather than independent prediction modeling in this study, and the overall performance of the model shows low sensitivity to moderate changes in kernel parameters. Through kernel mapping and entropy analysis, KECA can not only eliminate redundant features but also extract core information that is sensitive to voltage sags, providing efficient and representative input features for the CNN-LSTM-based prediction model.

The input features in this study include four key electrical quantities: voltage, current, power, and frequency. Due to their complex dynamic changes and nonlinear correlations during voltage sag events, it is difficult for simple methods to accurately capture their core features. Therefore, the adoption of a sophisticated machine learning framework can more effectively extract key information related to voltage sags, thereby improving prediction accuracy and model robustness.

Unlike Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA), which aim at variance maximization, KECA evaluates the information contribution of feature directions based on second-order Rényi entropy, with a greater focus on characterizing the probability density structure of data. For voltage sag data, transient characteristics are often manifested as abrupt changes in distribution patterns rather than simple variations in amplitude variance. Thus, the entropy criterion has certain advantages in characterizing such non-stationary features. Via kernel mapping, KECA can map the nonlinear dependencies among original features to a high-dimensional space, enabling entropy analysis to measure the complex statistical correlations between multiple variables. In this study, the kernel function and kernel width parameter are set according to the statistical distribution and dimensional characteristics of the input features to ensure the smoothness and effectiveness of the mapping. To compare the feature extraction performance of different dimensionality reduction methods, this paper applies PCA, KPCA, and KECA separately to the original feature data, with the cumulative information retention rate as the comparison metric. The relevant results are presented in Table 4 and Figure 9. This comparison is mainly used to evaluate the capabilities of different methods in feature redundancy suppression and effective information retention, providing a feature-level reference basis for subsequent prediction model performance analysis.
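The difference between the entropy criterion and the variance criterion can be made concrete with the usual KECA formulation, given here as a reminder under the assumption that it matches the notation of the KECA section. The symbols $K$ (the $N \times N$ kernel matrix), $\lambda_{i}$ and $e_{i}$ (its eigenvalues and eigenvectors), and $\mathbf{1}$ (the all-ones vector) are not defined in this section and are introduced only for illustration.

```latex
H_{2}(p) = -\log \int p^{2}(x)\, dx \approx -\log \hat{V}(p), \qquad
\hat{V}(p) = \frac{1}{N^{2}} \mathbf{1}^{T} K \mathbf{1}
           = \frac{1}{N^{2}} \sum_{i = 1}^{N} \left( \sqrt{\lambda_{i}}\, e_{i}^{T} \mathbf{1} \right)^{2}
```

KPCA ranks components by $\lambda_{i}$ alone, whereas KECA ranks them by the whole term $( \sqrt{\lambda_{i}}\, e_{i}^{T} \mathbf{1} )^{2}$, so a component with a large eigenvalue but an eigenvector nearly orthogonal to $\mathbf{1}$ contributes little to the entropy estimate and can be discarded.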

As can be seen from Table 4 and Figure 9, KECA achieves the highest contribution rate on the first principal component (74.08%), compared with 70.55% for KPCA and 65.15% for PCA. This indicates that KECA concentrates more information in a small number of components, so that most of the key features relevant to voltage sags are captured by the first component. In contrast, KPCA and PCA place less information in the first component, resulting in a more dispersed feature representation, and the gap is most pronounced in the cumulative contribution rate of the first three components. Therefore, KECA is more suitable as the feature extraction method for voltage sag prediction, providing more representative input features for subsequent models and thus improving prediction performance.

3.4. Experimental Results and Analysis

To verify the prediction performance of the proposed model, the test data were input separately into the trained models of CNN-LSTM, GWO-CNN-LSTM, NGO-CNN-LSTM, and SCNGO-CNN-LSTM for comparison. The comparison chart of the predictions generated by the four models is shown in Figure 10.

As shown in Figure 10, different optimization algorithms exert varying impacts on the accuracy of prediction models. Compared with other models, the SCNGO-CNN-LSTM model yields predictions closer to the actual values, and it can track the actual values more accurately at turning points with a better fitting degree.

Table 5 presents a performance comparison of the prediction results obtained from different voltage sag prediction models. It can be observed that the SCNGO-CNN-LSTM model achieves lower values of MSE, RMSE, and MAE, as well as a higher R² score, compared with other methods. Thus, the SCNGO-CNN-LSTM model exhibits superior performance in voltage state prediction.
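The relative improvement of the proposed model over each comparison model can be derived from Table 5 with the short calculation below. The metric values are copied from Table 5; the helper names `relative_reduction` and `reductions` are hypothetical.

```python
# MSE, RMSE and MAE values copied from Table 5.
TABLE5 = {
    "SCNGO-CNN-LSTM": {"MSE": 150.2437, "RMSE": 13.4125, "MAE": 1.8301},
    "NGO-CNN-LSTM": {"MSE": 322.8646, "RMSE": 17.9684, "MAE": 9.7516},
    "GWO-CNN-LSTM": {"MSE": 271.5825, "RMSE": 16.4798, "MAE": 7.9707},
    "CNN-LSTM": {"MSE": 452.1439, "RMSE": 21.2637, "MAE": 13.1258},
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


def reductions(metric, proposed="SCNGO-CNN-LSTM"):
    """Reduction of one error metric relative to every other model."""
    reference = TABLE5[proposed][metric]
    return {
        name: relative_reduction(row[metric], reference)
        for name, row in TABLE5.items()
        if name != proposed
    }


for metric in ("MSE", "RMSE", "MAE"):
    for name, value in reductions(metric).items():
        print(f"{metric} reduction vs {name}: {value:.2f}%")
```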

To further verify the model’s response capability in environments with rapid voltage fluctuations, this study conducted 30 repeated single-sample inference tests and a batch prediction experiment with 3999 data points, with the results presented in Table 6. The single-sample inference time of all four models remains in the range of 3–4 ms, well below the sampling period (10 ms). Overall, the SCNGO-CNN-LSTM model achieves a favorable balance between prediction accuracy and real-time inference performance, demonstrating potential for application in scenarios requiring real-time voltage sag prediction and rapid response.
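A repeated single-sample timing test of this kind can be sketched as follows. The function `measure_latency` and the stand-in `dummy_predict` are hypothetical; in the actual experiment the timed call would be the inference step of the trained network, and the measured values depend on the hardware.

```python
import statistics
import time


def measure_latency(predict, sample, repeats=30):
    """Return (mean, standard deviation) of single-sample inference time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings), statistics.stdev(timings)


def dummy_predict(window):
    """Stand-in for a trained model: average of the first feature."""
    return sum(row[0] for row in window) / len(window)


sample = [(1.0, 0.0, 0.0, 50.0)] * 20          # one placeholder input window
mean_ms, std_ms = measure_latency(dummy_predict, sample)
meets_deadline = mean_ms < 10.0                # compare with the 10 ms sampling period
```

Reporting the standard deviation alongside the mean, as Table 6 does, matters for real-time use because occasional slow inferences can miss the sampling deadline even when the average is well below it.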

This study verified the KECA-SCNGO-CNN-LSTM model based on real power monitoring data. The experimental results indicate that this model can effectively extract key features related to voltage sags and exhibits favorable prediction accuracy and robustness on the adopted dataset. Although the scope of experimental verification is currently limited to the existing dataset, the model’s stable modeling capability for features such as voltage, current, power, and frequency lays a foundation for its further verification under more abundant data conditions and different operating scenarios.

4. Discussion

Experimental results demonstrate that the proposed KECA–SCNGO–CNN–LSTM framework can effectively achieve voltage sag prediction based on historical monitoring data. Compared with relevant studies, the method in this paper focuses more on nonlinear feature representation and data-driven modeling, which is consistent with the research hypothesis proposed herein: through reasonable feature extraction methods and deep learning structures, the evolutionary laws of voltage sags can be characterized from time-series data. In terms of feature extraction, KECA is employed to address the prevalent nonlinearity, coupling characteristics, and non-Gaussian distribution features in voltage sag-related signals. By preserving information concentration and density structure, KECA can provide compact and information-rich representations for voltage-related features, which helps improve prediction performance when combined with the CNN–LSTM model. The experimental results are consistent with the conclusions in existing studies that entropy-based methods are suitable for nonlinear signal analysis.

In terms of model optimization, SCNGO is used to optimize the key hyperparameters of the CNN–LSTM model. By focusing on the parameter configurations that most significantly affect prediction performance, this optimization strategy achieves a good balance between prediction accuracy and computational efficiency. The results show that targeted hyperparameter optimization can effectively improve model robustness without introducing excessive computational overhead. The framework has generality and can be extended to higher-dimensional hyperparameter optimization scenarios, but it is still necessary to make a trade-off between computational resources and practical application requirements. This study is mainly aimed at verifying the feasibility of the proposed method.

Future research can be further expanded in the following aspects: first, extend the proposed prediction framework to online monitoring scenarios, and evaluate the feasibility of its real-time application by combining computational complexity and response latency analysis; second, further verify the generalization capability of the model based on more diverse datasets and operating conditions; third, explore the possibility of combining prediction results with voltage regulation or compensation strategies to enhance its practical application value in power quality management.

5. Conclusions

A voltage sag prediction method based on KECA-SCNGO-CNN-LSTM is proposed. Through experimental analysis and model comparison, the following conclusions are drawn:

The proposed voltage sag prediction method based on KECA adopts Kernel Entropy Component Analysis for data dimensionality reduction to eliminate redundant information in high-dimensional data. It can not only extract core features that effectively reflect voltage sags but also maximize the retention of hidden information in nonlinear data, thereby improving the input quality of the prediction model.
The proposed KECA-SCNGO-CNN-LSTM method integrates the dimensionality reduction technology based on Kernel Entropy Component Analysis (KECA) and the SCNGO optimization strategy into the CNN-LSTM framework. Experimental results show that this method outperforms other models in multiple evaluation metrics, achieving high prediction accuracy and robustness. It can effectively extract key features from voltage sag-related data, thus improving the performance of the voltage sag prediction model.

Author Contributions

Conceptualization, L.S. and J.B.; methodology, L.S.; software, L.S.; validation, L.S., J.B. and Y.X.; formal analysis, L.S.; investigation, L.S.; resources, L.S.; data curation, L.S.; writing—original draft preparation, L.S.; writing—review and editing, L.S.; visualization, L.S.; supervision, J.B. and Y.X.; project administration, J.B. and Y.X.; funding acquisition, J.B. and Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Jilin Provincial Department of Science and Technology, China, grant number 20230204093YY.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chowdhury, B.H. Power quality. IEEE Potentials 2002, 20, 5–11.
2. Tang, L.; Han, Y.; Yang, P.; Wang, C.; Zalhaf, A.S. A review of voltage sag control measures and equipment in power systems. Energy Rep. 2022, 8, 207–216.
3. Caicedo, J.E.; Agudelo-Martínez, D.; Rivas-Trujillo, E.; Meyer, J. A systematic review of real-time detection and classification of power quality disturbances. Prot. Control Mod. Power Syst. 2023, 8, 1–37.
4. Han, Y.; Feng, Y.; Yang, P.; Xu, L.; Xu, Y.; Blaabjerg, F. Cause, classification of voltage sag, and voltage sag emulators and applications: A comprehensive overview. IEEE Access 2019, 8, 1922–1934.
5. Lamoree, J.; Mueller, D.; Vinett, P.; Jones, W.; Samotyj, M. Voltage sag analysis case studies. IEEE Trans. Ind. Appl. 2002, 38, 1083–1089.
6. IEEE Std 1159-2019; IEEE Recommended Practice for Monitoring Electric Power Quality. IEEE: New York, NY, USA, 2019.
7. Kuppusamy, R.; Nikolovski, S.; Teekaraman, Y. Review of machine learning techniques for power quality performance evaluation in grid-connected systems. Sustainability 2023, 15, 15055.
8. Porawagamage, G.; Dharmapala, K.; Chaves, J.S.; Villegas, D.; Rajapakse, A. A review of machine learning applications in power system protection and emergency control: Opportunities, challenges, and future directions. Front. Smart Grids 2024, 3, 1371153.
9. Turović, R.; Dragan, D.; Gojić, G.; Petrović, V.B.; Gajić, D.B.; Stanisavljević, A.M.; Katić, V.A. An end-to-end deep learning method for voltage sag classification. Energies 2022, 15, 2898.
10. Samanta, I.S.; Panda, S.; Rout, P.K.; Bajaj, M.; Piecha, M.; Blazek, V.; Prokop, L. A comprehensive review of deep-learning applications to power quality analysis. Energies 2023, 16, 4406.
11. Chen, W.; Hao, X.; Lin, J. Identification of voltage sags in distribution system using wavelet transform and SVM. In Proceedings of the IEEE International Conference on Control and Automation, Guangzhou, China, 30 May–1 June 2007; pp. 1605–1609.
12. Erişti, H.; Demir, Y. A new algorithm for automatic classification of power quality events based on wavelet transform and SVM. Expert Syst. Appl. 2010, 37, 4094–4102.
13. Ravi, T.; Sathish Kumar, K.; Dhanamjayulu, C.; Khan, B. Utilization of Stockwell transform and random forest algorithm for efficient detection and classification of power quality disturbances. J. Electr. Comput. Eng. 2023, 2023, 6615662.
14. Topaloglu, I. Deep learning-based a new approach for power quality disturbances classification in power transmission systems. J. Electr. Eng. Technol. 2023, 18, 77–88.
15. Abbass, M.J.; Lis, R.; Rebizant, W. A predictive model using long short-term memory technique for power system voltage stability. Appl. Sci. 2024, 14, 7279.
16. Aksan, F.; Li, Y.; Suresh, V.; Janik, P. CNN–LSTM vs. LSTM–CNN to predict power flow direction: A case study of the high-voltage subnet of northeast Germany. Sensors 2023, 23, 901.
17. Xue, J.; Ma, J.; Ma, X.; Zhang, L.; Bai, J. Research on Voltage Prediction Using LSTM Neural Networks and Dynamic Voltage Restorers Based on Novel Sliding Mode Variable Structure Control. Energies 2024, 17, 5528.
18. Zhang, B.; Chen, Z.; Zhou, Y.; Wang, W.; Xu, X. Data-driven method for voltage sag consequence state recognition for industrial users. IET Gener. Transm. Distrib. 2025, 19, e70068.
19. Zheng, C.; Dai, S.; Zhang, B.; Li, Q.; Liu, S.; Tang, Y.; Wang, Y.; Wu, Y.; Zhang, Y. A residual voltage data-driven prediction method for voltage sag based on data fusion. Symmetry 2022, 14, 1272.
20. Akdeniz, M.; Özer, İ.; Efe, S.B. Deep learning based prediction of power quality disturbances in distribution networks. Int. J. Energy Smart Grid 2025, 10, 79–88.
21. Garcia, C.I.; Grasso, F.; Luchetta, A.; Piccirilli, M.C.; Paolucci, L.; Talluri, G. A comparison of power quality disturbance detection methods using CNN, LSTM and CNN–LSTM. Appl. Sci. 2020, 10, 6755.
22. Liu, J.; He, Q.; Yue, Z.; Pei, Y. A hybrid strategy-improved SSA–CNN–LSTM model for metro passenger flow forecasting. Mathematics 2024, 12, 3929.
23. Yongguang, M.; Yongsheng, F. Research on fault warning method of wind turbine gearbox. Acta Energiae Solaris Sin. 2023, 44, 67–73.
24. Schölkopf, B.; Smola, A.; Müller, K.R. Kernel principal component analysis. In Proceedings of the International Conference on Artificial Neural Networks, Lausanne, Switzerland, 8–10 October 1997; pp. 583–588.
25. Xipei, M.A.; Lei, Z.; Sun, Y.Z. Comparison of kernel entropy component analysis with several dimensionality reduction methods. J. Donghua Univ. 2017, 34, 4.
26. Gómez-Chova, L.; Jenssen, R.; Camps-Valls, G. Kernel entropy component analysis in remote sensing data clustering. In Proceedings of the IEEE IGARSS, Vancouver, BC, Canada, 24–29 July 2011; pp. 3728–3731.
27. Zhou, H.; Shi, T.; Liao, G.; Xuan, J.; Duan, J.; Su, L.; He, Z.; Lai, W. Weighted kernel entropy component analysis for fault diagnosis of rolling bearings. Sensors 2017, 17, 625.
28. Tu, B.; Zhou, C.; Peng, J.; He, W.; Ou, X.; Xu, Z. Kernel entropy component analysis-based robust hyperspectral image supervised classification. Remote Sens. 2019, 11, 2823.
29. Dehghani, M.; Hubálovský, Š.; Trojovský, P. Northern goshawk optimization. IEEE Access 2021, 9, 162059–162080.
30. Gopi, S.; Mohapatra, P. Fast random opposition-based learning Aquila optimization algorithm. Heliyon 2024, 10, e26187.
31. Shao, P.; Yang, L.; Tan, L.; Li, G.; Peng, H. Enhancing artificial bee colony algorithm using refraction principle. Soft Comput. 2020, 24, 15291–15306.
32. Li, C.; Liang, K.; Chen, Y.; Pan, M. Exploitation-boosted sine cosine algorithm. Eng. Appl. Artif. Intell. 2023, 117, 105620.
33. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
34. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
35. Wu, H.; Xu, Z.; Wang, M.; Jia, Y. Full-model-free adaptive graph DDPG for voltage control. J. Mod. Power Syst. Clean Energy 2024, 12, 1893–1904.
36. Yao, G.; Zhang, N.; Duan, Z.; Tian, C. Improved SARSA and DQN algorithms. Theor. Comput. Sci. 2025, 1027, 115025.

Figure 1. The flowchart of the KECA algorithm.

Figure 2. LSTM Neural Network.

Figure 3. The prediction workflow of the CNN-LSTM.

Figure 4. Comparison of Convergence Curves of Test Functions and Four Algorithms.

Figure 5. SCNGO Algorithm Convergence Curve.

Figure 6. Fitness Distribution Map.

Figure 7. Training Results.

Figure 8. KECA-SCNGO-CNN-LSTM-Based Voltage Prediction Flowchart.

Figure 9. Pareto Chart of Contribution Rates.

Figure 10. Prediction results of different models.

Table 1. Benchmark test functions.

Function	Expression	Dimension	Range	Minimum
Sphere Function	$f_{1} (x) = \sum_{i = 1}^{n} x_{i}^{2}$	30	[−100, 100]	0
Schwefel’s Problem 1.2	$f_{3} (x) = \sum_{i = 1}^{n} {(\sum_{j = 1}^{i} x_{j})}^{2}$	30	[−100, 100]	0
Schwefel’s Problem 2.21	$f_{4} (x) = \max_{i} \{ |x_{i}|, 1 \leq i \leq n \}$	30	[−100, 100]	0
Generalized Schwefel’s Problem 2.26	$f_{8} (x) = - \sum_{i = 1}^{n} x_{i} \sin (\sqrt{|x_{i}|})$	30	[−500, 500]	−12,569.5
Generalized Rastrigin’s Function	$f_{9} (x) = \sum_{i = 1}^{n} [x_{i}^{2} - 10 \cos (2 \pi x_{i}) + 10]$	30	[−5.12, 5.12]	0
Generalized Griewank’s Function	$f_{11} (x) = \frac{1}{4000} \sum_{i = 1}^{n} x_{i}^{2} - \prod_{i = 1}^{n} \cos \frac{x_{i}}{\sqrt{i}} + 1$	30	[−600, 600]	0
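For reference, the six benchmark functions of Table 1 can be written in code as below, using their standard definitions. The function names are illustrative, and the point 420.9687 used in the example is the well-known per-coordinate minimizer of Schwefel's Problem 2.26 rather than a value taken from this article.

```python
import math


def sphere(x):
    return sum(v * v for v in x)


def schwefel_1_2(x):
    """Sum of squared prefix sums."""
    total, running = 0.0, 0.0
    for v in x:
        running += v
        total += running ** 2
    return total


def schwefel_2_21(x):
    return max(abs(v) for v in x)


def schwefel_2_26(x):
    return -sum(v * math.sin(math.sqrt(abs(v))) for v in x)


def rastrigin(x):
    return sum(v * v - 10 * math.cos(2 * math.pi * v) + 10 for v in x)


def griewank(x):
    quadratic = sum(v * v for v in x) / 4000
    product = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return quadratic - product + 1


origin = [0.0] * 30                      # minimizer of all functions except f8
f8_min = schwefel_2_26([420.9687] * 30)  # close to the 30-dimensional minimum
```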

Table 2. The statistical results of SCNGO and other algorithms.

Function	Algorithm	Best Value	Worst Value	Mean Value	Standard Deviation
f₁	SCNGO	1.6559 × 10⁻²⁰⁶	1.4509 × 10⁻¹⁷⁰	4.8379 × 10⁻¹⁷²	0.0000 × 10⁰⁰
	NGO	4.7151 × 10⁻⁵³	2.2351 × 10⁻⁵⁰	5.9192 × 10⁻⁵¹	6.2447 × 10⁻⁵¹
	GWO	2.5801 × 10⁻¹⁶	2.2977 × 10⁻¹⁴	4.4533 × 10⁻¹⁵	4.3082 × 10⁻¹⁵
	WOA	1.4368 × 10⁻⁵³	2.4804 × 10⁻⁴²	9.7348 × 10⁻⁴⁴	4.5149 × 10⁻⁴³
f₃	SCNGO	6.6322 × 10⁻¹⁵⁶	2.7398 × 10⁻¹¹⁵	9.1343 × 10⁻¹¹⁷	5.0022 × 10⁻¹¹⁶
	NGO	1.9058 × 10⁻¹⁵	2.7078 × 10⁻¹⁰	1.3249 × 10⁻¹¹	4.9742 × 10⁻¹¹
	GWO	4.5745 × 10⁻⁴	5.3635 × 10⁻¹	5.9710 × 10⁻²	1.2462 × 10⁻¹
	WOA	3.6795 × 10⁴	9.6152 × 10⁴	6.9031 × 10⁴	1.5086 × 10⁴
f₄	SCNGO	3.3043 × 10⁻¹⁰²	5.1806 × 10⁻⁸⁴	2.4450 × 10⁻⁸⁵	9.9807 × 10⁻⁸⁵
	NGO	9.1344 × 10⁻²³	1.5951 × 10⁻²¹	5.8853 × 10⁻²²	4.1087 × 10⁻²²
	GWO	6.4135 × 10⁻⁵	5.1818 × 10⁻³	1.0015 × 10⁻³	9.3749 × 10⁻⁴
	WOA	4.2332 × 10⁰⁰	8.8001 × 10¹	5.5331 × 10¹	2.6901 × 10¹
f₈	SCNGO	−12,569.4803	−12,212.2835	−12,548.0041	74.8505
	NGO	−8858.0190	−6658.1234	−7567.8199	525.6237
	GWO	−7437.7182	−3267.8564	−6062.0732	837.7509
	WOA	−12,568.1147	−6679.8754	−10,801.2927	1825.0107
f₉	SCNGO	0.000000	0.000000	0.000000	0.000000
	NGO	0.000000	0.000000	0.000000	0.000000
	GWO	0.000000	17.421495	3.622879	4.584073
	WOA	0.000000	0.000000	0.000000	0.000000
f₁₁	SCNGO	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰
	NGO	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰
	GWO	0.000000 × 10⁰⁰	4.154980 × 10⁻²	4.712644 × 10⁻³	1.059956 × 10⁻²
	WOA	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰	0.000000 × 10⁰⁰

Table 3. Optimized Parameter Settings.

Parameter	Function	Search Range	Optimum
LSTM hidden units	Control memory capacity	8–64	64
Initial Learning Rate	Control training speed	0.0001–0.005	0.004978
Dropout	Control overfitting	0.05–0.5	0.469

Table 4. Contribution Rate of Each Kernel Entropy Component.

Principal Component	PCA (%)	KPCA (%)	KECA (%)
1	65.15	70.55	74.08
2	19.28	13.18	16.69
3	2.16	6.75	9.86
4	0.24	1.32	1.42

Table 5. Comparison of prediction results of different models.

Model	MSE	RMSE	MAE	R²
SCNGO-CNN-LSTM	150.2437	13.4125	1.8301	98.81%
NGO-CNN-LSTM	322.8646	17.9684	9.7516	97.33%
GWO-CNN-LSTM	271.5825	16.4798	7.9707	97.75%
CNN-LSTM	452.1439	21.2637	13.1258	96.26%

Table 6. Comparison of inference time among different models.

Model	Average Single-Sample Inference Time (ms)	Inference Time Standard Deviation (ms)	Batch Average Inference Time (ms/Sample)
SCNGO-CNN-LSTM	3.026	0.459	0.0345
NGO-CNN-LSTM	3.919	0.380	0.0241
GWO-CNN-LSTM	3.058	0.445	0.0284
CNN-LSTM	3.028	0.481	0.0226

Share and Cite

MDPI and ACS Style

Sun, L.; Xu, Y.; Bai, J. SCNGO-CNN-LSTM-Based Voltage Sag Prediction Method for Power Systems. Energies 2026, 19, 428. https://doi.org/10.3390/en19020428
