Article

Analysis of Operating Regimes and THD Forecasting in Steelmaking Plant Power Systems Using Advanced Neural Architectures

Department of Electrical Engineering and Industrial Informatics, Politehnica University of Timisoara, 300006 Timișoara, Romania
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(22), 3692; https://doi.org/10.3390/math13223692
Submission received: 7 October 2025 / Revised: 9 November 2025 / Accepted: 12 November 2025 / Published: 18 November 2025
(This article belongs to the Section E: Applied Mathematics)

Abstract

This study provides a comprehensive analysis of power quality in industrial rolling mill grids, focusing on total harmonic distortion (THD) and its forecasting under different operational conditions. The research begins with a measurement-based evaluation of load variations and of the effects of reactive power compensation using capacitor banks. Building on these results, forecasting algorithms were developed using modern data-driven methods capable of capturing both short-term and long-term dependencies within the THD signal. The models were evaluated under three forecasting strategies: classical prediction on test data, autoregressive one-step forecasting, and direct multi-step forecasting, using well-established error and correlation indices: RMSE, MAE, sMAPE, the coefficient of determination (R2), and the Pearson correlation coefficient (ρ). The results indicate that models incorporating both local feature extraction and temporal dynamics provide the most accurate forecasts. In particular, the hybrid convolutional-recurrent structure achieved the best overall performance, with R2 = 0.923 and ρ = 0.961 in classical prediction, and it was the only approach to maintain a positive R2 (0.285) in multi-step forecasting. These results demonstrate the usefulness of modern predictive modeling of THD in industrial grids, complementing conventional measurement-based techniques and offering relevant insights for power quality monitoring and control.

1. Introduction

Industrial power systems increasingly face problems related to voltage stability, current harmonic distortion, and generally deteriorating power quality. These issues are mainly due to nonlinear electrical loads, constantly changing operating conditions, and the growing demand for energy efficiency. This is especially true for high-power industrial consumers such as metallurgical plants, where electric arc furnaces, ladle furnaces, and hot rolling mills operate. All of these are nonlinear loads based on high-power electrical drives with rapidly fluctuating, transient demand that produces significant harmonic distortion [1,2].
Total harmonic distortion (THD) is a key measure of power quality, influencing equipment reliability, energy efficiency, and compliance with international standards [3,4]. Passive filters and capacitor banks for reactive power compensation are two methods that have been the focus of several studies aimed at reducing harmonic distortion [5,6]. Although these systems function successfully when the load is constant or predictable, they fail when dealing with the frequent and unexpected changes in operating circumstances that are typical of modern industrial settings, like steel rolling mills. In light of this weakness, it is clear that better prediction and control systems are required to detect THD changes before they cause serious issues [7]. Recently, deep learning (DL) and artificial intelligence (AI) have gained recognition as powerful tools for simulating dynamic and nonlinear processes in power grids. For time series forecasting problems, sequential architectures like LSTM, Bidirectional LSTM (BiLSTM), and Gated Recurrent Units (GRU) have shown promise [8,9]. This is due to their ability to accurately capture non-stationary dynamics and long-term temporal dependencies. At the same time, Convolutional Neural Networks (CNN) and deep hybrid models have been successfully applied to extract high-level spatial and temporal features from multidimensional power data, improving both accuracy and generalization in complex industrial scenarios [7]. More recently, attention-based architectures and transformer networks have been shown to improve forecast accuracy by dynamically weighting temporal relationships, thereby capturing context-dependent patterns in power quality signals [8,10].
Building on this research, this paper investigates the measurement-based prediction of Total Harmonic Distortion (THD) in a power system for an industrial rolling mill operating in different configurations—both with and without reactive power compensation. The paper compares the prediction performance of several deep neural architectures in three prediction methods (classical, autoregressive, and multi-step). By analyzing the advantages and limitations of each model, the study presents the potential of modern hybrid and attention-based networks for improving predictive monitoring and resilient operation in industrial power systems.
The present study uses real data measured in the electrical network of a hot rolling mill, thus providing an authentic experimental basis for evaluating the behavior of a reactive power compensation system. Using this real data, the paper presents an extensive comparative analysis of the performances obtained by eight modern deep neural architectures (including various combinations of RNN/LSTM/GRU and hybrid models), evaluated within three distinct forecasting methods: (i) classical direct forecasting (one step ahead); (ii) iterative autoregressive forecasting (using the model outputs as input for the next steps); and (iii) multi-step forecasting over an extended time horizon. Experimental results highlight that the CNN–GRU–LSTM hybrid architecture proposed in this study is the only model among those tested that manages to maintain a positive coefficient of determination (R2) under the difficult conditions of multi-step forecasting, demonstrating a superior long-term generalization capacity. The study also comparatively evaluates the accuracy of the forecasting models in two operating regimes, highlighting how the compensation regime influences the THD signal variability and the difficulty of the prediction task. By combining these innovative elements, the work provides useful information both on the evaluation of the efficiency of reactive compensation and on the selection of the optimal neural architecture for applications of this type.

2. Literature Review

Recent advances in power quality assessment have highlighted the importance of harmonic distortions, especially total harmonic distortion (THD), in industrial systems such as rolling mills and their relationship to operating conditions. At the same time, the development of predictive modeling methods has allowed their use for accurate THD forecasting under complex and dynamic load conditions.
Numerous studies have investigated harmonic mitigation strategies and THD behavior in real environments, emphasizing the importance of controlling nonlinear loads and fluctuating operating conditions. Other research efforts have focused on evaluating the influence of variable operating conditions on harmonic generation, which has led to the development of methodologies for early detection and management of power quality disturbances.
This literature review organizes recent contributions into four categories: (i) research aimed at general power quality assessment and disturbance classification, (ii) research aimed at harmonic and THD analysis and mitigation in industrial systems, (iii) assessment of operating regimes and their impact on power quality, and (iv) THD forecasting using advanced predictive models. This systematic approach provides a step-by-step presentation, from basic power quality criteria to the application of predictive algorithms in THD prediction in rolling mills.

2.1. General Power Quality Assessment and Disturbance Classification

A novel methodology for power disturbance classification proposed in [11] uses advanced feature extraction techniques that improve the automatic identification of power quality indicators, such as voltage sags, surges, and harmonics. The study in [10] presents an automated machine learning approach that introduces AutoML frameworks for time series data, with the aim of optimizing the classification process without extensive manual parameter tuning.
Decision rule-based machine learning techniques used in [12] have demonstrated efficient methods for recognizing multiple categories of power anomalies. The approach presented in [7] employed deep learning with recurrence graphs, enabling neural networks to interpret visualized time-series patterns for power quality event detection.
Advanced data detection and compression techniques proposed in [13] enable rapid detection of disturbances while reducing data volume. The investigation of the severity of voltage flicker caused by even-order harmonics in [3] highlighted the specific influence of lower-order harmonic components on human-perceptible flicker effects.
The study presented in [14] introduced optimization-based classifiers, including an extreme learning machine optimized using particle swarm techniques. In [15], the Arrhenius artificial bee colony feature selector was used, highlighting the role of intelligent algorithms in improving classification accuracy. The hybrid approaches presented in [16], combining different machine learning models, have demonstrated enhanced robustness for disturbance detection tasks.

2.2. Analysis and Mitigation of Harmonics and THD

Total harmonic distortion (THD) mitigation is a major challenge in high-power electrical systems such as rolling mills and electric arc furnaces (EAFs). Much research has been conducted to address this issue. A review presented in [4] found that passive filter installations significantly reduce voltage distortion levels in a steel mill, increasing power quality. Harmonic measurements performed in a modern steel production facility and presented in the research [1] provided useful information about harmonic distortion from an EAF. The harmonic control solutions described in article [2] and used in the Yili rolling mill demonstrated the need for adaptive measures.
Advanced prediction algorithms were also investigated. In the research presented in article [17], a hybrid model based on LSTM and ANFIS was presented, which used intelligent methods to predict harmonic distortion in renewable energy systems.
The study presented in [5] proposed the use of artificial sequential neural networks (ANN) to predict total harmonic distortion (THD) at unmonitored buses in large-scale uncertain transmission networks, using offline measurements and simulations. In the research presented in [6], in the context of highly nonlinear loads such as electric arc furnaces, digital thyristor-controlled compensators (TSCs) are presented to dynamically attenuate harmonics using predictive models.
The research presented in [18] introduced an adaptive neuro-fuzzy system for current prediction in electric arc furnaces, demonstrating the effectiveness of hybrid intelligent models in representing nonlinear load dynamics and forecasting current-related distortions under variable operating conditions.
Voltage flicker issues caused by lower-order harmonics were investigated in [3], highlighting the importance of harmonic management in reducing negative consequences.
The paper [19] proposes a universal converter/STATCOM to compensate for imbalance, flicker, reactive power, and load harmonics.
The article [20] provides an extensive review of selective harmonic elimination and mitigation (SHE/SHM) techniques in DC–AC converters, highlighting advances in PWM strategies, optimization methods, and control structures for reducing THD and switching losses in medium- and high-power applications.
The research presented in [21] provides an overview of the multipulse connection circuits of the main AC electrical networks (RED) of rolling mills and identifies solutions capable of improving their electromagnetic compatibility with the network. The results obtained can be used by researchers and engineers to provide electromagnetic compatibility for nonlinear consumers in similar circuits as well as to design them.

2.3. Forecasting THD and Power Quality Indicators Using Modern Neural Networks

Advanced neural network architectures significantly improve the prediction of total harmonic distortion (THD) and other power quality indicators.
Sequential artificial neural networks proposed in [5] demonstrated the ability to estimate THD in poorly monitored networks, showcasing the potential of deep learning to overcome limited measurement availability. The study presented in [8] proposes an ultra-short-term prediction model for photovoltaic energy using a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and a Bidirectional Long Short-Term Memory (BiLSTM) network. Experimental results indicate that training the prediction model with augmented samples improves the prediction accuracy.
A unified SCBLSTM + DA data model was proposed in [22] to improve the long-term accuracy of WSPat at the target location. The proposed WindForecastX model integrates the CNN-based and BiLSTM-based superimposed ensemble with DAM. The WindForecastX results demonstrate that SCBLSTM + DA significantly outperforms other independent models. Hybrid techniques that combine convolutional and recurrent layers, as presented in [7], have demonstrated strong robustness in capturing nonlinearities and temporal dependencies, both essential for modeling harmonic behavior. In addition, the multi-step prediction strategies proposed in [23] introduced adaptive input selection methods that enhance forecasting performance as more data become available.
The study presented in [9] proposed a photovoltaic power generation prediction model based on parallel bidirectional long short-term memory (BiLSTM) networks. The proposed method combines three BiLSTMs and a deep neural network (DNN) for the prediction of photovoltaic power generation.
The study presented in [24] introduced a sparse transformer-based approach for predicting electricity consumption time series. The proposed method achieves an accuracy comparable to the current RNN-based method, TSRNN, but with a much higher speed.
These studies offer frameworks that are both flexible and scalable, making them suitable for industrial THD forecasting.

3. Materials and Methods

Data Acquisition

A power quality study was conducted on the power supply system of a hot rolling mill within a metallurgical production unit. The power supply system includes several electrical substations connected to the national grid through step-down transformers with a nominal voltage of 110/6 kV. The measurements were made on a section supplying the roughing mill. The structure of this power distribution system is presented in Figure 1; the diagram illustrates the interconnection between the national grid and the internal substations. Station ST1, which supplies the electrical drives of the rolling mill, served as the main observation point, with the power quality measurements carried out on feeder lines 1 and 2.
A three-phase power analyzer was used to record time-synchronized power quality indicators under different operating conditions. These included the root mean square (RMS) values of the phase (U1, U2, U3) and line (V1, V2, V3) voltages, the corresponding total harmonic distortion (THD), active and reactive power, power factor (PF), displacement power factor (DPF), and V/H totals.
The measurements were made in two distinct configurations: with and without the capacitor bank (used for reactive power compensation) in operation. This approach allows for the direct evaluation of the effect of the capacitor bank on harmonic distortion and power factor correction. Therefore, the recorded data are useful for modeling the behavior of power quality under real conditions and are also useful for forecasting using advanced neural architectures.
The results are illustrated in Figure 2, Figure 3, Figure 4 and Figure 5 and show the power quality indicators for both operating modes: with and without the capacitor bank in operation.
Figure 2a shows the variation of the RMS current in the absence of the capacitor bank. Significant fluctuations in the current value are observed, frequently reaching peaks of over 650 A, indicating a dynamic load and instabilities in the operating modes. The shape of the graph suggests sudden variations, possibly associated with sudden starts or stops of large equipment.
Figure 2b represents the same measurement of the RMS current, but in the presence of a capacitor bank. Compared to the previous figure, a slight reduction in the maximum peaks is observed, but especially a stabilization of the minimum values and an attenuation of the rapid oscillations. This suggests an improvement in the current variation due to the reactive compensation introduced by the capacitors, which reduces the imbalances and load peaks.
Figure 3a illustrates the THD (Total Harmonic Distortion) of the current in the absence of the capacitor bank. Generally, the values remain within 2.5–3%, but anomalous peaks of around 7–8.5% are also observed, indicating strong distortions at isolated moments. These values can affect sensitive equipment and may indicate a disturbed operating regime.
Figure 3b shows the same THD of the current, but with the capacitor bank connected. Although the general structure is very similar to that in Figure 3a (including the peaks), the interpretation must be nuanced: the fact that the capacitor bank does not significantly reduce the THD suggests that the main sources of harmonics are the nonlinear loads, whereas reactive compensation is not, in this case, an effective mechanism for their attenuation.
Figure 4a shows the evolution of the active power without compensation. A pulsating variation is observed, with frequent values between 25 and 65 kW. The calculated active energy is 11,853.29 Wh, reflecting the total consumption during this time interval. This high consumption is typical for a high energy-consuming industrial process such as a rolling mill. Figure 4b shows the active power with the capacitor bank. It can be seen that the curves are similar in amplitude and shape to those in Figure 4a. However, the total active energy has decreased to 10,821.19 Wh, i.e., a saving of approximately 8.7%. This observation demonstrates that reactive compensation reduces losses and optimizes the power flow.
Figure 5a shows the evolution of the reactive power in the absence of the capacitor bank. It is observed that the variation is pulsating, with peaks reaching and exceeding 45 kVAr. This indicates the presence of a significant reactive component in the supplied loads, which is due to induction motors and equipment with coils or transformers in the no-load state. The total reactive energy is 10,271.4 VArh, a significantly higher value than that observed in Figure 5b (with compensation), where it was 7826.92 VArh. This difference of almost 2444.5 VArh (approximately 24%) confirms the efficiency of the reactive compensation achieved by the capacitor bank.

4. THD Forecasting Using Deep Learning Models

4.1. Motivation and Modeling Strategy

Accurate prediction of total harmonic distortion (THD) is necessary for real-time monitoring and predictive control of power quality in factories like hot rolling mills. Given the temporal dynamics of THD fluctuations caused by load variations, nonlinear equipment, and reactive power flows, deep learning models with temporal processing abilities are suitable. Several neural network topologies were designed and tested for THD prediction using measurements from the power supply of the rolling mill. The models vary from simple recurrent architectures like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) to more complex hybrid neural networks that include CNN, LSTM, and GRU layers. Each model is evaluated in terms of predictive accuracy, robustness, and computing efficiency.
In this article, all neural network models employed—from simple feedforward networks (FNN) to recurrent networks (LSTM, GRU, BiLSTM) and hybrid networks (CNN-LSTM, GRU-LSTM)—were evaluated using three basic prediction methodologies. The first method, single-step parallel prediction, involves estimating each future step using just real data as input, resulting in an optimistic estimate of the model’s performance. The single-step autoregressive prediction method uses previously predicted values as input for subsequent estimates, replicating a realistic scenario where the model must self-feed its input sequence. The third method, multi-step horizon prediction, involves creating the complete future sequence (e.g., 10 or more steps) from only a window of known data, with no access to intermediate real values. This approach has significance for validating the robustness of the model in short- and medium-term forecasting scenarios, especially in situations where real data are not available in real time. The systematic use of these three methodologies enabled a rigorous comparative investigation of each model’s capacity to learn the temporal structure of THD signals.
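To make the three strategies concrete, the following minimal Python sketch (not the code used in this study) illustrates how classical one-step, autoregressive one-step, and direct multi-step prediction differ in how input windows are assembled; the `model` callable and data shapes are assumptions for illustration.

```python
import numpy as np

def evaluate_strategies(model, series, window=200, horizon=10):
    """Illustrative sketch of the three forecasting strategies; `model`
    is assumed to map a window of past samples to the next value."""
    # (1) Classical one-step prediction: every input window contains
    # only measured data, so errors cannot propagate.
    classical = [model(series[t - window:t])
                 for t in range(window, window + horizon)]

    # (2) Autoregressive one-step prediction: each prediction is appended
    # to the history and reused as input for the next step.
    history = list(series[:window])
    autoregressive = []
    for _ in range(horizon):
        y_hat = model(np.asarray(history[-window:]))
        autoregressive.append(y_hat)
        history.append(y_hat)

    # (3) Direct multi-step prediction would instead use a multi-output
    # model emitting the whole horizon at once from the last known window:
    # multi_step = multi_output_model(series[:window])  # shape (horizon,)

    return np.array(classical), np.array(autoregressive)
```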
The input window length was set to 200 samples, determined empirically to balance model accuracy and computational efficiency. This duration captures the dominant transient behavior observed in the measured THD, current, and power signals. Sensitivity tests with varying window sizes (100–400 samples) confirmed that shorter windows reduced accuracy by omitting transient dynamics, whereas longer windows yielded marginal gains at the cost of higher complexity. A feature relevance analysis, performed by selectively omitting input parameters, showed that current and reactive power had the highest influence on THD prediction, followed by active power and voltage, confirming the physical consistency of the selected inputs.
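As a minimal sketch (assuming a one-dimensional input signal), the 200-sample windows described above can be built as follows; the function name is illustrative.

```python
import numpy as np

def make_windows(signal, window=200):
    """Build (input window, next value) pairs from a 1D series; the
    200-sample length follows the empirical choice reported above."""
    X = np.stack([signal[t:t + window] for t in range(len(signal) - window)])
    y = signal[window:]
    return X, y
```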
All neural network architectures implemented in this paper (CNN, LSTM, GRU, BiLSTM, CNN–GRU–LSTM, and transformer-based models) were trained and tested under identical experimental conditions to ensure a fair comparison of their performance. The main hyperparameters, namely the number of training epochs (600), the initial learning rate (0.008), and the mini-batch size (128), were the same for all models. In addition, the networks were designed with a comparable structural depth and a similar number of hidden units (ranging from 60 to 100 per recurrent layer).
Of the measured data set, 75% was used for training and 25% for testing, so the models were evaluated on data sequences not used for training. Given the sequential nature of the THD time series data, k-fold randomized cross-validation was not appropriate, as it would disrupt the temporal dependencies of the signals.
We used rolling (walk-forward) validation, in which the model was iteratively retrained and tested on segments moved forward in the time series. This method provides a satisfactory assessment of the stability of the prediction. This approach follows the standard practice in time-series forecasting, where rolling forecast validation is used to ensure robust performance evaluation under temporal dependency.
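A minimal sketch of such a rolling (walk-forward) split is given below; the number of folds and segment sizes are illustrative assumptions, not the exact configuration used here.

```python
import numpy as np

def walk_forward_splits(n_samples, n_folds=4):
    """Yield chronologically ordered (train, test) index arrays: the model
    is retrained on all data up to a point and tested on the segment that
    immediately follows, preserving temporal dependencies."""
    fold_len = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_idx = np.arange(0, k * fold_len)                   # expanding window
        test_idx = np.arange(k * fold_len, (k + 1) * fold_len)   # next segment
        yield train_idx, test_idx
```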

4.2. Feedforward Neural Networks (FNN)

Feedforward neural networks (FNN) are the simplest form of deep learning models [25]. For a network with one hidden layer, let $x \in \mathbb{R}^{n}$ be the input vector of features, $W^{[1]} \in \mathbb{R}^{m \times n}$ the weight matrix of the first layer, $b^{[1]} \in \mathbb{R}^{m}$ the bias vector, and $\sigma$ an element-wise activation function. The hidden representation $h$ and the output $y$ are then computed as presented in Equation (1) [25].

$$h = \sigma\left(W^{[1]} x + b^{[1]}\right), \qquad y = W^{[2]} h + b^{[2]} \qquad (1)$$

Here, $W^{[2]}$ and $b^{[2]}$ represent the weights and bias of the output layer.
The network is trained by minimizing a loss function such as the mean squared error (MSE):

$$\mathcal{L}\left(y, \hat{y}\right) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 \qquad (2)$$

where $y_i$ is the true output and $\hat{y}_i$ the predicted output.
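Equations (1) and (2) can be summarized in a few lines of NumPy; the layer sizes and the choice of tanh as the activation $\sigma$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 64          # window length and hidden units; assumed values

W1, b1 = rng.normal(scale=0.1, size=(m, n)), np.zeros(m)   # first-layer weights/bias
W2, b2 = rng.normal(scale=0.1, size=(1, m)), np.zeros(1)   # output-layer weights/bias

def fnn_forward(x):
    """Equation (1): h = sigma(W1 x + b1), y = W2 h + b2 (sigma = tanh here)."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def mse(y, y_hat):
    """Equation (2): mean squared error loss."""
    return np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)
```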

4.3. Convolutional Neural Networks (CNN)

CNNs apply convolution operations to extract spatial features [26]. For a 1D CNN, given an input signal $x = \left(x_1, x_2, \ldots, x_T\right)$, where $T$ is the sequence length, and a kernel $w \in \mathbb{R}^{k}$, the convolution output at time $t$ is presented in Equation (3) [27]:

$$z_t = \sum_{i=0}^{k-1} w_i \cdot x_{t+i} \qquad (3)$$

Here, $w_i$ are the filter parameters; the outputs are then passed through nonlinear activation functions (such as ReLU) and pooling layers to reduce dimensionality.
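A direct NumPy transcription of Equation (3), followed by the ReLU and pooling steps mentioned above, might look as follows (a sketch, not an optimized implementation):

```python
import numpy as np

def conv1d(x, w):
    """Equation (3): valid 1D convolution (cross-correlation form),
    z_t = sum_{i=0}^{k-1} w_i * x_{t+i}."""
    k = len(w)
    return np.array([np.dot(w, x[t:t + k]) for t in range(len(x) - k + 1)])

def relu(z):
    """Element-wise nonlinearity applied to the convolution outputs."""
    return np.maximum(z, 0.0)

def max_pool(z, size=2):
    """Non-overlapping max pooling to reduce dimensionality."""
    trimmed = z[: len(z) // size * size].reshape(-1, size)
    return trimmed.max(axis=1)
```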

4.4. Recurrent Models

4.4.1. Long Short-Term Memory (LSTM)

LSTM networks are a type of recurrent neural network (RNN) designed to learn long-term dependencies [25]. Each LSTM cell maintains a cell state $c_t$ and a hidden state $h_t$. The update process involves several gates [28]:
  • Forget gate $f_t$: decides what information to discard from the cell state.
  • Input gate $i_t$: determines which new information to add.
  • Candidate cell state $\tilde{c}_t$: proposes new values to be added to the cell.
  • Cell state update $c_t$: combines the old state and the new candidate.
  • Output gate $o_t$: decides what part of the cell state to output.
The LSTM updates are given by Equation (4).

$$\begin{aligned} f_t &= \sigma\left(W_f \cdot \left[h_{t-1}, x_t\right] + b_f\right)\\ i_t &= \sigma\left(W_i \cdot \left[h_{t-1}, x_t\right] + b_i\right)\\ \tilde{c}_t &= \tanh\left(W_c \cdot \left[h_{t-1}, x_t\right] + b_c\right)\\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\ o_t &= \sigma\left(W_o \cdot \left[h_{t-1}, x_t\right] + b_o\right)\\ h_t &= o_t \odot \tanh\left(c_t\right) \end{aligned} \qquad (4)$$
This sequence of operations allows the LSTM to selectively retain, update, and output information at each time step, effectively mitigating the vanishing gradient problem and enabling long-term memory.
where:
  • $x_t$ is the input at time $t$;
  • $h_{t-1}$ is the hidden state from the previous time step;
  • $\sigma$ is the sigmoid activation function;
  • $\odot$ denotes element-wise multiplication;
  • $f_t$, $i_t$, $o_t$ are the forget, input, and output gates.
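For illustration, a single LSTM update following Equation (4) can be written as below; stacking the four gate pre-activations into one weight matrix `W` is a common implementation choice and an assumption here.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update following Equation (4); W maps [h_{t-1}, x_t] to the
    stacked pre-activations of the four gates (shapes are assumptions)."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    d = len(h_prev)
    f_t = sigmoid(z[0:d])               # forget gate
    i_t = sigmoid(z[d:2 * d])           # input gate
    c_tilde = np.tanh(z[2 * d:3 * d])   # candidate cell state
    o_t = sigmoid(z[3 * d:4 * d])       # output gate
    c_t = f_t * c_prev + i_t * c_tilde  # cell state update
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t
```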

4.4.2. Gated Recurrent Unit (GRU)

GRUs are a simplified variant of LSTMs with fewer gates, as presented in Equation (5) [29].

$$\begin{aligned} z_t &= \sigma\left(W_z \cdot \left[h_{t-1}, x_t\right] + b_z\right)\\ r_t &= \sigma\left(W_r \cdot \left[h_{t-1}, x_t\right] + b_r\right)\\ \tilde{h}_t &= \tanh\left(W_h \cdot \left[r_t \odot h_{t-1}, x_t\right] + b_h\right)\\ h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \end{aligned} \qquad (5)$$

where:
  • $z_t$ is the update gate;
  • $r_t$ is the reset gate;
  • $\tilde{h}_t$ is the candidate activation;
  • $h_t$ is the final hidden state.
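A corresponding sketch of one GRU update (Equation (5)) is given below; the separate weight matrices per gate mirror the notation above, and the shapes are assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, bz, Wr, br, Wh, bh):
    """One GRU update following Equation (5)."""
    z_t = sigmoid(Wz @ np.concatenate([h_prev, x_t]) + bz)  # update gate
    r_t = sigmoid(Wr @ np.concatenate([h_prev, x_t]) + br)  # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]) + bh)  # candidate
    return (1.0 - z_t) * h_prev + z_t * h_tilde  # interpolated new state
```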

4.4.3. Bidirectional LSTM (Bi-LSTM)

Bi-LSTM networks process the sequence in both forward and backward directions and concatenate the two hidden states, as presented in Equation (6) [30,31].

$$h_t = \left[\overrightarrow{h}_t;\ \overleftarrow{h}_t\right] \qquad (6)$$

4.5. RNN with Self-Attention

A self-attention mechanism has been added to the architecture of recurrent neural networks (RNNs) to help them better capture long-term dependencies in time series data. This approach allows the model to assign a different level of importance to each time step, instead of relying solely on the recurrent memory [32].
Let the output of the RNN (e.g., GRU or LSTM) at each time step be represented as a sequence of hidden states:
$$H = \left[h_1, h_2, \ldots, h_T\right] \in \mathbb{R}^{T \times d},$$
where $h_t \in \mathbb{R}^{d}$ is the hidden state at time $t$ and $T$ is the sequence length.
The self-attention mechanism computes three matrices:
$$Q = H W_Q, \qquad K = H W_K, \qquad V = H W_V,$$
where $W_Q, W_K, W_V \in \mathbb{R}^{d \times d_k}$ are learnable projection matrices, and $Q, K, V \in \mathbb{R}^{T \times d_k}$.
The attention weights are calculated as
$$\mathrm{Attention}\left(Q, K, V\right) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$
This results in a new representation for each time step, which encodes contextual information from the entire sequence.
This context-aware representation is then passed through a feedforward layer or a final regression layer for prediction:
$$\hat{y} = \mathrm{FC}\left(\mathrm{Attention}\left(Q, K, V\right)\right)$$
By integrating this mechanism into the RNN pipeline, the model gains the ability to focus on relevant parts of the input sequence, regardless of their temporal distance. This architecture has been particularly effective for time series forecasting problems with non-local dependencies.
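The attention computation above reduces to a few matrix operations; the following NumPy sketch assumes a single attention head and illustrative projection sizes.

```python
import numpy as np

def self_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention over RNN hidden states H (T x d),
    following the equations above."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv           # each of shape T x d_k
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # T x T attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                         # context-aware representation
```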

4.6. Hybrid Models

Combining convolutional networks (CNNs) with recurrent units such as GRUs and LSTMs in hybrid architectures is a powerful and flexible way to model complex sequences from temporal data. In this configuration, CNNs are used as feature extraction layers to identify local patterns and spatial structures in the sequences before they are processed over time. These extracted features are then passed to a GRU layer, which handles sequential relationships with better computational efficiency than LSTMs, followed by an LSTM layer to capture longer and more complex temporal dependencies. This series of layers allows the model to utilize the best parts of each: CNNs can obtain strong representations, GRUs are fast, and LSTMs are effective at learning long-term relationships. Results indicated that this architecture provides superior performance in THD prediction tasks, especially with preprocessed and augmented data. Using these types of networks allows for in-depth and hierarchically structured learning. This approach works well with the complexity of real-world signals.

4.7. Evaluation Metrics

To quantitatively evaluate the predictive performance of the tested architectures, several commonly used error and correlation metrics were employed. In this work, $y_i$ denotes the measured THD values, while $\hat{y}_i$ represents the predicted THD values provided by the neural networks.
The root mean squared error (RMSE) is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}$$
while the mean absolute error (MAE) is expressed as
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
The symmetric mean absolute percentage error (sMAPE) measures the relative deviation between predicted and actual values and is given by
$$\mathrm{sMAPE} = \frac{100}{N} \sum_{i=1}^{N} \frac{\left| y_i - \hat{y}_i \right|}{\left( \left| y_i \right| + \left| \hat{y}_i \right| \right)/2}$$
The coefficient of determination (R2) quantifies the proportion of variance in the measured data explained by the predictions:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2}$$
and the Pearson correlation coefficient (ρ) evaluates the linear association between predicted and measured values:
$$\rho = \frac{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{N} \left(\hat{y}_i - \bar{\hat{y}}\right)^2}}$$
These metrics have been widely applied in time-series forecasting and power quality analysis, ensuring comparability of results with existing literature [26,29].
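For reference, all five indices can be computed directly from the definitions above; the following sketch assumes NumPy arrays of equal length.

```python
import numpy as np

def metrics(y, y_hat):
    """Compute RMSE, MAE, sMAPE, R2, and the Pearson correlation (rho)
    for measured values y and predictions y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    smape = 100.0 / len(y) * np.sum(np.abs(err) / ((np.abs(y) + np.abs(y_hat)) / 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    rho = np.corrcoef(y, y_hat)[0, 1]
    return dict(RMSE=rmse, MAE=mae, sMAPE=smape, R2=r2, rho=rho)
```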

5. Results

This section presents the results obtained for THD prediction using several neural architectures, ranging from simple models to advanced models. Thus, the analysis starts from the feedforward network, then continues with basic recurrent network (RNN) architectures and their extensions, namely GRU and LSTM. Subsequently, more complex networks are investigated, such as BiLSTM, which integrates bidirectional information, and RNN with a self-attention mechanism (of the Transformer type), which adds an increased capacity to capture long-term dependencies. In parallel, convolutional architectures (CNN) are also included, as well as a CNN–GRU–LSTM hybrid model, designed to capitalize on both the efficient extraction of spatial features and the modeling of temporal dependencies.
The performance of the models is evaluated based on standard indicators: root mean square error (RMSE), mean absolute error (MAE), symmetric mean percentage error (sMAPE), coefficient of determination (R2), as well as Pearson correlation coefficient (ρ) and the statistical significance level (p-value) [26,29]. This approach allows a systematic comparison between different architectures and prediction strategies, highlighting both the advantages and limitations of each method.
The evaluation was conducted under three forecasting strategies: classical one-step prediction, one-step autoregressive prediction, and multi-step horizon prediction, to provide a comprehensive comparison of model performance [22,29].
The hyperparameter settings for all networks were carefully selected to ensure comparable model complexity and consistent training conditions (Table 1).
All models were trained using identical conditions with the Adam optimizer (learning rate = 0.008), a mini-batch size of 128, and 600 epochs. Weight initialization followed the He scheme for recurrent and convolutional layers.
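A hedged PyTorch sketch of this shared training setup is shown below; the tensor shapes and the MSE loss are assumptions, while the optimizer, learning rate, batch size, epoch count, and He initialization follow the settings stated above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def he_init(module):
    """Simplified stand-in for He initialization of trainable layers."""
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        torch.nn.init.kaiming_normal_(module.weight)

def train(model, train_x, train_y, epochs=600, lr=0.008, batch=128):
    """Train `model` with the shared settings (Adam, lr 0.008,
    mini-batch 128, 600 epochs)."""
    model.apply(he_init)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    loader = DataLoader(TensorDataset(train_x, train_y),
                        batch_size=batch, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
    return model
```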

5.1. Results Using Feedforward Network

To evaluate the performance of the feedforward network in predicting total harmonic distortion (THD), the analysis considered the three types of prediction mentioned.
Regarding the training parameters, the model receives input from a window of 200 preceding samples, allowing it to capture the signal’s key temporal dependencies. The initial learning rate was set to 0.008, which provides rapid convergence in the early stages of training while maintaining the stability of the optimization process. The mini-batch size was fixed at 128, providing a compromise between gradient stability and computational speed. The maximum number of epochs was established at 600 to ensure an extensive exploration of the network’s parameter space. For performance validation, the dataset was divided into 75% training and 25% testing, ensuring a sufficiently broad background for training the models as well as a representative test set for measuring generalization capacity.
Figure 6a shows the measured data together with the predicted data on the training and testing sets, and from this representation it can be seen that the feedforward model manages to track the evolution of the THD values quite well. The quantitative evaluation of the performance for the classical prediction on the testing data (Figure 6b) indicates very good values of the statistical indicators, with RMSE = 0.221150, MAE = 0.135831, sMAPE = 0.008772, and a correlation coefficient of ρ = 0.894739 (p-value < 10−189), which confirms a high correspondence between the measured and estimated data. In contrast, for the single-step autoregressive prediction (Figure 6c), the performance degrades, with RMSE = 0.570047, MAE = 0.435029, sMAPE = 0.027837, and a very weak correlation, ρ = −0.229568 (p-value ≈ 5.196025 × 10−8). In the case of multi-step prediction (Figure 6d), the situation deteriorates further: the errors increase (RMSE = 0.718889, MAE = 0.569069, sMAPE = 0.569069) and the correlation remains low (ρ = −0.229148), showing that errors accumulate rapidly and accuracy degrades over extended horizons. These results demonstrate that the feedforward architecture is suitable for classical point prediction; however, it shows significant deficiencies in autoregressive and multi-step techniques, where the absence of memory mechanisms considerably reduces generalization capacity.

5.2. Results Using Recurrent Neural Networks (RNN)

The results obtained for the RNN network (Figure 7a–d) highlight a difference between the high performance in classical prediction and its limits in autoregressive and multi-step predictions.
In the case of the RNN network, the performance in classical prediction on the test set (Figure 7b) is satisfactory, with the model being able to track the variations of the real values. The statistical indicators confirm this: RMSE = 0.2969, MAE = 0.2286, and sMAPE = 0.0146, with a high correlation coefficient ρ = 0.7927 and a coefficient of determination R2 = 0.625. These results indicate that the model successfully identified the relationship between the input data and the desired outputs.
For the autoregressive prediction in a single step (Figure 7c), the performance drops considerably. The values RMSE = 0.5969, MAE = 0.4544, and sMAPE = 0.0192 indicate an increase in errors, while the determination coefficient decreases to R2 = 0.247, showing partial degradation due to the iterative propagation of prediction errors. Even though the Pearson correlation remains moderate (ρ = 0.584, p < 0.001), the predictions begin to deviate from the real data dynamics. In the case of multi-step prediction (Figure 7d), the results are similar: RMSE = 0.6166, MAE = 0.4813, sMAPE = 0.0253, and R2 = −0.296, with a weak correlation (ρ = 0.227). Although the absolute errors remain moderate, the negative coefficient of determination highlights that the model fails to maintain a consistent explanation of the variability of the data over extended horizons.
In conclusion, RNN provides accurate prediction in the classical approach, but the accumulation of errors in the autoregressive and multi-step prediction significantly limits its long-term generalization capability.

5.3. Results with Gated Recurrent Unit (GRU)

The results for the GRU network (Figure 8a–d) show a balanced performance, with excellent prediction in the classical regime and superior stability in the autoregressive and multi-step scenarios.
In the classical prediction on the test set (Figure 8b), the GRU model reproduces the general evolution of the real values with moderate accuracy, obtaining RMSE = 0.3246, MAE = 0.2371, and sMAPE = 0.0150, with a correlation coefficient ρ = 0.746 and R2 = 0.551. These values indicate a satisfactory but slightly weaker fit compared to the RNN.
For the one-step autoregressive prediction (Figure 8c), the performance decreases noticeably, with RMSE = 0.5410, MAE = 0.4072, and sMAPE = 0.0271. The coefficient of determination becomes negative (R2 = −0.328) and the correlation is weak (ρ = −0.104), showing that error accumulation significantly degrades prediction quality.
In the case of multi-step prediction (Figure 8d), the results remain similar, with RMSE = 0.6469, MAE = 0.5103, sMAPE = 0.0319, R2 = −0.690, and a very weak correlation (ρ = 0.056). Although the estimated series still follows the general shape of the signal, the low statistical indicators confirm that the GRU model fails to maintain consistency over extended horizons. Overall, the GRU provides only moderate accuracy in classical prediction and shows limited robustness in autoregressive and multi-step forecasting, performing slightly below the RNN in terms of long-term generalization.

5.4. Results with the Long Short-Term Memory (LSTM) Model

The LSTM network (Figure 9a–d) shows robust behavior, with excellent accuracy in the classical regime and improved stability on extended horizons. The LSTM classical prediction on the test set (Figure 9b) yields RMSE = 0.2571, MAE = 0.1870, sMAPE = 0.0119, ρ = 0.870, and R2 = 0.718. The measured and predicted values correspond closely.
For the one-step autoregressive prediction (Figure 9c), the performance decreases, with RMSE = 0.5809, MAE = 0.4479, and sMAPE = 0.0221, while the determination coefficient remains slightly positive (R2 = 0.0257) and the correlation drops to ρ = 0.365 (p < 10−18). These results indicate that the accumulation of errors begins to affect the prediction quality, yet the model retains a weak but consistent correspondence with the real data.
In multi-step prediction (Figure 9d), LSTM achieves RMSE = 0.6470, MAE = 0.5042, sMAPE = 0.0264, R2 = −0.286, and a low correlation (ρ = 0.281, p < 10−11). While the negative R2 values confirm the architecture’s limited ability to explain data variability over extended horizons, the predicted series still follows the main shape of the real data more coherently than in the one-step regime.
Even though the statistical values of the autoregressive and multi-step strategies show limitations, LSTM continues to perform well in classical prediction and demonstrates partial robustness against long-term error accumulation.

5.5. Results with Bidirectional LSTM (Bi-LSTM)

The results for the BiLSTM network (Figure 10a–d) show a similar behavior to other recurrent architectures, with excellent performance in classical prediction and limitations in the autoregressive and multi-step regimes.
In classical prediction on the test set (Figure 10b), BiLSTM obtains RMSE = 0.2805, MAE = 0.2132, sMAPE = 0.0135, correlation ρ = 0.832, and a coefficient of determination R2 = 0.665. These results indicate a satisfactory fit between the measured and estimated data, demonstrating the model’s ability to capture bidirectional dependencies in the data used.
For the single-step autoregressive prediction (Figure 10c), the performance decreases moderately, with RMSE = 0.5646, MAE = 0.4285, and sMAPE = 0.0233, while the determination coefficient remains close to zero (R2 = 0.006) and the correlation drops to ρ = 0.353 (p < 10−17). These results suggest that the accumulation of errors slightly affects prediction quality, but the model retains a weak consistency with the real data.
In the case of multi-step prediction (Figure 10d), the results are comparable: RMSE = 0.6146, MAE = 0.4792, sMAPE = 0.0215, with R2 = 0.007 and a moderate correlation (ρ = 0.401, p < 10−22). Even though the statistical values highlight the model’s limited explanatory power, the visual representation shows that BiLSTM manages to follow the general trends of the real data, maintaining better stability over extended horizons.
Overall, BiLSTM offers good accuracy in the classical regime and relatively higher stability in multi-step predictions, but the accumulation of small errors still limits its long-term generalization capability.

5.6. Results with RNN with Self-Attention

The results for the self-attention RNN (Figure 11a–d) show a strong integration between the recurrent mechanism and the ability of the attention layer to highlight important relationships within temporal sequences.
In the classical prediction regime (Figure 11b), the model obtains RMSE = 0.2831, MAE = 0.2168, sMAPE = 0.0138, ρ = 0.834, and R2 = 0.659. These results indicate that attention integration enhances the model’s ability to capture temporal dependencies, achieving high precision and a strong correlation between measured and predicted values.
The one-step autoregressive prediction (Figure 11c) shows a moderate decrease in performance, with RMSE = 0.6164, MAE = 0.4726, sMAPE = 0.0190, ρ = 0.605, and R2 = 0.317. These values demonstrate that, although iterative propagation increases errors, the model maintains a significant positive correlation with real data, confirming the stabilizing effect of the attention mechanism.
In the multi-step prediction (Figure 11d), the results remain consistent, with RMSE = 0.6577, MAE = 0.5125, sMAPE = 0.0214, ρ = 0.515, and R2 = 0.099. The positive R2 values and moderate correlations indicate that the self-attention structure helps preserve temporal consistency over extended horizons, even though the predictive precision gradually decreases.
Overall, the self-attention RNN achieves high accuracy in classical point prediction and maintains improved robustness in the autoregressive and multi-step forecasts, demonstrating that the attention mechanism effectively enhances the recurrent architecture’s temporal learning capability.

5.7. Results Using Convolutional Neural Networks (CNN)

The results for the CNN network (Figure 12a–d) show a satisfactory prediction capacity in the classical regime but also limitations in the autoregressive and multi-step predictions.
In the classical prediction on the test set (Figure 12b), the CNN obtains RMSE = 0.204, MAE = 0.145, sMAPE = 0.0094, a correlation coefficient ρ = 0.913, and a coefficient of determination R2 = 0.822. These results confirm good accuracy and a strong correlation between the measured and predicted values, demonstrating the efficiency of the convolutional architecture in extracting relevant local features.
For the one-step autoregressive prediction (Figure 12c), the performance decreases, with RMSE = 0.581, MAE = 0.581, sMAPE = 0.0186, R2 = 0.330, and a moderate correlation (ρ = 0.589). The results show that the model partially loses connection with the real series when previous predictions are reintroduced as inputs, which causes an increase in errors and a reduction in correlation strength.
In the case of multi-step prediction (Figure 12d), the CNN obtains RMSE = 0.657, MAE = 0.513, sMAPE = 0.0222, ρ = 0.483, and R2 = 0.043. Although the predicted series visually follow the general trend of the real data, the low R2 values indicate that the model explains only a limited part of the data variability over extended horizons.
Overall, the CNN provides accurate prediction in the classical regime, but its performance deteriorates in the autoregressive and multi-step scenarios, where error accumulation and the absence of temporal memory significantly reduce long-term accuracy.

5.8. Results with Hybrid CNN-GRU-LSTM Architecture for Robust THD Forecasting

An advanced hybrid architecture combining convolutional layers (2D CNN) with recurrent GRU and LSTM layers was proposed to utilize both the ability to extract local features from sequential signals and the ability to capture long-term temporal relationships.
The input sequences were initially passed through a deep convolutional block, consisting of multiple 2D convolutional layers with progressive dilation factors (1, 2, 4, 8, and 16). This provides the network the ability to capture varying amplitude and frequency patterns in the input data. Each convolutional layer is followed by an ELU (Exponential Linear Unit) activation, which maintains the stability of the gradient propagation. A pooling layer is used to reduce the intermediate dimensionality.
After spatial feature extraction, the data is fed into a sequence of GRU and LSTM layers. The initial GRU provides efficient storage with a small number of parameters, while the subsequent LSTM layers are used to model complex temporal relationships. To prevent overfitting and improve generalization, dropout regularization layers are inserted between the recurrent layers.
Finally, the output is preceded by a fully connected layer, followed by a regression layer that produces the numerical prediction of the THD value.
The proposed CNN–GRU–LSTM hybrid architecture is shown in Figure 13. It integrates convolutional and recurrent components to capture both local and long-term temporal dependencies in the THD signal. The network includes four convolutional layers (3 × 3 kernels, stride = 1, ELU activation, and batch normalization), followed by an average pooling layer and a flattening operation. The extracted features are processed by a GRU layer (100 units) and an LSTM layer (100 units) in sequence, with dropout regularization (rate = 0.2) between layers. A fully connected regression output layer completes the model.
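A possible PyTorch realization of this architecture is sketched below. The layer sequence (four 3 × 3 convolutions with batch normalization and ELU, average pooling, GRU and LSTM layers of 100 units with dropout 0.2, and a regression head) follows the description above, while the channel counts and the reshaping of convolutional features into a temporal sequence are illustrative assumptions, not the exact design.

```python
import torch
import torch.nn as nn

class HybridCnnGruLstm(nn.Module):
    """Hedged sketch of the CNN-GRU-LSTM hybrid shown in Figure 13."""

    def __init__(self, channels=16, hidden=100):
        super().__init__()
        convs, in_ch = [], 1
        for _ in range(4):  # four 3x3 conv blocks with batch norm and ELU
            convs += [nn.Conv2d(in_ch, channels, kernel_size=3, stride=1, padding=1),
                      nn.BatchNorm2d(channels),
                      nn.ELU()]
            in_ch = channels
        self.cnn = nn.Sequential(*convs, nn.AvgPool2d(kernel_size=2))
        self.gru = nn.GRU(input_size=channels, hidden_size=hidden, batch_first=True)
        self.drop = nn.Dropout(0.2)   # regularization between recurrent layers
        self.lstm = nn.LSTM(input_size=hidden, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # fully connected regression output (THD)

    def forward(self, x):                  # x: (batch, 1, window, features)
        f = self.cnn(x)                    # (batch, channels, T', F')
        # Collapse the feature axis and treat the remaining axis as time
        # (an assumed stand-in for the flattening step in the paper).
        f = f.mean(dim=3).permute(0, 2, 1)  # (batch, T', channels)
        g, _ = self.gru(f)
        l, _ = self.lstm(self.drop(g))
        return self.head(l[:, -1, :])      # prediction from the last time step
```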
This architecture was optimized empirically through iterative tuning of convolutional filters and recurrent units, resulting in a balanced configuration that ensured both prediction accuracy and computational efficiency.
The hybrid model achieves very good performance in classical prediction: RMSE = 0.134, MAE = 0.097, sMAPE = 0.0063, with a very high correlation (ρ = 0.961) and a coefficient of determination of R2 = 0.923. These values confirm that the hybrid architecture manages to capture both local features (through the convolutional part) and long-term temporal dependencies (through GRU and LSTM). Practically, the predictions overlap almost perfectly with the measured data.
In autoregressive prediction, as observed in the other architectures, the performance decreases due to the accumulation of errors. However, in this case, the degradation is moderate: RMSE = 0.611, MAE = 0.458, sMAPE = 0.0156, with a positive R2 = 0.437 and correlation ρ = 0.683. These results show that the hybrid model retains a strong capacity to preserve the signal trend, effectively limiting error propagation compared to the other networks. The results are presented in Figure 14.
For multi-step prediction, the results remain stable and robust: RMSE = 0.620, MAE = 0.472, sMAPE = 0.0171, with R2 = 0.430 and a significant correlation (ρ = 0.669). This demonstrates that, unlike other architectures where multi-step predictions deteriorated rapidly, the hybrid model maintains consistency with the real signal and successfully reproduces its dynamics over extended horizons.
The measurements were performed in a single steel mill; however, the data acquisition covered multiple operating cycles over several days, including transient events, steady-state periods, and switching of the reactive power compensation facility. These variations produced a diverse set of signal patterns, including frequent transients, harmonic fluctuations, and changes in reactive power flow, which allowed the models to learn from a rich and heterogeneous data set. Therefore, the sampling duration and the diversity of operating regimes were sufficient to characterize the dynamic behavior of the system under real industrial conditions. In addition, to improve generalization, the data set was divided into independent training, validation, and testing subsets to ensure that unseen data segments were used for performance evaluation. The predictive models, especially the CNN-GRU-LSTM hybrid, demonstrated consistent accuracy and stable correlation metrics (R2 and ρ) on these previously unseen data sequences, confirming their ability to generalize to new operational scenarios rather than simply memorizing specific patterns.

6. Discussion

The comparative results of all architectures across the three prediction strategies are summarized in Table 2, which reports the values of RMSE, MAE, sMAPE, the coefficient of determination (R2), the Pearson correlation coefficient (ρ), and the associated p-values. These results allow for a detailed discussion on the relative performance, robustness, and limitations of each model.
The comparative analysis of the tested neural network architectures highlights important differences in their behavior for the three prediction methods (classical, autoregressive one-step, and multi-step).
In the case of classical prediction on the test data, the results confirm the theoretical expectation that convolutional and hybrid architectures provide the highest accuracy. However, several recurrent models also performed well. The feedforward network achieved strong results (R2 ≈ 0.79, ρ ≈ 0.89), showing that even a simple architecture can effectively capture short-term dependencies. Among recurrent models, the LSTM also showed robust performance (R2 ≈ 0.72, ρ ≈ 0.87), confirming its ability to model nonlinear temporal dynamics. The BiLSTM and RNN with self-attention offered slightly lower but still consistent accuracy (R2 ≈ 0.65–0.66, ρ ≈ 0.83–0.84), while the standard RNN and GRU achieved moderate performance (R2 ≈ 0.55–0.62, ρ ≈ 0.74–0.79).
The Enhanced Hybrid CNN–GRU–LSTM achieved the best overall performance, with very low error rates (RMSE = 0.134, MAE = 0.097, sMAPE = 0.0063), a very strong correlation with the real data (ρ = 0.961, p < 0.001), and the highest coefficient of determination (R2 = 0.923). These results show that the hybrid model effectively captures both local features (via convolutional layers) and long-term temporal dependencies (through GRU and LSTM units), making it ideally suited for the analyzed THD data.
The CNN model also achieved very good results (R2 = 0.822, ρ = 0.913), confirming the ability of convolutional architectures to extract meaningful local patterns from sequential signals. Related studies have reported similar conclusions, showing that CNN–LSTM hybrids improve forecasting accuracy in solar irradiance prediction [32] and electricity consumption estimation [33].
When using autoregressive one-step prediction, the performance of all models dropped significantly. This is in line with the theory that recursive prediction propagates and amplifies errors. For every model, the R2 values became significantly smaller, sometimes negative (ranging from −0.588132 to 0.437422), meaning that the models explained less variance than a simple predictor that only returns the mean of the data. Such negative coefficients of determination are consistent with prior observations in the literature, which demonstrate that recursive forecasting methods inherently accumulate uncertainty and gradually lose coherence with the real data [34]. In the context of this study, the GRU model, despite its gating mechanism, experienced a notable degradation of predictive accuracy under recursive conditions, confirming observations from earlier studies on the difficulty of maintaining gradient stability in gated networks during iterative forecasting [34]. By contrast, recurrent models such as RNN, LSTM, BiLSTM, and RNN with self-attention maintained positive R2 values (0.02–0.32) and moderate correlations (ρ ≈ 0.35–0.68), indicating that they retained partial coherence with the measured data. The Enhanced Hybrid CNN–GRU–LSTM model achieved the best autoregressive performance, maintaining a high correlation (ρ = 0.683) and a positive R2 = 0.437, demonstrating that the hybridization of convolutional and recurrent layers significantly improves recursive stability. These results align with research on error mitigation in recurrent forecasting, which shows that gating mechanisms can limit the accumulation of errors over extended horizons [35], as well as studies that emphasize the benefits of improved gating structures in recurrent neural networks [36]. In contrast, the Feedforward architecture exhibited the weakest autoregressive performance, with a strongly negative R2 and high absolute errors, confirming that models without temporal memory are most affected by recursive feedback and error accumulation.
In the multi-step horizon strategy, although the overall accuracy did not increase, the degradation of performance was more gradual, indicating that this approach provided improved stability across extended horizons. This behavior reflects the fact that direct multi-step forecasting, unlike recursive strategies, produces predictions directly for the future interval rather than reusing previous outputs as new inputs. As a result, it limits the exponential accumulation of propagated errors, even when the global accuracy remains similar to or slightly below that of the one-step regime. These results are consistent with other studies reporting that direct multi-step methods mitigate the instability typical of iterative approaches [37].
In the current research, most architectures achieved near-zero or slightly negative R2 values (−0.29 to 0.10), with moderate correlations (ρ ≈ 0.40–0.67), indicating partial robustness in tracking long-term dynamics. Notably, several models obtained positive R2 values in this regime—specifically, the BiLSTM (R2 ≈ 0.007), CNN (R2 ≈ 0.043), RNN with self-attention (R2 ≈ 0.099), and the Enhanced Hybrid CNN–GRU–LSTM (R2 ≈ 0.430). These results demonstrate that the inclusion of bidirectional or attention mechanisms, and especially the hybrid combination of convolutional and recurrent layers, enhances the network’s ability to preserve temporal consistency over longer horizons.
By contrast, Feedforward, GRU, LSTM, and RNN exhibited negative R2 values (ranging from −0.29 to −0.69), yet maintained relatively low errors (RMSE ≈ 0.61–0.66), suggesting that even when explanatory power decreases, the models still retain a reasonable capability to approximate the overall signal trend.
These outcomes align with observations in the time-series forecasting literature, particularly within attention-based and hybrid architectures. For example, transformer architectures are often preferred in multi-step forecasting tasks due to their ability to reduce long-term error accumulation and preserve sequential coherence [38]. Furthermore, hybrid deep learning methods have demonstrated superior accuracy in long-horizon prediction problems, such as coal price forecasting and other complex dynamic domains [39].
Overall, several conclusions can be drawn. In classical prediction, convolutional and hybrid architectures remain superior, with the Enhanced Hybrid CNN–GRU–LSTM achieving the highest accuracy. In the autoregressive regime, all models exhibit performance degradation due to recursive error propagation, though networks with memory—such as GRU, LSTM, BiLSTM, and the hybrid—retain better stability than the Feedforward and CNN models. In multi-step forecasting, direct prediction over extended horizons limits, but does not eliminate, error accumulation; several architectures, including BiLSTM, CNN, the attention-based RNN, and the hybrid model, maintain positive R2 values and moderate correlations. Finally, while correlations remain high in the classical regime (>0.90), they decrease in autoregressive mode and partially recover in multi-step forecasting, reflecting the stabilizing effect of direct multi-horizon prediction on temporal coherence.
Negative R2 values were obtained for some architectures in the autoregressive mode—mainly Feedforward and GRU—indicating weaker performance due to recursive error accumulation. This behavior is typical and consistent with reports in the recent time-series literature [40,41]. In contrast, models with memory and gating mechanisms, such as LSTM, BiLSTM, and the hybrid CNN–GRU–LSTM, maintained positive R2 values and moderate correlations (ρ ≈ 0.35–0.68), showing better resistance to error propagation.
In conclusion, the comparative evaluation shows that hybrid architectures, particularly the Enhanced CNN–GRU–LSTM, are the most effective choice for THD prediction, offering both high short-term accuracy and superior robustness in long-term forecasting scenarios.
In addition to the prediction performance criteria, the computational efficiency of the tested models was evaluated to determine their suitability for real-time applications. For a fair comparison, all neural network architectures were trained on the same datasets with the same hyperparameters (600 epochs, MiniBatchSize = 128, initial learning rate = 0.008). Although the more complex architectures required longer training times, the hybrid CNN–GRU–LSTM network remained computationally acceptable, with training taking only a few tens of minutes on a mid-range GPU. Its inference speed was tested on a computer with an Intel i7 processor and an NVIDIA RTX 3060 GPU. Once trained, the model performs THD prediction with high computational efficiency, requiring less than 0.5 s on the CPU and less than 0.1 s on the GPU for a 10-step forecast window. These results show that the proposed hybrid architecture is suitable for near-real-time power quality prediction and can be integrated into real-time monitoring and predictive control frameworks without additional hardware acceleration.
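The reported hyperparameters translate directly into MATLAB training options. The fragment below is a hedged reconstruction: only the epoch count, mini-batch size, and initial learning rate are stated in the text, so the solver choice ('adam'), the layer array layers, and the variables XTrain, YTrain, and thdWindow are assumptions carried over from the sketches above:

    % Training configuration matching the reported hyperparameters.
    opts = trainingOptions('adam', ...        % solver is an assumption
        'MaxEpochs', 600, ...
        'MiniBatchSize', 128, ...
        'InitialLearnRate', 0.008);
    net = trainNetwork(XTrain, YTrain, layers, opts);   % layers as in Table 1
    % Rough inference timing for one forecast window:
    tic;  yFuture = predict(net, thdWindow);  tInfer = toc;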
The experimental data used for model development included numerous sequences with sudden and repetitive variations of current, THD, and active/reactive power, reflecting frequent transient conditions. These data characterize dynamic regimes, with current peaks exceeding 650 A and THD increases of up to 7–8.5%, and served as the raw inputs to the neural network architectures, allowing the models to learn from real operating fluctuations. The proposed hybrid CNN–GRU–LSTM architecture efficiently learned both short-term transient characteristics and long-term temporal dependencies, making it suitable for modeling the nonlinear behavior of THD under such variable load conditions. Although the model does not explicitly classify the operating regimes, it implicitly learns their patterns from the training data. Through experiments performed in both compensated and uncompensated regimes, the network correctly identified the correlations between power variations and distortion peaks. This confirms the model's ability to adapt to real industrial operating conditions and to provide a stable, accurate forecast of harmonic distortion during frequent load transitions.
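A hedged MATLAB sketch of such a hybrid stack is given below, approximating the configuration in Table 1; the exact layer options of the model in Figure 13 may differ, and the padding mode, output layer, and one-step output size are assumptions introduced for illustration:

    % Approximate hybrid CNN-GRU-LSTM stack for univariate THD sequences:
    % dilated convolutions capture short-term transients, the recurrent
    % layers capture long-term dependencies.
    layers = [
        sequenceInputLayer(1)
        convolution1dLayer(3, 32, 'DilationFactor', 1,  'Padding', 'causal')
        eluLayer
        convolution1dLayer(3, 32, 'DilationFactor', 2,  'Padding', 'causal')
        eluLayer
        convolution1dLayer(3, 32, 'DilationFactor', 4,  'Padding', 'causal')
        eluLayer
        convolution1dLayer(3, 32, 'DilationFactor', 8,  'Padding', 'causal')
        eluLayer
        convolution1dLayer(3, 32, 'DilationFactor', 16, 'Padding', 'causal')
        eluLayer
        gruLayer(128, 'OutputMode', 'sequence')   % long-range dependencies
        lstmLayer(64, 'OutputMode', 'sequence')
        dropoutLayer(0.25)
        lstmLayer(32, 'OutputMode', 'last')       % sequence summary
        fullyConnectedLayer(1)                    % one-step THD output
        regressionLayer];                         % use fullyConnectedLayer(H)
                                                  % for a direct H-step variant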
All models were validated using standard indicators: RMSE, MAE, sMAPE, R2, and the Pearson correlation coefficient (ρ). Together, these metrics provide a comprehensive evaluation of accuracy, error magnitude, and temporal consistency. The hybrid CNN–GRU–LSTM model outperformed all other studied architectures in predictive accuracy and robustness, with R2 = 0.923 and ρ = 0.961 in classical prediction, and it achieved the highest R2 of all models in multi-step prediction. These results confirm the model's capacity to work consistently and accurately under unpredictable conditions. Furthermore, forecasting the evolution of THD enables the early detection of power quality issues and the implementation of proactive compensation or filtering actions, resulting in a more resilient power supply system for rolling mills.
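For reference, the five indicators can be computed directly from the measured series y and the forecast yhat; a minimal sketch follows (the sMAPE scaling convention, here fractional with the standard factor of 2 and no ×100, is an assumption):

    % Evaluation metrics; y and yhat are column vectors of equal length.
    e     = y - yhat;
    RMSE  = sqrt(mean(e.^2));
    MAE   = mean(abs(e));
    sMAPE = mean(2*abs(e) ./ (abs(y) + abs(yhat)));   % fractional sMAPE
    R2    = 1 - sum(e.^2)/sum((y - mean(y)).^2);      % negative if worse than
                                                      % predicting the mean
    C     = corrcoef(y, yhat);
    rho   = C(1, 2);                                  % Pearson correlation

The R2 expression makes explicit why negative values appear in the autoregressive and multi-step regimes: whenever the accumulated prediction error exceeds the variance of the signal around its mean, R2 drops below zero.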

7. Conclusions

This paper presented a comprehensive analysis of total harmonic distortion (THD) prediction using various neural network architectures, including feedforward networks, recurrent architectures (RNN, GRU, LSTM, BiLSTM), convolutional models (CNN), attention-based variants, and hybrid methods that combine CNN and recurrent units. The study systematically evaluated these models’ performance using three prediction methods: classical prediction on test data, autoregressive one-step prediction, and multi-step prediction.
The results show that architectural choices have a significant effect on prediction accuracy and robustness. In classical prediction, convolutional and hybrid architectures performed best, with the Enhanced Hybrid CNN–GRU–LSTM emerging as the most effective model, achieving the lowest error values and the highest correlation with the measured data. In autoregressive one-step prediction, every architecture degraded due to recursive error propagation, although recurrent models with gating mechanisms (LSTM, BiLSTM) and the hybrid architecture demonstrated relative stability. Across both the autoregressive and multi-step regimes, the Enhanced Hybrid CNN–GRU–LSTM achieved the highest R2 of all architectures, confirming its resilience in long-term forecasting.
Overall, the comparative study emphasizes the advantages of combining convolutional and recurrent structures in a single hybrid framework. Such architectures offer both superior short-term accuracy and increased robustness in long-term prediction tasks, making them well suited for power quality monitoring in industrial systems.
The comparative evaluation and optimization procedures applied in this study are consistent with current best practices in hybrid intelligent modeling and predictive system design [38].
Future research will focus on increasing the adaptability of recursive prediction modes, integrating advanced attention mechanisms, and testing the suggested models on larger and more varied datasets to demonstrate their generalizability.
The proposed CNN–GRU–LSTM model can be integrated as a predictive module within supervisory control or energy management systems for industrial facilities (such as SCADA platforms). In this way, proactive compensation and control actions could be performed, such as adaptive regulation of reactive power compensators and automatic switching of harmonic filters, as illustrated in the sketch below. By predicting future THD trends in real time, the model can support decisions aimed at maintaining voltage stability and improving power quality, thereby contributing to reduced energy consumption and extended equipment lifetime.
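As a concrete illustration of how the forecast could drive such actions, the following hedged sketch shows a simple rule-based hook; the threshold value and the sendCompensationRequest interface are hypothetical, introduced only for illustration:

    % Hypothetical supervisory hook: request compensation in advance when
    % the forecast THD exceeds a plant-specific limit.
    thdLimit = 8;                              % percent; illustrative value
    yFuture  = predict(net, thdWindow);        % H-step-ahead THD forecast
    if any(yFuture > thdLimit)
        sendCompensationRequest();             % hypothetical SCADA interface
    end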
Future work will focus on proposing a closed-loop predictive control framework using the developed model.

Author Contributions

Conceptualization, M.P. and C.P.; methodology, M.P.; software, C.P.; validation, P.I.; formal analysis, M.P.; investigation, P.I.; resources, P.I.; data curation, C.P.; writing—original draft preparation, P.I.; writing—review and editing, M.P.; visualization, C.P.; supervision, M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to confidentiality agreements with the industrial facility where the measurements were collected.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ANN: Artificial Neural Network
CNN: Convolutional Neural Network
GRU: Gated Recurrent Unit
LSTM: Long Short-Term Memory
BiLSTM: Bidirectional Long Short-Term Memory
FNN: Feedforward Neural Network
RNN: Recurrent Neural Network
RMSE: Root Mean Square Error
MAE: Mean Absolute Error
sMAPE: Symmetric Mean Absolute Percentage Error
THD: Total Harmonic Distortion

References

1. Bao, M.; Xia, J.; Yin, X.; Dong, M.; He, H.; He, J. Harmonic Measurements and Analysis in a Modern Steel Manufacturing Facility. In Proceedings of the 2010 International Conference on Power System Technology (POWERCON), Hangzhou, China, 24–28 October 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–7.
2. Liu, B.; Liu, C.; Liu, J.; Zheng, X.; Xu, G.; Wang, B. Harmonic Control Measures in Yili Rolling Mill. In Proceedings of the 2018 China International Conference on Electricity Distribution (CICED), Tianjin, China, 17–19 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 571–575.
3. Goh, Z.P.; Radzi, M.A.M.; Hizam, H.; Wahab, N.I.A. Investigation of Severity of Voltage Flicker Caused by Second Harmonic. IET Sci. Meas. Technol. 2017, 11, 363–370.
4. Park, B.; Lee, J.; Yoo, H.; Jang, G. Harmonic Mitigation Using Passive Harmonic Filters: Case Study in a Steel Mill Power System. Energies 2021, 14, 2278.
5. Zhao, Y.; Milanović, J.V. Probabilistic Harmonic Estimation in Uncertain Transmission Networks Using Sequential ANNs. In Proceedings of the 2022 20th International Conference on Harmonics & Quality of Power (ICHQP), Naples, Italy, 29 May–1 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
6. Esfahani, M.T.; Vahidi, B. Electric Arc Furnace Power Quality Improvement by Applying a New Digital and Predicted-Based TSC Control. Turk. J. Electr. Eng. Comput. Sci. 2016, 24, 3971–3984.
7. Soni, P.; Mondal, D.; Mishra, P.; Chatterjee, S. Deep Learning Technique for Recurrence Plot-Based Classification of Power Quality Disturbances. In Proceedings of the 2022 IEEE International Power and Renewable Energy Conference (IPRECON), Kollam, India, 16–18 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
8. Huang, L.; Tang, L.; Wang, C.; Chen, Y. Ultra-Short-Term Prediction of Small-Sample Photovoltaic Power Based on WGAN-GP and BiLSTM-NGO. In Proceedings of the 2024 IEEE 2nd International Conference on Power Science and Technology (ICPST), Shanghai, China, 9–11 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
9. Rao, Z.; Yang, Z.; Li, J.; Li, L.; Wan, S. Prediction of Photovoltaic Power Generation Based on Parallel Bidirectional Long Short-Term Memory Networks. Energy Rep. 2024, 12, 3620–3629.
10. Revin, I.; Potemkin, V.A.; Balabanov, N.R.; Nikitin, N.O. Automated Machine Learning Approach for Time Series Classification Pipelines Using Evolutionary Optimization. Knowl. Based Syst. 2023, 268, 110483.
11. Guerrero-Sánchez, A.E.; Rivas-Araiza, E.A.; Garduño-Aparicio, M.; Tovar-Arriaga, S.; Rodriguez-Resendiz, J.; Toledano-Ayala, M. A Novel Methodology for Classifying Electrical Disturbances Using Deep Neural Networks. Technologies 2023, 11, 82.
12. Shinde, P.; Ahmad, A.; Munje, R. Decision Rules Based Supervised Machine Learning for Power Quality Application. In Proceedings of the 2019 5th International Conference for Convergence in Technology (I2CT), Pune, India, 29–31 March 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
13. Cheng, L.; Wu, Z.; Xuanyuan, S.; Chang, H. Power Quality Disturbance Classification Based on Adaptive Compressed Sensing and Machine Learning. In Proceedings of the 2020 IEEE PES Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 17–20 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 65–71.
14. Vidhya, S.; Kamaraj, V. Particle Swarm Optimized Extreme Learning Machine for Feature Classification in Power Quality Data Mining. Automatika 2017, 58, 487–494.
15. Dawood, Z.; Babulal, C.K. Power Quality Disturbance Classification Based on Efficient Adaptive Arrhenius Artificial Bee Colony Feature Selection. Int. Trans. Electr. Energy Syst. 2021, 31, e12868.
16. Kavaskar, S.; Sendil Kumar, S.; Karthick, K. Power Quality Disturbance Detection Using Machine Learning Algorithm. In Proceedings of the 2020 IEEE International Conference on Advances and Developments in Electrical and Electronics Engineering (ICADEE), Coimbatore, India, 10–11 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
17. Al Hadi, F.M.; Aly, H.H. Harmonics Forecasting of Renewable Energy System Using Hybrid Model Based on LSTM and ANFIS. IEEE Access 2024, 12, 50966–50985.
18. Panoiu, M.; Ghiormez, L.; Panoiu, C. Adaptive Neuro-Fuzzy System for Current Prediction in Electric Arc Furnaces. In Soft Computing Applications; Balas, V., Jain, L.C., Kovačević, B., Eds.; SOFA 2014; Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2016; Volume 356.
19. Farokhnia, N.; Mohammad, M.; Rezanezhad Gatabi, I.; Ehsani, M. Unbalance, Flicker, Harmonic, Voltage and Reactive Power Compensation of the Distribution Grid Using a Universal STATCOM. In Proceedings of the 2014 IEEE PES General Meeting | Conference & Exposition, National Harbor, MD, USA, 27–31 July 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–5.
20. Ali, M.; Al-Ismail, F.S.; Gulzar, M.M.; Khalid, M. A Review on Harmonic Elimination and Mitigation Techniques in Power Converter Based Systems. Electr. Power Syst. Res. 2024, 234, 110573.
21. Maklakov, A.S.; Jing, T.; Nikolaev, A.A.; Gasiyarov, V.R. Grid Connection Circuits for Powerful Regenerative Electric Drives of Rolling Mills: Review. Energies 2022, 15, 8608.
22. Sankar, S.R.; Madhavan, P. WindForecastX: A Dynamic Approach for Accurate Long-Term Wind Speed Prediction in Wind Energy Applications. Ocean Dyn. 2025, 75, 11.
23. Zjavka, L. Power Quality Multi-Step Prediction with the Gradually Increasing Selected Input Parameters Using Machine Learning and Regression. Sustain. Energy Grids Netw. 2021, 26, 100442.
24. Chan, J.W.; Yeo, C.K. Electrical Power Consumption Forecasting with Transformers. In Proceedings of the 2022 IEEE Electrical Power and Energy Conference (EPEC), Ottawa, ON, Canada, 5–7 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
25. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Chapter 6: Deep Feedforward Networks; MIT Press: Cambridge, MA, USA, 2016.
26. Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM Networks. Neural Netw. 2005, 18, 602–610.
27. Cacciari, I.; Dusi, F.; Odorico, A.; Zio, E. Hands-On Fundamentals of 1D Convolutional Neural Networks. Appl. Sci. 2024, 14, 8500.
28. Choi, S.; Kim, S.I.; Chairattanawat, C.; Hwang, S. Transformer-Based Multi-Step Time Series Forecasting of Methane Yield in Full-Scale Anaerobic Digestion. Water Res. 2025, 286, 124276.
29. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
30. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-Art in Artificial Neural Network Applications: A Survey. Heliyon 2018, 4, e00938.
31. Shalini, T.A.; Revathi, B.S. Hybrid Power Generation Forecasting Using CNN-Based BiLSTM Method for Renewable Energy Systems. Automatika 2023, 64, 127–144.
32. Lin, Z.; Feng, M.; Santos, C.N.d.; Yu, M.; Xiang, B.; Zhou, B.; Bengio, Y. A Structured Self-Attentive Sentence Embedding. In Proceedings of the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, 24–26 April 2017.
33. Ladjal, B.; Nadour, M.; Bechouat, M.; Hadroug, N.; Sedraoui, M.; Rabehi, A.; Guermoui, M.; Agajie, T.F. Hybrid Deep Learning CNN–LSTM Model for Forecasting Direct Normal Irradiance. Sci. Rep. 2025, 15, 94239.
34. Chung, H.; Jang, H. Accurate Prediction of Electricity Consumption Using a Hybrid CNN–LSTM Model. PLoS ONE 2022, 17, e0278071.
35. Effrosynidis, D.; Spiliotis, E.; Sylaios, G.; Arampatzis, A. Time Series and Regression Methods for Univariate Environmental Forecasting: An Empirical Evaluation. Sci. Total Environ. 2023, 875, 162580.
36. Farooq, J.; Bazaz, M.A.; Rafiq, D. Multiscale Autoencoder-RNN Architecture for Mitigating Error Accumulation in Long-Term Forecasting. In Proceedings of the 2023 International Conference on Emerging Techniques in Computational Intelligence (ICETCI), Hyderabad, India, 21–23 September 2023; pp. 271–275.
37. Iordan, A.-E. An Optimized LSTM Neural Network for Accurate Estimation of Software Development Effort. Mathematics 2024, 12, 200.
38. Wu, Y.; Cai, D.; Gu, S.; Jiang, N.; Li, S. Compressive Strength Prediction of Sleeve Grouting Materials in Prefabricated Structures Using Hybrid Optimized XGBoost Models. Constr. Build. Mater. 2025, 476, 141319.
39. Bai, W.; Jin, M.; Li, W.; Zhao, J.; Feng, B.; Xie, T.; Li, S.; Li, H. Multi-Step Prediction of Wind Power Based on Hybrid Model with Improved Variational Mode Decomposition and Sequence-to-Sequence Network. Processes 2024, 12, 191.
40. Gülmez, B. Stock Price Prediction with Optimized Deep LSTM Network with Artificial Rabbits Optimization Algorithm. Expert Syst. Appl. 2023, 227, 120346.
41. Baun, J.J.; Janairo, A.G.; Ii, R.C.; Francisco, K.; Enriquez, M.L.; Relano, R.J.; Sybingco, E.; Bandala, A.; Vicerra, R.R. Deep Neural Network-Based Current and Voltage Prediction Models for Digital Measuring Unit of Capacitive Resistivity Underground Imaging Transmitter Subsystem. Int. J. Comput. Digit. Syst. 2024, 15, 627–640.
Figure 1. Diagram of the 6 kV power supply system for the hot rolling mill, including transformer substations and key measurement points at ST1.
Figure 2. Phase 1 RMS voltage during operation: (a) without capacitor bank; (b) with capacitor bank.
Figure 3. Total harmonic distortion of phase 1 voltage: (a) without capacitor bank; (b) with capacitor bank.
Figure 4. Active power: (a) without capacitor bank; (b) with capacitor bank.
Figure 5. Reactive power: (a) without capacitor bank; (b) with capacitor bank.
Figure 6. Results of the prediction using the Feedforward neural network: (a) The result of training on the entire dataset; (b) classic prediction and comparison between the test sequence and the predicted sequence; (c) one-step autoregressive prediction; (d) multistep prediction.
Figure 7. Results of the prediction using the RNN: (a) The result of training on the entire dataset; (b) classic prediction; (c) one-step autoregressive prediction; (d) multistep prediction.
Figure 8. Results of the prediction using the GRU: (a) classic prediction over the entire dataset; (b) classic prediction; (c) one-step autoregressive prediction; (d) multistep prediction.
Figure 9. Results of the prediction using the LSTM neural network: (a) classic prediction over the entire dataset; (b) classic prediction; (c) one-step autoregressive prediction; (d) multistep prediction.
Figure 10. Results of the prediction using the Bi-LSTM: (a) classic prediction over the entire dataset; (b) classic prediction; (c) one-step autoregressive prediction; (d) multistep prediction.
Figure 11. Results of the prediction using the RNN with Self-Attention: (a) classic prediction over the entire dataset; (b) classic prediction; (c) one-step autoregressive prediction; (d) multistep prediction.
Figure 12. Results of the prediction using the CNN: (a) classic prediction over the entire dataset; (b) classic prediction; (c) one-step autoregressive prediction; (d) multistep prediction.
Figure 13. The structure of the CNN–GRU–LSTM in Matlab.
Figure 14. Results of the prediction using the hybrid network: (a) classic prediction over the entire dataset; (b) classic prediction; (c) one-step autoregressive prediction; (d) multistep prediction.
Table 1. Architectural configurations and key parameters of the neural network models.
Model | Architecture Summary | Hidden Units | Dropout | Activation
FNN | 3 dense layers [5, 7, 15] + FC + regression | 5–15 | – | ReLU
CNN | 5 Conv (3 × 3, 32 filters, dilation 1–8) + AvgPool + LSTM(32) + FC | 32 | – | ELU
LSTM | LSTM(64) → LSTM(32, last) + FC | 64–32 | 0.25 | tanh
GRU | GRU(128, last) + FC | 128 | – | tanh
BiLSTM | GRU(128) → BiLSTM(64, 32, last) + FC | 128–64–32 | 0.25 | tanh
Hybrid CNN–GRU–LSTM | 5 Conv (3 × 3, 32, dilation 1–16) + GRU(128) → LSTM(64, 32) + FC | 128–64–32 | 0.25 | ELU/tanh
RNN with Self-Attention | Self-attention (8 heads, 64 units) + GRU(128, last) + FC | 8 × 64/128 | – | tanh
Note: "–" indicates that no dropout layer was applied in the corresponding architecture.
Table 2. Comparative results of all models across the three forecasting strategies.
Architecture | Prediction Type | RMSE | MAE | sMAPE | R2 | ρ (Pearson)
Feedforward (FF) | Classical (test) | 0.221150 | 0.135831 | 0.008772 | 0.791698 | 0.894739
Feedforward (FF) | Autoregressive one-step | 0.570047 | 0.435029 | 0.027837 | −0.588132 | −0.229568
Feedforward (FF) | Multi-step | 0.718889 | 0.569069 | 0.035749 | −1.598135 | −0.229148
RNN | Classical (test) | 0.296892 | 0.228567 | 0.014602 | 0.624581 | 0.792680
RNN | Autoregressive one-step | 0.596894 | 0.454364 | 0.019247 | 0.246521 | 0.584289
RNN | Multi-step | 0.616597 | 0.481328 | 0.025322 | −0.295931 | 0.226603
GRU | Classical (test) | 0.324579 | 0.237072 | 0.015049 | 0.551295 | 0.746066
GRU | Autoregressive one-step | 0.540955 | 0.407243 | 0.027052 | −0.327907 | −0.103972
GRU | Multi-step | 0.646890 | 0.510263 | 0.031932 | −0.689871 | 0.056487
LSTM | Classical (test) | 0.257120 | 0.187040 | 0.011948 | 0.718425 | 0.870073
LSTM | Autoregressive one-step | 0.580856 | 0.447903 | 0.022128 | 0.025691 | 0.364876
LSTM | Multi-step | 0.646999 | 0.504210 | 0.026425 | −0.286014 | 0.280822
BiLSTM | Classical (test) | 0.280502 | 0.213188 | 0.013528 | 0.664886 | 0.831858
BiLSTM | Autoregressive one-step | 0.564577 | 0.428457 | 0.023340 | 0.060018 | 0.353387
BiLSTM | Multi-step | 0.614585 | 0.479164 | 0.021498 | 0.007429 | 0.400658
CNN | Classical (test) | 0.204441 | 0.144681 | 0.009370 | 0.821984 | 0.913255
CNN | Autoregressive one-step | 0.581345 | 0.581345 | 0.018618 | 0.329727 | 0.589284
CNN | Multi-step | 0.656871 | 0.513368 | 0.022155 | 0.042739 | 0.483223
RNN with Self-Attention | Classical (test) | 0.283116 | 0.2170 | 0.013788 | 0.658610 | 0.834439
RNN with Self-Attention | Autoregressive one-step | 0.616406 | 0.472595 | 0.018982 | 0.316636 | 0.605077
RNN with Self-Attention | Multi-step | 0.657694 | 0.512514 | 0.021367 | 0.098661 | 0.515469
Hybrid CNN–GRU–LSTM | Classical (test) | 0.134340 | 0.097340 | 0.006251 | 0.923134 | 0.960899
Hybrid CNN–GRU–LSTM | Autoregressive one-step | 0.611348 | 0.458375 | 0.015627 | 0.437422 | 0.683305
Hybrid CNN–GRU–LSTM | Multi-step | 0.620342 | 0.471706 | 0.017140 | 0.430041 | 0.668777