1. Introduction
The widespread use of variable speed drives (VSDs), rectifiers, and inverters creates non-linear loads in a power network, resulting in non-sinusoidal current that is non-linearly related to the supply voltage. The non-sinusoidal current is a periodic function, and it can be decomposed into its fundamental sine wave plus sine waves at harmonic frequencies. Harmonic frequencies are integer multiples of the fundamental frequency. Thus, a non-sinusoidal current may have both odd and even harmonics. Harmonics have a negative impact on the reliability, stability, and safety of a power network. They may interrupt the power supply, cause grounding protection to operate abnormally, shorten equipment life, and overheat conductors as well as equipment [1]. This, in turn, increases equipment downtime, ultimately diminishing overall system productivity. Identifying abnormal behavior in equipment is gaining recognition as a pivotal element in proactively anticipating necessary maintenance actions [1]. Both current and voltage harmonics are non-linear, non-stationary, dynamic, complex, noisy, and unstable, and they are treated as time series. These characteristics make harmonics difficult to predict.
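The decomposition of a distorted current into a fundamental plus integer-multiple harmonics can be illustrated with a short numerical sketch. The 50 Hz fundamental, the sampling rate, and the 5th and 7th harmonic amplitudes below are illustrative assumptions, not measurements from the plant; the total harmonic distortion (THD) ratio is included only as a common summary figure.

```python
import numpy as np

# Assumed illustration values, not taken from the measured dataset
F0 = 50.0        # fundamental frequency in Hz
FS = 10_000.0    # sampling rate in Hz
N = 2000         # 0.2 s of data = exactly 10 fundamental cycles

t = np.arange(N) / FS
amplitudes = {1: 10.0, 5: 2.0, 7: 1.0}   # harmonic order -> peak amplitude (A)
current = sum(a * np.sin(2 * np.pi * h * F0 * t) for h, a in amplitudes.items())

# Single-sided amplitude spectrum of the distorted current
spectrum = np.abs(np.fft.rfft(current)) * 2.0 / N
freqs = np.fft.rfftfreq(N, d=1.0 / FS)

def harmonic_amplitude(order: int) -> float:
    """Return the spectral amplitude at the bin closest to order * F0."""
    idx = int(np.argmin(np.abs(freqs - order * F0)))
    return float(spectrum[idx])

fundamental = harmonic_amplitude(1)
harmonics = [harmonic_amplitude(h) for h in range(2, 20)]

# THD: RMS of the harmonic amplitudes relative to the fundamental
thd = np.sqrt(sum(a ** 2 for a in harmonics)) / fundamental
print(f"fundamental = {fundamental:.2f} A, THD = {100 * thd:.1f} %")
```

Because the window holds a whole number of fundamental cycles, each harmonic falls exactly on one FFT bin and the amplitudes are recovered without leakage.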
Figure 1 shows a typical Nestle plant VSD connection of two electric motors. Most of the three-phase VSDs employ six-pulse-bridge “uncontrollable” diode rectifiers or “half-controllable” silicon-controlled rectifiers (SCRs) that are used to convert AC power to DC power that is used by variable frequency converters (VFCs). These circuits are low in cost, simple, and highly reliable. However, the conversion process produces significant harmonic currents that trigger system resonance at
(
Figure 1), which causes relatively high voltage harmonics across the point of common coupling (PCC). All these phenomena pollute the power supply, leading to poor power quality. LC passive filters (de-tuned filter banks and fine-tuned filters) are used to address harmonics issues. Other measures include active filtering, inductive reactance, and high-pulse rectifiers. Nevertheless, filters are not particularly cost-effective for dealing with these issues. In some instances, differential-mode (DM) and common-mode (CM) noise filters, such as electromagnetic interference (EMI) filters, are added to the filter circuit (Figure 1), but this increases the risk of resonance in a multi-drive system, like the one at the Nestle Chocolate factory in East London, Eastern Cape Province of South Africa.
Machine learning (ML) techniques model complex and non-linear problems better than statistical techniques. The precise detection of harmonics is critical to an electrical power network’s efficiency and reliability. Harmonics degrade the quality of an electrical power supply by causing steady-state waveform distortion [2]. An effective harmonics prediction method is a useful tool in planning an efficient electrical power network. Power network efficiency is characterized by system reliability and is reflected in cost savings. The widespread use of electronic converters and inverters distorts both the voltage and current waveforms, resulting in harmonics. Statistical and machine learning algorithms have been used in forecasting power consumption, predicting machine faults, and modeling many other kinds of time series data, but work is still being performed to find more accurate algorithms for the prediction of harmonics in an electrical power network [3,4,5,6,7].
To the best of our knowledge, CNN-BiLSTM-AM has not been extensively applied to the prediction of electrical load harmonics, which makes this research among the first in this specific field. This work builds on our previous work on the application of ANN, LSTM, and CNN-LSTM in the detection and prediction of harmonics in an electrical power system [8,9,10]. In this paper, six models are used to predict the load harmonics, and each is separately trained and tested using the historical harmonics data. The independent results are analyzed to determine the most accurate model, with the root mean square error (RMSE) as the performance indicator. The paper’s contribution is to (1) propose an innovative method of harmonics detection and prediction based on a hybrid CNN-BiLSTM-AM model; (2) compare the harmonics forecasting capabilities of the CNN-BiLSTM-AM model with those of five other deep learning methods; and (3) demonstrate that the CNN-BiLSTM-AM network performed better than the other five models in detecting and forecasting harmonics. The paper outline is as follows: Section 2 covers related work. Section 3 outlines the methodology used in this paper. The results are discussed in Section 4, and the conclusion is given in Section 5.
2. Review of Harmonics Forecasting
The literature review effectively contextualized the investigation by considering aspects such as scope and relevance, the identification of research gaps, rationale for the investigation, integration with investigation objectives, critical analysis, and synthesis, as well as scope limitations. This comprehensive approach ensured that the investigation’s context was well established, relevant research gaps were clearly identified, and the rationale for addressing these gaps was justified based on the existing literature. Additionally, the review critically analyzed prior research and synthesized diverse perspectives, providing a solid foundation for the current investigation’s contributions.
Fast Fourier transforms (FFTs), the zero-crossing method, the least squares method (LSM), the Prony method, and the Time-Domain Quasi-Synchronous Sampling method have traditionally been used in harmonics measurements [5]. These methods struggle when applied to large datasets. FFT suffers from spectrum leakage and picket fencing resulting from nonsynchronous sampling. These challenges are handled better using LSM based on the interpolation method, resulting in improved accuracy [6]. Most of these methods extract features in the time and frequency domains, and the associated challenges include the following: (1) Harmonics and many other power quality disturbances have very similar features, which leads to poor feature selection. (2) Features captured in the time and frequency domains do not fully describe harmonic pollution, which leads to poor selection accuracy. (3) Feature extraction demands a high-level understanding of harmonics characteristics, which makes the extraction process complex [7].
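The effect of nonsynchronous sampling on the FFT can be seen in a small numerical experiment. The sketch below uses assumed values (a 1 kHz sampling rate and a 200-sample window) and compares a tone that completes a whole number of cycles in the window with one that does not; the helper names are hypothetical.

```python
import numpy as np

FS = 1000.0   # sampling rate in Hz (assumed)
N = 200       # window length in samples, giving 5 Hz bin spacing

def amplitude_spectrum(freq_hz: float) -> np.ndarray:
    """Single-sided amplitude spectrum of a unit sine at freq_hz."""
    t = np.arange(N) / FS
    signal = np.sin(2 * np.pi * freq_hz * t)
    return np.abs(np.fft.rfft(signal)) * 2.0 / N

def energy_outside_peak(spec: np.ndarray) -> float:
    """Fraction of spectral energy that is not in the largest bin."""
    energy = spec ** 2
    return float(1.0 - energy.max() / energy.sum())

synchronous = amplitude_spectrum(50.0)      # 10 whole cycles in the window
nonsynchronous = amplitude_spectrum(52.5)   # 10.5 cycles: falls between two bins

leak_sync = energy_outside_peak(synchronous)
leak_nonsync = energy_outside_peak(nonsynchronous)
peak_nonsync = float(nonsynchronous.max())

print(f"leakage (synchronous):    {leak_sync:.3f}")
print(f"leakage (nonsynchronous): {leak_nonsync:.3f}")
print(f"peak amplitude read for a unit sine (nonsynchronous): {peak_nonsync:.3f}")
```

In the nonsynchronous case, energy spreads into neighbouring bins (leakage) and the largest bin under-reads the true amplitude (the picket fence effect).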
Time series prediction techniques fall into two categories, namely, traditional time series methods and forecasting methods based on machine learning. Traditional time series techniques rely on specific models to describe a time series, and they struggle with real-world time series data because of its non-linearities [8]. The dynamic equation for a time series is either complex or unknown, and as such, noisy and complex features cannot be determined using analytical equations. ML techniques, by contrast, learn such features from data and can run on low-end hardware [9]. ML and DL methods are used to enhance network management, and these methods are bioinspired mathematical models [10]. Support vector machines (SVMs), Random Forest, ARIMA, decision trees, and logistic regression are the most frequently used ML approaches for time series data, and they tend to work better on small-scale data [11].
Deep learning (DL), a branch of ML, is a data-driven methodology capable of handling prediction problems involving big data, whereas signal processing techniques struggle with such problems. DL methods perform better when used to solve power quality disturbance challenges [12]. DL has shown good performance when dealing with high-dimensional data, non-stationary data, and non-linear time series data [10]. The most popular DL methods are recurrent neural networks (RNNs), LSTM, BiLSTM, and CNN. These algorithms have densely connected neurons, resulting in high learning and generalization capabilities [13]. A ‘deep recurrent neural network with long short-term memory (DRNN-LSTM)’ was used to predict solar panel output and the load an hour ahead [14].
A deep neural network (DNN) is an advanced form of artificial neural network (ANN) that is sequence-based and capable of learning and extracting features from time series inputs. A recurrent neural network (RNN) has a distinctive network architecture, as shown in Figure 2. The input and output symbols in the figure represent the input variables and the output variables, respectively. A variable is a symbolic representation that can assume various values, categorized based on characteristics and context into types like independent, dependent, discrete, continuous, and categorical.
The output of a neuron is fed back as an input to the same neuron at the following time step, so that the output of the system at any instant results from the interaction between the input at that instant and the state retained in memory from all earlier time steps. Consequently, the feedback connection essentially lets signals travel both forward and backward through the network. In the time domain, this regression procedure has produced good results in wind speed forecasting and current control [15]. RNN configurations include many-to-one, many-to-many, and one-to-many input-output mappings. RNNs struggle with long-term dependencies because of vanishing gradients. LSTM is a specialized RNN capable of dealing with these long-term dependency challenges and has shown significant performance when dealing with time series data and forecasting [16]. RNN and particularly LSTM have been used in feature sequence extraction, for example, in transient periods, as well as in data classification in power applications, including fault diagnosis of photovoltaic systems, wind turbines with multivariate time series, and transmission lines, and the prediction of fault location distance in a two-bus line test system of 220 km [17]. LSTM requires no prior assumptions about the data, and it can deal with dynamic, non-linear, complex, and noisy data in a higher-dimensional space. A standard LSTM network processes data in the forward direction only, whereas a BiLSTM has at least two LSTM layers arranged so that one layer processes data in the forward direction and the other in the reverse direction. This arrangement improves forecasting precision [18].
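The feedback loop and the bidirectional arrangement described above can be sketched in a few lines of NumPy. This is a deliberately simplified illustration: it uses a plain tanh recurrent cell rather than the gated LSTM cell, and the weights, sizes, and function names are assumptions chosen only for the example.

```python
import numpy as np

def rnn_pass(inputs, w_in, w_rec, bias):
    """Run a simple tanh recurrent cell over a sequence and return all hidden states."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x_t in inputs:
        # The previous hidden state feeds back into the cell at the next time step
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden + bias)
        states.append(hidden)
    return np.stack(states)

def bidirectional_pass(inputs, fwd_params, bwd_params):
    """Concatenate a forward pass with a pass over the time-reversed sequence."""
    forward = rnn_pass(inputs, *fwd_params)
    # Reverse the backward states again so both halves share the same time order
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
seq_len, n_features, n_hidden = 8, 3, 4   # illustrative sizes

def random_params():
    """Random (untrained) input weights, recurrent weights, and bias."""
    return (rng.normal(scale=0.5, size=(n_hidden, n_features)),
            rng.normal(scale=0.5, size=(n_hidden, n_hidden)),
            np.zeros(n_hidden))

sequence = rng.normal(size=(seq_len, n_features))
states = bidirectional_pass(sequence, random_params(), random_params())
print(states.shape)   # one row per time step, forward and backward features side by side
```

Each time step ends up with features that summarize both the past (forward half) and the future (backward half) of the sequence, which is the property the bidirectional arrangement relies on.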
CNN is extensively used in the field of image processing and is increasingly being used to predict time series data [19]. CNN has been applied to various prediction and classification problems in electrical power systems, including harmonics, transients, islanding, and instability. CNN extracts system features automatically and is easier to train than other neural networks that have several hidden layers. CNN can also be used for 1-D inputs, which are typical of power systems problems; in that case it relies mainly on the convolutional and dense layers, rendering the pooling layer less significant. It has been successfully used in a number of power system-related classification and prediction problems [20]. Figure 3 shows an example of a 1-D CNN architecture.
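The core operation of a 1-D convolutional layer can be illustrated with a minimal sketch. The kernel here is hand-picked to respond to rising edges; in a trained CNN the kernel values would be learned, and the function names are assumptions for the example.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide a kernel over a 1-D signal (cross-correlation, no padding)."""
    width = len(kernel)
    return np.array([np.dot(signal[i:i + width], kernel)
                     for i in range(len(signal) - width + 1)])

def relu(x):
    """Rectified linear activation: negative responses are set to zero."""
    return np.maximum(x, 0.0)

signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
edge_kernel = np.array([-1.0, 1.0])   # hand-picked; a CNN would learn this

feature_map = relu(conv1d_valid(signal, edge_kernel))
print(feature_map)   # [0. 1. 0. 0. 0. 0.]
```

The single non-zero entry marks where the signal steps up, showing how a convolutional filter turns a raw sequence into a localized feature.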
CNN is combined with an attention mechanism (AM) to focus on specific features, suppressing unnecessary features and boosting the desired information. This enhances the feature selectivity of the chosen model [21]. CNN-BiLSTM with AM has been successfully used to identify a two-phase flow pattern in a multivariate time series. AM was used to select the highest values of the small vectors, as the vital features of the small vectors could not be detected and selected by the CNN-BiLSTM layers. These selected values were then combined into an n-dimensional vector representing each medium-sized vector [22]. A hybrid of LSTM and AM has been used in residential load forecasting and showed impressive results. AM is well suited to demand-side forecasting methods involving LSTM. LSTM cannot pick up inner correlations among the hidden features that have a significant impact on the forecast results. Thus, AM is deployed to mitigate this weakness by adaptively weighting the hidden features [23].
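The idea of adaptively weighting hidden features can be sketched as a softmax-weighted sum over time steps. This is one common, simplified form of attention and is not claimed to be the exact mechanism used in the cited works; the scoring vector would be learned in practice and is random here.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def attention_pool(hidden_states, score_vector):
    """Weight each time step's hidden state by a softmax score and sum them."""
    scores = hidden_states @ score_vector   # one scalar score per time step
    weights = softmax(scores)               # positive weights that sum to 1
    context = weights @ hidden_states       # weighted sum over time
    return context, weights

rng = np.random.default_rng(1)
hidden_states = rng.normal(size=(6, 4))   # (time steps, hidden features), illustrative
score_vector = rng.normal(size=4)         # stand-in for a learned parameter

context, weights = attention_pool(hidden_states, score_vector)
print(np.round(weights, 3))
```

Time steps with larger scores contribute more to the context vector, which is how the mechanism emphasizes the hidden features most relevant to the forecast.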
4. Results and Discussions
The same dataset is applied to the proposed hybrid model and to five other algorithms, and the results are compared. The six models are CNN, LSTM, BiLSTM, CNN-LSTM, CNN-BiLSTM, and CNN-BiLSTM-AM, and their prediction performances are shown in Table 1. The proposed CNN-BiLSTM-AM hybrid model has the best performance and the minimum prediction error. Plots of RMSE and loss against iteration for the CNN-BiLSTM-AM model are shown in Figure 12. Plotting RMSE or loss against iteration shows how the error evolves throughout training. Initially, both typically decrease rapidly as the model learns from the data and improves its predictions. Eventually, the rate of decrease slows, and the curves may stabilize, indicating convergence. A rising RMSE or loss on a validation dataset while the corresponding training value continues to decrease suggests overfitting, which techniques such as early stopping are designed to prevent. Overall, these plots offer insight into training progress, convergence, and potential overfitting, and so guide the optimization of the model.
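The early stopping rule mentioned above can be expressed as a small function. This is a generic sketch, not the training configuration used in this work; the `patience` and `min_delta` parameters and the validation curve are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never happens,
    the index of the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

# Made-up validation curve: improves, then rises (the overfitting pattern)
val_curve = [1.00, 0.60, 0.40, 0.35, 0.36, 0.38, 0.41, 0.45]
stop_epoch = early_stopping_epoch(val_curve, patience=3)
print(stop_epoch)   # 6
```

The best validation loss occurs at epoch 3, and training halts three non-improving epochs later.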
For the RMSE graph, the y-axis is scaled by $10^{16}$, and the loss graph y-axis is scaled by $10^{33}$. The network training cycle had 430 epochs and completed 43,000 iterations (100 iterations per epoch) with a piecewise learning rate of 1exp(−12); (a) shows 100% training complete, while (b) is 70% complete. CNN-BiLSTM-AM offered superior accuracy compared to traditional methods by leveraging automated feature extraction, capturing temporal dependencies, focusing attention, enabling end-to-end learning, and ensuring adaptability.
Figure 13 shows the CNN-BiLSTM RMSE and loss curves. The network training cycle had 430 epochs and completed 43,000 iterations with a piecewise learning rate of 1exp(−12); (a) shows 100% training complete, while (b) is 70% complete. RMSE and loss start to fall at epoch 3. An “epoch” in machine learning signifies one complete pass through the entire training dataset during neural network training. It involves processing all data batches to adjust model parameters based on computed errors. Multiple epochs are typically needed for effective model training, with validation performed after each epoch to monitor progress and prevent overfitting.
Both RMSE and loss progressively decreased as the iterations increased. A loss function measures how well a model predicts the expected outcome. It quantifies the discrepancy between the predicted and actual values, and the deep learning network learns by means of it. The loss is large when the predictions deviate significantly from the actual results. Optimization algorithms adjust the network parameters so as to minimize the loss, and hence the prediction error. In general, loss functions fall into two major categories, namely, regression losses and classification losses; only the latter relate to the misclassification rate, i.e., the proportion of incorrect predictions. In our work, the loss function is a regression loss because the network predicts continuous values.
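As a concrete instance of a regression loss, RMSE can be computed as follows. The sample values are made up for illustration and do not come from the harmonics dataset.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

actual = [2.0, 4.0, 6.0]
close_prediction = [2.0, 4.0, 7.0]   # errors 0, 0, 1
far_prediction = [5.0, 0.0, 10.0]    # errors 3, 4, 4

print(round(rmse(actual, close_prediction), 3))   # 0.577
print(round(rmse(actual, far_prediction), 3))     # 3.697
```

The metric grows with the size of the deviations, which is why a falling RMSE curve indicates predictions moving closer to the expected values.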
The trained network is tested by forecasting the harmonics multiple time steps into the future. The network predicts one time step at a time and updates its state after each prediction, i.e., the previous prediction is used as an input for the current prediction. A large harmonics dataset and an Intel Core i7 CPU are used. The Intel Core i7 is a high-performance processor, which allows the predictions and the RMSE to be computed faster.
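The step-by-step forecasting procedure, in which each prediction is fed back as the next input, can be sketched generically. The `predict_next` callable stands in for the trained network, and the toy predictor and window size below are assumptions made only so the example runs on its own.

```python
from typing import Callable, List, Sequence

def closed_loop_forecast(
    history: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    window: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps, feeding each prediction back as an input."""
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_next(buffer)
        forecasts.append(next_value)
        buffer = buffer[1:] + [next_value]   # slide the input window forward
    return forecasts

def toy_predictor(window_values: Sequence[float]) -> float:
    """Stand-in for a trained network: linear extrapolation from the last two values."""
    return 2 * window_values[-1] - window_values[-2]

forecast = closed_loop_forecast([1.0, 2.0, 3.0, 4.0], toy_predictor, window=3, horizon=3)
print(forecast)   # [5.0, 6.0, 7.0]
```

Because later predictions depend on earlier ones, errors can accumulate over the horizon, which is worth keeping in mind when reading multi-step results.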
The results are in line with expectations. CNN has a fast training time but makes better predictions on images than on time series data. LSTM’s prediction performance on time series data is better than CNN’s. BiLSTM’s prediction performance is better than LSTM’s since it extracts information from both the forward and reverse directions. CNN-BiLSTM’s prediction is superior to that of CNN-LSTM. These hybrid models combine the advantages of both constituent models, resulting in faster training times and better performance. Adding the attention mechanism to the CNN-BiLSTM model, which is the focus of this paper, further improves the prediction accuracy, as shown in Figure 14. In (a), the prediction in blue closely follows the expected values in red; (b) shows the harmonics data; and (c) shows that the prediction in red closely follows the expected values. The proposed hybrid model thus predicts the harmonics with high accuracy.
The superiority of the proposed model is further shown in
Table 1, where it is compared to the five other models. The table compares model prediction accuracy and RMSE.
The combination of CNNs, BiLSTMs, and attention mechanisms (AMs) can improve accuracy for the prediction of load harmonics compared to traditional methods in several ways:
- (i)
CNN (feature extraction): Traditional methods often rely on handcrafted features or simplistic transformations of the input data. In contrast, CNNs can automatically learn hierarchical representations of the data, capturing both local and global features that are relevant to load harmonics. This allows the model to adapt more effectively to the complexity of the data and extract features that may not be apparent through manual analysis.
- (ii)
BiLSTM (capturing temporal dependencies): Load harmonics data are inherently sequential, with complex temporal dependencies between data points. BiLSTMs are designed to capture such dependencies by processing the data bidirectionally, enabling the model to learn patterns across different time scales. This ability to capture long-range dependencies can lead to more accurate predictions compared to traditional methods that may struggle with capturing temporal dynamics effectively.
- (iii)
Attention Mechanism: The attention mechanism further enhances the model’s ability to focus on relevant parts of the input sequence. In the context of load harmonics prediction, certain time periods or frequency components may be more critical for accurate forecasting. By dynamically adjusting the importance of different parts of the input sequence, the attention mechanism allows the model to prioritize information that is most relevant to the prediction task, leading to improved accuracy.
- (iv)
End-to-End Learning: The CNN-BiLSTM-AM architecture facilitates end-to-end learning, where the model learns directly from the raw input data to make predictions. Traditional methods often involve multiple stages of preprocessing and feature engineering, which can introduce manual errors and may not fully capture the complexity of the data. By learning directly from the raw data, the CNN-BiLSTM-AM model can leverage the full information content of the input sequence, potentially leading to more accurate predictions.
- (v)
Adaptability and Generalization: CNN-BiLSTM-AM models are highly adaptable and can generalize well to unseen data. Traditional methods may rely on assumptions or simplifications that limit their applicability to diverse datasets or changing conditions. The flexibility of deep learning models allows them to adapt to different data distributions and environmental factors, leading to more robust and accurate predictions across a wide range of scenarios.
Overall, the CNN-BiLSTM-AM architecture offers significant advantages over traditional methods for load harmonics prediction, including improved feature representation, better capture of temporal dependencies, enhanced attention mechanisms, end-to-end learning, and increased adaptability and generalization capabilities. These advantages contribute to higher accuracy and more reliable predictions in practical applications.
The hybrid CNN-BiLSTM-AM model exhibits the best performance with minimal prediction error, as evidenced by the plots of RMSE and loss against iteration presented in Figure 12. In the RMSE graph, the y-axis is scaled by $10^{16}$, while the loss graph y-axis is scaled by $10^{33}$. The network underwent 430 epochs, completing 43,000 iterations with a piecewise learning rate of 1exp(−12). Subplot (a) indicates 100% training completion, while subplot (b) represents 70% completion.
Figure 13 showcases the RMSE and loss curves for CNN-BiLSTM. Similar to the previous case, the network underwent 430 epochs, completing 43,000 iterations with a piecewise learning rate of 1exp(−12). Subplot (a) denotes 100% training completion, and subplot (b) signifies 70% completion. Notably, both the RMSE and loss curves begin to decline at epoch 3, indicating the model’s improving predictive capabilities.
The reduction in both RMSE and loss continues as the iterations progress. Because the task is regression, the loss function measures the deviation between the predicted and actual values rather than a misclassification rate. Optimization algorithms adjust the network parameters to minimize this loss during the learning process.
Comparing individual model performances, CNN trains rapidly and is best suited to image-related tasks, while LSTM outperforms CNN in time series prediction. BiLSTM, by considering information in both the forward and reverse directions, surpasses LSTM. The hybrid CNN-BiLSTM model outperforms CNN-LSTM, offering faster training and enhanced performance. A model such as CNN-BiLSTM-AM can be validated using methods like holdout validation, cross-validation, time series validation, and stratified sampling. Given the time series nature and complexity of the harmonics data, time series validation was selected. This approach involves training the model on past data and validating it on future data using a rolling or expanding window. Other available validation methods include nested cross-validation, bootstrapping, and leave-one-out cross-validation (LOOCV).
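The expanding-window form of time series validation can be sketched as an index generator. The sample count, initial training size, and test block size are hypothetical; a rolling-window variant would differ only in dropping the oldest training indices as the window advances.

```python
from typing import Iterator, List, Tuple

def expanding_window_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with training always preceding testing."""
    train_end = initial_train
    while train_end + test_size <= n_samples:
        train_idx = list(range(0, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        train_end += test_size   # absorb the last test block into the training set

splits = list(expanding_window_splits(n_samples=10, initial_train=4, test_size=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx}")
```

Every test block lies strictly after its training data, so the model is never evaluated on observations that precede what it was trained on.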
The introduction of the attention mechanism in the CNN-BiLSTM model, highlighted in this study, further enhances prediction accuracy, as depicted in
Figure 14. Subplot (a) illustrates the close alignment between the blue prediction curve and the red expected curve. Subplot (b) displays harmonics data, and subplot (c) showcases the red prediction closely matching the expected data, emphasizing the superior accuracy of the model in predicting harmonics.
Table 1 reinforces the excellence of the proposed hybrid model by comparing it to five other models. The table provides a comprehensive overview of model prediction accuracy and RMSE, further supporting the superior performance of the proposed hybrid model in predicting harmonics.
The CNN-BiLSTM-AM model introduces novel contributions to electrical power system harmonics analysis by integrating CNNs for feature extraction, BiLSTMs for capturing temporal dependencies, and attention mechanisms for focusing on relevant harmonic components. This integrated framework enhances the accuracy and interpretability of harmonic analysis, potentially improving detection and mitigation strategies for harmonic distortions in power systems.
In electrical engineering, the CNN-BiLSTM-AM architecture has various practical applications:
- (i)
Power Quality Analysis:
Voltage Sag/Swell Detection: Detect and classify voltage disturbances with CNNs, BiLSTMs, and attention mechanisms.
Harmonic Analysis: Assess power quality and harmonic distortion using CNNs, BiLSTMs, and attention mechanisms.
- (ii)
Power Grid Monitoring and Control:
Fault Detection: Detect anomalies in power transmission systems using CNNs, BiLSTMs, and attention mechanisms.
Load Forecasting: Predict future electricity demand by analyzing historical load data with CNNs, BiLSTMs, and attention mechanisms.
- (iii)
Smart Grid Optimization:
Energy Management: Optimize energy distribution and consumption in smart grids with CNNs, BiLSTMs, and attention mechanisms.
Renewable Energy Integration: Integrate renewable energy sources efficiently using CNNs, BiLSTMs, and attention mechanisms.
- (iv)
Electrical Equipment Maintenance:
Predictive Maintenance: Predict and prevent equipment failures with CNNs, BiLSTMs, and attention mechanisms.
Condition Monitoring: Continuously monitor equipment health and performance using CNNs, BiLSTMs, and attention mechanisms.
These applications highlight how the CNN-BiLSTM-AM architecture can be applied in electrical engineering for various tasks such as monitoring, control, optimization, and maintenance.
The CNN-BiLSTM-AM model, while powerful for sequence modeling, faces notable limitations. Its computational complexity demands substantial resources for training and inference, restricting its deployment in resource-constrained environments. Moreover, large labeled datasets are necessary for optimal performance, posing challenges for applications with limited data availability. Hyperparameter tuning is time-consuming and expertise-intensive, crucial for achieving model effectiveness, while the model’s complex architecture hinders interpretability, impacting its suitability for transparent decision-making contexts. Overfitting is a risk, especially with small datasets or overly complex architectures, and training time can be lengthy, necessitating acceleration techniques for efficiency. Domain specificity may require retraining or fine-tuning, and imbalanced datasets can undermine performance, necessitating additional mitigation strategies. Understanding and addressing these limitations are essential for effective utilization of the CNN-BiLSTM-AM model across diverse applications.
Future work on the CNN-BiLSTM-AM hybrid model could involve hyperparameter tuning, adjusting the model architecture, exploring ensemble methods, and optimizing for real-time applications. Continuous evaluation and refinement of the model along these directions can enhance its adaptability and effectiveness. One potential avenue for future work is the development of the demand-side subsystem within the electrical power system. This would entail integrating the trained model and evaluating its impact. A comparative analysis would then be conducted between the demand-side subsystem with and without the model to assess its effectiveness.