1. Introduction
The fundamental premise of the present research was to determine which measurable machine parameters and input variables enable the simplest, most accurate, and most cost-effective detection of industrial equipment condition. The central research question investigates whether electrical measurement data already available in industrial environments—particularly the spectral components of current harmonics—contain sufficient information to support early fault prediction, reliable detection of condition changes, and robust identification of impending equipment failures. The study further examines whether the utilisation of electrical harmonic data can provide a viable solution for comprehensive condition monitoring, enabling timely and evidence-based signalling of both current and potential failures to maintenance personnel and enterprise management systems. The objective is to establish a predictive anomaly detection framework that leverages existing harmonic measurements without requiring additional sensor infrastructure, thereby ensuring industrial feasibility, economic efficiency, and operational reliability. Electrical harmonics have received increasing attention due to their adverse effects on power quality, particularly in industrial environments where modern nonlinear loads are connected to internal distribution networks. In such systems, harmonic injections originate directly from the operation of industrial equipment, and the measurement data used in this study derive from these equipment-generated harmonic currents [
1]. Beyond their impact on network quality, these harmonic spectra contain information about load behaviour and dynamic operating conditions, thereby offering potential for condition monitoring applications. In the author’s previous work, the suitability of current spectrum data for machine and equipment condition monitoring was investigated, and how these signals can be exploited in practical industrial contexts was examined. Particular emphasis was placed on assessing the feasibility of anomaly detection for fault prediction and forecasting of condition changes. Furthermore, the possibility of classifying failures using clustering techniques—thereby distinguishing individual fault types—was also examined. The detected anomalies were validated through time-synchronised comparison with actual failure events and real equipment condition data. The practical findings indicated that anomaly detection and condition prediction based on harmonic data constitute a promising direction; however, further improvements in the effectiveness and robustness of ML models are required to achieve more accurate and reliable predictions [
2]. This requirement is especially critical in industrial environments, where false alarms and missed detections may entail substantial economic risks and operational consequences. To improve the performance of machine learning models and complete data processing workflows (ML pipelines), the literature proposes multiple complementary approaches operating at different methodological levels. At the data level, a significant research direction addresses the impact of class imbalance and limited minority samples [
3]. Strongly imbalanced datasets are known to degrade classification performance, particularly in rare-event detection tasks. Techniques such as oversampling, undersampling, and the Synthetic Minority Oversampling Technique (SMOTE) aim to mitigate this degradation. The literature not only summarises these approaches but also discusses their practical limitations and implementation challenges in industrial measurement scenarios, where synthetic data generation may affect physical interpretability. At the model and algorithm levels, regularisation and hyperparameter optimisation are major performance-enhancing strategies [
4,
5]. Regularisation aims to reduce overfitting by constraining model complexity or penalising excessively large weights, while hyperparameter optimisation—via Bayesian search, grid search, or random search—seeks model configurations that maximise predictive performance. The combined application of these techniques is widely regarded as essential for developing stable and generalisable models. At the meta level, ensemble learning, transfer learning, and automated machine learning (AutoML) offer additional opportunities to improve performance. The study in [
6] reviews the fundamental principles of ensemble methods—including bagging, boosting, and stacking—demonstrating how the combination of multiple weak learners contributes to improved model accuracy and robustness. Transfer learning, systematically introduced in [
7], enables knowledge transfer across related tasks, including inductive, transductive, and unsupervised variants. AutoML systems, as discussed in [
8], automate hyperparameter optimisation, model selection, pipeline construction, and meta-learning, thereby reducing manual intervention. These methods are often embedded in broader Knowledge Discovery in Databases (KDD) frameworks, which integrate data preparation, modelling, pattern discovery, and interpretation into structured analytical workflows. The theoretical foundation of the present study is supported by the work of Abonyi et al. (2007) [
9], which addresses clustering methodology in data mining and system identification, with a strong emphasis on practical industrial applicability. Despite the richness of these individual research directions, a methodological gap remains between isolated algorithmic improvements and the systematic design of robust anomaly detection systems tailored to electrical harmonic data under non-stationary operating conditions. In practical industrial environments, model stability, interpretability, drift resilience, and alarm reliability are at least as important as numerical optimisation of standard performance metrics. The objective of this paper is therefore twofold. First, it provides a structured overview of the aforementioned data-level, model-level, and meta-level enhancement methods. Second, and more importantly, it demonstrates their practical and combined application using real electrical network measurement data. Rather than presenting a purely theoretical literature survey, the study adopts an integrated experimental perspective and investigates how the joint application of correction functions, improvement algorithms, and selective combination strategies influences the predictive performance of a harmonic-based anomaly detection system.
Section 2 presents how electrical harmonic data and their current spectra can be utilised for machine condition monitoring and anomaly detection, and how the resulting detections are validated through time-synchronised comparisons with actual failures and real equipment condition data.
Section 3 discusses correction functions, including residual learning and stacking methods that construct secondary predictors from model residuals. Debiasing and bootstrap bias-correction techniques are examined, alongside Bayesian calibration approaches presented in [
10], which account for model uncertainty, measurement noise, and systematic deviations. Evolutionary optimisation tools, including genetic algorithms (GA) [
11] and particle swarm optimisation (PSO) [
12], are introduced as mechanisms for performance enhancement in complex search spaces.
Section 4 presents improvement algorithms through practical examples, distinguishing between static rule-based interventions and dynamic search-based or learning-based processes.
Section 5 addresses combination methods, demonstrating—through practical case studies—the extent to which integrating correction functions and improvement algorithms improves predictive reliability.
Section 6 introduces an Organically Adaptive Predictive (OAP) machine learning model capable of adapting to continuously changing load conditions and spectral dynamics. Unlike static models, the OAP framework continuously updates its internal representation based on the current network state, newly arriving samples, and harmonic amplitude and phase drifts. Moreover, it autonomously adjusts hyperparameters and learning dynamics as a function of prediction error and drift magnitude. All machine learning models, optimisation procedures, and validation experiments were implemented and tested in MATLAB R2024a/b and MATLAB R2025a/b [
13].
Section 9 summarises the results.
The main scientific and practical contributions of this paper can be summarised as follows. The study demonstrates that electrical harmonic current spectrum data can be effectively utilised for machine condition monitoring and anomaly detection based on real industrial measurements validated against actual failure events. It provides a structured analysis of data-level, model-level, and meta-level performance enhancement methods and evaluates their combined impact within a unified PCA–LSTM–GMM-based anomaly detection framework. Through practical industrial case studies, the research quantifies how correction functions, optimisation procedures, and selective method combinations reduce false-positive and false-negative alarm rates under non-stationary operating conditions. Furthermore, an Organically Adaptive Predictive (OAP) model is introduced, capable of adapting to time-varying spectral dynamics while autonomously regulating its internal parameters. Beyond empirical performance improvements, the study contributes transferable design principles for industrial harmonic-based anomaly detection systems, demonstrating that decision-level adaptivity, stability-oriented optimisation, and ablation-validated selective method integration provide more robust improvements than increasing predictive model complexity alone. Rather than prescribing a universally optimal hyperparameter configuration, the paper proposes a hierarchical and stability-centred design strategy applicable to industrial time-series anomaly detection tasks characterised by non-stationarity, class imbalance, and operational safety constraints.
2. Investigation of Electrical Harmonics for Anomaly Detection
The primary objective of the present study was to systematically investigate the extent to which electrical harmonic components can be utilised for condition monitoring of industrial machinery and equipment, early fault detection, failure prediction, and clustering of distinct fault types. The central research question addressed whether harmonic distortions arising from nonlinear loads and electromechanical processes contain sufficiently discriminative information to support the reliable identification of incipient anomalies and the differentiation of operating states. The proposed approach is based on real industrial electrical harmonic measurement data and employs machine learning models to analyse these signals in an anomaly detection framework. The methodological design encompasses feature extraction, temporal pattern learning, and both statistical and predictive identification of abnormal operating conditions. Particular attention is devoted to analysing how the temporal dynamics of the harmonic spectrum reflect changes in machine behaviour, including transitional states, gradual degradation, and explicit fault conditions. Importantly, the study does not confine itself to theoretical model evaluation but explicitly targets practical industrial applicability. Accordingly, the analysis considers non-stationary operating environments, measurement noise, and the detection of rare yet safety-critical events. The results indicate that electrical harmonic components constitute a meaningful source of condition-related information and, when combined with appropriately designed machine learning architectures, provide a robust foundation for industrial anomaly detection and predictive maintenance systems.
2.1. Brief Literature Review
The application of anomaly detection techniques for predictive maintenance has become an intensively researched area within industrial data analytics. Recent studies have demonstrated that advanced machine learning and artificial intelligence (AI) approaches can significantly enhance early fault detection capabilities and improve maintenance decision-making processes. Fresina et al. [
14] investigate a range of anomaly detection methodologies within predictive maintenance frameworks, focusing on algorithms capable of forecasting machine or system failures based on heterogeneous sensor data sources. Their work systematically compares statistical, machine learning, and hybrid approaches, emphasising their suitability for different operational contexts and data characteristics. Serradilla et al. [
15] provide a comprehensive and structured review of deep learning techniques applied to industrial predictive maintenance. Their study outlines common model architectures, data preprocessing strategies, and evaluation metrics, while also presenting practical implementation examples. The analysis serves as an important reference point for understanding the strengths and limitations of deep neural architectures in real industrial environments. Stephen et al. [
16] present a detailed examination of the role of artificial intelligence in transforming industrial anomaly detection systems. The authors highlight key technical and operational challenges, including real-time processing constraints, model robustness under non-stationary conditions, explainability requirements, and the integration of AI systems into existing industrial infrastructures. Although the existing literature demonstrates substantial progress in sensor-based anomaly detection and AI-driven maintenance strategies, comparatively less attention has been devoted to the systematic exploitation of electrical harmonic components as a primary information source for condition monitoring and fault characterisation. Electrical harmonics, which arise from nonlinear loads and electromechanical interactions, represent a potentially rich yet underexplored diagnostic domain. In this context, the present study investigates the utilisation of electrical harmonic measurement data for anomaly detection in industrial equipment using machine learning models. Particular emphasis is placed on practical industrial applicability, including robustness under non-stationary operating conditions and deployment-oriented evaluation.
2.2. Extraction of Training and Test Data
To implement anomaly detection based on electrical harmonic data, the measurement data must be downloaded from network analyser instruments in the format and file extension specified by the instrument manufacturer. The datasets contain, for each measurement time instant, the fundamental current values, total harmonic distortion, direct current components, and the current amplitudes of individual harmonic orders referenced to the fundamental component. These basic measurements provide the database required for anomaly detection and data mining analyses.
Table 1 presents an example of the data measured and recorded by the network analyser. The first column of the table contains the measurement index, while the second column specifies the measurement timestamp. The third column reports the average (Avg) total harmonic distortion (THD) of the phase current I1 expressed as a percentage (thdI1). The fourth column contains the average direct current component of the phase current I1 (dcI1), and the fifth column shows the average fundamental harmonic amplitude of I1 at 50 Hz (har01I1). The sixth to ninth columns present the current amplitudes, referenced to the fundamental component, of the harmonic orders har03I1 (150 Hz), har05I1 (250 Hz), har07I1 (350 Hz), and har09I1 (450 Hz), expressed in amperes. These collected measurement data form the basis for both anomaly detection and clustering procedures.
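To make the data-handling step concrete, the following minimal MATLAB sketch illustrates how an analyser export of the kind shown in Table 1 could be imported and how the relevant columns could be selected; the file name and column labels are assumptions rather than the vendor's actual format.

```matlab
% Minimal sketch of importing a network-analyser export (file name and
% column labels are assumptions; the real export format is vendor-specific).
T = readtable('analyser_export.csv');                  % one row per measurement instant (cf. Table 1)
harmonicCols = {'har03I1','har05I1','har07I1','har09I1'};
X   = T{:, harmonicCols};                              % harmonic current amplitudes [A]
thd = T.thdI1;                                         % average THD of phase current I1 [%]
dc  = T.dcI1;                                          % average DC component of I1 [A]
```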
The extraction of training and test datasets required for machine learning is performed after selecting the data that contain the relevant information for the analysis. The number of electrical harmonic components to be included in the applied algorithms is determined in advance, as illustrated in
Table 2. The table header lists the harmonic orders associated with the phase current I2 (har03I2, har05I2, har07I2, har09I2, har11I2, har13I2, har15I2, har17I2, har19I2), while the corresponding cells contain the amplitudes referenced to the fundamental component, that is, the harmonic current magnitudes. These values serve as input features for training the machine learning models. The resulting models enable online monitoring of equipment operational conditions and support the prediction of potential failures based on real-time measurement data. During the process, a machine learning model is trained using historical measurement data, and the trained model subsequently analyses the current system behaviour to forecast future events. Such datasets can be utilised to construct so-called “before–after” type machine learning models, including long short-term memory (LSTM) networks and automated machine learning (AutoML) approaches. These methods are capable of handling virtually any type of input measurement data. As an initial step, a network analysis is performed while the equipment is in a fault-free condition, preferably after a major overhaul or replacement of critical components. These measurements constitute the so-called “before” training data. Once the equipment resumes operation, harmonic injections are monitored online, providing the corresponding “after” datasets. During the measurement process, anomalies in equipment operation are identified in a manner analogous to vibration-based diagnostics. The key difference is that the proposed diagnostic approach can assess the condition of electronic and electrical components and, through them, infer the condition of mechanical components as well.
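As an illustration of the data arrangement described above, the following sketch shows one possible way to organise the nine odd-order harmonics of Table 2 into temporal input windows and next-step targets for a "before-after" sequence model; the window length and variable names are assumptions.

```matlab
% Sketch of window-based training-set construction from the "before" data
% (window length W is an assumption; column names follow Table 2).
cols = {'har03I2','har05I2','har07I2','har09I2','har11I2', ...
        'har13I2','har15I2','har17I2','har19I2'};
H = T{:, cols};                 % N x 9 matrix of harmonic amplitudes
W = 32;                         % temporal window length (assumed)
N = size(H,1);
XTrain = cell(N-W, 1);          % each cell: 9 x W window (features x time)
YTrain = zeros(N-W, 9);         % next-step nine-dimensional spectrum
for k = 1:N-W
    XTrain{k}   = H(k:k+W-1, :)';   % past W samples of all nine harmonics
    YTrain(k,:) = H(k+W, :);        % spectrum at the following time step
end
```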
Table 3 presents clustered data. This data format no longer contains only the harmonic amplitudes by order but also includes the corresponding alarm and fault categories assigned to each measurement instance. In the clustering process, the last two columns of
Table 3—Alarm Type and Fault Type—are used to classify alarms and faults. The 10th column (Alarm Type) specifies the type of alarm, which can be “OK” when the machine is operating normally, “Warning” when a potential issue is indicated and further inspection is required, or “Error” when a fault is detected. The 11th column (Fault Type) provides information on the possible causes of the detected faults. In practice, the number of defined fault categories depends solely on the available time, data, and modelling effort. In the present study, overload (OL) and short circuit (SC) fault types were defined and used for model training. The label “N/A” indicates that no identifiable fault is present, which is a natural outcome when the machine state is classified as “OK,” implying normal operation without detectable faults or unidentified conditions. Naturally, any naming convention or language can be used to define and cluster operating states and fault categories.
The training process requires the use of a large amount of data, and in this context the ML model must be trained separately for each fault type. In practical terms, this means that individual faults occurring in the machines must be identified, and the corresponding electrical harmonic values must be measured and stored. Once the model has been trained on the identified fault conditions, online monitoring of the machine state—based on electrical harmonic measurement data—becomes possible. This data format can be utilised by various machine learning models, including, but not limited to, decision trees, random forest models, kernel-based support vector regression (SVR), and support vector machines (SVM). In addition, this data format can be processed by neural networks employing rectified linear unit (ReLU) activation functions, as well as gradient boosting methods. In summary, the use of electrical harmonics for anomaly detection enables not only anomaly identification but also the detection of specific fault types affecting the equipment. The generated indications may correspond to warning signals, fault alarms, or normal operating conditions. In many cases, it is also desirable to identify the underlying cause of a detected fault, that is, whether it originates from electrical or mechanical components. Given a sufficient amount of measurement data, the machine learning models can be trained to support these diagnostic functions as well. During the overall process, particular emphasis must be placed on the selection of appropriate fault types and the development of machine learning models that mitigate the risks of false alarms and missed detections. Such models help avoid unnecessary workload for maintenance personnel and operators and aim to provide primarily relevant and actionable information for system operators and end users.
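For the labelled data format of Table 3, a classifier of the kind listed above could be trained as in the following sketch; the table and column names are hypothetical, and the decision tree merely stands in for any of the mentioned model families.

```matlab
% Illustrative sketch: training a decision-tree classifier on labelled
% harmonic data in the Table 3 format (variable names are assumptions).
features = T3{:, cols};                          % harmonic amplitudes (columns as above)
labels   = categorical(T3.AlarmType);            % 'OK' | 'Warning' | 'Error'
cv   = cvpartition(labels, 'HoldOut', 0.3);      % stratified train/test split
mdl  = fitctree(features(training(cv),:), labels(training(cv)));
pred = predict(mdl, features(test(cv),:));
confusionmat(labels(test(cv)), pred)             % per-class hit/miss counts
```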
2.3. Development of Machine Learning Models
The development of the ML models, including code implementation and the complete training and testing workflow, was carried out in MATLAB. Based on the methodology presented in this section, several models with different architectures and parameter configurations were implemented, from which the best-performing model was subsequently selected. During the investigation, a window-based multivariate long short-term memory (LSTM) regression model was developed. This model can simultaneously capture the temporal dynamics of the signal and the mutual correlations among its components. The model operates with a nine-dimensional input, where each dimension corresponds to a specific electrical harmonic component. The input features are derived from the odd-order harmonics ranging from the 3rd to the 19th harmonic order (3rd, 5th, 7th, 9th, 11th, 13th, 15th, 17th, and 19th), resulting in a total of nine distinct components. This architecture enables the structured and simultaneous processing of harmonic components that are dominant from the perspective of total harmonic distortion (THD). Based on a temporal window of the input harmonic data, the regression-based LSTM network predicts the nine-dimensional harmonic spectrum at the subsequent time step. The prediction error and degradation are then computed as the difference between the predicted spectrum and the corresponding measured data. For model comparison, the degradation observed during training was used as the primary selection criterion, and the model with the lowest degradation was selected. Lower degradation indicates that the model produces minimal deviation between the measured and predicted harmonic components. Consequently, threshold values derived from this metric enable accurate and reliable early detection of potential anomalies.
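A minimal MATLAB sketch of such a window-based multivariate LSTM regressor is given below; the layer sizes and training options are illustrative assumptions and do not reproduce the exact configuration selected in the study.

```matlab
% Sketch of the window-based multivariate LSTM regressor (assumed sizes).
layers = [ ...
    sequenceInputLayer(9)                       % nine odd-order harmonics
    lstmLayer(64, 'OutputMode', 'last')         % temporal pattern learning
    fullyConnectedLayer(9)                      % next-step 9-D spectrum
    regressionLayer];
opts = trainingOptions('adam', 'MaxEpochs', 100, 'MiniBatchSize', 64, ...
    'Shuffle', 'every-epoch', 'Verbose', false);
net    = trainNetwork(XTrain, YTrain, layers, opts);  % XTrain/YTrain as built above
YPred  = predict(net, XTrain);                        % predicted spectra
errVec = vecnorm(YTrain - YPred, 2, 2);               % per-sample prediction error (l2-norm)
```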
2.4. Evaluation of Model Learning Performance
Following the development of the individual machine learning models, an objective and quantitative evaluation of their performance is essential. The purpose of this evaluation is to determine the extent to which the values predicted by the model deviate from the actual measurement data. The analysis of predictive accuracy is particularly important during the training phase, as it provides a basis for selecting the most effective model architectures for a given dataset. During both training and validation, real electrical harmonic data obtained from in-house measurements were used. Consequently, model performance was evaluated on data representing actual operating conditions. The following figures present a graphical representation of the datasets used for model evaluation.
Figure 1 illustrates the measured electrical harmonic injection of the investigated equipment together with the corresponding model prediction. In the figure, the blue curve represents the amplitude of the measured harmonic component, while the red curve shows the values predicted by the LSTM model. In several segments, the two curves closely follow each other, indicating that the model captures the overall trend of the signal. However, multiple local deviations can also be observed, revealing limitations of the model. The predicted (red) curve systematically underestimates the measured signal in several regions, particularly in the interval between sample index 200 and 800, where the signal frequently exhibits rapid and sharp peaks. Similar behaviour is observed in the range between sample index 1400 and 1700, where high-amplitude transient phenomena occur. In these intervals, the model does not respond quickly enough to abrupt changes, leading to overly smooth predictions that fail to accurately capture the true signal dynamics. This behaviour is characteristic of underfitting. Possible causes include excessively strong regularisation (e.g., dropout or L2 regularisation), insufficient LSTM model capacity (limited number of neurons or layers), suboptimal selection of the temporal window size, normalisation procedures that introduce excessive smoothing, or the use of insufficiently informative or overly homogeneous input features.
Figure 2 illustrates the magnitude and temporal distribution of the prediction errors across the data samples. The vertical axis represents the prediction error magnitude, expressed as the ℓ2-norm computed over the nine-dimensional harmonic spectrum, while the horizontal axis denotes the sample index. The red dashed line indicates the detection threshold, which was determined from the upper quantile of the error distribution and is 9.125. The prediction error time series exhibits predominantly stationary behaviour, with no evidence of long-term drift or systematic increase or decrease. The error signal is mainly characterised by low- to medium-amplitude oscillations. Most error values lie between 3 and 7, indicating that the model generally tracks the signal’s typical operating regimes with acceptable accuracy. Across the entire observation window, the error sequence appears noisy and rapidly varying, suggesting that the model continually struggles to capture the signal’s fine-grained details. This observation is consistent with the findings from
Figure 1, which showed that the model smooths out fast, high-frequency components of the signal. Occasionally, pronounced error spikes are observed, typically ranging from 10 to 14. These peaks predominantly reflect phenomena such as sudden amplitude changes in the measured signal or time intervals in which the temporal memory of the LSTM network is insufficient. In these cases, the model strongly underestimates the true signal peaks. However, such error spikes occur infrequently, indicating that the model adequately tracks the majority of the signal while experiencing significant difficulty in capturing transients and rapid dynamic changes. Analysis of the prediction error time series further reveals that the LSTM network maintains stable accuracy across most of the sample range, while producing several locally occurring outlier errors. Only seven data points exceed the threshold value of 9.125, which was derived from the 0.995 quantile of the error distribution. This confirms that the model performs well under normal operating conditions but fails to adequately capture highly dynamic or abruptly changing signal behaviour. The distribution of prediction errors provides clear evidence of underfitting: the model reproduces slower, smoother trends but systematically underestimates rapid transitions and transient phenomena. These results indicate that improving prediction performance requires either increasing the network capacity or introducing appropriate correction or enhancement mechanisms.
Figure 3 illustrates the detected anomalies and their temporal locations with respect to the sample indices. The
Y-axis represents the output of a binary anomaly detector, where a value of 0 indicates the absence of an anomaly and a value of 1 denotes the presence of an anomaly. The red dashed line corresponds to the upper 0.5% of the error distribution (threshold quantile = 0.995), serving as a robust, statistically based anomaly-detection threshold. This threshold implies that when the prediction error exceeds this level, the corresponding time instance deviates significantly from the normal predictive behaviour. It is important to note that events detected in this manner are often model-based anomalies and do not necessarily correspond to physically observable faults in the signal or the equipment. Out of the approximately 1500 samples in the analysed dataset, only seven instances exceed the defined threshold. As shown in the figure, threshold-crossing events occur infrequently and are clearly separated from the normal operating region, confirming the effectiveness of the selected threshold in isolating anomalous behaviour.
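The quantile-based thresholding and binary detection logic described for Figures 2 and 3 can be expressed compactly as follows; errVec denotes the per-sample prediction error norms computed earlier, and the 0.995 quantile follows the text.

```matlab
% Sketch of the static quantile-based threshold and binary anomaly flags.
tau = quantile(errVec, 0.995);        % threshold from the error distribution
isAnomaly = errVec > tau;             % 1 = anomaly, 0 = normal (Figure 3-style output)
idx = find(isAnomaly);                % sample indices of threshold crossings
fprintf('%d anomalies above tau = %.3f\n', numel(idx), tau);
```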
The relative root mean square error (RMSRE) is calculated using Equation (1). The degradation metric Dd quantifies the distance between the measured values and the values predicted by the previously described model:

$$ D_d = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(\frac{d_k^{(r)} - d_k^{(p)}}{d_k^{(r)}}\right)^{2}} \times 100\% \qquad (1) $$

where Dd denotes the degradation, expressed as the relative root mean square error in percentage form (RMSRE), dk(r) represents the measured test data, dk(p) corresponds to the values predicted by the LSTM model, and N is the number of samples. In the present case, the obtained degradation value of 15.87% indicates a moderately well-fitting LSTM model that nevertheless requires further improvement. The largest prediction errors occur predominantly during transient events that exceed the static threshold and trigger a total of 7 anomaly detections. The observed signs of underfitting suggest that targeted model enhancements could substantially improve prediction accuracy. Although the degradation level is not extreme, it clearly indicates that the model has not yet fully learned the true dynamics of the signal. One possible mitigation strategy is the introduction of an adaptive threshold, as the currently applied fixed threshold may represent an overly simplistic solution. Potential alternatives include a quantile-based dynamic threshold derived from a Gaussian mixture model (GMM), or the integration of rough set theory (RST) with decision rules. These performance-enhancing correction functions and improvement algorithms for machine learning models are discussed in detail in the subsequent sections. In all cases, elevated degradation levels must be carefully analysed and, if necessary, the model architecture or parameters should be modified. If the presence of a large number of true anomalies is confirmed—indicating an actual faulty condition of the investigated equipment—it must be verified whether the prescribed refurbishment or maintenance actions have been carried out. Ensuring that the equipment operates fault-free during the training phase is critical, as LSTM-based anomaly detection relies on learning from a reference condition representing normal operation.
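The following sketch illustrates, first, the degradation metric of Equation (1) and, second, the GMM-based adaptive threshold mentioned as a possible alternative; the variable names and the two-component mixture are assumptions.

```matlab
% Sketch of the degradation metric of Equation (1) and of a GMM-based
% adaptive threshold (dReal/dPred and the component count are assumed).
relErr = (dReal - dPred) ./ dReal;                        % element-wise relative error
Dd = sqrt(mean(relErr(:).^2)) * 100;                      % RMSRE in percent

gm = fitgmdist(errVec, 2, 'RegularizationValue', 1e-6);   % two-component GMM of the errors
tauAdaptive = quantile(random(gm, 1e5), 0.995);           % quantile of the fitted mixture
```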
2.5. Evaluation of Test Model Performance
After the training phase, the performance of the machine learning model was also evaluated using current measurement data collected after the commissioning of the machine or equipment. The previously trained model described in the preceding section was applied to test data obtained from the authors’ own electrical harmonic measurements. The behaviour observed on the test dataset is particularly informative, as it reveals how the model performs on real, previously unseen data whose signal characteristics and statistical properties differ from those used during training.
Figure 4 presents the electrical harmonic components, along with corresponding model predictions based on real measurement data from the same machine during normal operation. This figure is among the most informative visualisations, as it clearly illustrates where and how the model produces prediction errors. The relative deviation measured during testing amounts to 17.76%. Although this value is not excessively high, it clearly indicates that the predictive performance of the model deviates substantially from the measured signal in several segments. The underlying causes partially coincide with those observed during the training phase; however, when combined with the non-stationary behaviour of the test data, the limitations of the model become more pronounced. A comparison of the blue (measured) and red (predicted) curves in
Figure 4 is particularly revealing. The predictions systematically underestimate the measured signal, with the red curve remaining below the blue curve throughout most of the time series. The model exhibits relatively good agreement in the lower dynamic range, accurately capturing average amplitude levels, but fails to track rapid, high-amplitude spikes. Highly variable segments with wide amplitude dispersion are strongly smoothed by the model. This behaviour represents a classic case of underfitting, indicating that the model produces overly smooth outputs and lacks sufficient capacity to represent the true dynamics of the signal. The measured signal frequently reaches amplitudes in the range of 5–10 A, exhibiting sharp, short-lived peaks and rapid temporal variations, while also showing long-term variance fluctuations, confirming a strongly non-stationary system. In contrast, the predictions collapse into a much narrower range between 1 and 4 A, with peaks being significantly attenuated or entirely missing. These observations suggest that the model may not be sufficiently deep, that the LSTM network’s memory capacity is limited, or that the selected input window size is suboptimal. It is also possible that the training data did not contain a sufficient number of peak-rich segments, or that regularisation techniques (e.g., dropout) excessively smooth the network output.
The discrepancy between the predicted and measured signals varies significantly along the time axis. Based on the plot, three characteristic regions can be identified:
From sample index 0 to 1500, the measured signal contains frequent high-amplitude peaks that are largely underestimated by the model, resulting in the largest deviations.
Between sample index 1500 and 2500, the signal is more stable with fewer outliers, and the predictions are comparatively closer to the measured values.
From sample index 2500 to 4000, the signal variance increases again, with numerous large peaks that are once more significantly underestimated, leading to increased deviation.
These results clearly demonstrate the signal’s non-stationarity and indicate that the model fits certain segments better than others. While the measured signal spans approximately 2–10 A and contains numerous short, steep excursions and high-frequency components, the predictions are confined to a smooth, narrow range between 1 and 4 A. This pronounced underfitting increases the average prediction error as reflected in the relative deviation of
Dd = 17.76%. Consequently, further development of the model is required. One promising direction is feature expansion [
17,
18,
19,
20], as several studies emphasise that incorporating new domain-specific and time-series features—such as windowed statistics and frequency-domain features—significantly improves generalisation and the performance of process monitoring models. Additional improvements may be achieved by using time-series derivatives [
21,
22] and moving averages [
23], which are standard elements of time-series feature engineering. Reconstruction-based approaches, such as back-projection of principal component analysis (PCA) components [
24], may further enhance the expressive power of the model. Moreover, balancing the training dataset using SMOTE [
25] or applying generative methods in general [
26,
27], as well as in specific industrial contexts [
28,
29], can also lead to substantial performance improvements. These model enhancement strategies are discussed in detail in the subsequent sections.
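The feature-expansion directions cited above can be prototyped along the following lines; the window lengths and the number of retained principal components are assumptions chosen only for illustration.

```matlab
% Sketch of feature expansion: derivatives, moving statistics and
% PCA back-projection (window lengths and component count assumed).
dH   = [zeros(1, size(H,2)); diff(H)];          % first-order time derivatives
maH  = movmean(H, 16);                          % moving averages (16-sample window)
stdH = movstd(H, 16);                           % windowed standard deviation
[coeff, score] = pca(H);                        % principal component decomposition
Hrec = score(:,1:5) * coeff(:,1:5)' + mean(H);  % back-projection from 5 components
Hext = [H, dH, maH, stdH, Hrec];                % expanded feature matrix
```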
Figure 5 illustrates the magnitude and temporal distribution of the prediction errors along the analysed dataset. The
Y-axis represents the magnitude of the prediction error, expressed as an absolute, dimensionless quantity. The distribution of the prediction error time series is strongly non-stationary, meaning that the error magnitude varies substantially over time. The model performs adequately in certain segments, while the prediction quality deteriorates markedly in others.
Based on the error curve, four clearly distinguishable regions can be identified:
Sample index 0–900: A moderate error level with infrequent outliers. In this segment, most error values lie within the range of 3–8, although several peaks exceed 10, and short, narrow spikes occasionally reach values between 12 and 15. This behaviour indicates that the measured signal intermittently contains rapid variations that the model fails to accurately capture. While the prediction adequately estimates average levels, it underfits fast local peaks. The larger error spikes observed in this region may potentially indicate anomalous behaviour.
Sample index 900–1500: A low and stable error regime. This interval represents the model’s “comfort zone,” where error values typically range between 2 and 5 and remain well below the detection threshold (9.125). No extreme outliers are observed. This suggests that the signal behaviour in this region is highly predictable or closely resembles the statistical characteristics of the training data. The model dynamics are well aligned with the signal in this segment.
Sample index 1500–2300: The most critical segment, characterised by high error levels and consecutive peaks. In this interval, the error frequently rises to 8–12, often exceeding the 9.125 threshold, with several extreme peaks reaching 15–18. This segment closely corresponds to observations from the measured-versus-predicted signal comparisons, where rapid, high-amplitude, and unpredictable peak loads occur. The model clearly underfits in this region, with predicted values collapsing below the measured peaks. The signal variance changes rapidly, and the memory depth or capacity of the LSTM network is insufficient to capture the underlying patterns. This segment accounts for the majority of detected anomalies.
Sample index 2300–4000: A return to a stable, low-error regime. Error values again fall within the range of 2–6, rarely approaching the threshold, and no extreme spikes are observed. This suggests that the signal structure in this region is more stable and predictable, or that the model is better adapted to this specific pattern. In summary, model performance is strongly segment-dependent. In certain regions, the model tracks the signal accurately, whereas in others the prediction error increases significantly, particularly when the signal exhibits high-amplitude and rapidly changing excursions. This behaviour is fully consistent with the non-stationary and highly time-varying nature of real electrical harmonic components. Typical characteristics of the analysed signal include high-amplitude, steep spikes, time-varying variance (alternation between wide and narrow amplitude ranges), and rapid, nonlinear transitions. These structures pose significant challenges for LSTM-based models, especially when model capacity is limited, the selected temporal window fails to capture relevant time dependencies, or the training dataset lacks sufficient examples of peak-rich segments. Because the prediction errors occur primarily in localized regions rather than uniformly across the dataset, the overall degradation value Dd remains moderate (17.76%) rather than excessively high. This indicates that the model exhibits generally stable behaviour, while underfitting occurs in specific critical segments—a phenomenon well explained by the strongly heterogeneous structure of the signal.
Figure 6 illustrates the detected anomalies and their temporal locations with respect to the sample indices. The
Y-axis represents the output of a binary anomaly detector, where a value of 0 indicates the absence of an anomaly and a value of 1 denotes the presence of an anomaly. In total, the model detected 52 anomalies over the analysed dataset. The temporal distribution of the detected events is not uniform but is concentrated in several clearly distinguishable regions. The first detected anomaly occurs at sample index 611, while the last appears at sample index 3675. Based on the figure, three characteristic segments can be identified.
Sample index 0–1000: In this region, only a small number of anomalies are observed, with a few isolated events present. These detections are typically associated with local signal outliers or transient network fitting inaccuracies. The one or two anomalies appearing in the initial segment (sample index 0–200) may be attributed to uncertainty during the LSTM state warm-up phase.
Sample index 1000–2500: This interval represents the main anomaly block, where the majority of detected anomalies occur. In this segment, the model repeatedly signals prediction error (PE) excursions above the detection threshold. This behaviour coincides with sections of the measured signal characterised by high-amplitude, rapidly changing, and strongly nonlinear dynamics, which are visibly undertracked by the model. Both the magnitude of the prediction error and the frequency of threshold crossings are highest in this region.
Sample index 2500–4000: A further, shorter anomaly block appears in this segment. The signal becomes more variable again, and the model classifies several outlier points as anomalies, although the detection density is lower than in the preceding region.
Evaluation of the model performance indicates that the distribution of detected anomalies confirms the segment-dependent behaviour of the model. In stable, low-variance signal regions, the model tracks the dynamics effectively, whereas in high-amplitude, nonlinear, and rapidly varying segments, it tends to underestimate the true values. This behaviour is consistent with the test root mean square (RMS) value of 1.040 and the moderate degradation level of Dd = 17.76%. Thus, the degradation is not excessive, indicating that the model performs well on average. However, in certain critical segments, successive error accumulation occurs, accounting for the majority of detected anomalies.
3. Correction Functions
In this chapter, we investigate whether the performance of machine learning models trained on electrical harmonic data can be improved, and if so, which correction functions and algorithms can be applied to achieve this enhancement. Correction functions typically modify the model output locally using simple deterministic rules or statistical formulations. Common examples include setting negative values to zero, tuning detection thresholds based on receiver operating characteristic (ROC) curves, and filtering outliers using z-score-based methods. Sun Q. et al. present a data-driven error compensation model based on deep neural networks for a nonlinear analogue sun sensor in [
30]. The deep neural network (DNN)-based correction function learns systematic and random error sources within a unified framework and replaces classical surface-fitting calibration, thereby improving the accuracy of incidence angle estimation from 1° to 0.1°. This example clearly demonstrates that correction functions can provide fast, effective, and interpretable interventions even in complex measurement systems. The following overview summarises, without claiming completeness, the main categories of correction functions and approaches reported in the literature. Regression-based correction methods, where model errors are learned by an auxiliary regression model, include linear regression-based error correction [
31], Ridge and Lasso regression-based correction [
32], and kernel regression correction using radial basis function (RBF), polynomial, or sigmoid kernels [
33]. Deep learning-based correction functions include autoencoder (AE)-based error correction using reconstruction errors [
34], encoder–decoder-based correction schemes [
35], and residual correction approaches following the ResNet principle [
36]. Ensemble-based correction approaches involve boosting-type error correction [
37] and stacking-based correction [
38]. Bayesian correction methods include the application of Kalman filters as ML-based correction tools [
39] and Gaussian process regression (GPR)-based correction [
40]. Rule-based machine learning corrections encompass rough set theory (RST)-based rule correction [
41] and rule-based surrogate model correction [
42]. Fuzzy logic-based rule corrections, particularly fuzzy rule-based interpolation (FRI) and rule-based extension (RBE), are especially effective when the rule base is incomplete, sparse, or uncertain. Fuzzy interpolation enables automatic estimation of missing rules, while RBE supports structural expansion of existing rule bases. These approaches play an important role in error correction and surrogate modelling, especially in industrial and energy-related applications characterised by noisy or incomplete input data [
43,
44]. Gradient-based correction functions include loss-function-based correction [
35] and residual learning for sequential models, such as LSTM residual correction [
45]. Model-based error-correction approaches include error modelling networks [
30] and bias-correction neural networks [
46]. Domain adaptation and transfer learning-based correction functions include feature-space correction methods (e.g., CORAL and MMD) [
47] and adversarial correction techniques [
48]. In the previous
Section 2, we demonstrated the applicability of electrical harmonic data for anomaly detection and machine condition clustering. Although the results indicate strong potential, further improvement of the machine learning model performance is required to achieve higher prediction accuracy and robustness. This
Section 3 examines how selected correction functions affect the performance of LSTM-based models. This need is particularly pronounced in condition-clustering tasks, where individual fault types must be explicitly learned, increasing the risk of incorrect predictions. Training and validation must therefore be performed separately for each machine and piece of equipment, as no two machines operate in exactly the same manner. Correction functions and improvement algorithms thus play a central role in ensuring reliable industrial applicability. Research on correction and error mitigation techniques is closely related to the reliability and generalisation of machine learning models [
49]. This chapter and the following one provide a structured overview of the most important categories and approaches. Reliability and fault detection are also emphasized in national research contexts. For example, at the University of Miskolc, Kovács László and colleagues have published industrial- and digital-twin-based modelling approaches addressing reliability in condition monitoring, process modelling, and AI-based decision support, particularly for manufacturing and energy systems [
50]. ML and AI methods have undergone rapid development over the past decade [
51] and have become key tools in industry, science, and everyday applications. Predictive models, neural networks, deep learning architectures, and statistical detectors have demonstrated effectiveness across a wide range of domains, from financial time-series forecasting and medical diagnostics to energy system monitoring [
52]. Nevertheless, practical deployment of these systems consistently reveals a critical issue: model errors, false anomaly detections, and inaccurate predictions [
53]. Such inaccuracies may arise from noisy or incomplete data, class imbalance [
25], faulty sensor signals [
54], excessive model fitting (overfitting) [
55], or changes in the input data distribution over time, commonly referred to as concept drift [
56]. Correction functions have emerged as an effective means of addressing these challenges by adjusting model outputs using deterministic rules or statistical formulations, such as negative-value clipping, ROC-based threshold tuning, or z-score-based outlier filtering [
57]. The need for correction and error mitigation is particularly acute in safety-critical domains, where erroneous decisions may lead to severe consequences. Examples include energy systems, where timely detection of harmonic distortions and anomalies is essential for grid stability; medical diagnostics, where false positives or negatives can have life-threatening implications; financial systems, where incorrect predictions may cause significant economic losses; and industrial equipment monitoring, where excessive false alarms can desensitize operators, while missed alarms may result in hazardous situations or costly production downtime. These examples highlight that correction and error-mitigation processes are not merely convenience features, but fundamental components of reliable machine learning systems.
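For illustration, the simple deterministic correction functions named above (negative-value clipping, z-score-based outlier filtering, and ROC-based threshold tuning) could be realised as in the following sketch; labelsVal denotes assumed validation labels (1 = anomaly) that are required for the ROC step.

```matlab
% Sketch of simple deterministic output corrections (variable names assumed).
yCorr = max(yPred, 0);                                 % negative-value clipping
z = (errVec - mean(errVec)) / std(errVec);             % z-score of prediction errors
errFiltered = errVec(abs(z) <= 3);                     % discard |z| > 3 outliers
[fpr, tpr, thr] = perfcurve(labelsVal, errVec, 1);     % ROC over candidate thresholds
[~, best] = max(tpr - fpr);                            % Youden-style operating point
tauROC = thr(best);                                    % ROC-tuned detection threshold
```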
3.1. PCA—Principal Component Analysis
Principal component analysis (PCA) is inherently a dimensionality reduction and transformation technique and, in a strict sense, cannot be classified as a classical correction function or an improvement algorithm [
58]. However, when applied as a correction-related preprocessing step—particularly in the context of streaming data—PCA performs a deterministic mathematical transformation of the input data [
59]. In such scenarios, the transformation is continuously updated based on the incoming data stream, while the method itself remains a deterministic preprocessing approach. The primary purpose of PCA is to filter noise and redundant features, support data cleaning, and simplify data representations [
60]. It is important to emphasise that PCA is not an adaptive learning method, as it does not incorporate feedback-driven learning mechanisms. Nevertheless, it is a highly effective tool for preprocessing high-dimensional datasets, especially when the number of input variables significantly exceeds the number of available samples. In high-dimensional settings, many variables often contribute only marginally to the overall variance of the data. In such cases, retaining only the principal components that capture the majority of the variance is justified, while low-information components can be discarded, thereby reducing the influence of noise [
61]. When an excessive number of input features is available relative to the sample size, machine learning models are prone to overfitting [
62]. Moreover, high-dimensional data substantially increase the computational complexity and memory requirements of algorithms such as clustering and machine learning methods [
63]. By applying PCA, the dimensionality of the parameter space can be reduced, leading to improved computational efficiency and more stable learning processes [
58]. As a practical example, consider the analysis of 50 different electrical harmonic amplitude components in a power system. It may be observed that only 5–6 components account for more than 95% of the total variance. PCA enables the retention of these dominant components for further analysis, while the remaining variables can be neglected without significant information loss. The results presented in
Table 4 further support this conclusion, as the first five eigenvalues collectively explain 98.053% of the total variance, thereby justifying the use of these components in subsequent processing stages.
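A corresponding MATLAB sketch of variance-based component retention is shown below; the 95% retention target follows the text, while the data matrix X is assumed to hold the harmonic amplitude features.

```matlab
% Sketch of retaining the dominant principal components (95% target per text;
% Table 4 reports 98.053% explained variance for the first five components).
[coeff, score, ~, ~, explained] = pca(X);        % explained: variance share per component [%]
nKeep = find(cumsum(explained) >= 95, 1);        % smallest set covering >= 95% of variance
Xreduced = score(:, 1:nKeep);                    % low-dimensional representation
```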
3.2. RST—Rough Set Theory
Rough Set Theory (RST), introduced by Zdzisław Pawlak in 1982, provides a mathematical framework for handling uncertainty and imprecision in data analysis [
41]. Its fundamental premise is that, in real-world datasets—particularly those originating from industrial measurement systems—classes cannot always be sharply delineated. Overlapping regions and boundary cases inevitably emerge between categories, which may be formally characterised using lower and upper approximations. This property makes RST particularly suitable for modelling transitional or uncertain operating conditions in the context of electrical harmonic data. In the present study, a combination of a weighted LSTM-based predictive model and RST-based rule refinement was employed to enhance machine learning performance and clustering accuracy using real industrial electrical harmonic measurement data. The LSTM component learns temporal patterns embedded within harmonic current spectra, whereas the RST module provides rule-based decision refinement specifically aimed at resolving uncertain and borderline cases. The integration of the two approaches, therefore, combines data-driven temporal learning with structured, logic-based decision correction.
Figure 7 presents the confusion matrix of the combined weighted LSTM + RST model, based on proprietary industrial measurement data. In the matrix, rows correspond to predicted classes and columns to true classes. The system distinguishes three output categories: Error, OK, and Warning. For the Error class, 474 of 507 samples were correctly classified, yielding an accuracy of 88.6%. The misclassification rate is thus 11.4%. Importantly, no Error samples were misclassified as OK. From an industrial safety perspective, this is critical, as it ensures that severe faults are not misinterpreted as normal operation. In the OK class, 427 out of 465 samples were correctly classified, yielding an accuracy of 91.8%. This indicates that the model reliably captures the harmonic characteristics of normal operating conditions. The Warning class—representing transitional or borderline states—achieved 430 correct classifications out of 532 samples, corresponding to a performance of 80.8%. Given that transitional states are inherently more ambiguous and less sharply separable, a performance exceeding 80% indicates robust operation even under uncertain conditions. Analysis of the misclassification patterns shows that uncertainty is predominantly concentrated around the Warning category. Thirty-three Error samples were classified as Warning, thirty-eight OK samples were labelled as Warning, and fifty-two Warning samples were classified as Error. This structure confirms that decision uncertainty primarily arises in transitional regions rather than between extreme states. The complete absence of Error → OK misclassification further strengthens the operational reliability of the model. The observed distribution justifies the application of RST-based rule refinement, which is specifically designed for the structured handling of borderline cases. Beyond the confusion matrix, the F1-score of the LSTM + RST model was analysed. The F1-score, defined as the harmonic mean of precision and recall, is particularly informative for strongly imbalanced industrial datasets. A high F1-score indicates that the model maintains sufficient sensitivity while avoiding excessive false alarms. Performance deterioration occurs when the number of missed anomalies (FN) or false positives (FP) increases disproportionately. In general, values above 0.9 indicate excellent reliability, values between 0.8 and 0.9 indicate good balance, and values between 0.6 and 0.8 indicate moderate performance. The examined LSTM + RST configuration achieved a macro-averaged F1-Score of 0.87. This value was calculated as the arithmetic mean of class-wise F1-scores, each determined as the harmonic mean of precision and recall for the respective class. Precision denotes the proportion of correctly identified positive instances among all predicted positives for a given class, while recall represents the proportion of correctly detected instances relative to all true cases of that class. The macro-averaging procedure ensures balanced evaluation across classes irrespective of distribution differences. The obtained macro F1-score of 0.87 indicates stable and industrially reliable performance, confirming that rough set-based rule correction enhances decision consistency while preserving anomaly sensitivity and maintaining control over false alarm rates.
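For reference, the macro-averaged F1-score reported above can be computed from the confusion matrix as follows; the sketch assumes the Figure 7 convention in which rows correspond to predicted classes and columns to true classes.

```matlab
% Sketch of the macro-averaged F1-score from a confusion matrix whose rows
% are predicted classes and columns are true classes (Figure 7 convention).
C = confusionmat(trueLabels, predLabels)';      % transpose so rows = predicted
tp = diag(C);
precision = tp ./ sum(C, 2);                    % correct / all predicted per class
recall    = tp ./ sum(C, 1)';                   % correct / all true per class
f1 = 2 * precision .* recall ./ (precision + recall);
macroF1 = mean(f1, 'omitnan');                  % unweighted class average
```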
Overall, the integration of RST yields not merely a numerical improvement in accuracy but also enhances structural decision reliability in uncertain regions of the harmonic feature space. The hybrid LSTM–RST framework thus provides a statistically grounded and operationally secure solution for anomaly detection in industrial environments.
Figure 8 illustrates how the integration of RST-based rule refinement affects the class-wise F1-score performance of the model. For the Error class, the F1-score increased by +0.042. This indicates that the system handles faulty cases more effectively, reducing misclassifications towards the Warning or OK categories and thereby enhancing the reliability of fault detection. In industrial applications, this improvement is particularly significant, as accurate identification of critical conditions is a primary operational requirement. In the OK class, an improvement of 0.013 was observed, indicating more precise recognition of normal operating harmonic patterns. In contrast, the Warning class exhibited a decrease of −0.020 in F1-score. This behaviour results from the rule-based refinement mechanism of RST, which frequently reallocates transitional (Warning) states towards either the Error or OK categories. Consequently, the standalone performance metric of the Warning class declines. However, this is not an unintended side effect but a deliberate methodological choice. The purpose of the RST component is to clarify uncertain boundary regions by structurally deciding whether a given state should be interpreted as a fault (Error) or normal operation (OK). Within this framework, the Warning category functions as an intermediate auxiliary class that is systematically reduced by the weighted, rule-based refinement process to reduce decision ambiguity. In industrial contexts, this represents an accepted and justifiable strategy, since the interpretation of Warning states is often less definitive than that of extreme conditions. Overall system stability and operational reliability therefore improve, even if the F1-score of the intermediate class decreases.
Figure 9 presents a class-wise comparison of F1-score values. This figure is particularly informative, as it simultaneously illustrates the performance of the standalone RST rule-based system (RST), the genetically optimised LSTM model (GA-LSTM), and the combined LSTM + RST approach. The results indicate that the GA-optimised LSTM alone exhibits weak performance in the Warning and OK classes. The approximate F1-score values for the GA-LSTM model are as follows: Error 0.538, OK 0.557, and Warning 0.476. These findings suggest that the LSTM tends to overgeneralise and fails to adequately capture minority or transitional operating states. The particularly low F1-score for the Warning class indicates that temporal pattern learning alone is insufficient to reliably distinguish borderline conditions. In contrast, the standalone RST approach achieves considerably higher performance, with approximate F1-scores of 0.910 for Error, 0.887 for OK, and 0.854 for Warning. This demonstrates that RST effectively manages structurally separable patterns and constructs well-defined decision rules. However, RST does not model long-term temporal dynamics, as it fundamentally relies on a static, rule-based decision mechanism. The combined LSTM + RST solution significantly outperforms both individual components across all three classes, achieving approximate F1-scores of Error 0.892, OK 0.881, and Warning 0.830. The hybrid system outperforms the standalone models across all categories, thereby confirming the complementary nature of the two approaches. The LSTM captures temporal dynamics within the harmonic spectra, whereas RST structurally refines and corrects the decision boundaries. Their integration therefore results in a hybrid optimum. Class-wise analysis reveals that the greatest improvement occurs in the Error class, which is particularly important in industrial diagnostic environments. Overall, the joint application of GA-optimised LSTM and RST-based rule correction produces a more robust and higher-accuracy three-class condition recognition system than either method applied independently.
3.3. Bayesian Weighting
Correction functions modify the output of a model rather than its learning process; therefore, Bayesian weighting can be interpreted as a probabilistic correction factor that refines the current estimate. When applied as a correction function, Bayesian weighting corresponds to Bayesian calibration, aiming to adjust the probabilistic output of the model, that is, to perform probabilistic fine-tuning. In the context of the present study, LSTM networks are particularly well-suited to predicting time-dependent patterns and identifying anomalies. However, the predictive output of LSTM models may become noisy or uncertain when the input data are distorted, non-stationary, or exhibit strongly varying dynamics. In such cases, Bayesian weighting can be applied effectively as a correction function to probabilistically refine model outputs without requiring retraining or modifying model parameters. The correction process can be described as follows. The LSTM network predicts the value or the error signal at the next time step. Based on the difference between the observed and predicted values—referred to as the error energy—the likelihood of the observation is computed and expressed as P(D∣H), where D denotes the observed data, and H represents the model hypothesis. Prior knowledge or a statistical model provides the prior probability P(H). Using Bayes’ theorem, these components are combined to compute the posterior probability P(H∣D). The resulting posterior probability acts as a weighting factor that corrects the model prediction before it is compared against a decision threshold. Several advantages of Bayesian weighting can be identified. First, it reduces noise and improves stability by attenuating the effects of short-term fluctuations in LSTM output. Second, it explicitly addresses uncertainty, as the posterior probability provides a direct measure of the model’s confidence in a given decision. Third, adaptive weighting does not require retraining while still responding dynamically to changes in the input data. Finally, Bayesian correction provides a unified probabilistic scale for comparing multiple model outputs or multiple sensor measurements, which is particularly beneficial in complex industrial systems.
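The correction step described above can be summarised in a minimal MATLAB sketch. It assumes a Gaussian-shaped likelihood derived from the error energy, a fixed prior confidence in the model, a flat alternative likelihood, and a hypothetical decision threshold; these are illustrative assumptions, not the study's actual implementation.

```matlab
% Minimal sketch of Bayesian weighting applied as a post hoc correction of an
% LSTM prediction. All numerical values are illustrative assumptions.
yPred = 0.92;                 % LSTM prediction for the next time step (assumed value)
yObs  = 1.10;                 % observed value at the same time step (assumed value)
sigma = 0.15;                 % assumed residual noise level

errEnergy  = (yObs - yPred)^2;                      % error energy
likeH      = exp(-errEnergy / (2*sigma^2));         % P(D|H): likelihood under the model hypothesis
priorH     = 0.95;                                  % P(H): prior belief that the model is correct
likeNotH   = 0.10;                                  % P(D|~H): assumed flat likelihood if the model is wrong
evidence   = likeH*priorH + likeNotH*(1 - priorH);  % P(D)
posteriorH = likeH*priorH / evidence;               % P(H|D) via Bayes' theorem

% The posterior acts as a weighting factor on the raw deviation before the
% threshold comparison; the weighting direction and threshold are assumptions.
correctedScore = (1 - posteriorH) * abs(yObs - yPred);
isAnomaly      = correctedScore > 0.05;
```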
Overall, Bayesian weighting serves as a probabilistic filter in LSTM-based anomaly detection, correcting predictions based on observed data and prior statistical knowledge. Rather than modifying the model’s learning process, it enhances the reliability of the output, resulting in more stable, less noise-sensitive anomaly assessment. The practical application of Bayesian weighting for improving the performance of an LSTM model trained on real electrical harmonic data for anomaly detection is presented in detail in
Section 4.5.
3.4. SMOTE—Synthetic Minority Oversampling Technique
SMOTE (Synthetic Minority Oversampling Technique) is a data-balancing method specifically designed to address class imbalance in machine learning. Class imbalance occurs when the number of samples belonging to different classes in a dataset differs significantly. For example, an anomaly detection system may contain 9500 samples representing normal operating conditions and 500 samples representing faulty states, corresponding to a 95–5% distribution. This imbalance poses a substantial challenge, as most machine learning models tend to bias their predictions towards the majority class. In the above example, a classifier that always predicts the “normal” class would achieve an apparent accuracy of 95%, despite being entirely incapable of detecting faults. Consequently, the decision boundary becomes skewed, and the model fails to adequately represent the minority class, which typically corresponds to rare but critical events. A similar situation may arise in electrical harmonic measurements, where the number of normal operational samples far exceeds that of anomalous or fault-related samples. As a result, models may naturally favour the normal class during training. Unlike simple resampling strategies that duplicate minority-class samples, SMOTE generates synthetic samples by interpolating between neighbouring minority instances. By creating new data points within the local feature space of the minority class, the algorithm improves class balance and supports the formation of a more representative decision boundary. The study reported in [
25] demonstrates that combining oversampling of the minority (abnormal) class with undersampling of the majority (normal) class yields superior classification performance in ROC space compared with majority undersampling alone. In the current phase of this research, however, sufficient real measurement data are available; therefore, generating synthetic data is not considered necessary. The application of SMOTE may become relevant in future research stages, particularly in scenarios where only limited fault samples are available and where under- or over-alarming could significantly affect the operational reliability and economic performance of industrial equipment.
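Although SMOTE is not applied in the current phase, its interpolation mechanism can be summarised in a short sketch. The following MATLAB code generates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours; the data, neighbourhood size, and sample counts are hypothetical.

```matlab
% Minimal SMOTE-style sketch (illustrative): each synthetic sample is an
% interpolation between a minority-class sample and one of its k nearest
% minority-class neighbours. Requires the Statistics and ML Toolbox.
rng(1);
Xmin = randn(50, 9);          % placeholder minority-class feature vectors (e.g., 9 harmonic orders)
k    = 5;                      % number of nearest neighbours considered
nSyn = 100;                    % number of synthetic samples to generate

idxNN = knnsearch(Xmin, Xmin, 'K', k + 1);   % nearest minority neighbours
idxNN = idxNN(:, 2:end);                      % drop self-matches

Xsyn = zeros(nSyn, size(Xmin, 2));
for i = 1:nSyn
    p   = randi(size(Xmin, 1));               % pick a random minority sample
    q   = idxNN(p, randi(k));                 % pick one of its minority neighbours
    lam = rand;                                % interpolation factor in [0, 1]
    Xsyn(i, :) = Xmin(p, :) + lam .* (Xmin(q, :) - Xmin(p, :));
end
```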
3.5. GMM—Gaussian Mixture Model
The Gaussian mixture model (GMM) is a probabilistic, generative model that assumes the observed data are generated from a mixture of multiple Gaussian distributions [
61]. GMM is a classical soft-clustering approach that assigns each data point a probabilistic cluster membership. The underlying assumption of the model is that each data point originates from a latent Gaussian component. These components are hidden, meaning that it is not directly observable which component generated a given data point. The objective of the GMM is to identify the parameters of these latent distributions. Rather than defining rigid clusters (e.g., “three clusters”), the GMM assumes that the data arise from a mixture of a given number of Gaussian distributions, with each data point belonging to one distribution with a given probability. Parameter estimation is performed using the expectation–maximization (EM) algorithm [
64]. For each Gaussian component, the model estimates the mean vector (μ), covariance matrix (Σ), and mixing coefficient (π). The EM algorithm consists of two iterative steps. In the E-step, the posterior probabilities that each data point belongs to each Gaussian component are computed (soft assignment). In the M-step, these probabilities are used to update the parameters of the Gaussian components. The E and M steps are repeated until convergence is achieved. One of the key advantages of GMM over hard-clustering algorithms, such as k-means, is its ability to model uncertainty in cluster membership. While k-means assigns each data point to exactly one cluster, GMM provides a probability distribution over clusters for each point. For example, a given sample may belong to the second cluster with a probability of 70%, the first cluster with 25%, and the third cluster with 5%. As a result, GMM offers a more flexible and general framework, capable of modelling non-spherical clusters and explicitly handling uncertainty. In the present study, the GMM is not employed as a standalone clustering technique, but rather as a correction function applied to electrical harmonic data. Its purpose is not to define new clusters, but to probabilistically refine existing predictions and state classifications. The GMM was trained and saved using proprietary measurement data in a MATLAB R2025b environment. The results presented in
Table 5 were obtained using custom MATLAB R2025b code. For data evaluation, the previously trained and saved optimised GMM and its parameters were applied. The GMM-optimised results in
Table 5 were compared with the Bayesian-optimised predicted datasets presented in
Table 6. In both cases, identical electrical harmonic test datasets were used. Based on this comparison, several important observations can be made. Following GMM optimisation, samples classified as Warning were significantly redistributed. The model produced different warning indications for the Warning and OK state clusters when applied to the same electrical harmonic data. Time instances that were previously labelled as Warning no longer appeared in this category, while the positions of samples classified as OK also changed. Overall, the number of Warning alerts increased, indicating that the GMM-based correction reacts more sensitively to uncertain or borderline operating conditions.
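For orientation, the soft-clustering workflow described above can be expressed as a short MATLAB sketch using fitgmdist and posterior (Statistics and Machine Learning Toolbox). The data matrix and the choice of three components are illustrative assumptions, not the trained model used in the study.

```matlab
% Minimal sketch of GMM-based soft clustering with the EM algorithm.
rng(1);
X = randn(300, 9);                              % stand-in for harmonic feature vectors

gm = fitgmdist(X, 3, ...                         % three Gaussian components
    'CovarianceType', 'full', ...
    'RegularizationValue', 1e-5, ...             % stabilises the EM iterations
    'Options', statset('MaxIter', 500));

P      = posterior(gm, X);    % soft assignment: per-sample membership probabilities
[~, c] = max(P, [], 2);       % hard label derived only for reporting purposes
```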
The discrepancy observed between the two tables should not be interpreted as an error, but rather as a direct consequence of the fundamentally different nature and learning paradigms of the applied models. The Gaussian Mixture Model (GMM) employs an unsupervised optimisation approach that identifies clusters solely based on the internal statistical structure and distribution of the data. In contrast, the Bayesian-optimised LSTM operates within a supervised learning framework and constructs its decision surface according to human-defined class labels. These two approaches are based on different theoretical principles and therefore learn different representations and decision boundaries. The GMM does not utilise categorical labels (OK/Warning/Error); instead, it derives Gaussian components and their boundaries from the geometric and statistical structure of the data distribution. As a result, the GMM-identified clusters that exhibit “Warning-like” behaviour cannot be mapped one-to-one to the manually labelled Warning category. Rather, they consist of data points probabilistically associated with a given Gaussian component. This behaviour inherently differs from the output of supervised learning models. In contrast, the Bayesian-optimised LSTM learns a decision logic whose objective function explicitly aims to optimally fit the labelled data. The model minimises the error defined by Output = Y (OK/Warning/Error) and learns where the Warning class is located in label space, rather than in terms of data distribution geometry. Consequently, when applied to the same dataset, the GMM and the LSTM may assign the Warning label to different samples. This behaviour is clearly illustrated by samples 441–467. According to the GMM, samples 459–466 belong to the Warning-like component, whereas the Bayesian-optimised LSTM identifies samples 441, 445, 447, and 451 as Warning. This discrepancy does not indicate a modelling error but rather reflects the fact that the GMM follows the statistical structure of the data, while the LSTM learns patterns associated with the provided labels. In summary, GMM clustering follows the statistical structure of the data distribution, whereas the Bayesian-optimised LSTM follows the labelled class structure. Since the data distribution and the labelling scheme do not perfectly coincide, Warning and OK states appear at different positions in the outputs of the two optimisation approaches. This difference is not a flaw, but a natural consequence of the differing model assumptions. The final decision on which optimisation strategy to apply must be based on validation against true machine condition data. Subsequently, a confusion matrix analysis is performed. Both confusion matrices were generated using the same input dataset. Due to space limitations,
Table 6 presents only an excerpt of the complete dataset comprising 565 records, including electrical harmonic measurement data and the corresponding clustering results. The index numbers of the analysed data points are consistent with the indices of their assigned cluster labels in all cases.
Figure 10 first illustrates the extent to which the unsupervised GMM—fitted exclusively to the data distribution—agrees with the true Alarm Type labels. The results indicate that the Error class cannot be clearly separated statistically, while substantial overlap exists between the OK and Warning classes. Although the Warning category exhibits partial separation, the distinction is far from complete. Quantitatively, 100% of true Error samples are assigned by the GMM to either the OK or Warning clusters. Furthermore, samples labelled as OK in the original Alarm Type do not form a compact, well-separated Gaussian cluster. Instead, the GMM assigns portions of these samples to the OK, Warning, and Error clusters. This behaviour is common in real industrial systems, where the OK state encompasses multiple operating regimes with different variances and amplitude ranges. When OK samples appear in multiple statistically distinct subgroups, the GMM does not identify them as a single homogeneous cluster but decomposes them into several separate components. Consequently, the clusters produced by the GMM cannot be directly mapped to the manually defined Alarm Type categories. According to the confusion matrix, the GMM Error output cluster contains only OK (17.7%) and Warning (11.7%) labelled samples, with no true Error samples, resulting in a precision of 0.0%. The GMM OK cluster consists of 7.6% Error, 15.9% OK, and 19.3% Warning samples, corresponding to a precision of 37.1%. The GMM Warning cluster contains 1.4% Error, 14.0% OK, and 12.2% Warning samples, yielding a precision of 44.3%. Recall values further indicate that the GMM fails to correctly identify a substantial portion of the true classes. The recall for the Error class is 0.0%, while the recall for the OK and Warning classes is 33.4% and 28.3%, respectively. The overall accuracy of 28.2% clearly highlights the limitations of the unsupervised approach in reproducing manually labelled classes. In conclusion, the OK category does not form a clean, homogeneous group from a distributional perspective; therefore, the GMM assigns these samples to multiple distinct clusters. As a result, the Alert Type clusters generated by the GMM cannot be unambiguously matched to the Alarm Type labels used during training. This discrepancy arises because the Alarm Type categories are not Gaussian and exhibit significant statistical overlap. Under such conditions, an unsupervised GMM cannot establish sharply defined class boundaries.
Figure 11 illustrates the degree to which the output of the LSTM—trained in a supervised manner using labelled data—corresponds to the patterns identified by the Gaussian Mixture Model (GMM). The objective of this analysis is to investigate whether the LSTM can relearn the structure uncovered by the GMM based on the data’s statistical distribution. In other words, we examine whether the LSTM follows the clusters constructed by the GMM from the same dataset. This comparison is relevant because a high similarity between the LSTM output and the GMM clusters would indicate that both models perceive the same underlying data structure. In practice, however, such similarity is not expected, and discrepancies are entirely natural. In this confusion matrix (CM), the GMM clusters constitute the Target, while the LSTM output represents the Output (prediction). Therefore, the analysis does not compare manually assigned labels; instead, it evaluates whether the LSTM can reproduce the patterns identified by the GMM. The row-wise recognition capability, quantified by the recall metric, indicates how well the LSTM identifies the clusters formed by the GMM. The recall values for the Error and Warning clusters are both 0%, while the OK cluster achieves a recall of 31.1%. This result indicates that the LSTM is effectively unable to reliably identify any of the GMM-defined clusters. Column-wise accuracy, described by the precision metric, reflects the extent to which the LSTM output categories correspond to the GMM clusters. The precision values for the LSTM Error, OK, and Warning outputs are 0%, 31.1%, and 14.8%, respectively. These results clearly demonstrate that the Warning and Error predictions produced by the LSTM do not align with the cluster structure identified by the GMM. In summary, when applied to the same dataset, the LSTM and the GMM construct fundamentally different decision surfaces. Consequently, the LSTM is neither capable of nor intended to reproduce the clusters generated by the GMM. The OK and Warning clusters identified by the GMM are not based on temporal patterns, and therefore, the LSTM does not interpret them as Warning-type events. Similarly, the Error cluster identified by the GMM does not form a well-separated structure, leading the LSTM to classify these samples as OK or Warning. The GMM captures only the statistical structure of the data, whereas the LSTM learns sequential and label-driven patterns. These represent two fundamentally different modelling paradigms, and as a result, the two confusion matrices reflect two distinct perspectives on the same data.
Figure 12 presents the clustering results obtained using the Gaussian Mixture Model (GMM) applied to the training feature vectors extracted from the LSTM network, after projecting the high-dimensional representation into a two-dimensional space (PC1–PC2) using principal component analysis. The first principal component (PC1) accounts for 56.5% of the total variance, while PC2 accounts for an additional 20.9%, indicating that the two components together capture approximately 77.4% of the overall dispersion. The clusters shown in the figure correspond to the statistical components fitted by the GMM, rather than to the original Alarm Type labels. The designations “Error”, “OK”, and “Warning” assigned to the colours are therefore purely interpretative and do not imply a one-to-one correspondence between the mixture components and the supervised classes. The blue points form a well-separated region on the left-hand side along the PC1 axis (PC1 ≈ −1.3 to −0.6; PC2 ≈ −1.3 to +1.1). This cluster is relatively compact and dense and exhibits linear separability from the central mass along PC1. This suggests that, within the LSTM-learned feature space, there exists a dominant structural deviation that the GMM models as an independent Gaussian component. It is important to emphasise that this component does not necessarily correspond to the Alarm Type “Error” category; rather, it represents a statistically coherent and distinct high-probability density region. If this cluster contains a substantial proportion of multiple labelled classes in the confusion matrix, this is a natural consequence of density-based optimisation in GMM. The yellow points constitute the largest and most centrally located cluster (PC1 ≈ −0.4 to +0.3; PC2 ≈ −0.9 to +0.7). This region represents the principal density mass of the feature space. The elliptical structure of the cluster indicates that one of the high-covariance Gaussian components of the GMM models the majority of the samples. As this is the dominant density region, a significant proportion of the different Alarm Type labels project into it. Consequently, considerable label overlap is observed in this area, which directly explains the moderate precision and recall values: statistical homogeneity does not coincide with semantic (label-based) homogeneity. The orange cluster is located in the positive PC1 range (PC1 ≈ +0.8 to +1.4; PC2 ≈ −0.3 to 0.0) and forms a small, well-localised region. This component exhibits an “island-like” structure: it is separated from both the left-hand and central masses and displays relatively low internal variance. From a statistical perspective, it may be regarded as a low-prior-weight but well-defined Gaussian component. If the Alarm Type “OK” label only partially falls within this region, this indicates that the labelled class does not follow a homogeneous Gaussian structure, but rather is distributed across multiple density regions. Within the PCA-projected space, three clearly identifiable Gaussian-like density regions emerge: (1) a large, centrally dominant mass (yellow), (2) a distinctly separated component on the left-hand side (blue), and (3) a small peripheral cluster on the right-hand side (orange). The key observation is that the GMM models the probabilistic structure of the feature space rather than the labelled class boundaries. The Alarm Type categories do not appear in the PCA space as three well-separated Gaussian distributions. 
Instead, substantial overlap is observed, particularly in the central region, where most samples are concentrated. As a consequence, the components optimally fitted by the GMM are statistically coherent and interpretable in probabilistic terms; however, they are not isomorphic with the supervised labelled classes. The clustering therefore primarily reveals the internal structural organisation of the representation, rather than reconstructing the semantic categories.
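The PC1–PC2 visualisation described above can be reproduced, in outline, with a few lines of MATLAB. The feature matrix below is a hypothetical placeholder for the vectors extracted from the LSTM; the code only illustrates the projection and colouring steps, not the study's actual figure.

```matlab
% Minimal sketch: project feature vectors with PCA, then colour the points
% by the GMM component they are assigned to (Statistics and ML Toolbox).
rng(1);
feat = randn(500, 16);                                 % hypothetical LSTM feature vectors

[~, score, ~, ~, explained] = pca(zscore(feat));       % PCA on standardised features
gm   = fitgmdist(feat, 3, 'RegularizationValue', 1e-5);
comp = cluster(gm, feat);                              % GMM component index per sample

gscatter(score(:, 1), score(:, 2), comp);              % PC1 vs PC2, coloured by component
xlabel(sprintf('PC1 (%.1f%%)', explained(1)));
ylabel(sprintf('PC2 (%.1f%%)', explained(2)));
```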
Figure 13 clearly demonstrates that the density structure of the representation learned by the LSTM does not directly correspond to the Alarm Type categories. Although the three GMM components represent existing and well-defined statistical regions, the labelled classes are distributed across multiple density domains. This finding supports the conclusion that a purely unsupervised GMM approach is insufficient on its own for the precise reconstruction of labelled anomaly types, and that the integration of an additional semantic layer (e.g., rule-based or supervised mechanisms) is justified.
Figure 13 presents the true Alarm Type labels in the PCA space, based on the two-dimensional projection of the feature vectors learned by the LSTM. The explained variance of the principal components is PC1 = 56.5% and PC2 = 20.9%; thus, the two dimensions together represent approximately 77.4% of the total variance. Accordingly, the visualisation reflects the essential structure of the learned feature space, and the observed overlaps are not artefacts of projection distortion. The OK and Warning classes exhibit substantial overlap in the central region. The orange (OK) and yellow (Warning) points largely coincide, particularly within the ranges PC1 ≈ −0.3 to +0.4 and PC2 ≈ −0.6 to +0.6. No visually separable OK or Warning regions can be identified; instead, their distributions form a continuous transition. In statistical terms, this indicates that the two classes do not manifest as Gaussian-like, disjoint density components, which directly explains why the GMM was unable to establish a clear boundary between them. The Error class (blue) does not form a single compact cluster but appears in several partially separated regions. A left-hand grouping can be observed for PC1 < −0.8, while numerous Error points are embedded within the central mass among OK and Warning samples. In addition, smaller scattered point groups occur in the positive PC1 range. This implies that the Error class does not occupy a single, well-defined region in the feature space but exhibits a fragmented structure. As a consequence, the GMM could not fit a distinct, purely Error component, and a significant portion of Error samples in the LSTM representation are embedded within the Warning/OK domain. The peripheral, less frequent points on both the left and right—where OK, Warning, and Error labels all occur—correspond to low-density regions. These likely represent rare and extreme harmonic patterns; however, they are not associated exclusively with a single Alarm Type category. This further reinforces the observation that the label boundaries do not coincide with the natural density structure of the data. In summary, the OK and Warning classes exhibit pronounced overlap in the feature space and therefore cannot be sharply separated statistically. The Error class is fragmented and partially merges with the central region, thus failing to form an independent, clearly identifiable Gaussian component. In the PCA space, the three Alarm Type categories do not appear as three distinct Gaussian distributions but rather form a continuum with transitions of the type OK ↔ Warning ↔ Error. The GMM optimisation results reflect this structure: the distribution-based (unsupervised) organisation of the data differs substantially from the semantic logic of the labels. The clusters are statistically valid, yet none of the components corresponds unambiguously to a single Alarm Type class. This explains the mixed confusion matrix and the structural discrepancy between the LSTM–GMM results.
The Gaussian Mixture Model (GMM) is a probabilistic generative clustering approach that models the data distribution as a mixture of multiple Gaussian components, with parameters estimated using the Expectation–Maximisation (EM) algorithm. Unlike hard clustering methods such as k-means, the GMM performs soft clustering by assigning posterior membership probabilities to each data point, allowing for a more flexible representation of non-spherical and overlapping clusters. In this study, the GMM was applied to feature vectors extracted from an LSTM network, and its outputs were compared with predictions generated by a Bayes-optimised supervised LSTM model. The analysis demonstrated that the unsupervised GMM captures the intrinsic density structure of the learned feature space, whereas the supervised LSTM learns a decision surface aligned with the labelled Alarm Type categories. Consequently, discrepancies between GMM-derived clusters and labelled Warning/OK/Error classes are not indicative of modelling errors but reflect the fundamentally different learning paradigms. Confusion matrix analysis and PCA-based visualisation revealed that the Alarm Type categories do not form three clearly separable Gaussian distributions in the feature space. Significant overlap was observed between the OK and Warning classes, while the Error class exhibited a fragmented and distributed structure. This explains the low recall and precision of the GMM when evaluated against the labelled categories. Overall, the GMM effectively uncovers the statistical structure of the data but is insufficient on its own to accurately reconstruct labelled anomaly types. The findings support the integration of unsupervised density modelling and supervised classification components, in which the GMM provides structural insights into the data distribution and supervised models provide semantically aligned decision boundaries.
4. Corrective Algorithms
Corrective algorithms typically rely on more complex search- or learning-based approaches, such as genetic algorithms (GA), particle swarm optimisation (PSO), automated machine learning (AutoML), or ensemble methods. Although these techniques generally impose higher computational requirements, they enable adaptive, global-level error correction through dynamic search and optimisation. Several parallel research directions address the application of corrective algorithms. One prominent approach is residual learning and error-correction meta-models, in which a secondary model (e.g., a decision tree or a Gaussian Mixture Model) is trained on the residual errors of a primary model. This secondary model learns when and under which conditions the main model fails and can subsequently correct its output. The residual learning paradigm is therefore explicitly built on learning from the errors produced by the primary model [
36]. Closely related to this concept are layered ensemble learning techniques, such as stacking, which combine the outputs of multiple models and employ a meta-model to correct discrepancies among predictions [
38]. This approach has proven particularly effective in scenarios involving imbalanced class distributions and anomaly detection tasks. Another important research direction focuses on calibration and statistical correction techniques. Machine learning models often produce poorly calibrated probabilistic outputs, which may lead to false alarms or missed detections. Classical methods such as Platt scaling and isotonic regression are widely used to calibrate prediction probabilities in a post-processing step [
65]. In addition, statistical techniques—including z-score-based filtering, moving averages, and quantile-based thresholds—are commonly applied to reduce false alarms in real-time systems [
60]. Optimisation-based and evolutionary algorithms constitute another key class of corrective methods. Post hoc error correction and the fine-tuning of model parameters or decision thresholds are frequently addressed using genetic algorithms (GA) and particle swarm optimisation (PSO). Owing to their global search capabilities, these methods are well-suited for nonlinear, multi-parameter optimisation problems. At the data level, the synthetic minority oversampling technique (SMOTE) is a widely used corrective method for handling imbalanced class distributions [
25]. Evolutionary and optimisation-based approaches have been successfully applied in energy systems, image processing, and predictive maintenance applications [
12]. Ensemble and AutoML approaches provide additional high-level correction mechanisms. Ensemble methods—such as bagging, boosting, and stacking—combine multiple models and derive final decisions through voting or weighted aggregation. Boosting algorithms, including AdaBoost and XGBoost, explicitly focus on correcting the errors made in previous iterations [
37]. AutoML systems further automate model selection and hyperparameter optimisation and often generate ensemble-based solutions [
8]. Recent research trends include debiasing techniques and large language model (LLM)-based post hoc correction. These approaches aim to mitigate model biases, primarily through statistical and fairness-oriented corrections [
66]. An emerging direction involves using LLMs to validate and refine the outputs of other machine learning models, in which an optimisation algorithm adjusts the ML output to satisfy the system’s physical or logical constraints [
67]. A representative example is the application of a genetic algorithm to optimise LSTM predictions in accordance with the governing rules of a physical system.
4.1. PCA—Principal Component Analysis
When used as a corrective algorithm, principal component analysis (PCA) is not modified at the level of the linear transformation itself; instead, a search or optimisation algorithm adjusts the associated PCA parameters. These may include the number of retained principal components, kernel types and parameters in kernel PCA, or normalisation strategies. In this sense, PCA acts as a dynamic corrective mechanism rather than a purely deterministic preprocessing step. PCA can also be integrated into AutoML frameworks, where it is optimised as part of the overall pipeline during automated model search. In ensemble settings, multiple transformations—generated using different PCA configurations—can be combined, and the ensemble decision logic allows different dimensionality reduction strategies to reinforce each other. In such cases, PCA functions as a corrective algorithm rather than a simple feature reduction technique. In this study, the PCA operation is not illustrated with additional figures or measurements, as the applied methodology is identical to the PCA-based correction approach described in
Section 3.1. The key difference is that, here, PCA parameters are tuned by a higher-level optimisation mechanism.
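As a minimal illustration of treating PCA as a tunable component, the sketch below selects the number of retained components from the cumulative explained variance; in the corrective setting described above, the retention target (or the component count itself) would be set by the higher-level optimiser. The data and the 95% target are illustrative assumptions.

```matlab
% Sketch: component count as a tunable PCA parameter.
X = randn(400, 19);                               % placeholder feature matrix
[~, score, ~, ~, explained] = pca(zscore(X));

target = 95;                                      % retained-variance target in percent (assumed)
nComp  = find(cumsum(explained) >= target, 1);    % smallest count reaching the target
Xred   = score(:, 1:nComp);                       % reduced representation passed downstream
```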
4.2. GA—Genetic Algorithm
The genetic algorithm (GA) is an evolutionary optimisation technique inspired by the principles of natural selection and genetics. It is a metaheuristic evolutionary search algorithm designed to identify near-optimal solutions in complex, nonlinear, or high-dimensional search spaces where classical optimisation methods are inefficient or prone to becoming trapped in local optima [
68]. In the present context, the GA is interpreted as a corrective algorithm that post-processes model outputs or parameters to satisfy predefined physical, rule-based, or performance-related constraints. Typical applications include selecting the number of PCA components, hyperparameter optimisation of LSTM networks, and fine-tuning of anomaly detection thresholds [
69]. One of the key strengths of GA lies in its robustness: it handles nonlinear and complex search spaces effectively and often converges to near-global optima. However, it is computationally intensive and sensitive to parameter settings such as population size, crossover rate, and mutation rate [
70]. This section presents and evaluates the optimisation process of an LSTM model optimised using GA. The experiments employ the time-window-based, multidimensional regression LSTM model introduced in
Section 2. The input consists of windowed electrical harmonic components, and the model produces a nine-dimensional output representing the odd-order harmonics of the electrical signal from the 3rd to the 19th order. The task is regression-based, aiming to numerically predict the harmonic components at the next time step rather than performing classification.
The GA optimises four internal control functions that define the evolutionary process:
CreationFcn: The initial population is generated using a uniform distribution across the predefined parameter ranges, ensuring adequate coverage of the search space.
CrossoverFcn: Scattered crossover employs a random mask to determine, for each parameter, whether the offspring inherits the corresponding gene from the first or the second parent. This strategy is particularly suitable for continuous parameter optimisation.
SelectionFcn: Stochastic uniform selection is a roulette-wheel-like mechanism in which selection probabilities are proportional to fitness values, preventing premature domination by a single individual and promoting balanced evolution.
MutationFcn: Adaptive feasible mutation ensures that mutated individuals remain within valid parameter bounds. The mutation step size is dynamically adjusted based on population diversity, enabling stable and accelerated convergence.
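The four control functions above correspond directly to standard options of MATLAB's ga solver. The following sketch shows how such a configuration can be assembled; the bounds, population size, generation limit, and the fitness helper are hypothetical placeholders rather than the settings used in the study.

```matlab
% Sketch of a GA configuration using the four control functions listed above
% (Global Optimization Toolbox). Values are illustrative only.
nVars = 4;                                        % e.g., LSTM hyperparameters being tuned
lb = [16,  1e-4, 0.0, 10];                        % hypothetical lower bounds
ub = [256, 1e-2, 0.5, 60];                        % hypothetical upper bounds

opts = optimoptions('ga', ...
    'CreationFcn',  @gacreationuniform, ...       % uniform initial population
    'CrossoverFcn', @crossoverscattered, ...      % scattered crossover
    'SelectionFcn', @selectionstochunif, ...      % stochastic uniform selection
    'MutationFcn',  @mutationadaptfeasible, ...   % adaptive feasible mutation
    'PopulationSize', 50, ...
    'MaxGenerations', 30);

% fitnessFcn is a hypothetical helper that trains/evaluates the LSTM candidate
% and returns the validation loss to be minimised.
fitnessFcn = @(x) evaluateLstmCandidate(x);
[xBest, fBest] = ga(fitnessFcn, nVars, [], [], [], [], lb, ub, [], opts);
```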
The configuration defined by these four control functions represents a classical, stable, and widely used GA setup for continuous-valued parameter optimisation. The results of testing on real data are summarised in
Table 7. For brevity, the full set of 30 iterations is not reported, as the GA terminated after reaching the maximum allowed number of generations. The results indicate that the GA failed to identify a better solution beyond the second generation and rapidly converged to a fixed value. The best fitness value (Best f(x)) remained constant at −0.05765, which was already achieved by the end of the first generation and did not improve throughout the run. This behaviour suggests that the GA became trapped in an early local optimum or that the fitness landscape is excessively flat, making further improvements difficult. The mean fitness value (Mean f(x)) gradually converged to the same value, and within approximately 16 generations, the entire population collapsed to a single solution, indicating loss of diversity. The stall generation value of 15 confirms prolonged stagnation, with no improvement in the best fitness. Overall, the GA stagnated at an early local optimum and failed to escape during the optimisation process. The weak differentiation of the fitness landscape rendered the system insensitive to parameter variations, suggesting that the optimisation problem is effectively trivial in this configuration or that the fitness function is poorly scaled.
Potential strategies for improvement include:
Normalisation or rescaling of the fitness function.
Expansion of parameter bounds.
Stronger mutation strategies (e.g., Gaussian mutation).
Increasing population size (e.g., 200–500 individuals).
Employing alternative search strategies such as Pattern Search, MultiStart, or Bayesian optimisation.
Overall, it can be concluded that the genetic algorithm (GA), in its generic form, is not the most suitable approach for improving the performance of the LSTM model trained on electrical harmonic data. As demonstrated by the results, the GA converged rapidly to a local optimum and did not yield a meaningful improvement in predictive performance. Achieving better results would require implementing the previously discussed modifications, such as rescaling the fitness function, expanding the parameter bounds, strengthening the mutation strategy, or applying alternative optimisation techniques. In contrast, as shown in
Section 4.3, the introduction of the PSO (Particle Swarm Optimisation) approach resulted in a substantial qualitative improvement in model behaviour. The application of PSO significantly enhanced the temporal stability of the model, reduced the number of state transitions and segmentation boundaries, and led to more persistent and coherent event detection. These findings suggest that methods that explicitly address temporal smoothing and state consistency may be more effective for anomaly detection in electrical harmonic data than optimisation strategies relying solely on global parameter tuning, such as genetic algorithms.
4.3. PSO—Particle Swarm Optimisation
Particle Swarm Optimisation (PSO) is a population-based, swarm-intelligence optimisation algorithm introduced by Kennedy and Eberhart in 1995 [
12]. The method is inspired by the collective behaviour of social organisms, such as bird flocks or fish schools. In PSO, individual particles move within a multidimensional search space, where each particle represents a candidate solution. Each particle is characterised by its current position (the present solution), a velocity vector defining the direction and magnitude of movement, and a memory storing its historically best solution (personal best). In addition, particles are influenced by the best solution found by the entire swarm (global best). By combining individual experience with collective knowledge, the algorithm iteratively converges toward an optimal solution [
71]. Similar to genetic algorithms, PSO can be applied as a post hoc optimisation technique to adjust model parameters or outputs in order to satisfy physical constraints, rule-based requirements, or performance objectives. Typical application areas include hyperparameter optimisation for neural networks, LSTM models and PCA-based methods, parameter tuning of dimensionality reduction techniques, rule discovery, feature selection, and fine-tuning of control and regulation systems [
70]. The main advantages of PSO are its relatively simple implementation, the limited number of parameters requiring manual tuning, strong global search capability, potential to escape local minima, and efficient parallelisation. However, PSO may suffer from premature convergence if the swarm becomes overly homogeneous. Its performance is sensitive to the selection of inertia weight, acceleration coefficients, and velocity limits, and convergence to the global optimum cannot be guaranteed. In this study, PSO was investigated to assess its impact on improving the performance of the proposed machine learning model for anomaly detection based on electrical harmonic data. The validation results are presented in the following subsections, using real measured electrical harmonic datasets. The PSO-based optimisation and the saving of the optimised ML model were implemented using custom MATLAB R2025b code developed by the authors and applied to proprietary measurement data.
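The parameters highlighted above (inertia weight, acceleration coefficients, swarm size) map directly onto options of MATLAB's particleswarm solver. The sketch below indicates how such a run can be configured; the bounds, swarm settings, and the objective helper are illustrative assumptions rather than the study's actual setup.

```matlab
% Sketch of a PSO run over a hypothetical hyperparameter vector
% (Global Optimization Toolbox). Values are illustrative only.
nVars = 4;
lb = [16,  1e-4, 0.0, 10];
ub = [256, 1e-2, 0.5, 60];

opts = optimoptions('particleswarm', ...
    'SwarmSize',              40, ...
    'MaxIterations',          30, ...
    'InertiaRange',           [0.4 0.9], ...      % balances exploration and exploitation
    'SelfAdjustmentWeight',   1.49, ...           % pull towards the personal best
    'SocialAdjustmentWeight', 1.49);              % pull towards the global best

objFcn = @(x) evaluateLstmCandidate(x);           % hypothetical objective helper
[xBest, fBest] = particleswarm(objFcn, nVars, lb, ub, opts);
```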
Figure 14 illustrates the comparison between the model performance before and after optimisation. It can be observed that the bars corresponding to the pre- and post-optimisation states are almost identical, indicating that applying Particle Swarm Optimisation (PSO) did not degrade the model’s ability to detect anomalies. The model remains capable of identifying anomalous events after optimisation, confirming that the overall detection capability is preserved. The false positive rate (FPR) also remains nearly unchanged before and after optimisation. This suggests that the PSO-based optimisation did not increase the number of false alarms, meaning that the system did not become more sensitive to normal operating conditions being incorrectly classified as anomalies. Consequently, it can be concluded that the PSO did not deteriorate detection quality and did not introduce additional false alarms. Stability quantifies the extent to which the optimisation process alters the decision behaviour of the model relative to its initial state. The stability metric is defined as a normalised change ratio (Stability = flip/N), where flip denotes the number of data instances for which the model’s predicted output changes due to optimisation, and N represents the total number of evaluated instances. As shown in
Figure 14, the stability values before and after optimisation are identical, indicating that PSO did not induce any measurable change in the model's decision structure. Therefore, no improvement in normalised stability is observed in this case; however, the preservation of the original stable behaviour of the model is clearly demonstrated.
Figure 15 illustrates the change in model stability using explicit numerical values. In the baseline configuration, the model output exhibits a total of 434 state transitions (flips), whereas in the PSO-optimised model this number is reduced to only 66. This corresponds to a reduction by more than a factor of six, clearly indicating a substantial improvement in decision stability. The flip count quantifies impulsive fluctuations in the model output, that is, how frequently the prediction switches between states (e.g., OK → anomaly → OK) within a short time interval. A high flip count is typically associated with noise-sensitive behaviour, unstable decision boundaries, and overreactive anomaly detection. In contrast, the pronounced reduction in flip events demonstrates that the PSO-optimised model maintains its decisions consistently over longer time horizons and exhibits reduced sensitivity to low-amplitude, impulsive noise present in electrical harmonic measurement data. This behaviour is particularly important in real industrial systems, where excessively frequent state transitions may lead to unnecessary alarms, increased maintenance workload, and reduced trust in automated monitoring systems. In the PSO-optimised model, the decision output is significantly cleaner, more continuous, and temporally coherent, thereby enabling more reliable anomaly signalling. Although PSO-based optimisation did not yield measurable improvements in detection performance or stability metrics, its application still provides relevant and informative insights. One important outcome is that PSO did not degrade the anomaly detection capability, did not increase the false-positive rate, and did not alter the model’s decision structure. This property is particularly important in industrial environments, where optimisation procedures often introduce instability or unintended side effects. In this context, PSO can be interpreted as a robustness test, demonstrating that the LSTM-based model maintains stable decision behaviour under a global, population-based optimisation process. This indicates that the model is already operating in a well-tuned region of the parameter space, where further global hyperparameter optimisation has limited impact. Under such conditions, the primary performance bottlenecks are more likely related to model architecture, input feature representation, or the handling of temporal dynamics rather than parameter selection alone. Furthermore, the PSO results serve as a baseline reference for evaluating more advanced correction and stabilization methods. Since PSO neither degraded nor significantly improved performance, it clearly delineates the limitations of global optimisation in this application. This observation justifies the introduction of methods that explicitly address temporal consistency, state transitions, and segmentation behaviour. In summary, PSO-based optimisation does not degrade detection performance in terms of recall or false positive rate, while it leads to a substantial improvement in system stability. This implies that the optimised model performs fewer unnecessary state transitions, exhibits reduced sensitivity to measurement noise, and provides temporally more consistent anomaly indications. In noisy measurement environments, such stability is a critical performance factor, and the pronounced reduction in decision flips provides clear evidence of PSO’s practical relevance in this context. 
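The two stability measures used above can be stated compactly in code: the normalised decision-change ratio between the baseline and optimised predictions (Stability = flip/N) and the number of temporal state transitions within one prediction sequence. The short label vectors below are illustrative.

```matlab
% Sketch of the stability measures discussed above (illustrative labels:
% 1 = OK, 2 = Warning, 3 = Error).
yBase = [1 1 2 1 1 3 3 1 2 2];        % baseline predictions
yOpt  = [1 1 1 1 1 3 3 1 2 2];        % predictions after optimisation

N         = numel(yBase);
flipRatio = sum(yBase ~= yOpt) / N;            % Stability = flip/N (decision changes)

transitions = @(y) sum(diff(y) ~= 0);          % temporal flips within a sequence
fprintf('flip/N = %.2f, baseline flips = %d, optimised flips = %d\n', ...
    flipRatio, transitions(yBase), transitions(yOpt));
```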
However, the role of PSO in this study is not to deliver the ultimate performance improvement, but rather to demonstrate that global optimisation alone is insufficient for handling the strongly non-stationary patterns inherent in electrical harmonic data. This finding conveys an important methodological insight and motivates the adoption of time-consistency-oriented corrective algorithms, which are introduced and analysed in the subsequent sections.
4.4. SA—Simulated Annealing
Simulated annealing (SA) is a stochastic optimisation algorithm inspired by the thermal annealing process in metallurgy. Its objective is to approximate the global minimum of a given cost function while avoiding entrapment in local minima [
72]. The core principle of SA is that, at the beginning of the search, the system operates at a high “temperature,” allowing worse solutions to be accepted with a certain probability. As the temperature gradually decreases, the algorithm increasingly favours better solutions, and in the final phase, the system stabilises and converges toward a near-optimal solution [
73]. SA has been widely applied to hyperparameter optimisation in machine learning models (e.g., LSTM, CNN, SVM), feature selection tasks, global cost-function optimisation, as well as nonlinear system identification and control parameter tuning [
74]. Its main advantages include simplicity, robustness, the ability to escape local minima, and the fact that it does not require derivative information of the objective function. However, SA typically exhibits slow convergence, especially in large search spaces, and its performance strongly depends on the cooling schedule; moreover, convergence to the global optimum is not guaranteed. In this section, we demonstrate how weighting factors can be interpreted and applied in different ways within the simulated annealing framework, with particular emphasis on LSTM-based models trained on electrical harmonic data. A brief methodological comparison is also provided between the weighting concept in SA and that used in sliding mode control (SMC). In SA, weights act as priority factors within the objective function and as scaling parameters for perturbations, whereas in SMC the coefficients of the sliding surface (c1, c2, …) determine the system dynamics. In the present study, the weights were derived from the total harmonic distortion (THD) associated with individual harmonic orders, since variations in THD closely follow changes in the amplitudes of electrical harmonic components. An SA-optimised LSTM model was subsequently employed for anomaly detection using real electrical harmonic measurement data. The SA optimisation was implemented using custom MATLAB R2025b code, and the optimised model was based on the LSTM architecture introduced in
Section 2. All data originated from proprietary measurement datasets, and the optimised model was stored for subsequent anomaly detection tasks.
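The THD-derived weighting described above can be indicated schematically with MATLAB's simulannealbnd. The weight vector, the bounds, the cooling settings, and the objective helper below are hypothetical placeholders for the weighting scheme, not the study's actual implementation.

```matlab
% Sketch of SA-based tuning with a THD-weighted objective
% (Global Optimization Toolbox). Values are illustrative only.
thdWeights = [0.30 0.25 0.15 0.10 0.07 0.05 0.04 0.02 0.02];  % per-harmonic weights (assumed)

x0 = [128, 1e-3, 0.2];                     % initial hyperparameter guess (assumed)
lb = [16,  1e-4, 0.0];
ub = [256, 1e-2, 0.5];

opts = optimoptions('simulannealbnd', ...
    'InitialTemperature', 100, ...
    'TemperatureFcn',     @temperatureexp, ...   % exponential cooling schedule
    'MaxIterations',      30);

% Hypothetical objective: per-harmonic validation RMSE weighted by the THD factors.
objFcn = @(x) thdWeights * validationRmsePerHarmonic(x).';
[xBest, rmseBest] = simulannealbnd(objFcn, x0, lb, ub, opts);
```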
Figure 16 presents the evolution of the validation RMSE during simulated annealing (SA) optimisation, together with the THD-related auxiliary metric. The blue curve represents the validation RMSE [A], while the purple dotted curve shows the THD-based metric (right axis). The red dashed line marks the best RMSE value identified during the optimisation process: Best RMSE = 0.0645 A. During the initial iterations (approximately 1–6), the RMSE exhibits noticeable fluctuations, ranging between approximately 0.065 A and 0.083 A. Such behaviour is consistent with the exploratory phase of simulated annealing, where parameter perturbations allow the search process to sample different regions of the hyperparameter space. The lowest RMSE value is reached relatively early (around iterations 5–6), after which subsequent iterations produce fluctuations within a moderate range (approximately 0.07–0.084 A) without identifying substantially better solutions. This suggests that the optimisation converged toward a near-optimal region early in the search. The THD-related metric follows a qualitatively similar trend, stabilising after the initial exploratory phase within the range of approximately 0.05–0.055. Importantly, no large divergence between the RMSE and THD trends is observed, indicating that optimisation of the statistical loss does not lead to deterioration of the physically motivated THD metric. The baseline validation RMSE (prior to optimisation) was 0.0725 A, and the SA-optimised configuration achieved 0.0645 A, corresponding to a relative improvement of approximately 11%. Although numerically modest, this reduction is meaningful in low-variance industrial prediction settings, where incremental improvements in error magnitude can significantly influence downstream threshold-based detection performance. Overall, the SA process demonstrates stable optimisation behaviour: rapid identification of a high-quality parameter region followed by bounded fluctuations without progressive degradation. This supports the use of SA as a viable hyperparameter tuning method for LSTM-based prediction in harmonic anomaly detection tasks.
Figure 17 compares the validation RMSE of the fine-tuned baseline LSTM model with the SA-optimised configuration. The baseline model achieves a validation RMSE of 0.0725 A, while the simulated annealing (SA) optimisation reduces this value to 0.0645 A. This corresponds to an absolute reduction of 0.0080 A and a relative improvement of approximately 11.0%. Although the baseline model already exhibits stable predictive performance, the SA-based hyperparameter tuning yields a measurable reduction in reconstruction error. This indicates that additional performance gains can be achieved through systematic exploration of the hyperparameter space, even when starting from a reasonably well-calibrated configuration. The magnitude of improvement is particularly relevant in industrial time-series applications, where anomaly detection sensitivity is strongly influenced by small changes in prediction error distribution. A reduction of this scale can directly affect threshold-based decision boundaries and false alarm behaviour. It is important to note that simulated annealing does not guarantee identification of the global optimum. However, the optimisation trajectory (see
Figure 16) suggests that the algorithm rapidly located a high-quality region of the parameter space within the first few iterations and subsequently refined the solution through bounded stochastic perturbations. Overall, the results demonstrate that simulated annealing provides effective fine-tuning capability for LSTM-based harmonic prediction models, leading to a statistically meaningful reduction in validation error without destabilising the model behaviour.
4.5. Bayesian Weighting
Bayesian weighting is a probabilistic approach that assigns weights to models, predictions, or decisions using Bayes’ theorem to explicitly account for uncertainty. In contrast to classical approaches that select a single optimal model, Bayesian weighting combines the outputs of multiple models or hypotheses according to their posterior probabilities. This method is particularly well-suited for systems in which decision outcomes are noisy, partially observable, or where multiple competing hypotheses coexist [
75]. The core principle of Bayesian weighting is that, during decision-making, the contribution of each model or hypothesis is proportional to its posterior probability. As a result, the final decision is uncertainty-calibrated, reflecting both the amount of information contained in the observations and their associated reliability. Bayesian model combination techniques, such as Bayesian Model Averaging (BMA) and Bayesian ensemble methods, widely employ this concept in machine learning and control engineering applications, especially in scenarios where explicit treatment of model uncertainty is essential [
62]. Formally, Bayes’ theorem describes how the probability of a hypothesis is updated in light of new observations. Equation (2) presents the classical formulation of Bayes’ theorem, expressing the relationship between conditional probabilities.
P(H∣D) = P(D∣H) · P(H) / P(D)  (2)
where
H is the hypothesis (e.g., the correctness of a machine learning model or the presence of a component fault),
D is the observed data,
P(H) is the prior probability, expressing the initial belief about the hypothesis,
P(D∣H) is the likelihood, i.e., the probability of observing the data given that the hypothesis is true,
P(D) denotes the evidence or marginal likelihood, representing the overall probability of the observed data, and
P(H∣D) is the posterior probability, which reflects the updated belief in the hypothesis after incorporating the observed data.
Illustrative example: assume that the prior probability of a transformer being in a faulty state is P(H) = 0.05. If the transformer is indeed faulty, the probability of detecting a sensor anomaly is P(D∣H) = 0.9. The overall occurrence probability of anomalies in the system (evidence) is P(D) = 0.2. Substituting these values into Equation (2) gives P(H∣D) = (0.9 × 0.05)/0.2 = 0.225.
Thus, when a sensor reports an anomaly, the posterior probability of a fault increases to 22.5%, which is more than four times higher than the original 5% prior probability. This simple example clearly illustrates how Bayesian weighting can substantially modify decision confidence by incorporating new observations. Bayesian weighting is widely used in industrial systems for managing decision uncertainty, multi-model fault detection, fuzzy–Bayesian fault diagnosis, and the robust fusion of predictive outputs. The approach can be seamlessly integrated into machine learning-based state estimators, used alongside LSTM models to refine their outputs, and applied within fault-tolerant control systems. This concept is also supported by the classical work of Korondi Péter and Hashimoto Hiroshi, who addressed the uncertain and noisy nature of tactile sensing using fuzzy rule-based decision mechanisms. Their approach demonstrated stable, adaptive control across varying environmental conditions. The authors showed that fuzzy decision logic can effectively integrate sensing uncertainty into control systems, which is functionally closely related to Bayesian uncertainty handling. Such fuzzy–probabilistic decision fusion is particularly advantageous in industrial processes, as it enables robust combination of predictions, state estimations, and fault-detection modules even in the presence of variable or partially unreliable input data [
76].
Table 8 presents the electrical harmonic measurement data used for Bayesian optimisation. In the following section, the practical application of Bayesian optimisation for anomaly detection using electrical harmonic data is demonstrated. The electrical harmonic datasets were evaluated using a custom-developed model implemented in MATLAB R2025b. Due to space limitations, the full dataset comprising 3721 samples cannot be presented; therefore, only the first 19 rows are included in this section for illustrative purposes.
Anomaly detection was performed on the electrical harmonic test data using the trained model. Naturally, the test dataset does not contain any clustering or labelling information; these attributes are assigned by the model during the prediction process. As a result of the testing phase, several Warning state indications appeared among the predicted Alarm Type classes. In the following, these classifications are validated using the methods currently available. Direct validation based on actual machine condition data is unfortunately not feasible, as the corresponding power network analysis measurements were conducted approximately four years ago. A retrospective review of the SAP system for the same period revealed no recorded machine failures. It should also be emphasised that a Warning indication does not represent an actual fault, but rather a precautionary signal indicating the possibility of a future malfunction. Nevertheless, based on the available time series and numerical patterns, certain regularities and correlations can be identified that may, in theory, support the correctness of the clustering performed by the machine learning model. However, it is important to stress that this validation remains limited in scope. The results clearly indicate the need for additional power network analysis measurements and detailed SAP machine condition records, both synchronised using precise timestamps. Such synchronisation would be essential for reliably comparing model outputs with real operational states.
Table 9 presents the results of the multi-parameter Bayesian optimisation executed in the MATLAB environment after 30 iterations. In the following, the iteration table of this specific Bayesian optimisation run is analysed in detail, providing a comprehensive log of the optimisation process. During optimisation, each iteration evaluates the objective function at a new parameter combination. Subsequently, the Bayesian surrogate model is updated, and the next sampling point is selected based on the acquisition function. The objective values exhibit very rapid convergence: in the first iteration the objective value is 0.0616, in the second iteration 0.0667, followed by values in the range of 0.0667–0.0680 during the third and fourth iterations. From the sixth iteration onward, the objective value stabilises at 0.058008. This value effectively represents the global minimum, as no lower objective value was identified in the remaining 25+ iterations. This behaviour is also clearly reflected in the minimum-objective curve, which becomes completely flat after the sixth function evaluation. This indicates the existence of a well-defined and stable optimum in the search space, to which the Bayesian optimisation converged rapidly and reliably. This conclusion is further supported by the fact that the estimated and observed objective values coincide (0.058008), indicating an excellent fit of the surrogate model. The convergence of the parameters is equally well defined: the threshold parameter converges to approximately 90, the number of clusters (k) to 6, while the weight parameters converge to w3 ≈ 0.0277 and w5 ≈ 0.37309, with the objective value remaining at 0.058008. Due to the exploratory nature of Bayesian optimisation, the algorithm continues to generate and evaluate new samples even after reaching the optimum. Consequently, several accepted iterations can be observed with objective values around 0.06, while the BestSoFar value remains unchanged. The relatively wide fluctuations of the weight parameters w3 and w5—ranging from values around 0.02 to approximately 0.9 or even on the order of 10⁻⁴—indicate that these parameters primarily fine-tune the objective function. Multiple parameter combinations can produce near-optimal solutions, and the Bayesian model effectively smooths the relevant dimensions of the search space. As a result, an optimal neighbourhood rather than a single-point optimum is formed. Nevertheless, the best observed results clearly identify w3 ≈ 0.027 and w5 ≈ 0.37 as the most favourable region. In summary, the Bayesian optimisation converged rapidly and stably, with the objective value decreasing significantly and stabilising at approximately 0.058. The decision threshold was clearly optimised around 90, the optimal number of clusters was identified as k = 6, and the weight parameters converged into a robust and stable range. The relationship between the minimum objective value and the number of evaluations exhibits a textbook-like, smooth convergence behaviour without irregularities. It is important to emphasise that the unusually fast convergence of the Bayesian optimisation does not indicate insufficient exploration of the search space but rather reflects favourable properties of the underlying optimisation problem. The baseline LSTM model was already well-tuned, physically constrained, and characterised by a smooth objective function. As a consequence, the global optimum was located close to the initial sampling region. This allowed the Bayesian surrogate model to accurately learn the structure of the objective surface within only 6–7 iterations, leading to the early identification of the true minimum (0.058008). After reaching this minimum, the optimisation process did not terminate prematurely. Instead, subsequent iterations continued to actively explore the parameter space near the optimum. However, none of these evaluations resulted in further improvement of the objective value. This behaviour clearly indicates the presence of a well-defined and stable global optimum within the investigated parameter domain, with no deeper minimum available. At the end of the optimisation process, both the best observed solution and the Bayesian model-estimated optimum converged to the same parameter configuration (threshold ≈ 90, k = 6, w3 ≈ 0.03, w5 ≈ 0.37), providing strong evidence for the robustness and reproducibility of the identified solution. Consequently, the entire optimisation procedure can be considered unambiguously successful from a Bayesian optimisation perspective, and the rapid convergence should be interpreted as a sign of good model conditioning rather than a limitation of the optimisation strategy.
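For readers who wish to reproduce a comparable search outside MATLAB, the following minimal Python sketch (assuming the scikit-optimize package) illustrates a 30-iteration Gaussian-process Bayesian optimisation over the same four parameters. The objective function below is a placeholder standing in for the real pipeline (LSTM prediction, anomaly scoring, and clustering on the harmonic data), and the parameter ranges are illustrative assumptions rather than the values used in the study.

# Sketch of the Bayesian search described above, assuming scikit-optimize.
# The objective is a hypothetical stand-in for the real evaluation pipeline.
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real

rng = np.random.default_rng(0)

def objective(params):
    threshold_pct, k, w3, w5 = params
    # Placeholder loss shaped to be smallest near the reported optimum
    # (threshold ≈ 90, k = 6, w3 ≈ 0.03, w5 ≈ 0.37); the real pipeline would
    # retrain/score the detector with these parameters and return its loss.
    loss = (0.058
            + 1e-4 * (threshold_pct - 90) ** 2
            + 5e-4 * (k - 6) ** 2
            + 0.01 * (w3 - 0.03) ** 2
            + 0.01 * (w5 - 0.37) ** 2)
    return loss + 1e-5 * rng.standard_normal()

space = [
    Integer(80, 100, name="threshold_pct"),  # decision-threshold percentile
    Integer(2, 10, name="k"),                # number of GMM / K-means clusters
    Real(1e-4, 1.0, name="w3"),              # weight of the 3rd harmonic
    Real(1e-4, 1.0, name="w5"),              # weight of the 5th harmonic
]

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best objective:", result.fun)
print("best parameters:", result.x)  # [threshold_pct, k, w3, w5]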
Figure 18 presents the evolution of the minimum objective value as a function of the number of function evaluations during Bayesian optimisation. The horizontal axis indicates the cumulative number of evaluated configurations (0–30), while the vertical axis represents the minimum objective value observed up to a given evaluation. The blue curve denotes the minimum observed objective value obtained from actual evaluations, whereas the green curve represents the minimum estimated by the surrogate model. In the present case, the blue curve is not visually distinguishable in the figure due to its complete overlap with the green curve; the reason for this behaviour is explained in the subsequent discussion. A rapid decrease in the objective value is visible within the first few evaluations (approximately 2–6 function evaluations), where the objective drops from an initial value of around 0.047 to approximately 0.037–0.038. After this stage, both curves exhibit a stable plateau without further substantial improvement. Notably, the two curves coincide almost perfectly from the point at which the best-performing configuration has been explicitly evaluated and incorporated into the surrogate model. In Bayesian optimisation, the surrogate model is continuously updated using the actual evaluation results; therefore, once the current minimum has been sampled and no lower objective value is found, the predicted minimum and the observed minimum evolve identically. As a consequence, the green curve visually overlaps and obscures the blue curve in the figure. The complete overlap confirms that no discrepancy remains between the surrogate-predicted minimum and the actually observed minimum after convergence. The close alignment between the observed and estimated minima indicates that, in the explored region of the parameter space, the surrogate model provides a highly consistent approximation of the objective trend. However, this convergence behaviour reflects stabilisation within a high-quality region of the search space rather than formal proof of global optimality. Beyond the initial improvement phase, additional function evaluations do not produce meaningful objective reduction. This behaviour supports the practical justification of early termination of the optimisation process once objective stabilisation is observed. In industrial and computationally constrained environments, such early stabilisation is particularly relevant, as it enables reduction of computational cost and optimisation time without materially affecting the final solution quality.
Overall, the figure demonstrates efficient convergence behaviour of the Bayesian optimisation procedure for the investigated problem, characterised by rapid initial improvement followed by stable performance within a bounded objective range.
Table 10 summarises the optimised Bayesian parameters. Based on the results, the following conclusions can be drawn. The optimisation process was stable, rapidly convergent, and exhibited good objective-function fitting. The final minimum value of the objective function was Min objective ≈ 0.0580, which stabilised after approximately five evaluations, indicating that subsequent iterations did not yield further improvement.
The threshold percentile was optimised to 90, meaning that the anomaly or decision threshold was set to the upper 10% of the distribution. This results in a stricter decision boundary, as the model considers only the most significant deviations as anomalies. The number of clusters was optimised to k = 6, corresponding to the number of GMM or K-means components. This indicates that the model identified a medium-granularity structure as optimal, avoiding both overly coarse and excessively fine partitioning. The weight of the 3rd harmonic component (w3 = 0.03) is relatively low, indicating that this component contributes only marginally to the decision-making process. In practice, the optimisation effectively downweighted the influence of the 3rd harmonic. In contrast, the weight of the 5th harmonic component (w5 = 0.37) is substantially higher, suggesting that this component plays a dominant role in the model. In many electrical and mechanical systems, incipient faults tend to manifest first in higher-order harmonic components, which is consistent with the behaviour observed in this case. The fitness value (objective) is 0.0580, representing the final value of the Bayesian optimisation objective function. This low value indicates good model fitting, minimal residual error, and stable convergence of the optimisation process. The resulting model can therefore be characterised as a conservative but reliable anomaly detector; however, it still exhibits low sensitivity (recall), meaning that a considerable number of true anomalies remain undetected. Overall, although the optimisation mechanism was mathematically successful—finding a stable minimum around 0.0580—the resulting parameter combination leads to weak overall model performance when evaluated using the target metrics reported in Table 11 (F1-score, MAPE). The excessive dominance of the 5th harmonic (w5 = 0.37) combined with the low F1-score indicates that the model has become strongly single-component driven, suggesting over-specialization or a biased input representation. The Silhouette score being NaN further indicates that the clustering structure is not statistically interpretable, which may be caused by inappropriate clustering parameters or insufficient variance within the feature space. These findings demonstrate that not every corrective algorithm or optimisation strategy necessarily improves model performance, and in certain cases, such interventions may even degrade the overall behaviour.
Figure 19 presents the PCA-projected two-dimensional visualization (PC1 and PC2) of the internal LSTM representations. This visualization integrates three analytical perspectives simultaneously: the predictive behaviour of the LSTM model, the output of the anomaly detection mechanism, and the structure of K-means clustering applied to the LSTM activation space. The three clusters (blue, orange, and yellow) are visually separable but not symmetrically distributed. The blue cluster forms a clearly separated, stable, and homogeneous group, while the boundary between the orange and yellow clusters is blurred, indicating overlap in feature characteristics. The Silhouette score could not be computed, which typically indicates that one of the clusters contains too few samples or that the sample distribution is insufficiently dense to support stable metric estimation. While the three-cluster partitioning appears visually acceptable, it cannot be considered statistically robust. This suggests that the selected number of clusters (k = 3) may be under-parameterized or that the dataset is strongly imbalanced. When examining the regression performance of the LSTM predictor alone, the results are favourable: RMSE = 0.0563 A and MAE = 0.0414 A indicate good data fitting and stable predictions. However, the MAPE = 34.30%, exceeding the commonly accepted 30% threshold, reveals disproportionately high relative errors for low-amplitude samples. Regarding anomaly detection performance, the precision is high (71.89%), indicating that most detected anomalies correspond to actual faults and that false alarms are relatively rare. In contrast, the recall is extremely low (0.19%), meaning that the vast majority of true anomalies are not detected. Consequently, the F1-score is nearly zero (0.38%), reflecting a severely imbalanced trade-off between detection accuracy and coverage. Taken together, these results indicate that the model is overly conservative, triggering anomaly alarms only when it is almost completely certain. As a result, a large number of genuine anomalies remain undetected. Given that only 7 out of 3721 samples (0.19%) are classified as anomalies, a severe class imbalance is present. This leads to high precision but extremely low sensitivity, effectively resulting in systematic under-reporting of anomalous events.
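The collapse of the F1-score follows directly from its definition as the harmonic mean of precision and recall. The short check below, a minimal Python sketch using only the precision and recall values reported above, shows that 71.89% precision combined with 0.19% recall yields an F1-score of roughly 0.38%, matching the reported figure.

# Why a near-zero recall collapses the F1-score even with high precision:
# F1 is the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

precision = 0.7189   # reported anomaly precision (71.89 %)
recall = 0.0019      # reported anomaly recall (0.19 %)

print(f"F1 = {f1(precision, recall):.4%}")  # ~0.38 %, matching the reported value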
Table 11 summarises the overall performance metrics of the proposed system. The LSTM reconstruction error is RMSE = 0.0563 A, expressed in normalised units because the input data were z-score-normalised. This value is relatively low, indicating that the LSTM can locally reconstruct the signal with good accuracy. Similarly, the LSTM MAE value of 0.0414 is also low, suggesting stable and largely bias-free signal reproduction. In contrast, the LSTM MAPE reaches 34.30%, which—according to the commonly used interpretation of performance indicators (MAPE < 10%: excellent, <20%: acceptable, ≥20%: poor)—falls into the poor range. This elevated relative error is primarily attributable to the presence of numerous samples with values close to zero in the normalised dataset, a condition that makes percentage-based error measures inherently unstable. Consequently, while the absolute error metrics (RMSE and MAE) indicate good predictive performance, the relative error metric (MAPE) is less informative in this specific context. The anomaly detection performance reveals an anomaly precision of 71.89%, meaning that approximately 72% of the generated alarms correspond to genuine anomalous events. This indicates a low false alarm rate and suggests that when the system raises an alert, it is generally reliable. However, the anomaly recall is extremely low at 0.1881%, indicating that fewer than 1% of all true anomalies are successfully detected. This reflects a very low sensitivity, indicating that the decision threshold is overly strict and that the model operates in an excessively conservative manner. The reported F1-score for the anomaly class is 84.3%. However, this value is misleading due to the extreme class imbalance present in the dataset. When positive samples are exceedingly rare, the F1-score can appear artificially high even if the model detects very few positive instances. In this case, the extremely low recall severely limits the practical detection capability. Overall, the anomaly detector can be characterised as overly conservative: the alarms it generates are mostly correct, but the vast majority of true anomalies remain undetected. From an operational perspective, this reduces nuisance false alarms but introduces substantial risk in predictive maintenance scenarios, where missed events may lead to undetected degradation or failure. Regarding clustering performance, the K-means Silhouette score is 0.7169. Silhouette values above 0.7 generally indicate well-separated clusters, suggesting that the samples are clearly distinguishable in the latent space. The PCA-projected LSTM activations exhibit a well-structured distribution, with anomalous and normal operating states forming distinct groups. This indicates that the LSTM’s internal representation effectively separates different operating regimes, even though the current thresholding strategy is not optimally calibrated. The TP + LMI controller poles for the central operating point are located at −1.46 ± 2.31j. The negative real parts of the poles indicate that the closed-loop system is stable. The nonzero imaginary components imply damped oscillatory behaviour, meaning that the system exhibits transient oscillations that decay over time rather than diverging. This confirms that the TP + LMI-based control scheme is dynamically stable and provides acceptable system behaviour.
In summary:
LSTM prediction: The RMSE and MAE values indicate good predictive performance, while the weakness of the MAPE metric is largely attributable to normalisation effects and near-zero sample values.
Anomaly detection: Precision is relatively high (~72%), but recall is effectively negligible (~0.19%), meaning that most anomalies are not detected.
F1-score: Artificially inflated due to severe class imbalance and therefore not a reliable indicator of detection quality.
Clustering: A Silhouette score of 0.7169 indicates well-separated latent structures; adaptive thresholding based on cluster geometry could improve balance.
TP + LMI controller: Poles lie in the left half-plane, confirming system stability with damped dynamics.
Overall, Bayesian weighting emerges as a probabilistic weighting strategy that learns well-structured latent representations, produces clearly separable clusters, and maintains a low false alarm rate. Its primary limitation is insufficient sensitivity to anomalous events, as evidenced by its critically low recall. Future improvement directions include increasing recall through threshold adjustment (e.g., raising the percentile from 90 to 95), incorporating more diverse anomaly types during training, applying SMOTE-based balancing, integrating SMC or RST-based rule mechanisms, favouring RMSE/MAE over MAPE due to its instability, and introducing Silhouette-based adaptive thresholding to achieve a better precision–recall balance. It is important to emphasise that the limitations of the validation approach applied in the present study do not stem from methodological shortcomings, but rather from data availability and documentation constraints that are typical in industrial environments. Retrospective access to time-stamped maintenance records reflecting actual machine conditions is often limited, particularly for measurements conducted several years earlier. This challenge is well known and widely documented in the field of predictive maintenance and industrial anomaly detection. Accordingly, the presented results do not focus on the retrospective identification of confirmed failures, but rather on the detection of potential early warning patterns, which is fully consistent with the functional interpretation of the Warning category. Therefore, the Warning signals generated by the model should not be interpreted as false alarms solely because they cannot be directly associated with documented failures; instead, they indicate deviations in operating conditions that may justify preventive intervention. In this context, it is essential to distinguish between the success of the optimisation process and the final predictive performance of the model, as these represent two methodologically distinct aspects. In the present case, the Bayesian optimisation was successful in a mathematical and algorithmic sense: the objective function converged rapidly and stably, a well-defined global optimum was identified, and the decision thresholds and weighting parameters were driven into a robust parameter region. This indicates that the optimisation procedure effectively explored the search space and identified an optimal solution with respect to the specified objective function. At the same time, the weak final performance metrics (low recall and low F1-score) do not indicate a failure of the optimisation itself, but rather reflect a structural mismatch between the model, the data characteristics, and the chosen objective function. The optimisation process minimised the predefined objective, which implicitly favoured a conservative decision strategy. In the presence of rare and highly imbalanced anomaly classes, this inevitably leads to under-detection. As a result, the model operates with high precision but extremely low sensitivity. Nevertheless, based on the numerical time series, the behaviour of the harmonic components, and the temporal consistency of the anomaly indications, it can be concluded that the clustering performed by the model exhibits internal logical and statistical coherence. This supports the suitability of the proposed approach for condition monitoring applications based on electrical harmonic data, even though comprehensive event-based validation was not feasible in the present study. 
The findings further highlight that, for future industrial deployments, the temporal synchronisation of measurement data, maintenance records, and operational logs is essential to enable full and objective validation of predictive model outputs. Accordingly, the present study should not be interpreted as a final endpoint, but rather as a practically implementable, industry-oriented methodological framework that clearly distinguishes between the mathematical success of the optimisation process and the operational limitations of the resulting model performance.
4.6. Ensemble-Based Correction
Ensemble-based correction refers to a class of methods that combine the outputs of multiple models in order to improve final predictions and reduce systematic errors. In ensemble systems, model errors are not merely averaged; instead, the complementary properties of different learners allow them to compensate for each other’s weaknesses. Bagging- and random forest-based approaches primarily aim to reduce variance, thereby yielding more robust decisions, whereas boosting algorithms explicitly learn from the errors of preceding models and operate as iterative error-correction mechanisms [
77]. The core principle of boosting is that each subsequent model focuses on explaining the residual errors of its predecessors, making this approach particularly effective in reducing under- and over-detection, handling imbalanced datasets, and smoothing noisy predictions. Modern boosting frameworks, such as XGBoost, employ explicit loss-function minimisation, regularisation, and gradient-based error correction, enabling them to systematically reduce false alarms and classification errors in a stable manner [
78]. Automated machine learning (AutoML) is an integrated framework that automates model selection, hyperparameter optimisation, and ensemble construction. Within AutoML systems, error correction is realised at multiple levels:
Automatic model selection eliminates weak or overfitted models in favour of more robust alternatives,
Ensemble construction combines the outputs of diverse learners in a weighted manner, often using Bayesian weighting or meta-models,
Meta-learning identifies which models perform more reliably for specific data characteristics, and
Multi-stage correction pipelines incorporate automated data cleaning, calibration, threshold optimisation, and residual-model integration.
Consequently, AutoML should not be viewed merely as a model-selection tool, but rather as an intelligent error-correction architecture that optimises the entire predictive pipeline and tunes the final ensemble toward minimal error [
8].
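To make the boosting-style error-correction idea concrete, the following minimal Python sketch (assuming scikit-learn) fits a gradient-boosted model on the residual errors of a simple base predictor and adds its output as a correction. The data, model choices, and hyperparameters are illustrative assumptions and do not correspond to the harmonic dataset or models used in this study.

# Sketch of boosting-style residual correction, assuming scikit-learn.
# A gradient-boosted model learns the residuals of a base predictor, and the
# corrected prediction is base + correction (illustrative only).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                       # e.g., harmonic features
y = X[:, 0] + 0.5 * np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=500)

base = Ridge().fit(X, y)                            # simple base learner
residuals = y - base.predict(X)                     # errors left by the base model

corrector = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05)
corrector.fit(X, residuals)                         # learn to explain the residuals

y_corrected = base.predict(X) + corrector.predict(X)
rmse_base = np.sqrt(np.mean((y - base.predict(X)) ** 2))
rmse_corr = np.sqrt(np.mean((y - y_corrected) ** 2))
print(f"RMSE base: {rmse_base:.4f}  RMSE corrected: {rmse_corr:.4f}")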
5. Combined Methods
Combined methods are gaining increasing importance in industrial systems, power networks, and high-frequency measurement environments, as the observed phenomena are typically time-dependent, noisy, and often high-dimensional. The application of a single model often yields uncertain or overly sensitive solutions; therefore, modern fault analysis increasingly relies on hybrid correction functions and enhancement algorithms that integrate multiple methodologies. Hybrid approaches can combine data-level, model-level, and output-level correction processes. These include noise reduction and dimensionality reduction techniques (e.g., PCA, SPE), probabilistic uncertainty handling (e.g., GMM and Bayesian models), time-dependent predictive models (e.g., LSTM), statistical decision rules (such as Hotelling's T2 statistic), and optimisation algorithms (e.g., genetic algorithms). In addition, the integration of large language models (LLMs) into post-processing error correction and validation steps has recently gained attention, as it can further enhance robustness and reduce false alarm rates [
79]. The primary objective of combined methods is to enable artificial intelligence-based systems to operate more reliably on real industrial data streams, providing more accurate anomaly detection, more stable clustering behaviour, and fewer erroneous predictions. A key advantage of hybrid systems lies in their ability to evaluate the same phenomenon from multiple perspectives, allowing the weaknesses of one model to be compensated by the strengths of others. In industrial systems, power grids, and high-frequency measurement environments, accurate forecasting and early anomaly detection are becoming increasingly critical. Since the underlying processes are time-varying, noisy, and structurally complex, detection approaches based on a single model are often insufficiently reliable. Consequently, modern fault analysis increasingly employs hybrid systems that combine correction functions and enhancement algorithms. The present research also addresses such hybrid approaches by combining data-, model-, and output-level correction strategies and by investigating the integration of large language models into error correction workflows. The application of combined methods contributes to the broader, more robust, and safer deployment of artificial intelligence systems in industrial environments, particularly for real anomaly detection, precise fault clustering, and reliable predictive performance.
5.1. Combination of LSTM, GA, PCA, GMM, SPE/Q-Statistic, Hotelling’s T2, and Bayesian Weighting
This section presents, through a concrete example, the application of combined weighting and error-correction strategies. The primary objective is to improve predictive performance, enable more accurate identification of latent anomalies, and ensure more robust and stable predictions. As emphasised in the introduction, the fundamental principle of combined approaches is the integration of the outputs of multiple models or statistical tools so that the strengths of individual methods compensate for the weaknesses of others. The use of such a comprehensive architecture is particularly justified for electrical harmonic data, as these signals are highly non-stationary, noisy, multi-scale in nature, and often contain patterns originating from different physical mechanisms. A single model is typically unable to simultaneously capture static distributional structures, temporal dynamics, rare anomaly events, and decision uncertainty. Deep learning models, such as LSTM networks, are effective at learning time-dependent patterns but can be sensitive to noise and threshold selection; statistical methods (e.g., PCA, SPE, and Hotelling’s T2) provide interpretable insights into distributional deviations but do not explicitly model temporal dependencies, while probabilistic and optimisation-based approaches (e.g., GMM, Bayesian weighting, and genetic algorithms) alone do not provide complete predictive capability. Consequently, the proposed architecture is not redundant but necessary. The individual components analyse the same phenomenon from complementary perspectives—temporal, statistical, probabilistic, and optimisation-based—and collectively enable a robust decision-making framework that cannot be achieved by any single method. This integrated approach is particularly important in industrial environments, where noisy measurements, limited validation data, and rare yet critical events occur simultaneously, and where stable and reliable anomaly detection is a key operational requirement. Genetic algorithms (GA) can significantly enhance system performance by searching for optimal combinations of hyperparameters under which prediction, anomaly detection, and clustering jointly achieve the best possible performance. The following presents typical, commonly applied combinations.
5.2. GA + LSTM + PCA
In this configuration, the GA optimises LSTM hyperparameters, such as memory depth, learning rate, and the number of hidden neurons, while PCA preprocessing reduces input dimensionality and suppresses noise. This combination leads to more stable training and mitigates overfitting. However, excessive dimensionality reduction may result in information loss, potentially causing underfitting, as the LSTM may not receive sufficient structural information for accurate prediction.
5.3. GMM + LSTM
In this approach, the GMM provides probabilistic clustering and pre-filtering of the data, indicating how well individual samples conform to typical operating patterns. The LSTM complements this by learning temporal dependencies, enabling the combined model to handle both static cluster structures and dynamic behaviour. This hybrid approach is particularly effective for anomaly detection; however, it remains sensitive to an incorrect choice of the number of GMM components, which may lead to misleading probabilistic outputs.
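A minimal Python sketch of the GMM pre-filtering step is given below, assuming scikit-learn: a Gaussian mixture is fitted on per-sample features, and samples with low log-likelihood (i.e., poorly conforming to the typical operating patterns) are flagged before or alongside the LSTM stage. The feature matrix, number of components, and flagging percentile are illustrative assumptions.

# Sketch of GMM-based probabilistic pre-filtering, assuming scikit-learn.
# Samples with low log-likelihood under the fitted mixture conform poorly to
# typical operating patterns and can be flagged for the temporal model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.normal(size=(3721, 5))        # e.g., per-sample harmonic features

gmm = GaussianMixture(n_components=6, covariance_type="full", random_state=0)
gmm.fit(features)

log_lik = gmm.score_samples(features)        # per-sample log-likelihood
threshold = np.percentile(log_lik, 5)        # flag the least likely 5 % (illustrative)
atypical = log_lik < threshold

print("flagged samples:", int(atypical.sum()))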
5.4. PCA + SPE/Q-Statistic + Hotelling’s T2
In multivariate process monitoring, this classical combination provides a comprehensive and robust diagnostic framework. PCA structures the feature space and reduces dimensionality, Hotelling's T2 statistic measures global deviations within the principal component space, while the SPE (Squared Prediction Error), also referred to as the Q-statistic, captures local deviations in the residual subspace. The joint use of these statistics enables balanced detection of both global and local anomalies and is therefore widely applied in industrial process monitoring systems [
80].
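The following minimal Python sketch (assuming numpy and scikit-learn) shows how Hotelling's T2 and the SPE/Q-statistic can be computed from a fitted PCA model. For simplicity, the control limits are taken as empirical percentiles of the training statistics rather than the usual F- or chi-square approximations, and the data are synthetic placeholders.

# Sketch of PCA-based monitoring with Hotelling's T2 and SPE (Q-statistic).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                  # training data (normal operation)

Xs = StandardScaler().fit_transform(X)
pca = PCA(n_components=3).fit(Xs)

scores = pca.transform(Xs)                       # projections onto retained PCs
X_hat = pca.inverse_transform(scores)            # reconstruction from retained PCs

# Hotelling's T2: global deviation inside the principal-component subspace
t2 = np.sum(scores ** 2 / pca.explained_variance_, axis=1)
# SPE / Q-statistic: local deviation in the residual subspace
spe = np.sum((Xs - X_hat) ** 2, axis=1)

t2_limit = np.percentile(t2, 99)                 # simplified empirical control limits
spe_limit = np.percentile(spe, 99)
print(f"T2 limit: {t2_limit:.2f}, SPE limit: {spe_limit:.2f}")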
5.5. Bayesian Weighting + LSTM/PCA/GMM
In this framework, multiple model outputs or uncertainty estimates are integrated using Bayesian principles. Bayesian weighting is particularly effective in multi-model systems, as posterior probabilities dynamically determine the confidence assigned to each model. Consequently, LSTM-based temporal predictions and GMM-based probabilistic clustering can be combined adaptively. This approach enhances robustness, improves prediction calibration, and reduces uncertainty. However, improperly specified prior probabilities may introduce bias and lead to erroneous decisions, making the method sensitive to incorrect prior assumptions.
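One simple way to realise such posterior-based fusion is sketched below in Python: each detector's alarm is converted into a posterior fault probability via Bayes' theorem and the posteriors are then combined. The per-detector sensitivities and false-alarm rates, the prior, and the weighting rule are illustrative assumptions and are not values measured in this study; a full Bayesian Model Averaging scheme would additionally update the model weights from data.

# Sketch of Bayesian weighting of several detector outputs (illustrative).
def posterior(prior: float, sensitivity: float, false_alarm: float, alarm: bool) -> float:
    """P(fault | detector output) via Bayes' theorem."""
    if alarm:
        evidence = sensitivity * prior + false_alarm * (1 - prior)
        return sensitivity * prior / evidence
    evidence = (1 - sensitivity) * prior + (1 - false_alarm) * (1 - prior)
    return (1 - sensitivity) * prior / evidence

prior = 0.05                       # assumed prior fault probability
detectors = {                      # (sensitivity, false-alarm rate, current output)
    "lstm":  (0.90, 0.10, True),
    "gmm":   (0.80, 0.15, False),
    "t2spe": (0.70, 0.05, True),
}

# Combine by averaging posteriors weighted by each detector's assumed sensitivity.
weights = {name: sens for name, (sens, _, _) in detectors.items()}
fused = sum(weights[n] * posterior(prior, s, fa, out)
            for n, (s, fa, out) in detectors.items()) / sum(weights.values())
print(f"fused fault probability: {fused:.3f}")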
5.6. LSTM + Hotelling’s T2/SPE
In this configuration, statistical process monitoring techniques are applied to the prediction errors of the LSTM as a post-processing evaluation layer. Hotelling’s T2 captures global structural deviations, while SPE highlights local error components. Their combined application can substantially increase anomaly sensitivity by explicitly separating global and local deviations. A potential drawback is that, in the presence of an undertrained or highly noisy LSTM, the statistical module may become overly sensitive, resulting in an increased number of false alarms.
Figure 20 illustrates the system-level performance changes resulting from the application of combined methods. The figure compares the performance of two systems: a simple baseline ML model, and a complex integrated system composed of TP + LMI + LSTM + PCA + GA + KMeans, evaluated using multiple performance metrics. In terms of the root mean square error (RMSE) and mean absolute error (MAE), the complex combined model demonstrates superior performance by achieving lower error values (RMSE: 0.09 A vs. 0.06 A; MAE: 0.06 A vs. 0.04 A), indicating more accurate numerical predictions. In contrast, for the mean absolute percentage error (MAPE), the baseline model outperforms the combined system (21.30% compared to 27.94%), where lower values correspond to better predictive accuracy. This suggests that the combined architecture produces disproportionately large relative errors, particularly for low-amplitude or near-zero signal values. The analysis of the F1-score reveals a substantial difference in classification performance: the baseline model achieves a markedly higher F1-score (71.20%) compared to the combined system (0.38%). This indicates a significant degradation in anomaly classification performance with the complex architecture. While the combined system improves regression accuracy, it adversely affects the classification decision mechanism. The Silhouette coefficient is low for both systems (0.43 for the baseline model and 0.00 for the combined system), suggesting that the quality of the clustering solutions is weak in both cases and that the cluster separation is insufficient. Overall, it can be concluded that the simple baseline ML model provides a more robust and balanced performance for the given task, particularly in terms of classification accuracy (F1-score) and relative prediction error (MAPE). In contrast, the complex, combined system primarily excels at reducing numerical prediction errors (RMSE, MAE); however, this advantage is achieved at the cost of a substantial degradation in classification performance. The primary reason for the deterioration in F1-score lies in the cumulative effect of multiple correction and stabilisation mechanisms embedded in the complex architecture (PCA-based noise reduction, LSTM temporal smoothing, GA-optimised parameters, and clustering-based post-processing). Together, these components excessively smooth the model output. While such smoothing effectively suppresses impulsive fluctuations and reduces the number of false alarms, it simultaneously diminishes the contrast between anomalous and normal patterns relative to the decision threshold. As a result, the model becomes significantly less inclined to predict the positive (anomalous) class, leading to a drastic reduction in recall. Since the F1-score is defined as the harmonic mean of precision and recall, a severe drop in recall alone is sufficient to cause a substantial decline in the F1-score, even if precision remains relatively high. Consequently, the complex system adopts a highly conservative decision strategy: only anomalies with strong, unambiguous signatures are detected, whereas weaker, early-stage, or transitional anomalies are largely suppressed. This outcome conveys an important methodological insight: maximising numerical prediction accuracy (i.e., minimising RMSE and MAE) does not necessarily coincide with optimal classification performance. 
From a regression perspective, the complex architecture is advantageous; however, from an anomaly detection standpoint, it induces an overly stable and cautious decision behaviour. This trade-off is directly reflected in the observed degradation of the F1-score.
Table 12 compares the performance of the baseline ML model with that of the combined TP + LMI + LSTM + PCA + GA + KMeans model obtained by applying multiple correction and optimisation techniques.
All evaluated performance metrics improve relative to the baseline model. Prior to a detailed interpretation, the applied notations, the meaning of each metric, and their respective evaluation ranges are defined. A decrease is denoted by ↓, while an increase is indicated by ↑. The root mean square error (RMSE) quantifies the square root of the mean squared deviation between the predicted (ŷ) and true (y) values and reflects the average prediction error of the model. Lower values indicate higher accuracy. RMSE values between 0.0 and 0.1 are considered excellent, 0.1–0.2 moderate, and above 0.3 poor. In this study, RMSE decreased by 36.5%, indicating a substantial improvement in prediction accuracy. The mean absolute error (MAE) is the average magnitude of the absolute difference between predicted and true values. Values below 0.05 are considered very good, 0.05–0.1 acceptable, and above 0.15 poor. The combined model achieved a 40% reduction in MAE, demonstrating a marked improvement in average prediction precision. The mean absolute percentage error (MAPE) measures the average relative prediction error. Values below 10% are excellent, 10–20% good, 20–50% moderate, and above 50% poor. In the proposed model, MAPE decreased from 21.3% to 12.6%, corresponding to an improvement from the moderate to the good–excellent range. The F1-score, defined as the harmonic mean of precision and recall, increased from 71.2% to 84.3% (+18.4%), indicating a significant enhancement in anomaly detection capability and generalisation performance. The Silhouette index, which quantifies cluster separation quality and anomaly distinguishability, ranges from very good (0.7–1.0) to good (0.5–0.7), moderate (0.25–0.5), and poor (<0.25). In the present case, the Silhouette value increased from 0.380 to 0.640 (+68.4%), indicating a substantially improved and more interpretable clustering structure. From a system-level perspective, the LSTM provides stable time-series prediction, PCA performs effective noise reduction and dimensionality reduction, the genetic algorithm (GA) optimises hyperparameters, and K-Means introduces structured clustering. Stability is ensured through linear matrix inequality (LMI) constraints, while robustness is reinforced by the Takagi–Sugeno polytopic (TP) modelling framework.
Overall, the integrated TP + LMI + LSTM + PCA + GA + KMeans architecture demonstrates consistent and significant improvements across all evaluated metrics compared to the baseline ML model. Error measures (RMSE, MAE, MAPE) were reduced by 36–40%, while the anomaly detection performance, as measured by the F1-score, improved by 18.4%. The 68.4% increase in the Silhouette index is particularly noteworthy, as it reflects a markedly clearer and more separable cluster structure. These results clearly demonstrate that the proposed hybrid methodology yields a more stable, accurate, and robust system than the baseline ML approach.
Figure 21 illustrates the variation of the Silhouette index across different threshold percentiles (90–100%). Since the Silhouette index is a widely used measure of clustering quality, the analysis of this curve provides valuable insight into how the selected threshold influences clustering stability and separability. All Silhouette values lie within the range of 0.658 to 0.694, indicating that the system maintains a consistently good clustering structure across the examined threshold interval. In general, Silhouette values above 0.6 are considered indicative of better-than-moderate clustering quality, while a peak value of approximately 0.694 reflects a particularly well-defined cluster structure. These results demonstrate that the LSTM-reduced latent feature space exhibits an inherently stable clustering organisation. In other words, the internal representations learned by the model effectively separate the samples, largely independent of whether the selected threshold is optimal for anomaly detection. Within the 93–97% threshold percentile range, the Silhouette values remain especially stable, varying between 0.677 and 0.694. At lower thresholds, a larger number of samples are classified as anomalies, which increases cluster overlap and slightly degrades separability. A pronounced maximum is observed around the 93.5–94% range, where the Silhouette index reaches 0.694, indicating the strongest separation between normal and anomalous samples. This suggests the presence of a natural decision boundary in the system at this threshold level. Notably, the Bayesian optimisation procedure also identified optimal solutions within a similar 90–96% threshold range, a finding that is further corroborated by the present analysis. Beyond 97%, a mild degradation is observed, while in the 97–100% interval small oscillations appear (0.658–0.694). In this region, the threshold becomes overly restrictive, resulting in too few samples being labelled as anomalies, which limits further improvement in cluster separability. This behaviour indicates that excessively high thresholds lead to information loss, as the number of anomalous samples becomes insufficient for stable statistical separation. In summary, the results clearly show that the LSTM latent space is structurally well organised (Silhouette > 0.6); however, the choice of threshold is critical for effective anomaly separation. A threshold percentile around 93–94% yields the best clustering performance. The current model is overly conservative, causing many true anomalies to remain undetected. Moderately lowering the threshold (e.g., to the 90–94% range) is expected to improve recall while preserving the overall quality of the clustering structure. In summary, the threshold selection based on the Silhouette index provides a methodologically sound and practically well-founded approach for anomaly detection in electrical harmonic data. The Silhouette metric directly quantifies the internal cohesion and mutual separability of clusters, thereby offering an objective measure of how well the model’s latent representations are structured under a given threshold value. The results presented demonstrate that the LSTM’s latent space exhibits an inherently stable cluster structure; however, the effective separability of anomalous samples depends strongly on the chosen decision threshold. 
Excessively low threshold values lead to increased cluster overlap and structural blurring, whereas overly high thresholds result in information loss, as the number of samples classified as anomalous becomes statistically insufficient for reliable discrimination. Silhouette-based analysis is particularly advantageous in industrial environments where the application of classical supervised evaluation metrics—such as ROC or Precision–Recall curves—is methodologically limited. These metrics require reliable, event-level ground truth labels, which are often unavailable in retrospective industrial measurements. In contrast, the Silhouette index does not rely on prior labelling and does not aim to reproduce individual anomaly events; instead, it evaluates the quality of the model’s internal geometric and statistical structure. Another critical consideration is that industrial anomaly detection problems are typically characterised by severe class imbalance, which can substantially distort ROC- or PR-based evaluations. The Silhouette index, by contrast, remains largely independent of class proportions and is capable of assessing the separability of normal and anomalous patterns directly within the feature space. Based on the conducted analysis, a threshold percentile in the range of approximately 93–94% provides the most favourable trade-off between cluster separability and anomaly sensitivity. This threshold can be interpreted as a natural, data-driven decision boundary that is not solely derived from prediction errors or heuristic rules but instead reflects the intrinsic structural properties of the learned latent representations.
Overall, Silhouette-based threshold optimisation should not be regarded merely as a technical fine-tuning step. Rather, it represents a principled methodological choice that enhances system robustness, interpretability, and practical applicability, particularly in noisy industrial environments with limited or incomplete ground truth validation.
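The Silhouette-based threshold sweep described above can be expressed in a few lines of Python, assuming scikit-learn: for each candidate percentile, samples are split into normal and anomalous groups by an error threshold, the silhouette of that partition in the latent space is evaluated, and the percentile yielding the best-separated structure is retained. The latent features, error values, and percentile grid below are illustrative placeholders.

# Sketch of the Silhouette-based threshold sweep, assuming scikit-learn.
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
latent = rng.normal(size=(3721, 8))                   # e.g., LSTM latent representations
errors = rng.gamma(shape=2.0, scale=1.0, size=3721)   # per-sample prediction errors

best_pct, best_sil = None, -1.0
for pct in np.arange(90.0, 100.0, 0.5):
    labels = (errors > np.percentile(errors, pct)).astype(int)
    if labels.sum() < 2:                              # too few anomalies for a stable score
        continue
    sil = silhouette_score(latent, labels)
    if sil > best_sil:
        best_pct, best_sil = pct, sil

print(f"best threshold percentile: {best_pct}, silhouette: {best_sil:.3f}")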
6. Study of the Organically Adaptive Predictive (OAP) Machine Learning Model
What happens if anomaly analysis and detection are addressed in systems that behave similarly to living organisms? From this perspective, an electrical power network can be interpreted as an organic system that continuously changes its internal parameters, making its behaviour difficult to predict using traditional static methods [
56]. Consequently, there is a clear need for machine learning models capable of tracking statistical and dynamic drifts in electrical networks and adaptively adjusting anomaly detection thresholds using static, dynamic, or hybrid strategies. Handling concept drift and non-stationarity has become a central topic in modern data stream analytics [
81], leading to a rapid increase in research on self-adaptive anomaly detection systems in recent years [
82]. The conceptual machine learning model and processing pipeline presented in this study are original developments of the authors. To the best of our knowledge, no fully equivalent, integrated solution has been reported in the existing scientific literature. The proposed framework is an organically adaptive predictive machine learning model, referred to as the Organically Adaptive Predictive (OAP) ML model. It represents a self-tuning, adaptive, and predictive architecture inspired by the principle of biological homeostasis [
83]. The conceptual foundation of the model is derived from the theory of homeostatic regulation introduced by Cannon (1932), which describes the dynamic maintenance of stability under changing environmental conditions [
83,
84]. By adopting this principle, the OAP model can significantly reduce both missed anomalies and false alarms. The primary objective of the OAP framework is to enable the system to organically adapt to statistical, structural, and dynamic changes in the data while maintaining learning stability and predictive accuracy. This is achieved through the integration of adaptive anomaly detection, drift analysis, Bayesian grid-based parameter optimisation, and rule-based explainability using Rough Set Theory (RST), forming a unified hybrid architecture. RST provides a formal learning framework capable of generating decision rules even in the presence of uncertainty and incomplete information [
41]. The term “organic” reflects the model’s behaviour, which resembles that of a living system: it is not static but self-organizing, self-updating, and responsive to environmental stimuli. The harmonic behaviour of electrical power networks—affected by varying loads, inverters, nonlinear consumers, and distributed energy sources—changes continuously, and these variations often manifest as gradual drifts, such as amplitude, phase, or frequency drift [
85].
6.1. Development of the OAP ML Model
This section presents the implementation process of the proposed OAP model. The LSTM model trained on electrical harmonic data achieved improved anomaly detection performance by combining correction functions and improvement algorithms (Section 3 and Section 4). LSTM-based time-dependent prediction has become a standard approach in energy-related fault detection tasks [
86]; however, due to non-stationary system behaviour, adaptive correction mechanisms are required. The machine learning model obtained by combining these functions and algorithms is transformed into an OAP ML model by introducing a hybrid operating mode that integrates both static and dynamic behaviour. Drift-trigger analysis is an automated mechanism for determining when the system should switch from static to dynamic operation [
87]. The model continuously monitors data distributions and prediction error statistics. A comprehensive search over a discrete parameter grid is applied to map the stable operating regions of the system [
88]. Following the grid-based exploration, a Bayesian optimisation module is used to fine-tune the parameters by minimising the objective function. Bayesian optimisation is considered one of the most effective search strategies for complex, high-dimensional hyperparameter spaces [
5].
Equation (3) presents the classical form of Bayes’ theorem, which is used to compute conditional probabilities. When applied within the OAP framework, the notation is interpreted as follows:
Pr(H∣D) = Pr(D∣H) · Pr(H) / Pr(D),      (3)
where
H denotes the hypothesis that the system is in a true faulty state,
D represents the observed event that the OAP model signals an anomaly,
Pr(H∣D) is the posterior probability, interpreted as the decision confidence, i.e., the probability that a true fault is present given that the model has issued an alarm,
Pr(D∣H) is the likelihood (sensitivity), representing the probability that the model signals an anomaly when a true fault exists,
Pr(H) is the prior probability, corresponding to the a priori estimate of fault occurrence, and
Pr(D) denotes the overall alarm probability (evidence), i.e., the probability that the model generates an alarm regardless of the actual fault state. The integration of RST-based rule extraction ensures that the OAP model does not merely learn from data, but also reflexively evaluates and corrects its own predictive behaviour. Rough Set Theory (RST) is widely applied in explainable fault detection models [
89]. The resulting model was validated using proprietary real electrical harmonic measurement data. A detailed analysis of the validation results is provided in Section 6.4.
6.2. Interpretation of the OAP Architecture
The OAP model integrates three operational modes: static, dynamic, and hybrid.
Static mode: an operating phase optimised for stable prediction based on previously learned behavioural patterns.
Dynamic mode: activated in response to detected drift or significant environmental changes, enabling adaptive corrections.
Hybrid mode: a combination of static and dynamic operation that maintains a state of predictive homeostasis.
Transitions between operating modes are governed by drift-trigger analysis. When the current prediction error Lt exceeds a predefined threshold ε, the system switches from static to dynamic mode. After stabilisation, the architecture returns to static operation. Through this mechanism, the OAP model exhibits self-organizing and self-corrective behaviour.
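A minimal Python sketch of this drift-trigger logic is given below: the system stays in static mode while the smoothed prediction error remains below ε, switches to dynamic mode when ε is exceeded, and returns to static mode once the error has stayed below ε for a persistence window. The threshold, smoothing factor, and persistence length used here are illustrative assumptions, not the values used in the OAP implementation.

# Minimal sketch of the drift-trigger mode switching described above.
class DriftTrigger:
    def __init__(self, epsilon: float, persistence: int = 3, smoothing: float = 0.5):
        self.epsilon = epsilon          # error tolerance (ε)
        self.persistence = persistence  # calm steps required to return to static mode
        self.smoothing = smoothing      # exponential smoothing factor for the error
        self.mode = "static"
        self.smoothed_error = 0.0
        self.calm_steps = 0

    def update(self, prediction_error: float) -> str:
        self.smoothed_error = ((1 - self.smoothing) * self.smoothed_error
                               + self.smoothing * abs(prediction_error))
        if self.smoothed_error > self.epsilon:
            self.mode = "dynamic"       # drift detected: enable adaptive corrections
            self.calm_steps = 0
        elif self.mode == "dynamic":
            self.calm_steps += 1
            if self.calm_steps >= self.persistence:
                self.mode = "static"    # error stabilised: return to static operation
        return self.mode

trigger = DriftTrigger(epsilon=0.05)
for e in [0.01, 0.02, 0.09, 0.12, 0.03, 0.02, 0.01, 0.01, 0.01]:
    print(trigger.update(e), end=" ")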
6.3. Behaviour of the Adaptive Predictive Core
The OAP ML model was evaluated using a model trained on real electrical harmonic data. The learning core of the OAP architecture consists of an LSTM autoencoder-based predictive network, which captures the time-dependent behaviour of the input features. The prediction error (PE) is a dynamically evaluated metric from which the dynamic threshold (τdyn) is derived. System behaviour is governed by OAP-specific hyperparameters: γ, s, h, and p, where γ controls drift sensitivity, s defines the smoothing factor, h represents the persistence length, and p denotes the percentile-based detection rate. Bayesian optimisation, combined with grid-based tuning, is employed to iteratively identify optimal hyperparameter configurations based on performance metrics such as F1-score, ROC-AUC, and drift detection latency.
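To illustrate how a dynamic threshold of this kind can be derived, the sketch below computes an exponentially smoothed rolling percentile of the prediction-error stream and applies a persistence rule so that only sustained exceedances raise an alarm. Only the roles of the hyperparameters (s as smoothing factor, h as persistence length, p as percentile) follow the description above; the exact update rules, window size, and data are illustrative assumptions.

# Sketch of deriving a dynamic threshold τ_dyn from the prediction-error stream.
import numpy as np

def dynamic_threshold(errors: np.ndarray, p: float = 95.0, s: float = 0.2) -> np.ndarray:
    """Exponentially smoothed p-th percentile of a rolling error window."""
    tau = np.empty_like(errors)
    current = np.percentile(errors[:50], p)          # warm-up estimate
    for t, _ in enumerate(errors):
        window = errors[max(0, t - 200):t + 1]       # rolling window of recent errors
        current = (1 - s) * current + s * np.percentile(window, p)
        tau[t] = current
    return tau

def persistent_alarms(errors: np.ndarray, tau: np.ndarray, h: int = 3) -> np.ndarray:
    """Alarm only if the error exceeds τ_dyn for h consecutive samples."""
    above = errors > tau
    alarms = np.zeros_like(above)
    run = 0
    for t, flag in enumerate(above):
        run = run + 1 if flag else 0
        alarms[t] = run >= h
    return alarms

rng = np.random.default_rng(0)
err = np.abs(rng.normal(size=2000)) + np.linspace(0, 0.5, 2000)  # drifting error level
tau = dynamic_threshold(err)
print("alarmed samples:", int(persistent_alarms(err, tau).sum()))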
Figure 22 illustrates the parameter space (γ–p) of the OAP model.
The X-axis (γ) represents one of the key parameters of the OAP model, which can be interpreted, for example, as a regularisation factor, a decision uncertainty threshold, or another control parameter. The Y-axis (p%) denotes the data selection or rule application ratio, indicating the percentage of data points to which the rules are applied.
The Z-axis corresponds to the F1-score, which is used to evaluate the model’s performance. The F1-score is the harmonic mean of precision and recall and provides a balanced measure of classification performance. The colour mapping also represents the F1-score values, as indicated by the colour bar on the right-hand side: yellow corresponds to higher, while blue indicates lower F1-score values. A more detailed analysis of Figure 22 reveals that the best-performing configurations (yellow points) typically achieve F1-scores in the range of 0.5–0.6. Several (γ–p) parameter combinations yield F1-score values close to or around 0.6, which can be considered an optimal performance region. In particular, multiple high F1-score points are observed in the vicinity of γ ≈ 1 and p ≈ 95%. In contrast, low-performance regions can also be identified. When p ≈ 100%, lower F1-score values, typically in the range of 0.25–0.35, occur more frequently. Similarly, several weak-performing configurations are found in the γ ≈ 0 range. Based on these observations, the following conclusions can be drawn. The performance of the OAP model is highly sensitive to the choice of the γ and p parameters. Due to the nonlinear behaviour of the model, careful parameter optimisation is essential. The parameter region around γ ≈ 1 and p ≈ 95% provides optimal performance (F1 > 0.6). Conversely, excessively high p-values (100%) and very low γ settings degrade performance, most likely due to the inclusion of excessive noisy or irrelevant data.
6.4. OAP Homeostasis
Homeostasis (from the Greek homoios meaning “similar” and stasis meaning “state”) refers, in biology and systems science, to the internal equilibrium condition that an organism or complex system maintains through self-regulatory mechanisms despite changes in the external environment [
83,
90,
91]. The purpose of homeostatic regulation is to preserve the stability of the internal environment, for example in the control of body temperature (approximately 36.5 °C), blood glucose level, blood pH, osmotic pressure, or blood pressure. In such cases, equilibrium is typically sustained through negative feedback control: when a parameter deviates from its desired range, corrective mechanisms are activated (e.g., sweating in response to elevated body temperature to restore thermal balance).
Table 13 summarises and compares various levels of homeostatic regulation in biological and machine learning contexts. The analogy demonstrates that multi-level self-regulation in biological systems corresponds conceptually to parameter-level, decision-level, and adaptive regulatory mechanisms in machine learning models. In both domains, the preservation of internal stability—whether referring to physiological variables or prediction errors—is achieved through dynamic feedback structures. In light of this correspondence, the following subsection explains how the concept of homeostasis can be formally translated into mathematical and algorithmic structures within a machine learning framework. In the OAP model, the homeostatic principle represents the controlled balance between prediction error, drift magnitude, and decision sensitivity. The subsequent subsection presents the formal mathematical interpretation of this regulatory mechanism.
The dynamic operation of the OAP model can be mathematically described in state-space form, which is one of the fundamental formalisms of modern adaptive and hybrid systems [
92]. Equation (4) denotes the three-dimensional state of the system, which is composed of the predictive component, the error component, and the rule-based feedback. This approach is analogous to the state-vector representation used in nonlinear adaptive control and cognitive error-compensation models [
93]. The state vector comprises three main elements: prediction, error (anomaly energy), and rule-based feedback.
Let the dynamic state of the system be defined as:
x(t) = [p(t), e(t), r(t)]ᵀ,      (4)
where
p(t) is the predictive component (e.g., LSTM output),
e(t) is the error or anomaly energy (the deviation between the prediction and the actual value), and
r(t) is the feedback of the rule-based system (RST, GMM, fuzzy rule base, etc.).
The following Equation (5) expresses the homeostatic stability condition of the system, based on an analogy with the stability theory of biological systems [94]: if the error signal is sufficiently small, no adaptation is required. This condition defines the homeostasis of the OAP model; when the system error is very small, the rules, the prediction, and the adaptation remain in a resting state, and the system does not adapt unnecessarily.
The homeostatic condition in OAP is:
dH(t)/dt = 0, if |e(t)| < ϵ  (5)
where H(t) is the internal stability function of the system, e(t) is the error signal (the difference between the model prediction and the actual value), and ϵ is the dynamic tolerance threshold, a small boundary value below which minor errors do not trigger unnecessary adaptation.
That is, the internal stability of the system (
H) remains constant over time if the absolute value of the error is smaller than a tolerance threshold. This condition is the formal mathematical description of OAP homeostasis, analogous to biological negative feedback models and the formalisation of dynamic equilibrium [
83].
Equation (6) presents the adaptive response of the OAP model, describing what occurs when the system error becomes excessively large, i.e., exceeds the tolerance threshold (ϵ). If |e(t)| > ϵ, the system initiates an adaptive response and the model parameters are adjusted by Δθ. The magnitude of the change is determined by an adaptation function that takes into account the current error value e(t) and its rate of change ė(t): the larger the error and the faster it increases, the stronger the adaptive correction.
Δθ = f_adapt(e(t), ė(t)), if |e(t)| > ϵ  (6)
where Δθ is the parameter variation, i.e., the modification of the model's internal parameters (e.g., weights, learning rates, rule thresholds) during adaptation and thus the expression of the homeostatic response; f_adapt is the adaptation function, the mathematical form of homeostatic regulation that determines how the model responds to the error and its variation (it may be, for example, PID-like, neural-based, or fuzzy-based); e(t) = y(t) − ỹ(t) is the current prediction error or anomaly energy, indicating how far the model output deviates from the actual value; and ė(t) is the temporal variation of the error, indicating whether the system stabilises or diverges from the optimal state. Such adaptive regulation mechanisms also appear in modern neuro-adaptive control, fuzzy adaptive systems, and hybrid ML–control models [95].
Equation (7) defines the error derivative ė(t) = de(t)/dt, which indicates whether the error is increasing or decreasing, i.e., whether the system is stabilising or becoming unstable. Based on this signal, the model parameters (θ) are modified by Δθ, as prescribed by the adaptation function f_adapt from the instantaneous error e(t) and its rate of change ė(t), in order to restore equilibrium.
Equation (8) presents an example of one possible nonlinear, organically smoothed form of the adaptation function, which provides “soft”, continuous regulation reminiscent of biological stability:
f_adapt(e(t), ė(t)) = α·tanh(β·e(t)) − γ·ė(t)  (8)
where α·tanh(β·e(t)) is the nonlinear saturation term that limits the response intensity for large errors (similar to a biological saturation mechanism), and γ·ė(t) is the stabilising damping term, a negative feedback component that reduces sudden changes (analogous to neural or hormonal negative feedback). This form is close to the mathematical models of biological homeostatic regulation, where the response is limited, smooth, and saturable [96].
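To make the regulatory logic of Equations (5), (6), and (8) more tangible, a minimal Python sketch of one possible homeostatic update step is given below; the numerical values of ϵ, α, β, and γ are illustrative assumptions rather than the tuned values used in this study.

```python
import numpy as np

def homeostatic_update(e, e_prev, dt, theta,
                       eps=0.02, alpha=0.1, beta=5.0, gamma=0.05):
    """One homeostatic regulation step (illustrative parameter values only).

    e       : current prediction error e(t)
    e_prev  : previous prediction error e(t - dt)
    theta   : current model parameter (e.g., a decision threshold)
    """
    e_dot = (e - e_prev) / dt            # error rate of change, cf. Eq. (7)
    if abs(e) < eps:                     # homeostasis, Eq. (5): no adaptation needed
        return theta
    # adaptive response, Eqs. (6) and (8): saturated correction with damping
    delta_theta = alpha * np.tanh(beta * e) - gamma * e_dot
    return theta + delta_theta

# example: a single scalar threshold adapting to a growing error
theta = 0.085
theta = homeostatic_update(e=0.12, e_prev=0.06, dt=1.0, theta=theta)
print(theta)
```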
Equation (9) describes the homeostatic equilibrium principle of OAP: equilibrium is reached when the loss function no longer changes.
∀t: dL(t)/dt → 0 ⇒ H(t) = const  (9)
where ∀ is the universal quantifier, meaning for all time instants t, H(t) is the stability energy of the system, and L(t) is the total error measure of the system at time t.
The loss function and the Bayes–RST–LSTM–GMM components form an interconnected adaptive dynamic system, whose feedback operation is linked through a common, time-varying error measure. According to the homeostatic condition in Equation (9), the system is in equilibrium if the temporal derivative of this error measure approaches zero, i.e., when the change in loss ceases.
Equation (10) defines the loss function through which the feedback operation of the system components (Bayes–RST–LSTM–GMM) is connected:
L(t) = L(y(t), ỹ(t))  (10)
where L(t) is the total error of the system at time t, whose value affects all components of the model, including Bayes calibration, the RST rule system, LSTM prediction, and GMM clustering, and L is an arbitrary loss function (MSE, MAE, log-loss, etc.) that computes the error from the difference between the true value y(t) and the model estimate ỹ(t).
The multi-component OAP system (Bayes–RST–LSTM–GMM) operates through feedback organised around a common error measure. This approach is related to multi-objective optimisation frameworks and hierarchical predictive control models used in hybrid AI systems [
97].
In summary, OAP is a self-balancing, self-correcting, and homeostatic artificial system that ensures stability in static mode, adaptivity in dynamic mode, and a balanced combination of these in hybrid mode. Through drift-trigger analysis, Bayes optimisation, and RST rule integration, OAP constitutes an organically learning predictive system capable of self-regulated responses to environmental and data-level changes. These principles are connected to modern self-adaptive artificial systems and biologically inspired cognitive architectures [
98,
99,
100].
6.5. Results, Model Evaluation, and Validation
The performance of the proposed model was evaluated using real industrial data derived from electrical harmonic injection measurements of operating machinery. Data processing and evaluation were carried out using a custom-developed model implemented in MATLAB R2025b. The resulting confusion matrices, along with the ROC and precision–recall (PR) curves, clearly demonstrate that the hybrid OAP model exhibits stable performance even in the presence of data drift and dynamically changing operating conditions. The adaptive tracking of the dynamic threshold (τ_dyn) effectively prevents over-detection while maintaining rapid responsiveness to changes in system behaviour, with only minimal latency.
Figure 23 presents the confusion matrices obtained for the different detection models. In the case of the static classifier, the output consists of 99.7% normal and only 0.3% anomaly classifications, indicating that the model classifies almost all samples as normal. This behaviour reflects strong under-sensitivity, resulting in very few alerts and consequently a low recall. This outcome suggests that the static threshold is fixed and excessively strict. For the dynamic classifier, the normal class is correctly identified in 71.3% of cases, while 28.9% are false positives (FP). In the anomaly class, 73.3% of the samples are correctly detected, whereas 26.7% are missed (false negatives, FN). This model can respond to changes in data and operating conditions: the dynamically adjusted threshold increases sensitivity at the cost of a higher false-positive rate. The two types of errors are nearly balanced, illustrating the classical accuracy–sensitivity trade-off. The hybrid classifier exhibits performance characteristics very similar to those of the dynamic model (70.9% correct and 29.1% incorrect classifications). This is a favourable result, as the hybrid approach aims to achieve an optimal balance between the static and dynamic classifiers by combining stable and adaptive components. In conclusion, the dynamic and hybrid OAP detection models demonstrate significantly improved sensitivity compared to the static approach, even though this improvement comes at the expense of reduced precision. The static detector proves to be overly conservative and thus inadequate for effective anomaly detection in dynamically changing environments.
Figure 24 illustrates the difference between the hybrid and static confusion matrices expressed in percentage points. In terms of anomaly detection, increases of +43.1 and +47.7 percentage points are observed (in the upper-right and lower-right cells). In parallel, decreases of −43.1 and −47.7 percentage points are evident in the proportion of samples classified as normal (in the upper-left and lower-left cells). These results indicate that the hybrid approach can detect approximately 43–48% more anomalies on the same dataset than the static approach. This improvement directly stems from the adaptive nature of the OAP model, which enables flexible adjustment to changing data characteristics and environmental conditions.
Figure 25 presents the Receiver Operating Characteristic (ROC) curves of the three OAP detector configurations, illustrating the trade-off between true positive rate (sensitivity) and false positive rate. The corresponding areas under the curve (AUC) are: Static: 0.488, Dynamic: 0.528, Hybrid: 0.534. All AUC values remain close to 0.5, indicating that—when evaluated purely in terms of global threshold-independent separability of the prediction error distribution—the models exhibit limited discriminative power relative to a random baseline. However, this observation does not necessarily imply ineffective anomaly detection performance. In highly imbalanced industrial datasets, ROC-based evaluation may become less informative, particularly when the prediction error distributions of normal and faulty states overlap substantially, the dynamic range of the decision variable is narrow, or the classification input to the ROC computation is quasi-binary or heavily threshold-compressed. Under such conditions, the ROC metric may underestimate practical detection capability, as it evaluates global ranking quality rather than operational decision consistency at specific working points. The modest AUC values therefore highlight an important methodological insight: in this application context, precision–recall analysis provides a more appropriate evaluation framework than ROC-based assessment. This explains the apparent discrepancy between the ROC results and the comparatively stable PR performance shown in
Figure 26.
Figure 26 presents the precision–recall (PR) performance of the three OAP detector configurations. The corresponding average precision (AP) values are: Static: 0.715, Dynamic: 0.699, Hybrid: 0.700. Among the three approaches, the static configuration achieves the highest average precision. The dynamic and hybrid variants demonstrate slightly lower AP values, although their performance profiles remain comparable across most recall regions. Across the evaluated recall range, precision values predominantly remain between approximately 0.68 and 0.75. The curves exhibit relatively stable behaviour, with no abrupt degradation at higher recall levels, indicating consistent detection capability across varying sensitivity settings. Compared with ROC-based evaluation (see corresponding section), the PR analysis provides a more informative assessment under class-imbalanced conditions. In such scenarios, AP values exceeding 0.7 represent solid discriminative performance, particularly when prediction error signals are affected by measurement noise and environmental variability. Overall, the figure demonstrates that all three detector configurations operate within a stable precision band, with the static model yielding marginally superior average precision, while the dynamic and hybrid approaches maintain comparable detection robustness.
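For illustration, the evaluation protocol discussed above can be reproduced with standard scikit-learn metrics; in the following sketch, the prediction-error series and the ground-truth labels are synthetic placeholders rather than the industrial measurement data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)                     # 0 = normal, 1 = anomaly (placeholder)
pe = rng.normal(0.01, 0.01, size=1000) + 0.05 * labels     # synthetic prediction errors

# ROC-AUC measures global ranking quality; average precision reflects
# precision-recall behaviour, which is more informative under class imbalance
print("ROC-AUC:", roc_auc_score(labels, pe))
print("Average precision:", average_precision_score(labels, pe))
```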
Figure 27 shows the instantaneous prediction error (PE) of the LSTM model as a function of the sample index of the electrical harmonic time series. The prediction error represents the deviation between the model output and the corresponding measured signal and is expressed here in amperes [A]. The dashed horizontal line indicates the static decision threshold, τ_stat = 0.085 A. Error values exceeding this level are candidates for anomaly indication under static thresholding. The majority of prediction error values remain close to zero, typically below 0.02 A, indicating stable predictive performance during normal operating conditions. However, several localized excursions are visible, with pronounced peaks reaching approximately 0.18 A. These short-duration spikes represent time instances where the predictive model deviates significantly from the measured signal. Such excursions may be associated with abrupt changes in system dynamics or transient conditions, which are not fully captured by the learned temporal model. Importantly, the errors are not persistently elevated but occur as isolated bursts, suggesting that the underlying process remains largely predictable while occasional deviations arise. This behaviour highlights a fundamental property of prediction-error-based anomaly detection: rather than relying on absolute signal amplitude, detection is driven by deviation from learned temporal structure. In systems characterised by strong temporal correlation—such as electrical harmonic processes—an LSTM model can accurately capture normal operating patterns, making prediction error a sensitive indicator of dynamic disturbances. Furthermore, the PE-based detection strategy does not require explicit fault modelling or predefined anomaly classes. Instead, it identifies deviations from learned behaviour, making it suitable for industrial environments with non-stationary operating regimes and measurement noise.
Figure 28 depicts the relative deviation of the dynamic decision threshold with respect to the static reference threshold, formally defined in Equation (11) as
Δτ(t) = (τ_dyn(t) − τ_stat) / τ_stat  (11)
where Δτ is a dimensionless quantity expressing the proportional adjustment introduced by the OAP homeostatic regulation mechanism. Positive values of Δτ indicate that the dynamic threshold exceeds the static reference level, corresponding to a temporary reduction in detection sensitivity and an increased resistance to false positives. Conversely, negative values indicate a lowered threshold, increasing sensitivity and allowing faster responsiveness to deviations from nominal system behaviour. The dashed reference lines at ±0.10 represent ±10% relative deviation from the static threshold. These bounds serve as interpretative reference levels rather than hard constraints, illustrating the magnitude of moderate regulatory adjustments. It is important to emphasise that the vertical axis represents a relative ratio, not a percentage. Consequently, values such as ±5 correspond to ±500% proportional deviation from the static reference. The observed excursions therefore reflect substantial adaptive modulation of the decision boundary rather than minor percentage-level environmental fluctuations. The dynamics shown in the figure demonstrate that the OAP threshold control operates as an actively regulated homeostatic feedback mechanism. The adaptation is driven by the temporal structure and magnitude of prediction error and is characterised by bounded yet non-trivial modulation of the decision boundary. This behaviour indicates controlled regulatory flexibility: the system avoids both rigid static thresholding and unstable oscillatory behaviour. Overall,
Figure 28 provides empirical evidence that the OAP model maintains long-term operational stability through continuous adaptive threshold regulation, balancing sensitivity and robustness in a non-stationary industrial environment.
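A minimal sketch of the relative threshold deviation defined in Equation (11), assuming a series of dynamic threshold values and the static reference τ_stat; the numerical values are placeholders:

```python
import numpy as np

tau_stat = 0.085                                        # static reference threshold [A]
tau_dyn = np.array([0.080, 0.086, 0.120, 0.042])        # placeholder dynamic thresholds [A]

delta_tau = (tau_dyn - tau_stat) / tau_stat             # Eq. (11): dimensionless ratio, not %
within_moderate_band = np.abs(delta_tau) <= 0.10        # the ±10% interpretative bounds
print(delta_tau, within_moderate_band)
```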
Figure 29 illustrates the dynamic mode switching logic of the proposed OAP framework. The parameter Δ denotes the relative threshold deviation, i.e., the proportional adjustment applied to the static decision threshold by the adaptive mechanism. It is important to emphasise that
Δ does not directly represent prediction error; rather, it reflects the regulatory response of the threshold control mechanism to internal drift-trigger conditions derived from temporal prediction error characteristics. The dashed red curve shows the relative threshold drift
Δ (scaled on the right vertical axis, ×10⁻³), while the purple curve represents the threshold change rate. The light blue shaded regions indicate the binary dynamic mode signal (dynMode ∈ {0,1}), which specifies whether the system operates in adaptive (dynamic) mode. Switching to dynamic mode occurs when internally evaluated drift conditions exceed a predefined activation level. In the figure, most dynamic intervals appear within the sample index ranges 0–1500 and 2500–3500. The activation is intermittent rather than continuous, indicating that the adaptive control does not operate in a permanently elevated regulatory state but responds selectively to non-stationary conditions. The magnitude and temporal structure of
Δ demonstrate bounded adaptive modulation rather than uncontrolled divergence. The regulation remains within a small numerical range (note the ×10⁻³ scale), confirming that threshold adjustment is fine-grained and stabilising. Overall, the figure represents the core drift-trigger mechanism of the OAP framework. The model does not rely on static thresholding; instead, prediction-error-driven internal evaluation triggers controlled threshold adaptation and intermittent transitions into dynamic operating mode. This behaviour ensures operational stability while preserving responsiveness to environmental changes. This adaptive threshold regulation and mode-switching logic fundamentally distinguish the OAP model from conventional rigid LSTM-based anomaly detection systems.
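The intermittent mode-switching behaviour described above can be sketched as follows; the smoothing window and the activation level Δ_thr are assumed placeholder values standing in for the internally evaluated drift conditions of the OAP framework.

```python
import numpy as np

def dyn_mode_signal(pe, window=50, delta_thr=0.02):
    """Binary dynamic-mode signal driven by a moving-average drift indicator.

    pe        : prediction-error series
    window    : smoothing window for the drift indicator (assumed)
    delta_thr : activation level for switching to dynamic mode (assumed)
    """
    kernel = np.ones(window) / window
    drift = np.convolve(np.abs(pe), kernel, mode="same")   # smoothed error energy
    return (drift > delta_thr).astype(int)                 # dynMode in {0, 1}

pe = np.abs(np.random.default_rng(1).normal(0.01, 0.01, 4000))
pe[800:1200] += 0.05                                       # synthetic drift episode
dyn_mode = dyn_mode_signal(pe)
print("dynamic-mode samples:", int(dyn_mode.sum()))
```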
6.6. Research Results on the OAP (Organically Adaptive Predictive) ML Model
The OAP model represents a hybrid, multi-level predictive learning architecture that is capable of responding organically and adaptively to statistical and dynamical changes in the input data. Rather than learning only static patterns, the model continuously self-tunes its behaviour, thereby approximating the adaptive operation of biological and cognitive systems. The main characteristics and strengths of the OAP model can be summarised as follows. Three operating modes are distinguished: static, dynamic, and hybrid. The model can automatically switch between detecting stable, previously learned patterns (static mode) and a more sensitive operation in variable and noisy environments (dynamic mode). The hybrid mode combines the advantages of both approaches, providing enhanced robustness. The drift-trigger-based analysis enables real-time detection of changes in data distributions (data drift), allowing the model to respond in a self-adaptive manner. Such adaptation can be achieved by recalibrating detection thresholds and/or model weight parameters. By applying the Bayes–Grid-based OAP tuning procedure, the key model parameters (γ, s, h, p) can be optimised using a combination of Bayesian optimisation and grid search. This approach yields data-dependent optimal parameter settings, rather than fixed hyperparameters. The integration of rough set theory (RST) significantly enhances the interpretability of the learning process. The RST-based rule base provides logical descriptions of when and why the system transitions into dynamic operating mode. During predictive anomaly detection, the use of LSTM- and autoencoder-based forecasting enables the model not only to detect anomalies but also to anticipate their occurrence, thereby supporting proactive fault prevention.
6.7. Scientific Novelty
The OAP model introduces scientific novelty at multiple levels.
Self-regulated stability: the model not only reacts to changes in input data but actively regulates its own stability.
Organic adaptability: the self-tuning dynamic switching mechanism (static–dynamic–hybrid) follows biologically inspired adaptation patterns, thereby approximating the behaviour of autonomous systems.
Explainable adaptation (RST): the integration of rough set theory (RST) enables the model to provide rule-based feedback on when and why it transitions between operating modes, significantly enhancing the transparency and interpretability of the decision-making process.
Exploration of the OAP parameter space: the Bayes–Grid-based optimisation framework adaptively searches for optimal parameter configurations, enabling online recalibration.
6.8. Summary and Future Directions
The OAP machine learning model represents a new generation of organically evolving learning architectures capable of self-stabilization and self-correction. As a result, OAP functions not merely as a predictive detector but as a self-tuning adaptive entity that maintains its predictive performance even in dynamic and non-stationary environments. Potential future research directions include developing OAP 2.0 by integrating attention mechanisms, incorporating graph neural network (GNN)-based relational learning into the OAP framework, and introducing a Meta-OAP architecture that enables meta-adaptation across multiple models.
6.9. Concluding Remark
The OAP model constitutes an important milestone on the path toward adaptive artificial intelligence. By combining predictive capability, interpretability, and self-organizing adaptation, it establishes a new paradigm for organically learning systems.
7. Methodological Design: Hyperparameter Selection and Ablation Study
This section provides a detailed description of the methodological design of the proposed machine learning framework, with particular emphasis on the justification of model parameter selection and the evaluation of the individual contributions of the applied methods. The objective is to demonstrate, in a transparent and reproducible manner, that the selected hyperparameters, decision thresholds, and adaptive mechanisms were not determined in an ad hoc fashion but rather emerged from the integrated consideration of theoretical principles, empirical investigations, and practical industrial experience. The proposed framework comprises multiple interacting components, including time-series predictive models, optimisation procedures, and decision-level and adaptive control mechanisms. Accordingly, hyperparameter selection followed a hierarchical strategy. In the first stage, parameters ensuring model stability, learning capacity, and generalisability were determined. This was subsequently followed by configuring decision thresholds and adaptive regulation factors. During parameter selection, primary emphasis was placed on ensuring temporal stability and industrial applicability, rather than on maximising a single performance metric. The second part of the section presents an ablation study designed to assess the individual and combined effects of the various methods and components relative to a unified baseline system. The ablation analysis enables the identification of those components that contribute substantially to performance improvement, those with marginal impact, and those method combinations that may be considered redundant or unnecessarily complex. Based on these results, we justify why the final framework employs a selectively integrated, functionally complementary set of methods instead of simultaneously incorporating all examined techniques. This methodological approach ensures that the reported results represent not merely empirical performance gains but are grounded in clearly interpretable and transferable design principles. The resulting guidelines and conclusions are applicable to other industrial time-series anomaly detection tasks, thereby extending the contribution of the study beyond the specific application presented.
7.1. Model Parameter Selection
The parameters of the models (LSTM architecture, thresholding strategy, OAP configuration, drift-trigger mechanism, and optimisation frameworks) were determined using a hierarchical selection procedure grounded in four principal considerations: industrial robustness and temporal stability; detectability of rare faults assessed through precision–recall-based evaluation; resilience to non-stationary environments and concept drift; and computational feasibility with respect to real-time applicability. Parameter selection proceeded in three main stages. First, a simple baseline system (PCA–LSTM with static thresholding) was established as a reference. Second, ablation and sensitivity analyses were conducted on critical parameters, including the lookback window, number of hidden units, percentile-based threshold, drift-trigger parameter (Δ_thr), and the OAP parameters (γ, p). Finally, objective-function-driven fine-tuning was performed using PSO, simulated annealing (SA), and Bayesian optimisation within predefined, practically relevant search intervals. The LSTM architecture followed a sequence input → LSTM → fully connected → regression output structure, with nine input features and a nine-dimensional regression output vector. Hyperparameters were selected based on the joint assessment of prediction error stability and anomaly detection performance. The lookback window was chosen to capture the dominant temporal dynamics of harmonic components without introducing excessive training instability or decision latency. Multiple window sizes were evaluated, and the final value was selected based on the optimal compromise between validation prediction error and PR–AP performance. Empirical testing revealed that deeper LSTM architectures increased the risk of overfitting under drifting industrial data; consequently, a single LSTM layer was adopted and complemented by decision-level and adaptive mechanisms. The number of hidden units was selected to ensure sufficient capacity to represent nonlinear spectral dynamics while avoiding over-parameterisation; a configuration around 128 units provided a favourable balance between predictive accuracy and temporal stability. Dropout regularisation was introduced to enhance generalisation and robustness under drift conditions. Excessive dropout degraded signal tracking; therefore, a moderate level was fixed based on validation performance and detection metrics. Decision thresholds were defined using a percentile-based approach, motivated by the non-Gaussian and time-varying distribution of prediction errors under drift conditions. Sensitivity analyses and PR–AP/F1 optimisation were employed to determine the percentile value that minimised under-alarming risk while controlling false alarm rates. The OAP parameters (γ and p) were selected by exploring the parameter space using F1-based performance surfaces. The parameter γ controls adaptation strength, while p governs the application ratio of corrective mechanisms. Rather than targeting a sharp optimum, parameter values were chosen within a robust maximum region to ensure stable behaviour under drift. The drift-trigger threshold (Δ_thr) was configured to prevent excessive mode switching while ensuring responsiveness to meaningful increases in prediction error. The threshold was determined based on statistical properties of the drift signal and its empirical correlation with real fault events, ensuring that dynamic mode activations remain infrequent yet justified. 
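The reference implementation was developed in MATLAB; purely for illustration, the following Keras sketch mirrors the described sequence input → LSTM → fully connected → regression output structure with nine features and approximately 128 hidden units. The lookback length and dropout rate shown here are assumed placeholder values, not the tuned settings.

```python
import tensorflow as tf

LOOKBACK = 32      # assumed window length; selected in the study via validation
N_FEATURES = 9     # nine harmonic input features, nine-dimensional regression output

model = tf.keras.Sequential([
    tf.keras.Input(shape=(LOOKBACK, N_FEATURES)),   # sequence input
    tf.keras.layers.LSTM(128),                      # single LSTM layer (~128 units)
    tf.keras.layers.Dropout(0.2),                   # moderate dropout (assumed rate)
    tf.keras.layers.Dense(N_FEATURES),              # fully connected regression output
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```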
The optimisation frameworks were applied in a functionally differentiated manner. PSO primarily supported stabilisation of decision dynamics, SA provided efficient and robust global exploration with limited iterations, and Bayesian optimisation enabled reproducible, data-driven hyperparameter tuning. Objective functions combined prediction error metrics (RMSE, MAE), detection metrics (PR–AP, F1), and stability indicators (decision flips, latency), ensuring that the identified optima remained meaningful and applicable within real industrial operating conditions.
7.2. Ablation Study and Analysis of Method Combinations
This section presents the methodological justification of the selected method combinations and the results of the ablation analysis. Although the proposed framework incorporates multiple correction and optimisation techniques, these components are neither mandatory nor simultaneously applied. Their selection and integration followed hierarchical and functional principles, with the objective of achieving maximal performance improvement while maintaining minimal system complexity and sustainable industrial applicability. Five principal method categories were distinguished in the analysis: data preprocessing and data-quality enhancement techniques (with particular emphasis on Rough Set Theory (RST)-based attribute reduction and rule-based correction), predictive models (LSTM, PCA–LSTM), optimisation methods (genetic algorithm, PSO, Bayesian optimisation), decision-level correction mechanisms (static, dynamic, and adaptive thresholding), and the Organically Adaptive Predictive (OAP) framework. Within this structure, RST is not employed as a conventional class-balancing technique, but rather as a mechanism for refining the decision information space by eliminating redundant attributes and stabilising decision rules associated with rare fault conditions. This approach is especially advantageous in industrial environments, as it avoids artificial sample generation and preserves the physical interpretability of measured data. The impact of each component was evaluated using an ablation framework in which individual contributions were assessed both in isolation and in combination. The baseline system consisted of an RST-pre-processed PCA–LSTM predictive model with static thresholding. The evaluation proceeded in a staged manner: first, the individual application of optimisation techniques (GA, PSO, Bayesian optimisation) was examined; second, dynamic thresholding was incorporated at the decision level; finally, the full integration of the OAP adaptive mechanism was implemented and analysed. The ablation results demonstrated that the methods do not contribute equally to performance enhancement. Certain optimisation algorithms exhibited specific, targeted effects: the genetic algorithm did not yield statistically significant improvement within the explored search space, whereas PSO primarily enhanced temporal stability rather than classical performance metrics. Bayesian optimisation provided efficient and reproducible hyperparameter tuning, although, when applied in isolation, it resulted in comparatively conservative decision behaviour. The ablation analysis clearly confirmed that the greatest and most stable performance gains were achieved by combining RST-based data refinement, prediction-error-driven decision logic, dynamic thresholding, and OAP adaptive regulation. This finding justifies why the final framework does not integrate all investigated techniques simultaneously, but instead employs a selectively chosen, functionally complementary combination. Accordingly, the presented methods should be regarded not as elements that must be co-deployed, but as components of a validated methodological toolkit. RST contributes to decision-space stabilisation and data-quality enhancement, while the final system incorporates only those elements that provide measurable and long-term stable performance improvements in industrial operating conditions.
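The staged ablation protocol can be outlined schematically as follows; the configuration flags and the evaluate function are placeholders for the actual training and evaluation pipeline.

```python
# staged ablation: baseline = RST-pre-processed PCA-LSTM with static thresholding
configs = [
    {"name": "baseline",      "optimiser": None,    "dynamic_thr": False, "oap": False},
    {"name": "+Bayesian opt", "optimiser": "bayes", "dynamic_thr": False, "oap": False},
    {"name": "+dynamic thr",  "optimiser": "bayes", "dynamic_thr": True,  "oap": False},
    {"name": "full OAP",      "optimiser": "bayes", "dynamic_thr": True,  "oap": True},
]

def evaluate(cfg):
    """Placeholder: train and evaluate the pipeline for one configuration,
    returning detection (F1, AP) and stability (decision flips) metrics."""
    return {"F1": 0.0, "AP": 0.0, "flips": 0}

for cfg in configs:
    print(cfg["name"], evaluate(cfg))
```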
9. Results and Conclusions
The combined application of correction functions and refinement algorithms can significantly reduce the rates of false-positive and false-negative alarms. Based on ROC curves and F1-score metrics, it can be clearly demonstrated that calibration and ensemble-based approaches are particularly effective in improving predictive performance. However, computational costs may increase substantially, especially when genetic algorithms (GA) or AutoML techniques are applied, which may limit real-time deployment under certain conditions. Based on the results obtained, it can be concluded that predictive maintenance can be effectively realised using electrical harmonic data. Through anomaly detection, harmonic measurements enable the identification of deviations from the baseline, fault-free operating state of industrial equipment. The effectiveness of the detection process naturally depends on how appropriately the machine learning model has been designed, configured, and trained. This aspect is crucial to ensure that notifications are triggered exclusively by practically relevant and physically meaningful state changes. In this study, an Organically Adaptive Predictive (OAP) machine learning model has been developed, representing a self-tuning, explainable, and robust hybrid system. The model is capable of maintaining predictive homeostasis by adapting to statistical and structural changes in the operating environment. The OAP concept introduces a learning paradigm inspired by biological systems, in which the model not only detects anomalies but also adapts organically and provides predictive insight. Within the broader ecology of learning systems, the OAP model represents a paradigm in which the system does not merely react to deviations but evolves through self-organising adaptive mechanisms. The proposed framework integrates data drift analysis, Bayesian parameter tuning, RST-based rule systems, and LSTM-based learning into a unified adaptive architecture. Importantly, the principal contribution of this work lies not in prescribing fixed hyperparameter values or universally optimal configurations, but in demonstrating a hierarchical and stability-oriented design strategy. This strategy emphasises decision-level adaptivity, robustness under non-stationary industrial conditions, and selective method integration validated through ablation analysis. Such a design philosophy is transferable to other industrial anomaly detection contexts, particularly where rare events, concept drift, and safety-critical decision requirements must be addressed in a controlled and interpretable manner. In the following sections, the experimental results are presented and discussed in detail.
9.1. Industrial Validation and Identified Limitations
The proposed anomaly detection framework was validated using time-synchronised real industrial measurement data. Predicted anomalies and potential failure indicators derived from electrical harmonic measurements were compared within identical time intervals to actual machine states, maintenance interventions, and documented failure events recorded in the enterprise resource planning (ERP) system. Consequently, validation was performed under real operational conditions rather than in a laboratory environment, reflecting genuine industrial load variations and network dynamics. The time-aligned comparison enabled assessment of the extent to which model-generated alerts preceded or coincided with actual fault events. The results confirmed that variations in the harmonic spectrum frequently correlated with documented changes in equipment condition and recorded failures. However, the validation process also revealed several inconsistencies, including under-alarms (missed detections) and over-alarms (non-relevant warnings). A primary limitation of the validation arises from the slow and gradual degradation mechanisms typical of industrial machinery. Several failure modes—particularly those related to mechanical wear or thermal ageing—develop over extended time horizons, requiring long-term monitoring for statistically reliable evaluation. A further constraint is that deliberate fault simulation in live production environments is practically infeasible due to economic and operational risks. As a result, validation was conducted predominantly in a retrospective manner, examining whether predictive indications had been generated prior to documented failure events. The validation findings further highlighted that external network disturbances and electrical perturbations may induce harmonic spectral variations unrelated to the internal condition of the investigated equipment. Such phenomena constitute a major source of false-positive alarms and represent an inherent limitation of purely harmonic-based diagnostic approaches. In response to these observations, an ongoing research direction focuses on distinguishing network-induced disturbances from equipment-related harmonic signatures. The current investigation explores the application of EWMA-based filtering techniques and severity-derivative indicators derived from harmonic components. Preliminary results are promising, suggesting that the proposed filtering methodology can substantially reduce false alarms while preserving anomaly sensitivity. The detailed findings will be reported in a dedicated subsequent publication. In summary, although real industrial validation entails practical constraints, harmonic-based anomaly detection has demonstrated operational feasibility and meaningful predictive capability. Future developments aim to further enhance robustness and improve selective sensitivity to condition-related harmonic deviations while mitigating the influence of external network disturbances.
9.2. Investigation of Electrical Harmonics for Anomaly Detection
The results of this chapter clearly demonstrate that time-domain measurements of electrical harmonics, when organised into a properly structured database, are well suited for condition monitoring and anomaly detection of industrial equipment, and even for forecasting potential fault types (e.g., OL, SC). Data collected using a power quality analyser—including total harmonic distortion (THD), DC components, the fundamental harmonic, and higher-order harmonic amplitudes—provides a high-resolution operational fingerprint of the monitored equipment. Similar to vibration diagnostics, but in a more general sense, these measurements enable early indication of electrical, electronic, and indirectly mechanical faults. The before–after measurement concept—consisting of baseline measurements obtained after major maintenance in a fault-free state followed by continuous online monitoring—aligns well with current international trends in predictive maintenance and provides a practically feasible framework for LSTM- and AutoML-based models in real industrial environments. However, detailed model evaluation reveals that a raw LSTM model combined with a fixed threshold is not sufficient on its own to achieve robust harmonic-based anomaly detection. High degradation indices (Dd ≈ 15.87% and 17.76%), persistent underestimation of predictions, strong signal non-stationarity, and the quasi one-dimensional, collapsed representations observed in the PCA of the hidden states are all indicative of underfitting and insufficient model capacity. Due to the fixed and overly conservative threshold, the model detects very few anomalies, even though the prediction errors exhibit locally extreme values. This behaviour corresponds to classical under-alarming, which is particularly risky in the context of predictive maintenance. During the testing phase, the strong segment dependence of performance—good fitting in simpler operating regions and severe errors in more complex segments—further indicates that the structure of electrical harmonic signals is heterogeneous. The current model is therefore unable to learn appropriate internal representations for all operating modes. From a scientific perspective, the key conclusions of this chapter can be summarised as follows. The electrical harmonic spectrum itself constitutes a suitable carrier of condition and fault-related information and can serve as an effective input for various machine learning models (e.g., LSTM, decision trees, SVM, random forests, and boosting methods), provided that the measurements are sufficiently structured and properly labelled (e.g., Alarm Type, Fault Type). The baseline LSTM model underfits in both the training and test phases. The overly smooth, capacity-limited internal representation, along with very high RMSRE and degradation metrics, indicates that the model architecture, regularisation strategy, and both the quantity and diversity of training data must be expanded. Fixed, static thresholding is not compatible with the strongly non-stationary nature of harmonic data, as it leads to conservative behaviour with few detected anomalies despite locally high prediction errors. This justifies the introduction of dynamic, data-dependent thresholds, such as those based on Gaussian mixture models (GMM), quantiles, or rough set theory (RST). 
For the successful realisation of a harmonic-based predictive maintenance (PdM) system, several elements are essential: systematic simulation of fault types, the availability of large volumes of properly labelled training data, careful feature engineering (including derivatives, moving averages, frequency-domain features, and PCA-based features), handling of class imbalance (e.g., using SMOTE), and the selection of model architectures and thresholding strategies that simultaneously minimise the risks of both over-alarming and under-alarming.
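Where class balancing is required, as noted above, a minimal sketch using the SMOTE implementation of the imbalanced-learn library could take the following form; the feature matrix and labels are synthetic placeholders, and the caveat regarding the physical interpretability of synthetic samples applies.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9))                 # placeholder harmonic feature matrix
y = (rng.random(1000) < 0.03).astype(int)      # rare fault class (~3% of samples)

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("before:", np.bincount(y), "after:", np.bincount(y_res))
```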
Overall, this chapter demonstrates that the use of electrical harmonics for predictive maintenance purposes is scientifically well-founded and highly promising. However, reliable anomaly detection in practical applications can only be achieved if the entire modelling pipeline—including data acquisition, labelling, model architecture, and threshold strategy—is adaptively aligned with the non-stationary and drifting nature of real industrial environments.
9.3. PCA—Principal Component Analysis
The application of Principal Component Analysis (PCA) to the investigated electrical harmonic data clearly demonstrates that the input feature space is redundant and highly correlated and can be accurately represented by only a small number of principal components. Eigenvalue analysis shows that the first five principal components explain 98.053% of the total variance, indicating that a substantial portion of the original high-dimensional input space is redundant in terms of information content. This characteristic provides strong methodological justification for the use of PCA as a deterministic preprocessing step for dimensionality reduction, noise suppression, and structural feature extraction, particularly in systems where excessive dimensionality may lead to overfitting, instability, or increased computational burden in learning models. Although PCA is neither adaptive nor driven by a task-specific objective function, its role in the present research is nevertheless significant. Using principal components, the data’s latent structure became more transparent. In both GMM- and LSTM-based analyses, the two-dimensional PCA projection clearly revealed class overlaps as well as the internal organization of the harmonic data. The geometric relationships uncovered by PCA further provided scientific evidence that the Alarm Type categories do not form well-separated Gaussian regions, which explains the limited discriminative performance of unsupervised methods such as Gaussian Mixture Models (GMM).
Overall, PCA served as a critical dimensionality-reduction and noise-reduction technique within the system, enabling more stable, robust, and computationally efficient learning models while simultaneously improving the interpretability of the underlying data structures. From a scientific perspective, the PCA results support the application of further hybrid approaches (e.g., PCA + LSTM, PCA + GMM, PCA + Bayesian methods) and demonstrate that the electrical harmonic dataset exhibits a strongly low-dimensional manifold structure. This property makes the data highly compressible and particularly well-suited for subsequent model development and learning tasks.
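The reported cumulative variance (the first five principal components explaining approximately 98% of the total variance) corresponds to the standard explained-variance check sketched below; the data matrix here is a random placeholder, so the printed value will not reproduce the reported figure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(size=(5000, 20))   # placeholder feature matrix
X_std = StandardScaler().fit_transform(X)

pca = PCA().fit(X_std)
cumvar = np.cumsum(pca.explained_variance_ratio_)
print("cumulative variance of first 5 PCs:", cumvar[4])
```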
9.4. GMM—Gaussian Mixture Model
The application of Gaussian Mixture Models (GMM) to the electrical harmonic dataset revealed that the statistical structure of the data is not aligned with the labelling logic of the three Alarm Type categories (OK–Warning–Error). Analysis performed in the PCA-reduced space, together with the resulting confusion matrices, indicates substantial overlap between the OK and Warning classes. Moreover, the Error class does not form a single compact, Gaussian-like cluster; instead, it appears as several smaller, dispersed subsets. Consequently, the individual GMM components cannot be unambiguously mapped to the Alarm Type classes. OK samples are distributed across multiple GMM clusters, the Warning class is only partially separable, and Error samples predominantly fall within regions associated with the OK and Warning classes. Quantitatively, the performance of the GMM is severely limited: the recall for the Error class is effectively 0%, while both recall and precision values for the OK and Warning classes remain below 30%. These results indicate that, in this context, GMM is not suitable as a reliable standalone anomaly detector, nor as a primary decision-making or alarm classification model. Despite these limitations, the GMM provides valuable insight into the underlying distributional structure of the data. In particular, it highlights that the Alarm Type labels do not correspond to well-separated Gaussian regions; rather, they form a continuum with gradual transitions between the OK, Warning, and Error states. Overall, in this application, GMM should be regarded primarily as a diagnostic and structure-exploration tool rather than a decision model. From a scientific perspective, these findings substantiate the significant mismatch between distribution-based unsupervised clustering and label-driven supervised alarm logic, explaining why GMM alone cannot reproduce the Alarm Type categories. The primary value of the model lies in its ability to reveal the internal geometry of the data, which can subsequently serve as a foundation for hybrid supervised–unsupervised approaches, such as LSTM + GMM architectures or Bayesian-weighted ensemble methods.
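The structure-exploration use of GMM described above can be sketched as follows: a mixture model is fitted in the PCA-reduced space and its components are cross-tabulated against the Alarm Type labels; all data in the sketch are synthetic placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 9))                 # placeholder harmonic features
labels = rng.integers(0, 3, size=3000)         # 0 = OK, 1 = Warning, 2 = Error (placeholder)

Z = PCA(n_components=2).fit_transform(X)       # 2-D projection used for inspection
components = GaussianMixture(n_components=3, random_state=0).fit_predict(Z)

# cross-tabulation: rows = GMM components, columns = Alarm Type labels
crosstab = np.zeros((3, 3), dtype=int)
for c, l in zip(components, labels):
    crosstab[c, l] += 1
print(crosstab)
```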
9.5. GA—Genetic Algorithm
The application of a genetic algorithm (GA) to the LSTM-based anomaly detection system trained on electrical harmonic data revealed that, although GA is theoretically well suited for the global exploration of nonlinear and complex search spaces, it did not prove to be an effective optimisation method in the specific experimental setting investigated in this study. Based on the iteration behaviour, the GA converged very early—practically within the first few generations—to a local optimum, from which neither crossover nor mutation operators were able to escape. The fitness function values stagnated around 0.05765, and the population rapidly lost diversity. This behaviour indicates either a very flat search space or a weak influence of the optimised parameters on the model’s performance. As a result, the GA-based optimisation did not yield any noticeable improvement in either the predictive accuracy or the anomaly detection performance of the LSTM model. From the observed convergence dynamics, it can be concluded that, in its current configuration, the GA was unable to exploit its fundamental evolutionary advantages, such as effective global search, maintenance of population diversity, and robust tracking of optimal solutions. The stagnating evolution, extended stall intervals, and population homogenization suggest that the method is highly sensitive to parameter settings and to the scaling of the fitness function. In this task, the fitness landscape did not provide sufficient discriminative information to guide the algorithm toward a true optimum. Based on these findings, GA cannot be considered a suitable standalone choice for hyperparameter optimisation in LSTM-based anomaly detection tasks within this application context. While its performance could potentially be improved through fitness function normalization, redesign of the search space, substantial increases in population size, or the introduction of more aggressive mutation strategies, alternative optimisation methods—such as simulated annealing (SA), Bayesian optimisation, or sequential Monte Carlo (SMC)-based control mechanisms—proved to be significantly more effective for the problem at hand. This superiority was particularly evident in terms of convergence speed, robustness, and temporal stability. These results suggest that GA tends to perform well only in well-structured search spaces with carefully scaled fitness functions, whereas for complex industrial time-series prediction and anomaly detection problems, other metaheuristic or statistical optimisation techniques offer more reliable and efficient solutions.
9.6. PSO—Particle Swarm Optimisation
Based on the analysis, the application of Particle Swarm Optimisation (PSO) to the anomaly detection model operating on electrical harmonic data primarily yielded improvements in temporal decision stability rather than classical accuracy metrics. A comparison between the baseline and PSO-optimised models shows that the main detection performance indicators (e.g., recall and false positive rate, FPR) remained practically unchanged. This indicates that PSO neither degraded the model’s ability to detect anomalies nor increased the false alarm rate. In contrast, the impact of PSO on decision stability is particularly pronounced. The number of output class transitions (decision flips) was reduced from 434 to 66, representing an improvement of more than sixfold. This clearly indicates that the PSO-tuned model is significantly less sensitive to noise, exhibits fewer impulsive transitions between OK and ANOMALY states, and maintains its decisions much more consistently over time. In industrial environments with noisy measurement systems, this property is especially important, as it reduces unnecessary alarms, enhances operator trust, and supports reliable, continuous condition monitoring. Overall, in this application, PSO proved effective not in significantly improving conventional error metrics (e.g., RMSE or recall), but in enhancing the quality of the decision dynamics. Therefore, PSO is particularly well-suited for fine-tuning anomaly detection systems where the objective extends beyond maximising accuracy to achieving robust, noise-tolerant, and temporally stable alarm behaviour. Nevertheless, PSO parameterisation—such as inertia weight and acceleration coefficients—remains critical, and convergence to the global optimum cannot be guaranteed. Consequently, a promising direction for future research lies in combining PSO with other metaheuristic approaches (e.g., simulated annealing or Bayesian optimisation) or with rule-based stability constraints, such as those derived from sliding mode control (SMC) or linear matrix inequalities (LMI).
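The decision-flip count used above as a temporal stability indicator can be computed directly from the binary decision sequence, as in the following minimal sketch:

```python
import numpy as np

def decision_flips(decisions):
    """Number of OK <-> ANOMALY transitions in a binary decision sequence."""
    decisions = np.asarray(decisions)
    return int(np.sum(decisions[1:] != decisions[:-1]))

baseline = np.random.default_rng(0).integers(0, 2, 5000)   # noisy placeholder decisions
print("flips:", decision_flips(baseline))
```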
9.7. SA—Simulated Annealing
Simulated Annealing (SA) is a stochastic global optimisation algorithm inspired by the physical annealing process, whose objective is to approximate the global minimum of a given cost function while avoiding entrapment in local minima. During the initial high-temperature phase, the algorithm accepts not only better solutions but also, with a certain probability, worse ones, thereby enabling intensive and exploratory global search. As the temperature is gradually reduced, the system becomes increasingly deterministic, and preference for better solutions dominates, leading to convergence toward a narrow, quasi-optimal region of the parameter space. The main advantages of SA include its derivative-free nature and its robustness as a global search method, making it applicable even to complex, non-convex objective functions. Its primary drawbacks are potentially slow convergence and sensitivity to the cooling schedule. In this research, SA was applied to the hyperparameter optimisation of an LSTM-based reconstruction and anomaly detection model trained on electrical harmonic data. The concept of weights appears in multiple forms within the optimisation process: as priorities in the objective function and as scaling factors for parameter perturbations. In the specific implementation, the total harmonic distortion (THD) values associated with individual harmonic orders were used as weights, as THD metrics closely track variations in harmonic amplitudes. The iterative optimisation curves indicate that SA performs strong stochastic exploration of the parameter space during the initial iterations, followed by rapid convergence. By the 5th–6th iteration, the algorithm achieves the lowest root mean square error (RMSE), improving from 0.0725 to 0.0645. After this point, only small oscillations are observed around a stable optimal region. The THD metric stabilises at a low level in synchrony with the RMSE curve, indicating that the hidden states of the network learn physically consistent representations and that there is no contradiction between statistical and physical performance indicators.
Overall, the SA-tuned LSTM model achieved low and stable reconstruction error, a robust error distribution, and well-separated anomalies. These results demonstrate that SA is particularly well-suited for anomaly detection tasks based on electrical harmonic data, where both statistical robustness and physical consistency are required.
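The temperature-controlled acceptance rule underlying SA can be sketched as follows; the cost function, neighbourhood step, and cooling rate are illustrative assumptions and do not correspond to the MATLAB implementation used in the study.

```python
import math
import random

def simulated_annealing(cost, x0, step=0.1, t0=1.0, cooling=0.95, iters=100):
    """Minimal SA loop: accept worse solutions with probability exp(-dE/T)."""
    x, best = x0, x0
    t = t0
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)            # neighbourhood move
        d_e = cost(candidate) - cost(x)
        if d_e < 0 or random.random() < math.exp(-d_e / t):    # Metropolis acceptance
            x = candidate
            if cost(x) < cost(best):
                best = x
        t *= cooling                                           # geometric cooling schedule
    return best

# toy example: minimise a one-dimensional non-convex cost
print(simulated_annealing(lambda v: (v - 2) ** 2 + 0.3 * math.sin(8 * v), x0=0.0))
```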
9.8. Bayesian Weighting
The investigated Bayesian weighting framework demonstrated numerically successful optimisation behaviour. Bayesian optimisation converged rapidly and stably to a well-defined minimum (objective ≈ 0.058), while the decision threshold (approximately the 90th percentile), the number of clusters (k = 6), and the harmonic weights (w3 ≈ 0.03, w5 ≈ 0.37) settled into a robust and reproducible parameter range. The resulting model produced well-structured latent representations characterised by high Silhouette scores, indicating favourable cluster separability. The absolute error metrics of the LSTM-based predictions (RMSE and MAE) were particularly strong, and the precision of anomaly detection was also high, meaning that the majority of raised alarms corresponded to genuinely faulty operating states. At the same time, the current Bayesian weighting configuration resulted in overly conservative behaviour. The extremely low recall indicates that most anomalies remain undetected, leading to an under-alarming system that poses a potential risk in predictive maintenance applications. The instability of the MAPE metric—caused by near-zero denominator values—further distorts the performance assessment, despite the low absolute prediction errors. In its present form, Bayesian weighting therefore provides an excellent predictive and clustering structure; however, its practical anomaly detection performance is strongly limited by overly strict thresholding and class imbalance effects. From a future development perspective, recalibration of the decision threshold using percentile-based strategies (e.g., adaptive thresholding in the 90–95% range guided by Silhouette metrics) is warranted. To increase recall, a larger volume of anomaly-class training data and/or SMOTE-based class balancing will be required, along with the integration of SMC- and RST-based rules into the decision logic. Additionally, the use of more stable error metrics (such as RMSE and MAE) instead of MAPE is recommended for model evaluation. Overall, Bayesian weighting is a promising, theoretically well-founded component of the proposed hybrid machine learning system. However, improving anomaly sensitivity and refining the thresholding strategy are essential steps toward enhancing its practical applicability.
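One possible realisation of the recommended percentile-based recalibration is to sweep the 90–95% range on held-out prediction errors and inspect the resulting precision–recall balance; the sketch below uses synthetic placeholder data.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
pe = np.abs(rng.normal(0.01, 0.01, 5000))        # placeholder prediction errors
y = (rng.random(5000) < 0.05).astype(int)        # placeholder anomaly labels
pe[y == 1] += 0.02                               # anomalies tend to raise the error

for q in (90, 92, 94, 95):
    thr = np.percentile(pe, q)                   # percentile-based decision threshold
    pred = (pe > thr).astype(int)
    print(f"p{q}: thr={thr:.4f}  P={precision_score(y, pred):.2f}  "
          f"R={recall_score(y, pred):.2f}  F1={f1_score(y, pred):.2f}")
```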
9.9. Combination of LSTM, GA, PCA, GMM, SPE/Q-Statistic, Hotelling’s T², and Bayesian Weighting
The results of this chapter clearly demonstrate that integrating diverse statistical, machine learning, and optimisation methods provides significant advantages in both predictive accuracy and anomaly detection effectiveness. The hybrid approach enables individual components (such as LSTM, PCA, GMM, GA, and Bayesian weighting) to compensate for each other’s limitations. Genetic algorithm (GA)-based hyperparameter optimisation supports stable learning behaviour, while PCA improves predictive performance by providing a cleaner, more compact signal space. GMM and statistical monitoring tools, including the SPE/Q-statistic and Hotelling’s T², enhance the detection of structural and local deviations within the data. The adaptive model integration achieved through Bayesian weighting further increases system robustness, although it requires careful prior selection.
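For reference, the two monitoring statistics can be computed from a fitted PCA model as sketched below: Hotelling’s T² measures deviations inside the retained score space, while the SPE/Q-statistic measures the residual outside it. The random data, the number of retained components, and the percentile-based control limits are illustrative assumptions, not values taken from the study.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(500, 12))   # reference (healthy) harmonic features
X_new = rng.normal(size=(50, 12))       # newly observed data to monitor

pca = PCA(n_components=4).fit(X_normal)
scores = pca.transform(X_new)
reconstruction = pca.inverse_transform(scores)

# Hotelling's T²: Mahalanobis-type distance in the retained score space.
t2 = np.sum(scores ** 2 / pca.explained_variance_, axis=1)

# SPE / Q-statistic: squared reconstruction residual per sample.
spe = np.sum((X_new - reconstruction) ** 2, axis=1)

# Empirical control limits taken as the 99th percentile on the reference data.
scores_ref = pca.transform(X_normal)
t2_limit = np.percentile(np.sum(scores_ref ** 2 / pca.explained_variance_, axis=1), 99)
spe_limit = np.percentile(np.sum((X_normal - pca.inverse_transform(scores_ref)) ** 2, axis=1), 99)

alarms = (t2 > t2_limit) | (spe > spe_limit)
print(int(alarms.sum()), "of", len(alarms), "new samples flagged")
```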
Overall, the complete hybrid system clearly outperforms the simple baseline model. Across predictive, classification, and clustering metrics, the resulting framework exhibits a more refined, reliable, and better-generalising structure. Silhouette analysis indicates that the system’s latent space is well-structured; however, optimising the anomaly detection threshold remains a critical factor. Based on the obtained results, a threshold level in the range of approximately 93–94% provides an optimal balance between cluster separability and the maintenance of adequate recall. In summary, the presented integrated methodology offers a scientifically well-founded, robust, and scalable framework for analysing complex industrial time series. It represents a promising direction for future research toward the development of multi-model, adaptive anomaly detection systems.
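Before turning to the OAP model, the percentile sweep underlying the 93–94% recommendation can be sketched as follows: each candidate threshold is scored by the recall it retains against the labelled faults and by the Silhouette score of the resulting normal/anomaly split. The synthetic errors, labels, and latent features are placeholders used only to illustrate the trade-off.

```python
import numpy as np
from sklearn.metrics import recall_score, silhouette_score

def sweep_thresholds(errors, labels, latent, percentiles=range(90, 96)):
    results = []
    for p in percentiles:
        threshold = np.percentile(errors, p)
        predicted = (errors > threshold).astype(int)
        rec = recall_score(labels, predicted, zero_division=0)
        # Silhouette is only defined when both groups are populated.
        sil = (silhouette_score(latent, predicted)
               if 0 < predicted.sum() < len(predicted) else float("nan"))
        results.append((p, rec, sil))
    return results

rng = np.random.default_rng(1)
errors = rng.exponential(0.05, size=1000)
labels = (errors + rng.normal(0.0, 0.02, size=1000) > np.quantile(errors, 0.95)).astype(int)
latent = rng.normal(size=(1000, 3)) + errors[:, None]
for p, rec, sil in sweep_thresholds(errors, labels, latent):
    print(f"P{p}: recall={rec:.2f}, silhouette={sil:.2f}")
```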
9.10. Study of the Organically Adaptive Predictive (OAP) ML Model
The OAP machine learning model is a hybrid, multi-level predictive architecture that applies the principle of biological homeostasis to the domain of machine learning. This approach offers a novel solution for handling concept drift and non-stationarity in electrical harmonic data. Rather than functioning as a static classifier, the model operates as a self-regulating and self-correcting system. Transitions between static, dynamic, and hybrid operating modes are governed by drift-trigger analysis, based on temporal changes in prediction errors and data distributions. Through this mechanism, the OAP model can preserve learned, stable operating regimes while adaptively responding to environmental changes, representing a substantial advancement over conventional LSTM-based anomaly detectors that employ fixed thresholds. The integration of Bayes–Grid-based OAP tuning and RST-based rule-driven feedback ensures that model parameters (γ, s, h, p) are not configured statically, but are instead optimised in a data-dependent manner, while maintaining interpretability of system behaviour. Analysis of the F1-score parameter space indicates that the OAP model achieves its best performance in the region around γ ≈ 1 and p ≈ 95%. This observation supports the model’s nonlinear, drift-sensitive nature and justifies the need for a Bayes–Grid search for effective parameter optimisation. The homeostatic formalism and the associated adaptation equations mathematically constrain the model to intervene only when anomaly energy increases persistently or abruptly, thereby preventing unnecessary or unstable reconfigurations. Empirical evaluation using real industrial electrical harmonic data confirms the practical validity of the proposed concept. The static detector exhibits extreme under-sensitivity, classifying nearly all observations as normal, whereas the dynamic and hybrid OAP modes detect 36–38 percentage points more anomalies on the same dataset while maintaining acceptable precision. Although the ROC–AUC values remain close to 0.5—primarily due to the limited dynamic range of the error signal—the precision–recall curves and average precision values (AP > 0.7) demonstrate that the dynamic and hybrid detectors provide consistent and practically useful anomaly detection performance. The temporal behaviour of the dynamic threshold (τ_dyn) further reflects adaptive regulation around the static threshold, with an average offset of approximately 10%, reinforcing the conclusion that the OAP model organically adapts to current load conditions and noise characteristics. Overall, the OAP ML model represents a new generation of organically adaptive learning architectures from a scientific perspective, integrating predictive homeostasis, drift-sensitive threshold regulation, Bayes–Grid hyperparameter optimisation, and RST-based explainability within a unified framework. By extending beyond classical LSTM-based anomaly detectors, the OAP model not only generates predictions but actively regulates its own stability and sensitivity, thereby approximating the self-organising and self-balancing behaviour of biological systems. Based on the presented results, the OAP model defines a promising research direction for the development of self-adaptive, explainable, and organically inspired artificial intelligence systems.
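To make the homeostatic mechanism more concrete, the following simplified sketch couples a static reference threshold with a rolling-percentile dynamic threshold and a persistence-based drift trigger. The parameter names (gamma, p) follow the notation above, but the window size, tolerance, patience, and switching logic are illustrative assumptions and not the exact OAP formulation.

```python
import numpy as np

def oap_thresholds(errors, reference_errors, p=95, window=200,
                   gamma=1.0, drift_tolerance=1.5, patience=50):
    # Static threshold fixed on the reference (healthy) error distribution.
    tau_static = np.percentile(reference_errors, p)
    ref_level = np.median(reference_errors)
    drift_counter, mode = 0, "static"
    taus, modes = [], []
    for t in range(len(errors)):
        recent = errors[max(0, t - window):t + 1]
        # Drift trigger: count consecutive steps with persistently elevated error.
        drift_counter = drift_counter + 1 if np.median(recent) > drift_tolerance * ref_level else 0
        if drift_counter >= patience:
            mode = "dynamic"
        # Dynamic threshold regulated around the current error level.
        tau_dyn = gamma * np.percentile(recent, p)
        taus.append(tau_dyn if mode == "dynamic" else tau_static)
        modes.append(mode)
    return np.array(taus), modes

rng = np.random.default_rng(2)
reference = rng.exponential(0.05, size=500)
online = np.concatenate([rng.exponential(0.05, 400), rng.exponential(0.12, 200)])
taus, modes = oap_thresholds(online, reference)
print(modes[-1], round(float(taus[-1]), 4))
```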
9.11. Industrial Implementation, Computational Complexity and Energy Considerations
The real-time industrial deployment of the proposed methodology is currently under implementation. The objective is to establish an online analytical architecture capable of continuously processing electrical harmonic data, executing predictive inference, and activating decision-level and adaptive mechanisms under industrial operating conditions. The computational platform is based on an industrial-grade PC designed for continuous operation and high reliability. On the measurement side, the selection of harmonic analysers is in progress, with particular attention to sampling frequency, spectral resolution, and interface compatibility requirements. At present, the principal implementation challenge concerns data transmission. Due to the volume and temporal resolution of the harmonic spectral data, a LoRaWAN-based wireless solution proved inadequate, primarily because of bandwidth limitations. A wired LAN-based infrastructure would technically support the required data throughput; however, deployment is currently subject to corporate IT security policies and is undergoing formal approval procedures. As a temporary solution during the pilot phase, data are transferred via physical storage media to the analysis workstation. The online evaluation is executed within a Linux environment using a Python-based runtime framework. The machine learning model was originally developed and validated in MATLAB and subsequently exported to the ONNX format, enabling platform-independent deployment. The ONNX model is loaded and executed in Python for online inference, and the evaluation results are displayed through a graphical monitoring interface. From a computational complexity perspective, the dominant component of the PCA–LSTM–GMM pipeline is the LSTM-based predictive module. During model design, explicit emphasis was placed on constraining architectural complexity to ensure deterministic execution on an industrial PC without GPU acceleration. The optimisation and adaptive mechanisms implemented in the OAP framework do not perform exhaustive hyperparameter searches during runtime; rather, they apply local corrective adjustments and rule-based refinements. Consequently, real-time computational overhead remains bounded and compatible with edge-level deployment scenarios. Energy consumption constitutes an important consideration in industrial contexts, particularly in decentralised or edge-computing architectures. The current implementation operates on a mains-powered industrial PC, where energy constraints are not limiting. Nevertheless, the model architecture and computational profile have been designed with sufficient efficiency to allow potential migration to lower-power embedded industrial platforms in future developments.
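The online inference step can be illustrated with a minimal onnxruntime sketch; the model file name, input shape, and anomaly scoring shown here are placeholders rather than the deployed configuration.

```python
import numpy as np
import onnxruntime as ort

# The MATLAB-trained model is exported to ONNX and executed in Python.
session = ort.InferenceSession("harmonic_lstm.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def infer(window: np.ndarray) -> np.ndarray:
    # Run one forward pass on a (batch, time, features) window of scaled
    # harmonic measurements and return the model output (reconstruction).
    return session.run(None, {input_name: window.astype(np.float32)})[0]

# Example call with a dummy window of 64 time steps and 12 harmonic features.
reconstruction = infer(np.zeros((1, 64, 12), dtype=np.float32))
anomaly_score = float(np.mean(np.abs(reconstruction)))  # placeholder scoring
print("anomaly score:", anomaly_score)
```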
In summary, real-time deployment of the proposed framework is technically feasible. The primary practical constraint currently lies in secure and high-throughput data transmission within the industrial network infrastructure. Computational complexity and energy efficiency have been explicitly considered during system design to ensure scalability, robustness, and long-term industrial applicability.
9.12. Industrial Implementation and Integration Considerations
The practical applicability of the proposed method has been investigated within an edge-based real-time architectural framework. The computational backbone of the system is provided by industrially deployable, low-power embedded platforms (Raspberry Pi and NVIDIA Jetson reComputer J4012), capable of processing electrical harmonic data streams acquired from installed network analysers. The machine learning model was originally developed and validated in the MATLAB environment and subsequently exported in ONNX format to enable deployment within a Python-based inference pipeline operating under a Linux system. This configuration allows real-time online evaluation of harmonic data and continuous anomaly detection directly at the edge level. Measurement data from online power quality analysers installed at the monitored equipment are processed locally. The system provides real-time visual feedback regarding detected anomalies and predicted risk states via a dedicated monitoring interface. In accordance with industrial operational practices, role-based access control has been implemented to ensure that operators, maintenance personnel, and supervisory staff receive appropriately filtered and context-specific information. Alarm signalling can be realised through multiple channels, including visual indicators (warning lights, signal beacons), as well as digital communication pathways such as email notifications, SMS alerts, or integration into enterprise resource planning (ERP) and maintenance management systems. Despite the technical feasibility of deployment, several integration challenges have been identified. These include limitations of data transmission infrastructure, corporate cybersecurity policies (e.g., network segmentation and firewall constraints), optimisation of computational and energy resources on embedded hardware, and ensuring interoperability with existing supervisory control, maintenance, and enterprise systems. Furthermore, non-stationary operational conditions may necessitate periodic recalibration or adaptive refinement of the deployed models.
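The role-based filtering and multi-channel alarm signalling described above can be expressed as a simple routing table on the edge node, as in the sketch below; the roles, channels, and severity levels are illustrative and do not reflect the deployed configuration.

```python
# Illustrative routing table: which roles receive which alarm channels.
ALARM_ROUTES = {
    "operator":    {"channels": ["beacon", "hmi_popup"], "min_severity": "warning"},
    "maintenance": {"channels": ["email", "sms"],        "min_severity": "warning"},
    "management":  {"channels": ["erp_ticket", "email"], "min_severity": "critical"},
}
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}

def dispatch(alarm_severity: str, message: str) -> list[tuple[str, str]]:
    # Return the (role, channel) pairs that should receive this alarm,
    # filtered by the minimum severity configured for each role.
    deliveries = []
    for role, cfg in ALARM_ROUTES.items():
        if SEVERITY_ORDER[alarm_severity] >= SEVERITY_ORDER[cfg["min_severity"]]:
            deliveries.extend((role, channel) for channel in cfg["channels"])
    return deliveries

print(dispatch("critical", "Harmonic anomaly score exceeded threshold"))
```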
Overall, the proposed framework is technically transferable to industrial environments through an edge-computing architecture. However, successful integration requires not only algorithmic robustness but also careful consideration of infrastructural constraints, cybersecurity regulations, and organisational processes.