1. Introduction
Industrial production systems increasingly depend on continuous health monitoring to ensure reliability, safety, and energy efficiency [1, 2, 3]. Conventional approaches rely on additional vibration, temperature, or acoustic sensors, which require physical access to equipment and increase maintenance costs. In contrast, electrical parameters such as current, voltage, and power factor are readily available from standard industrial metering systems and inherently reflect the electromechanical behaviour of machines. This makes electrical parameter-based monitoring a practical non-intrusive alternative [4].
The transition toward Industry 4.0 has placed significant emphasis on the digital twin concept and predictive maintenance (PdM) as key drivers for operational excellence. In energy-intensive sectors, such as plastic processing, unplanned downtime can lead to substantial financial losses due to wasted raw materials and disrupted production cycles. While vibration and acoustic monitoring remain the gold standards for mechanical diagnostics, their deployment across entire factories is often hindered by the high cost of specialized sensors and the complexity of data synchronization. Consequently, there is a growing industrial demand for “lean” monitoring solutions that leverage pre-existing infrastructure. Monitoring electrical consumption patterns offers a dual advantage: it serves as a non-intrusive proxy for mechanical health while simultaneously providing insights into energy efficiency, aligning technical maintenance with corporate sustainability goals.
However, traditional Electrical Signature Analysis (ESA) and its derivatives are primarily designed for steady-state operation. Their performance decreases under variable loads, transient regimes, or multi-mode production cycles, which are common in modern industrial environments [5, 6, 7]. Time–frequency and wavelet-based extensions improve sensitivity to transient events but require careful parameter tuning and still do not explicitly account for distinct operating regimes [8, 9]. Likewise, recent machine learning approaches, although capable of modelling nonlinear relationships in electrical data, typically depend on labelled fault datasets and often ignore the influence of the operating state, resulting in reduced robustness and a high rate of false alarms [1, 10].
Industrial machines rarely operate in a single uniform mode; they cycle through idle, heating, and production phases, each with its own characteristic electrical signature [7, 10]. Global anomaly detection methods that do not consider these states can misinterpret normal load-driven variability as degradation [1, 10]. This highlights the need for a state-aware, data-driven framework that evaluates machine behaviour relative to the appropriate operating context.
Motivated by these limitations, this work investigates whether state-aware modelling based solely on routinely measured electrical parameters can provide a reliable, fully non-intrusive health assessment without requiring explicit fault annotations [1, 7, 10].
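To address this, we propose a unified state-based predictive framework that integrates operating-state recognition, per-state regression of the expected electrical behaviour, and residual-based Health Index computation.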
By relying exclusively on standard electrical quantities (active power, current, voltage, and power factor), the framework ensures low deployment cost, high scalability, and compatibility with existing metering infrastructure [1, 10]. The approach is consistent with ISO 17359 and ISO 13379 guidelines for data-driven condition monitoring and is applicable across diverse industrial assets [11, 12, 13].
The novelty of this work does not lie in the individual components, such as state recognition using XGBoost or fleet-level aggregation, which are well established in the literature. Rather, the novelty arises from the integration of these components into a unified framework that operates under strict industrial constraints. This system is designed to work in zero-label environments, where fault data are not explicitly available. By synergistically combining operating-state recognition with per-state regression and Health Index computation, the framework can continuously monitor machine health without relying on fault-specific labels or additional sensors. The integration of these components provides a scalable, cost-effective solution that meets the real-world needs of Industry 4.0 applications.
2. Literature Review
To maintain focus and relevance, the review in this section concentrates on diagnostic methods that operate on electrical parameters and are directly comparable to the state-aware, regression-based framework developed in this work. Recent studies [
1,
10] further emphasize the growing shift toward electrical signature-based monitoring due to its non-intrusiveness and compatibility with existing metering systems. Broader fault detection techniques that rely on vibration, acoustic analysis, or high-frequency waveform sensing are therefore mentioned only insofar as they relate to electrical signature-based monitoring.
Electrical parameter-based diagnostics have evolved from traditional frequency-domain analysis to hybrid, data-driven models that combine physical interpretability with machine learning flexibility [
5,
7]. This evolution reflects the growing need for scalable, non-intrusive, and intelligent condition monitoring in industrial systems [
1,
10]. The following subsections summarize the main methodological directions—frequency-domain ESA, time–frequency analysis, data-driven modelling, and state-aware hybrid frameworks—and outline current gaps that motivate the present work.
2.1. Frequency-Domain Electrical Signature Analysis
Early developments in Electrical Signature Analysis (ESA) relied on frequency-domain methods that examine the spectral content of stator currents and voltages to detect characteristic fault signatures. Glowacz et al. [
4] first demonstrated that broken rotor bars produce sidebands in the current spectrum, while Benbouzid [
5] and Bhosinak et al. [
6] extended this principle to eccentricity and bearing defects. These approaches remain valued for their clear physical interpretability—each harmonic relates to a specific mechanical or electrical phenomenon—making fault identification intuitive for practitioners [
3].
However, frequency-domain ESA assumes steady-state operation and constant load. Pohakar et al. [
14] developed non-invasive detection of broken bars under nominal conditions, and Hassan et al. [
15] refined spectral filtering for better accuracy, yet the approach remains sensitive to load variation, noise, and torque transients. As modern machines operate under flexible duty cycles, pure spectral analysis loses reliability.
2.2. Time–Frequency and Transient Analysis
To overcome the steady-state limitation, time–frequency techniques were introduced. Douglas et al. [
16] used the Short-Time Fourier Transform (STFT) to trace evolving fault frequencies during acceleration and deceleration, while Antonino-Daviu et al. [
8] demonstrated that wavelet transforms isolate transient events such as rotor bar breakage with improved time localization. Later research employed Empirical Mode Decomposition (EMD) and Hilbert–Huang transforms [
17], adaptively extracting fault-related oscillations.
Although these techniques increased sensitivity to transient faults, their effectiveness depends strongly on signal quality and parameter tuning (window size, mother wavelet, decomposition thresholds) [
7,
8,
17]. In real industrial power systems, switching harmonics and voltage disturbances often obscure the informative bands. Thus, time–frequency ESA improves temporal insight but offers limited robustness and reproducibility [
7,
17].
2.3. Machine Learning and Data-Driven Diagnostics
With the expansion of high-frequency metering and affordable computing, machine learning (ML) approaches became central to electrical condition monitoring. Chisedzi and Muteba [
18] employed ML models to detect rotor bar faults from statistical electrical features, while Chen et al. [
19] showed that ensemble classifiers outperform static thresholds. Gradient-boosted trees, particularly XGBoost [
20], proved effective for modelling nonlinear dependencies in noisy, tabular industrial data.
Deep learning has further advanced ESA capabilities. Convolutional (CNN) and recurrent (LSTM) neural networks learn discriminative patterns directly from raw current or voltage signals [
21,
22], enabling fully automated diagnostics. Artigao et al. [
22] demonstrated this for doubly fed induction generators in wind turbines. However, deep models demand extensive labelled datasets and lack interpretability—an obstacle for explainable maintenance applications [
18,
19]. Consequently, although supervised ML offers strong predictive performance, it suffers from scarce labelled faults, limited transferability, and low transparency in industrial contexts.
2.4. Unsupervised and Hybrid Learning Pipelines
To mitigate label scarcity, research has increasingly explored unsupervised and hybrid frameworks. Clustering algorithms such as K-Means, Gaussian Mixture Models (GMMs), and Self-Organizing Maps (SOMs) segment electrical data into operating regimes or anomaly clusters [
3,
10]. Cocca et al. [
10] demonstrated such an ESA–ML pipeline for CNC machines, identifying process-dependent regimes and anomalies without prior labels.
While promising, clustering-based approaches remain sensitive to initialization and often assume linear separability in feature space. Consequently, hybrid pipelines have emerged that first classify operational states and then perform diagnostics within each state, reducing false alarms [
1,
21]. However, many still treat regime recognition as a separate preprocessing step rather than an integral part of the health estimation model.
2.5. Regression-Based Modelling and Residual Health Metrics
An alternative strand of research models the expected electrical behaviour and uses residual deviations as indicators of degradation. Chen et al. [19] predicted total active power from correlated features and analyzed deviations $\Delta P = P_{\mathrm{total}} - \hat{P}_{\mathrm{total}}$ to infer efficiency losses. Artigao et al. [22] applied similar residual analysis to renewable energy assets. Regression-based ESA enables continuous, quantitative health assessment instead of binary classification.
However, when trained on global data across mixed regimes, such models conflate normal load variation with degradation. This motivates state-normalized residuals, where deviations are evaluated within comparable operating conditions [
10,
22].
2.6. State-Aware ESA, XGBoost Rationale, and Research Gap
Recent developments combine regime recognition and residual modelling into state-aware ESA frameworks, enabling contextual health evaluation. Gradient-boosted trees are particularly well suited for this task because they capture nonlinear feature interactions, handle collinearity, and provide built-in feature importance metrics that enhance interpretability [
10].
The choice of XGBoost in this study is deliberate. Compared to Random Forests, XGBoost allows fine control over bias–variance trade-offs via learning-rate and regularization parameters. In contrast to deep neural networks, it performs efficiently on tabular industrial data, generalizes well from small to moderate datasets, and operates without labelled fault examples. Its transparency—feature importance and residual interpretation—ensures physical explainability, while its scalability and low computational demand make it ideal for deployment across multiple machines [
20,
21].
Although XGBoost itself is not a novel algorithm, its role in this work is to serve as a robust and interpretable component within a broader state-aware framework. The innovation of the present study does not lie in the choice of classifier, but in the integration of operating-state recognition, per-state regression, and residual-based health estimation into a unified, electrical-only diagnostic pipeline.
A recent study [23] presents a smart Industrial Internet of Things (AIoT) framework for real-time monitoring and forecasting in industrial manufacturing. The framework integrates sensor arrays, data integration platforms, and AI models to enable process condition awareness and prediction in a practical industrial setting. That study demonstrates the value of combining multiple components into a cohesive system for industrial monitoring, which resonates with the system-level integration approach taken in our framework.
Despite progress, key challenges remain:
Dependence on labelled or synthetic fault data that limit generalizability.
Weak integration between regime detection and health estimation, often handled as disconnected stages.
Absence of fleet-level aggregation for maintenance prioritization.
The proposed framework directly addresses these gaps by combining state identification, per-state regression, and normalized residual scoring into a single interpretable pipeline. Using only standard electrical quantities—active power, current, voltage, and power factor—it enables scalable, non-intrusive predictive maintenance consistent with ISO 17359 and ISO 13379 guidelines [
11,
12,
13].
Although prior research has significantly advanced electrical parameter-based diagnostics, the current approaches remain fragmented [
1,
2,
3,
7,
10]. The existing studies typically address regime recognition, residual modelling, or anomaly detection separately, but do not integrate these components into a unified, reproducible workflow [
3,
7,
10,
18]. Moreover, most methods rely either on labelled fault data, assumptions of steady-state operation, or global models that do not account for regime-specific variability [
5,
6,
7,
10,
18]. As a result, their applicability in real industrial environments—characterized by frequent load changes, multi-mode operation, and limited fault annotations—is restricted [
1,
3,
7,
10].
To the best of our knowledge, no existing work combines
State classification;
Per-state modelling of expected electrical behaviour; and
Standardized, residual-based health assessment into a single, fully electrical-only framework.
This gap motivates the methodology proposed in the present study.
2.7. Research Gap and Contributions
The existing research on electrical parameter-based diagnostics provides valuable tools for fault detection, but the current approaches remain fragmented [
1,
2,
3,
7,
10]. Frequency and time–frequency ESA techniques offer physical interpretability, yet they are sensitive to load variations and require steady-state operation [
5,
6,
7,
8,
9,
17]. Machine learning and deep learning methods improve the predictive performance but typically rely on labelled fault data that are scarce in industrial environments [
15,
21].
To address these gaps, this work introduces a unified, state-aware diagnostic framework that
Automatically identifies operating regimes using electrical parameters only.
Learns per-state regression models to estimate expected behaviour under each regime.
Performs zero-shot anomaly detection through residual deviations without requiring fault labels.
Defines a normalized Health Index (HI) consistent with ISO 17359 and ISO 13379 principles.
Enables scalable, fully non-intrusive monitoring using standard metering infrastructure.
The originality of the contribution lies not in the use of individual algorithms, but in the integration of state classification, regime-conditioned modelling, and standardized residual-based health assessment into a single, reproducible workflow applicable across heterogeneous industrial assets.
In summary, while the existing literature establishes the potential of Electrical Signature Analysis (ESA) for machine diagnostics, several research gaps remain. First, most studies focus on steady-state conditions, failing to account for the multi-state operational cycles typical of complex industrial machinery. Second, there is a lack of frameworks that operate under “zero-label” constraints, where historical fault data are unavailable for training. This study addresses these gaps by investigating the following research questions:
To what extent can routine electrical parameters delineate distinct operating states without process-level metadata?
How does state-aware residual modelling improve the robustness of health indicators compared to context-agnostic approaches?
Can fleet-level benchmarking provide a reliable baseline for anomaly detection in the absence of labelled failure datasets?
3. Materials and Methods
This section describes the data, preprocessing, feature extraction, and modelling methodology used to develop the proposed state-based health monitoring framework. The approach is fully data-driven and relies solely on electrical measurements from digital power meters installed on three-phase industrial machines. A short excerpt of the processed dataset, including key electrical features, operating-state labels, and the computed Health Index (HI), is provided in Appendix A (Table A1). Machine identifiers have been anonymized for confidentiality.
3.1. Data Acquisition and Structure
The experimental data were collected from ten three-phase injection moulding machines operating in a continuous industrial environment. The overall data acquisition and monitoring architecture is illustrated in Figure 1. Each machine was instrumented with an industrial-grade digital power analyser (Class 0.5S accuracy according to IEC 62053-22) and three clamp-on current sensors, enabling non-invasive measurement of electrical parameters such as active power, current, voltage, and power factor. The analysers employ internal Digital Signal Processing (DSP) to sample raw AC waveforms at a high frequency (exceeding 20 kHz), computing true RMS values and active power before data transmission. The measured data were locally collected and forwarded via the Modbus TCP protocol to a dedicated edge device installed at each machine. To ensure signal integrity and mitigate high-frequency switching noise from the drives, a hardware-level moving-average filter was applied during the aggregation phase. The edge devices then transmitted the filtered measurements over a secure network connection to a centralized cloud-based data logger. The monitored machines employ standard industrial three-phase motor–drive systems, typically consisting of induction motors coupled with hydraulic pumps and controlled via variable-frequency drives. Since the proposed method operates exclusively on electrical parameters measured upstream of the drive, it is independent of the specific motor technology. Measurements were recorded for approximately two months at a sampling interval of 0.2 s (5 Hz). This sampling rate was selected to provide a balance between capturing macroscopic regime transitions and maintaining computational efficiency for fleet-level monitoring, while the underlying DSP-based RMS calculation ensured that the captured data remained reliable even in the presence of transient harmonic distortions.
Each dataset contained the following electrical parameters for all three phases (L1–L3): active energy, active power, current, power factor, reverse active energy, reverse active power, and voltage. These quantities were selected because they directly reflect both electrical efficiency and mechanical loading and are universally available through modern industrial metering infrastructure [1, 2, 3, 4].
The monitored injection moulding machines operate in three dominant regimes: idle (standby), heating/melting, and moulding (active production). Each regime exhibits a distinct electrical signature in terms of total power and current amplitude [7, 10]. This cyclic behaviour provides an ideal benchmark for testing the ability of the framework to identify and model operational states using only electrical features.
Although the experiments focused on injection moulding machines, the conceptual design is machine-agnostic. Because it relies exclusively on standard electrical quantities rather than process-specific signals, the method is scalable and transferable to various assets such as pumps, compressors, or CNC machines. The only requirement is the availability of three-phase electrical data, making the framework deployable fleet-wide using existing energy management or supervisory systems [10, 24, 25, 26, 27].
From raw data, the following derived indicators were computed and used for modelling:
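{P_total, I_mean, V_mean, PF_mean, I_imbalance, V_imbalance}, where
P_total: Total active power over the three phases;
I_mean, V_mean: Average phase current and average phase voltage;
PF_mean: Average power factor;
I_imbalance, V_imbalance: Normalized imbalance indices calculated as shown in Equations (4) and (5).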
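The derived indicators can be illustrated with a short Python sketch. The aggregation rules (sum of the phase powers, arithmetic mean of the phase currents, voltages, and power factors) and the imbalance definition (largest deviation from the phase mean divided by that mean) are assumptions made for illustration and may differ from the exact forms of Equations (4) and (5). The function and argument names are hypothetical.

```python
from statistics import mean


def imbalance(phases):
    """Assumed imbalance index: max |x_k - mean| / mean over the three phases."""
    avg = mean(phases)
    if avg == 0:
        return 0.0  # no load on any phase: report zero imbalance
    return max(abs(x - avg) for x in phases) / avg


def derived_indicators(p, i, v, pf):
    """Build the six modelling features from per-phase (L1, L2, L3) readings."""
    return {
        "P_total": sum(p),        # assumed: sum of the phase active powers
        "I_mean": mean(i),
        "V_mean": mean(v),
        "PF_mean": mean(pf),
        "I_imbalance": imbalance(i),
        "V_imbalance": imbalance(v),
    }


# Synthetic single-sample example (values are made up).
sample = derived_indicators(
    p=[1000.0, 1100.0, 900.0],
    i=[5.0, 5.5, 4.5],
    v=[230.0, 231.0, 229.0],
    pf=[0.90, 0.92, 0.88],
)
```

Because the imbalance indices are ratios, they stay comparable across machines with different rated currents.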
All signals were stored in a local time series database and exported as CSV files. A healthy baseline period was defined using the first 10% of available data (6 days) for model training, while the remaining 90% (54 days) served as the testing set for evaluating the Health Index (HI) stability and sensitivity.
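The chronological split can be sketched as follows, assuming the records are already sorted by time. The function name and the `baseline_fraction` parameter are hypothetical; the point of the sketch is that the split is positional and never shuffled, so the baseline always precedes the evaluation period.

```python
def split_baseline(records, baseline_fraction=0.10):
    """Split time-ordered records into (baseline, evaluation) without shuffling."""
    if not 0.0 < baseline_fraction < 1.0:
        raise ValueError("baseline_fraction must lie in (0, 1)")
    cut = round(len(records) * baseline_fraction)
    return records[:cut], records[cut:]


# Sixty daily records: the first 6 form the baseline, the remaining 54 are evaluated.
days = list(range(1, 61))
baseline_days, test_days = split_baseline(days)
```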
Although the proposed framework employs a supervised classifier for operating-state identification, this step does not affect the zero-shot nature of the diagnostic methodology. The supervised component is used exclusively to recognize normal operating regimes (idle, heating, moulding), not to detect faults. No fault labels are required at any stage of training. Health assessment is performed solely through residual deviations between measured and model-predicted electrical behaviour within each regime, meaning that the model never learns explicit fault patterns. Anomalies are therefore detected as departures from the learned normal behaviour, preserving the zero-shot, anomaly-based character of the method while enabling robust regime-aware modelling.
The 0.2 s sampling interval (5 Hz) was chosen to balance data granularity against the computational constraints of industrial SCADA systems. While high-frequency sampling is necessary for detecting transient harmonic distortions, a 5 Hz resolution is sufficient for characterizing the thermal and mechanical duty cycles of injection moulding machines, where major state transitions (e.g., heating cycles and mould clamping) occur over several seconds. To ensure data reliability, a hardware-level pre-averaging filter was applied at the meter level to eliminate high-frequency noise while preserving the fundamental power consumption trends. This resolution aligns with ISO 13379-1 recommendations for condition monitoring based on process parameters, providing a robust signal-to-noise ratio for residual-based health assessment. A summary of the dataset characteristics is provided in Table 1.
3.2. Temporal Aggregation and Windowing
To balance temporal resolution and statistical stability, each machine’s time series was divided into three-minute windows. This window length was determined empirically as the smallest interval capable of reliably capturing regime transitions without excessive noise.
Shorter windows (10–30 s) caused unstable class assignments due to transient control behaviour and switching noise, while longer windows (≥5 min) smoothed out relevant regime boundaries. The chosen 3 min window (~900 samples) provided optimal granularity for regime detection and rolling statistics.
Within each window, the mean, standard deviation, and coefficient of variation were computed for all electrical parameters, providing a compact yet informative representation of both steady and transient behaviour (see Equations (6)–(8)). Similar multi-scale temporal segmentation has been used in industrial monitoring studies for balancing noise suppression and dynamic sensitivity [25, 26, 27, 28, 29].
Active power was selected as the primary target variable because it serves as an integral proxy for the electromechanical work performed. Any increase in mechanical friction, hydraulic resistance, or heater inefficiency directly manifests as a deviation in power consumption. However, to ensure robustness against external grid fluctuations, the voltage and power factor are included as exogenous inputs. This multi-parameter approach allows the model to distinguish between supply-side variations and internal asset degradation.
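$$\bar{x} = \frac{1}{N}\sum_{t=1}^{N} x_t \tag{6}$$
where
$x_t$: signal value at time $t$;
$N$: number of samples in the window;
$\bar{x}$: mean value of the signal.
$$\sigma = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(x_t - \bar{x}\right)^2} \tag{7}$$
$$CV = \frac{\sigma}{\bar{x}} \tag{8}$$
where
$x_t$: signal value;
$\bar{x}$: mean of the signal;
$\sigma$: standard deviation;
$N$: number of samples.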
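The window statistics described in this subsection can be sketched in a few lines of Python. The sketch assumes non-overlapping windows and the population form of the standard deviation (division by N); neither choice is stated explicitly in the text, and the function name is hypothetical. At 5 Hz, a 3 min window holds 5 × 180 = 900 samples.

```python
from statistics import fmean, pstdev

SAMPLE_RATE_HZ = 5
WINDOW_SECONDS = 180
WINDOW_SIZE = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 900 samples per window


def window_features(signal, window_size=WINDOW_SIZE):
    """Return (mean, std, cv) for each complete, non-overlapping window."""
    features = []
    for start in range(0, len(signal) - window_size + 1, window_size):
        window = signal[start:start + window_size]
        mu = fmean(window)
        sigma = pstdev(window)  # population form (1/N); an assumption
        cv = sigma / mu if mu != 0 else float("nan")
        features.append((mu, sigma, cv))
    return features


# Synthetic signal: two full windows alternating 9 and 11, plus an incomplete tail.
signal = [9.0, 11.0] * 900 + [10.0] * 100
feats = window_features(signal)
```

The incomplete tail is dropped rather than padded, so every feature row summarizes the same number of samples.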
3.3. Feature Engineering and Preprocessing
All electrical signals were resampled to 5 Hz, synchronized, and cleaned using a robust modified z-score filter (|M| > 3.5) as defined in Equation (9) [27].
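A minimal sketch of such a filter is given below, assuming Equation (9) takes the standard Iglewicz–Hoaglin form M_i = 0.6745 (x_i − median) / MAD, which is the form usually paired with the 3.5 threshold. The function names and the example values are hypothetical.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score: M_i = 0.6745 * (x_i - median) / MAD (assumed form)."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        return [0.0 for _ in values]  # degenerate case: no spread to score against
    return [0.6745 * (x - med) / mad for x in values]


def remove_outliers(values, threshold=3.5):
    """Keep only samples whose |M_i| does not exceed the threshold."""
    scores = modified_z_scores(values)
    return [x for x, m in zip(values, scores) if abs(m) <= threshold]


# Synthetic power readings in kW with one non-physical spike.
power_kw = [10.1, 9.9, 10.0, 10.2, 9.8, 55.0, 10.0]
cleaned = remove_outliers(power_kw)
```

Because the median and the MAD are themselves insensitive to the spike, the spike cannot mask itself by inflating the spread estimate, which is the main advantage over a mean-and-standard-deviation z-score.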
Short gaps caused by transient PLC resets were interpolated linearly. Longer data gaps exceeding the interpolation threshold (e.g., during network outages) were treated as missing entries and the corresponding cycles were excluded from the analysis to maintain dataset consistency. Furthermore, the robust modified z-score filter was specifically calibrated to treat non-physical power spikes as outliers, ensuring that they do not distort the subsequent regression models. To capture local dynamics, the rolling mean and standard deviation of P_total, I_mean, and PF_mean were computed over 30 s sub-windows within each 3 min window. This combination of static and short-term descriptors captures both gradual drifts and transient behaviour relevant for regime discrimination. All features were subsequently scaled using robust normalization (median and MAD) [27, 28, 29], which improves stability across machines with different load capacities.
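Two of the preprocessing steps above, gap handling and robust scaling, can be sketched as follows. The interpolation threshold is not stated in the text, so `max_gap` is a hypothetical parameter; the scaling is assumed to be (x − median) / MAD without an additional consistency constant. Missing samples are represented by `None`.

```python
from statistics import median


def fill_short_gaps(series, max_gap):
    """Linearly interpolate runs of None up to max_gap long; leave longer runs as None."""
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start  # i now points at the first valid sample after the gap
        if start > 0 and i < n and gap <= max_gap:
            left, right = filled[start - 1], filled[i]
            for k in range(start, i):
                frac = (k - start + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)
    return filled


def robust_scale(values):
    """Scale a feature column as (x - median) / MAD."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [(x - med) / mad for x in values]


filled = fill_short_gaps([1.0, None, 3.0, None, None, None, 7.0], max_gap=2)
scaled = robust_scale([10.0, 12.0, 14.0, 16.0, 18.0])
```

In the example, the single missing sample is interpolated while the three-sample gap stays marked as missing, so the affected cycle can be excluded downstream.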
The exclusive use of low-frequency electrical parameters is intentional. These quantities—total active power, phase currents, voltages, and power factor—are universally available from standard industrial metering systems and inherently reflect the electromechanical loading of the machine. This enables fully non-intrusive monitoring without additional sensors or high-frequency waveform acquisition. For operating-state recognition, the three regimes studied here (idle, heating, moulding) exhibit distinct distributions in these electrical variables, making simple statistical window features sufficient and highly reproducible.
The goal of this work is not detailed fault-type classification but regime-aware residual modelling, where health assessment is derived from deviations between measured and expected electrical behaviour within each state. Under this objective, more complex feature extraction techniques or high-resolution waveform representations would impose additional sensing requirements, reduce scalability, and provide little diagnostic benefit in the absence of labelled fault data. The proposed feature set therefore prioritizes robustness, interpretability, and deployability under realistic industrial constraints while remaining extensible if richer data become available in future studies.
3.4. Operating State Identification Model
Identifying distinct regimes (idle (standby), heating/melting, and moulding (active production)) is crucial for contextualized diagnostics. Four candidate models were evaluated (Table 2).
The XGBoost classifier was selected for its ability to model nonlinear feature interactions, handle correlated inputs, and maintain interpretability through feature importance metrics. Its embedded regularization, formulated in Equation (12), reduces overfitting, and its strong performance on tabular, mixed-scale industrial data makes it well suited for this task.
XGBoost consists of an ensemble of shallow decision trees in which each tree corrects the residual error of the previous ones by minimizing the objective function defined in Equation (11). This boosting mechanism enables the model to resolve subtle nonlinear distinctions between electrical signatures of the idle, heating, and moulding regimes, even when their feature distributions partially overlap. The ability to reflect these interactions, while remaining computationally efficient and fully explainable, makes gradient-boosted trees a robust choice for regime identification in low-frequency electrical data [17].
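$$L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right) \tag{11}$$
where
$L$: overall objective (loss) minimized during training;
$l(y_i, \hat{y}_i)$: training loss for sample $i$ (e.g., squared error);
$y_i$: true measured value;
$\hat{y}_i$: model prediction;
$\Omega(f_k)$: regularization term for the $k$-th tree;
$n$: number of training samples;
$K$: number of trees in the ensemble.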
Unlike deep learning models, XGBoost does not require large, labelled datasets or extensive hyperparameter tuning, and it performs reliably under moderate class imbalance. Compared with unsupervised clustering, it provides consistent and reproducible regime boundaries rather than depending heavily on initialization. Although XGBoost itself is not novel, its integration within a unified state-aware pipeline, together with per-state regression and residual-based health estimation, forms the methodological contribution of this work [20, 21].
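$$\Omega\left(f_k\right) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{12}$$
where
$\Omega(f_k)$: complexity penalty for tree $k$;
$T$: number of leaves in the tree;
$\gamma$: regularization parameter penalizing the number of leaves;
$\lambda$: L2 regularization coefficient;
$w_j$: weight (score) assigned to leaf $j$.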
Each machine’s data were pseudo-labelled via quantile thresholds of total active power P_total, representing low, nominal, and high regimes. An 80/20 stratified train–test split ensured balanced representation across classes. Hyperparameters (max_depth, learning_rate, n_estimators) were optimized via grid search to balance accuracy and generalization.
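The pseudo-labelling and the stratified split can be sketched with the Python standard library. The quantile levels are not given in the text, so tertiles are assumed here; the helper names, the random seed, and the synthetic power values are hypothetical.

```python
import random
from collections import defaultdict
from statistics import quantiles

REGIMES = ("low", "nominal", "high")


def pseudo_label(p_total):
    """Label each window by the tertile band of its total active power (assumed levels)."""
    cuts = quantiles(p_total, n=3, method="inclusive")  # two cut points
    labels = [REGIMES[sum(p > c for c in cuts)] for p in p_total]
    return labels, cuts


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class represented in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic window-level power values in kW for one machine.
p_total = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 11.0, 12.0, 13.0, 14.0,
           30.0, 31.0, 32.0, 33.0, 34.0]
labels, cuts = pseudo_label(p_total)
train_idx, test_idx = stratified_split(labels)
```

Computing the thresholds per machine, as described above, keeps the labels meaningful for machines with different rated power, since a fixed kW threshold would not transfer across the fleet.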
This classifier provides consistent regime boundaries across machines and forms the basis for subsequent regression and Health Index computation.
3.5. Reference Power Modelling
After operating-state identification, a separate regression model was trained for each regime to estimate the expected total active power from electrical features. It is important to note that XGBoost is used here strictly as a regression model for state-aware power prediction rather than as a supervised fault classification algorithm. The framework generates a continuous Health Index based on residual deviations and does not perform explicit fault identification or multi-class classification.
XGBoost was selected due to its robustness to multicollinearity, ability to model nonlinear relationships, and strong generalization performance on structured industrial data. Compared to linear regression, tree-based boosting better captures complex interactions between electrical variables without requiring extensive feature engineering. At the same time, XGBoost remains computationally efficient and interpretable, making it suitable for deployment in large-scale industrial monitoring systems.
Using per-state models is essential because the relationship between current, voltage, power factor, and total power differs significantly across idle, heating, and moulding phases.
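The regression model for each state is formulated in Equation (13):
$$P_{\mathrm{total},t} = f_s\left(\mathbf{x}_t\right) + r_t \tag{13}$$
where $f_s$ represents the state-specific XGBoost regression function applied to the electrical feature vector $\mathbf{x}_t$; and
$r_t$ denotes the residual error.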
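The per-state structure of this step can be illustrated independently of the regressor. In the sketch below, a one-feature least-squares line (P_total against I_mean) is swapped in for the state-specific XGBoost model purely to keep the example dependency-free; it is not the model used in this work. All function names and the baseline values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y ~ a * x + b (stand-in for the per-state model)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def fit_per_state(rows):
    """rows: iterable of (state, i_mean, p_total). Returns {state: (a, b)}."""
    grouped = defaultdict(lambda: ([], []))
    for state, i_mean, p_total in rows:
        grouped[state][0].append(i_mean)
        grouped[state][1].append(p_total)
    return {state: fit_line(x, y) for state, (x, y) in grouped.items()}


def residual(models, state, i_mean, p_total):
    """Measured minus expected power under the model of the given state."""
    a, b = models[state]
    return p_total - (a * i_mean + b)


# Synthetic healthy baseline: the two states follow different power-current laws.
baseline = [
    ("idle", 1.0, 0.5), ("idle", 2.0, 1.0), ("idle", 3.0, 1.5),
    ("moulding", 10.0, 21.0), ("moulding", 20.0, 41.0), ("moulding", 30.0, 61.0),
]
models = fit_per_state(baseline)
r = residual(models, "moulding", 20.0, 43.0)  # 2 kW above the expected 41 kW
```

Each observation is scored only against the model of its own state, so the large power difference between idle and moulding never appears as a residual.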
The reference state for each machine is defined as the distribution of these residuals during a verified period of nominal operation (the baseline interval).
The XGBoost-based regression framework enables flexible modelling of nonlinear and heteroscedastic dependencies among electrical variables, which is particularly relevant in industrial three-phase systems where power–current–voltage relationships vary across operating states and loading conditions.
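To establish a robust statistical reference, a state-specific reference dispersion $\sigma_{m,s}$ is computed during the healthy baseline period as shown in Equation (14):
$$\sigma_{m,s} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_i - \bar{r}\right)^2} \tag{14}$$
where $N$ represents the number of samples in the baseline period for machine $m$ and state $s$; $r_i$ is the $i$-th baseline residual; and
$\bar{r}$ represents the mean residual value.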
This value serves as the quantitative benchmark for “normal” process variability. Any subsequent increase in the residual magnitude relative to this reference is interpreted as a loss of efficiency or emerging degradation.
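A minimal sketch of the reference-dispersion step is shown below. It assumes the dispersion is the population standard deviation of the baseline residuals, grouped by machine and state; the function name, the machine identifier, and the residual values are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def reference_dispersion(baseline_residuals):
    """baseline_residuals: iterable of (machine, state, residual).

    Returns {(machine, state): standard deviation of the baseline residuals}.
    """
    grouped = defaultdict(list)
    for machine, state, r in baseline_residuals:
        grouped[(machine, state)].append(r)
    return {key: pstdev(values) for key, values in grouped.items()}


# Synthetic baseline residuals in kW: moulding is naturally noisier than idle.
baseline_residuals = [
    ("M1", "idle", -1.0), ("M1", "idle", 1.0), ("M1", "idle", -1.0), ("M1", "idle", 1.0),
    ("M1", "moulding", -2.0), ("M1", "moulding", 2.0),
    ("M1", "moulding", -2.0), ("M1", "moulding", 2.0),
]
sigma_ref = reference_dispersion(baseline_residuals)
```

Keeping one value per machine and state means that a 2 kW residual counts as one reference unit during moulding but two during idle in this example.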
Modelling expected power per regime—rather than globally—removes confounding operational effects and isolates deviations that correspond to mechanical degradation, hydraulic inefficiency, or electrical imbalance. This state-aware formulation is therefore critical for obtaining reliable residuals and a meaningful Health Index, as it ensures that the “health” of the machine is always evaluated against a statistically sound and context-dependent reference.
3.6. Health Index Formulation
Let $P_t$ denote the measured total active power in window $t$, and $\hat{P}_t$ the corresponding per-state regression estimate. The residual is defined as in Equation (15):
$$r_t = P_t - \hat{P}_t \qquad (15)$$
where $r_t$ is the residual (deviation) in time window $t$; $P_t$ is the measured total active power in window $t$; and $\hat{P}_t$ is the predicted total active power from the per-state regression model.
For each machine $m$ and operating state $s$, a reference dispersion $\sigma_{m,s}$ is computed from a baseline interval verified to represent healthy operation. This reference quantifies the normal residual variability within each state and provides the normalization term required for health assessment.
The Health Index (HI) is formulated as a bounded, dimensionless score as shown in Equation (16):
where $r_t$ is the residual deviation; $\sigma_{m,s}$ is the reference residual variability for machine $m$ in state $s$; and $HI_t$ is the normalized health score in (0, 1).
Values close to 1 indicate behaviour consistent with the healthy baseline, while decreasing values reflect increasing deviation from the expected electrical behaviour under the corresponding operating state. To provide a structured diagnostic interpretation, the health states are categorized into three functional zones:
Healthy State (HI > 0.9): The machine operates within the expected statistical variance. Residuals are dominated by Gaussian noise, indicating nominal electromechanical efficiency.
Warning State (0.8 ≤ HI ≤ 0.9): Detectable structural shifts in the power signature. This state reflects emerging irregularities, such as increased mechanical friction or minor hydraulic pressure drops, and calls for a scheduled inspection.
Faulty/Degraded State (HI < 0.8): Significant deviation from the fleet-level baseline. At this level, the “energy penalty” exceeds the threshold established from the reference variability, correlating with potential anomalies such as heater band failures or severe motor strain.
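The mapping from residual to zone can be illustrated with a short Python sketch. The bounded mapping used here, 1 / (1 + |r| / σ), is an assumption chosen only because it is dimensionless and falls towards 0 as the deviation grows; it is not claimed to be the form of Equation (16). The function names are hypothetical, and assigning the boundary values 0.8 and 0.9 to the warning zone is likewise an assumption.

```python
def health_index(measured, predicted, sigma_ref):
    """Map a residual to a bounded score; 1.0 means no deviation from the model."""
    if sigma_ref <= 0:
        raise ValueError("sigma_ref must be positive")
    z = abs(measured - predicted) / sigma_ref  # residual in units of baseline dispersion
    return 1.0 / (1.0 + z)  # assumed mapping, not necessarily Equation (16)


def health_zone(hi):
    """Classify a Health Index value into the three diagnostic zones."""
    if hi > 0.9:
        return "healthy"
    if hi >= 0.8:  # assumption: boundary values belong to the warning zone
        return "warning"
    return "degraded"


# Hypothetical window: 12 kW measured, 10 kW predicted, 2 kW baseline dispersion.
hi_example = health_index(12.0, 10.0, 2.0)
zone_example = health_zone(hi_example)
```

Because the residual is divided by the state-specific dispersion before mapping, the same zone thresholds can be applied to machines with different power ratings.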
This formulation aligns with ISO 17359 and ISO 13379 principles, where condition indicators are evaluated relative to a reference baseline and degradation is inferred from persistent trends rather than single-point anomalies. In this study, values above approximately 0.9 generally correspond to normal operation, whereas sustained decreases towards 0.8–0.7 may indicate emerging abnormalities.
Weekly machine-level health is computed as the median of all HI values in the given week, with 95% bootstrap confidence intervals reported to account for variability. Because the index is normalized by state-specific variability, machines with different load profiles or power ratings can be compared on a common scale, enabling fleet-level prioritization and integration into predictive maintenance workflows.
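A minimal sketch of the weekly summary follows, using only the Python standard library. The text specifies a median with a 95% bootstrap confidence interval; the percentile bootstrap, the number of resamples, the fixed seed, and the function name `weekly_hi_summary` are assumptions made for illustration.

```python
import random
from statistics import median


def weekly_hi_summary(hi_values, n_boot=2000, seed=0):
    """Return (median, ci_low, ci_high) for one machine-week of HI values."""
    if not hi_values:
        raise ValueError("hi_values must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(hi_values)
    # Resample with replacement and record the median of each resample.
    boot = sorted(median(rng.choices(hi_values, k=n)) for _ in range(n_boot))
    # Percentile bootstrap: take the 2.5th and 97.5th percentiles.
    ci_low = boot[int(0.025 * (n_boot - 1))]
    ci_high = boot[int(0.975 * (n_boot - 1))]
    return median(hi_values), ci_low, ci_high


# Hypothetical HI values for one week, for illustration only.
week = [0.95, 0.93, 0.91, 0.94, 0.96, 0.92, 0.90]
week_median, week_low, week_high = weekly_hi_summary(week)
```

The median is less affected by short transients than the mean, which suits an indicator meant to track persistent trends rather than single-point anomalies.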
3.7. Fleet-Level Aggregation and Benchmarking
To extend the analysis beyond individual machines, Health Index values are aggregated on a weekly basis and evaluated separately for each operating state. This state-aware aggregation enables meaningful fleet-level benchmarking by ensuring that machines are compared under comparable operating conditions.
The resulting fleet-level metrics support three key maintenance objectives:
Identification of machines exhibiting declining health trends, indicated by sustained reductions in weekly HI values.
Assessment of relative operational efficiency across assets, independent of differences in load profiles or power ratings due to the normalized formulation of the HI.
Prioritization of maintenance actions, where assets displaying accelerated degradation or consistently low HI values can be flagged for inspection, as in Equation (17).
All computations were implemented in Python 3.12 using the XGBoost, pandas, and scikit-learn libraries, enabling a scalable and fully reproducible workflow from raw electrical measurements to actionable diagnostic indicators suitable for industrial deployment.
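The prioritization logic can be sketched as follows. This is a hypothetical rule for illustration and is not the criterion of Equation (17): a machine is flagged when its latest weekly HI falls below a floor or when the least-squares slope of its weekly HI series is steeper than a chosen limit. The thresholds `hi_floor` and `slope_floor`, the function names, and the fleet values are all assumptions. In line with the state-aware aggregation, the function would be applied to one operating state at a time. The sketch uses only the standard library rather than pandas, to stay self-contained.

```python
def trend_slope(values):
    """Least-squares slope of values against week index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def prioritise(weekly_hi, hi_floor=0.8, slope_floor=-0.02):
    """Return flagged machine ids, lowest latest weekly HI first."""
    flagged = []
    for machine, series in weekly_hi.items():
        latest, slope = series[-1], trend_slope(series)
        # Flag consistently low HI or an accelerating downward trend.
        if latest < hi_floor or slope < slope_floor:
            flagged.append((latest, slope, machine))
    return [machine for _, _, machine in sorted(flagged)]


# Hypothetical weekly HI medians for one operating state.
fleet = {
    "M1": [0.95, 0.94, 0.95, 0.94],  # stable
    "M2": [0.93, 0.90, 0.86, 0.82],  # declining trend
    "M3": [0.78, 0.77, 0.78, 0.76],  # consistently low
}
flagged = prioritise(fleet)
```

With these values, M1 is left unflagged, while M3 (lowest latest HI) is listed ahead of M2 (steepest decline).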
5. Discussion
The presented approach demonstrates that electrical parameters alone can serve as a robust foundation for the data-driven health assessment of industrial machines when the operating-state context is explicitly modelled. The proposed state-aware framework integrates regime recognition, per-state regression, and residual-based health evaluation into a unified, fully non-intrusive workflow that requires no additional measurements or labelled fault data.
The identification of operating regimes based purely on electrical signatures proved highly consistent across all ten monitored machines. The XGBoost classifier achieved validation accuracy between 0.82 and 0.88, clearly distinguishing the idle, heating, and moulding (operational) states. This result confirms that routine electrical quantities, such as power, current, and power factor, inherently contain sufficient information to delineate machine behaviour without process-level inputs.
Regarding the sufficiency of the parameter set, while high-frequency vibration or thermal imaging could provide deeper diagnostic resolution for specific components like bearings or nozzle heaters, their integration would contradict the requirement for a non-intrusive, zero-cost deployment. The results confirm that for the majority of common electromechanical irregularities in injection moulding machines, the routinely measured electrical signature provides a sufficient signal-to-noise ratio for early anomaly detection without the need for additional specialized sensing.
Per-state regression modelling further improved the stability and accuracy under varying load conditions. Compared to a global, context-agnostic model, the state-aware approach reduced residual variability by approximately 30% and increased the median coefficient of determination ($R^2$) from 0.64 to 0.86. These findings highlight that incorporating operational context directly into the model structure leads to greater robustness and generalization than increasing the model complexity or data volume alone.
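Why a per-state model outperforms a global one can be seen in a toy example. The sketch below uses purely synthetic data in which two states share the same current range but follow different current-to-power relationships. It also substitutes a simple least-squares line for the XGBoost regressor to stay dependency-free. All names and numbers are hypothetical and unrelated to the measurements reported in this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return slope, y_mean - slope * x_mean


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_and_predict(rows):
    """Fit power against current on the given rows and predict the same rows."""
    xs = [current for _, current, _ in rows]
    ys = [power for _, _, power in rows]
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]


# Synthetic rows: (state, current, power). Same currents, different regimes.
rows = [("heating", i, 0.9 * i) for i in range(5, 10)]
rows += [("moulding", i, 0.6 * i + 0.5) for i in range(5, 10)]

# Global, context-agnostic model: one line for all rows.
powers = [power for _, _, power in rows]
r2_global = r_squared(powers, fit_and_predict(rows))

# State-aware model: one line per state, scored on the pooled predictions.
state_true, state_pred = [], []
for state in ("heating", "moulding"):
    subset = [row for row in rows if row[0] == state]
    state_true += [power for _, _, power in subset]
    state_pred += fit_and_predict(subset)
r2_state = r_squared(state_true, state_pred)
```

In this construction the global line has to average over both regimes, so its residuals carry the between-state difference, whereas the per-state fits leave only within-state deviations.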
While previous research has explored state recognition, regression modelling, and fleet-level analysis separately, this work integrates these components into a cohesive, system-level framework for predictive maintenance, which is, to the authors' knowledge, the first such integration. The novelty lies not in the individual methods but in how they are combined to operate under strict industrial constraints, particularly the lack of labelled fault data and the reliance on standard electrical parameters such as active power, voltage, and current. Unlike AIoT or deep learning approaches that rely on specialized sensing and large labelled fault datasets, the framework remains cost-effective and non-intrusive, making it directly applicable to legacy machine fleets where zero-label constraints are a practical reality.
The normalized Health Index (HI), derived from residual deviations, provided a reproducible and interpretable measure of machine behaviour over time. Stable assets maintained HI values above 0.9, while those exhibiting mechanical or electrical deviations showed gradual decreases toward 0.7–0.8. This indicates that the framework can reveal emerging irregularities early and with fewer false detections than purely statistical or unsupervised approaches. The smooth evolution of the HI also reduces sensitivity to normal load transitions, which often trigger false alarms in context-free detection schemes.
Compared with traditional Electrical Signature Analysis and existing machine learning models, the proposed framework combines physical interpretability with scalability and computational efficiency. Frequency- and wavelet-based ESA methods, though effective for stationary signals, remain sensitive to parameter tuning and are less robust under dynamic operation. Deep learning approaches offer high predictive power but require large labelled datasets and offer limited explainability. The gradient-boosted modelling strategy used here achieves comparable accuracy on standard metering data while maintaining transparency and interpretability, in line with ISO 17359 and ISO 13379 principles for data-driven diagnostics.
Fleet-level aggregation of the Health Index extended the analysis from single-machine evaluation to collective performance monitoring. By comparing weekly HI trajectories across machines, the framework enabled quantitative benchmarking and identification of assets with emerging performance deviations. Its lightweight implementation—requiring only a few minutes for training and less than 100 milliseconds for inference—ensures compatibility with existing SCADA and energy monitoring infrastructures and supports deployment on edge devices without specialized hardware.
Regarding the comparison with alternative modelling approaches listed in Table 2, a direct quantitative benchmark is not applicable, as the methods differ fundamentally in their data requirements and modelling assumptions. Unsupervised techniques such as K-Means and GMM do not provide stable regime boundaries under class imbalance, making them unsuitable for supervised per-state regression. Deep recurrent architectures (LSTM/RNN) require high-frequency waveform data and large labelled datasets, which are not available in the present industrial setting. Random Forests provide lower resolution and weaker separation between operating regimes, which led to higher residual variance in preliminary tests. For these reasons, XGBoost offered the best trade-off between robustness, interpretability, and suitability for the proposed state-aware residual framework.
A notable limitation of the current framework is the potential for a self-referential loop, as electrical parameters (specifically active power) are used both for operating-state recognition and as targets for Health Index regression. In theory, a degradation that manifests as a significant power shift could be misclassified as a legitimate state transition, potentially masking the anomaly. However, this risk is mitigated by the multi-dimensional nature of the input features—including current, voltage, and power factor—which provides physical redundancy. Furthermore, the fleet-level aggregation acts as a critical safeguard; since the health baseline is derived from the median behaviour of multiple identical machines, an individual asset’s degradation cannot “redefine” the global state boundaries, ensuring that significant deviations remain detectable.
Overall, the results confirm that the developed state-aware, electrical-only modelling framework provides a reliable, interpretable, and scalable foundation for continuous condition assessment in industrial environments. By linking data-driven modelling with physically meaningful indicators, the approach bridges the gap between raw electrical signals and actionable diagnostic information. Future work will focus on expanding the framework toward online adaptation, the inclusion of frequency-domain indicators, and transferability across heterogeneous machine types, further strengthening its role as a core enabler of intelligent and sustainable industrial operation.
6. Conclusions
This study presented a zero-label, state-aware framework for the health assessment of industrial assets using only standard electrical parameters. By integrating automated regime identification with state-specific XGBoost regression models, the proposed Health Index (HI) effectively isolates mechanical and hydraulic degradation from normal operational load variations. The main findings are as follows:
The method successfully detected emerging degradation trends 3–5 days before maintenance intervention without requiring any historical fault labels.
The state-aware approach significantly reduced false positive rates compared to context-agnostic anomaly detection methods (e.g., Isolation Forest).
The use of standard electrical parameters makes the framework highly scalable and deployable across various industrial fleets without additional sensor investments.
While robust, the framework’s performance depends on the initial definition of the “healthy baseline.” If the baseline period contains undetected minor defects, the reference dispersion may be inflated, reducing the sensitivity of the HI. Furthermore, extremely high-frequency transients (sub-millisecond) are not captured by the 5 Hz sampling rate, which may limit the detection of certain rapid electrical switchgear faults.
Future research will focus on the automatic adaptation of the baseline (transfer learning) to account for seasonal environmental changes (e.g., ambient temperature effects on hydraulic oil viscosity). Additionally, we aim to extend the methodology to multi-asset correlation, where the health of one machine is cross-referenced with its neighbours in the fleet to further isolate external grid-level disturbances from local mechanical wear.