1. Introduction
Large-scale power monitoring underpins predictive maintenance, cost control, and safety in manufacturing environments. However, mixed periodicities (shift-, daily-, and weekly-level rhythms) and nonstationary operating regimes often violate the assumptions behind classical detectors and plainly trained deep autoencoders, resulting in inflexible thresholds, phase-sensitive errors, and elevated false alarms. In Industry 4.0 settings where cyber–physical systems and IoT data streams are pervasive, scalable unsupervised detection is especially desirable to reduce annotation cost and to react promptly to regime changes [1,2,3,4,5].
Modern deployments also face practical constraints that shape the detector design: (i) label sparsity and annotation delay in plant operations, (ii) periodic but drifting rhythms driven by shift schedules, weekend/holiday policies, and seasonal demand, and (iii) compute/latency budgets for near-real-time triage on commodity servers. These constraints make purely prediction-based models sensitive to calendar drift and purely reconstruction-based models prone to phase error. In contrast, frequency-informed modeling aligns capacity with the site-specific rhythms before training, reducing the degree of window/scale mismatch while keeping the pipeline lightweight and interpretable for operators. In our setting with 15 min sampling, FFT on the first-differenced series reveals stable sub-daily peaks that serve as an effective control knob to define only a few window sizes, enabling robust detection without exhaustive hyperparameter search.
We propose a frequency-informed, unsupervised pipeline tailored to industrial power time-series. First, we compute the Fast Fourier Transform (FFT) on the first-differenced signal to expose dominant periodicities and translate the top spectral peaks into a compact set of modeling window sizes [6]. Second, for each window, we train a USAD-style autoencoder with a GELU-activated CNN–GRU backbone, leveraging the two-phase (reconstruction/adversarial) procedure to tighten the normal region [7,8,9,10]. Third, we compute reconstruction dissimilarities with Dynamic Time Warping (DTW)—to mitigate phase jitter and timing drift—and obtain final anomaly decisions with Isolation Forest to avoid manual threshold tuning [11,12,13]. This design uses standard components, integrates cleanly with streaming stacks, and remains robust across sub-daily to daily rhythms.
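As a concrete illustration of the first step, the sketch below derives candidate window sizes from the dominant FFT peaks of the first-differenced series; the number of peaks kept and the rounding of periods to integer window lengths are illustrative assumptions rather than the exact settings used in our experiments.

```python
import numpy as np

def fft_window_sizes(x, n_peaks=3):
    """Pick modeling window sizes (in samples) from dominant FFT peaks.

    x       : 1-D power series sampled every 15 min (96 samples/day)
    n_peaks : number of dominant spectral peaks to keep (assumed value)
    """
    d = np.diff(x)                          # first-differencing removes slow trend
    d = d - d.mean()
    spec = np.abs(np.fft.rfft(d))           # magnitude spectrum
    freqs = np.fft.rfftfreq(len(d), d=1.0)  # cycles per sample
    spec[0] = 0.0                           # drop the DC component
    top = np.argsort(spec)[-n_peaks:]       # indices of the strongest peaks
    periods = 1.0 / freqs[top]              # dominant periods in samples
    # Round each period to an integer window length and deduplicate.
    return sorted({int(round(p)) for p in periods if np.isfinite(p)})

# Example: a synthetic load with 8 h and 24 h cycles at 15 min sampling
t = np.arange(96 * 30)                      # 30 days of 15 min samples
x = np.sin(2 * np.pi * t / 96) + 0.5 * np.sin(2 * np.pi * t / 32) \
    + 0.1 * np.random.randn(len(t))
print(fft_window_sizes(x))                  # expect windows near 32 (8 h) and 96 (24 h)
```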
Relation to prior work. Classical and deep learning approaches to time-series anomalies include distance- and density-based methods, probabilistic models, and sequence autoencoders/VAEs [1,2,14,15,16,17]. USAD [8] sharpened the boundary of “normal” via a two-decoder scheme, while GRU-based encoders and lightweight 1D CNNs have proven effective for temporal representation learning [9,10]. Our contribution is orthogonal and complementary: we guide model capacity using FFT-derived windows so that each autoencoder operates at the timescale of salient cycles. Inference then compares observed and reconstructed windows elastically (DTW) rather than pointwise, reducing sensitivity to phase misalignment common in power loads [11,13]. For deployment, we replace hand-tuned cutoffs with Isolation Forest, a nonparametric outlier method that scales well and requires minimal supervision [12]. Within the smart-factory context, this combination aligns with CPS principles and predictive maintenance workflows [4,5].
Recent advances (last three years) further highlight three lines of progress highly relevant to our design. First, spectral-guided multi-scale modeling has been used to reduce window mismatch in periodic industrial signals by selecting modeling scales from dominant frequency components rather than from a uniform grid (e.g., shift-aware or day-level rhythms) [18,19]. Second, elastic similarity has matured with bounded-band and differentiable variants of DTW, improving tolerance to phase jitter and local lags without sacrificing runtime—particularly important under streaming constraints; large-scale multivariate settings also benefit from representation reuse and transfer to reduce cold-start cost [20]. Third, threshold governance has moved toward nonparametric outlier rules (e.g., Isolation Forest and its streaming variants) to avoid brittle, hand-tuned cutoffs and to support light human-in-the-loop recalibration; in the built-environment/industrial energy domain, unsupervised detectors continue to report strong results amid pronounced periodicity and regime shifts [21,22]. Our pipeline is intentionally simple and deployment-oriented: we adopt these ingredients in a label-efficient composition—FFT-guided windows → USAD with CNN–GRU backbone → DTW-aware reconstruction scoring → Isolation Forest—so that each step remains standard, auditable, and easy to maintain in Industry 4.0 environments [18,19].
Contributions. The main contributions of this work are as follows:
FFT-guided window selection. We translate top spectral peaks of the first-differenced series into a small set of window sizes that cover sub-daily to daily behaviors using few models [6].
USAD with CNN–GRU backbone. A lightweight 1D CNN front-end and a GRU bottleneck, trained under the two-phase USAD paradigm, tighten the normal region while remaining efficient for streaming data [7,8,9,10]; a minimal architectural sketch follows this list.
DTW-aware scoring and nonparametric decisions. DTW mitigates phase misalignment in reconstruction; Isolation Forest yields fast, distribution-agnostic anomaly decisions with minimal threshold tuning [11,12,13].
Deployment-oriented design. The pipeline uses standard, well-documented components (FFT, CNN/GRU, DTW, Isolation Forest) and unsupervised data only, facilitating integration in smart-factory monitoring [4,5].
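The sketch below illustrates one possible CNN–GRU autoencoder branch of the kind used in a USAD-style setup; layer sizes, the GELU placement, and the schematic loss terms are illustrative assumptions in PyTorch, not the exact configuration reported in this paper.

```python
import torch
import torch.nn as nn

class CnnGruEncoder(nn.Module):
    """1D CNN front-end followed by a GRU bottleneck (illustrative sizes)."""
    def __init__(self, in_channels=1, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1), nn.GELU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.GELU(),
        )
        self.gru = nn.GRU(32, hidden, batch_first=True)

    def forward(self, x):                      # x: (batch, time, channels)
        h = self.cnn(x.transpose(1, 2))        # -> (batch, 32, time)
        _, z = self.gru(h.transpose(1, 2))     # z: (1, batch, hidden)
        return z.squeeze(0)

class GruDecoder(nn.Module):
    """Unrolls the latent code back into a window reconstruction."""
    def __init__(self, hidden=64, out_channels=1, window=96):
        super().__init__()
        self.window = window
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, out_channels)

    def forward(self, z):
        seq = z.unsqueeze(1).repeat(1, self.window, 1)  # repeat latent per step
        h, _ = self.gru(seq)
        return self.out(h)                              # (batch, window, channels)

# USAD uses one shared encoder E and two decoders D1, D2.
E, D1, D2 = CnnGruEncoder(), GruDecoder(), GruDecoder()
x = torch.randn(8, 96, 1)                  # a batch of 24 h windows (96 x 15 min)
w1, w2 = D1(E(x)), D2(E(x))
w3 = D2(E(w1))                             # AE2(AE1(x)), the adversarial target
# Schematic USAD objectives (the epoch-dependent weights of USAD are omitted):
loss_ae1 = ((x - w1) ** 2).mean() + ((x - w3) ** 2).mean()
loss_ae2 = ((x - w2) ** 2).mean() - ((x - w3) ** 2).mean()
```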
Taken together, these choices aim for robustness under mixed periodicities with minimal supervision, offering a practical path from research prototypes to plant-floor monitoring. The remainder of this paper is organized as follows:
Section 2 (Materials and Methods) presents the dataset and preprocessing, FFT-guided window selection, USAD modeling with a CNN–GRU backbone, and DTW/Isolation Forest scoring.
Section 3 (Results) reports qualitative and operational findings under 15 min sampling and mixed periodicities.
Section 4 (Discussion) analyzes implications, limitations, and deployment practices in smart-factory settings.
Section 5 concludes with takeaways for real-world monitoring.
3. Results
3.1. Detection Performance
We first evaluate the proposed method against representative unsupervised, reconstruction-based baselines to assess the benefit of FFT-guided multi-window modeling under a fully unlabeled setting. Our FFT-guided multi-window USAD–GRU with DTW/IF achieves F1 = 0.2060 and PR-AUC = 0.0923, with a precision of 0.1600 and a recall of 0.2892. Compared with single-window baselines (GRU-AE: F1 = 0.0323; Linear AE: F1 = 0.0350), our method yields a relative F1 improvement of 489%. We evaluate at the point level by dilating each window-level detection to its covered timestamps and applying a minimum event-length filter. Given the extreme class imbalance in industrial streams, we prioritize PR-AUC over ROC-AUC, as the latter can be overly optimistic under rare-event conditions.
Beyond the single operating point, we examine threshold-free behavior via PR-AUC, which is 0.0923 for our method versus near-zero for the single-window baselines. In highly imbalanced streams, PR-AUC is a more faithful indicator of practical utility than ROC-AUC, and the observed gap implies that our ensemble maintains precision in the low-recall regime while recovering recall as the threshold is relaxed. Importantly, the gains persist when we vary the minimum event-length filter from 2 to 5 points, indicating that improvements are not merely due to aggressive postprocessing.
Ablation on window participation further clarifies the source of gains: short windows capture abrupt spikes from switching operations, whereas longer windows capture slow drifts and regime transitions. The n-of-M aggregation attenuates idiosyncratic errors from any single scale and produces more stable decisions over calendar boundaries (e.g., month-end load reshaping). To avoid optimistic bias, we report point-level metrics after dilating window-level hits to their covered timestamps, which penalizes fragmented detections and aligns with plant-floor alerting practice. For transparency, the “489%” relative F1 improvement is computed as (0.2060 − 0.0350)/0.0350 ≈ 489%, using the strongest single-window baseline (Linear AE, F1 = 0.0350) as the reference. While absolute F1 remains modest due to label sparsity and mixed periodicities, the relative lift is operationally meaningful: with the same alert budget, the ensemble surfaces more true events and reduces weekend/night artifacts. We note that comparisons with supervised detectors, which have access to labeled anomalies, are deferred to a subsequent subsection to contextualize performance trade-offs between supervised and fully unsupervised settings.
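For reproducibility, the point-level evaluation protocol described above can be sketched as follows; the minimum event length and the window-to-timestamp mapping shown here are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, average_precision_score

def dilate_window_hits(hit_windows, window_len, stride, n_points):
    """Mark every timestamp covered by a flagged sliding window as anomalous."""
    point_pred = np.zeros(n_points, dtype=bool)
    for w in hit_windows:                       # w indexes a sliding window
        start = w * stride
        point_pred[start:start + window_len] = True
    return point_pred

def filter_short_events(point_pred, min_len=3):
    """Drop detected events shorter than min_len points (assumed default)."""
    out = point_pred.copy()
    start = None
    for i, flag in enumerate(np.append(point_pred, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start < min_len:
                out[start:i] = False
            start = None
    return out

# 'hits', 'y_true', and 'scores' are hypothetical arrays for illustration:
# pred = filter_short_events(dilate_window_hits(hits, 96, 1, len(y_true)))
# p, r, f1, _ = precision_recall_fscore_support(y_true, pred, average="binary")
# pr_auc = average_precision_score(y_true, scores)
```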
We compared our method with a state-of-the-art Transformer-based autoencoder. Despite its high complexity, it failed to capture the diverse periodic patterns effectively in this unsupervised setting (Table 4).
3.2. Ablation on Contamination, r, and n-of-M
We analyze the sensitivity of the proposed pipeline to key deployment-facing hyperparameters, namely the Isolation Forest (IF) contamination, the DTW band ratio r, and the aggregation rule n-of-M. Recall increases with a larger contamination and with union aggregation (n = 1), while precision improves with majority-style aggregation and longer event filters. We select the operating configuration that maximizes F1 on validation.
The USAD loss weight controls the strength of adversarial regularization. A larger weight encourages conservative reconstructions and increases recall by widening the margin between normal and abnormal patterns; however, excessive values oversmooth local motifs and reduce precision. The DTW band ratio r governs boundary refinement and latency: a smaller r tightens the alignment and favors precision, while a larger r relaxes the alignment and improves recall near regime changes, at the cost of higher compute.
Aggregation exhibits the expected trade-off. Union aggregation (n = 1) maximizes coverage and is useful for forensic review, whereas majority voting suppresses scale-specific noise and yields higher precision under mixed periodicities. On validation splits, we found that a modest consensus is sufficient to stabilize decisions without sacrificing the complementary strengths of short and long windows. These trends persist across alternative elbow thresholds and event-length filters, suggesting that the gains are not hyperparameter-brittle.
Figure 3 illustrates the sensitivity analysis results. Interestingly, performance drops significantly for majority voting, indicating that anomalies detected at different temporal scales are largely disjoint. This strongly justifies our adoption of the union aggregation strategy (n = 1).
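The aggregation rule itself reduces to a vote count over per-scale decisions, as in the following sketch; treating union as n = 1 and majority as n ≥ ⌈M/2⌉ is our reading of the n-of-M rule rather than a quoted configuration.

```python
import numpy as np

def aggregate_n_of_m(decisions, n):
    """decisions: (M, T) boolean matrix of per-scale point decisions.

    A timestamp is flagged when at least n of the M window scales flag it;
    n = 1 is the union rule and n = ceil(M / 2) is majority voting.
    """
    votes = decisions.sum(axis=0)
    return votes >= n

M, T = 4, 1000
decisions = np.random.rand(M, T) < 0.02        # toy per-scale detections
union = aggregate_n_of_m(decisions, 1)
majority = aggregate_n_of_m(decisions, int(np.ceil(M / 2)))
print(union.sum(), majority.sum())             # union flags at least as many points
```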
3.3. Operational Efficiency
Table 5 reports inference latency per window using a Sakoe–Chiba band and the chosen stride. Latency grows approximately linearly with window length and remains comfortably within the 15 min sampling cadence even for the longest window. Because DTW with a Sakoe–Chiba band of width w scales as O(Lw) in both time and memory for a window of length L, a modest band ratio r limits overhead while preserving boundary sensitivity. In deployment, window branches can be parallelized across cores/GPUs and DTW evaluations can be batched, keeping end-to-end wall-clock time near the slowest branch rather than the sum. This profile enables near-real-time triage and periodic re-estimation of spectral peaks without disrupting ongoing inference.
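The cost argument can be made explicit with a plain NumPy implementation of banded DTW; the band width below is an illustrative parameter, and production code would typically rely on an optimized library.

```python
import numpy as np

def banded_dtw(a, b, band):
    """DTW distance between equal-length 1-D series under a Sakoe-Chiba band.

    Only cells with |i - j| <= band are visited, so running time grows as
    O(L * band) rather than O(L^2); a production variant would also store
    only the band of the cost matrix to bound memory the same way.
    """
    L = len(a)
    D = np.full((L + 1, L + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, L + 1):
        lo, hi = max(1, i - band), min(L, i + band)
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[L, L]

rng = np.random.default_rng(0)
x = rng.standard_normal(96)                              # a 24 h window
x_hat = np.roll(x, 2) + 0.05 * rng.standard_normal(96)   # phase-shifted reconstruction
print(banded_dtw(x, x_hat, band=10))                     # small despite the 2-step shift
```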
3.4. Clustering and Operational Insights
We cluster daily profiles (96-slot vectors) and obtain K clusters. Cluster-wise averages reveal shift/day–night regimes and weekend patterns (Figure 4); Table 6 summarizes the days per cluster. Daily-profile clustering reveals distinct operating regimes that align with shift start/stop, lunch breaks, and weekend schedules. Clusters with sharp morning ramp-ups are correlated with higher anomaly rates during early hours, while flatter clusters concentrate residual alerts around maintenance windows. These regimes explain a portion of the spectral energy near daily harmonics and motivate keeping multiple window sizes during training. They also inform hour- and mode-conditioned thresholds, which reduced false positives in validation by suppressing alerts during predictable high-variance periods.
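A minimal version of the daily-profile clustering uses standard scikit-learn components; the number of clusters and the standardization step shown here are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def cluster_daily_profiles(series, slots_per_day=96, k=4):
    """Cluster complete days (96 x 15 min slots) into operating regimes."""
    n_days = len(series) // slots_per_day
    days = series[: n_days * slots_per_day].reshape(n_days, slots_per_day)
    X = StandardScaler().fit_transform(days)        # normalize each slot across days
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km.labels_, km.cluster_centers_          # labels per day, scaled mean profiles

# 'power_readings' is a hypothetical 1-D array of 15 min samples:
# labels, centers = cluster_daily_profiles(power_readings)
# np.bincount(labels) then gives the days-per-cluster counts summarized in Table 6.
```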
We then group anomaly events using length/shape/temporal features (PCA–2D in Figure 5); Table 7 shows that short bursts and longer drifts form distinct types with different hour/day-of-week preferences. Event-level clustering corroborates two dominant types: short bursts with steep onsets (often linked to switching or brief overloads) and longer drifts (linked to calibration issues or gradual fouling). The separation suggests different response playbooks: rapid operator confirmation for bursts versus preventive maintenance tickets for drifts. Embedding these priors into alarm routing can shorten time-to-action without changing the underlying detector.
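Event-level grouping can be reproduced with simple hand-crafted features followed by PCA, as sketched below; the exact feature set (length, shape, and timing descriptors) is an assumption consistent with the description above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def event_features(events):
    """events: list of (start_timestamp, values) pairs for detected anomaly events."""
    rows = []
    for start, values in events:
        v = np.asarray(values, dtype=float)
        rows.append({
            "length": len(v),                                  # duration in samples
            "peak": v.max(),                                   # shape descriptors
            "mean": v.mean(),
            "slope": (v[-1] - v[0]) / max(len(v) - 1, 1),
            "hour": start.hour,                                # temporal context
            "dow": start.dayofweek,
        })
    return pd.DataFrame(rows)

# 'detected_events' is a hypothetical list of (pd.Timestamp, np.ndarray) pairs:
# feats = event_features(detected_events)
# X = StandardScaler().fit_transform(feats)
# xy = PCA(n_components=2).fit_transform(X)                    # 2-D embedding as in Figure 5
# types = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # bursts vs drifts
```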
Finally, hour-of-day/day-of-week analyses (Figure 6) motivate hour-adaptive thresholds in deployment. A clear weekly rhythm appears, with elevated loads during weekday daytime and reduced activity overnight and on weekends. Hours with consistently high variance are also periods where false positives are more likely if a single threshold is used; schedule-aware thresholds and periodic peak re-estimation help mitigate this issue.
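Hour-adaptive thresholding can be implemented as a per-(day-of-week, hour) quantile on the anomaly score, as in the sketch below; the quantile level is an illustrative assumption.

```python
import numpy as np
import pandas as pd

def hour_adaptive_thresholds(scores, quantile=0.995):
    """scores: pd.Series of anomaly scores indexed by timestamp.

    Returns one threshold per (day-of-week, hour) cell so that predictable
    high-variance periods do not dominate the alert budget.
    """
    return scores.groupby([scores.index.dayofweek, scores.index.hour]).quantile(quantile)

def flag(scores, thresholds):
    lookup = thresholds.to_dict()                   # {(dow, hour): threshold}
    limits = np.array([lookup[(d, h)]
                       for d, h in zip(scores.index.dayofweek, scores.index.hour)])
    return scores.to_numpy() > limits

# 'score_series' is a hypothetical timestamp-indexed series of anomaly scores:
# thr = hour_adaptive_thresholds(score_series)
# alerts = flag(score_series, thr)
```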
3.5. Qualitative and Error Analysis
Figure 7 shows the full period with detected anomalies; a one-month zoom is provided in Figure 8. Multi-window aggregation reduces night/weekend false positives while retaining abrupt spikes and slow drifts, and visual inspection over the full period confirms that the ensemble suppresses many such artifacts while preserving operationally meaningful events. The one-month zoom highlights boundary handling: DTW-based smoothing merges fragmented hits across short gaps, and the minimum event-length filter removes transient blips. Remaining false positives are mostly attributable to scheduled tests and sensor dropouts; both can be mitigated by excluding known maintenance windows and by down-weighting channels with intermittent missingness.
Typical failure modes include borderline oscillations near setpoint changes and slow drifts that exceed the longest window only after several days. A simple remedy is to append a coarser weekly window to the ensemble or to incorporate an exponential moving average of scores for slow-trend tracking. We leave this extension to future work, as the current design already meets latency and precision requirements for plant-floor monitoring.
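The slow-drift remedy mentioned above amounts to an exponential moving average over the anomaly score; the smoothing factor below is an assumption.

```python
import numpy as np

def ema(scores, alpha=0.02):
    """Exponential moving average of anomaly scores for slow-trend tracking.

    A small alpha (assumed here) emphasizes multi-day drift over local spikes.
    """
    out = np.empty(len(scores), dtype=float)
    acc = float(scores[0])
    for i, s in enumerate(scores):
        acc = alpha * float(s) + (1.0 - alpha) * acc
        out[i] = acc
    return out

# 'point_scores' is a hypothetical array of point-level anomaly scores:
# drift = ema(point_scores)
# slow_drift_alert = drift > drift.mean() + 3 * drift.std()
```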
4. Discussion
FFT-guided windowing aligns model capacity with site-specific rhythms, while the two-phase USAD training tightens the normal region by contrasting reconstructions. DTW-based scoring reduces sensitivity to phase jitter and minor timing shifts, and Isolation Forest provides fast, nonparametric decisions without manual threshold tuning. Together, these choices form a deployment-oriented pipeline that preserves robustness under mixed periodicities and nonstationary regimes. Beyond individual components, the proposed framework is designed around the principle of data-adaptive simplicity: rather than increasing model depth or task-specific supervision, we align representation scales with site-specific rhythms and enforce robustness at the scoring stage. This design choice prioritizes stable operation under nonstationarity and mixed periodicities, which are common in industrial environments but rarely satisfied by off-the-shelf detectors.
4.1. Unsupervised vs. Supervised Detectors
While supervised detectors generally achieve higher absolute performance when abundant labeled anomalies are available, their applicability in industrial monitoring is often limited by annotation cost, label sparsity, and concept drift. In our experiments, supervised baselines achieve higher peak F1 under controlled conditions, but require frequent re-labeling and retraining to maintain performance across calendar and regime changes.
In contrast, the proposed method operates fully unsupervised and emphasizes robustness and deployability. Although absolute F1 remains modest, the relative improvement over unsupervised baselines and the stability under temporal shifts suggest that the method is better suited for continuous monitoring scenarios where labels are scarce or nonstationary.
4.2. Practical Implications
The proposed design depends only on unsupervised data and uses standard components (FFT, 1D CNN/GRU, DTW, Isolation Forest), which simplifies integration into existing monitoring stacks. FFT-guided windowing yields a compact set of models covering sub-daily to daily cycles, avoiding exhaustive window searches. DTW scores can be inspected at the point level to localize abnormal segments within a window, which is useful for triage and operator feedback. As a result, the framework can be incrementally deployed and calibrated without disrupting existing alerting pipelines or requiring labeled anomaly archives.
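Point-level inspection of DTW scores can be derived from the warping path, as sketched here with tslearn's dtw_path (assuming that library is available); attributing to each observed point the alignment cost of its matched pairs is our illustrative choice, not a prescribed procedure.

```python
import numpy as np
from tslearn.metrics import dtw_path

def pointwise_dtw_cost(x, x_hat):
    """Distribute the DTW alignment cost back onto the observed timestamps."""
    path, _ = dtw_path(x, x_hat)               # optimal path as (i, j) index pairs
    cost = np.zeros(len(x))
    for i, j in path:
        cost[i] += abs(x[i] - x_hat[j])        # accumulate cost per observed point
    return cost

# Segments with the largest per-point cost localize the abnormal part of a window.
# 'window' and 'reconstruction' are hypothetical equal-length arrays:
# hotspots = np.argsort(pointwise_dtw_cost(window, reconstruction))[-5:]
```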
4.3. Limitations and Threats to Validity
First, window-specific hyperparameters (e.g., DTW band width, GRU size, score mixing weights) introduce tuning overhead, and poorly chosen parameters may inflate latency or false alarms. Second, purely unsupervised scores can drift under long-term distribution shifts, especially when operating schedules change. Third, DTW—even with banding—adds per-window cost that grows with window length; real-time use requires careful selection of r and model pruning. Finally, results are from a single industrial site; external validity to other plants and load regimes must be established.
4.4. Operational Considerations
In production, we recommend the following: (i) periodic re-estimation of FFT peaks when operating schedules change; (ii) light human-in-the-loop calibration using a small, curated set of confirmed events to stabilize Isolation Forest contamination; and (iii) watchdogs for score drift (e.g., population shift tests on S) to trigger retraining.
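A simple watchdog of the kind recommended in (iii) can compare a recent score window against a reference population with a two-sample test; the significance level and the idea of using a Kolmogorov–Smirnov test are assumptions for illustration.

```python
from scipy.stats import ks_2samp

def score_drift_alarm(reference_scores, recent_scores, alpha=0.01):
    """Flag a population shift in the anomaly-score distribution S.

    A significant Kolmogorov-Smirnov statistic suggests that the detector's
    score distribution has drifted and retraining should be scheduled.
    """
    stat, p_value = ks_2samp(reference_scores, recent_scores)
    return p_value < alpha, stat

# 'scores_last_quarter' and 'scores_last_week' are hypothetical score arrays:
# drifted, stat = score_drift_alarm(scores_last_quarter, scores_last_week)
# if drifted: trigger spectral-peak re-estimation and model retraining
```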
4.5. Future Work
Future directions include multi-site validation with heterogeneous schedules, attribution methods using segment-level DTW cost heatmaps for explainability, adaptive window management (adding/removing windows as spectra evolve), and closed-loop integration with maintenance systems for automated ticketing and prioritization.