Thermo-Mechanical Fault Diagnosis for Marine Steam Turbines: A Hybrid DLinear–Transformer Anomaly Detection Framework

Ziyi Zou; Guobing Chen; Luotao Xie; Jintao Wang; Zichun Yang

doi:10.3390/jmse13112050

,

and

Naval University of Engineering, Wuhan 430033, China

^*

Author to whom correspondence should be addressed.

J. Mar. Sci. Eng.2025, 13(11), 2050;https://doi.org/10.3390/jmse13112050
(registering DOI)

This article belongs to the Section Ocean Engineering

Version Notes

Order Reprints

Abstract

Thermodynamic fault diagnosis of marine steam turbines remains challenging due to non-stationary multivariate sensor data under stochastic loads and transient conditions. While conventional threshold-based methods lack the sophistication for such dynamics, existing data-driven Transformers struggle with inherent non-stationarity. To address this, we propose a hybrid DLinear–Transformer framework that synergistically integrates localized trend decomposition with global feature extraction. The model employs a dual-branch architecture with adaptive positional encoding and a gated fusion mechanism to enhance robustness. Extensive evaluations demonstrate the framework’s superiority: on public benchmarks (SMD, SWaT), it achieves statistically significant F1-score improvements of 2.7% and 0.3% over the state-of-the-art TranAD model under a controlled, reproducible setup. Most importantly, validation on a real-world marine steam turbine dataset confirms a leading fault detection accuracy of 94.6% under variable conditions. By providing a reliable foundation for identifying precursor anomalies, this work establishes a robust offline benchmark that paves the way for practical predictive maintenance in marine engineering.

Keywords:

marine steam turbines; fault diagnosis; anomaly detection; DLinear–Transformer; dynamics-integrated monitoring

1. Introduction

Marine steam turbine systems serve as safety-critical components in maritime operations, whose operational integrity and performance directly govern overall vessel safety, maneuverability, and navigation efficiency. These systems exhibit inherent multidisciplinary complexity due to tightly coupled thermo-fluid, mechanical, and control subsystems operating within highly dynamic and uncertain marine environments. Challenging operational conditions, such as fluctuating loads, wave-induced vibrations, and variations in seawater properties, further complicate their dynamic behavior, making accurate anomaly detection particularly difficult. Currently, the maritime industry predominantly relies on threshold-based condition monitoring methods, which use predefined limits on selected physical parameters to trigger alarms. Although straightforward to implement, these conventional methods lack the capacity to effectively process and interpret the high-dimensional, nonlinear, and temporally correlated operational data produced by modern marine steam turbine systems. As a result, they suffer from limited sensitivity to incipient faults and a high false positive rate during transient conditions, thereby falling short in supporting proactive maintenance and operational decision-making. Given the critical role of these systems and the growing availability of sensor data onboard vessels, the development and implementation of advanced data-driven anomaly detection methods specifically designed for marine steam turbines have become a research priority, offering considerable potential for enhancing safety, reliability, and efficiency in marine engineering applications.

Traditional anomaly detection techniques applied to marine steam turbine systems, such as k-means [] clustering, support vector machines (SVMs) [], and regression modeling [], are primarily rule-based or grounded in conventional statistical models. While these methods perform adequately in detecting discrete anomalous data points, they exhibit considerable limitations in complex dynamic systems. In such environments, anomalies may manifest not only as isolated outliers but also as continuous anomalous subsequences, where individual time points may appear normal under predefined thresholds. Consequently, approaches focused exclusively on point-wise outlier detection are often insufficient for reliably identifying anomalies in multivariate and highly nonlinear time series [].

In contrast, modern data-driven anomaly detection methods encompass a diverse spectrum of techniques, ranging from statistical process control models [] to deep learning architectures such as autoencoders [] and long short-term memory (LSTM) networks []. These approaches leverage automated feature extraction and sequential modeling to effectively capture non-linear temporal dependencies and dynamic system behaviors, thereby substantially improving the detection of subtle and progressive anomalies. A pivotal advancement was marked by the introduction of attention mechanisms [] and the subsequent advent of Transformer-based models [], which significantly enhanced the modeling of long-range dependencies in complex operational sequences. The integration of physical principles with deep learning frameworks [] has further yielded hybrid methods that partially mitigate data fragmentation issues and improve generalization across varying vessel operating conditions. architectural evolution continued with the fusion of Transformers and generative models, as exemplified by MT-VAE [] and MAD-GAN [], which employ adversarial and co-evolution mechanisms to strengthen temporal pattern recognition. This innovation trajectory is further evidenced by subsequent refinements: TransAnomaly [] leverages self-attention weights for dependency modeling, FGANomaly [] incorporates pseudo-label filtration to enhance data integrity, AnoFormer [] introduces adaptive masking for improved sensitivity, and TranAD [] achieves state-of-the-art performance through deep attention fusion. These advances collectively underscoring the maturing potential of Transformer-based frameworks in tackling complex anomaly detection tasks.

Concurrently, the broader field of industrial diagnostics has made significant strides, further validating the efficacy of data-driven paradigms. Substantial performance gains have been demonstrated through enhanced convolutional architectures and advanced signal processing techniques for component-level diagnostics, as evidenced by gear and bearing fault diagnosis under variable conditions in studies by Spirto et al. [] and Lin et al. []. The application scope of deep learning has expanded considerably, now encompassing strategies such as deep transfer learning and Koopman operator-based methods for turbocharger and gas turbine systems [,], as well as two-tier machine learning frameworks for wind turbine fault detection []. Notably, these paradigms are increasingly demonstrating practical efficacy in maritime energy systems. A prominent example is the work of Liu et al. [], who successfully implemented a deep transfer learning framework for the condensate system of marine steam turbines.

This collective progress highlights the success of deep learning in applications spanning from component-level (e.g., gears, bearings) to subsystem-level (e.g., condensate systems) diagnostics. However, this predominant focus on component or subsystem-level diagnosis, often reliant on pre-processed features, reveals a critical gap in addressing system-level anomaly detection from the raw, non-stationary multivariate time series generated by an integrated marine steam turbine system under dynamic operating conditions.

When applied to this task, current models, including sophisticated Transformer-based variants, reveal fundamental limitations. Although the self-attention mechanism inherent in Transformers represents a theoretical leap in capturing long-range dependencies, its practical efficacy in highly dynamic environments remains constrained. A primary bottleneck is the inability to robustly capture explicit temporal dependencies [] amidst pronounced non-stationarity. Compounding this issue, empirical evidence indicates that the self-attention mechanism may over-prioritize local features, inadvertently fragmenting the coherent global temporal contexts [] and undermining the model’s inherent sequential perception []. These challenges are acutely magnified in marine steam turbine systems, which present unique anomaly detection challenges due to their escalating structural sophistication, highly dynamic operational regimes, and multivariate data streams stemming from intricate component interdependencies and cross-domain interactions. The resulting complex correlations within the feature data pose a formidable obstacle, significantly complicating the effective deployment of Transformer-based anomaly detection in real-world maritime settings.

To enhance information preservation in time series analysis and improve reconstruction accuracy for non-stationary sequences, we propose an enhanced framework that refines the Transformer-based collaborative reconstruction network architecture. Our methodology integrates a novel cooperative training paradigm, DLinear–Transformer, which synergizes the sequence modeling strengths of Transformer with the multi-horizon forecasting capabilities of DLinear. In this configuration, the Transformer operates as the reconstruction evaluator, while the DLinear module functions as the reconstruction module. While maintaining the original Transformer structure, this study utilizes DLinear’s direct multi-step prediction strategy, enabling it to work collaboratively with the Transformer in a cooperative setup. Furthermore, we calculate the anomaly scores at each time point and employ The Peak Over Threshold (POT) method [] to automatically determine dynamic thresholds. The DLinear–Transformer cooperative training process effectively captures dependencies across all positions in the sequence and generates a reconstructed sequence that closely approximates the actual situation by integrating both local and global information. We have validated the effectiveness of the DLinear–Transformer algorithm on publicly available datasets, including Server Machine Dataset (SMD) [] and the Secure Water Treatment (SWaT) benchmark [], as well as our self-constructed marine steam turbine system dataset. The results demonstrate that this method offers a more effective solution for forecasting non-stationary time series, thereby better addressing the challenges of real-world applications.

The main contributions of this study can be summarized as follows:

Domain-specific marine steam turbine dataset: We introduce a dataset capturing 62 dynamic parameters collected under real maritime conditions, capturing multi-regime operational states to support robust anomaly detection model development.
Novel collaborative framework: A novel two-stage cooperative training architecture is proposed, which synergizes a DLinear-based reconstruction module and a Transformer-based reconstruction evaluator to significantly reduce reconstruction errors in non-stationary time series and enhance local-global feature representation.
Enhanced Performance and Maritime Applicability: Our method achieves state-of-the-art results on public benchmarks, evidenced by a markedly improved F1-score. More importantly, its successful deployment in identifying marine-specific faults, such as condenser level anomalies, demonstrates substantial practical value for real-world predictive maintenance and enhanced operational safety.

The remainder of this paper is structured as follows. Section 2 elaborates on the proposed hybrid framework, detailing the architecture and synergistic mechanism of the DLinear–Transformer. Section 3 outlines the experimental design, including the configuration of the marine steam turbine dataset, benchmark datasets, and evaluation protocols. Section 4 presents the experimental results. Section 5 provides comprehensive analyses and discussion, including ablation and sensitivity studies, and concludes with an outlook on limitations and future work. Finally, Section 6 concludes the paper.

2. A Hybrid DLinear–Transformer Framework for Anomaly Detection

The DLinear–Transformer is a novel hybrid architecture designed to overcome the limitations of standard Transformer models in reconstructing non-smooth and non-stationary real-world time series. By integrating a DLinear-based reconstruction module with a Transformer-based module within a two-stage cooperative training framework, the model significantly enhances reconstruction accuracy and anomaly detection performance.

As illustrated in Figure 1, the proposed framework processes input data through two parallel pathways. The upper pathway is the DLinear-based reconstruction module. This component utilizes linear projections to decompose the input from a separate sliding window into seasonal and trend components, which are processed independently before being linearly aggregated. This design effectively preserves essential multi-scale temporal patterns and trend information.

Figure 1. The specific structure of the DLinear–Transformer model.

Simultaneously, the lower pathway is the Transformer-based reconstruction module. It captures complex nonlinear relationships and global contextual dependencies through its encoder–decoder structure and self-attention mechanisms. The encoder processes the historical time series, while the decoder utilizes masked multi-head attention on a sliding window of the sequence, incorporating positional encodings, to reconstruct the output.

The synergy between these two components occurs within a two-stage cooperative training paradigm. The DLinear module’s output is combined and compared with the Transformer module’s reconstruction, fostering a collaborative process that iteratively refines both modules. This synergistic design allows the model to effectively combine global nonlinear representation learning with local linear temporal modeling, resulting in robust performance across diverse and dynamic operational conditions.

The proposed DLinear–Transformer architecture operates through two distinct yet interconnected processing stages. In the first stage, the Transformer encoder processes the historical time series

C

to produce encoded representations. Concurrently, the decoder module accepts sliding window data

W_{1}

incorporates positional encodings, and reconstructs temporal patterns to output the reconstructed series

X_{1}

. In parallel, the DLinear module processes sliding window data

W_{2}

, decomposing it into seasonal and residual components. These are independently reconstructed and linearly aggregated to form the final output

D_{1}

.

The second stage follows a similar procedure, with the critical enhancement that attention scores

F

are integrated with the historical series

C

. This enriched input is processed identically to the first stage, yielding a refined reconstruction output

X_{2}

. All multivariate inputs, specifically

W_{1}

,

W_{2}

and

C

, are represented as matrices. Among these,

W_{1}

and

C

are processed through multi-head (or masked multi-head) attention mechanisms, while

W_{2}

is decomposed by DLinear into trend and residual components.

The attention and multi-head attention mechanisms are defined as follows []:

Attention (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d}})

(1)

\begin{matrix} {head}_{i} = Attention (Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}) \\ MultiHead (Q, K, V) = Concat ({head}_{1}, \dots, {head}_{n}) W^{O} \end{matrix}

(2)

Here, Q, K, V represent the query, key, and value matrices, respectively;

d

denotes the dimensionality of the key vectors, and

\sqrt{d}

serves as a scaling factor to stabilize gradients. Each head is produced via learnable projections

W_{i}^{Q}

,

W_{i}^{K}

and

W_{i}^{V}

, and the multi-head output is obtained by concatenation and linear projection via

W^{O}

.

The encoder processes the input sequence through:

\begin{matrix} X e_{2} = LayerNorm (X e_{1} + MultiHeadAtt (X e_{1}, X e_{1}, X e_{1})) \\ X e_{3} = LayerNorm (X e_{2} + FeedForward (X e_{2})) \end{matrix}

(3)

When

X e_{1}

is input to the encoder, the multi-head self-attention mechanism extracts inter-feature relationships. The term

MultiHeadAtt (X e_{1}, X e_{1}, X e_{1})

captures long-range dependencies within the historical sequence, while the residual connection

X e_{1} + MultiHeadAtt (X e_{1}, X e_{1}, X e_{1})

preserves original sequence information. The residual connection around the attention output, followed by layer normalization (

LayerNorm

) assists in stabilizing training and preserving original signal information. The subsequent feedforward network further transforms features non-linearly, enhancing representational capacity.

The decoder operates through:

\begin{matrix} X d_{2} = LayerNorm (X d_{1} + Mask (MultiHeadAtt (X d_{1}, X d_{1}, X d_{1}))) \\ X d_{3} = LayerNorm (X d_{2} + MultiHeadAtt (X e_{3}, X e_{3}, X d_{2})) \end{matrix}

(4)

Here,

X d_{1}

represents the decoder’s input. The masked multi-head self-attention mechanism (

Mask (MultiHeadAtt)

) prevents the decoder from attending to future positions, ensuring autoregressive properties. The second attention mechanism performs encoder–decoder cross-attention, where the encoder output

X e_{3}

serves as key and value, allowing the decoder to align with the encoded context.

The final reconstructed sequences are obtained through:

{\hat{X}}_{i} = Sigmoid (Feedforward (X d_{4})), i = 1, 2

(5)

For the DLinear module, the reconstruction is formulated as:

C = T + R, \hat{T} = Linear (T), \hat{R} = Linear (R), D_{1} = Sigmoid (\hat{T} + \hat{R})

(6)

The DLinear model decomposes input series

C

into trend component

T

, representing long-term macroscopic variations, and residual component

R

, capturing short-term fluctuations. Each component undergoes separate linear transformations before being combined through linear aggregation, producing the final reconstructed output

D_{1}

.

This explicit decomposition establishes a crucial division of labor to handle non-stationary marine data. The DLinear module acts as a preconditioner, explicitly isolating non-stationary trends that typically overwhelm standard Transformers. This ‘clears the view’ for the Transformer to focus on modeling complex dependencies within the stabilized residuals where fault signatures reside. By creating this structured representation, the model enables more effective processing by subsequent components.

The DLinear–Transformer framework thus operates as a collaborative, dual-stage reconstruction model, where the DLinear and Transformer modules function as complementary reconstructors, each processing input through different inductive biases.

To effectively coordinate these two pathways, we introduce a dual-branch fusion mechanism. The outputs from both components are integrated via a gated aggregation layer, producing a refined, synergistic reconstruction:

O_{1} = γ \cdot D_{1} + (1 - γ) \cdot X_{1}, γ = σ (β)

(7)

where

D_{1}

is the reconstruction from the DLinear module,

X_{1}

is the first-stage reconstruction from the Transformer module,

σ

is the sigmoid function, and

β

is a learnable scalar parameter initialized to zero. This design allows the model to adaptively balance the contributions from the global trend-focused and local detail-focused branches.

The DLinear-based component aims to reconstruct input sequences with minimal error, minimizing the loss $L_{D} = {‖D_{1} - W‖}_{2}$ , where $W$ represents the true input window.
The Transformer-based component is optimized to reconstruct sequences accurately by minimizing $L_{T} = {‖X_{1} - W‖}_{2}$ .

The training process involves the simultaneous optimization of both components. The overall reconstruction loss function comprehensively aggregates the individual losses and the output of the fusion stage, defined as:

L = \frac{1}{n} {‖D_{1} - W‖}_{2} + (1 - \frac{1}{n}) {‖X_{2} - W‖}_{2} + λ {‖O_{1} - W‖}_{2}

(8)

where n represents the number of training epochs,

X_{2}

denotes the refined reconstruction from the second stage, and

λ

is a weighting hyperparameter that balances the influence of the fused output

O_{1}

.

Anomaly scores are computed using a multi-stage reconstruction discrepancy measure:

s = \frac{1}{2} ‖ X_{1} - W_{ε} ‖_{2} + \frac{1}{2} ‖ X_{2} - W_{ε} ‖_{2}

(9)

where

W_{ε}

represents the training set, and

X_{1}

,

X_{2}

are reconstruction outputs from both stages.

The Peak Over Threshold (POT) method [] is employed for dynamic threshold selection to classify anomalies, where time point i is flagged as anomalous when

s_{i} \geq τ

, with

τ

determined adaptively via POT.

In summary, the DLinear–Transformer leverages the Transformer’s strength in capturing complex nonlinear dependencies and long-range contexts, while the DLinear module excels at extracting multi-scale temporal features through adaptive decomposition and linear projection. This hybrid design effectively addresses the limitations of each individual model and provides a novel framework for multivariate time series analysis.

3. Anomaly Detection Experimental Design

To comprehensively validate the effectiveness and generalizability of the proposed DLinear–Transformer framework, we designed a multi-stage experimental evaluation protocol. This chapter details the datasets, evaluation metrics, and experimental methodology employed to assess the model’s performance across three critical aspects: (1) reconstruction accuracy for non-stationary time series, (2) anomaly detection capability in marine steam turbine systems, and (3) generalization performance on public benchmarks. All core experiments were conducted under consistent training–testing splits and preprocessing procedures to ensure fair and reproducible comparisons.

3.1. Dataset and Preprocessing

To rigorously evaluate the generalization capability of the proposed DLinear–Transformer, we employ one domain-specific dataset and two public benchmarks. The key characteristics of all datasets are summarized in Table 1.

Marine Steam Turbine Dataset: This self-compiled dataset was acquired from a custom-designed test rig. The rig comprises industrial components from actual marine systems, including a boiler, steam turbine, condenser, and associated pumps and valves, and was engineered to emulate the operational behavior of a marine steam turbine system. High-precision sensors were employed to collect high-resolution, time-synchronized data for 62 operational parameters under diverse loads (0–80 kW). This setup provides an effective simulation of actual system operation, thereby offering a realistic and high-fidelity testbed. The principal value of this dataset lies in its inclusion of annotated real-world fault data, which provides a crucial experimental basis for evaluating fault detection methods in maritime environments.
Public Benchmarks (SMD & SWaT): The SMD dataset provides multivariate time-series from large server clusters, ideal for validating performance on high-dimensional IT infrastructure data. The SWaT dataset, originating from a realistic water treatment testbed, contains both normal and attack sequences, making it a standard benchmark for detecting cyber-physical anomalies.

Table 1. Summary of datasets used in this study.

A uniform preprocessing pipeline was applied to all datasets to ensure comparability. The procedure included: (1) data cleaning using linear interpolation to handle missing values; (2) min-max normalization to scale all features to the range [0, 1]; and (3) temporal segmentation via a sliding window approach with a window length of L = 60 timesteps and 85% overlap to preserve temporal dependencies and phase continuity, which is essential for capturing dynamics in rotating machinery systems.

The model is evaluated using a hold-out strategy. To ensure a rigorous assessment of fault discrimination capability beyond the majority “normal” class, the training and test sets were carefully curated to provide a balanced representation of all fault types, avoiding the high imbalance inherent in simple temporal splits. This approach prevents performance metrics from being inflated and offers a truer measure of diagnostic utility in practice.

The integrity of the held-out test set is paramount for an unbiased assessment. For the main experiments across all datasets, a 70%/30% train-test split was employed to ensure fair and consistent comparisons. All hyperparameter optimization was performed via k-fold cross-validation within the training set to prevent information leakage. To ensure statistical reliability, all experiments were repeated over five independent runs with different random seeds. Performance metrics are reported as mean values with standard deviation, and statistical significance was assessed using two-sample t-tests. These measures account for training stochasticity and ensure robust, reproducible findings while establishing generalization performance. Unless otherwise specified, this partitioning strategy applies to all main experiments.

3.2. Evaluation Metrics

To comprehensively evaluate the performance of anomaly detection models under the severe class imbalance inherent in marine steam turbine operational data (where the normal-to-anomaly ratio exceeds 100:1), we employed two robust and widely recognized metrics: the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and the F1-score. This focused selection provides a balanced and rigorous assessment crucial for real-world applications.

AUC-ROC:

This metric evaluates the model’s overall discriminative capacity across all possible classification thresholds. It plots the True Positive Rate against the False Positive Rate at various threshold settings. Its primary advantage is its robustness to class imbalance, making it particularly valuable for our industrial setting where anomalies are rare. A higher AUC-ROC indicates a model’s superior ability to distinguish between normal and anomalous states regardless of the chosen operating point.

F1-score:

The F1-score is the harmonic mean of precision and recall, providing a single metric that balances these two critical concerns. It is formally defined as:

F 1 - score = 2 \times \frac{Precision \times Recall}{Precison + Recall}

(10)

where Precision measures the model’s ability to avoid false alarms, and Recall measures its sensitivity to capture true anomalies. The F1-score is especially relevant in our context because it penalizes models that excel in one aspect (e.g., high precision) at the severe expense of the other (e.g., low recall), thus ensuring a balanced performance essential for safety-aware maritime systems.

Together, the threshold-agnostic AUC-ROC and the threshold-dependent F1-score provide a multi-faceted and reliable evaluation of model performance, effectively balancing the trade-off between missed detections and false alarms.

3.3. Experimental Design

The experimental design is structured to systematically evaluate the DLinear–Transformer framework through a comprehensive three-stage validation strategy, ensuring rigorous assessment of both its reconstruction capability and anomaly detection performance across diverse operational scenarios.

The validation strategy encompasses three critical aspects:

Reconstruction Performance: We first assess the DLinear–Transformer model’s ability to accurately reconstruct non-stationary time series data, with particular emphasis on its handling of complex temporal patterns and transient dynamics. Comparative analysis is conducted against state-of-the-art benchmarks to establish performance baselines.
Anomaly Detection Efficacy: The model’s performance in identifying diverse anomalous conditions within marine steam turbine operational data is evaluated through anomaly scenarios. This evaluation considers both point anomalies and contextual anomalies that manifest as subtle deviations from normal operational patterns.
Generalization Assessment: To demonstrate cross-domain applicability, we validate the model’s adaptability and robustness on two publicly available datasets: SMD and SWaT dataset This assessment examines the model’s transfer learning capability and its performance in different industrial contexts.

The model is evaluated using the hold-out strategy detailed in Section 3.1, wherein the integrity of the held-out test set is paramount for an unbiased assessment. All hyperparameter optimization was performed via k-fold cross-validation within the training set to prevent information leakage. To ensure statistical reliability, all experiments were repeated over five independent runs with different random seeds. Performance metrics are reported as mean values with standard deviation, and statistical significance was assessed using two-sample t-tests. These measures account for training stochasticity and ensure robust, reproducible findings while establishing generalization performance.

All experiments were conducted on a standardized computing platform equipped with an NVIDIA RTX 3080 GPU and 32GB RAM, using the PyTorch (v1.12.1) and scikit-learn (v1.2.0) frameworks. This configuration ensured consistent performance measurements and reproducibility, while providing a reasonable reference for potential naval embedded system implementations.

4. Results

Building on the experimental protocol established in Section 3.3, this section systematically presents the evaluation results for the DLinear–Transformer framework. The analysis provides a robust assessment of the model’s capabilities, benchmarked against contemporary methods, by synthesizing qualitative illustrations with quantitative metrics.

4.1. Reconstruction Performance

The initial phase of our evaluation focuses on the fundamental requirement of any reconstruction-based anomaly detection model: the ability to accurately reconstruct normal operational data. This subsection assesses the performance of the DLinear–Transformer framework in replicating complex, non-stationary time series, using steam turbine inlet flow rate data as a benchmark. The evaluation is presented from two perspectives: a qualitative visual comparison and a quantitative metric-based analysis. For a comprehensive benchmark, we selected two state-of-the-art models representing distinct advanced paradigms: TranAD, recognized for its superior performance in capturing long-range dependencies in multivariate time series using deep transformer networks, and Graph Deviation Network (GDN) [], which excels in modeling inter-sensor relationships as a graph for anomaly detection in complex systems. This selection provides a rigorous test against leading data-driven approaches.

The reconstruction fidelity is first evaluated via qualitative visual analysis. As shown in Figure 2, the sequences reconstructed by DLinear–Transformer, TranAD, and GDN for the steam turbine inlet flow rate are directly juxtaposed. Evidently, the proposed framework’s output aligns more precisely with the true data trajectory, especially during transient phases. Conversely, the benchmarks display pronounced deviations and overshooting, underscoring their limitations in modeling the complex, non-stationary dynamics.

Figure 2. Qualitative comparison of time-series reconstruction. (a) DLinear–Transformer and TranAD; (b) DLinear–Transformer and GDN.

To ensure statistical reliability through five independent runs, the quantitative benchmarking substantiates the qualitative advantage of the DLinear–Transformer. As illustrated in Figure 3 and detailed in Table 2, the DLinear–Transformer achieves statistically significant superiority across all evaluation dimensions. It attains a Mean Absolute Error (MAE) of 0.037 and a Normalized Root Mean Square Error (NRMSE) of 0.042, corresponding to reductions of 67.3% and 68.2% against TranAD (p < 0.001). This statistically significant superiority extends to handling transients, with a Peak Variation (PV) of 0.298, which is 44.3% lower than that of TranAD.

Figure 3. Reconstruction performance of the proposed DLinear–Transformer benchmarked against TranAD and GDN. (a) Normalized Mean Absolute Error; (b) Normalized Peak Variation. Asterisks indicate statistically significant improvements of DLinear–Transformer over all baselines (*** p < 0.001, two-sample t-test).

Table 2. Comparative reconstruction performance on the Marine Steam Turbine dataset.

This marked superiority in reconstruction is not merely quantitative but critical for downstream tasks. The low MAE and NRMSE indicate that the model establishes a high-fidelity baseline of normal system behavior. This precision ensures that subsequent spikes in reconstruction error can be attributed to genuine anomalies with higher confidence, rather than being masked by inherent model inaccuracies. Consequently, the demonstrated fidelity provides a robust foundation for the anomaly detection performance discussed in the subsequent section.

4.2. Anomaly Detection Efficacy

An empirical evaluation of the anomaly detection performance was conducted by applying the DLinear–Transformer framework to a selected 3500-s operational segment derived from a marine steam turbine system. This segment is of particular diagnostic value as it contains a well-documented critical fault event: an emergency shutdown triggered by a condenser water level regulation failure, which occurred between 2652 and 2676 s. The analysis concentrates on the detection results for six key thermodynamic and control parameters, with the outcomes visually summarized in Figure 4.

Figure 4. Anomaly detection results for critical operational parameters over a 3500-s duration. The monitored parameters include: (a) turbine inlet flow rate, (b) steam pressure of the inlet chamber, (c) governor valve opening, (d) condenser water level, (e) main steam valve opening, and (f) condenser vacuum. The purple curves represent the ground truth measurements, the orange curves show the model’s reconstructions, and the red shaded areas indicate periods flagged as anomalous.

The analysis of the results clearly demonstrates the substantial anomaly detection capabilities of the DLinear–Transformer framework. The model excels not only in identifying conspicuous extreme deviations but also in capturing more subtle and complex abnormal patterns, as evidenced by its consistent performance across the multiple operational parameters examined.

First, regarding transient anomaly detection, the model proves highly effective in identifying short-duration deviations. For instance, the transient anomalies detected in the turbine inlet flow rate (Figure 4a, at 553–556 s and 579–582 s) and those in the main steam valve opening (Figure 4e, 490–507 s) exhibit clear temporal correlations with dynamic changes in the governor valve opening (Figure 4c). This capability to not only isolate anomalous events but also link them to specific control actions provides actionable insights for refining control strategies and mitigating the impact of transient operational events.

Moving beyond transient events, the framework demonstrates a critical capacity to detect nuanced fluctuation anomalies, which signify sustained operational deviations. This is exemplified by the model’s identification of persistent abnormal variations in the inlet chamber steam pressure over an extended period from 1281 to 2008 s (Figure 4b). The accurate detection of such prolonged subtle shifts indicates the model’s sensitivity to incipient faults, defined as those that develop gradually over time. This capability shifts the application’s focus from reacting to abrupt failures towards diagnosing evolving system degradation, forming the basis for condition-based monitoring.

The most significant advancement, however, lies in the framework’s proficiency for early warning. It demonstrates exceptional skill in identifying precursor anomalies that precede major system failures. During the pre-fault phase (2620–2628 s), statistically significant anomalies were detected concurrently across several interdependent parameters, including inlet steam pressure (Figure 4b), governor valve opening (Figure 4c), condenser water level (Figure 4d), and condenser vacuum (Figure 4f). The synchronous detection of these coordinated precursor signals, with a substantial lead time prior to failure, underscores the framework’s practical value beyond mere anomaly detection. Based on established maintenance practices and expert domain knowledge, such consistent early warnings are sufficient to trigger proactive interventions, such as inspection or planned component replacement, effectively enabling a shift from reactive fault handling to predictive maintenance.

The framework’s practical efficacy is further solidified by its computational efficiency, a critical factor for marine engineering applications. On a standardized testing platform, it achieved an average inference time of 15.3 ms per sample. Evaluated on a test set that was carefully constructed to ensure a balanced representation of fault types and mitigate operational data’s inherent class imbalance, the method demonstrates a favorable balance between detection capability and processing speed, achieving 94.6% overall fault detection accuracy under dynamic maritime load conditions alongside its computational performance.

In summary, the experimental results confirm the substantial practical utility of the DLinear–Transformer framework. By comprehensively identifying a spectrum of anomalies, ranging from transient events to sustained fluctuations and critical precursor signals, and doing so with high accuracy and efficiency, it effectively advances anomaly detection from post-factum identification to a proactive analytical tool. This tool is fully capable of supporting predictive maintenance decisions and operational optimization in real-world marine steam turbine systems.

4.3. Generalization Assessment

Following the experimental protocol outlined in Section 3.3, the generalization capability of the DLinear–Transformer framework was assessed on two public datasets: SMD and SWaT. This phase of validation examines the model’s adaptability and robustness across different industrial contexts. The performance, quantified by AUC-ROC and F1-score, is compared against state-of-the-art methods in Table 3, under a unified evaluation protocol.

Table 3. Comparative performance evaluation on the SMD and SWaT datasets under a unified evaluation protocol.

The results clearly demonstrate the robust generalization capability of the DLinear–Transformer framework. On the SMD dataset, which captures monitoring metrics from large-scale server machines, the model achieves an AUC-ROC of 0.9988 and an F1-score of 0.9871, demonstrating a statistically significant improvement over all baselines. Notably, the F1-score exceeds that of TranAD, a strong contemporary model, by 2.7% (0.9871 vs. 0.9605), while AUC-ROC shows a 0.1% gain. This superior performance underscores the model’s exceptional accuracy in server anomaly detection.

Similarly, on the SWaT dataset, a challenging benchmark derived from an industrial water treatment system with cyber-physical attack scenarios, the proposed method also attains the highest AUC-ROC (0.8514) and F1-score (0.8179) among all compared approaches, corresponding to consistent gains of 0.2% and 0.3% over TranAD, respectively. The key insight is the model’s consistent top-tier performance across these diverse domains. Such a balanced and superior outcome suggests that the hybrid architecture effectively captures discriminative temporal patterns that generalize across domains.

The consistent superiority of DLinear–Transformer on both datasets underscores its capability to handle diverse anomaly types and operational regimes. The SMD dataset involves high-dimensional, stochastic server metrics, while SWaT embodies strong temporal constraints and physical attack signatures. That the same model excels in both contexts indicates that its design, which combines the multi-scale feature extraction of DLinear with the global dependency modeling of the Transformer, provides a generally applicable solution for multivariate time-series anomaly detection.

In summary, the compelling results on SMD and SWaT confirm the strong cross-domain generalization capacity of the DLinear–Transformer framework. These outcomes validate that the proposed hybrid architecture is not only effective in marine steam turbine applications but also possesses the versatility to address anomaly detection tasks across a wide spectrum of industrial systems, highlighting its potential as a scalable and robust solution for real-world monitoring applications.

5. Analyses and Discussion

The results confirm the performance and generalization capability of the DLinear–Transformer framework. To better understand its internal mechanisms, this section employs two analytical approaches. As a data-driven model, the framework prioritizes generalizability and pattern recognition, although it differs from physics-based models, which offer explicit mechanistic insights through embedded domain knowledge but require complete prior physical understanding. The following analyses conducted here aim to explore this data-driven paradigm: an ablation study quantifies the contributions of core components, while a sensitivity analysis evaluates robustness under varying data conditions. Finally, limitations and future potential are critically discussed to position the framework within the broader research context and suggest directions for further development.

5.1. Ablation Analysis

To quantitatively evaluate the individual contributions of the core components within the proposed DLinear–Transformer framework, a systematic ablation study was conducted. The investigation focused on two pivotal design elements: (1) the two-stage learning process, which iteratively refines reconstructions using divergence-based attention, and (2) the DLinear module, dedicated to multi-scale temporal feature extraction. The complete DLinear–Transformer model was rigorously compared against two ablated variants on the SMD and SWaT datasets, with performance quantified by the F1-score, as summarized in Table 4.

Table 4. Results of the ablation study.

The ablation results provide clear evidence of the distinct and complementary contributions of each component. The complete DLinear–Transformer architecture consistently achieves the highest F1-score on both datasets, affirming the synergistic effect of its integrated design.

To statistically validate these performance differences, we conducted a paired t-test based on five independent training runs. The performance improvement of the complete model over the variant without the two-stage process was statistically significant on the SMD dataset (p < 0.05). More notably, the advantage over the model without the DLinear module was highly significant on both SMD and SWaT (p < 0.01). This confirms that the observed synergies are robust and not attributable to random variation.

The exclusion of the two-stage processing framework led to a discernible performance degradation on both benchmarks. The F1-score decreased from 0.9871 to 0.9821 on SMD (ΔF1-score = −0.50%) and from 0.8179 to 0.8151 on SWaT (ΔF1-score = −0.34%). Although the absolute reduction appears modest, it is consistent and indicates that the two-stage mechanism contributes to enhanced modeling precision. This refinement process, which focuses attention on reconstruction discrepancies, is crucial for achieving state-of-the-art performance against strong baselines.

In contrast, removing the DLinear module resulted in a more substantial and dataset-dependent impact. A significant performance drop was observed on the SMD dataset, with the F1-score declining by 8.1% (from 0.9871 to 0.9069). This pronounced effect underscores the indispensable role of the DLinear component in processing the strong local trends and seasonal variations characteristic of server machine metrics. Conversely, the impact on the SWaT dataset was minimal (ΔF1-score = −0.49%), suggesting that for this cyber-physical system data, the Transformer backbone possesses a considerable capacity to model temporal dependencies even without explicit linear decomposition. Nevertheless, the full model’s superior performance on SWaT confirms that the integrated approach provides a more robust and stable solution.

Overall, the ablation study validates the necessity of both the DLinear module and the two-stage learning framework. A key finding from our scenario-specific analysis is the distinct operational preference of each component: the DLinear module proves particularly indispensable in environments with pronounced trend components, such as the SMD dataset where its removal caused a significant 8.1% performance drop, whereas the two-stage mechanism delivers consistent refinements across all scenarios but becomes critically important for achieving precise fault localization in complex, multi-sensor anomaly patterns. The DLinear component is vital for effective local feature decomposition, particularly in environments with pronounced trend components, while the two-stage mechanism delivers consistent refinements for superior accuracy. Their combination forms a complementary and versatile architecture well-suited for a broad spectrum of industrial monitoring scenarios.

5.2. Sensitivity Analysis

To comprehensively evaluate the data efficiency of the DLinear–Transformer framework, we conducted a sensitivity analysis on the proportion of training data. This analysis serves as a supplementary investigation to the fair-comparison experiments reported in Section 3 and Section 4.

We utilized four public benchmarks: the SMD and SWaT datasets from our primary evaluation, along with two additional public benchmarks, namely the Soil Moisture Active Passive (SMAP) dataset and the Mars Science Laboratory (MSL) rover dataset []. For this specific analysis, all four datasets were re-partitioned using a consistent temporal block-wise stratified sampling strategy. The training ratio was varied from 20% to 100% (specifically, 20%, 40%, 60%, 80%, and 100%) against a fixed test set to evaluate performance scalability with increasing data volume.

The results, summarized in Figure 5, demonstrate a clear and statistically significant positive correlation between model performance and training data volume across all four datasets, as measured by both the F1-score and the AUC-ROC.

Figure 5. Sensitivity analysis of the DLinear–Transformer framework to training data volume. (a) F1-score versus training set size ratio across four datasets. (b) Corresponding AUC-ROC values versus training set size ratio.

A detailed analysis of Figure 5 reveals distinct learning phases. The framework demonstrates exceptional initial data efficiency, achieving the majority of its final performance (e.g., over 90% of the maximum F1-score for SMD and SWaT) with only 20% of the training data. Beyond this point, the learning trajectories diverge based on dataset complexity. While the SMAP dataset shows signs of performance saturation in its later stages, others like SMD and MSL continue to exhibit meaningful gains up to the full dataset, indicating that the point of diminishing returns is not universal but is instead context-dependent.

This nuanced understanding directly informs practical deployment strategies. The framework’s robustness supports two complementary approaches: (1) For rapid deployment or in cost-sensitive scenarios, a minimal viable dataset (as low as 20–40%) can bootstrap a highly competent model. (2) For maximizing performance on complex systems or for long-term asset monitoring, a strategy of continuous data collection is justified, as the model reliably converts additional data into enhanced accuracy without signs of overfitting, as evidenced by the sustained growth of curves like SMD and MSL.

Critically, the absence of performance degradation at larger data volumes across all datasets confirms the model’s stability and its capacity to beneficially utilize additional information. This provides a reliable foundation for implementing long-term predictive maintenance strategies in marine steam turbine systems.

5.3. Limitations and Future Outlook

While the proposed framework demonstrates strong performance, this study is not without its limitations, which in turn illuminate productive paths for future research.

A primary consideration lies in the practical deployment of the framework. Its computational requirements, though manageable in our experimental setup, could present challenges for implementation on extremely resource-constrained edge devices. Furthermore, the model’s current performance is contingent upon the availability of sufficient labeled fault data for training, which is often scarce in real-world naval applications. Another limitation concerns the evaluation under diverse operational conditions. Although the dataset incorporates inherent variability, a formal stratified analysis of performance across specific environmental states, such as distinct sea states or load regimes, was not conducted. This aligns with the broader challenge of Environmental and Operational Variability (EOV) in structural health monitoring, as discussed by Rezazadeh et al. [].

To address these limitations and build upon the validated capabilities of the framework, several directions for future work are envisioned. The immediate next step involves prioritizing the transition from offline validation to online deployment. Initial implementation on a digital twin testbed has confirmed basic operational feasibility, laying the groundwork for systematic evaluation under true streaming conditions, with a focus on long-term stability, inference latency, and computational throughput.

Concurrently, technical enhancements will focus on several key areas:

Few-Shot Anomaly Detection by developing semi-supervised or self-supervised learning strategies to mitigate the dependency on extensive labeled anomaly datasets.
Fault Classification through the integration of a multi-class module to differentiate specific fault types (e.g., bearing wear, fouling, actuator failure), thereby advancing the system from general detection to precise diagnosis.
Causal Analysis Extension to move beyond detection towards identifying root causes of anomalies, thereby providing more actionable insights for maintenance crews.

Finally, the framework’s data-driven nature and its proven ability to model transient dynamics and multi-sensor interactions without relying on domain-specific assumptions strongly suggest its inherent suitability for fault diagnosis in other critical rotating systems. This foundational versatility, validated by its consistent performance across several public benchmarks, indicates strong promise for immediate future implementations in systems such as gearboxes, wind turbines, and diesel engines. Consequently, a comprehensive empirical validation for these distinct assets represents a primary objective and a logical extension of this work.

6. Conclusions

This study establishes a reliable, data-driven framework for fault diagnosis in marine steam turbines through a hybrid DLinear–Transformer model. By synergistically integrating the multi-scale feature extraction of DLinear with the global dependency modeling of the Transformer, the proposed approach effectively addresses the challenge of analyzing non-stationary operational data under dynamic maritime conditions. Experimental results demonstrate the framework’s superior performance: it achieved a detection accuracy of 94.6% on real shipboard turbine data, alongside statistically significant F1-score improvements of 2.7% on the SMD dataset and 0.3% on the SWaT dataset over the advanced TranAD baseline.

Beyond performance benchmarking, comprehensive ablation studies quantitatively validated the contribution of each architectural component, while sensitivity analysis confirmed the model’s robustness. The framework’s proficiency in identifying a spectrum of anomalies, ranging from transient events to sustained deviations, underscores its practical utility for real-world condition monitoring. Looking ahead, this system-level detection framework provides a solid foundation for evolution towards automated fault diagnosis and causal analysis. These contributions collectively advance the field of intelligent marine system monitoring and ultimately represent a concrete step toward more autonomous and resilient marine propulsion systems.

Author Contributions

Conceptualization, G.C. and Z.Y.; methodology, Z.Z.; validation, Z.Z. and L.X.; formal analysis, Z.Z.; investigation, J.W.; data curation, Z.Z. and L.X.; writing—original draft preparation, Z.Z.; writing—review and editing, Z.Z. and G.C.; visualization, L.X.; supervision, G.C. and Z.Y.; project administration, Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Restrictions apply to the availability of the marine steam turbine system dataset. These proprietary, domain-specific data were generated by the authors and are not publicly available to protect intellectual property and commercial interests. However, the proprietary marine data are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank all co-authors for their contributions and fruitful collaborations. We would also like to acknowledge the technical support from our research group and the use of its physics experimental facilities. Finally, we extend our gratitude to the editors and anonymous reviewers for their valuable time and insightful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SMD	Server Machine Dataset
SWaT	Secure Water Treatment
LSTM	Long Short-term Memory Network
SVM	Support Vector Machine
POT	The Peaks over Threshold
GDN	Graph Deviation Network
AUC-ROC	Area Under the Receiver Operating Characteristic Curve
MAE	Mean Absolute Error
PV	Peak Variation
NRMSE	Normalized Root Mean Square Error
SMAP	Soil Moisture Active Passive
MSL	Mars Science Laboratory

References

MacQueen, J.B. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1967; University of California Press: Berkeley, CA, USA, 1967; Volume 1, pp. 281–297. [Google Scholar]
Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Kingsbury, K.; Alvaro, P. Elle: Inferring Isolation Anomalies from Experimental Observations. Proc. VLDB Endow. 2020, 14, 268–280. [Google Scholar] [CrossRef]
Boniol, P.; Paparrizos, J.; Palpanas, T.; Franklin, M.J. SAND: Streaming Subsequence Anomaly Detection. Proc. VLDB Endow. 2021, 14, 1717–1729. [Google Scholar] [CrossRef]
Montgomery, D.C. Introduction to Statistical Quality Control, 8th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2019; ISBN 978-1-119-39930-8. [Google Scholar]
Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
He, X.; Zhao, K.; Chu, X. AutoML: A Survey of the State-of-the-Art. Knowl. Based Syst. 2021, 212, 106622. [Google Scholar] [CrossRef]
Yan, X.; Rastogi, A.; Villegas, R.; Sunkavalli, K.; Shechtman, E.; Hadap, S.; Yumer, E.; Lee, H. MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 265–281. [Google Scholar]
Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.K. MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2019, Munich, Germany, 17–19 September 2019; pp. 703–716. [Google Scholar]
Yuan, H.; Cai, Z.; Zhou, H.; Wang, Y.; Chen, X. TransAnomaly: Video Anomaly Detection Using Video Vision Transformer. IEEE Access 2021, 9, 123977–123986. [Google Scholar] [CrossRef]
Du, B.; Sun, X.; Ye, J.; Cheng, K.; Wang, J.; Sun, L. GAN-Based Anomaly Detection for Multivariate Time Series Using Polluted Training Set. IEEE Trans. Knowl. Data Eng. 2023, 35, 12208–12219. [Google Scholar] [CrossRef]
Shin, A.H.; Kim, S.T.; Park, G.M. Time Series Anomaly Detection Using Transformer-Based GAN with Two-Step Masking. IEEE Access 2023, 11, 74035–74047. [Google Scholar] [CrossRef]
Tuli, S.; Casale, G.; Jennings, N.R. TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data. arXiv 2022. [Google Scholar] [CrossRef]
Spirto, M.; Nicolella, A.; Melluso, F.; Malfi, P.; Cosenza, C.; Savino, S.; Niola, V. Enhancing SDP-CNN for Gear Fault Detection Under Variable Working Conditions via Multi-Order Tracking Filtering. J. Dyn. Monit. Diagn. 2025; Early Access. [Google Scholar] [CrossRef]
Lin, J.; Han, C.; Lu, W.; Song, L.; Chen, P.; Wang, H. Improved Spectral Amplitude Modulation Based on Sparse Feature Adaptive Convolution for Variable Speed Fault Diagnosis of Bearing. J. Dyn. Monit. Diagn. 2025, 4, 31–43. [Google Scholar] [CrossRef]
Hu, L.; Liu, L.; Yang, J.; Hu, H.; Zheng, C.; Yu, Y. Fault Diagnosis Based on Deep Transfer Learning for Marine Turbocharger. Int. J. Mech. Sci. 2025, 300, 110444. [Google Scholar] [CrossRef]
Irani, F.N.; Soleimani, M.; Yadegar, M.; Meskin, N. Deep Transfer Learning Strategy in Intelligent Fault Diagnosis of Gas Turbines Based on the Koopman Operator. Appl. Energy. 2024, 365, 123256. [Google Scholar] [CrossRef]
Allal, Z.; Noura, H.N.; Vernier, F.; Salman, O.; Chahine, K. Wind Turbine Fault Detection and Identification Using a Two-Tier Machine Learning Framework. Intell. Syst. Appl. 2024, 22, 200372. [Google Scholar] [CrossRef]
Liu, Y.; Chen, L.; Shangguan, D.; Yu, C. A Data-Driven Fault Diagnosis Method for Marine Steam Turbine Condensate System Based on Deep Transfer Learning. Machines 2025, 13, 708. [Google Scholar] [CrossRef]
Chen, J.; Li, Z.; Pan, J.; Chen, G.; Zi, Y.; Yuan, J.; Chen, B.; He, Z. Wavelet Transform Based on Inner Product in Fault Diagnosis of Rotating Machinery: A Review. Mech. Syst. Signal Process. 2016, 70, 1–35. [Google Scholar] [CrossRef]
Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv 2023. [Google Scholar] [CrossRef]
Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11121–11128. [Google Scholar]
Siffer, A.; Fouque, P.A.; Termier, A.; Largouët, C. Anomaly Detection in Streams with Extreme Value Theory. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 1067–1075. [Google Scholar]
Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837. [Google Scholar]
Mathur, A.P.; Tippenhauer, N.O. SWaT: A Water Treatment Testbed for Research and Training on ICS Security. In Proceedings of the 2016 International Workshop on Cyber-Physical Systems for Smart Water Networks (CySWater), Vienna, Austria, 11 April 2016; pp. 31–36. [Google Scholar]
Deng, A.; Hooi, B. Graph Neural Network-Based Anomaly Detection in Multivariate Time Series. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 4027–4035. [Google Scholar]
Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 387–395. [Google Scholar]
Rezazadeh, N.; De Luca, A.; Perfetto, D.; Salami, M.R.; Lamanna, G. Systematic Critical Review of Structural Health Monitoring Under Environmental and Operational Variability: Approaches for Baseline Compensation, Adaptation, and Reference-Free Techniques. Smart Mater. Struct. 2025, 34, 073001. [Google Scholar] [CrossRef]

Figure 1. The specific structure of the DLinear–Transformer model.

Figure 2. Qualitative comparison of time-series reconstruction. (a) DLinear–Transformer and TranAD; (b) DLinear–Transformer and GDN.

Figure 3. Reconstruction performance of the proposed DLinear–Transformer benchmarked against TranAD and GDN. (a) Normalized Mean Absolute Error; (b) Normalized Peak Variation. Asterisks indicate statistically significant improvements of DLinear–Transformer over all baselines (*** p < 0.001, two-sample t-test).

Figure 4. Anomaly detection results for critical operational parameters over a 3500-s duration. The monitored parameters include: (a) turbine inlet flow rate, (b) steam pressure of the inlet chamber, (c) governor valve opening, (d) condenser water level, (e) main steam valve opening, and (f) condenser vacuum. The purple curves represent the ground truth measurements, the orange curves show the model’s reconstructions, and the red shaded areas indicate periods flagged as anomalous.

Figure 5. Sensitivity analysis of the DLinear–Transformer framework to training data volume. (a) F1-score versus training set size ratio across four datasets. (b) Corresponding AUC-ROC values versus training set size ratio.

Table 1. Summary of datasets used in this study.

Dataset	Domain	Features	Key Characteristics
Marine Steam Turbine	Marine Engineering	62	Real operational data from marine turbine systems under varying loads (0–80 kW); contains annotated anomalies.
SMD	Server Machine	38	Multivariate data from 28 servers over 5 weeks; includes hardware/performance metrics.
SWaT	Water Treatment	51	11-day dataset with 7 days normal operation and 4 days under cyber-attack scenarios.

Table 2. Comparative reconstruction performance on the Marine Steam Turbine dataset.

Model	MAE	NRMSE	PV
GDN	0.339 ± 0.015	0.398 ± 0.020	0.708 ± 0.035
TranAD	0.113 ± 0.005	0.132 ± 0.008	0.535 ± 0.025
DLinear–Transformer	0.037 ± 0.002 ***	0.042 ± 0.003 ***	0.298 ± 0.012 ***

Values are presented as mean ± standard deviation over 5 independent runs. *** indicates statistically significant improvement over all baselines (p < 0.001, two-sample t-test).

Table 3. Comparative performance evaluation on the SMD and SWaT datasets under a unified evaluation protocol.

Method	SMD		SWaT
Method	F1-Score	AUC-ROC	F1-Score	AUC-ROC
GDN	0.9238 ± 0.012	0.9924 ± 0.004	0.8101 ± 0.015	0.8462 ± 0.008
TranAD	0.9605 ± 0.007	0.9974 ± 0.001	0.8151 ± 0.009	0.8491 ± 0.007
DLinear–Transformer	0.9871 ± 0.004 **	0.9988 ± 0.001	0.8179 ± 0.008	0.8514 ± 0.007

Results are presented as mean ± standard deviation over 5 independent runs. All baselines were re-implemented and evaluated under our unified data preprocessing and splitting pipeline to ensure a fair comparison. ** denote statistically significant improvement over the second-best model with p < 0.01 (two-sample t-test).

Table 4. Results of the ablation study.

Method	F1-Score
Method	SMD	SWaT
The model without the two-stage process	0.9821	0.8151
The model without DLinear	0.9069	0.8139
The complete DLinear–Transformer model	0.9871	0.8179

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Thermo-Mechanical Fault Diagnosis for Marine Steam Turbines: A Hybrid DLinear–Transformer Anomaly Detection Framework

Abstract

1. Introduction

2. A Hybrid DLinear–Transformer Framework for Anomaly Detection

3. Anomaly Detection Experimental Design

3.1. Dataset and Preprocessing

3.2. Evaluation Metrics

3.3. Experimental Design

4. Results

4.1. Reconstruction Performance

4.2. Anomaly Detection Efficacy

4.3. Generalization Assessment

5. Analyses and Discussion

5.1. Ablation Analysis

5.2. Sensitivity Analysis

5.3. Limitations and Future Outlook

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Article Metrics

Citations

Article Access Statistics