3.1. Experimental Results
To validate the efficacy of the proposed structured latent-space learning approach, an ablation study was conducted. This study compared the full framework (AS-LSLF) with a baseline two-dimensional convolutional autoencoder (2D-CAE) trained only with the standard reconstruction loss, combined with a classification network for both component and severity classification. The experiments were performed on a PC with an Intel Core Ultra 7 265K processor (Santa Clara, CA, USA), 128 GB of RAM, and an NVIDIA RTX 5090 GPU (Santa Clara, CA, USA). This setup used Python 3.11 (Wilmington, DE, USA) and PyTorch 2.8.0 (New York City, NY, USA) to make full use of the latest software capabilities and GPU acceleration. The hyper-parameters of each model were optimized via grid search over the learning rate, the number of layers, filter sizes, and regularization terms, as shown in Table 2. Following this search, the final model used a latent dimension of 256 together with the selected learning rate, weight decay, reconstruction weight, component triplet weight, severity triplet weight, triplet margin, and classification weight. The models were trained for a maximum of 50 epochs with a batch size of 16. To prevent overfitting and determine the optimal training duration, an early stopping mechanism monitoring the validation loss was employed with a patience of 7 epochs. The dataset was shuffled and split into a 70:30 train–test split, stratified to maintain class proportions across all sets.
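For concreteness, the following minimal sketch illustrates the training protocol described above: a grid search over candidate hyper-parameters, a stratified 70:30 split, and early stopping with a patience of 7 epochs. The grid values, placeholder data shapes, and the `run_training` stub are illustrative assumptions rather than the exact configuration of Table 2.

```python
import itertools
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative hyper-parameter grid (assumed values, not the exact grid of Table 2).
grid = {
    "learning_rate": [1e-3, 1e-4],
    "latent_dim": [128, 256],
    "weight_decay": [1e-4, 1e-5],
}

# Placeholder inputs/labels; stratified 70:30 train-test split as in the experiments.
X = np.random.randn(200, 1, 64, 64).astype(np.float32)
y = np.random.randint(0, 4, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, shuffle=True, random_state=42
)

def run_training(config, max_epochs=50, patience=7):
    """Stand-in training loop: early stopping on validation loss with a patience of 7."""
    best_loss, wait = float("inf"), 0
    for epoch in range(max_epochs):
        # In practice, val_loss comes from training the model built from `config`
        # for one epoch and evaluating it; a decaying dummy value is used here.
        val_loss = 1.0 / (epoch + 1) + 0.01 * np.random.rand()
        if val_loss < best_loss:
            best_loss, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_loss

# Exhaustive grid search: evaluate every hyper-parameter combination.
results = {
    values: run_training(dict(zip(grid, values)))
    for values in itertools.product(*grid.values())
}
best_config = min(results, key=results.get)
print("best configuration:", dict(zip(grid, best_config)))
```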
Figure 5 illustrates the training loss and accuracy of the proposed AS-LSLF model. The loss curve demonstrates stable convergence of the combined objective function, with the multi-task learning approach effectively balancing the reconstruction quality and latent space structuring.
To show the capability of the proposed framework to capture the latent organization of diagnostic measurements, we visualized the latent vectors from the baseline 2D-CAE and the proposed AS-LSLF encoder as two-dimensional maps. Specifically, we adopted Uniform Manifold Approximation and Projection (UMAP) [22] and t-Distributed Stochastic Neighbor Embedding (t-SNE) [23] to project the high-dimensional embeddings into a plane for qualitative assessment. UMAP provides a topology-aware low-dimensional representation that preserves both the neighborhood structure and the coarse global geometry of the original space. By contrast, t-SNE emphasizes local neighborhood fidelity by minimizing the Kullback–Leibler divergence between pairwise similarity distributions in the original and embedded spaces, offering a complementary view of the learned latent features.
Figure 6 and Figure 7 depict separability across fault component groups, while Figure 8 and Figure 9 summarize the stratification by fault severity. In the baseline 2D-CAE embeddings, several clusters intermix, indicating weaker discrimination between classes. In contrast, the proposed AS-LSLF framework yields cleaner decision regions with markedly clearer boundaries, most prominently under UMAP, demonstrating an enhanced ability to disentangle operating conditions. The reduced overlap and wider margins achieved by the proposed model indicate more discriminative latent representations than the baseline, supporting improved downstream inference and health-state characterization.
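As an illustration of how such projections can be generated, the sketch below applies UMAP (umap-learn) and t-SNE (scikit-learn) to encoder embeddings; the `latent_vectors` and `labels` arrays are placeholders standing in for the actual encoder outputs and class labels.

```python
import numpy as np
import matplotlib.pyplot as plt
import umap                      # umap-learn
from sklearn.manifold import TSNE

# Placeholder embeddings; in practice these are the 256-dimensional encoder outputs.
latent_vectors = np.random.randn(500, 256).astype(np.float32)
labels = np.random.randint(0, 4, size=500)   # e.g., fault component classes

# UMAP: topology-aware projection preserving local and coarse global structure.
umap_2d = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1).fit_transform(latent_vectors)

# t-SNE: emphasizes local neighborhood fidelity via KL-divergence minimization.
tsne_2d = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(latent_vectors)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in zip(axes, (umap_2d, tsne_2d), ("UMAP", "t-SNE")):
    ax.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap="tab10")
    ax.set_title(title)
plt.show()
```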
This structural superiority translates into improved quantitative performance, as demonstrated in Table 3 and Table 4, which summarize the downstream classification results of both models. Additionally, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10 provide detailed quantitative results using four key metrics: Accuracy, F1 Score, Sensitivity, and Specificity. Each metric is reported separately for the component classification and severity classification tasks on the test data for performance comparison. Finally,
Figure 10 shows the confusion matrices for component classification and fault severity classification of the proposed AS-LSLF model. The performance metrics for a generic multiclass classification model are formally defined as follows:

$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}, \qquad \mathrm{Sensitivity} = \frac{1}{N}\sum_{i=1}^{N}\frac{TP_i}{TP_i + FN_i},$$

$$\mathrm{Specificity} = \frac{1}{N}\sum_{i=1}^{N}\frac{TN_i}{TN_i + FP_i}, \qquad \mathrm{F1} = \frac{1}{N}\sum_{i=1}^{N}\frac{2\,TP_i}{2\,TP_i + FP_i + FN_i},$$

where $TP_i$ (True Positive) denotes correctly predicted instances of class $i$, $TN_i$ (True Negative) denotes instances correctly predicted as not belonging to class $i$, $FN_i$ (False Negative) denotes instances of class $i$ incorrectly predicted as belonging to other classes, $FP_i$ (False Positive) denotes instances incorrectly predicted as class $i$, and $N$ denotes the total number of classes.
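For reference, the macro-averaged metrics defined above can be computed directly from a confusion matrix as in the following sketch; this is a generic illustration rather than the authors' evaluation script.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def macro_metrics(y_true, y_pred):
    """Macro-averaged Accuracy, F1, Sensitivity, and Specificity over all classes."""
    cm = confusion_matrix(y_true, y_pred)
    acc, f1, sens, spec = [], [], [], []
    for i in range(cm.shape[0]):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp          # class-i samples assigned to other classes
        fp = cm[:, i].sum() - tp          # other-class samples assigned to class i
        tn = cm.sum() - tp - fn - fp
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        acc.append((tp + tn) / (tp + tn + fp + fn))
        sens.append(recall)
        spec.append(tn / (tn + fp) if (tn + fp) else 0.0)
        f1.append(2 * precision * recall / (precision + recall) if (precision + recall) else 0.0)
    names = ("accuracy", "f1", "sensitivity", "specificity")
    return {n: float(np.mean(v)) for n, v in zip(names, (acc, f1, sens, spec))}

# Example usage with dummy labels and predictions.
print(macro_metrics([0, 1, 2, 2, 1, 0], [0, 1, 2, 1, 1, 0]))
```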
Table 3. Component classification performance comparison for the ablation study.
Model | Accuracy (%) | F1 (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
AS-LSLF (Proposed) | 99.98 | 99.97 | 99.96 | 99.99 |
2D-CAE | 99.89 | 99.82 | 99.81 | 99.97 |
Figure 6. UMAP-based projections of the learned latent manifolds from the baseline and proposed models. (a) Baseline 2D-CAE. (b) Proposed AS-LSLF.
Figure 7. t-SNE projections of the learned latent manifolds from the baseline and proposed models. (a) Baseline 2D-CAE. (b) Proposed AS-LSLF.
Figure 8. UMAP visualization of the learned latent space for fault severity. (a) Baseline 2D-CAE. (b) Proposed AS-LSLF.
Figure 9. t-SNE visualization of the learned latent space for fault severity. (a) Baseline 2D-CAE. (b) Proposed AS-LSLF.
Figure 10. Confusion matrices of the proposed AS-LSLF model. (a) Component classification. (b) Fault severity classification.
Table 4. Fault severity classification performance comparison for the ablation study.
Model | Accuracy (%) | F1 (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
AS-LSLF (Proposed) | 99.88 | 99.87 | 99.75 | 99.89 |
2D-CAE | 99.14 | 98.95 | 98.90 | 99.35 |
To further assess the robustness and generalizability of our proposed model and the ablation model, we performed a five-fold cross-validation. The dataset was partitioned into five subsets of equal size. In each fold, one subset was used for testing, while the remaining four were used for training. This process was repeated five times, ensuring that each subset was used as the test set exactly once. The results, presented as the mean and standard deviation of the performance metrics across the five folds, are shown in Table 5 and Table 6. The consistently high performance across all folds, with low standard deviations, demonstrates the stability and reliability of our proposed approach.
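A minimal sketch of this five-fold protocol using scikit-learn's StratifiedKFold is shown below; the data arrays and the per-fold accuracy value are placeholders for the actual AS-LSLF training and evaluation pipeline.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder data; in practice these are the spectrogram-like inputs and fault labels.
X = np.random.randn(500, 1, 64, 64).astype(np.float32)
y = np.random.randint(0, 4, size=500)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    # The AS-LSLF model would be trained on X[train_idx] and evaluated on X[test_idx];
    # a dummy accuracy stands in for that step here.
    accuracy = 0.99 + 0.005 * np.random.rand()
    fold_scores.append(accuracy)

print(f"accuracy: {np.mean(fold_scores):.4f} ± {np.std(fold_scores):.4f}")
```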
Table 5. Five-fold cross-validation results for component classification.
Model | Accuracy (%) | F1 (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
AS-LSLF (Proposed) | 99.18 ± 0.013 | 99.17 ± 0.015 | 99.16 ± 0.014 | 99.19 ± 0.014 |
2D-CAE | 98.99 ± 0.014 | 98.92 ± 0.015 | 98.91 ± 0.013 | 99.07 ± 0.014 |
Table 6. Five-fold cross-validation results for fault severity classification.
Model | Accuracy (%) | F1 (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
AS-LSLF (Proposed) | 99.08 ± 0.023 | 98.97 ± 0.014 | 98.85 ± 0.018 | 98.99 ± 0.013 |
2D-CAE | 98.24 ± 0.025 | 98.05 ± 0.033 | 98.00 ± 0.019 | 98.45 ± 0.024 |
3.2. Performance Comparison with Other Methods
To further establish the superiority of the proposed framework, its diagnostic performance was benchmarked against several state-of-the-art deep learning models commonly employed for time-series classification. The selected baselines include a two-dimensional CNN (2D-CNN) [19], a standard one-dimensional CNN (1D-CNN) [24], and a bidirectional long short-term memory network (BiLSTM) [25]. From a computational standpoint, the proposed AS-LSLF framework introduces additional complexity during training compared with a standard 2D-CNN, owing to the pairwise distance calculations required by the triplet loss; its inference time, however, remains comparable to that of the 2D-CNN. While more computationally intensive than the 1D-CNN because of its two-dimensional architecture, AS-LSLF is substantially more efficient than the BiLSTM model, whose sequential processing is inherently slower for long time series. To evaluate robustness and data efficiency, which are critical for real-world maritime applications where acquiring extensive labeled fault data is often infeasible, the analysis was conducted under two data-utilization regimes: the full training set and 30% of the available training data.
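The additional training cost noted above originates from the triplet terms of the composite loss. The following sketch shows one such term using PyTorch's built-in triplet margin loss; the batch size, latent dimension, and margin value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Anchor/positive/negative latent embeddings (batch of 16, latent dimension 256).
anchor = torch.randn(16, 256)
positive = torch.randn(16, 256)   # same component (or severity) class as the anchor
negative = torch.randn(16, 256)   # different class

# Pairwise-distance-based triplet term; these distance computations are the source of
# the extra training-time cost relative to a plain cross-entropy 2D-CNN.
loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0, p=2)
print(loss.item())
```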
With the full training data (Table 7 and Table 8), the proposed AS-LSLF framework outperforms all baselines across nearly every metric. The performance gap widens further under the reduced 30% setting (Table 9 and Table 10): while the baselines' performance drops sharply, AS-LSLF remains remarkably resilient thanks to the metric-learning composite loss that regularizes the latent space; the BiLSTM model, in particular, struggles with the limited training data.
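The reduced-data regime can be reproduced by drawing a stratified 30% subset of the training split, as in the following sketch; the data arrays and split parameters are illustrative rather than the exact experimental script.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder training split produced by the 70:30 partition described earlier.
X_train = np.random.randn(700, 1, 64, 64).astype(np.float32)
y_train = np.random.randint(0, 4, size=700)

# Keep only 30% of the training samples while preserving class proportions.
X_small, _, y_small, _ = train_test_split(
    X_train, y_train, train_size=0.30, stratify=y_train, random_state=42
)
print(X_small.shape, np.bincount(y_small))
```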
Table 7. Component classification performance comparison with full training data.
Model | Accuracy (%) | F1 (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
AS-LSLF (Proposed) | 99.98 | 99.97 | 99.96 | 99.99 |
2D-CAE | 99.89 | 99.82 | 99.81 | 99.97 |
2D-CNN | 99.88 | 99.88 | 99.87 | 99.96 |
1D-CNN | 97.1 | 96.9 | 96.8 | 97.5 |
BiLSTM | 96.68 | 96.40 | 96.20 | 97.10 |
Table 8. Fault severity classification performance comparison with full training data.
Model | Accuracy (%) | F1 (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
AS-LSLF (Proposed) | 99.88 | 99.87 | 99.75 | 99.89 |
2D-CAE | 99.14 | 98.95 | 98.90 | 99.35 |
2D-CNN | 99.88 | 99.88 | 99.80 | 99.90 |
1D-CNN | 92.43 | 91.98 | 91.50 | 93.80 |
BiLSTM | 78.13 | 78.09 | 78.09 | 82.50 |
Table 9. Component classification performance comparison with 30% training data utilization.
Model | Accuracy (%) | F1 (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
AS-LSLF (Proposed) | 99.88 | 99.87 | 99.75 | 99.89 |
2D-CAE | 97.71 | 97.30 | 96.80 | 97.90 |
2D-CNN | 97.90 | 97.6 | 97.5 | 98.1 |
1D-CNN | 94.34 | 93.76 | 92.50 | 94.80 |
BiLSTM | 71.88 | 63.06 | 58.40 | 75.20 |
Table 10. Fault severity classification performance comparison with 30% training data utilization.
Model | Accuracy (%) | F1 (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|
AS-LSLF (Proposed) | 99.87 | 99.80 | 99.80 | 99.90 |
2D-CAE | 95.77 | 95.63 | 95.63 | 99.56 |
2D-CNN | 94.21 | 93.95 | 93.95 | 99.26 |
1D-CNN | 90.38 | 89.46 | 89.46 | 98.11 |
BiLSTM | 61.92 | 58.62 | 58.62 | 90.44 |