Multiscale Feature Extraction and Decoupled Diagnosis for EHA Compound Faults via Enhanced Continuous Wavelet Transform Capsule Network

Cao, Shuai; Li, Weibo; Deng, Xiaoqing; Huang, Kangzheng; Li, Rentai

doi:10.3390/pr14071043


by Shuai Cao ¹, Weibo Li ¹,*, Xiaoqing Deng ², Kangzheng Huang ¹ and Rentai Li ¹

¹ School of Automation, Wuhan University of Technology, Wuhan 430070, China

² Hubei ChuangSiNuo Electrical Technology Corp., Enshi 445000, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(7), 1043; https://doi.org/10.3390/pr14071043

Submission received: 19 January 2026 / Revised: 12 March 2026 / Accepted: 24 March 2026 / Published: 25 March 2026

(This article belongs to the Section Automation Control Systems)


Abstract

The vibration signals of Electro-Hydrostatic Actuators (EHAs) exhibit strong non-linearity and non-stationarity, particularly under complex coupling mechanisms, making the extraction of intrinsic fault features computationally challenging. Conventional deep learning approaches often lack mathematical interpretability and struggle to decouple superimposed fault signatures from incomplete datasets. To address these issues, this paper proposes the Enhanced Continuous Wavelet Transform Capsule Network (ECWTCN), an intelligent decoupled diagnosis framework designed for multiscale signal analysis. The architecture integrates a wavelet-kernel convolution layer to extract physically interpretable time–frequency features across multiple scales, effectively capturing transient impulses associated with incipient faults. Furthermore, a novel maximized aggregation routing algorithm is introduced to optimize the dynamic routing process, enhancing global feature aggregation. A distinct advantage of the ECWTCN is its capability to generalize distinct fault patterns, enabling the identification of unseen compound faults by training exclusively on normal and single-fault samples. Comparative experiments show that the proposed method delivers strong multi-label classification performance under operating condition A, achieving a Subset Accuracy of 93.7% and a Label Ranking Average Precision of 0.998. Complexity analysis further confirms the method’s efficiency in terms of FLOPs and parameter size. This work presents a robust, lightweight, and mathematically interpretable solution for the analysis of complex signals in high-reliability equipment.

Keywords: compound fault diagnosis; electro-hydrostatic actuators (EHA); incomplete data; wavelet kernel convolution; capsule network

1. Introduction

With the deep integration of next-generation information technology and advanced manufacturing technology, industrial big data-driven fault diagnosis methods are emerging as a crucial technical foundation for intelligent equipment operation and maintenance. This approach holds significant engineering application value and theoretical research significance for advancing the intelligent upgrade of high-end equipment, particularly ship propulsion and control systems [1]. The EHA is a novel servo-actuating device that tightly integrates an electric motor, hydraulic pump, hydraulic cylinder, and control unit. Offering a high power-to-weight ratio, high integration, and low pollution, it has progressively replaced traditional centralized hydraulic systems and found widespread application in scenarios including ship steering gear [2], propulsion systems [3], deck machinery [4], and superstructure motion control [5]. Under prolonged service in harsh marine environments characterized by high salt spray, high humidity, intense vibration, and significant load fluctuations, shipboard EHA systems are highly susceptible to aging caused by thermal and mechanical stresses, as well as to the effects of strong electromagnetic interference [6]. Internal pump and valve components, actuators, and control units may experience multiple failures simultaneously or in cascade, resulting in compound faults with strong coupling and hidden characteristics. Once a critical actuator fails, vessel maneuvering safety and mission capability are directly jeopardized, which can lead to severe economic losses or even catastrophic accidents. Therefore, high-reliability compound fault diagnosis and decoupling for ship EHA systems are of paramount importance [7].

However, due to the highly compact mechatronic structure and complex, variable operating conditions within the ship’s EHA system, its failure patterns exhibit the following typical characteristics:

(1) Compound failures are commonplace, with multiple failure modes such as wear [8], fatigue [9], leakage [10], and sensor interference often occurring simultaneously [11], leading to uncertainty in determining cause-and-effect relationships.
(2) Fault characteristics exhibit severe coupling, with strong feedback introduced by closed-loop control and hydraulic circuits causing fault response signals to overlap significantly in the time–frequency domain. This makes it difficult to directly isolate individual fault components from the measured data [12].
(3) Data acquisition is costly and unevenly distributed. Under real vessel operating conditions, the system remains in a normal or slightly degraded state for extended periods, resulting in extremely limited observable samples of multi-category compound faults. Meanwhile, fault condition testing is constrained by safety and cost considerations [13].

The aforementioned factors make traditional diagnostic methods based on empirical thresholds or simple pattern recognition difficult to apply to the complex engineering scenarios of ship EHA systems. There is an urgent need to develop compound fault intelligent decoupling methods capable of operating under conditions of incomplete data.

Research progress in compound fault diagnosis for mechanical systems indicates that existing methods can be broadly categorized into three types. The first type comprises mechanism-based compound fault diagnosis methods. These establish dynamic or fluid–structure interaction models incorporating multiple fault mechanisms to analyze the impact of different failure modes on system responses, thereby enabling prediction and identification [14]. Such methods can reveal the physical nature of faults to a certain extent, but they rely heavily on domain expertise and precise modelling capabilities, making them difficult to adapt to the complex engineering environment of ship EHA systems with multiple components and operating conditions. The second category comprises compound fault diagnosis methods based on signal processing. These techniques utilize time–frequency analysis, sparse decomposition, independent component analysis, and transform-domain filtering to extract or isolate the characteristic frequencies or components of individual faults from compound fault signals [15]. Although such methods have demonstrated promising results on objects like gearboxes and rolling bearings, they typically rely on substantial prior knowledge to design feature extraction and separation strategies. Their limited generalization capability constrains their application in shipboard EHA systems characterized by low signal-to-noise ratios, strong coupling, and highly variable operating conditions. The third category involves AI-based compound fault diagnosis methods that utilize models such as convolutional neural networks, deep belief networks, graph neural networks, and capsule networks to automatically learn fault features from multi-source sensor data and construct end-to-end diagnostic models [16].

Within the framework of deep learning, existing approaches to compound fault diagnosis primarily fall into two categories: one treats compound faults as novel fault patterns distinct from single faults, employing multi-class classifiers to directly recognize the overall “single fault + compound fault” scenario [17]. Another approach adopts a multi-label learning perspective, explicitly modelling the relationship between single faults and compound faults. It achieves compound fault decoupling by simultaneously outputting multiple fault labels [18]. The former achieves high recognition accuracy with sufficient samples but essentially “labels” compound faults, failing to reflect the interrelationships between faults and demonstrating poor generalization capabilities for unseen compound patterns. The latter partially accounts for fault correlations but typically relies on large volumes of fully annotated multi-label data, while the “black-box” nature of deep learning models results in insufficient interpretability of outcomes. In recent years, capsule networks have demonstrated promising decoupling potential in the diagnosis of complex faults in mechanical systems, leveraging their vectorized representations and hierarchical modelling capabilities that distinguish between global and local aspects. Their dynamic routing mechanism facilitates the characterization of diverse fault modes and their combinatorial relationships [19]. However, most existing capsule network diagnostic methods are still built upon the ideal premise of readily available comprehensive compound fault samples, making them difficult to directly transfer to engineering objects like ship EHA systems where compound fault data is extremely scarce.

To address the aforementioned issues, this paper proposes an incomplete-data-driven intelligent decoupling method for compound faults in ship EHAs, based on wavelet feature extraction and a capsule network architecture. This method employs a convolutional layer with a wavelet-shaped kernel to perform end-to-end feature extraction on motor vibration signals, ensuring that the learned features maintain a clear correspondence with the original vibration signals in the time–frequency domain. By further leveraging capsule layers to model hierarchical relationships between single and compound faults, this approach achieves automatic fault mode decoupling and multi-label decision-making through vectorized outputs and a maximized aggregation routing mechanism. Consequently, it enables intelligent recognition of unseen compound faults solely through training on normal and single-fault samples.

The main contributions of this paper are as follows:

(1): For complex engineering scenarios involving ship electro-hydrostatic actuators, this study proposes an incomplete-data-driven intelligent decoupling model for compound faults. Trained solely on normal samples and single-fault samples without requiring compound fault data, the model achieves effective decoupling and diagnosis of multiple typical compound faults under actual operating conditions. This approach offers a novel strategy for health management of critical ship actuation systems when compound fault data is difficult to obtain.
(2): A wavelet capsule network is constructed by integrating a wavelet kernel convolution layer with a capsule network architecture. This approach performs feature learning and fault decoupling on ship EHA vibration signals through a maximized aggregation routing mechanism. Furthermore, an interpretability analysis of the features extracted by the wavelet kernel convolution layer establishes a comprehensible mapping relationship between these features and the original vibration signals. This enhances the credibility and interpretability of the diagnostic model in safety-critical scenarios for ships.
(3): A test platform and dataset based on the self-developed ship EHA controller were constructed. Multiple single and compound fault scenarios were designed to conduct comparative experiments between the proposed method and several typical deep learning approaches. Results demonstrate that the proposed method exhibits significant advantages in compound fault recognition accuracy, decoupling capability, and robustness. This validates the effectiveness and engineering application potential of the incomplete data-driven wavelet capsule network in the intelligent diagnosis of compound faults in ship EHA systems.

The remainder of this paper is organized as follows: Section 2 introduces the theoretical foundations of CWT and Capsule Networks. Section 3 details the mathematical formulation of the ECWTCN and the decoupled diagnosis strategy. Section 4 describes the experimental setup on a shipboard EHA platform. Section 5 presents a comprehensive analysis of the results, focusing on accuracy, interpretability, and complexity. Finally, Section 6 concludes the paper.

2. Fundamental Theory of the Wavelet Capsule Network

2.1. Continuous Wavelet Transform

The Continuous Wavelet Transform (CWT) effectively extracts frequency components from raw vibration signals and precisely locates their corresponding time intervals. This method is widely applied in mechanical equipment fault diagnosis. By performing time–frequency analysis on vibration signals, it reveals the distribution characteristics of different frequency components along the time axis, thereby providing reliable evidence for identifying the type of mechanical equipment failure [20].

CWT employs mother wavelet functions with multiscale characteristics as analytical tools. Wavelet basis functions are jointly determined by scale parameters and shift parameters: scale parameters control the compression or stretching of wavelet functions to extract different frequency components; shift parameters move the wavelet along the time axis to enable point-by-point analysis of local signal features. The general form of the wavelet basis function ψ_{T, S}(\cdot) is shown in Equation (1).

ψ_{T, S}(t) = \frac{1}{\sqrt{S}} ψ\left(\frac{t - T}{S}\right)

(1)

where T represents the translation (shift) parameter, and S represents the scale parameter. S is inversely proportional to the frequency of the wavelet basis function.

The inner product of the original vibration signal with the wavelet basis function is shown in Equation (2).

ψf(T, S) = \int_{-\infty}^{+\infty} f(t) ψ_{T, S}(t) dt

(2)

where ψf(T, S) denotes the result of the inner product calculation, while f(t) represents the original vibration signal.
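As an illustration of Equations (1) and (2), the following minimal sketch evaluates the wavelet basis function and approximates the inner product with a Riemann sum. The real Morlet-style mother wavelet, its parameter `w0`, the sampling rate, and the synthetic 50 Hz test signal are assumptions made for this example only; the source does not state which mother wavelet or signal settings are used.

```python
import numpy as np

def morlet(t, w0=5.0):
    """Real Morlet-style mother wavelet (an assumed choice of psi)."""
    return np.cos(w0 * t) * np.exp(-0.5 * t ** 2)

def wavelet_basis(t, T, S, psi=morlet):
    """Equation (1): psi_{T,S}(t) = psi((t - T) / S) / sqrt(S)."""
    return psi((t - T) / S) / np.sqrt(S)

def cwt_coefficient(f, t, T, S, psi=morlet):
    """Equation (2): inner product of f with psi_{T,S}, approximated by a Riemann sum."""
    dt = t[1] - t[0]
    return np.sum(f * wavelet_basis(t, T, S, psi)) * dt

# Synthetic signal (assumption): a 50 Hz tone sampled at 2 kHz for 1 s
fs = 2000.0
t = np.arange(2000) / fs
f = np.sin(2 * np.pi * 50.0 * t)

# The center frequency of this wavelet at scale S is w0 / (2 * pi * S),
# so a smaller S corresponds to a higher analysis frequency.
S_match = 5.0 / (2 * np.pi * 50.0)    # tuned to 50 Hz
S_off = 5.0 / (2 * np.pi * 400.0)     # tuned to 400 Hz
shifts = np.linspace(0.3, 0.7, 81)    # translation parameters T
resp_match = max(abs(cwt_coefficient(f, t, T, S_match)) for T in shifts)
resp_off = max(abs(cwt_coefficient(f, t, T, S_off)) for T in shifts)
```

In this sketch the response at the matched scale is much larger than at the mismatched scale, which reflects the inverse relation between S and the analysis frequency described above.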

2.2. Convolutional Neural Network

Convolutional Neural Networks (CNNs) can efficiently learn and extract high-dimensional features from signal data automatically, making them widely applied in classification, prediction, and other tasks [21]. CNNs designed for classification tasks typically consist of two components: a feature extractor and a classifier. The feature extractor is sequentially stacked with multiple convolutional layers, activation layers, and pooling layers, responsible for progressively extracting deep features from the input signal. The classifier comprises several fully connected layers that comprehensively process the extracted features to output prediction probabilities for each health status category, thereby completing the final classification.

In intelligent fault diagnosis scenarios, the convolutional layer performs the core function of automatically learning feature representations from input signals. By applying convolution operations to the input signal using convolutional kernels with learnable parameters, feature maps are generated that progressively capture the signal’s local patterns and discriminative information. The mathematical expression for this convolution process is shown in Equation (3).

A_{i} = W_{i} \otimes x + b_{i}

(3)

where x represents the input signal, W_i and b_i denote the weights and biases of the i-th convolutional layer respectively, A_i represents the feature map learned by the i-th convolutional layer, and ⊗ denotes the convolution operation.

The mathematical form of the pooling operation is shown in Equation (4).

y_{i} = \mathrm{pooling}(z_{i})

(4)

where \mathrm{pooling}(\cdot) denotes the pooling operation, and z_i and y_i represent the outputs of the i-th activation layer and the i-th pooling layer, respectively.

In the classifier section, a multi-layer fully connected network is typically employed to comprehensively analyze the high-dimensional features extracted by the feature extractor and perform the final classification. The terminal fully connected layer incorporates the Softmax function to normalize the predicted values for each category, mapping them to a probability distribution within the range of 0 to 1, where the sum of all category probabilities equals 1. Ultimately, the model outputs the category with the highest probability as the predicted result. The resulting Softmax-based prediction rule is shown in Equation (5).

S_{P} = \arg\max (f_{s}(y_{fo}))

(5)

where y_{fo} denotes the output of the final fully connected layer, f_{s}(\cdot) represents the Softmax function, and S_P is the predicted state category.

Finally, the model will be optimized using the cross-entropy loss function, and its parameters will be updated through the back propagation (BP) algorithm.
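The prediction rule of Equation (5) and the cross-entropy loss can be sketched in a few lines. The three-element vector `y_fo` below is a hypothetical output of the final fully connected layer, chosen only for illustration.

```python
import numpy as np

def softmax(y):
    """Numerically stable Softmax over the last axis."""
    z = y - np.max(y, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def predict_state(y_fo):
    """Equation (5): S_P = argmax(f_s(y_fo))."""
    return int(np.argmax(softmax(y_fo)))

def cross_entropy(y_fo, true_class):
    """Cross-entropy loss for one sample with an integer class label."""
    return float(-np.log(softmax(y_fo)[true_class]))

y_fo = np.array([1.2, 0.3, 3.1])  # hypothetical final-layer outputs
probs = softmax(y_fo)             # probabilities that sum to 1
state = predict_state(y_fo)       # index of the most probable category
```

The loss is smallest when the true class is the one that receives the highest probability, which is what back propagation drives the parameters toward.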

3. The Proposed Method

3.1. ECWTCN Algorithm Structure

To address key challenges in intelligent diagnosis of compound faults, such as incomplete data and complex feature coupling, this paper proposes an incomplete data-driven intelligent decoupling method for equipment compound faults based on the ECWTCN. The overall structure of the model is shown in Figure 1. First, a wavelet kernel convolution layer is constructed to form a feature extractor, extracting multi-scale time–frequency features with physical interpretability from raw vibration signals. Subsequently, the extracted features are fed into a decoupling classifier composed of capsule layers. Leveraging the capsule network’s strengths in representing hierarchical relationships and handling feature coupling, this enables single-fault identification and effective decoupling of compound fault patterns. Finally, incomplete fault samples are used as training inputs, and the network is trained and optimized by minimizing the margin loss function. The overall algorithm comprises five core modules.

(1) Wavelet Kernel Convolutional Neural Network: The wavelet kernel convolutional layer extracts features from raw vibration signals that possess clear physical meaning and interpretability. By directly learning features related to the signal’s physical mechanisms, it not only enhances diagnostic accuracy but also improves model interpretability [22]. This convolutional layer replaces traditional convolutional kernels with wavelet kernels. By adjusting translation and scale parameters, multiple sets of wavelet kernels with distinct center frequencies are constructed. Convolving these wavelet kernels with the raw vibration signal effectively extracts signal components matching the wavelet kernel’s center frequency, thereby capturing critical information such as impact features within the vibration signal. The convolution operation between the wavelet basis function and the original signal is shown in Equation (6).

W_{CO} = L_{T, S}(t) \otimes x

(6)

where L_{T, S}(t) represents the wavelet function with translation parameter T and scale parameter S, while W_{CO} denotes the output of the wavelet kernel convolution layer.

Additionally, Batch Normalization (BN) layers are employed to normalize the extracted features, accelerating training convergence while helping to mitigate overfitting. Rectified Linear Unit (ReLU) activation layers introduce nonlinear mappings, enabling the network to learn complex feature patterns. Max pooling layers (MPLs) reduce model parameter size and computational complexity by down-sampling local regions, thereby enhancing training efficiency and strengthening model robustness.
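A minimal sketch of the feature extractor described in this module is given below, assuming a real Morlet-style wavelet kernel, two fixed scales, and a synthetic 100 Hz signal. These choices, the kernel length, and the helper names are illustrative assumptions. In particular, the scales are fixed here rather than learned, and `standardize` is a simplified stand-in for a BN layer without learned affine terms.

```python
import numpy as np

def wavelet_kernel(S, length=129, w0=5.0):
    """Sample a real Morlet-style wavelet at scale S (in samples) on a symmetric grid."""
    k = np.arange(length) - length // 2
    u = k / S
    return np.cos(w0 * u) * np.exp(-0.5 * u ** 2) / np.sqrt(S)

def wavelet_conv_layer(x, scales, length=129):
    """Equation (6)-style layer: one output channel per wavelet scale."""
    return np.stack([np.convolve(x, wavelet_kernel(S, length), mode="same")
                     for S in scales])

def standardize(a, eps=1e-5):
    """Per-channel standardization (simplified stand-in for batch normalization)."""
    mu = a.mean(axis=-1, keepdims=True)
    sd = a.std(axis=-1, keepdims=True)
    return (a - mu) / (sd + eps)

def relu(a):
    return np.maximum(a, 0.0)

def max_pool(a, size=2):
    """Non-overlapping max pooling along the time axis."""
    n = (a.shape[-1] // size) * size
    return a[..., :n].reshape(a.shape[0], -1, size).max(axis=-1)

fs, w0 = 2000.0, 5.0
t = np.arange(2000) / fs
x = np.sin(2 * np.pi * 100.0 * t)               # synthetic 100 Hz component

# Scales whose center frequencies are 100 Hz and 400 Hz, respectively
scales = [w0 * fs / (2 * np.pi * 100.0), w0 * fs / (2 * np.pi * 400.0)]
raw = wavelet_conv_layer(x, scales)             # shape (2, 2000)
features = max_pool(relu(standardize(raw)))     # shape (2, 1000)
```

Only the channel whose center frequency matches the signal component responds strongly before normalization, which is the sense in which each wavelet kernel acts as an interpretable band-selective filter.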

(2) Maximized Aggregation Routing Mechanism: This algorithm centers on the core design principle of “maximizing per-bit efficiency.” It optimizes the routing process by precisely quantifying the associative relationships between input and output capsules. Specifically, the algorithm calculates both the net benefit gained when an input capsule is adopted by an output capsule and the net cost incurred when an input capsule is ignored. The net benefit reflects the gain in feature representation or semantic consistency achieved by the output capsule when utilizing the input capsule. The net cost, conversely, quantifies the potential information loss or structural bias incurred by the output capsule when omitting the input capsule. Based on a comprehensive evaluation of net benefits and net costs, the algorithm dynamically adjusts routing coefficients to achieve the optimal capsule aggregation strategy.

The maximized aggregation routing algorithm involves four designable and differentiable neural network modules, each performing a distinct modeling function. First, the activation coefficient network A(\cdot) generates the activation strength a_{c}^{in} for each input capsule x_{ref}^{in}. Second, the generation network F(\cdot) produces a set of candidate output vector sequences S_{pr}^{out} conditioned on a given input vector x_{ref}^{in}. Third, the prediction network G(\cdot) reconstructs or predicts the input vector \hat{x}_{pre1}^{in} based on the candidate output vectors S_{ge}^{out}. Finally, the matching network S(\cdot) measures the consistency between the actual input vector x_{ac}^{in} and the predicted vector \hat{x}_{pre2}^{in} to quantify their matching degree.

In each iteration of the algorithm, the state of the output capsule is updated based on its current predictive capability. The activation coefficients for each input vector are prioritized and allocated to the output capsule that most accurately predicts that vector. This allocation process optimizes for “maximizing bit-wise salience,” enabling the output capsule to interpret the input data as efficiently as possible in terms of information representation. As iterations progress, the state of the output vector sequence gradually converges, enabling it to reconstruct or interpret the entire input vector in an optimal manner, thereby maximizing bit-wise utility.

This mechanism can be viewed as an extension and modification of the traditional Expectation Maximization (EM) loop [23]. Its execution process can be summarized in three steps: (1) D-step (Data Distribution): Calculate the proportion of data from each input capsule that is utilized or ignored by each output capsule; (2) M-step (Maximization): Update the state of output capsules to enable more accurate prediction of input capsules in a manner that maximizes per-bit utility; (3) E-step (Expectation): Estimate routing probabilities, i.e., the activation probabilities of each output capsule for every input capsule. Through iterative cycles, the algorithm adaptively forms optimal capsule aggregation patterns. Figure 2 illustrates the overall paradigm of the maximized aggregation routing algorithm.

This routing mechanism receives the input vectors x_{ref}^{in} \in ℝ^{n^{in} \times d^{in}}, performs the aggregation and update steps, and generates the corresponding output vector set x_{mn}^{out} \in ℝ^{n^{out} \times d^{out}}. Here, n^{in} denotes the number of input vectors, and d^{in} denotes the dimension of each input vector; n^{out} denotes the number of output vectors, and d^{out} denotes the dimension of each output vector. The implementation is given as pseudocode in Algorithm 1, while the key mathematical expressions of the routing mechanism are given by Equations (7)–(10), which constitute the core computational framework of the maximized aggregation routing algorithm.

a_{c}^{in} = A(x_{ref}^{in}) = \frac{\sum_{f} W_{ref}^{A} x_{ref}^{in}}{\sqrt{n^{in}}} + B_{ref}^{A}

(7)

S_{pr}^{out} = F(x_{ref}^{in}) = \frac{\sum_{f} W_{fn}^{F_{2}} W_{mf}^{F_{1}} x_{ref}^{in}}{\sqrt{n^{in}}} + B_{mn}^{F_{2}}

(8)

\hat{x}_{pre2}^{in} = G(x_{mn}^{out}) = W_{mf}^{G_{2}} \sum_{n} W_{nf}^{G_{1}} LN(x_{mn}^{out}) + B_{mf}^{G_{2}}

(9)

S_{em} = S(\hat{x}_{pre2}^{in}, x_{ref}^{in}) = \log f\left(W_{em}^{S} \sum_{f} x_{ref}^{in} \hat{x}_{inf}^{in} + B_{em}^{S}\right)

(10)

where W_{ref}^{A}, B_{ref}^{A}, W_{fn}^{F_{2}}, W_{mf}^{F_{1}}, B_{mn}^{F_{2}}, W_{mf}^{G_{2}}, W_{nf}^{G_{1}}, B_{mf}^{G_{2}}, W_{em}^{S}, B_{em}^{S}, β_{em}^{use} and β_{em}^{ign} are all learnable parameters; LN(\cdot) denotes the layer normalization operation used to reduce internal covariate shift; and f(\cdot) represents the sigmoid function, which constrains its output to the interval [0, 1], thereby ensuring 0 \leq D_{em}^{use} \leq f(a_{c}^{in}) \leq 1 and 0 \leq D_{em}^{ign} \leq f(a_{c}^{in}) \leq 1.

Algorithm 1 Maximized Aggregation Routing Mechanism

Input: x_{ref}^{in}
Output: x_{mn}^{out}
a_{c}^{in} \leftarrow \frac{\sum_{f} W_{ref}^{A} x_{ref}^{in}}{\sqrt{n^{in}}} + B_{ref}^{A}
R_{em} \leftarrow \frac{1}{n^{out}}
for r = 1 to T do
    D-step:
        D_{em}^{use} \leftarrow f(a_{c}^{in}) R_{em}
        D_{em}^{ign} \leftarrow f(a_{c}^{in}) - D_{em}^{use}
    M-step:
        ϕ_{em} \leftarrow β_{em}^{use} D_{em}^{use} - β_{em}^{ign} D_{em}^{ign}
        x_{mn}^{out} \leftarrow \frac{\sum_{f} W_{fn}^{F_{2}} W_{mf}^{F_{1}} \sum_{e} ϕ_{em} x_{ref}^{in}}{\sqrt{n^{in}}} + \sum_{e} ϕ_{em} B_{mn}^{F_{2}}
    E-step:
        \hat{x}_{pre2}^{in} \leftarrow W_{mf}^{G_{2}} \sum_{n} W_{nf}^{G_{1}} LN(x_{mn}^{out}) + B_{mf}^{G_{2}}
        S_{em} \leftarrow \log f\left(W_{em}^{S} \sum_{f} x_{ref}^{in} \hat{x}_{inf}^{in} + B_{em}^{S}\right)
        R_{em} \leftarrow \frac{e^{S_{em}}}{\sum_{m} e^{S_{em}}}
end for
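The D-step, M-step, and E-step of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: the parameter shapes, the random initialization in `init_params`, the treatment of every bias term as an additive offset, and the reading of the subscripts e, f, m, n as indices over input capsules, input features, output capsules, and output features are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def init_params(n_in, d_in, n_out, d_out, rng):
    """Random stand-ins for the learnable parameters (hypothetical shapes)."""
    g = lambda *s: rng.normal(scale=0.5, size=s)
    return {
        "W_A": g(d_in), "B_A": 0.0,
        "W_F1": g(n_out, d_in), "W_F2": g(d_in, d_out),
        "B_F2": np.zeros((n_out, d_out)),
        "W_G1": g(d_out, d_in), "W_G2": g(n_out, d_in),
        "B_G2": np.zeros((n_out, d_in)),
        "W_S": np.ones((n_in, n_out)), "B_S": np.zeros((n_in, n_out)),
        "beta_use": np.ones((n_in, n_out)),
        "beta_ign": 0.1 * np.ones((n_in, n_out)),
    }

def max_aggregation_routing(x_in, p, n_out, n_iters=3):
    n_in = x_in.shape[0]
    scale = np.sqrt(n_in)
    a = x_in @ p["W_A"] / scale + p["B_A"]                 # activation logits, (n_in,)
    # Candidate outputs proposed by each input capsule, (n_in, n_out, d_out)
    votes = np.einsum("fn,mf,ef->emn", p["W_F2"], p["W_F1"], x_in) / scale + p["B_F2"]
    R = np.full((n_in, n_out), 1.0 / n_out)                # uniform initial routing
    for _ in range(n_iters):
        # D-step: share of each input capsule that is used or ignored
        D_use = sigmoid(a)[:, None] * R
        D_ign = sigmoid(a)[:, None] - D_use
        # M-step: net benefit minus net cost, then aggregate the candidates
        phi = p["beta_use"] * D_use - p["beta_ign"] * D_ign
        x_out = np.einsum("em,emn->mn", phi, votes)        # (n_out, d_out)
        # E-step: predict the inputs from the outputs and score the match
        x_hat = p["W_G2"] * (layer_norm(x_out) @ p["W_G1"]) + p["B_G2"]  # (n_out, d_in)
        z = p["W_S"] * (x_in @ x_hat.T) + p["B_S"]
        S = -np.logaddexp(0.0, -z)                         # log sigmoid, computed stably
        R = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)
    return x_out, R, D_use, D_ign, a

rng = np.random.default_rng(0)
x_in = rng.normal(size=(6, 8))                             # 6 input capsules of dimension 8
params = init_params(6, 8, 3, 4, rng)
x_out, R, D_use, D_ign, a = max_aggregation_routing(x_in, params, n_out=3)
```

Aggregating the precomputed candidates with the weights ϕ_{em} is algebraically the same as applying F to the ϕ-weighted inputs, provided the bias is additive, which is why the sketch computes the candidates once outside the loop.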

(3) Decoupling Classifier: The decoupling classifier consists of two sequentially stacked capsule layers, with its core objective being the effective separation and identification of compound fault modes. The first layer is the intelligent decoupling layer, which introduces a maximized aggregation routing mechanism. This mechanism automatically decomposes the input coupled fault representation into multiple interrelated yet independently describable fault sources, thereby achieving structural decoupling of compound fault features. The second layer is the digital capsule layer, which outputs vectors rather than the scalar probability values typical of traditional CNNs. Each fault category corresponds to an independent output vector, whose magnitude represents the probability of that fault occurring. Furthermore, the dimensional components of the output vector not only indicate the presence of a fault but also contain multidimensional semantic information about the fault type, such as the fault mode, severity, and potential location. This vectorized feature representation enables the capsule layer to simultaneously capture both category probabilities and structural attributes, providing a higher-level, more interpretable expression for fault diagnosis tasks. The complete diagnostic process of the capsule layer is illustrated below.

Capsule Networks (CNs), first proposed by Sabour et al. [24], are a neural network architecture capable of advanced feature modeling through vectorized feature representations and dynamic routing mechanisms. Their core lies in enabling information exchange between different capsules via routing policies, thereby capturing the spatial hierarchical relationships within input data. The fundamental workflow of the inter-capsule routing algorithm is as follows. First, the feature vectors generated by the input capsule are multiplied by the corresponding learnable transformation matrix to produce a set of “prediction vectors.” These prediction vectors represent the original features from multiple perspectives, providing diverse feature representations for subsequent capsule aggregation and activation computations. The mathematical form is shown in Equation (11).

u_{j|i} = W_{ij} u_{i}

(11)

where u_{i} denotes the input feature vector, W_{ij} represents the transformation matrix, and u_{j|i} indicates the prediction vector from capsule i to capsule j. The prediction vector u_{j|i} is combined with the routing coupling coefficient C_{ij} to yield the total input for capsule j.

Secondly, feature aggregation between capsules is achieved through a dynamic routing algorithm. Before the first routing iteration, the prior probability b_ij of each route must be initialized, typically to zero. Since this initial setting implies equal allocation weights from each input capsule to the different output capsules, the initial vector v_j of the output capsule is computed from the mean of all prediction vectors. In subsequent iterations, b_ij is updated according to the agreement between the prediction vector and the output vector, as shown in Equation (12). Through this strategy, the dynamic routing mechanism progressively adjusts allocation weights based on prediction consistency, thereby forming a discriminative feature clustering structure.

b_{ij} \leftarrow b_{ij} + u_{j|i} \cdot v_{j}, \quad b_{ij}^{(0)} = 0

(12)

where u_{j|i} \cdot v_{j} = 〈u_{j|i}, v_{j}〉 is the inner product that measures the agreement between the prediction vector and the output vector.

Subsequently, the dynamic routing algorithm iteratively updates routing weights by introducing the Softmax function to progressively optimize the coupling relationship between input and output capsules. Specifically, in each iteration, the prior probability b_ij is normalized via Softmax to yield the coupling coefficient C_ij, thereby ensuring that the weight distribution assigned by each input capsule to all output capsules forms a valid probability distribution. The mathematical form of this update process is shown in Equation (13).

C_{ij} \leftarrow \mathrm{softmax}(b_{ij}) = \frac{e^{b_{ij}}}{\sum_{h} e^{b_{ih}}}

(13)

where h indexes all possible capsules in the next layer. The mathematical form of the weighted sum of prediction vectors S_{j} is shown in Equation (14).

S_{j} \leftarrow \sum_{i} C_{ij} u_{j|i}

(14)

where S_{j} denotes the weighted sum of all prediction vectors u_{j|i} from the lower-layer capsules. The mathematical form of the output vector v_{j} is shown in Equation (15).

v_{j} \leftarrow \mathrm{squash}(S_{j}) = \frac{‖S_{j}‖^{2}}{1 + ‖S_{j}‖^{2}} \cdot \frac{S_{j}}{‖S_{j}‖}

(15)

where v_{j} denotes the attribute vector of the target entity corresponding to the output capsule. To generate this output vector, the total input S_{j} must be passed through the squash function. This function nonlinearly compresses the magnitude while preserving the directional information of the vector, constraining the output vector’s norm between 0 and 1. The norm thus represents the entity’s probability of existence, while the direction encodes its associated attributes. Ultimately, the vector v_{j} produced by the squash function serves as the output vector of that capsule.

The overall operation process of the capsule network is illustrated in Figure 3. Before the first routing iteration, the prior probability b_ij is initialized to zero, as stated with Equation (12), to ensure an unbiased initial allocation of input capsules to output capsules. In each iteration, the coupling coefficient C_ij is computed according to Equation (13). Equations (14) and (15) are then used to determine the total input S_{j} and the output vector v_{j} of the output capsule, respectively. After completing these steps, b_ij is updated according to Equation (12) to reflect the consistency between the prediction vector and the output vector in the current iteration. This process is repeated T times so that the routing weights gradually converge.
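The routing loop of Equations (11)–(15) can be sketched as follows. The capsule counts, vector dimensions, and random transformation matrices are hypothetical values chosen for illustration; the small `eps` in `squash` is added only to avoid division by zero.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Equation (15): compress the norm into [0, 1) while keeping the direction."""
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat[i, j] is the prediction vector u_{j|i}; shape (n_in, n_out, d_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                      # b_ij initialized to zero
    for _ in range(n_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)         # Equation (13)
        s = np.einsum("ij,ijd->jd", c, u_hat)        # Equation (14)
        v = squash(s)                                # Equation (15)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)    # Equation (12)
    return v, c

rng = np.random.default_rng(0)
u = rng.normal(size=(8, 4))                  # 8 input capsules of dimension 4
W = rng.normal(size=(8, 3, 6, 4))            # W_ij maps u_i to a 6-dimensional prediction
u_hat = np.einsum("ijdk,ik->ijd", W, u)      # Equation (11)
v, c = dynamic_routing(u_hat)                # 3 output capsules of dimension 6
```

Because all b_ij start at zero, the first pass weights every prediction vector equally, and later passes shift weight toward the output capsules whose vectors agree with the predictions.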

In the Max-Sum Aggregation Routing Mechanism, the iteration count is set to 3. After the final routing iteration completes, the output capsule generates a corresponding high-dimensional feature representation. By calculating the L2 norm of the output vector

v_{j}

, the prediction probabilities for the normal state, the motor bearing aging failure, and the rotary encoder strong magnetic interference failure can be obtained. Subsequently, the health status is determined based on a preset threshold: when the predicted probability for a specific category exceeds the threshold, that category is taken as the model’s output. If the predicted probabilities for both the motor bearing aging failure and the rotary encoder strong magnetic interference failure exceed the threshold simultaneously, the sample is diagnosed as a compound failure in which both faults coexist.

3.2. Diagnostic Process for the ECWTCN Method

The complete fault diagnosis process of the proposed ECWTCN algorithm is illustrated in Figure 4. The algorithm consists of five core modules, as follows:

(1) Data Acquisition: Based on the hydraulic transmission principles of EHAs and their typical failure modes, vibration signals under four operating conditions are collected at key positions within the drive system using rotary transformers, pressure sensors, and piezoelectric accelerometers. This is performed at a predetermined sampling frequency and interval. This step aims to obtain real-time multi-sensor data of the equipment under varying health states, providing a foundation for subsequent modeling.

(2) Data Partitioning and Preprocessing: The collected raw vibration signals are divided into training and test sets. The training set contains only normal samples and single-fault samples, while the test set includes normal, single-fault, and compound-fault samples. Subsequently, Z-score normalization is applied to preprocess the data, enhancing the stability and convergence speed of model training.

(3) Model Construction: To achieve intelligent diagnosis of compound faults, the network architecture shown in Figure 2 is constructed. The model comprises two major modules: a feature extractor and a decoupled classifier. The feature extractor consists of a wavelet kernel convolution layer, a BN layer, a ReLU activation layer, and a max pooling layer. It aims to extract key features with clear physical significance across multiple scales, thereby enhancing the separability and representational capability of the input signals. The decoupled classifier incorporates a maximized aggregation routing mechanism. By improving the expressive accuracy of routing relationships between capsules, it achieves structured separation and discrimination of compound fault features, further enhancing the model’s ability to analyze coupled fault patterns.

(4) Model Training and Optimization: The model is trained using the training set, with network parameters optimized based on the loss function. Training ceases when the number of training iterations reaches the preset value. The model employs the marginal loss function for supervised learning, whose mathematical expression is shown in Equation (16).

Y_{C} = L_{C} \max(0, \mathrm{lb} - y_{C}^{\mathrm{pred}})^{2} + γ (1 - L_{C}) \max(0, y_{C}^{\mathrm{pred}} - \mathrm{ub})^{2}

(16)

where L_C denotes the sample label (Boolean value),

y_{C}^{\mathrm{pred}}

represents the prediction probability

{‖v_{C}‖}^{2}

, lb and ub denote the lower and upper bounds of the predicted value, respectively, and

γ

indicates the weighted penalty factor.

(5) Testing and Validation: The model’s fault diagnosis performance is evaluated on the test set to validate its effectiveness and robustness. In addition, an interpretability analysis is conducted on the features learned by the wavelet capsule network to enhance the credibility of the diagnostic results and further reveal the model’s decision-making mechanism.
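The margin loss in Equation (16) can be illustrated with a short Python sketch. The default values of lb, ub, and γ below are commonly used capsule-network settings and are assumptions, not values reported here; the function name and the example probabilities are likewise illustrative.

```python
def margin_loss(labels, probs, lb=0.9, ub=0.1, gamma=0.5):
    """Per-class margin loss Y_C following Equation (16).

    lb, ub and gamma are assumed defaults, not settings taken from the text.
    """
    losses = []
    for label, prob in zip(labels, probs):
        # Penalize a present class whose probability falls below lb.
        present = label * max(0.0, lb - prob) ** 2
        # Penalize an absent class whose probability rises above ub.
        absent = gamma * (1 - label) * max(0.0, prob - ub) ** 2
        losses.append(present + absent)
    return losses

# Sample labelled as a single bearing fault, label order [NC, MF, RF].
per_class = margin_loss([0, 1, 0], [0.05, 0.95, 0.30])
total = sum(per_class)
print(per_class, total)
```

Only the RF output contributes here, because it is an absent class predicted above ub. Summing the per-class terms into one scalar is an assumed reduction.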

4. Test Platform Setup and Data Acquisition

4.1. Experimental Platform Setup

As shown in Figure 5, the EHA integrated test platform primarily consists of key components including a DC power supply, proprietary EHA controller, PMSM, reduction pump, LCL filter, electro-hydraulic actuator, data acquisition system, data communication system, and industrial host computer. During system operation, various sensors first collect real-time data on the operational status of the actuator and hydraulic system. The measured signals are processed by the data acquisition system and transmitted to the industrial host computer. The host computer executes control algorithms based on the acquired data, generating corresponding control commands that are applied to the servo motor. The servo motor dynamically adjusts the flow rate and pressure of the hydraulic system by regulating the pump’s output power, thereby driving the movement of the hydraulic cylinder and load. This test platform effectively simulates the dynamic response characteristics encountered in actual engineering conditions, enabling accurate performance evaluation of the electro-hydraulic system under various operating states. Specific technical parameters of the experimental equipment are detailed in Table 1.

The EHA integrated test platform is designed to effectively replicate the operational states of electrohydraulic actuators under highly representative real-world fault conditions. In practical applications, the PMSM within the EHA is frequently subjected to severe mechanical and electromagnetic stresses. According to extensive industrial reliability surveys, motor bearing wear is the most prevalent mechanical degradation, accounting for approximately 40% to 50% of all motor-related failures [25]. Concurrently, the highly compact integration of EHAs places sensitive sensors in extremely close proximity to high-current power electronics, making rotor encoders exceptionally susceptible to strong electromagnetic interference (EMI) and signal distortion [26]. Crucially, these two failure modes frequently co-occur and interact in harsh operating environments; mechanical vibrations and rotor eccentricity induced by worn bearings dynamically distort the internal magnetic field, which subsequently intensifies the electromagnetic interference experienced by the encoder. To faithfully recreate this typical electromechanical coupling effect, the experimental design incorporates worn motor bearings to simulate mechanical aging, while operating the platform within strong magnetic fields generated by enclosed heavy equipment to emulate intense EMI. By accurately reflecting these physical realities, the platform reliably generates three typical failure modes—isolated bearing wear, isolated encoder magnetic interference, and their compound fault—thereby providing a controllable, realistic, and robust data foundation for the subsequent training and validation of intelligent diagnostic models.

To address the potential impact of fault severity on diagnostic performance, the severity of the injected mechanical wear fault (MF) and electromagnetic interference fault (RF) was explicitly quantified and verified.

(1) Bearing wear fault (MF): A set of intentionally degraded bearings was used to emulate motor-bearing aging. The mechanical severity was quantified by a combination of (i) physical indicators and (ii) signal-based indicators. Specifically, the bearing condition was characterized by the increase in radial clearance (ΔC = 40 μm), measured using a dial indicator, and by the surface defect morphology, measured via optical microscopy, where the representative defect had a diameter of 1.2 mm and a depth of 80 μm. In addition, the vibration signal under MF exhibited a consistent increase in impulsiveness, quantified by a kurtosis of 3.8, confirming the presence of repetitive impact signatures. Based on the observed defect morphology and impact characteristics, the MF used in this study corresponds to an intermediate-stage degradation level.

(2) Rotary encoder magnetic interference fault (RF): The electromagnetic severity was quantified by measuring the magnetic flux density at the encoder housing using a calibrated Lake Shore 475 DSP Gaussmeter. During RF tests, the external magnetic field was applied by permanent magnets, and the flux density at the encoder position was maintained at B = 35 mT (measured at a distance of 10 mm from the encoder surface, along the radial direction). Under this interference, the encoder output exhibited increased jitter and occasional missing pulses; the corresponding disturbance level was quantified by a position-error standard deviation of 0.12° and a count error rate of 0.8%.

(3) Compound fault (CF): CF data were collected by simultaneously applying the MF bearing condition and the RF magnetic field level described above, ensuring that the compound samples correspond to a consistent and reproducible severity setting.

4.2. Dataset Construction

This study validates the effectiveness and robustness of the proposed model in fault diagnosis tasks using the constructed EHA test platform. The basic experimental conditions are shown in Table 2.

During the experiments, vibration acceleration sensors were mounted on the housing of a permanent magnet synchronous motor, and vibration signal data was acquired at a sampling frequency of 24 kHz. The collected data encompassed four typical operating conditions for systematic model validation: 1000 r/min & 0 N·m, 1000 r/min & 50 N·m, 1250 r/min & 0 N·m, and 1250 r/min & 50 N·m. For each operating condition, data was collected for four operational states: Normal Condition (NC), Motor Bearing Wear Fault (MF), Rotary Encoder Strong Magnetic Interference Fault (RF), and Compound Fault of Bearing and Rotary Encoder (CF). Detailed information for each operating state is provided in Table 3. The aforementioned data provides a reliable basis for model training, testing, and performance evaluation.

4.3. Data Preprocessing

(1) Dataset Partitioning: First, the raw vibration signals collected from the in-house EHA controller test platform were partitioned chronologically, as shown in Figure 6. The first 70% of the data was allocated for model training, while the remaining 30% was reserved for model evaluation. Subsequently, within the training data segment, training samples were generated using a sliding window approach with a sample length of 4096 and an overlap rate of 0.25, thereby constructing the training set. The test set was constructed using the same method. To validate the model’s generalization capability under unseen fault modes, the test set additionally included compound fault samples not present in the training set. Through this process, a training set comprising 1002 samples and a test set comprising 568 samples were ultimately constructed.

(2) Normalization: To enhance the model’s training efficiency and improve its fault diagnosis accuracy, the input data is preprocessed using the Z-score normalization method. This approach applies a linear transformation to the raw data, normalizing its mean to 0 and its standard deviation to 1. This effectively mitigates training instability caused by differences in measurement units and magnitude variations. Its mathematical form is shown in Equation (17).

x^{'} = \frac{x - μ}{σ}

(17)

where

μ

and

σ

represent the mean and standard deviation of all signal data, respectively. x denotes the raw vibration signal, while

x^{'}

represents the normalized vibration signal.
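The two preprocessing steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the signal is a synthetic stand-in rather than measured data, the population standard deviation is used for σ, and all function names are illustrative. With a window length of 4096 and an overlap rate of 0.25, consecutive windows start 3072 samples apart.

```python
import statistics

def chronological_split(signal, train_ratio=0.7):
    """First 70% of the record for training, remaining 30% for testing."""
    cut = round(len(signal) * train_ratio)
    return signal[:cut], signal[cut:]

def sliding_windows(signal, length=4096, overlap=0.25):
    """Cut fixed-length samples; neighbours share `overlap` of their points."""
    step = round(length * (1.0 - overlap))  # 3072 for the stated settings
    last_start = len(signal) - length
    return [signal[s:s + length] for s in range(0, last_start + 1, step)]

def z_score(values, mean, std):
    """Equation (17): x' = (x - mu) / sigma."""
    return [(v - mean) / std for v in values]

# Synthetic stand-in for one raw vibration record (not real data).
raw = [float((7 * n) % 13) for n in range(20_000)]

train_part, test_part = chronological_split(raw)
mu = statistics.fmean(raw)       # mean of all signal data
sigma = statistics.pstdev(raw)   # population std (assumed convention)

train_set = [z_score(w, mu, sigma) for w in sliding_windows(train_part)]
test_set = [z_score(w, mu, sigma) for w in sliding_windows(test_part)]
print(len(train_set), len(test_set))
```

Splitting before windowing keeps overlapping windows from straddling the boundary between the training and test segments.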

4.4. Model Parameters

The wavelet capsule network consists of two components: a feature extractor and a decoupled classifier. The feature extractor comprises a Laplace wavelet kernel convolution layer, a BN layer, a ReLU activation layer, and a max pooling layer, designed to extract key features at multiple scales with physical interpretability. The decoupled classifier comprises two stacked capsule layers, enabling the structured separation and recognition of fault features. The output of the final capsule layer corresponds to the predicted probabilities for three operational states: normal operation, bearing failure, and rotary encoder failure. To enable effective fault discrimination, the threshold is set to the average of these three predicted probabilities. When the predicted probability for a specific operating state exceeds this threshold, the sample is assigned to the corresponding category. Since compound faults result from the coupled effects of bearing and rotary encoder failures, the system identifies a compound fault when the predicted probabilities for both fault types exceed the threshold. The specific model architecture and parameter configuration are illustrated in Figure 7.
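The decision rule described above can be sketched as follows. This is an illustrative sketch only: the label order, the function name, and the example probabilities are assumptions, and the handling of other label combinations (for example NC together with one fault) is not specified in the text, so they are simply returned as joined labels.

```python
LABELS = ("NC", "MF", "RF")  # assumed output order of the final capsule layer

def diagnose(probs):
    """Map three capsule probabilities to a health state.

    The threshold is the mean of the three probabilities; MF and RF both
    above the threshold is reported as the compound fault CF.
    """
    threshold = sum(probs) / len(probs)
    active = [name for name, p in zip(LABELS, probs) if p > threshold]
    if "MF" in active and "RF" in active:
        return "CF"
    return "+".join(active)

print(diagnose([0.90, 0.10, 0.05]))  # only NC exceeds the mean
print(diagnose([0.05, 0.90, 0.85]))  # MF and RF both exceed the mean
```

Because the threshold is the mean of the outputs, at least one output always falls at or below it, so the three labels can never all be active at once.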

5. Experimental Verification and Analysis

5.1. Performance Evaluation Metrics

Given the multi-label nature of the proposed network’s output mechanism, this study adopts an evaluation metric system suitable for multi-label classification tasks [27]. Specifically, Subset Accuracy (SA), Hamming Loss (HL), Ranking Loss (RL), and Label Ranking Average Precision (LRAP) are employed as performance evaluation metrics.

SA measures the degree to which a model’s predicted label set perfectly aligns with the true label set. A prediction is considered correct only when all predicted labels for a sample perfectly match its true labels. As a stringent multi-label evaluation metric, this indicator comprehensively reflects a model’s ability to characterize and predict the overall label relationships within a sample. The formula for SA is shown in Equation (18).

SA = \frac{1}{N} \sum_{i = 1}^{N} I ({\overset{⌢}{y}}_{i} = y_{i})

(18)

where N denotes the total number of samples,

{\overset{⌢}{y}}_{i}

represents the set of predicted labels for the i-th sample,

y_{i}

denotes the corresponding set of true labels, and

I (\cdot)

is the indicator function, which takes the value 1 when the predicted label set is completely consistent with the true label set, and 0 otherwise. Based on this, HL is used to measure the model’s independent error rate for each label in multi-label prediction, defined as shown in Equation (19).

HL = \frac{1}{N} \sum_{i = 1}^{N} \frac{1}{L} \sum_{j = 1}^{L} I ({\overset{⌢}{y}}_{i j} \neq y_{i j})

(19)

where L denotes the number of labels,

y_{i j}

represents the true label for sample i and label j, and

{\overset{⌢}{y}}_{i j}

denotes the predicted label for sample i and label j.
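Equations (18) and (19) can be computed directly from binary label matrices. The sketch below uses made-up labels in the assumed order [NC, MF, RF]; the values are illustrative and are not experimental results.

```python
def subset_accuracy(y_true, y_pred):
    """Equation (18): fraction of samples whose whole label set is correct."""
    exact = sum(t == p for t, p in zip(y_true, y_pred))
    return exact / len(y_true)

def hamming_loss(y_true, y_pred):
    """Equation (19): fraction of individual labels that are wrong."""
    n_labels = len(y_true[0])
    wrong = sum(tj != pj
                for t, p in zip(y_true, y_pred)
                for tj, pj in zip(t, p))
    return wrong / (len(y_true) * n_labels)

# Third sample is a compound fault predicted as a bearing fault only.
y_true = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 0]]

sa = subset_accuracy(y_true, y_pred)
hl = hamming_loss(y_true, y_pred)
print(sa, hl)
```

The example shows why SA is the stricter metric: one missed label out of nine costs a full sample in SA but only one ninth in HL.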

RL evaluates a model’s ability to determine label order within a sample, with its core metric being the average proportion of incorrectly ordered label pairs. This metric reflects the model’s sorting performance in distinguishing relevant labels from irrelevant ones, playing a crucial role in analyzing the model’s discrimination accuracy in multi-label scenarios. The formal definition of ranking loss is shown in Equation (20).

RL (y, \hat{f}) = \frac{1}{N} \sum_{i = 1}^{N} \frac{1}{{‖y_{i}‖}_{0} (L - {‖y_{i}‖}_{0})} \times |\{(p, q) : {\hat{f}}_{i p} \leq {\hat{f}}_{i q}, y_{i p} = 1, y_{i q} = 0\}|

(20)

where

{\hat{f}}_{i p}

denotes the model’s predicted score for the p-th label of the i-th sample;

|\cdot|

represents the number of elements in the set, and

{‖\cdot‖}_{0}

indicates the L0 norm, which calculates the number of non-zero elements in the vector.
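A direct transcription of Equation (20) is sketched below. Samples in which every label or no label is relevant make the denominator zero; treating their contribution as zero is an assumption of this sketch, and the scores are made-up values.

```python
def ranking_loss(y_true, scores):
    """Equation (20): average share of (relevant, irrelevant) label pairs
    in which the relevant label does not score strictly higher."""
    total = 0.0
    for y, f in zip(y_true, scores):
        relevant = [p for p, v in enumerate(y) if v == 1]
        irrelevant = [q for q, v in enumerate(y) if v == 0]
        if not relevant or not irrelevant:
            continue  # assumed: undefined term contributes zero
        bad_pairs = sum(f[p] <= f[q] for p in relevant for q in irrelevant)
        total += bad_pairs / (len(relevant) * len(irrelevant))
    return total / len(y_true)

y_true = [[1, 0, 0], [0, 1, 1]]
scores = [[0.9, 0.2, 0.1], [0.8, 0.7, 0.3]]

rl = ranking_loss(y_true, scores)
print(rl)
```

The first sample is ranked perfectly, while in the second both relevant labels score below the irrelevant one, giving an average of 0.5.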

LRAP is used to evaluate the overall accuracy of a model in label ranking tasks, with its calculation process comprehensively considering the relative ranking relationships among predicted labels. This metric reflects the model’s correctness in prioritizing relevant labels within multi-label scenarios. The mathematical expression for LRAP is shown in Equation (21).

LRAP (y, \hat{f}) = \frac{1}{N} \sum_{i = 1}^{N} \frac{1}{{‖y_{i}‖}_{0}} \sum_{j : y_{i j} = 1} \frac{|L_{i j}|}{{rank}_{i j}}

(21)

where

L_{i j} = \{q : y_{i q} = 1, {\hat{f}}_{i q} \geq {\hat{f}}_{i j}\}

denotes the set of all true labels in the i-th sample where the prediction scores are greater than or equal to the prediction score

{\hat{f}}_{i j}

of the j-th label.

{rank}_{i j} = |\{q : {\hat{f}}_{i q} \geq {\hat{f}}_{i j}\}|

is the number of labels in the i-th sample whose prediction scores are greater than or equal to that of the j-th label, i.e., the ranking position of the j-th label within that sample.
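Equation (21) can be evaluated from its definition as sketched below, where the numerator counts the true labels scored at least as high as label j and the denominator counts all labels scored at least as high as label j. Skipping samples without any relevant label is an assumption of this sketch, and the scores are made-up values.

```python
def lrap(y_true, scores):
    """Equation (21): label ranking average precision."""
    total = 0.0
    for y, f in zip(y_true, scores):
        relevant = [j for j, v in enumerate(y) if v == 1]
        if not relevant:
            continue  # assumed: samples without relevant labels add zero
        acc = 0.0
        for j in relevant:
            # |L_ij|: relevant labels scored at least as high as label j.
            l_ij = sum(1 for q in relevant if f[q] >= f[j])
            # rank_ij: all labels scored at least as high as label j.
            rank_ij = sum(1 for fq in f if fq >= f[j])
            acc += l_ij / rank_ij
        total += acc / len(relevant)
    return total / len(y_true)

y_true = [[1, 0, 0], [0, 1, 1]]
scores = [[0.9, 0.2, 0.1], [0.8, 0.7, 0.3]]

value = lrap(y_true, scores)
print(value)
```

For the second sample the two relevant labels sit at ranks 2 and 3, giving (1/2 + 2/3)/2 = 7/12, so the average over both samples is 19/24.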

5.2. Performance Analysis of Compound Fault Diagnosis

To validate the effectiveness of the proposed model, this study conducts comparative evaluations against multiple advanced methods representative of the intelligent fault diagnosis field. All experiments are implemented in Python 3.11 with the PyTorch 2.5.1 deep learning framework. The experimental settings are as follows: the batch size is 8, the number of training iterations is 80, the learning rate is 0.0001, and the Nadam optimizer is employed to update the network parameters; the decoupled classifier’s output threshold is set to the average of all predicted probabilities. To ensure fair comparisons between different network architectures, key training parameters were standardized across models under comparable network conditions. Because the models differ in structural design and training strategy, their output probability distributions in multi-label classification may also vary significantly. Therefore, to evaluate each method under optimal conditions, the decision threshold of each method is set according to the probability distribution it exhibits during training. The compared methods are CNN, DCNN, DDCNN, WavCNN, and EMCCN, which are commonly used in multi-label compound fault diagnosis, together with the proposed ECWTCN. Specific details of these methods are provided in Table 4.

To evaluate the performance of the proposed ECWTCN method, this study systematically compared it with five other typical fault diagnosis methods under four different operating conditions. For each method–condition combination, experiments were independently repeated 20 times, and the results were averaged to mitigate the impact of random fluctuations on performance assessment. The experimental results for each method under different operating conditions are summarized in Table 5.

To more intuitively compare the performance of different methods under four operating conditions, an analysis based on the data in Table 5 was conducted, yielding the following conclusions.

(1) Based on the overall diagnostic performance across the four operating conditions, Condition C achieved the best SA and LRAP and the lowest HL and RL, indicating that diagnostic performance is best under this condition. Condition D showed slightly weaker overall performance than Condition C but still outperformed Conditions A and B, while Condition A exhibited the weakest metrics and the lowest diagnostic capability. This disparity primarily stems from the larger vibration amplitude and higher frequency characteristics of Condition C, which render fault modes more pronounced in the signal and make it easier for the different neural networks to extract discriminative features and classify accurately.

(2) Comparing the performance of different methods, the proposed ECWTCN achieved the highest SA and the lowest HL across all operating conditions, demonstrating its advantage in producing label sets that match the actual labels. Regarding RL and LRAP, the performance of ECWTCN is comparable to that of DDCNN and EMCCN, indicating that all three methods rank labels well. Overall, ECWTCN achieves optimal or near-optimal levels across all four metrics, highlighting its generalization ability and stability in multi-label compound fault diagnosis tasks.

In summary, the proposed ECWTCN method delivers the best or near-best diagnostic performance across all four operating conditions and maintains robust discrimination capability for both single and compound faults. Under operating condition A, it achieves an SA of 93.7% and an LRAP of 0.998, indicating reliable exact-match prediction and highly accurate label ranking. These results indicate that the proposed model not only effectively differentiates various fault modes but also performs robustly under diverse operating conditions. To further demonstrate the model’s advantage in decoupling compound faults, Figure 8 presents a confusion matrix comparison of the six methods under operating condition A. The results reveal that CNN and WavCNN, lacking mechanisms for structured separation of compound fault patterns, fail to accurately identify compound faults and generally misclassify them as single bearing aging faults, as shown in Figure 8a,d. In contrast, DCNN, DDCNN, EMCCN, and the proposed method all employ capsule structures to construct decoupling classifiers, enabling effective decomposition and recognition of compound fault patterns, as demonstrated by the confusion matrices in Figure 8b,c,e,f.

To provide a more complete quantitative assessment, the paper further reports standard classification metrics derived from the confusion matrices in Figure 8, including Accuracy, Precision, Recall, and F1-score. These metrics are computed directly from the confusion matrices, and Precision/Recall/F1-score are summarized using macro-averaging over the four status categories (NC, MF, RF, and CF).
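The macro-averaged metrics can be derived from a confusion matrix as sketched below. The matrix is a hypothetical example with made-up counts (rows are true classes and columns are predicted classes, in the assumed order NC, MF, RF, CF); it mimics a model that labels most compound-fault samples as MF and is not taken from the reported results.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, F1 from cm[true][pred]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[r][k] for r in range(n))  # column sum
        actual = sum(cm[k])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Hypothetical counts: 8 of 10 CF samples are predicted as MF.
cm = [
    [10, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 0],
    [0, 8, 0, 2],
]
acc, prec, rec, f1 = macro_metrics(cm)
print(acc, prec, rec, f1)
```

Macro-averaging weights each class equally, so the poorly recognized CF class lowers the macro F1 more than it lowers the overall accuracy.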

As shown in Table 6, the proposed ECWTCN achieves the best overall performance, reaching 100.00% in all four metrics under operating condition A. Among the comparative baselines, EMCCN and DDCNN also yield high scores, whereas CNN and WavCNN exhibit notable degradation due to the misclassification of CF samples as MF.

5.3. Explainability Analysis

5.3.1. Comparative Analysis of Continuous Wavelet Transform and Feature Maps

The wavelet kernel convolution layer performs convolution operations on the input signal by applying wavelet convolution kernels with varying translation and scale parameters. Its core objective is to extract frequency components from the original signal that match the center frequency of the wavelet kernel, thereby yielding feature representations with clear physical meaning and interpretability. This process is fundamentally analogous to the feature extraction mechanism of the CWT: The CWT generates a series of wavelet basis functions with distinct central frequencies and time-domain positions through scale transformation and time shifting. These basis functions are then multiplied and integrated with local signal segments to extract the signal’s frequency components across different time intervals.
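The frequency-matching behavior described above can be illustrated with a small pure-Python sketch. It assumes a damped-sinusoid kernel in the style of a Laplace wavelet, normalized to unit energy; the damping ratio, kernel length, center frequencies, and the synthetic impact are illustrative choices, and only the 24 kHz sampling frequency comes from the dataset description.

```python
import math

FS = 24_000  # sampling frequency in Hz

def laplace_kernel(freq_hz, length=64, damping=0.03, fs=FS):
    """Unit-norm damped sinusoid; damping and length are assumed values."""
    decay = damping / math.sqrt(1.0 - damping ** 2) * 2.0 * math.pi * freq_hz
    raw = [math.exp(-decay * m / fs) * math.sin(2.0 * math.pi * freq_hz * m / fs)
           for m in range(length)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

def correlate_valid(signal, kernel):
    """Sliding inner product, as in a 1-D convolution layer, followed by ReLU."""
    k = len(kernel)
    out = []
    for n in range(len(signal) - k + 1):
        acc = sum(signal[n + m] * kernel[m] for m in range(k))
        out.append(max(acc, 0.0))
    return out

# Synthetic signal: a single 3 kHz impact starting at sample 200.
signal = [0.0] * 512
for m, v in enumerate(laplace_kernel(3000.0)):
    signal[200 + m] = 5.0 * v

center_freqs = (1000.0, 3000.0, 6000.0)
responses = {f: correlate_valid(signal, laplace_kernel(f)) for f in center_freqs}
peaks = {f: max(r) for f, r in responses.items()}
best_freq = max(peaks, key=peaks.get)
best_pos = responses[best_freq].index(peaks[best_freq])
print(best_freq, best_pos)
```

The kernel whose center frequency matches the impact gives the largest response, and its peak falls at the sample where the impact begins, which is the sense in which each feature-map channel carries both frequency and time information.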

To validate the effectiveness of wavelet convolution layers in extracting interpretable features, this study uses compound fault signals from bearings and rotary encoders as examples. Time–frequency maps are first obtained via CWT, followed by feature maps generated through wavelet kernel convolution layers. The resulting outputs are then subjected to comparative analysis.

As shown in Figure 9a,b, the CWT exhibits distinct bright regions at specific frequencies and time intervals. These regions correspond to prominent impact components within the compound fault signal, indicating that the CWT accurately captures the key frequency characteristics of the compound fault and their positions along the time axis. Subsequently, the vibration signal is input into the wavelet kernel convolution layer to generate feature maps, with values below zero in the feature maps set to zero, as shown in Figure 9c. It can be observed that the response regions of multiple wavelet convolution kernels are significantly highlighted, and these highlighted sample points highly align with the impact sample points possessing large magnitudes in the compound fault signal. This result demonstrates that the wavelet kernel convolution layer effectively extracts impact features and their corresponding frequency information from the vibration signal, validating the interpretability and accuracy of its feature extraction.

5.3.2. Analysis of Feature Extractor Output

The feature extractor consists of a wavelet kernel convolution layer, a BN layer, a ReLU activation layer, and a max pooling layer. Its function is to learn discriminative features from raw vibration signals and feed them into the decoupled classifier for fault identification and classification. The ability of the feature extractor to accurately capture discriminative features of vibration signals across different categories critically impacts subsequent classification performance [34].

To validate the effectiveness of the feature extractor, a comparative analysis was conducted on four types of time-domain vibration signals and their corresponding feature maps obtained through the feature extractor’s learning process. Figure 10a shows that under normal conditions, the bright regions are relatively dispersed. This is because normal vibration signals lack significant high-amplitude impact components, resulting in more uniformly extracted impact features by the model. As shown in Figure 10b, the bright regions in the bearing aging fault state are more concentrated compared to the normal state, indicating that the model successfully captured the impact features characteristic of this type of fault. Figure 10c reveals that the highlighted regions for strong magnetic interference faults in rotary encoders converge further, exhibiting more pronounced and distinct impact patterns. Figure 10d shows the most prominent highlighted regions for compound faults, consistent with the significant impact features observed in the original time-domain signal.

The above analysis demonstrates that for different fault types, the feature extractor can learn corresponding feature patterns, and these patterns exhibit good distinguishability across different categories. This lays the foundation for accurate diagnosis and effective decoupling of compound faults.

5.4. Computational Complexity Analysis

The hardware environment used in this experiment is shown in Table 7. To comprehensively evaluate the computational efficiency of the proposed ECWTCN method, this study systematically analyzed the model performance from both time complexity and space complexity perspectives. Time complexity was measured using runtime and floating-point operations (FLOPs): runtime was calculated as the average of twenty consecutive training and testing iterations to reflect actual computational overhead; FLOPs were used to measure computational demands during forward and backward propagation. Lower runtime and fewer FLOPs indicate better time complexity performance. Space complexity was evaluated by the number of model parameters, where a smaller parameter size offers greater advantages in storage and deployment. To compare the impact of different capsule dimensions on computational complexity, this experiment conducted comparative tests with digital capsule layers set to 16 and 32 dimensions, respectively.

As shown in Table 8, the ECWTCN method demonstrates superior computational efficiency compared to DDCNN and EMCCN across all capsule dimensions, with particularly pronounced advantages in high-dimensional settings, indicating higher time efficiency in practical computations. Regarding the FLOPs metric, ECWTCN maintains the lowest computational load across both capsule dimension settings. Moreover, as the capsule dimension increases from 16 to 32, its FLOP growth rate is significantly lower than that of other methods, reflecting more stable time complexity characteristics. Furthermore, in terms of space complexity, ECWTCN exhibits significantly fewer parameters than both DDCNN and EMCCN across all dimensionality settings, demonstrating its substantial advantages in storage requirements and network size control.

In summary, the ECWTCN method not only demonstrates superior performance in compound fault diagnosis but also outperforms the comparison models in terms of time and space complexity, showcasing its efficient, lightweight, and computationally practical characteristics suitable for engineering applications.

6. Conclusions

To address the challenges of data scarcity and fault coupling in compound fault identification for electro-hydrostatic actuators (EHAs), this paper proposes an intelligent compound fault decoupling diagnosis framework based on ECWTCN. The proposed method is trained using only single-fault labeled data and integrates a wavelet-kernel convolution module with a capsule-based architecture to extract multi-scale time–frequency representations and decouple coupled fault characteristics through hierarchical vector features. In addition, a maximized aggregation routing strategy is introduced to establish more globally coherent information transmission between capsule layers, alleviating the perspective locality and computational inefficiency of conventional dynamic routing. With a marginal-loss-based optimization strategy, the framework is evaluated on real-world vibration data collected from an EHA experimental platform under multiple operating conditions.

Experimental results demonstrate that the proposed method consistently achieves strong multi-label diagnostic performance and outperforms representative comparative approaches across multiple evaluation metrics. The interpretability analyses further indicate that the wavelet-kernel convolution layer captures physically meaningful impact-related patterns and relevant frequency information, while the capsule representations learn class-discriminative features that support compound fault decoupling. These findings suggest that the proposed framework provides a reliable and interpretable solution for compound fault diagnosis under limited labeled data conditions, showing promise for practical engineering applications.

Despite these encouraging results, several limitations should be acknowledged. First, the current validation is conducted on data from a specific EHA platform; the generalization capability to other actuator configurations, sensor placements, and industrial environments with different noise characteristics remains to be further verified. Second, although multiple operating conditions are considered, real-world deployments may involve more complex and rapidly varying regimes, and distribution shifts could affect robustness. Third, the training paradigm relies on the availability and correctness of single-fault labels; incomplete, ambiguous, or biased annotations in practice may influence diagnostic reliability, particularly for rare or evolving fault modes. Finally, while the proposed routing improves coherence, the capsule-based inference still introduces additional computational overhead, which may constrain strict real-time deployment on resource-limited hardware.

Future work will extend the evaluation to more diverse and dynamically changing operating conditions and investigate cross-platform generalization, including transfer learning or domain adaptation to mitigate distribution shifts. We will also explore adaptive thresholding and learning strategies for weakly labeled or partially labeled scenarios, and develop lightweight implementations to reduce computational cost. In addition, integrating fault severity quantification and prognosis-oriented modeling will be considered to enhance the practical utility of the proposed framework in long-term health monitoring applications.

Author Contributions

Conceptualization, S.C. and W.L.; methodology, S.C., W.L., K.H., X.D. and R.L.; software, S.C., R.L. and K.H.; visualization, X.D.; writing (original draft preparation), S.C. and W.L.; writing (review and editing), W.L., X.D. and S.C.; validation, K.H.; supervision, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Department of Hubei Province, China (2024BAB067), the Fundamental Research Funds for the Central Universities (104972025YJS0119) and 2025 Chutian Elite Program for Innovation and Entrepreneurship Teams (20259109).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We are grateful to our families, friends, and laboratory colleagues for their unwavering understanding and encouragement.

Conflicts of Interest

Author Xiaoqing Deng was employed by Hubei ChuangSiNuo Electrical Technology Corp. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

EHA	Electro-Hydrostatic Actuator
CWT	Continuous Wavelet Transform
CNN	Convolutional Neural Network
BP	Back Propagation Neural Network
ECWTCN	Enhanced Continuous Wavelet Transform Capsule Network
PMSM	Permanent Magnet Synchronous Motor
MF	Motor Bearing Wear Fault
RF	Rotary Encoder Strong Magnetic Interference Fault
CF	Compound Fault of Bearing and Rotary Encoder
DCNN	Deep CNN
EMCCN	Expectation-Maximization Capsule Network
SA	Subset Accuracy
HL	Hamming Loss
RL	Ranking Loss
LRAP	Label Ranking Average Precision
ReLU	Rectified Linear Unit
NC	Normal Condition
BN	Batch Normalization
MPL	Max Pooling Layer
EM	Expectation Maximization
CN	Capsule Networks
WavCNN	Wavelet Kernel CNN
DDCNN	Deep Decoupling CNN

References

1. Melluso, F.; Spirto, M.; Nicolella, A.; Malfi, P.; Tordela, C.; Cosenza, C.; Savino, S.; Niola, V. Torque fault signal extraction in hybrid electric powertrains through a wavelet-supported processing of residuals. Mech. Syst. Signal Process. 2026, 242, 113652.
2. Huang, K.; Li, W.; Cao, S.; Gao, F.; Li, R.; Xu, W.; Lin, B. Recent advances in fault diagnosis of ship integrated power systems: A review. Ocean Eng. 2026, 343, 123141.
3. Jian, Z.; Guo, J.; Peng, G.; Yin, Y. Fractal Operators and Fractional-Order Mechanics of Bone. Fractal Fract. 2023, 7, 642.
4. Song, W.; Zhong, M.; Yang, M.; Qi, D.; Spadini, S.; Cattani, P.; Villecco, F. Remaining Useful Life Prediction of Roller Bearings Based on Fractional Brownian Motion. Fractal Fract. 2024, 8, 183.
5. Hu, Y.; Song, Y.; He, X.; Zhao, X.; Yang, X.; Yao, J. MAACCN: An intelligent decoupling diagnosis method for compound faults in electro-hydrostatic actuators. IEEE Trans. Instrum. Meas. 2025, 74, 3532611.
6. Chen, J.; Li, T.; He, J.; Liu, T. An Interpretable Wavelet Kolmogorov Arnold Convolutional LSTM for Spatial-temporal Feature Extraction and Intelligent Fault Diagnosis. J. Dyn. Monit. Diagn. 2025, 4, 183–193.
7. Song, R.; Jiao, C.; Shi, H.; Chen, L. Intelligent compound fault decoupling of rolling bearing based on parallel capsule network. Eng. Appl. Artif. Intell. 2025, 161, 112206.
8. Song, Y.; Qi, Y.; Liu, H.; Dai, H. Research on the effect of electro-hydrostatic actuators (EHA) on the curve negotiation performance and wheel wear of intercity EMUs. Veh. Syst. Dyn. 2025, 1–30.
9. Li, J.; Xu, F.; Xu, Y.; Wang, Q.; Zhang, H. Touch sensing–based high-precision robotic grinding for spatially curved weld seam. Int. J. Adv. Manuf. Technol. 2025, 140, 1523–1539.
10. Tan, C.; Liu, H.; Chen, L.; Wang, J.; Chen, X.; Wang, G. Characteristic analysis and model predictive-improved active disturbance rejection control of direct-drive electro-hydrostatic actuators. Expert Syst. Appl. 2026, 301, 130565.
11. Li, Y.; Jia, Z.; Liu, J.; Wang, K.; Zhao, P.; Liu, X.; Liu, Z. An Integrated Strategy for Interpretable Fault Diagnosis of UAV EHA DC Drive Circuits Under Early Fault and Imbalanced Data Conditions. Drones 2025, 9, 189.
12. Sun, B.; Li, H.; Wang, C.; Ma, Z.; Guan, X. Optimized weights Time-Frequency Analysis: A novel method for fault diagnosis in rotating Machinery under Time-Varying speeds. Mech. Syst. Signal Process. 2025, 226, 112345.
13. Liu, J.; Zhong, T.; Liu, Y.; Cen, L.; Shao, H. Small Sample-oriented Variable Working Condition Fault Diagnosis via Non-data-enhanced Multi-category Contrastive Learning. IEEE Trans. Instrum. Meas. 2025, 74, 3520714.
14. Xu, Z.; Chen, H.; Liu, C.; Shen, J. Fully coupled fluid-structure interaction of diaphragm rupture in high-pressure-ratio shock tunnels. Chin. J. Aeronaut. 2025, 39, 103950.
15. Lv, Y.; Wang, J.; Zhang, C.; Ding, J. Composite fault feature extraction for gears based on MCKD-EWT adaptive wavelet threshold noise reduction. Meas. Control 2025, 58, 185–195.
16. Liang, R.; Ran, W.; Chen, Y.; Zhu, R. Fault diagnosis method for rotating machinery based on multi-scale features. Chin. J. Mech. Eng. 2023, 36, 141.
17. Khan, S.; Kumar, A. Failure analysis in advance cylindrical composite pressure vessel under pressure & temperature for hydrogen storage: A comprehensive review. Polym. Compos. 2025, 46, 2933–2973.
18. Xiao, Y.; Zhou, H.; Zhou, X.; Wang, J. Multilabel Transfer Learning Method with Dynamic Multimetric for Coupling Fault Diagnosis. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 18874–18888.
19. Alhussen, A.; Haq, M.A.; Khan, A.A.; Mahendran, R.K.; Kadry, S. XAI-RACapsNet: Relevance aware capsule network-based breast cancer detection using mammography images via explainability O-net ROI segmentation. Expert Syst. Appl. 2025, 261, 125461.
20. Aguiar-Conraria, L.; Soares, M.J. The continuous wavelet transform: Moving beyond uni-and bivariate analysis. J. Econ. Surv. 2014, 28, 344–375.
21. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019.
22. Ju, Z.; Chen, Y.; Qiang, Y.; Chen, X.; Ju, C.; Yang, J. A systematic review of data augmentation methods for intelligent fault diagnosis of rotating machinery under limited data conditions. Meas. Sci. Technol. 2024, 35, 122004.
23. Georghiades, C.N.; Snyder, D.L. The expectation-maximization algorithm for symbol unsynchronized sequence detection. IEEE Trans. Commun. 2002, 39, 54–61.
24. Xing, X.; Luo, Y.; Han, B.; Qin, L.; Xiao, B. Hybrid Data-Driven and Multisequence Feature Fusion Fault Diagnosis Method for Electro-Hydrostatic Actuators of Transport Airplane. IEEE Trans. Ind. Inform. 2025, 21, 3306–3315.
25. Nandi, S.; Toliyat, H.A.; Li, X. Condition monitoring and fault diagnosis of electrical motors—A review. IEEE Trans. Energy Convers. 2005, 20, 719–729.
26. Maré, J.C. Aerospace Actuators 2: Signal-By-Wire and Power-By-Wire; John Wiley & Sons: Hoboken, NJ, USA, 2017; p. 2.
27. Liu, R.; Meng, G.; Yang, B.; Sun, C.; Chen, X. Dislocated time series convolutional neural architecture: An intelligent fault diagnosis approach for electric machine. IEEE Trans. Ind. Inform. 2016, 13, 1310–1320.
28. Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional neural networks: An overview and application in radiology. Insights Into Imaging 2018, 9, 611–629.
29. Shin, H.C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Summers, R.M. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298.
30. Huang, R.; Liao, Y.; Zhang, S.; Li, W. Deep decoupling convolutional neural network for intelligent compound fault diagnosis. IEEE Access 2018, 7, 1848–1858.
31. Jiang, G.; Wang, J.; Wang, L.; Xie, P.; Li, Y.; Li, X. An interpretable convolutional neural network with multi-wavelet kernel fusion for intelligent fault diagnosis. J. Manuf. Syst. 2023, 70, 18–30.
32. Li, W.; Lan, H.; Chen, J.; Feng, K.; Huang, R. WavCapsNet: An interpretable intelligent compound fault diagnosis method by backward tracking. IEEE Trans. Instrum. Meas. 2023, 72, 1–11.
33. Zhao, Y.; Zhou, M.; Zhang, N.; Xu, X.; Zhang, X. Fault diagnosis of gas turbine based on matrix capsules with EM routing. Syst. Sci. Control Eng. 2021, 9, 96–102.
34. Preece, S.J.; Goulermas, J.Y.; Kenney, L.P.; Howard, D. A comparison of feature extraction methods for the classification of dynamic activities from accelerometer data. IEEE Trans. Biomed. Eng. 2008, 56, 871–879.

Figure 1. Algorithmic Structure of the ECWTCN Method.

Figure 2. Examples of the maximized aggregation routing mechanism.

Figure 3. The operational process of capsule networks.

Figure 4. Diagnostic Process Based on the ECWTCN Method.

Figure 5. EHA integrated test platform.

Figure 6. Training and testing set partitioning.

Figure 7. Structural parameters of the wavelet capsule network model.

Figure 8. Confusion matrix analysis. (a) CNN; (b) DCNN; (c) DDCNN; (d) WavCNN; (e) EMCCN; (f) ECWTCN.

Figure 9. Vibration signal, time–frequency plot, and feature map under a compound fault. (a) Vibration signal for the compound fault of the PMSM bearing and rotary encoder; (b) Time–frequency plot; (c) Output feature map.

Figure 10. Time-domain vibration signals and characteristic diagrams for four fault conditions. (a) Normal condition (NC); (b) Motor bearing wear fault (MF); (c) Rotary encoder strong magnetic interference fault (RF); (d) Compound fault of bearing and rotary encoder (CF).

Table 1. Detailed specifications and parameters of experimental equipment.

Hardware Type	Model Selection	Manufacturer (Location)
Main control chip	AVP32F335QP176S	AVICHIP Technology (Shenzhen, China)
Temperature sensor	TR34	WIKA Alexander Wiegand SE & Co. KG (Klingenberg, Germany)
Pressure sensor	MEAS-US175-C00002-200BG	TE Connectivity (Schaffhausen, Switzerland)
Flowmeter	KRACHT-VC1/VC0.025	KRACHT GmbH (Werdohl, Germany)
Grid ruler	Heidenhain LC 483	DR. JOHANNES HEIDENHAIN GmbH (Traunreut, Germany)
Rotary Encoder	Heidenhain ERN 1387	DR. JOHANNES HEIDENHAIN GmbH (Traunreut, Germany)

Table 2. Parameters of data acquisition and experimental conditions.

Data Type	Data Specifications
Signal	Motor vibration signal
System sampling frequency	24 kHz
Accumulator pressure	0.65 MPa
Temperature	25 °C
Hydraulic fluid	ISO VG 32

Table 3. Test dataset.

Operating Conditions	Rotational Speed (r/min)	Load (N·m)	Training Set	Test Set
A	1000	0	NC, MF, RF	NC, MF, RF, CF
B	1000	50	NC, MF, RF	NC, MF, RF, CF
C	1250	0	NC, MF, RF	NC, MF, RF, CF
D	1250	50	NC, MF, RF	NC, MF, RF, CF

Table 4. Introduction to comparative methods.

Algorithm Name	Algorithm Structure	Loss Function	Threshold Setting
CNN [28] (Convolutional Neural Network)	Extracts high-dimensional features using multiple convolutional layers and performs classification through fully connected layers.	-	-
DCNN [29] (Deep CNN)	The convolutional layer is followed by three fully connected layers (128 → 64 → 3), with the final layer employing a Softmax activation function.	Binary cross-entropy	0.2
DDCNN [30] (Deep Decoupling CNN)	Introduces capsule networks in which dynamic routing establishes the transmission relationships between capsules; multi-layer capsules replace the fully connected layers to cluster features, so the model can output both single-fault and compound-fault results.	Multi-label Margin Loss	0.3
WavCNN [31,32] (Wavelet Kernel CNN)	Replaces the first conventional convolutional layer with a wavelet-kernel convolutional layer to learn interpretable features.	-	-
EMCCN [33] (Expectation-Maximization Capsule Network)	Replaces the traditional dynamic routing mechanism with the EM routing algorithm and optimizes the model using a divergence loss function.	Diffusion Loss Function	0.3
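
The threshold settings in Table 4 imply a multi-label decision rule: each output is compared with a fixed threshold, and a compound fault is reported when more than one fault label is active. The following is a minimal sketch of such a rule, not the authors' implementation. The label order `(NC, MF, RF)`, the strict `>` comparison, the function names, and all scores are assumptions made for illustration.

```python
# Assumed label order: one output per class seen during training (Table 3).
LABELS = ("NC", "MF", "RF")


def decode_labels(scores, threshold):
    """Return the set of labels whose score exceeds the threshold."""
    return {name for name, score in zip(LABELS, scores) if score > threshold}


def describe(active):
    """Map a set of active labels to the condition names used in the paper."""
    if active == {"MF", "RF"}:
        return "CF"  # both single-fault labels active: compound fault
    if len(active) == 1:
        return next(iter(active))
    return "undetermined"  # empty set, or NC active together with a fault


# Hypothetical output scores, not measured data.
strong_compound = [0.05, 0.81, 0.64]
weak_second_fault = [0.05, 0.70, 0.25]

print(describe(decode_labels(strong_compound, threshold=0.3)))
print(describe(decode_labels(weak_second_fault, threshold=0.3)))
print(describe(decode_labels(weak_second_fault, threshold=0.2)))
```

With these made-up scores, the second sample is read as a single fault at a threshold of 0.3 and as a compound fault at 0.2, which is why the threshold is listed as a per-method setting.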

Table 5. Compound fault diagnosis results.

Comparative Method	Operating Condition	SA (Mean)	SA (STD)	HL (Mean)	HL (STD)	RL (Mean)	RL (STD)	LRAP (Mean)	LRAP (STD)
CNN	A	0.741	1.2 × 10⁻³	0.236	4.1 × 10⁻²	0.022	3.7 × 10⁻²	0.752	9.7 × 10⁻³
DCNN		0.772	3.5 × 10⁻²	0.228	2.2 × 10⁻²	0.019	1.4 × 10⁻⁴	0.994	3.7 × 10⁻³
DDCNN		0.903	4.5 × 10⁻³	0.097	1.7 × 10⁻³	0.006	4.5 × 10⁻⁴	0.982	3.8 × 10⁻²
WavCNN		0.765	8.5 × 10⁻⁴	0.213	3.3 × 10⁻²	0.020	1.0 × 10⁻²	0.783	6.4 × 10⁻⁴
EMCCN		0.895	2.8 × 10⁻²	0.107	2.5 × 10⁻²	0.005	1.2 × 10⁻⁴	0.998	1.2 × 10⁻³
ECWTCN		0.937	1.7 × 10⁻⁵	0.061	1.9 × 10⁻³	0.005	3.3 × 10⁻⁵	0.998	6.4 × 10⁻⁴
CNN	B	0.744	1.3 × 10⁻³	0.234	1.3 × 10⁻³	0.022	2.1 × 10⁻⁴	0.769	3.2 × 10⁻³
DCNN		0.771	2.8 × 10⁻³	0.227	3.5 × 10⁻²	0.018	7.6 × 10⁻⁵	0.992	3.3 × 10⁻³
DDCNN		0.908	1.3 × 10⁻²	0.092	2.6 × 10⁻²	0.001	3.6 × 10⁻⁴	0.986	4.1 × 10⁻³
WavCNN		0.774	3.6 × 10⁻²	0.211	1.2 × 10⁻²	0.019	1.5 × 10⁻²	0.799	4.3 × 10⁻³
EMCCN		0.917	1.7 × 10⁻⁵	0.092	3.9 × 10⁻²	0.002	2.6 × 10⁻²	1.000	1.9 × 10⁻⁴
ECWTCN		0.948	1.4 × 10⁻⁴	0.054	1.1 × 10⁻²	0.002	2.7 × 10⁻³	1.000	2.4 × 10⁻²
CNN	C	0.738	1.2 × 10⁻³	0.233	3.0 × 10⁻³	0.021	7.7 × 10⁻³	0.785	2.7 × 10⁻²
DCNN		0.774	3.6 × 10⁻²	0.225	1.0 × 10⁻²	0.016	1.6 × 10⁻³	0.994	2.5 × 10⁻³
DDCNN		0.914	3.7 × 10⁻²	0.087	1.2 × 10⁻²	0.001	2.6 × 10⁻²	0.991	1.1 × 10⁻³
WavCNN		0.768	3.3 × 10⁻³	0.207	3.7 × 10⁻³	0.018	7.2 × 10⁻⁴	0.785	3.0 × 10⁻³
EMCCN		0.932	1.3 × 10⁻⁵	0.079	2.8 × 10⁻²	0.001	3.1 × 10⁻³	0.999	1.2 × 10⁻³
ECWTCN		0.951	1.6 × 10⁻³	0.047	2.5 × 10⁻⁴	0.001	1.8 × 10⁻⁴	0.999	3.6 × 10⁻²
CNN	D	0.746	1.4 × 10⁻⁴	0.228	1.6 × 10⁻⁵	0.022	1.6 × 10⁻³	0.776	3.8 × 10⁻²
DCNN		0.773	3.4 × 10⁻³	0.226	7.4 × 10⁻³	0.017	2.2 × 10⁻³	0.995	1.8 × 10⁻³
DDCNN		0.938	3.6 × 10⁻²	0.061	1.1 × 10⁻⁴	0.000	3.2 × 10⁻⁴	0.992	3.4 × 10⁻²
WavCNN		0.772	1.1 × 10⁻²	0.198	4.9 × 10⁻⁴	0.019	2.4 × 10⁻²	0.791	2.8 × 10⁻²
EMCCN		0.946	4.1 × 10⁻²	0.049	2.3 × 10⁻³	0.000	2.3 × 10⁻²	1.000	3.1 × 10⁻⁴
ECWTCN		0.968	9.7 × 10⁻³	0.032	4.2 × 10⁻⁴	0.000	1.8 × 10⁻⁴	1.000	4.9 × 10⁻³
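
For readers less familiar with the four multi-label metrics in Table 5, the sketch below computes SA, HL, RL, and LRAP from their commonly used definitions. It is an illustration only: the definitions in Section 5.1 of the paper may differ in detail (for example in how ties are handled), and the label order `(NC, MF, RF)`, the threshold of 0.3, and all scores are hypothetical.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label vector matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


def ranking_loss(y_true, scores):
    """Mean fraction of (relevant, irrelevant) pairs ranked in the wrong order."""
    total = 0.0
    for t, s in zip(y_true, scores):
        rel = [s[i] for i, v in enumerate(t) if v == 1]
        irr = [s[i] for i, v in enumerate(t) if v == 0]
        if rel and irr:
            bad = sum(1 for r in rel for q in irr if q >= r)  # ties count as errors
            total += bad / (len(rel) * len(irr))
    return total / len(y_true)


def lrap(y_true, scores):
    """Label ranking average precision."""
    total = 0.0
    for t, s in zip(y_true, scores):
        rel = [i for i, v in enumerate(t) if v == 1]
        if not rel:
            total += 1.0
            continue
        acc = 0.0
        for j in rel:
            rank = sum(1 for x in s if x >= s[j])      # position of label j
            hits = sum(1 for k in rel if s[k] >= s[j])  # relevant labels at or above j
            acc += hits / rank
        total += acc / len(rel)
    return total / len(y_true)


# Hypothetical samples in assumed label order (NC, MF, RF); the last one is a compound fault.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1]]
scores = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.2], [0.2, 0.1, 0.7], [0.1, 0.9, 0.25]]
y_pred = [[int(x > 0.3) for x in s] for s in scores]  # illustrative threshold

print(subset_accuracy(y_true, y_pred), hamming_loss(y_true, y_pred))
print(ranking_loss(y_true, scores), lrap(y_true, scores))
```

In this toy case the compound-fault sample ranks both fault labels above NC, so RL is 0 and LRAP is 1, yet the second fault score falls below the threshold and the sample counts as an SA error. The ranking metrics depend only on score ordering while SA and HL depend on the thresholded decision, which may help when reading rows of Table 5 where LRAP is close to 1 but SA is noticeably lower.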

Table 6. Performance comparison of different models under operating condition A.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
CNN	75.00	87.50	75.00	66.67
DCNN	98.67	98.74	98.67	98.67
DDCNN	99.12	99.15	99.12	99.12
WavCNN	75.00	87.50	75.00	66.67
EMCCN	99.75	99.75	99.75	99.75
ECWTCN	100.00	100.00	100.00	100.00
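
The four columns of Table 6 can all be derived from a confusion matrix such as those in Figure 8. The sketch below shows the arithmetic with macro averaging over the four conditions. The source does not state the averaging scheme, so macro averaging, the convention that a never-predicted class gets precision 1, and the confusion matrix itself are assumptions. The hypothetical matrix has every CF sample assigned to MF, as could happen with a single-label classifier that never saw CF during training.

```python
CLASSES = ["NC", "MF", "RF", "CF"]

# Rows: true class, columns: predicted class. Hypothetical counts, not the paper's data.
confusion = [
    [100, 0, 0, 0],
    [0, 100, 0, 0],
    [0, 0, 100, 0],
    [0, 100, 0, 0],  # every CF sample predicted as MF
]


def macro_metrics(cm):
    """Return (accuracy, precision, recall, F1) in percent, macro-averaged."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        # Assumption: precision of a class that is never predicted is taken as 1.
        precisions.append(tp / predicted if predicted else 1.0)
        recalls.append(tp / actual if actual else 0.0)
        f1s.append(2 * tp / (predicted + actual) if predicted + actual else 0.0)
    mean = lambda xs: sum(xs) / len(xs)
    return tuple(100 * v for v in (accuracy, mean(precisions), mean(recalls), mean(f1s)))


acc, prec, rec, f1 = macro_metrics(confusion)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
```

Under these assumptions the result is 75.00, 87.50, 75.00, and 66.67, the same values as the CNN and WavCNN rows. This shows one pattern that is consistent with those rows; it does not establish how the reported figures were actually obtained.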

Table 7. Configuration of the computational hardware environment.

Hardware Type	Parameter Configuration
Central Processing Unit (CPU)	Intel Core i7-11800H
Random Access Memory (RAM)	32 GB
Graphics Processing Unit (GPU)	NVIDIA RTX 4060 Ti (16 GB)

Table 8. Comparison of computational complexity and efficiency for different methods.

Method	Digit Capsule Dimension	Runtime (s)	FLOPs	Number of Parameters
ECWTCN	16	4.56	1.6378 × 10⁷	1.8963 × 10⁵
EMCCN	16	4.68	1.6599 × 10⁷	4.1735 × 10⁵
DDCNN	16	5.27	1.6389 × 10⁷	4.1218 × 10⁵
ECWTCN	32	4.51	1.6095 × 10⁷	1.9017 × 10⁵
EMCCN	32	4.92	1.6832 × 10⁷	6.8765 × 10⁵
DDCNN	32	5.72	1.6631 × 10⁷	6.4022 × 10⁵
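
To make the comparison in Table 8 easier to read, the short sketch below expresses the runtime, FLOPs, and parameter count of each baseline as a multiple of the ECWTCN value at the same digit capsule dimension. The numbers are copied from Table 8; the function and variable names are illustrative only.

```python
# (method, digit capsule dimension, runtime in s, FLOPs, parameters), copied from Table 8.
TABLE_8 = [
    ("ECWTCN", 16, 4.56, 1.6378e7, 1.8963e5),
    ("EMCCN", 16, 4.68, 1.6599e7, 4.1735e5),
    ("DDCNN", 16, 5.27, 1.6389e7, 4.1218e5),
    ("ECWTCN", 32, 4.51, 1.6095e7, 1.9017e5),
    ("EMCCN", 32, 4.92, 1.6832e7, 6.8765e5),
    ("DDCNN", 32, 5.72, 1.6631e7, 6.4022e5),
]


def ratios_vs_ecwtcn(table):
    """Map (method, dim) to (runtime, FLOPs, parameter) ratios relative to ECWTCN."""
    baseline = {dim: (rt, fl, pa) for name, dim, rt, fl, pa in table if name == "ECWTCN"}
    ratios = {}
    for name, dim, rt, fl, pa in table:
        if name == "ECWTCN":
            continue
        base_rt, base_fl, base_pa = baseline[dim]
        ratios[(name, dim)] = (rt / base_rt, fl / base_fl, pa / base_pa)
    return ratios


for (name, dim), (rt, fl, pa) in sorted(ratios_vs_ecwtcn(TABLE_8).items()):
    print(f"{name:6s} dim={dim}: runtime x{rt:.2f}, FLOPs x{fl:.3f}, parameters x{pa:.2f}")
```

On these figures the FLOPs of the three methods differ by only a few percent, while the baselines carry roughly two to almost four times as many parameters as ECWTCN, so the parameter count is where the methods separate most clearly.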


