1. Introduction
With the increasing depletion of terrestrial resources worldwide, the ocean has become a vital strategic field for resource exploration and development in the 21st century [1,2]. As core pieces of equipment for the exploitation of marine oil and gas resources, subsea drilling rigs operate for long periods of time in extreme environments characterized by high deep-sea pressure, intense corrosion, and variable loads. This poses extremely high demands on the equipment’s safety and reliability. On subsea drilling rigs, main shaft bearings are critical rotating components that transmit power and receive the shock loads from drilling. Consequently, their health condition directly influences the equipment’s operational reliability and working efficiency.
Unlike land-based equipment, subsea drilling rig bearings operate in a harsh environment. The strong damping effect of the deep-sea liquid medium causes the vibration signals generated by bearing faults to attenuate rapidly. Concurrently, drilling operations, ocean currents, and marine biological activity all generate significant background noise. As a result, the collected bearing vibration signals generally exhibit a low signal-to-noise ratio (SNR) and weak fault features. Furthermore, the scarcity of energy resources in offshore and deep-sea environments imposes stringent requirements on the power consumption of the diagnostic system. Therefore, a diagnostic method that can accurately and reliably identify subsea drilling rig main shaft bearing faults against a strong noise background while keeping energy consumption low is of practical significance for guaranteeing the safety of marine engineering equipment and improving resource extraction efficiency.
Methodologies for bearing fault diagnosis are typically divided into model-driven and data-driven categories. Model-driven techniques depend heavily on detailed physical models and expert-level domain knowledge, which complicates the creation of precise models for harsh operational settings, such as deep-sea conditions. In contrast, data-driven methodologies leverage bearing fault training data to model the intricate, non-linear mapping from raw sensory input to the fault dimension, reducing the dependency on a priori knowledge. These methodologies are generally categorized into Machine Learning (ML) and Deep Learning (DL) branches. Classic ML approaches, like Principal Component Analysis (PCA) [3,4,5], Support Vector Machine (SVM) [6,7,8,9], and k-Nearest Neighbor (KNN) [10,11,12], are adept at identifying lower-level patterns from the data. Their performance, however, is contingent upon the quality of hand-crafted feature extraction. For these methods, it remains difficult to accurately characterize and differentiate weak, highly distorted fault features, especially in complex high-pressure, high-noise environments.
Deep Learning methods, through complex network architectures, can adaptively represent the internal correlations of the data and extract deep and robust hidden details from raw data, effectively bypassing the strong reliance of traditional methods on manual expertise. For example, Convolutional Neural Networks (CNNs) [13,14,15,16] excel in feature extraction, while Long Short-Term Memory (LSTM) networks [17,18,19] are proficient at capturing the temporal dependencies of vibration signals. Furthermore, Autoencoders (AEs) [20,21,22,23] and Deep Belief Networks (DBNs) [24,25,26] are also frequently used for feature dimensionality reduction and deep feature learning.
Building on these advances, and in view of the specific characteristics of underwater environments, existing studies have optimized deep learning models. For example, CNNs with multisensor feature fusion have been employed to diagnose faults in autonomous underwater vehicles (AUVs) [27,28] or thruster blades [29], and other studies have integrated convolutional Kolmogorov–Arnold networks (CKANs) with Squeeze-and-Excitation networks (SENets) to address the diagnosis of bearings in deep-sea propulsion systems [30]. Although these deep-learning-based methods can achieve high diagnostic accuracy, they often rely on powerful computing resources. This dependence conflicts with the very limited hardware capabilities and power supply of deep-sea edge devices, which restricts their practical application in deep-sea engineering scenarios.
As a third-generation neural network model, the spiking neural network (SNN), with its event-driven computation paradigm and ability to process complex spatiotemporal information, has been widely used in bearing fault diagnosis and has demonstrated competitive accuracy on benchmark datasets [31,32]. Because their neurons emit spikes only when the membrane potential exceeds a threshold, SNNs naturally offer sparse computation, strong noise robustness, and low power consumption. However, existing SNN-based diagnostic methods are mainly developed for onshore operating conditions and overlook the stringent energy-efficiency requirements of deep-sea environments. These models typically adopt a non-spiking “readout layer” at the output, causing membrane voltage amplitudes to substantially exceed the effective thresholds required for classification decisions. Such “over-confident representations” imply that the model continues to perform redundant computations even after the diagnostic outcome has become certain, thereby incurring unnecessary energy consumption. Therefore, this paper focuses on the main-shaft bearings of subsea drilling rigs and proposes a bearing fault diagnosis method that combines an adaptive-threshold k-winner-take-all (k-WTA) mechanism with population coding. In contrast to existing SNNs that adopt generic fixed-threshold encoding schemes, we introduce an adaptive-threshold k-WTA strategy which, through lateral competition, actively sparsifies spike firing and constrains membrane voltage amplitudes without compromising information integrity, thereby reducing the power wasted by computational redundancy.
We conducted comprehensive experimental validation of the proposed method on two types of dataset: publicly available land-based benchmark datasets (the CWRU and Paderborn University (PU) datasets) and a self-developed deep-sea drilling platform bearing fault dataset that simulates real-world operating conditions. In terms of diagnostic accuracy and generalization ability, the method achieves state-of-the-art (SOTA) accuracy on the CWRU benchmark, while its differentiated performance on the PU dataset emphasizes the critical importance of feature selection tailored to specific operating conditions. Regarding noise robustness, the method significantly outperforms baseline models on artificially noise-augmented datasets, and more importantly, it maintains high diagnostic accuracy on the self-collected deep-sea testbed dataset, which inherently contains strong noise and signal attenuation, demonstrating its robustness in real-world harsh environments. Finally, in terms of energy efficiency, the proposed method shows a clear advantage over other deep learning approaches on the CWRU dataset, highlighting its potential for low-power monitoring applications in deep-sea environments.
Based on the existing research, the main contributions of this paper are concentrated in the following three aspects:
Targeted Methodological Application: This work tailors and optimizes a spiking neural network–based fault diagnosis scheme specifically for the challenges faced by main-spindle bearings of subsea drilling rigs under deep-sea operating conditions.
Technical Contribution: We propose a novel encoding strategy that integrates Gaussian Receptive Field-based encoding with the adaptive-threshold k-Winner-Take-All (k-WTA) mechanism in the SNN. This combination improves energy efficiency by reducing spike amplitude and frequency without compromising the transmission of information.
Validation and Proof: Through extensive experiments on both publicly available benchmark datasets (CWRU and PU) and a self-developed testbed dataset, we systematically validate the performance of the proposed method across three core metrics: diagnostic accuracy, noise robustness, and energy efficiency. The experimental results demonstrate that the proposed method offers a comprehensive advantage in terms of accuracy, robustness, and power consumption, showcasing its potential for low-power monitoring applications in deep-sea environments.
The remainder of this article is organized as follows. Section 2 provides a detailed description of the proposed method, including signal preprocessing and feature extraction techniques tailored to the characteristics of deep-sea environments, the core spike encoding mechanism, the SNN neuron model, and the model’s training strategy. Section 3 presents the experimental study, which focuses on three core dimensions: diagnostic accuracy, noise robustness, and energy efficiency. Finally, Section 4 briefly summarizes the research findings and discusses future practical engineering applications.
2. Methodology
The proposed diagnostic procedure for bearing faults is outlined in Figure 1. First, the raw vibration signal of the subsea drilling rig main shaft bearing is collected via an accelerometer. Second, to extract high-quality input fault features, the Local Mean Decomposition (LMD) method is applied to the raw vibration signal, decomposing it into a linear combination of a series of Product Functions (PFs). Third, the extracted features are transformed into stable, high-information-content, and low-amplitude spike information through Gaussian Receptive Field Population Encoding combined with the Adaptive Threshold k-WTA mechanism. Subsequently, the SNN is trained using the Surrogate Gradient-based Backpropagation (BP) algorithm. Fourth, the encoded features are input into the trained SNN model to achieve the final bearing fault diagnosis. The key technical aspects are detailed in the subsequent sections.
2.1. LMD-Based Feature Extraction
The complexity of stratum drilling, rolling element skidding, and faults generating multiple shocks, all overlaid by the complex background noise of the subsea drilling rig, results in the main shaft bearing fault vibration signals having complex and overlapping frequency bands. To minimize the influence of non-linear factors and meet the subsequent requirements for stability and linearity of the input signal, preprocessing the raw vibration signal with a suitable signal processing algorithm is necessary. A variety of signal processing techniques are prevalent in the literature for bearing fault diagnosis, including established methods like Fourier Transform (FT), Wavelet Transform (WT), Empirical Mode Decomposition (EMD), and LMD. Nonetheless, the FT, WT, and EMD exhibit certain inadequacies when applied to the highly complex vibration signals originating from subsea main shaft bearings. The FT, while capable of intuitively presenting the frequency makeup of a signal, is inherently ill-suited to processing non-stationary signals. Given that subsea drilling rigs operate in complex geological strata, with continuous changes in shock, vibration, and load during the drilling process, the signals lack the statistical properties required of stationary signals. The WT is capable of scale contraction, allowing it to analyze local features of a signal in both time and frequency, making it suitable for non-stationary signals. Nevertheless, the empirical nature of selecting the wavelet basis function and the complex computation of the Continuous Wavelet Transform limit its application in real-time scenarios. The EMD often suffers from mode mixing when dealing with pulse interference and strong noise in the vibration signals of subsea drilling rig bearings, which can render the analysis results unusable.
The LMD method can decompose non-linear signals with multiple superimposed vibration modes into a series of single-component signals, each possessing an instantaneous amplitude and instantaneous frequency. These components are Product Functions (PFs), each of which can be represented in the form of an Amplitude Modulation (AM) signal multiplied by a Frequency Modulation (FM) signal. By adopting a more localized and robust decomposition strategy, LMD generally proves advantageous in processing complex signals. Therefore, this paper processes the raw subsea drilling rig bearing vibration signal using LMD and extracts the fault features from the resulting Product Functions. The brief implementation procedure is described below.
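Step 1. For a given signal $x(t)$, the first step is to identify all its local extreme points $n_k$. Based on these extreme points, two numerical values are constructed: the local mean $m_{ij,k}$ and the local envelope $a_{ij,k}$:
$$m_{ij,k} = \frac{n_k + n_{k+1}}{2}, \qquad a_{ij,k} = \frac{\left| n_k - n_{k+1} \right|}{2}$$
where i denotes the index of the PF, j denotes the iteration number during the decomposition process and k denotes the index of the extreme value.
Step 2. Using the moving average method to generate the local mean function $m_{ij}(t)$ and the envelope function $a_{ij}(t)$, and subtracting the local mean function $m_{ij}(t)$ from the signal $x(t)$, yields a detrended signal $h_{ij}(t)$:
$$h_{ij}(t) = x(t) - m_{ij}(t)$$
Step 3. Normalize the detrended signal $h_{ij}(t)$ by dividing it by the local envelope function to obtain an ideal purely Frequency-Modulated signal $s_{ij}(t)$:
$$s_{ij}(t) = \frac{h_{ij}(t)}{a_{ij}(t)}$$
Step 4. If $s_{ij}(t)$ does not satisfy the criterion for a purely FM signal, then $s_{ij}(t)$ is taken as the new input signal, and the aforementioned Steps 1–3 are repeated until the convergence condition is met. If the iteration converges, the final purely FM signal $s_{il}(t)$ is multiplied by the product $a_i(t)$ of all local envelope functions $a_{ij}(t)$ obtained during all iteration steps. The result is the final Product Function $PF_i(t)$:
$$a_i(t) = \prod_{j=1}^{l} a_{ij}(t), \qquad PF_i(t) = a_i(t)\, s_{il}(t)$$
where l refers to the maximum number of iterations.
Step 5. Subtract the extracted $PF_i(t)$ from the current input signal (the original signal $x(t)$ for the first component, i.e., $u_0(t) = x(t)$) to obtain the residue signal $u_i(t)$:
$$u_i(t) = u_{i-1}(t) - PF_i(t)$$
Step 6. Take $u_i(t)$ as the new input signal, and repeat the entire aforementioned procedure to extract the next PF component, until the final residue signal $u_P(t)$ becomes a monotonic function and can no longer be decomposed. The original signal $x(t)$ will then be decomposed as:
$$x(t) = \sum_{i=1}^{P} PF_i(t) + u_P(t)$$
where $P$ is the number of extracted PF components.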
To ensure the reproducibility of the LMD decomposition process, the detailed parameter settings used in this study are listed in Table 1.
The key parameters listed in Table 1 control the behavior and stopping criteria of the LMD decomposition. First, to mitigate edge effects, the signal endpoints are treated as pseudo-extrema, enabling a stable estimation of the local envelope and mean near the boundaries. Within the inner sifting process, the numbers of smoothing iterations and envelope iterations permitted for each PF are bounded by the maxima given in Table 1. To prevent excessive computation without significant quality gain, the sifting terminates early if the variations in the envelope and modulation signal fall below their respective tolerances (Table 1). Globally, the decomposition depth is capped at the maximum number of PFs specified in Table 1, ensuring that only a finite number of physically meaningful oscillatory modes are extracted. For the subsequent feature extraction, only the first five components (PF1–PF5) and the final residual are retained, as shown in Figure 2. This selection is justified because these initial components capture the majority of the vibration energy and fault-related modulation signatures, whereas the discarded higher-order PFs primarily represent slowly varying trends with negligible diagnostic value.
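As an illustration of the inner sifting step described above, the following minimal sketch computes the pairwise local means and envelopes from the extrema, smooths them with a moving average, and forms the detrended and normalized signals. The helper names, the piecewise-constant extension between extrema, the window length, and the `eps` guard are assumptions made for this example; they are not the settings of Table 1.

```python
def local_extrema(x):
    """Indices of strict local maxima and minima of the sequence x."""
    return [k for k in range(1, len(x) - 1)
            if (x[k] > x[k - 1] and x[k] > x[k + 1])
            or (x[k] < x[k - 1] and x[k] < x[k + 1])]


def pairwise_mean_envelope(x, idx):
    """Local mean and local envelope between successive extrema."""
    means = [(x[i] + x[j]) / 2.0 for i, j in zip(idx, idx[1:])]
    envs = [abs(x[i] - x[j]) / 2.0 for i, j in zip(idx, idx[1:])]
    return means, envs


def hold_between_extrema(values, idx, length):
    """Extend each pairwise value over the samples between its two extrema."""
    out = [values[0]] * length            # samples before the first extremum
    for k, v in enumerate(values):
        for n in range(idx[k], idx[k + 1]):
            out[n] = v
    for n in range(idx[-1], length):      # samples after the last extremum
        out[n] = values[-1]
    return out


def moving_average(y, window=3):
    """Centered moving average; the window shrinks at the boundaries."""
    half = window // 2
    out = []
    for n in range(len(y)):
        segment = y[max(0, n - half):n + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def sift_once(x, window=3, eps=1e-12):
    """One sifting pass: returns local mean m, envelope a, detrended h, FM estimate s."""
    idx = local_extrema(x)
    if len(idx) < 2:
        raise ValueError("need at least two local extrema")
    means, envs = pairwise_mean_envelope(x, idx)
    m = moving_average(hold_between_extrema(means, idx, len(x)), window)
    a = moving_average(hold_between_extrema(envs, idx, len(x)), window)
    h = [xi - mi for xi, mi in zip(x, m)]
    s = [hi / max(ai, eps) for hi, ai in zip(h, a)]  # eps avoids division by zero
    return m, a, h, s
```

In a full decomposition this pass would be repeated on `s` until it is close to a pure FM signal, and the envelopes from all passes would be multiplied together to form the PF.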
In the high-pressure fluid environment of the deep sea, the high-density fluid acts as a damping medium, leading to significant attenuation of bearing vibration energy. The associated fluid–structure interaction also modifies the time-varying stiffness and modal characteristics of the system. Therefore, when constructing the feature set, we give priority to dimensionless statistical descriptors that characterize the intrinsic waveform shape and periodicity of the PF components and are invariant to uniform scaling of the vibration amplitude, namely kurtosis, crest factor, impulse factor and spectral kurtosis. In addition, we include spectral energy as a complementary feature to retain information about the absolute vibration energy in the frequency domain under strong attenuation. Consequently, five features (kurtosis, crest factor, impulse factor, spectral energy and spectral kurtosis) are extracted from each PF component and the residual signal, as summarized in Table 2. These five raw features are not used directly. Before being fed to the SNN coding unit, each feature dimension is min–max normalized so that its values lie in the range $[0, 1]$; see Equation (9) for the exact normalization formula.
The variables used in Table 2 are defined as follows. $N$ is the total number of sampling points in the signal segment. $x(n)$ is the discrete time-domain vibration signal of the PF component or residual, measured as bearing acceleration in units of m/s². $\bar{x}$ is the mean value of $x(n)$, and $x_{\mathrm{rms}}$ is its root-mean-square (RMS) value, both sharing the same unit as $x(n)$. $x_{\mathrm{peak}}$ denotes the maximum absolute value (peak amplitude) of the vibration signal $x(n)$. $X(k)$ denotes the one-sided discrete Fourier transform (DFT) coefficient of $x(n)$ at the k-th frequency bin ($k = 1, \ldots, K$), with units of m/s². $P(k) = |X(k)|^2$ is the corresponding power spectrum magnitude with units of (m/s²)², and $\bar{P}$ denotes its mean value over all $K$ frequency bins.
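To make the scale-invariance argument concrete, the sketch below computes four of the five features using their common textbook definitions, which are assumed here and may differ in detail from the exact expressions in Table 2. Spectral kurtosis is omitted because its definition varies between references. The function names and the naive DFT are choices made for this example.

```python
import cmath
import math


def kurtosis(x):
    """Fourth central moment divided by the squared variance (dimensionless)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return sum((v - mean) ** 4 for v in x) / (n * var ** 2)


def crest_factor(x):
    """Peak amplitude divided by the RMS value (dimensionless)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


def impulse_factor(x):
    """Peak amplitude divided by the mean absolute value (dimensionless)."""
    mean_abs = sum(abs(v) for v in x) / len(x)
    return max(abs(v) for v in x) / mean_abs


def spectral_energy(x):
    """Sum of squared one-sided DFT magnitudes (scales with amplitude squared)."""
    n = len(x)
    energy = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        energy += abs(coeff) ** 2
    return energy
```

Halving the amplitude of a segment leaves the first three features unchanged and divides the spectral energy by four, which is why spectral energy is the only one of these that retains absolute level information.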
2.2. SNN Configuration for Subsea Diagnosis
The computation of the Spiking Neural Network (SNN) is based on discrete action potentials rather than continuous values, simulating the behavior patterns of biological neurons. In the SNN, information is transmitted in the form of spikes from presynaptic neurons to postsynaptic neurons. The logical core lies in the dynamic change of the membrane potential.
When a postsynaptic neuron receives any incoming spike, its membrane potential increases accordingly. The function of this membrane potential is the continuous accumulation and integration of all received signals over time. Once the neuron’s membrane potential reaches and surpasses a predetermined firing threshold, it immediately responds by emitting a new output spike. Immediately thereafter, the neuron’s membrane potential is reset to its resting potential state in preparation for the next round of information processing. The architectural design of the SNN and this event-driven spike generation and reset mechanism are illustrated in detail in Figure 3.
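The accumulate, fire, and reset cycle described above can be sketched with a deliberately simplified integrate-and-fire neuron. This is only an illustration of the mechanism, not the AdEx model adopted later; the weight, threshold, and resting value are arbitrary example values.

```python
def integrate_and_fire(input_spikes, weight=0.5, threshold=1.0, v_rest=0.0):
    """Accumulate weighted input spikes; emit a spike and reset at threshold."""
    v = v_rest
    output = []
    for s in input_spikes:
        v += weight * s            # each incoming spike raises the membrane potential
        if v >= threshold:
            output.append(1)       # threshold reached: emit an output spike
            v = v_rest             # reset to the resting potential
        else:
            output.append(0)
    return output
```

With these example values, two input spikes are needed for each output spike, and time steps without input trigger no update of the potential beyond the comparison.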
2.2.1. Proposed Encoding Mechanism
Unlike the continuous floating-point activation value transmission mode of an Artificial Neural Network (ANN), the SNN utilizes discrete spike emissions occurring at precise time points to transmit complex information. This event-driven computational style and spike sparsity allow SNNs to reduce computational activity, achieving the advantages of low power consumption and high efficiency. Therefore, for classifying subsea drilling rig bearing faults using the SNN, an appropriate encoding scheme is required to convert the aforementioned feature information into spike sequences.
Current spike encoding methods are mainly categorized into Rate Coding, Temporal Coding, and Population Coding. Rate Coding expresses information through the average spiking activity and does not rely on precise spike timing, offering strong robustness. However, it requires prolonged integration and statistics collection, leading to high latency, and the frequent spiking activity results in higher computational power consumption. Temporal Coding utilizes the precise time of spike firing to encode information. It can transmit information with a minimal number of spikes, achieving extremely low computational power consumption and ultra-low response latency. Nevertheless, Temporal Coding relies on precise spike timing, which makes it more sensitive to noise caused by the environment or hardware and typically requires more complex architectures and algorithms for processing. Population Coding represents information through the coordinated activity of a group of neurons, achieving strong robustness against random noise and perturbations, as well as accurate representation of feature values. When combined with sparse representation, it can also bring significant low-power advantages.
Therefore, we propose a spike encoding method based on Gaussian Receptive Fields combined with an Adaptive Threshold k-Winner-Take-All (k-WTA) mechanism. This method skillfully utilizes the feature redundancy of Population Coding and the activation sparsity of the Adaptive k-WTA mechanism to simultaneously satisfy the dual requirements of anti-noise capability and low power consumption in the deep-sea edge computing environment. The implementation steps are as follows:
First, to map features of different scales to a unified range, the entire dataset must be scaled globally, as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{9}$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the extracted feature set, respectively.
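Set the encoding population to consist of N neurons, where each neuron defines a unique Gaussian tuning curve T. The center position $\mu_i$ and the Gaussian receptive field width $\sigma$ for the i-th neuron are defined as:
$$\mu_i = \frac{2i - 3}{2(N - 2)}, \qquad \sigma = \frac{1}{\beta (N - 2)}$$
where $\beta$ is a hyperparameter.
Substitute the normalized feature value $x'$ into the receptive field defined by N Gaussian tuning curves to calculate the raw activation intensity $A_i$ for each neuron:
$$A_i = A_{\max} \exp\left( -\frac{(x' - \mu_i)^2}{2\sigma^2} \right)$$
where $A_{\max}$ is the maximum activation intensity, which is usually set to 1.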
Perform Adaptive Threshold k-WTA screening on the activation values to retain the K neurons with the highest activation intensity in the set $\{A_1, \ldots, A_N\}$.
Define the K-th highest activation intensity $A_{(K)}$ as the adaptive threshold $\theta$. The final output intensity $O_i$ is then derived through a comparison operation with the adaptive threshold $\theta$:
$$O_i = \begin{cases} A_i, & A_i \geq \theta \\ 0, & \text{otherwise} \end{cases}$$
The final feature F is thereby encoded into a spike sequence. Figure 4 details this encoding and competition process for a single normalized real-valued feature: the feature is first encoded by the Gaussian receptive fields of the neuron population, and the activation values are then sparsely filtered using the Adaptive Threshold mechanism and the k-WTA competitive strategy, yielding a sparse output vector.
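The encoding steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the centers and width follow a commonly used Gaussian receptive field placement on the normalized range, winners keep their raw activation while all other neurons output zero, and the values `beta=1.5`, `n_neurons=6`, and `k=2` are arbitrary examples rather than the settings used in this study.

```python
import math


def min_max_normalize(value, v_min, v_max):
    """Scale a raw feature value to [0, 1] using the global minimum and maximum."""
    return (value - v_min) / (v_max - v_min)


def grf_activations(x_norm, n_neurons, beta=1.5, a_max=1.0):
    """Raw activation of each Gaussian receptive field for one normalized feature."""
    spacing = 1.0 / (n_neurons - 2)          # assumed center spacing, needs n_neurons > 2
    sigma = spacing / beta                   # assumed common width
    activations = []
    for i in range(1, n_neurons + 1):
        mu = (2 * i - 3) / 2.0 * spacing     # assumed center of the i-th neuron
        activations.append(a_max * math.exp(-((x_norm - mu) ** 2) / (2.0 * sigma ** 2)))
    return activations


def adaptive_kwta(activations, k):
    """Keep activations at or above the K-th highest value; zero out the rest."""
    theta = sorted(activations, reverse=True)[k - 1]   # adaptive threshold
    outputs = [a if a >= theta else 0.0 for a in activations]
    return outputs, theta
```

Because the threshold is taken from the activations themselves, it moves with each input value. Note that exact ties at the threshold would let more than K neurons through in this sketch.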
2.2.2. AdEx Neuron Model
This paper adopts the Adaptive Exponential Integrate-and-Fire (AdEx) [33] model as the neuron model for the SNN, as this model strikes a good balance between dynamical richness and computational complexity. The AdEx model functions as a hybrid dynamical system, where its behavior is collectively defined by continuous subthreshold dynamics and discrete spike firing events. The formula for the discrete-time update of the membrane voltage
of the
j-th postsynaptic AdEx neuron is defined as:
where
is the leak conductance,
is the resting potential,
is spike sharpness,
is the effective threshold potential,
is the synaptic weight between the
i-th presynaptic neuron and the
j-th postsynaptic neuron,
is the spike sequence output of the
i-th presynaptic neuron and
is the adaptation variable, defined as follows:
where
is the time step,
is the adaptation time constant, and
b is a hyperparameter.
The triggering condition for the AdEx neuron’s spike firing event is when the membrane potential exceeds a set peak voltage
. The instantaneous reset rules are as follows:
where
is the output value of the
j-th postsynaptic neuron at time step
,
is the resting voltage, and
a is a hyperparameter.
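For orientation, the sketch below integrates the standard continuous-time AdEx equations with a forward Euler step. It is an illustration of the general model rather than the exact discrete-time update used in this section: in the standard formulation `a` scales the subthreshold adaptation and `b` is the spike-triggered increment, which may differ from the parameterization adopted here. All numerical values are illustrative assumptions.

```python
import math


def simulate_adex(current_pa, t_ms=200.0, dt=0.1):
    """Forward-Euler simulation of a standard AdEx neuron; returns spike times in ms."""
    # Illustrative parameter values (assumed, not taken from this paper)
    c_m = 281.0        # membrane capacitance, pF
    g_l = 30.0         # leak conductance, nS
    e_l = -70.6        # resting potential, mV
    v_t = -50.4        # effective threshold potential, mV
    delta_t = 2.0      # spike sharpness, mV
    tau_w = 144.0      # adaptation time constant, ms
    a = 4.0            # subthreshold adaptation coupling, nS
    b = 80.5           # spike-triggered adaptation increment, pA
    v_peak = 0.0       # spike detection level, mV
    v_reset = e_l      # reset voltage, mV (assumed equal to rest)

    v, w = e_l, 0.0
    spike_times = []
    for step in range(int(t_ms / dt)):
        exp_term = g_l * delta_t * math.exp((v - v_t) / delta_t)
        dv = (-g_l * (v - e_l) + exp_term - w + current_pa) / c_m
        dw = (a * (v - e_l) - w) / tau_w
        v += dt * dv
        w += dt * dw
        if v >= v_peak:                 # discrete spike event
            spike_times.append(step * dt)
            v = v_reset                 # instantaneous voltage reset
            w += b                      # adaptation jump
    return spike_times
```

Calling `simulate_adex(0.0)` leaves the neuron at rest, whereas a sufficiently strong constant input such as `simulate_adex(1000.0)` produces a spike train whose timing is shaped by the adaptation variable.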
2.2.3. Model Training Strategy
Currently, there are three mainstream families of training algorithms for Spiking Neural Networks: the bio-inspired Spike-Timing-Dependent Plasticity (STDP) [34], the indirect ANN-SNN Conversion method [35], and the direct training approach of Surrogate Gradient Descent [36]. The STDP algorithm offers high biological plausibility but often suffers from limited diagnostic accuracy, making it difficult to satisfy the requirements of industrial scenarios. While the ANN-SNN conversion method can inherit the high accuracy of the source ANN, the converted SNN often requires many time steps during inference to closely approximate the source network’s performance, which leads to high inference latency.
Given the practical demands for real-time performance and low power consumption in subsea drilling rig fault diagnosis, the Surrogate Gradient Descent method, as a direct training approach, is able to achieve high diagnostic accuracy while constructing low-latency, energy-efficient diagnostic models. This makes it the most suitable choice for simultaneously addressing the performance and edge deployment efficiency requirements of subsea bearing fault diagnosis. Therefore, this paper adopts Surrogate Gradient Descent as the training method for the Spiking Neural Network. A brief introduction to the SNN training method based on Surrogate Gradient Descent is presented below.
For multiclass classification tasks like bearing fault diagnosis, the most commonly used and robust choice is the combination of Cross-Entropy Loss and Softmax. Their definitions are as follows:
$$L = -\sum_{j} y_j \log p_j, \qquad p_j = \frac{\exp(n_j)}{\sum_{i} \exp(n_i)}$$
where $y_j$ is the one-hot encoding of the true label, $p_j$ is the probability of class j given by the Softmax function, and $n_i$ is the total number of spikes emitted by the output neuron of class i.
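A small sketch of this loss, assuming the Softmax is applied directly to the per-class output spike counts, is given below. The function names are chosen for this example, and subtracting the maximum count before exponentiation is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(spike_counts):
    """Class probabilities from the spike counts of the output neurons."""
    peak = max(spike_counts)
    exps = [math.exp(c - peak) for c in spike_counts]   # shifted for stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot, probs):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)
```

For instance, if the output neuron of the true class fires far more often than the others, its probability approaches 1 and the loss approaches 0; equal counts across two classes give a loss of log 2.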
The complete gradient formula for the hidden layer weights
is expressed as:
where
is the connection weight between the
i-th neuron in the
-th layer and the
j-th neuron in the
l-th layer,
is the error signal of the
k-th neuron in the
-th layer at time step
t,
is the membrane voltage of the
j-th neuron in the
l-th layer at time step
t,
is the firing threshold voltage,
is the spike output of the
i-th neuron in the
-th layer at time step
t, and
is the Arctangent Surrogate Gradient Function, defined by the derivative
as:
In the above expression,
represents the membrane potential relative to the threshold, while
, the adjustable parameter controlling the gradient steepness, is set to 25.
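The role of the surrogate can be illustrated with one widely used arctangent form, assumed here for the example: the forward pass keeps the non-differentiable step function, while the backward pass uses the derivative of a smooth arctangent curve. Only the steepness value 25 comes from the text; the exact scaling of the function used in this study may differ.

```python
import math

ALPHA = 25.0  # steepness parameter, as stated in the text


def spike(x):
    """Forward pass: Heaviside step on the membrane potential relative to threshold."""
    return 1.0 if x >= 0.0 else 0.0


def atan_surrogate(x, alpha=ALPHA):
    """Smooth stand-in for the step function (assumed form)."""
    return math.atan(math.pi / 2.0 * alpha * x) / math.pi + 0.5


def atan_surrogate_grad(x, alpha=ALPHA):
    """Derivative of atan_surrogate, used in place of the step's gradient."""
    return (alpha / 2.0) / (1.0 + (math.pi / 2.0 * alpha * x) ** 2)
```

With this form the gradient peaks at `alpha / 2` when the membrane potential sits exactly at the threshold and decays quickly on either side, so a larger `alpha` concentrates learning on neurons that are close to firing.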
The update formula for the connection weight is as follows:
$$w_{ij}^{l} \leftarrow w_{ij}^{l} - \eta \frac{\partial L}{\partial w_{ij}^{l}}$$
where $\eta$ is the learning rate.
4. Conclusions and Future Work
This paper focuses on the health monitoring of main-spindle bearings in subsea drilling rigs. In the deep-sea environment, with high pressure and strong noise, the vibration signals of bearings are severely attenuated and the fault features have a low SNR, which makes fault diagnosis challenging. Existing deep-neural-network-based diagnosis methods have strong feature extraction ability but rely heavily on high-power computing resources and are therefore difficult to deploy widely in subsea monitoring scenarios.
To address this problem, this paper proposes a spiking neural network fault diagnosis method that combines population coding with an adaptive-threshold k-Winner-Take-All (k-WTA) mechanism. The adaptive-threshold k-WTA is introduced into the Gaussian receptive-field population coding process. It not only sparsely selects the activation values but also improves energy efficiency by reducing the amplitude of spike emissions. In this way, the method exploits both the feature representation capability of population coding and the sparse activation provided by the k-WTA mechanism, achieving a fault diagnosis scheme for main-spindle bearings that is both noise-robust and energy-efficient. Experimental results show that the proposed method achieves high diagnostic accuracy and good generalization performance on both the CWRU benchmark dataset and the self-built deep-sea testbed dataset. Moreover, in noise-robustness and energy-efficiency evaluation experiments, the proposed method shows clear advantages over the comparison methods in terms of noise resistance and energy consumption.
For future practical engineering applications, we propose the following concrete implementation guidelines for the design of the health monitoring system of subsea drilling rigs. First, in view of the severe attenuation, distortion, and even loss that fault signals may suffer during transmission over ultra-long umbilical cables, the proposed deep-sea fault diagnosis model should preferably be deployed on underwater embedded nodes for local real-time processing, thereby avoiding the communication burden and signal degradation associated with transmitting raw vibration data to the surface. Second, based on the power estimation formula in Equation (28), the choice of the time-step number in the SNN hyperparameter configuration should be carefully optimized, to avoid unnecessary energy overhead while maintaining diagnostic accuracy. Finally, following the findings of Ref. [45], we recommend deploying the proposed diagnostic method on dedicated neuromorphic chips with asynchronous, event-driven operation, so as to fully exploit their sparse-computation advantages, further reduce system-level power consumption through hardware acceleration, and improve monitoring efficiency.