Intelligent Diagnosis of Ship Propulsion Motor Bearings Based on Dynamic Class Weights

Yan, Guohua; Wang, Xiaoding; Liu, Kai; Kang, Jingran; Yi, Xinhua

doi:10.3390/jmse13112204


by Guohua Yan ¹, Xiaoding Wang ¹, Kai Liu ², Jingran Kang ³ and Xinhua Yi ¹,*

¹ Department of Mechanical and Automotive Engineering, NingBo University of Technology, Ningbo 315211, China

² Navigation Technology Department, Tianjin Maritime College, Tianjin 300355, China

³ Geely Automobile Research Institute (Ningbo) Co., Ltd., Ningbo 315211, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(11), 2204; https://doi.org/10.3390/jmse13112204

Submission received: 27 October 2025 / Revised: 14 November 2025 / Accepted: 16 November 2025 / Published: 19 November 2025

(This article belongs to the Special Issue Advances in Marine Electric Propulsion: Technologies, Systems Integration and Sustainable Maritime Transportation)


Abstract

As an important part of the ship’s power system, the bearing operation status of the propulsion motor is directly related to the reliability and safety of the whole system. However, in the field of marine propulsion motor bearing fault diagnosis, the data imbalance problem seriously affects the performance of the fault detection model. Due to the scarcity of fault data relative to normal operation data, traditional diagnostic methods are ineffective in dealing with unbalanced data. To solve this problem, a dynamic class weighting solution is proposed. The dynamic class weighting method introduces the weight coefficient λ on the basis of the traditional class weighting, which can adjust the class weight value in real time according to the training situation, and comprehensively considers the data distribution and the training situation to ensure that the model can learn better even in the case of insufficient data. Testing on the imbalanced distribution of bearing natural-failure data shows that the proposed method achieves a 5.25% improvement in diagnostic accuracy compared to direct training. Compared with traditional class-weighted approaches, diagnostic accuracy is enhanced by 3.56%, effectively mitigating the impact of scarce and unevenly distributed failure data on model training.

Keywords: marine propulsion motors; bearing fault diagnosis; data imbalance; dynamic class weights

1. Introduction

Permanent magnet synchronous motors (PMSMs) are indispensable key devices in modern industry due to their high efficiency, high performance, compact design, and rapid response [1], and play a key role in the aerospace, marine, and automotive fields. As the core component of the modern ship power system, the ship permanent magnet propulsion motor bears the important task of energy conversion and transfer, provides reliable and efficient power for ships, and plays an important role in ship performance, economy and environmental protection. However, due to the special working environment of the ship and the complex and changeable navigational conditions, the propulsion motor works in an unfavourable environment with high temperatures, high humidity, salt spray corrosion, and overload, etc. The bearings are subject to friction and impact from the rotating body over a long period, which can easily damage them, leading to faults. Bearings are the most fault-prone components in motors, accounting for about 30–40% of motor failures [2]. Therefore, it is crucial to ensure the reliability and stability of motor bearings under harsh environmental conditions. There is an urgent need for academia and industry to enhance in-depth research and continuous advancement of fault analysis in this area in order to improve fault diagnosis techniques and methods, effectively address challenges and risks, and ensure the safe operation and sustained performance of ship propulsion systems.

Bearings are important components of electric motors, and researchers have conducted extensive studies to improve their performance and reliability. Zhao et al. [3] used empirical mode decomposition (EMD) to process bearing vibration signals, calculated the sample entropy of the intrinsic mode function (IMF) components as fault features, and combined these with a back-propagation neural network (BPNN) for fault identification, optimising the key BPNN parameters with a shuffled frog leaping algorithm. Mohammad-Alikhani et al. [4] proposed a generalised fault diagnosis method based on a deep residual network with long short-term memory conditioning. It was evaluated on a motor turn-to-turn short-circuit fault test and on the Case Western Reserve University (USA) bearing fault dataset, and achieved an accuracy of 100% on both datasets, which differ in both fault type and signal type. Verstraete et al. [5] processed bearing vibration signals using the short-time Fourier transform (STFT), the continuous wavelet transform (CWT) and the Hilbert–Huang transform (HHT), generated the corresponding 2D images, and fed them into a CNN architecture for classification and fault diagnosis, comparing the diagnostic effects of the three time-frequency analysis methods. Ayas et al. [6] designed a deep residual CNN for bearing fault diagnosis that takes vibration signals converted into grayscale images as input, achieving a diagnostic accuracy of 99.98%. Celtikoglu et al. [7] processed bearing vibration signals with a total of 18 commonly used time-frequency and frequency-domain analysis methods, generated the corresponding images, and then identified the fault types with a transfer learning method based on the ResNet-99 network, in order to find a more suitable signal visualisation method for bearing fault diagnosis. Kim et al. [8] designed a one-dimensional dilated CNN that takes the original bearing vibration signals as input; the diagnostic accuracy reached 100%, and still reached 99% in the presence of noise. Guo et al. [9] designed a diagnostic method combining a CNN and BiLSTM, using bearing vibration signals as input, and achieved an average diagnostic accuracy of more than 99.78%, which remains at 96.58% even under strong noise interference. Wang et al. [10] designed a novel multiscale CNN for analysing bearing vibration signals, achieving an average diagnostic accuracy of more than 99.78%. The multiscale network adopts two parallel CNN columns that exchange information in real time during feature extraction, and its diagnostic accuracy reaches 100%.

The vast majority of bearing research is currently conducted with ample data available. However, marine propulsion motors operate under normal conditions for most of their service life, during which comprehensive data on all aspects of the equipment are readily available. A fault, by contrast, is a specific condition within a device’s lifecycle that occupies a very small proportion of operating time and progresses rapidly. There are numerous fault types, yet the fault data are limited and unevenly distributed. In actual ship operations, it is challenging to obtain abundant and uniformly distributed data samples across all possible motor fault states. This distribution of normal and fault states of the propulsion motor causes data imbalance during sample collection. Data imbalance is a common and unavoidable problem in fault diagnosis [11,12,13]. Training deep learning fault diagnosis models on imbalanced datasets can prevent them from adequately learning the data distribution: the model pays excessive attention to categories with many samples and insufficient attention to categories with few, which leads to inadequate learning and overfitting and degrades the classification performance of the fault diagnosis model.

Class weighting is an effective measure to address data imbalance: it adjusts the class weights in the loss function to give more weight to classes with fewer data samples, thereby balancing the importance of different classes. This method has gained widespread application [14,15]; however, it only considers the distribution of the original data. The traditional class weight method calculates the class weight value from the number of samples in each category, ignoring the influence of data characteristics and the training process. Deep learning-based fault diagnosis usually requires a large amount of labelled data to achieve excellent performance; when the amount of data is insufficient, deep neural networks are prone to overfitting, which is especially evident on small-scale datasets. High-quality results in data-based fault classification tasks require sufficient training data and a balanced, evenly distributed number of samples across classes, which helps the network learn balanced features from the various types of data during training.

Imbalanced sample distribution has long been a persistent challenge in the field of fault diagnosis. Traditional class weighting methods focus solely on the distribution of the original samples and keep the weight of each category fixed throughout training, which results in inadequate model training [16]. To address this, this paper proposes a solution based on dynamic class weights. The introduced weighting coefficient λ combines the class weights with the F₁ score, a model evaluation metric, so that the model considers both the original sample distribution and the training process. The category weights are adjusted dynamically during training according to training performance, amplifying the weights of underperforming fault categories in low-sample scenarios. This increased “attention” enables the model to learn these faults more thoroughly, mitigating the impact of sample imbalance from both the distribution and the training perspective.

2. Bearing Fault Test

High-quality data is the basis for constructing accurate and robust fault diagnosis models, which can effectively improve the reliability and validity of the models. However, obtaining high-quality bearing fault signals remains difficult due to limitations in data-acquisition techniques and the high cost of acquiring natural fault data. Fortunately, some research institutes and universities have made their own bearing fault diagnosis datasets publicly available, such as Case Western Reserve University (USA) and the University of Paderborn (Germany). Of these, the University of Paderborn dataset is more comprehensive: it contains a variety of bearing fault types, fault severities, and operating conditions, and consists of two parts, the man-made damage dataset and the natural damage dataset. Lessmeier et al. [17] investigated the cross-application of the two parts of the Paderborn dataset, i.e., training on the man-made damage dataset and testing on the natural damage dataset; however, the diagnostic results were poor for several fault diagnosis methods. The results suggest that the two fault forms differ in their characteristics, so a fault diagnosis model that performs well on the man-made damage dataset is not effective when applied to the natural damage dataset. Therefore, in order to be closer to real ship propulsion motor bearing faults, the natural fault dataset from the University of Paderborn was chosen for this study, so that the accuracy and reliability of the fault diagnosis model better meet the needs of practical applications.

The bearing accelerated life test rig, shown in Figure 1a, consists of a bearing housing and a drive motor, using grooved ball bearings of type 6203, as shown in Figure 1b. The drive motor provides power for the test bearing in the housing, and a spring-screw mechanism is installed above the housing, through which a radial load force larger than the rated value is applied to the bearing to accelerate the onset of fatigue damage, thereby simulating the natural failure of the bearing.

In total, the tests yielded a variety of failure locations and forms, including single and multiple damages on the inner ring, single and multiple damages on the outer ring, and damages on both the inner and outer rings. The damage grades describe a standardised level of damage that is independent of bearing size. They are based on damage length: failures are categorised into five grades according to the damage length as a percentage of the bearing pitch circumference, which indicates the severity of the failure. For a specific bearing type, these percentages must be converted to actual lengths. Table 1 shows the classification of failure severity levels and the range of damage lengths for the test bearings within the corresponding levels.

Following accelerated life testing to induce natural failures in bearings, vibration data under various fault conditions are subsequently obtained via a bearing test rig. The rig comprises modules, including an electric motor, a torque measurement shaft, a rolling bearing test module, a flywheel, and a load motor, as illustrated in Figure 2. The drive motor is a 425 W permanent magnet synchronous motor (PMSM) with a rated torque T = 1.35 Nm, rated speed n = 3000 rpm, and rated current I = 2.3 A. Bearing housing acceleration was measured using a piezoelectric accelerometer (Model 336C04, PCB Piezotronics, Depew, NY, USA) coupled with a charge amplifier (Model 5015A, Kistler Group, Winterthur, Switzerland). The flywheel and load machine, respectively, simulated the inertia and load of the driven equipment. The load motor was a permanent magnet synchronous motor with a rated power of 1.7 kW and a rated torque of 6 Nm. By installing ball bearings exhibiting different damage types within the bearing test module and adjusting the load motor torque, multiple faulted bearings can be tested under diverse operating conditions.

The test rig replicates four typical operating conditions by adjusting the drive motor’s rotational speed and load torque, including a load torque of 0.7 Nm at 900 r/min and load torques of 0.1 Nm and 0.7 Nm at 1500 r/min. Detailed bearing test conditions are specified in Table 2.

The vibration signals are sampled at 64 kHz; each measurement lasts 4 s and is repeated 20 times for each operating condition. The bearing has nine states, and each state is tested under four operating conditions. The detailed information of the bearing test is shown in Table 3. The variety of fault types and operating conditions makes the bearing’s operating characteristics more diverse, but it also increases the difficulty of fault diagnosis. The more operating conditions a bearing undergoes, the more closely the test approximates its actual operation. Therefore, we randomly partition all vibration signals from the same bearing across the four operating conditions into training and test sets, thereby evaluating the model’s diagnostic performance under diverse operating conditions.

3. Fault Diagnosis Model Migration Performance Analysis

A motor bearing is a mechanical component whose fault characteristics are mainly reflected in its vibration signal, and most domestic and international studies use vibration signals for fault monitoring and diagnosis. The studies in [3,5,18,19,20,21,22,23] use only the vibration signal to diagnose bearing faults, and some of these models achieve 100% diagnostic accuracy even in the presence of noise interference, indicating that the vibration signal contains almost all the information about bearing faults. Because the balls in a bearing are small, condition monitoring with vibration signals usually needs a sufficiently high sampling frequency to capture details such as small vibrations, shocks and resonance phenomena during rotation, and so provide more accurate information about the bearing condition. For the same reason, bearing faults typically require a higher sampling frequency than interturn short-circuit and demagnetisation faults.

‘End-to-end’ fault diagnosis maps the raw input data directly to the final diagnosis result without intermediate steps or feature engineering. The advantage of this approach is that it simplifies system design and the diagnosis process, and it can automatically learn feature representations and perform pattern recognition through deep learning models. Yan et al. [24] proposed a multi-algorithm fusion fault diagnosis method called ‘MD-CNN-BiLSTM’ for diagnosing turn-to-turn short-circuit and demagnetisation faults of the ship permanent magnet propulsion motor; its accuracy reaches 98.53%, a good diagnostic result. Because the bearing is one of the important components of the motor, it is necessary to test the migration effect of this model on bearing faults.

3.1. Impact Analysis of Sample Duration

For each operating condition, 15 s of data are selected, and each sample is a 0.1 s segment of the vibration signal. The dataset is divided into training, validation, and test sets at a 5:2:3 ratio. The MD-CNN-BiLSTM method is trained and tested on this dataset, and its accuracy is 99.94%. Because the sampling frequency of the vibration signal is high, the duration chosen for each sample affects both the efficiency and the accuracy of fault diagnosis. To test the effect of sample duration on fault diagnosis, 10 sample lengths from 0.01 s to 0.1 s, in steps of 0.01 s, were each used for training and testing; the results are shown in Figure 3. The fault diagnosis model also achieves very good diagnostic results on the bearing natural-fault dataset, indicating that it has good migration performance. When the sample length is between 0.02 and 0.1 s, it has little effect on the model’s diagnostic performance, and accuracy remains above 99%. When the sample length is below 0.02 s, each sample contains less information because it contains fewer data points, while the network structure is relatively complex; the model is therefore less able to extract fault features from so little data, resulting in poorer diagnostic results.
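To make the sample lengths concrete, the following minimal Python sketch converts a sample duration into a number of data points at the 64 kHz sampling rate stated earlier and cuts a signal into samples. The function name `segment_signal` and the use of non-overlapping windows are assumptions made for illustration; the paper does not state how the segments were cut.

```python
FS_HZ = 64_000  # sampling rate of the vibration signals, as stated in the text


def segment_signal(signal, fs_hz, window_s):
    """Split a 1-D signal into non-overlapping windows of window_s seconds.

    Non-overlapping windows are an assumption; an incomplete tail is dropped.
    """
    n = round(fs_hz * window_s)  # data points per sample
    if n <= 0:
        raise ValueError("window is shorter than one sampling interval")
    count = len(signal) // n
    return [signal[k * n:(k + 1) * n] for k in range(count)]


# Data points per sample for the ten durations 0.01 s, 0.02 s, ..., 0.10 s.
points_per_sample = {k / 100: FS_HZ * k // 100 for k in range(1, 11)}

if __name__ == "__main__":
    for duration_s, n_points in points_per_sample.items():
        print(f"{duration_s:.2f} s -> {n_points} points")
```

At 64 kHz a 0.1 s sample holds 6400 points while a 0.01 s sample holds only 640, which illustrates how quickly the information per sample shrinks at the short end of the range.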

3.2. Generalisation Performance Test

To test the generalisation ability of the model, the bearing data under three operating conditions were used for training, and the data under the remaining operating condition were used for testing. The test was conducted 10 times, and the average accuracy was calculated. The results are shown in Figure 4. The average diagnostic accuracy of the model is above 90% for all 10 sample lengths, indicating good generalisation ability. The diagnostic accuracy increases with the sample length when the sample length is between 0.01 and 0.06 s. When the sample length exceeds 0.06 s, however, the average diagnostic accuracy shows a decreasing trend.
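The hold-out scheme described above can be sketched as follows. This is only an illustration: the condition identifiers 1 to 4 and the helper names are hypothetical, and the text does not say whether every condition was held out in turn, so the rotation shown here is an assumption.

```python
def leave_one_condition_out(conditions):
    """Yield (training_conditions, held_out_condition) pairs.

    Each operating condition is held out once for testing while the
    remaining conditions are used for training (assumed rotation).
    """
    for held_out in conditions:
        training = [c for c in conditions if c != held_out]
        yield training, held_out


def mean_accuracy(accuracies):
    """Average the accuracies of repeated runs, e.g. the 10 repetitions."""
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    for training, held_out in leave_one_condition_out([1, 2, 3, 4]):
        print(f"train on {training}, test on {held_out}")
```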

3.3. Anti-Noise Test

Figure 3 and Figure 4 show that the choice of sample length affects the effectiveness of fault diagnosis, and that the model has better diagnostic accuracy and generalisation performance when the sample length is 0.06 s. To test the model’s ability to resist noise interference, a sample length of 0.06 s is selected, and Gaussian white noise with a signal-to-noise ratio (SNR) of 10 to 80 dB, in steps of 10 dB, is added to the original vibration signals of the test set. The bearing data under three operating conditions (operating conditions 1, 2 and 3) are used for training and validation, and the data under the remaining operating condition (operating condition 4), with noise of different strengths added, are used for testing. The specific distribution of the dataset is shown in Table 4.

The test at each noise level is also carried out 10 times, and the average accuracy is shown in Figure 5. The model maintains more than 90% diagnostic accuracy at SNRs between 20 and 80 dB, indicating that it has both good noise resistance and good generalisation ability. When the SNR falls to 10 dB, i.e., under the strongest noise tested, the diagnostic accuracy and generalisation ability of the model decrease.
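One common way to add Gaussian white noise at a prescribed SNR is to scale the noise power to the measured signal power. The sketch below illustrates that approach using only the Python standard library; it is not necessarily the procedure the authors used, and the function names are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return signal plus zero-mean Gaussian white noise at the target SNR.

    SNR(dB) = 10 * log10(P_signal / P_noise), so the noise power is
    P_signal / 10**(snr_db / 10).
    """
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually realised by comparing clean and noisy signals."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# The eight SNR levels described in the text: 10, 20, ..., 80 dB.
SNR_LEVELS_DB = list(range(10, 81, 10))
```

A lower SNR value means stronger noise relative to the signal, so the 10 dB case is the most demanding of the eight levels.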

4. Data Imbalance Test for Traditional Class Weights

When the number of samples varies greatly between categories in a dataset, the model tends to favour the categories with more data and learns the categories with fewer samples insufficiently during training. Class weighting is one of the commonly used techniques for dealing with imbalanced datasets. By giving greater weight to the categories with fewer samples, it makes the model pay more attention to these minority categories during training, thus balancing the influence of the different categories and improving the model’s ability to learn from them. The class weights define the relative importance of each class for training and enter the loss function as per-class factors. The core of the method is assigning an appropriate weight value to each class, usually proportional to the inverse of the number of samples in that class:

w_{i} = \frac{N}{K \sum_{n = 1}^{N} t_{n i}}

(1)

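where w_i is the weight of the i-th category; K is the number of categories; N is the total number of samples; and t_ni is an indicator that equals 1 if the n-th sample belongs to the i-th category and 0 otherwise, so that the sum in the denominator is the number of samples in the i-th category.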
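As a small illustration of Equation (1), the sketch below computes the weights from a list of labels, using the fact that the sum of the indicators t_ni over n is simply the sample count of class i. The class names and counts in the example are hypothetical and are not taken from the paper’s tables.

```python
from collections import Counter


def class_weights(labels):
    """Compute w_i = N / (K * n_i) for each class, following Equation (1).

    N is the total number of samples, K the number of classes and n_i the
    number of samples in class i.
    """
    counts = Counter(labels)
    n_total = len(labels)
    k = len(counts)
    return {label: n_total / (k * n_i) for label, n_i in counts.items()}


if __name__ == "__main__":
    # Hypothetical imbalanced label set for illustration only.
    labels = ["normal"] * 600 + ["inner"] * 300 + ["outer"] * 100
    for label, weight in class_weights(labels).items():
        print(f"{label}: {weight:.3f}")
```

With these weights, every class contributes the same total weight N / K to the loss, which is what balances rare and frequent classes.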

Most deep learning-based fault diagnosis models are multi-class classification networks, in which the classification layer usually follows the Softmax layer and the cross-entropy function is used as the loss function. During network training, the classification layer receives the values from the Softmax layer, assigns each input to one of the K mutually exclusive categories using a 1-of-K coding scheme, and computes the weighted cross-entropy loss:

\mathrm{loss} = -\frac{1}{N} \sum_{n = 1}^{N} \sum_{i = 1}^{K} w_{i} t_{n i} \ln y_{n i}

(2)

where y_ni is the probability, output by the Softmax layer, that the n-th sample belongs to category i.
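The weighted cross-entropy of Equation (2) can be sketched in a few lines. The version below is written with the usual negative sign so that the loss is non-negative, and it exploits the 1-of-K coding: because t_ni is 1 only for the true class, the inner sum reduces to a single term per sample. The clipping constant `eps` is an added assumption to avoid taking the logarithm of zero.

```python
import math


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted cross-entropy over N samples.

    probs[n][i] is the Softmax probability y_ni, labels[n] is the index of
    the true class of sample n, and weights[i] is the class weight w_i.
    """
    total = 0.0
    for y_n, true_class in zip(probs, labels):
        # Only the true class has t_ni = 1, so one term survives the inner sum.
        total += weights[true_class] * math.log(max(y_n[true_class], eps))
    return -total / len(labels)


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.25, 0.75]]
    labels = [0, 1]
    print(weighted_cross_entropy(probs, labels, weights=[1.0, 2.0]))
```

Setting all weights to 1 recovers the ordinary cross-entropy, so the class weights act purely as per-class scaling of each sample’s contribution.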

Most shipboard equipment is in normal operation most of the time, so the monitoring system can obtain a large amount of fault-free data, while fault data are relatively scarce. A high-performance condition monitoring system can diagnose early equipment failures accurately and in a timely manner, so that repairs or replacements can be carried out as soon as they are needed to avoid further escalation of failures. However, navigational conditions or mission requirements sometimes force the equipment to continue operating with faults, which makes the faults increasingly serious and can even lead to failures of other components. Therefore, the amount and variety of data for early failures exceed those for serious failures.

4.1. Test Analyses

Because data become rarer and harder to obtain as equipment faults become more serious and complex, a percentage interval is assigned to each class of the training set. A percentage within that interval is drawn at random, and that percentage of samples is then randomly selected from the dataset to simulate data imbalance. Data from all four operating conditions are used for training and testing, and the validation and test sets contain 200 and 300 samples, respectively, for each state. The amounts of imbalanced data selected for the training set are specified in Table 5.
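The two-stage random selection described above (draw a percentage from the class’s interval, then draw that share of samples) might look like the following sketch. The intervals used in the example are hypothetical placeholders, since the values of Table 5 are not reproduced here, and the function name is an assumption.

```python
import random


def subsample_by_interval(samples_by_class, pct_intervals, rng=None):
    """Simulate data imbalance by keeping a random share of each class.

    For each class, a percentage is drawn uniformly from its (low, high)
    interval and that percentage of its samples is kept, chosen at random
    without replacement.
    """
    rng = rng or random.Random()
    subset = {}
    for name, samples in samples_by_class.items():
        low, high = pct_intervals[name]
        pct = rng.uniform(low, high)
        n_keep = max(1, round(len(samples) * pct / 100))
        subset[name] = rng.sample(samples, n_keep)
    return subset


if __name__ == "__main__":
    # Hypothetical classes and intervals, for illustration only.
    data = {"normal": list(range(1000)), "severe_fault": list(range(1000))}
    intervals = {"normal": (90, 100), "severe_fault": (10, 20)}
    picked = subsample_by_interval(data, intervals, random.Random(1))
    print({name: len(items) for name, items in picked.items()})
```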

Figure 6 shows the number of samples and the corresponding class weight value for each class in the imbalanced training set; the number of samples decreases as fault severity increases, which is consistent with the real fault data distribution. A class with fewer training samples has a larger weight value and therefore receives more attention from the model during training.

Figure 7 shows the diagnostic results of the model in both cases of adding class weights and no class weights, and the diagnostic accuracies of the model can maintain more than 98%, indicating that the model still has relatively high diagnostic accuracy even in the case of data imbalance when the training set contains all the operation condition data. In cases of data imbalance, adding class weights during training can improve learning for underrepresented classes, thereby enhancing the model’s fault diagnosis performance.

Precision and Recall are two other important metrics used to evaluate the performance of fault diagnosis models. Precision is concerned with the accuracy of the model in predicting positive categories and is defined as the ratio of the number of samples correctly predicted as faults by the model (True Positives, TP) to the number of all samples predicted as faults by the model (True Positives + False Positives, TP + FP), with the mathematical expression:

\mathrm{Precision} = \frac{TP}{TP + FP}

(3)

where TP is the number of faulty samples correctly predicted as faults by the model, and FP is the number of normal samples incorrectly predicted as faults by the model.

Recall measures the proportion of faulty samples correctly identified by the model to all actual faulty samples and reflects the ability to detect and identify faults. A higher recall rate means that the model is better able to capture fault samples, reduce missed diagnoses, and improve the comprehensiveness and reliability of fault diagnosis:

\mathrm{Recall} = \frac{TP}{TP + FN}

(4)

where False Negatives (FNs) is the number of faulty samples that the model predicts as normal.

In fault diagnosis, it is often desirable for the model to have both high precision and high recall. However, a high precision may be accompanied by a low recall (i.e., the model may miss some real faults). The F₁ score, the harmonic mean of precision and recall, integrates the two metrics, avoids the limitations of a single metric, and enables a more comprehensive assessment of the model’s classification performance:

F_{1} = 2 \left( \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \right)

(5)

The F₁ score ranges between 0 and 1, with values closer to 1 indicating better performance of the model. When both precision and recall are high, the F₁ score will be closer to 1, indicating that the model performs better in terms of both precision and recall. Figure 8 shows the F₁ scores for each category and the average obtained for both training methods of the model with the introduction of class weights and without class weights. Among the several states of the bearing, the lowest F₁ score is obtained for the inner ring with class 2 failure, which has a small increase after the introduction of class weights. From the average F₁ score, it can be seen that its value has increased to a certain extent after the introduction of class weights, which indicates that the introduction of class weights can improve the diagnostic effect of the model.
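Equations (3) to (5) are stated for a fault-versus-normal decision; one way to obtain a score per bearing state, as reported in the figures, is to treat each class in turn as the positive class (one-vs-rest). The sketch below assumes that convention, which the text does not spell out, and returns 0 when a denominator is empty.

```python
def per_class_scores(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} using one-vs-rest counting."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[c] = (precision, recall, f1)
    return scores


def average_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["a", "a", "a", "b", "b"]
    y_pred = ["a", "a", "b", "b", "a"]
    result = per_class_scores(y_true, y_pred, ["a", "b"])
    print(result, average_f1(result))
```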

4.2. Generalisation Performance Test

Data from three operating conditions were used for training and validation, and data from the remaining operating condition were used for testing, so that the generalisation ability of the model could be tested. The validation and test sets contain 225 and 250 samples, respectively, for each state, and the amounts of imbalanced data selected for the training set are specified in Table 6.

Figure 9 shows the number of samples and the corresponding weight value for each category of the imbalanced training set. The distribution still follows the principle that there are more normal samples than faulty samples, and more minor-fault samples than severe-fault samples, which is in line with the distribution of real samples.

Figure 10 shows the diagnostic results of the model with and without class weights. Before the introduction of class weights, the model was less effective in diagnosing class 2 faults in the outer ring and class 2 faults in the inner ring, with accuracy rates of 88.4% and 32.8%, respectively. After the introduction of class weights, the accuracy for these two fault classes improves to 96.0% and 41.2%, which indicates that adding class weights during training can enhance the generalisation ability of the model and improve diagnostic accuracy. However, the diagnostic accuracy of inner-ring class 2 faults remains very low whether or not class weights are added: the model misdiagnoses inner-ring class 2 faults as inner-ring class 3 faults, because both are inner-ring faults with very similar vibration characteristics. Although inner-ring class 2 faults have more samples and the model is more sensitive to them, a category with more samples may still perform poorly during testing.

Figure 11 shows the F₁ scores for each category, and their average, obtained with and without class weights. The F₁ scores of the inner-ring class 2, inner-ring class 3 and outer-ring class 1 faults are also affected by data imbalance; the inner-ring class 2 fault has the lowest F₁ score, indicating that this category is the most affected by data imbalance and has the worst diagnostic result. Compared with training without class weights, introducing class weights improves the F₁ scores to some extent, i.e., it enhances the generalisation ability of the model.

5. Data Imbalance Test for Dynamic Class Weights

5.1. Dynamic Class Weights

Too small weights may not be able to solve the class imbalance problem, while too large weights may cause the model to focus too much on a few classes, affecting generalisation performance for the majority. Traditional approaches to class weighting consider only the distribution of sample imbalances in each state of the device and do not take into account data characteristics and training situations. The fault characteristics between multiple states may be very similar; however, due to the effect of fixed class weights, it is easy for a certain class or some classes to have too much or too little weight during the training process, which affects the training effect of the model.

In order to solve the shortcomings of traditional fixed class weights, a dynamic class weighting method is proposed. The main idea of the method is to automatically and dynamically increase the weights of poorly trained classes during the training process so that they can receive more attention from the model and thus be fully learnt. The weight coefficient λ is introduced on the basis of the traditional class weights, which is used to correct the problem of too large or too small weights of certain classes in the training process. The weight coefficient λ consists of class weights w and F₁ scores, and the F₁ scores are usually used to comprehensively assess the diagnostic performance of the model, which can also reflect its training effect, and λ is expressed as follows:

λ_{i,n} = \frac{w_{i,n}}{F_{1,i,n}}  (6)

where λ_{i,n} is the weight coefficient of the i-th class at the n-th update, w_{i,n} is the corresponding class weight, F_{1,i,n} is the F₁ score of the i-th class at the n-th update, and n is the number of updates of the dynamic class weights. When n is 1, w_{i,1} is the traditional class weight.

After each update, the new class weights are as follows:

w_{i,n+1} = w_{i,n} λ_{i,n}  (7)

Introducing the weight coefficient λ allows the class weight values to be adjusted according to training progress. The coefficient increases the weight of poorly trained classes during training: the worse a class is trained, the larger its class weight becomes, so that the model learns that class better.
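As a minimal sketch, Equations (6) and (7) can be transcribed literally into Python. The function names `weight_coefficients` and `update_class_weights`, the `eps` guard against a zero F₁ score, and the example numbers are assumptions made for illustration and do not come from the paper. Any normalisation or clipping of the weights is not described in this section and is not modelled, so the output is not claimed to reproduce the values in Table 7.

```python
from typing import List, Sequence


def weight_coefficients(weights: Sequence[float],
                        f1_scores: Sequence[float],
                        eps: float = 1e-8) -> List[float]:
    """Equation (6) read literally: lambda_i = w_i / F1_i for every class i.

    eps is an assumption of this sketch; it only avoids division by zero
    when a class has an F1 score of exactly 0.
    """
    if len(weights) != len(f1_scores):
        raise ValueError("weights and f1_scores must have the same length")
    return [w / max(f1, eps) for w, f1 in zip(weights, f1_scores)]


def update_class_weights(weights: Sequence[float],
                         f1_scores: Sequence[float],
                         eps: float = 1e-8) -> List[float]:
    """Equation (7) read literally: w_i(n+1) = w_i(n) * lambda_i(n)."""
    lambdas = weight_coefficients(weights, f1_scores, eps)
    return [w * lam for w, lam in zip(weights, lambdas)]


if __name__ == "__main__":
    # Two hypothetical classes with equal weight but different validation F1.
    current = [2.0, 2.0]
    f1 = [1.0, 0.5]
    print(weight_coefficients(current, f1))
    print(update_class_weights(current, f1))
```

In this toy case the two classes start with the same weight, and the class with the lower F₁ score receives the larger coefficient and therefore the larger updated weight, which is the qualitative behaviour the text describes.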

5.2. Test Analyses

All the data from the four bearing states are used for training and testing. The validation set reflects the current training effect and generalisation ability of the model and thus helps in adjusting the model structure and parameters. The F₁ scores of the validation set during training are used to adjust the dynamic weight coefficient λ. The model is evaluated on the validation set every 2 iteration cycles, and the validation F₁ scores at the 10th, 20th, 30th, and 40th iteration cycles are used to update the dynamic class weights, i.e., the new class weights are calculated according to Equations (6) and (7). Table 7 shows the changes in the class weights during training. Dynamic class weights adjust automatically to the training effect, which avoids the problem of some class weights being too small. As training proceeds and the class weights are updated, the F₁ scores of the validation set gradually converge to 1, indicating that the training effect improves.
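The schedule described above, validation every 2 iteration cycles with weight updates at cycles 10, 20, 30 and 40, together with a per-class F₁ computation, might be sketched as follows. The helper names `per_class_f1` and `training_plan`, the integer label encoding and the total number of cycles are hypothetical; the model, the loss and the data loading are omitted.

```python
from typing import Dict, List, Sequence

VALIDATE_EVERY = 2                 # validation interval stated in the text
UPDATE_CYCLES = (10, 20, 30, 40)   # cycles at which class weights are updated


def per_class_f1(y_true: Sequence[int],
                 y_pred: Sequence[int],
                 num_classes: int) -> List[float]:
    """Per-class F1 = 2*TP / (2*TP + FP + FN); labels are ints in [0, num_classes)."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumption of this sketch: a class with no samples and no predictions scores 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return scores


def training_plan(num_cycles: int) -> Dict[int, str]:
    """Map each cycle that needs validation to the action taken at that cycle."""
    plan: Dict[int, str] = {}
    for cycle in range(1, num_cycles + 1):
        if cycle in UPDATE_CYCLES:
            plan[cycle] = "validate+update"
        elif cycle % VALIDATE_EVERY == 0:
            plan[cycle] = "validate"
    return plan


if __name__ == "__main__":
    print(per_class_f1([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
    print(training_plan(12))
```

At an update cycle, the per-class F₁ scores from the validation set would be passed to the weight update of Equations (6) and (7) before training continues.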

Figure 12 shows the diagnostic accuracy of the model trained with dynamic class weights, which reaches 99.63% and represents a very good diagnostic result. The proposed dynamic class weighting method is more effective than both training without class weights and training with traditional class weights.

Figure 13 shows the F₁ scores of the test set after dynamic class weights are introduced into the model. The F₁ scores for every bearing state are very high, indicating that the trained model combines very good precision and recall with good overall performance.

5.3. Generalisation Test

Data from three states were used for training and validation, and data from the remaining state were then used to test the generalisation ability of the model. The F₁ scores of the validation set during training were again used to update the dynamic weight coefficient λ. The class weights were updated at the 10th, 20th, 30th and 40th iteration cycles, and Table 8 shows the changes in the class weights during training. The dynamic class weights are continuously updated as training proceeds, and the F₁ scores for each class in the validation set gradually converge to 1, indicating that precision and recall on the validation set are very high and that the training effect is good.

Figure 14 shows the diagnostic results of the model under brand-new operating conditions. With dynamic class weights the diagnostic accuracy reaches 96.49%, indicating very good generalisation ability. The inner ring level 2 fault is still the most frequently misdiagnosed category; however, compared with the results in Figure 6, Figure 7 and Figure 8 for no class weights and traditional class weights, its accuracy has improved greatly, to 72.4%, and the overall diagnostic accuracy has increased by 3.56% relative to traditional class weights.

Figure 15 shows the F₁ scores for the test set after the introduction of dynamic class weights. The inner ring level 2 and inner ring level 3 faults are still two of the lowest-scoring classes, but their F₁ scores improve markedly over those in Figure 11 for no class weights and traditional class weights, suggesting that dynamic class weights improve the model’s performance by increasing both its precision and its recall.

6. Conclusions

Class weighting is an effective measure against data imbalance: it adjusts the weight of each class in the loss function so that classes with fewer samples receive more attention, thus balancing the importance of the different classes. The traditional class weighting method calculates class weights from the number of samples in each category; however, it ignores the influence of data characteristics and of the training process. To address this issue, a dynamic class weighting method is proposed that allows the traditional class weights to be adjusted during training.

A class weight coefficient λ is introduced on top of the traditional class weights. It is obtained from the F₁ scores of the validation set during training, because the validation set reflects the training effect of the model and the F₁ scores comprehensively assess the classification results on that set. The coefficient λ is used to update the class weights according to the validation results at specific iteration cycles, so that poorly trained classes receive a larger weight, gain more attention from the model and are adequately learnt. The experimental results show that dynamic class weighting effectively improves the training of the model. Under brand-new operating conditions, the diagnostic accuracy is 5.25% higher than with no class weighting and 3.56% higher than with the traditional class weighting method, giving the model better generalisation performance.

Author Contributions

Methodology, G.Y. and J.K.; Software, G.Y.; Validation, J.K.; Formal analysis, X.W. and K.L.; Writing—original draft, G.Y.; Writing—review & editing, X.W. and K.L.; Visualization, X.W.; Supervision, X.Y.; Project administration, X.Y.; Funding acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ningbo Key R&D Programme Project, grant number 2023Z036.

Data Availability Statement

The data presented in this study are openly available in Data Sets and Download at https://mb.uni-paderborn.de/kat/forschung/kat-datacenter/bearing-datacenter/data-sets-and-download.

Conflicts of Interest

Author Jingran Kang was employed by the company Geely Automobile Research Institute (Ningbo) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Orlowska-Kowalska, T.; Wolkiewicz, M.; Pietrzak, P.; Skowron, M.; Ewert, P.; Tarchala, G.; Krzysztofiak, M.; Kowalski, C.T. Fault Diagnosis and Fault-Tolerant Control of PMSM Drives–State of the Art and Future Challenges. IEEE Access 2022, 10, 59979–60024.
2. Zhang, S.; Zhang, S.; Wang, B.; Habetler, T.G. Deep Learning Algorithms for Bearing Fault Diagnostics—A Comprehensive Review. IEEE Access 2020, 8, 29857–29881.
3. Zhao, Z.; Xu, Q.; Jia, M. Improved shuffled frog leaping algorithm-based BP neural network and its application in bearing early fault diagnosis. Neural Comput. Appl. 2016, 27, 375–385.
4. Mohammad-Alikhani, A.; Nahid-Mobarakeh, B.; Hsieh, M.F. One-Dimensional LSTM-Regulated Deep Residual Network for Data-Driven Fault Detection in Electric Machines. IEEE Trans. Ind. Electron. 2024, 71, 3083–3092.
5. Verstraete, D.; Ferrada, A.; Droguett, E.L.; Meruane, V.; Modarres, M. Deep Learning Enabled Fault Diagnosis Using Time-Frequency Image Analysis of Rolling Element Bearings. Shock Vib. 2017, 2017, 5067651.
6. Ayas, S.; Ayas, M.S. A novel bearing fault diagnosis method using deep residual learning network. Multimed. Tools Appl. 2022, 81, 22407–22423.
7. Deveci, B.U.; Celtikoglu, M.; Albayrak, O.; Unal, P.; Kirci, P. Transfer Learning Enabled Bearing Fault Detection Methods Based on Image Representations of Single-Dimensional Signals. Inf. Syst. Front. 2024, 26, 1345–1397.
8. Khan, M.A.; Kim, Y.-H.; Choo, J. Intelligent fault detection using raw vibration signals via dilated convolutional neural networks. J. Supercomput. 2020, 76, 8086–8100.
9. Guo, Y.; Mao, J.; Zhao, M. Rolling Bearing Fault Diagnosis Method Based on Attention CNN and BiLSTM Network. Neural Process. Lett. 2023, 55, 3377–3410.
10. Wang, Y.; Cao, G. A multiscale convolution neural network for bearing fault diagnosis based on frequency division denoising under complex noise conditions. Complex Intell. Syst. 2023, 9, 4263–4285.
11. Van Horn, G.; Mac Aodha, O.; Song, Y.; Cui, Y.; Sun, C.; Shepard, A.; Adam, H.; Perona, P.; Belongie, S. The inaturalist species classification and detection dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8769–8778.
12. Gupta, A.; Dollar, P.; Girshick, R. Lvis: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5356–5364.
13. Bansal, M.A.; Sharma, D.R.; Kathuria, D.M. A systematic review on data scarcity problem in deep learning: Solution and applications. ACM Comput. Surv. (CSUR) 2022, 54, 208.
14. Mahajan, D.; Girshick, R.; Ramanathan, V.; He, K.; Paluri, M.; Li, Y.; Bharambe, A.; Van Der Maaten, L. Exploring the limits of weakly supervised pretraining. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 181–196.
15. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the Advances in Neural Information Processing Systems 26, Lake Tahoe, NV, USA, 5–10 December 2013.
16. Fan, B.; Ma, H.; Liu, Y.; Yuan, X. BWLM: A Balanced Weight Learning Mechanism for Long-Tailed Image Recognition. Appl. Sci. 2024, 14, 454.
17. Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. In Proceedings of the PHM Society European Conference, Bilbao, Spain, 5–8 July 2016; Volume 3.
18. Zhang, W.; Li, X.; Ding, Q. Deep residual learning-based fault diagnosis method for rotating machinery. ISA Trans. 2019, 95, 295–305.
19. Huang, R.; Liao, Y.; Zhang, S.; Li, W. Deep Decoupling Convolutional Neural Network for Intelligent Compound Fault Diagnosis. IEEE Access 2019, 7, 1848–1858.
20. Ding, X.; He, Q. Energy-Fluctuated Multiscale Feature Learning with Deep ConvNet for Intelligent Spindle Bearing Fault Diagnosis. IEEE Trans. Instrum. Meas. 2017, 66, 1926–1935.
21. Yuan, J.; Yao, Z.; Zhao, Q.; Xu, Y.; Li, C.; Jiang, H. Dual-Core Denoised Synchrosqueezing Wavelet Transform for Gear Fault Detection. IEEE Trans. Instrum. Meas. 2021, 70, 3521611.
22. Smith, W.A.; Randall, R.B. Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mech. Syst. Signal Process. 2015, 64–65, 100–131.
23. Zhao, Z.; Jiao, Y.; Zhang, X. A Fault Diagnosis Method of Rotor System Based on Parallel Convolutional Neural Network Architecture with Attention Mechanism. J. Signal Process. Syst. 2023, 95, 965–977.
24. Yan, G.; Hu, Y. Inter-turn short circuit and demagnetization fault diagnosis of ship PMSM based on multiscale residual dilated CNN and BiLSTM. Meas. Sci. Technol. 2024, 35, 046105.

Figure 1. Apparatus for accelerated life time test. (a) Bearing accelerated life test rig [17]. (b) Schematic diagram of grooved ball bearings.

Figure 2. Bearing test rig [17].

Figure 3. Diagnostic accuracy and diagnostic time consumption of the model at different sampling times.

Figure 4. Generalisation performance of the model with different sampling times.

Figure 5. Generalisation ability of the model in the presence of noise.

Figure 6. The number of samples in each category of the training set and the corresponding class weight values.

Figure 7. Diagnostic accuracy in two scenarios.

Figure 8. F₁ scores for each state and the average of the bearings.

Figure 9. The number of samples in each category of the training set and the corresponding class weight values.

Figure 10. Diagnostic accuracy in two scenarios.

Figure 11. F₁ scores for each state and the average of the bearings.

Figure 12. Diagnostic accuracy with the introduction of dynamic class weights.

Figure 13. Test set F₁ scores after introducing dynamic class weights.

Figure 14. Diagnostic accuracy with the introduction of dynamic class weights.

Figure 15. Test set F₁ scores after introducing dynamic class weights.

Table 1. Bearing Fault Levels [17].

Damage Level	Percentage of Damage Length to Pitch Circumference	Corresponding Damage Range of Test Bearing
1	0~2%	≤2 mm
2	2~5%	>2 mm
3	5~15%	>4.5 mm
4	15~35%	>13.5 mm
5	>35%	>31.5 mm

Table 2. Test conditions [17].

Test Operation Condition	Speed (r/min)	Load Torque (Nm)	Radial Force (N)
1	1500	0.7	1000
2	900	0.7	1000
3	1500	0.1	1000
4	1500	0.7	400

Table 3. Bearing test details [17].

Status	Damage Level	Code
Health		H
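Inner ring damage	1	IR I
Inner ring damage	2	IR II
Inner ring damage	3	IR III
Outer ring damage	1	OR I
Outer ring damage	2	OR II
Damage to both inner and outer rings	1	IR + OR I
Damage to both inner and outer rings	2	IR + OR II
Damage to both inner and outer rings	3	IR + OR III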

Table 4. Bearing fault types and the number of samples.

Fault Type	Training Samples	Validation Samples	Test Samples	Status Code
Health	525	225	250	H
Inner ring level 1	525	225	250	IR I
Inner ring level 2	525	225	250	IR II
Inner ring level 3	525	225	250	IR III
Outer ring level 1	525	225	250	OR I
Outer ring level 2	525	225	250	OR II
Compound fault level 1	525	225	250	IR + OR I
Compound fault level 2	525	225	250	IR + OR II
Compound fault level 3	525	225	250	IR + OR III

Table 5. Training set sample imbalance settings.

Fault Type	Training Samples	Percentage Range (%)	Unbalanced Sample Intervals	Status Code
Health	500	100	500	H
Inner ring level 1	500	20~30	100~150	IR I
Inner ring level 2	500	10~20	50~100	IR II
Inner ring level 3	500	5~10	25~50	IR III
Outer ring level 1	500	20~30	100~150	OR I
Outer ring level 2	500	10~20	50~100	OR II
Compound fault level 1	500	15~20	75~100	IR + OR I
Compound fault level 2	500	10~15	50~75	IR + OR II
Compound fault level 3	500	5~10	25~50	IR + OR III

Table 6. Training set sample imbalance settings.

Fault Type	Training Samples	Percentage Range (%)	Unbalanced Sample Intervals	Status Code
Health	525	100	525	H
Inner ring level 1	525	20~30	105~158	IR I
Inner ring level 2	525	10~20	53~105	IR II
Inner ring level 3	525	5~10	26~53	IR III
Outer ring level 1	525	20~30	105~158	OR I
Outer ring level 2	525	10~20	53~105	OR II
Compound fault level 1	525	15~20	79~105	IR + OR I
Compound fault level 2	525	10~15	53~79	IR + OR II
Compound fault level 3	525	5~10	26~53	IR + OR III

Table 7. Dynamic class weight changes during model training and F₁ scores for the validation set.

Iteration Period	Items	Health	Inner Ring Level 1	Inner Ring Level 2	Inner Ring Level 3	Outer Ring Level 1	Outer Ring Level 2	Compound Fault Level 1	Compound Fault Level 2	Compound Fault Level 3
1	Number of class samples	500	163	97	44	149	97	104	74	56
1	Initial class weights	0.29	0.95	1.44	2.80	0.94	1.32	1.37	1.97	2.47
10	F₁ scores	1.00	1.00	0.72	0.76	0.97	0.99	0.95	1.00	1.00
10	Updated category weights	0.29	0.95	1.92	3.25	0.98	1.36	1.42	1.97	2.48
20	F₁ scores	1.00	0.99	0.83	0.74	0.99	0.98	0.93	0.99	0.99
20	Updated category weights	0.29	0.96	2.34	3.85	0.98	1.38	1.51	1.98	2.47
30	F₁ scores	1.00	1.00	0.95	0.88	0.99	1.00	0.99	1.00	1.00
30	Updated category weights	0.29	0.96	2.71	4.66	1.01	1.38	1.53	1.98	2.47
40	F₁ scores	1.00	1.00	0.99	1.00	1.00	1.00	1.00	1.00	1.00
40	Updated category weights	0.29	0.96	2.73	4.68	1.02	1.39	1.53	1.98	2.48

Table 8. Dynamic class weight changes during model training and F₁ scores for the validation set.

Iteration Period	Items	Health	Inner Ring Level 1	Inner Ring Level 2	Inner Ring Level 3	Outer Ring Level 1	Outer Ring Level 2	Compound Fault Level 1	Compound Fault Level 2	Compound Fault Level 3
1	Number of class samples	525	146	110	45	166	111	117	74	52
1	Initial class weights	0.29	1.02	1.35	3.32	0.90	1.35	1.28	2.02	2.88
10	F₁ scores	0.99	1.00	0.68	0.87	0.92	0.98	0.90	1.00	1.00
10	Updated category weights	0.29	1.03	1.99	3.82	0.97	1.37	1.42	2.02	2.89
20	F₁ scores	1.00	1.00	0.99	0.98	0.92	0.94	1.00	1.00	1.00
20	Updated category weights	0.29	1.03	1.99	3.87	1.06	1.46	1.42	2.02	2.89
30	F₁ scores	1.00	1.00	0.99	0.99	1.00	1.00	1.00	1.00	1.00
30	Updated category weights	0.29	1.03	2.01	3.89	1.06	1.46	1.42	2.02	2.90
40	F₁ scores	1.00	1.00	0.99	1.00	1.00	1.00	0.99	1.00	1.00
40	Updated category weights	0.29	1.03	2.03	3.90	1.06	1.46	1.44	2.02	2.91

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yan, G.; Wang, X.; Liu, K.; Kang, J.; Yi, X. Intelligent Diagnosis of Ship Propulsion Motor Bearings Based on Dynamic Class Weights. J. Mar. Sci. Eng. 2025, 13, 2204. https://doi.org/10.3390/jmse13112204

