Gearbox Fault Diagnosis Based on Multi-Sensor and Multi-Channel Decision-Level Fusion Based on SDP

Abstract: To address the shortcomings (such as poor robustness) of the traditional single-channel vibration signal in the comprehensive monitoring of the gearbox fault state, a multi-channel decision-level fusion algorithm was proposed based on symmetrized dot pattern (SDP) analysis and a visual geometry group 16 network (VGG16) fault diagnosis model. Firstly, the SDP method was used to convert the vibration signal of a single multi-channel sensor into SDP image arms. Secondly, the obtained images were input into the VGG16 convolutional neural network in order to train the fault diagnosis model. Then, the SDP images of the signals to be measured from multiple multi-channel sensors were input into the fault diagnosis model, and the diagnosis results of the multiple multi-channel sensors were obtained. Experimentally, it was demonstrated that the diagnostic results of multi-channel sensors one, two, and three were more accurate than those of single-channel sensors one, two, and three, by 3.01%, 16.7%, and 5.17%, respectively. However, a fault is not generated in a single direction, but rather in multiple directions. In order to improve the comprehensiveness of the raw vibration data, a fusion method using DS (Dempster–Shafer) evidence theory was proposed in order to fuse multiple multi-channel sensors. The accuracy reached 99.93% when sensor one and sensor two were fused, which was an improvement of 8.88% and 1.02% over single sensors one and two, respectively. When sensor one and sensor three were fused, the accuracy reached 99.31%, which was an improvement of 8.31% and 6.17% over single sensors one and three, respectively. When sensor two and sensor three were fused, the accuracy reached 99.91%, which was an improvement of 1.00% and 6.74% over single sensors two and three, respectively. When three sensors were fused simultaneously, the accuracy reached 99.99%, which was 8.93%, 1.08%, and 6.81% better than single sensors one, two, and three, respectively. Therefore, it can be shown that the number of sensor channels has a great influence on the diagnosis results.


Introduction
Gears are important components of gearboxes, and gearbox failures may lead to the failure of the entire mechanical drive system [1]. The normal operation of a gearbox depends mainly on the health status of its gears [2]. However, gears work in a confined and harsh environment in which maintenance is inconvenient. Gear pitting is one of the common failures of gear transmission systems; it tends to damage the tooth surface and can result in gear surface spalling, which can bring great losses to the gear transmission system, and a failure may have serious consequences. Gear failure therefore needs to be detected as early as possible in order to avoid serious mechanical accidents and personnel injuries [3].
Gear diagnostics commonly use vibration signals as monitoring signals [3][4][5]. Gear wear is a common and unavoidable degradation phenomenon that occurs during the life of gear transmission systems, and it can lead to failures and economic losses in wind turbine drive systems. Feng et al. [6] developed a vibration-based health indicator to detect gear wear, which enables reliable and safe gear operation with significantly better efficiency and lower cost than the widely used wear particle analysis method. The many contemporary methods for gear fault detection can be divided into three main categories: signal processing methods, physical model-based methods, and data-driven methods. The first two categories are more demanding for operators and require a large reserve of empirical knowledge, so data-driven methods are popular among contemporary scholars. Such methods include traditional machine learning algorithms and deep learning algorithms [7]; the traditional machine learning algorithms in use include the back propagation (BP) algorithm [8,9], the support vector machine (SVM) [10], etc. Traditional machine learning algorithms have achieved excellent results in the field of fault diagnosis, but they still have defects, such as insufficient data processing and the over-fitting or under-fitting of training data. Deep learning algorithms can overcome these defects, and they save a lot of time by reducing the interference of human experience and acquiring high-level features directly from the data; they are therefore widely used in the field of fault diagnosis. The popular deep learning algorithms nowadays are convolutional neural networks (CNNs) [11], deep belief networks (DBNs) [12], and long short-term memory (LSTM) networks [13,14].
Deep learning is widely used in fault diagnosis due to its inherent intelligence [15]. When the detection signals are similar to one another, fault classification becomes more demanding. To solve this problem, Gai et al. [16] used a sparrow search algorithm (SSA) to optimize the parameters of a deep belief network (DBN), which gave better feature extraction capability, stability, and detection accuracy than other methods. However, a DBN cannot use spatial relative relationships to reduce the number of parameters when extracting the features of each layer. To address the existing problems of automatic feature extraction and low diagnostic accuracy, Guo et al. [17] designed a method that improves the hierarchical learning rate of convolutional neural networks and applied it to bearing fault diagnosis. Verification showed that the method is well suited to the field of fault diagnosis: it automatically and sensitively extracts deep fault features without manual screening and thereby improves the diagnostic accuracy compared to the common methods. In practical industrial applications, the differences in working conditions and data acquisition environments can lead to variability in the collected data, resulting in the poor performance of the diagnostic methods. In order to address these deficiencies, Xu et al. [18] developed a method that combines a CNN with variational mode decomposition (VMD) to process the original vibration signal directly, in an end-to-end manner and without human intervention. This makes up for the insufficient features of a single signal and improves the generalization capability of the model, enabling fault diagnosis of bearings under different working conditions.
The electro-hydraulic actuator (EHA) is a key component that affects the performance of the whole system in which it is installed. In order to improve the reliability of the equipment, data-driven methods have often been used in the past, but it is difficult to build an accurate EHA model due to the complex structure of the system. Miao et al. [19] used a deep learning model-based approach that trains the output prediction on monitoring data, which can effectively solve the difficulty of model construction. When studying rolling element defects, the increased computational effort and the reduced classification accuracy can make many statistical algorithms ineffective. Ravikumar et al. [20] therefore used a CNN and residual learning for local feature extraction and dimensionality reduction in order to handle the sequential data in the task of internal combustion engine gearbox health prediction, which improved the robustness of the diagnostic model and saved computational time. The complexity of rotating machinery working conditions leads to different distributions of training and testing samples, and improving the practical applicability of deep learning models has become an urgent task. Six CNNs are available for transfer learning and classification: ShuffleNet, GoogLeNet, ResNet-18, ResNet-50, VGG-16, and DenseNet-201. An experimental comparison of these networks showed that the VGG16 has the highest accuracy [21]. Traditional fault diagnosis methods based on deep learning algorithms process time-domain signals, from which it is difficult to see the health status of the monitored object intuitively and which place high demands on the operator. A good method is one that can monitor the health status intuitively and reduce the professional technical requirements for the operator.
The SDP algorithm that is used in this paper can make up for the defect of showing the health status in the form of time-domain signals. It can show the health status of the detected object more intuitively by image.
The SDP technique converts the raw time-series signal into an image, so that operators with little expertise can visualize the state of the detected object. The previous detection methods for blades were unable to respond sufficiently rapidly to the onset of a stall or to perceive the stall. Bianchi et al. [22] applied SDP technology with sound visualization to early stall-warning detection for the early fault diagnosis of cooling axial fans in the context of strong noise. Based on the visual patterns, the researchers could detect the failure location, thereby solving the problem that the conventional methods cannot perceive the failure state. The rolling bearing vibration signal is a typical non-stationary signal, and the traditional time-frequency analysis technique is difficult to apply. Therefore, Sun et al. [23] used a composite method of the SDP method and Harmattan distance collection in order to diagnose and classify bearing faults; it converts the incoming vibration signal into an image by using the SDP technique and monitors the health of the object more intuitively than the traditional method. The vibration signal is complex, nonlinear, and nonstationary, but it is a common fault diagnosis signal. Li et al. [24] developed a rolling bearing fault diagnosis method (ASDP-DBSCAN) by combining the adaptive symmetrized dot pattern with density-based spatial clustering of applications with noise. The proposed method can improve the diagnosis accuracy. It also addresses a problem that arises when the detected object is in a harsh environment: the traditional time-domain, frequency-domain, and time-frequency domain fault processing methods may yield erroneous information, so the signal needs to be denoised in advance when the traditional methods are used for fault diagnosis.
However, the SDP technique requires no denoising of the signal, even when the amplitude of the useful signal is lower than the background noise; in addition, the SDP technique is well suited to making subtle differences in the input signal apparent for easy analysis and is, therefore, more effective than the other techniques. The rotational stall is a common fault in centrifugal fans, and many researchers have proposed rotational stall detection methods, but the detection has been unintuitive. Xu et al. [25] used the SDP technique for fault diagnosis by extracting the pressure signal for detecting the rotational stall from a fan and characterizing the signal in a visual form, which is more effective than the other techniques when the effective signal is relatively low compared to the background noise, laying the foundation for centrifugal fan stall prevention. Transient shock is the main form of rolling bearing failure, and it is usually disturbed by external noise. To solve this problem, Sun et al. [26] applied the SDP method to the first five intrinsic mode functions (IMFs) obtained by empirical mode decomposition and used an improved Chebyshev distance for fault diagnosis, which is more accurate than the common fault diagnosis methods. For transient shock, a common solution is to combine empirical mode decomposition with principal component analysis in order to analyze the bearing faults, but the classification error of this approach is larger. In addition, the SDP technique converts the vibration signal into an image, which saves parameter adjustment time and avoids information loss. The traditional fault diagnosis methods have difficulty dealing with nonlinear vibration signals and may lead to undesirable or even wrong results. Therefore, Gu et al. [27] designed a variable-condition bearing fault diagnosis algorithm that combines the SDP technique with a deep convolutional neural network; the proposed method can not only visualize and diagnose mechanical equipment faults, but can also improve the fault diagnosis results.
Although the SDP algorithm can visualize the health status, a fault is not generated in a single direction, but rather in multiple directions. The fault signal cannot be fully captured by mounting the sensor at a fixed location, and an accurate diagnosis of the fault cannot be obtained from the data of a single-direction sensor. Collecting only a single sensor signal and converting it into an SDP image in order to assess the health status is not enough, as it results in low accuracy and unstable diagnostic results. In order to improve the accuracy of the diagnostic results, this paper adopts a decision-level approach to multi-sensor fusion.
Nowadays, most diagnostics are processed by acquiring a single-channel vibration signal, so the acquired raw signal is neither comprehensive nor accurate [28]. Many researchers have worked on improving the performance of fault diagnosis models in order to improve the fault diagnosis accuracy but have ignored the limitations of the single sensors that are used to describe the fault. Multi-sensor decision-level fusion fault diagnosis methods have been introduced in order to improve the accuracy of the fault diagnosis. Compared with single-sensor diagnosis methods, multi-sensor diagnosis methods can obtain more fault information and, therefore, produce more reliable diagnosis results [29]. The multi-channel signals that are collected by multiple sensors contain more information than the single-channel signals; therefore, the multi-channel processing method can improve the reliability and the accuracy of the fault diagnosis [30]. With the increase in mechanical detection sensors, Junior et al. [31] selected two multi-channel sensors to detect and diagnose motor faults, which saves time when compared to the traditional use of multiple single-head sensors and allows the separate processing of each sensor channel, increasing the feature extraction and improving the fault identification. Li et al. [32] designed a fault diagnosis method of multi-channel feature-level fusion in order to detect sudden bearing failure, where multiple sensors are usually set up in order to evaluate the rolling bearing health status so as to improve the stability and the reliability of the detection results. However, feature-level fusion is located at the middle level of the fusion theory hierarchy, which requires a certain feature screening capability and is more complicated than decision-level fusion.
Inspired by deep learning algorithms and multi-channel signals for the automatic diagnosis of fault types in complex operating conditions of rotating machinery, He et al. [33] input multi-channel signals into an ensemble transfer convolutional neural network and used decision-level fusion in order to flexibly fuse the results of each channel, with a higher diagnostic performance than the existing deep transfer learning algorithms. Wang et al. [34] addressed the problem that unimodal signals are easily affected by the external environment, which leads to low accuracy, by proposing a convolutional neural network (CNN)-based approach to multimodal sensor signal fusion; the proposed method was experimentally shown to have a higher diagnostic accuracy than that of single sensors. Identifying micro-faults in rotating machinery at an early stage is a hot research topic in the field of fault diagnosis. Gong et al. [35] proposed an improved convolutional neural network-support vector machine (CNN-SVM) method to replace the traditional fault diagnosis method, which relies on the manual extraction of the fault features by engineers with a priori knowledge. This method is rich in data multidimensionality and can automatically and accurately achieve early fault diagnosis. Deep learning algorithms have high requirements on the amount of training data. Bai et al. [36] developed a composite method of multi-channel convolutional neural networks combined with multiscale limit fusion data in order to enhance the sensor data, which can improve the fault classification accuracy and clustering effect. Yin et al. [37] applied the Dempster-Shafer evidence theory (DS theory) to the fault diagnosis of the interfacial adhesion of honeycomb sandwich structures in order to reduce the influence of noise on the diagnosis results, which can improve the diagnostic performance and noise resistance.
In fault diagnosis, since it is difficult to detect a single state from a complex state, Tang et al. [38] proposed a composite method of combining random forest and DS evidence theory, which has a higher accuracy compared to methods that use a single sensor. The hydraulic valve occupies an important position in the hydraulic system and, due to its complex structure, it is difficult for the existing methods to detect multiple faults in the hydraulic valve. Ji et al. [39] used an integrated intelligent diagnosis method for hydraulic valve faults based on DS theory. The common integration algorithm integrates multiple information fusion theories into the fault learning algorithm, but the simple information fusion theory cannot effectively solve the unavoidable information source uncertainty problem in the decision support system process. The DS evidence theory algorithm can solve the information source uncertainty problem, and the diagnosis accuracy is, therefore, improved. In order to substitute the use of two or three single-channel sensors to compensate for the low accuracy of a single sensor, we directly use the multi-channel sensors. The multi-channel sensors can ensure the real-time data acquisition and the accuracy, can facilitate the acquisition of the experimental data, and can reduce the error of the original data. In addition, arranging multiple multichannel sensors at different locations can allow the collection of the health status data of multiple locations of the test object, which can more carefully ensure the comprehensiveness of the data and the accuracy of the diagnosis results. Therefore, multiple multi-channel sensors are used in order to collect vibration signals in this paper.
Nowadays, deep learning is very popular in the field of fault identification. In traditional fault diagnosis methods, it is difficult to visualize the gear fault status. However, the SDP-based processing methods can convert the time series signals into visual images, which makes the gear fault diagnosis intuitive. Researchers are committed to improving the performance of the fault diagnosis models in order to improve the fault diagnosis accuracy, but they ignore the limitations of the individual sensors that are used to describe the fault. The SDP image processing method can achieve the feature-level fusion of multi-channel sensor data, but does not consider decision-level fusion. Decision-level fusion is located at the highest level in the fusion theory; it can make the diagnosis results retain the original data state, can reduce the missing data, can ensure that the diagnosis results are comprehensive, and can improve the accuracy of the fault diagnosis. Therefore, to go beyond the current fault diagnosis methods that only perform multi-channel feature-level fusion, this paper proposes a multi-sensor multi-channel decision-level fusion fault diagnosis method that combines the SDP image method with the VGG16.

SDP Technology
As a new image processing method that presents the original time-series signal as an intuitive image, the SDP method is able to fully characterize the gear vibration signal. Assuming that the time-domain vibration signal is X = {x_1, x_2, ..., x_i, ..., x_n}, the time-domain vibration signal is converted to a polar plot by the SDP technique, in which each sample is mapped to a point described by r(i), θ(i), and φ(i). The conversion diagram of the SDP technique is shown in Figure 1.
In Figure 1a, x_min and x_max are the minimum and maximum amplitudes of the vibration signal, respectively, and x(i) and x(i + l) are the corresponding amplitudes of the vibration signal at time i and time i + l, respectively. The converted coordinates can be obtained by putting the vibration signal through the SDP technique, as shown in Figure 1b. The relationship of the images is built before and after processing with the SDP technique, as follows [40]:

$$ r(i) = \frac{x(i) - x_{\min}}{x_{\max} - x_{\min}} \tag{1} $$

$$ \theta(i) = \theta + \frac{x(i + l) - x_{\min}}{x_{\max} - x_{\min}} \, \xi \tag{2} $$

$$ \varphi(i) = \theta - \frac{x(i + l) - x_{\min}}{x_{\max} - x_{\min}} \, \xi \tag{3} $$

wherein, r(i), θ(i), and φ(i) denote the radius of the polar coordinates and the deflection angles along the mirror symmetry plane, counterclockwise and clockwise, respectively. In addition, θ and ξ are the angle of the mirror symmetry plane and the angular gain factor, respectively, where ξ ≤ θ.
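The mapping from a sample pair x(i), x(i + l) to the polar quantities r(i), θ(i), and φ(i) can be illustrated with a short sketch. It is a minimal illustration for a single mirror symmetry plane, not the implementation used in this study; the function name `sdp_points` and the default values of `lag`, `mirror_deg`, and `gain_deg` are assumptions chosen only so that ξ ≤ θ holds.

```python
def sdp_points(signal, lag=1, mirror_deg=60.0, gain_deg=30.0):
    """Map a 1-D signal to SDP polar points (r, theta, phi) for one mirror plane.

    Angles are in degrees. The defaults for lag (l), mirror_deg (theta) and
    gain_deg (xi) are illustrative assumptions, not values from the paper.
    """
    x_min, x_max = min(signal), max(signal)
    span = x_max - x_min
    if span == 0:
        raise ValueError("signal must not be constant")
    points = []
    for i in range(len(signal) - lag):
        # Radius: normalized amplitude at time i
        r = (signal[i] - x_min) / span
        # Angular offset: normalized amplitude at time i + l, scaled by xi
        offset = (signal[i + lag] - x_min) / span * gain_deg
        # Counterclockwise and clockwise deflection around the mirror plane
        points.append((r, mirror_deg + offset, mirror_deg - offset))
    return points


example = sdp_points([0.0, 1.0, 0.5, 0.25])
```

Each tuple holds the radius and the two deflection angles of one sample; repeating the computation for several mirror plane angles would yield the symmetric arms of the full image.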
Appl. Sci. 2022, 12, x FOR PEER REVIEW

VGG16 Convolutional Neural Network
The VGG16 is a CNN for feature extraction that consists of an input layer, convolutional layers with a ReLU activation function, and maximum pooling layers; its output finally enters a SoftMax layer in order to produce the fault classification result. The convolutional layers and the pooling layers are interleaved and act as filters for the feature extraction of the data. The typical structure of the VGG16 is shown in Figure 2.
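To make the layer arrangement concrete, the following sketch traces how the feature map shrinks through the standard VGG16 stack and counts the trainable parameters. It assumes the usual published configuration (thirteen 3 × 3 convolutions, five 2 × 2 max pooling layers, and two 4096-unit fully connected layers), a 224 × 224 × 3 input image, and a three-class output for the normal, single pitting, and double pitting states; the exact input size and classifier head used in this study are not stated here, so these values and the name `trace_vgg16` are assumptions.

```python
# Standard VGG16 layout: integers are conv output channels, "M" is 2x2 max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def trace_vgg16(height=224, width=224, channels=3, num_classes=3):
    """Return the final feature-map shape and the trainable parameter count."""
    params = 0
    for layer in VGG16_CFG:
        if layer == "M":
            # 2x2 pooling with stride 2 halves the spatial size
            height, width = height // 2, width // 2
        else:
            # 3x3 convolution with padding 1 keeps the spatial size
            params += 3 * 3 * channels * layer + layer
            channels = layer
    features = height * width * channels
    # Fully connected layers ending in the SoftMax classifier (assumed head)
    for out_dim in (4096, 4096, num_classes):
        params += features * out_dim + out_dim
        features = out_dim
    return (height, width, channels), params


shape, n_params = trace_vgg16()
```

Under these assumptions the convolutional part reduces the image to a 7 × 7 × 512 feature map, and most of the parameters sit in the first fully connected layer.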

DS Evidence Fusion Theory
The DS evidence theory is a mathematical theory that was established by Dempster and his student Shafer in the late 1960s and early 1970s [41,42]. This theory enables the fusion of evidence without a priori information in order to improve the diagnostic accuracy. The DS evidence theory mainly consists of a discriminative framework (frame of discernment), a basic probability distribution function (basic probability assignment), and a confidence function (belief function).
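The discriminative framework is denoted by Θ; it is the set of all classes, and the elements of the set are mutually exclusive. The structure of the discriminative framework is shown in Equation (4) [43]:

$$ \Theta = \{\theta_1, \theta_2, \ldots, \theta_n\} \tag{4} $$

wherein, θ_1, θ_2, ..., θ_n are the n mutually exclusive classes.
The basic probability distribution function is denoted by m; it assigns a probability to each subset of the classes. It has a mapping relationship with the discriminative framework, as shown in Equation (5) [44] as follows:

$$ m: 2^{\Theta} \to [0, 1], \qquad m(\varnothing) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1 \tag{5} $$

wherein, A denotes any subset of the discriminative framework and m(A) is the underlying probability of A.
In order to fuse the multiple independent information sources, it is necessary to fuse the underlying probabilities from multiple sensors in order to obtain the final comprehensive and accurate results using the DS fusion rules. For N sources with basic probability distribution functions m_1, ..., m_N, the fusion rule is shown in Equation (6) as follows:

$$ m(A) = \frac{1}{1 - k} \sum_{A_1 \cap A_2 \cap \cdots \cap A_N = A} \; \prod_{j=1}^{N} m_j(A_j), \quad A \neq \varnothing, \qquad k = \sum_{A_1 \cap A_2 \cap \cdots \cap A_N = \varnothing} \; \prod_{j=1}^{N} m_j(A_j) \tag{6} $$

wherein, A_j is a subset to which the j-th source assigns probability and k is the conflict factor.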
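The fusion rule can be sketched for two sources as follows. This is a minimal illustration of Dempster's rule of combination, not the code used in this study; the function name `dempster_combine` and the example probabilities for the pitting states 0, 1, and 2 are hypothetical.

```python
def dempster_combine(m1, m2):
    """Fuse two basic probability assignments with Dempster's rule.

    Each assignment maps a frozenset of classes to its probability mass,
    and the masses of one assignment sum to 1.
    Returns the fused assignment and the conflict factor k.
    """
    fused = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            common = b & c
            if common:
                # Agreeing evidence supports the intersection of both subsets
                fused[common] = fused.get(common, 0.0) + mass_b * mass_c
            else:
                # Disjoint subsets contribute to the conflict factor k
                conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {a: mass / (1.0 - conflict) for a, mass in fused.items()}, conflict


# Hypothetical class probabilities from two sensors (not measured values)
sensor_one = {frozenset({0}): 0.90, frozenset({1}): 0.05, frozenset({2}): 0.05}
sensor_two = {frozenset({0}): 0.80, frozenset({1}): 0.15, frozenset({2}): 0.05}
fused, k = dempster_combine(sensor_one, sensor_two)
```

Because Dempster's rule is commutative and associative, a third sensor can be fused by combining its assignment with the result of the first two.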

Pitting Fault Identification Based on SDP Technology
Pitting faults are one of the most common failure types within gears, and the accurate identification of a pitting fault can effectively reduce the economic loss and the casualties that are caused by pitting faults. Nonetheless, it is difficult to monitor the pitting fault of gears when they are in normal operation. Given that the pitting condition is not caused by one direction, this section uses SDP technology for single-channel and multi-channel vibration signals in order to identify three different degrees of pitting.

Experimental Data Acquisition
The present study was conducted for the health diagnosis of three different degrees of pitting in the individual gear teeth of an active gear, namely normal, single pitting, and double pitting, where the gear pitting condition usually occurs first on the root surface of the tooth near the pitch line. A schematic diagram of the three types of pitting in the gear is shown in Figure 3. Figures 3 and 4 are integrated, with Figure 3 representing a pitting schematic where a single gear gradually changes from a normal condition to a double pitting condition, while Figure 4 represents the real situation of gear pitting. In Figure 3, n = 1 ∼ m, where m is the number of all of the teeth on the gear and n indicates a gear tooth with pitting failure on the gear. The arrows in Figure 4 are used to highlight the location of the pitting on the gear teeth.

The experimental bench was built on a modified gearbox test bench, which already existed in the project team. Three three-direction sensors were installed in three different planes of the gearbox near the vibration source in order to monitor the health status of the gearbox gear teeth comprehensively, and the structure design diagram is shown in Figure 5. The installation was completed according to the setup that is shown in Figure 6, which also shows the physical construction of the experimental bench. Oil was injected into the bottom of the lower box of the gearbox to a height of 10 mm, and oil immersion lubrication was used in order to reduce gear wear and to ensure the normal operation of the gears.
As required by the partner, the gear working conditions were as follows: the rotation speed was 900 r/min and 1000 r/min, low loads were applied, and the operation of the gears was monitored. The load was increased from 0 N·m to 30 N·m in steps of 5 N·m. After several sets of experiments, it was verified that, when the working conditions were 1000 r/min and a load of 30 N·m, the pitting of the gears under the different conditions was obvious and could, therefore, help to prevent the appearance of pitting effectively. The experimental working conditions are shown in Table 1.
The data acquisition method was to first install the BK sensors in the three planes of the gearbox body near the source of the gearbox vibration. Since the test frequency of the gearbox gear teeth was 5 kHz to 10 kHz, the test frequency was set to 12.8 kHz in the pulse acquisition system that was used with the BK sensors. For the motor control, the gear working condition was set in the Yblsoft specialized software; the partner required the speed to be 900 r/min and 1000 r/min, and the load to be increased from 0 N·m to 30 N·m in steps of 5 N·m. Finally, the gear health status was analyzed 30 times for the normal, single pitting, and double pitting states, and the final result was that the gear was tested at a speed of 1000 r/min and a load of 30 N·m. Therefore, it was determined that the working condition of the gear in this study was 1000 r/min, 30 N·m load, and 12.8 kHz test frequency. In addition, the data acquisition card that was used was the Type 3053-B-120 manufactured by BK.
After the completion of the data acquisition, the most distinctive channel of each sensor signal was selected separately as the data input in order to evaluate the gear's health status. In addition, the gear pitting status was indicated by zero, one, and two for the normal, single pitting, and double pitting states, respectively. After several experiments, the Y-direction of sensor one, the X-direction of sensor two, and the Y-direction of sensor three were found to distinguish the states most clearly in these experimental data; therefore, these three channels were selected, one for each sensor. There were 1000 sets of experimental data, and each set of data consisted of 4096 points. Among them, 800 sets of data were selected to be the training set, the remaining 200 sets were used as the test set, and another 100 sets of experimental data were randomly generated as the validation set for detecting the gear fault diagnosis status. Finally, the accuracy values that were obtained from the validation data set were adopted as the evidence body of the DS evidence theory for the sake of achieving decision-level fusion.
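The 800/200 division of the 4096-point sets can be sketched as follows. How the 1000 sets were cut from the recorded signals and how the extra validation sets were generated is not specified here, so the non-overlapping segmentation, the fixed random seed, and the names `segment` and `split_samples` are assumptions made only for illustration.

```python
import random


def segment(signal, length=4096):
    """Cut a long record into non-overlapping samples of `length` points."""
    count = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(count)]


def split_samples(samples, n_train=800, seed=0):
    """Shuffle the samples reproducibly and split them into training and test sets."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    train = [samples[i] for i in order[:n_train]]
    test = [samples[i] for i in order[n_train:]]
    return train, test


# Small synthetic record: 10 samples of 8 points, split 8/2 for demonstration
record = list(range(80))
train_set, test_set = split_samples(segment(record, length=8), n_train=8)
```

With a record of 1000 × 4096 points and the default arguments, the same calls would give 800 training sets and 200 test sets.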

Single-Channel Fault Diagnosis
In the SDP technique, the choice of the time lag coefficient l, the mirror symmetry plane angle θ, and the angular gain factor ξ is important for the fault state identification. In addition, the SDP image is able to accommodate information from multiple measurement points of the gear, which helps to determine the gear fault state more intuitively than the common time-domain vibration signal. Previously [45], a decision-level fusion approach based on a single channel of each sensor was used for the gearbox gear tooth fault diagnosis in order to monitor the gear fault state more comprehensively than a single sensor, by fusing the sensors that were arranged in the different planes of the gearbox. Nonetheless, using a single channel of a single sensor that was located in the gearbox led to inaccurate single-channel diagnosis results, as the fault data in the other directions were not collected. Hence, with respect to this problem, a multi-sensor multi-channel fault diagnosis method is proposed in this paper. Next, the single-channel fault diagnosis method and the multi-channel fault diagnosis method for gearboxes are compared.
Based on the single-channel fault diagnosis algorithm, a mixed SDP image fusing the information of multiple single channels is proposed in this subsection: the 1st measuring point is converted to a darkorange image, the 2nd measuring point to a turquoise image, and the 3rd measuring point to a darkorchid image, as shown in Figure 7. Figure 7a depicts the raw vibration signals of the three channels of the multi-channel sensor. The SDP technique converts the raw time-domain vibration signals into the SDP images shown in Figure 7b, which visualize the health status of the detected object.
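The conversion from a time-domain signal to SDP points can be sketched as follows. This is a minimal illustration that assumes the commonly used SDP definition, in which each sample is normalized to a radius and the sample l steps later sets an angular offset of up to ξ on either side of each mirror plane. The function name `sdp_points` and its argument names are hypothetical, and the defaults simply reuse the values l = 10, θ = 120°, and ξ = 35° reported in this paper.

```python
import math


def sdp_points(signal, lag=10, theta_deg=120.0, gain_deg=35.0):
    """Return (radius, angle_in_degrees) pairs of a symmetrized dot pattern.

    Assumed definition (not taken from the authors' code):
      r(i)     = (x[i] - x_min) / (x_max - x_min)
      angle(i) = plane +/- (x[i + lag] - x_min) / (x_max - x_min) * gain
    where `plane` runs over the mirror symmetry planes 0, theta, 2*theta, ...
    """
    if not 0 <= lag < len(signal):
        raise ValueError("lag must satisfy 0 <= lag < len(signal)")
    lo, hi = min(signal), max(signal)
    span = hi - lo
    if span == 0:
        raise ValueError("signal must not be constant")

    # Min-max normalization maps every sample into [0, 1].
    norm = [(x - lo) / span for x in signal]

    n_planes = round(360.0 / theta_deg)  # 3 planes for theta = 120 degrees
    points = []
    for plane in range(n_planes):
        base = plane * theta_deg
        for i in range(len(signal) - lag):
            radius = norm[i]
            offset = norm[i + lag] * gain_deg
            points.append((radius, base + offset))  # counter-clockwise arm
            points.append((radius, base - offset))  # mirrored clockwise arm
    return points


# Usage with a synthetic signal (a stand-in for one 4096-point vibration record).
demo_signal = [math.sin(0.05 * k) + 0.3 * math.sin(0.37 * k) for k in range(4096)]
demo_points = sdp_points(demo_signal)
print(len(demo_points))
```

The angles are left unwrapped (they may be slightly negative around the 0° plane), which polar plotting libraries generally accept. Plotting each channel's points in its own color would give a mixed image of the kind described above.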
The SDP image technology belongs to feature-level fusion, but feature-level fusion is located in the middle level of the fusion theory hierarchy and, because it requires a certain feature filtering capability, is more complex than decision-level fusion. Decision-level fusion is the highest-level fusion process, which can fuse various information sources in order to realize the final decision result. Therefore, in this section, single-channel data were selected for SDP mirroring in order to achieve decision-level fusion of data. The framework flowchart of the single-channel fault diagnosis is shown in Figure 8.
Figure 8. Fault diagnosis framework based on single-channel gear pitting level.
The selection of the parameters for the SDP technique can affect the fault state identification, as shown in Figure 9 for the Y-direction channel of sensor one, where the SDP image arms of the different pitting states are plotted for different time lag coefficients l and angular gain factors ξ. The differences caused by varying l and ξ were manifested in the images in terms of the arm shape, the thickness, and the concentration areas. Ten sets of comparison experiments were performed, in which the time lag coefficient l rose from 0 to 10 in steps of one and the angular gain factor ξ increased from 0 to 40 in steps of five. It was found that the fault states were most clearly distinguished for l = 10 and ξ = 35° with θ = 120°. These parameters greatly affect the accuracy of the diagnostic results, as has been reported in previous studies [45]. Therefore, the Y-channel parameters of sensor one were selected as l = 10 and ξ = 35°.
The proposed method was compared with a Swin Transformer (S-T). The S-T model achieves strong performance on recognition tasks such as image classification, object detection, and semantic segmentation. It not only supports global modeling but also uses shifted windows to achieve cross-window connections, so that the model can attend to relevant information in adjacent windows and perform cross-window feature interaction, which can, to a certain extent, expand the receptive field and improve the computational efficiency. Therefore, two deep learning models were chosen for comparison in this study, the VGG16 and the S-T; their parameter settings are shown in Table 2, and their training metrics are shown in Figure 10.
Both models were run on hardware with an NVIDIA GeForce RTX 3080 GPU (10 GB) and an Intel i7-10700 processor running at 2.9 GHz.
Figure 9. SDP image arms for pitting states one to three of sensor 1 under different l and ξ.
From Figure 10, it can be concluded that both models were basically trained and stable at 1300 iterations. In Figure 10a, the accuracies of the VGG16 and S-T models were 99.84% and 99.63%, respectively; in Figure 10b, the losses of the VGG16 and S-T models were 0.0058 and 0.0104, respectively. Training took about 262 min for the VGG16 and about 392 min for the S-T model, which indicates that the S-T model is more complex than the VGG16 structure. When each of the three health states of the gear teeth was tested with 2000 groups, the overall fault diagnosis accuracy of the VGG16 and S-T models was 82.98% and 82.12%, respectively. The VGG16 was thus more accurate than the S-T model, and its model training time and fault diagnosis time were shorter.
In addition, the S-T model was more sensitive to the parameter settings. As an example, the learning rate of both models was increased by a factor of 10; the resulting training of the VGG16 and S-T models is shown in Figure 11. From Figure 11, it can be seen that both models had stabilized at 1300 iterations. In Figure 11a, the accuracies of the VGG16 and S-T models were 99.72% and 33.36%, respectively; in Figure 11b, the losses of the VGG16 and S-T models were 0.0108 and 1.0963, respectively.
When the three health states of the gear teeth were tested in 2000 groups using the VGG16 and S-T models, the overall accuracy of the fault diagnosis was 78.08% and 33.32%, respectively. Therefore, considering the accuracy of the diagnosis results, the model training time and the diagnosis time, and the sensitivity to the parameters, the VGG16 diagnosis model performed better than the S-T diagnosis model, and the VGG16 convolutional neural network was selected for the fault diagnosis in this study.
In the VGG16 diagnostic model, the learning rate was set to lr = 0.005 after parameter tuning, and the number of iterations was 1300. The accuracy and the loss of the training results of the single-channel models of the three sensors are shown in Figure 12. From Figure 12, it can be concluded that the individual sensor models were basically trained and stable at 1300 iterations. In Figure 12a, the accuracy was 99.84% with a loss of 0.0058; in Figure 12b, the accuracy was 99.67% with a loss of 0.0015; and in Figure 12c, the accuracy was 99.91% with a loss of 0.0048.

Single-Channel Fusion of Multiple Sensors
In order to verify the effectiveness of the already-trained and stable VGG16 fault diagnosis model, 100 sets of data were randomly generated for each fault state of the gear for detection. In this single-channel experiment, three three-direction sensors were used, and for each sensor the experimental data in the direction with the more obvious state changes were selected. Therefore, in the DS evidence theory, the evidence body was E = {E1, E2, E3}, where E1, E2, and E3 denote the data that were collected in the Y-direction, the X-direction, and the Y-direction of sensors one, two, and three, respectively. The identification frame was Θ = {A, B, C}, where A, B, and C denote the normal state, single pitting, and double pitting of the gear teeth, respectively. The results obtained by inputting the 100 randomly generated sets of data into the VGG16 fault diagnosis model were used as the evidence body of the DS evidence theory, and then the decision-level fusion was performed. The accuracy of each sensor before and after the fusion is shown in Figure 13 and Table 3.
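The fusion step over the frame Θ = {A, B, C} can be sketched as follows. This is a minimal illustration of Dempster's rule of combination, not the authors' implementation. The function names `combine` and `fuse_all` and the mass values are hypothetical, and the sketch assumes that each sensor's diagnosis output is used directly as a basic probability assignment.

```python
from functools import reduce
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    """
    fused = {}
    conflict = 0.0
    for (set1, w1), (set2, w2) in product(m1.items(), m2.items()):
        common = set1 & set2
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            fused[common] = fused.get(common, 0.0) + w1 * w2
        else:
            # Mass assigned to incompatible hypotheses counts as conflict K.
            conflict += w1 * w2
    if 1.0 - conflict <= 1e-12:
        raise ValueError("evidence bodies are in total conflict")
    # Renormalize by 1 - K so that the fused masses sum to one.
    return {hyp: w / (1.0 - conflict) for hyp, w in fused.items()}


def fuse_all(mass_functions):
    """Fuse any number of evidence bodies by repeated pairwise combination."""
    return reduce(combine, mass_functions)


# Hypothetical evidence for one test sample (illustrative values only).
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
e1 = {A: 0.7, B: 0.2, C: 0.1}  # sensor one
e2 = {A: 0.6, B: 0.3, C: 0.1}  # sensor two
e3 = {A: 0.5, B: 0.4, C: 0.1}  # sensor three

result = fuse_all([e1, e2, e3])
decision = max(result, key=result.get)
print(sorted(decision), round(result[decision], 4))
```

Because the rule is associative, fusing sensors pairwise or all three at once only changes which evidence bodies are passed to `fuse_all`.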

According to Table 3, it can be concluded that before fusion the single-channel accuracy of all three sensors was not high, at 88.32%, 82.39%, and 88.36%, respectively. When the decision-level fusion theory was applied to the diagnostic results that were obtained from each sensor, the accuracy improved greatly, reaching 99.77% when sensor one and sensor two were fused, which was 11.48% and 17.42% higher than that of single sensors one and two, respectively. When sensor one and sensor three were fused, the accuracy reached 99.98%, which was an improvement of 11.66% and 11.62% over single sensors one and three, respectively. The accuracy reached 99.46% when sensor two and sensor three were fused, which represented an improvement of 17.16% and 11.16% over single sensors two and three, respectively.
When the three sensors were fused simultaneously, the accuracy reached 99.99%, which was 11.67%, 17.60%, and 11.63% better than single sensors one, two, and three, respectively. The diagnostic accuracy of a single channel before fusion was not high, and the accuracy after DS fusion was higher than that of any single sensor before fusion. Nevertheless, when only two sensors were fused, the accuracy was not as high as when the three sensors were fused simultaneously. Hence, data signals for the gear teeth should be collected as comprehensively as possible, which can improve the fault diagnosis accuracy of the gear teeth to a large extent.

Analysis of Single-Channel Experimental Results
In this experiment, 1000 sets of experimental data were selected, and each set consisted of 4096 points. The experimental results show that the accuracy of the single-channel gear teeth fault diagnosis was not high before fusion, although the accuracy of each sensor improved after DS fusion. When two sensors were fused, the accuracy was not as high as when the three sensors were fused simultaneously. The simultaneous fusion of three sensors improved the accuracy by 11.67%, 17.60%, and 11.63% over single sensors one, two, and three, respectively; relative to the pairwise fusions (99.77%, 99.98%, and 99.46%), the additional gain was at most about 0.5 percentage points. The experiments showed that the single-channel diagnosis method does not collect comprehensive data of the gearbox gear teeth, which leads to low diagnosis accuracy; however, the DS fusion method can make up for the incomplete data collection of a single sensor and can improve the gear teeth fault diagnosis results. Therefore, the fusion algorithm is of great significance for the fault diagnosis field.

Multi-Channel Fault Diagnosis
The data that were collected from the three directions of each sensor were used simultaneously, and the data from the three channels of the sensor were converted into SDP image arms based on SDP technology in order to form a multi-channel mixed SDP image. Taking the multi-channel of sensor one as an example (its SDP image is shown in Figure 14), a gear fault diagnosis method based on multi-channel fusion was proposed, given that multiple channels can capture the gear health status more comprehensively. The framework is shown in Figure 15.
Figure 15. Fault diagnosis framework based on multi-channel gear pitting level.
The time-domain vibration signals were converted into SDP images. The parameters that changed most significantly with the pitting state were initially determined, and 10 sets of comparison experiments were performed. The time lag coefficient l rose from 0 to 10 in steps of one, and the angular gain factor ξ rose from 0 to 40 in steps of five. It was found that the fault states were most clearly distinguished for l = 10 and ξ = 35° with θ = 120°. Table 4 shows the parameter settings of the three sensors with multiple channels. Figures 16-18 show the clustering templates of the three datasets for the different pitting levels of the three channels of the sensors, respectively.
According to Figures 16-18, most of the image features are different for each sensor dataset, mainly in arm thickness, concentration, and curvature, and these feature indicators reveal the differences of the SDP image arms in the different states. Figures 16, 17 and 18c show the degree of pitting of the SDP image from the normal state to the double pitting state. In Figure 18b, the image arm was thicker when single pitting was present. Based on these discrepancies, it was possible to analyze the differences in the gears at the different pitting levels and to make an accurate diagnosis.
After the state templates were generated for each dataset, the model was trained in the VGG16 network. The accuracy and the loss of each multi-channel sensor are shown in Figure 19 in order to measure the stability metrics of the model.
According to Figure 19, sensor one had 99.92% accuracy and 0.0037 loss in the fault diagnosis model of VGG16; sensor two had 99.96% accuracy and 0.0021 loss; sensor three had 99.91% accuracy and 0.00475 loss.

Multi-Sensor Multi-Channel Fusion
In order to verify the effectiveness of the VGG16 fault diagnosis model, 100 sets of data were randomly generated for each fault state of the gears that were to be tested. Three channels of three sensors were selected simultaneously as the data sources for detecting the gear health condition. In the DS evidence theory, the evidence body was E = {E1, E2, E3}, where E1, E2, and E3 denote the data that were collected from multiple channels of sensors one, two, and three, respectively. The identification framework was Θ = {A, B, C}, where A, B, and C denote the normal state, single pitting, and double pitting of the gear teeth, respectively. The results that were obtained by inputting the 100 randomly generated sets of data into the VGG16 fault diagnosis model were used as the evidence body of the DS evidence theory, and then the decision-level fusion was performed. The accuracy of each sensor before and after fusion is shown in Figure 20 and Table 5.
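For reference, the standard form of Dempster's rule for three evidence bodies is given below. This is a sketch under stated assumptions: the mass functions m_1, m_2, and m_3 (one per evidence body E1, E2, E3), the conflict coefficient K, and the hypothesis symbols H, X, Y, and Z are notation introduced here for illustration and are not taken from this section.

```latex
m(H) = \frac{1}{1-K} \sum_{X \cap Y \cap Z = H} m_1(X)\, m_2(Y)\, m_3(Z),
\qquad H \subseteq \Theta,\; H \neq \emptyset,

K = \sum_{X \cap Y \cap Z = \emptyset} m_1(X)\, m_2(Y)\, m_3(Z).

% Special case: if every mass function assigns mass only to the
% singletons {A}, {B}, {C}, the rule reduces to a normalized product:
m(H) = \frac{m_1(H)\, m_2(H)\, m_3(H)}
            {\sum_{H' \in \Theta} m_1(H')\, m_2(H')\, m_3(H')},
\qquad H \in \Theta = \{A, B, C\}.
```

In the singleton case, a state that all three sensors support moderately receives a much larger fused mass than any single sensor gives it, which is consistent with the higher accuracies reported after fusion.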
From Table 5, it can be concluded that before fusion, the accuracies of the three multi-channel sensors were 91.06%, 98.91%, and 93.18%, respectively. The accuracy reached 99.93% when sensor one and sensor two were fused, which was 8.88% and 1.02% higher than that of single sensors one and two, respectively. When sensor one and sensor three were fused, the accuracy reached 99.31%, showing an improvement of 8.31% and 6.17% over single sensors one and three, respectively. When sensor two and sensor three were fused, the accuracy reached 99.91%, which was an improvement of 1.00% and 6.74% over single sensors two and three, respectively. When three sensors were fused simultaneously, the accuracy reached 99.99%, which was 8.93%, 1.08%, and 6.81% better than single sensors one, two, and three, respectively. The multi-channel fault diagnosis was thus more accurate and better able to identify the gear pitting faults. Moreover, the accuracy after DS fusion was higher than that of a single multi-channel sensor before fusion. Nevertheless, the accuracy when two sensors were fused was not as high as when all three sensors were fused simultaneously. When the original gear data are collected, as much data of the gearbox gear teeth as possible should be acquired in order to reduce the loss of original data, which largely improves the fault diagnosis accuracy of the gear teeth.

Analysis of Experimental Results
In this experiment, 500 sets of experimental data were selected, and each set of data consisted of 1024 points. The experimental results showed that the accuracy of each sensor improved after DS fusion. However, when two sensors were fused, the accuracy was not as high as when three sensors were fused at the same time. The simultaneous fusion of three sensors improved the accuracy by 0.05%, 0.31%, and 0.58% over the fusion of sensors 1 and 2, sensors 1 and 3, and sensors 2 and 3, respectively. The experiments showed that the DS fusion method can compensate for the incomplete data collection of a single sensor and can improve the gear teeth fault diagnosis results. All in all, the fusion algorithm is important for the field of fault diagnosis.
According to Figure 21, the diagnostic accuracy of each multi-channel single sensor before fusion was higher than that of the corresponding single channel, by 3.01%, 16.7%, and 5.17%, respectively. This indicates that comprehensive multi-channel data acquisition can improve the diagnostic accuracy. In order to further improve the diagnostic accuracy of the gear fault identification, the DS evidence theory method was used to fuse the three sensors of the gearbox gear teeth, as shown in Figure 22. With fusion, both the single-channel and the multi-channel diagnosis accuracy improved, and the accuracy of the multi-channel diagnosis after fusion was also mostly higher than that of the single-channel diagnosis after fusion.
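The three reported gains can be checked against the pre-fusion accuracies in Tables 3 and 5. The sketch below rests on an inference rather than on anything the paper states: the reported figures are reproduced if the difference is expressed as a percentage of the multi-channel accuracy. The function name `relative_gain_percent` is hypothetical.

```python
def relative_gain_percent(single, multi):
    """Gain of multi over single, as a percentage of the multi-channel accuracy.

    Normalizing by the multi-channel value is an assumption inferred from the
    reported numbers, not a formula given in the paper.
    """
    return (multi - single) / multi * 100.0


single_channel = [88.32, 82.39, 88.36]  # Table 3, sensors one to three, before fusion
multi_channel = [91.06, 98.91, 93.18]   # Table 5, sensors one to three, before fusion

gains = [
    round(relative_gain_percent(s, m), 2)
    for s, m in zip(single_channel, multi_channel)
]
print(gains)  # [3.01, 16.7, 5.17]
```

The plain differences are smaller (about 2.74, 16.52, and 4.82 percentage points), so the two conventions should not be mixed when the tables are compared.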

Conclusions
In the present study, a multi-channel decision-level fusion algorithm based on SDP analysis and the VGG16 fault diagnosis model was proposed to address the poor robustness of traditional single-channel vibration signals in the comprehensive monitoring of gearbox fault states. The multi-channel sensor was shown to collect the signal source comprehensively, which compensates for the low accuracy of gear fault diagnosis caused by insufficient data sources. The experimental results showed that, before fusion, when fewer data sources were available, the multi-channel diagnosis results per sensor were 3.01%, 16.7%, and 5.17% higher than the single-channel diagnosis results, respectively. After fusion, when sufficient data sources were available, the difference between the single-channel and the multi-channel fault diagnosis accuracy was not significant. It can, therefore, be concluded that the number of data sources has a great influence on the fault diagnosis results, and that more data sources lead to higher accuracy. Hence, the research results are of great significance for identifying the pitting status of gearbox gear teeth.

Conflicts of Interest:
The authors declare no conflict of interest. In addition, the funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.