SSDStacked-BLS with Extended Depth and Width: Infrared Fault Diagnosis of Rolling Bearings under Dual Feature Selection

In fault diagnosis, broad learning systems (BLS) have been applied in recent years. However, the best fault diagnosis cannot be guaranteed by width-node extension alone, so the stacked broad learning system (stacked BLS) was proposed. Most methods for choosing the number of depth layers use optimization algorithms that tend to increase computation time. In addition, data under single feature selection are not sufficiently representative, and effective features are easily lost. To solve these problems, this article proposes an infrared fault diagnosis model for rolling bearings based on the integration of principal component analysis and singular value decomposition (IPS) and the stacked BLS with self-selected depth model (SSDStacked-BLS). First, 72 second-order statistical features are extracted from the pre-processed infrared images of rolling bearings. Next, feature selection is performed using IPS. The IPS feature selection module consists of principal component analysis (PCA) and singular value decomposition (SVD). Feature selection is performed by PCA and SVD separately, and the results are then stitched together to form a new feature set, ensuring comprehensive coverage of the infrared image features. Finally, the acquired features are input into SSDStacked-BLS. This model establishes a data storage group for the residual training characteristics of stacked BLS, adding one block at a time. The accuracy of each newly added block is output and saved to the data storage group. If the diagnostic rate fails to increase three consecutive times, the block stacking is stopped and the results are output. IPS-SSDStacked-BLS achieved an accuracy of 0.9667 in 0.1775 s. This is almost five times faster than stacked BLS optimized using the grid search method. Compared with the original BLS, its accuracy was 0.0445 higher with comparable time consumption.
Compared with IPS-SVM, IPS-RF, IPS-1DCNN and 2DCNN, IPS-SSDStacked-BLS was more advantageous in terms of accuracy and time consumption.


Introduction
A temperature signal has the characteristics of non-contact measurement and high sensitivity [1,2]. Therefore, some scholars have applied it to fault diagnosis in the fields of electric power and optoelectronics [3,4]. In the field of rolling bearing fault diagnosis, vibration signals and acoustic emission signals have mostly been used. However, the acquisition of these two signals requires contact measurement, and it is very difficult to capture a pure signal; as a result, signal acquisition involves considerable interference and error. In addition, many efficient and accurate noise removal methods need to be investigated based on mechanistic analysis of the relevant signals. Therefore, applying these two signals to the fault diagnosis of rolling bearings has certain drawbacks [5,6]. Alternatively, infrared signals have a strong advantage for high-speed test targets. In summary, it is feasible to introduce infrared thermal images as diagnostic signals into the field of rolling bearing fault diagnosis.
Infrared thermal images have the disadvantages of a low signal-to-noise ratio and blurred visual effects. They carry salt-and-pepper noise and Gaussian noise, which appear as black and white pixel dots on the image [7,8]. This phenomenon is normal, and the noise can be removed by a simple algorithm. In addition, to perform fault diagnosis of rolling bearings from infrared images, a clear target area needs to be selected, since interference such as the background temperature can greatly affect the effectiveness of fault diagnosis. Hence, separating the target region using an image segmentation algorithm is also a key step in image preprocessing.
Since its development, deep learning has been widely used for image classification. Li et al. used a convolutional neural network (CNN) for the feature extraction of infrared images of gearboxes to obtain comprehensive feature parameters, before ultimately using SoftMax for classification [9]. Choudhary et al. proposed a LeNet-5-based CNN for the fault diagnosis of bearings, which significantly outperformed an artificial neural network (ANN); fault diagnosis based on infrared images was significantly more advantageous compared with vibration signals [10]. Kellil et al. applied an improved visual geometry group (VGG) network to the fault diagnosis of PV modules, and the results showed that it outperformed a small deep convolutional neural network (small-DCNN) [11]. To address the problem of insufficient data, He et al. fused a convolutional auto-encoder (CAE) and an enhanced convolutional neural network (ECNN), allowing deeper features to be extracted [12]. Wei et al. used transfer learning and a deep convolutional generative adversarial network (DCGAN) to construct an extended model of faulty samples, solving the fault diagnosis of small samples of infrared images [13]. Mian et al. performed the fault diagnosis of multiple bearing faults with a CNN by fusing time-frequency images of vibration signals generated by the continuous wavelet transform (CWT) with captured infrared images, obtaining robust and reliable results [14]. The acoustic emission signal was described as a two-dimensional image by Pham et al., and a generative adversarial network (GAN) was employed under unbalanced data [15]. Shao et al. proposed a dual-threshold attention-guided GAN (DTAGAN), which contains a dual-threshold training mechanism, for generating high-quality infrared thermal images [16]. Overall, the biggest advantage of deep learning is that it can learn deep features and offer good results in fault diagnosis.
In fact, the whole infrared image does cover a much wider range of features; however, when all pixel points are input, a long computation time is unavoidable. To reduce the computing time, extracting more representative infrared image features and performing feature selection is an excellent solution. Liu et al. extracted texture features, moment features, and modified entropy features from a manually acquired region of interest (RoI) as data for server fault diagnosis [17]. Thobiani et al. decomposed bearing infrared thermography, extracted second-order statistical texture features of the images, and performed feature selection [18]. Subsequently, most scholars have used methods such as machine learning to obtain fault diagnosis results from the acquired features. Deilamsalehy et al. extracted effective features of the infrared images of the rolling bearings of railroad trains based on a histogram of oriented gradients (HOG), before support vector machines (SVM) were used for fault diagnosis [19]. Glowacz et al. used the method of areas selection of image differences (MoASoID) and image histograms to obtain feature vectors from a three-phase induction motor; their comparison revealed that K-means clustering (K-means) classification outperformed the nearest neighbor (NN) and back-propagation neural network (BNN) methods [20]. Therefore, most manual feature extraction methods are followed by machine learning methods such as K-means and SVM. Though machine learning cannot learn the deep features of the data as deep learning does, it is faster. Therefore, the extraction of effective features can help machine learning obtain better fault diagnosis results.
In summary, infrared image fault diagnosis based on deep learning and machine learning has been widely used. Deep learning models can learn deep features but take a lot of time and have high computing requirements. Machine learning, although faster, suffers from data dependency and poor model generalizability. Therefore, fault diagnosis using either of the above two kinds of model has certain defects. In 2018, Chen C. L. P. proposed BLS, which is similar to a random vector functional link neural network (RVFLNN). BLS contains only one hidden layer; when faced with learning inaccuracies, incremental expansion is performed laterally at any time. This greatly alleviates the long training time of deep learning [21]. In recent years, numerous scholars have introduced it into the field of fault diagnosis. Zhang et al. combined dual-tree complex wavelet decomposition and BLS to accomplish the rapid fault diagnosis of bearings [22]. Wang et al. proposed a TSK-BLS that can be computed by a symmetric method so that fault diagnosis can be completed quickly [23]. Zhou et al. used BLS and optimized it with a genetic algorithm (GA) to ensure that the fault diagnosis of bearing infrared images was accomplished with optimal parameters [24]. Wang et al. selected temporal and spatial features as input features, which were combined with BLS to accomplish the fault diagnosis of a power system [25]. Several improved BLS algorithms have been proposed in succession, such as the fuzzy broad learning system (Fuzzy BLS) and stacked BLS [26,27]. Although BLS has been widely used, it has been demonstrated that block operations are more effective [27]. Among these, stacked BLS applies stacking rules to stack BLS blocks and is constructed as a repeated step: after generating a BLS block, the next BLS block is stacked on top of the previous one, and the residuals of the previous block are set as the desired output data of the next block. The design extends equally in depth and width, and because the lower BLS blocks are fixed when stacking new blocks, the computational cost remains relatively low, giving stacked BLS an advantage over BLS.
Based on the above description, the IPS-SSDStacked-BLS fault diagnosis model for rolling bearing infrared images is proposed in this article. In terms of feature selection, two feature selection methods, PCA and SVD, are combined into the IPS feature selection method. This dual feature selection allows the two most representative feature sets to be selected and merged, and the fusion of features summarizes the image features to a greater extent. In the fault diagnosis part, the SSDStacked-BLS model is proposed. Finding the optimal number of stacked blocks for stacked BLS normally relies on the grid search method, which restacks the blocks for each candidate count, so the computation is relatively slow. SSDStacked-BLS stores the output of each stacked block and of all previous stacked blocks before stacking new blocks. Once the accuracy stops increasing for three consecutive instances, the current result is saved as the best result. This process saves a large amount of computing time compared with other optimization algorithms.
Infrared videos of nine bearing states were acquired, and infrared images were extracted from each. Subsequently, a series of preprocessing steps, such as grayscale conversion, region of interest (RoI) acquisition, denoising, and image segmentation, was performed on the obtained infrared images. Second-order statistical features were then extracted through the gray-level co-occurrence matrix (GLCM). Seventy-two features, covering energy, contrast, entropy, sum average, variance, and correlation, were obtained from three distances and four angles. Afterwards, IPS was used to select the extracted features. Finally, SSDStacked-BLS was used for fault diagnosis.
The innovations of this article are as follows: (1) The IPS feature selection method combines features from PCA and SVD feature selection, making the features richer and more general.
(2) Infrared images are used to replace the previous vibration and acoustic emission signals, eliminating more noise interference.
(3) SSDStacked-BLS stores the output of each stacked block and the sum of all previous stacked block outputs, avoiding restacking during the optimization process.

Median Filter Denoising
Median filtering is a common image denoising method that reduces noise by replacing the value of each pixel with the median of the pixel values in its neighborhood. It is based on local pixel ordering: for a given neighborhood, the median filtering algorithm sorts the pixel values by magnitude and selects the sorted median as the new value of the current pixel. This method can effectively remove salt-and-pepper noise, impulse noise, and other types of noise, while retaining image details and valid information.
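As a minimal sketch (not the authors' implementation), a k × k median filter can be written directly in NumPy; the flat 5 × 5 test image and the noise positions are illustrative:

```python
import numpy as np

def median_filter(img, k=3):
    """Apply a k x k median filter to a 2-D grayscale image (edges replicated)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    h, w = img.shape
    for r in range(h):
        for c in range(w):
            # The median of the sorted neighborhood replaces the center pixel.
            out[r, c] = np.median(padded[r:r + k, c:c + k])
    return out

# Example: a flat gray image corrupted by one salt (255) and one pepper (0) pixel.
img = np.full((5, 5), 100, dtype=np.uint8)
img[1, 1], img[3, 3] = 255, 0
clean = median_filter(img, k=3)  # isolated outliers are replaced by the local median
```

Because the outliers are isolated, every 3 × 3 neighborhood still has a majority of clean pixels, so the median restores the flat image exactly.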

Adaptive Threshold Segmentation
Adaptive threshold segmentation determines the threshold value of each pixel based on the statistical information of local pixels, thereby segmenting the image into different regions. Compared with global thresholding, it is more applicable to cases with background noise and therefore offers great advantages when dealing with the high-interference backgrounds of IR images.
The basic idea of adaptive threshold segmentation is to calculate a local threshold at each pixel position of the image, determined from the neighboring pixel values around that pixel. In this article, the Gaussian weighted averaging method is chosen to calculate the local threshold: a weighted average is applied to the neighboring pixels, where pixels closer to the center pixel have greater weight. Pixel values above the threshold are set to 255, and pixel values below the threshold are set to 0.
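The Gaussian-weighted local threshold can be sketched as follows; the kernel size, σ, and the test image are illustrative assumptions, and a production pipeline would typically call a library routine such as OpenCV's adaptive thresholding instead:

```python
import numpy as np

def gaussian_kernel(k, sigma):
    """Separable k x k Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(k) - k // 2
    g = np.exp(-ax**2 / (2.0 * sigma**2))
    kern = np.outer(g, g)
    return kern / kern.sum()

def adaptive_threshold(img, k=5, sigma=1.0, offset=0.0):
    """Binarize: 255 where pixel > Gaussian-weighted local mean - offset, else 0."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    kern = gaussian_kernel(k, sigma)
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.uint8)
    for r in range(h):
        for c in range(w):
            local_t = (padded[r:r + k, c:c + k] * kern).sum() - offset
            out[r, c] = 255 if img[r, c] > local_t else 0
    return out

# Example: a bright 3 x 3 blob on a dark background is separated locally.
img = np.zeros((9, 9), dtype=np.uint8)
img[3:6, 3:6] = 200
binary = adaptive_threshold(img, k=5, sigma=1.0)
```

Pixels inside the blob exceed their (partly dark) local weighted mean and become 255, while background pixels stay 0, mirroring the per-pixel rule described above.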

Second-Order Statistical Features
Second-order statistical features are also known as the texture features of an image. They reflect image details with good robustness and are generally extracted through the GLCM. The theoretical formula of the GLCM is:

P(i, j, δ, θ) = N(i, j, δ, θ) / Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} N(i, j, δ, θ)

where i and j represent the horizontal and vertical coordinates of the pixel; δ and θ represent the distance and angle between the pixel pair; P(i, j, δ, θ) denotes the probability of the gray-value pixel pair; N(i, j, δ, θ) denotes the number of pixel pairs with those grayscale values; and G represents the 256 grayscale values from 0 to 255. Six second-order statistical features were extracted at the pixel distances of 1, 2 and 3 and the pixel angles of 0°, 45°, 90° and 135°, giving 72 features denoted GLCM_0, GLCM_1, ..., GLCM_71. The six specific features are shown below.
Energy indicates the degree of uniformity of the image gray distribution and the texture thickness:

ASM = Σ_i Σ_j p(i, j)^2

Contrast represents the degree of image sharpness and texture depth:

CON = Σ_i Σ_j (i − j)^2 p(i, j)

Entropy shows the complexity of the image:

ENT = −Σ_i Σ_j p(i, j) log p(i, j)

Sum average indicates the degree of regularity of the texture:

SA = Σ_{k=2}^{2Ng} k · p_{x+y}(k)

Variance represents the deviation of the image element values from the mean value:

VAR = Σ_i Σ_j (i − µ)^2 p(i, j)

Correlation is a similarity metric that highlights the directionality of the co-occurrence matrix elements:

COR = Σ_i Σ_j (i − µ_x)(j − µ_y) p(i, j) / (σ_x σ_y)

where Ng is the number of different gray levels; p(i, j) is the corresponding element of the GLCM; σ_x and σ_y are the standard deviations of p_x and p_y, respectively; µ_x and µ_y are the means of p_x and p_y, respectively; µ is the mean of µ_x and µ_y; and p_{x+y}(k) = Σ_{i+j=k} p(i, j), k = 2, 3, ..., 2Ng.
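A simplified GLCM computation and three of the six features (energy, contrast, entropy) can be sketched as follows; the single-offset convention and the tiny test image are illustrative, and a full 72-feature extraction would loop over the three distances and four angles:

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy)."""
    counts = np.zeros((levels, levels))
    h, w = img.shape
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                counts[img[r, c], img[r2, c2]] += 1
    return counts / counts.sum()  # P(i, j) = N(i, j) / total pair count

def texture_features(p):
    """Energy, contrast, and entropy of a normalized GLCM p."""
    eps = 1e-12  # avoid log(0)
    i, j = np.indices(p.shape)
    energy = (p ** 2).sum()                      # ASM
    contrast = ((i - j) ** 2 * p).sum()          # CON
    entropy = -(p * np.log(p + eps)).sum()       # ENT
    return energy, contrast, entropy

# Example: a constant image concentrates all mass at p[0, 0],
# giving maximal energy and zero contrast/entropy.
img = np.zeros((4, 4), dtype=int)
p = glcm(img, dx=1, dy=0, levels=2)
```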

PCA
PCA uses a linear transformation to convert the original correlated data into linearly uncorrelated data in each dimension while still reflecting most of the features of the data. Suppose there is a data set X with m rows and n columns, i.e., m samples and n features. The algorithm steps are: (1) Center (zero-mean) each feature.
(2) Compute the covariance matrix of the centered data. (3) Obtain the eigenvalues and eigenvectors of the covariance matrix. (4) Sort the eigenvectors into a matrix according to the magnitude of the eigenvalues, and form the projection matrix according to the feature contribution rate or a manually set retained dimension. For more flexibility in retaining the remaining features, the feature contribution rate is used here.
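The steps above can be sketched in NumPy, retaining components by cumulative contribution rate; the 0.999 threshold mirrors the value used later in the article, and the redundant third column of the example data is constructed so that two components carry essentially all the variance:

```python
import numpy as np

def pca_by_contribution(X, threshold=0.999):
    """PCA keeping the fewest components whose cumulative contribution >= threshold."""
    Xc = X - X.mean(axis=0)                    # (1) center each feature
    cov = np.cov(Xc, rowvar=False)             # (2) covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # (3) eigenvalues / eigenvectors
    order = np.argsort(vals)[::-1]             # (4) sort by eigenvalue, descending
    vals, vecs = vals[order], vecs[:, order]
    ratio = np.cumsum(vals) / vals.sum()       # cumulative contribution rate
    k = int(np.searchsorted(ratio, threshold) + 1)
    return Xc @ vecs[:, :k], k

# Example: 3 features where the third is an exact multiple of the first,
# so 2 components explain (essentially) all the variance.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, 2 * base[:, :1]])
X_reduced, k = pca_by_contribution(X, threshold=0.999)
```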

SVD
SVD is a matrix decomposition method mainly used for data compression and dimensionality reduction, which it achieves by truncating the singular value decomposition. Let A be the feature data matrix of m rows and n columns; then there always exists:

A = U Σ_{m×n} V^T

It can be decomposed by the following steps: (1) The eigenvalues and eigenvectors of A A^T are computed to obtain U.
(2) The eigenvalues and eigenvectors of A^T A are computed to obtain V.
(3) According to Σ_{m×n}, the singular values can be obtained. SVD can directly find the new feature space and the reduced-dimension feature matrix without calculating the covariance matrix. In general, the retained dimensionality is set by a fixed number of principal components; however, this introduces too many artificial factors.
A method is therefore established to achieve data reduction and compression by setting a suitable contribution threshold and retaining an appropriate number of principal components. First, the proportion of variance explained by each principal component is obtained; these proportions indicate the contribution of each principal component to the total variance. Second, these variance proportions are summed cumulatively. For example, if the third element of the cumulative explained variance is 0.85, then the first three principal components explain 85% of the total variance. Finally, a contribution threshold is set to determine the retained dimensionality.
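This contribution-threshold SVD reduction can be sketched as follows; as in the PCA sketch, the redundant third column of the example data is constructed so that two components explain essentially all the variance:

```python
import numpy as np

def svd_by_contribution(X, threshold=0.999):
    """Truncated SVD keeping components until cumulative explained variance >= threshold."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var_ratio = s**2 / (s**2).sum()        # variance proportion of each component
    cum = np.cumsum(var_ratio)             # cumulative explained variance
    k = int(np.searchsorted(cum, threshold) + 1)
    return X @ Vt[:k].T, k                 # reduced m x k feature matrix

# Example: rank-2 data in 3 columns collapses to 2 components.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, 2 * base[:, :1]])
X_reduced, k = svd_by_contribution(X, threshold=0.999)
```

Note that, unlike the PCA sketch, no covariance matrix is formed: the projection comes directly from the right singular vectors.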

BLS
BLS is built based on the traditional RVFLNN. It serves as an alternative to deep learning and overcomes deep learning's time-consuming nature. BLS is built as a flat network in which the original inputs are transferred and placed as "mapped features" in the feature nodes, and the structure is generalized through "enhancement nodes". Figure 1 illustrates the structure of BLS.

After the model obtains the input data X, it first generates the mapped nodes, which can be defined as:

Z_i = φ(X W_ei + β_ei), i = 1, ..., n

where Z_i represents the i-th group of feature nodes, W_ei represents the random weights, and β_ei represents the random biases. These variables all belong to the feature-mapping layer. Subsequently, the enhancement nodes can be obtained from the mapped nodes. Their generation can be defined as:

H_j = ξ(Z^n W_hj + β_hj), j = 1, ..., m

where H_j represents the j-th group of enhancement nodes, W_hj represents the random weights, and β_hj represents the random biases. These variables all belong to the enhancement layer.
The mapped nodes and enhancement nodes are combined, and the output can be defined as:

Y = [Z^n | H^m] W^m

where Y represents the output; Z^n = [Z_1, Z_2, ..., Z_n] represents the set of mapped nodes; H^m = [H_1, H_2, ..., H_m] represents the set of enhancement nodes; and W^m represents the output weights. Finally, the weights of the output layer can be obtained from the pseudo-inverse:

W^m = [Z^n | H^m]^+ Y
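A minimal NumPy sketch of this BLS construction follows: random mapped and enhancement nodes, with output weights from the pseudo-inverse. The tanh activations and node counts are illustrative assumptions, not the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def bls_train(X, Y, n_map=5, map_dim=4, n_enh=25):
    """Single BLS: random feature mapping + random enhancement + pseudo-inverse."""
    params, Z = [], []
    for _ in range(n_map):
        We = rng.normal(size=(X.shape[1], map_dim))
        be = rng.normal(size=map_dim)
        Z.append(np.tanh(X @ We + be))       # Z_i = phi(X W_ei + beta_ei)
        params.append((We, be))
    Zn = np.hstack(Z)                         # Z^n = [Z_1, ..., Z_n]
    Wh = rng.normal(size=(Zn.shape[1], n_enh))
    bh = rng.normal(size=n_enh)
    Hm = np.tanh(Zn @ Wh + bh)                # H^m = xi(Z^n W_h + beta_h)
    A = np.hstack([Zn, Hm])                   # [Z^n | H^m]
    Wm = np.linalg.pinv(A) @ Y                # W^m = [Z^n | H^m]^+ Y
    return params, (Wh, bh), Wm

def bls_predict(X, params, enh, Wm):
    Zn = np.hstack([np.tanh(X @ We + be) for We, be in params])
    Wh, bh = enh
    return np.hstack([Zn, np.tanh(Zn @ Wh + bh)]) @ Wm
```

With more hidden columns than training samples, the pseudo-inverse solution essentially interpolates the training targets, which is what makes BLS training a single linear solve rather than an iterative optimization.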

Stacked BLS
To better describe stacked BLS, BLS is first simplified. The structure of Figure 1 can be described as in Figure 2. For a BLS with n groups of mapped nodes and m groups of enhancement nodes, denoting the generalized function that generates the mapped nodes by P and the generalized function that generates the enhancement nodes by Q, Equation (12) can be rewritten so that BLS is described as:

y = [P(x, w_e) | Q(P(x, w_e), w_h)] W

where x and y denote the input and output data, respectively; w_e = {W_ei | 1 ≤ i ≤ n} denotes the set of random weights of the mapped nodes; and w_h = {W_hj | 1 ≤ j ≤ m} denotes the set of random weights of the enhancement nodes.

Stacked BLS is defined with n stacked BLS blocks u_i, i = 1, 2, ..., n. The stacked BLS structure diagram is shown in Figure 3. The BLS blocks (different from the original BLS) are described as follows:

u_i = [P(v_i, w_ei) | Q(P(v_i, w_ei), w_hi)] [W_Ei; W_Hi], i = 1, 2, ..., n

where w_ei and w_hi are the weights of the randomly generated mapped nodes and enhancement nodes, respectively, and v_i = g(u_{i−1}), with g(·) a constant function, i.e., v_i = u_{i−1}. In this process, the output of the previous block is passed as the input of the new block. W_Ei and W_Hi in the above equation are obtained from:

[W_Ei; W_Hi] = [P(v_i, w_ei) | Q(P(v_i, w_ei), w_hi)]^+ y_i

where y_i is the expected output of the training data in the i-th block.
In fact, the optimal solution can be approximated by ridge regression:

[W_Ei; W_Hi] = (λI + A^T A)^{−1} A^T y_i

where λ is the ridge coefficient, I is the identity matrix, and A = [P(v_i, w_ei) | Q(P(v_i, w_ei), w_hi)]. All u_i are used jointly to approximate the expectation y; thus, the expected output of each block is:

y_i = y − (u_1 + u_2 + ... + u_{i−1})

Next, the details of each BLS block are described, given the training data (x, y). At this point, set y_1 = y and v_1 = x; this process can be seen as the generation of the original BLS:

u_1 = [P(x, w_e1) | Q(P(x, w_e1), w_h1)] [W_E1; W_H1]

where w_e1 and w_h1 are randomly generated, and W_E1 and W_H1 are obtained from the matrix [P(x, w_e1) | Q(P(x, w_e1), w_h1)] and the expected output y_1. The output of the first BLS block, g(u_1), is entered into the second block, i.e., v_2 = g(u_1). The expected output of the second BLS block is the residual of the first block, y_2 = y − u_1:

u_2 = [P(v_2, w_e2) | Q(P(v_2, w_e2), w_h2)] [W_E2; W_H2]

where w_e2 and w_h2 are randomly generated and W_E2 and W_H2 are obtained through the solution of (17). The same holds for the i-th BLS block:

v_i = g(u_{i−1}), y_i = y − (u_1 + u_2 + ... + u_{i−1})

u_i = [P(v_i, w_ei) | Q(P(v_i, w_ei), w_hi)] [W_Ei; W_Hi]

where w_ei and w_hi are randomly generated and W_Ei and W_Hi are obtained as above. For the n-th (last) BLS block, the block is trained to approximate y_n, so that:

y ≈ u_1 + u_2 + ... + u_n

Thus, the expected output y is eventually approximated by the sum of the outputs of all n BLS blocks. In stacked BLS, when a new BLS block is added, the previous blocks are fixed; therefore, the retraining of earlier blocks is avoided.
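The residual stacking procedure can be sketched as follows; the node counts, tanh activations, and ridge coefficient are illustrative assumptions. A useful property of the ridge fit is that each added block can never increase the training residual norm, since setting the block's weights to zero is always a feasible solution:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_block(in_dim, n_feat=12, n_enh=20):
    """Random weights/biases for one block's mapped and enhancement nodes."""
    return (rng.normal(size=(in_dim, n_feat)), rng.normal(size=n_feat),
            rng.normal(size=(n_feat, n_enh)), rng.normal(size=n_enh))

def block_features(V, blk):
    We, be, Wh, bh = blk
    P = np.tanh(V @ We + be)          # P(v_i, w_ei)
    Q = np.tanh(P @ Wh + bh)          # Q(P(v_i, w_ei), w_hi)
    return np.hstack([P, Q])

def stacked_bls(X, Y, n_blocks=3, lam=1e-3):
    """Stack blocks; each fits the current residual y_i by ridge regression."""
    V, approx = X, np.zeros_like(Y)
    norms = [np.linalg.norm(Y)]
    for _ in range(n_blocks):
        A = block_features(V, random_block(V.shape[1]))
        # [W_E; W_H] = (lam I + A^T A)^{-1} A^T y_i
        W = np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T @ (Y - approx))
        u = A @ W                     # this block's output u_i
        approx += u                   # y ~ u_1 + ... + u_i
        norms.append(np.linalg.norm(Y - approx))
        V = u                         # v_{i+1} = g(u_i) = u_i
    return approx, norms
```

Running this on synthetic data shows the residual norm decreasing (or at worst staying constant) after each stacked block, which is the behavior the self-selected-depth criterion later exploits.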

Experiment
Figure 4a shows the lab bench for infrared video acquisition, and Figure 4b shows a sample infrared image captured by the equipment. Different faulty bearings and motors are installed on the experimental bench, and all other parts are labeled in the figures. In addition, Table 1 lists the types of experimental equipment and the corresponding parameters.

The steps of infrared thermal image acquisition are shown below:
Step 1: A bearing in a given state is mounted on the shaft.
Step 2: The motor controller is turned on and the bearing runs at a constant speed of 2000 rpm.
Step 3: After the temperature is stabilized, 15 min of infrared video is captured.
Step 4: The rolling bearing failure test bench is cooled to the ambient temperature of 24.5 °C.
Step 6: Infrared images are extracted from the videos at 9 s intervals. Each category contains 100 images.
Step 7: As can be seen from Figure 4b, the regional temperature variations caused by bearing failure are reflected in the bearing itself and the shaft. Therefore, the cropping area only needs to contain this rectangular region. The RoI for each type of IR image is shown in Figure 5.
Step 8: The interfering background temperature is also included in the RoI. Since the background temperature cannot be guaranteed to be the same in each environment, the focus region is segmented using an adaptive threshold segmentation method.
Step 9: The 72 extracted second-order statistical features are imported into IPS-SSDStacked-BLS for fault diagnosis.


IPS-SSDStacked-BLS Fault Diagnosis Model
Based on the underlying theory in the second part, the IPS-SSDStacked-BLS model is proposed. It contains two parts: the IPS feature selection method and the SSDStacked-BLS fault diagnosis method. The IPS feature selection module consists of PCA and SVD: the data are feature-selected by the two methods separately, and the two sets of features are then merged. After that, stacked BLS is improved to obtain the SSDStacked-BLS fault diagnosis method. Since stacked BLS outputs the result of each block as the blocks are stacked, this property can be used in place of a traditional optimization algorithm, reducing the computation time needed to obtain the optimal number of blocks. The proposed IPS-SSDStacked-BLS fault diagnosis flowchart is shown in Figure 6. First, 72 second-order statistical features were obtained, forming a 900 × 72 matrix, and IPS was used for feature selection. The contribution retained by PCA was set to 0.999, and the contribution retained by SVD was also set to 0.999. The resulting feature sets were 900 × a and 900 × b, respectively, and were then stitched together as 900 × (a + b).

SSDStacked-BLS is the fault diagnosis module. Stacked BLS trains multiple BLS blocks on residuals, so determining the number of stacked blocks is important; too many stacks will result in long run times.
Firstly, two arrays, region 1 and region 2, are set up. Region 1 stores the u_i acquired in each round, and region 2 stores the y_i acquired in each round; these are used to evaluate the condition for ending the stacking. The first time a block is stacked, the outputs u_1 and y_1 = u_1 are stored in region 1 and region 2, respectively. The second time, another block is added, and u_2 and y_2 = u_1 + u_2 are stored in region 1 and region 2, respectively. Continuing in this order, at the n-th stacking the outputs u_n and y_n = u_1 + u_2 + ... + u_n are stored in region 1 and region 2, respectively. Specifically, the model sets a condition for jumping out of the stacking: once two blocks have been stacked and the accuracy does not increase, the stacking of new blocks is stopped. If this condition is never met, the maximum number of stacks is 10, and the last stacking result is kept as the optimal number of blocks.
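The storage-group stopping rule can be sketched as follows; the accuracy sequence is a stand-in for the per-round diagnostic rates computed from the stored cumulative outputs, and the patience and maximum block count are parameters of the sketch:

```python
def select_depth(block_accuracies, patience=3, max_blocks=10):
    """Stop stacking when accuracy fails to improve `patience` consecutive
    times; otherwise run to max_blocks. Returns (best depth, best accuracy)."""
    region2 = []                 # stores the accuracy of each round's cumulative output
    best_acc, best_n, stale = -1.0, 0, 0
    for n, acc in enumerate(block_accuracies[:max_blocks], start=1):
        region2.append(acc)      # save this round's result to the storage group
        if acc > best_acc:
            best_acc, best_n, stale = acc, n, 0
        else:
            stale += 1           # no improvement this round
            if stale >= patience:
                break            # jump out of the stacking loop
    return best_n, best_acc
```

For example, with the hypothetical sequence [0.90, 0.92, 0.95, 0.95, 0.94, 0.93, 0.96], stacking stops after three rounds without improvement, keeping three blocks; the 0.96 that would have appeared at depth seven is never computed, which is the time saving over grid search.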
In SSDStacked-BLS, the shrinkage factor s = 0.8, the regularization parameter C = 2^−20, the number of mapped nodes of the first BLS block N1 = 5, the number of windows of mapped nodes N2 = 4, and the number of enhancement nodes N3 = 25. For the remaining stacked BLS blocks, the number of mapped nodes N21 = 3, the number of windows of mapped nodes N22 = 2, and the number of enhancement nodes N23 = 10. The range of the number of stacked blocks i is [1, 10]. The parameters used by the original authors of the stacked BLS algorithm on the UCI dataset were referenced here.

Fault Diagnosis Process
The fault diagnosis process applied in this article is shown in Figure 7. Before the IPS-SSDStacked-BLS fault diagnosis model, a series of pre-processing steps is required to ensure that the extracted features are more representative. As the background temperature of the video differed for each acquisition, it was necessary to crop the RoI covering the temperature change caused by the fault. Observation of the temperature change during the experiments showed that the bearing region and the shaft produce distinct temperature changes; therefore, this region is used as the RoI.
In detail, the steps for the fault diagnosis of rolling bearing IR images based on IPS-SSDStacked-BLS are as follows: Step 1: Collect 15 min of infrared video for each of the 9 categories of rolling bearings. Crop 1 infrared image every 9 s, giving 100 images per category.
Step 2: Crop the images with the bearing and shaft area as the RoI; 280 × 150 pixel images are selected.
Step 3: Convert all of the acquired images into grayscale maps.
Step 5: Convert the images into binary images using adaptive threshold segmentation with a 5 × 5 filter and Gaussian weighted averaging.
Step 7: Input to IPS-SSDStacked-BLS fault diagnosis model.After obtaining the selected and fused features, we can obtain the best accuracy and the number of stacked blocks.
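The adaptive threshold segmentation step can be sketched in pure NumPy (in practice a library routine such as OpenCV's adaptive thresholding would typically be used); the window size and the constant offset `c` here are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # Normalized 2-D Gaussian weights for the local window.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def adaptive_threshold(gray, size=5, c=2.0, sigma=1.0):
    # A pixel becomes 255 when it exceeds the Gaussian-weighted mean of
    # its size x size neighborhood minus the constant c, else 0.
    pad = size // 2
    padded = np.pad(gray.astype(float), pad, mode="edge")
    kern = gaussian_kernel(size, sigma)
    h, w = gray.shape
    local_mean = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            local_mean[i, j] = (padded[i:i + size, j:j + size] * kern).sum()
    return np.where(gray > local_mean - c, 255, 0).astype(np.uint8)
```

A locally cooler (faulty-region) pixel falls below its neighborhood's weighted mean and is segmented to 0, while uniform background stays at 255.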

Results and Discussion
Before describing the results, the operating environment needs to be given; the details are shown in Table 2. The nine obtained types of infrared images were subjected to RoI cropping, grayscale map conversion, median filtering denoising and adaptive threshold segmentation. The results after pre-processing are shown in Figure 8.

Subsequently, 72 features were extracted using GLCM, denoted GLCM_0, GLCM_1, ..., GLCM_71. The training set and test set were divided in a ratio of 9:1. Three fault diagnosis models, IPS-SSDStacked-BLS, PCA-SSDStacked-BLS and SVD-SSDStacked-BLS, were then compared in order to verify the advantages of the IPS feature selection method. Firstly, the IPS method described above was used to select features; feature selection was also performed using PCA with a contribution rate of 0.9999 and SVD with a contribution rate of 0.9999 (here the contribution index was raised). The three groups of features were applied to the SSDStacked-BLS proposed in this article for fault diagnosis, and the results are shown in Table 3 and Figure 9. According to the comparison results in Table 3, the fault diagnosis rate of IPS-SSDStacked-BLS is significantly higher than that of PCA-SSDStacked-BLS and SVD-SSDStacked-BLS, by 0.0889 and 0.0335, respectively. Although the overall time consumed for feature selection using PCA is half that of the other two methods, its accuracy is too low. The time spent on IPS is closer to that of PCA than to that of SVD, yet IPS-SSDStacked-BLS reaches a fault diagnosis accuracy of 0.9667. It is noteworthy that the running time of SSDStacked-BLS itself is similar regardless of the applied feature selection method. In the experiments, the PCA and SVD contribution rates were both manually increased, and their fault diagnosis performance was still not as good as that of IPS. Thus, the IPS feature selection method combines the advantages of both sets of features. A time consumption index is used for clearer time analysis: IPS-SSDStacked-BLS is set to 1 and the others are expressed as the corresponding time multiples.
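The IPS fusion discussed here can be illustrated with a minimal pure-NumPy sketch, under stated assumptions: PCA and SVD each keep components up to a 0.9999 cumulative contribution rate, and the two projections are concatenated. This is an illustration, not the authors' code.

```python
import numpy as np

def n_components(values, rate=0.9999):
    # Smallest number of components whose normalized cumulative
    # contribution reaches `rate`.
    contrib = np.cumsum(values) / np.sum(values)
    return int(np.searchsorted(contrib, rate) + 1)

def ips_features(X, rate=0.9999):
    Xc = X - X.mean(axis=0)
    # PCA branch: eigen-decomposition of the feature covariance matrix.
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigval)[::-1]
    eigval = np.clip(eigval[order], 0.0, None)  # guard tiny negatives
    k_pca = n_components(eigval, rate)
    pca_feat = Xc @ eigvec[:, order[:k_pca]]
    # SVD branch: right singular vectors of the data matrix itself.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    k_svd = n_components(s ** 2, rate)
    svd_feat = X @ Vt[:k_svd].T
    # Stitch the two selections together into the fused IPS feature.
    return np.hstack([pca_feat, svd_feat])
```

Because PCA works on the centered covariance while SVD acts on the raw data matrix, the two branches can retain different directions, which is why stitching them broadens the feature coverage.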
As can also be seen from Table 3, the numbers of stacked blocks found automatically by the three models are 2, 3 and 2, respectively. To verify whether IPS-SSDStacked-BLS, PCA-SSDStacked-BLS and SVD-SSDStacked-BLS export the optimal number of blocks, the jump-out-of-the-loop mode was canceled and the results for 1-10 stacked blocks were exported for each fault diagnosis model, as shown in Figure 10. From Figure 10a, the earliest stacked block with the highest accuracy is indeed block 2, which is consistent with the value filtered out by IPS-SSDStacked-BLS. From Figure 10b, the earliest number of stacked blocks with the highest accuracy is indeed block 3, which is consistent with PCA-SSDStacked-BLS. From Figure 10c, the earliest number of stacked blocks with the highest accuracy is block 2, which is consistent with the value filtered by SVD-SSDStacked-BLS. From the change in accuracy in the subsequent tests, the accuracy does not increase again even if new blocks continue to be added. Therefore, the SSDStacked-BLS proposed in this article has an advantage in finding the optimal number of stacked blocks.
In the article presenting stacked BLS, the authors used a grid search method to find the optimal number of stacked blocks. The grid search method is an optimization search over the number of blocks, i.e., a parameter optimization. This results in the blocks being re-stacked for each candidate count, which increases the computational cost. As stacked BLS is based on residual training, a stacked-block-count screening method is proposed in this article. This avoids re-stacking for each candidate and thus reduces computing time. To verify the advantage of SSDStacked-BLS and its stacked blocks, it is compared with the stacked BLS using the grid search method proposed by the original authors; the results are shown in Table 4. Both IPS-SSDStacked-BLS and IPS-stacked BLS obtain the best block count of 2 and the best accuracy of 0.9667. However, the running time of IPS-stacked BLS is more than five times that of IPS-SSDStacked-BLS. The advantage of stacked BLS is that it computes in terms of residuals, so the results can be retained and accumulated in each round, which saves even more time.
According to the research, broad learning has some advantages that deep learning and machine learning do not: machine learning is less accurate and deep learning is time-consuming. Therefore, several comparison schemes were chosen. First, the IPS-BLS method was chosen, which uses the same parameters as IPS-SSDStacked-BLS, i.e., N1 = 5, N2 = 4, and N3 = 25. Fuzzy BLS was also used for fault diagnosis, with the fuzzy rule value set to 2, the number of fuzzy subsystems set to 6, and the number of enhancement nodes set to 20. In addition, two machine learning algorithms were chosen, i.e., SVM and RF, with IPS applied for feature selection before fault diagnosis. Two deep learning algorithms, 1DCNN and 2DCNN, were also selected, with feature selection performed by IPS before using 1DCNN. Among these, 1DCNN
selects two convolutional and two pooling layers, with the epoch set to 100, while 2DCNN performs fault diagnosis on the images directly, using two convolutional layers and two pooling layers with the epoch set to 30. The results are shown in Table 5. As can be seen from Table 5, the time used by IPS-BLS is slightly less than that of IPS-SSDStacked-BLS, but its accuracy is only 0.9222. IPS-Fuzzy BLS can also achieve an accuracy of 0.9667, but it takes 0.3253 s. The test accuracies of IPS-SVM and IPS-RF are 0.9000 and 0.8778, respectively, with running times 18 and 12 times that of IPS-SSDStacked-BLS. SVM and RF, as typical representatives of machine learning, have insufficient generalizability, and although machine learning algorithms are faster than deep learning, they cannot match the speed of broad learning. The accuracy of IPS-1DCNN is 0.9333 with a time of 38.2532 s, 216 times that of IPS-SSDStacked-BLS, which indicates that deep learning is time-consuming. Although 2DCNN can achieve higher accuracy, it consumes tens of thousands of times more time than IPS-SSDStacked-BLS.
In summary, IPS-SSDStacked-BLS not only has the advantage of wide-ranging feature selection but can also automatically select the number of stacked BLS blocks. Compared with other broad learning systems, it possesses higher accuracy. Compared with machine learning, it has more advantages in terms of runtime and diagnostic accuracy. Compared with deep learning, its diagnosis is hundreds or even tens of thousands of times faster. Its robustness is therefore better.

Conclusions
The IPS-SSDStacked-BLS model is proposed in this article. The model overcomes the drawback of incomplete feature coverage when a single feature selection method is used. In addition, a preferential scheme for the number of stacked blocks n is designed using the residual nature of stacked BLS. The experimental results show that the fault diagnosis method has the following advantages: (1) The IPS feature selection method combines the features selected by both PCA and SVD. This enables wide feature coverage and thus enhances the fault diagnosis results.
(2) Infrared images offer an additional advantage in terms of visibility.
(3) The residual training of stacked BLS is exploited to store u_i and y_i in each round. A jump-out condition with the preferred n is designed, and the repeated stacking of an optimization algorithm is avoided, which reduces the computation time.
(4) The kernel of IPS-SSDStacked-BLS is a BLS algorithm, which outperforms machine learning and deep learning in terms of diagnostic time and accuracy.
Experimental comparison shows that IPS feature selection significantly outperforms PCA and SVD alone. IPS-SSDStacked-BLS takes advantage of residuals and stops block stacking when the accuracy fails to improve over consecutive rounds, so it finds the preferred n faster than stacked BLS using the grid search method. It also outperforms machine learning and deep learning in terms of diagnostic time and accuracy. The algorithms investigated in this article are applicable to other types of bearing fault diagnosis as well. Only one type of data is provided in this article, which will need to be addressed in future work.

Figure 1. The structure of BLS.

Here, W_Ei and W_Hi denote the randomly generated weights of the mapped nodes and the enhancement nodes, respectively. In this process, the output of the previous layer of blocks is passed to the input of the new layer of blocks.

Figure 3. The whole network of the stacked BLS.


Figure 4. Infrared image acquisition. The (a) test bench and (b) infrared video screen.

In this experiment, we established nine types of bearing conditions: healthy bearing, holder failure bearing, inner ring 0.5 mm crack, inner ring 1.0 mm crack, inner ring 1.5 mm crack, outer ring 0.5 mm crack, outer ring 1.0 mm crack, outer ring 1.5 mm crack and rolling element failure bearing. The corresponding abbreviations are HE, HO, IN05, IN10, IN15, OU05, OU10, OU15 and RO, respectively.
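The nine classes and their abbreviations can be encoded as a simple label map, convenient for building the dataset (names taken directly from the text):

```python
# Abbreviation -> description for the nine bearing conditions.
bearing_classes = {
    "HE": "healthy bearing",
    "HO": "holder failure bearing",
    "IN05": "inner ring 0.5 mm crack",
    "IN10": "inner ring 1.0 mm crack",
    "IN15": "inner ring 1.5 mm crack",
    "OU05": "outer ring 0.5 mm crack",
    "OU10": "outer ring 1.0 mm crack",
    "OU15": "outer ring 1.5 mm crack",
    "RO": "rolling element failure bearing",
}
```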

To verify the robustness of the model, 20 experiments were performed on the data. The results are shown in Figure 11. The accuracy of the 20 experiments is exhibited in Figure 11a, and the running time of the 20 experiments is shown in Figure 11b. Accuracy remained between 0.9556 and 0.9778 for each round of testing, with an average value of 0.9700. The running time ranged from 0.15 to 0.23 s, with an average value of 0.1837 s.

Figure 11. Results of 20 experimental runs. (a) Accuracy of the test and (b) running time.


Table 1. Types of experimental equipment and parameters.


Table 3. Fault diagnosis results under different feature selection methods.

Table 4. Comparison of different optimization methods.


Table 5. Comparison of machine learning, deep learning and broad learning methods.
