Automatic Defect Identification Method for Magnetic Particle Inspection of Bearing Rings Based on Visual Characteristics and High-Level Features

Featured Application: The magnetic particle indication characteristics of bearing ring surfaces in fluorescent magnetic particle inspection, and a crack detection method based on image processing, were investigated, providing a valuable reference for automatic crack detection in magnetic particle inspection.

Abstract: Fluorescent magnetic particle inspection (MPI) is a conventional non-destructive testing process for railway bearing rings that is still performed manually. Due to the complexity of bearing ring surfaces in inspection, automatic detection for bearing rings based on image processing is difficult to apply. Therefore, we propose a bearing ring defect identification method based on visual characteristics and high-level features. The method is inspired by the mechanism of human visual perception, by which the human eye readily identifies defects against a complex background. According to the linear structure characteristics and greyscale distribution characteristics of cracks in the acquired images, we introduce centerline extraction and a Gaussian similarity measure to reduce background noise and obtain the crack candidate regions. Then, an improved MobileNetV3 is used to extract high-level features of the candidate regions and determine whether they are defective. The improved model uses a new attention module, Coordinate Attention (CA), in place of the Squeeze-and-Excitation (SE) attention to improve performance. The experimental results show that the detection accuracy of the proposed method is 96.5%. Compared with traditional methods, the proposed method can efficiently extract crack defects in a complex textured background and shows high-quality performance in recall and precision.


Introduction
Bearing rings are important parts of mechanical equipment, and their quality directly affects the performance and reliability of mechanical products. During the actual processing of bearing rings, it is critical to detect surface defects and prevent failures, since these determine the quality of the whole product. Electromagnetic non-destructive testing plays an increasingly important role in safety and production quality [1,2]. Because the human eye cannot directly observe the cracks, fluorescent magnetic particle inspection (MPI) is widely used for the ring surface inspection of railway bearings due to its high sensitivity and low cost [3,4]. However, surface defects are still inspected by human experts, which requires considerable labor and time. Therefore, it is necessary to apply computer vision-based techniques for automatic crack defect identification in fluorescent magnetic particle inspection of bearing rings.

With the development of computer vision science and technology, automated defect detection is becoming increasingly popular [5][6][7][8][9][10]. Many scholars have developed defect detection algorithms for fluorescent magnetic particle inspection [11][12][13][14]. The processing and analysis of surface defect images are common methods for detecting defects. Traditional methods detect surface defects automatically through image processing and machine learning techniques. Most image processing methods use smoothing and enhancement algorithms to preserve crack regions while filtering out interference [15,16]. Furthermore, techniques such as edge detection [17,18], thresholding in greyscale images [19,20], and image segmentation are typically used to segment defect regions from the images. Because images collected in the production workshop usually contain noise or textured backgrounds, there are many non-defective regions in the segmented images. To suppress complex interference, defect identification is a crucial module of the defect detection algorithm, which is usually composed of feature extractors and machine learning algorithms. Feature extractors are often used to obtain hand-designed features that describe the geometric or texture information of cracks, such as Hu invariant moment features [21], local binary pattern (LBP) features [22], histogram of oriented gradients (HOG) features [23], and SIFT features [24]. These features are fed into a trained classifier to decide whether the input image contains defects. For example, in [12], a maximum entropy threshold segmentation algorithm was used to obtain crack regions, and geometric features, greyscale features, and texture features of the regions were extracted. Then, a sparse self-coding neural network was applied to identify cracks. Ma Tao et al. acquired the centerline of the continuous accumulations of magnetic particles in the inspection image and extracted the SIFT features of these regions. Based on these features, they used a support vector machine (SVM) classifier for defect classification [25]. However, these methods, which rely on hand-designed features extracted from images, are less effective in distinguishing cracks under complex conditions. Any change in these conditions, such as illumination, may affect the reliability of these methods.
In recent years, deep learning algorithms such as convolutional neural networks (CNNs) have been used for defect detection in fluorescent MPI. CNN models can extract relevant features from the input image through multilayer neural networks, and they are superior to traditional methods based on hand-designed features and machine learning algorithms [26]. They can automatically extract high-level features by using the prior knowledge of known image samples and reduce the dependence on expert knowledge. For example, Wang et al. used an improved EfficientNet model to detect cracks in fluorescent MPI images of forgings [27]. Tout et al. ran a fixed-size sliding window over the whole image and used ResNet-34 to determine whether each window contained defects [28]. Although algorithms for automatic detection in magnetic particle inspection images based on image processing have been continuously proposed, mature industrial applications are still rare.
This paper mainly focuses on crack defect detection in MPI for bearing rings. As shown in Figure 1, there are many complex textures and false indications in the MPI image of the bearing ring. Figure 1a shows an MPI image of a bearing outer ring containing crack defects, which was captured by an industrial camera in the bearing factory. We analyze the geometric characteristics of crack indications and common interference factors in Figure 1a by drawing their three-dimensional intensity distribution maps with MATLAB. As shown in Figure 1b, the crack indications generally appear as yellow-green linear shapes due to the accumulation of fluorescent magnetic particles. As shown in Figure 1c, there are many randomly distributed magnetic particles caused by rough surfaces, which form the complex textured background in MPI images of bearing rings. The intensities of the crack indications and the randomly distributed magnetic particles are similar, which results in low contrast between the crack indications and the background. In addition, some accumulations of fluorescent magnetic particles that also appear as linear shapes may be mistaken for cracks; we call these false indications. Figure 1d shows the false indications formed by magnetic particles gathered at the groove of the workpiece. Therefore, the accurate location of complete cracks in MPI images with complex textured backgrounds is challenging. Traditional methods based on hand-designed image features and machine learning algorithms are useful for images with a simple background. However, these techniques cannot deal with noise, variable illumination, and backgrounds with complex textures. Although existing detection methods based on deep learning algorithms have improved the performance of feature expression, locating cracks requires many computations, which cannot meet the real-time requirements of detection.

To solve the problems mentioned, a bearing ring defect detection method combining visual characteristics and high-level features is proposed. Human vision can quickly locate the region of interest in complex backgrounds. For the magnetic particle inspection of bearing rings, the human eye can quickly find cracks according to their visual characteristics. The geometric characteristics and greyscale distribution characteristics of cracks are introduced to represent the visual characteristics. The main contributions of the study include (1) presenting an image preprocessing algorithm based on the visual characteristics of crack indications to obtain crack defect candidate regions, which realizes the rough location of cracks, and (2) introducing the improved MobileNetV3 model to extract high-level features of crack defects for further discrimination.

Methods
In this section, the proposed detection method is described in detail. The detection method is divided into two stages: acquisition of crack defect candidate regions and identification by the improved MobileNetV3. Inspired by the mechanism of human visual perception, an image processing algorithm based on visual characteristics is applied to obtain crack candidate regions. In the first stage, based on the linear structure characteristics of the crack, the centerline of linear magnetic particle accumulations in the input image is obtained. Then, Gaussian similarity is used to measure the greyscale distribution characteristics of crack defects. According to this measure, some false indications are removed from the centerline result image. The circumscribed rectangles of the centerlines, mapped back to the original image, are taken as the candidate regions to realize the rough location of the crack defect. In the second stage, the candidate regions are patched into 224 × 224 images and input into the trained improved MobileNetV3 model, which is used to further determine whether the candidate regions are defective. An overview of the bearing ring defect detection method is shown in Figure 2.
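The text does not say how a candidate rectangle is turned into a 224 × 224 patch. As a minimal sketch, assuming the patch is a fixed-size crop centered on the rectangle and shifted inward at the image border, the crop window could be computed as follows. The function name `patch_window` and the centering rule are illustrative assumptions, not the authors' procedure; rectangles larger than the patch would additionally need resizing, which is not handled here.

```python
PATCH = 224  # patch side length stated in the text


def patch_window(box, image_width, image_height, size=PATCH):
    """Return (left, top, right, bottom) of a size x size crop centered on box.

    box is (x0, y0, x1, y1) in pixel coordinates. The crop is shifted,
    not shrunk, when it would cross the image border. Assumes the image
    is at least `size` pixels in each dimension (hypothetical rule).
    """
    if image_width < size or image_height < size:
        raise ValueError("image is smaller than the patch size")
    cx = (box[0] + box[2]) // 2  # center of the candidate rectangle
    cy = (box[1] + box[3]) // 2
    left = min(max(cx - size // 2, 0), image_width - size)
    top = min(max(cy - size // 2, 0), image_height - size)
    return left, top, left + size, top + size
```

For a rectangle near the top edge, the window is clamped so that the patch stays inside the image while keeping its full size.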

Centerline Extraction Based on the Steger Algorithm
The retention of magnetic particles on the surface gives the inspection images a complex textured background. The intensities of crack defects are similar to those of magnetic particles on the surface, which results in low contrast between the defect and the background. The crack regions cannot be extracted completely by threshold segmentation of the grey images. By analyzing the geometric characteristics of crack defects and surface magnetic particles, it is concluded that the crack defects are usually linear with a certain width, while the magnetic particles are randomly distributed points. Therefore, crack defects in the inspection images can be considered curvilinear stripes.

The Steger algorithm is one of the most commonly used algorithms in curvilinear stripe center extraction due to its precision and robustness [29]. The center of the curvilinear stripe is given by the extreme point of greyscale in the normal direction. The normal direction of the curvilinear structure is determined by calculating the eigenvalues and eigenvectors of the Hessian matrix as follows:

$$H(x, y) = \begin{bmatrix} d_{xx} & d_{xy} \\ d_{xy} & d_{yy} \end{bmatrix}$$

In the normal direction, the extreme point of the section grey distribution curve is regarded as the subpixel location. Here $d_{xx}$, $d_{xy}$, and $d_{yy}$ are the second-order partial derivatives of the image, obtained by convolving the image $d(x, y)$ with discrete two-dimensional Gaussian partial derivative kernels, which are computed as follows:

$$d_{xx} = \frac{\partial^2 G(x, y)}{\partial x^2} \otimes d(x, y), \qquad d_{xy} = \frac{\partial^2 G(x, y)}{\partial x \, \partial y} \otimes d(x, y), \qquad d_{yy} = \frac{\partial^2 G(x, y)}{\partial y^2} \otimes d(x, y)$$

Here $G(x, y)$ is a two-dimensional Gaussian convolution kernel, and the values in the kernel are defined as follows:

$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$

where $\sigma$ represents the standard deviation, and $d(x, y)$ is an image patch centered on the point $(x, y)$ whose size equals that of the Gaussian kernel $G(x, y)$.
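The Hessian entries and the eigen-analysis described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the values `sigma=1.5` and `radius=6` are arbitrary, the helper names are hypothetical, and the eigenvector is taken for the eigenvalue of largest absolute value, as in the standard Steger formulation.

```python
import math


def gaussian_second_derivative_kernels(sigma, radius):
    """Return (g_xx, g_xy, g_yy), each indexed [v + radius][u + radius].

    u is the offset along x (columns), v the offset along y (rows).
    """
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    g_xx, g_xy, g_yy = [], [], []
    for v in range(-radius, radius + 1):
        row_xx, row_xy, row_yy = [], [], []
        for u in range(-radius, radius + 1):
            g = norm * math.exp(-(u * u + v * v) / (2.0 * sigma ** 2))
            row_xx.append((u * u / sigma ** 4 - 1.0 / sigma ** 2) * g)
            row_xy.append((u * v / sigma ** 4) * g)
            row_yy.append((v * v / sigma ** 4 - 1.0 / sigma ** 2) * g)
        g_xx.append(row_xx)
        g_xy.append(row_xy)
        g_yy.append(row_yy)
    return g_xx, g_xy, g_yy


def convolve_at(image, kernel, x0, y0):
    """Discrete convolution of image (indexed [y][x]) with kernel at (x0, y0)."""
    radius = len(kernel) // 2
    total = 0.0
    for v in range(-radius, radius + 1):
        for u in range(-radius, radius + 1):
            total += kernel[v + radius][u + radius] * image[y0 - v][x0 - u]
    return total


def hessian_at(image, x0, y0, sigma=1.5, radius=6):
    """Return (d_xx, d_xy, d_yy) at pixel (x0, y0); the patch must fit in the image."""
    g_xx, g_xy, g_yy = gaussian_second_derivative_kernels(sigma, radius)
    return (convolve_at(image, g_xx, x0, y0),
            convolve_at(image, g_xy, x0, y0),
            convolve_at(image, g_yy, x0, y0))


def normal_direction(d_xx, d_xy, d_yy):
    """Return (eigenvalue, (n_x, n_y)) for the eigenvalue of largest magnitude."""
    mean = 0.5 * (d_xx + d_yy)
    root = math.hypot(0.5 * (d_xx - d_yy), d_xy)
    lam = mean + root if abs(mean + root) >= abs(mean - root) else mean - root
    vx, vy = lam - d_yy, d_xy            # first eigenvector candidate
    if math.hypot(vx, vy) < 1e-12:
        vx, vy = d_xy, lam - d_xx        # second candidate if the first vanishes
    length = math.hypot(vx, vy)
    if length < 1e-12:                   # isotropic case: direction is arbitrary
        return lam, (1.0, 0.0)
    return lam, (vx / length, vy / length)
```

On a synthetic vertical ridge, `d_xx` is strongly negative at the ridge center while `d_xy` and `d_yy` stay near zero, so the returned normal points along the x axis, across the stripe.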
Then, the normal direction at the pixel $(x_0, y_0)$ is given by the eigenvector $(n_x, n_y)$ corresponding to the eigenvalue of $H(x, y)$ with the largest absolute value at $(x_0, y_0)$. The grey distribution function in the normal direction at the point $(x_0, y_0)$ is expanded by a second-order Taylor expansion, which is computed as follows:

$$d(x_0 + t n_x, y_0 + t n_y) = d(x_0, y_0) + t n_x d_x + t n_y d_y + \frac{t^2}{2}\left(n_x^2 d_{xx} + 2 n_x n_y d_{xy} + n_y^2 d_{yy}\right)$$

We take the extreme point of grey level in the normal direction as the center point. The point is obtained by setting the first derivative of $d(x_0 + t n_x, y_0 + t n_y)$ with respect to $t$ to zero. Therefore, the subpixel center point is $(x_0 + t n_x, y_0 + t n_y)$, where the value of $t$ is

$$t = -\frac{n_x d_x + n_y d_y}{n_x^2 d_{xx} + 2 n_x n_y d_{xy} + n_y^2 d_{yy}}$$

where

$$d_x = \frac{\partial G(x, y)}{\partial x} \otimes d(x, y), \qquad d_y = \frac{\partial G(x, y)}{\partial y} \otimes d(x, y)$$

Gaussian Similarity Measure
In the fluorescent magnetic particle inspection images of bearing rings, there are some false indications caused by the accumulation of magnetic particles at the edge or groove of the workpiece. The false indications and cracks have similar linear structural characteristics, so the Steger algorithm extracts a centerline for both. To suppress the interference of false indications in the process of crack extraction, we analyze the greyscale distribution property of cracks. In the normal direction of the local area of the crack, the curve of grey levels is similar to a Gaussian curve, as shown in Figure 3.

To fit the greyscale distribution of cracks in the normal direction, a Gaussian curve $g(x)$ is defined as follows:

$$g(x) = a \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) + b \tag{10}$$

In Equation (10), $\mu$ and $\sigma$ are the center and width of the curve, and $b$ represents the vertical offset of the Gaussian curve, which is equal to the average greyscale of the crack. As a rule of thumb, we set $b$ to 90 in this work. $a + b$ is the maximum value of the curve, and $a$ is defined as follows: [equation missing in the source]

According to the above characteristics, a 9 × 3 sliding window is designed to obtain the normal greyscale distribution of the centerline obtained by the Steger algorithm, as shown in Figure 4. The greyscale values in the tangent direction of the crack remain basically unchanged, while the change in the normal direction follows Gaussian variation. Therefore, we assign values to the sliding window according to these characteristics. The sliding window is divided into three rows, $l_1$, $l_2$, $l_3$, and the value distribution of each row follows the variation of $g(x)$.

The sliding window traverses all pixels of the extracted centerline in the inspection images, and the y-axis (vertical direction) of the window is rotated to coincide with the $e_2$ direction (the tangent direction of the extracted centerline). Then, we compare all pixel values in the mask region with the corresponding values in the sliding window to obtain their cosine similarity. The Gaussian similarity of the regions is measured using cosine similarity, that is, by calculating the cosine of the angle between two vectors.
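The window construction and the row-wise comparison described above can be sketched as follows. Only the 9 × 3 window size and b = 90 come from the text; the amplitude `a=100.0`, the width `sigma=1.5`, the centering of the curve on the middle column, and all function names are illustrative assumptions.

```python
import math


def gaussian_template(a, b, sigma, width=9):
    """Sample g(x) = a * exp(-(x - mu)^2 / (2 sigma^2)) + b on `width` points.

    mu is taken as the middle column of the window (an assumption).
    """
    mu = (width - 1) / 2.0
    return [a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) + b
            for x in range(width)]


def cosine_similarity(g, h):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(g, h))
    norm_g = math.sqrt(sum(p * p for p in g))
    norm_h = math.sqrt(sum(q * q for q in h))
    if norm_g == 0.0 or norm_h == 0.0:
        return 0.0
    return dot / (norm_g * norm_h)


def row_similarities(window_rows, image_rows):
    """Cosine similarity of each window row with the matching image profile."""
    return [cosine_similarity(g, h) for g, h in zip(window_rows, image_rows)]


# Three identical rows: the grey level is taken as constant along the tangent.
template = gaussian_template(a=100.0, b=90.0, sigma=1.5)
window = [template, template, template]
```

Sampling the image along the rotated window (so that each `image_rows` entry follows the local normal direction) is left out of this sketch. Because all grey values are positive, cosine similarities stay close to 1 even for poor matches, so any threshold has to be tuned on real profiles.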
The closer the cosine value is to 1, the closer the angle $\theta$ is to 0, and the more similar the two vectors are. The cosine similarity $C_i$ between the vector $g_i$ in row $i$ of the sliding window and the corresponding vector $h_i$ in the image is calculated as follows:

$$C_i = \cos\theta = \frac{g_i \cdot h_i}{\lVert g_i \rVert \, \lVert h_i \rVert}$$

where $g_i$ holds the values of the coordinate points in one row of the sliding window and $h_i$ holds the greyscale values of the corresponding pixels in the original image. After the similarity between each of the three row vectors of the sliding window and the corresponding greyscale vector of the original image is calculated, the Gaussian similarity $GS$ is obtained by weighted average, which is calculated as follows:

$$GS = \frac{\sum_{i=1}^{3} w_i C_i}{\sum_{i=1}^{3} w_i}$$

where $w_i$ is the weight assigned to row $i$.

We retain the pixels whose Gaussian similarity $GS$ is smaller than the threshold that we set. After this operation, the centerlines caused by the accumulation of magnetic particles at the edge or groove of the workpiece can be effectively removed. As shown in Figure 5, the centerline result in Figure 5b is obtained by the Steger algorithm. Figure 5c shows the results after using the Gaussian similarity measure. It can be seen that most of the centerline results caused by the false indications are removed. Then, we take the maximum bounding rectangles of connected regions in the image and merge the adjacent bounding rectangles. After these operations, the crack candidate regions are shown in Figure 5d. As can be seen, due to the geometric similarity between false indications and cracks, some false indications cannot be effectively removed by relying on the visual characteristics alone. Although the candidate regions contain some false indications, the rough location of the crack is acquired, which reduces the computation of the CNN model.
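The merging of adjacent bounding rectangles mentioned above can be sketched as follows. The text does not define "adjacent", so the rule used here (two rectangles are merged when they overlap or lie within `gap` pixels of each other) and the default `gap=5` are assumptions made for illustration.

```python
def boxes_are_adjacent(a, b, gap):
    """True if boxes a and b, given as (x0, y0, x1, y1), are within `gap` pixels."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2] and
            a[1] - gap <= b[3] and b[1] - gap <= a[3])


def merge_adjacent_boxes(boxes, gap=5):
    """Repeatedly merge adjacent boxes into their common bounding rectangle."""
    merged = [tuple(box) for box in boxes]
    changed = True
    while changed:
        changed = False
        result = []
        for box in merged:
            for i, other in enumerate(result):
                if boxes_are_adjacent(box, other, gap):
                    # Replace the stored box by the union bounding rectangle.
                    result[i] = (min(box[0], other[0]), min(box[1], other[1]),
                                 max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                result.append(box)
        merged = result
    return merged
```

Each merge reduces the number of rectangles by one, so the loop terminates; the passes repeat because a merged rectangle can become adjacent to one that was previously too far away.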

Improved MobileNetV3 CNN Model
To better suppress the interference of false indications, a CNN model is used to extract the high-level features of candidate regions and further identify the defects. By using depth-wise separable convolution instead of standard convolution, MobileNets considerably reduce the number of parameters and the amount of computation. Although the parameters and computation of a lightweight network are reduced, the classification accuracy also decreases correspondingly. Considering the real-time and accuracy requirements of detection, an improved MobileNetV3 is used in our experiments [30]. The SE module used in the original model mainly applies attention along the feature channels. To make the network model pay attention to the important characteristics of crack defects in both channel and spatial dimensions, we use a new attention module, the Coordinate Attention (CA) module, in place of the SE module in the original MobileNetV3 [31].
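The parameter saving of depth-wise separable convolution can be checked with a short calculation. The sketch below counts weights only (biases and batch normalization are ignored), and the layer sizes in the test are arbitrary example values, not taken from the network in this paper.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a k x k depth-wise convolution plus a 1 x 1 point-wise one."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


def reduction_ratio(kernel_size, in_channels, out_channels):
    """Separable / standard weight count; equals 1/out_channels + 1/k^2."""
    return (depthwise_separable_params(kernel_size, in_channels, out_channels)
            / standard_conv_params(kernel_size, in_channels, out_channels))
```

For a 3 × 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where most of the saving comes from.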

Coordinate Attention Module
The CA module offers a more efficient way of capturing positional information and channel-wise relationships to augment the feature representations of mobile networks, and it has been shown to improve model performance effectively. It captures channel relationships and long-range dependencies with precise positional information to encode a pair of direction-aware and location-sensitive attention maps, which can effectively enhance the representation of defect discriminative regions and learn richer discriminative information between categories. The coordinate attention module is shown in Figure 6.

First, horizontal and vertical adaptive average pooling layers are used to extract the features of each input feature channel. Then, the generated feature maps are concatenated, and a 1 × 1 convolution is used to generate an intermediate feature map with vertical and horizontal spatial information. The intermediate feature map is split into two feature maps along the spatial dimension, and two 1 × 1 convolutions convert their number of channels. Finally, the attention weights in the vertical and horizontal spatial directions are obtained and multiplied with the input feature map to obtain the output feature map with attention weights.
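The four steps above can be traced in a small NumPy forward pass for a single feature map. This is a simplified sketch, not the trained module: the weights are random placeholders, batch normalization is omitted, the h-swish non-linearity after the shared 1 × 1 convolution and the sigmoid gates are assumptions based on the usual Coordinate Attention design, and a 1 × 1 convolution is written as a matrix product over the channel axis.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def h_swish(z):
    return z * np.clip(z + 3.0, 0.0, 6.0) / 6.0


def coordinate_attention(x, w_mid, w_h, w_w):
    """Apply coordinate attention to x of shape (C, H, W).

    w_mid: (M, C) shared 1 x 1 convolution to M intermediate channels.
    w_h, w_w: (C, M) 1 x 1 convolutions restoring C channels per direction.
    """
    _, height, _ = x.shape
    pooled_h = x.mean(axis=2)                 # (C, H): average over width
    pooled_w = x.mean(axis=1)                 # (C, W): average over height
    stacked = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    mid = h_swish(w_mid @ stacked)            # (M, H + W)
    mid_h, mid_w = mid[:, :height], mid[:, height:]          # split by direction
    att_h = sigmoid(w_h @ mid_h)              # (C, H) attention along height
    att_w = sigmoid(w_w @ mid_w)              # (C, W) attention along width
    # Broadcast the two 1-D attention maps over the full feature map.
    return x * att_h[:, :, None] * att_w[:, None, :]
```

Because both attention maps lie in (0, 1), the output keeps the input shape and never exceeds the input in magnitude; each position is rescaled by the product of its row weight and its column weight.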

Architecture of Improved MobileNetV3
MobileNetV3 is composed of convolution layers, bottleneck layers with SE modules, and pooling layers. The overall architecture of the improved MobileNetV3, obtained by introducing the CA module into the bottleneck layers, is shown in Figure 7. Conv2d stands for a convolutional layer. Pool means an adaptive average pooling layer. NL denotes the activation function. Dwise is the depth-wise separable convolution structure. The bottleneck includes a depth-wise separable convolution layer and a coordinate attention module, and it uses an inverted residual connection to connect the input and output feature maps. The inverted bottleneck structure enables feature extraction to be performed in high dimensions so as to extract more feature information and reduce the number of parameters. The depth-wise separable convolution layer is composed of a 3 × 3 depth-wise convolutional kernel applied to each channel and a 1 × 1 point-wise convolutional kernel, with a batch normalization layer and the ReLU or h-swish activation function.

The operation and parameter details of each layer in the improved MobileNetV3 are described in Table 1. Input represents the size of the input feature map. Operator represents the type of each layer in the network. Kernel size refers to the size of the convolution kernel or pooling kernel applied in each operator, and stride is the step size of the convolution or pooling operation. Out denotes the dimension of the feature map output by the operator. CA indicates whether there is a Coordinate Attention module in that block. NL denotes the type of activation function in each layer.

Appl. Sci. 2022, 12, 1293

For the whole network architecture, the generated candidate regions are patched and resized into 224 × 224-pixel images, which are fed into the trained improved MobileNetV3. High-level feature maps with a size of 1 × 1 × 1280 are obtained by a series of convolution, bottleneck, and pooling operations. At the last layer of the network, a 1 × 1 convolution reduces the 1280-dimensional features to 2 dimensions, and the final 2-dimensional output represents the prediction result.
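On a 1 × 1 × 1280 feature map, the final 1 × 1 convolution is equivalent to a 2 × 1280 matrix applied to the feature vector. The sketch below shows that step followed by a softmax; the softmax, the bias terms, the function name `classify`, and the meaning of each output index are assumptions for illustration, since the text only states that the 2-dimensional output represents the prediction.

```python
import math


def classify(features, weights, biases):
    """Map a feature vector to (predicted_index, class_probabilities).

    weights has one row per class (2 rows of 1280 values in this setting).
    """
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)                       # subtract the max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs
```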

Dataset Description and Implementation Details
Bearing ring samples are collected using an industrial camera in the flaw detection room of the bearing factory, which contains 519 images of crack defects.All these bearing rings have at least one crack on their surface and are located in various parts of the bearing rings.Training versus test analysis is used as the validation method of the detection algorithm [32].All original images are divided into two groups: 419 for training the CNN model, 100 for testing.
After we obtain several candidate regions, the improved MbileNetV3 model is used to distinguish whether they are defects.The input of binary classification of the CNN model is the crack defect candidate region from the original image, and the input size is 224 × 224.To create the dataset for the training and evaluation of binary classification, patches of size 224 × 224 are extracted from the 419 images of bearing rings.Patches that contain a crack or a part of a crack are labeled defective, and patches that do not contain any parts of a crack are considered background and labeled non-defective.We usually cut the regions in the original image that are easily mistaken for defects, such as the edge and groove of the workpiece, as shown in Figure 8.
To address the imbalance between classes in the dataset, data augmentation and image enhancement techniques, such as horizontal flip, rotation, width shift, and height shift, are applied to increase the quantity and variety of images given to the classifier. The resulting dataset consists of 2416 non-defective patches and 2420 defective patches.
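The augmentation operations named above can be sketched on a tiny greyscale patch stored as nested lists. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, rotation is restricted to 90° for simplicity, and shifted-in pixels are filled with zeros, which is an assumption.

```python
from typing import List

Image = List[List[int]]  # greyscale patch as rows of pixel values


def horizontal_flip(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the patch 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def width_shift(img: Image, dx: int, fill: int = 0) -> Image:
    """Shift columns right by dx (left if dx is negative), padding with fill."""
    w = len(img[0])
    if dx >= 0:
        return [[fill] * min(dx, w) + row[: max(w - dx, 0)] for row in img]
    return [row[-dx:] + [fill] * min(-dx, w) for row in img]


def height_shift(img: Image, dy: int, fill: int = 0) -> Image:
    """Shift rows down by dy (up if dy is negative), padding with fill."""
    h, w = len(img), len(img[0])
    pad = [[fill] * w for _ in range(min(abs(dy), h))]
    if dy >= 0:
        return pad + [row[:] for row in img[: max(h - dy, 0)]]
    return [row[:] for row in img[-dy:]] + pad


patch = [[1, 2], [3, 4]]  # toy 2 x 2 patch
print(horizontal_flip(patch), rotate90(patch))
print(width_shift(patch, 1), height_shift(patch, -1))
```

Each function returns a new patch of the same size, so the augmented samples can be fed to the classifier without further resizing.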
We split the dataset in an 8:1:1 ratio: 80% of the total dataset is used for training, 10% for validation, and the remaining 10% for testing, as shown in Table 2.
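A minimal sketch of an 8:1:1 split over the 4836 patches (2416 non-defective plus 2420 defective) is given below. The helper name, the fixed seed, and the rounding rule are assumptions, so the resulting counts may differ slightly from those listed in Table 2.

```python
import random
from typing import List, Tuple


def split_indices(
    n: int, train_frac: float = 0.8, val_frac: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


n_patches = 2416 + 2420  # patch counts stated in the text
train, val, test = split_indices(n_patches)
print(len(train), len(val), len(test))
```

A class-stratified split would apply the same cut to the defective and non-defective indices separately; the sketch shuffles all patches together for brevity.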
To test the proposed method, a 4.20 GHz Intel(R) Core(TM) i7-7700K CPU and 32 GB of memory are used. The graphics card is an NVIDIA GTX 1060, the computer runs Windows 10, and the detection program was developed in a Python 3.6 environment with the PyTorch, NumPy, and OpenCV packages.
Accuracy is used to quantitatively evaluate the performance of the classification task [33], which is calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

In addition, the recall rate and precision rate are defined and used to evaluate our detection method, which are calculated as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}$$

where TP is the number of samples that actually have defects and that our model also predicts to have defects, TN is the number of samples that actually do not have defects and that our model also predicts not to have defects, FN is the number of samples that actually have defects but that our model predicts not to have them, and FP is the number of samples that actually do not have a defect but that our model predicts to have one. All these indicators range over [0, 1]; high precision means more correct detection results, and a high recall rate means fewer missed detection targets in the detection results.
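As a quick numerical check of these indicators, the sketch below computes accuracy, recall, and precision from the four confusion-matrix counts, assuming the standard definitions in terms of TP, TN, FP, and FN. The function name and the example counts are hypothetical and are not results from this paper.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, recall, and precision from confusion-matrix counts.

    Returns 0.0 for a metric whose denominator is zero.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, recall is higher than precision because there are fewer missed defects (FN) than false alarms (FP).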

Identification Performance of the Improved MobileNetV3
To evaluate its identification performance, the improved MobileNetV3 model proposed in this paper is compared with the MobileNetV3, ResNet34, GoogleNet, and VGG16 models.
Adam is selected as the optimizer of the CNNs, with the number of iteration steps set to 100 and the learning rate set to 0.0001. The CNNs are trained using the dataset we split, and the training process is shown in Figure 9.
Figure 9a shows the trend of the validation accuracy as the number of iterations increases. The validation accuracy of the CNNs rises from low to high and gradually stabilizes. After 20 epochs, the difference between the five models is obvious. Figure 9b shows the curves of the cross-entropy loss function for each epoch. As training proceeds, the validation loss decreases and gradually levels off, which indicates that the CNNs have converged.
The testing results of the CNN models are shown in Table 3. They show that the accuracy of the improved MobileNetV3 is 94.9%, higher than that of the classical CNN models above. Compared with the original MobileNetV3, the accuracy of the improved MobileNetV3 is 3.1% higher. Therefore, the CA attention module is stronger than the SE attention module in expressing the features of crack defects, which helps to improve the recognition accuracy of the model. Table 3 also shows the size of the weight files generated by each model and the average run time for a single image. The parameter memory of the improved MobileNetV3 is 12.6 MB, and its average execution time is 0.675 s, which is a relatively good result considering both speed and memory.

Comparison to Other Methods
To evaluate the effectiveness of the crack defect detection method in MPI for bearing rings, this paper compares the algorithm with other algorithms commonly used on MPI images. Table 4 shows a comparison with traditional methods based on SIFT features and SVM (SIFT + SVM) [25], shape features and SVM (Shape + SVM) [34], and Hu invariant moments and a BP neural network (Hu + BP). One hundred MPI images of bearing rings are selected as the test set to verify their performance. Some of the detection results of the test set are shown in Figure 10.

As seen in Figure 10, a comparison of the results shows that the crack defects are incompletely or incorrectly detected by the other three methods. In the method based on shape features and SVM, the crack regions are extracted by thresholding, which causes inaccurate localization and missed detections, as shown in Figure 10b. In the method based on Hu and BP and the method based on SIFT and SVM, although cracks are detected, the results also contain false indications. As shown in Table 4, the proposed method has a recall rate of 96.5%, a precision rate of 91.7%, and an average detection time per image of 9.33 s, which shows better real-time performance than the three traditional methods.

Conclusions
In this paper, a novel crack identification method for MPI of bearing rings is proposed, which is based on visual characteristics and high-level features. Inspired by the mechanism of human visual perception, we introduce two visual characteristics to obtain candidate regions of crack defects. The candidate regions of crack defects are acquired through centerline extraction to achieve the rough location of defects, which reduces the computation of the CNN model. By introducing the CA module into the original MobileNetV3 model, our well-trained model can automatically extract the high-level features of the suspected regions and identify them with high accuracy. Experiments are carried out based on the bearing ring defect dataset. The experimental results show that the proposed method achieves a recall rate of 96.5%, a precision rate of 91.7%, and an average detection time of 9.33 s. This indicates that the method can reduce the influence of complex textured backgrounds and false indications, and that the high-level features extracted by the CNN model have strong expression ability and high identification efficiency.
The proposed method is planned to be applied in the fluorescent MPI procedure of bearing rings, and it also provides a reference for crack detection in other fields. In the future, more bearing ring defects should be included and investigated to expand the datasets and improve the robustness of the CNN model.

Figure 1. The inspection image of the bearing ring and corresponding magnetic particle distribution. (a) The inspection image of bearing ring; (b) Crack indication and corresponding intensity distribution; (c) Randomly distributed magnetic particles and corresponding intensity distribution; (d) False indications and corresponding intensity distribution.

Figure 2. Overview of the bearing ring defect identification method.

Figure 3. The greyscale distribution curves of crack defects in the normal direction.

Figure 4. Structure of the sliding window.

Figure 5. (a) Input image of the bearing ring; (b) Centerline result by using the Steger algorithm; (c) Result after using the Gaussian similarity measure; (d) Result of the candidate regions.

Figure 7. Network architecture of the improved MobileNetV3.

Figure 8. Bearing ring images of defective and non-defective patches from the dataset. (a) Defective patches; (b) Non-defective patches.

Figure 9. (a) The accuracy of validation; (b) The value of cross entropy loss function.

Figure 10. Detection results using four methods. (a) Original image; (b) The method based on shape features and SVM; (c) The method based on Hu moment invariant feature and BP; (d) The method based on SIFT features and SVM; (e) Our method.

Table 1. Related operations and parameters of the improved MobileNetV3.

Table 2. The dataset division for binary classification.

Table 3. The testing results of the CNN models.

Table 4. Performance of the proposed method and other traditional detection methods.