An Ensemble Learning Aided Computer Vision Method with Advanced Color Enhancement for Corroded Bolt Detection in Tunnels

Bolts, as the basic units of tunnel linings, are crucial to safe tunnel service. Owing to the moist and complex environment in tunnels, corrosion is a significant bolt defect. Computer vision technology is adopted because manual patrol inspection is inefficient and often misses corroded bolts. However, most current studies are conducted in laboratories with good lighting conditions; their effectiveness in actual practice has yet to be examined, and their accuracy also needs to be improved. In this paper, we put forward an ensemble learning approach that combines our Improved MultiScale Retinex with Color Restoration (IMSRCR) and You Only Look Once (YOLO), based on image data acquired from a real tunnel, to detect corroded bolts in the lining. Compared with the existing MSRCR, the IMSRCR sharpens and strengthens the features of the lining images, weakening the adverse effect of a dim environment. Furthermore, we use ensemble learning to combine models with different parameters that show different performance, improving the accuracy. Extensive comparison and ablation experiments on a dataset collected from a tunnel in service are conducted to demonstrate the superiority of the proposed algorithm.


Introduction
Railway transportation has become the main mode of land transport owing to its remarkable carrying capacity and high speed [1,2]. As an important branch, subway systems have developed rapidly in recent years [3], becoming the preferred mode of travel for city dwellers. The lining, which is fixed and arranged by bolts, supports the tunnel structure and guarantees the operation of metros. However, the bolts are exposed to the open air and are usually affected by moisture and air pollutants, so the steel tends to corrode [4][5][6]. For maintenance and repair, human visual inspection still dominates the tunnel industry, and its reliability is limited by the inspectors' level of training. Patrol inspectors have to check all bolts during non-running times such as night and early morning. Commonly, an inspection team of 10 to 15 trained maintainers can check only two to three kilometers during a maintenance period of about three hours, which is costly and inefficient. Besides, quite a few bolts are misjudged as normal or corroded because of the poor light in tunnels and the fatigue caused by night work. Hence, researchers have tried many approaches to design an automatic, accurate, and fast detection method for practical engineering projects.
Computer vision (CV), which overcomes the limitations of visual inspection by trained personnel and can detect structural damage in images remotely [7,8], has become a preferred technique for corroded bolt detection. However, traditional CV algorithms require manually designed filter modules, which leads to poor robustness and low accuracy. As deep learning develops, deep learning-based CV bolt corrosion detection has become available for engineering [9][10][11]. For instance, Cha et al. [12] developed an autonomous structural visual inspection method via Region-based Convolutional Neural Networks (RCNNs) for real-time damage detection covering concrete cracks, steel and bolt corrosion, and steel delamination. Ta et al. [13] monitored and identified the corrosion levels of corroded bolts in a lab-scale steel structure with good illumination using a Mask-RCNN. Suh et al. [14] adopted a Faster RCNN-based model to detect and locate damage types, including bolt corrosion. These RCNNs search the target area with selective search and generate nearly 2000 candidate regions, each described by a feature vector, for each image. They are mostly applied to precise pixel-level detection tasks. However, RCNN models are harder to deploy in practical applications than end-to-end models. In addition, in practice it is not necessary to distinguish the target pixels on a corroded bolt precisely at the expense of speed and cost. Another branch of deep learning target detection algorithms, You Only Look Once (YOLO), reframes object detection from a classification task to a regression task, speeding up the training and detection processes [15][16][17]. We select YOLOv5 nano (YOLOv5n) as the basis of our proposed model because of its speed, end-to-end characteristics, and high precision compared with two-stage detectors.
Although YOLOv5n shows superior performance in computing speed and resource consumption, complex corrosion targets still require improvements in accuracy. By using multiple models with different preferences, ensemble learning makes a better and more comprehensive decision and avoids the wrong predictions produced by weak classifiers. For example, Xu et al. [18] applied ensemble deep learning to learn and extract features of forest fires. Mohammad et al. [19] presented an ensemble deep-learning approach to recognize structural corrosion in drone images. Seijo-Pardo et al. [20] summarized homogeneous and heterogeneous ensemble learning approaches, showing the feasibility of integrating models with different parameters. Inspired by these works, we put forward ensemble learning with YOLOv5n (YOLOv5n-EL) to raise accuracy without slowing down the computing speed too much.
Beyond the detector itself, tunnels are usually damp and dim, which degrades tunnel scan images with low definition, poor contrast, and color distortion. These problems make corroded bolt detection in such tunnels difficult, so the images must be pre-processed to make their features more apparent. The Retinex theory (a color-invariance-based principle) has been shown to be effective for low-light image enhancement, such as night and underwater scenes [21][22][23]. Retinex mainly consists of three basic algorithms: Single Scale Retinex (SSR), MultiScale Retinex (MSR), and MultiScale Retinex with Color Restoration (MSRCR). Compared with SSR and MSR, MSRCR shows better image quality improvement and avoids the color distortion caused by the imbalance among color channel proportions after convolution. However, its performance still degrades in the dim tunnel environment because its Gaussian blur reduces the sharpness of edges while brightening the dark areas. Thus, we propose the Improved MSRCR (IMSRCR) algorithm, which uses auto-matched dynamic filters and L0 regularization to solve the problem of fuzzy bolt edges in low-illumination tunnel images. By combining the IMSRCR and the YOLOv5n-EL, our model achieves excellent performance in bolt corrosion detection. Our main contributions can be summarized as follows.

1. We optimized the MSRCR color enhancement algorithm based on auto-matched dynamic filters and L0 regularization to avoid blurring the image when brightening the dark areas.

2. We put forward an ensemble learning scheme whose fusion strategy combines models with different parameters to improve detection accuracy.

3. The experiments are conducted on actual data collected from an in-service railway tunnel. We release our labeled dataset, the first public corroded lining bolt dataset acquired with a professional tunnel scanner.
The rest of this paper is organized as follows. Section 2 describes the proposed approach in detail, covering the improved color enhancement module and the ensemble learning algorithm for bolt corrosion detection. Section 3 presents the details of the experiments, including the dataset, experiment settings, comparison schemes, performance evaluations, and the analysis of the results. Section 4 discusses the method. Section 5 outlines our main results.

Figure 1 depicts the flow chart of the corroded bolt detection scheme in a dim tunnel, which includes two main modules, i.e., the image color enhancement algorithm and the object detection module. Considering the difficulty of distinguishing corroded and normal bolts in a dim environment, an improved MSRCR (IMSRCR) is proposed to sharpen the contrast between the rusted area and the background, enhancing the appearance of image features. Then, to obtain the required prediction speed and training efficiency in the object detection module, YOLOv5n, which has an end-to-end training and prediction structure, is introduced to detect and locate corroded bolts on the color-enhanced image. To further increase accuracy, we propose YOLOv5n-EL based on YOLOv5n. Specifically, we train a series of models with different parameters and adopt ensemble learning to integrate all model outputs.

The Improved MSRCR (IMSRCR) Color Enhancement Algorithm
As is well known, illumination in tunnels is poor, so the tunnel images gathered are dim and unclear. Thus, we need to enhance the contrast between the bolts and the background. MSRCR is developed from MSR and SSR based on Retinex theory and has been proven to be an effective color enhancement method. However, MSRCR has a limited effect in dark areas and at their edges. In our work, we propose IMSRCR to enhance the bolt features in dark areas. According to Retinex theory, the observed image I(x, y) can be divided into the reflection component R(x, y), which carries the target information, and the irradiation component L(x, y) as

$$I(x, y) = R(x, y) \cdot L(x, y). \tag{1}$$

Therefore, image enhancement aims to remove the irradiation component and extract the reflection component that carries information about the object. Taking the logarithm of both sides of (1) gives the expression of R(x, y):

$$\log R(x, y) = \log I(x, y) - \log L(x, y). \tag{2}$$

L(x, y) can be estimated through the low-pass Gaussian center function F(x, y) and the observed image I(x, y) as

$$L(x, y) = F(x, y) * I(x, y), \tag{3}$$

where * denotes convolution and F(x, y) is defined by

$$F(x, y) = K \exp\left(-\frac{x^2 + y^2}{c^2}\right). \tag{4}$$

Meanwhile, F(x, y) should satisfy

$$\iint F(x, y)\, dx\, dy = 1, \tag{5}$$

which fixes the normalization constant K.
As a result, the expression of SSR can be obtained from (2)-(4) as

$$R_{SSR}(x, y) = \log I(x, y) - \log\left[F(x, y) * I(x, y)\right]. \tag{6}$$

The parameter c in (4) is strongly related to the scale of image enhancement. However, the enhancements of SSR are not always satisfactory because a single value of c is not suitable for all kinds of images. To address this problem, MSR introduces Gaussian center functions at different scales as

$$R_{MSR}(x, y) = \sum_{k=1}^{N} \omega_k \left\{ \log I(x, y) - \log\left[F_k(x, y) * I(x, y)\right] \right\}, \tag{7}$$

where N is the number of scales, and ω_k and F_k(x, y) satisfy Equations (8)-(10):

$$\sum_{k=1}^{N} \omega_k = 1, \tag{8}$$

$$F_k(x, y) = K_k \exp\left(-\frac{x^2 + y^2}{c_k^2}\right), \tag{9}$$

$$\iint F_k(x, y)\, dx\, dy = 1. \tag{10}$$

Although MSR enhances image features at both low and high scales, color distortion occurs because the parameters differ for each color channel. Thus, the color restoration factor C is added in MSRCR to keep the appearance true through

$$R_{MSRCR,i}(x, y) = C_i(x, y)\, R_{MSR,i}(x, y), \tag{11}$$

where i represents the i-th color channel and C_i can be expressed by

$$C_i(x, y) = \beta \left\{ \log\left[\alpha I_i(x, y)\right] - \log\left[\sum_{j} I_j(x, y)\right] \right\}, \tag{12}$$

in which the sum runs over the color channels, α controls the strength of the nonlinearity, and β is the gain constant.
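The SSR, MSR, and color restoration steps described above can be illustrated with a minimal pure-Python sketch on a small grayscale image stored as a list of lists. This is an assumption-laden illustration, not the implementation used in the experiments: the function names, the kernel radius of three times c, the edge clamping, and the default values of `alpha` and `beta` are all hypothetical choices, and pixel values are assumed to be strictly positive so that the logarithm is defined.

```python
import math

def gaussian_kernel(c, radius):
    """1-D Gaussian surround weights exp(-t^2 / c^2), normalized to sum to 1."""
    weights = [math.exp(-(t * t) / (c * c)) for t in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, c):
    """Separable Gaussian blur with edge clamping; estimates L = F * I."""
    radius = max(1, int(3 * c))  # assumed truncation radius
    k = gaussian_kernel(c, radius)
    h, w = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(k[j] * image[y][clamp(x + j - radius, w)] for j in range(len(k)))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j] * rows[clamp(y + j - radius, h)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]

def ssr(image, c):
    """Single Scale Retinex: log I - log(F * I), pixel by pixel."""
    illum = blur(image, c)
    return [[math.log(image[y][x]) - math.log(illum[y][x])
             for x in range(len(image[0]))] for y in range(len(image))]

def msr(image, scales, weights=None):
    """MultiScale Retinex: weighted sum of SSR outputs (weights sum to 1)."""
    weights = weights or [1.0 / len(scales)] * len(scales)
    outputs = [ssr(image, c) for c in scales]
    return [[sum(wk * out[y][x] for wk, out in zip(weights, outputs))
             for x in range(len(image[0]))] for y in range(len(image))]

def color_restoration(pixel, alpha=125.0, beta=46.0):
    """Per-channel factor C_i = beta * (log(alpha * I_i) - log(sum_j I_j))."""
    total = sum(pixel)
    return [beta * (math.log(alpha * v) - math.log(total)) for v in pixel]
```

On a perfectly flat image the estimated illumination equals the image itself, so the SSR and MSR outputs are zero everywhere; only local contrast survives the subtraction in the logarithmic domain.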
Although MSRCR performs better in image enhancement than MSR and SSR, the edges in the enhanced image are still inconspicuous, which degrades the performance of MSRCR in a dim environment. Accordingly, we propose the IMSRCR algorithm to solve the problem of fuzzy bolt edges in low-illumination tunnel images. Our algorithm first uses Automatic Guide Filtering (AGF) to estimate the illumination image and then calculates the reflected image according to the Retinex theory described above. The residual image is extracted by the L0 norm. Finally, color restoration is carried out on the fused image. The flow of our algorithm is shown in Figure 2.

Illumination Estimation
To reduce the edge blurring caused by the Gaussian filter, illumination estimation is performed by AGF instead of the Gaussian filter used in traditional MSRCR. The illumination images estimated by AGF and by a Gaussian filter are shown in Figure 3. The guided filter is a local linear model with edge-preserving smoothing characteristics [24,25], which is defined as

$$g_t = a_k G_t + b_k, \quad \forall t \in \Omega_k,$$

where g is the output image after guided filtering, G is the guidance image, a_k and b_k are the linear coefficients in the sub-window Ω_k, which has scale r, and t is the index of pixels in Ω_k. We use the input image I as the guidance image G. The coefficients a_k and b_k are defined according to guided filtering theory [24,25]. Following the MSRCR procedure, the scale r of the guided filter takes three values, drawn from the ranges [1, r_min], [r_min, r_mid], and [r_mid, r_max], respectively [26]. The bounds r_min, r_mid, and r_max are determined from m and n, the width and height of the image, and N, the number of selected scales. To balance the smoothing and edge-preserving effects of guided filtering, an automatic multi-scale selection algorithm is used. The illumination estimate is obtained by applying AGF to each channel of the input image. The reflection component in the logarithmic domain, F_AGF, is then obtained for each channel according to the Retinex theory described above, i.e., by subtracting the logarithm of the AGF illumination estimate from the logarithm of the input channel.
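To illustrate why a guided filter preserves edges where a Gaussian filter blurs them, the following is a minimal one-dimensional sketch of the standard self-guided filter (the input serves as its own guide). It is an assumption-based illustration only: it uses a single fixed radius `r` and a regularization constant `eps`, both hypothetical parameters here, and it does not reproduce the automatic multi-scale selection of AGF.

```python
def box_mean(values, r):
    """Mean over the window [i - r, i + r], truncated at the signal borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def self_guided_filter(signal, r, eps):
    """1-D guided filter with the input used as its own guidance signal."""
    mean_i = box_mean(signal, r)
    mean_ii = box_mean([v * v for v in signal], r)
    # Local variance in each window (clipped at 0 against rounding error).
    var_i = [max(0.0, m2 - m * m) for m2, m in zip(mean_ii, mean_i)]
    # Linear coefficients of the local model g_t = a_k * G_t + b_k.
    a = [v / (v + eps) for v in var_i]
    b = [(1.0 - ak) * m for ak, m in zip(a, mean_i)]
    # Average the coefficients of all windows that cover each pixel.
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [ma * v + mb for ma, v, mb in zip(mean_a, signal, mean_b)]

step = [0.0] * 10 + [1.0] * 10
sharp = self_guided_filter(step, r=2, eps=1e-4)   # edge is kept
smooth = self_guided_filter(step, r=2, eps=10.0)  # edge is smeared
```

With a small `eps`, windows that straddle the step have high variance, so a_k stays close to 1 and the edge passes through almost unchanged; with a large `eps`, a_k shrinks toward 0 and the output approaches a plain local average.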

Residual Fusion
To overcome the loss of detail in F_AGF, we use the L0 norm in IMSRCR [27]. The residual results extracted by the L0 norm are shown in Figure 4. The L0 norm is the number of non-zero elements in a vector. The L0 norm of the image gradient can be expressed as

$$C(f) = \#\left\{\, p \;\middle|\; |f_p - f_{p+1}| \neq 0 \,\right\},$$

where p and p + 1 index adjacent elements in the image, f_p − f_{p+1} is the image gradient (the forward difference of the image), # counts the pixels that satisfy |f_p − f_{p+1}| ≠ 0, and C(f) is the L0 norm of the image gradient.
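As a small illustration of the gradient L0 norm defined above, the sketch below counts the non-zero forward differences of a one-dimensional signal. The function name and the optional tolerance `tol` are hypothetical additions for illustration.

```python
def l0_gradient_norm(signal, tol=0.0):
    """C(f): number of positions p where |f_p - f_{p+1}| exceeds tol."""
    return sum(1 for a, b in zip(signal, signal[1:]) if abs(a - b) > tol)

piecewise_constant = [1, 1, 1, 4, 4, 2, 2]
noisy = [1.0, 1.1, 0.9, 4.0, 4.2, 2.1, 1.9]

jumps_clean = l0_gradient_norm(piecewise_constant)  # only the two true edges
jumps_noisy = l0_gradient_norm(noisy)               # every difference is non-zero
```

A piecewise-constant signal has a small C(f) because only its genuine edges contribute, while a noisy signal has a large C(f); penalizing C(f) therefore favors outputs that keep a few strong edges and flatten minor fluctuations.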
Taking a one-dimensional signal as an example, the objective function is first defined in terms of C(f). For two-dimensional images, it must be converted into an unconstrained problem; we set the smoothing parameter λ to 0.01 to suit our application scenario. In two-dimensional images, the number of non-zero gradients in both the horizontal and vertical directions of the image needs to be constrained. Since the L0 norm is non-differentiable, the variable splitting method is used to relax the problem into two subproblems. Finally, an iterative method is used to approach the optimum.

As presented in Figure 5, the image processed by IMSRCR is clearer and has higher color contrast based on subjective visual judgment. The edges of the bolts are also clearer than in the images enhanced by SSR, MSR, and MSRCR. Hence, IMSRCR is placed before the detection module to ensure that the images input to YOLOv5n have distinct visual features.
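For orientation, the steps described above for the L0 objective can be written out using the standard L0 gradient minimization formulation, which is assumed here to match the structure of the method. The symbols g (input signal), k (allowed number of non-zero gradients), h and v (auxiliary gradient variables), and γ (splitting weight) are introduced for illustration only and are not taken from the original equations.

```latex
\begin{align}
% One-dimensional constrained form
&\min_{f} \sum_{p} (f_p - g_p)^2 \quad \text{s.t.} \quad C(f) = k, \\
% Unconstrained form with smoothing parameter \lambda
&\min_{f} \sum_{p} (f_p - g_p)^2 + \lambda\, C(f), \\
% Two-dimensional gradient count
&C(f) = \#\left\{\, p \;\middle|\; |\partial_x f_p| + |\partial_y f_p| \neq 0 \,\right\}, \\
% Variable splitting with auxiliary variables h, v
&\min_{f,h,v} \sum_{p} \Big[ (f_p - g_p)^2
   + \gamma \big( (\partial_x f_p - h_p)^2 + (\partial_y f_p - v_p)^2 \big) \Big]
   + \lambda\, C(h, v), \\
% Closed-form update of the auxiliary variables for fixed f
&(h_p, v_p) =
\begin{cases}
(0, 0), & (\partial_x f_p)^2 + (\partial_y f_p)^2 \le \lambda / \gamma, \\
(\partial_x f_p, \partial_y f_p), & \text{otherwise}.
\end{cases}
\end{align}
```

In this formulation the two subproblems alternate: with (h, v) fixed, the problem in f is quadratic and is typically solved in the Fourier domain; with f fixed, (h, v) follow from the thresholding rule in the last line, and γ is increased between iterations so that (h, v) approach the true gradients.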

Ensemble Learning-Based Corroded Bolts Detection
CV models for object detection are mainly divided into one-stage and two-stage methods. One-stage end-to-end algorithms give the prediction results (type and location) directly through the backbone, while two-stage methods first generate a series of candidate boxes and then classify and locate the object inside them. The non-end-to-end structure therefore requires much more time than the one-stage method because training and detection are carried out in separate stages, which slows down real corroded bolt detection work. YOLOv5n, a fast and accurate one-stage CV model, is chosen as the baseline of our ensemble learning.

Ensemble Learning Method
Usually, a target detection task relies on a single model that is trained to achieve good detection results. There are excellent models for the detection task, such as YOLO and FCNN. However, the performance of these models can still be improved. Adjusting the training hyperparameters is a common technique to improve model performance, but its effect is limited because the structure of the model restricts further gains. Ensemble learning is a machine learning method that integrates the predictions of multiple deep learning models to improve robustness and detection performance. It treats the outputs of the multiple models as a decision problem. If one of the models makes a mistake and the others are right, the final output of ensemble learning corrects the error by considering the outputs of all models. Compared with a single model, ensemble learning that combines multiple models can improve the accuracy.
Ensemble learning can be divided into two categories according to the training method: Boosting and Bagging. Boosting constructs a series of object detectors through serial learning, which means each new detector is improved by adjusting the weights of the data misdetected by the previous detector. In contrast, Bagging is a parallel learning method that exploits the independence of different detectors to improve performance, since a single detector cannot extract all features. In our work, Bagging is adopted as the ensemble learning method because we integrate different kinds of models that are independent of each other. The structure of ensemble learning is shown in Figure 6. It is worth noting that our proposed ensemble learning model has a parallel structure, which allows multi-threaded parallel operation and avoids excessive model training and inference time.
Bagging draws training data from the whole dataset at random with replacement, i.e., the drawn data are put back before the next round of extraction. This process is repeated for k rounds, so we obtain k independent sub-datasets. Every sub-dataset is used to train a basic model. As a result, we obtain k independent basic models.
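The sampling step of Bagging described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the seed, and the choice to make each sub-dataset the same size as the original dataset are hypothetical, since the text does not specify the sub-dataset size.

```python
import random

def bootstrap_subsets(dataset, k, seed=0):
    """Draw k sub-datasets by sampling with replacement (one per basic model)."""
    rng = random.Random(seed)
    n = len(dataset)
    # Each draw picks a random index; indices may repeat because samples
    # are "put back" before the next draw.
    return [[dataset[rng.randrange(n)] for _ in range(n)] for _ in range(k)]

image_ids = [f"img_{i:03d}" for i in range(10)]  # hypothetical image identifiers
subsets = bootstrap_subsets(image_ids, k=3)
```

Each sub-dataset would then be used to train one basic model, so the k models see overlapping but different samples.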

Fusion Strategy in Ensemble Learning
The fusion strategy is fundamental in ensemble learning. With a good fusion strategy, ensemble learning can combine the strengths of each model and obtain a better result than any single model. We adopt a probabilistic ensemble method to combine the independent basic models in our work. Assume that we have an object with a label y and the outputs x_1 and x_2 of two basic models (this can easily be extended to more outputs). As noted for Bagging above, the basic models are independent, so the measurements are conditionally independent given y, which can be formulated as

$$p(x_1, x_2 \mid y) = p(x_1 \mid y)\, p(x_2 \mid y). \tag{24}$$

This can also be expressed as p(x_1 | y) = p(x_1 | x_2, y), which means that, once the value of y is given, x_2 carries no further information about x_1. Our purpose is to infer the value of y, which can be expressed as

$$p(y \mid x_1, x_2) = \frac{p(x_1, x_2 \mid y)\, p(y)}{p(x_1, x_2)} \propto p(x_1, x_2 \mid y)\, p(y). \tag{25}$$

Using the independence in (24), the probabilistic relation can be written as

$$p(y \mid x_1, x_2) \propto p(x_1 \mid y)\, p(x_2 \mid y)\, p(y). \tag{26}$$

Utilizing this relation, we can calculate the score of y. Under conditional independence, it can be considered the optimal fusion scheme. Applying Bayes' rule to each factor, p(x_i | y) ∝ p(y | x_i) / p(y), the calculation can be formulated as

$$p(y \mid x_1, x_2) \propto \frac{p(y \mid x_1)\, p(y \mid x_2)}{p(y)}. \tag{27}$$

The class prior p(y) can be easily obtained by taking the statistics of y from the dataset. Then, according to (27), the results of all basic models can be fused.
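The probabilistic fusion rule above can be sketched for class scores as follows. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, the extension to m models (dividing by the prior raised to the power m − 1) follows from applying the same conditional independence assumption to every model, and the matching of predicted boxes across models is assumed to have been done beforehand.

```python
def fuse_posteriors(posteriors, prior):
    """Fuse per-model class posteriors p(y | x_i) under conditional independence.

    Implements p(y | x_1..x_m) proportional to prod_i p(y | x_i) / p(y)^(m - 1),
    then renormalizes so that the fused scores sum to 1 over the classes.
    """
    m = len(posteriors)
    scores = []
    for c, p_y in enumerate(prior):
        product = 1.0
        for post in posteriors:
            product *= post[c]
        scores.append(product / (p_y ** (m - 1)))
    total = sum(scores)
    return [s / total for s in scores]

# Two models both lean toward class 0 ("corroded"); classes are equally likely a priori.
fused = fuse_posteriors([[0.8, 0.2], [0.7, 0.3]], prior=[0.5, 0.5])
```

When two independent models agree, the fused confidence exceeds either individual confidence (here roughly 0.90 for class 0), which is the behavior that lets the ensemble correct a single model's weak or wrong prediction.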

Experiment Settings
Experiments in the study were implemented on an Intel Core i7-11700K CPU (3.6 GHz, 32 GB RAM) and an NVIDIA GeForce RTX 3060 (CUDA version 11.6) with Python 3.9.12 (PyTorch 1.11.0) on the 64-bit Ubuntu 18.04.1 Long Term Support operating system.
To train the model properly, we set the input resolution to 640 × 640 and use Stochastic Gradient Descent (SGD) with a momentum of 0.9 as the optimizer. The learning rate is initialized to 0.001, and cosine decay with warm-up is selected as the learning rate schedule. All models were fully trained in the experiments.
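A cosine decay schedule with warm-up, as mentioned above, can be sketched in a few lines. Only the initial learning rate of 0.001 comes from the text; the linear shape of the warm-up, the number of warm-up steps, the total number of steps, and the final learning rate of 0 are assumptions made for illustration.

```python
import math

def learning_rate(step, total_steps, warmup_steps, base_lr=0.001, final_lr=0.0):
    """Linear warm-up to base_lr, followed by cosine decay to final_lr."""
    if step < warmup_steps:
        # Ramp up linearly so the last warm-up step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (base_lr - final_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical run: 100 steps in total, of which the first 10 are warm-up.
schedule = [learning_rate(s, total_steps=100, warmup_steps=10) for s in range(101)]
```

The warm-up avoids large, unstable updates at the start of training, and the cosine shape reduces the learning rate smoothly toward the end.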
As for data augmentation, we set the image rotation rate to 0.5 and the image translation rate to 0.1. Both the image scale rate and image shear rate are set to 0.5. We mainly used Mosaic to further enhance the performance of the detector, and the Mosaic rate is set to 1.0.

Evaluation Metrics
Following common practice in the CV detection field, performance is evaluated by the average precision (AP), recall, precision, and F1 score. A predicted box is counted as positive when the Intersection over Union (IoU) between the predicted box and the ground truth box is greater than 0.5. The metrics are defined as

$$\mathrm{Recall} = \frac{X_{TP}}{X_{TP} + X_{FN}},$$

$$\mathrm{Precision} = \frac{X_{TP}}{X_{TP} + X_{FP}},$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

where Recall and Precision represent the recall and precision rates, respectively. X_TP denotes the number of objects correctly identified as targets, X_FP denotes the number of false targets misidentified as objects, and X_FN represents the number of objects that fail to be detected. The F1 score is the harmonic mean of the recall and precision rates and evaluates the model comprehensively. Engineering practice pays more attention to the F1 score. In addition to the recall and precision rates, the experiments use AP to test the detection accuracy of our method.

Table 1 shows the comparative results on the test set between our method and some state-of-the-art detection approaches. Faster-RCNN is a two-stage CNN-based object detector and a widely used non-end-to-end detection method [28]. YOLOv5n is a fast and powerful end-to-end detector, and YOLOv5s is a larger variant of YOLOv5n. YOLOv5n6 adds a detection head to YOLOv5n, which allows it to attend to larger-scale targets. Different color enhancement algorithms, detection structures, and YOLO variants are all taken into consideration in the performance comparison. As shown in Table 1, compared with Faster-RCNN, YOLOv5s, and YOLOv5n, the F1 score of YOLOv5n-EL is higher by 0.148, 0.026, and 0.008, respectively. In terms of AP, YOLOv5n-EL achieves 0.970 AP@0.5 and 0.530 AP@0.5:0.95, outperforming Faster-RCNN (0.832 AP@0.5 and 0.316 AP@0.5:0.95), YOLOv5s (0.957 AP@0.5 and 0.509 AP@0.5:0.95), and YOLOv5n (0.969 AP@0.5 and 0.525 AP@0.5:0.95).
This illustrates the advantage of YOLOv5n-EL as a corroded bolt detector. In this problem, the corroded bolt is a target of fixed scale, and a detection head on a larger scale may cause feature redundancy. Therefore, YOLOv5n6 failed to improve the detection performance. Meanwhile, YOLOv5n6 (0.945 AP@0.5 and 0.506 AP@0.5:0.95), which has more parameters, obtains a lower AP than YOLOv5n-EL. The detection time of the compared models is shown in Table 2. Faster-RCNN takes nearly 10 times longer than the YOLO models in the experiments because of its non-end-to-end structure. Because the features of corroded bolts in the dataset are relatively simple, models with more parameters may be more prone to overfitting in training. In this detection task, YOLOv5n-EL not only avoids overfitting but also achieves better performance at little extra time cost (only 7 ms more than YOLOv5n, far less than the time consumed by color enhancement). Besides, the FLOPs of YOLOv5n-EL are still lower than those of YOLOv5s, while the results are better. The above analysis supports the choice of YOLOv5n-EL as the detector. Regarding the color feature enhancement module, Table 1 also shows that MSR worsens the detection performance of YOLOv5s and YOLOv5n-EL instead of improving it. This is because MSR causes some color distortion, which makes the processed data deviate from the real data distribution. However, we notice that MSR slightly improves the detection performance of YOLOv5n6 and YOLOv5n, which suggests that, with MSR, the overfitting caused by more parameters is somewhat relieved.
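The evaluation metrics defined earlier in this section (IoU matching at a 0.5 threshold, precision, recall, and F1 score) can be sketched as follows. The function names, the (x1, y1, x2, y2) box format, and the example counts are hypothetical and are not taken from the reported experiments.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 (harmonic mean) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# A prediction counts as positive when its IoU with a ground truth box exceeds 0.5.
is_positive = iou((0, 0, 10, 10), (2, 0, 12, 10)) > 0.5

# Hypothetical counts: 90 correct detections, 10 false alarms, 5 missed bolts.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=5)
```

Because the F1 score is a harmonic mean, it stays low unless both precision and recall are high, which is why it is a convenient single figure for engineering comparisons.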

Performance Comparisons
We also compare the results of MSRCR and IMSRCR to evaluate the performance further. It can be seen from Table 1 that, compared with MSRCR, IMSRCR enhances the performance of the detectors. YOLOv5n-EL achieves 0.975 mAP@0.5 and 0.537 mAP@0.5:0.95 with IMSRCR. IMSRCR effectively enhances the darker areas in the image and strengthens the target edges, which offers more help to the detector. This illustrates the effectiveness of the IMSRCR method. We show the effects of different color enhancement algorithms in Figure 8. Although MSR and MSRCR can also enhance the color features of the corroded parts, color distortion may occur in some cases, and the edges are not clear in a dim environment. IMSRCR not only strengthens the features significantly but also avoids blurring in a dim environment, which improves overall detection effectiveness. Furthermore, Table 3 shows the time consumption of the different color enhancement methods. Since MSRCR uses Gaussian blur, its enhancement is significantly slowed by the many numerical calculations required. IMSRCR, however, avoids this shortcoming, and the speed increases by about a quarter.

Figure 9b,c. Comparison between Figures 9-11 shows that different color enhancement algorithms can heighten the significance of features, changing the effect of the whole model. In summary, the experimental results indicate that YOLOv5n-EL is an efficient corroded bolt detector. In addition, the ablation study demonstrates that IMSRCR helps enhance the color features and improves the detection of corroded bolts in both speed and accuracy.

Discussion
The method proposed in this paper is a corroded bolt detection model that combines the IMSRCR module and YOLOv5n-EL into one algorithm. The experimental results on the test set demonstrate that our method has good detection performance for corroded bolts. The parameter size of the YOLOv5n-EL basic model (YOLOv5n) is only about 14 MB, which is suitable for project deployment and real-time detection. Our method outperforms the other compared methods in both accuracy and speed. The color feature enhancement provided by IMSRCR helps the detector find corroded bolts with inconspicuous corrosion features.

Conclusions
In this paper, a method was put forward for detecting corroded bolts in tunnels. For this purpose, an efficient CV module with color enhancement and ensemble learning is proposed. Considering the low definition, poor contrast, and color distortion of tunnel images, IMSRCR enhances the color and edge appearance based on auto-matched dynamic filters and L0 regularization. Moreover, YOLOv5n-EL directly improves the accuracy of detection. To examine the effectiveness of our model, we collected images of corroded bolts with a professional tunnel scanner from an in-service railway tunnel. The model achieves a precision of 0.921 and a recall of 0.975 within 84.237 ms (14.367 + 69.870), which confirms that IMSRCR + YOLOv5n-EL is the most suitable structure for the task among those compared.