Multi-Image-Feature-Based Hierarchical Concrete Crack Identification Framework Using Optimized SVM Multi-Classifiers and D–S Fusion Algorithm for Bridge Structures

Abstract: Cracks in concrete can degrade the stiffness, bearing capacity and durability of civil infrastructure; hence, crack diagnosis is of great importance in concrete research. On the basis of multiple image features, this work presents a novel approach for crack identification of concrete structures. Firstly, the non-local means method is adopted to process the original image, which effectively diminishes the influence of noise. Then, to extract effective features sensitive to cracks, different filters are employed for crack edge detection, and the results are subsequently processed by integral projection and principal component analysis (PCA) for optimal feature selection. Moreover, support vector machine (SVM) classifiers are designed for the initial diagnosis of the concrete surface based on the extracted features. To raise the classification accuracy, the enhanced salp swarm algorithm (ESSA) is applied to the SVM for meta-parameter optimization. The Dempster–Shafer (D–S) fusion algorithm is then utilized to fuse the diagnostic results corresponding to the different filters for decision making. Finally, to demonstrate the effectiveness of the proposed framework, a total of 1200 images were collected from a real concrete bridge, covering intact (without crack), longitudinal crack, transverse crack and oblique crack cases. The results validate the performance of the proposed method, with a diagnosis accuracy as high as 96.25%.


Introduction
As the most commonly used construction material, concrete is widely employed in civil infrastructure such as buildings, tunnels, dams, bridges and wharfs. However, most of these structures are susceptible to damage due to environmental factors including wind, seawater, fog and ice [1][2][3][4]. Among the various types of concrete damage, cracking is typical: it can remarkably influence the stress distribution of structural components and undermine structural integrity. For reinforced concrete structures, cracks can also lead to corrosion of the steel reinforcement and cause concrete cancer, which further accelerates crack development and growth. Accordingly, the timely and accurate identification of significant cracks on the structure surface is of great importance for protecting civil infrastructure and avoiding unnecessary economic loss. Traditional crack inspection methods are not only labour intensive and time consuming, but also cannot guarantee accuracy or real-time performance [5,6]. With the rapid development of computer vision technology, using a mobile camera system or remotely piloted aircraft (RPA) to monitor civil infrastructure, combined with advanced image processing to extract useful geometrical crack features from the captured videos or images, is a more reliable and robust solution than contact-based approaches such as ultrasonics [7][8][9]. Although ultrasonic-based techniques may effectively detect internal defects or cracks in a structure, installing such devices on an in-service structure is not as convenient as imaging with a controllable mobile camera system.
To effectively identify the concrete crack, a large number of studies have been conducted in terms of crack edge detection, crack segmentation and crack-sensitive feature extraction. Kim et al. put forward a crack diagnostic method based on the fuzzy set theory for reinforced concrete structures [10]. In the proposed method, the crack symptom and concrete condition were considered as the input variables and the built-in fuzzy rules were employed to evaluate the crack cause in the structure. Nnolim proposed an automated crack identification approach based on using partial differential equation [11]. This approach included both edge protection smoothing and edge enhancement process to pre-process the crack image. Additionally, the local global maximal gradient matching algorithm was used to deal with the crack image for capturing the crack characteristics. Abdel-Qader et al. compared four image processing approaches: Canny filter, Sobel filter, fast Fourier transform (FFT) and Haar transform (HT) in relation to crack detection [12]. The comparison results demonstrated that the HT method had the best performance with detection accuracy of 86% among the four methods. Fujita and Hamamoto developed a robust concrete crack identification algorithm, in which the median filter was used to eliminate the slight change due to background from the raw image and probability relaxation and adaptive thresholding methods were subsequently conducted to detect the crack with high performance including sensitivity of 0.801, specificity of 0.992 and accuracy of 0.606 [13]. A crack detection and classification method based on Beamlet transform was introduced by Ying and Salari, in which the problem of uneven background brightness was fixed by an enhanced algorithm via evaluating the multiplicative factor [14]. Tsai et al. demonstrated the application of geodesic minimal path algorithm in the generation of the crack pattern, which can be used for the process of route planning [15]. 
Valença et al. introduced a novel approach called Image Processing of Cracking in Concrete Surfaces (MCRACK), based on a combined global-local method, to process digital images for the automatic characterisation of concrete cracks [16]. Kim et al. presented a novel concrete crack assessment approach based on the integration of the RPA technique and hybrid image processing, in which a binarization method was used to evaluate the crack width [17]. The Gabor filter was utilised by Medina et al. to design a diagnostic system for identifying cracks in concrete tunnel surfaces [18]. In that study, the parameters of the Gabor filter were optimized by an enhanced genetic algorithm to obtain the best identification performance.
Even though the aforementioned methods have been effective in concrete crack detection, the identification accuracy is still affected by non-uniform background brightness and noise contamination. Existing methods are unable to guarantee accurate detection of crack edges while eliminating the noise. To resolve this issue, machine learning (ML) techniques have been introduced for image processing, and several studies have accordingly been reported on the application of ML in crack classification and segmentation. Lee et al. presented an approach based on back-propagation (BP) neural networks to identify and analyse cracks on the concrete surface [19]. The proposed method can effectively quantify crack geometries, including length, width and orientation, and its classification accuracy can reach as high as 100% for different crack patterns, including horizontal, vertical, diagonal (−45° and +45°) and random cracks. Chun et al. employed a gradient boosted decision tree to design a crack detection algorithm, the inputs of which are colour, gradient and texture characteristics of the crack [20]. The random forest learning algorithm was also applied in the design of crack diagnostic systems, where the local crack patch can be predicted based on channel and pairwise difference features [21,22]. In the above studies [21,22], two magnitude channels, three colour channels and eight orientation channels were employed to make up a total of thirteen channels for feature extraction. Liang et al. classified crack images of concrete using a support vector machine (SVM) classifier, in which the mean square deviation and peak ratios of the grey histogram and the distribution of the projective integral are used as the inputs and the crack type is the output of the classifier [23,24]. Similar SVM-based crack detection algorithms were also reported in [25,26].
In [27], Mokhtari et al. conducted a comparative study evaluating the performance of different ML algorithms, including ANN, DT, k-NN and ANFIS, for computer vision-based crack detection. The results showed that ANFIS and ANN have superior characteristics with regard to calculation time, prediction accuracy, result stability and interpretability. Recently, deep learning (DL) techniques have developed quickly and are widely utilized in processing remote sensing data via dimensionality reduction and feature learning, especially in the application of concrete crack detection. Compared to traditional ML methods with shallow configurations, DL approaches are capable of producing predictions with higher accuracy owing to their deeper architectures [28]. Li et al. employed fully convolutional neural networks (CNN) with a Bayes fusion algorithm for automated crack identification in concrete bridges [29]. Jo and Jadidi adopted deep belief networks to process both infrared and RGB images for crack classification [30]. Zhang et al. combined long short-term memory and 1-D CNN to analyse images in the frequency domain for detecting cracks on bridge decks [31]. Furthermore, self-generative adversarial networks [32], semi-supervised deep cross-modal networks [33], deep encoder-decoder networks [34] and graph convolutional neural networks [35] were also developed to deal with remote sensing images for the same task of interest. A comprehensive literature review on the application of ML and DL to feature extraction and pattern recognition of imagery data was presented in detail by Rasti et al. [36]. In spite of the progressive advances of ML and DL algorithms, further investigation is needed in this area, mainly because the performance of a learning model depends heavily on the complexity of the model configuration, the setting of the model meta-parameters and the quality of the training dataset.
To address the challenges in existing concrete crack identification methods, a hierarchical framework is proposed in this study by combining various image processing, ML and information fusion techniques. To start with, the raw images are denoised using the non-local means method. Then, the processed images are sent to different filters for crack edge detection, and crack-sensitive features are extracted by integral projection. Principal component analysis is then applied to the extracted features to select the optimal components. SVM multi-classifiers, which benefit from high-dimensional and nonlinear pattern recognition with small sample sizes, are subsequently built to achieve the initial identification of the crack pattern. To improve the generalization capacities of the classifiers, the enhanced salp swarm algorithm (ESSA) is employed to optimize the meta-parameters of the SVMs. Eventually, the Dempster–Shafer fusion algorithm is introduced to fuse the initial results of the different classifiers corresponding to the different image filtering methods, improving the identification accuracy. Finally, concrete images taken by an RPA are used to verify the effectiveness of the proposed framework, with satisfactory results.

Establishment of Dataset
To establish a dataset for developing the concrete crack identification approach, a total of 1287 RGB images with a resolution of 4608 × 3072 pixels were captured by an RPA. These images were taken from the surfaces of the girders and piers of a concrete bridge under different lighting conditions, including both shadowed and sun-facing surfaces. The image files were saved as JPGs with a mean size of 6 MB. Since the original images include multiple cracks of various types, each image was cropped into several images with a resolution of 256 × 256 pixels. All the cropped images fall into four categories: background (without crack), longitudinal crack, transverse crack and oblique crack. For each category, 300 images were selected; therefore, a total of 1200 images are employed in this research. Examples of images from the different categories are shown in Figure 1.
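The cropping step above can be sketched as follows. This is a minimal illustration in which a numpy array stands in for a decoded RGB capture; file I/O and the manual category labelling are omitted:

```python
import numpy as np

def crop_patches(image, size=256):
    """Split an image array (H, W, 3) into non-overlapping size x size patches,
    discarding any incomplete patches at the right/bottom borders."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            patches.append(image[top:top + size, left:left + size])
    return patches

# A 4608 x 3072 capture (stored as a 3072 x 4608 array) yields
# (3072 // 256) * (4608 // 256) = 12 * 18 = 216 candidate patches.
capture = np.zeros((3072, 4608, 3), dtype=np.uint8)
patches = crop_patches(capture)
```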

Proposed Hierarchical Framework
The schematic of the proposed hierarchical framework is displayed in Figure 2. As shown in the figure, the framework is divided into five analysis phases: image pre-processing, crack edge detection, feature extraction and selection, crack pattern identification, and decision-level fusion. In the pre-processing phase, the non-local means algorithm is employed to process the raw images and eliminate noise interference. In the second phase, the denoised images are processed by five adaptive filters, namely Sobel, Laplacian of Gaussian (LoG), Gabor, steerable and homogeneity filters, for crack edge detection. In the third phase, crack-sensitive features are extracted and normalized using the integral projection method. To prevent redundancy in the feature information and to reduce the feature dimension, principal component analysis is introduced to select the optimal components representing the whole feature information. Based on the selected principal components, SVM classifiers with posterior probability outputs are built in the fourth phase, providing the initial identification results. To improve the generalization capacities of the SVM classifiers, the enhanced salp swarm algorithm (ESSA) and cross-validation (CV) are employed to optimize the SVM meta-parameters during model training. In the fifth phase, the posterior probability outputs of the SVMs, regarded as evidences, are transformed into basic probability assignments (BPAs), and the D–S evidence fusion algorithm is used to fuse the initial results obtained from the different filters into a final result. Since the initial identification results from different filters may be inaccurate or conflicting, the fusion of these results can effectively avoid wrong identification. Notably, the proposed framework adopts a hierarchical architecture in which the outputs of each phase are the inputs of the next.
Hence, via multi-phase image processing, the identification accuracy and robustness of concrete crack detection can be guaranteed.
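As a concrete illustration of the decision-level fusion, the sketch below applies Dempster's rule of combination to BPAs restricted to singleton hypotheses (a simplified case; the paper's exact construction of BPAs from the SVM posterior probabilities is not reproduced here):

```python
def ds_combine(m1, m2):
    """Dempster's rule of combination for two BPAs over singleton hypotheses,
    given as dicts mapping class label -> mass (masses sum to 1)."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            if a == b:
                combined[a] = combined.get(a, 0.0) + wa * wb
            else:
                conflict += wa * wb          # mass on conflicting label pairs
    k = 1.0 - conflict                        # normalization constant
    return {a: w / k for a, w in combined.items()}

def fuse(bpas):
    """Fuse a list of BPAs (one per filter/classifier) pairwise."""
    result = bpas[0]
    for m in bpas[1:]:
        result = ds_combine(result, m)
    return result
```

Two classifiers that favour the same class reinforce each other: fusing {crack: 0.8, intact: 0.2} with {crack: 0.7, intact: 0.3} yields a crack mass of about 0.90, larger than either input.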


Original Image Pre-Processing Using Non-Local Means Method
Identifying the crack pattern on the surface of a concrete structure is challenging due to the complicated textures of the concrete background and the inhomogeneous intensity of cracks. The general textures of a concrete structure consist of aggregate and cement with various shapes and colours, which may be inaccurately recognized as cracks. On the other hand, some cracks cannot be identified because they have intensities similar to the background due to factors such as lighting conditions, narrow crack width (e.g., micro-cracks) and noise. To improve the crack identification accuracy, it is therefore necessary to eliminate the effect of the background textures of the concrete surface using an image denoising method.
The non-local means (NLM) method is a denoising technique that aims to deal with Gaussian white noise in natural images [37]. The fundamental idea of NLM is to construct a weighted mean by evaluating the similarity of image patches, which differs from traditional methods based on single-pixel similarity. Therefore, image denoising using patch information is better able to preserve the edges, textures and other features of the image, as seen in Figure 3. Suppose there is a noisy image denoted by v = {v(a) | a ∈ A}, where A denotes the coordinate domain of the image. For any pixel a in the image, the estimated value of this pixel using NLM can be calculated by:

NL(v)(a) = ∑_{b∈A} w_f(a, b) v(b)

where the weighting function w_f(a, b) relies on the similarity degree between pixels a and b, and meets the conditions 0 ≤ w_f(a, b) ≤ 1 and ∑_b w_f(a, b) = 1. The similarity degree between pixels a and b is determined by the grey matrices N_a and N_b, which denote the image regions centred on pixels a and b, respectively. The similarity between the two regions N_a and N_b can be measured by the Gaussian weighted Euclidean distance d_g(a, b):

d_g(a, b) = ‖v(N_a) − v(N_b)‖²_{2,ϑ}

where ϑ denotes the standard deviation of the Gaussian kernel. The more similar the grey matrices of neighbouring regions are, the greater the weighting of the corresponding pixels in the weighted average. The weighting function w_f(a, b) is defined as follows:

w_f(a, b) = (1/Z(a)) exp(−d_g(a, b)/r²),  Z(a) = ∑_b exp(−d_g(a, b)/r²)

where Z(a) is a normalization parameter and r denotes the smoothing parameter, which is related to the standard deviation of the image noise.
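The NLM estimate above can be sketched as a minimal pixelwise implementation, using a plain mean-squared patch distance in place of the Gaussian-weighted one (illustrative only; production code would use an optimized library routine):

```python
import numpy as np

def nlm_denoise(img, patch=3, search=5, h=10.0):
    """Minimal non-local means: each pixel is replaced by a weighted average of
    pixels in a search window, weighted by patch similarity (exp(-d2 / h^2))."""
    pad, s = patch // 2, search // 2
    padded = np.pad(img.astype(float), pad + s, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad + s, j + pad + s            # centre in padded coords
            ref = padded[ci - pad:ci + pad + 1, cj - pad:cj + pad + 1]
            weights, values = [], []
            for di in range(-s, s + 1):
                for dj in range(-s, s + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - pad:ni + pad + 1, nj - pad:nj + pad + 1]
                    d2 = np.mean((ref - cand) ** 2)      # patch distance d_g(a, b)
                    weights.append(np.exp(-d2 / h ** 2))
                    values.append(padded[ni, nj])
            w = np.array(weights)
            out[i, j] = np.dot(w / w.sum(), values)      # weights sum to 1
    return out
```

Because the self-patch always gets weight 1 while dissimilar patches decay exponentially, flat regions are averaged strongly while structured regions (edges, crack textures) are preserved.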

Crack Edge Detection Using Filtering Methods
After denoising, the pre-processed images are further processed for crack edge detection. It is generally acknowledged that the light intensities of non-cracked and cracked images differ remarkably, so nonlinear filters can be utilized to distinguish this difference in light intensity for crack edge detection. In this study, five different types of filters are used to achieve this objective.


Sobel Filter
The fundamental idea of the Sobel filter is to conduct weighted smoothing on the image and then carry out a differential operation [38]. The horizontal and vertical gradient components g_x and g_y are obtained by convolving the image with the two standard Sobel kernels:

G_x = [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]],  G_y = [[−1, −2, −1], [0, 0, 0], [1, 2, 1]]

The gradient value of pixel point (i, j) can then be calculated by:

M(i, j) = √(g_x(i, j)² + g_y(i, j)²)

The crack edges in the image can be determined according to this gradient value. Given the threshold T_s, if M(i, j) ≥ T_s, the pixel is regarded as an edge point; otherwise, it is a non-edge point. In this study, the value of T_s is set at 3.73, as suggested in [18].
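The thresholded Sobel detector can be sketched directly (a small pure-numpy version for illustration; an optimized convolution routine would be used in practice):

```python
import numpy as np

KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # horizontal gradient kernel
KY = KX.T                                            # vertical gradient kernel

def sobel_edges(img, t_s=3.73):
    """Boolean edge map: True where the gradient magnitude M(i, j) >= T_s."""
    h, w = img.shape
    p = np.pad(img.astype(float), 1, mode="edge")
    mag = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = p[i:i + 3, j:j + 3]
            gx, gy = np.sum(win * KX), np.sum(win * KY)
            mag[i, j] = np.hypot(gx, gy)  # M(i, j) = sqrt(gx^2 + gy^2)
    return mag >= t_s
```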

Laplacian of Gaussian (LoG) Filter
The LoG filter was developed from the Laplacian operator, a derivative filter used for localizing areas of abrupt change in the image [39,40]. In this filter, a Gaussian filter is employed to smooth the image before the Laplacian operation is applied; consequently, it is called the LoG filter. The Laplacian operator can be defined as follows:

∇²f = ∂²f/∂x² + ∂²f/∂y²

In this study, the four-neighbour differential kernel is employed to approximate the Laplacian differential operation:

∇²f(i, j) = f(i+1, j) + f(i−1, j) + f(i, j+1) + f(i, j−1) − 4f(i, j)

where (i, j) denotes a pixel point in the image. Adding the Gaussian filter for image smoothing, the function that combines both Gaussian and Laplacian filters can be represented by:

LoG(x, y) = −(1/(πρ⁴)) [1 − (x² + y²)/(2ρ²)] exp(−(x² + y²)/(2ρ²))

where ρ denotes the width of the Gaussian kernel.
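The combined function can be sampled on a discrete grid to produce a convolution kernel. A small sketch follows; the final zero-mean adjustment is a common practical step (so that flat regions produce exactly zero response), not something stated in the text:

```python
import numpy as np

def log_kernel(size=9, rho=1.4):
    """Sample LoG(x, y) on a size x size grid; rho is the Gaussian width."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    k = -(1.0 / (np.pi * rho**4)) * (1 - r2 / (2 * rho**2)) * np.exp(-r2 / (2 * rho**2))
    return k - k.mean()  # force zero sum so flat regions give zero response
```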

Gabor Filter
The two-dimensional (2D) Gabor filter was first proposed by Daugman and can be regarded as a complex sine function modulated by a Gaussian function [41]. Similar to the 2D receptive field profiles of simple cells in the mammalian visual cortex, the 2D Gabor filter has excellent spatial locality and directional selectivity, and is able to capture the spatial frequency and local structural features of multiple directions in a local region of the image. The definition of the 2D Gabor filter is shown in Equation (11):

G(x, y) = exp(−(x² + y²)/(2σ²)) · exp(jω₀(x cos θ + y sin θ))

where (x, y) denotes a pixel point in the image; ω₀ denotes the centre frequency of the filter; θ indicates the direction of the Gabor wavelet; and σ denotes the standard deviation of the Gaussian function.
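Equation (11) can be sampled directly to build a complex Gabor kernel. The sketch below uses illustrative parameter values, not the paper's tuned settings:

```python
import numpy as np

def gabor_kernel(size=15, omega0=0.5, theta=0.0, sigma=3.0):
    """Complex 2D Gabor kernel: Gaussian envelope times a complex sinusoid
    oriented along theta with centre frequency omega0."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(1j * omega0 * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier
```

Convolving an image with the kernel and taking the magnitude of the complex response yields the Gabor features.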
In image processing, the concrete crack image can be convolved with the Gabor filter, and the result of the convolution constitutes the extracted Gabor features.

Steerable Filter
A steerable filter of arbitrary orientation can be synthesized as a linear combination of a set of basis filters [42]. Consequently, the steerable filter can be formulated by Equation (17):

F^α(z₁, z₂) = cos(α) F₀(z₁, z₂) + sin(α) F₉₀(z₁, z₂)

where α denotes the directional angle of the filter, and F₀ and F₉₀ are the basis filters oriented at 0° and 90°, respectively. Based on Equation (17), the response at pixel point (z₁, z₂) in image Q can be calculated by Equation (18):

R^α(z₁, z₂) = F^α(z₁, z₂) ⊗ Q(z₁, z₂)

where "⊗" denotes the convolution operation.
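For first-derivative-of-Gaussian basis filters, the cos/sin steering of Equation (17) can be sketched as follows (the basis choice here is an assumption; the paper's exact basis filters are not given in this excerpt):

```python
import numpy as np

def gaussian_deriv_basis(size=9, sigma=1.5):
    """Basis filters F_0 and F_90: x- and y-derivatives of a 2D Gaussian."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return -x * g, -y * g

def steerable_filter(alpha, size=9, sigma=1.5):
    """F^alpha = cos(alpha) * F_0 + sin(alpha) * F_90."""
    f0, f90 = gaussian_deriv_basis(size, sigma)
    return np.cos(alpha) * f0 + np.sin(alpha) * f90
```

Steering to alpha = 0 recovers F₀ exactly and alpha = 90° recovers F₉₀, so a single pair of convolutions with the basis filters suffices to obtain the response at any orientation.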

Homogeneity Filter
The homogeneity filter was developed based on the homogeneity operator, in which the central pixel is compared with its eight neighbouring pixels [43]. The homogeneity operator is given by Equation (19):

H(P) = max_{i=1,...,8} |I_P − I_{N_i}|

where P denotes the central pixel; N_i denotes the i-th neighbour of the central pixel, i = 1, 2, . . . , 8; and I_P denotes the intensity of the central pixel. A pixel is regarded as an edge point if H(P) exceeds the pre-set threshold I_t, which ranges between 0 and 255. In this work, the value of I_t is calculated by the Otsu method. Examples of the different filters applied to the crack images show that most filters are effective in detecting the edges of the different types of crack, even though some small regions in the images are inaccurately identified as cracks. Overall, the grey images processed by the filters can be used to extract features as the inputs of the machine learning classifiers.
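The operator can be sketched directly; here the Otsu-derived threshold is replaced by a fixed illustrative value:

```python
import numpy as np

def homogeneity_edges(img, i_t=30):
    """Edge map from the homogeneity operator: for each pixel, take the maximum
    absolute difference to its 8 neighbours and threshold it at I_t."""
    h, w = img.shape
    p = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = p[i:i + 3, j:j + 3].ravel()
            centre = window[4]                                 # central pixel P
            out[i, j] = np.abs(centre - np.delete(window, 4)).max()
    return out >= i_t
```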

Crack-Sensitive Feature Extraction and Selection
Based on the filtered images, crack-sensitive features are extracted and selected using integral projection and principal component analysis. The detailed process and results are presented in the following sub-sections.

Feature Extraction Using Integral Projection
The integral projection (IP) was developed on the basis of the projection distribution characteristics of the image in certain directions [44]. As a statistical approach, the IP essentially consists of a horizontal projection and a vertical projection, expressed as follows:

H(y) = ∑_{x=1}^{n} I(x, y),  V(x) = ∑_{y=1}^{m} I(x, y)

where (x, y) denotes the pixel location and I(x, y) denotes the corresponding pixel intensity; n denotes the number of pixels in a row and m denotes the number of pixels in a column.
In this study, the IP is employed to characterize different patterns of concrete crack. After the concrete surface images are processed by the nonlinear filters, the IP values can be calculated, and each type of crack possesses a unique IP signature. Generally, a longitudinal crack leads to a stable IP along the X-axis (horizontal) but a peak IP intensity along the Y-axis (vertical). On the contrary, a transverse crack causes a peak IP intensity along the X-axis but a stable IP along the Y-axis. Unlike longitudinal and transverse cracks, both the without-crack and oblique-crack cases have roughly constant IPs along both axes. Moreover, the filtering result of an image with an oblique crack reflects the intensity of the crack texture, so the mean IP value of an oblique crack is higher than that of the without-crack case. Figure 7 demonstrates the IP results for the different crack types processed by the LoG filter. As discussed above, the image with a longitudinal crack has a peak pixel intensity of 31.43 along the vertical axis, while the image with a transverse crack possesses a maximum pixel value of 62.51 along the horizontal axis. The mean IP pixel intensity of the image with an oblique crack is around 9.30, which is clearly higher than that of the without-crack case (7.20). Accordingly, the IPs can be used as good indicators to distinguish the different scenarios of the concrete surface.
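The projections reduce to row and column sums; a minimal sketch reproducing the flat-on-one-axis, peaked-on-the-other signature described above for a single bright vertical line of edge pixels:

```python
import numpy as np

def integral_projections(img):
    """Return (horizontal, vertical) integral projections as row and column sums."""
    horizontal = img.sum(axis=1)  # one value per row
    vertical = img.sum(axis=0)    # one value per column
    return horizontal, vertical

# A filtered image containing a single bright vertical line of edge pixels:
edge_map = np.zeros((10, 10))
edge_map[:, 5] = 1.0
h_proj, v_proj = integral_projections(edge_map)
# h_proj is flat (every row contains exactly one edge pixel),
# while v_proj peaks sharply at the line's column.
```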

Optimal Feature Selection Using Principal Component Analysis
If all the IPs are used as the inputs to develop the diagnostic models, the model configuration will be much more complicated due to the high dimension of the features. In addition, the IPs may contain redundant information that can affect the generalization capacity of the trained model. As a result, it is better to employ a few significant components to represent all the IPs as the optimal features. In this research, the problem is addressed by using principal component analysis (PCA), which aims to reduce the dimension of the observed variables and obtain the most important information by transforming multiple variables into a few components [45]. Suppose the matrix of n observations of the original variables is X = [X_1, X_2, . . . , X_p] = (x_ij)_{n×p}; the PCA algorithm can then be summarized in the following steps:
(1) Normalize the original data using Equations (22)-(24): x*_ij = (x_ij − X̄_j)/σ_j, where i = 1, 2, . . . , n; X̄_j and σ_j denote the mean value and standard deviation of the samples of the j-th variable, respectively; and j = 1, 2, . . . , p.
(2) Calculate the correlation coefficient matrix M using Equation (25), where m_ij = m_ji and m_ii = 1.
(3) Calculate the eigenvalues and eigenvectors of M. According to the eigen equation |M − λE| = 0, the eigenvalues λ_j and eigenvectors U_j = [U_1j, U_2j, . . . , U_pj] (j = 1, 2, . . . , p) can be obtained, where λ_1 ≥ λ_2 ≥ · · · ≥ λ_p ≥ 0. The extracted principal components (PCs) can be represented by Y_j = X*U_j, that is, Y_kj = x*_k1 U_1j + x*_k2 U_2j + · · · + x*_kp U_pj (k = 1, 2, . . . , n; j = 1, 2, . . . , p).
(4) Determine the number of PCs. The individual contribution of the j-th PC is λ_j / Σ_{i=1}^{p} λ_i, and the accumulated contribution of the first q PCs is Σ_{j=1}^{q} λ_j / Σ_{i=1}^{p} λ_i.
By PCA, the IPs can be replaced with 512 PCs in decreasing order of individual contribution. Figure 8 demonstrates an example of the PCA results of images processed by the Sobel filter, in which both the individual and accumulative contributions of the PCs are displayed. It is observed that the first two PCs have the highest contribution rates of 26.87% and 26.41%, respectively. From the third PC onwards, the contribution percentages fall below 10%, while the accumulative contribution percentage continues to ascend. The first 15 PCs achieve more than 95% of the contributions of all the IPs. Even though around 5% of the feature information of the IPs is lost, the feature dimension is significantly reduced, which is beneficial to the training of the classifier. Based on the same selection criterion (>95% contribution), the numbers of selected PCs for the LoG filter, Gabor filter, steerable filter and homogeneity filter are 22, 17, 25 and 32, respectively.
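The PCA-based feature selection above (standardization, correlation matrix, eigen-decomposition, >95% accumulated-contribution criterion) can be sketched as follows; the function name and the toy data are illustrative, not the paper's IP features.

```python
import numpy as np

def select_pcs(X, threshold=0.95):
    """PCA on standardized features; keep the fewest leading PCs whose
    accumulated contribution (explained-variance ratio) exceeds threshold."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # step (1): normalization
    M = np.corrcoef(Z, rowvar=False)              # step (2): correlation matrix
    eigvals, eigvecs = np.linalg.eigh(M)          # step (3): eigen-decomposition
    order = np.argsort(eigvals)[::-1]             # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / eigvals.sum()             # step (4): contributions
    k = int(np.searchsorted(np.cumsum(contrib), threshold) + 1)
    return Z @ eigvecs[:, :k], contrib[:k]        # PCs Y = X* U

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)   # one redundant feature
Y, contrib = select_pcs(X)
print(Y.shape, float(contrib.sum()))
```

Because one feature is nearly a copy of another, the last PC carries almost no contribution and is dropped by the 95% criterion, mirroring how the redundant IP information is discarded in the paper.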

Feature Level-Based Crack Diagnosis Using Enhanced Salp Swarm Algorithm-Optimized SVM Classifiers
In this section, corresponding to each filtering method, a classifier based on SVM will be developed to automate the concrete crack identification. To improve the generalization capacity of SVM, an enhanced salp swarm algorithm is proposed to optimize the meta-parameters during the model training.

SVM Sub-Classifiers for Identifying Concrete Crack
The typical application of SVM is solving binary classification problems, i.e., judging whether a test sample belongs to the positive or negative class [46]. However, this study aims to identify different crack types of concrete, which is by nature a multi-class classification problem. One of the most direct approaches is to construct multiple hyperplanes that divide the entire sample space into multiple regions, each corresponding to one class. Although this method can fundamentally solve the problem, its application prospect is not encouraging due to the large amount of calculation involved. In practical applications, there are two common strategies for such a multi-class classification problem: one against rest (OAR) and one against one (OAO). The fundamental idea of both strategies is to transform a multi-class classification problem into multiple binary classification problems; the corresponding classifiers are called "sub-classifiers". For an n-class problem, the OAR strategy only requires establishing C^1_n = n sub-classifiers. In the i-th sub-classifier, the samples of the i-th class are regarded as the positive class, while the remaining samples are regarded as the negative class. The final classification result of the OAR strategy is the output category of the positive class. The main benefit of the OAR strategy is that the number of sub-classifiers to be established is relatively small, but there exists the potential for "classification overlap" and "unclassifiable" samples. The OAO strategy, by contrast, establishes a sub-classifier for every pair of classes in the n-class problem, requiring a total of C^2_n = n(n − 1)/2 sub-classifiers. The final classification result of the OAO strategy is decided by the voting of all the sub-classifiers.
The main drawback of the OAO strategy is that the number of sub-classifiers increases rapidly with the number of classes, so its training efficiency is lower than that of the OAR strategy.
In this work, both OAR and OAO strategies are investigated to develop SVM sub-classifiers for concrete surface crack identification. Hence, four (C^1_4 = 4) sub-classifiers should be developed for the OAR strategy, i.e., the without crack-rest, longitudinal crack-rest, transverse crack-rest and oblique crack-rest sub-classifiers.
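The sub-classifier counts and the OAO voting rule can be illustrated with a small sketch; the pairwise outcomes below are hypothetical, not results from the paper.

```python
from itertools import combinations
from collections import Counter

def oar_count(n):
    """One-against-rest: C(n,1) = n sub-classifiers."""
    return n

def oao_count(n):
    """One-against-one: C(n,2) = n(n-1)/2 sub-classifiers."""
    return n * (n - 1) // 2

def oao_vote(pairwise_winners):
    """Final OAO decision: each sub-classifier votes for the winner of its
    pair; the class with the most votes is returned."""
    return Counter(pairwise_winners).most_common(1)[0][0]

classes = ["intact", "longitudinal", "transverse", "oblique"]
pairs = list(combinations(classes, 2))   # the 6 OAO class pairs
# Hypothetical outcomes of the 6 pairwise sub-classifiers for one image:
winners = ["intact", "transverse", "intact",
           "transverse", "transverse", "transverse"]
print(oar_count(len(classes)), oao_count(len(classes)), oao_vote(winners))
```

With four classes the OAR strategy needs 4 models and the OAO strategy 6; here "transverse" collects four of the six votes and wins the OAO ballot.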

Optimizing Meta-Parameters of SVM Using ESSA
The fundamental idea of SVM is to find an optimal classification hyperplane that not only separates the data samples correctly but also maximizes the margin. For data that are not linearly separable, the samples in the input space can be mapped into a high-dimensional feature space through a nonlinear transformation, which converts the nonlinear classification into a linear one and forms a nonlinear SVM. However, SVM cannot directly solve the dot product in the feature space, so a kernel function in the original space is employed instead. A number of functions can be used as the kernel of SVM, including the polynomial function, radial basis function (RBF), sigmoid function, etc. In this study, the RBF is selected due to its wider domain of convergence. The mathematical expression of the RBF is K(x_i, x_j) = exp(−‖x_i − x_j‖²/(2σ²)), where σ is a free parameter indicating the variance of the kernel. The optimal classification function can be written as f(x) = sgn(Σ_{i=1}^{m} α*_i y_i K(x_i, x) + b*), where α*_i (i = 1, 2, . . . , m) are the optimal Lagrange multipliers in the range (0, C) and b* denotes the bias.
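A minimal sketch of the RBF kernel and the resulting decision function, assuming toy support vectors and multipliers rather than the trained values from the paper:

```python
import numpy as np

def rbf_kernel(xi, xj, sigma):
    """Gaussian RBF kernel K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def decision(x, svs, ys, alphas, b, sigma):
    """SVM decision f(x) = sgn(sum_i alpha_i* y_i K(x_i, x) + b*),
    summed over the support vectors only."""
    s = sum(a * y * rbf_kernel(sv, x, sigma)
            for sv, y, a in zip(svs, ys, alphas))
    return 1 if s + b >= 0 else -1

# Toy model: one positive and one negative support vector.
svs = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
ys, alphas, b, sigma = [1, -1], [1.0, 1.0], 0.0, 1.0
label = decision(np.array([0.1, 0.1]), svs, ys, alphas, b, sigma)
print(label)
```

A query near the positive support vector is classified as +1; a query near the negative one flips the sign of the kernel-weighted sum and is classified as −1.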
When the SVM-based sub-classifiers are established for concrete crack identification, the meta-parameters of SVM should be appropriately selected. Here, the SVM meta-parameters include the balance coefficient C and the kernel parameter σ, whose influences on the SVM performance are quite different. If C is assigned a low value, the classification function will be flat; if it is assigned a high value, more samples will be employed as support vectors to accurately predict all the data. Different parameter combinations may result in distinctly different model performances [47]. Accordingly, selecting the optimal values of the meta-parameters is of great importance for obtaining the best generalization ability of SVM. In this study, an enhanced salp swarm algorithm (ESSA) is put forward to optimize C and σ during the training of the SVM sub-classifiers. The fitness function of the meta-parameter optimization is defined as the mean prediction accuracy of the training samples using five-fold cross validation (CV). The procedure of using ESSA to optimize the meta-parameters of the SVM sub-classifiers can be summarized as follows, which is also shown in Figure 9.

Step 1. Confirm the optimization target and the parameters to be optimized, and set the algorithm parameters of ESSA, including the swarm size, the maximum iteration number and the parameters affecting the S-curve-based decreasing weight.
Step 2. Initialize the locations of the salps in the vector of parameters to be optimized. Here, the parameters are C and σ², and the lower and upper boundaries of the parameters are 0.001 and 100, respectively.
Step 3. Calculate the fitness value of each salp in the swarm, and record the individual optimum and global optimum of the swarm.
Step 4. Set current iteration number CIN as 1.
Step 5. For each salp, if it is the leader (first salp, m = 1), use Equation (32) to update its location; otherwise, use Equation (33). The weighted SSA updates take the form X^1_d = wF_d + c_1((ub_d − lb_d)c_2 + lb_d) for c_3 ≥ 0.5 and X^1_d = wF_d − c_1((ub_d − lb_d)c_2 + lb_d) for c_3 < 0.5, with c_1 = 2exp(−(4l/l_max)²) (32), and X^m_d = (X^m_d + X^{m−1}_d)/2 (33), where X^1_d denotes the location of the leader; F_d denotes the food location; w denotes the weighting factor; ub_d and lb_d are the upper and lower search boundaries; c_2 and c_3 are two random numbers between 0 and 1; c_1 denotes the convergence factor, which balances the exploitation and exploration abilities of the algorithm; l denotes the current iteration number; l_max denotes the maximum iteration number; and X^m_d denotes the location of the m-th follower.
Step 6. Evaluate the fitness value of each salp, and compare the current individual and global optimum with previous ones. If the current results are better, replace the previous record with current results; or else, keep the record unchanged.
Step 7. Judge whether the current iteration number reaches its maximum value. If so, terminate the optimization. Otherwise, CIN = CIN + 1 and go back to Step 5.
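Steps 1-7 above can be sketched as a minimal salp swarm loop. The toy sphere objective stands in for the five-fold CV accuracy of the SVM over (C, σ²), and the decreasing weight w uses an assumed logistic form, since the exact S-curve of Equation (34) is not reproduced in the text.

```python
import numpy as np

def essa_optimize(fitness, lb, ub, n_salps=30, n_iter=100, seed=0):
    """Minimal salp swarm optimizer following Steps 1-7; `fitness` is
    minimized (in the paper it would be the negative CV accuracy)."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    X = rng.uniform(lb, ub, size=(n_salps, dim))       # Step 2: init swarm
    fit = np.array([fitness(x) for x in X])            # Step 3: fitness
    best, best_fit = X[fit.argmin()].copy(), fit.min() # food source F
    for l in range(1, n_iter + 1):                     # Steps 4-7
        c1 = 2 * np.exp(-(4 * l / n_iter) ** 2)        # convergence factor
        w = 1.0 / (1.0 + np.exp(10 * l / n_iter - 5))  # assumed S-curve weight
        for m in range(n_salps):
            if m == 0:                                 # Step 5: leader update
                c2, c3 = rng.random(dim), rng.random()
                step = c1 * ((np.array(ub) - lb) * c2 + lb)
                X[0] = w * best + step if c3 >= 0.5 else w * best - step
            else:                                      # follower update
                X[m] = (X[m] + X[m - 1]) / 2.0
            X[m] = np.clip(X[m], lb, ub)
        fit = np.array([fitness(x) for x in X])        # Step 6: re-evaluate
        if fit.min() < best_fit:
            best_fit, best = fit.min(), X[fit.argmin()].copy()
    return best, best_fit                              # Step 7: terminate

# Toy stand-in for the CV-accuracy objective over (C, sigma^2):
sphere = lambda x: float(np.sum((x - 3.0) ** 2))
xbest, fbest = essa_optimize(sphere, lb=[0.001, 0.001], ub=[100, 100])
print(xbest, fbest)
```

In the actual framework, `fitness` would train an RBF-SVM with the candidate (C, σ²) and return the negative mean five-fold CV accuracy, which is far more expensive than this toy objective.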

SVM Training Results
In this study, 80% of the images of each class are randomly selected as the training samples to develop the SVM sub-classifiers, while the remaining images are used as the validation samples to evaluate the effectiveness of the proposed model. The basic parameters of ESSA are set as follows: the swarm size is 50 and the maximum iteration number is 200. Additionally, how to define the decreasing weight w is also important, since it directly affects the accuracy and convergence of the algorithm [48]. Generally, in the initial stage of the evolution, a large weighting is needed to enhance the global search ability of ESSA, while in the later stage, a small weighting is required to improve the local search ability. In [49], a linearly decreasing weighting factor was proposed to update the location of the leader salp. However, when it is far from the food source, the SSA with a linearly decreasing weight may fall into a local optimum. Accordingly, this study addresses this problem by proposing an S-curve-based decreasing weighting factor, with the expression given in Equation (34), where α_1 and α_2 are two parameters tuning the shape of the S-curve. A comparison between the S-curve-based decreasing weight with different combinations of α_1 and α_2 and the linearly decreasing weight is shown in Figure 10. It is clearly seen that, compared to the linear one, the nonlinear S-curve weighting keeps a larger value for a longer time in the early stage, which helps the leader salp quickly find the rough location of the food, and then rapidly declines to the minimum value in the later stage, which is beneficial to the fine tuning of the food location. Among the three parameter combinations, the combination (α_1 = 10, α_2 = 4) shows a symmetric property in the range [0, 1] and is therefore adopted in this study. Then, the training samples are sent to the SVM for obtaining the optimal model parameters based on ESSA.
Figure 11 demonstrates an example of the SVM meta-parameter optimization of the without crack-rest sub-classifier based on the images processed by the Sobel filter, in which Figure 11a depicts the optimal and mean fitness over the iterations and Figure 11b shows the optimization process of the parameters C and σ².
It can be observed that the maximum identification accuracy is kept at a relatively stable value of about 93.8%, while the mean identification accuracy fluctuates between 92% and 94%. For two SVM parameters, their values have obvious variations at around 155th iteration, and then arrive at the optimal values of 25.3911 and 79.8106, respectively. Table 1 summarizes the details of all trained SVM sub-classifiers including optimal meta-parameters and number of support vectors. Based on the best meta-parameters, the trained sub-classifiers can obtain the optimal generalization capacity for concrete crack identification.

Dempster-Shafer Fusion Algorithm
In this research, the D-S fusion algorithm is employed to combine the initial diagnostic results of the concrete surface corresponding to the different filters. In D-S fusion, the frame of recognition χ should first be established to include all the potential categories of concrete surface condition. Here, χ is defined as χ = [η_1, η_2, η_3, η_4], where η_1, η_2, η_3 and η_4 correspond to intact, longitudinal crack, transverse crack and oblique crack, respectively. In addition, basic probabilities are assigned to all the possible hypotheses in the frame of recognition, satisfying the condition in Equation (35): m(∅) = 0 and Σ_{V∈2^χ} m(V) = 1, where m(·) is called the probability assignment (PA) function, which assigns each subset in 2^χ a value between 0 and 1. Hence, each m(V) corresponds to a piece of evidence.
In this work, the D-S algorithm is adopted to fuse the initial diagnostic results of an image from different SVM sub-classifiers and image filtering methods. Suppose the numbers of sub-classifiers and image filtering methods are m and n, respectively; then there are a total of m·n identification results for one test image, corresponding to m·n pieces of evidence, denoted by m_1, m_2, . . . , m_{m·n}. The fusion of these pieces of evidence can be considered as the operation of conjunctive summation, expressed by (m_1 ⊕ m_2 ⊕ · · · ⊕ m_{m·n})(V) = (1/(1 − CP)) Σ_{V_1∩···∩V_{m·n}=V} Π_i m_i(V_i), with CP = Σ_{V_1∩···∩V_{m·n}=∅} Π_i m_i(V_i) [50,51], where CP denotes the conflict degree among the different pieces of evidence. The final diagnostic result of the concrete surface is made according to the maximum assignment among the subsets in 2^χ. For any element V_i, if the conditions in Equation (37) are met — the assignment of V_i exceeds that of any other proposition by at least τ_1, and the assignment of the uncertainty is below τ_2 — V_i is taken as the fusion result. Here, τ_1 and τ_2 are pre-set thresholds that guarantee the identification accuracy, and Θ represents all the possible condition combinations, whose probability assignment corresponds to the uncertainty of the final result. To reduce the influence of uncertainty on the result accuracy, τ_1 and τ_2 are set to 0.3 and 0.1 in this study, following the suggestion in [52].
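Dempster's rule of combination for two pieces of evidence can be sketched as follows; the mass values are illustrative, and focal elements are represented as frozensets so that the conjunction V_a ∩ V_b and the conflict CP follow directly.

```python
from itertools import product

def ds_combine(m1, m2):
    """Dempster's rule for two mass functions. Keys are frozensets of
    hypotheses; the whole frame Theta carries the uncertainty mass.
    Returns the normalized combined masses and the conflict CP."""
    combined, cp = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:                                # compatible focal elements
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:                                    # conflicting mass
            cp += wa * wb
    return {k: v / (1.0 - cp) for k, v in combined.items()}, cp

# Two pieces of evidence over the frame {V1 intact, V2 long., V3 trans., V4 obl.}
theta = frozenset({"V1", "V2", "V3", "V4"})
m1 = {frozenset({"V1"}): 0.6, theta: 0.4}
m2 = {frozenset({"V1"}): 0.5, frozenset({"V2"}): 0.2, theta: 0.3}
fused, cp = ds_combine(m1, m2)
print(fused[frozenset({"V1"})], cp)
```

Fusing the two sources raises the mass on V1 from 0.6 to 0.68/0.88 ≈ 0.773 and shrinks the uncertainty mass on Θ, which is exactly the effect exploited by the two-level fusion in the following sections. More than two pieces of evidence can be fused by applying the rule iteratively.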

Soft Outputs of SVM Sub-Classifier
The class information output by a standard SVM is always given definitively, i.e., 0 (negative) or 1 (positive), which is a "hard" decision. However, in real situations, for some uncertain classification problems, it is difficult to categorize a sample into a certain class; instead, a probability value or membership degree for each class can be given. If the SVM is used to deal with such non-deterministic classification, a "soft" decision is needed. Generally, the identification result of a standard SVM can be mapped into the interval [0, 1] to realize a probability output, which signifies the uncertainty of the result. In this research, the sigmoid function is selected as the posterior probability model, the expression of which is shown in Equation (38): po(x) = 1/(1 + exp(M·f(x) + N)), where f(x) is the SVM decision value. This function can take different forms by tuning the parameters M and N. To satisfy the condition that the output po(x) monotonically increases with the probability of the positive class, the value of M should be negative, while adjusting the parameter N gives the SVM the ability to deal with biased training data. Selecting the optimal values of M and N can be regarded as solving an optimization problem over the training outputs. The SVM with soft decision can then adopt the D-S fusion algorithm to make a joint decision. To begin with, the posterior probability output of the SVM should be transformed into the probability assignments of the D-S fusion. The probability assignments of the positive and negative classes are easily determined, i.e., po(x) and 1 − po(x), respectively. However, according to the fundamentals of the D-S fusion algorithm, the probability assignment of the uncertainty in the frame of recognition Θ should be included as well.
Here, the uncertainty is defined as the upper boundary of the expectation of the misclassification rate of the test samples, which can be calculated as UE(Er) = E[n_SV]/N_tr, where the numerator is the average number of support vectors and N_tr denotes the total number of training data. To meet the requirement that the probability assignments sum to 1, the outputs of the positive and negative classes are multiplied by the coefficient 1 − max[UE(Er)]. Accordingly, the probability assignments of an SVM sub-classifier are given by m(PC) = po(x)(1 − max[UE(Er)]), m(NC) = (1 − po(x))(1 − max[UE(Er)]) and m(Θ) = max[UE(Er)], where PC and NC denote the positive and negative classes, respectively.
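The mapping from a soft SVM output to the three probability assignments can be sketched as below; M, N and the decision value are illustrative, while n_train = 960 corresponds to 80% of the 1200 images.

```python
import math

def posterior(fx, M=-2.0, N=0.0):
    """Sigmoid posterior po(x) = 1 / (1 + exp(M*f(x) + N)); M < 0 so that
    po(x) grows with the decision value f(x). M, N are illustrative
    values, not the fitted ones from the paper."""
    return 1.0 / (1.0 + math.exp(M * fx + N))

def bpa(fx, n_sv_avg, n_train, M=-2.0, N=0.0):
    """Map a soft SVM output to D-S probability assignments: the
    uncertainty mass is the support-vector bound UE = E[n_sv]/N_tr, and
    the positive/negative masses are rescaled by (1 - UE) to sum to one."""
    ue = n_sv_avg / n_train
    po = posterior(fx, M, N)
    return {"PC": po * (1 - ue), "NC": (1 - po) * (1 - ue), "Theta": ue}

# Hypothetical sub-classifier with 60 support vectors on average:
m = bpa(fx=1.2, n_sv_avg=60, n_train=960)
print(m, sum(m.values()))
```

The three masses always sum to one, and a larger support-vector count shifts mass from the class propositions to Θ — which is why the sub-classifiers with many support vectors show large uncertainty assignments in the tables below.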

Fusion Results and Discussion
To evaluate the performance of the proposed method, the test images with different concrete surface conditions are processed and then input into the trained SVM sub-classifiers for initial diagnosis. As an example, Tables 2-5 list the probability assignments of the four SVM sub-classifiers in the OAR strategy for an image without a crack. Table 2 displays the assignment values of the without crack-rest sub-classifier, in which m_1,i (i = 1, 2, . . . , 5) correspond to the initial diagnostic results with the different image filtering methods. As can be seen from the table, more confidence is assigned to the without crack proposition (V_1) than to the rest proposition (V_2∪V_3∪V_4), conforming to the real condition, except for the case of the Gabor filter (m_1,3), where the assignments of without crack and rest are 0.1766 and 0.2802, respectively. Nevertheless, it is hard for the system to make a diagnostic decision, since the uncertainty (Θ) has the largest probability assignment in each piece of evidence, obviously exceeding the pre-set threshold value of 0.1. The major reason for this problem is that the number of support vectors is relatively high compared to the total number of training samples. The probability assignments of the other three sub-classifiers (longitudinal crack-rest, transverse crack-rest and oblique crack-rest) for this image without a crack are provided in Tables 3-5. Like the without crack-rest sub-classifier, these three sub-classifiers allocate more probability assignment to the rest proposition (V_1∪V_3∪V_4, V_1∪V_2∪V_4 or V_1∪V_2∪V_3) than to the longitudinal crack (V_2), transverse crack (V_3) or oblique crack (V_4) proposition. Similarly, the uncertainty with the largest assignment makes the decision making of the concrete surface condition diagnosis difficult for each filtering method.
Therefore, the D-S algorithm is applied to the probability assignments of the initial diagnostic results via a two-level fusion. In the first-level fusion, the assignment results of the different image filtering methods are combined at each sub-classifier, and the result is shown in Table 6. It is noted that the probability assignment of without crack (V_1) is increased to 0.8343 while the uncertainty is decreased to 0.0641 for the without crack-rest sub-classifier. Likewise, for the sub-classifiers of longitudinal crack-rest, transverse crack-rest and oblique crack-rest, the assignment values of the rest propositions V_1∪V_3∪V_4, V_1∪V_2∪V_4 and V_1∪V_2∪V_3 rise to 0.9931, 0.9682 and 0.8939, respectively. The assignments of the uncertainties of these three sub-classifiers are all below 0.07. From the first-level fusion result, we can find that the without crack proposition is apt to be the diagnosis result of the concrete surface. Then, in the second-level fusion, the evidence assignments from the different sub-classifiers are fused for the final decision making. Table 7 provides the result of the second-level fusion, in which the probability assignment of the correct proposition without crack (V_1) has been increased to 0.9732, approaching 1. More importantly, the value of the uncertainty declines to zero. According to the decision rule in Equation (37), the fusion diagnostic result is V_1, which is consistent with the real condition of the concrete surface (without crack). This example effectively demonstrates that the probability assignments of the correct propositions can be significantly increased after the two-level data fusion, compared to the diagnostic results without data fusion.
Tables 11-16 give the probability assignments of the six OAO-based sub-classifiers for an image with a transverse crack. Apparently, for the sub-classifiers of without crack-transverse crack, longitudinal crack-transverse crack, and transverse crack-oblique crack, more probability is allocated to the correct proposition V_3 (transverse crack) than to V_1 (without crack), V_2 (longitudinal crack) and V_4 (oblique crack), satisfying the real condition of the image. However, similar to the OAR-based sub-classifiers, the OAO-based SVM sub-classifiers have the same problem of hard decision making because of the high uncertainties in the probability assignment results. Hence, the two-level evidence fusion is employed to combine the diagnostic results from the different sub-classifiers as well as the different image filtering methods. The fusion results are shown in Tables 17 and 18, where Table 17 corresponds to the result of the first-level fusion and Table 18 corresponds to the result of the second-level fusion. It is clearly seen that the probability value of the proposition V_3 is increased to 0.9981 while the uncertainty is eliminated after the evidence fusion operation. In accordance with Equation (37), the diagnostic condition from the proposed framework should be transverse crack, matching the practical condition of this image.
The fusion results of the same example images of without crack, longitudinal crack and oblique crack by the OAO models are provided in Tables 19-21, where the probability assignments of the correct propositions are 0.7706, 1 and 0.7154, respectively, and the corresponding uncertainties are reduced to 0.0004, 0 and 0.0006, respectively. Based on the fusion results of all the image examples, it can be concluded that the data fusion is capable of enhancing the confidence level of the correct propositions and weakening the effect of uncertainty on the diagnostic results. From the comparison between the OAR and OAO sub-classifiers, we can see that the OAR strategy provides more confidence (probability value) to the correct propositions than the OAO strategy after the two-level data fusion. In contrast, for the D-S fusion algorithm, the OAO SVM sub-classifiers need less computation cost than the OAR sub-classifiers. However, as the number of classes increases, the number of OAO-based SVM sub-classifiers (C^2_n, where n is the number of categories) increases remarkably in comparison with that of the OAR-based models (C^1_n). Accordingly, how to design the SVM multi-classifiers should be decided by the real engineering application. Table 22 summarizes the accuracy performance of the different diagnostic models based on all the test images. The evaluated models include the proposed SVM models with data fusion as well as the SVM models with a single type of features, namely SVM with Sobel filter-based features (Sobel-SVM), SVM with LoG filter-based features (LoG-SVM), SVM with Gabor filter-based features (Gabor-SVM), SVM with steerable filter-based features (Steerable-SVM) and SVM with homogeneity filter-based features (Homogeneity-SVM). For the SVMs with a single type of features, the model accuracies under both the OAR and OAO strategies are calculated, and only the better ones are included in Table 22.
It can be seen that the proposed SVM models with data fusion have higher accuracy in the concrete crack diagnosis than the independent SVM models with a single type of features. As a result, it can be concluded that, with data fusion, more accurate identification can be achieved by combining the various feature extraction approaches.
Finally, to investigate the contributions of the different nonlinear filtering methods to the diagnosis of concrete crack patterns, an ablation study is conducted in terms of the diagnostic accuracy on both the training and testing imagery data. In this investigation, each filtering method is removed from the proposed framework in turn, and the diagnostic model with the remaining filtering methods is then trained for performance evaluation. The results of the ablation study are displayed in Table 23. It is obvious that the Sobel, LoG and steerable filters have predominant effects on the diagnostic accuracy of the proposed framework for both training and testing images, compared to the Gabor and homogeneity filters. Overall, the OAR strategy-based models have higher accuracy than the OAO-based models. Accordingly, in practical applications, provided a certain accuracy is maintained, some filters with smaller contributions may be neglected, which can effectively decrease the online diagnosis time.
[Tables 2-21, referenced above, list the probability assignments of the OAR- and OAO-based sub-classifiers for the example images and the corresponding first- and second-level fusion results; the table bodies are not reproduced here.]

Conclusions
This research develops an intelligent framework for crack diagnosis and classification, which combines signal processing, machine learning and data fusion techniques. The non-local means method and various filters are employed for noise suppression and crack-sensitive pattern extraction, which contribute to a marked diagram of the concrete crack. Integral projection, together with PCA, is utilized to extract optimal features for diagnosing the different surface conditions, including without crack, transverse crack, longitudinal crack and oblique crack. The analysis results reveal that the first 15 PCs possess more than 95% of the feature information of all the IPs calculated from the results of the Sobel filter. The reduction of the number of features to be learned can enhance the performance of the machine learning model. Then, the SVM classifiers with soft outputs under both the OAR and OAO strategies are established to conduct the initial diagnosis of the concrete surface condition. To enhance the generalization ability of the trained classifiers, the ESSA is selected to optimize the meta-parameters of SVM. The optimization results show that the classification accuracy of the trained model reaches as high as 93.8% for the training samples. To fix the problem of wrong or conflicting diagnoses due to different filters, the D-S fusion algorithm is adopted to combine the initial diagnostic results of the different sub-classifiers as well as the different filters, which are regarded as independent pieces of evidence. The fusion results show that the confidence probability of the correct proposition can reach 0.99 while the uncertainty of the prediction is reduced to below 0.001 after the two-level fusion. In addition, a comparative study demonstrates that the proposed framework outperforms the independent SVM models with a single type of features in terms of concrete crack diagnosis.
Consequently, on the basis of promising results in this research, the proposed framework can be considered as a potential tool for the automatic and real-time structural inspection by the structural engineers and infrastructure agencies.
In this research, the main target is to develop a diagnostic model based on machine learning and data fusion for classifying different patterns of concrete cracks. However, in real situations, the crack patterns may be more complex than the three patterns considered in this study. Accordingly, crack segmentation is necessary for extracting important features of complex cracks, including the crack width, length and orientation. In future work, more concrete images with various complex crack patterns, including V-shape and cross-shape cracks, will be collected in the field, and deep learning methods will be employed as potential tools to address this problem by building the diagnostic model based on the pre-processed images and the corresponding ground truths. In addition, a normalisation operation will be conducted on the raw images to evaluate its effectiveness in improving the diagnosis accuracy.