CNN-Based Road-Surface Crack Detection Model That Responds to Brightness Changes

Poor road-surface conditions pose a significant safety risk to vehicle operation, especially in the case of autonomous vehicles. Hence, maintenance of road surfaces will become even more important in the future. With the development of deep learning-based computer image processing technology, artificial intelligence models that evaluate road conditions are being actively researched. However, as the lighting conditions of the road surface vary depending on the weather, the model performance may degrade for an image whose brightness falls outside the range of the learned images, even for the same road. In this study, a semantic segmentation model with an autoencoder structure was developed for detecting road-surface cracks, along with a CNN-based image preprocessing model. This setup ensures better road-surface crack detection by adjusting the image brightness before the image is input into the road-crack detection model. When the preprocessing model was applied, the road-crack segmentation model exhibited consistent performance even under varying brightness values.


Introduction
To ensure driver safety and smooth vehicle operation, the road surface must always be maintained in a good condition. Maintenance of roads will become even more important with the development and commercialization of autonomous road vehicles. As irregular road-surface damage can affect the driving safety of autonomous vehicles and lead to accidents, a system that minimizes the time taken for damage detection and repair is required. Recently, a system for detecting the damaged portions on a road-surface image using a deep neural network has been developed.
Computer vision problems based on deep learning can be classified into image classification [1][2][3][4], object detection [5][6][7], semantic segmentation [8,9], instance segmentation [10][11][12], and panoptic segmentation [13,14]. In the image classification problem, the image is divided into small patches, and a neural network is trained to classify the cracks in each patch. Zhang et al. [15] classified the presence or absence of cracks in each image patch using a neural network composed of four convolutional layers and two fully connected layers. Pauly et al. [16] showed that the deeper the neural network, the better the crack image patch classification performance. Feng et al. [17] designed a deep residual network for image defect detection and classification. Eisenbach et al. [18] presented the German Asphalt Pavement Distress (GAPs) dataset, which includes road images classified into six types and applied them for deep-learning model training. Rateke et al. [19] presented a dataset for classifying the road-surface type and quality and reported the results of classification using convolutional layers.
The object detection technique mainly utilizes a method for determining the location of a crack in the entire image instead of splitting the image into small patches and considering each patch. Cha et al. [20] developed a technique to scan large images and mark the crack locations by combining a convolutional neural network (CNN) trained with small patches and sliding window techniques. Maeda et al. [21] classified the road surface into eight types and constructed a CNN model to display bounding boxes indicating the location and type of damage.
The semantic segmentation technique, in which the CNN comprises an encoder and a decoder, is advantageous because it can determine the damage location as well as the geometry information of a crack. Schmugge et al. [22] proposed a crack segmentation method and presented the evaluation results for the remote visual inspection videos of nuclear power plants. Lau et al. [23] proposed a U-Net-based network architecture that uses a pretrained ResNet-34 neural network as the encoder to segment cracks on pavement surfaces. Rateke et al. [24] applied a semantic segmentation technique to segment the road-surface type, road-surface damage, and other information.
Instance segmentation combines object detection and semantic segmentation, thereby allowing pixels occupied by individual objects to be distinguished and visualized. In a crack detection problem, the location and shape of individual cracks in one image can be detected by instance segmentation models. Kim and Cho [25] developed a model based on Mask R-CNN for detection of multiple concrete damages. The model was trained using a total of 765 images, and 25 actual test images were used. Tane et al. [26] also developed a crack detector based on Mask R-CNN by training 352 original and annotated crack images.
However, there appears to be an inadequate number of image datasets suitable for training instance segmentation models for the crack detection problem. Panoptic segmentation is a combination of instance and semantic segmentation wherein all pixels in one image are classified into each object. However, panoptic segmentation has not been studied as thoroughly as other techniques because crack detection does not require segmenting every pixel in an image.
A common issue with all crack detection models is that they can provide meaningful results only for data that reflect the same conditions used for their training. Thus, even for the same surface damage at the same location, if the external environment changes and the brightness level differs from that of the training images, the damage may not be detected. To address this issue, in this study, we developed a fully convolutional network (FCN) model that semantically segments the damaged road pixels on a road-surface image and a CNN model that automatically provides an image-brightness-control variable to enable the developed FCN model to achieve the best detection performance. The road-surface image is input into the CNN model and preprocessed before it is input into the FCN-based road-crack segmentation model. Preprocessed and unprocessed images are input into the FCN model, and the effectiveness of the developed system is analyzed by comparing the detection performance between the two conditions. Figure 1 shows the overall structure of the developed road-surface crack detection system that responds to brightness changes. In a 1920 × 1080 image captured while looking ahead when driving on the road, the area corresponding to the road surface was cropped to a size of 1920 × 256. To reduce the usage of computing resources, the cropped area was resized to 960 × 128, and the RGB values of the image were placed in a [960, 128, 3] matrix and input into the CNN-based image preprocessing model.
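The crop-and-resize step can be illustrated with a minimal sketch. The text does not state which rows are cropped or how the image is resized, so the `top` offset and the 2 × 2 block averaging used here are assumptions, and the function names are hypothetical. A small toy frame stands in for the 1920 × 1080 image so that the example runs quickly.

```python
def crop_rows(image, top, height):
    """Return `height` rows of `image` starting at row `top` (image is a list of rows)."""
    return image[top:top + height]


def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of RGB pixels (assumed method)."""
    out = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[r]) - 1, 2):
            block = [image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1]]
            row.append(tuple(sum(p[ch] for p in block) // 4 for ch in range(3)))
        out.append(row)
    return out


# Toy 6x8 frame; the real pipeline would crop 1920x256 and reduce it to 960x128.
frame = [[(r, c, 0) for c in range(8)] for r in range(6)]
road = crop_rows(frame, top=2, height=4)  # `top` is a hypothetical crop offset
small = downsample_2x(road)
print(len(small), len(small[0]))  # rows and columns of the reduced image
```

The same 2:1 reduction applied to a 1920 × 256 crop gives the 960 × 128 input size quoted in the text.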

Development of a Road-Crack Detection System That Responds to Brightness Changes
On receiving the input, the image preprocessing model calculates the enhancement factor (fB), which adjusts the brightness of the image for the effective detection of road-surface cracks. In the HSB color space, the brightness matrix [B] of the image is multiplied by fB, and the brightness of the changed image is [B'].
The road-surface image with [B'] in the HSB color space is converted to the RGB color space; the converted RGB values are placed in a [960, 128, 3] matrix and input into the FCN-based road-surface damage inspection model. Finally, each pixel is semantically labeled as "1" if it belongs to a damaged part of the road surface and as "0" otherwise.
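The brightness adjustment can be sketched as follows, assuming 8-bit RGB pixels and using the standard-library `colorsys` module, whose HSV model corresponds to HSB. The clipping of the scaled brightness to the valid range and the function names are assumptions, not details given in the text.

```python
import colorsys


def adjust_brightness(pixel, f_b):
    """Scale the HSB brightness of one 8-bit RGB pixel by f_b, then convert back to RGB."""
    r, g, b = (v / 255.0 for v in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    v = min(1.0, v * f_b)  # assumption: brightness is clipped to the valid range
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def adjust_image(image, f_b):
    """Apply the same factor f_b to every pixel of an image given as a list of rows."""
    return [[adjust_brightness(p, f_b) for p in row] for row in image]


darker = adjust_image([[(100, 100, 100), (200, 200, 200)]], 0.5)
brighter = adjust_image([[(100, 100, 100), (200, 200, 200)]], 1.8)
print(darker, brighter)
```

Only the brightness channel is scaled, so hue and saturation are preserved while the image is darkened or brightened.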

Structure of the Model
In this study, we developed a road-surface crack detection model based on the FCN architecture, which is extensively used in semantic segmentation. Figure 2 depicts the schematic of the road-surface crack detection FCN model used in this study. The autoencoder structure [27,28] includes convolution and deconvolution layers; when a 960 × 128 image comprising RGB values is input into the three channels, a 960 × 128 image comprising "0" and "1" is output in one channel. Five FCN model structures with four, six, eight, ten, and twelve hidden layers, excluding the input and output layers, were tested to select the final FCN model. In all the models, the convolution layers constituted half of the total hidden layers, and the deconvolution layers constituted the other half. The kernel size used in all the models was (3, 3). After zero padding, the convolution was performed with a stride of (1, 1). The number of kernels was 16 in the first convolution layer of each model and doubled in each subsequent layer. Batch normalization [29] was applied before the ReLU (rectified linear unit) [30], which was used as the activation function of the hidden layers. After the activation function was applied in a convolution layer, the result was max-pooled with a size of (2, 2). In the deconvolution layers, up-sampling was performed using a (2, 2) filter before convolution, and the convolution layers at the symmetrical positions were connected through skip connections. A dropout rate of 0.2 was applied to all the hidden layers during training. In the output layer, sigmoid was used as the activation function, and the resulting value for each pixel was classified as "0" or "1".
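To make the layer arithmetic of the FCN variants in the Structure of the Model section concrete, the following sketch tabulates the kernel count and feature-map size after each hidden layer. It is bookkeeping only, not the trained network. The assumption that the decoder kernel counts mirror those of the encoder is ours, since the text does not state them, and the function name is hypothetical.

```python
def fcn_schedule(hidden_layers, height=128, width=960, first_kernels=16):
    """List (layer kind, kernels, output height, output width) for one FCN variant."""
    n_conv = hidden_layers // 2  # half convolution layers, half deconvolution layers
    layers = []
    h, w, k = height, width, first_kernels
    for _ in range(n_conv):
        # A 3x3 convolution with zero padding and stride 1 keeps h x w;
        # the 2x2 max pooling that follows halves both dimensions.
        h, w = h // 2, w // 2
        layers.append(("conv+pool", k, h, w))
        k *= 2
    for _ in range(n_conv):
        # ASSUMPTION: decoder kernel counts mirror the encoder.
        k //= 2
        h, w = h * 2, w * 2  # 2x2 up-sampling before the convolution
        layers.append(("upsample+conv", k, h, w))
    return layers


for n in (4, 6, 8, 10, 12):
    bottleneck = fcn_schedule(n)[n // 2 - 1]
    print(n, bottleneck)
```

Under these assumptions, the twelve-layer variant reaches 512 kernels on a 2 × 15 feature map at its deepest point and returns to the 128 × 960 input resolution at the last hidden layer.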

Model Training and Dataset Configuration for Testing
Figure 3 shows the dataset composition for training and testing the model. A camera was attached to the top of the vehicle's windshield facing the road surface, and the road surface was photographed periodically while driving at approximately 40-100 km/h. A total of 14,400 road-surface images were prepared and classified into six types: artificial joints (such as the expansion joints at bridge junctions and the discontinuous surfaces between existing and new road surfaces present in repaired locations), road markings, roadside structures, shadows on the road, vehicles, and road cracks. Figure 3a-e depict cases that can cause false positives when detecting road-surface cracks. For these images, a one-channel matrix of the same size as the original image was generated, and all its components were labeled as zero to indicate that there was no road-surface damage. For the images in Figure 3f, which display road-surface cracks, the pixels of the damaged area on the road were labeled as "1", as shown in Figure 4. For each image type, 1600 images were used as the training set, 400 as the validation set, and the remaining 400 as the test set. Therefore, the total numbers of images in the training, validation, and test sets were 9600, 2400, and 2400, respectively.
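The split arithmetic and the all-zero labels for crack-free images can be checked with a short sketch. The category strings are shorthand for the six image types, and the helper names are hypothetical.

```python
# Per-category image counts taken from the text; the category names are shorthand labels.
CATEGORIES = [
    "artificial joints", "road markings", "roadside structures",
    "shadows", "vehicles", "road cracks",
]
PER_CATEGORY = {"train": 1600, "validation": 400, "test": 400}


def split_totals(categories, per_category):
    """Total number of images per split across all categories."""
    return {split: count * len(categories) for split, count in per_category.items()}


def empty_mask(height, width):
    """One-channel label for an image without cracks: every pixel is 0."""
    return [[0] * width for _ in range(height)]


totals = split_totals(CATEGORIES, PER_CATEGORY)
print(totals, sum(totals.values()))
```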

Learning Results by the Model
In semantic segmentation, the intersection over union (IoU) is generally used as an index to evaluate the prediction results [31]. The IoU is obtained by dividing the area of intersection of the model-predicted pixels and the ground-truth pixels by the area of their union. The closer the IoU is to unity, the greater the coincidence of the predicted pixel region with the ground truth. A test image was considered well predicted if the trained model provided an IoU of 0.4 or more. After determining the prediction results of the learning model based on the IoU, the true positive, true negative, false positive, and false negative values were arranged in a confusion matrix to obtain the precision, recall, and F1-score [32]. Figure 5 shows the F1-score according to the number of hidden layers used in the learning model and the amount of training data. With few hidden layers, the number of parameters required to calculate the output decreases; hence, the model can be installed easily on a mobile device and used to segment road-surface cracks in real time. However, with few parameters, the performance limit is low even if sufficient training data are prepared, and the model results are unreliable. Conversely, even if the number of hidden layers is considerable, insufficient training data may leave the parameters insufficiently learned, resulting in low performance.
As indicated in Figure 5, the model with four hidden layers obtained an F1-score of 0.31, even though all the images prepared for training and validation were used; when only 1500 images were used, the F1-score was zero, and the test set results could not be predicted. When the number of hidden layers increased to six, an F1-score of 0.66 was obtained using all the training and validation images, roughly twice that of the four-layer model. The models with 8, 10, and 12 layers obtained F1-scores of 0.81, 0.83, and 0.85, respectively, and the performance tended to converge.
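The evaluation described in this section can be sketched as follows. The IoU and F1 formulas are standard, but the text does not spell out how each test image is assigned to a confusion-matrix cell, so the image-level rule used here is an assumption: an image with cracks counts as a true positive when IoU ≥ 0.4 and as a false negative otherwise, and a crack-free image counts as a false positive when any pixel is predicted and as a true negative otherwise.

```python
def iou(pred, truth):
    """IoU of two binary masks given as equally sized lists of rows of 0/1."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return inter / union if union else 1.0  # both masks empty: treated as agreement


def precision_recall_f1(pairs, threshold=0.4):
    """Image-level precision, recall, and F1 from (prediction, ground truth) mask pairs."""
    tp = fp = fn = tn = 0
    for pred, truth in pairs:
        has_crack = any(any(row) for row in truth)
        predicted = any(any(row) for row in pred)
        if has_crack:
            if iou(pred, truth) >= threshold:
                tp += 1
            else:
                fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [[1, 1, 0, 0]]
pairs = [
    ([[1, 1, 1, 0]], truth),           # IoU = 2/3, counted as well predicted
    ([[0, 1, 1, 0]], truth),           # IoU = 1/3, below the 0.4 threshold
    ([[0, 1, 0, 0]], [[0, 0, 0, 0]]),  # crack predicted on a crack-free image
]
print(precision_recall_f1(pairs))
```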

Structure of Preprocessing Model
As the brightness of the road surface is not constant, the same performance may not be obtained under different lighting conditions. Figure 6 depicts the F1-scores when the brightness was varied by adjusting fB from 0.2 to 1.8 in intervals of 0.2 in the 2400 images used as the test set, which were input into the FCN-based road-surface crack segmentation models trained with 9600 training and 2400 validation images. When fB = 1, the brightness was the same as that of the original image. In all the models, when the brightness decreased, for fB ≥ 0.6, the F1-score was similar to that obtained when the original image was input; however, for fB < 0.6, the F1-score rapidly decreased and approached zero at fB = 0.2. As the brightness increased in comparison with that of the original image, the F1-score also decreased. Even if the same image is used, it is not possible to achieve the same crack segmentation performance with different brightness levels.
Figure 7 depicts the configuration of the image preprocessing model, which receives an image as a (128, 960, 3) matrix and delivers the fB that brings the image brightness close to that used in the training and validation sets. Excluding the input and output layers, it comprises six CNN layers and five fully connected layers. Kernels of size (3, 3) were applied with a stride of (1, 1) in all the CNN layers. The number of kernels was 16 in the first layer and doubled in each subsequent CNN layer. In addition, the CNN performed convolution after zero padding, and max pooling was performed with a (2, 2) filter. After flattening the output of the last CNN layer, the final fB was calculated using the five fully connected layers with 64, 32, 16, 8, and 1 nodes.
Figure 8 shows the process of creating the dataset used for training the image preprocessing model depicted in Figure 7. The brightness of the 9600 training images and the 2400 validation images was changed by selecting a random fB between 0.2 and 1.8. From a total of 12,000 images with randomly changed brightness values, the fB values of 9600 training images were varied from 0.2 to 1.8 and input into the FCN model; the fB exhibiting the maximum IoU was used as the target value to train the CNN-based image preprocessing model. Thus, the road-surface images are optimized for the crack detection model by using fB, which is provided by the CNN-based image preprocessing model.
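The target-generation step (sweep fB, keep the value with the highest IoU) can be sketched as below. The segmentation model, the brightness adjustment, and the scoring function are passed in as callables; the toy stand-ins used to exercise the function are purely illustrative and are not the FCN from this study. The 0.2 step between candidate factors is taken from the evaluation sweep and is an assumption for target generation.

```python
def candidate_factors(start=0.2, stop=1.8, step=0.2):
    """Brightness factors from start to stop inclusive, rounded to one decimal place."""
    n = round((stop - start) / step)
    return [round(start + i * step, 1) for i in range(n + 1)]


def best_factor(image, truth, segment, adjust, score):
    """Return the factor whose adjusted image scores highest; ties go to the smallest factor."""
    return max(candidate_factors(), key=lambda f: score(segment(adjust(image, f)), truth))


def flat_iou(pred, truth):
    """IoU for flat 0/1 lists."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0


# Toy stand-ins (hypothetical): a 4-pixel "image" of brightness values and a
# "model" that marks a pixel as crack only when its brightness is in a fixed band.
def toy_adjust(img, f):
    return [min(255, v * f) for v in img]


def toy_segment(img):
    return [1 if 40 <= v <= 80 else 0 for v in img]


target = best_factor([31, 36, 90, 95], [1, 1, 0, 0], toy_segment, toy_adjust, flat_iou)
print(candidate_factors(), target)
```

In the study, the factor selected in this way serves as the regression target for the CNN-based preprocessing model.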

Performance Evaluation of the Road-Surface Crack Segmentation Model with Brightness Preprocessing
Figure 9a depicts one of the original and labeled images among the 2400 test images used for evaluating the result of linking the image preprocessing and road-crack segmentation models. For the original image in Figure 9a, fB was varied from 0.2 to 1.8 in increments of 0.2. When the images with the changed fB were input into the FCN-based road-surface crack segmentation model without image preprocessing, the predicted crack areas were as depicted in red in Figure 9b. When fB was 0.2, the brightness was the lowest, and it was difficult to recognize the condition of the road surface; hence, no road-surface cracks could be detected in the image. Likewise, when the road-surface image was bright, smaller regions were detected as road-surface cracks, degrading the model performance. Figure 9c shows the result of the road-surface crack detection model after the CNN-based preprocessing model optimized the image brightness. The image brightness was converted such that the cracks of the road surface could be well identified, and the model performance improved.
Figure 9. (a) Original and labeled images.

Figure 11 shows the histogram of the brightness when fB of the original image (Figure 9) was changed, and that obtained when the image with the changed fB was input into the image preprocessing model. In Figure 11a, the brightness of the original image was distributed between 100 and 150. When fB was reduced below 1.0, the brightness was distributed at lower values within a narrower range; when fB was increased above 1.0, the brightness was distributed at higher values over a wider range. However, when the fB-adjusted image was input into the image preprocessing model, the image brightness converged to 50-150, which is similar to the original brightness of the images in the training set.
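The narrowing and widening of the brightness range follows directly from multiplying the brightness by fB. A small worked example is given below, assuming 8-bit brightness values that saturate at 255; the function name is hypothetical.

```python
def scaled_range(low, high, f_b, max_value=255):
    """Brightness range (low, high, width) after scaling by f_b and clipping at max_value."""
    lo = min(max_value, low * f_b)
    hi = min(max_value, high * f_b)
    return lo, hi, hi - lo


# The original brightness is distributed between 100 and 150 (width 50).
for f_b in (0.2, 1.0, 1.8):
    print(f_b, scaled_range(100, 150, f_b))
```

With fB = 0.2 the range shrinks to a width of about 10, whereas with fB = 1.8 it widens to about 75 even after clipping at 255.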
Figure 11. Histogram of the brightness (a) without image preprocessing and (b) with image preprocessing.
Figure 12 depicts the F1-score after applying the image preprocessing model to the five previously generated FCN-based models. As shown in Figure 6, as fB moved farther from that of the original image, the detection performance decreased significantly. However, when the preprocessing model was applied, road-surface cracks could be detected by the FCN model, and relatively constant F1-scores were obtained.
Figure 12. Improved F1-score on applying the image preprocessing model.
Electronics 2021, 10, x FOR PEER REVIEW

Conclusions
In this study, an FCN-based road-surface crack detection model and a CNN-based image-brightness optimization model were developed. A total of 14,400 images were collected for six road conditions, including cracks on the road surface; among these, 9600 were used as the training dataset, 2400 as the validation dataset, and 2400 as the test dataset. When the test dataset was input into the FCN-based crack detection model with 12 layers, an F1-score of 0.85 was obtained. When the brightness of the original image was changed, the performance of the FCN-based crack detection model degraded rapidly, and there were cases in which the road-surface cracks could not be detected. However, when the CNN-based image preprocessing model, which improves the detection of road-surface cracks by adjusting the brightness, was applied, the crack detection performance remained relatively constant.
Nevertheless, the crack images used in this study did not classify the types of cracks, and the FCN-based road-surface crack detection model was limited to expressing the damaged area alone. In addition, the 1920 × 256-sized labeling images were disadvantageous when directly applied to the latest CNN model or applied for transfer learning. Therefore, to develop a high-performance model that detects cracks and classifies the types of cracks, further research is needed to reconstruct the current labeled data.