Residual Strength Evaluation of Corroded Textile-Reinforced Concrete by the Deep Learning-Based Method

The residual strength of corroded textile-reinforced concrete (TRC) is evaluated using a deep learning-based method whose feasibility is demonstrated by experiment. Compared with the traditional method, the proposed method does not require knowledge of the climatic conditions to which the TRC has been exposed. Firstly, the faster region-based convolutional neural network (Faster R-CNN) is described briefly, and then the procedures to prepare datasets are introduced. Twenty TRC specimens were fabricated and divided into five groups that were treated to five different corrosion degrees corresponding to five different residual strengths. Five groups of images of the microstructure features of these TRC specimens with five different residual strengths were obtained with portable digital microscopes under various circumstances. With the obtained images, the datasets required to train, validate, and test the Faster R-CNN were prepared. To enhance the precision of residual strength evaluation, a parameter analysis was conducted for the adopted model. Under the best combination of the considered parameters, the mean average precision for the residual strength evaluation of the five groups of TRC specimens is 98.98%. The feasibility of the trained model was finally verified with new images, and the procedures to apply the presented method were summarized. The paper provides new insight into evaluating the residual strength of structural materials, which would be helpful for the safety evaluation of engineering structures.


Introduction
Textile-reinforced concrete (TRC), a new type of composite cement-based material, has received great attention due to its high tensile strength and excellent alkali resistance. Many studies have been carried out to investigate the basic mechanical properties of the TRC. Some scholars investigated the effects of parameters such as the loading rate, temperature, and the arrangement of textile layers on the bending behavior of members made of the TRC through three-point or four-point bending experiments [1][2][3][4]. Others investigated the effects of parameters including the prestress level, steel fiber properties, and freeze-thaw cycles on the tensile performance of members made of the TRC [5,6]. Kong et al. [7] compared the tensile and flexural behavior of the TRC and found that the ultimate tensile strength of TRC obtained with bending experiments is higher than that obtained with tensile experiments. Additionally, some numerical models were developed for predicting the bending and tensile behaviors of TRC sandwich beams and verified with

The CNNs
Generally, a CNN consists of several convolutional (CONV) layers, max pooling (MP) layers, fully-connected (FC) layers, and soft-max (SM) layers. The CONV layers extract features from input images through a group of kernels composed of learnable weights. The depth of a kernel is the same as that of the input layer, but its width and height are smaller. Each kernel slides on the input with a specified stride length and, at each location of the kernel, as demonstrated in Figure 2, the dot product is carried out between the kernel and its receptive field on the input. The stride length has a significant effect on the computation efficiency and the output size: a smaller stride length results in lower computation efficiency and a larger output size, but helps to reduce feature loss. The values of the dot product, namely, the element-by-element multiplication between each kernel element and the corresponding element in the receptive field, are summed, and a bias is added to form the outcome of each kernel. All the outcomes of each kernel sliding to different locations of the input are arranged as the output of the CONV layer. The output size of the CONV layer is determined by the input size, the kernel size, and the stride length, and may be smaller than the input size. As feature loss may occur due to this size reduction, zero padding the input, as shown in Figure 2, is an efficient way to keep the output size.

An MP layer reduces the size of its input through a downsampling operation, which saves computation time and reduces the probability of overfitting. Specifically, the MP layer extracts the maximum value from a window that slides on the input, as demonstrated in Figure 3. An FC layer connects all neurons from its previous layer, in contrast to a CONV layer, which connects only the neurons of a local region. In fact, the FC layer is a vector of neurons, each obtained through a dot product with its inputs plus a bias.

A SM layer predicts the category of its input according to the probabilities of the input being each category. The probabilities are computed with an SM function using the features provided by the FC layer. The input is assigned to the category with the highest probability.
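The CONV and MP operations described above can be sketched in a few lines of NumPy. This is an illustrative single-channel sketch, not the paper's implementation; the function names are hypothetical:

```python
import numpy as np

def conv2d_single(x, kernel, stride=1, pad=0):
    """Slide one kernel over a single-channel input with optional zero
    padding; each output element is the summed element-by-element product
    between the kernel and its receptive field."""
    if pad:
        x = np.pad(x, pad)  # zero padding keeps the output size
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            field = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(field * kernel)  # dot product (a bias could be added here)
    return out

def max_pool(x, size=2, stride=2):
    """Downsample by taking the maximum of each window sliding on the input."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    return np.array([[x[i * stride:i * stride + size, j * stride:j * stride + size].max()
                      for j in range(ow)] for i in range(oh)])

x = np.arange(16, dtype=float).reshape(4, 4)
print(conv2d_single(x, np.ones((3, 3)), stride=1, pad=1).shape)  # -> (4, 4): padding keeps the size
print(max_pool(x).shape)                                         # -> (2, 2): downsampled
```

With stride 1 and a 3 × 3 kernel, padding the 4 × 4 input by one zero on each side preserves the 4 × 4 output; the 2 × 2, stride-2 pooling halves each spatial dimension.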

The Region Proposal Network
The RPN, whose overall architecture is demonstrated in Figure 4, efficiently generates high-quality region proposals by sharing CONV layers with the object detection network of the Fast R-CNN adopted in the present study. As can be seen in Figure 4, when an image is fed into the RPN, the output is a number of region proposals generated by sliding a mini network on the feature map (of the input image) obtained from the last CONV layer. The mini network is, in fact, a 3 × 3 spatial window of the feature map in the present study, as suggested by Ren et al. [27]. At each sliding-window location, nine region proposals with different sizes are produced, which are rectangular boxes called anchors. These anchors are centered at the sliding window and can be determined with eight parameters (i.e., the sliding-window center (x_a, y_a) and three widths and three heights (w_a^m, h_a^n), where m, n = 1-3). The concept of Intersection-over-Union (IoU) is used to estimate the matching degree between an anchor and a ground-truth box (GTB): the IoU of an anchor and a GTB is the ratio of the area of their intersection to the area of their union. An anchor is labelled as positive if it achieves the highest IoU with a GTB, or if its IoU with some GTB is larger than 0.7. A non-positive anchor is labelled as negative if its IoU with every GTB is smaller than 0.3. Anchors that are not labelled are abandoned in the process of training.
For each sliding window, a feature vector is obtained based on an activation function such as the rectified linear unit (ReLU), which is commonly adopted as it not only provides nonlinearity but also enhances the convergence rate. The obtained feature vector is then taken as the input of two correlated functional layers. One is the box-classification layer, which computes the probability of each anchor containing an object or just being part of the background, according to the feature vector and initial weights. The computed probability is updated during training, varying between zero and one, and eventually gets close to one for a positive anchor and zero for a negative anchor. The other is the box-regression layer, which computes and updates the parameters that determine the location and size of the predicted bounding box (PBB) associated with an anchor, so that it better matches a GTB during training [28].
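The IoU computation and the anchor-labelling rule can be illustrated with a short Python sketch. Boxes are given here as (x1, y1, x2, y2) corner coordinates, and the function names are hypothetical:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes (x1, y1, x2, y2): the ratio of
    the area of their intersection to the area of their union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gtbs, achieves_highest_iou=False):
    """Positive (1) if the anchor achieves the highest IoU with a GTB or
    its IoU with some GTB exceeds 0.7; negative (0) if its IoU with every
    GTB is below 0.3; otherwise unlabeled (abandoned during training)."""
    best = max(iou(anchor, g) for g in gtbs)
    if achieves_highest_iou or best > 0.7:
        return 1
    if best < 0.3:
        return 0
    return None
```

For example, two unit-offset 2 × 2 boxes overlap in a 1 × 1 region, giving IoU = 1/7; an anchor whose best IoU falls between 0.3 and 0.7 (and is not the highest for any GTB) is ignored.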
Training the RPN end-to-end is, as a matter of fact, a process to minimize the loss function shown in Equation (1) using 128 positive and 128 negative anchors selected randomly from an image. The techniques of backpropagation and stochastic gradient descent (SGD) with mini-batches are employed in the training process:

L({p_i}, {t_i,j}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + (1/N_reg) Σ_i p_i* L_reg(t_i,j, t_i,j*)    (1)

In Equation (1), L_cls and L_reg are the classification loss function and the regression loss function, respectively; i is the index of an anchor in the mini-batch, and p_i* is the ground-truth label, which takes a value of 1 or 0 for positive or negative anchors, respectively; p_i is the predicted probability of an object in the i-th anchor. To normalize the two loss functions, the values of N_cls and N_reg were adopted, as suggested by Ren et al. [27], to be the mini-batch size (MBS) and ten percent of the number of anchors, respectively. The t_i,j (j = x, y, w, h) is a vector that describes the geometrical differences between the PBB and the anchor, and t_i,j* is a vector that describes the geometrical differences between the GTB and the anchor. The t_i,j and t_i,j* can be obtained with the following expressions:

t_i,x = (x_i − x_i,a)/w_i,a,  t_i,y = (y_i − y_i,a)/h_i,a,  t_i,w = log(w_i/w_i,a),  t_i,h = log(h_i/h_i,a)
t_i,x* = (x* − x_i,a)/w_i,a,  t_i,y* = (y* − y_i,a)/h_i,a,  t_i,w* = log(w*/w_i,a),  t_i,h* = log(h*/h_i,a)    (2)

where (x_i, y_i, w_i, h_i) determines the center location and sizes (width and height) of the PBB associated with the i-th anchor; similarly, (x_i,a, y_i,a, w_i,a, h_i,a) determines the center location and sizes of the i-th anchor, and (x*, y*, w*, h*) determines the center location and sizes of the GTB. It should be noted that the four parameters determining the PBB are continuously renewed to approach those of the GTB in the process of training. The position relations among the PBB, the anchor, and the GTB are demonstrated in Figure 5.
In Equation (1), the log loss function, L_cls = −log p_u, was selected as the classification loss function, and Equation (3) shown below was used as the regression loss function:

L_reg(t_i,j, t_i,j*) = Σ_{j ∈ {x, y, w, h}} smooth_L1(t_i,j − t_i,j*),  where smooth_L1(y_1 − y_2) = 0.5(y_1 − y_2)^2 if |y_1 − y_2| < 1, and |y_1 − y_2| − 0.5 otherwise    (3)

where y_1 and y_2 are variables for illustration. More detailed information on the training process of the RPN can be found in [27].
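The box parameterization and the smooth-L1 regression loss of Equation (3) can be sketched as follows. This is a minimal illustration assuming boxes are given as center-size tuples (x, y, w, h); the names are hypothetical:

```python
import math

def box_deltas(box, anchor):
    """Geometrical differences t = (t_x, t_y, t_w, t_h) between a box and
    an anchor, both given as (x, y, w, h): offsets are normalized by the
    anchor size, and width/height ratios are log-transformed."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def smooth_l1(d):
    """Robust loss: quadratic (0.5 d^2) for |d| < 1, linear (|d| - 0.5) otherwise."""
    return 0.5 * d * d if abs(d) < 1 else abs(d) - 0.5

def l_reg(t_pred, t_star):
    """Regression loss summed over the four box parameters j in {x, y, w, h}."""
    return sum(smooth_l1(p - s) for p, s in zip(t_pred, t_star))
```

When the PBB coincides with the GTB, all four deltas match and the regression loss vanishes, which is exactly the state the training process drives the PBB parameters toward.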

Fast R-CNN
The Fast R-CNN localizes and classifies objects in images; its overall architecture is demonstrated in Figure 6. As shown in Figure 6, the Fast R-CNN also makes use of the CNNs to acquire the feature map of the input image and adopts the object proposals provided by the RPN. Features on the feature map associated with an object proposal are usually called a region of interest (RoI). For each RoI, a fixed-size feature vector is acquired through the max pooling operation conducted in the RoI pooling layer (Figure 6). The acquired feature vector is then fed into several FC layers followed by two functional layers. One is the softmax layer, which computes the probability of a RoI being each of g + 1 classes (g training categories + 1 background category); the other is the regression layer, which computes the four parameters that determine the center location (T_x^u, T_y^u), height (T_h^u), and width (T_w^u) of object bounding boxes. The IoU of the RoI and the GTB is also used to estimate their matching degree. A RoI is labelled as positive (u = 1) when its IoU with a GTB is greater than 0.5, and as negative (u = 0) when the maximum value of its IoU with all the GTBs is in the range [0.1, 0.5) [27].
Training the Fast R-CNN end-to-end is, in fact, a process to minimize the multi-class loss function given in Equation (4) for each labelled RoI, in which the techniques of backpropagation and the SGD mini-batch are employed:

L(p, u, T^u, v) = L_cls(p, u) + λ[u ≥ 1] L_reg(T^u, v)    (4)

In Equation (4), L_cls = −log p_u stands for the log loss function, and L_reg stands for the regression loss function as given in Equation (3); u is the label of the GTB, and v is a vector that determines the location coordinates and sizes (height and width) of the GTB. The Iverson bracket [u ≥ 1] takes a value of 1 when u ≥ 1 and 0 otherwise. To keep the balance between the two loss functions, the hyper-parameter λ was set to 1 [27].
During each iteration, two images and 128 RoIs (constituting the mini-batch) acquired from the two images are picked at random to train the Fast R-CNN. More detailed information on the training process of the Fast R-CNN can be found in the study of Girshick et al. [28].
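The RoI-labelling rule and the multi-task loss of Equation (4) can be sketched as follows. This is illustrative only; the Iverson bracket simply switches the regression term off for background RoIs:

```python
def label_roi(ious):
    """Label a RoI from its IoU values with all GTBs: positive (u = 1) if
    its IoU with some GTB exceeds 0.5; negative, i.e., background (u = 0),
    if the maximum IoU lies in [0.1, 0.5); otherwise the RoI is unused."""
    best = max(ious)
    if best > 0.5:
        return 1
    if 0.1 <= best < 0.5:
        return 0
    return None

def multi_task_loss(l_cls, l_reg, u, lam=1.0):
    """Equation (4): classification loss plus the regression loss, the
    latter gated by the Iverson bracket [u >= 1] and weighted by lambda."""
    return l_cls + lam * (1 if u >= 1 else 0) * l_reg
```

For a background RoI (u = 0) the total loss reduces to the classification term alone, since no bounding box should be regressed toward a GTB.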

Architecture of the CNNs Based on VGG16-Net
To enhance computing efficiency, the RPN and Fast R-CNN are intentionally designed to use the same architecture for the CNN. Currently, many famous architectures have been developed for the CNN, including the Microsoft ResNet-152, GoogleNet, ZF-Net, and VGG16-net [18]. As the VGG16-net strikes a good balance between computing efficiency and detection accuracy, it was chosen as the CNN architecture in the present study. The VGG16-net usually consists of thirteen weighted CONV layers, five MP layers, three weighted FC layers, and a SM layer. All CONV layers make use of nonlinear activation functions (i.e., the ReLU) to enhance the convergence rate, and take advantage of the technique of zero-padding to maintain their spatial sizes. All MP layers conduct a spatial pooling operation by sliding 2 × 2 filters two pixels per stride. Following the CONV and MP layers are three FC layers and a SM layer, which is adopted to classify objects in images.
To present a Faster R-CNN-based framework to evaluate the residual strength of the TRC, modifications were made to the initial overall architecture of VGG16-net to better match the RPN and Fast R-CNN. With regard to the modified RPN demonstrated in Figure 7, the final MP layer and the three FC layers of the primary VGG16-net were substituted with a sliding CONV layer followed by an FC layer with 512 dimensions in depth, and the SM layer of the primary VGG16-net was substituted with the SM and regression layers. The detailed information about the VGG16-net-based RPN is summarized in Table 1.
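Under the VGG16 conventions just described (CONV layers keep spatial size via zero padding; each 2 × 2, stride-2 MP layer halves it), the size of the feature map handed to the RPN can be tracked with a small helper. This is a hypothetical bookkeeping sketch assuming four MP layers precede the shared feature map, since the modified RPN drops the final MP layer:

```python
def vgg16_feature_size(h, w, n_pool=4):
    """Spatial size after the VGG16 CONV stack: CONV layers preserve size
    (zero padding), so only the 2x2, stride-2 MP layers shrink it, each
    halving both dimensions (integer division for odd sizes)."""
    for _ in range(n_pool):
        h, w = h // 2, w // 2
    return h, w

print(vgg16_feature_size(600, 800))  # -> (37, 50): an effective stride of 16
```

Each MP layer contributes a factor of two, so four of them give an overall downsampling factor of 16 between the input image and the feature map on which the 3 × 3 RPN window slides.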

With regard to the modified Fast R-CNN demonstrated in Figure 8, the final MP layer of the primary VGG16-net was substituted with a RoI pooling layer. To prevent overfitting during training, dropout layers with a threshold value of 0.5 were inserted between the three FC layers of the primary VGG16-net. To match the number of classifications considered in the present study, the depth of the final FC layer was altered to six, corresponding to the five residual strengths plus the background. The final SM layer was substituted with the SM and regression layers. Table 2 summarizes the detailed information about the VGG16-net-based Fast R-CNN.

Faster R-CNN Composed of the RPN and Fast R-CNN
To improve computing speed, the Faster R-CNN was designed intentionally to combine the RPN and Fast R-CNN so that they share the same CNNs for image feature extraction, as demonstrated in Figure 9. Training the Faster R-CNN is actually a four-step alternating process.
Step 1 is to train the RPN following the procedures discussed in Section 2.2, in which the object proposals to be used for training the Fast R-CNN are prepared. The second step is to train the Fast R-CNN, following the procedures discussed in Section 2.3, with the object proposals prepared in step 1. The third step is to initialize the RPN with the final weights obtained from the previous step, and to fine-tune the layers exclusive to the RPN with the shared CONV layers fixed. The final step is to fine-tune the layers exclusive to the Fast R-CNN utilizing the object proposals obtained in step 3, with the shared CONV layers fixed. As hundreds to thousands of object proposals are generated from an image through the RPN, which would lower the computing efficiency and estimating accuracy, these object proposals are sorted based on the scores obtained from the box-classification layer, and the first 2000 object proposals are utilized for the training of the Fast R-CNN in step 2. Additionally, it has been proved that running the Faster R-CNN with the first 300 object proposals obtained from the final step strikes a good balance between detection accuracy and detection speed.
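The proposal-ranking step described above (keeping only the highest-scoring proposals from the box-classification layer) can be sketched with a hypothetical helper:

```python
def top_proposals(proposals, scores, k=2000):
    """Sort proposals by their box-classification scores (descending) and
    keep the first k, as done before training the Fast R-CNN; at run time
    a smaller k (e.g., 300) trades little accuracy for speed."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [proposals[i] for i in order[:k]]
```

For instance, with scores [0.1, 0.9, 0.5] and k = 2, the second and third proposals survive, in that order.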

Dataset Preparation
To estimate the residual strength of the TRC based on deep learning approaches, datasets need to be prepared beforehand to train, validate, and test the Faster R-CNN model. As demonstrated in Figure 10, datasets were generated following a three-step procedure in the present study.

In the first step, specimens of the TRC were prepared. Twenty TRC specimens were fabricated and cured under standard conditions at Shunxing Concrete Co., Ltd. in Hunan Province, China. These TRC specimens were evenly divided into five groups. One group was exposed to normal conditions, while the other four groups were treated to four different corrosion degrees by immersing them, for different numbers of days, in a tank made by Cangzhou Xingye Test Instrument Co., Ltd. in Hebei Province, China, which contained water at 80 °C, as demonstrated on the left of Figure 10. The immersion times for the five groups of specimens were 0, 6, 12, 18, and 24 days, and the corresponding corrosion degrees were 0, 0.43, 0.48, 0.52, and 0.56, which corresponded to five different residual strengths of 1, 0.57, 0.52, 0.48, and 0.44, denoted P-0, P-6, P-12, P-18, and P-24, respectively [29]. It should be noted that in the present study the residual strengths were obtained from three-point bending tests, and the residual strength value at each corrosion degree is the average over the four specimens in the group.
Figure 10. Procedures to generate datasets.
In the second step, images of the TRC specimens were captured at the different corrosion degrees. At each corrosion degree, two-megapixel portable digital microscopes (brand Smolia, made in Fukuoka, Japan; 1920 × 1080 pixels) were used to capture the microstructure features of the TRC specimens, as demonstrated in the middle of Figure 10. As the field of view of the portable digital microscope is only 2.3 × 1.3 mm, the resolution of the captured images reaches approximately 21,000 dots per inch (dpi). To enhance the robustness of the estimation, different portable digital microscopes with the same specification were adopted, and images were captured under different lighting conditions by different photographers. At each corrosion degree, 550 initial images were taken from all specimens, with 500 images used to train the model and 50 reserved for robustness verification of the trained model. After augmenting the dataset via mirroring (horizontal and vertical) and 180-degree rotation, the numbers of images used for model training and robustness verification increased to 10,000 (that is, 500 × 5 × 4) and 1000 (that is, 50 × 5 × 4), respectively.
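The 4× augmentation described above (the original image plus its horizontal mirror, vertical mirror, and 180-degree rotation) can be sketched with NumPy; the `augment` helper is an illustrative assumption, not the authors' code.

```python
import numpy as np

def augment(image):
    """4x augmentation: original, horizontal mirror, vertical mirror,
    and 180-degree rotation, as described in the text."""
    return [
        image,
        np.fliplr(image),    # horizontal mirror
        np.flipud(image),    # vertical mirror
        np.rot90(image, 2),  # 180-degree rotation
    ]

# Bookkeeping from the text: 500 training images per corrosion degree,
# 5 corrosion degrees, 4 variants each -> 10,000 training images.
n_train = 500 * 5 * 4
```

Note that the 180-degree rotation equals the composition of both mirrors, so the four variants are distinct for a generic (asymmetric) image.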
In the final step, datasets were established utilizing the images obtained in step 2. As the Faster R-CNN is a supervised model, images need to be labelled first and then used for training the model. The 10,000 images (2000 for each residual strength) obtained for model training in step 2 were labelled with the number of days their specimens were immersed in the water, as demonstrated on the right of Figure 10. For instance, if a specimen was immersed in the water for six days, its corresponding images were labelled P-6. Among the 10,000 labelled images, the proportions utilized for creating the training, validating, and testing datasets were 40%, 40%, and 20%, respectively.
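The 40/40/20 split of the labelled images can be sketched as follows; the `split_dataset` helper and its fixed shuffle seed are illustrative assumptions, since the text does not state how the images were assigned to the three datasets.

```python
import random

def split_dataset(items, fractions=(0.4, 0.4, 0.2), seed=0):
    """Shuffle labelled images and split them into training, validating,
    and testing sets in the 40/40/20 proportions used in the text."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n = len(items)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```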

Implementation Details
All experiments were implemented with the open-source Faster R-CNN model under the Caffe framework in graphics processing unit (GPU) mode on a workstation. The hardware configuration of the workstation is as follows: central processing unit (CPU): Intel i7-8700k (3.20 GHz); GPU: ZOTAC X-GMING GeForce RTX 2080Ti with 11 GB of memory; RAM: 16 GB DDR4. For the Faster R-CNN, all images adopted for training and validating the RPN and Fast R-CNN are resized so that their long and short sides do not exceed 1000 and 600 pixels, respectively. The initial parameters of the CONV layers and FC layers are drawn from two zero-mean Gaussian distributions with standard deviations of 0.001 and 0.01, respectively. The values of the mini-batch size (MBS), learning rate (LR), momentum, and weight decay adopted to train the RPN and Fast R-CNN are 128, 0.001, 0.9, and 0.0005, respectively. The nine anchors are obtained by combining three different scales {128², 256², 512²} and three different aspect ratios {1:1, 1:2, 2:1}. Since cross-boundary anchors could lead to a non-convergence problem, anchors whose boundaries cross the image were abandoned during training. Additionally, the non-maximum suppression threshold was set to 0.7 to reduce the overlap between object proposals. A more detailed description of the parameter initialization of the Faster R-CNN can be found in [27].
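The resizing rule and the nine-anchor construction can be illustrated as follows. Both helpers are hypothetical sketches; in particular, the aspect-ratio convention (ratio = h/w at constant area scale²) is an assumption consistent with common Faster R-CNN implementations rather than a detail stated in the text.

```python
import math

def resize_scale(height, width, short_max=600, long_max=1000):
    """Scale factor so that the resized short side <= 600 px
    and the resized long side <= 1000 px."""
    return min(short_max / min(height, width), long_max / max(height, width))

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Nine (w, h) anchor shapes: each has area scale^2 and aspect ratio h/w
    (ratios 0.5, 1.0, 2.0 correspond to 1:2, 1:1, and 2:1)."""
    return [(s / math.sqrt(r), s * math.sqrt(r)) for s in scales for r in ratios]
```

For the 1920 × 1080 microscope images, the long side is the binding constraint, giving roughly 1000 × 562-pixel inputs.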
Average precision (AP), which is calculated from the precision-recall curve of each class, is used to assess the performance of the trained Faster R-CNN model [30]. For each class, precision is defined as the proportion of correct detections among all detections returned by the algorithm, and recall is defined as the proportion of correct detections among all the considered ground-truth instances. The mean AP (mAP), as the term suggests, is the mean of all calculated APs. More details about the precision-recall curve and the AP can be found in the study of Girshick [30].
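The AP and mAP computation described above can be sketched as follows, assuming all-point interpolation of the precision-recall curve (the exact interpolation scheme of [30] is not restated here, so this is an illustrative variant):

```python
def average_precision(scores, is_correct, n_ground_truth):
    """AP from the precision-recall curve of one class: rank detections by
    score, accumulate precision and recall, then take the area under the
    interpolated precision-recall curve."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_positives = 0
    precisions, recalls = [], []
    for rank, i in enumerate(order, start=1):
        true_positives += bool(is_correct[i])
        precisions.append(true_positives / rank)           # correct / all returned
        recalls.append(true_positives / n_ground_truth)    # correct / all ground truth
    ap, previous_recall = 0.0, 0.0
    for r in sorted(set(recalls)):
        # Interpolated precision: the best precision at any recall >= r.
        p_interp = max(p for p, rr in zip(precisions, recalls) if rr >= r)
        ap += (r - previous_recall) * p_interp
        previous_recall = r
    return ap

def mean_average_precision(aps):
    """mAP is simply the mean of the per-class APs."""
    return sum(aps) / len(aps)
```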

Training, Validating, and Testing Results
The Faster R-CNN model was first trained through the previously discussed four-step training strategy using the original parameters and was tested using the testing dataset. With the workstation introduced in Section 3.2, the times required for training the model for 280,000 iterations and for evaluating an image with a resolution of 1920 × 1080 pixels are around 14.0 h and 0.072 s, respectively. Figure 11 shows the training loss against the number of iterations. As can be observed from Figure 11, the training loss declines as the number of iterations increases and gradually becomes stable after 230,000 iterations. Figure 12 shows the precision-recall curve of each case under consideration for the testing dataset, based on the model trained for 230,000 iterations. With the obtained precision-recall curves, the APs and mAP can be computed, as also shown in Figure 12. It can be observed from Figure 12 that the APs for residual strength evaluation of P-0, P-6, P-12, P-18, and P-24 are 99.51%, 99.75%, 98.50%, 90.43%, and 90.75%, respectively, and the corresponding mAP is 95.79%. Likewise, based on the models trained for different numbers of iterations, the APs and mAPs were obtained and plotted against the number of iterations, as shown in Figure 13. As expected, the APs and mAPs both increase at first as the number of iterations increases and then tend to be stable after 230,000 iterations.


Parameter Optimization
As the original parameters adopted for the Faster R-CNN in the study of Ren et al. [27] might not be the best combination for the dataset under consideration, in the present study the parameters of the Faster R-CNN were optimized to achieve better accuracy of residual strength evaluation. As indicated by Ren et al. [27], the performance of the Faster R-CNN is significantly affected by three key parameters, namely, the anchor scale (AS), the MBS, and the LR. Therefore, the influences of these parameters on the accuracy of residual strength evaluation were investigated. With three sets of ASs, five MBSs, and three LRs considered, the number of combinations of the three parameters was 45. The APs and mAPs calculated from the testing dataset under the 45 combinations are summarized in Figure 14 and Table 3. It should be noted that the three sizes of anchors under consideration were determined based on the size of the images (600 × 1000 pixels) to be detected. It should be pointed out that in the study of Ren et al. an RoI was labelled as the computed category if a probability of no less than 0.6 was computed from the SM layer for the RoI [27]. It should also be noted that the number of iterations was set to 230,000 for each model training.

As can be observed from Figure 14 and Table 3, the AS, MBS, and LR affect the APs and mAPs in a coupled way. The largest APs for residual strength evaluation of P-0, P-6, P-12, P-18, and P-24 are 99.93% for Case 34, 99.30% for Case 1, 99.45% for Case 34, 97.92% for Case 23, and 99.08% for Case 23, respectively. To achieve a better balance among the APs for the different cases, Case 23 was selected, in which the mAP reaches its largest value of 98.98%; the corresponding APs for residual strength evaluation of P-0, P-6, P-12, P-18, and P-24 are 99.20%, 99.66%, 99.05%, 97.92%, and 99.08%, respectively. The ASs in Case 23 are 128², 256², and 512², the anchor ratios are 1:1, 1:2, and 2:1, and the MBS and LR are 64 and 0.0005, respectively.

Testing New Images
To examine the feasibility of the presented method, the model obtained in Case 23 was used to evaluate the residual strength of the 1000 images (200 for each residual strength) that were reserved in Section 3.1. The evaluation results for the five cases under consideration are summarized in Table 4, from which it can be seen that the APs for residual strength evaluation of P-0, P-6, P-12, P-18, and P-24 are 99.5%, 99.5%, 100%, 100%, and 100%, respectively, and the corresponding mAP is 99.8%. These results show that the trained model also performs excellently on new images, demonstrating the feasibility of the presented method. Figure 15 illustrates some of the evaluation results for each case under consideration, in which the images wrongly evaluated for residual strengths P-0 and P-6 are given specifically in Figure 15a,b. It should be pointed out that the 1000 reserved new images were captured in various circumstances, as discussed in Section 3.1, which means that the effect of these circumstances on the accuracy of residual strength evaluation is insignificant. It should also be pointed out that, during testing, the trained model outputs a predicted residual strength for each new image. If the predicted value matches the actual residual strength obtained from the experiment, the evaluation is deemed correct, and vice versa. The average precision presented in Table 4 was calculated as the ratio of the number of correct evaluations to the total number of evaluations.
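The per-class counting described above (correct evaluations divided by total evaluations, with the mAP as the mean over classes) can be sketched as follows; the `per_class_precision` helper is a hypothetical illustration, not the authors' evaluation script.

```python
from collections import Counter

def per_class_precision(actual, predicted):
    """Ratio of correct evaluations to total evaluations for each class;
    the mAP is then the mean of these per-class values."""
    correct, total = Counter(), Counter()
    for a, p in zip(actual, predicted):
        total[a] += 1
        correct[a] += (a == p)  # count a correct evaluation when labels match
    return {label: correct[label] / total[label] for label in total}
```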


Discussion
The results illustrated in Section 4 demonstrate that the proposed Faster R-CNN-based approach is capable of learning and detecting differences in the microstructure features of the TRC with different residual strengths. In practice, the actual residual strength to be estimated usually differs from those adopted for training the model; it is therefore most likely to be evaluated as the training residual strength closest to it. To reduce this evaluation error, one effective approach is to use more specimens for training the model, which narrows the gap between the actual residual strength and the closest training residual strength. Additionally, the surface of the TRC component to be evaluated needs to be cleaned so that images of good quality can be taken. It should be pointed out that different types of materials have different microstructure features; therefore, new models need to be trained for different types of materials. The procedures for implementing the proposed deep learning-based framework for residual strength evaluation of a material are summarized below:
1. Prepare sufficient samples under differing corrosion degrees and guarantee that the samples cover the scope of residual strengths to be evaluated, thus enhancing the evaluation precision.
2. Acquire high-quality images of the prepared samples and label the acquired images.
3. Select a proper deep learning-based framework and train it using the images acquired in step 2; check the soundness of the trained model with new images not adopted for training.
4. Acquire images of the components that need to be evaluated and utilize the trained model to estimate the residual strength.
It should be pointed out that enhancing image quality and augmenting the dataset of images adopted for training and validating the model are efficient ways to improve the model performance.

Conclusions
Previous models for predicting the residual strength of textile-reinforced concrete need to know the climatic conditions (temperature and humidity) in which the TRC exists, which is difficult to obtain in practice. A deep learning-based framework based on the Faster R-CNN is presented to evaluate the residual strength of the TRC under different corrosion degrees without the need to know the climatic conditions. Five groups of TRC specimens were fabricated and treated to five different corrosion degrees, corresponding to five different residual strengths, by immersing them in a water tank for different numbers of days (namely, from 0 to 24 days at intervals of six days, denoted P-0, P-6, P-12, P-18, and P-24, respectively). Images of the microstructure features of these specimens with five different residual strengths were taken in various circumstances with portable digital microscopes. The resolution of the obtained images reaches approximately 21,000 dots per inch (dpi). Among the 11,000 images adopted in the study, 10,000 were used to create the training, validating, and testing datasets in proportions of 40%, 40%, and 20%, respectively, and the other 1000 new images (200 for each strength) were reserved to check the feasibility of the trained models.
The influences of three key parameters, namely, the anchor scale, the mini-batch size, and the learning rate, on the precision of residual strength evaluation were investigated, based on which the best combination of the three parameters was acquired to train the Faster R-CNN. The maximum APs for residual strength evaluation of P-0, P-6, P-12, P-18, and P-24, obtained under the best combination of these parameters, are 99.20%, 99.66%, 99.05%, 97.92%, and 99.08%, respectively, and the mAP is 98.98%. It should be noted that the maximum APs were obtained by making comparisons with the results obtained from experiments.
The paper provides a new way to evaluate the residual strength of materials. However, it should be pointed out that, under the same corrosion degree, the microstructure features of different materials differ from each other. This indicates that specific models are required for different materials. In the future, efforts will be focused on other types of materials that are widely utilized in industry to further check the feasibility of the presented approach. It should also be noted that the presented method is essentially a way to build the relationship between the microstructure features and the macroscopic properties of a material, which can be applied in many fields.