Crack Identification for Bridge Condition Monitoring Combining Graph Attention Networks and Convolutional Neural Networks

Chen, Feiyu; Tong, Tong; Hua, Jiadong; Cui, Chun

doi:10.3390/app15105452

¹ 713 Research Institute of CSSC, Zhengzhou 450015, China

² Henan Key Laboratory of Intelligent Underwater Equipment, Zhengzhou 450015, China

³ School of Reliability and Systems Engineering, Beihang University, Xueyuan Road No. 37, Haidian District, Beijing 100191, China

⁴ Science & Technology on Reliability and Environmental Engineering Laboratory, Beihang University, Xueyuan Road No. 37, Haidian District, Beijing 100191, China

⁵ Advanced Manufacturing Center, Ningbo Institute of Technology, Beihang University, Ningbo 315100, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(10), 5452; https://doi.org/10.3390/app15105452

Submission received: 25 March 2025 / Revised: 25 April 2025 / Accepted: 29 April 2025 / Published: 13 May 2025

(This article belongs to the Special Issue Machine Learning in Vibration and Acoustics 2.0)

Abstract:

Orthotropic steel box girders and steel bridge decks are commonly used in bridges. Because of the coupling of initial defects and alternating forces, fatigue cracks are likely to appear in these structures. To ensure the service life of bridges, methods for automatic crack identification are needed. In this paper, we present a novel approach for crack detection and bridge condition monitoring that integrates convolutional neural networks (CNNs) with graph attention networks (GATs). First, the original large-sized images are divided into small patches, and these patches are input into a CNN to extract features and reduce their dimensions. Then, the output features of the CNN are treated as the nodes of a graph. To reflect the spatial relationship among the patches in the original image, the node from the central patch is connected to the nodes from its neighboring patches to form a graph structure, which is input into a GAT model to learn the relationship among the nodes and update the features. Finally, the output features of the GAT are used to judge whether the central patch contains cracks. Forty original large-sized images are cropped into a large number of patches for training the CNN-GAT model. With a sliding window technique, the trained CNN-GAT model is able to find the patches containing cracks in large test images. The test results show the location and the size of the cracks, which indicates that the proposed approach is effective for crack identification in bridge structures.

Keywords:

crack identification; convolutional neural networks; graph attention networks; machine vision; bridge condition monitoring

1. Introduction

Steel box girders and steel bridge decks are widely used in large-span bridges. In the manufacturing of these structural components, initial flaws may occur due to improper operation [1]. During the service time of the bridge, alternating forces from vehicles can promote the initiation and expansion of cracks. The expansion of fatigue cracks will reduce reliability and shorten the operational life span of the bridge [2,3]. For safe and reliable operation of bridges, researchers focus on crack detection to develop bridge condition monitoring methods. In recent years, vision-based methods for condition monitoring have attracted interest as machine vision has developed rapidly.

In the field of civil engineering, traditional digital image processing techniques (IPTs) have been applied to crack detection. Abdel-Qader et al. [4] compared the performance of several kinds of frequently-used edge detection operators and concluded that the Haar transform is more effective and reliable than the other methods for crack identification. Yamaguchi and Hashimoto [5] employed a percolation-based method and increased the efficiency of crack detection. Yeum and Dyke [6] integrated prior knowledge with IPTs and realized crack detection near bolts. Li et al. [7] proposed a local binarization algorithm, which can be used to identify concrete cracks. Though conventional IPTs are able to detect cracks in images in some cases, the results are sensitive to environmental changes [8], which limits the application of conventional digital image processing techniques.

Recently, deep learning algorithms have developed rapidly, which has facilitated their wide use in machine vision, including crack detection from images. Among the diverse deep learning networks, the convolutional neural network (CNN) is classic and representative due to its effectiveness and applicability. Naturally, CNN-based methods for crack detection have been explored. Cha et al. [9] combined a CNN with a sliding window technique and achieved crack detection in large-sized images. Modarres et al. [10] compared multiple machine learning algorithms and found that the CNN model is efficient for crack detection from images. Xu et al. [11] modified the convolutional neural network architecture with fusion techniques, which can be used to detect cracks and other defects in images of bridge steel box girders. Gopalakrishnan et al. [12] established a deep convolutional neural network (DCNN) for automatic crack detection in pavement images. Laxman et al. [13] developed an integrated CNN model by combining convolutional layers with regression models, and the proposed model could automatically estimate the crack depth. Yu et al. [14] combined a pre-trained CNN model with a decision fusion algorithm, which improved crack detection accuracy. Li et al. [15] and Gwon et al. [16] used unmanned aerial vehicles and CNNs to automatically detect bridge cracks. Khan et al. [17] also combined unmanned aerial vehicles with a CNN-based method and realized real-time road damage detection. Cai et al. [18] proposed a method for automatically quantifying wall cracks by combining CNN-based object detection and segmentation. Sandric et al. [19] applied a CNN model to mapping landslide cracks in various environmental conditions. In the studies above, CNN-based methods for crack detection have shown robust performance compared to conventional IPTs.

Although CNNs are frequently used in the field of crack detection, CNN-based methods have some drawbacks. The training set of a CNN is usually composed of small patches. When the trained CNN is applied to large-sized test images, the images need to be cropped into patches so that each patch can be classified. In this process, the patches cropped from the test images are evaluated by the CNN individually, and the relationship between the patches is ignored. In fact, cracks usually extend spatially, which means that the neighbors of a crack patch often contain cracks as well. Therefore, the connection between patches, especially neighboring patches, may carry important information about cracks, which provides the inspiration for improvement.

The central patch and its neighboring patches can be considered as a graph structure. To take full advantage of the information shared between neighboring patches and improve crack detection performance, a graph attention network (GAT) is introduced into crack detection in this study. The graph attention network is a kind of graph neural network (GNN), which has shown good performance on graph-structured data. Lu et al. [20] utilized a graph attention network to learn the relationship between users and items and developed a realistic package recommendation system. Wang et al. [21] combined the graph attention network with the residual network to capture the correlation between adjacent image elements, based on which the assessment of the urban habitat was realized. Li et al. [22] proposed an explainable graph attention network framework to build the connection between multiple gene modules and cancer, which could provide mechanistic insights into cancer development and heterogeneity. The graph attention network works well in these diverse cases, which encourages us to apply this model to crack detection.

In this study, a method combining CNN and GAT is proposed. Firstly, a CNN model is used to extract features and reduce dimensions. Next, the CNN outputs of the central patch and its neighborhoods are connected to constitute a graph structure, which is input into a GAT network to learn the relationship and update the node features. Finally, the output features of the central node are analyzed to determine whether the central patch contains cracks.

This paper is arranged as follows. The basics of CNN and GAT and the proposed framework are given in Section 2. The case study is presented in Section 3. Finally, the main contributions of this work are summarized in Section 4.

2. Methodology

2.1. CNN

In general, the basic structure of a CNN contains multiple convolutional blocks. A convolutional block consists of a convolutional layer, an activation unit, and a pooling layer. The architecture of the CNN is shown in Figure 1.

The convolutional layers are designed to achieve automatic feature extraction from high-dimensional data. A convolutional kernel slides over the whole input image with a spatial interval known as the stride, and the dot products at all positions are calculated to form a feature map. The number of kernels in a convolutional layer can be set as needed, and the weights of each kernel are updated during model training. After the convolutional layer, an activation unit is used to add nonlinearity. Several activation functions are available, among which ReLU is chosen due to its high efficiency. After convolution and activation, the pooling layer is used to decrease the dimensions of the feature map, which helps prevent overfitting and reduces computational costs. Max-pooling and average-pooling are common pooling choices, and max-pooling is used in this study. An example of convolution, ReLU, and max-pooling is shown in Figure 2.
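The three operations of a convolutional block can be illustrated with a minimal pure-Python sketch. The toy image, the 2 × 2 kernel, and the function names `conv2d`, `relu`, and `max_pool` are illustrative assumptions and are not taken from the paper; the sketch also assumes no padding and a single channel.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) sliding dot product of one kernel over a 2-D image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Keep the largest value inside each pooling window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(
                fmap[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 5 x 5 image with a bright vertical line in the middle column.
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Hypothetical kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, kernel, stride=1)       # 4 x 4 feature map
activated = relu(features)                       # negative responses removed
pooled = max_pool(activated, size=2, stride=2)   # 2 x 2 map after pooling
```

In a trained CNN the kernel weights are learned rather than hand-picked, and each layer applies many kernels across all input channels.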

2.2. GAT

Graph neural networks (GNNs) have shown remarkable performance on graph-structured data, including applications in medical science [23] and social networks [24]. A GNN assumes that a node is affected by its neighboring nodes. By learning different edge weights for the neighbors, the features of neighboring nodes can be aggregated and taken into account.

As shown in Figure 3, graph attention networks [25] utilize an attention mechanism to assign edge weights, which can be formulated as

\alpha_{ij} = \frac{\exp ( \mathrm{LeakyReLU} ( a^{T} [ W h_{i} \Vert W h_{j} ] ) )}{\sum_{k \in N_{i}} \exp ( \mathrm{LeakyReLU} ( a^{T} [ W h_{i} \Vert W h_{k} ] ) )}    (1)

where α_ij is the edge weight between the ith and the jth nodes, N_i denotes the set of the neighboring nodes of the ith node, W is a trainable weight matrix, h_i is the input feature of the ith node, a is a weight vector parametrizing the attention mechanism, ∥ is the concatenation operation, and LeakyReLU is the activation function. Combined with the weighted neighbor information, the output feature of the ith node, h_i′, can be calculated as

h_{i}^{\prime} = \sigma ( \sum_{j \in N_{i}} \alpha_{ij} W h_{j} )    (2)

where σ denotes the sigmoid activation function. In this study, h_i represents the original feature of the concerned area in the image. With Equations (1) and (2), the updated feature h_i′ can be obtained by considering the relationship between the concerned area and its neighbors. Then, classification can be performed on the output features to decide whether the concerned area contains cracks. The details are elaborated in the following section.
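As a hedged illustration of Equations (1) and (2), the sketch below computes the attention weights and the updated feature for a single node with one attention head. The helper names, the toy values of W, a, and the node features, and the LeakyReLU slope of 0.2 are assumptions made for this example; the paper does not report them.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    # The slope value is an assumption; it is not given in the text.
    return x if x > 0 else slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gat_update(h_i, neighbors, W, a):
    """Return (alpha, h_new) for node i following Eqs. (1) and (2)."""
    Wh_i = matvec(W, h_i)
    Wh_nb = [matvec(W, h_j) for h_j in neighbors]
    # Unnormalized scores: LeakyReLU(a^T [W h_i || W h_j]); "+" concatenates lists.
    scores = [
        leaky_relu(sum(a_k * v for a_k, v in zip(a, Wh_i + Wh_j)))
        for Wh_j in Wh_nb
    ]
    # Softmax over the neighborhood N_i (Eq. 1); the max is subtracted for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Attention-weighted sum of transformed neighbor features, then sigmoid (Eq. 2).
    h_new = [
        sigmoid(sum(alpha[j] * Wh_nb[j][d] for j in range(len(Wh_nb))))
        for d in range(len(Wh_i))
    ]
    return alpha, h_new


# Toy example: 2-D features, node i attends to itself and two neighbors.
W = [[0.5, -0.2], [0.1, 0.3]]
a = [0.4, -0.1, 0.2, 0.3]          # length = 2 x output dimension
h_center = [1.0, 2.0]
neighbors = [h_center, [0.5, 1.5], [2.0, 0.0]]
alpha, h_new = gat_update(h_center, neighbors, W, a)
```

The attention weights always sum to one over the neighborhood, so the updated feature is a convex combination of the transformed neighbor features passed through the sigmoid.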

2.3. Framework of the Proposed Method

For bridge condition monitoring, a method for crack detection in the images is proposed by combining CNN and GAT in this study. As shown in Figure 4, researchers [9] usually use a sliding window to scan a large-sized image, and each patch inside the window is input to a CNN model to decide whether it contains cracks. Such a traditional method pays attention to the information of a single patch while ignoring the features of its surroundings in images. In general, cracks are continuous, which indicates that if a patch contains cracks, its neighborhoods would be very likely to contain cracks. Thus, the features of neighborhoods can help decide whether the central patch contains cracks. As previously mentioned, GAT can learn the relationship among neighboring nodes and aggregate their features, which is appropriate for the situation of crack detection. To achieve better detection performance in a small patch, the GAT is adopted in this paper to capture more information about its neighboring patches.

A CNN model is constructed to extract information and reduce the dimensions of the input patches. The layer size and corresponding operation in the CNN are listed in Table 1. When a square window with a side of 224 pixels slides over a test image, the central patch inside the window and its 8 same-sized neighboring patches will be input into the CNN model. Through the CNN, the dimensions of the input patches are reduced from 224 × 224 × 3 to 1 × 1 × 96, which condenses the key features and decreases computational costs. Then, the output features of these patches are reshaped into 96 × 1 vectors, which can be considered as nodes to construct a graph.
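The reduction from 224 × 224 × 3 to 1 × 1 × 96 can be checked with the usual output-size formula for an operation without padding, floor((n − k) / s) + 1. The sketch below walks through the kernel sizes and strides listed in Table 1; the absence of padding is an assumption, although it is the setting that reproduces the layer sizes in the table.

```python
def out_size(n, kernel, stride):
    """Spatial output size of a convolution or pooling step without padding."""
    return (n - kernel) // stride + 1


# (description, kernel size, stride, number of kernels or None for pooling)
layers = [
    ("convolution 20 x 20", 20, 2, 24),
    ("max-pooling 7 x 7", 7, 2, None),
    ("convolution 15 x 15", 15, 2, 48),
    ("max-pooling 4 x 4", 4, 2, None),
    ("convolution 8 x 8", 8, 1, 96),
]

size, channels = 224, 3
shapes = [(size, size, channels)]
for name, kernel, stride, n_kernels in layers:
    size = out_size(size, kernel, stride)
    if n_kernels is not None:
        channels = n_kernels  # a convolution sets the channel count
    shapes.append((size, size, channels))

for shape in shapes:
    print(shape)
```

The ReLU layers are omitted because they do not change the shape of the feature map.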

After data preprocessing through the CNN, a graph structure is constructed by connecting the central node with itself and its 8 neighboring nodes. Then, this graph is input into a GAT model to update the node features. In the established GAT model, the dimensions of the input node features h_i are 96 × 1, which corresponds to the output of the CNN, and the dimensions of the output node features h_i′ are set as 64 × 1. After aggregating the features of the neighbors through the GAT, the output features of the central node are sent to a fully connected layer, which classifies the central node and determines whether the original patch contains cracks. The whole procedure is shown in Figure 5.
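A minimal sketch of the graph construction follows. It assumes the nine patches are indexed in row-major order within the 3 × 3 window (so the central patch has index 4) and that only edges into the central node are used; whether the neighboring nodes are also connected to each other is not stated in the text. The function name `build_star_graph` is hypothetical.

```python
CENTER = 4  # row-major index of the central patch in the 3 x 3 window


def build_star_graph(num_nodes=9, center=CENTER):
    """Connect the central node with itself and with every neighboring node."""
    # Each edge (source, target) lets the target aggregate from the source.
    edges = [(j, center) for j in range(num_nodes)]
    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        adjacency[dst][src] = 1
    return edges, adjacency


edges, adjacency = build_star_graph()
# Row CENTER of the adjacency matrix lists the nodes the central node attends to.
print(adjacency[CENTER])
```

With this layout the neighborhood N_i of the central node contains all nine nodes, which matches the description of connecting the central node with itself and its 8 neighbors.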

3. Case Study

3.1. Dataset

Steel box girders and steel bridge decks are important structures of long-span bridges. Because of dynamic vehicle loads and initial material flaws, fatigue cracks often appear at the weak points of the structure, such as the joints between structural components. Through the application of bridge robots, the dataset of images is captured and established, which includes 40 original large-sized images with dimensions of 4928 × 3264. In addition, eight additional images for testing, with dimensions of 4928 × 3264, are also included to demonstrate the effectiveness of the proposed approach.

As shown in Figure 6, the 40 raw images are divided into many small patches with a pixel resolution of 224 × 224. Then, a databank can be established for model training. The area occupied by cracks is much smaller than the area without cracks. Therefore, the crack patches are much fewer than the non-crack patches, which may introduce an imbalance during model training. As shown in Figure 7, the number of crack patches can be multiplied by rotation, which relieves this imbalance. By rotating the original crack patches through multiple angles, the number of patches containing cracks increases significantly. In addition, patches in which the cracks lie on the four edges are removed from the training databank to avoid confusion in classification. These patches with cracks on the edges may distract the networks from the key crack features, which could weaken the robustness of the trained model. Furthermore, since a small stride always yields other patches in which the same cracks are situated near the center, ignoring these patches does not lead to information omission.
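The rotation-based multiplication of crack patches can be sketched as follows. The paper does not specify the rotation angles, so this example assumes multiples of 90°, which keep a square patch the same size without interpolation; the function names are hypothetical and a tiny 2 × 2 list stands in for a 224 × 224 patch.

```python
def rotate90(patch):
    """Rotate a 2-D patch (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def augment_by_rotation(patch):
    """Return the patch rotated by 0, 90, 180, and 270 degrees (assumed angles)."""
    variants = [patch]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


# Stand-in for a crack patch; real patches are 224 x 224 x 3.
patch = [[1, 2], [3, 4]]
variants = augment_by_rotation(patch)
print(len(variants))
```

Under this assumption each crack patch yields four training samples, and every rotated variant keeps the "crack" label.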

3.2. Pre-Training of CNN Model

The proposed model consists of a CNN and a GAT, and contains an abundance of trainable model parameters. If the whole model is trained directly without appropriate initialization, the convergence will probably be slow. Therefore, the CNN model is pre-trained in advance of the GAT training.

The CNN used in the proposed model is composed of three convolutional blocks and is detailed in Table 1. In the pre-training, a fully connected layer and a SoftMax layer are appended after the convolutional blocks, which reduces the dimensions of the features from 96 to 2 and constitutes a classical CNN model for binary classification. The training set for the CNN is randomly extracted from the databank. The numbers of patches labeled as “crack” and patches labeled as “non-crack” are both set to 10,000. The CNN is trained for 100 epochs, and the learning rate for each epoch is given in Table 2. The training process is shown in Figure 8.
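The step-wise learning rate in Table 2 drops by a factor of ten every 20 epochs. A small lookup function, with the hypothetical name `learning_rate`, expresses the schedule with 1-based epoch numbers as in the table:

```python
# Learning rates from Table 2, one entry per block of 20 epochs.
RATES = [0.01, 0.001, 0.0001, 0.00001, 0.000001]


def learning_rate(epoch):
    """Return the learning rate for a 1-based epoch number in [1, 100]."""
    if not 1 <= epoch <= 100:
        raise ValueError("epoch must be between 1 and 100")
    return RATES[(epoch - 1) // 20]


for epoch in (1, 20, 21, 60, 61, 100):
    print(epoch, learning_rate(epoch))
```

A lookup table is used instead of repeated multiplication by 0.1 so that the returned values match the entries in Table 2 exactly.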

Through the pre-training process, a CNN for patch classification is obtained, which indicates that the trained convolutional kernels are able to extract crack features. The parameters of the convolutional kernels in the pre-trained CNN model can be used to initialize the CNN-GAT model.

3.3. Training of the Whole CNN-GAT Model

Similarly, the training set for the whole CNN-GAT model is randomly extracted from the databank. The numbers of the two kinds of patches are both set to 10,000. Then, their neighboring patches are acquired from the original images to form samples of the training set. As shown in Figure 9, a labeled patch and its eight neighboring patches constitute a sample of the training set.
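Assembling one training sample amounts to locating the labeled patch and its eight same-sized neighbors in the original image. The sketch below returns the nine bounding boxes in row-major order; the function name, the (left, top, right, bottom) box convention, and the choice to skip patches whose neighbors fall outside the image are assumptions, since the paper does not describe border handling.

```python
PATCH = 224  # patch side length in pixels


def neighborhood_boxes(x, y, width=4928, height=3264, patch=PATCH):
    """Return 9 boxes (left, top, right, bottom) around the patch at (x, y).

    (x, y) is the top-left corner of the central patch. Returns None when any
    neighbor would fall outside the image (an assumed border policy).
    """
    boxes = []
    for dy in (-patch, 0, patch):
        for dx in (-patch, 0, patch):
            left, top = x + dx, y + dy
            if left < 0 or top < 0 or left + patch > width or top + patch > height:
                return None
            boxes.append((left, top, left + patch, top + patch))
    return boxes


boxes = neighborhood_boxes(224, 224)
print(boxes[4])  # central patch
```

Each box can then be cropped from the image and passed through the CNN to produce one node feature of the sample.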

The convolutional kernels in the CNN-GAT model are designed to extract crack features from the patches, which is the same role they play in the pre-trained CNN. Therefore, the parameters of the convolutional kernels are initialized from the CNN model pre-trained in Section 3.2 before the training of the CNN-GAT model, which could accelerate convergence and improve training efficiency. During the training process, the parameters of the GAT are updated, and the parameters of the CNN are also fine-tuned. The CNN-GAT model is trained for 100 epochs. The learning rate is given in Table 2. The training process is shown in Figure 10.

3.4. Results and Discussions

Several indicators can be utilized to describe the accuracy of binary classification, including pixel accuracy (PA), true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), negative predictive value (NPV), etc. In this study, TPR and TNR are used to evaluate model performance, which are defined as [26]

TPR = \frac{TP}{TP + FN}    (3)

TNR = \frac{TN}{FP + TN}    (4)

where TP, FP, FN, and TN represent the patch number of true positives, false positives, false negatives, and true negatives, respectively.
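Equations (3) and (4) translate directly into code. The patch counts in the example are hypothetical and chosen only to illustrate the arithmetic; they are not the values reported in the confusion matrices of this study.

```python
def true_positive_rate(tp, fn):
    """TPR = TP / (TP + FN): share of crack patches that are detected."""
    return tp / (tp + fn)


def true_negative_rate(tn, fp):
    """TNR = TN / (FP + TN): share of non-crack patches correctly rejected."""
    return tn / (fp + tn)


# Hypothetical counts for illustration only.
tp, fn, tn, fp = 90, 10, 950, 50
tpr = true_positive_rate(tp, fn)
tnr = true_negative_rate(tn, fp)
print(tpr, tnr)
```

Because crack patches are far rarer than non-crack patches, reporting TPR and TNR separately avoids the bias of a single overall accuracy figure.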

The eight raw images used for testing are not involved in the establishment of the databank, so the detection results on these images reflect the performance of the proposed CNN-GAT model. A sliding window of 224 × 224 pixels scans the test images with a stride of 112 pixels. The patches inside the window and their neighbors are input into the trained CNN-GAT model for classification. An example test result is exhibited in Figure 11. The patches classified as “crack” constitute the crack area and show the position of the cracks, which verifies the detection capability of the proposed approach. The confusion matrix of the test results is illustrated in Figure 12. Both indicators, TPR and TNR, are over 90%, which confirms the ability of the CNN-GAT model to detect cracks.
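The sliding-window scan can be sketched by enumerating the top-left corners of the window along each image axis. The function name is hypothetical, and the sketch assumes that only windows lying fully inside the 4928 × 3264 image are used; how the authors treat the image border and missing neighbors is not stated.

```python
def window_origins(length, window=224, stride=112):
    """Top-left coordinates of all windows that fit fully inside one axis."""
    return list(range(0, length - window + 1, stride))


xs = window_origins(4928)  # horizontal positions
ys = window_origins(3264)  # vertical positions
num_windows = len(xs) * len(ys)
print(len(xs), len(ys), num_windows)
```

With a stride of half the window size, adjacent windows overlap by 112 pixels, so a crack near the edge of one window appears closer to the center of a neighboring one.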

The pre-trained CNN model from Section 3.2 is also applied to the test images, and its results are compared with those of the CNN-GAT model. Since the architecture of the pre-trained CNN is the same as that of the CNN front end in the proposed CNN-GAT model, this comparison isolates the benefit of adding the GAT after the CNN. An example test result of the CNN is shown in Figure 13, in which the crack zone is not identified completely and a relatively large number of patches without cracks are mistakenly classified as “crack”. In this example, the CNN-GAT model clearly outperforms the CNN. The confusion matrix of the test results with the pre-trained CNN is illustrated in Figure 14. The TPR of the CNN model is only 0.7821, while the TPR of the CNN-GAT model is 0.9106, which implies that the proposed method is considerably more sensitive to cracks. Meanwhile, the TNR of the proposed CNN-GAT model is almost as high as that of the CNN, which indicates that the increased sensitivity to cracks does not come with increased sensitivity to other interfering factors. This can be attributed to the fact that the graph structure in the proposed framework takes into account the spatial continuity of cracks, so the CNN-GAT model can concentrate on the features related to cracks instead of irrelevant features. The quantitative comparison of TPR and TNR suggests that introducing the GAT enhances the crack identification ability of the CNN-based model, which also encourages researchers to add graph neural networks to other models for improvement.

4. Conclusions

To guarantee the operation and maintenance of long-span bridges, an intelligent vision-based method for crack identification based on CNN and GAT is proposed. Firstly, a 224 × 224 window slides over images of bridge decks or box girders, and the patch inside the window along with its neighborhoods is input into the CNN model to extract features and reduce dimensions, which generates the central node and its neighboring nodes. Then, a graph structure can be established by connecting the central node with its eight neighboring nodes. Subsequently, the graph structure is treated as the input of the GAT model to classify the patch within the sliding window. After the whole image is scanned, the patches classified as “crack” by the CNN-GAT model compose the defect area, which highlights the position and the shape of cracks in the original bridge structures.

The main contribution of this work is that GAT is introduced into crack identification. In general, CNN is commonly used to detect the patch within the sliding window while its relationship with neighborhoods is ignored. Since cracks are spatially extended, the neighborhoods of crack patches are likely to contain cracks. In this study, we construct a graph structure to establish a connection between the concerned patch and its neighborhoods and use GAT to extract features synthetically. In addition, the pre-training strategy of convolutional kernels is adopted to increase convergence speed during the model training process. The results in the test images demonstrate that the proposed CNN-GAT model has an excellent capability to identify cracks. Compared to the traditional CNN, the TPR of the test results from the CNN-GAT model is much higher, while the TNR is at almost the same high level, which implies that the proposed model is more sensitive to cracks while maintaining the resistance to interference factors. It also indicates that the introduction of GAT could make the whole framework concentrate more on the crack features, which encourages the researchers to add extra GAT modules to other networks for improvement in the field of crack visual detection.

Author Contributions

Conceptualization, F.C.; methodology, F.C.; software, C.C.; validation, J.H.; formal analysis, T.T.; investigation, J.H.; resources, F.C.; data curation, C.C.; writing—original draft preparation, F.C.; writing—review and editing, T.T.; visualization, T.T.; supervision, J.H.; project administration, C.C.; funding acquisition, F.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Henan, grant numbers 242300420054 and 242300420663.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Tong, T.; Hua, J.; Gao, F.; Zhang, H.; Lin, J. Disbond Contour Estimation in Aluminum/CFRP Adhesive Joint Based on the Phase Velocity Variation of Lamb Waves. Smart Mater. Struct. 2022, 31, 095020.
2. Ya, S.; Yamada, K.; Ishikawa, T. Fatigue Evaluation of Rib-to-Deck Welded Joints of Orthotropic Steel Bridge Deck. J. Bridg. Eng. 2011, 16, 492–499.
3. Tong, T.; Hua, J.; Lin, J.; Zhang, H. Disbond Contours Evaluation in Aluminum/CFRP Adhesive Joint Based on Excitation Recovery of Lamb Waves. Compos. Struct. 2022, 294, 115736.
4. Abdel-Qader, I.; Abudayyeh, O.; Kelly, M.E. Analysis of Edge-Detection Techniques for Crack Identification in Bridges. J. Comput. Civ. Eng. 2003, 17, 255–263.
5. Yamaguchi, T.; Hashimoto, S. Fast Crack Detection Method for Large-Size Concrete Surface Images Using Percolation-Based Image Processing. Mach. Vis. Appl. 2010, 21, 797–809.
6. Yeum, C.M.; Dyke, S.J. Vision-Based Automated Crack Detection for Bridge Inspection. Comput. Civ. Infrastruct. Eng. 2015, 30, 759–770.
7. Li, L.; Wang, Q.; Zhang, G.; Shi, L.; Dong, J.; Jia, P. A Method of Detecting the Cracks of Concrete Undergo High-Temperature. Constr. Build. Mater. 2018, 162, 345–358.
8. Tong, T.; Hua, J.; Gao, F.; Lin, J. Identification of Bolt State in Lap Joint Based on Propagation Model and Imaging Methods of Lamb Waves. Mech. Syst. Signal Process. 2023, 200, 110569.
9. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput. Civ. Infrastruct. Eng. 2017, 32, 361–378.
10. Modarres, C.; Astorga, N.; Droguett, E.L.; Meruane, V. Convolutional Neural Networks for Automated Damage Recognition and Damage Type Identification. Struct. Control Health Monit. 2018, 25, e2230.
11. Xu, Y.; Bao, Y.; Zhang, Y.; Li, H. Attribute-Based Structural Damage Identification by Few-Shot Meta Learning with Inter-Class Knowledge Transfer. Struct. Health Monit. 2021, 20, 1494–1517.
12. Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep Convolutional Neural Networks with Transfer Learning for Computer Vision-Based Data-Driven Pavement Distress Detection. Constr. Build. Mater. 2017, 157, 322–330.
13. Laxman, K.C.; Tabassum, N.; Ai, L.; Cole, C.; Ziehl, P. Automated Crack Detection and Crack Depth Prediction for Reinforced Concrete Structures Using Deep Learning. Constr. Build. Mater. 2023, 370, 130709.
14. Yu, Y.; Samali, B.; Rashidi, M.; Mohammadi, M.; Nguyen, T.N.; Zhang, G. Vision-Based Concrete Crack Detection Using a Hybrid Framework Considering Noise Effect. J. Build. Eng. 2022, 61, 105246.
15. Li, R.; Yu, J.; Li, F.; Yang, R.; Wang, Y.; Peng, Z. Automatic Bridge Crack Detection Using Unmanned Aerial Vehicle and Faster R-CNN. Constr. Build. Mater. 2023, 362, 129659.
16. Gwon, G.H.; Lee, J.H.; Kim, I.H.; Jung, H.J. CNN-Based Image Quality Classification Considering Quality Degradation in Bridge Inspection Using an Unmanned Aerial Vehicle. IEEE Access 2023, 11, 22096–22113.
17. Waseem Khan, M.; Obaidat, M.S.; Mahmood, K.; Batool, D.; Muhammad Sanaullah Badar, H.; Aamir, M.; Gao, W. Real-Time Road Damage Detection and Infrastructure Evaluation Leveraging Unmanned Aerial Vehicles and Tiny Machine Learning. IEEE Internet Things J. 2024, 11, 21347–21358.
18. Cai, R.; Li, J.; Tan, Y.; Shou, W.; Butera, A. Automated Geometric Quantification of Building Exterior Wall Cracks Based on Computer Vision. J. Perform. Constr. Facil. 2024, 38, 04024015.
19. Sandric, I.; Chitu, Z.; Ilinca, V.; Irimia, R. Using High-Resolution UAV Imagery and Artificial Intelligence to Detect and Map Landslide Cracks Automatically. Landslides 2024, 21, 2535–2543.
20. Lu, W.; Jiang, N.; Jin, D.; Chen, H.; Liu, X. Learning Distinct Relationship in Package Recommendation With Graph Attention Networks. IEEE Trans. Comput. Soc. Syst. 2023, 10, 3308–3320.
21. Wang, C.; Yang, K.; Yang, W.; Li, R.; Qiang, H.; Lu, B.; Su, B.; Yang, Z. Assessment of the Urban Habitat Quality Service Functions and Their Drivers Based on the Fusion Module of Graph Attention Network and Residual Network. Int. J. Digit. Earth 2024, 17, 2306310.
22. Li, H.; Han, Z.; Sun, Y.; Wang, F.; Hu, P.; Gao, Y.; Bai, X.; Peng, S.; Ren, C.; Xu, X.; et al. CGMega: Explainable Graph Neural Network Framework with Attention Mechanisms for Cancer Gene Module Dissection. Nat. Commun. 2024, 15, 5997.
23. Wang, S.H.; Govindaraj, V.V.; Górriz, J.M.; Zhang, X.; Zhang, Y.D. COVID-19 Classification by FGCNet with Deep Feature Fusion from Graph Convolutional Network and Convolutional Neural Network. Inf. Fusion 2021, 67, 208–229.
24. Yuan, W.; He, K.; Guan, D.; Zhou, L.; Li, C. Graph Kernel Based Link Prediction for Signed Social Networks. Inf. Fusion 2019, 46, 1–10.
25. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2017, arXiv:1710.10903.
26. Ren, Y.; Huang, J.; Hong, Z.; Lu, W.; Yin, J.; Zou, L.; Shen, X. Image-Based Concrete Crack Detection in Tunnels Using Deep Fully Convolutional Networks. Constr. Build. Mater. 2020, 234, 117367.

Figure 1. Network architecture of CNN.

Figure 2. An example of convolution, ReLU, and max-pooling.

Figure 3. The illustration of the GAT: (a) the attention mechanism and (b) the output feature of the ith node.

Figure 4. Flowchart of conventional CNN-based methods.

Figure 5. The proposed approach for crack detection using CNN and GAT.

Figure 6. Establishment of databank.

Figure 7. Data multiplication through rotation.

Figure 8. The pre-training process of the CNN model.

Figure 9. A sample of the training data for the CNN-GAT model.

Figure 10. The training process of the CNN-GAT model.

Figure 11. An example test result of the CNN-GAT model: (a) original test image, (b) illustration of cracks, (c) patches classified as “crack” by the CNN-GAT model (shown in white), and (d) crack detection result.

Figure 12. Confusion matrix of the test results by the CNN-GAT model.

Figure 13. An example test result of (a) the CNN-GAT model and (b) the pre-trained CNN model.

Figure 14. Confusion matrix of the test results by the pre-trained CNN model.

Table 1. Sizes and operations of layers.

Layer	Layer Size	Operation	Kernel Size	No. of Kernels	Stride
Input	224 × 224 × 3	Convolution	20 × 20 × 3	24	2
Layer 1	103 × 103 × 24	ReLU	-	-	-
Layer 2	103 × 103 × 24	Max-pooling	7 × 7	-	2
Layer 3	49 × 49 × 24	Convolution	15 × 15 × 24	48	2
Layer 4	18 × 18 × 48	ReLU	-	-	-
Layer 5	18 × 18 × 48	Max-pooling	4 × 4	-	2
Layer 6	8 × 8 × 48	Convolution	8 × 8 × 48	96	1
Layer 7	1 × 1 × 96	ReLU	-	-	-

Table 2. Learning rate.

Epoch	Learning Rate
1–20	0.01
21–40	0.001
41–60	0.0001
61–80	0.00001
81–100	0.000001

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
