Dmg2Former-AR: Vision Transformers with Adaptive Rescaling for High-Resolution Structural Visual Inspection
Abstract
1. Introduction
2. Architecture Design
2.1. Adaptive Rescaling
2.2. Dmg2Former
3. Case Studies
3.1. Datasets Description
3.2. Implementation
4. Results
4.1. Material Segmentation
4.2. Crack Segmentation
4.3. Computational Costs
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- ASCE. The Report Card for America’s Infrastructure; ASCE: Reston, VA, USA, 2021. [Google Scholar]
- U.S. Department of Transportation; Federal Highway Administration; Federal Transit Administration. Chapter 7: Capital Investment Scenarios. In Status of the Nation’s Highways, Bridges, and Transit: Conditions & Performance Report to Congress, 24th ed.; US Department of Transportation: Washington, DC, USA, 2021; pp. 7–12. [Google Scholar]
- ASDSO. The Cost of Rehabilitating Dams in the U.S.: A Methodology and Estimate; ASDSO: Lexington, KY, USA, 2023. [Google Scholar]
- Abdel-Qader, I.; Abudayyeh, O.; Kelly, M.E. Analysis of edge-detection techniques for crack identification in bridges. J. Comput. Civ. Eng. 2003, 17, 255–263. [Google Scholar]
- Jahanshahi, M.R.; Masri, S.F. Adaptive vision-based crack detection using 3D scene reconstruction for condition assessment of structures. Autom. Constr. 2012, 22, 567–576. [Google Scholar]
- Munawar, H.S.; Hammad, A.W.; Haddad, A.; Soares, C.A.P.; Waller, S.T. Image-based crack detection methods: A review. Infrastructures 2021, 6, 115. [Google Scholar] [CrossRef]
- Eltouny, K.; Gomaa, M.; Liang, X. Unsupervised Learning Methods for Data-Driven Vibration-Based Structural Health Monitoring: A Review. Sensors 2023, 23, 3290. [Google Scholar] [CrossRef]
- Lynch, J.P.; Loh, K.J. A summary review of wireless sensors and sensor networks for structural health monitoring. Shock. Vib. Dig. 2006, 38, 91–130. [Google Scholar]
- Abdulkarem, M.; Samsudin, K.; Rokhani, F.Z.; Rasid, M.F.A. Wireless sensor network for structural health monitoring: A contemporary review of technologies, challenges, and future direction. Struct. Health Monit. 2020, 19, 693–735. [Google Scholar]
- Soleimani-Babakamali, M.H.; Sepasdar, R.; Nasrollahzadeh, K.; Lourentzou, I.; Sarlo, R. Toward a general unsupervised novelty detection framework in structural health monitoring. Comput. Aided Civ. Infrastruct. Eng. 2022, 37, 1128–1145. [Google Scholar]
- Eltouny, K.; Liang, X. Uncertainty-aware structural damage warning system using deep variational composite neural networks. Earthq. Eng. Struct. Dyn. 2023, 52, 3345–3368. [Google Scholar] [CrossRef]
- Wang, Z.; Cha, Y.-J. Unsupervised deep learning approach using a deep auto-encoder with a one-class support vector machine to detect damage. Struct. Health Monit. 2021, 20, 406–425. [Google Scholar]
- Liu, G.; Wang, Q.-A.; Jiao, G.; Dang, P.; Nie, G.; Liu, Z.; Sun, J. Review of wireless RFID strain sensing technology in structural health monitoring. Sensors 2023, 23, 6925. [Google Scholar] [CrossRef]
- Caizzone, S.; DiGiampaolo, E. Wireless passive RFID crack width sensor for structural health monitoring. IEEE Sens. J. 2015, 15, 6767–6774. [Google Scholar]
- Strangfeld, C.; Johann, S.; Bartholmai, M. Smart RFID sensors embedded in building structures for early damage detection and long-term monitoring. Sensors 2019, 19, 5514. [Google Scholar] [CrossRef] [PubMed]
- Wang, Q.A.; Zhang, C.; Ma, Z.G.; Jiao, G.Y.; Jiang, X.W.; Ni, Y.Q.; Wang, Y.C.; Du, Y.T.; Qu, G.B.; Huang, J. Towards long-transmission-distance and semi-active wireless strain sensing enabled by dual-interrogation-mode RFID technology. Struct. Control. Health Monit. 2022, 29, e3069. [Google Scholar]
- Kumar, S.S.; Wang, M.; Abraham, D.M.; Jahanshahi, M.R.; Iseley, T.; Cheng, J.C. Deep learning–based automated detection of sewer defects in CCTV videos. J. Comput. Civ. Eng. 2020, 34, 04019047. [Google Scholar]
- Wang, M.; Kumar, S.S.; Cheng, J.C. Automated sewer pipe defect tracking in CCTV videos based on defect detection and metric learning. Autom. Constr. 2021, 121, 103438. [Google Scholar]
- Chikamoto, Y.; Tsutsumi, Y.; Sawano, H.; Ishihara, S. Design and implementation of a video-frame localization system for a drifting camera-based sewer inspection system. Sensors 2023, 23, 793. [Google Scholar] [CrossRef]
- Dorafshan, S.; Thomas, R.J.; Maguire, M. Fatigue crack detection using unmanned aerial systems in fracture critical inspection of steel bridges. J. Bridge Eng. 2018, 23, 04018078. [Google Scholar]
- Liu, Y.F.; Nie, X.; Fan, J.S.; Liu, X.G. Image-based crack assessment of bridge piers using unmanned aerial vehicles and three-dimensional scene reconstruction. Comput. Aided Civ. Infrastruct. Eng. 2020, 35, 511–529. [Google Scholar]
- Chen, D.; Huang, B.; Kang, F. A review of detection technologies for underwater cracks on concrete dam surfaces. Appl. Sci. 2023, 13, 3564. [Google Scholar] [CrossRef]
- Xing, J.; Liu, Y.; Zhang, G. Concrete highway crack detection based on visible light and infrared silicate spectrum image fusion. Sensors 2024, 24, 2759. [Google Scholar] [CrossRef]
- Lou, Y.; Meng, S.; Zhou, Y. Deep learning-based three-dimensional crack damage detection method using point clouds without color information. Struct. Health Monit. 2024, 14759217241236929. [Google Scholar] [CrossRef]
- Huang, Y.-T.; Jahanshahi, M.R.; Shen, F.; Mondal, T.G. Deep learning–based autonomous road condition assessment leveraging inexpensive rgb and depth sensors and heterogeneous data fusion: Pothole detection and quantification. J. Transp. Eng. Part B Pavements 2023, 149, 04023010. [Google Scholar]
- Agyemang, I.O.; Zhang, X.; Mensah, I.A.; Mawuli, B.C.; Agbley, B.L.Y.; Arhin, J.R. Enhanced Deep Convolutional Neural Network for Building Component Detection Towards Structural Health Monitoring. In Proceedings of the 2021 4th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Yibin, China, 20–22 August 2021; pp. 202–206. [Google Scholar]
- Liang, X. Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization. Comput. Aided Civ. Infrastruct. Eng. 2019, 34, 415–430. [Google Scholar]
- Narazaki, Y.; Hoskere, V.; Hoang, T.A.; Fujino, Y.; Sakurai, A.; Spencer, B.F., Jr. Vision-based automated bridge component recognition with high-level scene consistency. Comput. Aided Civ. Infrastruct. Eng. 2020, 35, 465–482. [Google Scholar]
- Sajedi, S.O.; Liang, X. Uncertainty-assisted deep vision structural health monitoring. Comput. Aided Civ. Infrastruct. Eng. 2021, 36, 126–142. [Google Scholar]
- Teng, S.; Liu, Z.; Chen, G.; Cheng, L. Concrete crack detection based on well-known feature extractor model and the YOLO_v2 network. Appl. Sci. 2021, 11, 813. [Google Scholar] [CrossRef]
- Dong, X.; Liu, Y.; Dai, J. Concrete Surface Crack Detection Algorithm Based on Improved YOLOv8. Sensors 2024, 24, 5252. [Google Scholar] [CrossRef]
- Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Autom. Constr. 2019, 104, 129–139. [Google Scholar]
- Zheng, Y.; Gao, Y.; Lu, S.; Mosalam, K.M. Multistage semisupervised active learning framework for crack identification, segmentation, and measurement of bridges. Comput. Aided Civ. Infrastruct. Eng. 2022, 37, 1089–1108. [Google Scholar]
- Tang, W.; Wu, R.-T.; Jahanshahi, M.R. Crack segmentation in high-resolution images using cascaded deep convolutional neural networks and Bayesian data fusion. Smart Struct. Syst. 2022, 29, 221–235. [Google Scholar]
- Zhang, K.; Cheng, H.; Zhang, B. Unified approach to pavement crack and sealed crack detection using preclassification based on transfer learning. J. Comput. Civ. Eng. 2018, 32, 04018001. [Google Scholar]
- Zhang, A.; Wang, K.C.; Li, B.; Yang, E.; Dai, X.; Peng, Y.; Fei, Y.; Liu, Y.; Li, J.Q.; Chen, C. Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Comput. Aided Civ. Infrastruct. Eng. 2017, 32, 805–819. [Google Scholar]
- Xu, Y.; Fan, Y.; Li, H. Lightweight semantic segmentation of complex structural damage recognition for actual bridges. Struct. Health Monit. 2023, 22, 3250–3269. [Google Scholar]
- Choi, Y.; Park, H.W.; Mi, Y.; Song, S. Crack detection and analysis of concrete structures based on neural network and clustering. Sensors 2024, 24, 1725. [Google Scholar] [CrossRef]
- Sohaib, M.; Jamil, S.; Kim, J.-M. An ensemble approach for robust automated crack detection and segmentation in concrete structures. Sensors 2024, 24, 257. [Google Scholar] [CrossRef]
- Yang, L.; Liu, K.; Ou, R.; Qian, P.; Wu, Y.; Tian, Z.; Zhu, C.; Feng, S.; Yang, F. Surface Defect-Extended BIM Generation Leveraging UAV Images and Deep Learning. Sensors 2024, 24, 4151. [Google Scholar] [CrossRef] [PubMed]
- Hang, J.; Wu, Y.; Li, Y.; Lai, T.; Zhang, J.; Li, Y. A deep learning semantic segmentation network with attention mechanism for concrete crack detection. Struct. Health Monit. 2023, 22, 3006–3026. [Google Scholar]
- Yu, J.; Xu, Y.; Xing, C.; Zhou, J.; Pan, P. Pixel-Level Crack Detection and Quantification of Nuclear Containment with Deep Learning. Struct. Control Health Monit. 2023, 2023, 9982080. [Google Scholar]
- Wu, Y.; Qin, Y.; Qian, Y.; Guo, F.; Wang, Z.; Jia, L. Hybrid deep learning architecture for rail surface segmentation and surface defect detection. Comput. Aided Civ. Infrastruct. Eng. 2022, 37, 227–244. [Google Scholar]
- Guo, J.; Wang, Q.; Li, Y. Semi-supervised learning based on convolutional neural network and uncertainty filter for façade defects classification. Comput. Aided Civ. Infrastruct. Eng. 2021, 36, 302–317. [Google Scholar]
- Hoskere, V.; Narazaki, Y.; Hoang, T.; Spencer, B., Jr. Vision-based structural inspection using multiscale deep convolutional neural networks. In Proceedings of the 3rd Huixian International Forum on Earthquake Engineering for Young Researchers, Urbana-Champaign, IL, USA, 10–11 August 2017. [Google Scholar]
- Hoskere, V.; Narazaki, Y.; Hoang, T.A.; Spencer, B., Jr. MaDnet: Multi-task semantic segmentation of multiple types of structural materials and damage in images of civil infrastructure. J. Civ. Struct. Health Monit. 2020, 10, 757–773. [Google Scholar]
- Zhou, Z.; Zhang, J.; Gong, C. Automatic detection method of tunnel lining multi-defects via an enhanced You Only Look Once network. Comput. Aided Civ. Infrastruct. Eng. 2022, 37, 762–780. [Google Scholar] [CrossRef]
- Bae, H.; Jang, K.; An, Y.-K. Deep super resolution crack network (SrcNet) for improving computer vision–based automated crack detectability in in situ bridges. Struct. Health Monit. 2021, 20, 1428–1442. [Google Scholar]
- Xiang, C.; Wang, W.; Deng, L.; Shi, P.; Kong, X. Crack detection algorithm for concrete structures based on super-resolution reconstruction and segmentation network. Autom. Constr. 2022, 140, 104346. [Google Scholar]
- Kim, J.; Shim, S.; Kang, S.-J.; Cho, G.-C. Learning Structure for Concrete Crack Detection Using Robust Super-Resolution with Generative Adversarial Network. Struct. Control Health Monit. 2023, 2023, 8850290. [Google Scholar]
- Sajedi, S.; Eltouny, K.; Liang, X. Twin models for high-resolution visual inspections. Smart Struct. Syst. 2023, 31, 351–363. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv 2021, arXiv:2103.14030. [Google Scholar]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
- Du, Z.; Liu, J.; Tang, J.; Wu, G. Anchor-based plain net for mobile image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2494–2502. [Google Scholar]
- Lai, W.-S.; Huang, J.-B.; Ahuja, N.; Yang, M.-H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar]
- Denton, E.L.; Chintala, S.; Fergus, R. Deep generative image models using a laplacian pyramid of adversarial networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1486–1494. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456. [Google Scholar]
- Guo, J.; Zou, X.; Chen, Y.; Liu, Y.; Hao, J.; Liu, J.; Yan, Y. Asconvsr: Fast and lightweight super-resolution network with assembled convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, USA, 17–24 June 2023; pp. 1582–1592. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the ICLR 2021, Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. arXiv 2022, arXiv:2201.03545. [Google Scholar]
- PapersWithCode. Semantic Segmentation. Available online: https://paperswithcode.com/task/semantic-segmentation (accessed on 30 April 2022).
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–11. [Google Scholar]
- Iakubovskii, P. Segmentation Models Pytorch. Available online: https://github.com/qubvel/segmentation_models.pytorch (accessed on 29 July 2024).
- Bianchi, E.; Hebdon, M. Development of Extendable Open-Source Structural Inspection Datasets. J. Comput. Civ. Eng. 2022, 36, 04022039. [Google Scholar]
- Bianchi, E.; Hebdon, M. Concrete Crack Conglomerate Dataset; University Libraries Virginia Tech: Blacksburg, VA, USA, 2021. [Google Scholar] [CrossRef]
- Prasanna, P.; Dana, K.J.; Gucunski, N.; Basily, B.B.; La, H.M.; Lim, R.S.; Parvardeh, H. Automated Crack Detection on Concrete Bridges. IEEE Trans. Autom. Sci. Eng. 2016, 13, 591–599. [Google Scholar] [CrossRef]
- Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1525–1535. [Google Scholar] [CrossRef]
- Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. CrackTree: Automatic crack detection from pavement images. Pattern Recognit. Lett. 2012, 33, 227–238. [Google Scholar] [CrossRef]
- Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing 2019, 338, 139–153. [Google Scholar] [CrossRef]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–27 October 2017; pp. 2980–2988. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and flexible image augmentations. Information 2020, 11, 125. [Google Scholar] [CrossRef]
- Yep, T. Torchinfo. Available online: https://github.com/TylerYep/torchinfo (accessed on 28 July 2024).
- Zhang, Y.; d’Avigneau, A.M.; Hadjidemetriou, G.M.; de Silva, L.; Girolami, M.; Brilakis, I. Bayesian dynamic modelling for probabilistic prediction of pavement condition. Eng. Appl. Artif. Intell. 2024, 133, 108637. [Google Scholar]
Layer # | Desubpixel Conv. Module (Downsampling) Operator | Downsampling Kernel | Downsampling # Channels | Subpixel Conv. Module (Upsampling) Operator | Upsampling Kernel | Upsampling # Channels |
---|---|---|---|---|---|---|
1 | PixelUnshuffle | 4 × 4 | 48 | Conv. | 5 × 5 | 64 |
2 | Conv. | 3 × 3 | 32 | Conv. | 3 × 3 | 32 |
3 | Conv. | 3 × 3 | 64 | Conv. | 3 × 3 | n × 4² |
4 | Conv. | 3 × 3 | 3 | PixelShuffle | 4 × 4 | n |
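
The channel counts in the table above follow from how pixel (un)shuffling rearranges data: with a factor of 4, PixelUnshuffle folds every 4 × 4 spatial block into 16 channels (48 channels would be consistent with a 3-channel input), and PixelShuffle consumes 16 channels per output channel. The sketch below illustrates only that rearrangement on nested Python lists. The function names are hypothetical, the convolution layers are omitted, and the channel ordering is assumed to follow the convention documented for PyTorch's `PixelShuffle` and `PixelUnshuffle`; it is not the authors' implementation.

```python
# Pure-Python sketch of the space-to-depth / depth-to-space rearrangement.
# Function names are hypothetical; tensors are nested lists shaped [C][H][W].

def pixel_unshuffle(x, r):
    """Fold each r x r spatial block into channels: [C][H][W] -> [C*r*r][H/r][W/r]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    assert h % r == 0 and w % r == 0, "spatial size must be divisible by r"
    out = []
    for ch in range(c):
        for i in range(r):          # row offset inside the block
            for j in range(r):      # column offset inside the block
                out.append([[x[ch][y * r + i][z * r + j] for z in range(w // r)]
                            for y in range(h // r)])
    return out


def pixel_shuffle(x, r):
    """Inverse rearrangement: [C*r*r][H][W] -> [C][H*r][W*r]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c = c_in // (r * r)
    out = [[[None] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for ch in range(c):
        for i in range(r):
            for j in range(r):
                src = x[ch * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[ch][y * r + i][z * r + j] = src[y][z]
    return out


# Toy 3-channel 8 x 8 "image" with unique values per position.
image = [[[ch * 100 + y * 8 + z for z in range(8)] for y in range(8)]
         for ch in range(3)]

packed = pixel_unshuffle(image, 4)
packed_shape = (len(packed), len(packed[0]), len(packed[0][0]))
restored = pixel_shuffle(packed, 4)

print(packed_shape)          # channels grow by 4*4, height and width shrink by 4
print(restored == image)     # the two rearrangements are exact inverses
```

Because the two operations are exact inverses and lose no pixels, any information loss or gain in a rescaling module built this way would come from the learned convolutions around them, not from the rearrangement itself.
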
Dataset | Image Count |
---|---|
CrackForest Dataset [69] | 118 |
Crack500 [70] | 3363 |
Cracktree200 [71] | 206 |
DeepCrack [72] | 521 |
EugenMiller [70] | 55 |
GAPs [70] | 509 |
Rissbilder [70] | 1411 |
Non-crack [70] | 3822 |
Volker [70] | 990 |
Total image count | 10,995 |
Model | Dmg2Former Size | Image Size | F1-Score (%) | IoU (%) | Recall (%) | Precision (%) |
---|---|---|---|---|---|---|
Dmg2Former | 112 | 112 | 86.52 | 76.76 | 85.30 | 87.93 |
Dmg2Former-NN 2× | 112 | 224 | 84.64 | 73.89 | 83.24 | 86.32 |
Dmg2Former-AR 2× | 112 | 224 | 87.99 | 79.01 | 87.13 | 88.95 |
Dmg2Former | 224 | 224 | 88.21 | 79.36 | 87.16 | 89.41 |
Dmg2Former-NN 4× | 112 | 448 | 84.47 | 73.65 | 83.03 | 86.24 |
Dmg2Former-AR 4× | 112 | 448 | 88.41 | 79.68 | 87.38 | 89.59 |
Dmg2Former-NN 2× | 224 | 448 | 87.25 | 77.84 | 86.14 | 88.54 |
Dmg2Former-AR 2× | 224 | 448 | 89.29 | 81.09 | 88.47 | 90.26 |
Dmg2Former-NN 8× | 112 | 896 | 84.38 | 73.49 | 82.85 | 86.24 |
Dmg2Former-AR 8× | 112 | 896 | 88.76 | 80.19 | 88.05 | 89.53 |
Dmg2Former-NN 4× | 224 | 896 | 87.27 | 77.87 | 86.14 | 88.58 |
Dmg2Former-AR 4× | 224 | 896 | 89.15 | 80.85 | 88.36 | 90.04 |
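
For readers less familiar with the four metrics reported in these tables, the following minimal sketch shows their standard pixel-level definitions from true-positive, false-positive, and false-negative counts. The function name and the example counts are hypothetical, and how the reported values are averaged (per class, per image, or globally) is not stated in this section, so the tabulated numbers need not satisfy the single-class identities exactly.

```python
# Standard definitions of the segmentation metrics, for a single class.
# The function name and the counts used below are illustrative only.

def segmentation_metrics(tp, fp, fn):
    """Return precision, recall, F1 and IoU from pixel counts (assumes tp > 0)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Hypothetical counts: 80 correctly labelled pixels, 10 false alarms, 20 misses.
scores = segmentation_metrics(tp=80, fp=10, fn=20)
for name, value in scores.items():
    print(f"{name}: {100 * value:.2f}%")

# For a single class, IoU and F1 are monotonically related: IoU = F1 / (2 - F1).
iou_from_f1 = scores["f1"] / (2 - scores["f1"])
```

Since IoU is always less than or equal to F1 for the same prediction, the gap between the two columns in the tables is expected and does not indicate an inconsistency.
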
Model | Dmg2Former Size | Image Size | F1-Score (%) | IoU (%) | Recall (%) | Precision (%) |
---|---|---|---|---|---|---|
Dmg2Former | 224 | 224 | 92.18 | 85.80 | 92.15 | 92.26 |
Dmg2Former-NN 2× | 224 | 448 | 91.32 | 84.35 | 90.77 | 91.96 |
Dmg2Former-NN 4× | 224 | 896 | 91.25 | 84.23 | 90.65 | 91.94 |
Dmg2Former-AR 2× | 224 | 448 | 92.78 | 86.81 | 92.51 | 93.11 |
Dmg2Former-AR 4× | 224 | 896 | 92.55 | 86.48 | 92.09 | 93.19 |
Model | Image Size | Background | Concrete | Steel | Metal Decking |
---|---|---|---|---|---|
Dmg2Former | 224 | 73.92 | 86.31 | 94.40 | 88.59 |
Dmg2Former-NN 2× | 448 | 72.23 | 85.22 | 93.28 | 86.69 |
Dmg2Former-NN 4× | 896 | 72.07 | 85.25 | 93.27 | 86.32 |
Dmg2Former-AR 2× | 448 | 75.18 | 87.06 | 94.57 | 90.44 |
Dmg2Former-AR 4× | 896 | 73.84 | 86.62 | 94.78 | 90.69 |
Model | Dmg2Former Size | Image Size | F1-Score (%) | IoU (%) | Recall (%) | Precision (%) |
---|---|---|---|---|---|---|
Dmg2Former | 112 | 112 | 70.49 | 54.43 | 68.86 | 72.19 |
Dmg2Former-NN 2× | 112 | 224 | 69.29 | 53.01 | 68.41 | 70.18 |
Dmg2Former-AR 2× | 112 | 224 | 72.60 | 56.98 | 74.39 | 70.89 |
Dmg2Former | 224 | 224 | 72.09 | 56.35 | 70.95 | 73.25 |
Dmg2Former-NN 4× | 112 | 448 | 68.20 | 51.75 | 67.18 | 69.26 |
Dmg2Former-AR 4× | 112 | 448 | 73.62 | 58.26 | 74.08 | 73.17 |
Dmg2Former-NN 2× | 224 | 448 | 71.45 | 55.58 | 69.66 | 73.34 |
Dmg2Former-AR 2× | 224 | 448 | 74.04 | 58.79 | 74.67 | 73.42 |
Dmg2Former | 448 | 448 | 73.32 | 57.88 | 74.04 | 72.61 |
Model | Image Size | F1-Score (%) | IoU (%) | Recall (%) | Precision (%) |
---|---|---|---|---|---|
Dmg2Former | 224 | 74.54 | 59.41 | 75.70 | 73.40 |
Dmg2Former-NN 2× | 448 | 74.41 | 59.25 | 75.41 | 73.44 |
Dmg2Former-AR 2× | 448 | 76.07 | 61.39 | 78.29 | 73.98 |
Model | Dmg2Former Size | Image Size | Trainable Params | MAC (G) | Forward Pass Size (MB) | Inference Speed (Frames-per-Second) |
---|---|---|---|---|---|---|
Dmg2Former | 112 | 112 | 108,987,980 | 7.24 | 309.9 | 49.1 |
Dmg2Former-AR 2× | 112 | 224 | 109,044,719 | 7.95 | 350.7 | 46.9 |
Dmg2Former | 224 | 224 | 108,992,588 | 7.25 | 309.9 | 48.5 |
Dmg2Former-AR 4× | 112 | 448 | 109,101,442 | 10.77 | 514.1 | 45.1 |
Dmg2Former-AR 2× | 224 | 448 | 109,049,327 | 10.08 | 473.3 | 46.7 |
Dmg2Former | 448 | 448 | 108,992,588 | 28.84 | 1239.6 | 33.6 |
Dmg2Former-AR 8× | 112 | 896 | 109,158,165 | 22.08 | 1167.6 | 43.3 |
Dmg2Former-AR 4× | 224 | 896 | 109,106,050 | 21.39 | 1126.8 | 45.5 |
Dmg2Former | 896 | 896 | 108,992,588 | 115.17 | 4958.2 | 9.6 |
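
To make the cost comparison in the table above concrete, the short calculation below contrasts the two rows that process 896 × 896 images: the plain Dmg2Former at size 896 and Dmg2Former-AR 4× with a size-224 core. The numbers are copied from the table; the variable names are illustrative, and the ratios are simple quotients, not additional measurements.

```python
# Values copied from the computational-cost table (both rows use 896 x 896 images).
baseline = {"mac_g": 115.17, "forward_mb": 4958.2, "fps": 9.6}   # Dmg2Former, size 896
adaptive = {"mac_g": 21.39, "forward_mb": 1126.8, "fps": 45.5}   # Dmg2Former-AR 4x, size 224

mac_reduction = baseline["mac_g"] / adaptive["mac_g"]
memory_reduction = baseline["forward_mb"] / adaptive["forward_mb"]
speedup = adaptive["fps"] / baseline["fps"]

print(f"MAC reduction:           {mac_reduction:.2f}x")
print(f"Forward-pass size ratio: {memory_reduction:.2f}x")
print(f"Inference speed-up:      {speedup:.2f}x")
```

On these two rows, adaptive rescaling corresponds to roughly 5.4 times fewer multiply-accumulate operations, a forward pass about 4.4 times smaller, and about 4.7 times more frames per second, while the trainable parameter count differs by well under 1%.
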
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Eltouny, K.; Sajedi, S.; Liang, X. Dmg2Former-AR: Vision Transformers with Adaptive Rescaling for High-Resolution Structural Visual Inspection. Sensors 2024, 24, 6007. https://doi.org/10.3390/s24186007