Innovative Framework for Historical Architectural Recognition in China: Integrating Swin Transformer and Global Channel–Spatial Attention Mechanism
Abstract
1. Introduction
1.1. The Demand for Digital Information in Urban and Architectural Planning
1.2. Application of Deep Learning Models
1.3. Challenges of Applying Deep Learning Models to Historical Buildings
1.4. Research Outcomes
- Through internet data collection and field research, a substantial body of primary data on historical buildings was obtained. Each image was carefully labeled, yielding a high-quality, custom-built dataset covering historical buildings from a range of regions, styles, and eras in China, with annotations detailed enough to meet the training requirements of deep learning models.
- Building on the Swin Transformer, a new deep learning model was developed by integrating the GCSA (global channel–spatial attention) mechanism to improve architectural image recognition. During training, regularization and data augmentation techniques were applied to prevent overfitting, ensuring the model's robustness and generalizability (see the attention and training-configuration sketches after this list).
- On the custom historical building dataset, the developed model achieved high classification accuracy, demonstrating its practicality and effectiveness in real-world applications. The Swin Transformer–GCSA model performs strongly on both whole-image classification and fine-grained detail recognition, reflecting its capacity for feature extraction.
- To improve the interpretability of the deep learning model, feature map analysis and Grad-CAM were employed to examine the model's feature extraction process. These techniques reveal which feature regions the model attends to during architectural recognition, making its decision-making logic more transparent (a minimal Grad-CAM sketch follows this list).
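The exact GCSA design is not reproduced in this excerpt, so the following is a minimal, hypothetical sketch of a global channel–spatial attention block in the CBAM style, written in PyTorch: the feature map is re-weighted first along channels (using globally pooled statistics) and then spatially. The class name `GCSA` and all hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GCSA(nn.Module):
    """Hypothetical global channel-spatial attention block (CBAM-style).

    Re-weights a feature map along channels first, then spatially; the
    paper's actual GCSA design may differ from this sketch.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP for channel attention over pooled descriptors.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 conv over stacked channel-mean/max maps for spatial attention.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        # Channel attention from global average- and max-pooled statistics.
        w = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))
                          + self.channel_mlp(x.amax(dim=(2, 3))))
        x = x * w.view(b, c, 1, 1)
        # Spatial attention from channel-wise mean/max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        return x * torch.sigmoid(self.spatial_conv(s))

# Usage sketch: refine a Swin stage's feature map (here already in NCHW).
feat = torch.randn(2, 768, 7, 7)
out = GCSA(768)(feat)  # same shape, attention-re-weighted
```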
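Likewise, the specific augmentation and regularization settings are not listed in this excerpt; the snippet below is an illustrative training configuration using torchvision transforms (with ImageNet normalization, as is standard for pretrained backbones), label smoothing, and weight decay. The Swin variant, learning rate, and augmentation parameters are assumptions.

```python
import timm
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms

# Five classes, matching the per-class labels in the results tables
# (MB, NTRR, TWR, TER, TSR); the backbone variant is an assumption.
model = timm.create_model("swin_tiny_patch4_window7_224",
                          pretrained=True, num_classes=5)

# Illustrative augmentation pipeline for training images.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    # ImageNet statistics; the ablation table below shows that omitting
    # normalization collapses overall accuracy to roughly 0.19.
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Regularization on the loss/optimizer side: label smoothing + weight decay.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
```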
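Grad-CAM weights a layer's activation maps by the spatially pooled gradients of the target class score. Below is a minimal hook-based sketch assuming a convolutional-style `(B, C, H, W)` feature map; for Swin layers, token sequences would first need to be reshaped to a spatial grid.

```python
import torch

def grad_cam(model, layer, image, class_idx=None):
    """Minimal Grad-CAM sketch: weight a layer's activation maps by the
    spatially pooled gradients of the target class score.

    Assumes `layer` outputs a (B, C, H, W) feature map; Swin stages emit
    token sequences that must first be reshaped to a spatial grid.
    """
    acts, grads = {}, {}
    fwd = layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    bwd = layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(v=go[0]))
    try:
        logits = model(image.unsqueeze(0))            # (1, num_classes)
        idx = class_idx if class_idx is not None else int(logits.argmax(1))
        model.zero_grad()
        logits[0, idx].backward()
        a, g = acts["v"], grads["v"]                  # (1, C, H, W)
        weights = g.mean(dim=(2, 3), keepdim=True)    # pooled gradients
        cam = torch.relu((weights * a).sum(dim=1))    # (1, H, W)
        return (cam / (cam.max() + 1e-8)).squeeze(0)  # normalized heatmap
    finally:
        fwd.remove()
        bwd.remove()
```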
2. Methods
2.1. Overall Workflow
2.2. Materials
2.2.1. Dataset Acquisition and Processing
2.2.2. Data Preprocessing
2.2.3. Test Sets
2.2.4. Model Selection
2.2.5. Training Parameters
2.2.6. Evaluation Metrics
2.2.7. Interpretability Analysis Methods
3. Experiment and Results
3.1. Performance of Different Deep Learning Models
3.2. Impact of Different Training Strategies and Parameters on Model Performance
3.3. Interpretability Study
4. Discussion
4.1. Model Performance and Innovations
4.2. Interpretability Analysis
4.3. Limitations and Future Prospects
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| Model | Class | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|---|
| Swin–GCSA | MB | 0.9912 | 0.9912 | 0.9911 | |
| | NTRR | 0.9725 | 0.9538 | 0.9920 | |
| | TWR | 0.9886 | 0.9924 | 0.9848 | |
| | TER | 0.9649 | 0.9617 | 0.9679 | |
| | TSR | 0.9789 | 0.9938 | 0.9642 | |
| | Overall | 0.9784 | 0.9787 | 0.9784 | 0.9784 |
| AlexNet | MB | 0.9694 | 0.9568 | 0.9823 | |
| | NTRR | 0.9565 | 0.9453 | 0.9680 | |
| | TWR | 0.9772 | 0.9772 | 0.9772 | |
| | TER | 0.9358 | 0.9358 | 0.9358 | |
| | TSR | 0.9393 | 0.9567 | 0.9226 | |
| | Overall | 0.9538 | 0.9539 | 0.9539 | 0.9539 |
| VGG16 | MB | 0.9527 | 0.9250 | 0.9823 | |
| | NTRR | 0.9516 | 0.9593 | 0.9440 | |
| | TWR | 0.9808 | 0.9922 | 0.9696 | |
| | TER | 0.9299 | 0.9240 | 0.9358 | |
| | TSR | 0.9397 | 0.9512 | 0.9285 | |
| | Overall | 0.9496 | 0.9501 | 0.9496 | 0.9496 |
| ResNet50 | MB | 0.9866 | 0.9910 | 0.9823 | |
| | NTRR | 0.9647 | 0.9461 | 0.9840 | |
| | TWR | 0.9809 | 0.9847 | 0.9772 | |
| | TER | 0.9254 | 0.8975 | 0.9551 | |
| | TSR | 0.9349 | 0.9741 | 0.8988 | |
| | Overall | 0.9554 | 0.9567 | 0.9553 | 0.9553 |
| ResNet101 | MB | 0.9613 | 0.9333 | 0.9911 | |
| | NTRR | 0.9558 | 0.9596 | 0.9520 | |
| | TWR | 0.9885 | 0.9923 | 0.9848 | |
| | TER | 0.9548 | 0.9610 | 0.9487 | |
| | TSR | 0.9429 | 0.9515 | 0.9345 | |
| | Overall | 0.9596 | 0.9599 | 0.9597 | 0.9597 |
| DenseNet121 | MB | 0.9826 | 0.9658 | 1.0000 | |
| | NTRR | 0.9760 | 0.9760 | 0.9760 | |
| | TWR | 0.9886 | 0.9849 | 0.9849 | |
| | TER | 0.9456 | 0.9426 | 0.9426 | |
| | TSR | 0.9575 | 0.9753 | 0.9753 | |
| | Overall | 0.9682 | 0.9684 | 0.9683 | 0.9683 |

| Model | Class | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|---|
| Swin–GCSA | MB | 0.9956 | 1.0000 | 0.9913 | |
| | NTRR | 0.9197 | 0.9692 | 0.8750 | |
| | TWR | 0.9965 | 1.0000 | 0.9931 | |
| | TER | 0.9495 | 0.9495 | 0.9495 | |
| | TSR | 0.9498 | 0.9179 | 0.9840 | |
| | Overall | 0.9670 | 0.9680 | 0.9671 | 0.9671 |
| AlexNet | MB | 0.9783 | 0.9826 | 0.9741 | |
| | NTRR | 0.9037 | 0.9682 | 0.8472 | |
| | TWR | 0.9727 | 0.9662 | 0.9794 | |
| | TER | 0.9576 | 0.9658 | 0.9495 | |
| | TSR | 0.9461 | 0.9111 | 0.9840 | |
| | Overall | 0.9564 | 0.9578 | 0.9567 | 0.9567 |
| VGG16 | MB | 0.9658 | 0.9576 | 0.9741 | |
| | NTRR | 0.9007 | 1.0000 | 0.8194 | |
| | TWR | 0.9861 | 1.0000 | 0.9726 | |
| | TER | 0.9430 | 0.9133 | 0.9747 | |
| | TSR | 0.9649 | 0.9393 | 0.9920 | |
| | Overall | 0.9580 | 0.9606 | 0.9585 | 0.9585 |
| ResNet50 | MB | 0.9868 | 1.0000 | 0.9741 | |
| | NTRR | 0.9022 | 0.9836 | 0.8333 | |
| | TWR | 0.9794 | 0.9794 | 0.9794 | |
| | TER | 0.9328 | 0.8805 | 0.9915 | |
| | TSR | 0.9236 | 0.9274 | 0.9200 | |
| | Overall | 0.9497 | 0.9525 | 0.9498 | 0.9498 |
| ResNet101 | MB | 0.9779 | 1.0000 | 0.9568 | |
| | NTRR | 0.8032 | 0.9800 | 0.6805 | |
| | TWR | 0.9931 | 0.9931 | 0.9931 | |
| | TER | 0.9792 | 0.9672 | 0.9915 | |
| | TSR | 0.8978 | 0.8255 | 0.9840 | |
| | Overall | 0.9430 | 0.9513 | 0.9446 | 0.9446 |
| DenseNet121 | MB | 0.9826 | 0.9912 | 0.9741 | |
| | NTRR | 0.7627 | 0.9782 | 0.6250 | |
| | TWR | 0.9896 | 0.9931 | 0.9863 | |
| | TER | 0.9176 | 0.8602 | 0.9831 | |
| | TSR | 0.9236 | 0.8832 | 0.9680 | |
| | Overall | 0.9309 | 0.9398 | 0.9343 | 0.9343 |

| Model | Class | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|---|
| Standard Model | MB | 0.9912 | 0.9912 | 0.9911 | |
| | NTRR | 0.9725 | 0.9538 | 0.9920 | |
| | TWR | 0.9886 | 0.9924 | 0.9848 | |
| | TER | 0.9649 | 0.9617 | 0.9679 | |
| | TSR | 0.9789 | 0.9938 | 0.9642 | |
| | Overall | 0.9784 | 0.9787 | 0.9784 | 0.9784 |
| Model with Modified Learning Rate | MB | 0.9955 | 1.0000 | 0.9911 | |
| | NTRR | 0.9802 | 0.9687 | 0.9920 | |
| | TWR | 0.9848 | 0.9848 | 0.9848 | |
| | TER | 0.9480 | 0.9605 | 0.9358 | |
| | TSR | 0.9644 | 0.9588 | 0.9702 | |
| | Overall | 0.9726 | 0.9726 | 0.9726 | 0.9726 |
| Model without Pretraining | MB | 0.9531 | 0.9180 | 0.9911 | |
| | NTRR | 0.9274 | 0.9349 | 0.9200 | |
| | TWR | 0.9552 | 0.9411 | 0.9696 | |
| | TER | 0.9034 | 0.8787 | 0.9294 | |
| | TSR | 0.8860 | 0.9459 | 0.8333 | |
| | Overall | 0.9215 | 0.9234 | 0.9222 | 0.9222 |
| Model without Data Augmentation | MB | 0.8669 | 0.9777 | 0.7787 | |
| | NTRR | 0.8931 | 0.8540 | 0.9360 | |
| | TWR | 0.8790 | 0.9396 | 0.8257 | |
| | TER | 0.8220 | 0.8300 | 0.8141 | |
| | TSR | 0.8415 | 0.7777 | 0.9166 | |
| | Overall | 0.8577 | 0.8666 | 0.8573 | 0.8573 |
| Model without Normalization | MB | 0.2890 | 0.1689 | 1.0000 | |
| | NTRR | 0.1857 | 0.8666 | 0.1040 | |
| | TWR | 0.0000 | 0.0000 | 0.0000 | |
| | TER | 0.0500 | 1.0000 | 0.0256 | |
| | TSR | 0.0344 | 0.5000 | 0.0178 | |
| | Overall | 0.1001 | 0.5294 | 0.1916 | 0.1916 |
| Model without Normalization and Pretraining | MB | 0.3237 | 0.1931 | 1.0000 | |
| | NTRR | 0.0465 | 0.7500 | 0.0240 | |
| | TWR | 0.2935 | 0.3720 | 0.2424 | |
| | TER | 0.1538 | 1.0000 | 0.0833 | |
| | TSR | 0.0459 | 0.6666 | 0.0238 | |
| | Overall | 0.1626 | 0.6235 | 0.2378 | 0.2378 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).