Research on Image Classification and Retrieval Using Deep Learning with Attention Mechanism on Diaspora Chinese Architectural Heritage in Jiangmen, China
Abstract
1. Introduction
2. Methodology
2.1. Study Area and Datasets
2.2. Classification and Retrieval
2.2.1. Classification Module Task (Phase I)
2.2.2. Retrieval Module Tasks (Phase II)
3. Experiment and Discussion
3.1. Data Preprocessing
3.2. Image Classification Experiment
3.3. Contrast Experiment of Mainstream Image Retrieval Methods
3.4. Attention Mechanism Ablation Experiment
3.5. Top 10 Retrieved Images
3.6. Experiments on Public Datasets
4. Conclusions
- (1) We built the JMI architectural heritage image database, which contains images of frescos, decorative patterns, chandelier base patterns, and architectural styles. The data were expanded by adding Gaussian noise and salt-and-pepper noise and by applying histogram equalization and contrast-limited adaptive histogram equalization, which brought the JMI database to 25,365 images. The database contains a large number of frescos and patterns that carry the rich cultural connotations of diaspora Chinese, and it can provide rich and reliable data support for follow-up studies of the architectural trends of the period and the integration of Chinese and Western aesthetics.
- (2) The parameters of a source network trained on the Paris500K dataset were transferred to the JMI architectural heritage image dataset through transfer learning. We ran transfer-training experiments on the JMI image dataset with three well-established convolutional neural network models: ResNet50, GoogLeNet, and VGG16. The results show that ResNet50 with transferred weights not only converges fastest but also reaches the highest accuracy, 98.3%, which makes it the best of the three networks for classifying JMI building images.
- (3) To address the small differences in image features among buildings of the same type, we propose CNNAR, a two-stage image retrieval framework based on deep learning and the attention mechanism. CNNAR was used for image retrieval on the JMI diaspora Chinese architectural heritage dataset and compared experimentally with several mainstream methods. The results show that CNNAR gives the best retrieval performance on this dataset, with a mean average precision (mAP) of 76.6% and a recall at 10 (R@10) of 19.7%. The architectural images it retrieves are highly similar to the query image. Given the disrepair and damage affecting the world cultural heritage of diaspora Chinese architecture in Jiangmen City, our architectural image retrieval research can help inform accurate repair plans for diaspora Chinese architectural heritage.
- (4) Image retrieval experiments on the public Paris500K and Corel5K datasets show that the CNNAR model generalizes well and can be applied effectively to datasets on other topics. In subsequent research, we can enhance and improve the building image data in the JMI dataset to further improve the generalization ability of the model.
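
The data expansion step named in conclusion (1) can be illustrated with a minimal sketch. It assumes 8-bit grayscale images stored as nested lists, and the function names, noise level `sigma`, and corruption rate `amount` are hypothetical choices, not the authors' settings. The contrast-limited adaptive variant is left out for brevity; libraries such as OpenCV provide one.

```python
import random


def add_gaussian_noise(img, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise to every pixel and clip to [0, 255]."""
    rng = rng or random.Random()
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]


def add_salt_and_pepper(img, amount=0.05, rng=None):
    """Set roughly `amount` of the pixels to black (0) or white (255)."""
    rng = rng or random.Random()
    out = []
    for row in img:
        new_row = []
        for p in row:
            r = rng.random()
            if r < amount / 2:
                new_row.append(0)      # pepper
            elif r < amount:
                new_row.append(255)    # salt
            else:
                new_row.append(p)      # pixel left unchanged
        out.append(new_row)
    return out


def equalize_histogram(img):
    """Global histogram equalization of an 8-bit grayscale image."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:               # constant image: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * 255)) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def augment(img, rng=None):
    """Return the original image plus three augmented copies."""
    rng = rng or random.Random()
    return [img,
            add_gaussian_noise(img, rng=rng),
            add_salt_and_pepper(img, rng=rng),
            equalize_histogram(img)]
```

Each original image yields several variants this way, which is how a modest set of photographs can grow into a database of tens of thousands of training samples.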
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Caciora, T.; Herman, G.V.; Ilies, A.; Baias, S.; Ilies, D.C.; Josan, I.; Hodor, N. The use of virtual reality to promote sustainable tourism: A case study of wooden churches historical monuments from Romania. Remote Sens. 2021, 13, 1758.
- Li, J.Y.; Huang, X.; Tu, L.L.; Zhang, T.; Wang, L.G. A review of building detecting from very high resolution optical remote sensing images. GIScience Remote Sens. 2022, 59, 1199–1225.
- Cai, Y.M.; Ding, Y.L.; Zhang, H.W.; Xiu, J.H.; Liu, Z.M. Geo-Location algorithm for building targets in oblique remote sensing images based on deep learning and height estimation. Remote Sens. 2020, 12, 2427.
- Munawar, H.S.; Aggarwal, R.; Qadir, Z.; Khan, S.I.; Kouzani, A.Z.; Mahmud, M.A.P. A Gabor filter-based protocol for automated image-based building detection. Buildings 2021, 11, 302.
- Cao, D.G.; Xing, H.F.; Wong, M.S.; Kwan, M.P.; Xing, H.Q.; Meng, Y. A stacking ensemble deep learning model for building extraction from remote sensing images. Remote Sens. 2021, 13, 3898.
- Khoshboresh-Masouleh, M.; Alidoost, F.; Arefi, H. Multiscale building segmentation based on deep learning for remote sensing RGB images from different sensors. J. Appl. Remote Sens. 2020, 14, 034503.
- Kwak, Y.; Yun, W.; Kim, J.; Cho, H.; Park, J.; Choi, M.; Jung, S.; Kim, J. Quantum distributed deep learning architectures: Models, discussions, and applications. ICT Express, 2022; in press.
- Coulibaly, S.; Foguem, B.K.; Kamissoko, D.; Traore, D. Deep learning for precision agriculture: A bibliometric analysis. Intell. Syst. Appl. 2022, 16, 200102.
- Arora, T.K.; Chaubey, P.K.; Raman, M.S.; Kumar, B.; Nagesh, Y.; Anjani, P.K.; Ahmed, H.M.; Hashmi, A. Optimal facial feature based emotional recognition using deep learning algorithm. Comput. Intell. Neurosci. 2022, 2022, 8379202.
- Balogh, Z.A.; Kis, B.J. Comparison of CT noise reduction performances with deep learning-based, conventional, and combined denoising algorithms. Med. Eng. Phys. 2022, 109, 103897.
- Gao, L.; Huang, Y.; Zhang, X.; Liu, Q.; Chen, Z. Prediction of Prospecting Target Based on ResNet Convolutional Neural Network. Appl. Sci. 2022, 12, 11433.
- Jackulin, C.; Murugavalli, S. A comprehensive review on detection of plant disease using machine learning and deep learning approaches. Meas. Sens. 2022, 24, 100441.
- Huang, Y.; Feng, Q.; Zhang, W.; Zhang, L.; Gao, L. Prediction of prospecting target based on selective transfer network. Minerals 2022, 12, 1112.
- Hameed, I.M.; Abdulhussain, S.H.; Mahmmod, B.M. Content-based image retrieval: A review of recent trends. Cogent Eng. 2021, 8, 1927469.
- Aziz, M.A.; Ewees, A.A.; Hassanien, A.E. Multi-objective whale optimization algorithm for content-based image retrieval. Multimed. Tools Appl. 2018, 77, 26135–26172.
- Fu, R.; Li, B.; Gao, Y.; Wang, P. Content-based image retrieval based on CNN and SVM. In Proceedings of the 2016 2nd IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 14–17 October 2016; pp. 638–642.
- Kilic, S.; Askerzade, I.; Kaya, Y. Using ResNet transfer deep learning methods in person identification according to physical actions. IEEE Access 2020, 8, 220364–220373.
- Hua, C.; Chen, S.; Xu, G.; Lu, Y.; Du, B. Defect identification method of carbon fiber sucker rod based on GoogLeNet-based deep learning model and transfer learning. Mater. Today Commun. 2022, 33, 104228.
- Prasetyo, E.; Suciati, N.; Fatichah, C. Multi-level residual network VGGNet for fish species classification. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 5286–5295.
- Shi, Z.; Wang, F.; Wang, Y.; Jia, H. Image emotion recognition research based on separable convolution attention mechanism (SCAM) neural network. Laser J. 2022, 43, 88–93.
- Lan, Y.; Peng, B.; Wu, X.; Teng, F. Infrared dim and small targets detection via self-attention mechanism and pipeline correlator. Digit. Signal Process. 2022, 130, 103733.
- Vanian, V.; Zamanakos, G.; Pratikakis, I. Improving performance of deep learning model for 3D point cloud semantic segmentation via attention mechanisms. Comput. Graph. 2022, 106, 277–287.
- Wang, Y.; Feng, Y.; Zhang, L.; Zhou, J.; Yong, L.; Goh, S.M.R.; Zhen, L. Adversarial multimodal fusion with attention mechanism for skin lesion classification using clinical and dermoscopic images. Med. Image Anal. 2022, 81, 102535.
- Wang, X.; Yuan, Y.; Guo, D.; Huang, X.; Cui, Y.; Xia, M.; Wang, Z.; Bai, C.; Chen, S. SSA-Net: Spatial self-attention network for COVID-19 pneumonia infection segmentation with semi-supervised few-shot learning. Med. Image Anal. 2022, 79, 102459.
- Ma, K.; Wang, B.W.; Li, Y.Q.; Zhang, J.X. Image retrieval for local architectural heritage recommendation based on deep hashing. Buildings 2022, 12, 809.
- Wang, Y.S.; Hu, X. Machine learning-based image recognition for rural architectural planning and design. Neural Comput. Appl. 2022, 1–10.
- Xie, X.S.; Wen, X.; Deng, F.F. Applications of 3D image using internet of things in the exhibition of classical architecture art style. Mob. Inf. Syst. 2021, 2021, 2283354.
- Llamas, J.; Lerones, P.M.; Medina, R.; Zalama, E.; Gomez-Garcia-Bermejo, J. Classification of architectural heritage images using deep learning techniques. Appl. Sci. 2017, 7, 992.
- Wang, Y.J.; Li, S.C.; Teng, F.; Lin, Y.H.; Wang, M.J.; Cai, H.F. Improved mask R-CNN for rural building roof type recognition from UAV high-resolution images: A case study in Hunan Province, China. Remote Sens. 2022, 14, 265.
- Hong, Z.H.; Zhong, H.Z.; Pan, H.Y.; Liu, J.; Zhou, R.Y.; Zhang, Y.; Han, Y.L.; Wang, J.; Yang, S.H.; Zhong, C.Y. Classification of building damage using a novel convolutional neural network based on post-disaster aerial images. Sensors 2022, 22, 5920.
- Taoufiq, S.; Nagy, B.; Benedek, C. HierarchyNet: Hierarchical CNN-based urban building classification. Remote Sens. 2020, 12, 3794.
- Weyand, T.; Leibe, B.T. Visual landmark recognition from internet photo collections: A large-scale evaluation. Comput. Vis. Image Underst. 2015, 135, 1–15.
- Jiu, M.; Sahbi, H. Context-aware deep kernel networks for image annotation. Neurocomputing 2022, 474, 154–167.
- Gupta, A.; Pawade, P.; Balakrishnan, R. Deep residual network and transfer learning-based person re-identification. Intell. Syst. Appl. 2022, 10, 200137.
- Li, Z.C.; Dong, J.W. A framework integrating DeepLabV3+, transfer learning, active learning, and incremental learning for mapping building footprints. Remote Sens. 2022, 14, 4738.
- Huang, M.; Yin, J.; Yan, S.; Xue, P. A fault diagnosis method of bearings based on deep transfer learning. Simul. Model. Pract. Theory 2023, 122, 102659.
- Peng, L.; Wu, H.; Gao, M.; Yi, H.; Xiong, Q.; Yang, L.; Cheng, S. TLT: Recurrent fine-tuning transfer learning for water quality long-term prediction. Water Res. 2022, 225, 119171.
- Zhu, C.; Ni, J.; Yang, Z.; Sheng, Y.; Yang, J.; Zhang, W. Bandgap prediction on small thermoelectric material dataset via instance-based transfer learning. Comput. Theor. Chem. 2022, 1217, 113872.
- Yu, X.; Wang, J.; Hong, Q.; Teku, R.; Wang, S.; Zhang, Y. Transfer learning for medical images analyses: A survey. Neurocomputing 2022, 489, 230–254.
- He, K.M.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Wang, Y.; Xie, Y.; Zeng, J.; Wang, H.; Fan, L.; Song, Y. Cross-modal fusion for multi-label image classification with attention mechanism. Comput. Electr. Eng. 2022, 101, 108002.
- Chen, Q.; Fan, J.; Chen, W. An improved image enhancement framework based on multiple attention mechanism. Displays 2021, 70, 102091.
- Zhang, S.; Deng, X.; Lu, Y.; Hong, S.; Kong, Z.; Peng, Y.; Luo, Y. A channel attention based deep neural network for automatic metallic corrosion detection. J. Build. Eng. 2021, 42, 103046.
- Qiu, Z.; Becker, S.I.; Pegna, A.J. Spatial attention shifting to fearful faces depends on visual awareness in attentional blink: An ERP study. Neuropsychologia 2022, 172, 108283.
- Cheng, Y.; Wang, H. A modified contrastive loss method for face recognition. Pattern Recognit. Lett. 2019, 125, 785–790.
- Zhang, Z.C.; Wang, H.B.; Wang, N.B. Sample extraction and method with feature reconstruction and deformation information. Appl. Intell. 2022, 52, 15916–15928.
- Yu, W.X.; Lu, Y.; Wang, J.N. Application of small sample virtual expansion and spherical mapping model in wind turbine fault diagnosis. Expert Syst. Appl. 2021, 183, 115397.
- Koyuncu, H.; Ceylan, R. Elimination of white Gaussian noise in arterial phase CT images to bring adrenal tumours into the forefront. Comput. Med. Imaging Graph. 2018, 65, 46–57.
- Piroozmandan, M.M.; Farokhi, F.; Kangarloo, K.; Jahanshahi, M. Removing the impulse noise from images based on fuzzy cellular automata by using a two-phase innovative method. Optik 2022, 255, 168713.
- Vijayalakshmi, D.; Nath, M.K. A novel multilevel framework based contrast enhancement for uniform and non-uniform background images using a suitable histogram equalization. Digit. Signal Process. 2022, 127, 103532.
- Ullah, Z.; Farooq, M.U.; Lee, S.; An, D. A hybrid image enhancement based brain MRI images classification technique. Med. Hypotheses 2020, 143, 109922.
- Zhao, J.H.; Wang, X.; Dou, X.T.; Zhao, Y.X.; Fu, Z.X.; Guo, M.; Zhang, R.J. A high-precision image classification network model based on a voting mechanism. Int. J. Digit. Earth 2022, 15, 2168–2183.
- Ma, J.W.; Czerniawski, T.; Lite, F. An application of metadata-based image retrieval system for facility management. Adv. Eng. Inform. 2021, 50, 101417.
- Sun, W.W.; Wang, H.Q.; Lu, Y.; Luo, J.S.; Liu, T.; Lin, J.Z.; Pang, Y.; Zhang, G. Deep-learning-based complex scene text detection algorithm for architectural images. Mathematics 2022, 10, 3914.
- Khatami, A.; Babaie, M.; Tizhoosh, H.R.; Khosravi, A.; Nguyen, T.; Nahavandi, S. A sequential search-space shrinking using CNN transfer learning and a radon projection pool for medical image retrieval. Expert Syst. Appl. 2018, 100, 224–233.
- Singh, P.; Hrisheekesha, P.N.; Singh, V.K. CBIR-CNN: Content-based image retrieval on celebrity data using deep convolution neural network. Recent Adv. Comput. Sci. Commun. 2021, 14, 257–272.
Network Model | Classification Accuracy (%) |
---|---|
TL ResNet50 | 98.3 |
ResNet50 | 94.5 |
TL GoogLeNet | 95.2 |
GoogLeNet | 91.2 |
TL VGG16 | 91.4 |
VGG16 | 89.3 |
Network | mAP (%) | R@10 (%) |
---|---|---|
CNNAR | 76.6 | 19.7 |
CTSL [55] | 70.4 | 18.5 |
FLM [56] | 73.2 | 19.0 |
TL ResNet50 | 68.8 | 14.6 |
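
The two metric columns above can be computed as in the following minimal sketch. It assumes that mAP is the mean over queries of average precision and that R@10 is the share of a query's relevant database images found among the top 10 results; the paper's exact definitions are not shown in this section, and the function names are hypothetical.

```python
def average_precision(ranked_relevance, num_relevant):
    """AP for one query.

    ranked_relevance: booleans in ranked order, True where the retrieved
    image is relevant to the query.
    num_relevant: number of relevant images in the whole database.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / k   # precision at rank k
    return precision_sum / num_relevant


def recall_at_k(ranked_relevance, num_relevant, k=10):
    """Fraction of all relevant images that appear in the top k results."""
    if num_relevant == 0:
        return 0.0
    return sum(1 for r in ranked_relevance[:k] if r) / num_relevant


def mean_over_queries(values):
    """Average a per-query metric over all queries (gives mAP for AP values)."""
    return sum(values) / len(values)


# Two toy queries: relevance flags of the ranked results and the number
# of relevant images available in the database for each query.
queries = [([True, False, True, False], 2), ([False, True], 1)]
map_score = mean_over_queries([average_precision(r, n) for r, n in queries])
```

Under this definition, recall at a fixed cutoff stays small whenever a class holds far more than 10 relevant images, which would be consistent with R@10 values well below the mAP values.
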
Network | Fresco AP (%) | Decorative Pattern AP (%) | Chandelier Base Pattern AP (%) | Architectural Style AP (%) |
---|---|---|---|---|
No attention mechanism | 73.0 | 75.6 | 80.5 | 74.7 |
Channel attention mechanism | 78.9 | 77.4 | 82.2 | 75.3 |
Spatial attention mechanism | 76.5 | 76.7 | 82.0 | 75.0 |
Fusion attention mechanism | 80.5 | 79.9 | 85.3 | 75.8 |
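
The four rows above differ in how feature maps are reweighted before retrieval. The following is a simplified, parameter-free sketch of channel, spatial, and fused gating on a feature map stored as nested lists `[channel][row][col]`. The learned layers that a real attention module contains are replaced here by plain averaging, and the channel-then-spatial order is an assumption: this section does not describe how CNNAR fuses the two mechanisms.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(fmap):
    """One weight per channel: sigmoid of the channel's global average."""
    weights = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def spatial_attention(fmap):
    """One weight per location: sigmoid of the mean across channels."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
             for j in range(w)]
            for i in range(h)]


def fused_attention(fmap):
    """Scale by channel weights first, then by spatial weights."""
    cw = channel_attention(fmap)
    scaled = [[[v * cw[k] for v in row] for row in channel]
              for k, channel in enumerate(fmap)]
    sw = spatial_attention(scaled)
    return [[[v * sw[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(channel)]
            for channel in scaled]
```

Channel gating decides which feature channels matter, and spatial gating decides where in the image to look. Applying both in sequence is one common way to combine them.
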
Network | mAP (%) | R@10 (%) |
---|---|---|
CNNAR | 71.8 | 6.4 |
CTSL | 71.2 | 6.1 |
FLM | 73.4 | 7.5 |
Network | mAP (%) | R@10 (%) |
---|---|---|
CNNAR | 72.5 | 7.3 |
CTSL | 70.2 | 6.6 |
FLM | 73.9 | 8.0 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Gao, L.; Wu, Y.; Yang, T.; Zhang, X.; Zeng, Z.; Chan, C.K.D.; Chen, W. Research on Image Classification and Retrieval Using Deep Learning with Attention Mechanism on Diaspora Chinese Architectural Heritage in Jiangmen, China. Buildings 2023, 13, 275. https://doi.org/10.3390/buildings13020275