An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models
Abstract
1. Introduction
- The evaluation and comparison of the proposed approach against multiple baseline models, demonstrating improved accuracy, robustness, and suitability for practical smart tourism applications.
- The introduction of a pixel-level enhancement method that amplifies high-intensity features relevant to architectural details in landmark classification.
- The development of a dual-path multi-epoch ensemble learning strategy leveraging both original and enhanced images to improve model robustness.
- Comprehensive experimental validation demonstrating superior performance compared to widely used deep learning architectures.
- Practical contribution to smart tourism systems by improving automated destination classification and supporting cultural heritage preservation.
- RQ1: How can pixel-level enhancement improve the robustness of landmark classification models under varying environmental conditions in smart tourism scenarios?
- RQ2: Can a multi-epoch ensemble framework using YOLO11 models trained on both original and enhanced images achieve better classification accuracy compared to single deep learning models?
- RQ3: How effective is the proposed approach in improving classification performance for geographically diverse and culturally rich tourist destinations?
2. Related Works
2.1. Deep Learning for Tourist Destination Classification
2.2. Ensemble Learning for Robust Image Classification
2.3. Smart Tourism and AI-Driven Destination Recognition
3. Proposed Methodology
- Dataset Preparation: The dataset used in this study was custom-developed by capturing images of historical sites in Samarkand. Photographs were taken from multiple angles and under varying lighting conditions to ensure a diverse and representative dataset, which was manually labeled into distinct historical landmark classes. Following collection, the dataset was divided into training (80%), validation (10%), and test (10%) subsets to ensure robust model evaluation.
- Image Preprocessing: To enhance image quality and facilitate more effective feature extraction, we applied a custom preprocessing technique in which pixel values exceeding 225 were squared. This approach is motivated by the fact that high-intensity pixels, typically in the 225 to 255 range, often correspond to reflective surfaces or sunlit architectural elements such as domes, tiles, and marble structures, which are common in historical landmarks. These regions, while visually distinctive, may be underrepresented in feature learning because of their narrow dynamic range near the saturation limit. By squaring these high-value pixels, we nonlinearly amplify subtle variations in brightness, making discriminative features in bright areas more prominent. This enhancement helps the convolutional filters capture fine structural details, which is especially important for classifying architecturally similar landmarks under challenging lighting conditions. Importantly, the transformation minimally affects the overall image distribution, since only a small subset of pixels typically exceeds the threshold. Following enhancement, all images were resized to a uniform resolution of 640×640 pixels and normalized to the [0, 1] range to ensure consistent input for training.
- Parallel Model Training: Two independent YOLO11 models were trained:
  - one trained on pixel-enhanced images;
  - one trained on original images.
  Both models were trained using the same architecture and hyperparameters, ensuring a fair comparison. The best epoch of each model was selected based on validation accuracy and stored as the final checkpoint.
- Logit Extraction and Ensemble Strategy: Once both models were trained, logits were extracted from the validation set. The logits, which represent raw class confidence scores before softmax normalization, were then ensembled using an averaging approach.
- Evaluation Metrics: The performance of the ensemble model was evaluated using standard classification metrics: accuracy, precision, recall, and F1-score.
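The pixel-enhancement step described above can be sketched in NumPy. The threshold of 225, the squaring of high-intensity pixels, and the final [0, 1] normalization come from the text; the min-max rescaling used for that normalization and the function name are our assumptions, since the paper does not specify how the squared values are brought back into range.

```python
import numpy as np

PIXEL_THRESHOLD = 225  # from the paper: values above 225 are squared


def enhance_and_normalize(image: np.ndarray) -> np.ndarray:
    """Square pixels above the threshold, then scale the image to [0, 1].

    `image` is an 8-bit array in [0, 255]. Squaring nonlinearly amplifies
    variation in the bright range (226..255); the min-max rescaling that
    follows is an illustrative choice, not taken from the paper.
    """
    img = image.astype(np.float64)
    mask = img > PIXEL_THRESHOLD
    img[mask] = img[mask] ** 2  # e.g., 226 -> 51076, 255 -> 65025
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-12)
```

In a full pipeline this transformation would be applied before the resize to 640×640, so the enhanced-path model sees the amplified bright regions at training time.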
Algorithm 1. YOLO11-Based Ensemble Model with Pixel-Enhanced Training for Smart Tourism Landmark Recognition.
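The logit-averaging fusion at the heart of Algorithm 1 can be sketched as follows, assuming each trained model exposes its raw per-class logits as an array of shape (num_samples, num_classes); the function and argument names are illustrative.

```python
import numpy as np


def ensemble_predict(logits_enhanced: np.ndarray,
                     logits_original: np.ndarray) -> np.ndarray:
    """Average per-class logits from the two YOLO11 paths, then take argmax.

    Averaging happens on raw logits (before softmax), matching the fusion
    rule described in the methodology. Returns one class index per sample.
    """
    avg = 0.5 * (logits_enhanced + logits_original)
    return avg.argmax(axis=1)


# Toy example: two samples, two classes (values are illustrative).
a = np.array([[2.0, 1.0], [0.0, 3.0]])  # enhanced-path logits
b = np.array([[1.0, 0.0], [1.0, 2.0]])  # original-path logits
preds = ensemble_predict(a, b)  # -> array([0, 1])
```

Because argmax is invariant to monotone rescaling, averaging logits and averaging softmax probabilities can disagree only when the two models rank classes differently; the paper fuses at the logit level.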
4. Experiments
4.1. Dataset
4.2. Baseline Models
- MobileNetV3 [51]—Developed by Google, MobileNetV3 is optimized for mobile devices and emphasizes a balance between latency and accuracy. Its architecture incorporates efficient building blocks like depthwise separable convolutions and is enhanced with architecture search techniques and a novel activation function, h-swish. This model is particularly suitable for real-time applications and has shown effectiveness in image classification tasks involving constrained computational resources.
- EfficientNetB0 [52]—Part of the EfficientNet family, EfficientNetB0 scales network depth, width, and resolution uniformly with a compound coefficient. Introduced to structure CNN scaling for better efficiency and accuracy, it achieves higher accuracy with fewer parameters, making it well suited to diverse and complex datasets such as tourist destination images.
- ResNet50 [53]—A member of the Residual Networks family, ResNet50 features “skip connections” that facilitate the training of much deeper networks by addressing the vanishing gradient problem. This architecture improves the classification performance significantly on large-scale image datasets.
- YOLO11N [12]—An extension of the YOLO (You Only Look Once) family, YOLO11N is tailored for object detection with a focus on balancing speed and accuracy. It is capable of detecting objects in real time, making it highly suitable for applications like tourist destination recognition where quick and efficient processing of visual information is required.
4.3. Training Setup
4.4. Experimental Results and Discussion
4.5. Limitations and Future Work
- Dataset Scope and Generalizability: The current study focuses solely on historical landmarks in Samarkand. While the curated dataset ensures high intra-class diversity, the model’s ability to generalize to different geographic or architectural contexts remains untested. Cross-city validation is necessary to evaluate broader applicability.
- Computational Complexity: The ensemble approach involves training two independent YOLO11 models, which increases training time and resource consumption. Although inference remains efficient due to logit-level fusion, the training phase may not be feasible for low-resource environments.
- Environmental Limitations: The dataset does not comprehensively cover extreme lighting conditions (e.g., nighttime scenes) or severe occlusions (e.g., crowds). Additional robustness testing under such conditions is needed to confirm deployment readiness.
- Lack of Real-Time Field Testing: While model performance on curated data is encouraging, the system has not yet been evaluated in a live smart tourism setting. Future studies should integrate the model into mobile applications or AR systems to assess real-time effectiveness and user experience.
5. Conclusions
- The development of a robust ensemble learning framework that integrates multiple deep learning models to enhance classification performance;
- The implementation of advanced image preprocessing techniques that optimize feature extraction and improve model reliability;
- Extensive evaluation of the model’s performance, demonstrating its superiority over traditional single-model approaches in the context of smart tourism applications.
- Exploring the application of the model in different geographic and cultural contexts to further validate its effectiveness and adaptability.
- Investigating the integration of additional modalities such as textual or audio data to enhance the classification capabilities of the model.
- Developing more computationally efficient models to enable real-time processing on mobile devices, thus expanding the practical applications of this research in the field of mobile tourism.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jabbari, M.; Amini, M.; Malekinezhad, H.; Berahmand, Z. Improving augmented reality with the help of deep learning methods in the tourism industry. Math. Comput. Sci. 2023, 4, 33–45. [Google Scholar]
- Pencarelli, T. The digital revolution in the travel and tourism industry. Inf. Technol. Tour. 2020, 22, 455–476. [Google Scholar] [CrossRef]
- Guerrero-Rodríguez, R.; Álvarez-Carmona, M.Á.; Aranda, R.; Díaz-Pacheco, Á. Big data analytics of online news to explore destination image using a comprehensive deep-learning approach: A case from Mexico. Inf. Technol. Tour. 2024, 26, 147–182. [Google Scholar] [CrossRef]
- Melo, M.; Coelho, H.; Gonçalves, G.; Losada, N.; Jorge, F.; Teixeira, M.S.; Bessa, M. Immersive multisensory virtual reality technologies for virtual tourism. Multimed. Syst. 2022, 28, 1027–1037. [Google Scholar] [CrossRef]
- Bhosale, T.A.; Pushkar, S. IWF-ECTIC: Improved Wiener filtering and ensemble of classification model for tourism image classification. Multimed. Tools Appl. 2024. [Google Scholar] [CrossRef]
- Tussyadiah, A. Intelligent automation systems in tourism. Tour. Manag. 2023, 77, 254–266. [Google Scholar]
- Li, Z.; Gao, S.; Chen, W. Integrating IoT and AI for tourism management. Tour. Technol. 2023, 12, 88–99. [Google Scholar]
- Wang, P.; Jiang, Y.; Li, X. Smart technologies for sustainable tourism. Tour. Dev. Rev. 2023, 24, 115–127. [Google Scholar]
- He, Q.; Wu, J.; Zhang, L. Integrating sustainability metrics in tourism management. Sustain. Tour. 2023, 9, 200–214. [Google Scholar]
- Zhang, X.; Wei, L.; Li, X. Sentiment analysis of social media data for tourism marketing. J. Soc. Media Tour. 2023, 12, 110–122. [Google Scholar]
- Zhang, L.; Huang, S.; Li, T. The role of data analytics in tourism decision-making. J. Tour. Anal. 2023, 12, 125–139. [Google Scholar]
- Jocher, G.; Qiu, J. Ultralytics YOLO11: Real-Time Object Detection Model (Version 11.0.0). Ultralytics. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 15 April 2025).
- Viyanon, W. An Interactive Multiplayer Mobile Application Using Feature Detection and Matching for Tourism Promotion. In Proceedings of the 2nd International Conference on Control and Computer Vision, Jeju Island, Republic of Korea, 15–18 June 2019; pp. 82–86. [Google Scholar]
- Bui, V.; Alaei, A. Virtual reality in training artificial intelligence-based systems: A case study of fall detection. Multimed. Tools Appl. 2022, 81, 32625–32642. [Google Scholar] [CrossRef]
- Carneiro, A.; Nascimento, L.S.; Noernberg, M.A.; Hara, C.S.; Pozo, A.T.R. Social media image classification for jellyfish monitoring. Aquat. Ecol. 2024, 58, 3–15. [Google Scholar] [CrossRef]
- Yao, J.; Chu, Y.; Xiang, X.; Huang, B.; Xiaoli, W. Research on detection and classification of traffic signs with data augmentation. Multimed. Tools Appl. 2023, 82, 38875–38899. [Google Scholar] [CrossRef]
- Ma, H. Development of a smart tourism service system based on the Internet of Things and machine learning. J. Supercomput. 2024, 80, 6725–6745. [Google Scholar] [CrossRef]
- Wu, W.; Liu, H.; Zhang, X. AI in tourism marketing: Improving decision-making. J. AI Tour. 2023, 10, 65–79. [Google Scholar]
- Liu, H.; Tsionas, M.; Assaf, P. Personalized tourism experiences with AI. AI Appl. Tour. 2023, 13, 34–46. [Google Scholar]
- Hao, Y.; Zheng, L. Application of SLAM method in big data rural tourism management in dynamic scenes. Soft Comput. 2023. [Google Scholar] [CrossRef]
- Schorr, J.L.; Bhattacharya, P.; Yan, D. Predictive analytics for demand forecasting in tourism. Tour. Econ. 2024, 32, 103–118. [Google Scholar]
- Pingdong, H. Application of optical imaging detection based on embedded edge computing in the evaluation of forest park tourism resources. Opt. Quantum Electron. 2024, 56, 642. [Google Scholar] [CrossRef]
- Lee, C.-Y.; Khanum, A.; Kumar, P.P. Multi-food detection using a modified Swin-Transformer with recursive feature pyramid network. Multimed. Tools Appl. 2024, 83, 57731–57757. [Google Scholar] [CrossRef]
- Martín-Rojo, I.; Gaspar-González, A.I. The impact of social changes on MICE tourism management in the age of digitalization: A bibliometric review. Rev. Manag. Sci. 2024, 18, 1–24. [Google Scholar] [CrossRef]
- Dang, Q.M.; Truong, M.T.; Dang, T.L. A lightweight approach for image quality assessment. Signal Image Video Process. 2024, 18, 6761–6768. [Google Scholar] [CrossRef]
- Fadli, H.; Ibrahim, R.; Arshad, H.; Yaacob, S. Augmented reality in cultural heritage tourism: A review of past study. Open Int. J. Inf. 2022, 10, 109–121. [Google Scholar]
- Aicardi, I.; Chiabrando, F.; Lingua, A.M.; Noardo, F. Recent trends in cultural heritage 3D survey: The photogrammetric computer vision approach. J. Cult. Herit. 2018, 32, 257–266. [Google Scholar] [CrossRef]
- Patel, K.; Parmar, B. Assistive device using computer vision and image processing for visually impaired; review and current status. Disabil. Rehabil. Assist. Technol. 2020, 16, 115–125. [Google Scholar] [CrossRef]
- Budrionis, A.; Plikynas, D.; Daniušis, P.; Indrulionis, A. Smartphone-based computer vision travelling aids for blind and visually impaired individuals: A systematic review. Assist. Technol. 2020, 34, 178–194. [Google Scholar] [CrossRef]
- Loi, K.I.; Kong, W.H. Tourism for all: Challenges and issues faced by people with vision impairment. Tour. Plan. Dev. 2017, 14, 181–197. [Google Scholar] [CrossRef]
- Ivanov, P.; Webster, A.; Lee, C. The role of robotic systems in enhancing customer experience at tourism venues. Tour. Hosp. Res. 2023, 19, 105–121. [Google Scholar]
- Assaf, A.G.; Josiassen, E.T.; Tsionas, M. Optimizing tourism operations using big data. J. Hosp. Tour. Manag. 2023, 23, 45–58. [Google Scholar]
- Feng, L.; Liu, H.; Zhang, X. Analyzing tourist preferences with big data analytics. Tour. Res. J. 2023, 25, 120–135. [Google Scholar]
- Zhao, Y.; Wu, J.; Li, C. Data-driven technologies in tourism management. Tour. Rev. 2023, 29, 101–113. [Google Scholar]
- Zhou, Y.; Xie, W.; Wang, M. Mobile applications for enhancing tourism experiences. J. Travel Technol. 2023, 15, 45–56. [Google Scholar]
- Maier, A.; Hill, M.D.S.; Tseng, T.C.F. Mobile AR applications for educational tourism. Tour. Technol. Innov. 2023, 19, 75–88. [Google Scholar]
- Backer, A.R.; Ritchie, B.W. Augmented reality in tourism marketing. J. Mark. Tour. 2023, 16, 140–156. [Google Scholar]
- Faulkner, A.; Tsionas, M. Sustainable practices in cultural heritage tourism. Tour. Sustain. J. 2023, 11, 123–135. [Google Scholar]
- Tsionas, M.; Assaf, A.G. AI for tourism service quality assessment. Tour. Serv. Manag. J. 2023, 14, 48–60. [Google Scholar]
- Liu, T.; Guo, S.; Chen, Q. Predicting and managing tourism crises with AI. Crisis Manag. Tour. 2023, 18, 90–102. [Google Scholar]
- Tsionas, M.; Assaf, A.G. Frontier methods in performance modeling for tourism. Tour. Perform. J. 2023, 15, 45–60. [Google Scholar]
- He, Q.; Zhang, L.; Ma, X. Using deep learning for tourism demand forecasting. Tour. Forecast. Anal. 2023, 18, 200–213. [Google Scholar]
- Zhang, X.; Wu, J.; Zhao, Q. Optimizing tourism resource management using AI. Tour. Resour. Manag. 2023, 12, 80–93. [Google Scholar]
- Li, L.; Liu, H.; Zhao, Q. AI-driven decision-making for tourism marketing. Tour. Mark. J. 2023, 30, 120–134. [Google Scholar]
- Liu, Y.; Chen, T.; Zhang, X. Risk management in tourism with AI. Tour. Risk Manag. 2023, 22, 134–146. [Google Scholar]
- Ma, X.; Wang, Y.; Xie, J. AI and IoT in smart tourism cities. Smart Tour. City J. 2023, 16, 101–115. [Google Scholar]
- Zhou, W.; Li, L.; Zhang, X. Improving tourist engagement using AI. Tour. Engagem. J. 2023, 17, 55–67. [Google Scholar]
- Wu, J.; Zhang, W.; Li, S. AI and big data in tourism crisis management. Tour. Crisis Manag. 2023, 14, 80–91. [Google Scholar]
- Zhao, Y.; Li, Q.; Liu, H. Enhancing tourist experiences through AI-driven services. J. Hosp. Technol. 2023, 18, 100–115. [Google Scholar]
- Liu, H.; Zhang, Y.; Wu, X. Smart tourism development through AI and IoT. Tour. Innov. J. 2023, 16, 90–103. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6105–6114. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| MobileNet_V3 | 0.9167 | 0.9198 | 0.9167 | 0.9164 |
| ResNet50 | 0.8889 | 0.8990 | 0.8889 | 0.8849 |
| EfficientNet_B0 | 0.9444 | 0.9475 | 0.9444 | 0.9443 |
| YOLO11n-cls | 0.9815 | 0.9814 | 0.9820 | 0.9813 |
| Proposed Model | 0.9907 | 0.9915 | 0.9921 | 0.9914 |
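The metrics reported in the table above (accuracy plus support-weighted precision, recall, and F1) can be reproduced for any prediction set with a small NumPy helper. This is a pure-NumPy illustration of the standard definitions; the toy labels in the test are ours, not the paper's data, and the helper name is illustrative.

```python
import numpy as np


def weighted_metrics(y_true, y_pred, num_classes):
    """Return (accuracy, precision, recall, F1), the last three
    weighted by per-class support, as is standard in multi-class reports."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = float((y_true == y_pred).mean())
    ps, rs, fs, supports = [], [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0  # precision for class c
        r = tp / (tp + fn) if tp + fn else 0.0  # recall for class c
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p); rs.append(r); fs.append(f)
        supports.append(np.sum(y_true == c))
    w = np.array(supports) / len(y_true)  # support weights sum to 1
    return acc, float(np.dot(w, ps)), float(np.dot(w, rs)), float(np.dot(w, fs))
```

With balanced classes (equal support, as in a stratified test split), the weighted averages reduce to simple macro averages.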
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Hudayberdiev, U.; Lee, J. An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models. Sustainability 2025, 17, 5420. https://doi.org/10.3390/su17125420