An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models

Hudayberdiev, Ulugbek; Lee, Junyeong

doi:10.3390/su17125420

Open AccessArticle

An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models

by

Ulugbek Hudayberdiev

and

Junyeong Lee

^*

Department of Management Information Systems, Chungbuk National University, Cheongju 28644, Republic of Korea

^*

Author to whom correspondence should be addressed.

Sustainability 2025, 17(12), 5420; https://doi.org/10.3390/su17125420

Submission received: 8 May 2025 / Revised: 1 June 2025 / Accepted: 9 June 2025 / Published: 12 June 2025

(This article belongs to the Special Issue Digital Technology and Conservation Science for Sustainable Preservation of Cultural Heritage: Interdisciplinary Studies, Challenges, and Perspectives)

Download

Browse Figures

Versions Notes

Abstract

Tourist destination classification is pivotal for enhancing the travel experience, supporting cultural heritage preservation, and enabling smart tourism services. With recent advancements in artificial intelligence, deep learning-based systems have significantly improved the accuracy and efficiency of landmark recognition. To address the limitations of existing datasets, we developed the Samarkand dataset, containing diverse images of historical landmarks captured under varying environmental conditions. Additionally, we created enhanced image variants by squaring pixel values greater than 225 to emphasize high-intensity architectural features, improving the model’s ability to recognize subtle visual patterns. Using these datasets, we trained two parallel YOLO11 models on original and enhanced images, respectively. Each model was independently trained and validated, preserving only the best-performing epoch for final inference. We then ensembled the models by averaging the model outputs from the best checkpoints to leverage their complementary strengths. Our proposed approach outperforms conventional single-model baselines, achieving an accuracy of 99.07%, precision of 99.15%, recall of 99.21%, and F1-score of 99.14%, particularly excelling in challenging scenarios involving poor lighting or occlusions. The model’s robustness and high performance underscore its practical value for smart tourism systems. Future work will explore broader geographic datasets and real-time deployment on mobile platforms.

Keywords:

smart tourism; YOLO11; tourist landmark recognition; cultural heritage preservation; image enhancement; ensemble deep learning; Samarkand dataset

1. Introduction

Artificial intelligence (AI)-powered landmark classification has become a key enabler in the digital transformation of the tourism industry, offering significant benefits for tourists, destination managers, and policymakers [1,2,3]. First, it enhances the tourist experience by enabling real-time identification of landmarks using mobile devices or wearable technology [4,5]. This allows travelers to access personalized guides, augmented reality (AR) overlays, and context-aware itinerary suggestions without manual intervention, significantly improving visitor satisfaction, especially in foreign or unfamiliar environments [1,6].

From a sustainability perspective, AI-based classification plays a vital role in managing tourism flows and reducing overcrowding at popular sites [7,8]. Smart tourism platforms can utilize real-time classification results to guide visitors toward less crowded yet culturally important attractions, easing congestion and promoting a more balanced distribution of tourist activity [9]. Furthermore, these classification models empower tourism boards with data analytics capabilities. Automated tagging and organization of user-generated content (UGC), such as photos and videos shared on social media, help analyze destination popularity trends and enhance the effectiveness of digital marketing strategies [10,11].

An overview of the complete research workflow is illustrated in Figure 1. The process begins with dataset collection, where historical landmark images from Samarkand are acquired under various lighting conditions. Samarkand is a historically significant city located in Uzbekistan, known for its rich cultural heritage, ancient architecture, and status as a UNESCO World Heritage Site. Following data acquisition, a pixel enhancement preprocessing step is applied to emphasize high-intensity architectural features. Two separate YOLO11 models are then trained independently on the original and enhanced datasets. After selecting the best-performing epoch for each model, logits are extracted and combined through averaging to form the ensemble prediction. Finally, the performance of the ensemble model is evaluated using standard classification metrics. This workflow ensures robust and accurate tourist destination classification within the proposed framework.

For emerging destinations, particularly historical cities such as Samarkand, AI-based models provide scalable and cost-effective solutions. Lightweight architectures like YOLO11 [12] can be deployed on standard mobile devices, enabling local tourism initiatives to promote their landmarks without requiring advanced infrastructure or manual curation. Policymakers can also benefit from classified image data by monitoring seasonal interest, managing site maintenance, and guiding long-term investment planning.

AI-driven landmark classification also contributes to tourism resilience in times of crisis. During pandemics, natural disasters, or other disruptions, it enables remote monitoring of site conditions and supports the continuity of visitor experiences through virtual tours or AR-based digital exploration. Altogether, AI-based tourist destination classification supports the development of smart, accessible, and sustainable tourism systems.

Tourist destination classification plays a crucial role in enhancing the tourism experience, optimizing travel planning, and supporting cultural heritage preservation. With the increasing adoption of AI and deep learning techniques, automated classification systems have been developed to identify and categorize tourist attractions from images, aiding in smart tourism applications. These AI-driven systems enable real-time landmark recognition, improve visitor guidance, and enhance tourism management strategies. Furthermore, classification models contribute to personalized travel recommendations, supporting businesses and tourism agencies in targeting potential visitors through intelligent marketing solutions [1,13].

Another significant application of tourist destination classification is in the field of cultural heritage preservation. AI-based models assist in documenting and archiving historical sites, ensuring their digital preservation for future generations [4,14]. Additionally, these models are integrated into smart tourism ecosystems, enabling augmented reality (AR) applications that overlay contextual information on historical landmarks [5,15]. In the era of digital tourism, AI-driven classification also facilitates real-time analytics, allowing tourism authorities to monitor visitor behavior, optimize crowd management, and enhance destination sustainability [16,17].

Tourism is one of the fastest-growing industries globally, contributing significantly to economic growth, cultural exchange, and regional development. The increasing reliance on digital technologies has transformed the sector, enabling innovations in destination classification, smart tourism applications, and automated visitor experience management. One of the critical challenges in modern tourism research is the accurate classification of tourist destinations from images, which plays a vital role in personalized recommendations, travel planning, and cultural heritage preservation [1,13].

AI-driven tourism marketing strategies have also benefited from big data analytics and sentiment analysis. Researchers have investigated how deep learning can extract user preferences from online reviews and social media content, enabling personalized tourism recommendations and targeted destination marketing [10,18]. Furthermore, AI-driven tourism risk management and crisis forecasting have emerged as crucial tools for improving tourism resilience, particularly in response to pandemics, economic downturns, and climate-related disruptions [19].

Despite notable progress in AI-based landmark classification, several critical challenges remain unresolved in the current literature. Existing datasets often exhibit limited diversity in terms of geographic and cultural representation, restricting the generalizability of models across global tourism scenarios. For example, refs. [2,3] emphasize that most publicly available datasets are culturally homogeneous, with insufficient coverage of diverse architectural styles and environments.

Moreover, the robustness of classification models under real-world conditions remains a concern. Studies such as [16,17] report significant performance degradation when images contain lighting variations, occlusions, or cluttered backgrounds. These factors are common in tourist settings and thus must be explicitly addressed for practical deployment of smart tourism systems.

In addition, while ensemble learning has demonstrated strong performance improvements in other domains [20], its integration into landmark classification, particularly using multi-epoch strategies within lightweight architectures like YOLO, remains underexplored. To the best of our knowledge, no prior work has systematically investigated the benefits of ensembling YOLO models trained on both original and enhanced image variants for tourist destination recognition.

This research aims to fill these gaps by proposing a novel YOLO11-based ensemble framework. The proposed system leverages a custom dataset collected from historically significant landmarks in Samarkand, combined with a pixel enhancement preprocessing technique and a multi-epoch training strategy. These contributions collectively aim to improve classification accuracy, robustness under challenging conditions, and generalizability to real-world tourism environments.

Based on the identified research gaps discussed in the Introduction, this study aims to develop a robust AI-based tourist destination classification system by addressing the limitations of dataset diversity, environmental variability, and underexplored ensemble learning strategies in landmark recognition.

The specific objectives of this study are

To construct a geographically diverse and culturally rich tourist landmark dataset by collecting images of historical sites in Samarkand under various lighting and environmental conditions, addressing the dataset diversity limitations reported in previous works [2,3];
To apply a novel pixel enhancement technique that emphasizes high-intensity architectural features by squaring pixel values above a defined threshold, improving robustness under varying lighting conditions as highlighted in prior studies [16,17];
To design a multi-epoch ensemble learning framework based on YOLO11 models trained on both original and enhanced datasets, addressing the lack of ensemble integration strategies in YOLO-based landmark classification systems [20,21];
To evaluate and compare the proposed approach against multiple baseline models, demonstrating improved accuracy, robustness, and potential for practical smart tourism applications.

The primary contributions of this work are as follows:

The introduction of a pixel-level enhancement method that amplifies high-intensity features relevant to architectural details in landmark classification.
The development of a dual-path multi-epoch ensemble learning strategy leveraging both original and enhanced images to improve model robustness.
Comprehensive experimental validation demonstrating superior performance compared to widely used deep learning architectures.
Practical contribution to smart tourism systems by improving automated destination classification and supporting cultural heritage preservation.

By addressing current limitations in AI-based destination classification, this research contributes to advancing smart tourism technologies, improving automated image recognition in tourism, and promoting AI-driven decision support systems for destination marketing and tourism sustainability.

Based on the identified research gaps, this study seeks to address the following research questions:

RQ1: How can pixel-level enhancement improve the robustness of landmark classification models under varying environmental conditions in smart tourism scenarios?
RQ2: Can a multi-epoch ensemble framework using YOLO11 models trained on both original and enhanced images achieve better classification accuracy compared to single deep learning models?
RQ3: How effective is the proposed approach in improving classification performance for geographically diverse and culturally rich tourist destinations?

The rest of this paper is organized as follows: Section 2 presents a review of related works in computer vision for tourism, AI-based destination classification, smart tourism technologies, and AI-driven tourism sustainability. Section 3 describes the proposed methodology, including data preprocessing, model architecture, and ensemble learning techniques. Section 4 presents the experimental setup and dataset details used for assessing model performance and discusses the results and their implications, followed by a discussion of the limitations and future directions. Finally, Section 5 concludes the paper and outlines potential research opportunities.

2. Related Works

The advancement of deep learning has revolutionized image classification and object detection tasks, significantly impacting tourism applications. Recent research has explored various deep learning models and ensemble strategies to improve the robustness and accuracy of tourist destination classification. This section presents an overview of related works in deep learning-based tourist image classification, ensemble learning techniques, and AI-driven smart tourism solutions.

2.1. Deep Learning for Tourist Destination Classification

Deep learning, particularly convolutional neural networks (CNNs), has been widely adopted for tourism image classification. Pretrained models such as VGGNet, ResNet, and MobileNet have demonstrated promising results in classifying tourist attractions based on visual data [1,13]. Object detection models, including Faster R-CNN, YOLO, and SSD, have further improved real-time landmark recognition, enabling automated tourist assistance applications [4,14]. Augmented reality (AR) applications have also been employed to enhance visitor experiences by integrating AI-driven image recognition into cultural heritage exploration [5,15]. However, challenges such as environmental variations, occlusions, and dataset biases affect model generalization [16,17].

To address these issues, researchers have proposed data augmentation techniques and synthetic dataset generation to enhance model robustness [20,22]. The development of Internet of Things (IoT)-based smart tourism systems has also played a role in improving the efficiency of AI-driven destination classification [23,24]. Studies have highlighted the importance of structured datasets with diverse geographical coverage for improving classification performance [2,3]. Additionally, multi-modal deep learning approaches integrating textual metadata with visual data have been explored to enhance destination recognition accuracy [25,26]. AI-based approaches have also been applied for optimizing image classification in tourism marketing and recommendation systems [27,28].

Despite these advancements, many existing approaches are still limited in their ability to generalize across diverse geographic regions and cultural contexts, as highlighted by [2,3]. Furthermore, variations in lighting and occlusions remain significant challenges for accurate landmark classification [16,17]. These limitations directly motivate the need for improved data diversity and robust preprocessing strategies, as proposed in our study.

2.2. Ensemble Learning for Robust Image Classification

Ensemble learning has emerged as an effective strategy for improving deep learning model performance by combining predictions from multiple models. Techniques such as bagging, boosting, and multi-epoch ensembling have been used in image classification tasks to enhance accuracy and stability [29,30].

Recent studies have demonstrated that ensemble-based deep learning models outperform single-model approaches in tourist image classification by leveraging diverse feature representations [6,31]. In particular, multi-epoch integration methods have been applied to optimize classification performance, as models trained across different epochs capture complementary features [21,32]. The use of hybrid ensembling, which combines different architectures such as CNNs and transformers, has also gained attention for its superior performance in complex classification tasks [16,33].

AI-based ensemble models have been effectively utilized for sustainable tourism applications, including predictive analytics for visitor demand forecasting and congestion control [7,34]. Furthermore, the integration of deep learning models with remote sensing technologies has improved large-scale landscape and cultural heritage classification efforts [35,36]. Sentiment analysis of social media data using AI-driven techniques has further contributed to optimizing tourism experiences and decision-making [37,38].

While ensemble learning has been shown to improve classification performance across various domains [20,21], limited studies have explored its integration with lightweight YOLO-based architectures for landmark recognition. This gap motivates our adoption of a multi-epoch YOLO11 ensemble strategy to improve robustness and accuracy in tourist destination classification.

2.3. Smart Tourism and AI-Driven Destination Recognition

The integration of AI and smart tourism has enhanced visitor experiences through automated recommendations and real-time analytics. AI-powered tourism recommendation systems leverage deep learning models to analyze user preferences, enabling personalized destination suggestions [8,9]. Augmented reality applications utilizing object detection models have further improved interactive tourism experiences by overlaying real-time information onto historical sites and landmarks [18,19].

In addition, big data analytics combined with deep learning has been applied in tourism management for demand forecasting and congestion control [39,40]. IoT-driven smart tourism applications have also been developed for real-time visitor tracking and destination sustainability management [10,41]. AI-powered sentiment analysis of social media content has further contributed to optimizing tourism marketing strategies [42,43]. AI-driven approaches have also been integrated into crisis management systems for tourism, ensuring resilience against environmental and economic disruptions [44,45].

Despite these advancements, AI-based destination recognition still faces challenges related to scalability and real-world deployment. Limited dataset diversity and computational constraints remain barriers to widespread adoption [46,47]. Researchers continue to explore novel deep learning architectures and optimization techniques to improve the efficiency and effectiveness of tourism classification models [48,49]. The integration of AI in tourism risk management and sustainable development planning has further emphasized the growing importance of data-driven decision-making in this field [11,50].

Although AI-driven destination recognition contributes significantly to smart tourism systems, real-world deployment still faces challenges related to scalability, computational efficiency, and generalization [7,34]. These observations further justify the need for our proposed methodology, which combines efficient YOLO11 architectures with ensemble learning and pixel-enhanced preprocessing to address these concerns.

This study addresses these challenges by introducing a YOLO11-based ensemble learning framework that integrates multi-epoch training and structured dataset curation. The proposed approach enhances classification robustness by leveraging multiple model snapshots, contributing to the advancement of AI-driven tourism applications.

3. Proposed Methodology

The methodology adopted in this study was designed to address the research questions formulated in the Introduction. The overall process consists of the following main stages: (1) data collection from culturally rich tourist destinations; (2) pixel enhancement preprocessing to emphasize high-intensity landmark features; (3) independent training of YOLO11 models on both original and enhanced image sets; (4) the selection of best-performing models from each training pipeline; (5) the integration of model outputs through logit-based ensemble averaging; and (6) performance evaluation using multiple classification metrics. The detailed procedure is illustrated in Figure 2. Each stage is described in detail in the following subsections. Accurate classification of historical landmarks plays a crucial role in enhancing tourism experiences and preserving cultural heritage. Traditional deep learning-based classification methods often struggle with variations in lighting, occlusions, and architectural similarities among landmarks. In this work, we propose an ensemble-based YOLO11 approach that leverages both original and pixel-enhanced image data to improve classification accuracy. Our method consists of systematic dataset preparation, preprocessing for feature enhancement, dual-model training, and logit-based ensembling to generate robust predictions as presented in Figure 2. It should be noted that while ensemble learning and data preprocessing have been studied in various domains, the proposed integration of pixel-level enhancement using high-intensity squaring combined with multi-epoch ensemble training of YOLO11 models is novel in the context of landmark classification. The pixel enhancement technique applied in this work has been newly designed by the authors to emphasize reflective architectural features commonly present in cultural heritage sites, which has not been previously applied in tourist destination recognition. The overall ensemble framework builds upon general ensemble learning principles [20,21], but its specific implementation with YOLO11 models trained on both original and enhanced images represents an original methodological contribution of this study. The step-by-step methodology is described below.

Dataset Preparation:
The dataset used in this study was custom-developed by capturing images of historical sites in Samarkand. Photographs were taken from multiple angles and under varying lighting conditions to ensure a diverse and representative dataset. The dataset was manually labeled into distinct historical landmark classes. Following collection, the dataset was divided into training (80%), validation (10%), and test (10%) subsets to ensure robust model evaluation.
Image Preprocessing:
To enhance image quality and facilitate more effective feature extraction, we applied a custom preprocessing technique in which pixel values exceeding 225 were squared:

$I_{e n h a n c e d} (x, y) = \{\begin{matrix} p {(x, y)}^{2}, & if p (x, y) > 225 \\ p (x, y), & o t h e r w i s e \end{matrix}$

(1)

This approach is motivated by the fact that high-intensity pixels, typically in the 225 to 255 range, often correspond to reflective surfaces or sunlit architectural elements such as domes, tiles, and marble structures, which are common in historical landmarks. These regions, while visually distinctive, may be underrepresented in feature learning due to their narrow dynamic range near the saturation limit.
By squaring these high-value pixels, we nonlinearly amplify subtle variations in brightness, thereby making discriminative features in bright areas more prominent. This enhancement helps the convolutional filters to better capture fine structural details, which is especially important for classifying architecturally similar landmarks under challenging lighting conditions.
Importantly, the transformation minimally affects the overall image distribution, as only a small subset of pixels typically exceeds the threshold. Following enhancement, all images were resized to a uniform resolution of 640×640 pixels and normalized to the [0, 1] range to ensure consistent input for training.
Parallel Model Training:
Two independent YOLO11 models were trained:
-
$M_{E}$ , trained on pixel-enhanced images.
-
$M_{O}$ , trained on original images.
Both models were trained using the same architecture and hyperparameters, ensuring a fair comparison. The best epoch was selected based on validation accuracy and stored as the final model checkpoint.
Logit Extraction and Ensemble Strategy:
Once both models were trained, logits were extracted from the validation set. The logits, which represent raw class confidence scores before softmax normalization, were then ensembled using an averaging approach:

$z^{(e n s)} = \frac{z^{(E)} + z^{(O)}}{2},$

where $z^{(E)}$ and $z^{(O)}$ denote the logits from $M_{E}$ and $M_{O}$ , respectively. The final class prediction for each image was determined by selecting the class with the highest logit value.
Evaluation Metrics:
The performance of the ensemble model was evaluated using standard classification metrics:

$A c c u r a c y = \frac{T P + T N}{T P + T N + F P + F N}$

$P r e c i s i o n = \frac{T P}{T P + F P}$

$R e c a l l = \frac{T P}{T P + F N}$

$F 1 - s c o r e = 2 \times \frac{P r e c i s i o n \times R e c a l l}{P r e c i s i o n + R e c a l l}$

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. These metrics ensure a comprehensive assessment of model accuracy and reliability. (See Algorithm 1).

Algorithm 1 YOLO11-Based Ensemble Model with Pixel-Enhanced Training for Smart Tourism Landmark Recognition

Require:: Samarkand dataset $D = {(x_{i}, y_{i})}_{i = 1}^{N}$ , where $x_{i}$ are images and $y_{i}$ are labels.
Ensure:: Trained ensemble model E for historical place classification.
1:: Data Preprocessing:
2:: Resize all images to $640 \times 640$ pixels.
3:: Normalize images using

$x_{i}^{'} = \frac{x_{i} - μ}{σ}, x_{i}^{'} \leftarrow resize (x_{i}^{'}, 640, 640)$
4:: Apply pixel enhancement for images: if pixel value $p > 225$ , then $p = p^{2}$ .
5:: Train YOLO11 on Enhanced Images:
6:: Initialize YOLO11 model $M_{E}$ .
7:: for $e p o c h = 1$ to N do
8:: Train $M_{E}$ on enhanced dataset $D_{t r a i n}^{e n h a n c e d}$ .
9:: Evaluate accuracy on validation set $V$ .
10:: if accuracy is highest then
11:: Save best-performing model $M_{E}^{*}$ .
12:: Train YOLO11 on Original Images:
13:: Initialize YOLO11 model $M_{O}$ .
14:: for $e p o c h = 1$ to N do
15:: Train $M_{O}$ on original dataset $D_{t r a i n}^{o r i g i n a l}$ .
16:: Evaluate accuracy on validation set $V$ .
17:: if accuracy is highest then
18:: Save best-performing model $M_{O}^{*}$ .
19:: Logit Extraction:
20:: for each image x in validation set $V$ do
21:: Obtain logits from both models:

$z^{(E)} = M_{E}^{*} (x), z^{(O)} = M_{O}^{*} (x)$
22:: Logit Ensemble Strategy:
23:: for each image x in validation and test sets do
24:: Compute ensemble logits:

$z^{(e n s)} = \frac{z^{(E)} + z^{(O)}}{2}$
25:: Determine predicted class:

$y_{p r e d} = arg max (z^{(e n s)})$
26:: Evaluate Ensemble:
27:: Evaluate ensemble model E on test set $T$ using

$Accuracy, Precision, Recall, F 1 - score$
28:: return Performance metrics of ensemble model E.

The proposed approach offers several advantages that are substantiated by the experimental results presented in Section 4. First, the ensemble strategy improves classification accuracy, as demonstrated by the significant performance gains over baseline models such as MobileNetV3, ResNet50, EfficientNetB0, and single YOLO11 models (see Table 1). This improvement is consistent with previous ensemble learning studies, which report enhanced model stability and accuracy through the integration of diverse feature representations [20,21]. Second, the dual-model training incorporating pixel-enhanced images improves robustness to environmental variations such as lighting and occlusions, which are well-documented challenges in the literature [16,17]. Third, the logit-based ensembling approach provides computational efficiency at inference time, avoiding additional postprocessing steps while leveraging the complementary strengths of each model. The demonstrated improvements across accuracy, precision, recall, and F1-score validate the practical advantages of the proposed methodology in smart tourism scenarios.

By addressing existing challenges in landmark classification, this methodology lays a foundation for further advancements in the field of cultural heritage preservation and AI-driven tourism applications. The improved classification accuracy, robustness against environmental variability, optimized inference speed, and scalability of the system make it a valuable tool for automated landmark recognition and smart tourism solutions.

4. Experiments

To validate the effectiveness of the proposed ensemble model for tourist landmark recognition, a series of controlled experiments were conducted using the curated Samarkand dataset. The experiments aim to assess the classification performance of individual YOLO11 models trained on both original and pixel-enhanced images, as well as their combination through a multi-epoch ensemble strategy. Key evaluation metrics such as accuracy, precision, recall, and F1-score were employed to provide a comprehensive analysis of model performance. In this section, we describe the experimental configuration, dataset characteristics, training strategy, and the results obtained, followed by a comparative analysis with baseline methods and a discussion on the model’s robustness under varying visual conditions.

4.1. Dataset

The Samarkand dataset (https://www.kaggle.com/datasets/ulugbekhudayberdiev/photos-of-touristic-places-in-samarkand; accessed on 10 March 2025) is a curated collection of images designed for the classification of historical landmarks in Samarkand, one of the world’s most culturally rich and ancient cities. It consists of a total of 1054 images, divided into 946 training images and 108 validation images, providing a structured and balanced foundation for model development and evaluation. The dataset includes images from nine distinct classes representing major landmarks: Bibikhonim, Gur Emir, Ruhobod, Sherdor, Tillakori Madrasasi, Ulugbek Rasadxonasi, Ulugbek Madrasasi, Xizr Majmuasi, and Shokhi Zinda. To ensure compatibility with deep learning frameworks, all images were resized to 640 × 640 resolution and their pixel values normalized to the range [0, 1] to promote training stability and convergence. Additionally, we created an enhanced version of the dataset by applying a pixel-level transformation in which all pixel values greater than 225 were squared. This preprocessing step was designed to amplify high-intensity regions, often corresponding to reflective architectural features, allowing the model to capture more discriminative visual cues. Both the original and enhanced datasets were used in training parallel YOLO11n-cls models to evaluate the impact of pixel-level enhancement on classification performance. The dataset was split into training and validation sets using a 90:10 ratio, ensuring robust model evaluation while minimizing overfitting. Figure 3 displays representative samples from both the original and enhanced versions of the dataset, highlighting the visual distinctions introduced by the preprocessing strategy.

4.2. Baseline Models

To establish a comprehensive benchmark for the performance of our proposed YOLO11-based ensemble learning model, we selected several well-known deep learning architectures due to their proven capabilities in image classification and object detection tasks. These baseline models include MobileNetV3, EfficientNetB0, ResNet50, and YOLO11N. Below, we briefly discuss each model’s architecture and its relevance to our study:

MobileNetV3 [51]—Developed by Google, MobileNetV3 is optimized for mobile devices and emphasizes a balance between latency and accuracy. Its architecture incorporates efficient building blocks like depthwise separable convolutions and is enhanced with architecture search techniques and a novel activation function, h-swish. This model is particularly suitable for real-time applications and has shown effectiveness in image classification tasks involving constrained computational resources.
EfficientNetB0 [52]—As a part of the EfficientNet family, EfficientNetB0 scales uniformly at the dimension of depth, width, and resolution with a compound coefficient. Introduced to structure the scaling of CNNs for better efficiency and accuracy, it achieves higher accuracy with fewer parameters, making it ideal for handling diverse and complex datasets like tourist destination images.
ResNet50 [53]—A member of the Residual Networks family, ResNet50 features “skip connections” that facilitate the training of much deeper networks by addressing the vanishing gradient problem. This architecture improves the classification performance significantly on large-scale image datasets.
YOLO11N [12]—An extension of the YOLO (You Only Look Once) family, YOLO11N is tailored for object detection with a focus on balancing speed and accuracy. It is capable of detecting objects in real time, making it highly suitable for applications like tourist destination recognition where quick and efficient processing of visual information is required.

These baseline models were selected based on their architectural diversity and proven track record in handling complex image classification and object detection tasks. They provide a robust framework for comparing the performance of our proposed ensemble model, ensuring that any observed improvements can be attributed to the novel aspects of our approach, including multi-epoch training and optimized data preprocessing.

4.3. Training Setup

To train the YOLO11n-cls-based ensemble model for tourist destination classification, we utilized a high-performance computing environment consisting of an NVIDIA GeForce RTX 3090 GPU with 24 GB VRAM, leveraging CUDA 11.2 for accelerated training, an Intel Core i9-11900K processor for efficient data preprocessing, and 128 GB of RAM to handle the full Samarkand dataset without memory constraints. The YOLO11n-cls models were trained using a batch size of 16 with an input image resolution of 640 × 640 pixels, employing the Cross-Entropy Loss function for classification. We used the Adam optimizer with an initial learning rate of 0.001, incorporating a cosine learning rate scheduler and warm-up strategy to improve convergence stability. Early stopping and model checkpointing were based on validation accuracy, with only the best-performing epoch being preserved for final ensembling. This setup enabled robust model training across original and enhanced datasets while maintaining computational efficiency and scalability. To ensure reproducibility, we provide additional details on the training configuration. Both the original and enhanced YOLO11n-cls models were trained independently for a maximum of 100 epochs. Early stopping with a patience of 10 epochs was applied based on validation accuracy to prevent overfitting. Data augmentation techniques such as random horizontal flips and slight rotations (±10 degrees) were applied during training to enhance model generalization. A fixed random seed (42) was used to ensure consistent dataset splitting and training reproducibility. All experiments were conducted on the same hardware environment to minimize computational variability. The best-performing epoch for each model, as determined by the highest validation accuracy, was preserved for ensemble integration.

4.4. Experimental Results and Discussion

This subsection presents the results of our experiments conducted to evaluate the effectiveness of the proposed YOLO11-based ensemble learning model for tourist destination classification. The experiments were designed to test the accuracy, precision, recall, and F1-score of the model on both original and enhanced image datasets.

The ensemble model’s performance was evaluated on a curated dataset comprising images of significant tourist attractions in Samarkand. The dataset was split into training, validation, and test subsets, with the model trained separately on original and enhanced images. Performance metrics were calculated for both the original model (

M_{O}

) and the ensemble model, combining logits from both

M_{O}

and

M_{E}

.

Table 1 summarizes the performance metrics obtained for the ensemble model compared to individual models and baseline architectures. The proposed ensemble model demonstrated superior performance, achieving a remarkable accuracy of 99.07%, precision of 99.15%, recall of 99.21%, and an F1-score of 99.14%. In contrast, individual models and baseline models performed with lower metrics, underscoring the effectiveness of the ensemble approach in dealing with complex classification tasks.

In analyzing the performance metrics presented in Table 1, the proposed YOLO11-based ensemble model outperforms all baseline models across accuracy, precision, recall, and F1-score. Notably, it achieves a +0.92% accuracy improvement over the standalone YOLO11n model and a +7.41% improvement over MobileNetV3, which, although optimized for lightweight deployment, demonstrates reduced classification fidelity on complex visual patterns. The superior performance of the ensemble model is attributed to its integration of complementary representations: one model emphasizes high-intensity architectural features through pixel enhancement, while the other preserves the global visual context.

The YOLO11n model alone already outperforms EfficientNetB0 and ResNet50, underscoring its robustness in object localization and spatial awareness—key characteristics in landmark classification. YOLO architectures are inherently suited for such tasks due to their real-time detection capability, anchor-based localization, and fine-grained feature pyramid design. These properties allow the model to better handle intra-class variability, such as occlusions and differences in architectural symmetry, which are prevalent in tourism landmark imagery.

The ensemble further amplifies these advantages by averaging logits from both training streams, thereby reducing prediction variance and capturing subtle but class-discriminative features missed by individual models. This hybrid approach results in fewer misclassifications, particularly under challenging lighting conditions or partial occlusions, as also visualized in the confusion matrices (Figure 4). The clear margin of improvement suggests that ensembling not only improves quantitative performance but also enhances model generalizability and reliability in real-world applications.

Figure 4 depicts the confusion matrices for the original model and the enhanced model. The matrices illustrate the number of correct predictions (diagonal elements) against misclassifications (off-diagonal elements). Analysis of these matrices shows that the enhanced model tends to have fewer misclassifications across most classes, particularly in classes where the original model struggled, indicating the benefits of image enhancement techniques in improving classification accuracy.

Figure 5 showcases the prediction samples for the original and enhanced models. The top row displays the predicted labels, and the bottom row shows the true labels for comparison. Visual inspection of these results reveals that the enhanced model provides clearer and more accurate predictions, particularly in cases where the original model misclassified due to poor lighting conditions or occlusions. This visual evidence supports the quantitative findings and highlights the practical improvements achieved through our proposed enhancements.

The experimental results presented in this study clearly validate the effectiveness of the proposed contributions. The pixel enhancement preprocessing method successfully emphasized high-intensity architectural features, leading to improved robustness under varying environmental conditions, as demonstrated by higher precision and recall scores compared to baseline models. Additionally, the multi-epoch ensemble learning strategy, which integrates YOLO11 models trained on both original and enhanced datasets, consistently outperformed individual models such as MobileNetV3, EfficientNetB0, ResNet50, and single YOLO11 models across all evaluation metrics. These improvements directly confirm the advantages of the proposed framework in addressing dataset diversity limitations, enhancing robustness to environmental variability, and leveraging ensemble learning to improve classification performance. The results further demonstrate the applicability of the proposed method for real-world smart tourism applications, particularly in diverse and culturally rich destinations such as Samarkand.

The results obtained from the experimental evaluations demonstrate the effectiveness of integrating multiple YOLO11 models trained on distinct datasets. The ensemble approach effectively mitigates the weaknesses inherent in individual models by leveraging diverse data representations and learning epochs, leading to a significant increase in classification performance. The superior performance of the enhanced model also suggests that our preprocessing techniques, which include dynamic range adjustments and noise reduction, contribute significantly to improving the model’s robustness and accuracy. These techniques help in accentuating important features in the images that are crucial for the classification of complex and varied tourist sites. Additionally, the scalability of our approach is evident from the consistent performance across different classes and varied environmental conditions, making it suitable for real-world applications in smart tourism systems.

4.5. Limitations and Future Work

Despite the promising results achieved by the proposed YOLO11-based ensemble model, several limitations warrant discussion:

Dataset Scope and Generalizability: The current study focuses solely on historical landmarks in Samarkand. While the curated dataset ensures high intra-class diversity, the model’s ability to generalize to different geographic or architectural contexts remains untested. Cross-city validation is necessary to evaluate broader applicability.
Computational Complexity: The ensemble approach involves training two independent YOLO11 models, which increases training time and resource consumption. Although inference remains efficient due to logit-level fusion, the training phase may not be feasible for low-resource environments.
Environmental Limitations: The dataset does not comprehensively cover extreme lighting conditions (e.g., nighttime scenes) or severe occlusions (e.g., crowds). Additional robustness testing under such conditions is needed to confirm deployment readiness.
Lack of Real-Time Field Testing: While model performance on curated data is encouraging, the system has not yet been evaluated in a live smart tourism setting. Future studies should integrate the model into mobile applications or AR systems to assess real-time effectiveness and user experience.

In future work, we plan to expand the dataset to include multiple cities and architectural styles, explore lightweight model variants for mobile deployment, and conduct user-centered evaluations in real-world tourism environments.

The proposed YOLO11-based ensemble learning model demonstrates a significant improvement in the automated classification of tourist destinations. By integrating models trained on original and enhanced datasets, our approach achieves high accuracy and robustness, making it a valuable tool for enhancing smart tourism applications and supporting cultural heritage preservation through advanced AI-driven technologies.

5. Conclusions

This study has demonstrated the efficacy of a YOLO11-based ensemble learning model optimized for the classification of tourist destinations using both original and enhanced image datasets. By employing a multi-epoch integration approach and leveraging a structured dataset, the proposed model has shown significant improvements in accuracy, precision, recall, and F1-score over conventional models.

The ensemble model effectively combined the strengths of models trained on original and enhanced datasets, showcasing robustness against various challenges such as environmental variations and occlusions. This approach not only improved classification accuracy but also ensured that the model was adaptable to different lighting conditions and architectural complexities.

The key contributions of this research include

The development of a robust ensemble learning framework that integrates multiple deep learning models to enhance classification performance;
The implementation of advanced image preprocessing techniques that optimize feature extraction and improve model reliability;
Extensive evaluation of the model’s performance, demonstrating its superiority over traditional single-model approaches in the context of smart tourism applications.

The findings of this study have significant implications for smart tourism, particularly in enhancing real-time landmark recognition, which can enrich the tourist experience and aid in the management of tourist flows. The integration of this model into smart tourism applications can facilitate more personalized and interactive experiences for tourists, contributing to the sustainable management of tourism sites.

Despite the promising results, there are several avenues for future research:

Exploring the application of the model in different geographic and cultural contexts to further validate its effectiveness and adaptability.
Investigating the integration of additional modalities such as textual or audio data to enhance the classification capabilities of the model.
Developing more computationally efficient models to enable real-time processing on mobile devices, thus expanding the practical applications of this research in the field of mobile tourism.

In conclusion, this study contributes to the fields of AI and smart tourism by demonstrating the potential of deep learning models, particularly YOLO11 ensembles, in improving the accuracy and robustness of tourist destination classification. These advancements are crucial for the development of intelligent systems that enhance both the tourist experience and the sustainability of tourism practices.

Author Contributions

Methodology, U.H.; Software, U.H.; Writing—original draft, U.H.; Writing—review & editing, J.L.; Visualization, U.H.; Supervision, J.L.; Project administration, J.L.; Funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Chungbuk National University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Jabbari, M.; Amini, M.; Malekinezhad, H.; Berahmand, Z. Improving augmented reality with the help of deep learning methods in the tourism industry. Math. Comput. Sci. 2023, 4, 33–45. [Google Scholar]
Pencarelli, T. The digital revolution in the travel and tourism industry. Inf. Technol. Tour. 2020, 22, 455–476. [Google Scholar] [CrossRef]
Guerrero-Rodríguez, R.; Álvarez-Carmona, M.Á.; Aranda, R.; Díaz-Pacheco, Á. Big data analytics of online news to explore destination image using a comprehensive deep-learning approach: A case from Mexico. Inf. Technol. Tour. 2024, 26, 147–182. [Google Scholar] [CrossRef]
Melo, M.; Coelho, H.; Gonçalves, G.; Losada, N.; Jorge, F.; Teixeira, M.S.; Bessa, M. Immersive multisensory virtual reality technologies for virtual tourism. Multimed. Syst. 2022, 28, 1027–1037. [Google Scholar] [CrossRef]
Bhosale, T.A.; Pushkar, S. IWF-ECTIC: Improved Wiener filtering and ensemble of classification model for tourism image classification. Multimed. Tools Appl. 2024. [Google Scholar] [CrossRef]
Tussyadiah, A. Intelligent automation systems in tourism. Tour. Manag. 2023, 77, 254–266. [Google Scholar]
Li, Z.; Gao, S.; Chen, W. Integrating IoT and AI for tourism management. Tour. Technol. 2023, 12, 88–99. [Google Scholar]
Wang, P.; Jiang, Y.; Li, X. Smart technologies for sustainable tourism. Tour. Dev. Rev. 2023, 24, 115–127. [Google Scholar]
He, Q.; Wu, J.; Zhang, L. Integrating sustainability metrics in tourism management. Sustain. Tour. 2023, 9, 200–214. [Google Scholar]
Zhang, X.; Wei, L.; Li, X. Sentiment analysis of social media data for tourism marketing. J. Soc. Media Tour. 2023, 12, 110–122. [Google Scholar]
Zhang, L.; Huang, S.; Li, T. The role of data analytics in tourism decision-making. J. Tour. Anal. 2023, 12, 125–139. [Google Scholar]
Jocher, G.; Qiu, J. Ultralytics YOLO11: Real-Time Object Detection Model (Version 11.0.0). Ultralytics. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 15 April 2025).
Viyanon, W. An Interactive Multiplayer Mobile Application Using Feature Detection and Matching for Tourism Promotion. In Proceedings of the 2nd International Conference on Control and Computer Vision, Jeju Island, Republic of Korea, 15–18 June 2019; pp. 82–86. [Google Scholar]
Bui, V.; Alaei, A. Virtual reality in training artificial intelligence-based systems: A case study of fall detection. Multimed. Tools Appl. 2022, 81, 32625–32642. [Google Scholar] [CrossRef]
Carneiro, A.; Nascimento, L.S.; Noernberg, M.A.; Hara, C.S.; Pozo, A.T.R. Social media image classification for jellyfish monitoring. Aquat. Ecol. 2024, 58, 3–15. [Google Scholar] [CrossRef]
Yao, J.; Chu, Y.; Xiang, X.; Huang, B.; Xiaoli, W. Research on detection and classification of traffic signs with data augmentation. Multimed. Tools Appl. 2023, 82, 38875–38899. [Google Scholar] [CrossRef]
Ma, H. Development of a smart tourism service system based on the Internet of Things and machine learning. J. Supercomput. 2024, 80, 6725–6745. [Google Scholar] [CrossRef]
Wu, W.; Liu, H.; Zhang, X. AI in tourism marketing: Improving decision-making. J. AI Tour. 2023, 10, 65–79. [Google Scholar]
Liu, H.; Tsionas, M.; Assaf, P. Personalized tourism experiences with AI. AI Appl. Tour. 2023, 13, 34–46. [Google Scholar]
Hao, Y.; Zheng, L. Application of SLAM method in big data rural tourism management in dynamic scenes. Soft Comput. 2023. [Google Scholar] [CrossRef]
Schorr, J.L.; Bhattacharya, P.; Yan, D. Predictive analytics for demand forecasting in tourism. Tour. Econ. 2024, 32, 103–118. [Google Scholar]
Pingdong, H. Application of optical imaging detection based on embedded edge computing in the evaluation of forest park tourism resources. Opt. Quantum Electron. 2024, 56, 642. [Google Scholar] [CrossRef]
Lee, C.-Y.; Khanum, A.; Kumar, P.P. Multi-food detection using a modified Swin-Transformer with recursive feature pyramid network. Multimed. Tools Appl. 2024, 83, 57731–57757. [Google Scholar] [CrossRef]
Martín-Rojo, I.; Gaspar-González, A.I. The impact of social changes on MICE tourism management in the age of digitalization: A bibliometric review. Rev. Manag. Sci. 2024, 18, 1–24. [Google Scholar] [CrossRef]
Dang, Q.M.; Truong, M.T.; Dang, T.L. A lightweight approach for image quality assessment. Signal Image Video Process. 2024, 18, 6761–6768. [Google Scholar] [CrossRef]
Fadli, H.; Ibrahim, R.; Arshad, H.; Yaacob, S. Augmented reality in cultural heritage tourism: A review of past study. Open Int. J. Inf. 2022, 10, 109–121. [Google Scholar]
Aicardi, I.; Chiabrando, F.; Lingua, A.M.; Noardo, F. Recent trends in cultural heritage 3D survey: The photogrammetric computer vision approach. J. Cult. Herit. 2018, 32, 257–266. [Google Scholar] [CrossRef]
Patel, K.; Parmar, B. Assistive device using computer vision and image processing for visually impaired; review and current status. Disabil. Rehabil. Assist. Technol. 2020, 16, 115–125. [Google Scholar] [CrossRef]
Budrionis, A.; Plikynas, D.; Daniušis, P.; Indrulionis, A. Smartphone-based computer vision travelling aids for blind and visually impaired individuals: A systematic review. Assist. Technol. 2020, 34, 178–194. [Google Scholar] [CrossRef]
Loi, K.I.; Kong, W.H. Tourism for all: Challenges and issues faced by people with vision impairment. Tour. Plan. Dev. 2017, 14, 181–197. [Google Scholar] [CrossRef]
Ivanov, P.; Webster, A.; Lee, C. The role of robotic systems in enhancing customer experience at tourism venues. Tour. Hosp. Res. 2023, 19, 105–121. [Google Scholar]
Assaf, A.G.; Josiassen, E.T.; Tsionas, M. Optimizing tourism operations using big data. J. Hosp. Tour. Manag. 2023, 23, 45–58. [Google Scholar]
Feng, L.; Liu, H.; Zhang, X. Analyzing tourist preferences with big data analytics. Tour. Res. J. 2023, 25, 120–135. [Google Scholar]
Zhao, Y.; Wu, J.; Li, C. Data-driven technologies in tourism management. Tour. Rev. 2023, 29, 101–113. [Google Scholar]
Zhou, Y.; Xie, W.; Wang, M. Mobile applications for enhancing tourism experiences. J. Travel Technol. 2023, 15, 45–56. [Google Scholar]
Maier, A.; Hill, M.D.S.; Tseng, T.C.F. Mobile AR applications for educational tourism. Tour. Technol. Innov. 2023, 19, 75–88. [Google Scholar]
Backer, A.R.; Ritchie, B.W. Augmented reality in tourism marketing. J. Mark. Tour. 2023, 16, 140–156. [Google Scholar]
Faulkner, A.; Tsionas, M. Sustainable practices in cultural heritage tourism. Tour. Sustain. J. 2023, 11, 123–135. [Google Scholar]
Tsionas, M.; Assaf, A.G. AI for tourism service quality assessment. Tour. Serv. Manag. J. 2023, 14, 48–60. [Google Scholar]
Liu, T.; Guo, S.; Chen, Q. Predicting and managing tourism crises with AI. Crisis Manag. Tour. 2023, 18, 90–102. [Google Scholar]
Tsionas, M.; Assaf, A.G. Frontier methods in performance modeling for tourism. Tour. Perform. J. 2023, 15, 45–60. [Google Scholar]
He, Q.; Zhang, L.; Ma, X. Using deep learning for tourism demand forecasting. Tour. Forecast. Anal. 2023, 18, 200–213. [Google Scholar]
Zhang, X.; Wu, J.; Zhao, Q. Optimizing tourism resource management using AI. Tour. Resour. Manag. 2023, 12, 80–93. [Google Scholar]
Li, L.; Liu, H.; Zhao, Q. AI-driven decision-making for tourism marketing. Tour. Mark. J. 2023, 30, 120–134. [Google Scholar]
Liu, Y.; Chen, T.; Zhang, X. Risk management in tourism with AI. Tour. Risk Manag. 2023, 22, 134–146. [Google Scholar]
Ma, X.; Wang, Y.; Xie, J. AI and IoT in smart tourism cities. Smart Tour. City J. 2023, 16, 101–115. [Google Scholar]
Zhou, W.; Li, L.; Zhang, X. Improving tourist engagement using AI. Tour. Engagem. J. 2023, 17, 55–67. [Google Scholar]
Wu, J.; Zhang, W.; Li, S. AI and big data in tourism crisis management. Tour. Crisis Manag. 2023, 14, 80–91. [Google Scholar]
Zhao, Y.; Li, Q.; Liu, H. Enhancing tourist experiences through AI-driven services. J. Hosp. Technol. 2023, 18, 100–115. [Google Scholar]
Liu, H.; Zhang, Y.; Wu, X. Smart tourism development through AI and IoT. Tour. Innov. J. 2023, 16, 90–103. [Google Scholar]
Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6105–6114. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]

Figure 1. Overall workflow of proposed YOLO11-based ensemble model for smart tourism landmark classification.

Figure 2. Illustration of proposed method.

Figure 3. Sample images from the original Samarkand dataset and the enhanced Samarkand dataset.

Figure 4. Confusion matrices for ensemble members. (a) Confusion matrix for the original model. (b) Confusion matrix for the enhanced model.

Figure 5. Comparison of prediction samples for the original and enhanced models. Top row: predicted labels. Bottom row: true labels. (a) Original dataset with YOLO11n—predicted label. (b) Enhanced dataset with YOLO11n—predicted label. (c) Original dataset with YOLO11n—true label. (d) Enhanced dataset with YOLO11n—true label.

Table 1. Performance comparison of baseline models and proposed model for validation set of Samarkand dataset.

Model	Accuracy	Precision	Recall	F1-Score
MobileNet_V3	0.9167	0.9198	0.9167	0.9164
ResNet50	0.8889	0.8990	0.8889	0.8849
EfficientNet_B0	0.9444	0.9475	0.9444	0.9443
YOLO11n-cls	9815	0.9814	0.9820	0.9813
Proposed Model	0.9907	0.9915	0.9921	0.9914

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Hudayberdiev, U.; Lee, J. An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models. Sustainability 2025, 17, 5420. https://doi.org/10.3390/su17125420

AMA Style

Hudayberdiev U, Lee J. An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models. Sustainability. 2025; 17(12):5420. https://doi.org/10.3390/su17125420

Chicago/Turabian Style

Hudayberdiev, Ulugbek, and Junyeong Lee. 2025. "An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models" Sustainability 17, no. 12: 5420. https://doi.org/10.3390/su17125420

APA Style

Hudayberdiev, U., & Lee, J. (2025). An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models. Sustainability, 17(12), 5420. https://doi.org/10.3390/su17125420

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

An Ensemble Deep Learning Framework for Smart Tourism Landmark Recognition Using Pixel-Enhanced YOLO11 Models

Abstract

1. Introduction

2. Related Works

2.1. Deep Learning for Tourist Destination Classification

2.2. Ensemble Learning for Robust Image Classification

2.3. Smart Tourism and AI-Driven Destination Recognition

3. Proposed Methodology

4. Experiments

4.1. Dataset

4.2. Baseline Models

4.3. Training Setup

4.4. Experimental Results and Discussion

4.5. Limitations and Future Work

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI