Image-Based Analysis of Tourist Destination Perceptions: A Deep Learning and Spatial–Temporal Study in Slovenia

Paliska, Dejan; Brezovec, Aleksandra; Sedmak, Gorazd

doi:10.3390/tourhosp7020052

Open Access Article

by Dejan Paliska *, Aleksandra Brezovec and Gorazd Sedmak

Faculty of Tourism Studies – Turistica, University of Primorska, Titov Trg 1, 6000 Koper, Slovenia

* Author to whom correspondence should be addressed.

Tour. Hosp. 2026, 7(2), 52; https://doi.org/10.3390/tourhosp7020052

Submission received: 5 January 2026 / Revised: 6 February 2026 / Accepted: 12 February 2026 / Published: 17 February 2026

(This article belongs to the Special Issue Sustainability of Tourism Destinations)

Abstract

In the context of fierce competition among tourist destinations and increasing difficulty of differentiation, developing a strong destination image is particularly important. A comprehensive understanding of how tourists perceive destinations through user-generated images can help destination management organizations (DMOs) design more effective marketing strategies. This is especially relevant for destinations with spatially and temporally dispersed tourism resources and strong seasonal dynamics. This paper analyses inbound tourist photographs by combining deep learning techniques with spatial analysis to examine the spatial and temporal distribution of photo scenes and shifts in scene preferences among tourists. The study focuses on three distinct types of destinations in Slovenia—urban (Ljubljana), nature-based/alpine (Bled), and coastal (Piran, Izola, Koper)—providing insights into how image-based spatial scene analysis can inform destination marketing strategies. The results reveal significant spatial and temporal heterogeneity of scenes across micro destinations. Nature-based destinations exhibit lower topic entropy and fewer topic changes per user, whereas urban destinations show higher variability, with users changing topics on average five times per day. Seasonal effects are moderate: nature-based destinations display lower topic entropy in winter and higher in autumn and spring, coastal destinations show less pronounced seasonal variation, and urban destinations show almost none. These findings provide valuable insights into the spatial and temporal distribution of tourist interests and offer practical guidance for DMOs in strategic marketing planning.

Keywords:

tourist destination image; user-generated content; deep learning; spatial-temporal analysis; destination marketing strategy

1. Introduction

Destination image is a foundational construct in tourism research, shaping destination choice, satisfaction, and post-visit behavior. Traditionally, destination images have been formed through a combination of projected images produced by destination marketing organizations (DMOs) and organic images emerging from media representations and word-of-mouth communication. The proliferation of social media has altered this balance by enabling tourists to actively produce and disseminate destination representations through user-generated content (UGC), particularly photographs, thereby influencing how destinations are perceived and evaluated by broader audiences (Deng et al., 2019).

From a theoretical perspective, destination image is increasingly conceptualized as a socially constructed and dynamic phenomenon arising from interactions between institutional actors and tourists (Marine-Roig & Clavé, 2016; Tasci & Gartner, 2007). While DMOs continue to strategically project destination images, tourists’ visual self-presentations on social media contribute to the reinterpretation, reinforcement, or contestation of these narratives (Lo & McKercher, 2015; Mak, 2017). Such representations may also expose negative aspects of the tourism experience, including crowding or environmental degradation, particularly in destinations experiencing overtourism (Yan et al., 2023).

Despite extensive conceptual development, empirical research on destination image continues to rely heavily on surveys, interviews, and aggregated tourism statistics. These approaches are limited in their ability to capture fine-grained spatial variation and temporal dynamics within destinations (Leung et al., 2013; J. Li et al., 2018; X. Wang et al., 2024).

Visual UGC offers a valuable empirical lens for addressing these limitations, as tourist photographs encode cognitive and affective dimensions of destination image while preserving spatial and temporal specificity (Deng & Li, 2018; Huang et al., 2021; Yoo & Kang, 2025). Recent advances in deep learning and computer vision enable the systematic analysis of such data at scale, facilitating more nuanced examination of destination image structure and variation (Nixon, 2024; Zhou et al., 2018).

Beyond its dimensional structure, destination image is internally structured through hierarchical relationships among image elements and their connections. Research grounded in memory-network theory suggests that tourists’ perceptions tend to cluster around a limited set of highly salient associations (the image “core”), with more peripheral elements recalled less consistently (Lai & Li, 2012; Y. Wang et al., 2018). Moreover, destination image formation is embedded within spatial and temporal contexts. Tourists’ perceptions and behaviors vary across seasons, events, and stages of destination development. Longitudinal analyses of geotagged UGC reveal evolving destination images that reflect shifting tourist attention, emotional engagement, and spatial activity patterns (Encalada-Abarca et al., 2024; Jin et al., 2018). Temporal analyses further demonstrate seasonal and hourly rhythms in activities and affective responses, offering insights relevant for demand management and marketing strategy (Xiao et al., 2022; Yan et al., 2023).

Spatial configuration also plays a critical role in image coherence. As destinations evolve and commercialize, tourism intensity and attraction distribution change, embedding image formation within the physical and functional landscape (X. Wang et al., 2024). Centrally concentrated attractions tend to support more coherent destination images, whereas dispersed attraction patterns are associated with fragmented perceptions (ibid.). Differences in image coherence pose challenges for destination governance, particularly in contexts where branding and management responsibilities are distributed across institutional levels.

UGC, particularly tourist photographs, provides a uniquely rich data source for examining these processes. Visual UGC captures tourists’ emotional states, interests, spatial behavior, and perceptions of attraction complementarity, offering insights beyond those acquired through traditional survey-based approaches (J. Li et al., 2018; Yoo & Kang, 2025). Its high spatial and temporal resolution makes it particularly well suited for analyzing intra-destination heterogeneity and seasonal variation.

Understanding how destination images vary across space and time is particularly important for destinations characterized by multi-level governance structures. Alignment between national tourism organizations (NTOs) and local tourism organizations (LTOs) remains a persistent challenge, as micro destinations often differ substantially in environmental context, heritage assets, and development trajectories (Spyriadis et al., 2013). Insight into spatially differentiated tourist perceptions is therefore essential for coherent destination branding and effective coordination across governance levels (Taecharungroj & Mathayomchan, 2020).

Although UGC-based destination image research has expanded rapidly, three key gaps remain. First, existing studies often rely on aggregated or static representations, limiting insight into intra-destination heterogeneity and seasonal dynamics. Second, deep learning approaches are rarely combined with analyses of image diversity and internal image structure, constraining understanding of how destination images are organized and evolve over time. Third, the implications of spatially differentiated destination images for multi-level destination governance remain underexplored.

Addressing these gaps, this study examines spatial and temporal heterogeneity in visual destination image across three Slovenian micro destinations representing urban, alpine, and coastal typologies. Using deep learning-based semantic analysis of inbound tourists’ geotagged photographs (X. Wang et al., 2024; Xiao et al., 2022), the study captures visual image dimensions at high spatial and temporal resolution. The study makes three contributions. Theoretically, it advances destination image research by conceptualizing image formation as a spatially embedded and temporally dynamic process. Methodologically, it demonstrates how deep learning-based visual analytics can be integrated with diversity and spatiotemporal metrics to analyze intra-destination variation. Managerially, it provides evidence relevant to the coordination of national and local destination branding strategies in multi-level governance contexts (Taecharungroj & Mathayomchan, 2020; Yoo & Kang, 2025).

2. Literature Review

2.1. Destination Image in the Digital Context

Destination image has long been recognized as a central determinant of tourist behavior, influencing destination choice, satisfaction, and loyalty (Crompton, 1979; Echtner & Ritchie, 1991; Gartner, 1994; Gunn, 1972; Pike & Page, 2014). It is commonly conceptualized as a multidimensional construct comprising cognitive, affective, and conative components, reflecting tourists’ beliefs, emotional responses, and behavioral intentions toward a destination (Baloglu & McCleary, 1999; Fakeye & Crompton, 1991). In addition, scholars distinguish between projected images intentionally communicated by DMOs and perceived images formed through visitors’ subjective experiences (Chan & Zhang, 2018; Ferrer-Rosell & Marine-Roig, 2020; Stylidis et al., 2022).

Tourism-related UGC in the digital era not only reflects perceived destination image and visitors’ experiential interpretations (Su et al., 2025), but also actively reshapes destination image formation through participatory media dynamics. Social networking platforms, photo-sharing applications, and online travel communities enable tourists to actively co-construct destination images through UGC (Aboalganam et al., 2025; Hunt, 1975; Khan et al., 2021; Taecharungroj & Mathayomchan, 2020). As tourists share photographs, videos, and narratives, they contribute to the production of tourism imaginaries, reinforcing or contesting institutional representations (Salazar, 2012). This shift from predominantly top-down image projection to bottom-up image co-creation has redefined notions of authenticity, authority, and control in destination marketing.

Photographs play a particularly salient role in this process. Visual representations depict landscapes, cultural symbols, activities, and emotional atmospheres that collectively shape perceived destination image (Deng & Li, 2018; D. Kim et al., 2020; Y. W. Li & Wan, 2025; Nixon, 2024). From a theoretical standpoint, visual content can be mapped onto cognitive dimensions through the recognition of attributes such as landmarks, architecture, and natural features, and onto affective dimensions through emotional cues conveyed via color, composition, and depicted expressions. Accordingly, tourist photography functions not merely as a record of experience, but as an active agent in the construction and circulation of destination images (S. Zhang et al., 2025).

Semiotic and interpretive approaches clarify how destination images are constructed and communicated through photographs, showing that online images operate as social–semiotic representations of place that shape perceived destination image beyond scene recognition (Hunter, 2016). Visual semiotic analysis has been used to reveal how these images function as social-semiotic constructs of place, reflecting and contesting marketed representations (Deng & Li, 2018). Furthermore, theories of visual performativity highlight that photography is not a passive record of experience but an active practice through which tourists produce and negotiate meanings of place (Lo & McKercher, 2015). Integrating these perspectives into destination image research foregrounds how photographs convey symbolic associations and social narratives that extend beyond simple scene classification, offering a richer interpretive lens for understanding UGC as both representational and performative acts within spatial-temporal tourism dynamics.

2.2. Deep Learning Approaches to Visual Destination Image Analysis

Advances in artificial intelligence and computer vision have enabled systematic analysis of large-scale visual UGC in tourism research. Deep learning models, particularly convolutional neural networks (CNNs), have demonstrated high accuracy in image classification, scene recognition, and object detection tasks (Ren et al., 2016; Talebi & Milanfar, 2018; Zhou et al., 2018). Research suggests that CNN-based models capture perceptual patterns that broadly align with aspects of human visual cognition (Kheradpisheh et al., 2016; F. Zhang et al., 2019).

Within tourism studies, CNN-based analyses of UGC have been used to identify visual themes related to destination image, aesthetic quality, and thematic representation (Liu et al., 2016; Tan et al., 2017; X. Wang et al., 2024; Xiao et al., 2022; F. Zhang et al., 2019, 2020, 2021). Empirical applications demonstrate that semantic analysis of tourist photographs can differentiate destination types and reveal systematic variation in perceived attributes across urban, natural, and coastal settings (X. Wang et al., 2024; Xiao et al., 2022). Collectively, this body of research highlights the potential of deep learning to uncover spatial and thematic heterogeneity in tourist perception and to support data-driven destination management (Deng & Li, 2018; Huang et al., 2021; Payntar et al., 2021; Sedmak et al., 2023).

Despite these advances, several limitations persist. Many studies emphasize spatial pattern detection while paying limited attention to temporal dynamics, despite evidence of seasonal variability in tourist perception (Yan et al., 2023). Research has also tended to focus on national-scale analyses or iconic cities, leaving micro destination level variation underexplored. Furthermore, although deep learning models offer scalability and efficiency, their capacity to capture culturally embedded meanings and nuanced affective responses remains constrained, raising concerns regarding interpretive depth and algorithmic bias (X. Wang et al., 2024).

To address these challenges, recent studies advocate integrative frameworks that combine strong theoretical grounding with spatial–temporal sensitivity and methodological rigor. In line with this direction, the present study adopts a hybrid analytical approach integrating the CLIP (Contrastive Language–Image Pretraining) model (Radford et al., 2021) with the Places365 ResNet50 model (Zhou et al., 2018). Vision-language models such as CLIP enable zero-shot, open-vocabulary semantic inference (Gu et al., 2022), while CNNs trained on domain-specific datasets provide structured scene classification. Their integration allows for more precise operationalization of destination image components and facilitates the examination of spatiotemporal variation in visual perception.

2.3. From Perception Insights to Destination Marketing Strategy

Understanding tourists’ visual perceptions has important implications for destination marketing and management. Analyses of visual UGC reveal which attributes attract attention, how tourists emotionally engage with destination features, and where discrepancies emerge between projected and perceived images (Dolnicar & Grün, 2013). Such insights enable DMOs to refine branding narratives, align promotional materials with visitor experiences, and identify underrepresented or misaligned elements of destination identity.

Incorporating spatial and temporal dimensions further enhances strategic relevance. Seasonal variation in visual content can inform context-sensitive marketing campaigns, while spatial concentrations of photographic attention highlight areas experiencing excessive visitation, supporting dispersion strategies and infrastructure planning (S. Zhang et al., 2025). Temporal analyses reveal shifts in visual themes across seasons or times of day, reflecting changing tourist motivations and experiential contexts (Yan et al., 2023). Entropy based diversity measures provide additional insight into whether tourist attention is narrowly concentrated on iconic features or distributed across a broader set of attributes (F. Zhang et al., 2020).

At the same time, the application of AI-driven visual analytics raises ethical considerations related to data privacy, consent, and representational fairness (Aboalganam et al., 2025). Responsible use of visual UGC requires transparency and sensitivity to biases embedded in both data and algorithms. Provided that such concerns are adequately addressed, image-based analytics offer a robust foundation for actionable management and marketing strategies.

3. Material and Methods

3.1. Study Area

Despite growing scholarly interest, few studies integrate deep learning–based visual analysis with spatiotemporal differentiation across diverse destination types. Smaller countries with heterogeneous tourism landscapes, such as Slovenia, provide analytically valuable contexts for examining intra-destination variation under shared national branding frameworks.

This study examines visual destination image across three Slovenian micro destinations representing distinct tourism typologies: urban (Ljubljana), alpine/nature-based (Bled), and coastal (Koper, Izola, and Piran; hereafter KIP). In line with destination image theory conceptualizing image as a socially constructed and spatially embedded phenomenon (Marine-Roig & Clavé, 2016; Tasci & Gartner, 2007), these micro destinations are treated not merely as physical settings but as contexts of image co-creation, where institutional branding strategies intersect with tourists’ situated experiences and visual practices.

For analytical consistency, destinations are delineated according to municipal boundaries, which correspond to the operational jurisdictions of local tourism organizations (LTOs). This delineation reflects the institutional scale at which destination images are formally projected and managed, while also enabling examination of how tourists’ user-generated photographs reinterpret, reinforce, or contest these projected narratives at the micro destination level.

Ljubljana, Slovenia’s capital and largest city, has approximately 290,000 inhabitants and functions as the country’s cultural, political, and economic center. Its urban tourism image is shaped by a dense concentration of historical landmarks, cultural events, gastronomy, and everyday city life. In 2024, Ljubljana recorded 1,276,455 tourist arrivals and 2,590,898 overnight stays, with international visitors accounting for 94.7% of arrivals and 95.7% of overnight stays (SORS, 2025).

Bled represents Slovenia’s most prominent alpine and nature-based destination. Located in the Julian Alps, it is internationally recognized for Lake Bled, its island church, and medieval castle, highly iconic elements that dominate both institutional branding and tourist photography. In 2024, Bled registered 481,035 tourist arrivals and 1,150,582 overnight stays, of which approximately 95% were generated by international visitors (SORS, 2025).

The coastal destinations of Koper, Izola, and Piran (KIP) constitute Slovenia’s primary seaside tourism region along the Adriatic coast. Piran is characterized by its medieval Venetian architecture and cultural heritage, complemented by the nearby resort town of Portorož. Izola emphasizes fishing traditions alongside resort tourism, while Koper combines historical urban elements with its contemporary role as a port city. In 2024, Piran recorded 608,023 arrivals and 1,864,603 overnight stays, with international tourists accounting for approximately 69% of arrivals. Izola registered 158,475 arrivals and 566,243 overnight stays, and Koper recorded 128,922 arrivals and 370,848 overnight stays, with foreign tourists accounting for approximately 65% of overnight stays across all three municipalities (SORS, 2025).

At the national level, Slovenia’s tourism image is strategically positioned by the Slovenian Tourist Board as a “green, authentic, boutique destination” emphasizing nature, culture, gastronomy, health, and well-being (MGRT, 2022). Within this overarching institutional framework, local destinations articulate more specific image narratives. Bled promotes itself as a “Unique Alpine Jewel,” foregrounding nature, tradition, sports, recreation, and MICE activities (Municipality of Bled, 2018). Ljubljana positions its destination image around romance, authenticity, diversity, and culinary excellence (LTO Ljubljana—Turizem Ljubljana, 2020). The coastal region (KIP) advances an image of openness, multiculturalism, and integration, combining seaside tourism with culture, gastronomy, wellness, and outdoor experiences (ZMKTK—Institute for Culture, Youth and Tourism Koper, 2024).

Consistent with the theoretical perspective outlined in Section 2, these institutional images are understood not as fixed representations but as symbolic reference frames that interact dynamically with tourists’ experiences and visual self-presentations.

3.2. Data Acquisition

The data for this study were acquired from the photo-sharing platform Flickr (www.flickr.com), a widely used source in tourism research due to its global coverage and open access to photo metadata. Among social media data sources, Flickr and Twitter have historically been the most frequently used in tourism analytics, with Flickr playing a particularly prominent role in studies based on visual content (Ghermandi & Sinclair, 2019). Social media platforms foster distinct visual cultures that influence what users document and share, and platform-specific communities shape how tourism activity is represented in digital datasets. Social media users are generally younger, more educated, and more technologically engaged than the overall visitor population, although differences between platforms introduce additional variation (Di Minin et al., 2015; Heikinheimo et al., 2017). Flickr users are often characterized as photography-oriented, well-traveled, and disproportionately composed of hobbyist or professional photographers, which contributes to a stronger emphasis on landscapes, nature observation, and destination documentation (Bishop et al., 2025; Leppämäki et al., 2025; Tenkanen et al., 2017). In contrast, Instagram users tend to skew younger and are more likely to share socially oriented and performative content centered on people and everyday experiences (Hausmann et al., 2018). Correspondingly, Flickr datasets contain a higher proportion of biodiversity and nature imagery, whereas Instagram datasets include more human-centered photographs (Tenkanen et al., 2017).

A substantial body of tourism scholarship has validated Flickr as a reliable proxy for visitation patterns, seasonal dynamics, and the spatial concentration of tourism across both urban and nature-based destinations, frequently demonstrating strong correlations with official statistics (Barros et al., 2022; Bhatt & Pickering, 2023; Ghermandi & Sinclair, 2019; Giglio et al., 2019; Kádár & Gede, 2021; D. Kim et al., 2020; H. Kim & Stepchenkova, 2015; Su et al., 2025; Teles da Mota & Pickering, 2020; Wilkins et al., 2021). Despite its declining mainstream popularity relative to commercial platforms, Flickr remains one of the few large-scale visual social media datasets that is systematically accessible for academic research through an open API. In contrast, Instagram, Facebook, and X have progressively restricted data access, limiting the feasibility of longitudinal, reproducible, and ethically transparent research (Toivonen et al., 2019; Tromble, 2021).

Recent studies further demonstrate the continued relevance of Flickr in tourism research. The platform has been used to monitor long-term urban tourism dynamics (Encalada-Abarca et al., 2024), analyze spatiotemporal patterns of tourist behavior across different countries (Bhatt & Pickering, 2023; Bishop et al., 2025; Liu et al., 2016), and explore nightlife visitation patterns, such as Bui’s (2025) study of Los Angeles. Flickr also remains widely applied in destination image research, including analyses of inbound tourists’ perceived destination images (X. Wang et al., 2024), identification of national park brand identities (Taecharungroj et al., 2024), and earlier work on destination image communication (Deng & Li, 2018).

In line with these established applications, Flickr was selected as the primary data source, and a systematic data extraction workflow was implemented using the platform’s API. The flickr.photos.search API was used to retrieve publicly available photos and their associated metadata, including photo ID, title, geolocation coordinates, tags, timestamps, upload date, and owner information, for images uploaded between 1 January 2009 and 31 December 2023. The search was restricted to the geographic bounding box of Slovenia. Subsequently, owner IDs were used to obtain additional user details via the flickr.people.getInfo API, such as the user’s name and location, which were then linked to the corresponding photo metadata. All available photos meeting these criteria were downloaded for analysis.
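
As a rough illustration of the retrieval step described above, the following sketch builds the query parameters for one page of a flickr.photos.search request over the stated upload window. It is a minimal sketch under stated assumptions, not the authors' extraction script: the bounding-box coordinates are approximate illustrative values, the choice of `extras` fields and the helper names `build_search_params` and `build_search_url` are hypothetical, and no network request is made.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

# Approximate bounding box for Slovenia (min_lon, min_lat, max_lon, max_lat).
# Illustrative values only, not the coordinates used in the study.
SLOVENIA_BBOX = (13.38, 45.42, 16.61, 46.88)


def build_search_params(api_key, page=1, per_page=250):
    """Query parameters for one page of flickr.photos.search results."""
    start = datetime(2009, 1, 1, tzinfo=timezone.utc)
    end = datetime(2023, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "bbox": ",".join(str(v) for v in SLOVENIA_BBOX),
        # Upload window from the text, expressed as Unix timestamps (UTC).
        "min_upload_date": int(start.timestamp()),
        "max_upload_date": int(end.timestamp()),
        "has_geo": 1,
        # Extra metadata fields to return with each photo (assumed selection).
        "extras": "geo,tags,date_taken,date_upload,owner_name",
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }


def build_search_url(api_key, page=1):
    """Full request URL for one result page (the request itself is not sent)."""
    return FLICKR_REST + "?" + urlencode(build_search_params(api_key, page))
```

Paging through results would repeat the request with increasing `page` values; owner details could then be fetched per owner ID with flickr.people.getInfo, as the text describes.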

3.3. Data Filtering

Tourists and locals were initially distinguished based on the retrieved user location, with all non-Slovenian residents classified as tourists. However, because not all users publicly disclose their place of residence, an additional classification step was required. Previous studies have applied various heuristic methods, often using photo timestamps to calculate time intervals or annual patterns of photo uploads to identify tourists. To classify users whose residency status was not publicly available, we employed a procedure based on the temporal distribution of their uploaded photographs. This approach follows the methodology established by Sun et al. (2013) and subsequently adopted by Han et al. (2020). The underlying premise is that residents and tourists exhibit distinctly different patterns of photo-taking activity over the course of a year. Residents, being present throughout the year, are likely to have a more temporally dispersed pattern of activity. In contrast, tourists, whose presence is typically limited to short visits, will have a highly concentrated burst of activity. To quantify this distinction, Sun et al. (2013) proposed an entropy-based filtering method, which we implemented as follows:

P_{i} (u) = \frac{D_{i} (u)}{\sum_{j = 1}^{M (u)} D_{j} (u)}

In Equation (1), P_i(u) is the share of user u's active days that fall in month i: the number of days on which the user contributed images in that month, D_i(u), divided by the total number of days on which they contributed across all months M(u).

E (u) = - \sum_{i = 1}^{M (u)} P_{i} (u) \cdot \log P_{i} (u)

A user with all activity concentrated in a single month will have a minimum entropy of E(u) = 0, while a user with activity evenly distributed across all twelve months will reach the maximum entropy.

Given that the analysis focuses on foreign tourists, we set a relatively low entropy threshold of 1.0. Thus, all users with an E(u) > 1.0 were classified as residents and removed from the dataset. This selected threshold is effective at excluding residents but may also inadvertently remove some long-term tourists or frequent visitors (including domestic ones) to the same destination, or in our case to different destinations. However, our priority was to retain only users that exhibit strong tourist-like behavior.
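
The entropy filter can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the authors' implementation: the text does not give the logarithm base, so the natural logarithm is assumed, and months are treated as calendar months pooled across years. The function names `monthly_entropy` and `is_tourist` are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date

ENTROPY_THRESHOLD = 1.0  # threshold stated in the text


def monthly_entropy(photo_dates):
    """E(u) computed from a user's distinct active days per calendar month."""
    days_per_month = defaultdict(set)
    for d in photo_dates:
        # D_i(u): distinct days with at least one photo in month i.
        # Assumption: months are pooled across years (January 2015 and
        # January 2019 count as the same month).
        days_per_month[d.month].add(d)
    total_days = sum(len(days) for days in days_per_month.values())
    entropy = 0.0
    for days in days_per_month.values():
        p = len(days) / total_days  # P_i(u)
        entropy -= p * math.log(p)  # natural log is an assumption
    return entropy


def is_tourist(photo_dates, threshold=ENTROPY_THRESHOLD):
    """Users with E(u) above the threshold are treated as residents."""
    return monthly_entropy(photo_dates) <= threshold
```

For example, a user whose photographs all fall within a single July visit has E(u) = 0 and is retained, whereas a user active on one day in each of the twelve months reaches log 12 (about 2.48 under the natural-log assumption) and is removed.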

During data cleansing, duplicate and incomplete records (i.e., images that could not be downloaded) were removed. The dataset was then spatially filtered using geotags to retain only images captured within the administrative boundaries of three distinct micro destinations. The final dataset comprised 21,094 images from 1536 users in Ljubljana, 13,138 images from 1500 users in Bled, and 6796 images from 792 users in the coastal destinations—KIP.

3.4. Image Visual Analysis

In the proposed analytical pipeline, each image is first encoded using CLIP’s visual encoder. A set of high-level semantic dimensions—History and Architecture, Humanistic Life, Nature, Recreation, and Infrastructure—is represented through text embeddings, and cosine similarity between image and text embeddings determines the semantic category most closely associated with each image. This enables the extraction of conceptual information directly from user-generated photographs without requiring task-specific retraining. To capture more detailed scene characteristics, we then apply ResNet50, a deep learning model trained on the Places365 dataset, which assigns object and scene labels based on a predefined taxonomy of 365 scene classes.
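
The category assignment by cosine similarity can be illustrated independently of the model itself. In the sketch below, short toy vectors stand in for the embeddings that CLIP's image and text encoders would produce. The text prompts used for each dimension are not stated in the source, so the embeddings, the helper names `cosine` and `assign_dimension`, and the example values are all hypothetical.

```python
import math

# The five semantic dimensions named in the text.
DIMENSIONS = [
    "History and Architecture",
    "Humanistic Life",
    "Nature",
    "Recreation",
    "Infrastructure",
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_dimension(image_embedding, text_embeddings):
    """Return the dimension whose text embedding is most similar to the image.

    text_embeddings maps each dimension label to its embedding vector.
    """
    scores = {
        label: cosine(image_embedding, embedding)
        for label, embedding in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy 3-dimensional vectors standing in for real CLIP embeddings.
toy_text_embeddings = {
    "Nature": [1.0, 0.0, 0.0],
    "Infrastructure": [0.0, 1.0, 0.0],
    "Recreation": [0.0, 0.0, 1.0],
}
toy_image = [0.9, 0.1, 0.2]
label, similarity_scores = assign_dimension(toy_image, toy_text_embeddings)
```

With real embeddings, the same argmax over cosine similarities would yield the zero-shot category for each photograph without any task-specific retraining.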

By integrating high-level semantic inference with fine-grained object and scene recognition, our framework establishes a hierarchical representation of visual content that connects conceptual, thematic, and environmental levels of interpretation. This structure supports quantitative assessment of scene-level heterogeneity within individual micro destinations and enables systematic comparison across urban, nature-based/alpine, and coastal tourism contexts.

3.5. Spatial Analysis

Using image metadata, including timestamps and geotags, we spatially allocated the extracted scene information and analyzed the spatial and temporal distribution of image scene categories. To assess the spatial heterogeneity of image scenes within each micro destination, we employed a local, normalized variant of the Theil H spatial entropy index (Theil, 1967). Theil’s entropy index is an information-theoretic measure originally developed to quantify inequality and segregation and has been widely applied to evaluate both within-group and between-group similarities (Cheng et al., 2024). Unlike the classical global Theil H index, which aggregates heterogeneity across an entire study area, our implementation focuses on normalized local entropy to capture fine-scale spatial variation in scene composition. For each spatial unit i, the entropy of scene categories is computed as:

H_{i} = - \sum_{r = 1}^{k_{i}} p_{r i} \log_{2} (p_{r i})

where p_{ri} denotes the proportion of scene category r within spatial unit i, and k_i is the number of unique scene categories observed in that unit. This formulation measures the degree of thematic diversity within each spatial unit, with higher values indicating greater heterogeneity of scene types.

To allow comparison across spatial units with differing numbers of observed categories, the entropy value is normalized by its theoretical maximum:

H_{i}^{\max} = \log_{2} (k_{i})

The normalized Theil H value for each spatial unit is then calculated as:

\mathrm{TheilH}_{i} = \frac{H_{i}}{H_{i}^{\max}}

This normalization constrains the index to the interval [0, 1], where values close to 0 indicate low thematic diversity (i.e., dominance of a single scene category), and values close to 1 indicate high thematic heterogeneity, corresponding to an even distribution of scene categories within the spatial unit. In cases where only one scene category is present (k_i = 1), the index is defined as zero, reflecting the absence of diversity.

Although the classical Theil index allows formal decomposition into within-group and between-group components, our local entropy–based formulation focuses on spatially explicit heterogeneity by assigning a diversity value to each spatial unit. This approach enables identification of locations with high or low thematic diversity and highlights how individual areas contribute to broader spatial patterns of scene composition, without relying on aggregate inequality decomposition.

To operationalize the analysis, we constructed a regular hexagonal grid with a cell diameter of 500 m and calculated the normalized Theil H value for each grid cell based on the distribution of scene categories observed within that cell. Higher H values indicate greater local thematic heterogeneity, reflecting a more even and diverse mix of scene types, whereas lower values indicate dominance by one or a few scene categories, corresponding to more homogeneous scene composition within the spatial unit. These spatial variations in local Theil H provide a robust, spatially explicit basis for assessing and comparing thematic heterogeneity within and across micro destinations.
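The gridding step can be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be already projected to a metric system, hexagons are pointy-top and anchored at the origin, and the 500 m diameter is read as the corner-to-corner distance. The helper names `hex_cell` and `scene_counts_per_cell` are hypothetical and do not come from the study. The per-cell category counts produced here are the input from which the local entropy index would then be computed.

```python
import math
from collections import Counter, defaultdict


def hex_cell(x, y, diameter=500.0):
    """Map projected coordinates (metres) to axial (q, r) indices of a pointy-top hexagon."""
    size = diameter / 2.0                      # ASSUMPTION: diameter is corner to corner
    q = (math.sqrt(3) / 3 * x - y / 3) / size  # fractional axial coordinates
    r = (2 / 3 * y) / size
    s = -q - r                                 # third cube coordinate (q + r + s = 0)
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Restore q + r + s = 0 by recomputing the coordinate with the largest rounding error
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def scene_counts_per_cell(photos):
    """photos: iterable of (x, y, scene_category); returns {cell: Counter of categories}."""
    cells = defaultdict(Counter)
    for x, y, category in photos:
        cells[hex_cell(x, y)][category] += 1
    return cells


# Made-up example points, not data from the study
photos = [
    (10.0, 20.0, "Nature"),
    (-30.0, 5.0, "History & Architecture"),
    (440.0, 0.0, "Nature"),
]
counts = scene_counts_per_cell(photos)
for cell, c in sorted(counts.items()):
    print(cell, dict(c))
```

In practice a GIS library would typically generate the grid. The pure-Python version only makes the cell assignment explicit.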

4. Results

Visitor-level metrics indicate that most visits are short and spatially concentrated. Across micro destinations, the average stay length (calculated as the time difference between a user’s first and last photograph) ranges from 1.39 to 1.54 days, with a median of one day in all cases, indicating that most photographic activity corresponds to single-day visits. The majority of users (2136) photographed only one micro destination, while fewer visited two (594) or all three (168), confirming limited multi-destination movement within individual trips. This concentration is even stronger at the daily scale, where more than 95% of user-days are associated with a single micro destination.
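The visitor-level metrics above could be derived along the following lines. The sketch assumes records of the form (user, micro destination, timestamp) and counts calendar days inclusively, so that a same-day visit yields a stay of 1. This is one reading of the reported median of one day, not a documented detail of the study. It also aggregates per user rather than per micro destination, which is a simplification. The record layout and the name `visitor_metrics` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Made-up example records: (user, micro destination, timestamp)
photos = [
    ("u1", "Bled", "2019-07-01 09:00"),
    ("u1", "Bled", "2019-07-01 17:30"),
    ("u2", "Ljubljana", "2019-08-10 10:00"),
    ("u2", "Bled", "2019-08-12 12:00"),
]


def visitor_metrics(records):
    """Return per-user stay length in days and number of micro destinations photographed."""
    times = defaultdict(list)
    places = defaultdict(set)
    for user, destination, ts in records:
        times[user].append(datetime.strptime(ts, "%Y-%m-%d %H:%M"))
        places[user].add(destination)
    metrics = {}
    for user, stamps in times.items():
        span = max(stamps).date() - min(stamps).date()
        metrics[user] = {
            "stay_days": span.days + 1,  # ASSUMPTION: inclusive calendar-day count
            "n_destinations": len(places[user]),
        }
    return metrics


metrics = visitor_metrics(photos)
for user, m in sorted(metrics.items()):
    print(user, m)
```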

Scene labels were predicted using two complementary models. Because a photograph may contain multiple semantic elements, only labels with predicted probabilities above 10% were retained, and the highest-scoring label per image was used for analysis. Images with no labels exceeding this threshold were excluded. The hierarchical mapping of labels to predefined scene dimensions enables a consistent representation of semantic content across micro destinations (Figure 1).
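The label selection rule described above can be illustrated with a short sketch. It assumes each image comes with a dictionary of label probabilities. The label names and the `LABEL_TO_DIMENSION` mapping are hypothetical placeholders, not the taxonomy used in the study, and treating an unmapped label as excluded is likewise an assumption.

```python
# Hypothetical mapping from fine-grained labels to scene dimensions
LABEL_TO_DIMENSION = {
    "lake": "Nature",
    "mountain": "Nature",
    "church": "History & Architecture",
    "street": "Humanistic life",
}


def top_label(probs, threshold=0.10):
    """Return the highest-scoring label above the threshold, or None if there is none."""
    kept = {label: p for label, p in probs.items() if p > threshold}
    if not kept:
        return None  # image excluded: no label exceeds the threshold
    return max(kept, key=kept.get)


def scene_dimension(probs, threshold=0.10):
    """Map an image's top label to its scene dimension (None if excluded or unmapped)."""
    label = top_label(probs, threshold)
    return LABEL_TO_DIMENSION.get(label) if label is not None else None


print(scene_dimension({"lake": 0.62, "mountain": 0.21, "street": 0.05}))
print(scene_dimension({"church": 0.08, "street": 0.07}))
```

The first image is assigned to Nature through its top label, while the second is dropped because no label exceeds 10%.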

Nature dominates photographic representations of Bled, accounting for nearly two-thirds of images, with the lake alone appearing in more than half of all photographs. In Ljubljana, History & Architecture is the primary category, followed by Humanistic life and Nature, reflecting the coexistence of heritage sites and green urban spaces. The coastal micro destination exhibits the most balanced composition, with Nature, History & Architecture, and Humanistic life occurring at comparable levels and a relatively higher presence of Recreation-related scenes.

Figure 2 presents the spatial distribution of visual scene diversity and semantic content across selected micro destinations. The top row shows the spatial variation of scene diversity measured using the Theil H entropy index, aggregated to a hexagonal grid. The bottom row displays CLIP-based visual scene labeling of individual images, classified into five scene categories.

Visual analysis focuses on high-density photographic zones, primarily historic cores, where interpretation is statistically reliable. High Theil H values (indicating greater scene diversity) are concentrated in the centers of Ljubljana and Piran, suggesting that tourists engage with multiple co-located semantic dimensions in these multifunctional urban environments. Peripheral areas show lower entropy, reflecting more focused visual attention. CLIP-based classification clarifies these entropy patterns. Historic centers combine History & Architecture and Humanistic life with secondary scene types, generating high diversity. Nature is more peripheral, concentrated in coastal areas, green zones, and outer urban districts. Infrastructure contributes locally to diversity, especially near major transport nodes.

In contrast, Bled exhibits uniformly lower and more evenly distributed diversity, consistent with the overwhelming dominance of Nature across space.

To investigate how tourists perceive a multi-faceted destination image through their daily photographic sequences, we calculated scene category shifts. This measure quantifies transitions between distinct scene categories in consecutive photographs taken by the same tourist within a single day, reflecting changes in focal interest, for example, moving from History & Architecture to Humanistic life or from Nature to Infrastructure. To characterize overall tourist behavior, we computed an average daily shift score by averaging the daily shift frequencies across all observation days.
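A minimal sketch of the shift measure, assuming records of the form (user, timestamp, scene category): photographs are grouped by user and calendar day, ordered by time, and each change of category between consecutive photographs counts as one shift. The record layout and the function names are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime


def daily_shifts(records):
    """records: (user, timestamp string, category). Returns {(user, date): number of shifts}."""
    by_day = defaultdict(list)
    for user, ts, category in records:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        by_day[(user, t.date())].append((t, category))
    shifts = {}
    for key, items in by_day.items():
        items.sort(key=lambda item: item[0])  # chronological order within the day
        cats = [c for _, c in items]
        # A shift is a pair of consecutive photographs with different categories
        shifts[key] = sum(1 for a, b in zip(cats, cats[1:]) if a != b)
    return shifts


def average_daily_shift(shifts):
    """Mean number of shifts across all user-days (0.0 if there are none)."""
    return sum(shifts.values()) / len(shifts) if shifts else 0.0


# Made-up example records
records = [
    ("u1", "2019-07-01 09:00", "Nature"),
    ("u1", "2019-07-01 09:20", "Nature"),
    ("u1", "2019-07-01 11:00", "History & Architecture"),
    ("u1", "2019-07-01 15:00", "Nature"),
    ("u1", "2019-07-02 10:00", "Nature"),
    ("u2", "2019-07-01 12:00", "History & Architecture"),
    ("u2", "2019-07-01 12:30", "Humanistic life"),
]
shifts = daily_shifts(records)
print(sorted(shifts.values()))
print(average_daily_shift(shifts))
```

In this toy example the three user-days contain 2, 0, and 1 shifts, giving an average of 1.0 shifts per user-day.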

Table 1 summarizes shifts between visual scene categories identified through CLIP-based visual scene labeling. Transitions are dominated by History & Architecture (32.15%), Nature (28.74%), and Humanistic life (22.47%), mirroring their overall prevalence. The most frequent shifts occur between History & Architecture and Nature, indicating strong co-presence of heritage and landscape elements. Substantial bidirectional transitions between History & Architecture and Humanistic life highlight the integration of landmarks with everyday urban activity. Infrastructure transitions are moderate (10.52%) and evenly distributed, while Recreation represents the smallest share (6.13%) and is primarily linked to Nature.
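A transition table of this kind could be assembled roughly as follows. The sketch counts directed transitions between different categories in each daily sequence and expresses, for each category, the share of all transitions that originate from it. How the percentages in Table 1 were actually derived is not specified here, so the share-by-origin definition is an assumption, as are the function names.

```python
from collections import Counter


def transition_counts(sequences):
    """Count directed (from, to) transitions between different consecutive categories."""
    pairs = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                pairs[(a, b)] += 1
    return pairs


def origin_shares(pairs):
    """Percentage of all transitions that start from each category (ASSUMED definition)."""
    total = sum(pairs.values())
    if total == 0:
        return {}
    by_origin = Counter()
    for (origin, _), n in pairs.items():
        by_origin[origin] += n
    return {cat: 100.0 * n / total for cat, n in by_origin.items()}


# Made-up daily sequences of scene categories
sequences = [
    ["History & Architecture", "Nature", "History & Architecture", "Humanistic life"],
    ["Nature", "Recreation"],
]
pairs = transition_counts(sequences)
shares = origin_shares(pairs)
print(dict(pairs))
print(shares)
```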

Micro destination comparisons reveal distinct patterns (Figure 3). Bled transitions are anchored in Nature, with selective shifts toward cultural and social scenes. Ljubljana shows evenly distributed transitions centered on History & Architecture, reflecting diversified urban engagement. The coastal destination exhibits the most balanced transition structure, suggesting broad experiential coverage. These patterns are consistent with the high Theil H values observed in historic urban cores, where multiple scene categories co-occur within limited spatial extents.

As shown in prior research (Lehto et al., 2004; Su et al., 2025), prior destination visits and length of stay influence tourists’ behavior and activity engagement. Repeat users exhibit significantly different behavioral patterns. They record longer stays (Wilcoxon test, p < 0.001), produce more photographs (p = 0.013), and display higher scene diversity (p = 0.020) than first-visit users. Scene category shifts are also positively correlated with stay length (Spearman ρ = 0.28), indicating that longer visits tend to involve more varied visual activity. While these visitor-level patterns do not directly imply causal relationships, they are consistent with the idea that greater exposure time enables broader engagement with available destination features, whereas short visits concentrate attention on a narrower set of scenes.
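For readers unfamiliar with the rank correlation reported above, the following sketch computes Spearman’s ρ as the Pearson correlation of average ranks, which handles ties. It covers only the correlation, not the Wilcoxon tests. The input values are made up for illustration and the function names are assumptions. A statistics library would normally be used for this in practice.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(vx * vy)


stay_days = [1, 1, 2, 3, 5]        # made-up stay lengths
shift_counts = [0, 2, 1, 4, 6]     # made-up scene category shifts
print(round(spearman_rho(stay_days, shift_counts), 3))
```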

Table 2 reports the average number of daily scene category shifts per tourist across seasons. In all micro destinations, shifts are lowest in winter, indicating a narrower behavioral focus. Seasonal peaks differ slightly: autumn in Bled and KIP, and summer in Ljubljana, suggesting modest expansions in activity diversity. Although differences are small, they reveal subtle spatial and temporal variation in tourist behavior.

Correspondence analysis (Figure 4) shows consistent seasonal structure. The first two dimensions explain most inertia (Dim1: 59–68%; Dim2: 31–41%), representing a landscape–urban gradient and a recreational–activity gradient, respectively. Generalist urban scenes cluster near the origin, reflecting broad but weak associations, whereas context-specific scenes (e.g., sea, mountains, snow activities) occupy peripheral positions and align more strongly with particular micro destinations.

Micro destination positions are largely stable across seasons. Bled aligns with the landscape dimension and shows a winter intensification linked to seasonal outdoor imagery. KIP consistently follows the recreational gradient, reinforcing its activity-oriented identity. Ljubljana remains near the centroid, indicating a generalized urban profile with limited seasonal differentiation. Transitional spring and autumn patterns combine environmental and urban influences.

5. Discussion and Conclusions

This study analyzes inbound tourists’ photographs by integrating deep learning techniques with spatial-temporal analysis to examine the distribution of photographed scenes and the evolving dynamics of tourists’ visual interests. Using geotagged photographs from three Slovenian micro destinations, the analysis reveals how tourists’ photographed scenes are distributed, combined, and sequenced across space and time. Rather than viewing destination image as a static concept, the findings frame it as a dynamic, spatially embedded, and experientially constructed phenomenon emerging from tourists’ situated encounters with destination environments. The following sections discuss the implications of these findings in the context of existing literature and highlight their practical relevance for tourism marketing.

5.1. Theoretical Implications

Destination image research has long recognized the multidimensional and subjective nature of tourist perceptions (Baloglu & McCleary, 1999; Echtner & Ritchie, 1991; Gartner, 1994). More recent work emphasizes that destination image is socially constructed and continuously re-produced through tourist practices and representations, particularly in digital environments (Stylidis et al., 2022). The present study advances this line of research by demonstrating how image perception unfolds through spatially and temporally situated visual engagements.

In our case, the results show that tourists’ photographic activity is highly concentrated in space and time, with most visits lasting a single day and focusing on one micro destination. However, even short stays generate complex image structures through rapid transitions between scene categories. The observed scene category shifts indicate that destination image is assembled through sequential exposure to multiple experiential settings, rather than through singular iconic attractions alone.

Beyond dimensional composition, the results provide empirical support for the core–periphery structure of tourist destination image proposed by Lai and Li (2012). In Bled, photographic representations are overwhelmingly dominated by natural scenery, specifically the lake, indicating a highly salient image core with limited peripheral diversification. In contrast, Ljubljana and the coastal micro destinations exhibit higher visual entropy and more balanced scene compositions, reflecting more diffuse image structures in which multiple associations coexist without a single dominant core. These findings parallel power-law distributions of destination image elements identified in prior research (Pan & Li, 2011; Stepchenkova & Li, 2012) and demonstrate that image “coreness” varies systematically across destination types. Additionally, these findings reinforce and spatially operationalize arguments that urban destination images are inherently fragmented and layered (Ashworth & Page, 2011; Xiao et al., 2022). Finally, frequent transitions between History & Architecture and Humanistic life in urban destinations suggest strong associative links between heritage consumption and everyday social environments.

Seasonal variations further underline the temporal dynamism of destination image. Although overall image structures remain stable, subtle seasonal shifts in scene diversity and category transitions suggest that destination image “pulses” over time, aligning with longitudinal UGC studies that document evolving tourist attention and affective engagement (Encalada et al., 2017; Yan et al., 2023). Importantly, the correspondence analysis reveals different temporal logics of image perceptions across destination types. While iconic landscape and recreational scenes are strongly season-bound, generalized urban scenes remain comparatively stable.

Consistent with prior work suggesting that tourists idealize received destination images when sharing travel photos (Stepchenkova & Zhan, 2013), visual UGC in all three micro destinations concentrates on aesthetically appealing and socially legible scenes.

5.2. Managerial Implications

From a managerial perspective, image-based spatial-temporal analyses offer a mechanism for linking tourists’ perceptual dynamics to destination marketing strategies, supporting both horizontal differentiation between destination types and vertical alignment between national and local branding efforts.

The proposed analytical framework offers practical guidance for evidence-based destination management by revealing how tourists visually perceive destinations, which features they prioritize, and how these perceptions shift across semantic dimensions and seasons. Such insights allow DMOs to evaluate the alignment between projected and perceived destination image, identifying potential discrepancies that may affect visitor satisfaction and engagement (Dolnicar & Grün, 2013; Y. Li et al., 2023). Where necessary, DMOs can then adapt their marketing communication activities, branding narratives, or even the “tourism product” itself to close such gaps, which may be a cause of tourist dissatisfaction (Y. Li et al., 2023).

Seasonal variations in scene category shifts provide actionable insights for context-sensitive marketing. Peaks in diversity during summer in Ljubljana and autumn in Bled and KIP indicate periods when tourists engage more broadly with available experiential offerings. DMOs can leverage such temporal patterns to design campaigns that promote contextually relevant imagery, support off-peak visitation, and encourage the deconcentration of tourists from over-visited hotspots (Xiao et al., 2022; X. Wang et al., 2024). Similarly, spatial clustering of visual attention highlights heavily visited areas, such as historic cores and central attractions, while peripheral zones receive comparatively little attention. These patterns suggest opportunities for strategic spatial dispersion, such as promoting complementary hubs or underutilized attractions to reduce congestion and enhance overall destination experience (Gatto & Scorza, 2025; Sedmak et al., 2023; S. Zhang et al., 2025).

The dominance of natural scenery in Bled exemplifies a strong image core, while higher visual entropy in Ljubljana and coastal destinations indicates multiple coexisting cores, reflecting diverse urban and recreational servicescapes. Understanding these patterns allows DMOs to prioritize and reinforce core images while identifying peripheral elements that can be developed or promoted to enrich the overall destination brand. Scene-category transitions also illuminate associative linkages between visual features, providing evidence on how tourists experience interconnected services, attractions, and cultural elements in practice.

From a strategic perspective, integrating deep learning–derived visual insights with multi-seasonal and spatial data supports market segmentation, visitor profiling, and evidence-based decision-making. DMOs can align marketing communications with actual tourist perceptions, adjust thematic emphasis to highlight underrepresented but high-value features (e.g., culture or gastronomy), and foster cross-seasonal engagement. By visualizing the experiential identity of destinations, managers gain actionable guidance for positioning, branding, and planning initiatives that resonate with both projected institutional narratives and the self-constructed images tourists create online (Yoo & Kang, 2025).

When comparing the official tourism image of Slovenia with our empirical findings, several misalignments become evident. Attributes such as “green” and “nature,” strongly emphasized in official promotional materials, are indeed prominent in Bled; however, “tradition” is only weakly represented. A stronger emphasis on culture is observed in Ljubljana and KIP, where outdoor activities and coastal settings also feature heavily. In contrast, authentic gastronomy, wellness, business tourism, and MICE tourism, all integral components of Slovenia’s official tourism narrative, are underrepresented across all destinations. Although this may partly reflect the lower photogenicity of business/MICE/wellness contexts, the results also indicate a potential image gap between projected and perceived destination identity. This is especially notable for traditional gastronomy, which can hold high visual appeal (Picazo et al., 2025).

While the use of UGC entails inherent representational biases (favoring photogenic, iconic, or easily accessible sites), it remains a timely, cost-effective, and scalable tool for monitoring destination image in near real time. Refining these approaches through multi-platform data collection, integration with on-site surveys, and targeted study designs can further enhance the capacity of DMOs to track perceived image, guide destination development, and close gaps between projected and actual visitor perceptions (Picazo et al., 2025).

5.3. Limitations and Future Work

Despite its contributions, this study has several limitations that suggest directions for future research. First, the analysis focuses on three Slovenian micro destinations, which may limit generalizability to other spatial or cultural contexts. Second, while UGC analysis offers advantages over traditional methods, it also presents inherent biases. Flickr users are typically older, more educated, photography-oriented, and internationally mobile, which can overrepresent landscape, heritage, and nature imagery while underrepresenting socially oriented, performative, or domestic tourist experiences (Bishop et al., 2025; Di Minin et al., 2015; Hausmann et al., 2018). Platform-specific conventions and declining mainstream popularity may also skew temporal trends and the perceived destination image (Ghermandi & Sinclair, 2019; Tromble, 2021).

Third, UGC is biased toward visually appealing or easily accessible attractions and underrepresents less photogenic activities such as MICE or wellness, potentially limiting detection of peripheral elements in the core–periphery structure of TDI (Deng et al., 2019; Lai & Li, 2016; Y. Wang et al., 2018). Fourth, although hybrid models (CLIP and Places365 ResNet50) improve scene recognition and semantic categorization, they inherit dataset and cultural biases, rely on fixed taxonomies, and may miss region-specific or novel scenes (Lee et al., 2023; X. Wang et al., 2024). Computational intensity also limits scalability for extremely large datasets.

To address these limitations, future research should integrate multiple social media platforms, including Instagram, alongside traditional survey or observational methods to capture a broader, more socially and culturally representative range of tourist experiences (Hunter, 2016; Tenkanen et al., 2017; Deng & Li, 2018). Linking UGC with visitor demographics, events, and temporal context could further enhance understanding of the spatiotemporal dynamics of destination image, as well as the interplay between tourists’ core and peripheral perceptions (Stepchenkova & Li, 2012; Y. Wang et al., 2018). Such multi-source, theory-informed approaches would strengthen both methodological rigor and the strategic relevance of destination image research.

Author Contributions

Conceptualization, D.P. and A.B.; methodology, D.P. and A.B.; software, D.P.; validation, D.P., A.B. and G.S.; formal analysis, D.P.; investigation, D.P.; resources, D.P.; data curation, D.P.; writing—original draft preparation, D.P., A.B. and G.S.; writing—review and editing, D.P., G.S. and A.B.; visualization, D.P.; supervision, D.P., G.S. and A.B.; project administration, D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Aboalganam, K. M., AlFraihat, S. F., & Tarabieh, S. (2025). The impact of user-generated content on tourist visit intentions: The mediating role of destination imagery. Administrative Sciences, 15(4), 117.
Ashworth, G., & Page, S. J. (2011). Urban tourism research: Recent progress and current paradoxes. Tourism Management, 32(1), 1–15.
Baloglu, S., & McCleary, K. W. (1999). A model of destination image formation. Annals of Tourism Research, 26(4), 868–897.
Barros, C., Gutiérrez, J., & García-Palomares, J. (2022). Geotagged data from social media in visitor monitoring of protected areas; a scoping review. Current Issues in Tourism, 25(9), 1399–1415.
Bhatt, P., & Pickering, C. M. (2023). Analysing spatial and temporal patterns of tourism and tourists’ satisfaction in Nepal using social media. Journal of Outdoor Recreation and Tourism, 44, 100647.
Bishop, M. V., Ólafsdóttir, R., Michaud, T., & Metcalf, C. (2025). Crowdsourced mapping of tourism distribution and dynamics in Iceland using Flickr. Scandinavian Journal of Hospitality and Tourism, 25(4), 330–346.
Bui, T.-H. (2025). Exploring visitors’ nightlife using geo-tagged social media. Current Issues in Tourism, 1–21.
Chan, C.-S., & Zhang, Y. (2018). Matching projected image with perceived image for geotourism development: A qualitative-quantitative integration. Asian Geographer, 35(2), 143–160.
Cheng, J., Xu, Y., Jian, I., Ren, M., & Park, S. (2024). Spatial concentration of intra-urban tourist activities and inter-group differences between Asian, European and North American travelers in Korean cities. Tourism Management, 107, 105064.
Crompton, J. L. (1979). Motivations for pleasure vacation. Annals of Tourism Research, 6(4), 408–424.
Deng, N., & Li, X. R. (2018). Feeling a destination through the “right” photos: A machine learning model for DMOs’ photo selection. Tourism Management, 65, 267–278.
Deng, N., Liu, J., Dai, Y., & Li, H. (2019). Different cultures, different photos: A comparison of Shanghai’s pictorial destination image between East and West. Tourism Management Perspectives, 30, 182–192.
Di Minin, E., Tenkanen, H., & Toivonen, T. (2015). Prospects and challenges for social media data in conservation science. Frontiers in Environmental Science, 3, 63.
Dolnicar, S., & Grün, B. (2013). Validly measuring destination image in survey studies. Journal of Travel Research, 52(1), 3–14.
Echtner, C. M., & Ritchie, J. B. (1991). The meaning and measurement of destination image. Journal of Tourism Studies, 2(2), 2–12.
Encalada, L., Boavida-Portugal, I., Cardoso Ferreira, C., & Rocha, J. (2017). Identifying tourist places of interest based on digital imprints: Towards a sustainable smart city. Sustainability, 9(12), 2317.
Encalada-Abarca, L., Ferreira, C. C., & Rocha, J. (2024). Revisiting city tourism in the longer run: An exploratory analysis based on LBSN data. Current Issues in Tourism, 27(4), 584–599.
Fakeye, P. C., & Crompton, J. L. (1991). Image differences between prospective, first-time, and repeat visitors to the Lower Rio Grande Valley. Journal of Travel Research, 30(2), 10–16.
Ferrer-Rosell, B., & Marine-Roig, E. (2020). Projected versus perceived destination image. Tourism Analysis, 25(2–3), 227–237.
Gartner, W. C. (1994). Image formation process. Journal of Travel & Tourism Marketing, 2(2–3), 191–216.
Gatto, R. V., & Scorza, F. (2025). “Anti-gravity tourism planning”: An analytical approach to manage tourism congestion, seasonality and overtourism. Urban Science, 9(12), 524.
Ghermandi, A., & Sinclair, M. (2019). Passive crowdsourcing of social media in environmental research: A systematic map. Global Environmental Change, 55, 36–47.
Giglio, S., Bertacchini, F., Bilotta, E., & Pantano, P. (2019). Using social media to identify tourism attractiveness in six Italian cities. Tourism Management, 72, 306–312.
Gu, X., Lin, T.-Y., Kuo, W., & Cui, Y. (2022, April 25–29). Open-vocabulary object detection via vision and language knowledge distillation. International Conference on Learning Representations (ICLR), Online.
Gunn, C. A. (1972). Vacationscape: Designing tourist regions. Bureau of Business Research, University of Texas at Austin. Available online: https://books.google.si/books?id=XPpOAAAAMAAJ (accessed on 20 November 2025).
Han, S., Ren, J., Yanhua, D., & Gui, D. (2020). Extracting representative images of tourist attractions from Flickr by combining an improved cluster method and multiple deep learning models. ISPRS International Journal of Geo-Information, 9, 81.
Hausmann, A., Toivonen, T., Slotow, R., Tenkanen, H., Moilanen, A., Heikinheimo, V., & Di Minin, E. (2018). Social media data can be used to understand tourists’ preferences for nature-based experiences in protected areas. Conservation Letters, 11(1), e12343.
Heikinheimo, V., Di Minin, E., Tenkanen, H., Hausmann, A., Erkkonen, J., & Toivonen, T. (2017). User-generated geographic information for visitor monitoring in a national park: A comparison of social media data and visitor survey. ISPRS International Journal of Geo-Information, 6(3), 85.
Huang, S. S., Shao, Y., Zeng, Y., Liu, X., & Li, Z. (2021). Impacts of COVID-19 on Chinese nationals’ tourism preferences. Tourism Management Perspectives, 40, 100895.
Hunt, J. D. (1975). Image as a factor in tourism development. Journal of Travel Research, 13(3), 1–7.
Hunter, W. C. (2016). The social construction of tourism online destination image: A comparative semiotic analysis of the visual representation of Seoul. Tourism Management, 54, 221–229.
Jin, C., Cheng, J., & Xu, J. (2018). Using user-generated content to explore the temporal heterogeneity in tourist mobility. Journal of Travel Research, 57(6), 779–791.
Kádár, B., & Gede, M. (2021). Tourism flows in large-scale destination systems. Annals of Tourism Research, 87, 103113.
Khan, A., Ashfaq, J., Bilal, M., Khan, M. H., & Shad, F. (2021). Destination image formation through User Generated Content (UGC). An updated literature review. Indian Journal of Economics and Business, 20(2), 1223–1238.
Kheradpisheh, S. R., Ghodrati, M., Ganjtabesh, M., & Masquelier, T. (2016). Deep networks can resemble human feed-forward vision in invariant object recognition. Scientific Reports, 6(1), 32672.
Kim, D., Kang, Y., Park, Y., Kim, N., & Lee, J. (2020). Understanding tourists’ urban images with geotagged photos using convolutional neural networks. Spatial Information Research, 28(2), 241–255.
Kim, H., & Stepchenkova, S. (2015). Effect of tourist photographs on attitudes towards destination: Manifest and latent content. Tourism Management, 49, 29–41.
Lai, K., & Li, X. (2016). Tourism destination image: Conceptual problems and definitional solutions. Journal of Travel Research, 55(8), 1065–1080.
Lai, K., & Li, Y. (2012). Core-periphery structure of destination image: Concept, evidence and implication. Annals of Tourism Research, 39(3), 1359–1379.
Lee, N., Bang, Y., Lovenia, H., Cahyawijaya, S., Dai, W., & Fung, P. (2023). Survey of social bias in vision-language models. arXiv, arXiv:2309.14381.
Lehto, X. Y., O’leary, J. T., & Morrison, A. M. (2004). The effect of prior experience on vacation behavior. Annals of Tourism Research, 31(4), 801–818.
Leppämäki, T., Heikinheimo, V., Eklund, J., Hausmann, A., & Toivonen, T. (2025). The rise and fall of the social media platform Flickr: Implications for nature recreation research. Journal of Outdoor Recreation and Tourism, 50, 100880.
Leung, D., Law, R., Van Hoof, H., & Buhalis, D. (2013). Social media in tourism and hospitality: A literature review. Journal of Travel & Tourism Marketing, 30(1–2), 3–22.
Li, J., Xu, L., Tang, L., Wang, S., & Li, L. (2018). Big data in tourism research: A literature review. Tourism Management, 68, 301–323.
Li, Y., He, Z., Li, Y., Huang, T., & Liu, Z. (2023). Keep it real: Assessing destination image congruence and its impact on tourist experience evaluations. Tourism Management, 97, 104736.
Li, Y. W., & Wan, L. C. (2025). Inspiring tourists’ imagination: How and when human presence in photographs enhances travel mental simulation and destination attractiveness. Tourism Management, 106, 104969.
Liu, L., Zhou, B., Zhao, J., & Ryan, B. D. (2016). C-IMAGE: City cognitive mapping through geo-tagged photos. GeoJournal, 81(6), 817–861.
Lo, I. S., & McKercher, B. (2015). Ideal image in process: Online tourist photography and impression management. Annals of Tourism Research, 52, 104–116.
LTO Ljubljana—Turizem Ljubljana. (2020). Strategija razvoja turistične destinacije. Ljubljana in ljubljanska regija 2021–2027 (Tourism destination development strategy: Ljubljana and the Ljubljana region 2021–2027). Available online: https://www.visitljubljana.com/sl/turizem-ljubljana/vizija-in-strategija/strategija-razvoja-2021-2027/ (accessed on 20 November 2025).
Mak, A. H. N. (2017). Online destination image: Comparing national tourism organisation’s and tourists’ perspectives. Tourism Management, 60, 280–297.
Marine-Roig, E., & Clavé, S. A. (2016). Destination image gaps between official tourism websites and user-generated content. In Information and communication technologies in tourism 2016: Proceedings of the international conference in Bilbao, Spain, February 2–5 (pp. 253–265). Springer International Publishing.
MGRT. (2022). Strategija slovenskega turizma (Strategy of Slovene Tourism) 2022–2028. Available online: https://www.gov.si/assets/ministrstva/MGTS/Dokumenti/DTUR/Nova-strategija-2022-2028/Strategija-slovenskega-turizma-2022-2028-dokument.pdf (accessed on 25 January 2023).
Municipality of Bled. (2018). Strategija trajnostnega razvoja blejskega turizma 2018–2025 (Sustainable Development Strategy of Bled Tourism 2018–2025). Available online: https://www.bled.si/sl/informacije/poslovne-strani/novice/2019101813324244/strategija-trajnostnega-razvoja-blejskega-turizma-2018-2025/ (accessed on 20 November 2025).
Nixon, L. J. (2024). Do deep learning models accurately measure visual destination image? A comparison of a fine-tuned model to past work. Information Technology & Tourism, 26(3), 377–406.
Pan, B., & Li, X. (2011). The long tail of destination image and online marketing. Annals of Tourism Research, 38(1), 132–152.
Payntar, N. D., Hsiao, W.-L., Covey, R. A., & Grauman, K. (2021). Learning patterns of tourist movement and photography from geotagged photos at archaeological heritage sites in Cuzco, Peru. Tourism Management, 82, 104165.
Picazo, P., Moreno-Gil, S., DiPietro, R. B., & Ma, F. (2025). From plate to picture: The role of gastronomic offerings in tourism marketing. Journal of Destination Marketing & Management, 38, 101025.
Pike, S., & Page, S. J. (2014). Destination Marketing Organizations and destination marketing: A narrative analysis of the literature. Tourism Management, 41, 202–227.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021, July 18–24). Learning transferable visual models from natural language supervision. 38th International Conference on Machine Learning (ICML), Online.
Ren, S., He, K., Girshick, R., & Sun, J. (2016). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149.
Salazar, N. B. (2012). Tourism imaginaries: A conceptual approach. Annals of Tourism Research, 39(2), 863–882.
Sedmak, G., Paliska, D., & Brezovec, A. (2023). Data mining of visitors’ spatial movement patterns using Flickr geotagged photos: The case of dispersed Plečnik’s architectural heritage in Ljubljana. Academica Turistica—Tourism and Innovation Journal, 16(1).
SORS. (2025). SI-STAT—Statistical office of the Republic of Slovenia. Available online: https://pxweb.stat.si/SiStat/en/Podrocja/Index/155/turizem (accessed on 20 November 2025).
Spyriadis, T., Fletcher, J., & Fyall, A. (2013). Destination management organisational structures. In Trends in European tourism planning and organization (Vol. 60). Channel View Publications.
Stepchenkova, S., & Li, X. (2012). Chinese outbound tourists’ destination image of America: Part II. Journal of Travel Research, 51(6), 687–703.
Stepchenkova, S., & Zhan, F. (2013). Visual destination images of Peru: Comparative content analysis of DMO and user-generated photography. Tourism Management, 36, 590–601.
Stylidis, D., Woosnam, K. M., & Tasci, A. D. (2022). The effect of resident-tourist interaction quality on destination image and loyalty. Journal of Sustainable Tourism, 30(6), 1219–1239.
Su, X., Birenboim, A., & Zhang, K. (2025). Length of stay and destination image: Insights from social media content of day trippers and tourists. Journal of China Tourism Research, 21(3), 739–760.
Sun, Y., Fan, H., Bakillah, M., & Zipf, A. (2013). Road-based travel recommendation using geo-tagged images. Computers, Environment and Urban Systems, 53, 110–122.
Taecharungroj, V., & Mathayomchan, B. (2020). The big picture of cities: Analysing Flickr photos of 222 cities worldwide. Cities, 102, 102741.
Taecharungroj, V., Vasiljević, Đ., & Pattaratanakun, A. (2024). Snapshots of nature: Harnessing Flickr data to frame sustainable brand positioning strategies for Thailand’s national parks. Journal of Outdoor Recreation and Tourism, 46, 100765.
Talebi, H., & Milanfar, P. (2018). NIMA: Neural image assessment. IEEE Transactions on Image Processing, 27(8), 3998–4011.
Tan, Y., Tang, P., Zhou, Y., Luo, W., Kang, Y., & Li, G. (2017). Photograph aesthetical evaluation and classification with deep convolutional neural networks. Neurocomputing, 228, 165–175.
Tasci, A. D., & Gartner, W. C. (2007). Destination image and its functional relationships. Journal of Travel Research, 45(4), 413–425.
Teles da Mota, V., & Pickering, C. (2020). Using social media to assess nature-based tourism: Current research and future trends. Journal of Outdoor Recreation and Tourism, 30, 100295.
Tenkanen, H., Di Minin, E., Heikinheimo, V., Hausmann, A., Herbst, M., Kajala, L., & Toivonen, T. (2017). Instagram, Flickr, or Twitter: Assessing the usability of social media data for visitor monitoring in protected areas. Scientific Reports, 7(1), 17615.
Theil, H. (1967). Economics and information theory. North-Holland Publishing Company.
Toivonen, T., Heikinheimo, V., Fink, C., Hausmann, A., Hiippala, T., Järv, O., Tenkanen, H., & Di Minin, E. (2019). Social media data for conservation science: A methodological overview. Biological Conservation, 233, 298–315.
Tromble, R. (2021). Where have all the data gone? A critical reflection on academic digital research in the post-API age. Social Media + Society, 7(1), 2056305121988929.
Wang, X., Mou, N., Zhu, S., Yang, T., Zhang, X., & Zhang, Y. (2024). How to perceive tourism destination image? A visual content analysis based on inbound tourists’ photos. Journal of Destination Marketing & Management, 33, 100923.
Wang, Y., Li, X., & Lai, K. (2018). A meeting of the minds: Exploring the core–periphery structure and retrieval paths of destination image using social network analysis. Journal of Travel Research, 57(5), 612–626.
Wilkins, E. J., Howe, P. D., & Smith, J. W. (2021). Social media reveal ecoregional variation in how weather influences visitor behavior in U.S. National Park Service units. Scientific Reports, 11(1), 2403.
Xiao, X., Fang, C., Lin, H., & Chen, J. (2022). A framework for quantitative analysis and differentiated marketing of tourism destination image based on visual content of photos. Tourism Management, 93, 104585.
Yan, J., Yue, J., Zhang, J., & Qin, P. (2023). Research on spatio-temporal characteristics of tourists’ landscape perception and emotional experience by using photo data mining. International Journal of Environmental Research and Public Health, 20(5), 3843.
Yoo, S. C., & Kang, S. M. (2025). Visual narratives and digital engagement: Decoding Seoul and Tokyo’s tourism identity through Instagram analytics. Tourism and Hospitality, 6(3), 149.
Zhang, F., Fan, Z., Kang, Y., Hu, Y., & Ratti, C. (2021). “Perception bias”: Deciphering a mismatch between urban crime and perception of safety. Landscape and Urban Planning, 207, 104003.
Zhang, F., Zhou, B., Ratti, C., & Liu, Y. (2019). Discovering place-informative scenes and objects using social media photos. Royal Society Open Science, 6(3), 181375.
Zhang, F., Zu, J., Hu, M., Zhu, D., Kang, Y., Gao, S., Zhang, Y., & Huang, Z. (2020). Uncovering inconspicuous places using social media check-ins and street view images. Computers, Environment and Urban Systems, 81, 101478.
Zhang, S., Li, Y., Song, X., Yang, C., Shafiabady, N., & Wu, R. M. (2025). Multi-dimensional perceptual recognition of tourist destination using deep learning model and geographic information system. PLoS ONE, 20(2), e0318846.
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., & Torralba, A. (2018). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6), 1452–1464.
ZMKTK—Institute for Culture, Youth and Tourism Koper. (2024). Strategija razvoja turizma slovenske istre 2030. Available online: https://www.zmkt.si/wp-content/uploads/2024/12/FORUM-SI-predlog-STRATEGIJA-TURIZMA-SLOVENSKE-ISTRE-2030.pdf (accessed on 10 December 2025).

Figure 1. Distribution of extracted object and scene categories across micro destinations.

Figure 2. Spatial distribution of scenes across micro destinations. Theil’s H entropy index (top row) and CLIP-based visual scene labeling (bottom row).

Figure 3. User scene category shifts across micro destinations (BLED, LJU, KIP). Chord diagrams show transitions between scene categories. Colors indicate categories: Recreation (blue), Nature (green), History & Architecture (purple), Infrastructure (orange), Humanistic life (terracotta). The width of each chord reflects the relative proportion of users moving between categories.

Figure 4. Correspondence analysis of micro destinations and seasonal variation in scene types.

Table 1. User scene category shifts (overall results).

Shift from (Column) to (Row)	History, Architecture	Humanistic Life	Infrastructure	Nature	Recreation	Σ (%)
History, Architecture	–	1914	780	2736	274	5704 (32.15)
Humanistic Life	1898	–	436	1336	316	3986 (22.47)
Infrastructure	803	397	–	577	89	1866 (10.52)
Nature	2715	1394	565	–	424	5098 (28.74)
Recreation	267	324	91	405	–	1087 (6.13)
Σ	5683	4029	1872	5054	1103	17741
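The marginal totals and percentage shares in Table 1 can be re-derived from the off-diagonal counts. The following is a minimal Python sketch of that arithmetic, not the authors' code: the names `CATEGORIES`, `COUNTS`, and `margins` are illustrative, and the diagonal is set to 0 on the assumption that same-category transitions are not counted as shifts.

```python
# Off-diagonal shift counts copied from Table 1, in the table's row/column order.
# Diagonal cells are empty in the table; 0 is an assumption made for the arithmetic.
CATEGORIES = [
    "History, Architecture",
    "Humanistic life",
    "Infrastructure",
    "Nature",
    "Recreation",
]
COUNTS = [
    [0, 1914, 780, 2736, 274],
    [1898, 0, 436, 1336, 316],
    [803, 397, 0, 577, 89],
    [2715, 1394, 565, 0, 424],
    [267, 324, 91, 405, 0],
]


def margins(counts):
    """Return row totals, column totals, grand total, and row shares in percent."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    row_shares = [round(100 * total / grand_total, 2) for total in row_totals]
    return row_totals, col_totals, grand_total, row_shares


row_totals, col_totals, grand_total, row_shares = margins(COUNTS)
for name, total, share in zip(CATEGORIES, row_totals, row_shares):
    print(f"{name}: {total} ({share:.2f}%)")
```

Under these assumptions the computed row totals, column totals, and shares agree with the Σ column and Σ row of the table, which suggests the percentages are each row's share of all recorded shifts.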

Table 2. Average daily scene category shifts per tourist across seasons and micro destinations.

Season	Bled	Ljubljana	KIP	Average
Winter	1.81	4.08	2.99	2.96
Spring	2.92	5.03	3.36	3.77
Summer	2.51	5.93	3.63	4.02
Autumn	2.98	4.93	4.51	4.14
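The "Average" column in Table 2 appears to be the unweighted mean of the three destination values in each row. A short Python sketch of that check follows; the names `SHIFTS`, `season_means`, and `destination_means` are illustrative, and the unweighted averaging is an assumption rather than a procedure stated in the table.

```python
from statistics import mean

# Average daily scene category shifts per tourist, copied from Table 2.
DESTINATIONS = ("Bled", "Ljubljana", "KIP")
SHIFTS = {
    "Winter": (1.81, 4.08, 2.99),
    "Spring": (2.92, 5.03, 3.36),
    "Summer": (2.51, 5.93, 3.63),
    "Autumn": (2.98, 4.93, 4.51),
}

# Unweighted mean across destinations for each season (assumed to be the Average column).
season_means = {season: round(mean(values), 2) for season, values in SHIFTS.items()}

# Unweighted mean across seasons for each destination (not reported in the table).
destination_means = {
    name: round(mean(values[i] for values in SHIFTS.values()), 2)
    for i, name in enumerate(DESTINATIONS)
}

print(season_means)
print(destination_means)
```

The per-season means reproduce the Average column. The unweighted seasonal mean for Ljubljana comes out close to five shifts per day, in line with the figure quoted in the abstract, although the article may have computed that value from user-level data rather than from these seasonal averages.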

