Article

Enhancing the K-Means Algorithm through a Genetic Algorithm Based on Survey and Social Media Tourism Objectives for Tourism Path Recommendations

1
Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China
2
Faculty of Engineering, Karary University, Khartoum 12304, Sudan
3
Institute for Geodesy and Geoinformation, University of Bonn, 53115 Bonn, Germany
4
College of Computer Science and Information Technology, Karary University, Omdurman 12304, Sudan
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2024, 13(2), 40; https://doi.org/10.3390/ijgi13020040
Submission received: 24 November 2023 / Revised: 19 January 2024 / Accepted: 25 January 2024 / Published: 27 January 2024
(This article belongs to the Topic Geocomputation and Artificial Intelligence for Mapping)

Abstract

Social media platforms play a vital role in determining valuable tourist objectives, which greatly aids in optimizing tourist path planning. As data classification and analysis methods have advanced, machine learning (ML) algorithms such as the k-means algorithm have emerged as powerful tools for sorting through data collected from social media platforms. However, traditional k-means algorithms have drawbacks, including challenges in determining initial seed values. This paper presents a novel approach to enhance the k-means algorithm based on survey and social media tourism data for tourism path recommendations. The main contribution of this paper is enhancing the traditional k-means algorithm by employing the genetic algorithm (GA) to determine the number of clusters (k), select the initial seeds, and recommend the best tourism path based on social media tourism data. The GA enhances the k-means algorithm by using a binary string to represent initial centers and to apply GA operators. To assess its effectiveness, we applied this approach to recommend the optimal tourism path in the Red Sea State, Sudan. The results clearly indicate the superiority of our approach, with an algorithm optimization time of 0.01 s. In contrast, traditional k-means and hierarchical cluster algorithms required 0.27 and 0.7 s, respectively.

1. Introduction

The rapid proliferation of social media platforms has significantly expanded the availability of tourism objectives, enabling the recommendation of tourism paths from anywhere and at any time [1,2]. Travelers can effortlessly contribute to tourism objectives while on the go, using platforms such as WhatsApp, Facebook, Flickr, and Instagram. Additionally, users can share their current locations through services like Foursquare and provide feedback on tourist destinations via X, formerly known as Twitter. Location-based social media platforms like Foursquare and TripAdvisor empower users to share their experiences, reviews, and recommendations of tourist destinations, often including geographical coordinates [3,4,5]. By considering factors such as tourist travel costs, preferences, internal tourism objectives, destinations, and transportation options, it is possible to create recommended itineraries using both survey and social media tourism objectives [6,7]. The process for recommending tourist routes based on survey and social media data involves two key steps. First, the tourism objectives are clustered into groups, facilitating analysis and recommendations. Second, the optimal tourist path is recommended by solving the traveling salesman problem (TSP) [8,9].
The increasing volume of tourism data has added complexity to data collection and clustering, particularly when suggesting optimal tourist paths using ML algorithms and solving the TSP with an optimization algorithm. These methods are instrumental in analyzing unstructured data from social media platforms, ultimately enhancing path planning and destination recommendations [10,11,12]. The k-means algorithm is commonly employed for clustering tourist data. However, it faces challenges in determining the optimal number of clusters (k) and in selecting initial seeds. The choice of k significantly influences the outcomes, and researchers often employ methods like the elbow method and silhouette score to select k [13,14,15]. While effective, the elbow method may encounter difficulties when dealing with many clusters [16,17].
On the other hand, the silhouette score evaluates object similarity within clusters but may have limitations when dealing with overlapping or significantly varying-sized clusters. Selecting the initial seeds for k-means is a critical step in the process. Running the algorithm multiple times with random initialization helps identify the best outcome [18,19]. Various seed selection methods, such as random and k-means++, each come with tradeoffs involving simplicity, cluster quality, and computational efficiency [20,21]. Furthermore, the GA has emerged as a powerful optimization algorithm for addressing complex problems, including determining the appropriate number of clusters (k) and identifying optimal initial seeds for the k-means algorithm [22,23].
This paper introduces the use of the GA to address the challenges of determining k and selecting initial seeds. The GA effectively mitigates the limitations of previous methods, addressing issues such as data overlap, handling large datasets, and improving the execution time of the k-means algorithm. The GA enhances the traditional k-means algorithm by employing binary strings to represent initial centers and applying GA operators. This enhancement significantly improves the algorithm’s effectiveness in classifying tourist data. Through parameter optimization, the algorithm becomes more proficient in accurately clustering and categorizing the extensive volume of tourism data, thus enhancing its overall performance. The optimization process iterates until convergence, where cluster assignments remain unchanged. Subsequently, the improved GA employs clustered and optimized tourism objectives to solve the TSP and recommend the optimal tourism path [24,25,26,27]. In summary, the main contributions of this paper are as follows:
  • Enhancing the traditional k-means algorithm by using the GA to determine initial seeds, selecting the appropriate number of clusters (k), and recommending the best tourism path based on survey and social media tourism objectives.
  • Collecting the tourism objectives from social media platforms through an online questionnaire and from TripAdvisor.
  • Selecting and visualizing the optimal tourism path using GAs and the geographic information system (GIS) environment.
  • Demonstrating the optimal time to implement the GA for finding the best tourism path through a comparison of our approach with other state-of-the-art methods.
The remainder of this article is structured as follows. In Section 2, we present the methodology and the idea of tourism objectives, survey and social media data, along with enhancing the k-means algorithm through the GA. In Section 3, the system implementation and experimental analysis are discussed. In Section 4, the results of this research and discussion are presented. Finally, the conclusion and future work are presented in Section 5.

Related Work

Tourism objectives are collected from various sources, including governmental and international institutions, geographical surveys, and popular social media platforms such as Facebook, WhatsApp, WeChat, and X. To address the static objectives of tourism planning, three main processes must be carried out: collecting tourism data, classifying tourism data, and employing optimization algorithms to determine the best route. In the following sections, we conduct a thorough analysis of studies related to the formulation of tourism objectives, highlighting key findings and methodologies.
Hu et al. [28] presented a method for deriving tourist movement patterns from X data involving a three-step process of cleaning geo-tagged posts to identify those authored by tourists. However, this method’s reliance solely on X data may limit its ability to represent comprehensive tourist activity from other sources or platforms. A separate study by Riaz and Sherani [29] overcame this limitation by focusing on the factors influencing information sharing on multiple social media platforms, particularly the adoption of Facebook and WeChat. Hashimy and T. S. [30] explored the opportunities and challenges of using social media platforms such as WhatsApp and Facebook for tourism development in Afghanistan. These include increased visibility, user-generated content, direct communication, influencer marketing, and destination marketing. However, the paper also highlights challenges that must be addressed, such as ensuring tourist safety and the need for infrastructure development. Sakas et al. [31] considered multiple objectives, including transportation type and tourist preferences, collected from various social media platforms. These objectives collectively describe the tourist destination, falling under the category of internal objectives. However, this approach overlooks the external objectives associated with interactions between tourist destinations. Addressing the challenge of integrating both internal and external objectives within a unified approach is essential for the advancement of this field.
A novel approach developed by Kim et al. [32] focused on developing a deep learning model and an image feature vector clustering technique to automate the categorization of traveler images by tourism destinations. However, the paper has limitations, primarily focusing on spatial data and omitting information about the characteristics and features of tourist destinations. The study by Bouabdallaoui et al. introduces an innovative clustering architecture that integrates the GA and k-means, coupled with a hybrid topic discovery approach incorporating latent Dirichlet allocation (LDA) and bidirectional encoder representations from transformers (BERT). The primary objective of this novel method is to predict and analyze the most significant topics related to tourist shopping destinations in Morocco. However, a significant limitation of this paper is the lack of attention in determining the values for k and the initial seeds. This limitation stems from the paper’s reliance on the random selection of both the number of groups (k) and the initial seeds, raising concerns regarding the robustness and reproducibility of the results.
Yafeng et al. [33] presented a new approach based on the GA to develop 47 tourism areas in Chongqing City, China. While this paper provides an intriguing approach for applying the GA to optimize tourism path planning, which could assist tour operators and planners in developing more effective, fun, and easy tourist trips, it is essential to note that the scope of this study is centered on enhancing the planning of tourist routes specifically for the 47 scenic areas in Chongqing. As a result, the general applicability of the findings to different contexts or regions may be limited. Moreover, this research solely employed the GA to identify the optimal tourism path without delving into the diverse objectives of tourism or the various tourism data sources available, such as social media platforms. Patcharin et al. [34] introduced a method for recognizing aircraft trajectories through statistical analysis clustering. It employs k-means clustering and Gaussian mixture clustering to group unstructured trajectories observed over Suvarnabhumi International Airport. However, the applicability of these findings to other regions, such as tourist destinations, may be limited. Additionally, it is worth mentioning that the k-means algorithm used in this approach is not optimized, which affects the algorithm’s execution time. Majid et al. proposed the development of urban tourism and branding for spatial modeling. The authors used a novel hybrid modeling approach combining k-means, fuzzy logic, and an artificial neural network (ANN) to assess urban tourism potential (AUTP). While this modeling provides valuable information for developing future strategies for urban tourism, the paper does not consider tourism data sources such as social media platforms, and it also overlooks various tourism objectives. Mehrdad et al. [35] discussed the use of unsupervised clustering methods as data-driven models for mineral prospectivity mapping (MPM). Their hybrid data-driven clustering model combines the k-means clustering algorithm with harmony search (HS) and artificial bee colony (ABC) metaheuristic optimization algorithms. This hybrid model can be used for the selection of optimum cluster centroids to highlight favorable targets in the prospecting stage of mineral exploration.
In conclusion, many papers have addressed the extraction and prediction of tourist paths based on social media platforms. However, the aforementioned papers lack a precise definition of survey and social media data in the context of tourism and its unique characteristics. They primarily analyze data from a single social media platform without comparing the suitability of various platforms for tourism research. Additionally, these papers do not introduce novel methods for analyzing social media data in the tourism domain. Furthermore, the implementation of algorithms to organize and categorize the spatial and attribute data of tourist destinations can be time-consuming.

2. Methodology

The proposed approach utilizes the combined power of the K-means algorithm and GA to optimize and cluster social media tourism data, ultimately identifying the optimal tourism path. This approach can be broken down into several stages:
First, tourism objectives are collected from social media platforms using two main methods. The first method involves distributing questionnaires on the websites of tourist groups within popular social media platforms such as WeChat and WhatsApp. This allows for the direct collection of relevant information from users. The second method involves extracting objectives from the TripAdvisor website, which serves as a valuable source of tourism objectives.
Second, once the tourism objectives have been gathered from these platforms, they are segmented into distinct groups using the K-means algorithm. The initial value of k, representing the number of clusters, is determined using the GA. Additionally, 12 specific tourism objectives are carefully selected to evaluate and assess different tourist destinations.
Third, the GA is employed to suggest and determine the best tourist path based on the clustered data. This comprehensive and systematic approach allows researchers to gain valuable insights into the preferences and patterns of tourists. These stages are all shown in Figure 1.

2.1. Survey and Social Media Data

Social media platforms have evolved into indispensable resources for the collection of tourism data. These platforms hold significant importance within the tourism industry, primarily because they enable the real-time sharing of user-generated content [36,37]. Tourists utilize social media as a medium for sharing their tourism experiences, recommendations, and feedback, thus contributing to the creation of an extensive repository of information that holds immense value for researchers and tourism experts. By analyzing this rich trove of tourism data, researchers can glean valuable insights into tourist preferences. This wealth of data empowers businesses and destinations to make well-informed decisions and tailor their services to align with the evolving requirements and preferences of tourists. The widespread adoption of social media platforms provides a unique opportunity to gather tourism data on a large scale, thereby fostering a deeper understanding of tourists and an overall enhancement of the tourism experience [38].

2.2. Selection Objectives

Tourism objectives in this approach were obtained from social media platforms through two methods. The first method involved the creation of a questionnaire, which was then distributed across groups on various social media platforms such as Facebook, WhatsApp, and WeChat. The survey is available at https://forms.gle/6UHHubaiAPA6JhtE7 (accessed on 25 February 2023). This questionnaire consisted of a range of inquiries concerning tourism objectives. Table 1 shows the definition of tourism objectives based on tourist preferences. Table 2 shows a sample of the online questionnaire results.
The second method focused on acquiring tourist preferences, calculating distances between tourist destinations, and estimating travel costs between destinations. Table 3 lists the proposed tourism destinations in Port Sudan City. These points were chosen on the basis of tourist demand and the aesthetic views available there, so they are considered points of interest (POIs). Tourist preferences indicate how tourists rate tourism destinations on the TripAdvisor website. Ratings range from 1 to 5, where 5 means very good, 4 means good, 3 means average, 2 means weak, and 1 means very weak. Table 4 displays tourist preferences according to the TripAdvisor website. Distances between tourist destinations are measured in kilometers in Table 5, and travel costs between destinations are measured in Sudanese pounds (SDG) in Table 6. This information was sourced from data collected on the TripAdvisor website.

2.3. Genetic Algorithm

The GA is a computational method based on the process of natural selection and genetics; it is a metaheuristic algorithm that mimics the natural process of evolution to solve complex problems, including NP-hard ones [39,40]. A population of candidate solutions is created randomly and the fitness of each solution is evaluated; the fittest individuals are then selected, recombined, and mutated to create a new population. These operations are repeated until an optimal or sufficiently good solution is found, as follows:
  • Initialization: a population of possible solutions is generated randomly.
  • Evaluation: the fitness of each solution is measured using a fitness function.
  • Selection: the fitter individuals are chosen to be the parents of the following generation.
  • Crossover: new individuals are generated by combining the genetic material of the selected parents.
  • Mutation: new individuals may undergo mutation, which introduces small random changes; the new generation then replaces the old generation.
The algorithm stops when a stopping criterion is satisfied, such as a set number of generations or a satisfactory solution. Figure 2 shows the GA operations.
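The GA operations listed above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function name `run_ga`, the bit-string encoding, binary tournament selection, single-point crossover, and all parameter values.

```python
import random


def run_ga(fitness, n_bits=20, pop_size=30, generations=60,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximize `fitness` over bit strings (hypothetical toy setup)."""
    rng = random.Random(seed)
    # Initialization: a random population of bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Evaluation: score every individual once per generation.
        scores = [fitness(ind) for ind in population]

        def select():
            # Selection: binary tournament, the fitter of two random picks.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return population[a] if scores[a] >= scores[b] else population[b]

        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            # Crossover: single-point recombination of the two parents.
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            offspring.append(child)
        # Replacement: the new generation replaces the old one.
        population = offspring
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best  # remember the best individual seen so far
    return best


# Toy usage: maximize the number of ones in the string.
best = run_ga(sum)
```

The stopping criterion here is simply a fixed number of generations; a quality threshold would work equally well.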
In general, the k-means algorithm is a machine learning tool that is useful in a wide range of applications, such as clustering, anomaly detection, image compression, recommendation systems, and tourism path planning [41]. In our approach, we used the k-means algorithm to cluster tourism data into k groups, where the number of clusters was determined using the GA.

2.4. K-Means Algorithm

The k-means algorithm is an unsupervised machine learning algorithm for clustering; it divides data into k classes according to their properties. The algorithm proceeds iteratively: each data point is assigned to the closest centroid (cluster center), and the centroids are then recomputed from the new assignments [42]. This procedure continues until the centroids stop moving or the maximum number of iterations has been reached. The steps of the k-means algorithm are as follows:
  1. Choose the number of clusters k.
  2. Randomly initialize k centroids.
  3. Assign each data point to its nearest centroid.
  4. Recalculate each centroid as the mean of all the data points in its cluster. For a cluster C with m data points $(x_1, x_2, \ldots, x_m)$, the centroid of C is calculated as $\frac{1}{m}(x_1 + x_2 + \cdots + x_m)$.
  5. Repeat steps 3–4 until the centroids stop moving or the maximum number of iterations has been reached.
The k-means algorithm aims to minimize the sum of the distances between the data points and their cluster centroids. Several methods can be used to measure distance; the Euclidean distance is the most commonly used. Given two points x and y with coordinates $(X_1, Y_1)$ and $(X_2, Y_2)$, the Euclidean distance is calculated as in Equation (1):
$$D(x, y) = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2} \tag{1}$$
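The k-means steps and the Euclidean distance described in this section can be sketched as follows. This is a plain textbook version, not the authors' implementation; the function names, the optional `init` argument for supplying starting centroids, and the sample points are assumptions made for illustration.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain k-means; `init` optionally supplies the k starting centroids."""
    rng = random.Random(seed)
    # Steps 1-2: k is given; centroids start at `init` or at k random points.
    centroids = [tuple(c) for c in init] if init else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(pts, k=2, init=[(0, 0), (10, 10)])
# labels -> [0, 0, 0, 1, 1, 1]
```

Passing `init` makes the run deterministic, which is also the hook a seed-selection method such as the GA would use.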

2.5. Enhancing the K-Means Algorithm through GA

This approach enhances k-means by utilizing the GA to determine the optimal initial seeds and the value of k for k-means clustering. Figure 2 provides an overview of the entire framework for GA k-means clustering, and the enhanced algorithm is shown in pseudocode. A detailed explanation of the enhanced k-means clustering process is presented below:
In the first step of this approach, a value is defined to generate the initial population in the GA, and solution fitness is assessed based on clustering quality, measured by the sum of squared errors (SSE), which is utilized to determine the optimal initial seeds and the value of k. To enhance the performance of the GA, optimization parameters were adjusted, as shown in Table 3, Table 4, Table 5 and Table 6. These parameter adjustments aim to optimize the GA for more effective initial seed selection and k value determination for clustering. The improved GA, designed to enhance the k-means algorithm, is presented in the pseudocode.
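For reference, the SSE is conventionally defined as below. The source does not write the formula out, so the notation is an assumption: $C_j$ denotes the j-th cluster and $c_j$ its centroid.

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2$$

A lower SSE indicates tighter clusters. Because the SSE generally decreases as k grows, a fitness function based on it typically needs some additional term or constraint when k itself is being optimized.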
In the second step of this process, once the optimal values for both the number of clusters (k) and the initial seeds have been determined, the tourism data are partitioned into groups. This partitioning is accomplished using the improved k-means algorithm introduced in the first step. The enhanced k-means algorithm effectively assigns each data point to its corresponding cluster based on similarity, taking into account the optimized k value and the initial seeds. By partitioning the tourism data into groups, this step facilitates further analysis and enables the identification of unique patterns, visitor preferences, or notable characteristics within the dataset.
In the third step, after segmenting the tourism data into groups using the enhanced k-means algorithm and considering the identified tourism objectives, an improved GA is employed to recommend the optimal tourist path. By harnessing the capabilities of the GA, which combines elements of natural selection and genetic operators, the algorithm efficiently searches for the most favorable path that aligns with the specified tourism objectives. Figure 3 and Algorithm 1 show the framework for recommending the tourism path by enhancing the k-means algorithm through the GA.
Algorithm 1: Pseudocode for enhancing k-means with the GA
Begin
  1. Initialization: Generate a population of solutions, each representing a possible data clustering (cluster centers in k-means).
  2. Fitness Evaluation: Assess solution fitness based on clustering quality, often measured by the sum of squared errors (SSE).
  3. Selection: Choose parent solutions for the next generation, with higher-fitness solutions having a better chance.
  4. Crossover: Create new solutions by combining features from two parents, e.g., by averaging cluster centers.
  5. Mutation: Randomly alter some new solution features to maintain diversity and prevent premature convergence.
  6. Replacement: Replace some current solutions with the new ones.
  7. Termination: If a stopping criterion is met, stop and return the best initial seeds and k value found. Otherwise, repeat from step 2.
Print:
Else
Print: Fail
end if
end
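One way to read Algorithm 1 is sketched below. It is an illustration under stated assumptions, not the authors' code: each chromosome is a binary string with one bit per data point (a 1 marks that point as an initial seed, so k is the number of ones), a per-cluster `penalty` is added to the SSE so that larger k is not always preferred, and truncation selection and single-point crossover stand in for whichever operators and parameters the paper actually used.

```python
import random


def sse_after_kmeans(points, seeds, max_iter=50):
    """Run k-means from `seeds`; return the final sum of squared errors."""
    centroids = [tuple(s) for s in seeds]
    k = len(centroids)

    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def repair(bits, k_min, k_max, rng):
    """Force the number of selected seeds (ones) into [k_min, k_max]."""
    bits = bits[:]
    ones = [i for i, b in enumerate(bits) if b]
    zeros = [i for i, b in enumerate(bits) if not b]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > k_max:
        bits[ones.pop()] = 0
    while len(ones) < k_min:
        i = zeros.pop()
        bits[i] = 1
        ones.append(i)
    return bits


def ga_kmeans(points, k_min=2, k_max=6, penalty=2.0, pop_size=20,
              generations=30, mutation_rate=0.05, seed=0):
    """Search for k and initial seeds; returns (k, seeds, cost)."""
    rng = random.Random(seed)
    n = len(points)

    def cost(bits):
        # Fitness (lower is better): SSE plus an assumed penalty per cluster.
        seeds = [points[i] for i, b in enumerate(bits) if b]
        return sse_after_kmeans(points, seeds) + penalty * len(seeds)

    pop = [repair([rng.randint(0, 1) for _ in range(n)], k_min, k_max, rng)
           for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        parents = sorted(pop, key=cost)[:pop_size // 2]  # selection
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)                   # crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                      # mutation
            children.append(repair(child, k_min, k_max, rng))
        pop = children                                    # replacement
        cand = min(pop, key=cost)
        if cost(cand) < cost(best):
            best = cand
    seeds = [points[i] for i, b in enumerate(best) if b]
    return len(seeds), seeds, cost(best)


# Hypothetical data: three tight groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
       (20, 0), (20, 1), (21, 0)]
k, seeds, total_cost = ga_kmeans(pts)
```

The returned seeds can be passed directly to a k-means run as its starting centroids. The `penalty` value strongly influences which k is chosen and would need tuning on real data.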

3. System Implementation and Experimental Analysis

Port Sudan, the capital of Red Sea State in eastern Sudan, functions as Sudan’s primary seaport. Over 90% of Sudan’s international trade flows through Port Sudan’s modern port facilities, which were built between 1905 and 1909 to replace the historical Arab port of Suakin. Port Sudan features key infrastructure like an international airport, an oil refinery, and state-of-the-art cargo and passenger terminals. Located on the western coast of the Red Sea, the port handles significant volumes of container traffic, bulk commodities, and roll-on/roll-off shipments. Port Sudan serves as a strategic gateway for neighboring countries, including landlocked South Sudan and Ethiopia, as well as Eritrea. It provides access to key trade routes like the Suez Canal and the Bab el Mandeb Strait. With ample developable land and a deep-water harbor, the port has potential for significant expansion to support Sudan’s growing trade volumes and improve supply chain connectivity. However, challenges remain around port efficiency and infrastructure constraints that have hindered full realization of Port Sudan’s potential as a regional shipping hub. Ongoing developments and investments aim to address these issues, upgrade port facilities, digitize processes, and expand container handling and logistics services. If successful, such efforts could transform Port Sudan into a modern logistics center that enhances Sudan’s trade competitiveness and links the nation to global supply networks. Figure 4 shows Port Sudan City in Red Sea State, Sudan. Six tourist destinations distributed in the study area were selected. Table 3 and Figure 5 show these proposed points of interest (POIs) in Port Sudan City. Following the selection, questionnaires were distributed to these chosen destinations.

4. Results and Discussion

4.1. Results

Based on the results of online questionnaires distributed on social media platforms in the study area, we determined the static tourism objectives with input from 600 visitors to enhance our approach. We collected the external tourism objectives from the TripAdvisor social media website. To implement our approach, first, we improved the GA by using new parameters, and then we determined the optimal k value and initial seeds. The optimal k value is 5, and the optimal initial seed selection is 10. Second, we used the enhanced k-means algorithm to cluster the static tourism objectives based on the value of k. Five groups of visitors were created for each tourism objective. The optimal path recommendations for visitor destinations, determined through the enhanced GA with improved parameters in Table 7, are presented in Table 8.
Table 9 and Figure 6 show groups of the internal tourism objectives based on enhancing the k-means algorithm.
After creating tables categorizing tourism objectives into groups, we constructed a comprehensive tourism objectives matrix comprising 12 matrices, nine for internal objectives and three for external objectives. The numbers in the matrices represent the numerical differences between the numbers of visitors in the EN groups across various destinations. For instance, the value 25 in the EN objectives matrix corresponds to the difference in the number of visitors between destination P1 and destination P2 in Group 1 in Table 9. Consequently, we calculated the matrices for all nine internal objectives in the same manner as illustrated in Table 10. Following this, we calculated 12 optimal paths based on static tourism objectives using the improved GA. Our approach involved creating objective matrices to address the TSP.
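The construction of an objective matrix and its use as a TSP cost matrix can be sketched as follows. The visitor counts are hypothetical (chosen only so that the P1 and P2 entry equals 25, as in the example above), the assumption that smaller differences mean cheaper transitions is ours, and an exhaustive search over permutations is swapped in for the improved GA because six destinations give only 120 candidate tours.

```python
from itertools import permutations


def difference_matrix(visitors):
    """Entry [i][j] is the absolute difference in visitor counts."""
    return [[abs(a - b) for b in visitors] for a in visitors]


def best_tour(matrix):
    """Cheapest closed tour that starts and ends at destination 0."""
    n = len(matrix)
    best_cost, best_path = None, None
    for rest in permutations(range(1, n)):
        path = (0,) + rest + (0,)
        cost = sum(matrix[path[i]][path[i + 1]] for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_path = cost, path
    return best_cost, best_path


# Hypothetical visitor counts for destinations P1..P6 in one group.
visitors = [60, 35, 50, 20, 45, 30]
matrix = difference_matrix(visitors)
cost, path = best_tour(matrix)
# matrix[0][1] -> 25 (difference between P1 and P2)
# cost -> 80
```

Exhaustive search does not scale beyond roughly ten destinations, which is where a heuristic such as the GA becomes useful.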

4.2. Discussion

When collecting and classifying tourism data, online surveys and social media play significant roles. In our approach, we propose to enhance the k-means algorithm for optimizing tourism data obtained from social media platforms. The primary challenge lies in determining the optimal k value for the k-means algorithm and selecting initial seeds, which can be addressed using various methods. The elbow method is based on the fundamental premise of plotting different cost values against varying k values. The elbow point on the graph, which represents the point of diminishing returns, can be used to select k [43,44]. However, the drawback of the elbow method is that it occasionally struggles to produce effective clusters. As an alternative, we employ an improved GA in our approach to determine the ideal value of k. Compared to other methods, the two-stage GA represents a relatively recent development. Many scholars have employed the k-means algorithm to classify and organize data into groups because of advantages such as ease of implementation, scalability to handle large datasets, guaranteed convergence, the ability to initialize centroids, and adaptability to new data points. However, one challenge associated with the k-means algorithm is the estimation of the optimal number of groups [41,45]. Our approach employs the GA to overcome this limitation.
In the process of clustering with k-means, initial seeds for clustering are selected. The method used to choose these seeds is dependent on the data and the problem being addressed. One commonly employed approach involves the random selection of initial seed points; the algorithm is executed multiple times, retaining the seeds that yield the lowest clustering error. Alternatively, an initial seed selection algorithm can be utilized, which selects the initial seeds from different clusters within the dataset. The choice between these methods is dictated by the characteristics of the data and the clustering objectives at hand [46]. In this approach, we utilize the GA to determine the optimal k value and to select initial seeds. Comparisons were conducted with several algorithms commonly used in the field of clustering. These include the expectation-maximization (EM) algorithm, hierarchical clustering, and the traditional k-means algorithm. Table 11 presents the comparison between the enhanced k-means algorithm and other machine learning clustering algorithms. When implementing machine learning algorithms, the most valuable parameter is the optimization time. The optimization time of the enhanced k-means algorithm is 0.01 s, and the number of iterations is five. These results indicate the superiority of the enhanced k-means algorithm over the other clustering algorithms.
In this study, we made significant enhancements to the k-means algorithm. These enhancements specifically improve the methods used to determine the number of groups (k) and the selection of initial seeds. We achieved these improvements by utilizing an enhanced GA and optimizing its parameters. We applied this improved GA to predict optimal tourist paths. This prediction is based on tourism objectives derived from social media platforms, considering factors such as popular destinations and peak travel times. Although our approach was specifically implemented for the Port Sudan region of the Red Sea State in Sudan, it can be applied to other regions as well. The suitability of other regions for this approach depends on factors such as the variety of tourist targets, the population characteristics, and the prevalence of relevant social media platforms. However, for accurate and effective implementation, it is essential to conduct a comprehensive study of the region’s specific tourist objectives, the population characteristics, and the relevant social networking sites.

5. Conclusions and Future Work

Tourism objectives derived from social media can offer new opportunities for decision support in recommending tourism paths. In this paper, we propose an innovative approach to optimize and classify tourism objectives for recommending the optimal tourism path, using the k-means algorithm and GA. We also integrate various tools for this purpose, demonstrating the applicability of an improved k-means algorithm and GA for developing tourism path planning. Additionally, we utilize the GIS to implement and visualize efficient social media tourism objectives and display the optimal routes. Our approach is organized as follows: First, the tourism objectives were collected from surveys and social media platforms. Second, the GA was used to enhance the k-means algorithm with a new parameter for clustering tourism objectives. Finally, a comparison and combination were performed with the algorithms currently used in the GIS environment. The following points are recommended:
Optimize and classify 12 tourism objectives based on social media platforms to determine the path of tourists.
Apply the GA to determine the number of clusters, initial seeds, and the optimal path planning.
Optimize and visualize the tourism path planning approach based on the social media tourism objectives.
There is still room for improvement in applying ML algorithms to social media data. Future work can focus on increasing the number of objectives and on fusing internal and external objectives in the evaluations of web users.

Author Contributions

Conceptualization, Mohamed A. Damos, Rashad Elhabob, and Jun Zhu; methodology, Weilian Li; software, Elhadi Khalifa; validation, Abubakr Hassan, Mohamed A. Damos, Weilian Li, and Jun Zhu; writing–original draft preparation, Abubakr Hassan; writing–review and editing, Elhadi Khalifa; visualization, Jun Zhu, Abubakr Hassan, and Alaa Hm; funding acquisition, Mohamed A. Damos; resources, Weilian Li and Esra Ei. All authors have read and agreed to the published version of the manuscript.

Funding

This article was supported by the National Natural Science Foundation of China [grant number 42171397].

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, Mohamed A. Damos, upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Minazzi, R. Social Media Marketing in Tourism and Hospitality; Springer: Cham, Switzerland, 2015.
  2. Tenemaza, M.; Luján-Mora, S.; De Antonio, A.; Ramirez, J. Improving itinerary recommendations for tourists through metaheuristic algorithms: An optimization proposal. IEEE Access 2020, 8, 79003–79023.
  3. Lee, J.Y.; Tsou, M.-H. Mapping spatiotemporal tourist behaviors and hotspots through location-based photo-sharing service (Flickr) data. In Progress in Location Based Services 2018; Springer: Cham, Switzerland, 2018; pp. 315–334.
  4. Wan, L.; Hong, Y.; Huang, Z.; Peng, X.; Li, R. A hybrid ensemble learning method for tourist route recommendations based on geo-tagged social networks. Int. J. Geogr. Inf. Sci. 2018, 32, 2225–2246.
  5. Zhu, J.; Zhang, J.; Zhu, Q.; Li, W.; Wu, J.; Guo, Y. A knowledge-guided visualization framework of disaster scenes for helping the public cognize risk information. Int. J. Geogr. Inf. Sci. 2024, 38, 1–28.
  6. Aftab, S.; Khan, M.M. Role of social media in promoting tourism in Pakistan. J. Soc. Sci. Humanit. 2019, 58, 101–113.
  7. Jimenez-Barreto, J.; Sthapit, E.; Rubio, N.; Campo, S. Exploring the dimensions of online destination brand experience: Spanish and North American tourists’ perspectives. Tour. Manag. Perspect. 2019, 31, 348–360.
  8. Ahsini, Y.; Díaz-Masa, P.; Inglés, B.; Rubio, A.; Martínez, A.; Magraner, A.; Conejero, J.A. The Electric Vehicle Traveling Salesman Problem on Digital Elevation Models for Traffic-Aware Urban Logistics. Algorithms 2023, 16, 402.
  9. Silva, C.E.; César, T.S.; Gomes, I.P.; Silva, J.A.; Wolf, D.F.; Alves, R.; Souza, J.R. Scheduling System for Multiple Self-driving Cars Using K-Means and Bio-inspired Optimization Algorithms. SN Comput. Sci. 2023, 4, 647.
  10. Gaur, L.; Afaq, A.; Solanki, A.; Singh, G.; Sharma, S.; Jhanjhi, N.; My, H.T.; Le, D.-N. Capitalizing on big data and revolutionary 5G technology: Extracting and visualizing ratings and reviews of global chain hotels. Comput. Electr. Eng. 2021, 95, 107374.
  11. Hamid, R.A.; Albahri, A.S.; Alwan, J.K.; Al-Qaysi, Z.; Albahri, O.S.; Zaidan, A.; Alnoor, A.; Alamoodi, A.H.; Zaidan, B. How smart is e-tourism? A systematic review of smart tourism recommendation system applying data management. Comput. Sci. Rev. 2021, 39, 100337.
  12. Li, W.; Zhu, J.; Zhu, Q.; Zhang, J.; Han, X.; Dehbi, Y. Visual attention-guided augmented representation of geographic scenes: A case of bridge stress visualization. Int. J. Geogr. Inf. Sci. 2024, 38.
  13. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 2020, 9, 1295.
  14. Jahwar, A.F.; Abdulazeez, A.M. Meta-heuristic algorithms for K-means clustering: A review. PalArch’s J. Archaeol. Egypt/Egyptol. 2020, 17, 12002–12020.
  15. Huang, J. Design of Tourism Data Clustering Analysis Model Based on K-Means Clustering Algorithm. In International Conference on Multi-Modal Information Analytics; Springer: Cham, Switzerland, 2022; pp. 373–380.
  16. Yuan, C.; Yang, H. Research on K-value selection method of K-means clustering algorithm. J 2019, 2, 226–235.
  17. Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210.
  18. Yang, Z.; Jiang, F.; Yu, X.; Du, J. Initial Seeds Selection for K-means Clustering Based on Outlier Detection. In Proceedings of the 2022 5th International Conference on Software Engineering and Information Management (ICSIM), Yokohama, Japan, 21–23 January 2022; pp. 138–143.
  19. Li, W.; Zhu, J.; Fu, L.; Zhu, Q.; Xie, Y.; Hu, Y. An augmented representation method of debris flow scenes to improve public perception. Int. J. Geogr. Inf. Sci. 2021, 35, 1521–1544.
  20. Han, M. Research on optimization of K-means Algorithm Based on Spark. In Proceedings of the 2023 IEEE 6th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 24–26 February 2023; pp. 1829–1836.
  21. Bahmani, B.; Moseley, B.; Vattani, A.; Kumar, R.; Vassilvitskii, S. Scalable k-means++. arXiv 2012, arXiv:1203.6402.
  22. Chaudhary, M.; Pruthi, J.; Jain, V.K.; Suryakant. A novel squirrel search clustering algorithm for text document clustering. Int. J. Inf. Technol. 2022, 14, 3277–3286.
  23. Al Shaqsi, J.; Wang, W. Robust Clustering Ensemble Algorithm. SSRN Electron. J. 2022.
  24. Alzyadat, T.; Yamin, M.; Chetty, G. Genetic algorithms for the travelling salesman problem: A crossover comparison. Int. J. Inf. Technol. 2020, 12, 209–213.
  25. Al-Kaseem, B.R.; Taha, Z.K.; Abdulmajeed, S.W.; Al-Raweshidy, H.S. Optimized energy–efficient path planning strategy in WSN with multiple Mobile sinks. IEEE Access 2021, 9, 82833–82847.
  26. Chen, J.; Zhang, Y.; Wu, L.; You, T.; Ning, X. An adaptive clustering-based algorithm for automatic path planning of heterogeneous UAVs. IEEE Trans. Intell. Transp. Syst. 2021, 23, 16842–16853.
  27. Ahmed, A.; Ju, H.; Yang, Y.; Xu, H. An Improved Unit Quaternion for Attitude Alignment and Inverse Kinematic Solution of the Robot Arm Wrist. Machines 2023, 11, 669.
  28. Hu, F.; Li, Z.; Yang, C.; Jiang, Y. A graph-based approach to detecting tourist movement patterns using social media data. Cartogr. Geogr. Inf. Sci. 2019, 46, 368–382.
  29. Riaz, M.; Sherani. Investigation of information sharing via multiple social media platforms: A comparison of Facebook and WeChat adoption. Qual. Quant. 2021, 55, 1751–1773.
  30. Hashimy, S.Q.; Halim, T.S. The Impact of Social Media on Afghanistan’s Tourism Industry: A Roadmap for the Future in the Internet Highway. Law Soc. Policy Rev. 2023, 1, 17–50.
  31. Sakas, D.P.; Reklitis, D.P.; Terzi, M.C.; Vassilakis, C. Multichannel digital marketing optimizations through Big Data Analytics in the tourism and Hospitality Industry. J. Theor. Appl. Electron. Commer. Res. 2022, 17, 1383–1408.
  32. Kim, J.; Kang, Y. Automatic classification of photos by tourist attractions using deep learning model and image feature vector clustering. ISPRS Int. J. Geo-Inf. 2022, 11, 245.
  33. Chen, Y.; Zheng, X.; Fang, Z.; Yu, Y.; Kuang, Z.; Huang, Y. Research on optimization of tourism route based on genetic algorithm. J. Phys. Conf. Ser. 2020, 1575, 012027.
  34. Kamsing, P.; Torteeka, P.; Yooyen, S.; Yenpiem, S.; Delahaye, D.; Notry, P.; Phisannupawong, T.; Channumsin, S. Aircraft trajectory recognition via statistical analysis clustering for Suvarnabhumi International Airport. In Proceedings of the 2020 22nd International Conference on Advanced Communication Technology (ICACT), Pyeongchang, Republic of Korea, 16–19 February 2020; pp. 290–297.
  35. Dadashpour Moghaddam, M.; Ahmadzadeh, H.; Valizadeh, R. A GIS-based assessment of urban tourism potential with a branding approach utilizing hybrid modeling. Spat. Inf. Res. 2022, 30, 399–416.
  36. Zhou, X.; Xu, C.; Kimmons, B. Detecting tourism destinations using scalable geospatial analysis based on cloud computing platform. Comput. Environ. Urban Syst. 2015, 54, 144–153.
  37. Wang, H.; Yan, J. Effects of social media tourism information quality on destination travel intention: Mediation effect of self-congruity and trust. Front. Psychol. 2022, 13, 1049149.
  38. Sarkar, S.K.; George, B. Social media technologies in the tourism industry: An analysis with special reference to their role in sustainable tourism development. Int. J. Tour. Sci. 2018, 18, 269–278.
  39. Tahir, M.; Tubaishat, A.; Al-Obeidat, F.; Shah, B.; Halim, Z.; Waqas, M. A novel binary chaotic genetic algorithm for feature selection and its utility in affective computing and healthcare. Neural Comput. Appl. 2020, 34, 1–22.
  40. Damos, M.A.; Zhu, J.; Li, W.; Hassan, A.; Khalifa, E. A novel urban tourism path planning approach based on a multiobjective genetic algorithm. ISPRS Int. J. Geo-Inf. 2021, 10, 530.
  41. Pizzuti, C.; Procopio, N. A k-means based genetic algorithm for data clustering. In Proceedings of the International Joint Conference SOCO’16-CISIS’16-ICEUTE’16, San Sebastián, Spain, 19–21 October 2016; Proceedings 11. pp. 211–222.
  42. Tabianan, K.; Velu, S.; Ravi, V. K-means clustering approach for intelligent customer segmentation using customer purchase behavior data. Sustainability 2022, 14, 7243.
  43. Ghezelbash, R.; Maghsoudi, A.; Shamekhi, M.; Pradhan, B.; Daviran, M. Genetic algorithm to optimize the SVM and K-means algorithms for mapping of mineral prospectivity. Neural Comput. Appl. 2023, 35, 719–733.
  44. Zubair, M.; Iqbal, M.A.; Shil, A.; Chowdhury, M.; Moni, M.A.; Sarker, I.H. An improved K-means clustering algorithm towards an efficient data-driven modeling. Ann. Data Sci. 2022, 9, 1–20.
  45. Daviran, M.; Ghezelbash, R.; Niknezhad, M.; Maghsoudi, A.; Ghaeminejad, H. Hybridizing K-means clustering algorithm with harmony search and artificial bee colony optimizers for intelligence mineral prospectivity mapping. Earth Sci. Inform. 2023, 16, 2143–2165.
  46. Sajidha, S.; Desikan, K.; Chodnekar, S.P. Initial seed selection for mixed data using modified k-means clustering algorithm. Arab. J. Sci. Eng. 2020, 45, 2685–2703.
Figure 1. Process of recommending the best path based on survey and social media tourism data.
Figure 2. GA operations.
Figure 3. Framework of recommended tourism path enhancing the k-means algorithm through GA [40].
Figure 4. Location of the Red Sea State in Sudan.
Figure 5. Tourist sites distributed within the city.
Figure 6. Groups of the internal tourism objectives based on enhancing the k-means algorithm.
Table 1. Definition of tourism objectives.
| Selected Objective | Explanation and References | Evaluation Scale | Rating |
| --- | --- | --- | --- |
| Entertainment value (EN) | The value of entertainment refers to the entertainment available to the visitor in the tourist site. | Very high | 10 |
| | | High | 7 |
| | | Medium | 4 |
| | | Low | 1 |
| Aesthetic and art (AA) | Aesthetics and arts include the aesthetic and artistic sensitivity and the practical, cultural, and philosophical qualities of the site. | Very high | 10 |
| | | High | 7 |
| | | Medium | 4 |
| | | Low | 1 |
| Cultural–historical value (CH) | The historical and cultural value is considered to be one of the most important factors that affects why tourists flock to tourist sites. | Very high | 10 |
| | | High | 7 |
| | | Medium | 4 |
| | | Low | 1 |
| Scientific value (SI) | The scientific value of the tourist site indicates the scientific importance of the site, such as universities and others. | Very high | 10 |
| | | High | 7 |
| | | Medium | 4 |
| | | Low | 1 |
| Size of tourism destination (TD) | The size of the tourist destination, the height of the place, and the ability of the tourist destination to accommodate tourists. | >50 km² | 10 |
| | | >10–50 km² | 7 |
| | | 1–10 km² | 4 |
| | | <1 km² | 1 |
| Tourism seasonality (TS) | Tourism seasonality is the possibility of visiting a tourist site in a specific season of the year; some sites, such as museums, can be visited year-round, while others, such as gardens, are seasonal. | >300 days/year | 10 |
| | | >200–300 days/year | 7 |
| | | 100–200 days/year | 4 |
| | | <100 days/year | 1 |
| Quality of service (QS) | Quality of service includes all services provided within tourist sites such as restaurants, cafes, shops, and others. | Very high | 10 |
| | | High | 7 |
| | | Medium | 4 |
| | | Low | 1 |
| Time in site (TI) | This includes the time spent by the visitor inside the site, taking into account the opening and closing times of the gates. | >3 | 10 |
| | | >2–3 | 7 |
| | | >1–2 | 4 |
| | | 0–1 | 1 |
| Biodiversity (BI) | The value of biological diversity is evaluated according to the different types of endemic animals. | Very high | 10 |
| | | High | 7 |
| | | Medium | 4 |
| | | Low | 1 |
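Table 2. Sample of online questionnaire results.
| Visitor No. | EN | AA | CH | SI | TD | TS | QS | TI | BI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Low | Medium | Medium | V.high | 3 | 3 | Medium | 7 | Low |
| 2 | Medium | Low | High | V.high | 4 | 4 | V.high | 5 | High |
| 3 | Medium | High | V.high | High | 7 | 5 | V.high | 3 | V.high |
| 4 | High | V.high | Medium | V.high | 5 | 5 | Medium | 10 | Medium |
| 5 | Medium | Low | V.high | High | 3 | 7 | V.high | 7 | Low |
| 6 | High | Medium | Low | Medium | 10 | 5 | High | 5 | High |
| 7 | Low | Low | High | Medium | 7 | 3 | Medium | 7 | Medium |
| 8 | Low | High | Medium | V.high | 3 | 7 | V.high | 5 | Medium |
| 9 | High | Low | High | V.high | 10 | 4 | V.high | 3 | V.high |
| 10 | Low | V.high | Low | Medium | 5 | 7 | High | 7 | Medium |
| 11 | V.high | Medium | Low | V.high | 5 | 10 | Medium | 10 | V.high |
| 12 | Low | Medium | High | V.high | 3 | 4 | Medium | 3 | V.high |
| 13 | V.high | Medium | V.high | High | 7 | 5 | V.high | 5 | High |
| 14 | High | V.high | Medium | V.high | 5 | 5 | Low | 3 | Medium |
| 15 | High | V.high | Medium | V.high | 10 | 5 | High | 5 | High |
| 16 | Low | Medium | Medium | Low | 10 | 5 | V.high | 7 | Low |
| 17 | High | V.high | Medium | V.high | 5 | 5 | V.high | 5 | High |
| 18 | Medium | High | V.high | High | 7 | 5 | V.high | 3 | V.high |
| 19 | High | V.high | Medium | V.high | 3 | 7 | V.high | 7 | Low |
| 20 | Low | Low | High | Medium | 7 | 3 | Medium | 7 | Medium |
| 21 | High | Medium | Low | Medium | 3 | 7 | V.high | 5 | Medium |
| 22 | Low | Low | High | Medium | 10 | 4 | V.high | 3 | V.high |
| 23 | High | Medium | Low | Medium | 7 | 3 | Medium | 7 | Medium |
| 24 | Low | Low | High | Medium | 3 | 7 | V.high | 5 | Medium |
| 25 | High | V.high | Medium | V.high | 5 | 5 | Medium | 10 | Medium |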
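Before clustering, the verbal questionnaire answers have to be turned into numbers. The sketch below shows one way to apply the Table 1 rating scale (Very high = 10, High = 7, Medium = 4, Low = 1) to a row of Table 2. The function and variable names are hypothetical, and passing the already numeric cells (TD, TS, TI) through unchanged is an assumption of this example, since the text does not say how those columns were encoded.

```python
# Ordinal scale from Table 1: verbal label -> rating.
SCALE = {"v.high": 10, "very high": 10, "high": 7, "medium": 4, "low": 1}
OBJECTIVES = ["EN", "AA", "CH", "SI", "TD", "TS", "QS", "TI", "BI"]

def to_ratings(row):
    """Map one questionnaire row to numbers, keyed by objective code."""
    ratings = {}
    for name, cell in zip(OBJECTIVES, row):
        if isinstance(cell, (int, float)):
            ratings[name] = cell  # assumption: numeric cells are used as given
        else:
            ratings[name] = SCALE[cell.strip().lower()]
    return ratings

# Visitor 1 from Table 2.
visitor_1 = ["low", "Medium", "Medium", "V.high", 3, 3, "Medium", 7, "low"]
print(to_ratings(visitor_1))
```

Normalizing the label case in one place also absorbs the mixed capitalization ("low" versus "Medium") that appears in the questionnaire export.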
Table 3. Tourism destinations in Port Sudan City.
| POI | E (°) | N (°) | Name |
| --- | --- | --- | --- |
| P1 | 37.45045 | 19.72309 | Sanganeb Reserve |
| P2 | 37.34297 | 19.11629 | Othman Digna port |
| P3 | 37.33744 | 19.11293 | Suakin city |
| P4 | 37.10517 | 18.76735 | Arquette Resort |
| P5 | 37.10307 | 18.77391 | Lake Arquette |
| P6 | 37.24676 | 19.55832 | Red Sea Resort |
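Table 4. Matrix of tourist preferences.
| POI | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| P1 | 0 | 2 | 4 | 1 | 3 | 3 |
| P2 | 2 | 0 | 2 | 4 | 2 | 3 |
| P3 | 4 | 3 | 0 | 1 | 2 | 3 |
| P4 | 1 | 4 | 1 | 0 | 3 | 1 |
| P5 | 3 | 2 | 2 | 3 | 0 | 4 |
| P6 | 3 | 3 | 3 | 1 | 4 | 0 |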
Table 5. Matrix of distance between tourism destinations (km).
| POI | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| P1 | 0 | 67.5 | 68.9 | 112.5 | 111.9 | 29.9 |
| P2 | 67.5 | 0 | 0.45 | 45.4 | 44.2 | 50.3 |
| P3 | 68.9 | 0.45 | 0 | 44.11 | 44.4 | 49.9 |
| P4 | 112.5 | 45.4 | 44.11 | 0 | 1.21 | 89.9 |
| P5 | 111.9 | 44.2 | 44.4 | 1.21 | 0 | 88.8 |
| P6 | 29.9 | 50.3 | 49.9 | 89.9 | 88.8 | 0 |
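Table 6. Matrix of travel costs between destinations (SDG).
| POI | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| P1 | 0 | 500 | 400 | 300 | 450 | 300 |
| P2 | 500 | 0 | 250 | 350 | 300 | 700 |
| P3 | 400 | 250 | 0 | 250 | 200 | 150 |
| P4 | 300 | 350 | 250 | 0 | 150 | 400 |
| P5 | 450 | 300 | 200 | 150 | 0 | 300 |
| P6 | 300 | 700 | 150 | 400 | 300 | 0 |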
Table 7. Parameter settings of GA.
| Parameters | Values |
| --- | --- |
| Population size | 100 |
| Crossover probability | 0.85 |
| Mutation probability | 0.10 |
| Number of generations | 4000 |
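To show how the Table 7 settings could drive a path search, here is a minimal permutation GA run on the Table 5 distance matrix. It is a sketch under stated assumptions rather than the authors' code: the simplified order crossover, swap mutation, binary tournament selection, elitism, fixed random seed, and all function names are hypothetical choices. The defaults mirror Table 7, but the demo call uses fewer generations to stay quick.

```python
import random

# Distance matrix from Table 5 (km), POI order P1..P6.
D = [
    [0, 67.5, 68.9, 112.5, 111.9, 29.9],
    [67.5, 0, 0.45, 45.4, 44.2, 50.3],
    [68.9, 0.45, 0, 44.11, 44.4, 49.9],
    [112.5, 45.4, 44.11, 0, 1.21, 89.9],
    [111.9, 44.2, 44.4, 1.21, 0, 88.8],
    [29.9, 50.3, 49.9, 89.9, 88.8, 0],
]

def tour_length(tour):
    """Length of the closed tour, including the leg back to the start."""
    return sum(D[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def order_crossover(a, b, rng):
    """Keep a slice of parent a in place, fill the remaining slots in parent b's order."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = a[i:j + 1]
    rest = [g for g in b if g not in middle]
    return rest[:i] + middle + rest[i:]

def run_ga(pop_size=100, pc=0.85, pm=0.10, generations=4000, seed=0):
    """Return (best_tour, best_length); tours are lists of zero-based POI indices."""
    rng = random.Random(seed)
    n = len(D)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = min(pop, key=tour_length)
    for _ in range(generations):
        nxt = [best[:]]                                   # elitism
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 2), key=tour_length)  # binary tournament
            b = min(rng.sample(pop, 2), key=tour_length)
            child = order_crossover(a, b, rng) if rng.random() < pc else a[:]
            if rng.random() < pm:                         # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            nxt.append(child)
        pop = nxt
        best = min(pop, key=tour_length)
    return best, tour_length(best)

best_tour, best_km = run_ga(generations=200)  # fewer generations than Table 7, for a quick demo
print(" -> ".join(f"P{i + 1}" for i in best_tour), round(best_km, 2))
```

With only six POIs an exhaustive search would also be feasible; the GA form matters once the number of destinations grows.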
Table 8. Optimal tourism paths.
| NO | The Objectives | The Path |
| --- | --- | --- |
| 1 | EN | P1-P5-P6-P3-P2-P4-P1 |
| 2 | AA | P4-P2-P1-P4-P6-P5-P4 |
| 3 | CH | P5-P2-P4-P6-P3-P1-P5 |
| 4 | SI | P3-P4-P6-P1-P2-P5-P3 |
| 5 | TD | P2-P4-P6-P3-P1-P5-P2 |
| 6 | TS | P4-P6-P4-P1-P2-P3-P4 |
| 7 | QS | P6-P1-P5-P2-P3-P4-P6 |
| 8 | TI | P6-P3-P2-P4-P5-P1-P6 |
| 9 | BI | P5-P3-P4-P1-P2-P6-P5 |
| 10 | Tourist preferences | P1-P5-P6-P3-P2-P4-P1 |
| 11 | Travel costs | P6-P5-P2-P1-P3-P4-P5 |
| 12 | Total distances | P2-P3-P4-P5-P6-P1-P2 |
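A path from Table 8 can be scored against the matrices in Tables 5 and 6 with a few lines of code. The sketch below is for illustration only: the helper names are hypothetical, and the matrices are transcribed from the tables in POI order P1 to P6. It totals the distance and travel cost of the "Total distances" path and checks whether a path is a closed tour that visits every POI once.

```python
# Symmetric matrices transcribed from Table 5 (km) and Table 6 (SDG), POI order P1..P6.
DIST_KM = [
    [0, 67.5, 68.9, 112.5, 111.9, 29.9],
    [67.5, 0, 0.45, 45.4, 44.2, 50.3],
    [68.9, 0.45, 0, 44.11, 44.4, 49.9],
    [112.5, 45.4, 44.11, 0, 1.21, 89.9],
    [111.9, 44.2, 44.4, 1.21, 0, 88.8],
    [29.9, 50.3, 49.9, 89.9, 88.8, 0],
]
COST_SDG = [
    [0, 500, 400, 300, 450, 300],
    [500, 0, 250, 350, 300, 700],
    [400, 250, 0, 250, 200, 150],
    [300, 350, 250, 0, 150, 400],
    [450, 300, 200, 150, 0, 300],
    [300, 700, 150, 400, 300, 0],
]

def parse_path(text):
    """'P2-P3-...' -> zero-based POI indices [1, 2, ...]."""
    return [int(token[1:]) - 1 for token in text.split("-")]

def path_total(path, matrix):
    """Sum the matrix entries along consecutive legs of the path."""
    return sum(matrix[a][b] for a, b in zip(path, path[1:]))

def is_closed_tour(path, n=6):
    """True if the path returns to its start and visits each of the n POIs exactly once."""
    return path[0] == path[-1] and sorted(path[:-1]) == list(range(n))

row_12 = parse_path("P2-P3-P4-P5-P6-P1-P2")    # "Total distances" row of Table 8
print(round(path_total(row_12, DIST_KM), 2))    # 231.97 (km)
print(path_total(row_12, COST_SDG))             # 1750 (SDG)
print(is_closed_tour(row_12))                   # True
print(is_closed_tour(parse_path("P4-P2-P1-P4-P6-P5-P4")))  # False: the AA row as printed repeats P4 and omits P3
```

The same `path_total` call works with any of the objective matrices, so a single path can be compared across distance, cost, and preference criteria.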
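Table 9. Results of internal tourism objective groups of visitors using the enhanced k-means algorithm.
Group 1
| POI/Objective | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| EN | 138 | 113 | 101 | 113 | 122 | 131 |
| AA | 125 | 150 | 95 | 187 | 102 | 151 |
| CH | 122 | 120 | 135 | 121 | 78 | 88 |
| SI | 135 | 125 | 131 | 99 | 90 | 135 |
| TD | 80 | 89 | 106 | 87 | 91 | 78 |
| TS | 74 | 123 | 105 | 173 | 85 | 190 |
| QS | 78 | 111 | 135 | 109 | 106 | 136 |
| TI | 92 | 109 | 105 | 135 | 120 | 198 |
| BI | 145 | 189 | 143 | 143 | 153 | 110 |
Group 2
| POI/Objective | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| EN | 102 | 123 | 132 | 124 | 164 | 122 |
| AA | 168 | 108 | 132 | 120 | 160 | 199 |
| CH | 139 | 121 | 144 | 108 | 130 | 151 |
| SI | 109 | 125 | 154 | 90 | 170 | 176 |
| TD | 175 | 131 | 131 | 124 | 119 | 190 |
| TS | 123 | 187 | 89 | 138 | 118 | 120 |
| QS | 95 | 108 | 156 | 149 | 67 | 99 |
| TI | 84 | 125 | 139 | 120 | 77 | 98 |
| BI | 101 | 106 | 76 | 132 | 90 | 121 |
Group 3
| POI/Objective | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| EN | 134 | 153 | 90 | 129 | 93 | 106 |
| AA | 110 | 109 | 77 | 109 | 113 | 77 |
| CH | 167 | 137 | 87 | 111 | 156 | 127 |
| SI | 140 | 171 | 100 | 109 | 128 | 118 |
| TD | 160 | 126 | 178 | 160 | 143 | 98 |
| TS | 155 | 103 | 90 | 131 | 210 | 76 |
| QS | 123 | 161 | 124 | 129 | 186 | 67 |
| TI | 93 | 150 | 159 | 140 | 89 | 127 |
| BI | 90 | 135 | 87 | 98 | 77 | 145 |
Group 4
| POI/Objective | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| EN | 111 | 86 | 137 | 97 | 107 | 109 |
| AA | 97 | 86 | 98 | 96 | 107 | 86 |
| CH | 99 | 148 | 155 | 167 | 116 | 150 |
| SI | 104 | 90 | 114 | 119 | 98 | 115 |
| TD | 104 | 110 | 82 | 143 | 163 | 109 |
| TS | 107 | 117 | 98 | 77 | 109 | 62 |
| QS | 113 | 150 | 93 | 87 | 111 | 120 |
| TI | 190 | 160 | 107 | 120 | 109 | 121 |
| BI | 99 | 78 | 130 | 118 | 160 | 129 |
Group 5
| POI/Objective | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| EN | 115 | 125 | 140 | 137 | 114 | 132 |
| AA | 100 | 147 | 198 | 88 | 118 | 87 |
| CH | 73 | 74 | 79 | 93 | 120 | 84 |
| SI | 112 | 89 | 101 | 183 | 114 | 56 |
| TD | 81 | 84 | 103 | 86 | 84 | 125 |
| TS | 141 | 70 | 218 | 81 | 78 | 152 |
| QS | 191 | 70 | 92 | 126 | 130 | 178 |
| TI | 141 | 56 | 90 | 85 | 205 | 56 |
| BI | 165 | 92 | 164 | 109 | 120 | 95 |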
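Table 10. EN objective matrix.
| POI | P1 | P2 | P3 | P4 | P5 | P6 |
| --- | --- | --- | --- | --- | --- | --- |
| P1 | 0 | 25 | 37 | 25 | 16 | 7 |
| P2 | 25 | 0 | 55 | 37 | 48 | 1 |
| P3 | 37 | 55 | 0 | 14 | 57 | 47 |
| P4 | 25 | 37 | 14 | 0 | 9 | 36 |
| P5 | 16 | 48 | 57 | 9 | 0 | 13 |
| P6 | 7 | 1 | 47 | 36 | 13 | 0 |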
Table 11. Comparisons of experimental results.
AlgorithmTime Optimizations (s)Number of AlterationsNumber of Clusters
Enhancing k-means algorithm0.0155
Traditional k-means algorithm0.285
EM algorithm0.27225
Hierarchical cluster algorithm0.7095

Share and Cite

MDPI and ACS Style

Damos, M.A.; Zhu, J.; Li, W.; Khalifa, E.; Hassan, A.; Elhabob, R.; Hm, A.; Ei, E. Enhancing the K-Means Algorithm through a Genetic Algorithm Based on Survey and Social Media Tourism Objectives for Tourism Path Recommendations. ISPRS Int. J. Geo-Inf. 2024, 13, 40. https://doi.org/10.3390/ijgi13020040
