Enhancing the K-Means Algorithm through a Genetic Algorithm Based on Survey and Social Media Tourism Objectives for Tourism Path Recommendations

Social media platforms play a vital role in determining valuable tourist objectives, which greatly aids in optimizing tourist path planning. As data classification and analysis methods have advanced, machine learning (ML) algorithms such as the k-means algorithm have emerged as powerful tools for sorting through data collected from social media platforms. However, traditional k-means algorithms have drawbacks, including challenges in determining initial seed values. This paper presents a novel approach to enhance the k-means algorithm based on survey and social media tourism data for tourism path recommendations. The main contribution of this paper is enhancing the traditional k-means algorithm by employing the genetic algorithm (GA) to determine the number of clusters (k), select the initial seeds, and recommend the best tourism path based on social media tourism data. The GA enhances the k-means algorithm by using a binary string to represent initial centers and to apply GA operators. To assess its effectiveness, we applied this approach to recommend the optimal tourism path in the Red Sea State, Sudan. The results clearly indicate the superiority of our approach, with an algorithm optimization time of 0.01 s. In contrast, traditional k-means and hierarchical cluster algorithms required 0.27 and 0.7 s, respectively.


Introduction
The rapid proliferation of social media platforms has significantly expanded the availability of tourism objectives, enabling the recommendation of tourism paths from anywhere and at any time [1,2]. Travelers can effortlessly contribute to tourism objectives while on the go, using platforms such as WhatsApp, Facebook, Flickr, and Instagram. Additionally, users can share their current locations through services like Foursquare and provide feedback on tourist destinations via X, formerly known as Twitter. Location-based social media platforms like Foursquare and TripAdvisor empower users to share their experiences, reviews, and recommendations of tourist destinations, often including geographical coordinates [3][4][5]. By considering factors such as tourist travel costs, preferences, internal tourism objectives, destinations, and transportation options, it is possible to create recommended itineraries using both survey and social media tourism objectives [6,7]. The process for recommending tourist routes based on survey and social media data involves two key steps. First, the tourism objectives are clustered into groups, facilitating analysis and recommendations. Second, the optimal tourist path is recommended by solving the traveling salesman problem (TSP) [8,9].
The increasing volume of tourism data has added complexity to data collection and clustering, particularly when suggesting optimal tourist paths using ML algorithms and solving the TSP with an optimization algorithm. These methods are instrumental in analyzing unstructured data from social media platforms, ultimately enhancing path planning and destination recommendations [10][11][12]. K-means is a commonly employed clustering algorithm for tourist data. However, it faces challenges in determining the optimal number of clusters (k) and in selecting initial seeds. The choice of k significantly influences the outcomes, and researchers often employ methods like the elbow method and the silhouette score to select k [13][14][15]. While effective, the elbow method may encounter difficulties when dealing with many clusters [16,17].
On the other hand, the silhouette score evaluates object similarity within clusters but may have limitations when dealing with overlapping or significantly varying-sized clusters. Selecting the initial seeds for k-means is a critical step in the process. Running the algorithm multiple times with random initialization helps identify the best outcome [18,19]. Various seed selection methods, such as random and k-means++, each come with tradeoffs involving simplicity, cluster quality, and computational efficiency [20,21]. Furthermore, the GA has emerged as a powerful optimization algorithm for addressing complex problems, including determining the appropriate number of clusters (k) and identifying optimal initial seeds for the k-means algorithm [22,23].
This paper introduces the use of the GA to address the challenges of determining k and selecting initial seeds. The GA effectively mitigates the limitations of previous methods, addressing issues such as data overlap, handling large datasets, and improving the execution time of the k-means algorithm. The GA enhances the traditional k-means algorithm by employing binary strings to represent initial centers and applying GA operators. This enhancement significantly improves the algorithm's effectiveness in classifying tourist data. Through parameter optimization, the algorithm becomes more proficient in accurately clustering and categorizing the extensive volume of tourism data, thus enhancing its overall performance. The optimization process iterates until convergence, where cluster assignments remain unchanged. Subsequently, the improved GA employs clustered and optimized tourism objectives to solve the TSP and recommend the optimal tourism path [24][25][26][27]. In summary, the main contributions of this paper are as follows:
➢ Enhancing the traditional k-means algorithm by using the GA to determine initial seeds, selecting the appropriate number of clusters (k), and recommending the best tourism path based on survey and social media tourism objectives.
➢ Collecting the tourism objectives from social media platforms through an online questionnaire and from TripAdvisor.
➢ Selecting and visualizing the optimal tourism path using GAs and the geographic information system (GIS) environment.
➢ Demonstrating the optimal time to implement the GA for finding the best tourism path through a comparison of our approach with other state-of-the-art methods.
The remainder of this article is structured as follows. In Section 2, we present the methodology and the idea of tourism objectives, survey and social media data, along with enhancing the k-means algorithm through the GA. In Section 3, the system implementation and experimental analysis are discussed. In Section 4, the results of this research and discussion are presented. Finally, the conclusion and future work are presented in Section 5.

Related Work
Tourism objectives are collected from various sources, including governmental and international institutions, geographical surveys, and popular social media platforms such as Facebook, WhatsApp, WeChat, and X. To address the static objectives of tourism planning, three main processes must be carried out: collecting tourism data, classifying tourism data, and employing optimization algorithms to determine the best route. In the following sections, we conduct a thorough analysis of studies related to the formulation of tourism objectives, highlighting key findings and methodologies.
Hu et al. [28] presented a method for deriving tourist movement patterns from X data involving a three-step process of cleaning geo-tagged posts to identify those authored by tourists. However, this method's reliance solely on X data may limit its ability to represent comprehensive tourist activity from other sources or platforms. A separate study by Riaz and Sherani [29] overcame this limitation by focusing on the factors influencing information sharing on multiple social media platforms, particularly the adoption of Facebook and WeChat. Hashimy and T. S. [30] explored the opportunities and challenges of using social media platforms such as WhatsApp and Facebook for tourism development in Afghanistan. The opportunities include increased visibility, user-generated content, direct communication, influencer marketing, and destination marketing. However, the paper also highlights challenges that must be addressed, such as ensuring tourist safety and the need for infrastructure development. Sakas et al. [31] considered multiple objectives, including transportation type and tourist preferences, collected from various social media platforms. These objectives collectively describe the tourist destination, falling under the category of internal objectives. However, this approach overlooks the external objectives associated with interactions between tourist destinations. Addressing the challenge of integrating both internal and external objectives within a unified approach is essential for the advancement of this field.
A novel approach developed by Kim et al. [32] focused on developing a deep learning model and an image feature vector clustering technique to automate the categorization of traveler images by tourism destinations. However, the paper has limitations, primarily focusing on spatial data and omitting information about the characteristics and features of tourist destinations. The study by Bouabdallaoui et al. introduces an innovative clustering architecture that integrates the GA and k-means, coupled with a hybrid topic discovery approach incorporating latent Dirichlet allocation (LDA) and bidirectional encoder representations from transformers (BERT). The primary objective of this novel method is to predict and analyze the most significant topics related to tourist shopping destinations in Morocco. However, a significant limitation of this paper is the lack of attention to determining the values for k and the initial seeds. This limitation stems from the paper's reliance on the random selection of both the number of groups (k) and the initial seeds, raising concerns regarding the robustness and reproducibility of the results.
Yafeng et al. [33] presented a new approach based on the GA to develop 47 tourism areas in Chongqing City, China. While this paper provides an intriguing approach for applying the GA to optimize tourism path planning, which could assist tour operators and planners in developing more effective, fun, and easy tourist trips, the scope of the study is centered on enhancing the planning of tourist routes specifically for the 47 scenic areas in Chongqing. As a result, the general applicability of the findings to different contexts or regions may be limited. Moreover, this research solely employed the GA to identify the optimal tourism path without delving into the diverse objectives of tourism or the various tourism data sources available, such as social media platforms.

Patcharin et al. [34] introduced a method for recognizing aircraft trajectories through statistical analysis clustering. It employs k-means clustering and Gaussian mixture clustering to group unstructured trajectories observed over Suvarnabhumi International Airport. The applicability of these findings to other domains, such as tourist destinations, may therefore be limited. Additionally, the k-means algorithm used in this approach is not optimized, which affects the algorithm's execution time.

Majid et al. proposed the development of urban tourism and branding for spatial modeling. The authors used a novel hybrid modeling approach combining k-means, fuzzy logic, and an artificial neural network (ANN) to assess urban tourism potential (AUTP). While this modeling provides valuable information for developing future strategies for urban tourism, the paper does not consider tourism data sources such as social media platforms, and it also overlooks various tourism objectives.

Mehrdad et al. [35] discuss the use of unsupervised clustering methods as data-driven models for mineral prospectivity mapping (MPM). A hybrid data-driven clustering model combines the k-means clustering algorithm with harmony search (HS) and artificial bee colony (ABC) metaheuristic optimization algorithms. This hybrid model can be used for the selection of optimum cluster centroids to highlight favorable targets in the prospecting stage of mineral exploration.
In conclusion, many papers have addressed the extraction and prediction of tourist paths based on social media platforms. However, the aforementioned papers lack a precise definition of survey and social media data in the context of tourism and its unique characteristics. They primarily analyze data from a single social media platform without comparing the suitability of various platforms for tourism research. Additionally, these papers do not introduce novel methods for analyzing social media data in the tourism domain. Furthermore, the implementation of algorithms to organize and categorize the spatial and attribute data of tourist destinations can be time-consuming.

Methodology
The proposed approach utilizes the combined power of the k-means algorithm and the GA to optimize and cluster social media tourism data, ultimately identifying the optimal tourism path. This approach can be broken down into several stages.
First, tourism objectives are collected from social media platforms using two main methods. The first method involves distributing questionnaires on the websites of tourist groups within popular social media platforms such as WeChat and WhatsApp. This allows for the direct collection of relevant information from users. The second method involves extracting objectives from the TripAdvisor website, which serves as a valuable source of tourism objectives.
Second, once the tourism objectives have been gathered from these platforms, they are segmented into distinct groups using the k-means algorithm. The initial value of k, representing the number of clusters, is determined using the GA. Additionally, 12 specific tourism objectives are carefully selected to evaluate and assess different tourist destinations.
Third, the GA is employed to suggest and determine the best tourist path based on the clustered data. This comprehensive and systematic approach allows researchers to gain valuable insights into the preferences and patterns of tourists. These stages are all shown in Figure 1.
ISPRS Int. J. Geo-Inf. 2024, 13, x FOR PEER REVIEW

Survey and Social Media Data
Social media platforms have evolved into indispensable resources for the collection of tourism data. These platforms hold significant importance within the tourism industry, primarily because they enable the real-time sharing of user-generated content [36,37]. Tourists utilize social media as a medium for sharing their tourism experiences, recommendations, and feedback, thus contributing to the creation of an extensive repository of information that holds immense value for researchers and tourism experts. By analyzing this rich trove of tourism data, researchers can glean valuable insights into tourist preferences. This wealth of data empowers businesses and destinations to make well-informed decisions and tailor their services to align with the evolving requirements and preferences of tourists. The widespread adoption of social media platforms provides a unique opportunity to gather tourism data on a large scale, thereby fostering a deeper understanding of tourists and an overall enhancement of the tourism experience [38].

Selection Objectives
Tourism objectives in this approach were obtained from social media platforms through two methods. The first method involved the creation of a questionnaire, which was then distributed across groups on various social media platforms, such as Facebook, WhatsApp, and WeChat. The survey is available at https://forms.gle/6UHHubaiAPA6JhtE7 (accessed on 25 February 2023). The questionnaire consisted of a range of questions concerning tourism objectives. Table 1 shows the definition of tourism objectives based on tourist preferences. Table 2 shows a sample of the online questionnaire results.

Table 1. Definition of tourism objectives based on tourist preferences (excerpt).
(objective name missing): The size of the tourist destination, the height of the place, and the ability of the tourist destination to accommodate tourists.
Tourism seasonality (TS): The possibility of visiting a tourist site in a specific season of the year. Some sites, such as museums, can be visited year-round, whereas others, such as gardens, are seasonal.

The second method focused on acquiring tourist preferences, calculating distances between tourist destinations, and estimating travel costs between destinations. Table 3 lists the proposed tourist destinations in Port Sudan City. These points were chosen on the basis of tourist demand and the aesthetic views available there, so they are considered points of interest (POIs). Tourist preferences reflect how tourists rate tourism destinations on the TripAdvisor website. Ratings range from 1 to 5, where 5 means very good, 4 means good, 3 means average, 2 means weak, and 1 means very weak. Table 4 displays tourist preferences according to the TripAdvisor website. Distances between tourist destinations are measured in kilometers in Table 5, and travel costs between destinations are measured in Sudanese pounds (SDG) in Table 6. This information was sourced from data collected on the TripAdvisor website.

Genetic Algorithm
The GA is a computational method based on the process of natural selection and genetics, and it is commonly applied to NP-hard problems; it is a metaheuristic algorithm that mimics the natural process of evolution to solve complex problems [39,40]. A population of candidate solutions is created randomly, and the fitness of each solution is evaluated; the fittest individuals are then chosen, recombined, and mutated to create a new population. These operations are repeated until a satisfactory solution is found, as follows:
1. Initialization: a population of possible solutions is generated randomly.
2. Evaluation: the fitness of each solution is measured using a fitness function.
3. Selection: the fitter individuals are chosen to be the parents of the following generation.
4. Crossover: new individuals are generated by combining the existing genetic material of the selected parents.
5. Mutation: new individuals may undergo mutation, which introduces small random changes, and the new generation replaces the old generation.
The algorithm stops when a stopping criterion is satisfied, such as a set number of generations or a sufficiently good solution. Figure 2 shows the GA operations.
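The five operations above can be sketched as a short loop over binary strings. This is a minimal illustration, not the implementation used in this paper: the toy fitness function `one_max`, the binary tournament selection, the single-point crossover, the elitism step, and all parameter values are assumptions chosen for brevity.

```python
import random

def one_max(bits):
    """Toy fitness (an assumption for illustration): the number of 1-bits."""
    return sum(bits)

def run_ga(fitness, n_bits=20, pop_size=30, generations=60,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # 1. Initialization: random population of binary strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # 2. Evaluation: score every individual.
        scores = [fitness(ind) for ind in pop]

        # 3. Selection: binary tournament (assumed selection scheme).
        def pick():
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            # 4. Crossover: single-point recombination of the two parents.
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # 5. Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children  # the new generation replaces the old one
        best = max(pop, key=fitness)
    return best

best = run_ga(one_max)
print(best, one_max(best))
```

Here the stopping criterion is simply a fixed number of generations; a threshold on the best fitness could be used instead.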
In general, the k-means algorithm is a machine learning tool that is useful in a wide range of applications, such as clustering, anomaly detection, image compression, recommendation systems, and tourism path planning [41]. In our approach, we used the k-means algorithm to cluster tourism data into k groups, where the number of clusters was determined by using the GA.

K-Means Algorithm
The k-means algorithm is an unsupervised machine learning algorithm that clusters data by dividing it into k classes according to its properties. The algorithm proceeds iteratively: each data point is assigned to the closest centroid (cluster center), and the centroids are then recomputed from the new assignments [42]. This procedure continues until the centroids stop moving or the maximum number of iterations has been reached. The steps of the k-means algorithm are as follows:
➢ Choose the number of clusters k.
➢ Randomly initialize k centroids.
➢ Assign each data point to its nearest centroid.
➢ Recalculate each centroid as the mean of all the data points in its cluster. For a cluster C with k data points $(x_1, x_2, \ldots, x_k)$ (here k denotes the number of points in C), the centroid of C is calculated as $\frac{1}{k}(x_1 + x_2 + \cdots + x_k)$.
➢ Repeat steps 3-4 until the centroids stop moving or the maximum number of iterations has been reached.
The k-means algorithm aims to minimize the sum of the distances between the data points and their assigned centroids. Several distance measures can be used; the Euclidean distance is the most common. Given two points x and y, the Euclidean distance is calculated as in Equation (1):
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (1)$$
where n is the number of coordinates (features) of each point.
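The steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the code used in the experiments; the function names, the optional `init` argument, and the sample points are assumptions introduced here.

```python
import math
import random

def euclidean(x, y):
    """Euclidean distance between two points, as in Equation (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(points):
    """Mean of a non-empty list of points, coordinate by coordinate."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def k_means(data, k, init=None, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: k is given; centroids start at `init` or at k random data points.
    centroids = list(init) if init is not None else rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in data]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Step 4: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = centroid(members)
    return centroids, labels

# Hypothetical sample: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2, init=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing explicit seeds through `init` makes the dependence on initialization easy to observe, which is the weakness the GA-based seeding is meant to address.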

Enhancing the K-Means Algorithm through GA
This approach enhances k-means by using the GA to determine the optimal initial seeds and the value of k for k-means clustering. Figure 2 provides an overview of the entire framework for GA k-means clustering, and the enhanced algorithm is shown in pseudocode. A detailed explanation of the enhanced k-means clustering process follows. In the first step, a value is defined to generate the initial population in the GA, and the fitness of each solution is assessed based on clustering quality, measured by the sum of squared errors (SSE); this fitness is used to determine the optimal initial seeds and the value of k. To enhance the performance of the GA, its optimization parameters were adjusted, as shown in Tables 3-6. These parameter adjustments aim to make the GA more effective at selecting initial seeds and determining k for clustering. The improved GA, designed to enhance the k-means algorithm, is presented in the pseudocode.
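One way to picture this first step is a GA over binary masks, where a 1 in position i means that data point i is used as an initial center, so each chromosome encodes both k (the number of 1-bits) and the seeds. The sketch below is an assumption-laden illustration, not the paper's pseudocode: because the SSE alone always decreases as k grows, a per-cluster `penalty` term is added here as a hypothetical way to keep k bounded, and the truncation selection, crossover, and parameter values are likewise assumed.

```python
import random

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def lloyd(data, centers, iters=10):
    """Run a few k-means iterations from the given centers; return (centers, SSE)."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        # An empty cluster keeps its previous center.
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    sse = sum(min(sq_dist(p, c) for c in centers) for p in data)
    return centers, sse

def fitness(mask, data, penalty):
    """Lower is better: SSE plus an assumed penalty per cluster."""
    seeds = [p for bit, p in zip(mask, data) if bit]
    if len(seeds) < 2:
        return float("inf")  # reject degenerate solutions
    _, sse = lloyd(data, seeds)
    return sse + penalty * len(seeds)

def ga_seed_search(data, penalty=5.0, pop_size=20, generations=40,
                   mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(data)
    pop = [[1 if rng.random() < 0.3 else 0 for _ in range(n)]
           for _ in range(pop_size)]
    score = lambda m: fitness(m, data, penalty)
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]   # truncation selection (assumed)
        children = [parents[0][:]]       # elitism
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)    # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = min(pop, key=score)
    seeds = [p for bit, p in zip(best, data) if bit]
    return len(seeds), seeds, score(best)

# Hypothetical data: three compact groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11),
        (20, 0), (20, 1), (21, 0), (21, 1)]
k, seeds, value = ga_seed_search(data)
print(k, seeds, value)
```

The returned seeds would then be passed to the full k-means run of the second step.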
In the second step of this process, once the optimal values for both the number of clusters (k) and the initial seeds have been determined, the tourism data are partitioned into groups. This partitioning is accomplished using the improved k-means algorithm introduced in the first step. The enhanced k-means algorithm effectively assigns each data point to its corresponding cluster based on similarity, taking into account the optimized k value and the initial seeds. By partitioning the tourism data into groups, this step facilitates further analysis and enables the identification of unique patterns, visitor preferences, or notable characteristics within the dataset.
In the third step, after segmenting the tourism data into groups using the enhanced k-means algorithm and considering the identified tourism objectives, an improved GA is employed to recommend the optimal tourist path. By harnessing the capabilities of the GA, which combines elements of natural selection and genetic operators, the algorithm efficiently searches for the most favorable path that aligns with the specified tourism objectives. Figure 3 and Algorithm 1 show the framework of the recommended tourism path obtained by enhancing the k-means algorithm through the GA.
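The path search in this third step can be illustrated with a small GA for the TSP over a cost matrix. The sketch assumes a permutation encoding with order crossover and swap mutation, which are common choices but are not confirmed by the text, and the 4 x 4 matrix is a made-up example rather than data from Tables 5 or 6.

```python
import random

def tour_cost(tour, dist):
    """Cost of the closed tour that returns to its starting destination."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the rest in the order found in p2."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    middle = p1[i:j + 1]
    rest = [c for c in p2 if c not in middle]
    return rest[:i] + middle + rest[i:]

def ga_tsp(dist, pop_size=40, generations=100, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]  # random permutations
    cost = lambda t: tour_cost(t, dist)
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]   # truncation selection (assumed)
        children = [parents[0][:]]       # elitism
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < mutation_rate:  # swap mutation
                x, y = rng.sample(range(n), 2)
                child[x], child[y] = child[y], child[x]
            children.append(child)
        pop = children
    best = min(pop, key=cost)
    return best, cost(best)

# Hypothetical symmetric cost matrix for four destinations P1..P4.
dist = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]
path, total = ga_tsp(dist)
print([f"P{i + 1}" for i in path], total)
```

In the approach described here, one such search would be run per objective matrix, so the matrix entries could be distances, travel costs, or differences in visitor counts.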

System Implementation and Experimental Analysis
Port Sudan, the capital of Red Sea State in eastern Sudan, functions as Sudan's primary seaport. Over 90% of Sudan's international trade flows through Port Sudan's modern port facilities, which were built between 1905 and 1909 to replace the historical Arab port of Suakin. Port Sudan features key infrastructure like an international airport, an oil refinery, and state-of-the-art cargo and passenger terminals. Located on Sudan's eastern coast, on the western shore of the Red Sea, the port handles significant volumes of container traffic, bulk commodities, and roll-on/roll-off shipments. Port Sudan serves as a strategic gateway for neighboring countries such as South Sudan, Ethiopia, and Eritrea, the first two of which are landlocked. It provides access to key trade routes like the Suez Canal and the Bab el Mandeb Strait. With ample developable land and a deep-water harbor, the port has potential for significant expansion to support Sudan's growing trade volumes and improve supply chain connectivity. However, challenges remain around port efficiency and infrastructure constraints that have hindered full realization of Port Sudan's potential as a regional shipping hub. Ongoing developments and investments aim to address these issues, upgrade port facilities, digitize processes, and expand container handling and logistics services. If successful, such efforts could transform Port Sudan into a modern logistics center that enhances Sudan's trade competitiveness and links the nation to global supply networks. Figure 4 shows Port Sudan City in Red Sea State, Sudan.
Six tourist destinations distributed across the study area were selected. Table 3 and Figure 5 present these points of interest (POIs) in Port Sudan City. Following the selection, questionnaires were distributed at these chosen destinations.

Results
Based on the results of online questionnaires distributed on social media platforms in the study area, we determined the static tourism objectives with input from 600 visitors to enhance our approach. We collected the external tourism objectives from the TripAdvisor website. To implement our approach, first, we improved the GA by using new parameters, and then we determined the optimal k value and initial seeds. The optimal k value is 5, and the optimal initial seed selection is 10. Second, we used the enhanced k-means algorithm to cluster the static tourism objectives based on the value of k. Five groups of visitors were created for each tourism objective. The optimal path recommendations for visitor destinations, determined through the enhanced GA with improved parameters in Table 7, are presented in Table 8.

[Table 8: columns No., The Objectives, The Path]

Table 9 and Figure 6 show groups of the internal tourism objectives based on enhancing the k-means algorithm. After creating tables categorizing tourism objectives into groups, we constructed a comprehensive tourism objectives matrix comprising 12 matrices, nine for internal objectives and three for external objectives. The numbers in the matrices represent the numerical differences between the numbers of visitors in the EN groups across various destinations. For instance, the value 25 in the EN objectives matrix corresponds to the difference in the number of visitors between destination P1 and destination P2 in Group 1 in Table 9. Consequently, we calculated the matrices for all nine internal objectives in the same manner as illustrated in Table 10. Following this, we calculated 12 optimal paths based on static tourism objectives using the improved GA. Our approach involved creating objective matrices to address the TSP.
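The construction of one objective matrix can be sketched as a table of pairwise differences in visitor counts. The counts below are hypothetical placeholders (they are not taken from Table 9), and the use of the absolute difference is an assumption made so that the matrix is symmetric.

```python
def difference_matrix(visitors):
    """Pairwise absolute differences in visitor counts between destinations."""
    names = list(visitors)
    matrix = [[abs(visitors[a] - visitors[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical visitor counts for one objective in one group.
group_1 = {"P1": 40, "P2": 15, "P3": 30}
names, matrix = difference_matrix(group_1)
for name, row in zip(names, matrix):
    print(name, row)
```

A matrix built this way has zeros on its diagonal and can be passed directly to a TSP solver as the cost matrix for that objective.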

Discussion
When collecting and classifying tourism data, online surveys and social media play significant roles. In our approach, we propose to enhance the k-means algorithm for optimizing tourism data obtained from social media platforms. The primary challenge lies in determining the optimal k value for the k-means algorithm and selecting initial seeds, which can be addressed using various methods. The elbow method is based on the fundamental premise of plotting different cost values against varying k values. The elbow point on the graph can be used to determine k, representing the point of diminishing returns or the inflection point at the elbow [43,44]. However, the drawback of the elbow method is that it occasionally struggles to produce effective clusters. As an alternative, we employ an improved GA in our approach to determine the ideal value of k. Compared to other methods, the two-stage GA represents a relatively recent development. Many scholars have employed the k-means algorithm to classify and organize data into groups because of advantages such as ease of implementation, scalability to large datasets, guaranteed convergence, the ability to initialize centroids, and adaptability to new data points. However, one challenge associated with the k-means algorithm is the estimation of the optimal number of groups [41,45]. Our approach employs the GA to overcome this limitation.
In k-means clustering, initial seeds must be selected, and the method used to choose them depends on the data and the problem being addressed. One commonly employed approach selects the initial seed points at random; the algorithm is executed multiple times, and the seeds that yield the lowest clustering error are retained. Alternatively, an initial seed selection algorithm can be used, which selects the initial seeds from different clusters within the dataset. The choice between these methods is dictated by the characteristics of the data and the clustering objectives at hand [46]. In our approach, we use the GA both to determine the optimal k value and to select the initial seeds. Comparisons were conducted with several algorithms commonly used in the field of clustering, namely the expectation-maximization (EM) algorithm, hierarchical clustering, and the traditional k-means algorithm. Table 11 presents the comparison between the enhanced k-means algorithm and the other machine learning clustering algorithms. When comparing machine learning algorithms for practical use, we treat the optimization time as the most valuable parameter. The optimization time of the enhanced k-means algorithm is 0.01 s, and the number of iterations is five. These results indicate the superiority of the enhanced k-means algorithm over the other clustering algorithms.
In this study, we made significant enhancements to the k-means algorithm. These enhancements specifically improve the methods used to determine the number of groups (k) and to select the initial seeds. We achieved these improvements by using an enhanced GA and optimizing its parameters. We applied this improved GA to predict optimal tourist paths. This prediction is based on tourism objectives derived from social media platforms, considering factors such as popular destinations and peak travel times. Although our approach was implemented specifically for the Port Sudan region of the Red Sea State in Sudan, it can be applied to other regions as well. Its suitability for another region depends on factors such as the variety of tourist targets, the population characteristics, and the prevalence of relevant social media platforms. For accurate and effective implementation, it is therefore essential to conduct a comprehensive study of the region's specific tourist objectives, population characteristics, and relevant social networking sites.
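The idea of letting a GA choose both k and the initial seeds through a binary string can be sketched as follows. This is a hedged illustration, not the implementation used in this paper: each bit marks whether a data point is used as an initial seed, so the number of set bits is k. The fitness function (clustering cost plus a per-cluster penalty), the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all parameter values are hypothetical and do not reproduce Table 7.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, seeds, iterations=10):
    """Run Lloyd's iterations from the given seeds and return the final cost."""
    centers = list(seeds)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        centers = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def fitness(bits, points, penalty):
    """Lower is better: clustering cost plus a hypothetical penalty per cluster."""
    seeds = [p for bit, p in zip(bits, points) if bit]
    return kmeans_cost(points, seeds) + penalty * len(seeds)


def repair(bits, rng):
    """Guarantee at least one seed so that k-means is well defined."""
    if not any(bits):
        bits[rng.randrange(len(bits))] = 1
    return bits


def evolve(points, generations=40, pop_size=30, mutation_rate=0.05,
           penalty=10.0, seed=0):
    rng = random.Random(seed)
    n = len(points)

    def score(bits):
        return fitness(bits, points, penalty)

    population = [
        repair([int(rng.random() < 0.3) for _ in range(n)], rng)
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=score)
        history.append(score(population[0]))
        next_gen = [population[0][:]]  # elitism: the best string always survives
        while len(next_gen) < pop_size:
            a = min(rng.sample(population, 3), key=score)  # tournament selection
            b = min(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(repair(child, rng))
        population = next_gen
    best = min(population, key=score)
    history.append(score(best))
    return best, history


# Hypothetical data: three well-separated groups of four points each.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 1), (20, 2), (21, 1), (21, 2),
]
best, history = evolve(points)
seeds = [p for bit, p in zip(best, points) if bit]
print("k =", len(seeds), "seeds =", seeds, "fitness =", round(history[-1], 2))
```

Without some penalty on the number of clusters, the clustering cost alone would always favor larger k, so a term of this kind (or another validity criterion) is needed whenever the GA is allowed to vary k. Because the best string is carried over unchanged, the recorded best fitness never worsens from one generation to the next.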

Conclusions and Future Work
Tourism objectives derived from social media can offer new opportunities for decision support in recommending tourism paths. In this paper, we propose an innovative approach to optimize and classify tourism objectives for recommending the optimal tourism path using the k-means algorithm and the GA. We also integrate various tools for this purpose, demonstrating the applicability of an improved k-means algorithm and GA to tourism path planning. Additionally, we use the GIS to implement and visualize social media tourism objectives and to display the optimal routes. Our approach is organized as follows: First, the tourism objectives were collected from surveys and social media platforms. Second, the GA was used to enhance the k-means algorithm with a new parameter for clustering tourism objectives. Finally, a comparison and combination were performed with the algorithms currently used in the GIS environment. The following points are recommended:

❖ Optimize and classify 12 tourism objectives based on social media platforms to determine the path of tourists.
❖ Apply the GA to determine the number of clusters, initial seeds, and the optimal path planning.
❖ Optimize and visualize the tourism path planning approach based on the social media tourism objectives.
There is still room for improvement in using ML algorithms to process social media data. Future work can focus on increasing the number of objectives and fusing both internal and external objectives in the evaluations of web users.

Figure 1. Process of recommending the best path based on survey and social media tourism data.

Figure 3. Framework of recommended tourism path enhancing the k-means algorithm through GA [40].

Figure 4. Location of Red Sea State in Sudan.

Figure 5. Tourist sites distributed within the city.

Figure 6. Groups of the internal tourism objectives based on the enhanced k-means algorithm.

Table 1. Definition of tourism objectives. The value of entertainment refers to the entertainment available to the visitor at the tourist site.

Table 2. Sample of online questionnaire results.

Table 3. Tourism destinations in Port Sudan City.

Table 4. Matrix of tourist preferences.

Table 5. Matrix of distance between tourism destinations (km).

Table 6. Matrix of travel costs between destinations (SDG).

Table 7. Parameter settings of GA.

Table 9. Results of internal tourism objective groups of visitors using the enhanced k-means algorithm.

Table 11. Comparisons of experimental results.