Less-Known Tourist Attraction Discovery Based on Geo-Tagged Photographs

Abstract: Most existing studies of tourist attraction recommendation have emphasized analyses of popular sites. However, recommending such spots encourages crowds to flock there in large numbers, making tourists feel uncomfortable. Furthermore, some studies have found that quite a few tourists dislike crowded destinations and prefer to avoid them. A ready solution is the discovery and publicity of less-known tourist attractions. This study specifically examines the discovery of less-known Japanese tourist destinations that are attractive and merit increased visits. Using this approach, crowds can be dispersed from popular tourist attractions, and more diverse spots can be offered for travelers to choose from. By analyzing geo-tagged photographs on Flickr, we propose a formula for ranking tourist attractions that incorporates different aspects such as image quality assessment (IQA), comment sentiment, and tourist attraction popularity. We survey Taiwanese and Japanese people to identify the Japanese cities familiar to them and remove those cities from the ranking results. The remaining spots are less-known tourist attractions. The results of verification experiments show that most of the less-known tourist attractions are known by only a few people and appeal to participants. Additionally, we examined some factors that might affect respondents when they decide whether a spot is attractive to them or not. This study can benefit tourism industries worldwide in the process of discovering potential tourist attractions.


Introduction
Along with the development of information and communication technologies, almost all mobile devices have come to be equipped with global positioning system (GPS) sensors. These sensors not only let users confirm their current position, but also let them annotate their photographs with geo-tags on social networking services (SNSs). Numerous studies have analyzed geo-tagged photographs to elucidate user behaviors and preferences. Through that process, one can discover popular tourist attractions and recommend tour plans for users according to their preferences [1][2][3][4]. In addition to geolocation, diverse information such as comments and photographs is available from SNS users. That information includes important and useful data for research. For instance, Hausmann et al. [5] pointed out that social media contents might provide a swift forests, oceans, and mountains. In this way, tourists can choose their tourist spot preferences easily. Finally, these geo-tagged photographs are ranked using our formula. The verification experiments specifically investigate Japanese people, Taiwanese people, and their differences.
The remainder of the paper is organized as follows: Section 2 introduces related work. Section 3 presents the methodology for discovering less-known tourist attractions. In Section 4, we illustrate less-known tourist attraction estimation and demonstrate the current results. Section 5 explains the results of our verification experiments. Section 6 discusses our experiment results and the aspects of this research that can be improved. Section 7 presents conclusions and future work.

Points of Interest (POIs)
A point of interest (POI) is a particular spot that someone might find useful or interesting. Such spots can be landmarks, sightseeing spots, or commercial institutions of any type such as a restaurant, a hospital, or a supermarket. Based on data types and discovery procedures, the approaches developed for POIs are divisible into two types. The first type is top-down: POIs are discovered from an existing POI repository or database, such as check-in data or yellow pages, by selecting those that are used frequently or that fit a specific theme or target [9][10][11]. The second type is bottom-up: raw data (e.g., geo-tagged photos, digital footprints with implicit geographic information, or metadata that involve latitude and longitude) are used to construct a new database or dataset that includes the POIs [12][13][14][15]. Skovsgaard et al. [13] demonstrated a clustering technique that considers both spatial and textual attributes of microblog posts to obtain clusters that represent POIs. Based on Flickr geo-tagged photographs, Kuo et al. [15] used pattern discovery, the spatial overlap (SO) algorithm, and the naming and merging method for attractive footprint clustering. From the peak value and range of clusters, the POI and region of interest (ROI) can be extracted, indicating the most popular location and range for appreciating attractions.
Many studies have combined a POI and a recommender system to provide various travel plans for tourists [37][38][39][40]. To recommend POIs for a given user at a specified time in a day, Yuan et al. [37] developed a collaborative recommendation model that is able to incorporate temporal information. Massimo et al. [38] presented a new recommender system technique for tourists' behavior learning and next-POI recommendations. The technique clusters users with similar POI visit trajectories and then learns a general user behavior model via inverse reinforcement learning (IRL).
Discovering new tourist attractions is an important task for the tourism industry. Nevertheless, according to our observation, existing studies address only popular tourist attractions and neglect other places. Unlike existing POI studies, this research provides a POI method for discovering less-known tourist attractions and specifically analyzes those unnoticed places, which might include some attractive spots for tourists.

Image Quality Assessment (IQA)
Photographs are an important factor in tourists' decisions about travel destinations; they also influence tourists' behaviors and reflect their satisfaction with tourism places [41,42]. Molina et al. [43] found that good-quality photographs influence tourist destination choice. In addition, in our previous research [36], we observed that most participants cared strongly about photograph quality when evaluating tourist attractions. Consequently, image quality assessment is applied in this research to discover appealing photographs.
Image quality assessment, an image processing technique, can use subjective and objective methods. Subjective methods rely on the intuitive appreciation of human observers for image attributes. Such methods are classifiable into two types: absolute evaluation and relative evaluation. Objective methods are based on computational models that can predict perceptual image quality.
Numerous early studies have sought to automate no-reference IQA (NR-IQA) to assess photograph quality using machine learning techniques. Most of these studies applied binary labels ("good" or "bad") to assess image quality [44][45][46][47][48][49]. Although Dong et al. [50,51] developed a method that extends the image quality representation to three levels ("good", "medium", and "bad"), such coarse labels still make it very difficult to rank images. Talebi et al. [52] proposed an approach called neural image assessment (NIMA), which differs from the methods of other studies in that it predicts the distribution of human opinion scores and assesses the techniques used for photography. Those IQA studies used the large-scale database for aesthetic visual analysis (AVA) [53] as training data for their machine learning models. The AVA dataset contains about 255,000 images, rated on aesthetic qualities by different viewers (including amateur, professional, and novice photographers). By using the AVA dataset, those studies can train a model to classify photographs into different levels or to predict image quality scores. In this study, NIMA is used to rank scenic photographs, and its results are compared with other ranking results to choose the best method for our study.
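Because NIMA outputs a distribution of opinion scores rather than a single label, photographs can be ranked by a summary statistic of that distribution. The following is a minimal sketch, assuming ten rating buckets from 1 to 10 (as in AVA) and assuming that the predicted distributions are already available; the function names and the input format are hypothetical and are not part of the NIMA implementation.

```python
from math import sqrt


def mean_and_std(distribution):
    """Return (mean, std) of a score distribution over ratings 1..N.

    `distribution` holds one non-negative value per rating bucket;
    it is normalized here so that it sums to 1.
    """
    total = sum(distribution)
    if total <= 0:
        raise ValueError("distribution must contain positive mass")
    probs = [p / total for p in distribution]
    mean = sum(score * p for score, p in enumerate(probs, start=1))
    var = sum(p * (score - mean) ** 2 for score, p in enumerate(probs, start=1))
    return mean, sqrt(var)


def rank_by_mean_score(predictions):
    """Sort photo ids by mean predicted score, highest first.

    `predictions` maps a (hypothetical) photo id to its score distribution.
    """
    return sorted(
        predictions,
        key=lambda photo_id: mean_and_std(predictions[photo_id])[0],
        reverse=True,
    )
```

Ranking by the mean keeps a full ordering of the photographs, which is the property that binary or three-level labels lack.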

Image Classification
Image classification is used to discern the contents of images and to classify them into distinct categories or to assign a probability that an image belongs to a particular category. Traditional image classification relies on feature description and detection, which might be effective for some sample images, but the high dimensionality of the feature space is difficult to handle in practical situations. Recent studies have applied machine learning techniques to automate image classification and to alleviate the shortcomings of traditional methods. This technique is widely applied in diverse fields such as medicine [54][55][56][57] and image quality assessment [44][45][46][47][48][49]. Raj et al. [56] improved a classifier to recognize images of lung cancer, the brain, and Alzheimer's disease for the Internet of Medical Things (IoMT). In addition, Shankar et al. [56] improved a model to distinguish images of diabetic retinopathy. To find guide-suitable pictures for improving the touristic experience, Kleinlein et al. [58] presented an approach to classify photographs into three labels based on aesthetic perception. Unlike object detection, image classification can annotate only one label per photograph. To our knowledge, most research in the tourism field has applied object detection to recognize the content of tourist photographs. However, because natural scenes do not have fixed features (e.g., shape), it is hard to annotate multiple labels correctly for scenic photographs (which we used in this research) by object detection. Therefore, we used image classification to classify scenic photographs into different types, including mountains, oceans, and nightscapes. By doing so, tourists can readily choose their preferences for natural landscapes.

Methodology
This section describes our proposed method for extracting less-known tourist attractions. Figure 1 presents an overview of our method. Our method is divided into five parts, and each step is explained in its own subsection. First, we introduce the dataset for the research in Step A. In Step B, the dataset is classified into distinct clusters by the number of photographs in each prefecture and city. In Step C, these clusters are used to investigate which clusters are unfamiliar to participants. In Step D, we analyze and extract the positive comments on photographs; subsequently, the quality of photographs is evaluated using five IQA methods. Steps E and F are introduced in Section 4.

Definition of Less-Known Tourist Attractions
To differentiate well-known and less-known tourist attractions, we adopt the following two definitions of less-known tourist attractions:
• Definition 1: Only some people know about this tourist attraction.
• Definition 2: The tourist attraction is attractive for tourists and deserves to be visited.
In addition, if tourists can view a well-known landscape from a certain place that only a few people know about, that place is also regarded as a less-known tourist attraction. Moreover, we assume that less-known tourist attractions might be located in cities that are unfamiliar to tourists.

Data Collection and Extraction of Scenic Photographs (Step A)
Using the Flickr API, we collected 769,749 photographs taken in 2017 at geolocations throughout Japan. To obtain full addresses, we applied geocoding to the latitude and longitude of each photograph using the Google Geocoding API. However, 309 photographs have no address details because they were taken on the ocean. As a result, our dataset includes 769,440 photographs. Additionally, we collected information about these photographs such as comments, numbers of views, and numbers of favorites they earned. To filter the scenic photographs swiftly, photographs with tags related to scenic descriptions in English, Japanese, and Chinese (e.g., "scenery") are extracted from the dataset. Further, we manually removed inappropriate photographs that are not related to natural scenes and extracted 1159 scenic photographs as a second dataset for this research. The content of these scenic photographs is over 80 percent natural scenery without people.
Subsequently, these photographs are classified into prefectures and cities according to their full addresses. We then calculated the numbers of photographs for 47 prefectures and 1158 cities (the cities include special wards). Table 1 presents the top 10 prefectures and cities in terms of the number of photographs. Figure 2 shows the distribution of photographs for the special wards of Tokyo. Figure 2 shows that most photographs were taken in Shibuya and Shinjuku, which are popular spots.
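The tag-based filtering and the per-region counting in Step A could be sketched as follows. This is only an illustration under stated assumptions: the photograph records are assumed to be dictionaries with hypothetical keys `tags`, `prefecture`, and `city`, and the tag set is a hypothetical example (the text names only "scenery"). The manual removal of inappropriate photographs is not modeled.

```python
from collections import Counter

# Hypothetical scenic tags in English, Japanese, and Chinese.
# Only "scenery" is named in the text; the others are placeholders.
SCENIC_TAGS = {"scenery", "landscape", "風景", "景色"}


def is_scenic_candidate(photo):
    """True if at least one tag of the photograph is a scenic tag."""
    tags = {tag.lower() for tag in photo["tags"]}
    return bool(tags & SCENIC_TAGS)


def count_by_region(photos):
    """Count photographs per prefecture and per (prefecture, city).

    Photographs without an address (e.g., taken on the ocean) are skipped.
    """
    located = [p for p in photos if p.get("prefecture") and p.get("city")]
    prefecture_counts = Counter(p["prefecture"] for p in located)
    city_counts = Counter((p["prefecture"], p["city"]) for p in located)
    return prefecture_counts, city_counts
```

Keying the city counter by the pair (prefecture, city) avoids merging cities that share a name across prefectures.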

Clustering Prefectures and Cities (Step B)
For this study, less-known tourist attractions are assumed to exist in cities that are unfamiliar to tourists. Moreover, people from different countries differ in their familiarity with Japanese cities. To ascertain and compare how unfamiliar residents and foreign visitors are with Japanese cities, we conducted a questionnaire survey of Japanese and Taiwanese people. However, surveying respondents' degree of familiarity with every city (1158 cities) was impractical. For that reason, to reduce the respondent burden, X-means was used in this step to cluster prefectures and cities so that the questionnaire survey could be administered easily.
The X-means algorithm is a clustering technique presented by Pelleg and Moore [59] to address the shortcomings of K-means. X-means can determine the optimum number of clusters automatically; the user sets only the minimum and maximum numbers of clusters. Here, we refer to the results of the elbow method to set the minimum number of clusters. Additionally, this approach greatly reduces the probability of being trapped in a local optimum. Considering the outliers existing in the data, we used this method to distribute the prefectures and cities into different clusters based on their respective characteristics. As the X-means feature for prefectures, we adopted the number of photographs in each prefecture, which is the most appropriate feature for analyzing how less-known a prefecture is in the current work. For clustering cities, four features are applied: the number of photographs in each city, the rate of number of photographs in each city, the rate of number of photographs in each prefecture, and the average of the number of photographs in each prefecture.
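The elbow method mentioned above, which is used to choose the minimum number of clusters, can be illustrated on a single feature such as the number of photographs per prefecture. The sketch below uses plain one-dimensional K-means as a stand-in; it is not X-means and does not include its automatic splitting of clusters. The function names and the deterministic quantile initialization are assumptions made for the example.

```python
def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        for x in values
    ]


def kmeans_1d(values, k, iterations=100):
    """Plain 1-D K-means (Lloyd's algorithm), requires 1 <= k <= len(values)."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [float(data[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    for _ in range(iterations):
        labels = assign(values, centroids)
        updated = []
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(values, centroids), centroids


def wcss(values, labels, centroids):
    """Within-cluster sum of squares, the quantity plotted in the elbow method."""
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(values, labels))


def elbow_curve(values, max_k):
    """WCSS for k = 1..max_k; the 'elbow' is where the decrease flattens."""
    curve = []
    for k in range(1, max_k + 1):
        labels, centroids = kmeans_1d(values, k)
        curve.append(wcss(values, labels, centroids))
    return curve
```

The value of k at which the curve stops dropping sharply would then serve as the lower bound passed to X-means.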

The 47 prefectures are clustered into four clusters, as shown in Table 2 and Figure 3. The 1158 cities are distributed into 14 clusters. In Table 2, the third column represents the score of each cluster, which we defined in this step. These scores are used in our developed formula. In addition, the clustering result roughly matches the distribution of the population across Japanese prefectures, which suggests that the clustering results of cities might have sufficient validity. Furthermore, the city cluster score is defined according to questionnaire survey responses, as explained in Section 3.4.

Figure 3. Distribution of prefecture clusters.

Evaluating Familiarity of City Clusters (Step C)
As described in this section, we administered an online questionnaire survey to foreign visitors (115 Taiwanese) and local residents (123 Japanese) to elicit their degrees of familiarity with Japanese cities. We invited Taiwanese participants because, according to a news report (https://www.nippon.com/en/japan-data/h00375/overseas-visitors-to-japan-in-2018-top-31-million.html), Taiwan ranked third in the "Top 20 Countries/Regions by Number of Visitors." However, this ranking does not consider the population of each country. Relative to population, Taiwan ranks first in visits to Japan per person. Thus, Taiwanese people are the most suitable participants for this research.
For this research, all participants meet the following five requirements:

1. They have travel experience in Japan.
3. They do not mind visiting unknown places.
4. Taiwanese participants have the economic ability for overseas travel.
5. They can participate in the questionnaire survey only once.

According to the number of cities in each cluster, 30 city names were selected randomly for the questionnaire. Five options are provided for participants to select, with higher scores indicating greater familiarity with the city. The question is as follows:

Do you know this "random name of city"? (point 1-5)
(1) I totally have no idea.
(2) I have heard of this city, but I don't know the relevant tourist attractions.
(3) I have heard of this city and know the relevant tourist attractions.
(4) I have been to this city, but I don't know the relevant tourist attractions.
(5) I have been to this city and know the relevant tourist attractions.
Afterward, the clusters familiar to participants are extracted and removed from the final ranking result. Tables 3 and 4 show the average scores of the respective clusters, which imply that local residents and foreign visitors have different degrees of familiarity with Japanese cities. Furthermore, the average scores of city clusters are applied in our developed formula. To extract the clusters unfamiliar to participants, we use statistical methods and assume that half of the participants are unfamiliar with a cluster when the sample mean of the cluster is less than the population mean. Because the questionnaire was conducted by survey sampling, the results might include sampling error. To reduce the inaccuracy caused by sampling error, we categorized clusters as less-known using t-tests and p-values. After calculating the t-test value, we used the p-value to ascertain whether the sample mean was greater than the population mean or not. If the p-value of a cluster was less than 0.05, we inferred that the cluster was unfamiliar. Conversely, a cluster is categorized as familiar when its p-value is greater than 0.05.
Tables 3 and 4 present the t-test results, with the p-values obtained for the respective clusters. Table 3 shows that five clusters were regarded as unfamiliar by Taiwanese people, and Table 4 shows that six clusters were regarded as unfamiliar by Japanese people.
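The decision rule above could be sketched as a one-sample t-test. The sketch below makes several assumptions that the text does not state: the test is one-sided (the alternative is that the cluster mean is below the reference mean), the reference value `mu0` is supplied by the caller, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule so that only the standard library is needed. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    value = 0.5 + area if t >= 0 else 0.5 - area
    return min(1.0, max(0.0, value))


def is_unfamiliar(scores, mu0, alpha=0.05):
    """One-sided, one-sample t-test: is the mean familiarity score below mu0?

    Returns (t statistic, p-value, decision). Requires at least two scores
    that are not all identical (otherwise the standard error is zero).
    """
    n = len(scores)
    t_stat = (mean(scores) - mu0) / (stdev(scores) / math.sqrt(n))
    p_value = t_cdf(t_stat, n - 1)  # lower-tail probability
    return t_stat, p_value, p_value < alpha
```

For example, `is_unfamiliar(cluster_scores, mu0=3.0)` would flag a cluster whose 1-5 familiarity scores lie significantly below a hypothetical reference mean of 3.0.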

Analysis of Photograph (Step D)
Various factors of attractive spots affect tourists' decisions about travel destinations. To find attractive tourist attractions and enhance our proposed formula, we analyze the comment sentiment and the quality of the photographs in Step D.

Analysis of Comment Sentiment
To discover attractive tourist attractions in unfamiliar areas, positive comments about photographs are assumed to be a factor affecting whether a sightseeing spot is attractive for tourists to visit or not. Therefore, we extract the comments about the scenic photographs collected in Section 3.2 using a web crawler. Comments written by the photograph owner are removed because almost all of them are merely responses to viewer comments. For this study, we examined English, Chinese, and Japanese comments using the Google Natural Language API, which yields a sentiment score representing the probability of positive meaning. In this way, one can detect whether the sentiment of a comment is positive or not. Table 5 presents the results of applying the Google Natural Language API and the number of positive comments in each language.

Evaluation of Photograph Quality
In our previous investigation [36], we observed that most participants care strongly about the quality of photographs, which affects tourists as they decide whether a spot is attractive to them or not. Therefore, photograph quality, as judged by attributes such as aesthetics and composition, is important for evaluating the attractiveness of sightseeing spots. To assess photograph quality, we considered five approaches using heuristic and image processing methods, as described below. Subsequently, we administered a questionnaire survey to ascertain the best parameter for improving the performance of our proposed formula (Figure 4).

1. Method 1 (number of favorites): Users of Flickr can mark photographs as favorites, and Flickr counts how many users have done so for each photograph. We infer that a photograph with a higher number of favorites has high quality.
2. Method 2 (number of views): Flickr counts views for each photograph. We consider that a photograph with a higher number of views attracts strong interest from other users. A higher number of views implies that the photograph might have high quality.
3. Method 3 (followers of photographers): Presumably, a user with many followers tends to post high-quality photographs. For Method 3, we collect information about 6361 photographers (such as the number of followers, the number of photographs, and the year they joined Flickr) from our dataset. Because the year of joining Flickr influences the number of followers, our presumption might be unfair to Flickr novices. Therefore, we calculate the average annual followers of users up to the end of 2018 (Table 6). Then we ranked the scenic photographs using this information. (In Flickr, if the number of followers is greater than 1000, the value is displayed as "1K", and we cannot ascertain the exact number of followers in such cases. For that reason, in Table 6, the "K" of the followers is changed to "1000".) In addition, for each photographer, only one photograph was chosen as representative based on the number of favorites.
4. Method 4 and Method 5 (aesthetics and technique of photographs): For the fourth and fifth methods, we adopt the method proposed by Talebi et al. [52]. They presented a novel approach called neural image assessment (NIMA), which can predict both the technical and aesthetic qualities of photographs. Using the NIMA model, the aesthetics and technique of the 1159 scenic photographs can be evaluated, as shown in Figure 5.

The top 10 photographs obtained using each method are extracted for the questionnaire survey. For this questionnaire survey, 50 participants (25 Japanese people and 25 Taiwanese people) who meet the five requirements (as explained in Section 3.4) were asked to select, intuitively, those photographs having normal or below-normal quality. If few of a method's top 10 photographs are selected as low quality, the method agrees well with human perception. Table 7 presents the questionnaire survey results. In Table 7, the first row presents the rankings of the respective photographs. The first column shows the name of each method. The last column represents how many votes the method received for its top 10 photographs. The table thus shows how many people voted each photograph as low quality. Method 2 (the number of views) received the lowest number of votes in this questionnaire, which means that this method is the most applicable approach for our research. Subsequently, the number of photograph views is used in our developed formula (Section 4.1).
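The follower normalization described for Method 3 could be sketched as follows. The text does not say how the number of years is counted, so this example assumes that the joining year itself counts as one year (a photographer who joined in 2018 has one year); the function names are hypothetical.

```python
def parse_followers(text):
    """Convert a follower count string to an integer.

    Values shown with a "K" suffix (e.g., "1K") are mapped to thousands,
    so "1K" becomes 1000 as in the treatment described for Method 3.
    """
    cleaned = text.strip().upper().replace(",", "")
    if cleaned.endswith("K"):
        return int(round(float(cleaned[:-1]) * 1000))
    return int(cleaned)


def average_annual_followers(followers, join_year, end_year=2018):
    """Followers divided by the years on Flickr up to the end of `end_year`.

    Assumption: the joining year counts as a full year, so the divisor
    is at least 1.
    """
    years = max(end_year - join_year + 1, 1)
    return parse_followers(followers) / years
```

Dividing by the years of membership keeps long-standing accounts from dominating the ranking only because they have had more time to gather followers.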
All the tourist attractions can be evaluated and ranked. Finally, the participants' familiar clusters are removed from the ranking results. The remaining spots are the less-known tourist attractions, which is our goal.

Rank the Scenic Photographs
Step E and Step F of our workflow will be presented in this section. Combining the result of Section 3, we propose a formula to assess the score of photographs. Finally, the participants' familiar clusters will be removed from the ranking result. The remaining spots are less-known tourist attractions, which are our goal.

Evaluation of Formula (Step E)
Considering the definitions of less-known tourist attractions and the data construction, we propose a formula to rank the photographs. Using this formula, we can calculate the score of each photograph for ranking:

$$\mathrm{Score}_i = \sum_{p=1}^{3} F_{pi} W_p + R_i, \quad 0 < W_p < 1 \ \text{and} \ \sum_{p=1}^{3} W_p = 1 \qquad (1)$$

In Equation (1), the following variables are used: $i$ represents each photograph; $F_{1i}$ and $F_{2i}$ respectively express the cognitive scores of Japanese prefectures and cities, as defined in Sections 3.3 and 3.4, and $W_1$ and $W_2$ are their weights; $F_{3i}$ stands for the number of views of each photograph (described in Section 3.5.2); $W_3$ is the weight of $F_{3i}$; $R_i$ represents the number of positive comments on the photograph; and $F_{3i}$ and $R_i$ are processed by feature scaling. In particular, $R_i$ is treated as an additional point because the weight obtained for $R_i$ by the entropy weight method (EWM) is almost equal to 0. The reason is that most photographs have no associated comments. However, before visiting tourist attractions, most tourists refer to related comments and information and then decide whether to go there or not. Therefore, we regard $R_i$ as a necessary parameter because positive comments might affect the perspectives of other viewers. In this formula, the quality of photographs and positive comments are assumed to be factors attracting someone to visit.
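As an illustration of Equation (1), the following sketch scores a list of photographs. It assumes that "feature scaling" means min-max scaling to the range [0, 1], which the text does not specify, and it uses hypothetical dictionary keys for the inputs.

```python
def min_max_scale(values):
    """Scale values linearly to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_photos(photos, weights):
    """Score each photograph as F1*W1 + F2*W2 + F3*W3 + R.

    photos: list of dicts with the hypothetical keys "prefecture_score" (F1),
            "city_score" (F2), "views" (F3 before scaling), and
            "positive_comments" (R before scaling).
    weights: (W1, W2, W3), each in (0, 1) and summing to 1.
    """
    w1, w2, w3 = weights
    views = min_max_scale([p["views"] for p in photos])
    comments = min_max_scale([p["positive_comments"] for p in photos])
    return [
        w1 * p["prefecture_score"] + w2 * p["city_score"] + w3 * f3 + r
        for p, f3, r in zip(photos, views, comments)
    ]
```

Note that $R_i$ enters without a weight, so under this scaling a photograph with the most positive comments gains up to one additional point.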
For Equation (1), we must set optimal weights for each parameter, but we do not know the importance of the respective parameters. Therefore, we applied EWM to calculate the weights. Because it depends solely on the discreteness of the data, EWM is an objective weighting method. EWM is used widely in engineering, socioeconomic studies, and other fields [60][61][62]. In information theory, entropy is a measure of uncertainty. When information is greater, uncertainty and entropy are smaller. Based on these properties, one can estimate the degree of randomness of an event by calculating its entropy value. Furthermore, entropy values are used to gauge the degree of discreteness of an index. When the degree of discreteness is larger, the index is expected to have a greater effect on the integrated assessment.
The formula weights are set through the following steps.

1. Calculate the ratio ($P_{ij}$) of the $i$-th sample under the $j$-th index, where $x_{ij}$ denotes the $j$-th index of the $i$-th sample.
2. Calculate the entropy value ($e_j$) of the $j$-th index.
3. Calculate the discrepancy of information entropy ($d_j$).
4. Calculate the weight ($w_j$) of each index.
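The four steps can be sketched as follows, assuming the commonly used form of EWM: $P_{ij} = x_{ij} / \sum_i x_{ij}$, $e_j = -\frac{1}{\ln n} \sum_i P_{ij} \ln P_{ij}$, $d_j = 1 - e_j$, and $w_j = d_j / \sum_j d_j$, where $n$ is the number of samples. These expressions and the function name are assumptions for illustration; the sketch also assumes non-negative inputs, at least two samples, and at least one index whose values are not all equal.

```python
import math


def entropy_weights(samples):
    """Return one weight per index (column) by the entropy weight method.

    samples: list of rows, one row per sample, one column per index.
    """
    n = len(samples)
    m = len(samples[0])
    k = 1.0 / math.log(n)  # normalizes the entropy to the range [0, 1]

    discrepancies = []
    for j in range(m):
        column = [row[j] for row in samples]
        total = sum(column)
        # Step 1: ratio of each sample under index j.
        ratios = [x / total for x in column]
        # Step 2: entropy of index j (terms with P = 0 contribute 0).
        e_j = -k * sum(p * math.log(p) for p in ratios if p > 0)
        # Step 3: discrepancy of information entropy.
        discrepancies.append(1.0 - e_j)

    # Step 4: normalize the discrepancies so the weights sum to 1.
    d_total = sum(discrepancies)
    return [d / d_total for d in discrepancies]
```

An index whose values are identical across all samples has entropy 1 and therefore receives a weight of approximately 0, which matches the observation that a parameter with almost no variation (such as comment counts that are mostly zero) is weighted close to 0.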
The prefecture cluster score ($F_{1i}$), the city cluster score ($F_{2i}$), and the number of views ($F_{3i}$) of the 1159 scenic photographs are used to calculate the weights of the formula by EWM. The weight results are presented in Table 8. For Taiwanese, $W_1$ is equal to 0.2554, $W_2$ is equal to 0.2559, and $W_3$ is equal to 0.4887. For Japanese, $W_1$ is equal to 0.2725, $W_2$ is equal to 0.2061, and $W_3$ is equal to 0.5214. The Taiwanese and Japanese weights in Equation (1) differ because their city clusters are assigned distinct scores based on the questionnaire survey results, which affects the weights of all parameters.

Current Result
Using this formula, all scenic photographs can be ranked; the city clusters familiar to Taiwanese and Japanese people (defined in Section 3.4) are then removed from the ranking result. The remaining spots are less-known tourist attractions. Tables 9 and 10 present some of the Taiwanese and Japanese ranking results. The second and third columns are the photograph cluster scores, as defined in Sections 3.3 and 3.4. The fourth column is the number of views of each photograph collected from Flickr; the results of the 5 IQA surveys are shown in Section 3.5.2. The fifth column shows the number of positive comments on each photograph, as defined in Section 3.5.1. The last column presents the scores of places, as calculated using our formula. Before the scores are calculated, the number of views and the number of positive comments are processed by feature scaling. High scores are associated with places that might be attractive to travelers. Comparison of the two tables indicates that the Taiwanese and Japanese results differ greatly.
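The rank-then-remove procedure can be illustrated with a short Python sketch. The record fields (`place`, `city_cluster`, `score`) and the sample values are hypothetical; only the logic of sorting by score and dropping familiar city clusters follows the text.

```python
def less_known_ranking(photos, familiar_clusters):
    """Rank photographs by score, then drop those in familiar city clusters."""
    ranked = sorted(photos, key=lambda p: p["score"], reverse=True)
    return [p for p in ranked if p["city_cluster"] not in familiar_clusters]


# Hypothetical records for illustration only.
photos = [
    {"place": "spot_a", "city_cluster": "cluster_1", "score": 1.8},
    {"place": "spot_b", "city_cluster": "cluster_2", "score": 1.2},
    {"place": "spot_c", "city_cluster": "cluster_3", "score": 1.5},
]
result = less_known_ranking(photos, familiar_clusters={"cluster_1"})
```

Because the set of familiar city clusters differs between Taiwanese and Japanese respondents, the same scored photographs would be filtered once per group.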

Verification Experiment
This section describes the verification experiment design and the experimentally obtained results. The results verify that the proposed method is reliable. First, our earlier experiment [36] revealed that participants' preferences affect whether they find a spot attractive. Therefore, we use image classification to categorize the scenic photographs. Afterward, based on the image classification results, we design three questionnaires from which participants can choose. In the last subsection, we present the questionnaire results and discuss them.

Image Classification
Considering the diverse preferences of tourists, scenic photographs can be assigned to different labels using image classification so that tourists can quickly choose their favorite type of tourist attraction. To build the image classifier, we adopt transfer learning to retrain the Inception-v3 [63] model, which saves much training time. Some parameters that Inception has already learned can be reused, so a highly accurate classifier can be built using fewer training data. The Inception-v3 model is a convolutional neural network trained on more than a million images from ImageNet. It has learned rich feature representations for widely diverse images and can classify images into 1000 object categories such as animals, vegetation, and landscapes.
For this step, based on the contents of the 1159 scenic photographs, we defined nine labels to assign to them. Notably, the labels include nightscape and snow, because the image classification model has difficulty distinguishing dim photographs and, likewise, identifying the content of photographs of snow-covered places. Because some scenic photographs were taken in the evening or at night (their subjects include the starry sky and the evening seaside), nightscape and snow were added to the label list.
Furthermore, 14,662 images were collected from Flickr and Google as our training dataset. Of the data, 10% were used to test the model. The remaining data were used to train the model. After 9000 training steps, the training accuracy of our model achieved 0.85, with validation accuracy of 0.83. Using this image classifier model, the 1159 scenic photographs are classifiable into distinct labels, as presented in Table 11.
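As a small illustration of the data split described above, the following Python sketch holds out 10% of a dataset for testing and keeps the rest for training. The function name, the shuffling, the fixed seed, and rounding the test share down are assumptions; the text does not say how the split was drawn.

```python
import random


def split_dataset(items, test_fraction=0.10, seed=0):
    """Shuffle items and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n_test = int(len(shuffled) * test_fraction)  # test share, rounded down
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in identifiers for the 14,662 collected images.
train, test = split_dataset(range(14662))
```

Under these assumptions, 1466 of the 14,662 images are held out for testing and 13,196 remain for training.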

Questionnaire Design
To verify the validity of our approach to discovering less-known tourist attractions, we design questionnaires for Taiwanese and Japanese participants in this section. Although the full addresses of the less-known tourist attractions are known, respondents' cognitive levels of these attractions are complex and difficult to delimit. Furthermore, in the previous study [35], most participants reported that, apart from the addresses of their home and company, it is difficult to remember the addresses of other places. Therefore, our verification experiment specifically addresses the recognition of the Japanese cities in which the less-known tourist attractions exist and provides respondents with photographs of less-known tourist attractions from these cities. For each group, 10 questions about Japanese cities are extracted from the Taiwanese or Japanese ranking results of less-known tourist attractions, respectively. Because the ranking results differ, the two groups receive different questionnaire contents. However, each label has no more than 10 Japanese cities because no city contains scenes of all labels. To simplify the analysis, less-known tourist attractions are classified into three categories based on feedback from earlier results [36]. Each category includes labels of similar properties: Category 1 includes mountains, forest, flowers, grass, and farmland; Category 2 includes oceans, rivers, and lakes; Category 3 is a composite, comprising Category 1, Category 2, snow, and nightscapes. In this way, the questionnaire can be administered and analyzed easily.
In total, 19 Taiwanese people (all office workers, with an average age of 27) and 22 Japanese people (mostly engineering students, with an average age of 25) were recruited for the questionnaire survey. All participants meet the five requirements explained in Section 3.4. None participated in an earlier questionnaire survey. They were instructed to choose their preferred category, which determined the questionnaire contents they received.
Two questions were asked for each Japanese city. Scenic photographs were provided for the respondents' reference. The answer options for the second question were as follows.
(1) Strongly do not want to visit
(2) Somewhat do not want to visit
(3) Neutral
(4) Somewhat want to visit
(5) Strongly want to visit
For the first question, participants answered "Yes" if they probably knew the city. For the second question, respondents were instructed to assign a score of 1-5 to the attraction's photographs. Table 12 presents the preferred categories of respondents. In this experiment, most participants selected category 3 as their preference. Category 1 and category 2 were chosen by five people each.

Verification Experiment Results
Results obtained for categories 1-3 are explained in this subsection. In Figures 6-8, each point represents a Japanese city in the questionnaire. The x-axis shows the percentage of respondents who know the Japanese city. The y-axis shows the attraction level of the less-known tourist attractions. In these scatter plots, some points overlap because the places received the same evaluation. For example, if two places were known by 20% of respondents and their attraction levels were equal, then their points would overlap in the scatter plot.
Figure 6 presents the results obtained for category 1. One can infer that these places are known by only a few people. In the Taiwanese result, the average scores of three places exceed four points, and the score of one of them approaches the full mark, meaning that these places are attractive to Taiwanese respondents. By contrast, for the place with the lowest score, the scenic photographs show scenery similar to that of the respondents' own country, so they assigned it few points. In the Japanese result, the average scores of four places exceed four points. However, a few places received low scores because the scenery is common in Japan. Notably, one place is known by 80% of respondents because the city is close to Tokyo, where the respondents live; they are familiar with this city.
For category 2 (Figure 7), the Taiwanese result shows that the average scores of only three places are less than four. Moreover, the score of one place approaches full marks even though no respondent knows this place. In other words, most category 2 places are known by only a few respondents but appeal to them. In the Japanese result, the average scores of four places exceed four points. One place is known by 80% of respondents, for the same reason as in category 1. Furthermore, we examined the answers of the individual Japanese respondents in detail and found that the disparity among their ratings lowered the averages. Additionally, we observed an interesting phenomenon: one place was assigned greatly different scores by Taiwanese and Japanese respondents. It received the lowest score in the Japanese evaluation, but Taiwanese respondents assigned it over four points. This shows that the evaluation of less-known tourist attractions is subjective.
Regarding the result obtained for category 3 (Figure 8), although the average score of only one place is over four points, the average scores of the other places are over 3.5 points, indicating that Taiwanese respondents are not averse to visiting these spots. Additionally, we found that the photographs with the highest scores show snowscapes and Japanese castles, which are scarce in Taiwan. Taiwanese respondents reported that some places seem difficult to reach, which might influence their decision. For Taiwanese visitors, a trip entails monetary and time costs that are higher than those for Japanese people. Consequently, Taiwanese respondents prefer tourist attractions that have local characteristics or exceptional landscapes. For the Japanese result, we obtained the surprising finding that the average scores of eight places are over four points: Japanese respondents are very satisfied with the less-known tourist attractions of category 3. The place with the lowest score is a view of snow-covered mountains. Japanese respondents think this place looks very chilly and report that there is nothing nearby.
In summary, this verification experiment indicates that most of these places are known by only a few people, but the evaluation of less-known tourist attractions is subjective. Although two cities are known by most respondents, the respondents do not know the detailed locations of the scenic photographs. Furthermore, this experiment demonstrates that local residents and foreign visitors differ greatly in their evaluation of less-known tourist attractions. Table 13 summarizes, for the 10 questions of each category, the percentages of Taiwanese and Japanese respondents who want to visit the less-known tourist attractions (those who assign more than four points to the place). The table shows that almost all of these places are attractive enough that some respondents want to visit them. Moreover, more than half of the respondents are interested in the less-known tourist attractions that we provided. We discovered less-known tourist attractions that are attractive to some people.
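The quantities plotted in Figures 6-8 and tabulated in Table 13 can be computed per city from the two questionnaire answers, as in the Python sketch below. The function name and the sample responses are hypothetical. The sketch assumes that the attraction level is the mean of the 1-5 scores and that a respondent "wants to visit" when the score is 4 or 5; the threshold is a parameter because the exact cutoff is an interpretation of the text.

```python
from statistics import mean


def summarize_city(responses, want_threshold=4):
    """Summarize questionnaire answers for one city.

    responses: list of (knows_city, score) pairs, where knows_city is a
    bool (first question) and score is an integer from 1 to 5 (second question).
    Returns (percent who know the city, mean score, percent wanting to visit).
    """
    n = len(responses)
    known_pct = 100.0 * sum(1 for knows, _ in responses if knows) / n
    avg_score = mean(score for _, score in responses)
    want_pct = 100.0 * sum(1 for _, s in responses if s >= want_threshold) / n
    return known_pct, avg_score, want_pct


# Hypothetical answers from five respondents for one city.
known_pct, avg_score, want_pct = summarize_city(
    [(True, 5), (False, 4), (False, 3), (False, 4), (False, 2)]
)
```

Each city then contributes one point (`known_pct`, `avg_score`) to the scatter plot of its category, and `want_pct` corresponds to one cell of the summary table.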

Discussion
Because less-known tourist attractions are assumed to lie in cities that are unfamiliar to tourists, we conducted a questionnaire survey to understand tourists' cognitive level of Japanese cities. However, people from different countries might have different perspectives on Japanese cities. Therefore, to compare local residents with foreign visitors, Taiwanese and Japanese people were invited to participate in the questionnaire survey. Interestingly, interviews with some Taiwanese participants about what leads them to prefer traveling in Japan revealed four main reasons. The first is that air fares are relatively cheap and the flight time is short. The second is that the Japanese environment is neat and tidy, and public security is high. The third is that Japanese food is delicious and exquisite. The fourth is that Japanese written characters and culture are similar to those of Taiwan, which helps Taiwanese people travel easily in Japan.
In a previous investigation [36], most participants cared greatly about the quality of photographs, which affects whether tourists find a spot attractive. Hence, to find attractive less-known tourist attractions and to strengthen the formula proposed in this research, 5 IQA methods were applied to assess the quality of photographs. To choose the best IQA method for the formula, we conducted a questionnaire survey with 50 participants. After the IQA questionnaire survey, some participants were interviewed to ascertain what led them to judge a photograph as low quality. The main reasons were photograph brightness and color saturation: most participants prefer bright, vivid photographs and dislike dim ones. This survey provided only scenic photographs for participants to choose from, which might have led them to prefer bright and colorful photographs. A second reason is that a few participants were concerned about photograph composition. Those participants know basic photography principles, which caused them to judge photographs as low quality more often. In the next IQA survey, we will investigate expert photographers and laypeople (who have no knowledge of photography) along with their differences in choosing low-quality photographs.
The verification experiments conducted to verify the less-known tourist attractions revealed an interesting point. The same seascape photographs were provided to Taiwanese and Japanese respondents. Among the Taiwanese results, this place received high evaluations and attracted respondents to visit, but it received the lowest score among the Japanese results. The reason is that these photographs show "torii", the traditional gates of Japanese shrines. "Torii" are rare in Taiwan but very common in Japan. In this case, we can observe that foreign visitors are interested in special landmarks that their own country does not have. That is to say, scenic photographs that include special landmarks are expected to increase the attractiveness of these spots.
Investigation of potential tourist attractions is important for the tourism industry and for academics. Potential tourist attractions can not only promote a country's economic development but also enhance cultural exchange. On the academic side, little literature currently addresses the use of social big data to identify such potential tourist attractions. Therefore, this study can encourage more researchers to attach importance to potential tourist attractions. The present study revealed some attractive less-known tourist attractions for which the available information is insufficient to judge whether the place is safe. We expect that these potential places can be assessed further through field surveys by experts in the future.
Study limitations include the following. (i) This research collected scenic photographs as the dataset for discovering less-known tourist attractions. These photographs do not contain complex scenes; each includes only one to three subjects, such as mountains surrounding a lake or a forest with a river. Nevertheless, natural scenes have no fixed features (e.g., shape), which makes it hard to use object detection to annotate photographs with multiple labels and thereby provide more information to tourists. Therefore, we currently use image classification to assign a single label to each photograph. (ii) In an early investigation, most Japanese participants were reluctant to provide their background information, which made it difficult to conduct a questionnaire survey and collect more samples. Hence, we eliminated the background questions from this survey to avoid raising privacy concerns. In addition, the participants were strictly selected: all meet the five requirements explained in Section 3.4. In this way, we can rely on their viewpoints and ensure the reliability of the result. (iii) Given the limited sample used for the present study, a more comprehensive survey is needed to investigate more participants from different countries.

Conclusions
This study applied a novel method to identify less-known tourist attractions for people of different nationalities. The approach was built on two ideas. The first is ascertaining how unfamiliar local residents and foreign visitors are with Japanese cities. The second is a formula for evaluating the degree of tourist attraction, which incorporates different aspects such as image quality assessment (IQA), comment sentiment, and tourist attraction popularity for ranking tourist attractions. Cities that are familiar to participants are eliminated from the ranking results; the remaining spots are our target. Finally, verification experiments confirmed that most of the discovered attractions are known by only a few people yet appeal to participants.
COVID-19 has caused enormous damage worldwide and has particularly affected the tourism industry; most countries have lost great amounts of revenue. After the pandemic, tourism recovery efforts will be of paramount importance. Apart from the established popular tourist attractions, tourism to other potential places can be developed, providing tourists with a wider variety of attractions. Using the proposed approach for discovering less-known tourist attractions in future applied studies could contribute to developing tourism industries worldwide and could aid tourism recovery. Additionally, according to our observations, most existing tourism recommender systems recommend only popular tourist attractions. However, certain tourists might tire of visiting those popular attractions and become interested in new places. By combining information about less-known tourist attractions with existing tourism recommender systems, tourists can be served with more helpful and comprehensive results. Moreover, how to publicize and promote less-known tourist attractions is an important future issue for the tourism industry.
As future work, after collecting and analyzing more photographs taken in certain years, we expect to distinguish between local residents and foreign visitors in terms of their characteristic preferences. Considering more factors related to less-known tourist attractions (e.g., geography and population), we expect to improve the formula and cluster analysis used in this study. Less-known tourist attractions can be classified by season, weather, days, and nights according to the photograph times and contents. Furthermore, less-known tourist attractions can be assessed according to whether a place is readily accessible, or not, which can support tourists in their judgment about visiting a place. Currently, less-known tourist attractions have insufficient photographs to which tourists can refer. Therefore, we are working on simulating photographs at different times and seasons using a generative adversarial network (GAN). Moreover, other information related to SNSs (e.g., Instagram, Twitter, and Facebook) will be added to our dataset and subjected to cross-validation with our results of a questionnaire survey to verify the correctness of our result. Finally, we want to apply our results to travel recommendation services and provide various travel plans for tourists.