Coastal Tourism Spatial Planning at the Regional Unit: Identifying Coastal Tourism Hotspots Based on Social Media Data

There is an increasing need for spatial planning to manage coastal tourism, and social media data have emerged as an effective means of supporting it. Researchers and decision-makers require spatially explicit information that reveals the current state of visitation in a region. The purpose of this study is to identify coastal tourism hotspots at the regional scale, using social media data and an appropriate spatial unit, and to examine the advantages and limitations of applying spatial hotspots to spatial planning. Data from Flickr and Twitter at 30" spatial resolution were obtained for four South Korean regions. Coastal tourism hotspots were then derived using Getis-Ord Gi. Compared with the visitation rate distribution maps, the derived hotspot maps sufficiently identified the spatial influence of visitors and tourist attractions in a form applicable to spatial planning. Because the spatial autocorrelation of social media data differs with the spatial unit, coastal tourism hotspots inevitably differ by spatial unit. Spatial hotspots derived from social media data at the appropriate spatial unit are useful for coastal tourism spatial planning.


Introduction
Coastal tourism is growing rapidly [1]; since the 1950s, the number of international tourists has been steadily increasing, exceeding 1.4 billion in 2018 [2,3]. The coastal and marine tourism industry is also growing and is expected to employ 1.5 million more people by 2030 than in 2010 (from 7 million employed in 2010 to 8.5 million by 2030) [4]. Coastal tourism typically has a negative impact on the surrounding coastal ecosystem because of the tourists and facilities along the coast [5]. Boracay Island, a vacation destination in the Philippines, has undergone ecosystem changes due to over-tourism, and the island was closed for six months in 2018 to reduce the impact on the ecosystem [6]. However, coastal tourism may also be a reason to sustainably preserve coastal ecosystems, and it can benefit communities [7]. In "Life below Water," the 14th of the Sustainable Development Goals (SDGs), which specify the aims of the United Nations for 2030, one of the main targets is to manage coastal and marine tourism and to distribute its benefits fairly to the community [8]. As such, the growth of coastal tourism is both a looming crisis and an opportunity, and proper management is essential [9].
Coastal tourism can be defined as a generic term for travel, leisure, and recreationally oriented activities in coastal spaces and forms a close relationship between humans and the coastal environment [10][11][12]. To manage maritime and coastal activities, such as coastal tourism, and to manage conflicts, marine spatial planning is needed [13,14]. Today, as the marine economy grows and marine activities such as marine wind power increase, the use of marine spatial planning is increasing [14,15]. Spatial management planning can greatly assist decision-makers in coordinating human activity conflicts and conserving natural resources [9,16]. This can apply to coastal tourism, a marine activity in coastal spaces [12]. In fact, many coastal tourism spatial planning initiatives are being established in Europe, the United States, and China to minimize the impact on the coastal and marine ecosystem while sustainably developing various human activities and tourism industries that take place on the coastal areas [12].
However, many researchers have pointed out practical difficulties in establishing marine spatial planning to manage coastal tourism. First, there is a lack of fine-scale proxy data that can quantitatively identify the spatial distribution of tourism activities in coastal areas. The data available to decision-makers when establishing a tourism management plan are statistics from periodic surveys of the number of visitors to tourist destinations [17]. With such statistics, it is difficult to determine which places people prefer and where they are concentrated [18]. This has limited the understanding of the spatial distribution and concentration of coastal tourism, and of the trade-offs with ecological impacts that may arise as the number of visitors increases [18,19]. As a result, it has been difficult to grasp the "spatial heterogeneity" of the distribution of coastal tourism [20], which is important information for spatial planning [19], and coarse-scale analyses have often failed to derive management priorities for coastal tourism [21].
Second, it is necessary to understand the spatial typology of coastal tourism as a prerequisite for establishing a spatial management policy for coastal tourism [22]. Much like other human activities, coastal tourism is geographically unevenly distributed [18,20,22], creating hotspots where tourists are concentrated in certain areas. This spatial distribution of tourist hotspots is caused by a combination of people's preferences for individual tourist destinations and the distribution of tourist attractions that have a place identity according to the visitor's experience [23,24]. Therefore, depending on the distribution of tourist attractions, the range and distribution pattern of tourist hotspots may vary regionally. In addition, the analysis of tourism hotspots should be different depending on the size of the administrative district, which is a policy decision-making unit for establishing and implementing spatial plans. By considering the characteristics and distribution of coastal tourist attractions that are differently distributed by region and analyzing hotspots suitable for the decision-making spatial unit, it will be possible to derive the management priority of coastal tourism.
Tourists and visitors enjoy tours while staying in and around tourist attractions, and tourism spatial hotspots form in places where tourists flock [25]. The area from which a tourist attraction draws visitors is defined as its catchment area, and the size of the catchment area indicates whether the attraction is a local, regional, national, or international one [26]. A tourist attraction consists of a primary attraction, that is, the tourism resources that primarily attract visitors, and secondary attractions located around the primary attraction that encourage visitors to stay longer or provide different travel experiences [26][27][28].
In the past, identifying tourist hotspots was a difficult task [26] because it required data from continuous surveys of visitor numbers over a wide area. In recent years, however, social media data have emerged as an indicator for the spatial representation of tourism [29][30][31]. As smartphones became popular from the late 2000s, people could upload content to social media while traveling. Geo-tagged social media data recording the location of users appeared, and since the early 2010s, research using these data has been actively conducted in the tourism field [29][30][31]. Studies have examined whether social media data are related to the actual number of visitors, and many have reported that social media data correlate with field data on the number of visitors at tourist attractions [30,32,33].
Geo-tagged social media data have several advantages over traditional data. First, social media data can cover a large space at a low cost. It is therefore possible to collect homogeneous data for a wide area, and even tourism studies at the continental scale become feasible [29,34]. Second, social media data can cover a wide range of times. They are therefore effective for tourist destinations where visitor counts were not, or could not be, collected in the past. They are also useful for understanding temporal patterns related to specific events or periods [30]. Lastly, because the data carry geographic information, they are effective for understanding the spatial pattern of visitors. On this basis, studies have examined which landscape factors influence visits [31,35] and how the results can be reflected in policies and planning, such as conservation plans [18].
Social media data can represent visitation values in units of space and can be used to identify hotspots, that is, places where tourists flock. Although social media data make it possible to visualize the spatial distribution of visitors, most studies so far have targeted either a small area at the local scale [17,25] or a large area at the continental scale [29,34]. In general, collecting data for regional spatial research is difficult, as it requires high-resolution, continuous data over large areas [36,37]. Despite this challenge, obtaining such data is very important for identifying the spatial patterns of regional hotspots. Decision-makers and researchers working at regional and national spatial units need hotspots appropriate to the spatial unit of the site in order to manage spatial planning. For this reason, this study is intended as a practical contribution to actual spatial planning.
The purpose of this study was to determine whether spatial hotspots for coastal tourism can help coastal tourism spatial planning at the regional spatial unit. To this end, spatial social media data were collected from four coastal regions of South Korea (Table 1), and we discuss how the spatial patterns and hotspots of visitation rates estimated from these data can inform coastal tourism spatial planning. We also discuss the strengths and limitations of the identified hotspots with respect to regional spatial units and how they can be used in actual spatial planning.

Materials and Methods
This research was conducted in three stages (Figure 1). In the first stage, social media data were collected to estimate the spatial distribution of coastal tourism. The geoinformation in social media data makes it possible to identify users' spatial distribution over a wide area.
Second, the distribution of the visitation rate in the study area was estimated using social media data. Using Twitter and Flickr data and visitor statistics, visitation rates were estimated in units of a 30" grid (approximately 700 m × 900 m).
Third, spatial hotspots of the coastal tourism visitation rate were identified. To identify hotspots, the spatial statistic Getis-Ord Gi was used, and the spatial autocorrelation of each region was calculated for this purpose.
Finally, we discuss how social media data and spatial hotspots can support regional coastal tourism spatial planning. The discussion section consists of comparisons with distribution maps, differences according to spatial units, and the application of these to coastal tourism management.

Study Area
South Korea consists of nine provinces and eight metropolitan city districts (Figure 2). This study deals with regional spatial units, which are smaller than national spatial units but larger than local and urban spatial units. Accordingly, one province combined with the metropolitan city districts located within it is considered one region. This study focused on four regions: Jeollabuk-do, Jeollanam-do, Gyeongsangnam-do, and Jeju-do. These regions are located south of the latitude 36° line (Figure 2). All four regions are adjacent to the ocean and have unique characteristics (Table 1).
As this study covers coastal tourism, the coastal area for data collection and spatial analysis was designated as the administrative districts of municipalities adjacent to the ocean. The area, population density, and gross regional domestic product (GRDP) of each region were investigated as social statistical factors identifying basic regional characteristics related to tourism. Population density was taken to represent the floating population and the potential number of local visitors in the region. GRDP is considered an indicator of investment in transport and tourism infrastructure [38,39].

Social Media Data and Visitation Data
The Korea Information Society Development Institute has been investigating media usage patterns in Korea through a sample survey method since 2011. According to a report published in 2019, the rate of social network service (SNS) use in Korea rapidly increased from 16.8% to 39.9% from 2011 to 2014, and it then showed a modest growth to 48.2% in 2018 [40]. According to statistics surveyed in 2017-2018, by age group, the usage rate was the highest for those in their 20s, at 82%, followed by those in their 30s (73.3%), 40s (55.9%), and teens (53.8%). When comparing the platform usage rate, Facebook was the highest, at 34%, followed by Kakao Story (27%), Twitter (14%), Naver Band (11.3%), and Instagram (10.8%). Men had the highest rate of Facebook use, and women had the highest rate of Kakao Story use, but other than that, there was no significant difference in the ranking of platform use rates between genders. In addition, the rates of Facebook and Instagram use were relatively high for those in their teens to 30s, and the use of Kakao Story and Naver Band was relatively high for those in their 40s to 60s.
After considering the usage rate of each platform, the availability of data, and past research literature, we collected geo-information from Flickr and Twitter to estimate the spatial distribution of coastal tourism. Facebook and Kakao Story have high usage rates but, unfortunately, we could not collect data due to technical limitations. Instead, Twitter, which has the third highest usage rate, could be used in this study because data with geo tags could be collected. In addition, Flickr was used in this study because it is a platform that allows people to upload geo-tagged images and has been used and validated in many studies in the past. Flickr and Twitter data are the social network service (SNS) data that have been most heavily used in tourism and recreation research over the last decade [30,32].
To make the data spatially homogeneous, a reference grid must be set. The guidelines for the assessment of marine spatial characteristics produced by the Ministry of Oceans and Fisheries in South Korea stipulate that territorial seas can be spatialized using a square grid of 15 min of arc, which can be supplemented by a grid of 1.5 min or 30 s [41]. Following the guidelines, a grid with a spatial resolution of 30 s (30"), which could be used for actual spatial planning, was set as the reference grid in this study.
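The reference grid described above can be illustrated with a minimal sketch. It assumes a plain latitude/longitude grid; the grid origin, the function names, and the spherical metres-per-degree constant are illustrative assumptions and are not taken from the study.

```python
import math

CELL_DEG = 30.0 / 3600.0  # 30 arc-seconds expressed in decimal degrees


def cell_index(lon, lat, origin_lon=124.0, origin_lat=33.0):
    """Return the (column, row) of the 30" cell containing a point.

    The origin (south-west corner of the grid) is a hypothetical choice.
    """
    col = math.floor((lon - origin_lon) / CELL_DEG)
    row = math.floor((lat - origin_lat) / CELL_DEG)
    return col, row


def cell_size_m(lat):
    """Approximate east-west and north-south cell extent in metres.

    Uses a spherical Earth with roughly 111,320 m per degree (an assumption).
    """
    metres_per_degree = 111_320.0
    ns = CELL_DEG * metres_per_degree
    ew = ns * math.cos(math.radians(lat))  # meridians converge toward the poles
    return ew, ns
```

Calling `cell_size_m(35.0)` gives an east-west extent that is noticeably shorter than the north-south extent, which is why a 30" cell at these latitudes is rectangular rather than square when measured in metres.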
Flickr data from 2013 to 2017 were collected using the recreation model of the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) suite created by the Natural Capital Project at Stanford University [42]. InVEST is a set of models developed to estimate ecosystem services, and the InVEST recreation model was developed to estimate eco-tourism. The model collects Flickr data through the Flickr Application Programming Interface (API) and calculates the number of daily Flickr users in polygonal units from an input Area of Interest (AOI) and settings. After the study area was divided into 30" grids, the annual average number of daily Flickr users was calculated by counting the number of people who uploaded photos in each grid. The total number of daily Flickr users per grid in the study area was 17,335.
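The counting step can be sketched as follows. This is a simplified illustration, not the InVEST implementation: it assumes each photo has already been assigned to a grid cell, and it assumes that several photos by the same user in the same cell on the same day count once. The record layout and function name are hypothetical.

```python
from collections import defaultdict


def average_annual_user_days(records, n_years):
    """Average yearly count of distinct (user, day) pairs per grid cell.

    records: iterable of (cell_id, user_id, date_string) tuples, one per photo.
    n_years: length of the collection period in years (5 for 2013 to 2017).
    """
    seen = defaultdict(set)
    for cell_id, user_id, day in records:
        # A set keeps one entry per user and day, however many photos were uploaded.
        seen[cell_id].add((user_id, day))
    return {cell: len(pairs) / n_years for cell, pairs in seen.items()}
```

The same function can be applied to tweets, since only the cell, the user, and the date are needed.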
Twitter data for 2015 were obtained using the "GetOldTweets-python" package, which is written in Python [43]. The package supports web crawling for tweets that match specific conditions. To collect geo-tagged tweets, the center point and radius of the spatial range to be collected are required. Using the center point of each grid and the distance from the center point to a vertex, the tweets recorded in all grids were collected. The spatial distribution and number of Twitter users were calculated using the same grid and calculation method as for Flickr. The total number of daily Twitter users per grid in the study area was 131,899.
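The center point and radius for one grid cell can be derived as in the sketch below. It assumes a spherical Earth and the haversine formula; the function names are hypothetical, and no call to the tweet-collection package itself is shown.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def query_circle(west, south, cell_deg=30.0 / 3600.0):
    """Return (centre_lon, centre_lat, radius_m) of a circle through a cell's corner.

    west, south: coordinates of the cell's south-west corner in degrees.
    """
    centre_lon = west + cell_deg / 2
    centre_lat = south + cell_deg / 2
    # Distance from the centre to the north-east vertex of the cell.
    radius = haversine_m(centre_lon, centre_lat, west + cell_deg, south + cell_deg)
    return centre_lon, centre_lat, radius
```

Because a circle through the vertices is larger than the cell, circles of neighbouring cells overlap; a collected tweet would presumably still need to be assigned to a single cell by its own coordinates before counting.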
To relate the social media data to actual visitation, field data on the number of visitors were required for comparison. South Korea's Ministry of Culture, Sports, and Tourism collects statistical visitation data on major tourist attractions in South Korea every year. Visitation data from 2013 to 2017 at 142 major coastal tourist attractions were collected, and the five-year totals were divided by five to calculate the average number of visitors per year [44]. To spatialize the visitation data, the location and extent of each tourist attraction were represented by a polygon feature class using satellite imagery (Appendix A).

Estimating Empirical Visitation Rate
Verification was conducted to ensure that the social media data were suitable for representing tourist visitation. We calculated the annual average number of daily users who uploaded to Flickr and Twitter at each tourist attraction. The validity of applying social media data was then examined using Pearson correlation analysis between each social media dataset and the visitation data. As the Twitter, Flickr, and visitor data all follow strongly skewed, power-function-like distributions, they were log-transformed (Appendix B). If the social media count at a tourist attraction was zero, it was replaced with a not applicable (NA) value.
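A minimal sketch of this check is given below. It assumes base-10 logarithms (the Pearson coefficient does not depend on the base) and treats NA values as pairs that are simply skipped; the function name is hypothetical.

```python
import math


def log_pearson(social, visits):
    """Pearson r between log-transformed counts, with degrees of freedom.

    Attractions with a zero social media count are treated as NA and skipped.
    """
    pairs = [(math.log10(s), math.log10(v))
             for s, v in zip(social, visits) if s > 0 and v > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n - 2  # r and degrees of freedom
```

The degrees of freedom returned alongside r equal the number of usable attractions minus two, so they also show how many attractions had any social media data at all.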
Multiple regression equations were constructed to calculate the visitation rate for each grid. The input variables (the Twitter and Flickr counts) and the dependent variable (the visitation data) were log-transformed because they follow strongly skewed, power-function-like distributions (Appendix B). If both the Flickr and Twitter counts at a tourist attraction were zero, both values were replaced with NA values. To reflect regional characteristics, the regions were added to the regression analysis as dummy variables. The generality and accuracy of the regression analysis were validated by k-fold cross-validation. The number of folds was set to 10, and the number of repetitions was set to 500 to reduce the bias that could arise from how the folds are partitioned. The root mean square error (RMSE) and the correlation coefficient were used as accuracy measures. Finally, the visitation rate of coastal tourism was estimated by applying the multiple regression equation to each grid.
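The regression and validation procedure can be sketched in plain Python as follows. This is an illustration under several assumptions, not the authors' implementation: the model is taken to be ordinary least squares on log10-transformed counts with region dummies, both counts are assumed to be positive, only the RMSE is computed, and all function names are hypothetical.

```python
import math
import random


def fit_ols(X, y):
    """Least-squares coefficients via the normal equations and Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design_row(flickr, twitter, region, regions):
    """Intercept, log10 counts, and dummies for every region except the first (baseline)."""
    dummies = [1.0 if region == r else 0.0 for r in regions[1:]]
    return [1.0, math.log10(flickr), math.log10(twitter)] + dummies


def repeated_kfold_rmse(X, y, k=10, repeats=500, seed=0):
    """Mean validation RMSE over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)  # a new random partition for every repetition
        for f in range(k):
            test = idx[f::k]
            held = set(test)
            train = [i for i in idx if i not in held]
            beta = fit_ols([X[i] for i in train], [y[i] for i in train])
            err = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in test]
            scores.append(math.sqrt(sum(e * e for e in err) / len(err)))
    return sum(scores) / len(scores)
```

Here `y` would hold the log10 of the annual visitor counts, and a prediction in the original units is obtained by raising 10 to the fitted value. Repeating the partition many times averages out the dependence of the score on any single random split.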

Identifying Regional Hotspots for Marine Tourism
A hotspot generally refers to an area with higher values compared to its surroundings [45][46][47]. Tourist attractions and the areas surrounding them that are heavily visited and affect the surrounding space may be considered coastal tourism hotspots. As hotspots are determined from relative values, spatial autocorrelation and hotspot patterns differ according to the spatial unit [48]. Although certain grids are important hotspots in the local unit, they can be ignored in the regional and national units.
In this study, Getis-Ord Gi was used to derive coastal tourism hotspots. Getis and Ord devised Gi as a statistic for determining local spatial autocorrelation [49]. Getis-Ord Gi is useful for determining local spatial autocorrelations that are difficult to detect with global spatial statistics [50]. Getis-Ord Gi can be derived through Equation (1), where $x_j$ is the attribute value for feature $j$, $\omega_{i,j}$ is the spatial weight between features $i$ and $j$, and $n$ is the total number of features [51]; $\bar{X}$ and $S$ are the mean and standard deviation of the attribute values, given by Equations (2) and (3).

$$G_i = \frac{\sum_{j=1}^{n} \omega_{i,j} x_j - \bar{X} \sum_{j=1}^{n} \omega_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} \omega_{i,j}^{2} - \left( \sum_{j=1}^{n} \omega_{i,j} \right)^{2}}{n-1}}} \qquad (1)$$

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n} \qquad (2)$$

$$S = \sqrt{\frac{\sum_{j=1}^{n} x_j^{2}}{n} - \bar{X}^{2}} \qquad (3)$$

The Hotspot Analysis tool in the ESRI ArcGIS program (version 10.1) calculates Gi for each feature and measures the intensity of clusters of high or low values [49]. The tool treats features with a Gi z-score greater than 1.96, which corresponds to a significance level below 0.05, as hotspots. To calculate Getis-Ord Gi, the Conceptualization of Spatial Relationships must be set. Various conceptualizations are available, such as inverse distance, K nearest neighbors, and fixed distance. As this study has to reflect the unique spatial characteristics of each region, the fixed distance method was selected so that a region-specific threshold distance could be set.
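The hotspot statistic with a fixed distance band can be sketched as below. The sketch uses the standard form of the statistic with binary weights (1 within the threshold distance, 0 beyond it) and assumes that each feature counts as its own neighbour and that coordinates are planar; it is not the ArcGIS implementation, and the function name is hypothetical.

```python
import math


def getis_ord_gi_star(values, coords, threshold):
    """Gi z-scores with binary fixed-distance weights.

    values: attribute value per feature (e.g. estimated visitation rate).
    coords: (x, y) per feature in projected units such as metres.
    threshold: fixed band distance; neighbours lie at or within it.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = []
    for xi, yi in coords:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= threshold else 0.0
             for xj, yj in coords]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        num = sum(wj * v for wj, v in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        z.append(num / den if den > 0 else 0.0)
    return z
```

Features whose returned score exceeds 1.96 would be flagged as hotspots at the 0.05 level. Because every score is measured against the mean and spread of the whole input, the same cell can be a hotspot in a local analysis and insignificant in a regional one.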
Global Moran's I is a statistic that can identify the overall spatial autocorrelation of a particular space [52][53][54]. The Moran's I z-score can be used to determine spatial autocorrelation, and a positive z-score means that the high values and/or low values in the dataset are more spatially clustered than would be expected if the underlying spatial processes were random [55]. Moran's I can be derived through Equation (4), where $z_i$ is the deviation of the attribute for feature $i$ from its mean, $\omega_{i,j}$ is the spatial weight between features $i$ and $j$, $n$ is the total number of features, and $S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} \omega_{i,j}$ is the aggregate of all the spatial weights [55].

$$I = \frac{n}{S_0} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \omega_{i,j} z_i z_j}{\sum_{i=1}^{n} z_i^{2}} \qquad (4)$$
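Global Moran's I with a fixed distance band, evaluated over a series of candidate distances, can be sketched as follows. The sketch assumes binary weights, planar coordinates, and no self-neighbours, and it compares the raw I values rather than their z-scores; the default start and step mirror the 803 m and 700 m used in this study, while the function names and the number of steps are hypothetical.

```python
import math


def morans_i(values, coords, threshold):
    """Global Moran's I with binary fixed-distance weights (no self-neighbours)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = 0.0
    s0 = 0.0  # sum of all spatial weights
    for i, (xi, yi) in enumerate(coords):
        for j, (xj, yj) in enumerate(coords):
            if i != j and math.hypot(xi - xj, yi - yj) <= threshold:
                num += z[i] * z[j]
                s0 += 1.0
    return (n / s0) * num / sum(d * d for d in z)


def peak_distance(values, coords, start=803.0, step=700.0, n_steps=25):
    """Return the candidate distance at which Moran's I is largest."""
    candidates = [start + k * step for k in range(n_steps)]
    return max(candidates, key=lambda d: morans_i(values, coords, d))
```

The smallest candidate must be large enough for at least one pair of features to be neighbours; otherwise the sum of weights is zero and the statistic is undefined. The double loop is quadratic in the number of cells, so a spatial index would be needed for grids of realistic size.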
"Peak distance", which is the distance at which Global Moran's I is greatest, can be used as the appropriate threshold distance, as it is the distance in which spatial autocorrelation is considered prominent [51]. For each region, the Global Moran's I value was obtained by increasing the distance by 700 m, starting at 803 m, the minimum distance between the grids. The peak distance, the distance with the highest spatial autocorrelation, was set as the threshold distance for each region. Following this, coastal tourism hotspots for each region were derived.

Correlation between Social Media Data and Visitation Data
The correlation coefficient between Flickr data and visitation data (0.7245) was greater than that between Twitter data and visitation data (0.5837), and this held in all regions (Table 2 and Figure 3). Because zero values were replaced with NA, the degrees of freedom indicate the number of tourist attractions for which social media data exist. In all regions, the degrees of freedom of the Twitter analysis were greater than those of the Flickr analysis. Thus, Flickr data had a higher correlation with visitation data, while Twitter data covered more attractions. Twitter data are therefore likely to fill areas that Flickr cannot cover. Additionally, the correlation coefficients did not differ noticeably between regions.

Estimating Visitation Rates for Coastal Tourism
Jeju Island, whose correlation was similar to the correlation coefficient between the entire social media data and visitation data, was selected as the default (reference) region of the region dummy variable. Then, a nonlinear multiple regression equation to estimate coastal tourism visitation rates was derived (Table 3). The coefficient of determination (R²) was 0.5913, and the resulting equation was considered an explanatory regression equation. As the difference in accuracy between the training data and the validation data was within 10% of the training data accuracy, the generality of the methodology and model was considered sufficient (Appendix C).
The spatial distribution of the estimated visitation rate is shown in Figure 4. Jeollabuk-do had the lowest overall visitation rate among the four regions (Figure 4a). Grids with high visitation were distributed around the Byeonsan Peninsula National Park, Saemangeum Seawall, Gyeokpo Port, and Gunsan Port.
In Jeollanam-do, high-value grids were distributed in a fragmented pattern because of the large number of islands and the long, complex coastline (Figure 4b). Grids with high visitation rates were prominent in Hong Island, the Suncheon Bay wetlands, and three major ports (Mokpo, Yeosu, and Gwangyang). In Gyeongsangnam-do, the number of visitors in Busan city, the largest city in the study area, was significant (Figure 4c). The maximum value was 38,484,630 persons/year, in the grid containing Jagalchi Market in Busan metropolitan city, South Korea's largest fish market. Grids in the cities with large ports (Busan Port, Masan Port, Okpo Port, and Tongyeong Port) had high visitation rates. In Jeju Island, high-value grids were distributed at Seongsan Ilchulbong Peak, one of the UNESCO World Heritage Sites, and at the Jungmun Jusangjeolli Cliff, Jeju Port, and Seogwipo Port (Figure 4d).

Regional Hotspots for Coastal Tourism
We used the "peak distance", the distance at which Global Moran's I was greatest in each region, as the fixed threshold distance. It represents the optimum distance for identifying statistically significant spatial clustering. Moran's I was calculated by varying the distance value for each region (Figure 5), and the peak was identified across all distance increments. In descending order, the peak distances were: Gyeongsangnam-do (16,903 m), Jeollanam-do (7603 m), Jeju-do (5003 m), and Jeollabuk-do (3603 m).
Regional hotspots are shown in Figure 6. In Jeollabuk-do, Jeollanam-do, and Jeju-do, hotspots formed around major coastal tourist attractions. In particular, large hotspots formed around ports. In the Jeollanam-do region (Figure 6b), hotspots existed around main coastal tourist spots (Boseong Green Tea Field, Goheung Bay) that were difficult to find using the visitation rate distribution map alone. In Gyeongsangnam-do (Figure 6c), Busan city was so influential that all areas except Busan city were marked as cold spots. This result was inappropriate for identifying hotspots in other areas of Gyeongsangnam-do. The presence of large metropolitan areas in or around a region seems to distort hotspot results at the regional scale.

Social Media as an Evidence of Spatial Distribution of Coastal Tourism
Many studies have shown that social media data can be used as an indicator of tourism visitation rates and can cover a wide range of spaces [29][30][31]. In previous research on the correlation between social media data and visitation data, Pearson correlation coefficients of 0.5-0.7 between these datasets have been observed [30,33]. Additionally, correlation coefficients between visitation data and social media data for coastal tourist attractions in this study were 0.7245 (Flickr) and 0.5837 (Twitter). Through this result, it was reconfirmed that social media data is spatial data that can represent visitation of tourists.
In addition, it was confirmed that data from multiple platforms should be combined to obtain accurate information on the visitation rate. Flickr data had higher accuracy but fewer observations, while Twitter data had a greater number of observations but relatively lower accuracy (Table 2). These differences arise from differences in the characteristics and users of each social media platform [56]. Previous studies using multiple social platforms have found that combining social media data can make up for the shortcomings of each source and increase reliability [33,57]. The integration of social media data with different characteristics is therefore required, and models can be built to compensate for the shortcomings of each platform (Table 3 and Appendix C).
Finally, the visitation distribution map confirmed that visitation rates are high in cities, ports, transportation facilities, and tourist attractions, where high visitation would be expected. Because the same source and methodology are used throughout, a wide area can be quantified with the same criteria. On this basis, we found that the most visited place in the study area (the coastal area of the four regions) is Jagalchi Market in Busan city in Gyeongsangnam-do. Such a comparison cannot be made with the traditional research method, which only measures visitors to specific tourist attractions.

Characteristics of Coastal Tourism Hotspots Using Social Media Data
In all four regions, the hotspots created at the ports were larger than the hotspots of other major tourist attractions. For a large hotspot to be created, the primary attraction may be very wide, or secondary attractions may be located around the primary attraction. Coastal cities are usually formed around large ports [58,59], and in the city there are numerous secondary attractions, such as restaurants and lodgings, that are believed to increase the area of hotspots. Additionally, because the port acts as a hub connecting other coastal areas or islands, it can be a hotspot for visitors who want to visit other areas.
This tendency is related to the spatial autocorrelation of coastal tourist visitation patterns in each region. The peak distance, the threshold distance at which spatial autocorrelation is greatest, can be regarded as the distance over which tourist attractions exert influence. Where a region contains very wide and influential tourist attractions or cities with strong influence, its peak distance is likely to be large. In the general regional statistics as well, the regions with large peak distances in this study showed relatively high population density and GRDP per area, which are indicators of urbanization and economic power (Table 1 and Figure 5).
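The idea of a peak distance can be sketched as a scan over candidate threshold distances, keeping the distance at which the z-score of Global Moran's I is largest. The sketch below is a simplified illustration and not the study's implementation: it assumes binary distance-band weights and the normality form of the variance of Moran's I, and the function names, coordinates, and user counts are hypothetical.

```python
import math


def distance_band_weights(points, threshold):
    """Binary weights: w[i][j] = 1 if points i and j (i != j) lie within threshold."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if i != j and math.hypot(dx, dy) <= threshold:
                w[i][j] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I for the given values and weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined without neighbours or variation")
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * num / denom


def morans_z(values, w):
    """z-score of Moran's I under the normality assumption; None if degenerate."""
    n = len(values)
    s0 = sum(sum(row) for row in w)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    expected = -1.0 / (n - 1)
    variance = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - expected ** 2
    if variance <= 1e-12:  # e.g. every cell is a neighbour of every other cell
        return None
    return (morans_i(values, w) - expected) / math.sqrt(variance)


def peak_distance(points, values, candidates):
    """Return (distance, z) for the candidate distance with the largest z-score."""
    best = None
    for d in candidates:
        w = distance_band_weights(points, d)
        if sum(sum(row) for row in w) == 0:
            continue  # no neighbours at this distance
        z = morans_z(values, w)
        if z is not None and (best is None or z > best[1]):
            best = (d, z)
    return best


# Hypothetical grid-cell centroids (km) and social media user counts.
cells = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (6, 0), (7, 0)]
users = [12, 15, 9, 14, 20, 11, 1, 2]
print(peak_distance(cells, users, [1.0, 1.5, 3.0, 8.0]))
```

The distance returned by such a scan can then be used as the neighbourhood threshold in a hotspot analysis, which is how a region with widely influential attractions ends up with a larger analysis distance.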
If the distances between hotspots are small, a wide-area hotspot can be formed. In the case of Jeollabuk-do, the Saemangeum Seawall (A1) and city with Gunsan Port (P1) merge to form a wide-area hotspot (Figure 6a). In Jeollanam-do, Suncheon Bay (N4), Yeosu Expo Park (A2), Yeosu Port (P5), and Gwangyang Port (P3) form a wide-area hotspot (Figure 6b). These spots are important tourist attractions throughout the region, and in terms of planning, it is possible to consider the connection between hotspots that make up a wide-area hotspot.
If there is a metropolis with too many visitors in the region, the hotspot map may be distorted. Busan city, located next to Gyeongsangnam-do, has a beach visited by approximately 10 million visitors a year and is a large metropolis with the two largest ports of South Korea. As Busan city is one of the most famous tourist cities in South Korea [60], the peak distance of Gyeongsangnam-do was determined at approximately 14 km due to the influence of Busan. It was not possible to identify other hotspots in Gyeongsangnam-do outside of Busan (Figure 6c). The influence of the metropolis may be meaningful at the national scale, but is difficult to use at the regional scale. The huge metropolitan city, which is influenced by international tourism, should be regarded as a separate region, and it is appropriate to conduct hotspot analysis only in Gyeongsangnam-do, omitting Busan city.
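The sensitivity of hotspot maps to a dominant metropolis follows from the form of the hotspot statistic. The sketch below computes a Getis-Ord Gi* z-score for each cell using binary distance-band weights that include the focal cell. It is a minimal illustration under stated assumptions: the Gi* form and the weighting scheme are assumed rather than taken from the study, and the function name, coordinates, and counts are hypothetical.

```python
import math


def gi_star(points, values, threshold):
    """Getis-Ord Gi* z-score for each point, using binary distance-band weights.

    The focal point counts as its own neighbour (distance 0 <= threshold).
    Returns None for a point whose neighbourhood covers every point.
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation (population form) over the whole study area.
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if s == 0:
        raise ValueError("Gi* is undefined when all values are equal")
    scores = []
    for i in range(n):
        neighbours = [
            j for j in range(n)
            if math.hypot(points[i][0] - points[j][0],
                          points[i][1] - points[j][1]) <= threshold
        ]
        sum_w = float(len(neighbours))  # for binary weights, sum(w) == sum(w^2)
        local_sum = sum(values[j] for j in neighbours)
        spread = (n * sum_w - sum_w ** 2) / (n - 1)
        if spread <= 0:
            scores.append(None)
            continue
        scores.append((local_sum - mean * sum_w) / (s * math.sqrt(spread)))
    return scores


# Hypothetical cells: a moderate attraction (first three) and a metropolis (last three).
cells = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0), (20, 0), (21, 0)]
users = [30, 40, 30, 900, 1200, 1000, 5, 5]
for cell, z in zip(cells, gi_star(cells, users, 1.5)):
    print(cell, None if z is None else round(z, 2))
```

Because the mean and standard deviation in the formula are computed over all cells in the study area, a few extremely high values raise both, which tends to lower the z-scores of moderately visited attractions elsewhere. Running the analysis separately for the metropolis and for the rest of the region changes these global terms.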
By deriving and comparing hotspots according to different spatial scales targeting the same area, various spatial contexts can be identified. If hotspot analysis is performed only on Busan city, hotspots may be identified around Haeundae Beach, Jagalchi Market, and Busan Port, which are major tourist attractions of Busan (Figure 7b,c). From the local perspective in Busan, major tourist attractions are hotspots for coastal tourism. However, from the regional or national perspective, Busan metropolis itself is a huge hotspot (Figure 7a). As such, the importance of places and the main targets to be considered are altered based on the spatial unit level. Therefore, it is important to conduct the analysis according to the viewpoints required for research and policy decisions.
On the estimated visitation rate distribution map, high values were fragmented, making the map difficult to interpret. Fragmented information may be useful at the local scale, but it is likely to be nonessential at the regional scale. In contrast, hotspot maps visualize key locations, making it easier to find important locations at the regional spatial unit. Furthermore, the hotspot map intuitively provides spatial context on how the distribution appears spatially, going beyond the simple fact that the number of visitors to a specific area is high or low. This information may guide decision-makers' focus in making management decisions and help them build customized strategies [9]. In conclusion, visitation hotspot maps derived at the regional scale were considered more useful for decision-makers than spatial distribution maps of estimated visitation rates.

Application of Coastal Tourism Hotspots for Regional Spatial Planning
Coastal tourism is one of the many human activities that create conflicts in coastal spaces because it is sometimes incompatible with other economic activities or protection activities [12,61]. Coastal tourists travel to places where the coastal scenery is beautiful; however, the tourist facilities developed around these places have an adverse effect on the coastal environment [5,62]. In terms of managing coastal environments and ecosystems, spaces accommodating many visitors may be regarded as vulnerable. If an ecologically important area receives a high level of visitation, protection policies such as establishing protected areas may be considered [10,45]. Although coastal tourism takes place on the coast, marine activities must also be considered. It has been argued that certain human activities in marine environments, such as mining and offshore wind power, conflict with coastal tourism [61,63]. This may be because coastal tourism puts pressure on marine natural environments and landscapes while also relying on them.
For the sustainable management of coastal tourism, it is essential to consider the spatial characteristics of tourist attractions and coastal tourism visitation [12,57]. Spatial planning is a tool that assists decision-makers at different administrative levels in identifying and managing tradeoffs among various human activities [9]. Hotspot analysis using social media data makes it easy to visualize where visitors gather and their spatial impact at different administrative levels. In particular, comparing hotspots derived at different observation levels can reveal various contextual meanings. For example, the hotspot map in Figure 7b makes it easy to see that Haeundae district is a tourism hotspot in Busan city. If a decision-maker refers to Figure 7c, a map of coastal tourism hotspots in the Haeundae district, and draws on other sources, they can see that Haeundae Beach is an important place in forming coastal tourism hotspots. This shows that the beach in the hotspot area has a great influence on tourism at the scale of Busan city.
Finally, at the spatial planning stage, the hotspot map should be overlaid and compared with other data layers. Comparing different information in the same grid or district is an essential part of spatial planning. For example, one study suggests that National Park management plans can be established by superimposing geo-located social media data, which represent the spatial distribution of visitors, on discovery records of key indicator species [18]. Similarly, recommended uses can be designated for a specific space through comparison with auxiliary data such as an environmental layer expressing the state of the natural ecosystem, a geological layer representing aggregate resources, and a transportation network layer considering the floating population. Ultimately, at the regional spatial unit, coastal tourism spatial planning can be regarded as successful if it distinguishes between areas to be preserved and areas where tourism facilities should be developed to maximize tourism value.
In addition, if hotspot analysis is carried out for each time period, spatial changes in coastal tourism can be reflected in spatial planning. Furthermore, to fully represent the distinct features of tourism, future research could classify visitation characteristics separately for locals and tourists [35]. Hotspot analysis using social media data can identify the most important spots for coastal tourism at a given spatial unit, and thus it is useful for researchers and decision-makers.

Limitations of the Study
This study has limitations that should be addressed in future studies. First, although there are various social media platforms, only data from Twitter and Flickr were available for this study. Because user patterns differ across social media platforms [40], using additional platforms would allow a wider variety of users to be represented. In our opinion, adding a photo-sharing platform with a large number of users (e.g., Instagram) would enrich the data. Taking pictures is an activity related to tourism, and the time and coordinates at which an image was taken are expected to be particularly useful information.
Second, the Twitter collection period was shorter than the Flickr collection period because of technical limitations. Twitter data contain more information than Flickr data, and because the data were collected by web crawling, collection was inevitably slow. Even so, as shown in Figure 3, one year of data appears to be sufficient for grasping the average pattern. However, if the Twitter collection period were matched with the Flickr collection period, complementary results could be produced, such as an evaluation of temporal changes in the data.
Third, the actual RMSE was high because the visitation data followed a power-function distribution in which the maximum value was approximately 1000 times the minimum value (Appendix B). This problem arose from the distribution of the real data. Obtaining an estimate with a very small actual RMSE requires further specialized research.
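The effect of such a skewed distribution on RMSE can be illustrated with a small sketch. The values below are hypothetical and span a factor of 1000, and each prediction is 20% too high. The comparison with a log10-scale RMSE is added only for illustration and is an assumption, not a description of the study's model.

```python
import math


def rmse(observed, predicted):
    """Root mean square error on the original scale."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def log10_rmse(observed, predicted):
    """RMSE after a log10 transform; all values must be positive."""
    if any(v <= 0 for v in list(observed) + list(predicted)):
        raise ValueError("log10 requires positive values")
    return rmse([math.log10(o) for o in observed],
                [math.log10(p) for p in predicted])


# Hypothetical visitation values spanning three orders of magnitude.
observed = [100.0, 1_000.0, 10_000.0, 100_000.0]
predicted = [v * 1.2 for v in observed]  # every estimate is 20% too high

print("RMSE on the original scale:", rmse(observed, predicted))
print("RMSE on the log10 scale:  ", log10_rmse(observed, predicted))
```

In this example every attraction has the same relative error, yet the largest attraction contributes about 99% of the squared error on the original scale, so the RMSE mainly reflects the few most visited places.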
Finally, whether social media data outside tourist attractions fully represent tourism remains unanswered, which emphasizes the need for further application and validation of social media data. Answering this question requires analyzing the content of social media data. Techniques such as analyzing the text of the data with an algorithm or classifying images with deep learning could be considered [64]. If related research continues and these techniques become available, refined data could be used in future research. To increase the applicability of social media data for supporting effective spatial planning that promotes sustainable coastal tourism, it is therefore necessary to identify hotspots at an appropriate spatial unit and to verify the credibility of social media data.

Conclusions
With the spread of smartphones and the growth of social media platforms, decision-makers and researchers have been able to acquire spatially continuous big data in the tourism sector. As a result, the spatial domain that can be covered has expanded, but the spatial patterns represented by the data have become more complicated. Spatial statistical analysis is a way to simplify these patterns. In the future, big data will be generated in various fields, and people's demands for space will diversify. To prepare for this trend, more research should be conducted on decision making using big data and spatial statistical techniques.
However, there are concerns because of the inherent limitations of social media data. Because these data rely on platforms created by companies, data access is limited by each company's terms and conditions. In addition, because the social media environment changes very rapidly, the popularity of a platform can change substantially over the long term. Therefore, it is difficult to collect data stably, and methods of analyzing social media data may not be sustainable. Nevertheless, social media data remain valuable and offer considerable potential. Social media data are much less expensive to build than traditional field data. In addition, the ability to cover large areas and long time periods homogeneously is a great advantage that should not be given up in research and policy making. For this reason, we believe that social consensus and policy discussions on access to and use of data should continue.
Particularly, considering rapidly growing tourism in coastal areas, it is necessary that emerging big data be reflected in marine spatial planning. In this regard, this study identified the regional tourism hotspots in coastal areas by considering regional spatial typology and showed the way to apply social media data in spatial planning at the regional level. We discussed the usefulness of social media data in managing coastal tourism to offer insights for establishing optimum marine spatial planning. We hope this study will inspire other spatial planning studies, particularly for tourist attractions in coastal spaces.

Explanation of Statistical Visitation Data
The collected data are provided on the website of Statistics Korea [42]. In this study, statistical visitation data for 142 major coastal tourist attractions located on the coast of the study site were used (Table A1).
To spatialize the statistical visitation data, the spatial form of each tourist attraction was drawn as a polygon using satellite images provided by ESRI. The number of social media users was then calculated for the spatial unit of each tourist attraction, so that the spatial resolution of the social media data matched that of the statistical visitation data (Figure A1). The data generated in this way were used for the correlation and regression analysis conducted in Section 2.3.
Figure A1. Examples of spatialization of major coastal tourist attractions.
Figure A3. Histogram of the annual average daily Flickr users.
Figure A4. Histogram of the annual average daily Twitter users.