On the Representativeness of OpenStreetMap for the Evaluation of Country Tourism Competitiveness

Since 2007, the World Economic Forum (WEF) has issued data on the factors and policies that contribute to the development of tourism and competitiveness across countries worldwide. While WEF compiles the yearly report out of data from governmental and private stakeholders, we seek to analyze the representativeness of the open and collaborative platform OpenStreetMap (OSM) to the international tourism scene. For this study, we selected eight parameters indicative of the tourism development of each country, such as the number of beds or cultural sites, and we extracted the OSM objects representative of these indicators. Then, we performed a statistical and regression analysis of the OSM data to compare and model the data emitted by WEF with data from OSM. Our aim is to analyze the tourist representativeness of the OSM data with respect to official reports to better understand when OSM data can be used to complement the official information and, in some cases, when official information is scarce or non-existent, to assess whether the OSM information can be a substitute. Results show that OSM data provide a fairly accurate picture of official tourism statistics for most variables. We also discuss the reasons why OSM data is not so representative for some variables in some specific countries. All in all, this work represents a step towards the exploitation of open and collaborative data for tourism.


Introduction
Understanding the tourism competitiveness of countries has become a key concern for destinations. Tourism has been shown to strongly affect the socio-cultural environment and economic growth of a country [1]. Therefore, countries invest large amounts of money in collecting data on tourism industries, attractions, infrastructure, and so on. In addition, several organizations, such as the World Economic Forum (WEF), collect and analyze data from several countries in order to determine how competitive countries are in the tourism sector. WEF is a well-known organization devoted to the dissemination of worldwide data, including data on the state of tourism competitiveness of countries. Broadly speaking, WEF is an organization for public-private cooperation that engages the foremost political, business, and other leaders of society to shape global, regional, and industry agendas [2]. WEF has published the Travel & Tourism Competitiveness Report since 2007.
The analysis of the impact of tourism on economies typically relies on official tourism statistics provided by governments and institutions. Parallel to the dissemination of official statistical data, Information and Communication Technologies (ICT) in general, and mobile and social network technologies in particular, have opened a new door, and data coming from these new sources are used to analyze tourism, as shown in several recent studies [3,4]. These online tools, social networks, and collaborative platforms have emerged as a relevant data source to understand tourism behavior and traveling trends [5][6][7][8], to create accurate tourist profiles [9,10], and to elicit a picture of the tourism industry [11].
A remarkable example of these new sources is the free mapping service offered by the collaborative mapping platform OpenStreetMap (OSM) [12], with around 37,000 active contributors during a typical month. OSM is claimed to be the largest freely and openly accessible database of geographic data in the world [13]. It emerges as an alternative to the restricted use of other mapping services, such as Google Maps. One argument in favor of Google Maps could be the wide range of advanced features that it offers (street-view images, multimodal navigation, social recommendations, etc.). However, some services based on the OSM database also provide them. For example, Mapillary (www.mapillary.com (accessed on 13 January 2021)) is a service for crowdsourcing street-level photographs using smartphones and computer vision (with more than 1400 million geotagged photographs), and OpenRouteService (www.openrouteservice.org (accessed on 1 November 2020)) provides multimodal navigation services, among other geography-related features (such as geocoding, isochrones, time-distance matrix, etc.). Numerous applications based on OSM can be found in the list of OSM-based services (https://wiki.openstreetmap.org/wiki/List_of_OSM-based_services (accessed on 30 May 2020)), some of them related to tourism services. Additionally, OsmAnd (https://osmand.net/ (accessed on 25 July 2020)) and MapOut (https://mapout.app/ (accessed on 21 December 2020)) provide some tourism-related services, such as offline mobile map viewing, navigation, POI searching, and tour management. Other works describe applications for e-bike navigation [14], the construction of sidewalk geometries for wheelchair users [15], or the evaluation of the impact of post-disaster recovery in tourist destinations [16].
This paper presents an exploratory analysis of the OSM data set and compares the obtained insights with the publicly available data on tourism competitiveness provided by WEF for a group of about 130 countries worldwide. Specifically, we are interested in studying the representativeness and reliability of tourism-related data found in an open and collaborative platform such as OSM; that is, our aim is to analyze how well the OSM data reflect the actual tourism competitiveness data from the WEF across eight indicators. To this end, we use regression models to study the relationship between the data collected from OSM for an indicator and the official values of that indicator in WEF.
Sometimes, official information is difficult to find, cannot be accessed at the desired level of granularity, or is not easily updated. As explained above, social networks and collaborative platforms have emerged as a relevant and alternative data source that can be used in these cases. Therefore, in this paper, we will examine the tourism-related information of OSM and determine in which cases OSM is a reliable alternative data source to WEF and can be used for forecasting. In a nutshell, given the common acknowledgement that OSM is a powerful and user-friendly geo-data platform extensively used for tourism purposes, our aim is to answer the following question: does OSM provide an accurate picture of the studied components of tourism competitiveness? That is, we are interested in analyzing whether the elements mapped in OSM can be used to infer some WEF data. If the answer is yes, OSM data can be used, for example, to analyze the same components of tourism competitiveness for a more specific area (not necessarily at the country level, as WEF provides). Otherwise, we will analyze which aspects make this task difficult.
Given the nature of the OSM data, which is mainly related to attractions, accommodation, and infrastructure, the components of tourism competitiveness analysed in this paper are those concerning the endowment of these elements in each country. Therefore, other tourism competitiveness aspects, such as the dimension of touristic flows, pricing policies, destination marketing, the reputation of the place, and so forth, are out of the scope of the analysis presented in this paper. Specifically, we will focus on attractions and accommodation, which are related to eight WEF indicators.
We will carry out a statistical and regression analysis of eight different tourism indicators over 133 countries from two different points of view: (1) considering all the countries as a whole, and (2) splitting the countries into three groups according to their ICT level given by the ICT readiness pillar of WEF. The reason for this double analysis is that, according to [17], the status of a country's ICT services will determine, for instance, the success of a Volunteered Geographic Information (VGI) initiative or the expected growth in the years to come. Moreover, previous investigations [18] found that although OSM has had great global success, there is still a clear difference in the volume of contributed data between affluent and poorer communities. Therefore, we will also examine whether the country ICT level is an influential factor in the relation between OSM and WEF. We hypothesize that a higher ICT level implies a better representativeness of OSM with respect to official data sources, given that technology in these countries is more easily accessible and hence users will participate more intensely in collaborative platforms (OSM, in this case).
An additional aspect that must be mentioned is that the two data sources we handle in this work, WEF and OSM, are of a very different nature, and therefore it is not always possible to measure exactly the same concept in both sources. For example, a particular variable may be measured in different units in OSM and WEF, or there may be no exact counterpart in OSM for a given WEF indicator. In both cases, some approximations have been computed, and we will discuss the limitations we have found in this regard.
Our Research Questions can be summarized in the following:
The paper is structured in the following sections. Section 2 gives an overview of previous work that uses OSM data in several contexts. Section 3 describes the WEF and OSM data sources used in our analysis. Section 4 describes the analysis we performed with WEF and OSM data, Section 5 presents the outcomes of this analysis, and Section 6 discusses these results. Finally, in the last section, we outline the conclusions and future research directions.

Related Work
Volunteered Geographic Information (VGI) [19] systems have emerged as an answer to the need for open and easy-to-use geographic data and as an alternative to commercial geographic information systems, which impose restrictions on the use of the data. Technological advancement has fostered the emerging role of the citizen as a source of data. Citizen sensing has dramatically affected mapping and map use, impacting on routine daily life activities, such as gaming and tourism, as well as on science and technology more generally [20]. Due to the proliferation of location-aware devices and the opportunities of Web 2.0, it is now possible for citizens to easily acquire geographical information, which may dramatically reduce the cost of map acquisition [21] and usually allows maps to be kept up to date [22]. Additionally, it can become a tool for the empowerment of marginalized individuals and social groups [23].
However, citizen-derived data are also often of varied quality and trust levels. For example, the data generated may be poorly described and associated with little metadata. Additionally, there are other considerations in the use of VGI, including ownership rights, as well as privacy, legal, and ethical issues [20].
OpenStreetMap (OSM) is one of the most well-known VGI projects. The crowdsourced approach of OSM derives its success from citizens mapping and collecting data and information about their locality [13]. Features being mapped range from the location of garbage cans, pedestrian crossings, land cover types, shops, and education facilities to government buildings, roads, and river networks. All data in the OSM database can be downloaded for free in a variety of spatial data formats. Additionally, a number of open source tools are available to process this data and produce other formats [21]. The OSM project counts on experienced volunteers that spend time checking, updating, and improving OSM data. The process of validation aims to ensure the completeness and quality of data. Nevertheless, the fact that OSM is neither commercial nor governmental and that validation is carried out by volunteers sometimes puts the validation of data in question [20].
In order to alleviate the doubts concerning the quality and precision of OSM data, a large number of works have investigated the robustness and validity of OSM in several fields, such as environmental epidemiological and exposure assessment studies [24]. This study compared OSM and Governmental Major Road Data in three different regions: Massachusetts (USA), Bern (Switzerland), and Beer-Sheva (South Israel). This investigation found that OSM data was fairly complete and accurate in all regions, and that the results in all regions were robust, with Massachusetts showing the best fit (R² of 0.93).
In the same direction, the work [25] evaluates the quality of OSM data with respect to its suitability for a certain application, specifically for pedestrian navigation. The analysis compares routes calculated with OSM data and routes calculated with the German topographic data set, using accessibility and length of routes as quality criteria. The study concludes that OSM is fairly accurate, on average within about six meters of the position recorded by the Ordnance Survey, and with approximately 80% overlap of motorway objects between the two datasets.
Another relevant work compares the accuracy of the OSM data on land use in four German metropolitan areas against the Global Monitoring for Environment and Security Urban Atlas as a reference [26]. The study reveals the suitability of using OSM as an alternative complementary source for extracting land use information and also highlights the potential of land use features collected collaboratively by mappers.
There have also been attempts to evaluate the quality of OSM, in terms of completeness and positional and semantic accuracy, in the cultural sector. In [27], the authors show that the number of museums in Italy mapped in OSM accounts for 86% of the official total. In addition, OSM has records of positional and semantic information for 39% of the museums overall. The study also states that for 77.7% of the museums, the location reported by OSM is less than 150 m away from the actual location of the museum. Likewise, 90% of the museums have a similar denomination in OSM and in the official sources.
OSM has also been used to predict socio-economic indicators (sustainability, human development, vulnerability, risk, resilience, and climate change adaptation) for municipalities. In [28], the authors present an interesting study that highlights the prospects of OSM to analyze interdisciplinary topics and factors like social cohesion, and provide meaningful insight into the spatial differences in social, environmental, or economic inequalities. One of the conclusions of this study is that further research is needed to determine the impact of regional and international differences in user contributions on the outputs.
In the specific field of tourism, we found some works that use OSM in analysis tasks. For instance, in [29], a framework for the assessment of the quality of OpenStreetMap is depicted. The approach analyses several quality measures, such as completeness, compliance, consistency, granularity, richness, and trust of OSM tags in Spain. The authors conclude that the current status of the Spanish OSM data can be considered satisfactory in some indicators (compliance and consistency), while in some others (granularity and richness) it should be improved. For tourism POIs, some elements are still missing. For instance, shopping and amenity destinations should include opening hours, phone numbers, and so forth, and specific categories like restaurants or hotels should include more detailed information (prices, cuisine, stars, etc.).
In the same way, ref. [30] evaluated the consistency of the information contained in the Compendium of Tourism Statistics of the World Tourism Organization with respect to the information published in OSM, especially information on places of accommodation, food and beverages, and travel agencies. Among the results shown in that paper, the high correlation between the data from both sources is remarkable for accommodation (0.81), food and beverage sites (0.87), and travel agencies (0.82).
In [31], the authors described how they used OSM data along with data from official sources and other platforms with the objective of identifying spatial patterns in park popularity in the state of Victoria, Australia. Statistically significant correlations were found between official data and OSM data, indicating that the density of OSM vertices in a given area can be used to infer the number of visitors.
Finally, in [32], a methodology for computing composite indicators derived from OSM data as an alternative to statistical offices was presented. To demonstrate its use, they applied this methodology to a number of indicators used for real estate valuation of properties in Italy. Among these indicators, they considered a number of sites of historical relevance and a number of nearby hotels and hotel-related features.

Data
This section first describes the tourism indicators from the WEF data sources that will be used in our analysis. Subsequently, we overview some basic aspects of OSM, and we define the concept of direct and indirect variables.

WEF
Tourism competitiveness is regarded as the set of regulations, infrastructure, and resources that enable the sustainable development of the Travel & Tourism (T&T) sector. For our analysis, data on tourism competitiveness were retrieved from sources of the WEF organization. In particular, we focus on the Travel & Tourism Competitiveness Report, whose first edition was published in 2007. This report is based on secondary data from various international organizations and provides engaged leaders in T&T with an in-depth analysis of the tourism competitiveness of a large number of economies across the world. The 2017 edition covers 141 economies and features data about 14 key factors and policies, also called pillars, that enable the sustainable development of the T&T sector and contribute to the development and tourism competitiveness of a country [33].
A pillar measures the strengths and weaknesses of a country on a scale of 1 (bad) to 7 (excellent), and it is based on a set of 90 indicators that are collected either from surveys or official national statistics. These indicators are mainly extracted from two sources:

• Survey indicators: These are data derived from responses to the WEF's Executive Opinion Survey that capture the opinions of business leaders around the world on a broad range of topics. These indicators are aimed to measure critical concepts to complement the traditional sources of statistics and provide a more accurate assessment of drivers of economic development. Survey indicators range in value from 1 to 7 (1: the lowest negative perception; 7: the highest positive perception).

• Hard data indicators: These are data which objectively represent the state of some resource or abstract concept, and they are often measured by official international or national organizations (e.g., number of stadiums, airports, ATMs, etc.). These indicators are normalized to a scale of 1 to 7 in order to align them with the Executive Opinion Survey's results.
WEF uses the survey and hard data indicators to shape the 14 pillars, which are then compiled into a global Travel and Tourism Competitiveness index that represents how viable a country is within the T&T sector.
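The normalization of hard data indicators to the 1 to 7 scale can be illustrated with a short sketch. It assumes a linear min-max transformation over the sample of countries; the function name, its parameters, and the handling of indicators where a lower raw value is better are illustrative assumptions, and the report's own technical notes remain the authoritative description of the procedure.

```python
def to_wef_scale(value, sample_min, sample_max, higher_is_better=True):
    """Map a raw indicator value onto a 1-7 scale (assumed min-max form)."""
    if sample_max == sample_min:
        raise ValueError("sample_min and sample_max must differ")
    # Position of the value within the observed range, between 0 and 1.
    share = (value - sample_min) / (sample_max - sample_min)
    if higher_is_better:
        return 6.0 * share + 1.0
    # Assumed mirror form for indicators where a lower raw value is better.
    return -6.0 * share + 7.0


# Hypothetical raw values: the sample minimum maps to 1, the maximum to 7.
print(to_wef_scale(5, sample_min=0, sample_max=10))
```

Under this assumption, a country exactly halfway between the sample minimum and maximum receives a score of 4.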
For our analysis, we opted for selecting indicators that measure tangible aspects that are rather directly perceived by tourists and can be determinant in the selection of a particular destination. The nine indicators selected as our study variables are shown in Table 1. The second column of Table 1 shows the indicator name alongside a brief description. The first column is the pillar that the indicator belongs to. The third column indicates the name of the variable in our study. The fourth column shows whether the indicator is a hard data indicator (H) or a survey indicator (S). Finally, the fifth column is explained in Section 3.2 as it is directly involved with the retrieval of the OSM data.
As can be observed, each variable is drawn from only one WEF indicator except for the variable WHS, which stems from two indicators, the Number of World Heritage cultural sites and the Number of World Heritage natural sites. The reason is that outstanding universal sites qualify both as cultural and natural sites.

Indicator: Attractiveness of natural assets
Description: To what extent do international tourists visit your country mainly for its natural assets (i.e., parks, beaches, mountains, wildlife, etc.)? (1 = not at all; 7 = to a great extent).
Variable: NAT; indicator type: S; relation to OSM: I
All in all, we have a total of eight variables covering the most relevant aspects of tourism competitiveness that influence the tourist perception of the country. The selected indicators embody aspects that have a major impact on a tourist trip, such as the presence of car rental companies, the availability of accommodation, or the number of cultural/natural sites. Some of the variables in Table 1 refer to elements related to the tourism infrastructure, while others are intended to survey the tourism attractiveness of the country. The values of the indicators for every country are extracted from the Travel & Tourism Competitiveness Report, which is directly available and downloadable in electronic format [33].

OSM
In this section, we describe the elements of OSM that will be used in our analysis. Objects drawn on an OSM map are called map features, and map features are not tourism-specific in themselves. However, the aggregation of web maps and user-generated content is fed with a broad variety of metadata (OSM tags) that provide valuable tourism information, like the location of accommodation, food establishments, or tourist attractions. Hence, we are able to collect information about tourism competitiveness within a geographical or administrative area, such as a country [34].
In this sense, Table 2 shows a list of five keys alongside a brief textual description of each one. At the end of the description, we show some examples of tags that represent a particular map feature. For instance, a bar is an element tagged in OSM as amenity = "bar", and a museum is tagged as tourism = "museum". An exhaustive list of map features can be found on the project web page (https://wiki.openstreetmap.org/wiki/Map_Features (accessed on 28 November 2020)).

Amenity: This key is used to map facilities used by visitors and residents. For example: bar (amenity = "bar"), fast food (amenity = "fast_food").
Aeroway: This key is mainly related to aerodromes (aeroway = "aerodrome"), airfields (aeroway = "airfield"), and other ground facilities that support the operation of airplanes and helicopters.
Historic: This key is used to describe various historic places. For example: archaeological sites (historic = "archaeological_site"), ruins (historic = "ruins"), etc.
Leisure: This key is used to tag leisure and sports facilities, such as water parks (leisure = "water_park") and fitness centers (leisure = "fitness_centre").
Tourism: This key represents places and things of specific interest to tourists, including places to see, places to stay, and things and places providing information and support to tourists. A museum is one of the possible values of this tag (tourism = "museum").
There are no specific guidelines for the type of tags to define a map feature, except that they must always be string values. Although OSM contributors are allowed to use free-style attributes to define features, there exists a wiki page (https://wiki.openstreetmap.org/wiki/How_to_map_a (accessed on 30 June 2020)) that shows recommended combinations of tags to qualify an object. Tags are used to query and retrieve any object defined in OSM.
The two data sources we handle in this work, WEF and OSM, are of a very different nature, and thereby it is not always possible to measure exactly the same concept in both sources. Hence, a relevant aspect that must be considered in the data extraction is whether or not the OSM value of a particular variable is given in the same measurement units as the value of the corresponding indicator in WEF, which gives rise to:

• Direct variables: This is the case when the variable is measured in the same terms as the WEF indicator. For instance, the value retrieved from OSM for the variable CAR is the number of establishments that provide such particular service, as are the values obtained from WEF for the indicator "Presence of major car rental companies".

• Indirect variables: This is the case when the variable in OSM is measured in units other than the ones used in the WEF indicator. For instance, the value of the WEF indicator "Attractiveness of natural assets" is a value within the range 1 to 7 that comes from a survey, while the value we obtain from OSM for variable NAT is the number of natural beauty spots.
We can observe in the fifth column of Table 1 that variables are classified as direct (D) or indirect (I).

Methods
Our aim is to analyze how well the OSM data approximate the values of the WEF indicators and thus determine whether OSM is a reliable data source to evaluate tourism competitiveness.
Figure 1 shows the workflow followed in our analysis. First, the Travel & Tourism Competitiveness Report 2017 was reviewed and, as explained in Section 3, eight variables related to attractions and accommodation infrastructure were selected. The data for each country corresponding to these variables in 2017 were downloaded from WEF. Then, the OSM database was studied, and the most appropriate data for each variable were extracted in 2017 (this will be explained in Section 4.1). Data from both WEF and OSM were combined to build some statistical models, as shown in Section 4.2. For evaluating these models, the following steps were performed: (1) OSM data were downloaded in 2019, (2) these new OSM data were used to infer the WEF values, by using the regression models and ( 3

OSM Data Processing
We follow a straightforward two-step process to retrieve the OSM data for each variable:

• Step 1. We identify the specific combination of OSM tags that best captures the meaning of the variable. As an example, for the WEF variable CAR (car rental companies), we selected the tags amenity, name, and operator, since this particular combination enables knowledge of whether a specific car rental company is present in a geographical area.

• Step 2. We query the OSM tags selected in Step 1 through the Overpass API (The Overpass API is an API that serves up custom selected parts of the OSM map data by search criteria, such as location, type of objects, tag properties, proximity, or combinations of them (https://wiki.openstreetmap.org/wiki/Overpass_API/Language_Guide (accessed on 3 July 2020))) within the delimited geographical area of a specific country.
Algorithm 1 shows a query to retrieve the car rental companies in Colombia. Once the objects of type amenity = "car_rental" are retrieved, we can apply the query name = "Europcar" or the query operator = "Europcar" over the retrieved objects so as to find out if the car rental company Europcar is present in Colombia.
Algorithm 1: Excerpt of Overpass code.
In some cases, it is necessary to apply two or more queries as described in Step 2 to retrieve the value of a particular variable. Aggregation, arithmetic operations, or more complex operations are needed to approximate the value of some variables with OSM data. Both Overpass queries and the subsequent approximation operations have been implemented in Python.
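The two-step retrieval described above (query a tag within a country, then check the name and operator tags) can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the country is selected through its ISO 3166-1 code on the assumption that the administrative boundary carries that tag, and the company check uses a case-insensitive substring match, which is looser than an exact tag match. The sketch only builds the query text and filters an already downloaded result, so it makes no network request.

```python
def build_country_query(iso_code, key, value, timeout=180):
    """Return Overpass QL selecting features tagged key=value inside a country."""
    selectors = "\n".join(
        f'  {kind}["{key}"="{value}"](area.country);'
        for kind in ("node", "way", "relation")
    )
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        f"(\n{selectors}\n);\n"
        "out tags;"
    )


def company_present(elements, company):
    """Check whether any element names the company in its name or operator tag."""
    target = company.casefold()
    for element in elements:
        tags = element.get("tags", {})
        if any(target in tags.get(k, "").casefold() for k in ("name", "operator")):
            return True
    return False


# Query text for car rental features in Colombia (ISO code "CO").
print(build_country_query("CO", "amenity", "car_rental"))
```

The `elements` argument is expected to follow the layout of the `elements` list in an Overpass JSON response, where each entry carries a `tags` dictionary.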
In the following, we explain the tags used to retrieve the variables, as well as the operations needed in some cases to approximate the value of the WEF indicator.
CAR. We first retrieve all features that match the tag amenity = "car_rental", and then we check whether at least one of the features matches the name of the car rental company (e.g., name = "Avis" or operator = "Avis").
ATM. The number of features in OSM that match the tag amenity = "atm" is relatively low and usually refers only to bank entities. There exist, however, ATMs in shopping malls or other types of establishments that are retrievable via the tag atm = "yes". We estimated one ATM per feature tagged amenity = "atm" because it indicates that the object is an actual ATM, whereas we estimated two ATMs per feature tagged atm = "yes" because it indicates that the place has some ATMs. Finally, in order to calculate the number of ATMs per 100,000 adults, we used the population between 15 and 64 years of age provided by the World Bank (http://www.worldbank.org/ (accessed on 21 October 2020)).
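The ATM estimate described above amounts to a small piece of arithmetic, sketched here for concreteness. The function and parameter names are illustrative, and the counts in the usage line are made-up numbers rather than data from the study.

```python
def atms_per_100k_adults(n_amenity_atm, n_atm_yes, adult_population,
                         atms_per_atm_yes=2):
    """Estimate ATMs per 100,000 adults from two OSM feature counts."""
    if adult_population <= 0:
        raise ValueError("adult_population must be positive")
    # One ATM per amenity=atm feature, two per feature tagged atm=yes.
    estimated_atms = n_amenity_atm + atms_per_atm_yes * n_atm_yes
    return 100_000 * estimated_atms / adult_population


# Made-up counts: 300 amenity=atm features, 100 atm=yes features,
# and one million people aged 15 to 64.
print(atms_per_100k_adults(300, 100, 1_000_000))
```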
HOT. The number of hotel rooms in OSM is extracted by finding the features tagged tourism = "hotel" and then using the value of the tag rooms of such features, which is an integer value that denotes the number of rooms of a hotel. Unfortunately, the tag rooms is not present in most of the hotel features, which is why we opted for considering the number of hotels as the OSM value for variable HOT.
HBD. Similarly to variable HOT, we recover the value of HBD by using the tag amenity = "hospital" and then querying the tag bed over the hospital features to obtain the number of beds. As with variable HOT, only the hospital features of a small group of 19 countries (e.g., United States, Saudi Arabia, France, United Kingdom, Indonesia, Germany, etc.) include the key bed. Therefore, we opted for considering the number of hospitals as the OSM value for variable HBD.
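The decision taken for HOT and HBD (counting features because the capacity tag is rarely filled in) can be made explicit with a small coverage check. The sketch below is an illustration only: the function name and the `min_coverage` threshold are hypothetical, and the study itself simply used the feature count for both variables.

```python
def summarise_capacity(elements, key, min_coverage=0.5):
    """Report how often a capacity tag is usable and fall back to a feature count."""
    values = []
    for element in elements:
        raw = element.get("tags", {}).get(key)
        try:
            values.append(int(raw))
        except (TypeError, ValueError):
            continue  # tag missing or not a plain integer
    coverage = len(values) / len(elements) if elements else 0.0
    if coverage >= min_coverage:
        # Enough features carry the tag: sum the reported capacities.
        return {"coverage": coverage, "value": sum(values), "basis": "capacity"}
    # Too few features carry the tag: use the number of features instead.
    return {"coverage": coverage, "value": len(elements), "basis": "count"}


hotels = [
    {"tags": {"tourism": "hotel", "rooms": "120"}},
    {"tags": {"tourism": "hotel"}},
    {"tags": {"tourism": "hotel"}},
    {"tags": {"tourism": "hotel", "rooms": "many"}},
]
print(summarise_capacity(hotels, "rooms"))
```

Even when the threshold is met, the summed capacity covers only the tagged features, so it would still underestimate the true total.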
WHS. This direct variable represents the number of natural and cultural sites of a country that are selected by UNESCO as World Heritage. The value of WHS is retrievable through the tags heritage = "1" or heritage:operator = "World Heritage Centre (whc)", which return the number of OSM features tagged as World Heritage sites.
AIR. Given that the number of flights is not available in OSM, we focused exclusively on the number of airports using the tag aeroway = "aerodrome". More particularly, we are interested in airports open to the general public that are recognized by the International Air Transport Association (IATA = "<air_code>") or the International Civil Aviation Organization (ICAO = "<air_code>"), where <air_code> is the airport code given by IATA or ICAO, respectively.
CDD. We assume that the more historical, cultural, and leisure attractions a country has, the more online searches it will yield. For variable CDD, we count the number of features that are categorized as museums (tourism = "museum"); historic places (e.g., historic = "aircraft"|"aqueduct") and arts centers (amenity = "arts_centre"); theme parks, aquariums, and water parks (tourism = "theme_park", tourism = "aquarium", leisure = "water_park"); and religious places (e.g., building = "cathedral"|"chapel"|"church", amenity = "place_of_worship"), amongst others. For features that represent a building, we also query the existence of the keys historic or tourism in the feature in order to ensure the building is categorized as a tourist attraction.
NAT. For this indirect variable, we recovered the number of places of tourist interest for their natural beauty, such as national parks (e.g., boundary = "national_park"), as well as map features that have both the keys natural and tourism. Examples of tags are tourism = "attraction" and natural = "water", natural = "bay", natural = "cliff", natural = "volcano", etc.
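The counting rule for CDD can be expressed as a predicate over the tags of a feature. The sketch below covers only the example tags named in the text (the full list used in the study is longer), and it reads the building rule as applying to any feature that carries a building key; that reading, like the names used here, is an assumption.

```python
# Example tag pairs named in the text; not the complete list used in the study.
CDD_TAG_PAIRS = {
    ("tourism", "museum"), ("amenity", "arts_centre"),
    ("tourism", "theme_park"), ("tourism", "aquarium"),
    ("leisure", "water_park"), ("amenity", "place_of_worship"),
}
HISTORIC_VALUES = {"aircraft", "aqueduct"}
RELIGIOUS_BUILDINGS = {"cathedral", "chapel", "church"}


def counts_for_cdd(tags):
    """Decide whether a feature's tags make it count towards variable CDD."""
    matches = (
        any(tags.get(key) == value for key, value in CDD_TAG_PAIRS)
        or tags.get("historic") in HISTORIC_VALUES
        or tags.get("building") in RELIGIOUS_BUILDINGS
    )
    if not matches:
        return False
    if "building" in tags:
        # Buildings count only when also marked as historic or touristic.
        return "historic" in tags or "tourism" in tags
    return True


features = [
    {"tourism": "museum"},
    {"building": "church"},
    {"building": "church", "historic": "yes"},
    {"amenity": "cafe"},
]
print(sum(counts_for_cdd(tags) for tags in features))
```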

Statistical Analysis
In this section, we carry out a statistical analysis and investigate the relationship between the values of the official WEF indicators and the data collected from OSM. In particular, a linear correlation analysis between each WEF variable (denoted as variable-WEF) and its counterpart in OSM (denoted as variable-OSM) is performed first, and then regression models are calculated to measure how well the OSM data fit the WEF indicators. In order to obtain the most accurate model for the data at hand, linear and non-linear regression models were tested, such as the multiplicative, double-squared, and square-root-Y models, among others (see Table 3). Non-linear regression models are an alternative when linear models do not achieve the desired accuracy, or when the phenomenon under study behaves non-linearly. To assess the accuracy of each model, we calculate the coefficient of determination (R²), which measures the proportion of the variation of the dependent variable (variable-WEF) that is explained by the independent variable (variable-OSM). Finally, the models are tested with new data from 2019, and the values predicted by these models are compared with the actual WEF values. These analyses will help us to answer our Research Questions 1 and 2.
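The model comparison described above can be sketched with plain least squares on transformed variables. The transformations below follow the usual textbook forms of these models (for example, the multiplicative model fitted as ln Y = a + b ln X), which is an assumption about how the statistical package defines them; R² is computed on the transformed scale, the function names are illustrative, and the data are synthetic.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Each model is a pair of transformations applied to X (OSM) and Y (WEF).
MODELS = {
    "linear": (lambda x: x, lambda y: y),
    "multiplicative": (math.log, math.log),        # ln Y = a + b ln X
    "square root-Y": (lambda x: x, math.sqrt),      # sqrt(Y) = a + b X
    "double squared": (lambda x: x * x, lambda y: y * y),  # Y^2 = a + b X^2
}


def compare_models(osm_values, wef_values):
    """Fit every candidate model and return {name: (a, b, R^2)}."""
    results = {}
    for name, (fx, fy) in MODELS.items():
        xs = [fx(x) for x in osm_values]
        ys = [fy(y) for y in wef_values]
        results[name] = fit_line(xs, ys)
    return results


# Synthetic data following Y = 2 * X^1.5, so the multiplicative model fits best.
osm = [1.0, 2.0, 4.0, 8.0, 16.0]
wef = [2.0 * x ** 1.5 for x in osm]
results = compare_models(osm, wef)
print(max(results, key=lambda name: results[name][2]))
```

The logarithmic and square-root transformations require positive values, so variables with zero counts would need separate treatment.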
As stated in [17], the status of a country's ICT services will determine how successful a VGI initiative could be and what growth may be expected in the years to come. Previous investigations [18] found that although OSM has had great global success, there is still a clear difference in the volume of contributed data between affluent and poorer communities. Since OSM relies on volunteers and on the amount of time and effort they devote to the relevant area of the map, broader OSM coverage will occur in wealthier countries that have a high ICT level, given that this pillar measures not only the existence of modern infrastructure (mobile network coverage and quality of electricity supply), but also the capacity of businesses and individuals to use and provide online services. Therefore, in order to answer our Research Question 3, our analysis is carried out from two different points of view: (1) considering all the countries as a whole, and (2) splitting the countries into three groups according to their ICT level, as given by the ICT readiness pillar of WEF.
Accordingly, we used the value of the ICT readiness pillar (score from 1 to 7) to break up the analysis of countries into meaningful segments. The values of this pillar in the Travel & Tourism Competitiveness Report 2017 range from 1.57 (Burundi) to 6.47 (Hong Kong SAR), so we created three ICT segments that stand for low, medium, and high ICT levels. Specifically, low ICT comprises countries with values in [1.5, 3.5], medium ICT includes countries with values in [3.5, 5.0], and the high ICT segment contains countries with values within [5.0, 6.5]. According to these intervals, 32 countries are classified as low ICT, 54 as medium ICT, and 47 as high ICT. Figure 2 shows how the countries are distributed according to ICT level.
In summary, we performed the analysis of each variable by taking into account all the countries together, and also with respect to low, medium, and high ICT levels. First, data included in the OSM database at the beginning of 2018 are collected and processed as explained in Section 3.2. Then, the Statgraphics package (www.statgraphics.com, accessed on 23 July 2020) is used to generate the regression models of each WEF variable from its OSM counterpart variable. In this case, the WEF values are extracted from the 2017 Travel & Tourism Competitiveness Report. The models obtained using both approaches are compared, and the models with the best determination coefficient are selected. In this selection, it is important to bear in mind that regression models are sensitive to outliers: outliers may have a strong effect on the regression model, and this effect grows as the number of non-outlier observations decreases. In other words, the models obtained for each ICT level will be more sensitive to outliers but, at the same time, they make outliers easier to identify.
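The segmentation by ICT readiness, with cut points at 3.5 and 5.0, can be expressed as a small helper. The function name is hypothetical, and because the stated intervals share their endpoints, assigning a boundary score to the upper segment is an assumption of this sketch.

```python
def ict_segment(score: float) -> str:
    """Map a WEF ICT readiness score to a low, medium, or high ICT segment."""
    if not 1.5 <= score <= 6.5:
        raise ValueError(f"score {score} is outside the range covered by the segments")
    if score < 3.5:
        return "low"
    if score < 5.0:
        # Assumption: a score of exactly 3.5 falls in the medium segment.
        return "medium"
    # Assumption: a score of exactly 5.0 falls in the high segment.
    return "high"
```

With the extreme values reported for 2017, `ict_segment(1.57)` (Burundi) falls in the low segment and `ict_segment(6.47)` (Hong Kong SAR) in the high segment.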
Finally, we are interested in checking the applicability of the obtained models to new data. The main idea is to compare the latest published WEF indicators (from the 2019 Travel & Tourism Competitiveness Report) with the values predicted by our models, using as input the data included in the OSM database at the beginning of 2020.
This way, coverage of this indicator across countries is relatively good as compared with the car rental companies registered in WEF.
Additionally, Figure 3a shows the mean values of CAR-OSM and CAR-WEF. The mean value of CAR-OSM for low ICT level countries is almost zero, in contrast to the mean value of CAR-WEF, which is about 3. This indicates that car rental companies are not so widespread in this group of countries, and that the few existing companies are not well-mapped in the majority of countries. As an exception, the three best-mapped countries are Nicaragua (6/7), Honduras (4/6), and Venezuela (3/4).
Countries that belong to the medium ICT level show a good correlation, partly supported by some well-mapped countries like Morocco (5/6), Peru and Thailand (5/7), or Dominican Republic and Mexico (7/7), all important tourist destinations. In contrast, the relationship for countries that belong to the high ICT group is slightly worse because no car rental companies are mapped for quite a few countries that present high values of CAR-WEF, like Lithuania, Slovenia, Jordan, Kuwait (CAR-WEF = 7) or Slovak Republic (CAR-WEF = 6). However, in this group, we find the highest number of perfectly mapped countries, with the best possible mapping of 7/7 (e.g., France, Germany, Netherlands, United Arab Emirates, UK). Regarding the analysis with 2019 data, Appendix B shows that the R² value is slightly worse than the R² obtained with data from 2017. This indicates that the model fits the 2019 data slightly less well than the 2017 data, although the difference is small.
In conclusion, OSM reflects the official values of car rental companies across world economies quite well. More importantly, CAR-OSM is generally well-mapped in important tourist destinations, which leads us to confirm the representativeness of CAR-OSM for tourism purposes.

ATM
In this case, ATM-OSM is a value calculated upon an estimate of the number of machines per OSM node and the country population in order to approximate the value of ATM-WEF as much as possible.
The figures for the variable ATM are shown in Appendix A. Just as in the case of CAR, the model that best fits the data is the one obtained when taking into account all the countries, which explains a proportion of 0.42 of the variability of ATM-WEF. The obtained model is the following:
$$\text{ATM-WEF(All)} = e^{2.18 + 0.39\sqrt{\text{ATM-OSM}}} \tag{2}$$
Regarding the ICT segmentation models, a remarkable point is that the goodness of fit decreases as ICT readiness increases, and the relationship for countries that belong to the high ICT level is neither strong nor significant, which is a clear indication that ATMs are not well-mapped in OSM. In developed countries with a huge number of ATMs, it seems reasonable that OSM contributors are not very interested in mapping such facilities, as an ATM is easily found almost anywhere. The null correlation comes from the fact that although the ATM-OSM values of some countries are relatively large, they are still far from the ATM-WEF values (e.g., UK, Sweden, Singapore, Australia, Canada, Japan, Korea, USA, United Arab Emirates), whereas others are found amongst the top-mapped countries (e.g., Croatia, Austria, Switzerland, Slovak Republic, Germany, Portugal, France). The mapping of ATM-OSM thus appears to be a result of randomness, as evidenced by the non-significant p-value. On the other hand, we can observe a relatively strong relationship between ATM-OSM and ATM-WEF in the group of low ICT countries. Clearly, the number of ATMs in these countries is far lower than in countries with a high ICT level (see Figure 3b). Additionally, these ATMs are not evenly scattered around the country, and users have to travel a large distance to use ATM facilities [35]. Therefore, the few existing ATMs are highly mapped in OSM because it is important to locate them accurately.
It is important to note that the number of ATMs is an estimation, as explained in Section 3.2, and the results reflect that this estimation should be improved. The countries with the largest actual number of ATMs, those at the high ICT level, also have the largest number of ATMs in OSM (as shown in Figure 3b), but the difference between the expected (WEF) and calculated (OSM) value is significant, which makes it difficult to find a good model. In contrast, ATM-WEF and ATM-OSM are much more similar at the low ICT level, but even in this case, it is not easy to find a better model. In fact, the best model is obtained when all the countries are considered, which implies that the effect of outliers is somewhat mitigated. When this model is applied to 2019 data, the R² value is slightly worse, similarly to the case of CAR-OSM, but again the difference is small.
All in all, we can conclude that ATM-OSM data do not follow a clear pattern to adjust to ATM-WEF data.

HOT
In order to compare the values for this variable, we transformed the value provided by WEF (see Section 3.2) into the total number of hotel rooms available in a country using the World Bank population estimates. Hence, we will analyze the relationship between the number of hotels (HOT-OSM) and the total number of hotel rooms (HOT-WEF).
Unlike previous variables, in this case, the best-fitted models are those obtained for countries classified according to the different ICT levels, as shown in Appendix A. The medium and low levels follow quite similar models, unlike the high level. Specifically:
$$\text{HOT-WEF(High)} = (202.60 + 0.06 \cdot \text{HOT-OSM})^2 \tag{3}$$
$$\text{HOT-WEF(Medium)} = e^{3.67 + 1.06 \ln(\text{HOT-OSM})}$$
$$\text{HOT-WEF(Low)} = e^{4.75 + 0.86 \ln(\text{HOT-OSM})}$$
On the other hand, it can be observed that both the linear correlation and R² are significant and quite similar for high and medium ICT levels, since the more developed, richer countries with a higher level of ICT also have better hotel infrastructure and a more organized and competitive tourism industry, as is the case of Mexico and Greece at the medium level and Spain and France at the high level. However, it has not been possible to find a good model for countries in the low ICT level. This may reflect uneven data and the presence of outliers. In fact, when looking closely at the data, four outliers are identified (Burundi, Nigeria, Tajikistan, and Uganda). A new model is generated with the low ICT level countries by eliminating these outliers; this model obtains an R² of 0.4723 and countries as World Heritage, even though they are not officially recognized as such. All in all, we can infer a good OSM representativeness of WHS in countries with a low ICT level.
For countries that belong to a medium or high ICT level, there is no such strong positive relation. The main reason lies in the existence of some countries that have large values of WHS-WEF but are poorly mapped in OSM, for instance, China (9/52) in medium ICT or Italy (2/51) in high ICT, while others are exceptionally well-mapped, such as Russia (20/26) and Spain (41/45) in medium and high ICT, respectively. As a result, the strength of the correlation decreases notably, as does the goodness of the model. We believe that correcting the mapping of outliers in medium ICT (e.g., China, Mexico, Greece) and high ICT (e.g., Italy, Germany, USA) would yield a much more precise picture of the World Heritage Sites.
Appendix B shows that the adjustment of the models for medium and high ICT levels improves with 2019 data, by around 20% in both cases. This indicates that the models are still valid and that the OSM data contain fewer outliers than in the 2017 analysis. The model for the low ICT level shows a very good fit with both datasets.

AIR
For this variable, we converted the value of AIR-WEF, which measures airports per capita (million inhabitants), into the total number of airports using the World Bank population estimates. The result of comparing this value with the number of mapped airports (AIR-OSM) is shown in Appendix A. As we can see, there exists an almost perfect relationship for countries that belong to a high ICT level, with only a few discrepancies that arise because OSM also records cargo or military airports. This results in an accurate model for countries in the high ICT level. In contrast, in low ICT, a very weak correlation is observed due to some outliers in the African continent, which means that the model explains only a proportion of 0.14 of the variability of AIR-WEF. When generating a new model by eliminating outliers (in this case, Burundi, Benin, Ethiopia, and Madagascar), no substantial improvement is obtained (R² = 0.1812). We can say, however, that there exists a strong association for important tourist destinations like India, Kenya, or Madagascar. The same trend is revealed by Figure 4b, where it can be observed that the gap between the mean values narrows as the ICT level increases.
Therefore, the model with all the countries, which reaches an R² of 0.93, is considered the best model for this variable. The obtained regression model is:
$$\text{AIR-WEF(All)} = \sqrt{2374.63 + 1.54 \cdot \text{AIR-OSM}^2}$$
Appendix B shows that the R² for this model is slightly worse when applied to 2019 data, but it still has a good fit (0.916).
All in all, we can conclude that the higher the ICT level, the more representative the relationship between AIR-OSM and AIR-WEF, and the discrepancies in the low ICT level are mitigated by the good adjustment in the other levels. Despite the fact that the two sources are not measuring exactly the same airport concept (WEF counts only airports with one scheduled flight per million of urban population, whereas OSM counts all airports as long as they are tagged as public), the model with all the countries is able to explain a significant proportion of the AIR-WEF variability.
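The two computations used in this subsection, the conversion of the per-capita indicator into a total and the evaluation of the all-countries model, can be sketched as follows. The function names are hypothetical, and the input values in the usage note are invented for illustration; only the coefficients 2374.63 and 1.54 come from the reported model.

```python
import math


def airports_total(airports_per_million: float, population: float) -> float:
    """Convert airports per million inhabitants into a total number of airports."""
    return airports_per_million * population / 1_000_000


def air_wef_all(air_osm: float) -> float:
    """Evaluate the reported all-countries model for a given AIR-OSM count."""
    return math.sqrt(2374.63 + 1.54 * air_osm ** 2)
```

For a hypothetical country with 2.0 airports per million inhabitants and 10 million inhabitants, `airports_total(2.0, 10_000_000)` gives 20 airports. Note that the constant term means the model predicts about 48.7 for a country with no mapped airports, so it should be read as a description of the overall trend rather than as a reliable estimate for sparsely mapped countries.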

CDD
As explained above, in this case, the analysis is focused on the relationship between the online search index of cultural and entertainment activities (CDD-WEF) and the mapped locations in OSM that offer such activities. Appendix A shows that this relationship is strong in low ICT level countries, but it is weak and moderate in medium and high ICT level countries, respectively. The models obtained for this variable behave similarly to those for the WHS variable. Therefore, the models for each ICT level are considered more accurate.
A close look at the collected data reveals that the highest coverage of mapped locations corresponds by far to European countries, which also have the highest search index globally. This is the main reason for the stronger correlation of the high-ICT countries, since most European countries fall within this group. The second-ranked group of countries in terms of OSM coverage corresponds to both North and South American countries, followed by the Southeast Asian countries.
The disparity between the search index and mapped locations that makes the correlation weak and moderate in medium and high ICT countries, respectively, is mostly due to the high coverage of European countries in comparison to the rest of the countries. As an example, the search indices of the Czech Republic (6.5) and Poland (14) are about 5 and 2.5 times lower than that of the USA (34), while the number of mapped locations is two and three times higher in these two countries than in the USA. If we focus exclusively on medium ICT, Peru and Chile have almost the same search index as Greece, but 60% fewer mapped locations. This provides evidence that, globally, Europe is much better mapped than the rest of the world, especially concerning cultural interests.
As for low-ICT countries, the relationship is highly significant. Furthermore, the coefficient of determination in this case is R² = 0.99, thus indicating that 99% of the variation of CDD-WEF is attributed to the predictor variable CDD-OSM. This value is still excellent when the model is applied to 2019 data. Moreover, the model adjustment for medium and high ICT levels improves with the new dataset.

NAT
In this case, NAT-WEF is a survey indicator that measures to what extent a country is visited for its natural assets, while NAT-OSM counts the number of natural assets. As we can see in Appendix A, no correlation is found between the two values, apart from a very weak relationship for the high ICT group. Additionally, the model's adjustment shows a similar trend. In the group with a high ICT level, we find that, except for Australia, Norway, and Spain, other countries that are renowned for their natural spots and also have a large value of NAT-WEF are very poorly mapped, namely Iceland, Costa Rica, and Ireland.
Therefore, we conclude that OSM is not a very informative source when looking for the natural spots of a country.

Discussion
This section discusses the results presented in the previous section, describes the limitations encountered in this analysis, and provides suggestions to make OSM a user-generated VGI reference platform in tourism management.
From Table 4 and Appendices A and B, we can conclude that OSM is representative of WEF data for the CAR, HBD, and AIR variables; in the case of HOT, WHS, and CDD, it depends on the ICT level; and for ATM and especially NAT, the adequacy is not good. Moreover, there is no clear pattern regarding the OSM representativeness in comparison to WEF when the ICT level is taken into account. That is, in some cases, countries with a high ICT level show the best values (for example, for the AIR and HOT variables), whereas in other cases, such as WHS and CDD, countries with a low ICT level show better values. In the following, we explain the difficulties we have faced that may account for these results.
The first limitation of OSM is the incompleteness of the data regarding the mapped elements; that is, many spots are not mapped (for example, ATMs), especially in countries with a low ICT level. In fact, in the several maps provided by Anderson [36], we can observe the huge differences in the editing density across countries, with Europe being the area with the highest density in contrast with low-ICT countries. These maps also show that the editing task focuses on some specific areas of some countries. In general, well-governed countries with good Internet access tend to be more complete, and both sparsely populated areas and dense cities are the best-mapped [37]. However, in the last few years, there has been a significant effort in mapping many areas of Africa, as shown by Kateregga [38], which will have a positive impact on the representativeness of OSM with respect to WEF in these countries.
Another limitation is the incompleteness of the data with respect to the value of tags; that is, many spots are mapped but some lack information in key tags, and so we were not able to extract exactly the same information as represented by WEF. This happens in variables such as HBD and HOT; there are tags defined in OSM to specify the number of hospital beds or hotel rooms but, in many cases, this information is not registered. As explained in Section 4, we have (quite successfully) overcome this difficulty in these cases by using an approximation. On the other hand, as explained above, in countries with a high ICT level, the information regarding World Heritage Sites is not registered in the appropriate tag, which has made it difficult to identify these spots. Given that these factors are important for the image of a country, authorized initiatives to record these types of data in OSM could be encouraged.
Additionally, we found that some tags that would be very helpful in our analysis are missing from the OSM catalog. For instance, in the case of the NAT and CDD variables, a tag like attraction:type = {Natural, Cultural} would have been useful because it would have allowed us to retrieve data with greater ease and would have increased the precision of our calculations.
On the other hand, apart from the incompleteness of OSM data, our interpretation of the WEF variables in terms of OSM tags may indeed affect the accuracy of the results. For example, the estimation we used in our analysis for the variable HOT works well for high and medium ICT countries, but it should be adjusted for low-ICT countries. This fact is especially remarkable in the variable AIR, where the R² is 0.96 for high-ICT countries and only 0.13 for low-ICT countries. In the latter case, it would be interesting to add some additional information for a better estimation. Sometimes, however, such information is not easy to find; for example, [39] publishes the airport traffic data for the top 60 worldwide airports with respect to passenger traffic, but we have not found data about small airports. Another variable that would benefit from the combination of OSM data with external resources is WHS for high and medium ICT level countries: Wikipedia gives an exhaustive list of World Heritage Sites by country [40]; however, in this case, a better approach would be to use the information in Wikipedia to complete the corresponding tag in OSM data.
We envision the following challenges to make OSM a user-generated VGI reference platform in tourism management: (1) expand the OSM tagging system by including specific tourism-related tags; (2) encourage users, representatives, authorities, and tourism industry managers to participate in OSM; and (3) foster a balance between the general freedom of OSM contributors to fill in data and the production of data in a standardized way. Additionally, interesting initiatives like LinkedGeoData, which collect spatial data from OSM and make it available as an RDF knowledge base, will help increase the visibility of OSM and incentivize its utilization by visitors.

Conclusions
Tourism research has fostered the exploitation of OSM in smart tourism projects, encouraged by promising outcomes of studies that regard OSM as a holistic tourism platform. This new vision of tourism that deals with hyper-connected tourists who consume content any time and through different channels revolves around two core elements, smart phones and geolocation, with OSM being mostly a globally used geodata platform.
In this paper, we have presented an exploratory analysis to study the representativeness of data gathered in OSM. We have undertaken a thorough analysis of eight variables of WEF that cover different tourism aspects, and examined how well OSM data reflect the official values of such variables. We carefully selected the most representative OSM tags to retrieve the information comprised in the eight variables, and then studied for each variable the relationship between the official value and the OSM value.
The presented analysis is a small sample that illustrates the adequacy of OSM user-generated content for obtaining a picture of the tourism industry in a country. We selected a few variables representing concepts that are measurable and comparable with official statistics, but the analysis is extensible to the large variety of maps, data, and volunteered geo-information offered by OSM.
Studies such as the one presented in this article are relevant because they serve to determine whether OSM data can be used as a reliable data source for tourism-related applications.
Further work can be done to study other indicators that highly influence tourism behaviour, such as road density, railroad infrastructure, or protected areas, as well as extending the analysis to other collaborative data sources, such as DBPedia and Foursquare, among others. In addition to the ICT level, some other aspects could also be considered in the model generation, such as the country's population, geographical area, gross domestic product, or the International Monetary Fund classification into Advanced countries and Emerging and developing countries, among others.
) the inferred values were compared to the actual WEF values in the Travel & Tourism Competitiveness Report 2019.
3.5], medium ICT includes countries with values in [3.5, 5.0], and in the high ICT segment we found countries with values within [5.0, 6.5].According to these intervals, 32 countries are classified as low ICT, 54 countries are classified as medium ICT, and 47 countries are classified as high ICT.In the Figure 2, we can observe how the countries are distributed according to the ICT level.

Figure 2. Map of countries by ICT level.

Figure 3. CAR and ATM variables mean for different ICT levels.

Figure 4. WHS and AIR variables' means for different ICT levels.

Table 1. Tourism competitiveness variables. Source for indicators can be (S)urvey or (H)ard data. Indicators can be computed (D)irectly or (I)ndirectly from OSM data.

Cultural and entertainment tourism digital demand: This indicator measures the total online search volume related to the following cultural brand tags: Historical Sites, Local People, Local Traditions, Museums, Performing Arts, UNESCO, City Tourism, Religious Tourism, Local Gastronomy, Entertainment Parks, Leisure Activities, Nightlife and Special Events.

Table 2. Keys of OSM to represent tourism elements.

Table 3. Models used in our analysis.