The Distribution of Rural Accommodation in Extremadura, Spain-between the Randomness and the Suitability Achieved by Means of Regression Models (OLS vs. GWR)

Abstract: There are multiple types of regression, whose essential task is to obtain models which, starting from a set of explanatory values, are capable of finding explanations for the variability of a dependent variable. However, in many cases the territorial criterion is not considered a noteworthy factor of analysis, a deficiency which has encouraged the rise of spatial statistics. Nevertheless, given the variety of regressions, it is not clear which is best adapted to the analysis of tourism. In this sector, when the supply of accommodation is analysed, it is understood that it must be strongly related to the presence of resources. It has therefore been taken as an example for applying two differentiated regression techniques, ordinary least squares (OLS) and geographically weighted regression (GWR), with the objective of determining which of the two is best adapted to this type of analysis. The model has been drawn up based on various methods, although it has been shown that it is more efficient to resort to the declared preferences of the rural tourist, with the starting point being a survey of the tourists. These aspects have been taken as independent variables with the aim of explaining the distribution of accommodation establishments. The results obtained show that the configuration of the spatial relations between the variables included in the model favours its explanatory power, owing to which GWR is much more suitable than OLS, even when a system as complex as the distribution of accommodation establishments is analysed. Likewise, it is noteworthy that the distribution of accommodation does not always follow the guidelines marked by demand; far from it, it appears that in some areas it is of a random nature.
J.-I.R.-G., and J.-L.G.-G.; writing (review and editing): J.-M.S.-M. and J.-L.G.-G.; supervision: J.-M.S.-M. and J.-L.G.-G.; project administration: J.-M.S.-M.; funding acquisition: J.-M.S.-M.
All authors have read and agreed to the published version of the manuscript.


Introduction
Tourism has developed greatly in many parts of the world. It has become an essential industry in the achieving of the socioeconomic development of a multitude of places [1]. It is for this reason that numerous studies analyse the activity from various perspectives. These include those studying the economy of tourism in various countries [2][3][4], those which concentrate on the analysis of the tourism satellite account [5][6][7], or those which analyse specific products and aspects such as sun and beach tourism [8], rural tourism [9], and cultural tourism [10].
There is a considerable diversity of themes analysed from the point of view of tourism, although all need data to implement them [11]. Until recent years, the information was concentrated on economic, importance [52]. Many studies carried out decades ago [53] stressed the role of spatial analysis and remain valid today [54]. In consequence, both geography and territory science give tourism a different meaning, complementing it and providing it with a key tool: the geographical information system. With this contribution, together with progress in hardware and software, we have an efficient instrument for performing a tourist analysis based on territorial support itself, albeit without renouncing the quantitative analysis.
The need for including territory in tourist analyses arises as a consequence of the fact that tourist resources, accommodation, and complementary services, together with the tourists themselves, travel movements, etc., can be represented on the territory [55]. If this is the case, an obvious circumstance which is not questioned by the literature, it would be logical to think that it is necessary to adapt analytical techniques to this situation. It is likewise necessary to mention the importance of the criterion of proximity in tourism, as nearby spaces affect and are modified by this activity, influencing and altering strictly statistical models [56,57].
Studies applying geostatistical techniques to facilitate an understanding of tourism in its most varied facets have become increasingly relevant. In this sense, researchers rely on analysing patterns or mapping clusters according to the formulations of Getis-Ord [58,59] or Moran [60,61]. Despite the multiple applications of both groups of geostatistical techniques, when spatial relationships are to be modelled, it is necessary to resort to geographically weighted regression [62].
The literature consulted confirmed that the application of geostatistical techniques is increasing in tourism studies. It is thus clear that the importance of the territory is vital if the results are to be interpreted correctly. In order to do so, the aim of the paper is to apply and compare the results obtained by means of ordinary least squares regression (OLS) and geographically weighted regression (GWR) in the distribution of the supply of rural accommodation in Extremadura, Spain.
If on a technical level there are numerous options for analysing tourism, significant problems also arise when specific aspects of it are studied. In the case chosen as a subject of study, rural accommodation establishments in a specific territorial context, doubts always arise as to their appropriate location [63]. At the same time, we have tried to determine their optimum location [51,55]. Despite the considerable efforts which have been made, only in a few cases have satisfactory explanations been obtained for the location of rural accommodation establishments. This is because their location does not always follow a logical coherence which would imply resorting to areas with the most favourable characteristics or which are adapted to the preferences of the demand. Indeed, at times, their distribution appears to be random [46], as they are not always located in the areas with the greatest capacity for attracting tourists. These facts would appear to be corroborated by observation of the preferences of the demand for this type of establishment and the location of both the accommodation and tourist attractions. This imbalance may constitute a major hindrance for establishments and in consequence may affect their profitability.
The result of the circumstances mentioned is the need for detecting the main factors which attract tourists who stay at rural accommodation establishments and comparing them with the distribution of the lodging capacity they offer. In order to do so, various techniques are used, both statistical and geostatistical, which if appropriate allow the verification of the most significant models or guidelines for the implementation of rural accommodation establishments.
The study takes as a starting point the following hypotheses: (1) the distribution of rural accommodation establishments is not always due to criteria of suitability, as the implementation of an establishment depends on human decisions which are at times variable and random; (2) it is necessary to delimit correctly the independent variables which may explain the location of the establishments, and to do this it is necessary to find out the opinion of tourists; (3) as the territorial component is not taken into account, OLS obtains worse results than GWR in the general regression model; (4) the local regression models obtained by GWR reveal their suitability by allowing a differentiated equation to be obtained for each entity analysed; (5) it is possible to detect outliers in the territory which must be studied specifically; (6) predictive calculations of the lodging capacity of rural accommodation may serve as a basis for planning action when this is necessary, as there will not always be a balance between actual values and those calculated by the models built.
Sustainability 2020, 12, 4737

Study Area and Procedure
The study area chosen is Extremadura, Spain, an eminently rural region with a surface area of 41,634 km². This territory contains numerous tourist resources of a cultural and natural character which serve to attract tourists. Likewise, it has a considerable lodging capacity if its nature as an inland destination is taken into account. Despite this, the distribution of accommodation establishments in the region is unequal, although it does follow certain guidelines depending on the type of accommodation [41,46,64,65] (Figure 1).

In 2019, a total of 1768 accommodation establishments were registered with a maximum capacity of 42,214 beds (45,957 if extra beds are included), which have a variable distribution depending on the type of accommodation [66] (Table 1).

Table 1
Type of accommodation    Establishments    Beds      Beds including extra beds
Hotels                   433               18,373    19,994
Rural accommodation      862               8933      11,055
Tourist apartment        396               3015      3015
Hostels                  45                1827      1827
Campsites                32                10,066    10,066
Total                    1768              42,214    45,957

In the specific case of the accommodation establishments considered as representing rural tourism, according to Decree 65/2015 which classifies their type [67], there has been a disproportionate increase in recent decades [63]. This increase, which has been uncoordinated and poorly planned, has given rise to a clear imbalance between supply and demand. This has meant that during the recent economic crisis, some establishments have closed down [65]. It is evident that the reason for this needs to be understood. According to the literature focussing on the study area, apart from the economic situation, the most frequently heard explanation is the location of these establishments in areas lacking the attractions preferred by the demand [41,56,63,65].
This type of accommodation is subjected to other strains. These include few travellers and few overnight stays in comparison with other surrounding areas, low occupation levels, and shorter average stays [68] (Table 2).

Materials
The materials used for carrying out this research are of two types: alphanumeric and cartographical. Both have been implemented on a GIS, which has allowed the application of the techniques necessary for demonstrating the initial hypotheses.
The cartography used to perform the analysis is that of the National Geographic Institute (Instituto Geográfico Nacional, IGN) [69], which operates under the Creative Commons CC-BY 4.0 International licence. The scale of reference is 1:100,000 and the resolution is 20 m, which is sufficient for the type of analysis required by the research. It gives a vision of the territory which aims to describe the appearance and the details of the surface area together with the geographical objects found on it, whether these are natural or a product of human activity. It allows geographical and alphanumerical queries to be made. Its class structure of geographical objects is based on the phenomena that can be represented at the scale mentioned with simple geometries (point, line, and area) [70].
The alphanumeric information comes from various sources. That referring to tourist accommodation establishments comes from the Register of Tourist Companies of Extremadura [66] and is updated to 31 December 2019. This alphanumeric database has been georeferenced for its subsequent implementation in a geographical information system.
On the other hand, the information on the declared preferences of rural tourists regarding tourist resources comes from a direct survey. It contains data which allow the extraction of their sociodemographic profiles, the type of accommodation where they stay, the variety of tourism they practise, and naturally the elements they value in choosing their destination (Table 3). This survey was completed by 710 tourists in 2015 and was repeated by 140 in 2019 with the aim of contrasting the changes. As there are no significant differences between both years, neither in the preferences of tourists nor in the sociodemographic aspects, the entire sample was used (850 surveys). This is a reliable survey, since with 95% confidence the sampling error is 3.36%. From it, the tourist resources preferred by those staying in the study area have been discovered.
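The reported sampling error can be reproduced with a short calculation. The sketch below rests on assumptions that the text does not state explicitly: simple random sampling, an effectively infinite population, maximum variance (p = 0.5), and z = 1.96 for 95% confidence. The function name is illustrative only.

```python
import math

def sampling_error(n, z=1.96, p=0.5):
    """Margin of error for a proportion estimated from a simple random sample of size n.

    Assumes an infinite population; p = 0.5 gives the most conservative (largest) error.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Combined sample used in the study: 710 surveys (2015) + 140 surveys (2019)
error = sampling_error(710 + 140)
print(f"{100 * error:.2f}%")  # 3.36%
```

Under these assumptions the result agrees with the 3.36% quoted for the 850 surveys.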

Research Procedure
Among the resources finally selected to explain the distribution of rural accommodation establishments, the criterion of distance was taken into account. To calculate it, we resorted on occasion to network analyst tools and at other times to the Euclidean distance taken from the resource to the capital of the municipality (Table 4).
Owing to the problems faced by rural accommodation establishments, the need for finding out whether they are located in optimum areas by establishing regression models is put forward. At a methodological level, the process followed is summarised in 5 stages (Figure 2).
The first stage consisted of the obtaining of information on the supply of accommodation, tourist resources, the territory analysed, and tourists. With these data, a database was designed which was implemented in a GIS in the second stage. This software, together with the statistical analysis of the database, allowed us to obtain the tourist potential of each of the municipalities analysed. The tourism potential is obtained considering the tourism resources that tourists prefer. The third stage consisted of determining the independent variables which explain the location of rural accommodation establishments and providing two differentiated models. The first of them is obtained from ordinary least squares regression (OLS) and the second by means of geographically weighted regression (GWR) in which the criterion of proximity is considered. The fourth stage concentrates on the analysis of the results obtained, while in the fifth, both models are discussed.
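The Euclidean distance criterion mentioned at the start of this section can be illustrated as follows. This is a minimal sketch, not the GIS workflow actually used: the coordinates are hypothetical projected values in metres, and taking the nearest of several resources is an assumption made for the example.

```python
import math

def nearest_resource_distance(capital, resources):
    """Shortest straight-line distance from a municipal capital to any listed resource."""
    return min(math.dist(capital, resource) for resource in resources)

# Hypothetical projected coordinates (metres); not taken from the study data
capital = (250_000.0, 4_400_000.0)
bathing_areas = [(253_000.0, 4_404_000.0), (270_000.0, 4_380_000.0)]

distance_m = nearest_resource_distance(capital, bathing_areas)
print(distance_m / 1000)  # distance in kilometres
```

Network distances, which the study also uses, would require a road graph and a routing tool and are not covered by this sketch.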

Exploratory Regression (OLS)
Tourist activities have a privileged position in the geographical space in which they are carried out. In a large proportion of cases, they are conditioned by a series of factors with a certain spatial continuity. The tourist resources present in the territory play their role of attraction in the centre where they are to be found, although they also benefit nearby areas; from this, it can be deduced that proximity may have a direct effect on the configuration of the tourist space [52].
It is likewise evident that resources may be superimposed in the territory analysed, which may give rise to an incorrect interpretation of the existing relationships. After assessing the advantages and disadvantages of various techniques which allow the detecting of strong correlations between the predicting variables, we decided to carry out exploratory regressions. By this, a set of models on which to decide is achieved.
In order to carry out these exploratory regressions, we decided to take as regressive variables the 12 which may influence the location of rural accommodation establishments (Table 4) and as a dependent variable the lodging capacity in rural accommodation. Both types of variables were allocated to each of the 388 population centres located in the area of analysis.
The application of the exploratory regressions was dependent on compliance with the following requirements:
• Variance inflation factor (VIF) ≤3. This value reflects how much redundancy (multicollinearity) among the model explanatory variables can be tolerated. When the VIF value is higher than about 7.5, multicollinearity may make a model unstable; consequently, 7.5 is the default value here;
• Coefficient p-value <0.05. With this value, passing models contain only explanatory variables whose coefficients are statistically significant at the 95 percent confidence level (p-values smaller than 0.05).
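The VIF requirement above can be made concrete with a small sketch. It uses the textbook definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing explanatory variable j on the remaining ones; whether the GIS tool computes it in exactly this way is an assumption. The data and the helper names (`solve`, `r_squared`, `vif`) are hypothetical and chosen so that two columns are nearly collinear and a third is unrelated to both.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(X):
    """Variance inflation factor of every column of X."""
    factors = []
    for j in range(len(X[0])):
        target = [row[j] for row in X]
        others = [[v for c, v in enumerate(row) if c != j] for row in X]
        factors.append(1.0 / (1.0 - r_squared(others, target)))
    return factors

# Hypothetical data: column 1 is roughly twice column 0, column 2 is unrelated to both
X = [[1.0, 2.2, 8.0], [2.0, 3.8, 8.0], [3.0, 5.8, 2.0], [4.0, 8.2, 2.0],
     [5.0, 10.2, 2.0], [6.0, 11.8, 2.0], [7.0, 13.8, 8.0], [8.0, 16.2, 8.0]]
vifs = vif(X)
flagged = [j for j, value in enumerate(vifs) if value > 7.5]  # indices above the 7.5 cutoff
print(vifs, flagged)
```

In this constructed example the two nearly collinear columns exceed the 7.5 threshold by a wide margin, while the unrelated column has a VIF of about 1.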
The viable models obtained are analysed to select that with the best requirements for the application of OLS and GWR regressions.
The basis of exploratory regression is ordinary least squares regression. This is used frequently in tourism studies either as a single technique or together with others [71,72]. Its main task consists of predicting the values attained by the dependent variable in accordance with the regressive variables which make up the model.
There tends to be considerable discussion on the final configuration of this model as to the possible problems of multicollinearity [73,74]. Nevertheless, more and more references are appearing which stress the important role played by the residuals in the explanation of the model [75]. Some authors even recommend checking whether homoskedasticity or heteroskedasticity exists to validate the model, starting from the covariance matrix [76].
Although this technique is frequently used, the possible problems that may derive from the choice of an erroneous model should always be considered. The most frequent among them are perhaps the omission of explanatory variables, the existence of nonlinear relations, the presence of outliers, multicollinearity, and non-constant variance in the residuals. Despite this, almost all these disadvantages tend to have different solutions, among which the revision of the model stands out.
For the purpose of this article, ordinary least squares regression is carried out by means of the analysis module integrated in the spatial statistics tools included in the ArcGIS v.10.5 software.
The ordinary least squares method uses the following Equation (1) [77]:
$$y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon \tag{1}$$
in which: $y_i$ = the value observed for the dependent variable at point i; $\beta_0$ = the y-intercept (constant value); $\beta_n$ = the regression coefficient or slope of the explanatory variable n at point i; $x_n$ = the value of the variable n at point i; $\varepsilon$ = the error of the regression equation.
For its application the following requirements must be complied with [77,78]:
- The model must be linear: this can be analysed by scatter plots;
- The data used must not depend on any external factor;
- The explanatory variables must not be related to each other;
- The explanatory variables must have an insignificant error when measured;
- The residuals must sum to 0;
- The residuals must have a homogeneous variance and follow a normal distribution.
The definitive ordinary least squares regression model was subject to different tests to confirm its reliability. Among them, the F-statistic, Wald, Koenker, and Jarque-Bera statistics tests stand out.
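Of the tests named above, the Jarque-Bera statistic is the simplest to sketch, since it depends only on the skewness and kurtosis of the residuals. The code below uses the standard textbook form JB = n/6 · (S² + (K − 3)²/4) with an asymptotic chi-square p-value (2 degrees of freedom); it is an illustration, not the GIS implementation, and both residual vectors are hypothetical.

```python
import math

def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its asymptotic p-value for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    m2 = sum(c ** 2 for c in centred) / n   # second central moment (variance)
    m3 = sum(c ** 3 for c in centred) / n   # third central moment
    m4 = sum(c ** 4 for c in centred) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of a chi-square with 2 degrees of freedom
    return jb, p_value

# Hypothetical residuals: one symmetric set and one dominated by a single large outlier
jb_sym, p_sym = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
jb_skew, p_skew = jarque_bera([-1.0] * 9 + [9.0])
print(p_sym > 0.05, p_skew < 0.05)
```

A p-value below 0.05 indicates that the residuals depart from normality. The chi-square approximation is only asymptotic, so results for very small samples such as these are purely illustrative.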

Geographically Weighted Regression (GWR)
Ordinary least squares regression does not include the criterion of proximity in its formulation, which may compromise the models obtained in specific systems such as that of tourism. In a system as complex as the latter, proximity analysis is indispensable for explaining certain facts. The presence of certain tourist resources thus favours not only the areas in which they are located but also those nearby, which shows the importance of proximity [79]. The need for including distance as a key parameter has been demonstrated by numerous publications which use geostatistical analyses [41,58,80-82].
In this context, geographically weighted regression emerges as a viable alternative for building models in which spatial relationships are analysed. This variety of regression allows the construction of a model by means of explanatory variables in the same way as OLS. However, it goes further, as it allows specifying whether the kernel is constructed with a fixed distance or whether it is allowed to vary in extent as a function of feature density [77]. Likewise, it allows specifying how the extent of the kernel is determined, whether by means of the corrected Akaike's information criterion (AICc), cross validation (CV), or a bandwidth parameter.
This regression builds a different equation for each entity by incorporating the dependent and explanatory variables of the entities that fall within the bandwidth of each target entity.
The following Equation (2) should be used:
$$y_i = \beta_{i0} + \sum_{k=1}^{m} \beta_{ik} x_{ik} + \varepsilon_i \tag{2}$$
in which $y_i$ is the dependent variable at location i; $x_{ik}$ is the kth independent variable at location i; m is the number of independent variables; $\beta_{i0}$ is the intercept parameter at location i; $\beta_{ik}$ is the local regression coefficient for the kth independent variable at location i; and $\varepsilon_i$ is the random error at location i [48] (p. 2). GWR allows coefficients to vary continuously over the study area, and a set of coefficients can be estimated at any location, typically on a grid so that a coefficient surface can be visualised and interrogated for relationship heterogeneity. GWR makes a point-wise calibration concerning a "bump of influence" around each regression point where nearer observations have more influence on estimating the local set of coefficients than observations farther away [47]. GWR measures the inherent relationships around each regression point i, where each set of regression coefficients is estimated by weighted least squares.
To calculate it, the following Equation (3) should be used:
$$\hat{\beta}_i = (X^{T} W_i X)^{-1} X^{T} W_i y \tag{3}$$
in which $X$ is the matrix of the independent variables with a column of 1s for the intercept; $y$ is the dependent variable vector; $\hat{\beta}_i = (\beta_{i0}, \ldots, \beta_{im})^{T}$ is the vector of m + 1 local regression coefficients; and $W_i$ is the diagonal matrix denoting the geographical weighting of each observed data for regression point i [48] (pp. 2-3).
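The weighted least squares estimate of Equation (3) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the GIS implementation: it uses a Gaussian kernel with a fixed bandwidth (the text does not specify the kernel formula), a single explanatory variable, and hypothetical data in which the slope is 1 in a western cluster and 3 in an eastern cluster. The helper names are invented for the example.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def gwr_coefficients(coords, X, y, bandwidth):
    """Estimate one coefficient vector per location: (X' W_i X)^-1 X' W_i y."""
    rows = [[1.0] + list(r) for r in X]  # column of 1s for the intercept
    k = len(rows[0])
    betas = []
    for ci in coords:
        # Diagonal of W_i: Gaussian kernel (assumed), nearer observations weigh more
        w = [math.exp(-0.5 * (math.dist(ci, cj) / bandwidth) ** 2) for cj in coords]
        xtwx = [[sum(wj * r[p] * r[q] for wj, r in zip(w, rows)) for q in range(k)]
                for p in range(k)]
        xtwy = [sum(wj * r[p] * yj for wj, r, yj in zip(w, rows, y)) for p in range(k)]
        betas.append(solve(xtwx, xtwy))
    return betas

# Hypothetical data: y = 1 + 1x in the west (x-coordinates 0..3), y = 1 + 3x in the east (10..13)
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (13.0, 0.0)]
X = [[1.0], [2.0], [3.0], [4.0], [1.0], [2.0], [3.0], [4.0]]
y = [2.0, 3.0, 4.0, 5.0, 4.0, 7.0, 10.0, 13.0]

local = gwr_coefficients(coords, X, y, bandwidth=2.0)
# With a very large bandwidth all weights approach 1 and the estimate equals global OLS
global_fit = gwr_coefficients(coords, X, y, bandwidth=1e9)[0]
print(local[0], local[-1], global_fit)
```

With the narrow bandwidth the local slopes recover the two regimes, whereas the global fit averages them into a single slope of 2, which is the kind of spatial heterogeneity that OLS cannot represent.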

Basis of the Regression Model: Activities Carried Out by Rural Tourists
The items used to determine the activities carried out by tourists staying in a rural accommodation establishment indicate which aspects are most attractive to them (Table 5). A detailed analysis shows that they carry out activities related to generic rural tourism, although they also make cultural visits. They declare that they practise tourism in rivers, gorges, or reservoirs and gastronomy. Finally, albeit less significantly, they also declare that they watch birds and visit mines or geological formations or practise sport. Meanwhile, the remainder of activities go unnoticed. If the activities preferred by tourists staying in rural accommodation establishments are considered, it is possible to determine the factors which most attract them. From this, it can be deduced that it is a good way of understanding which should be the parameters used to analyse the implementation of rural accommodation establishments. These parameters include some which are highly characteristic of certain territories, although there are others which can be enjoyed over a large part of Extremadura, as is the case with gastronomy. The most noteworthy are:

• Cultural heritage. For this reason, proximity to historical ensembles has been chosen as a regressive variable; these ensembles also include World Heritage cities located outside the rural context;
• Proximity to water resources. As a consequence of their importance, bathing areas and reservoirs have been selected as factors justifying the presence of a rural accommodation establishment. We have also included the distances to two protected landscapes (the Garganta de los Infiernos Nature Reserve and the Tajo-Internacional Nature Reserve). The former is a well-known bathing area and the latter offers river cruises which take tourists even as far as Portugal;
• Birdwatching. In this sense, we have included the distance to Monfragüe National Park, the main centre of attraction for ornithological tourism;
• Visiting mines, caves, or geological formations. Owing to its importance, the distance to the Villuercas-Ibores-Jara Geopark has been considered.
The replies obtained in each of the types of resources analysed justify the fact that the main attractions sought by tourists staying at rural establishments act as regressive variables. It is logical to think that the areas containing a large proportion of these attractions are the most appropriate for maintaining this type of accommodation and will have more advantages if they are near a highway as a consequence of their better access.
The tourists themselves define a logical model for the distribution of rural accommodation establishments, which is at variance with the actual situation (Figure 3a,b). It can be observed that there is a natural preference for certain areas although the offer of these accommodation establishments follows a markedly different pattern [63]. It can also be appreciated that there are certain differences depending on the population living in each municipality, especially if this is compared with the distribution of other types of accommodation [64].
The distribution of rural accommodation and the presence of the resources which support the activities carried out by tourists closely coincide in certain areas. In this sense, there is an important relationship between mountain areas and the presence of bathing areas; the latter represent the highest concentrations of the lodging capacity in rural accommodation establishments. This situation is understandable if it is taken into account that summer is the time of the year when most rural tourists are received [64,65]. It can likewise be observed that in the proximity of the sierras, numerous beds in rural accommodation establishments are also concentrated, although in many cases they do not coincide with areas prepared for bathing with the subsequent loss of attraction for summer tourism. Moreover, there are also accommodation establishments in areas lacking the main attractions sought by the demand (Table 6).
This highly inconsistent situation gives rise to three major groups of rural accommodation establishments. The first of these, located in mountain areas and with the presence of bathing areas, has the best competitive advantages for attracting tourists. The second also has a certain capacity of attraction, although it is not so competitive during the long summer period characteristic of Extremadura, where temperatures often exceed 35 °C and on occasion reach 40 °C [83]. Finally, the third, which is unequally distributed over the territory, is that near Special Bird Protection Areas (Zonas de Especial Protección para Aves, ZEPAs). This distribution, which is more or less satisfactory, is faithfully reflected in its capacity for capturing tourists according to the time of the year. This is corroborated by the fact that the areas located in the north of Extremadura, a mountain region with natural bathing areas, attract a large number of tourists and overnight stays during the summer. Meanwhile, other areas have greater potential during the rest of the year [84]. It is obvious that the essential attractions sought by the demand are mountain or sierra areas, bathing areas, and some natural spaces, although one should always stress the enormous territorial variability regarding the lodging capacity. For this reason, and since there are clear determinants which explain the presence of rural accommodation establishments in specific areas, it would be viable, at least in theory, to model their distribution on the presence or proximity of specific factors. Nevertheless, certain interdependencies can be observed among some of these factors, which means that it is advisable to improve these models by using specific techniques which make it possible to eliminate the effects of collinearity.

Exploratory Regression Model
Taking into account the contrasts between the distribution of accommodation establishments and the presence of the main factors mentioned by the demand staying in rural establishments, we chose to carry out an exploratory regression. Its main advantage lies in the fact that it includes all possible combinations of the regressive variables which attempt to explain the number of rural accommodation beds in each population centre.
This type of regression also allows complementary criteria, in addition to the R-squared value, to be used when selecting models.
After its application, a considerable variety of models were found, of which three are worthy of note. The first takes into account all the variables which may act as regressors, while the second has been improved, taking into account exclusively the variables complying with some reliability requirements. In contrast, the third model uses as regressive variables the factors which can be deduced from the activities carried out by the type of tourist analysed (Table 7).
To obtain the exploratory regression models, slightly different criteria were used (Table 8). In the first place, a minimum adjusted R-squared value exceeding 0.3 was required; this value was not attained in any of the 4095 combinations analysed. Likewise, a p-value of <0.05 was required; this condition was complied with in 399 cases. A VIF value of <3.00 was also required; this condition was complied with by 1695 models. On the contrary, no possible combination appeared if a p-value (JB) of >0.05 or a spatial autocorrelation (SA) of >0.20 was considered. In contrast, the second model saw the adjusted R-squared threshold lowered to 0.2, which was exceeded by two models. Meanwhile, a p-value of <0.05 was provided by 49 models, and a VIF value of <3.00 obtained 63 viable results. Nevertheless, although greater permissiveness was allowed with a p-value (JB) or a spatial autocorrelation (SA) of >0.1, no viable model was achieved. Finally, the third model presents 511 tests. A total of 17.81% of these comply with the minimum adjusted R-squared value considered. In turn, they also comply with the p-value and VIF value criteria in between 29.75% and 43.64% of cases. However, the robustness of the models is compromised if the p-value (JB) or the spatial autocorrelation (SA) criterion is applied.
These first results show how difficult it is to explain the distribution of the beds available in rural accommodation establishments in a logical manner. It should also be pointed out that these beds are territorially concentrated in some places, which undoubtedly contributes to a similar concentration in the residuals obtained with these regression tests [55].
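The number of combinations reported above follows from the way exploratory regression enumerates every non-empty subset of the candidate variables. The sketch below only counts and lists such subsets; it does not fit any model. The variable labels are placeholders, and reading the 511 tests of the third model as nine candidate variables is an inference from the arithmetic (2⁹ − 1), not something stated in the text.

```python
from itertools import combinations

def candidate_models(variables):
    """Yield every non-empty combination of the candidate explanatory variables."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)

# Placeholder labels standing in for the 12 candidate variables of Table 4
twelve = [f"x{i}" for i in range(1, 13)]
nine = twelve[:9]  # assumed size of the third model's candidate set

print(sum(1 for _ in candidate_models(twelve)))  # 4095
print(sum(1 for _ in candidate_models(nine)))    # 511
```

Each of these combinations would then be fitted by OLS and screened against the adjusted R-squared, p-value, VIF, Jarque-Bera, and spatial autocorrelation thresholds described above.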
The significance of the variables considered in each model certifies the consistency of each of them (Table 9). It can thus be observed that while Model 1 attaches much importance to most variables, Model 2 distinguishes them better. In this sense, it is revealed that the relief (mountains and sierras) and the water resources (bathing areas and reservoirs) reach significant values; cultural attractions and the distance from highways are also noteworthy. Model 3 shows similarities with Model 1 in the variables which have been used jointly.
This analysis demonstrates that Model 2 is more accurate as it selects fewer variables, but all of them are highly significant from a statistical point of view. Moreover, they agree with the main attractions which tourists emphasise. This same model reflects the importance of being near mountain and bathing areas, as their negative significance is 100%, which is somewhat similar to the case of historical ensembles. This is in addition to the fact that there is a positive relationship with sierras, reservoirs, and the distance from highways. This initial analysis reflects the possibility of finding two well differentiated types of areas. On the one hand, we have mountain areas clearly linked to bathing areas, and on the other, areas far from these spaces containing reservoirs and protected areas.
Despite the interest of the analysis of statistical significance, it cannot be ignored that multicollinearity is one of the most serious problems that may arise in the application of multiple regression. For this reason, its analysis is essential if we are to determine the goodness of the regression models constructed (Table 10). It is noteworthy that Model 1 has three variables with unacceptable VIF values, which explains their participation in numerous models with multicollinearity (violations). In contrast, Model 2 represents precisely the opposite, as all the variables included have very low VIF values, especially when it is taken into account that the maximum limit established is 7.5. Clearly, having low VIF values means that no violation by collinearity occurs. For its part, Model 3 is not robust, as there are three variables with a VIF much higher than the value statistically permitted (7.5).
The assessment of the models obtained indicates that Model 2 has the best conditions for explaining the presence of the supply of rural accommodation (Table 11). Indeed, the most viable model constructed consists of the main attractions required by the rural tourist. Nevertheless, it should be taken into account that this model attains an R² of only 0.21, although it is also in keeping with the SA.
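The VIF check discussed above can be sketched as follows. The variance inflation factor of regressor j is 1 / (1 - R_j²), where R_j² comes from regressing that variable on all the others. The data are synthetic and the function name `vif` is illustrative; only the limit of 7.5 is taken from the text.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X: 1 / (1 - R_j^2)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta = np.linalg.lstsq(A, X[:, j], rcond=None)[0]
        resid = X[:, j] - A @ beta
        ss_res = float(resid @ resid)
        ss_tot = float(((X[:, j] - X[:, j].mean()) ** 2).sum())
        out[j] = ss_tot / ss_res  # algebraically equal to 1 / (1 - R_j^2)
    return out


# Synthetic example: column c is almost a copy of column a.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
X = np.column_stack([a, b, c])

values = vif(X)
flagged = values > 7.5  # limit quoted in the text
print(flagged)
```

Here the two nearly redundant columns are flagged while the independent one is not, which mirrors the contrast drawn between Models 1 and 3 on the one hand and Model 2 on the other.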
To this, what must also be added is the fact that the variables selected by the model itself are significant at a level of 0.01. From this, it can be deduced that it is unlikely that the result is due to chance.
The above analyses show that there is a model which can provide an explanation according to the activities carried out by tourists, although it provides a reduced explanation of the dependent variable. For this reason, it is difficult to find a logical explanation for the distribution of the supply of rural accommodation.

Ordinary Least Squares Regression (OLS)
The results given by OLS for Model 2 include interesting data, because the VIF takes a low value for all the regressive variables. This shows that there is no collinearity between the variables. Likewise, it confirms that the coefficients obtained by each of these variables contribute to the explanation of the model, although three stand out with negative signs (historical ensemble, bathing area, and mountain) (Table 12). This means that the centres nearer these attractions have a higher number of beds, which is in keeping with the activities carried out by the tourists staying in rural accommodation establishments, at least during the summer. The model is completed with two positive coefficients (reservoir and highway), from which it can be deduced that the further a population centre is from highways and reservoirs, the greater its capacity for rural accommodation. All this is faithfully reflected in the current situation, as shown in Figure 3. Moreover, both the probability and the robust probability are significant, from which it can be deduced that the possibility of each coefficient being essentially zero is low.
Although the regressive variables correspond to the attractions valued by tourists who stay in rural accommodation establishments, a full diagnosis of the model must be carried out (Table 13). It is from this analysis that the first signs of complexity arise, since an explanation can only be found for 21% of the distribution of beds in rural accommodation establishments. Moreover, the Koenker and Wald statistics are significant. Owing to this, OLS reflects that the lodging capacity is not strongly linked to the presence of the essential attractions for rural demand. This may mean that the location of these establishments does not follow a logical criterion but rather other factors which have little to do with the attraction capacity and the tourist potential of Extremadura.
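To make the Koenker diagnostic more concrete, the sketch below computes it in its usual form as the studentised Breusch-Pagan statistic: n times the R² of an auxiliary regression of the squared OLS residuals on the regressors. This is an assumption about how the statistic in Table 13 was obtained; the data are synthetic and the function names are hypothetical.

```python
import numpy as np


def ols_residuals(X, y):
    """Residuals of an OLS fit of y on X (an intercept column is added)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    return y - A @ beta


def koenker_bp(X, y):
    """Studentised Breusch-Pagan statistic: n * R^2 of e^2 regressed on X."""
    e2 = ols_residuals(X, y) ** 2
    aux = ols_residuals(X, e2)
    r2 = 1.0 - float(aux @ aux) / float(((e2 - e2.mean()) ** 2).sum())
    return len(y) * r2


# Synthetic data: same regressor, two different error structures.
rng = np.random.default_rng(2)
x = rng.uniform(1.0, 5.0, size=1000)
noise = rng.normal(size=1000)
X = x[:, None]

stat_constant = koenker_bp(X, 2.0 * x + noise)     # constant error variance
stat_growing = koenker_bp(X, 2.0 * x + noise * x)  # error spread grows with x
print(stat_constant, stat_growing)
```

Under the null hypothesis the statistic follows a chi-squared distribution with as many degrees of freedom as regressors, so with one regressor values above roughly 3.84 are significant at the 0.05 level. A significant value indicates that the relationship is not constant across the data, which is one motivation for moving to a local model.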
Both the low level of explanation of the model and the negative assessments of the main tests used are a clear sign that there is no obvious trend to explain the distribution of the beds of accommodation establishments. This peculiarity is due to the fact that setting up an establishment of this type is a personal decision that does not always follow a logical criterion.
The analysis of the standardised residuals reflects a Gaussian distribution of frequencies with minimal positive skew (Figure 5a). This is corroborated by the spatial autocorrelation index obtained by means of Moran's I (Figure 5b), whose value of 0.063515 certifies a random distribution of the residuals. Moreover, their territorial distribution reveals no concentrations that might lead us to suspect the omission of a specific regressor (Figure 5c).
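As a reminder of what the Moran's I index measures, the following sketch computes it for a toy set of four locations on a line. The weights matrix and the values are invented for illustration and are unrelated to the residuals of the study.

```python
import numpy as np


def morans_i(values, W):
    """Global Moran's I for a vector of values and a spatial weights matrix W."""
    z = values - values.mean()
    s0 = W.sum()
    return (len(values) / s0) * float(z @ W @ z) / float(z @ z)


# Four locations on a line; each is a neighbour of the adjacent ones.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0

trend = morans_i(np.array([1.0, 2.0, 3.0, 4.0]), W)          # 1/3: similar values cluster
alternating = morans_i(np.array([1.0, -1.0, 1.0, -1.0]), W)  # -1: neighbours are opposite
print(trend, alternating)
```

Values near zero, such as the 0.063515 reported for the OLS residuals, indicate neither clustering nor dispersion.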

Geographically Weighted Regression
The application of OLS has shown that it is difficult to obtain a satisfactory global regression model. This is essentially due to the unsatisfactory explanation of the dependent variable and the failure of some of the indexes that check the suitability of the set of regressive variables. Other parameters, however, suggest that the model may be applicable, although they are ambiguous. For this reason, the question is whether any changes occur when the conceptualisation of the territorial aspect goes beyond its mere geographical representation. The need thus arises to contrast this model by means of GWR, with the essential objective of revealing whether any other territorial pattern exists which has escaped analysis.
The model applied by GWR is conceptualised in a similar way, so that the results can be contrasted and it can be confirmed which technique is best fitted to the territorial distribution of rural establishments. The decision was taken to use the same regressive variables and, naturally, to maintain the lodging capacity in rural accommodation establishments as the dependent variable. Nevertheless, this type of regression allows the addition of a variable which acts as a spatial weighting for each individual entity. For this reason, the number of rural accommodation establishments has been included as a calibration factor of the model.

From then on, different combinations were applied to select the kernel of the function and the most suitable method for calculating its extent. For the kernel, it was decided to use both fixed and adaptive types. The former covers a fixed distance (metres), while the latter considers a variable distance depending on a specific number of neighbours. In both cases, the extent is calculated automatically when the corrected Akaike information criterion (AICc) or cross validation (CV) is used, and is selected by the authors when the bandwidth parameter is used.
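The difference between the two kernel types can be sketched as follows. A Gaussian weighting function is assumed here for both, since the text does not name the function used; the coordinates, the bandwidth and the function names are hypothetical.

```python
import numpy as np


def gaussian_weights(dist, bandwidth):
    """Gaussian distance decay: weight 1 at distance 0, falling with distance."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)


def fixed_kernel(coords, i, bandwidth):
    """Weights around point i using the same bandwidth (metres) everywhere."""
    dist = np.linalg.norm(coords - coords[i], axis=1)
    return gaussian_weights(dist, bandwidth)


def adaptive_kernel(coords, i, n_neighbours):
    """Weights around point i with the bandwidth set by its k-th nearest neighbour."""
    dist = np.linalg.norm(coords - coords[i], axis=1)
    bandwidth = np.sort(dist)[n_neighbours]  # index 0 is the point itself
    return gaussian_weights(dist, bandwidth)


# Hypothetical projected coordinates in metres: three close centres, one remote.
coords = np.array([[0.0, 0.0], [1000.0, 0.0], [2000.0, 0.0], [50000.0, 0.0]])

w_fixed = fixed_kernel(coords, 0, bandwidth=1000.0)
w_adaptive = adaptive_kernel(coords, 0, n_neighbours=2)
print(w_fixed)
print(w_adaptive)
```

With a fixed kernel, sparsely populated areas may contribute very few effective observations to a local regression, whereas an adaptive kernel widens the bandwidth there so that each local model draws on the same number of neighbours.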
A synthesis of the results obtained shows that, overall, they are better than those obtained by OLS (Table 14).
An analysis of the different methods used shows that AICc and CV perform better than the bandwidth parameter, with both fixed and adaptive kernels. This is due to the fact that the condition number does not exceed the critical value of 30. Even so, the most effective result is achieved by applying AICc with fixed kernels. In this case, the automatic distance calculation indicates that the models are optimised by using a bandwidth of 63.627 metres. It can likewise be observed that the difference between the R² and the adjusted R² is not very high and that the proportion of variance of the variable explained is reasonable if the complexity of the model is taken into account.
The territorial distribution of the local regression models reveals that the R² follows certain patterns (Figure 6a). In this sense, the best adjustments are obtained in the district of Alcántara and its vicinity, where they are over 40%, although there are other areas in which 30% is exceeded, such as the Villuercas-Ibores-Jara Geopark. In contrast, the adjustments in the north of the province of Cáceres and the south-centre of the province of Badajoz are lower than 20%.
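How a local model of this kind is calibrated can be illustrated with a minimal sketch: at every location a weighted least squares regression is fitted, with the weights given by a fixed Gaussian kernel. The data are synthetic, the bandwidth is arbitrary, and the condition number computed here (that of the weighted design matrix) is only a simple proxy for the diagnostic compared with the value of 30 in the text.

```python
import numpy as np


def gwr_fit(coords, X, y, bandwidth):
    """Fit one weighted least squares model per location (fixed Gaussian kernel)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    betas = np.empty((n, A.shape[1]))
    cond = np.empty(n)
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        sw = np.sqrt(w)
        Aw = A * sw[:, None]  # weighting rows by sqrt(w) gives the WLS solution
        betas[i] = np.linalg.lstsq(Aw, y * sw, rcond=None)[0]
        cond[i] = np.linalg.cond(Aw)  # proxy for the local collinearity diagnostic
    fitted = np.einsum("ij,ij->i", A, betas)
    return betas, fitted, cond


# Synthetic data: the effect of x on y grows from west to east.
rng = np.random.default_rng(3)
n = 200
coords = rng.uniform(0.0, 100.0, size=(n, 2))
x = rng.normal(size=n)
true_slope = 1.0 + coords[:, 0] / 50.0
y = true_slope * x + 0.1 * rng.normal(size=n)

betas, fitted, cond = gwr_fit(coords, x[:, None], y, bandwidth=20.0)

# Global OLS on the same data, for comparison.
A = np.column_stack([np.ones(n), x])
global_fit = A @ np.linalg.lstsq(A, y, rcond=None)[0]
sse_gwr = float(((y - fitted) ** 2).sum())
sse_ols = float(((y - global_fit) ** 2).sum())
print(sse_gwr < sse_ols)
```

Because each location receives its own coefficients, the quality of fit can also be mapped location by location, which is what the local R² values in Figure 6a represent.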
The territorial analysis of the residuals obtained by the model shows a negative imbalance in those areas with the highest number of beds (Figure 6b). In the north of the province of Cáceres, therefore, the model estimates a higher lodging capacity than that existing in a large proportion of the centres. This is due to the considerable potential for development which these areas still have as a consequence of the superimposition of the resources preferred by the demand from tourists. In contrast, in these areas there are also centres in which the model estimates a number of beds lower than the actual number; these coincide with the centres with smaller and more aged populations and with a very limited capacity for enterprise. In the same way, other areas which are very well delimited geographically can be observed in which there are centres with a considerable imbalance, both positive and negative, between the values calculated by the model and those actually found (Table 15).
The distribution of the values observed and those predicted by the model (Figure 7a,b) corroborates certain guidelines. On the one hand, it is clear that the lodging capacity of rural accommodation establishments is concentrated in very few areas. In them, there are population centres with many beds in this kind of accommodation intermingled with others in which the number of beds is more limited or even non-existent. On the other hand, when the distribution of the number of beds estimated by the regression model is analysed, it can be observed that they are concentrated in specific areas such as the north-east and other well-defined areas.
The comparison of the beds available in rural accommodation establishments with those calculated by the model reveals that the areas where the latter best operates maintain a certain balance between both variables (Figure 8). When the situation of other areas in which the operation of the model is less adequate is analysed, the results are similar. Nevertheless, these adjustments are acceptable when R² exceeds 0.3. In contrast, the opposite is true in those areas of Extremadura in which rural tourism is more developed, such as the northern district, as a consequence of having less satisfactory adjustments. This scenario is completed with the existence of numerous population centres which have no beds available, although they are allocated some by the model.
An analysis of the results obtained by means of GWR shows the complexity of trying to explain the distribution of rural accommodation establishments by tourist preferences, which reveals that their implementation has not followed an objective criterion. Far from it, in certain places, their location follows an almost random criterion which may perhaps be explained by other variables which have nothing to do with the tourism and business logic that should govern any tourist business.

Discussion
Regression models aim to explain the relationship between independent variables and the dependent variable, although in cases in which human decisions or behaviour intervene, it is always difficult to find strong causal relationships [75]. This is clear in the case analysed in which we aim to explain whether the location of rural accommodation establishments and consequently the presence of this type of supply follows logical criteria or depends on personal inconsistent decisions, as was found in some studies [64].
In order to obtain probable explanations which reveal the incidence of a set of explanatory variables on an explained variable, one generally resorts to the application of regressions of multiple types. For this reason, taking into account the literature analysed, we decided to use ordinary least squares regression, which is one of the techniques most frequently used by scientists [85][86][87][88], although, given its limitation when contextualising spatial relationships, geographically weighted regression is becoming more and more accepted in analyses of tourism [89,90]. The essential objective of using these two types of analysis is to determine which of the two best adapts to the creation of models explaining the location of the supply.
In both cases, it has been found that it is very difficult to achieve a valid model which complies with the statistical requirements necessary so that predictions can be made according to the same. In this sense, it is clear that GWR obtains better results than OLS (hypothesis 3), although both types of regression show that the distribution of rural accommodation establishments is not ideal; this means that they also refute other techniques such as grouping analysis [57], hot spot analysis, or cluster and outlier analysis [41].
The difficulty in defining a suitable model to explain the distribution of rural accommodation establishments is confirmed when the preferences of the demand for rural tourism in the area studied are taken into account. For this reason, it can be inferred that their location depends on a statistically complex environment in which decisions are made on many occasions, owing to a personal criterion supported by expansive tourist policies which sometimes lack rigour. These policies pursued the profusion of the supply to the detriment of other necessary options. We can thus understand declarations calling for the need to implement a supply of quality tourist products which are capable of making destinations more attractive. Among these products, the following stand out: dehesas as an essential resource to support currently non-existent agrotourism [91], bathing areas and reservoirs [92], protected areas [93], the historical heritage [94], and very specific segments such as hunting [95].
The demonstration that the distribution of the pool of rural accommodation establishments is not ideal, as there are areas of this supply which do not have the necessary attractions in the opinion of the demand, makes it clear that it is necessary to delimit satisfactorily the explanatory variables which can reliably define and delimit the location of this type of supply (hypothesis 1). In order to do so, it is advisable to use the criterion of the demand of the tourist type analysed. In this sense, carrying out surveys in which tourists state the activities they prefer means that the criteria established are ultimately much more realistic and effective than those of any other method. It should not be forgotten that when tourists indicate their preferences, they convert them into rigorous variables which would explain the presence of accommodation establishments (hypothesis 2). As a result, it is considered that the main attractions that must act as explanatory variables are, above all, mountain areas, bathing areas, and historical-artistic ensembles, locations which are also in areas far from highways and reservoirs. The simple superimposition of these locations on the supply of rural accommodation establishments shows that some of the locations in which they are available do not have this kind of attraction. This undoubtedly means that this location does not follow logical criteria, which explains why, in these areas lacking the tourist attractions required by tourists, the occupation level is appreciably lower than in other areas better adapted to the preferences of tourists. This fact is revealed by both techniques, although it is obvious that geographically weighted regression is much more efficient, as it is capable of determining the areas to which the regression models are better adapted.
In this respect, it should be remembered that this type of regression is capable of generating a different equation for each of the locations selected (hypothesis 4), which explains its analytical superiority. It is interesting that there are areas in which the adjustment of the models can be improved; they coincide with areas in which the attractions required by tourists do not abound. Moreover, the models described define some outliers, imbalances in short, owing to the existence of a specific criterion such as the presence of a low demographic volume (hypothesis 5). In this sense, it is understood that small population centres, normally with fewer than 500 inhabitants, lack the endogenous capacity for enterprise and also suffer from a scarcity of services and tourist equipment.
By taking into account the aspects listed, it can be affirmed that the distribution of rural accommodation establishments does not correspond as it should to the presence of the lodging preferred by tourists. In turn, it is also noteworthy that ordinary least squares regression is not the most effective technique for explaining this distribution, not even if its aim is to provide a generic model. In contrast, geographically weighted regression operates better, as it explains a higher percentage of the variance and also defines areas in which the models constructed are more effective.
The analyses carried out reveal that GWR is much superior to OLS, even when analysing models based on variable behaviour such as that of human decisions, far from being adapted to physical laws or following coherent guidelines, and predicting the supply of accommodation. This is because although both types of regression obtain general models, only GWR allows local models adapted to each individual case (hypothesis 6). Moreover, it has another consubstantial advantage when tackling spatial statistics, i.e., the consideration of multiple types of spatial relationships which is particularly useful for tourism. In this discipline, it has been shown on numerous occasions that proximity is an essential element as is also in a sense the first law of geography, which indicates that interactions between places are inversely proportional to the cost of the distance between them [79].
The very complexity of the model analysed leads us to think that this type of technique is very useful, even in erratic systems such as that analysed, and facilitates decision-making at least in the places better adapted to the models designed. Nevertheless, both the global and local models defined by GWR can be improved from a statistical perspective by adding another type of explanatory variable, although when this possibility was explored with the inclusion of the most emblematic protected natural spaces of the study area, it was found that statistical consistency fell. Due to this, the results given here were obtained after multiple improvements and discarding the most problematical models, seeking the best relationship between a logical explanation and one adapted to the preferences of tourists and statistical robustness.
The applicability of this research allows both the local and regional administration to define a tourist policy conducive to the search for alternatives in the designing of tourist products in the most problematical areas and to stress the value of other attractions of these territories which are not mentioned by tourists. Indeed, it can be observed that many areas, in which the supply is not adapted to the resources required by visitors, contain other elements which may be of interest such as protected natural spaces. In them, specific products could be created, orientated towards nature tourism, ornithological tourism, or even agrotourism, as many spaces feature the typical landscape of the southwest of the Iberian Peninsula, the "dehesa".
All this opens a new line of analysis based on finding explanations for the suitable location of investments, in this case, illustrated with the supply of rural accommodation in a specific context, Extremadura. This line would improve the model, perhaps including more attractions so that the tourist can define it better.

Conclusions
The analysis of the distribution of the supply of rural accommodation in Extremadura, using two types of regression widely accepted by the scientific community, leads to the following conclusions: The first shows that the location of the supply of rural accommodation is random over a large part of the territory, as is shown by the fact that there is no correspondence between the resources required by the tourist and those present in said areas. Fortunately, the centres in which there are fewer tourist attractions according to the demand tend not to have many beds. In contrast, where this correspondence does exist, an important pool of lodging is detected. In line with the above, it is shown that the north of the province of Cáceres, recognised by the INE as an outstanding area for rural tourism at a national level, together with other areas with the highest relief in Extremadura, does have the necessary attractions and moreover a considerable number of rural accommodation beds.
The second is that the most important attractions for the demand from rural tourists who stay in this type of accommodation are varied, although those predominating are altitude, bathing areas, and historical-artistic ensembles. It is indeed observed that the distance to these elements has a negative relation with the number of beds available. In other words, the nearer the centres are to them, the greater their capacity for rural accommodation.
The third reveals that the application of OLS does not provide a convincing explanation for the model defined by tourists, despite the fact that it is a global statistic, on defining a single equation to explain the relationship between the independent variables and the dependent variable. This is due to the fact that the variability of circumstances reflected in the territory conceals different realities and cannot therefore be synthesised in a single equation.
The fourth is that GWR gives better results for both the global model and the local models which it is capable of creating. Moreover, it allows the distinguishing of spaces in which the model is more reliable from others in which the level of adjustment is appreciably lower. As has been mentioned, the reason for this is that different situations require a differentiated analysis, which this regression allows, as it provides not only a global output but also a local one, calculating a specific equation for each case analysed.
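The contrast between the single global equation of OLS and the local equations of GWR can be written compactly. The notation below is the standard textbook form and is not taken from the article: y_i is the number of beds in centre i, x_ik the k-th regressor, (u_i, v_i) the coordinates of the centre, and W_i the diagonal matrix of kernel weights centred on it.

```latex
% Global OLS model: one coefficient vector for the whole region
y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i

% GWR: the coefficients vary with the location (u_i, v_i) of centre i
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

% Local estimate by weighted least squares
\hat{\boldsymbol{\beta}}(u_i, v_i) = \left( X^{\top} W_i X \right)^{-1} X^{\top} W_i\, y
```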
The fifth shows that the configuration of spatial relations is necessary to explain models in which territorial proximity is vital. In this case, tourist activities are a clear exponent as proximity to resources is a clear sign which can explain the presence of the supply of accommodation.
The sixth and last conclusion refers to the need to continue with a line of research into the application of spatial statistics to all those systems in which the territory can be considered as one more variable to analyse. For this reason, when both regression models have been compared, the immediate conclusion has been the goodness of GWR compared with the poor results of OLS.
GEOESTADÍSTICOS, grant number PRI-IB16040 and the APC was funded by the Consejería de Economía e Infraestructuras de la Junta de Extremadura (the branch of the regional government that covers economy and infrastructure) and by the European Regional Development Fund (ERDF).