The Impact of High-Speed Rail Systems on Tourist Attractiveness in Italy: Regression Models and Numerical Results

Abstract: This paper evaluates the impact of high-speed rail systems on tourist attractiveness in Italy. The analysis is carried out with reference to provincial capitals, only some of which are served by high-speed railway lines. To achieve this objective, two multiple linear regression models were specified and calibrated, which relate arrivals and presences in accommodation facilities to several factors that could influence the tourist destination: cultural, historical, and monumental heritage, commercial activities, recreational activities, accessibility, etc. Both models showed that the availability of high-speed railway services is an important factor in the choice of tourist destination, being, moreover, the only accessibility variable found to be significant; furthermore, the elasticity of tourist demand to this factor was significant too.


Introduction
The accessibility of a site is one of the main factors influencing its ability to attract residential, commercial, and industrial settlements. In the literature, it is possible to find numerous works that confirm this assumption. Good accessibility makes a place easy to reach (passive accessibility) and allows those who live or work there to easily reach other places (active accessibility), making it more attractive to live in or establish a commercial or industrial activity.
From the tourism point of view, the accessibility of a site or a city can influence the choice of users when planning a trip, thus indirectly affecting the destination attractiveness; in fact, tourists tend to prefer an easily accessible destination over another that is more difficult to reach.
The accessibility of a place, in general, and even more so in tourism, is strongly influenced by the available public transport services. In this context, high-speed rail transport systems play an important role in the tourist development of a location, increasing its tourist accessibility as well as its general accessibility.
The purpose of this paper is to study the impact that high-speed rail (HSR) systems may have on tourism. To achieve this, two linear regression models were specified and calibrated to estimate tourist flows as a function of several accessibility variables, including the number of runs of high-speed rail services, as well as variables describing the stock of cultural and tourist assets. The models were calibrated with data from 111 provincial capitals in Italy, with reference to the year 2018, which is not affected by the impact of the COVID-19 pandemic. Although similar models could be calibrated for other Western countries, the Italian case study is significant because high-speed services are not widespread: of 111 provincial capitals, 49 are not served by high-speed rail at all, and 41 are served with no more than 10 runs per day.
In this study, we consider as high-speed services not only those which exceed a maximum speed of 300 km/h (such as Trenitalia Frecciarossa services) but also those which reach a maximum speed of 250 km/h (Trenitalia Frecciargento services) and 200 km/h (Trenitalia Frecciabianca services), according to the UIC definition: "High-speed rail combines many different elements which constitute a 'whole, integrated system': an infrastructure for new lines designed for speeds of 250 km/h and above; upgraded existing lines for speeds of up to 200 or even 220 km/h, including interconnecting lines between high-speed sections" [1].
Sustainability 2022, 14, 13818 2 of 33
The limitations of the proposed models lie in their applicability only to the Italian case, but similar models can be specified and calibrated in other territories with the same approach proposed in the paper; from a temporal point of view, the models can be recalibrated with reference to years different from the one under study, just as they can be applied after the construction of new high-speed railway lines to check whether the predictions remain valid. This work can contribute to the evaluation of investments in high-speed rail transport systems at the regional and national levels.
The goals of this study are basically twofold: (i) to verify whether and to what extent the presence of high-speed services has a real impact on tourist attractiveness; (ii) to carry out this verification through quantitative methods (mathematical models, in our case) that also provide a numerical estimate of the corresponding impact.
The paper is articulated as follows: Section 2 examines the background of the problem; Section 3 describes the data; multiple linear regression models are specified and calibrated in Section 4; an application to the city of Benevento is described in Section 5; conclusions and research perspectives are summarised in Section 6.

Background

Key Tourism Data
Over the past sixty years, tourism has steadily grown in both volume and importance, becoming one of the key pillars of the world economy. In 2018, international arrivals grew for the ninth consecutive year, following the crisis period of 2007–2009 (Figure 1).
Data from the UN World Tourism Organisation [2] show that international tourist arrivals worldwide reached 1.4 billion two years ahead of forecasts; this is due in part to the ease of travelling, lower travel costs, simpler visa procedures, and other factors that act as enablers for the expansion of tourism. Among destinations, Europe is the most popular, accounting for 51% of total arrivals in 2018 (710 million). Italy is the fifth destination in the top ten most visited countries in the world, with 62 million international arrivals (+7% from 2017 to 2018).

[Figure 1: international tourist arrivals (billions) and revenues (trillions).]
The data show that leisure activities and holidays are still the main purposes of the trip. The analysis of the purpose is useful for understanding the needs of tourism demand in terms of amenities and ancillary services that play a strategic role in the competition between tourist destinations. This is consistent with the traditional definitions of tourism: "Tourism is a social, cultural and economic phenomenon which entails the movement of people to countries or places outside their usual environment for personal or business/professional purposes. These people are called visitors (which may be either tourists or excursionists; residents or non-residents) and tourism has to do with their activities, some of which involve tourism expenditure." [3,4].
Tourism is a crucial economic sector for Italy: in 2018, tourism had direct and indirect effects on GDP of 5% and 13.2%, respectively; furthermore, tourism directly generated 6% and indirectly 15% of total employment.
The attractiveness of 'destination Italy' is facilitated by the presence of a wide system of heritage attractions in terms of historical villages, archaeological sites, cities of art, cultural tradition, significant landscape, and seaside resorts. This system consists of 4026 museums, galleries, or collections, 570 monuments and monumental complexes, and 293 archaeological areas and parks. Moreover, 2371 municipalities host at least one museum, and there are 58 locations included in the UNESCO World Heritage List, positioning Italy at the top of the world ranking [5].
Analysing the movement of tourists by region, the data show that Veneto is the leader with over 19 million arrivals, followed by Lombardy with around 17 million; a second group includes Emilia Romagna, Tuscany, and Lazio, forming a central backbone throughout the country; Basilicata and Molise, on the other hand, fall into the lowest class of values (Figure 2).

Literature Review
In the state-of-the-art, we can identify three groups of works: (1) papers studying models for estimating tourism demand; (2) papers studying the impact of HSR services on aspects different from tourism; (3) papers studying the impact of HSR services on tourism.
Tourism demand estimation is a topic widely covered in the literature. A review on the subject can be found in [6].
There are several papers based on modelling time series and proposing forecast models based on them. Cho [7] compared three different approaches to forecasting tourist arrivals (exponential smoothing, univariate ARIMA, and Artificial Neural Networks, ANN) and found ANN to be the best method. Palmer et al. [8] proposed ANN-based models for forecasting tourism time series. Akin [9] proposed an approach for the selection of the best models for tourism demand estimation, starting from the comparison between three models used to determine time series; she defined a set of rules to identify the most suitable model according to the available data. Spatial interaction models for estimating tourism flows were proposed in [10]. Chan and Lim [11] analysed tourism seasonality in New Zealand using spectral analysis. An approach based on evolutionary fuzzy systems was proposed in [12]. Hassani et al. [13] proposed the use of Singular Spectrum Analysis (SSA) to predict tourist arrivals in the USA; the authors highlight that there are significant advantages of the proposed approach over more traditional ones such as ARIMA, exponential smoothing, and ANN. Li et al. [14] proposed a model based on principal component analysis and artificial neural networks for estimating tourism volumes based on time series. Other works on time series-based tourism flow forecasting can be found in [15,16], which proposed approaches based on structural time series, and [17], which also refers to stochastic nonstationary seasonality. Chu [18] proposed a fractionally integrated autoregressive moving average approach to forecasting tourism demand in Singapore. Andrawis et al. [19] applied time series to tourism in Egypt, while Nelson et al. [20] studied a case study in Hawaii.
There are many recent papers in the literature studying the impacts of HSR services from different perspectives. Cheng and Chen [21] studied the impacts on the capacity of traditional passenger and freight rail services.
Impacts on social exclusion/inclusion were studied by Dobruszkes et al. [22], who highlighted that the users of such services are predominantly "...male, higher income, highly educated and belonging to higher social occupational groups"; Ren et al. [23], who studied the impact of HSR services on social equity in China; and Cavallaro et al. [24], who studied the spatial and social equity aspects related to HSR lines in Northern Italy. Several papers have studied the impacts of pollution and/or greenhouse gas emissions, including the work of Fang [25], who studied the impact on air pollution in China, showing that it tends to decrease in regions where services are present, compared to unserved areas; Jia et al. [26], who studied the impact on CO2 emissions in China, showing that there are significant reductions in greenhouse gas emissions; and Strauss et al. [27], who studied the impact of HSR services on air transport demand and overall CO2 emissions.
Several other papers have studied the impacts of HSR services on property values, including Huang and Du [28], who studied the effects on land prices in China, showing that the impact is significant, particularly in urban areas; Okamoto and Sato [29] examined the impacts of HSR services on land values, focusing on a region in Japan; Zhou and Zhang [30] studied the impacts on both property values and GDP.
The impact of HSR services on the industry was studied by Tian et al. [31], who examined the impact on service industry agglomeration in peripheral cities, showing that HSR facilitated economic growth in core cities at the expense of peripheral cities. The correlation between HSR services and the evolution of the high-tech industry in China was studied by Xiao and Lin [32], showing that the impact was significant, especially in cities. Chang et al. [33] studied the impact of the extension of the HSR network on industrial movement patterns in China's Greater Bay Area, showing that the expansion of the network led to a decentralisation of large industries. Zhang et al. [34] studied the impact of HSR on consumption in China.
Studies on the impact of HSR services on tourism are also numerous. In particular, the countries where these impacts have been most studied are China, Spain, and Italy. The main works concerning China were proposed by Wang et al. [35], who studied the effects of the HSR network on regional tourism development; Jin et al. [36], who studied the impact of HSR on winter tourism in a specific region; and Wang et al. [37], who studied the impact on urban tourism; another study was proposed by Yin et al. [38]. Zhang et al. [39] studied the impact of HSR on tourism mobility and the value of tourist firms. Zhou et al. [40] studied the effects of HSR on regional tourism economies in China. Campa et al. [41], on the other hand, studied the impact of HSR on tourism in both Spain and China.
The impacts of HSR services on tourism have been extensively studied in Spain. Pagliara et al. [42] studied the impact on tourism for the Madrid case study, while Albalate and Fageda [43] and Albalate et al. [44] studied the impact on tourism more generally. Two other works have been proposed by Guirao and Campa [45] and Guirao et al. [46].
The main studies referring to Italy are those by Pagliara et al. [47] and Pagliara and Mauriello [48], who studied the impact of HSR on tourism in Italy through statistical analysis.
Masson and Petiot [49] examined the impact on tourism attractiveness in a specific case study: the line between Perpignan (France) and Barcelona (Spain).
Other studies [50–52] investigated the potential of HSR for the tourism development of regions. Recently, the Italian Minister of Cultural Heritage also indicated the use of HSR connections as a factor in reviving tourism after the pandemic, especially in Southern Italy [53].
The research methodology adopted in this work involved the following main phases:
(1) Identification of the variables that can influence the choice of a tourist destination and data collection; in particular, three types of variables were identified: (a) variables related to tourism supply; (b) variables related to accessibility; (c) other variables that can influence the choice of destination (commercial activities, being a regional capital, etc.).
(2) Specification and calibration of multiple linear regression models capable of relating data on tourist attractiveness to the variables identified above. In this phase, the goodness of the models will be assessed through statistical tests, also to verify whether the assumption of linearity between dependent and independent variables is valid.
(3) Analysis of the results obtained and verification of the performance of the models by means of sensitivity analysis and an application to a case study.
These steps were preceded by a comprehensive analysis of the state-of-the-art.
In this paper, two multiple linear regression models are proposed to identify the main variables influencing tourist mobility in Italy; in addition to accessibility variables, including the presence of HSR services, data on the quantity of cultural heritage are considered in the model.As is shown in the following sections, of all parameters referring to accessibility, only the one related to HSR services was significant; other parameters, such as distances from airports or other municipalities, instead, were not statistically significant in explaining tourist flows.
To the best of our knowledge, this approach has not previously been applied in Italy, and similar studies are not available. Indeed, the studies available in the literature refer to the evolution of tourism following the implementation of new HSR services, but without providing models or quantitative methods capable of relating the variables of tourist attractiveness to the presence of rail links.

Data
Various data sources were used in this study. The main tourism data are taken from ISTAT (Italian Institute of Statistics) and quantify monthly arrivals and presences, classified by the origin and category of accommodation (hotel and non-hotel). These data were available at the regional, provincial, and municipal levels. Here, 'arrivals' correspond to the registration of customers in the accommodation facility, while 'presences' correspond to the total number of nights spent in a facility; therefore, in this study, the term 'presence' is equivalent to the term 'overnight stay'; in the following, we use the term 'presence' to be consistent with the ISTAT terminology. In the development of this work, the data on tourist movements refer to 2018, so that they are not affected by the COVID-19 pandemic.
Overall, in Italy, there were 128.1 million arrivals and 428.8 million presences, with an average stay of 3.35 nights. The regional data on arrivals and presences are reported in Table 1, while Table 2 shows the same data with reference to the provinces of the regional capitals. It can be seen that Veneto, Lombardy, Tuscany, and Lazio are the regions with the most arrivals, while the provinces with the most arrivals are Rome, Venice, Milan, and Florence, with an obvious correlation with the attractiveness of the capital cities. On the other hand, the most attractive regions in terms of presences are Veneto, Trentino-Alto Adige, Tuscany, and Emilia-Romagna, and the most attractive provinces are Venice, Rome, Trento, and Milan. The difference between regional arrivals and presences, clearly linked to the average length of stay, is related to the type of holiday, often weekly, in Trentino-Alto Adige (mainly in winter periods) and Emilia-Romagna (mainly in summer periods).
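The average stay quoted above is the ratio of presences (nights spent) to arrivals (check-ins). A minimal Python check, using only the national totals stated in the text, is sketched below; the function name is illustrative and not part of the original study.

```python
def average_stay(presences: float, arrivals: float) -> float:
    """Average length of stay in nights: nights spent divided by check-ins."""
    if arrivals <= 0:
        raise ValueError("arrivals must be positive")
    return presences / arrivals


# National totals for 2018 quoted in the text (millions).
italy_stay = average_stay(presences=428.8, arrivals=128.1)
print(round(italy_stay, 2))
```

The same ratio, computed per region, helps explain why regions with week-long holidays rank higher by presences than by arrivals.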
Table 3 reports the data on arrivals and presences for the 111 Italian provincial capitals, on which the models have been specified and calibrated. The cities of Rome, Milan, and Venice have over 5 million arrivals and the same cities, with the addition of Florence, have over 10 million presences per year. We underline that the data used do not allow tourist trips to be distinguished, at this territorial scale, from trips made for other reasons (work, business, study, etc.), and do not include stays in holiday homes or trips that do not involve a stay in an accommodation facility (one-day tourist visits, stays with relatives or friends, etc.); despite all these limitations, we believe that these data are the best available for the analyses we wish to conduct.
On the supply side, the accommodation establishments (see Tables 4 and 5) show the clear prevalence of Veneto and the Province of Venice, well ahead even of Lazio and the Province of Rome. Data on supply have not been used as possible explanatory variables in our models, since there is a direct relationship between supply and demand (supply increases where there is more demand) that could invalidate the modelling analysis aimed at identifying the other variables that can influence tourist flows.
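Once the dependent variables had been identified and the corresponding data collected, the possible explanatory (or independent) variables were examined; these variables are the factors that could influence the choice of a tourist destination. Five categories of variables have been identified: variables related to the supply of historical/cultural assets; variables related to the supply of entertainment/amusement activities; variables related to the supply of commercial activities; accessibility variables; importance variables. The accessibility variables are:
1. Number of direct runs on high-speed rail services;
2. Distance from Rome's Leonardo da Vinci airport;
3. Distance from the nearest international airport;
4. Accessibility indicator based on Hansen's gravity-based measure [59];
5. Total road travel time to all other possible destinations;
6. Total road travel distance to all other possible destinations.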
Not all variables refer to the same year. The most recent data on employees date back to the last census, which is carried out every ten years, but there are no better or more reliable statistical sources. On the other hand, the data on State places of culture and the stock of cultural assets, although referring to different years and before 2018, can be considered valid because the variation of these numbers over the years is negligible.
The following subsections describe the sources of the data and how they were obtained or derived.

Variables Related to the Supply of Historical/Cultural Assets
The number of cultural sites is a figure taken from [55] and refers to fortified architecture, archaeological areas, historical monuments, monuments of industrial archaeology, funerary monuments, archives and libraries, churches and places of worship, villas and palaces, archaeological parks, museums and galleries, parks and gardens. Only those under state jurisdiction and management are considered, and therefore, this variable does not include all possible cultural goods. This variable is indicated as $scs_i$, where $i$ indicates the city.
The same source, but with reference to 2017 [56], provides the total number of cultural assets, understood as architectural assets, archaeological assets, parks, and gardens. This variable is indicated with $tch_i$.
The data on employees in libraries, archives, museums, and other cultural activities are taken from the ISTAT census [57]; clearly, the number of employees in this sector is assumed to be a proxy for the supply of the same type of activity to tourists. This variable is indicated with $mus_i$.
The size of the historical urban fabric was estimated from ISTAT data [58] by calculating the percentage of houses built before 1919. This variable is indicated with $huc_i$.
The values of these variables for the provincial capitals are shown in Table A1 in Appendix A.

Variables Related to the Supply of Entertainment/Amusement Activities
The data on employees in creative, artistic, and entertainment activities and employees in recreational and leisure activities are taken from the ISTAT census [57]. In this case too, it is assumed that these data represent a proxy for the supply of this type of activity in the territory. The values for the provincial capitals are reported in Table A2 in Appendix A, and the variables are indicated, respectively, by $ace_i$ and $ree_i$.

Variables Related to the Supply of Commercial Activities
The data on retail trade employees (excluding motor vehicles and motorbikes) are taken from the ISTAT census [57] and are assumed to be a proxy for the commercial offer in the territory. The values for the provincial capitals are reported in Table A3 in Appendix A. This variable is indicated with $ret_i$.

Accessibility Variables
The tourist accessibility of a place, particularly a city, is determined by several factors depending on the infrastructures and transport services available. The data source or calculation methods for these variables are described below.

Number of Direct Runs on High-Speed Rail Services
This variable indicates the number of runs of Italian high-speed services. The data refer to the number of runs of this type of service arriving at/departing from the station of the municipality; the value is zero for municipalities not served by this type of service. This variable is indicated with $hsr_i$.

Distance from Rome's Leonardo da Vinci Airport (Italy's Main Hub)
The calculation of this variable, as well as all the following variables based on times or distances, required the construction of a graph of the national road network.This graph was implemented starting from the 'OpenStreetMap' database, correcting some connection errors and considering only the roads of the main network: all motorways; all primary roads with separated carriageways and their ramps; all main trunk roads (typically state roads and regional roads); some secondary roads necessary to ensure the full connection of the network.
Overall, this model represents 202,628 km of roads; Table 6 reports the extension of the network, while Figure 3 shows the overall graph. In addition to the length of the different road sections, which is necessary to calculate the distance between municipalities, it is also necessary to attribute a speed to each link, to calculate the corresponding travel time. In this work, we consider free-flow (i.e., uncongested) speeds sufficient, assuming the values reported in Table 7. With this model, the matrix of times and the matrix of distances between all the municipalities were generated; these matrices have a dimension of 8091 × 8091, 8091 being the number of Italian municipalities according to the 2011 ISTAT surveys. Each matrix was then reduced to a 111 × 8091 matrix, considering that the indicators were calculated only for the provincial capitals.
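As an illustration of how time and distance rows can be derived from a road graph with link lengths and free-flow speeds, the following Python sketch runs Dijkstra's algorithm from each origin of interest. The toy network, node names, lengths and speeds are invented for illustration, and computing each matrix on its own minimum-cost paths is an assumption: the text does not state which paths were used.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: minimum cost from source to every reachable node.

    graph maps node -> list of (neighbour, cost) pairs, costs non-negative.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, link_cost in graph.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


def build_graphs(links):
    """Build distance (km) and free-flow time (h) graphs from undirected
    links given as (node_a, node_b, length_km, speed_kmh)."""
    dist, time = {}, {}
    for a, b, km, kmh in links:
        for u, v in ((a, b), (b, a)):
            dist.setdefault(u, []).append((v, km))
            time.setdefault(u, []).append((v, km / kmh))
    return dist, time


# Hypothetical toy network (not taken from the paper).
links = [
    ("A", "B", 100.0, 100.0),  # fast road, 1.0 h
    ("B", "C", 60.0, 60.0),    # 1.0 h
    ("A", "C", 130.0, 50.0),   # shorter but slower road, 2.6 h
]
dist_graph, time_graph = build_graphs(links)

# Only the rows for the origins of interest are needed (the capitals).
capitals = ["A"]
time_rows = {c: shortest_costs(time_graph, c) for c in capitals}
dist_rows = {c: shortest_costs(dist_graph, c) for c in capitals}
print(time_rows["A"]["C"], dist_rows["A"]["C"])
```

In this toy case the fastest route from A to C (via B) is not the shortest one, which is why the two matrices are kept separate.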
From this matrix, the variable in question was calculated from $d_{i,\mathrm{Fiumicino}}$, the distance between municipality $i$ and Leonardo da Vinci airport in Fiumicino (hundreds of km).

Distance from the Nearest International Airport
This variable was calculated from $d_{i,AI}$, the distance between municipality $i$ and international airport $AI$ (hundreds of km).
The international airports considered, those with the most traffic in each region, are listed in Table 8.
For the calculation of the accessibility indicator, the 'gravity-based measures' model proposed by Hansen [59] was adopted. The general formulation of the model is as follows:
$$A_i = \sum_j W_j^{\beta} \, f(c_{i,j}, \alpha)$$
where: $A_i$ is the indicator measuring the accessibility of zone $i$; $W_j^{\beta}$ is a measure of the importance of zone $j$, based on activities, services, population, and so on; $\beta$ is a coefficient of the model; $f(c_{i,j}, \alpha)$ is an impedance function, based on generalised cost, distance, etc., between zone $i$ and zone $j$.
We have calculated the accessibility indicator from $inh_j$, the number of inhabitants in municipality $j$, and $t_{i,j}$, the travel time in hours between municipality $i$ and municipality $j$.
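A Hansen-type indicator sums, over destination zones, a weight for each zone multiplied by an impedance function of the cost of reaching it. The Python sketch below uses inhabitants as weights and a power-law impedance of travel time; the impedance form, the default exponents `alpha = beta = 1` and all data are assumptions made for illustration, not the specification used in the study.

```python
def hansen_accessibility(origin, inhabitants, travel_time, alpha=1.0, beta=1.0):
    """Gravity-based accessibility of `origin`.

    inhabitants: zone -> population (weight of the zone)
    travel_time: origin -> {zone -> travel time in hours}
    The impedance t ** (-alpha) is an assumed functional form.
    """
    total = 0.0
    for zone, population in inhabitants.items():
        if zone == origin:
            continue  # the origin itself is excluded (zero travel time)
        total += (population ** beta) * travel_time[origin][zone] ** (-alpha)
    return total


# Hypothetical data: three municipalities, travel times from "A" only.
inhabitants = {"A": 50_000, "B": 100_000, "C": 20_000}
travel_time = {"A": {"B": 1.0, "C": 2.0}}

acc_a = hansen_accessibility("A", inhabitants, travel_time)
print(acc_a)
```

With these numbers, the large municipality one hour away contributes ten times as much as the small one two hours away.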

Total Travel Time by Road with All Other Possible Destinations
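For each provincial capital, $i$, we calculated the total travel time ($\mathrm{h} \times 10^{-3}$) from all other Italian municipalities and calculated the variable as the reciprocal, with the following formula:
$$\frac{1}{\sum_j t_{i,j}} \quad \forall i$$
where $t_{i,j}$ is the travel time between capital city $i$ and municipality $j$.

Total Travel Distance by Road with All Other Possible Destinations
For each provincial capital, $i$, we calculated the total travel distance ($\mathrm{km} \times 10^{-5}$) from all other Italian municipalities, based on the implemented graph, and calculated the variable as the reciprocal, with the following formula:
$$\frac{1}{\sum_j d_{i,j}} \quad \forall i$$
where $d_{i,j}$ is the distance between capital city $i$ and municipality $j$.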
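The two reciprocal indicators described above can be sketched with a single Python helper. The function name, the `scale` argument (used here to express totals in thousands of hours or hundreds of thousands of km) and the sample values are illustrative assumptions.

```python
def reciprocal_of_total(row, origin, scale=1.0):
    """Reciprocal of the summed cost between `origin` and every other zone.

    row: zone -> cost (hours or km); scale converts the total into the
    reporting unit (e.g. 1e-3 for thousands of hours).
    """
    total = sum(cost for zone, cost in row.items() if zone != origin) * scale
    if total <= 0:
        raise ValueError("total cost must be positive")
    return 1.0 / total


# Hypothetical travel times (h) and distances (km) for capital "A".
times_from_a = {"A": 0.0, "B": 1.0, "C": 3.0}
dists_from_a = {"A": 0.0, "B": 100.0, "C": 300.0}

time_indicator = reciprocal_of_total(times_from_a, "A")
dist_indicator = reciprocal_of_total(dists_from_a, "A", scale=1e-5)
print(time_indicator, dist_indicator)
```

Taking the reciprocal means that a larger value indicates a more central, better connected capital.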
The values of all accessibility variables are reported in Table A4 in Appendix A.

Importance Variables
We consider a dummy variable indicating whether the city is a regional capital (1) or not (0). The values of this variable, indicated with $cap_i$, are reported in Table A5 in Appendix A.

Regression Models
The impact and significance of the explanatory variables on the tourism phenomenon are assessed with multiple linear regression models. These models relate the dependent variables (in our case, presences and arrivals) to the explanatory variables (independent) that may affect them.
Linear regression models take the following general form:
$$Y = \beta_0 + \sum_k \beta_k X_k$$
where: $Y$ is the expected value of the dependent variable; $\beta_0$ is a coefficient of the model, which does not depend on the independent variables (intercept of the regression line); $\beta_k$ are the coefficients of the model, which, together with $\beta_0$, have to be calibrated; $X_k$ are the independent variables.
Any model must be specified and calibrated. The specification phase consists of defining which of the independent variables can be included in the model; the calibration phase consists of finding the coefficient values that can best reproduce the observed values of the dependent variable for that specification.
The observed data of the dependent variable are denoted by $y_i$ and ordered in a vector $y$; the vector $y$ has as many elements as the number of municipalities on which we are going to calibrate the model (in our case, 111 municipalities). The values that the independent variables assume for each observation are also called 'predictors' and indicated with $x_{i,k}$, where $i$ represents the provincial capital and $k$ the independent variable; these values can be ordered in a matrix, $x$, which has as many rows as the number of cities and as many columns as the number of independent variables plus one (for coefficient $\beta_0$: the elements of the first column of the matrix are equal to 1). The coefficients $\beta_k$ can be ordered in a vector $\beta$ that has as many elements as the number of coefficients. Finally, we need to add the vector of statistical errors, $\varepsilon$, which has as many elements $\varepsilon_i$ as the number of cities. With these notations, it is possible to write:
$$y = x\beta + \varepsilon$$
This formula represents, in short, the relationship between the observed data, $y$, and the independent variables, $x$. The calibration of the model consists in searching for the vector of coefficients, $\beta$, that minimises the vector of statistical errors, $\varepsilon$; in the theoretical case in which all statistical errors are equal to 0, the model would perfectly reproduce all the observed data.
If we denote by $x_i$ the $i$-th row of the matrix $x$, we can write:
$$y_i = x_i\beta + \varepsilon_i$$
The optimal values of the coefficients can be obtained using the generalised least squares method, which minimises the sum of squares of the statistical errors; the corresponding optimisation model can be written as follows:
$$\hat{\beta} = \arg\min_{\beta} \sum_i \left(y_i - x_i\beta\right)^2$$
The ability of a model to reproduce observed data, and thus its goodness, is measured by several indicators; one of them is the coefficient of determination, $R^2$, which is calculated as:
$$R^2 = 1 - \frac{\sum_i \left(y_i - x_i\beta\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}$$
where $\bar{y}$ is the average of the $y_i$ values; this indicator measures the ability of the model to explain the variability of the observed $y_i$ values, and the closer its value to 1 (statistical errors equal to 0 and perfect reproducibility of the observed phenomenon), the greater the goodness of the model.
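To make the calibration step concrete, the following Python sketch fits the coefficients by ordinary least squares (solving the normal equations with Gaussian elimination) and computes the coefficient of determination as one minus the ratio of the residual to the total sum of squares. It uses the standard library only; the data are invented, and ordinary rather than generalised least squares is a simplifying assumption.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - known) / m[r][r]
    return z


def fit_least_squares(predictors, y):
    """Coefficients minimising sum_i (y_i - x_i beta)^2.

    A column of ones is prepended so that beta[0] is the intercept.
    """
    x = [[1.0] + list(row) for row in predictors]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Model value beta_0 + sum_k beta_k x_k for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def r_squared(beta, predictors, y):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(beta, row)) ** 2 for row, yi in zip(predictors, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations: two predictors per city, slightly noisy response.
predictors = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1]]
y = [7.1, 4.9, 13.0, 11.1, 18.9]

beta = fit_least_squares(predictors, y)
r2 = r_squared(beta, predictors, y)
print([round(b, 2) for b in beta], round(r2, 4))
```

For real data with many predictors, a numerical library with a QR-based solver would be more robust than solving the normal equations directly.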
The coefficient of determination always increases (or at least does not decrease) as the number of explanatory variables increases. To avoid this problem, it is possible to use the adjusted coefficient of determination, $R^2_{adj}$, which penalises the inclusion of variables that are not necessary to explain the phenomenon; this indicator is calculated as:

$$R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}$$

where $n$ is the number of observations and $p$ is the number of degrees of freedom (df) in the model. Clearly, as the number of explanatory variables, i.e., degrees of freedom, increases, the value of $R^2_{adj}$ decreases with respect to the value of $R^2$, the more so when observations are few. In our case, with 111 observed data, we do not expect a great difference between the two values, which, in any case, will be calculated to verify the goodness of the model.
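A short numerical sketch shows how small the penalty is with 111 observations. It assumes the usual form of the adjusted coefficient, $1 - (1 - R^2)(n - 1)/(n - p - 1)$; the value $R^2 = 0.90$ and the values of $p$ are purely illustrative and are not results of the paper.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p explanatory variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# n = 111 as in the text; r2 = 0.90 and the p values are assumed for illustration.
for p in (1, 5, 10):
    print(p, round(adjusted_r2(0.90, 111, p), 4))
```

Even with ten explanatory variables, the adjusted value differs from the unadjusted one only in the second decimal place, which is consistent with the expectation stated above.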
The coefficient of determination cannot, however, be the only indicator used to evaluate the goodness of a model. Indeed, it usually increases with the number of variables $k$, even if some of them are not useful to explain the phenomenon. The other indicators that must be used to evaluate the model are the hypothesis tests, which measure whether the parameters adopted in the model are indeed significant to reproduce the phenomenon. In this study, we use the F-test, obtained from the analysis of variance, and the t-test, concerning the significance of each independent variable. We will assume that a model is acceptable if the significance F is close to 0 (at least < 0.05) and if the t-test of each coefficient $\beta_k$ is higher [lower] than $t_{95}$ [$-t_{95}$] for positive [negative] $\beta_k$, where $t_{95}$ is the value of the Student's t distribution corresponding to the degrees of freedom (df) of the model with 95% confidence. The degree of freedom of a model is equal to the number of independent variables $x_k$ of the model. The values of $t_{95}$ for the different degrees of freedom (1 to 10) are reported in Table 9.

The specification and calibration procedure used in this study follows a trial-and-error approach, guided by the values of the Pearson correlation coefficients between the dependent variable and each of the independent variables. The correlation coefficient is calculated as the ratio of the covariance between two variables, $\sigma_{xy}$, and the product of the standard deviations, $\sigma_x$ and $\sigma_y$:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\,\sigma_y}$$

This coefficient can assume values between −1 and 1; the higher the absolute value of the index, the more the two variables are correlated with each other, either positively or negatively, depending on the sign. The value of the correlation index indicates the possibility that the independent variable has a significant influence, within the model, on the dependent variable; therefore, in the trial-and-error procedure, variables with a higher absolute correlation index will be tested first, verifying whether the sign is physically admissible. After a variable has been introduced, the model will be calibrated, and it will be checked whether the inserted variable is significant. If it is, the variable is kept in the model and another one is added; if it is not, another variable is tried. To be valid, a model must have all the independent variables significant, i.e., they must respect the minimum values of the t-test indicator, and the sign of each corresponding coefficient must have a physical meaning; among all the calibrated models that respect these conditions, those with the greatest coefficient of determination are preferable. This first phase leads to a model with all significant variables and with a coefficient of determination greater than that of all the other models tested; starting from this model, we try to introduce other variables and, then, to eliminate a variable and replace it with another, to test other possible combinations.
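The first phase of this procedure can be sketched as a greedy loop. This is only an illustration under stated assumptions, not the authors' implementation: candidates are tried in decreasing order of absolute Pearson correlation, a "physically admissible" sign is assumed to mean a positive coefficient, the threshold `t_crit` is a hypothetical value supplied by the user, and the later swap-and-replace phase is omitted. The function names and the synthetic data are invented for the example.

```python
import numpy as np

def t_statistics(X, y):
    """OLS coefficients and their t-statistics (intercept first)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = len(y) - design.shape[1]            # residual degrees of freedom
    sigma2 = resid @ resid / dof              # unbiased estimate of error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, beta / np.sqrt(np.diag(cov))

def trial_and_error(X, y, names, t_crit):
    """Try candidates by decreasing |Pearson r| with y; keep the significant ones."""
    corr = [np.corrcoef(X[:, k], y)[0, 1] for k in range(X.shape[1])]
    order = sorted(range(X.shape[1]), key=lambda k: -abs(corr[k]))
    kept = []
    for k in order:
        trial = kept + [k]
        beta, t = t_statistics(X[:, trial], y)
        # Assumption: every variable in the model (intercept excluded) must have
        # a positive coefficient and a t-statistic above t_crit.
        if np.all(beta[1:] > 0) and np.all(t[1:] > t_crit):
            kept = trial
    return [names[k] for k in kept]

# Synthetic data (NOT the paper's): x_c has a negative effect and should be rejected.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 10, size=(111, 3))
y_demo = (1.0 + 2.0 * X_demo[:, 0] + 1.0 * X_demo[:, 1]
          - 1.5 * X_demo[:, 2] + rng.normal(0, 1, 111))
selected = trial_and_error(X_demo, y_demo, ["x_a", "x_b", "x_c"], t_crit=1.66)
print(selected)
```

In practice the sign rule would be set variable by variable, since some factors may legitimately have a negative effect on demand.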
In Table 10, we report the correlation coefficients of each explanatory variable with the dependent variables, in decreasing order of value. All specified and calibrated models are summarised in Table 11, for arrivals, and in Table 12, for presences; for each model, the tables indicate the variables considered, the $R^2$ and $R^2_{adj}$ indicators, the significance F, the model coefficients and, for each variable, the t-test value together with its limit value, and whether or not the model is valid. Overall, 18 models for estimating arrivals and 19 models for estimating presences were calibrated; of these, five models for estimating arrivals and five models for estimating presences were valid in terms of significance and sign of the coefficients. At the end of the procedure, model no. 16 for estimating arrivals and model no. 18 for estimating presences were identified as the best. These models have the maximum values of $R^2$ and $R^2_{adj}$, and comply with all the significance tests. The values of the coefficients of determination are sufficiently high in both cases (0.909 for arrivals and 0.885 for presences). It is important to note that in both models, the only accessibility variable found to be significant is the one related to high-speed rail services, $hsr_i$. The other accessibility variables were not statistically significant.
Figures 4 and 5 show the scatter diagrams comparing the actual and estimated values. The best models are formulated as follows:

The analysis of these models highlights the following aspects:

(a) The intercept assumes a negative value. As a consequence, the models should be used only for overall evaluations of the entire set of municipalities (recall that, having used the generalised least squares method, the sum of the values estimated by the model over all the municipalities is equal to the sum of the observed values). Applied to a single municipality, the models could give implausible values and, for municipalities of minor tourist importance, even negative ones.
(b) The variables linked to creative, artistic, and entertainment activities, total cultural assets, the presence of libraries, museums and other cultural activities, and direct rides on high-speed services appear in both models. In the arrivals model, the variable related to commercial activities is also significant, while it is not statistically significant for presences. This indicates that commercial activities have a greater influence on shorter-duration trips than on longer ones. In all cases, the variables closely linked to the tourist offer of the place of destination are significant.
(c) Among the accessibility variables, only the one representing high-speed rail services is statistically significant in estimating arrivals and presences. The other accessibility variables, at least for the provincial capitals, are not influential.
To evaluate the importance of high-speed services with respect to the other factors, a sensitivity analysis was carried out, increasing the overall values of each variable by 10% and evaluating the percentage increase in the number of arrivals and presences. The results are summarised in Tables 13 and 14. The analysis of these results leads to the following considerations:

• High-speed rail services have an important impact on the flow of arrivals and presences in accommodation facilities. The elasticity is greater for arrivals, where a 10% increase in supply is estimated to produce a 3.27% increase in arrivals, while this value falls to 2.65% for presences. In both cases, the values are significant: for arrivals, the elasticity is second only to that linked to total cultural assets, while for presences, it is third, being also preceded by creative, artistic, and entertainment activities.
• A comparison of the model's elasticities between arrivals and presences shows that there is practically the same elasticity for the variable on total cultural heritage (+3.42% arrivals and +3.49% presences), highlighting how this explanatory variable has more or less the same effect on all stays, regardless of their duration. On the other hand, creative, artistic, and entertainment activities have a greater elasticity on arrivals than on presences, showing a tendency to influence shorter stays more.
• Museums, libraries, and other cultural activities, like total cultural heritage, have practically the same elasticity on both arrivals and presences (+1.69% arrivals and +1.80% presences).
• Commercial activities, as already mentioned, show an influence only on arrivals and, therefore, a greater influence on shorter stays.
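The sensitivity analysis described above can be sketched for a generic linear model: each variable is increased by 10% in every city, one variable at a time, and the percentage change in the total predicted demand is recorded. The coefficients, variable names, and the three cities below are hypothetical and serve only to show the mechanics; they are not the calibrated values.

```python
def sensitivity(beta0, betas, data, bump=0.10):
    """Percentage change in total predicted demand when each variable is
    scaled by (1 + bump) in every city, one variable at a time."""
    def total(rows):
        return sum(beta0 + sum(betas[k] * row[k] for k in betas) for row in rows)

    base = total(data)
    result = {}
    for k in betas:
        # Copy every city and scale only variable k.
        bumped = [{**row, k: row[k] * (1 + bump)} for row in data]
        result[k] = 100.0 * (total(bumped) - base) / base
    return result

# Hypothetical coefficients and made-up cities (not the paper's data).
betas_demo = {"hsr": 500.0, "culture": 10.0}
cities_demo = [
    {"hsr": 10.0, "culture": 300.0},
    {"hsr": 0.0, "culture": 150.0},
    {"hsr": 40.0, "culture": 800.0},
]
effects = sensitivity(beta0=-1000.0, betas=betas_demo, data=cities_demo)
print(effects)
```

Because the model is linear, the change in the total depends only on the coefficient and on the overall sum of the variable, which is why the analysis is meaningful for the whole set of cities rather than for a single one.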
From the calibration of these models and analyses, it can be concluded that the impact of high-speed rail services on tourism flows, as measured by arrivals and presences in accommodation establishments, is significant. For arrivals, the elasticity of the variable is high, of the same order of magnitude as that of the total number of cultural assets. For presences, it is lower, but still very significant. Another fact to note is that, of all the accessibility variables considered, high-speed services are the only statistically significant one.

An Application to a Case Study
The calibrated models were applied to a specific case study, the city of Benevento. Benevento is a small to medium-sized provincial capital with about 60,000 inhabitants (only 36 out of 111 provincial capitals have fewer residents than Benevento), but it has several important historical/archaeological sites, including the monumental complex of Santa Sofia, a UNESCO World Heritage Site, the Arch of Trajan, the Roman Theatre and the Rocca dei Rettori, as well as several museums and churches of great value. The accessibility of the city, however, is not as good as its artistic and historical heritage: the railway connections with the regional capital (Naples) are not efficient and have a modest frequency, while Trenitalia's Frecce services connect the city on the Rome-Bari route, for a total of only 28 runs, as the sum of those arriving and departing.
Currently (2018 data, pre-COVID), annual arrivals in accommodation amount to 36,252 (on average, 99 per day), while presences stand at 80,144 (on average, 220 per day), with an accommodation supply of 57 establishments for a total of 1039 beds; therefore, there are about 35 arrivals and 77 presences per bed; for comparison, the city of Naples has about 87 arrivals and 232 presences per bed.
A new HS railway line is currently under construction, which will serve a Naples-Benevento-Bari route, with a maximum line speed of 250 km/h. As it is under construction, the frequency of services has not yet been established, but it can be assumed that the service will be organised on 12 pairs of daily runs, increasing the service to a total of 40 daily runs, as the sum of arriving and departing runs.
Assuming the same elasticity as estimated in Section 4, the new services would increase the current services by about 42.9% (from 28 to 40 runs) and, therefore, could lead to an increase, other factors unchanged, of 15.8% in arrivals (+5728) and 11.3% in presences (+9056), increasing the overall annual occupancy rate from 21.1% to 23.5%.
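The arithmetic behind these indicators can be checked with a few lines. All figures are those quoted in the text; the only assumption is that the occupancy rate is computed as presences divided by the number of beds times 365 days.

```python
# Figures quoted in the text for Benevento (2018 data).
arrivals, presences, beds = 36_252, 80_144, 1_039
runs_now, runs_future = 28, 40
extra_presences = 9_056  # increase in presences reported in the text

arrivals_per_bed = arrivals / beds
presences_per_bed = presences / beds
service_increase = 100 * (runs_future - runs_now) / runs_now

def occupancy(p):
    # Assumption: occupancy = presences / (beds * 365), i.e. bed-nights over a full year.
    return 100 * p / (beds * 365)

occ_now = occupancy(presences)
occ_future = occupancy(presences + extra_presences)

print(f"{arrivals_per_bed:.0f} arrivals/bed, {presences_per_bed:.0f} presences/bed")
print(f"service increase: {service_increase:.1f}%")
print(f"occupancy: {occ_now:.1f}% -> {occ_future:.1f}%")
```

Under this assumption the sketch reproduces the per-bed figures and the occupancy rates of 21.1% and 23.5% given above.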
These results further underline how high-speed railway lines can have a significant impact on tourist attractiveness.

Conclusions
High-speed rail transport systems have proved to be an important tool for spatial development all over the world. Increased accessibility due to HSR services has a positive impact on industrial and commercial activities, increases property values, and reduces emissions of pollutants and greenhouse gases. This paper studied the effects on tourism in Italy, taking provincial capitals as territorial reference units. The calibrated multiple linear regression models showed that HSR services are the most important accessibility factor for tourism and that their impact is significant in terms of arrivals and presences in accommodation facilities.
We can summarise the results of this study as follows: (a) among all the accessibility variables, the availability of HSR services has a significant impact on tourism attractiveness in Italy; (b) the elasticity analysis showed that the influence of this variable is of the same magnitude as the variables related to the offer of historical-cultural assets and creative, artistic and entertainment activities; (c) the application to the case study of Benevento showed that the presence of HSR services is fundamental for the city's tourism development.
Based on these results, we can reasonably affirm that HSR services represent a strategic factor for the development and promotion of tourism in Italy. With an equal historical and cultural offer, the locations better served by rail transport show a higher attractiveness. This factor should be included in the assessments of policymakers when investing in HSR systems. Indeed, from a practical point of view, being able to forecast the impact of HSR services (and of the infrastructures they use) on tourism allows more informed investment decisions, and the results can also be used to derive the impacts on the socio-economic development of the territories involved.
In this regard, further developments of this research will be aimed at extending the study to municipalities that are not provincial capitals and at considering multimodal transport accessibility variables, directly accounting for the interchanges between different transport systems and their role in promoting more sustainable ways of enjoying the territory.

Appendix A
This appendix reports the input data used in the paper.

Figure 2. Movement of tourists by region.
(a) Variables related to the supply of historical/cultural assets:

Figure 4. Model for estimating arrivals: comparison between real and model data.

Figure 5. Model for estimating presences: comparison between real and model data.

Table 6. Extension of the road network.

Table 9. Values of $t_{95}$ as a function of model degrees of freedom.

Table 11. Models to estimate arrivals.

Table 12. Models to estimate presences.

Table 13. Results of the sensitivity analysis for arrivals.

Table 14. Results of the sensitivity analysis for presences.

Table A1. Variables on the supply of historical/cultural assets.