A Projection Pursuit Dynamic Cluster Model for Tourism Safety Early Warning and Its Implications for Sustainable Tourism

Abstract: According to the United Nations World Tourism Organization, tourism promotes sustainable economic development. Ensuring tourism safety is an essential prerequisite for its sustainable development. In this paper, based on three evaluation index systems for tourism safety early warning and the collected sample data, we establish three projection pursuit dynamic cluster (PPDC) models by applying group search optimization, a type of swarm intelligence algorithm. The case studies confirm that the results derived from the PPDC models are consistent with the expert judgments. The importance of the evaluation indicators can be sorted and classified according to the obtained optimal projection vector coefficients, and the tourism risks of the destinations can be ranked according to the sample projection values. Among the three aspects influencing tourism safety in case one, the stability of the tourism destination has the most significant impact, followed by the frequency of disasters. Of the ten evaluation indicators, the frequency of epidemic disease affects tourism safety the most, and the unemployment rate affects it the second most. Overall, the PPDC model can be adopted for tourism safety early warning with high-dimensional, non-linear, and non-normally distributed data, as it overcomes the "curse of dimensionality" and the limitations associated with small sample sizes.


Introduction
Tourism, as an essential part of the modern service industry, has developed rapidly and promoted economic growth. It has become a critical driving force for achieving sustainable development in recent years [1–5]. As the United Nations World Tourism Organization (UNWTO) has also confirmed, the tourism industry plays a pivotal role in promoting sustainable development [6–8]. The UNWTO clearly states the following: (1) Tourism has the potential to contribute, directly or indirectly, to all of the 17 sustainable development goals in the 2030 Agenda for Sustainable Development; (2) Tourism has mainly been included as targets in Goals 8, 12, and 14 on inclusive and sustainable economic growth, sustainable consumption and production, and the sustainable use of oceans and marine resources, respectively; (3) Sustainable tourism is firmly positioned in the 2030 Agenda for Sustainable Development [7]. As one of the world's most significant and fastest-growing economic sectors, tourism is well-positioned to foster economic growth and development at all levels and provide income through job creation. Currently, the tourism industry offers one out of every 11 jobs worldwide. Sustainable tourism development relies on good public and privately supplied infrastructure and an innovative environment. The tourism industry [...] Still, it involves ranking the indicators according to their importance, and it is not easy to sort them reasonably without determining the consequences.
Moreover, these non-linear methodologies and models are complicated in the modeling process. Under the premise of satisfying the modeling conditions and correctly determining the relative membership functions and evaluation indicator weights, is there a model that is simpler in form and less complicated in the modeling process, meets the accuracy requirements, and is convenient for subsequent application? In theory, the projection pursuit cluster (PPC) model is an emerging comprehensive evaluation model [30,31,37,38] characterized by a linear relationship between the sample projected value (score) and each evaluation indicator, a simple structure, moderate difficulty in the modeling process, and convenient subsequent application. Empirical research has suggested that the PPC model can meet the accuracy requirements of tourism safety early warning. Therefore, this paper applies the PPC model to tourism safety early warning, hoping to obtain more reliable and reasonable results and provide a new methodology.

Establishment of Tourism Safety Early Warning Evaluation Index System
Four main types of threats or risks may affect tourism safety; they relate to (1) nature, (2) civil conflicts, (3) epidemics, or (4) technology failures. Generally speaking, tourism safety early warning is a complex systematic project involving many aspects, such as the political stability of a destination, natural disasters (meteorology, geology, and hydrology), diseases, traffic conditions, and cultural conflicts [17,20–22,26,29,39,40]. Scholars often establish different evaluation index systems based on their own knowledge and experience, and no consensus has been reached. Since this paper focuses on introducing the PPDC model [41,42] into the research on tourism safety early warning and evaluation and on carrying out reliable and correct modeling, it does not discuss how to establish a scientific and comprehensive index system. To compare and analyze the reliability and validity of different methodologies (models), we directly selected the three evaluation index systems and sample data from Refs. [43–45] as the cases for the empirical research. The evaluation index system of Yang et al. [43], which has been used in many studies [29,33–35,44], consists of three aspects (the frequency of destination disasters, the safety of tourism facilities, and the stability of the destination area) and ten indicators, specifically as follows.
(1) The frequency of disasters at a destination: the frequency of hydrometeorological disasters ($x_1$), the frequency of earthquakes and geological disasters ($x_2$), and the frequency of epidemic diseases ($x_3$).
(2) The safety degree of tourism facilities: the service saturation of tourism facilities ($x_4$) and the degree of traffic safety ($x_5$).
(3) The regional stability of a destination: the political stability ($x_6$), the social unemployment rate ($x_7$), the social safety stability ($x_8$), the rate of consumer price index increase ($x_9$), and the potential index of host-guest cultural conflict ($x_{10}$).
Therefore, based on the investigation and statistics of the above indicators for tourist destinations, tourism safety early warning uses reliable and effective modeling methodologies to evaluate the degree of tourism safety objectively, accurately, and scientifically, determine the risk level, issue warnings to relevant parties, and urge the relevant authorities to take measures to avoid hazards.

Criteria for Judging Individual Indicators of Tourism Safety Early Warning and the Collection of Sample Data
The tourism safety warning hierarchy is divided into four levels: green (excellent), blue (good), orange (fair), and red (poor). Yang et al. [43] established the criteria for judging the four levels of individual indicators in tourism safety early warning according to China's actual situation. Refs. [29,33–35,44] adopted the index system and judgment criteria shown in Table 1.

Introduction to the Principle of the Projection Pursuit Classification Model
In 1974, Ref. [37] first proposed the PPC model, which projects high-dimensional, non-linear, and non-normally distributed (small-sized) sample data points into one- to three-dimensional subspaces along different projection directions (spatial angles). The aim is to find the optimal projection direction (an "interesting" direction) that best reflects the structural characteristics and laws of the original high-dimensional sample data, and thereby to obtain the one-dimensional projected value (evaluation value) of the high-dimensional sample data [30,31,37,38]. The basic modeling idea is that all the sample projection points (values) should form several classes (clusters) that are dispersed as much as possible globally, while the sample points within a class are locally as dense as possible. The projection direction that achieves this is the optimal one, and it yields the samples' comprehensive evaluation, ranking, and classification. This basic modeling idea of PPC is consistent with how humans approach comprehensive assessment, sorting, and classification: it is best to divide all samples into several classes so that the differences between classes are as significant as possible (which makes classification easy), while the samples within one class are as similar as possible so that they can be substituted for one another.
According to the basic modeling idea of PPC, Refs. [41,42] proposed and constructed the projection pursuit dynamic cluster (PPDC) model with the following modeling principle.
Initialize the projection vector $\vec{a} = (a(1), a(2), \ldots, a(p))$, where $a(j)$ ($j = 1, 2, \ldots, p$) is the projection vector coefficient (weight) of the $j$th indicator. The projected value of the $i$th sample is then
$$z(i) = \sum_{j=1}^{p} a(j)\, x(i, j), \quad i = 1, 2, \ldots, n,$$
where $x(i, j)$ is the normalized value of the $j$th indicator of the $i$th sample.
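The projection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `normalize` and `project` are hypothetical, min-max scaling is assumed as the linear normalization (the exact formula is not given here), and the toy data are invented rather than taken from Refs. [43–45].

```python
import numpy as np

def normalize(X):
    """Min-max scale each indicator (column) to [0, 1] (assumed normalization)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span

def project(X_norm, a):
    """Projected value z(i) = sum_j a(j) * x(i, j) for every sample i."""
    return np.asarray(X_norm, dtype=float) @ np.asarray(a, dtype=float)

# Hypothetical toy data: 4 samples, 3 indicators (not from the paper).
X = np.array([[90.0, 10.0, 0.8],
              [70.0, 20.0, 0.6],
              [50.0, 30.0, 0.4],
              [30.0, 40.0, 0.2]])
a = np.array([0.6, 0.0, 0.8])  # unit-length weight vector: 0.36 + 0.64 = 1
z = project(normalize(X), a)   # one score per sample
```

Because the score is a plain dot product, each weight $a(j)$ can be read directly as the contribution of indicator $j$ per unit of normalized value.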
Let the samples' projection points $z(i)$ constitute the set $\Phi = \{z(1), z(2), \ldots, z(n)\}$, which is dynamically clustered (classified) into $K$ ($K < n$) categories according to prior knowledge or experience. The dynamic clustering process of the sample points is as follows [41,46–48]:
(1) Set $t(i) = \dfrac{(K-1)\,[z(i) - z_{\min}]}{z_{\max} - z_{\min}} + 1$, where $z_{\max}$ and $z_{\min}$ are the maximum and minimum values of $z(i)$. If $t(i)$ is closest to the positive integer $k$, then $z(i)$ is classified into category $k$ ($1 \le k \le K$). In this way, all sample points $z(i)$ are classified into $K$ categories, recorded as $\Psi^0 = \{\psi^0_1, \psi^0_2, \ldots, \psi^0_K\}$, with the cluster kernels recorded as $M^0 = \{A^0_1, A^0_2, \ldots, A^0_K\}$.
(2) Iteratively calculate the weights $a(j)$ and thus obtain the projected value $z(i)$ of the $i$th sample. All the sample points in $\Phi$ are classified into the $K$ categories according to the principle of proximity (the shortest absolute distance between the point $z(i)$ and the $K$ kernels) and recorded as $\Psi^1 = \{\psi^1_1, \psi^1_2, \ldots, \psi^1_K\}$, with
$$\psi^1_k = \left\{ z(i) \in \Phi \;\middle|\; r\big(A^0_k - z(i)\big) \le r\big(A^0_q - z(i)\big),\ q = 1, 2, \ldots, K,\ q \ne k \right\},$$
where $r\big(A^0_k - z(i)\big)$ and $r\big(A^0_q - z(i)\big)$ are the absolute distances between the sample point $z(i)$ and the cluster kernels $A^0_k$ and $A^0_q$, respectively.
(3) Form the $K$ new cluster kernels $M^1 = \{A^1_1, A^1_2, \ldots, A^1_K\}$ with
$$A^1_k = \frac{1}{n_k} \sum_{z(i) \in \psi^1_k} z(i),$$
where $n_k$ is the number of sample points in $\psi^1_k$.
(4) Repeat steps (2)~(3) to obtain the dynamic clustering result sequence $\Psi^l$, $M^l$ ($l = 1, 2, \ldots$). After the $l$th iteration, the sum of the absolute distances between the intra-category sample points of each category is calculated. It has been theoretically proved that the above iterative process is convergent [41,46–48]; the iteration is terminated once this sum no longer changes by more than $\varepsilon$ between successive iterations, where $\varepsilon$ is a sufficiently small allowable value (or a user-specified threshold).
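The clustering steps described above can be sketched for a fixed set of projected values as follows. The function names are hypothetical, and three choices are assumptions rather than details confirmed by the text: kernels are taken as class means, an empty class keeps its previous kernel, and iteration stops when the intra-category pairwise distance sum changes by no more than `eps`. The example values are the PPDC outputs of samples Z1~Z11 reported in the third case study; in the actual model the clustering runs inside the optimization of the weights rather than once on the final scores.

```python
import numpy as np

def intra_distance(z, labels, K):
    """Sum of |z(i) - z(j)| over all pairs that share a class (each pair once)."""
    total = 0.0
    for k in range(K):
        zk = z[labels == k]
        total += np.abs(zk[:, None] - zk[None, :]).sum() / 2.0
    return total

def dynamic_cluster(z, K, eps=1e-9, max_iter=100):
    """Cluster 1-D projected values into K classes (labels 0..K-1, low to high).

    Assumes the values in z are not all identical.
    """
    z = np.asarray(z, dtype=float)
    # Step (1): t(i) maps z(i) onto [1, K]; the nearest integer is the initial class.
    t = (K - 1) * (z - z.min()) / (z.max() - z.min()) + 1
    labels = np.rint(t).astype(int) - 1
    kernels = np.linspace(z.min(), z.max(), K)  # fallback kernels for empty classes
    prev = np.inf
    for _ in range(max_iter):
        # Step (3): each non-empty class kernel becomes the mean of its members.
        for k in range(K):
            if np.any(labels == k):
                kernels[k] = z[labels == k].mean()
        # Step (2): reassign every point to its nearest kernel.
        labels = np.abs(z[:, None] - kernels[None, :]).argmin(axis=1)
        d = intra_distance(z, labels, K)
        if abs(prev - d) <= eps:  # step (4): stop when the sum stabilizes
            break
        prev = d
    return labels, kernels

# PPDC output values of samples Z1~Z11 as reported in the text.
z_zhu = [2.0869, 1.9932, 1.8932, 1.6993, 1.5498, 1.4716,
         1.3077, 1.2168, 1.1276, 0.8622, 0.7791]
labels, kernels = dynamic_cluster(z_zhu, K=4)
```

With these values the sketch groups Z1~Z3, Z4~Z6, Z7~Z9, and Z10~Z11, which matches the four alarm classes reported for that case.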
The sum of the absolute distances between all pairs of sample points after the iteration is
$$s(\vec{a}) = \sum_{z(i),\, z(k) \in \Phi} |z(i) - z(k)|.$$
The sum of the absolute distances between pairs of intra-category sample points is
$$d(\vec{a}) = \sum_{k=1}^{K} \sum_{z(i),\, z(j) \in \psi_k} |z(i) - z(j)|.$$
The denser the intra-category sample points are, the smaller the value of $d(\vec{a})$ is. The sum of the absolute distances between samples of different categories is
$$sd(\vec{a}) = s(\vec{a}) - d(\vec{a}).$$
The larger $sd(\vec{a})$ is, the more scattered the sample points are on the whole. The quantities $s(\vec{a})$, $d(\vec{a})$, and $sd(\vec{a})$ depend only on the projection vector coefficients and the sample data.
Friedman's initial idea in proposing the PPC model is that the sample points are divided into several categories, with the categories separated from each other as much as possible and the sample points within each category as dense as possible. Following this idea, Ref. [42] put forward the objective function of the PPDC model, Equation (8). Optimizing Equation (8) yields the optimal projection vector, for which the value of $sd(\vec{a})$ is as large as possible and, at the same time, the value of $d(\vec{a})$ is as small as possible. That is to say, the PPDC model theoretically achieves the goal that "the sample points are as scattered as possible as a whole and form several categories, which are separated from each other as much as possible, while the sample points within a category are as dense as possible". Ref. [41] set up the PPDC objective function as $\max[s(\vec{a}) \ldots]$, which only achieves the goal of dispersing the sample points as much as possible on the whole; this is inaccurate and cannot realize Friedman's initial idea. From the perspective of the PPDC modeling process, the core problem is to determine a reasonable number of categories $K$ and select suitable initial clustering kernels. A different number of categories or different initial clustering kernels may lead to different results.
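A small numerical sketch may help fix the three distance sums. The helper names are hypothetical, each pair is counted once (the counting convention is an assumption and only rescales all three quantities), and the four toy scores are invented for illustration.

```python
import numpy as np

def pairwise_sum(z):
    """Sum of |z(i) - z(k)| over all unordered pairs of points in z."""
    z = np.asarray(z, dtype=float)
    return float(np.abs(z[:, None] - z[None, :]).sum() / 2.0)

def scatter_terms(z, labels):
    """Return (s, d, sd): total, intra-category, and inter-category distance sums."""
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    s = pairwise_sum(z)
    d = sum(pairwise_sum(z[labels == k]) for k in np.unique(labels))
    return s, d, s - d

# Hypothetical scores forming two tight, well-separated classes.
s, d, sd = scatter_terms([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
```

Here the six pairwise distances sum to 42, of which only 2 come from pairs inside a class, so nearly all of the scatter is between classes, which is the situation the PPDC objective rewards.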
Equation (8) is a high-dimensional and non-linear optimization problem containing equality and inequality constraints and is challenging to solve. To this end, the authors compiled a MATLAB program based on group search optimization (GSO) [48,49], which has good global convergence performance and fast convergence speed, to solve for the optimal projection vector $\vec{a}$. Because the quantities in Equations (1)–(8) are composed of absolute distances between different sample points, the theorem proposed by Lou and Qiao [38] for judging whether the optimization process has obtained the true globally optimal solution is applicable to the PPDC model.
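To make the shape of this optimization problem concrete, the sketch below searches over unit-length projection vectors. It is not the authors' MATLAB/GSO program: a plain random search stands in for GSO, the objective $sd(\vec{a}) - d(\vec{a})$ is an assumed form (Equation (8) is not reproduced here), the unit-norm constraint is inferred from the reported coefficients, and only the initial $t(i)$ assignment is used for the classes. All function names and the random data are hypothetical.

```python
import numpy as np

def initial_labels(z, K):
    """Initial class of each point from t(i), rounded to the nearest integer."""
    t = (K - 1) * (z - z.min()) / (z.max() - z.min()) + 1
    return np.rint(t).astype(int) - 1

def objective(a, X, K):
    """Assumed objective sd(a) - d(a); not confirmed as the paper's Equation (8)."""
    z = X @ a
    if z.max() == z.min():
        return -np.inf  # degenerate projection: all samples coincide
    labels = initial_labels(z, K)
    pair = np.abs(z[:, None] - z[None, :])
    same = labels[:, None] == labels[None, :]
    d = pair[same].sum() / 2.0    # intra-category distances
    sd = pair[~same].sum() / 2.0  # inter-category distances
    return sd - d

def random_search(X, K, n_trials=2000, seed=0):
    """Stand-in for GSO: sample random unit vectors and keep the best one."""
    rng = np.random.default_rng(seed)
    best_a, best_q = None, -np.inf
    for _ in range(n_trials):
        a = rng.normal(size=X.shape[1])
        a /= np.linalg.norm(a)  # enforce sum_j a(j)^2 = 1
        q = objective(a, X, K)
        if q > best_q:
            best_a, best_q = a, q
    return best_a, best_q

# Hypothetical normalized data: 12 samples, 3 indicators.
X = np.random.default_rng(1).random((12, 3))
a_best, q_best = random_search(X, K=3, n_trials=500)
```

A population-based method such as GSO would replace the independent random draws with guided moves, but the constraint handling and objective evaluation would look much the same.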
The results show that (1) indicator $x_3$ is the most important and has the greatest impact on tourism safety, followed by indicator $x_7$ and then $x_8$, $x_9$, etc.; indicator $x_2$ is the least important. The ratio of the maximum coefficient to the minimum coefficient is 1.88, which shows that all indicators are essential; (2) the projected values from the PPDC model for the three cut-off samples between adjacent safety levels are 2.4694, 2.2207, and 1.9425, respectively. The PPDC model's output values for samples with excellent, good, fair, and poor tourism safety are therefore ≥2.4694, 2.2207 to 2.4693, 1.9425 to 2.2206, and <1.9425, respectively. The model output value ranges of the excellent samples T1~T10, good samples T11~T20, fair samples T21~T30, and poor samples T31~T40 are 2.5169~2.6296, 2.2773~2.3963, 2.0906~2.1802, and 1.6635~1.8971, respectively. Therefore, the safety levels of the 40 samples are all correctly identified.
We input the normalized data of samples T41~T48 into the above-established PPDC model and obtain the output values 2.6062, 2.4215, 2.4102, 2.2951, 2.0826, 1.9954, 1.8514, and 1.6421, respectively. Their tourism safety levels are then readily determined as excellent, good, good, good, fair, fair, poor, and poor. The results are consistent with those of Lou et al. [29], who used the TOPSIS method based on information entropy weighting, and Yang et al. [36], who used the expert method as well as the established BPNN model. The sketch diagram of the output values of T1~T48 and the cut-off values between the adjacent classes is shown in Figure 1.
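The level assignment for the verification samples can be reproduced with a simple threshold lookup. The cut-off values and output values below are those reported in the text; the function name `safety_level` and the use of Python's `bisect` module are illustrative choices, not part of the original program.

```python
import bisect

# Cut-off values between poor/fair, fair/good, and good/excellent (from the text).
CUTOFFS = [1.9425, 2.2207, 2.4694]
LEVELS = ["poor", "fair", "good", "excellent"]

def safety_level(z):
    """Map a PPDC output value to its safety level; a cut-off value belongs to the better level."""
    return LEVELS[bisect.bisect_right(CUTOFFS, z)]

# PPDC output values of the verification samples T41~T48 (from the text).
outputs = [2.6062, 2.4215, 2.4102, 2.2951, 2.0826, 1.9954, 1.8514, 1.6421]
levels = [safety_level(z) for z in outputs]
```

The resulting list reads excellent, good, good, good, fair, fair, poor, poor, in agreement with the levels stated above.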

Empirical Study with Sample Data by Wang and Li [44]
Ref. [44] established an evaluation index system consisting of ten indicators, namely the service saturation of tourism facilities ($x_1$), the administration's ability ($x_2$), the hydrometeorological safety degree of a destination ($x_3$), the frequency of geological disasters ($x_4$), the social unemployment rate ($x_5$), the social safety stability ($x_6$), the safety degree of traffic ($x_7$), the frequency of epidemic diseases ($x_8$), the rate of consumer price index increase ($x_9$), and the potential index of host-guest cultural conflict ($x_{10}$). Ref. [44] collected 13 samples in total, denoted as W1~W13, in which samples W1, W5, and W9 are no-alarm, W2, W3, and W8 are light-alarm, W4, W7, and W10 are medium-alarm, W6 is serious-alarm, and W11~W13 are the verification samples, as shown in Table 3. The negative indicators $x_3 \sim x_5$ and $x_8 \sim x_{10}$ are first positively preprocessed according to $(100 - x)$ and then linearly normalized. We input the first ten samples' data into the PPDC model program compiled by the authors, take the number of classifications $K = 4$, and obtain the optimal projection vector and its coefficients $\vec{a} = (a(1), a(2), \ldots, a(10)) = (0.5441, 0.3036, 0.2394, 0.2554, 0.2577, 0.3174, 0.3583, 0.2829, 0.0410, 0.3348)$, the objective [...] and 2.0775, respectively. The clustering results of the ten samples are three no-alarm samples (W1, W5, and W9), three light-alarm samples (W2, W3, and W8), three medium-alarm samples (W4, W7, and W10), and one serious-alarm sample (W6). Therefore, the tourism safety levels judged through the PPDC model for the ten samples are all correct. The output value ranges of the no-alarm, light-alarm, and medium-alarm samples are 2.6831~2.7232, 2.3347~2.4092, and 2.0765~2.1419, respectively, and the output value of the serious-alarm sample is 1.6936. The PPDC model output values of samples W11~W13 are 2.4073, 1.6703, and 2.6412, respectively. It is easy to judge the samples W11~W13 as light-alarm, serious-alarm, and no-alarm. The judged results are entirely consistent with the expert opinion as well as the results using the BPNN model in Ref. [44].
The results show that indicator $x_1$ (the service saturation of tourism facilities) is the most important, followed by indicator $x_7$ (the safety degree of traffic), and that indicator $x_9$ (the rate of consumer price index increase) is the least important.
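The importance ranking can be checked directly from the reported coefficients. The snippet below is an illustrative sketch: the weight vector is the one reported for this case, while the variable names are hypothetical. It also checks that the squared coefficients sum to approximately one, which is consistent with a unit-length constraint on the projection vector.

```python
import numpy as np

# Optimal projection vector coefficients a(1)..a(10) reported for this case.
a = np.array([0.5441, 0.3036, 0.2394, 0.2554, 0.2577,
              0.3174, 0.3583, 0.2829, 0.0410, 0.3348])
names = [f"x{j}" for j in range(1, 11)]

# Rank indicators by the absolute value of their weight, largest first.
order = np.argsort(-np.abs(a))
ranking = [names[j] for j in order]

# Squared norm of the weight vector (close to 1 up to rounding of the reported values).
norm_sq = float(np.sum(a ** 2))
```

The ranking starts with x1 and x7 and ends with x9, matching the ordering stated above.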

The Empirical Study with Sample Data by Zhu et al. [45]
Zhu et al. [45] established a different evaluation index system from those of Refs. [43,44]. The system consists of 19 indicators, namely administration ability ($x_1$), political stability ($x_2$), social unemployment rate ($x_3$), the rate of consumer price index increase ($x_4$), the potential index of host-guest cultural conflict ($x_5$), social safety stability ($x_6$), the safety degree of land transportation ($x_7$), the safety degree of waterborne transportation ($x_8$), the safety degree of aviation ($x_9$), the safety degree of tourism facilities ($x_{10}$), the occurrence frequency of fire ($x_{11}$), food hygiene qualification rate ($x_{12}$), the utilization rate of scenic spot facilities ($x_{13}$), hotel occupancy rate ($x_{14}$), the safety degree of hydrology ($x_{15}$), the safety degree of meteorology ($x_{16}$), the safety degree of seismology ($x_{17}$), the safety degree of geology ($x_{18}$), and the frequency of epidemic diseases ($x_{19}$).
Zhu et al. [45] collected 12 samples, denoted as Z1~Z12, of which samples Z1~Z11 are modeling samples used to optimize the PPDC model, and Z12 is a verification sample. Since Zhu et al. [45] did not specify the nature of the evaluation indicators, in this paper we judged their natures according to Ref. [43] and the indicator characteristics. The indicators $x_3$, $x_4$, $x_5$, $x_{12}$, $x_{15}$, $x_{16}$, $x_{17}$, $x_{18}$, and $x_{19}$ are negative; the maximum values of indicators $x_3$, $x_4$, and $x_5$ are 0.30, 0.30, and 0.50, and that of the other negative indicators is 1.0. The positively transformed data of the collected samples are shown in Table 4. The positively transformed data are directly input into the PPDC program compiled by the authors. We take the number of classifications $K = 4$ and obtain the optimal projection vector and its coefficients (or weights) [...]. The output values of samples Z1~Z11, according to the established PPDC model, are 2.0869, 1.9932, 1.8932, 1.6993, 1.5498, 1.4716, 1.3077, 1.2168, 1.1276, 0.8622, and 0.7791, respectively. The ranking of the 11 samples is consistent with that of the expert opinion and that obtained through the established BPNN model in Zhu et al. [45]. The PPDC model dynamically clusters the above 11 samples into no-alarm (samples Z1~Z3), light-alarm (samples Z4~Z6), medium-alarm (samples Z7~Z9), and serious-alarm (samples Z10~Z11). We input the linearly normalized data of Z12 into the above-established PPDC model and obtain the output value 1.8349. Comparing this with the above output values, we can judge that the risk of sample Z12 is greater than that of Z3 and smaller than that of Z4. Comparing the model output value ranges of the above four risk levels, we can quickly determine that sample Z12 is no-alarm according to the proximity principle. This result is entirely consistent with the expert opinion in Zhu et al. [45]. From the weights of $x_3 \sim x_5$, $x_{12}$, and $x_{15} \sim x_{19}$ being less than 0, we can conclude that these indicators are either negative indicators (e.g., $x_3 \sim x_5$) or have a significant negative correlation with other positive indicator data even though the indicators themselves (e.g., $x_{12}$ and $x_{15} \sim x_{19}$) are undoubtedly positive. At the same time, the indicator $x_{11}$ (the occurrence frequency of fire) should be a negative indicator, but its weight is greater than 0 because of its significant positive correlation with other positive indicators (such as the administration's ability and social safety stability). This result is surprising and may indicate errors in the sample data.
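The proximity judgment for the verification sample Z12 can be sketched as a nearest-range lookup. The class memberships and output values are those reported above; measuring proximity as the distance to each class's output value range is an assumption about how the principle is applied, and the function names are hypothetical.

```python
# PPDC output values of the modeling samples, grouped by clustered class (from the text).
CLASS_OUTPUTS = {
    "no-alarm": [2.0869, 1.9932, 1.8932],       # Z1~Z3
    "light-alarm": [1.6993, 1.5498, 1.4716],    # Z4~Z6
    "medium-alarm": [1.3077, 1.2168, 1.1276],   # Z7~Z9
    "serious-alarm": [0.8622, 0.7791],          # Z10~Z11
}

def distance_to_range(z, values):
    """Distance from z to the closed interval spanned by a class's output values."""
    lo, hi = min(values), max(values)
    if lo <= z <= hi:
        return 0.0
    return min(abs(z - lo), abs(z - hi))

def nearest_class(z, class_outputs=CLASS_OUTPUTS):
    """Return the class whose output value range lies closest to z."""
    return min(class_outputs, key=lambda name: distance_to_range(z, class_outputs[name]))

z12_level = nearest_class(1.8349)  # output value of verification sample Z12
```

Z12 lies about 0.058 below the no-alarm range and about 0.136 above the light-alarm range, so the lookup returns no-alarm, as stated in the text.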
The sketch diagram of the weights is shown in Figure 2. The absolute value of a weight determines the indicator's importance: the greater the absolute value, the more important the indicator. From Figure 2, we can conclude that the hotel occupancy rate ($x_{14}$) is the most important indicator, followed by the utilization rate of scenic spot facilities ($x_{13}$) and the occurrence frequency of fire ($x_{11}$). The social unemployment rate ($x_3$) is the least important indicator.

Analysis of the PPDC Modeling Results of the Data [43–45]
Refs. [43–45] have 10, 10, and 19 evaluation indicators and 48, 13, and 12 samples, respectively. The three cases are therefore all small-sample evaluation problems and cannot be modeled by PCA, FA, BPNN, etc. [28,50–53]. Still, non-linear methods, such as TOPSIS and GRA, can evaluate them based on information entropy and variation coefficient weighting. The verification results of the PPDC models established in this paper show that (1) the results judged by the PPDC models are entirely consistent with those of the expert-judgment-based studies [43–45] and those obtained with the TOPSIS method based on information entropy weighting [29]; (2) the results judged for the eight verification samples in Yang et al. [43] are entirely consistent with the results of Lou et al. [29] and Luo [33–35], except that sample T42 differs by one level (the result drawn in this paper is more reasonable; please refer to Lou et al. [29] for the specific analysis), and the results of the other samples are the same as those of the expert-judgment-based Yang et al. [43]. The judgment results for Wang and Li [44] and Zhu et al. [45] are consistent with those obtained through the expert method. The judgment results for Zhu et al. [45] are consistent with those of [29]; (3) the ranking of the 11 modeled samples by tourism risk for Zhu et al. [45] is consistent with the expert opinions.

The Evaluation Indicators Affecting the Levels of Tourism Safety Risk and the Importance of the Three Major Aspects
We take the modeling results of Yang et al. [43] as an example. Among the ten evaluation indicators in Ref. [43], the indicator $x_3$ (the frequency of epidemic diseases) has the most significant impact on tourism safety, followed by the unemployment rate ($x_7$), and then, in sequence, social safety stability ($x_8$), the rate of CPI increase ($x_9$), political stability ($x_6$), traffic safety ($x_5$), the potential index of host-guest cultural conflict ($x_{10}$), service saturation of tourism facilities ($x_4$), the frequency of hydrometeorological disasters ($x_1$), and the frequency of earthquakes and geological disasters ($x_2$). Of the indicators, the first three are significantly more important than the others, while the last three are considerably less important. Of the three major aspects, the regional stability of a destination is the most important, accounting for 55.6% of the total weight, followed by the frequency of destination disasters (25.5%) and the safety of tourism facilities (18.9%). Therefore, when studying a destination's tourism safety and early warning, it is necessary first to examine the frequency of epidemic outbreaks, such as plague, in the destination and second to examine factors affecting residents' lives and political stability, such as the unemployment rate, social safety stability, and the rate of CPI increase. In contrast, the safety of tourism facilities is relatively less important.

Measures and Suggestions to Reduce Tourism Safety Risks
There is a linear relationship between the sample's projected value (the destination tourism safety score) $z(i) = \sum_{j=1}^{p} a(j)\, x(i, j)$ and each evaluation indicator. Therefore, on the whole, improving an indicator with a larger weight has a more obvious effect on reducing tourism safety risks and usually yields twice the result with half the effort, whereas improving an indicator with a small weight yields half the result with twice the effort. However, for a specific tourist destination, specific problems should be analyzed accordingly. In particular, we should analyze the indicators with the larger weights, identify the weaker ones with lower scores, and take targeted measures to improve them. This is an effective way to reduce tourism risk. If measures are blindly taken to improve indicators that already perform well and are difficult to improve further, then, in practice, a large investment would lead to only a minor improvement.
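A short worked example illustrates the linear relationship. It assumes that all other indicators are held fixed and that a single normalized indicator can be raised by 0.1; the two weights are the largest and smallest coefficients reported for the Wang and Li [44] case, and the size of the change is chosen only for illustration.

```latex
% Effect on the score of changing one normalized indicator, all others fixed:
\Delta z(i) = a(j)\,\Delta x(i,j)

% Largest weight in the Wang and Li case, a(1) = 0.5441, with \Delta x = 0.1:
\Delta z = 0.5441 \times 0.1 \approx 0.0544

% Smallest weight in the same case, a(9) = 0.0410, with \Delta x = 0.1:
\Delta z = 0.0410 \times 0.1 = 0.0041
```

Under these assumptions, the same improvement raises the safety score about 13 times more when applied to the service saturation of tourism facilities than when applied to the rate of consumer price index increase.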

Characteristics of the PPDC Model, Comparison with Other Conventional Evaluation Methods, and the Applicability Analysis
We conclude that the following features characterize the PPDC model for tourism safety evaluation and early warning: (1) Equation (8) has a clear mathematical meaning: the samples are divided into $K$ classes, the samples in different classes are separated as much as possible, and the samples within a class are as dense as possible. No other clustering methods can achieve this goal; (2) The PPDC model is simple, its modeling process is not overly complicated, and it is convenient for subsequent applications; (3) It has sufficient accuracy; (4) For the given sample data, a unique deterministic model can be established without ambiguity; (5) The PPDC model is particularly suitable for tourism safety evaluation and early warning with high-dimensional, non-linear, non-normally distributed, and small-sample data; (6) According to the results of the PPDC model, we can directly judge the importance of each evaluation indicator and thus put forward suggestions and measures to reduce tourism safety risks. The advantages and disadvantages of various comprehensive evaluation methods are discussed and analyzed below.
(1) PPDC model. From a methodological point of view, the idea of the PPDC model is entirely consistent with the way human beings think about comprehensive evaluation. Hence, the PPDC model is very suitable for evaluation, classification, and ranking in tourism safety evaluation and early warning. First, the PPDC model can be applied to modeling with high-dimensional, non-linear, non-normally distributed sample data, particularly small samples. In contrast, PCA and FA are only suitable for cases in which the evaluation indicators have a strong linear correlation (or KMO is greater than 0.60) and obey a normal distribution. The PPDC model also overcomes the "curse of dimensionality" that affects the PCA, FA, and BPNN models, which are mainly suitable for large sample data [28,48,51,53,54]. Second, according to the theorem proposed by Lou and Qiao [31], it is easy to judge whether the optimization process has obtained the true globally optimal solution, and the modeling difficulty is moderate.
[...] [46] are given and discussed in detail by Lou et al. [29]. Although Refs. [43–45] all reported that they obtained the correct results, the established BPNN models still have no generalization ability, and their results are not reliable and accurate.
(3) TOPSIS and GRA models based on information entropy (Lou et al. [29]), variation coefficient [62], and other weighting methods. The rationality of the results of these models directly depends on the rationality of the weighting, which is one of the critical elements of comprehensive evaluation. Moreover, the TOPSIS and GRA methods are non-linear evaluation models and cannot be used to judge the relative importance and ranking of the evaluation indicators directly. Furthermore, TOPSIS and GRA models cannot be directly applied to clustering research, and other methods must be used to classify the results. The ranking of the verification samples T41~T48 through the PPDC model is entirely in agreement with that of TOPSIS [29].
(4) Variable fuzzy set (VFS) method [26–28]. The VFS method requires the weights and relative membership function of each evaluation indicator to be determined in advance. It is a non-linear evaluation method with inherent ambiguity and cannot be used to judge the importance and ranking of the evaluation indicators directly. The rankings of sample T42 in Luo [33,34] are inconsistent.
Therefore, PCA, FA, and BPNN cannot be used to model the tourism safety evaluation and early warning problem with small samples. The PPDC model has better characteristics than the TOPSIS and GRA models based on information entropy or variation coefficient weighting and should be used preferentially in research on tourism safety evaluation and early warning.

Conclusions
Tourism has become a pillar industry in China and in its provinces and cities, such as Beijing, Shanghai, Guizhou, and Yunnan. The sustainable development of the tourism industry is one of the critical signs that a country or region has achieved sustainable and high-quality economic and social development. It can not only drive employment and protect the ecological environment and historical sites, but also inherit and develop excellent traditional culture, strengthen exchanges and mutual learning between different cultures, and build a community with a shared future for humanity. The development of the tourism industry is influenced by various factors such as the natural environment, transportation, and weather. Tourism safety evaluation and early warning are the prerequisites and foundation for implementing sustainable development strategies and formulating policies in the tourism industry. They significantly impact the sustainable development of the local tourism industry.
The modeling idea of the PPDC model agrees with the way human beings think about comprehensive evaluation. The PPDC model realizes the following goal: the sample points are as scattered as possible as a whole and form several clusters (classes), which are separated from each other as much as possible, while the samples within a class are as dense as possible. It is characterized by a clear mathematical meaning and a linear relationship between the destination tourism safety degree and each evaluation indicator, and it is highly convenient for subsequent applications. The PPDC model can be used for modeling with high-dimensional, non-linear, and non-normally distributed data and can overcome the "curse of dimensionality". It has unique advantages in comprehensive evaluation, ranking, and classification research with small samples, such as tourism safety evaluation and early warning. It has better objectivity and versatility than other models. The case results show that the results judged through the PPDC model are entirely consistent with those judged through expert opinion and those obtained through the TOPSIS method based on information entropy weighting.
The BPNN model is mainly suitable for large samples, and a model with good generalization ability and practical value can be established only by following the required principles and steps. Establishing a BPNN model with good generalization ability is impossible with the small- and medium-sized samples typical of tourism safety evaluation and early warning. Although the TOPSIS, GRA, and VFS models can achieve reasonably good results, they suffer from non-linearity, inconvenience for subsequent application, and the need for information entropy or variation coefficient weighting. They cannot be used to determine the importance and ranking of indicators. Therefore, with a small sample, as in tourism safety early warning research, the PPDC model should preferably be used.
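
The sample-size limitation of the BPNN model can be checked with simple arithmetic. The following is a minimal sketch, assuming a fully connected feed-forward network with one bias term per non-input node; the function names `count_connection_weights` and `meets_rule_of_thumb` are hypothetical and introduced only for illustration. It counts the connection weights of a given topology and applies the rule of thumb, cited in the discussion, of at least five training samples per connection weight.

```python
def count_connection_weights(layers):
    """Count weights and biases of a fully connected feed-forward network.

    `layers` lists the node counts per layer, e.g. (19, 14, 1).
    Assumption: every non-input node has exactly one bias term.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))


def meets_rule_of_thumb(n_samples, layers, factor=5):
    """Return True if there are at least `factor` samples per connection weight."""
    return n_samples >= factor * count_connection_weights(layers)


if __name__ == "__main__":
    # Topologies and sample sizes as reported in the discussion of Refs. [43-45].
    for layers, n_samples in [((19, 14, 1), 48), ((10, 5, 4), 10)]:
        n_weights = count_connection_weights(layers)
        verdict = "met" if meets_rule_of_thumb(n_samples, layers) else "not met"
        print(f"topology {layers}: {n_weights} weights, "
              f"{n_samples} samples, rule of thumb {verdict}")
```

Under these assumptions, the weight counts agree with the 295 and 79 connection weights reported for the 19-14-1 and 10-5-4 topologies, and neither sample size comes close to five samples per weight.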

$\vec{a}$ is, the more scattered the sample points are on the whole. The above $s(\vec{a})$, $d(\vec{a})$, and $sd(\vec{a})$ depend only on the projection vector coefficients and the sample data.
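
To illustrate how such quantities depend only on the projection vector and the sample data, the following is a minimal sketch and not the authors' implementation. It assumes one common PPDC formulation in which the overall scatter `s` is the sum of pairwise distances between all projected values, the within-class density term `d` is the sum of pairwise distances inside each cluster, and the index to maximize is `s - d`. These definitions, the simple one-dimensional dynamic clustering with evenly spaced initial kernels, and all function names are assumptions. The third quantity (sd) is not modeled, and the group search optimization step that would search for the best projection vector is omitted.

```python
import numpy as np


def projection_values(X, a):
    """Project samples onto the direction a (normalized to unit length)."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)
    return np.asarray(X, dtype=float) @ a


def dynamic_clusters(z, k, iters=100):
    """1-D k-means style dynamic clustering of the projected values.

    Assumption: initial kernels are spread evenly between min(z) and max(z).
    """
    centers = np.linspace(z.min(), z.max(), k)
    labels = np.zeros(len(z), dtype=int)
    for _ in range(iters):
        # Assign each sample to its nearest kernel.
        labels = np.argmin(np.abs(z[:, None] - centers[None, :]), axis=1)
        # Move each kernel to the mean of its members (keep it if empty).
        updated = np.array([
            z[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def pairwise_abs_sum(z):
    """Sum of |z_i - z_j| over all unordered pairs."""
    return np.abs(z[:, None] - z[None, :]).sum() / 2.0


def ppdc_index(X, a, k):
    """Return (s - d, projected values, cluster labels) for direction a."""
    z = projection_values(X, a)
    labels = dynamic_clusters(z, k)
    s = pairwise_abs_sum(z)                                      # overall scatter
    d = sum(pairwise_abs_sum(z[labels == j]) for j in range(k))  # within-class
    return s - d, z, labels
```

In this sketch a larger index means that the clusters are farther apart while the samples within each cluster lie closer together; an optimizer would vary `a` to maximize it.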

Figure 1. The sketch diagram of the output values of the T1-T48 samples and the cut-off values between the adjacent classes. Notes: The lines E-G, G-F, and F-P denote the cut-off values between the excellent and the good, the good and the fair, and the fair and the poor, respectively. The ordinate represents the output value of the samples, and the abscissa represents the sample number.


Figure 2. The sketch diagram of the weights of the 19 indicators. Note: The ordinate represents the weights, and the abscissa represents the 19 indicators.

Table 1. The standard ranges of individual indices in tourism safety early warning and the experts' judged alarming modes.

Table 2. The positively transformed data of all 48 samples denoted by T1~T48.

Table 3. The 13 samples with the safety levels judged by the PPDC model.


Table 4. The positively transformed data of the 12 collected samples with 19 indicators and the expert opinion (denoted y).
[43][44][45][61][59][60] characterized by clear mathematical meaning, a linear relationship between the sample projected value (tourism safety score) and the evaluation indicators, and being highly convenient for subsequent applications, facilitating the direct determination of the importance and ranking of each evaluation indicator. Fourth, in the optimization process of the PPDC model, the weights of the evaluation indicators and the projected values of the samples are obtained simultaneously. There is no need to determine the indicator weights and the expected sample values in advance by other methods, as the BPNN model requires, which avoids the influence of human factors to the greatest extent. Fifth, the empirical research shows that the results of the PPDC model are in good agreement with expert opinions, indicating that the PPDC model is sufficiently accurate. Sixth, expert knowledge or decision-maker preference can be integrated into the constraints to establish a PPDC model based on decision-maker preference. In contrast, none of the other comprehensive evaluation methods has this trait. Of course, when building a PPDC model, a suitable number of classes and reasonable initial clustering kernels should be determined.

(2) BPNN model. Artificial neural networks, particularly the BPNN model, are the most popular machine learning algorithms for risk assessment and safety early warning [55][56][57][58][59][60]. Establishing a BPNN model with generalization ability and practical value requires following the required principles and steps [28,51,53,54,61]. First, the BPNN model is only suitable for modeling with large sample data, and it suffers from the "curse of dimensionality". Second, the reasonable number of hidden layer nodes and other parameters must be determined manually by trial and error, as must the judgment of whether over-training or other phenomena occur during training, and this procedure is highly subjective. For the given sample data, training a BPNN model twice does not yield exactly the same model, which indicates that the result is not unique. Third, the BPNN model, as a non-linear model, cannot be used directly to judge the relationship between the model results and the evaluation indicators, to judge the importance of the indicators, or to propose measures and suggestions for reducing tourism safety risks, which greatly reduces the theoretical and practical significance of tourism safety evaluation and early warning. Therefore, the BPNN model should be used only when other methods cannot achieve, or have difficulty achieving, satisfactory results and the conditions for BPNN modeling are met. Ref. [51] put forward a rule of thumb: to establish a reliable and effective BPNN model, aim to have at least five times as many cases (training samples) as connection weights in the network, and preferably ten times as many [61]. The numbers of samples in [43][44][45] are 48, 10, and 10, and the network topologies used are 19-14-1 (295 connection weights), 10-5-4 (79 connection weights), and 10-5-4, respectively. Because no validation samples are used to monitor the training process, it is impossible to judge whether over-training has occurred. If over-training occurs, the established model has no generalization ability or practical value, even if the error of the training dataset is minimal and the RMSE of the test dataset happens to be small. Scholars should pay more attention to this problem. The number of connection weights is clearly greater than the number of samples in Refs. [43-45]. The rule of thumb is not obeyed, and the BPNN models established in Refs. [43-45] cannot have generalization ability and practical value [51,53,54,61]. The possible results in establishing a BPNN model with the data used by Yang et al.