Predicting Foreign Tourists for the Tourism Industry Using Soft Computing-Based Grey – Markov Models

Accurate prediction of foreign tourist numbers is crucial for each country to devise sustainable tourism development policies. Tourism time series data often have significant temporal fluctuation, so Grey–Markov models based on a grey model with a first order differential equation and one variable, GM(1,1), can be appropriate. To further improve prediction accuracy from Grey–Markov models, this study incorporates soft computing techniques to estimate a modifiable range for a predicted value, and determine individual state bounds for the Markov chain. A new residual value is formulated by summing the transition probability matrices with different steps. The proposed grey prediction model was applied to foreign tourist forecasting using historical annual data collected from Taiwan Tourism Bureau and China National Tourism Administration. The experimental results show that the proposed grey prediction model performs well in comparison with other Grey–Markov models considered.


Introduction
Development of the global tourism industry has contributed significantly to economic flourishing for a country.In 2016, the World Travel and Tourism Council estimated a 3.1% growth rate for the tourism industry, which was larger than the estimated global GDP growth (2.3%).The global tourism industry contributes significantly to employment, providing 107.83 million jobs or 3.6% of total employment in 2015, and will account for 135.88 million jobs by 2026.By 2026, capital investment is estimated to be USD 1254.2 billion, with international tourist arrivals expected to be 1.93 billion, generating expenditure of USD 2056.0 billion.Thus, the foreign visitors expenditure contributes much to the global tourism industry.
Accurate prediction of foreign tourist numbers has become crucial for governments to be able to set up relevant sustainable tourism development and marketing strategies to promote the tourism industry.National authorities should carefully consider the changing number of foreign tourists.The variety of international tourism has raised a challenging task for foreign tourist prediction [1].Grey prediction models [2] have drawn much attention because they can characterize an unknown system from limited data [3][4][5], without requiring conformance to statistical assumptions, such as normal distributions.The widely used grey model with a first order differential equation and one variable, GM(1,1), for example, can be set up using only four recent sample data points [6][7][8][9][10][11][12][13].
A residual model is often constructed to improve GM(1,1) prediction [3,14] where the predicted residuals can be used to adjust GM (1,1) predicted values.Many studies have demonstrated the Grey-Markov model, denoted by MCGM (1,1), can significantly improve prediction accuracy over the original GM (1,1) [1,[15][16][17][18][19][20].MCGM (1,1) uses GM (1,1) to identify the trend of historical data, and then applies the Markov chain to correct the residuals.Hsu and Wen [21] and Hsu [22] used Markov chain sign estimation to modify residuals for the air passenger market and global integrated circuit industry, respectively.Hsu et al. [16] proposed a Markov Fourier model to forecast stock market turning time.Kumar and Jain [18] applied MCGM (1,1) to predict conventional energy consumption.Li et al. [19] combined the regression model with Markov chain for thermal electric power generation.Mao and Sun [23] applied MCGM(1,1) for fire accident prediction.Sun et al. [1] proposed a MCGM(1,1) variant using the Cuckoo search algorithm for foreign tourist arrival prediction, and Wang [24] showed MCGM(1,1) effectiveness for tourism demand prediction.Xie et al. [20] proposed a novel Markov model to estimate the probability that one energy component could transit to another energy component.
The combination of grey prediction and soft computing can better represent system dynamics with uncertainty and nonlinearity [16].Thus, this paper concentrates on building an effective MCGM(1,1) model multi-step transitions for predicting foreign tourist numbers, incorporating soft computing.The proposed model explicitly considered the following issues.First, the upper and lower bounds of individual states for the Markov chain are usually required to be known in advance.However, these troublesome bounds are not easy to determine beforehand.Therefore, we adopted the genetic algorithm (GA), a powerful optimization method [25][26][27][28], to maximize prediction accuracy for the MCGM(1,1) model.
Second, the original MCGM(1,1) usually uses the whole of a Markov chain predicted residual to modify a predicted value from GM(1,1).However, this restriction could impact on MCGM(1,1) prediction accuracy.Therefore, we require a modifiable range rather than using the whole range.Since this involves a connection between time periods and modifiable ranges, we employed the functional link net (FLN) with effective function approximation [29][30][31][32] to estimate a modifiable range for a time period.Finally, the developing coefficient and control variable associated with the original GM(1,1) are usually determined by the background value.However, this background values is difficult to determine accurately.On the other hand, neural network based GM(1,1), denoted by NNGM(1,1), [14,33] using a single-layer perceptron (SLP) can avoid the requirement for the background value.Therefore, the current study incorporated NNGM(1,1) into MCGM (1,1).Therefore, this study proposes a novel soft computing based MCGM(1,1) (SC-MCGM(1,1)).
The remainder of the paper is organized as follows.Section 2 introduces the NNGM(1,1) model and Section 3 presents the proposed SC-MCGM(1,1) model.Section 4 describes the construct of the proposed model using GA, and Section 5 examines the model performance using two real cases of foreign tourist forecasting.Section 6 provides our conclusions.This paper is concluded with Section 6.

NNGM(1,1) for Generating Predicted Values
Using the accumulated generating operation (AGO) [3], a new sequence x (1) = (x n ) can be generated from an original data sequence x (0) = (x and x (1) 2 , . . .,x n are approximated by the first-order differential equation, where a is the developing coefficient and b is the control variable.AGO also helps identify regularity hidden in data sequences, even if the collected data are finite, insufficient, and chaotic.
The predicted value, x(1) k , associated with x k can be derived from the differential equation with the initial condition x (1) 1 holds.The predicted value of x k can be obtained by using the inverse accumulated generating operation, Therefore, where a and b can be estimated from the grey difference equation where z k is the background value.However, z k is not easily determined.Therefore, to obtain a and b without requiring z k , an NNGM(1,1) model was established using a single layer perceptron (SLP) accompanied by the cost function where a and b serve as connection weights for the SLP, and the learning rules can be easily derived by using the gradient descent method on E(a,b).For further NNGM(1,1) details, the reader is referred to [33].

Generating Transition Probability Matrices
For the proposed residual modification model, we applied the Markov chain to modify the residuals produced by the NNGM (1,1).Let ε = (ε 1 , ε 2 , . . .,ε n ) denote the sequence of residual values obtained from training data, where [ε min , ε max ] denotes the residual range, where ε min and ε max are the minimum and maximum values of ε k , respectively, and [ε min , ε max ] can be divided into r intervals (r ≥ 2) with each interval treated as a state.The actual state of ε k , denoted by s k , can be determined depending on where it locates.r − 1 partition points, p 1 , p 2 , . . ., p r−1 , can be defined for r intervals, where Subsequently, an m-step transition probability matrix P (m) can be generated from training patterns as where p (m) ij represents the transition probability of going from state i to state j (1 ≤ i, j ≤ r) by m steps, where represents the number of transitions of going from state i to state j by m steps; and t i represents the amount of state i among the sequence of relative errors.p ii can be specified directly as 1 when the sum of elements in the row i equals zero.In other words, such a state is treated as an absorbing state.

Determining Centers for Individual States
Sun et al. [1] recommended that the number of intervals can be defined by Sturge formula [34]: Let c w (1 ≤ w ≤ r) be the representative point of state w, whose lower and upper bounds are l w and u w , respectively.For convenience, c w is traditionally formulated as Nevertheless, it was more reasonable to formulate c w as [1,19] where 0 ≤ α w ≤ 1.

Computing Predicted Residual Values
Let s denote the state that corresponds to m transitions ahead of s k associated with ε k .To determine the predicted state ŝk for time period k, at most k-1 (k ≥ 2) transitions ahead can be considered.The actual states including s k , . . ., and s For instance, to determine ŝ3 , one and two transition steps from s 2 and s 1 , respectively, can be used for two transitions ahead of ŝ3 (i.e., m = 2), and s 2 and s 1 are s

and s
(2) 3 , respectively.Then, ŝ3 can be simply determined from s 2 and s 1 even though m > 2.

Let p s
) denote the row vector in P (m) corresponding to s where 1 ≤ l ≤ r.Otherwise, To explain v kl as the degree in [0,1] to which εk locates in state l, Traditionally, state l can be directly assigned to ŝk associated with the predicted residual value εk [29,34] when Then, state l is the reachable state with the maximum likelihood for εk .In such a case, εk just equals c l .However, in addition to c l , c i (i = l) can also contribute to εk if v ki = 0. Therefore, considering the contribution from different representative points for εk , v kl can be normalized as and εk can becomes εk

FLN for Determining New Predicted Values
We apply εk obtained from the Markov chain to adjust x(0) k , calculating the new predicted value as where y k ranges from −1 to 1 and can be interpreted as the degree to which x(0) k can be adjusted.That is, if y k is positive, then larger y k means it is more likely that x(0) k will be adjusted toward How to obtain y k is left to FLN.Let t k ∈ denote the time period k with respect to x(0) k .For one variable x, a FLN with functional-expansion expansion like {x, sin(πx), cos(πx), sin(2πx), cos(2πx),...} is effective to approximate a nonlinear function associated with x [29,30,35].In principle, the components in the functional expansion representation can be unrestrictedly extended for x, but this is not practical.However, (t k , sin(πx), cos(πx), sin(2πx), cos(2πx), sin(4πx))) with respect to x is acceptable [30,31].Using this pattern, the corresponding actual output value can be obtained from the output node as y k = tanh(w 1 t k + w 2 sin(πt k ) + w 3 cos(πt k ) + w 4 sin(2πt k ) + w 5 cos(2πt k ) + w 6 sin(4πt k ) + θ) (21) where w 1 , w 2 , w 3 , w 4 , w 5 , and w 6 are connection weights; tanh denotes the hyperbolic tangent function; and θ is the bias to the output node.

A Genetic Algorithm for Constructing the SC-MCGM(1,1)
The problem of constructing the SC-MCGM(1,1) with high prediction accuracy can be formulated as maximizing the reciprocal of MAPE for training data, where TS denotes training or testing data.To minimize MAPE, a real-valued GA was developed to automatically determine 6 + 2r parameters that are not easily directly accessed, including the connection weights (w 1 , w 2 , w 3 , w 4 , w 5 , w 6 ), bias (θ), partition points (p 1 , p 2 , . . ., p r−1 ), and relative weights in respective intervals (α 1 , α 2 , . . .,α r ) for the proposed SC-MCGM(1,1) model, where w 1 , w 2 , w 3 , w 4 , w 5 , w 6 , and θ range from −1 to 1, and p 1 , p 2 . . .,p r−1 range from ε min to ε max .For the current population, the best solution was the chromosome with the maximum fitness value.
Let n size and n max denote the population size and maximum number of generations (i.e., GA terminated after n max generations), respectively.Each of the populations consists of n size strings.After evaluating the fitness value of each chromosome in P m , n size new chromosomes were generated for P m+1 by selection, crossover, and mutation.Crossover and mutation reproduce children for a selected parent by changing the parents' chromosomal makeup.When the stopping condition was satisfied, the algorithm terminated, and the best chromosome is the one with maximum fitness value among consecutive generations.This best case can then be used to examine the SC-MCGM(1,1) model generalization.

Selection
Let P m denote the population in generation m (1 ≤ m ≤ n max ), where chromosome u (1 ≤ u ≤ n size ) produced in P m is represented as w m u,1 w m u,2 w m u,3 w m u,4 w m u,5 w m u,6 θ m u p m u,1 p m u,2 . . .p m u,r−1 α m u,1 α m u,2 . . .α m u,r .Two chromosomes were randomly selected from P m by binary tournament selection, and the one with higher fitness was put into a mating pool [36].This process was repeated until n size chromosomes were in the mating pool.

Crossover
To generate new chromosomes in the next population, 1  2 n size pairs of chromosomes from the pool were randomly selected from the current population, and offspring of the selected parents were reproduced by crossover and mutation.For chromosomes u and v (w , each pair of real-valued genes was used to generate two new genes with crossover probability Pr c , where h 1 , h 2 , . . .,h 6+2r are random numbers between 0 and 1.

Mutation
Mutation occurred with probability Pr m for each real valued gene in a newly generated chromosome produced by crossover.A low mutation rate was used to avoid excessive perturbation.When a mutation occurs, that gene was altered by adding or subtracting a tiny number randomly selected from a pre-specified interval.Subsequently, n del (0 ≤ n del ≤ n size ) chromosomes in P m+1 were randomly removed from the set of new chromosomes (formed by genetic operations) to allow additional copies of the chromosome with maximum fitness values in P m .Only two or three elite chromosomes are sufficient to generate better results [37].

Algorithm for Constructing the Proposed Model
The GA for constructing the proposed SC-MCGM(1,1) is briefly described below.

Step 1. Initialization
Generate n size chromosomes.

Step 2. Compute fitness values
Compute the fitness value of each chromosome in the current population.

Step 3. Generate new chromosomes
Generate n size new chromosomes from the current population using selection, crossover, and mutation.

Step 4. Apply elitist strategy
Randomly remove n del strings from the newly generated n size strings, and replace them with n del best chromosomes in the current population.

Return to
Step 2 if the stopping condition is not satisfied.
In contrast to the original GM(1,1) and the MCGM(1,1) models, which use all observed data, the SGM model first used a rolling mechanism to determine the set of newly observed data, and then constructed the GM(1,1) model.The rolling mechanism could select only a few recent data by capturing the developing trend from all observed data.The training data were retained after rolling and applied to the SGM(1,1), MCSGM(1,1), CMCSGM(1,1), and proposed SC-MCGM(1,1) models.Related GA parameters were: (1) population size n size = 200; (2) stopping condition n max = 500; (3) n del = 2; (4) crossover probability Pr c = 0.9; and (5) mutation probability Pr m = 0.01.

Prediction of Foreign Tourists for Taiwan
The first experiment was conducted on the yearly statistics reported by Taiwan Tourism Bureau.Table 1 shows historical annual foreign tourists to Taiwan from six economies, including Japan, Hong Kong/Macao, Korea, China, USA, and Southeast Asia, collected from 2001-2016.Year 2016 was used for ex post testing.After performing the rolling mechanism, 2011-2015 (n = 5) from China and 2012-2015 (n = 4) from the other economies can be used for model-fitting for the SGM(1,1), MCSGM(1,1), CMCSGM(1,1), and proposed SC-MCGM(1,1) models.Both n = 4 and 5 produced two intervals.Table 2 shows prediction results associated with ex post testing for the original GM(1,1), the MCGM(1,1), the SGM(1,1), the MCSGM(1,1), and the CMCSGM(1,1), and Table 3 shows those for the proposed SC-MCGM(1,1) with different values for m.For the proposed SC-MCGM(1,1), it seems that the worse results can be obtained for m = 1.When m ≥ 2, the results obtained by the proposed SC-MCGM(1,1) is comparable or superior to those described in Table 2.For instance, the SC-MCGM(1,1) with m = 2, 3, 4 outperform the SGM(1,1), MCSGM(1,1), and CMCSGM(1,1) for Japan, Hong Kong/Macao, Korea, China, and Southeast Asia.Since the number of visitors from China to Taiwan dramatically declined in 2016, as shown in Table 1, the results of the ex post testing are relatively poor for every prediction model.However, the proposed SC-MCGM(1,1) outperforms the others, even with this significant temporal fluctuation.

Prediction of Foreign Tourists for China
Historical annual data from 1997 to 2013 published by the China National Tourism Administration were used.The collected data described the number of foreign tourists from eight main economies, including Japan, Korea, Malaysia, Mongolia, Philippines, Russia, Singapore, and USA.Following Sun et al. [1], year 2013 was used for ex post testing using a one-step transition probability matrix.Performing the rolling mechanism, 2005-2012 data from Korea, Japan, USA, and Malaysia; 2006-2012 from Russia, 2003-2012 from Mongolia and Philippines; and 2004-2012 from Singapore were used to construct the SGM(1,1), MCSGM(1,1), CMCSGM(1,1), and proposed SC-MCGM(1,1) models.
Table 4 shows prediction results for all compared models, and Table 5 shows those for the proposed SC-MCGM(1,1) with different m.The original GM(1,1) and MCGM (1,1) were significantly poorer than the other methods considered.The results obtained by the proposed SC-MCGM(1,1) with different m is comparable or superior to those described in Table 4.For instance, the SC-MCGM(1,1) outperform the prediction methods considered for Korea and Mongolia for all m considered, and outperform the compared models for USA and Singapore for m = 2, 3, 4.

Statistical Analysis
We investigated if the SC-MCGM(1,1) model prediction accuracy could be improved by tuning m, compared to the m = 1 case.We compared outcomes for the fourteen data sets (six economies for Taiwan and eight for China) using the non-parametric Friedman test [38] with the post-hoc test, the Bonferroni-Dunn test [39].The Friedman test checks whether average ranks are significantly different, ranking the SC-MCGM(1,1) model for all m separately for each data set, with increasing rank number implying lower prediction accuracy.In case of ties, the average rank was assigned.
Subsequently, the Bonferroni-Dunn test was used to detect any significant differences among m = 1, 2, 3, and 4. Significant differences in the prediction accuracy of the two forecasting methods can be probed by the difference in their average ranks, with the critical difference at the 10% level being where q 0.10 = 2.128.The difference is significant when CD > 1.03.Thus, SC-MCGM(1,1) with m = 2 is significantly superior than with m = 1, whereas m = 3 and 4 are not.Thus, long term transitions in the Markov model could be useless for tourism demand forecasting.

Discussion and Conclusions
Accurate foreign tourist forecasting is critical for governmental tourism development policies.However, time series data for tourism often have temporal fluctuation and trend changes, making precise predictions challenging.Over or under estimation of foreign tourist numbers could lead to inappropriate governmental investment in tourist infrastructures [41].This study proposed a novel grey residual modification model incorporating soft computing, including SLP, FLN, and GA, into the Grey-Markov model.
Historical annual data for foreign tourists collected from Taiwan and China official institutions were used to evaluate prediction accuracy of the proposed model.Soft computing constructs a computationally intelligent system that can learn to achieve better accuracy for changing environments and confront real world problems [42].Therefore, this study proposed a residual modification model called SC-MCGM(1,1) by incorporating soft computing techniques into the Grey-Markov model.The proposed SC-MCGM(1,1) model was capable of estimating not only the adjusted volume associated with a new residual value for a predicted value from the GM(1,1), but also avoided the troublesome bounds of individual states for the Markov chain.The degree to which a representative point in each state can contribute to a new residual value was based on the sum of transition probability matrices with different steps.
It is found that the growth rate of foreign tourists to Taiwan from Japan and Korea is greatly increased to 22% in 2016.To effectively stimulate the increase of the number of inbound visitors from Northeast Asia, Taiwan authorities should think about how to further work in close cooperation with related economies.Several relevant tourism development and marketing strategies should be put forth to promote the tourism industry.For instance, more bilateral routes that should be taken into account, the promotion for more attractive itineraries with shopping and accommodation, and environmental impact assessment.Since the SC-MCGM(1,1) model among the compared models performs very well for predicting visitors from Northeast Asia, this suggests that Taiwan authorities can leverage the proposed model to set up tourism development plans for Northeast Asia for a few years.As for China, the SC-MCGM(1,1) model performs very well for predicting visitors from Korea, Russia, US, Mongolia, Philippines, and Singapore.Similarly, China authorities can set up appropriate tourism policies by using the proposed SC-MCGM(1,1) model to predict the number of inbound visitors from those economies for a few years.After all, a prediction model can play a significant role on implementation of tourism development plans [41].
To maximize prediction accuracy, representative points, state bounds for the Markov chain, and FLN connection weights were automatically determined by GA, using a relatively simple computer program.Historical annual data involving foreign tourist collected from Taiwan and China official institutions were used to evaluate prediction performance of the proposed SC-MCGM(1,1).As for ex post testing, we can see that the proposed SC-MCGM(1,1) with m = 2 is comparable or superior to the compared models.Tables 1 and 3 show that the proposed model outperforms the other methods for 11 of the 14 data sets.These results validate that combining neural networks and GA is advantageous for intelligent prediction models.
The proposed SC-MCGM(1,1) with pre-specified GA parameters, including population size, number of generations, and crossover and mutation parameters, performed well.Thus, fine parameter tuning was not required.FLN used the hyperbolic tangent as the output neuron's activation function, computing a weighted sum of a vector of connection weights with an enhanced pattern.This assumes additivity among individual variables in the enhanced pattern [43].However, these criteria are not always independent [9,[43][44][45][46][47][48].Therefore, future research will investigate the impact of non-additivity on prediction performance of the proposed SC-MCGM(1,1) model.
into the determination of ŝk if k ≥ m + 1.In contrast, only s (1) k ≥ m + 1, then we sum p s k , . . ., and p s (m) k for m previous transitions,

Table 1 .
Historical annual foreign tourists from six economies to Taiwan.

Table 2 .
Absolute percentage errors obtained by different forecasting methods for Case I.

Table 3 .
Absolute percentage errors obtained by the proposed SC-MCGM(1,1) with different transitions for Case I.

Table 4 .
Absolute percentage errors obtained by different forecasting methods for Case II.

Table 5 .
Absolute percentage errors obtained by the proposed SC-MCGM(1,1) with different transitions for Case II.