Forecasting Tourist Arrivals for Hainan Island in China with Decomposed Broad Learning before the COVID-19 Pandemic

This study proposes a decomposed broad learning model to improve the accuracy of forecasting tourist arrivals on Hainan Island, China. Using decomposed broad learning, we predicted monthly tourist arrivals to Hainan Island from 12 countries. We compared actual tourist arrivals from the US to Hainan with the arrivals predicted by three models (FEWT-BL: fuzzy entropy empirical wavelet transform-based broad learning; BL: broad learning; BPNN: back propagation neural network). The results indicated that visitors from the US accounted for the most arrivals among the 12 countries and that FEWT-BL performed best in forecasting tourist arrivals. In conclusion, we establish a new model for accurate tourism forecasting that can support decision-making in tourism management, especially at turning points.


Introduction
Hainan Island borders the "Pan-Pearl River Delta", Hong Kong, Macao, and Taiwan to the north, Southeast Asian countries to the south, and Vietnam to the west (The People's Government of Hainan Province [TPGoHP], 2012). In 2018, the Chinese government approved the establishment of the China (Hainan) Pilot Free Trade Zone, which covers the whole island. The overall plan for Hainan Province gives priority to tourism, modern service industries, and high-tech industries, and calls for a scientifically planned industrial layout across the island. Developing tourism is therefore very important for Hainan Island. In addition, Hainan aims to establish an international tourism consumption center that can become an important engine of global economic growth. This center is intended to offer a high-quality consumption environment and world-class tourist attractions, and to serve as a distribution center for both domestic and international tourists [1]. Artificial intelligence is widely employed in developing high-quality tourism products and in growing the tourism industry [2]. Accurate forecasting of tourist demand is imperative for both academia and the tourism industry; in particular, accurate forecasts of tourist arrivals to Hainan Island can guide the administrative department in formulating policy. How to improve forecasting performance efficiently while building the Hainan international tourism consumption center remains a challenge. Several studies [3,4] have shown that online search engine data can improve the performance of tourism demand forecasting. Zhang et al. [5] combined decomposition with prediction and tested the approach on a sample of tourists, mainly from Hong Kong; their method exploited the strong performance of variational mode decomposition in visitor prediction. Li et al. [6] developed a deep learning (DL) model with temporal feature learning capabilities for predicting tourism volume by combining it with dimensionality reduction techniques, and it outperformed the methods against which it was compared.
Existing studies [4,7,8,9] have developed numerous techniques to improve the accuracy of tourism demand forecasting, and such accuracy is critical for enabling administrative management to make appropriate decisions. In recent decades, decomposition-based approaches [10] have improved performance in time series prediction. Signal decomposition methods such as the empirical wavelet transform (EWT) [11] can extract the sub-modes contained in a signal, thus providing a useful pre-processing step for deep learning predictors. However, DL models require large numbers of parameters and deep network structures, which inevitably increase their computational complexity [12]. A newer line of research has therefore sought models that achieve performance comparable to DL models without deep network structures [13,14].
At present, tourism in Hainan, whether measured by its share or by its total volume, remains relatively underdeveloped. Hainan urgently needs to expand its tourism business and build a high-standard international tourism hub. We therefore investigated an approach for forecasting tourist arrivals to Hainan Island that fuses signal decomposition with a low-complexity alternative to DL networks, the broad learning (BL) system [15]. We combined artificial intelligence (AI) with tourist arrivals data for Hainan Island to analyze arrivals from different foreign countries and to provide more accurate tourism forecasts.
The purpose of this study was to forecast tourist arrivals from different foreign countries with an AI model. The main aim was to compare and analyze the feasibility and effectiveness of a model built on decomposition and a non-deep learning framework (BL). In the experiments, the fuzzy entropy EWT-based approach (FEWT-BL) was compared with a typical neural network approach (BPNN) and the plain BL approach. The main conclusions were that visitors from the US accounted for the most arrivals among the 12 countries and that the FEWT-BL model performed best in forecasting tourist arrivals to Hainan from the 12 countries. These findings should help identify future research directions in tourist arrival forecasting and provide insights for researchers using AI models to forecast tourism demand.

Methodology
A total of 2592 monthly observations of tourist arrivals to Hainan Island (12 countries over 216 months) were collected for January 2002 to December 2019 from the government's official website. We compared actual tourist arrivals from the US to Hainan with the arrivals predicted by three models (FEWT-BL: fuzzy entropy empirical wavelet transform-based broad learning; BL: broad learning; BPNN: back propagation neural network). Broad learning forecasting models were used to predict tourist arrivals to Hainan. The raw data were normalized and screened before analysis, and the empirical wavelet transform (EWT) was then applied to decompose the normalized arrivals data into individual intrinsic mode functions (IMFs). FEWT-BL builds on EWT to improve forecasting performance, and its forecasts are evaluated with the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute percentage error (MAPE). FEWT-BL was developed through the following steps.

Empirical Wavelet Transform
Jerome Gilles [11] first introduced EWT, which builds an adaptive set of bandpass filters from the spectral characteristics of the signal. To determine the frequency ranges of the bandpass filters, the Fourier spectrum of the signal is segmented. As shown in [16], EWT can effectively identify and extract a finite number of intrinsic modes from a time series. EWT relies on robust pre-processing for peak detection, segments the spectrum accordingly, and builds the corresponding wavelet filter bank. The EWT algorithm consists of extending the signal, computing its Fourier transform, extracting the segment boundaries, building the filter bank, and extracting the sub-bands.
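The EWT computation proceeds as follows [17]:
(1) The Fourier spectrum of the original series (here, the tourist arrivals series), normalized to $[0, \pi]$, is segmented into $N$ contiguous segments. The boundaries are denoted $\omega_n$, with $\omega_0 = 0$ and $\omega_N = \pi$, and each segment is $\Lambda_n = [\omega_{n-1}, \omega_n]$. Around each $\omega_n$, a transition phase $T_n$ of width $2\tau_n$ is used, with $\tau_n = \gamma\omega_n$, where $\gamma$ satisfies:
$$\gamma < \min_n \left( \frac{\omega_{n+1} - \omega_n}{\omega_{n+1} + \omega_n} \right) \qquad (1)$$
(2) A series of empirical wavelets based on the Littlewood-Paley and Meyer wavelets is constructed. For all $n > 0$, the empirical scaling function and the empirical wavelets are given by Equations (2) and (3), respectively:
$$\hat{\phi}_n(\omega) = \begin{cases} 1, & |\omega| \le (1-\gamma)\omega_n \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_n}\left(|\omega| - (1-\gamma)\omega_n\right)\right)\right], & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
$$\hat{\psi}_n(\omega) = \begin{cases} 1, & (1+\gamma)\omega_n \le |\omega| \le (1-\gamma)\omega_{n+1} \\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_{n+1}}\left(|\omega| - (1-\gamma)\omega_{n+1}\right)\right)\right], & (1-\gamma)\omega_{n+1} \le |\omega| \le (1+\gamma)\omega_{n+1} \\ \sin\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_n}\left(|\omega| - (1-\gamma)\omega_n\right)\right)\right], & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
The function $\beta(x)$ is defined as follows for $x \in [0, 1]$, with $\beta(x) = 0$ for $x < 0$ and $\beta(x) = 1$ for $x > 1$:
$$\beta(x) = x^4\left(35 - 84x + 70x^2 - 20x^3\right)$$
The approximation coefficients $W_f^{\varepsilon}(0,t)$ are obtained from the inner products with the empirical scaling function:
$$W_f^{\varepsilon}(0,t) = \langle f, \phi_1 \rangle = \int f(\tau)\,\overline{\phi_1(\tau - t)}\,d\tau = \mathcal{F}^{-1}\left[\hat{f}(\omega)\,\overline{\hat{\phi}_1(\omega)}\right]$$
The detail coefficients $W_f^{\varepsilon}(n,t)$ are obtained from the inner products with the empirical wavelets:
$$W_f^{\varepsilon}(n,t) = \langle f, \psi_n \rangle = \int f(\tau)\,\overline{\psi_n(\tau - t)}\,d\tau = \mathcal{F}^{-1}\left[\hat{f}(\omega)\,\overline{\hat{\psi}_n(\omega)}\right]$$
(3) The signal is reconstructed, and the empirical modes are obtained, as follows, where $*$ denotes convolution:
$$f(t) = W_f^{\varepsilon}(0,t) * \phi_1(t) + \sum_{n=1}^{N} W_f^{\varepsilon}(n,t) * \psi_n(t)$$
$$f_0(t) = W_f^{\varepsilon}(0,t) * \phi_1(t), \qquad f_k(t) = W_f^{\varepsilon}(k,t) * \psi_k(t)$$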
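The EWT steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the MATLAB implementation used in this study: it assumes a simplified boundary detector (midpoints between the largest local maxima of the one-sided magnitude spectrum), skips the signal-extension step, and sets gamma to 90% of its upper bound. The function names `beta`, `ewt_boundaries`, and `ewt` are hypothetical.

```python
import numpy as np

def beta(x):
    """Transition polynomial beta(x) = x^4 (35 - 84x + 70x^2 - 20x^3), clipped to [0, 1]."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)

def ewt_boundaries(x, n_modes):
    """Interior boundaries (radians) at midpoints between the n_modes largest
    local maxima of the one-sided magnitude spectrum (simplified detector)."""
    mag = np.abs(np.fft.rfft(x))
    interior = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:]))[0] + 1
    peaks = np.concatenate(([0], interior)) if mag[0] > mag[1] else interior
    top = np.sort(peaks[np.argsort(mag[peaks])[::-1][:n_modes]])
    # Midpoint bin b = (a + c) / 2 maps to omega = 2*pi*b / L = pi*(a + c) / L
    return np.pi * (top[:-1] + top[1:]) / len(x)

def ewt(x, n_modes):
    """Return (modes, omegas); modes[0] is the lowest-frequency band."""
    x = np.asarray(x, dtype=float)
    omegas = np.concatenate(([0.0], ewt_boundaries(x, n_modes), [np.pi]))
    # Keep gamma below min (w_{n+1} - w_n) / (w_{n+1} + w_n) so transitions never overlap
    gamma = 0.9 * np.min((omegas[1:] - omegas[:-1]) / (omegas[1:] + omegas[:-1]))
    w = np.abs(2 * np.pi * np.fft.fftfreq(len(x)))  # |omega| of every FFT bin
    X = np.fft.fft(x)
    modes = []
    for lo, hi in zip(omegas[:-1], omegas[1:]):
        F = np.zeros(len(x))
        flat_end = hi if hi >= np.pi else (1 - gamma) * hi
        F[((1 + gamma) * lo <= w) & (w <= flat_end)] = 1.0
        if lo > 0:  # rising sine transition around the lower boundary
            m = ((1 - gamma) * lo <= w) & (w <= (1 + gamma) * lo)
            F[m] = np.sin(np.pi / 2 * beta((w[m] - (1 - gamma) * lo) / (2 * gamma * lo)))
        if hi < np.pi:  # falling cosine transition around the upper boundary
            m = ((1 - gamma) * hi <= w) & (w <= (1 + gamma) * hi)
            F[m] = np.cos(np.pi / 2 * beta((w[m] - (1 - gamma) * hi) / (2 * gamma * hi)))
        # Mode = coefficients convolved with the real filter, i.e. IFFT of X * F^2
        modes.append(np.real(np.fft.ifft(X * F**2)))
    return np.array(modes), omegas
```

Because each cosine transition is paired with a sine transition on the same boundary, the squared filters sum to one at every frequency, so the extracted modes add back up to the input series.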

Fuzzy Entropy
Entropy, a quantity from statistical thermodynamics, measures the degree of disorder in a system and thereby characterizes the state of matter [18,19]. The concepts of information entropy, sample entropy, and approximate entropy have since been proposed. Fuzzy entropy improves on the sample entropy algorithm [20] and is more efficient: it uses an exponential function as the fuzzy function to measure the similarity between two vectors, which gives it better signal-measurement properties than approximate entropy and sample entropy, namely relative consistency, noise resistance, and better continuity [21]. This study therefore combined fuzzy entropy with EWT to extract signal features and then construct feature vectors. First, for a series $\{x(i), i = 1, \ldots, N\}$, the $m$-dimensional vectors are defined as:
$$X_i^m = \{x(i), x(i+1), \ldots, x(i+m-1)\} - x_0(i), \quad i = 1, \ldots, N-m+1$$
where $x_0(i)$ is the mean of the $m$ consecutive values starting at $x(i)$:
$$x_0(i) = \frac{1}{m}\sum_{j=0}^{m-1} x(i+j)$$
The distance $d_{ij}^m$ between $X_i^m$ and $X_j^m$ is the maximum absolute difference between their corresponding elements:
$$d_{ij}^m = \max_{k = 0, \ldots, m-1} \left| \left(x(i+k) - x_0(i)\right) - \left(x(j+k) - x_0(j)\right) \right|$$
The fuzzy similarity is defined by the fuzzy function:
$$D_{ij}^m = \mu\left(d_{ij}^m, n, r\right) = \exp\left(-\frac{\left(d_{ij}^m\right)^n}{r}\right)$$
where $n$ and $r$ are the gradient and the width of the fuzzy function boundary, respectively. The function $\phi^m$ is then defined as:
$$\phi^m(n,r) = \frac{1}{N-m}\sum_{i=1}^{N-m}\left(\frac{1}{N-m-1}\sum_{j=1,\, j \ne i}^{N-m} D_{ij}^m\right)$$
Next, the dimension is increased from $m$ to $m+1$ and the above steps are repeated to obtain $\phi^{m+1}(n,r)$. The fuzzy entropy (FuzzyEn) of the sequence is:
$$\mathrm{FuzzyEn}(m, n, r, N) = \ln \phi^m(n,r) - \ln \phi^{m+1}(n,r)$$
In fuzzy entropy, $r$ is the width of the fuzzy function boundary: too large an $r$ discards much statistical information, whereas too small an $r$ leads to poor estimates of the statistical properties and greater sensitivity to noise. Normally, $r$ is taken to be between 0.1 and 0.25 SD(x), where SD(x) is the standard deviation of the series. The parameter $n$ determines the gradient of the similarity tolerance boundary (the larger the $n$, the steeper the gradient) and acts as a weight in computing the similarity between vectors. To capture as much detail as possible, small integer values are generally recommended. This study therefore selected $m = 2$, $r = 0.15 \times \mathrm{SD}$, and $n = 2$.
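As a rough illustration of the fuzzy entropy computation, the sketch below implements FuzzyEn with the exponential similarity exp(-d^n / r), mean-removed vectors, the maximum-absolute-difference distance, and the parameter values chosen in this study (m = 2, n = 2, r = 0.15 SD). It is an assumption-laden sketch: `fuzzy_entropy` is a hypothetical name, SD is taken as the population standard deviation, and the pairwise distances are held in memory, so it suits series of a few thousand points at most.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r_factor=0.15, n=2):
    """FuzzyEn(m, n, r, N) with r = r_factor * SD(x) and similarity exp(-d^n / r)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * np.std(x)  # population standard deviation
    count = N - m             # same number of vectors for dimensions m and m + 1

    def phi(dim):
        X = np.array([x[i:i + dim] for i in range(count)])
        X = X - X.mean(axis=1, keepdims=True)                      # subtract baseline x0(i)
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)  # max-abs distance
        D = np.exp(-(d ** n) / r)                                  # fuzzy similarity
        np.fill_diagonal(D, 0.0)                                   # exclude self-matches
        return D.sum() / (count * (count - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

With these settings, a smooth periodic series should score lower than white noise of the same length, reflecting its greater regularity.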

Broad Learning System
The broad learning system (BLS) extends the network in width rather than in depth [22]. Consider a general supervised learning task with a training set $\{X, Y\} \in \mathbb{R}^{N \times (D + C)}$ from $C$ classes, where each row of $X$ and $Y$ denotes a data point $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$ and its target vector $y_i = (y_{i1}, y_{i2}, \ldots, y_{iC})$, respectively [23]. In BLS, $n$ groups of feature nodes are generated by random mappings:
$$Z_i = \phi\left(X W_{e_i} + \beta_{e_i}\right), \quad i = 1, \ldots, n$$
where $\phi$ is the feature mapping function, and the weights $W_{e_i}$ and bias terms $\beta_{e_i}$ are randomly generated with the proper dimensions. All feature nodes are collected as $Z^n \equiv [Z_1, \ldots, Z_n]$, and the $j$-th group of enhancement nodes is computed as:
$$H_j = \xi\left(Z^n W_{h_j} + \beta_{h_j}\right), \quad j = 1, \ldots, m$$
where $\xi$ is a nonlinear activation function; the outputs of the enhancement layer are denoted by $H^m \equiv [H_1, \ldots, H_m]$. Overall, the broad learning model can be written as:
$$Y = \left[Z^n \mid H^m\right] W = A W$$
where $A = [Z^n \mid H^m]$ and $W$ contains the output weights connecting the feature and enhancement nodes to the output layer. $W$ is obtained by minimizing the objective:
$$\arg\min_{W} \; \left\| A W - Y \right\|_2^2 + \lambda \left\| W \right\|_2^2$$
where the first term is the training error, the second term is a regularization term, and $\lambda$ is a regularization parameter that balances the error term against model complexity. Setting the derivative with respect to $W$ to zero gives:
$$W = \left(A^{\top} A + \lambda I\right)^{-1} A^{\top} Y$$
so the BLS output weights are always obtained in closed form through the matrix $A^{\top} A + \lambda I$, without iterative training. In this work, fuzzy entropy was combined with signal decomposition as follows. First, the tourism data were decomposed by EWT into their components. Then, the fuzzy entropy of each component was calculated, and the information entropy of each component was computed by taking each component's percentage of the total as its probability density to obtain updated estimates. Finally, the BL model was used for prediction.
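The BLS construction and its closed-form output weights can be sketched with NumPy as follows. This is a minimal regression sketch under stated assumptions: a linear feature mapping, uniformly distributed random weights, tanh as the enhancement activation, and a single enhancement block (equivalent to concatenating several groups). The published BLS also refines the random feature weights with a sparse autoencoder, which is omitted here, and `bls_features`, `bls_fit`, and `bls_predict` are hypothetical names.

```python
import numpy as np

def bls_features(X, params):
    """Build A = [Z^n | H^m] from inputs X."""
    We, be, Wh, bh = params
    Z = np.hstack([X @ We[i] + be[i] for i in range(len(We))])  # feature nodes (linear mapping)
    H = np.tanh(Z @ Wh + bh)                                    # enhancement nodes, xi = tanh
    return np.hstack([Z, H])

def bls_fit(X, Y, n_groups=10, nodes_per_group=10, n_enhance=200, lam=1e-3, seed=0):
    """Draw random feature/enhancement weights, then solve for ridge output weights."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    params = (
        rng.uniform(-1, 1, size=(n_groups, D, nodes_per_group)),          # W_ei
        rng.uniform(-1, 1, size=(n_groups, nodes_per_group)),             # beta_ei
        rng.uniform(-1, 1, size=(n_groups * nodes_per_group, n_enhance)), # W_h
        rng.uniform(-1, 1, size=n_enhance),                               # beta_h
    )
    A = bls_features(X, params)
    # Closed form W = (A^T A + lam I)^{-1} A^T Y, computed with a linear solve
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return params, W

def bls_predict(X, params, W):
    return bls_features(X, params) @ W
```

For tourist arrival forecasting, each row of `X` would typically hold a window of past values of one decomposed component and `Y` the next value; using `np.linalg.solve` rather than forming the inverse explicitly is the usual numerically safer choice.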

Model Performance Evaluation
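Two error indexes were used in the forecasting experiments to assess the prediction performance of the compared models: the mean absolute percentage error (MAPE) and the root mean square error (RMSE). They are defined as:
$$\mathrm{MAPE} = \frac{100\%}{T}\sum_{t=1}^{T}\left|\frac{y_t - \hat{y}_t}{y_t}\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2}$$
where $y_t$ and $\hat{y}_t$ are the actual and forecast tourist arrivals at time $t$, and $T$ is the number of forecast points. The coefficient of determination ($R^2$) is also reported in Table 1.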
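The evaluation indexes can be computed as in the sketch below. MAPE and RMSE follow their standard definitions; for R² the sketch assumes the common form 1 - SSE/SST, since the exact definition used for Table 1 is not stated. The function names are hypothetical.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (y must be nonzero)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Root mean square error, in the units of y (here, tourist arrivals)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination, assumed here to be 1 - SSE / SST."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

For actual arrivals [100, 200, 300] and forecasts [110, 190, 300], MAPE is 5%, RMSE is about 8.165, and R² is 0.99.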

Results of Decomposition
We collected data on tourist arrivals to Hainan Province from the 12 countries listed in the legend of Figure 1 in order to study predictive ability and improve the accuracy of tourist arrival forecasting. Figure 1 shows the monthly number of tourists from each country/region traveling to Hainan Province; arrivals from the US are the highest. All data were obtained from the Hainan Province Tourism Board. The abbreviations AU, US, CA, RU, CH, IT, DE, FR, GB, MY, KR, and JP denote the 12 countries. We applied EWT to decompose the original data set into individual components. Each intrinsic mode function (IMF) is an amplitude- and frequency-modulated component. Five IMFs and one residual were obtained and analyzed in MATLAB, and the resulting components of tourist arrivals are shown in Figure 2. In terms of interpretation, the main difference between the components is their frequency: each component of the tourist arrivals data has a distinct frequency, with IMF 5 the highest and IMF 1 the lowest. Higher-frequency IMFs contain more information but also more noise. Together, the components reveal the cycles, trends, and seasonal patterns of tourist arrivals. IMF 1 and IMF 2 can be regarded as secondary cycles in the data, containing other distinct peaks and valleys, whereas IMF 4 reflects modest fluctuations that carry less information. By contrast, the residual captures the deterministic long-term behavior and thus indicates the long-term trend of the tourism market; Huang et al. [24] likewise regarded the residual component as determining long-term behavior.
As shown in Figure 3, we used the average period formula to calculate the average period of IMF 3 across all the tourist arrivals data, which was about 12 months. IMF 3 was therefore regarded as the main tourist arrivals cycle, capturing the main peaks and valleys.
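The average period formula is not given explicitly. A common choice in decomposition-based tourism studies, assumed here, divides the number of observations by the number of local maxima in the component. The sketch below (with the hypothetical name `average_period`) applies that definition to a monthly IMF.

```python
import numpy as np

def average_period(component):
    """Average period = series length / number of strict local maxima (assumed definition)."""
    c = np.asarray(component, dtype=float)
    n_peaks = int(np.sum((c[1:-1] > c[:-2]) & (c[1:-1] > c[2:])))
    return len(c) / n_peaks if n_peaks else float("inf")
```

For example, a 12-month sinusoid observed over 216 months (January 2002 to December 2019) has 18 peaks, giving an average period of 12 months.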

Analysis of Forecasting Results
After decomposing the Hainan Province tourist arrivals data, each decomposed component was forecast separately, and the component forecasts were then combined to obtain the final result. In our experiment, each data set was split into a training set and a prediction set: of the 18 years of data, the first 13 years were used for training and the last 5 years for prediction. A rolling method with a step size of 1 was used.
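The decompose, forecast, and recombine scheme with a 13-year/5-year split can be sketched as follows. The sketch assumes the components are already available (for example, from EWT), uses an ordinary least-squares linear autoregression on `lags` past values as a stand-in for the BL predictor, and reads "rolling with a step size of 1" as one-step-ahead forecasting in which the input window advances one month at a time. For brevity, the model is fitted once on the training period rather than re-estimated at every step. `lag_matrix` and `decompose_and_forecast` are hypothetical names.

```python
import numpy as np

def lag_matrix(series, lags):
    """Row k holds series[k : k + lags] and is used to predict series[k + lags]."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[k:k + lags] for k in range(len(series) - lags)])
    return X, series[lags:]

def decompose_and_forecast(components, n_train, lags=12):
    """Forecast each component one step ahead over the test period and sum the forecasts."""
    n_total = len(components[0])
    forecast = np.zeros(n_total - n_train)
    for comp in components:
        X, y = lag_matrix(comp, lags)
        n_fit = n_train - lags  # targets whose time index is below n_train
        coef, *_ = np.linalg.lstsq(X[:n_fit], y[:n_fit], rcond=None)
        # Each test row uses the observed previous `lags` months (window slides by 1)
        forecast += X[n_fit:] @ coef
    return forecast
```

With 216 monthly observations per country, `n_train = 13 * 12 = 156` leaves 60 test months. If the components are computed from the full series, information from the test period can leak into the inputs; a stricter setup decomposes only the data available at each forecast step.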
For example, Figure 4 depicts the reception of inbound tourists by the cities and counties of Hainan Province; Sanya has the best reception conditions for foreigners from various countries, which can stimulate its tourism economy and promote its international development. Figure 4 also shows the predictive ability of the FEWT-BL method compared with the non-decomposed method (BL) and the back propagation neural network (BPNN) for tourist arrivals to Hainan, using tourists from the US as an example. In Figure 4, the yellow line indicates the true data, and the red, green, and blue lines represent the results of the three compared methods. The red line (FEWT-BL) is clearly closer to the yellow line (TRUE) than the green (BL) and blue (BPNN) lines, which means that the proposed FEWT-BL method performed best among the compared methods.
These peak turning points show direct changes in tourist arrivals to Hainan Island. For example, from May 2018 to September 2018 and from September 2018 to January 2019, tourist arrivals from the United States (US) showed a downturn followed by an upturn, including a valley and a peak. From May 2019 to September 2019, tourist arrivals from the US declined, including a turning point. Hainan administrators should therefore pay attention to fluctuations in the tourism market and make prompt decisions to ensure stable, upward growth in the tourism industry. Based on these forecasting results, our study can support policy decision-making in tourism administration, especially at turning points. More detailed results for all 12 countries are shown in Table 1. The evaluation metrics RMSE and MAPE were used to assess the performance of the benchmark methods. Compared with the BL and BPNN methods, the proposed EWT-based BL algorithm obtained the lowest RMSE and MAPE in almost all cases. R² describes the agreement between the forecast and observed tourist arrivals, and FEWT-BL is clearly preferable to BL without EWT in terms of R². These results indicate that the proposed FEWT-BL method is better suited to predicting tourist arrivals than the BL and BPNN methods.
As shown in Table 1, FEWT-BL forecast tourist arrivals from Italy more accurately (R² = 0.96) than the BL (R² = 0.93) and BPNN (R² = 0.88) models. Similarly, for tourist arrivals from the United States, FEWT-BL performed better (R² = 0.94) than the BL (R² = 0.91) and BPNN (R² = 0.89) models. The Diebold-Mariano (DM) test was used to examine whether the predictive accuracy of two models differed significantly. The DM test results in Table 2 showed that, among the three prediction models, FEWT-BL outperformed the other two. In summary, the results indicate that the FEWT-BL method can reduce forecasting errors on official government data and forecasts tourist arrivals accurately. To present the results more visually, we used radar diagrams to show the performance of the three methods, where the red, yellow, and green lines in Figure 5 represent the FEWT-BL, BL, and BPNN methods, respectively. The red line lies entirely in the innermost layer, which means that the FEWT-BL method achieves the best RMSE performance. The yellow line lies mostly inside the green line and intersects it at only two points, meaning that the BL method outperformed the BPNN model in most cases. Many researchers [25,26,27,28] have studied tourism from the perspective of ethnic minorities. In addition, Han et al. [29] studied Halal tourism, including travel motivation and customer desire, and many researchers are seeking new technologies to manage tourism [30]. Our study indicates that visitors from the US accounted for the most arrivals among the 12 countries and that FEWT-BL performed best in forecasting tourist arrivals to Hainan from the 12 countries.
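The pairwise DM comparison can be sketched as below. The loss function and DM variant behind Table 2 are not stated, so this sketch assumes one-step-ahead forecasts (h = 1), squared-error loss, and the asymptotic normal approximation without the Harvey-Leybourne-Newbold small-sample correction. `dm_test` is a hypothetical name.

```python
import math
import numpy as np

def dm_test(e1, e2):
    """Diebold-Mariano statistic and two-sided p-value for h = 1, squared-error loss.

    e1, e2: forecast errors (actual - forecast) of model 1 and model 2 on the same
    test points. A negative statistic means model 1 has the smaller average loss.
    """
    d = np.asarray(e1, dtype=float) ** 2 - np.asarray(e2, dtype=float) ** 2  # loss differential
    T = len(d)
    # For h = 1 the long-run variance of d reduces to its sample variance
    dm = d.mean() / math.sqrt(d.var() / T)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|dm|))
    return dm, p_value
```

Calling it with the test-period errors of FEWT-BL as `e1` and those of BL or BPNN as `e2`, a negative statistic with a small p-value would indicate that FEWT-BL's errors are significantly smaller.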
The results show that the FEWT-BL method has better predictive ability at turning points. We therefore developed an accurate FEWT-BL method to forecast tourist arrivals to Hainan from 12 countries, and we recommend using it in future tourism forecasting to achieve improved performance.

Conclusions
Two strengths of this study are worth highlighting. First, the updated broad learning (FEWT-BL) approach can accurately forecast tourist arrivals and reception, which can facilitate decision-making in tourism management, especially at turning points. Second, this study illustrates how a decomposed broad learning model, an approach that has rarely been applied to tourist arrivals, can improve forecasting accuracy for tourist arrivals on Hainan Island. This study therefore provides insights for researchers seeking to use AI models to forecast tourism demand, and we recommend this AI-based forecasting method for tourism applications. In this work, we focused on the prediction performance of machine learning and lightweight neural networks on tourism data. The comparisons covered the decomposed model versus a traditional shallow neural network, as well as the decomposed model versus the non-decomposed model. This manuscript therefore highlights the effect of decomposition and the forecasting ability of the new network structure relative to a traditional network structure. In future work, we will further compare the proposed model with classical time series prediction methods.