Regional Population Forecast and Analysis Based on Machine Learning Strategy
Abstract
1. Introduction
2. Related Works
2.1. Essential Factors of Population Growth
2.2. Deep Learning Application in Decision Support
2.3. Potential Disadvantage of Conventional Models
3. Boosting Regression-Based Method and Recurrent Neural Network
3.1. Gradient Boosting-Based Method
3.2. XGBoost Algorithm
Algorithm 1. XGBoost algorithm 
Input: Data ${\left\{\left({x}_{i},{y}_{i}\right)\right\}}_{i=1}^{n}$ and a differentiable loss function, here the squared error $L\left({y}_{i},{\widehat{y}}_{i}\right)=\frac{1}{2}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}$ with ${\widehat{y}}_{i}=F\left({x}_{i}\right)$ (1)
Step 1: Initialize the model with a constant value: ${F}_{0}\left(x\right)={\arg\min}_{\gamma}{\sum}_{i=1}^{n}L\left({y}_{i},\gamma\right)$
Step 2: for m = 1 to M: 




Step 3: Output ${F}_{M}\left(x\right)$ 
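Algorithm 1 gives the input, the initialization, and the output of the boosting procedure. With the squared-error loss, the constant in Step 1 is the mean of the targets. The pseudo-residuals computed in each round of Step 2 then reduce to $y_i - F_{m-1}(x_i)$. The minimal Python sketch below illustrates this generic gradient-boosting loop under stated assumptions. It fits one-dimensional regression stumps and applies a fixed shrinkage factor. The names `fit_stump`, `gradient_boost`, `n_rounds`, and `learning_rate` are hypothetical. This is not the authors' code, and it omits XGBoost's regularized objective, second-order split gain, and column subsampling.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (stump) to residuals on a 1-D feature.

    Returns (threshold, left_value, right_value); each leaf value is the mean
    residual in that leaf, which is optimal for squared loss.
    """
    best = None
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: one leaf holding the mean residual
        m = mean(residuals)
        return float("inf"), m, m
    return best[1], best[2], best[3]


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Plain gradient boosting with squared loss L = 1/2 (y - F)^2."""
    # Step 1: the constant minimizing the summed squared loss is mean(y).
    f0 = mean(ys)
    preds = [f0] * len(ys)
    stumps = []
    # Step 2: for m = 1 to M.
    for _ in range(n_rounds):
        # Pseudo-residuals: the negative gradient of the loss w.r.t. F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # Shrunken update: F_m = F_{m-1} + learning_rate * h_m.
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    # Step 3: return F_M as a callable.
    def model(x):
        return f0 + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

    return model


# Hypothetical toy data: one feature with a jump between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
model = gradient_boost(xs, ys)
print(round(model(2), 2), round(model(5), 2))
```

With a smaller `learning_rate`, more rounds are needed to reach the same training fit. This trade-off is the usual reason for using shrinkage.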
3.3. Gain
3.4. XGBoost Regression Model
3.5. Long Short-Term Memory Network
4. Simulation Experiment
4.1. Data Description
4.2. Experiment Design
The mean absolute percentage error (MAPE) is used as the evaluation criterion for comparing model performance, as shown in Table 1. The forecasts for 2019 to 2025 cannot yet be checked against observations. Their reliability is therefore supported by how closely each model fits the real historical data from 2009 to 2018.
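The formula is not spelled out here. The sketch below assumes the common definition $\mathrm{MAPE}=\frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t-\hat{y}_t}{y_t}\right|$, reported in percent, where $y_t$ is the observed value and $\hat{y}_t$ the fitted value for year $t$. Whether the authors include the factor of 100 is an assumption, and the function name `mape` and the example values are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumed convention).

    MAPE = 100/n * sum(|(y_t - yhat_t) / y_t|); undefined if any y_t is zero.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical observed vs. fitted values for three years.
example = mape([100, 200, 400], [110, 190, 400])
print(round(example, 6))  # 5.0
```

The absolute percentage errors here are 10%, 5%, and 0%, so their mean is 5%. Because each error is divided by $y_t$, MAPE can become very large for series whose values approach zero.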
Three inference models are compared in this work: the Linear Regression model (a conventional method), the LSTM model, and the XGBoost Regression model. The comparison results are summarized in Table 1.
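The text does not state which inputs the Linear Regression baseline uses. As one hypothetical reading, the sketch below fits an ordinary-least-squares trend $\hat{y} = a + b\cdot\text{year}$ to the last `window` yearly observations of a single series and extrapolates it. The function name `linear_trend_forecast`, the `window` and `horizon` parameters, and the example series are all illustrative assumptions. Reading `window` as the 3Y/4Y/5Y suffixes in Table 1 is only one possible interpretation.

```python
def linear_trend_forecast(years, values, window, horizon):
    """Hypothetical baseline: OLS line y = a + b * year fitted to the last
    `window` observations, extrapolated `horizon` years past the last year."""
    if len(years) != len(values) or not 2 <= window <= len(years):
        raise ValueError("need equal-length inputs and 2 <= window <= len(years)")
    xs, ys = years[-window:], values[-window:]
    x_bar = sum(xs) / window
    y_bar = sum(ys) / window
    # Closed-form least-squares slope and intercept.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    last = years[-1]
    return {last + h: intercept + slope * (last + h) for h in range(1, horizon + 1)}


# Hypothetical series for 2014-2018; fit on the last 3 years, forecast 2 ahead.
years = [2014, 2015, 2016, 2017, 2018]
values = [100.0, 102.0, 104.0, 106.0, 108.0]
print(linear_trend_forecast(years, values, window=3, horizon=2))  # {2019: 110.0, 2020: 112.0}
```

A straight-line trend cannot bend, so it extrapolates any recent slope indefinitely. This limitation is one reason such a baseline is compared against nonlinear models.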
4.3. Near Future Forecasting with Linear Regression, XGBoost Regression, and LSTM Models
4.4. Feature Importance in the Present, across a Known Time to the Near Future
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Table 1. MAPE of the Linear Regression, LSTM, and XGBoost Regression models for each feature and year range; the last column is the average over the six features.
Model (year range)    Birth    City Annual  Death    Immigration  Income   Population  Average MAPE
Linear_Regression_3Y  0.30265  0.36806      0.24133  6.35127      0.17148  0.23123     1.27767
Linear_Regression_4Y  0.36432  0.39890      0.26973  26.03689     0.18115  0.26782     4.58647
Linear_Regression_5Y  0.34876  0.37034      0.28464  11.57862     0.15104  0.25222     2.16427
LSTM_3Y               1.40973  1.47480      1.43107  10.09746     0.29646  1.31306     2.67043
LSTM_4Y               1.34646  1.48777      1.42434  11.45912     0.28670  1.30690     2.88521
LSTM_5Y               1.21405  1.27438      1.41467  13.70877     0.27888  1.30739     3.19969
XGBoost_3Y            0.01310  0.00396      0.00210  0.42950      0.00149  0.00017     0.07505
XGBoost_4Y            0.00725  0.00179      0.00101  0.11286      0.00080  0.00012     0.02064
XGBoost_5Y            0.00201  0.00069      0.00062  0.13376      0.00047  0.00009     0.02294
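The Average MAPE column in the table above appears to be the arithmetic mean of the six per-feature MAPE values. The short Python check below copies the published values and recomputes each average. It confirms that every average agrees with the table to within 1e-5, which is consistent with rounding to five decimals. It also picks the model with the lowest average. The variable names are illustrative only.

```python
# Per-feature MAPE values copied from the table, in the column order
# Birth, City Annual, Death, Immigration, Income, Population.
per_feature = {
    "Linear_Regression_3Y": [0.30265, 0.36806, 0.24133, 6.35127, 0.17148, 0.23123],
    "Linear_Regression_4Y": [0.36432, 0.39890, 0.26973, 26.03689, 0.18115, 0.26782],
    "Linear_Regression_5Y": [0.34876, 0.37034, 0.28464, 11.57862, 0.15104, 0.25222],
    "LSTM_3Y": [1.40973, 1.47480, 1.43107, 10.09746, 0.29646, 1.31306],
    "LSTM_4Y": [1.34646, 1.48777, 1.42434, 11.45912, 0.28670, 1.30690],
    "LSTM_5Y": [1.21405, 1.27438, 1.41467, 13.70877, 0.27888, 1.30739],
    "XGBoost_3Y": [0.01310, 0.00396, 0.00210, 0.42950, 0.00149, 0.00017],
    "XGBoost_4Y": [0.00725, 0.00179, 0.00101, 0.11286, 0.00080, 0.00012],
    "XGBoost_5Y": [0.00201, 0.00069, 0.00062, 0.13376, 0.00047, 0.00009],
}
# Average MAPE column as published.
reported = {
    "Linear_Regression_3Y": 1.27767, "Linear_Regression_4Y": 4.58647,
    "Linear_Regression_5Y": 2.16427, "LSTM_3Y": 2.67043, "LSTM_4Y": 2.88521,
    "LSTM_5Y": 3.19969, "XGBoost_3Y": 0.07505, "XGBoost_4Y": 0.02064,
    "XGBoost_5Y": 0.02294,
}

averages = {name: sum(v) / len(v) for name, v in per_feature.items()}
# Models whose recomputed mean differs from the table by more than rounding.
mismatches = [n for n, avg in averages.items() if abs(avg - reported[n]) > 1e-5]
best = min(averages, key=averages.get)
print(mismatches, best, round(averages[best], 5))  # [] XGBoost_4Y 0.02064
```

In every row, the Immigration column dominates the average. A plain mean therefore mostly reflects each model's error on that one feature.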
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, C.-Y.; Lee, S.-J. Regional Population Forecast and Analysis Based on Machine Learning Strategy. Entropy 2021, 23, 656. https://doi.org/10.3390/e23060656