1. Introduction
Human resources play a pivotal role in the prosperity of a country, and global development in general [
1,
2]. Globalization has resulted in people having more opportunities to work in many different countries [
3,
4]. With the rapid development of the economy in developed countries, labor resources are needed to meet requirements of the economy’s growth rate [
5,
6]. In addition, some developed countries face the dilemma of an aging population structure [
7,
8]. Therefore, there is an urgent need for foreign human resources to make up for the shortage of domestic human resources [
9]. To address these difficulties, many receiving countries have implemented policies aimed at attracting human resources from overseas. In turn, many sending countries have developed labor export plans to create jobs, thereby increasing individual incomes and contributing to economic growth in terms of Gross Domestic Product (GDP) [
10,
11].
Currently, many countries in the Asia-Pacific region participate in labor exports, with the Philippines, Indonesia, Thailand, and China being direct competitors to Vietnam in the international labor market, as they share similarities [
12]. The experiences of these countries in managing, operating, and developing labor export activities can serve as valuable lessons for Vietnam. It is estimated that around 20 million Southeast Asians work outside of their home countries, with approximately half of those being in the Middle East [
13]. Additionally, building on the results and experience gained during the era of international labor cooperation with socialist nations from 1980 to 1990, the Party and the State of Vietnam changed their perception of labor export and promoted labor export activities [
14]. On 9 November 1991, the government issued Decree No. 370/HDBT, which clearly stated: “Putting migrant workers to working abroad for a definite time is a method to generate jobs, produce more incomes for workers and boost foreign currency revenue for the country, resulting in the strengthening of economic, cultural, scientific and technical collaboration relations between Vietnam and other countries that use labor on the principle of equality, mutual benefit, respect for each other’s laws and national traditions” [
15]. In 2018, 500,000 Vietnamese contract labor migrants worked abroad, according to the latest statistics [
16]. Taiwan, Korea, and Japan are still three important labor export markets for Vietnam, accounting for more than 70% of the total number of laborers working abroad in 2008 and 2009, and are three markets that continue to have a high demand for Vietnamese workers [
17]. Additionally, several studies on labor exports in East and Southeast Asia have reported workers migrating from Vietnam to Singapore, South Korea, Taiwan, and Japan [
18,
19].
Many studies have applied Backpropagation Neural Network (BPNN), k-Nearest Neighbor (kNN), and Random Forest Regression (RFR) models for prediction [
20,
21,
22,
23,
24,
25,
26]. Chen, Lai [
27] used BPNN and ANN algorithms to forecast that the majority of international visitors to Taiwan would be from the main markets of Hong Kong, Japan, and Macau from January 1971 to August 2009. Likewise, Vihikan, Putra [
28] employed BPNN to predict the arrival of international tourists in Bali to assist the government in constructing the nation’s tourism strategies. Wang, Wu [
29] showed that the monthly inward tourism flow to China between 2000 and 2013 could be forecasted by deploying an Enhanced Backpropagation Neural Network. In addition, Lin, Malyscheff [
30] used ANN models to predict engineering students’ future retention, consistently achieving an overall prediction accuracy of around 70% or higher. Moreover, another study used RFR and Support Vector Regression (SVR) models to estimate global foreign tourist arrivals, achieving prediction accuracies of 99.4 percent with SVR and 84.7 percent with RFR [
31]. Additionally, the kNN technique has been used to study campground management, with several forecasting approaches tested to determine which best suited the distinctive behavior of camping tourists and the particular nature of campsites [
32].
This paper presents a forecast of Vietnamese labor migration using the kNN, RFR, and BPNN models. The data collected in this study comprise 29 annual observations from 1992 to 2020. Using these data, the paper analyzes the role of labor exports in Vietnam’s socio-economic development and compares Vietnam’s labor exports with those of other Asian countries. The three algorithms are then compared on the basis of their statistical accuracy indicators. This research also highlights the possible future contexts of labor migration between Vietnam and East Asia, including Korea, the Republic of China (Taiwan), and Japan. Some assume that the migration corridor between Vietnam and other East Asian nations has progressed to the point where labor migration is only one element in a more varied movement pattern. Nevertheless, there will continue to be significant demand for Vietnamese workers in Japan, Korea, and Taiwan for the foreseeable future. Finally, this research could assist the government of Vietnam in enacting new regulations for Vietnamese migrant workers in order to improve the socio-economic situation, and in determining a different approach to the country’s labor export activities in the context of integration.
The remainder of the paper is structured as follows. The data and the BPNN, kNN, and RFR modeling approaches are introduced in
Section 2. The study results and discussions are described in
Section 3 and
Section 4. Finally, the conclusions are presented in
Section 5.
2. Materials and Methods
The experimental stages of this study are summarized in
Figure 1. Firstly, the database is preprocessed and tested by statistical methods, and then divided into training and testing sets. Secondly, the BPNN, RFR, and kNN models are fitted to the training samples to obtain the optimal model parameters. Finally, the three models’ performances are compared using the accuracy measurement indicators Mean Square Error (MSE), Root-Mean-Square Error (RMSE), Mean Absolute Error (MAE), Correlation Coefficient (R), and Coefficient of Determination (R²). The most suitable prediction model is the one for which RMSE, MAE, and MSE attain the lowest values and R and R² attain the highest values; otherwise, the model parameters or the training data size are adjusted and the experiments are repeated in search of better accuracy indicators.
2.1. Database
This paper uses a database of Vietnamese labor migration to Korea, Japan, and Taiwan from 1992 to 2020 obtained from the Department of Overseas Labor (DOLAB). Python 3.9 is used for data analysis. The data in
Figure 2 show that the number of Vietnamese workers migrating to Northeast Asian countries fluctuated from 1992 to 2020. In total, 1,373,712 Vietnamese laborers migrated to Northeast Asian countries over the 29 years, of whom Japan, Taiwan, and Korea received about 811,138, 367,967, and 194,607 people, respectively. More than 100,000 workers per year migrated to the region during the period 2015–2019. However, the total number of Vietnamese workers going to work in Japan, South Korea, and Taiwan in 2020 was 76,355, a decrease of 55% compared with 2019 and a decline in total labor exports of nearly 56% compared with 2018. Moreover, the highest total number of Vietnamese workers working in Taiwan, Japan, and South Korea was 68,737 employees in 2018, then 82,703 employees in 2019, and 18,141 in 2008.
Descriptive statistics, namely the Mean, Min, and Max values, standard deviation (St Dev), skewness (Skew), and kurtosis (Kurt), are reported in
Table 1 to quantitatively analyze the yearly labor export characteristics and interpret their significance. The mean is the most popular measure of central tendency and may be used with continuous data. The Minimum and Maximum values indicate the range of the time series. The Standard Deviation measures the degree of data dispersion. Finally, Skewness and Kurtosis are used to assess whether the distribution of the sample is normal. The statistics in
Table 1 indicate the following: the Mean and Standard Deviation of the labor exports to Northeast Asian countries were 25,422 persons and 28,962 persons for Japan, 18,786 persons and 12,689 persons for Taiwan, and 4132 persons and 6711 persons for Korea, respectively. The skewness of a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative skewness values indicate left-skewed data, and positive skewness values indicate right-skewed data [
33,
34]. Skewness coefficients were low for the data sets, which makes the data appropriate for modeling because a high skewness coefficient has a considerable negative effect on ANN performance [
35]. Hence, the Kurt and Skew values of the data, ranging from −8.87 to 2.64, were considered acceptable for prediction with these models. The Augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root, separates the database into two parts: the data on labor exports to Korea (p-value = 0.02, test statistic = −3.16) reject the null hypothesis and are therefore stationary, whereas the data on labor exports to Japan (p-value = 0.62, test statistic = −1.32) and Taiwan (p-value = 0.99, test statistic = 2.22) do not reject it and are therefore non-stationary. Furthermore, when time series data are stationary, they can be modeled more easily and with higher accuracy than their non-stationary counterparts [
36].
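As a minimal sketch of how the descriptive statistics discussed above could be computed, the following Python snippet uses moment-based definitions of skewness and excess kurtosis. The function name `describe` and the two sample series are hypothetical illustrations rather than the DOLAB data, and the exact estimator variants behind Table 1 are not specified in the text, so the choice made here is an assumption.

```python
from statistics import mean, stdev

def describe(series):
    """Return mean, min, max, sample standard deviation, skewness, and excess kurtosis."""
    n = len(series)
    mu = mean(series)
    # Central moments in population form (divided by n); an assumed convention.
    m2 = sum((v - mu) ** 2 for v in series) / n
    m3 = sum((v - mu) ** 3 for v in series) / n
    m4 = sum((v - mu) ** 4 for v in series) / n
    return {
        "mean": mu,
        "min": min(series),
        "max": max(series),
        "st_dev": stdev(series),      # sample standard deviation (n - 1)
        "skew": m3 / m2 ** 1.5,       # zero for symmetric data
        "kurt": m4 / m2 ** 2 - 3.0,   # zero for a normal distribution
    }

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric series
right_skewed = [1, 1, 1, 10]       # hypothetical series with a long right tail
print(describe(symmetric))
print(describe(right_skewed))
```

The symmetric series yields a skewness of zero, while the series with one large value yields a positive skewness, in line with the interpretation of the sign given above.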
The input data patterns of the three nations were randomly split into two subsets. The first, containing roughly 70% of the total data, was used for the training phase; the remaining 30% of the labor export data was used for the test phase.
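The random 70/30 partition can be sketched as follows. This is an illustrative example using only the Python standard library; the helper name `random_split`, the seed, and the use of calendar years as stand-in records are assumptions, not the authors' code.

```python
import random

def random_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

years = list(range(1992, 2021))             # 29 annual observations
train_years, test_years = random_split(years)
print(len(train_years), len(test_years))
```

With 29 annual observations, this rule assigns 20 observations to training and 9 to testing.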
2.2. Backpropagation Neural Network (BPNN)
BPNN can learn and store various data, akin to the human brain. A network with a single hidden layer is shown in
Figure 3. A BPNN has three kinds of layers: an input layer, at least one hidden layer, and an output layer. Adjacent layers are connected by weights, with values always lying between 0 and 1. Although several heuristic techniques have been deployed by many researchers [
37,
38], there is no systematic theory for determining the number of input nodes and hidden layer nodes. Experiments or trial and error based on the lowest mean square error on the test data are the most suitable methods for determining the appropriate numbers of input and hidden nodes. Equations (1) and (2) give the relationships of the McCulloch–Pitts (M–P) neurons in the hidden layer and the output layer [
39].
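$$b_j = f\left(\sum_{i=1}^{d} w_{ij}\,x_i + \theta_j\right), \quad j = 1, \dots, m \qquad (1)$$
$$y_k = \sum_{j=1}^{m} v_{jk}\,b_j + \gamma_k \qquad (2)$$
where $b_j$ is the result of the $j$-th hidden neuron, $x_i$ is the $i$-th input value out of $d$ inputs, $y_k$ is the $k$-th output value, $m$ is the total number of hidden neurons, $f$ is the rectified linear unit (ReLU) activation function, $w_{ij}$ and $v_{jk}$ are the weight terms, and $\theta_j$ and $\gamma_k$ are the bias terms.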
Figure 3 shows the BPNN structure, with 29 input nodes from x1 to xd (where d = 29, the number of years) in the input layer; the 29 neurons of the output layer give the values of the labor exports. The single hidden layer contains neurons from b1 to bm (where m = 200). Each neuron of the hidden layer and of the output layer has its own weight and bias terms. Each hidden layer neuron takes the outputs of the input layer neurons and transforms them with a weighted linear sum, and the output layer receives the values from the hidden layer. The activation function for the hidden layer is the ReLU function, and the Adam stochastic optimization method is used as the solver for weight optimization. The models are trained as regression models.
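To make the forward computation of such a network concrete, the sketch below evaluates a single-hidden-layer network with a ReLU hidden layer and a linear output layer in plain Python. The argument names `w`, `theta`, `v`, and `gamma` (weights and biases) and the tiny two-input example are assumptions chosen for illustration; training by backpropagation with the Adam solver is deliberately omitted.

```python
def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)

def forward(x, w, theta, v, gamma):
    """One forward pass: inputs -> ReLU hidden layer -> linear output layer.

    w[i][j] connects input i to hidden neuron j; theta[j] is the hidden bias.
    v[j][k] connects hidden neuron j to output k; gamma[k] is the output bias.
    """
    hidden = [
        relu(sum(w[i][j] * x[i] for i in range(len(x))) + theta[j])
        for j in range(len(theta))
    ]
    outputs = [
        sum(v[j][k] * hidden[j] for j in range(len(hidden))) + gamma[k]
        for k in range(len(gamma))
    ]
    return hidden, outputs

# Hypothetical toy network: 2 inputs, 2 hidden neurons, 1 output.
x = [1.0, 2.0]
w = [[0.5, -1.0], [0.25, 0.5]]
theta = [0.0, -0.5]
v = [[2.0], [1.0]]
gamma = [0.1]
hidden, outputs = forward(x, w, theta, v, gamma)
print(hidden, outputs)
```

In the toy example the second hidden neuron receives a negative pre-activation and is therefore switched off by ReLU, so only the first hidden neuron contributes to the output.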
2.3. K-Nearest Neighbor (kNN)
In machine learning, kNN is one of the most fundamental supervised learning algorithms. The kNN model can be used for regression as well as classification. Because the observed data have a numeric label to forecast, kNN is employed here as a regression method with a multi-input multi-output (MIMO) strategy for forecasting [
32]. kNN begins by initializing the number of neighbors k. Next, the distances between the training and testing instances are calculated. The prediction is then made by averaging the target values of the k nearest training neighbors of each test instance. The Euclidean distance is used, as given in Equation (3).
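$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2} \qquad (3)$$
where $d$ denotes the distance between data points, $x$ represents the testing data, $y$ is the training data, and the sum runs over the components $i$ of the two points.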
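The procedure just described can be sketched in a few lines of Python. This is a minimal illustration assuming a single-output regression with uniform averaging over neighbors; the function name `knn_predict` and the toy data are hypothetical, and the MIMO strategy is not reproduced here.

```python
import math

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to the query."""
    # Pair each training target with its Euclidean distance to the query.
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / k

# Hypothetical one-feature training set.
train_x = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]
prediction = knn_predict(train_x, train_y, query=(2.2,), k=2)
print(prediction)
```

Here the two nearest neighbors of the query are the points at 2.0 and 3.0, so the prediction is the average of their targets.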
2.4. Random Forest Regression (RFR)
Random forest is an ensemble method for classifying or forecasting the numerical value of a variable that combines the outputs of multiple decision trees [
40,
41,
42]. When a random forest receives an input vector $x$ containing the values of the various evidentiary features analyzed for a certain training area, RFR creates a total of $K$ regression trees and then averages their outputs. After $K$ trees $\{T_k(x)\}_{k=1}^{K}$ have been grown, the random forest regression predictor is given by Equation (4) [
43]:
$$\hat{f}_{rf}^{K}(x) = \frac{1}{K} \sum_{k=1}^{K} T_k(x) \qquad (4)$$
To avoid correlation between the different trees, RFR promotes tree diversity by growing them from different training data subsets obtained through a method known as bagging [
43]. Bagging is a method for generating training data by randomly resampling the original dataset with replacement. Therefore, some observations may be used several times during training, whereas others may never be used. As a result, greater stability is achieved, as the model becomes more robust to minor deviations in the input data while also increasing forecast accuracy [
40].
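The bagging-and-averaging idea can be illustrated with a deliberately simplified sketch. It assumes a single feature and uses depth-1 regression trees (stumps) in place of fully grown trees, and it omits the random feature selection at each split that a full random forest also performs; all function names and the toy data are hypothetical.

```python
import random
from statistics import mean

def bootstrap_sample(xs, ys, rng):
    """Resample the data with replacement (bagging)."""
    idx = [rng.randrange(len(xs)) for _ in xs]
    return [xs[i] for i in idx], [ys[i] for i in idx]

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree by minimizing the squared error of one split."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:          # no split possible at the largest value
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:           # all x identical in this bootstrap sample
        constant = mean(ys)
        return lambda x: constant
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(xs, ys, rng)) for _ in range(n_trees)]

def predict(trees, x):
    """Average the outputs of all trees, as in Equation (4)."""
    return mean(tree(x) for tree in trees)

xs = list(range(10))                 # hypothetical feature values
ys = [1.0] * 5 + [5.0] * 5           # hypothetical step-shaped target
trees = fit_bagged(xs, ys)
low, high = predict(trees, 0), predict(trees, 9)
print(low, high)
```

Because each stump sees a different bootstrap sample, individual trees may place their split slightly differently, and averaging their outputs smooths out those differences.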
2.5. Performance Metrics
Model performance is evaluated by comparing the actual values with the forecasted values. The accuracy measurement parameters include the Mean Square Error (MSE), Root-Mean-Square Error (RMSE), Mean Absolute Error (MAE), Correlation Coefficient (R), and Coefficient of Determination (R²). The error metrics are defined as follows [
44,
45,
46]:
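$$MSE = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$
$$MAE = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$$
$$R = \frac{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)\left(\hat{y}_t - \bar{\hat{y}}\right)}{\sqrt{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}\,\sqrt{\sum_{t=1}^{n}\left(\hat{y}_t - \bar{\hat{y}}\right)^2}}$$
$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}$$
where $y_t$ and $\hat{y}_t$ are the observed and estimated values in time period $t$, $n$ is the number of observed values in the testing data, and $\bar{y}$ and $\bar{\hat{y}}$ are the means of the observed and estimated values. $R^2$ should approach 1 to indicate strong model performance, and the MSE, MAE, and RMSE should be as close to zero as possible.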
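For illustration, the five indicators can be computed with the short Python sketch below. It assumes the standard textbook definitions (in particular, R² computed as one minus the ratio of residual to total sum of squares); the function name `accuracy_metrics` and the example values are hypothetical and unrelated to the study's data.

```python
import math
from statistics import mean

def accuracy_metrics(observed, estimated):
    """Compute MSE, RMSE, MAE, correlation coefficient R, and R-squared."""
    n = len(observed)
    residuals = [y - yhat for y, yhat in zip(observed, estimated)]
    mse = sum(e ** 2 for e in residuals) / n
    mae = sum(abs(e) for e in residuals) / n
    y_bar, yhat_bar = mean(observed), mean(estimated)
    # Sums of squares and cross-products around the respective means.
    ss_obs = sum((y - y_bar) ** 2 for y in observed)
    ss_est = sum((yhat - yhat_bar) ** 2 for yhat in estimated)
    cross = sum((y - y_bar) * (yhat - yhat_bar)
                for y, yhat in zip(observed, estimated))
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "R": cross / math.sqrt(ss_obs * ss_est),
        "R2": 1.0 - sum(e ** 2 for e in residuals) / ss_obs,
    }

observed = [1.0, 2.0, 3.0, 4.0]      # hypothetical actual values
estimated = [1.5, 2.0, 2.5, 4.5]     # hypothetical forecasts
metrics = accuracy_metrics(observed, estimated)
print(metrics)
```

A model whose forecasts track the observations closely yields errors near zero and R and R² near 1, which is the selection rule applied to the three models.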
3. Results Analysis
Three models were established to predict labor exports to three countries (Taiwan, Japan, and Korea). To compare the models on their performance in estimating labor exports, simulation experiments identified the optimum models, whose main parameters are given in
Table 2.
Line graphs in
Figure 4 illustrate the RFR (red lines), kNN (grey lines), and BPNN (green lines) predictions on the testing data, compared with the actual values (blue lines). For all three nations, the BPNN lines are the closest to the real data. The grey lines show that kNN achieves the second-most accurate level in Taiwan and Japan, whereas kNN achieves the lowest level in Korea for a selection of random years from 1994 to 2016. The accuracy parameters for the labor migration forecast are therefore calculated (as shown in
Table 3) to assess the three forecasting models’ accuracy levels. The simulation results in
Table 3 indicate that the estimates obtained with the kNN, RFR, and BPNN algorithms are highly reliable.
Additionally, when the kNN, RFR, and BPNN models are applied to the three nations and assessed with the accuracy parameters (MAPE, MAE, RMSE, R-squared, and NSE), the BPNN algorithm has the best accuracy indicators in Taiwan, Japan, and Korea, whereas the RFR method has the worst. Furthermore, BPNN is the best machine learning algorithm in Taiwan, Japan, and Korea, achieving higher estimation accuracy (with error values of 0.006, 0.006, and 0.07, respectively) than RFR (0.145, 0.137, and 0.051) and kNN (0.073, 0.066, and 0.051). Similarly, based on the MAE, RMSE, and NSE indicators, the BPNN algorithm acquired better accuracy parameters in Taiwan, Japan, and Korea. Across the forecast models for Taiwan, Japan, and Korea, the parameters of BPNN achieve a similar level of accuracy; however, they can still be ranked, with the best attaining a value of 57, the second-best 488, and the weakest 638. In addition, the scatter plots of the three models in
Figure 5 present the simulation results. As indicated by the dispersion of points around the diagonal, the BPNN model attains the highest accuracy in the three countries, the kNN algorithm is second, and the RFR algorithm attains the lowest accuracy.
The Taylor diagrams examine the agreement between estimated and real values using the standard deviation and the correlation. The Taylor chart shows the standard deviation of, and the correlation between, the actual and predicted datasets for each model; general consistency between observed and estimated values is indicated when the correlation approaches 1, as shown in
Figure 6. In Japan, the kNN, RFR, and BPNN algorithms have SD (testing phase) = 20,649, 20,463, and 19,847, respectively, which are near to that of the actual data (SD = 28,962), so the three models achieve a similar level of accuracy in prediction; meanwhile, the BPNN model’s standard deviations in Japan and Korea (SD = 11,089 and 3637) imply that the predicted values of the BPNN model are the nearest to the real values. Hence, the closer a model’s standard deviation is to that of the observed data and the higher its correlation, the better the agreement. The Taylor plot indicates that the kNN, RFR, and BPNN algorithms all provide accurate forecasts.
4. Discussions
Regarding labor exports of Asian countries, Indonesia was the second-largest sending country of labor migrants in Asia after the Philippines, with a long history of immigration and emigration. The Philippines, for example, sent about one million employees abroad in 2005, Indonesia about 400,000, and Bangladesh and Sri Lanka each more than 200,000; meanwhile, Vietnam deployed only 70,000 to 80,000 people in the same year. According to the most recent published estimates, there are around 500,000 Vietnamese contract workers globally [
47]. According to the International Organization for Migration, the number of Indonesian labor migrants rose significantly from 517,619 to 696,746 between 1996 and 2007, with the top five destinations being Saudi Arabia, Malaysia, Taiwan, Singapore, and Hong Kong [
48]. Furthermore, although Japan welcomes highly qualified migrants through open policies, the number of highly skilled migrants in the targeted nations is still relatively small. In 2010, the 198,000 highly skilled migrants in Japan accounted for barely 9% of the 2.1 million migrants [
49]. The comparison with other Asian countries shows that, with more than 100,000 workers per year migrating to Japan, Taiwan, and Korea, Vietnam ranked third in Asia during the period 2015–2019.
Furthermore, labor exports not only help improve workers’ lives but also substantially impact the country’s economy. Remittances from overseas to Vietnam are expected to amount to between three and four billion dollars per year, which is encouraging news for the Vietnamese economy [
50].
Labor migration has played a crucial role in Vietnam’s socio-economic development. Firstly, it can improve income and transform workers’ perceptions. Many workers, for instance, have returned to Vietnam to establish small and medium-sized businesses, contributing to the eradication of large-scale enterprises and applying in Vietnam the practical experience gained in many areas of the world. Secondly, labor migration can help to alleviate poverty and advance Vietnam’s socio-economic growth. Thirdly, labor migration provides a substantial source of foreign currency while lowering the investment needed to address domestic employment. Finally, labor exports are also a mechanism for transferring innovative technology from other nations, assisting in training a quality workforce, and strengthening international cooperation between Vietnam and the Northeast Asian countries. Forecasting labor exports is therefore important for Vietnam. The forecast results will assist labor export policymakers in assessing foreign countries’ potential and labor demand. At the same time, governments can develop vocational training programs and prepare human resources that fit those countries’ needs.
This paper used kNN, BPNN, and RFR models to analyze and estimate Vietnamese human resources working in Northeast Asian countries. The different influential factors and parameters have been described in the simulations. The key findings are as follows. The three models, BPNN, RFR, and kNN, produced fairly accurate forecasts for the three countries Taiwan, Korea, and Japan. Regarding the error indices, the BPNN model predicting Vietnamese labor migration to Northeast Asian countries obtained the best R2, MAPE, RMSE, and MAE values, while the RFR and kNN models also showed good forecasting performance. BPNN therefore stood out compared with the RFR and kNN algorithms. In addition, the forecasting errors of the models increased as the size of the testing data increased.
The model performed well, with a training accuracy of 89.98% and a validation accuracy of 84.05%. In terms of estimating tourism demand in Japan, Chen, Lai [
27] used a novel forecasting model based on empirical mode decomposition (EMD) and a BPNN, which yielded a MAPE of 0.958% and an RMSE of 1443 for the proposed EMD-BPNN algorithm, values that are higher than those of this study. Moreover, Mishra et al. (2021) [
31] found that random forest regression works well for predicting international tourist arrivals with small datasets (with an R-squared of 0.847), a value higher than the R-squared obtained in this study.
In addition, although the three models used here are classical methods, when applied to time series data they give performance comparable to deep learning models (such as Long Short-Term Memory (LSTM) models), with faster training and lower resource use (such as computer memory). For instance, Wang et al. (2020) [
29] deployed BPNN and LSTM models to forecast the monthly inbound tourism to mainland China (2000–2013), the monthly tourist arrivals to Turkey from different countries (2000–2011), and the monthly inbound tourism to the United States of America (June 2006–May 2018). Their results showed that the R of the BPNN model (0.999) was approximately equal to the R of the LSTM model (0.998).
Although this study used annual data to anticipate labor force migration, the results showed that the three models produced reliable results for the three specific countries. This suggests that the approach could also be applied to predict labor migration in other countries around the globe.
5. Conclusions
Labor exports have been among the most important tasks in Vietnam’s socio-economic development over the past three decades. Labor exports help increase income and improve skills for workers, and they also contribute to reducing unemployment in the country. Hence, the study aimed to examine the changes in Vietnamese labor migration to Northeast Asian countries. Based on annual data from 1992 to 2020, this study focused on the role of labor exports in Vietnam’s socio-economic development and compared them with the movement of human resources from other Asian countries. The study then implemented three machine learning models for estimation. The results indicated that the three models achieved highly accurate predictions of labor migration, with the BPNN model more accurate than the other algorithms. Hence, the BPNN model can be effectively applied to labor migration prediction. In the context of integration, forecasting variation in human resource exports may help to find a separate direction for human resource activities with Northeast Asian countries. Although the MAPE, MAE, and RMSE parameters of all three models attained a high level of accuracy when compared with the parameter values of other studies, the BPNN model followed the actual labor exports most closely, with MAPE = 0.06, MAE = 191, RMSE = 225; MAPE = 0.006, MAE = 71, RMSE = 102; and MAPE = 0.007, MAE = 57, RMSE = 67 for the three destination markets. In addition, this study showed that machine learning models can play a key role in the decision-making process for managing labor exports. The Augmented Dickey-Fuller (ADF) test results indicate that stationary data yield more accurate forecasts than non-stationary data. The study results support stakeholders in establishing projects related to building and training high-quality human resources for migration, increasing the laborers’ surplus value.
Moreover, understanding human resource migration helps governments enforce suitable policies to attract labor from overseas. Another possible direction for future work is to analyze in depth the contribution of labor migration to Vietnam’s GDP.