Big Data Spatio-Temporal Correlation Analysis and LRIM Model Based Targeted Poverty Alleviation through Education

Han, Yue; Liu, Lin; Sui, Qiaoli; Zhou, Jiaxing

doi:10.3390/ijgi10120837

¹ College of Geodesy and Geomatics, Shandong University of Science and Technology, Qingdao 266590, China

² College of Resource Environment and Tourism, Capital Normal University, Beijing 100048, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

ISPRS Int. J. Geo-Inf. 2021, 10(12), 837; https://doi.org/10.3390/ijgi10120837

Submission received: 8 October 2021 / Revised: 25 November 2021 / Accepted: 10 December 2021 / Published: 20 December 2021

Abstract

There are many factors affecting poverty, among which education is an important one. First, from the perspective of numerical statistics, this research quantitatively analyzes the correlation between average education years (AEY) and Gross Domestic Product per capita (GDP/C) and finds a significant positive correlation between AEY and GDP/C across the provinces of China. Furthermore, from the perspective of spatial distribution and geostatistics, this research analyzes the correlation between AEY and the distribution of poor counties, revealing the inherent connection between education and poverty. Based on processed nighttime light remote sensing images, this research adopts the random forest machine learning method to extract the spatio-temporal distribution of poor counties. The analysis shows that poor counties are spatially concentrated and spatially autocorrelated, and that their number decreases year by year. On this basis, we analyze the correlation between education levels and the distribution of poor counties. On the spatial scale, AEY in poor counties is relatively low, while AEY in non-poor counties is relatively high, showing a significant negative correlation between AEY and poverty. On the temporal scale, the number of poor counties gradually decreased from 2000 to 2010, and at the same time, the education levels of poor counties gradually improved. Finally, from the perspective of improving education levels to promote poverty elimination, we analyze the main factors affecting education using Principal Component Analysis (PCA) and other methods and obtain a regression model. This research proposes the Linear and Residual Integration Model (LRIM) to more accurately predict AEY in each province in 2020 based on historical data, and identifies the regions with low AEY as key regions for targeted poverty alleviation through education (TPAE) in the future. This research provides a decision-making basis for implementing TPAE and supports the national effort to eliminate poverty through education.

Keywords:

targeted poverty alleviation through education; correlation analysis; random forest; LRIM composite model; nighttime light remote sensing image

1. Introduction

The Chinese government innovatively proposed a strategy for targeted poverty alleviation (TPA) and poverty elimination [1,2] in 2013 and implemented a major policy of TPA in 2015. TPA is a poverty alleviation method that accurately identifies, assists and manages poverty alleviation objects based on the actual condition of different poor regions and different poor peasant households [3]. There are different understandings for the definition of TPA. Li and Ye [4] believed that TPA is to carry out precision poverty alleviation in different poor regions according to scientific standards, and introduced a dynamic development mechanism for poverty alleviation according to the local actual condition. Dong [5] thought that not only the targets of TPA should be precise to assist truly poor regions, but also poverty alleviation measures and effects should be precise. At the same time, many scholars had conducted research on TPA. Liu et al. [6] analyzed the relevant policy system, mechanism innovation and future challenges of poverty alleviation. Wu [7] analyzed the dilemmas of rural public management under TPA and proposed targeted solutions. Ruoqi et al. [8] analyzed the existing problems of TPA mechanism in industrial poverty alleviation through local poverty alleviation effects and experience, and innovated the mechanism to help poor households get rid of poverty and become rich. Zhao et al. [9] proposed a poverty alleviation mode combined with a big data platform and put forward targeted solutions and suggestions. Zhang et al. [10] simulated the interactive relationship between local government and poor households through an evolutionary game to improve the effects of the poverty elimination battle.

Poverty alleviation through education (PAE) is the most effective and direct way of TPA [11] and is an important breakthrough in realizing regional poverty alleviation [12]. “To alleviate poverty, we should first support education, and to eliminate poverty, we should first eliminate ignorance.” PAE aims to fundamentally change the status and situation of poverty by improving education levels [13,14]. Liu [15] pointed out that precise assistance to poor regions requires that the development of education be given priority in these regions. Sun and Guo [16] analyzed the effects and defects of targeted poverty alleviation through education (TPAE) for the poor and found many problems in current TPAE. Yuan [17] argued that education must be developed in order to change the state of poverty and achieve sustainable development in poor regions. Peters and Besley [18] pointed out that a lack of learning opportunities led to more children falling into crisis and poverty in New Zealand. Meng et al. [19] found a certain correlation between gross domestic product per capita (GDP/C) and average education years (AEY) in Gansu Province. Sun and Liu [20] explored the dynamic trend of education equity between regions. Liu [21] analyzed poverty alleviation through finance and education using big data and studied new modes and ways of TPA in Henan. Ma et al. [22] explored the factors affecting AEY and analyzed the effects of different influencing factors on AEY. Janjua and Kamal [23] found a significant correlation between education improvement and the reduction of the poor population. Chen et al. [24] established educational poverty alleviation systems at each level by comparing and analyzing education poverty alleviation policies in developed countries such as the United States and the United Kingdom. Liu et al. [25] investigated the spatial agglomeration effect and driving forces of rural education levels and poverty in 27 provinces of China. Xu et al. [26] constructed a spatial dynamic panel model of the relationship between fiscal education expenditure on poverty alleviation and rural poverty, measuring the direct and indirect poverty alleviation effects of financial expenditure on PAE. Paraschiv [27] explored the role of education in poverty alleviation and studied the social and economic impacts of poverty on countries in the Organization for Economic Co-operation and Development.

The analysis of the existing research shows that there are many studies related to poverty alleviation but few related to PAE. Moreover, the research on TPAE is in the preliminary stage. For TPAE, the existing studies have the following three problems. First, most of the research focuses on the concept, relevant policies and measures of TPA from the perspective of social science, and some carry out case studies in specific regions. It is rare to use quantitative scientific methods to research the correlation between education and poverty and how to carry out TPAE. Second, although the idea that education can reduce poverty has been widely accepted, there is a lack of theoretical basis behind this phenomenon. There are few studies exploring the causal effect of education on poverty reduction. Third, the key to achieving the goal of TPAE is to clarify the inner connection between education and poverty, and qualitative research cannot meet this need. At present, there is a lack of quantitative research on the relationship between education and poverty, and it is not feasible to rely solely on statistical analysis.

In response to the above problems, the main contributions of this research are as follows. In terms of method, GIS spatial analysis and geostatistical analysis are introduced and combined with traditional mathematical-statistical analysis to carry out TPAE research. In terms of specific application, AEY is chosen as the indicator of education development levels. The research analyzes the internal relationship between education development levels and poverty from two aspects, the relationship between AEY and GDP/C and the relationship between AEY and the distribution of poor counties, quantitatively analyzes the correlation, and explores its causal effects, providing a theoretical basis for TPAE. On this basis, we further explore the influencing factors of education levels and adopt the Principal Component Analysis (PCA) method to extract the main factors and obtain the regression model. Finally, this research combines the advantages of the Autoregressive Integrated Moving Average (ARIMA) model for linear time series prediction and the Back Propagation (BP) neural network model for nonlinear residual prediction, and builds the Linear and Residual Integration Model (LRIM) to accurately predict AEY in provinces. The regions with low AEY are then determined as the key regions for TPAE in the future, providing a decision-making basis for TPAE.

The organization of the rest of the paper is as follows. Section 2 introduces the research area and data, including the types and sources of data, the total population with different education levels, GDP/C indicator and nighttime light remote sensing data. Section 3 describes the research methods used in the paper, including the proposed research framework, correlation analysis method, random forest classification method for nighttime light image, analysis method for influencing factors of education, etc., and proposes LRIM integrated model at the end of this section to predict education levels. Section 4 mainly presents the results of the correlation analysis between education and poverty obtained by the above methods, and the prediction results of the AEY obtained by the constructed model and analyzes and discusses the results. Section 5 draws the research conclusions, and based on the conclusions, determines the targeted regions for TPAE and the effective measures that should be adopted.

2. Data

The research area is the Chinese mainland, including 22 provinces, five autonomous regions, and four municipalities, and does not involve Taiwan, Hong Kong, and Macao regions of China.

The research uses three data sources of Census Data, Statistical Yearbook Data, and nighttime light remote sensing data to calculate the required indicators.

2.1. Population Number with Different Education Levels

Education levels are divided into five levels: illiteracy, elementary school education level (EEL), junior high school education level (JHEL), high school and technical secondary school education level (HEL), and university and above education level (UEL). The population at each of the five education levels is counted using the Sixth Census Data, the 2011 Statistical Yearbook, and the 2011 Education Statistical Data. The results are shown in Table 1 and are used to analyze the spatial pattern of national education levels.

2.2. GDP/C Indicator

According to the population and the total GDP of each province in Sixth Census Data, GDP/C indicator of each province in 2010 is obtained, as shown in the last column of Table 1.

2.3. Nighttime Remote Sensing Data

In order to match Census Data and Education Statistics Yearbook Data, the research selects nighttime light remote sensing data (DMSP/OLS) for the four periods of 1995, 2000, 2005, and 2010. We crop the research area image based on the national vector map and perform DN outlier processing, radiation correction and other preprocessing. From the preprocessed data, at the pixel scale within each county, we extract 11 classification characteristics of the nighttime light images from four aspects: quantitative characteristics, dispersion degree, distribution characteristics, and spatial characteristics. The classification characteristics are listed in Table 2.

The research selects 100 poor counties and 110 non-poor counties as classification samples and extracts characteristic indicators of A1–A11 based on nighttime light remote sensing data. Considering space limitations, we select an indicator value from each of the four categories of characteristics, displaying the characteristic values of each county in 2010, as shown in Figure 1. The A1 indicator is selected as the quantitative characteristic, the A4 indicator is selected as the dispersion degree, the A8 indicator is selected as the distribution characteristic, and the A11 is selected as the spatial characteristic.

3. Methods

3.1. Proposed Research Framework

The proposed research framework of the research is shown in Figure 2, and mainly includes three parts, respectively the correlation analysis between education and poverty, the analysis of the influencing factors of education levels, and the prediction of education levels for the purpose of determining the poverty alleviation region.

3.2. Correlation Analysis Method

3.2.1. Calculation of AEY

AEY generally refers to the average number of years of academic education received by the population aged 6 and above. It is calculated for each province from the population at each of the five education levels: the weighted sum of schooling years is divided by the total population across the five levels.

Y_{ave} = \frac{NP_{Lev1} \times 0 + NP_{Lev2} \times 6 + NP_{Lev3} \times 9 + NP_{Lev4} \times 12 + NP_{Lev5} \times 16}{\sum_{i = 1}^{5} NP_{Levi}}

(1)

In the formula, $Y_{ave}$ is AEY, and $NP_{Lev1}$ to $NP_{Lev5}$ are the populations at the five education levels: illiteracy, EEL, JHEL, HEL, and UEL, respectively. The coefficients are the cumulative years of schooling at each stage of education under the current education system of China.

Table 3 shows AEY from 1989 to 2019 in China.
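As a minimal sketch of the AEY calculation, the snippet below takes the weighted sum of Formula (1) and divides it by the total population so that the result is an average. The function name and the head counts are hypothetical and are not taken from Table 1.

```python
# Years of schooling assigned to each education level in Formula (1).
YEARS = {"illiteracy": 0, "EEL": 6, "JHEL": 9, "HEL": 12, "UEL": 16}


def average_education_years(population):
    """Population-weighted mean years of schooling (AEY).

    `population` maps each education level to a head count.
    """
    total = sum(population[level] for level in YEARS)
    if total == 0:
        raise ValueError("population must contain at least one person")
    weighted = sum(population[level] * YEARS[level] for level in YEARS)
    return weighted / total


# Hypothetical head counts for one province (illustrative only).
example = {"illiteracy": 100, "EEL": 300, "JHEL": 400, "HEL": 150, "UEL": 50}
aey = average_education_years(example)
```

With these illustrative counts the weighted sum is 8000 person-years over 1000 people, so `aey` is 8.0 years.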

3.2.2. Calculation of Correlation Coefficient

This research first uses the Pearson Correlation Coefficient (PCC) [30,31] to calculate the relationship between education and poverty. Because PCC is sensitive to data errors and extreme values while the Spearman Correlation Coefficient (SCC) [28,29] is not, SCC is then used to calculate the same relationship. The advantage of using the two correlation coefficients is that the results can be mutually verified.

PCC [32,33] is used to measure the linear correlation between interval variables by studying whether two data sets are on the same line. The calculation formula is:

ρ_{X, Y} = \frac{\sum X Y - \frac{\sum X \sum Y}{N}}{\sqrt{(\sum X^{2} - \frac{{(\sum X)}^{2}}{N}) (\sum Y^{2} - \frac{{(\sum Y)}^{2}}{N})}}

(2)

SCC [34] is based on PCC and uses the ranks of the elements in their respective sets to measure the monotonic relationship between the two sets. With $d_{i}$ the difference between the two ranks of the $i$-th observation and $n$ the number of observations, the calculation formula is:

ρ = 1 - \frac{6 \sum d_{i}^{2}}{n^{3} - n}

(3)
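Formulas (2) and (3) can be sketched directly in code. The version below is illustrative only: the function names and the five data points are hypothetical, and the rank helper assumes there are no tied values, which is the case in which Formula (3) is exact.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient in the raw-sum form of Formula (2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (sxy - sx * sy / n) / sqrt((sxx - sx**2 / n) * (syy - sy**2 / n))


def ranks(values):
    """1-based ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation coefficient from rank differences, Formula (3)."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n**3 - n)


# Hypothetical AEY (years) and GDP/C (yuan) for five provinces.
aey = [7.1, 8.0, 8.6, 9.4, 11.2]
gdp = [9000, 21000, 18000, 35000, 76000]
pcc = pearson(aey, gdp)
scc = spearman(aey, gdp)
```

In this toy data only two provinces swap rank positions, so the squared rank differences sum to 2 and `scc` is 0.9. Computing both coefficients on the same pairs is what allows the mutual check described above.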

3.2.3. Random Forest Classification Algorithm

This research uses the random forest algorithm of machine learning to extract the poor counties from nighttime light remote sensing images [35,36]. The random forest algorithm is based on the Bagging algorithm. $N$ training sets with the same sample size as the original sample set are randomly sampled from the original sample set with replacement, and $N$ decision trees $\{h(x, θ_{n}), n = 1, 2, \dots, N\}$ are built from the $N$ training sets to form a “forest” [37,38]. Samples that are not selected for a training set are out-of-bag data. For an original sample set of size $M$, the probability that a sample is out-of-bag data is ${(1 - 1 / M)}^{M}$. When $M$ is large enough, the probability approaches $1/e$, about 36.8%. When each decision tree grows, $F$ characteristic variables are extracted randomly and with equal probability from all $E$ characteristic variables (usually $F = \lfloor \log_{2} E \rfloor + 1$) to construct the tree. Each decision tree is grown to its maximum extent without pruning. The final result of the random forest is determined by voting over the results of the $N$ decision trees, expressed as the average of the tree outputs:

h(x) = \frac{1}{N} \sum_{n = 1}^{N} h(x, θ_{n})

(4)

In the formula, $x$ is the input variable of the model, and $θ_{n}$ is an independent and identically distributed random vector.
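Two of the quantities in this description are easy to check numerically. The sketch below is illustrative: it evaluates the out-of-bag probability for a bootstrap sample drawn from 210 training samples (100 poor and 110 non-poor counties) and compares it with the limit 1/e, and it evaluates the usual choice of F for the 11 nighttime light characteristics. Reading the brackets around the logarithm as a floor is an assumption, and the function names are hypothetical.

```python
from math import exp, floor, log2


def out_of_bag_probability(sample_size):
    """Chance that one sample is never drawn in a bootstrap of the same size."""
    return (1 - 1 / sample_size) ** sample_size


def features_per_tree(total_features):
    """Number of candidate features, assuming F = floor(log2(E)) + 1."""
    return floor(log2(total_features)) + 1


limit = exp(-1)                       # 1/e, the large-sample limit
p_210 = out_of_bag_probability(210)   # 210 = 100 poor + 110 non-poor samples
f_11 = features_per_tree(11)          # 11 nighttime light characteristics
```

For 210 samples the out-of-bag probability is already within about 0.1 percentage points of 1/e (roughly 36.8%), and with 11 characteristics the rule gives F = 4.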

3.2.4. Spatial Autocorrelation Analysis

Moran’s I [39,40,41,42] is a method to calculate the spatial autocorrelation coefficient, and this research uses it to analyze the spatial autocorrelation of the distribution of poor counties. The value of Moran’s I lies in [−1, 1] and is used to judge whether there is an autocorrelation relationship in space. With $n$ the number of spatial units, $Z_{i}$ the deviation of the attribute value of unit $i$ from the mean, $w_{i, j}$ the spatial weight between units $i$ and $j$, and $S_{0}$ the sum of all spatial weights, the statistical formula of Moran’s I for spatial autocorrelation is:

I = \frac{n}{S_{0}} \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} w_{i, j} Z_{i} Z_{j}}{\sum_{i = 1}^{n} Z_{i}^{2}}

(5)
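A minimal sketch of Formula (5) follows, using a small hand-built weight matrix. The four-county chain and the binary adjacency weights are hypothetical; the source does not state which spatial weight scheme was used.

```python
def morans_i(values, weights):
    """Global Moran's I following Formula (5).

    `values` holds one attribute value per spatial unit and `weights[i][j]`
    is the spatial weight between units i and j (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations Z_i
    s0 = sum(sum(row) for row in weights)               # S_0, sum of weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical chain of four counties; neighbours share a border (weight 1).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 0, 0], chain)    # like values sit side by side
alternating = morans_i([1, 0, 1, 0], chain)  # like values never touch
```

The clustered pattern gives a positive value (1/3) and the alternating pattern gives −1, which matches the interpretation that positive values indicate spatial clustering of similar counties.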

3.3. Analysis Method of Influencing Factors

3.3.1. Influencing Indicator Selection

There are many factors affecting education development levels in a region. Generally speaking, the factors can be divided into the following categories: population distribution, economic development, funding support, teachers, etc. Seven influencing factors are extracted from these four categories. The natural population growth rate factor $X_{1}$, the urban-rural population ratio factor $X_{2}$, the population sex ratio factor $X_{3}$, the industry population structure factor $X_{4}$, and GDP/C $X_{5}$ are selected as the demographic and economic factors, the compulsory education funding factor $X_{6}$ is selected as the funding factor, and the ratio of students per teacher $X_{7}$ is selected as the teacher factor.

Using the 2010 Census Data and the 2011 Statistical Yearbook Data, the above seven factors are statistically calculated. Among them, the urban-rural population ratio, industry population structure, compulsory education funding, and students-per-teacher factors are calculated by the following formulas.

X_{2} = \frac{NP_{rural}}{NP}

(6)

X_{4} = \frac{NP_{low}}{NP_{indu}}

(7)

X_{6} = PEF_{Lev2} + PEF_{Lev3}

(8)

X_{7} = (SPT_{Lev2} + SPT_{Lev3} + SPT_{Lev4} + SPT_{Lev5}) / 4

(9)

In the formulas, $NP_{rural}$ is the rural population, $NP$ is the national total population, $NP_{low}$ is the industry population with low education levels, $NP_{indu}$ is the national total industry population, $PEF_{Lev}$ is the per capita education funding at the given education level, $SPT_{Lev}$ is the number of students per teacher at the given education level, and $Lev2$ to $Lev5$ are EEL, JHEL, HEL, and UEL, respectively.

Table 4 shows the data of the seven influencing factors in provinces of China in 2010.

3.3.2. Principal Component Analysis

Principal Component Analysis (PCA) [43,44,45,46], based on the idea of dimensionality reduction, converts multiple factors into fewer comprehensive indicators that are uncorrelated with each other while keeping information loss low. First, linear combinations of the original variables are constructed to generate new variables that are not correlated with each other; then the few new variables that contain most of the information of the original variables are extracted to explain the original variables. The extracted new variables are the principal components. PCA can be expressed by a mathematical model:

F_{i} = a_{i1} ZX_{1} + a_{i2} ZX_{2} + \dots + a_{in} ZX_{n}, \quad i = 1, 2, 3, \dots, n

(10)

In the formula, $a_{i1}, \dots, a_{in}$ are the elements of the $i$-th characteristic vector, and $ZX_{1}, \dots, ZX_{n}$ are the values of the original variables after standardization.
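The first principal component of Formula (10) can be sketched with the standard library alone. This is an illustration under stated assumptions: the three data columns are hypothetical, and power iteration is used here as a simple stand-in for a full eigendecomposition, so only the leading component is recovered.

```python
from math import sqrt


def standardize(column):
    """Z-score a column using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]


def correlation_matrix(columns):
    """Correlation matrix of the standardized variables ZX_1..ZX_n."""
    z = [standardize(c) for c in columns]
    m = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / m for zj in z] for zi in z]


def leading_component(matrix, iterations=500):
    """Largest characteristic value and unit vector by power iteration."""
    size = len(matrix)
    # Uneven start vector, so it is unlikely to be orthogonal to the answer.
    vector = [float(i + 1) for i in range(size)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size))
                   for i in range(size)]
        norm = sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    eigenvalue = sum(vector[i] * matrix[i][j] * vector[j]
                     for i in range(size) for j in range(size))
    return eigenvalue, vector


# Three hypothetical factors observed in five provinces.
columns = [
    [5.1, 6.3, 4.2, 7.8, 3.9],
    [0.62, 0.55, 0.70, 0.41, 0.75],
    [21.0, 24.5, 19.2, 30.1, 18.4],
]
corr = correlation_matrix(columns)
eigenvalue, loadings = leading_component(corr)
# Scores F_1 = a_11 ZX_1 + ... + a_1n ZX_n for each province, Formula (10).
z = [standardize(c) for c in columns]
scores = [sum(loadings[k] * z[k][r] for k in range(len(z)))
          for r in range(len(z[0]))]
```

The variance of `scores` equals the leading characteristic value, which is the quantity a scree plot displays when deciding how many components to keep.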

3.4. Proposed LRIM Model

In this research, a new LRIM is constructed to predict education development levels from time series data. The basic idea of the model is that real-world time series commonly combine linear and nonlinear structure, so the series is divided into two parts: a linear autocorrelation main body and a nonlinear residual. ARIMA [47,48] is used to predict the linear main body of the data, and the residual between the prediction results and the real values is calculated. A BP neural network model is then used to predict the nonlinear residual, and the final prediction results are obtained by integrating the two.

The ARIMA (p, d, q) model can be expressed as:

Δ^{d} y_{t} = θ_{0} + \sum_{i = 1}^{p} φ_{i} Δ^{d} y_{t - i} + ε_{t} + \sum_{j = 1}^{q} θ_{j} ε_{t - j}

(11)

AR represents the autoregressive model, and $p$ is the number of autoregressive terms; MA is the moving average model, and $q$ is the number of moving average terms; $d$ is the number of differencing operations needed to make the sequence stationary.

In the formula, $Δ^{d} y_{t}$ represents the sequence $y$ after $d$-th order differencing, and $ε_{t}$ is the random error at time $t$, a white noise sequence whose terms are mutually independent and obey a normal distribution with a mean of zero and a constant variance. $φ_{i}$ and $θ_{j}$ are the parameters to be estimated in the model.
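The role of the differencing order d can be illustrated with a short sketch. The series below is hypothetical (a quadratic trend chosen so that the effect is easy to see), and the helper name is an assumption.

```python
def difference(series, d=1):
    """Apply first-order differencing d times, giving the d-th difference."""
    result = list(series)
    for _ in range(d):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


# Hypothetical series with a quadratic trend: y_t = 0.1 t^2 + 6.
series = [0.1 * t * t + 6 for t in range(8)]
first = difference(series, d=1)    # still trending upward
second = difference(series, d=2)   # constant, so d = 2 removes the trend
```

Each round of differencing shortens the series by one value. In practice d is chosen as the smallest order for which the differenced series passes a stationarity test.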

The process of using ARIMA for time-series prediction is: ① Preprocess the collected time series data and perform stationarity test and white noise test. ARIMA can only be used to predict when it is tested as a stationary non-white noise sequence. ② Model recognition is to select a model that matches the given time-series data from the known prediction models. ③ Order determination for a model is to determine the order of the chosen prediction model by using the Bayesian Information Criterion (BIC). ④ Parameter estimation is to estimate the model parameters by using methods such as least squares estimation, maximum likelihood estimation and correlation matrix estimation. ⑤ Model test is to verify the fitting effect of the prediction model.

The BP neural network is one of the most widely used neural networks. It is a multilayer feed-forward neural network trained by the error back-propagation algorithm. The basic idea is to propagate the output error backwards through the network and to adjust the weights and thresholds of the layers by gradient descent. After repeated training, an optimized model is obtained whose outputs are consistent with the pattern of the input data, which makes it convenient to mine nonlinear patterns in time-series data.

Assume that the time series $y_{t}$ of the research is composed of two parts: the linear autocorrelation main body $L_{t}$ and the nonlinear residual $F_{t}$, namely:

y_{t} = L_{t} + F_{t}

(12)

The method of using the newly proposed LRIM to predict the time series data is:

(1): Model the linear autocorrelation main body $L_{t}$ with the ARIMA model, determine the parameters of ARIMA (p, d, q) to establish a prediction model, and obtain the prediction result $L_{ct}$. Then subtract the prediction result $L_{ct}$ from the original time series $y_{t}$ to get the residual $F_{t}$.

$F_{t} = y_{t} - L_{ct}$

(13)

(2): The residual sequence $F_{t}$ contains the nonlinear part of the original sequence, and the BP neural network model is used to describe this nonlinear relationship. Assuming that the BP network takes $a$ lagged residuals as input, the residual sequence is expressed as:

$F_{t} = f (F_{t - 1}, F_{t - 2}, F_{t - 3}, \dots, F_{t - a}) + ε_{t}$

(14)

where $f$ is the nonlinear function learned by the network and $ε_{t}$ is the random error. The output of the network is the predicted residual $F_{ct}$.

(3): Integrate the predicted values of the two parts, namely:

$y_{ct} = L_{ct} + F_{ct}$

(15)
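The three steps can be sketched as a small pipeline. This is not the authors' implementation: to stay self-contained it swaps in a least-squares AR(1) fit as a stand-in for ARIMA and a simple average of recent residuals as a placeholder for the BP network, and all names and data are hypothetical. Only the decomposition and recombination of Formulas (12) to (15) are illustrated.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1}; a stand-in for ARIMA."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    phi = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
           / sum((a - mean_x) ** 2 for a in x))
    c = mean_y - phi * mean_x
    return lambda previous: c + phi * previous


def mean_of_lags(residuals, lags):
    """Placeholder for the BP network f: averages the last `lags` residuals."""
    recent = residuals[-lags:]
    return sum(recent) / len(recent)


def lrim_forecast(series, lags=2):
    """One-step-ahead forecast: linear part, residual part, and their sum."""
    linear = fit_ar1(series)
    # Step (1): in-sample linear predictions and residuals, Formula (13).
    fitted = [linear(previous) for previous in series[:-1]]
    residuals = [actual - fit for actual, fit in zip(series[1:], fitted)]
    linear_next = linear(series[-1])
    # Step (2): predict the next residual from lagged residuals, Formula (14).
    residual_next = mean_of_lags(residuals, lags)
    # Step (3): integrate the two predictions, Formula (15).
    return linear_next, residual_next, linear_next + residual_next


# Hypothetical AEY-like history (years), illustrative only.
history = [6.0, 6.3, 6.5, 6.9, 7.1, 7.5, 7.6, 8.0]
linear_part, residual_part, combined = lrim_forecast(history)
```

Keeping the two models behind separate functions means either stand-in could be replaced by a real ARIMA fit or a trained network without changing the integration step.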

4. Results and Discussion

4.1. Correlation between Education Levels and Poverty

4.1.1. Similarity of Spatial Distribution Pattern

Education levels are divided into five levels: illiteracy, EEL, JHEL, HEL, and UEL. The population at each of the five education levels in China (Table 1) is mapped to obtain the spatial distributions shown in Figure 3a–e. Figure 3f shows the distribution of GDP/C in provinces of China. According to Figure 3, the distribution of illiteracy and of the population with EEL by province shows a decreasing trend from southwest to northeast. These groups account for a large proportion of the population in Tibet, Qinghai, Guizhou, Yunnan and other places. For example, illiteracy in Tibet Autonomous Region accounts for more than 35%, indicating that education levels in these regions are relatively backward. The distribution of the population with JHEL, HEL and UEL by province shows a decreasing trend from east to west. These groups account for a large proportion of the population in Beijing, Shanghai, Tianjin and other coastal areas. For example, the population with UEL in Beijing accounts for more than 32%, indicating that education levels in these regions are markedly higher.

Economic development levels differ among the provinces of China, which is reflected in the differences in GDP/C by province. The research calculates GDP/C of the provinces in 2010, as shown in the last column of Table 1, and its spatial distribution is shown in Figure 3f. The figure shows that GDP/C in 2010 decreases from east to west, and GDP/C of coastal areas is significantly higher than that of central and inland areas. GDP/C is high in Beijing and Shanghai, reaching 77,205 yuan in Shanghai, compared with 9214 yuan in Guizhou.

It can be seen according to Figure 3 that the distribution of GDP/C in provinces of China has a similar spatial pattern with the distribution of higher education levels (HEL and UEL).

4.1.2. Correlation between AEY and GDP/C

In order to quantitatively analyze the relationship between education and poverty, two correlation coefficients are used to calculate the correlation between AEY and GDP/C. AEY by province in 2010 is calculated using Formula (1), and its geographic distribution is mapped in Figure 4a. The distribution of AEY shows a decreasing trend from east to west as a whole. In regions such as Beijing and Shanghai, AEY is more than 11 years, while in Tibet, Qinghai, Gansu, Guizhou, Yunnan and other provinces, AEY is only about 5 to 8 years, which does not reach the standard of compulsory education.

AEY and GDP/C in provinces of China are generated into a line graph, as shown in Figure 4b. In the figure, it can be seen that the trends of AEY and GDP/C curve are roughly similar, showing a positive correlation.

The correlation coefficient is used to further calculate the correlation between AEY and GDP/C. Formulas (2) and (3) are used to calculate PCC and SCC values between AEY and GDP/C in 2010, and the results are shown in Table 5.

As can be seen in Table 5, the PCC between AEY and GDP/C is 0.729 and the SCC is 0.754, indicating a significant positive correlation between AEY and GDP/C in 2010. Because SCC is calculated from the rank ordering of the elements in the two sets rather than from their raw values, it is less affected by extreme values, and here it yields a slightly higher correlation than PCC.

Taken together, there is a significant correlation between GDP/C and the local AEY. Therefore, education levels in the provinces of China and local poverty levels are correlated. Improving education levels can promote economic development and reduce poverty in a region. At the same time, the economic development of a region can drive the improvement of local education levels. Improving education levels is therefore an important way to prevent the continued spread of poverty in a region.

4.1.3. Correlation between AEY and Poor Counties Distribution

On the basis of finding a significant positive correlation between AEY and GDP/C, this research analyzes the correlation between AEY and the distribution of poor counties from another aspect, so as to explore causality to a certain extent based on the two correlations.

Nighttime light remote sensing image reflects the economic activities of humans and can reflect the economic condition of the region to a certain extent. Based on nighttime light remote sensing images, the research uses the random forest classification algorithm of machine learning to extract the spatio-temporal distribution of poor counties in China from 1995 to 2010, and further analyzes the correlation between AEY and the distribution of poor counties in provinces of China.

Based on the 14 contiguous poor regions delineated by China in 2010 and the existing data on national key poor counties as of March 2020, point samples of 100 poor counties and 110 non-poor counties are selected as the training set for random forest classification. The research stacks the 11 characteristics extracted from the nighttime light images into a composite image with 11 bands. Random forest classification is performed with the training samples for 1995, 2000, 2005, and 2010, and the classification results are shown in Figure 5. The same training samples are used in all four years, which is convenient for subsequent research and analysis.

From the spatial perspective, poor counties show a trend of concentrated distribution, concentrated in the southwest, northwest, central and northeast of China, mainly in Tibet Autonomous Region, Yunnan Province, Qinghai Province, Guizhou Province, Ningxia Hui Autonomous Region and other regions. The number of poor counties in the northeast of China, south China and Sichuan and Chongqing regions decreases significantly, and poor counties in the surrounding areas of urban agglomerations are more likely to convert into non-poor counties. From the perspective of temporal evolution, from 1995 to 2010, the number of poor counties in provinces shows a declining trend year by year, mainly manifested in the trend of decreasing from east to west and decreasing of outward radiation from urban agglomerations, and poor counties gradually convert to non-poor counties.

From the existing data on national key poor counties as of March 2020, 100 poor counties and 100 non-poor counties are selected as accuracy-test data, and the accuracy of the recognition results obtained from the nighttime light data in 1995, 2000, 2005 and 2010 is calculated. The overall accuracy and Kappa coefficient are shown in Table 6.

It can be seen in Table 6 that the classification results based on the random forest algorithm in 1995, 2000, 2005 and 2010 have a good effect. The accuracy of classification and recognition of poor counties is relatively high, with an overall accuracy above 95%. Kappa coefficients are all above 0.85, which can well distinguish the spatio-temporal dynamic distribution from 1995 to 2010.
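For reference, overall accuracy and the Kappa coefficient can both be derived from a confusion matrix, as in the sketch below. The counts are hypothetical and are not the values reported in Table 6; they only mirror the 100 poor and 100 non-poor test counties.

```python
def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix.

    `matrix[i][j]` counts test samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement from the row (true) and column (predicted) marginals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical result for 100 poor and 100 non-poor test counties.
confusion = [
    [96, 4],   # true poor: 96 classified poor, 4 classified non-poor
    [6, 94],   # true non-poor: 6 classified poor, 94 classified non-poor
]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
```

With these illustrative counts the overall accuracy is 0.95 and Kappa is 0.90. Because Kappa discounts the agreement expected by chance, it is lower than the raw accuracy.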

The LISA map and Moran’s I scatter plot of the distribution of poor counties in China are shown in Figure 6. From the spatial perspective, poor counties are still concentrated in the southwest, central, and northeast of China, while non-poor counties are mainly distributed along the northern coast. From the perspective of temporal evolution, counties in the northwest, northeast, central and southwest of China show a trend of transforming from poor to non-poor, indicating that the number of poor counties in China decreased from 1995 to 2010. From 1995 to 2010, the number of High-High regions gradually increases, reaching 474 in 2010, indicating that the number of non-poor counties in China is increasing. The number of Low-Low regions gradually decreases, from 314 in 1995 to 200 in 2010. The number of Low-High regions shows slow, steady growth and basically remains around 85, indicating that the transformation from poor counties into non-poor counties is still going on. According to the Moran’s I scatter plot, the distribution of poor counties in China shows significant spatial autocorrelation.

Taking 2000 and 2010 as examples, the research carries out a correlation analysis between AEY and the proportion of poor counties in each province and generates the corresponding scatter plots, shown in Figure 7. Calculated with Formulas (2) and (3), the PCC between the two is −0.794 in 2000 and −0.748 in 2010. There is therefore a significant negative correlation between AEY and the distribution of poor counties.
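The sign and magnitude of such a coefficient can be illustrated with a small sketch of the standard Pearson formula. The five AEY values and poor-county proportions below are hypothetical and are not the provincial data behind Figure 7.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical provinces: higher AEY paired with a lower share of poor counties.
aey = [6.0, 7.5, 8.0, 9.0, 11.0]
poor_share = [0.60, 0.45, 0.30, 0.20, 0.05]

r = pearson(aey, poor_share)
print(f"PCC = {r:.3f}")
```

A value close to −1 means that provinces with a higher AEY tend to have a smaller proportion of poor counties.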

Comparing the distributions of AEY and poor counties in 2000 and 2010 shows that, on the spatial scale, AEY is relatively low in poor counties and relatively high in non-poor counties, so the two are significantly negatively correlated. On the temporal scale, the number of poor counties gradually decreased from 2000 to 2010, while the education levels of poor counties gradually improved.

4.2. Analysis of Influencing Factors of Education Levels

This research selects seven influencing factors from four categories (population distribution, economic development, funding support, and teachers): the natural population growth rate factor $X_{1}$, the population urban-rural ratio factor $X_{2}$, the population sex ratio factor $X_{3}$, the industry population structure factor $X_{4}$, the resident comprehensive income factor $X_{5}$, the compulsory education funding factor $X_{6}$, and the teacher-per-student ratio factor $X_{7}$. Since these seven factors may be correlated, PCA is used to determine the influencing factors and weights of education development levels. The scree plot and scatter plot of the seven characteristic values are drawn from the data in Table 4. Figure 8 shows that the characteristic values flatten out from the third point onward, so two factors are extracted for calculation. Among the seven characteristic factors, only the natural population growth rate ($\lambda_{1} = 4.3150$) and the urban-rural proportion of the population ($\lambda_{2} = 1.3151$) meet the requirements, and the other characteristic factors cannot be taken as principal component factors.

After the principal component transformation, the contribution rate of the first principal component is 62.68% and that of the second principal component is 18.79%, so the cumulative contribution rate of the first two principal components is 81.47%. The first two principal components are selected for factor rotation to obtain the rotated component matrix. The formulas for the two principal components are:

$$\mathrm{Print}_{1} = 0.520 X_{1} + 0.958 X_{2} - 0.265 X_{3} + 0.945 X_{4} - 0.944 X_{5} - 0.927 X_{6} + 0.639 X_{7} \qquad (16)$$

$$\mathrm{Print}_{2} = 0.697 X_{1} + 0.043 X_{2} + 0.885 X_{3} - 0.047 X_{4} + 0.033 X_{5} + 0.108 X_{6} + 0.178 X_{7} \qquad (17)$$

It can be seen from the above formulas that, after excluding the variables whose standardized coefficients do not meet the condition (absolute value less than 0.3), the first principal component is dominated by the natural population growth rate factor $X_{1}$, the population urban-rural ratio factor $X_{2}$, the industry population structure factor $X_{4}$, the resident comprehensive income factor $X_{5}$, the compulsory education funding factor $X_{6}$ and the teacher ratio factor $X_{7}$. The second principal component is dominated by the natural population growth rate factor $X_{1}$ and the population sex ratio factor $X_{3}$. In the first principal component, $X_{1}$, $X_{2}$, and $X_{4}$ are positively correlated, while $X_{5}$ and $X_{6}$ play a negative role; in the second principal component, $X_{1}$ and $X_{3}$ are positively correlated.
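The screening of dominant factors and the evaluation of Formulas (16) and (17) can be sketched as follows. The coefficients are copied from the two formulas. The function names and the example observation are hypothetical, and the sketch assumes that the inputs are standardized values of the seven factors, which is an assumption rather than something stated here.

```python
# Coefficients of X1..X7 copied from Formulas (16) and (17).
LOADINGS = {
    "Print1": [0.520, 0.958, -0.265, 0.945, -0.944, -0.927, 0.639],
    "Print2": [0.697, 0.043, 0.885, -0.047, 0.033, 0.108, 0.178],
}
THRESHOLD = 0.3  # coefficients with a smaller absolute value are ignored


def dominant_factors(coefficients, threshold=THRESHOLD):
    """Return the labels of the factors whose |coefficient| reaches the threshold."""
    return [
        f"X{i}"
        for i, c in enumerate(coefficients, start=1)
        if abs(c) >= threshold
    ]


def component_scores(x):
    """Evaluate both components for one observation x = [X1, ..., X7]."""
    return {
        name: sum(c * v for c, v in zip(coefs, x))
        for name, coefs in LOADINGS.items()
    }


# Hypothetical standardized values of X1..X7 for a single province.
example = [0.5, -1.0, 0.2, -0.8, 1.2, 0.9, -0.3]

for name, coefs in LOADINGS.items():
    print(name, "dominated by", dominant_factors(coefs))
print(component_scores(example))
```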

Since the correlation also depends on the contribution rate for each of the seven influencing factors, a regression model is constructed for factor analysis, and the main indicators are screened out by comparative analysis among the seven indicators.

Based on the seven characteristic variables extracted for the provinces of China in 2010, the PCC matrix is calculated according to Formula (2), the correlation among the seven characteristic variables related to AEY is analyzed, and the heat map of PCC is obtained, as shown in Figure 9. In the figure, ZRZZ is the natural population growth rate factor $X_{1}$, and CXBL, XBBL, HXRK, JMSR, JYJF, and SZLL are the factors $X_{2}$ to $X_{7}$, respectively.

The KMO measure and the Bartlett test of sphericity are performed on the seven characteristic variables, and the results are shown in Table 7. The KMO value is greater than 0.8, indicating that the data are well suited to factor analysis. The significance of the Bartlett test of sphericity is less than 0.05, which rejects the hypothesis that the correlation matrix is an identity matrix. The variables are therefore sufficiently correlated with one another for factor analysis to be appropriate.
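For reference, the statistic of the Bartlett test of sphericity is commonly written as below. The symbols $n$ (number of observations), $p$ (number of variables) and $\mathbf{R}$ (sample correlation matrix) are standard textbook notation and are not taken from this research.

```latex
\chi^{2} = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\left|\mathbf{R}\right|,
\qquad
df = \frac{p(p - 1)}{2}
```

With $p = 7$ characteristic variables, $df = 7 \times 6 / 2 = 21$, which agrees with the degrees of freedom reported in Table 7.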

Taking the seven characteristic parameters as independent variables and AEY in the provinces of China in 2010 as the dependent variable, the variables are sorted by contribution rate and a stepwise regression model is established using bidirectional (forward and backward) selection. The parameter table is shown in Figure 10.

The results show that the industry population structure factor $X_{4}$, the natural population growth rate factor $X_{1}$, and the compulsory education funding factor $X_{6}$ are the main factors affecting regional education development levels. Table 8 shows that the p values of the significance tests for the regression coefficients of the three selected factors are all below the 0.05 significance level, so all selected regression variables are statistically significant. The regression equation of the model is therefore:

$$Z = -1.154 X_{4} - 0.309 X_{1} - 0.486 X_{6} \qquad (18)$$

The established stepwise regression model shows that the greater the value of the industry population structure, the greater the share of the industry population with low education levels in a region, and the lower its education development level. The higher the natural population growth rate, the faster the natural population growth in a region, and the lower its education development level. The greater the compulsory education funding, the greater the investment still required to popularize compulsory education in a region, and the lower its education development level.

4.3. Prediction of AEY in Provinces

In order to overcome the defects of using a single linear model to predict time series data, the research predicts AEY in provinces of China based on the LRIM model constructed in Section 3.4 and compares it with the prediction results of the two traditional models.

Based on the historical AEY data from 1989 to 2019 (Table 3), the ARIMA model, the BP neural network model and the LRIM integration model constructed in the research are used to predict the time-series data of the national AEY. The data from 1989 to 2017 are used as the modeling sample, and the data from 2018 to 2019 are used as the test sample.

4.3.1. Prediction Results of ARIMA Model

To apply the ARIMA model, data stationarity detection and white noise detection should be performed first. The augmented Dickey–Fuller (ADF) test result is shown in Figure 11, and the parameter $d$ in the ARIMA (p, d, q) model is determined to be 1.
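Setting $d = 1$ means that the model is fitted to the first-order differences of the series rather than to the raw values. A minimal sketch of this differencing step is given below, using the national AEY values for 2013 to 2019 from Table 3. The helper name `difference` is hypothetical.

```python
# National AEY for 2013-2019, copied from Table 3.
aey = [9.048, 9.037, 9.077, 9.078, 9.21, 9.38, 9.5]


def difference(series, d=1):
    """Apply d rounds of first-order differencing: y'_t = y_t - y_(t-1)."""
    for _ in range(d):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


first_diff = difference(aey, d=1)
print([round(v, 3) for v in first_diff])
```

Each round of differencing shortens the series by one observation, so seven annual values yield six year-on-year changes.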

According to the autocorrelation and partial autocorrelation coefficients of the data series in Figure 12, both functions cut off clearly after lag 1. The model orders are therefore set to p = 1 and q = 1, and the final model is ARIMA (1, 1, 1).

According to the regression result shown in Figure 13, the t values of AR(1) and MA(1) are 18.61856 and −1.853016, and the corresponding p values are 0 and 0.0757, respectively (the MA(1) term is thus significant at the 10% level). It is therefore reasonable to use the ARIMA (1, 1, 1) model for prediction.

The specific form of the ARIMA model is therefore:

$$L_{t} = 7.99962728764 + [\mathrm{AR}(1) = 0.982330422402,\ \mathrm{MA}(1) = -0.43215476676] \qquad (19)$$

The prediction results of the ARIMA (1, 1, 1) model are shown in Figure 14. The final predicted value of the national AEY in 2020 is 9.360730 years.

4.3.2. Prediction Results of BP Model

The time-series data of the national AEY from 1989 to 2019 form a one-dimensional sequence. The designed BP neural network structure is shown in Figure 15 and includes three input nodes and 10 hidden-layer nodes. 70% of the data are used as the training data set, 15% as the test data set, and 15% as the validation data set. The prediction results after training the model are shown in Figure 16. The predicted value of the national AEY in 2020 is 9.5654 years, the Root Mean Square Error (RMSE) is 0.2795, and the Mean Absolute Percentage Error (MAPE) is 2.1293.
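A one-dimensional series has to be turned into input/target pairs before a network with three input nodes can be trained on it. The sketch below shows one common way to do this with a sliding window. It assumes that the three inputs are the AEY values of the three preceding years, which the text does not state explicitly, and the helper name `make_windows` is hypothetical.

```python
def make_windows(series, n_inputs=3):
    """Build (inputs, target) pairs: n_inputs consecutive values predict the next one."""
    samples = []
    for start in range(len(series) - n_inputs):
        inputs = series[start:start + n_inputs]
        target = series[start + n_inputs]
        samples.append((inputs, target))
    return samples


# National AEY for 2011-2019, copied from Table 3.
aey = [8.846, 8.942, 9.048, 9.037, 9.077, 9.078, 9.21, 9.38, 9.5]

samples = make_windows(aey, n_inputs=3)
for inputs, target in samples:
    print(inputs, "->", target)
```

Under this assumption, forecasting 2020 would feed the values for 2017, 2018 and 2019 into the trained network.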

4.3.3. Prediction Results of LRIM Model

Using the LRIM model constructed in Section 3.4, the national AEY in 2020 is predicted to be 9.4741 years.

In order to evaluate the prediction effects of the three prediction models on the national AEY, this research uses two measurement indicators, MAPE and RMSE, the formulas of which are as follows:

$$\mathrm{RMSE} = \left(\frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}\right)^{1/2} \qquad (20)$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i = 1}^{N} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100\% \qquad (21)$$

The smaller the values of RMSE and MAPE, the higher the prediction accuracy of the model and the closer the prediction result is to the true value. The prediction effects of the three models are shown in Table 9. Compared with the ARIMA model, the RMSE of the LRIM model is lower by 0.2303 and its MAPE is lower by 2.074. Compared with the BP neural network model, the RMSE of the LRIM model is lower by 0.0928 and its MAPE is lower by 0.5053. This shows that the LRIM model predicts better than either the single ARIMA model or the single BP model.
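The two indicators and the comparison of the three models can be reproduced with a few lines of code. In the sketch below, the actual values are the test-sample AEY for 2018 and 2019 from Table 3 and the model errors are copied from Table 9. The two predicted values are hypothetical and serve only to exercise Formulas (20) and (21).

```python
from math import sqrt


def rmse(actual, predicted):
    """Root Mean Square Error, Formula (20)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent, Formula (21)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


actual = [9.38, 9.5]        # AEY in 2018 and 2019 (Table 3)
predicted = [9.30, 9.60]    # hypothetical model output
print(f"RMSE = {rmse(actual, predicted):.4f}, MAPE = {mape(actual, predicted):.4f}%")

# (RMSE, MAPE) per model, copied from Table 9.
table9 = {"ARIMA": (0.417, 3.698), "BP": (0.2795, 2.1293), "LRIM": (0.1867, 1.6240)}

improvement = {}
for name in ("ARIMA", "BP"):
    # Reduction achieved by LRIM relative to the single model.
    improvement[name] = (
        table9[name][0] - table9["LRIM"][0],
        table9[name][1] - table9["LRIM"][1],
    )
    print(f"LRIM vs {name}: RMSE -{improvement[name][0]:.4f}, MAPE -{improvement[name][1]:.4f}")
```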

The LRIM model is used to predict AEY in the provinces in 2020. The results are shown in Table 10 and plotted as a broken line graph in Figure 17.

Table 10 shows that the RMSE of the LRIM prediction is less than 0.3 for every province, ranging from 0.1522 in Beijing to 0.2672 in Hunan. The model predicts well, and the predicted AEY of the provinces is close to the theoretical standard level. According to the prediction results for 2020, AEY exceeds 9 years in all provinces except the Tibet Autonomous Region, Fujian Province, and Guizhou Province. Beijing has the highest AEY, 13.17149 years, and Tibet the lowest, only 5.456407 years.

The prediction values of AEY in 2020 in Tibet, Guizhou, Yunnan, Fujian and other regions do not meet the compulsory education standard of 9 years. These regions will be identified as key regions for TPAE in the future. The government department should formulate strategies for TPAE according to the leading factors affecting education in the region.

The prediction value of AEY in provinces can provide a decision basis for the relevant government department to accurately determine the corresponding poverty alleviation regions and to take poverty alleviation measures, ultimately achieving the victory of the poverty elimination battle.

5. Conclusions

TPAE is an important part of poverty elimination, giving full play to the key role of education in TPA. This research analyzes the correlation between education levels and poverty, and the influencing factors of education levels. Moreover, on the basis of the above analysis conclusions, it predicts education levels in provinces of China to determine key regions for poverty alleviation.

(1): There is a significant positive correlation between AEY and GDP/C. The higher the AEY, the higher the local GDP/C, showing that AEY can be used as an indicator of PAE to a certain extent. Increasing AEY can help local residents improve their economic level and eliminate poverty.
(2): There is a negative correlation between AEY and the distribution of poor counties. This indicates that a low level of local education is, to some extent, a factor causing poverty in a region, and that increasing AEY in the region can improve the local poverty situation. PAE is therefore an important channel for TPA in China.
(3): The industry population structure, the natural population growth rate, and the compulsory education funding are the main factors affecting the level of regional education development. Continually improving these three factors can raise the overall AEY in the provinces and ultimately help the regions eliminate poverty.
(4): The LRIM model constructed in this research is used to predict and analyze AEY in the provinces of China based on historical data. In 2020, AEY in most regions of China has basically reached the level of the national nine-year compulsory education. However, there are still some regions, such as Tibet, Guizhou, Yunnan, and Fujian, where AEY is very low. These regions are identified as key regions for TPAE in the future. Only by improving education levels in these regions can the goal of TPAE in China be achieved.
(5): Finally, on the basis of the above research, we have designed and developed a WeChat mini-program of “Through Train for TPAE”, which serves as a one-to-one assistance platform between TPAE volunteers and poor students to promote the implementation of TPAE.

Author Contributions

Conceptualization, Yue Han and Lin Liu; methodology, Yue Han, Lin Liu and Qiaoli Sui; software, Lin Liu and Jiaxing Zhou; investigation, Lin Liu; data curation, Yue Han, Lin Liu and Jiaxing Zhou; writing—original draft preparation, Yue Han, Qiaoli Sui and Jiaxing Zhou; writing—review and editing, Lin Liu, Qiaoli Sui and Jiaxing Zhou; funding acquisition, Lin Liu. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Natural Science Foundation of Shandong Province (NO. ZR2019MD034) and the Education Reform Project of Shandong Province (NO. M2020266).

Data Availability Statement

Data in this experiment could be found at the National Oceanic and Atmospheric Administration (NOAA) (https://www.noaa.gov/, accessed on 9 December 2021), National Bureau of Statistics for the Sixth Census Data (http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm, accessed on 9 December 2021), the 2010 China Educational Funding Statistical Yearbook (http://www.stats.gov.cn/tjsj/ndsj/2010/indexch.htm, accessed on 9 December 2021) and the 2011 Statistical Yearbook (http://www.stats.gov.cn/tjsj/ndsj/2011/indexch.htm, accessed on 9 December 2021), and education statistical data in 2011 (http://www.moe.gov.cn/s78/A03/moe_560/s7382/, accessed on 9 December 2021).

Acknowledgments

The authors thank the National Oceanic and Atmospheric Administration (NOAA) for data support; the National Bureau of Statistics for the Sixth Census Data, the 2010 China Educational Funding Statistical Yearbook and the 2011 Statistical Yearbook; and the Ministry of Education of the People’s Republic of China for the education statistical data in 2011.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. Extraction of nighttime light characteristic of pixels in all counties in China in 2010.

Figure 2. The proposed research framework.

Figure 3. Distribution map of the population with education levels in provinces of China. (a) Distribution of the illiteracy in provinces. (b) Distribution of the population with EEL in provinces. (c) Distribution of the population with HEL in provinces. (d) Distribution of the population with JHEL in provinces. (e) Distribution of the population with UEL in provinces. (f) Distribution of GDP/C in provinces.

Figure 4. Correlation between distribution of AEY and GDP/C. (a) Distribution of AEY in provinces in 2010. (b) The line chart of AEY and GDP/C.

Figure 5. Results of random forest classification in poor counties from 1995 to 2010.

Figure 7. Scatter plot of AEY and distribution of poor counties.

Figure 8. Scree plot and scatter plot of PCA.

Figure 9. Heat map of PCC for characteristic variables.

Figure 10. Parameter table of stepwise regression result.

Figure 11. ADF test result of AEY. “Prob.*” is short for probability that comes with software systems, indicating the probability of the existence of a unit root.

Figure 12. Autocorrelation and partial correlation for study sequence.

Figure 13. Regression result using ARIMA (1, 1, 1) model.

Figure 14. Prediction result of ARIMA (1, 1, 1) model.

Figure 15. Structure of BP neural network.

Figure 16. Prediction result of BP neural network model.

Figure 17. Broken line graph of AEY by provinces in 2020.

Table 1. Population number with different education levels and GDP/C by province in 2010.

Province	Illiteracy (10,000 People)	EEL (10,000 People)	JHEL (10,000 People)	HEL (10,000 People)	UEL (10,000 People)	GDP/C in 2010 (10,000 Yuan)
Anhui	496.5	1662.9	2261.9	641	398.5	1.6656
Beijing	33.3	195.3	615.7	416.2	617.8	7.0234
Fujian	90	1099.4	1397.8	511.8	308.4	3.3106
Gansu	222.2	831.3	798.3	324.4	192.3	1.2882
Guangdong	204	2394.4	4476	1780.7	856.7	3.9978
Guangxi	124.9	1458.05	1784.15	507.9	275.14	1.6576
Guizhou	303.7	1368	1035	253	183.9	0.9214
Hainan	35.4	197.1	361.9	127.2	67.3	1.876
Hebei	187.7	1772	3190.4	913.1	524.2	2.4583
Henan	399.1	2266.7	3992.2	1242.2	601.5	2.1073
Heilongjiang	78.8	922.5	1727.1	574.3	347.4	2.1593
Hubei	261.9	1309.1	2267.6	950.2	545.6	2.205
Hunan	175.4	1759.3	2596.3	1012.8	498.8	1.9355
Jilin	52.7	660.7	1155.3	463.2	271.6	2.5906
Jiangsu	299.5	1901.7	3041.7	1269.8	850.7	4.3907
Jiangxi	139.4	1337.3	1684.2	549.3	305.2	1.5921
Liaoning	84.3	936.5	1982.9	646.9	523.4	3.4193
Inner Mongolia	100.5	628	968.9	373.7	252.2	3.7287
Ningxia	39.1	187.9	212.1	78.4	57.6	1.9642
Qinghai	57.6	198.4	142.8	58.7	48.5	1.8346
Shandong	475.73	2391.24	3846.82	1332.26	832.87	3.5893
Shanxi	76.2	780.5	1611.5	561.8	311.4	2.0779
Shaanxi	139.8	874.1	1498.1	588.8	394	2.0497
Shanghai	63.1	311.5	839.3	482.6	505.3	7.7205
Sichuan	437.7	2784.6	2805.7	904.5	536.8	1.7289
Tianjin	27.1	220.6	493.6	267.2	226.1	6.3395
Tibet	104.7	109.8	38.6	13.1	16.5	1.5294
Xinjiang	51.6	656	787.3	252.6	232	1.9119
Yunnan	277	1994.4	1263.1	385	265.6	1.3687
Zhejiang	306.1	1568.54	1996.41	738.12	507.78	4.4895
Chongqing	123.9	974.71	951.41	381.14	249.3	2.0219

Table 2. Table of classification characteristics.

Categories	Indicators	Characteristics Description
Quantitative characteristics	A1	The mean value of pixel lights in the county (MEAN)
	A2	The median of pixel lights in the county (MEDIAN)
	A3	The most frequent value of pixel lights in the county (MAJORITY)
Dispersion degree	A4	The standard deviation of pixel lights in the county (STD)
	A5	The range of pixel lights in the county (RANGE)
Distribution characteristics	A6	The sum of pixel lights in the county (SUM)
	A7	The minimum pixel lights in the county (MIN)
	A8	The maximum pixel lights in the county (MAX)
	A9	The number of unique values of pixel lights in the county (VARIETY)
	A10	The least frequent value of pixel lights in the county (MINORITY)
Spatial characteristics	A11	Local Moran’s I in the county (Moran’s I)

Table 3. AEY in China from 1989 to 2019.

Year	AEY (Years)	Year	AEY (Years)	Year	AEY (Years)	Year	AEY (Years)
1989	6.261	1997	7.009	2005	7.831	2013	9.048
1990	7.87	1998	7.088	2006	8.04	2014	9.037
1991	6.25	1999	7.179	2007	8.186	2015	9.077
1992	6.259	2000	7.114	2008	8.27	2016	9.078
1993	6.47	2001	7.621	2009	8.38	2017	9.21
1994	6.744	2002	7.734	2010	8.21	2018	9.38
1995	6.715	2003	7.911	2011	8.846	2019	9.5
1996	6.794	2004	8.01	2012	8.942

Note: The data comes from the National Bureau of Statistics.

Table 4. Table of statistical data of influencing factors.

Region	Natural Population Growth Rate (‰)	Urban-Rural Population Ratio (%)	Sex Ratio (Female = 100)	Industry Population Structure	GDP/C (Yuan)	Compulsory Education Funding (Yuan)	Students per Teacher Ratio
Beijing	3.07	20.65	106.75	0.395	70,452	34,505.43	13.94
Tianjin	2.6	31.54	114.52	0.578	62,574	26,324.9	13.41
Hebei	6.81	79.98	102.84	0.798	24,283	9010.32	17.86
Shanxi	5.3	73.64	105.56	0.74	21,522	8788.71	19.06
Inner Mongolia	3.76	67.57	108.17	0.73	40,282	14,376.15	16.79
Liaoning	0.42	49.66	102.54	0.721	35,239	12,152.21	15.97
Jilin	2.03	62.86	102.67	0.738	26,595	13,047.16	15.67
Heilongjiang	2.32	63.14	102.85	0.75	22,447	11,078.51	16.09
Shanghai	1.98	23.36	106.19	0.502	78,989	35,953.83	15.09
Jiangsu	2.85	61.65	101.52	0.713	44,744	15,638.28	16.53
Zhejiang	4.73	62.54	105.69	0.736	44,641	15,114.9	18.2
Anhui	6.75	79.53	103.39	0.829	16,407	7155.67	23.87
Fujian	6.11	65.99	105.96	0.757	33,840	10,501.46	13.61
Jiangxi	7.66	83.16	106.67	0.8	17,335	5845.42	21.84
Shandong	5.39	70.39	102.33	0.771	35,894	10,073.39	16.56
Henan	4.95	80.5	102.05	0.802	20,280	5596.16	21.48
Hubei	4.34	68.68	105.55	0.757	22,677	7722.7	21.67
Hunan	6.4	80.61	105.8	0.758	20,226	7946.56	19.7
Guangdong	6.97	49.78	108.98	0.697	41,166	7407.99	20.96
Guangxi	8.65	81.85	108.26	0.813	16,045	7655.3	16.02
Hainan	8.98	73.2	112.58	0.753	19,254	11,380.08	23.38
Chongqing	2.77	69.9	102.61	0.773	23,026	7931.88	22.6
Sichuan	2.31	80.21	103.13	0.832	17,339	7449.52	23
Guizhou	7.41	84.06	106.31	0.865	10,971	5962.81	23.14
Yunnan	6.54	86.24	107.9	0.863	13,539	7635.31	21.92
Tibet	10.25	90.93	105.7	0.888	15,008	15,407.13	15.07
Shaanxi	3.72	76.33	106.92	0.752	21,485	9980.78	19.7
Gansu	6.03	79.44	104.42	0.81	12,802	7436.28	19.85
Qinghai	8.63	75.69	107.4	0.787	19,454	12,434.92	28.55
Ningxia	9.04	67.32	104.99	0.688	20,382	9828.54	19.54
Xinjiang	10.56	72.17	106.87	0.746	24,978	13,657.27	16.21

Table 5. Correlation between AEY and GDP/C.

Correlation			AEY in 2010	GDP/C
Pearson correlation	AEY in 2010	PCC	1.000	0.729
Pearson correlation	GDP/C	PCC	0.729	1.000
Spearman correlation	AEY in 2010	SCC	1.000	0.754
Spearman correlation	GDP/C	SCC	0.754	1.000

Note: At 0.01 level (two-tailed), the correlation is significant.

Table 6. Overall accuracy and kappa coefficient.

Years	Overall Accuracy	Kappa Coefficient
1995	97.5024%	0.8950
2000	97.4305%	0.8833
2005	96.6620%	0.8523
2010	97.1094%	0.8687

Table 7. Results of KMO measure and Bartlett test.

KMO Measure of Sampling Adequacy		0.805
Bartlett test of sphericity	Approximate chi-square	194.442
	Degrees of freedom	21
	Significance	0.000

Table 8. Results of stepwise regression model (Significance test < 0.05).

Characteristic	Regression Coefficient	p Value for Significance Test
Industry population structure (X₄)	−1.154	<0.001
Natural population growth rate (X₁)	−0.309	<0.001
Compulsory education funding (X₆)	−0.486	<0.001

Table 9. Comparison of prediction effects of three models.

Evaluation Indicator	ARIMA	BP	LRIM
RMSE	0.417	0.2795	0.1867
MAPE	3.698	2.1293	1.6240
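The two evaluation indicators in Table 9 can be computed as in the following minimal sketch. It uses the standard definitions of the root mean square error (RMSE) and the mean absolute percentage error (MAPE), and it assumes that MAPE is expressed in percent, which the table does not state explicitly. The function names and sample values are hypothetical and are not the AEY series of this research.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed unit).

    Observed values must be non-zero, since each error is
    divided by the corresponding observed value.
    """
    relative_errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(relative_errors) / len(relative_errors)


# Hypothetical observed and predicted AEY values, NOT the paper's data.
observed = [8.9, 9.4, 10.2, 11.0]
forecast = [9.1, 9.3, 10.5, 10.8]
print("RMSE:", round(rmse(observed, forecast), 4))
print("MAPE (%):", round(mape(observed, forecast), 4))
```

For both indicators a smaller value indicates a better fit, which is the sense in which Table 9 ranks LRIM ahead of BP and ARIMA.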

Table 10. Prediction value of AEY in 31 provinces in 2020.

Province	Prediction Value	RMSE	Province	Prediction Value	RMSE
Beijing	13.17149	0.1522	Tianjin	11.71074	0.2085
Hebei	9.766235	0.2124	Shanxi	10.439371	0.2207
Inner Mongolia	10.335862	0.2248	Liaoning	9.557991	0.169
Jilin	10.342367	0.1939	Heilongjiang	10.289159	0.1851
Shanghai	11.8419	0.2299	Jiangsu	10.07094	0.2299
Zhejiang	9.632423	0.2426	Anhui	9.428845	0.1757
Fujian	8.8743	0.2326	Jiangxi	9.37056	0.1807
Shandong	9.952434	0.1894	Henan	9.796795	0.227
Hubei	10.411652	0.2052	Hunan	10.74815	0.2672
Guangdong	10.81008	0.2018	Guangxi	9.752049	0.178
Hainan	10.01459	0.2358	Chongqing	10.506915	0.2032
Sichuan	9.390266	0.1994	Guizhou	8.633355	0.2097
Yunnan	9.340406	0.1967	Tibet	5.456407	0.1926
Shaanxi	10.665378	0.2209	Gansu	9.468711	0.1676
Qinghai	8.688841	0.2038	Ningxia	9.620349	0.1851
Xinjiang	9.9956	0.1647

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Han, Y.; Liu, L.; Sui, Q.; Zhou, J. Big Data Spatio-Temporal Correlation Analysis and LRIM Model Based Targeted Poverty Alleviation through Education. ISPRS Int. J. Geo-Inf. 2021, 10, 837. https://doi.org/10.3390/ijgi10120837

