4.3.2. Dagum Gini Coefficient
The Dagum Gini coefficient and its decomposition were proposed by Dagum [68] to measure regional differences. In this paper, the method is used to measure the interprovincial differences in high-quality agricultural development. Under the subgroup decomposition method, the overall Gini coefficient is split into three components: the intra-regional differential contribution, the inter-regional differential contribution, and the hypervariable density contribution [69]. These components capture, respectively, the differences in high-quality agricultural development within regions, the differences between regions, and the overlapping effect among regions. This paper divides the 30 provinces of mainland China (excluding Tibet) into three regions according to their economic development status: the east, the center, and the west. The eastern region includes Beijing, Tianjin, Hebei, Liaoning, Shanghai, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong, and Hainan. The central region includes Shanxi, Jilin, Heilongjiang, Anhui, Jiangxi, Henan, Hubei, and Hunan. The western region includes Guangxi, Inner Mongolia, Chongqing, Sichuan, Guizhou, Yunnan, Shaanxi, Gansu, Qinghai, Ningxia, and Xinjiang.
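First, we calculate the overall Gini coefficient $G$, the intra-regional Gini coefficient $G_j$, and the inter-regional Gini coefficient $G_{jh}$ using the respective calculation equations as follows:
$$G = \frac{\sum_{j=1}^{k}\sum_{h=1}^{k}\sum_{i=1}^{n_j}\sum_{r=1}^{n_h} \left| Y_{ji} - Y_{hr} \right|}{2 n^{2} \bar{Y}}$$
$$G_j = \frac{\sum_{i=1}^{n_j}\sum_{r=1}^{n_j} \left| Y_{ji} - Y_{jr} \right|}{2 n_j^{2} \bar{Y}_j}$$
$$G_{jh} = \frac{\sum_{i=1}^{n_j}\sum_{r=1}^{n_h} \left| Y_{ji} - Y_{hr} \right|}{n_j n_h \left( \bar{Y}_j + \bar{Y}_h \right)}$$
In these equations, $k = 3$ represents the number of regions, $n = 30$ represents the number of provinces, $n_j$ ($n_h$) represents the number of provinces in region $j$ ($h$), $\bar{Y}$ represents the average value of the high-quality agricultural development indexes for all provinces, $\bar{Y}_j$ ($\bar{Y}_h$) represents the average value of the high-quality agricultural development indexes for region $j$ ($h$), and $Y_{ji}$ ($Y_{hr}$) represents the level of high-quality agricultural development in province $i$ ($r$) in region $j$ ($h$).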
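Second, we define the relevant variables. Let $p_j = n_j/n$ be the share of provinces located in region $j$ and $s_j = n_j \bar{Y}_j/(n \bar{Y})$ be the share of region $j$ in the summed index. $D_{jh} = (M_{jh} - N_{jh})/(M_{jh} + N_{jh})$ represents the relative influence of the high-quality agricultural development indexes between regions $j$ and $h$. $M_{jh}$ represents the difference in the high-quality agricultural development indexes between the regions: when $\bar{Y}_j > \bar{Y}_h$, $M_{jh}$ is the weighted average of all the difference values $Y_{ji} - Y_{hr}$ under the condition $Y_{ji} - Y_{hr} > 0$. $N_{jh}$ is the hypervariable first moment: when $\bar{Y}_j > \bar{Y}_h$, $N_{jh}$ is the weighted average of all $Y_{hr} - Y_{ji}$ under the condition $Y_{hr} - Y_{ji} > 0$.
Finally, we calculate the contribution of the intra-regional differences $G_w$, the contribution of inter-regional differences $G_{nb}$, and the hypervariable density contribution $G_t$ using the respective calculation equations as follows:
$$G_w = \sum_{j=1}^{k} G_j \, p_j s_j$$
$$G_{nb} = \sum_{j=2}^{k}\sum_{h=1}^{j-1} G_{jh} \left( p_j s_h + p_h s_j \right) D_{jh}$$
$$G_t = \sum_{j=2}^{k}\sum_{h=1}^{j-1} G_{jh} \left( p_j s_h + p_h s_j \right) \left( 1 - D_{jh} \right)$$
The three contributions sum to the overall coefficient, $G = G_w + G_{nb} + G_t$.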
In this paper, we calculate the Gini coefficient for the spatial distribution of high-quality agricultural development across 30 provinces in China from 2009 to 2019 and decompose it by region to examine the inter-regional differences in high-quality agricultural development.
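As an illustration, the decomposition described in this section can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used in this paper: the function name `dagum_gini`, the NumPy implementation, and the index values in the example are hypothetical. Regions are sorted by their mean index so that, in every pair, the region with the larger mean plays the role of j.

```python
import numpy as np

def dagum_gini(groups):
    """Return (G, G_w, G_nb, G_t) for a list of 1-D arrays, one per region."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    groups.sort(key=np.mean)              # ascending regional means
    y_all = np.concatenate(groups)
    n, mean_all = y_all.size, y_all.mean()

    # Overall Gini coefficient: mean absolute difference over all pairs.
    G = np.abs(y_all[:, None] - y_all[None, :]).sum() / (2 * n**2 * mean_all)

    p = np.array([g.size / n for g in groups])                # p_j
    s = np.array([g.sum() / (n * mean_all) for g in groups])  # s_j

    G_w = G_nb = G_t = 0.0
    for j, yj in enumerate(groups):
        # Intra-regional Gini coefficient of region j.
        G_j = np.abs(yj[:, None] - yj[None, :]).sum() / (2 * yj.size**2 * yj.mean())
        G_w += G_j * p[j] * s[j]
        for h in range(j):                # regions with a smaller (or equal) mean
            yh = groups[h]
            diff = yj[:, None] - yh[None, :]
            G_jh = np.abs(diff).sum() / (yj.size * yh.size * (yj.mean() + yh.mean()))
            M_jh = np.clip(diff, 0, None).mean()    # average of positive Y_ji - Y_hr
            N_jh = np.clip(-diff, 0, None).mean()   # average of positive Y_hr - Y_ji
            D_jh = (M_jh - N_jh) / (M_jh + N_jh) if M_jh + N_jh > 0 else 0.0
            weight = p[j] * s[h] + p[h] * s[j]
            G_nb += G_jh * weight * D_jh
            G_t += G_jh * weight * (1 - D_jh)
    return G, G_w, G_nb, G_t

# Hypothetical index values for three regions (illustration only).
east = [0.52, 0.61, 0.48, 0.70]
center = [0.41, 0.45, 0.50]
west = [0.30, 0.38, 0.44, 0.35]
G, G_w, G_nb, G_t = dagum_gini([east, center, west])
print(G, G_w, G_nb, G_t)
```

With these definitions the three contributions add up to the overall coefficient, which provides a quick consistency check on any implementation.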
4.3.3. Kernel Density Analysis
Kernel density estimation is an important non-parametric estimation method that starts from the data themselves to study their distribution characteristics [70]. It has become a mainstream method for studying unbalanced distributions. This paper uses this method to describe the overall shape and distribution characteristics of high-quality agricultural development. As the method requires no prior information on the model and describes the distribution of a random variable by estimating its continuous density curve, kernel density curves for different periods can represent the status of high-quality development in those periods. Plotted together, the curves directly display the magnitude of the regional differences in high-quality development and how they change over time [71]. Assume the density function for the random variable $X$ is $f(x)$; the probability density at point $x$ can then be estimated with Equation (7):
$$f(x) = \frac{1}{Nh} \sum_{i=1}^{N} K\left( \frac{X_i - x}{h} \right) \tag{7}$$
In the equation, $N$ is the number of observations, $h$ is the bandwidth, $K(\cdot)$ is the kernel function, which acts as a weighting or smoothing function, $X_i$ are the independent and identically distributed observations, and $x$ is the point at which the density is evaluated.
According to their functional form, kernel functions can be divided into the Gaussian kernel, Epanechnikov kernel, triangular kernel, quartic kernel, etc. This paper selects the commonly used Gaussian kernel function for estimation. The expression of the Gaussian kernel function is shown in Equation (8):
$$K(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^{2}}{2} \right) \tag{8}$$
As non-parametric estimation yields no explicit functional expression, we need to examine the distribution changes through graphic comparison. From the graphs of the kernel density estimation results, we can obtain the position, shape, and extensibility of the variable distribution.
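Equations (7) and (8) can be evaluated directly on a grid of points. The following is a minimal sketch under stated assumptions: the function names, the bandwidth value, and the sample values are hypothetical, and the bandwidth selection rule is not specified in this section, so `h` is simply passed in by the caller.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel of Equation (8)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x_grid, samples, h):
    """Kernel density estimate of Equation (7) at every point of x_grid."""
    x_grid = np.asarray(x_grid, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # u[g, i] = (X_i - x_g) / h for grid point g and observation i.
    u = (samples[None, :] - x_grid[:, None]) / h
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h)

# Hypothetical index values for one year (illustration only).
index_values = [0.31, 0.35, 0.38, 0.42, 0.44, 0.47, 0.52, 0.58, 0.66]
grid = np.linspace(0.0, 1.0, 201)
density = kde(grid, index_values, h=0.05)   # h = 0.05 is an assumed bandwidth
print(grid[np.argmax(density)])             # location of the main peak
```

Computing one such curve per year and overlaying the curves gives the graphic comparison of position, shape, and extensibility described above.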
This paper uses kernel density estimation to analyze the dynamic evolution of the distribution of high-quality agricultural development in China over the sample period. The method not only describes the overall shape of the distribution but also, by comparing different periods, captures the dynamic characteristics of the regional distribution of high-quality agricultural development.
4.3.4. QAP Method
When exploring the influencing factors for regional differences, we regard the difference between regions as a relationship. Relational data are also better suited to exploring the interaction between two individuals [72]. The econometric model for the relational data set in this paper is as follows:
$$Y = \beta_0 + \beta_1 X + \beta_2 Z + U \tag{9}$$
where $\beta_0$, $\beta_1$, and $\beta_2$ are the parameters to be estimated; $X$ and $Y$ are the explanatory and explained variables, respectively; $Z$ is the control variable; and $U$ is the residual term. In the relational data model, all variables are square matrices of order $n$, where the observations $y_{ij}$, $x_{ij}$, and $z_{ij}$ represent the differences in the explained, explanatory, and control variables between two regions and are calculated as $y_i - y_j$, $x_i - x_j$, and $z_i - z_j$, respectively. Since the observations are the differences in indicators between two regions, the main diagonal elements ($i = j$) are all 0. In the relational data model, the row and column elements of the residual matrix $U$ are correlated rather than independent, which leads to an autocorrelation issue in the econometric model [73]. In addition to the autocorrelation problem, there is also serious multicollinearity among variables in the form of relational data. If traditional statistical tests are used, the variance and standard errors of the parameter estimates are inflated, and the significance tests for the variables become meaningless [74,75]. To address autocorrelation and multicollinearity in relational data models, the quadratic assignment procedure (QAP), a non-parametric test method based on random permutation, is used [76]. This method converts each relationship matrix into a long vector, calculates the regression coefficients, and then performs random permutations to judge the significance of the parameter estimates. Its implementation consists of the following two steps.
The first step is long vector regression. Each variable matrix in Equation (9) is transformed into an $n \times (n - 1)$-dimensional column vector (the main diagonal is excluded), that is, a long vector, and OLS estimation is then performed on the long vectors to obtain the regression coefficient set $\Gamma(Y, XZ)$ and the goodness of fit $R^2$. As mentioned above, because of the autocorrelation in relational data, the standard errors obtained with OLS are unreliable [77], and so is the significance reported by traditional statistical tests (such as the $t$-test and $F$-test).
The second step is random permutation and a statistical test. A linear relationship between
$X$ and $Z$ is assumed in Equation (9), as shown in Equation (10), where $E$ is the classical residual term:
$$X = \delta Z + E \tag{10}$$
If $\delta \neq 0$, then there is multicollinearity between $X$ and $Z$, and the estimator of $E$ can be expressed using Equation (11), where $\hat{\delta}$ is the OLS estimator from Equation (10):
$$\hat{E} = X - \hat{\delta} Z \tag{11}$$
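The first step, long vector regression, can be sketched as follows. This is an illustrative sketch rather than the software used in this paper; the helper names (`diff_matrix`, `to_long_vector`, `long_vector_ols`) and the region-level values are hypothetical. Each difference matrix is flattened into the vector of its n(n − 1) off-diagonal elements, and ordinary least squares is run on the stacked vectors.

```python
import numpy as np

def diff_matrix(values):
    """n x n matrix whose (i, j) element is values[i] - values[j]."""
    v = np.asarray(values, dtype=float)
    return v[:, None] - v[None, :]

def to_long_vector(matrix):
    """Stack the n(n - 1) off-diagonal elements into one long vector."""
    n = matrix.shape[0]
    return matrix[~np.eye(n, dtype=bool)]

def long_vector_ols(Y, regressors):
    """OLS of the long vector of Y on an intercept and the given matrices."""
    y = to_long_vector(Y)
    A = np.column_stack([np.ones_like(y)] + [to_long_vector(R) for R in regressors])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return beta, r2

# Hypothetical region-level indicators (illustration only).
x = np.array([1.0, 2.0, 4.0, 7.0])         # explanatory indicator
z = np.array([3.0, 1.0, 5.0, 2.0])         # control indicator
y = 2.0 * x + 3.0 * z                      # explained indicator, exact by construction
beta, r2 = long_vector_ols(diff_matrix(y), [diff_matrix(x), diff_matrix(z)])
print(beta, r2)                            # coefficients ordered as (beta0, beta1, beta2)
```

The coefficients and $R^2$ from this step are the observed statistics; their OLS standard errors are not used, for the reasons given above.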
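Residual matrix permutation requires simultaneously permuting the rows and columns of the estimated residual matrix $\hat{E}$ at random to obtain a new residual matrix $\pi(\hat{E})$. After each random permutation, Equation (12) can be used to estimate the reference value of the test statistic:
$$Y = \beta_0 + \beta_1 \pi(\hat{E}) + \beta_2 Z + U \tag{12}$$
At this point, Equation (12) is identical to Equation (9) under the null hypothesis $\beta_1 = 0$. If the estimation error $\hat{\delta} - \delta$ can be ignored, then the residual matrix after random permutation has the same distribution as $E$, so the statistics obtained from the permuted regressions form the reference distribution for the statistic from the first step.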
We can repeat this step many times and save the regression coefficients and goodness of fit $R^2$ after each random permutation to obtain a set of permuted regression coefficients, from which the standard error of the statistic can be estimated. Suppose that, after $m_{total}$ random permutations, $m_{large}$ and $m_{small}$ denote the number of times that the regression coefficient generated by a permutation is greater than or equal to, and less than or equal to, the long vector regression coefficient from the first step, respectively. We then obtain two proportions: $p_{large} = m_{large}/m_{total}$, the proportion of permuted regression coefficients that are greater than or equal to the first-step coefficient, and $p_{small} = m_{small}/m_{total}$, the proportion that are less than or equal to it. Since the two conditions overlap (both include equality), $p_{large}$ and $p_{small}$ need not sum to one. In the statistical test, these proportions can be regarded directly as the minimum significance level for rejecting the null hypothesis, that is, the statistical p-value. The two-tailed test is used for the regression coefficient: if the regression coefficient is positive, $p_{large}$ is used as the p-value; conversely, if the regression coefficient is negative, $p_{small}$ is used. In addition to the p-values of the regression coefficients, random permutation can also yield the p-value of $R^2$.
Unlike the two-tailed test for the regression coefficients, $R^2$ uses a one-tailed test, so its p-value is the ratio of the number of random permutations that produce an $R^2$ greater than or equal to the long vector regression $R^2$ from the first step to the total number of random permutations.
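The permutation test described above can be sketched as follows. The sketch assumes a single explanatory matrix X and a single control matrix Z, uses NumPy, and follows the residual permutation described in the text; the function name `qap_test`, the default number of permutations, and the random seed are assumptions, not settings reported in this paper. An intercept is included in every regression, which is also an assumption.

```python
import numpy as np

def _ols(y, columns):
    """OLS with intercept; returns coefficients, R^2, and residuals."""
    A = np.column_stack([np.ones_like(y)] + list(columns))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return beta, 1.0 - (resid @ resid) / (centered @ centered), resid

def qap_test(Y, X, Z, m_total=2000, seed=0):
    """Permutation p-values for beta1 and R^2 in Y = b0 + b1*X + b2*Z + U."""
    rng = np.random.default_rng(seed)
    Y, X, Z = (np.asarray(M, dtype=float) for M in (Y, X, Z))
    n = Y.shape[0]
    off = ~np.eye(n, dtype=bool)           # mask of off-diagonal elements
    y, x, z = Y[off], X[off], Z[off]       # long vectors of length n(n - 1)

    # Step 1: long vector regression gives the observed statistics.
    beta_obs, r2_obs, _ = _ols(y, [x, z])

    # Step 2: regress X on Z and rebuild the residual matrix.
    e = _ols(x, [z])[2]
    E_hat = np.zeros((n, n))
    E_hat[off] = e

    m_large = m_small = m_r2 = 0
    for _ in range(m_total):
        perm = rng.permutation(n)
        E_perm = E_hat[np.ix_(perm, perm)]  # same permutation for rows and columns
        beta_p, r2_p = _ols(y, [E_perm[off], z])[:2]
        m_large += int(beta_p[1] >= beta_obs[1])
        m_small += int(beta_p[1] <= beta_obs[1])
        m_r2 += int(r2_p >= r2_obs)

    p_large, p_small = m_large / m_total, m_small / m_total
    return {
        "beta": beta_obs,
        "r2": r2_obs,
        "p_large": p_large,
        "p_small": p_small,
        # Tail chosen by the sign of the observed coefficient, as in the text.
        "p_beta1": p_large if beta_obs[1] > 0 else p_small,
        "p_r2": m_r2 / m_total,
    }
```

Calling `qap_test(Y, X, Z)` with three n × n difference matrices returns the first-step estimates together with the permutation p-values. The sketch compares raw coefficients, as the text does; comparing t-statistics instead would be a different design choice.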
In this paper, the regional differences in high-quality agricultural development are jointly determined by the marketization level, the industrial structure, labor market characteristics, and other factors. The data set consists of the differences in economic indexes between regions, such as the index for high-quality agricultural development and the index for marketization degree: the difference matrices formed from the various sub-systems serve as the explanatory variables, and the difference matrix of the high-quality agricultural development indexes serves as the explained variable. Because QAP does not require the explanatory variables to be independent of one another, its significance tests remain usable under multicollinearity; this paper therefore applies the QAP regression method to analyze the factors influencing the regional differences in high-quality agricultural development.
This paper uses annual data for 30 provinces in mainland China (excluding Tibet) from 2009 to 2019. The regional differences in high-quality agricultural development are the explained variable, the regional differences in the marketization indexes are the explanatory variables, and the differences in the industrial structure, the urbanization rate, and the old age dependency ratio, together with the geographical distance, are the control variables. The explained variable, the explanatory variables, and the control variables are set as follows:
The regional differences in high-quality agricultural development. This paper uses the above-mentioned high-quality agricultural development indexes for the provinces to construct the matrix of the regional differences in high-quality agricultural development.
The regional differences in marketization. This paper uses the marketization indexes for the provinces as the proxy variables of marketization to construct the matrices of regional differences in marketization. It also investigates the impacts of the five sub-indicators of marketization degree: the degree of governmental regulation of the economy, the development of the non-state-owned economy, the development of the product market, the development of the factor market, and the level of market order. This helps us to investigate, from the micro-perspective, the mechanism by which the marketization degree influences the regional differences in high-quality agricultural development.
Regional differences in industrial structure. As the industrial development level differs across regions, the development of the industrial structure is not synchronous, and the industrial structural level impacts high-quality development. To avoid measurement error in high-quality agricultural development caused by the differences in industrial structure, this paper uses the ratio of the added value of the tertiary industry to the added value of the secondary industry as the proxy variable for the industrial structure and constructs the matrix of regional differences in the industrial structure.
Regional differences in the urbanization rate. The level of urbanization varies from region to region, and so does the number of people engaged in agricultural production. To avoid measurement error in high-quality agricultural development caused by the differences in agricultural scale across regions, this paper uses the urbanization rate (the proportion of the urban population in the total population) as the proxy variable for urbanization and constructs the matrix of regional differences in urbanization from it.
Regional differences in the old age dependency ratio. The age structure of the population affects regional economic vitality and the agricultural labor supply. To avoid measurement error in high-quality agricultural development caused by differences in age structure, this paper includes the regional differences in the old age dependency ratio as a control variable, uses the proportion of the elderly population in the total population as the proxy variable for the old age dependency ratio, and constructs the matrices of regional differences in the old age dependency ratio for the provinces.
Geographical distance. Due to the spillover and diffusion effects of technology and innovation, regions that are geographically close to each other tend to share similar technology and innovation levels. To avoid errors in the measured high-quality agricultural development of adjacent regions caused by technology spillover, this paper uses geographical distance as one of the control variables. The interprovincial geographical distance is calculated using ArcGIS.