Article

Inverse Probability-Weighted Estimation for Dynamic Structural Equation Model with Missing Data

National Academy of Innovation Strategy, China Association for Science and Technology, Beijing 100038, China
Mathematics 2024, 12(19), 3010; https://doi.org/10.3390/math12193010
Submission received: 3 September 2024 / Revised: 24 September 2024 / Accepted: 25 September 2024 / Published: 26 September 2024
(This article belongs to the Section E: Applied Mathematics)

Abstract

In various applications, observed variables are missing some of the information that was intended to be collected. Ignoring the missing data can bias the estimation of both loading and path coefficients. Inverse probability weighting (IPW) is one of the well-known methods for reducing bias in regressions, but it remains a promising yet relatively new approach in the context of structural equation models. The paper proposes both parametric and nonparametric IPW estimation methods for dynamic structural equation models, in which both loading and path coefficients are developed into functions of a random variable and of the quantile level. To improve the computational efficiency, modified parametric IPW and modified nonparametric IPW are developed by reducing the number of inverse probability computations and making fuller use of the completely observed information. All of these IPW estimation methods are compared with the existing complete case analysis through simulation studies. Finally, the paper illustrates the proposed model and estimation methods with an empirical study on digital new-quality productivity.

1. Introduction

Inverse probability weighting (IPW) is a well-known technique for dealing with missing data problems [1]. On the basis of complete case analysis, IPW rebalances the set of complete cases so as to make it representative of the population and to reduce potential bias. To date, IPW has been widely applied in various regression-type applications. However, combining the structural equation model (SEM) with IPW to handle missing data problems is not yet a well-developed topic.
Different from classical regression models, the structural equation model investigates the relations among different groups of variables and simultaneously measures the relations within each group. The structural equation model treats the “groups” as latent variables (denoted as LV ), representing abstract concepts which cannot be observed directly. Correspondingly, within each group, the variables are labeled as observed variables, which can be directly observed and specifically explain the meaning of that group. Mathematically, the relations among the different groups or latent variables can be written through the structural model (1), while the relations between each latent variable and its corresponding observed variables can be written through the measurement models (2) and (3). These three equations jointly constitute the structural equation model, which has been investigated by many experts [2,3,4,5,6,7,8,9]. Much subsequent research has developed the classical structural equation model into quantile-type and even more complex varying-coefficient models [10,11,12,13,14,15].
$\mathrm{LV}_{\eta} = P_{\gamma} \mathrm{LV}_{\xi} + E_{\delta}$ (1)
$X = L_X \mathrm{LV}_{\xi} + E_X$ (2)
$Y = L_Y \mathrm{LV}_{\eta} + E_Y$ (3)
Here, LV η and LV ξ represent a vector of the endogenous latent variables and a vector of the exogenous latent variables, respectively. P γ represents the path coefficients vector, while L X and L Y represent the loading coefficients. The random error term E δ is assumed to have a mean of 0 and fixed variance for the corresponding latent variable. X and Y are vectors of the observed variables for latent variable vectors LV ξ and LV η , respectively. The random error terms E X and E Y are assumed to have a mean of 0 and to be uncorrelated with their corresponding latent variables.
Missing data problems may occur in both latent variables and observed variables. More specifically, latent variables cannot be directly observed, and thus their values depend entirely on the observed variables. However, observed variables sometimes cannot be completely observed for various reasons. The existence of missing data in the observed variables brings substantial challenges and difficulties to structural equation model estimation [16,17]. Particularly when the loading and path coefficients are developed into varying functions, estimation with missing data becomes more complex and harder to handle [18,19,20]. In this case, the simplest approach is complete case analysis (CC), which deletes a sample whenever any part of its information is missing. Ignoring the missing data in this way wastes observed information, undermines efficiency, and can sometimes introduce substantial bias.
The paper proposes IPW estimation methods parametrically (denoted as IPW) and nonparametrically (denoted as NIPW) for a dynamic structural equation model with missing data. Our dynamic structural equation model differs from Fang and Wang (2024)’s work [17], which is based on vector autoregression (VAR); our paper focuses on a kind of dynamic structural equation model inspired by quantile varying-coefficient regression. To further improve the efficiency of information usage and reduce the computation time, both IPW and NIPW are modified to form two new estimation methods. It should be noted that, due to the different features of our dynamic structural equation model, the existing missing data handling methods are not directly comparable with our IPW and NIPW [16,17,21].
The rest of our paper is organized as follows. We review the existing dynamic structural equation models and estimation methods in Section 2. Then, we present our proposed parametric and nonparametric IPW estimation algorithms and their corresponding modified ones in Section 3. In Section 4, we carry out simulation studies to investigate the performance of our estimation method. We apply our proposed model and estimation method to digital new-quality productivity real data analysis in Section 5, and some final discussions are included in Section 6.

2. Review of Dynamic Structural Equation Models and Estimation Methods

2.1. Dynamic Structural Equation Models with Varying Coefficients

In dynamic structural equation models (DSEMs), the loading and path coefficients are developed into functions of random variables such as time, location, etc. [22,23,24,25,26,27,28,29]. DSEMs can therefore capture changing relations among the latent variables and observed variables. Assume that time (denoted as T ) is the random variable affecting both the varying path coefficients (denoted as P γ ( T ) ) and the varying loading coefficients (denoted as L X ( T ) and L Y ( T ) ). In this situation, the dynamic structural equation model [30] can be written as Equation (4).
$\mathrm{LV}_{\eta} = P_{\gamma}(T) \mathrm{LV}_{\xi} + E_{\delta}$
$X = L_X(T) \mathrm{LV}_{\xi} + E_X$
$Y = L_Y(T) \mathrm{LV}_{\eta} + E_Y$ (4)
Sometimes, quantile-based structural equation models with varying coefficients are needed [31]. In this situation, varying relations among latent variables and observed variables (denoted as P γ ( T , τ ) , L X ( T , τ ) , and L Y ( T , τ ) ) can be captured at different quantile levels τ according to the following equations [32,33]. It should be noted that the random measurement error terms E δ ( τ ) , E X ( τ ) , and E Y ( τ ) satisfy the assumption that their τ th conditional quantiles equal zero given the random variable T and the corresponding predictor latent variables in Equation (5).
$\mathrm{LV}_{\eta} = P_{\gamma}(T, \tau) \mathrm{LV}_{\xi} + E_{\delta}(\tau)$
$X = L_X(T, \tau) \mathrm{LV}_{\xi} + E_X(\tau)$
$Y = L_Y(T, \tau) \mathrm{LV}_{\eta} + E_Y(\tau)$ (5)

2.2. The Local Polynomial PLS Estimation for Dynamic Structural Equation Models

Partial least squares (PLS) and its successors are well-known estimation algorithms for structural equation models. In dynamic structural equation models, however, the standard partial least squares algorithm cannot estimate the varying loading and path coefficients. In this situation, the local polynomial method within the PLS framework is adopted to solve the dynamic structural equation model estimation problem [30,31].
Local polynomial PLS starts from the latent variables' outer estimation. More specifically, the latent variables can be obtained by calculating the product of their corresponding groups of observed variables with the outer weights. The objective function for the estimation of the outer weights and the updating procedure can be written as follows:
$\sum_{i=1}^{N} \Phi_i \Big[ \tilde{Y}_{j,i} - \sum_{k=1}^{K} \tilde{W}_{jk}(\Theta) Y_{jk,i} \Big]$ (6)
Here, Y ˜ j , i is the scaled external estimation of the jth latent variable for the ith sample, W ˜ j k ( Θ ) represents the kth estimated outer weight of the jth latent variable, and Y j k , i represents the kth observed variable for the jth latent variable of the ith sample. Θ is empty for the classical structural equation models (1)–(3), equals T for the dynamic structural equation model (4), and equals ( T , τ ) for the quantile-type dynamic structural equation model (5). According to Taylor's expansion, W ˜ j k ( Θ ) in the latter two types of dynamic structural equation models can be written as the following Equation (7) if it is differentiable:
$\tilde{W}_{jk}(\Theta) \approx \tilde{W}_{jk}(\Theta) + \tilde{W}_{jk}'(\Theta)(T - T_0) + \cdots + \tilde{W}_{jk}^{(q)}(\Theta)(T - T_0)^q / q! = \sum_{l=0}^{q} \tilde{W}_{jk}^{(l)}(\Theta)(T - T_0)^l / l!$ (7)
Taking q = 1 , W ˜ j k ( Θ ) is estimated through solving the minimization problem [34,35].
$\min \sum_{i=1}^{N} \Phi_i \Big\{ \tilde{Y}_{j,i} - \sum_{k=1}^{K} \big[ \tilde{W}_{jk}(\Theta) + \tilde{W}_{jk}'(\Theta)(T - T_0) \big] Y_{jk,i} \Big\} K[(T - T_0)/h]$
The latent variables' internal estimations can be obtained by multiplying the corresponding external estimations by the inner weights [36]. The outer weight estimation procedure stops when the change in the outer weights between two consecutive iterations is smaller than 10−5 or when 200 iterations are reached [37].
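To make the outer-weight update concrete, the following is a minimal sketch of the local linear step at a single point T0, assuming a squared loss in place of the generic loss Φ i and the Gaussian kernel and bandwidth given below; the function names and array shapes are illustrative and not part of the original algorithm.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel K(u) = (2*pi)^(-1/2) * exp(-u^2 / 2)
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def local_linear_outer_weights(Y_tilde, Y_obs, T, T0, h):
    """Kernel-weighted least squares fit of [W_jk(T0), W'_jk(T0)] for one
    latent variable (squared loss stands in for the generic loss Phi_i)."""
    Y_tilde = np.asarray(Y_tilde, dtype=float)   # (n,) scaled external estimate
    Y_obs = np.asarray(Y_obs, dtype=float)       # (n, K) observed variables
    T = np.asarray(T, dtype=float)
    dT = (T - T0)[:, None]
    X = np.hstack([Y_obs, Y_obs * dT])           # local linear design, (n, 2K)
    w = gaussian_kernel((T - T0) / h)            # kernel weights
    A = X.T @ (X * w[:, None])
    b = X.T @ (w * Y_tilde)
    beta = np.linalg.lstsq(A, b, rcond=None)[0]
    K = Y_obs.shape[1]
    return beta[:K], beta[K:]                    # W(T0), W'(T0)
```

Repeating this fit over a grid of T0 values traces out the estimated varying outer weights.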
Using the scaled internal estimation of endogenous latent variables Z ˜ e n and the scaled internal estimation of exogenous latent variables Z ˜ e x , path coefficients P ˜ ^ ( Θ ) can be estimated according to the following equation:
$\min \sum_{i=1}^{N} \Phi_i \Big\{ \tilde{Z}_{en,i} - \big[ \tilde{P}(\Theta) + \tilde{P}'(\Theta)(T - T_0) \big] \tilde{Z}_{ex,i} \Big\} K[(T - T_0)/h]$
The kernel function K [ ( T − T 0 ) / h ] is the following Gaussian kernel function. Here, δ represents the sample standard deviation of the corresponding observed variable vectors or latent variable vectors.
$K[(T - T_0)/h] = \frac{1}{(2\pi)^{1/2}} e^{-\left( \frac{T - T_0}{h} \right)^2 / 2}$
$h = \delta N^{-1/3}$

3. The Proposed IPW Estimation Algorithms

3.1. The Proposed Parametric IPW Estimation Algorithms

In the paper, the estimation of the dynamic structural equation model is carried out within the partial least squares framework. Let δ i be the indicator of whether the observed variable X 1 is observed for the ith sample: δ i equals 1 when X 1 is observed and 0 otherwise. In this situation, inverse probability weighting (IPW) is used to rebalance the set of complete cases, making it representative of the whole sample.
As a weight adjustment (WA), IPW weights every completely observed case by the following equation, that is, by the reciprocal of its probability of being completely observed given the selected observed variables X S , i and Y S , i and latent variables LV S , i .
$WA_{IPW} = \frac{1}{\Pi(X_{S,i}, Y_{S,i}, \mathrm{LV}_{S,i}, \delta_i)} = \frac{1}{\mathrm{prob}(\delta_i = 1 \mid X_{S,i}, Y_{S,i}, \mathrm{LV}_{S,i})}$
Here, X S , i , Y S , i , and LV S , i use S to represent the “selected” observed variables or latent variables for the ith sample. The paper defines the “selected” variables as those observed or latent variables within one certain regression relationship. More specifically, the outer weights W ˜ 11 ( Θ ) and W ˜ 12 ( Θ ) for the latent variable LV 1 can be estimated through $\sum_{i=1}^{N} \frac{\delta_i}{\Pi(X_{11,i}, X_{12,i}, \delta_i)} \Phi_i \big[ \tilde{Y}_{1,i} - \tilde{W}_{11}(\Theta) X_{11,i} - \tilde{W}_{12}(\Theta) X_{12,i} \big]$, where the “selected” variables consist of ( LV 1 , X 11 , X 12 ) . Therefore, the estimator of the outer weights W ˜ j k ( Θ ) can be obtained using the following equation:
$\arg\min \sum_{i=1}^{N} \frac{\delta_i}{\Pi(X_{S,i}, Y_{S,i}, \mathrm{LV}_{S,i}, \delta_i)} \Phi_i \Big\{ \tilde{Y}_{j,i} - \sum_{k=1}^{K} \big[ \tilde{W}_{jk}(\Theta) + \tilde{W}_{jk}'(\Theta)(T - T_0) \big] Y_{jk,i} \Big\} K[(T - T_0)/h]$
Correspondingly, the estimator of path coefficients P ˜ ^ ( Θ ) can be estimated using the following equation:
$\arg\min \sum_{i=1}^{N} \frac{\delta_i}{\Pi(\mathrm{LV}_{S,i}, \delta_i)} \Phi_i \Big\{ \tilde{Z}_{en,i} - \big[ \tilde{P}(\Theta) + \tilde{P}'(\Theta)(T - T_0) \big] \tilde{Z}_{ex,i} \Big\} K[(T - T_0)/h]$
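As an illustration of how the probabilities Π ( ⋅ ) might be obtained in practice, the following sketch fits a logistic working model for prob ( δ i = 1 | ⋅ ) on the selected, fully observed variables and returns the inverse probability weights δ i / π̂ i ; the logistic specification and the use of scikit-learn are assumptions made for illustration only, since the paper leaves Π ( ⋅ ) generic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def parametric_ipw_weights(delta, selected_vars, clip=1e-6):
    """Estimate pi_i = P(delta_i = 1 | selected variables) with an assumed
    logistic working model and return the IPW weights delta_i / pi_i
    (zero for incomplete cases, which drop out of the weighted objective)."""
    delta = np.asarray(delta)
    X = np.asarray(selected_vars, dtype=float)   # columns: X_S, Y_S, LV_S estimates
    pi_hat = LogisticRegression().fit(X, delta).predict_proba(X)[:, 1]
    return np.where(delta == 1, 1.0 / np.clip(pi_hat, clip, None), 0.0)
```

These weights then multiply the kernel-weighted loss in the two minimization problems above.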
Based on the above investigations, the proposed IPW estimation algorithm for dynamic structural equation models with missing data can be summarized as Algorithm 1.
Algorithm 1 The proposed IPW estimation algorithm in the dynamic structural equation model
Step 0: Assume the initial values of outer weights.
Step 1: External estimation. Use the complete cases of the observed variables to calculate the external estimations of the latent variables for the Ith iteration.
Step 2: Internal estimation. Choose the centroid scheme, calculate the internal weights, and use the product of the internal weights and the external estimations of the latent variables to obtain the internal estimations for the Ith iteration.
Step 3: Update the external weights.
      Step 3-1: Estimate the external weights between latent and observed variables using $\arg\min \sum_{i=1}^{N} \frac{\delta_i}{\Pi(X_{S,i}, Y_{S,i}, \mathrm{LV}_{S,i}, \delta_i)} \Phi_i \big\{ \tilde{Y}_{j,i} - \sum_{k=1}^{K} [ \tilde{W}_{jk}(\Theta) + \tilde{W}_{jk}'(\Theta)(T - T_0) ] Y_{jk,i} \big\} K[(T - T_0)/h]$.
      Step 3-2: Calculate the differences of the estimated external weights between the two consecutive iterations I and I + 1.
Step 4: Iterate repeatedly from Step 1 to Step 3.
      Step 4-1: Iterate repeatedly until the results meet the stop criterion.
      Step 4-2: Obtain the final estimated external weights.
Step 5: Estimate the final varying path coefficients using $\arg\min \sum_{i=1}^{N} \frac{\delta_i}{\Pi(\mathrm{LV}_{S,i}, \delta_i)} \Phi_i \big\{ \tilde{Z}_{en,i} - [ \tilde{P}(\Theta) + \tilde{P}'(\Theta)(T - T_0) ] \tilde{Z}_{ex,i} \big\} K[(T - T_0)/h]$.
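The overall flow of Algorithm 1 can be summarized by the following skeleton. It is a sketch that assumes the user supplies a function performing Steps 1–3 (external estimation, internal estimation, and the IPW-weighted outer-weight update) for the current weights; only the iteration control and the stopping rule are shown.

```python
import numpy as np

def ipw_pls_iterate(update_outer_weights, init_weights, max_iter=200, tol=1e-5):
    """Iterate Steps 1-3 of Algorithm 1 until the change in the outer weights
    between two consecutive iterations falls below tol or max_iter is reached."""
    W = np.asarray(init_weights, dtype=float)
    for _ in range(max_iter):
        W_new = np.asarray(update_outer_weights(W), dtype=float)
        converged = np.max(np.abs(W_new - W)) < tol
        W = W_new
        if converged:
            break
    return W  # final outer weights, then used in Step 5 for the path coefficients
```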

3.2. The Proposed Nonparametric IPW Estimation Algorithms

3.2.1. The Determination of the Nonparametric IPW Equation

Nonparametric inverse probability weighting (NIPW), a more complex weight adjustment (WA) than IPW, is based on the approach of Wang, Wang, Zhao, and Ou (1997), using the following kernel smoother [38]:
$\mathrm{NIPW} = \frac{\sum_{i=1}^{n} \delta_i K_i[(T - T_0)/h]}{\sum_{i=1}^{n} K_i[(T - T_0)/h]}$
As one of the most popular nonparametric smoothing methods, the kernel smoother [35,38,39] depends on the selection of the kernel function K i [ ⋅ ] , the order of the kernel function γ , the dimension d of the completely observed part of the variables, and the bandwidth smoothing parameter h , which will be investigated at length in the following subsections.
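Before turning to those choices, a minimal sketch of the kernel smoother itself is given below, assuming the Gaussian kernel and treating T as the smoothing variable; the function name is illustrative.

```python
import numpy as np

def nipw_probability(delta, T, T0, h):
    """Kernel-smoothed estimate of the observation probability at T0:
    sum_i delta_i K((T_i - T0)/h) / sum_i K((T_i - T0)/h)."""
    u = (np.asarray(T, dtype=float) - T0) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    return float(np.sum(np.asarray(delta) * K) / np.sum(K))
```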

3.2.2. The Choice of Kernel Function

Commonly used kernel functions include the uniform, quadratic, biweight, and Gaussian kernels, among others. Several existing studies use the first three kernel functions, and they have at least one point in common: the researcher should pay attention to the range of T − T 0 . For example, the uniform kernel function is K ( T − T 0 ) = (1/2) I [ −1 , 1 ] ( T − T 0 ) . Another more commonly used kernel function is the Gaussian kernel. As noted by Chen, Wan, and Zhou (2015), although it has no compact support in theory, the Gaussian kernel converges to zero at an exponential rate [35]. For example, the Gaussian kernel takes values on the order of 10^{−6} or smaller, and is thus practically zero, for | T − T 0 | ≥ 5. In addition, with a known order γ , the choice of kernel function usually has little effect on nonparametric estimation and hence has even less of an effect on the estimation of the coefficients [38]. Therefore, the paper chooses the following commonly used Gaussian kernel, without considering alternative kernels, in NIPW:
$K(T - T_0) = \frac{1}{(2\pi)^{1/2}} e^{-\frac{(T - T_0)^2}{2}}, \quad \text{where } -\infty < T - T_0 < \infty$

3.2.3. Determining the Order of Kernel Function γ

A kernel function K [ ⋅ ] is called a γ th order kernel function if it satisfies the following properties. For simplicity, the paper sets u = T − T 0 , and we consider the kernel function K [ ⋅ ] to be the Gaussian kernel.
$\int K(u)\, du = 1,$ (8)
$\int u^m K(u)\, du = 0, \quad \text{for } m = 1, \ldots, (\gamma - 1),$ (9)
$\int u^{\gamma} K(u)\, du \neq 0,$ (10)
$\int K^2(u)\, du < \infty$ (11)
Condition (8) means that the weights sum to one. Condition (9) is a type of symmetry condition; for example, $\int u K(u)\, du = 0$ holds when K ( − u ) = K ( u ) . Condition (10) determines the order of the kernel function K [ ⋅ ] . Condition (11) requires the kernel function K [ ⋅ ] to be square-integrable. More strictly, the above four conditions are necessary for K [ ⋅ ] to be a boundary kernel; in fact, many kernels can be modified to obtain boundary kernels [40].
According to the above investigation, the paper calculates the order of the selected Gaussian kernel (denoted as γ ). Conditions (8) and (9) with m = 1 are obviously satisfied because the Gaussian kernel is a probability density function and is symmetric. Detailed proofs of condition (10) when γ = 2 and of condition (11) are given below.
Proof for condition (10). 
When γ = 2 and K [ ] is chosen as the Gaussian kernel,
$\int u^{\gamma} K(u)\, du = \int u^2 K(u)\, du = \int u^2 \frac{1}{(2\pi)^{1/2}} e^{-u^2/2}\, du = \frac{1}{(2\pi)^{1/2}} \int u^2 e^{-u^2/2}\, du.$
Refer to the integration formula
$\int u^{2n} e^{-u^2/a^2}\, du = 2 \pi^{1/2} (a/2)^{2n+1} (2n)! / n!.$
Taking n = 1 and a = 2^{1/2},
$\int u^{2n} e^{-u^2/a^2}\, du = \int u^2 e^{-u^2/2}\, du = (2\pi)^{1/2}.$
Therefore,
$\int u^{\gamma} K(u)\, du = \frac{1}{(2\pi)^{1/2}} (2\pi)^{1/2} = 1 \neq 0.$
Proof for condition (11). 
Because
$\int K^2(u)\, du = \frac{1}{2\pi} \int e^{-u^2}\, du,$
refer to the integration formula
$\int e^{-a u^2}\, du = (\pi / a)^{1/2}.$
Taking a = 1, the paper easily obtains the result
$\int K^2(u)\, du = \frac{1}{2\pi} \pi^{1/2} = \frac{1}{2 \pi^{1/2}} < \infty.$
Based on the above calculation process, the final order of the kernel function is determined to be γ = 2. □
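The four conditions can also be checked numerically for the Gaussian kernel; the following short sketch uses scipy's quadrature purely as a sanity check of the calculations above.

```python
import numpy as np
from scipy.integrate import quad

K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

total, _ = quad(K, -np.inf, np.inf)                          # condition (8): equals 1
first, _ = quad(lambda u: u * K(u), -np.inf, np.inf)         # condition (9): equals 0
second, _ = quad(lambda u: u ** 2 * K(u), -np.inf, np.inf)   # condition (10): equals 1 != 0
square, _ = quad(lambda u: K(u) ** 2, -np.inf, np.inf)       # condition (11): 1/(2*sqrt(pi))

print(total, first, second, square, 1.0 / (2.0 * np.sqrt(np.pi)))
```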

3.2.4. Determining the Dimension of W, d

For brevity, the variables are divided into two parts, W_observed and W_missing: W_observed contains the variables that are observed for all subjects, while W_missing contains the variables whose observations are missing for some subjects. Let W denote the completely observed part W_observed.
Here, d denotes the dimension of the completely observed and continuous part of W. The paper requires continuity, partly because we will calculate the standard deviation of W. Chen, Wan, and Zhou (2015) and Zhou, Wan, and Wang (2008) take d = 1 for simplicity [35,41]. In this case, W is organized as a 2q × 1 vector [ Y ; Z ] , where both Y and Z are univariate. For example, if Y = ( 1 , 2 , 3 )^T and Z = ( 4 , 5 , 6 )^T , then W = ( 1 , 2 , 3 , 4 , 5 , 6 )^T . Chen, Wan, and Zhou (2015) calculate the bandwidth h = std[ x_2 ; x_3 ; y ] / n^{1/3} in MATLAB [35]. Here, x_2 , x_3 , and y represent all the completely observed variables and x_1 represents the only covariate with missing data. Thus, [ x_2 ; x_3 ; y ] is a 3n × 1 vector, where n is the sample size of x_2 (and of x_3 and y). Therefore, the final dimension of W is chosen as d = 1.

3.2.5. Selecting Bandwidth Smoothing Parameter h

In order to ensure that n^{1/2} ( β̂ − β ) is asymptotically normally distributed with a mean of 0 and an estimated covariance matrix of the form (1/n) M^{−1} Γ M^{−1}, the bandwidth h should satisfy at least two conditions:
$(1)\ n h^{2d} \to \infty, \quad d > 0,$
$(2)\ n h^{2\gamma} \to 0, \quad \gamma > 0$
There are several bandwidth selection methods, such as the ad hoc method and the plug-in method. (There are also criteria such as generalized cross-validation (GCV), unbiased risk (UBR), and the approximate asymptotic mean integrated squared error (MISE) for practical use.) However, plug-in bandwidth selection seems very complex for practical use because of higher-order covariance calculations [42]. Here, we use a simple ad hoc bandwidth selection method, which has the correct rate of convergence and is easily programmed; thus, h can be written as h = C n^{−L}, where C is a constant depending on the unknown function E{ Ψ ( Y , Z , β_0 ) | X } and its first and second derivatives. Carroll and Wand (1991) estimated C by calculating the sample standard deviation of the always observed vector of variables W (denoted as δ_W ) [43]. n^{−L} indicates the bandwidth rate. From the two conditions n h^{2d} → ∞ and n h^{2γ} → 0, we easily obtain L < 1/(2d) and L > 1/(2γ), and thus d < γ. Carroll and Wand (1991) took n^{−L} = n^{−1/3} directly as the optimal bandwidth rate [43]. Furthermore, Chen, Wan, and Zhou (2015) give the optimal bandwidth as O( n^{−1/(d+γ)} ) and indicate that γ commonly equals 2 [35]. When γ = 2, d = 1 since 0 < d < γ; in this way, L also equals 1/3. Thus, the paper takes h = δ_W n^{−1/3} as the final bandwidth.
Remark 1. 
A second choice for the bandwidth h is based on the Gaussian approximation, or Silverman's (1986) rule of thumb (that is, the bandwidth that minimises the mean integrated squared error, h = 1.06 δ_W n^{−1/5}), although it can yield widely inaccurate estimates when the density is not close to normal [44]. Since the common order γ of the kernel function is 2, the condition n h^4 → 0 must hold, so the bandwidth rate n^{−1/5} is not allowed. Therefore, h = 1.06 δ_W n^{−1/3} may be an appropriate second bandwidth.
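The ad hoc bandwidth and the Remark 1 alternative differ only in a constant factor; the following sketch computes both from the always observed vector W, assuming the sample standard deviation as the estimate of C.

```python
import numpy as np

def adhoc_bandwidth(W, n=None, scale=1.0, rate=-1.0 / 3.0):
    """h = scale * std(W) * n**rate. In the paper, n is the sample size, which
    can differ from len(W) when W stacks several fully observed variables;
    scale = 1.0 gives h = delta_W * n^(-1/3), scale = 1.06 the Remark 1 choice."""
    W = np.asarray(W, dtype=float).ravel()
    if n is None:
        n = W.size
    return scale * W.std(ddof=1) * n ** rate
```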

3.2.6. NIPW Estimation Algorithms

As a more complex weight adjustment (WA) than IPW, NIPW weights all completely observed data through the following equation:
$WA_{NIPW} = \frac{1}{\Pi_N(X_{S,i}, Y_{S,i}, \mathrm{LV}_{S,i}, \delta_i)} = \frac{\sum_{i=1}^{n} K_i[(T - T_0)/h]}{\sum_{i=1}^{n} \delta_i K_i[(T - T_0)/h]}$
Here, X S , i , Y S , i , and LV S , i use S to represent the “selected” observed variables or latent variables for the ith sample, the same as in IPW. Correspondingly, the estimator of the outer weights W ˜ j k ( Θ ) can be obtained from the following equation, which replaces the one used in Step 3-1 of Algorithm 1.
$\arg\min \sum_{i=1}^{N} \delta_i \, \frac{\sum_{i=1}^{n} K_i[(T - T_0)/h]}{\sum_{i=1}^{n} \delta_i K_i[(T - T_0)/h]} \, \Phi_i \Big\{ \tilde{Y}_{j,i} - \sum_{k=1}^{K} \big[ \tilde{W}_{jk}(\Theta) + \tilde{W}_{jk}'(\Theta)(T - T_0) \big] Y_{jk,i} \Big\} K[(T - T_0)/h]$
The estimator of path coefficients P ˜ ^ ( Θ ) can be estimated using the following equation, which can be used in Step 5 of Algorithm 1 instead.
$\arg\min \sum_{i=1}^{N} \delta_i \, \frac{\sum_{i=1}^{n} K_i[(T - T_0)/h]}{\sum_{i=1}^{n} \delta_i K_i[(T - T_0)/h]} \, \Phi_i \Big\{ \tilde{Z}_{en,i} - \big[ \tilde{P}(\Theta) + \tilde{P}'(\Theta)(T - T_0) \big] \tilde{Z}_{ex,i} \Big\} K[(T - T_0)/h]$
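Putting the pieces together, the value of the NIPW-weighted local objective at a point T0 can be sketched as follows, again with squared loss standing in for the generic loss Φ i ; the residuals are assumed to be precomputed from the current local linear coefficients.

```python
import numpy as np

def nipw_weighted_objective(resid, delta, T, T0, h):
    """NIPW-weighted local objective at T0, where resid_i is the residual
    Y~_{j,i} - sum_k [W_jk + W'_jk (T_i - T0)] Y_{jk,i} (squared loss is used
    in place of the generic loss Phi_i)."""
    T = np.asarray(T, dtype=float)
    u = (T - T0) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    delta = np.asarray(delta, dtype=float)
    wa_nipw = K.sum() / np.sum(delta * K)          # the NIPW weight adjustment
    return float(np.sum(delta * wa_nipw * np.asarray(resid) ** 2 * K))
```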

3.3. Modified IPW and NIPW Estimation Algorithms

Both the IPW and NIPW estimation algorithms are carried out based only on the completely observed cases of all observed variables and latent variables. Under the partial least squares framework, the 'partial' in IPW and NIPW is reflected in the relatively independent estimations between each latent variable and its corresponding observed variables, and among the latent variables themselves. The iteration process for updating the outer weights links all these relatively independent estimations.
For those independent estimations not involving any variable with missing data (denoted as E 1 ), the local polynomial PLS estimation algorithm can be applied directly, based on the full information of all the variables. For those independent estimations involving missing data (denoted as E 2 ), IPW or NIPW can be used to correct the biases using only the completely observed cases. Although using more cases to estimate the unknown coefficients brings a potential computational burden, more observed information is used, and omitting the W A IPW and W A NIPW calculations in E 1 is genuinely helpful in improving the computational efficiency. Therefore, the modified IPW and NIPW estimation algorithms are proposed.
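A sketch of the bookkeeping behind the modified algorithms is given below: each relatively independent estimation block either uses every case with unit weight (E 1 , no missing variables involved, so no inverse probability is computed) or keeps only the complete cases for subsequent IPW/NIPW reweighting (E 2 ). The function is illustrative and not taken from the paper.

```python
import numpy as np

def modified_block_case_weights(delta, block_involves_missing):
    """Base case weights for one estimation block in the modified IPW/NIPW
    algorithms: E1 blocks use all cases (weight 1, no Pi(.) needed), E2 blocks
    keep only complete cases, to be multiplied later by 1/Pi or WA_NIPW."""
    delta = np.asarray(delta)
    if not block_involves_missing:
        return np.ones(delta.shape, dtype=float)     # E1: full information
    return (delta == 1).astype(float)                # E2: complete cases only
```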

4. Simulation Investigations

4.1. Notations

Let LV η1 and LV η2 represent two endogenous latent variables, and let LV ξ represent the exogenous latent variable. For each latent variable, two observed variables are generated, denoted as Y 11 , Y 12 , Y 21 , Y 22 , X 1 , and X 2 , respectively. We assume that X 1 cannot be completely observed, while all of the other observed variables are completely observed. The loading coefficients L 11 ( Θ ) , L 12 ( Θ ) , L 21 ( Θ ) , L 22 ( Θ ) , L 1 ( Θ ) , and L 2 ( Θ ) link the latent variables to their corresponding observed variables. The path coefficients P 1 ( Θ ) and P 2 ( Θ ) link the same exogenous latent variable LV ξ to the endogenous latent variables LV η1 and LV η2 . In addition to the above variables and coefficients, E δ1 ( τ ) , E δ2 ( τ ) , E ϵY1i ( τ ) , E ϵY2j ( τ ) , and E ϵXk ( τ ) represent the random error terms in the dynamic structural equation model with missing data. Let δ = 1 if X 1 is observed; otherwise, δ = 0 . Here, T is a random variable whose values are chosen as sufficiently dense, evenly spread grid points on ( 0 , 1 ) , and τ represents the quantile level.

4.2. Models

Simulated examples are carried out to investigate the proposed IPW estimation algorithms’ performance in applications. The dynamic structural equation models with missing data can be written as the following equations [45,46,47,48]:
$\mathrm{LV}_{\eta 1} = P_1(\Theta) \mathrm{LV}_{\xi} + E_{\delta 1}(\tau)$
$\mathrm{LV}_{\eta 2} = P_2(\Theta) \mathrm{LV}_{\xi} + E_{\delta 2}(\tau)$
$Y_{1i} = L_{1i}(\Theta) \mathrm{LV}_{\eta 1} + E_{\epsilon Y_{1i}}(\tau), \quad i = 1, 2$
$Y_{2j} = L_{2j}(\Theta) \mathrm{LV}_{\eta 2} + E_{\epsilon Y_{2j}}(\tau), \quad j = 1, 2$
$X_k = L_k(\Theta) \mathrm{LV}_{\xi} + E_{\epsilon X_k}(\tau), \quad k = 1, 2$
Here, Θ = ( T , τ ) . Figure 1 displays the dynamic structural equation model with missing data, which illustrates the relations among different latent variables and observed variables more clearly.

4.3. Simulation Data Generation Mechanism

In the paper, the exogenous latent variable LV ξ is generated from the normal distribution N ( 0 , ( 1 + σ ) / ( 2 + σ ) ) , where σ follows the uniform distribution U ( U / 10 , 2 + U / 10 ) . In the structural model, the random error term E δ1 ( τ ) follows a Laplace distribution minus F^{−1} ( τ ) , with F ( ⋅ ) being the cumulative distribution function of the normal distribution. The random error term E δ2 ( τ ) equals E δ1 ( τ ) + N ( 0 , 1 ) . In the measurement model, the random error terms are E ϵY1i ( τ ) ∼ N ( 0 , 1 ) , E ϵY2j ( τ ) ∼ N ( 0.25 , 1.25² ) , and E ϵXk ( τ ) ∼ N ( 0.2 , 1 ) , each generated as an n-dimensional vector. For sample sizes of 200 and 500, the random variable T takes 200 and 500 values, respectively, chosen as evenly spread grid points on ( 0 , 1 ) . For brevity, the paper only displays the estimated results at the quantile levels 0.10, 0.50, and 0.90. The number of Monte Carlo replicates is 200.
The next important part is to generate the loading and path coefficients, which have been developed into functions of Θ . Firstly, the path coefficient functions P 1 ( Θ ) and P 2 ( Θ ) are given by the following equations:
$P_1(T) = 15 + 20 \sin(\pi T / 60)$
$P_2(T) = 20 + 15 \sin(\pi T / 60)$
Loading coefficient functions L 11 ( Θ ) , L 12 ( Θ ) , L 21 ( Θ ) , L 22 ( Θ ) , L 1 ( Θ ) , and L 2 ( Θ ) are given by the following equations:
$L_{11}(T) = 2 - 3 \cos[(T - 25)\pi / 15], \quad L_{12}(T) = 6 - 0.2 T$
$L_{21}(T) = 4 + (20 - T)^3 / 2000, \quad L_{22}(T) = 1 - 2 \cos[(T - 25)\pi / 15]$
$L_1(T) = 3 + (20 - T)^3 / 2000, \quad L_2(T) = 1.5 - 2 \cos[(T - 25)\pi / 15]$
The paper considers two missing-data mechanisms for the observed variable X 1 , denoted as Setting S1 and Setting S2. More specifically, in S1, P ( δ | X 2 ) = max[ 0 , ( X 2 + 1 ) / 10 − 1 / 20 ], such that approximately 20% of the observations have X 1 missing. In S2, P ( δ | X 2 , ξ ) = 1 / [ 1 + exp( 1.5 + 0.5 X 2 + 0.6 ξ ) ], such that approximately 20% of the observations have X 1 missing, and the missingness is related to the other observed variable X 2 and to the corresponding latent variable ξ . Figure 2 shows the two missing data distributions based on LV ξ and X 1 under Settings S1 and S2 with a sample size of 500. In S1, 103 of the 500 X 1 values are missing, a missing rate of 20.6%. In S2, 101 of the 500 X 1 values are missing, a missing rate of 20.2%.
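For concreteness, the following sketch generates one replicate of the measurement model for LV ξ and the Setting S2 missingness. The standard normal latent variable, the error scales, and the sign convention inside the logistic missingness model are simplifying assumptions chosen so that roughly 20% of X 1 is missing, since several operators in the printed formulas are ambiguous.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
T = (np.arange(n) + 0.5) / n                      # evenly spread grid points on (0, 1)

# Exogenous latent variable (simplified here to standard normal)
LV_xi = rng.normal(0.0, 1.0, n)

# Varying loading coefficients for X1 and X2 (as reconstructed above)
L1 = 3.0 + (20.0 - T) ** 3 / 2000.0
L2 = 1.5 - 2.0 * np.cos((T - 25.0) * np.pi / 15.0)

# Observed variables, with measurement errors of mean 0.2 and unit scale
X1 = L1 * LV_xi + rng.normal(0.2, 1.0, n)
X2 = L2 * LV_xi + rng.normal(0.2, 1.0, n)

# Setting S2: missingness of X1 depends on X2 and the latent variable
p_observed = 1.0 / (1.0 + np.exp(-(1.5 + 0.5 * X2 + 0.6 * LV_xi)))  # sign assumed
delta = rng.binomial(1, p_observed)
X1_observed = np.where(delta == 1, X1, np.nan)    # missing entries coded as NaN

print("missing rate of X1:", 1.0 - delta.mean())
```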

4.4. Evaluation Indexes

To evaluate the performances of IPW estimation algorithms, indexes measuring the differences between the estimates and true values of loading and path coefficients are needed. In this part, mean absolute errors (MAEs) and mean squared errors (MSEs) are proposed on the basis of all time points ( t = 1 , , T ) and Monte Carlo replicates ( b = 1 , , B ). The MAE and MSE equations to calculate all the loading coefficients (denoted as L ( Θ ) for brevity) and path coefficients (denoted as P ( Θ ) for brevity) can be written as follows [49]:
$MAE_{L(\Theta)} = \frac{1}{B} \frac{1}{T} \sum_{b=1}^{B} \sum_{t=1}^{T} \big| \hat{L}_b(t, \tau) - L_b(t, \tau) \big|$
$MSE_{L(\Theta)} = \frac{1}{B} \frac{1}{T} \sum_{b=1}^{B} \sum_{t=1}^{T} \big( \hat{L}_b(t, \tau) - L_b(t, \tau) \big)^2$
Here, L ^ b ( t , τ ) denotes the estimated loading coefficient and L b ( t , τ ) the true loading coefficient at the tth time point ( t = 1 , … , T ) in the bth Monte Carlo replicate ( b = 1 , … , B ).
$MAE_{P(\Theta)} = \frac{1}{B} \frac{1}{T} \sum_{b=1}^{B} \sum_{t=1}^{T} \big| \hat{P}_b(t, \tau) - P_b(t, \tau) \big|$
$MSE_{P(\Theta)} = \frac{1}{B} \frac{1}{T} \sum_{b=1}^{B} \sum_{t=1}^{T} \big( \hat{P}_b(t, \tau) - P_b(t, \tau) \big)^2$
Here, P ^ b ( t , τ ) denotes the estimated path coefficient and P b ( t , τ ) the true path coefficient at the tth time point ( t = 1 , … , T ) in the bth Monte Carlo replicate ( b = 1 , … , B ).
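A minimal sketch of these two indexes, assuming the estimated and true coefficient curves at a fixed quantile level are stored as (B, T)-shaped arrays:

```python
import numpy as np

def mae_mse(est, true):
    """MAE and MSE averaged over B Monte Carlo replicates and T time points;
    `est` and `true` have shape (B, T) for one coefficient at one quantile."""
    diff = np.asarray(est, dtype=float) - np.asarray(true, dtype=float)
    return float(np.mean(np.abs(diff))), float(np.mean(diff ** 2))
```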

4.5. Results

4.5.1. Comparisons of Estimation Accuracy and Efficiency in Setting S1

Table 1 displays the mean absolute errors of the estimated loading and path coefficients with a sample size of 200 from 200 Monte Carlo replicates at quantile levels 0.10, 0.50, and 0.90 in Setting S1. Table 2 presents the corresponding mean square errors of the estimated loading and path coefficients under the same setting as Table 1. Table 3 and Table 4 display the mean absolute errors and mean square errors of the estimated loading and path coefficients with an increased sample size of 500. Obviously, larger mean absolute errors and mean square errors indicate relatively worse estimation accuracy. Compared with CC at quantile levels 0.10 and 0.50, our proposed IPW, IPWM, NIPW, and NIPWM estimation algorithms have advantages in all path coefficient estimations, as well as in estimating the block of loading coefficients ( L 1 ( Θ ) and L 2 ( Θ ) ) that involves the missing observed variable. This indicates that, in Setting S1, our proposed estimation algorithms are more appropriate for capturing the structural relations among different latent variables and for measuring the relations between the latent variables and the observed variables with missing data at low and median quantile levels. However, at the high quantile level of 0.90, or for those loading coefficients without missing data, the proposed IPW, IPWM, NIPW, and NIPWM estimation algorithms have not shown any substantial advantages.

4.5.2. Comparisons of Estimation Accuracy and Efficiency in Setting S2

Table 5 displays the mean absolute errors of the estimated loading and path coefficients with sample sizes of 200 from 200 Monte Carlo replicates at quantile levels 0.10, 0.50, and 0.90 in Setting S2. Table 6 presents the corresponding mean square errors of the estimated loading and path coefficients under the same setting as Table 5.
From the perspective of the estimated loading coefficients with a sample size of 200, NIPW has the largest mean absolute errors and mean square errors at the quantile level 0.10, except that the mean absolute error of NIPWM's L 11 equals 1.014. At quantile levels 0.50 and 0.90, CC and IPW have the relatively larger mean absolute errors and mean square errors of the estimated loading coefficients in most cases, except that NIPWM's L 2 and L 22 equal 0.999 and 1.001 at the quantile level 0.50, respectively.
From the perspective of estimated path coefficients with a sample size of 200, IPW has relatively larger mean absolute errors of the estimated path coefficients ( P 1 and P 2 ) equaling 1.022 and 1.021 at the quantile level 0.10, and a relatively larger mean square error of the estimated path coefficients ( P 1 ) equaling 1.379 at the quantile level 0.10. NIPWM has a larger mean square error of the estimated path coefficients ( P 2 ) equaling 1.375 at the quantile level 0.10, a relatively larger mean absolute error of the estimated path coefficients ( P 2 ) equaling 1.025, and larger mean square error of the estimated path coefficients ( P 2 ) equaling 1.389 at the quantile level 0.50. CC, NIPW, and NIPWM have the same and larger mean absolute errors and mean square errors of the estimated path coefficients ( P 1 ) at the median quantile level 0.50. At the quantile level 0.90, CC has the largest mean absolute error and mean square error of the estimated path coefficient ( P 1 ) when compared with IPW, IPWM, NIPW, and NIPWM.
The paper also carries out simulation studies in Setting S2 with an increased sample size of 500. Table 7 and Table 8 display the mean absolute errors and mean square errors of the estimated loading and path coefficients with a sample size of 500 in Setting S2. At the quantile level 0.10, NIPW has relatively larger mean absolute errors and mean square errors for all loading and path coefficients except the loading coefficients L 1 and L 2 , for which NIPWM has the largest values. From the perspective of the loading coefficients at quantile levels 0.50 and 0.90, CC and IPW have most of the largest mean absolute errors and mean square errors of the estimated loading coefficients, except for L 1 and L 2 (NIPW has the largest values) and L 22 (NIPWM has the largest value) at the quantile level 0.50. From the perspective of the path coefficients, CC and NIPWM have relatively larger mean absolute errors at the quantile level 0.50, and NIPWM has a relatively larger mean absolute error at the quantile level 0.90 (IPWM has the same value for the estimated path coefficient P 2 ). IPWM and NIPWM have the relatively largest mean square errors of the estimated path coefficients at the quantile level 0.90. CC has a relatively larger mean square error of the estimated path coefficient P 1 at the quantile level 0.50, and NIPWM has a relatively larger mean square error of the estimated path coefficient P 2 at the quantile level 0.50.
Based on all the above analyses in Setting S2, IPWM almost always has smaller mean absolute errors and mean square errors for all loading and path coefficients than the other estimation algorithms at all quantile levels and with both sample sizes of 200 and 500. In contrast to Setting S1, at the quantile level 0.90 with a small sample size of 200, the proposed IPWM, NIPW, and NIPWM can be treated as appropriate methods. With an increased sample size of 500 at the quantile level 0.90, NIPW shows relatively obvious advantages compared with the other estimation methods.

4.5.3. Comparison of Computing Time

Table 9 consists of two parts, displaying the computational efficiencies of the estimation algorithms CC, IPW, IPWM, NIPW, and NIPWM in the two settings S1 and S2. The upper half displays the average computing times of all five estimation algorithms based on 200 Monte Carlo replicates. As expected, CC is the fastest estimation algorithm at all quantile levels in both settings. NIPW requires the most computing time: 96.025 s, 88.768 s, and 83.741 s in S1 and 33.693 s, 45.456 s, and 87.876 s in S2 at quantile levels 0.10, 0.50, and 0.90, respectively. Both modified estimation algorithms (IPWM and NIPWM) clearly outperform their corresponding IPW and NIPW.
In the bottom half of Table 9, average computing time ratios (%) are calculated to compare the percentage of IPWM (NIPWM) to IPW (NIPW) in both settings at quantile levels 0.10, 0.50, and 0.90, suggesting that IPWM’s average computing times are only 53.237% (the minimum ratio) to 75.817% (the maximum ratio) of IPW’s average computing times, and that NIPWM’s average computing times are only 21.209% (the minimum ratio) to 31.577% (the maximum ratio) of NIPW’s average computing times.

5. Empirical Study

In this section, the inverse probability-weighted estimation method in the dynamic structural equation model is applied to digital new-quality productivity investigations across 277 cities within China in 2021 [50]. In the paper, digital new-quality productivity levels can be measured through three dimensions, which are science and technology investments ( S T ), environment conditions ( E C ), and digital infrastructure ( D I ). Each dimension can be measured through two observed variables, which can be seen in Table 10.
The digital new-quality productivity assessment model, which uses all the dimensions and observed variables in Table 10, can be written as the following equations. Here, S T , E C , and D I represent latent variables, and S T 1 , S T 2 , E C 1 , E C 2 , D I 1 , and D I 2 are observed variables. L 11 ( U , τ ) , L 12 ( U , τ ) , L 1 ( U , τ ) , L 2 ( U , τ ) , L 21 ( U , τ ) , and L 22 ( U , τ ) are loading coefficients varying with a random variable U and the quantile level τ . P 1 ( U , τ ) and P 2 ( U , τ ) are path coefficients varying with the random variable U and the quantile level τ . It should be noted that the random measurement error terms E i ( τ ) , i = 1 , 2 , … , 8 , meet the assumption that their τ th conditional quantiles equal zero given the random variable U and the corresponding predictor variables.
$EC = P_1(U, \tau)\, ST + E_1(\tau), \quad DI = P_2(U, \tau)\, ST + E_2(\tau)$
$ST_1 = L_1(U, \tau)\, ST + E_3(\tau), \quad ST_2 = L_2(U, \tau)\, ST + E_4(\tau)$
$EC_1 = L_{11}(U, \tau)\, EC + E_5(\tau), \quad EC_2 = L_{12}(U, \tau)\, EC + E_6(\tau)$
$DI_1 = L_{21}(U, \tau)\, DI + E_7(\tau), \quad DI_2 = L_{22}(U, \tau)\, DI + E_8(\tau)$
The data in our paper originally come from the China City Statistical Yearbook, China Energy Statistical Yearbook, China Statistical Yearbook on Environment, China Statistical Yearbook on Science and Technology, and China Statistical Yearbook. There are 277 cities in total, and we assume that 20% of the observations of S T 1 are missing. Based on the above model and data, both CC and the proposed IPW, IPWM, NIPW, and NIPWM are applied to the digital new-quality productivity data with 200 bootstrap replicates. It should be noted that in our dynamic structural equation model, the random variable affecting both the loading and path coefficients is the location, representing the different cities in China.
Table 11 displays the mean absolute errors and mean square errors of the estimated loading and path coefficients using CC and the inverse probability-weighted estimation methods in the digital new-quality productivity assessment model with 200 bootstrap replicates. It should be noted that both the mean absolute errors and mean square errors are measured based on the differences between the raw estimate obtained before bootstrapping and each of the 200 bootstrap estimates. Obviously, there exist significantly large differences between CC and the proposed IPW, IPWM, NIPW, and NIPWM in both the mean absolute errors and mean square errors of the estimated loading coefficients L 1 ( Θ ) and L 2 ( Θ ) . This suggests that the proposed IPW, IPWM, NIPW, and NIPWM estimation algorithms outperform the existing CC in estimating the loading coefficients L 1 ( Θ ) and L 2 ( Θ ) associated with the missing data.
Table 12 presents the average computing times (in minutes) of CC, IPW, IPWM, NIPW, and NIPWM at quantile levels 0.10, 0.50, and 0.90. As the baseline, complete case analysis takes 0.061, 0.064, and 0.062 min on average at the three quantile levels, respectively. IPW takes 0.208, 0.197, and 0.196 min on average, and NIPW takes 2.114, 1.817, and 2.000 min on average. The modified IPWM accounts for only 47.626%, 47.755%, and 47.470% of IPW's computing time at quantile levels 0.10, 0.50, and 0.90, and the modified NIPWM accounts for only 27.917%, 43.371%, and 25.122% of NIPW's computing time, respectively. This suggests that both IPWM and NIPWM largely improve the computational efficiency compared with IPW and NIPW.

6. Discussion

In the paper, inverse probability-weighted estimation methods are investigated for dynamic structural equation models containing observed variables with missing data. From parametric and nonparametric perspectives, two kinds of inverse probability-weighted estimation methods are proposed. They are parametric inverse probability weighting (IPW) and nonparametric inverse probability weighting (NIPW). To further improve the usage of observed information and relieve the computation burden, modified IPW and NIPW are developed on the basis of both IPW and NIPW.
Estimation accuracies and computational efficiencies are compared through simulation studies and a real data analysis on digital new-quality productivity. Through the simulation studies, the paper tries to identify the most appropriate settings in which our proposed inverse probability-weighted estimation methods should be considered. The real data analysis on digital new-quality productivity shows the proposed IPW's and NIPW's relatively obvious advantages in loading coefficient estimation when parts of the observed variables are missing. Both the simulation studies and the empirical research show that IPWM and NIPWM obtain obvious improvements in computational efficiency compared with IPW and NIPW. However, the comparison of the results between CC and our proposed IPW and NIPW estimation methods does not provide an absolute conclusion on which method is the most preferable across all quantiles for all path and loading coefficient estimators with missing data. In future work, different kernel functions and parameters such as the bandwidth will be further discussed for cases in which nonparametric IPW is needed.
As we know, IPW estimation methods ignore the cases containing missing data, which inevitably wastes some observed information. To further improve the estimation accuracy, another direction for handling missing data is to impute new values for the missing data according to all of the observed information. In future work, imputation methods will be investigated to generate new values for the missing data and make fuller use of the available information in dynamic structural equation models.

Funding

The author’s work is supported by the National Natural Science Foundation of China (72001197).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The author is very grateful to all of the reviewers for their insightful comments and to the interviewees for participating in our investigation. The author’s work was supported by the National Natural Science Foundation of China (72001197). The author wants to thank his parents, his wife Yujie Liu, and his two cute babies Maoqi and Maoshen.

Conflicts of Interest

The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

References

  1. Seaman, S.R.; White, I.R.; Copas, A.J.; Li, L. Combining multiple imputation and inverse-probability weighting. Biometrics 2012, 68, 129–137. [Google Scholar] [CrossRef] [PubMed]
  2. Jöreskog, K.G.; Sörbom, D. LISREL V: Analysis of Linear Structural Relationships by the Method of Maximum Likelihood. In National Educational Resources; Scientific Software: Chapel Hill, NC, USA, 1981. [Google Scholar]
  3. Jöreskog, K.G.; Sörbom, D. Recent developments in structural equation modeling. J. Mark. Res. 1982, 19, 404–416. [Google Scholar]
  4. Bollen, K.A. Structural Equations with Latent Variables; Wiley: New York, NY, USA, 1989. [Google Scholar]
  5. Lohmöller, J.B. Latent Variable Path Modeling with Partial Least Squares; Physica-Verlag: Heidelberg, Germany, 1989. [Google Scholar]
  6. Sammel, M.D.; Ryan, L.M. Latent variable models with fixed effects. Biometrics 1996, 52, 650–663. [Google Scholar] [CrossRef] [PubMed]
  7. Ciavolino, E.; Nitti, M. Simulation study for PLS path modeling with high-order construct: A job satisfaction model evidence. In Advanced Dynamic Modeling of Economic and Social Systems; Springer: Berlin/Heidelberg, Germany, 2013; pp. 185–207. [Google Scholar]
  8. Ciavolino, E.; Nitti, M. Using the hybrid two-step estimation approach for the identification of second-order latent variable models. J. Appl. Stat. 2013, 40, 508–526. [Google Scholar] [CrossRef]
  9. Hair, J.F.; Hult, G.T.M.; Ringle, C.M.; Sarstedt, M. A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM), 2nd ed.; SAGE Publications: Thousand Oaks, CA, USA, 2017. [Google Scholar]
  10. Tenenhaus, M.; Esposito, V.V.; Chatelin, Y.M.; Lauro, C. PLS path modeling. Comput. Stat. Data Anal. 2005, 48, 159–205. [Google Scholar] [CrossRef]
  11. Davino, C.; Esposito, V.V. Quantile composite-based path modelling. Adv. Data Anal. Classif. 2016, 10, 491–520. [Google Scholar] [CrossRef]
  12. Davino, C.; Esposito, V.V.; Dolce, P. Assessment and validation in quantile composite-based path modeling. In The Multiple Facets of Partial Least Squares and Related Methods; Springer Proceedings in Mathematics and Statistics; Springer: New York, NY, USA, 2016; pp. 169–185. [Google Scholar]
  13. Davino, C.; Dolce, P.; Taralli, S. Quantile composite-based model: A recent advance in PLS-PM. In Partial Least Squares Path Modeling; Basic Concepts, Methodological Issues and Applications; Springer International Publishing AG: Berlin/Heidelberg, Germany, 2017; pp. 81–108. [Google Scholar]
  14. Davino, C.; Dolce, P.; Taralli, S. A quantile composite-indicator approach for the measurement of equitable and sustainable well-Being: A case study of the Italian provinces. Soc. Indic. Res. 2018, 136, 999–1029. [Google Scholar] [CrossRef]
  15. Dolce, P.; Davino, C.; Vistocco, D. Quantile Composite-Based Path Modeling: Algorithms, Properties and Applications; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
  16. Allison, P.D. Missing data techniques for structural equation modeling. J. Abnorm. Psychol. 2003, 112, 545. [Google Scholar] [CrossRef]
  17. Fang, Y.; Wang, L. Dynamic structural equation models with missing data: Data requirements on N and T. Struct. Equ. Model. Multidiscip. J. 2024, 31, 891–908. [Google Scholar] [CrossRef]
  18. Cai, Z.; Fan, J.; Li, R. Efficient estimation and inferences for varying-coefficient models. J. Am. Stat. Assoc. 2001, 95, 888–902. [Google Scholar] [CrossRef]
  19. Cheng, H. A class of new partial least square algorithms for first and higher order models. Commun. Stat. Simul. Comput. 2020, 51, 4349–4371. [Google Scholar] [CrossRef]
  20. Cheng, H.; Pei, R.M. Visualization analysis of functional dynamic effects of globalization talent flow on international cooperation. J. Stat. Inf. 2022, 37, 107–116. [Google Scholar]
  21. Ji, L.; Chow, S.M.; Schermerhorn, A.C.; Jacobson, N.C.; Cummings, E.M. Handling Missing Data in the Modeling of Intensive Longitudinal Data. Struct. Equ. Model. A Multidiscip. J. 2018, 25, 715–736. [Google Scholar] [CrossRef]
  22. Fan, J.; Zhang, J.T. Statistical estimation in varying coefficient models. Ann Stat 1999, 27, 1491–1518. [Google Scholar] [CrossRef]
  23. Fan, J.; Zhang, J.T. Functional linear models for longitudinal data. J. R. Stat. Soc. B 2000, 62, 303–322. [Google Scholar] [CrossRef]
  24. Assuno, R.M. Space varying coefficient models for small area data. Environmetrics 2003, 14, 453–473. [Google Scholar] [CrossRef]
  25. Fan, J.; Zhang, W. Statistical methods with varying coefficient models. Stat. Interface 2008, 1, 179. [Google Scholar] [CrossRef]
  26. Zhang, W.Y.; Lee, S.Y. Nonlinear dynamical structural equation models. Quant. Financ. 2009, 9, 305–314. [Google Scholar] [CrossRef]
  27. Asparouhov, T.; Hamaker, E.L.; Muthen, B. Dynamic latent class analysis. Struct. Equ. Model. A Multidiscip. J. 2017, 24, 257–269. [Google Scholar] [CrossRef]
  28. Asparouhov, T.; Hamaker, E.L.; Muthen, B. Dynamic structural equation models. Struct. Equ. Model. Multidiscip. J. 2017, 25, 359–388. [Google Scholar] [CrossRef]
  29. Wei, C.H.; Wang, S.J.; Su, Y.N. Local GMM estimation in spatial varying coefficient geographocally weighted autoregressive model. J. Stat. Inf. 2022, 37, 3–13. [Google Scholar]
  30. Cheng, H. New latent variable models with varying-coefficients. Commun. Stat. Theory Methods 2024, 1–18. [Google Scholar] [CrossRef]
  31. Cheng, H. Quantile Varying-coefficient Structural Equation Models. Stat. Methods Appl. 2023, 32, 1439–1475. [Google Scholar] [CrossRef]
  32. Koenker, R.; Bassett, G.J. Regression quantiles. Econometrica 1978, 46, 33–50. [Google Scholar] [CrossRef]
  33. Koenker, R. Quantile Regression; Cambridge University Press: Cambridge, UK, 2005. [Google Scholar]
  34. Fan, J.; Gijbels, I. Local Polynomial Modeling and Its Applications; Chapman & Hall: London, UK, 1996. [Google Scholar]
  35. Chen, X.R.; Wan, A.T.K.; Zhou, Y. Efficient Quantile Regression Analysis With Missing Observations. J. Am. Stat. Assoc. 2015, 110, 723–741. [Google Scholar] [CrossRef]
  36. Chatelin, Y.M.; Esposito, V.V.; Tenenhaus, M. State-of-Art on PLS Path Modeling through the Available Software; HEC: Paris, France, 2002. [Google Scholar]
  37. Ringle, C.M.; Wende, S.; Becker, J.M. SmartPLS 3; SmartPLS GmbH: Boenningstedt, Germany, 2015. [Google Scholar]
  38. Wang, C.Y.; Wang, S.J.; Zhao, L.; Ou, S.T. Weighted semiparametric estimation in regression analysis with missing covariate data. J. Am. Stat. Assoc. 1997, 92, 512–525. [Google Scholar] [CrossRef]
  39. Cheng, H. Research on Nonparametric Inverse Probability Weighting Quantile Regression with Its Application in CHARLS Data. J. Appl. Stat. Manag. 2023, 42, 403–415. [Google Scholar]
  40. Eubank, R.L. Smoothing Spline and Nonparametric Regression; Marcel Dekker: New York, NY, USA, 1988. [Google Scholar]
  41. Zhou, Y.; Wan, A.T.K.; Wang, X. Estimating Equation Inference with Missing Data. J. Am. Stat. Assoc. 2008, 103, 1187–1199. [Google Scholar] [CrossRef]
  42. Sepanski, J.H.; Knickerbocker, R.; Carroll, R.J. A semiparametric correction for attenuation. J. Am. Stat. Assoc. 1994, 89, 1366–1373. [Google Scholar] [CrossRef]
  43. Carroll, R.J.; Wand, M.P. Semiparametric estimation in logistic measurement error models. J. R. Stat. Soc. 1991, 53, 573–585. [Google Scholar] [CrossRef]
  44. Silverman, B.W. Density Estimation; Chapman and Hall: London, UK, 1986. [Google Scholar]
  45. Chin, W.W.; Marcolin, B.L.; Newsted, P.R. A partial least squares latent variable modeling approach for measuring interaction effects: Results from a Monte Carlo simulation study and an electronic-mail emotion/adoption study. Inf. Syst. Res. 2003, 14, 189–217. [Google Scholar] [CrossRef]
  46. Reinartz, B.; Ballmann, J. Shock Waves; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1099–1104. [Google Scholar]
  47. Henseler, J.; Chin, W.W. A comparison of approaches for the analysis of interaction effects between latent variables using partial least squares path modeling. Struct. Equ. Model. 2010, 17, 82–109. [Google Scholar] [CrossRef]
  48. Becker, J.M.; Klein, K.; Wetzels, M. Formative hierarchical latent variable models in PLS-SEM: Recommendations and guidelines. Long Range Plan. 2012, 45, 359–394. [Google Scholar] [CrossRef]
  49. Hahn, J. Bootstrapping quantile regression estimators. Econom. Theory 1995, 11, 105–121. [Google Scholar] [CrossRef]
  50. Lu, J.; Guo, Z.A. New quality productivity research in urban areas contains horizontal measurement, spatiotemporal evolution and influencing factors based on panel data from 277 cities across China from 2012 to 2021. Soc. Sci. J. 2024, 4, 124–133. [Google Scholar]
Figure 1. Dynamic structural equation model with missing data. The observed variable X 1 contains missing data, and all the other observed variables Y 11 , Y 12 , Y 21 , Y 22 , X 2 are completely observed.
Figure 2. Missing data distribution under S1 and S2 with sample size of 500.
Table 1. Mean absolute errors (MAEs) of the estimated loading and path coefficients with sample sizes of 200 and 200 Monte Carlo replicates in Setting S1.
τ | Method | L11(Θ) | L12(Θ) | L1(Θ) | L2(Θ) | L21(Θ) | L22(Θ) | P1(Θ) | P2(Θ)
0.10 | CC | 1.003 | 0.461 | 0.418 | 1.102 | 0.425 | 1.017 | 1.031 | 1.029
0.10 | IPW | 1.022 | 0.494 | 0.323 | 0.609 | 0.453 | 1.030 | 1.015 | 1.017
0.10 | IPWM | 1.008 | 0.427 | 0.317 | 0.607 | 0.399 | 1.027 | 1.024 | 1.023
0.10 | NIPW | 1.003 | 0.420 | 0.316 | 0.562 | 0.385 | 1.025 | 1.024 | 1.020
0.10 | NIPWM | 1.008 | 0.427 | 0.317 | 0.561 | 0.401 | 1.028 | 1.028 | 1.026
0.50 | CC | 0.966 | 0.296 | 0.362 | 0.998 | 0.303 | 1.000 | 1.032 | 1.032
0.50 | IPW | 0.968 | 0.298 | 0.354 | 0.988 | 0.304 | 1.008 | 1.027 | 1.024
0.50 | IPWM | 0.975 | 0.290 | 0.348 | 0.984 | 0.298 | 1.004 | 1.030 | 1.029
0.50 | NIPW | 0.977 | 0.295 | 0.356 | 0.994 | 0.297 | 1.006 | 1.030 | 1.030
0.50 | NIPWM | 0.975 | 0.292 | 0.360 | 0.996 | 0.299 | 1.002 | 1.028 | 1.029
0.90 | CC | 1.013 | 0.471 | 0.356 | 0.940 | 0.418 | 1.050 | 1.022 | 1.024
0.90 | IPW | 1.030 | 0.480 | 0.578 | 1.462 | 0.418 | 1.056 | 1.025 | 1.023
0.90 | IPWM | 1.033 | 0.455 | 0.570 | 1.453 | 0.404 | 1.062 | 1.022 | 1.021
0.90 | NIPW | 1.033 | 0.525 | 0.589 | 1.561 | 0.451 | 1.062 | 1.034 | 1.032
0.90 | NIPWM | 1.032 | 0.457 | 0.575 | 1.549 | 0.404 | 1.062 | 1.022 | 1.022
Table 2. Mean squared errors (MSEs) of the estimated loading and path coefficients with sample sizes of 200 and 200 Monte Carlo replicates in Setting S1.
τ | Method | L11(Θ) | L12(Θ) | L1(Θ) | L2(Θ) | L21(Θ) | L22(Θ) | P1(Θ) | P2(Θ)
0.10 | CC | 1.245 | 0.316 | 0.259 | 1.337 | 0.275 | 1.221 | 1.403 | 1.394
0.10 | IPW | 1.325 | 0.370 | 0.180 | 0.513 | 0.315 | 1.279 | 1.364 | 1.362
0.10 | IPWM | 1.214 | 0.273 | 0.173 | 0.508 | 0.244 | 1.217 | 1.389 | 1.382
0.10 | NIPW | 1.190 | 0.269 | 0.167 | 0.442 | 0.229 | 1.198 | 1.386 | 1.370
0.10 | NIPWM | 1.214 | 0.273 | 0.170 | 0.440 | 0.246 | 1.219 | 1.396 | 1.389
0.50 | CC | 0.971 | 0.118 | 0.194 | 1.088 | 0.124 | 1.039 | 1.408 | 1.407
0.50 | IPW | 0.982 | 0.124 | 0.190 | 1.091 | 0.126 | 1.061 | 1.394 | 1.387
0.50 | IPWM | 0.983 | 0.110 | 0.178 | 1.081 | 0.116 | 1.044 | 1.403 | 1.399
0.50 | NIPW | 0.993 | 0.117 | 0.185 | 1.096 | 0.119 | 1.053 | 1.404 | 1.401
0.50 | NIPWM | 0.982 | 0.111 | 0.194 | 1.098 | 0.117 | 1.041 | 1.400 | 1.400
0.90 | CC | 1.233 | 0.332 | 0.224 | 1.028 | 0.263 | 1.264 | 1.386 | 1.387
0.90 | IPW | 1.281 | 0.346 | 0.470 | 2.378 | 0.262 | 1.291 | 1.390 | 1.383
0.90 | IPWM | 1.265 | 0.312 | 0.459 | 2.352 | 0.244 | 1.287 | 1.387 | 1.381
0.90 | NIPW | 1.331 | 0.408 | 0.479 | 2.664 | 0.304 | 1.339 | 1.410 | 1.402
0.90 | NIPWM | 1.262 | 0.314 | 0.456 | 2.627 | 0.245 | 1.286 | 1.386 | 1.383
Table 3. Mean absolute errors (MAEs) of the estimated loading and path coefficients with sample sizes of 500 and 200 Monte Carlo replicates in Setting S1.
Method | τ | L11(Θ) | L12(Θ) | L1(Θ) | L2(Θ) | L21(Θ) | L22(Θ) | P1(Θ) | P2(Θ)
CC | 0.10 | 0.998 | 0.523 | 0.487 | 1.198 | 0.460 | 0.997 | 1.017 | 1.016
CC | 0.50 | 0.998 | 0.309 | 0.354 | 1.028 | 0.287 | 0.980 | 1.018 | 1.018
CC | 0.90 | 1.015 | 0.493 | 0.328 | 0.881 | 0.437 | 1.012 | 1.017 | 1.016
IPW | 0.10 | 1.006 | 0.555 | 0.339 | 0.569 | 0.489 | 1.004 | 1.013 | 1.012
IPW | 0.50 | 1.001 | 0.309 | 0.341 | 1.016 | 0.283 | 0.979 | 1.015 | 1.015
IPW | 0.90 | 1.020 | 0.512 | 0.691 | 1.630 | 0.448 | 1.016 | 1.015 | 1.015
IPWM | 0.10 | 1.002 | 0.502 | 0.342 | 0.567 | 0.440 | 0.998 | 1.016 | 1.015
IPWM | 0.50 | 1.003 | 0.305 | 0.337 | 1.014 | 0.283 | 0.981 | 1.017 | 1.016
IPWM | 0.90 | 1.020 | 0.496 | 0.695 | 1.635 | 0.432 | 1.013 | 1.014 | 1.013
NIPW | 0.10 | 0.998 | 0.456 | 0.334 | 0.518 | 0.405 | 1.001 | 1.013 | 1.011
NIPW | 0.50 | 1.002 | 0.306 | 0.350 | 1.023 | 0.284 | 0.982 | 1.016 | 1.016
NIPW | 0.90 | 1.026 | 0.566 | 0.705 | 1.729 | 0.488 | 1.020 | 1.019 | 1.019
NIPWM | 0.10 | 1.003 | 0.502 | 0.336 | 0.516 | 0.440 | 0.998 | 1.017 | 1.015
NIPWM | 0.50 | 1.002 | 0.304 | 0.347 | 1.021 | 0.283 | 0.982 | 1.017 | 1.017
NIPWM | 0.90 | 1.019 | 0.495 | 0.707 | 1.735 | 0.432 | 1.013 | 1.015 | 1.014
Table 4. Mean squared errors (MSE) of the estimated loading and path coefficients with sample sizes of 500 and 200 Monte Carlo replicates in Setting S1.
Method | τ | L11(Θ) | L12(Θ) | L1(Θ) | L2(Θ) | L21(Θ) | L22(Θ) | P1(Θ) | P2(Θ)
CC | 0.10 | 1.311 | 0.382 | 0.355 | 1.555 | 0.302 | 1.229 | 1.355 | 1.353
CC | 0.50 | 1.033 | 0.127 | 0.183 | 1.116 | 0.104 | 0.987 | 1.357 | 1.357
CC | 0.90 | 1.306 | 0.347 | 0.178 | 0.892 | 0.286 | 1.227 | 1.355 | 1.353
IPW | 0.10 | 1.364 | 0.429 | 0.206 | 0.445 | 0.338 | 1.275 | 1.347 | 1.341
IPW | 0.50 | 1.044 | 0.128 | 0.170 | 1.103 | 0.105 | 0.989 | 1.350 | 1.351
IPW | 0.90 | 1.341 | 0.366 | 0.653 | 2.923 | 0.295 | 1.252 | 1.349 | 1.347
IPWM | 0.10 | 1.286 | 0.352 | 0.209 | 0.444 | 0.276 | 1.200 | 1.353 | 1.351
IPWM | 0.50 | 1.040 | 0.122 | 0.167 | 1.098 | 0.100 | 0.988 | 1.354 | 1.353
IPWM | 0.90 | 1.320 | 0.344 | 0.658 | 2.939 | 0.275 | 1.227 | 1.348 | 1.344
NIPW | 0.10 | 1.237 | 0.299 | 0.198 | 0.377 | 0.240 | 1.178 | 1.345 | 1.340
NIPW | 0.50 | 1.042 | 0.125 | 0.175 | 1.116 | 0.103 | 0.992 | 1.354 | 1.352
NIPW | 0.90 | 1.415 | 0.439 | 0.657 | 3.238 | 0.345 | 1.309 | 1.359 | 1.358
NIPWM | 0.10 | 1.287 | 0.351 | 0.201 | 0.374 | 0.276 | 1.200 | 1.355 | 1.352
NIPWM | 0.50 | 1.040 | 0.122 | 0.173 | 1.110 | 0.100 | 0.988 | 1.355 | 1.354
NIPWM | 0.90 | 1.318 | 0.344 | 0.659 | 3.254 | 0.275 | 1.227 | 1.351 | 1.348
Table 5. Mean absolute errors (MAEs) of the estimated loading and path coefficients with a sample size of 200 and 200 Monte Carlo replicates in Setting S2.
Quantile  Method  L11(Θ)  L12(Θ)  L1(Θ)   L2(Θ)   L21(Θ)  L22(Θ)  P1(Θ)   P2(Θ)
0.10      CC      0.975   0.439   0.386   0.999   0.408   1.001   1.018   1.019
0.10      IPW     0.999   0.464   0.574   1.280   0.429   1.023   1.022   1.021
0.10      IPWM    1.013   0.422   0.577   1.284   0.396   1.025   1.018   1.016
0.10      NIPW    1.003   0.472   0.614   1.349   0.438   1.027   1.019   1.016
0.10      NIPWM   1.014   0.422   0.607   1.348   0.395   1.026   1.019   1.019
0.50      CC      0.963   0.298   0.372   0.998   0.300   0.989   1.025   1.024
0.50      IPW     0.972   0.300   0.358   0.981   0.298   0.997   1.023   1.022
0.50      IPWM    0.971   0.289   0.356   0.982   0.297   1.000   1.024   1.023
0.50      NIPW    0.967   0.298   0.370   0.998   0.299   0.994   1.025   1.024
0.50      NIPWM   0.970   0.288   0.368   0.999   0.298   1.001   1.025   1.025
0.90      CC      1.023   0.471   0.391   1.030   0.428   1.061   1.020   1.021
0.90      IPW     1.022   0.487   0.322   0.755   0.435   1.052   1.015   1.015
0.90      IPWM    1.016   0.441   0.321   0.754   0.395   1.055   1.019   1.018
0.90      NIPW    1.014   0.468   0.318   0.722   0.420   1.051   1.014   1.016
0.90      NIPWM   1.015   0.441   0.332   0.729   0.395   1.055   1.012   1.012
Table 6. Mean squared errors (MSEs) of the estimated loading and path coefficients with a sample size of 200 and 200 Monte Carlo replicates in Setting S2.
Quantile  Method  L11(Θ)  L12(Θ)  L1(Θ)   L2(Θ)   L21(Θ)  L22(Θ)  P1(Θ)   P2(Θ)
0.10      CC      1.166   0.290   0.240   1.146   0.261   1.173   1.371   1.373
0.10      IPW     1.237   0.323   0.463   1.856   0.284   1.234   1.379   1.373
0.10      IPWM    1.223   0.268   0.467   1.872   0.242   1.208   1.373   1.368
0.10      NIPW    1.259   0.332   0.517   2.022   0.297   1.250   1.375   1.368
0.10      NIPWM   1.223   0.269   0.495   2.014   0.241   1.210   1.376   1.375
0.50      CC      0.966   0.118   0.210   1.106   0.122   1.019   1.388   1.385
0.50      IPW     0.987   0.122   0.192   1.080   0.124   1.038   1.382   1.379
0.50      IPWM    0.974   0.108   0.188   1.078   0.115   1.036   1.384   1.384
0.50      NIPW    0.977   0.120   0.198   1.105   0.123   1.031   1.388   1.386
0.50      NIPWM   0.973   0.108   0.196   1.104   0.116   1.038   1.388   1.389
0.90      CC      1.256   0.336   0.240   1.214   0.277   1.299   1.380   1.378
0.90      IPW     1.281   0.357   0.174   0.730   0.284   1.297   1.369   1.360
0.90      IPWM    1.220   0.295   0.175   0.728   0.235   1.266   1.376   1.370
0.90      NIPW    1.241   0.331   0.165   0.679   0.267   1.282   1.366   1.365
0.90      NIPWM   1.218   0.294   0.193   0.691   0.235   1.266   1.365   1.360
Table 7. Mean absolute errors (MAEs) of the estimated loading and path coefficients with a sample size of 500 and 200 Monte Carlo replicates in Setting S2.
Method  Quantile  L11(Θ)  L12(Θ)  L1(Θ)   L2(Θ)   L21(Θ)  L22(Θ)  P1(Θ)   P2(Θ)
CC      0.10      0.970   0.493   0.354   0.984   0.433   0.966   1.018   1.016
CC      0.50      0.992   0.308   0.355   1.015   0.290   0.969   1.019   1.019
CC      0.90      1.026   0.510   0.414   1.103   0.455   1.020   1.014   1.016
IPW     0.10      1.004   0.523   0.685   1.420   0.460   0.998   1.016   1.013
IPW     0.50      1.000   0.306   0.345   1.008   0.286   0.975   1.017   1.017
IPW     0.90      1.017   0.528   0.340   0.708   0.470   1.008   1.014   1.013
IPWM    0.10      1.007   0.492   0.683   1.420   0.426   1.003   1.016   1.012
IPWM    0.50      0.997   0.305   0.344   1.006   0.285   0.977   1.018   1.018
IPWM    0.90      1.005   0.484   0.336   0.706   0.422   1.003   1.016   1.017
NIPW    0.10      1.009   0.534   0.704   1.468   0.471   1.003   1.019   1.017
NIPW    0.50      0.998   0.304   0.365   1.016   0.285   0.973   1.018   1.018
NIPW    0.90      1.008   0.505   0.338   0.680   0.450   1.005   1.015   1.015
NIPWM   0.10      1.007   0.492   0.709   1.472   0.426   1.002   1.016   1.013
NIPWM   0.50      0.997   0.305   0.364   1.015   0.285   0.978   1.019   1.019
NIPWM   0.90      1.005   0.484   0.340   0.685   0.422   1.004   1.017   1.017
Table 8. Mean squared errors (MSEs) of the estimated loading and path coefficients with a sample size of 500 and 200 Monte Carlo replicates in Setting S2.
Method  Quantile  L11(Θ)  L12(Θ)  L1(Θ)   L2(Θ)   L21(Θ)  L22(Θ)  P1(Θ)   P2(Θ)
CC      0.10      1.215   0.340   0.196   1.083   0.271   1.138   1.370   1.364
CC      0.50      1.022   0.126   0.187   1.098   0.107   0.967   1.373   1.371
CC      0.90      1.346   0.367   0.267   1.329   0.304   1.253   1.362   1.363
IPW     0.10      1.314   0.382   0.651   2.266   0.304   1.224   1.364   1.358
IPW     0.50      1.040   0.126   0.177   1.084   0.106   0.979   1.368   1.368
IPW     0.90      1.357   0.387   0.215   0.647   0.319   1.255   1.360   1.359
IPWM    0.10      1.281   0.342   0.649   2.264   0.264   1.196   1.366   1.360
IPWM    0.50      1.030   0.122   0.176   1.082   0.101   0.980   1.371   1.371
IPWM    0.90      1.272   0.328   0.206   0.642   0.262   1.197   1.367   1.366
NIPW    0.10      1.338   0.397   0.663   2.385   0.317   1.245   1.371   1.364
NIPW    0.50      1.037   0.125   0.191   1.100   0.105   0.977   1.371   1.370
NIPW    0.90      1.315   0.357   0.205   0.600   0.294   1.227   1.364   1.363
NIPWM   0.10      1.280   0.342   0.673   2.392   0.264   1.196   1.367   1.361
NIPWM   0.50      1.028   0.122   0.190   1.098   0.101   0.981   1.372   1.372
NIPWM   0.90      1.272   0.327   0.205   0.609   0.262   1.198   1.367   1.366
Table 9. Computational efficiencies of all algorithms (CC, IPW, IPWM, NIPW, NIPWM) with a sample size of 200 and 200 Monte Carlo replicates in Settings S1 and S2.
Algorithm          S1: 0.10  S1: 0.50  S1: 0.90  S2: 0.10  S2: 0.50  S2: 0.90
CC (seconds)       8.612     14.640    13.926    3.977     10.594    6.287
IPW (seconds)      32.102    30.569    31.730    11.859    28.559    21.939
IPWM (seconds)     18.020    23.176    19.804    7.530     15.204    12.718
NIPW (seconds)     96.025    88.768    83.741    33.693    45.456    87.876
NIPWM (seconds)    29.316    24.669    17.760    10.639    12.214    27.657
IPWM/IPW (%)       56.134    75.817    62.413    63.500    53.237    57.970
NIPWM/NIPW (%)     30.529    27.791    21.209    31.577    26.871    31.473
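The cost ordering in Table 9 is consistent with how the two families of propensity estimates are typically built: a parametric fit solves one low-dimensional optimization, whereas a nonparametric kernel estimate smooths over all pairs of observations. The sketch below is a minimal illustration of that contrast, not the paper's implementation; the function names, the logistic and Gaussian-kernel choices, and the bandwidth h are assumptions made only for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def parametric_ipw_weights(x, r):
    """Illustrative parametric step: logistic-regression propensity of being
    completely observed, then inverse-probability weights (zero for incomplete cases)."""
    p_hat = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]
    p_hat = np.clip(p_hat, 1e-3, 1.0)  # guard against near-zero propensities
    return r / p_hat


def nonparametric_ipw_weights(z, r, h=0.5):
    """Illustrative nonparametric step: Nadaraya-Watson kernel estimate of the
    response probability given an always-observed covariate z (O(n^2) smoothing)."""
    k = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)  # Gaussian kernel weights
    p_hat = np.clip((k * r[None, :]).sum(axis=1) / k.sum(axis=1), 1e-3, 1.0)
    return r / p_hat


# Toy usage: 200 units, with missingness driven by an always-observed covariate.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
r = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-(0.5 + z)))).astype(int)
w_parametric = parametric_ipw_weights(z.reshape(-1, 1), r)
w_nonparametric = nonparametric_ipw_weights(z, r)
```

The quadratic-in-n kernel step is a plausible reason why NIPW takes several times longer than IPW in Table 9, and why the modified variants (IPWM, NIPWM), which reduce the number of inverse-probability computations, recover a large share of that cost.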
Table 10. Dimensions, indicators, and their abbreviations in the digital new-quality productivity assessment indicator system.
Dimensions                               Observed Variables
science and technology investment (ST)   ST1: number of employees in scientific research, technical services and geological exploration industry
                                         ST2: financial expenditure on science and education
environment condition (EC)               EC1: industrial sulfur dioxide emissions/GDP
                                         EC2: industrial waste water generation/GDP
digital infrastructure (DI)              DI1: total telecommunications business volume
                                         DI2: number of Internet broadband access ports
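As a reading aid for the empirical analysis that follows, the measurement structure in Table 10 can be written as a plain mapping from each latent dimension to its observed indicators; the snippet below is purely illustrative (the abbreviations follow Table 10, the variable name is an assumption).

```python
# Illustrative encoding of the Table 10 measurement blocks:
# each latent dimension (key) maps to its observed indicators (values).
measurement_blocks = {
    "ST": ["ST1", "ST2"],  # science and technology investment
    "EC": ["EC1", "EC2"],  # environment condition
    "DI": ["DI1", "DI2"],  # digital infrastructure
}
```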
Table 11. Mean absolute errors (MAEs) and mean squared errors (MSEs) of the estimated loading and path coefficients.
Metric  Quantile  Method  L11(Θ)  L12(Θ)  L1(Θ)   L2(Θ)   L21(Θ)  L22(Θ)  P1(Θ)   P2(Θ)
MAE     0.1       CC      0.428   0.324   1.993   2.116   1.057   0.455   0.256   0.120
MAE     0.1       IPW     0.424   0.310   0.831   0.795   0.966   0.449   0.061   0.100
MAE     0.1       IPWM    0.421   0.317   0.833   0.814   1.006   0.435   0.098   0.138
MAE     0.1       NIPW    0.440   0.307   0.893   0.802   1.006   0.451   0.069   0.095
MAE     0.1       NIPWM   0.417   0.311   0.895   0.827   1.017   0.427   0.100   0.114
MAE     0.5       CC      0.120   0.096   0.489   0.809   0.565   0.281   0.043   0.038
MAE     0.5       IPW     0.120   0.104   0.273   0.281   0.514   0.291   0.025   0.032
MAE     0.5       IPWM    0.109   0.094   0.271   0.282   0.558   0.271   0.028   0.033
MAE     0.5       NIPW    0.115   0.112   0.272   0.274   0.561   0.274   0.032   0.033
MAE     0.5       NIPWM   0.108   0.092   0.268   0.273   0.557   0.266   0.031   0.034
MAE     0.9       CC      0.591   0.562   5.946   7.745   1.148   0.431   0.106   1.904
MAE     0.9       IPW     0.621   0.439   1.022   1.067   0.709   0.440   0.142   0.165
MAE     0.9       IPWM    0.607   0.418   1.027   1.051   0.674   0.485   0.177   0.185
MAE     0.9       NIPW    0.661   0.356   0.839   0.802   0.630   0.491   0.161   0.143
MAE     0.9       NIPWM   0.737   0.362   0.871   0.818   0.536   0.496   0.218   0.169
MSE     0.1       CC      0.286   0.199   5.216   4.902   2.186   0.356   0.102   0.071
MSE     0.1       IPW     0.271   0.190   1.413   1.222   1.973   0.348   0.043   0.072
MSE     0.1       IPWM    0.274   0.210   1.411   1.261   2.114   0.337   0.079   0.177
MSE     0.1       NIPW    0.289   0.179   1.554   1.193   2.073   0.346   0.046   0.073
MSE     0.1       NIPWM   0.270   0.203   1.584   1.267   2.096   0.313   0.073   0.127
MSE     0.5       CC      0.025   0.022   0.416   0.784   0.814   0.126   0.012   0.005
MSE     0.5       IPW     0.026   0.023   0.147   0.146   0.669   0.134   0.003   0.002
MSE     0.5       IPWM    0.023   0.022   0.152   0.150   0.817   0.118   0.004   0.002
MSE     0.5       NIPW    0.025   0.025   0.150   0.139   0.772   0.121   0.004   0.002
MSE     0.5       NIPWM   0.023   0.021   0.148   0.143   0.801   0.113   0.005   0.002
MSE     0.9       CC      0.608   0.445   36.722  60.798  1.677   0.275   0.130   3.677
MSE     0.9       IPW     0.550   0.229   1.674   1.752   1.289   0.283   0.216   0.074
MSE     0.9       IPWM    0.532   0.208   1.685   1.702   0.992   0.316   0.301   0.183
MSE     0.9       NIPW    0.697   0.258   1.203   1.097   1.149   0.437   0.165   0.067
MSE     0.9       NIPWM   0.838   0.265   1.271   1.124   0.786   0.436   0.266   0.126
Table 12. Average computing times (ACT, minutes) using CC, IPW, IPWM, NIPW, and NIPWM.
Algorithm  0.10   0.50   0.90
CC         0.061  0.064  0.062
IPW        0.208  0.197  0.196
IPWM       0.099  0.094  0.093
NIPW       2.114  1.817  2.000
NIPWM      0.590  0.788  0.502