Estimating the Variance of Estimator of the Latent Factor Linear Mixed Model Using Supplemented Expectation-Maximization Algorithm

: This paper deals with symmetrical data that can be modelled based on Gaussian distribution, such as linear mixed models for longitudinal data. The latent factor linear mixed model (LFLMM) is a method generally used for analysing changes in high-dimensional longitudinal data. It is usual that the model estimates are based on the expectation-maximization (EM) algorithm, but unfortunately, the algorithm does not produce the standard errors of the regression coefﬁcients, which then hampers testing procedures. To ﬁll in the gap, the Supplemented EM (SEM) algorithm for the case of ﬁxed variables is proposed in this paper. The computational aspects of the SEM algorithm have been investigated by means of simulation. We also calculate the variance matrix of beta using the second moment as a benchmark to compare with the asymptotic variance matrix of beta of SEM. Both the second moment and SEM produce symmetrical results, the variance estimates of beta are getting smaller when number of subjects in the simulation increases. In addition, the practical usefulness of this work was illustrated using real data on political attitudes and behaviour in Flanders-Belgium.


Introduction
The latent factor multivariate linear mixed model (LFLMM) is a combination between the Factor Analysis (FA) and the Linear Mixed Model (LMM), as proposed by [1]. The model aims to analyze longitudinal data sets with large numbers of multivariate responses, i.e., high-dimensional longitudinal data. The authors proposed estimation of the LFLMM by means of the EM algorithm, which is a closed-form solution. They showed by way of simulation that EM estimation of the LFLMM provides accurate parameter estimates and is more efficient in terms of adding variables other than time variables in the model than alternatives like the structural equation model. As shown by [2,3], the combination of fixed and random effects and the interaction of covariates with time can be straightforwardly handled by the LFLMM estimated by EM.
The LFLMM assumes that the responses are continuous and that the number of latent variables is known [1]. Moreover, convergence of the EM algorithm is sometimes slow. The main disadvantage, however, is that the EM algorithm does not produce standard errors of the estimator of the regression coefficients because it does not calculate the derivatives of the likelihood function, which are often complicated and tedious to derive [4,5]. Thus, it is difficult to study the effects of different covariates or fixed variables for different latent factors simultaneously.
The general Supplemented EM algorithm was proposed by [6] to obtain the standard errors by calculating the complete information matrix as the base of the variance-covariance matrix of the estimator. The Supplemented EM algorithm has been applied to various kinds of models, notably item response models [6][7][8][9]. However, its suitability and features in the case of application to the LFLMM have not been investigated yet. In this study, we extend the work of [1] by employing the Supplemented EM algorithm as a by-product of the EM estimator for the case of fixed variables. We used simulation studies to investigate the computational aspects of the Supplemented EM algorithm and used a real data example to illustrate the practical usefulness of this work.
The remainder of this study is organized as follows. In Section 2, we specify the LFLMM and summarize the EM algorithm to estimate it. Section 3 presents the Supplemented EM algorithm. Sections 4 and 5 present the results of the simulation and real data example. Conclusions follow in Section 6.

The LFLMM and the EM Algorithm
Following [1], an LFLMM can be composed of two parts. The first is the factor analysis model, which represents the relationships between the observed and latent variables. This is similar to the structural equation model, which explains the relationship of the latent variables and the measurement indicators carried out through factor analysis [10]. This part can be written as: Specifically, for the i-th of N individuals, we observe j = 1, . . . , J responses which where T i is the number of time periods for subject i. Λ is the matrix of factor loadings and it = it1 , . . . , itJ the vector of measurement errors for subject i at time t. It is assumed that itJ ∼ N 0, τ 2 j and itj ⊥ ith , j = h. In matrix notation, Equation (1) reads: where The second part of the LFLMM is a multivariate linear mixed model containing the fixed and random effects for each latent variable (η it ). For individual i, i = 1, 2, . . . N, at time t, t = 1, 2, . . . , T i , and latent variable l, l = 1, 2, . . . , d, we thus have: where x l it and z l it are the elements of design matrices of the p fixed variables and q random effects, respectively. β l is an unknown coefficient, a l i = a l i1 , . . . , a l iq and ε l i = ε l i1 , . . . , ε l iTi , l = 1, 2, . . . d, are the random effects and errors for subject i and factor l, respectively. The random effects are assumed to be normally distributed with mean 0 and variancecovariance matrix V(a) = Σ a . It is assumed that Σ a captures the changes among the latent variables [1]. For example, a positive covariance between the random effects for the latent variables 1 and 2 means that if for a given individual i the latent variable 1 increases over time, the latent variable 2 also increases for that individual. Note that in this setting, the covariates are included in the multivariate linear mixed model (MLMM) of Equation (3) but not in the factor analysis model of Equation (1).
In matrix notation, Equation (3) reads: where The marginal distribution of Y i is assumed multivariate normal with mean: The first term in V(Y i ) denotes the variances and covariance of the latent factors and the last term the variances of the error term, it . The mean and variance-covariance matrix of η i are E(η i ) = X i β and V(η i ) = Z i Σ a Z i + I T i ⊗ Σ ε , respectively.
To estimate the LFLMM by EM, we summarize it below, as proposed by [1]. Before going into detail, we observe that {η i , a i } is treated as missing data. Hence, the complete dataset is {Y i , X i , Z i , η i , a i } whereas the observed data is {Y i , X i , Z i }. It follows that the complete data likelihood is: The corresponding complete data loglikelihood is: where Let the θ denote the parameter vector Λ, τ 2 , β, and Σ ε , θ (w) be the ML estimate of θ at the wth iteration for w = 0, 1, . . .., and Q θ| θ (w) the expectation of the joint loglikelihood for the complete data Then the (w + 1)th iteration of the EM algorithm consists of (i) the E-step, which is the expectation of the joint loglikelihood computed according to (10) and (ii) the M-step, which maximizes Q θ| θ (w) to yield θ (w+1) . Further details on EM estimation of the LFLMM can be found in [1].

The Supplemented EM
Below we discuss the Supplemented EM algorithm, denoted as SEM. Before going into detail, we observe that the main purpose of this study is to estimate the standard errors of the fixed effects, β.
Consider the mapping M defined by iteration w of the EM algorithm: when the parameter vector converges to β * , we obtain β * = M(β * ). For M(β) continuous we have by Taylor expansion in the neighbourhood of β * where g = 1, 2, . . . , k and h = 1, 2, . . . , k is the k × k Jacobian matrix of M(β) = (M 1 (β), . . . , M k (β)) evaluated at the ML estimate of β with k = p × d. DM is known as the rate matrix. To obtain the loglikelihood of β, we consider the complete data density of the LFLMM: the density of missing data, given the observed data. Thus, the loglikelihood of β given the complete data is: The asymptotic variance-covariance matrix of β, V(β), is the inverse of the observed information matrix (I o ). In the case of the LFLMM, the observed data is {Y i , X i , Z i } so that V(β) is: where is the information matrix of the observed data loglikelihood (which is assumed to exist). That is [6,11]: Equation (15) is difficult to evaluate directly using the EM algorithm [6,11]. As a way out, [7] suggested to evaluate the complete data information matrix: The conditional complete data information, given the observed data evaluated at β = β * , is: After taking second derivatives, averaging over , and evaluating at β = β * , Equation (13) implies: where the missing information matrix (I om ) is ref [12] interpreted Equation (18) as observed in f ormation = complete in f ormation − missing in f ormation and called it the "missing information principle". Equation (18) can be written as: where I is the k × k identity matrix and I om I −1 oc is the matrix of the fraction of missing information [7,11]. According to [13], the rate of convergence of the EM algorithm is determined by the fraction of missing information in the neighborhood of β * : Substituting DM = I om I −1 oc into Equation (20) and inverting, the asymptotic variancecovariance matrix of β * , V(β * ) is: From the equality (I − P) −1 = (I − P + P)(I − P) −1 = I + P(I − P) −1 it follows that: where ∆V(β * ) is the increase of the diagonal elements of V(β * ) related to missing information. Calculation of the DM matrix can be done using the code and output of the original EM algorithm as follows [6,7]. The DM matrix represents the differential of the parameter mappings during the EM algorithm. Hence, each element of the DM matrix represents a component-wise increase of the rate of convergence per iteration of the EM algorithm. Let r gh be the (g, h) th element of the DM matrix. From Equation (13), we have: which converges to β * g . Note that only the gth component in β (w) (g) takes a value different from its maximum likelihood estimate.
The output after a single run of the Supplemented EM algorithm (Step 1 and 2) are β w+1 and r (w) gh g = 1, 2, . . . , k and h = 1, 2, . . . , k. Based on the final estimates of DM, V(β * ) is calculated using (24). The diagonal elements of V are the variance of β * .

Simulation
To evaluate the statistical properties and computational aspects of the SEM, we set up a simulation study. The number of subjects (N) is set at 500, 1000, and 1500 with six time periods. The number of simulations (S) is set at 50 and 250. The other set-up of the simulations is adopted from [1]. Particularly, we use the same initial values of the parameters of the LFLMM model (12 items, 2 latent factors, and a simple structure to model the relationship between the items and the latent factors). It is done to check if the bias resulting from the Supplemented EM algorithm on LFLMM is in line with the results presented in [1]. Table 1 presents the absolute difference for the true parameters, and the averages of the SEM estimates are calculated as a measure of performance. Table 1 shows that the absolute difference of σ a,11 has a range from 0 to 0.0444 (N = 500 and S = 250). The results are in line with [1] the parameters of the measurement model (factor loadings and error variances) are estimated more precisely than those of the latent mixed regression model. Overall, these results indicate that with the increasing number of subjects in the simulation, the absolute difference between the actual parameters and the SEM average is getting smaller. This means the SEM can estimate the model parameter very well.
Although the results in Table 1 indicate that the accuracy of the estimates in the latent mixed regression model (No. 23-43) is not as good as in the measurement model part (No. 1-22), through Figure 1a-c, it can be shown that the median of boxplots (which generally is close to the true parameters) are all at the same level. This means that the parameters of the latent mixed regression models (β, σ a , σ ε ) are estimated more precisely, especially for the number of simulations S = 250. Furthermore, all the boxplots are also shown to have different distributions of views with an increasing number of subjects in the simulation. This is indicated by the smaller size of the boxplot as the number of subjects increases for both numbers of simulations.
The results from the simulations of the Supplemented EM algorithm in estimating the asymptotic variance matrix of beta is summarized as a standard deviation of beta in Table 2. We also calculate the standard deviation of beta using the 2nd moment, as a benchmark to compare with the standard deviation of beta of SEM. Both the 2nd moment and SEM produce symmetrical results the parameter estimate for the standard deviation of beta is getting smaller with the increasing number of subjects in the simulation. Overall, it can be concluded that by using SEM, changes in the parameter estimate for the standard deviation of beta are not too different for all number of subjects ( Figure 2). Therefore, the simulation results suggest that the asymptotic variance of beta from the Supplemented EM Algorithm can be used to estimate the asymptotic variance of beta in real data analysis.  The results from the simulations of the Supplemented EM algorithm in estimating the asymptotic variance matrix of beta is summarized as a standard deviation of beta in Table 2. We also calculate the standard deviation of beta using the 2nd moment, = ∑ − ̅ as a benchmark to compare with the standard deviation of beta of SEM. Both the 2nd moment and SEM produce symmetrical results the parameter estimate for the standard deviation of beta is getting smaller with the increasing number of subjects in the simulation. Overall, it can be concluded that by using SEM, changes in the parameter estimate for the standard deviation of beta are not too different for all number of subjects ( Figure 2). Therefore, the simulation results suggest that the asymptotic variance of beta from the Supplemented EM Algorithm can be used to estimate the asymptotic variance of beta in real data analysis.

Real Data Example
The real data-set that we used to illustrate the development of the Supplemented EM algorithm is the political attitudes and behavior data of Flemish. The data was designed to include a representative sample of the target population under the Belgian electorate. The Flemish data set (Flemish and Dutch speaking respondents from Brussels Capital Region) consists of 1274 respondents, who have been interviewed three times (1991, 1995, and 1999) [16][17][18]. There are four latent factors measured on political attitudes of Flemish used, i.e., Individualism, Nationalism, Ethnocentrism, and Authoritarianism. This data has been analyzed using various methods by several authors, including [19][20][21][22][23]. There are three interesting questions in this real data case, i.e., how Individualism, Nationalism, Ethnocentrism, and Authoritarianism of the Flemish develop over time; whether there is an association between these four developments, and whether the gender of the respond-

Real Data Example
The real data-set that we used to illustrate the development of the Supplemented EM algorithm is the political attitudes and behavior data of Flemish. The data was designed to include a representative sample of the target population under the Belgian electorate. The Flemish data set (Flemish and Dutch speaking respondents from Brussels Capital Region) consists of 1274 respondents, who have been interviewed three times (1991, 1995, and 1999) [16][17][18]. There are four latent factors measured on political attitudes of Flemish used, i.e., Individualism, Nationalism, Ethnocentrism, and Authoritarianism. This data has been analyzed using various methods by several authors, including [19][20][21][22][23]. There are three interesting questions in this real data case, i.e., how Individualism, Nationalism, Ethnocentrism, and Authoritarianism of the Flemish develop over time; whether there is an association between these four developments, and whether the gender of the respondent affects the change patterns of latent developments. I, N, E, and A in Table 3 correspond to Individualism, Nationalism, Ethnocentrism, and Authoritarianism, respectively. a 11 and a 12 are the random intercept and random slope for Individualism. a 21 and a 22 are the random intercept and random slope for Nationalism. a 31 and a 32 are the random intercept and random slope for Ethnocentrism. a 41 and a 42 are the random intercept and random slope for Authoritarianism. The positive correlation of random intercept between a 11 and a 21 , a 11 and a 31 , a 11 and a 41 suggests that the development of Individualism and other political attitudes is highly related, which highest correlated with Ethnocentrism. The results indicate that those who have a better sense of Individualism tend to have a better sense of Nationalism, Ethnocentrism, and Authoritarianism. The results find a positive correlation of random intercept between a 21 and a 31 , a 21 and a 41 . It suggests that those who have a better sense of Nationalism tend to have a better sense of Ethnocentrism and Authoritarianism, as well as those who have a better sense of Ethnocentrism tend to have a better sense of Authoritarianism. There is also a positive correlation of random slope between a 12 and a 22 . It means that if one subject's Individualism decreases over time, then it is reasonable to expect that his or her Nationalism will decrease over time and vice versa. This also holds between Individualism and Ethnocentrism and between Individualism and Authoritarianism. The positive correlation of random slope between a 22 and a 32 , meaning that if one subject's Nationalism decreases over time, then it is reasonable to expect that his or her Ethnocentrism will decrease over time. The correlation matrix of random effects confirms that all latent factors have a positive correlation over time. The significance of parameter estimate of β is analyzed via the z-values. By using the Supplemented EM algorithm, the standard errors of β for all parameters can be calculated. The standard errors of β are listed in Table 4. Using a 95 percent confidence interval of β, almost all confidence intervals do not include the null value, except the slope of Male on Authoritarianism. Hence there are statistically significant differences in the parameter estimate of β. In other words, all latent factors of Flemish people decrease over time, with Ethnocentrism having the highest rate of decline over time (−0.252) and Nationalism the lowest (−0.177). On average, the Individualism and Nationalism of the male respondent are higher than that of the female. However, Ethnocentrism of the male respondent is lower than that of the female.

Conclusions
This paper proposed the Supplemented EM algorithm for LFLMM in estimating the asymptotic variance-covariance matrix as a by-product of the EM estimator for the case of fixed variables in the model. Results from simulation studies suggest that the Supplemented EM algorithm can estimate the model very close to the initial parameters.
As a result of the development of EM algorithm of LFLMM, the Supplemented EM algorithm is very slow to converge, as stated by [1], especially when the number of simulations is 250 times with 1500 subjects. For this reason, further research is needed to find techniques that can be used to accelerate the speed of the algorithm. Several approaches to speed the EM algorithm have been proposed and can be found in [24][25][26] (the ECM algorithm), [27] (the ECME algorithm), and [28] (the Parameter-Expanded EM algorithm).