Variational Bayesian Estimation of Quantile Nonlinear Dynamic Latent Variable Models with Possible Nonignorable Missingness

Mulati Tuerde; Ahmadjan Muhammadhaji

doi:10.3390/axioms13120849


¹

College of Mathematics and System Sciences, Xinjiang University, Urumqi 830017, China

²

Xinjiang Key Laboratory of Applied Mathematics, Xinjiang University, Urumqi 830017, China

^*

Author to whom correspondence should be addressed.

Axioms 2024, 13(12), 849; https://doi.org/10.3390/axioms13120849

This article belongs to the Special Issue Recent Advances in Statistical Modeling and Simulations with Applications, 2nd Edition


Abstract

Our study presents a variational Bayesian parameter estimation method for the quantile nonlinear dynamic latent variable model (QNDLVM), particularly when dealing with missing data and nonparametric priors. This method addresses the computational inefficiencies of the traditional Markov chain Monte Carlo (MCMC) approach, which struggles with large datasets and high-dimensional parameters because of its long computation times, slow convergence, and substantial memory consumption. Within the variational Bayesian framework, we convert the complex parameter estimation task into a more manageable deterministic optimization problem. This is achieved by exploiting the hierarchical structure of the QNDLVM and by optimizing approximate posterior distributions, with the evidence lower bound maximized using the coordinate ascent algorithm. To specify the propensity scores for missing manifest variables and missing covariates, we adopt logistic and probit models, respectively, with conditionally conjugate mean-field variational Bayes for the logistic models. Additionally, we use Bayesian local influence to analyze the Ecological Momentary Assessment (EMA) dataset. Simulation studies and a practical application show that the variational Bayesian approach is accurate and substantially reduces the computational burden.

Keywords:

Dirichlet process; quantile nonlinear dynamic latent variable model; missing data; variational Bayesian method

MSC:

Primary 62F15; 62H25; Secondary 62P15

1. Introduction

Latent variable models are designed to estimate the relationships between observed variables and a hypothesized latent construct that is presumed to predict these variables. Dynamic latent variable models (DLVMs) extend this concept by accounting for unobserved heterogeneity through the explicit modeling of dependencies between observations and latent variables. This approach captures variable effects across different spatial and temporal dimensions, providing a nuanced understanding of intra-individual dynamics and interindividual differences within ecological momentary assessments, as elucidated by Diener et al. [] and Chow et al. []. Understanding the dynamic interplay and lagged dependencies among latent and manifest variables over time is pivotal, because incorporating these lagged relationships yields models that explain data patterns and trends more accurately. For this reason, dynamic latent variable models have attracted considerable attention recently, as demonstrated by the works of Zhang et al. [], Chow et al. [] and Tang et al. []. However, the existing literature primarily assumes normality of the measurement errors, an assumption that is especially sensitive to outliers and heavy-tailed distributions. To address this issue, a typical strategy is to relax the distributional constraints on the directly measurable variables and to focus on their conditional quantiles instead of the full distribution, as explored by Wang et al. [] and Tuerde et al. []. To our knowledge, research on quantile nonlinear dynamic latent variable models (QNDLVMs) is sparse. Existing approaches to these models rely on Markov chain Monte Carlo (MCMC) methods for inference, which can be computationally demanding and impractical for large datasets.

The cornerstone of empirical research is rooted in data collection and analysis, a process where missing values are an almost ubiquitous challenge. This issue spans across various fields, such as education, social sciences, and economics, due to factors like participants’ hesitancy to answer sensitive questions, accidental data loss, or withdrawal from studies (Little and Rubin []). Factor analysis with missing data has garnered significant interest from both theoretical researchers and practitioners. A multitude of methodologies have been developed for estimating strict latent variable models when data are incomplete. Banbura and Modugno [], along with Jungbacker et al. [], have explored likelihood estimation methods. For approximate factor models, Stock and Mark [] suggest using the most recent estimate of the common component to fill in missing values in covariates. Despite differences in execution, a prevalent strategy is to utilize estimates from a balanced panel as starting values, as highlighted by Stock and Watson []. Recently, Tuerde et al. [] developed a quantile nonlinear dynamic latent variable model to tackle missing data, but their approach is limited to continuous data and relies on MCMC techniques for inference. To overcome these constraints, we propose a variational Bayesian methodology capable of effectively managing nonignorable missing data.

Our work is motivated by data gathered through ecological momentary assessment (EMA) methods, as initially explored by Diener and colleagues in 1995. The study involved 174 university students (93 males and 81 females) with an average age of 20.24 years and a standard deviation of 1.41 years. Participants were instructed to provide daily EMA ratings for various emotions on a seven-point Likert scale, with 1 representing “None” and 7 representing “Always”, over a period of 52 days. The directly measurable variables consist of eight ordinal scales: joy, contentment, love, affection, unhappiness, anger, depression, and anxiety. The first four scales are used to measure a latent positive emotion (PE), and the final four scales measure a latent negative emotion (NE). The missingness was not negligible, since the discrete responses are intertwined with individual beliefs and ethical considerations; missing values made up about 11.2% of the entire dataset. This study seeks to investigate the evolving and nonlinear interactions between PE and NE, with a particular focus on how these dynamics are affected by prior levels of emotions with the opposite valence.

Our contributions are as follows: (i) We develop a deterministic variational Bayesian approach for nonlinear dynamic latent variable models (DLVMs) in the presence of a nonparametric prior and possibly nonignorable missing data; (ii) To handle nonignorable missing covariates and responses, we employ a set of one-dimensional logistic and probit models, which facilitates optimizing the evidence lower bound and supports statistical inference based on the corresponding posterior distributions; (iii) Our dynamic model assumes a Dirichlet process prior for the random effects, offering a flexible representation of the underlying processes; (iv) We extend Bayesian local influence analysis by conducting a sensitivity analysis across multiple model components.

This article is organized as follows. In Section 2, we describe the quantile regression model, explain the missing data mechanisms and the distributions of the missing covariates, and introduce the Dirichlet process. Section 3 introduces a variational Bayesian methodology for estimating the unknown parameters, random effects, and latent factor variables. Section 4 presents simulation studies that examine the finite sample performance of the proposed method. Section 5 illustrates the application of our methodology through a real-world example. Finally, we conclude with a brief discussion in Section 6.

2. Model

2.1. Quantile Nonlinear DLVM

A nonlinear dynamic latent variable model (DLVM) comprises two distinct submodels. The first submodel is a measurement model, which serves to elucidate the connections between latent factors and their corresponding variables that are directly observable. The second submodel is a dynamic model, applied to investigate the time-delayed interactions among latent variables. The measurement model that establishes the connection between latent variables and their associated variables that are directly observable is delineated as follows:

z_{i t} = x_{i t} ω + Λ ϕ_{i t} + ε_{i t}, t = 1, \dots, T_{i}, i = 1, \dots, n,

(1)

where

z_{i t} = {(z_{1 i t}, \dots, z_{J i t})}^{⊤}

is a

J \times 1

vector of continuous, directly observable variables. Concurrently,

x_{i t}

is a

J \times K

matrix of covariates that may have missing data. We also introduce

ϕ_{i t}

, a

q \times 1

vector that signifies hidden factors affecting our observations. The vector

ω

contains the K regression coefficients, and

Λ

acts as the

J \times q

matrix, capturing the interplay between our manifest and latent variables. Furthermore,

ε_{i t}

is a J-dimensional error term, which is uncorrelated with both

x_{i t}

and

ϕ_{i t}

. The sample size is denoted by n, and

T_{i}

represents the number of repeated measurements for each individual observation. In some cases, we might encounter a scenario where

T_{1} = \dots = T_{n}

, which results in a balanced design. To move forward without losing any generality, let us set

T_{1} = T_{2} = \dots = T_{n} = T

[]. In the traditional framework of the dynamic latent variable model (DLVM), it is commonly accepted that the error term

ε_{i t}

adheres to a multivariate normal distribution. If the normality assumption for

z_{i}

is violated, it may result in biased parameter estimates. Consequently, it is crucial to develop a more robust approach within the context of dynamic factor analysis. To this end, we consider the conditional quantile regression (QR) of

z_{j i t}

on

x_{j i t}

and

ϕ_{i t}

at a given quantile level

τ \in (0, 1)

:

Q_{τ} (z_{j i t} | x_{j i t}, ϕ_{i t}) = x_{j i t}^{⊤} ω_{τ} + Λ_{τ j}^{⊤} ϕ_{i t}, j = 1, \dots, J, i = 1, \dots, n, t = 1, \dots, T,

(2)

where the expression

Q_{τ} (z_{j i t} | x_{j i t}, ϕ_{i t}) = \inf {z : F (z | x_{j i t}, ϕ_{i t}) \geq τ}

represents the

τ

th conditional quantile of

z_{j i t}

given the covariates

x_{j i t}

and the latent factors

ϕ_{i t}

. Here,

F (z | x_{j i t}, ϕ_{i t})

denotes the conditional cumulative distribution function of

z_{j i t}

given the same covariates and latent factors. The vectors

x_{j i t}^{⊤}

and

Λ_{τ j}^{⊤}

correspond to the jth row of the matrices

x_{i t}

and

Λ_{τ}

, respectively. Additionally,

z_{j i t}

signifies the jth component of the vector

z_{i t}

, while

ω_{τ}

is a K-dimensional vector of regression coefficients, and

Λ_{τ}

is a

J \times q

loading matrix of the latent factor associated with the quantile level

τ

. It is important to note that both

ω_{τ}

and

Λ_{τ}

may exhibit variability across different quantiles, indicating that distinct values of

τ

are associated with varying coefficients and loading matrices of the latent factors.

Following Kozumi et al. [], the quantile regression model in Equation (2) can be re-expressed as the following hierarchical model:

\{\begin{matrix} z_{j i t} = x_{j i t}^{⊤} ω_{τ} + Λ_{τ j}^{⊤} ϕ_{i t} + ρ_{1} ν_{j i t} + \sqrt{γ_{j}^{- 1} ρ_{2} ν_{j i t}} l_{j i t}, \\ ν_{j i t} | γ_{j} \sim Exp (γ_{j}^{- 1}), \\ l_{j i t} \sim N (0, 1), j = 1, \dots, J, i = 1, \dots, n, t = 1, \dots, T, \end{matrix}

where the variable

ν_{j i t}

follows an exponential distribution characterized by the parameter

γ_{j}^{- 1}

, denoted as Exp(

γ_{j}^{- 1}

). The probability density function for

ν_{j i t}

given

γ_{j}

is expressed as

f (ν_{j i t} | γ_{j}) = γ_{j} exp (- γ_{j} ν_{j i t})

. Additionally, the variable

l_{j i t} \sim N (0, 1)

. Furthermore, the parameters

ρ_{1}

and

ρ_{2}

are defined as follows:

ρ_{1} = (1 - 2 τ) / {τ (1 - τ)}

and

ρ_{2} = 2 / {τ (1 - τ)}

.
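The mixture representation above can be checked numerically. The following is a minimal sketch, not the authors' code: it simulates only the error part ρ_1 ν + sqrt(ρ_2 ν / γ) l of the hierarchical model, with ν drawn from the exponential density γ exp(−γ ν), and reports the share of errors at or below zero, which should be close to τ if the representation reproduces a τ-th quantile of zero. The function names and the values of τ and γ are illustrative assumptions.

```python
import math
import random


def simulate_errors(tau, gamma, n, seed=0):
    """Draw n errors rho1*nu + sqrt(rho2*nu/gamma)*l from the mixture form."""
    rng = random.Random(seed)
    rho1 = (1 - 2 * tau) / (tau * (1 - tau))
    rho2 = 2 / (tau * (1 - tau))
    errors = []
    for _ in range(n):
        nu = rng.expovariate(gamma)   # density gamma * exp(-gamma * nu)
        l = rng.gauss(0.0, 1.0)       # standard normal draw
        errors.append(rho1 * nu + math.sqrt(rho2 * nu / gamma) * l)
    return errors


def share_nonpositive(errors):
    """Empirical probability that an error is at or below zero."""
    return sum(e <= 0 for e in errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    for tau in (0.25, 0.5, 0.75):
        errs = simulate_errors(tau=tau, gamma=2.0, n=100_000)
        print(tau, round(share_nonpositive(errs), 3))
```

Because the error term has its τ-th quantile at zero, the conditional τ-th quantile of z_{j i t} is the linear predictor in Equation (2), which is what makes the hierarchical form usable for Bayesian quantile regression.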

The dynamic model describing the relationships among the latent variables is:

ϕ_{i t} = f_{t} (ϕ_{i, t - 1}, μ_{i}, θ_{ϕ}) + ζ_{i t}, ζ_{i t} \overset{i . i . d}{\sim} N_{q} (0, Ψ_{ζ}),

(3)

In the model being proposed, the vector

μ_{i}

, with dimensions of

r \times 1

encapsulates the random influences that are constant over time and unique to each individual. As a

q \times 1

vector of functions that vary and are differentiable over time,

f_{t} (\cdot)

maps the interaction between the latent variables at time t and their levels one time unit earlier. The vector

θ_{ϕ}

consists of parameters that are invariant across both time and individuals, while

Ψ_{ζ}

denotes a

q \times q

covariance matrix associated with

ζ_{i t}

. In the standard dynamic models, it is a typical assumption that the

μ_{i}

are independent and identically follow a multivariate normal distribution. Nevertheless, in certain cases [], the presupposition of a normal distribution for the vector

μ_{i}

may not be appropriate. To address this issue, we propose that

μ_{i}

is drawn from an unknown distribution, specifically,

μ_{i} \sim G

. The notation

G

here is meant to represent a probability distribution that is random and without a fixed form.

In order to conduct Bayesian analysis on the parameters

μ

, we employ a Dirichlet process (DP) prior to approximate the distribution

G

. Specifically, we assume that

G \sim DP (κ G_{0})

, where

G_{0}

is the base distribution, which serves as an initial guess for the unknown distribution. The parameter

κ

is a positive constant that represents the weight (confidence) assigned by the user to the base distribution

G_{0}

. The choice of

G_{0}

is contingent upon the characteristics of the dataset under investigation.

In accordance with the methodologies established by Blei et al. [], we make use of the stick-breaking technique to simulate the DP prior. Specifically, we articulate

G \sim DP (κ G_{0})

as

G (\cdot) = \sum_{s = 1}^{\infty} π_{s} δ_{U_{s}} (\cdot), U_{s} \overset{i . i . d}{\sim} G_{0},

Here, we have the Dirac delta measure, denoted as

δ_{U_{s}} (\cdot)

, which is tightly focused on the vector

U_{s}

. This vector

U_{s}

is an

r \times 1

array that encapsulates the potential values of

μ_{i}

. The

π_{s}

values represent random probability weights that adhere to the constraints of being between 0 and 1, with their total summing to 1 across all s from 1 to infinity. The structure of

G

shows that it is a discrete mixture of point masses whose weights are the lengths of the broken sticks, with each point mass located at

U_{s}

. To facilitate the sampling of observations, we delve into the concept of a truncated Dirichlet Process (DP) for

G

:

P (\cdot) \approx \sum_{s = 1}^{S} π_{s} δ_{U_{s}} (\cdot), U_{s} \overset{i . i . d}{\sim} G_{0}, 1 \leq S < \infty,

(4)

where the

π_{s}

values are set by the stick-breaking strategy presented below:

π_{1} = v_{1}, π_{s} = v_{s} \prod_{l < s} (1 - v_{l}) for s = 2, \dots, S,

(5)

with

v_{s} \sim beta (1, b)

for

s = 1, \dots, S - 1

,

v_{S} = 1

and

\sum_{s = 1}^{S} π_{s} = 1

.
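The stick-breaking construction in Equation (5) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the truncation level S and the Beta parameter b below are hypothetical values, and the helper names are not from the paper.

```python
import random


def stick_breaking_weights(v):
    """Map stick fractions v_1..v_S (with v_S = 1) to weights pi_1..pi_S."""
    weights = []
    remaining = 1.0                    # length of the stick not yet broken off
    for v_s in v:
        weights.append(v_s * remaining)  # pi_s = v_s * prod_{l<s} (1 - v_l)
        remaining *= 1.0 - v_s
    return weights


def draw_truncated_weights(S, b, seed=0):
    """Draw v_s ~ Beta(1, b) for s < S, set v_S = 1, and return the weights."""
    rng = random.Random(seed)
    v = [rng.betavariate(1.0, b) for _ in range(S - 1)] + [1.0]
    return stick_breaking_weights(v)


if __name__ == "__main__":
    pi = draw_truncated_weights(S=10, b=2.0)   # hypothetical S and b
    print([round(p, 3) for p in pi])
```

Setting v_S = 1 assigns all of the remaining stick to the last component, which is why the truncated weights sum to one.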

Given the presence of some complex posterior distributions, optimizing the posterior distribution of

μ_{i}

using the previously defined DP prior through a variational Bayesian approach is not feasible. To tackle this challenge, we define

μ_{i}

in relation to a latent variable

L_{i} \in {1, \dots, S}

, which tracks the cluster affiliation of each

μ_{i}

and correlates with

μ_{i}

through

μ_{i} = U_{L_{i}}

. Consequently, Equation (4) can be reformulated as:

L_{i} | π \overset{i . i . d}{\sim} \sum_{s = 1}^{S} π_{s} δ_{U s} (\cdot),

(6)

where

π = {(π_{1}, \dots, π_{S})}^{⊤}

.

2.2. Mechanism of Missing Data

In this study, we acknowledge that both

z_{i t}

and

x_{j i t}

may have some missing values in their data. We can break down

z_{i t}

into two parts: the observed data,

z_{i t, obs}

, and the missing data,

z_{i t, mis}

. Similarly, for

x_{j i t}

, we have the observed portion,

x_{j i t, obs}

, and the missing portion,

x_{j i t, mis}

. This distinction allows us to clearly identify which data points we have and which ones are still elusive. Let

r_{i t}^{z} = {(r_{1 i t}^{z}, \dots, r_{J i t}^{z})}^{⊤}

be a vector of missing indicators for

z_{i t}

which has dimensions

J \times 1

, i.e.,

r_{k i t}^{z} = 0

if

z_{k i t}

is missing and

r_{k i t}^{z} = 1

if

z_{k i t}

is observed for

k = 1, \dots, J

. Likewise, we establish

r_{j i t}^{x} = {(r_{j i t, 1}^{x}, \dots, r_{j i t, n_{k}}^{x})}^{⊤}

to represent the missing data indicator for

x_{j i t}

, with

r_{j i t, k}^{x} = 0

indicating missing data for

x_{j i t, k}

, and

r_{j i t, k}^{x} = 1

indicating observed data. To streamline notation, we define

f (r_{i t}^{z} | z_{i t}, χ)

and

f (r_{j i t}^{x} | x_{j i t}, ξ)

as the probability density functions for

r_{i t}^{z}

and

r_{j i t}^{x}

, respectively. Here,

χ

and

ξ

represent the parameter vectors linked to the missing data indicators

r_{i t}^{z}

and

r_{j i t}^{x}

.

Given that

r_{g i t}^{z}

is independent of

r_{h i t}^{z}

for

g \neq h

, owing to the binary nature of

r_{k i t}^{z}

, we consider the following model for

f (r_{i t}^{z} | z_{i t}, χ)

:

f (r_{i t}^{z} | z_{i t}, χ) = \prod_{k = 1}^{J} {\Pr (r_{k i t}^{z} = 1 | z_{i t}, χ)}^{r_{k i t}^{z}} {1 - \Pr (r_{k i t}^{z} = 1 | z_{i t}, χ)}^{1 - r_{k i t}^{z}},

where the probability

\Pr (r_{k i t}^{z} = 1 | z_{i t}, χ)

is contingent upon the missing entries in

z_{i t}

, signifying that the mechanism of data missingness under consideration is not random, that is, it is nonignorable. Drawing parallels with the work of Tuerde et al. [], through the use of a collection of univariate dimensional conditional probability distributions, we can detail the mechanism underlying the patterns of missing data within the model for

f (r_{j i t}^{x} | x_{j i t}, ξ)

. That is,

\begin{matrix} f (r_{j i t}^{x} | x_{j i t}, ξ) & = & f (r_{j i t, n_{k}}^{x} | r_{j i t (n_{k})}^{x}, x_{j i t (n_{k})}, ξ_{n_{k}}) f (r_{j i t, n_{k} - 1}^{x} | r_{j i t (n_{k} - 1)}^{x}, x_{j i t (n_{k} - 1)}, ξ_{n_{k} - 1}) \\ \times \dots \times f (r_{j i t, 2}^{x} | r_{j i t (2)}^{x}, x_{j i t (2)}, ξ_{2}) f (r_{j i t, 1}^{x} | x_{j i t, 1}, ξ_{1}), \end{matrix}

(7)

where

ξ_{k}

represents an unknown parameter vector linked to the conditional distribution of

r_{j i t, k}^{x}

, conditioned on the set

{r_{j i t (k)}^{x}, x_{j i t (k)}}

, where

r_{j i t (k)}^{x} = {r_{j i t, 1}^{x}, \dots, r_{j i t, k - 1}^{x}}

and

x_{j i t (k)} = {x_{j i t, 1}, \dots, x_{j i t, k}}

. Here,

x_{j i t, k}

denotes the k-th element of

x_{j i t}

for

k = 1, \dots, n_{k}

, and

ξ = {ξ_{1}, \dots, ξ_{n_{k}}}

. Considering

r_{j i t, k}^{x}

as a binary indicator, it follows that:

\begin{matrix} f (r_{j i t, k}^{x} | r_{j i t (k)}^{x}, x_{j i t (k)}, ξ_{k}) & = & {\Pr (r_{j i t, k}^{x} = 1 | r_{j i t (k)}^{x}, x_{j i t (k)}, ξ_{k})}^{r_{j i t, k}^{x}} \\ \times {1 - \Pr (r_{j i t, k}^{x} = 1 | r_{j i t (k)}^{x}, x_{j i t (k)}, ξ_{k})}^{1 - r_{j i t, k}^{x}}, \end{matrix}

Here, the probability

\Pr (r_{j i t, k}^{x} = 1 | r_{j i t (k)}^{x}, x_{j i t (k)}, ξ_{k})

is influenced by the missing entries within

x_{j i t}

, suggesting that the data’s missingness mechanism being examined is not ignorable.

In line with the research conducted by Lee and Tang [], the probability

\Pr (r_{k i t}^{z} = 1 | z_{i t}, χ)

can be specified as

logit {\Pr (r_{k i t}^{z} = 1 | z_{i t}, χ)} = χ_{z 0} + χ_{z 1} z_{k i t} + χ_{z 2} z_{k i, t - 1}, k = 1, \dots, J,

where

χ = {(χ_{z 0}, χ_{z 1}, χ_{z 2})}^{⊤}

is a vector of the regression coefficients that are independent of both time and individual factors. In certain scenarios, it is plausible to account for a time-varying mechanism affecting data missingness. Consequently, an analogous model can be applied to determine

\Pr (r_{j i t, k}^{x} = 1 | r_{j i t (k)}^{x}, x_{j i t (k)}, η_{k})

; following Tuerde et al. [], we define

\Pr (r_{j i t, k}^{x} = 1 | r_{j i t (k)}^{x}, x_{j i t (k)}, η_{k})

using the probit regression model:

Φ^{- 1} {\Pr (r_{j i t, k}^{x} = 1 | r_{j i t (k)}^{x}, x_{j i t (k)}, η_{k})} = f_{j i t, k}^{⊤} η_{k},

(8)

where

f_{j i t, k} = {(1, x_{j i t, 1}, \dots, x_{j i t, k}, r_{j i t, 1}^{x}, \dots, r_{j i t, k - 1}^{x})}^{⊤}

; here,

Φ^{- 1} (\cdot)

denotes the inverse of the standard normal cumulative distribution function. To obtain a normal linear regression representation, we introduce latent variables. Specifically, by defining the latent variables

ς_{j i t, k}

,

(j = 1, \dots, J, i = 1, \dots, n, t = 1, \dots, T, k = 1, \dots, n_{k})

, model (8) can be rewritten as

r_{j i t, k}^{x} = \{\begin{matrix} 1 & if ς_{j i t, k} > 0, \\ 0 & if ς_{j i t, k} \leq 0, \end{matrix}

(9)

where

ς_{j i t, k} = f_{j i t, k}^{⊤} η_{k} + ϵ_{j i t, k}, ϵ_{j i t, k} \sim N (0, 1)

.

Drawing from Equation (9), the conditional probability density function of

r_{j i t, k}^{x}

given

{r_{j i t (k)}^{x}, x_{j i t (k)}}

can be characterized by the expression

\begin{matrix} f (r_{j i t, k}^{x} | r_{j i t (k)}^{x}, x_{j i t (k)}, η_{k}) & = & \int f (r_{j i t, k}^{x}, ς_{j i t, k} | r_{j i t (k)}^{x}, x_{j i t (k)}, η_{k}) d ς_{j i t, k} \\ = & \int f (ς_{j i t, k} | r_{j i t (k)}^{x}, x_{j i t (k)}, η_{k}) {I (r_{j i t, k}^{x} = 1) I (ς_{j i t, k} > 0) \\ + I (r_{j i t, k}^{x} = 0) I (ς_{j i t, k} \leq 0)} d ς_{j i t, k}, \end{matrix}

where

f (ς_{j i t, k} | r_{j i t (k)}^{x}, x_{j i t (k)}, η_{k}) \sim N (f_{j i t, k}^{⊤} η_{k}, 1)

.
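The latent-variable form of the probit model in Equations (8) and (9) can be illustrated with a small simulation. This is a sketch under assumptions, not the authors' code: the design vector f and the coefficient vector eta below are hypothetical, and the check only confirms that thresholding the latent normal variable at zero reproduces the probit probability Φ(f^⊤ η).

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_indicator(f, eta, rng):
    """Draw r = I(varsigma > 0), where varsigma = f'eta + N(0, 1) noise."""
    linear = sum(a * b for a, b in zip(f, eta))
    varsigma = linear + rng.gauss(0.0, 1.0)
    return 1 if varsigma > 0 else 0


def empirical_rate(f, eta, n, seed=0):
    """Share of simulated indicators equal to 1 (covariate observed)."""
    rng = random.Random(seed)
    return sum(simulate_indicator(f, eta, rng) for _ in range(n)) / n


if __name__ == "__main__":
    f = [1.0, 0.8, 1.0]      # hypothetical (1, x_1, r_1)
    eta = [-0.3, 0.5, 0.4]   # hypothetical probit coefficients
    linear = sum(a * b for a, b in zip(f, eta))
    print(empirical_rate(f, eta, n=100_000), normal_cdf(linear))
```

Introducing the latent variable is what gives the probit model a conditionally normal structure, so that the corresponding variational update can be handled like a normal linear regression.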

2.3. The Missing Covariates Distribution

Adhering to the methodology of Tuerde et al. [], the joint density function

f (x_{j i t} | β, δ)

can be written as a product of one-dimensional conditional distributions:

f (x_{j i t} | β, δ) = f (x_{j i t, n_{K}} | x_{j i t, 1}, \dots, x_{j i t, n_{K} - 1}, β_{n_{K}}, δ_{n_{K}}) \dots f (x_{j i t, 2} | x_{j i t, 1}, β_{2}, δ_{2}) f (x_{j i t, 1} | β_{1}, δ_{1}),

(10)

where

β_{k}

and

δ_{k}

are unknown parameters associated with the k-th conditional distribution

f (x_{j i t, k} | x_{j i t, 1}

,

\dots, x_{j i t, k - 1}, β_{k}, δ_{k})

,

β = {β_{1}, \dots, β_{K}}

, and

δ = {δ_{1}, \dots, δ_{K}}

. Equation (10) demonstrates flexibility in defining the distribution of covariates with missing data, irrespective of whether they are discrete random variables or continuous random variables. To allow for a broader range of distributions, it is posited that the missing covariate

x_{j i t, k} (j = 1, \dots, J, i = 1, \dots, n, t = 1, \dots, T, k = 1, \dots, n_{K})

follows a distribution within the exponential family:

f (x_{j i t, k} | x_{j i t, 1}, \dots, x_{j i t, k - 1}, β_{k}, δ_{k}) = exp \{\frac{x_{j i t, k} ϑ_{j i t, k} - b_{k} (ϑ_{j i t, k})}{a_{k} (δ_{k})} - h_{k} (x_{j i t, k}, δ_{k})\},

(11)

where the functions

a_{k} (\cdot) > 0

are strictly positive, while

b_{k} (\cdot)

and

h_{k} (\cdot)

are predetermined functions. The probability density function of

x_{j i t, k}

, as given in Equation (11), includes special cases such as the binomial, Gaussian, and gamma distributions. We consider the following model for

μ_{j i t, k}^{x}

:

ξ_{j i t, k} = g (μ_{j i t, k}^{x}) = β_{k 0} + β_{k 1} x_{j i t, 1} + \dots + β_{k, k - 1} x_{j i t, k - 1},

where

g (\cdot)

is a known link function that is strictly increasing and differentiable, and

β_{k} = {(β_{k 0}, β_{k 1}, \dots, β_{k, k - 1})}^{⊤}

.
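To make the exponential family form in Equation (11) concrete, the following sketch checks that a Gaussian density fits it. The mapping used here is an assumption for illustration and is not stated in the text: the natural parameter ϑ is taken to be the mean, a(δ) the variance, b(ϑ) = ϑ²/2, and h(x, δ) = x²/(2δ) + ½ log(2πδ), with the sign convention of Equation (11).

```python
import math


def exp_family_density(x, theta, a, b, h):
    """Evaluate exp{(x*theta - b(theta)) / a - h(x)} as in Equation (11)."""
    return math.exp((x * theta - b(theta)) / a - h(x))


def gaussian_as_exp_family(x, mean, variance):
    """Gaussian density written in the form of Equation (11).

    Assumed mapping: theta = mean, a = variance, b(theta) = theta^2 / 2,
    h(x) = x^2 / (2 * variance) + 0.5 * log(2 * pi * variance).
    """
    def b(theta):
        return 0.5 * theta ** 2

    def h(value):
        return value ** 2 / (2.0 * variance) + 0.5 * math.log(2.0 * math.pi * variance)

    return exp_family_density(x, mean, variance, b, h)


def normal_pdf(x, mean, variance):
    """Textbook Gaussian density, used only as a reference value."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


if __name__ == "__main__":
    print(gaussian_as_exp_family(0.7, 1.2, 2.5), normal_pdf(0.7, 1.2, 2.5))
```

With the identity link, the linear predictor in the model above would then describe the conditional mean of a continuous covariate given the preceding covariates.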

3. Variational Bayesian Inference

Variational Bayes

For notational convenience, we define

Ξ = {Ξ_{z}, Ξ_{ϕ}, Ψ_{ζ}, b, κ, γ, χ, ξ, β, ν, η, μ_{μ}, Ψ_{μ}, ς, L}

,

r = {r_{z}, r_{x}}

, and

D_{obs} = {z_{obs}, x_{obs}, S}

, in which

Ξ_{z}

encompasses all the parameters that are unknown and related to

ω_{τ}

,

Λ_{τ}

and

γ = {γ_{1}, \dots, γ_{p}}

,

r_{x} = {r_{m i t}^{x} : m = 1, \dots, M, i = 1, \dots, n, t = 1, \dots, T}

,

r_{z} = {r_{i t}^{z} : i = 1, \dots, n, t = 1, \dots, T}

,

z_{obs} = {z_{i t, obs} : i = 1, \dots, n, t = 1, \dots, T}

,

x_{obs} = {x_{m i t, obs} : m = 1, \dots, M, i = 1, \dots, n, t = 1, \dots, T}

,

S = {S_{i t} : i = 1, \dots, n, t = 1, \dots, T}

,

L = {L_{i} : i = 1, \dots, n}

,

ς = {ς_{m i t, k} : m = 1, \dots, M, i = 1, \dots, n, t = 1, \dots, T, k = 1, \dots, n_{K}}

,

D = {x, z}

,

D_{o b s} = {x_{o b s}, z_{o b s}}

; in light of the assumptions stated above, the joint posterior density of

Ξ

given r and

D_{obs}

takes the following form

\begin{matrix} f (Ξ | r, D_{obs}) & \propto & \{\prod_{i = 1}^{n} \prod_{t = 1}^{T} \int f (z_{i t} | x_{i t}, ϕ_{i t}, ν_{i t}, Ξ_{z}) f (ν_{i t} | γ) f (ϕ_{i t} | μ_{i}, Ξ_{ϕ}, Ψ_{ζ}) f (μ_{i} | Ξ_{μ}) f (Ψ_{i} | Ξ_{μ}) \\ f (x_{i t, mis} | γ, ϕ) f (r_{i t}^{z} | z_{i t}, χ) f (r_{i t}^{x} | x_{i t}, ς, ξ) f (ς | x, r_{x}, ψ) f (χ | z_{o b s, r_{i t}^{z}}) f (ξ | z_{o b s, r_{i t}^{z}}) \\ f (κ | ν) f (L | π) d z_{i t, mis} d x_{i t, mis} d ϕ_{i t} d ν_{i t} d μ_{i} d ξ d χ d ς\} f (Ξ), \end{matrix}

where

x_{i t, mis} = {x_{m i t, mis} : m = 1, \dots, M, i = 1, \dots, n, t = 1, \dots, T}

,

ν_{i t} = {ν_{m i t} : m = 1, \dots, M, i = 1, \dots, n, t = 1, \dots, T}

,

r_{i t}^{x} = {r_{m i t}^{x} : m = 1, \dots, M, i = 1, \dots, n, t = 1, \dots, T}

,

Ξ_{μ}

is a parameter set related to the distribution of

μ_{i}

,

f (Ξ)

denotes the prior distribution of

Ξ

. Obtaining a closed-form expression for

f (Ξ | r, D_{obs})

is intractable because of the high-dimensional integrals involved, which makes direct Bayesian inference on

Ξ

based on

f (Ξ | r, D_{obs})

difficult. To address this issue, we augment the observed data

{r, D_{obs}}

with the latent quantities

{z_{mis}, x_{mis}, Ξ_{ϕ}, Ψ_{ζ}, b, κ, ν, χ, ξ, β, ϕ, μ_{μ}, Ψ_{μ}, ς, L}

in the Bayesian analysis, which facilitates approximating the posterior density

f (Ξ | D_{obs}, r)

without evaluating the high-dimensional integrals.

According to the principles of variational inference, the initial step involves creating a variational family

Ω

of probability density functions for the random variable ℜ, which is designed to have the same support as the posterior distribution

g (ℜ | D)

, where

ℜ = {Ξ_{z}, Ξ_{ϕ}, Ψ_{ζ}, b, η, γ, χ, ξ, ϕ, μ_{μ}, Ψ_{μ}, ν, ς, L}

. Let

d (ℜ)

, an element of

Ω

, be a variational density for approximating

g (ℜ | D)

. The variational Bayes strategy aims to identify the most fitting approximation to

g (ℜ | D)

by minimizing the Kullback–Leibler divergence between

d (ℜ)

and

g (ℜ | D)

, tackling the optimization problem:

d^{*} (ℜ) = \underset{d (ℜ) \in Ω}{arg min} K L (d (ℜ) ∥ g (ℜ | D)),

where

\begin{matrix} K L (d (ℜ) ∥ g (ℜ | D)) & = & \int log \{\frac{d (ℜ)}{g (ℜ | D)}\} d (ℜ) d ℜ \\ & = & \int log \{\frac{d (ℜ) g (z | x)}{g (ℜ, z | x)}\} d (ℜ) d ℜ \\ & = & E_{d (ℜ)} {log d (ℜ)} - E_{d (ℜ)} {log g (ℜ, z | x)} + log g (z | x) \geq 0, \end{matrix}

where the term

E_{d (ℜ)} (\cdot)

in this case refers to the expected value relative to

d (ℜ)

. The Kullback–Leibler divergence

K L (d (ℜ) ∥ g (ℜ | D))

equals zero if, and only if,

d (ℜ)

is equal to

g (ℜ | D)

. Solving this optimization problem directly is very difficult because the divergence involves a complex high-dimensional integral.

Alternatively, defining

L {d (ℜ)} = E_{d (ℜ)} {log g (ℜ, z | x)} - E_{d (ℜ)} {log d (ℜ)}

, it follows that

log g (z | x) = K L (d (ℜ) ∥ g (ℜ | D)) + L {d (ℜ)} \geq L {d (ℜ)} .

Therefore,

L {d (ℜ)}

is a lower bound for

log g (z | x)

, commonly known as the Evidence Lower Bound (ELB) (see Appendix A). Following this, minimizing the Kullback–Leibler divergence

K L (d (ℜ) ∥ g (ℜ | D))

is the same as maximizing

L {d (ℜ)}

, because

log g (z | x)

does not depend on the variational density. That is,

d^{*} = \underset{d (ℜ) \in Ω}{arg min} K L (d (ℜ) ∥ g (ℜ | D)) = \underset{d (ℜ) \in Ω}{arg max} L {d (ℜ)} .
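The decomposition of the log evidence into the Kullback–Leibler divergence plus the evidence lower bound can be verified on a toy problem. The sketch below is an illustration under assumptions, unrelated to the QNDLVM itself: the latent quantity takes three discrete values, and the joint probabilities and the variational density are hypothetical numbers.

```python
import math


def elbo(d, joint):
    """E_d[log joint] - E_d[log d] for a discrete latent variable."""
    return sum(q * (math.log(p) - math.log(q)) for q, p in zip(d, joint))


def kl_to_posterior(d, joint):
    """KL(d || posterior), where posterior = joint / evidence."""
    evidence = sum(joint)
    return sum(q * math.log(q / (p / evidence)) for q, p in zip(d, joint))


if __name__ == "__main__":
    joint = [0.10, 0.25, 0.05]   # hypothetical joint values at the observed data
    d = [0.2, 0.5, 0.3]          # an arbitrary variational density
    log_evidence = math.log(sum(joint))
    print(log_evidence, kl_to_posterior(d, joint) + elbo(d, joint))
```

Since the log evidence is fixed, any increase in the lower bound must come from an equal decrease in the divergence, which is why the two optimization problems share the same solution.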

The goal of identifying the closest approximation to

g (ℜ | D)

becomes the problem of maximizing

L {d (ℜ)}

within the variational family

Ω

. When the variational family

Ω

is complex, this optimization task is particularly challenging, so it is more reasonable to optimize over a simpler variational family

Ω

.

To develop a straightforward variational set based on prevalent techniques, we define

Ω

as the mean-field variational family, in which the components are mutually independent and each is governed by its own factor of the variational density. Thus, the variational density

d (ℜ)

takes the form:

\begin{matrix} d (ℜ) = d (z_{m i s}, Λ_{z}, ν, γ, ξ, χ, b, μ_{u}, Φ_{u}, η, δ, Φ_{ζ}, ς, L) & = d (z_{m i s}) d (Λ_{z}) d (ν) d (γ) d (χ) d (ξ) d (η) d (b) d (μ_{u}) \\ \times d (Φ_{u}) d (Φ_{ζ}) d (ς) d (L) \equiv \prod_{m = 1}^{M} d_{m} (η_{m}), \end{matrix}

(12)

where the forms of the individual factors

d_{m} (η_{m})

are not specified in advance, although the factorization across components is assumed. Consistent with existing procedures in the variational literature, the best possible solutions for

d_{m} (η_{m})

can be obtained by maximizing

L {d (ℜ)}

through the method of coordinate ascent.

Following the approach of the coordinate ascent method as referenced in [,,], when keeping fixed the other variational factors

d_{l} (ϑ_{l})

for

l \neq s

, i.e.,

ϑ_{- s} = {ϑ_{l} : l \neq s, l = 1, \dots, J}

, the optimal variational density

d_{s}^{*} (ϑ_{s})

maximizing

L {d (ℜ)}

in terms of

d_{s} (ϑ_{s})

is given by the following form:

\begin{matrix} d_{s}^{*} (ϑ_{s}) & \propto exp [E_{- s} {log g (ϑ_{s} | ϑ_{- s}, D)}] \\ \propto exp [E_{- s} {log g (z, Ξ | x)}], \end{matrix}

(13)

where the function

g (ϑ_{s} | ϑ_{- s}, D)

denotes the conditional distribution of

ϑ_{s}

given

(ϑ_{- s}, D)

, and

E_{- s} (\cdot)

signifies the expected value with respect to

d_{- s} (ϑ_{- s})

. Equation (13) shows that

E_{- s} (\cdot)

does not involve the sth variational factor

d_{s} (ϑ_{s})

, and that the optimal variational density

d_{s}^{*} (ϑ_{s})

cannot be obtained in a single step because the factors

d_{- s} (ϑ_{- s})

on the right-hand side are themselves not yet optimal. To resolve this issue, the coordinate ascent procedure is applied iteratively, updating each

d_{s}^{*} (ϑ_{s})

in turn according to Equation (13). Upon reaching convergence, either the mean or the mode of the optimal variational density

d_{s}^{*} (ϑ_{s})

is used as the variational Bayesian estimate of the corresponding parameter vector.
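The coordinate ascent update in Equation (13) can be illustrated on a textbook toy problem rather than on the QNDLVM. The sketch below assumes a bivariate Gaussian target with known mean and precision matrix, for which each mean-field factor is Gaussian and the update of one factor depends on the current mean of the other. The function name and the numerical values are hypothetical.

```python
def cavi_bivariate_gaussian(mu, precision, iters=50):
    """Mean-field coordinate ascent for a bivariate Gaussian target.

    mu        -- target mean (mu_1, mu_2)
    precision -- target precision matrix ((a, c), (c, b)), positive definite
    Returns ((m1, v1), (m2, v2)): mean and variance of each Gaussian factor.
    """
    (a, c), (_, b) = precision
    m1, m2 = 0.0, 0.0                            # arbitrary starting values
    for _ in range(iters):
        # Update factor 1 holding factor 2 fixed, then the reverse.
        m1 = mu[0] - (c / a) * (m2 - mu[1])
        m2 = mu[1] - (c / b) * (m1 - mu[0])
    # The factor variances are fixed by the diagonal of the precision matrix.
    return (m1, 1.0 / a), (m2, 1.0 / b)


if __name__ == "__main__":
    q1, q2 = cavi_bivariate_gaussian((1.0, -2.0), ((2.0, 0.8), (0.8, 1.0)))
    print(q1, q2)
```

In this toy case the factor means converge to the target mean, while the factor variances are smaller than the true marginal variances, a well-known property of mean-field approximations that is worth keeping in mind when interpreting variational posterior spreads.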

From Equation (13), it is straightforward to deduce that the optimal density

d^{*} (b)

assumes the form:

d^{*} (b) = Beta (γ_{s, 1}, γ_{s, 2}), s = 1, \dots, S - 1,

where

\{\begin{matrix} γ_{s, 1} = 1 + \sum_{i = 1}^{n} ϕ_{i, s}^{*}, \\ γ_{s, 2} = E_{- V_{s}} [ν] + \sum_{i = 1}^{n} \sum_{j = s + 1}^{S} ϕ_{i, j}^{*}, \end{matrix}

The optimal density

d^{*} (μ_{μ})

is the multivariate normal distribution

d^{*} (μ_{μ}) = N (μ_{1}, Σ_{1}),

where

\begin{matrix} μ_{1} = Σ_{1} [Σ_{0 μ}^{- 1} μ_{0 μ}] + E_{- μ_{μ}} [H_{0 μ}] \sum_{j = 1}^{d} \sum_{s = 1}^{S} I (s = L_{j}^{*}) E_{- μ_{μ}} [μ_{L_{j}^{*}}], \\ Σ_{1} = {[S E_{- μ_{μ}} [H_{0 μ}] + H_{0 μ}^{- 1}]}^{- 1}, \end{matrix}

The optimal density

d^{*} (Ψ_{μ})

, where

Ψ_{μ} = diag (Φ_{1}, \dots, Φ_{S})

, is given componentwise by

d^{*} (Φ_{l}^{- 1}) = Γ (w_{1 l}, w_{2 l}), l = 1, \dots, S,

\begin{matrix} w_{1 l} = c_{1} + \frac{S}{2}, \\ w_{2 l} = c_{2} + \frac{1}{2} \sum_{s = 1}^{S} E_{- Ψ_{μ}} [{(Z_{l} - μ_{μ_{l}})}^{2}], \end{matrix}

where

μ_{s l}

is the l-th element associated with mass point s in the vector

μ

, and

μ_{μ_{l}}

is the l-th element of the vector

μ_{z}

.

The optimal density of

L

is a multinomial distribution

d^{*} (L_{i}) = Multinomial (ϕ_{i, s}^{*}),

\{\begin{matrix} ϕ_{i, s} \propto exp (S_{s}), \\ S_{s} = E_{- L_{i}} ln V_{s} + \sum_{s = 1}^{S - 1} E_{- L_{i}} ln (1 - V_{s}) + \sum_{i = 1}^{n} \sum_{t = 1}^{T} E_{- L_{i}} [{(ϕ_{i t} - B_{i} ϕ_{i, t - 1})}^{⊤}, \\ Φ_{ζ}^{- 1} (ϕ_{i t} - B_{i} ϕ_{i, t - 1}) + ln Φ_{ζ}] - E_{- L_{i}} [ln | Φ_{ζ} |], \end{matrix}

where

ϕ_{i, s}^{*} = \frac{ϕ_{i, s}}{\sum_{s = 1}^{S} ϕ_{i, s}}

,

B_{i} = [\begin{matrix} μ_{11 i} & μ_{12 i} \\ μ_{21 i} & μ_{22 i} \end{matrix}]

,

μ_{i} = {(μ_{i 11}, μ_{i 22}, μ_{i 12}, μ_{i 21})}^{⊤}

,

s = 1, \dots, S

,

i = 1, \dots, n .

E_{- L_{i}} [ln V_{s}] = Ψ (γ_{s, 1}) - Ψ (γ_{s, 1} + γ_{s, 2}),

E_{- L_{i}} [ln (1 - V_{s})] = Ψ (γ_{s, 2}) - Ψ (γ_{s, 1} + γ_{s, 2}),

where

Ψ

is the digamma function.
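The two Beta expectations above and the normalization of the weights can be sketched as follows. This is a minimal standard-library illustration: `digamma` is a hand-written approximation (recurrence plus an asymptotic series), and the function names and numerical inputs are assumptions made for the example, not part of the paper.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift the argument upward: psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def expected_log_sticks(g1, g2):
    """E[ln V] and E[ln(1 - V)] for V ~ Beta(g1, g2)."""
    total = digamma(g1 + g2)
    return digamma(g1) - total, digamma(g2) - total


def normalize(scores):
    """Map unnormalized log-scores S_s to weights that sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


e_log_v, e_log_1mv = expected_log_sticks(3.0, 2.0)
phi_star = normalize([-1200.0, -1201.0, -1205.0])
```

Working on the log scale and subtracting the maximum before exponentiating avoids underflow when the scores are large and negative, which is common when they accumulate over many time points.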

Let

λ_{1}^{*}, \dots, λ_{d}^{*}

be the d unique

λ_{i}

values (i.e., the distinct cluster labels),

U^{λ} = (U_{λ_{1}^{*}}, \dots, U_{λ_{d}^{*}})

, and let

U^{[λ]}

be components in

U = (U_{1}, \dots, U_{S})

other than

U^{λ}

. Then, the optimal density

d^{*} (U^{[λ]})

has the form:

d^{*} (U^{[λ]}) = N (μ_{U}, Φ_{U}),

The optimal density of each of the elements

U_{λ_{s}^{*}}

of

U^{λ} = (U_{λ_{1}^{*}}, \dots, U_{λ_{d}^{*}})

is

d^{*} (U_{λ_{s}^{*}}) \propto exp {E_{- U_{λ_{s}^{*}}} [P (U_{λ_{s}^{*}} | μ_{u}, Φ_{u}) \prod_{i : λ_{i} = λ_{s}^{*}} p (ϕ_{i} | μ_{i} = U_{λ_{s}^{*}}, θ_{ϕ})]},

where

p (ϕ_{i} | μ_{i} = U_{λ_{s}^{*}}, θ_{ϕ})

is

\{\begin{matrix} p (ϕ_{i 0}) \prod_{t = 1}^{T} p (ϕ_{i t} | ϕ_{i, t - 1}, μ_{i} = U_{λ_{s}^{*}}, θ_{ϕ}), if ϕ_{i 0} random \\ \prod_{t = 1}^{T} p (ϕ_{i t} | ϕ_{i, t - 1}, μ_{i} = U_{λ_{s}^{*}}, θ_{ϕ}), other . \end{matrix}

(14)

Because

\prod_{t = 1}^{T} p (ϕ_{i t} | ϕ_{i, t - 1}, U_{i} = U_{λ_{s}^{*}}, θ_{ϕ})

may be a differentiable linear or nonlinear function, this density is difficult to optimize in general. To simplify the optimization, we sometimes relax these conditions.

Let

B_{i} = (\begin{matrix} μ_{11 i} & μ_{12 i} \\ μ_{21 i} & μ_{22 i} \end{matrix})

, and let the dynamic model have the form

ϕ_{i t} = B_{i} ϕ_{i, t - 1} + ζ_{i t}, \quad ζ_{i t} \sim N (0, Ψ_{ζ}),

(15)

Under model (15), the transition density in (14) is

p (ϕ_{i t} | ϕ_{i, t - 1}, μ_{i} = U_{λ_{s}^{*}}, θ_{ϕ}) = N (U_{λ_{s}^{*}}^{*} ϕ_{i, t - 1}, Ψ_{ζ})

, where

U_{λ_{s}^{*}}^{*} = (\begin{matrix} μ_{11, λ_{s}^{*}} & μ_{12, λ_{s}^{*}} \\ μ_{21, λ_{s}^{*}} & μ_{22, λ_{s}^{*}} \end{matrix}),

and we consider the optimal densities of the components of

U_{λ_{s}^{*}} = (u_{11, λ_{s}^{*}}, u_{12, λ_{s}^{*}}, u_{21, λ_{s}^{*}}, u_{22, λ_{s}^{*}})

.

The optimal density of component

u_{11, λ_{s}^{*}}

has the form

d^{*} (u_{11, λ_{s}^{*}}) = N (μ_{u_{11, λ_{s}^{*}}}, σ_{u_{11, λ_{s}^{*}}}^{2}),

where

\begin{matrix} μ_{u_{11, λ_{s}^{*}}} = & σ_{u_{11, λ_{s}^{*}}}^{2} (E_{- u_{11, λ_{s}^{*}}} [μ_{u, 1} Φ_{u, 1}^{- 1}] + \sum_{i : λ_{i} = λ_{s}^{*}} \sum_{t = 1}^{T} E_{- u_{11, λ_{s}^{*}}} [ϕ_{i, t - 1, 1} (ϕ_{i t, 1} s_{11} + ϕ_{i t, 2} s_{21} - u_{12, λ_{s}^{*}} ϕ_{i, t - 1, 2} s_{11} \\ - (u_{21, λ_{s}^{*}} ϕ_{i, t - 1, 1} + u_{22, λ_{s}^{*}} ϕ_{i, t - 1, 2}) s_{21})]), \end{matrix}

σ_{u_{11, λ_{s}^{*}}}^{2} = {(E_{- u_{11, λ_{s}^{*}}} [Φ_{u, 1}^{- 1} + s_{11} \sum_{i : λ_{i} = λ_{s}^{*}} \sum_{t = 1}^{T} ϕ_{i, t - 1, 1}^{2}])}^{- 1}

where

μ_{u, 1}

is the component of the mean vector

μ_{u}

,

Φ_{u, 1}^{- 1}

is the diagonal component of covariance

Φ_{u}

,

ϕ_{i t} = {(ϕ_{i t, 1}, ϕ_{i t, 2})}^{⊤}

,

ϕ_{i, t - 1} = {(ϕ_{i, t - 1, 1}, ϕ_{i, t - 1, 2})}^{⊤}

,

E_{Ψ} [Ψ_{ζ}^{- 1}] = [\begin{matrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{matrix}]

.

The optimal density of component

u_{12, λ_{s}^{*}}

has the form

d^{*} (u_{12, λ_{s}^{*}}) = N (μ_{u_{12, λ_{s}^{*}}}, σ_{u_{12, λ_{s}^{*}}}^{2}),

where

\begin{matrix} μ_{u_{12, λ_{s}^{*}}} = & σ_{u_{12, λ_{s}^{*}}}^{2} (E_{- u_{12, λ_{s}^{*}}} [μ_{u, 2} Φ_{u, 2}^{- 1}] + \sum_{i : λ_{i} = λ_{s}^{*}} \sum_{t = 1}^{T} E_{- u_{12, λ_{s}^{*}}} [ϕ_{i, t - 1, 2} (ϕ_{i t, 1} s_{11} + ϕ_{i t, 2} s_{21} - u_{11, λ_{s}^{*}} ϕ_{i, t - 1, 1} s_{11} \\ - (u_{21, λ_{s}^{*}} ϕ_{i, t - 1, 1} + u_{22, λ_{s}^{*}} ϕ_{i, t - 1, 2}) s_{21})]), \end{matrix}

σ_{u_{12, λ_{s}^{*}}}^{2} = {(E_{- u_{12, λ_{s}^{*}}} [Φ_{u, 2}^{- 1} + s_{11} \sum_{i : λ_{i} = λ_{s}^{*}} \sum_{t = 1}^{T} ϕ_{i, t - 1, 2}^{2}])}^{- 1},

where

μ_{u, 2}

is the component of the mean vector

μ_{u}

,

Φ_{u, 2}^{- 1}

is the diagonal component of the covariance

Φ_{u}

,

ϕ_{i t} = {(ϕ_{i t, 1}, ϕ_{i t, 2})}^{⊤}

,

ϕ_{i, t - 1} = {(ϕ_{i, t - 1, 1}, ϕ_{i, t - 1, 2})}^{⊤}

,

E_{Ψ} [Ψ_{ζ}^{- 1}] = [\begin{matrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{matrix}]

.

The optimal density of the component

u_{21, λ_{s}^{*}}

has the form

d^{*} (u_{21, λ_{s}^{*}}) = N (μ_{u_{21, λ_{s}^{*}}}, σ_{u_{21, λ_{s}^{*}}}^{2}),

where

\begin{matrix} μ_{u_{21, λ_{s}^{*}}} = & σ_{u_{21, λ_{s}^{*}}}^{2} (E_{- u_{21, λ_{s}^{*}}} [μ_{u, 3} Φ_{u, 3}^{- 1}] + \sum_{i : λ_{i} = λ_{s}^{*}} \sum_{t = 1}^{T} E_{- u_{21, λ_{s}^{*}}} [ϕ_{i, t - 1, 1} (ϕ_{i t, 1} s_{12} + ϕ_{i t, 2} s_{22} - u_{22, λ_{s}^{*}} ϕ_{i, t - 1, 2} s_{22} \\ - (u_{11, λ_{s}^{*}} ϕ_{i, t - 1, 1} + u_{12, λ_{s}^{*}} ϕ_{i, t - 1, 2}) s_{21})]), \end{matrix}

σ_{u_{21, λ_{s}^{*}}}^{2} = {(E_{- u_{21, λ_{s}^{*}}} [Φ_{u, 3}^{- 1} + s_{22} \sum_{i : λ_{i} = λ_{s}^{*}} \sum_{t = 1}^{T} ϕ_{i, t - 1, 1}^{2}])}^{- 1},

where

μ_{u, 3}

is the component of the mean vector

μ_{u}

,

Φ_{u, 3}^{- 1}

is the diagonal component of the covariance

Φ_{u}

,

ϕ_{i t} = {(ϕ_{i t, 1}, ϕ_{i t, 2})}^{⊤}

,

ϕ_{i, t - 1} = {(ϕ_{i, t - 1, 1}, ϕ_{i, t - 1, 2})}^{⊤}

,

E_{Ψ} [Ψ_{ζ}^{- 1}] = [\begin{matrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{matrix}]

.

The optimal density of the latent variable

d^{*} (ϕ_{i t} ∣ ϕ_{i, t - 1})

,

t = 2, \dots, T, i = 1, \dots, n,

is a multivariate normal distribution,

ϕ_{i t} \sim N (μ_{i t}, Σ_{i t}),

where

ϕ_{i t} = B_{i} ϕ_{i, t - 1} + ζ_{i t}, Σ_{ϕ 0}^{*} = {[Σ_{ω 0}^{- 1} + E_{ϕ_{i t}} [B_{i}^{⊤} Ψ_{ζ}^{- 1} B_{i}]]}^{- 1}, ω_{0} = {(0, 0)}^{⊤}

,

ϕ_{i 1} \sim N (ϕ_{0}, Σ_{ϕ 0}^{*})

,

Σ_{i, t - 1}^{*} = {[Σ_{i, t - 1}^{- 1} + E_{- ϕ_{i t}} [B_{i}^{⊤} Ψ_{ζ}^{- 1} B_{i}]]}^{- 1}

,

Σ_{i t} = {[E_{- ϕ_{i t}} [Ψ_{ζ}^{- 1}] + E_{- ϕ_{i t}} [Λ^{⊤} M_{p \times p} Λ] - E_{- ϕ_{i t}} [B_{i}^{⊤} Ψ_{ζ}^{- 1} B_{i}]]}^{- 1}

.

Let

M_{p \times p} = diag {E_{- ϕ_{i t}} [{(k_{2} γ_{1} l_{i t 1})}^{- 1}], \dots, E_{- ϕ_{i t}} [{(k_{2} γ_{p} l_{i t p})}^{- 1}]}

,

μ_{i t} = Σ_{i t} [E_{- ϕ_{i t}} [Λ^{⊤} M_{p \times p}] (E_{- ϕ_{i t}} [z_{i t}] - E_{- ϕ_{i t}} [ω] x_{i t} - E_{- ϕ_{i t}} [k_{1} υ_{i t}]) + E_{- ϕ_{i t}} [Ψ_{ζ}^{- 1} B_{i}] Σ_{i, t - 1}^{*} Σ_{i, t - 1}^{- 1} μ_{i, t - 1}]

,

E_{- ϕ_{i t}} [B_{i}^{⊤} Ψ_{ζ}^{- 1} B_{i}] = E_{Ψ, B} [B_{i}^{⊤} Ψ_{ζ}^{- 1} B_{i}] = E_{B} [B_{i}^{⊤} E_{Ψ} [Ψ_{ζ}^{- 1}] B_{i}]

. Recall that

B_{i} = [\begin{matrix} μ_{11 i} & μ_{12 i} \\ μ_{21 i} & μ_{22 i} \end{matrix}]

,

μ_{i} = {(μ_{i 11}, μ_{i 22}, μ_{i 12}, μ_{i 21})}^{⊤}

. We assume

μ_{i} \sim N (μ_{λ_{i}}, Σ_{λ_{i}}), i = 1, \dots, n

.

Moreover,

Ψ_{ζ} \sim IW (n T + w_{0}, R_{ω} + Ψ_{ζ_{0}}^{- 1})

,

Ψ_{ζ}^{- 1} \sim W (n T + w_{0}, {[R_{ω} + Ψ_{ζ_{0}}^{- 1}]}^{- 1})

, so that

E_{Ψ} [Ψ_{ζ}^{- 1}] = (n T + w_{0}) \times {[R_{ω} + Ψ_{ζ_{0}}^{- 1}]}^{- 1}

. Writing

E_{Ψ} [Ψ_{ζ}^{- 1}] = [\begin{matrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{matrix}]

, we obtain

\begin{matrix} E_{- ϕ_{i t}} [B_{i}^{⊤} Ψ_{ζ}^{- 1} B_{i}] = E_{B} [B_{i}^{⊤} E_{Ψ} [Ψ_{ζ}^{- 1}] B_{i}] = E_{B_{i}} [{[\begin{matrix} μ_{11 i} & μ_{12 i} \\ μ_{21 i} & μ_{22 i} \end{matrix}]}^{⊤} [\begin{matrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{matrix}] [\begin{matrix} μ_{11 i} & μ_{12 i} \\ μ_{21 i} & μ_{22 i} \end{matrix}]] \\ = [\begin{matrix} E [μ_{11 i}^{2}] s_{11} + 2 E [μ_{11 i} μ_{21 i}] s_{21} + E [μ_{21 i}^{2}] s_{22} & E [μ_{11 i} μ_{12 i}] s_{11} + E [μ_{12 i} μ_{21 i}] s_{21} + E [μ_{11 i} μ_{22 i}] s_{12} + E [μ_{21 i} μ_{22 i}] s_{22} \\ E [μ_{11 i} μ_{12 i}] s_{11} + E [μ_{11 i} μ_{22 i}] s_{21} + E [μ_{12 i} μ_{21 i}] s_{12} + E [μ_{22 i} μ_{21 i}] s_{22} & E [μ_{12 i}^{2}] s_{11} + 2 E [μ_{12 i} μ_{22 i}] s_{21} + E [μ_{22 i}^{2}] s_{22} \end{matrix}], \end{matrix}

where

\begin{matrix} E [μ_{11 i}^{2}] = E^{2} [μ_{11 i}] + V a r (μ_{11 i}), E [μ_{22 i}^{2}] = E^{2} [μ_{22 i}] + V a r (μ_{22 i}), \\ E [μ_{12 i}^{2}] = E^{2} [μ_{12 i}] + V a r (μ_{12 i}), E [μ_{21 i}^{2}] = E^{2} [μ_{21 i}] + V a r (μ_{21 i}), \\ E [μ_{11 i} μ_{12 i}] = E [μ_{11 i}] E [μ_{12 i}] + C o v (μ_{11 i}, μ_{12 i}), E [μ_{11 i} μ_{21 i}] = E [μ_{11 i}] E [μ_{21 i}] + C o v (μ_{11 i}, μ_{21 i}), \\ E [μ_{11 i} μ_{22 i}] = E [μ_{11 i}] E [μ_{22 i}] + C o v (μ_{11 i}, μ_{22 i}), E [μ_{22 i} μ_{12 i}] = E [μ_{22 i}] E [μ_{12 i}] + C o v (μ_{22 i}, μ_{12 i}), \\ E [μ_{22 i} μ_{21 i}] = E [μ_{22 i}] E [μ_{21 i}] + C o v (μ_{22 i}, μ_{21 i}), E [μ_{12 i} μ_{21 i}] = E [μ_{12 i}] E [μ_{21 i}] + C o v (μ_{12 i}, μ_{21 i}) . \end{matrix}
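The second-moment identities above are all that is needed to evaluate the expected quadratic form entry by entry. The following minimal sketch computes it for a random 2 x 2 matrix from the means and covariances of its entries; the function name, the dictionary-based inputs, and the numerical values are illustrative assumptions rather than quantities from the paper.

```python
def expected_quadratic(mean, cov, s):
    """E[B^T S B] for a random 2x2 matrix B and a fixed symmetric 2x2 matrix S.

    mean : dict mapping (row, col) -> E[b_rc]
    cov  : dict mapping ((r1, c1), (r2, c2)) -> Cov(b_r1c1, b_r2c2);
           pairs that are absent are treated as uncorrelated
    s    : 2x2 nested list, standing in for E[Psi_zeta^{-1}]
    """
    def second_moment(p, q):
        c = cov.get((p, q), cov.get((q, p), 0.0))
        return mean[p] * mean[q] + c  # E[xy] = E[x]E[y] + Cov(x, y)

    out = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(2):
        for b in range(2):
            # (B^T S B)_{ab} = sum_{k,l} b_{ka} s_{kl} b_{lb}
            out[a][b] = sum(s[k][l] * second_moment((k, a), (l, b))
                            for k in range(2) for l in range(2))
    return out


# Illustrative inputs: independent entries with made-up means and variances.
mean = {(0, 0): 1.0, (0, 1): -0.1, (1, 0): -0.1, (1, 1): 1.0}
cov = {((0, 0), (0, 0)): 0.01, ((1, 1), (1, 1)): 0.01,
       ((0, 1), (0, 1)): 0.005, ((1, 0), (1, 0)): 0.005}
s = [[4.0 / 3, 2.0 / 3], [2.0 / 3, 4.0 / 3]]
result = expected_quadratic(mean, cov, s)
```

Plugging in only the means would drop the variance and covariance contributions, so the generic double sum is a convenient check on the hand-expanded entries.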

The optimal density

d^{*} (Ψ_{ζ})

has the form

Ψ_{ζ} \sim IW (ω_{Ψ}, W_{Ψ}),

where

ω_{Ψ} = ω_{0} + n T

,

W_{Ψ} = Ψ_{0}^{- 1} + \sum_{i = 1}^{n} \sum_{t = 1}^{T} [E_{- Ψ_{ζ}} [ϕ_{i t} ϕ_{i t}^{⊤}] - E_{- Ψ_{ζ}} [ϕ_{i t} ϕ_{i, t - 1}^{⊤} B_{i}^{⊤}] - E_{- Ψ_{ζ}} [B_{i} ϕ_{i, t - 1} ϕ_{i t}^{⊤}] + E_{- Ψ_{ζ}} [B_{i} ϕ_{i, t - 1} ϕ_{i, t - 1}^{⊤} B_{i}^{⊤}]]

.

Λ_{z} = {[Λ_{z 1}, \dots, Λ_{z p}]}^{⊤}, Λ_{z j} = {(ω_{j}, Λ_{j})}_{1 \times (q + s)}, ϕ_{i t} = {(ϕ_{1 i t}, ϕ_{2 i t})}^{⊤}, i = 1, \dots, n, t = 1, \dots, T, j = 1, \dots, J, k = 1, \dots, s

. For model identifiability, we define the indicators

L_{z} = {l_{z k j}}

for

Λ_{z} = {λ_{z k j}}

. If

λ_{z k j}

is fixed,

l_{z k j} = 0

. If

λ_{z k j}

is random,

l_{z k j} = 1

. Let

u_{j i t} = {({x_{j i t}}^{⊤}, {ϕ_{i t}}^{⊤})}^{⊤}

; the vector

u_{j i t}^{*}

is obtained from

u_{j i t}

by deleting the rows for which

l_{z k j} = 0

. Let

Z^{*} = {z_{j i t}^{*}}

, where

z_{j i t}^{*} = z_{j i t} - \sum_{k = 1}^{q + s} λ_{z k j} u_{j i t, k} (1 - l_{z k j})

.

The optimal density

d^{*} (Λ_{z})

has the form

Λ_{z j} = ϕ_{j}, Λ_{j} \sim N_{s + q} (μ_{j}, Σ_{j}),

where

Σ_{j} = {[\sum_{i = 1}^{n} \sum_{t = 1}^{T} E_{- Λ_{z j}} [u_{j i t}^{*} {u_{j i t}^{*}}^{⊤}] E_{- Λ_{z j}} [{(ρ_{2} γ_{j} ν_{j i t})}^{- 1}] + H_{0 z j}^{- 1}]}^{- 1}

,

μ_{j} = Σ_{j} [\sum_{i = 1}^{n} \sum_{t = 1}^{T} (E_{- Λ_{z j}} [z_{j i t}^{*} u_{j i t}^{*} υ_{j i t}^{- 1}] - k_{1} E_{- Λ_{z j}} [u_{j i t}^{*}]) E_{- Λ_{z j}} [{(ρ_{2} γ_{j})}^{- 1}] + H_{0 z j}^{- 1} Λ_{0 z j}]

,

E_{- Λ_{z j}} [u_{j i k}^{*}] = \{\begin{matrix} x_{j i t}^{⊤} & l_{z k 2} = l_{z k 3} = 0, \\ {(x_{j i t}, E_{- Λ_{z j}} [ϕ_{1 i t}])}^{⊤} & l_{z k 3} = 0, \\ {(x_{j i t}, E_{- Λ_{z j}} [ϕ_{2 i t}])}^{⊤} & l_{z k 2} = 0 . \end{matrix}

E_{- Λ_{z j}} [u_{j i t}^{*} {u_{j i t}^{*}}^{⊤}] = \{\begin{matrix} x_{j i t}^{⊤} x_{j i t} & l_{z k 2} = l_{z k 3} = 0, \\ E_{- Λ_{z j}} [{(x_{j i t}^{⊤}, ϕ_{1 i t}^{⊤})}^{⊤} (x_{j i t}, ϕ_{1 i t})] & l_{z k 3} = 0, \\ E_{- Λ_{z j}} [{(x_{j i t}^{⊤}, ϕ_{2 i t}^{⊤})}^{⊤} (x_{j i t}, ϕ_{2 i t})] & l_{z k 2} = 0 . \end{matrix}

\begin{matrix} E_{- Λ_{z j}} [{(x_{j i t}^{⊤}, ϕ_{1 i t}^{⊤})}^{⊤} (x_{j i t}, ϕ_{1 i t})] = [\begin{matrix} x_{j i t}^{⊤} x_{j i t} & x_{j i t}^{⊤} E_{- Λ_{z k}} [ϕ_{1 j i t}] \\ E_{- Λ_{z k}} [ϕ_{1 j i t}^{⊤}] x_{j i t} & E_{- Λ_{z j}} [ϕ_{1 j i t}^{⊤} ϕ_{1 j i t}] \end{matrix}], \end{matrix}

\begin{matrix} E_{- Λ_{z j}} [{(x_{j i t}^{⊤}, ϕ_{2 i t}^{⊤})}^{⊤} (x_{j i t}, ϕ_{2 i t})] = [\begin{matrix} x_{j i t}^{⊤} x_{j i t} & x_{j i t}^{⊤} E_{- Λ_{z j}} [ϕ_{2 j i t}] \\ E_{- Λ_{z j}} [ϕ_{2 j i t}^{⊤}] x_{j i t} & E_{- Λ_{z j}} [ϕ_{2 j i t}^{⊤} ϕ_{2 j i t}] \end{matrix}], \end{matrix}

\begin{matrix} E_{- Λ_{z j}} [ϕ_{1 j i t}^{⊤} ϕ_{1 j i t}] = E_{- Λ_{z j}}^{2} [ϕ_{1 j i t}] + V a r (ϕ_{1 j i t}), E_{- Λ_{z j}} [ϕ_{2 j i t}^{⊤} ϕ_{2 j i t}] = E_{- Λ_{z j}}^{2} [ϕ_{2 j i t}] + V a r (ϕ_{2 j i t}) . \end{matrix}

The optimal density

d^{*} (ν_{j i t})

has the form

d^{*} (ν_{j i t}) = GIG (\frac{1}{2}, a_{j i t}, b_{j i t}),

where

a_{j i t} = (2 ρ_{2} + ρ_{1}^{2}) E_{- ν_{j i t}} [{(ρ_{2} γ_{j})}^{- 1}]

,

b_{j i t} = E_{- ν_{j i t}} [{(z_{j i t} - ω_{j} x_{j i t} - Λ_{j} ϕ_{j i t})}^{2}] E_{- ν_{j i t}} [{(ρ_{2} γ_{j})}^{- 1}]

,

GIG (\frac{1}{2}, a_{j i t}, b_{j i t})

denotes the generalized inverse Gaussian distribution with index 1/2 and parameters

a_{j i t}, b_{j i t}

.

The optimal density

d^{*} (γ_{j})

has the form

d^{*} (γ_{j}) = IG (m_{j}, n_{j}),

where

m_{j} = m_{γ 0} + 3 n T / 2

,

n_{j} = n_{γ 0} + \sum_{i = 1}^{n} \sum_{t = 1}^{T} E_{- γ_{j}} [ν_{j i t}] + N

,

N = \sum_{i = 1}^{n} \sum_{t = 1}^{T}

[E_{- γ_{j}} [\frac{{(z_{j i t} - ω_{j} x_{j i t} - Λ_{j} ϕ_{j i t})}^{2}}{2 ρ_{2} ν_{j i t}}] - E_{- γ_{j}} [\frac{k_{1} (z_{j i t} - ω_{j} x_{j i t} - Λ_{j} ϕ_{j i t})}{ρ_{2}}] + E_{- γ_{j}} [\frac{ρ_{1}^{2} ν_{j i t}}{2 ρ_{2}}]]

,

E_{- γ_{j}} [ν_{j i t}] = \frac{\sqrt{b_{j i t}}}{\sqrt{a_{j i t}}} + \frac{1}{a_{j i t}}

,

E_{- γ_{j}} [\frac{1}{ν_{j i t}}] = \frac{\sqrt{a_{j i t}}}{\sqrt{b_{j i t}}}

E_{γ_{j}} [ln ν_{i t k}] = ln \frac{\sqrt{b_{i t k}}}{\sqrt{a_{i t k}}} + \frac{\partial}{\partial p} ln K_{p} (\sqrt{a_{i t k} b_{i t k}})

,

E_{- γ_{j}} [Λ_{z j}] = μ_{j}

,

E_{- γ_{j}} [Λ_{z j} Λ_{z j}^{⊤}] = μ_{λ j} μ_{λ j}^{⊤} + Σ_{λ j}

.
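The closed-form moments of the GIG(1/2, a, b) factor can be checked numerically. The sketch below assumes the parameterization in which the density is proportional to ν^{-1/2} exp{-(aν + b/ν)/2}, which is the one under which the two expectations above hold; the function names and the values a = 2, b = 3 are illustrative.

```python
import math


def gig_half_moments(a, b):
    """E[nu] and E[1/nu] for nu ~ GIG(1/2, a, b).

    Assumed density: proportional to nu**(-1/2) * exp(-(a*nu + b/nu)/2).
    """
    e_nu = math.sqrt(b / a) + 1.0 / a
    e_inv_nu = math.sqrt(a / b)
    return e_nu, e_inv_nu


def numeric_moments(a, b, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of the same moments (a check only)."""
    h = upper / steps
    norm = m1 = m_inv = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        w = x ** -0.5 * math.exp(-0.5 * (a * x + b / x))  # unnormalized density
        norm += w
        m1 += w * x
        m_inv += w / x
    return m1 / norm, m_inv / norm


closed = gig_half_moments(2.0, 3.0)
check = numeric_moments(2.0, 3.0)
```

Because both moments are available in closed form, the updates involving these expectations require no numerical integration inside the coordinate ascent loop; the quadrature here serves only as a sanity check.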

Let

z_{j i t}^{*} = {(1, z_{j i t}, z_{j i, t - 1})}^{⊤}, χ = {(χ_{0}, χ_{1}, χ_{2})}^{⊤}, j = 1, \dots, J, t = 1, \dots, T, i = 1, \dots, n

,

ω_{j, m i s}, Λ_{j, m i s}

,

γ_{j, m i s}, ν_{j i t, m i s}

be the components of

ω, Λ, γ, ν

corresponding to

z_{j i t, m i s}

. According to the assumption of the missing data mechanism

r_{j i t}^{z} \sim B e r (1, p_{r^{z}})

,

p_{r^{z}} = \frac{{exp ({z^{*}}_{j i t}^{⊤} χ)}^{r_{j i t}^{z}}}{1 + exp ({z^{*}}_{j i t}^{⊤} χ)},

(16)

following the idea of the coordinate ascent method

\begin{matrix} d (z_{j i t, m i s}) \propto exp {E_{- z_{j i t, m i s}} {log p (z_{j i t, m i s} ∣ x_{i t}, ϕ_{i t}, r_{j i t}^{z}, θ_{z}, χ)}} \\ \propto exp {E_{- z_{j i t, m i s}} {log (N (ω_{j, m i s} x_{j i t} + Λ_{j, m i s} ϕ_{i t}, ρ_{2} γ_{j, m i s} ν_{j i t, m i s}) \times p (r_{j i t}^{z} ∣ z_{j i t, o b s}, z_{j i t, mis}, χ)}} \\ \propto exp {E_{- z_{j i t, m i s}} {log γ_{j, m i s}} - E_{- z_{j i t, m i s}} {log ν_{j i t, m i s}} + E_{- z_{j i t, m i s}} {\frac{{(z_{j i t} - x_{j i t} ω_{j, m i s} - Λ_{j, m i s} ϕ_{i t})}^{2}}{k_{2} γ_{j, m i s} ν_{j i t, m i s}}} \\ + E_{- z_{j i t, m i s}} {r_{j i t}^{z} (χ_{0} + χ_{1} z_{j i t} + χ_{2} z_{j, i, t - 1})} - E_{- z_{j i t, m i s}} {log (1 + exp (χ_{0} + χ_{1} z_{j i t} + χ_{2} z_{j i, t - 1}))}}, \end{matrix}

(17)

E_{- z_{j i t, m i s}} {log (1 + exp (χ_{0} + χ_{1} z_{j i t} + χ_{2} z_{j i, t - 1}))}

in (17) is difficult to optimize directly. Following Durante et al. [], we write

\begin{matrix} p (r_{j i t}^{z}) & = \frac{{exp ({z^{*}}_{j i t}^{⊤} χ)}^{r_{j i t}^{z}}}{1 + exp ({z^{*}}_{j i t}^{⊤} χ)} \\ & \propto exp {- \frac{ω_{j i t}}{2} {({z^{*}}_{j i t}^{⊤} χ - κ_{j i t} / ω_{j i t})}^{2}}, \end{matrix}

(18)

where

κ_{j i t} = r_{j i t}^{z} - \frac{1}{2}

,

ω_{j i t}

follows a Pólya-Gamma distribution.

The optimal density

d^{*} (z_{j i t, m i s})

has the form

d^{*} (z_{j i t, m i s}) = N (μ_{j i t, m i s}, Σ_{j i t, m i s}),

where

Σ_{j i t, m i s} = {(E_{- z_{j i t, m i s}} [\frac{1}{ρ_{2} γ_{j} ν_{j i t}}] + E_{- z_{j i t, m i s}} [χ_{1}^{2}] E_{- z_{j i t, m i s}} [ω_{j i t}])}^{- 1}

,

μ_{j i t, m i s}^{1} = E_{- z_{j i t, m i s}} [(ω_{j, m i s}^{⊤} x_{i t} + Λ_{j, m i s}^{⊤} ϕ_{i t} + k_{1} ν_{j i t, m i s}) (1 / k_{2} γ_{j, m i s} ν_{j i t, m i s})]

μ_{j i t, m i s}^{2} = E_{- z_{j i t, m i s}} [χ_{1} κ_{j i t} - (χ_{0} χ_{1} + χ_{1} χ_{2} z_{j i, t - 1}) ω_{j i t}]

,

μ_{j i t, m i s} = Σ_{j i t, m i s} (μ_{j i t, m i s}^{1} + μ_{j i t, m i s}^{2})

,

E_{- z_{j i t, m i s}} [κ_{j i t}] = \frac{exp ({z^{*}}_{j i t}^{⊤} χ)}{1 + exp ({z^{*}}_{j i t}^{⊤} χ)} - \frac{1}{2}

,

E_{- z_{j i t, m i s}} [ω_{j i t}] = \frac{tanh (\frac{{z^{*}}_{j i t}^{⊤} χ}{2})}{2 {z^{*}}_{j i t}^{⊤} χ}

,

j = 1, \dots, J, t = 1, \dots, T, i = 1, \dots, n

.

Following Durante et al. [], we introduce the auxiliary variables

e_{j i t}

. The optimal density

d^{*} (χ)

has the form

d^{*} (χ) = N (μ_{χ}, Σ_{χ}),

where

Σ_{χ} = {[Σ_{χ_{0}}^{- 1} + \sum_{j = 1}^{J} \sum_{i = 1}^{n} \sum_{t = 1}^{T} z_{j i t}^{*} e_{j i t} {z^{*}}_{j i t}^{⊤}]}^{- 1}

,

μ_{χ} = Σ_{χ} [\sum_{j = 1}^{J} \sum_{i = 1}^{n} \sum_{t = 1}^{T} z_{j i t}^{*} (r_{j i t}^{z} - 0.5) + Σ_{χ_{0}}^{- 1} μ_{χ_{0}}]

,

e_{j i t} \sim PG (1, ξ_{j i t})

,

e_{j i t} = \frac{tanh (0.5 ξ_{j i t})}{2 ξ_{j i t}}, ξ_{j i t} = \sqrt{- 2 α_{j i t}},

α_{j i t} = (- 1 / 2) [{z^{*}}_{j i t}^{⊤} Σ_{χ} z_{j i t}^{*} + {({z^{*}}_{j i t}^{⊤} μ_{χ})}^{2}],

j = 1, \dots, J, i = 1, \dots, n, t = 1, \dots, T .
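The updates for the logistic-regression coefficients alternate between the normal factor and the Pólya-Gamma expectations. The following minimal sketch implements that cycle for a reduced design with rows (1, x), that is, an intercept and one slope, and a zero-mean prior, so that the 2 x 2 inverse can be written out by hand. The data, the prior variance, and the function name are illustrative assumptions.

```python
import math


def cavi_logistic(xs, rs, prior_var=10.0, n_iter=100):
    """Mean-field updates for logistic regression with design rows z = (1, x).

    xs : covariate values; rs : binary responses (0/1)
    Prior (assumption): coefficients ~ N(0, prior_var * I).
    Returns (mu, sigma, xi).
    """
    xi = [1.0] * len(xs)  # initial variational parameters
    for _ in range(n_iter):
        e = [math.tanh(0.5 * v) / (2.0 * v) for v in xi]  # E[PG(1, xi)]
        # Precision: prior precision + sum_i e_i z_i z_i^T
        p00 = 1.0 / prior_var + sum(e)
        p01 = sum(ei * x for ei, x in zip(e, xs))
        p11 = 1.0 / prior_var + sum(ei * x * x for ei, x in zip(e, xs))
        det = p00 * p11 - p01 * p01
        sigma = [[p11 / det, -p01 / det], [-p01 / det, p00 / det]]
        # Linear term: sum_i z_i (r_i - 0.5); the prior mean is zero here.
        h0 = sum(r - 0.5 for r in rs)
        h1 = sum(x * (r - 0.5) for x, r in zip(xs, rs))
        mu = [sigma[0][0] * h0 + sigma[0][1] * h1,
              sigma[1][0] * h0 + sigma[1][1] * h1]
        # xi_i^2 = z_i^T Sigma z_i + (z_i^T mu)^2
        xi = [math.sqrt(sigma[0][0] + 2 * x * sigma[0][1] + x * x * sigma[1][1]
                        + (mu[0] + x * mu[1]) ** 2) for x in xs]
    return mu, sigma, xi


xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
rs = [0, 0, 1, 0, 0, 1, 0, 1, 1]
mu, sigma, xi = cavi_logistic(xs, rs)
```

Every step is available in closed form, which is the practical appeal of the conditionally conjugate formulation: no gradient steps or step-size tuning are required for the propensity-score parameters.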

The optimal density

d^{*} (κ)

has the form

d^{*} (κ) = Gamma (a_{κ}, b_{κ}),

where

a_{κ} = s_{1} + S - 1

,

b_{κ} = s_{2} - \sum_{s = 1}^{S - 1} E_{- κ} [log (1 - V_{s})]

.

4. Simulation Studies

This section describes three simulation studies designed to assess the performance of the variational Bayesian method described above.

Simulation 1. We examine a quantile dynamic latent variable model represented by the equation:

z_{j i t} = θ_{j} + Λ_{j}^{⊤} ϕ_{i t} + ε_{j i t},

where

ϕ_{i t} = {(ϕ_{1 i t}, ϕ_{2 i t})}^{⊤}

and the components

ϕ_{1 i t}

and

ϕ_{2 i t}

are defined as follows:

\begin{matrix} ϕ_{1 i t} = μ_{11 i} ϕ_{1 i, t - 1} + μ_{12 i} ϕ_{2 i, t - 1} + ζ_{1 i t}, \\ ϕ_{2 i t} = μ_{22 i} ϕ_{2 i, t - 1} + μ_{21 i} ϕ_{1 i, t - 1} + ζ_{2 i t}, \end{matrix}

where

ζ_{i t} = {(ζ_{1 i t}, ζ_{2 i t})}^{⊤} \sim N_{2} (0, Ψ_{ζ})

, and

ε_{j i t}

is an error term whose distribution is restricted to have its

τ

-th quantile equal to zero for

j = 1, \dots, J

,

i = 1, \dots, n

, and

t = 1, \dots, T

. Let

μ_{i} = {(μ_{11 i}, μ_{22 i}, μ_{12 i}, μ_{21 i})}^{⊤}

, and let

Λ

denote the

J \times q

loading matrix of the latent factors;

Λ_{j}^{⊤}

is the j-th row of

Λ

. In this simulation, the elements of

μ_{i}

are drawn from the following distributions:

\begin{matrix} μ_{11 i} \sim N (1, 0.01), & μ_{22 i} \sim N (1, 0.01), \\ μ_{12 i} \sim N (- 0.1, 0.005), & μ_{21 i} \sim N (- 0.1, 0.005) . \end{matrix}

With these distributions, we aim to show that the Dirichlet process (DP) prior can adequately capture a normal distribution. To ensure identifiability, we specify the structures of the matrices

Λ

and

Ψ_{ζ}

as follows:

Λ = {[\begin{matrix} 1.0 & λ_{21} & λ_{31} & λ_{41} & 0.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 1.0 & λ_{62} & λ_{72} & λ_{82} \end{matrix}]}^{⊤}, Ψ_{ζ} = [\begin{matrix} ψ_{ζ 11} & ψ_{ζ 12} \\ ψ_{ζ 12} & ψ_{ζ 22} \end{matrix}],

where J = 8 and q = 2. Here, the entries fixed at one and zero are treated as known, while the parameters

λ_{21}

,

λ_{31}

,

λ_{41}

,

λ_{62}

,

λ_{72}

,

λ_{82}

,

ψ_{ζ 11}

,

ψ_{ζ 12}

and

ψ_{ζ 22}

remain unknown. The true values of the unknown parameters in

θ = {(θ_{1}, \dots, θ_{J})}^{⊤}

,

Λ

and

Ψ_{ζ}

are specified as follows:

θ_{1} = \dots = θ_{8} = 1.0

,

ψ_{ζ 11} = ψ_{ζ 22} = 1.0

,

ψ_{ζ 12} = - 0.5

, and

λ_{21} = λ_{31} = λ_{41} = λ_{62} = λ_{72} = λ_{82} = 0.8

. We set

J = 8

,

T = 10

,

n = 30

and

τ = 0.75, 0.5, 0.25

.

To assess the impact of the measurement error distribution on the precision of parameter estimation, we consider three distributions for

ε_{j i t}

:

case 1:

ε_{j i t} \sim ln N (0, 0.5)

;

case 2:

ε_{j i t} \sim t (5)

;

case 3:

ε_{j i t}

is produced from a mixture of normal distributions,

0.9 N (0, 0.5) + 0.1 N (0, 2)

; for

j = 1, \dots, J

,

i = 1, \dots, n

, and

t = 1, \dots, T

. Under these settings, the

τ

-th conditional quantile for

z_{j i t}

is

Q_{z_{j i t}} (τ | θ_{τ j}, ϕ_{i t}) = θ_{τ j} + Λ_{τ j}^{⊤} ϕ_{i t}

,

θ_{τ j} = θ_{j} + Q^{*} (τ)

,

Λ_{τ j} = Λ_{j}

for

j = 1, \dots, J

. For the measurement error distributions in cases 1–3,

Q^{*} (τ)

is taken as the

τ

-th quantile of the corresponding distribution.
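As a small worked instance of the quantile shift, the sketch below evaluates Q*(τ) for the log-normal errors of case 1 and the resulting shifted intercepts. It assumes that the second argument of ln N(0, 0.5) is the variance of the logarithm; the function name is illustrative.

```python
import math
from statistics import NormalDist


def lognormal_quantile(tau, log_mean=0.0, log_var=0.5):
    """tau-th quantile of a log-normal whose logarithm is N(log_mean, log_var)."""
    z = NormalDist().inv_cdf(tau)  # standard normal quantile
    return math.exp(log_mean + math.sqrt(log_var) * z)


theta_j = 1.0  # true intercept used in Simulation 1
shifted = {tau: theta_j + lognormal_quantile(tau) for tau in (0.25, 0.5, 0.75)}
```

Under this reading, the median of the error is exp(0) = 1, so the intercept targeted at τ = 0.5 is 2.0 rather than 1.0, whereas for the symmetric errors of cases 2 and 3 the median shift is zero.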

We assume that the

z_{j i t}

are subject to nonignorable missingness, and that the missing indicators

r_{j i t}^{z}

are generated from the logistic regression model:

logit {\Pr (r_{j i t}^{z} = 1 | z_{i t}, χ)} = χ_{z 0} + χ_{z 1} z_{j i t} + χ_{z 2} z_{j i, t - 1},

where

χ = {(χ_{z 0}, χ_{z 1}, χ_{z 2})}^{⊤}

. The true values of the parameters in

χ

are set to

χ_{z 0} = - 3.0

,

χ_{z 1} = 0.4

, and

χ_{z 2} = 0.3

. Additionally, the mean proportion of missing responses

z_{j i t}

is approximately 18%.
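The data-generating process of Simulation 1 can be sketched for a single subject as follows. Several choices are assumptions made for the example: the second arguments of the normal distributions are read as variances, the latent state starts at zero, the response at time 0 is set to zero, the case 3 errors are used, and r = 1 is taken to flag a missing response. The sketch does not attempt to reproduce the reported missing proportion.

```python
import math
import random


def simulate_subject(rng, T=10, J=8):
    """One subject's T x J responses and missingness flags (1 = missing, assumed)."""
    # Random autoregressive coefficients; second arguments read as variances.
    m11 = rng.gauss(1.0, math.sqrt(0.01))
    m22 = rng.gauss(1.0, math.sqrt(0.01))
    m12 = rng.gauss(-0.1, math.sqrt(0.005))
    m21 = rng.gauss(-0.1, math.sqrt(0.005))
    theta = [1.0] * J
    lam = [(1.0, 0.0)] + [(0.8, 0.0)] * 3 + [(0.0, 1.0)] + [(0.0, 0.8)] * 3
    chi0, chi1, chi2 = -3.0, 0.4, 0.3
    phi = (0.0, 0.0)      # starting state (assumption)
    z_prev = [0.0] * J    # responses at t = 0 (assumption)
    z_all, r_all = [], []
    for _ in range(T):
        # zeta ~ N2(0, Psi), Psi = [[1, -0.5], [-0.5, 1]], via its Cholesky factor
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        zeta = (e1, -0.5 * e1 + math.sqrt(0.75) * e2)
        phi = (m11 * phi[0] + m12 * phi[1] + zeta[0],
               m22 * phi[1] + m21 * phi[0] + zeta[1])
        z_t, r_t = [], []
        for j in range(J):
            # Case 3 errors: 0.9 N(0, 0.5) + 0.1 N(0, 2)
            var = 0.5 if rng.random() < 0.9 else 2.0
            z = (theta[j] + lam[j][0] * phi[0] + lam[j][1] * phi[1]
                 + rng.gauss(0, math.sqrt(var)))
            p = 1.0 / (1.0 + math.exp(-(chi0 + chi1 * z + chi2 * z_prev[j])))
            z_t.append(z)
            r_t.append(1 if rng.random() < p else 0)
        z_all.append(z_t)
        r_all.append(r_t)
        z_prev = z_t
    return z_all, r_all


rng = random.Random(0)
data = [simulate_subject(rng) for _ in range(30)]
```

Because the missingness probability depends on the current response itself, deleting the flagged values produces data that are missing not at random, which is the situation the propensity-score model is meant to handle.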

To obtain Bayesian estimates of the unknown parameters, the following hyperparameters must be specified:

Λ_{0 z j} = {(μ_{0 j}, λ_{0 j l})}^{⊤}

,

H_{0 z j} = diag (h_{01}^{z}, h_{02}^{z})

,

α_{0 j}

,

β_{0 j}

,

q_{0}

,

Ψ_{ζ 0}

,

χ_{0}

,

H_{0 χ}

,

μ_{0 z}

,

H_{0 z}

,

a_{1}

,

a_{2}

,

c_{1}

and

c_{2}

. We set

μ_{0 j} = 1.0

for

j = 1, \dots, 8

,

λ_{0 j l} = 0.8

for

j = 2, 3, 4, 6, 7, 8

and

l = 1, 2

,

h_{01}^{z} = h_{02}^{z} = 10.0

,

α_{0 j} = 9

and

ω_{0 j} = 4

for

j = 1, \dots, 8

,

q_{0} = 10

,

Ψ_{ζ 0} = 7 R_{0}^{- 1}

, where

R_{0}

is the true value of

Ψ_{ζ}

,

χ_{0} = {(- 3.0, 0.4, 0.3)}^{⊤}

,

H_{0 χ} = 7 I

,

μ_{0 z} = 0

,

H_{0 z} = 7 I

,

a_{1} = 400

,

a_{2} = 2

,

c_{1} = 300

and

c_{2}

to be 0.5 and 0.03 for the first two and last two elements of

μ_{i}

, respectively.

In this simulation study, we conducted 100 replications to identify the active variables and estimate the model parameters. The results for quantiles 0.75, 0.5, and 0.25 are reported in Table 1. Here, ‘Bias’ is the difference between the true value and the average of its estimates over the 100 replications, ‘SD’ is the standard deviation of these estimates, and ‘RMS’ is the root mean square deviation between the replication estimates and the true value. Table 1 shows that the proposed estimation procedure yields small bias and RMS, with SD values close to the RMS values, regardless of the quantile or error distribution. For brevity, the SD values of the estimated parameters are omitted. The simulation results indicate that the variational estimation method remains efficient under the error assumptions considered.
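The three summary measures can be computed from the replicated estimates of a single parameter as in the following minimal sketch. The sign convention for the bias (average estimate minus true value), the divisor n in the standard deviation, and the five made-up estimates are assumptions for illustration only.

```python
import math


def summarize(estimates, true_value):
    """Bias, SD and RMS of replicated estimates of one parameter."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / n)
    rms = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, sd, rms


# Made-up replication estimates of a loading whose true value is 0.8.
bias, sd, rms = summarize([0.78, 0.83, 0.81, 0.79, 0.84], true_value=0.8)
```

With these conventions RMS² = Bias² + SD², so the SD and RMS values nearly coincide whenever the bias is small, which is why reporting Bias and RMS alone loses little information.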

Table 1. Performance of Bayesian parameter estimates for

τ = 0.25, 0.5

and 0.75 in Simulation 1.

Simulation 2. As in Simulation 1, we consider the following dynamic latent variable model:

z_{j i t} = x_{j i t}^{⊤} ω + Λ_{j}^{⊤} ϕ_{i t} + ε_{j i t},

where

ω = {(ω_{0}, ω_{1}, ω_{2})}^{⊤}

,

ζ_{i t} = {(ζ_{1 i t}, ζ_{2 i t})}^{⊤} \sim N_{2} (0, Ψ_{ζ})

,

ε_{j i t} \sim N (0, 0.5)

,

x_{j i t} = {(1, x_{1 j i t}, x_{2 j i t})}^{⊤}

,

ϕ_{i t} = {(ϕ_{1 i t}, ϕ_{2 i t})}^{⊤}

with

ϕ_{1 i t} = μ_{11 i} ϕ_{1 i, t - 1} + μ_{12 i} ϕ_{2 i, t - 1} + ζ_{1 i t},

ϕ_{2 i t} = μ_{22 i} ϕ_{2 i, t - 1} + μ_{21 i} ϕ_{1 i, t - 1} + ζ_{2 i t},

for

j = 1, \dots, J

,

i = 1, \dots, n

and

t = 1, \dots, T

. For this analysis,

μ_{i} = {(μ_{11 i}, μ_{22 i}, μ_{12 i}, μ_{21 i})}^{⊤}

is produced following the approach of Simulation 1. The covariates

x_{1 j i t}

are distributed according to the Bernoulli distribution

B (1, π_{j})

, where

π_{j} = exp (α_{1 j}) / \{1 + exp (α_{1 j})\}

, which does not depend on the subject or the observation time point. The covariates

x_{2 j i t}

are simulated from the normal distribution

N (α_{0 j} + α_{2 j} x_{1 j i t}, α_{3 j})

. The true values of the parameters

Λ = {(Λ_{1}, \dots, Λ_{J})}^{⊤}

and

Ψ_{ζ}

are adopted from those specified in Simulation 1. In contrast, the true values of

ω

,

α_{1 j}

, and

υ_{2 j}^{*} = {(α_{0 j}, α_{2 j}, α_{3 j})}^{⊤}

are set to

ω = {(1.0, 1.0, 1.0)}^{⊤}

,

α_{1 j} = 0.5

, and

υ_{2 j}^{*} = {(1, 0.5, 1)}^{⊤}

, respectively. With these settings, the

τ

-th conditional quantile of

z_{j i t}

is expressed as

Q_{z_{j i t}} (τ | x_{j i t}, ϕ_{i t}) = x_{j i t}^{⊤} ω_{τ} + Λ_{τ j}^{⊤} ϕ_{i t}

, where

ω_{τ} = {(ω_{τ 0}, ω_{τ 1}, ω_{τ 2})}^{⊤}

and

ω_{τ j} = ω_{j} + Q^{*} (τ)

, with

Q^{*} (τ)

prescribed for Simulation 1 for

j = 0, 1, 2

, and

Λ_{τ j} = Λ_{j}

for

j = 1, \dots, J

. In this scenario, we set

n = 100

,

T = 7

,

J = 8

,

q = 2

, and

τ = 0.75, 0.5, 0.25

.

We assume that

z_{j i t}

is fully observed, while

x_{1 j i t}

and

x_{2 j i t}

are subject to nonignorable missingness. The missing indicators

r_{j i t, 1}^{x}

and

r_{j i t, 2}^{x}

are generated from the following probit models:

\begin{matrix} Φ^{- 1} {\Pr (r_{j i t, 1}^{x} = 1 | x_{1 j i t}, η_{j 1})} = η_{j 11} x_{1 j i t}, \\ Φ^{- 1} {\Pr (r_{j i t, 2}^{x} = 1 | x_{1 j i t}, x_{2 j i t}, r_{j i t, 1}^{x}, η_{j 2})} = η_{j 21} x_{1 j i t} + η_{j 22} x_{2 j i t} + η_{j 23} r_{j i t, 1}^{x}, \end{matrix}

where

η_{j 1} = η_{j 11}

, and

η_{j 2} = {(η_{j 21}, η_{j 22}, η_{j 23})}^{⊤}, j = 1, \dots, J

. We set the true values of parameters

η_{j 1}

and

η_{j 2}

as

η_{j 11} = 0.5

and

η_{j 2} = {(2.0, 0.1, 0.1)}^{⊤}, j = 1, \dots, J

, respectively. The average proportions of missing

x_{1 j i t}

and

x_{2 j i t}

are about 18% and 15%, respectively.
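The covariate and missingness models of Simulation 2 can be sketched for one cell (j, i, t) as follows. The sketch reads α_{3 j} as a variance and leaves open which value of the indicator denotes a missing covariate; it implements the stated Bernoulli, normal, and probit models only and does not attempt to reproduce the reported missing proportions. The function name is illustrative.

```python
import math
import random
from statistics import NormalDist


def simulate_covariates(rng, a0=1.0, a1=0.5, a2=0.5, a3=1.0,
                        eta11=0.5, eta2=(2.0, 0.1, 0.1)):
    """Draw (x1, x2) and the probit indicators (r1, r2) for one cell (j, i, t)."""
    pi_j = math.exp(a1) / (1.0 + math.exp(a1))
    x1 = 1 if rng.random() < pi_j else 0
    x2 = rng.gauss(a0 + a2 * x1, math.sqrt(a3))  # a3 read as a variance (assumption)
    std_normal = NormalDist()
    # First probit model: depends on x1 only.
    p1 = std_normal.cdf(eta11 * x1)
    r1 = 1 if rng.random() < p1 else 0
    # Second probit model: depends on x1, x2 and the first indicator.
    p2 = std_normal.cdf(eta2[0] * x1 + eta2[1] * x2 + eta2[2] * r1)
    r2 = 1 if rng.random() < p2 else 0
    return x1, x2, r1, r2


rng = random.Random(2)
draws = [simulate_covariates(rng) for _ in range(1000)]
```

The sequential structure, in which the second indicator is conditioned on the first, is what allows the joint missingness pattern of the two covariates to be factorized into two univariate probit regressions.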

We apply the variational Bayesian approach described in the previous section, using the same hyperparameters as in Simulation 1 with the following changes:

Λ_{0 ω} = {(1.0, 1.0, 1.0)}^{⊤}

,

ψ_{01} = 0.7

,

ψ_{02} = {(2.0, 0.1, 0.1)}^{⊤}

, and

H_{0 ψ k} = 0.5 I

for

k = 1, 2

. With this setup, Bayesian estimates of the unknown parameters are computed for 100 replicated datasets. The results for the quantiles

τ = 0.75, 0.5, 0.25

are reported in Table 2. They are consistent with the conclusions drawn from Simulation 1.

Table 2. Performance of Bayesian parameter estimates for

τ = 0.25

, 0.5 and 0.75 in Simulation 2.

Simulation 3. In this experiment, we consider the following quantile nonlinear dynamic factor analysis model:

z_{j i t} = θ_{j} + Λ_{j}^{⊤} ϕ_{i t} + ε_{j i t},

where

ϕ_{i t} = {(ϕ_{1 i t}, ϕ_{2 i t})}^{⊤}

with

\begin{matrix} ϕ_{1 i t} = log (1 + exp (ϕ_{2 i, t - 1})) + ζ_{1 i t}, \\ ϕ_{2 i t} = log (1 + exp (ϕ_{1 i, t - 1})) + ζ_{2 i t}, \end{matrix}

where

ζ_{i t} = {(ζ_{1 i t}, ζ_{2 i t})}^{⊤} \sim N_{2} (0, Ψ_{ζ})

. The

τ

-th conditional quantile of

z_{j i t}

is

Q_{z_{j i t}} (τ | θ_{τ j}, ϕ_{i t}) = θ_{τ j} + Λ_{τ j}^{⊤} ϕ_{i t}

, where

θ_{τ j} = θ_{j} + Q^{*} (τ)

with

Q^{*} (τ)

as prescribed in Simulation 1, and

Λ_{τ j} = Λ_{j}

for

j = 1, \dots, J

. Here,

μ_{i} = {(μ_{11 i}, μ_{22 i}, μ_{12 i}, μ_{21 i})}^{⊤}

,

Λ

is the

J \times q

loading matrix of the latent factors, and

Λ_{j}^{⊤}

is the j-th row of

Λ

. The elements of

μ_{i}

follow normal or uniform distributions:

μ_{11 i} \sim N (1, 0.05)

,

μ_{22 i} \sim N (1, 0.05)

,

μ_{12 i} \sim U (- 0.1, 0.1)

and

μ_{21 i} \sim U (- 0.1, 0.1)

. We assume that

ε_{j i t}

is generated from the mixture of normal distributions

0.9 N (0, 0.3) + 0.1 N (0, 1)

. All other parameters and settings are the same as Simulation 1.
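The nonlinear transition of Simulation 3 uses the function log(1 + exp(x)), which overflows if evaluated naively for large x. The sketch below uses a numerically stable form and simulates a short latent path; the starting state and the random seed are assumptions made for the example.

```python
import math
import random


def softplus(x):
    """log(1 + exp(x)), evaluated without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def step(phi_prev, zeta):
    """One transition of the nonlinear factor dynamics of Simulation 3."""
    return (softplus(phi_prev[1]) + zeta[0],
            softplus(phi_prev[0]) + zeta[1])


rng = random.Random(1)
phi = (0.0, 0.0)  # starting state (assumption)
path = []
for _ in range(10):
    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # zeta ~ N2(0, Psi) with Psi = [[1, -0.5], [-0.5, 1]] via its Cholesky factor
    zeta = (e1, -0.5 * e1 + math.sqrt(0.75) * e2)
    phi = step(phi, zeta)
    path.append(phi)
```

Each factor is driven by the lagged value of the other factor, so the two components are coupled through the transition even though the conditional mean is no longer linear in the previous state.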

To obtain Bayesian estimates of the unknown parameters, the following hyperparameters must be specified:

Λ_{0 z j} = {(μ_{0 j}, λ_{0 j l})}^{⊤}

,

H_{0 z j} = diag (h_{01}^{z}, h_{02}^{z})

,

α_{0 j}

,

β_{0 j}

,

q_{0}

,

Ψ_{ζ 0}

,

χ_{0}^{*}

,

H_{0 χ^{*}}

,

μ_{0 z}

,

H_{0 z}

,

a_{1}

,

a_{2}

,

c_{1}

and

c_{2}

. We set

μ_{0 j} = 1.0

for

j = 1, \dots, 8

,

λ_{0 j l} = 0.8

for

j = 2, 3, 4, 6, 7, 8

and

l = 1, 2

,

h_{01}^{z} = h_{02}^{z} = 10.0

,

α_{0 j} = 11

and

ω_{0 j} = 5

for

j = 1, \dots, 8

,

q_{0} = 10

,

Ψ_{ζ 0} = 10 R_{0}^{- 1}

, where

R_{0}

is the true value of

Ψ_{ζ}

,

χ_{0}^{*} = {(- 3.0, 0.4, 0.3)}^{⊤}

,

H_{0 χ^{*}} = 10 I

,

μ_{0 z} = 0

,

H_{0 z} = 10 I

,

a_{1} = 500

,

a_{2} = 10

,

c_{1} = 100

and

c_{2}

equal to 1 and 10 for the first two and last two elements of

μ_{i}

, respectively.

We assume that

z_{j i t}

is subject to nonignorable missingness, with the associated missing indicators

r_{j i t}^{z}

generated from the following logistic regression model:

logit {\Pr (r_{j i t}^{z} = 1 | z_{i t}, χ^{*})} = χ_{z 0}^{*} + χ_{z 1}^{*} z_{j i t} + χ_{z 2}^{*} z_{j i, t - 1},

where

χ^{*} = {(χ_{z 0}^{*}, χ_{z 1}^{*}, χ_{z 2}^{*})}^{⊤}

. The true values of the parameters in

χ^{*}

are fixed at

χ_{z 0}^{*} = - 3.0

,

χ_{z 1}^{*} = 1.2

, and

χ_{z 2}^{*} = 0.5

; the expected proportion of missing

z_{j i t}

responses is about 15.7%.

We conduct 100 replications to identify the active variables and estimate the model parameters. The results for the quantiles

τ = 0.75, 0.5, 0.25

are reported in Table 3, which indicates that the Bayesian estimates produced by the proposed method perform satisfactorily.

Table 3. Performance of Bayesian parameter estimates for

τ = 0.25

, 0.5 and 0.75 in Simulation 3.

5. A Real Example

To illustrate the proposed method, we analyze the Ecological Momentary Assessment (EMA) dataset mentioned earlier. We fit the conditional quantile dynamic factor analysis model (QDFAM) to this dataset:

Q_{τ} (z_{j i t} | ϕ_{i t}) = θ_{τ j} + Λ_{τ j}^{⊤} ϕ_{i t},

ϕ_{i t} = f_{t} (ϕ_{i, t - 1}, μ_{i}, θ_{ϕ}) = (\begin{matrix} μ_{11 i} & μ_{12 i} \\ μ_{21 i} & μ_{22 i} \end{matrix}) ϕ_{i, t - 1} + ζ_{i t}, ζ_{i t} \sim N_{2} (0, Ψ_{ζ}),

for

j = 1, \dots, 8, i = 1, \dots, 174, t = 1, \dots, 52

; here,

ϕ_{i t} = {(ϕ_{1 i t}, ϕ_{2 i t})}^{⊤}

,

μ_{i} = {(μ_{11 i}, μ_{22 i}, μ_{12 i}, μ_{21 i})}^{⊤}

. Here,

θ_{τ j}

is the j-th element of the unknown parameter vector

θ_{τ}

, and

Λ_{τ j}

is the j-th row of the factor loading matrix

Λ_{τ}

, which has the same structure as in Simulation 1. Following Tang et al. [], we set the first and last thresholds

α_{j k}

(j = 1, …, 8; k = 1 and 6) of each of the eight observed ordinal items to

{Φ^{*}}^{- 1} (c_{j k})

. Here,

c_{j k}

is the observed cumulative marginal proportion of the j-th observed variable, computed over the 174 individuals and 52 measurements, for category k, i.e., the proportion of observations with

s_{j i t}

less than or equal to k. Additionally,

Φ^{*}

is the distribution function of

N (0, 1)

. We assume that the distribution of

μ_{i}

is unknown and that

z_{j i t}

is subject to missingness. The missing data mechanism is specified as follows:

logit \{\Pr (r_{j i t}^{z} = 1 | z_{i t}, φ)\} = φ_{z 0} + φ_{z 1} z_{j i t} + φ_{z 2} z_{j i, t - 1},

where

φ = {(φ_{z 0}, φ_{z 1}, φ_{z 2})}^{⊤}

; when

φ_{z 1} = φ_{z 2} = 0

, the mechanism is MAR.

The variational Bayesian method described above is used to obtain estimates and 95% confidence intervals for the parameters in

θ_{τ}

,

Λ_{τ}

,

Ψ_{ζ}

and

φ

. Table 4 displays the results for

τ = 0.5

.

Table 4. Bayesian estimates, lower and upper limits of 95% confidence intervals of parameters for

τ = 0.5

in a real example.

Following Reference [], to assess the effects of small perturbations to the data, the priors, and the sampling distribution using the Bayesian local influence measures described above, we consider the following perturbation schemes:

z_{j i t ϖ} = z_{j i t} + ϖ_{i}

,

ω_{τ ϖ} = ω_{τ}^{0} + ϖ_{ω} 1_{p}

, for the MAR mechanism,

φ_{ϖ} = ϖ_{φ}

, where

1_{p}

is a

p \times 1

vector filled with ones, and

ϖ = {(ϖ_{1}, \dots, ϖ_{n}, ϖ_{ω}, ϖ_{φ})}^{⊤}

. The absence of perturbation corresponds to

ϖ^{0} = 0

. Under these perturbation schemes, the log-likelihood function of the perturbed model is

\begin{matrix} ℓ (ϖ) & \propto & - \frac{1}{2} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sum_{j = 1}^{p} \frac{{(z_{j i t} + ϖ_{i} - θ_{τ j} - Λ_{τ j}^{⊤} ϕ_{i t} - ρ_{1} ν_{j i t})}^{2} γ_{j}}{ρ_{2} ν_{j i t}} \\ & & - \frac{1}{2} {(ω_{τ} - ω_{τ}^{0} - ϖ_{ω} 1_{p})}^{⊤} H_{0 ω}^{- 1} (ω_{τ} - ω_{τ}^{0} - ϖ_{ω} 1_{p}) \\ & & + \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sum_{j = 1}^{p} [r_{j i t}^{z} (φ_{0} + ϖ_{φ} z_{j i t} + ϖ_{φ} z_{j i, t - 1}) - log \{1 + exp (φ_{0} + ϖ_{φ} z_{j i t} + ϖ_{φ} z_{j i, t - 1})\}], \end{matrix}

where

ω_{τ}^{0}

and

H_{0 ω}

are the hyperparameters of the prior of

ω_{τ}

(i.e.,

ω_{τ} \sim N (ω_{τ}^{0}, H_{0 ω})

). It is straightforward to show that

G (ϖ^{0}) = diag (G_{D} (ϖ^{0}), g_{P} (ϖ^{0}), g_{S} (ϖ^{0}))

, where

G_{D} (ϖ^{0}) = diag (g_{11}, \dots, g_{n n})

with

g_{i i} = \sum_{t = 1}^{T} \sum_{j = 1}^{p} E {γ_{j} / (ρ_{2} ν_{j i t})} \approx T \sum_{j = 1}^{p} ω_{0 j}^{2} / {(α_{0 j} - 1) (α_{0 j} - 2) ρ_{2}}

for

i = 1, \dots, n

,

g_{P} (ϖ^{0}) = 1_{p}^{⊤} H_{0 ω}^{- 1} 1_{p}

, and

\begin{matrix} g_{S} (ϖ^{0}) & = & \frac{n T e^{φ_{0}}}{{(1 + e^{φ_{0}})}^{2}} \sum_{j = 1}^{p} [{ω_{τ j}^{0}}^{2} + h_{τ j}^{0} + (2 ρ_{1}^{2} + ρ_{2}^{2}) \frac{α_{0 j} (1 + α_{0 j})}{ω_{0 j}^{2}} + \frac{2 ρ_{1} α_{0 j} ω_{τ j}^{0}}{ω_{0 j}} \\ + trace \{\frac{Ψ_{ζ 0} (H_{0 τ} + Λ_{τ j}^{0} {Λ_{τ j}^{0}}^{⊤})}{q_{0} - q - 1}\}], \end{matrix}

in which

Λ_{τ j}^{0}

and

H_{0 τ}

represent the hyperparameters associated with the prior of

Λ_{τ j}

(i.e.,

Λ_{τ j} \sim N_{q} (Λ_{τ j}^{0}, H_{0 τ})

),

ω_{τ j}^{0}

and

h_{ω}^{0}

denote the hyperparameters linked to the prior of

ω_{τ j}

(i.e.,

ω_{τ j} \sim N (ω_{τ j}^{0}, h_{ω}^{0})

) for

j = 1, \dots, J

. The remaining hyperparameters are set to the corresponding Bayesian estimates reported in Table 4. In addition, we set

h_{ω}^{0}

at 0.5 and set

H_{0 τ} = 0.3

, and

H_{0 τ} = I

.
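Two ingredients of G(ϖ^0) can be evaluated with very little code: the prior component g_P and the logistic factor e^{φ_0} / (1 + e^{φ_0})^2 that multiplies the sum in g_S. The Python sketch below computes both. The function names are ours, and the inputs are illustrative assumptions only: H_{0ω} is taken to be 0.5 times the identity matrix of order 8, and φ_0 is set to the Table 4 estimate of φ_{z0}.

```python
import numpy as np

def g_P(H0):
    """Prior-perturbation component 1_p^T H0^{-1} 1_p."""
    ones = np.ones(H0.shape[0])
    return float(ones @ np.linalg.solve(H0, ones))

def logistic_factor(phi0):
    """e^{phi0} / (1 + e^{phi0})^2, which equals p (1 - p) with p = 1 / (1 + e^{-phi0})."""
    p = 1.0 / (1.0 + np.exp(-phi0))
    return float(p * (1.0 - p))

if __name__ == "__main__":
    n, T = 174, 52                # individuals and occasions in the EMA data
    H0 = 0.5 * np.eye(8)          # assumed prior covariance, for illustration only
    print("g_P =", g_P(H0))
    print("n T e^phi0 / (1 + e^phi0)^2 =", n * T * logistic_factor(-3.997))
```

Because the logistic factor equals p(1 − p), it is largest when the baseline missingness probability is one half and shrinks as missingness becomes rare, so a strongly negative φ_0 reduces the scale of g_S.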

In this real example, the Bayesian local influence diagnostics, which include the Bayes factor (denoted as

{FIC}_{B F, e_{j}}

),

ϕ

-divergence (represented as

{SIC}_{ϕ, e_{j}}

), and posterior mean distance (indicated as

{SIC}_{M_{d}, e_{j}}

) [], are assessed based on the parameter vector

d (θ) = (\sqrt{| ω_{τ 1} |}, \dots, \sqrt{| ω_{τ 8} |}, \sqrt{| λ_{21} |}, \sqrt{| λ_{31} |}, \sqrt{| λ_{41} |}, \sqrt{| λ_{62} |}, \sqrt{| λ_{72} |}, \sqrt{| λ_{82} |})

. The diagnostics are computed from 200 observations generated by the variational Bayesian algorithm developed above under the specified prior distributions. Figure 1 presents the resulting local influence measures together with their benchmarks. Cases 3, 121, and 128 are identified as influential at all three quantile levels,

τ

= 0.25, 0.5, and 0.75. Case 118 is influential at

τ = 0.25

and

0.5

. Case 107, in contrast, is consistently influential at quantile 0.25, regardless of the local influence measure used. Notably, the identification of

ϖ_{φ}

as influential by all three local influence measures raises concerns about the validity of the MAR assumption for the missing data.

Figure 1. Index plots of Bayesian local influence measures:

{FIC}_{B F, e_{j}}

(left panel),

{SIC}_{ϕ, e_{j}}

(middle panel) and

{SIC}_{M_{d}, e_{j}}

(right panel) for

τ = 0.25

(first row), 0.5 (second row) and 0.75 (third row) in the real example.

To assess the influence of cases 3, 118, 121, and 128, we re-estimate the unknown parameters with these individuals omitted, using the variational Bayesian approach introduced earlier together with the nonignorable missing data mechanism. The Bayesian estimates and their 95% confidence intervals for the parameters in

θ_{τ}

,

Λ_{τ}

,

Ψ_{ζ}

, and

φ

, obtained after excluding individuals 3, 118, 121, and 128, are reported in the "Without" columns of Table 4. Table 4 shows that the 95% confidence intervals for

φ_{z 1}

and

φ_{z 2}

exclude zero, with lower limits greater than zero, which supports the assumption of nonignorable missingness.
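The interval check described above can be reproduced directly from the Table 4 entries. The following Python sketch copies the estimates and interval limits for the two slope parameters of the missingness model from Table 4 and tests whether each interval excludes zero; the dictionary layout and function names are ours and serve only as an illustration.

```python
# (estimate, lower, upper) for the missingness slopes, copied from Table 4.
TABLE4_MISSINGNESS = {
    "with": {
        "varphi_z1": (0.162, 0.155, 0.175),
        "varphi_z2": (0.133, 0.128, 0.135),
    },
    "without": {
        "varphi_z1": (0.166, 0.153, 0.177),
        "varphi_z2": (0.131, 0.128, 0.135),
    },
}

def excludes_zero(lower, upper):
    """True when the interval [lower, upper] lies strictly on one side of zero."""
    return lower > 0.0 or upper < 0.0

def supports_nonignorable(intervals):
    """True when every slope interval excludes zero."""
    return all(excludes_zero(lo, hi) for _, lo, hi in intervals.values())

if __name__ == "__main__":
    for label, intervals in TABLE4_MISSINGNESS.items():
        print(label, supports_nonignorable(intervals))
```

The check gives the same answer with and without the four flagged individuals, so the conclusion about the missingness mechanism does not hinge on those cases.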

6. Discussion

This study addresses parameter estimation in quantile nonlinear dynamic latent variable models in the presence of nonignorable missing data. The models, which involve nonlinear latent variables and nonparametric priors, are analyzed within a Bayesian framework. To reduce the substantial computational demands typically associated with traditional estimation of dynamic latent variable models, we introduce a variational Bayesian method that accommodates missing data and nonlinear dynamics. A Dirichlet process (DP) prior is used to specify the unknown distribution of the random effects and is incorporated into the variational Bayesian approach. The resulting algorithm relies on the representation of the asymmetric Laplace distribution as a mixture of exponential and normal distributions. To handle missing data, we use a logistic regression model for the missingness mechanism of the observed variables and a probit regression model, formulated through a truncated normal latent variable, for the propensity scores of the missing covariates. To accommodate a mixture of continuous and discrete covariates, we specify the joint distribution of the missing covariates through a set of univariate exponential family distributions. We also adapt the Bayesian local influence technique of Zhu et al. [] to assess the robustness of quantile nonlinear dynamic latent variable models (DLVMs). Within the variational Bayesian framework, the task of approximating the posterior density becomes an optimization problem, namely maximizing the evidence lower bound. For computational efficiency, we use a coordinate ascent algorithm to maximize this lower bound.
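The mixture form of the asymmetric Laplace distribution mentioned above can be checked numerically. The Python sketch below assumes the standard unit-scale representation of Kozumi and Kobayashi, in which the error equals ρ_1 ν + sqrt(ρ_2 ν) u with ν exponential with mean one and u standard normal, ρ_1 = (1 − 2τ) / (τ(1 − τ)) and ρ_2 = 2 / (τ(1 − τ)). The scale parameter γ_j of the model is omitted, and the function names are ours.

```python
import numpy as np

def ald_mixture_constants(tau):
    """Return (rho1, rho2) of the exponential-normal mixture at quantile level tau."""
    rho1 = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    rho2 = 2.0 / (tau * (1.0 - tau))
    return rho1, rho2

def sample_ald_errors(tau, size, rng):
    """Draw unit-scale asymmetric Laplace errors through the mixture representation."""
    rho1, rho2 = ald_mixture_constants(tau)
    nu = rng.exponential(1.0, size)   # exponential mixing variable with mean 1
    u = rng.standard_normal(size)     # standard normal component
    return rho1 * nu + np.sqrt(rho2 * nu) * u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for tau in (0.25, 0.5, 0.75):
        eps = sample_ald_errors(tau, 200_000, rng)
        print(tau, float(np.mean(eps <= 0.0)))  # should be close to tau
```

The proportion of simulated errors at or below zero should be close to τ, which is the property that makes zero the τ-th quantile of the error and lets the conditional quantile be modeled through a conditionally normal likelihood.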

The empirical results show that the Bayesian estimates remain accurate across quantile levels and missing data mechanisms, and a real dataset from an EMA study illustrates the methodology. This article does not address high-dimensional random effects models or more complex direct relationships among latent variables. These problems merit separate investigation and are left for future research.

Author Contributions

Conceptualization, M.T.; writing—original draft preparation, M.T.; writing—review and editing, A.M.; supervision, M.T.; methodology, M.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Open Project of Key Laboratory of Applied Mathematics of Xinjiang Uygur Autonomous Region (grant no. 2023D04045).

Data Availability Statement

All data generated or analyzed during this study are included in this published article.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Appendix A. Calculation of the Evidence Lower Bound (ELBO)

\begin{matrix} ELBO & = E_{q} \{log p (z, z_{m i s}, Λ_{z}, ν, γ, φ, r, V, μ_{z}, Φ_{z}, Z, L, ϕ, Φ_{ζ}, ς, κ)\} \\ & - E_{q} \{log q (z_{m i s}, Λ_{z}, ν, γ, φ, r, V, μ_{z}, Φ_{z}, Z, L, ϕ, Φ_{ζ}, ς, κ)\}, \end{matrix}

where

\begin{matrix} E_{q} \{log p (z, z_{m i s}, Λ_{z}, ν, μ, γ, χ, V, μ_{z}, Φ_{z}, Z, L, ϕ, Φ_{ζ}, ς, κ)\} \\ = E_{q} \{log p (z ∣ X, ϕ, Λ, γ, ν)\} + E_{q} \{log \prod_{j = 1}^{J} p (Λ_{z_{j}} ∣ μ_{Λ_{0_{k}}}, Σ_{Λ_{0_{k}}})\} + E_{q} \{log \prod_{j = 1}^{J} p (γ_{j} ∣ m_{0}, n_{0})\} \\ + E_{q} \{log p (χ ∣ μ_{χ 0}, Σ_{χ 0})\} + E_{q} \{log \prod_{i = 1}^{n} [p (ϕ_{i 1} ∣ ϕ_{0}, Φ_{ϕ_{0}}) \prod_{t = 2}^{T} p (ϕ_{i t} ∣ ϕ_{i, t - 1}, b, Φ_{ϕ})]\} + E_{q} \{log p (Φ_{ζ} ∣ ω_{ζ_{0}}, W_{ζ_{0}})\} \\ + E_{q} \{log \prod_{s = 1}^{S} p (Z_{s} ∣ μ_{z}, Φ_{z})\} + E_{q} \{log p (μ_{z} ∣ μ_{μ_{z_{0}}}, Σ_{μ_{z}})\} + E_{q} \{log p (Φ_{z} ∣ ω_{1}, ω_{2})\} + E_{q} \{log p (L ∣ V)\} \\ + E_{q} \{log p (V ∣ κ)\} + E_{q} \{log p (κ ∣ s_{1}, s_{2})\} + E_{q} \{log p (ν ∣ γ)\} + E_{q} \{log p (ς ∣ ϕ, x)\} \end{matrix}

\begin{matrix} E_{q} \{log q (z_{m i s}, Λ_{z}, ν, μ, γ, χ, V, μ_{z}, Φ_{z}, Z, L, ϕ, Φ, ς, κ)\} \\ = E_{q} \{log [\prod_{j = 1}^{J} \prod_{i = 1}^{n} \prod_{t = 1}^{T} q (ν_{j i t}) \prod_{i = 1}^{n} \prod_{t = 1}^{T} q (z_{i t, m i s}) \prod_{i = 1}^{n} [p (ϕ_{i 1} ∣ ϕ_{0}, Φ_{ϕ_{0}}) \prod_{t = 2}^{T} q (ϕ_{i t} ∣ ϕ_{i, t - 1})] \\ \times q (μ_{z}) q (Φ_{z}) \prod_{s = 1}^{S} q (μ_{s}) \prod_{i = 1}^{n} q (L_{i}) \prod_{s = 1}^{S - 1} q (V_{s}) \prod_{j = 1}^{J} q (Λ_{z j}) \prod_{j = 1}^{J} q (σ_{j}) q (Φ_{ζ}) q (κ) q (χ) q (ς)]\} \end{matrix}

(A1)
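The way coordinate ascent raises the ELBO can be seen on a much smaller problem. The Python sketch below is not the QNDLVM algorithm: it applies mean-field coordinate ascent to a toy conjugate model (Gaussian data with unknown mean and precision under a normal-gamma prior), and every function name and default hyperparameter in it is an assumption made for illustration. The ELBO is recorded after each sweep so that its monotone increase can be inspected.

```python
import math
import numpy as np

def digamma(x):
    """Digamma via recurrence plus an asymptotic series (adequate for this toy example)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def cavi_normal(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=20):
    """Mean-field CAVI for x_i ~ N(mu, 1/tau), mu | tau ~ N(mu0, 1/(lam0 tau)), tau ~ Gamma(a0, b0).

    The variational family is q(mu) q(tau) = N(mu_n, 1/lam_n) Gamma(a_n, b_n).
    """
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, float(np.mean(x))
    log2pi = math.log(2.0 * math.pi)
    e_tau = a0 / b0                                   # initial E_q[tau]
    a_n = a0 + 0.5 * (n + 1)                          # fixed across sweeps
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)       # fixed across sweeps
    elbos = []
    for _ in range(iters):
        lam_n = (lam0 + n) * e_tau                        # update q(mu)
        ss = float(np.sum((x - mu_n) ** 2)) + n / lam_n   # E_q[sum_i (x_i - mu)^2]
        pr = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)     # E_q[lam0 (mu - mu0)^2]
        b_n = b0 + 0.5 * (ss + pr)                        # update q(tau)
        e_tau = a_n / b_n
        e_log_tau = digamma(a_n) - math.log(b_n)
        elbo = (
            0.5 * n * (e_log_tau - log2pi) - 0.5 * e_tau * ss                        # E log p(x | mu, tau)
            + 0.5 * (math.log(lam0) + e_log_tau - log2pi) - 0.5 * e_tau * pr         # E log p(mu | tau)
            + a0 * math.log(b0) - math.lgamma(a0) + (a0 - 1.0) * e_log_tau - b0 * e_tau  # E log p(tau)
            + 0.5 * (1.0 + log2pi - math.log(lam_n))                                 # entropy of q(mu)
            + a_n - math.log(b_n) + math.lgamma(a_n) + (1.0 - a_n) * digamma(a_n)    # entropy of q(tau)
        )
        elbos.append(elbo)
    return mu_n, lam_n, a_n, b_n, elbos

if __name__ == "__main__":
    data = np.random.default_rng(0).normal(2.0, 0.5, size=50)
    mu_n, lam_n, a_n, b_n, elbos = cavi_normal(data)
    print(mu_n, a_n / b_n, elbos[0], elbos[-1])
```

Each sweep updates one factor while holding the other fixed, so the recorded ELBO values cannot decrease. The same principle underlies the coordinate ascent updates for the much larger set of factors listed in this appendix.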

References

Diener, E.; Fujita, F.; Smith, H. The personality structure of affect. J. Personal. Soc. Psychol. 1995, 69, 130–141.
Chow, S.M.; Nesselroade, J.R.; Shifren, K.; McArdle, J.J. Dynamic structure of emotions among individuals with Parkinson’s disease. Struct. Equ. Model. Multidiscip. J. 2004, 11, 560–582.
Zhang, Z.Y.; Nesselroade, J.R. Bayesian Estimation of Categorical Dynamic Factor Models. Multivar. Behav. Res. 2007, 42, 729–756.
Chow, S.M.; Tang, N.S.; Yuan, Y.; Song, X.Y.; Zhu, H.T. Bayesian estimation of semiparametric dynamic latent variable models using the Dirichlet process prior. Br. J. Math. Stat. Psychol. 2011, 64, 69–106.
Tang, N.S.; Chow, S.M.; Ibrahim, J.G.; Zhu, H.T. Bayesian sensitivity analysis of a nonlinear dynamic factor analysis model with nonparametric prior and possible nonignorable missingness. Psychometrika 2017, 82, 875–903.
Wang, Z.Q.; Tang, N.S. Bayesian quantile regression with mixed discrete and nonignorable missing covariates. Bayesian Anal. 2020, 15, 579–604.
Tuerde, M.; Tang, N.S. Bayesian semiparametric approach to quantile nonlinear dynamic factor analysis models with mixed ordered and nonignorable missing data. Statistics 2022, 56, 1166–1192.
Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; John Wiley & Sons: New York, NY, USA, 2019.
Bańbura, M.; Modugno, M. Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data. J. Appl. Econ. 2014, 29, 133–160.
Jungbacker, B.; Koopman, S.J.; van der Wel, M. Maximum likelihood estimation for dynamic factor models with missing data. J. Econ. Dyn. Control 2011, 35, 1358–1368.
Stock, J.H.; Watson, M.W. Macroeconomic Forecasting Using Diffusion Indexes. J. Bus. Econ. Stat. 2002, 20, 147–162.
Stock, J.H.; Watson, M.W. Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics. In Handbook of Macroeconomics; Elsevier B.V.: Amsterdam, The Netherlands, 2016; pp. 415–525.
Kozumi, H.; Kobayashi, G. Gibbs sampling methods for Bayesian quantile regression. J. Stat. Comput. Simul. 2011, 81, 1565–1578.
Tang, A.M.; Tang, N.S. Semiparametric Bayesian inference on skew-normal joint modeling of multivariate longitudinal and survival data. Stat. Med. 2015, 34, 824–843.
Blei, D.; Jordan, M.I. Variational inference for Dirichlet process mixtures. Bayesian Anal. 2006, 1, 121–143.
Lee, S.Y.; Tang, N.S. Analysis of nonlinear structural equation models with nonignorable missing covariates and ordered categorical data. Stat. Sin. 2006, 16, 1117–1141.
Beal, M.J. Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, University of London, London, UK, 2003.
Bishop, C. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877.
Durante, D.; Rigon, T. Conditionally Conjugate Mean-Field Variational Bayes for Logistic Models. Stat. Sci. 2019, 34, 472–485.
Zhu, H.T.; Ibrahim, J.G.; Tang, N.S. Bayesian influence analysis: A geometric approach. Biometrika 2011, 98, 307–323.

Table 1. Performance of Bayesian parameter estimates for

τ = 0.25, 0.5

and 0.75 in Simulation 1.

Par	Case 1						Case 2
	$τ = 0.25$		$τ = 0.5$		$τ = 0.75$		$τ = 0.25$		$τ = 0.5$		$τ = 0.75$
	Bias	RMS	Bias	RMS	Bias	RMS	Bias	RMS	Bias	RMS	Bias	RMS
$λ_{21}$	0.016	0.033	−0.008	0.024	−0.019	0.031	0.007	0.032	−0.028	0.043	−0.057	0.070
$λ_{31}$	0.016	0.032	−0.008	0.024	−0.020	0.031	0.004	0.031	−0.029	0.045	−0.060	0.075
$λ_{41}$	0.017	0.031	−0.009	0.028	−0.020	0.031	0.002	0.032	−0.029	0.044	−0.060	0.075
$λ_{62}$	0.015	0.031	−0.008	0.026	−0.019	0.031	0.007	0.033	−0.029	0.045	−0.062	0.076
$λ_{72}$	0.019	0.037	−0.007	0.024	−0.018	0.030	0.011	0.030	−0.030	0.044	−0.064	0.078
$λ_{82}$	0.021	0.036	−0.008	0.028	−0.017	0.031	0.007	0.030	−0.031	0.044	0.062	0.077
$θ_{1}$	0.042	0.113	0.004	0.112	−0.029	0.123	0.109	0.154	0.101	0.152	0.077	0.157
$θ_{2}$	0.047	0.100	0.007	0.093	−0.027	0.102	0.095	0.131	0.062	0.111	0.029	0.112
$θ_{3}$	0.048	0.097	0.001	0.087	−0.030	0.102	0.097	0.132	0.061	0.110	0.033	0.112
$θ_{4}$	0.046	0.099	0.002	0.094	−0.031	0.101	0.093	0.129	0.066	0.110	0.027	0.116
$θ_{5}$	0.034	0.116	0.007	0.104	−0.030	0.123	0.116	0.157	0.105	0.151	0.095	0.163
$θ_{6}$	0.040	0.102	0.002	0.099	−0.036	0.093	0.098	0.131	0.068	0.109	0.048	0.109
$θ_{7}$	0.042	0.104	0.005	0.099	−0.033	0.095	0.100	0.133	0.066	0.106	0.048	0.107
$θ_{8}$	0.044	0.103	0.003	0.101	−0.032	0.092	0.097	0.128	0.067	0.105	0.042	0.109
$ζ_{11}$	0.017	0.079	0.029	0.081	0.083	0.117	−0.017	0.079	0.068	0.104	0.029	0.061
$ζ_{21}$	0.013	0.046	0.001	0.045	−0.008	0.049	0.014	0.045	−0.002	0.053	−0.027	0.065
$ζ_{22}$	0.014	0.066	0.037	0.078	0.075	0.110	−0.018	0.079	0.074	0.109	0.291	0.318
$χ_{z 0}$	−0.167	0.293	−0.012	0.090	0.042	0.084	−0.059	0.121	0.141	0.163	0.129	0.155
$χ_{z 1}$	0.165	0.275	0.043	0.078	−0.051	0.083	0.087	0.102	−0.103	0.111	−0.167	0.182
$χ_{z 2}$	−0.168	0.272	−0.091	0.101	−0.027	0.059	−0.134	0.139	−0.017	0.038	0.028	0.051
Par	Case3
	$τ = 0.25$		$τ = 0.5$		$τ = 0.75$
	Bias	RMS	Bias	RMS	Bias	RMS
$λ_{21}$	0.017	0.040	−0.006	0.026	−0.024	0.037
$λ_{31}$	0.019	0.040	−0.006	0.024	−0.019	0.035
$λ_{41}$	0.021	0.039	−0.010	0.029	−0.018	0.032
$λ_{62}$	0.025	0.041	−0.005	0.025	−0.015	0.036
$λ_{72}$	0.025	0.040	−0.008	0.026	−0.018	0.034
$λ_{82}$	0.026	0.041	−0.006	0.026	−0.019	0.037
$θ_{1}$	0.019	0.114	−0.027	0.125	−0.043	0.121
$θ_{2}$	0.029	0.101	−0.021	0.103	−0.042	0.101
$θ_{3}$	0.025	0.101	−0.018	0.097	−0.044	0.097
$θ_{4}$	0.033	0.101	−0.020	0.103	−0.044	0.097
$θ_{5}$	0.036	0.118	0.023	0.120	−0.019	0.128
$θ_{6}$	0.042	0.103	0.016	0.092	−0.020	0.100
$θ_{7}$	0.041	0.104	0.012	0.093	−0.017	0.100
$θ_{8}$	0.037	0.098	0.015	0.094	−0.012	0.097
$ζ_{11}$	0.013	0.076	0.033	0.075	0.086	0.117
$ζ_{21}$	0.019	0.051	−0.003	0.045	−0.004	0.054
$ζ_{22}$	0.001	0.076	0.032	0.082	0.084	0.115
$χ_{z 0}$	−0.189	0.234	−0.020	0.084	0.027	0.086
$χ_{z 1}$	0.110	0.204	0.035	0.069	−0.041	0.078
$χ_{z 2}$	−0.202	0.241	−0.080	0.093	−0.028	0.057

Table 2. Performance of Bayesian parameter estimates for

τ = 0.25

, 0.5 and 0.75 in Simulation 2.

Par	$τ = 0.25$		$τ = 0.5$		$τ = 0.75$		Par	$τ = 0.25$		$τ = 0.5$		$τ = 0.75$
Par	Bias	RMS	Bias	RMS	Bias	RMS	Par	Bias	RMS	Bias	RMS	Bias	RMS
$λ_{21}$	−0.003	0.023	−0.006	0.027	0.002	0.025	$α_{01}$	−0.078	0.134	0.083	0.121	0.087	0.132
$λ_{31}$	0.005	0.022	−0.007	0.031	−0.002	0.024	$α_{02}$	0.086	0.133	−0.084	0.123	0.088	0.131
$λ_{41}$	−0.007	0.025	−0.012	0.029	0.005	0.025	$α_{03}$	−0.111	0.151	0.124	0.108	−0.088	0.133
$λ_{62}$	0.011	0.021	0.006	0.026	−0.003	0.024	$α_{04}$	0.093	0.152	−0.126	0.123	0.077	0.117
$λ_{72}$	0.006	0.025	0.003	0.027	−0.002	0.026	$α_{05}$	0.087	0.142	0.084	0.116	0.088	0.124
$λ_{82}$	0.013	0.026	−0.005	0.028	0.005	0.028	$α_{06}$	−0.093	0.132	0.084	0.124	−0.084	0.116
$ω_{01}$	0.088	0.133	−0.081	0.144	0.084	0.146	$α_{07}$	−0.089	0.161	−0.089	0.133	0.087	0.139
$ω_{02}$	0.094	0.146	−0.099	0.139	0.088	0.144	$α_{08}$	−0.092	0.144	0.067	0.125	0.081	0.129
$ω_{03}$	0.083	0.127	−0.106	0.154	0.091	0.131	$α_{11}$	−0.015	0.034	0.015	0.021	−0.015	0.022
$ω_{04}$	0.102	0.139	0.115	0.167	0.085	0.145	$α_{12}$	−0.011	0.031	0.017	0.027	−0.021	0.029
$ω_{05}$	0.087	0.142	0.083	0.131	0.088	0.153	$α_{13}$	−0.017	0.032	0.011	0.025	−0.021	0.034
$ω_{06}$	0.094	0.131	0.087	0.133	0.104	0.152	$α_{14}$	−0.014	0.033	0.017	0.031	−0.009	0.022
$ω_{07}$	0.086	0.119	0.072	0.126	0.085	0.139	$α_{15}$	−0.016	0.026	0.010	0.021	−0.016	0.030
$ω_{08}$	0.077	0.122	0.082	0.124	0.064	0.125	$α_{16}$	−0.011	0.017	0.012	0.022	−0.013	0.024
$ω_{11}$	0.075	0.149	0.074	0.123	0.066	0.127	$α_{17}$	−0.013	0.022	0.016	0.022	−0.016	0.027
$ω_{12}$	0.058	0.123	0.061	0.117	0.071	0.111	$α_{18}$	−0.015	0.026	0.011	0.021	−0.017	0.025
$ω_{13}$	0.062	0.119	0.066	0.112	0.059	0.114	$α_{21}$	0.084	0.143	−0.092	0.173	−0.065	0.166
$ω_{14}$	0.061	0.125	0.064	0.094	0.062	0.116	$α_{22}$	−0.085	0.155	−0.089	0.146	−0.051	0.147
$ω_{15}$	0.055	0.122	0.073	0.124	0.077	0.127	$α_{23}$	−0.023	0.095	−0.055	0.176	−0.073	0.164
$ω_{16}$	0.054	0.094	0.075	0.124	0.073	0.126	$α_{24}$	0.044	0.151	0.088	0.181	−0.068	0.151
$ω_{17}$	0.057	0.091	0.063	0.119	0.069	0.115	$α_{25}$	−0.036	0.166	−0.063	0.154	0.091	0.172
$ω_{18}$	0.051	0.087	0.061	0.111	0.065	0.083	$α_{26}$	−0.073	0.153	−0.077	0.159	−0.071	0.178
$ω_{21}$	0.092	0.114	0.085	0.089	0.067	0.083	$α_{27}$	−0.084	0.155	−0.069	0.178	−0.063	0.147
$ω_{22}$	−0.082	0.091	0.067	0.077	−0.066	0.065	$α_{28}$	−0.067	0.148	−0.084	0.179	−0.067	0.151
$ω_{23}$	−0.083	0.092	0.066	0.076	−0.063	0.071	$α_{31}$	−0.092	0.144	0.076	0.131	−0.088	0.136
$ω_{24}$	−0.088	0.093	0.063	0.075	−0.062	0.068	$α_{32}$	−0.091	0.148	0.082	0.129	−0.111	0.157
$ω_{25}$	0.103	0.098	0.077	0.083	0.071	0.091	$α_{33}$	0.108	0.155	−0.084	0.126	−0.087	0.143
$ω_{26}$	−0.083	0.088	0.053	0.075	0.068	0.073	$α_{34}$	−0.091	0.149	−0.079	0.138	0.103	0.156
$ω_{27}$	−0.077	0.085	0.064	0.074	−0.053	0.064	$α_{35}$	−0.091	0.151	0.087	0.134	0.091	0.146
$ω_{28}$	−0.082	0.092	0.061	0.073	−0.062	0.075	$α_{36}$	−0.084	0.144	0.082	0.135	−0.092	0.147
							$α_{37}$	0.103	0.155	−0.088	0.141	−0.096	0.137
							$α_{38}$	0.097	0.154	0.077	0.135	−0.102	0.154
Par	$τ = 0.25$		$τ = 0.5$		$τ = 0.75$		Par	$τ = 0.25$		$τ = 0.5$		$τ = 0.75$
Par	Bias	RMS	Bias	RMS	Bias	RMS	Par	Bias	RMS	Bias	RMS	Bias	RMS
$η_{111}$	0.053	0.116	0.043	0.111	0.054	0.115	$η_{522}$	0.086	0.152	0.088	0.156	0.103	0.156
$η_{121}$	0.050	0.111	0.047	0.109	0.052	0.116	$η_{523}$	0.074	0.149	0.083	0.166	0.088	0.161
$η_{122}$	0.052	0.116	0.041	0.112	0.053	0.112	$η_{611}$	0.077	0.133	0.081	0.154	0.087	0.148
$η_{123}$	0.066	0.110	0.053	0.106	0.053	0.113	$η_{621}$	0.082	0.154	0.087	0.157	0.112	0.157
$η_{211}$	0.055	0.111	0.051	0.108	0.233	0.112	$η_{622}$	0.085	0.166	0.091	0.163	0.087	0.166
$η_{221}$	0.052	0.112	0.052	0.111	0.222	0.111	$η_{623}$	0.096	0.171	0.084	0.166	0.095	0.168
$η_{222}$	0.051	0.115	0.046	0.104	0.057	0.123	$η_{711}$	0.054	0.141	0.078	0.141	0.077	0.153
$η_{223}$	0.045	0.116	0.042	0.112	0.056	0.122	$η_{721}$	0.077	0.152	0.044	0.143	0.067	0.145
$η_{311}$	0.062	0.156	0.054	0.143	0.078	0.154	$η_{722}$	0.067	0.144	0.058	0.145	0.068	0.144
$η_{321}$	0.077	0.161	0.071	0.151	0.088	0.155	$η_{723}$	0.066	0.138	0.071	0.144	0.061	0.143
$η_{322}$	0.067	0.145	0.066	0.148	0.072	0.152	$η_{811}$	0.062	0.167	0.068	0.138	0.075	0.156
$η_{323}$	0.077	0.148	0.083	0.152	0.090	0.167	$η_{821}$	0.048	0.144	0.069	0.145	0.073	0.148
$η_{411}$	0.074	0.150	0.071	0.161	0.076	0.161	$η_{822}$	0.067	0.156	0.059	0.144	0.076	0.149
$η_{421}$	0.089	0.145	0.073	0.151	0.073	0.155	$η_{823}$	0.081	0.152	0.066	0.137	0.077	0.156
$η_{422}$	0.092	0.156	0.081	0.152	0.089	0.154	$ζ_{11}$	0.083	0.146	0.087	0.132	0.078	0.115
$η_{423}$	0.082	0.155	0.087	0.153	0.077	0.151	$ζ_{21}$	0.049	0.123	0.067	0.112	0.061	0.135
$η_{511}$	0.095	0.167	0.082	0.154	0.086	0.158	$ζ_{22}$	0.088	0.148	0.083	0.145	0.092	0.121
$η_{521}$	0.067	0.158	0.088	0.166	0.092	0.153

Table 3. Performance of Bayesian parameter estimates for

τ = 0.25

, 0.5 and 0.75 in Simulation 3.

Par	$τ = 0.25$		$τ = 0.5$		$τ = 0.75$
Par	Bias	RMS	Bias	RMS	Bias	RMS
$λ_{21}$	0.023	0.052	0.022	0.041	0.028	0.053
$λ_{31}$	0.026	0.054	0.024	0.039	0.027	0.054
$λ_{41}$	0.027	0.055	0.022	0.038	0.026	0.053
$λ_{62}$	0.022	0.052	0.024	0.042	0.022	0.052
$λ_{72}$	0.021	0.051	0.021	0.038	0.027	0.055
$λ_{82}$	0.025	0.057	0.023	0.037	0.029	0.061
$θ_{1}$	0.051	0.142	0.035	0.133	0.066	0.125
$θ_{2}$	0.047	0.121	0.029	0.118	0.058	0.111
$θ_{3}$	0.044	0.123	0.027	0.117	0.053	0.112
$θ_{4}$	0.046	0.125	0.021	0.111	0.054	0.113
$θ_{5}$	0.052	0.144	0.037	0.139	0.068	0.129
$θ_{6}$	0.048	0.125	0.027	0.115	0.052	0.112
$θ_{7}$	0.049	0.124	0.025	0.113	0.054	0.116
$θ_{8}$	0.048	0.126	0.024	0.112	0.051	0.111
$ζ_{11}$	0.087	0.131	0.047	0.089	0.107	0.132
$ζ_{21}$	0.066	0.128	0.033	0.078	0.083	0.103
$ζ_{22}$	0.091	0.133	0.051	0.104	0.091	0.128
$χ_{z 0}^{*}$	0.562	0.521	0.033	0.091	0.046	0.103
$χ_{z 1}^{*}$	0.494	0.511	0.037	0.089	0.059	0.113
$χ_{z 2}^{*}$	0.326	0.367	0.106	0.126	0.044	0.073

Table 4. Bayesian estimates, lower and upper limits of 95% confidence intervals of parameters for

τ = 0.5

in a real example.

Parameter	With cases 3, 118, 121, 128			Without cases 3, 118, 121, 128
	Estimate	Lower	Upper	Estimate	Lower	Upper
$λ_{21}$	0.891	0.886	0.893	0.897	0.895	0.901
$λ_{31}$	0.891	0.890	0.897	0.896	0.893	0.899
$λ_{41}$	0.664	0.663	0.666	0.664	0.662	0.665
$λ_{62}$	1.211	1.207	1.238	1.194	1.189	1.207
$λ_{72}$	0.730	0.719	0.731	0.725	0.719	0.728
$λ_{82}$	0.861	0.855	0.866	0.868	0.862	0.871
$θ_{1}$	−1.265	−1.262	−1.257	−1.261	−1.267	−1.256
$θ_{2}$	−0.263	−0.266	−0.261	−0.264	−0.269	−0.261
$θ_{3}$	−0.417	−0.415	−0.412	−0.415	−0.420	−0.411
$θ_{4}$	−0.873	−0.881	−0.872	−0.888	−0.892	−0.884
$θ_{5}$	−0.745	−0.749	−0.740	−0.746	−0.748	−0.744
$θ_{6}$	−0.652	−0.659	−0.653	−0.653	−0.661	−0.653
$θ_{7}$	−0.788	−0.792	−0.781	−0.783	−0.794	−0.776
$θ_{8}$	−0.481	−0.486	−0.480	−0.484	−0.487	−0.479
$ζ_{11}$	0.532	0.528	0.535	0.541	0.539	0.543
$ζ_{21}$	−0.025	−0.026	−0.023	−0.025	−0.025	−0.024
$ζ_{22}$	0.117	0.116	0.119	0.116	0.115	0.117
$φ_{z 0}$	−3.997	−4.047	−3.998	−4.105	−4.129	−4.054
$φ_{z 1}$	0.162	0.155	0.175	0.166	0.153	0.177
$φ_{z 2}$	0.133	0.128	0.135	0.131	0.128	0.135

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
