Information Geometry of the Exponential Family of Distributions with Progressive Type-II Censoring

In geometry and topology, a family of probability distributions can be analyzed as the points of a manifold, known as a statistical manifold, whose intrinsic coordinates correspond to the parameters of the distribution. Considering the exponential family of distributions with progressive Type-II censoring as the manifold of a statistical model, we use information geometry methods to investigate geometric quantities such as the tangent space, the Fisher metric tensors, the affine connection and the α-connection of the manifold. As an application of these geometric quantities, the asymptotic expansions of the posterior density function and the posterior Bayesian predictive density function of the manifold are discussed. The results show that the asymptotic expansions are related to the coefficients of the α-connections and the metric tensors, and that the predictive density function is the estimative density function in an asymptotic sense. The main results are illustrated by considering the Rayleigh distribution.


Introduction
From the geometrical viewpoint, a parametric statistical model can be considered as a differentiable manifold, and the parameter space can be regarded as a coordinate system of the manifold [1,2]. Let $\mathcal{F}=\{f(x;\theta),\ \theta\in\Theta\}$ be a parametric statistical model with respect to some σ-finite reference measure µ, where θ is a real k-dimensional parameter vector belonging to some open subset Θ of the k-dimensional real space $\mathbb{R}^k$. For simplicity, a random variable X and its observed value x are both denoted by x in this paper.
When the density function $f(x;\theta)$ is sufficiently smooth (differentiable) in θ, it is natural to introduce the structure of a k-dimensional manifold in the statistical model $\mathcal{F}$, where θ plays the role of a coordinate system. Geometrical quantities such as connections, divergences, flatness, curvature and the tangent space play a fundamental role in statistical inference and asymptotic theory (see, for example, Komaki [3,4] and Harsha and Moosath [5]).
In reliability engineering, a life-testing experiment is one of the effective ways to obtain reliability information about a product. To save time and reduce the cost of a life-testing experiment, censoring methodologies are often applied so that the experiment is terminated before all the items on test fail. Commonly used censoring schemes include the Type-I censoring scheme, in which the experiment is terminated at a prefixed time point, and the Type-II censoring scheme, in which the experiment is terminated as soon as the m-th failure (m prefixed) is observed. In other words, the experimental time is prefixed under Type-I censoring, whereas the number of observed failures is prefixed under Type-II censoring (see, for example, Ng [6]). The Type-I and Type-II censoring schemes have been generalized to more complicated and flexible censoring schemes, such as progressive censoring schemes [7][8][9] and hybrid censoring schemes [10,11]. Progressive Type-II censoring extends the conventional Type-II censoring scheme to situations in which censoring occurs in multiple stages. A progressive Type-II censored life-testing experiment is carried out as follows. Suppose that n items are placed on a life-testing experiment and that their lifetimes follow a distribution with density function $f(x;\theta)$. It is planned that m failures will be observed, and $R_r$ items are randomly removed (i.e., censored) from the experiment at the time of the r-th failure.
More specifically, at the time of the first failure (denoted by $X_{1:m:n}$), $R_1$ randomly selected items are removed from the $n-1$ surviving items; the experiment then continues, and at the time of the second failure (denoted by $X_{2:m:n}$), $R_2$ randomly selected items are removed from the $n-R_1-2$ surviving items, and so on. Finally, at the time of the m-th failure (denoted by $X_{m:m:n}$), the experiment terminates and all the remaining $R_m=n-m-\sum_{r=1}^{m-1}R_r$ surviving items are censored. Here, $R=(R_1,R_2,\ldots,R_m)$ with $\sum_{r=1}^{m}R_r=n-m$ is the progressive Type-II censoring scheme for the life-testing experiment. Note that when $R_1=R_2=\cdots=R_{m-1}=0$ and $R_m=n-m$, the progressive Type-II censoring scheme reduces to the conventional Type-II censoring scheme.
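The censoring procedure above can be sketched directly in code. The following Python function is a minimal illustration, not code from the paper: the name `progressive_type2_sample` is hypothetical, it assumes all n lifetimes are known in advance (as in a simulation), and it withdraws the $R_r$ survivors uniformly at random at the r-th failure.

```python
import random

def progressive_type2_sample(lifetimes, scheme, rng=random):
    """Simulate a progressive Type-II censored life test (hypothetical helper).

    lifetimes: the n lifetimes of the items placed on test.
    scheme:    censoring scheme R = (R_1, ..., R_m), non-negative, sum(R) = n - m.
    Returns the observed failure times x_{1:m:n} <= ... <= x_{m:m:n}.
    """
    n, m = len(lifetimes), len(scheme)
    if any(r < 0 for r in scheme) or sum(scheme) != n - m:
        raise ValueError("scheme must be non-negative with sum(R) == n - m")
    on_test = sorted(lifetimes)           # items still on test, in failure order
    observed = []
    for r_count in scheme:
        observed.append(on_test.pop(0))   # next failure: smallest remaining lifetime
        for _ in range(r_count):          # withdraw R_r survivors at random
            on_test.pop(rng.randrange(len(on_test)))
    return observed

rng = random.Random(1)
life = [rng.expovariate(1.0) for _ in range(10)]
print(progressive_type2_sample(life, [2, 0, 0, 4], rng))   # m = 4, n = 10
```

Removing random elements from a sorted list keeps it sorted, so the next failure is always the first remaining entry; with $R=(0,\ldots,0,n-m)$ the function returns the m smallest lifetimes, i.e. conventional Type-II censoring.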
Since the comprehensive study of information geometry by Amari [1], information geometry has been used productively in many research fields, including statistical learning, machine learning, neural networks, signal processing and information theory (see, for example, Amari et al. [2] and Amari [12]). Information geometry methods are also widely used in statistics and reliability engineering. For example, Zhang et al. [13] discussed the Amari-Chentsov structure on the accelerated life test model, with applications to optimal designs under different optimality criteria. Information geometry methods have also been employed to investigate Bayesian prediction by taking α-divergences as loss functions [14]. In degradation modeling, a robust parameter estimation method was proposed in [15] by minimizing the f-divergence between the true model and the suggested models.
In this paper, we investigate the tangent space, affine connection, α-connection, torsion and Riemann-Christoffel curvature of the manifold of the exponential family of distributions with a progressive Type-II censoring scheme. These geometric quantities can be applied in different areas of statistics, such as Bayesian analysis. One of the challenges of Bayesian analysis is to calculate the integrals needed to obtain the posterior distribution, especially when the number of parameters is large. Instead of approximating those integrals numerically, the geometric quantities developed in this paper provide an efficient theoretical way to approximate the integrals involved in Bayesian prediction. The main contributions and the organization of this paper are as follows:
• Asymptotic theory, which considers the properties of statistical procedures as the sample size increases, plays an important role in statistical inference. Geometrically, a manifold is approximated locally by its tangent space, which gives a local linearization. Thus, the tangent space of the manifold of the exponential family of distributions with progressively Type-II censored data is discussed in Section 2.
• The local linearization accounts only for local properties of a statistical model. To study the global or large-scale properties of the model, it is necessary to investigate the Fisher metric tensors, the affine connection and the α-connection of the manifold. Therefore, these geometric quantities are studied in Section 3.
• As an application of the geometric quantities, the asymptotic expansions of the posterior density and the posterior Bayesian predictive density of the model are provided in Section 4.
• To illustrate the results presented in this paper, the Rayleigh distribution is considered as an example in Section 5. Moreover, Monte Carlo simulation results and a real data analysis are presented in Section 6 to illustrate the main results.

The Statistical Model and Tangent Space
In this paper, we adopt the Einstein summation convention; that is, if an index occurs both as a superscript and as a subscript in a single expression, summation over that index is implied. For a density function $f(x;\theta)\in\mathcal{F}$, let $l(x;\theta)=\log f(x;\theta)$ and $\partial_i=\partial/\partial\theta^i$. We use the following definitions (see [1,2] for more details):
$g_{ij}(\theta)=E[\partial_i l\,\partial_j l]$: the Fisher metric tensors, whose inverse is denoted by $g^{ij}(\theta)$;
$T_{ijk}(\theta)=E[\partial_i l\,\partial_j l\,\partial_k l]$: the skewness tensor;
$\Gamma_{ijk}(\theta)=E[\partial_i\partial_j l\,\partial_k l]$: the affine connection;
$\Gamma^{\alpha}_{ijk}(\theta)=\Gamma_{ijk}(\theta)+\frac{1-\alpha}{2}T_{ijk}(\theta)$: the α-connection.
The (−1)-connection and the 1-connection are called the m-connection and the e-connection, denoted by $\Gamma^{m}_{ijk}$ and $\Gamma^{e}_{ijk}$, respectively. We also abbreviate some geometric terms by multiplying by the metric tensors, i.e., $T_i=T_{ijk}g^{jk}$, $\Gamma^{l}_{ij}=\Gamma_{ijk}g^{kl}$ and $\Gamma^{\alpha,l}_{ij}=\Gamma^{\alpha}_{ijk}g^{kl}$.
Suppose that $\mathcal{F}=\{f(x;\theta),\ \theta\in\Theta\}$ is an exponential family of distributions (see, for example, Barndorff-Nielsen [16]) with density function given in Equation (1) and reliability function $1-F(x;\theta)$, where τ is the number of functions of the parameter vector θ, $F(x;\theta)$ is the cumulative distribution function, and ψ(θ) is the cumulant generating function; here $\alpha_i(\theta)$ and $\beta_i(\theta)$ are smooth functions of the parameter vector θ, and $c_i$ and $d_i$ are smooth functions of the random variable x. Two members of the exponential family of distributions are the exponential and the Rayleigh distributions:
• Exponential distribution with density function $f(x;\lambda)=\lambda e^{-\lambda x}$ and reliability function $1-F(x;\lambda)=e^{-\lambda x}$, $x>0$: we have τ = 1, $\alpha_1(\lambda)=\beta_1(\lambda)=-\lambda$, $c_1(x)=d_1(x)=x$, $\psi(\theta)=-\ln\lambda$ and $\varphi(\theta)=0$. The dimension of the parameter vector θ is k = 1.
• Rayleigh distribution with density function given in Equation (3) and the corresponding reliability function: we have τ = 2.
Consider the life-testing experiment with progressive Type-II censoring described in Section 1, in which n items are placed on the life-testing experiment and m failures are planned to be observed. Denote the set of all admissible progressive Type-II censoring schemes (PCSs) by
$$\mathrm{PC}(m,n)=\Big\{R=(R_1,\ldots,R_m)\in\mathbb{N}_0^m:\ \sum_{r=1}^{m}R_r=n-m\Big\},$$
where $\mathbb{N}_0$ is the set of non-negative integers. Under a given censoring scheme $R=(R_1,\ldots,R_m)\in\mathrm{PC}(m,n)$, the set of progressively Type-II censored order statistics is denoted by $x^R_{m:n}=\{x^R_{1:m:n},\ldots,x^R_{m:m:n}\}$. The PCS $R=(R_1,\ldots,R_m)$ is prefixed before the life-testing experiment starts.
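The definitions of $g_{ij}$, $T_{ijk}$, $\Gamma_{ijk}$ and $\Gamma^{\alpha}_{ijk}=\Gamma_{ijk}+\frac{1-\alpha}{2}T_{ijk}$ (as in [1,2]) can be checked numerically for the exponential example, where a single observation has $l=\ln\lambda-\lambda x$. The sketch below is an illustration only: the function name is hypothetical and each expectation is replaced by a Monte Carlo average. The exact values are $g_{11}=1/\lambda^2$, $T_{111}=-2/\lambda^3$, $\Gamma_{111}=0$ and $\Gamma^{\alpha}_{111}=-(1-\alpha)/\lambda^3$.

```python
import random

def mc_geometry_exponential(lam, alpha, n_draws=200_000, seed=0):
    """Monte Carlo estimates of g_11, T_111, Gamma_111 and Gamma^alpha_111 for one
    observation from f(x; lam) = lam * exp(-lam * x) (hypothetical helper)."""
    rng = random.Random(seed)
    g = t = gam = 0.0
    for _ in range(n_draws):
        x = rng.expovariate(lam)      # draw from the exponential model (rate lam)
        d1 = 1.0 / lam - x            # d/d lam of l = log(lam) - lam * x
        d2 = -1.0 / lam ** 2          # second derivative (does not depend on x)
        g += d1 * d1                  # accumulates E[(d1 l)^2]
        t += d1 ** 3                  # accumulates E[(d1 l)^3]
        gam += d2 * d1                # accumulates E[(d2 l)(d1 l)]
    g, t, gam = g / n_draws, t / n_draws, gam / n_draws
    return g, t, gam, gam + 0.5 * (1.0 - alpha) * t

lam = 2.0
print(mc_geometry_exponential(lam, alpha=-1.0))    # m-connection case (alpha = -1)
print(1 / lam**2, -2 / lam**3, 0.0, -2 / lam**3)   # exact values for comparison
```

Because $\partial^2 l$ is non-random for the exponential family, $\Gamma_{111}$ is a constant times $E[\partial l]=0$, which is why only the skewness term survives in the α-connection.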
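Suppose that the lifetimes of the items in the life-testing experiment follow a distribution in the exponential family of distributions with density function in Equation (1). Then the joint density function of the observed data $x^R_{m:n}$ can be expressed as [8,10]
$$L(x^R_{m:n};\theta)=C\prod_{r=1}^{m}f(x^R_{r:m:n};\theta)\big[1-F(x^R_{r:m:n};\theta)\big]^{R_r},\qquad(4)$$
where $C=n(n-R_1-1)(n-R_1-R_2-2)\cdots\big(n-\sum_{r=1}^{m-1}(R_r+1)\big)$ is the normalizing constant. By defining the new random variables $x^R_{i;r:m:n}=e_i(x^R_{r:m:n})$, the joint density function in Equation (4) can be expressed in the form of Equation (5). The parameter θ of this form is called the natural parameter of the joint density function of the exponential family of distributions with progressive Type-II censoring. The tangent space $T_\theta$ of the manifold of the function $L(x^R_{m:n};\theta)$ is spanned by the vectors $\partial_i=\partial/\partial\theta^i$, and the set $\{\partial_i\}$ is called the natural basis associated with the coordinate system θ. Let
$$l(x^R_{m:n};\theta)=\theta^i\sum_{r=1}^{m}e_i(x^R_{r:m:n})-m\phi(\theta),$$
and let $T^{(1)}_\theta$ be the linear space of random variables spanned by $\partial_i l(x^R_{m:n};\theta)$. The space $T^{(1)}_\theta$ is called the 1-representation of the tangent space with progressively Type-II censored data. Here, the basis of the 1-representation is given by
$$\partial_i l(x^R_{m:n};\theta)=\sum_{r=1}^{m}e_i(x^R_{r:m:n})-m\,\partial_i\phi(\theta),$$
and the second- and third-order derivatives of $l(x^R_{m:n};\theta)$ are given by
$$\partial_i\partial_j l(x^R_{m:n};\theta)=-m\,\partial_i\partial_j\phi(\theta),\qquad \partial_i\partial_j\partial_k l(x^R_{m:n};\theta)=-m\,\partial_i\partial_j\partial_k\phi(\theta).$$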
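The joint density in Equation (4) is straightforward to evaluate numerically. A minimal Python sketch follows; the function name is hypothetical, the user supplies the log-density and log-survival functions of the lifetime model, and the constant $C$ is built step by step from the number of items still at risk before each failure.

```python
import math

def progressive_log_likelihood(x, scheme, log_pdf, log_sf):
    """log of Equation (4): log C + sum_r [log f(x_r) + R_r * log(1 - F(x_r))].

    x:       observed failure times x_{1:m:n} <= ... <= x_{m:m:n}
    scheme:  censoring scheme (R_1, ..., R_m)
    log_pdf: callable returning log f(x; theta)
    log_sf:  callable returning log(1 - F(x; theta))
    """
    if len(x) != len(scheme):
        raise ValueError("need one censoring number R_r per observed failure")
    at_risk = len(x) + sum(scheme)          # n items at the start of the test
    log_c = 0.0
    for r_count in scheme:
        log_c += math.log(at_risk)           # factor n - sum_{j<r} (R_j + 1)
        at_risk -= r_count + 1               # one failure plus R_r withdrawals
    return log_c + sum(log_pdf(xi) + r * log_sf(xi) for xi, r in zip(x, scheme))

# Exponential model with rate lam (illustrative numbers, not data from the paper):
lam = 1.5
print(progressive_log_likelihood([0.1, 0.4, 0.9], [1, 0, 2],
                                 lambda t: math.log(lam) - lam * t,
                                 lambda t: -lam * t))
```

For the exponential model the result is $\log C+m\ln\lambda-\lambda\sum_r(1+R_r)x^R_{r:m:n}$, which is linear in λ, mirroring the natural-parameter form of Equation (5).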

The α-Connections of Manifold Model
In this section, we investigate the α-connection of the manifold of the statistical model for the exponential family of distributions with progressively Type-II censored data. From Equation (4), the normalization factor ϕ(θ) is defined so that the joint density integrates to one. Since the integrand is assumed to be continuously differentiable, the order of integration and differentiation can be interchanged; hence, the first three derivatives of ϕ(θ) with respect to the natural parameter $\theta^i$ can be written as expectations, given in Equations (7)-(9), where the expectations $E_L[\cdot]$ are taken with respect to the joint density function in Equation (4). The derivatives in Equations (7)-(9) can be regarded as the expected value, the covariance and the third-order central moments of $\sum_{r=1}^{m}e_i(x^R_{r:m:n})$, respectively. The derivative in Equation (7) can also be obtained from the normalization condition, and the derivatives in Equations (8) and (9) can be obtained by similar calculations. Equations (8) and (9) give the (i, j) element of the metric tensors, the (i, j, k) element of the skewness tensor and the (i, j, k) element of the affine connection, from which the α-connection of the manifold of an exponential family of distributions based on the joint density function $L(x^R_{m:n};\theta)$ follows. In particular, $\Gamma_{ijk}=0$, which means that the natural parameter θ is 1-affine. Based on the information carried by the joint density function in Equation (4), we obtain the following results.
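The step from the derivatives of ϕ(θ) to the geometric tensors can be made explicit. The following is a sketch, assuming that the log-likelihood has the form $l=\theta^i\sum_{r}e_i(x^R_{r:m:n})-m\phi(\theta)$ up to terms free of θ and that $L$ integrates to one, so that $m\phi(\theta)$ acts as the cumulant generating function of $\sum_r e_i(x^R_{r:m:n})$; it uses the standard definitions of $g_{ij}$, $T_{ijk}$, $\Gamma_{ijk}$ and $\Gamma^{\alpha}_{ijk}$ from [1,2].

```latex
\begin{aligned}
&\partial_i l=\sum_{r=1}^{m}e_i(x^R_{r:m:n})-m\,\partial_i\phi(\theta),\qquad
 \partial_i\partial_j l=-m\,\partial_i\partial_j\phi(\theta)\quad(\text{non-random}),\\
&E_L[\partial_i l]=0\ \Rightarrow\ m\,\partial_i\phi(\theta)=E_L\Big[\sum_{r=1}^{m}e_i(x^R_{r:m:n})\Big],\\
&g_{ij}(\theta)=E_L[\partial_i l\,\partial_j l]=-E_L[\partial_i\partial_j l]=m\,\partial_i\partial_j\phi(\theta),\\
&T_{ijk}(\theta)=E_L[\partial_i l\,\partial_j l\,\partial_k l]=m\,\partial_i\partial_j\partial_k\phi(\theta)
 \quad(\text{third cumulant of }\textstyle\sum_r e_i),\\
&\Gamma_{ijk}(\theta)=E_L[\partial_i\partial_j l\,\partial_k l]=-m\,\partial_i\partial_j\phi(\theta)\,E_L[\partial_k l]=0,\\
&\Gamma^{\alpha}_{ijk}(\theta)=\Gamma_{ijk}(\theta)+\tfrac{1-\alpha}{2}\,T_{ijk}(\theta)
 =\tfrac{1-\alpha}{2}\,m\,\partial_i\partial_j\partial_k\phi(\theta).
\end{aligned}
```

For $\phi(\theta)=-\ln\lambda$ this reading gives $g_{11}=m/\lambda^2$, whose square root is the Jeffreys prior $\sqrt{m}/\lambda$ used in Section 5.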

Theorem 1.
The metric tensors and the α-connection of the exponential family of distributions are given by respectively.
From the α-connection, we can obtain the torsion and the Riemann-Christoffel curvature of the manifold. The torsion is represented by the torsion tensor, whose components are given by [1,2]
$$S_{ijk}(\theta)=\Gamma^{\alpha}_{ijk}(\theta)-\Gamma^{\alpha}_{jik}(\theta),$$
which is anti-symmetric with respect to the indices i and j. Since the coefficients $\Gamma^{\alpha}_{ijk}(\theta)$ of the α-connections are symmetric with respect to the first two indices i and j, the tensor $S_{ijk}(\theta)$ vanishes for every α-connection. This shows that the manifold of the statistical model of the exponential family of distributions with progressively Type-II censored data is torsion-free.
The Riemann-Christoffel curvature of the manifold can be obtained as in [1,2], where $\Gamma^{k}_{ij}=g^{km}\Gamma_{ijm}$. The Riemann-Christoffel curvature based on the α-connection is called the α-Riemann-Christoffel curvature, and its tensor $R^{\alpha}_{ijkm}$ is obtained by using $\Gamma^{\alpha}_{ijk}$ in place of $\Gamma_{ijk}$. The manifold is said to be α-flat if the α-Riemann-Christoffel curvature vanishes, i.e., $R^{\alpha}_{ijkm}=0$. The α-covariant derivative and the Laplace operator can also be obtained from the α-connection and the metric tensors.
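For a one-parameter model such as the Rayleigh example in Section 5 (k = 1), α-flatness holds for every α. A short argument, assuming the usual convention of [1,2] in which the curvature tensor of an affine connection is anti-symmetric in its first two indices:

```latex
R^{\alpha}_{ijkm}=-R^{\alpha}_{jikm}
\quad\Longrightarrow\quad
R^{\alpha}_{1111}=-R^{\alpha}_{1111}=0 \qquad (k=1,\ \text{every }\alpha).
```

The curvature therefore carries information only when the parameter dimension is k ≥ 2.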
In the above process for obtaining these geometric quantities, we use only the information from the joint density function $L(x^R_{m:n};\theta)$. In fact, the progressively Type-II censored order statistics $x^R_{r:m:n}$ (r = 1, …, m) carry another kind of information. We can consider the marginal density function of the r-th progressively Type-II censored order statistic $x^R_{r:m:n}$ given in Equation (10) (see, for example, Kamps and Cramer [17], Balakrishnan [18], Balakrishnan and Aggarwala [8], and Balakrishnan and Cramer [10]). Based on the marginal density in Equation (10), the expectations of the random variables $e_i(x^R_{r:m:n})$ (r = 1, …, m) in Equation (5) can be obtained, where the expectation is taken with respect to the density function in Equation (10). Supposing that the random variables $e_i(x^R_{r:m:n})$ (r = 1, …, m) are independent, we can obtain the following results.
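Equation (10) is not reproduced above. As an illustration, the sketch below assumes the standard form of the marginal density of $X_{r:m:n}$ given in Balakrishnan and Aggarwala [8] and Kamps and Cramer [17]: $f_r(x)=c_{r-1}\sum_{i=1}^{r}a_{i,r}[1-F(x)]^{\gamma_i-1}f(x)$, with $\gamma_i=m-i+1+\sum_{j=i}^{m}R_j$, $c_{r-1}=\prod_{i=1}^{r}\gamma_i$ and $a_{i,r}=\prod_{j=1,j\ne i}^{r}(\gamma_j-\gamma_i)^{-1}$. The function names are hypothetical. For the exponential distribution with rate λ, the mean of $X_{r:m:n}$ should equal $\sum_{i=1}^{r}1/(\lambda\gamma_i)$, which gives a simple numerical check.

```python
import math

def gammas(scheme):
    """gamma_i = m - i + 1 + sum_{j>=i} R_j (items at risk just before the i-th failure)."""
    m = len(scheme)
    return [m - i + sum(scheme[i:]) for i in range(m)]    # i is 0-based here

def marginal_pdf(x, r, scheme, pdf, cdf):
    """Assumed standard density of the r-th progressively Type-II censored order statistic."""
    g = gammas(scheme)[:r]
    c = math.prod(g)                                       # c_{r-1}
    total = 0.0
    for i, gi in enumerate(g):
        a = 1.0                                            # a_{i,r}
        for j, gj in enumerate(g):
            if j != i:
                a /= gj - gi                               # gammas are strictly decreasing
        total += a * (1.0 - cdf(x)) ** (gi - 1)
    return c * total * pdf(x)

def simpson(func, a, b, n_intervals=4000):
    """Composite Simpson rule on [a, b]; n_intervals must be even."""
    h = (b - a) / n_intervals
    s = func(a) + func(b)
    for k in range(1, n_intervals):
        s += (4 if k % 2 else 2) * func(a + k * h)
    return s * h / 3.0

# Check with an exponential model (rate lam) and an illustrative scheme, m = 5, n = 8.
lam, scheme = 1.0, [1, 0, 1, 0, 1]
pdf = lambda t: lam * math.exp(-lam * t)
cdf = lambda t: 1.0 - math.exp(-lam * t)
for r in range(1, len(scheme) + 1):
    mean = simpson(lambda t: t * marginal_pdf(t, r, scheme, pdf, cdf), 0.0, 40.0)
    print(r, round(mean, 6), round(sum(1.0 / (lam * gi) for gi in gammas(scheme)[:r]), 6))
```

Expectations such as $E[e_i(x^R_{r:m:n})]$ under Equation (10) can be computed the same way by replacing `t *` in the integrand with the function $e_i$.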
Theorem 2. The Fisher metric tensors and the α-connection of the exponential family of distributions with progressively Type-II censored data are given by

Applications in Bayesian Predictive Inference and Asymptotic Expansions
In Bayesian inference for the exponential family of distributions, the parameter vector θ is considered a random variable. Given a prior density function π(θ) for θ, the joint posterior density function of the exponential family of distributions with progressively Type-II censored data can be expressed as
$$\pi(\theta\mid x^R_{m:n})=\frac{\pi(\theta)\,L(x^R_{m:n};\theta)}{\int\pi(\theta)\,L(x^R_{m:n};\theta)\,d\theta},\qquad(11)$$
and the posterior Bayesian predictive distribution is given by
$$\hat f_{\pi}(x\mid x^R_{m:n})=\int f(x;\theta)\,\pi(\theta\mid x^R_{m:n})\,d\theta,\qquad(12)$$
where x is an unobserved set of observations to be predicted, which is independently distributed according to the same density $f(x;\theta)\in\mathcal{F}$. The density $f(x;\hat\theta)$ is called the plug-in density function or the estimative density function, where $\hat\theta=\hat\theta(x^R_{m:n})$ is an estimate of θ based on the observed progressively Type-II censored sample $x^R_{m:n}$ (see, for example, Geisser [19]). With the Kullback-Leibler divergence as the loss function, the predictive distribution in Equation (12) is optimal in the sense that it minimizes the Bayes risk [20]. The integral in Equation (12) can be difficult to evaluate, or its form may be too complicated to be used in practice. In these situations, asymptotic or large-sample theory (see, for example, Barndorff-Nielsen and Cox [21]) can be used. In this section, we use the metric tensors and the α-connection introduced in Sections 2 and 3 to study the asymptotic expansions of the joint posterior density and the Bayesian predictive density of the exponential family of distributions with progressively Type-II censored data. A similar asymptotic expansion of Bayesian prediction based on a complete sample can be found in Zhang et al. [14]. For simplicity, we consider only the information carried by the joint density function in Equation (4); a similar process can be applied when the information from the joint density function in Equation (4) and the marginal density function in Equation (10) is used together.

Theorem 3. Given a prior distribution π(θ) for θ, the posterior distribution in Equation (11) can be expressed asymptotically as
where $\tilde\theta^i=\theta^i-\hat\theta^i$ and $\hat\theta$ is an estimator of the parameter vector θ.
Proof. Using the Laplace method suggested by Barndorff-Nielsen and Cox [21], the posterior distribution can be expanded asymptotically around $\hat\theta$, which yields the expansion stated in Theorem 3. Based on the asymptotic expansion presented in Theorem 3, we can obtain the following result.
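The Laplace method behind Theorem 3 can be seen at work in the simplest case. The sketch below shows only the leading (normal) term, not the full expansion of Theorem 3. It assumes the Rayleigh parametrization $f(x;\lambda)=2\lambda x e^{-\lambda x^2}$, which is consistent with $\phi(\theta)=-\ln\lambda$ and the Jeffreys prior $\sqrt{m}/\lambda$ used in Section 5, so that the exact posterior is a Gamma distribution with shape m and rate $S=\sum_r(1+R_r)(x^R_{r:m:n})^2$. To keep the example deterministic, it sets $S=m/\lambda$ so that $\hat\lambda=\lambda$. The function names are hypothetical.

```python
import math

def posterior_pdf(lam, m, s):
    """Exact posterior under the assumed model and Jeffreys prior: Gamma(shape m, rate s)."""
    if lam <= 0.0:
        return 0.0
    return math.exp(m * math.log(s) + (m - 1) * math.log(lam) - s * lam - math.lgamma(m))

def laplace_gap(m, lam_true=2.0):
    """Largest |exact - normal| on the standardized scale z = (lam - lam_hat) / sd."""
    s = m / lam_true                  # idealized statistic S, so that lam_hat = lam_true
    lam_hat = m / s                   # MLE under the assumed model
    sd = lam_hat / math.sqrt(m)       # inverse square root of the observed information m / lam_hat**2
    gap = 0.0
    for k in range(-400, 401):        # z from -4 to 4 in steps of 0.01
        z = k / 100.0
        exact = sd * posterior_pdf(lam_hat + z * sd, m, s)
        normal = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        gap = max(gap, abs(exact - normal))
    return gap

for m in (10, 50, 200):
    print(m, round(laplace_gap(m), 4))
```

In this example the gap shrinks roughly like $1/\sqrt{m}$, reflecting the skewness of the exact posterior, which the normal term ignores and which the higher-order terms of such expansions account for.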
Theorem 4. Given a prior distribution π(θ) for θ, the predictive distribution in Equation (12) can be expressed asymptotically as
Proof. The proof is similar to the proof of Theorem 2 in Komaki [3] and can be completed by substitution.
If the prior distribution π(θ) is the Jeffreys prior $\pi_J(\theta)\propto|g_{ij}(\theta)|^{1/2}$, then from the relationship the following results can be immediately obtained.

Corollary 2.
Given the Jeffreys prior $\pi_J(\theta)\propto|g_{ij}(\theta)|^{1/2}$, the predictive density in Equation (12) can be asymptotically expanded as
These results show that, as the sample size n approaches infinity, the predictive density function is the estimative density function in the asymptotic sense.
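The statement that the Bayesian predictive density approaches the estimative density can be checked in a concrete case. Under the assumed Rayleigh parametrization $f(x;\lambda)=2\lambda x e^{-\lambda x^2}$ and the Jeffreys prior, the posterior is Gamma(m, rate S) with $S=\sum_r(1+R_r)(x^R_{r:m:n})^2$, and the predictive in Equation (12) has the closed form $2mxS^m/(S+x^2)^{m+1}$, while the plug-in density uses $\hat\lambda=m/S$. The sketch below (hypothetical function names) checks the closed form against direct Simpson integration of Equation (12) and shows the gap to the plug-in density shrinking as m grows.

```python
import math

def rayleigh_pdf(x, lam):
    """Assumed Rayleigh parametrization f(x; lam) = 2 lam x exp(-lam x^2), x > 0."""
    return 2.0 * lam * x * math.exp(-lam * x * x)

def bayes_predictive(x, m, s):
    """Closed-form Equation (12) under the Jeffreys prior: lam | data ~ Gamma(m, rate s)."""
    return math.exp(math.log(2.0 * m * x) + m * math.log(s) - (m + 1) * math.log(s + x * x))

def plug_in(x, m, s):
    """Estimative density f(x; lam_hat) with the MLE lam_hat = m / s."""
    return rayleigh_pdf(x, m / s)

def bayes_predictive_numeric(x, m, s, n_intervals=20000):
    """Equation (12) by composite Simpson integration over lam (n_intervals even)."""
    upper = (m + 12.0 * math.sqrt(m) + 12.0) / s      # far into the Gamma(m, s) tail
    h = upper / n_intervals

    def integrand(lam):
        if lam <= 0.0:
            return 0.0                                  # f(x; 0) = 0
        log_post = m * math.log(s) + (m - 1) * math.log(lam) - s * lam - math.lgamma(m)
        return rayleigh_pdf(x, lam) * math.exp(log_post)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n_intervals):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0

for m in (5, 20, 80):
    s = 2.0 * m                                         # idealized S, so lam_hat = 0.5
    print(m, round(bayes_predictive(1.0, m, s), 4), round(plug_in(1.0, m, s), 4))
```

With S = 2m and x = 1, the gap between the two densities falls from about 0.04 at m = 5 to about 0.003 at m = 80, consistent with an O(1/m) correction.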

Illustration Example
The geometric quantities of the exponential distribution have been illustrated in the literature (see, for example, [12]). In this section, we use the Rayleigh distribution presented in Section 2, a member of the exponential family of distributions, as an example to illustrate our results. Suppose that $x^R_{m:n}$ is the set of progressively Type-II censored order statistics from items whose lifetimes follow the Rayleigh distribution with density function in Equation (3). Then the joint density function of $x^R_{m:n}$ takes the form of Equation (5), with $e_1(x^R_{r:m:n})$ a function of $(x^R_{r:m:n})^2$, $e_2(x^R_{r:m:n})=\ln x^R_{r:m:n}$ and $\phi(\theta)=\psi(\theta)=-\ln\lambda$. From this form, the first three derivatives of $l(x^R_{m:n};\theta)$ and the maximum likelihood estimator (MLE) $\hat\lambda$ of the parameter λ can be derived.

We first consider the information carried by the joint density in Equation (4). The metric tensor has one element, $g_{11}(\theta)$, given in Equation (13), and the skewness tensor $T_{111}(\theta)$ is given in Equation (14). From these, the affine connection and the α-connection are obtained, and the coefficients of the m-connection and the e-connection are
$$\Gamma^{m,1}_{11}(\theta)=\Gamma^{m}_{111}(\theta)\,g^{11}(\theta)=-\frac{2}{\lambda},\qquad \Gamma^{e,1}_{11}(\theta)=\Gamma^{e}_{111}(\theta)\,g^{11}(\theta)=0,$$
respectively. For Bayesian inference, we consider the Jeffreys prior for the parameter λ, i.e., $\pi_J(\theta)\propto\sqrt{m}/\lambda$, which gives the posterior distribution of λ and the corresponding predictive distribution, together with its asymptotic expansion.

In the following, we consider the information obtained from the marginal density function in Equation (10) and the joint density function in Equation (4) together. In this case, the affine connection $\tilde\Gamma_{111}(\theta)$ is specified using the information from Equation (10), whereas the metric tensor $\tilde g_{11}(\theta)$ and the skewness tensor $\tilde T_{111}(\theta)$ are the same as the expressions in Equations (13) and (14). The α-connection $\tilde\Gamma^{\alpha}_{111}(\theta)$ and the coefficients of the m-connection and the e-connection are obtained accordingly.
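The Rayleigh quantities can be derived by hand. The following is a sketch under an assumed parametrization $f(x;\lambda)=2\lambda x e^{-\lambda x^2}$, $1-F(x;\lambda)=e^{-\lambda x^2}$ (x > 0), chosen because it reproduces $\phi(\theta)=\psi(\theta)=-\ln\lambda$, the Jeffreys prior $\sqrt{m}/\lambda$ and the m-connection coefficient $-2/\lambda$ stated above. The squared lifetimes are then exponential with rate λ, and the weighted sum $S=\sum_r(1+R_r)(x^R_{r:m:n})^2$ equals the sum of the m independent normalized spacings of progressively censored exponential order statistics, so S follows a Gamma distribution with shape m and rate λ.

```latex
\begin{aligned}
&l(\lambda)=\text{const}+m\ln\lambda-\lambda S,\qquad
 \partial l=\frac{m}{\lambda}-S,\qquad \partial^2 l=-\frac{m}{\lambda^2},\qquad \hat\lambda=\frac{m}{S},\\
&g_{11}=\frac{m}{\lambda^2},\qquad g^{11}=\frac{\lambda^2}{m},\qquad
 T_{111}=E\Big[\Big(\frac{m}{\lambda}-S\Big)^3\Big]=-\frac{2m}{\lambda^3},\qquad \Gamma_{111}=0,\\
&\Gamma^{\alpha}_{111}=\frac{1-\alpha}{2}\,T_{111}=-\frac{(1-\alpha)\,m}{\lambda^3},\qquad
 \Gamma^{m,1}_{11}=\Gamma^{m}_{111}\,g^{11}=-\frac{2}{\lambda},\qquad \Gamma^{e,1}_{11}=0,\\
&\pi_J(\lambda)\propto\frac{\sqrt{m}}{\lambda}\ \Rightarrow\
 \lambda\mid x^R_{m:n}\sim\mathrm{Gamma}(m,\ \text{rate } S),\qquad
 \hat f_{\pi_J}(x\mid x^R_{m:n})=\int_0^\infty f(x;\lambda)\,\pi(\lambda\mid x^R_{m:n})\,d\lambda
 =\frac{2m\,x\,S^{m}}{(S+x^2)^{m+1}}.
\end{aligned}
```

Under this assumption, the plug-in density is $f(x;\hat\lambda)=2(m/S)\,x\,e^{-mx^2/S}$, and the two densities agree to leading order as m grows.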
Therefore, based on the Jeffreys prior $\pi_J(\theta)\propto\sqrt{m}/\lambda$, the Bayesian predictive density function of the Rayleigh distribution with progressively Type-II censored data can be asymptotically expanded as
This shows that, as the sample size n and the observed sample size m increase, the predictive density function is the estimative density function in the asymptotic sense. The additional term in this expansion can be considered a correction term due to the information carried by the density function in Equation (10).

Monte Carlo Simulation Study and Real Data Analysis
In this section, we present a Monte Carlo simulation study of the Bayesian prediction based on progressively Type-II censored data described in Section 4, as well as a real data analysis based on progressively Type-II censored data discussed in the literature. In the Monte Carlo simulation study, we consider different sample sizes (m, n) = (10, 30), (10, 35), (15, 40) and (20, 40) and three different censoring schemes. The progressively Type-II censored data $x^R_{m:n}$ are generated from the Rayleigh distribution in Equation (3) with parameter λ = 2 for the different sample sizes and censoring schemes. For the proposed Bayesian prediction (BP), we consider two priors: (i) the Jeffreys prior $\pi_J(\theta)\propto\sqrt{m}/\lambda$; and (ii) the uniform prior $\pi_I$ on the interval (0, 3). For comparison, we also consider the plug-in prediction (PP) approach based on the estimative density function $f(x;\hat\lambda)$, in which the parameter is estimated by the maximum likelihood method from the simulated progressively Type-II censored sample $x^R_{m:n}$. The estimated biases and mean square errors (MSEs) of the different prediction approaches for predicting the probability density at x = 2.5, based on 10,000 simulations, are presented in Table 1.
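A scaled-down version of this simulation can be sketched as follows. It is an illustration only, not the code behind Table 1: it assumes the Rayleigh parametrization $f(x;\lambda)=2\lambda x e^{-\lambda x^2}$, uses the Jeffreys-prior predictive in closed form (the uniform prior is omitted), uses fewer replications, and the censoring scheme in the usage line is hypothetical because the three schemes of the study are not listed here. The function names are hypothetical; λ = 2 and x = 2.5 follow the text.

```python
import math
import random

def rayleigh_censored_sample(n, scheme, lam, rng):
    """Progressive Type-II censored sample under the assumed f(x) = 2 lam x exp(-lam x^2)."""
    if sum(scheme) != n - len(scheme):
        raise ValueError("sum(R) must equal n - m")
    # If Y ~ Exp(lam), then sqrt(Y) has survival function exp(-lam x^2).
    on_test = sorted(math.sqrt(rng.expovariate(lam)) for _ in range(n))
    observed = []
    for r_count in scheme:
        observed.append(on_test.pop(0))                  # next observed failure
        for _ in range(r_count):
            on_test.pop(rng.randrange(len(on_test)))     # random withdrawals
    return observed

def simulate(n, scheme, lam=2.0, x0=2.5, reps=2000, seed=1):
    """Monte Carlo bias and MSE of the PP and BP (Jeffreys) predictions of f(x0; lam)."""
    rng = random.Random(seed)
    m = len(scheme)
    truth = 2.0 * lam * x0 * math.exp(-lam * x0 * x0)   # about 3.7e-5 for lam = 2, x0 = 2.5
    errors = {"PP": [], "BP": []}
    for _ in range(reps):
        x = rayleigh_censored_sample(n, scheme, lam, rng)
        s = sum((1 + r) * xi * xi for xi, r in zip(x, scheme))   # sufficient statistic S
        lam_hat = m / s                                           # MLE of lam
        pp = 2.0 * lam_hat * x0 * math.exp(-lam_hat * x0 * x0)   # plug-in density
        bp = math.exp(math.log(2.0 * m * x0) + m * math.log(s)
                      - (m + 1) * math.log(s + x0 * x0))         # Jeffreys predictive
        errors["PP"].append(pp - truth)
        errors["BP"].append(bp - truth)
    return {k: (sum(e) / reps, sum(v * v for v in e) / reps) for k, e in errors.items()}

# (m, n) = (10, 30) with a hypothetical scheme R = (20, 0, ..., 0):
print(simulate(30, [20] + [0] * 9))
```

Each entry of the returned dictionary is an (estimated bias, estimated MSE) pair; because the target density at x = 2.5 is small and strongly nonlinear in λ, such MSE estimates can be noisy with few replications.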
From Table 1, we observe that the performance of all prediction methods improves in terms of MSE as the sample sizes m and n increase. The number of items removed during the progressively Type-II censored experiment also affects the performance of prediction. Moreover, the Bayesian prediction method with the Jeffreys prior provides smaller biases and smaller MSEs than the plug-in prediction method in some cases. To illustrate the practical application of the approximate methods based on the geometric quantities proposed in this paper, we analyze a real data set containing the times to breakdown of an insulating fluid at 34 kV, originally presented in Nelson [22] (Table 6.1). A progressively Type-II censored sample of size m = 8 generated from the n = 19 observations by Balakrishnan et al. [9] is analyzed here. The progressively censored sample and the progressive censoring scheme are presented in Table 2. Assuming that the lifetimes of the insulating fluid tested at 34 kV follow a Rayleigh distribution, we predict the probability density based on the progressively Type-II censored data presented in Table 2. The predicted density curves obtained from the plug-in prediction approach and the proposed Bayesian prediction approach with the two priors are presented in Figure 1. From Figure 1, we observe that the three prediction methods provide similar predicted density curves in this case.
For instance, when predicting the density at x = 2.8 based on the progressively Type-II censored data presented in Table 2, the plug-in prediction density $f(x;\hat\lambda)$ gives 0.230, and the Bayesian prediction densities $\hat f_{\pi_J}(x\mid x^R_{m:n})$ with the Jeffreys prior $\pi_J$ and $\hat f_{\pi_I}(x\mid x^R_{m:n})$ with the uniform prior $\pi_I$ give 0.229 and 0.232, respectively.
Table 2.

Conclusions
In this paper, we discussed the tangent space, affine connection, α-connection, torsion and Riemann-Christoffel curvature of the statistical manifold induced by the exponential family of distributions with progressively Type-II censored data. As applications of these geometric quantities, the asymptotic expansions of the Bayesian posterior distribution and the predictive density with progressively Type-II censored data were discussed. The results showed that the asymptotic expansions are related to the geometric quantities. We also illustrated the main results with the Rayleigh distribution. More theoretical results and applications of information geometry in reliability, beyond the main results of this paper, can be found in the Ph.D. thesis [23].