Evidential Estimation of an Uncertain Mixed Exponential Distribution under Progressive Censoring

In this paper, the evidential estimation method for the parameters of the mixed exponential distribution is considered when a sample is obtained from Type-II progressively censored data. Different from traditional statistical inference methods for censored data from mixture models, we consider a very general form in which there is some uncertain information about the sub-class labels of units. The partially specified label information, as well as the censored data, are represented in a unified frame by mass functions within the theory of belief functions. The evidential likelihood function is then derived based on the completely observed failures and the uncertain information included in the data. Next, the optimization method using the evidential expectation maximization algorithm (E2M) is introduced. A general form of the maximum likelihood estimates (MLEs) in the sense of the evidential likelihood, named maximal evidential likelihood estimates (MELEs), can be obtained. Finally, some Monte Carlo simulations are conducted. The results show that the proposed estimation method can incorporate more information than traditional EM algorithms, which confirms the benefit of using uncertain labels for censored data from finite mixture models.


Background
Mixture models are of great importance in many applied sciences such as survival analysis, pattern recognition, image analysis, and economics [1,2]. In reliability analysis, there is only one population and one type of failure in the simple case of life distributions. However, in real applications, there may be more than one failure cause [3,4]. In this case, the response of the modeling process can be seen as coming from several distinct sub-populations, and finite mixture models can be used to represent the failure times.
Samples from mixed distributions are not precisely labeled by their originating sub-classes; thus, the heterogeneous dataset cannot be explicitly decomposed into homogeneous subgroups. This poses a barrier to estimating finite mixture models [5].
Consider a life-testing experiment where n units are placed under observation. For some reason, such as to save time and cost, we may have to terminate the experiment before all items have failed. In real applications, the removal of units prior to failure is often pre-planned. Data obtained from such experiments are called censored data. The most common censoring schemes are termed Type-I and Type-II censoring. In Type-I censoring, the experiment continues up to a prescribed time T; any failures that occur after T cannot be observed. The endpoint T is assumed to be independent of the failure times. In Type-II censoring, the experiment is terminated upon the N th failure, where N < n is prefixed. These two test schemes have the drawback that they do not allow the removal of units at time points other than the termination of the experiment. However, Type-II progressive censoring, which can be seen as a generalization of Type-II censoring, allows removing a fixed number of units at each of the first N − 1 observed failures and removing all remaining units at the N th failure, at which point the experiment terminates [6]. As a result, it is efficient in time and cost [7] and has become very popular in recent years [8][9][10][11][12][13][14].
The parameter estimation problems for different lifetime distributions, including the mixed exponential distribution (MED) and other mixed distribution models, have been widely studied under different censoring schemes [15][16][17]. The most commonly adopted methods for obtaining estimates in mixture models are the expectation maximization (EM) algorithm and the Bayesian method. Lee and Scott [18] presented the EM algorithm for fitting multivariate Gaussian mixture models to data that are truncated, censored, or both. In [19,20], the authors discussed parameter estimation methods for the MED under Type-II progressively censored data and progressively hybrid Type-II censored data, respectively. Tahir et al. [21] studied the problem of estimating the parameters of a three-component mixture of exponential, Rayleigh, and Burr Type-XII distributions under the Type-I right censoring scheme in the Bayesian framework. The maximum likelihood and Bayesian estimators of the parameters of a heterogeneous population represented by a finite mixture of two Pareto distributions were discussed in [22]. Feroze and Aslam [23] introduced the Bayesian approach for estimating the parameters of a two-component mixture of Weibull distributions under doubly censored samples. Feroze and Aslam [24] considered the Bayesian analysis of a three-component mixture of Rayleigh distributions under doubly censored samples. All these works assume that we have no information at all about the sub-class labels. However, in many applications, it is often easy to obtain some imprecise and uncertain knowledge about the sub-class labels. For instance, in medical surveillance databases, we can find partially labeled data provided by experts or from experience; that is, while not completely unlabeled, there is only uncertain information about the class values [5].
The above-mentioned methods cannot exploit such partial label information when it is available in a dataset from mixture models.
Zio [25] pointed out that uncertain information is a big challenge in reliability engineering. Indeed, several kinds of uncertain information arise in reliability analysis with censored data. First, for a censored unit, we only know that its failure time belongs to the interval [t*, +∞). Second, as mentioned, in finite mixture models, there may be some uncertain information about the sub-class labels of the data. The theory of belief functions (also known as Dempster-Shafer theory (DST)) is appealing for representing data uncertainty. As an extension of probability theory, it has many advantages in dealing with uncertain information. Analogous to a probability distribution over the discernment frame, in DST, the basic belief assignment (BBA), defined on the power set of the frame, is used to represent the available information. Many scholars have studied how to measure the uncertainty of a BBA [26][27][28][29]. Due to the effectiveness of DST in dealing with uncertain knowledge, it has already been widely used in many fields such as data classification/clustering [30,31], target recognition [32], decision making [33], fault diagnosis [34,35], complex networks [36,37], and so on.
In this paper, we try to handle the uncertain information in mixed distributions under Type-II progressive censoring using the theory of belief functions. Note that for the progressively censored data considered in this work, the incomplete observations are not influenced by the specific values taken by the random variables. Thus, the mechanism that causes complete failure data to become incompletely reported can be ignored. Under such a coarsening at random (CAR) assumption, statistical inference can proceed based on the so-called "face-value likelihood", which measures the probability of incomplete observations by their marginal probability according to the underlying complete data distribution [38,39]. The EM algorithm is quite effective at maximizing the face-value likelihood [38], and it has been widely used for progressively censored data [7,40]. Different from the traditional estimation method using the EM algorithm, the proposed evidential parameter estimation model can take not only the uncertain censored observations, but also the prior partial information about class labels into consideration. The two kinds of uncertain information are modeled in a unified frame by mass functions in belief function theory, and then the evidential likelihood function is derived. The optimization method to obtain the optimal estimators, called maximal evidential likelihood estimates (MELEs), is derived based on the evidential-EM (E2M) algorithm [41]. Experimental results show that the proposed model can take advantage of the partial information about the sub-class labels effectively and consequently improve the performance of the estimation model.
The rest of this paper is organized as follows. In Section 2, some basic concepts and the rationale of our method are briefly introduced. In Section 3, the evidential estimation model to get MELEs, which can maximize the evidential likelihood function, is presented in detail. In order to show the effectiveness of our approach, a real data application is discussed in Section 4, while some Monte Carlo simulations are conducted in Section 5. Finally, we conclude the paper in Section 6.

Preliminary Knowledge
Some necessary background knowledge related to this paper will be recalled in this section. The Type-II progressive censoring scheme is first introduced in Section 2.1. The basic knowledge about the theory of belief functions is then presented in Section 2.2. Finally, the evidential likelihood is recalled in Section 2.3.

Type-II Progressive Censoring Scheme
The Type-II progressive censoring scheme can be described as follows. Suppose n independent identical items are placed on a life-test. The integer N (< n), which denotes the number of failures we would like to observe in the experiment, is fixed before the experiment. Let R = (R_1, R_2, . . . , R_N), where R_1, R_2, . . . , R_N are also fixed integers describing the progressive censoring scheme. They should satisfy R_i ≥ 0 and ∑_{i=1}^{N} R_i + N = n. At the time of the first failure, say T_1, R_1 of the remaining units are randomly removed. Similarly, at the time of the i th failure, say T_i, R_i (i = 1, 2, . . . , N) of the remaining units are removed. At the time of the N th failure, say T_N, the remaining R_N = n − N − ∑_{i=1}^{N−1} R_i items are removed, and the experiment terminates. Therefore, under the Type-II progressive censoring scheme, the observed failures are {T_1, . . . , T_N}. For further details on this censoring scheme, readers may refer to the excellent monograph of Balakrishnan and Aggarwala [6].
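The scheme above can be simulated directly: at each of the N failure stages, the smallest surviving latent lifetime is recorded as a failure, and R_i further survivors are withdrawn at random; for the withdrawn units we only know that their lifetimes exceed the current failure time. The following is a minimal sketch (the function name is ours, not from the paper):

```python
import random

def progressive_type2_sample(lifetimes, removals):
    """Simulate Type-II progressive censoring.

    lifetimes: latent failure times of the n units on test.
    removals:  censoring scheme (R_1, ..., R_N); must satisfy
               sum(removals) + N == n.
    Returns (observed_failures, censored), where censored[i] lists the
    censoring times (all equal to the i-th failure time) of the R_i
    units withdrawn at that stage.
    """
    n, N = len(lifetimes), len(removals)
    assert sum(removals) + N == n, "scheme must exhaust all n units"
    survivors = list(lifetimes)
    observed, censored = [], []
    for r in removals:
        t = min(survivors)          # next failure among the survivors
        observed.append(t)
        survivors.remove(t)
        # withdraw r surviving units at random; we only know X > t for them
        for u in random.sample(survivors, r):
            survivors.remove(u)
        censored.append([t] * r)    # record the censoring time t for each
    return observed, censored
```

Because each failure is the minimum of a shrinking survivor set, the observed failures come out in increasing order, matching the ordered sample {T_1, . . . , T_N} above.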

Theory of Belief Functions
The theory of belief functions is a mathematical theory that generalizes probability theory by giving up the additivity constraint. In this theory, justified degrees of support are assessed according to an evidential corpus, i.e., the set of all pieces of evidence held by a source, which justifies the degrees of support assigned to subsets of the frame.
Let Ω = {ω_1, ω_2, . . . , ω_c} be the finite domain of reference, called the discernment frame. The c elements in Ω are nonempty and mutually exclusive hypotheses related to a given problem. Belief functions are defined on the power set 2^Ω = {A : A ⊆ Ω}. The function m : 2^Ω → [0, 1] is said to be a basic belief assignment on 2^Ω if it satisfies m(∅) = 0 and ∑_{A⊆Ω} m(A) = 1. Every A ∈ 2^Ω such that m(A) > 0 is called a focal element. The difference from probability models is that masses can be given to any subset of Ω instead of only to the atomic elements of Ω. The credibility and plausibility functions are defined as in Equations (6) and (7), respectively:
Bel(A) = ∑_{∅≠B⊆A} m(B),  (6)
Pl(A) = ∑_{B∩A≠∅} m(B).  (7)
The quantity Bel(A) measures the total support given to A, while Pl(A) represents the potential amount of support to A. The function pl : Ω → [0, 1] such that pl(ω) = Pl({ω}) is called the contour function associated with m.
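The definitions of Bel and Pl translate directly into set operations over the focal elements. A small sketch (representing a mass function as a dict from frozenset focal elements to masses; all names are ours):

```python
def bel_pl(mass):
    """Build Bel and Pl functions from a basic belief assignment.

    mass: dict mapping frozenset focal elements to masses summing to 1.
    Bel(A) = sum of m(B) over nonempty B contained in A;
    Pl(A)  = sum of m(B) over B intersecting A.
    """
    def bel(A):
        return sum(m for B, m in mass.items() if B and B <= A)
    def pl(A):
        return sum(m for B, m in mass.items() if B & A)
    return bel, pl

# Frame {a, b, c}; mass on a singleton, a pair, and the whole frame
m = {frozenset('a'): 0.5, frozenset('ab'): 0.3, frozenset('abc'): 0.2}
bel, pl = bel_pl(m)
```

Here bel(frozenset('a')) = 0.5 while pl(frozenset('a')) = 1.0, illustrating that Bel is a lower and Pl an upper measure of support.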
According to the type of the focal elements, we can define some particular mass functions. A categorical mass function is a normalized mass function that has a unique focal element A*, i.e., m(A*) = 1. A vacuous mass function is a special categorical mass function whose unique focal element is Ω itself, i.e., m(Ω) = 1; it represents total ignorance. A Bayesian mass function is a mass function for which all focal elements are elementary hypotheses, i.e., all focal elements are singletons: m({ω_k}) = p_k with ∑_{k=1}^{c} p_k = 1. As all focal elements are single points, such a mass function is a probability distribution over the frame Ω. Specifically, if a Bayesian mass function is categorical, it describes a situation with no uncertainty at all: we are completely sure about the state of the variable concerned.
If m_1 and m_2 are two independent mass functions defined on Ω, a new mass function can be formed by combining them directly with Dempster's rule:
(m_1 ⊕ m_2)(A) = (1/(1 − k)) ∑_{B∩C=A} m_1(B) m_2(C) for all A ≠ ∅, and (m_1 ⊕ m_2)(∅) = 0,
where k is defined as
k = ∑_{B∩C=∅} m_1(B) m_2(C).
This quantity describes the conflict between m_1 and m_2. When k = 1, the two masses are completely in conflict, and they cannot be combined using Dempster's rule. Suppose m_1 is a Bayesian mass function whose corresponding contour function is the probability distribution p_1(ω) = m_1({ω}), and m_2 is an arbitrary mass function with contour function pl_2(ω). The fusion of m_1 and m_2 by Dempster's rule yields a Bayesian mass function, whose contour function is given by [41]:
(p_1 ⊕ pl_2)(ω) = p_1(ω) pl_2(ω) / (1 − k),
where k, the conflict between m_1 and m_2, can be written as:
k = 1 − ∑_{ω∈Ω} p_1(ω) pl_2(ω).
The term ∑_{ω∈Ω} p_1(ω) pl_2(ω) can be regarded as the mathematical expectation of pl_2 with respect to p_1. If m_2 is categorical and such that m_2(A) = 1, then p_1 ⊕ pl_2 is the probability distribution obtained by conditioning p_1 with respect to A.
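Dempster's rule itself is a short computation: intersect every pair of focal elements, accumulate the product masses, and renormalize by 1 − k. A minimal sketch (function name is ours):

```python
from itertools import product

def dempster(m1, m2):
    """Combine two mass functions on the same frame by Dempster's rule.

    m1, m2: dicts mapping frozenset focal elements to masses.
    Returns (combined mass dict, conflict k); fails if k == 1."""
    raw, k = {}, 0.0
    for (B, mB), (C, mC) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            raw[A] = raw.get(A, 0.0) + mB * mC
        else:
            k += mB * mC                    # mass lost to conflict
    if k >= 1.0:
        raise ValueError("total conflict: Dempster's rule undefined")
    return {A: m / (1.0 - k) for A, m in raw.items()}, k
```

Combining a Bayesian m_1 with an arbitrary m_2 through this routine reproduces the contour-function formula above: the result is Bayesian with masses proportional to p_1(ω) pl_2(ω).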
Let m_X and m_Y be two mass functions defined on Ω_X and Ω_Y, respectively, with Pl_X and Pl_Y the associated plausibility functions, and let Pl_XY be the plausibility function defined on the product frame Ω_X × Ω_Y. Variables X and Y are called cognitively independent if:
Pl_XY(A × B) = Pl_X(A) Pl_Y(B) for all A ⊆ Ω_X and B ⊆ Ω_Y.
If two variables are cognitively independent, the evidence on one variable does not affect the beliefs on the other. It is clear that cognitive independence reduces to stochastic independence when m_X and m_Y are Bayesian.

Evidential Likelihood
Let X be a random vector with probability density function p_X(·; θ), where θ is an unknown parameter. Assume that X is perfectly observed, and let x_0 be a realization of X. The complete likelihood function given x_0 would be defined as:
L(θ; x_0) = p_X(x_0; θ).  (16)
If x is not precisely observed, but we only know that x ∈ A, where A ⊆ Ω_X and Ω_X is the set from which X takes its values, then the likelihood function given such imprecise data can be defined as [41]:
L(θ; A) = P_X(A; θ) = ∫_A p_X(x; θ) dx.  (17)
More generally, if the observation x is not only imprecise, but also uncertain, it can be described by a mass function m on Ω_X with focal elements A_1, . . . , A_r and corresponding masses m(A_1), . . . , m(A_r). The likelihood function given such uncertain data can be extended as [41]:
L(θ; m) = ∑_{j=1}^{r} m(A_j) L(θ; A_j),
which can be rewritten as:
L(θ; m) = ∫ pl(x) p_X(x; θ) dx.  (19c)
From Equation (19c), we can see that L(θ; m) depends on m only through its associated contour function pl. As a result, L(θ; m) can be denoted by L(θ; pl) indifferently. We can see that L(θ; pl) is the expectation of pl(X) with respect to p_X(x; θ). It is often called the evidential likelihood function [42]. The likelihoods L(θ; A) and L(θ; x_0) can be seen as special cases of the evidential likelihood L(θ; pl). If pl is associated with a categorical mass function m_A, then pl(x) = 1 if x ∈ A and 0 otherwise, and Equation (19c) equals the likelihood function given the imprecise data in Equation (17). If pl is associated with a certain mass function, i.e., pl(x) = 1 if x = x_0 and 0 otherwise, Equation (19c) degrades to the complete likelihood defined in Equation (16).
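The reading of L(θ; pl) as the expectation of the contour function can be checked numerically. For an exponential lifetime with rate λ and a single observation censored at t*, the contour function is the indicator of [t*, +∞), so the evidential likelihood reduces to the survival probability e^{−λ t*}. A small Monte Carlo sketch (names are ours):

```python
import math, random

def evidential_likelihood(pl, sampler, n=100_000, seed=1):
    """Monte Carlo estimate of L(theta; pl) = E_theta[pl(X)],
    the expectation of the contour function under p(x; theta)."""
    rng = random.Random(seed)
    return sum(pl(sampler(rng)) for _ in range(n)) / n

lam, t_star = 0.5, 2.0
# Contour function of an observation censored at t*: pl(x) = 1{x >= t*}
pl_censored = lambda x: 1.0 if x >= t_star else 0.0

L_mc = evidential_likelihood(pl_censored, lambda rng: rng.expovariate(lam))
L_exact = math.exp(-lam * t_star)   # survival function of Exp(lam) at t*
```

The Monte Carlo value agrees with the closed form to within sampling error, illustrating that the face-value likelihood of a censored observation is exactly its evidential likelihood under an interval-valued contour function.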

Maximum Evidential Likelihood Estimates for Uncertain Progressively Censored Data
The EM algorithm is usually adopted to obtain the maximal likelihood estimates (MLEs) for the datasets with missing information. However, it cannot deal with the partially labeled information included in the finite mixture model. In this section, we discuss the evidential likelihood for the uncertain progressively censored data from mixture distributions. The optimal statistical estimates that can be obtained by maximizing the evidential likelihood function (named MELEs) are derived, and the relation with the traditional EM model is also discussed.

Evidential Likelihood for Uncertain Progressively Censored Data
Assume that X_1, X_2, . . . , X_n are n independent variables that follow the mixture distribution in the form of Equation (1). Denote their probability density function by f(x; π, λ), the cumulative distribution function by F(x; π, λ), and the survival function by s(x; π, λ). Suppose the n units are placed on a life-test under the progressive censoring scheme.
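For the mixed exponential distribution with mixing proportions π_k and rates λ_k, the density and survival function have simple closed forms, f(x) = ∑_k π_k λ_k e^{−λ_k x} and s(x) = ∑_k π_k e^{−λ_k x}. A minimal sketch (function names are ours):

```python
import math

def med_pdf(x, pi, lam):
    """Density of the mixed exponential distribution:
    f(x) = sum_k pi_k * lam_k * exp(-lam_k * x), for x >= 0."""
    return sum(p * l * math.exp(-l * x) for p, l in zip(pi, lam))

def med_sf(x, pi, lam):
    """Survival function: s(x) = sum_k pi_k * exp(-lam_k * x)."""
    return sum(p * math.exp(-l * x) for p, l in zip(pi, lam))
```

With π = (0.3, 0.7) and λ = (0.6, 0.1) (the parameter values used later in the simulations), f(0) = 0.3·0.6 + 0.7·0.1 = 0.25 and s(0) = 1.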
In the Type-II progressive censoring scheme, we denote the observed failure times by T_i, i = 1, 2, . . . , N, and use T = (t_1, t_2, . . . , t_N) to describe the completely observed data. At the time of the i th observed failure, we remove R_i (i = 1, 2, . . . , N) items. Denote the failure times of these removed units by Z = {z_ij : i = 1, . . . , N, j = 1, . . . , R_i}, where z_ij is the latent failure time of the j th unit removed at time t_i. Clearly, Z cannot be observed in the experiment; we only know that z_ij ≥ t_i.
Let B_ik indicate whether the completely observed failure t_i comes from the k th component of the mixture: B_ik = 1 if it does, and B_ik = 0 otherwise. Similarly, let B_ijk indicate whether the j th removed unit at time t_i comes from the k th component. Collect these indicators as B = (B^(1), B^(2)). The complete variables in this life-test can then be denoted by W = (X, B), where X = (T, Z). It is clear that the observed data in this model are T, and the hidden variables are Z and B. Let θ = (π, λ); the probability function of W under the Type-II progressive censoring scheme and the likelihood function based on the complete data follow accordingly. As mentioned, in this life-test, the observed data are T = {t_i, i = 1, 2, . . . , N}, and the likelihood function based on the observed data is given in Equation (22), where C is a constant. The maximum likelihood estimates can be obtained by maximizing Equation (22), and the EM algorithm is usually adopted to obtain the MLEs for progressively censored data [7]. As we can see, this likelihood function is based on the observed data, but it ignores possible uncertain information about the partial labels. In fact, there are two kinds of uncertain information in this case. One is the failure time of the units removed during the experiment, for which we only know the censoring time. The other is the sub-class label of each unit. Here, we use mass functions in belief function theory to model these two types of uncertain information in a unified frame: all the data obtained in the life-test are described by different kinds of mass functions. If X is completely observed and X = t*, we can model this information with a certain mass function, whose contour function is:
pl^(X)(x) = 1 if x = t*, and 0 otherwise.  (23)
If X is censored at time t*, we only know that x ∈ [t*, +∞).
We can model this uncertain information using the following contour function:
pl^(X)(x) = 1 if x ≥ t*, and 0 otherwise.  (24)
As mentioned before, in real applications, some uncertain information may be obtained from experts or from historical data. Plausibility is adopted to express the uncertain information about the partial class labels of units. For a completely observed failure t_i, the plausibility that the corresponding unit comes from the k th component is pl^(B)_{t_i,k}; for the censored data z_ij, we use pl^(B)_{z_ij,k} analogously. Suppose that the failure time X and the sub-class label B are cognitively independent. The contour function of the complete data w_i = (x_i, b_i) can then be defined by:
pl(x_i, b_i = k) = pl^(X)(x_i) · pl^(B)_{x_i,k},
where pl^(B)_{x_i,k} is the plausibility that x_i comes from the k th sub-group and pl^(X)(x_i) describes the information about the failure time. For completely observed data and censored data, pl^(X)(x_i) is defined as in Equations (23) and (24), respectively.
Considering the progressive censoring scheme, given the complete data w = (x, b), the contour function pl(w) is defined as above. The evidential likelihood for the progressively censored data can then be derived as the expectation of pl(w) with respect to p(w; θ) (Equation (27f)), where θ = (π, λ) denotes the unknown parameters.

Corollary 1.
If there is no information about the sub-class label of any unit, we can use the vacuous mass function on Ω_B, i.e., pl^(B)_{x_i,k} = 1 for all k. Therefore, for both the observed data and the censored data, the contour function of w_i reduces to pl^(X)(x_i). In this case, the evidential likelihood function defined in Equation (27f) reduces to the observed-data likelihood, which is the same as Equation (22).
From Corollary 1, we can see that the traditional likelihood can be regarded as a special case of the evidential likelihood for the progressively censored data when there is no prior information about the sub-class label of each unit.

The Optimal Estimates under the Evidential Likelihood
The E2M algorithm can be invoked to derive the MELEs for the uncertain progressively censored data from mixture models. Given the initial parameter value θ = θ^(0), the E-step and M-step are repeated alternately.
• E-step: Derive the Q function, which can be seen as the expectation of log L(θ; w) with respect to p(w|pl; θ (q) ).
The probability function p(·|pl; θ (q) ) can be seen as the Dempster combination of p(w; θ (q) ) and pl(w). The former describes the random uncertainty due to the underlying population, while the latter reflects the epistemic uncertainty due to the partial knowledge.
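On a finite frame, this Dempster combination has a simple closed form: the posterior is proportional to the product of the parametric probability and the contour function, with the normalizing constant equal to 1 − k (the expectation of pl under p). A minimal sketch (function name is ours):

```python
def e2m_posterior(prior_probs, plausibilities):
    """Dempster combination of a Bayesian mass p(.; theta) with a
    contour function pl: p(w | pl; theta) is proportional to
    p(w; theta) * pl(w)."""
    raw = [p * q for p, q in zip(prior_probs, plausibilities)]
    s = sum(raw)   # = E_p[pl], i.e., 1 - conflict k
    return [r / s for r in raw]
```

A vacuous contour (all ones) leaves the parametric distribution untouched, while a categorical contour performs exact conditioning, matching Corollary 2 below.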

Corollary 2.
When the contour function pl(w) degrades to the categorical mass function, the probability function p(w|pl; θ (q) ) degrades to the conditional probability function P(·|y, θ (q) ) in EM.

The Estimation Methodology for a Mixed Exponential Distribution
The likelihood function based on the complete data w for Type-II progressively censored data from the MED is:
L_c(θ; w) = ∏_{i=1}^{N} ∏_{k=1}^{m} [π_k λ_k e^{−λ_k t_i}]^{B_ik} · ∏_{i=1}^{N} ∏_{j=1}^{R_i} ∏_{k=1}^{m} [π_k λ_k e^{−λ_k z_ij}]^{B_ijk}.
The log-likelihood is:
log L_c(θ; w) = ∑_{i=1}^{N} ∑_{k=1}^{m} B_ik (log π_k + log λ_k − λ_k t_i) + ∑_{i=1}^{N} ∑_{j=1}^{R_i} ∑_{k=1}^{m} B_ijk (log π_k + log λ_k − λ_k z_ij).
As before, let θ = (π, λ) denote the unknown parameters of the MED model. Given the initial parameter θ^(0) = (π^(0), λ^(0)), the alternating iterations of the E2M algorithm are described in the following.

Corollary 3.
If there is no uncertain information in the experiment, i.e., pl_{x_i,k} = 1, the conditional expectations defined in Equations (44), (46), and (50) are the same as those in the conventional EM algorithm.
• M-step: To obtain the updated estimates, we maximize the Q(θ, θ^(q)) function with respect to the parameters π and λ. The Lagrangian method can be adopted to handle the constraint ∑_{k=1}^{m} π_k = 1. Setting the derivatives of the Lagrangian to zero, and using Equations (52) and (54), we obtain closed-form updates for π_k^(q+1) and λ_k^(q+1). The E-step and M-step are repeated iteratively until the changes in π^(q) and λ^(q) both fall below ε, a given small positive constant. Suppose the algorithm stops at the l th step; then θ^(l) = (π^(l), λ^(l)) are the MELEs for the parameters. The whole estimation process is summarized in Algorithm 1.

Algorithm 1: The evidential estimation approach for uncertain MED under progressive censoring.
Output: The MELEs of the parameters.
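The E-step and M-step above can be sketched compactly for the completely observed case: the censored-data expectations of the paper are omitted for brevity, so this is a simplified illustration under that assumption, not the paper's full algorithm (all names are ours). Each E-step weight combines the component density with the label plausibility, and each M-step applies the closed-form updates for π_k and λ_k:

```python
import math, random

def e2m_med(x, pl, K=2, iters=50, seed=0):
    """E2M sketch for a K-component mixed exponential with uncertain
    sub-class labels, completely observed lifetimes only.

    x:  observed lifetimes; pl[i][k]: plausibility that unit i belongs
        to component k (all-ones pl recovers the classical EM)."""
    rng = random.Random(seed)
    n = len(x)
    pi = [1.0 / K] * K
    lam = [rng.uniform(0.1, 1.0) for _ in range(K)]
    for _ in range(iters):
        # E-step: posterior weights ~ pi_k * lam_k * exp(-lam_k x) * pl_ik
        T = []
        for xi, pli in zip(x, pl):
            raw = [pi[k] * lam[k] * math.exp(-lam[k] * xi) * pli[k]
                   for k in range(K)]
            s = sum(raw)
            T.append([r / s for r in raw])
        # M-step: closed-form updates (Lagrangian handles sum(pi) = 1)
        nk = [sum(t[k] for t in T) for k in range(K)]
        pi = [nk[k] / n for k in range(K)]
        lam = [nk[k] / sum(t[k] * xi for t, xi in zip(T, x))
               for k in range(K)]
    return pi, lam
```

With certain (one-hot) plausibilities, the weights collapse to the true labels and the updates reduce to the supervised maximum likelihood estimates, while a vacuous pl reproduces the unsupervised EM fixed point.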

Data Analysis
In this section, a real dataset on the air conditioning system of an airplane is considered to illustrate the practical application of the proposed evidential estimation model. The failure times of the system are: 1, 3, 5, 7, 11, 11, 11, 12, 14, 14, 14, 16, 16, 20, 21, 23, 42, 47, 52, 62, 71, 71, 87, 90, 95, 120, 120, 225, 246, 261. This dataset was analyzed by different authors using various models [43,44]. In [19], the authors found that the MED model with two components (m = 2) provided a good fit to the given failure data; the fitted probability density function can be obtained from the maximum likelihood estimates. Suppose we have some uncertain information about the sub-class labels. This type of knowledge could be obtained from experts or from historical data. Here, we use the evidential c-means (ECM) clustering algorithm [31] to obtain the contour function for each sample. From the given failures, it is easy to see that they can be divided into two groups. The credal partition found by ECM provides the plausibility of each datum belonging to the sub-groups. In ECM, we let C = 2 and keep the other parameters at their defaults. The plausibility matrix describing the partial information about the sub-class label of each sample is shown in Table 1.
The maximum likelihood estimates in the sense of the evidential likelihood (MELEs), which integrate the uncertain label information, can then be obtained. We can see that the estimation results of the proposed model are closer to the estimates obtained from the complete data. As we know, in the traditional EM algorithm, the solution depends on the initial values adopted. In order to show the influence of the initial values on the proposed estimation model, we try different initialization strategies. For the parameter π, we invoke the estimation model with randomly perturbed initial values, while for λ, the initial values are perturbed by |N(0, 0.1)|, where |x| is the absolute value of x and N(0, 0.1) denotes a random number from the normal distribution with µ = 0 and σ² = 0.1. The E2M algorithm and the EM algorithm are both repeated 100 times, and the means and standard deviations of the estimates are reported in Table 2. It is easy to see that the standard deviations of the MELEs are smaller than those of the MLEs. We can conclude that the use of the uncertain information reduces the influence of the initialization on the estimation model. Many initialization methods have been studied for the EM algorithm, and they could serve as a reference for the proposed estimation model; however, in this paper, we do not focus on the initialization method and leave it for future research.

Simulations
Monte Carlo simulations are conducted in this section to show the performance of the proposed method. Suppose that X 1 , X 2 , . . . , X n are n independent variables from the mixed exponential distribution in Equation (2). Here, we consider the model with two mixture components (m = 2) under the Type-II progressive censoring scheme. The parameters to be estimated are π 1 , π 2 , λ 1 , λ 2 with π 1 + π 2 = 1.
To simulate uncertainty on class labels, the generated data are processed as follows. For the i th observed failure, an error probability q_i is drawn randomly from a beta distribution with mean ρ and standard deviation σ. In real applications, q_i could be obtained from experts or from historical data. With probability q_i, the label of z_i is replaced by a completely random value drawn uniformly from all the class labels. The contour function on the class labels can then be defined accordingly. If there is no uncertainty, i.e., pl_{x_i,k} = 1, the evidential likelihood degrades to the traditional likelihood. Let π_1 = 0.3, π_2 = 0.7; λ_1 = 0.6, λ_2 = 0.1, and set initial values for the parameters. In this experiment, we set the number of units placed on the life-test to n = 20, 100, 200, 400, and we consider different censoring proportions for a given sample size, with N = n × (0.1, 0.2, 0.3). For the parameters of the uncertain label model, we set ρ = 0.2, σ = 0.2. Three censoring schemes are considered. Given the sample size n and the censoring scheme, the E2M algorithm is repeated s = 100 times, and the bias and mean squared error (MSE) values of the estimators are reported. The results are illustrated in Tables 3-8.
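The label-noise mechanism above is easy to implement. The (mean, sd) pair of the beta distribution converts to shape parameters a = ρ t and b = (1 − ρ) t with t = ρ(1 − ρ)/σ² − 1. Since the paper's exact contour-function equation is not reproduced here, the sketch below uses one plausible construction from the E2M literature, with the contour proportional to P(noisy label | true class k); all names are ours:

```python
import random

def beta_params(mean, sd):
    """Convert (mean, sd) to Beta(a, b) shapes; needs sd^2 < mean*(1-mean)."""
    t = mean * (1.0 - mean) / sd**2 - 1.0
    return mean * t, (1.0 - mean) * t

def noisy_labels(true_labels, K, rho=0.2, sigma=0.2, seed=0):
    """Draw q_i ~ Beta(mean rho, sd sigma) per unit; with probability q_i
    replace the label by a uniform draw over the K classes."""
    rng = random.Random(seed)
    a, b = beta_params(rho, sigma)
    labels, qs = [], []
    for y in true_labels:
        q = rng.betavariate(a, b)
        labels.append(rng.randrange(K) if rng.random() < q else y)
        qs.append(q)
    return labels, qs

def label_contour(noisy_label, q, K):
    """Contour over classes, proportional to the conditional probability
    P(noisy label | true class k) = (1-q)*1{k = noisy_label} + q/K
    (an assumed construction, not the paper's equation)."""
    return [(1.0 - q) * (k == noisy_label) + q / K for k in range(K)]
```

With ρ = 0.2 and σ = 0.2 as in the simulations, the conversion gives Beta(0.6, 2.4); setting q = 0 makes the contour one-hot (certain labels), and q = 1 makes it uniform (vacuous information).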
From the tables, we can see that, with increasing n and N, the bias and MSE values of the estimators obtained both by the proposed evidential estimation model and by the EM algorithm decrease in most cases. This is consistent with the expected asymptotic behavior of maximum likelihood estimates. For different censoring schemes, it is obvious that the estimation efficiency is highly related to the proportion of censored samples: when the number of units placed on the life-test is fixed, the MSE values increase with the number of censored data.
Overall, in most cases, the biases and MSEs of the MELEs by the proposed evidential estimation model are smaller than those of MLEs by the EM algorithm, which can be attributed to the use of the uncertain prior information about the sub-class labels. When the proportion of the censored data is large, the MELEs obtained by the proposed estimation model are significantly better.

Conclusions
In this paper, a method for estimating the parameters of a mixed distribution model under Type-II progressive censoring was introduced. The main advantage of the proposed formalism is that it can combine the uncertainty due to the imperfect observation process caused by censoring with the partial knowledge about the sub-class information in the mixture model. These two types of uncertain information can be represented in a unified frame by the use of belief functions, from which the evidential likelihood, which can be seen as a generalized likelihood criterion, is derived and maximized using the evidential EM algorithm. As an illustration, the method was applied to the MED model. A real dataset was considered, and some Monte Carlo simulations were carried out to demonstrate the performance of the resulting estimates. The results show that the proposed method achieves better estimation efficiency, which can be attributed to the effective integration of the uncertain information in the model.
More generally, the evidential estimation approach introduced in this paper is applicable to other mixed distributions under different censoring schemes, where the data uncertainty in the parametric statistical model arises from the imperfect observation process, and to reliability engineering applications such as life data with multiple failure causes.