Statistical Inference of Inverted Exponentiated Rayleigh Distribution under Joint Progressively Type-II Censoring

The inverted exponentiated Rayleigh distribution is a widely used continuous lifetime distribution that plays a key role in lifetime research. The joint progressively type-II censoring scheme is an effective method for evaluating the quality of products from different assembly lines. In this paper, we study the statistical inference of the inverted exponentiated Rayleigh distribution based on joint progressively type-II censored data. The likelihood function and the maximum likelihood estimates are first obtained by adopting the Expectation-Maximization algorithm. Then, we calculate the observed information matrix based on the missing value principle. Bootstrap-p and Bootstrap-t methods are applied to construct confidence intervals. Bayesian approaches under the square loss function and the linex loss function are provided to derive the estimates, for which the importance sampling method is introduced. Finally, Monte Carlo simulation and real data analysis are performed for further study.


Inverted Exponentiated Rayleigh Distribution
The Rayleigh distribution is a special form of the Weibull distribution, first proposed by Rayleigh when he studied problems in the field of acoustics. Ref. [1] discussed the generalization of the Rayleigh distribution and its application to practical problems. Ref. [2] introduced Bayesian approaches to study the statistical inference of the Rayleigh distribution. Ref. [3] combined progressively type-II censored data with the Rayleigh model and studied the estimation of its parameters. The Rayleigh distribution has only one parameter, so its applications in practice are limited and inflexible, and many scholars have extended it to two-parameter distributions. Refs. [4,5] began to study the estimation of the generalized Rayleigh distribution. Ref. [6] extended the progressively type-II censoring scheme to the generalized Rayleigh distribution. Recently, under a non-informative prior, Ref. [7] studied the estimation of the shape parameter of the generalized Rayleigh distribution. The hazard function of the generalized Rayleigh distribution is monotone when the shape parameter is greater than 1/2, whereas the hazard function of the inverted exponentiated Rayleigh distribution is nonmonotone, which is more realistic in real life; noticing this, more scientists have become interested in the latter distribution. Ref. [8] analyzed the estimation of the inverted exponentiated Rayleigh distribution under the progressively first-failure censoring scheme and Ref. [9] further studied its prediction. Ref. [10] considered the inverted exponentiated Rayleigh distribution with adaptive type-II progressive hybrid censored data. Ref. [11] applied this distribution to analyzing coating weights of iron sheets data.
A random variable X follows the inverted exponentiated Rayleigh distribution (IERD) if its probability density function (pdf), cumulative distribution function (cdf) and hazard function take the forms, respectively,

f(x; θ, λ) = 2θλ x^{−3} e^{−λ/x²} (1 − e^{−λ/x²})^{θ−1}, x > 0; θ, λ > 0, (1)

F(x; θ, λ) = 1 − (1 − e^{−λ/x²})^{θ}, x > 0; θ, λ > 0, (2)

h(x; θ, λ) = 2θλ x^{−3} e^{−λ/x²} / (1 − e^{−λ/x²}), x > 0; θ, λ > 0,

where θ and λ are the shape and scale parameters, respectively, and both are positive. Figure 1 shows the pdfs and hazard functions of the IERD for different θ and λ. It is obvious that both functions are nonmonotone. The IERD can be treated as a useful alternative to other lifetime distributions such as the Weibull distribution, and it has been applied in many fields. In particular, it can be effectively utilized for most experiments on electrical or mechanical devices, patient treatments, and so on.
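The three functions above translate directly into code. The following is a minimal sketch in Python (the function names are ours), which also makes it easy to check numerically that the hazard rises and then falls:

```python
import math

def ierd_pdf(x, theta, lam):
    """pdf (1): f(x) = 2*theta*lam * x^-3 * e^(-lam/x^2) * (1 - e^(-lam/x^2))^(theta-1)."""
    e = math.exp(-lam / x ** 2)
    return 2.0 * theta * lam * x ** -3 * e * (1.0 - e) ** (theta - 1.0)

def ierd_cdf(x, theta, lam):
    """cdf (2): F(x) = 1 - (1 - e^(-lam/x^2))^theta."""
    return 1.0 - (1.0 - math.exp(-lam / x ** 2)) ** theta

def ierd_hazard(x, theta, lam):
    """Hazard h(x) = f(x) / (1 - F(x)) = 2*theta*lam * x^-3 * e / (1 - e)."""
    e = math.exp(-lam / x ** 2)
    return 2.0 * theta * lam * x ** -3 * e / (1.0 - e)
```

For instance, with θ = λ = 1 the hazard increases from x = 0.5 to x = 1 and has decreased again by x = 5, consistent with the nonmonotone shape seen in Figure 1.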

Joint Progressively Type-II Censoring Scheme
Type-I and type-II censoring schemes have been the most widely used. Under a type-I censoring scheme, the observations are terminated at a fixed time and the failure times up to that point are recorded. Under a type-II censoring scheme, the observations are terminated once a prefixed number of units have failed. However, these two censoring schemes do not work well when the lifetimes of the tested units are relatively long. More censoring schemes have been proposed since, and the popular and attractive ones are the progressive censoring schemes, which remove test units at every failure time rather than only at the end. Ref. [12] gave details about progressive censoring schemes. Progressive censoring schemes can be classified as progressively type-I and progressively type-II. The progressively type-I censoring scheme is used in multiple lifetime models, as in Refs. [13][14][15]. Many scholars have studied the progressively type-II censoring scheme, too. Ref. [16] began by applying it to the generalized Gamma distribution. Refs. [17,18] discussed different distributions under the progressively type-II censoring scheme.
The censoring schemes mentioned above are all used in one-sample problems. In real life, we need to consider two or more samples from different assembly lines. The joint progressively type-II censoring scheme is quite useful in comparing the lifetimes of products from different assembly lines and has received a lot of attention in recent years. Ref. [19] first introduced the joint progressively type-II censoring scheme to study two populations from different exponential distributions. Ref. [20] extended it to multiple exponential populations and provided statistical inference. In the same year, they proposed a new two-sample progressively type-II censoring scheme [21] to study two populations more conveniently. Recently, Ref. [22] discussed the estimation of the generalized exponential distribution based on joint progressively type-II censored data. Ref. [23] estimated the two unknown parameters of the inverted exponentiated Rayleigh distribution based on progressively censored data using the pivotal quantity method. In Ref. [24], the reliability of the stress-strength model was derived for the probability P(T < X < Z) that a component strength X lies between two stresses T and Z, where all three follow IERDs with different unknown shape parameters and a common known scale parameter.
The joint progressively type-II censoring (JPC) scheme can be briefly described as follows. Suppose sample A and sample B are taken from two populations. Sample A has m units and sample B has n units. The two samples are merged and used for a life testing experiment. Set k as the number of failures and let r_1, · · · , r_k be the numbers of units withdrawn at each failure time, which satisfy ∑_{i=1}^{k} (r_i + 1) = m + n. At the time of the first failure w_1, r_1 units are intentionally withdrawn from the remaining units; these r_1 units consist of s_1 units withdrawn from sample A and t_1 units withdrawn from sample B. Similarly, at the time of the i-th failure w_i (i = 2, 3, · · · , k − 1), r_i units are withdrawn from the remaining m + n − i − ∑_{l=1}^{i−1} r_l units; these r_i units consist of s_i units from sample A and t_i units from sample B. Here r_i is prefixed, while s_i and t_i are random but satisfy s_i + t_i = r_i. The experiment is finished at the k-th failure time, when all the remaining surviving units are withdrawn. Besides, we let z_i = 1 (or 0) if the i-th failure is from sample A (or sample B). According to the scheme above, we introduce three elements to express the observed censored sample, ((w_1, s_1, z_1), · · · , (w_k, s_k, z_k)). Let k_1 = ∑_{i=1}^{k} z_i be the total number of failures from sample A, and k_2 = ∑_{i=1}^{k} (1 − z_i) be the total number of failures from sample B. Figure 2 below shows the scheme.
In this paper, we make statistical inference for two samples from the two-parameter IERD under the joint progressively type-II censoring scheme. Maximum likelihood estimation and Bayesian inference are applied to obtain point and interval estimates. The Expectation-Maximization (EM) algorithm is used to calculate the maximum likelihood estimates, which is a three-dimensional optimization problem. Then, the observed information matrix is derived. Bootstrap-p and Bootstrap-t methods are adopted to compute the confidence intervals.
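The scheme above can be simulated directly, which is useful for Monte Carlo work. The sketch below (helper names are ours; withdrawal is uniformly at random among the survivors, matching the assumption that s_i and t_i are random) generates one joint progressively type-II censored sample from two IERD populations:

```python
import math
import random

def ierd_sample(theta, lam):
    """Inverse-transform sampling from IERD(theta, lam): solving u = F(x)
    gives x = sqrt(-lam / ln(1 - (1 - u)^(1/theta)))."""
    u = random.random()
    while u == 0.0:                        # avoid log(0) below
        u = random.random()
    return math.sqrt(-lam / math.log(1.0 - (1.0 - u) ** (1.0 / theta)))

def jpc_sample(m, n, theta1, theta2, lam, r):
    """Merge m units from A and n from B, observe k = len(r) failures, and
    withdraw r[i] surviving units at random after the i-th failure.
    Requires sum(r_i + 1) = m + n. Returns the triples (w_i, s_i, z_i)."""
    pool = [(ierd_sample(theta1, lam), 'A') for _ in range(m)] + \
           [(ierd_sample(theta2, lam), 'B') for _ in range(n)]
    data = []
    for ri in r:
        pool.sort()
        w, label = pool.pop(0)             # next failure is the smallest lifetime
        random.shuffle(pool)
        removed = [pool.pop() for _ in range(ri)]   # random withdrawal of survivors
        si = sum(1 for _, g in removed if g == 'A')
        data.append((w, si, 1 if label == 'A' else 0))
    return data
```

Since each failure is the minimum of the remaining lifetimes and only survivors are withdrawn, the recorded w_1 < w_2 < · · · < w_k are automatically ordered.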
In the Bootstrap-t method, the observed information matrix is essential. For Bayesian inference, both a non-informative prior and an informative prior are considered. With these two priors, we obtain Bayesian estimates under both the square loss function and the linex loss function. To obtain numerical solutions, the importance sampling technique is used. Monte Carlo simulation and real data analysis are performed to compare the performance of the different methods.
The rest of the paper is arranged as follows. In Section 2, the EM algorithm is proposed to derive the MLEs. In Section 3, we calculate the observed information matrix based on the missing value principle. Then, Bootstrap methods are used to obtain confidence intervals in Section 4. Bayesian inference based on non-informative and informative priors is presented in Section 5. In Section 6, the methods above are compared through Monte Carlo simulation and real data analysis.

Maximum Likelihood Estimators and EM Algorithm
Suppose sample A has m units whose lifetimes independently follow IERD(θ1, λ) and sample B has n units whose lifetimes independently follow IERD(θ2, λ). According to the JPC scheme, we have the observed data ((w_1, s_1, z_1), · · · , (w_k, s_k, z_k)) for prefixed r_1, · · · , r_k. Therefore, the likelihood function without the normalizing constant can be written as

L(θ1, θ2, λ | data) ∝ ∏_{i=1}^{k} [f_1(w_i)]^{z_i} [f_2(w_i)]^{1−z_i} [1 − F_1(w_i)]^{s_i} [1 − F_2(w_i)]^{t_i},

where s_l + t_l = r_l for l = 1, · · · , k, and f_1, F_1 denote the pdf and cdf of sample A while f_2, F_2 denote those of sample B. According to (1) and (2), the likelihood function becomes

L(θ1, θ2, λ | data) ∝ (2λ)^{k} θ1^{k_1} θ2^{k_2} ∏_{i=1}^{k} w_i^{−3} e^{−λ/w_i²} (1 − e^{−λ/w_i²})^{z_i(θ1−1) + (1−z_i)(θ2−1) + s_iθ1 + t_iθ2}.

When k_1 = 0, each factor (1 − e^{−λ/w_i²})^{θ1 s_i} is a strictly decreasing function of θ1 for fixed λ and decreases to 0, so L(θ1, θ2, λ | data) is a strictly decreasing function of θ1 for fixed θ2 and λ, which implies that the maximum likelihood estimators (MLEs) do not exist. For k_2 = 0, the situation is similar. Thus, we assume that k_1 > 0 and k_2 > 0 to avoid the trivial cases.
The log-likelihood function can be expressed as

ln L(θ1, θ2, λ | data) = k ln(2λ) + k_1 ln θ1 + k_2 ln θ2 − 3 ∑_{i=1}^{k} ln w_i − λ ∑_{i=1}^{k} w_i^{−2} + ∑_{i=1}^{k} [z_i(θ1−1) + (1−z_i)(θ2−1) + s_iθ1 + t_iθ2] ln(1 − e^{−λ/w_i²}).

Taking the partial derivatives of ln L(θ1, θ2, λ | data) and setting them equal to 0, the roots of the resulting equations are the maximum likelihood estimates of (θ1, θ2, λ). These equations are non-linear, and it is infeasible to solve them directly. The Newton-Raphson method inevitably requires the second partial derivatives, and the computation is cumbersome.

EM Algorithm
In this subsection, we use the EM algorithm to obtain the MLEs. Ref. [25] proposed the method of maximum likelihood estimation from incomplete data and Ref. [26] introduced its application to a generalized partial credit model. Here, the observed data are available and the missing data are the lifetimes of the withdrawn units. The complete data of the experiment consist of the observed data ((w_1, s_1, z_1), · · · , (w_k, s_k, z_k)) and the missing data. At the time of the i-th failure, s_i units are withdrawn from sample A (i = 1, 2, · · · , k); assume their lifetimes are (u_i1, u_i2, · · · , u_is_i). Similarly, t_i units are withdrawn from sample B; assume their lifetimes are (v_i1, v_i2, · · · , v_it_i). The missing data are ((u_11, · · · , u_1s_1), · · · , (u_k1, · · · , u_ks_k), (v_11, · · · , v_1t_1), · · · , (v_k1, · · · , v_kt_k)), written as (U_1, · · · , U_k, V_1, · · · , V_k) = (U, V). The complete data, data* (say), are ((w_1, s_1, z_1), · · · , (w_k, s_k, z_k), U, V), and the log-likelihood function of the complete data follows accordingly. The EM algorithm alternates two steps. In the 'E' step, the pseudo log-likelihood function is formed by replacing the functions of the missing U_ij and V_il with their conditional expectations; the conditional pdfs of U_ij and V_il, given that they exceed the withdrawal time w_i, are the corresponding IERD densities truncated on the left at w_i (see [27]) for i = 1, · · · , k. The expectations related to the functions of V_il are similar to those of U_ij and are omitted here. In the 'M' step, we compute the estimates of θ1, θ2 and λ that maximize the pseudo log-likelihood function. Taking partial derivatives of the pseudo log-likelihood (8) yields closed-form expressions of θ1 and θ2 in terms of λ, so (8) can be rewritten as a function of λ only and the problem reduces to a one-dimensional optimization. At the q-th iteration (q = 1, 2, · · · ), substitute θ1^(q−1) and θ2^(q−1) into (8), maximize it over λ to obtain λ^(q), and then, for fixed λ^(q), update θ1^(q) and θ2^(q) from the closed-form expressions. The iterations are stopped when |λ^(q) − λ^(q−1)| < 0.0001, |θ1^(q) − θ1^(q−1)| < 0.0001 and |θ2^(q) − θ2^(q−1)| < 0.0001.
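As a cross-check on the EM output, the observed-data log-likelihood built from (1) and (2) can also be maximized directly by profiling: for fixed λ, setting ∂ ln L/∂θ1 = 0 gives θ̂1(λ) = −k_1 / ∑(z_i + s_i) ln(1 − e^{−λ/w_i²}), and similarly for θ̂2, leaving a one-dimensional search over λ. The sketch below (our own data format and names; a plain grid search stands in for a proper one-dimensional optimizer) implements this profiled maximization:

```python
import math

def profile_mles(data, lam_grid):
    """data: list of (w_i, s_i, t_i, z_i) tuples with k1 > 0 and k2 > 0.
    Profiles theta1 and theta2 out of the log-likelihood, then grid-searches lam."""
    k = len(data)
    k1 = sum(z for _, _, _, z in data)
    k2 = k - k1

    def theta_hats(lam):
        # closed-form maximizers of the log-likelihood for fixed lam
        A = sum((z + s) * math.log(1.0 - math.exp(-lam / w ** 2))
                for w, s, t, z in data)
        B = sum(((1 - z) + t) * math.log(1.0 - math.exp(-lam / w ** 2))
                for w, s, t, z in data)
        return -k1 / A, -k2 / B

    def loglik(t1, t2, lam):
        ll = k * math.log(2.0 * lam) + k1 * math.log(t1) + k2 * math.log(t2)
        for w, s, t, z in data:
            g = math.log(1.0 - math.exp(-lam / w ** 2))
            ll += (-3.0 * math.log(w) - lam / w ** 2
                   + (z * (t1 - 1.0) + (1 - z) * (t2 - 1.0) + s * t1 + t * t2) * g)
        return ll

    best = None
    for lam in lam_grid:
        t1, t2 = theta_hats(lam)
        val = loglik(t1, t2, lam)
        if best is None or val > best[0]:
            best = (val, t1, t2, lam)
    return best[1], best[2], best[3]
```

Because ln L is concave in each θ_j for fixed λ, the profiled estimates θ̂_j(λ) are exact, so only λ needs a numerical search.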

Observed Fisher Information Matrix
In this section, the observed information matrix is calculated; it will be used in Section 4. According to the idea of [28] (the missing value principle), we have

I_o = I_c − I_m,

with the notations below: I_c: the observed information matrix of the complete data; I_1: the observed information matrix of one unit from sample A; I_2: the observed information matrix of one unit from sample B; I_o: the observed information matrix of the observed data; I_m: the observed information matrix of the missing data.
For the complete data, the m units of sample A contribute m I_1 and the n units of sample B contribute n I_2. The missing information matrices are obtained from the conditional pdfs (CPFs) of the lifetimes of the withdrawn units. Expressions of all the expectations related to θ1 are given in Appendix A. The expressions related to sample B are similar and omitted.
After obtaining the observed information matrix, for every fixed (θ1, θ2, λ), the covariance matrix of the estimators is approximated by the inverse of the observed information matrix.
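When the analytic second derivatives are tedious, the observed information matrix can also be approximated numerically by central finite differences of the negative log-likelihood, and the covariance estimate is then its inverse. A generic sketch (all names are ours; illustrated on a toy two-parameter function whose Hessian is known exactly):

```python
def observed_info(nll, p, h=1e-5):
    """Central finite-difference Hessian of the negative log-likelihood nll
    at the point p (a list of parameter values)."""
    d = len(p)
    H = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            def f(di, dj):
                q = list(p)
                q[i] += di
                q[j] += dj
                return nll(q)
            H[i][j] = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h)
    return H

def inv2(H):
    """Inverse of a 2x2 matrix; with H the observed information, this is the
    estimated covariance matrix of the two estimators."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [[H[1][1] / det, -H[0][1] / det],
            [-H[1][0] / det, H[0][0] / det]]
```

For the three-parameter problem of this paper the same finite-difference Hessian applies, with a 3 x 3 inversion in place of inv2.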

Bootstrap Methods
In this section, the Bootstrap methods are introduced to construct confidence intervals. The algorithms of Bootstrap-p method and Bootstrap-t method are respectively given in Algorithms 1 and 2.

Algorithm 1
The algorithm of Bootstrap-p method.
Step 5: Repeat steps 3 and 4 N times.
• Bootstrap-t method:

Algorithm 2 The algorithm of Bootstrap-t method.
Steps 1 to 5 are the same as those in Algorithm 1.
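The detailed steps are given in Algorithms 1 and 2; the core idea of the Bootstrap-p method is to refit the model on samples regenerated from the fitted model and read off empirical percentiles. A hedged sketch on a deliberately simple stand-in model (an exponential rate, not the paper's IERD fit) shows the structure:

```python
import random

def boot_p_ci(data, n_boot=1000, alpha=0.10):
    """Parametric Bootstrap-p confidence interval for an exponential rate:
    fit the MLE, regenerate n_boot samples from the fit, refit each,
    and take the alpha/2 and 1 - alpha/2 empirical percentiles."""
    rate = len(data) / sum(data)                       # MLE of the rate
    boots = []
    for _ in range(n_boot):
        resample = [random.expovariate(rate) for _ in data]
        boots.append(len(resample) / sum(resample))    # re-estimated rate
    boots.sort()
    lo = boots[int(n_boot * alpha / 2.0)]
    hi = boots[int(n_boot * (1.0 - alpha / 2.0)) - 1]
    return lo, hi
```

Bootstrap-t differs in that each replicate is studentized by its standard error taken from the observed information matrix, which is why that matrix is essential there.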

Bayesian Inference
In this section, we study the Bayesian inference for the three parameters based on non-informative prior and informative prior. The estimates under symmetric loss function (square loss function) and asymmetric loss function (linex loss function) are derived.
Notice that all the parameters range over (0, +∞), so Gamma distributions are taken as the prior distributions. The Gamma distribution Ga(α, β) has density

π(x; α, β) = (β^α / Γ(α)) x^{α−1} e^{−βx}, x > 0.

It is assumed that θ1, θ2 and λ are independent and follow different Gamma distributions, θ1 ∼ Ga(a_1, b_1), θ2 ∼ Ga(a_2, b_2) and λ ∼ Ga(c, d). The joint posterior density is given by

π(θ1, θ2, λ | data) ∝ L(θ1, θ2, λ | data) π(θ1, θ2, λ),

where π(θ1, θ2, λ) denotes the joint prior density.
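One implementation detail worth flagging: Ga(α, β) above is parameterized by a rate β, while Python's random.gammavariate takes a shape and a scale, so the scale must be 1/β. A small sketch:

```python
import random

def gamma_prior_draw(alpha, beta):
    """Draw from Ga(alpha, beta) with density proportional to
    x^(alpha-1) * e^(-beta*x); gammavariate expects (shape, scale)."""
    return random.gammavariate(alpha, 1.0 / beta)
```

The same call serves for the conditional draws of θ1 and θ2 given λ, whose posteriors are again Gamma.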
Observing that the first two factors in (18) are Gamma densities, checking the shape and rate parameters confirms that, given λ, θ1 | λ, data ∼ Ga(k_1 + a_1, b_1 − ∑_{i=1}^{k}(z_i + s_i) ln(1 − e^{−λ/w_i²})) and θ2 | λ, data ∼ Ga(k_2 + a_2, b_2 − ∑_{i=1}^{k}((1 − z_i) + t_i) ln(1 − e^{−λ/w_i²})). The distributions of θ1 and θ2 both depend on λ, so generating the random variates of λ is the key. The well-known log-concave method is considered and discussed first.
The log marginal posterior ln π(λ | data) can be written up to the normalization constant C of π(λ | data), and its second-order partial derivative, given in (22), cannot be shown to be uniformly negative or positive. So, the log-concave method is not appropriate here, and the importance sampling method [29] is adopted instead: the marginal density (21) is rewritten as the product of a proposal density that is easy to sample from and a weight function.
• Square loss function: The square loss function is given by L(g(Θ), ĝ(Θ)) = (ĝ(Θ) − g(Θ))², where Θ denotes the parameter vector and g(Θ) is any function of Θ. The Bayesian estimate of any function of (θ1, θ2, λ), say g(θ1, θ2, λ), under the square loss function is the posterior mean, ĝ_S = E[g(θ1, θ2, λ) | data]. The following Algorithm 3 can be used to compute Bayesian estimates under the square loss function.

Algorithm 3
Bayesian estimates under square loss function.
Step 3: Repeat Steps 1 and 2 for M times.
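The importance sampling step in Algorithm 3 can be sketched generically: draw λ from a convenient proposal, weight each draw by target/proposal, and form the self-normalized weighted average, which is exactly the posterior-mean estimate under square loss. The toy target below (an unnormalized Ga(3, 2) density standing in for π(λ | data)) is our illustration, not the paper's posterior:

```python
import math
import random

def is_posterior_mean(g, log_target, draw_proposal, log_proposal, M=20000):
    """Self-normalized importance sampling estimate of E[g | data]:
    sum(w_l * g(x_l)) / sum(w_l), with w_l = target(x_l) / proposal(x_l)
    and x_l drawn from the proposal."""
    num = den = 0.0
    for _ in range(M):
        x = draw_proposal()
        w = math.exp(log_target(x) - log_proposal(x))
        num += w * g(x)
        den += w
    return num / den
```

With the unnormalized Ga(3, 2) target and an Exp(1) proposal, the estimate converges to the exact posterior mean 3/2; because the estimator is self-normalized, the target only needs to be known up to a constant, which is precisely the situation in (21).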

• Linex loss function:
The linex loss function is given by L(g(Θ), ĝ(Θ)) = e^{δ(ĝ(Θ)−g(Θ))} − δ(ĝ(Θ) − g(Θ)) − 1, δ ≠ 0. The Bayesian estimate of any function of (θ1, θ2, λ), say g(θ1, θ2, λ), under the linex loss function can be obtained as ĝ_L = −(1/δ) ln E[e^{−δ g(θ1, θ2, λ)} | data]. The following Algorithm 4 can be used to compute Bayesian estimates under the linex loss function.

Algorithm 4 Bayesian estimates under linex loss function.
Steps 1 to 4 are the same as those in Algorithm 3.
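The step of Algorithm 4 that differs from Algorithm 3 is the final transform: under linex loss the Bayes estimate is −(1/δ) ln E[e^{−δg} | data], so given posterior draws of g it is one line. A sketch with equally weighted draws (the importance-weighted version divides by the weight sum exactly as in Algorithm 3):

```python
import math
import random

def linex_estimate(draws, delta):
    """Bayes estimate under linex loss from posterior draws of g:
    -(1/delta) * ln( average of exp(-delta * g) )."""
    m = sum(math.exp(-delta * g) for g in draws) / len(draws)
    return -math.log(m) / delta
```

As δ → 0 the estimate tends to the posterior mean, while for a N(μ, σ²) posterior it equals μ − δσ²/2; with δ > 0 the estimate is deliberately shaded downward, penalizing overestimation more than underestimation.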
Define sumc(ρ) = ∑_{l=1}^{ρ} weight(l) for ρ = 1, 2, · · · , M. When sumc(ρ−1) < α/2 and sumc(ρ) ≥ α/2, record g^{(ρ)}. When sumc(η) < 1 − α/2 and sumc(η+1) ≥ 1 − α/2, record g^{(η)}. Then (g^{(ρ)}, g^{(η)}) is a (1 − α)% credible interval of g(θ1, θ2, λ).

Simulation and Data Analysis

Numerical Simulation
In this section, different m, n, k and r_1, r_2, · · · , r_k are taken and the simulation results of the methods mentioned above are displayed. We consider different censoring schemes. For the rest of the paper, the notation (0^(5), 5^(5), 0^(10)) means that no unit is withdrawn at the first 5 failure times, 5 units are withdrawn at each of the next 5 failure times, and no unit is withdrawn at the last 10 failure times. The meanings of the other notations are similar. The average values (AVs) and mean square errors (MSEs) of the MLEs are calculated by repeating the EM algorithm 1000 times. To be more practical, both the true values and θ1^0 = θ2^0 = λ^0 = 7 are set as the initial values in the EM algorithm. The results are given in Tables 1 and 2. In Tables 1 and 2, the estimates of λ are closer to the true value than those of θ1 and θ2 under the same censoring scheme. This is reasonable because λ is common to the two samples; with more information, λ gets better estimates. For the same sample size and number of failures (k), the results are better if the withdrawals are executed in the middle of the experiment rather than at the beginning and end. Besides, Figure 3 shows that a more dispersed withdrawal scheme, i.e., withdrawing the units over several failure times rather than all at once, yields better estimates. Comparing the censoring schemes (n = 20, m = 25) with (n = 25, m = 20), the estimates of θ2 improve as n increases: the MSEs of θ̂2 decrease. This is reasonable because if the sample size n increases, more information on sample B can be utilized and better estimates can be obtained. When n and m are kept fixed and k increases, the estimates also improve.
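For reproducing the tables, the compact scheme notation can be expanded mechanically. The helper below (ours) turns, e.g., (0^(5), 5^(5), 0^(10)) into the explicit list r_1, · · · , r_k, and the identity ∑(r_i + 1) = m + n then fixes the required total sample size:

```python
def expand_scheme(pairs):
    """Expand run-length scheme notation: [(0, 5), (5, 5), (0, 10)] means
    'withdraw 0 units at each of 5 failures, then 5 units at each of
    5 failures, then 0 units at each of 10 failures'."""
    r = []
    for value, count in pairs:
        r.extend([value] * count)
    return r
```

Here k = 20 failures and ∑(r_i + 1) = 45, so this scheme suits two samples with m + n = 45 units in total.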
The MSE plots (taking k = 20, n = 40, m = 45 as an example) are shown in Figure 3. The parameters of the prior distributions are a_1 = 2, b_1 = 1, a_2 = 1, b_2 = 2, c = 3, d = 2 and the linex loss constant is δ = 2. The Bayes estimates for the informative prior under the square loss function (say θ̂_1S, θ̂_2S, λ̂_S) and the linex loss function (say θ̂_1L, θ̂_2L, λ̂_L) are then compared based on 1000 replications. The results are given in Tables 3 and 4. Tables 3 and 4 show that, compared with the square loss function, the linex loss function gives bigger MSEs but AVs closer to the true values. Bayesian inference performs better than the MLEs in terms of MSEs. However, the Bayesian estimators mostly underestimate the true parameter values, while the MLEs do not show this pattern. In the schemes with larger sample sizes, the MLEs have better AVs but bigger MSEs than the Bayesian estimators. The tables also reveal that the MSEs become smaller as k increases. As the true values increase, all the methods mentioned above show bigger MSEs, which means the results are more dispersed. Next, we compare the Bayes estimates for the non-informative prior and the informative prior. According to [30], the hyperparameters are set as a_1 = 0.0001, b_1 = 0.0001, a_2 = 0.0001, b_2 = 0.0001, c = 0.0001, d = 0.0001. The patterns across schemes are similar to those mentioned above. Observing Tables 5 and 6, it is found that if the samples have more units but relatively few failures, the Bayesian estimators with informative priors perform better than those with non-informative priors. When there are relatively many failures, in other words, more observed data even though the sample sizes are small, the results of the two methods differ little. In addition, the results are closer to the true values under the square loss function. In a word, Bayesian estimation with an informative prior under the square loss function performs best among the methods discussed.
Besides the point estimates, the Bootstrap-p, Bootstrap-t, and Bayesian methods are used to obtain 90% confidence/credible intervals. In Tables 7 and 8, the average lengths (ALs) and coverage percentages (CPs) are calculated based on 1000 replications. In the Bootstrap methods, the number of bootstrap resamples is set to 1000 (N = 1000). Here, IP means under the informative prior density and NIP means under the non-informative prior density. Table 7 displays the ALs and CPs of the confidence intervals with the Bootstrap-p and Bootstrap-t methods. The contrast between Bootstrap-p and Bootstrap-t indicates that the CPs are similar but the ALs of Bootstrap-t are wider than those of Bootstrap-p. Therefore, the Bootstrap-p method is more appropriate for obtaining the confidence intervals. Table 8 shows the ALs and CPs of the credible intervals under the informative and non-informative prior densities. The contrast indicates that the ALs of NIP tend to be longer than those of IP while its CPs are smaller, so IP clearly performs better than NIP. Besides, for a fixed scheme, θ2 has the best credible-interval estimates. Figure 4 displays the contrast of the CPs among the different methods (taking k = 20, n = 40, m = 45 as an example), which also indicates that Bootstrap-p and the Bayes method with an informative prior are more suitable for interval estimation. Compared to the Bootstrap methods, the Bayesian method yields better credible intervals. For large k, the CPs increase considerably with both the Bayesian and the Bootstrap methods. When there are sufficient units, choosing the Bayesian method with informative priors is better.

Real Data Analysis
In this part, we analyze one application to coating weights with two real data sets and apply the approaches put forward in the sections above. The data sets are from the ALAF (formerly Aluminium Africa Limited) industry, Tanzania, and contain the coating weights (mg/m²) measured by a chemical procedure on the top center side (TCS) and on the bottom center side (BCS). Each data set contains 72 observations. The data were also analyzed by [11]. For convenience, the data sets are divided by 10. First, to verify that the IERD is suitable for the data sets, we fit it to each data set and perform the Kolmogorov-Smirnov (K-S) test: we calculate the largest difference between the empirical cumulative distribution function and the fitted distribution function and compare that value with the 95% critical value. The results are shown in Table 9: the K-S distances are less than the 95% critical value, so the IERD fits both data sets well. Figure 5 shows the fit for each data set separately. Then, the likelihood ratio test is used to test whether the scale parameters can be considered equal, H_0: λ_1 = λ_2. The p-value is calculated to be 0.943, so the null hypothesis cannot be rejected and the two scale parameters can be considered the same. Under the null hypothesis, the MLEs are obtained as θ̂_1 = 15.55, θ̂_2 = 14.87, λ̂ = 57.08.
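The K-S check described above is easy to reproduce: the statistic is the largest gap between the empirical cdf and the fitted cdf, compared against the large-sample 95% critical value of approximately 1.36/√n. A sketch (function names are ours; the fitted IERD cdf is passed in as a callable):

```python
import math

def ks_distance(data, cdf):
    """K-S statistic: maximum over the sorted data of the gap between the
    empirical cdf and the fitted cdf, checked on both sides of each jump."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_95(n):
    """Approximate large-sample 95% critical value of the K-S statistic."""
    return 1.36 / math.sqrt(n)
```

With n = 72 observations per data set, the 95% critical value is about 0.160; the fitted IERD is retained when the observed distance stays below it.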
Using the complete data above, we generate observed data under the following three censoring schemes: (0^(18), 2^(36), 0^(18)), (0^(35), 36^(2), 0^(35)), and (36, 0^(70), 36). The MLEs for the complete data are taken as the initial values of the EM algorithm. The AVs and MSEs of the MLEs are then obtained in Table 10. To verify the stability of the iteration, we change the initial guesses and plot the trend of the estimates. The results are shown in Figure 6. The numbers of iterations are 15 in (a), 23 in (b) and (c), and 26 in (d). It is observed that, with the same initial guess of λ, the more dispersed scheme needs fewer iterations. When the initial guesses are not close to the true values, the number of iterations increases but the processes remain stable.
In this case, we cannot obtain informative priors, so all Bayesian estimates are based on non-informative priors. Tables 11 and 12 record the results of the Bayesian method with non-informative priors. The 90% confidence/credible intervals obtained with the Bootstrap methods and the Bayes estimates for the non-informative prior are displayed in Table 13.
From the real data, some facts emerge. The Bayesian point estimates for the non-informative prior under the square loss function are higher than those under the linex loss function. Besides, the first scheme (0^(18), 2^(36), 0^(18)) corresponds to higher point estimates and shorter interval lengths. Table 13 reveals that Bayesian inference under non-informative priors yields shorter interval lengths than the Bootstrap-p and Bootstrap-t methods. More dispersed schemes and Bayesian inference are preferred in real data analysis.

Conclusions
In this article, we studied two samples following the inverted exponentiated Rayleigh distribution under the joint progressively type-II censoring scheme. It was supposed that the shape parameters were different but that the scale parameters were the same. The Expectation-Maximization algorithm was applied to obtain the MLEs. The performance of the MLEs and of the Bayesian estimators under non-informative and informative priors was compared. Bootstrap-p, Bootstrap-t, and Bayesian methods were used in interval estimation. The importance sampling technique was introduced to calculate the Bayesian estimates. The contrast between the estimates under the square loss function and the linex loss function was also studied.
During the point estimation process, Bayesian inference under informative priors turned out to be the best method, and many patterns of the censoring schemes were identified. For the confidence intervals, Bootstrap methods were applied and evaluated. Besides, the observed Fisher information matrix, derived via the missing value principle, played a key role in obtaining the Bootstrap-t intervals.
In the future, the methods can be extended to many other distributions, such as the multivariate Gaussian distribution with zero mean. Ref. [31] proposed a total Bregman divergence-based matrix information geometry (TBD-MIG) detector and applied it to detect targets embedded in nonhomogeneous clutter. We are also working on the situations where the scale parameters are different and the samples are not independent.

Data Availability Statement:
The data presented in this study are openly available in [11].

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
The elements of the Fisher information matrix are expressed as expectations of the negative second partial derivatives, of the form −E(∂² ln f_IERD(x; θ1, λ)/∂θ1²), −E(∂² ln f_IERD(x; θ1, λ)/∂θ1∂λ) and −E(∂² ln f_IERD(x; θ1, λ)/∂λ²).