Change Point Test for Length-Biased Lognormal Distribution under Random Right Censoring

The length-biased lognormal distribution is the length-biased version of the lognormal distribution, developed to model length-biased lifetime data arising in, for example, biological investigation, medical research, and engineering. Because lifetime data are frequently censored, this paper studies the change-point-testing problem for the length-biased lognormal distribution under random censoring. A procedure based on the modified information criterion is developed to detect changes in the parameters of this distribution. Under a sufficient condition for the Fisher information matrix to be positive definite, it is proven that the null asymptotic distribution of the test statistic is a chi-square distribution. To evaluate the uncertainty of the change point location estimate, a way of calculating the coverage probabilities and average lengths of confidence sets for the change point location, based on the profile likelihood and the deviance function, is proposed. Simulations are conducted under uniform censoring and exponential censoring to investigate the validity of the proposed method. The results indicate that the proposed approach performs better than the method based on the likelihood ratio test in terms of test power, coverage probabilities, and average lengths of confidence sets. The proposed approach is then applied to survival data from heart transplant patients, and the results show that the median survival time after heart transplantation differs among patients of different ages.


Introduction
In fields such as medical research [1], survival analysis [2,3], and reliability studies [4], censored data often arise when the lifetime model is a length-biased distribution. Typically, censoring is classified as left censoring, interval censoring, or right censoring. Right censoring can be further categorized as Type I censoring, Type II censoring, and random censoring, of which random censoring is the most common [5,6]. Random censoring refers to situations where the total period of observation is fixed, but subjects enter the study at different points in time. Some subjects experience the event of interest; others do not, some are lost to follow-up, and others are still alive at the end of the study period. In random censoring, the censored subjects do not all have the same censoring time [5]. In this case, if the censored data are deleted or the censoring times are treated as complete lifetimes, important numerical characteristics such as the median or mean may be underestimated. Therefore, analyzing right-censored data in a reasonable manner is important for exploring the underlying patterns. In this article, we focus on random right-censored data.
The length-biased lognormal distribution is a length-biased distribution based on the lognormal distribution. Its probability density function is defined as follows:

$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(\ln x - \mu)^2}{2\sigma^2} - \mu - \frac{\sigma^2}{2}\right\},$$

where $x > 0$, $-\infty < \mu < \infty$, $\sigma > 0$. For convenience, it is briefly denoted as LBLN($\mu$, $\sigma$). Because the length-biased lognormal distribution is important in describing the characteristics of lifetime distributions, it has been studied extensively. Sansgiry and Akman considered it for product life modeling and deduced its corresponding reliability function [7,8]. Ratnaparkhi and Naik-Nimbalkar studied the estimation problems of the length-biased lognormal distribution [9]. The existing literature primarily focuses on the statistical properties [7], parameter estimation [9], and practical applications of the length-biased lognormal distribution [8]. However, there is relatively limited research on the parameter change point test of this distribution under random censoring.

The change point problem originates from the field of industrial quality control [10] and has been an active topic in statistical research ever since it was proposed. The research mainly covers two aspects. One is to test whether there is a change point, and the other is to estimate the number and locations of the change points if any exist. Following over six decades of research and development, the study of the change point problem has been widely applied in epidemiology, biology, environmental science, and reliability engineering [11-13]. For censored data, a few studies have focused on change point testing and estimation for linear transformation models and hazard function parameters based on right-censored data. For instance, Kosorok and Song [14] utilized a score test statistic to investigate the estimation and testing problems of change points for regression coefficients in a linear transformation model for right-censored survival data. Rabhi and Asgharian [15] studied the problem of change point estimation for the hazard function under biased sampling and right-censored data. Chen et al. [16] proposed a detection procedure based on the empirical likelihood for changes in the mean residual life functions with right-censored data.
While some of the literature has studied change point detection procedures for linear transformation models and reliability parameters of interest with right-censored data, research on change point testing for the length-biased lognormal distribution with random right-censored data has been limited. This change point detection problem can be approached as a model selection problem, where we aim to select the better option between the null hypothesis of no change and the alternative hypothesis of at least one change. Therefore, methods commonly used for model selection can be applied to change-point-testing problems.
Compared to the usual model selection problem, the change point problem introduces a special parameter: the change location. When the change point occurs near the middle of the process, there are no redundant parameters, making it easier to detect the change point. However, when it occurs near the beginning or the end of the sequence, the parameters of one segment of the data (the first part when it is near the beginning, the second part when it is near the end) become redundant. The traditional change point detection method is based on the likelihood ratio test (LRT), which does not consider the contribution of the change location.
To address this limitation, we use the modified information criterion (MIC) method, which accounts for the influence of the change point position on model complexity in the penalty term. This method is employed to detect the change point in the length-biased lognormal distribution with random right-censored data. Additionally, the asymptotic properties of the test statistic constructed from the MIC under random right censoring have not been studied yet. This paper aims to investigate change point detection for the length-biased lognormal distribution under random right censoring.
The rest of this paper is organized as follows. In Section 2, a change-point-testing model for the parameters of the length-biased lognormal distribution under random right censoring is presented. A corresponding change point test method is introduced, and sufficient conditions for the positive definiteness of the Fisher information matrix under random right-censored data are provided, along with the asymptotic distribution of the test statistic. Then, a technique to calculate the coverage probabilities and average lengths of confidence sets for the change point location, based on the profile likelihood function and the deviance function, is proposed. In Section 3, simulations are carried out with censoring rates of 10% and 30% under uniform and exponential censoring distributions to assess the performance of the detection procedures. In Section 4, we illustrate our method on a survival time dataset of heart transplant patients. Some discussion is provided in Section 5.

Change Point Model
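Suppose $T_1, T_2, \ldots, T_n$ are independent and identically distributed (i.i.d.) survival times. A common feature of survival data is the presence of right censoring. In the presence of censoring, we only observe

$$U_i = \min(T_i, C_i), \qquad \delta_i = I(T_i \le C_i), \qquad i = 1, 2, \ldots, n,$$

where $C_i$ are potential censoring times for the $n$ subjects, which are often treated as random variables with distribution function $H(\eta)$ in statistical inference. We assume that $T_i$ and $C_i$ are independent. If the independent random variables $T_1, T_2, \ldots, T_n$ follow a length-biased lognormal distribution, that is, $T_i \sim \mathrm{LBLN}(\mu_i, \sigma_i)$, then the density of the random right-censored data pair $(U_i, \delta_i)$ is

$$p(U_i; \theta_i) = \left[f(U_i; \theta_i)\,\{1 - H(U_i)\}\right]^{\delta_i} \left[h(U_i)\,\{1 - F(U_i; \theta_i)\}\right]^{1 - \delta_i},$$

where $f(\cdot\,; \theta_i)$ is the LBLN density, $F(u; \theta_i) = \Phi\big((\ln u - \mu_i - \sigma_i^2)/\sigma_i\big)$ is the corresponding distribution function, $\Phi(\cdot)$ is the standard normal distribution function, $h(\cdot)$ is the probability density function of the censoring distribution, and $H(\cdot)$ is its distribution function.

We are interested in testing the null hypothesis of no change in $\theta_i = (\mu_i, \sigma_i)$ of the length-biased lognormal distribution with random right censoring,

$$H_0: \theta_1 = \theta_2 = \cdots = \theta_n = \theta, \tag{6}$$

against the following alternative hypothesis

$$H_1: \theta_1 = \cdots = \theta_{k_1} \ne \theta_{k_1 + 1} = \cdots = \theta_{k_2} \ne \cdots \ne \theta_{k_p + 1} = \cdots = \theta_n,$$

where $k_1, k_2, \ldots, k_p$ are the locations of $p$ changes in the sequence. Since the multiple change point problem can be reduced to a single change point problem by the binary segmentation method [17], we consider testing the null hypothesis (6) against the following alternative hypothesis:

$$H_1: \theta_1 = \cdots = \theta_k = \theta_L \ne \theta_{k+1} = \cdots = \theta_n = \theta_R,$$

where $k$ is the unknown change point location, and $\theta_L$ and $\theta_R$ are the parameters before and after the change point, respectively.

Then, the likelihood function under the null hypothesis is

$$L_0(\theta) = \prod_{i=1}^{n} \left[f(u_i; \theta)\,\{1 - H(u_i)\}\right]^{\delta_i} \left[h(u_i)\,\{1 - F(u_i; \theta)\}\right]^{1 - \delta_i}.$$

The log-likelihood function after removing the terms that do not involve $\theta$ is

$$\ell_0(\theta) = \sum_{i=1}^{n} \left\{\delta_i \log f(u_i; \theta) + (1 - \delta_i) \log\left[1 - F(u_i; \theta)\right]\right\}.$$

Similarly, the log-likelihood function under the alternative hypothesis is

$$\ell_k(\theta_L, \theta_R) = \sum_{i=1}^{k} \left\{\delta_i \log f(u_i; \theta_L) + (1 - \delta_i) \log\left[1 - F(u_i; \theta_L)\right]\right\} + \sum_{i=k+1}^{n} \left\{\delta_i \log f(u_i; \theta_R) + (1 - \delta_i) \log\left[1 - F(u_i; \theta_R)\right]\right\}.$$

Because of the nonlinear function $\Phi(\cdot)$, the maximum likelihood estimators $\hat\theta$, $\hat\theta_L$, and $\hat\theta_R$ have no closed form and are obtained numerically.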
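The censored log-likelihood and its numerical maximization can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: it relies on the standard length-biasing construction, under which LBLN($\mu$, $\sigma$) coincides with a lognormal distribution with log-mean $\mu + \sigma^2$ and log-standard deviation $\sigma$. The function names (`lbln_logpdf`, `lbln_logsf`, `censored_loglik`, `fit_mle`) are hypothetical, and a simple compass search stands in for whichever numerical optimizer was actually used.

```python
import math

def lbln_logpdf(x, mu, sigma):
    """Log-density of LBLN(mu, sigma) at x > 0 (constant term kept)."""
    z = (math.log(x) - mu) / sigma
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * z * z - mu - 0.5 * sigma ** 2)

def lbln_logsf(x, mu, sigma):
    """Log of 1 - F(x), with F(x) = Phi((ln x - mu - sigma^2) / sigma)."""
    z = (math.log(x) - mu - sigma ** 2) / sigma
    sf = 0.5 * math.erfc(z / math.sqrt(2.0))
    return math.log(max(sf, 1e-300))  # guard against underflow to zero

def censored_loglik(u, delta, mu, sigma):
    """log f(u_i) for events (delta_i = 1), log(1 - F(u_i)) for censored points."""
    return sum(lbln_logpdf(x, mu, sigma) if d else lbln_logsf(x, mu, sigma)
               for x, d in zip(u, delta))

def fit_mle(u, delta, start=(0.0, 1.0), step=0.5, tol=1e-6):
    """Maximize the censored log-likelihood by compass search over (mu, log sigma)."""
    mu, log_sigma = start[0], math.log(start[1])
    best = censored_loglik(u, delta, mu, math.exp(log_sigma))
    while step > tol:
        improved = False
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = censored_loglik(u, delta, mu + d_mu, math.exp(log_sigma + d_ls))
            if cand > best:
                mu, log_sigma, best = mu + d_mu, log_sigma + d_ls, cand
                improved = True
        if not improved:
            step /= 2.0  # shrink the search pattern once no neighbor improves
    return mu, math.exp(log_sigma), best
```

Searching over $\log\sigma$ keeps $\sigma$ positive without explicit constraints. The same `fit_mle` call applied separately to the observations before and after a candidate $k$ gives $\hat\theta_L$ and $\hat\theta_R$.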

Modified Information Criterion Procedure
The likelihood ratio test (LRT) procedure is one of the most popular methods for parametric change point analysis. Alternatively, a change point problem can be viewed as a model selection problem. That is, we choose the better model between the null hypothesis and the alternative hypothesis. Choosing the model under the null hypothesis corresponds to the situation of no change. Otherwise, it corresponds to the case that at least one change occurs. Hence, information criteria such as the Akaike information criterion (AIC) [18] and the Schwarz information criterion (SIC) [19] can be used for the change point test. The literature is rich in studies conducted in this direction. For instance, Chen and Gupta [20] developed a binary procedure combined with the SIC to search for all possible variance change points in a sequence of independent Gaussian random variables. Chen and Gupta [21] provided the testing and estimation of a single change point in the means and variances of a sequence of independent Gaussian random variables based on the SIC and the unbiased version of the SIC.
Model complexity is a very important factor in model selection based on information criteria. The dimension of the parameter space is usually used to measure model complexity. However, the method based on the SIC does not consider the influence of the change location, which may cause redundancy when the change is near the beginning or the end of the data. To solve this issue, Chen et al. [22] modified the traditional information criterion by making the model complexity a function of the change point location, and denoted it the modified information criterion (MIC). In this paper, we study the change point problem in the parameters of the LBLN distribution with random right censoring based on the MIC. Under the null hypothesis, the MIC is defined as

$$\mathrm{MIC}(n) = -2\,\ell_0(\hat\theta) + 2\log(n),$$

where $\ell_0$ is the log-likelihood function under the null hypothesis, $\hat\theta = (\hat\mu, \hat\sigma)$ is the MLE of $\theta = (\mu, \sigma)$, and $n$ is the sample size. The associated MIC statistic under the alternative hypothesis is defined as

$$\mathrm{MIC}(k) = -2\,\ell_k(\hat\theta_L, \hat\theta_R) + \left[4 + \left(\frac{2k}{n} - 1\right)^2\right]\log(n), \tag{13}$$

where $\ell_k$ is the log-likelihood function under the alternative hypothesis with a change at $k$, $k$ is the unknown change point location, and $\hat\theta_L$ and $\hat\theta_R$ are the maximum likelihood estimators of $\theta_L$ and $\theta_R$.

For the penalty term $\left[4 + (2k/n - 1)^2\right]\log(n)$ in Equation (13), when $k \to 1$ or $k \to n$, the change point lies near one end of the data, and the penalty term in $\mathrm{MIC}(k)$ approaches $5\log(n)$. When $k \to n/2$, the penalty term approaches $4\log(n)$, and the complexity of the model is minimized. The reason is that when the change point is close to either end, there are not enough data on one side for estimating the parameters, which may result in a large variance of the parameter estimates, so a larger penalty is set. When the change point is close to the middle of the data, there are enough data to estimate the parameters on both sides, the variance of the parameter estimates is small, and a smaller penalty is sufficient. In other words, when a suspected change point appears near either end, stronger evidence is needed to establish that the change exists.

To ensure sufficient observations for parameter estimation, the change location $k$ is restricted to a set $K$ of admissible locations. Based on the minimum information criterion principle, the model with the smallest MIC value is considered the best fit to the data. Then, $\min_{k \in K} \mathrm{MIC}(k)$ corresponds to the best model under $H_1$. Further, we accept $H_0$ if

$$\mathrm{MIC}(n) \le \min_{k \in K} \mathrm{MIC}(k),$$

that is, the model with no change is the best one. We reject $H_0$ if

$$\mathrm{MIC}(n) > \min_{k \in K} \mathrm{MIC}(k),$$

which indicates that the best model under $H_1$ is more appropriate to describe the data than the model under $H_0$. It leads us to conclude that there exists at least one change in the data. Correspondingly, the location of the change can be estimated as follows:

$$\mathrm{MIC}(\hat k) = \min_{k \in K} \mathrm{MIC}(k). \tag{14}$$

To make the conclusion more statistically convincing, we construct the MIC-based test statistic as follows:

$$M_n = \mathrm{MIC}(n) - \min_{k \in K} \mathrm{MIC}(k) + 2\log(n). \tag{15}$$

Given the critical value $M$ at significance level $\alpha$, one decides whether to accept or reject $H_0$ by comparing $M$ and $M_n$, rejecting $H_0$ when $M_n$ exceeds $M$. To calculate the critical value of $M_n$, the asymptotic distribution of $M_n$ can be obtained under certain conditions, as shown in Theorem 1. The following lemma is required to prove this theorem.
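The decision rule above can be sketched as a small Python helper. This is an illustrative sketch, assuming the maximized log-likelihoods have already been computed elsewhere; the names `mic_penalty` and `mic_test` and the dictionary-based interface are hypothetical, and the null penalty $2\log(n)$ follows the MIC of Chen et al. [22] with a two-dimensional parameter.

```python
import math

def mic_penalty(k, n):
    """Penalty of MIC(k): [4 + (2k/n - 1)^2] * log(n)."""
    return (4.0 + (2.0 * k / n - 1.0) ** 2) * math.log(n)

def mic_test(loglik_null, loglik_alt, n):
    """Return (M_n, k_hat) from maximized log-likelihoods.

    loglik_null: maximized log-likelihood under H0.
    loglik_alt:  dict mapping each candidate k in K to the maximized
                 log-likelihood with a change at k.
    """
    mic_n = -2.0 * loglik_null + 2.0 * math.log(n)
    mic_k = {k: -2.0 * ll + mic_penalty(k, n) for k, ll in loglik_alt.items()}
    k_hat = min(mic_k, key=mic_k.get)  # location with the smallest MIC(k)
    m_n = mic_n - mic_k[k_hat] + 2.0 * math.log(n)
    return m_n, k_hat
```

Algebraically, $M_n$ reduces to $\max_{k \in K}\{2[\ell_k(\hat\theta_L, \hat\theta_R) - \ell_0(\hat\theta)] - (2k/n - 1)^2 \log(n)\}$, so a change at the midpoint carries no extra penalty relative to the likelihood ratio statistic, while a change near either end is penalized by up to $\log(n)$.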

Lemma 1. Under the random right-censoring data (U … 2, then the Fisher information matrix based on these data is positive definite.

Proof of Lemma 1. According to the definition of random right-censoring data pairs in Section 2.1, the density function under right-censoring data is $p(U_i; \theta)$. Taking the logarithm of this density and differentiating twice with respect to $\theta$ gives the Fisher information matrix of $p(U_i; \theta)$ based on the length-biased lognormal distribution, whose elements are determined by the expectations of the second derivatives of the logarithmic density. When calculating these expectations, it is found that some integrals have no analytical expression, so we approximate them by Monte Carlo integration. To reduce the number of such integrals and Monte Carlo approximations, the expectations are simplified by integration by parts, which gives the elements of the Fisher information matrix.

Theorem 1. Based on certain Wald conditions and regularity conditions, and if … )², then the asymptotic null distribution of the test statistic $M_n$ for the length-biased lognormal distribution with random right censoring is

$$M_n \xrightarrow{d} \chi^2_2, \qquad n \to \infty,$$

where $M_n$ is defined in Equation (15), and the Wald conditions and regularity conditions are listed in Appendix A.
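Because the limiting distribution is chi-square with two degrees of freedom, its tail probability has the closed form $P(\chi^2_2 > c) = e^{-c/2}$, so asymptotic critical values and p-values for $M_n$ need no statistical tables. The sketch below illustrates this; the function names are hypothetical, and the values are only the large-sample approximation, not the simulated finite-sample critical values reported in the simulation study.

```python
import math

def chi2_2df_critical_value(alpha):
    """Upper-alpha quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)

def chi2_2df_p_value(m_n):
    """P(chi2_2 > m_n); a negative statistic is treated as giving p = 1."""
    return math.exp(-0.5 * max(m_n, 0.0))
```

For example, `chi2_2df_critical_value(0.05)` gives the familiar asymptotic threshold of about 5.99 at the 5% level.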

Profile Likelihood Function and Deviance Function
The confidence distribution can be regarded as a sample-dependent distribution that can be used to study interval estimation and point estimation of the parameter of interest [23]. In particular, it can provide the confidence interval of the parameter of interest at any nominal level through the confidence curve. In the change point problem, the location of the change point is a discrete variable, and the uncertainty analysis of the estimated change location is a challenging task. Similar to reference [24], we establish the confidence curve of the change point location based on the profile likelihood and the deviance function to analyze the uncertainty of the change location estimate.
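Assume that $X_1, X_2, \ldots, X_k$ is a sample from the population density function $f(x, \theta_L)$, with corresponding observations $x_1, x_2, \ldots, x_k$. Meanwhile, $X_{k+1}, \ldots, X_n$ comes from the population density function $f(x, \theta_R)$, with corresponding observations $x_{k+1}, \ldots, x_n$. Then, the log-likelihood function of $(k, \theta_L, \theta_R)$ is

$$\ell(k, \theta_L, \theta_R) = \sum_{i=1}^{k} \log f(x_i, \theta_L) + \sum_{i=k+1}^{n} \log f(x_i, \theta_R). \tag{21}$$

For a given change point $k$, the profile log-likelihood function is obtained by maximizing the log-likelihood function (21) over $(\theta_L, \theta_R)$:

$$\ell_{\mathrm{prof}}(k) = \max_{\theta_L, \theta_R} \ell(k, \theta_L, \theta_R) = \ell(k, \hat\theta_L, \hat\theta_R),$$

where $\hat\theta_L$ and $\hat\theta_R$ are the MLEs of $\theta_L$ and $\theta_R$ for the given $k$. Then, $\hat k$ can be obtained by

$$\hat k = \arg\max_{k} \ell_{\mathrm{prof}}(k).$$

The deviance function of $k$ is given by

$$D(k, X) = 2\left\{\ell_{\mathrm{prof}}(\hat k) - \ell_{\mathrm{prof}}(k)\right\},$$

where $X = (X_1, X_2, \ldots, X_n)$. To construct the confidence curve of $k$ based on the deviance function, we consider the distribution of $D(k, X)$ at $k$, which is denoted as $R_k(t) = P_{k, \hat\theta_L, \hat\theta_R}\{D(k, X) \le t\}$. However, because the location of the change point is discrete, $R_k(t)$ does not satisfy Wilks' theorem. Therefore, we compute $R_k(t)$ through simulation. The confidence curve of $k$ is defined by

$$cc(k, x_{\mathrm{obs}}) = R_k\big(D(k, x_{\mathrm{obs}})\big),$$

and it can be obtained through the following simulation,

$$cc(k, x_{\mathrm{obs}}) \approx \frac{1}{B} \sum_{j=1}^{B} I\left\{D(k, X_j^{*}) \le D(k, x_{\mathrm{obs}})\right\},$$

where $B$ is a large number, typically $B = 1000$, and $X_j^{*}$ is a sample generated from $f(x, \hat\theta_L)$ and $f(x, \hat\theta_R)$ with the change point at the given $k$.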
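The two building blocks of the confidence curve, the deviance and its simulated distribution function, can be sketched as follows. This is a minimal illustration that assumes the profile log-likelihood values and the simulated deviances $D(k, X_j^{*})$ are supplied by the caller; the function names and the dictionary interface are hypothetical.

```python
def deviance(profile_loglik):
    """Deviance D(k) = 2 * (max_k lprof(k) - lprof(k)) for each candidate k.

    profile_loglik: dict mapping each candidate k to its profile log-likelihood.
    """
    top = max(profile_loglik.values())
    return {k: 2.0 * (top - value) for k, value in profile_loglik.items()}

def confidence_curve_value(d_obs, d_sim):
    """Monte Carlo estimate of cc(k, x_obs): the fraction of simulated
    deviances at k that do not exceed the observed deviance at k."""
    return sum(1 for d in d_sim if d <= d_obs) / len(d_sim)
```

The deviance is zero at the estimated change location and grows as the profile log-likelihood falls away from its maximum, so small values of the confidence curve mark the locations most compatible with the data.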
For comparison purposes, we estimate the location of the change point through (14) when computing the deviance function. The specific simulation steps are as follows.
Step 1. Given the sample size $n$ and the change point $k_{\mathrm{true}}$, generate a random sample with change point $k_{\mathrm{true}}$ based on the parameters $\theta_L$ and $\theta_R$, record it as $x_{\mathrm{obs}}$, and calculate the deviance $D(k, x_{\mathrm{obs}})$ at each possible change point based on $x_{\mathrm{obs}}$.
Step 2. Compute $(\hat k, \hat\theta_L, \hat\theta_R)$ based on Step 1.
Step 3. Compute the confidence curve $cc(k, x_{\mathrm{obs}})$ at each possible change point by the simulation described above, using the estimates from Step 2.
Step 4. Repeat Step 1 to Step 3 $N = 1000$ times to obtain $N$ confidence curves for the change point $k$.
Step 5. Given the significance level $\alpha$, the confidence set corresponding to each confidence curve is $K_{\mathrm{set}} = \{k : cc(k, x_{\mathrm{obs}}) \le 1 - \alpha\}$.
Step 6. Coverage probabilities of confidence sets. Based on the confidence set $K_{\mathrm{set}}$ obtained from each confidence curve, the frequency with which $k_{\mathrm{true}}$ falls in $K_{\mathrm{set}}$ is the coverage probability of the confidence set at the confidence level $1 - \alpha$: $cp = \frac{1}{N} \sum_{i=1}^{N} I\{k_{\mathrm{true}} \in K_{\mathrm{set}}^{(i)}\}$.
Step 7. Average size of confidence sets. Since the confidence set is a set of estimates of the change location, it is a set of discrete points. Thus, unlike the length of a continuous interval, we define the size of the confidence set as its number of elements. Denoting the number of elements in each confidence set $K_{\mathrm{set}}^{(i)}$ by $m_i$ ($i = 1, \ldots, N$), the average size of the confidence sets is $l_{\mathrm{set}} = \frac{1}{N} \sum_{i=1}^{N} m_i$.
In conclusion, the closer $cp$ is to $1 - \alpha$ and the smaller $l_{\mathrm{set}}$ is, the better the corresponding estimation method.
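Steps 5 to 7 amount to thresholding each confidence curve and averaging two summaries over the replications. A short sketch, assuming each confidence curve is stored as a dictionary from candidate location to its confidence curve value (a hypothetical layout chosen for illustration):

```python
def confidence_set(curve, alpha):
    """K_set = {k : cc(k, x_obs) <= 1 - alpha} for one confidence curve."""
    return sorted(k for k, cc in curve.items() if cc <= 1.0 - alpha)

def coverage_and_size(curves, k_true, alpha):
    """Coverage probability cp and average size l_set over N confidence curves."""
    sets = [confidence_set(curve, alpha) for curve in curves]
    cp = sum(1 for s in sets if k_true in s) / len(sets)
    l_set = sum(len(s) for s in sets) / len(sets)
    return cp, l_set
```

With two toy curves `{1: 0.99, 2: 0.50, 3: 0.20}` and `{1: 0.99, 2: 0.97, 3: 0.10}`, `alpha = 0.05`, and `k_true = 2`, the confidence sets are `[2, 3]` and `[3]`, so the coverage is 0.5 and the average size is 1.5.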

Simulation Study
Because the uniform distribution is non-informative and the exponential distribution is memoryless, these two distributions are commonly used as censoring distributions [16,25-28]. In this study, simulations are conducted to assess the Type I error, the power, and the coverage probabilities and average sizes of confidence sets under uniform and exponential censoring distributions with various censoring rates, sample sizes, and change locations.
Case 2: The sample size, change location, significance level, and parameters are the same as in Case 1. Unlike Case 1, the censoring times are drawn from Exp($m$), where $m$ is determined by the censoring rate of the survival times. The two values $m \in \{0.025, 0.101\}$ are obtained from $P(T_i > C_i) = 10\%$ and $P(T_i > C_i) = 30\%$, respectively.
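One way to determine $m$ from a target censoring rate is to solve $P(T > C) = E[1 - e^{-mT}]$ for $m$ numerically. The sketch below does this by quadrature and bisection. It rests on several assumptions that are not stated in the text: $m$ is read as the rate of the exponential distribution (consistent with larger $m$ giving more censoring), $T$ is taken to follow a single LBLN($\mu$, $\sigma$) distribution with no change point, LBLN($\mu$, $\sigma$) is treated as a lognormal distribution with log-mean $\mu + \sigma^2$, and the function names are hypothetical.

```python
import math

def censoring_prob(rate, mu, sigma, grid=2001, half_width=8.0):
    """P(T > C) for T ~ LBLN(mu, sigma) and independent C ~ Exp(rate).

    Uses log T ~ N(mu + sigma^2, sigma^2) and a Riemann sum over the
    standard normal variable z on [-half_width, half_width].
    """
    h = 2.0 * half_width / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = -half_width + i * h
        t = math.exp(mu + sigma ** 2 + sigma * z)
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += -math.expm1(-rate * t) * weight * h  # P(C < t) = 1 - exp(-rate * t)
    return total

def calibrate_rate(target, mu, sigma, lo=1e-8, hi=1e3, iters=60):
    """Bisection on a log scale for the rate giving P(T > C) = target."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if censoring_prob(mid, mu, sigma) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Calling `calibrate_rate(0.10, 0.0, 1.0)` and `calibrate_rate(0.30, 0.0, 1.0)` with a hypothetical baseline LBLN(0, 1) gives rates that can be compared with the two values of $m$ quoted above. The bisection is valid because the censoring probability increases monotonically with the rate.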

Critical Values and Probability of Type I Error
First, we obtain the critical values of the test and the probability of Type I error through simulations. The probability of Type I error is approximated by the frequency of Type I errors in $N$ repeated simulation experiments, $P_{\mathrm{errorI}} = S_{nI}/N$, where $S_{nI}$ is the number of Type I errors in $N$ repeated trials with a sample size of $n$. According to the Central Limit Theorem, $S_{nI}/N$ approximately follows a normal distribution with mean $\alpha$ and variance $\alpha(1 - \alpha)/N$, that is, $S_{nI}/N \,\dot\sim\, N\big(\alpha, \alpha(1 - \alpha)/N\big)$.
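The normal approximation translates directly into a band within which the simulated Type I error frequency should fall. The sketch below assumes a three-standard-deviation band, which reproduces the interval [0.029, 0.071] quoted in this paper for $\alpha = 0.05$ and $N = 1000$; the function name is hypothetical.

```python
import math

def type1_error_band(alpha, n_rep, width=3.0):
    """Interval alpha +/- width * sqrt(alpha * (1 - alpha) / n_rep)."""
    sd = math.sqrt(alpha * (1.0 - alpha) / n_rep)
    return alpha - width * sd, alpha + width * sd
```

Because the half-width shrinks at the rate $1/\sqrt{N}$, quadrupling the number of replications halves the band.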

Power Comparison
Tables 2 and 3 show the power of the likelihood ratio test method and the MIC criterion test method in the case of uniform and exponential censoring, respectively.
Taking the test results under the uniform censoring distribution as an example, we can draw the following conclusions. First, excluding the combinations with equal powers, the test power based on the MIC is mostly higher than that based on the likelihood ratio under the same parameter settings. Second, the powers of the LRT-based method and the MIC-based method increase with the sample size. For example, when $(\mu_R, \sigma_R) = (1.0, 0.5)$, cr = 10%, and $k = n/2$, the power of the LRT-based method increases from 0.514 ($n = 50$) to 0.910 ($n = 150$), and the power of the MIC-based method increases from 0.547 ($n = 50$) to 0.913 ($n = 150$). Third, in most cases, the closer the change point is to the midpoint of the data series, the higher the power of the test. For example, the power of the MIC-based method is 0.587 at $k = 25$, while the power is 0.427 at $k = 12$ under $(\mu_R, \sigma_R) = (1, 1)$ and cr = 10%. Fourth, when the change point position and parameter settings are the same, the power of the test with a censoring rate of 10% is noticeably higher than that of the test with a censoring rate of 30% in most cases. For instance, the power of the MIC-based method is 0.817 when $n = 100$, $k = 25$, cr = 10%, and $(\mu_R, \sigma_R) = (-2, 2)$, but the power is 0.539 when cr = 30%. This may be because a larger censoring rate masks the true pattern of change and increases the difficulty of the change point test. Finally, when the sample size, change point position, and censoring rate are the same, the greater the parameter change, the higher the power of the change point test. For example, when the parameter increases from (0, 1) to (0, 1.5), the test power based on the MIC is (0.515, 0.769, 0.596) when $n = 50$ and cr = 10%, and it increases to (0.976, 0.999, 0.978) when the parameter increases to (1, 1.5).
Table 2. The power of the change point test under uniform censoring (α = 0.05).

Coverage Probabilities and Average Length of Confidence Sets
To evaluate the uncertainty of change point location estimation, some simulations are conducted to study the coverage probabilities and average lengths of confidence sets for change point location estimation under a uniform censoring distribution with censoring rates of 10% and 30%, respectively. The specific results are shown in Tables 4 and 5. From Table 4, we can observe that the coverage probability of confidence sets for change point location estimation with a censoring rate of 10% is higher than that corresponding to a censoring rate of 30%, and closer to the given nominal level.
From Table 5, it can be seen that under the same sample size, the same change point location, and the same parameter settings, the average length of confidence sets for change point location estimation corresponding to a censoring rate of 10% is smaller than that corresponding to a censoring rate of 30%. When the change point location is closer to the middle of the dataset, the average length of the confidence set is shorter. From the perspective of testing methods, except for the case of equality, the testing method based on the MIC can provide a shorter confidence set. Under the same testing method and parameter settings, as the sample size increases, the average length of the confidence set becomes shorter. In summary, the smaller the censoring rate, the closer the coverage probability of the confidence set for the change point location estimation is to the nominal level, and the shorter the average length of the confidence sets.

Application: Analysis of Survival Data for Heart Transplantation
This section applies the proposed testing method to the Stanford heart transplantation data from February 1980 [29]. The Stanford heart transplantation project ran from October 1967 to February 1980. In total, 184 patients underwent heart transplantation. One patient's survival time was 0, and this patient is excluded from this study. Therefore, there are 183 samples, of which 71 patients have censored status, resulting in a censoring rate of 38.798%, and the censoring is random. Because only patients who had undergone heart transplantation and were alive at the beginning of data collection are included as observations, this dataset is a length-biased dataset. In addition, the survival times of some patients are right-censored, meaning that these patients were still alive at the end of the observation period, and their exact survival times could not be determined. For such a randomly right-censored dataset, the proposed method was used to test whether there is a change point in the data.
Before testing, patients were divided into 43 groups according to their age. The median survival time of each group of patients was calculated, as shown in Figure 1a. A Q-Q plot was generated to check whether the median survival time of patients follows a length-biased lognormal distribution, as shown in Figure 1b. The results indicate that the patient survival time approximately follows a length-biased lognormal distribution. To support this conclusion, we performed a Kolmogorov-Smirnov test on the dataset. The test statistic was 0.210 and the corresponding p value was 0.306. Since the p value is much greater than 0.05, we do not reject the null hypothesis; that is, the data are consistent with a length-biased lognormal distribution.

Combined with the binary segmentation method, the LRT-based method and the MIC-based method are applied to these data. The values of the test statistics and the corresponding p values based on this dataset are calculated and presented in Table 6. In Table 6, $p_{\mathrm{LRT}}$ and $p_{\mathrm{MIC}}$ represent the p values for the two testing methods. All p values are less than 0.05, which means that the null hypothesis of no change point should be rejected. Therefore, there are three change points in the dataset, located at positions {15, 30, 34}, corresponding to ages 30, 46, and 50 in the data. This means that there are differences in the survival time among patients of different age groups after undergoing heart transplant surgery. The parameter estimates before and after the change points are $(\hat\mu_1, \hat\sigma_1) = (1.807, 2.308)$, $(\hat\mu_2, \hat\sigma_2) = (5.424, 0.996)$, $(\hat\mu_3, \hat\sigma_3) = (6.346, 0.056)$, and $(\hat\mu_4, \hat\sigma_4) = (3.598, 0.945)$, respectively.
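The binary segmentation step used with both tests can be sketched as a short recursion around any single-change test. This is a generic sketch rather than the procedure of [17] in detail: `detect` is a hypothetical user-supplied callback that runs a single-change test (for instance, the MIC-based test) on the segment from `lo` to `hi` and returns the estimated change location, or `None` when the no-change hypothesis is not rejected.

```python
def binary_segmentation(lo, hi, detect):
    """Recursively collect change points in the index range lo..hi (inclusive).

    detect(lo, hi) returns an estimated change location k with lo <= k < hi
    if the no-change hypothesis is rejected on that segment, else None.
    A change at k means observations lo..k and k+1..hi differ in parameters.
    """
    k = detect(lo, hi)
    if k is None:
        return []
    left = binary_segmentation(lo, k, detect)
    right = binary_segmentation(k + 1, hi, detect)
    return left + [k] + right
```

In practice, `detect` would also refuse to test segments that are too short for parameter estimation, which bounds the recursion depth.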

Conclusions And Discussions
Due to the frequent occurrence of random right censoring in lifetime data, this study investigates the parameter change point test problem for the length-biased lognormal distribution based on random right-censored data. For the parameter change point model with random right censoring, a test statistic based on the MIC is constructed, and the corresponding testing method is presented. Under a sufficient condition for the Fisher information matrix to be positive definite, it is proven that the asymptotic distribution of the MIC-based test statistic under the null hypothesis is a chi-square distribution with two degrees of freedom.
To demonstrate the performance of the testing method, simulations are conducted to investigate the power of the change point test under uniform and exponential censoring distributions. The simulation study considered various combinations of censoring rates and parameter settings. The simulation results indicate that the power of the method based on the MIC is higher than that of the method based on the LRT in most cases. Moreover, the testing power increases as the censoring rate decreases. A lower censoring rate means fewer censored observations and more complete lifetimes, so the data contain more lifetime information, which increases the power of the hypothesis test.
To assess the uncertainty in the estimation of the change point location, the coverage probabilities and the average lengths of confidence sets are calculated. Simulation results indicate that as the censoring rate decreases, the coverage probability approaches the nominal level in most cases, and the average length of the confidence sets becomes shorter. From the perspective of testing methods, the method based on the MIC yields shorter confidence sets. Under the same testing method and parameter settings, the average length of the confidence sets decreases as the sample size increases.
From the simulations, we observe that differences in sample size, change point location, and censoring rate all have an impact on the power of change point testing in random right-censored data. Therefore, factors such as the testing method, the censoring distribution, and the censoring rate should be considered together when investigating the change point problem with censored data.
The contributions of the proposed method can be summarized as follows. In our study, the change point problem is viewed as a model selection problem, where we aim to select the better option between the null hypothesis of no change and the alternative hypothesis of at least one change. As a result, methods commonly used for model selection can be applied to change-point-testing problems. However, the change point problem involves a special parameter, namely, the change location. In this paper, we employ the MIC-based method, which takes into account the contribution of the change point position to model complexity in the penalty term. This method is used to detect change points in the length-biased lognormal distribution with randomly right-censored data.
The traditional change point detection method is based on the likelihood ratio. In the simulation section, we conduct a comprehensive comparison of the detection performance of the MIC-based method and the LRT-based method. The results indicate that the coverage probability and the average length of the confidence sets for change point estimation are comparable between the two methods. However, under the same parameter settings, the test powers based on the MIC method are generally higher than those based on the likelihood ratio.
There is still much work to be done on change point detection for the length-biased lognormal distribution. (i) In terms of statistical theory, this study primarily focuses on the asymptotic distribution of the test statistic. The estimate of the change point location is discrete, and its asymptotic properties, such as the consistency of the estimator, are challenging issues that need to be addressed in future research. (ii) In terms of application, the method proposed in this paper can be used for change point detection in data with length-biased distribution characteristics. This will help to establish accurate models, make reasonable predictions of patients' remaining lifetimes, and analyze the effectiveness of treatment methods or medications for patients.

Appendix A Wald Conditions and Regularity Conditions.
W1. The distribution of X is either discrete for (µ, σ) or is absolutely continuous for (µ, σ).
W2. For sufficiently small δ and for sufficiently large ρ, the expected values are
R3. For each θ ∈ Θ, the Fisher information matrix I(θ) is positive definite.
Theorem 1. Based on certain Wald conditions and regular conditions, and if

Figure 1. (a) Scatter plot of the median survival time. (b) Q-Q plot for length-biased lognormal distribution.

Under the normal approximation for $S_{nI}/N$ used in the Type I error simulations, $S_{nI}/N$ falls within $\left[\alpha - 3\sqrt{\alpha(1 - \alpha)/N},\ \alpha + 3\sqrt{\alpha(1 - \alpha)/N}\right]$ with a high probability. This means that the values of $S_{nI}/N$ fluctuate within this interval, whose length is determined primarily by the number of repeated simulation experiments $N$. In our simulation, $\alpha = 0.05$ and $N = 1000$, so $S_{nI}/N$ is highly likely to fall within [0.029, 0.071]. The specific results are shown in Table 1. From columns 7-10 in Table 1, it can be seen that the frequencies of Type I error under both censoring rates almost all fall within [0.029, 0.071] for both uniform and exponential censoring, which means the Type I error is effectively controlled.

Table 1. Critical values and the probability of Type I error.

Table 3. The power of change point test under exponential censoring (α = 0.05).

Table 4. The coverage probability of confidence sets for change point location estimation under different parameter combinations.

Table 5. The average length of the confidence set for estimating the position of change points under different parameter combinations.

Table 6. Change point test results for heart transplant survival time data.