A New Entropy Optimization Model for Graduation of Data in Survival Analysis

Graduation of data is of great importance in survival analysis. Smoothness and goodness of fit are two fundamental requirements in graduation. Based on the defining expression for entropy in terms of a probability distribution, two optimization models based on the Maximum Entropy Principle (MaxEnt) and the Minimum Cross Entropy Principle (MinCEnt) are presented to estimate mortality probability distributions. The results demonstrate that the two approaches achieve the two basic requirements of data graduation, smoothness and goodness of fit, respectively. Then, in order to achieve a compromise between these requirements, a new entropy optimization model is proposed by defining a hybrid objective function that combines the MaxEnt and MinCEnt principles, linked by a given adjustment factor which reflects the preference between smoothness and goodness of fit in the data graduation. The proposed approach is feasible and more reasonable in data graduation when both smoothness and goodness of fit are of concern.


Introduction
Survival analysis is an important topic in actuarial science. It has a long history and there are many kinds of approaches. The common approach to estimating the survival distribution is the parametric one, in which a theoretical survival distribution is specified and the parameters involved are determined by certain methods. Many different methods are available in the literature, such as maximum likelihood estimators [1][2][3] and Bayesian estimators [4][5][6][7]. To the best of our knowledge, the choice of theoretical survival distributions or other prior distributions is difficult and critical for the implementation of these kinds of methods. In this paper, we discuss how to utilize an entropy optimization approach to estimate the mortality distribution based on the inherent relationship between entropy and probability distribution.
Information-theoretic entropy was presented by Shannon in 1948 [8]. The entropy optimization principles, proposed by Jaynes in 1957 [9,10] and by Kullback et al. (1951, 1959) [11,12], have widened the application area of entropy and transformed it from a measure of information into a tool of statistical inference. Generally speaking, entropy optimization includes the Maximum Entropy Principle (MaxEnt) and the Minimum Cross Entropy Principle (MinCEnt). MaxEnt estimates a probability distribution based only on the known information, without adding any other subjective information. MinCEnt is used to estimate a probability distribution which is closest to the prior one by minimizing the cross entropy between the estimated and the prior distribution.
Based on the defining expression for entropy in terms of a probability distribution, this paper first reviews the application of entropy optimization rules in mortality distribution estimation. Then, in consideration of the goodness of fit and smoothness requirements in data graduation, a new approach is proposed, which combines the MaxEnt and MinCEnt methods and controls the degree of goodness of fit and smoothness of the estimation by means of an adjustment factor.

Review of Entropy Optimization Principles
There are two basic approaches to estimating probability distributions by using the concept of entropy: the Maximum Entropy Principle (MaxEnt) and the Minimum Cross Entropy Principle (MinCEnt). MaxEnt was proposed by Jaynes in 1957 [9,10]. He observed that, given just some mean values, there is usually an infinity of compatible distributions. MaxEnt encourages us to select the distribution that maximizes the Shannon entropy measure and is simultaneously consistent with the mean value constraints. This is a natural extension of Laplace's famous principle of insufficient reason, which postulates that the uniform distribution is the most satisfactory representation of our knowledge when we know nothing about the random variable except that each probability is non-negative and that the sum of the probabilities is unity.
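Let $X$ be a discrete random variable on the probability space $(\Omega, P)$, where $\Omega = \{x_1, x_2, \ldots, x_n\}$ and $P = \{p_1, p_2, \ldots, p_n\}$, which is unknown and needs to be estimated from some known information, denoted by $E[g_r(X)] = a_r$ $(r = 1, \ldots, m)$, such as mean value, variance, moments, etc. Mathematically, MaxEnt can be described as the following optimization model:

$$\max\; H(P) = -\sum_{i=1}^{n} p_i \ln p_i \quad \text{s.t.} \quad \sum_{i=1}^{n} p_i = 1,\quad \sum_{i=1}^{n} p_i\, g_r(x_i) = a_r \;(r = 1, \ldots, m),\quad p_i \ge 0 \tag{1}$$

where $H(P)$ is the entropy of a probability distribution $P$ on $\Omega$. If there is a prior distribution of $X$, i.e., $P^0 = \{p_1^0, p_2^0, \ldots, p_n^0\}$, then MinCEnt can be used to get another estimated distribution which is statistically closest to the prior distribution under the same constraints as MaxEnt. MinCEnt can be modeled as:

$$\min\; D(P : P^0) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{p_i^0} \quad \text{s.t.} \quad \sum_{i=1}^{n} p_i = 1,\quad \sum_{i=1}^{n} p_i\, g_r(x_i) = a_r \;(r = 1, \ldots, m),\quad p_i \ge 0 \tag{2}$$

where $D(P : P^0)$ is the cross entropy between the probability distributions $P$ and $P^0$. The two models can be solved by the Lagrangian approach [13]. The solutions are:

$$p_i = \exp\Big(-\lambda_0 - \sum_{r=1}^{m} \lambda_r\, g_r(x_i)\Big) \tag{3}$$

and:

$$p_i = p_i^0 \exp\Big(-\mu_0 - \sum_{r=1}^{m} \mu_r\, g_r(x_i)\Big) \tag{4}$$

where $\lambda_r$ and $\mu_r$ $(r = 0, 1, \ldots, m)$ are the Lagrangian multipliers for each model above.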
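As an illustration of the exponential form of the MaxEnt solution, the following minimal sketch solves the classical die example with a single mean constraint. The function name `maxent_with_mean`, the bisection bounds and the die example are assumptions made for illustration; they are not taken from the paper.

```python
import math

def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum-entropy distribution on `values` with a prescribed mean.

    The Lagrangian solution is p_i proportional to exp(-lam * x_i);
    lam is found by bisection so that the mean constraint holds.
    """
    def tilted(lam):
        # Subtract the largest exponent for numerical stability.
        exps = [-lam * x for x in values]
        m = max(exps)
        w = [math.exp(e - m) for e in exps]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(x * p for x, p in zip(values, tilted(lam)))

    # The mean decreases monotonically in lam, so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return tilted(0.5 * (lo + hi))

def entropy(p):
    """Shannon entropy -sum p_i ln p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

faces = [1, 2, 3, 4, 5, 6]
uniform = maxent_with_mean(faces, 3.5)   # no information beyond the fair mean
skewed = maxent_with_mean(faces, 4.5)    # mean shifted towards high faces
```

With the mean of a fair die (3.5) the multiplier vanishes and the uniform distribution of Laplace's principle is recovered; any other mean tilts the distribution and lowers its entropy.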

Data Graduating with Entropy Optimization Principles
Assume that there is a living group whose original population is $l_0$. At time $x$, the living population is denoted as $l_x$, and we assume that $l_n = 0$ for some terminal time $n$. Let $d_x$ be the death population from time $x$ to $x+1$, then:

$$d_x = l_x - l_{x+1} \tag{5}$$

The value of $d_x$ can usually be obtained from observation. In survival analysis, $q_x$ is usually used to denote the mortality probability, which means the probability for an individual alive at time $x$ not to live another year. The mortality probability is a basic parameter for constructing a life table and plays a very important role in life insurance actuarial science. In real situations, it can be obtained from sample data and denoted as $\hat{q}_x$, which is:

$$\hat{q}_x = \frac{d_x}{l_x}$$

It is easy to find that $0 \le \hat{q}_x \le 1$. However, $\hat{q}_x$ is just an estimation of $q_x$, which has to be graduated to approximate the real mortality probability as closely as possible. This process is called graduation of data in life insurance. Furthermore, the estimation of $q_x$ is based on the sample information, so we must fully utilize the information in the sample and add as little extra information as possible. This provides a reasonable foundation for using MaxEnt and MinCEnt. It should be noted that the mortality probabilities do not sum to unity, so they do not form a probability distribution. Hence, in order to use MaxEnt and MinCEnt, let:

$$p_x = \frac{d_x}{l_0} \tag{6}$$

and we define it as the death probability, which is the ratio of the death population in each age group to the original population. It is easy to find that $p_x$ meets the requirements of a probability distribution ($p_x \ge 0$ and $\sum_x p_x = 1$), and it may be stated that:

$$q_x = \frac{p_x}{1 - \sum_{j<x} p_j}$$

Hence, $\hat{q}_x$, the estimation of $q_x$, can be obtained once the estimation of $p_x$ is determined.
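The relationship between survivor counts, mortality probabilities and death probabilities can be sketched as follows. The names `life_table` and `mortality_from_death_prob` and the survivor counts are hypothetical and are not the paper's data; the symbols follow the usual life-table convention (l_x survivors, d_x deaths, q_x mortality probability, p_x death probability).

```python
def life_table(survivors):
    """survivors[x] = l_x, the number alive at time x; the last entry must be 0."""
    l0 = survivors[0]
    # d_x = l_x - l_{x+1}: deaths between time x and x + 1.
    deaths = [survivors[x] - survivors[x + 1] for x in range(len(survivors) - 1)]
    # Mortality probability q_x = d_x / l_x (conditional on being alive at x).
    q = [d / l for d, l in zip(deaths, survivors)]
    # Death probability p_x = d_x / l_0 (unconditional; sums to one).
    p = [d / l0 for d in deaths]
    return deaths, q, p

def mortality_from_death_prob(p):
    """Recover q_x = p_x / (1 - sum_{j<x} p_j) from the death probabilities."""
    q, alive = [], 1.0
    for px in p:
        q.append(px / alive)
        alive -= px
    return q

l = [100, 90, 70, 40, 10, 0]   # hypothetical survivor counts
d, q, p = life_table(l)
q_back = mortality_from_death_prob(p)
```

In this toy table the mortality probabilities add up to more than one while the death probabilities add up to exactly one, which is why the entropy models are applied to the death probabilities and then mapped back.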
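Based on the sample information, the estimation of $p_x$ can be described as:

$$\max\; -\sum_x \hat{p}_x \ln \hat{p}_x \quad \text{s.t.} \quad \sum_x \hat{p}_x = 1,\quad \sum_x x\, \hat{p}_x = \bar{x},\quad \sum_x x^r \hat{p}_x = m_r \;(r = 2, \ldots, k),\quad \hat{p}_x \ge 0 \tag{7}$$

where $\hat{p}_x$ is the estimation of $p_x$, $\bar{x}$ is the mean value of the sample data and $m_r$ is the $r$-th moment. On the other hand, $p_x$ can be viewed as a prior distribution of the death probability, and a MinCEnt model can then be established to estimate $p_x$ as:

$$\min\; \sum_x \hat{p}_x \ln \frac{\hat{p}_x}{p_x} \quad \text{s.t.} \quad \sum_x \hat{p}_x = 1,\quad \sum_x x\, \hat{p}_x = \bar{x},\quad \sum_x x^r \hat{p}_x = m_r \;(r = 2, \ldots, k),\quad \hat{p}_x \ge 0 \tag{8}$$

The solutions of Equations (7) and (8) can be obtained by using the Lagrangian approach [13]. The results are:

$$\hat{p}_x = \exp\Big(-\lambda_0 - \sum_{r=1}^{k} \lambda_r x^r\Big) \tag{9}$$

and:

$$\hat{p}_x = p_x \exp\Big(-\mu_0 - \sum_{r=1}^{k} \mu_r x^r\Big) \tag{10}$$

where $\lambda_r$ and $\mu_r$ $(r = 0, 1, \ldots, k)$ are Lagrangian multipliers. To clarify the feasibility and properties of the above estimation methods, the models are applied to estimate the mortality distribution on experimental data taken from Ananda et al. [5].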
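For completeness, the Lagrangian step behind the two solutions can be sketched as follows. The notation is assumed for illustration: $\hat{p}_x$ is the graduated death probability, $p_x$ the prior, $m_r$ the $r$-th sample moment (with $m_1$ the sample mean), and the multiplier of the normalization constraint is written as $\lambda_0 - 1$ (respectively $\mu_0 - 1$) purely for convenience.

```latex
\begin{aligned}
L_{\text{MaxEnt}} &= -\sum_x \hat{p}_x \ln \hat{p}_x
   - (\lambda_0 - 1)\Big(\sum_x \hat{p}_x - 1\Big)
   - \sum_{r=1}^{k} \lambda_r \Big(\sum_x x^r \hat{p}_x - m_r\Big), \\
\frac{\partial L_{\text{MaxEnt}}}{\partial \hat{p}_x}
  &= -\ln \hat{p}_x - \lambda_0 - \sum_{r=1}^{k} \lambda_r x^r = 0
  \;\Longrightarrow\;
  \hat{p}_x = \exp\Big(-\lambda_0 - \sum_{r=1}^{k} \lambda_r x^r\Big), \\[6pt]
L_{\text{MinCEnt}} &= \sum_x \hat{p}_x \ln \frac{\hat{p}_x}{p_x}
   + (\mu_0 - 1)\Big(\sum_x \hat{p}_x - 1\Big)
   + \sum_{r=1}^{k} \mu_r \Big(\sum_x x^r \hat{p}_x - m_r\Big), \\
\frac{\partial L_{\text{MinCEnt}}}{\partial \hat{p}_x}
  &= \ln \frac{\hat{p}_x}{p_x} + \mu_0 + \sum_{r=1}^{k} \mu_r x^r = 0
  \;\Longrightarrow\;
  \hat{p}_x = p_x \exp\Big(-\mu_0 - \sum_{r=1}^{k} \mu_r x^r\Big).
\end{aligned}
```

The multipliers are then fixed by substituting these forms back into the normalization and moment constraints, which in general has to be done numerically.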

Example:
The following data (Table 1) are death times for 208 mice, which were exposed to gamma radiation. The data are divided into 14 groups by the time interval given in [5]. Table 3 and Figure 3 compare the results of MaxEnt (up to the 5th moment) with the results of the maximum likelihood estimation (MLE) and Bayesian estimation (BE) methods from [5]. From the above results, it can be found that:

(1) From Table 3, the results of MaxEnt have smaller values of both the goodness-of-fit measure (the Euclidean distance to the original data) and the smoothness measure (calculated from the 4th-order differences) than the results of the MLE and BE approaches. Hence, the MaxEnt method can be regarded as a better method of data graduation.

(2) From Table 2, the results of the MinCEnt approach are the same as the prior distribution. The reason is that both the prior distribution and the moment constraints are calculated from the same experimental data, so the prior already satisfies the constraints. If either the prior or the constraint values were given by other information, the results would differ from the prior distribution. However, from this extreme situation we can see that MinCEnt focuses on the goodness of fit in data graduation.
(3) Compared with the MinCEnt approach, the MaxEnt approach is better from the viewpoint of smoothness of data graduation and worse from the viewpoint of goodness of fit.
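The observation that MinCEnt reproduces the prior when the constraints are computed from the prior itself can be checked with a small sketch. Only a single mean constraint is used here, and the function name `mincent_with_mean` and the five-group prior are hypothetical; they are not the mice data of the Example.

```python
import math

def mincent_with_mean(values, prior, target_mean, lo=-50.0, hi=50.0):
    """Minimum cross-entropy distribution relative to `prior` with a given mean.

    The solution is the prior tilted exponentially, p_i proportional to
    prior_i * exp(-mu * x_i), with mu chosen by bisection.
    """
    def tilted(mu):
        exps = [-mu * x for x in values]
        m = max(exps)  # shift exponents for numerical stability
        w = [q * math.exp(e - m) for q, e in zip(prior, exps)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(mu):
        return sum(x * p for x, p in zip(values, tilted(mu)))

    # The mean decreases monotonically in mu, so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

ages = [1, 2, 3, 4, 5]
prior = [0.10, 0.35, 0.15, 0.30, 0.10]            # hypothetical, deliberately jagged
prior_mean = sum(x * q for x, q in zip(ages, prior))

same = mincent_with_mean(ages, prior, prior_mean)  # constraint taken from the prior
shifted = mincent_with_mean(ages, prior, 3.5)      # constraint from other information
```

When the mean is taken from the prior, the multiplier is zero and the jagged prior is returned unchanged; only a constraint coming from other information moves the estimate away from it.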

Data Graduating by Combining MaxEnt and MinCEnt
Smoothness and goodness of fit are always the most important considerations in graduation of data, although many techniques have been developed. For example, the most widely used method until now, especially by North American actuaries for the construction of life tables, is the Whittaker-Henderson method of graduation. This method originates in the work of Bohlmann (1899) [14] and Whittaker (1923) [15], and contributions to the theory were made by Henderson (1924, 1925) [16,17] and others. The Whittaker-Henderson method gives the graduated values by minimizing the quantity:

$$F + hS = \sum_x w_x (v_x - u_x)^2 + h \sum_x (\Delta^z v_x)^2 \tag{11}$$

where $F$ and $S$ are the weighted measures of goodness of fit to the original data and of smoothness, respectively. $u_x$ is the prior of mortality, $v_x$ is the estimated mortality, i.e., the result of data graduation, and $\Delta^z v_x$ is the $z$-th difference of $v_x$ (usually $z = 2$ or higher). $w_x$ is a weight coefficient and $h$ is a positive adjustment factor between goodness of fit and smoothness. This method is widely used and has become a basic logic in graduation of data, and many other approaches currently follow this lead.

Generally speaking, different data graduation methods differ in two respects: in the emphasis they put on goodness of fit versus smoothness, and in how they measure smoothness and goodness of fit. Therefore, graduation of data can be regarded as a bi-objective problem. On one hand, the graduation results should be smooth, and on the other hand they should be close to the original data. The previous section has shown that the results of MaxEnt data graduation are smoother while those of MinCEnt are closer to the original data, so we propose a new approach of data graduation which combines both methods, as in the following model:

$$\min\; \alpha \sum_x \hat{p}_x \ln \hat{p}_x + (1 - \alpha) \sum_x \hat{p}_x \ln \frac{\hat{p}_x}{p_x} \quad \text{s.t.} \quad \sum_x \hat{p}_x = 1,\quad \sum_x x^r \hat{p}_x = m_r \;(r = 1, \ldots, k),\quad \hat{p}_x \ge 0 \tag{12}$$

where $\hat{p}_x$ is the graduated death probability, $p_x$ is the prior death probability, $m_r$ is the $r$-th sample moment, and $\alpha \in [0, 1]$ is a given adjustment factor between smoothness and goodness of fit. When $\alpha = 0$, it is the MinCEnt approach, and when $\alpha = 1$, it is the MaxEnt approach. Because Equations (7) and (8) are solvable, it is easy to conclude that Equation (12) is solvable too.
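To make the two Whittaker-Henderson measures concrete, the following minimal sketch evaluates the fit term, the smoothness term and their weighted sum for a candidate graduation. The function names and the numerical series are hypothetical; `u` stands for the observed values, `v` for the graduated values, `w` for the weights, `h` for the adjustment factor and `z` for the order of the difference.

```python
def differences(v, z):
    """z-th order forward differences of the sequence v."""
    for _ in range(z):
        v = [b - a for a, b in zip(v, v[1:])]
    return v

def whittaker_henderson_objective(u, v, w, h, z):
    """Return (F, S, F + h * S) for observed u, graduated v and weights w."""
    fit = sum(wx * (vx - ux) ** 2 for wx, vx, ux in zip(w, v, u))
    smooth = sum(dv ** 2 for dv in differences(v, z))
    return fit, smooth, fit + h * smooth

u = [0.10, 0.14, 0.11, 0.17, 0.15, 0.20]      # hypothetical observed rates
w = [1.0] * len(u)                             # equal weights
v_line = [0.10 + 0.02 * i for i in range(6)]   # a perfectly linear graduation

f_same, s_same, _ = whittaker_henderson_objective(u, u, w, h=10.0, z=2)
f_line, s_line, _ = whittaker_henderson_objective(u, v_line, w, h=10.0, z=2)
```

Leaving the data untouched gives a perfect fit but a rough series, while the straight line is perfectly smooth under second differences at the cost of a nonzero fit term; the factor `h` decides which of the two is preferred.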
In the above model, the entropy (MaxEnt) term serves as a measure of smoothness and the cross entropy (MinCEnt) term serves as a measure of goodness of fit. These two measures are integrated with a linear coefficient to reflect different weights on smoothness and goodness of fit. The reason for adopting a convex combination of smoothness and goodness of fit is to ensure the convexity of the objective function and the solvability of the proposed model. In practice, how to decide the appropriate weight between smoothness and goodness of fit is a highly controversial topic. We propose that the value of the adjustment factor be determined in the same way as the adjustment factor in Equation (11), which is usually chosen by experimental approaches.
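The trade-off controlled by the adjustment factor can be sketched as follows. The sketch assumes that the hybrid objective is the convex combination alpha * sum(p ln p) + (1 - alpha) * sum(p ln(p / prior)), so that alpha = 0 gives MinCEnt and alpha = 1 gives MaxEnt; this weighting convention, the single mean constraint, the function names and the prior are assumptions made for illustration. Under that assumption the stationarity condition gives p_x proportional to prior_x ** (1 - alpha) * exp(-lam * x), and the prior must be strictly positive.

```python
import math

def combined_graduation(values, prior, alpha, target_mean, lo=-50.0, hi=50.0):
    """Hybrid MaxEnt/MinCEnt estimate under one mean constraint (assumed form)."""
    base = [q ** (1.0 - alpha) for q in prior]   # prior damped by alpha

    def tilted(lam):
        exps = [-lam * x for x in values]
        m = max(exps)  # shift exponents for numerical stability
        w = [b * math.exp(e - m) for b, e in zip(base, exps)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(x * p for x, p in zip(values, tilted(lam)))

    # The mean decreases monotonically in lam, so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

ages = [1, 2, 3, 4, 5]
prior = [0.10, 0.35, 0.15, 0.30, 0.10]            # hypothetical, strictly positive
prior_mean = sum(x * q for x, q in zip(ages, prior))
results = {a: combined_graduation(ages, prior, a, prior_mean) for a in (0.0, 0.5, 1.0)}
```

As alpha moves from 0 to 1 the estimate moves from the jagged prior towards the smooth exponential-family shape: its entropy does not decrease and its cross entropy to the prior does not decrease, which mirrors the trade-off between goodness of fit and smoothness.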
Based on the data of the above Example, we calculate the estimated mortality distribution with this model. Table 4 and Figure 4 show the results of this approach when moments up to the 5th are taken as constraints. From these results, it can be concluded that the proposed method is feasible and that the adjustment factor plays an important role in trading off smoothness and goodness of fit, which provides flexibility in graduation of data.

Conclusions
In this paper, based on the defining expression for entropy in terms of a probability distribution, the application of entropy optimization methods in survival analysis was discussed. It was found that the results of the MaxEnt model focus on the smoothness of data graduation, while the results of the MinCEnt model focus on the goodness of fit to the original data. In consideration of the requirements of smoothness and goodness of fit in data graduation, a new approach was therefore proposed to combine MaxEnt and MinCEnt, which provides a trade-off between smoothness and goodness of fit in data graduation by use of a given adjustment factor.

Figure 3. Results of Different Graduation Approaches.

Table 2. Results of MaxEnt and MinCEnt. The mortality distribution is calculated with the MaxEnt and MinCEnt models; Table 2 shows the data graduation results under different moment constraints (up to the 5th), and they are also plotted in Figures 1 and 2.

Table 4. Combination Method Results.