A Phase-Type Distribution for the Sum of Two Concatenated Markov Processes Application to the Analysis Survival in Bladder Cancer

: Stochastic processes are useful and important for modeling the evolution of processes that take different states over time, a situation frequently found in ﬁelds such as medical research and engineering. In a previous paper and within this framework, we developed the sum of two independent phase-type (PH)-distributed variables, each of them being associated with a Markovian process of one absorbing state. In that analysis, we computed the distribution function, and its associated survival function, of the sum of both variables, also PH-distributed. In this work, in one more step, we have developed a ﬁrst approximation of that distribution function in order to avoid the calculation of an inverse matrix for the possibility of a bad conditioning of the matrix, involved in the expression of the distribution function in the previous paper. Next, in a second step, we improve this result, giving a second, more accurate approximation. Two numerical applications, one with simulated data and the other one with bladder cancer data, are used to illustrate the two proposed approaches to the distribution function. We compare and argue the accuracy and precision of each one of them by means of their error bound and the application to real data of bladder cancer.


Introduction
Stochastic processes have proved to be useful in the evolution of processes that can take different states over time. In this regard, Markov models have been widely used for this purpose in fields like medical research and engineering as well as the phase-type distributions. A phase-type (PH) distribution is the distribution of the time until absorption in a finite-state absorbing Markov chain [1][2][3][4]. A PH distribution is represented by (α, T) where α is an initial probability vector and T is a squared matrix representing the rates between the transient states in the Markov process. Phase-type distributions are a powerful tool in stochastic models of real systems. Numerous applications have been reported in queueing theory models [5] and reliability in the context to model the failure of electrical components [6], in bladder cancer [7] in the context of survival analysis, and applications in shock and wear systems [8], among others. These distributions also arise in the evolution of some chronic diseases since the process goes through a series of states or phases [9,10] and in applications to the length of stay at hospitals [11][12][13]. See Chapter 1 in [14,15] for a sample of the diversity of contexts where the phase-type distributions are used (telecommunications, finance, teletraffic modeling, biostatistics, drug kinetics, and survival analysis).
On the other hand, one of the major interests in cancer research is to model its evolution from the beginning of the disease until death, going through a number of states before reaching the absorbing state (death). However, sometimes, the real data do not include the complete evolution of the disease because this evolution is treated in different and independent units within the same department or hospital and consequently the real data are registered separately. For this reason, given the need to describe the complete evolution of the bladder cancer (from a primary tumor to the extirpation of the bladder), in a previous paper [16], we concatenated two Markov processes. Each one analyzed one part of the evolution of this chronic disease because in that study, we had two different and independent real databases belonging to different units of the La Fe University Hospital in Valencia (Spain). The first database described the disease progression from a primary tumor through several states to a more aggressive tumor. The second database described the evolution from the aggressive tumors to the bladder extirpation (death of the bladder), also through several states. Both bases were disconnected and our objective was to connect both and be able to study the evolution of the whole disease divided in two phases from the primary tumor (start state in the first database) to the removal of the bladder (absorbing state in the second database).
In the modeling of these two phases of the disease, we considered two consecutive homogeneous Markov processes with state spaces {1, 2 . . .,m + 1} and {1, 2 . . .,n + 1}, respectively, and glued together to become a unique continuous-time Markov chain with m + n + 1 states, identifying the state m + 1 with the first state of the second Markov chain. The absorbing state of the resulting concatenated Markov process was the m + n + 1 state. We have also considered two random continuous and independent variables representing two absorption times: the absorption time of the first process (the appearance of the first aggressive or progression tumor) and the absorption time of the second one (the bladder extirpation). Both variables were considered PH-distributed with representations (α, T) and (β, S), respectively, with α and β initial probability vectors in each process. T and S were squared matrices of orders m and n, respectively, representing the rates between the transient states in each process. In order to study the survival function of the absorption time of this concatenated process (m + n + 1 state), we obtained a new distribution function of the sum of these two variables, also PH-distributed.
Thinking about the practical application of our approach, for example, if we find large matrices because of dealing with many states, we aim to build approximations that facilitate the task. In Section 2, we present the distribution function of the previous paper [16]. In Section 3, we develop a first approximation of this distribution function to avoid the calculation of an inverse matrix in its expression due to the possibility of bad conditioning of the matrix. We start developing the expression of that inverse by a numeric series. We also compute an error bound for this first approximation. In a second step, we improve this result with a new and more accurate approximation. We have also developed an error bound for this second approach that allows us to compare them and improve the result of this study. In Section 4, we illustrate the results of both approaches with a numerical application of simulated data. We compare them with the exact distribution function in [16]. In Section 5, we apply this methodology to real data of bladder cancer, and we argue the improvement and the usefulness of the second approach compared to the first one. Finally, we conclude the study in Section 6 with some discussion.

The Survival Function of Two Concatenated Markov Processes
Throughout this paper, we use the Kronecker matrix form just like the Kronecker sum and product, denoted with ⊕ and ⊗ (mathematical notation), respectively. The main definitions about these two concepts as well as the properties are in Appendix A. We start with the following result proved in [16]. Theorem 1. Let X 1 and X 2 be nonnegative random two independent variables representing the absorbing times in the Markov chains with state spaces {1, 2, . . ., m,m + 1} and {1, 2, . . .,n, n + 1}, respectively. The groups of states {1, 2, . . .,m} and {1, 2, . . ., n} are the transient states in each Markov process and the states m + 1 and n + 1 are the absorbing ones. X 1 and X 2 are PH-distributed and the representations are (α, T) and (β, S), respectively. Thus, the survival function S(x) = P(X > x) = 1 − F(x) for the sum X 1 + X 2 is given by S(x) = α e Tx e m + α m+1 βe Sx e n + e n ⊗ α 1 0 e sS x⊕(1−s)Tx ds vec T 0 βx (1) where α m+1 is the initial probability of reaching the absorbing state m + 1, e k = (1,1,. . .,1) ∈ R k×1 , T 0 = −Te m , being T 0 a matrix of order m × 1 with the entries t i,m+1 , i = 1,. . .,m, that represent the absorbing rates from the transient states and the function vec() stacks the columns of a matrix into a column vector (defined in Appendix A). The expression for the calculated integral is See Results (15)- (17) in [16].

The Approximated Survival Function for Two Concatenated Markov Processes
The calculation of the inverse of the previous matrix S x ⊕ (−Tx) given in Equation (2) can present serious difficulties if this matrix is badly conditioned. To avoid a possible bad conditioning, we perform one approximation for the survival function S(x).

An Approximated Survival Function
First, we consider the distribution function, F(x) = 1 − S(x), using the expression given in Equation (1), and so on, let us calculate an approximation for the term Let us call the kth term of Equation (4) by the function f k (S, T,s, x) and taking into account that is a convergent series of constant terms. Then, applying the Weierstrass criterion of uniform convergence, the Taylor series expansion f (S, T, s, x) = ∑ ∞ k=0 f k (S, T,s, x) converges uniformly in s ∈ [0, 1]. Thus, the integral in Equation (3) can be obtained by integrating Equation (4) term by term. For this, as S ⊗ I and I ⊗ T commute (see the demonstration in Appendix A for k = 2), the binomial of Newton can be used in each term f k (S, T,s, x) and so one gets to Integrating each term of Equation (7), it follows that Now, if we take in Equation (8), the following finite sum until the pth term, we have as an approximation to I(x) and we can substitute Expression (9) by the integral of the distribution function Equation (3). Then,F 1 (x) will represent an approximation of the distribution function F(x), and consequently, an approximation of the survival function would bê We are interested in obtaining an error bound for this first approach. For this, if M = max{ S 2 , T 2 }, it follows that where 0 < θ < 1, and so Therefore, the following result has been proved: Theorem 2. Let X 1 and X 2 be nonnegative random independent variables representing the absorption times in two homogeneous Markov processes with state space {1, 2, . . ., m,m + 1} and {1, 2, . . .,n, n + 1}, respectively, where {1, 2, . . .,m} and {1, 2, . . .,n} are the transient states in each process and m + 1 and n + 1 are the absorbing ones. Assume that both variables are PH-distributed with representation (α, T) and (β, S), respectively. Then, an approximation of the survival function for the sum X 1 + X 2 is given by Equation (11), where α m+1 is the initial probability of entering the absorbing state m + 1, e k = (1,1,. . .,1) ∈ R k×1 , and T 0 = −Te m . Moreover, an error bound of the approximation error is given in Expression (13), where M = max{ S 2 , T 2 } and · F is the Frobenius norm.
Notice that the error bound increases exponentially with time x, and consequently, for predictions, it would not be accurate. Thus, in the following section, we propose a second approximation method for the distribution function to improve this last obtained result.

Improving the Approximation to the Survival Function
The aim of this section is to improveF 1 (x) in order to get a closer approximation to F(x). For this, we use the expression of F(x) obtained in our previous paper (see Equation (15) in [16]) given by the expression In [16], it is demonstrated, step by step, that Equation (14) is equivalent to 1 − S(x) in Equation (1) at the beginning of this paper (see Result (17) in [16]). For our purpose, we start working with the last term of Equation (14), specifically with We are going to reduce the previous bound Inequality (13). For this, let us consider the following expression with k ≥ 0 positive integer and taking into account the Schur-Fréchet derivative and its function composition [17,18], we have the following expression for j = 1,2,. . .,k. See the demonstration of Expression (17) in Appendix B. Now applying the function vec for both terms of this last expression, and using property 1 of Appendix A, we have for the first term Now, in the second term of Expression (17), we have Notice that in Equation (19), property 1 of Appendix A is used again, and in the first and second steps of Equation (20), properties 2 and 3, respectively. In the last two steps of Equation (20), properties 3 and 4 of Appendix A are used.
Now, taking the equality Equations (18) and (20) Let us call I j =  (21) is If now we denote φ k,p the approximated function Equation (9) for the integral Then, for j = k, and taking Equation (22) into account, we have Similarly, for j = k − 1, one gets and finally, for j = 1, we have and so where Thus, the approximation for In this way, the second approximation for F(x) iŝ and consequently, the survival function iŝ Finally, we construct an error bound for the difference between F(x) and this second appproximationF 2 (x). As by Equation (22), we have and given that M = max{ S 2 , T 2 }, we establish where the difference I k − φ k,p 2 has been calculated previously with Equation (12). Thus, the error bound for the survival function and this second approximation is We illustrate the result in the following theorem Assume that both variables are PH-distributed with representation (α, T) and (β, S), respectively. Then an approximation of the survival function for the sum X 1 + X 2 is given in Equation (29), where α m+1 is the initial probability of entering the absorbing state m + 1, e k = (1,1,. . .,1) ∈ R k×1 , T 0 = −Te m , and φ k,p is given by Equation (9). Moreover, an error bound of the approximation error is given in Expression (32), where M = max{ S 2 , T 2 } and · F is the Frobenius norm.
Comparing Expressions (13) and (32), it is clear that this last one is more accurate, given that M 2 k < M and e Mx 1− 1 2 k < e Mk where k is big. Moreover, this second error bound does not increase with time as fast as the previous one, Equation (13), and so the convergence is assured, a subject that is interesting in survival or reliability analysis for long-term predictions. This suggests that Equation (29) is better than Equation (11); however, Equation (29) requires the computation of more matrix exponentials. Thus, both approximations may be useful in applications, and we apply them to a numerical example of simulated data and real bladder cancer data in the next sections.

Numerical Application with Simulated Data
In this section, we compute the two approximations obtained above with simulated data. In order to clarify the two proposed approaches, first, we considered two homogeneous Markov processes and the corresponding concatenated process (Figure 1). We have considered the T and S transition matrices with m = 10 and n = 11 states, respectively, where each entry different from zero has been uniformly distributed in the range [0.01, 0.05]. We have chosen that range for the rates between transitional states by similarity with our real data; then, the absorbing rate is much greater (see expression T 0 = −Te m in [16]). All simulated rates (N = 100) of the uniform distribution for each transition have been generated randomly with the Mathematica software in the above range 90% of the simulations according to this uniform distribution led to badly conditioned matrices for the survival function S(x), which corroborates the need to apply the two developed approximations,Ŝ 1 (x) andŜ 2 (x) of this study. The first approximated survival function Equation (11) has been compared with the theoretical model Equation (1) previously developed in [16]. In Figure 2, we present the results of this fit between both functions, S(x) andŜ 1 (x), for k = 5 and p = 7 (9). We can observe a slight mismatch between both survival functions from 60 units of time onward, and in 80 units of time, the probability is over 60% and 50%. This is because the rates between the transitional states are small while these rates have more weight from each transitional state to the absorbing state. In Table 1, the error bound Inequality (13), calculated for the difference betweenŜ 1 (x) and S(x), increases from x = 30 until x = 80, reaching a large difference and leading to poor predictions in our calculations. In these cases, to get a lower bound of this difference, about 10 −15 , for example, leads to an increase in the number of p terms in Equation (9), at least up to p = 20 in x = 30 units of time. Notice that the obtained error bound for this first approximation Equation (9) increases exponentially with time x, giving inaccurate predictions and a high error level.
In order to improve this first approximated distribution function, we proposed a second approximation to get a better convergence (Equation (29)). In Figure 3, we can check the accuracy between both survival functions, S(x) andŜ 2 (x), where absence of mismatches between them can be observed for high units of time. We can observe that the lines for both functions are superposed. In this case, the error bound Inequality (32) is low enough (about 10 −14 ) with only p = 7 and k = 5 as in Equation (9). The fit between both functions is almost perfect, even after one hundred units of time (error bound about 10 −8 ).
In order to evaluate both functions,Ŝ 1 (x) andŜ 2 (x), we can conclude that the second approach is more accurate than the first one compared to the exact model Equation (1) and with the same computational cost. It can also be concluded thatŜ 1 (x) is always calculated with more terms (higher p) andŜ 2 (x) with p and k where p < p and with a error bound much smaller than withŜ 1 (x). For this reason, we will only consider the second approximation in our application with the real bladder cancer data of the next section, given that we want to assure an error bound by 10 −10 with only k = 5 and p = 7 terms. Table 1. Error bound calculated for both approximations,Ŝ 1 (x) andŜ 2 (x), compared to the exact model S(x) in the concatenated Markov process with simulated data from a uniform distribution for the transition rates.

Error Bound
Error Bound

S(x)
S  2 (x) Figure 3. Survival function, S(x), and the second approximation,Ŝ 2 (x), in the concatenated Markov process for k = 5 and p = 7 with simulated data from a uniform distribution.

Illustration with a Real Data Set in Bladder Cancer
Bladder cancer can be classified into two well-differentiated types: non-muscle-invasive bladder cancer (NMIBC) and muscle-invasive tumor (MIT). Each type of tumor has a different follow-up protocol and treatment. Between 30-80% of the patients with a non-muscle-invasive primary tumor (the first tumor) have a recurrence of the disease (same type of tumor) and between 1-45% of these patients progress to muscle-invasive tumor (more aggressive). Patients with a muscle-invasive primary tumor may have a progression of the disease (with the possibility of the extirpation of the bladder) after some recurrences or directly. The first aim of the present work is to model both processes of the disease and, after the concatenated process, from the first NMBIC to the bladder extirpation (absorbing state m + n + 1).
For this, we have considered two Markov chains of an absorbing state, each one of them: (1) a first process, from a NMIBC primary tumor to a muscle-invasive tumor (first absorbing state), and (2) a second process starting in a muscle invasive primary tumor until the arrival to a progression with the final of the bladder (extirpation, the second absorbing state). In this context, we have worked with two continuous variables, X 1 and X 2 , where each one of them represented the two absorbing times mentioned above, each one of them PH-distributed. Two independent databases (with a different follow-up protocol and treatment) have been collected in this application, both from the Urology Department of at La Fe University Hospital in Valencia (Spain), each one with for a Markov process. Both databases cover between January 1995 and January 2010.
In the first stage of the disease, five transient states and one absorbing state were considered: the primary non-muscle invasive tumor (NMIBC), a first recurrence and until a fourth recurrence (reappearance of a new NMIBC with similar characteristics to the first initial tumor). The absorbing state is the appearance of a progression to a muscle invasive tumor (MIT), a much more aggressive tumor. The first database collected the information of 800 patients with mean follow-up period of 22.74 months.
The second stage showed three transient states and the absorbing state: three muscle invasive tumors (MIT) and the end of the bladder (extirpation). This database consisted of 160 patients with a MIT and 14.53 months of mean follow-up. The progression in a MIT is much faster than when the patient presents a NMIBC.
We have concatenated both independent processes to study the Survival function of a patient who had a primary non-muscle invasive tumor NMIBC (start of the illness) until the extirpation of its bladder (the worst episode in this disease) going through all the states (transient and absorbent ones) in each process (see Figure 1) with m = 5 and n = 3. For this, we have obtained the distribution function of the sum of the two variables (also PH-distributed) for each absorption state with the aim to obtain the survival function, S(x), and the two approaches,Ŝ 1 (x) andŜ 2 (x) developed in this work.
The squared matrices T and S from each real database, referred to the rates between the transient states in each process, were estimated using the maximum likelihood method by the msm() function in the multistate modeling with R software and the msm package [19]. This function msm models transition rates with hidden Markov chains and data with observations with censored states. The dimensions of both matrices are five and three, respectively, corresponding to the number of transient states considered in the real data of each process as it has been mentioned above. The transient rates for T and S resulted in a range of [0.001, 0.0001], smaller quantities than in the simulated data of the previous section.
We have also calculated the mean absolute percentage error (MAPE) for both approximations, S 1 (x) andŜ 2 (x), compared to the exact function S(x): 0.0200759 and 1.1701 × 10 −13 , respectively. The value of the MAPE forŜ 1 (x) andŜ(x) shows the obvious mismatch between both functions mentioned in simulated data. We have also calculated the difference between both functions with S(x) to conclude the same result that with simulated data: more precision with the second approach (Table 2). Finally, we have represented in the Figure 4 the exact function S(x) and the second approximation, S 2 (x), where we can see the good match between these two functions developed for the survival function of the concatenated process. Table 2. Error bound calculated for both approximations,Ŝ 1 (x) andŜ 2 (x), compared to the exact model S(x) in the concatenated Markov process with simulated real bladder cancer data.

Error Bound
Error Bound  Figure 4. Survival function, S(x), and the second approximation,Ŝ 2 (x), for k = 5 and p = 7 of the concatenated Markov process for real bladder cancer data.

Concluding Remarks
In this study, we developed two approaches to a distribution function, proposed in a previous paper, to avoid the calculation of the inverse of a matrix (due to the possibility of a badly conditioned matrix) in the expression of that distribution function. Of the two developed approaches, we find the second one is much more accurate for performing predictions, corroborated with the calculation of the error bound for both approaches.
Regarding the computations of both approaches, small values of p and k terms in the expressions were used to get the desired accuracy. There is no problem in increasing the number of terms p and the parameter k to improve the precision with the second approximation. The mathematical expressions and the calculations were presented in a closed form that allowed algebraic treatment and the corresponding computational implementation. Moreover, this is easily interpretable and has a relatively low computational cost. Calculations were performed with the Mathematica software, and all codes are available from the authors on request.
The aim of this work arose from the need to develop an approximation to the survival function for the disease when a database from the start of the illness to the bladder extirpation is not available, but two disconnected bases are available. The two real databases in this work and the previous paper were from different units at the La Fe University Hospital (hence independent of each other), and we were interested in examining the process until bladder extirpation from the beginning of the disease (the primary NMIBC). However, the presented approach is general, and this analysis can be applied to other similar data in chronic diseases or in a reliability context.
Summarizing, the results obtained in this paper with their applications allow to compute the survival or reliability function when the matrix in Equation (2.2) of the resulting process is poorly conditioned. Definition A2. Let A ∈ R m×m and B ∈ R n×n be matrices and I k denotes the identity matrix of order k; then, the Kronecker sum is given by the following expression A ⊕ B = A ⊗ I n + I m ⊗ B.
These properties are used through the paper. Let A ∈ R m×n , B ∈ R p×q , C ∈ R n×o , and D ∈ R q×r be matrices; then, This last result is used in the composition of functions X → X 2 k followed by exponentiation and then squaring e X = e X Firstly, the standard integral formula of the Fréchet derivative of the exponential map at X in the direction N is applied to the function e where T and S are nonsingular matrices of order m and n, respectively. T 0 is a m × 1 matrix [16].