Discrete Time Hybrid Semi-Markov Models in Manpower Planning

: Discrete time Markov models are used in a wide variety of social sciences. However, these models possess the memoryless property, which makes them less suitable for certain applications. Semi-Markov models allow for more ﬂexible sojourn time distributions, which can accommodate for duration of stay effects. An overview of differences and possible obstacles regarding the use of Markov and semi-Markov models in manpower planning was ﬁrst given by Valliant and Milkovich (1977). We further elaborate on their insights and introduce hybrid semi-Markov models for open systems with transition-dependent sojourn time distributions. Hybrid semi-Markov models aim to reduce model complexity in terms of the number of parameters to be estimated by only taking into account duration of stay effects for those transitions for which it is useful. Prediction equations for the stock vector are derived and discussed. Furthermore, the insights are illustrated and discussed based on a real world personnel dataset. The hybrid semi-Markov model is compared with the Markov and the semi-Markov models by diverse model selection criteria.


Introduction
Manpower planning is a key aspect of modern human resources management. The principal aim of manpower planning is the development of plans dealing with future human resource requirements. In this way, an effective manpower planning policy can avoid future shortages and excesses of staff members. Such an imbalance between the actual and the required staff is highly undesirable because it would lead to higher costs and/or less profits. Since manpower planning itself is concerned with the description and prediction of large groups of employees, whose behaviour can be unpredictable at the individual level, it is only natural to study aggregated data, where statistical patterns may appear. So, it is no surprise that the use of mathematical models for manpower planning can be traced back to at least 1779 when Rowe used a career-modeling plan in the Royal Marines [1].
Since the 1960s and the dawn of the computer age, such models have become an essential tool for the modern manager. Pioneering work concerning mathematical approaches for manpower planning was carried out by Vajda [2,3] and Bartholomew [4,5], whereas Almond and Young [6] were the first to study a real world application of an open homogeneous Markov chain model. Since then, various other manpower planning model approaches have been considered. In the work of Vassiliou [7], the non-homogeneous Markov system was introduced. This idea was expanded upon by Vasilliou et al. [8]. Other work regarding non-homogeneous discrete time (semi-)Markov models includes the works of Papadopoulou [9] and Dimitriou et al. [10] as well as continuous time (semi-)Markov models by McClean et al. [11,12], Papadopoulou et al. [13] and Mehlmann [14]. It is important to remark that the scope of those models is not limited to humans [7], as is the case in manpower planning, but that it can be any biological being or object. Some examples of other populations modeled by this class of stochastic processes [15] include ecological modeling [16] and biological Markov population models [17] and financial applications [18]. It is remarkable that, until recently, discrete time homogeneous semi-Markov models were somewhat neglected in manpower planning.
One of the assumptions of a Markov model is that the length of time a person stays in a state S i before going to another state S j only depends on the state S i itself. Moreover, the waiting time distribution, often called the sojourn time distribution, exhibits the memoryless property. Which means that it does not account for possible duration of stay effects. In this case, the sojourn time distribution is in fact a geometric distribution. However, in practice those assumptions may pose an unrealistic limitation. An alternative model that may solve those problems is a semi-Markov model, which can be viewed as a natural extension of a Markov model. In recent years, the use of discrete time semi-Markov models became more and more popular in various fields such as reliability and survival analysis [19], DNA analysis [20,21], disability insurance [22], credit risk [23][24][25][26], and wind speed and tornado modeling [27,28]. Moreover, insights regarding discrete time semi-Markov models contribute to the use of continuous time semi-Markov models [29].
Markov models and semi-Markov models both have advantages: Markov models are less complex and more transparent. In the manpower planning context, for example, this makes a classical Markov model easier to interpret and understand for a manager. Semi-Markov models, on the other hand, allow capturing duration of stay effects due to their more general sojourn distributions. This provides motivation to build hybrid models that incorporate the best of both approaches. In the previous work of Guédon, so-called hidden hybrid semi-Markov chains are presented that combine Markovian states with semi-Markovian states [30]. Since it is possible that, for a particular state, some of the transitions are Markovian while other transitions are semi-Markovian [22], the present paper introduces the concepts of Markovian transitions and semi-Markovian transitions. In this way, Markovian and semi-Markovian transitions are a further refinement of Markovian and semi-Markovian states.
Furthermore, both Markov and semi-Markov models require longitudinal data for their parameter estimation. In practice, however, longitudinal data are often left truncated or right censored, which may lead to estimation problems [31], especially in a semi-Markovian context, where more general sojourn time distributions are allowed. Previous works [11,32] suggest alternative approaches [23,33] to deal with this drawback, such as restricting the analyses to the items for which there is complete information, artificially truncating the data or using adapted formulas for the estimation of the parameters.
In this paper, we discuss the advantages and disadvantages of Markov and semi-Markov manpower planning models in Section 2. In Section 3, we present the so-called hybrid semi-Markov model, which uses a mix of Markov (geometric) and more general (Weibull) sojourn time distributions, offering some advantages: the hybrid semi-Markov model allows for capturing duration of stay effects where useful and reduces the number of parameters to estimate, where possible. In this way, the hybrid semi-Markov model enables one to improve on the semi-Markov model in case the amount of available data is limited. Finally, in Section 4, we use a real world personnel dataset to illustrate our insights. The hybrid semi-Markov model is compared with the Markov model as well as with the semi-Markov model based on several criteria.

Markov and Semi-Markov Manpower Planning Models
To model a manpower system, one has to account for three different types of flows: the incoming flows (recruitments), the internal flows between the different personnel categories and the outgoing flows (wastage). We consider G + 1 states, given by G personnel categories and one absorbing state W, corresponding to the wastage. First of all, the classical Markov model [4] will be discussed; afterwards, a semi-Markov model for manpower planning based on [19] will be proposed. An interesting reference regarding (semi-)Markov processes is [34]. The discussion on the classical Markov model (in Section 2.1) and the semi-Markov model (in Section 2.2) contributes in defining the new hybrid semi-Markov model in Section 3.

Markov Model
All models in this section are Markov processes and generalizations thereof, such as semi-Markov processes. However, all models have their limitations and are subjected to restrictions. In this setting, one of the assumptions we make is the so-called Markov property, which states that the probability of reaching a future state is independent of the past states and only depends on the present state. For a second-order Markov chain, this probability of entering a state at time t + 1 also depends on the state at time t − 1. To assess the Markov property, we will use Equation (1) below, which tests a first-order against a second-order Markov chain. The use of a classical Markov model without meeting the first-order assumption may lead to false conclusions and incorrect analysis results. An extensive discussion about the often overlooked need to check for the Markov property can be found in [35]. For a given stochastic process {X t } t with G + 1 states {S 1 , · · · , S G+1 } and data over a time horizon [0, T], we will use the following χ 2 goodness of fit test to verify the first-order assumption, as described in [35,36], with index set G = {1, 2, · · · , G, G + 1} and n ij = ∑ T−1 t=0 ∑ m k=0 n ij (t, k) , where n ij (t, k) is the number of persons that are at time t in the state S i with grade seniority k and at time t + 1 in state S j and m is the maximal grade seniority observed in the database. p jl is the maximum likelihood estimator of the transition probability p jl with N j (t) = ∑ i∈G ∑ m k=0 n ij (t − 1, k) being the number of persons in state S j at time t , where p ijl is the maximum likelihood estimator of the transition probability p ijl and where n ijl (t, k) is the number of persons that are at time t in the state S i with grade seniority k at time t + 1 in the state S j and at time t + 2 in the state S l : . (2) Only non-zero p jl are taken into account for computing χ 2 e . Under the assumption that the Markov property is satisfied, i.e., that we are looking for a Markov chain of order 1, the test statistic χ 2 e has a χ 2 -distribution with (G + 1) 3 degrees of freedom. If this assumption holds, we can proceed with the classical Markov approach, in which transition probabilities are assumed to be equal for individuals within a category.
The use of time homogeneous Markov chains in manpower planning is well-known (see, for example, [4]) . Given the G states corresponding to different personnel categories S 1 , · · · , S G and a wastage state W = S G+1 , one can define a Markov process {X t } t on those states with transition probabilities p jl that can be estimated by Equation (2). If we denote the stock vector at time t by N(t) = (N 1 (t), N 2 (t), · · · , N G (t), W(t)), and write R(t) = (R 1 (t), R 2 (t), · · · , R G (t), 0) for the recruitment vector at time t, then we obtain the prediction equation [4] for the stocks at time t + 1 : where P is the matrix with elements p jl . Due to their simplicity, time homogeneous Markov chain models are used in a wide variety of domains and applications. As there are relatively few parameters to estimate in a time homogeneous Markov chain model, they are not too data demanding. However, on the other hand, they cannot be used to account for duration of stay effects and they are less flexible due to the so-called memoryless property, which implies that their sojourn time distributions are geometrical distributed by construction. This shortcoming is accounted for in semi-Markov models.

Semi-Markov Model
Again, consider a system with a finite number of states {S 1 , · · · , S G , S G+1 } and let us denote the set of indices by G = {1, 2, · · · , G, G + 1}. Furthermore, let T n and J n denote, respectively, the time of the n-th transition and the state occupied after the n-th transition. A semi-Markov process is equivalent to a Markov renewal process [37] and is completely determined by an initial distribution δ = (δ 1 , · · · δ G , δ G+1 ) and a discrete semi-Markov It can be shown that {J n } n itself is a Markov chain via i.e., p ∞ ij is the probability, starting from S i , that the next state will eventually be S j , regardless of the duration time. We write P ∞ = (p ∞ ij : i, j ∈ G) for the associated transition matrix. This allows for the following decomposition: where f = ( f ij (k) : i, j ∈ G, k ∈ N) consists of the sojourn time distributions, conditioned by the next state to be visited: A few remarks are in order at this point. First of all, only actual transitions are accounted for, in the sense that transitions to the same state are prohibited, so that p ∞ ii = 0 for every i ∈ G. Furthermore, instantaneous transitions are not allowed either: the chain has to spend at least one unit of time in a state, which corresponds to f ij (0) = q ij (0) = 0 for every i, j ∈ G.
The main difference in regard to the Markov chain model is the fact that the sojourn time distributions f can be any discrete distribution, incorporating the possible duration of stay effects. Note that a Markov chain with transition matrix P = (p ij : i, j ∈ G) itself can be viewed as a semi-Markov chain with geometrically distributed sojourn times for which In order to use this framework for a manpower planning model, one starts in the same way as in the case of a Markov chain model with dividing the population in G + 1 states and determining the corresponding stock vector N(t). In contrast with the Markov chain model, we incorporate the grade seniority of the employees in our model. Instead of a vector N(t) consisting of the total number of people in each personnel category at time t, every entry of N(t) corresponds to a vector of a certain length m containing the number of employees with seniority l, with 1 ≤ l ≤ m. This disaggregation of the entries of N(t) results in a matrix, whose columns will be denoted by N(t, k) as in Figure 1. So, the first column, N(t, 0), corresponds to the employees with grade seniority 0 at time t, the second column, N(t, 1), corresponds to the employees with grade seniority 1 at time t, ... up to the m + 1-th column that corresponds to the employees with grade seniority m at time t , where m is the maximal grade seniority observed in the database. We will call this matrix the seniority based stock matrix. Note that N i (t, k) corresponds to the number of employees in state S i with grade seniority k at time t and that ∑ m k=0 N i (t, k) = N i (t) for each i ∈ G and every t ∈ N. The vectors N(t, k) enable the expression of the prediction equation for the stock vector as in Theorem 2. An equivalent approach is presented in [8], where the semi-Markov system is transformed into a Markov system. While the present paper considers a separate vector N(t, k) for each grade seniority k, in [8], this information is gathered into one vector. ... ...
... Now, we can estimate a discrete semi-Markov kernel q using the maximum likelihood estimator [19]: where n i = ∑ j =i ∑ T−1 t=0 ∑ m k=0 n ji (t, k), i.e., the total number of visits to state i. Furthermore, we can use this q to calculate the grade seniority transition matrices P(k) = (P ij (k) : i, j ∈ G), the one-step ahead transition matrix for group members with grade seniority k, is defined by: In practice, P(k) can be calculated in the following way.

Theorem 1.
For all k such that ∑ h∈G ∑ k−1 m=0 q ih (m) = 1 we have Proof.
Combining all of the above, we arrive at: For a semi-Markov system, the prediction equation for the stock vector at time t + 1 is as follows: where N(t, k) is the stock vector of people with grade seniority k at time t, R(t + 1) is the recruitment vector at time t + 1, P(k) is the one-step ahead transition matrix for people with grade seniority k and m is the maximum of all grade seniorities.
At first glance, it would seem that the semi-Markov model is a preferable model due to its more flexible sojourn time distributions and its greater generality. However, to build a semi-Markov model, one has to estimate more parameters, such that a sufficiently long time series of data may be necessary to avoid problems with overfitting [38]. This may limit the utility of semi-Markov modeling in manpower planning as a data horizon of, for example, less than ten years may be insufficient for the realization of some transitions and so for the required data for estimating the semi-Markov kernel q.

Hybrid Semi-Markov Model
In Section 2.2, we note that a Markov chain with transition matrix P = (p ij : i, j ∈ G) can be viewed as a semi-Markov chain, i.e., a semi-Markov chain can be considered as an extension of a Markov chain, where more general and flexible sojourn time distributions are allowed. However, in practice, it can be difficult to decide which approach is more adequate to model the manpower system in question. Due to its greater generality, the semi-Markov chain may look as the most preferable model at first sight. However, in practice, the data requirements to result in accurate parameter estimates may limit the utility of semi-Markov models in manpower planning [38]. For these reasons, the presented hybrid semi-Markov model examines, for each pair of states (S i , S j ), whether the transition from S i to S j can be considered as a Markov transition or should be modeled as a semi-Markov transition. In order to make an adequate choice for a particular transition from S i to S j between a Markov and a semi-Markov approach, one can use a technique which was introduced in [22] and which is briefly discussed below.
The semi-Markov hypothesis is tested at the level of the sojourn time distributions f ij . A transition from S i to S j can be considered Markovian if its corresponding sojourn time f ij is geometrically distributed. Under the geometrical hypothesis, the equality f ij (2) = f ij (1)(1 − f ij (1)) holds and a significant deviation of f ij (1)(1 − f ij (1)) − f ij (2) from zero has to be seen as evidence to the contrary, i.e., evidence in favor of a (more general) sojourn time distribution. The test statistic, as introduced in [22], is given by: where n ij = ∑ T−1 t=0 ∑ m k=0 n ij (t, k) denotes the observed total number of transitions from S i to S j and where f ij (k) is the maximum likelihood estimator of the probability f ij (k) (see [19]): Under the geometrical hypothesis H 0 , the test statistic S ij is asymptotically normally distributed.
Note that for a system with G + 1 states, this test has to be run (G + 1) * G times as this test permits us to make a decision about the sojourn time distribution for each f ij individually, which allows for a so-called hybrid semi-Markov model-a semi-Markov model that incorporates the sojourn time distributions of the classical Markov model for those pairs (S i , S j ) where geometric sojourn time distributions may be assumed and that enables the use of more general sojourn time distributions for those pairs (S i , S j ) where necessary. This approach can be seen as a further generalization of techniques used in [22,39], where the same criterion was used to make a decision about the sojourn time distributions at the level of the states instead of the transitions. Since the sojourn time distribution is determined per pair (S i , S j ), and hence for each possible transition, the hybrid semi-Markov model is based on transition-dependent sojourn time distributions. In this way, we can construct a model that unites the best of the Markovian and (pure) semi-Markovian worlds, as we will only have to estimate extra parameters of the sojourn time distributions if those parameters might improve the goodness of fit.
Previous studies concerning semi-Markov models often used the discrete Weibull distribution [19] whenever the geometrical hypothesis is rejected. The choice for the discrete Weibull distribution is motivated by the fact that the discrete Weibull distribution can be viewed as a more flexible generalization of the geometric distribution [40]: so geometric(k, p) = dweibull(k, 1 − p, 1). Note that, in the semi-Markov setting, the prediction equation of the stock vector (Equation (15)) is nothing more than a generalization of the prediction equation of the stock vector in the Markov setting (Equation (1)), as in the latter case the P(k) are equal for all k. So, to arrive at the prediction equation for the stock vector of the hybrid semi-Markov model, one can recycle Equation (15), where P ij (k) will be dependent on k due to the sojourn time distributions associated with the (S i , S j ) for which the Markov hypothesis does not hold.
A procedure to decide on whether to use a Markov model, a semi-Markov model or a hybrid semi-Markov model is graphically represented in Figure 2. Group all q ij in the hybrid semi-Markov kernel q = (q ij (k)) Calculate the onestep ahead transition matrices P(k) Equation (14) Calculate the seniority based stock matrices N(t, k) (See Figure 1) Calculate the one-step ahead stock prediction Equation (15) Estimate the recruitment vector R(t + 1) yes no no yes

Data Handling
The subject of this research is modeling (a subsystem of) the academic staff of the Vrije Universiteit Brussel (VUB). An anonymized personnel database including the career paths of all academic staff at the VUB between 1999 and 2013 was at our disposal for this study. The aim is to estimate the number of teaching staff in the various grades for the near future. In our study, we have chosen to avoid left censoring issues: since the analyzed data contain only a limited number of data lines where left censoring is involved, we did not take into account the first observed state of an employee in case it was subjected to left censoring. We corrected for right censoring in computing the estimations of the parameters [41].
After extensive data cleansing, we obtained the career paths of 1585 relevant employees. Only data from 1999 to 2012 were included to avoid look-ahead bias as we aim to estimate the number of teaching staff in 2013. Concerning the division of the personnel in G states, we opted for the common hierarchical academic ranking structure in Belgium as in Table 1. Furthermore, we included an additional state, state S 5 , which corresponds to wastage in our system. Contrary to most applications in the literature, we did not consider the wastage state to be an absorbing state as it regularly happens during academic careers that people who leave their universities are employed again later on. This happens in our dataset for 371 cases. The observed transitions between the states in our system are visualized in Figure 3.

Data Handling
The subject of this research is modeling (a subsystem of) the academic staff of the Vrije Universiteit Brussel (VUB). An anonymized personnel database including the career paths of all academic staff at the VUB between 1999 and 2013 was at our disposal for this study. The aim is to estimate the number of teaching staff in the various grades for the near future. In our study, we have chosen to avoid left censoring issues: since the analyzed data contain only a limited number of data lines where left censoring is involved, we did not take into account the first observed state of an employee in case it was subjected to left censoring. We corrected for right censoring in computing the estimations of the parameters [41].
After extensive data cleansing, we obtained the career paths of 1585 relevant employees. Only data from 1999 to 2012 were included to avoid look-ahead bias as we aim to estimate the number of teaching staff in 2013. Concerning the division of the personnel in G states, we opted for the common hierarchical academic ranking structure in Belgium as in Table 1. Furthermore, we included an additional state, state S 5 , which corresponds to wastage in our system. Contrary to most applications in the literature, we did not consider the wastage state to be an absorbing state as it regularly happens during academic careers that people who leave their universities are employed again later on. This happens in our dataset for 371 cases. The observed transitions between the states in our system are visualized in Figure 3. First of all, the Markov property (Equation (1)) was assessed. Defining the level of significance at α = 0.05, the null hypothesis states that the Markov property is met. As we consider five states in our subsystem, it follows that the test statistic χ 2 e has a χ 2 -distribution with 5 3 degrees of freedom under the null hypothesis. We obtained χ 2 e = 4984.911, which means that we reject the zero hypothesis at the significance level α = 0.05. These findings let us conclude that the whole system, consisting of five states, does not satisfy the Markov property. First of all, the Markov property (Equation (1)) was assessed. Defining the level of significance at α = 0.05, the null hypothesis states that the Markov property is met. As we consider five states in our subsystem, it follows that the test statistic χ 2 e has a χ 2 -distribution with 5 3 degrees of freedom under the null hypothesis. We obtained χ 2 e = 4984.911, which means that we reject the zero hypothesis at the significance level α = 0.05. These findings let us conclude that the whole system, consisting of five states, does not satisfy the Markov property.

Parameter Estimation and Modeling
We now use the same data as in the previous section to estimate the empirical sojourn time distributions f ij according to Equation (17) with the aid of the R package SMM [41] and apply the test statistic S ij (Equation (16)) to each tuple of states (S i , S j ). The results are summarized in Table 2. HSM, the hybrid semi-Markov model as in Section 3 with the f ij s as described above.

Comparison of the Different Models
We used Equation (15) to predict the stock vector in 2013 starting from the stock vector in 2012 for the three models mentioned above, as a first indication of the performance of those models. We took the factual recruitment vector for R(t + 1). The forecasts, including the standard deviations [42], are summarized in Table 3. It is immediately obvious, looking at Table 3, that the SMW model is the worst predictor of the stock vector for the first two personnel categories in the setting above. Other prediction results are more similar. In what follows, M, SMW and HSM are compared based on several model selection criteria such as AIC and BIC. Afterwards, we used the likelihood ratio test statistic to state a final model preference [43].
First, we analyzed the goodness of fit of our different models using the AIC and BIC according to the formulas below [44], where n corresponds to the number of estimated parameters in the model M i , l(M i ) is the log-likelihood function for M i and κ corresponds to the total number of observations, which is the number of observed transitions in our case. We obtained the following values for the log-likelihood function: l(M) = −3723.73, l(SMW) = −3912.79, l(HSM) = −3682.55 It is immediately apparent from the equations above that SMW is an unfeasible model, as it has the most parameters but the worst fit of our three models. We now proceed to calculate the AIC and BIC of the three models in question. The results are summarized in Table 4. The hybrid semi-Markov model HSM has the lowest BIC and AIC values, which means that it outperforms both the semi-Markov model SMW and the Markov model M with regard to the goodness of fit. Furthermore it is remarkable that the semi-Markov model SMW turns out to be the model with the worst fit of the three models concerning the AIC, BIC or even the values of the log-likelihood function itself. This may sound counter-intuitive at first as this model is the most flexible model of the three. We theorize that this is probably due to the more demanding data requirements needed to estimate a higher amount of parameters, which can lead to problems with overfitting.
At last, in order to make a final choice between the models above, one can assess the goodness of fit between the Markov model M and the hybrid semi-Markov model HSM by means of the likelihood ratio test for nested models as M ⊂ HSM [44]. For two nested statistical models M 1 ⊂ M 2 , the likelihood ratio test statistic is given by: where l(M 1 ) and l(M 2 ) are the values of the log-likelihood function for M 1 and M 2 , respectively. This test statistic is, under the zero hypothesis, i.e., that the more simple model is in fact the true model, asymptotically χ 2 distributed with d degrees of freedom, where d is the number of additional parameters in the more complex model. We now proceed to use the likelihood ratio test to assess the goodness of fit between the two remaining models of interest: M and HSM. We arrive at the following value for the test statistic λ LR .
λ LR = 82.36 (22) As HSM adds two additional parameters to M, it follows that the test statistic λ LR has a χ 2 -distribution with two degrees of freedom under the null hypothesis. We obtained λ LR = 82.36, which means that we reject the zero hypothesis at the significance level α = 0.05 in favor of the alternative hypothesis, i.e., that HSM is the better model, which is consistent with the AIC and BIC values in Table 4. Hence, for illustrative purposes, our three models can be ranked according to their goodness of fit as HSM, M and finally SMW.

Conclusions
In this paper, we use a discrete time semi-Markov framework to model an open manpower system. At first sight, such a model might appear to be the preferable model as it is not only a more flexible model in nature but also enables us to account for duration of stay effects. However, such a model does not always show to be superior in an empirical context due to the fact that more parameters have to be estimated, which necessitates the availability of a vast amount of data and which may lead to overfitting in the absence of enough data. Therefore, we introduce a hybrid semi-Markov model, that is a semi-Markov model in which Markov sojourn time distributions are used for those transitions (S i , S j ) where it is not useful to account for duration of stay effects and in which Weibull distributed sojourn times are used for those transitions (S i , S j ) where the geometrical hypothesis does not hold. Hence, the hybrid semi-Markov model takes the duration of stay effect into account only for those transitions where it can contribute to the improvement of the goodness of fit. In this way, the hybrid semi-Markov combines the best of both worlds by capturing duration of stay effects where useful and reduces the number of parameters to estimate, where possible. Finally we used a real world personnel dataset to illustrate our insights and made a comparison between the Markov model, the semi-Markov model and the hybrid semi-Markov model.
The authors view the use of this specific dataset as one of the most important limitations of this research, as alternative or richer databases may exhibit other characteristics which could lead to other model choices. In addition, future research may focus on the use of other non-Weibull distributions or might explore the possibilities of a hybrid semi-Markov model in a non-homogeneous context.