For illustration, we analyze the dynamics of self-rated health (SRH) among a sample of individuals aged 50 and over living in Switzerland. In particular, by means of a hidden Markov model, we analyze whether the observed changes can be explained by the presence of a hidden process (Section 4.2), and we investigate the effects of educational level (Section 4.3) using an MTD-based approach. All the models presented here have been computed using the R package “March” [29].
4.1. Data
We use data from 14 waves of the Swiss Household Panel [30], a yearly panel study that started in 1999 with a random sample of 5074 households. We focus here on an unbalanced subsample of 1331 individuals who were aged 50 or more at the first interview and have at least three measurement occasions (on average, 10 observations per individual). At baseline, 63.3% of these respondents are aged 50–64, 31.9% are between 65 and 79 years old, and the remaining 4.7% are more than 80 years old.
The SRH condition is measured by the question “How do you feel right now?”. Five possible answers were proposed: “not well at all”; “not very well”; “so, so”; “well”; “very well”, which we denote respectively as P (poor), B (bad), M (medium), W (well), and E (excellent) health condition. The distribution of the dependent variable shows a general condition of good health. Almost 80% of respondents feel well or very well (W: 61.26%, E: 17.26%) and only about 2% feel bad or very bad (P: 0.23%, B: 1.8%).
4.2. The Hidden Markov Model
We analyze the dynamics of health conditions by means of a hidden Markov model. In a conventional Markov chain (i.e., a multistate transition model), the transitions are estimated directly among the observed self-reported health states, so the process is entirely visible. Here, by contrast, we introduce a latent variable to represent unobserved characteristics that influence the observed health condition.
In order to select the optimal number of hidden states, we compare several models in terms of likelihood and Bayesian Information Criterion (BIC) [31], increasing the number of hidden states up to 5 (see Table 1).
The model with the lowest BIC [32] is the three-state hidden model. Note that this model is not optimal in terms of log-likelihood, since adding hidden states always improves the fit of the model to the data. However, the model chosen by the BIC offers the best compromise between complexity and fit to the data.
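The trade-off between fit and complexity can be made concrete with a small sketch. The code below is only an illustration: the log-likelihoods are hypothetical placeholders (not the values of Table 1), the number of observations is an assumption (about 1331 individuals with 10 observations each), and the parameter count assumes a standard HMM with free initial, transition, and response probabilities.

```python
import math


def hmm_free_params(n_hidden, n_observed):
    """Free parameters of a basic HMM (assumed parameterization)."""
    initial = n_hidden - 1                      # initial distribution
    transition = n_hidden * (n_hidden - 1)      # one free row per hidden state
    response = n_hidden * (n_observed - 1)      # one response distribution per state
    return initial + transition + response


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# HYPOTHETICAL log-likelihoods for 1 to 5 hidden states (not from Table 1).
hypothetical_loglik = {1: -12000.0, 2: -11000.0, 3: -10700.0,
                       4: -10660.0, 5: -10640.0}
n_obs = 13310  # ASSUMPTION: roughly 1331 individuals x 10 observations

scores = {k: bic(ll, hmm_free_params(k, 5), n_obs)
          for k, ll in hypothetical_loglik.items()}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 1))
```

With these placeholder values the log-likelihood keeps improving up to five states, while the BIC penalty makes a smaller model preferable, which is the pattern described above.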
The relationship between the outcome variable and the three hidden states can be analyzed using the response probabilities (Equation (2c), reported in Table 2). An alternative is to estimate the most likely sequence of hidden states (by using the Viterbi algorithm [21]) and then to cross-tabulate the observations against the predicted hidden states (Table 3).
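As a rough illustration of this second option, the sketch below implements the Viterbi recursion in log space and cross-tabulates the decoded states against the observations. It is not the code of the “March” package; the two-state model and all its probabilities are toy values chosen only to make the example run.

```python
import math
from collections import Counter


def _log(p):
    """Logarithm that maps a zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Most likely hidden state sequence for one observed sequence."""
    n_states = len(init)
    score = [_log(init[s]) + _log(emit[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for y in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda r: score[r] + _log(trans[r][s]))
            pointers.append(prev)
            new_score.append(score[prev] + _log(trans[prev][s]) + _log(emit[s][y]))
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# TOY model (hypothetical values): two hidden states, two observed categories.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
obs = [0, 0, 1, 1, 1]

path = viterbi(obs, init, trans, emit)
crosstab = Counter(zip(path, obs))  # (hidden state, observation) -> count
print(path, dict(crosstab))
```

In an application, the cross-tabulation would be accumulated over the decoded sequences of all individuals.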
The first hidden state is mainly associated with state M (65%, medium health) and with worse health conditions (a 10% probability of feeling B “not very well” or P “not well at all”). We therefore label this state as a “frail” health condition (F). The third hidden state refers to individuals in good health with a high probability of being in excellent condition (56.1%). We will refer to this hidden state as a situation of “very good” condition (
). Finally, the second state is an intermediate situation mainly associated with W (84%) or M (almost 10%). We will refer to it as a state of “good” health (
G). We will then (re)label the hidden states as
F,
G, and
. Here, the labels of the hidden states will be printed in italics since such states are not observed but inferred from the data. The transition probabilities between hidden states can be represented in matrix form (
Figure 2 left panel) or, since we have a relatively low number of hidden states (the latent variable has three categories in our example), as a path diagram (
Figure 2, right hand side). In the diagram, the arrows corresponding to probabilities estimated as zero are not shown, and for readability purposes, transition probabilities have been rounded to two decimals.
The initial distribution of the hidden states represents the condition at baseline. Despite the overall healthy aging of the Swiss population, there is still a 19.9% chance of starting the trajectory in the frail (F) state.
The limit distribution of the hidden chain shows a progressive decline in SRH trajectories: the probability of being in the frail condition (F) rises to 34.8% at the end of the observational period. Nevertheless, in 59% of cases, the respondents are estimated to be in the good hidden state G.
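One way to obtain such a limit distribution, assuming it is understood as the stationary distribution of the hidden chain, is to push a distribution repeatedly through the transition matrix until it stops changing. The sketch below does this for a hypothetical three-state matrix: only the middle row (0.034, 0.957, 0.009) is derived from probabilities quoted in this section, the other rows are placeholders, so the result will not reproduce the reported figures.

```python
def step(dist, trans):
    """One step of the chain: new_j = sum_i dist_i * trans[i][j]."""
    n = len(dist)
    return [sum(dist[i] * trans[i][j] for i in range(n)) for j in range(n)]


def limit_distribution(trans, tol=1e-12, max_iter=100000):
    """Stationary distribution by power iteration from a uniform start."""
    n = len(trans)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        new = step(dist, trans)
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


# Rows and columns ordered as F, G, third hidden state.
# HYPOTHETICAL matrix: rows 1 and 3 are placeholders, not estimates.
TRANS = [
    [0.950, 0.050, 0.000],
    [0.034, 0.957, 0.009],
    [0.000, 0.060, 0.940],
]

pi_limit = limit_distribution(TRANS)
print([round(p, 3) for p in pi_limit])
```

Power iteration is used here only because it needs no linear algebra library; solving the corresponding linear system would be an equally valid choice.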
According to the transition probabilities (Figure 2), the states are very persistent. The probability of staying in the same state for two consecutive periods exceeds 90% (probabilities reported on the main diagonal of the transition matrix). Three transitions, (
), (
), (
), are extremely rare or impossible. The transition probabilities for individuals with a good health condition, hidden state
G, are particularly interesting. Apart from those who stay in the same hidden state, they are more likely to fall into the frail condition (transition probability from state G to F of 3.4%) than to improve their situation (probability of moving from state G to state
of 0.9%).
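The persistence of a state can also be read as an expected duration. As a minimal sketch, assuming a time-homogeneous chain so that the time spent in a state is geometrically distributed, the expected number of consecutive waves in a state with self-transition probability p is 1 / (1 - p). The probability of staying in G is taken here as 1 - 0.034 - 0.009, which relies on the two rounded values quoted above.

```python
def expected_sojourn(stay_prob):
    """Expected number of consecutive periods spent in a state."""
    if not 0.0 <= stay_prob < 1.0:
        raise ValueError("stay_prob must lie in [0, 1)")
    return 1.0 / (1.0 - stay_prob)


p_G_to_F = 0.034       # quoted in the text
p_G_to_third = 0.009   # quoted in the text
p_stay_G = 1.0 - p_G_to_F - p_G_to_third  # derived from rounded values

waves_in_G = expected_sojourn(p_stay_G)
decline_vs_improve = p_G_to_F / p_G_to_third  # relative odds of the two exits
print(round(waves_in_G, 1), round(decline_vs_improve, 1))
```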
We now want to include the effect of educational level on the self-rated health trajectories of the mature and older Swiss population. The level of education has been coded into three categories: (i) lower secondary level (“low”); (ii) secondary level with professional vocation (“medium”); (iii) post-secondary level (“high”). We first show the model with education at the hidden level, then how to consider it at the visible level. We will use the same labels for the hidden states as in the model without covariates. However, it is worth noting that the hidden states are not exactly the same, since we are including additional elements in the model. Nevertheless, judging from the response probability distributions in our empirical example, the hidden states of the models with and without the covariate remain similar and the substantive interpretation remains the same. For simplicity, we will therefore keep referring to the hidden states as “frail” F, “good” G, and “very good”.
4.3. HMM with Education at the Hidden Level
Using the 3-state hidden Markov model presented in the previous section, we now include a categorical covariate, the level of education, at the hidden level. We consider both a large transition matrix with the interactions between states and levels of the covariate and the MTD-based approach.
We first consider a single transition matrix. The transition matrix D (Figure 3) shows that those with a high level of education (“High”) have an advantage with respect to health deterioration.
The probability of falling into the hidden state F decreases with the level of education. Moreover, the probability of being in a “frail” condition at the beginning of the sequence (initial probability distribution) is more than twice as high for the least educated as for the most educated (31.1% versus 13.9%). The level of education also has a slight positive impact on the chances of recovering from an unhealthy condition. For instance, people with a high level of education are about 2 percentage points more likely to move from a “frail” to a “good” situation (transition from F to G) than those with a lower level of education (7.4% against 5.0% and 5.1%, respectively, for those with a medium or low educational level). Similarly, the probability of a worsening in the health condition decreases with educational attainment.
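The two comparisons in this paragraph are of different kinds: one is a difference in percentage points, the other a ratio of probabilities. The short sketch below recomputes both from the figures quoted above; the helper names are ours and purely illustrative.

```python
def percentage_point_gap(p_a, p_b):
    """Absolute difference between two probabilities, in percentage points."""
    return (p_a - p_b) * 100.0


def probability_ratio(p_a, p_b):
    """How many times larger p_a is than p_b."""
    return p_a / p_b


# Probability of recovering (first hidden state to G) by education, from the text.
P_RECOVER = {"low": 0.051, "medium": 0.050, "high": 0.074}
# Initial probability of the first hidden state by education, from the text.
P_INITIAL_FIRST = {"low": 0.311, "high": 0.139}

gap_vs_low = percentage_point_gap(P_RECOVER["high"], P_RECOVER["low"])
gap_vs_medium = percentage_point_gap(P_RECOVER["high"], P_RECOVER["medium"])
initial_ratio = probability_ratio(P_INITIAL_FIRST["low"], P_INITIAL_FIRST["high"])
print(round(gap_vs_low, 1), round(gap_vs_medium, 1), round(initial_ratio, 2))
```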
Let us now consider the MTD-based approach discussed in Section 3. We consider a mixture of the effects of the lag (transition probabilities across hidden states) and of the level of education (the response probability distribution of hidden states given the modalities assumed by the covariate). The results are reported in Table 4. Although the likelihood ratio test shows a statistically significant improvement of the likelihood (p-value ≤ 0.005, see Table 5), the weight parameters show that the level of education has only a small effect on the latent process (it accounts for only 1%). Nevertheless, as shown by the initial hidden state distribution and the response probabilities, we again observe evidence of education as a protective factor against deterioration in health. The probability of being in a frail regime (F), for instance, decreases with the level of education from 88.8% for low-educated respondents to 38.1% for those with a medium level of education. Because of the small effect of education on the latent process, the transition probabilities estimated with the MTD approach for the covariate (Table 4) are similar to those estimated in the HMM without covariate (Figure 2). The interpretation thus remains the same, with stable states over time.
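To see why a 1% weight leaves the transitions almost unchanged, the mixture can be written out for a single row. The sketch below assumes the usual MTD form, in which the distribution of the next hidden state is a weighted sum of the transition row of the previous state and the row of a covariate table. The weights 0.99 and 0.01 follow the text; every probability in the two tables is a hypothetical placeholder, not an estimate from Table 4.

```python
def mtd_hidden_row(prev_state, education, trans, cov_table, w_lag, w_cov):
    """Distribution of the next hidden state under a two-component MTD mixture."""
    if abs(w_lag + w_cov - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return [w_lag * a + w_cov * c
            for a, c in zip(trans[prev_state], cov_table[education])]


# Columns ordered as F, G, third hidden state. HYPOTHETICAL values throughout.
TRANS = {
    "F": [0.950, 0.050, 0.000],
    "G": [0.034, 0.957, 0.009],
    "third": [0.000, 0.060, 0.940],
}
COV_TABLE = {
    "low": [0.60, 0.35, 0.05],
    "medium": [0.40, 0.45, 0.15],
    "high": [0.25, 0.50, 0.25],
}
W_LAG, W_COV = 0.99, 0.01  # education accounts for about 1% (from the text)

row_low = mtd_hidden_row("G", "low", TRANS, COV_TABLE, W_LAG, W_COV)
row_high = mtd_hidden_row("G", "high", TRANS, COV_TABLE, W_LAG, W_COV)
max_shift = max(abs(a - b) for a, b in zip(row_low, row_high))
print([round(p, 4) for p in row_low], round(max_shift, 4))
```

Whatever the covariate table contains, two education levels can shift any transition probability by at most the covariate weight, here one percentage point.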
Table 5 shows the quality of the models using the two approaches. As expected, the MTD-based approach is more parsimonious and, according to the BIC [32], performs better than the model whose transition matrix is built from the interaction between the levels of the covariate and the states.
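For readers who want to reproduce this kind of comparison, the sketch below computes a likelihood ratio test between two nested models using only the standard library. The log-likelihoods and the 7 degrees of freedom are hypothetical assumptions (they are not the values of Table 5), and the chi-square reference distribution is the textbook approximation, which may not hold exactly when weights lie on the boundary of the parameter space.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum of x^j / (1 * 3 * ... * (2j + 1)).
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total)


def likelihood_ratio_test(ll_restricted, ll_full, df):
    """Test statistic 2 * (ll_full - ll_restricted) and its p-value."""
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df)


# HYPOTHETICAL values: HMM without covariate versus MTD-based model.
stat, p_value = likelihood_ratio_test(-10700.0, -10688.0, df=7)
print(round(stat, 1), p_value < 0.005)
```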
4.4. HMM with Education at the Visible Level
Let us now consider the effect of education directly at the visible level using the MTD-based approach. As before, at the hidden level we have a latent variable with three states that we will keep referring to as F, G, and
. For each hidden state, the model estimates a mixture of the response probability distribution of that hidden state and the distribution of the visible SRH given the level of education. In addition, a vector of weighting parameters will be estimated to measure the relevance of the explanatory factor for each hidden state. The results of our illustrative example are reported in
Table 6.
Individuals belonging to hidden state F report relatively lower levels of health, with a 6% probability of reporting a poor or bad level of health (see the response probability distribution). The respondents in the other two hidden states, G and the “very good” state, and especially those in the latter, are likely to be in W (well) or E (excellent) health condition. Note that in all three hidden states the response probability of reporting W (well) is fairly high (41.8% in F, 86.2% in G, and 54.5% in the “very good” state).
The weight parameters show that the educational level affects the SRH condition only for those who are in the “frail” (F) hidden state (associated weight of 0.5451); for them, the level of education is slightly more important for predicting the current condition than the SRH itself (54.51% compared to 45.49%). For the other two hidden states, the education level has an almost null weight (0.004 for the second hidden state and 0.000 for the third hidden state). Because of the low impact of education on the data-generating process, the initial distribution and the transition matrix of the models with (Table 6) and without (Section 4.2) the covariate remain similar.
Since education seems to have an effect only on the first hidden state, we focus here on the results for F. According to the distribution of health condition by level of education, individuals in the hidden state F with a low level of education have no chance of reporting a good or an excellent health condition. Education thus becomes more relevant in cases of poorer health, confirming the results we found when including the level of education directly at the hidden level, where education plays a protective role against falling into a frail regime.
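The role of the state-specific weights can be sketched as follows, assuming that for each hidden state the distribution of the visible SRH is a weighted sum of that state's response distribution and the distribution of SRH given education. The three education weights are those quoted above; all response and education distributions are hypothetical placeholders rather than the estimates of Table 6.

```python
# Education weight per hidden state (values quoted in the text).
EDU_WEIGHT = {"F": 0.5451, "G": 0.004, "third": 0.000}

# Columns ordered as P, B, M, W, E. HYPOTHETICAL distributions.
RESPONSE = {
    "F": [0.01, 0.05, 0.50, 0.42, 0.02],
    "G": [0.00, 0.01, 0.08, 0.86, 0.05],
    "third": [0.00, 0.00, 0.02, 0.55, 0.43],
}
BY_EDUCATION = {
    "low": [0.02, 0.18, 0.80, 0.00, 0.00],
    "medium": [0.01, 0.09, 0.60, 0.25, 0.05],
    "high": [0.00, 0.05, 0.45, 0.40, 0.10],
}


def visible_distribution(hidden_state, education):
    """Mixture of the state's response distribution and the education effect."""
    w = EDU_WEIGHT[hidden_state]
    return [(1.0 - w) * r + w * e
            for r, e in zip(RESPONSE[hidden_state], BY_EDUCATION[education])]


frail_low = visible_distribution("F", "low")
frail_high = visible_distribution("F", "high")
third_low = visible_distribution("third", "low")
print([round(p, 3) for p in frail_low])
print([round(p, 3) for p in frail_high])
```

With a weight of zero, the third hidden state ignores education altogether, whereas in F the education component carries slightly more than half of the mixture.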