An Advanced Markov Switching Approach for the Modelling of Consultation Rate Data †

: Regime switching in conjunction with penalized likelihood techniques could be a robust tool concerning the modelling of dynamic behaviours of consultation rate data. To that end, in this work we propose a methodology that combines the aforementioned techniques, and its performance and capabilities are tested through a real application.


Introduction
Dynamic behaviours in time events are always quite complex, and their modelling is often a challenging task.The level of difficulty is accelerated in cases where the dynamics of a system cannot be satisfactorily described by linear models, but more perplexed non-linear functions are required.Classical time series approaches are not capable of capturing complex functional behaviours.Even advanced models recently proposed are not flexible enough and as a result are not easily adjusted to handle more general non-linear schemes [1][2][3][4].
It is also frequently observed that the behaviour changes its general pattern in different regions of the time space.Such changes may affect either the mean or the variation or both.A breakthrough in this field took place 40 years ago with the proposal of Markov regime models and the switching regressions [5].Such models allow a great degree of flexibility, and as a result they could be implemented to capture complex dynamic behaviours.The unobserved state variable associated with such models is an attractive feature directly related to the switching mechanism of the underlying modelling approach.The resulting advanced models rely on the Markovian property, which is an easily handled issue in terms of inferential statistics.Note finally that censoring [6] or semi-Markov approaches [7] may also be considered in such frameworks.
In this work, within the switching framework, we introduce the classical likelihood combined with a penalty term controlled by a properly chosen tuning parameter.In other words, the switching modelling technique is combined with the so-called penalized likelihood with the parameter estimation being dealt via the Expectation-Maximization algorithm [8].Depending on the phenomenon under investigation, a proper switching model can be used.Two states often suffice to describe the classical dynamic behaviour of incidence data or epidemics, with one state representing the normal stage of the phenomenon and the other the outbreak stage.In such a case, the frequency (usually of daily or weekly data) changes (increases) considerably (and in some cases dramatically) when the system enters into the second stage.Such a change is considered statistically significant, and therefore the unobserved state variable ignites the switch.In addition, possible covariates may affect the variable of interest, which is denoted by y t and represents either the frequency or the associated rate.

The Modelling
The model used in this work is the 2-state switching model of conditional mean, the general form of which is given by where where θ js t the coefficient associated with the W j covariate.As i is clear from the presentation of the above model, a different set of parameters is involved for each state considered.It is important to state that the set of covariates involved in each state may or may not be the same.

The Algorithmic Procedure
The approach we choose to follow for modelling phenomena that exhibit a dynamic switching behaviour consists of three steps, which are briefly discussed in this section.

Step 1-The Change Point Detection
The detection of a change point in a time series and in general in events over time constitutes an integral part of time series analysis, since their identification is directly related to a distributional change.Such changes, even light ones, should cause alarm due to the fact that they may alter the data generating process in such a way that the process under investigation may fail to fulfill the purpose for which it is intended.Applications can be found in most scientific fields from finance and business to engineering, biosciences, climatology, geosciences etc. [9][10][11].
The proposed methodology requires a preliminary analysis to identify a set of possible change points.It should be noted that such analysis involves only the response variable, and no covariates are involved.The method to be used may be the classical method of change-point identification [12].
For the change point detection, an offline algorithm is used to examine the entire set of observations in a single step to recognize where the change occurred.The online approach could be chosen instead, as long as a certain number of new data are available for the algorithm to function properly and satisfactorily.
To check the performance of the selected change points, we have used the mean absolute error (MSE) according to which predicted and actual values are compared.The general expression is given by

Step 2-The Variable Identification
For the identification of the statistically significant covariates, a kind of model selection technique can be implemented.In this work, we propose the use of computationally advanced regularization methods, such as Lasso, Ridge or Elastic-Net with the latter considered to be a generalization of the former ones that overcomes their disadvantages.For the interested reader, a number of articles investigate the interrelation of time series and regularization techniques [13][14][15].
The generalized regularization method used in this work is given by where SSE is the sum of squared errors or any other loss function chosen by the researcher, T the sample size, α ∈ [0, 1], and λ the tuning parameters that result the penalty in the loss function.Note that α balances the amount of emphasis given to minimize the loss function versus minimizing the sum of squared coefficients and/or the sum of absolute coefficients.Observe that the above generalized regularization method is reduced to

•
The Lasso method for a = 1; • The Ridge method for a = 0; and • To Elastic-Net for a ∈ (0, 1).
Note that a proper weighted version of (3) can be used if it is needed, for instance, to resolve a heteroscedasticity issue.In such a case, (3) takes the general form where w φ i and w θ j appropriate weights, i = 1, . . ., p and j = 1, . . ., k.

Step 3-The Switching
The selected model for each state is obtained together with the parameter estimates and the associated standard errors.
Note that in practice, we do not know and we do not observe the state s t , but we could infer it from the observed data.Indeed, although the state variable s t is an unobserved variable, the process y t is observed.To make an inference about s t , we need to make an assumption about the process s t , which usually is assumed to follow a first order Markov chain.For the 2-state case, the probabilities of transition are also obtained.Thus, the transition probability to state j at time t, given that the process was in state i at the time point t − 1, is given by P(s t = j|s t−1 = i) = p ij , i, j = 1, 2.

An Application on Epidemiology
Using a data set of 105 weekly influenza-like-illness (ILI) consultation rate data for Greece for the period 2014-2016, including a series of meteorological and climatological covariates such as temperature and wind, we were able to identify an ideal tuning parameter λ and the best value of the index α in the sense that the minimum mean squared error is achieved.Figure 1 shows that the ideal value of λ is around one (1), while the best value of α is between 0.5 and 1.A more detailed analysis reveals that λ = 1 and α = 0.729, with the corresponding value of MSE being equal to 0.0734.Under this setting, the regimes of the data and consequently the form of the selected models are identified together with the estimates of the parameters involved.
The models obtained with the use of the MSwM R-package [16] are as follows (with three decimal points): It is worth mentioning that the same set of covariances are found to be significant for both regimes together with a first-order autoregressive, a first degree trend polynomial (linear trend) and a periodic (seasonal) part.The covariates chosen are the minimum, mean, and median temperature, denoted, respectively, by T1, T2, and T3 and the mean of the wind force denoted by WF, implying that the influenza is closely connected to meteorological/climatological factors like the temperature and the wind.

Figure 1 .
Figure 1.Behaviour for various values of a as opposed to different values of log λ and MSE.
p, are autoregressive (AR) switching coefficients, s t represents the state variable that takes the values 1 (normal or typical state), and 2 (the extreme or outbreak state) and t are i.i.d.random variables with zero mean and variance σ 2 .
If k covariates are allowed to enter into the model, (1) extends to