Modeling COVID-19 Infection Rates by Regime-Switching Unobserved Components Models

: The COVID-19 pandemic is characterized by a recurring sequence of peaks and troughs. This article proposes a regime-switching unobserved components (UC) approach to model the trend of COVID-19 infections as a function of this ebb and ﬂow pattern. Estimated regime probabilities indicate the prevalence of either an infection up-or down-turning regime for every day of the observational period. This method provides an intuitive real-time analysis of the state of the pandemic as well as a tool for identifying structural changes ex post. We ﬁnd that when applied to U.S. data, the model closely tracks


Introduction
Since its onset in late 2019, the COVID-19 pandemic has permeated next to all facets of public life. Even in the transition of the COVID-19 pandemic to an endemic state, the by now well-known routine of alternating infection peaks and troughs will demand close observation for the foreseeable future (Telenti et al. 2021). However, analyzing and monitoring the state and development of the pandemic is complicated by the nature of the data on COVID-19 case numbers: a glimpse at Figure 1 reveals key characteristics include the strong persistence and nonstationarity of case numbers (Dolton 2021), as well as alternating regimes of increasing and decreasing infections caused by policy interventions, medical innovations, seasonal climate conditions, and the evolution of the virus itself (Doornik et al. 2022;Fiscon et al. 2021). In addition, infection dynamics are overlapped by a seasonal pattern of increasing volatility, generated by a varying number of tests over the days of the week (Bergman et al. 2020), as well as by measurement errors (Hortaçsu et al. 2021).
While a variety of econometric tools have been utilized to study the dynamics of COVID-19 case numbers, in particular, unobserved components (UC) models have been found successful in capturing the aforementioned characteristics of COVID-19-related data. 1 In the early stages of the pandemic, UC models were used to fit linear deterministic trends to COVID-19 case numbers and to identify structural breaks (Hartl et al. 2020;Lee et al. 2021;Liu et al. 2021). With an increasing number of observations available, UC models with stochastic trends have been considered by Moosa (2020) and Doornik et al. (2022), while (stationary) seasonal components were added by Navas Thorakkattle et al. (2022) and Xie (2022), among others. However, appropriately accounting for the alternating peaks and troughs in the trend of COVID-19 case numbers remains an open challenge. We contribute to the UC literature by explicitly modeling the peak and trough pattern of infection numbers, which has emerged as one of the defining features of the COVID-19 pandemic. For this purpose, we introduce a regime-switching UC model that decomposes log daily COVID-19 infections into trend, seasonal, and cyclical components. While the trend is formulated as a random walk (RW) with drift to capture the long-run dynamics of log new infections, the novelty of our model is that the drift term is made regime-dependent to account for alternating periods of increasing and decreasing infections, as well as for the strong persistence of the respective regime. A seasonal component is added to model the weekly recurring pattern of case numbers, while an autoregressive (AR) term accounts for short-term dependencies in the data.
Regime-switching and mixture models have been studied extensively in the past. We refer to Frühwirth-Schnatter (2006); Kim and Nelson (2017) for an overview. As proposed in Kim (1994); Kim and Nelson (2017, ch. 5), we estimate the trend, seasonal, and cyclical components via the Kim filter, which is an extension of the Kalman filter to regime-switching models: between the prediction and updating step of the Kalman filter, it executes the recursions of Hamilton (1989) to estimate the regime probabilities and thus allows for regime switching in a state-space framework. Parameter estimation is carried out by numerical optimization of the likelihood function, where we make use of an extensive grid search to be robust against local optima. As an alternative, one could also utilize the Expectation-Maximization (EM) algorithm, which has recently been derived for the Kim filter by Degras et al. (2022). As an alternative estimation strategy, we also consider a Bayesian-Gibbs sampling approach. Markov chain Monte Carlo (MCMC) techniques have been widely employed for state-space and especially regime-switching models (see Frühwirth-Schnatter 2006). In contrast to the Kim filter, inference is based on the joint distribution of the state vector, regime probabilities, and additional parameters, rather than on conditional distributions (Frühwirth-Schnatter 2006, ch. 13).
The model is applied to daily U.S. infection data provided by the Johns Hopkins University Center for Systems Science and Engineering (JH/CSSE) (Dong et al. 2020). We found that the estimated regime probabilities closely track the pattern of infection waves throughout the whole chronology of the pandemic. This allows for easy ex post evaluation of the effectiveness of public health interventions or the severity of viral mutations: whereas significant changes tip the system into the opposite regime, harmless mutations or insignificant interventions do not trigger a switch. Moreover, we introduce a nowcasting application that provides an easy-to-understand and concise monitoring tool for the current state of the pandemic.
The plan of this paper is as follows. Section 2 describes the data and methodology. Section 3 presents empirical findings and examines the nowcasting application. Section 4 discusses alternative model specifications, contrasts the Kim filter with a Bayesian-Gibbs sampling approach, and presents a Monte Carlo study to evaluate the reliability of parameter estimates for the preferred specification. Section 5 concludes. All R code to replicate this paper is available at the Github repository https://github.com/Paul-Haimerl/Regime-SW-UC-COVID-19 Repository accessed on 2 April 2023.

Data and Methodology
To analyze the state and dynamics of the COVID-19 pandemic, we consider data from the JH/CSSE, which can be downloaded from https://github.com/CSSEGISandData/ COVID-19 (accessed on 14 March 2023). Figure 1 sketches the data on reported daily COVID-19 infections from the 22nd of January 2020 to the 25th of December 2022 for the U.S. in both levels and logarithms.
As can be seen, data on daily infections in the early phase of the pandemic are downward-biased due to limited test capacities. As test availability increases, one sees a rapid increase in case numbers in March 2020. 2 Accounting for this bias, we select the 1st of April 2020 as the start date for our study. Tracking the pandemic until the end of 2022 provides a total of T = 1005 daily observations. 3 To set up our model, let i t denote new daily U.S. COVID-19 infections and define y t = log(i t ). The measurement equation of the unobserved components model is then given by where µ t denotes the trend, γ t models the seasonal component, and c t is a stationary cyclical component. As Figure 1 suggests, the model is formulated in terms of logarithmic case numbers to reflect the exponential growth of COVID-19. Furthermore, taking logs allows for proportional seasonal effects and measurement errors rather than additive effects (see e.g., Doornik et al. 2021;Lee et al. 2021;Liu et al. 2021).
As typical in the literature, the seasonal component γ t is modeled as a mean-zero deterministic process (Durbin and Koopman 2012;Navas Thorakkattle et al. 2022, ch. 3 The cyclical component c t is to capture short-run dynamics in the data. Therefore, we define c t as a stationary AR(2) process L denotes the lag operator Lx t = x t−1 . The motivation behind (3) is to allow for some additional short-term dependencies and thus for autocorrelated measurement errors instead of a purely unsystematic additive component with no temporal correlation. Such short-term fluctuations can arise, e.g., due to testing bottlenecks, large-scale public events, the gradual introduction of new policies, or time lags in the reporting of new cases.
Turning to the trend, we specify µ t as a random walk with drift to capture the permanent, smooth low-frequency dynamics, as displayed in Figure 1.
The drift term ν t depends on the current regime indicator S t ∈ {0, 1}. In order to uniquely identify the system, we impose ν 1 < 0, which declares S t = 1 as the infection down-turning regime and S t = 0 as the infection up-turning regime. Consequently, (4) allows for alternating regimes of increasing and decreasing infections, as well as for different rates of growth and decay. Specification (4) can be justified by visual inspection of Figure 1, which shows that log infections are driven by persistent but alternating phases of increasing and decreasing case numbers. Regime switches may be triggered by major epidemic events that have a persistent impact, such as virus mutations, policy changes, or medical innovations, among others.
The switching behavior between the two states is modeled by a first-order stationary Markov process (Hamilton 1989, ch. 2), with the transition probabilities The transition probabilities are constrained to satisfy p, q > 90%, which resembles the path-dependent behavior of infection-increasing and infection-decreasing regimes. 4 In the following, we refer to the specification in (1) to (5) as the D.Seas.C. model. To estimate the parameters θ = (σ ξ , σ η , ν 1 , ν 0 , φ 1 , φ 2 , p, q) we proceed similarly to the UC literature: First, the model is cast in state-space form. Based on the Kim filter, we obtain a conditional likelihood function that combines the prediction and updating steps of the Kalman filter with the regime-switching recursions of Hamilton (1989) (see Kim 1994, ch. 2 for details). Maximizing the conditional likelihood yields the desired parameter estimates and is analogous to the usual prediction error decomposition in the UC literature, which involves the Kalman filter instead of the Kim filter (see Harvey 1989, ch. 4). 5 While the estimation results for the D.Seas.C model are presented in the next section, we also discuss generalizations, including a stochastic specification of the seasonal term and the inclusion of a third state in Section 4.

Empirical Results
To initiate the Kim filter, we initialize µ 0 with the number of reported COVID-19 cases on the 31st of May 2020, one day prior to the start of the observational period. The remaining entries to the state vector and the diagonal of the state covariance matrix are initialized diffusely, as is common in the UC literature (see Durbin and Koopman 2012, ch. 5).
To ensure stable estimates of all parameters in θ and to avoid convergence to local optima, we employ an extensive three-step grid search that covers the relevant parameter space while remaining computationally feasible: In a first global grid search, 30,000 parameter combinations are randomly drawn from uniform distributions with sufficiently wide support to encompass the entire parameter space. To narrow down the locations of local and global optima, we pick the 50 θ-vectors corresponding to the greatest likelihood values. For each parameter, we store the minimum and maximum value of these 50 combinations, which gives us the range of the relevant parameter space. In the second step, these minimum and maximum values are employed as bounds to construct a finer parameter grid. After computing the likelihood of each grid point, we use the 50 grid points yielding the greatest likelihood as starting values for numerical optimization via the Nelder-Mead algorithm. The optimized parameters corresponding to the greatest log likelihood are then chosen as the final parameter estimateθ. Figure 2 outlines the parameter distributions as generated by these 50 optimization results. As all parameters of our model (1) to (5) feature a narrow distribution around their best estimate, they appear accurately identified. Table 1 presents the estimation results and corresponding standard errors.  Based on the transition probabilitiesp andq, it is straightforward to calculate the regime durations. The model estimates an expected duration of (1 −p) −1 ≈ 83 days for the down-turning S t = 1 regime and (1 −q) −1 ≈ 32 days for the up-turning S t = 0 state. The drift estimatesν 1 andν 0 are interpreted as the average day-to-day change in the trend in log COVID-19 cases. Transforming back into levels gives an average daily decrease of 1 − expν 0 +ν 1 ≈ 1.49% for the down-turning regime on the one hand, and a daily increase of expν 0 −1 ≈ 3.36% in periods of the up-turning regime on the other. Whereas the COVID-19 infections double roughly everyν −1 0 log(2) ≈ 21 days in an up-turning state, approximately (ν 0 +ν 1 ) −1 log( 1 2 ) ≈ 46 days are required to halve these case numbers again in a subsequent down-turning regime.
Executing the Kim filter and smoothing recursions yields estimates for the state vector. Figure 3 displays the smoothed trendμ t and regime probabilitiesPr(S t = 0|y T , ..., y 1 ), together with the noisy measurement of log COVID-19 infections. . . , y 1 ) (blue, right scale), and log COVID-19 cases log(i t ) (gray, left scale). The smoothed trend is a probability-weighted average of the two regime-specific trend estimates. Infection waves (up-turning regime) as identified byPr(S t = 0|y T , . . . , y 1 ) > 40% are shaded.
As Figure 3 shows, the model allows for the identification of episodes of up-and down-turning regimes. In particular, it assigns a strong path dependence to the infection regimes, such that long episodes of containment are followed by new infection waves. The increasing proportion of seasonal distortions is attributed to the seasonal and short-run component, and thus does not deteriorate the smoothed trend nor the smoothed regime probabilities. It is only towards the end of the observational horizon that lateral dynamics, strong seasonal fluctuations, and the limited number of future observations complicate the separation of trend, seasonal, and cyclical dynamics. As a consequence, the smoothed trend and regime probabilities appear more erratic.
To illustrate the benefits from our UC model for identifying structural changes ex post, we select a simple threshold of the smoothed regime probabilityPr(S t = 0|y T , ..., y 1 ) > 40% in order to indicate up-turning states. While we leave the choice of a policy rule up to the experts in the field, our arbitrarily chosen threshold is rather cautious (i.e., the probability threshold is set below 1/2) and allows to identify several infectious waves that are shaded in Figure 3. A chosen list of policy measures and events, as reported by the U.S. Centers for Disease Control and Prevention in proximity of the six identified infection waves, is presented below. 6 1.
3 June 2020-10 July 2020 10. In addition to an ex post analysis, we also propose the use of the prediction step of the Kim filter as an easy real-time monitoring device for the pandemic. In case the one-step-ahead prediction of entering the up-turning regime exceeds the aforementioned threshold of 40%, policies may be triggered to quickly shorten the duration of the incoming up-turning regime.
In the following, we evaluate the appropriateness of this monitoring application based on its past propensity for type I (false-positive) and type II (false-negative) errors, as well as on the time lag of filtered predictions relative to the smoothed estimates. To mimic the situation of a decision maker in real time, we start with 150 observations and estimate the parameters θ, as described at the beginning of this section. We then obtain one-step-ahead predictions for the conditional probability for the up-turning state from the Kim filter. Next, an additional observation is added to the sample, and the procedure repeats. To speed up the procedure, we update the parameter estimates only every two iterations and use the estimates from the previous iteration as starting values for the numerical optimization. Every 500 iterations, an additional grid search, as described at the beginning of this section, is performed to robustify the procedure. Figure 4 compares the smoothed estimates and the filtered one-step-ahead regime probabilities. Time periods where the one-step-ahead prediction exceeds the 40% threshold are shaded. As expected, the filtered regime probabilities are significantly more jagged and slightly lag the smoothed estimates due to their smaller information set. Nonetheless, all of the previously established up-turning periods that fall within the scope of the nowcasting exercise are identified by the monitoring device. A feature that underpins the rather conservative nature of the monitoring device is that it detects the onset of an up-turning regime almost in real time. However, this comes at the cost of a higher type I error, i.e., the detection of an up-turning regime that contradicts the smoothed estimates, e.g., at the end of 2021. During the summer of 2022, the monitoring device detected several additional up-turning regimes that are classified as down-turning ones based on the smoothed regime probabilities. However, as Figure 3 indicates, the smoothed trend is actually increasing during that period; however, the magnitude of the increase is small compared with other up-turning periods. Table 2 contrasts the infection waves, as identified by the smoothed regime probabilities (shaded periods in Figure 3), with those derived from the filtered one-step-ahead predictions (shaded periods in Figure 4). Table 2. Periods of the up-turning S t = 0 regime as identified by smoothed estimates and corresponding nowcasting one-step-ahead predictions. Notes: False-positive periods of the up-turning regime are omitted (see Figure 4). The first infection wave is not covered due to an initialization period of 150 days for the nowcasting application.

Smoothed Estimates Filtered One-Step-Ahead Predictions
Across the whole covered period, the nowcasting tool lags only slightly behind the smoothed estimates. These findings suggest the suitability of the model in Equations (1)-(5) for a real-time monitoring tool to provide an intuitive and concise state of the pandemic.

Robustness and Discussion
The daily observations of COVID-19 case numbers in Figure 1 display an increasing seasonal variation over time. The deterministic seasonality of the D.Seas.C. model in Equation (2) is not equipped to handle such dynamic behavior. As a result, some of the increasing seasonality may be captured by the estimated cyclical and trend component, which subsequently distorts the regime-switching behavior of the system. A straightforward extension of the model would therefore be to substitute (2) with a stochastic specification capable of assuming seasonal behavior with a progressively increasing variance. One process with such properties is a seasonal unit root (see Bauer and Wagner 2012 for details). It can be added to the state-space model by specifying the new seasonal component Similar to the traditional UC literature, γ UR t is set to be a stochastic process with expected average equal to zero (see e.g., Durbin and Koopman 2012, ch. 3.2.2). However, contrary to the deterministic seasonality in (2), the stochastic specification (6) introduces nonstationary dynamics at lag seven. By plugging in x t and rearranging, (6) can be expressed as ∑ 6 j=0 γ UR t−j = ∑ 13 j=7 γ UR t−j + ω t , and thus (1 − L 7 ) ∑ 6 j=0 γ UR t−j = ω t . Therefore, the unit root at lag seven links the current seasonal dynamics to those of the previous week, which generates a repeating seasonal pattern. In each period, a shock is added to ∑ 6 j=0 γ UR t−j , which yields an ever-increasing volatility over time. Figure 5 sketches 100 simulations of γ UR t and illustrates the oscillating property and nonstationary nature of this seasonal component. Note that (6) differs from unit root seasonal components in the spirit of Harrison and Stevens (1976), which are rather to allow for time-variant seasonal dummies instead of generating an oscillating, diverging behavior.  Table 3 presents the estimation results for substituting the deterministic seasonal term (2) with the unit root specification (6), where the parameter estimates are obtained as before. Furthermore, we provide estimates for specifications that exclude the cyclical component (3) in favor of an unsystematic error term t for both the deterministic and stochastic seasonality, respectively.
Another point of concern is the lateral behavior of infection numbers towards the end of the observational horizon. Ambiguous time periods that cannot be fully attributed to either the up-turning or the down-turning regime may lead to incoherent regime probabilities and thus bias the regime estimates at an earlier stage of the pandemic. As a potential remedy, we introduce a specification involving a third state S t = 2, during which the drift term ν t of the trend component in (4) is set to zero. Thus, the third state is intended to reflect periods that feature neither a strong upward nor downward trend in the reported case numbers. Table 3 provides estimates of the proposed extensions together with the preferred D.Seas.C. specification displayed in Section 3.  All information criteria clearly favor the more general specifications that include a seasonal unit root process over the D.Seas.C. model presented in Section 3. This is due to the seasonal unit root component, which better grasps the oscillating seasonal pattern along with its increasing volatility over time, as compared with the deterministic seasonal specification. The latter attributes the strong short-run fluctuations to the cyclical component. Since c t cannot adequately capture the strong seasonal patterns due to its stationary nature, the smoothed cyclical shocks appear to be autocorrelated, which reduces the likelihood and thus increases the information criteria. 7 However, even though the observed series can be fitted more accurately overall when the seasonal component is generalized to include a seasonal unit root, it is striking that the trend and regime-switching characteristics vary little across specifications: as illustrated by Figure 6, the smoothed regime probabilities of the different models almost coincide, and only minimal differences between the smoothed trends can be spotted. Therefore, smoothed trend and regime probabilities appear robust to the specification of seasonal and short-run components.
The introduction of a third state, reflecting episodes of stable case numbers, is able to improve upon the fit of the two-state D.Seas.C. model, but this refinement is not sufficient enough as to be favored by the BIC and HQ information criteria. In addition, the constraint q > 90% is binding, indicating that the model is not suitable to depict the regime path persistence that is inherent to the pandemic. 8 Consequently, we choose the simpler D.Seas.C. model for the remainder of our analysis.
Additional consideration is given to the general estimation strategy. Numerical optimization of a likelihood function, as employed by the Kim filter, has been traditionally regarded as the standard technique for state-space models (Durbin and Koopman 2012, ch. 7). However, Bayesian approaches, especially MCMC techniques, are also a welldeveloped field in the literature (Frühwirth-Schnatter 2006;Kim and Kang 2019;Luginbuhl and de Vos 1999). In the Bayesian estimation of UC models, the state vector and additional parameters are drawn from a joint distribution, rather than treating the unknown parameters as fixed when maximizing the likelihood numerically. As a consequence, the Bayesian approach bears the advantage that the posteriors incorporate any underlying uncertainty regarding the additional parameters (Luginbuhl and de Vos 1999). Moreover, in order for the Kim filter to account for all possible regime permutations, M 2 individual predictions need to be produced in a single time period for a model with M states. Since the path dependencies grow at a rate of O(M 2T ), the Kim filter exploits regime probabilities to collapse M 2 state vector posteriors into M posteriors at the end of each time period in order to remain computationally feasible (Kim and Nelson 2017, ch. 5.2). Although previous studies have shown that this approximation entails only a small effect on the final estimates, some bias may be introduced by using weighted averages to collapse the posteriors (Kim and Kang 2019). The Bayesian approach, on the other hand, relies on draws from the joint distribution of the state vector, regime probabilities, and additional parameters. The need for a similar approximation step in an effort to track all conditional distributions is eliminated. Inference only requires the convergence of the Markov chain (Frühwirth-Schnatter 2006, ch. 13).
To robustify our findings, we therefore also estimate the preferred D.Seas.C. model using a Gibbs sampling approach. Table 4 portrays the results. Furthermore, Figure 6 includes the averaged trend and regime probability posterior draws.   Table 1), as well as for the seasonal unit root UR.Seas. model (grey, see Table 3) using the Kim filter. Averaged posterior draws of the trend and regime probabilities for the D.Seas.C. model as derived by the Gibbs sampler are shown in blue (see Table 4).
Comparing the Kim filter estimates in Table 1 with the Bayesian inference in Table 4, the cyclical dynamics almost perfectly match. However, when analyzing the trend component, the Gibbs sampler provides a lower estimate of the innovation variance σ ξ while, in parallel, specifying the regimes as less persistent and more dissimilar. Therefore, more variance of the data-generating process is attributed to the regime-switching behavior, driving more frequent regime switches and a greater regime-dependent effect. Nevertheless, even with more volatile regime probability estimates, inference based on the Gibbs sampler exhibits only marginal differences, as Figure 6 reveals.
Regime-switching models are notoriously difficult to estimate. In particular, inference regarding the regime transitions is often based on only a small number of observed switches; thus, it presents a substantial challenge to the practitioner. Therefore, it is a sensible exercise to evaluate and test the performance of the specification at hand via a Monte Carlo simulation study. We simulate a sample path over 1000 time periods for 1000 iterations, where the parameter values as well as the length of the data-generating process are chosen to mimic the reported number of COVID-19 infections. The results of the Monte Carlo study are shown in Table 5. Considering the confounding factors that complicate the estimation of regime-switching state-space models, the D.Seas.C. specification in (1) to (5) seems to accurately identify the data-generating process. Only the mean of the estimates of the additional drift parameter ν 1 deviates from its true value, together with a very high standard deviation. The drift parameter ν 1 materializes only during episodes where the down-turning regime S t = 1 is turned on (see Equation (4)). As a consequence, in iterations where the randomly generated sample path contains only very few and brief periods of the down-turning regime, any parameter estimate will be imprecise and subject to a high variance. This is also evident from the median of point estimates in Table 5, which is robust against such outlier cases.
Regarding the robustness of the D.Seas.C. model, it can be inferred that a sufficient data history, as well as a sufficient exposure to both regimes, are required in order to accurately estimate a regime-switching model. Examining Figure 1, these conditions appear to be satisfied.

Conclusions
In this article, we employ a regime-switching UC model to decompose log daily COVID-19 infections and estimate the probabilities of alternating regimes of up-and downturning case numbers over an extensive period, spanning from the 1st of April 2020 to the 25th of December 2022.
Our findings indicate that a regime-switching UC model is capable of capturing many characteristics of the COVID-19 pandemic that more inflexible approaches cannot absorb: a regime-dependent drift assumes persistent long-run dynamics; the weekly patterns of reported COVID-19 infections are modeled by a seasonal component; and a stationary autoregressive component captures short-run dynamics and measurement errors.
The results show that: (i) our approach is well-suited to asses ex post the severity and or efficacy of structural changes, such as viral mutations, regulatory loosening and tightening, behavioral changes, and novel therapeutic approaches ex post; (ii) the model can be applied as a policy tool to monitor the state of the pandemic by nowcasting the current propensity of either the up-or down-turning regime being switched on.
Remaining issues arising from highly inconsistent and erratic seasonality could be overcome by extending the model to allow for fractionally integrated components. In particular, this would allow for gradual adjustments of the trend to structural changes (see e.g., Hartl and Jucknewitz 2022). Another possible extension could be the use of time-inhomogeneous transition probabilities, as proposed in Kaufmann (2015). There, an exogenous variable drives the transition matrix over time, allowing for changes in the frequency of regime switches over the course of the pandemic. However, identifying a candidate exogenous variable is difficult and remains an open question for future research.  Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found at the Github COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University https://github.com/CSSEGISandData/COVID-19 (accessed on 14 March 2023).