Article

Modeling COVID-19 Infection Rates by Regime-Switching Unobserved Components Models

1 School of Business and Economics, Maastricht University, 6200 MD Maastricht, The Netherlands
2 Department of Economics and Econometrics, University of Regensburg, Universitätsstr. 31, 93053 Regensburg, Germany
3 Institute for Employment Research, Regensburger Str. 104, 90478 Nuremberg, Germany
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Econometrics 2023, 11(2), 10; https://doi.org/10.3390/econometrics11020010
Submission received: 30 September 2022 / Revised: 9 March 2023 / Accepted: 28 March 2023 / Published: 3 April 2023
(This article belongs to the Special Issue High-Dimensional Time Series in Macroeconomics and Finance)

Abstract

The COVID-19 pandemic is characterized by a recurring sequence of peaks and troughs. This article proposes a regime-switching unobserved components (UC) approach to model the trend of COVID-19 infections as a function of this ebb and flow pattern. Estimated regime probabilities indicate the prevalence of either an infection up- or down-turning regime for every day of the observational period. This method provides an intuitive real-time analysis of the state of the pandemic as well as a tool for identifying structural changes ex post. We find that when applied to U.S. data, the model closely tracks regime changes caused by viral mutations, policy interventions, and public behavior.
JEL Classification:
C32; C51; I10

1. Introduction

Since its onset in late 2019, the COVID-19 pandemic has permeated nearly all facets of public life. Even as the pandemic transitions to an endemic state, the by now well-known routine of alternating infection peaks and troughs will demand close observation for the foreseeable future (Telenti et al. 2021). However, analyzing and monitoring the state and development of the pandemic is complicated by the nature of the data on COVID-19 case numbers: a glance at Figure 1 reveals that key characteristics include the strong persistence and nonstationarity of case numbers (Dolton 2021), as well as alternating regimes of increasing and decreasing infections caused by policy interventions, medical innovations, seasonal climate conditions, and the evolution of the virus itself (Doornik et al. 2022; Fiscon et al. 2021). In addition, infection dynamics are overlaid by a seasonal pattern of increasing volatility, generated by a varying number of tests over the days of the week (Bergman et al. 2020), as well as by measurement errors (Hortaçsu et al. 2021).
While a variety of econometric tools have been utilized to study the dynamics of COVID-19 case numbers, in particular, unobserved components (UC) models have been found successful in capturing the aforementioned characteristics of COVID-19-related data.1 In the early stages of the pandemic, UC models were used to fit linear deterministic trends to COVID-19 case numbers and to identify structural breaks (Hartl et al. 2020; Lee et al. 2021; Liu et al. 2021). With an increasing number of observations available, UC models with stochastic trends have been considered by Moosa (2020) and Doornik et al. (2022), while (stationary) seasonal components were added by Navas Thorakkattle et al. (2022) and Xie (2022), among others. However, appropriately accounting for the alternating peaks and troughs in the trend of COVID-19 case numbers remains an open challenge.
We contribute to the UC literature by explicitly modeling the peak and trough pattern of infection numbers, which has emerged as one of the defining features of the COVID-19 pandemic. For this purpose, we introduce a regime-switching UC model that decomposes log daily COVID-19 infections into trend, seasonal, and cyclical components. While the trend is formulated as a random walk (RW) with drift to capture the long-run dynamics of log new infections, the novelty of our model is that the drift term is made regime-dependent to account for alternating periods of increasing and decreasing infections, as well as for the strong persistence of the respective regime. A seasonal component is added to model the weekly recurring pattern of case numbers, while an autoregressive (AR) term accounts for short-term dependencies in the data.
Regime-switching and mixture models have been studied extensively in the past. We refer to Frühwirth-Schnatter (2006) and Kim and Nelson (2017) for an overview. As proposed in Kim (1994) and Kim and Nelson (2017, ch. 5), we estimate the trend, seasonal, and cyclical components via the Kim filter, which is an extension of the Kalman filter to regime-switching models: between the prediction and updating step of the Kalman filter, it executes the recursions of Hamilton (1989) to estimate the regime probabilities and thus allows for regime switching in a state-space framework. Parameter estimation is carried out by numerical optimization of the likelihood function, where we make use of an extensive grid search to be robust against local optima. One could also utilize the Expectation–Maximization (EM) algorithm, which has recently been derived for the Kim filter by Degras et al. (2022). As an alternative estimation strategy, we consider a Bayesian–Gibbs sampling approach. Markov chain Monte Carlo (MCMC) techniques have been widely employed for state-space and especially regime-switching models (see Frühwirth-Schnatter 2006). In contrast to the Kim filter, inference is based on the joint distribution of the state vector, regime probabilities, and additional parameters, rather than on conditional distributions (Frühwirth-Schnatter 2006, ch. 13).
The model is applied to daily U.S. infection data provided by the Johns Hopkins University Center for Systems Science and Engineering (JH/CSSE) (Dong et al. 2020). We found that the estimated regime probabilities closely track the pattern of infection waves throughout the whole chronology of the pandemic. This allows for easy ex post evaluation of the effectiveness of public health interventions or the severity of viral mutations: whereas significant changes tip the system into the opposite regime, harmless mutations or insignificant interventions do not trigger a switch. Moreover, we introduce a nowcasting application that provides an easy-to-understand and concise monitoring tool for the current state of the pandemic.
The plan of this paper is as follows. Section 2 describes the data and methodology. Section 3 presents empirical findings and examines the nowcasting application. Section 4 discusses alternative model specifications, contrasts the Kim filter with a Bayesian–Gibbs sampling approach, and presents a Monte Carlo study to evaluate the reliability of parameter estimates for the preferred specification. Section 5 concludes. All R code to replicate this paper is available at the GitHub repository https://github.com/Paul-Haimerl/Regime-SW-UC-COVID-19 (accessed on 2 April 2023).

2. Data and Methodology

To analyze the state and dynamics of the COVID-19 pandemic, we consider data from the JH/CSSE, which can be downloaded from https://github.com/CSSEGISandData/COVID-19 (accessed on 14 March 2023). Figure 1 sketches the data on reported daily COVID-19 infections from the 22nd of January 2020 to the 25th of December 2022 for the U.S. in both levels and logarithms.
As can be seen, data on daily infections in the early phase of the pandemic are downward-biased due to limited test capacities. As test availability increases, one sees a rapid increase in case numbers in March 2020.2
Accounting for this bias, we select the 1st of April 2020 as the start date for our study. Tracking the pandemic until the end of 2022 provides a total of T = 1005 daily observations.3
To set up our model, let $i_t$ denote new daily U.S. COVID-19 infections and define $y_t = \log(i_t)$. The measurement equation of the unobserved components model is then given by
$$y_t = \mu_t + \gamma_t + c_t, \qquad t = 1, \ldots, T, \tag{1}$$
where $\mu_t$ denotes the trend, $\gamma_t$ models the seasonal component, and $c_t$ is a stationary cyclical component. As Figure 1 suggests, the model is formulated in terms of logarithmic case numbers to reflect the exponential growth of COVID-19. Furthermore, taking logs allows for proportional seasonal effects and measurement errors rather than additive effects (see e.g., Doornik et al. 2021; Lee et al. 2021; Liu et al. 2021).
As is typical in the literature, the seasonal component $\gamma_t$ is modeled as a mean-zero deterministic process (Durbin and Koopman 2012, ch. 3.2.2; Navas Thorakkattle et al. 2022)
$$\gamma_t = -\sum_{j=1}^{6} \gamma_{t-j}. \tag{2}$$
The cyclical component $c_t$ is meant to capture short-run dynamics in the data. Therefore, we define $c_t$ as a stationary AR(2) process
$$(1 - \phi_1 L - \phi_2 L^2)\, c_t = \eta_t, \qquad \eta_t \sim \text{i.i.d. } N(0, \sigma_\eta^2). \tag{3}$$
$L$ denotes the lag operator, $L x_t = x_{t-1}$. The motivation behind (3) is to allow for some additional short-term dependencies and thus for autocorrelated measurement errors instead of a purely unsystematic additive component with no temporal correlation. Such short-term fluctuations can arise, e.g., due to testing bottlenecks, large-scale public events, the gradual introduction of new policies, or time lags in the reporting of new cases.
Turning to the trend, we specify $\mu_t$ as a random walk with drift to capture the permanent, smooth low-frequency dynamics, as displayed in Figure 1:
$$\mu_t = \mu_{t-1} + \nu_t + \xi_t, \qquad \xi_t \sim \text{i.i.d. } N(0, \sigma_\xi^2), \qquad \nu_t = \nu_0 + S_t \nu_1. \tag{4}$$
The drift term $\nu_t$ depends on the current regime indicator $S_t \in \{0, 1\}$. In order to uniquely identify the system, we impose $\nu_1 < 0$, which declares $S_t = 1$ as the infection down-turning regime and $S_t = 0$ as the infection up-turning regime. Consequently, (4) allows for alternating regimes of increasing and decreasing infections, as well as for different rates of growth and decay. Specification (4) can be justified by visual inspection of Figure 1, which shows that log infections are driven by persistent but alternating phases of increasing and decreasing case numbers. Regime switches may be triggered by major epidemic events that have a persistent impact, such as virus mutations, policy changes, or medical innovations, among others.
The switching behavior between the two states is modeled by a first-order stationary Markov process (Hamilton 1989, ch. 2), with the transition probabilities
$$\begin{aligned} \Pr(S_t = 1 \mid S_{t-1} = 1) &= p, & \Pr(S_t = 0 \mid S_{t-1} = 1) &= 1 - p, \\ \Pr(S_t = 0 \mid S_{t-1} = 0) &= q, & \Pr(S_t = 1 \mid S_{t-1} = 0) &= 1 - q. \end{aligned} \tag{5}$$
The transition probabilities are constrained to satisfy $p, q > 90\%$, which reflects the path-dependent behavior of infection-increasing and infection-decreasing regimes.4
In the following, we refer to the specification in (1) to (5) as the D.Seas.C. model. To estimate the parameters $\theta = (\sigma_\xi, \sigma_\eta, \nu_1, \nu_0, \phi_1, \phi_2, p, q)'$ we proceed similarly to the UC literature: First, the model is cast in state-space form. Based on the Kim filter, we obtain a conditional likelihood function that combines the prediction and updating steps of the Kalman filter with the regime-switching recursions of Hamilton (1989) (see Kim 1994, ch. 2 for details). Maximizing the conditional likelihood yields the desired parameter estimates and is analogous to the usual prediction error decomposition in the UC literature, which involves the Kalman filter instead of the Kim filter (see Harvey 1989, ch. 4).5 While the estimation results for the D.Seas.C. model are presented in the next section, we also discuss generalizations, including a stochastic specification of the seasonal term and the inclusion of a third state in Section 4.
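To make the data-generating process in (1) to (5) concrete, the following minimal Python sketch simulates one sample path of log infections. It is not taken from the authors' R repository: the function name `simulate_dseasc`, the starting regime, the zero pre-sample cycle, and all parameter values in the example call are illustrative assumptions.

```python
import random


def simulate_dseasc(T, nu0, nu1, p, q, sigma_xi, sigma_eta, phi1, phi2,
                    seasonal_init, mu0=0.0, seed=0):
    """Simulate y_t = mu_t + gamma_t + c_t following Equations (1) to (5)."""
    rng = random.Random(seed)
    S = 0                        # assumption: start in the up-turning regime
    mu = mu0
    gamma = list(seasonal_init)  # gamma_{t-1}, ..., gamma_{t-6}
    c1, c2 = 0.0, 0.0            # c_{t-1}, c_{t-2}; zero pre-sample cycle (assumption)
    y, regimes = [], []
    for _ in range(T):
        # Equation (5): stay in regime 1 with probability p, in regime 0 with q
        stay = p if S == 1 else q
        if rng.random() > stay:
            S = 1 - S
        # Equation (4): random walk with regime-dependent drift nu0 + S_t * nu1
        mu = mu + nu0 + S * nu1 + rng.gauss(0.0, sigma_xi)
        # Equation (2): gamma_t = -(gamma_{t-1} + ... + gamma_{t-6})
        g = -sum(gamma)
        gamma = [g] + gamma[:5]
        # Equation (3): stationary AR(2) cycle
        c = phi1 * c1 + phi2 * c2 + rng.gauss(0.0, sigma_eta)
        c1, c2 = c, c1
        y.append(mu + g + c)     # Equation (1)
        regimes.append(S)
    return y, regimes


# Illustrative call; none of these values are the estimates reported in Table 1.
y, regimes = simulate_dseasc(
    T=200, nu0=0.033, nu1=-0.048, p=0.988, q=0.969,
    sigma_xi=0.02, sigma_eta=0.1, phi1=0.5, phi2=0.2,
    seasonal_init=[0.1, -0.2, 0.05, 0.0, 0.15, -0.3], mu0=10.0)
```

Because the seasonal term is deterministic, any seven consecutive values of it sum to zero, which is what makes it mean-zero over a week.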

3. Empirical Results

To initiate the Kim filter, we initialize $\mu_0$ with the number of reported COVID-19 cases on the 31st of March 2020, one day prior to the start of the observational period. The remaining entries to the state vector and the diagonal of the state covariance matrix are initialized diffusely, as is common in the UC literature (see Durbin and Koopman 2012, ch. 5).
To ensure stable estimates of all parameters in $\theta$ and to avoid convergence to local optima, we employ an extensive three-step grid search that covers the relevant parameter space while remaining computationally feasible: In a first global grid search, 30,000 parameter combinations are randomly drawn from uniform distributions with sufficiently wide support to encompass the entire parameter space. To narrow down the locations of local and global optima, we pick the 50 $\theta$-vectors corresponding to the greatest likelihood values. For each parameter, we store the minimum and maximum value of these 50 combinations, which gives us the range of the relevant parameter space. In the second step, these minimum and maximum values are employed as bounds to construct a finer parameter grid. After computing the likelihood of each grid point, we use the 50 grid points yielding the greatest likelihood as starting values for numerical optimization via the Nelder–Mead algorithm. The optimized parameters corresponding to the greatest log likelihood are then chosen as the final parameter estimate $\hat{\theta}$. Figure 2 outlines the parameter distributions as generated by these 50 optimization results.
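The first step of this search (random draws, keeping the best 50, and reading off narrowed bounds) can be sketched in a few lines of Python. This is an illustration only: `narrow_grid_search` and `toy_loglik` are hypothetical names, and the toy objective stands in for the Kim filter likelihood, which is not reproduced here.

```python
import random


def narrow_grid_search(loglik, bounds, n_draws=30000, n_keep=50, seed=0):
    """Draw parameters uniformly, keep the n_keep best, return narrowed bounds."""
    rng = random.Random(seed)
    draws = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_draws)]
    # Sort by likelihood, greatest first, and keep the top candidates
    best = sorted(draws, key=loglik, reverse=True)[:n_keep]
    # Per-parameter minimum and maximum over the retained candidates
    narrowed = [(min(th[k] for th in best), max(th[k] for th in best))
                for k in range(len(bounds))]
    return best, narrowed


def toy_loglik(theta):
    # Hypothetical stand-in objective with its maximum at (0.5, -1.0)
    return -((theta[0] - 0.5) ** 2 + (theta[1] + 1.0) ** 2)


best, narrowed = narrow_grid_search(
    toy_loglik, [(-5.0, 5.0), (-5.0, 5.0)], n_draws=5000)
```

The narrowed bounds would then define the finer grid of the second step, and the best grid points would serve as starting values for a Nelder–Mead optimizer (in Python, for instance, `scipy.optimize.minimize` with `method="Nelder-Mead"`).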
As all parameters of our model (1) to (5) feature a narrow distribution around their best estimate, they appear accurately identified.
Table 1 presents the estimation results and corresponding standard errors.
Based on the transition probabilities $\hat{p}$ and $\hat{q}$, it is straightforward to calculate the regime durations. The model estimates an expected duration of $(1 - \hat{p})^{-1} \approx 83$ days for the down-turning $S_t = 1$ regime and $(1 - \hat{q})^{-1} \approx 32$ days for the up-turning $S_t = 0$ state. The drift estimates $\hat{\nu}_1$ and $\hat{\nu}_0$ are interpreted as the average day-to-day change in the trend in log COVID-19 cases. Transforming back into levels gives an average daily decrease of $1 - \exp(\hat{\nu}_0 + \hat{\nu}_1) \approx 1.49\%$ for the down-turning regime on the one hand, and a daily increase of $\exp(\hat{\nu}_0) - 1 \approx 3.36\%$ in periods of the up-turning regime on the other. Whereas the COVID-19 infections double roughly every $\hat{\nu}_0^{-1} \log(2) \approx 21$ days in an up-turning state, approximately $(\hat{\nu}_0 + \hat{\nu}_1)^{-1} \log(1/2) \approx 46$ days are required to halve these case numbers again in a subsequent down-turning regime.
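The arithmetic behind these figures can be checked with a short Python sketch. The inputs below are not the Table 1 estimates; they are approximate values back-derived from the rounded durations and growth rates quoted in the text, and the function name `regime_summary` is an assumption.

```python
import math


def regime_summary(p_hat, q_hat, nu0_hat, nu_down_hat):
    """Durations and doubling/halving times implied by the regime parameters.

    nu_down_hat is the drift in the down-turning regime, i.e. nu0 + nu1.
    """
    return {
        "duration_down": 1.0 / (1.0 - p_hat),         # expected days in S_t = 1
        "duration_up": 1.0 / (1.0 - q_hat),           # expected days in S_t = 0
        "daily_increase": math.exp(nu0_hat) - 1.0,    # growth rate when up-turning
        "daily_decrease": 1.0 - math.exp(nu_down_hat),
        "doubling_days": math.log(2.0) / nu0_hat,
        "halving_days": math.log(0.5) / nu_down_hat,
    }


# Back-derived, approximate inputs (assumptions, not the published estimates)
summary = regime_summary(
    p_hat=0.988, q_hat=0.969,
    nu0_hat=math.log(1.0336), nu_down_hat=math.log(1.0 - 0.0149))
```

Rounded to whole days, these inputs reproduce the durations of about 83 and 32 days and the doubling and halving times of about 21 and 46 days.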
Executing the Kim filter and smoothing recursions yields estimates for the state vector. Figure 3 displays the smoothed trend $\hat{\mu}_t$ and regime probabilities $\widehat{\Pr}(S_t = 0 \mid y_T, \ldots, y_1)$, together with the noisy measurement of log COVID-19 infections.
As Figure 3 shows, the model allows for the identification of episodes of up- and down-turning regimes. In particular, it assigns a strong path dependence to the infection regimes, such that long episodes of containment are followed by new infection waves. The increasing proportion of seasonal distortions is attributed to the seasonal and short-run component, and thus does not deteriorate the smoothed trend nor the smoothed regime probabilities. It is only towards the end of the observational horizon that lateral dynamics, strong seasonal fluctuations, and the limited number of future observations complicate the separation of trend, seasonal, and cyclical dynamics. As a consequence, the smoothed trend and regime probabilities appear more erratic.
To illustrate the benefits of our UC model for identifying structural changes ex post, we select a simple threshold of the smoothed regime probability $\widehat{\Pr}(S_t = 0 \mid y_T, \ldots, y_1) > 40\%$ in order to indicate up-turning states. While we leave the choice of a policy rule up to the experts in the field, our arbitrarily chosen threshold is rather cautious (i.e., the probability threshold is set below $1/2$) and allows us to identify several infection waves that are shaded in Figure 3. A selected list of policy measures and events, as reported by the U.S. Centers for Disease Control and Prevention in proximity of the six identified infection waves, is presented below.6
  • 3 June 2020–10 July 2020
    10.04.: The U.S. is the country with the most reported COVID-19 cases and deaths worldwide.
    13.04.: Most states in the U.S. report widespread cases of COVID-19.
    13.06.: CDC releases consolidated COVID-19 testing guidelines.
    22.06.: The U.S. President extends the temporary suspension on new immigrant visas through the end of the year.
    30.06.: Dr. Anthony Fauci warns of new infections overwhelming the healthcare system.
    01.07.: The U.S. has more than 50 K new daily COVID-19 cases.
    14.07.: The CDC again calls on all people to wear cloth face masks when leaving their homes.
  • 6 October 2020–20 November 2020
    04.11.: New U.S. COVID-19 cases surpass 100 K in a day.
    10.11.: Total cases of COVID-19 in the U.S. surge past 10 M.
    13.11.: COVID-19 case numbers spike across the U.S. after Halloween celebrations.
    20.11.: The CDC recommends staying home for Thanksgiving and avoiding contact as case numbers surge.
  • 26 June 2021–23 August 2021
    27.07.: Amid a Delta variant surge, the CDC releases updated guidance recommending that everyone in areas with high transmission wear a mask.
    30.07.: The CDC releases data suggesting that vaccinated people infected with Delta can transmit the virus to others.
    23.08.: The FDA fully approves the Pfizer–BioNTech COVID-19 vaccine for all people ages 16 years and older.
  • 22 November 2021–14 January 2022
    26.11.: The WHO designates the COVID-19 Omicron variant as a “variant of concern”.
    20.12.: The CDC releases data estimating that the Omicron variant is around 1.6 times more transmissible than the Delta variant.
    27.12.: The CDC shortens the recommended isolation period for people with COVID-19 to five days.
  • 4 April 2022–25 May 2022
    13.04.: The Omicron subvariant BA.2 makes up more than 85% of all new COVID-19 infections in the U.S.
    18.04.: The CDC’s mask mandate for indoor public transportation is struck down in court.
    21.04.: The DHS extends the COVID-19 vaccine requirement for all noncitizens entering the U.S.
  • 28 November 2022–8 December 2022
    08.12.: The FDA authorizes bivalent COVID-19 vaccines for children as young as 6 months of age.
    15.12.: The Biden administration announces the COVID-19 Winter Preparedness plan.
In addition to an ex post analysis, we also propose the use of the prediction step of the Kim filter as an easy real-time monitoring device for the pandemic. If the one-step-ahead predicted probability of entering the up-turning regime exceeds the aforementioned threshold of 40%, policies may be triggered to quickly shorten the duration of the incoming up-turning regime.
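Both the ex post shading rule and the monitoring rule reduce to simple operations on regime probabilities, sketched below in Python. The one-step-ahead probability follows from the transition probabilities in (5); the function names and the example probabilities are illustrative assumptions, and the filtered probabilities themselves would come from the Kim filter, which is not implemented here.

```python
def predict_up_prob(filtered_up, p, q):
    """Pr(S_{t+1} = 0 | y_1..y_t) given the filtered Pr(S_t = 0 | y_1..y_t).

    Stay in regime 0 with probability q, or leave regime 1 with probability 1 - p.
    """
    return q * filtered_up + (1.0 - p) * (1.0 - filtered_up)


def flag_spans(probs, threshold=0.4):
    """Return (start, end) index pairs of consecutive periods with prob > threshold."""
    spans, start = [], None
    for t, pr in enumerate(probs):
        if pr > threshold and start is None:
            start = t                      # a flagged episode begins
        elif pr <= threshold and start is not None:
            spans.append((start, t - 1))   # the episode ended one period earlier
            start = None
    if start is not None:                  # episode still running at sample end
        spans.append((start, len(probs) - 1))
    return spans


# Hypothetical probabilities of the up-turning regime, for illustration only
example_spans = flag_spans([0.1, 0.5, 0.6, 0.2, 0.45])
```

Lowering the threshold below one half makes the rule flag up-turning episodes earlier, at the price of more false alarms.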
In the following, we evaluate the appropriateness of this monitoring application based on its past propensity for type I (false-positive) and type II (false-negative) errors, as well as on the time lag of filtered predictions relative to the smoothed estimates. To mimic the situation of a decision maker in real time, we start with 150 observations and estimate the parameters θ , as described at the beginning of this section. We then obtain one-step-ahead predictions for the conditional probability for the up-turning state from the Kim filter. Next, an additional observation is added to the sample, and the procedure repeats. To speed up the procedure, we update the parameter estimates only every two iterations and use the estimates from the previous iteration as starting values for the numerical optimization. Every 500 iterations, an additional grid search, as described at the beginning of this section, is performed to robustify the procedure.
Figure 4 compares the smoothed estimates and the filtered one-step-ahead regime probabilities. Time periods where the one-step-ahead prediction exceeds the 40% threshold are shaded.
As expected, the filtered regime probabilities are significantly more jagged and slightly lag the smoothed estimates due to their smaller information set. Nonetheless, all of the previously established up-turning periods that fall within the scope of the nowcasting exercise are identified by the monitoring device. A feature that underpins the rather conservative nature of the monitoring device is that it detects the onset of an up-turning regime almost in real time. However, this comes at the cost of a higher type I error, i.e., the detection of an up-turning regime that contradicts the smoothed estimates, e.g., at the end of 2021. During the summer of 2022, the monitoring device detected several additional up-turning regimes that are classified as down-turning ones based on the smoothed regime probabilities. As Figure 3 indicates, the smoothed trend is actually increasing during that period, although the magnitude of the increase is small compared with other up-turning periods.
Table 2 contrasts the infection waves, as identified by the smoothed regime probabilities (shaded periods in Figure 3), with those derived from the filtered one-step-ahead predictions (shaded periods in Figure 4).
Across the whole covered period, the nowcasting tool lags only slightly behind the smoothed estimates. These findings suggest that the model in Equations (1)–(5) is suitable as a real-time monitoring tool that provides an intuitive and concise picture of the state of the pandemic.

4. Robustness and Discussion

The daily observations of COVID-19 case numbers in Figure 1 display an increasing seasonal variation over time. The deterministic seasonality of the D.Seas.C. model in Equation (2) is not equipped to handle such dynamic behavior. As a result, some of the increasing seasonality may be captured by the estimated cyclical and trend component, which subsequently distorts the regime-switching behavior of the system. A straightforward extension of the model would therefore be to substitute (2) with a stochastic specification capable of assuming seasonal behavior with a progressively increasing variance. One process with such properties is a seasonal unit root (see Bauer and Wagner 2012 for details). It can be added to the state-space model by specifying the new seasonal component
$$\gamma_t^{UR} = -\sum_{j=1}^{6} \gamma_{t-j}^{UR} + x_t, \qquad (1 - L^7)\, x_t = \omega_t, \qquad \omega_t \sim \text{i.i.d. } N(0, \sigma_\omega^2). \tag{6}$$
Similar to the traditional UC literature, $\gamma_t^{UR}$ is set to be a stochastic process with expected average equal to zero (see e.g., Durbin and Koopman 2012, ch. 3.2.2). However, contrary to the deterministic seasonality in (2), the stochastic specification (6) introduces nonstationary dynamics at lag seven. By plugging in $x_t$ and rearranging, (6) can be expressed as $\sum_{j=0}^{6} \gamma_{t-j}^{UR} = \sum_{j=7}^{13} \gamma_{t-j}^{UR} + \omega_t$, and thus $(1 - L^7) \sum_{j=0}^{6} \gamma_{t-j}^{UR} = \omega_t$. Therefore, the unit root at lag seven links the current seasonal dynamics to those of the previous week, which generates a repeating seasonal pattern. In each period, a shock is added to $\sum_{j=0}^{6} \gamma_{t-j}^{UR}$, which yields an ever-increasing volatility over time. Figure 5 sketches 100 simulations of $\gamma_t^{UR}$ and illustrates the oscillating property and nonstationary nature of this seasonal component. Note that (6) differs from unit root seasonal components in the spirit of Harrison and Stevens (1976), which instead allow for time-variant seasonal dummies rather than generating an oscillating, diverging behavior.
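The rearrangement above can be verified numerically. The following Python sketch simulates (6) and lets one check that the sum of seven consecutive seasonal values equals $x_t$, so that weekly sums inherit the unit root of $x_t$. The function name, the zero pre-sample values, and the shock standard deviation are assumptions chosen for illustration.

```python
import random


def simulate_seasonal_ur(T, sigma_omega, seed=0):
    """Simulate gamma^UR from Equation (6), starting from zero pre-sample values."""
    rng = random.Random(seed)
    gamma = [0.0] * 6     # gamma_{t-1}, ..., gamma_{t-6}
    x_hist = [0.0] * 7    # x_{t-1}, ..., x_{t-7}
    gammas, xs = [], []
    for _ in range(T):
        # (1 - L^7) x_t = omega_t, i.e. x_t = x_{t-7} + omega_t
        x = x_hist[6] + rng.gauss(0.0, sigma_omega)
        # gamma_t = -(gamma_{t-1} + ... + gamma_{t-6}) + x_t
        g = -sum(gamma) + x
        gamma = [g] + gamma[:5]
        x_hist = [x] + x_hist[:6]
        gammas.append(g)
        xs.append(x)
    return gammas, xs


gammas, xs = simulate_seasonal_ur(T=700, sigma_omega=0.05)
# Sum over the current and the six preceding seasonal values, compared with x_t
max_gap = max(abs(sum(gammas[t - 6:t + 1]) - xs[t]) for t in range(6, len(gammas)))
```

Since each $x_t$ accumulates one shock per week, the dispersion of the weekly sums grows with the sample length, in line with the diverging paths described for Figure 5.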
Table 3 presents the estimation results for substituting the deterministic seasonal term (2) with the unit root specification (6), where the parameter estimates are obtained as before. Furthermore, we provide estimates for specifications that exclude the cyclical component (3) in favor of an unsystematic error term $\epsilon_t$ for both the deterministic and stochastic seasonality, respectively.
Another point of concern is the lateral behavior of infection numbers towards the end of the observational horizon. Ambiguous time periods that cannot be fully attributed to either the up-turning or the down-turning regime may lead to incoherent regime probabilities and thus bias the regime estimates at an earlier stage of the pandemic. As a potential remedy, we introduce a specification involving a third state $S_t = 2$, during which the drift term $\nu_t$ of the trend component in (4) is set to zero. Thus, the third state is intended to reflect periods that feature neither a strong upward nor downward trend in the reported case numbers. Table 3 provides estimates of the proposed extensions together with the preferred D.Seas.C. specification displayed in Section 3.
All information criteria clearly favor the more general specifications that include a seasonal unit root process over the D.Seas.C. model presented in Section 3. This is due to the seasonal unit root component, which better grasps the oscillating seasonal pattern along with its increasing volatility over time, as compared with the deterministic seasonal specification. The latter attributes the strong short-run fluctuations to the cyclical component. Since $c_t$ cannot adequately capture the strong seasonal patterns due to its stationary nature, the smoothed cyclical shocks appear to be autocorrelated, which reduces the likelihood and thus increases the information criteria.7 However, even though the observed series can be fitted more accurately overall when the seasonal component is generalized to include a seasonal unit root, it is striking that the trend and regime-switching characteristics vary little across specifications: as illustrated by Figure 6, the smoothed regime probabilities of the different models almost coincide, and only minimal differences between the smoothed trends can be spotted. Therefore, smoothed trend and regime probabilities appear robust to the specification of seasonal and short-run components.
The introduction of a third state, reflecting episodes of stable case numbers, is able to improve upon the fit of the two-state D.Seas.C. model, but this refinement is not sufficient to be favored by the BIC and HQ information criteria. In addition, the constraint $q > 90\%$ is binding, indicating that the model is not suitable to depict the regime path persistence that is inherent to the pandemic.8 Consequently, we choose the simpler D.Seas.C. model for the remainder of our analysis.
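How a better fit can still be rejected by BIC and HQ follows from their heavier penalty on the number of parameters. The Python sketch below uses the standard definitions of AIC, BIC, and the Hannan–Quinn criterion; the log likelihoods and parameter counts are hypothetical numbers chosen only to illustrate the mechanism and are not the values in Table 3.

```python
import math


def info_criteria(loglik, k, T):
    """Return (AIC, BIC, HQ) for log likelihood loglik, k parameters, T observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(T)
    hq = -2.0 * loglik + 2.0 * k * math.log(math.log(T))
    return aic, bic, hq


# Hypothetical inputs: the larger model gains 5 log-likelihood points for 4 extra parameters
two_state = info_criteria(loglik=-100.0, k=8, T=1005)
three_state = info_criteria(loglik=-95.0, k=12, T=1005)
```

With these made-up inputs, the AIC is lower for the larger model, whereas BIC and HQ are lower for the smaller one, because their per-parameter penalties at T = 1005 exceed the penalty of 2 used by the AIC.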
Additional consideration is given to the general estimation strategy. Numerical optimization of a likelihood function, as employed by the Kim filter, has been traditionally regarded as the standard technique for state-space models (Durbin and Koopman 2012, ch. 7). However, Bayesian approaches, especially MCMC techniques, are also a well-developed field in the literature (Frühwirth-Schnatter 2006; Kim and Kang 2019; Luginbuhl and de Vos 1999). In the Bayesian estimation of UC models, the state vector and additional parameters are drawn from a joint distribution, rather than treating the unknown parameters as fixed when maximizing the likelihood numerically. As a consequence, the Bayesian approach bears the advantage that the posteriors incorporate any underlying uncertainty regarding the additional parameters (Luginbuhl and de Vos 1999). Moreover, in order for the Kim filter to account for all possible regime permutations, $M^2$ individual predictions need to be produced in a single time period for a model with $M$ states. Since the path dependencies grow at a rate of $O(M^{2T})$, the Kim filter exploits regime probabilities to collapse $M^2$ state vector posteriors into $M$ posteriors at the end of each time period in order to remain computationally feasible (Kim and Nelson 2017, ch. 5.2). Although previous studies have shown that this approximation entails only a small effect on the final estimates, some bias may be introduced by using weighted averages to collapse the posteriors (Kim and Kang 2019). The Bayesian approach, on the other hand, relies on draws from the joint distribution of the state vector, regime probabilities, and additional parameters. The need for a similar approximation step in an effort to track all conditional distributions is eliminated. Inference only requires the convergence of the Markov chain (Frühwirth-Schnatter 2006, ch. 13).
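The collapsing step can be illustrated for a scalar state. The Python sketch below follows the probability-weighted averaging described in Kim (1994): the posteriors conditional on each regime pair are merged into one posterior per current regime, with a variance correction for the spread of the conditional means. The function name `kim_collapse` and the input numbers are illustrative assumptions; the actual filter works with state vectors and covariance matrices.

```python
def kim_collapse(means, variances, joint_probs):
    """Collapse M*M regime-pair posteriors into M posteriors (scalar state).

    means[i][j], variances[i][j]: posterior mean and variance given
        S_{t-1} = i and S_t = j.
    joint_probs[i][j]: Pr(S_{t-1} = i, S_t = j | y_1..y_t).
    """
    M = len(means)
    out_means, out_vars, out_probs = [], [], []
    for j in range(M):
        # Marginal probability of the current regime j
        pr_j = sum(joint_probs[i][j] for i in range(M))
        # Probability-weighted average of the conditional means
        m_j = sum(joint_probs[i][j] * means[i][j] for i in range(M)) / pr_j
        # Weighted variances plus the spread of the means around m_j
        v_j = sum(joint_probs[i][j] * (variances[i][j] + (m_j - means[i][j]) ** 2)
                  for i in range(M)) / pr_j
        out_means.append(m_j)
        out_vars.append(v_j)
        out_probs.append(pr_j)
    return out_means, out_vars, out_probs


# Hypothetical two-regime inputs, for illustration only
col_means, col_vars, col_probs = kim_collapse(
    means=[[1.0, 2.0], [3.0, 4.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
    joint_probs=[[0.25, 0.25], [0.25, 0.25]])
```

The collapsed posterior is a single distribution standing in for a mixture, which is the source of the approximation error mentioned above.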
To robustify our findings, we therefore also estimate the preferred D.Seas.C. model using a Gibbs sampling approach. Table 4 portrays the results. Furthermore, Figure 6 includes the averaged trend and regime probability posterior draws.
Comparing the Kim filter estimates in Table 1 with the Bayesian inference in Table 4, the cyclical dynamics almost perfectly match. However, when analyzing the trend component, the Gibbs sampler provides a lower estimate of the innovation standard deviation $\sigma_\xi$ while, in parallel, specifying the regimes as less persistent and more dissimilar. Therefore, more variance of the data-generating process is attributed to the regime-switching behavior, driving more frequent regime switches and a greater regime-dependent effect. Nevertheless, even with more volatile regime probability estimates, inference based on the Gibbs sampler exhibits only marginal differences, as Figure 6 reveals.
Regime-switching models are notoriously difficult to estimate. In particular, inference regarding the regime transitions is often based on only a small number of observed switches; thus, it presents a substantial challenge to the practitioner. Therefore, it is a sensible exercise to evaluate and test the performance of the specification at hand via a Monte Carlo simulation study. We simulate a sample path over 1000 time periods for 1000 iterations, where the parameter values as well as the length of the data-generating process are chosen to mimic the reported number of COVID-19 infections. The results of the Monte Carlo study are shown in Table 5.
Considering the confounding factors that complicate the estimation of regime-switching state-space models, the D.Seas.C. specification in (1) to (5) seems to accurately identify the data-generating process. Only the mean of the estimates of the additional drift parameter $\nu_1$ deviates from its true value, together with a very high standard deviation. The drift parameter $\nu_1$ materializes only during episodes where the down-turning regime $S_t = 1$ is turned on (see Equation (4)). As a consequence, in iterations where the randomly generated sample path contains only very few and brief periods of the down-turning regime, any parameter estimate will be imprecise and subject to a high variance. This is also evident from the median of point estimates in Table 5, which is robust against such outlier cases.
Regarding the robustness of the D.Seas.C. model, it can be inferred that a sufficient data history, as well as a sufficient exposure to both regimes, are required in order to accurately estimate a regime-switching model. Examining Figure 1, these conditions appear to be satisfied.

5. Conclusions

In this article, we employ a regime-switching UC model to decompose log daily COVID-19 infections and estimate the probabilities of alternating regimes of up- and down-turning case numbers over an extensive period, spanning from the 1st of April 2020 to the 25th of December 2022.
Our findings indicate that a regime-switching UC model is capable of capturing many characteristics of the COVID-19 pandemic that more inflexible approaches cannot absorb: a regime-dependent drift assumes persistent long-run dynamics; the weekly patterns of reported COVID-19 infections are modeled by a seasonal component; and a stationary autoregressive component captures short-run dynamics and measurement errors.
The results show that: (i) our approach is well-suited to assess ex post the severity and/or efficacy of structural changes, such as viral mutations, regulatory loosening and tightening, behavioral changes, and novel therapeutic approaches; (ii) the model can be applied as a policy tool to monitor the state of the pandemic by nowcasting the current propensity of either the up- or down-turning regime being switched on.
Remaining issues arising from highly inconsistent and erratic seasonality could be overcome by extending the model to allow for fractionally integrated components. In particular, this would allow for gradual adjustments of the trend to structural changes (see e.g., Hartl and Jucknewitz 2022). Another possible extension could be the use of time-inhomogeneous transition probabilities, as proposed in Kaufmann (2015). There, an exogenous variable drives the transition matrix over time, allowing for changes in the frequency of regime switches over the course of the pandemic. However, identifying a candidate exogenous variable is difficult and remains an open question for future research.

Author Contributions

Conceptualization, P.H. and T.H.; methodology, P.H. and T.H.; software, P.H.; validation, T.H.; writing—original draft preparation, P.H. and T.H.; writing—review and editing, P.H. and T.H.; visualization, P.H.; supervision, T.H.; funding acquisition, T.H. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the Institute for Employment Research, 90478 Nuremberg, Germany. Tobias Hartl gratefully acknowledges support through the project 356439312 financed by the German Research Foundation (DFG).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found in the GitHub COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, https://github.com/CSSEGISandData/COVID-19 (accessed on 14 March 2023).

Acknowledgments

We wish to thank the editor and the anonymous referees for helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AR: Autoregressive
EM: Expectation-Maximization
JH/CSSE: Johns Hopkins University Center for Systems Science and Engineering
RW: Random Walk
UC: Unobserved Components

Notes

1. Harvey (1989) refers to these models as structural time series models. To avoid confusion, the term UC model is used for any state-space model that specifies one or multiple time series as a function of latent components and assigns an interpretation to these components by imposing assumptions on their spectra.
2. Carvalho et al. (2021) and the U.S. Centers for Disease Control and Prevention (https://www.cdc.gov/museum/timeline/covid19.html, accessed on 14 March 2023) provide an exhaustive timeline regarding the development of the COVID-19 pandemic. Additional data on the number of performed tests, as well as on a wide range of other indicators, can be found at https://ourworldindata.org/coronavirus (accessed on 14 March 2023) (Ritchie et al. 2020).
3. It is reasonable to assume that the number of reported COVID-19 cases in the last week of the year is again biased downward due to the holiday period. We therefore omit the last week of 2022 and end the observational horizon of our analysis on the 25th of December.
4. Constraining the parameter space to $p, q > 90\%$ implies a maximum expected value of three regime switches per 30 days. This is a reasonable constraint given the realized dynamics of the COVID-19 pandemic and, furthermore, speeds up the parameter optimization.
5. Note that we estimate $\nu_0$ as part of the state vector. However, the state-space representation of the model in (1) to (5) is not uniquely defined. Different, albeit observationally identical, approaches are possible, e.g., estimating $\nu_0$ via ML, or constraining $\nu_1$ to be $> 0$ and flipping labels with $\nu_0$.
6. A more detailed and extensive overview can be found at https://www.cdc.gov/museum/timeline/covid19.html (accessed on 14 March 2023) as well as https://www.defense.gov/Spotlights/Coronavirus-DOD-Response/Timeline/ (accessed on 14 March 2023).
7. Another way to get rid of the strong seasonal pattern would be to take seven-day averages of the log case numbers before estimating the model, which would yield a smooth series and eliminate the seasonal variation.
8. Since the focus of the analysis lies on identifying coherent periods of up- or down-turning infection regimes, we do not constrain $P_{22}$ to be $> 90\%$.
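The bound stated in note 4 can be checked with a short calculation. The sketch below is illustrative only: it assumes a two-state chain evaluated at its stationary distribution, and the function name is hypothetical. With stay probabilities $q$ and $p$, the stationary probability of a switch on any given day is $\pi_0 (1 - q) + \pi_1 (1 - p)$, which stays below 0.1 whenever both $q$ and $p$ exceed 0.9.

```python
def expected_switches(q, p, days=30):
    """Expected number of regime switches over `days` periods.

    q = Pr(S_t = 0 | S_{t-1} = 0), p = Pr(S_t = 1 | S_{t-1} = 1).
    Assumption: the chain is in its stationary distribution.
    Requires q < 1 or p < 1, otherwise the stationary distribution is undefined.
    """
    pi0 = (1.0 - p) / ((1.0 - q) + (1.0 - p))  # stationary Pr(S_t = 0)
    switch_prob = pi0 * (1.0 - q) + (1.0 - pi0) * (1.0 - p)
    return days * switch_prob


# At the boundary q = p = 0.9, the expectation equals three switches per 30 days.
print(expected_switches(0.9, 0.9))
# Point estimates from Table 1 imply far fewer expected switches.
print(expected_switches(0.969, 0.988))
```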

References

  1. Bauer, Dietmar, and Martin Wagner. 2012. A state space canonical form for unit root processes. Econometric Theory 28: 1313–49.
  2. Bergman, Aviv, Yehonatan Sella, Peter Agre, and Arturo Casadevall. 2020. Oscillations in U.S. COVID-19 incidence and mortality data reflect diagnostic and reporting factors. mSystems 5: e00544–20.
  3. Carvalho, Thiago, Florian Krammer, and Akiko Iwasaki. 2021. The first 12 months of COVID-19: A timeline of immunological insights. Nature Reviews Immunology 21: 245–56.
  4. Degras, David, Chee-Ming Ting, and Hernando Ombao. 2022. Markov-switching state-space models with applications to neuroimaging. Computational Statistics & Data Analysis 174: 107525.
  5. Dolton, Peter. 2021. The statistical challenges of modelling COVID-19. National Institute Economic Review 257: 46–82.
  6. Dong, Ensheng, Hongru Du, and Lauren Gardner. 2020. An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases 20: 533–34.
  7. Doornik, Jurgen A., Jennifer L. Castle, and David F. Hendry. 2021. Modeling and forecasting the COVID-19 pandemic time-series data. Social Science Quarterly 102: 2070–87.
  8. Doornik, Jurgen A., Jennifer L. Castle, and David F. Hendry. 2022. Short-term forecasting of the Coronavirus pandemic. International Journal of Forecasting 38: 453–66.
  9. Durbin, James, and Siem Jan Koopman. 2012. Time Series Analysis by State Space Methods: Second Edition. Oxford: Oxford University Press.
  10. Fiscon, Giulia, Francesco Salvadore, Valerio Guarrasi, Anna Rosa Garbuglia, and Paola Paci. 2021. Assessing the impact of data-driven limitations on tracing and forecasting the outbreak dynamics of COVID-19. Computers in Biology and Medicine 135: 104657.
  11. Frühwirth-Schnatter, Sylvia. 2006. Finite Mixture and Markov Switching Models. New York: Springer.
  12. Hamilton, James D. 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57: 357–84.
  13. Harrison, Peter J., and C. F. Stevens. 1976. Bayesian forecasting. Journal of the Royal Statistical Society: Series B 38: 205–28.
  14. Hartl, Tobias, and Roland Jucknewitz. 2022. Approximate state space modelling of unobserved fractional components. Econometric Reviews 41: 75–98.
  15. Hartl, Tobias, Klaus Wälde, and Enzo Weber. 2020. Measuring the impact of the German public shutdown on the spread of COVID-19. Covid Economics: Vetted and Real-Time Papers 1: 25–32.
  16. Harvey, Andrew C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
  17. Hortaçsu, Ali, Jiarui Liu, and Timothy Schwieg. 2021. Estimating the fraction of unreported infections in epidemics with a known epicenter: An application to COVID-19. Journal of Econometrics 220: 106–29.
  18. Kaufmann, Sylvia. 2015. K-state switching models with time-varying transition distributions—Does loan growth signal stronger effects of variables on inflation? Journal of Econometrics 187: 82–94.
  19. Kim, Chang-Jin. 1994. Dynamic linear models with Markov-switching. Journal of Econometrics 60: 1–22.
  20. Kim, Chang-Jin, and Charles R. Nelson. 2017. State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications. Cambridge: MIT Press.
  21. Kim, Young Min, and Kyu Ho Kang. 2019. Likelihood inference for dynamic linear models with Markov switching parameters: On the efficiency of the Kim filter. Econometric Reviews 38: 1109–30.
  22. Lee, Sokbae, Yuan Liao, Myung Hwan Seo, and Youngki Shin. 2021. Sparse HP filter: Finding kinks in the COVID-19 contact rate. Journal of Econometrics 220: 158–80.
  23. Liu, Laura, Hyungsik Roger Moon, and Frank Schorfheide. 2021. Panel forecasts of country-level COVID-19 infections. Journal of Econometrics 220: 2–22.
  24. Luginbuhl, Rob, and Aart de Vos. 1999. Bayesian analysis of an unobserved-component time series model of GDP with Markov-switching and time-varying growths. Journal of Business & Economic Statistics 17: 456–65.
  25. Moosa, Imad A. 2020. The effectiveness of social distancing in containing COVID-19. Applied Economics 52: 6292–05.
  26. Navas Thorakkattle, Muhammed, Shazia Farhin, and Athar Ali Khan. 2022. Forecasting the trends of COVID-19 and causal impact of vaccines using Bayesian structural time series and ARIMA. Annals of Data Science 9: 1025–47.
  27. Ritchie, Hannah, Edouard Mathieu, Lucas Rodés-Guirao, Cameron Appel, Charlie Giattino, Esteban Ortiz-Ospina, Joe Hasell, Bobbie Macdonald, Diana Beltekian, and Max Roser. 2020. Coronavirus pandemic (COVID-19). Our World in Data. Available online: https://ourworldindata.org/coronavirus (accessed on 14 March 2023).
  28. Telenti, Amalio, Ann Arvin, Lawrence Corey, Davide Corti, Michael S. Diamond, Adolfo García-Sastre, Robert F. Garry, Edward C. Holmes, Phillip S. Pang, and Herbert W. Virgin. 2021. After the pandemic: Perspectives on the future trajectory of COVID-19. Nature 596: 495–504.
  29. Xie, Liming. 2022. The analysis and forecasting COVID-19 cases in the United States using Bayesian structural time series models. Biostatistics & Epidemiology 6: 1–15.
Figure 1. Daily U.S. COVID-19 infections $i_t$ (orange, right scale) and logarithm of daily infections $\log(i_t)$ (gray, left scale). The vertical line (gray, dashed) indicates the 1st of April 2020, the start of the observational period.
Figure 2. Parameter-specific kernel densities of the final-step grid search results. The respective best estimate is marked in red. Note that p and q are floored at 90% (see Section 2).
Figure 3. Smoothed trend estimates $\hat{\mu}_t$ (orange, left scale), smoothed regime probabilities $\widehat{\Pr}(S_t = 0 \mid y_T, \ldots, y_1)$ (blue, right scale), and log COVID-19 cases $\log(i_t)$ (gray, left scale). The smoothed trend is a probability-weighted average of the two regime-specific trend estimates. Infection waves (up-turning regime) as identified by $\widehat{\Pr}(S_t = 0 \mid y_T, \ldots, y_1) > 40\%$ are shaded.
Figure 4. Smoothed regime probabilities $\widehat{\Pr}(S_t = 0 \mid y_T, \ldots, y_1)$ (blue) and filtered regime probabilities $\widehat{\Pr}(S_t = 0 \mid y_{t-1}, \ldots, y_1)$ (orange) for the up-turning regime. Days on which the filtered predictions $\widehat{\Pr}(S_t = 0 \mid y_{t-1}, \ldots, y_1)$ exceed a threshold of 40% are shaded. Horizontal black bars denote infection waves based on the smoothed estimates from Figure 3.
Figure 5. One hundred simulated paths of the seasonal component $\gamma_t^{UR}$, as given in (6) (gray). The orange sample path depicts a single exemplary trajectory.
Figure 6. Smoothed trend $\hat{\mu}$ (dashed, left scale) and regime probability $\widehat{\Pr}(S_t = 0 \mid y_T, \ldots, y_1)$ (right scale) estimates for the preferred D.Seas.C. specification (orange, see Table 1), as well as for the seasonal unit root UR.Seas. model (gray, see Table 3) using the Kim filter. Averaged posterior draws of the trend and regime probabilities for the D.Seas.C. model as derived by the Gibbs sampler are shown in blue (see Table 4).
Table 1. Maximum likelihood estimation results for the D.Seas.C. specification.

| Parameter | Estimate | Standard Error |
|---|---|---|
| $\sigma_\xi$ | 0.073 | 0.008 |
| $\sigma_\eta$ | 0.409 | 0.010 |
| $\nu_0$ | 0.033 | 0.004 |
| $\nu_1$ | −0.048 | 0.010 |
| $\phi_1$ | 0.440 | 0.033 |
| $\phi_2$ | −0.270 | 0.032 |
| $q$ | 0.969 | 0.017 |
| $p$ | 0.988 | 0.010 |

Log L: −677.783; AIC: 1.379; BIC: 1.452; HQ: 1.407

Notes: The maximum likelihood estimates are based on the grid search, as described in Section 3. Standard errors are obtained from the inverted numerical Hessian matrix. Information criteria are adjusted to reflect the diffuse initialization of the state vector (Durbin and Koopman 2012, ch. 7.4).
Table 2. Periods of the up-turning $S_t = 0$ regime as identified by smoothed estimates and corresponding nowcasting one-step-ahead predictions.

| Infection Wave | Smoothed Estimates: Beginning | Smoothed Estimates: End | Filtered One-Step-Ahead Predictions: Beginning | Filtered One-Step-Ahead Predictions: End |
|---|---|---|---|---|
| 1 | 3 June 2020 | 10 July 2020 | - | - |
| 2 | 6 Oct 2020 | 20 Nov 2020 | 9 Oct 2020 | 10 Oct 2020 |
| | | | 12 Oct 2020 | 13 Oct 2020 |
| | | | 15 Oct 2020 | 17 Oct 2020 |
| | | | 19 Oct 2020 | 22 Nov 2020 |
| | | | 24 Nov 2020 | 25 Nov 2020 |
| 3 | 26 June 2021 | 23 Aug 2021 | 26 June 2021 | 27 June 2021 |
| | | | 3 July 2021 | 4 July 2021 |
| | | | 7 July 2021 | 9 Aug 2021 |
| 4 | 22 Nov 2021 | 14 Jan 2022 | 17 Nov 2021 | 24 Nov 2021 |
| | | | 27 Nov 2021 | 29 Nov 2021 |
| | | | 9 Dec 2021 | 28 Dec 2021 |
| | | | 31 Dec 2021 | 1 Jan 2022 |
| | | | 6 Jan 2022 | 15 Jan 2022 |
| 5 | 4 Apr 2022 | 25 May 2022 | 9 Apr 2022 | 10 Apr 2022 |
| | | | 9 Apr 2022 | 10 Apr 2022 |
| | | | 14 Apr 2022 | 15 Apr 2022 |
| | | | 21 Apr 2022 | 8 May 2022 |
| | | | 10 May 2022 | 22 May 2022 |
| | | | 24 May 2022 | 28 May 2022 |
| 6 | 28 Nov 2022 | 8 Dec 2022 | 1 Dec 2022 | 3 Dec 2022 |
| | | | 6 Dec 2022 | 10 Dec 2022 |

Notes: False-positive periods of the up-turning regime are omitted (see Figure 4). The first infection wave is not covered due to an initialization period of 150 days for the nowcasting application.
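The dating rule behind Table 2 can be illustrated with a short sketch: consecutive days on which the probability of the up-turning regime exceeds 40% are grouped into one period. The helper names below are hypothetical, the example probabilities are made up for illustration, and the prediction step is the standard one-step-ahead forecast of a two-state Markov chain, which is not necessarily identical to the authors' implementation.

```python
def predict_up_probability(filtered_up_prev, q, p):
    """One-step-ahead Pr(S_t = 0 | y_{t-1}, ..., y_1).

    filtered_up_prev is Pr(S_{t-1} = 0 | y_{t-1}, ..., y_1),
    q = Pr(S_t = 0 | S_{t-1} = 0), p = Pr(S_t = 1 | S_{t-1} = 1).
    """
    return q * filtered_up_prev + (1.0 - p) * (1.0 - filtered_up_prev)


def spells_above(probabilities, threshold=0.40):
    """Return (start, end) index pairs of consecutive runs with probability > threshold."""
    spells = []
    start = None
    for i, prob in enumerate(probabilities):
        if prob > threshold and start is None:
            start = i  # a new spell begins
        elif prob <= threshold and start is not None:
            spells.append((start, i - 1))  # the spell ended on the previous day
            start = None
    if start is not None:  # a spell is still open at the end of the sample
        spells.append((start, len(probabilities) - 1))
    return spells


# Made-up probabilities for eight consecutive days (illustration only).
example = [0.10, 0.50, 0.60, 0.20, 0.45, 0.30, 0.90, 0.95]
print(spells_above(example))  # [(1, 2), (4, 4), (6, 7)]
```

Short isolated spells such as the single-day run in the example mirror the fragmented filtered periods in Table 2, where real-time predictions cross the threshold several times within one smoothed infection wave.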
Table 3. Maximum likelihood parameter estimates for different model specifications.

| | D.Seas.C. | D.Seas. | UR.Seas.C. | UR.Seas. | D.Seas. 3 St. |
|---|---|---|---|---|---|
| $\sigma_\xi$ | 0.073 (0.008) | 0.081 (0.010) | 0.075 (0.006) | 0.075 (0.006) | 0.042 (0.012) |
| $\sigma_\omega$ | - | - | 0.005 (0.001) | 0.005 (0.001) | - |
| $\sigma_\epsilon$ | - | 0.445 (0.011) | - | 0.188 (0.006) | - |
| $\sigma_\eta$ | 0.409 (0.010) | - | 0.188 (0.006) | - | 0.322 (0.008) |
| $\nu_0$ | 0.033 (0.004) | 0.034 (0.004) | 0.004 (0.003) | 0.042 (0.004) | 0.035 (0.006) |
| $\nu_1$ | −0.048 (0.010) | −0.047 (0.012) | −0.055 (0.013) | −0.055 (0.013) | −0.257 (0.027) |
| $\phi_1$ | 0.440 (0.033) | - | 0.007 (0.767) | - | 0.272 (0.038) |
| $\phi_2$ | −0.270 (0.032) | - | 0 (0.009) | - | −0.185 (0.033) |
| $q$, $P_{00}$ | 0.969 (0.017) | 0.971 (0.017) | 0.972 (0.014) | 0.973 (0.013) | 0.900 (0.001) |
| $P_{01}$ | - | - | - | - | 0.092 (0.019) |
| $P_{10}$ | - | - | - | - | 0 (0.001) |
| $p$, $P_{11}$ | 0.988 (0.010) | 0.990 (0.008) | 0.976 (0.007) | 0.991 (0.007) | 0.947 (0.019) |
| $P_{20}$ | - | - | - | - | 0.018 (0.010) |
| $P_{21}$ | - | - | - | - | 0.007 (0.003) |
| Log L | −677.783 | −768.670 | −157.759 | −161.242 | −662.714 |
| AIC | 1.379 | 1.552 | 0.348 | 0.347 | 1.376 |
| BIC | 1.452 | 1.605 | 0.431 | 0.410 | 1.518 |
| HQ | 1.407 | 1.572 | 0.379 | 0.371 | 1.430 |

Notes: The maximum likelihood estimates are derived via the grid search proposed in Section 3. Standard errors are reported in parentheses. The columns refer to model specifications with deterministic seasonality including (D.Seas.C.) and excluding (D.Seas.) the AR(2) cyclical component (3); models accounting for a seasonal unit root process with (UR.Seas.C.) and without (UR.Seas.) the cyclical component; and a specification with a third state reflecting time periods of neither falling nor rising infection numbers (D.Seas. 3 St.).
Table 4. Gibbs sampling posterior draws for the preferred D.Seas.C. model specification.

| | Mean | Std.Dev | Median | 2.5% | 97.5% |
|---|---|---|---|---|---|
| $\sigma_\xi$ | 0.033 | 0.009 | 0.031 | 0.023 | 0.060 |
| $\sigma_\eta$ | 0.424 | 0.010 | 0.424 | 0.404 | 0.445 |
| $\nu_0$ | 0.065 | 0.037 | 0.054 | 0.025 | 0.174 |
| $\nu_1$ | −0.089 | 0.030 | −0.080 | −0.178 | −0.058 |
| $\phi_1$ | 0.433 | 0.035 | 0.433 | 0.364 | 0.500 |
| $\phi_2$ | −0.254 | 0.034 | −0.254 | −0.319 | −0.186 |
| $q$ | 0.902 | 0.002 | 0.902 | 0.900 | 0.908 |
| $p$ | 0.911 | 0.015 | 0.904 | 0.900 | 0.955 |

Notes: The estimation result refers to the preferred model in Equations (1)–(5). The inference is based on 30,000 draws after an initial burn-in period of 5000 iterations. For robustness, we only store and evaluate every third draw. As in the frequentist estimation technique, the transition probabilities are constrained to be >90%, and $\nu_1$ is bounded to be negative. Apart from the transition probabilities and the trend estimate, all priors are set diffusely. The trend is initialized with the number of COVID-19 infections one day prior to the start of the analysis. The initial transition probabilities are set to 0.95. Splitting the final chain of stored parameter draws in the middle yields a Gelman–Rubin criterion of 1.014, indicating convergence.
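The post-processing described in the notes to Table 4 (discarding the burn-in, keeping every third draw, and computing a split-chain Gelman–Rubin criterion) can be sketched as follows. This is a minimal illustration using the textbook potential scale reduction factor for two half-chains; the function names are hypothetical, the draws are simulated placeholders, and the authors' exact computation may differ.

```python
import math
import random
import statistics


def thin_after_burn_in(draws, burn_in=5000, step=3):
    """Drop the first `burn_in` draws and keep every `step`-th draw afterwards."""
    return draws[burn_in::step]


def split_gelman_rubin(chain):
    """Gelman-Rubin criterion from splitting a single chain into two halves."""
    n = len(chain) // 2
    halves = [chain[:n], chain[n:2 * n]]  # drops the last draw if the length is odd
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean(statistics.variance(h) for h in halves)
    between = n * statistics.variance(means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Placeholder draws: independent standard normal values stand in for a sampler's output.
rng = random.Random(0)
raw_draws = [rng.gauss(0.0, 1.0) for _ in range(35000)]
stored = thin_after_burn_in(raw_draws)
print(len(stored), split_gelman_rubin(stored))
```

Values close to 1 indicate that the two halves of the chain have similar means relative to their within-half variability, which is the sense in which the reported value of 1.014 is read as evidence of convergence.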
Table 5. Results from 1000 Monte Carlo simulations.

| True value ($T = 1000$) | Mean | Std.Dev | Median | 95% CI |
|---|---|---|---|---|
| $\sigma_\xi = 0.050$ | 0.046 | 0.025 | 0.044 | [0.045; 0.048] |
| $\sigma_\eta = 0.500$ | 0.499 | 0.013 | 0.499 | [0.499; 0.500] |
| $\nu_0 = 0.040$ | 0.037 | 0.035 | 0.040 | [0.036; 0.039] |
| $\nu_1 = -0.060$ | −0.357 | 8.903 | −0.060 | [−0.819; 0.104] |
| $\phi_1 = 0.500$ | 0.497 | 0.036 | 0.497 | [0.495; 0.499] |
| $\phi_2 = -0.200$ | −0.194 | 0.036 | −0.194 | [−0.196; −0.192] |
| $q = 0.970$ | 0.973 | 0.018 | 0.976 | [0.972; 0.974] |
| $p = 0.990$ | 0.986 | 0.015 | 0.990 | [0.986; 0.987] |

Notes: For each of the 1000 iterations, we simulate a sample path of 1000 time periods that corresponds to the D.Seas.C. model specification. Confidence intervals are asymptotic. Parameter values are chosen so as to resemble the data-generating process of the COVID-19 pandemic. Estimates are derived via the Kim filter.
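A sample path of the kind described in the notes to Table 5 could be generated along the following lines. The sketch is an illustration only and relies on a stylized reading of the specification in (1) to (5): the observation is assumed to be the sum of a random-walk trend with regime-dependent drift $\nu_0 + \nu_1 S_t$, a deterministic weekly pattern, and an AR(2) cycle. The weekday effects, the starting level, the initial regime, and all function names are hypothetical; the remaining parameter values are the true values listed in Table 5.

```python
import random


def simulate_dseasc(T=1000, sigma_xi=0.05, sigma_eta=0.5, nu0=0.04, nu1=-0.06,
                    phi1=0.5, phi2=-0.2, q=0.97, p=0.99,
                    weekday_effects=(0.10, 0.10, 0.05, 0.0, 0.0, -0.10, -0.15),
                    mu_start=10.0, seed=1):
    """Simulate y_t = mu_t + gamma_t + c_t under the assumptions stated above.

    weekday_effects (hypothetical values, summing to zero) define gamma_t;
    mu_start is a hypothetical initial trend level; the chain starts in regime 0.
    """
    rng = random.Random(seed)
    state = 0
    mu = mu_start
    c_prev, c_prev2 = 0.0, 0.0
    y, regimes = [], []
    for t in range(T):
        # Regime transition: stay with probability q (regime 0) or p (regime 1).
        stay = q if state == 0 else p
        if rng.random() >= stay:
            state = 1 - state
        # Trend: random walk with drift nu0, plus nu1 in the down-turning regime.
        mu = mu + nu0 + nu1 * state + rng.gauss(0.0, sigma_xi)
        # Cycle: stationary AR(2) with innovation standard deviation sigma_eta.
        c = phi1 * c_prev + phi2 * c_prev2 + rng.gauss(0.0, sigma_eta)
        c_prev2, c_prev = c_prev, c
        gamma = weekday_effects[t % 7]
        y.append(mu + gamma + c)
        regimes.append(state)
    return y, regimes


y, regimes = simulate_dseasc()
print(len(y), sum(regimes) / len(regimes))
```

Repeating the simulation with different seeds and estimating the model on each path would reproduce the structure of the Monte Carlo exercise, although the estimation step via the Kim filter is not shown here.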

Share and Cite

MDPI and ACS Style

Haimerl, P.; Hartl, T. Modeling COVID-19 Infection Rates by Regime-Switching Unobserved Components Models. Econometrics 2023, 11, 10. https://doi.org/10.3390/econometrics11020010
