A Mechanistic Model for Long COVID Dynamics

Long COVID, a long-lasting disorder following an acute infection of COVID-19, represents a significant public health burden at present. In this paper, we propose a new mechanistic model based on differential equations to investigate the population dynamics of long COVID. By connecting long COVID with acute infection at the population level, our modeling framework emphasizes the interplay between COVID-19 transmission, vaccination, and long COVID dynamics. We conducted a detailed mathematical analysis of the model. We also validated the model using numerical simulation with real data from the US state of Tennessee and the UK.


Introduction
A serious consequence following infection with SARS-CoV-2 is the potential occurrence of a long-lasting disorder known as post-acute sequelae of COVID-19 (or, long COVID) [1,2].The cause of long COVID is unclear at present, but possible contributors include persistent reservoirs of SARS-CoV-2 in certain tissues and their continued interactions with host microbiome communities, injuries to one or more organs from the acute infection, ongoing activities of primed immune cells and other immune dysregulations, clotting/coagulation issues, and dysfunctional vagus nerve signals [3][4][5].
A number of studies have highlighted the persistence and complications of the long COVID syndrome.For example, a cohort study followed 147 Italian patients who were hospitalized for COVID-19 and found that 87% still had symptoms 60 days after they were discharged from the hospital [6].Another study [7] found that 76% of hospitalized COVID-19 patients (or, 1265 out of 1655) in Wuhan, China, were still experiencing symptoms 6 months after their infection.In the United States, CDC estimated that up to a third of COVID-19 cases may result in long-term symptoms [8], indicating that tens of millions of Americans diagnosed with COVID-19 may endure long-lasting consequences of the illness.The situation is further complicated by the fact that individuals who are infected with only mild symptoms, and those who are fully vaccinated but later develop breakthrough infections, can become COVID long-haulers [9][10][11].
In order to assess the impact of long COVID on the healthcare system and to design effective strategies for resource distribution, it is critical to quantify and predict the burden of long COVID at the population level.However, several systematic reviews and metaanalyses found that COVID long-haulers exhibited a wide variety of symptoms and that the long COVID rates are highly heterogeneous across different populations, ranging from 10% to 67% [11][12][13][14][15].Such a heterogeneous pattern indicates that no single formula can be applied to every population to quantify the burden of long COVID.Instead, the evaluation and prediction of long COVID prevalence have to take into account the specific population characteristics and the epidemic/pandemic trends.
Mathematical models can overcome such challenges and help us to better quantify the population dynamics associated with long COVID.However, in contrast to the large number of mathematical, statistical, and computational models already published for COVID-19 transmission and spread (see reviews in [16][17][18][19]), very few modeling studies have been devoted to long COVID.The currently available models for long COVID are mainly based on machine learning and statistical techniques, with a focus on identifying individuals at risk of long COVID [20][21][22][23][24][25].Their findings, though very useful, do not improve the understanding of the population dynamics of long COVID.Moreover, none of these studies have employed mechanistic models.
In this work, we propose a novel mechanistic model based on differential equations to investigate the dynamics of long COVID and its population-level prevalence.Since long COVID stems from acute infection, we will link the population dynamics of long COVID with the transmission dynamics of COVID-19.In addition, since COVID-19 vaccines have played an important role in fighting the virus, we will also incorporate the impact of vaccination in our model.Our goal is to accurately predict the prevalence of long COVID in a given population with the available data.We will implement the model in real-world applications using data from both the US and the UK.
The remainder of this paper is organized as follows.We present the formulation of our mechanistic model in Section 2 and perform a detailed mathematical analysis in Section 3. We then present data fitting and numerical simulation in Section 4. Finally, we conclude our paper with some discussion in Section 5.

Model Formulation
We used differential equations to construct our mathematical model and investigated the population dynamics of long COVID.We considered a model that involves five compartments representing the number of the following individuals: the susceptible individuals (denoted by S), the vaccinated individuals (denoted by V ), the infected and infectious individuals with short-term symptoms (denoted by I), the individuals with longterm symptoms; i.e., COVID long-haulers (denoted by L), and the recovered individuals (denoted by R).Based on numerous clinical studies, the duration of infectiousness of SARS-CoV-2 is always limited to a relatively short period [26].We, thus, assume that individuals with long COVID are not contagious.
The model is described by the following differential equations: A flow chart for this model is shown in Figure 1.The parameter Λ is the population influx rate, β is the transmission rate, μ is the natural death rate for the human hosts, and ϕ is the vaccination rate.Susceptible individuals become infected by contacting infectious individuals.In addition, susceptible individuals are vaccinated at the rate ϕ.We assume that a portion of θ 0 < θ < 1 in vaccinated individuals are at risk for breakthrough infections, that is, the degree of protection for the vaccines is 1 − θ.Infected individuals exit the acute infection period at the rate γ; among these, a portion ρ will develop long COVID and enter the L compartment, whereas the other portion 1 − ρ will truly recover from the disease and enter the R compartment.Since we are concerned with a relatively long time period, individuals in all these compartments are subject to natural mortality at an average rate μ.In addition, the parameter γ L is the rate of recovery for COVID long-haulers, and ω and ω L are the disease-induced death rates in the acute infection and long COVID states, respectively.

Mathematical Analysis
System (1) has a feasible domain It can be easily verified that Ω is positively invariant for the vector field of (1).It is also straightforward to obtain the following result: Theorem 1. System (1) has a unique disease-free equilibrium (DFE) in the domain Ω X 0 = S 0 , V 0 , I 0 , L 0 , R 0 = Λ ϕ + μ , ϕΛ μ ϕ + μ , 0,0, 0 . ( The basic reproduction number of our model can be derived by the standard next-generation matrix technique [27], with the new infection matrix F and the transition matrix V computed as The spectral radius of the next-generation matrix F V −1 then gives the basic reproduction number of system (1): Note that the term related to ϕθ in Equation ( 4) represents the risk associated with the breakthrough infection.
Theorem 2. System (1) has a positive endemic equilibrium in Ω if and only if ℛ 0 > 1.The endemic equilibrium is unique when it exists.
Proof.At a nontrivial equilibrium of system (1), we have Now, we write each equation as a function of I. From ( 6)-( 8), we obtain V = ϕS θβI + μ , and respectively.Substituting (11) into (12), we obtain Similarly, substitution of ( 11) into (13) yields Equating ( 14) and ( 15), we obtain an equation in terms of a single variable I: where the function f is defined as The function f I is strictly decreasing for I > 0, and f I 0 when I ∞.Hence, Equation ( 16) has a positive solution at I = I ˆ> 0 if and only if Through simple algebraic manipulation, it can be easily seen that Equation ( 18) is equivalent to ℛ 0 > 1.Clearly, the positive solution I ˆ is unique when ℛ 0 > 1, due to the monotonicity of the function f.
Consequently, S, V , L, and R at the positive equilibrium can all be uniquely determined from I ˆ based on the equations presented above, and this positive equilibrium clearly belongs to Ω.
Hence, there is a unique endemic equilibrium X ˆ= S ˆ, V ˆ, I ˆ, L ˆ, R ˆ for system (1) if and only if From [27], we know that the DFE X 0 is locally asymptotically stable when ℛ 0 < 1 and unstable when ℛ 0 > 1.The result below establishes the local stability for the endemic equilibrium.
We provide the proof of Theorem 3 in Appendix A. In what follows, we focus on the global stability properties of the DFE and the endemic equilibrium.
Proof.We first consider the case ℛ 0 > 1, where the endemic equilibrium X ˆ= S ˆ, V ˆ, I ˆ, L ˆ, R êxists and is unique.We aim to establish that all the orbits of S, V , I approach S ˆ, V ˆ, I ˆ.To that end, we introduce the following Lyapunov function [28]: Taking the derivative of ℒ along the solution of system (1) gives us where the dot notation is used for the time derivatives of S, V , and I.We will show that dℒ dt ≤ 0. To do so, we will manipulate the three parts in Equation ( 20) separately.For convenience, we will denote these terms by ( 21)- (23).Our first term is equal to Similarly, the second term in ( 20) is Now, the last term of ( 20) is From here, we will begin the process of canceling out many terms and showing that all that remains are nonpositive terms.We can rewrite the following terms as γ + ω + μ I ˆ= βI ˆS ˆ+ θV from (8).Therefore, Similarly, from (7), the terms in ( 22) can be rewritten as giving us Note that, in all of the terms of ( 21)-( 23), there are common factors of β, μ, ϕ.We will simplify dℒ dt by first summing all of the terms from ( 21)-( 23 Many terms can immediately be canceled.Doing so gives For terms that share a common factor of ϕ, This leaves us with one remaining term with a common factor of μ: Now, by combining all of the three terms, we are left with all of which are nonpositive terms since the arithmetic mean is greater than or equal to the geometric mean.Hence, dℒ dt ≤ 0 and the equality holds if and only if S, V , I = S ˆ, V ˆ, I ˆ, which shows that S, V , I S ˆ, V ˆ, I ˆ for all the solution orbits.Consequently, letting I I ˆ, we immediately obtain L L ˆ and R R ˆ from the last two equations of system (1).This establishes the global asymptotic stability of the endemic equilibrium X ˆ when ℛ 0 > 1.
When ℛ 0 < 1, the DFE X 0 = S 0 , V 0 , I 0 , L 0 , R 0 is the only equilibrium of system (1).We consider the Lyapunov function which yields along the solution of system (1).It can be easily observed that the algebraic manipulations we performed previously for S ˆ, V ˆ, I ˆ will still hold for S 0 , V 0 , 0 and all the terms associated with I ˆ will disappear.We thus obtain With similar arguments as before, we can establish the global asymptotic stability of the DFE X 0 when ℛ 0 < 1. □ These theoretical results concerning the different types of equilibria and their stability properties, with a sharp threshold at ℛ 0 = 1, are common in many epidemiological models [27,29].Nevertheless, an important message from Theorems 2-4 is that long COVID may persist in the host population in the long run, unless COVID-19 is completely eradicated.
In the next section, we will utilize a numerical simulation to implement our model for realworld applications.We fitted and simulated the model using reported data from relatively short time periods to gain insights for the population-level progression of long COVID.

Numerical Simulation
We conducted numerical simulation with real data to validate our model.In contrast to the large amount of surveillance data for COVID-19, time series data for long COVID at the population level are very rare at present.We performed two simulation studies.The first one was for the state of Tennessee in the US, where detailed data for COVID-19 cases, deaths, and vaccination coverage are available [30], but there are no population-level data available for long COVID.The second study was concerned with the UK which, as an exception, published monthly data for long COVID prevalence in the UK population through its Office for National Statistics (ONS) [31].

Simulation for the Tennessee State in the US
We first applied our model to the COVID-19 data for the US state of Tennessee for the 4-month period between 31 August 2021 and 30 December 2021.An estimation given by the US Census Bureau of the total population in Tennessee on 1 July 2021 gave us a value of N = 6,975,218 [32].Our time period was relatively short, so we assumed that immigration and emigration were equal, and that the natural birth rate was equal to the natural death rate, μ.We defined the natural death rate as the inverse of the life expectancy, which in 2019, before COVID-19 started in Tennessee, was 75.6 years [33].We were then able to define the population influx rate as the product of the natural birth rate and the total population: Λ = μN.We defined γ as the recovery rate for acute infections, calculated as the reciprocal of the acute infection period reported in [34].For ω, i.e., the disease-induced death rate for acute infections, we used the estimate provided in [35].The breakthrough infection ratio, θ, ranged from 1% to 20% in the US [36], and we took an average θ = 10% in this study.The values for these parameters can be found in Table 1.
In our model, the susceptible, vaccinated, and infected compartments form a system that is not dependent on long COVID cases or the recovered compartment.So next, we estimated the transmission rate β and the vaccination rate ϕ by fitting a simplified model consisting of only the S, V , and I compartments to the COVID-19 infection and vaccination data reported on a daily basis by the Tennessee Department of Health [30].The initial condition for this simulation was set as S 0 , V 0 , I 0 = 3,023,763, 2,888,057, 46,260 , based on the reported data for the number of fully vaccinated individuals and infected individuals in Tennessee on 31 August 2021.The fitted parameter values are given in Table 2 and the result of the data fitting for the number of cumulative cases is shown graphically in Figure 2.
Based on our data fitting results, which included the number of active infections I in particular, we proceeded to conduct a numerical simulation for the possible prevalence levels of long COVID in Tennessee using the equation for the L compartment in system (1).We assumed that the mortality rate caused by long COVID was much lower than that caused by the acute infection, with ω L = 0.1ω, and that the average recovery period for long COVID was 90 days, with γ L = 1/90 per day.We then picked three different values for ρ, i.e., the portion of infected individuals who went on to develop long COVID, with ρ = 10%, 20%, and 30%.Figure 3 displays the simulation curves for the number of active long COVID cases with the three values of ρ.The highest and lowest points on each curve are marked by a square and a circle, respectively.We observed that the peak of L ranged from about 2.5 × 10 4 (when ρ = 10%) to almost 8 × 10 4 (when ρ = 30%).Even with the minimal estimate of ρ = 10%, the lowest point on the simulation curve was L ≈ 1.5 × 10 4 , indicating a substantial public health burden caused by long COVID.
Because long COVID data are not available for the Tennessee population, we were not able to fit the parameters relevant to long COVID dynamics and predict the progression of long COVID in Tennessee.Nevertheless, our simulation results provide possible ranges for the prevalence of long COVID in this population, which could inform the public health administration in their design of control and intervention strategies.

Simulation for the UK
In addition to the regular, daily reported data for COVID-19 infections, the UK Office for National Statistics (ONS) has published survey data for the prevalence of long COVID in the UK population on a monthly basis [31].The time period covered by the long COVID data starts from 6 February 2021.The ONS data allowed us to fit our model to the number of long COVID cases.We focused our attention on fitting the L compartment in system (1), where the variable I involved in the L equation was determined by the reported number of active COVID-19 cases.We set the time from 6 February 2021 to 4 July 2022, a period about 17 months, for model fitting, and the time from 5 July 2022 to 6 December 2022, a period about 5 months, for model testing.
The values of the parameters for the UK simulation study are listed in Table 3.There were five parameters, i.e., γ, ρ, γ L , ω L , and μ, that were involved in the L equation in system (1).
We also conducted a sensitivity analysis for these five parameters in terms of the variable L, using the method from [37] for computing relative sensitivities.The results are presented in Figure 4. We observed that ρ, γ L , and γ were the three most sensitive parameters, while ω L and μ had very low sensitivities.This indicates that μ and ω L would have very little impact on the long COVID prevalence L. We calculated the natural death rate μ from the demographic information of the UK population [38].We took the long COVID-induced death rate ω L = 0.0012 per day, i.e., the same value used in the Tennessee simulation.We additionally noted that the recovery rate γ was determined by the characteristics of the acute infection.We thus set the (average) recovery period from the acute infection as γ −1 = 10 days, based on reported values for COVID-19 in the UK [39].
We proceeded to use data fitting to estimate the two key parameters ρ and γ L associated with the long COVID prevalence.Specifically, we fitted the L equation in our model to the monthly reported long COVID data over the 17-month fitting period.Next, we conducted a numerical simulation to generate a prediction for the 5-month testing period using the parameter values estimated from the fitting period.
Table 4 lists the fitted parameter values.Figure 5 displays the fitting and prediction curves for the number of active long COVID cases in the UK, compared with the ONS data.The vertical dashed line in the figure separates the fitting and prediction periods.
Our fitting result for ρ shows that about 31.6% of the infected individuals in the UK went on to develop long COVID.The fitted value for the recovery rate of long COVID was γ L ≈ 0.0112 per day, indicating that long COVID would last about 1/γ L ≈ 89.3 days on average in the UK population.This is indeed very close to the assumed value of 90 days in our simulation study for the Tennessee data.These numbers, together with the predictive capability of the model, can provide useful quantitative information to assess the burden of long COVID in the UK and to guide relevant policy developments and resource allocation.

Discussion
As stated in a recent review article, "measuring COVID-19 morbidity is an immediate priority in this pandemic" [40].Long COVID contributes substantially to the overall COVID-19 morbidity, and quantifying the burden of long COVID at the population level is important for public health planning and policy making.In this pilot study, we present a new mechanistic model based on differential equations to investigate the population dynamics of long COVID.Our model emphasizes the interaction between COVID-19 transmission, vaccination, and long COVID dynamics.A detailed mathematical analysis was conducted and the main dynamical properties of the model were completely resolved.Furthermore, numerical simulation was carried out with real data from the US state of Tennessee and the UK to validate this modeling framework.
The two simulation studies demonstrate the utility of our modeling framework.For a place such as the US state of Tennessee, where long COVID data are not currently available, our model was able to generate useful information for the range of the long COVID prevalence at the population level.For a place such as the UK, which has published long COVID data, our model can be used to fit the prevalence and predict the future progression of long COVID.It is our hope that more population-level long COVID data will be reported in the near future, which will enable wider applications of our mechanistic model.
This work contributes to the quantitative and predictive studies of long COVID, which are emerging but still in the initial stage at present.Our findings add quantitative knowledge for the transmission of COVID-19, the progression from acute infection to long-lasting disorder, and the population-level prevalence and burden of long COVID.These results can be used to provide helpful guidelines for the public health administration to engage in science-based long COVID management and resource allocation, and for healthcare providers to target early intervention strategies and facilitate the timely recovery of long COVID patients.
The current model can be extended in several directions.For example, we may refine the model structure by adding another compartment to represent hospitalized COVID-19 patients who typically exhibit severe illness from acute infection and who are more likely to develop long COVID [6,7].In addition, there are a number of other factors associated with COVID-19 patients that have been linked to an increased risk of long COVID, including the presence of underlying health conditions, the occurrence of multiple acute infection symptoms, old age, and a high body mass index [5,25,41].Such factors and related clinical data may be used to improve our modeling framework.
We know that both λ + γ L + ω L + μ and λ + μ have negative eigenvalues.Therefore, to ensure asymptotic stability, we only need to prove x, y, z > 0 and xy > z to satisfy the Routh-Hurwitz criteria.
We first seek to show that x > 0.
This is surely positive except, potentially, for the last term.However, since we are observing this at the endemic equilibrium point X ˆ,( 6)- (10) still hold true.Thus, we quickly see from (8) that Therefore, which is positive.Hence, we have that, not only is x > 0, but from our last steps, we see that Next, we work towards proving that y > 0 while noting a 3,3 = 0. Thus, clearly y > 0.
Similarly, to show that z > 0, we see that which is also clearly greater than 0.
Here, we can note that xy and z share common terms: −a 1,3 a 2,2 a 3,1 , − a 1,1 a 2,3 a 3,2 ; so to prove xy > z, it suffices to show that Let us analyze these terms more closely.First of all, we can see that the term on the right-hand side of (A1) is Moreover, we see that one of the terms on the left-hand side of (A1) is Therefore, we have a term that is strictly greater than what we need to prove that (A1) is true as long as none of the other terms on the left-hand side subtract away more than β 3 I ˆ2S ˆ+ β 2 I ˆS ˆμ.We will see, however, that this is not a problem, and in fact, the rest of the terms will be positive additions to the left-hand side of (A1).It is easy to see that both 2 a 2,2 and a 1,1 a 2,2 2 are positive values, ensuring that our inequality is true.We now look at the last value of (A1).which is another positive term.Hence, our inequality (A1) is true, proving that xy > z, Therefore, using the Routh-Hurwitz criteria, the endemic equilibrium is locally asymptotically stable when ℛ 0 > 1. □ A flow chart showing movement between the compartments of the system (1).Relative sensitivities for the five parameters γ, ρ, γ L , ω L , and μ that are related to the long COVID compartment L.

Figure 2 .
Figure 2. Fitting result for the number of cumulative COVID-19 cases in the US state of Tennessee beginning from 31 August 2021.The horizontal axis represents the number of days since 31 August 2021, and the vertical axis represents the number of cumulative cases.

Figure 3 .
Figure 3.Simulation results for the number of active long COVID cases in Tennessee using different values of ρ.The horizontal axis represents the number of days since 31 August 2021, and the vertical axis represents the number of active long COVID cases.On each simulation curve, the square marks the peak and the circle marks the lowest point of the curve.

Fitting
and prediction results for the long COVID cases in the UK from 6 February 2021 to 6 December 2022.The vertical dashed line in purple separates the fitting and prediction periods.The red circles represent the reported data.The blue solid line represents the fitting result and the green solid line represents the prediction result.Derrick et al.Page 22

Table 2 .
Parameter estimates found by fitting the Tennessee data.

Table 3 .
Parameter values for the simulation of the UK long COVID data.Proportion of long COVID cases Found via data fitting - Mathematics (Basel).Author manuscript; available in PMC 2023 December 18.