The Structural Identifiability of a Humidity-Driven Epidemiological Model of Influenza Transmission

Influenza epidemics cause considerable morbidity and mortality every year worldwide. Climate-driven epidemiological models are mainstream tools for understanding seasonal transmission dynamics and predicting future trends of influenza activity, especially in temperate regions. Testing the structural identifiability of these models, that is, assessing whether the unknown model parameters can be uniquely determined from epidemic data, is a fundamental prerequisite for applying them in practice. In this study, we applied a scaling method to analyse the structural identifiability of four types of commonly used humidity-driven epidemiological models. Specifically, we investigated whether the key epidemiological parameters (i.e., the infectious period, the average duration of immunity, the average latency period, and the maximum and minimum daily basic reproductive numbers) can be uniquely determined simultaneously when prevalence data are observable. We found that each model is structurally identifiable when the prevalence of infection is observable. The structural identifiability of these models lays the foundation for future tests of practical identifiability using synthetic prevalence data with observation noise. In practice, epidemiological models should be examined with caution before they are used to estimate model parameters from epidemic data.


Introduction
Influenza epidemics cause considerable morbidity and mortality every year worldwide [1][2][3][4][5][6]. According to the World Health Organization, influenza causes about 3 to 5 million cases of severe illness and about 290,000 to 650,000 deaths annually [7]. This burden might be alleviated by understanding historical and current transmission dynamics and predicting future transmission trends of influenza, which can help public health authorities design effective interventions and vaccination strategies, especially in special situations (for example, the emergence of influenza A (H1N1) viruses and the potential rebound of influenza in the post-COVID-19 pandemic period) [8][9][10][11][12].
Climate-driven epidemiological models, such as models driven by humidity, are mainstream tools for understanding seasonal transmission dynamics and predicting future trends in influenza activity. Recently, climate-driven epidemiological models have been successfully applied to recreate historical time series of influenza activity and to forecast the week of peak influenza activity in temperate, tropical, and subtropical regions (for example, in the United States and Hong Kong) [13][14][15][16][17]. Confident predictions with these models depend on testing various numerical optimization algorithms, such as filter-based data-assimilation algorithms, which fit the model to epidemic data in order to parameterize the simulated transmission dynamics of influenza [13]. However, the structural identifiability of the model needs to be tested to prevent the optimization algorithm from converging to one of a set of locally optimal solutions.
Model identifiability includes structural and practical identifiability, which concern whether unknown model parameters can be uniquely determined from noise-free epidemic data and whether they can be accurately identified from noisy data, respectively [18][19][20][21][22][23][24]. The structural identifiability of a model is a fundamental prerequisite for its practical identifiability and for its use in practice [25]. Structural identifiability is necessary, but not sufficient, for accurately identifying model parameters from actual noisy data, because a structurally identifiable model might still be unidentifiable when noisy data are used. The scaling method, a structural identifiability analysis method proposed in recent years, is based on the scale invariance of the equations [26]. Compared with existing structural identifiability methods (such as differential algebra), this method has the advantages of simple operation (no advanced computing skills are required) and low computational cost, particularly when analysing high-dimensional non-linear models [27][28][29][30]. This method has been used to analyse the structural identifiability of mathematical models describing biological processes, such as the generalized mass-action model [31][32][33].
In this study, we apply the scaling method to analyse the structural identifiability of several types of commonly used humidity-driven epidemiological models. We investigate whether the key epidemiological parameters (the infectious period, the average duration of immunity, the average latency period, and the maximum and minimum daily basic reproductive numbers) can be determined simultaneously when the population prevalence of infected people is observable.

Methods
Here, we briefly introduce the process from building an epidemiological model to applying it in real-world settings. After building an epidemiological model of influenza transmission, we test the structural identifiability of the model to investigate the properties of the model itself (the upper part of Figure 1). If the model is structurally identifiable, we then use synthetic data experiments to test its practical identifiability, which determines whether the model is identifiable from noisy data (e.g., data with reporting noise). If the model is also practically identifiable, it can potentially be used in practice after its performance has been evaluated for specific tasks (e.g., inference and forecasting). Conversely, if a model is structurally unidentifiable, any parameter estimated by optimization algorithms might be unreliable, and the model may need to be modified. For some complex models, for example, agent-based influenza transmission models [34], we can only test practical identifiability directly after building the model (the lower part of Figure 1), because current mathematical methods may not be able to test the structural identifiability of such complex models theoretically. Here, we mainly focus on testing the structural identifiability of humidity-driven epidemiological models.

Humidity-Driven Epidemiological Model
We test the structural identifiability of several commonly used humidity-driven epidemiological models. The form of each model is as follows.

1. SIS model: Equation (1).
2. SIRS model: Equation (2).
3. SEIR model: Equation (3).
4. SEIRS model: Equation (4).

Here N, S, E, I, and R represent the total number of people and the numbers of susceptible, exposed, infectious, and recovered (immune) individuals, respectively. N = S + I in the SIS model, N = S + I + R in the SIRS model, and N = S + E + I + R in the SEIR and SEIRS models. α represents the rate at which influenza viruses are imported into the model through travel. t represents time (e.g., in days, weeks, or years), and β(t) represents the transmission rate at time t. D represents the mean infectious period in all four models, L represents the average duration of immunity in the SIRS and SEIRS models, and W represents the average latency period in the SEIR and SEIRS models. The flow diagrams of these models are presented in Figure A1 in Appendix A.

The basic reproductive number, R0(t), represents the average number of secondary infections generated by a primary case in a fully susceptible population at time t; it is proportional to the transmission rate:

$$R_0(t) = \beta(t)\,D.$$

Laboratory experiments have shown that influenza virus survival and transmission are related to absolute humidity (AH) [35]. Specific humidity (SH) is a measure of AH, and q(t) denotes SH at time t. In this model, humidity modulates R0(t) through an exponential relationship:

$$R_0(t) = \exp\big(a\,q(t) + b\big) + R_{0\min},$$

where a = −180 was estimated by fitting a regression model of laboratory influenza virus survival against AH, b = log(R0max − R0min), and R0max and R0min are the maximum and minimum daily basic reproductive numbers, respectively. The parameter set Θ4 = {D, L, W, R0max, R0min} in the SEIRS model may be estimated by fitting the model to epidemic data.
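To make the humidity forcing concrete, the sketch below integrates an SIRS model with one-day forward-Euler steps. It assumes the commonly used form dS/dt = (N − S − I)/L − β(t)IS/N − α and dI/dt = β(t)IS/N − I/D + α with β(t) = R0(t)/D; only a = −180 and the R0(t) expression come from the text above. The specific-humidity curve and every numerical value (N, D, L, R0max, R0min, α, initial conditions) are hypothetical choices for illustration, not values used in this study, and the function names are our own.

```python
import math


def r0(q, r0max, r0min, a=-180.0):
    """R0(t) = exp(a*q(t) + b) + R0min, with b = log(R0max - R0min)."""
    b = math.log(r0max - r0min)
    return math.exp(a * q + b) + r0min


def simulate_sirs(q_series, N, S0, I0, D, L, r0max, r0min, alpha=0.0, dt=1.0):
    """Forward-Euler integration of an assumed humidity-forced SIRS model.

    Returns the prevalence I at the end of each time step.
    """
    S, I = S0, I0
    prevalence = []
    for q in q_series:
        beta = r0(q, r0max, r0min) / D        # transmission rate beta(t) = R0(t) / D
        infections = beta * I * S / N          # new infections per unit time
        dS = (N - S - I) / L - infections - alpha
        dI = infections - I / D + alpha
        S, I = S + dt * dS, I + dt * dI
        prevalence.append(I)
    return prevalence


# Hypothetical daily specific humidity (kg/kg): dry winter, humid summer.
q_series = [0.010 + 0.005 * math.cos(2 * math.pi * (t - 200) / 365) for t in range(365)]
prev = simulate_sirs(q_series, N=1e5, S0=9e4, I0=10.0, D=4.0, L=4 * 365.0,
                     r0max=2.2, r0min=1.2, alpha=0.1)
print(f"peak prevalence {max(prev):.0f} on day {prev.index(max(prev))}")
```

Note that R0(t) equals R0max when q(t) = 0 and decays towards R0min as humidity rises, so drier periods favour transmission in this sketch.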
We analyse the structural identifiability of these models, which tests whether each parameter set can be uniquely determined simultaneously when the prevalence data is observable (using the observation record from the beginning to the end time, such as one influenza season in temperate regions).

The Frameworks for the Scaling Method
The scaling method is a simple way to assess the structural identifiability of a non-linear model, based on scaling transformations and the solution of simple sparse systems of equations [26]. The ordinary differential equation (ODE) model to which the scaling method applies is

$$\frac{dx_i}{dt} = f_i(x_1, \dots, x_n, \theta_1, \dots, \theta_m), \qquad x_i(0) = x_{i0}, \qquad i = 1, \dots, n, \tag{8}$$

where x_i(0) represents the initial conditions, dx_i/dt represents the rate of change of x_i over time, which depends on m parameters θ_j, and n is the number of state variables. f_i is a function characterising the specific details of the rate of change of x_i. The simplicity of this method relies on the ability to decompose each function f_i into a sum of P functionally independent components, f_ij:

$$f_i = \sum_{j=1}^{P} f_{ij}(\bar{x}_j, \bar{\theta}_j),$$

where f_ij is functionally independent of f_ik for every j ≠ k, and x̄_j and θ̄_j represent the subsets of variables and parameters of f_ij, respectively. In simple terms, the functions f_1(x_1, x_2, ...), ..., f_n(x_1, x_2, ...) are linearly independent if the only solution of

$$a_1 f_1 + a_2 f_2 + \cdots + a_n f_n = 0$$

is a_1 = a_2 = ... = a_n = 0. The functional independence theorems used in this work are presented in Appendix A.
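As a small illustration of the linear-independence check, the sketch below uses SymPy's `wronskian` on two components of the kind that appear in the humidity-driven models. The components f1 = I/D and f2 = −hSI/(ND) are hypothetical examples, with the symbol h standing in for e^{aq(t)}, and the ordinary Wronskian with respect to the single variable S is used as a simple special case of the generalized Wronskian in Theorem A1. A Wronskian that is not identically zero implies linear independence; a zero Wronskian on its own does not prove dependence.

```python
import sympy as sp

# Hypothetical components; h stands in for exp(a*q(t)).
S, I, h, D, N = sp.symbols("S I h D N", positive=True)
f1 = I / D
f2 = -h * S * I / (N * D)

# Ordinary Wronskian with respect to S (a special case of the generalized one).
W_indep = sp.simplify(sp.wronskian([f1, f2], S))
print(W_indep)  # not identically zero -> f1 and f2 are linearly independent

# A dependent pair for contrast: 2*f1 is a constant multiple of f1.
W_dep = sp.simplify(sp.wronskian([f1, 2 * f1], S))
print(W_dep)    # 0
```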
Based on this decomposition, the steps of the scaling method are as follows. Step 1. Scale all parameters and unobserved variables by unknown scaling factors µ:

$$\theta_j \rightarrow \mu_{\theta_j}\,\theta_j \quad (j = 1, \dots, m), \qquad x_k \rightarrow \mu_{x_k}\,x_k \quad (k = s+1, \dots, n),$$

and substitute them into Equation (8). Here n is the total number of state variables, x_1, ..., x_s are the observed state variables, and x_{s+1}, ..., x_n are the unobserved state variables. Because the experiment measures x_1, ..., x_s directly, these variables are not scaled (µ_{x_1} = ... = µ_{x_s} = 1).
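Step 2. Obtain the scaled version of each functionally independent function, namely,

$$\tilde{f}_{ij} = f_{ij}\big(\mu_{\bar{x}_j}\bar{x}_j,\ \mu_{\bar{\theta}_j}\bar{\theta}_j\big)$$

and

$$\frac{dx_i}{dt} = \frac{1}{\mu_{x_i}} \sum_{j=1}^{P} \tilde{f}_{ij}.$$

Step 3. Find the scaling factor combinations that keep the system invariant, that is, that satisfy f̃_ij/µ_{x_i} = f_ij for every i and j (because the components are functionally independent, each must be invariant on its own). Only the parameters θ_j whose only solution is µ_{θ_j} = 1 are identifiable, and only the variables x_j with µ_{x_j} = 1 are observable. Otherwise, parameters whose scaling factors are coupled form identifiable groups but cannot be identified individually.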
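The three steps can be sketched in SymPy on two hypothetical toy models with an observed state x: dx/dt = −kx and dx/dt = −k1k2x. The helper `invariance_conditions` is our own illustrative function, not part of SymPy: it applies the scaling substitution, divides by the state's scaling factor, and requires every monomial coefficient of the difference to vanish, which is where linear independence of the components is used.

```python
import sympy as sp


def invariance_conditions(rhs, subs, mu_x, variables):
    """Coefficient conditions for rhs(scaled) / mu_x == rhs to hold identically.

    The difference is put over a common denominator; its numerator must vanish
    for all values of `variables`, so every monomial coefficient must be zero.
    """
    scaled = rhs.subs(subs, simultaneous=True) / mu_x
    numerator, _ = sp.fraction(sp.together(scaled - rhs))
    return sp.Poly(sp.expand(numerator), *variables).coeffs()


x = sp.symbols("x", positive=True)  # observed state: never scaled (mu_x = 1)
k, k1, k2 = sp.symbols("k k1 k2", positive=True)
mu_k, mu_k1, mu_k2 = sp.symbols("mu_k mu_k1 mu_k2", positive=True)

# Toy model A: dx/dt = -k x
conds_a = invariance_conditions(-k * x, {k: mu_k * k}, 1, (x,))
sol_a = sp.solve(conds_a, [mu_k], dict=True)

# Toy model B: dx/dt = -k1 k2 x (only the product k1*k2 enters)
conds_b = invariance_conditions(-k1 * k2 * x, {k1: mu_k1 * k1, k2: mu_k2 * k2}, 1, (x,))
sol_b = sp.solve(conds_b, [mu_k1, mu_k2], dict=True)

print(sol_a)  # mu_k = 1: k is identifiable
print(sol_b)  # mu_k1 * mu_k2 = 1: only the group k1*k2 is identifiable
```

Model B illustrates the "identifiable group" case from Step 3: the scaling factors are coupled, so only their product is fixed.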

Results
Here, we demonstrate how to use the simple scaling method to test the identifiability of the four humidity-driven epidemiological models introduced in the Methods section.
We consider a scenario in which only I is observed, representing a type of epidemic data that can be collected in practice. For example, in the UK, the COVID-19 Infection Survey identified people testing positive for coronavirus at a point in time in private residential households (acting as surveillance sensors), to help the government decide how to respond to the emerging epidemic and to provide information to the public [36]. In principle, such an infection survey could be extended to influenza to identify new positive cases regularly during the influenza season. In this scenario, I can be obtained from sentinel surveillance systems. We test whether each of the humidity-driven epidemiological models in Equations (1)-(4) is structurally identifiable.

SIS Model
For the SIS model, we test whether the parameter set Θ1 = {D, R0max, R0min} can be determined uniquely from the observable I. First, we investigate whether each differential equation in Equation (1) can be decomposed into a sum of linearly independent functions, which is a prerequisite for using the scaling method. For the differential equation associated with S, we decompose the right-hand side into two components, f_S1 and f_S2 (Equation (13)). According to Theorem A1, their generalized Wronskian determinant is not identically zero, so f_S1 and f_S2 are linearly independent functions. Similarly, for the differential equation associated with I, the components f_I1 and f_I2 (Equation (15)) have a generalized Wronskian determinant that is not identically zero, so f_I1 and f_I2 are linearly independent functions. Next, we explore whether Θ1 can be determined uniquely from the observable I using the scaling method, as follows. Step 1. We scale the parameters in Θ1 and the unobserved variable S by unknown scaling factors: D → µ_D D, R0max → µ_R0max R0max, R0min → µ_R0min R0min, and S → µ_S S. Step 2. We obtain the scaled version of each linearly independent function in Equations (13) and (15).
Step 3. We obtain the identifiability equations and, after manipulating them, arrive at Equation (22). Rearranging the last formula in Equation (22) gives Equation (23). Because e^{aq(t)} is never zero and varies with t as q(t) changes, the two sides of this equation can be equal for all t only if the corresponding coefficients match, which gives Equation (24). Solving these equations yields µ_D = µ_R0max = µ_R0min = 1. Therefore, the SIS model is identifiable; namely, the parameter set Θ1 = {D, R0max, R0min} can be determined uniquely from the observable I.
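The SIS argument can be checked symbolically. The SymPy sketch below assumes the SIS equations dS/dt = I/D − β(t)SI/N and dI/dt = β(t)SI/N − I/D with β(t) = R0(t)/D and R0(t) = (R0max − R0min)e^{aq(t)} + R0min; the travel-import term α is omitted and N is treated as known. These forms are our assumption for illustration and may differ in detail from Equation (1). The symbol h is a hypothetical stand-in for e^{aq(t)}; treating S, I, and h as independent polynomial generators encodes the argument that e^{aq(t)} varies over time and is linearly independent of constants. The helper `invariance_conditions` is our own, not a SymPy function.

```python
import sympy as sp


def invariance_conditions(rhs, subs, mu_x, variables):
    """Coefficient conditions for rhs(scaled) / mu_x == rhs to hold identically."""
    scaled = rhs.subs(subs, simultaneous=True) / mu_x
    numerator, _ = sp.fraction(sp.together(scaled - rhs))
    return sp.Poly(sp.expand(numerator), *variables).coeffs()


S, I, h = sp.symbols("S I h", positive=True)  # h stands in for exp(a*q(t))
N, D, Rmax, Rmin = sp.symbols("N D R0max R0min", positive=True)
mD, mS, mRmax, mRmin = sp.symbols("mu_D mu_S mu_R0max mu_R0min", positive=True)

beta = ((Rmax - Rmin) * h + Rmin) / D  # beta(t) = R0(t) / D
dS = I / D - beta * S * I / N
dI = beta * S * I / N - I / D

# Step 1: scale parameters and the unobserved S; I is observed (mu_I = 1).
subs = {D: mD * D, Rmax: mRmax * Rmax, Rmin: mRmin * Rmin, S: mS * S}
# Steps 2-3: invariance of every monomial component in both equations.
conds = (invariance_conditions(dS, subs, mS, (S, I, h))
         + invariance_conditions(dI, subs, 1, (S, I, h)))
solutions = sp.solve(conds, [mD, mS, mRmax, mRmin], dict=True)
print(solutions)
```

If the sketch is correct, `solutions` contains a single dictionary in which every scaling factor equals 1, matching the conclusion above.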

SIRS Model
For the SIRS model, we test whether the parameter set Θ2 = {D, L, R0max, R0min} can be determined uniquely from the observable I. First, we investigate whether each differential equation in Equation (2) can be decomposed into a sum of linearly independent functions. For the differential equation associated with S, we decompose the right-hand side into two components, f_S1 and f_S2 (Equation (26)). According to Theorem A1, their generalized Wronskian determinant is not identically zero, so f_S1 and f_S2 are linearly independent functions. Similarly, for the differential equation associated with I, the components f_I1 and f_I2 (Equation (28)) have a generalized Wronskian determinant that is not identically zero, so f_I1 and f_I2 are linearly independent functions. Next, we explore whether Θ2 can be determined uniquely from the observable I using the scaling method, as follows. Step 1. We scale the parameters in Θ2 and the unobserved variable S by unknown scaling factors µ_D, µ_L, µ_R0max, µ_R0min, and µ_S. Step 2. We obtain the scaled version of each linearly independent function in Equations (26) and (28).
Step 3. We obtain the identifiability equations, Equations (31)-(34); manipulating them gives Equation (35). In the last formula of Equation (35), the left-hand side is constant, whereas the right-hand side is a function of t through e^{aq(t)}, which is never zero. Because q(t) varies over time, the equation can hold for all t only if µ_R0max − 1 = 0 and µ_R0min − 1 = 0. Hence, the system has the unique solution µ_R0max = µ_R0min = 1, and it follows that the SIRS model is identifiable.
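The same symbolic check can be sketched for the SIRS model. It assumes dS/dt = (N − S − I)/L − β(t)SI/N and dI/dt = β(t)SI/N − I/D, with β(t) and the stand-in symbol h defined as in the SIS sketch, the travel-import term α omitted, N known, and R = N − S − I eliminated. These forms are assumptions for illustration, and `invariance_conditions` is our own helper.

```python
import sympy as sp


def invariance_conditions(rhs, subs, mu_x, variables):
    """Coefficient conditions for rhs(scaled) / mu_x == rhs to hold identically."""
    scaled = rhs.subs(subs, simultaneous=True) / mu_x
    numerator, _ = sp.fraction(sp.together(scaled - rhs))
    return sp.Poly(sp.expand(numerator), *variables).coeffs()


S, I, h = sp.symbols("S I h", positive=True)  # h stands in for exp(a*q(t))
N, D, L, Rmax, Rmin = sp.symbols("N D L R0max R0min", positive=True)
mD, mL, mS, mRmax, mRmin = sp.symbols("mu_D mu_L mu_S mu_R0max mu_R0min", positive=True)

beta = ((Rmax - Rmin) * h + Rmin) / D
dS = (N - S - I) / L - beta * S * I / N   # loss of immunity minus infection
dI = beta * S * I / N - I / D

subs = {D: mD * D, L: mL * L, Rmax: mRmax * Rmax, Rmin: mRmin * Rmin, S: mS * S}
conds = (invariance_conditions(dS, subs, mS, (S, I, h))
         + invariance_conditions(dI, subs, 1, (S, I, h)))
solutions = sp.solve(conds, [mD, mL, mS, mRmax, mRmin], dict=True)
print(solutions)
```

Here the constant and S terms of the S-equation pin down µ_L and µ_S, while the I-equation fixes µ_D and the humidity terms fix µ_R0max and µ_R0min.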

SEIR Model
For the SEIR model, we test whether the parameter set Θ3 = {D, W, R0max, R0min} can be determined uniquely from the observable I. First, we investigate whether each differential equation in Equation (3) can be decomposed into a sum of linearly independent functions. For the differential equation associated with S, the right-hand side consists of a single component, f_S1 (Equation (36)). According to Theorem A1, its generalized Wronskian determinant (here simply f_S1 itself) is not identically zero, so f_S1 forms a linearly independent set. Similarly, for the differential equation associated with E, the components f_E1 and f_E2 (Equation (38)) have a generalized Wronskian determinant that is not identically zero, so f_E1 and f_E2 are linearly independent functions. For the differential equation associated with I, the components f_I1 and f_I2 (Equation (40)) likewise have a generalized Wronskian determinant that is not identically zero, so f_I1 and f_I2 are linearly independent functions. Next, we explore whether Θ3 can be determined uniquely from the observable I using the scaling method, as follows. Step 1. We scale the parameters in Θ3 and the unobserved variable S by unknown scaling factors µ_D, µ_W, µ_R0max, µ_R0min, and µ_S. Step 2. We obtain the scaled version of each linearly independent function in Equations (36), (38), and (40).
Step 3. We obtain the identifiability equations. From formulas (45)-(47), we readily obtain µ_D = 1 and µ_W = 1. Substituting µ_D = 1 into formula (43) yields the same relation as Equation (23) in the SIS model; then, as with Equation (24) in the SIS model, we obtain µ_R0max = µ_R0min = 1. Finally, substituting these results into formula (44) gives µ_S = 1. Therefore, the SEIR model is identifiable; namely, the parameter set Θ3 = {D, W, R0max, R0min} can be uniquely determined from the observable I.
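A corresponding SymPy sketch for the SEIR model is shown below. It assumes dS/dt = −β(t)SI/N, dE/dt = β(t)SI/N − E/W, and dI/dt = E/W − I/D, with β(t) and the stand-in symbol h as in the SIS sketch and α omitted; these forms are assumptions for illustration. Unlike the derivation above, this sketch scales the exposed compartment E as well as S, since both are unobserved when only I is measured; the conclusion is unaffected if all scaling factors still come out as 1. `invariance_conditions` is our own helper.

```python
import sympy as sp


def invariance_conditions(rhs, subs, mu_x, variables):
    """Coefficient conditions for rhs(scaled) / mu_x == rhs to hold identically."""
    scaled = rhs.subs(subs, simultaneous=True) / mu_x
    numerator, _ = sp.fraction(sp.together(scaled - rhs))
    return sp.Poly(sp.expand(numerator), *variables).coeffs()


S, E, I, h = sp.symbols("S E I h", positive=True)  # h stands in for exp(a*q(t))
N, D, W, Rmax, Rmin = sp.symbols("N D W R0max R0min", positive=True)
mD, mW, mS, mE, mRmax, mRmin = sp.symbols(
    "mu_D mu_W mu_S mu_E mu_R0max mu_R0min", positive=True)

beta = ((Rmax - Rmin) * h + Rmin) / D
dS = -beta * S * I / N
dE = beta * S * I / N - E / W
dI = E / W - I / D

subs = {D: mD * D, W: mW * W, Rmax: mRmax * Rmax, Rmin: mRmin * Rmin,
        S: mS * S, E: mE * E}
gens = (S, E, I, h)
conds = (invariance_conditions(dS, subs, mS, gens)
         + invariance_conditions(dE, subs, mE, gens)
         + invariance_conditions(dI, subs, 1, gens))
solutions = sp.solve(conds, [mD, mW, mS, mE, mRmax, mRmin], dict=True)
print(solutions)
```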

SEIRS Model
For the SEIRS model, we test whether the parameter set Θ4 = {D, L, W, R0max, R0min} can be determined uniquely from the observable I. First, we investigate whether each differential equation in Equation (4) can be decomposed into a sum of linearly independent functions. For the differential equation associated with S, we decompose the right-hand side into two components, f_S1 and f_S2 (Equation (50)). According to Theorem A1, their generalized Wronskian determinant is not identically zero, so f_S1 and f_S2 are linearly independent functions. Similarly, for the differential equation associated with E, the components f_E1 and f_E2 (Equation (53)) have a generalized Wronskian determinant that is not identically zero, so f_E1 and f_E2 are linearly independent functions. For the differential equation associated with I, the components f_I1 and f_I2 (Equation (55)) likewise have a generalized Wronskian determinant that is not identically zero, so f_I1 and f_I2 are linearly independent functions. Next, we explore whether Θ4 can be determined uniquely from the observable I using the scaling method, as follows. Step 1. We scale the parameters in Θ4 and the unobserved variable S by unknown scaling factors µ_D, µ_L, µ_W, µ_R0max, µ_R0min, and µ_S. Step 2. We obtain the scaled version of each linearly independent function in Equations (50), (53), and (55).
Step 3. We obtain the identifiability equations. Following the same derivation as for the corresponding part of the SEIR model, we readily obtain µ_D = µ_L = µ_W = µ_R0max = µ_R0min = 1. Therefore, the SEIRS model is identifiable; namely, the parameter set Θ4 = {D, L, W, R0max, R0min} can be determined uniquely from the observable I.
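Finally, a SymPy sketch for the SEIRS model, assuming dS/dt = (N − S − E − I)/L − β(t)SI/N, dE/dt = β(t)SI/N − E/W, and dI/dt = E/W − I/D, with β(t) and the stand-in symbol h as in the SIS sketch, α omitted, and N known. As in the SEIR sketch, E is scaled in addition to S because it is unobserved; the model forms and the helper `invariance_conditions` are our own illustrative assumptions.

```python
import sympy as sp


def invariance_conditions(rhs, subs, mu_x, variables):
    """Coefficient conditions for rhs(scaled) / mu_x == rhs to hold identically."""
    scaled = rhs.subs(subs, simultaneous=True) / mu_x
    numerator, _ = sp.fraction(sp.together(scaled - rhs))
    return sp.Poly(sp.expand(numerator), *variables).coeffs()


S, E, I, h = sp.symbols("S E I h", positive=True)  # h stands in for exp(a*q(t))
N, D, L, W, Rmax, Rmin = sp.symbols("N D L W R0max R0min", positive=True)
mD, mL, mW, mS, mE, mRmax, mRmin = sp.symbols(
    "mu_D mu_L mu_W mu_S mu_E mu_R0max mu_R0min", positive=True)

beta = ((Rmax - Rmin) * h + Rmin) / D
dS = (N - S - E - I) / L - beta * S * I / N
dE = beta * S * I / N - E / W
dI = E / W - I / D

subs = {D: mD * D, L: mL * L, W: mW * W, Rmax: mRmax * Rmax,
        Rmin: mRmin * Rmin, S: mS * S, E: mE * E}
gens = (S, E, I, h)
conds = (invariance_conditions(dS, subs, mS, gens)
         + invariance_conditions(dE, subs, mE, gens)
         + invariance_conditions(dI, subs, 1, gens))
solutions = sp.solve(conds, [mD, mL, mW, mS, mE, mRmax, mRmin], dict=True)
print(solutions)
```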

Discussion
Previous studies have explored the performance of different data assimilation algorithms when inferring parameters from observational epidemic data [13]. For example, ensemble filters are better at reproducing historical influenza incidence time series than particle Markov chain Monte Carlo. However, it remained unknown whether these models are structurally identifiable, a property that affects the performance of the optimization algorithms. In this study, we applied the scaling method to analyse the structural identifiability of four types of commonly used humidity-driven epidemiological models when prevalence data are observable. Specifically, we investigated whether each parameter set, Θ1 = {D, R0max, R0min} in the SIS model, Θ2 = {D, L, R0max, R0min} in the SIRS model, Θ3 = {D, W, R0max, R0min} in the SEIR model, and Θ4 = {D, L, W, R0max, R0min} in the SEIRS model, can be uniquely determined (D is the infectious period, L is the average duration of immunity, W is the average latency period, and R0max and R0min are the maximum and minimum daily basic reproductive numbers, respectively). We find that each model is identifiable when prevalence is observable. In addition to prevalence data, this study also considered the number of new cases as observational data. For example, when we introduced a differential equation describing the change in new cases over time in the SIRS model, the components of its decomposition (obtained from Equation (8) in the framework of the scaling method) were not linearly independent. Therefore, the scaling method cannot be used to determine whether the model is structurally identifiable when new cases are observable.
Much work remains to further test, validate, and improve the ability of humidity-driven epidemiological models to predict influenza activity. In the future, we will test extended humidity-driven epidemiological models (such as models that include more than one exposed state) to provide theoretical support for their use in practice. Whereas this investigation tested the structural identifiability of different humidity-driven epidemiological models, future work will assess their practical identifiability through synthetic experiments to support their use in practice. In addition, we will explore other structural identifiability methods to test the humidity-driven epidemiological models when the number of new cases is observable, and we will consider more data types, such as the cumulative number of incident cases.
In conclusion, our analysis suggests that the structural identifiability of these models can lay the foundation for testing practical identifiability in the future. In practice, epidemiological models should be examined with caution before using them to estimate model parameters from epidemic data.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Here, we present the theorem used to prove that the functions in the humidity-driven epidemiological models are linearly independent.
Figure A1. Flow diagram of the four types of humidity-driven epidemiological models (panels: SIS, SIRS, SEIR, and SEIRS models).