A Multivariate Deep Learning Model with Coupled Human Intervention Factors for COVID-19 Forecasting

Qu, Zongxi; Zhang, Beidou; Wang, Hongpeng

doi:10.3390/systems11040201

by Zongxi Qu ¹,², Beidou Zhang ³ and Hongpeng Wang ¹,²,*

¹ School of Management, Lanzhou University, Lanzhou 730000, China

² Research Center for Emergency Management, Lanzhou University, Lanzhou 730000, China

³ College of Atmospheric Sciences, Lanzhou University, Lanzhou 730000, China

* Author to whom correspondence should be addressed.

Systems 2023, 11(4), 201; https://doi.org/10.3390/systems11040201

Submission received: 1 March 2023 / Revised: 10 April 2023 / Accepted: 13 April 2023 / Published: 17 April 2023

(This article belongs to the Special Issue Human–AI Teaming: Synergy, Decision-Making and Interdependency)

Abstract

Artificial intelligence (AI) technology plays a crucial role in infectious disease outbreak prediction and control. Many human interventions can influence the spread of epidemics, including government responses, quarantine, and economic support. However, most previous AI-based models have failed to consider human interventions when predicting the trend of infectious diseases. This study selected four human intervention factors that may affect COVID-19 transmission, examined their relationship to epidemic cases, and developed a multivariate long short-term memory network model (M-LSTM) incorporating human intervention factors. Firstly, we analyzed the correlations and lagged effects between the four human factors and epidemic cases in three representative countries, and found that the effect of these four factors on the epidemic case data was typically delayed by approximately 15 days. On this basis, a multivariate epidemic prediction model (M-LSTM) was developed. The model prediction results show that coupling human intervention factors generally improves model performance, although adding certain intervention factors can also lower performance. Overall, a multivariate deep learning model with coupled variable correlation and lag outperformed the other comparative models, which validates its effectiveness in predicting infectious diseases.

Keywords:

COVID-19 forecasting; human interventions; multivariate prediction; LSTM model

1. Introduction

With global warming, ecological changes, and urbanization in recent decades, an increasing number of pathogenic microorganisms have mutated, making outbreaks of major infectious diseases more frequent, increasing by one or more species per year [1]. For example, SARS, influenza A(H1N1), H7N9, Ebola hemorrhagic fever, and COVID-19 have triggered a series of major public health outbreaks. These diseases pose a serious threat to human health, social stability, and public safety due to characteristics such as high infectivity and their rapid, wide-ranging transmission.

Since the outbreak of COVID-19, researchers have been increasingly interested in the application of AI techniques to COVID-19 responses [2]. Examples include AI-based diagnosis from medical images and symptoms [3,4,5], prediction of the number of cases [6,7,8], intelligent contact tracing [9], and AI-assisted drug discovery [10]. In particular, a majority of studies have been dedicated to the use of machine learning techniques to predict the number of infections. For instance, Ly et al. [11] used an adaptive neuro-fuzzy inference system to predict COVID-19 cases in the UK and showed that data from Spain and Italy could improve the prediction ability. Based on time series data reported from 1 March 2020 to 30 April 2020, Parbat et al. [12] used support vector regression (SVR) to forecast COVID-19 cases in India. Furthermore, deep learning techniques have been widely used in the prediction of COVID-19, especially the LSTM [13]. The LSTM is an extension of recurrent neural networks (RNNs), which rely on persistent prior knowledge and typically have short-term memory. Shastri et al. [14] used the LSTM to predict confirmed cases of COVID-19 and resulting deaths in both India and the United States. A mean absolute percentage error (MAPE) calculation showed that the prediction error rate for the LSTM model was 2.0 to 3.3%. Kırbaş et al. [15] used the LSTM model to predict COVID-19 cases in eight European countries. Despite the differences in human behavior, applied measures, and available data in each country/region, the LSTM outperformed ARIMA and NARNN in predicting COVID-19 cases. Mohamed et al. [16] applied several deep learning models to predict the COVID-19 outbreak in Egypt, and they found that the LSTM model showed the best performance in forecasting cumulative infections one week and one month in advance. Devaraj et al. [17] assessed the reliability and practical implications of several AI-based models. In comparison to the other algorithms considered, the stacked LSTM model had higher prediction accuracy and reliability in forecasting cumulative infections one week and one month ahead.

The above studies have shown that the LSTM algorithm is capable of accurately predicting COVID-19 epidemics. However, these studies lack consideration of possible human intervention factors that influence epidemic development when training AI models, such as government response, implementation of epidemic prevention and control measures, stringency, and economic support. Several studies have determined that non-pharmaceutical interventions by the state can significantly affect infectious disease epidemics [18,19]. It has been demonstrated that strict prevention and control measures are more effective in suppressing infectious disease epidemics, emphasizing the importance of dynamic human interventions. Hence, it is generally accepted that human intervention can influence epidemic development trends in different ways, so it is necessary to take human intervention factors into account when developing AI-based epidemic prediction models. To this end, we proposed a multivariate long short-term memory model (M-LSTM) coupled with human-influencing factors for effective infectious disease prediction. Firstly, we collected data on non-pharmaceutical interventions during the COVID-19 outbreak from Oxford University’s Oxford COVID-19 Government Response Tracker (OxCGRT) [20]. These four indexes were selected as the main factors for human intervention: overall government response, containment and health, stringency, and economic support. Subsequently, we analyzed the correlations and lagged effects between four human factors and epidemic cases in three representative countries: the United States, the United Kingdom, and India. On this basis, a multivariate epidemic prediction model (M-LSTM) was developed. Multi-group experiments were conducted to assess the prediction performance of multiple different human-influenced factor coupling schemes. Finally, the epidemic prediction model with the best-coupled variables was determined.

The contribution of our work can be summarized by the two following aspects.

(1) The relationship between human intervention factors, such as government response, and epidemic changes is unclear. Therefore, in this paper, we analyzed the correlation and lag between the four human intervention factors and data on the COVID-19 epidemic. We found significant negative correlations and lags between the intervention factors and the epidemic data.

(2) To accurately predict the epidemic trend, we proposed a novel multivariate deep learning prediction model (M-LSTM) by coupling the information of human intervention factors to provide more reliable epidemic prediction results.

2. Methods and Data

2.1. LSTM Model Principles

LSTMs, a special type of RNN, can learn long-term dependencies. Hochreiter and Schmidhuber proposed the long short-term memory neural network, which overcomes the vanishing gradient problem and performs well on a variety of problems [21]. In the LSTM network, three gates determine the behavior of the memory unit: the input gate, the output gate, and the forgetting gate, which control whether data are updated or discarded. By mitigating the excessive weight influence and vanishing gradients that affect general recurrent neural models, the network can converge more quickly and efficiently, which can improve prediction accuracy. In addition to the hidden-layer neurons, the LSTM network has a memory cell c_t, which encodes the information recorded up to time step t.

The process layer equations of the LSTM network are shown below, and the LSTM schematic is shown in Figure 1.

Input gate:

i_{t} = \operatorname{sigmoid}(V_{i} z_{t-1} + W_{i} x_{t} + b_{i}) \quad (1)

Forgetting gate:

f_{t} = \operatorname{sigmoid}(V_{f} z_{t-1} + W_{f} x_{t} + b_{f}) \quad (2)

Output gate:

o_{t} = \operatorname{sigmoid}(V_{o} z_{t-1} + W_{o} x_{t} + b_{o}) \quad (3)

Memory cell:

\tilde{c}_{t} = \operatorname{sigmoid}(V_{c} z_{t-1} + W_{c} x_{t} + b_{c}) \quad (4)

c_{t} = f_{t} \otimes c_{t-1} + i_{t} \otimes \tilde{c}_{t} \quad (5)

Final output:

z_{t} = o_{t} \otimes \tanh(c_{t}) \quad (6)

where x_t and z_t are the input and output of the LSTM layer at time t; W_i, W_f, W_o, and W_c are the input weight matrices of the input gate, forgetting gate, output gate, and memory cell, respectively; V_i, V_f, V_o, and V_c are the recurrent weight matrices of the input gate, forgetting gate, output gate, and memory cell, respectively; and b_i, b_f, b_o, and b_c are the bias terms of the input gate, forgetting gate, output gate, and memory cell, respectively. The operator \otimes represents element-wise multiplication.

Each time step of the LSTM neural network proceeds as follows. Step 1: at time step t, the forgetting gate f_{t} is computed from the new input x_{t} and the previous hidden state z_{t-1}. If the forgetting gate value is close to 1, the information from the last memory cell c_{t-1} is retained; if it is close to 0, that information is discarded. Step 2: the input gate i_{t}, also a function of the new input and the previous hidden state, scales the candidate state, which is added to the retained memory to obtain the new memory cell c_{t}. Step 3: the output gate decides what information should be read from the LSTM memory cell to produce the new hidden state z_{t}.
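As an illustration of Equations (1) to (6) and the three steps above, the following is a minimal sketch of one time step of a single-unit LSTM cell in plain Python. It is not the authors' implementation: the function names, the parameter dictionary `p`, and the scalar (one input, one hidden unit) simplification are assumptions made for readability. Equation (4) as printed applies the sigmoid to the candidate state, whereas the commonly used LSTM formulation applies tanh, so the candidate activation is left as an argument.

```python
import math


def sigmoid(a):
    """Logistic sigmoid used by the three gates."""
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(x_t, z_prev, c_prev, p, candidate_activation=math.tanh):
    """One time step of a single-unit LSTM cell (scalar sketch of Eqs. 1-6).

    p is a hypothetical dict holding scalar weights V_*, W_* and biases b_*.
    Pass candidate_activation=sigmoid to follow Eq. (4) exactly as printed.
    """
    i_t = sigmoid(p["V_i"] * z_prev + p["W_i"] * x_t + p["b_i"])  # input gate, Eq. (1)
    f_t = sigmoid(p["V_f"] * z_prev + p["W_f"] * x_t + p["b_f"])  # forgetting gate, Eq. (2)
    o_t = sigmoid(p["V_o"] * z_prev + p["W_o"] * x_t + p["b_o"])  # output gate, Eq. (3)
    c_tilde = candidate_activation(
        p["V_c"] * z_prev + p["W_c"] * x_t + p["b_c"]
    )  # candidate memory, Eq. (4)
    c_t = f_t * c_prev + i_t * c_tilde  # memory update, Eq. (5)
    z_t = o_t * math.tanh(c_t)  # new hidden state, Eq. (6)
    return z_t, c_t


# Illustrative parameters only: all weights and biases set to zero.
params = {name: 0.0 for name in (
    "V_i", "W_i", "b_i", "V_f", "W_f", "b_f",
    "V_o", "W_o", "b_o", "V_c", "W_c", "b_c",
)}
z_t, c_t = lstm_step(x_t=1.0, z_prev=0.0, c_prev=2.0, p=params)
```

With all parameters at zero every gate equals 0.5, so half of the previous memory is kept. Raising `b_f` pushes the forgetting gate toward 1 and the previous memory is retained almost entirely, which is the behavior described in Step 1.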

2.2. Multivariate Epidemic Prediction Model with Coupled Human Factors

This paper proposes a multivariate LSTM (M-LSTM) epidemic prediction model that combines correlation and lag between human factors to address actual changes in the epidemic. Figure 2 illustrates the topology of the network structure of the model, which consists of three layers: input, hidden, and output.

According to the M-LSTM model structure, the model training input set needs to be determined first. The three-dimensional input array is formed as [N, T, Var]. Among them, N is the size of the input samples. The variables (Var) of the input layer are selected by calculating correlations between confirmed cases and deaths with the four human influence factors. Then, considering correlations and lags between variables, multivariate forecasts based on correlation thresholds were constructed. For example, the M-LSTM (R_thred > 0.7) indicated that human-influenced variables with correlation thresholds greater than 0.7 were to be screened as inputs. The rolling time window method was used to generate time series samples. The time window T was set to 7 days (the historical data from the first 7 days were used as model inputs (denoted as ts01, ts02, …, ts07), and the future day sample values were the target outputs). Additionally, alignment time was determined based on the time lag between the four human intervention factors and the number of confirmed cases and fatalities. Subsequently, the hidden layer was composed of a 2-layer LSTM structure consisting of 64 neuron nodes in the first layer and 32 neuron nodes in the second layer. To mitigate the overfitting phenomenon during model training, the Dropout algorithm was added to the hidden layer to remove random units and connections from the neural network during training. The mean absolute error (MAE) was selected as the loss function, and the Adam algorithm was used to create optimized parameters for each node’s learning; then, the error was reduced by iterating and adjusting the weights until convergence was achieved. Finally, the prediction results were provided by the output layer, and the inverse normalization process reduced the results to the format of the original data.
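The rolling-window construction of the [N, T, Var] input array can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `make_windows`, the nested-list representation, and the way the lag is applied (each intervention index is shifted back by `lag` days relative to the case series) are choices made for this example. The text only states that the alignment time was based on the observed lag.

```python
def make_windows(target, drivers, window=7, lag=0):
    """Build rolling-window samples for one-step-ahead forecasting.

    target  : list of floats, e.g. daily confirmed cases per 100,000
    drivers : list of lists, one per intervention index, same length as target
    window  : number of past days T used as input (7 in the text)
    lag     : assumed alignment shift in days; the driver value from day
              s - lag is paired with the target value from day s

    Returns (X, y) where X is a nested list of shape [N, T, Var] and
    y holds the next-day target for each sample.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    for series in drivers:
        if len(series) != len(target):
            raise ValueError("all series must have the same length")

    X, y = [], []
    # The first usable forecast day needs `window` past target values and
    # `window` driver values shifted back by `lag` days.
    for day in range(window + lag, len(target)):
        sample = []
        for s in range(day - window, day):
            row = [target[s]] + [series[s - lag] for series in drivers]
            sample.append(row)
        X.append(sample)
        y.append(target[day])
    return X, y


# Synthetic illustration: 20 days of data, one intervention index, 3-day lag.
cases = [float(d) for d in range(20)]
index = [100.0 + d for d in range(20)]
X, y = make_windows(cases, [index], window=7, lag=3)
```

Here `len(X)` is N, each sample has T = 7 rows, and each row has Var = 2 entries (the case value plus one lag-aligned intervention index).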

2.3. Data Collection

2.3.1. COVID-19 Data Collection

Prediction model accuracy depends on historical data, so sufficient historical data on the outbreak are required. In this study, we collected data from the open dataset Our World in Data (COVID-19 cases), which contains global daily data from the European Centre for Disease Prevention and Control (ECDC) [22]. As a representative sample of the epidemic situation in the Americas, Asia, and Europe, we selected three countries whose populations have been most heavily affected by the COVID-19 epidemic. These countries were the United States, the United Kingdom, and India. For these three countries, daily confirmed cases and deaths per 100,000 population were collected from 1 April 2020 to 1 April 2022. The calculation formulas were as follows:

\text{confirmed cases per 100 thousand} = \frac{\text{confirmed cases per day}}{\text{total population}} \times 100{,}000 \quad (7)

\text{deaths per 100 thousand} = \frac{\text{deaths per day}}{\text{total population}} \times 100{,}000 \quad (8)
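Equations (7) and (8) apply the same per-capita scaling to cases and deaths. A minimal sketch, with a hypothetical function name and made-up numbers, is:

```python
def per_100k(daily_count, total_population):
    """Scale a daily count to a rate per 100,000 population (Eqs. 7 and 8)."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return daily_count / total_population * 100_000


# Made-up example: 500 new cases in a population of one million.
rate = per_100k(500, 1_000_000)
```

Normalizing by population makes the three countries' series comparable in magnitude, which matters when the same model configuration is applied to each of them.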

2.3.2. Human Intervention Data Collection

In this paper, data regarding human intervention measures for COVID-19 were collected through the Oxford COVID-19 Government Response Tracker (OxCGRT) [20]. OxCGRT provides a systematic, longitudinal measure of government responses to COVID-19 since 1 January 2020. Through standardized indicators, the project monitors national and subnational governments’ policies and interventions. It also creates a suite of composite indices to quantify the extent of these responses. The data mainly cover public information on 20 indicators of government response to COVID-19. Of these, C1–C8 record containment and closure policies and consist of 8 indicators: school suspension, work stoppage, public event cancellation, assembly restriction, public transportation closure, home quarantine, domestic travel restriction, and international travel control. E1–E4 record economic policies and consist of 4 indicators: citizens’ income support, debt and contract relief, fiscal measures, and foreign aid. H1–H7 record health system policies, including public information campaigns, COVID-19 screening, close contact tracing, emergency investments in health care services and vaccines, facial protection, and recent vaccination policies.

Various combinations of these indicators provide four composite indices that reflect the intervention policies in place in a given area. Each index comprises a number of individual policy indicators, as shown in Table 1. For each indicator, an ordinal score was calculated, with half a point deducted for policies that were more targeted than generic. Each value was scaled by its maximum value to produce a score from 0 to 100, with missing values contributing 0, to produce the overall index. This calculation is described as follows:

\mathrm{index} = \frac{1}{k} \sum_{j=1}^{k} I_{j} \quad (9)

where k is the number of component indices in a composite index and I_j is the score of a single index.

Each single score I for each indicator j on each day was calculated as follows:

I_{j,t} = 100 \times \frac{v_{j,t} - 0.5 (F_{j} - f_{j,t})}{N_{j}} \quad (10)

where N_j is the maximum value of indicator j; F_j indicates whether the indicator has a flag variable (F_j = 1 if it does, or 0 if it does not); v_{j,t} is the recorded policy value on the ordinal scale; and f_{j,t} is the recorded binary flag for that indicator.
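A short sketch of Equations (9) and (10) may help make the flag adjustment concrete. The function names and example values are hypothetical. Missing values contribute 0, as stated in the text. Clamping a negative score to 0 (which Equation (10) would otherwise produce for a flagged indicator with a recorded value of 0) is an assumption of this sketch that the text does not state.

```python
def sub_index_score(value, max_value, has_flag, flag=0):
    """Score of one indicator on one day, following Eq. (10).

    value     : recorded ordinal policy value v, or None if missing
    max_value : maximum value N of the indicator
    has_flag  : True if the indicator has a flag variable (F = 1)
    flag      : recorded binary flag f (1 = general, 0 = targeted)
    """
    if value is None:
        return 0.0  # missing values contribute 0, as stated in the text
    F = 1 if has_flag else 0
    f = flag if has_flag else 0
    score = 100.0 * (value - 0.5 * (F - f)) / max_value
    return max(score, 0.0)  # assumption: never report a negative score


def composite_index(scores):
    """Simple average of the component scores, following Eq. (9)."""
    return sum(scores) / len(scores)


# Hypothetical day with three indicators.
targeted = sub_index_score(2, 3, has_flag=True, flag=0)  # half a point deducted
general = sub_index_score(2, 3, has_flag=True, flag=1)   # no deduction
missing = sub_index_score(None, 4, has_flag=False)       # contributes 0
index_value = composite_index([targeted, general, missing])
```

In this example the targeted policy scores 50 while the same policy applied generally scores about 66.7, which is the half-point deduction described above.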

Finally, these indicators were aggregated into four composite indices. The first was the overall government response index (GRI), which recorded how government response measures across all indicators became stronger or weaker over the course of an outbreak; the second was the containment and health index (CHI), which combined measures such as lockdown restrictions, testing policies, contact tracing, short-term investments in health care, and investments in vaccines; the third was the economic support index (ESI), which recorded indicators such as income support and debt relief; and the fourth was the original policy intensity index, the stringency index (SI), which captured the stringency of lockdown policies that primarily restricted people’s behavior. In this paper, we collected four indexes (GRI, CHI, SI, and ESI) from 1 March 2021 to 31 December 2022 in three countries, namely, the USA, the UK, and India, to represent the human impact factors.

3. Experiment

The experiments were divided into two parts: an analysis of the relationship between human-influenced factors and epidemic changes, and an analysis of AI model predictions. The first experiment examined the correlations and lags between the four human-influenced factors and the number of confirmed epidemic cases and deaths in order to develop a multivariate epidemic prediction model. The second experiment compared the performance of the univariate and multivariate epidemic prediction models after incorporating human-influenced factors. We used Python 3.7 as the programming language and TensorFlow as the algorithm platform, running on an Intel i7 CPU with 8 GB of RAM.

3.1. Evaluation Metrics

In order to assess the predictive validity of the proposed model, three widely used error evaluation metrics, namely, mean absolute error (MAE), root mean square error (RMSE), and goodness of fit (R²), were selected as error metrics to measure the degree of deviation of the predicted values from the actual values, calculated as follows:

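\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} | y_{n} - \hat{y}_{n} | \quad (11)

\mathrm{RMSE} = \left( \frac{1}{N} \sum_{n=1}^{N} (y_{n} - \hat{y}_{n})^{2} \right)^{1/2} \quad (12)

R^{2} = 1 - \frac{\sum_{n=1}^{N} (y_{n} - \hat{y}_{n})^{2}}{\sum_{n=1}^{N} (y_{n} - \bar{y})^{2}} \quad (13)

where y_n is the actual value, \hat{y}_n is the predicted value, \bar{y} is the mean of the actual values, and N is the number of samples.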
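The three metrics can be computed directly from paired lists of actual and predicted values. The sketch below uses hypothetical function names and the usual squared-residual definition of R²; the numbers are made up for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Goodness of fit: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


# Made-up values: a constant forecast scores worse than the mean of the data.
actual = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 2.0, 2.0, 2.0]
scores = (mae(actual, forecast), rmse(actual, forecast), r2(actual, forecast))
```

Note that R² can be negative when a model fits worse than a constant prediction at the mean of the actual values. This matters when relative improvements in R² are reported, because a baseline R² near zero makes percentage changes very large.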

3.2. Results of Correlation and Lag between Human Influences and COVID-19 Epidemic

The Pearson correlation coefficients between the epidemic data and the current indicators of human influences in the three selected countries are shown in Table 2. The results show that the correlation coefficients between the four human influence indicators and the number of confirmed cases and deaths were statistically significant (p < 0.05), suggesting that there is a significant correlation between epidemic changes and epidemic prevention policy indicators. Specifically, the government response index, stringency index, containment and health index, and economic support index were negatively correlated with the number of confirmed cases and deaths in three countries. Accordingly, government prevention and control measures, including response, policy intensity, lockdown restrictions, testing policies, contact tracing, vaccine investments, and economic support contributed to reducing the number of confirmed cases and deaths. In terms of overall correlation with the epidemic data, the economic support index had the highest correlation, followed by the stringency index and the government response index. Moreover, there were differences among the countries in the correlations between the four types of indices and the number of confirmed cases and deaths, reflecting the effects of different types of epidemic prevention and control policies.

The cross-correlation function (CCF) was used to calculate the lagged correlation between human interventions and the COVID-19 epidemic. The CCF measures the degree of correlation between two time series at two different moments. Given two time series, x_t and y_t (t = 1, 2, 3, …), the correlation between x at moment t and y at moment t + n is the nth-order cross-correlation, which is given by the following equation:

ccf_{n} = \frac{\sum (x_{t} - \bar{x}_{t}) (y_{t+n} - \bar{y}_{t+n})}{\sqrt{\sum (x_{t} - \bar{x}_{t})^{2} \sum (y_{t+n} - \bar{y}_{t+n})^{2}}} \quad (14)
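A minimal sketch of the lagged correlation in Equation (14) follows. The function names are hypothetical, and one detail is an assumption: the means are taken over the overlapping segments of the two series at each lag, which is one common convention that the text does not specify. The helper `best_lag` simply picks the lag with the largest absolute correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def cross_correlation(x, y, lag):
    """Correlation between x at time t and y at time t + lag (lag >= 0)."""
    if lag < 0 or lag > len(x) - 2:
        raise ValueError("lag must satisfy 0 <= lag <= len(x) - 2")
    # Pair x[0], ..., x[-lag-1] with y[lag], ..., y[-1].
    return pearson(x[: len(x) - lag], y[lag:])


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest absolute cross-correlation."""
    return max(range(max_lag + 1), key=lambda n: abs(cross_correlation(x, y, n)))


# Synthetic illustration: y repeats x with a 2-step delay.
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0, 9.0, 7.0, 9.0]
y = [0.0, 0.0] + x[:-2]
lag_found = best_lag(x, y, max_lag=5)
```

In the setting of this paper, x would be an intervention index and y the confirmed cases or deaths, so a strong negative correlation at a positive lag indicates that the epidemic responds to the intervention with a delay.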

Figure 3, Figure 4 and Figure 5 show the lagged correlations between the four different human factor indices and the number of confirmed cases and deaths in the three countries. It can be seen that lagged effects existed between the four different human intervention factors and the epidemic case data in all three countries. With the increasing number of lag days, the negative correlation between the four human factors and confirmed cases and deaths increased. This demonstrated that human intervention was not immediately effective in controlling the epidemic and took some time to become so. The correlation coefficients generally increased rapidly as the lag grew from 0 to 15 days, and then leveled off as the lag increased further. The results indicate that human interventions produced better results around 15 days after they were implemented.

Moreover, there were differences in the lagged effects between the four different human intervention indices and COVID-19 cases across the three countries. In the United States, there were significant lags between the four human intervention factors and the number of confirmed cases and deaths, with an average lag period of between 15 and 30 days; in the United Kingdom, there were notable lags between the four human intervention factors and the number of deaths, but insignificant lags with the number of confirmed cases, with no lagged correlation between the economic support index and confirmed cases. In India, the government response index, stringency index, and containment and health index all had significant lags with confirmed cases and deaths of between 9 and 12 days. Conversely, the economic support index had a substantial impact on the number of confirmed cases and deaths, indicating that early government support helped to control the outbreak.

3.3. Evaluation Results of the Prediction Effectiveness of Multivariate LSTM Models Coupled with Human Influences

To forecast the trend of the COVID-19 epidemic, a multivariate data-driven epidemic prediction model was developed based on the correlation and lag relationship analysis. The input layer used a variable-selection scheme based on the correlations and lags between the human-influenced factors and the confirmed cases and deaths. In the experiment, the data were first divided into two parts: 80% formed the training set and the remaining 20% formed the test set. The historical time interval T of the input data was 7, the model was trained for 50 epochs, and the batch size was 8.
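The 80/20 split and the normalization step mentioned earlier can be sketched as below. Two points are assumptions of this sketch rather than statements from the text: the split is chronological (earlier data for training, later data for testing), and the normalization is min-max scaling fitted on the training portion only. The names `chronological_split` and `MinMaxScaler1D` are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series into an earlier training part and a later test part."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


class MinMaxScaler1D:
    """Min-max scaler fitted on training data, with an inverse transform."""

    def __init__(self, values):
        self.low = min(values)
        self.high = max(values)
        if self.high == self.low:
            raise ValueError("cannot scale a constant series")

    def transform(self, values):
        span = self.high - self.low
        return [(v - self.low) / span for v in values]

    def inverse(self, values):
        span = self.high - self.low
        return [v * span + self.low for v in values]


# Made-up series of ten daily values.
series = [float(v) for v in range(10, 20)]
train, test = chronological_split(series)
scaler = MinMaxScaler1D(train)
scaled_test = scaler.transform(test)
restored = scaler.inverse(scaled_test)  # back to the original scale
```

Fitting the scaler on the training part alone avoids leaking information from the test period into training. Scaled test values can then fall outside [0, 1], which is expected when case numbers keep rising.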

First, three sets of comparative models were developed, including univariate prediction, multivariate prediction, and multivariate prediction considering a correlation threshold filter for human influences. For the univariate case, we used the number of confirmed cases and deaths time series (inputs) to predict the number of confirmed cases and deaths time series (expected outputs), respectively. C (confirmed cases) and D (death cases) were used to represent the univariate inputs to the number of confirmed cases and deaths. Data from the past 7 days were used to forecast the value 1 day in the future using a rolling forecast. For the multivariate case, more variables were gradually added to the input set to form different coupling schemes. The change in the predictive power of the model was then observed experimentally.

The four input coupling schemes for predicting confirmed cases were: confirmed cases and government response index (C + GRI); confirmed cases, government response index, and stringency index (C + GRI + SI); confirmed cases, government response index, stringency index, and containment and health index (C + GRI + SI + CHI); and confirmed cases, government response index, stringency index, containment and health index, and economic support index (C + GRI + SI + CHI + ESI). The four input coupling schemes for death cases were: deaths and government response index (D + GRI); deaths, government response index, and stringency index (D + GRI + SI); deaths, government response index, stringency index, and containment and health index (D + GRI + SI + CHI); and deaths, government response index, stringency index, containment and health index, and economic support index (D + GRI + SI + CHI + ESI). Then, considering the correlations and lags between variables, multivariate forecasts based on correlation thresholds were constructed. For example, the M-LSTM _{(R_thred > 0.7)} indicated that variables with correlation thresholds greater than 0.7 were screened as inputs. Since the situation varied from country to country, the most appropriate threshold was selected based on the results of multiple experiments. Table 3, Table 4 and Table 5 compare the errors of prediction models constructed with different input variables for each of the three countries. Figure 6 shows the curves of the training data and the test data (prediction data) of all the comparative models for the three countries. The specific analysis was conducted as follows.
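The correlation-threshold screening behind M-LSTM (R_thred > 0.7) can be illustrated with a few lines of Python. The correlation values below are invented for illustration and are not taken from the paper's tables. The sketch compares the absolute value of each coefficient with the threshold because the reported correlations are negative; whether the authors used absolute values is an assumption.

```python
def screen_variables(correlations, threshold=0.7):
    """Keep the variables whose absolute correlation exceeds the threshold.

    correlations : dict mapping a variable name to its (lagged) correlation
                   with the target series
    """
    return [name for name, r in correlations.items() if abs(r) > threshold]


# Hypothetical lagged correlations with confirmed cases (not from the paper).
lagged_r = {"GRI": -0.82, "SI": -0.75, "CHI": -0.40, "ESI": -0.91}
selected = screen_variables(lagged_r, threshold=0.7)
```

With these invented values the screening keeps GRI, SI, and ESI and drops CHI, so the model input would be the target series plus three intervention indices.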

A comparative analysis of univariate and multivariate forecasting effects indicated that multivariate forecasting methods showed significantly better forecasting performances than univariate methods in most cases in the three countries. The goodness-of-fit R² of the five multivariate forecasting methods was significantly higher than that of the univariate forecasting methods. For instance, compared with univariate forecasting for the number of confirmed cases (C) in the United States, the R² of the multivariate forecasting methods C + GRI, C + GRI + SI, C + GRI + SI + CHI, C + GRI + SI + CHI + ESI, and M-LSTM improved by 19.80%, 41.76%, 45.23%, 51.02%, and 48.93%, respectively. Compared with univariate forecasting for deaths (D) in the United States, D + GRI, D + GRI + SI, D + GRI + SI + CHI, D + GRI + SI + CHI + ESI, and M-LSTM improved R² by 59.03%, 59.99%, 104.63%, 80.74%, and 107.19%, respectively. These results suggest that incorporating human influencing factors into the LSTM model can significantly improve its ability to predict epidemics.
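The text does not state how the percentage improvements in R² were computed. A plausible reading, used here only as an assumption, is the relative change with respect to the baseline model. The function name and the numbers below are hypothetical.

```python
def relative_improvement(baseline, improved):
    """Relative change of a metric with respect to its baseline, in percent."""
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (improved - baseline) / abs(baseline) * 100.0


# Hypothetical R^2 values, not taken from the paper's tables.
moderate = relative_improvement(0.50, 0.75)  # a 50% relative improvement
inflated = relative_improvement(0.05, 0.50)  # a small baseline inflates the figure
```

Under this reading, a baseline R² close to zero produces very large percentages, so improvements of several hundred percent say more about a weak baseline than about the absolute quality of the better model.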

The predictive effects of the coupled models with different input variables were compared and analyzed. The synthesis in Table 3, Table 4 and Table 5 shows that the average predictive performance of the multivariate model improved as more input variables were added, but the situation varied slightly for different countries. In the United States, for example, the MAE and RMSE continued to decrease and the R² continued to increase with the addition of input variables when predicting confirmed cases and deaths. However, for the UK and India, an increase in input variables did not necessarily result in an improvement in prediction performance. For example, D + GRI + SI achieved the best prediction performance and D + GRI + SI + CHI achieved the worst for the prediction of deaths in the UK; in India, C + GRI + SI achieved the best and C + GRI + SI + CHI achieved the worst outcomes for the prediction of confirmed cases. As a result, coupling more input variables generally improved prediction performance, but coupling some individual input variables resulted in less accurate predictions.

A comparison of the prediction effects of multivariate methods screened using correlation was conducted. Correlation-based M-LSTM methods consistently outperformed other prediction methods in predicting confirmed cases and deaths. For example, the mean R² for the M-LSTM improved by 31.75%, 18.82%, 32.44%, and 16.56% in comparison to C + GRI, C + GRI + SI, C + GRI + SI + CHI, and C + GRI + SI + CHI + ESI, respectively, in forecasting the number of confirmed cases in the three countries. Comparing the M-LSTM to D + GRI, D + GRI + SI, D + GRI + SI + CHI, and D + GRI + SI + CHI + ESI, the mean R² of the M-LSTM increased by 50.45%, 14.45%, 901.76%, and 28.19%, respectively, in predicting the number of death cases in the three countries. Further observations from Figure 6 show similar conclusions. For instance, Figure 6 shows that M-LSTM was closer to the actual values than the other models. Specifically, in predicting the number of confirmed and death cases in the UK, M-LSTM yielded the best-fitting results with the actual values, indicating its superior forecasting capability relative to the other models. It can be concluded that the M-LSTM proposed in this paper can achieve a superior prediction performance compared with the other models.

4. Conclusion and Discussion

In recent years, research on the application of AI technology to infectious disease prediction has become increasingly popular. Especially since the outbreak of COVID-19, many researchers have developed various AI models to simulate the spread and development of the epidemic in order to assist government agencies in preparing and formulating countermeasures in advance. However, most of the existing studies have ignored the impact of human influences on epidemic prediction and failed to effectively integrate human influences into epidemic prediction. In addition, most of these studies used univariate prediction models, which are more common in epidemiology. Thus, this study developed a multivariate machine learning epidemic prediction model that incorporated human intervention factors and investigated the logical relationship between human factors and epidemic cases. An empirical study was conducted to analyze the correlation and lag between human factors and epidemic changes using the COVID-19 epidemic as an example. In addition, several multivariate machine learning epidemic prediction models were developed and compared in terms of their predictive performance.

Based on the above research work, the major findings of the paper are as follows.

(1) There is a correlation between epidemic change and human influences. It was found that the government response index, stringency index, containment and health index, and economic support index were all negatively correlated with confirmed cases and deaths. Furthermore, the correlation between the four types of human intervention factors and the number of confirmed cases and deaths varied between the countries, reflecting the different effects of the various epidemic prevention policies implemented in different regions.

(2) There was a significant lag between the four types of human intervention factors and the number of confirmed cases and deaths. The negative correlations between the four human factors and the number of confirmed cases and deaths increased with the increase in lag days. This indicates that epidemic control measures were not effective immediately after the implementation of human interventions, but required a period of time to take effect. Correlation coefficients generally increased fastest at lag times of between 0 and 15 days, indicating that human interventions were more effective about 15 days after they were implemented.

(3) The M-LSTMs proposed in this study are superior to univariate predictions. The addition of human factor variables can improve prediction performance, but the addition of individual factors can also lead to poor prediction results. The proposed multivariate prediction method with correlation screening in this paper obtained the highest prediction performance among all test cases.

Author Contributions

Conceptualization, methodology, and original draft preparation, Z.Q.; investigation and data curation, B.Z.; writing (review and editing) and supervision, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (72004086) and the Fundamental Research Funds for the Central Universities (lzujbky-2022-kb09 and 22lzujbkydx011).

Data Availability Statement

The data were collected from the open datasets of Our World in Data (https://ourworldindata.org/covid-cases (accessed on 15 February 2023)) and the Oxford COVID-19 Government Response Tracker (https://github.com/OxCGRT (accessed on 15 February 2023)).

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. LSTM schematic.

Figure 2. M-LSTM model structure.

Figure 3. Lagged relationship between human interventions and US epidemic data.

Figure 4. Lagged relationship between human interventions and UK epidemic data.

Figure 5. Lagged relationship between human interventions and Indian epidemic data.

Figure 6. Comparison of predicted and actual values of different models in three countries.

Table 1. OxCGRT index composition.

Index Name	Government Response Index	Containment and Health Index	Stringency Index	Economic Support Index
C1	x	x	x
C2	x	x	x
C3	x	x	x
C4	x	x	x
C5	x	x	x
C6	x	x	x
C7	x	x	x
C8	x	x	x
E1	x			x
E2	x			x
E3
E4
H1	x	x	x
H2	x	x
H3	x	x
H4
H5
H6	x	x
H7	x	x
H8	x	x

(x indicates that an indicator contributes to that index).

Table 2. Pearson correlation coefficients of human indicators and epidemic data.

Country	Data Type	Government Response Index	Stringency Index	Containment and Health Index	Economic Support Index
USA	Confirmed cases	−0.318 **	−0.430 **	0.011	−0.739 **
USA	Deaths	−0.284 **	−0.460 **	0.022	−0.683 **
UK	Confirmed cases	−0.642 **	−0.720 **	−0.531 **	−0.766 **
UK	Deaths	−0.324 **	−0.526 **	−0.226 **	−0.495 **
India	Confirmed cases	−0.402 **	−0.434 **	−0.190 **	−0.795 **
India	Deaths	−0.434 **	−0.470 **	−0.228 **	−0.788 **

(Note: ** p < 0.01).
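
The correlation screening used by the M-LSTM variants (the R_thred thresholds in Tables 3–5) can be illustrated with the coefficients above. This is a hedged sketch only: it assumes that a factor is kept when the absolute value of its unlagged Pearson coefficient exceeds the threshold, which may differ from the authors' exact rule (for example, lagged coefficients might be used instead). The function name `screen_factors` is hypothetical; the abbreviations GRI, SI, CHI, and ESI follow Tables 3–5.

```python
# Pearson coefficients for confirmed cases, copied from Table 2.
TABLE2_CONFIRMED = {
    "USA": {"GRI": -0.318, "SI": -0.430, "CHI": 0.011, "ESI": -0.739},
    "UK": {"GRI": -0.642, "SI": -0.720, "CHI": -0.531, "ESI": -0.766},
    "India": {"GRI": -0.402, "SI": -0.434, "CHI": -0.190, "ESI": -0.795},
}


def screen_factors(correlations, threshold):
    """Keep factors whose |r| exceeds the threshold (assumed screening rule)."""
    return [name for name, r in correlations.items() if abs(r) > threshold]


# Thresholds taken from the confirmed-case rows of Tables 3-5.
for country, threshold in [("USA", 0.4), ("UK", 0.7), ("India", 0.4)]:
    kept = screen_factors(TABLE2_CONFIRMED[country], threshold)
    print(country, threshold, kept)
```

Under this assumed rule, the UK threshold of 0.7 would retain only the stringency and economic support indices, whereas the 0.4 threshold for India would also retain the government response index.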

Table 3. Comparison of multivariate model prediction results in the USA.

Dataset	Input Variables	MAE	RMSE	R²
Confirmed cases	C	24,017.09	16,754.50	0.642
	C + GRI	19,280.49	15,318.06	0.769
	C + GRI + SI	12,011.64	10,755.52	0.911
	C + GRI + SI + CHI	10,408.69	9229.33	0.933
	C + GRI + SI + CHI + ESI	6960.93	5311.93	0.970
	M-LSTM (R_thred > 0.4)	8370.55	6751.52	0.957
Deaths	D	161.50	160.10	0.478
	D + GRI	109.36	107.05	0.761
	D + GRI + SI	108.30	91.73	0.765
	D + GRI + SI + CHI	32.36	28.48	0.979
	D + GRI + SI + CHI + ESI	82.24	73.49	0.865
	M-LSTM (R_thred > 0.4)	20.87	15.08	0.991
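
Input combinations such as "C + GRI + SI" imply that each training sample holds several days of the epidemic series alongside the selected intervention indices. The sketch below shows one plausible way to assemble such samples with NumPy. It is not the authors' pipeline: the function name `make_windows`, the 7-day window, and the single 15-day shift applied to every factor are assumptions chosen for illustration. The output uses the (samples, time steps, features) layout that recurrent layers commonly expect.

```python
import numpy as np


def make_windows(target, factors, window=7, lag=15):
    """Build one-step-ahead samples from a target series and lagged factors.

    Day t of the target is paired with the factor values from day t - lag.
    X has shape (samples, window, 1 + n_factors); y holds the target value
    on the day immediately after each window.
    """
    target = np.asarray(target, dtype=float)
    factors = np.asarray(factors, dtype=float)
    if factors.ndim == 1:
        factors = factors[:, None]
    if len(target) != len(factors):
        raise ValueError("target and factors must cover the same days")
    if lag < 0:
        raise ValueError("lag must be non-negative")

    total_days = len(target)
    aligned_target = target[lag:]                    # days lag .. T-1
    aligned_factors = factors[: total_days - lag]    # days 0 .. T-lag-1
    features = np.column_stack([aligned_target, aligned_factors])

    n_samples = len(features) - window
    if n_samples <= 0:
        raise ValueError("series too short for this window and lag")
    X = np.stack([features[i : i + window] for i in range(n_samples)])
    y = aligned_target[window:]
    return X, y


# Toy series: 40 days of a target and one factor (illustrative values only).
days = np.arange(40.0)
X, y = make_windows(days, days * 10, window=7, lag=15)
print(X.shape, y.shape)
```

Using a separate lag per factor, or feeding the unshifted factors and letting the network learn the delay, are equally reasonable designs; the fixed shift here simply mirrors the roughly 15-day lag reported in the findings.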

Table 4. Comparison of multivariate model prediction results in the UK.

Dataset	Input Variables	MAE	RMSE	R²
Confirmed cases	C	31,945.93	23,168.87	0.696
	C + GRI	35,955.69	27,135.31	0.615
	C + GRI + SI	34,315.57	24,584.63	0.650
	C + GRI + SI + CHI	31,845.91	25,464.80	0.698
	C + GRI + SI + CHI + ESI	28,594.80	23,038.30	0.757
	M-LSTM (R_thred > 0.7)	12,572.57	10,922.31	0.953
Deaths	D	108.79	105.49	0.065
	D + GRI	61.08	40.12	0.705
	D + GRI + SI	36.46	32.90	0.895
	D + GRI + SI + CHI	110.47	82.54	0.036
	D + GRI + SI + CHI + ESI	55.33	46.36	0.758
	M-LSTM (R_thred > 0.5)	28.62	22.13	0.935

Table 5. Comparison of multivariate model prediction results in India.

Dataset	Input Variables	MAE	RMSE	R²
Confirmed cases	C	1896.81	1658.97	0.514
	C + GRI	1087.70	901.78	0.840
	C + GRI + SI	712.68	654.50	0.931
	C + GRI + SI + CHI	1686.22	1468.51	0.616
	C + GRI + SI + CHI + ESI	1277.73	1104.02	0.780
	M-LSTM (R_thred > 0.4)	426.37	343.09	0.975
Deaths	D	13.65	13.38	0.005
	D + GRI	9.46	9.41	0.522
	D + GRI + SI	4.35	4.09	0.899
	D + GRI + SI + CHI	11.51	11.18	0.293
	D + GRI + SI + CHI + ESI	7.85	7.31	0.671
	M-LSTM (R_thred > 0.4)	1.77	1.39	0.983
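
For readers who wish to compute the three metrics reported in Tables 3–5 on their own forecasts, the sketch below uses the standard textbook definitions of MAE, RMSE, and the coefficient of determination. It assumes these match the definitions used in the paper, which are not restated in this section; the function name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 under their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))

    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Small worked example: only the last forecast is off, by 2.
metrics = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
print(metrics)
```

In the worked example the absolute errors are 0, 0, 0, and 2, giving MAE = 0.5 and RMSE = 1.0; the residual sum of squares is 4 against a total sum of squares of 5, so R² = 0.2.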

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

