Temporal Relationship between Daily Reports of COVID-19 Infections and Related GDELT and Tweet Mentions


Abstract


## 1. Introduction

- Assess time-lagged relationships between new COVID-19 cases and the number of COVID-19-related GDELT articles and tweets in selected countries using cross-correlation analysis.
- Identify anomalies and their causes on days with abnormally high COVID-19-related responses on GDELT and Twitter but low numbers of new COVID-19 cases.

## 2. Literature Review

## 3. Materials and Methods

#### 3.1. Data Sources

#### 3.1.1. New Daily COVID-19 Infections

#### 3.1.2. Twitter

#### 3.1.3. GDELT

#### 3.2. Data Preprocessing

#### 3.3. Cross-Correlation Analysis

#### 3.3.1. Step 1: Time Series Decomposition

#### 3.3.2. Step 2: Time Series Transformation and Differencing

#### 3.3.3. Step 3: Fitting an ARIMA Model to the Input Series

Seasonal (P, D, Q)_{s} terms are included, where P, D, Q, and s represent the seasonal AR term, differencing order, MA term, and seasonality, respectively, as in ARIMA (p, d, q) (P, D, Q)_{s}. An ARMA (autoregressive moving average) model of order (p, q) consists of p AR and q MA terms, as shown in Equation (2), where $\{Z_t\}$ is a purely random process with mean zero and variance $\sigma_z^2$, and $\alpha_1, \dots, \alpha_p$ and $\beta_1, \dots, \beta_q$ are the autoregressive and moving average coefficients, respectively [46].
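For reference, a standard textbook form of an ARMA (p, q) process, written with the symbols defined above, is sketched below. The variable name $X_t$ for the observed series is an assumption, and the layout may differ from the authors' Equation (2).

$$
X_t = \alpha_1 X_{t-1} + \dots + \alpha_p X_{t-p} + Z_t + \beta_1 Z_{t-1} + \dots + \beta_q Z_{t-q}
$$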

^{d} equal to the d-th nonseasonal difference, so that an ARIMA process of order (p, d, q) can be formulated as in Equation (5).
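As a hedged illustration, an ARIMA (p, d, q) process is commonly written in terms of the backward shift operator B, where $B X_t = X_{t-1}$ and $(1 - B)^d$ produces the d-th nonseasonal difference. The series name $X_t$ and the use of $(1 - B)^d$ as the differencing operator are assumptions here; the coefficients $\alpha_i$, $\beta_j$ and the random process $Z_t$ follow the ARMA definition given earlier.

$$
\left(1 - \alpha_1 B - \dots - \alpha_p B^p\right)(1 - B)^d X_t = \left(1 + \beta_1 B + \dots + \beta_q B^q\right) Z_t
$$

Setting d = 0 recovers the ARMA (p, q) form.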

#### 3.3.4. Steps 4 and 5: Prewhitening and Cross-Correlation of Residuals

The ARIMA (0, 1, 2) (1, 0, 1)_{7} model for the U.S. obtained in the previous step was fitted to the stationary response variables, i.e., COVID-19-related GDELT articles and tweets for the U.S., respectively (step 4). The residuals were then cross-correlated with residuals from the ARIMA model of the COVID-19 dataset (step 5). CCF plots in Figure 6 display cross-correlation values of COVID-19 cases versus GDELT articles (Figure 6a) and tweets (Figure 6b) in the U.S. for different time lags.
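Steps 4 and 5 can be illustrated with a minimal sketch on synthetic data. It is not the authors' code: a simple AR(1) filter stands in for the seasonal ARIMA model, both series are simulated, and the function names (`ar1_coefficient`, `apply_filter`, `ccf`) are hypothetical. The sign convention, assumed here, is that a positive lag k pairs the input on day t with the response on day t + k.

```python
import random


def ar1_coefficient(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + e[t] (centred data)."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    num = sum(c[t] * c[t - 1] for t in range(1, len(c)))
    den = sum(v * v for v in c[:-1])
    return num / den


def apply_filter(x, phi):
    """Residuals e[t] = x[t] - phi * x[t-1] of an AR(1) filter on centred data."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    return [c[t] - phi * c[t - 1] for t in range(1, len(c))]


def ccf(x, y, max_lag):
    """Sample cross-correlation of x[t] with y[t + k] for k in [-max_lag, max_lag]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((v - mx) ** 2 for v in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    result = {}
    for k in range(-max_lag, max_lag + 1):
        # Keep both indices t and t + k inside the series.
        ts = range(0, n - k) if k >= 0 else range(-k, n)
        s = sum((x[t] - mx) * (y[t + k] - my) for t in ts)
        result[k] = s / (sx * sy)
    return result


# Synthetic input series (stand-in for daily cases): an AR(1) process.
random.seed(0)
n = 400
x = [0.0]
for _ in range(n - 1):
    x.append(0.7 * x[-1] + random.gauss(0, 1))

# Synthetic response (stand-in for media mentions): follows x by 3 days, plus noise.
y = [0.8 * x[t - 3] + random.gauss(0, 0.5) if t >= 3 else random.gauss(0, 0.5)
     for t in range(n)]

# Step 4: fit the filter to the input series and apply the SAME filter to the response.
phi = ar1_coefficient(x)
res_x = apply_filter(x, phi)
res_y = apply_filter(y, phi)

# Step 5: cross-correlate the two residual series and locate the strongest lag.
r = ccf(res_x, res_y, max_lag=10)
best_lag = max(r, key=lambda k: abs(r[k]))
print(best_lag, round(r[best_lag], 3))
```

Because the simulated response trails the input by three days, the strongest cross-correlation should appear at lag 3 under this convention. Statistical packages do not all define the sign of the lag the same way, so the convention is worth checking before comparing results.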

#### 3.3.5. Steps 6 and 7: Vector Autoregressive Models and Cross-Correlation of Residuals

**Φ** is a matrix polynomial of order p in the backward shift operator B, and $\epsilon_t$ is a vector of white noise error terms for the m variables at time t. A VAR (1) model for two series is shown in Equation (7), where values of $X_{1,t}$ and $X_{2,t}$ depend linearly on the values of both series at time t−1, and $\{\varphi_{ij}\}$ are autoregressive coefficients.
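Written out under the definitions above, a two-series VAR (1) model takes the following standard form. This is a sketch: the component names $\epsilon_{1,t}$ and $\epsilon_{2,t}$ for the entries of the error vector are assumptions, and the layout may differ from the authors' Equation (7).

$$
\begin{aligned}
X_{1,t} &= \varphi_{11} X_{1,t-1} + \varphi_{12} X_{2,t-1} + \epsilon_{1,t} \\
X_{2,t} &= \varphi_{21} X_{1,t-1} + \varphi_{22} X_{2,t-1} + \epsilon_{2,t}
\end{aligned}
$$

The off-diagonal coefficients $\varphi_{12}$ and $\varphi_{21}$ carry the influence of one series on the other, which is what distinguishes the VAR setup from fitting two separate univariate models.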

#### 3.4. Anomaly Detection

#### 3.5. Word Frequency Analysis

## 4. Results

#### 4.1. Cross-Correlation

#### 4.1.1. Positive Lag

#### 4.1.2. Negative Lag

#### 4.1.3. Positive and Negative Lag

#### 4.2. Anomaly Detection

## 5. Discussion

## 6. Conclusions

## Supplementary Materials

## Appendix A

**Figure A1.** COVID-19-related GDELT news items on 5 August 2020 that mention the explosion at the Port of Beirut.

**Figure A2.** Impulse response functions of COVID-19 cases shock on related GDELT and Twitter responses, respectively, for the U.K. (**a**, **b**), the Philippines (**c**, **d**), and Germany (**e**, **f**).

**Figure A3.** Anomalies in COVID-19 (i), GDELT (ii), and Twitter (iii) (left) with the word cloud of headlines for Bangladesh (**a**, **b**), Bolivia (**c**, **d**), and Botswana (**e**, **f**).

**Figure A4.** Anomalies in COVID-19 (i), GDELT (ii), and Twitter (iii) (left) with the word cloud of headlines for Cyprus (**a**, **b**), Guatemala (**c**, **d**), and Jamaica (**e**, **f**).

**Figure A5.** Anomalies in COVID-19 (i), GDELT (ii), and Twitter (iii) (left) with the word cloud of headlines for the Netherlands (**a**, **b**), Serbia (**c**, **d**), and Singapore (**e**, **f**).

**Table A1.** VAR parameters for COVID-19 infections versus GDELT and Twitter for the U.K., the Philippines, and Germany.

Lag | VAR (COVID-19 vs. GDELT) | Lag | VAR (COVID-19 vs. Twitter) | |||
---|---|---|---|---|---|---|

COVID-19 | GDELT | COVID-19 | ||||

U.K. | GDELT (1) | −0.429 *** (0.128) | Twitter (2) | −0.546 *** (0.125) | ||

GDELT (2) | −0.291 ** (0.141) | Twitter (6) | 0.218 * (0.126) | |||

COVID-19 (5) | −0.087 * (0.045) | COVID-19 (7) | 0.229 ** (0.123) | |||

COVID-19 (7) | 0.217 * (0.122) | Twitter (4) | 0.213 * (0.123) | |||

N | 82 | 82 | 82 | 82 | ||

R^{2} | 0.482 | 0.281 | 0.448 | 0.416 | ||

Adjusted R^{2} | 0.312 | 0.046 | 0.267 | 0.225 | ||

Philippines | COVID-19 (1) | −0.615 *** (0.101) | 5.450 ** (2.173) | COVID-19 (1) | −0.599 *** (0.121) | |

GDELT (1) | −0.448 *** (0.103) | Twitter (1) | 0.191 * (0.114) | −0.594 *** (0.118) | ||

COVID-19 (2) | −0.475 *** (0.105) | COVID-19 (2) | −0.449 ** (0.138) | |||

GDELT (2) | −0.421 *** (0.098) | Twitter (2) | −0.375 * (0.132) | |||

Twitter (3) | −0.301 ** (0.131) | |||||

N | 87 | 87 | 82 | 82 | ||

R^{2} | 0.391 | 0.364 | 0.449 | 0.352 | ||

Adjusted R^{2} | 0.311 | 0.281 | 0.338 | 0.222 | ||

Germany | COVID-19 (1) | −0.625 *** (0.113) | 0.011 ** (0.005) | COVID-19 (1) | −0.653 *** (0.125) | |

GDELT (1) | −0.665 *** (0.121) | Twitter (3) | 1.692 * (0.915) | |||

GDELT (2) | −0.513 *** (0.145) | COVID-19 (4) | 0.322 ** (0.132) | |||

GDELT (3) | −0.430 *** (0.150) | Twitter (4) | 0.213 * (0.123) | |||

COVID-19 (4) | 0.302 ** (0.131) | COVID-19 (5) | 0.443 *** (0.136) | |||

COVID-19 (5) | 0.386 *** (0.112) | Twitter (6) | 2.079 ** (0.937) | |||

N | 84 | 84 | 82 | 82 | ||

R^{2} | 0.522 | 0.418 | 0.598 | 0.322 | ||

Adjusted R^{2} | 0.408 | 0.280 | 0.467 | 0.099 |

**Figure 1.** Countries with daily positive count values in all three datasets (**a**) and the subset of countries chosen for a 90-day study period (**b**).

**Figure 3.** Time series decomposition of daily reported COVID-19 cases for the U.S. between 29 February 2020 and 29 May 2020.

**Figure 7.** Impulse response function of COVID-19 cases shock on related GDELT (**a**) and Twitter responses (**b**) for Canada.

**Figure 8.** Anomalies in COVID-19 (i), GDELT (ii), and Twitter (iii) time series (**a**) and word cloud of headlines for the GDELT outlier on 5 August 2020 (**b**) in Lebanon.

Lag | VAR (COVID-19 and GDELT) | Lag | VAR (COVID-19 and Twitter) | ||
---|---|---|---|---|---|

COVID-19 | GDELT | COVID-19 | |||

COVID-19 (1) | −0.860 *** (0.122) | COVID-19 (1) | −0.865 *** (0.22) | ||

GDELT (1) | −0.289 ** (0.121) | Twitter (1) | 0.002 * (0.001) | ||

GDELT (2) | −0.288 *** (0.125) | Twitter (2) | −0.466 *** (0.120) | ||

GDELT (3) | 0.029 * (0.015) | Twitter (3) | 0.303 * (0.130) | ||

COVID-19 (5) | 0.346 ** (0.156) | Twitter (4) | 0.002 * (0.001) | ||

COVID-19 (6) | 0.234 * (0.122) | Twitter (5) | 2.079 ** (0.937) | 0.334 *** (0.124) | |

N | 83 | 83 | 84 | 84 | |

R^{2} | 0.614 | 0.219 | 0.601 | 0.385 | |

Adjusted R^{2} | 0.506 | 0.091 | 0.506 | 0.238 |

Country | Model | Time Lag in Days (COVID-19 vs. GDELT) | Time Lag in Days (COVID-19 vs. Twitter) | Quadrant |
---|---|---|---|---|
Australia | ARIMA (1, 0, 2) | 12 | 3 | 1 |
Brazil | ARIMA (0, 0, 1) with nonzero mean | 7 | 10 | 1 |
France | ARIMA (0, 0, 1) | none | 8 | 1 |
Greece | ARIMA (0, 0, 1) | 1 | 0 | 1 |
India | ARIMA (4, 0, 0) with nonzero mean | 7 | 14 | 1 |
Italy | ARIMA (0, 1, 2) (0, 0, 1)_{7} | 16 | 1 | 1 |
Poland | ARIMA (2, 0, 0) with nonzero mean | −16 | 0 | 1 and 3 |
U.S. | ARIMA (0, 1, 2) (1, 0, 1)_{7} | 7 | 5 | 1 |
Canada | VAR (6) for COVID-19 and GDELT; VAR (5) for COVID-19 and Twitter | 0 | 11 | 1 |
Germany | VAR (5) for COVID-19 and GDELT; VAR (7) for COVID-19 and Twitter | 7 | 11 | 1 |
Philippines | VAR (2) for COVID-19 and GDELT; VAR (4) for COVID-19 and Twitter | −7 | 4 | 1 and 3 |
U.K. | VAR (7) for COVID-19 and GDELT; VAR (7) for COVID-19 and Twitter | 15 | 13 | 1 |

Country | Anomaly Date | Frequent Words | Events |
---|---|---|---|
Bangladesh | 2020-05-04 | holidays | National holiday |
Bolivia | 2020-10-19 | election, party, victory | General elections |
Botswana | 2020-07-31 | requirements, compliant | Introduction of lockdown |
Cyprus | 2021-03-07 | Cyprus, protest | Protests |
Guatemala | 2020-09-19 | president | President of Guatemala contracted COVID-19 |
Jamaica | 2020-08-25 | Usain Bolt | Jamaican Olympian contracted COVID-19 |
Lebanon | 2020-08-05 | explosion, deadly, Beirut | Beirut port explosion |
Netherlands | 2021-03-03 | explosion | Explosion |
Serbia | 2020-07-08 | protest, violent | Protests |
Singapore | 2020-12-09 | cruise | COVID-19 scare on a cruise ship |

