Article

Effect of Temperature on the Spread of Contagious Diseases: Evidence from over 2000 Years of Data

1 Department of Economics & Business Analytics, University of New Haven, West Haven, CT 06516, USA
2 Department of Economics, OSTIM Technical University, 06374 Ankara, Türkiye
3 School of Management, Simmons University, Boston, MA 02115, USA
4 Department of Economics, University of Pretoria, Pretoria 0002, South Africa
5 Department of Business Management, University of Pretoria, Pretoria 0002, South Africa
* Author to whom correspondence should be addressed.
Climate 2024, 12(12), 225; https://doi.org/10.3390/cli12120225
Submission received: 12 September 2024 / Revised: 17 November 2024 / Accepted: 6 December 2024 / Published: 20 December 2024
(This article belongs to the Special Issue Climate Impact on Human Health)

Abstract

The COVID-19 pandemic led to a surge in interest among scholars and public health professionals in identifying the predictors of health shocks and their transmission in the population. With temperature increases becoming a persistent climate stress, our aim is to evaluate how temperature specifically impacts the incidences of contagious disease. Using annual data from 1 AD to 2021 AD on the incidence of contagious disease and temperature anomalies, we apply both parametric and nonparametric modelling techniques and provide estimates of the contemporaneous, as well as lagged, effects of temperature anomalies on the spread of contagious diseases. A nonhomogeneous hidden Markov model is then applied to estimate the time-varying transition probabilities between hidden states where the transition probabilities are governed by covariates. For all empirical specifications, we find consistent evidence that temperature anomalies have a statistically significant effect on the incidence of a contagious disease in any given year covered in the sample period. The best fit model further indicates that the contemporaneous effect of a temperature anomaly on the response variable is the strongest. As temperature predictions continue to become more accurate, our results indicate that such information can be used to implement effective public health responses to limit the spread of contagious diseases. These findings further have implications for designing cost effective infectious disease control policies for different regions of the world.

1. Introduction

The global scale and impact of the COVID-19 pandemic led to heightened interest and efforts among scientists, governments, and academics across different fields to identify ways to minimize the cumulative damage from the spread of infectious diseases to human health and the global economy. Infectious diseases are caused by pathogens, such as viruses and harmful bacteria. They include both noncommunicable and communicable (contagious) diseases. These diseases can spread through disease vectors, such as mosquitoes, and other transmission pathways, such as air, water, food, and direct human contact. The macro impact caused by the spread of an infectious disease depends on the number of individuals affected by the disease, the speed at which the disease spreads through a population, and the length of time the disease persists in any population. For example, an epidemic is characterized by an accelerated increase in the number of affected individuals in a given region; whereas an endemic disease is one that persists in a region for a long time. A pandemic is a scaled-up epidemic, where a contagious infectious disease spreads at an expedited rate through a larger geographic area. It has an epicenter in a region of the world from where the disease spreads rapidly across multiple countries, affecting different populations and causing widespread economic losses. It can happen decades and even centuries after the occurrence of an earlier pandemic, which may have a different epicenter located thousands of miles away.
It is extremely challenging to compare epidemics and pandemics given that a myriad of contextual factors lies behind each occurrence. Given the rarity and uniqueness of such events, identifying a common set of contributory factors that cause these events and influence the transmission rates through space and time can be a demanding task for scholars. For example, each event is characterized by a specific point of origin and shaped by a unique line of historical events leading to the onset of that event. The extent of damage to human lives has also varied greatly among the major disease outbreaks over the last 2000-plus years. For example, the World Health Organization estimates that the total number of deaths from COVID-19 is close to 7 million [1]. In contrast, the number of deaths caused by the Black Death plague of Central Asia between 1347 and 1351 has been estimated at 75–200 million [2]. However, in spite of all the challenges inherent in this type of analysis, the effort is always worthwhile, given that epidemics and pandemics are large-scale disruptive events that result in shocks to one or more economies and affect society’s overall wellbeing. The expected damages associated with such events can sometimes affect multiple generations of the population, albeit in different ways. Any lesson learned and applied could potentially lead to scores of human lives being saved during a future occurrence. Unlike earlier pandemics, the 2003 SARS-CoV pandemic and the 2019 COVID-19 pandemic happened at a time when ample scientific evidence was available on global warming [3], which has spurred interest among scientists and policymakers alike in the relationship between ambient temperature and the transmission rate of contagious diseases.
The Climate Change 2021 report of the Intergovernmental Panel on Climate Change documents some startling changes in the global environment over the past few decades, which are unprecedented in recent human history. For example, Figure 1 shows that the global surface temperature increased sharply between 1950 and 2020 when contrasted with the relatively moderate rise in surface temperature between 1850 and 1950. The left panel of the figure shows the changes in the global surface temperature over the past 2000 years using both reconstructed and observed temperature data [4]. The reconstructed data cover the period from 1 AD until the year 2000; whereas the observed data cover the 1850 to 2020 timeframe. The panel on the right focuses on the 1850–2020 period, showing the changes in global surface temperature. The black line indicates observed temperature data. The brown line shows simulated temperature data accounting for both natural and human-related factors, while the blue line indicates simulated surface temperature data that only account for natural climate-change-related factors. The gap between the brown and the blue lines represents an approximate measure of the rise in global surface temperature that stems from factors related to anthropogenic activities, particularly those following the first industrial revolution.
Between 1970 and 2020, the global surface temperature increased faster than in any other 50-year period over at least the past 2000 years. Hot extreme events, such as heatwaves, have become more frequent and intense since 1950; whereas cold extreme events have become less frequent and less severe. Human-induced climate events have led to droughts. While the facts about the changes in the global environment are gravely concerning by themselves, they do not capture the full magnitude of potential adversity that can stem from such changes, such as hastening the spread of infectious diseases in the future [5]. Geographic boundaries of disease ranges are climate sensitive. They can both shift and expand with changes in temperature through the effects of various disease-carrying vectors. For example, valley fever is a fungal disease that is endemic to the southwestern United States [6], with the region’s temperature and precipitation affecting the number of valley fever cases and the extent of the spread of the disease across the region. Using climate projections for the 21st century along with a climate niche model derived from contemporary climate and disease incidence data, ref. [6] predicted that, throughout this century, the endemic region will spread north, reaching up to the Canadian border, covering the western U.S. states, and resulting in 50% more cases.
Mora et al. [7] systematically analyzed empirical examples of 375 infectious diseases to find that 58% of those diseases have been aggravated by climatic hazards. Their research revealed 1006 unique pathways in which climate hazards, through different transmission types, led to pathogenic diseases. In some situations, climate-related hazards, such as heat waves and droughts, resulted in habitat destruction, which led to shifts in the geographical range of different species that brought pathogens and vectors closer to human populations. Warmer temperatures and precipitation expanded the areas covered by vectors, such as mosquitoes, ticks, and fleas. In their review, Caminade et al. [8] highlighted the impact of climate change on the distributions of vectors and pathogens in peri-Arctic, Arctic, temperate, and high-altitude regions in tropical zones. Malaria is a fatal vector-borne infectious disease caused by a Plasmodium species that is transmitted between humans by infected female Anopheles mosquitoes. Its incidence is affected by changes in temperature, rainfall, and humidity. Siraj et al. [9] used Ethiopian and Colombian data to demonstrate the impact of temperature on malaria, as warmer temperatures increased the incidence of malaria at higher altitudes. The geographic spread of dengue, a common mosquito-borne viral disease, is also affected by temperature changes and how the vectors respond to such changes [10].
Since contagious diseases often affect the human population through carrier organisms, it is essential to understand the effect of temperature changes on the spread of contagious diseases among wildlife. While the frequency of infectious disease outbreaks among wildlife has increased in recent decades, paralleling global climate change, the exact mechanisms through which climate change affects the spread of infectious disease remain largely unknown. To address this gap in the knowledge, using both laboratory experiments and field prevalence estimates, refs. [11,12] tested the thermal mismatch hypothesis, which posits that cool-adapted host species are more susceptible to pathogen infection during warm temperature periods; whereas warm-adapted host species are more susceptible to pathogens during periods of cool temperatures. The datasets used in these studies include a large and highly diverse spectrum of wildlife hosts and parasites that vary in ecologically important traits across a worldwide climatic gradient. Their results confirmed the thermal mismatch hypothesis, which suggests that as climate change shifts hosts away from their optimal temperature ranges, hosts can become more susceptible to infectious diseases, though the exact effect will be dependent on the particular host and the direction of the shift in climate patterns. Another example comes from ref. [13], which compared the 1918 influenza pandemic with the 2019 COVID pandemic, two disastrous health emergencies caused by different viruses that occurred a century apart from each other. The authors were able to identify similarities both in the clinical, pathological, and epidemiological features of the two pandemics and in the civic, medical, and public health responses to these events.
In this paper, we take a historical perspective to understand the relationship between contagious disease outbreaks and changes in ambient temperature. Using alternative model specifications, both parametric and nonparametric, we first derive estimates for the causal relationship between temperature anomalies and contagious disease outbreaks in any given year, modelled as a binary variable. The time evolution of the transition probabilities of switching between contagious disease and noncontagious disease states, or time periods, is further studied using a nonhomogeneous hidden Markov model, under the assumption that the data generated follow a Markov process. Bearing the name of Russian mathematician Andrey Markov, the hidden Markov model (HMM) is a stochastic model that is assumed to involve a Markov process, in which each event in a sequence depends only on the state immediately preceding it and not on any earlier states. The Markov process is essentially a stochastic process with a memoryless property, implying that if the current state is given, past states do not play any role in its transition from a current state to a future state. Note that the transition process itself remains unobserved and is assumed to follow a Markov process. The probability of the process transitioning from a given state in the present to a future state is referred to as a transition probability. A transition matrix provides the set of transition probabilities that describe the likelihood of transition from any present state to all possible future states. The term “hidden” refers to the underlying states remaining unobserved. HMMs have wide-ranging applications in finance [14,15], statistics [16,17], cognitive science [18,19], mobile communication [20,21], and climatology [22,23].
The HMM can be extended to a nonhomogeneous HMM (NHMM) if we relax the assumption of homogeneity among the transitions and allow them to depend on additional variables.
The analysis presented in this paper contributes to the strand of academic literature that aims to develop our understanding of the determinants of the outbreak and spread of contagious diseases, both contemporaneously and over time. The extent of damages associated with a communicable disease is characterized by a set of environmental and socio-economic factors, some of which are unique to the region, while others are common determinants across space, time, and disease type. For example, disease transmission rates can be affected by precipitation and humidity [24,25], population density [26,27], the rate of urbanization [28], and human migration [29], among others. Social outcomes, such as residential segregation, can impact the spread of contagious diseases [30].
Our interest lies in identifying the role played by a common factor, temperature anomalies, in contagious disease spreads over the past two millennia. We believe that comparing different public health events over time and identifying similar features can offer valuable lessons to help inform and improve public health surveillance systems across regions. Public health surveillance systems are the major defensive mechanisms of nations to prevent and control the spread of diseases. Involving a complex network of government officials, public health practitioners, healthcare providers, and the general population, they systematically collect data and information on the status of public health, identifying disease outbreaks and the transmission of communicable diseases. Epidemiological studies depend on this information to identify risk factors and effective prevention and control measures. Studies involving predictive modelling use this information to anticipate alternative scenarios and identify the likelihood of future events [31].
We demonstrate that temperature changes have always played a fundamental role in the spread of contagious diseases, thereby identifying a common factor in epidemics and pandemics covering the past two millennia. The insights from these findings provide opportunities to develop region-specific and disease-specific policy recommendations that would account for the role of temperature anomalies in the spread of contagious diseases. In tropical and subtropical regions, such as Sub-Saharan Africa and Southeast Asia, temperature anomalies often exacerbate the conditions favorable to vector-borne diseases, like malaria and dengue fever. High temperatures accelerate vector reproduction and expand habitats into previously unaffected regions [9]. Policy measures should prioritize vector control programs, including insecticide-treated nets, community awareness campaigns, and investments in early warning systems based on real-time temperature and precipitation data [8]. In temperate areas, anomalously warm periods can extend the activity of vectors, such as ticks, increasing the risk of diseases like Lyme disease [32]. Alternatively, low temperature anomalies can displace vectors and wildlife, creating novel interactions that may trigger outbreaks. Public health responses should include enhanced surveillance and biodiversity monitoring to detect emerging zoonotic disease risks promptly [33]. In arid and semi-arid zones, such as the Sahel, rising temperatures combined with unpredictable rainfall can lead to shifts in the distribution of waterborne diseases, such as cholera. Infrastructure development focused on clean water access and improved sanitation should be prioritized in these areas [34]. In Arctic regions and high-altitude locations, warming trends can thaw permafrost, potentially reactivating dormant pathogens like anthrax [35].
Governments and global organizations must invest in pathogen containment measures and disaster preparedness to address the risks associated with environmental changes in these fragile ecosystems. Urban areas with high population densities face compounded risks due to the urban heat island effect. Temperature anomalies can intensify disease outbreaks by creating favorable conditions for diseases like influenza and respiratory infections. Policies should focus on green infrastructure to reduce heat retention and ensure equitable healthcare access during outbreaks [28]. While earlier scholars have provided evidence on region-specific temperature effects on disease spreads, we provide evidence on the role of temperature anomalies in the transmission of contagious diseases using a very long timeseries. Our findings indicate that, while the exact nature of the effects will be region-specific, temperature anomalies can be expected to play a significant role in disease outbreaks and spreads, regardless of location.
The paper is organized as follows: Section 2 provides our data sources and presents a description of the dataset. In Section 3, we present the methodology used in the analysis, which is followed by a discussion of the results in Section 4. In Section 5, we include some reflections and concluding remarks.

2. Data Sources and Description

The data used in this paper were obtained from the data on contagious diseases presented in Table 1 in [2], which runs until 2019. We then include the years 2020 and 2021 as periods associated with the COVID-19 pandemic. Table 1 in this paper follows [2] and lists the contagious disease events included in our analysis [2,36,37,38]. The table provides information about the primary regions of the world that were affected and the estimated death tolls.
The temperature anomaly data from 1 AD until 2019 were acquired from [39] and then updated to 2021 AD using data from the National Oceanic and Atmospheric Administration (NOAA). Table 2 describes the data characteristics for the different variables used in the analysis. The complete dataset, including observations on temperature anomalies and contagious disease outbreaks, contains 2021 observations. It has been divided into two subsamples comprising the “nondisease” and “disease” periods. A temperature anomaly occurs either when the observed temperature is higher than a reference value, such as the long-run average value of the temperature (a positive anomaly), or when it is lower than the reference value (a negative anomaly) [40]. A high temperature anomaly occurs when the deviation of the observed temperature from the reference value is greater than 0.25 points; whereas a low temperature anomaly occurs when the deviation is less than 0.25 points.
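The subsample split and the high/low labelling described above can be sketched in a few lines. This is only an illustration: the helper names and the toy values are hypothetical, and treating an anomaly exactly equal to 0.25 as "low" is an assumption, since the text does not say how ties are handled.

```python
# Hypothetical helpers; the 0.25 cutoff is the one quoted in the text.
THRESHOLD = 0.25


def classify_anomaly(h, threshold=THRESHOLD):
    """Label an anomaly 'high' if it exceeds the cutoff, else 'low'.

    Assumption: a value exactly equal to the cutoff is labelled 'low'.
    """
    return "high" if h > threshold else "low"


def split_by_disease(anomalies, disease_flags):
    """Split anomalies into the 'nondisease' and 'disease' subsamples."""
    nondisease = [h for h, d in zip(anomalies, disease_flags) if d == 0]
    disease = [h for h, d in zip(anomalies, disease_flags) if d == 1]
    return nondisease, disease


# Toy values for illustration only (not taken from the paper's dataset).
toy_h = [-0.10, 0.30, 0.05, 0.40]
toy_d = [0, 1, 0, 1]
nondisease, disease = split_by_disease(toy_h, toy_d)
labels = [classify_anomaly(h) for h in toy_h]
```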

3. Methodology

Let $t = 1, 2, \ldots, 2021$ denote the year of observation; $d_t$ denote a binary variable taking the value of 1 if a contagious disease occurred in year $t$ and zero otherwise; $h_t$ denote the temperature anomaly; and $\tau_t$ denote a linear time trend, i.e., $\tau_t = t$.
We start with the linear probability model:
$$d_t = \beta_0 + \beta_1 h_t + \beta_2 h_{t-1} + \beta_3 h_{t-2} + \beta_4 \tau_t + \varepsilon_t \tag{1}$$
where $\varepsilon_t$ is an independently and identically distributed error term with zero mean and constant variance $\sigma^2$, $\varepsilon_t \sim iid(0, \sigma^2)$. Defining $x_t = (1, h_t, h_{t-1}, h_{t-2}, \tau_t)'$ and $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4)'$, we can write Equation (1) as
$$d_t = \beta' x_t + \varepsilon_t, \quad t = 1, 2, \ldots, T$$
Defining $\pi_t = \pi(x_t) = P(d_t = 1 \mid x_t)$, the linear probability model implies that $\pi_t = E(d_t \mid x_t) = \beta' x_t$, while $P(d_t = 0 \mid x_t) = 1 - \pi_t = 1 - \beta' x_t$.
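As a rough illustration of the linear probability model, the sketch below builds the regressor vector $x_t = (1, h_t, h_{t-1}, h_{t-2}, \tau_t)'$ from a toy anomaly series and solves the least-squares normal equations in plain Python. The function names and toy values are hypothetical, and ordinary least squares is assumed as the estimator; nothing here reproduces the paper's estimates.

```python
def design_row(h, t):
    """Return x_t = (1, h_t, h_{t-1}, h_{t-2}, tau_t) for 0-based index t >= 2."""
    return [1.0, h[t], h[t - 1], h[t - 2], float(t + 1)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(xs, ys):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(xs[0])
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


# Toy anomaly and outbreak series (illustrative values, not the paper's data).
toy_h = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.4, 0.2, -0.3, 0.6]
toy_d = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
xs = [design_row(toy_h, t) for t in range(2, len(toy_h))]
ys = [float(toy_d[t]) for t in range(2, len(toy_h))]
beta_hat = ols(xs, ys)
fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in xs]
```

Nothing constrains the entries of `fitted` to lie in [0, 1], which is the usual motivation for moving from the linear probability model to a logistic specification.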
We use the linear probability model as one of the benchmark models. The second benchmark model we use is the logistic probability model defined as:
$$d_t = \frac{\exp\{g(x_t)\}}{1 + \exp\{g(x_t)\}} + \varepsilon_t$$
where the logistic link function $g(x_t)$ is defined as
$$g(x_t) = \log\left[\frac{\pi(x_t)}{1 - \pi(x_t)}\right] = \beta' x_t$$
Thus, we can write $d_t = \pi(x_t) + \varepsilon_t$, where $\pi(x_t) = \exp\{g(x_t)\} / [1 + \exp\{g(x_t)\}]$. Here, $\varepsilon_t$ is distributed with mean zero and variance equal to $\pi(x_t)[1 - \pi(x_t)]$.
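The link function and its inverse can be checked numerically. The following minimal sketch (function names are hypothetical) maps a linear index $g$ to a probability, maps it back, and evaluates the Bernoulli variance $\pi(1 - \pi)$ mentioned above.

```python
import math


def inv_logit(g):
    """pi = exp(g) / (1 + exp(g)), arranged to avoid overflow for large |g|."""
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


def logit(p):
    """g = log(pi / (1 - pi)), the logistic link."""
    return math.log(p / (1.0 - p))


def bernoulli_variance(p):
    """Variance of the error term implied by a binary outcome: pi (1 - pi)."""
    return p * (1.0 - p)
```

The variance is largest at $\pi = 0.5$ and shrinks toward zero as the predicted probability approaches 0 or 1, so the error term is heteroskedastic by construction.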
A generalized additive model (GAM) replaces the logistic link function in Equation (3) with
$$g(x_t) = \beta_0 + s_1(h_t) + s_2(h_{t-1}) + s_3(h_{t-2}) + s_4(\tau_t) \tag{4}$$
where $s_i(\cdot)$, $i = 1, 2, \ldots, 4$, is the univariate smooth function of the arguments. For the GAM model in Equation (4), we specify the smooth terms $s_i(\cdot)$ as nonparametric functions, which are estimated using thin-plate regression splines [41]. We also specify a first order, serially correlated GAM specification, where $\varepsilon_t$ follows a first-order autoregressive process [AR (1)], i.e., $\varepsilon_t = \rho \varepsilon_{t-1} + v_t$ with $v_t \sim iid(0, \sigma_v^2)$.
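The spline fit itself is not attempted here, but the AR(1) error process is easy to illustrate. The sketch below simulates $\varepsilon_t = \rho \varepsilon_{t-1} + v_t$ and recovers $\rho$ with a simple lag-one regression. The Gaussian innovations, the parameter values, and the function names are all assumptions made for illustration.

```python
import random


def simulate_ar1(n, rho, sigma_v, seed=0):
    """Simulate eps_t = rho * eps_{t-1} + v_t with Gaussian v_t (an assumption)."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma_v)]
    for _ in range(1, n):
        eps.append(rho * eps[-1] + rng.gauss(0.0, sigma_v))
    return eps


def estimate_rho(eps):
    """Least-squares slope of eps_t on eps_{t-1} (no intercept)."""
    num = sum(eps[t] * eps[t - 1] for t in range(1, len(eps)))
    den = sum(e * e for e in eps[:-1])
    return num / den


# Hypothetical parameter values chosen only to exercise the functions.
errors = simulate_ar1(5000, rho=0.6, sigma_v=1.0, seed=1)
rho_hat = estimate_rho(errors)
```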
The time evolution of the probabilities of switching between contagious disease and noncontagious disease states (periods) can be studied using a hidden Markov model (HMM), which is a statistical model that defines a probability distribution over possible sequences of observations in which each observation is a member of a discrete set of outcomes. It is often used to model time-varying processes.
Hidden Markov models are particularly suitable for modeling disease outbreaks due to their ability to capture time-varying processes with latent states, reflecting the episodic nature of disease dynamics. In this study, we leverage HMM to model transitions between “disease” and “nondisease” states, allowing these transitions to depend on external covariates, such as temperature anomalies. This flexibility, extended through nonhomogeneous HMM (NHMM), enables the integration of environmental factors, uncovering how climate fluctuations influence the likelihood of outbreaks. HMM’s memoryless property aligns with the stochastic nature of disease spread, while its ability to incorporate latent states surpasses simpler models in capturing underlying dynamics. This methodology builds on the prior applications of HMM in environmental and epidemiological modeling [17,22,26], making it a robust framework for analyzing complex temporal data and informing predictive models of disease occurrence under changing climatic conditions.
A hidden Markov model is based on the assumption that the underlying process that generates the data is a Markov process, and that the hidden states of the process are unobserved. In our case, the binary variable $d_t$, which indicates the presence or absence of a contagious disease in year $t$, is a two-state process, with $d_t$ taking the value zero or one. Let these finite states be $\Lambda = \{1, 2\}$. The HMM expresses Markov evolution on the measurable space $\Lambda$ in terms of a regular Markov chain using the latent variable $S_t \in \{1, 2\}$, where $S_t = 1$ denotes the nondisease state and $S_t = 2$ denotes the disease state. In general, $S_t$ may have $M$ states, $S_t \in \Lambda = \{1, 2, \ldots, M\}$, with the evolution of the state space expressed with the transition probability matrix $P = [p_{ij}]$, $i, j = 1, 2, \ldots, M$, and stationary probability distribution $\pi = (\pi_1, \pi_2, \ldots, \pi_M)$. The transition probability of switching from state $i$ in year $t-1$ to state $j$ in year $t$ is defined with the following properties:
$$p_{ij} = P(S_t = j \mid S_{t-1} = i) \in [0, 1], \ \forall i, j; \qquad \sum_{j=1}^{M} p_{ij} = 1, \quad i = 1, 2, \ldots, M$$
In our case, with $M = 2$, we have two free transition probabilities, $p_{12} = P(S_t = 2 \mid S_{t-1} = 1)$ and $p_{21} = P(S_t = 1 \mid S_{t-1} = 2)$, with $p_{11} = 1 - p_{12}$ and $p_{22} = 1 - p_{21}$. The stationary probabilities $\pi = (\pi_1, \pi_2, \ldots, \pi_M)$ are defined with the following properties:
$$\pi_i = P(S_t = i) \in [0, 1], \ \forall i; \qquad \sum_{i=1}^{M} \pi_i = 1$$
which implies one free state probability $\pi_2$, since $\pi_1 = 1 - \pi_2$ with $M = 2$.
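For the homogeneous two-state case, the stationary distribution has a closed form, $\pi_1 = p_{21}/(p_{12} + p_{21})$ and $\pi_2 = p_{12}/(p_{12} + p_{21})$, which follows from solving $\pi P = \pi$ with $\pi_1 + \pi_2 = 1$. The sketch below checks this numerically. The function names and the values of $p_{12}$ and $p_{21}$ are hypothetical; list index 0 stands for state 1 (nondisease) and index 1 for state 2 (disease).

```python
def transition_matrix(p12, p21):
    """2x2 matrix with rows [p11, p12] and [p21, p22]; each row sums to one."""
    return [[1.0 - p12, p12], [p21, 1.0 - p21]]


def stationary(p12, p21):
    """Closed-form stationary distribution (pi_1, pi_2) of a two-state chain."""
    total = p12 + p21
    return [p21 / total, p12 / total]


# Hypothetical transition probabilities, not estimates from the paper.
P = transition_matrix(0.1, 0.3)
pi = stationary(0.1, 0.3)
# One step of the chain should leave the stationary distribution unchanged.
pi_next = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
```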
If the transition probabilities $p_{ij}$ are independent of time, then the HMM is time invariant, or homogeneous. However, the homogeneous HMM is quite restrictive for many real-world cases where the transition probabilities change over time, likely due to the effects of some underlying factors. We can relax this restrictive assumption by allowing the transition probabilities to be time-varying, which leads to a nonhomogeneous HMM (NHMM). Whereas the transition probabilities between states are constant over time in the standard HMM, they can vary over time in the NHMM, which makes it better suited to processes that change over time, such as the spread of disease, the growth of a population, or the price of a stock. An attractive approach to making transition probabilities time-varying is to allow them to depend on covariates. The NHMM with time-varying transition probabilities and covariates $z_t = (z_{1t}, z_{2t}, \ldots, z_{Kt})'$ can be represented as $p_{ij}(z_t) = P(S_t = j \mid S_{t-1} = i, z_t)$. The transition probabilities between hidden states are thus allowed to vary over time and are governed by the covariates $z_t$. In this model, the probability of transitioning from one hidden state to another at any given time $t$ depends on both the value of the covariates at that time and the state occupied in the previous period.
Given that the observed state variable $d_t$ is binary, we use a NHMM with a logistic link function. The covariates are specified to include the temperature anomaly series $h_t$ and a linear time trend in addition to a constant vector, i.e., $z_t = (1, h_t, \tau_t)'$. The logistic HMM model specifies the transition probabilities $\{p_{ij}, \ i, j \in \Lambda\}$ and stationary distribution components $\{\pi_i, \ i \in \Lambda\}$ with the following logistic models:
$$p_{ij}(z_t) = \frac{\exp(\alpha_{ij}' z_t)}{1 + \exp(\alpha_{ij}' z_t)} = \frac{\exp(\alpha_{0,i} + \alpha_{1,ij} h_t + \alpha_{2,ij} \tau_t)}{1 + \exp(\alpha_{0,i} + \alpha_{1,ij} h_t + \alpha_{2,ij} \tau_t)}, \quad i, j \in \Lambda$$
$$\pi_i(z_t) = \frac{\exp(\gamma_i' z_t)}{1 + \exp(\gamma_i' z_t)} = \frac{\exp(\gamma_0 + \gamma_{1,i} h_t + \gamma_{2,i} \tau_t)}{1 + \exp(\gamma_0 + \gamma_{1,i} h_t + \gamma_{2,i} \tau_t)}, \quad i \in \Lambda$$
where $\alpha_{ij} = (\alpha_{0,i}, \alpha_{1,ij}, \alpha_{2,ij})'$ and $\gamma_i = (\gamma_0, \gamma_{1,i}, \gamma_{2,i})'$ are parameters to be estimated. In reality, only two sets of transition probabilities and one set of stationary state probabilities are estimated for a two-state model, since probabilities sum to one.
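To make the covariate dependence concrete, the sketch below evaluates a two-state transition matrix from the logistic form with $z_t = (1, h_t, \tau_t)'$. The coefficient vectors `ALPHA_12` and `ALPHA_21` are made-up numbers chosen for illustration, not estimates from the paper, and only the two free off-diagonal probabilities are parameterized, with the diagonal filled in so that each row sums to one.

```python
import math


def logistic(u):
    """exp(u) / (1 + exp(u)), arranged to avoid overflow."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def transition_probs(h_t, tau_t, alpha_12, alpha_21):
    """Time-varying 2x2 transition matrix P(z_t) with z_t = (1, h_t, tau_t)."""
    z = (1.0, h_t, tau_t)
    p12 = logistic(sum(a * x for a, x in zip(alpha_12, z)))
    p21 = logistic(sum(a * x for a, x in zip(alpha_21, z)))
    return [[1.0 - p12, p12], [p21, 1.0 - p21]]


# Hypothetical coefficients (constant, anomaly, trend); illustration only.
ALPHA_12 = (-3.0, 2.0, 0.0005)
ALPHA_21 = (-1.0, -1.5, 0.0)

P_warm = transition_probs(0.5, 1000.0, ALPHA_12, ALPHA_21)
P_cool = transition_probs(-0.5, 1000.0, ALPHA_12, ALPHA_21)
```

With a positive anomaly coefficient in `ALPHA_12`, the probability of moving from the nondisease state to the disease state is higher in the warm scenario than in the cool one, which is the kind of effect the covariate specification is designed to capture.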
There are a number of ways to characterize the statistical properties of a logistic hidden Markov model for binary time series. One common approach is to consider the model’s ability to correctly predict the next time step in the series, given the previous time steps. Another approach is to consider the model’s ability to accurately estimate the underlying probabilities of the time series. Using the latter approach, we estimate the parameters of the NHMM model using maximum likelihood (ML) estimation, where the maximization is performed using the expectation maximization (EM) algorithm. Once the NHMM model is estimated, there are a number of methods for decoding the states and obtaining the relevant probabilities. For our purposes, smoothed probabilities are appropriate, as they use the full sample information for inference at each time point.
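Smoothed probabilities are typically computed with a forward-backward recursion, which is also the main ingredient of the E-step in EM. The sketch below implements that recursion for a two-state chain with a binary observation; it does not implement the EM parameter updates. The emission probabilities, the initial distribution, and the constant transition matrix are hypothetical assumptions, since the text does not spell out these details.

```python
def smooth(obs, trans, emit, init):
    """Smoothed P(S_t = k | all observations) via scaled forward-backward.

    obs:   list of observed values (0 or 1)
    trans: trans[t][i][j] = P(S_t = j | S_{t-1} = i); trans[0] is unused
    emit:  emit[k][d] = P(d_t = d | S_t = k)   (assumed emission model)
    init:  init[k] = P(S_1 = k)
    """
    n, k = len(obs), len(init)
    # Forward pass: filtered probabilities, normalized at each step.
    a = [init[i] * emit[i][obs[0]] for i in range(k)]
    alpha = [[x / sum(a) for x in a]]
    for t in range(1, n):
        a = [emit[j][obs[t]]
             * sum(alpha[t - 1][i] * trans[t][i][j] for i in range(k))
             for j in range(k)]
        alpha.append([x / sum(a) for x in a])
    # Backward pass: information carried by future observations.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[t + 1][i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                 for j in range(k))
             for i in range(k)]
        beta[t] = [x / sum(b) for x in b]
    # Combine and renormalize to obtain the smoothed probabilities.
    out = []
    for t in range(n):
        g = [alpha[t][i] * beta[t][i] for i in range(k)]
        out.append([x / sum(g) for x in g])
    return out


# Hypothetical inputs: index 0 = nondisease state, index 1 = disease state.
toy_obs = [0, 0, 1, 1, 0]
P_const = [[0.9, 0.1], [0.3, 0.7]]
toy_trans = [P_const] * len(toy_obs)
toy_emit = [[0.95, 0.05], [0.10, 0.90]]
toy_init = [0.75, 0.25]
smoothed = smooth(toy_obs, toy_trans, toy_emit, toy_init)
```

In an NHMM, each `trans[t]` would instead be evaluated from the covariates observed in year $t$.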

4. Results and Discussion

Figures 2–9 showcase various attributes of the data. In Figure 2a, the dummy variable indicating a contagious disease in a given year is plotted, while Figure 2b shows both the temperature anomalies and years with contagious diseases (shaded bars) between 1 AD and 2021 AD. We use density plots to provide a visualization of the data. In Figure 3, we plot the conditional distributions of the temperature anomaly using kernel density estimates and box plots. The distribution of the temperature anomaly is conditional on the contagious disease periods with high temperature anomaly levels (above 0.25) and low temperature anomaly levels (below 0.25), respectively. Using a Gaussian kernel, the probability densities at each data point are estimated, which are then smoothed to generate continuous curves. In Figure 3a, the kernel density estimates are displayed for the conditional probability distribution function of the temperature anomaly series, while, in Figure 3b, the boxplots with overlaid observations conditional on the contagious disease status are presented.
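A Gaussian kernel density estimate of the kind described above can be written in a few lines. In this sketch, the bandwidth is supplied by hand, and both it and the sample values are hypothetical; the text does not state how the bandwidth was chosen.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function x -> estimated density using a Gaussian kernel."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Toy anomaly values for one subsample (illustrative only).
toy_sample = [-0.2, 0.0, 0.1, 0.3, 0.5]
kde = gaussian_kde(toy_sample, bandwidth=0.1)
# Evaluate on a grid from -2 to 2 to obtain a smooth curve for plotting.
grid = [-2.0 + 0.01 * i for i in range(401)]
curve = [kde(x) for x in grid]
```

Fitting one such curve per subsample (for example, disease and nondisease years) yields conditional densities that can be compared on a common axis.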
Figure 3 provides a detailed examination of the conditional distribution of temperature anomalies during periods of contagious disease outbreaks. The kernel density plots in Panel (a) reveal that high temperature anomalies ($h_t > 0.25$) are more prominently associated with years experiencing disease outbreaks compared to nondisease years. This suggests that elevated temperatures likely exacerbate conditions favorable to the transmission of infectious diseases, such as increased vector activity or accelerated pathogen replication rates. Conversely, the boxplots in Panel (b) show that, during low anomaly periods ($h_t < 0.25$), while less frequent, outbreaks are not negligible. This supports the notion that cooler conditions may indirectly influence disease spread by altering host–pathogen interactions, such as through the thermal mismatch hypothesis, where hosts displaced from their thermal optima become more vulnerable to infections. Both high and low temperature anomalies can have marked effects on disease transmission, but the mechanisms differ significantly. Tian et al. [43] demonstrated that temperature extremes impact both the susceptibility of human populations and the dynamics of disease vectors. High anomalies can increase vector activity and expand the range of diseases like malaria, while low anomalies can displace populations and hosts, creating novel interactions that facilitate outbreaks. These findings underscore the duality of risk posed by temperature extremes, necessitating targeted public health strategies based on regional climatic trends.
The autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the contagious disease variable and the temperature anomaly series are presented in Figure 4 along with the cross-correlation and partial cross-correlation functions. Together, they provide insights about the time series characteristics of the data. The ACF provides the correlation between the current and the lagged values of a variable; whereas the PACF is used to measure the correlation between the current observation of the variable and an observation from a previous time period, after controlling for the observations at the intermediate lags. In Figure 4, the gradual decline in ACFs and PACFs together helps to define the autoregressive process of the two variables. The cross-correlation and partial cross-correlation plots show the relationship between the two time series used in the model. Figure 5 provides a visual assessment of the three models, i.e., the logistic generalized additive model (GAM-Logistic), the logistic model, and the benchmark linear model. We plot the predicted probabilities of the occurrence of a contagious disease against the estimated residuals $(d_t - \hat{\pi}_t)$, temperature anomalies, and time, in panels (a), (b), and (c), respectively.
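The sample ACF referred to above is the lag-$k$ autocovariance divided by the lag-0 autocovariance. A minimal sketch follows, with a hypothetical function name and a toy series; the PACF and the cross-correlations are not computed here.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


# Toy alternating series: adjacent values are perfectly opposed.
toy_series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r = acf(toy_series, max_lag=2)
```

A slowly decaying ACF, as described for Figure 4, points to persistence in the series, whereas the alternating toy series above produces a strongly negative lag-one value.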
The receiver operating characteristic (ROC) curves in panel (d) plot the model sensitivity (true positive rate) against the false positive rate. The true positive rate is the proportion of positive observations that are correctly predicted to be positive, whereas the false positive rate is the proportion of negative observations that are incorrectly predicted to be positive. The area under the curve indicates the quality of a model in predicting the observations. The GAM-Logistic model has the highest area under the curve, indicating the best fit among the three models.
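As a minimal sketch of how an ROC curve and its area can be computed, the following Python snippet sweeps a classification threshold over predicted probabilities and integrates the resulting curve with the trapezoidal rule. The function names `roc_points` and `auc`, as well as the outcomes and predicted probabilities, are hypothetical illustrations and are not taken from the models or data used in this study.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sweep the threshold from the highest predicted probability downward.
    thresholds = sorted(set(scores), reverse=True)
    points = [(0.0, 0.0)]
    for thr in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical outcomes (1 = disease year) and predicted probabilities.
d = [1, 0, 1, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.3, 0.5]

curve = roc_points(d, p_hat)
area = auc(curve)
print(f"AUC = {area:.3f}")
```

An area of 0.5 corresponds to a model that ranks disease and nondisease years no better than chance, while an area of 1 corresponds to perfect separation, which is why the model with the largest area is preferred.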
Figure 6 shows the quantile–quantile (QQ) plot and the histogram of the residuals of the logistic generalized additive model (GAM-Logistic). The points in the QQ plot fall on a straight line, indicating that the residuals of the model approximately follow the normal distribution. The histogram indicates that the residuals are centered around zero.
Figure 7, Figure 8 and Figure 9 provide further visualization of various features of the logistic generalized additive model. The smoothed transition and state probabilities estimated using the nonhomogeneous hidden Markov model are plotted in Figure 10.
The estimates are obtained using maximum likelihood based on the expectation maximization (EM) algorithm. The estimation results from the alternative parametric and nonparametric model specifications are presented in Table 3, Table 4, Table 5 and Table 6. The tables indicate when the null hypothesis of zero effect of the temperature anomaly on disease spread can be rejected at the 1% (***), 5% (**), and 10% (*) levels. The R-squared, log likelihood, Akaike information criterion (AIC), and Schwarz Bayesian information criterion (BIC) values provide measures of the quality of the respective models and help us to compare them. The models with better fits have lower AIC and BIC values.
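To make the comparison rule concrete, the sketch below computes the two criteria from their standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n is the sample size. The log likelihood values, parameter counts, and sample size are hypothetical and are not the estimates reported in the tables.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Schwarz Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical values: an unrestricted model with five parameters and a
# restricted model with three parameters and a slightly lower likelihood.
n_obs = 2021  # assumed sample size, one observation per year
unrestricted = {"log_lik": -1000.0, "k": 5}
restricted = {"log_lik": -1000.5, "k": 3}

for name, m in (("unrestricted", unrestricted), ("restricted", restricted)):
    print(name, aic(m["log_lik"], m["k"]), round(bic(m["log_lik"], m["k"], n_obs), 2))
```

In this illustration the restricted model gives up very little likelihood and saves two parameters, so both criteria favor it. The BIC penalizes each extra parameter by ln n rather than 2, so it leans more strongly toward the smaller model in long samples.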
In Table 3, the results of the benchmark linear probability model (LPM) are presented, with the first column providing the estimates of the unrestricted model. Columns 2 through 7 present restricted versions of the core model under different zero restrictions on the parameters of the contemporaneous and lagged terms, along with the trend variable. In the unrestricted model, none of the coefficients relating the temperature anomalies to the dependent variable are statistically significant. Column 2 presents the model in which the coefficient of the contemporaneous effect ($\beta_1$) is restricted to zero. Column 5 presents a restricted version with the coefficients of both $h_t$ and $h_{t-2}$ set to zero. A comparison of the alternative versions indicates that the restricted models in columns 5 and 6 are closely comparable. However, column 6, with the coefficients of $h_{t-1}$ and $h_{t-2}$ set to zero, gives the best results, as confirmed by the AIC and BIC scores. In this specification, the coefficient for the contemporaneous effect is statistically significant at the 1% level.
While relatively straightforward to specify and estimate, linear probability models are often not a suitable choice, because the predicted probabilities can fall below zero or above one. To counter this standard limitation of the LPM, a logistic model was estimated. The results are presented in Table 4. Qualitatively, the results from the logistic model are in line with our findings from the benchmark model for both the unrestricted and restricted versions. The best results are for the model that includes a contemporaneous effect of temperature anomalies and a trend term.
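The limitation of the LPM can be illustrated with a short sketch. Both functions below use the same linear index in the temperature anomaly and the trend, but only the logistic version maps it into the unit interval. The coefficient values and function names are hypothetical and chosen purely for illustration; they are not the estimates in Table 3 or Table 4.

```python
import math


def linear_index(h, trend, b0, b1, b2):
    """Linear predictor b0 + b1 * h_t + b2 * tau_t."""
    return b0 + b1 * h + b2 * trend


def lpm_probability(h, trend, coefs):
    """Linear probability model: the index itself is read as a probability."""
    return linear_index(h, trend, *coefs)


def logistic_probability(h, trend, coefs):
    """Logistic model: the index is passed through the inverse logit."""
    return 1.0 / (1.0 + math.exp(-linear_index(h, trend, *coefs)))


# Hypothetical coefficients: (intercept, anomaly slope, trend slope).
coefs = (0.3, 0.5, 0.0002)

p_lpm = lpm_probability(1.2, 2021, coefs)
p_logit = logistic_probability(1.2, 2021, coefs)
print(f"LPM: {p_lpm:.4f}, logistic: {p_logit:.4f}")
```

With a large anomaly late in the sample, the linear index exceeds one, which cannot be interpreted as a probability, whereas the logistic transformation always returns a value strictly between zero and one.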
The results of the nonparametric logistic generalized additive model (GAM) are presented in Table 5. A GAM is a powerful analytical tool because of its ability to fit many types of nonlinear data. However, because of this flexibility, it can be easy to overfit the data. The goal of the model is to strike a balance between two objectives. First, the model must capture the relationship exhibited in the data as closely as possible. This is measured by the “likelihood” function, which indicates how well a model captures patterns in the data it is fitted to. Second, we want to avoid overfitting the data, which is captured by the “wiggliness” of the fit. In the model, the smoothing functions, s(.), are represented by penalized regression splines to avoid overfitting. A smooth or a spline is essentially a function that can take a wide variety of shapes. The smoothing functions are estimated with thin plate splines, which do not depend on prior knowledge of the functional form of the data. Thin plate regression splines can be computationally more costly than other smoothing options, such as cubic splines. However, they have the advantage of not requiring the knot placement that is a feature of conventional regression spline modelling [44].
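The balance between the likelihood and the wiggliness can be written, in a common textbook form, as a penalized log-likelihood. The expression below is a sketch for a specification with one smooth of the temperature anomaly and one smooth of the trend. The smoothing parameters $\lambda_h$ and $\lambda_\tau$ are notation introduced here for illustration, and the second-derivative penalty is the one-dimensional penalty usually associated with thin plate splines, so the exact penalty used in estimation may differ.

```latex
\ell_p = \sum_{t} \left[ d_t \log \pi_t + (1 - d_t) \log (1 - \pi_t) \right]
  - \frac{1}{2} \left[ \lambda_h \int \{ s_h''(x) \}^2 \, dx
  + \lambda_\tau \int \{ s_\tau''(x) \}^2 \, dx \right],
\qquad
\log \frac{\pi_t}{1 - \pi_t} = c + s_h(h_t) + s_\tau(\tau_t)
```

The first term rewards closeness to the data, while the second term grows with the curvature of each smooth. Larger values of the smoothing parameters push the fitted functions toward straight lines, and values near zero allow very flexible fits.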
We estimate Equation (4) with various restrictions imposed on the smooth functions. In column (2), the results are conditioned on the smooth function for $h_t$ being set to zero. Similarly, the results in column (5) are derived under the assumption that the two-period lagged effect and the contemporaneous effect of the temperature anomaly on $d_t$ are zero. The estimates presented in column (6) indicate that the contemporaneous effect of the temperature anomaly on the incidence of a contagious disease in a given year is statistically significant at the 1% level. This restricted model also provides the lowest AIC and BIC values, indicating a better fit than the alternative versions of the nonparametric model we estimated.
In order to account for the possibility that the error term in Equation (4) might be correlated over time, we also run a version of the GAM specifying a first-order autoregressive process for the error structure. The results are presented in Table 6, which also includes estimates of $\rho$, the autocorrelation parameter. The estimates from both specifications of the nonparametric model are similar, with column 6 (in both tables) indicating the best fit to the data compared to the alternative restricted versions of the core model. In fact, the results from the parametric and nonparametric models are qualitatively consistent. In all specifications, the restricted version of the model that includes the contemporaneous effect of the temperature anomaly and the linear trend term provides the best fit compared to the complete unrestricted specification and the alternative zero-restriction variations we imposed.
Table 7 includes the estimates of the parameters of the transition probability expression in Equation (7) and the parameters in Equation (8) used to derive a set of stationary state probabilities for the logistic HMM. The transition probabilities are calculated at the zero values of the covariates. The estimated probabilities of a particular state (disease or nondisease) in any time period evolving into either the same state or the alternative state in the following period add up to 1. Regardless of the initial state, the estimated transition probabilities imply that the probability of remaining in a given state from one year to the next is significantly higher than the probability of transitioning to the other state (0.999 versus 0.0010).
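The structure of these transition probabilities can be sketched in a few lines of Python. The snippet assumes that the probability of leaving each state is a logistic function of the covariates and that the probability of staying is its complement, which guarantees that each row sums to 1. The function names and all coefficient values are hypothetical: the intercept is chosen only so that the switching probability equals 0.001 at zero covariates, matching the split quoted above, and the slopes are invented for illustration rather than taken from Table 7.

```python
import math


def logistic(x):
    """Inverse logit: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def transition_matrix(alpha_12, alpha_21, z):
    """Two-state transition matrix given covariates z = (1, h_t, tau_t).

    alpha_12 and alpha_21 are coefficient vectors for leaving state 1 and
    state 2, respectively (an assumed parameterization).
    """
    p12 = logistic(sum(a * v for a, v in zip(alpha_12, z)))
    p21 = logistic(sum(a * v for a, v in zip(alpha_21, z)))
    # Staying probabilities are the complements, so each row sums to 1.
    return [[1.0 - p12, p12],
            [p21, 1.0 - p21]]


# Hypothetical coefficients: the intercept gives a switching probability of
# 0.001 when the covariates are zero; the slopes are made up.
intercept = math.log(0.001 / 0.999)
alpha_12 = (intercept, 0.8, 0.0005)
alpha_21 = (intercept, -0.8, -0.0005)

P_zero = transition_matrix(alpha_12, alpha_21, (1.0, 0.0, 0.0))
P_warm = transition_matrix(alpha_12, alpha_21, (1.0, 1.0, 0.0))
print(P_zero)
print(P_warm)
```

Evaluating the matrix at different covariate values, as in `P_warm`, shows how a nonhomogeneous model lets the persistence of each state vary over time instead of fixing it at a single value.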

5. Conclusions

The pace and extent of transmission of any contagious disease depend on many contextual factors, such as the availability of healthcare-related services, governmental efficacy in managing the spread, the nature of the disease, and local and regional socioeconomic and environmental conditions at the epicenter. In this paper, we used annual data on contagious disease outbreaks and temperature anomalies from 1 AD to 2021 AD, together with parametric and nonparametric modelling approaches, to estimate the contemporaneous and lagged effects of temperature anomalies on the spread of contagious diseases. Our results indicate that temperature anomalies have played an influential role in the spread of transmissible diseases over the last two thousand years, thereby identifying a common cause among different disease spreads over time. These findings can be used to develop public health surveillance systems across different regions of the world, which are characterized by considerable uncertainty in changes in weather and climate patterns. Region-specific climate forecasting results can be combined with demographic information to develop location-specific, cost-effective disease control policy responses and transmission-based precautionary measures. This is particularly important given that regions across the world vary greatly in the resources that can be dedicated to mitigating the damages associated with the transmission of infectious diseases. Future avenues of research could potentially focus on this line of interdisciplinary work.
Given the length of the time series, our analysis does not include covariates other than temperature anomalies, such as the other environmental and socioeconomic explanatory variables that have been shown to impact the spread of contagious diseases. Also, while historical data allow us to obtain a long-term perspective of the evolution of relationships and help us avoid sample selection bias, they come at the cost of some degree of inaccuracy, as the data might originate from alternative sources. We acknowledge these limitations, but there is no other way of handling the issues in the current context. Future research that focuses on shorter time spans can address these concerns and also explore the complex dynamics among the contagious diseases mentioned in this study.

Author Contributions

Conceptualization, M.B., R.G. and S.D.; methodology, M.B.; software, M.B.; formal analysis, M.B.; investigation, R.G. and S.D.; data curation, R.G. and S.D.; writing—original draft preparation, M.B. and Z.M.; writing—review and editing, M.B. and Z.M.; visualization, M.B.; supervision, R.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. WHO COVID Dashboard. Available online: https://COVID19.who.int/ (accessed on 18 July 2023).
  2. Cirillo, P.; Taleb, N.N. Tail risk of contagious diseases. Nat. Phys. 2020, 16, 606–613.
  3. Norris, J.R.; Allen, R.J.; Evan, A.T.; Zelinka, M.D.; O’dell, C.W.; Klein, S.A. Evidence for climate change in the satellite cloud record. Nature 2016, 536, 72–75.
  4. IPCC. Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Masson-Delmotte, V., Zhai, P., Pirani, A., Connors, S.L., Péan, C., Berger, S., Caud, N., Chen, Y., Goldfarb, L., Gomis, M.I., et al., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2021; 2391p.
  5. McDermott, A. Climate change hastens disease spread across the globe. Proc. Natl. Acad. Sci. USA 2022, 119, e2200481119.
  6. Gorris, M.E.; Treseder, K.K.; Zender, C.S.; Randerson, J.T. Expansion of coccidioidomycosis endemic regions in the United States in response to climate change. GeoHealth 2019, 3, 308–327.
  7. Mora, C.; McKenzie, T.; Gaw, I.M.; Dean, J.M.; von Hammerstein, H.; Knudson, T.A.; Setter, R.O.; Smith, C.Z.; Webster, K.M.; Patz, J.A.; et al. Over half of known human pathogenic diseases can be aggravated by climate change. Nat. Clim. Change 2022, 12, 869–875.
  8. Caminade, C.; McIntyre, K.M.; Jones, A.E. Climate change and vector-borne diseases: Where are we next heading? Lancet Infect. Dis. 2019, 19, e302–e312.
  9. Siraj, A.S.; Santos-Vega, M.; Bouma, M.J.; Yadeta, D.; Ruiz Carrascal, D.; Pascual, M. Altitudinal changes in malaria incidence in Colombia and Ethiopia due to warming. Proc. Natl. Acad. Sci. USA 2014, 111, 3457–3462.
  10. Thomson, M.C.; Stanberry, L.R. Climate Change and Vectorborne Diseases. N. Engl. J. Med. 2022, 387, 1969–1978.
  11. Cohen, J.M.; Sauer, E.L.; Santiago, O.; Spencer, S.; Rohr, J.R. Divergent impacts of warming weather on wildlife disease risk across climates. Science 2020, 370, eabb1702.
  12. Cohen, J.M.; Venesky, M.D.; Sauer, E.L.; Civitello, D.J.; Taegan, A.M.; Roznik, E.A.; Rohr, J.R. The thermal mismatch hypothesis explains host susceptibility to an emerging infectious disease. Ecol. Lett. 2017, 20, 184–193.
  13. Morens, D.M.; Taubenberger, J.K.; Fauci, A.S. A Centenary Tale of Two Pandemics: The 1918 Influenza Pandemic and COVID-19, Part I. Am. J. Public Health 2021, 111, 1086–1094.
  14. Dias, J.G.; Vermunt, J.K.; Ramos, S. Clustering financial time series: New insights from an extended hidden Markov model. Eur. J. Oper. Res. 2015, 243, 852–864.
  15. Mamon, R.S.; Elliott, R.J. Hidden Markov Models in Finance; Springer: New York, NY, USA, 2014.
  16. Genon-Catalot, V.; Jeantheau, T.; Laredo, C. Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli 2000, 6, 1051–1079.
  17. Scott, S.L.; James, G.M.; Sugar, C.A. Hidden Markov Models for Longitudinal Comparisons. J. Am. Stat. Assoc. 2005, 100, 359–369.
  18. Nock, H.J.; Young, S.J. Modelling asynchrony in automatic speech recognition using loosely coupled hidden Markov models. Cog Sci. 2002, 3, 283–301.
  19. Dasgupta, I.; Gershman, S. Memory as a Computational Resource. Trends Cogn. Sci. 2021, 25, 240–251.
  20. Yap, K.L.; Chong, Y.W. Optimized access point selection with mobility prediction using hidden Markov Model for wireless network. In Proceedings of the 2017 Ninth International Conference on Ubiquitous and Future Networks (ICUFN), Milan, Italy, 4–7 July 2017; pp. 38–42.
  21. Gani, M.O.; Sarwar, H.; Chowdhury, M.R. Prediction of State of Wireless Network Using Markov and Hidden Markov Model. J. Netw. 2009, 4, 976–984.
  22. Zucchini, W.; Guttorp, P. A hidden Markov model for space-time precipitation. Water Resour. Res. 1991, 27, 1917–1923.
  23. Greene, A.M.; Robertson, A.W.; Smyth, P.; Triglia, S. Downscaling projections of Indian monsoon rainfall using a non-homogeneous hidden Markov model. Q. J. R. Meteorol. Soc. 2011, 137, 347–359.
  24. Kuhn, K.; Campbell-Lendrum, D.; Haines, A.; Cox, J.; Corvalán, C.; Anker, M. Using Climate to Predict Infectious Disease Epidemics; World Health Organization: Geneva, Switzerland, 2005; pp. 16–20.
  25. Chen, M.-J.; Lin, C.-Y.; Wu, Y.-T.; Wu, P.-C.; Lung, S.-C.; Su, H.-J. Effects of Extreme Precipitation to the Distribution of Infectious Diseases in Taiwan, 1994–2008. PLoS ONE 2012, 7, e34651.
  26. Perez, L.; Dragicevic, S. An agent-based approach for modeling dynamics of contagious disease spread. Int. J. Health Geogr. 2009, 8, 50.
  27. Li, R.; Richmond, P.; Roehner, B.M. Effect of population density on epidemics. Phys. A Stat. Mech. Its Appl. 2018, 510, 713–724.
  28. Neiderud, C.-J. How urbanization affects the epidemiology of emerging infectious diseases. Infect. Ecol. Epidemiol. 2015, 5, 27060.
  29. Kinasih, S.E.; Devy, S.R.; Koesbardiati, T.; Romadhona, M.K. Human migration, infectious diseases, plague, global health crisis–historical evidence. Cogent Arts Humanit. 2024, 11, 2392399.
  30. Acevedo-Garcia, D. Residential segregation and the epidemiology of infectious diseases. Soc. Sci. Med. 2000, 51, 1143–1161.
  31. Martin-Moreno, J.M.; Alegre-Martinez, A.; Martin-Gorgojo, V.; Alfonso-Sanchez, J.L.; Torres, F.; Pallares-Carratala, V. Predictive Models for Forecasting Public Health Scenarios: Practical Experiences Applied during the First Wave of the COVID-19 Pandemic. Int. J. Environ. Res. Public Health 2022, 19, 5546.
  32. Monaghan, A.J.; Sampson, K.M.; Steinhoff, D.F.; Ernst, K.C.; Ebi, K.L.; Jones, B.; Hayden, M.H. The potential impacts of 21st century climatic and population changes on human exposure to the virus vector Aedes aegypti. Clim. Change 2015, 131, 67–80.
  33. Mills, J.N.; Gage, K.L.; Khan, A.S. Potential influence of climate change on vector-borne and zoonotic diseases: A review and proposed research plan. Environ. Health Perspect. 2010, 118, 1507–1514.
  34. Trærup, S.; Ortiz, R.A.; Markandya, A. The health impacts of climate change: A study of cholera in Tanzania. Glob. Environ. Change 2011, 21, 392–403.
  35. Revich, B.; Podolnaya, M.A.; Popova, E.Y. Thawing of permafrost may disturb historic cattle burial grounds in East Siberia. Glob. Health Action 2011, 4, 8482.
  36. ListFist List of Epidemics Compared to Coronavirus (COVID-19). Available online: https://listfist.com/list-of-epidemics-compared-to-coronavirus-COVID-19 (accessed on 24 July 2023).
  37. Wikipedia List of Epidemics and Pandemics. Available online: https://en.wikipedia.org/wiki/List_of_epidemics_and_pandemics#cite_note-38 (accessed on 24 July 2023).
  38. World History Encyclopedia. Available online: https://www.worldhistory.org/article/1532/plagues-of-the-near-east-562-1486-ce/ (accessed on 24 July 2023).
  39. Hawkins, E. Climate Lab Book. 2020. Available online: https://web.archive.org/web/20200202220240/https://www.climate-lab-book.ac.uk/2020/2019-years/ (accessed on 18 July 2023).
  40. National Centers for Environmental Information NOAA. Available online: https://www.ncei.noaa.gov/access/monitoring/global-temperature-anomalies#:~:text=The%20term%20temperature%20anomaly%20means,cooler%20than%20the%20reference%20value (accessed on 20 May 2023).
  41. Wood, S. Thin plate regression splines. J. R. Stat. Soc. Ser. B 2003, 65, 95–114.
  42. Weiss, C. An Introduction to Discrete-Valued Time Series; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 2018.
  43. Tian, H.; Yan, C.; Xu, L.; Büntgen, U.; Stenseth, N.C.; Zhang, Z. Scale-dependent climatic drivers of human epidemics in ancient China. Proc. Natl. Acad. Sci. USA 2017, 114, 12970–12975.
  44. Crawley, M.J. The R Book; Wiley: Chichester, West Sussex, UK, 2013.
Figure 1. History of global temperature change and causes of recent warming. Source: IPCC, 2021: Climate Change 2021: The Physical Science Basis (page 6) [4].
Figure 2. Contagious disease and temperature anomaly series. Note: The figure plots the dummy indicator for contagious disease and the temperature anomaly over the years from 1 AD to 2021 AD. Heatmap color intensity in Panel (a) indicates the number of contagious disease years in consecutive 25-year intervals. Panel (b) displays the temperature anomaly series.
Figure 3. Conditional distribution and temperature anomaly series. Note: The figure displays the density and boxplots of the temperature anomaly conditional on the status of the contagious disease with high and low temperature anomaly levels. High and low anomaly levels are defined as values above 0.25 and below 0.25, respectively, where this value naturally splits disease occurrences into these classes. Panel (a) displays kernel density estimates with a Gaussian kernel. Panel (b) displays boxplots with overlaid observations conditional on the contagious disease status.
Figure 4. Autocorrelations and cross-correlations of contagious disease and temperature anomaly. Note: The figure displays the autocorrelation function (ACF), partial autocorrelation function (PACF), cross-correlation function (CCF), and partial cross-correlation function (PCCF) of the contagious disease and temperature anomaly series. When the binary contagious disease series is involved, all four measures (ACF, PACF, CCF, and PCCF) are obtained using Cohen’s κ statistic ([42], p. 130), a measure of signed serial dependence for discrete-valued time series.
Figure 5. Linear, logistic, and logistic generalized additive model fit assessment. Note: The figure presents the model assessment for the logistic generalized additive model (GAM-Logistic) that is the best among all models considered, the logistic model that has the best AIC among the logistic models, and a benchmark linear model. Panel (a) plots the predictions ($\hat{\pi}_t$) against the residuals ($d_t - \hat{\pi}_t$) with local polynomial regression (LOESS) fits using a second-degree polynomial. Panel (b) plots the predicted probability of the occurrence of a contagious disease by temperature anomaly with the trend variable set equal to zero. Panel (c) plots the predicted probability of the occurrence of a contagious disease by the time trend with the temperature anomaly set equal to zero. Panel (d) plots the receiver operating characteristic (ROC) curves.
Figure 6. Diagnostics for the logistic generalized additive model. Note: The figure presents model diagnostics for the selected logistic GAM model. Quantile–quantile (QQ) plots of the model residuals are obtained by generating reference quantiles that associate each data point with a quantile of the uniform distribution. The residual vs. linear predictor plots are based on the fitted model prediction of a binomial link function of expected values for each data point.
Figure 7. Conditional predictions from the logistic generalized additive model. Note: The figure displays conditional predictions for the probability of contagious diseases from the logistic generalized additive model. In Panel (a), predictions for the time trend $\tau_t$ conditional on three specific values of temperature anomaly are displayed: high temperature anomaly (Case A: $h_t = 0.654$) corresponding to the mean temperature anomaly in high temperature contagious disease periods ($h_t > 0.25$ and $d_t = 1$), medium temperature anomaly (Case B: $h_t = 0.244$) corresponding to the mean temperature anomaly in no contagious disease periods ($d_t = 0$), and low temperature anomaly (Case C: $h_t = 0.363$) corresponding to the mean temperature anomaly in low temperature contagious disease periods ($h_t \le 0.25$ and $d_t = 1$). In Panel (b), predictions for the temperature anomaly ($h_t$) conditional on three specific values of time are displayed. The time periods that the predictions are conditioned on are $\tau_t = 1900$ (Case D), $\tau_t = 1800$ (Case E), and $\tau_t = 1700$ (Case F).
Figure 8. Partial effects and partial derivatives in the logistic generalized additive model. Note: The figure depicts the partial effects and partial derivatives of the temperature anomaly $h_t$ and time trend $\tau_t$ in the logistic GAM, which is specified as $g(h_t, \tau_t) = c + s_h(h_t) + s_\tau(\tau_t)$, where the function $g(\cdot)$ is a logistic link function defined as $g(\cdot) = \log\{\pi(\cdot) / [1 - \pi(\cdot)]\}$, with $\pi(h_t, \tau_t) = P(d_t = 1 \mid h_t, \tau_t) = \exp\{g(h_t, \tau_t)\} / (1 + \exp\{g(h_t, \tau_t)\})$.
Figure 9. The joint partial effects of temperature anomaly and trend in the logistic generalized additive model. Note: The figure presents the full joint effects of the temperature anomaly and trend variables with superimposed contour lines. The partial effect estimates are obtained from a tensor product smoother specified as $g(h_t, \tau_t) = c + s(h_t, \tau_t)$, where the function $g(\cdot)$ is a logistic link function defined as $g(\cdot) = \log\{\pi(\cdot) / [1 - \pi(\cdot)]\}$ with $\pi(h_t, \tau_t) = P(d_t = 1 \mid h_t, \tau_t)$. The tensor product smooth $s$ is constructed using row Kronecker products.
Figure 10. Smoothed transition and state probability estimated from a nonhomogeneous hidden Markov model. Note: The figure depicts the time-varying transition and state probabilities from a two-state nonhomogeneous hidden Markov model ($S_t \in \{1, 2\}$, with $S_t = 1$ denoting the noncontagious disease state and $S_t = 2$ the contagious disease state). The transition probability estimates given in Panels (a–d) are specified as $p_{ij}(z_t) = P(S_t = j \mid S_{t-1} = i, z_t) = \exp\{\alpha_{ij}' z_t\} / (1 + \exp\{\alpha_{ij}' z_t\})$, $i, j \in \{1, 2\}$, where $z_t = (1, h_t, \tau_t)'$ and $\alpha_{ij} = (\alpha_{0,ij}, \alpha_{1,ij}, \alpha_{2,ij})'$.
Table 1. Contagious disease events and dates included in the sample.
| Event | Start Year | End Year | Location | Estimated Deaths |
|---|---|---|---|---|
| Plague of Athens | −429 | −426 | Greece, Libya, Egypt, Ethiopia | 75,000–100,000 |
| Antonine Plague | 165 | 180 | Roman Empire | 5–10 million |
| Plague of Cyprian | 250 | 266 | Europe | 310,000 |
| Plague of Justinian | 541 | 542 | Europe, West Asia | 15–100 million |
| Plague of Amida | 562 | 562 | Mesopotamia (modern day Turkey) | 30,000 |
| Roman Plague of 590 | 590 | 590 | Rome, Byzantine Empire | Unknown |
| Plague of Sheroe | 627 | 628 | Bilad al-Sham | 25,000+ |
| Plague of the British Isles | 664 | 689 | British Isles | Unknown |
| Plague of Basra | 688 | 689 | Basra (southeast Turkey) | 200,000 |
| Japanese Smallpox Epidemic | 735 | 737 | Japan | 2 million |
| Black Death | 1331 | 1353 | Eurasia and North Africa | 75–200 million |
| Sweating Sickness | 1485 | 1551 | Britain | 10,000+ |
| Smallpox Epidemic in Mexico | 1520 | 1520 | Mexico | 5–8 million |
| Cocoliztli Epidemic of 1545–1548 | 1545 | 1548 | Mexico | 5–15 million |
| 1563 London Plague | 1562 | 1564 | London, England | 20,100 |
| Malta Plague Epidemic | 1592 | 1593 | Malta | 3000 |
| Plague in Spain | 1596 | 1602 | Spain | 600,000–700,000 |
| New England Epidemic | 1616 | 1620 | New England | Unknown |
| Italian Plague of 1629–1631 | 1629 | 1631 | Italy | 1 million |
| Great Plague of Sevilla | 1647 | 1652 | Spain | 500,000 |
| Plague in Kingdom of Naples | 1656 | 1658 | Italy | 1,250,000 |
| Plague in the Netherlands | 1663 | 1664 | Amsterdam, Netherlands | 24,148 |
| Great Plague of London | 1665 | 1666 | England | 100,000 |
| Plague in France | 1668 | 1668 | France | 40,000 |
| Malta Plague Epidemic | 1675 | 1676 | Malta | 11,300 |
| Great Plague of Vienna | 1679 | 1679 | Vienna, Austria | 76,000 |
| Great Northern War plague Outbreak | 1700 | 1721 | Denmark, Sweden, Lithuania | 164,000 |
| Great Smallpox Epidemic in Iceland | 1707 | 1709 | Iceland | 18,000+ |
| Great Plague of Marseille | 1720 | 1722 | France | 100,000 |
| Great Plague of 1738 | 1738 | 1738 | Balkans | 50,000 |
| Russian Plague of 1770–1772 | 1770 | 1772 | Russia | 50,000 |
| Ottoman Plague Epidemic | 1812 | 1819 | Ottoman Empire | 300,000+ |
| Caragea’s Plague | 1813 | 1813 | Romania | 60,000 |
| Malta Plague Epidemic | 1813 | 1814 | Malta | 4500 |
| First Cholera Pandemic | 1816 | 1826 | Asia, Europe | 100,000+ |
| Second Cholera Pandemic | 1829 | 1851 | Asia, Europe, North America | 100,000+ |
| Typhus Epidemic in Canada | 1847 | 1848 | Canada | 20,000+ |
| Third Cholera Pandemic | 1852 | 1860 | Worldwide | 1 million+ |
| Cholera Epidemic of Copenhagen | 1853 | 1853 | Copenhagen, Denmark | 4737 |
| Third Plague Pandemic | 1855 | 1960 | Worldwide (India, China) | 12–15 million |
| Smallpox in British Columbia | 1862 | 1863 | Pacific Northwest, Canada, US | 20,000+ |
| Fourth Cholera Pandemic | 1863 | 1875 | Middle East | 600,000 |
| Fiji Measles outbreak | 1875 | 1875 | Fiji | 40,000 |
| Yellow Fever | 1880 | 1900 | Mississippi, New Orleans, US | 17,000+ |
| Fifth Cholera Pandemic | 1881 | 1896 | Asia, Africa, Europe, South America | 298,600 |
| Smallpox in Montreal | 1885 | 1885 | Montreal, Canada | 3164 |
| Russian Flu | 1889 | 1890 | Russia, Worldwide | 1 million |
| Sixth Cholera Pandemic | 1899 | 1923 | Europe, Asia, Africa | 800,000 |
| China Plague | 1910 | 1912 | China | 40,000 |
| Encephalitis Lethargica Pandemic | 1915 | 1926 | Worldwide | 500,000 |
| American Polio Epidemic | 1916 | 1916 | United States | 7130 |
| Spanish Flu | 1918 | 1920 | Worldwide | 17–100 million |
| HIV/AIDS Pandemic | 1981 | 2023 | Worldwide | 42 million |
| Poliomyelitis in USA | 1946 | 1946 | United States | 9000 |
| Asian Flu | 1957 | 1958 | Worldwide | 1–4 million |
| Hong Kong Flu | 1968 | 1969 | Worldwide | 1–4 million |
| London Flu | 1972 | 1973 | United States | 1027 |
| Smallpox Epidemic of India | 1974 | 1974 | India | 15,000 |
| Zimbabwean Cholera Outbreak | 2008 | 2009 | Zimbabwe | 4293 |
| Swine Flu | 2009 | 2009 | Worldwide | 151,700–575,400 |
| Haiti Cholera Outbreak | 2010 | 2020 | Haiti | 10,075 |
| Measles in D.R. Congo | 2010 | 2014 | Democratic Republic of Congo (DRC) | 4500 |
| Ebola in West Africa | 2013 | 2016 | Worldwide (Guinea, Liberia, Sierra Leone) | 11,323+ |
| Indian Swine Flu Outbreak | 2015 | 2015 | India | 2035 |
| Yemen Cholera Outbreak | 2016 | 2020 | Yemen | 3981 |
| 2018–2019 Kivu Ebola Epidemic | 2018 | 2020 | DRC and Uganda | 2280 |
| Measles in D.R. Congo | 2019 | 2020 | DRC | 7018 |
| Dengue Fever | 2019 | 2020 | Asia-Pacific, Latin America | 3930 |
| COVID-19 Pandemic | 2019 | To date | Worldwide | 7–29.3 million |
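One way to turn the start and end years in Table 1 into an annual binary disease variable $d_t$ is sketched below in Python. This is only an illustration under stated assumptions: the function name and the handling of overlapping or pre-1 AD events are hypothetical, and the paper's own construction of $d_t$ may differ in detail.

```python
def disease_indicator(events, first_year=1, last_year=2021):
    """Return {year: 0 or 1}, with 1 if at least one event spans that year."""
    d = {year: 0 for year in range(first_year, last_year + 1)}
    for start, end in events:
        # Clip each event to the sample window; events outside it add nothing.
        for year in range(max(start, first_year), min(end, last_year) + 1):
            d[year] = 1
    return d


# A few (start, end) pairs copied from Table 1.
events = [
    (165, 180),    # Antonine Plague
    (250, 266),    # Plague of Cyprian
    (1812, 1819),  # Ottoman Plague Epidemic
    (1813, 1813),  # Caragea's Plague (overlaps the previous event)
    (-429, -426),  # Plague of Athens (before the 1 AD start of the sample)
]

d = disease_indicator(events)
print(sum(d.values()))  # number of disease years implied by these five events
```

Overlapping events are counted once per year, so the indicator records whether any contagious disease event was ongoing rather than how many.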
Table 2. Descriptive statistics.
| Statistic | (1) Temperature Anomaly: Full Sample | (2) Temperature Anomaly: Nondisease Periods | (3) Temperature Anomaly: Disease Periods | (4) Low Temperature Anomaly: Disease Periods | (5) High Temperature Anomaly: Disease Periods | (6) Contagious Disease |
|---|---|---|---|---|---|---|
| Observations | 2021 | 1662 | 359 | 342 | 17 | 2021 |
| Mean | −0.2565 | −0.2439 | −0.3148 | −0.3630 | 0.6542 | 0.1776 |
| S.D. | 0.1626 | 0.1315 | 0.2544 | 0.1315 | 0.1789 | 0.3823 |
| Min | −0.7128 | −0.6688 | −0.7128 | −0.7128 | 0.4428 | 0.0000 |
| Max | 1.0071 | 0.5680 | 1.0071 | 0.0774 | 1.0071 | 1.0000 |
| Skewness | 1.8552 | 0.6627 | 2.8153 | 0.6760 | 0.4781 | 1.6856 |
| Kurtosis | 10.3064 | 4.2776 | 9.2488 | 0.4030 | −1.2498 | 0.8417 |
| JB | 10,128.7200 *** | 1394.1920 *** | 1776.7710 *** | 28.8250 *** | 1.5200 | 1018.6720 *** |
| Q(1) | 1608.9177 *** | 1218.9419 *** | 291.7578 *** | 208.1254 *** | 9.8364 *** | 1574.3151 *** |
| Q(4) | 5750.1233 *** | 4171.6226 *** | 981.4622 *** | 629.5901 *** | 17.7115 *** | 4927.1886 *** |
| ARCH(1) | 1844.5608 *** | 1297.6202 *** | 335.4321 *** | 136.2012 *** | 4.5872 ** | 1574.9812 *** |
| ARCH(4) | 1876.8272 *** | 1346.6727 *** | 337.1974 *** | 157.6724 *** | 4.6465 | 1585.6803 *** |
Note: The table reports descriptive statistics for the temperature anomaly ($h_t$) and contagious disease ($d_t$) variables, with annual data covering the period from 1 AD to April 2021 (2021 observations). In addition to the full sample (column 1), the descriptive statistics for the temperature anomaly are reported for four subsamples: periods of noncontagious disease ($d_t = 0$; column 2), periods of contagious disease ($d_t = 1$; column 3), periods of low temperature anomaly and contagious disease ($d_t = 1$ and $h_t \le 0.25$; column 4), and periods of high temperature anomaly and contagious disease ($d_t = 1$ and $h_t > 0.25$; column 5). The table reports the mean, standard deviation (S.D.), minimum, maximum, skewness, and kurtosis, as well as the Jarque–Bera normality test (JB), the first- [Q(1)] and fourth-order [Q(4)] Ljung–Box portmanteau tests for serial correlation, and the first- [ARCH(1)] and fourth-order [ARCH(4)] autoregressive conditional heteroskedasticity tests. ** and *** denote rejection of the null hypothesis at the 5% and 1% levels, respectively.
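For readers who want to see how the skewness, kurtosis, and Jarque–Bera entries relate to one another, the Python sketch below uses the textbook definitions based on population moments and excess kurtosis. The source does not state which small-sample conventions were used, so this is an assumption, and the function is not expected to reproduce the Table 2 values exactly.

```python
def jarque_bera(x):
    """Return (skewness, excess kurtosis, JB statistic) from population moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    # Textbook form: JB = n/6 * (S^2 + K^2 / 4), with K the excess kurtosis.
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return skew, excess_kurt, jb


skew, excess_kurt, jb = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])
print(skew, excess_kurt, jb)
```

Under normality the statistic is asymptotically chi-squared with two degrees of freedom, so large values indicate a departure from normality.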
Table 3. Linear probability model estimates.
Model: (1); (2); (3); (4); (5); (6); (7)
Intercept: −0.102 *** (0.019); −0.101 *** (0.019); −0.100 *** (0.018); −0.099 *** (0.018); −0.098 *** (0.018); −0.098 *** (0.018); 0.077 *** (0.016)
$h_t$: −0.080 (0.121); −0.108 (0.115); −0.185 *** (0.050); −0.393 *** (0.052)
$h_{t-1}$: −0.034 (0.138); −0.084 (0.115); −0.087 (0.117); −0.186 *** (0.051)
$h_{t-2}$: −0.089 (0.123); −0.115 (0.117); −0.192 *** (0.052)
$\tau_t$: 0.00022 *** (0.00001); 0.00022 *** (0.00001); 0.00023 *** (0.00001); 0.00022 *** (0.00001); 0.00023 *** (0.00001); 0.00023 *** (0.00001)
R-squared: 0.139; 0.139; 0.139; 0.139; 0.139; 0.139; 0.028
Log L: −772.311; −772.532; −772.573; −773.186; −773.021; −772.851; −895.195
AIC: 1556.622; 1555.064; 1555.146; 1554.372; 1554.042; 1553.703; 1796.389
BIC: 1590.284; 1583.116; 1583.198; 1584.421; 1576.483; 1576.144; 1813.220
Note: The table reports the estimates for the linear probability model in Equation (1) with various zero restrictions on the parameters. The variable $h_t$ denotes the temperature anomaly in year $t$, $t = 1, 2, \ldots, 2021$, and $\tau_t$ denotes a linear time trend for year $t$. The table also reports McFadden’s pseudo-R squared (R-squared), the logarithm of the likelihood (Log L), the Akaike information criterion (AIC), and Schwarz’s Bayesian information criterion (BIC). The standard errors of the estimates are given in brackets. Boldface denotes the minimum AIC and BIC values. *** denotes rejection of the null hypothesis of zero effect at the 1% level.
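To make the linear probability estimates concrete, the Python sketch below evaluates fitted probabilities at the minimum, mean, and maximum temperature anomaly from Table 2. It reads the last intercept and last $h_t$ entries in Table 3 as belonging to Model (7), a specification with $h_t$ only and no trend; that reading of the table layout is an assumption.

```python
def lpm_fitted(h_t, intercept=0.077, slope=-0.393):
    """Fitted value of a linear probability model with a single regressor h_t.

    Defaults are the entries read as Model (7) in Table 3 (an assumption).
    """
    return intercept + slope * h_t


# Full-sample minimum, mean, and maximum anomaly from Table 2, column (1).
for label, h in [("min", -0.7128), ("mean", -0.2565), ("max", 1.0071)]:
    print(f"{label}: h_t={h:+.4f}, fitted P(d_t = 1)={lpm_fitted(h):+.4f}")
```

At the sample mean anomaly the fitted value is close to the sample share of disease years reported in Table 2, while at the maximum anomaly it is negative. Fitted values outside the unit interval are a known limitation of linear probability models and one motivation for the logistic specifications.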
Table 4. Logistic model estimates.
Model: (1); (2); (3); (4); (5); (6); (7)
Intercept: −4.254 *** (0.217); −4.247 *** (0.217); −4.247 *** (0.217); −4.237 *** (0.216); −4.234 *** (0.216); −4.237 *** (0.216); −2.518 *** (0.149)
$h_t$: −0.609 (0.902); −0.767 (0.859); −1.237 *** (0.334); −3.501 *** (0.451)
$h_{t-1}$: −0.191 (1.045); −0.592 (0.860); −0.518 (0.875); −1.240 *** (0.339)
$h_{t-2}$: −0.527 (0.919); −0.717 (0.875); −1.271 *** (0.345)
$\tau_t$: 0.002 *** (0.0001); 0.002 *** (0.0001); 0.002 *** (0.0001); 0.002 *** (0.00001); 0.002 *** (0.0001); 0.002 *** (0.000)
R-squared: 0.164; 0.163; 0.164; 0.163; 0.163; 0.163; 0.036
Log L: −790.317; −790.545; −790.481; −790.782; −790.881; −790.657; −911.140
AIC: 1590.634; 1589.090; 1588.963; 1587.564; 1587.763; 1587.315; 1826.280
BIC: 1618.686; 1611.532; 1611.404; 1604.395; 1604.594; 1604.146; 1837.501
Note: The table reports the estimates for the logistic probability model in Equation (2) with various zero restrictions on the parameters. The variable $h_t$ denotes the temperature anomaly in year $t$, $t = 1, 2, \ldots, 2021$, and $\tau_t$ denotes a linear time trend for year $t$. The table also reports McFadden’s pseudo-R squared (R-squared), the logarithm of the likelihood (Log L), the Akaike information criterion (AIC), and Schwarz’s Bayesian information criterion (BIC). The standard errors of the estimates are given in brackets. *** denotes rejection of the null hypothesis of zero effect at the 1% level. Minimum AIC and BIC values appear in bold.
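The information criteria in Table 4 can be related to the reported log-likelihoods with the standard definitions AIC = 2k − 2 Log L and BIC = k ln(n) − 2 Log L. The Python sketch below checks two columns. The parameter counts k (2 for Model (7), 3 for Model (6)) and the effective sample size n = 2019 (two observations lost to lags) are inferred from the reported values and are assumptions, not figures stated in the source.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Schwarz's Bayesian information criterion for k parameters, n observations."""
    return k * math.log(n) - 2 * log_lik


n = 2019  # assumed effective sample size (2021 years minus two lags)

# Model (7): Log L = -911.140, assumed k = 2 (intercept and h_t).
print(round(aic(-911.140, 2), 3), round(bic(-911.140, 2, n), 3))

# Model (6): Log L = -790.657, assumed k = 3 (intercept, one anomaly term, trend).
print(round(aic(-790.657, 3), 3), round(bic(-790.657, 3, n), 3))
```

Both criteria penalize the log-likelihood for the number of parameters, with BIC applying the heavier penalty at this sample size, so lower values indicate a preferred specification.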
Table 5. Logistic generalized additive model estimates.
Model: (1); (2); (3); (4); (5); (6); (7)
Intercept: −4.739 *** (0.750); −4.767 *** (0.755); −4.753 *** (0.754); −4.829 *** (0.764); −4.784 *** (0.760); −4.791 *** (0.760); −1.822 *** (0.075)
$s_1(h_t)$: 12.337 ** (3.933); 13.788 ** (3.964); 73.866 *** (5.246); 251.475 *** (6.633)
$s_2(h_{t-1})$: 9.124 (4.186); 29.620 *** (5.257); 10.414 * (4.145); 70.326 *** (5.429)
$s_3(h_{t-2})$: 0.499 (1.243); 3.048 (2.053); 64.645 *** (4.986)
$s_4(\tau_t)$: 207.781 *** (12.176); 208.388 *** (12.188); 208.685 *** (12.182); 209.577 *** (12.214); 210.482 *** (12.196); 209.530 *** (12.199)
R-squared: 0.393; 0.390; 0.393; 0.384; 0.389; 0.388; 0.167
Log L: −544.991; −549.450; −545.648; −556.882; −551.469; −550.085; −789.674
AIC: 1141.834; 1145.625; 1138.606; 1151.717; 1141.915; 1138.542; 1595.723
BIC: 1287.286; 1276.693; 1271.321; 1258.182; 1251.255; 1246.180; 1641.660
UBRE: 595.411; 598.140; 595.064; 603.269; 598.528; 596.767; 803.486
Note: The table reports the estimates for the logistic probability model in Equation (4) with various restricted variants. The variable $h_t$ denotes the temperature anomaly in year $t$, $t = 1, 2, \ldots, 2021$, and $\tau_t$ denotes a linear time trend for year $t$. The smooth terms $s_i(\cdot)$ are represented using penalized regression splines with smoothing parameters selected by the unbiased risk estimator (UBRE) criterion. The table reports the estimates of the intercept with its standard error in brackets. For the smooth terms $s_i(\cdot)$, the table reports the approximate significance $\chi^2$ statistics with effective degrees of freedom in brackets. The table also reports McFadden’s pseudo-R squared (R-squared), the logarithm of the likelihood (Log L), the Akaike information criterion (AIC), Schwarz’s Bayesian information criterion (BIC), and the unbiased risk estimator (UBRE) score. *, **, and *** denote rejection of the null hypothesis of zero effect at the 10%, 5%, and 1% levels, respectively. Minimum AIC and BIC values appear in bold.
Table 6. Logistic generalized additive model estimates with serial correlation.
Model: (1); (2); (3); (4); (5); (6); (7)
Intercept: −4.682 *** (0.697); −4.683 *** (0.697); −4.695 *** (0.701); −4.729 *** (0.705); −4.713 *** (0.705); −4.719 *** (0.705); −1.823 *** (0.074)
$s_1(h_t)$: 11.569 ** (3.546); 13.269 *** (3.595); 75.069 *** (4.931); 250.085 *** (6.239)
$s_2(h_{t-1})$: 9.393 ** (3.832); 67.878 *** (5.178); 9.887 ** (3.713); 69.784 *** (5.051)
$s_3(h_{t-2})$: 0.862 (1.000); 2.127 (1.000); 67.738 *** (4.676)
$s_4(\tau_t)$: 192.115 *** (12.023); 192.891 *** (12.029); 193.256 *** (12.030); 195.289 *** (12.056); 195.721 *** (12.043); 195.017 *** (12.045)
$\rho$: 0.891; 0.858; 0.858; 0.854; 0.858; 0.885; 0.878
R-squared: 0.393; 0.389; 0.393; 0.383; 0.388; 0.388; 0.167
Log L: −599.653; −602.125; −600.074; −607.753; −603.168; −601.435; −802.978
AIC: 1217.306; 1218.251; 1214.148; 1225.505; 1216.337; 1212.870; 1611.955
BIC: 1267.799; 1257.523; 1253.421; 1253.557; 1244.389; 1240.922; 1628.786
Note: The table reports the estimates for the logistic probability model with AR(1) error structure in Equation (4) with various restricted variants. The variable $h_t$ denotes the temperature anomaly in year $t$, $t = 1, 2, \ldots, 2021$, and $\tau_t$ denotes a linear time trend for year $t$. The smooth terms $s_i(\cdot)$ are represented using penalized regression splines with smoothing parameters selected by generalized cross-validation (GCV). The table reports the estimates of the intercept with its standard error in brackets. For the smooth terms $s_i(\cdot)$, the table reports the approximate significance $\chi^2$ statistics with effective degrees of freedom in brackets. The table also reports McFadden’s pseudo-R squared (R-squared), the logarithm of the likelihood (Log L), the Akaike information criterion (AIC), and Schwarz’s Bayesian information criterion (BIC). Parameters are estimated using a generalization of the penalized quasi-likelihood algorithm. ** and *** denote rejection of the null hypothesis of zero effect at the 5% and 1% levels, respectively. Minimum AIC and BIC values appear in bold.
Table 7. Estimates of the hidden Markov model.
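| Parameter | Estimate | Probabilities at Zero Values of the Covariates |
|---|---|---|
| $\alpha_{0,2}$ | −6.9516 *** (0.7727) | |
| $\alpha_{1,12}$ | −0.8445 (0.9618) | $p_{11}$ = 0.9990 |
| $\alpha_{2,12}$ | 0.0023 *** (0.0004) | $p_{12}$ = 0.0010 |
| $\alpha_{0,2}$ | 2.3465 *** (0.0016) | $p_{21}$ = 0.0010 |
| $\alpha_{1,22}$ | −0.0184 (0.1223) | $p_{22}$ = 0.9990 |
| $\alpha_{2,22}$ | 0.00002 *** (0.000002) | |
| $\gamma_0$ | −683.0342 (23.2476) | |
| $\gamma_{1,2}$ | 17.7412 (18.0686) | |
| $\gamma_{2,2}$ | 0.4651 *** (0.0155) | |
| Log L | −262.0879 | |
| AIC | 542.1758 | |
| BIC | 592.6779 | |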
Note: The table reports the estimates for the nonhomogeneous hidden Markov model defined in Equations (5)–(8). The variable $h_t$ denotes the temperature anomaly in year $t$, $t = 1, 2, \ldots, 2021$, and $\tau_t$ denotes a linear time trend for year $t$. The table also reports the logarithm of the likelihood (Log L), the Akaike information criterion (AIC), and Schwarz’s Bayesian information criterion (BIC). *** denotes rejection of the null hypothesis of zero effect at the 1% level.
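To give a sense of the persistence implied by the probabilities in the last column of Table 7, the Python sketch below treats them as the transition matrix of a fixed (homogeneous) two-state chain. This is an assumption made purely for illustration: in the estimated model the transition probabilities vary with the covariates, so the numbers describe only the zero-covariate benchmark.

```python
def expected_duration(p_stay):
    """Expected number of consecutive periods spent in a state, given the
    probability p_stay of remaining in it (geometric waiting time)."""
    return 1.0 / (1.0 - p_stay)


def stationary_distribution(p12, p21):
    """Long-run shares of time in states 1 and 2 for a two-state chain."""
    total = p12 + p21
    return p21 / total, p12 / total


# Probabilities at zero values of the covariates, as reported in Table 7.
p11, p12, p21, p22 = 0.9990, 0.0010, 0.0010, 0.9990

print(expected_duration(p11), expected_duration(p22))
print(stationary_distribution(p12, p21))
```

With staying probabilities this close to one, both states are highly persistent at the benchmark, so any switching over the sample has to come mainly from the covariate-driven variation in the transition probabilities.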
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Balcilar, M.; Mukherjee, Z.; Gupta, R.; Das, S. Effect of Temperature on the Spread of Contagious Diseases: Evidence from over 2000 Years of Data. Climate 2024, 12, 225. https://doi.org/10.3390/cli12120225
