## 1. Introduction

Hydroclimatic changes in the landscape, such as in runoff [

1], evapotranspiration [

2], and often both [

3], are being observed and projected regionally and globally, adding to, or occasionally opposing, atmospheric changes in precipitation [

4]. Such changes have serious implications for terrestrial water cycling and availability, with particular availability impacts in cold regions, where the water cycle also depends on snow and ice conditions [

1]. Hydroclimatic changes can also cause profound and complex shifts in the geographical range, prevalence, and/or severity of some infectious diseases [

5,

6,

7,

8,

9].

The mechanisms of such disease impacts are wide-ranging and may include direct or indirect hydroclimatic influences on abundance of vectors, such as mosquitoes and ticks [

10], pathogen survival outside the host [

11], host-pathogen interactions that are closely related to community ecology and biodiversity [

12,

13], dampening of host immunity [

14], exposure to water-borne pollution and associated infections [

15], including from remobilization of previously frozen pathogens through permafrost thaw under warming [

16], and deterioration in health status, e.g., malnutrition and disruption of health systems associated with extreme hydroclimatic events, such as droughts and floods [

17]. High-latitude and Arctic regions, where the diversity of animal, plant, and microbial species is low, and where surface temperatures are increasing faster than the global average [

18], are subject to particular and in some respects (considering glacier melting and permafrost thaw) more severe and uncertain hydroclimatic changes [

19,

20,

21] linked to health and infectious diseases [

16,

22,

23]. These particularities and disease links make it essential to understand how such changes can affect infectious diseases, and how to identify and quantify future risk shifts due to changing hydroclimate in these high-latitude regions.

Many models have been established to reveal the relations between diseases and hydroclimatic factors. However, the contributions of hydroclimatic changes to predicted disease cases are not directly quantified in these models. In this study, we address this knowledge gap by developing a method for identifying and quantifying hydroclimatic changes and their implications for disease outbreak levels, and possible thresholds to such critical/unacceptable levels, based on a previously established statistical model for the example disease of tularemia in high-latitude areas [

24]. Tularemia, caused by the arthropod-borne pathogen

Francisella tularensis, affects wild animals and humans, and is one of the most researched high-latitude vector-borne diseases likely to be impacted by hydroclimatic change [

25]. Its prevalence is verified in Europe, Asia, and America [

26] and its natural foci includes high-latitude areas in, e.g., Russia [

27], Finland, and Sweden [

28]. Climate change is further expected to expand or shift the geographical distribution of tularemia in Russia and the U.S.A. [

29,

30], and increase the disease burden in high-endemic areas of Sweden [

31]. However, contradictory predictions regarding the latter effect have also been published [

32].

The tularemia model used in this study is based on mosquitoes as the main disease vector in the large boreal forest regions of Alaska, Sweden, Finland, and Russia [

33,

34,

35]. Although it is highly likely that other factors also affect outbreaks, the model has been used successfully to capture a series of annual outbreaks in Dalarna County, Sweden, predicting six out of seven high-incidence years occurring between 1981 and 2007 in this region [

24]. These successes indicate that the model has high potential to successfully represent scenarios of possible future tularemia outbreaks under hydroclimatic change in such high-latitude areas, which is a main reason for its selection and being used in the present study. The study also considers and addresses the approach recommended by the World Health Organization (WHO) for assessing climate change impacts on health, by applying the model for possible identification of driver thresholds beyond which unacceptable health hazards may occur under different future hydroclimatic scenarios [

36].

## 2. Methods with analytical solution development

The statistical model [

24] of annual numbers of humans contracting tularemia is based on identified key variables for mosquito abundance and tularemia-relevant local weather, using available data for the period 1981-2007 in Dalarna County, Sweden. Specifically, the model is as follows:

and it relates the annual number of tularemia cases (Tul) to relative annual mosquito abundance (RMA) (which in turn depends on some periodic hydroclimatic variables of water flow (Q) and temperature conditions (T), as explained further below), summer temperature in the preceding year (ST

_{lag}), summer precipitation in the same year (SP), winter days with low snow coverage (CW), and number of tularemia cases in the preceding year (Tul

_{lag}).

To quantify and readily compare the sensitivity of the number of tularemia cases, Tul, for different hydroclimatic conditions, and ultimately identify possible critical condition thresholds, we derived a model-implicit comparative index in terms of expected endemic levels by rearranging the analytical model Equation (1). This comparative index is the expected number of disease cases that each combination of hydroclimatic conditions will tend to drive the number of tularemia cases towards (asymptotically for temporally stable conditions) based on the underlying model equations (

Figure 1). To mathematically derive this index of expected endemic level, the model was first rearranged to represent all the hydroclimate variables by parameter A (Equation (2)), which is a constant in an unchanging hydroclimate setting:

The number of tularemia cases (Tul) in any current/projected year thus becomes a function of a single variable: the number of tularemia cases in the preceding year (Tul_{lag}) to the power of 0.75, a constant derived from the rearrangement of Equation (1) through Equation (2).

Equation (3) implies that, for any given (prevailing/projected) combination of hydroclimatic variable values determining the value of A, the value of Tul can be computed from the corresponding value of Tul

_{lag} for the preceding year. The associated functional relationship Tul=F(Tul

_{lag}) is illustrated in

Figure 1a. The blue curve (passing through the origin and convex upwards) intersects with the black 1:1 line (of Tul=Tul

_{lag,} drawn from the origin with a slope of 1) at some value of Tul=Tul

_{lag}=N*. The black line represents the theoretical case of unchanging number of tularemia outbreaks between years (Tul=Tul

_{lag}) and its intersection with the blue curve Tul=F(Tul

_{lag}) at Tul=Tul

_{lag}=N* indicates a possible steady state number of outbreaks, N*. Through iteration over succeeding years with any given constant value of A (i.e., constant disease-relevant hydroclimatic conditions) and any initial value of Tul

_{lag}, either larger (N

_{01}) or smaller (N

_{02}) than N*, the resulting number of tularemia outbreaks Tul will in following years converge towards and stabilize at N* (

Figure 1b). The resulting value of N* thus represents an expected endemic level for the hydroclimatic conditions considered (determining the value of A). Intersection at the origin, i.e., N*=0, means that the number of annual tularemia outbreaks will tend to vanish over a series of years under the given hydroclimatic conditions.

Note that the index of expected endemic level N* does not represent the actual tularemia cases in a specific year with weather conditions as in a considered combination of hydroclimatic model variables, since weather does not remain constant over consecutive years. However, the level N* works well as a comparative indicator of how different hydroclimatic factors contribute to disease outbreaks, and how the number of tularemia cases will tend to change driven by changes in long-term average hydroclimate. The latter are much slower than weather changes since, by definition, climate is the long-term average of short-term weather conditions.

After developing and using this analytical solution method, we further evaluated the effects of different hydroclimatic variable conditions on disease outbreaks. For this, we calculated the expected endemic level N* for different values of any individual model variable (RMA, ST

_{lag}, SP, and CW) within its data-given observation range (0.09–39, 12.2–16.8 °C, 143–342 mm, and 0–28 days, respectively) [

24], while keeping the other model variables at their median values. For RMA, which is not in itself a hydroclimatic variable but depends on such variables, we also calculated its dependence on the relevant hydroclimatic variable values of maximum standardized river flow for two periods preceding the evaluation time t, denoted Q

_{1} and Q

_{2} (for 36–42 days and 22–28 days, respectively, before time t), and mean temperature (T) over 1–7 days before time t. This calculation was based on meteorological and hydrological data for Q

_{1}, Q

_{2}, and T, obtained from the Swedish Meteorological and Hydrological Institute (

www.smhi.se) for the years 1981–2007. Given that any possible extreme daily values of RMA are filtered out when considering the annual median of this variable for use in Equation (1), we calculated the range of annual median RMA for ranges of Q

_{1} and Q

_{2} between −1 and 1 standard deviation of each variable (

Figure 2a). Furthermore, we used the observed summer temperature range (12.3 °C to 16.8 °C) for T (

Figure 2b). Since coefficients of Q

_{1} and Q

_{2} in the model are similar under the same hydroclimate, we only plotted scenarios where Q

_{1} is equal to Q

_{2}. For scenarios where Q

_{1} is not equal to Q

_{2}, derived values of RMA will fall between the two extreme line boundaries (blue and green lines in

Figure 2a,b). For these value ranges, the resulting RMA variation is from 0.02 up to 39.9 (

Figure 2), which is similar to the previously reported RMA range (0.09–39) [

24]. The observation-based ranges of underlying hydroclimatic variables Q

_{1}, Q

_{2}, and T considered here thus reproduce a consistent range of annual RMA, as needed for hydroclimatic sensitivity assessment of RMA in the model (Equation (1)).

## 3. Results

Figure 3 shows modeled expected endemic levels (crosses in the left-hand panels, dashed lines in the right-hand panels) for different examples of hydroclimatic variable values considered. For all hydroclimatic variable sets, median-range values (indicated by bold black lines in all panels) showed endemic levels close to zero. However, even small variable deviations from these median values can trigger large shifts in endemic levels. Modeled endemic levels are particularly sensitive to the value of RMA, and thereby to the underlying values of Q

_{1}, Q

_{2}, and T (

Figure 2). For example, a value of 4.1 for RMA, which is higher than the median value, but far from the maximum observed value, resulted in endemic level close to 800 annual cases. This was much higher than the near-zero endemic level for median RMA and also higher than, for example, the resulting endemic level of slightly above 400 for the maximum value of 342 mm for SP. For CW, the highest resulting endemic level was less than 100, which is much lower than the maximum levels that other variables could trigger.

To directly compare result sensitivity across all variables, the endemic level response to variable value shifts were plotted over the same standardized value range for all variables (

Figure 4). The most dominant variable was then indeed found to be RMA, for which even values far below its maximum observed value yielded very high endemic levels. The next most dominant variables were ST

_{lag} and SP, for which variations within the observed range, but close to their respective observed maximum value, raised the endemic level to 1200 and 400 annual cases, respectively. In comparison with the other model variables, CW variation contributed much less to outbreak variation (

Figure 4).

Endemic level can also be used as an index for identification of hydroclimatic thresholds to critical/unacceptable health hazards (

Figure 5). As a basis for such identification, a highest societally acceptable endemic level (N

_{acc}) needs to be defined (upper boundary of gray area in

Figure 5b,c). Combinations of observed or projected hydroclimate variable values leading to endemic levels at or around N

_{acc} can then be identified as critical thresholds (e.g., variable combination number 2, light blue in

Figure 5) between hydroclimatic conditions that are acceptable (e.g., yellow combination number 1) and those that are high-risk (green, orange, dark blue combinations 3–5) from a health/disease perspective. Acceptable endemic level here is different from the alert thresholds for epidemic-prone conditions of other diseases defined by WHO for EWARN (Early Warning Alert and Response Network) [

37], because it is not the actual cases that will happen in a certain year but a comparative indicator of the tendency of tularemia cases driven by different hydroclimatic conditions. Therefore, the acceptable level of this indicator has to be further studied and defined by relevant agencies, based on estimated health risks associated with disease outbreak numbers approaching or becoming greater than this acceptable endemic level.

## 4. Discussion

Using a previously tested and validated statistical disease model, we computed model-implicit expected endemic levels of tularemia with hydroclimatic variable values within their respective ranges of observations. We demonstrated that the computed expected endemic level can be used as a comparative index for analyzing the sensitivity of tularemia outbreaks to hydroclimatic conditions and for identifying critical hydroclimatic thresholds to high-risk outbreak conditions. Our analysis revealed that tularemia outbreaks are most sensitive to relative mosquito abundance (RMA) and associated underlying periodic hydroclimatic conditions of flow (Q

_{1} and Q

_{2}) and mean temperature (T), followed by summer temperature in the preceding year (ST

_{lag}) and summer precipitation in the current year (SP). Number of cold winter days with low snow coverage (CW) has the least influence on outbreaks and is the only hydroclimatic variable with a negative relationship with outbreaks, i.e., number of outbreaks and endemic level decreased with increasing CW (

Figure 3g,h,

Figure 4 ).

Although only 370 people were diagnosed with tularemia in the study area of Dalarna County over the 27-year study period (1981–2007) used for development of the statistical model [

24], our analysis revealed that certain combinations of relevant hydroclimatic variables, with values within the historically observed ranges, have high potential to trigger many more outbreaks. Projections of future climate suggest that the range of hydroclimatic variations may move outside the historical range for many locations [

38], implying even more threatening future conditions.

Previous studies have also investigated methods for determining the sensitivity of infectious diseases to hydroclimate change. Time-series studies, especially correlation or regression analyses [

39,

40], and geographical comparisons [

41], have been used for this purpose. A recent study [

41] reports advantages of geographical comparisons over time-series analysis, such as the possibility of using short data time series. However, that study used relative sensitivity as a statistical indicator of climate-sensitive infectious diseases, revealing the impact of only a single variable without considering temporal autocorrelation effects, whereby high incidence in one month or year might increase transmission potential in subsequent months or years. In contrast, the use of endemic level as an indicator, based on a validated statistical disease model, considers both interactions among several variables and temporal autocorrelations. Moreover, endemic level may be an effective comparative indicator across different regions and/or different infectious diseases, if validated statistical models are available for identifying endemic levels in comparable ways.

Here we only considered historic variable observation ranges, but the method for identifying endemic level and using it as an index for further identification of threshold hydroclimatic conditions can also be used with projected possible future values of the relevant hydroclimatic variables. Endemic levels and probability of exceeding hydroclimatic thresholds can then be compared between past conditions and projected future scenarios. Such comparisons will be useful for studying whether, how, and to what extent future hydroclimate changes may affect disease outbreaks and associated risks.

The WHO has performed a quantitative risk assessment of the effects of climate change on mortality caused by diarrhea, malaria, and dengue fever, using outcome-specific models to estimate future infectious disease-induced mortality with and without climate change [

42]. The differences between the two scenarios are then considered to be climate-driven impacts. As in the present study, the aim of the WHO assessment is to predict health impacts of climate change, but its approach requires large amounts of data and introduces uncertainty through choices of baseline climate and socioeconomic development, education, and technology scenarios for the future. In comparison, the method developed here only requires data on relevant hydroclimatic conditions for indicating likely trends in tularemia cases under different future situations.

Limitations of the method presented here include various constraints in the underlying statistical model, such as not representing changes in human behavior. For example, incorrect prediction by the model of tularemia occurrence in year 1987 may have been caused by reduction in forest berry picking and hunting due to the Chernobyl disaster in the preceding year, leading to minimized exposure to tularemia [

24]. Moreover, the method assumes independence of the considered hydroclimatic variables from societal factors that, in reality, may affect both hydroclimate and the tularemia outbreaks, such as climate mitigation-adaptation measures. The time that it may take for the number of outbreaks to approach expected endemic levels, driven by long-term hydroclimatic changes, is also not considered, and may vary for different hydroclimatic conditions. This modeling limitation can be overcome by considering some convergence rate, although validation of such assumed rates will be difficult under long-term hydroclimatic variability and change.

Furthermore, regional statistics-based disease models, such as that used here for tularemia, may not be directly transferable to other geographical regions and/or scales. One way forward is then to use specifically calibrated models for different regions/scales. For example, region-specific models for seven high-risk parts of Sweden have been established and show high capability in predicting regional outbreaks from 1984 to 2012, with potential of using such region/scale-specific models to analyze hydroclimatically driven shifts from the past to the future in each region and compare model coefficients and results among the regions [

43].

The purpose of the present work was to develop a method for interpreting the effect of hydroclimatic factors from statistical models, by which hydroclimatic data can provide a straightforward picture of comparative risk levels. The method has the flexibility to be applied to different statistical disease models, but the extent to which the risk can be analyzed relies on the availability of such validated models. The model and validation limitations so far call for further research on how hydroclimate changes affect infectious disease outbreaks across regions and scales, with particular attention to potential critical thresholds and to the generality and spatiotemporal transferability of statistical disease models.