1. Introduction
Lyme disease is caused by infection with spirochete bacteria within the
Borrelia burgdorferi sensu lato (
B. burgdorferi s.l.) complex, where
Borrelia burgdorferi sensu stricto is the primary pathogen in human and animal populations in North America. In the
B. burgdorferi s.l. complex,
B. mayonii was identified as a novel human pathogen in 2016 when discovered in Minnesota and Wisconsin residents [
1]. It is likely that additional
Borrelia spp., including those in the B. burgdorferi s.l. complex, will be discovered with increased surveillance and advances in next-generation screening of humans, domestic animals, and wildlife [
2].
Lyme disease is an increasingly significant public health issue across the eastern United States (U.S.). In 2023, over 89,000 Lyme disease cases were reported through routine national surveillance, but alternative estimates suggest that as many as 476,000 people are treated for Lyme disease annually in the U.S. [
3]. Once largely confined to the Northeast, Lyme disease has expanded northward into Canada, westward into Ohio, Iowa, and Illinois, and has become a growing concern in Virginia and the American Southeast associated with the range expansion of the eastern blacklegged tick,
Ixodes scapularis [
4]. The western blacklegged tick,
Ixodes pacificus, is the vector primarily responsible for spreading Lyme disease throughout the western portion of the U.S.
The continued increase in Lyme disease incidence has been attributed to climate change, increased humidity, and human sprawl, which have pushed wildlife into increasingly anthropogenic landscapes [
5]. While white-tailed deer (
Odocoileus virginianus) are important reproductive hosts for adult ticks, several small mammal reservoir species play a critical role in maintaining and transmitting
B. burgdorferi s.l. In the Northeast, the white-footed mouse (
Peromyscus leucopus) is a key reservoir host, whereas in the western U.S., species such as the western gray squirrel (
Sciurus griseus) and dusky-footed woodrat (
Neotoma fuscipes) contribute to pathogen maintenance in tick populations [
6,
7]. These species readily occupy peridomestic and human-altered environments, increasing opportunities for infected nymphal ticks to encounter humans and domestic animals.
Climate change has further contributed to tick range expansion by altering the distribution and seasonal activity of
Ixodes species. Notably, behavioral differences between northern and southern blacklegged tick populations may influence host-seeking patterns and, ultimately, disease transmission dynamics [
8]. Key environmental factors, such as mild winters, humid summers, reduced burning regimes, and dense vegetation, play critical roles in preventing tick desiccation and facilitating the persistence of
B. burgdorferi s.l. [
9]. Together, these environmental changes, shifting land use patterns, and expanding vector and reservoir host communities have transformed Lyme disease from a historically regional issue into a pressing One Health challenge [
5,
10].
The
Borrelia burgdorferi sensu lato (s.l.) complex comprises at least 28 genospecies, including seven “Candidatus” taxa, recently proposed to be split into two genera:
Borreliella and
Borrelia [
11]. Of the 18 genospecies for which pathogenic potential has been evaluated, 11 are found only in Eurasia, with
B. afzelii and
B. garinii identified as human pathogens in addition to
B. burgdorferi s.s. [
12]. In North America,
B. burgdorferi s.s. and, more rarely,
B. mayonii are human pathogens, and the roles of other genospecies as human or canine pathogens remain poorly characterized. Climate change has altered the phenology of
Ixodes spp., which transmit Lyme disease throughout Eurasia and follow a life cycle similar to that of
Ixodes spp. in North America [
13]. In the U.S., diagnostic tests for Lyme disease predominantly use the
B. burgdorferi B31 strain, whereas multiple B. burgdorferi s.l. strains contribute to Lyme disease surveillance in Eurasia. Our study focuses exclusively on North American diagnostic data, so these differences do not affect model development or evaluation.
Search volume reported by Google Trends for a geographic area (metropolitan area or state) has been used to predict various infectious diseases, including Lyme disease [
14], influenza [
15], and Zika [
16]. Building on our prior work [
17], which demonstrated the potential utility of Google Trends data for predicting monthly Lyme disease incidence at the state level, the present study extends this approach by integrating environmental variables and canine case data to improve prediction accuracy and explore a One Health perspective. Canine populations serve as valuable sentinels for human Lyme disease risk [
18,
19]. Rising seroprevalence in dogs has been correlated with increased human cases [
20]. This study further advances the methodology by evaluating model performance using bootstrapped Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) with uncertainty quantification, and by comparing predictive patterns across both human and canine populations, which were not assessed in our previous work. We hypothesized that incorporating environmental and canine data to address the One Health triad would improve predictive accuracy for state-level Lyme disease incidence. In the U.S., Lyme disease surveillance and reporting are conducted primarily at the state level by health departments and the CDC, making this a relevant scale for public health decision-making. Ultimately, findings from this work are intended to inform more timely and geographically responsive Lyme disease surveillance and prevention strategies by supporting early risk detection, guiding public health messaging, and helping veterinary and medical communities anticipate periods of elevated exposure risk.
2. Materials and Methods
This ecological study was approved by the Lincoln Memorial University Institutional Review Board (IRB #2025/3/1).
2.1. Data Sources
2.1.1. Lyme Disease Data
Monthly, state-level human Lyme disease case counts were collected through publicly available online repositories and direct requests to state health departments [
21,
22,
23,
24]. Specific data notes from health departments, where applicable, are in
Supplementary File. States reporting fewer than 10 cases annually were excluded for two reasons: small cell sizes can make individual cases potentially identifiable, and very small case counts produce unreliable rate estimates and allow random variation to disproportionately influence model performance. Final inclusion was determined based on data completeness and availability. The states included for analysis were California, Connecticut, Indiana, Kansas, Maine, Michigan, North Dakota, New Hampshire, Oregon, Rhode Island, South Carolina, Texas, Virginia, Vermont, Washington, and West Virginia. Canine Lyme disease test results (SNAP™ 4Dx™ Test or SNAP™ 4Dx™ Plus Test) from 2010 to 2019 were obtained from IDEXX Laboratories, Inc. (Westbrook, ME, USA).
2.1.2. Google Trends Data
The search volume data from Google Trends were retrieved using the gtrendsR package in R version 4.0.2 [
25]. Google Trends reports relative interest scores from 0 to 100, with 100 representing peak popularity during the queried time frame and region [
26]. Term selection was informed by prior research, a literature review, and input from subject matter experts. Based on results from our previous study [
17], we included the two top-performing search terms for predicting human Lyme disease cases, “Lyme disease” and “Lymes”, in the predictive models presented in this study.
To capture search interest in canine Lyme disease, we included search terms related to dog-specific symptoms, tick-borne diseases, and canine health that would be applicable to the general public. To account for variation in phrasing, we included both direct and reversed word orders (e.g., “dog limping” and “limping dog”) when there was reportable search volume for those terms. The final search terms included:
- Lyme disease and vector-related terms: “dog Lyme”, “Lyme dog”, “tick dog”.
- Clinical signs and symptoms: “dog limping”, “limping dog”, “dog lethargic”, “lethargic dog”, “not eating dog”.
- Physical abnormalities: “bump dog”, “lump dog”, “dog lump”, “dog swelling”, “swelling dog”.
- Chronic and infectious diseases: “dog arthritis”, “arthritis dog”, “kennel cough”, “distemper”, “dog cancer”, “cancer dog”.
Each term was independently evaluated for its predictive performance against state-level canine Lyme disease case counts using MAE. The two terms with the lowest error, indicating the strongest model fit, were retained for inclusion in the final predictive models used in the analysis.
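The selection step described above can be sketched as follows. This is a minimal Python illustration rather than the code used in the study: the function names, the observed counts, and the per-term predictions are all hypothetical, and the sketch assumes that each candidate term has already produced one set of predictions to score.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rank_terms_by_mae(observed, predictions_by_term, keep=2):
    """Return the `keep` terms with the lowest MAE as (term, mae) pairs."""
    scores = {
        term: mean_absolute_error(observed, preds)
        for term, preds in predictions_by_term.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])[:keep]


# Hypothetical monthly case counts and per-term model predictions.
observed_cases = [12, 30, 55, 41]
predictions_by_term = {
    "term A": [10, 33, 50, 40],
    "term B": [14, 28, 60, 45],
    "term C": [30, 10, 20, 70],
}

best_terms = rank_terms_by_mae(observed_cases, predictions_by_term)
for term, score in best_terms:
    print(f"{term}: MAE = {score:.2f}")
```

Ranking on a single error metric keeps the selection rule transparent, but it does not account for uncertainty in the MAE estimates themselves.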
2.1.3. Environmental Data
Environmental data were aggregated to the monthly, state level to align with the outcome measures. Monthly maximum and minimum temperatures, total precipitation (inches), and average humidity (%) were obtained from the NOAA National Centers for Environmental Information [
27].
2.2. Statistical Analysis
Descriptive statistics (mean, standard deviation, minimum, median, and maximum) for monthly state-level canine Lyme disease cases were summarized by month and year. Descriptive statistics for monthly state-level human Lyme disease cases were previously reported by Wisnieski et al. (2023) [
17].
To forecast monthly, state-level human and canine Lyme disease case counts, the data were structured as state-month observations (“xtset” state time). Twelve-month expanding window negative binomial regression models were estimated using the “rolling” command to generate state-level forecasts while accounting for both temporal trends and differences across states in Stata version 19.0 [
28]. For each month, the model was trained on the preceding 12 months of data, including lagged incidence, Google search term volumes, and climate variables, and then used to predict the subsequent month’s cases. Coefficients were updated recursively as new data became available, allowing the model to generate out-of-sample forecasts while using only information that would have been available at the time of prediction. This approach ensures that predictions reflect a realistic forecasting scenario rather than retrospective fitting. A negative binomial framework was selected after identifying over-dispersion in the outcome data (i.e., human and canine Lyme disease case counts).
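The forecasting scheme above can be sketched as a generator of training and target month indices. This is a minimal Python illustration, not the Stata code used in the study. The function name `forecast_splits` and its parameters are hypothetical. Because the text describes the window both as expanding and as the preceding 12 months, the sketch supports both readings through an assumed `expanding` flag. The negative binomial fit itself is omitted.

```python
def forecast_splits(n_months, window=12, expanding=False):
    """Yield (train_indices, target_index) pairs for one-step-ahead forecasts.

    With expanding=False, each model sees only the preceding `window` months.
    With expanding=True, each model sees all months before the target,
    starting once `window` months are available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    for target in range(window, n_months):
        start = 0 if expanding else target - window
        yield list(range(start, target)), target


# Hypothetical series of 15 months for a single state.
rolling_splits = list(forecast_splits(15))
expanding_splits = list(forecast_splits(15, expanding=True))

for train, target in rolling_splits:
    # Training months always precede the forecast month, so no future data leak in.
    print(f"train months {train[0]}-{train[-1]} -> predict month {target}")
```

In either variant, the model for a given month is fit only on earlier months, which is what makes the resulting predictions out-of-sample.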
We aimed to compare the predictive performance of models incorporating data from components of the One Health Triad: human health, animal health, and environmental factors. By evaluating models that included data from each domain individually, as well as in combination, we were able to assess the relative and combined contributions of these data sources to predict human Lyme disease cases. Each variable in the models included a 12-month lag to account for seasonal differences, allowing us to more accurately capture temporal patterns in disease dynamics. We used the following configurations for models predicting monthly state-level human Lyme disease case counts:
Human search terms (search volume for “Lyme disease” and “Lymes”);
Canine data (search volume for “tick dog” and “Lyme dog” + case counts);
Canine case counts *;
Environmental data (maximum and minimum temperatures, precipitation, and average humidity);
One Health (volume of human search terms [“Lyme disease” and “Lymes”] + canine case counts + environmental data [maximum and minimum temperatures, precipitation, and average humidity]).
* Note that for the human configurations, one model included only canine case counts (without canine search data). Canine search term data were missing for certain low-volume states (Kansas, North Dakota, and Vermont) because Google Trends suppresses data for low search frequency, and including the search terms in the model had little impact on the results. The One Health model excluded canine search data to maximize the available sample size and generalizability to low-search-volume states.
For the monthly state-level canine Lyme disease models, we used the following configurations:
Human data (search term volume for “Lyme disease” and “Lymes” + case counts);
Canine search terms (search volume for “tick dog” and “Lyme dog”);
Environmental data (maximum and minimum temperatures, precipitation, and average humidity);
One Health data (human search term volume for “Lyme disease” and “Lymes” + human case counts + canine search term volume for “tick dog” and “Lyme dog” + environmental data [maximum and minimum temperatures, precipitation, and average humidity]).
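The 12-month lag applied to each predictor in the configurations above can be illustrated with a short Python sketch. It assumes that "lag" means the predictor value from the same calendar month one year earlier. The function name `lag_series` and the example values are hypothetical.

```python
def lag_series(values, lag=12):
    """Return a list where position t holds values[t - lag], or None if unavailable."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(values)
    # The first `lag` positions have no earlier observation to draw on.
    return [None] * min(lag, n) + list(values[: max(n - lag, 0)])


# Hypothetical monthly search volumes for 14 consecutive months.
monthly_search_volume = list(range(1, 15))
lagged_volume = lag_series(monthly_search_volume)

# Month 13 (index 12) is paired with the value observed in month 1 (index 0).
print(lagged_volume[12], lagged_volume[13])
```

Under this reading, the first 12 months of each state's series cannot contribute complete observations to a model.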
Model performance was assessed by calculating the MAE and MAPE for each model configuration. MAE reflects the average absolute deviation between predicted and observed Lyme disease case counts. MAPE reflects the mean of the absolute differences between predicted and observed incidence, divided by the observed incidence, expressed as a percentage. To quantify uncertainty, we generated bootstrapped estimates of MAE and corresponding 95% confidence intervals (CIs) using 1000 resamples. Comparisons between models were made by examining the degree of overlap in the CIs for MAEs: non-overlapping CIs were interpreted as indicating a statistically significant difference in predictive performance, while overlapping CIs suggested no significant difference between models. The canine and One Health models for the three states with missing canine search term data (Kansas, North Dakota, and Vermont) are not included, as described above.
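The evaluation metrics above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's implementation. It assumes a percentile bootstrap that resamples per-observation absolute errors with replacement, and it skips months with zero observed cases when computing MAPE, since the ratio is undefined there. The text does not specify either choice. All function names and example values are hypothetical.

```python
import random


def mae(observed, predicted):
    """Mean absolute error between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, skipping zero observations (assumption)."""
    ratios = [abs(o - p) / o for o, p in zip(observed, predicted) if o != 0]
    return 100 * sum(ratios) / len(ratios)


def bootstrap_mae_ci(observed, predicted, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for MAE (assumed resampling scheme)."""
    rng = random.Random(seed)
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    n = len(errors)
    estimates = sorted(sum(rng.choices(errors, k=n)) / n for _ in range(n_resamples))
    lower = estimates[int(round((alpha / 2) * n_resamples))]
    upper = estimates[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return lower, upper


def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical observed and predicted monthly case counts.
observed_cases = [10, 20, 40, 80]
predicted_cases = [12, 18, 50, 60]

model_mae = mae(observed_cases, predicted_cases)
model_mape = mape(observed_cases, predicted_cases)
ci_lower, ci_upper = bootstrap_mae_ci(observed_cases, predicted_cases)
print(f"MAE = {model_mae:.2f}, MAPE = {model_mape:.1f}%")
```

Comparing models by CI overlap is a conservative rule: two intervals can overlap even when a direct test of the difference in MAE would detect one.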
4. Discussion
In this study, predictive models for human and canine Lyme disease incidence were built using One Health (human, canine, and environmental) data. We hypothesized that including One Health data would improve model predictions based on prior research findings that emphasized the need for incorporation of One Health surveillance systems for zoonotic diseases [
29,
30]. However, inclusion of One Health data improved human Lyme disease predictions in only 6 of 16 states and did not improve canine Lyme disease predictions. Overall, the models had large prediction errors, and alternative prediction models are needed to accurately forecast Lyme disease incidence.
Similar studies had varying results. In the study by O’Brien et al. (2025), including canine Lyme disease insurance claims in predictive models did not improve prediction of human Lyme incidence [
31]. By contrast, Bouchard et al. (2023) found an association between human Lyme disease cases and risk maps based on Lyme disease knowledge and behavior and ecological components of Lyme disease [
32]. The predictive ability of a One Health model depends on the type and quality of the data and on their interoperability with other data sources [
29]. In our study, we utilized Lyme disease case information from state health departments, which is a passive surveillance system that underreports the true number of cases [
33,
34]. In addition, surveillance systems vary by state. To ease the burden of case reporting, case definitions do not require clinical information in high-incidence states, but this information is required in low-incidence states [
34]. Our models could potentially be strengthened using other metrics for human Lyme disease incidence in addition to state health department data, such as insurance claims or electronic health records [
35,
36]. In addition, the use of Google Trends data for some search terms was limited in some states, leading to the inability to produce One Health models for canine Lyme disease incidence in three states. Careful consideration of applicable search terms that have an adequate search volume is imperative for future research [
17]. Lastly, human Lyme disease case data and canine IDEXX results represent distinct surveillance sources with different reporting mechanisms and potential biases. Human cases are derived from passive public health surveillance, whereas canine test data reflect veterinary diagnostic testing patterns. As such, these datasets should be interpreted as complementary indicators of Lyme disease risk rather than directly equivalent measures of incidence. These differences in data generation and reporting may also contribute to heterogeneity in model predictive performance between the human and canine outcomes.
4.1. Limitations
This study had several limitations. Our analysis was conducted at the state scale to align with the spatial resolution of certain predictors, including Google Trends data, which are reported only at state or metropolitan levels. The use of state-level disease incidence and climate variables, aggregated at a monthly resolution, may have influenced model performance and obscured within-state heterogeneity in environmental conditions, tick ecology, and Lyme disease risk. A more refined analysis using county-level incidence and alternative data sources could provide more precise, spatially relevant estimates. In addition, we included data from only 16 states, which limits the generalizability of the model to other U.S. regions and Canadian provinces.
Because of the limited geographic scope, we were also unable to stratify analyses by major ecological regions (e.g., eastern, Midwestern, and western coastal U.S.) that differ in tick vector species, particularly
Ixodes scapularis and
Ixodes pacificus. These species exhibit important ecological and behavioral differences that may influence transmission dynamics and model performance under varying environmental conditions. Future studies incorporating broader geographic coverage could evaluate whether predictive relationships differ across vector regions and how shifting tick distributions associated with climate change may alter these dynamics [
4,
7,
9].
Expanding future models to include additional states and Canadian provinces, along with more spatially resolved disease surveillance data, could strengthen generalizability and improve our understanding of regional variation in Lyme disease risk.
4.2. Conclusions
Overall, the inclusion of One Health data improved prediction of human Lyme disease incidence in some states but did not improve prediction of canine Lyme disease incidence. However, even the best-performing models exhibited substantial prediction errors, limiting their practical utility. We recommend testing other data streams to produce higher-performing predictive models, such as electronic health record data and other environmental data. Future studies can also consider more refined analyses by predicting county-level disease incidence.