Systematic Review

Artificial Intelligence Models for Forecasting Mosquito-Borne Viral Diseases in Human Populations: A Global Systematic Review and Comparative Performance Analysis

1 Faculty of Medicine, University Vita-Salute San Raffaele, 20132 Milan, Italy
2 PhD National Program in One Health Approaches to Infectious Diseases and Life Science Research, Department of Public Health, Experimental and Forensic Medicine, University of Pavia, 27100 Pavia, Italy
3 Division of Public Health, Infectious Diseases and Occupational Medicine, Department of Medicine, Mayo Clinic College of Medicine and Science, Mayo Clinic, Rochester, MN 55905, USA
4 Department of Infectious Diseases, “Luigi Sacco” University Hospital, Azienda Socio-Sanitaria Territoriale (ASST) Fatebenefratelli FBF Sacco, 20157 Milan, Italy
5 Regional Health Care and Social Agency of Lodi, Azienda Socio-Sanitaria Territoriale (ASST) Lodi, 26900 Lodi, Italy
6 Local Health Unit of Trapani, ASP Trapani, 91100 Trapani, Italy
7 Department of Biomedical and Clinical Sciences “L. Sacco”, University of Milan, 20157 Milan, Italy
8 Centre for Multidisciplinary Research in Health Science (MACH), University of Milan, 20122 Milan, Italy
9 Department of Cardiac Thoracic Vascular Sciences and Public Health, University of Padua, 35128 Padova, Italy
* Authors to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2026, 8(1), 15; https://doi.org/10.3390/make8010015
Submission received: 29 November 2025 / Revised: 22 December 2025 / Accepted: 24 December 2025 / Published: 7 January 2026
(This article belongs to the Section Thematic Reviews)

Abstract

Background: Mosquito-borne viral diseases are a growing global health threat, and artificial intelligence (AI) and machine learning (ML) are increasingly proposed as forecasting tools to support early-warning and response. However, the available evidence is fragmented across pathogens, settings and modelling approaches. This review provides, to the best of our knowledge, the first comprehensive comparative assessment of AI/ML models forecasting mosquito-borne viral diseases in human populations, jointly synthesising predictive performance across model families and appraising both methodological quality and operational readiness. Methods: Following PRISMA 2020, we searched PubMed, Embase and Scopus up to August 2025. We included studies applying AI/ML or statistical models to predict arboviral incidence, outbreaks or temporal trends and reporting at least one quantitative performance metric. Given the substantial heterogeneity in outcomes, predictors and time–space scales, we conducted a descriptive synthesis. Risk of bias and applicability were evaluated using PROBAST. Results: Ninety-eight studies met the inclusion criteria, of which 91 focused on dengue. The forecasts spanned national to city-level settings and annual-to-weekly resolutions. Across classification tasks, tree-ensemble models showed the most consistent performance, with accuracies typically above 0.85, while classical ML and deep-learning models showed wider variability. For regression tasks, errors increased with temporal horizon and spatial aggregation: short-term, fine-scale forecasts (e.g., weekly city level) often achieved low absolute errors, whereas long-horizon national models frequently exhibited very large errors and unstable performance. PROBAST assessment indicated that most studies (63/98) were at high risk of bias, with only 24 judged at low risk and limited external validation. Conclusions: AI/ML models, especially tree-ensemble approaches, show strong potential for short-term, fine-scale forecasting, but their reliability drops substantially at broader spatial and temporal scales. Most remain research-stage, with limited external validation and minimal operational deployment. This review clarifies current capabilities and highlights three priorities for real-world use: standardised reporting, rigorous external validation, and context-specific calibration.


1. Introduction

In recent decades, arboviral diseases such as dengue, Zika, chikungunya and yellow fever have posed an increasingly difficult challenge to global public health officials [1]; urbanization, climate change and globalization have facilitated the expansion and survival of competent mosquito vectors such as Aedes aegypti and Aedes albopictus in new ecological niches [2,3,4]. While these infections have always been responsible for significant mortality, morbidity and economic burden in tropical and subtropical regions [5,6], an increasing number of imported and autochthonous outbreaks has recently been described at higher latitudes [7]. Early detection of outbreaks and accurate prediction of transmission are essential for effective containment and mitigation, supporting vector surveillance, clinical practice, and efficient resource allocation [8]. Conventional surveillance and early-warning methods, while effective, are usually limited by their dependence on delayed case reporting or limited climatic proxies [9,10,11].

In this context, artificial intelligence (AI) and machine learning (ML) techniques have emerged as powerful tools for infectious disease modelling [12,13]; the possibility of integrating high-dimensional and heterogeneous data, such as meteorological, environmental and demographic variables, allows for a deeper and more accurate understanding of arboviral disease transmission [14,15,16]. Despite this promise, the literature remains largely devoid of systematic, comparative evaluations that can clarify when and where AI-driven models for arboviral forecasting are truly useful. Existing reviews typically focus on single pathogens, narrow modelling families, or descriptive overviews, leaving unresolved questions about real-world applicability. The marked heterogeneity in data sources, algorithmic approaches and temporal and spatial forecasting scales further hampers meaningful comparison across studies and limits the ability to identify which modelling strategies consistently deliver reliable predictions [17]. In addition, direct evaluations of AI/ML models against traditional statistical methods are uncommon, and critical methodological aspects, such as validation strategies, risk of bias, and generalisability across spatial and temporal scales, are often insufficiently addressed.

To address these limitations, this systematic review aims to comparatively evaluate the performance of AI and ML models developed to forecast mosquito-borne viral infections in human populations. Specifically, we synthesise predictive performance across different modelling families and epidemiological contexts, compare AI/ML approaches with traditional statistical models when available, and assess methodological quality and implementation readiness. By doing so, this review seeks to clarify the current capabilities and limitations of AI-driven forecasting models and to inform their appropriate use in public health surveillance and early-warning systems.

2. Materials and Methods

2.1. Search Strategy

This systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) guidelines. The protocol was prospectively registered in PROSPERO (CRD420251124153). The review addressed the following question: “What is the performance of AI and ML models in predicting the temporal trend and incidence peaks of mosquito-borne diseases, such as dengue, Zika, chikungunya, West Nile virus, yellow fever, and Rift Valley fever, compared with traditional statistical approaches?”. A comprehensive literature search was conducted in PubMed, Embase, and Scopus up to 11 August 2025. The search combined controlled vocabulary (MeSH terms) and free-text terms related to artificial intelligence, machine learning, deep learning, and mosquito-borne diseases (including dengue, Zika, chikungunya, West Nile, yellow fever, and Rift Valley fever). Reference lists of included studies and relevant reviews were screened manually, and field experts were contacted to identify additional publications or unpublished data. The full search strings for each database are available in Supplementary Table S1.

2.2. Eligibility Criteria

Studies were included if they (i) involved human populations at risk of, or with confirmed cases of, mosquito-borne viral diseases; (ii) developed, validated, or applied AI/ML-based models to predict disease incidence trends or peaks; and (iii) reported at least one quantitative performance metric. Eligible outcomes comprised classification metrics such as Area Under the Curve (AUC), sensitivity, specificity, positive predictive value (PPV)/precision, negative predictive value (NPV), accuracy, and F1-score, as well as regression metrics, including mean absolute error (MAE), root mean squared error (RMSE), mean squared error (MSE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), coefficient of determination (R2), and correlation coefficient (r). Comparator models, when reported, consisted of alternative AI/ML techniques or classical statistical models applied to the same dataset. Eligible study designs included retrospective and prospective cohort studies, case–control studies, cross-sectional studies, and diagnostic accuracy studies. In addition, modelling studies based on secondary data sources—such as surveillance databases, electronic health records, or environmental and climatic datasets—were included, provided that they reported at least one quantitative performance metric related to the predictive or diagnostic task. Lastly, only original research articles published in peer-reviewed journals were retained, and only articles written in English were considered, without date restrictions. Conversely, studies focusing exclusively on non-human or vector-only data, laboratory research without human health outcomes, or environmental data without direct linkage to human case prediction or diagnosis were excluded. Studies available only as abstracts, conference proceedings, book chapters, letters to the editor, commentaries, reviews, or non–peer-reviewed reports were excluded, as were full texts that could not be retrieved and duplicated datasets published in multiple papers.
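For reference, these regression metrics were interpreted according to their standard definitions (a generic sketch, not study-specific formulas). Writing $y_t$ for the observed value, $\hat{y}_t$ for the prediction, and $\bar{y}$ for the mean observation over $n$ time points:

$$\mathrm{MAE}=\frac{1}{n}\sum_{t=1}^{n}\lvert y_t-\hat{y}_t\rvert,\qquad \mathrm{RMSE}=\sqrt{\mathrm{MSE}}=\sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^2},\qquad \mathrm{MAPE}=\frac{100}{n}\sum_{t=1}^{n}\left\lvert\frac{y_t-\hat{y}_t}{y_t}\right\rvert,$$

$$\mathrm{SMAPE}=\frac{100}{n}\sum_{t=1}^{n}\frac{2\,\lvert y_t-\hat{y}_t\rvert}{\lvert y_t\rvert+\lvert\hat{y}_t\rvert},\qquad R^2=1-\frac{\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t-\bar{y}\right)^2}.$$

MAE, MSE, and RMSE inherit the unit and scale of the outcome, whereas MAPE, SMAPE, R2, and r (Pearson’s correlation) are scale-free; this distinction motivates the scale-extraction step described in Section 2.4.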

2.3. Study Selection

All retrieved records were imported into reference management software, and duplicates were removed. Two reviewers (FP and AP) independently screened titles and abstracts to identify potentially eligible studies. Any discrepancies between the two reviewers were resolved by discussion, with arbitration by a third reviewer (FB) when necessary. The full texts of the selected articles were then assessed independently for inclusion against the predefined criteria.

2.4. Data Extraction

Data were extracted using a pre-tested form developed in Microsoft Excel. The extracted information included bibliographic details (first author, year of publication, continent, and country), study characteristics (design, period, setting, and population), disease features (type, case definition, prediction horizon, and disease definition), and methodological aspects such as handling of missing or imbalanced data, calibration procedures, data splitting strategy, and implementation readiness.
Information on the data sources used for model development was recorded and categorized as follows: epidemiological (e.g., surveillance data, incidence, outbreak reports); clinical (e.g., laboratory-confirmed cases, hospital records); climatic or environmental (e.g., temperature, rainfall, vegetation index); socio-demographic (e.g., population density, age distribution, poverty, urbanization); mobility or transport (e.g., travel and migration flows, mobile phone data); big data (e.g., social media, web search queries); genomic or virological (e.g., viral sequences, genotypes, serotypes); health system (e.g., access to care, diagnostic capacity); policy or intervention (e.g., vaccination coverage, vector control campaigns); landscape (e.g., agricultural or forested land); and entomological (e.g., mosquito density, breeding sites, vector indices).
For each AI/ML model, the principal algorithm, the number of variables included versus considered, dataset structure and split, and performance metrics were extracted. When available, classification and regression metrics were recorded.
If a comparator model was present, the same performance metrics and details on dataset structure and validation type (internal or external) were also collected. When necessary, authors were contacted to obtain missing or unclear information. Data extraction was conducted by one reviewer and verified by another to ensure accuracy.
To ensure comparability across studies, all predictive models—whether primary models or comparators—were classified into seven predefined categories based on their underlying methodological structure:
(i) Classical machine learning (NARX neural networks, decision trees, AutoTiC-NN, feed-forward neural networks, ANN, backpropagation NN, SVR, SVM, LASSO, Naive Bayesian Network, Bayesian Network, logistic regression, multiple linear regression, generalized linear models, Gaussian processes, regression models);
(ii) Tree-ensemble methods (Random Forest, Extra Trees Classifier, Gradient Boosting, AdaBoost, GBM, BRT, XGBoost, LightGBM, CatBoost, CART);
(iii) Deep learning (BiLSTM, CNN, LSTM, GRU, RNN, DFFN, MobileNetV3, ResNet50, CNN-BiLSTM, CNN-BiGRU with Attention, ConvLSTM, stacked LSTM/BiLSTM, hybrid CNN–LSTM architectures, XEWNet, EWNet, transformer-based models, NBeatsX);
(iv) Time-series and statistical models (NNAR, SARIMA, ARIMA, VAR, naïve or moving-average baselines, temporal-average baselines, Poisson regression, SARIMAX, Prophet);
(v) Mechanistic models (WRF, SIR + EAKF, SI–SIR);
(vi) Other or heuristic approaches (e.g., GANN, ANFIS, Differential Evolution, fuzzy systems, DIR);
(vii) Hybrid or superensemble models, defined as models integrating two or more techniques from different categories.
The models combining multiple techniques within the same category were classified according to that category and not considered hybrid. This grouping was used consistently across all metric-specific visual summaries.
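As a minimal illustration of this categorisation step (the mapping below is a truncated, hypothetical sketch rather than the full extraction form), the assignment logic, including the within-category rule for combined techniques, can be expressed as:

```python
# Illustrative sketch of the model-family assignment used during extraction.
# The membership sets are truncated examples, not the complete lists above.
MODEL_FAMILIES = {
    "classical_ml": {"ANN", "SVM", "SVR", "LASSO", "decision tree",
                     "logistic regression", "multiple linear regression"},
    "tree_ensemble": {"Random Forest", "XGBoost", "LightGBM", "CatBoost",
                      "AdaBoost", "Gradient Boosting", "Extra Trees", "CART"},
    "deep_learning": {"LSTM", "BiLSTM", "CNN", "GRU", "ConvLSTM", "transformer"},
    "time_series_statistical": {"ARIMA", "SARIMA", "SARIMAX", "Prophet",
                                "Poisson regression", "naive baseline"},
    "mechanistic": {"WRF", "SIR+EAKF", "SI-SIR"},
    "other_heuristic": {"GANN", "ANFIS", "fuzzy system", "DIR"},
}

def classify_model(components: list[str]) -> str:
    """Assign a model to one of the seven predefined families.

    Combinations within a single family keep that family's label;
    combinations spanning families are labelled hybrid/superensemble.
    """
    families = {family for name in components
                for family, members in MODEL_FAMILIES.items() if name in members}
    if len(families) == 1:
        return families.pop()
    return "hybrid_superensemble" if families else "unclassified"

print(classify_model(["CNN", "LSTM"]))  # deep_learning (same family, not hybrid)
print(classify_model(["CNN", "SVM"]))   # hybrid_superensemble (two families)
```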
Because regression-based performance metrics such as MAE, RMSE, and MSE are scale-dependent and do not represent percentage errors, an additional extraction step was performed to ensure comparability across studies. For each model reporting MAE, RMSE, or MSE, the scale of the target variable was extracted along three dimensions:
(i) Unit of measurement (e.g., absolute case counts, cases per 100,000 population, log-transformed cases);
(ii) Temporal resolution (e.g., weekly, 10-day, monthly);
(iii) Spatial resolution, classified as national (entire country), regional (multiple provinces or states), provincial (single administrative region), district level (municipalities or sub-city areas), or city level (single city).
This information was taken directly from each study, without derivation or transformation, and was used to facilitate valid cross-study comparisons of scale-dependent regression metrics.

2.5. Data Synthesis and Statistical Analysis

The results are reported in accordance with PRISMA 2020, including a PRISMA flow diagram and detailed tables summarizing study characteristics, methodological quality, and performance metrics. Specifically, a narrative synthesis was first conducted to summarize study characteristics, AI/ML model types, and outcomes. The result ranges reported in the subsections for each AI category refer to both the principal and comparator models.
Given the substantial heterogeneity in study design, temporal and spatial resolution, incidence scale, and reporting practices, no meta-analytic pooling was performed. Instead, all analyses were descriptive and exploratory. To characterise model performance across studies, unweighted distributions were generated for each reported metric of the principal models, without applying transformations, normalisation procedures, or weighting by sample size or study quality. Outliers were retained to preserve the original variability of the source data. All metrics were summarised using multi-panel visualisations to describe performance dispersion within and across modelling families. For classification metrics (AUC, sensitivity, specificity, PPV, NPV, accuracy, and F1-score), study-level estimates were displayed using horizontal boxplots stratified by modelling family. When ≥3 observations were available for a given family, full boxplots were produced. For numerical regression metrics with strong dependence on outcome scale (RMSE and MAE), multi-panel figures were constructed by stratifying results into predefined magnitude ranges (very small, small, medium, large). This approach prevented misleading cross-scale comparisons and highlighted context-dependent error behaviour across modelling families. Additional regression metrics (MAPE, MSE, R2, and r) were summarised using single-panel unweighted distributions. Visual synthesis focused on describing central tendency, variability, dispersion patterns, and systematic differences across modelling groups. No cross-family statistical comparisons or inferential tests were conducted, as heterogeneity in study frameworks, reporting conventions, and outcome units precluded harmonisation.
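A minimal sketch of this descriptive synthesis, assuming the extracted estimates sit in a flat table with one row per model-metric pair (the file name and column names are hypothetical):

```python
# Sketch of the magnitude-stratified, unweighted visual summary for RMSE:
# one panel per predefined band, horizontal boxplots by modelling family.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("extracted_metrics.csv")  # columns: family, metric, value (assumed)

rmse = df[df["metric"] == "RMSE"].copy()
rmse["band"] = pd.cut(
    rmse["value"],
    bins=[0, 1, 10, 1000, float("inf")],
    labels=["very small (<=1)", "small (1-10)", "medium (10-1000)", "large (>1000)"],
)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, (band, grp) in zip(axes.ravel(), rmse.groupby("band", observed=True)):
    # Draw full boxplots only for families with >=3 observations in this band.
    counts = grp["family"].value_counts()
    keep = grp[grp["family"].isin(counts[counts >= 3].index)]
    if not keep.empty:
        keep.boxplot(column="value", by="family", vert=False, ax=ax)
    ax.set_title(str(band))
fig.suptitle("RMSE by modelling family, stratified by magnitude band")
plt.tight_layout()
plt.show()
```

No transformations, weights, or outlier removal are applied, mirroring the unweighted design described above.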

2.6. Risk of Bias Assessment

The methodological quality and risk of bias of the included studies were evaluated using the Prediction model Risk of Bias Assessment Tool (PROBAST). Each study was assessed across four domains: participants, predictors, outcome, and analysis. Assessments were performed independently by two reviewers, with disagreements resolved through consensus.
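For transparency, overall judgements follow the tool’s standard roll-up logic, sketched below in simplified form (a minimal illustration that omits PROBAST’s additional downgrade rules, e.g., for models lacking external validation):

```python
# Simplified sketch of the PROBAST overall-risk roll-up across the four domains.
DOMAINS = ("participants", "predictors", "outcome", "analysis")

def overall_probast(judgements: dict[str, str]) -> str:
    """Roll domain-level ratings ("low"/"high"/"unclear") into an overall rating."""
    ratings = [judgements[d] for d in DOMAINS]
    if "high" in ratings:          # any high-risk domain -> overall high risk
        return "high"
    if all(r == "low" for r in ratings):
        return "low"
    return "unclear"               # otherwise at least one unclear domain

example = {"participants": "low", "predictors": "low",
           "outcome": "unclear", "analysis": "low"}
print(overall_probast(example))  # unclear
```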

3. Results

3.1. Literature Search

A total of 4531 records were identified through database searches in PubMed/MEDLINE (n = 801), Scopus (n = 1768), and Embase (n = 1962). After removing duplicates (n = 2555), 1976 unique records were screened by title and abstract. Of these, 1790 were excluded as non-original or focused on unrelated topics, leaving 186 records eligible for full-text review. Full texts were unavailable for 22 articles. After full-text assessment of the remaining 164 articles, 66 records were excluded for specific reasons, leaving 98 studies for inclusion. The overall study selection process is illustrated in Figure 1.

3.2. Geographical Distribution

Most of the included studies originated from Asia and South America, reflecting the higher burden of mosquito-borne diseases in these regions. At the continental level (Figure 2), the majority of studies were conducted in Asia (n = 62) [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79], followed by South America (n = 23) [80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101] and North America (n = 6) [102,103,104,105,106,107]. Fewer studies were reported from Europe (n = 2) [91,108], Africa (n = 1) [109], and Oceania (n = 1) [110]. Three [111,112,113,114] studies covered multiple continents and were therefore not represented on the map.
At the country level (Figure 3), the highest number of studies was identified in Brazil (n = 15) [80,82,84,85,86,88,89,90,91,93,94,96,98,99,115], Malaysia (n = 12) [36,38,51,52,53,54,57,58,62,67,76,78], and Bangladesh (n = 8) [18,27,37,47,55,56,65,71]. Several other countries contributed a smaller number of studies, mainly located in the Americas and Southeast Asia. Countries highlighted but without numerical labels correspond to those represented by a single study. Detailed information on the geographical distribution of studies is provided in Supplementary Table S2.

3.3. Temporal Distribution and Evolution of AI Model Types

The 98 studies that met the inclusion criteria had the following yearly distribution: 3 [43,83,110] in 2015, 4 [36,49,70,107] in 2016, 2 [34,100] in 2017, 6 [24,25,45,61,80,106] in 2018, 3 [21,60,102] in 2019, 12 [31,41,50,64,82,93,98,101,103,108,111,116] in 2020, 11 [19,29,32,35,44,59,67,75,76,88,89] in 2021, 11 [38,47,57,72,86,90,91,96,99,115,117] in 2022, 12 [39,52,53,62,69,78,79,81,94,95,105,114] in 2023, 18 [18,20,22,23,33,37,40,46,48,55,58,66,73,74,84,97,104,109] in 2024, and 16 [26,27,30,51,54,56,63,65,68,71,74,77,85,87,92,113] in 2025.
Considering both principal and comparator models, the use of AI methods showed a progressive diversification over time (Figure 4). Classical ML algorithms represented the predominant approach in the initial years and remained consistently applied throughout the decade, although with a reduction from 2022 onward. Tree-ensemble and time-series statistical models did not show a clear increasing trend but were intermittently used during the study period, particularly from 2018 onward. A similar pattern was observed for deep learning models, which were absent before 2018 and showed a marked increase from 2020, becoming one of the most frequently applied categories in recent years. Mechanistic and heuristic models were rarely employed. Models classified as hybrid/superensemble included studies that combined multiple AI algorithms or where the model type could not be unambiguously categorized.

3.4. Characteristics of the Features of the Included Studies

As shown in Figure 5, across the 98 included studies, epidemiological surveillance data were the most frequently used predictors (n = 90) [18,19,20,21,22,23,24,25,26,27,29,30,31,32,33,34,35,36,37,38,39,40,43,44,45,47,48,50,51,52,53,54,55,56,58,59,62,63,64,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,100,101,102,104,105,106,107,108,110,111,113,114,115,116], followed by climatic and environmental variables (n = 87) [18,19,20,21,22,23,24,25,26,27,29,30,31,32,33,34,35,36,37,38,39,40,43,44,45,47,48,50,51,52,53,54,55,56,57,60,62,64,66,69,70,71,72,73,74,75,76,77,78,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,97,98,99,100,101,102,103,104,105,106,108,109,110,111,113,114,115,116]. Socio-demographic factors were incorporated in one third of the models (n = 33) [23,25,31,38,39,43,47,50,52,53,54,65,66,70,71,78,81,89,91,92,93,97,98,100,101,102,105,106,108,110,111,113], whereas landscape variables (n = 18) [25,33,46,52,53,65,66,69,70,71,78,89,91,100,101,105,110,113] and entomological indicators (n = 14) [25,38,43,46,48,51,62,70,74,78,96,102,113,116] were less commonly included. Mobility and transport data (n = 13) [25,44,61,82,84,85,91,100,101,102,108,112,118], big data sources such as internet or social media queries (n = 7) [19,49,64,77,78,90,112], clinical information (n = 5) [46,54,67,68,116], and health system indicators (n = 5) [47,68,102,112,113] were rarely used. Genomic/virological [112,113] and policy or intervention-related [33,112] variables were only sporadically considered (both n = 2).

3.5. Included Studies Characteristics

Table 1 summarizes the main characteristics of the 98 included studies. Most contributions were forecasting or modelling studies using routine surveillance data, while a smaller subset adopted ecological or spatiotemporal designs, and only a few were based on hospital or clinical datasets. The majority of models were developed at national or sub-national level in the general population, whereas studies focusing specifically on travellers or hospitalized patients were rare. Almost all studies targeted dengue (n = 90) [18,19,20,21,22,23,24,25,26,27,29,30,31,32,33,34,35,36,37,38,39,40,41,43,44,45,46,47,48,49,51,52,53,54,55,56,57,58,59,60,62,63,64,65,66,67,68,69,70,71,72,73,74,76,77,78,79,80,81,82,83,84,85,87,88,90,91,92,93,94,95,96,97,99,100,101,103,104,106,107,108,110,111,112,113,114,115,116], with a limited number addressing other mosquito-borne infections such as Zika (n = 2) [98,102], West Nile virus (n = 2) [91,105], yellow fever (n = 1) [89], or Rift Valley fever (n = 1) [109], while 2 studies [75,86] modelled multiple arboviral diseases simultaneously. Outcomes were most commonly defined as weekly or monthly incidence or case counts, generally based on suspected or laboratory-confirmed cases recorded in routine surveillance systems. Regarding the prediction task, most models focused on short-term forecasts at weekly time scales, with fewer studies addressing medium-term (up to several months) or long-term horizons of one year or more. Handling of missing values and data imbalance was heterogeneously reported: many studies did not explicitly describe any procedure, while others applied simple imputation or case-exclusion strategies [18,26,35,38,39,47,56,60,67,73,74,80,83,87,109,113], and only a minority used more advanced methods such as resampling or specialized imputation algorithms. Internal validation was typically performed using temporal train–test splits or k-fold cross-validation. In terms of implementation readiness, most studies were classified as “research only” or “proof-of-concept”, with only a small number explicitly designed as decision-support tools or described as being used operationally within routine public-health surveillance.
Data sources varied widely across studies, ranging from national surveillance systems and meteorological stations to remote sensing, mobility, and socioeconomic databases, with some studies integrating multiple heterogeneous datasets. A detailed list of data sources used in each study is provided in Supplementary Table S2.

3.6. Model Performance by AI Category

Overall, the reported performance varied substantially across model categories, reflecting differences in outcome type, data availability, and prediction tasks. Classification metrics (Table 2 and Supplementary Table S3), including AUC, sensitivity, specificity, PPV/precision, NPV, accuracy, and F1-score, were mainly used in case-based or alert-level models. Regression metrics (Table 3 and Supplementary Table S4) instead comprised error and goodness-of-fit measures such as RMSE, MAE, MAPE, MSE, coefficient of determination (R2), Pearson’s correlation coefficient (r), and SMAPE.

Model Validation Approaches

Among the 98 included studies, the vast majority implemented internal validation strategies (n = 89) [18,19,20,21,22,25,26,27,30,31,32,33,34,35,37,38,39,40,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,62,64,65,66,67,68,69,70,71,72,73,74,75,76,77,79,80,81,82,83,84,86,88,89,90,91,92,93,94,95,96,97,99,100,101,102,104,105,106,108,109,110,111,113,114,115,116,119], while only 5 studies [41,61,98,103,107] reported an external validation procedure, typically using data from distinct time periods, regions, or populations, and only 1 study [23] integrated both internal and external validation steps (Table 2). The most frequent internal validation methods were simple hold-out or random train/test splits (n = 33) [20,22,27,30,31,37,39,40,44,45,47,51,54,55,58,62,63,66,68,69,71,73,75,77,85,87,88,93,102,105,108], with training proportions ranging between 50% and 80%, and k-fold cross-validation (n = 19) [18,20,21,23,27,38,39,43,46,47,48,49,66,71,76,108,109,113,116], most commonly 5- or 10-fold. Temporal or time-series validation approaches were applied in 22 studies [25,32,33,34,37,51,56,66,68,71,72,74,75,82,90,91,93,94,97,105,111,115], mainly in incidence forecasting models, often using rolling or expanding-window designs. A few investigations (n = 4) [23,66,71,109] employed nested resampling or combined spatiotemporal validation schemes, whereas 3 studies [36,78,112] did not clearly report the validation type. Overall, external validation remained uncommon, and independent test sets were often limited in size, potentially leading to optimistic performance estimates.
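To make the temporal schemes concrete, the sketch below illustrates an expanding-window split of the kind described above, using scikit-learn's TimeSeriesSplit on synthetic weekly data (the included studies implemented their own variants; the model and data here are placeholders):

```python
# Expanding-window temporal validation: each fold trains on all weeks before
# the test block, so information never leaks backwards in time.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(156, 4))        # e.g., 3 years of weekly climate features
y = rng.poisson(lam=20, size=156)    # e.g., weekly reported case counts

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    mae = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
    print(f"fold {fold}: train={len(train_idx):3d} weeks, MAE={mae:.2f}")
```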

3.7. Classical Machine Learning

3.7.1. Classification Metrics

Among the classical ML models, AUC values ranged from 0.70 [102] (NARX NN) to 0.96 [95] (SVM). In the time-series applications (NARX NN), performance declined with longer forecast horizons, from 0.91–0.95 at 1 week to 0.70–0.74 at 12 weeks [102]. Sensitivity varied between 0.87 [49] (CART) and 1.00 [43] (SVM-L), while specificity ranged from 0.01 [57] (ANN) to 0.95 [83] (DT). Accuracy values spanned 0.47 [106]–1.00 [99], with the best performance in Diffusion Maps + SVM (RBF), and the lowest in the ANN models. PPV was reported only in 2 studies [82,116] (0.92 in both), and NPV was not provided in any study. Reported F1-scores were generally high (0.73 [106]–0.97 [95]), confirming balanced predictive ability across most classical ML algorithms.

3.7.2. Regression Metrics

In classical ML studies, MAE values ranged from 4.57 [47] (MLR, monthly, city level) to 4759.06 [18] (DT + Sequential Squeeze FS, monthly, national). Notably, 2 studies conducted on the same scale (cases, monthly, city level) reported markedly different MAE values using MLR: 4.57 [47] in one case and 200.68 [41] in another, highlighting substantial variability even within identical spatiotemporal settings. The reported RMSE values spanned from 0.04 [104] for an ANN model predicting severe dengue (weekly, regional) to 9296.35 [18] for DT + Sequential Squeeze FS (monthly, national). Of note, within the same monthly national scale and with comparable DT-based approaches, another study reported a substantially lower RMSE of 5.43 ± 0.43 [52], likely reflecting model overfitting on a very small dataset. One study [36] reported MSE within the classical ML category (NN Regression, weekly, district level), with values ranging from 0.06–0.08 in Hulu Selangor (Malaysia) to 98.55 in Hulu Langat. Reported MAPE ranged from 0.94 [18] (DT + Sequential Squeeze FS, monthly, national) to 17–24 [70] for LASSO (weekly forecasts). SMAPE was not reported by any study. For R2, values ranged from 0.18 [60] (MLR) to 0.99 [34] (SVR across provincial settings). Finally, r ranged from 0.50 [41] (MLR, monthly, city level) to 0.91 [90] for LASSO at a 1-week horizon, with progressively lower correlations at longer horizons (0.76 at 3 weeks, 0.61 at 6 weeks, and 0.56 at 8 weeks).

3.8. Tree-Ensemble Models

3.8.1. Classification Metrics

Among tree-ensemble models, AUC values were consistently high, ranging from 0.84 [71] (LightGBM) to 0.99 [40] (AdaBoost and XGB). Reported sensitivity ranged from 0.64 [40] (RF, Bangkok) to 0.99 [109] (XGB), with some XGB configurations [91] displaying marked variability and LightGBM reaching 0.98 [71]. For PPV, values ranged from 0.88 [40] (RF) to 0.99 (GB in Bangladesh and XGB [109]), with consistently strong precision across ensemble approaches. NPV was reported in only one study [71]; it differed across years (0.86 in 2018 vs. 0.69 in 2019), reflecting changes in feature windows or data availability. Specificity values varied between 0.73 [48] (RF) and 0.98 (LightGBM), with XGBoost reaching 0.96. Accuracy values spanned from 0.79 [40] (RF) to 1.00 [109], with the perfect score achieved by XGB in one study, while lower values reflected imbalanced datasets. Finally, F1-scores ranged from 0.72 [20] (Extra Trees) to 0.98 [40] (GB), with most ensemble models showing F1 values ≥ 0.90, confirming a strong balance between sensitivity and precision across settings.

3.8.2. Regression Metrics

In tree-ensemble studies, MAE values ranged from 0.15 [24] (RF, weekly, city level, per 1000 population) to 97.9 [87] in categorical dengue settings (RF, weekly, city level). Several studies operating on the same monthly national scale with RF, XGB, and LightGBM reported MAEs between 0.24 and 0.87 [71], showing relatively tight clustering across different feature groups. Reported RMSE values extended from 0.21 [24] (RF, weekly, city level, per 1000 population) up to 23.36 [71] (RF, weekly, city level, 8-week horizon). Tree-ensemble models applied at monthly national scale [71] (RF, XGB, LightGBM with SHAP) consistently reported RMSEs below 1.0, with LightGBM yielding the lowest values (0.32–0.57). In contrast, weekly city-level [90] forecasting displayed clear horizon-dependent increases (11.03 at 1 week vs. 23.36 at 8 weeks). For MSE, values ranged from 1.20 [20] (ETC, weekly, district level) to over 5000 [23] in 10-day regional incidence forecasts (ranger and ensemble configurations). Reported MAPE values ranged from 0.05–0.17 [71] for LightGBM, RF, and XGB at monthly national scale (training and test), up to 8.32 [38] in RF models when entomological covariates were removed, indicating substantial sensitivity of tree-based predictors to domain-specific input features. SMAPE was not reported by any tree-ensemble study. For R2, estimates ranged from 0.09 [71] (LightGBM) to 0.85 [90] at short horizons in weekly city-level RF models, with values declining systematically as forecast length increased (0.62 at 3 weeks, 0.40 at 6 weeks, 0.34 at 8 weeks). Finally, r values ranged from 0.39 [98] (RF, monthly, national) to 0.95 [87] across several weekly city-level districts (e.g., Natal and Barranquilla).

3.9. Deep Learning Models

3.9.1. Classification Metrics

Among deep learning approaches, AUC was reported in a single study [55], with the principal model (MobileNetV3Small) and its comparators (ResNet50 and MobileNetV3Large) all achieving 0.98 ± 0.01.
Reported sensitivity showed substantial variability across architectures and locations, ranging from values near 0.00 [85] in city-specific LSTM forecasts (e.g., Belém, Fortaleza) to 0.97 ± 0.03 [55] in MobileNetV3Small. Specificity was generally higher, spanning 0.33–1.00 [85] across LSTM thresholds and cities, and reaching 0.99 ± 0.01 [55] in MobileNetV3Small and ResNet50/MobileNetV3Large. For precision, values ranged from 0.88 [68] (CNN-BiGRU with attention) to 0.99 ± 0.01 [55] (MobileNetV3Small), while NPV was not provided in any study. Reported accuracy varied widely depending on the model architecture and forecast horizon, from 0.26–1.00 [21] in CNN spatiotemporal experiments to 0.98 ± 0.01 [55] in MobileNetV3Small and ResNet50/MobileNetV3Large. LSTM models exhibited broad cross-city variability (approximately 0.63–0.98 [85], depending on threshold and location), while horizon-dependent CNN-BiLSTM [79] predictions declined from 0.88 at 1 week to 0.78 at 4 weeks. Finally, F1-scores ranged from 0.00 [85] to 0.98 [55], with the lowest values again observed in city-level LSTM forecasts with extreme class imbalance and the highest in MobileNetV3Small and ResNet50/MobileNetV3Large.

3.9.2. Regression Metrics

Among deep learning models, MAE values varied substantially across architectures and spatiotemporal scales, ranging from 0.20–0.53 [91] in log-weekly-regional LSTM configurations to values exceeding 1000 [85] in several weekly city-level LSTM applications (e.g., Belo Horizonte 1483; Brasília 1067). Monthly national LSTM forecasts showed MAEs of 301.64 [37], while BiLSTM and 1D-CNN models at monthly city scale reported values between 19.11 and 31.49 [19]. Reported RMSE values ranged from 0.22–0.40 [91] in log-weekly-regional LSTM models to >800 in monthly national forecasts under high-incidence conditions. City-level weekly LSTM outputs showed RMSEs between 4.79 and 10.13 [35], while CNN–BiLSTM and 1D-CNN models at monthly and weekly city scale returned a higher value, at 106.96 [63]. For MSE, only one study [30] provided data, reporting 3187.43 for a monthly 1D-CNN at city level. Reported MAPE varied from approximately 21–30% [85] in individual LSTM predictions to >40% in several LSTM baseline configurations, with notable cross-city variability. SMAPE was reported by only 1 study [19], with BiLSTM values of 0.18–0.31 at monthly city scale. For R2, estimates ranged from 0.91–0.94 [63] (CNN-LSTM hybrid and ConvLSTM models) to 1.00 [72] in univariate LSTM settings. Finally, r values ranged from 0.42 [98] (deep feed-forward networks) to 0.92 [59] in LSTM models, with year-to-year improvements observed in multi-year evaluations (e.g., from 0.58 in 2016 to 0.92 in 2018).

3.10. Hybrid/Superensemble Models

3.10.1. Classification Metrics

Across hybrid and super-ensemble approaches, AUC values ranged widely depending on the underlying base learners, from 0.62 to 0.82 [62] in lower-performing models within multi-algorithm frameworks (e.g., ANN, DT, AdaBoost) to 0.93–0.97 [108] in stronger ensembles based on RF, XGB, glmnet, and PLS. Reported sensitivity extended from 0.42 to 0.49 [46] in weaker decision-tree or SVM configurations to 0.99 [95] in ensemble-optimized RF, DT, and AdaBoost models. Specificity showed a similar spread, ranging from 0.67 [33] in basic GAM/ANN/SVM configurations to 0.95 [108] in glmnet, RF, and XGB. For precision, values ranged from 0.41 to 0.48 [46] in NB/DT/SVM models to 0.90–0.93 [108] in RF and XGB. NPV was reported less frequently but ranged from 0.87 [65] to 0.96 [108] in GLM and XGB. Reported accuracy varied substantially, from 0.29 to 0.42 [18] in simple KNN/GB/SVR settings to 0.99 [95] in enriched hybrid pipelines combining feature selection with multiple classifiers. Finally, F1-scores ranged from 0.41 to 0.51 [46] in weaker NB/SVM/DT models to 0.95–0.99 [95] in sophisticated hybrid frameworks (e.g., PCA/GOOSE/PSO, AdaBoost, RF).

3.10.2. Regression Metrics

MAE values spanned a very broad range across scales and settings, from 0.17–0.27 [24] per 1000 population at weekly city level (GAM/GB) and from 0.43–0.52 [92] per 100,000 population at monthly provincial level (GLM, RF, XGB, LSTM), up to >50,000 [84] cases in the worst long-horizon state-level forecasts of climate-/case-based LSTM and Bayesian models in Brazil. Reported RMSE ranged from 0.02 [33] in weekly city-level settings (GAM, RF, CIF, SVM, ANN, XGB) to >20,000 [18] cases for some monthly national SVR and XGBoost models. Where reported, MSE values ranged from 0.18–0.37 [113] at the annual national scale (RF, XGB, MLP, SVR) to >80,000 [30] for monthly city-level ANN in high-incidence settings. MAPE values ranged from <1 to 6% [18] for several national monthly ensembles (RF, XGB, GB, SVR, KNN) to 30–70% [77] in ARDL-based city-level hybrids and exceeded 900–1400% [84] in the worst-performing state–horizon combinations of Brazilian LSTM and Bayesian RE models. SMAPE was rarely reported but, where available, was low (0.04–0.08) [69] for optimized weekly city-level ensembles (e.g., CNN + ANN + SVM, LSTM-RF). For R2, estimates ranged from 0.00 to 0.98 [34], while no study reported r for hybrid/superensemble regression models.

3.11. Time-Series/Statistical Models

3.11.1. Classification Metrics

Regarding time-series and statistical baselines, AUC was reported in a single study, with a temporal-average baseline achieving a value of 0.78 [25]. Accuracy was likewise available from only one comparative analysis [37], in which ARIMA and Prophet models reached 0.58 and 0.60, respectively. Sensitivity, specificity, PPV, NPV, and the F1-score were not reported by any study in this category.

3.11.2. Regression Metrics

MAE values ranged from 2.80 [64] (moving-average baseline, weekly, provincial) to 433.21 [37] (ARIMA, monthly, national). Reported RMSE spanned from 293.9 [94] in a statistical baseline (seasonal naïve) with monthly cases at the district level to 6806 [80] (naïve monthly city-level model). MSE was reported in a single study [64], with values ranging widely across provinces and forecast horizons, from 6.47–32.86 for baseline and moving-average models (Mukdahan, Pattani) up to 1729.00 in naïve forecasts for Chiang Rai at longer horizons. MAPE was seldom reported and varied from 39.66–42.18 [37] for Prophet and ARIMA up to 94.84 [58] for NNAR (weekly, national). SMAPE was not provided by any time-series/statistical study. For R2, only naïve and moving-average baselines reported values, ranging from 0.07 (weekly, provincial, long horizon) to 0.97 [64] (weekly, provincial, short horizon), while r was never reported.

3.12. Mechanistic and Heuristic Models

Because only a small number of studies relied on mechanistic or heuristic approaches, their performance metrics are reported together.

3.12.1. Classification Metrics

Classification outcomes were seldom reported. Among mechanistic approaches, only one study [46] (WRF) provided evaluation metrics, reporting a sensitivity of 0.88, a specificity of 0.95, a precision of 0.85, an accuracy of 0.94, and an F1-score of 0.86. For heuristic and rule-based systems, only one study [116] reported classification results, comparing Bayesian Belief Networks, neural networks, and fuzzy systems; these models showed highly consistent performance, with sensitivity, specificity, accuracy, and F1-scores all ranging between 0.88 and 0.89.

3.12.2. Regression Metrics

Regression metrics were reported infrequently across mechanistic and heuristic studies. Among the heuristic approaches, ANFIS [41] showed an MAE of 151.51, an RMSE of 216.54, and an r of 0.83 at the monthly city level, while differential evolution models [100] yielded an MAE of 40.18–308.68, an RMSE of 40.04–106.30, and an MSE of 1627.11–11,869.5 across national monthly settings. One heuristic GANN model [36] reported only case-specific deviations (0.06–0.07) at the weekly district level. Mechanistic studies provided limited numerical outputs: the SIR + EAKF ensemble [107] reported timing, peak, and total-case errors (e.g., 4.8, 25, and 519 for its primary configuration), without standard regression metrics. Overall, regression performance in this category remains sparsely documented and highly heterogeneous.

3.13. Descriptive Performance Patterns Based on Unweighted Comparative Analyses

3.13.1. Classification Performance

Across studies, substantial heterogeneity was observed in the performance of the evaluated modelling approaches. Overall, classification metrics demonstrated systematic differences in performance distributions, with tree-ensemble and classical machine-learning models generally showing higher stability and deep-learning models exhibiting greater variability (Figure 6).

The distribution of AUC values showed consistently high discrimination for tree-ensemble models, with most estimates exceeding 0.85, while classical ML approaches spanned a wider range (0.65–0.96). Hybrid and superensemble methods also showed high AUC values but were represented by few observations, whereas single estimates from deep-learning and time-series/statistical models limited interpretability.

For sensitivity, tree-ensemble models demonstrated the highest and most stable performance, with most values above 0.85. Classical ML models displayed broader dispersion (0.73–0.99), while deep-learning models showed the widest range overall, with sensitivities extending from 0.00 to 1.00 across different LSTM configurations. Few observations represented hybrid and mechanistic approaches.

The distribution of specificity values also varied considerably. Tree-ensemble methods showed strong and concentrated performance (typically 0.90–0.98). Classical machine-learning estimates ranged more widely (0.70–0.95), and deep-learning models again showed substantial dispersion (0.33–1.00). Hybrid and mechanistic approaches contributed limited additional observations.

Regarding PPV, tree-ensemble models yielded consistently high precision, with estimates typically between 0.89 and 0.99. Classical machine-learning estimates were more variable (0.70–0.92). Deep-learning models generally performed well, with most values clustering between 0.88 and 0.99. Mechanistic modelling contributed a single data point. For NPV, only tree-ensemble and classical machine-learning models reported estimates. Tree-ensemble approaches demonstrated high NPV (0.94–0.98), whereas classical machine-learning values were slightly lower (0.88–0.91). No other modelling families contributed NPV metrics.

The distribution of accuracy values showed that tree-ensemble models achieved the most stable and highest performance (0.87–1.00). Classical machine-learning models spanned a broad range (0.20–0.97). Deep-learning models exhibited the widest variability, extending from very low scores (0.26) to perfect accuracy (1.00), reflecting strong dependence on architecture and study conditions. Hybrid and mechanistic approaches provided fewer observations.

Finally, F1-scores indicated that tree-ensemble models achieved the highest and most consistent balance between precision and recall (typically 0.90–0.98). Classical machine-learning models showed broader dispersion (0.73–0.97). Deep-learning approaches exhibited the broadest range overall (0.00–0.98), highlighting variability in class-balance handling across datasets and architectures. Mechanistic modelling contributed only a single estimate.

3.13.2. Regression Performance

Regression metrics revealed marked heterogeneity across studies, largely driven by differences in spatial scale, temporal resolution, and underlying case magnitude. Overall, error distributions showed consistent scale-dependence: low errors occurred in fine-resolution forecasts, whereas larger errors were associated with national-level or high-incidence settings.
For RMSE (Figure 7), multi-panel stratification showed four distinct magnitude regimes. In the very small range (≤1), tree-ensemble, hybrid, and deep-learning models showed tightly clustered errors, indicating stable performance under low-incidence, high-resolution conditions. The small-error range (1–10) displayed broader but still moderate variability across modelling families. Medium-range RMSE values (10–1000) showed pronounced heterogeneity, most evident among deep-learning and classical machine-learning models, reflecting mixed spatial/temporal contexts. The largest RMSE values (>1000) originated primarily from national-scale predictions using classical ML and time-series/statistical models.
Magnitude-stratified MAE distributions confirmed these scale effects (Figure 8). Very small errors (≤1) were mostly produced by tree-ensemble and classical ML models applied to weekly district/provincial data. Small errors (1–10) encompassed several modelling families, with variability driven more by data granularity than by algorithm choice. Medium-scale MAE values (10–1000) showed broader dispersion, especially for deep-learning and tree-ensemble methods in monthly national or city-level forecasts. Large errors (>1000) were associated primarily with national-level deep-learning and classical ML models.
MAPE distributions further supported this pattern (Figure 9). Tree-ensemble and classical ML models consistently produced the lowest percentage errors, including several estimates below 1%. Moderate errors (2–15%) were observed across tree-ensemble, hybrid, and some deep-learning approaches. Most deep-learning configurations showed wider dispersion (20–36%). The highest errors (>90%) occurred exclusively in time-series/statistical models.

MSE values also reflected strong dependence on geographic and temporal granularity (Figure 9). Tree-ensemble models spanned from near-zero to several thousand, particularly in regional 10-day forecasts. Deep-learning models ranged from low-error district-level settings to much higher values in monthly city-level predictions. Classical ML models generally occupied the lower range, while hybrid and heuristic approaches appeared at both extremes, depending on scale.

For R2, classical ML demonstrated the most consistently high explanatory power (0.75–0.99) (Figure 9). Tree-ensemble methods showed wider variation, ranging from low (0.09) to high (0.85–0.92), depending on study context. Deep-learning models reported high values, including near-perfect fits. Hybrid/superensemble models exhibited intermediate-to-high values but were represented by few observations. The correlation coefficient r showed similar patterns (Figure 9). Tree-ensemble models achieved the strongest and most stable correlations (0.60–0.95). Deep-learning models were more heterogeneous (0.42–0.92), though several achieved strong correlations (≥0.80). The heuristic model contributed a single mid-high value (0.83).

3.14. Assessment of Risk of Bias Using PROBAST

Across the 98 included studies, the overall methodological quality was highly variable (Figure 10). Using PROBAST, 63 studies were judged at high overall risk of bias, 23 at low risk, and 12 at unclear risk. At the domain level, the participant and predictor domains showed the most favourable profiles, with >78% of studies rated as low risk, reflecting the reliance on routinely collected surveillance and environmental datasets. In contrast, the analysis domain represented the most critical source of bias: over 60% of studies received a high-risk judgement. Common limitations included lack of transparent reporting of preprocessing steps, inconsistent handling of missing or imbalanced data, absence of hyperparameter tuning procedures, and insufficient safeguards against overfitting.

4. Discussion

4.1. Interpretation of Main Findings

This review provides the most extensive multi-metric, cross-model synthesis to date of forecasting performance for mosquito-borne viral diseases across classical machine-learning, tree-ensemble, deep-learning, hybrid/superensemble, mechanistic and time-series/statistical approaches. Substantial heterogeneity was evident across studies in terms of geographic setting, temporal resolution, incidence scale, predictor selection, model specification, and performance reporting. Forecasts were produced at multiple spatial units (national, provincial, district, city) and time granularities (annual, monthly, weekly, 10-day and bi-monthly), often with different combinations of climate, demographic, environmental and entomological covariates. Reporting practices were highly inconsistent, with many studies providing incomplete or non-standardised metrics and only a minority clarifying validation procedures or uncertainty quantification.

Across classification tasks, tree-ensemble models consistently showed the highest median performance and the lowest dispersion, whereas classical machine-learning and deep-learning models displayed broader variability, often driven by model architecture, input complexity and study context. Hybrid/superensemble, mechanistic, and time-series models were less frequently reported, limiting inference. Regression metrics exhibited a strong dependence on temporal and spatial scale. Very small error ranges were almost exclusively observed in fine-resolution, district- or provincial-level forecasting, whereas large errors were associated with national-scale, high-incidence settings. Tree-ensemble and classical ML approaches frequently achieved the lowest error values across scale bands, while deep-learning and time-series/statistical methods produced more dispersed and context-dependent results. Overall, the findings demonstrate that forecasting performance reflects the interplay between algorithmic family, spatiotemporal context, incidence magnitude, and study design, underscoring the need for standardised evaluation frameworks.

4.2. Interpretation and Comparison with Existing Literature

In comparing our findings with previously published systematic reviews of dengue and mosquito-borne disease forecasting models, several points of convergence emerge, but our study also provides substantive methodological extensions. The largest study [120] to date, reviewing 98 dengue outbreak prediction models across 64 studies, reported marked inconsistencies in modelling practices, including limited adoption of ML approaches (39.4%), very low rates of external validation (5.2%), and highly heterogeneous reporting of performance metrics. These structural issues are reproduced in our dataset: despite a broader temporal and geographical scope, we likewise observed substantial heterogeneity in modelling choices, input feature sets, temporal horizons, validation strategies, and combinations of reported metrics.

A second review [121] focusing on dengue forecasting in endemic regions found that ML models, particularly tree-ensemble methods such as RF, tended to outperform classical statistical approaches (e.g., ARIMA, Poisson regression), yet emphasised the fragmented evidence base and variability in methodological quality. Our results are consistent with these observations: tree-ensemble methods in our synthesis showed the highest central tendency and smallest dispersion for classification performance, whereas classical ML and DL models showed markedly broader variability. This mirrors prior evidence but also shows, with greater granularity, how performance dispersion manifests differently across each metric, modelling family, and study design.

A third study [122] highlighted that although many ML-based models reported favourable predictive accuracy, the absence of transparent validation procedures and the inconsistent reporting of evaluation metrics severely limited cross-study comparability. Our findings reinforce this concern: high performance was achievable for certain deep-learning or hybrid models, but variability, particularly in regression metrics, was strongly driven by study-specific characteristics such as temporal resolution, spatial aggregation, incidence magnitude, and covariate selection.

However, our study extends the existing evidence base by explicitly quantifying how these sources of heterogeneity propagate across performance distributions. Regression metrics (RMSE, MAE, MAPE, and MSE) demonstrated pronounced scale-dependence in our dataset. By stratifying RMSE into predefined magnitude bands (≤1, 1–10, 10–1000, >1000), we showed that fine-resolution district-level weekly forecasts consistently achieved RMSE ≤ 1, while national-scale or monthly predictions routinely exceeded RMSE > 1000. This behaviour aligns with comparative modelling work such as the Rio de Janeiro study [120], which found that LSTM architectures with climatic covariates outperformed ARIMA only at short horizons, whereas ensemble or hybrid approaches were more competitive for broader spatial scales or longer forecasting windows. Similarly, Liu et al. [123] reported that XGBoost achieved RMSE = 109, MAE = 127 and MAPE = 12.9% for monthly division-level forecasts in Bangladesh, outperforming SARIMA and SVR and reinforcing that scale, horizon, and feature design critically shape regression performance. Our stratified multi-panel visualisations provide direct empirical confirmation of this context-dependence: in the “very small” error band (≤1), tree-ensemble, hybrid, and deep-learning models clustered tightly; in the “small” band (1–10), variability increased across modelling families; in the “medium” band (10–1000), deep-learning and classical ML models exhibited wide interquartile ranges; and in the “large” band (>1000), classical ML and time-series/statistical methods produced the highest absolute errors. Across all metrics, these results indicate that algorithmic family alone does not determine forecasting performance; rather, the interplay between modelling strategy, spatial/temporal scale, data structure, and epidemiological signal magnitude is the dominant determinant of predictive accuracy.

4.3. Implications for Public-Health Practice

From a public-health perspective, the findings of this review provide several actionable insights for the integration of mosquito-borne viral disease forecasting models into operational early-warning systems. The consistently strong performance of tree-ensemble methods across classification metrics suggests that these algorithms are particularly well suited for outbreak detection, alert-level assignment, and other decision-support applications requiring robust categorical predictions at fine spatial or temporal resolution. This observation is consistent with evidence from operational or semi-operational contexts, where Random Forest-based systems have shown stable outbreak detection performance, for example, within Singapore’s national dengue forecasting programme [70] and in ArboMAP [124] for West Nile virus risk mapping in the United States. Such stability across heterogeneous input conditions suggests that tree-ensemble models may offer more reliable early-warning signals than deep-learning or traditional statistical approaches in settings where data completeness varies over time and is broadly consistent with previous work on data-driven surveillance and forecasting for tropical and sub-tropical diseases [125]. The pronounced association between error magnitude and epidemiological scale observed in this review underscores the importance of aligning modelling strategy with intended operational use. District-level, short-term forecasts were characterised by low error and narrow dispersion, making them more compatible with actionable public-health decision-making (e.g., targeted vector control, short-term resource allocation). In contrast, national-scale or monthly forecasts frequently exhibited substantially larger errors and wider variability, indicating that such outputs should be interpreted with caution and used primarily for situational awareness rather than precise operational planning. These findings reinforce the need for public-health agencies to calibrate expectations to scale-specific uncertainty, rather than applying uniform performance thresholds across settings. Importantly, the descriptive and unweighted nature of the available evidence highlights that model performance alone is not sufficient for real-world adoption. The lack of rigorous validation practices remains a critical barrier: prior systematic reviews have shown that fewer than 10% of dengue forecasting models undergo external validation, and our review identified similar gaps [120]. Models developed without proper cross-validation or external testing may be overfitted to their training environment, and real-world experience has demonstrated that forecast accuracy often degrades substantially when models are deployed in new locations or under changing epidemiological conditions [120,126]. Similar challenges have been documented for digital epidemiology systems based on non-conventional data streams (e.g., Google Flu Trends and Google Dengue Trends), where insufficient validation led to substantial misestimation of true disease activity. For this reason, external validation, context-specific calibration, and transparent uncertainty quantification should be considered essential prerequisites before operational deployment. The marked heterogeneity in study design, data sources, and performance-reporting standards further highlights the need for harmonised reporting frameworks. 
The adoption of standardised evaluation protocols, including consistent forecast horizons, uncertainty measures, and scale-stratified error reporting, would substantially improve comparability across modelling approaches and support more evidence-based integration into routine surveillance systems.

Finally, most models included in this review were developed exclusively for research purposes, with only a small minority piloted or implemented within real-world public-health infrastructures. This limited operational uptake is consistent with prior assessments of dengue and arboviral forecasting systems, which have repeatedly noted the gap between methodological innovation and practical deployment. Accelerating translation into practice will require co-development with public-health institutions, emphasis on interpretability and maintainability, and integration into existing surveillance workflows. Moreover, early-warning systems will only translate into effective risk reduction if they are accompanied by risk-communication and community-engagement strategies; for example, the poor awareness and knowledge of Zika virus observed in the Italian general population illustrate how limited understanding of arboviral risks can undermine the impact of preventive measures [127]. Strengthening standardisation, validation, and reporting practices will be essential for transitioning forecasting models from exploratory research tools into reliable components of operational early-warning systems.

Looking ahead, the rapidly growing body of AI-based arboviral forecasting studies originating from low- and middle-income countries, particularly in Asia and Latin America, is encouraging and suggests that context-adapted tools may increasingly be developed where the burden of mosquito-borne diseases is highest. At the same time, long-standing concerns about “algorithmic inequity” remain highly relevant: many AI systems in global health have historically been trained and validated predominantly on data from high-income, well-resourced settings, raising the risk of models that encode and amplify structural biases related to geography, race/ethnicity and socioeconomic status, and that perform suboptimally in underrepresented populations [128,129]. Ensuring that future AI models for mosquito-borne viral diseases are trained on diverse, locally generated data, co-designed with stakeholders in resource-limited settings, and evaluated through rigorous external validation will therefore be essential to avoid reproducing these inequities [130]. If coupled with investments in open data infrastructures, capacity building and transparent governance, the ongoing expansion of AI research in low- and middle-income settings has the potential to shift the field towards more equitable, accessible, and implementable early-warning tools for the communities most affected by arboviral transmission.

4.4. Strengths and Limitations

This study offers several strengths. It represents the first systematic effort to collate and compare the performance of AI and machine-learning models across the spectrum of mosquito-borne viral diseases, providing a unified descriptive overview of a highly fragmented field. The use of unweighted visual distributions offers a transparent representation of performance variability, free of weighting assumptions, and the multi-panel stratification of scale-dependent metrics (RMSE and MAE) provides a clearer understanding of error behaviour across different incidence scales and analytical contexts.

Several limitations must also be acknowledged. The analysis is purely descriptive, without meta-analytic weighting or adjustment for methodological heterogeneity; therefore, no causal inference or formal comparison across modelling families can be drawn. Dependence on published study-level metrics introduces potential publication bias, selective reporting, and uncertainties due to incompletely documented modelling procedures. Residual confounding related to prediction horizon, geographical setting, and underlying incidence patterns is likely, despite stratification efforts. Moreover, the uneven representation of modelling categories, particularly the limited number of deep-learning and mechanistic models, reduces the robustness of comparative insights. Finally, although the review aimed to address all major mosquito-borne viral diseases, the available literature was overwhelmingly focused on dengue, limiting the generalisability of findings to other arboviruses.
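The scale-stratified panels used in this review (Figures 7 and 8) can be reproduced with a few lines of plotting code. The sketch below uses randomly generated, purely hypothetical RMSE values per modelling family, not the extracted study-level data, to illustrate the construction of magnitude-banded boxplot panels.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical study-level RMSE values per modelling family (illustrative
# only; not the data extracted from the included studies).
families = ["Tree-ensemble", "Classical ML", "Deep learning", "Time-series/statistical"]
rmse = {f: rng.lognormal(mean=m, sigma=2.0, size=40)
        for f, m in zip(families, [1.0, 2.0, 2.5, 3.0])}

# The four magnitude bands used for scale-stratified error reporting.
bands = [("very small (<=1)", 0, 1), ("small (1-10)", 1, 10),
         ("medium (10-1000)", 10, 1000), ("large (>1000)", 1000, np.inf)]

fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharey=True)
for ax, (label, lo, hi) in zip(axes, bands):
    # Keep, for each family, only the estimates falling inside this band.
    data = [v[(v > lo) & (v <= hi)] for v in rmse.values()]
    ax.boxplot(data, vert=False, labels=families)
    ax.set_title(label)
    ax.set_xlabel("RMSE")
fig.tight_layout()
plt.show()
```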

5. Conclusions

In conclusion, this review showed substantial variability in performance across AI/ML forecasting models for mosquito-borne viral diseases. Tree-ensemble approaches emerged as the most consistently reliable for classification, while regression accuracy was strongly shaped by spatial and temporal scale, incidence levels, and prediction horizon. By providing the first comparative synthesis that integrates model performance with methodological quality and operational readiness, this review clarifies where current approaches are genuinely fit for purpose. This contribution is particularly timely, given the growing institutional interest in robust early-warning systems for climate-sensitive infectious diseases. Priorities for advancing real-world use include standardised performance reporting, rigorous external validation, and calibration to context-specific settings; these steps are essential for translating modelling advances into dependable public-health decision-support tools.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/make8010015/s1. Table S1: Full search strings used in PubMed, Embase, and Scopus, including all controlled vocabulary terms and free-text keywords, reported exactly as executed; Table S2: Continent and country (or countries, for multi-country studies) of each included study, together with the main data sources used for model development; Table S3: Results of classification metrics for the comparator AI models used in the included studies; Table S4: Results of regression metrics for the comparator AI models used in the included studies.

Author Contributions

Conceptualization, V.G. and F.P.; methodology, A.P.; software, V.G. and F.P.; validation, A.P., V.G. and F.P.; formal analysis, A.P., V.G. and F.P.; investigation, A.P., V.G. and F.P.; resources, A.P., V.G. and F.P.; data curation, A.P., V.G. and F.P.; writing—original draft preparation, all authors; writing—review and editing, all authors; visualization, A.P., V.G. and F.P.; supervision, C.S., V.B., O.E.S. and V.G.; project administration, V.G.; funding acquisition, O.E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wilder-Smith, A.; Gubler, D.J.; Weaver, S.C.; Monath, T.P.; Heymann, D.L.; Scott, T.W. Epidemic arboviral diseases: Priorities for research and public health. Lancet Infect. Dis. 2017, 17, e101–e106. [Google Scholar] [CrossRef] [PubMed]
  2. Li, Y.; Kamara, F.; Zhou, G.; Puthiyakunnon, S.; Li, C.; Liu, Y.; Zhou, Y.; Yao, L.; Yan, G.; Chen, X.-G. Urbanization Increases Aedes albopictus Larval Habitats and Accelerates Mosquito Development and Survivorship. PLoS Neglected Trop. Dis. 2014, 8, e3301. [Google Scholar] [CrossRef]
  3. Abbasi, E. Global expansion of Aedes mosquitoes and their role in the transboundary spread of emerging arboviral diseases: A comprehensive review. IJID One Health 2025, 6, 100058. [Google Scholar] [CrossRef]
  4. Nucci, D.; Pennisi, F.; Pinto, A.; De Ponti, E.; Ricciardi, G.E.; Signorelli, C.; Veronese, N.; Castagna, A.; Maggi, S.; Cadeddu, C.; et al. Impact of extreme weather events on food security among older people: A systematic review. Aging Clin. Exp. Res. 2025, 37, 137. [Google Scholar] [CrossRef]
  5. Chilakam, N.; Lakshminarayanan, V.; Keremutt, S.; Rajendran, A.; Thunga, G.; Poojari, P.G.; Rashid, M.; Mukherjee, N.; Bhattacharya, P.; John, D. Economic Burden of Mosquito-Borne Diseases in Low- and Middle-Income Countries: Protocol for a Systematic Review. JMIR Res. Protoc. 2023, 12, e50985. [Google Scholar] [CrossRef]
6. Roiz, D.; Pontifes, P.A.; Diagne, C.; Leroy, B.; Vaissi, A. The rising global economic costs of invasive Aedes mosquitoes and Aedes-borne diseases. Sci. Total Environ. 2024, 933, 173054. [Google Scholar] [CrossRef] [PubMed]
  7. Lim, A.; Shearer, F.M.; Sewalk, K.; Pigott, D.M.; Clarke, J.; Ghouse, A.; Judge, C.; Kang, H.; Messina, J.P.; Kraemer, M.U.G.; et al. The overlapping global distribution of dengue, chikungunya, Zika and yellow fever. Nat. Commun. 2025, 16, 3418. [Google Scholar] [CrossRef] [PubMed]
  8. Cintra, A.M.; Noda-Nicolau, N.M.; de Oliveira Soman, M.L.; de Andrade Affonso, P.H.; Valente, G.T.; Grotto, R.M.T. The Main Arboviruses and Virus Detection Methods in Vectors: Current Approaches and Future Perspectives. Pathogens 2025, 14, 416. [Google Scholar] [CrossRef]
  9. Ureña, G.E.; Diaz, Y.; Pascale, J.M.; Lo, S. A framework for the early detection and prediction of dengue outbreaks in the Republic of Panama. Front. Trop. Dis. 2025, 5, 1465856. [Google Scholar] [CrossRef]
  10. Patiño, L.; Benítez, A.D.; Carrazco-Montalvo, A.; Regato-Arrata, M. Genomics for Arbovirus Surveillance: Considerations for Routine Use in Public Health Laboratories. Viruses 2024, 16, 1242. [Google Scholar] [CrossRef]
  11. Pinto, A.; Pennisi, F.; Ricciardi, G.E.; Signorelli, C.; Gianfredi, V. Evaluating the impact of artificial intelligence in antimicrobial stewardship: A comparative meta-analysis with traditional risk scoring systems. Infect. Dis. Now 2025, 55, 105090. [Google Scholar] [CrossRef]
  12. Abdi, Y.H.; Abdullahi, Y.B.; Abdi, M.S.; Bashir, S.G.; Ahmed, N.I. Using Artificial Intelligence in Vector Control: A New Path for Public Health. J. Vector Borne Dis. 2025, 144, 25. [Google Scholar] [CrossRef] [PubMed]
  13. Pennisi, F.; Pinto, A.; Ricciardi, G.E.; Signorelli, C.; Gianfredi, V. Artificial intelligence in antimicrobial stewardship: A systematic review and meta-analysis of predictive performance and diagnostic accuracy. Eur. J. Clin. Microbiol. Infect. Dis. 2025, 44, 463–513. [Google Scholar] [CrossRef]
  14. Brady, O.J.; Bastos, L.S.; Caldwell, J.M.; Cauchemez, S.; Clapham, H.E.; Dorigatti, I.; Gaythorpe, K.A.M.; Hu, W.; Hussain-Alkhateeb, L.; Johansson, M.A.; et al. Why the growth of arboviral diseases necessitates a new generation of global risk maps and future projections. PLoS Comput. Biol. 2025, 21, e1012771. [Google Scholar] [CrossRef]
15. Pinto, A.; Pennisi, F.; Odelli, S.; De Ponti, E.; Veronese, N.; Signorelli, C.; Baldo, V.; Gianfredi, V. Artificial Intelligence in the Management of Infectious Diseases in Older Adults: Diagnostic, Prognostic, and Therapeutic Applications. Biomedicines 2025, 13, 2525. [Google Scholar] [CrossRef]
  16. Velasco, H.; Ortiz, S.; Catano-Lopez, A.; Castro, C.; Martin-Barreiro, C.; Leiva, V. Integrating machine learning and time-to-event models to explain and predict risk of hospitalization due to dengue in Colombia. Sci. Rep. 2025, 15, 38847. [Google Scholar] [CrossRef] [PubMed]
  17. Freitas, L.P.; Ferreira, D.A.d.C.; Lana, R.M.; Câmara, D.C.P.; Portella, T.P.; Carvalho, M.S.; Gouveia, A.S.; de Almeida, I.F.; Araujo, E.C.; Vacaro, L.B.; et al. A statistical model for forecasting probabilistic epidemic bands for dengue cases in Brazil. Infect. Dis. Model. 2025, 10, 1479–1487. [Google Scholar] [CrossRef] [PubMed]
  18. Al Mobin, M. Forecasting dengue in Bangladesh using meteorological variables with a novel feature selection approach. Sci. Rep. 2024, 14, 32073. [Google Scholar] [CrossRef]
19. Anggraeni, W.; Yuniarno, E.M.; Rachmadi, R.F.; Purnomo, M.H. A Sparse Representation of Social Media, Internet Query, and Surveillance Data to Forecast Dengue Case Number using Hybrid Decomposition-Bidirectional LSTM. Int. J. Intell. Eng. Syst. 2021, 14, 209–225. [Google Scholar] [CrossRef]
  20. Nur, D.; Ningrum, A.; Li, Y.J.; Hsu, C. Artificial Intelligence Approach for Severe Dengue Early Warning System. Stud. Health Technol. Inform. 2024, 310, 881–885. [Google Scholar] [CrossRef]
  21. Anno, S.; Hara, T.; Kai, H.; Lee, M.; Chang, Y.; Oyoshi, K.; Mizukami, Y.; Tadono, T. Spatiotemporal dengue fever hotspots associated with climatic factors in Taiwan including outbreak predictions based on machine-learning. Geospat. Health 2019, 14, 183–194. [Google Scholar] [CrossRef]
22. Anno, S.; Tsubasa, H.; Sugita, S.; Yasumoto, S.; Sasaki, Y.; Oyoshi, K. Challenges and implications of predicting the spatiotemporal distribution of dengue fever outbreak in Chinese Taiwan using remote sensing data and deep learning. Geo-Spat. Inf. Sci. 2024, 27, 1155–1161. [Google Scholar] [CrossRef]
23. Buebos-Esteve, D.E.; Dagamac, N.H.A. Spatiotemporal models of dengue epidemiology in the Philippines: Integrating remote sensing and interpretable machine learning. Acta Trop. 2024, 255, 107225. [Google Scholar] [CrossRef]
  24. Carvajal, T.M.; Viacrusis, K.M.; Hernandez, L.F.T.; Ho, H.T.; Amalin, D.M.; Watanabe, K. Machine learning methods reveal the temporal pattern of dengue incidence using meteorological factors in metropolitan Manila, Philippines. BMC Infect. Dis. 2018, 18, 183. [Google Scholar] [CrossRef] [PubMed]
  25. Chen, Y.; Hui, J.; Ong, Y.; Rajarethinam, J.; Yap, G.; Ng, L.C.; Cook, A.R. Neighbourhood level real-time forecasting of dengue cases in tropical urban Singapore. BMC Med. 2018, 16, 129. [Google Scholar] [CrossRef]
  26. Cheng, Y.; Cheng, R.; Xu, T.; Tan, X.; Bai, Y.; Yang, J. Integrating meteorological data and hybrid intelligent models for dengue fever prediction. BMC Public Health 2025, 25, 1516. [Google Scholar] [CrossRef]
  27. Chowdhury, A.H. Comparison of Deep Learning and Gradient Boosting: ANN Versus XGBoost for Climate—Based Dengue Prediction in Bangladesh. Health Sci. Rep. 2025, 8, e70714. [Google Scholar] [CrossRef]
28. Zuanna, T.D.; Del Manso, M.; Giambi, C.; Riccardo, F.; Bella, A.; Caporali, M.G.; Dente, M.G.; Declich, S.; The Italian Survey CARE Working Group. Immunization offer targeting migrants: Policies and practices in Italy. Int. J. Environ. Res. Public Health 2018, 15, 968. [Google Scholar] [CrossRef]
  29. Arya Dala, I.M.Y.; Darma Putra, I.K.G.; Buana, P.W. Forecasting Cases of Dengue Hemorrhagic Fever Using the Backpropagation, Gaussians and Support-Vector Machine Methods. J. RESTI 2021, 5, 335–341. [Google Scholar] [CrossRef]
  30. Kumar, D.; Omveer, D.; Yatindra, S.; Ram, G. Prediction of dengue patients using deep learning methods amid complex weather conditions in Jaipur, India. Discov. Public Health 2025, 22, 58. [Google Scholar] [CrossRef]
31. Doni, A.R.; Sasipraba, T. LSTM-RNN Based Approach for Prediction of Dengue Cases in India. Ingénierie des Systèmes d’Information 2020, 25, 327–335. [Google Scholar]
32. Edussuriya, C.; Deegalla, S.; Gawarammana, I. An accurate mathematical model predicting number of dengue cases in tropics. PLoS Negl. Trop. Dis. 2021, 15, e0009756. [Google Scholar] [CrossRef]
33. Francisco, M.E.; Carvajal, T.M.; Watanabe, K. Hybrid Machine Learning Approach to Zero-Inflated Data Improves Accuracy of Dengue Prediction. PLoS Negl. Trop. Dis. 2024, 18, e0012599. [Google Scholar] [CrossRef]
  34. Guo, P.; Liu, T.; Zhang, Q.; Wang, L.; Xiao, J.; Zhang, Q.; Luo, G.; Li, Z.; He, J.; Zhang, Y.; et al. Developing a dengue forecast model using machine learning: A case study in China. PLoS Neglected Trop. Dis. 2017, 11, e0005973. [Google Scholar] [CrossRef]
35. Handari, B.D.; Niman, I.M.S.; Hasan, A.; Purba, J.R.P.; Hertono, G.F. Comparation of Elman Neural Network, Long Short-Term Memory, and Gated Recurrent Unit in Predicting Dengue Hemorrhagic Fever at DKI Jakarta. Commun. Math. Biol. Neurosci. 2021, 2021, 87. [Google Scholar] [CrossRef]
  36. Husin, N.A.; Mustapha, N.; Sulaiman, N. Performance of Hybrid GANN in Comparison with Other Standalone Models on Dengue Outbreak Prediction. J. Comput. Sci. 2016, 12, 300–306. [Google Scholar] [CrossRef]
  37. Islam, S.; Shahrear, P.; Saha, G. Mathematical analysis and prediction of future outbreak of dengue on time-varying contact rate using machine learning approach. Comput. Biol. Med. 2024, 178, 108707. [Google Scholar] [CrossRef]
  38. Ismail, S.; Fildes, R.; Ahmad, R.; Najdah, W.; Mohamad, W.; Omar, T. The practicality of Malaysia dengue outbreak forecasting model as an early warning system. Infect. Dis. Model. 2022, 7, 510–525. [Google Scholar] [CrossRef] [PubMed]
  39. Javaid, M.; Sarfraz, M.S.; Aftab, M.U.; Zaman, Q.; Rauf, H.T.; Alnowibet, K.A. WebGIS-Based Real-Time Surveillance and Response System for Vector-Borne Infectious Diseases. Int. J. Environ. Res. Public Health 2023, 20, 3740. [Google Scholar] [CrossRef] [PubMed]
  40. Jayabalan, D.; Elango, S. ICE-VDOP: An integrated clustering and ensemble machine learning methods for an enhanced vector-borne disease outbreak prediction using climatic variables. Int. J. Inf. Technol. 2024, 16, 2077–2088. [Google Scholar] [CrossRef]
  41. Kerdprasop, N.; Kerdprasop, K.; Chuaybamroong, P. Computational Intelligence and Statistical Learning Performances on Predicting Dengue Incidence using Remote Sensing Data. Adv. Sci. Technol. Eng. Syst. J. 2020, 5, 344–350. [Google Scholar] [CrossRef]
  42. Cuomo, G.; Franconi, I.; Riva, N.; Bianchi, A.; Digaetano, M.; Santoro, A.; Codeluppi, M.; Bedini, A.; Guaraldi, G.; Mussini, C. Migration and health: A retrospective study about the prevalence of HBV, HIV, HCV, tuberculosis and syphilis infections amongst newly arrived migrants screened at the Infectious Diseases Unit of Modena, Italy. J. Infect. Public Health 2019, 12, 200–204. [Google Scholar] [CrossRef]
  43. Kesorn, K.; Ongruk, P.; Chompoosri, J.; Phumee, A. Morbidity Rate Prediction of Dengue Hemorrhagic Fever (DHF) Using the Support Vector Machine and the Aedes aegypti Infection Rate in Similar Climates and Geographical Areas. PLoS ONE 2015, 10, e0125049. [Google Scholar] [CrossRef]
  44. Kiang, M.V.; Santillana, M.; Chen, J.T.; Onnela, J.P.; Krieger, N.; Monsen, K.E.; Ekapirat, N.; Areechokchai, D.; Prempree, P.; Maude, R.J.; et al. Incorporating human mobility data improves forecasts of Dengue fever in Thailand. Sci. Rep. 2021, 11, 923. [Google Scholar] [CrossRef]
  45. Koh, Y.; Spindler, R.; Sandgren, M.; Jiang, J. A model comparison algorithm for increased forecast accuracy of dengue fever incidence in Singapore and the auxiliary role of total precipitation information. Int. J. Environ. Health Res. 2018, 28, 535–552. [Google Scholar] [CrossRef] [PubMed]
  46. Kukkar, A.; Kumar, Y.; Sandhu, J.K.; Kaur, M.; Walia, T.S. DengueFog: A Fog Computing-Enabled Weighted Random Forest-Based Smart Health Monitoring System for Automatic Dengue Prediction. Diagnostics 2024, 14, 624. [Google Scholar] [CrossRef]
47. Dey, S.K.; Rahman, M.; Howlader, A.; Siddiqi, U.R.; Uddin, K.M.M.; Borhan, R.; Rahman, E.U. Prediction of dengue incidents using hospitalized patients, metrological and socio-economic data in Bangladesh: A machine learning approach. PLoS ONE 2022, 17, e0270933. [Google Scholar] [CrossRef]
  48. Kuo, C.Y.; Yang, W.W.; Chia, E.; Su, Y. Improving dengue fever predictions in Taiwan based on feature selection and random forests. BMC Infect. Dis. 2024, 24, 334. [Google Scholar] [CrossRef]
  49. Liu, K.; Wang, T.; Yang, Z.; Huang, X.; Milinovich, G.J.; Lu, Y.; Jing, Q.; Xia, Y.; Zhao, Z.; Yang, Y.; et al. Using Baidu Search Index to Predict Dengue Outbreak in China. Sci. Rep. 2016, 6, 38040. [Google Scholar] [CrossRef]
50. Liu, K.; Zhang, M.; Xi, G.; Deng, A.; Song, T.; Li, Q. Enhancing fine-grained intra-urban dengue forecasting by integrating spatial interactions of human movements between urban regions. PLoS Negl. Trop. Dis. 2020, 14, e0008924. [Google Scholar] [CrossRef]
  51. Lu, X.; Teh, S.Y.; Tay, C.J.; Abu Kassim, N.F.; Fam, P.S.; Soewono, E. Application of multiple linear regression model and long short-term memory with compartmental model to forecast dengue cases in Selangor, Malaysia based on climate variables. Infect. Dis. Model. 2025, 10, 240–256. [Google Scholar] [CrossRef]
  52. Majeed, M.A.; Shafri, H.Z.M.; Wayayok, A.; Zulkafli, Z. Prediction of dengue cases using the attention-based long short-term memory (LSTM) approach. Geospat. Health 2023, 18, 1. [Google Scholar] [CrossRef]
  53. Majeed, M.A.; Zulhaidi, H.; Shafri, M.; Zulkafli, Z. A Deep Learning Approach for Dengue Fever Prediction in Malaysia Using LSTM with Spatial Attention. Int. J. Environ. Res. Public Health 2023, 20, 4130. [Google Scholar] [CrossRef]
  54. Majeed, M.A.; Shafri, H.Z.M.; Zulkafli, Z.; Wayayok, A. Dengue fever prediction using LSTM and integrated temporal—Spatial attention: A case study of Malaysia. Spat. Inf. Res. 2025, 33, 5. [Google Scholar] [CrossRef]
  55. Mayrose, H.; Sampathila, N.; Bairy, G.M.; Nayak, T.; Saravu, K. An explainable artificial intelligence integrated system for automatic detection of dengue from images of blood smears using transfer learning. IEEE Access 2024, 12, 41750–41762. [Google Scholar] [CrossRef]
  56. Al Mobin, M. Multivariate forecasting of dengue infection in Bangladesh: Evaluating the influence of data downscaling on machine learning predictive accuracy. BMC Infect. Dis. 2025, 25, 761. [Google Scholar] [CrossRef]
  57. Farisha, N.; Krishnan, M.; Ahmad, Z.; Ahmad, A.; Jamaludin, M. Predicting Dengue Outbreak based on Meteorological Data Using Artificial Neural Network and Decision Tree Models. Int. J. Inform. Vis. 2022, 6, 597–603. [Google Scholar]
  58. Mustaffa, N.A.; Zahari, S.M.; Farhana, N.A.; Nasir, N.; Azil, A.H. Forecasting the incidence of dengue fever in Malaysia: A comparative analysis of seasonal ARIMA, dynamic harmonic regression, and neural network models. Int. J. Adv. Appl. Sci. 2024, 11, 20–31. [Google Scholar] [CrossRef]
  59. Necesito, I.V.; Velasco, J.M.; Kwak, J.; Lee, J.H.; Lee, M.J.; Kim, J.S.; Kim, H.S. Combination of Univariate Long-Short Term Memory Network And Wavelet Transform For Predicting Dengue Case Density In The National Capital Region, The Philippines. Southeast Asian J. Trop. Med. Public Health 2021, 52, 479–494. [Google Scholar]
  60. Olmoguez, I.L.G.; Catindig, M.A.C.; Fel, M.; Amongos, L.; Lazan, F.G. Developing a Dengue Forecasting Model: A Case Study in Iligan City. Int. J. Adv. Comput. Sci. Appl. 2020, 10, 9. [Google Scholar] [CrossRef]
  61. Ong, J.; Liu, X.; Rajarethinam, J.; Kok, S.Y.; Liang, S.; Tang, S.; Cook, A.R.; Ng, L.C.; Yap, G. Mapping dengue risk in Singapore using Random Forest. PLoS Neglected Trop. Dis. 2018, 12, e0006587. [Google Scholar] [CrossRef]
  62. Ong, S.Q.; Isawasan, P.; Mohiddin, A.; Ngesom, M.; Shahar, H.; Lasim, A.; Nair, G. Predicting dengue transmission rates by comparing different machine learning models with vector indices and meteorological data. Sci. Rep. 2023, 13, 19129. [Google Scholar] [CrossRef]
63. Patra, S.; Jana, S.; Adak, S.; Kar, T.K. A deep learning architecture using hybrid and stacks to forecast weekly dengue cases in Laos. Eur. Phys. J. B 2024, 97, 110. [Google Scholar] [CrossRef]
  64. Puengpreeda, A.; Yhusumrarn, S.; Sirikulvadhana, S. Weekly Forecasting Model for Dengue Hemorrhagic Fever Outbreak in Thailand. Eng. J. 2020, 24, 71–87. [Google Scholar] [CrossRef]
  65. Rahman, S.; Shiddik, A.B. Explainable artificial intelligence for predicting dengue outbreaks in Bangladesh using eco-climatic triggers. Glob. Epidemiol. 2025, 10, 100210. [Google Scholar] [CrossRef]
  66. Ren, H. Forecasting and mapping dengue fever epidemics in China: A spatiotemporal analysis. Infect. Dis. Poverty 2024, 13, 14–28. [Google Scholar] [CrossRef] [PubMed]
  67. Salim, N.A.M.; Wah, Y.B.; Reeves, C.; Smith, M.; Yaacob, W.F.W.; Mudin, R.N.; Dapari, R.; Sapri, N.N.F.F.; Haque, U. Prediction of dengue outbreak in Selangor Malaysia using machine learning techniques. Sci. Rep. 2021, 11, 939. [Google Scholar] [CrossRef]
  68. Salsabiila, N.J. CNN + BiGRU-Attention Classification and TiDE-PSO Forecasting Approach for Social Media-based Predictive Analysis of Dengue. Int. J. Intell. Eng. Syst. 2025, 18, 716–735. [Google Scholar] [CrossRef]
69. Shaikh, S.G.; Sureshkumar, B.; Narang, G. Development of optimized ensemble classifier for dengue fever prediction and recommendation system. Biomed. Signal Process. Control 2023, 85, 104809. [Google Scholar] [CrossRef]
  70. Shi, Y.; Liu, X.; Kok, S.; Rajarethinam, J.; Liang, S.; Yap, G.; Chong, C.-S.; Lee, K.-S.; Tan, S.S.; Chin, C.K.Y.; et al. Three-Month Real-Time Dengue Forecast Models: An Early Warning System for Outbreak Alerts and Policy Decision Support in Singapore. Environ. Health Perspect. 2016, 124, 1369–1375. [Google Scholar] [CrossRef] [PubMed]
  71. Rahman, S. Dengue Early Warning System and Outbreak Prediction Tool in Bangladesh Using Interpretable Tree—Based Machine Learning Model. Health Sci. Rep. 2025, 8, e70726. [Google Scholar] [CrossRef] [PubMed]
  72. Stavelin Abhinandithe, K.; Madhu, B.; Balasubramanian, S.; Ramachandran, S. Forecasting Multivariate time-series data using LSTM Neural Network in Mysore district, Karnataka. Indian J. Public Health Res. Dev. 2022, 13, 2–7. [Google Scholar] [CrossRef]
  73. Tian, N.; Zheng, J.; Li, L.; Xue, J.; Xia, S.; Lv, S.; Zhou, X.-N. Precision Prediction for Dengue Fever in Singapore: A Machine Learning Approach Incorporating Meteorological Data. Trop. Med. Infect. Dis. 2024, 9, 72. [Google Scholar] [CrossRef]
  74. Tuan, D.A. Leveraging Climate Data for Dengue Forecasting in Ba Ria Vung Tau Province, Vietnam: An Advanced Machine Learning Approach. Trop. Med. Infect. Dis. 2024, 9, 250. [Google Scholar] [CrossRef]
  75. Wu, C.; Kao, S. Knowledge discovery in open data for epidemic disease prediction. Health Policy Technol. 2021, 10, 126–134. [Google Scholar] [CrossRef]
  76. Nejad, F.Y.; Varathan, K.D. Identification of significant climatic risk factors and machine learning models in dengue outbreak prediction. BMC Med. Inform. Decis. Mak. 2021, 21, 141. [Google Scholar] [CrossRef]
  77. Yeh, D.Y.; Leu, J.H.; Ye, S.; Cheng, C.H. An intelligent autoregressive-distributed lag model: A climate-driven approach for predicting dengue fever incidence in Taiwan cities. Acta Trop. 2025, 269, 107761. [Google Scholar] [CrossRef]
  78. Yi, C.; Vajdi, A.; Ferdousi, T.; Cohnstaedt, L.W.; Scoglio, C. PICTUREE—Aedes: A Web Application for Dengue Data Visualization and Case Prediction. Pathogens 2023, 12, 771. [Google Scholar] [CrossRef]
79. Zhao, X.; Li, K.; Ke, C.; Ang, E.; Hao, K. A deep learning based hybrid architecture for weekly dengue incidences forecasting. Chaos Solitons Fractals 2023, 168, 113170. [Google Scholar] [CrossRef]
  80. Baquero, O.S.; Maria, L.; Santana, R.; Chiaravalloti-Neto, F. Dengue forecasting in São Paulo city with generalized additive models, artificial neural networks and seasonal autoregressive integrated moving average models. PLoS ONE 2018, 13, e0195065. [Google Scholar] [CrossRef]
  81. Bogado, J.V.; Schaerer, C.E.; Stalder, D.H.; Mart, G. Cluster-based LSTM models to improve Dengue cases forecast. CLEI Electron. J. 2023, 26, 1–14. [Google Scholar] [CrossRef]
  82. Bomfim, R.; Pei, S.; Shaman, J.; Yamana, T.; Makse, A.; Andrade, J.S.; Neto, A.S.L.; Furtado, V. Predicting dengue outbreaks at neighbourhood level using human mobility in urban areas. J. R. Soc. Interface 2020, 17, 20200691. [Google Scholar] [CrossRef]
  83. Campbell, K.M.; Haldeman, K.; Lehnig, C.; Munayco, C.V.; Halsey, S.; Laguna-torres, V.A.; Yagui, M.; Morrison, A.C.; Lin, C.-D.; Scott, T.W. Weather Regulates Location, Timing, and Intensity of Dengue Virus Transmission between Humans and Mosquitoes. PLoS Neglected Trop. Dis. 2015, 9, e0003957. [Google Scholar] [CrossRef]
  84. Chen, X.; Moraga, P. Forecasting Dengue across Brazil with LSTM Neural Networks and SHAP-Driven Lagged Climate and Spatial Effects. BMC Public Health 2024, 25, 973. [Google Scholar] [CrossRef] [PubMed]
  85. Chen, X.; Moraga, P. Dengue forecasting and outbreak detection in Brazil using LSTM: Integrating human mobility and climate factors. Infect. Dis. Model. 2025, 11, 338–354. [Google Scholar] [CrossRef]
  86. Cordeiro, C.; Lins, C.; Ana, D.L.; Gomes, C.; Machado, G.; Moreno, M.; Musah, A.; Aldosery, A.; Dutra, L.; Ambrizzi, T.; et al. Spatiotemporal forecasting for dengue, chikungunya fever and Zika using machine learning and artificial expert committees based on meta-heuristics. Res. Biomed. Eng. 2022, 38, 499–537. [Google Scholar] [CrossRef]
87. Silva, S.T.; Gabrick, E.C.; Protachevicz, P.R.; Iarosz, K.C. When climate variables improve the dengue forecasting: A machine learning approach. Eur. Phys. J. Spec. Top. 2025, 234, 555–569. [Google Scholar] [CrossRef]
88. Ferdousi, T.; Cohnstaedt, L.W. A Windowed Correlation-Based Feature Selection Method to Improve Time Series Prediction of Dengue Fever Cases. IEEE Access 2021, 9, 141210–141222. [Google Scholar] [CrossRef]
  89. Hamlet, A.; Ramos, D.G.; Gaythorpe, K.A.M.; Pecego, A.; Romano, M.; Garske, T.; Ferguson, N.M. Seasonality of agricultural exposure as an important predictor of seasonal yellow fever spillover in Brazil. Nat. Commun. 2021, 12, 3647. [Google Scholar] [CrossRef]
90. Koplewitz, G.; Lu, F.; Clemente, L.; Buckee, C.; Santillana, M. Predicting dengue incidence leveraging internet-based data sources: A case study in 20 cities in Brazil. PLoS Negl. Trop. Dis. 2022, 16, e0010071. [Google Scholar] [CrossRef]
  91. Li, Z.; Gurgel, H.; Xu, L.; Yang, L.; Dong, J. Improving Dengue Forecasts by Using Geospatial Big Data Analysis in Google Earth Engine and the Historical Dengue Information-Aided Long Short Term Memory Modeling. Biology 2022, 11, 169. [Google Scholar] [CrossRef] [PubMed]
92. Mills, C.; Falconi-Agapito, F.; Carrera, J.; Munayco, C.V.; Kraemer, M.U.G. Multi-model approach to understand and predict past and future dengue epidemic dynamics. R. Soc. Open Sci. 2025, 12, 1–32. [Google Scholar] [CrossRef]
93. Mussumeci, E.; Coelho, F.C. Large-scale multivariate forecasting models for Dengue—LSTM versus random forest regression. Spat. Spatiotemporal Epidemiol. 2020, 35, 100372. [Google Scholar] [CrossRef]
94. Roster, K.; Connaughton, C.; Rodrigues, F.A. Machine-Learning-Based Forecasting of Dengue Fever in Brazilian Cities Using Epidemiologic and Meteorological Variables. Am. J. Epidemiol. 2022, 191, 1803–1812. [Google Scholar] [CrossRef] [PubMed]
  95. Sofía, B.; López, S.; Nolberto, D.C.; Antonio, J.; Gutiérrez, T.; López, Y.G. Traditional Machine Learning based on Atmospheric Conditions for Prediction of Dengue Presence. Comput. Sist. 2023, 27, 769–777. [Google Scholar] [CrossRef]
96. Gendriz, I.S.; de Souza, G.F.; de Andrade, I.G.M.; Duarte, A.; Neto, D.; Tavares, A.D.M. Data-Driven computational intelligence applied to dengue outbreak forecasting: A case study at the scale of the city of Natal, RN—Brazil. Sci. Rep. 2022, 12, 6550. [Google Scholar] [CrossRef]
97. Sebastianelli, A.; Spiller, D.; Carmo, R.; Wheeler, J.; Nowakowski, A.; Jacobson, L.V.; Kim, D.; Barlevi, H.; Cordero, Z.E.R.; Colón-González, F.J.; et al. A reproducible ensemble machine learning approach to forecast dengue outbreaks. Sci. Rep. 2024, 14, 3807. [Google Scholar] [CrossRef]
  98. Soliman, M. Ensemble forecasting of the Zika space-time spread with topological data analysis. Environmetrics 2020, 31, e2629. [Google Scholar] [CrossRef]
  99. Souza, C.; Maia, P.; Stolerman, L.M.; Rolla, V.; Velho, L. Predicting dengue outbreaks in Brazil with manifold learning on climate data. Expert Syst. Appl. 2022, 192, 116324. [Google Scholar] [CrossRef]
  100. Theodorakos, K.; Broeckhove, J.; Willem, L. Examination of influencing factors and high-risk regions of dengue in Nicaragua, using spatiotemporal compartmental simulations. Trop. Med. Int. Health 2018, 22, 156–157. [Google Scholar]
101. Zhao, N.; Charland, K.; Carabali, M.; Nsoesie, E.O.; Maheu-Giroux, M.; Rees, E.; Yuan, M.; Balaguera, C.G.; Ramirez, G.J.; Zinszer, K. Machine learning and dengue forecasting: Comparing random forests and artificial neural networks for predicting dengue burden at national and sub-national scales in Colombia. PLoS Negl. Trop. Dis. 2020, 14, e0008056. [Google Scholar] [CrossRef]
  102. Akhtar, M.; Kraemer, M.U.G.; Gardner, L.M. A dynamic neural network model for predicting risk of Zika in real time. BMC Med. 2019, 17, 171. [Google Scholar] [CrossRef]
  103. Appice, A.; Gel, Y.R.; Iliev, I. A Multi-Stage Machine Learning Approach to Predict Dengue Incidence: A Case Study in Mexico. IEEE Access 2020, 8, 52713–52725. [Google Scholar] [CrossRef]
  104. Gutiérrez, R.A.C.; Márquez, D.C.A.; Gonzalez, N.P.B. Parallel prediction of dengue cases with different risks in Mexico using an artificial neural network model considering meteorological data. Int. J. Biometeorol. 2024, 68, 1043–1060. [Google Scholar] [CrossRef]
  105. Holcomb, K.M.; Staples, J.E.; Nett, R.J.; Beard, C.B.; Petersen, L.R. Multi-Model Prediction of West Nile Virus Neuroinvasive Disease with Machine Learning for Identification of Important Regional Climatic Drivers. GeoHealth 2023, 7, e2023GH000906. [Google Scholar] [CrossRef]
106. Laureano-Rosario, A.E.; Duncan, A.P.; Mendez-Lazaro, P.A.; Garcia-Rejon, J.E.; Gomez-Carro, S.; Farfan-Ale, J.; Savic, D.A.; Muller-Karger, F.E. Application of Artificial Neural Networks for Dengue Fever Outbreak Predictions in the Northwest Coast of Yucatan, Mexico and San Juan, Puerto Rico. Trop. Med. Infect. Dis. 2018, 3, 5. [Google Scholar] [CrossRef] [PubMed]
  107. Yamana, T.K.; Kandula, S.; Shaman, J. Superensemble forecasts of dengue outbreaks. J. R. Soc. Interface 2016, 13, 20160410. [Google Scholar] [CrossRef]
  108. Salami, D.; Sousa, C.A.; Oliveira, R. Predicting dengue importation into Europe, using machine learning and model-agnostic methods. Sci. Rep. 2020, 10, 9689. [Google Scholar] [CrossRef] [PubMed]
  109. Mulwa, D.; Kazuzuru, B.; Misinzo, G. An XGBoost Approach to Predictive Modelling of Rift Valley Fever Outbreaks in Kenya Using Climatic Factors. Big Data Cogn. Comput. 2024, 8, 148. [Google Scholar] [CrossRef]
  110. Teurlai, M.; Eug, C.; Cavarero, V.; Degallier, N. Socio-economic and Climate Factors Associated with Dengue Fever Spatial Heterogeneity: A Worked Example in New Caledonia. PLoS Neglected Trop. Dis. 2015, 9, e0004211. [Google Scholar] [CrossRef]
111. Benedum, C.M.; Shea, K.M.; Jenkins, H.E.; Kim, L.Y.; Markuzon, N. Weekly dengue forecasts in Iquitos, Peru; San Juan, Puerto Rico; and Singapore. PLoS Negl. Trop. Dis. 2020, 14, e0008710. [Google Scholar] [CrossRef]
112. Anh, D.; Vu, P.; Uyen, N. Bridging the predictive divide: A hybrid early warning system for scalable and real-time dengue surveillance in LMICs. Acta Trop. 2025, 269, 107765. [Google Scholar] [CrossRef]
  113. Long, H.; Chen, Y.; Feng, J.; Chen, J.; Zhang, X.; Han, W. Annual global dengue dynamics are related to multi-source factors revealed by a machine learning prediction analysis. PLoS Neglected Trop. Dis. 2025, 19, e0013232. [Google Scholar] [CrossRef] [PubMed]
114. Panja, M.; Chakraborty, T.; Shahid, S.; Ghosh, I. An ensemble neural network approach to forecast Dengue outbreak based on climatic condition. Chaos Solitons Fractals 2023, 167, 113124. [Google Scholar] [CrossRef]
  115. Li, Z.; Dong, J. Big Geospatial Data and Data-Driven Methods for Urban Dengue Risk Forecasting: A Review. Remote Sens. 2022, 14, 5052. [Google Scholar] [CrossRef]
  116. Kumar, S.; Vaishali, S.; Isha, S.; Sahil, M. An intelligent healthcare system for predicting and preventing dengue virus infection. Computing 2021, 105, 617–655. [Google Scholar] [CrossRef]
117. Farooq, Z.; Rocklöv, J.; Wallin, J.; Abiri, N.; Sewe, M.O.; Sjödin, H.; Semenza, J.C. Artificial intelligence to predict West Nile virus outbreaks with eco-climatic drivers. Lancet Reg. Health Eur. 2022, 17, 100370. [Google Scholar] [CrossRef]
  118. Coppola, N.; Alessio, L.; De Pascalis, S.; Macera, M.; Di Caprio, G.; Messina, V.; Onorato, L.; Minichini, C.; Stanzione, M.; Stornaiuolo, G.; et al. Effectiveness of test-and-treat model with direct-acting antiviral for hepatitis C virus infection in migrants: A prospective interventional study in Italy. Infect. Dis. Poverty 2024, 13, 39. [Google Scholar] [CrossRef] [PubMed]
  119. Chen, X.; Moraga, P. Assessing dengue forecasting methods: A comparative study of statistical models and machine learning techniques in Rio de Janeiro, Brazil. Trop. Med. Health 2025, 53, 52. [Google Scholar] [CrossRef]
  120. Leung, X.Y.; Islam, R.M.; Adhami, M.; Ilic, D.; McDonald, L.; Palawaththa, S.; Diug, B.; Munshi, S.U.; Karim, N. A systematic review of dengue outbreak prediction models: Current scenario and future directions. PLoS Negl. Trop. Dis. 2023, 17, e0010631. [Google Scholar] [CrossRef]
  121. Sutriyawan, A.; Rahardjo, M.; Martini, M.; Sutiningsih, D.; Rattanapan, C.; Abu Kassim, N.F. Global Forecasting Models for Dengue Outbreaks in Endemic Regions: A Systematic Review. Microbiol. Epidemiol. Immunobiol. 2025, 102, 331–342. [Google Scholar] [CrossRef]
  122. Hoyos, W.; Aguilar, J.; Toro, M. Dengue models based on machine learning techniques: A systematic literature review. Artif. Intell. Med. 2021, 119, 102157. [Google Scholar] [CrossRef] [PubMed]
  123. Liu, B.; Hossain, M.F.; Hossain, S. A comparative evaluation of multiple machine learning approaches for forecasting dengue outbreaks in Bangladesh. Sci. Rep. 2025, 15, 35931. [Google Scholar] [CrossRef]
  124. Nekorchuk, D.M.; Bharadwaja, A.; Simonson, S.; Ortega, E.; França, C.M.B.; Dinh, E.; Reik, R.; Burkholder, R.; Wimberly, M.C. The Arbovirus Mapping and Prediction (ArboMAP) system for West Nile virus forecasting. JAMIA Open 2024, 7, ooad110. [Google Scholar] [CrossRef]
  125. Jaya, I.G.; Andriyana, Y.; Tantular, B.; Pangastuti, S.S.; Kristiani, F. Spatiotemporal Dengue Forecasting for Sustainable Public Health in Bandung, Indonesia: A Comparative Study of Classical, Machine Learning, and Bayesian Models. Sustainability 2025, 17, 6777. [Google Scholar] [CrossRef]
  126. Adde, A.; Roucou, P.; Mangeas, M.; Ardillon, V.; Desenclos, J.-C.; Rousset, D.; Girod, R.; Briolant, S.; Quenel, P.; Flamand, C. Predicting Dengue Fever Outbreaks in French Guiana Using Climate Indicators. PLoS Negl. Trop. Dis. 2016, 10, e0004681. [Google Scholar] [CrossRef] [PubMed]
127. Gianfredi, V.; Nucci, D.; Pennisi, F.; Provenzano, S. Knowledge and attitudes towards Zika virus: An Italian nation-wide cross-sectional study. Ann. Ist. Super. Sanità 2022, 58, 34–41. [Google Scholar] [CrossRef]
  128. Pennisi, F.; Borlini, S.; Cuciniello, R.; D’Amelio, A.C.; Calabretta, R.; Pinto, A.; Signorelli, C. Improving Vaccine Coverage Among Older Adults and High-Risk Patients: A Systematic Review and Meta-Analysis of Hospital-Based Strategies. Healthcare 2025, 13, 1667. [Google Scholar] [CrossRef]
129. Signorelli, C.; Pennisi, F.; Lunetti, C.; Blandi, L.; Pellissero, G.; Fondazione Sanità Futura Working Group. Quality of hospital care and clinical outcomes: A comparison between the Lombardy Region and the Italian national data. Ann. Ig. Med. Prev. Comunità 2024, 36, 234–249. [Google Scholar] [CrossRef]
  130. Pennisi, F.; Pinto, A.; Ricciardi, G.E.; Signorelli, C.; Gianfredi, V. The Role of Artificial Intelligence and Machine Learning Models in Antimicrobial Stewardship in Public Health: A Narrative Review. Antibiotics 2025, 14, 134. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Flow diagram depicting the selection process.
Figure 2. Geographical distribution of studies by continent. The colour gradient represents the number of studies per continent, with darker shades indicating higher counts. Three studies involving multiple continents are not represented in the figure.
Figure 3. Geographical distribution of studies by country. Labels indicate countries with ≥2 studies; shaded countries without labels correspond to those represented by a single study. Nine multicountry studies are not displayed in the figure.
Figure 4. Temporal evolution of artificial intelligence model types used in the included studies (2015–2025), considering both principal model and comparator algorithms.
Figure 5. Frequency of feature types used as model predictors across the included studies. Each bar represents the total number of studies incorporating a given feature category.
Figure 6. Unweighted distributions of classification performance metrics across modelling families. The figure presents the distribution of seven classification metrics (accuracy, AUC, F1-score, NPV, PPV, sensitivity, and specificity) across five modelling families: classical machine learning, deep learning, hybrid/superensemble, mechanistic, and tree-ensemble approaches. Each subplot displays horizontal box-and-scatter representations in which boxplots are shown only when ≥3 observations were available; otherwise, point estimates are plotted individually. Metrics are reported as observed in the original studies without weighting, transformation, or standardisation.
Figure 7. Root mean squared error (RMSE) distributions stratified by error magnitude across modelling families. Panels represent predefined RMSE ranges (≤1, 1–10, 10–1000, >1000). Within each panel, horizontal boxplots and jittered points display unweighted study-level estimates for tree-ensemble, classical machine-learning, deep-learning, hybrid/superensemble, time-series/statistical, and heuristic models.
Figure 8. Multi-panel visualisation of mean absolute error (MAE) values reported across included studies, stratified into four predefined magnitude ranges (very small ≤1, small 1–10, medium 10–1000, large >1000) to account for scale-dependent behaviour of absolute errors. Within each panel, MAE distributions are displayed by modelling family using horizontal boxplots with overlaid jittered points representing individual study estimates.
Figure 9. Comparative visualisation of regression-based performance metrics across modelling families. Multi-panel display summarising mean absolute percentage error (MAPE), mean squared error (MSE), coefficient of determination (R2), and Pearson’s correlation coefficient (r) for all included studies. Each subplot reports the unweighted distribution of study-level estimates across modelling families, illustrating differences in error magnitude, dispersion, and predictive agreement.
Figure 10. Distribution of risk-of-bias assessments across the five PROBAST domains.
Table 1. Descriptive characteristics of included studies.
| First Author (Year) | Study Design | Study Period | Setting | Population | Disease | Case Definition | Prediction Horizon | Missing/Imbalanced Data Handling | Data-Split | Implementation Readiness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Akhtar (2019) [102]Model development/Modelling study2015–2016Multicentre (multi-country or multi-site)GPZikaCase counts—NAShort-term (weekly, ≤4 weeks)Advanced imputationTrain/Test (70/15)Pilot/proof-of-concept
Al Mobin (2024) [18]Forecasting study2010–2024NationalGPDengueMonthly incidence—suspected + labMedium-term (months, >3–12 months)Simple imputationTrain/Val/Test (70/10/20)Research only
Anggraeni (2021) [19]Model development/Modelling study2009–2019Community-based (field/surveillance in population)GPDengueMonthly incidence—NALong-term (≥1 year)NAK-fold CV (K = 5)Research only
Anno (2019) [21]Ecological/Spatiotemporal study1998–2015Sub-national (province/state/municipality)GPDengueMonthly incidence—NANANAK-fold CV (K = 8)Research only
Anno (2024) [22]Ecological/Spatiotemporal study2002–2020NationalGPDengueCase counts—suspected + labShort-term (weekly, ≤4 weeks)Data balancing/ResamplingTemporal (train = 2002–2017; val = 2018; test = 2019–2020)Research only
Appice (2020) [103]Model development/Modelling study1985–2010Sub-national (province/state/municipality)GPDengueMonthly incidence—NALong-term (≥1 year)NATemporal (train = January 1985–December 2009; test = January–December 2010)Research only
Baquero (2018) [80]Forecasting study2000–2016Community-based (field/surveillance in population)GPDengueNANASimple imputationTime-series CV (rolling) [train = January 2000–December 2014; val = with train 165 months + validate next 6 months; test = remainder]Research only
Benedum (2020) [111]Forecasting study2009–2016UrbanGPDengueWeekly incidence—labShort-term (weekly, ≤4 weeks)NATemporal (train = 4 years; test = 1 year)Research only
Bogado (2023) [81]Forecasting study2009–2013Community-based (field/surveillance in population)GPDengueWeekly incidence—NANANATemporal (train = (2009–2012); test = (2013))Research only
Bomfim (2020) [82]Forecasting study2007–2015Sub-national (province/state/municipality)GPDengueWeekly incidence—labShort-term (weekly, ≤4 weeks)NATemporal (train = 2011–2014; test = 2015–2016)Pilot/proof-of-concept
Buebos-Esteve (2024) [23]Ecological/Spatiotemporal study2016–2020Sub-national (province/state/municipality)GPDengueNA—suspected + labLong-term (≥1 year)NATemporalResearch only
Campbell (2015) [83]Ecological/Spatiotemporal study1994–2012Sub-national (province/state/municipality)GPDengueCase counts—suspected + labNASimple imputationTemporal (test = within 2005–2012 by exhaustive classification tree search)Research only
Carvajal (2018) [24]Forecasting study2009–2013Sub-national (province/state/municipality)GPDengueWeekly incidence—suspected + labShort/Medium-term (weeks to ≤3 months)Advanced imputationTemporal (train = 2009–2012; val = 2009–2012; test = 2013)Research only
Chen (2018) [25]Forecasting study2010–2016UrbanGPDengueWeekly incidence—labShort-term (weekly, ≤4 weeks)NATemporal (train = 2010–2015; test = 2016)Public health decision support
Chen (2024) [84]Ecological/Spatiotemporal study2016–2023Sub-national (province/state/municipality)GPDengueCase counts—suspected + labMedium-term (months, >3–12 months)NATemporal (train = January 2016–December 2022; test = January 2023–December 2023)Research only
Chen (2025) [85]Ecological/Spatiotemporal study2016–2023Multicentre (multi-country or multi-site)GPDengueCase counts—suspected + labShort-term (weekly, ≤4 weeks)NATemporal (train = 2016–2022; test = 2023)Research only
Cheng (2025) [26]Forecasting study2005–2024Sub-national (province/state/municipality)GPDengueMonthly incidence—suspected + labLong-term (≥1 year)Simple imputationTrain/Test (70/30)Research only
Chowdhury (2025) [27]Model development/Modelling study2000–2022NationalGPDengueMonthly incidence—labMedium-term (months, >3–12 months)Data cleaning/Exclusion/NormalizationTrain/Test (80/20)Conceptual/simulation study
Conde-Gutiérrez (2024) [104]Forecasting study2012–2019Sub-national (province/state/municipality)GPDengueNAShort-term (weekly, ≤4 weeks)NATrain/Test (80/20)Research only
da Silva (2022) [86]Forecasting study2009–2017Community-based (field/surveillance in population)GPDengue, Chikungunya, ZikaMonthly incidence—NAMedium-term (months, >3–12 months)NAK-fold CV (K = 10)Research only
da Silva (2025) [87]Forecasting study2016–2019
Iquitos: 2001–2012 (597 weeks)
Barranquilla: 2011–2016 (307 weeks)
UrbanGPDengueWeekly incidence—labShort-term (weekly, ≤4 weeks)Simple imputationTrain/Test (70/30)Research only
Dala (2021) [29]Forecasting study2008–2018Community-based (field/surveillance in population)GPDengueCase counts—NALong-term (≥1 year)NATemporal (val = with multiple hold-outs: 50/50)Research only
Dang Anh Tuan (2025) [112]Model development/Modelling study2020–2023
Vietnam (2018–2023)
Sub-national (province/state/municipality)GPDengueNA—labNAData cleaning/Exclusion/NormalizationTemporalConceptual/simulation study
Dhaked (2025) [30]Model development/Modelling study2015–2021UrbanGPDengueMonthly incidence—labMedium-term (months, >3–12 months)Data cleaning/Exclusion/NormalizationTrain/Test (80/20)Research only
Doni (2020) [31]Model development/Modelling study2015–2019NationalGPDengueCase counts—NALong-term (≥1 year)NATemporal (train = 2015–2018; test = 2019)Research only
Edussuriya (2021) [32]Forecasting study2010–2019NationalGPDengueMonthly incidence—suspected + labMedium-term (months, >3–12 months)Data balancing/ResamplingTemporal (train = 2010–2018; test = January–March 2019)Pilot/proof-of-concept
Farooq (2022) [91]Forecasting study2010–2019Community-based (field/surveillance in population)GPWest NileCase counts—suspectedLong-term (≥1 year)Data balancing/ResamplingK-fold CV (K = 5)Research only
Ferdousi (2021) [88]Forecasting study2010–2019Community-based (field/surveillance in population)GPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)NATrain/Val/Test (unspecified)Research only
Francisco (2024) [33]Forecasting study2009–2013Sub-national (province/state/municipality)GPDengueWeekly incidence—suspected + labNANATemporal (train = 2009–2012; test = 2013)Research only
Guo (2017) [34]Forecasting study2011–2014Sub-national (province/state/municipality)GPDengueWeekly incidence—suspected + labShort-term (weekly, ≤4 weeks)NATemporal (train = 2011–2013; test = 2014)Research only
Hamlet (2021) [89]Ecological/Spatiotemporal study2003–2018Sub-national (province/state/municipality)GP + NHPYellow feverCase counts—suspected + labShort-term (weekly, ≤4 weeks)Data cleaning/Exclusion/NormalizationTemporal (train = 60–70/Test 30_40; test = 30_40)Proof-of-concept/Early research
Handari (2021) [35]Forecasting study2009–2017Community-based (field/surveillance in population)GPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)Simple imputationNAResearch only
Holcomb (2023) [105]Model development/Modelling study2015–2021NationalGPWest NileNA—labLong-term (≥1 year)NATemporal (train = 2015–2019; test = 2020–2021)Pilot/proof-of-concept
Husin (2016) [36]Forecasting studyNASub-national (province/state/municipality)GPDengueWeekly incidence—NANANANAResearch only
Islam (2024) [37]Forecasting study2001–2023NationalGPDengueMonthly incidence—NALong-term (≥1 year)NATrain/Test (80/20)Research only
Ismail (2022) [38]Model development/Modelling study2010–2019Sub-national (province/state/municipality)GPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)Simple imputationK-fold CV (K = 10)Pilot/proof-of-concept
Javaid (2023) [39]Model development/Modelling study2014–2018Sub-national (province/state/municipality)GPDengueCase counts—suspected + labNASimple imputationTemporal (test = split)Public health decision support
Jayabalan (2024) [40]Model development/Modelling study2003–2021National and sub-nationalGPDengueMonthly incidence—NAMedium-term (months, >3–12 months)NATrain/Test (70/30)Research only
Kerdprasop (2020) [41]Model development/Modelling study2003–2017UrbanGPDengueMonthly incidence—NAMedium-term (months, >3–12 months)NATemporal (train = 2003–2015 (156 records); test = 2016–2017 (24 records))Research only
Kesorn (2015) [43]Model development/Modelling study2007–2013Sub-national (province/state/municipality)GPDengueNANAData cleaning/Exclusion/NormalizationK-fold CV (K = 10)Research only
Kiang (2021) [44]Forecasting study2010–2017NationalGPDengueMonthly incidence—labMedium-term (months, >3–12 months)NATemporal (test = 42 months (January-2010 → June-2013))Pilot/proof-of-concept
Koh (2018) [45]Forecasting study2016NationalGPDengueWeekly incidence—labNANANAResearch only
Koplewitz (2022) [90]Forecasting study2010–2016Sub-national (province/state/municipality)GPDengueWeekly incidence—suspected + labShort/Medium-term (weeks to ≤3 months)NATemporal (train = 2010–2015; test = 2016)Pilot/proof-of-concept
Kukkar (2024) [46]Model development/Modelling study2016–2020Hospital-based/ClinicalPtsDengueNANANANAConceptual/simulation study
Kumar Dey (2022) [47]Model development/Modelling study2011–2021NationalPtsDengueCase counts—NAMedium-term (months, >3–12 months)Simple imputationTrain/Test (80/20)Conceptual/simulation study
Kuo (2024) [48]Ecological/Spatiotemporal study2013–2015UrbanGPDengueNANAData cleaning/Exclusion/NormalizationTrain/Test (80/20)Research only
Laureano Rosario (2018) [106]Model development/Modelling study1994–2012Community-based (field/surveillance in population)GPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)NATrain/Val/Test (unspecified)Research only
Li (2022) [115]Forecasting study2007–2019Community-based (field/surveillance in population)GPDengueWeekly incidence—suspected + labShort-term (weekly, ≤4 weeks)NATemporal (train = 2007–2015; val = 2016–2017; test = 2018–2019)Research only
Li (2022) [91]Forecasting study2013–2020Sub-national (province/state/municipality)GPDengueWeekly incidence—suspected + labShort-term (weekly, ≤4 weeks)NATemporal (train = 2013-mid 2019 (326 weeks); test = until December 2020 (92 w))Research only
Liu (2016) [49]Forecasting study2010–2014Sub-national (province/state/municipality)GPDengueWeekly incidence—NANANANAResearch only
Liu (2020) [50]Model development/Modelling study2015–2019Sub-national (province/state/municipality)GPDengueCase counts—NAShort/Medium-term (weeks to ≤3 months)NATemporal (train = 2015–2018; test = January–September 2019)Research only
Long (2025) [113]Model development/Modelling study1990–2018Multicentre (multi-country or multi-site)GPDengueNA—suspected + labLong-term (≥1 year)Simple imputationK-fold CV (K = 4)Research only
Lu (2025) [51]Forecasting study2014–2020NationalGPDengueCase counts—labShort/Medium-term (weeks to ≤3 months)NATemporal (train = 2014–2018 (Weeks 1–261); test = 2019–2020 (Weeks 262–365))Research only
Majeed (2023) [53]Forecasting study2010–2017Sub-national (province/state/municipality)GPDengueMonthly incidence—labShort-term (weekly, ≤4 weeks)NATemporal (test = 2010–2016)Research only
Majeed (2025) [54]Forecasting study2011–2016NationalGPDengueWeekly incidence—labShort/Medium-term (weeks to ≤3 months)NATemporal (temporal partitioning within the country; not cross-country)Research only
Majeed2 (2023) [52]Forecasting study2010–2016Sub-national (province/state/municipality)GPDengueMonthly incidence—suspected + labShort-term (weekly, ≤4 weeks)NATemporal (test = 2010–2016)Research only
Mayrose (2024) [55]Model development/Modelling study2010–2019NationalGPDengueNAMedium-term (months, >3–12 months)Data balancing/ResamplingTrain/Val/Test (70/20/10)Research only
Mills (2025) [92]Forecasting study2010–2021Sub-national (province/state/municipality)GPDengueMonthly incidence—suspected + labNANATemporal (train = 2010–2017; test = 2018–2021)Research only
Mobin (2025) [56]Forecasting study2010–2023NationalGPDengueMonthly incidence—suspected + labLong-term (≥1 year)Simple imputationTrain/Test (80/20)Research only
Muhamad Krishnan (2022) [57]Model development/Modelling study2015–2019Community-based (field/surveillance in population)GPDengueCase counts—NANAData cleaning/Exclusion/NormalizationTrain/Test (unspecified)Research only
Mulwa (2024) [109]Model development/Modelling study1981–2010NationalGPRift Valley feverMonthly incidence—labMedium-term (months, >3–12 months)Simple imputationTrain/Test (80/20)Research only
Mussumeci (2020) [93]Model development/Modelling study2010–2018Sub-national (province/state/municipality)GPDengueWeekly incidence—suspected + labShort-term (weekly, ≤4 weeks)NATemporal (train = January 2010–June 2017; val = July 2017–June 2018)Research only
Mustaffa (2024) [58]Forecasting study2017–2022NationalGPDengueWeekly incidence—labNANATemporal (train = 207 weeks (2017–2020); test = 99 weeks (2021–2022))Research only
Necesito (2021) [59]Forecasting study1994–2018Community-based (field/surveillance in population)GPDengueMonthly incidence—NAMedium-term (months, >3–12 months)NATemporal (train = 1994–2015)Research only
Ningrum (2024) [20]Ecological/Spatiotemporal study2014–2021Sub-national (province/state/municipality)GPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)Data balancing/ResamplingTrain/Test (80/20)Pilot/proof-of-concept
Olmoguez (2019) [60]Model development/Modelling study2008–2017Sub-national (province/state/municipality)GPDengueMonthly incidence—NANASimple imputationNAResearch only
Ong (2018) [61]Ecological/Spatiotemporal study2006–2016UrbanGPDengueCase counts—suspected + labShort-term (weekly, ≤4 weeks)NATemporal (train = 2006–2013)Operational use in surveillance
Ong (2023) [62]Model development/Modelling study2018–2020Sub-national (province/state/municipality)GPDengueNANANATrain/Test (70/30)Research only/proof-of-concept
Panja (2023) [114]Forecasting study1991–2012Community-based (field/surveillance in population)GPDengueWeekly incidence—labShort-term (weekly, ≤4 weeks)NATemporal (train = 1170; test = 1144)Research only
Patra (2025) [63]Forecasting study2013–2023NationalGPDengueWeekly incidence—NANANATemporal (train = October 2013–July 2020; test = last 30% (July 2020–May 2023))Research only
Puengpreedaa (2020) [64]Model development/Modelling study2014–2018Sub-national (province/state/municipality)GPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)NAK-fold CV (K = 5)Research only
Rahman (2025) [65]Model development/Modelling study2000–2023NationalGPDengueCase counts—labMedium-term (months, >3–12 months)NATemporal (train = 2000–2019; test = 2020–2023)Research only
Ren (2024) [66]Forecasting study2003–2022Sub-national (province/state/municipality)GPDengueNALong-term (≥1 year)Data balancing/ResamplingTrain/Test (70/30)Research only
Roster (2023) [94]Forecasting study2007–2019Sub-national (province/state/municipality)GPDengueMonthly incidence—suspected + labShort-term (weekly, ≤4 weeks)Data balancing/ResamplingTemporal (train = 2007–2016; test = 2016–2019)Research only
Salami (2020) [108]Model development/Modelling study2010–2015Multicentre (multi-country or multi-site)TDengueCase counts—labMedium-term (months, >3–12 months)Data balancing/ResamplingTemporal (test = split)Research only
Salim (2021) [67]Ecological/Spatiotemporal study2013–2017Sub-national (province/state/municipality)GPDengueWeekly incidence—labShort-term (weekly, ≤4 weeks)Simple imputationTrain/Test (70/30)Research only
Salsabiila (2025) [68]Model development/Modelling study2010–2020National and sub-nationalGPDengueWeekly incidence—labShort-term (weekly, ≤4 weeks)Data balancing/ResamplingTrain/Test (80/20)Research only
Sánchez López (2023) [95]Model development/Modelling study2010–2020Community-based (field/surveillance in population)GPDengueWeekly incidence—suspected + labNAData cleaning/Exclusion/NormalizationK-fold CV (K = 5)Research only
Sanchez-Gendriz (2022) [96]Forecasting study2016–2019UrbanGPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)NANAPilot/proof-of-concept
Sebastianelli (2024) [97]Forecasting study2001–2019NationalGPDengueCase counts—NANANATemporal (train = 2001–2016; val = 2017–2019 (Brazil))Pilot/proof-of-concept
Shaikh (2023) [69]Model development/Modelling studyNAOther (benchmark dataset)GPDengueNA—suspected + labNANANAResearch only
Shi (2016) [70]Forecasting study2001–2012UrbanGPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)NATemporal (train = 2001–2010; val = 2011–2012)Operational use in surveillance
Siddikur Rahman (2025) [71]Forecasting study2000–2021NationalGPDengueWeekly incidence—NANAAdvanced imputationTemporal (train = 2000–2018; test = 2019–2021)Research only
Soliman (2020) [98]Model development/Modelling study2017–2018Sub-national (province/state/municipality)GPZikaMonthly incidence—NALong-term (≥1 year)NATemporal (train = 2017; test = 2018)Research only
Sood (2020) [116]Model development/Modelling studyNAHospital-based/ClinicalPtsDengueNANANAK-fold CV (K = 10)Research only
Souza (2022) [99]Forecasting study2002–2012Community-based (field/surveillance in population)GPDengueNALong-term (≥1 year)NATemporal (train = first 11 years; val = noise-augmented set for tuning (Gaussian noise); test = last 3–5 years for each city)Research only
Stavelin (2022) [72]Forecasting study2006–2019Sub-national (province/state/municipality)GPDengueMonthly incidence—suspected + labLong-term (≥1 year)Data cleaning/Exclusion/NormalizationNAResearch only
Teurlai (2015) [110]Model development/Modelling study1995–2012Sub-national (province/state/municipality)GPDengueCase counts—suspected + labNANANAResearch only
Theodorakos (2017) [100]Ecological/Spatiotemporal study2002NationalGPDengueMonthly incidence—NANANATemporal (train and validation on the same epidemic season (2002))Conceptual/simulation study
Tian (2024) [73]Ecological/Spatiotemporal study2012–2022NationalGPDengueCase counts—suspected + labShort-term (weekly, ≤4 weeks)Simple imputationTrain/Test (80/20)Research only
Tuan (2024) [74]Forecasting study2010–2020Sub-national (province/state/municipality)GPDengueMonthly incidence—labMedium-term (months, >3–12 months)Simple imputationTemporal (train = January 2010–October 2018; test = November 2018–December 2020)Research only
Wu (2021) [75]Model development/Modelling study2005–2016UrbanGPDengue, Enterovirus, InfluenzaWeekly incidence—labShort-term (weekly, ≤4 weeks)Data balancing/ResamplingTrain/Test (83/17)Research only
Yamana (2016) [107]Model development/Modelling study1990–2013Sub-national (province/state/municipality)GPDengueWeekly incidence—labNANATemporal (train = seasons 1–14; test = seasons 15–23)Research only
Yavari Nejad (2021) [76]Ecological/Spatiotemporal study2010–2013NationalGPDengueWeekly incidence—labShort-term (weekly, ≤4 weeks)Data cleaning/Exclusion/NormalizationTemporal (train/test = 75/25)Pilot/proof-of-concept
Yeh (2025) [77]Forecasting study2014–2018UrbanGPDengueWeekly incidence—labNANATemporal (train = 257 weeks; test = last 4 weeks)Research only
Yi (2023) [78]Model development/Modelling studyNASub-national (province/state/municipality)GPDengueCase counts—suspected + labMedium-term (months, >3–12 months)NATemporal (train = epidemic curves from historical datasets (1960–2012); test = Malaysian outbreak of 2022 (3 timepoint predictions))Public health decision support
Zhao (2020) [101]Forecasting study2014–2018Sub-national (province/state/municipality)GPDengueCase counts—suspected + labShort-term (weekly, ≤4 weeks)Data cleaning/Exclusion/NormalizationTrain/Test (80/20)Pilot/proof-of-concept
Zhao (2023) [79]Forecasting study2012–2022UrbanGPDengueWeekly incidence—NAShort-term (weekly, ≤4 weeks)NATrain/Test (70/30)Research only
GP = General population; lab = laboratory; Pts = Patients (hospital/clinical cases); T = Travelers; val = validation.
Table 2. Results of classification metrics for the principal AI models used in the included studies.
First Author (Year) | Principal AI Model | AI Category | N Variables Included vs. Considered | Validation | AUC | Sensitivity | Specificity | PPV/Precision | NPV | Accuracy | F1-Score
Akhtar (2019) [102]NARX NNClassical ML11/16INT1 w 0.91–0.95, 2 w 0.91–0.93, 4 w 0.83–0.87, 8 w 0.75–0.80, 12 w 0.70–0.74NANANANA1 w 0.94, 2 w 0.92, 4 w 0.88, 8 w 0.82, 12 w 0.78NA
Al Mobin (2024) [18]DT + Sequential Squeeze FSClassical ML12/13INT 5-fold TSCVNANANANANA0.82NA
Anggraeni (2021) [19]BiLSTMDeep LearningNAINTNANANANANANANA
Anno (2019) [21]CNNDeep Learning4/2INT 8-fold CVNANANANANA1.0 → 0.81 (longitude-time), 0.75 (latitude-time), 0.48 → 0.26 (longitude-latitude)NA
Anno (2024) [22]CNNDeep Learning4/4INT train/val/test splitNANANANANASST + Rainfall + SWR 1.00, 0.51; SST only 1.00, 0.51; Rainfall only 1.00, 0.51; SWR only 1.00, 0.51; Rainfall + SWR 1.00, 0.51NA
Appice (2020) [103]AutoTiC-NNClassical ML1/2EXTNANANANANANANA
Baquero (2018) [80]GAM, ANN (MLP), LSTMHybrid/Superensemble4/NAINTNANANANANANANA
Benedum (2020) [111]RFTree Ensemble5/5INT TS split 4 y train; 1 y testNANANANANANANA
Bogado (2023) [81]LSTMDeep Learning4/4INTNANANANANANANA
Bomfim (2020) [82]NNClassical ML2/2INT TS split train 2011–2014; test 2015–2016NA0.91NA0.92NANA0.92
Buebos-Esteve (2024) [23]RFTree Ensemble4/4INT + EXT nested resampling (spatiotemporal LOOCV internal; 3-fold CV external)NANANANANANANA
Campbell (2015) [83] | DT | Classical ML | 2/2 | INT | NA | 0.95 | 0.95 | NA | NA | NA | NA
Carvajal (2018) [24]RFTree Ensemble5/19INTNANANANANANANA
Chen (2018) [25]LASSOClassical ML73/73INT1 w 0.88; 2 w 0.86; 4 w 0.82; 8 w 0.78; 12 w 0.76NANANANANANA
Chen (2024) [84]LSTM + SHAPDeep Learning7/17INTNANANANANANANA
Chen (2025) [85]LSTMDeep Learning4/4INTNAMean threshold, mean + 2SD threshold = Manaus 0.76, 0.60, Belém 0.08, 0.00, Fortaleza 0.75, 0.00, Salvador 0.71, 0.73, Brasília 0.92, 1.00, Goiânia 0.69, 0.57, Belo Horizonte 0.73, 0.72, Rio de Janeiro 0.88, 0.88, São Paulo 0.90, 0.88, Curitiba 0.85, 0.78Mean threshold, mean + 2SD threshold = Manaus 1.00, 1.00, Belém 0.90, 1.00, Fortaleza 0.98, 1.00, Salvador 0.33, 0.71, Brasília 1.00, 0.95, Goiânia 0.95, 0.98, Belo Horizonte 1.00, 1.00, Rio de Janeiro, 1.00, São Paulo 1.00, 1.00, Curitiba 1.00, 0.94NANAMean threshold, mean + 2SD threshold = Manaus 0.92, 0.88, Belém 0.69, 0.96, Fortaleza 0.96, 0.96, Salvador 0.69, 0.73, Brasília 0.94, 0.98, Goiânia 0.88, 0.92, Belo Horizonte 0.79, 0.79, Rio de Janeiro 0.88, 0.92, São Paulo 0.90, 0.88, Curitiba 0.90, 0.83Mean threshold, mean + 2SD threshold = Manaus 0.87, 0.75, Belém 0.11, 0.00, Fortaleza 0.75, 0.00, Salvador 0.81, 0.83, Brasília 0.96, 0.98, Goiânia 0.75, 0.67, Belo Horizonte 0.85, 0.84, Rio de Janeiro 0.94, 0.94, São Paulo 0.95, 0.94, Curitiba 0.93, 0.86
Cheng (2025) [26]Feature selection: Regression + fuzzy c-means + IHLOA; Classificators: SVM, KNN, RFHybrid/Superensemble3/13 (Zhejiang), 9/13 (Guangdong)INTNANANANANAGuangdong SVM 0.96, Guangdong KNN 0.96, Guangdong RF 0.96, Zhejiang SVM 0.96, Zhejiang KNN 0.96, Zhejiang RF 0.96Guangdong SVM 0.96, Guangdong KNN 0.96, Guangdong RF 0.96, Zhejiang SVM 0.96, Zhejiang KNN 0.96, Zhejiang RF 0.96
Chowdhury (2025) [27]ANN, XGBHybrid/Superensemble4/7INT 10-fold CVNANANANANANANA
Conde-Gutiérrez (2024) [104]ANNClassical ML5/5INTNANANANANANANA
da Silva (2022) [86]RFTree Ensemble44/44INTNANANANANANANA
da Silva (2025) [87]RFTree Ensemble2/2INT 70/30 temporal splitNANANANANANANA
Dala (2021) [29]Backpropagation NNClassical ML4/4INTNANANANANANANA
Dang Anh Tuan (2025) [112]GLM + XGB, LSTMHybrid/SuperensembleNANANANANANANA0.80–0.90NA
Dhaked (2025) [30]1D-CNNDeep Learning4/4INT 80/20 splitNANANANANANANA
Doni (2020) [31]LSTMDeep Learning6/6INTNANANANANACases overall: 0.89; deaths overall: 0.81NA
Edussuriya (2021) [32]LSTM + Grey Wolf OptimizerDeep Learning4/3INT TS split train 2010–2018; test January–March 2019NANANANANANANA
Farooq (2022) [91] | XGB + SHAP | Tree Ensemble | 57/57 | INT | 2018 0.97; 2019 0.93 | 2018 0.86; 2019 0.69 | 2018 0.95; 2019 0.93 | NA | NA | NA | NA
Ferdousi (2021) [88]GRU, LSTMDeep Learning12/12INTNANANANANANANA
Francisco (2024) [33]Hybrid ML (CIF, RF, GAM, ANN, SVM/SVR, XGB)Hybrid/Superensemble8–30/8–30INT TS split train 2009–2012; test 2013GAM 0.69, RF 0.79, CIF 0.79, SVM 0.75, ANN 0.71, XGB 0.79NANANANAGAM 0.49, RF 0.59, CIF 0.77, SVM 0.57, ANN 0.51, XGB 0.59NA
Guo (2017) [34]SVRClassical ML5/12INTNANANANANA>0.90NA
Hamlet (2021) [89]BRTTree Ensemble18/18INT SB-CV (~200 bootstraps)0.93 (95% CI: 0.90–0.96)NANANANANANA
Handari (2021) [35]LSTMDeep Learning4/4INTNANANANANANANA
Holcomb (2023) [105]RF, NNHybrid/Superensemble12/>20INT temporal split train 2015–2019; test 2020–2021; LOOCV by year/stateNANANANANANANA
Husin (2016) [36]GANNOther/HeuristicNANANANANANANANANA
Islam (2024) [37]LSTMDeep Learning1/1INT hold-out TSNANANANANA0.71NA
Ismail (2022) [38] | RF | Tree Ensemble | 13/13 | INT 10-fold CV | 0.98 | 0.97 | 0.96 | NA | NA | 0.95 | NA
Javaid (2023) [39] | RF | Tree Ensemble | 23/16 (18 after preprocessing) | INT random 80/20 split; k-fold CV | NA | 0.97 | NA | 0.96 | NA | 0.94 | 0.97
Jayabalan (2024) [40] | GB | Tree Ensemble | 3/3 | INT 70/30 train-test split | Bangkok 0.97; Bangladesh 0.98 | Bangkok 0.98; Bangladesh 0.96 | NA | Bangkok 0.96; Bangladesh 0.99 | NA | Bangkok 0.97; Bangladesh 0.98 | Bangkok 0.96; Bangladesh 0.98
Kerdprasop (2020) [41]ANFISOther/Heuristic3/8EXTNANANANANANANA
Kesorn (2015) [43] | SVM with kernel RBF (SVM-R) | Classical ML | 9/9 | INT 10-fold CV | NA | 0.94 | 0.94 | NA | NA | 0.88 | NA
Kiang (2021) [44]LASSOClassical MLNA/77INTNANANANANANANA
Koh (2018) [45]NN(AR(2)) with rainfallTime-series/Statistical2/4INTNANANANANANANA
Koplewitz (2022) [90]RFTree Ensemble10–15/NAINT TS split train 2010–2015; test 2016 rolling OOSNANANANANANANA
Kukkar (2024) [46] | WRF | Mechanistic | NA | INT 10-fold CV | NA | 0.88 | 0.95 | 0.85 | NA | 0.94 | 0.86
Kumar Dey (2022) [47]SVRClassical ML4/4INT 80/20 train-test; CV on same datasetNANANANANA0.75NA
Kuo (2024) [48] | RF | Tree Ensemble | 121/121 | INT 10-fold CV | 0.95 | 0.97 | 0.73 | NA | NA | 0.87 | NA
Laureano Rosario (2018) [106]ANNClassical ML9/12INTPuerto Rico < 24 y: 0.91; Puerto Rico < 5 & 65 y: 0.71; Mexico < 24 y: 0.88; Mexico < 5 & 65 y: 0.90NANANANAPuerto Rico < 24 y: 0.47; Puerto Rico < 5 & 65 y: 0.58; Mexico < 24 y: 0.51; Mexico < 5 & 65 y: 0.66Puerto Rico < 24 y: 0.97; Puerto Rico <5 & 65 y: 0.81; Mexico < 24 y: 0.80; Mexico < 5 & 65 y: 0.73
Li (2022) [115]LSTMDeep Learning7/7INTNANANANANANANA
Li (2022) [91]LSTM, LSTM + AttentionDeep Learning6/6INT TS splitNANANANANANANA
Liu (2016) [49]CARTClassical ML1/2INT 10-fold CVNAGuangzhou 0.87; Zhongshan 0.96Guangzhou 0.92; Zhongshan 0.94NANAGuangzhou 0.92; Zhongshan 0.95NA
Liu (2020) [50]LSTMDeep Learning144/144INTNANANANANANANA
Long (2025) [113]RF, XGB, SVR, MLPHybrid/Superensemble28/28INT 4-fold CVNANANANANANANA
Lu (2025) [51]MLR, LSTM, SI-SIRHybrid/Superensemble5/5INT temporal validation train 2014–2018; test 2019–2020NANANANANANANA
Majeed (2023) [53]LSTMDeep Learning9/9INT vs. benchmark modelNANANANANANANA
Majeed (2025) [54]LSTMDeep Learning8–10/8–10INTNANANANANANANA
Majeed2 (2023) [52]LSTMDeep Learning9/9INT vs. benchmark modelNANANANANANANA
Mayrose (2024) [55] | MobileNetV3Small | Deep Learning | 12/20 | INT train/val/test split | 0.98 ± 0.01 | 0.97 ± 0.03 | 0.99 ± 0.01 | 0.99 ± 0.01 | NA | 0.98 ± 0.01 | 0.98 ± 0.01
Mills (2025) [92] | Median Ensemble | Hybrid/Superensemble | 6/6 | INT | 0.88 | 0.82 | 0.94 | NA | NA | NA | NA
Mobin (2025) [56]DT, RF, GB, XGB, SVR, KNN (Daily dataset)Hybrid/Superensemble78–86/78–86INT 5-fold TSCV on train 80%; independent hold-out test 20%NANANANANADT: 0.93; RF: 0.96; XGB: 0.93; GB: 0.92; SVR: 0.90; KNN: 0.89NA
Muhamad Krishnan (2022) [57] | ANN | Classical ML | 4/4 | INT | NA | 0.99 | 0.01 | NA | NA | 0.69 | NA
Mulwa (2024) [109] | XGB | Tree Ensemble | 5/5 | INT 80/20 split; 5-fold CV for tuning | 0.89 | 0.99 | NA | 0.99 | NA | 1.00 | NA
Mussumeci (2020) [93]LASSO, LSTM, RFHybrid/Superensemble6/6INT TS split train January 2010–June 2017; val/test July 2017–June 2018NANANANANANANA
Mustaffa (2024) [58]NNARTime-series/Statistical1/1INT train/test splitNANANANANANANA
Necesito (2021) [59]LSTMDeep Learning1/1INTNANANANANANANA
Ningrum (2024) [20]ETC (best model), CatBoost, XGB, LightGBM, LSTM, CBR, GB, OMP, Huber RegressorHybrid/SuperensembleNAINT 80/20 train-test splitETC: 0.95ETC: 0.61NAETC: 0.89NAETC: 0.89ETC: 0.72
Olmoguez (2019) [60]RFTree Ensemble2/8INTNANANANANANANA
Ong (2018) [61]RFTree Ensemble8/8EXT temporal validation train 2006–2013; test 2014–2016NANANANANANANA
Ong (2023) [62]LR, DT, RF, SVM, NB, XGB, AdaBoost + BorutaHybrid/Superensemble7/8INT train/val splitML and Boruta feature selection: LR 0.79, DT 0.65, RF 0.75, SVM 0.82, XGB 0.72, AdaBoost 0.62NANANANANANA
Panja (2023) [114]XEWNetDeep Learning2/2INTNANANANANANANA
Patra (2025) [63]CNN + BiLSTMDeep Learning1/1INT train/test splitNANANANANANANA
Puengpreedaa (2020) [64]RF, AdaBoost, ETC, LASSOHybrid/SuperensembleNAINTNANANANANANANA
Rahman (2025) [65] | XGB, LightGBM | Tree Ensemble | 18/18 | INT 10-fold CV; independent test set | XGBoost 0.89, LightGBM 0.84 | LightGBM 0.96 | LightGBM 0.98 | LightGBM 0.97, XGBoost 0.95 | LightGBM 0.98, XGBoost 0.96 | LightGBM 0.97, XGBoost 0.95 | LightGBM 0.96, XGBoost 0.95
Ren (2024) [66] | RF | Tree Ensemble | 11/11 | INT 5-fold CV on train 2003–2018; independent temporal validation 2019–2022 | 0.92 | NA | NA | NA | NA | 0.95 | NA
Roster (2023) [94]RF, GB, SVR, MLPHybrid/Superensemble9/9INT temporal expanding-window CVNANANANANANANA
Salami (2020) [108]PLS, glmnet, RF, XGBHybrid/Superensemble17/17INT 70/30 train-test; 5 × 10-fold CV on trainPLS: 0.88 (95% CI: 0.86–0.90); glmnet: 0.89 (95% CI: 0.87–0.91); RF: 0.97 (95% CI: 0.96–0.98); XGB: 0.97 (95% CI: 0.96–0.98)PLS: 0.75 (95% CI: 0.71–0.78); glmnet: 0.79 (95% CI: 0.76–0.82); RF: 0.89 (95% CI: 0.87–0.91); XGB: 0.88 (95% CI: 0.86–0.91)PLS: 0.84 (95% CI: 0.83–0.84); glmnet: 0.93 (95% CI: 0.92–0.93); RF: 0.93 (95% CI: 0.92–0.93); XGB: 0.94 (95% CI: 0.94–0.95)PLS: 0.70; glmnet: 0.83; RF: 0.90; XGB: 0.93PLS:0.88; glmnet: 0.91; RF: 0.94; XGB: 0.96PLS: 0.84; glmnet: 0.89; RF: 0.92; XGB: 0.95PLS: 0.76; glmnet: 0.81; RF: 0.91; XGB: 0.90
Salim (2021) [67] | RF, SVM, ANN | Hybrid/Superensemble | 5/5 | INT 70/30 hold-out | Epidemiological only: RF 0.80, SVM 0.75, ANN 0.70–0.72; Epidemiological + Climatic: RF 0.88–0.90, SVM 0.82–0.85, ANN 0.75–0.80 | Epidemiological only: RF 0.75–0.80, SVM 0.70–0.75, ANN 0.75–0.85; Epidemiological + Climatic: RF 0.85–0.90, SVM 0.75–0.85, ANN 0.70–0.80 | Epidemiological only: RF 0.75–0.80, SVM 0.70–0.75, ANN 0.68–0.72; Epidemiological + Climatic: RF 0.85–0.90, SVM 0.75–0.85, ANN 0.70–0.80 | NA | NA | Epidemiological only: RF 0.85–0.88, SVM 0.78, ANN 0.72–0.75; Epidemiological + Climatic: RF 0.85–0.88, SVM 0.80–0.85, ANN 0.75–0.80 | NA
Salsabiila (2025) [68] | CNN-BiGRU + Attention | Deep Learning | 4/4 | INT 80/20 temporal split; CV for ablation | NA | 0.79 | NA | 0.88 | NA | 0.74 | 0.82
Sánchez López (2023) [95] | SVM | Classical ML | 10/10 | INT | 0.96 | 0.97 | NA | NA | NA | 0.97 | 0.97
Sanchez-Gendriz (2022) [96]LSTMDeep Learning2/2INT chronological split 2016–2018 train; 2019 test; 30 runsNANANANANANANA
Sebastianelli (2024) [97]CatBoost, SVM, LSTM, RFHybrid/Superensemble20–40/42INT temporal validation train 2001–2016; test 2017–2019NANANANANANANA
Shaikh (2023) [69]Optimized Ensemble (CNN + ANN + SVM, NC-DEFO)Hybrid/Superensemble20/NAINT validationNANANANANANANA
Shi (2016) [70]LASSOClassical ML60/226INT CVNANANANANANANA
Siddikur Rahman (2025) [71]RF, XGB, LightGBM + SHAPTree Ensemble22/22INTNANANANANANANA
Soliman (2020) [98]DFFN (deep feed-forward neural network)Deep Learning7/7EXTNANANANANANANA
Sood (2020) [116] | Naive Bayesian Network (NBN) | Classical ML | 17/17 | INT 10-fold CV | NA | 0.93 | 0.93 | 0.92 | NA | 0.93 | 0.92
Souza (2022) [99]Diffusion Maps + SVM (RBF)Classical ML2/2INTNANANANANAAracaju 1.00; Belo Horizonte 0.80; Manaus 0.80; Recife 0.20; Rio de Janeiro 1.00; Salvador 0.80; São Luís 1.00. Mean: 0.80 ± 0.20NA
Stavelin (2022) [72]LSTMDeep Learning20/20INT TS rolling forecastNANANANANANANA
Teurlai (2015) [110]SVMClassical ML5/34INTNANANANANANANA
Theodorakos (2017) [100]Differential Evolution (Numerical)Other/HeuristicNAINTNANANANANANANA
Tian (2024) [73]SVM, XGBHybrid/Superensemble19/19INT 80/20 train-test splitNANANANANANANA
Tuan (2024) [74]RF, GB, LSTMHybrid/Superensemble13/13INT cross-sectional + TSNANANANANANANA
Wu (2021) [75]SVR, RF, ANN (MLP)Hybrid/Superensemble12/12INT chronological split 83/17NANANANANANANA
Yamana (2016) [107]Superensemble: F1 (SIR-EAKF), F2 (Bayesian weighted outbreaks), F3 (historical likelihood)Hybrid/SuperensembleNAEXTNANANANANANANA
Yavari Nejad (2021) [76]Bayes Net (BN) + TRFClassical ML6/7INT 10-fold CV (WEKA 3.8)NANANANANABayes Net + TRF 0.92; without TRF 0.91NA
Yeh (2025) [77]ARDL + LSTM, SVR, MLP, GRNN, RBF, GMDH, GEPHybrid/Superensemble8/8 (+lags)INT train/test splitNANANANANANANA
Yi (2023) [78]Hybrid NN + RNN + EnKF superensemble (PICTUREE-Aedes)Hybrid/SuperensembleNANANANANANANANANA
Zhao (2020) [101]RFTree Ensemble25–30/25–30INTNANANANANANANA
Zhao (2023) [79]CNN-BiLSTMDeep Learning3/3INTNANANANANA1 w 0.88; 2 w 0.85; 3 w 0.81; 4 w 0.78NA
AUC = Area Under the ROC Curve; AdaBoost = Adaptive Boosting; ANFIS = Adaptive Neuro-Fuzzy Inference System; AR(2) = autoregressive model of order 2; ARDL = Autoregressive Distributed Lag; BN = Bayesian Network; Boruta = Boruta feature selection; BRT = Boosted Regression Trees; CART = Classification and Regression Tree; CatBoost = Categorical Gradient Boosting; CBR = CatBoost Regressor; CIF = Conditional Inference Forest; CNN = Convolutional Neural Network; CNN-BiLSTM = Convolutional Neural Network + Bidirectional LSTM; DFFN = Deep Feed-Forward Network; DT = decision tree; EAKF = Ensemble Adjustment Kalman Filter; EnKF = Ensemble Kalman Filter; ETC = Extra Trees Classifier; EXT = external validation; F1-score = harmonic mean of precision and recall; GAM = Generalized Additive Model; GB = Gradient Boosting; GANN = Genetic Algorithm Neural Network; GEP = Gene Expression Programming; glmnet = Elastic-Net Regularized GLMs; GMDH = Group Method of Data Handling; GRNN = General Regression Neural Network; GRU = Gated Recurrent Unit; Huber Regressor = robust regression with Huber loss; IHLOA = Improved Harris Hawks/Heuristic Learning Optimization Algorithm (feature selection); INT = internal validation; KNN = k-Nearest Neighbours; LASSO = Least Absolute Shrinkage and Selection Operator; LightGBM = Light Gradient Boosting Machine; LOOCV = Leave-One-Out Cross-Validation; LR = Logistic Regression; LSTM = Long Short-Term Memory; LSTM + attention = LSTM with attention mechanism; MLP = Multilayer Perceptron; MobileNetV3Small = lightweight CNN architecture; N = Number; NARX = Nonlinear Autoregressive Network with Exogenous Inputs; NB = Naïve Bayes; NBN = Naïve Bayesian Network; NN = Neural Network; NNAR = Neural Network Autoregression; NPV = Negative Predictive Value; OMP = Orthogonal Matching Pursuit; OOS = Out-of-Sample; PPV (Precision) = Positive Predictive Value; PLS = Partial Least Squares; RF = Random Forest; RBF = Radial Basis Function (kernel); SB-CV = Spatial Block Cross-Validation; SHAP = SHapley Additive exPlanations; SI-SIR = Susceptible–Infectious/Susceptible–Infectious–Recovered (compartmental model); SIR-EAKF = SIR model with Ensemble Adjustment Kalman Filter; SVM = Support Vector Machine; SVR = Support Vector Regression; SWR = Short-Wave Radiation; SST = Sea Surface Temperature; TSCV = Time-Series Cross-Validation; w = Week; WRF = Weather Research and Forecasting.
Table 3. Results of regression metrics for the principal AI models used in the included studies.
First Author (Year) | Principal AI Model | AI Category | Metric Scale (Unit–Temporal–Spatial) | MAE | RMSE | MSE | MAPE | SMAPE | R2 | r
Akhtar (2019) [102]NARX NNClassical MLNANANANANANANANA
Al Mobin (2024) [18] | DT + Sequential Squeeze FS | Classical ML | cases–monthly–national | 4759.06 | 9296.35 | NA | 0.94 | NA | NA | NA
Anggraeni (2021) [19]BiLSTMDeep Learningcases–monthly–city levelSurabaya 19.11; Malang 25.73Surabaya 30.11; Malang 28.65NANASurabaya 0.31; Malang 0.18NANA
Anno (2019) [21]CNNDeep LearningNANANANANANANANA
Anno (2024) [22]CNNDeep LearningNANANANANANANANA
Appice (2020) [103]AutoTiC-NNClassical MLcases–monthly–regionalNA32 states: 13; 17 active states: 7NANANANANA
Baquero (2018) [80]GAM, ANN (MLP), LSTMHybrid/Superensemblecases–monthly–city levelNAGAM 2152; Ensemble 3164; MLP 4422NANANANANA
Benedum (2020) [111]RFTree Ensemblecases–weekly–city level6.3NANANANANANA
Bogado (2023) [81]LSTMDeep LearningNANANANANANANANA
Bomfim (2020) [82]NNClassical MLNANANANANANANANA
Buebos-Esteve (2024) [23]RFTree Ensemblecases–10 days–regionalRegional incidence: rfsrc 32.55; ranger 74.56; rf 40.19; ensbl 43.76. Yearly incidence: rfsrc 39.56; ranger 41.24; rf 41.97; ensbl 39.82. Regional mortality: rfsrc 0.79; ranger 0.82; rf 0.73; ensbl 1.36NARegional incidence: rfsrc 2414.53; ranger 11,018.55; rf 2539.72; ensbl 2445.02. Yearly incidence: rfsrc 4314.84; ranger 5210.29; rf 3740.15; ensbl 5430.59. Regional mortality: rfsrc 2.78; ranger 2.77; rf 2.04; ensbl 69.41NANANANA
Campbell (2015) [83]DTClassical MLNANANANANANANANA
Carvajal (2018) [24] | RF | Tree Ensemble | per 1000 population–weekly–city level | 0.15 | 0.21 | NA | NA | NA | NA | NA
Chen (2018) [25]LASSOClassical MLNANANANANANANANA
Chen (2024) [84]LSTM + SHAPDeep Learningcases–monthly–nationalTop3/Worst3 out of 27 = 1 month: Top Roraima 6.36; Amapá 27.45; Sergipe 41.52; Worst Espírito Santo 6300.78; Minas Gerais 5088.71; Paraná 4450.76; 3 months: Top Roraima 5.70; Amapá 37.99; Sergipe 44.62; Worst Minas Gerais 28,714.48; Santa Catarina 23,381.95; Espírito Santo 20,177.49Top3/Worst3 out of 27 = 1 month: Top Santa Catarina 15.21; Ceará 15.51; Pernambuco 15.84; Worst Rio Grande do Sul 56.30; Roraima 44.31; Rondônia 40.93; 3 months: Top Sergipe 18.62; Roraima 20.76; Pernambuco 22.94; Worst Rio Grande do Sul 826.28; São Paulo 570.01; Santa Catarina 418.37NANANANANA
Chen (2025) [85]LSTMDeep Learningcases–weekly–city levelManaus 75.45, Belém 23.98, Fortaleza 247.27, Salvador 228.55, Brasília 1067.66, Goiânia 439.02, Belo Horizonte 1483.27, Rio de Janeiro 819.73, São Paulo 1102.75, Curitiba 65.99NANAManaus 29.95, Belém 29.28, Fortaleza 22.59, Salvador 23.95, Brasília 22.12, Goiânia 23.26, Belo Horizonte 22.47, Rio de Janeiro 21.87, São Paulo 22.18, Curitiba 25.33NANANA
Cheng (2025) [26]Feature selection: Regression + fuzzy c-means + IHLOA; Classificators: SVM, KNN, RFHybrid/SuperensembleNANANANANANANANA
Chowdhury (2025) [27]ANN, XGBHybrid/Superensemblecases–monthly–nationalANN 1260.98; XGB 479.44ANN 2229.66; XGB 918.83NAANN 1.92; XGB 2.25NANANA
Conde-Gutiérrez (2024) [104]ANNClassical MLcases–weekly–regionalNANon-severe dengue 0.26; Dengue with warning signs 0.17; Severe dengue 0.04NANANANon-severe dengue 0.97; Dengue with warning signs 0.98; Severe dengue 0.81NA
da Silva (2022) [86]RFTree Ensemblecases–bimonthly–city levelNABimonthly (b1–b6): 2014: 4.67, 5.57, 3.79, 4.51, 3.24, 2.34; 2015: 9.19, 4.44, 2.97, 5.21, 5.12, 5.99; 2016: 4.15, 3.88, 4.38, 3.15, 3.94, 3.30NANANANANA
da Silva (2025) [87]RFTree Ensemblecases–weekly–city levelNatal D 57.8–71.8; Natal CD 97.9. Iquitos CD 2.78–4.16; Iquitos D 4.02. Barranquilla HD 6.09–6.67; Barranquilla CD 7.81NANANANANANatal D 0.92–0.95; Natal CD 0.90. Iquitos CD 0.85–0.89; Iquitos D 0.81. Barranquilla HD 0.94–0.95; Barranquilla CD 0.92
Dala (2021) [29]Backpropagation NNClassical MLNANANANANANANANA
Dang Anh Tuan (2025) [112]GLM + XGB, LSTMHybrid/SuperensembleNANANANANANANANA
Dhaked (2025) [30] | 1D-CNN | Deep Learning | cases–monthly–city level | 31.49 | 56.45 | 3187.43 | NA | NA | NA | NA
Doni (2020) [31]LSTMDeep LearningNANANANANANANANA
Edussuriya (2021) [32]LSTM + Grey Wolf OptimizerDeep Learningcases–monthly–district levelNAWithout GWO 25.45; GWO: 20.45; final model: 10.84NANANANANA
Farooq (2022) [91]XGB + SHAPTree EnsembleNANANANANANANANA
Ferdousi (2021) [88]GRU, LSTMDeep Learningper 100,000 population–weekly–district levelGRU 0.34 ± 0.02; LSTM 0.36 ± 0.01NANANANANANA
Francisco (2024) [33]Hybrid ML (CIF, RF, GAM, ANN, SVM/SVR, XGB)Hybrid/Superensemblecases–weekly–city levelNANANANANANANA
Guo (2017) [34]SVRClassical MLcases–weekly–provincialNAGuangzhou 16.26, Foshan 1.05, Zhongshan 0.35, Zhuhai 0.57, Shenzhen 0.80, Other cities 0.27NANANAGuangzhou 0.99, Foshan 0.99, Zhongshan 0.99, Zhuhai 0.99, Shenzhen 0.99, Other cities 0.99NA
Hamlet (2021) [89]BRTTree EnsembleNANANANANANANANA
Handari (2021) [35]LSTMDeep Learningcases–weekly–district levelNAWest 10.13, South 5.63, East 9.58, North 5.34, Central 4.79NANANANANA
Holcomb (2023) [105]RF, NNHybrid/Superensemblecases–annual–nationalRF 21.30; NN 22.70RF 30.10; NN 31.60NANANANANA
Husin (2016) [36]GANNOther/Heuristiccases–weekly–district levelNANASepang 0.07; Hulu Selangor 0.06; Hulu Langat 0.07; Klang 0.06; Kuala Selangor 0.06NANANANA
Islam (2024) [37] | LSTM | Deep Learning | cases–monthly–national | 301.64 | 414.23 | NA | 28.78 | NA | NA | NA
Ismail (2022) [38]RFTree EnsembleNANANANA5.46; after removal of entomological data 8.32NANANA
Javaid (2023) [39]RFTree EnsembleNANANANANANANANA
Jayabalan (2024) [40]GBTree EnsembleNANANANANANANANA
Kerdprasop (2020) [41] | ANFIS | Other/Heuristic | cases–monthly–city level | 151.51 | 216.54 | NA | NA | NA | NA | 0.83
Kesorn (2015) [43]SVM with kernel RBF (SVM-R)Classical MLNANANANANANANANA
Kiang (2021) [44]LASSOClassical MLcases–monthly–provincialLASSO: Bangkok, 1-month ahead 423.7NANANANANANA
Koh (2018) [45]NN(AR(2)) with rainfallTime-series/Statisticalcases–weekly–city levelNANANANANANANA
Koplewitz (2022) [90] | RF | Tree Ensemble | cases–weekly–city level | NA | 1 w 11.03; 3 w 17.62; 6 w 22.06; 8 w 23.36 | NA | NA | NA | 1 w 0.85; 3 w 0.62; 6 w 0.40; 8 w 0.34 | 1 w 0.93; 3 w 0.80; 6 w 0.67; 8 w 0.60
Kukkar (2024) [46]WRFMechanisticNANANANANANANANA
Kumar Dey (2022) [47] | SVR | Classical ML | cases–monthly–city level | 4.95 | NA | NA | NA | NA | NA | NA
Kuo (2024) [48]RFTree EnsembleNANANANANANANANA
Laureano Rosario (2018) [106]ANNClassical MLNANANANANANANANA
Li (2022) [115]LSTMDeep Learninglog(cases)–weekly–regionalTest 2018–2019: 1 w 0.27; 2 w 0.27; 3 w 0.27; 4 w 0.26; 5 w 0.27; 6 w 0.31; 7 w 0.30; 8 w 0.29; 9 w 0.29; 10 w 0.31; 11 w 0.27; 12 w 0.33. Test 2019 peak (January–August): 1 w 0.20; 2 w 0.19; 3 w 0.20; 4 w 0.21; 5 w 0.19; 6 w 0.21; 7 w 0.22; 8 w 0.23; 9 w 0.27; 10 w 0.22; 11 w 0.23; 12 w 0.28Test 2018–2019: 1 w 0.35; 2 w 0.34; 3 w 0.34; 4 w 0.35; 5 w 0.34; 6 w 0.40; 7 w 0.37; 8 w 0.38; 9 w 0.38; 10 w 0.39; 11 w 0.34; 12 w 0.40. Test 2019 peak (January–August): 1 w 0.23; 2 w 0.22; 3 w 0.25; 4 w 0.25; 5 w 0.22; 6 w 0.26; 7 w 0.28; 8 w 0.29; 9 w 0.32; 10 w 0.28; 11 w 0.28; 12 w 0.33NANANANANA
Li (2022) [91]LSTM, LSTM + AttentionDeep Learninglog(cases)–weekly–regionalFederal District LSTM w/o cases: 1 w 0.53; 2 w 0.56; 3 w 0.50; 4 w 0.50; LSTM with cases: 1 w 0.42; 2 w 0.41; 3 w 0.40; 4 w 0.46; LSTM-ATT w/o cases: 1 w 0.53; 2 w 0.49; 3 w 0.46; 4 w 0.47; LSTM-ATT with cases: 1 w 0.42; 2 w 0.38; 3 w 0.40; 4 w 0.43; Fortaleza LSTM w/o cases: 1 w 0.44; 2 w 0.47; 3 w 0.45; 4 w 0.43; LSTM with cases: 1 w 0.35; 2 w 0.35; 3 w 0.40; 4 w 0.44; LSTM-ATT w/o cases: 1 w 0.41; 2 w 0.44; 3 w 0.45; 4 w 0.43; LSTM-ATT with cases: 1 w 0.26; 2 w 0.34; 3 w 0.33; 4 w 0.39Federal District LSTM w/o cases: 1 w 0.70; 2 w 0.73; 3 w 0.66; 4 w 0.66; LSTM with cases: 1 w 0.53; 2 w 0.52; 3 w 0.50; 4 w 0.56; LSTM-ATT w/o cases: 1 w 0.66; 2 w 0.68; 3 w 0.61; 4 w 0.61; LSTM-ATT with cases: 1 w 0.53; 2 w 0.46; 3 w 0.49; 4 w 0.51; Fortaleza LSTM w/o cases: 1 w 0.55; 2 w 0.57; 3 w 0.59; 4 w 0.55; LSTM with cases: 1 w 0.42; 2 w 0.44; 3 w 0.50; 4 w 0.56; LSTM-ATT w/o cases: 1 w 0.51; 2 w 0.53; 3 w 0.57; 4 w 0.55; LSTM-ATT with cases: 1 w 0.33; 2 w 0.46; 3 w 0.43; 4 w 0.51NANANANANA
Liu (2016) [49]CARTClassical MLcases–weekly–city levelNA1 w Guangzhou 3.22, Zhongshan 0.37; 1–3 w Guangzhou 3.72, Zhongshan 0.38NANANANANA
Liu (2020) [50]LSTMDeep Learningcases–weekly–district levelNANANANANANANA
Long (2025) [113] | RF, XGB, SVR, MLP | Hybrid/Superensemble | cases–annual–national | NA | RF 0.42; XGB 0.46; MLP 0.53; SVR 0.61 | RF 0.18; XGB 0.21; MLP 0.28; SVR 0.37 | NA | NA | RF 0.84; XGB 0.82; MLP 0.75; SVR 0.68 | NA
Lu (2025) [51]MLR, LSTM, SI-SIRHybrid/Superensemblecases–weekly–nationalPre-lockdown 204.36; During lockdown 434.02NANAPre-lockdown 13.97; During lockdown 87.03; Extended validation 13.12–17.09NANANA
Majeed (2023) [53]LSTMDeep Learningcases–weekly–nationalNABest/Worst by look-back = 1 m Best SA-LSTM (Climate/time/geography) 3.27; Worst SA-LSTM (Climate) 6.77; 2 m Best A-LSTM (Climate/time/geography) 3.10; Worst LSTM (Climate/time) 5.01; 3 m Best SA-LSTM (Climate/time/geography) 4.56; Worst S-LSTM (Climate/time) 6.32; 4 m Best SA-LSTM (Climate) 3.01; Worst S-LSTM (Climate/time) 4.88; 5 m Best SA-LSTM (Climate/time/geography) 3.37; Worst LSTM (Climate) 6.69; 6 m Best SA-LSTM (Climate/time/geography) 4.32; Worst S-LSTM (Climate) 7.44NANANANANA
Majeed (2025) [54]LSTMDeep Learningcases–weekly–nationalNAST-SLSTM 2.66 ± 0.57; ST-LSTM 3.61 ± 0.57; SSA-LSTM 3.17 ± 0.41; STA-LSTM 3.67 ± 0.60; TA-LSTM 3.66 ± 0.63; SA-LSTM 3.87 ± 0.58; S-LSTM 4.13 ± 0.59; Plain LSTM 4.15 ± 0.61NANANANANA
Majeed2 (2023) [52]LSTMDeep Learningcases–monthly–nationalNALSTM: 4.15 ± 0.61; S-LSTM: 4.13 ± 0.59; TA-LSTM: 4.13 ± 0.59; STA-LSTM: 3.67 ± 0.60; SA-LSTM: 3.87 ± 0.58; SSA-LSTM (stacked + spatial attention): 3.17 ± 0.41NANANANANA
Mayrose (2024) [55]MobileNetV3SmallDeep LearningNANANANANANANANA
Mills (2025) [92]Median EnsembleHybrid/Superensembleper 100,000 population–monthly–provincialNA0.81NANANA0.74NA
Mobin (2025) [56]DT, RF, GB, XGB, SVR, KNN (Daily dataset)Hybrid/Superensemblecases–monthly–nationalRF 90; DT 114; XGB 121; GB 132; SVR 147; KNN 161RF 176; DT 225; XGB 240; GB 260; SVR 290; KNN 320NARF 3.6; DT 4.5; XGB 5.0; GB 5.4; SVR 5.9; KNN 6.3NANANA
Muhamad Krishnan (2022) [57]ANNClassical MLNANANANANANANANA
Mulwa (2024) [109]XGBTree EnsembleNANANANANANANANA
Mussumeci (2020) [93]LASSO, LSTM, RFHybrid/SuperensembleNANANANANANANANA
Mustaffa (2024) [58]NNARTime-series/Statisticalcases–weekly–nationalNA597.74NA94.84NANANA
Necesito (2021) [59]LSTMDeep Learningcases–monthly–city levelNA2016: 32.14; 2017: 38.41; 2018: 28.06NANANANA2016: 0.58; 2017: 0.82; 2018: 0.92
Ningrum (2024) [20] | ETC (best model), CatBoost, XGB, LightGBM, LSTM, CBR, GB, OMP, Huber Regressor | Hybrid/Superensemble | cases–weekly–district level | 0.63 | 1.09 | 1.20 | NA | NA | 0.56 | NA
Olmoguez (2019) [60]RFTree EnsembleNANANANANANA0.73NA
Ong (2018) [61]RFTree EnsembleNANANANANANANANA
Ong (2023) [62]LR, DT, RF, SVM, NB, XGB, AdaBoost + BorutaHybrid/SuperensembleNANANANANANANANA
Panja (2023) [114]XEWNetDeep Learningper 10,000 population–weekly–regionalPuerto Rico 26 w 5.66, 52 w 42.14; Peru 26 w 1.57, 52 w 2.50; India 26 w 2.36, 52 w 6.55Puerto Rico 26 w 7.69, 52 w 68.49; Peru 26 w 1.98, 52 w 4.73; India 26 w 2.04, 52 w 9.98NANANANANA
Patra (2025) [63] | CNN + BiLSTM | Deep Learning | cases–weekly–national | 54.53 | 106.96 | NA | NA | NA | 0.94 | NA
Puengpreedaa (2020) [64]RF, AdaBoost, ETC, LASSOHybrid/Superensemblecases–weekly–provincialChiang Rai h1 10.98 RF, Chiang Rai h2 16.44 RF, Chiang Rai h3 21.27 RF, Chiang Rai h4 25.65 RF, Mukdahan h1 1.61 AdaBoost, Mukdahan h2 1.80 ETC, Mukdahan h3 2.05 LASSO, Mukdahan h4 2.02 RF, Pattani h1 2.83 ETC, Pattani h3 3.20 AdaBoost, Pattani h4 3.37 LASSO, Ayutthaya h3 9.34 RF, Ratchaburi h4 8.56 AdaBoostNAChiang Rai h1 237.98 RF, Chiang Rai h2 543.87 RF, Chiang Rai h3 847.23 RF, Chiang Rai h4 1193.56 RF, Mukdahan h1 5.52 AdaBoost, Mukdahan h2 6.91 ETC, Mukdahan h3 9.18 LASSO, Mukdahan h4 10.12 RF, Pattani h1 17.13 ETC, Pattani h3 18.20 AdaBoost, Pattani h4 22.16 LASSO, Ayutthaya h3 155.30 RF, Ratchaburi h4 116.05 AdaBoostNA Chiang Rai h1 0.92 RF, Chiang Rai h2 0.82 RF, Chiang Rai h3 0.72 RF, Chiang Rai h4 0.61 RF, Mukdahan h1 0.81 AdaBoost, Mukdahan h2 0.76 ETC, Mukdahan h3 0.68 LASSO, Mukdahan h4 0.65 RF, Pattani h1 0.78 ETC, Pattani h3 0.78 AdaBoost, Pattani h4 0.73 LASSO, Ayutthaya h3 0.56 RF, Ratchaburi h4 0.47 AdaBoostNA
Rahman (2025) [65]XGB, LightGBMTree EnsembleNANANANANANALightGBM 0.09, XGBoost 0.84NA
Ren (2024) [66]RFTree EnsembleNANANANANANANANA
Roster (2023) [94]RF, GB, SVR, MLPHybrid/Superensemblecases–monthly–district levelMean, median: train: GB Corr 39.0, 8.8; GB PCMCI 38.7, 8.9; GB OnlyD 38.3, 8.9; GB Clim 41.4, 9.7; MLP Corr 44.0, 12.1; MLP PCMCI 41.7, 12.4; MLP OnlyD 48.2, 13.0; MLP Clim 53.8, 22.7; RF Corr 37.4, 8.9; RF PCMCI 38.8, 8.5; RF OnlyD 37.2, 8.6; RF Clim 42.6, 9.6; SVR Corr 54.6, 15.1; SVR PCMCI 54.6, 15.3; SVR OnlyD 55.3, 14.9; SVR Clim 53.0, 14.8; test: RF OnlyD 53.7, 12.2; City specific best 52.5, 11.9Mean, median: train: GB Corr 68.1, 13.6; GB PCMCI 67.6, 13.9; GB OnlyD 67.3, 13.5; GB Clim 72.5, 15.6; MLP Corr 74.6, 17.7; MLP PCMCI 71.2, 18.7; MLP OnlyD 78.6, 18.3; MLP Clim 89.9, 32.0; RF Corr 67.4, 15.4; RF PCMCI 67.9, 14.2; RF OnlyD 67.3, 15.5; RF Clim 73.5, 15.6; SVR Corr 90.5, 18.0; SVR PCMCI 90.4, 18.8; SVR OnlyD 90.5, 17.7; SVR Clim 91.1, 19.9; val: RF OnlyD 130.4, 25.4; City specific 119.2, 26.5NANANANANA
Salami (2020) [108]PLS, glmnet, RF, XGBHybrid/SuperensembleNANANANANANANANA
Salim (2021) [67]RF, SVM, ANNHybrid/SuperensembleNANANANANANANANA
Salsabiila (2025) [68]CNN-BiGRU + AttentionDeep Learningcases–weekly–city levelTiDE-PSO 45.10TiDE-PSO 75.76NANANANANA
Sánchez López (2023) [95]SVMClassical MLNANANANANANANANA
Sanchez-Gendriz (2022) [96]LSTMDeep LearningNANANANANANANA0.92
Sebastianelli (2024) [97]CatBoost, SVM, LSTM, RFHybrid/Superensembleper 100,000 population–monthly–nationalNARondônia 0.24; Acre 0.28; Amazonas 0.23; Roraima 0.12; Piauí 0.08NANANANANA
Shaikh (2023) [69] | Optimized Ensemble (CNN + ANN + SVM, NC-DEFO) | Hybrid/Superensemble | cases–weekly–city level | 1.05 | 5.73 | NA | NA | 0.04 | NA | NA
Shi (2016) [70]LASSOClassical MLNANANANA1 w: 17 (95% CI: 16–19); 12 w: 24 (95% CI: 22–26)NANANA
Siddikur Rahman (2025) [71]RF, XGB, LightGBM + SHAPTree Ensemblecases–monthly–nationalRF Climate test 0.65, RF Climate training 0.42, RF SocDem test 0.85, RF SocDem training 0.48, RF Landscape test 0.87, RF Landscape training 0.47, XGBoost Climate test 0.51, XGBoost Climate training 0.42, XGBoost SocDem test 0.53, XGBoost SocDem training 0.52, XGBoost Landscape test 0.54, XGBoost Landscape training 0.41, LightGBM Climate test 0.28, LightGBM Climate training 0.24, LightGBM SocDem test 0.46, LightGBM SocDem training 0.41, LightGBM Landscape test 0.47, LightGBM Landscape training 0.34RF Climate test 0.71, RF Climate training 0.67, RF SocDem test 0.79, RF SocDem training 0.56, RF Landscape test 0.78, RF Landscape training 0.65, XGBoost Climate test 0.62, XGBoost Climate training 0.58, XGBoost SocDem test 0.68, XGBoost SocDem training 0.53, XGBoost Landscape test 0.67, XGBoost Landscape training 0.54, LightGBM Climate test 0.36, LightGBM Climate training 0.32, LightGBM SocDem test 0.53, LightGBM SocDem training 0.42, LightGBM Landscape test 0.57, LightGBM Landscape training 0.42NARF Climate test 0.16, RF Climate training 0.15, RF SocDem test 0.17, RF SocDem training 0.14, RF Landscape test 0.15, RF Landscape training 0.12, XGB Climate test 0.13, XGB Climate training 0.12, XGB SocDem test 0.17, XGB SocDem training 0.132, XGB Landscape test 0.16, XGB Landscape training 0.13, LightGBM Climate test 0.09, LightGBM Climate training 0.05, LightGBM SocDem test 0.11, LightGBM SocDem training 0.08, LightGBM Landscape test 0.11, LightGBM Landscape training 0.09NANANA
Soliman (2020) [98] | DFFN (deep feed-forward neural network) | Deep Learning | per 100,000 population–monthly–national | 6.36 | 8.93 | NA | NA | NA | NA | 0.42
Sood (2020) [116]Naive Bayesian Network (NBN)Classical MLNANANANANANANANA
Souza (2022) [99]Diffusion Maps + SVM (RBF)Classical MLNANANANANANANANA
Stavelin (2022) [72]LSTMDeep Learninglog(cases)–monthly–district levelNAUnivariate Slice1 2010–2015 1.20; Slice2 2011–2019 1.30; mean 1.25; SD 0.05; Multivariate 1.13NANANAUnivariate 1.00NA
Teurlai (2015) [110]SVMClassical MLNANANANANANANANA
Theodorakos (2017) [100] | Differential Evolution (Numerical) | Other/Heuristic | cases–monthly–national | 40.18 | 106.30 | 11,869.5 | NA | NA | NA | NA
Tian (2024) [73]SVM, XGBHybrid/Superensemblecases–weekly–nationalXGB (lag + temporal) 89.12; SVM 160.73; XGB (temporal only) 160.65; XGB (no lag/no temporal) 175.49XGB (lag + temporal) 156.07; SVM 268.83; XGB (temporal only) 232.58; XGB (no lag/no temporal) 247.86NANANAXGB (lag + temporal) 0.83; SVM 0.50; XGB (temporal only) 0.49; XGB (no lag/no temporal) 0.42NA
Tuan (2024) [74]RF, GB, LSTMHybrid/Superensemblecases–monthly–provincialRF 232.22; GB 206.60; LSTM 89.15RF 381.52; GB 336.40; LSTM 106.23NANANANANA
Wu (2021) [75]SVR, RF, ANN (MLP)Hybrid/SuperensembleNANANANANANANANA
Yamana (2016) [107]Superensemble: F1 (SIR-EAKF), F2 (Bayesian weighted outbreaks), F3 (historical likelihood)Hybrid/Superensemblecases–weekly–city levelTiming, Peak, Total: SE(F1,F2) 3.3, 21, 473, SE(F1,F2,F3) 3.7, 20, 486NANANANANANA
Yavari Nejad (2021) [76]Bayes Net (BN) + TRFClassical MLNANANANANANANANA
Yeh (2025) [77]ARDL + LSTM, SVR, MLP, GRNN, RBF, GMDH, GEPHybrid/Superensemblecases–monthly–city levelNAKaohsiung (high incidence area): ARDL + SVR 4.25; ARDL + LSTM 4.41; ARDL + GRNN 4.68; ARDL + RBF 4.79; ARDL + MLP logistic 4.83; ARDL + GEP 5.20; ARDL + GMDH 5.62. Tainan (high incidence area): ARDL + GEP 0.76; ARDL + SVR 0.83; ARDL + GRNN 0.84; ARDL + RBF 0.91; ARDL + MLP logistic 0.93; ARDL + LSTM 0.96; ARDL + GMDH 1.08. Taipei (low incidence area): ARDL + MLP logistic 1.46; ARDL + GRNN 1.42; ARDL + SVR 1.69NAKaohsiung (high incidence area): ARDL + SVR 34.3; ARDL + LSTM 35.8; ARDL + GRNN 39.2; ARDL + RBF 39.4; ARDL + MLP logistic 36.6; ARDL + GEP 39.1; ARDL + GMDH 37.9. Tainan (high incidence area): ARDL + GEP 30.1; ARDL + SVR 33.4; ARDL + GRNN 34.7; ARDL + RBF 33.9; ARDL + MLP logistic 32.1; ARDL + LSTM 32.8; ARDL + GMDH 32.5. Taipei (low incidence area): ARDL + MLP logistic 30.8; ARDL + GRNN 37.2; ARDL + SVR 34.4NANANA
Yi (2023) [78]Hybrid NN + RNN + EnKF superensemble (PICTUREE-Aedes)Hybrid/SuperensembleNANANANANANANANA
Zhao (2020) [101]RFTree Ensemblecases–weekly–district level1 w 0.93, 2 w 0.95, 3 w 0.94, 4 w 0.95, 5 w 0.95, 6 w 0.94, 7 w 0.93, 8 w 0.92, 9 w 0.90, 10 w 0.89, 11 w 0.87, 12 w 0.86NANANANANANA
Zhao (2023) [79] | CNN-BiLSTM | Deep Learning | NA | 1 w 41.40; 2 w 53.01; 3 w 65.99; 4 w 79.44 | 1 w 73.30–85.00; 2 w 90.33; 3 w 112.65; 4 w 136.36 | NA | NA | NA | NA | NA
ANFIS = Adaptive Neuro-Fuzzy Inference System; ANN = Artificial Neural Network; AR(2) = autoregressive model of order 2; ARDL = Autoregressive Distributed Lag; BN = Bayesian Network; BRT = Boosted Regression Trees; CBR = Case-Based Reasoning; CD = Dengue with Climate; CNN = Convolutional Neural Network; D = Dengue only; DEFO = Differential Evolution Optimization; DFFN = Deep Feed-Forward Network; DT = decision tree; ENKF = Ensemble Kalman Filter; ETC = Extra Trees Classifier; F1 = SIR-EAKF Superensemble model variant; F2 = Bayesian weighted outbreaks superensemble model variant; F3 = historical likelihood superensemble model variant; FS = feature selection; GAM = Generalized Additive Model; GANN = Genetic Algorithm Neural Network; GB = Gradient Boosting; GEP = Gene Expression Programming; GLM = Generalized Linear Model; GMDH = Group Method of Data Handling; GRNN = General Regression Neural Network; GRU = Gated Recurrent Unit; GWO = Grey Wolf Optimizer; HD = Humidity (Dengue with Humidity); IHLOA = Improved Harris Hawks Optimization Algorithm; KNN = K-Nearest Neighbours; LASSO = Least Absolute Shrinkage and Selection Operator; LightGBM = Light Gradient Boosting Machine; LR = Logistic Regression; LSTM = Long Short-Term Memory; MAE = Mean Absolute Error; MAPE = Mean Absolute Percentage Error; MLP = Multi-Layer Perceptron; MLR = Multiple Linear Regression; MSE = Mean Squared Error; NARX NN = Nonlinear AutoRegressive Network with eXogenous inputs Neural Network; NB = Naive Bayes; NC-DEFO = Novel Combined Differential Evolution Optimization; NN = Neural Network; NNAR = Neural Network Autoregression; OMP = Orthogonal Matching Pursuit; PICTUREE-Aedes = Hybrid NN + RNN + EnKF Superensemble model; PLS = Partial Least Squares; R2 = Coefficient of Determination; RBF = Radial Basis Function; RF = Random Forest; RMSE = Root Mean Squared Error; r = Pearson Correlation Coefficient; SA-LSTM = Spatial Attention Long Short-Term Memory; SE = Superensemble; SHAP = SHapley Additive exPlanations; SI-SIR = Spatially Informed Susceptible–Infectious–Recovered model; S-LSTM = Standard Long Short-Term Memory; SMAPE = Symmetric Mean Absolute Percentage Error; SIR-EAKF = Susceptible–Infectious–Recovered Ensemble Adjustment Kalman Filter; SSA-LSTM = Stacked Spatial Attention Long Short-Term Memory; STA-LSTM = Spatiotemporal Attention Long Short-Term Memory; ST-LSTM = Spatiotemporal Long Short-Term Memory; ST-SLSTM = Spatiotemporal Stacked Long Short-Term Memory; SVR = Support Vector Regression; SVM = Support Vector Machine; TA-LSTM = Temporal Attention Long Short-Term Memory; TRF = Trust Region Framework; w = Week; WRF = Weather Research and Forecasting model; XGB = Extreme Gradient Boosting.