Machine Learning and Prediction of Infectious Diseases: A Systematic Review

The aim of the study is to show whether it is possible to predict infectious disease outbreaks early by using machine learning. This study was carried out following the guidelines of the Cochrane Collaboration, the meta-analysis of observational studies in epidemiology, and the preferred reporting items for systematic reviews and meta-analyses. The suitable bibliography on PubMed/Medline and Scopus was searched by combining text, words, and titles on medical topics. At the end of the search, this systematic review contained 75 records. The studies analyzed in this systematic review demonstrate that it is possible to predict the incidence and trends of some infectious diseases; by combining several techniques and types of machine learning, it is possible to obtain accurate and plausible results.


Introduction
Burden of Infectious Diseases
Infectious diseases are now an increasing part of the global health burden: every year, millions of deaths are due to infectious diseases [1]. To be able to develop protective measures against infectious diseases, it is crucial to identify the factors of disease diffusion in order to apply control and prevention measures [2]. Identifying these factors could allow for making predictions that help policymakers make appropriate decisions on, for example, the purchase of vaccines, public awareness campaigns, and training programs for health professionals [3,4].

Machine Learning Applied to Infectious Diseases-Overview
Machine-learning algorithms can contribute to the control of infectious diseases by helping to predict, both spatially and temporally, the evolution and spread of infectious diseases [5]. Machine-learning algorithms are capable of analyzing large, complex data sets and identifying patterns and trends that may be difficult for humans to detect. This makes them well suited for the prediction of infectious diseases, which often involve multiple factors such as population demographics, environmental conditions, and individual behaviors. In recent years, many studies have applied machine-learning techniques to the prediction of infectious diseases, and the results have been promising.
One of the key challenges in using machine learning for disease prediction is the availability of high-quality, comprehensive data. Infectious disease surveillance systems often collect data on a variety of factors, including the number of reported cases, the locations of outbreaks, and the demographics of infected individuals. However, these data are often incomplete, biased, or noisy, which can affect the performance of machine-learning models. Additionally, many infectious diseases have long incubation periods, which means that data on past outbreaks may not accurately reflect current conditions. To overcome these challenges, researchers have employed a range of machine-learning algorithms, including decision trees, random forests, support vector machines, and deep-learning networks. These algorithms have been applied to a variety of data sets, including electronic health records, genomic data, and social media posts. In general, the results of these studies have shown that machine-learning algorithms can accurately predict the spread and onset of infectious diseases, with performance comparable with or better than traditional statistical methods.
For example, some studies have used machine learning to forecast the number of cases of a particular disease in a given region on the basis of historical data and current conditions [6]. Others have used machine learning to identify the most likely sources of an outbreak on the basis of the genetic makeup of the pathogen and the patterns of infection [7,8]. Still others have used machine learning to predict the likelihood of an individual contracting an infectious disease on the basis of their personal characteristics and behaviors [9]. Overall, the use of machine learning in the prediction of infectious diseases is a promising area of research, with potential applications in public health, epidemiology, and clinical practice.
However, there are also significant challenges and limitations to using machine learning in this context [10]. One of the main challenges is the need for high-quality data, which are often difficult to obtain owing to issues such as missing values, incomplete records, and varying data formats. Additionally, the complexity of the underlying phenomena, such as the transmission of infectious diseases, can make it difficult to develop accurate models. Finally, there is a potential for bias and overfitting in machine-learning models, which can lead to inaccurate predictions [10].
In this systematic review, we explore the current state of the art in the use of machine learning for the prediction of infectious diseases. We focus on recent, high-quality studies that have applied machine-learning techniques to real-world data and have evaluated the performance of these models in predicting the spread and onset of infectious diseases. We also discuss the challenges and limitations of using machine learning in this context and provide insights into future directions for research in this area.

Aim
The aim of the study is to show whether it is possible to predict infectious disease outbreaks early, by using machine learning.

Search Strategy and Data Sources
The Cochrane Collaboration [11] and the meta-analysis of observational studies in epidemiology (MOOSE) guidelines [12] were followed in order to conduct the current systematic review. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [13,14] were used to report the process and the results. A bibliographic search was conducted on 9 November 2022 on the Scopus and PubMed/MEDLINE databases, combining keywords by using the Boolean operators "AND", "OR", and "NOT". The search strategy is reported in Supplementary Table S1. No time filter was used. Given the innovative nature of the study and its recent field of application, halfway between medicine, epidemiology, and information technology, it was not always possible to apply all the items of the PRISMA checklist (more details are given in the study limitations section).

Inclusion and Exclusion Criteria
Studies had to meet the following criteria to be considered eligible: (i) language: written in English; (ii) population: human; (iii) interventions: machine learning; (iv) comparators/control: infectious diseases; (v) outcomes: prediction/forecasting of infectious disease outbreaks; (vi) type of study: epidemiologic studies (case-control, cross-sectional, or cohort studies).
Exclusion criteria were as follows: (i) articles not published in English; (ii) not human; (iii) full text not available; (iv) interventions: not about machine learning; (v) comparators/control: not about infectious diseases; (vi) outcomes: not about prediction/forecasting of infectious disease outbreaks; (vii) type of study: review article, meta-analysis, trial, expert opinion, commentary, editorial, case report, letter to the editor, or book chapter. See Supplementary Table S2 for the detailed description of the inclusion/exclusion criteria.

Selection Process and Data Extraction
Titles and abstracts of manuscripts found using the search strategy and those retrieved from additional sources were independently assessed by two reviewers (D.G. and C.F.). Subsequently, the same authors assessed the eligibility of the articles and independently reviewed the full downloaded text. When there was an unresolved disagreement between the two evaluators, it was resolved by discussing the case with a senior reviewer (O.E.S.). Full texts were downloaded only for potentially eligible studies.
Data extraction was conducted only for those articles that met all the inclusion criteria, and it was performed using a predefined and prepiloted spreadsheet prepared in Microsoft Excel for Windows. The extracted data included the author, publication year, study period, country where the study was conducted, disease, data source, model and/or techniques, aim, main results, accuracy/best model, space/time resolution, order of magnitude of the modeled populations, funding, and conflicts of interest.

Strategy for Data Synthesis
By following the PRISMA 2020 guidelines, a flowchart (Figure S1) was created showing the number of references at each stage of the review process [15]. Summary tables were created showing the qualitative results of the literature. A full report was produced that gives a general overview of the main findings of the review.

Critical Appraisal
A critical evaluation of the articles using the Newcastle-Ottawa scale (NOS) was independently carried out by two authors (O.E.S. and D.G.) [16]. The NOS is a risk-of-bias assessment tool for observational studies that can assign up to nine points for the lowest risk of bias in three domains: (i) study group selection; (ii) comparability; and (iii) assessment of exposure and outcomes for case-control and cohort studies, respectively.
An adapted version of the NOS was used to assess cross-sectional studies [17]. According to these criteria and the standard cutoffs used in the previous literature [18,19], studies were classified as being of high, moderate, or low quality when their NOS score was ≥7, 4-6, and ≤3, respectively.
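The cutoffs above can be written as a small lookup. The following is a minimal sketch in Python, not the authors' tooling; the function name `classify_nos` is a hypothetical label introduced here for illustration.

```python
def classify_nos(score: int) -> str:
    """Map a Newcastle-Ottawa scale score (0-9) to a quality label.

    Cutoffs follow the text: >=7 high, 4-6 moderate, <=3 low.
    """
    if not 0 <= score <= 9:
        raise ValueError("NOS scores range from 0 to 9")
    if score >= 7:
        return "high"
    if score >= 4:
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Print the label for one score in each band.
    for s in (9, 6, 3):
        print(s, classify_nos(s))
```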

Literature Search
First, 375 and 333 records were found on Scopus and PubMed/MEDLINE, respectively, for a total of 708. Second, 89 records were eliminated because they were duplicates. As a result, 619 records were evaluated for admissibility. On the basis of title and abstract, 537 records were excluded because the topic was not related (n = 530), the articles were not original (n = 3), they were not written in English (n = 3), or they were not about humans (n = 1). The full text of the remaining 82 records was downloaded, and 7 records were excluded with reasons following an in-depth assessment. At the end of the process, 75 records were included in our review. Figure S1 shows the selection flowchart. There was a 0.7% disagreement among the authors during the first screening. Table 1 lists the characteristics of the included studies in alphabetical order by author. The articles appear from Absar N [20] to Zhong R [73], and the proceedings/conference papers appear from Ajith A [74] to the end.
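The record counts reported above can be checked with simple arithmetic. The sketch below is only a consistency check on the numbers stated in the text; the function name `screening_flow` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def screening_flow(scopus, pubmed, duplicates, excluded_title_abstract, excluded_full_text):
    """Recompute the number of records left after each screening stage."""
    identified = scopus + pubmed
    screened = identified - duplicates
    full_text = screened - sum(excluded_title_abstract.values())
    included = full_text - excluded_full_text
    return {
        "identified": identified,
        "screened": screened,
        "full_text": full_text,
        "included": included,
    }


# Counts taken from the Literature Search paragraph.
flow = screening_flow(
    scopus=375,
    pubmed=333,
    duplicates=89,
    excluded_title_abstract={
        "topic_not_related": 530,
        "not_original": 3,
        "not_english": 3,
        "not_human": 1,
    },
    excluded_full_text=7,
)

if __name__ == "__main__":
    print(flow)
```

The recomputed totals agree with the figures in the text at every stage (708, 619, 82, and 75 records).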

Quality Assessment
The NOS scores of the 75 studies ranged from 6 to 9. The assessment revealed a medium-high quality level for both cohort and cross-sectional studies. See Supplementary Table S3 for a complete overview that is based on the NOS checklist.

Discussion
Several infectious diseases are emerging and threatening human health across the world. The burden of infectious diseases is certainly a global issue, causing millions of deaths annually [24]. Therefore, the study of infectious disease behavior has been a subject of scientific interest for many years; the early identification of emerging infectious disease outbreak patterns is critical and offers great advantages [36,43]. Indeed, as is evident from the studies under observation, accurate and reliable predictions of infectious diseases can be invaluable to public health organizations planning interventions to reduce or prevent disease transmission [38] and mitigate the negative impacts of diseases [35]. As seen by Ketu S. et al., a viral epidemic, in addition to exerting direct damage on people's lives, can affect a country's economy [42,43]. As reiterated by Roster K. et al., recent epidemic outbreaks, such as the COVID-19 pandemic and the Zika epidemic in Brazil, have demonstrated the importance and difficulty of accurately predicting new infectious diseases [57]. A lack of knowledge about new infectious diseases and their consequences, along with complicated social and governmental factors, may influence the spread of every newly emerging disease [42]. It is, in fact, essential to try to estimate the future movement and pattern of a new disease [39], so that preventive measures such as closing schools, shopping centers, and theaters; closing borders; suspending public services; and stopping travel can be quickly implemented [42].
However, because the epidemic spread of an infectious disease usually occurs sporadically and rapidly, it is not easy to predict whether an infectious disease will emerge and how. In addition, collecting data on a specific infectious disease is not always easy. Knowledge about the transmission paths of emerging diseases, the level and duration of immunity to reinfection, and other parameters needed to build realistic epidemiological models is often scarce. For these reasons, it is necessary to find appropriate and useful information sources and data and to build reliable prediction models with them [43,57]. Indeed, to develop increasingly effective control and prevention strategies, reliable computational tools that may help to understand disease dynamics and predict future cases are needed. Policymakers can use these computational tools to make better-informed decisions [24].
Several approaches have been proposed in the literature to produce accurate and timely predictions and potentially improve public health response [32]. Time series forecasting and machine learning, while less dependent on disease assumptions, require large amounts of data that may not be available in the early stages of an outbreak [57]. Modeling the spread of infectious diseases in space and time must account for complex dependencies and uncertainties. Machine-learning methods, especially neural networks, are useful for modeling these kinds of complex problems, even if in some cases they lack probabilistic interpretations [53]. Predicting the evolution of contagion dynamics is still an open problem, to which mechanistic models offer only a partial answer. To remain mathematically or computationally tractable, these models must rely on simplifying assumptions, thus limiting the quantitative accuracy of their predictions and the complexity of the dynamics they can model [51]. Mathematical modeling is the most scientific technique for understanding the evolution of natural phenomena, including the spread of infectious diseases. Therefore, these modeling tools have been widely used in epidemiology to predict risks and inform decision-making [46]. While imperfect, these models offer an additional input for decision makers in infectious disease responses. Their results could be useful for informing decisions on planning, resource allocation, and social-distancing policies [37].
Deep learning offers a new and complementary perspective to build effective models of contagion dynamics on networks, as demonstrated by Murphy C. et al. [51]. The analysis found that models based on combining multiple machine-learning methods, incorporating information from different models that are based on multiple data sources, produced the most robust and most accurate results [47].
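One simple way to combine forecasts from several models is a weighted average in which models with lower validation error receive more weight. The sketch below illustrates that general idea only; it is not the procedure used in [47] or in any other reviewed study, and the model names, error values, and function names are hypothetical.

```python
def inverse_error_weights(validation_errors):
    """Return weights proportional to 1/error, normalized to sum to 1.

    Errors must be strictly positive (for example, a validation RMSE).
    """
    inverse = {name: 1.0 / err for name, err in validation_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine_forecasts(forecasts, weights):
    """Weighted average of equally long forecast series, one per model."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * series[t] for name, series in forecasts.items())
        for t in range(horizon)
    ]


# Hypothetical two-step-ahead case forecasts from two models.
forecasts = {"arima": [100.0, 110.0], "lstm": [120.0, 130.0]}
# Hypothetical validation errors: the second model was twice as accurate.
weights = inverse_error_weights({"arima": 20.0, "lstm": 10.0})
combined = combine_forecasts(forecasts, weights)

if __name__ == "__main__":
    print(weights)
    print(combined)
```

With these illustrative numbers, the more accurate model receives two thirds of the weight, so the combined forecast lies closer to its predictions.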

Acute Respiratory Infection (ARI)
Acute respiratory infections (ARIs) are one of the main causes of morbidity and mortality in the world, particularly in children under 5 years and adults over 65 years [36]. González-Bandala et al. [36] propose a methodology that merges the predictions of a computational model with machine learning, a projection model, and a proposed smoothed endemic channel calculation. The predictions are made on weekly ARI data obtained from epidemiological reports in Mexico, along with the usage of key terms in the Google search engine. The results obtained with this methodology were compared with state-of-the-art techniques, resulting in reduced root-mean-square percentage error (RMSPE) and mean absolute percentage error (MAPE) metrics and achieving a MAPE of 21.7%. The results show that the combination of different data analysis techniques (FFNN, SoS, and smoothed endemic channels) can provide an accurate prediction for ARI data 1 week in advance [36].
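For readers unfamiliar with the two percentage metrics named above, the following minimal Python sketch shows their standard definitions. The weekly case counts are invented for illustration and are not data from [36]; both functions assume that no observed value is zero.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmspe(actual, predicted):
    """Root-mean-square percentage error, in percent."""
    squared = sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted))
    return 100.0 * math.sqrt(squared / len(actual))


# Hypothetical observed and predicted weekly case counts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

if __name__ == "__main__":
    print(round(mape(actual, predicted), 2))
    print(round(rmspe(actual, predicted), 2))
```

Because RMSPE squares the relative errors before averaging, it penalizes a few large misses more heavily than MAPE does.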

Brucellosis
Brucellosis (Malta fever) is one of the most common zoonotic diseases and has long been one of the most important health concerns for humans and animals [61]. Two studies emerged in our review: one by Bagheri H. et al. [28], performed in Iran, and one by Shen L. et al. [61], performed in Europe. In both studies, the authors demonstrated that their models can predict brucellosis cases in humans, although they used different models to do so. Bagheri [28] compared radial basis function (RBF) and multilayer perceptron (MLP) networks, stating that RBF is a more common type of neural network learning that responds to a limited section of the input space; it has a faster, more accurate, and yet simpler network structure compared to other neural networks, while the MLP is more generalizable. Shen L. et al. used LSTM, ConvLSTM, and ARIMA models, and the prediction results showed that the LSTM and ConvLSTM models have higher forecast precision.

Campylobacteriosis, Q-Fever, and Typhoid
Only one study addressed these diseases, that of Dixon S. et al. [35], in which multiple models were compared: RF, XGB, MLP, ARIMA, ARIMAX, GLARMA, and SARIMA. The end result was that the XGB models performed the best for all diseases, and in general, tree-based ML models performed the best when looking at data splits [35]. According to the authors, this study demonstrated the power of ML approaches to incorporate a wide range of factors to more accurately forecast various diseases, regardless of location, than traditional statistical approaches [35].

Chickenpox
Only one study addressed chickenpox; however, it was not dedicated only to this disease but rather took into consideration various infectious diseases with respiratory transmission [32]. The study by Chen et al. [32], using the LASSO model, showed that predictions made more than 4 weeks in advance were increasingly discrepant from the real scenario and that the prediction model was more accurate in capturing the epidemic but less sensitive in predicting the size of the epidemic, probably because the climatic variables have different levels of importance in the accuracy of the forecasts [32].

Clostridioides difficile
The study by Marra et al. [48] was the one that obtained the lowest accuracy. It was a study on Clostridioides difficile, and despite the wide variety of models used (LR, RF, naïve Bayes, K-nearest neighbor, MLP, Lib SVM, decision tree (J48), AdaBoost (M1), bagging, and radial basis function classifier), these machine-learning models produced only modest results in a real-world population [32]. The logistic regression, random forest, and naïve Bayes models yielded the highest performance: 0.6 [32].

Crimean-Congo Hemorrhagic Fever (CCHF)
Crimean-Congo hemorrhagic fever (CCHF) is a tick-borne viral infection usually transmitted by tick bites or through contact with tissues, blood, or other bodily fluids from infected people and animals [23,24]. In our systematic review, there were two studies conducted by the same author: Ak Ç [23,24]. The author demonstrated in both studies that a Gaussian process formulation obtained better results than two frequently used standard machine-learning algorithms (i.e., random forests and boosted regression trees) under temporal, spatial, and spatiotemporal prediction scenarios. The Gaussian process was the best model for describing CCHF outbreaks spatiotemporally [23,24].

COVID-19
Most of the studies under observation concern the COVID-19 pandemic [20-22,25-27,29,31,33,34,37,39,41,42,45,46,50,51,53,56,58-60,63,65-68,71,72]. Ever since the coronavirus pandemic (COVID-19) emerged in Wuhan, China, and was recognized as a global threat, several national and global studies have been conducted to try to predict the epidemic, with various levels of reliability and accuracy [34], as has been carried out in Korea [37] and India [65], among others. Academic researchers all around the world have proposed various predictive models to allow policymakers to make better decisions, to apply appropriate control measures [26,29,32,63], and to reduce the burden on hospitals [58].
Recently, new machine-learning approaches have been used to understand the dynamic trend of the COVID-19 spread, as was shown in the study by Verma H. et al., who used a temporal deep-learning architecture to predict COVID-19 cases in India [65]. In the study by Pourghasemi H.R. et al., in order to assess the risk of COVID-19 outbreak in Fars province, Iran, a machine-learning algorithm (MLA) based on a geographic information system (GIS) and a support vector machine (SVM) was used, while daily observations of infected cases were analyzed with polynomial and autoregressive integrated moving average (ARIMA) models to examine the virus infestation patterns in the province and in Iran [56]. A deep-learning model was used as a sustainable prognostic method for the COVID-19 outbreak in Bangladesh by Mohammad Masum A.K. et al. [50]. Kumar S.L. et al. applied linear regression, random forest, ARIMA, and LSTM models to COVID-19 data to estimate the empirical indication of COVID-19 infection and intensity in four countries (the US, India, Brazil, and Russia) in order to arrive at better validation [45]. Wang Y. et al. proposed the ARIMA, SARIMA, and PROPHET models to predict daily new cases and cumulative confirmed cases in the US, Brazil, and India over the next 30 days on the basis of the data set on new confirmed cases and cumulative confirmed cases of COVID-19 published by the WHO [68]. Wang X. et al. created a method to predict the daily number of confirmed cases of infectious diseases by combining a mechanistic ordinary differential equation (ODE) model for infectious classes and a generalized boosting machine-learning model (GBM) to predict how public health policies and mobility data influence the transmission rate in the ODE model [67].
In addition to monitoring general research and publication activities, the use of machine-learning approaches and a theoretical understanding of information-sharing behaviors is a productive approach to improve the effectiveness of infosurveillance [60], as demonstrated by the study conducted by Zhang Y. et al., whose experimental results suggest that it is feasible to use Twitter data to provide the surveillance and prediction of COVID-19 in the United States to support decision-making by health departments [72]. Shaghaghi N. et al. demonstrated that eVision, an epidemic prediction system that combines machine learning (ML), in the form of a recurrent neural network (RNN) with long short-term memory (LSTM), and search engine statistics in order to make accurate predictions of the weekly number of cases of highly communicable diseases, was able to achieve 89% accuracy in predicting the progress of the COVID-19 pandemic in the United States [59].
Among the proceedings concerning COVID-19, from an accuracy point of view it is worth mentioning the study by Rohini et al. [89], carried out in India using K-nearest neighbors (KNN); the authors reported a predictive accuracy of 98.34%. Moreover, the studies on world data by Andreas et al. [75] and Satu et al. [90] should also be noted; through a model created ad hoc by Andreas and the use of PROPHET by Satu, they reached an R² greater than or equal to 0.99.
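The idea of feeding a learned, time-varying transmission rate into a compartmental ODE model can be illustrated with a basic SIR model. The sketch below is an assumption-laden toy, not the model of Wang X. et al. [67]: it uses daily Euler steps, a three-compartment structure, and a hand-written function `example_beta` standing in for the output of a machine-learning model; all names and parameter values are hypothetical.

```python
from typing import Callable, List, Tuple


def simulate_sir(
    beta_of_t: Callable[[int], float],
    gamma: float,
    s0: float,
    i0: float,
    r0: float,
    days: int,
) -> List[Tuple[float, float, float]]:
    """Simulate an SIR model with daily Euler steps.

    beta_of_t(day) supplies the transmission rate for each day, for example
    as predicted by a separate model from policy and mobility covariates.
    """
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for day in range(days):
        new_infections = beta_of_t(day) * s * i / n
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


def example_beta(day: int) -> float:
    """Hypothetical stand-in for a learned transmission rate.

    Transmission drops after an assumed intervention on day 10.
    """
    return 0.4 if day < 10 else 0.15


trajectory = simulate_sir(example_beta, gamma=0.1, s0=9990.0, i0=10.0, r0=0.0, days=30)

if __name__ == "__main__":
    for day in (0, 10, 30):
        s, i, r = trajectory[day]
        print(day, round(s, 1), round(i, 1), round(r, 1))
```

In a hybrid approach of this kind, the mechanistic part keeps the population bookkeeping consistent, while the data-driven part only has to learn how covariates change the transmission rate.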

Dengue
Some of the included studies were conducted on dengue fever prediction models [55], as in the study conducted by Nguyen V.H. et al. It aimed to develop an accurate prediction model of dengue fever (DF) in Vietnam by using a wide range of weather factors as inputs to inform public health responses for outbreak prevention in the context of future climate change [52], by comparing convolutional neural network (CNN), transformer, long short-term memory (LSTM), and attention-enhanced LSTM (LSTM-ATT) models with traditional machine-learning models on weather-based DF forecasting. Interesting results were found by Shy et al., who demonstrated that statistical models built with machine-learning methods such as LASSO have the potential to greatly improve forecasting techniques for recurrent outbreaks of infectious diseases such as dengue [62], and by Xu J. et al., whose results, based on the use of LSTM, provide a more accurate dengue prediction model and could be used for other dengue-like infectious diseases [69]. As for the proceedings, Kolesnikov et al. [81] indicate in their study that the most effective predictions were given by a mathematical model based on a combination of spatial analysis techniques (MGWR) and neural networks based on the LSTM architecture.

Hepatitis B
The study by Zhang P. et al. [94] implemented various models: oriented attention model (OAM), AR, LSTM, gated recurrent unit (GRU), encoder-decoder, CNN, CNN-RNN, LSTM-attn, GRU-attn, ED-attn, CNN-attn, and CNNRNN-attn. Self-attention significantly improved the predictive accuracy of all comparable methods. The MAE and RMSE values decreased by up to 51.67% and 39.43%, respectively, and the R² increased by up to 52.99% [94]. Because it is a conference paper, the quality of the study is modest.

Hepatitis E
Guo Y. et al. investigated what might be the most appropriate model to predict the incidence of hepatitis E. By comparing ARIMA, SVM, and LSTM, they found that nonlinear models (SVM, LSTM) outperform linear models (ARIMA). LSTM obtained the best performance according to all three metrics: RMSE, MAPE, and MAE. Hence, LSTM is the most suitable for predicting the monthly incidence and case numbers of hepatitis E [38].
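Reporting several error metrics, as in the comparison above, matters because different metrics can rank the same models differently. The following minimal Python sketch shows the standard definitions of RMSE and MAE and a toy case in which they disagree; the series, the model names `model_a` and `model_b`, and the helper `best_model` are hypothetical and unrelated to the data in [38].

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def best_model(actual, forecasts, metric):
    """Return the name of the model with the lowest value of the given metric."""
    return min(forecasts, key=lambda name: metric(actual, forecasts[name]))


# Hypothetical monthly incidence and two competing forecasts.
actual = [10.0, 12.0, 14.0]
forecasts = {
    "model_a": [11.0, 13.0, 15.0],  # small error every month
    "model_b": [10.0, 12.0, 16.5],  # exact twice, one larger miss
}

if __name__ == "__main__":
    print("best by RMSE:", best_model(actual, forecasts, rmse))
    print("best by MAE:", best_model(actual, forecasts, mae))
```

In this toy case RMSE favors the model with consistently small errors, whereas MAE favors the model with a single larger miss, which is why agreement across all three metrics is a stronger result than a win on one.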

Hand, Foot, and Mouth Disease
Studies on a variety of other infectious diseases were included in the review, such as hand, foot, and mouth disease (HFMD), an increasingly prominent public health problem that has caused an epidemic in China every year since 2008. Predicting the incidence of HFMD and analyzing the key factors that may play a role are of great importance for its prevention [49,63], as investigated in the study conducted by Meng D. et al. They proposed two machine-learning algorithms, random forest and eXtreme gradient boosting (XGBoost), for the analysis and prediction of HFMD [49]. Among the proceedings was a study by Zhang et al. [94], where the authors stated that the self-attention methodology applied to various models increased the accuracy of their predictions.

Influenza/Influenza-Like Illness (ILI)
Several studies included in the review concern influenza and influenza-like illness (ILI) [44,47,54,64,70]. Influenza epidemics are a major public health challenge worldwide and annually cause thousands of deaths, posing a serious threat to worldwide health [44,47]. Assessing the utility of ILI surveillance systems and developing approaches for predicting future trends are important for pandemic preparedness [54]. Building forecasting models and accurate systems that track influenza activity at the city level is necessary to provide usable information for clinical, hospital, and community outbreak preparedness [44,47]. Such propositions find application in studies such as the one conducted by Venkatramanan S. et al., which can contribute to the development of the timely forecasting of infectious diseases on a global scale by using human mobility data, expanding its applications in the area of infectious disease epidemiology [64]; that of Xu Q. et al., who attempted to predict influenza in Hong Kong with Google search queries and statistical model fusion [70]; and that of Lu F.S. et al., who demonstrated how information from Internet-based data sources, when combined using an informed, robust methodology, can be effectively used as an early indicator of influenza activity at fine geographic resolutions [47].

Malaria
Another application of machine learning was studied by Kamana E. et al. They demonstrated how the LSTMSeq2Seq model can be effectively applied in the prediction of malaria re-emergence [40]. The LSTMSeq2Seq model achieved an average prediction accuracy of 87.3% [40]. A lower accuracy was reported by Brock et al. [76], who, using the BRT model, estimated an accuracy of between 55% and 82%.

West Nile Virus
West Nile fever is a disease caused by the West Nile virus (WNV), which was discovered in Africa in 1937 and which spread to Western Asia, Europe, Australia, and the US thanks to its natural reservoirs and vectors: birds and mosquitoes [74]. In the included study [74], the authors report that random forest was able to correctly predict the probability of WNV's presence with the highest accuracy; the model makes it possible to know not only the probability of WNV's occurrence but also how it could spread. This could help policymakers to implement safety measures to prevent the deadly spread of WNV [74].

Zika
The study by Roster K. et al. [57] took into consideration various diseases, including COVID-19 and Zika, which we will briefly discuss. Zika virus infection in humans is a viral disease transmitted by the bite of infected mosquitoes. If a person is bitten by a carrier mosquito and then bitten again by an uninfected mosquito, this can trigger a chain capable of giving rise to an endemic outbreak, in which human-to-human contagion, even if possible, is modest and unlikely [57]. The study was conducted in Brazil, and the models used were RF, TrAdaBoost, and neural network (NN) [57].

Strengths and Limitations
As with any study, this systematic literature review has its limitations. First, we limited our search to articles published in English, and this might have reduced the total number of potential eligible studies, although English is the most commonly used language in the scientific community. The number of papers included in the systematic review was low, despite the fact that the initial search comprised more than 700 articles. Nevertheless, we believe this did not significantly affect our results; indeed, the small number of retrieved articles could rather be due to the fact that this is a relatively new area of research, as shown by the fact that articles have been published only in the past few years: the 75 papers included in the study were published between 2016 and 2022. Second, this study reviewed only published journal articles and proceedings; however, important studies are normally disseminated mainly through scientific journal articles and not through editorials or commentaries. Another important limitation of the study is that the articles included comprise proceedings and conference papers, which could have diminished the quality of this systematic review.
The study also bears limitations on the applicability of all items of the PRISMA checklist; in fact, items 12, 13, and 15 have not always been developed or completed adequately. For item 12, it was not possible to specify the effect measures for each result, because in most cases, effect measures such as odds ratios were completely absent. This systematic review explored a new field of study with the objective, as specified in paragraph 1.3 (Aim), of showing whether it is possible to predict infectious disease outbreaks early by using machine learning. For item 13, the study was not composed of clinical trials, and a meta-analysis was not performed; therefore, most of the subitems were not applicable to this systematic review. For the applicable items, 13a to 13c, the synthesis methods were applied. In the assessment of certainty (item 15) as related to item 12 (measures of effect), the authors tried to evaluate the quality of the studies through the NOS, even if it may not have been the appropriate tool for this item.
Lastly, our paper also has some important strengths. First, as the title suggests, this article is a systematic literature review. Second, our search strategy was developed with several keywords. Third, the review was conducted in accordance with international guidelines.

Conclusions
The use of machine-learning models for the early or real-time verification of epidemics is a new and innovative field that today presents many study methodologies. This makes it difficult to compare studies from a methodological point of view, even if almost all of them come to the same conclusion, which is that the outbreak of major infectious diseases can be monitored.
Concerning public health and preventive strategies, this study suggests that machine learning is a new tool that can be widely used for public health practice. On the basis of the evidence collected so far, we can hypothesize that the research objectives explored so far and the results generated can be considered preliminary and that new research questions and new applications of this method can be developed in the future. In light of the outbreaks of infectious diseases that have occurred around the world over the past few decades, machine-learning models offer an opportunity to monitor their introduction and spread. Furthermore, their use in scientific research is expected to grow, and they are expected to become increasingly used in the daily life of the healthcare world.
The results showed that with the association of multiple machine-learning models, it is possible to spatially and temporally predict the trends or the incidence of infectious diseases, and future research efforts will allow the construction of more-precise and more-plausible models. In conclusion, the combination of different data sets improves the results. The most plausible results are obtained when the data do not come from a single source; when different data sources are combined, the results are more likely to be accurate. For future research, the integration of machine-learning models is suggested to improve existing standard epidemiological models in terms of accuracy.