A Statistical Examination of Distinct Characteristics Inﬂuencing the Performance of Vector-Borne Epidemiological Agent-Based Simulation Models

: The spread of infectious diseases is a complex system in which pathogens, humans, the environment, and sometimes vectors interact. Mathematical and simulation modelling is a suitable approach to investigate the dynamics of such complex systems. The 2019 novel coronavirus (COVID-19) pandemic reinforced the importance of agent-based simulation models to quickly and accurately provide information about the disease spread that would be otherwise hard or risky to obtain, and how this information can be used to support infectious disease control decisions. Due to the trade-offs between complexity, time, and accuracy, many assumptions are frequently made in epidemiological models. With respect to vector-borne diseases, these assumptions lead to epidemiological models that are usually bounded to single-strain and single-vector scenarios, where human behavior is modeled in a simplistic manner or ignored, and where data quality is usually not evaluated. In order to leverage these models from theoretical tools to decision-making support tools, it is important to understand how information quality, human behavior, multi-vector, and multi-strain affect the results. For this, an agent-based simulation model with different parameter values and different scenarios was considered. Its results were compared with the results of a traditional compartmental model with respect to three outputs: total number of infected individuals, duration of the epidemic, and number of epidemic waves. Paired t -test showed that, in most cases, data quality, human behavior, multi-vector, and multi-strain were characteristics that lead to statistically different results, while the computational costs to consider them were not high. Therefore, these characteristics should be investigated in more detail and be accounted for in epidemiological models in order to obtain more reliable results that can assist the decision-making process during epidemics.


Introduction
Infectious disease outbreaks are among the largest and oldest challenges faced by humanity. Although the theme is old, characteristics of the current globalized world, such as interconnectivity and frequent movement of people, allow infectious diseases to spread further and more quickly nowadays. Moreover, due to cross-species transmission, there is also an increased risk for novel diseases to emerge.
Recently, the 2019 novel coronavirus (COVID- 19) quickly spread around the globe, leading countries to activate emergency plans, travel restrictions, and quarantine [1]. Apart from the large number of cases and deaths, the outbreak led to anxiety, struggle in the health systems, and global economic slowdown as restrictions were imposed [2]. As highlighted by the World Health Organization [3], the year 2020 was the scenario we all had feared for decades, "a virus that spread quickly around the world". This scenario shows the The overview, design concepts, and details (ODD) protocol proposed by Grimm et al. [26] was used in this study to provide researchers with a rigorous structure and information that would allow them to systematically replicate the experiments in different contexts. The protocol is shown in Table 1. Table 1. Overview, design concepts, and details protocol for the study.

Overview Purpose
The model developed in this study, which represents a simplified abstraction of reality, was designed to investigate and highlight the importance of three different factors, namely, data quality, human behavior, and multi-strain, multi-vector, when developing epidemiological models. Rather than developing a detailed representation of an epidemic to predict the outcomes of future epidemics or to understand past epidemics, the goal was to call the attention of the academic community to the importance of investigating the aforementioned factors more in depth when developing epidemiological models.

State variables and scales
Agents: (i) humans, (ii) mosquitoes, and (iii) the environment. In this simplified abstraction, individuals and mosquitoes were randomly distributed within the environment. The epidemiological parameters, such as latent rate and recovery rate, were considered independent of age, gender, time, or any other parameters. Detailed information about the parameters used in each model is provided in Section 2.1.

Process overview and scheduling
In vector-borne diseases, there are generally three types of agents: (i) the pathogen, which may be a virus or bacteria; (ii) the vector, in this work a mosquito that can be either Aedes aegypti or Aedes albopictus; and, (iii) the final host, a human in this case.
The model was built based on the traditional susceptible, exposed, infectious, and recovered (SEIR) and susceptible, exposed, and infectious (SEI) compartmental models. The SEIR model was used to represent humans, while the SEI model was used to represent the vector. The baseline conceptual model represented by the SEIR-SEI compartmental model is shown in Figure 1, and Table 2 provides the definition of the symbols presented in Figure 1.
The life cycle of the pathogen can be described in four stages: (1) The pathogen is transmitted from an infectious mosquito (Mi) to a susceptible host (Hs) when the mosquito feeds on human blood.
(2) The pathogen infects the exposed host (He), who still has no ability to transmit the disease to another mosquito. After the latent period, the pathogen reaches sufficiently high densities in the blood of the infectious host (Hi), who is now able to infect another susceptible mosquito (Ms).
(3) Whenever a susceptible mosquito feeds on an infectious host, the susceptible mosquito inoculates the pathogen and becomes an exposed mosquito (Me). Similar to the host, the mosquito is not able to immediately transmit the disease to other susceptible hosts.
(4) After the latent period, the pathogen develops in the mosquito to the point that the pathogen becomes present in the salivary glands of the mosquito, who becomes an infectious mosquito (Mi) and can now transmit the disease by biting a susceptible host. After the recovery period, the host is considered to be recovered (Hr) and immune to the pathogen (this is not true for all mosquito-borne diseases but it is true for some of them, such as dengue with respect to the same virus strain and chikungunya). An infectious mosquito never recovers from the disease and will stay infectious until its death. Three different models were developed in this work to meet the proposed goal. Each model works slightly differently and they are described in detail in The transition between each state is based on the epidemiological parameters as shown in Figure 1 and described in Tables 2 and 3. The data were collected at the end of each day on the basis of an event.

Design concepts Design concepts
Stochasticity was considered in all models through the infectious disease parameters used as input. These data are provided in Tables 4-7. Adaptation was considered in Model C when humans changed their behavior in response to the total number of infected individuals, either instantaneously or after a specific amount of time. This was considered at an individual and a population level. In summary, human behavior was considered to change in four different situations, as described in Section 2.1.3.

Details Initialization
Humans and mosquitoes were uniformly randomly distributed in the continuous space. The human and mosquito population size and the initial number of infectious humans and mosquitoes are provided in Tables 4-7.

Input
The input data used in the model were defined within the model and they are provided in Tables 4-7.

Sub-models
This consists of the "skeleton" of the model, as well as its description, which are provided in Figures 1-4 and Section 2.
of these modules. The transition between each state is based on the epidemiological par as shown in Figure 1 and described in Tables 2 and 3. The data were c at the end of each day on the basis of an event.

Design concepts Design concepts
Stochasticity was considered in all models through the infectious dise rameters used as input. These data are provided in Tables 4-7. Adaptation was considered in Model C when humans changed their b ior in response to the total number of infected individuals, either insta ously or after a specific amount of time. This was considered at an ind and a population level. In summary, human behavior was considered change in four different situations, as described in Section 2.1.3.

Details Initialization
Humans and mosquitoes were uniformly randomly distributed in the uous space. The human and mosquito population size and the initial n of infectious humans and mosquitoes are provided in Tables 4-7.

Input
The input data used in the model were defined within the model and are provided in Tables 4-7.

Sub-models
This consists of the "skeleton" of the model, as well as its description, are provided in Figures 1-4 and Section 2.  He Number of exposed humans

Experiment Design
The input data used in this study was taken from the World Health Organization [27,28], Araújo [29], and Yakob and Clements [30].
To answer the research questions discussed in the Introduction section, we consid-

Experiment Design
The input data used in this study was taken from the World Health Organization [27,28], Araújo [29], and Yakob and Clements [30].
To answer the research questions discussed in the Introduction section, we considered 4 different models: (1) Model A or baseline (single-strain, single-vector dengue spread model); (2) Model B, which is the baseline model with different parameter values to investigate the impact of data quality; (3) Model C, coupling human behavior and dengue spread model; and (4) Model D, a multi-strain, multi-vector dengue spread model. The models were developed from modifications of the baseline model (Model A) in order to answer the research questions discussed in Section 1. The required modifications of each model are discussed in their respective sections.
The parameters used in each one of these models are presented in Table 3. Model A is the baseline model and contains the parameters as described in Section 3. Model B is also the baseline model with different parameter values to assess the impact of data quality on the results of the epidemiological model. In other words, there is no logic modification between Model A and Model B. Model C is the model where the impacts of human behavior are investigated. The parameters related to human behavior, such as population cautious factor and population time to switch behavior, are included. Finally, Model D is the model where the impacts of multi-strain and multi-vector are investigated. Parameters such as the daily human latent rate for DENV2 and the proportion of wild and Wolbachia-carrier mosquitoes are included. A more detailed discussion about each model and the number of parameters added in each model is presented below.  Each model was run for 2 years (730 days), which was long enough for the epidemic to die out in all iterations and replications of the experiment. A total of 50 replications were performed and 3 output responses of interest were considered: (1) total number of infected individuals, (2) duration of the epidemic in days, and (3) number of epidemic waves. Three model measures were also collected: (1) runtime, (2) number of agents/entities, and (3) number of states.
Paired t-test with α = 0.05 was applied to the results of the computational models to investigate whether the results of Model A (baseline low-level versus baseline high-level) were statistically significantly different or not for each one of the three output responses and runtime. Multi-way analysis of variance (ANOVA) with α-level of 0.05 was used to investigate (i) in Model B, the impact of changing the values of the different inputs per baseline level (low or B1 and high or B2) on the three responses and runtime; (ii) in Model C, the impact of considering human behavior on the vector-borne disease model; and (iii) the impact of considering multi-strain and multi-vector on the vector-borne disease model. Next, the Tukey multiple comparison method was used to identify which pair of treatment (or inputs values) was significantly different among the inputs. The tests were performed using the software JMP.
The baseline model was described in the process overview and scheduling row of Table 1. The parameter values used in the model are given in Table 4. The assumptions adopted in this study are based on the work of Ross and Thomson [31], Kermack and McKendrick [32], Dumont et al. [33], and Yakob and Clements [30]. Such assumptions lead to a simplified model compared to reality. However, such limitation should not affect the objective of this work, since this study proposes to explore the importance of data quality, human behavior, multi-strain, and multi-vector in the results of disease spread models. Rather than foresee possible future epidemics or seek to understand past epidemics, the goal here is to draw the attention of researchers and experts in the field to the importance of these characteristics and to serve as a starting point for the elaboration of more detailed and more realistic models.

Model B (Data Quality)
In order to assess the impact of data quality on the model's results, we decided to perform a sensitivity analysis on each of the factors of the baseline model. In the sensitivity analysis, the factors were varied one at a time. By varying the parameters one at a time, we were able to gain insight into the impact of the parameters on the model results, but it was not possible to assess the impact of the interaction between the parameters on the simulation responses. Table 5 presents the parameter values used to assess the impact of the information quality. For all scenarios, the parameters were varied one at a time. Therefore, the experiment in this model had a total of 65 scenarios: (1) 3 different levels (C1, C2, and C3) for each of the 10 parameters of Table 5 for low-and high-level baselines (B1 and B2), except for level C3 of "initial number of infectious mosquitoes" for low-level baseline because it is not feasible. This gives a total of 3 × 10 × 2 − 1 = 59 scenarios. (2) Low-level (B1) of parameters "mosquito population size", "initial number of infectious mosquitoes", and "human population size" for high-level baseline (B2), which gives 3 scenarios. Finally, (3) high-level (B2) of parameters "mosquito population size", "initial number of infectious mosquitoes", and "human population size" for low-level baseline (B1), which gives 3 iterations. The justification for varying the value of each parameter is given below. • Mosquito population size: variation in mosquito population size may represent either the adoption of control strategies (e.g., insecticide use), the elimination of mosquito breeding sites (e.g., cleaning pots with standing water), climatic variation (e.g., increased rainfall and temperature that favor the reproduction of mosquitoes), or errors in estimating the mosquito population through techniques such as mosquito trap. • Initial number of infectious mosquitoes: this parameter was varied to represent regions in which the disease is imported by travelers who bring infectious mosquitoes to the area and regions where the disease is endemic.

•
Mosquito daily latent rate, mosquito daily mortality rate, human daily latent rate, and human daily recovery rate: considered low and high rates, on the basis of the values found in the literature, as well as low and high variation. The variations of these rates represent the existence of several types of virus that can reproduce in mosquitoes and humans more slowly or quickly and, consequently, also affect the human recovery rate; the genetic and immune variation of humans and mosquitoes; the use of medical treatment that affects the recovery rate of individuals; and climatic variations and use of control measures, such as the use of screens in windows and insecticides, which may alter the mortality rate of mosquitoes. • Daily infectivity rate from mosquito to human: the variations of this rate are due to reasons similar to mosquito daily latent rate, such as the existence of different types of virus, and genetic and immune variation of mosquitoes. • Human population size: the size was varied to represent different neighborhoods or sizes of cities. • Initial number of infectious humans: this parameter was varied for reasons similar to mosquito population size and initial number of infectious mosquitoes. The scenarios in Table 5 with initial number of infectious humans equal to 0 are equivalent to an epidemic-free population where a new epidemic is normally carried by a mosquito brought from an epidemic area. Besides representing an epidemic-free society and a society in which the disease is endemic, the variation may also represent large events, such as big sporting events, music events, or refugee entry into a region, which can lead to several cases imported at a single time.

•
Daily infectivity rate from human to mosquito: the variations of this rate were for reasons similar to mosquito daily latent rate and daily infectivity rate from mosquito to human, such as the existence of different types of virus, genetic and immune variation of humans, and use of medical treatment.

Model C (Coupled Human Behavior and Dengue Spread Model)
Despite the impacts of human behavior in the course of an outbreak, many disease spread models still ignore the human behavior factor. Lack of data on human behavior during outbreaks and the difficulty in quantifying some human behaviors may be one of the primary reasons for not including human behavior in disease spread simulation models. Moreover, to incorporate behavior in simulation models, it is necessary to have a more detailed model that makes use of agents, which requires more processing power, especially when the agent population is large.
Human behavior is considered in this study in 4 different situations: (1) situation 1, where the whole population adopts the same cautious behavior after the epidemic has reached a specific threshold; (2) situation 2, where each individual adopts his/her own cautious behavior after the epidemic has reached a specific threshold; (3) situation 3, where the whole population adopts the same cautious behavior after the epidemic has crossed the specific threshold for a specific amount of time; and (4) situation 4, where each individual adopts his/her own cautious behavior after the epidemic has crossed the specific threshold for a specific amount of time.
Compared to the baseline, the population behavior model and the individual behavior models (situations 1 and 2 of Model C) have two extra varying parameters each, namely, the percentage of infected individuals to trigger cautious behavior and the population/individual cautious factor. Compared to both previous situations, the inclusion of time to change population or individual behavior adds 1 extra parameter in each case. A total of 3 states had to be added to represent the change in human behavior. Table 6 presents the parameter values used to investigate the coupled human behavior. Usually, dengue is modeled as a single-strain, single-vector disease. This can be represented by the traditional SEIR-SEI compartmental model discussed in Model Abaseline. This was realistic in the past when only one virus strain was encountered in the endemic regions, and when multiple strains existed, they were not simultaneously encountered. However, nowadays many countries, including Brazil, have 2 or more dengue virus strains at the same time. Moreover, countries are investigating new vector control methods, such as Wolbachia-carrier mosquitoes and genetically modified mosquitoes, as possible alternatives to contain the spread of the disease. Wolbachia-carrier mosquitoes can become infected by feeding on infectious humans, but they cannot further transmit the disease to susceptible humans [34].
In this study, we consider the possibility that 2 virus strains coexist and the use of Wolbachia-carrier mosquitoes as an alternative method for disease prevention. Currently, Wolbachia-carrier mosquitoes are Aedes aegypti mosquitoes that have been altered in a laboratory and are introduced into the environment on the basis of health policy decisions made by government agencies. This altered mosquito has already been approved as a safety Aedes aegypti control strategy in different countries after empirical studies have been successfully conducted [35].
To represent this multi-strain, multi-vector context, some changes had to be made in the baseline model. For the human population, 4 extra states were added and 1 of the existent states was modified. Six states were added for the mosquito population. Table 7 presents the parameter values used to investigate the coupled human behavior.

Baseline
As discussed in Section 2.1, paired t-test with α = 0.05 based on the high-level baseline (B2)-low-level baseline (B1) was applied to investigate whether the results of Model A were statistically significantly different or not for each one of the three output responses and runtime. The test results comparing the low-and high-levels of the baseline model for each of the output responses and for runtime are presented in Table 8, followed by a discussion of the results. As is expected, the total number of infected individuals was statistically different due to the larger human population in the high-level baseline. The results also show that in the larger population, the epidemics lasted longer, but the number of epidemic waves was not statistically different than in the smaller population, which was the first interesting observation from this study. As also expected, the runtime when modelling a larger population was statistically greater due to the increase in the number of agents in the agent-based model. Figures 5 and 6 show the boxplot of the output responses. One can see that the variability was greater for the total number of infected individuals on the high-level of the baseline, while the opposite was observed for the number of epidemic waves. For the response duration of the epidemic, the variability did not seem to change considerably in terms of the human and mosquito population sizes. Figure 7 shows the evolution of the epidemic on a large population. In Figure 7, it is possible to observe the total number of humans and mosquitoes in each epidemiological state (susceptible, exposed, infectious, and recovered). As is expected, the total number of infected individuals was statistically different due to the larger human population in the high-level baseline. The results also show that in the larger population, the epidemics lasted longer, but the number of epidemic waves was not statistically different than in the smaller population, which was the first interesting observation from this study. As also expected, the runtime when modelling a larger population was statistically greater due to the increase in the number of agents in the agentbased model. Figures 5 and 6 show the boxplot of the output responses. One can see that the variability was greater for the total number of infected individuals on the high-level of the baseline, while the opposite was observed for the number of epidemic waves. For the response duration of the epidemic, the variability did not seem to change considerably in terms of the human and mosquito population sizes. Figure 7 shows the evolution of the epidemic on a large population. In Figure 7, it is possible to observe the total number of humans and mosquitoes in each epidemiological state (susceptible, exposed, infectious, and recovered).

Impact of Data Quality
Multi-way analysis of variance (ANOVA) with an α-level of 0.05 was used to investigate the impact of changing the values of the different inputs per baseline level (low or B1 and high or B2) on the three responses of interest and runtime. First, ANOVA was used to identify whether there was a relationship between the output and the inputs, that is, whether the model was statistically significant. Next, ANOVA was used to identify which inputs were statistically significant for the model. Finally, for the inputs found to be statistically significant, the Tukey multiple comparison method was used to identify, among the inputs, which pair of treatment (or inputs values) was significantly different. The ANOVA results can be found in Table 9 and the Tukey multiple comparison results can be found in Table 10.
As shown in Table 9, in all the experiments performed, there was statistically significant evidence of relationship of at least one of the inputs and each of the outputs at αlevel of 0.05. In general, the inputs were not statistically significant for the runtime per replication when a low population (low number of agents) was considered. Human population size and the initial number of infectious humans were the only inputs that were

Impact of Data Quality
Multi-way analysis of variance (ANOVA) with an α-level of 0.05 was used to investigate the impact of changing the values of the different inputs per baseline level (low or B1 and high or B2) on the three responses of interest and runtime. First, ANOVA was used to identify whether there was a relationship between the output and the inputs, that is, whether the model was statistically significant. Next, ANOVA was used to identify which inputs were statistically significant for the model. Finally, for the inputs found to be statistically significant, the Tukey multiple comparison method was used to identify, among the inputs, which pair of treatment (or inputs values) was significantly different. The ANOVA results can be found in Table 9 and the Tukey multiple comparison results can be found in Table 10.
As shown in Table 9, in all the experiments performed, there was statistically significant evidence of relationship of at least one of the inputs and each of the outputs at αlevel of 0.05. In general, the inputs were not statistically significant for the runtime per replication when a low population (low number of agents) was considered. Human population size and the initial number of infectious humans were the only inputs that were

Impact of Data Quality
Multi-way analysis of variance (ANOVA) with an α-level of 0.05 was used to investigate the impact of changing the values of the different inputs per baseline level (low or B1 and high or B2) on the three responses of interest and runtime. First, ANOVA was used to identify whether there was a relationship between the output and the inputs, that is, whether the model was statistically significant. Next, ANOVA was used to identify which inputs were statistically significant for the model. Finally, for the inputs found to be statistically significant, the Tukey multiple comparison method was used to identify, among the inputs, which pair of treatment (or inputs values) was significantly different. The ANOVA results can be found in Table 9 and the Tukey multiple comparison results can be found in Table 10.
As shown in Table 9, in all the experiments performed, there was statistically significant evidence of relationship of at least one of the inputs and each of the outputs at α-level of 0.05. In general, the inputs were not statistically significant for the runtime per replication when a low population (low number of agents) was considered. Human population size and the initial number of infectious humans were the only inputs that were shown to be statistically significant in this case. In a small population scenario, a few factors also appeared to not be significant for the number of epidemic waves, such as the mosquito population size, the daily infectivity rate from mosquito to human, and the daily infectivity rate from human to mosquito. Interestingly, mosquito population size was also not significant for the responses total number of infected individuals and duration of the epidemic in days on the small population baseline level. This can lead to discussion and further investigation of the effectiveness of control measures, such as controlling mosquito population by the elimination of mosquitoes' habitats such as bottles, tires, and fountains versus reducing the personal contact with mosquitoes by using window and door screens, mosquito repellents, long sleeve clothes, etc.
A detailed discussion of the results is presented below.     The mosquito population size appeared to have a higher impact on the epidemic responses for larger human population size. While in the low-level baseline, the mosquito population size was not a significant input for any of the responses considered, in the high-level baseline, 9 out of 10 comparisons were statistically different with respect to the total number of infected individuals, and at least 4 comparisons were also statistically different for duration of the epidemic in days and number of epidemic waves. The output response "total number of infected individuals" appeared to be the most sensitive to the size of the mosquito population, followed by runtime per replication, number of epidemic waves, and last by the duration of the epidemic in days.
The mosquito population size had an opposite impact on the pair of output responses "total number of infected individuals" and "duration of the epidemic"-an increase in the mosquito population size increased the total number of infected individuals, but it reduced the duration of the epidemic in days. A similar characteristic was observed for the pair of output responses "total number of infected individuals" and "number of epidemic waves". This was possibly due to the herd immunity effect, where the majority of the population has more quickly become infected and immune to the disease, shortening the duration of the epidemic and the number of epidemic waves.
With respect to runtime, by increasing the mosquito population size, there was an increase in runtime. An average increase of 10.87 s per replication from the baseline was observed when the human population size was large.

2.
Initial number of infectious mosquitoes: The initial number of infectious mosquitoes appeared to have a great impact on the results of epidemiological models, especially for the high-level baseline. The input appeared to not be as significant for runtime, wherein the low-level baseline was not considered significant, and in the high-level baseline, only a few pairs (5 out of 10) were found to be significant. The input also appeared to be less significant for the total number of infected individuals in small populations, where only two pairs were found to be significant. For larger populations, with the exception of one pair for the output response "duration of the epidemic in days" and one pair for "number of epidemic waves", both in the high-level baseline, all pairs were statistically different for the output responses "total number of infected individuals", "number of epidemic waves", and "duration of the epidemic in days". The impact of this parameter was similar to parameter #1-an increase in the initial number of infectious mosquitoes led to an increase in the total number of infected individuals, but in a reduction of the duration of the epidemic and in the number of epidemic waves.
An average increase of 7.50 s per replication from the baseline was observed when the human population size was large. This increase in runtime is probably explained by the increase in the number of state changes in the simulation model-with the increase in the initial number of infectious mosquitoes, there was an increase in the total number of infected individuals and, hence, the humans went through more state changes (from susceptible to exposed to infectious to recovered).

3.
Mosquito daily latent rate: This parameter appears to impact all three output responses, with the output response "total number of infected individuals" being the one most sensitive to variations in the mosquito daily latent rate and the "number of epidemic waves" the least sensitive. The output response "duration of the epidemic" did not appear to be as sensitive in the lowlevel baseline as it was in the high level. The runtime decreased with the increase in the mosquito daily latent rate. The reduction in runtime can be explained following similar logic to parameter #2-there was a lower number of state changes in the simulation model and consequently a reduction in runtime.
It is worth pointing out that, contrary to parameters #1 and #2, a reduction in this parameter led to a decrease in all three output responses. This indicates the need to direct research on reducing the mosquito daily latent rate (i.e., increasing the latent period), because contrary to what happened when reducing the mosquito population size, reducing the mosquito daily latent rate had a positive effect, that is, reduces all three output responses being investigated.

4.
Mosquito daily mortality rate: This parameter impacted the epidemiological model results similar to parameter #3 above. First, the mosquito daily mortality rate appeared to impact all three output responses, with the output response "total number of infected individuals" being the one most sensitive to variations in the parameter and the "number of epidemic waves" the least sensitive. Similar to parameter #3, the output response "duration of the epidemic" did not appear to be as sensitive in the low-level baseline as it was in the high level. The runtime also decreased with the increase in the mosquito daily mortality rate. Finally, an increase in the mosquito mortality rate led to a reduction in all three output responses. This may indicate that more important than controlling the birth of new mosquitoes is assuring that the mosquito lifetime is shortened, which has been the focus of some new strategies such as the release of genetically modified mosquitoes.
The results of this parameter also agree with what is known about mosquito-borne diseases. For instance, it is known that temperature, rainfall, and mosquito density in the environment are factors that have a considerable impact on the lifetime of mosquitoes and, therefore, disease epidemics such as dengue are stronger in summer months and weaker in winter months, when the mosquito lifetime is shorter. Using this finding, it is important that researchers, especially entomologists, conduct more empirical experiments to know more accurately the mosquito mortality rate and how this parameter varies int erms of climatic and population factors. The variability of the parameter also seems important, because tests that compared scenarios with less variability (e.g., B1-C2) resulted in larger statistical differences.

5.
Daily infectivity rate from mosquito to human: This parameter impacts the epidemiological model results similar to parameter #3. It is important to mention that for small population size, this parameter was not significant for the output response "number of epidemics waves" and there were only one or two pair comparisons where the parameter was shown to be significant for the output response "number of epidemic waves" for large population sizes and the response "duration of the epidemic in days" in both small and large population sizes.

6.
Human population size: Contrary to parameter #1, an increase in the human population size led to an increase in all three output responses and vice versa. This is expected and, unfortunately, it was not as useful in terms of control strategies. However, it indicates the need for research that investigates the impact of quarantine and isolation in controlling dengue. These strategies have been discussed at length during the investigation of airborne and direct contact transmitted diseases, such as influenza, but the strategies have not been extensively explored in terms of mosquito-borne diseases.
The runtime was highly affected by the number of humans or agents in the model. Human population size affected runtime regardless of the population size, and there was only one pair comparison that was not statistically significant, which indicates that runtime was very sensitive to this parameter.
Another result was that larger populations prolonged the epidemic (since more individuals are transmitting the pathogen). However, it is important to investigate whether these results remain in epidemics in which the pathogen is transmitted more slowly or more rapidly. With epidemics that spread too quickly, the total number of infected individuals will most likely still increase with the increase in the human population size, but the duration may decrease because all individuals will quickly become infected, as discussed in parameter #1. On the other hand, with epidemics that spread slowly, the epidemic may end before it infects many individuals, which would lead to a decrease in the total number of infected individuals, as well as the duration of the epidemic.

7.
Initial number of infectious humans: As expected, an increase in the initial number of infectious humans led to an increase in the total number of infected individuals for large population size. However, the increase in the initial number of infectious humans led to a decrease in the duration of the epidemic and the number of epidemic waves for large population size. For a small population, initially, an increase in the parameter led to an increase in the total number of infected individuals, duration of the epidemic, and the number of epidemic waves. However, a further increase of the parameter led to a reduction of all three output responses. This was likely due to the disease spreading more quickly over all the population and it may also indicate an interaction among the parameters. This also highlights the importance of investigating the quality of parameters values because the output responses do not monotonically increase or decrease as a function of the parameter. The output response "total number of infected individuals" appeared to be the most sensitive to this parameter.

8.
Human daily latent rate: This parameter had a similar impact on the model results as parameter #3. However, it is important to highlight two main differences. First, this parameter was not found to be significant for the total number of infected individuals, while parameter #3 was considered significant for the total number of infected individuals. The second difference is that conversely to parameter #3, where an increase in the parameter led to an increase of the total number of infected individuals only and a reduction in the duration of the epidemic and number of waves. Here, an increase in this parameter led to a decrease in the total number of infected individuals, but also to an increase in the duration of the epidemic and the number of waves. This can be intuitively explained-since the individuals become infectious faster, the total duration of the epidemic for one individual is shorter, and, therefore, there may not be enough time to infect many mosquitoes and, consequently, other humans. However, the epidemic may last longer and have more waves due to sporadic cases here and there. This parameter is another important factor for future control actions.

9.
Human daily recovery rate: Similar to parameter #4, an increase in the human daily recovery rate led to a reduction in the total number of infected individuals and in the duration of the epidemic. However, in this case, the number of epidemic waves increased. According to the Tukey multiple comparison test, in a large population, the human daily recovery rate resulted in a different number of infected individuals, different duration of the epidemic, and different number of epidemic waves in almost every test performed. For small population size, the output responses appeared to not be as sensitive to the variation in the parameter, but the parameter was still found to be significant in many pair comparisons. Thus, due to the impacts of this parameter on the epidemiological model results, health agencies must investigate the recovery rate of each disease to provide accurate information to researchers working on disease spread models. Likewise, due to the impact of the human recovery rate on the epidemic responses, the population must follow the treatment prescribed by doctors and health agents to maximize the recovery rate. During the recovery period, it is also important to follow the guidelines in adopting control measures, such as protecting against mosquito bites, in order to avoid infecting new mosquitoes.

Daily infectivity from human to mosquito:
The results were similar to parameters #3 and #5 with respect to the total number of individuals infected. However, the duration of the epidemic and the number of epidemic waves were not as sensitive to this parameter, with only 3 comparisons out of 24 being statistically different.
Overall, initial number of infectious mosquitoes (#2), mosquito daily latent rate (#3), mosquito daily mortality rate (#4), human population size (#6), initial number of infectious humans (#7), and human daily recovery rate (#9) were the parameters that appeared to have a greater impact on the three output responses considered simultaneously. However, it is worth noting that initial number of infectious mosquitoes had an opposite impact on the output responses-an increase in the parameter increased the "total number of infected individuals" and decreased the "duration of the epidemic in days" and the "number of epidemic waves".
The mosquito population size (#1) for small population size, the human daily recovery rate (#8), and the daily infectivity rate from human to mosquito (#10) were the parameters that appeared to have less impact on the three output responses considered simultaneously. Figure 8 shows the results discussed above in a succinct way. It is possible to identify that parameters #1 and #10 were the ones that led to less variation in the results when compared to other parameters in the low-level baseline, while parameter #8 was the one that led to less variation in the high-level baseline. On the other hand, parameter #6 was the one that led to the highest variation in both levels, followed by parameters #2, #3, #4, #7, and #9. Figure 8 shows the results discussed above in a succinct way. It is possible to identify that parameters #1 and #10 were the ones that led to less variation in the results when compared to other parameters in the low-level baseline, while parameter #8 was the one that led to less variation in the high-level baseline. On the other hand, parameter #6 was the one that led to the highest variation in both levels, followed by parameters #2, #3, #4, #7, and #9.

Impact of Human Behavior
Multi-way analysis of variance (ANOVA) with an α-level of 0.05 was used to investigate the impact of considering human behavior on the vector-borne disease model. The human behavior was considered through different scenarios: (i) individual versus population level, (ii) with and without time to switch behavior, (iii) with different thresholds of the total number of infectious individuals to trigger cautious behavior, and (iv) different human cautious behaviors. The ANOVA results can be found in Table 11. A discussion of the results is presented below.
From the results shown in Table 11, it is observed that including human behavior in epidemiological models did not impact the runtime for small population sizes. However, in any of the cases, the difference was not greater than 20 s per replication, which is a reasonable increase when considering the trade-off between accuracy and computational needs.
For the parameter values used in the experiments of this work, all three output responses were not found to be sensitive to the different coupled human behavior and dengue spread models investigated when the population was large. When the population was small, all the output responses were found to be sensitive to the coupled human behavior, with the total number of infected individuals appearing to the most sensitive response, followed by the duration of the epidemic in days, and lastly by the number of epidemic waves. Although the number of results that were statistically different was not large, the existence of a few differences already shows the importance of human behavior in the results of epidemiological models. Moreover, this is just one model with specific parameters.
A work of [36] also investigated the impacts of human behavior on the results of the epidemiological model. The authors tested the same scenarios mentioned in this work, but the model mimicked the spread of the chikungunya disease and, hence, the parameter values were slightly different. In that work, Scheidegger and Banerjee [36] found that the total number of infected individuals and the duration of the epidemic were statistically

Impact of Human Behavior
Multi-way analysis of variance (ANOVA) with an α-level of 0.05 was used to investigate the impact of considering human behavior on the vector-borne disease model. The human behavior was considered through different scenarios: (i) individual versus population level, (ii) with and without time to switch behavior, (iii) with different thresholds of the total number of infectious individuals to trigger cautious behavior, and (iv) different human cautious behaviors. The ANOVA results can be found in Table 11. A discussion of the results is presented below.
From the results shown in Table 11, it is observed that including human behavior in epidemiological models did not impact the runtime for small population sizes. However, in any of the cases, the difference was not greater than 20 s per replication, which is a reasonable increase when considering the trade-off between accuracy and computational needs.
For the parameter values used in the experiments of this work, all three output responses were not found to be sensitive to the different coupled human behavior and dengue spread models investigated when the population was large. When the population was small, all the output responses were found to be sensitive to the coupled human behavior, with the total number of infected individuals appearing to the most sensitive response, followed by the duration of the epidemic in days, and lastly by the number of epidemic waves. Although the number of results that were statistically different was not large, the existence of a few differences already shows the importance of human behavior in the results of epidemiological models. Moreover, this is just one model with specific parameters.
A work of [36] also investigated the impacts of human behavior on the results of the epidemiological model. The authors tested the same scenarios mentioned in this work, but the model mimicked the spread of the chikungunya disease and, hence, the parameter values were slightly different. In that work, Scheidegger and Banerjee [36] found that the total number of infected individuals and the duration of the epidemic were statistically different in every comparison of the coupled human behavior disease spread model against the baseline model with a larger population. The authors also found a few differences in the low-level baseline.
Tukey multiple comparison test was used to investigate whether the results from the population-based Model C were statistically different from the individual-based Model C. With a few exceptions (three scenarios for the total number of infected individuals, one scenario for the duration of the epidemic, and one scenario for the number of epidemic waves), the results of the population-based Model C were not statistically different from the results of the individual-based Model C. This indicates that in some cases and depending on the parameter values and the response of interest, human behavior may be accurately represented at the population level, instead of at the individual level.
The differences found in this work and the work of Scheidegger and Banerjee [36] may indicate an interaction between the disease parameter values and human behavior. This highlights the importance of considering the impacts of both data quality and human behavior on the results of epidemiological models-while including human behavior may improve the accuracy of the results, the improvement may be compromised if the disease data are not accurate. More investigation in this area is needed to better understand the impacts of human behavior on vector-borne disease dynamics.

Impact of Multi-Vector and Multi-Strain
Multi-way analysis of variance (ANOVA) with an α-level of 0.05 was used to investigate the impact of considering multi-strain and multi-vector on the vector-borne disease model. Multi-strain and multi-vector were considered through different scenarios: (i) multistrain model versus baseline model, (ii) multi-vector model versus baseline model, and (iii) multi-strain and multi-vector model versus baseline. The ANOVA results can be found in Table 12. A discussion of the results is presented below. The multi-strain, multi-vector model against the baseline had a statistically different total number of infected individuals, duration of the epidemic in days, number of epidemic waves, and runtime for all the scenarios investigated: multi-strain only, multi-vector only, and multi-strain and multi-vector for both small and large populations.
From the results, one can observe that in terms of runtime the differences decrease when the population increase. For a large population, a significant difference was only observed among the high-level baseline and the multi-strain, multi-vector model, but no difference was observed when considering multi-strain only or multi-vector only. This indicates that in large populations, the increase in runtime caused by adding these characteristics in the model may not be as big as the increase in runtime caused by the increase in the number of agents, as discussed in Section 3.2. Moreover, in any case, the increase in runtime was not greater than 18 s, which is a reasonable increase considering the significant differences in the other three output responses.
As the ANOVA results showed, every scenario considering multi-strain and multivector, or multi-strain or multi-vector individually was statistically different than the baseline model at either low or high level. Including multi-strain and/or multi-strain and multi-vector led to an increase in the total number of infected individuals and duration of the epidemic, but a decrease in the number of epidemic waves. The increase in the total number of infected individuals and duration of the epidemic was expected as there were different strains of virus circulating simultaneously and people are immune to specific strains only.
As previously discussed, Wolbachia-carrier mosquitoes are laboratory-modified mosquitoes that can be infected by the dengue virus but cannot transmit the disease to other humans. Contrary to what one would expect from such a disease control strategy, the introduction of the Wolbachia-carrier mosquitoes led to a reduction in the duration of the epidemic and in the number of waves, but it increased the total number of infected individuals in comparison to the baseline. However, using Tukey multiple comparison test, we could check that using multi-vector strategy was significantly capable to lower the total number of infected individuals and the duration of the epidemic in scenarios with multi-strain, although there was no evidence to reduce the number of epidemic waves in comparison to the multi-strain scenario. This may indicate that the Wolbachia-carrier mosquitoes may not be a good strategy for regions where there is the circulation of only one strain of the dengue virus, but it may be a promising strategy for regions with a simultaneous circulation of multiple virus strains. We recognize that this is one model with specific parameters. Therefore, more studies must be performed to check whether this recommendation is valid for different contexts.

Discussion
From the discussion of the above results, some general inferences can be made. First, it was possible to observe that the variation of the parameter values had a greater impact on the total infected individuals than on the total duration of the epidemic or the number of epidemic waves. While 94 out of 140 comparisons led to statistically different results for the total number of infected individuals, 76 were statistically different for the duration of the epidemic, and 66 for the number of epidemic waves.
Parameters #1 and #8 were the ones that had the least impact on the total number of infected individuals, while parameters #1, #5, and #10 on the duration of the epidemic, and parameters #1 and #10 on the number of epidemic waves. On the other hand, parameters #6, followed by different parameters such as #2 and #7 were the most impactful for the output responses.
This allows us to emphasize two findings: first, the importance of defining the response of interest in epidemiological models, and second, the importance of accurately estimating the parameters. While some parameters may lead to little to no change to one output response, that same parameter may cause large changes in another output response.
Moreover, as discussed in Section 3.2, in some cases, a change in the parameter led to a decrease in the total number of infected individuals and an increase in the duration of the epidemic and/or in the number of epidemic waves. Thus, before implementing control measures, it is important to clearly define the priority for the population and the health system: reducing the total number of infected individuals, reducing the duration of the epidemic, or reducing the number of epidemic waves. In general, it is believed that the total number of infected individuals is more important than the duration of the epidemic. However, a long epidemic can generate greater rumors and fear among the population, as well as it potentially leading to a lower awareness of the population over time, which can reduce adherence to control measures and, consequently, increase the total number of infected individuals later. Depending on the control measures adopted, a longer epidemic may also have other long-term and unexpected consequences, such as economic losses and psychological impacts. The contrary results that the same input parameter has on the output responses highlight the importance of adequately estimating disease parameters such as the disease latent rate and the infectivity rate, which have a positive effect on all three output responses discussed in this work. Although important, many epidemiological models use estimates for these parameters without relying on empirical studies or some scientific support. It is recognized that it is difficult to perform experiments to define these parameters, but we call for more multidisciplinary attention to these parameters and for greater investment in the area. Figures 9 and 10 show the relationship between the output responses per parameter. According to Figure 9, it is possible to verify that, with exception of parameters #6 and #7, there was no apparent correlation between the total number of infected individuals and the duration of the epidemic, and between the total number of infected individuals and the number of epidemic waves. On the other hand, according to Figure 10, except for parameter #9, there appeared to exist a positive correlation between the duration of the epidemic and the number of epidemic waves. tion of the epidemic and the number of epidemic waves. Figure 11 shows the relation between the output responses as well, but the response total number of infected mosquitoes was included. As it can be seen, the total number of infected mosquitoes was highly correlated with the total number of infected individuals and slightly correlated with the duration of the epidemic. The runtime did not seem to be correlated with any of the output responses.  except for parameter #9, there appeared to exist a positive correlation between the duration of the epidemic and the number of epidemic waves. Figure 11 shows the relation between the output responses as well, but the response total number of infected mosquitoes was included. As it can be seen, the total number of infected mosquitoes was highly correlated with the total number of infected individuals and slightly correlated with the duration of the epidemic. The runtime did not seem to be correlated with any of the output responses.   Figure 11 shows the relation between the output responses as well, but the response total number of infected mosquitoes was included. As it can be seen, the total number of infected mosquitoes was highly correlated with the total number of infected individuals and slightly correlated with the duration of the epidemic. The runtime did not seem to be correlated with any of the output responses.
Modelling 2021, 2, FOR PEER REVIEW 29 Figure 10. The relation between the number of waves vs. the duration of the epidemic per parameter. Figure 11. The relation between the output responses.

Conclusions
The main conclusions that can be derived from this work are as follows: (i) The data quality is indeed an important factor and must be investigated in more detail by researchers and simulation specialists modelling disease spread. In fact, we suggest that a data quality impact analysis should be included as a section of any rigorous epidemiological simulation model study to acknowledge the uncertainties that might underly the model responses.
(ii) Variations in the parameters were shown to have a greater impact on the total number of infected individuals than on the duration of the epidemic or the number of epidemic waves.
(iii) Variations in the parameters may lead to divergent results of what is desired in an epidemic, i.e., a variation may lead to a reduction in the total number of infected individuals and an increase in the duration of the epidemic and the number of epidemic waves or vice versa.
(iv) Some parameters were shown to be significant in low population size, while others were shown to be significant in large population size only. This reinforces the importance of investigating the accuracy of data in epidemiological studies and considering the different contexts that exist, such as different population sizes, different geographies, different human behavior, how disease parameters change over time, etc.
(v) Similar to the item (iv), the responses appeared to not monotonically increase or decrease as a function of some of the parameters. This also reflects the importance of investigating the data accuracy on epidemiological studies, as a slight change in the value of the parameters may bring opposite effects on the responses of interest.
(vi) Human behavior appears to be appropriately mimicked in either the individualbased level or in the population-based level, which could save some computational resources.
(vii) Human behavior appears to present a strong interaction with the parameter values, which indicates that although in some cases it may not impact the results, it must be investigated to make the appropriate modelling decision.
(viii) Wolbachia-carrier mosquitoes, which is a recent control strategy being investigated, appear to be a promising control strategy to regions with a simultaneous circulation of multiple virus strains, but it may increase the total number of infected individuals in regions with a single virus strain.

Conclusions
The main conclusions that can be derived from this work are as follows: (i) The data quality is indeed an important factor and must be investigated in more detail by researchers and simulation specialists modelling disease spread. In fact, we suggest that a data quality impact analysis should be included as a section of any rigorous epidemiological simulation model study to acknowledge the uncertainties that might underly the model responses.
(ii) Variations in the parameters were shown to have a greater impact on the total number of infected individuals than on the duration of the epidemic or the number of epidemic waves.
(iii) Variations in the parameters may lead to divergent results of what is desired in an epidemic, i.e., a variation may lead to a reduction in the total number of infected individuals and an increase in the duration of the epidemic and the number of epidemic waves or vice versa.
(iv) Some parameters were shown to be significant in low population size, while others were shown to be significant in large population size only. This reinforces the importance of investigating the accuracy of data in epidemiological studies and considering the different contexts that exist, such as different population sizes, different geographies, different human behavior, how disease parameters change over time, etc.
(v) Similar to the item (iv), the responses appeared to not monotonically increase or decrease as a function of some of the parameters. This also reflects the importance of investigating the data accuracy on epidemiological studies, as a slight change in the value of the parameters may bring opposite effects on the responses of interest.
(vi) Human behavior appears to be appropriately mimicked in either the individualbased level or in the population-based level, which could save some computational resources.
(vii) Human behavior appears to present a strong interaction with the parameter values, which indicates that although in some cases it may not impact the results, it must be investigated to make the appropriate modelling decision.
(viii) Wolbachia-carrier mosquitoes, which is a recent control strategy being investigated, appear to be a promising control strategy to regions with a simultaneous circulation of multiple virus strains, but it may increase the total number of infected individuals in regions with a single virus strain.
While discussing mosquito-borne diseases, one of the first recommendations given by health agencies is to control the mosquito population growth. As discussed here, although reducing the mosquito population size reduces the total number of infected individuals, it increases the duration of the epidemic and the number of epidemic waves. Therefore, a recommendation that seems more important is to ensure humans follow the proper treatment to make the recovery process from the disease faster in order to reduce the life span of the mosquitoes, as well as to search for strategies that would increase the latent period of the disease in mosquitoes. These three parameters, for instance, lead to positive changes in all three output responses discussed here. Following this suggestion, the introduction of genetically modified mosquitoes that have a shorter lifecycle should be further investigated in epidemiological models. Contrary to what one expects for mosquitoborne diseases, quarantine and isolation that would make the human population size of the endemic region temporarily smaller appears to be useful, due to the positive effects on all three output responses, and should also be further investigated.
We recognize that the model developed in this work is a large simplification of the real world. However, the focus of this work was not to develop a model for epidemic prediction. Instead, we wanted to illustrate the possible impacts of data quality, human behavior, multi-strain, and multi-vector on epidemiological results, and to attract the attention of the academic community to the importance of not overlooking these characteristics when modelling disease spread.
We also wanted to assess the trade-off between model accuracy and the required computational power. As the results indicate, due to the impacts on the results and the generally low to no increase in runtime when considering human behavior, multi-vector, or multi-strain, it appears beneficial to include those characteristics in the models. Although the model developed in this work is simple, the results align with what is known in this field of research, which indicates that modelling is a suitable tool for exploratory research and it is a good start point for showing the cost-effect of mimicking the reality more accurately.
Due to the simplicity of the model, further investigation is needed to evaluate whether these results would persist for larger human populations, for different values of the parameters, and for more detailed models. Therefore, several suggestions for future research can be made, such as (i) to repeat the same analyses, but using a larger number of replications to verify whether with a larger sample and consequently greater accuracy, the results will be similar or not; (ii) to repeat the same analyses for other human population sizes and other variations of the parameters and verify whether the results are similar; (iii) to carry out more experiments, possibly with complete or at least fractional factorial planning to evaluate the interaction between factors; (iv) to increase the level of detail of the model to more accurately represent reality; (v) to include genetically modified mosquitoes; and (vi) to perform similar analysis on different rules for behavior inclusion, such as change in behavior in terms of the number of infected individuals within a specific distance or in terms of the number of infected individuals in a social network (emotional proximity).

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. This study also used data available in previously published studies as appropriately cited.