An Evolutionary Perspective of Virus Propagation

Abstract: This paper presents an evolutionary algorithm that simulates simplified scenarios of the diffusion of an infectious disease within a given population. The proposed evolutionary epidemic diffusion (EED) computational model has a limited number of variables and parameters, but is still able to simulate a variety of configurations that adhere well to real-world cases. The use of two space distances and the calculation of the spatial 2-dimensional entropy are also examined. Several simulations demonstrate the feasibility of the EED for testing distinct social, logistic and economic risks. The performance of the system dynamics is assessed by several variables and indices. The global information is efficiently condensed and visualized by means of multidimensional scaling.


Introduction
The diffusion of an infectious disease within a given population during a short time period is called an epidemic. If the infection spreads to a large number of countries and to other continents, it may be classified as a pandemic. Recent outbreaks include the 2003 severe acute respiratory syndrome (SARS), the 2004 H5N1 (avian flu), the 2005 Zika fever, the 2009 H1N1 and the HIV/AIDS pandemics, just to name a few [1][2][3][4][5]. Presently, the coronavirus disease 2019 (COVID-19) is of utmost importance for the human species [6][7][8][9][10]. To control an outbreak, governments adopt basic containment, mitigation and suppression strategies. Containment is applied in the early stages of the outbreak, to try to stop the infection from being transmitted to the rest of the population. Mitigation is adopted to slow down the spread of the disease and to moderate its effects on the population and the health care system. Suppression attempts to reverse the pandemic by reducing the so-called basic reproduction number $R_0$ to a value less than 1 [11,12].
The political and logistic management of an infectious outbreak is of key importance for decreasing the epidemic peak, often called 'flattening the epidemic curve'. A second political issue is handling the economic consequences, with governments investing huge amounts of capital to reduce the threats posed by pandemics. In both cases, tools for estimating and foreseeing the evolution of the spread are an invaluable asset for decision makers. Studies adopt a variety of approaches, ranging from models based on mathematical tools [13,14], such as systems of differential equations [15][16][17], or data fitting techniques and statistics, up to computer-assisted strategies [18][19][20][21]. Nowadays, artificial intelligence and soft computing are gaining importance, and they lead to reliable methods for handling problems that are difficult to model and that involve variables either impossible to measure or exhibiting unreliable values.
We find nowadays relevant initiatives involving political, academic and social organizations that try to give a fast response to urgent problems based on data-driven and computational resources. For example, in the scope of the COVID-19 outbreak we can mention the daily update by the Italian Department of Civil Protection (http://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1), the European Centre for Disease Prevention and Control (https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide), the 'Acción Matemática contra el Coronavirus' (http://matematicas.uclm.es/cemat/covid19/en/) by the Spanish Mathematics Committee, or the Global research on coronavirus disease (COVID-19) (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov) by the World Health Organization (WHO), just to name a few.
In this area of computer science, evolutionary computation provides a framework for optimization schemes inspired by biological evolution [22][23][24][25][26]. In general, evolutionary computation leads to a family of population-based optimization algorithms [27,28], with a numerical trial-and-error meta-heuristic and probabilistic behavior [29][30][31][32][33]. An initial set of elements in a population (often called candidate solutions) is generated and successively updated by means of logical rules that include some random variations. Probably the most well-known scheme is the 'genetic algorithm' [34], where a population is subjected to natural selection using mutation and crossover operations. The population gradually evolves to increase its performance according to a given fitness function chosen by the user.
In the last decades a large variety of evolutionary algorithms (EA) were proposed. In 2013 Iztok Fister [35] presented a brief review of EA and found 74 algorithms that could be distributed in four classes, namely swarm intelligence, bio-inspired, physics- and chemistry-based algorithms, and miscellaneous schemes. A search in the scientific and technical literature reveals a large number of proposed EA techniques, demonstrating their popularity in the research community for dealing with real-world problems [36][37][38].
In the case of health care systems, machine learning algorithms have also been explored. Interested readers can find a review in Reference [39]. Concerning specific diseases, we find, for example, the problem of distinguishing bacterial from viral meningitis through machine learning applied to a data set [40]. Nonetheless, we must bear in mind that in pervasive/ubiquitous computing environments users may evaluate trustworthiness by using historical data from their past interactions, and that some solution for detecting unfair recommendations must be implemented [41].
The classical approach for compartmental models in epidemiology follows the so-called SIR model by Kermack et al. [13,42,43,44], described by a system of differential equations, where a fixed population of individuals is divided into three categories, namely the susceptible (but not yet infected), the infectious and the recovered (with immunity) individuals, whose sizes vary in time. The small number of variables and parameters limits its applicability, and several improvements have been proposed [45][46][47]. Nonetheless, the additional degrees of freedom come at the cost of extra complexity, often requiring variables and parameters that are either not available in the real world or subject to considerable noise and uncertainty due to a variety of factors. Therefore, we can ask whether a computational scheme following the concepts of EA can overcome those problems and represent a valid alternative for modeling purposes.
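For reference, the SIR system mentioned above is commonly written in the following form. The symbols S, I and R (sizes of the three categories), N (total population), β (transmission rate) and γ (recovery rate) are notation assumed here for illustration and are not used elsewhere in this paper.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad S + I + R = N .
\end{aligned}
```

In this formulation the basic reproduction number is $R_0 = \beta / \gamma$, so that the infection initially grows only when $R_0 > 1$.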
Having these ideas in mind, this paper develops an EA for mimicking an epidemic diffusion within a population. As usual with these computational schemes, several simplifications are adopted, both to speed up the computational processing and to clarify the role of the distinct parameters and variables. In fact, we can add extra degrees of sophistication and detail, but at the expense of building a logical scheme whose results are more difficult to interpret given the plethora of cross effects. In spite of the simplification, we still have a considerable number of variables and scenarios, which makes it difficult to compare the results. In order to overcome this problem, we adopt the multidimensional scaling (MDS) technique [48][49][50][51]. The MDS is a numerical data processing technique that allows depicting in a low-dimensional locus results from a high-dimensional space. This strategy leads to a computational visualization that unravels the most important aspects embedded in the data.
The paper is organized as follows. Section 2 discusses the main ideas and formulates the structure of the proposed evolutionary epidemic diffusion (EED). Sections 3 and 4 discuss the results given by the EED and explore several possible configurations of the parameters and distinct scenarios. Section 5 analyses the results by means of MDS. Finally, Section 6 summarizes the main conclusions.

The Evolutionary Epidemic Diffusion Algorithm
The EED adopts a 2-dimensional square small-world (SW) with a population of N elements, so that the i-th individual is positioned at the coordinates $(x_i, y_i)$, i = 1, ..., N, inside the square. The population evolves in time t and in the 2-dimensional space according to the rules defined in the sequel. Two repositories outside the coordinates of the SW are considered, namely the 'hospital and house' (hereafter simply denoted as hospital) and the 'cemetery'. These two additional repositories hold the elements of the population that are found by the health care system to be infected or that passed away due to the action of the virus. The population is distributed along five categories: the 'healthy', 'infected' (but not in hospital or confined at home) and 'immunized', which move in time and space; the 'hospitalized', corresponding to infected people that are taken to the hospital or that are confined to some part of their house in case the health state is not too serious; and the 'dead', which are positioned in the 'cemetery'.
The population is initialized in two categories, 'healthy' and 'infected', with percentages $1 - p_1$ and $p_1$, respectively. Let us suppose that two individuals in the SW are randomly selected. If they are within a distance inferior to a given threshold D, called the 'social neighborhood', and if one is healthy and the other is infected, then the healthy individual becomes infected with probability $p_2$; otherwise, it stays in the previous state. Therefore, if $(x_i, y_i)$ and $(x_j, y_j)$ are the coordinates of the i-th and j-th individuals (i, j = 1, ..., N) and one is infected, then infection of the other occurs with probability $p_2$ if d < D, where the spatial distance d is defined as [52]:
$$d = \left( |x_i - x_j|^{\alpha} + |y_i - y_j|^{\alpha} \right)^{1/\alpha}. \tag{1}$$
An infected individual can remain as 'infected' with probability $p_3$ if not detected by the health care system (either due to logistic limitations or because of being asymptomatic). Nonetheless, an 'infected' individual can change to the categories 'hospitalized', 'dead' or 'immunized', as described in the follow-up. The hospital has a limited capacity of $N_h$ places. Therefore, an infected individual that is detected by the health care system can enter the hospital if, and only if, the number of patients $n_h$ does not surpass the limit, that is, $n_h \leq N_h$. Moreover, infected and hospitalized individuals can change to the category 'dead' with probabilities $p_4$ and $p_5$, respectively. The infected or hospitalized i-th individual changes to 'immunized' after being in that state for the period of time $t_i \geq T \pm \Delta T$, where $\Delta T$ stands for some random variation with a uniform probability distribution. 'Immunized' individuals re-enter the SW with randomly generated coordinates and cannot be infected again. Moreover, dead or immunized individuals do not transmit the infection to others.
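The pairwise infection rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `contact`, the string labels for the categories and the pluggable `dist` argument (defaulting to the Euclidean distance) are assumptions made for the example.

```python
import math
import random

HEALTHY, INFECTED = "healthy", "infected"

def contact(i, j, state, pos, D, p2, dist=math.dist, rng=random):
    """Apply the infection rule to the pair (i, j).

    Returns True if a new infection occurs. `state` is a mutable list of
    category labels and `pos` a list of (x, y) coordinates (assumed layout).
    """
    if {state[i], state[j]} != {HEALTHY, INFECTED}:
        return False                 # rule only applies to a healthy/infected pair
    if dist(pos[i], pos[j]) >= D:
        return False                 # outside the social neighborhood, d >= D
    if rng.random() < p2:            # infection happens with probability p2
        target = i if state[i] == HEALTHY else j
        state[target] = INFECTED
        return True
    return False
```

Passing another distance function through `dist` changes the geometry of the social neighborhood without touching the rule itself.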
The population located in the SW (i.e., 'healthy', 'infected' and 'immunized') changes position in space with a small increment $(\delta_x, \delta_y)$ generated randomly and independently with uniform distribution and zero mean. Therefore, the i-th individual evolves in space as $(x_i, y_i) \to (x_i + \delta_x, y_i + \delta_y)$ between two successive time iterations t and t + 1. In the follow-up, we consider that the maximum displacement values are identical (i.e., $\max |\delta_x| = \max |\delta_y| = \delta$). Therefore, two independent values $-\delta \leq \delta_x \leq \delta$ and $-\delta \leq \delta_y \leq \delta$ are generated, as long as the new coordinates $(x_i + \delta_x, y_i + \delta_y)$ remain in the SW; otherwise, a new set of random displacements is generated.
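The displacement rule above amounts to rejection sampling, which can be sketched as below. The function name `move` is hypothetical, and the SW is assumed to be the unit square with corners (0, 0) and (1, 1), as adopted later in the experiments.

```python
import random

def move(x, y, delta, rng=random):
    """Draw uniform displacements in [-delta, delta] until the new point
    falls inside the unit square, then return the new coordinates."""
    while True:
        nx = x + rng.uniform(-delta, delta)
        ny = y + rng.uniform(-delta, delta)
        if 0.0 <= nx <= 1.0 and 0.0 <= ny <= 1.0:
            return nx, ny            # accepted: the individual stays in the SW
        # rejected: draw a new pair of displacements
```

Near the borders more proposals are rejected, so individuals there tend to drift back towards the interior of the square.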
The Minkowski distance d leads to the Manhattan (or city), Euclidean and Chebyshev distances [53] for α = {1, 2, ∞}. In particular, we can simulate different environments in the SW, such as those with and without buildings, for α = 1 and α = 2, respectively.
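As a small worked example of the three special cases, the sketch below evaluates the Minkowski distance between two illustrative points. The function name `minkowski` and the sample points are assumptions chosen so that the results are easy to check by hand.

```python
import math

def minkowski(a, b, alpha):
    """Minkowski distance between 2-D points a and b; alpha = math.inf
    gives the Chebyshev distance."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if math.isinf(alpha):
        return max(dx, dy)
    return (dx ** alpha + dy ** alpha) ** (1.0 / alpha)

a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, 1)         # 3 + 4 = 7
euclidean = minkowski(a, b, 2)         # sqrt(9 + 16) = 5
chebyshev = minkowski(a, b, math.inf)  # max(3, 4) = 4
```

For the same pair of points the distance decreases as α grows, so a fixed threshold D covers a smaller neighborhood for α = 1 than for α = 2.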
Several issues must be considered:
• The SW is considered to have an isotropic structure, that is, without a special architectural or geophysical structure. Therefore, the choice of $\max |\delta_x| = \max |\delta_y| = \delta$ seems intuitive.
• The uniform probability distribution is often adopted in the scope of EA, merely for the sake of simplification. The adoption of other distributions, such as the Gaussian, or the use of conditional probabilities, for example, would require further assumptions that add extra complexity to the EED. Section 4 will discuss this preliminary assumption further.
• The Manhattan and Euclidean distances attempt to model distinct behaviors of the individuals moving in the SW, but not the propagation of the virus. In fact, individuals have a macroscopic nature, in opposition to the virus, which has a microscopic dimension and can propagate in a variety of ways, such as in the air by the wind or on the clothes and shoes of the host individuals. So, the city distance reflects the adoption of social attitudes such as crossing to the other side of the street when another individual approaches from the opposite direction.
In summary, the EA (i) starts with two categories, 'healthy' and 'infected', (ii) evolves in time and space with five categories, 'healthy', 'infected', 'hospitalized', 'immunized' and 'dead', and (iii) at the end includes only three categories, specifically the 'healthy', 'immunized' and 'dead'. Obviously, during the simulation the number of active individuals in the SW (i.e., those that are not 'hospitalized' or 'dead') decreases.
The overall description and assignment of variables and parameters is represented in the diagram of Figure 1. Some of the simplifications may be a matter of criticism, such as:
• A 2-dimensional SW with fixed Cartesian boundaries is considered
• The SW has a uniform structure, without including rivers, hills or other structures
• The rules of the EA are fixed in time
• Some additional modeling of the hospital, such as staff burnout or equipment limitation, is absent
• The effects of different clinical conditions and ages are identical for the whole population
• The total population decreases along time since no births or incoming visitors are included
• Hospitalized individuals can infect others, but no such case is considered
• No 'dead' individuals due to other causes are considered
Indeed, we can include extra rules to address these issues. However, we must bear in mind that EA are usually a simplification of the real world. This allows a faster calculation and a simpler and more assertive interpretation of the meaning of each parameter.

Exploring the Evolutionary Algorithm: A First Set of Experiments
Let us consider a population of N = 500 individuals with $p_1 = 0.1$ and $1 - p_1 = 0.9$ for infected and healthy, respectively, and simulate the SW during τ = 30 iterations. Therefore, the EED simulation starts with $n_i = 50$ 'infected' and $N - n_i = 450$ 'healthy' individuals. The SW is delimited by the coordinates (x, y) = (0, 0) and (x, y) = (1, 1), the infection distance is set to D = 0.07, and the maximum step in space (x, y) between two consecutive time iterations is δ = 0.1. Other parameters are set to $p_2 = 0.9$, $p_3 = 0.7$, $p_4 = 0.05$ and α = 2. The infection incubation period is set to T = 15 days with a variation ΔT = ±2 days, and the hospital capacity $N_h$ is considered unlimited. We consider low values for the mortality, as for the common flu or COVID-19, but not those of other diseases such as Ebola or Marburg [2,4]. Nonetheless, the adoption of such numerical values does not preclude the use of higher values for such probabilities. Moreover, we consider $p_4 > p_5$ to describe the positive influence of medical health care.
Figure 2 shows the time evolution of the 5 state variables during the period of t = 1, ..., τ iterations for this first scenario $S_1$. We verify a fast decrease of the number of 'hospitalized' and an increase in the 'immunized'. The number of 'infected' (not in hospital) has a slow but continuous reduction, and the 'dead' increase linearly until the 'immunized' reach the steady-state. Varying the value of T changes the delay before the variable 'immunized' starts to emerge. On the other hand, the main effect of increasing ΔT is to smooth the curve at the start and end of the immunization transient.
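To make the setup of this first scenario concrete, the sketch below assembles the rules of the previous section into a single loop using the parameter values listed above. It is only one possible reading of the EED, under several assumptions that the text does not fix: N random pair selections are made per iteration, a non-detected infected individual is admitted to hospital with probability $1 - p_3$ per iteration, the immunization time is counted from the moment of infection, the order of the steps within an iteration is arbitrary, and $p_5 = 0.02$ is borrowed from the later entropy experiment. The function and variable names are hypothetical.

```python
import math
import random

STATES = ("healthy", "infected", "hospitalized", "immunized", "dead")

def simulate(N=500, tau=30, p1=0.1, p2=0.9, p3=0.7, p4=0.05, p5=0.02,
             D=0.07, delta=0.1, T=15, dT=2, N_h=math.inf, alpha=2, seed=0):
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(N)]
    n_inf = round(p1 * N)
    state = ["infected"] * n_inf + ["healthy"] * (N - n_inf)
    # Iteration at which each sick individual becomes immunized (assumption).
    t_rec = [T + rng.uniform(-dT, dT) if s == "infected" else None for s in state]
    history = []

    def dist(a, b):
        dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
        return (dx ** alpha + dy ** alpha) ** (1.0 / alpha)

    for t in range(1, tau + 1):
        # 1) Random walk of everyone still inside the small world.
        for i in range(N):
            if state[i] in ("healthy", "infected", "immunized"):
                while True:
                    x = pos[i][0] + rng.uniform(-delta, delta)
                    y = pos[i][1] + rng.uniform(-delta, delta)
                    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
                        pos[i] = (x, y)
                        break
        # 2) Random pairwise contacts (N draws per iteration is an assumption).
        for _ in range(N):
            i, j = rng.sample(range(N), 2)
            if ({state[i], state[j]} == {"healthy", "infected"}
                    and dist(pos[i], pos[j]) < D and rng.random() < p2):
                k = i if state[i] == "healthy" else j
                state[k] = "infected"
                t_rec[k] = t + T + rng.uniform(-dT, dT)
        # 3) Death, immunization and hospital admission.
        n_h = state.count("hospitalized")
        for i in range(N):
            if state[i] not in ("infected", "hospitalized"):
                continue
            in_hospital = state[i] == "hospitalized"
            if rng.random() < (p5 if in_hospital else p4):
                state[i] = "dead"
                n_h -= in_hospital
            elif t >= t_rec[i]:
                state[i] = "immunized"
                pos[i] = (rng.random(), rng.random())  # re-enters the SW
                n_h -= in_hospital
            elif not in_hospital and rng.random() > p3 and n_h < N_h:
                state[i] = "hospitalized"              # detected, bed available
                n_h += 1
        history.append({s: state.count(s) for s in STATES})
    return history
```

A call such as `simulate()` returns one dictionary of category counts per iteration, and `simulate(N_h=100)` or `simulate(alpha=1)` gives variants in the spirit of the scenarios discussed next. Since the update order and contact rate are assumptions, the curves produced by this sketch need not match the figures quantitatively.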
The value of α = 2 represents a free space. Therefore, we consider a second scenario $S_2$ of a SW described by a distance with α = 1, while the rest of the parameters remain identical. Figure 3 depicts the time evolution of the 5 state variables. Comparing the two scenarios, we conclude that for α = 1 we have a larger number of 'healthy' and a smaller number of 'dead'.
In a third scenario $S_3$ we limit the hospital capacity (and house confinement capacity) to $N_h = 100$, that is, to 20% of the total population. The other parameters remain identical to those of the first scenario. Figure 4 shows clearly the effect of saturation in the 'hospitalized' time evolution and a clear increase in the final number of 'dead'.
In a fourth scenario $S_4$ we evaluate the influence of confining the individuals by reducing the distance threshold to D = 0.05, while the rest of the parameters remain identical to those of scenario $S_1$. Figure 5 shows the strong effect of this limitation, which leads to an increase of the 'healthy' and a decrease of the 'immunized' and 'dead'.
Let us consider four cost indices consisting of:
• $I_1$, the integral in time of the number of 'hospitalized' divided by the maximum value (i.e., N × τ)
• $I_2$, the maximum number of 'hospitalized' divided by the population maximum number (i.e., N)
• $I_3$, the number of 'dead' divided by the population maximum number (i.e., N)
• $I_4$, the integral in time of the total number of 'healthy' and 'immunized' divided by the maximum value (i.e., N × τ).
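The four indices listed above can be computed from the simulated time series as in the sketch below. It assumes that the time integral is approximated by a sum over the iterations with unit step, that $I_3$ uses the number of 'dead' at the end of the simulation, and that the series is stored as one dictionary of category counts per iteration; the function name `cost_indices` and this data layout are hypothetical.

```python
def cost_indices(history, N):
    """Return (I1, I2, I3, I4) for a list of per-iteration count dictionaries."""
    tau = len(history)
    I1 = sum(h["hospitalized"] for h in history) / (N * tau)
    I2 = max(h["hospitalized"] for h in history) / N
    I3 = history[-1]["dead"] / N                     # final number of dead
    I4 = sum(h["healthy"] + h["immunized"] for h in history) / (N * tau)
    return I1, I2, I3, I4

# Toy series with N = 10 individuals and tau = 2 iterations.
toy = [
    {"healthy": 8, "infected": 2, "hospitalized": 0, "immunized": 0, "dead": 0},
    {"healthy": 6, "infected": 1, "hospitalized": 2, "immunized": 0, "dead": 1},
]
I1, I2, I3, I4 = cost_indices(toy, N=10)
```

For the toy series this gives $I_1 = 2/20$, $I_2 = 2/10$, $I_3 = 1/10$ and $I_4 = 14/20$. All four indices lie in [0, 1]; lower values are better for $I_1$, $I_2$ and $I_3$, while a higher $I_4$ means more people remain available for normal activity.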
The indices $I_1$ and $I_2$ assess social and logistic issues, $I_3$ addresses mainly a social aspect and $I_4$ focuses on economic matters. Table 1 lists the parameters and the obtained values of the indices for the proposed scenarios. In all scenarios we find 4 phases:
• $P_1$, starting from two categories only, namely 'healthy' and 'infected', an initial transient with a fast decrease in the number of 'healthy' and an inverse behavior of the 'infected', 'hospitalized' and 'dead'
• $P_2$, a slow decrease or even a steady-state in the 'healthy' and a moderate increase of the 'infected', 'hospitalized' and 'dead'
• $P_3$, a fast increase of the 'immunized' and a slow decrease of the 'infected', 'hospitalized' and 'dead'
• $P_4$, a steady-state with individuals in three categories only, namely 'healthy', 'dead' and 'immunized'.
Let us now consider the influence of the control strategy based on the social neighborhood D upon the cost indices. Therefore, we simulate the EED for 0 ≤ D ≤ 0.1 and we consider the effect of the social neighborhood for the cases of α = 1 and α = 2. The rest of the parameters remain identical to those adopted in scenario $S_1$.
Figures 6-9 show the evolution of the four indices versus D for the two values of α. In all cases we verify that the cost increases significantly for D > 0.02 and stabilizes at a high value for D > 0.06. Furthermore, the value of α has some influence, and we verify that a SW with α = 1 slightly limits the propagation of the infection in relation to the case of a SW with α = 2.
The spatial distribution of the elements in the SW can also be measured. Let us adopt the Shannon entropy of the (x, y) space occupation by any type of individual in the SW, that is, 'healthy', 'infected' and 'immunized', since the 'hospitalized' and 'dead' are located outside the SW, namely at the 'hospital' and 'cemetery', respectively. The SW is subdivided into an r × r array of cells and the number of individuals in each cell is counted to obtain a 2-dimensional histogram.
The statistics of the spatial distribution versus time t are obtained by means of the 2-dimensional space entropy $H_{xy}$ given by [54][55][56][57][58]:
$$H_{xy} = -\sum_{i=1}^{r} \sum_{j=1}^{r} p(x_i, y_j) \ln p(x_i, y_j), \tag{2}$$
where $p(x_i, y_j)$ is the probability obtained for the (i, j)-th cell of the r × r matrix at a given time instant t.
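The histogram-based entropy described above can be sketched as follows, assuming a unit-square SW, the natural logarithm, and the usual convention that empty cells contribute zero. The function name `spatial_entropy` is hypothetical.

```python
import math

def spatial_entropy(points, r):
    """Shannon entropy of the occupation of an r x r grid over the unit square."""
    counts = [[0] * r for _ in range(r)]
    for x, y in points:
        i = min(int(x * r), r - 1)   # points on the upper border fall in the last cell
        j = min(int(y * r), r - 1)
        counts[i][j] += 1
    n = len(points)
    H = 0.0
    for row in counts:
        for c in row:
            if c:                    # empty cells contribute 0 (limit of p ln p)
                p = c / n
                H -= p * math.log(p)
    return H
```

The entropy is zero when all individuals share one cell and reaches its maximum, $\ln(r^2)$, when they are spread evenly over all cells.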
Figure 10 illustrates the evolution of the individuals in the SW for t = {1, 5, 10, 20} and Figure 11 depicts the evolution of $H_{xy}$ versus time t, for r = 20 and α = {1, 2}. In all cases we consider N = 500, D = 0.05, δ = 0.1, $p_1 = 0.1$, $p_2 = 0.9$, $p_3 = 0.7$, $p_4 = 0.05$ and $p_5 = 0.02$. We observe clearly (i) the decrease of the 'healthy' and the increase of the 'infected' when passing from t = 1 to t = 5, (ii) the disappearance of the 'infected' and the emergence of the 'immunized' when going to t = 17 and (iii) the steady-state at t = 25 with 'healthy' and 'immunized'. The space entropy $H_{xy}$ also reflects clearly the transient and the difference between the initial condition and the steady-state.

Exploring the Evolutionary Algorithm: A Second Set of Experiments
In the previous scenarios no available vaccine or treatment is considered. Therefore, the 'immunized' variable is an important issue against any new surge of infection. To illustrate this problem, let us consider the 5th and 6th scenarios, $S_5$ and $S_6$, with a disturbance of 50 recently 'infected' visitors entering the SW at t = 25 with coordinates (x, y) generated randomly. Scenarios $S_5$ and $S_6$ follow the conditions of scenarios $S_1$ and $S_4$ (i.e., D = 0.07 and D = 0.05), respectively, for the rest of the parameters, with the exception of the simulation period, which is extended to τ = 50.
Figures 12 and 13 show the immediate effect on all variables. However, in scenario $S_6$ the number of 'immunized' is lower at t = 25 and, consequently, the propagation of the perturbation is much stronger.
The public feeling of risk as a consequence of known critical cases is a variable that we can consider and implement in the EED. Therefore, we consider a scenario $S_7$ where a disturbance is simulated by reducing the social neighborhood to D = 0.01 for t ≥ 4. Figure 14 shows clearly the reduction of 'hospitalized', 'dead' and 'immunized' and, correspondingly, the improvement in the cost functions.
During the outbreak some medical drug for an efficient treatment may be discovered and administered to the whole population, both hospitalized and non-hospitalized. To simulate this scenario $S_8$ we consider a disturbance at t ≥ 6, where the probabilities of death are set to the values $p_4 = p_5 = 0.01$. Figure 15 shows again a remarkable improvement.
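Disturbances such as those of scenarios $S_7$ and $S_8$ amount to making some parameters depend on time. One simple way to express this, sketched below, is a schedule that maps an iteration t to the parameter values in force. The helper `scheduled`, the dictionary layout and the baseline value $p_5 = 0.02$ (taken from the entropy experiment) are assumptions for illustration.

```python
def scheduled(base, changes):
    """Return a function t -> parameters, where `changes` maps a starting
    iteration t0 to the values that override `base` for all t >= t0."""
    def params_at(t):
        params = dict(base)
        for t0 in sorted(changes):
            if t >= t0:
                params.update(changes[t0])
        return params
    return params_at

base = {"D": 0.07, "p4": 0.05, "p5": 0.02}
s7 = scheduled(base, {4: {"D": 0.01}})                # public reaction from t = 4
s8 = scheduled(base, {6: {"p4": 0.01, "p5": 0.01}})   # treatment from t = 6
```

Inside a simulation loop, the parameters for iteration t would then be read from `s7(t)` or `s8(t)` instead of from fixed constants.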
Another issue to consider is the initialization of the EED. In the scenarios adopted so far, the simulation starts with a considerable number of 'infected', namely with 50 individuals in a total of 500. To evaluate the effect of starting with a smaller number, we consider 5, 10 and 20 'infected' (see the corresponding figures), which define the scenarios $S_9$, $S_{10}$ and $S_{11}$, respectively. We verify clearly a smoother variation of the curves and an improvement in the cost indices, demonstrating the importance of an early detection of the 'infected'.

Multidimensional Scaling Analysis of the Evolutionary Algorithm
The results given by the EED have a probabilistic nature and vary slightly in each run. Moreover, we have a large number of variables, which makes it difficult to compare the results. In order to overcome these problems, we adopt the multidimensional scaling (MDS) technique [48][49][50][51] to condense the results in a single representation. In short, the MDS represents in a low-dimensional space (i.e., with dimension q) data sets from a higher-dimensional space (i.e., with dimension p > q), while preserving their main characteristics and highlighting the key issues. To achieve that goal, the MDS computational algorithm requires two phases. In the first phase, the user defines a given distance d for comparing the set of $M \in \mathbb{N}$ items in a p-dimensional space [52,53]. Then, all items are compared by calculating a matrix $\Delta = [d_{ij}]$, i, j = 1, ..., M, with $d_{ij} = d_{ji}$ and $d_{ii} = 0$, of item-to-item distances in the p-dimensional space. In the second phase, the MDS performs successive iterations trying to find a configuration of items in the q-dimensional space that approximately reproduces the original distances. Usually the MDS adopts a numerical strategy for minimizing some kind of least squares index, often called stress. Several distinct types of distances can be used. Also, since we are dealing with relative measurements (i.e., distances), the representation in the q-dimensional space is read in terms of clusters and can be rotated, magnified or shifted, since the axes have no physical interpretation. Often the dimensions q = 2 or q = 3 are adopted since they allow a straightforward visualization [59][60][61].
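As an illustration of the least squares index mentioned above, one common choice is the raw stress below. The symbols $z_i \in \mathbb{R}^q$ (the coordinates of the i-th item in the low-dimensional space) and σ are notation assumed here; the text does not state which stress variant is used.

```latex
\sigma(z_1, \ldots, z_M) = \sum_{i < j} \left( d_{ij} - \lVert z_i - z_j \rVert \right)^2
```

The stress vanishes only when the low-dimensional configuration reproduces every original distance $d_{ij}$ exactly, and it is unchanged by rotating, reflecting or translating the configuration, which is why the MDS axes carry no physical meaning.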
We consider data sets produced by the EED with vectors $v_i(n, t)$, n = 1, ..., 5 and t = 1, ..., 30, for the five variables and time of simulation, respectively. For calculating the distances between items we adopt the Canberra and Lorentzian metrics, $d_C$ and $d_L$, to assess the dissimilarity between pairs:
$$d_C(t_i, t_j) = \sum_{n=1}^{5} \frac{|v_i(n, t) - v_j(n, t)|}{|v_i(n, t)| + |v_j(n, t)|}, \tag{3}$$
$$d_L(t_i, t_j) = \sum_{n=1}^{5} \ln\left(1 + |v_i(n, t) - v_j(n, t)|\right), \tag{4}$$
where $t_i$ and $t_j$ denote two time instants, so that $t_i, t_j \in \{1, \ldots, 30\}$. Therefore, in this case the matrices Δ have dimension 30 × 30, for both distances. Additionally, we consider 10 runs to evaluate the probabilistic effect and 3 scenarios for the social neighborhood D = {0.025, 0.05, 0.075}. We overlap the 10 MDS loci using the Procrustes technique [62,63]. The Procrustes processing between two matrices $\Delta_1$ and $\Delta_2$ of identical dimension determines a linear transformation (i.e., a translation, reflection, orthogonal rotation and scaling) of the points in matrix $\Delta_2$ to best conform them to the points in matrix $\Delta_1$.
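The first MDS phase for the two metrics named above can be sketched as below, using their standard textbook definitions. The sketch assumes that each item is the vector of the five state variables at one time instant and that Canberra terms with a zero denominator are skipped, a common convention that the text does not specify. The function names are hypothetical.

```python
import math

def canberra(u, v):
    """Canberra distance; terms where both components are zero are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(u, v) if a != 0 or b != 0)

def lorentzian(u, v):
    """Lorentzian distance: sum of ln(1 + |a - b|) over the components."""
    return sum(math.log(1.0 + abs(a - b)) for a, b in zip(u, v))

def distance_matrix(items, metric):
    """Symmetric M x M matrix of item-to-item distances with zero diagonal."""
    M = len(items)
    return [[metric(items[i], items[j]) for j in range(M)] for i in range(M)]
```

Applied to 30 vectors, one per time instant, `distance_matrix` yields the 30 × 30 matrix that is then passed to an MDS routine. The Canberra distance weights each component by its relative change, whereas the Lorentzian distance compresses large absolute differences through the logarithm.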
We verify clearly the emergence of the phases $P_i$, i = 1, ..., 4, observed in the first group of experiments. Nonetheless, as usual with MDS, the two distances capture slightly distinct aspects of the time series in what concerns the three distinct scenarios for D. The Canberra distance highlights the distinct initial transients, while the crossover instants (t ≈ 5 and t ≈ 13) between phases and the steady state have close characteristics. On the other hand, the Lorentzian distance indicates that the three scenarios have a similar topological form, differing merely by a scale factor.
We also test the EED for the collection of scenarios $S_i$, i = 12, ..., 20, corresponding to the combination of three values of the social neighborhood D = {0.025, 0.05, 0.075} and three values of the hospital capacity $N_h$ = {200, 50, 20}. The averages of the cost indices $av(I_j)$, j = 1, ..., 6, for 10 independent runs of each scenario, are listed in Table 3. We can also use the MDS to visualize the information using the Canberra and Lorentzian metrics, $d_C$ and $d_L$, that in this case are written as:
$$d_C(s_i, s_j) = \sum_{m=1}^{4} \frac{|u_i(m, t) - u_j(m, t)|}{|u_i(m, t)| + |u_j(m, t)|}, \tag{5}$$
$$d_L(s_i, s_j) = \sum_{m=1}^{4} \ln\left(1 + |u_i(m, t) - u_j(m, t)|\right), \tag{6}$$
where the vectors $u_i(m, t)$, m = 1, ..., 4, and s = 1, ..., 90, include the values of the 4 cost indices for the nine scenarios with 10 runs each. Therefore, in this case each matrix Δ has dimension 90 × 90.

Conclusions
This paper presented an EA that simulates the propagation of a virus infection in a SW. The EED includes a limited number of variables and parameters, but characterizes a large number of possible scenarios and allows a fast computational simulation. In fact, the extension of the EED to include additional factors present in the real world is reasonably straightforward, but limiting the EED complexity follows the usual strategy of developing EA with focus on the most important variables and parameters. Several scenarios were tested and the results follow the common intuition about the diffusion of infectious diseases. The system, involving several variables and parameters, was assessed by several tools and performance indices. The MDS technique proved to be a valuable tool to condense the multidimensional nature of the information. In fact, this technique also takes advantage of present-day computational resources and allows unraveling patterns embedded in complex data and drawing conclusions that would otherwise require laborious schemes. In summary, the EED represents a valuable computational tool for the development of strategies for simulating, controlling and foreseeing future results of virus outbreaks.

Figures 21 and 22 show the MDS loci for q = 3 and the Canberra (5) and Lorentzian (6) distances, respectively. The loci reveal clearly that the pairs of scenarios {$S_{12}$, $S_{15}$}, {$S_{16}$, $S_{17}$} and {$S_{19}$, $S_{20}$} have some similarities. On the other hand, the scenarios $S_{13}$, $S_{14}$ and $S_{18}$ each have a very distinct behavior. These results show that the combination of the parameters D and $N_h$ is of utmost importance.

Table 2 lists the parameters and the obtained values of the indices for the seven new scenarios.