Cross-entropy method in application to SIRC model

The study considers the usage of a probabilistic optimization method called Cross-Entropy (CE). This is the version of the Monte Carlo method created by Reuven Rubinstein (1997). It was developed in the context of determining rare events. Here we will present the way in which the CE method can be used for problems of optimization of epidemiological models, and more specifically the optimization of the SIRC (Susceptible - Infectious - Recovered - Cross-immune) model based on the functions supervising the care of specific groups in the model. With the help of weighted sampling, an attempt was made to find the fastest and most accurate version of the algorithm.

In this study our aim is to develop the possibilities of numerically solving variational problems that appear in epidemic dynamics models. The algorithm that we will use for this purpose will be a modified Monte Carlo method. In a few sentences, we recall those stages of the Monte Carlo method development and its improvement, which constitute the essence of the applied approach (cf. Martino et al. [1, in sec.

1.1]).
Before constructing the computer, the numerical analysis of deterministic models used approximations that aimed, among other things, at reducing and simplifying the calculations. The ability to perform a significantly larger number of operations in a short time led Ulam (cf. Metropolis [2]) to the idea of transforming a deterministic task to an appropriate stochastic task and applying the numerical analysis using the simulation of random quantities of the equivalent model (cf. Metropolis and Ulam [2],). The problem Ulam was working on was part of a project headed by von Neumann, who accepted the idea, and Metropolis gave it the name of the Monte Carlo Method [1, in sec. 1.1].
The resulting idea allowed us to free oneself from difficult calculations, replacing them with a large number of easier calculations. The problem was the errors that could not be solved by increasing the number of iterations. Work on improving the method consisted in reducing the variance of simulated samples, which led to the development of the Importance Sampling (IS) method in 1956 (cf. Marshall [4] ).
Computational algorithms in this direction are a rapidly growing field. Currently, the computing power of computers is already large enough, to provide an opportunity to solve problems that can't be solved analytically. However, some problems and methods still require a lot of power. A lot of research is devoted to finding or improving such methods. The development of this field is currently very important.
There are many models describing current problems. Not all offer the possibility of an analytical solution. That is why various studies are appearing to find best methods for accurate results. the Cross-Entropy method helps to realize IS and it may, among other things, be an alternative to current methods in problems in epidemiological models.

Sequential Monte Carlo methods.
The CE method was proposed as an algorithm for estimation of probabilities of rare events. Modifications of cross entropy to minimize variance were used for this by Rubinstein in his seminal paper [5] in 1997. This is done by translating the "deterministic" optimization problem in the related "stochastic" optimization and then simulating a rare event. However, to determine this low probability well, the system would need a large sample and a long time to simulate.

Importance sampling.
Therefore, it was decided to apply the importance sampling (IS), which is a technique used for reducing the variance. The system is simulated using other parameter sets (more precisely, a different probability distribution) that helps increase the likelihood of this occurrence. Optimal parameters, however, are very often difficult to obtain. This is where the important advantage of the CE method comes in handy, which is a simple procedure for estimating these optimal parameters using an advanced simulation theory. Sani and Kroese [6] have used the Cross-Entropy method as the support of the importance sampling to solve the problem of the optimal epidemic intervention of HIV spread. The idea of his paper will be adopted to treat the SIRC models, which has a wide application to modeling the division of population to four groups of society members based on the resistance to infections (v. modeling of bacterial Meningitis transmission dynamics analyzed by Asamoah et al. [7], Vereen [8]). The result will be compared with an analytical solution of Casagrandi et al. [9].

Cross-Entropy method.
This part will contain information on the methods used in the paper. The Cross-Entropy method developed in [5] is one of the versions of the Monte Carlo method developed for problems requiring the estimation of events with low probabilities. It is an alternative approach to combinatorial and continuous, multi-step, optimization tasks. In the field of rare event simulation, this method is used in conjunction with weighted sampling, a known technique of variance reduction, in which the system is simulated as part of another set of parameters, called reference parameters, to increase the likelihood of a rare event. The advantage of the relative entropy method is that it provides a simple and fast procedure for estimating optimal reference parameters in IS.
Cross entropy is the term from the information theory (v. [10], [11,Chapter 2]). It is a consequence of measuring the distance of the random variables based on the information provided by their observation. The relative entropy, the close idea to the cross entropy, was first defined by Kullback and Leibler [12] [13]) for two distributions p, q and is known also as Kullback-Leibler distance. This idea was studied in detail by Csiszár [14] and Amari [15]. The application of this distance measure in the Monte Carlo refinements is related to a realization of the importance sampling technique.
The CE algorithm can be seen as a self-learning code covering the following two iterative phases: 1. Generating random data samples (trajectories, vectors, etc.) according to a specific random mechanism. 2. Updating parameters of the random mechanism based on data to obtain a "better" sample in the next iteration. Now the main general algorithm behind the Cross-Entropy method will be presented on the examples.

Application to optimization problems.
This section is intended to show how extensive the CE method is. The examples will show the transformation of the optimization problems to the Monte Carlo task of estimation for some expected values. The equivalent MC task uses the CE method. Other very good examples can be found in [16].
The first variational problem presented in this context is the specific multiple stopping model. Consider the so-called secretary problem (v. Ferguson [17] or the second author's paper [18]) for multiple choice, i.e. the issue of selecting the best proposals from a finite set with at most k attempts. There is a set with N objects numbered from 1 to N. By convention, the object with the number 1 is classified as the best and the object with the number N as the worst. Objects arrive one at a time in a random order. We can accept this object or reject it. However, if we reject an object, we cannot return. The task is to find such an optimal stopping rule that the sum of the ranks of all selected objects is the lowest (their value is then the highest). So we strive for a designation inf Let a i be the rank of the selected object. The goal is to find a routine for which the value of Epa τ 1`¨¨¨`a τ k q for k ě 2, is the smallest. More details are moved to A.1. Next, the problem was transformed to the minimization of the mean of sum of ranks. The details are presented in the section A.2. This problem and its solution by CE was described by Polushina [19]. Its correctness was checked and programmed in a different environment by Stachowiak in [20].
The second example is presented in details in Section A.3. It is a formulation of the vehicle routing problem (v. [21], [22]). This is an example of stochastic optimization where the Cross-Entropy is used.

Goal and organization of this paper.
In the following parts of the paper, we will focus on models of the dynamics of the spread of infection over time when the population consists of individuals susceptible to infection, sick, immune to vaccination or past infection, and partially susceptible i.e. the population is assigned to four compartments of individuals. Actions taken and their impact on the dynamics of the population are important. It is precisely the analysis of the impact of the preventive diagnosis and treatment that is particularly interesting in the model. How the mathematical model covers such actions is presented in Section 2.1. The model created in this way is then adapted for Monte Carlo analysis in 2.2 and the results obtained on this basis are found in 2.3. The analysis of computational experiments concludes this discussion in 3.

Optimization of control for SIRC model.
Let us focus the attention on the introduction of SIRC model and presentation of the logic behind using the CE method to determine functions that optimize the spread of epidemics. The ideology behind creation of the SIRC model and its interpretation will be presented in order to match the right parameters for calculating the cost functions.
The CE method here is used to solve the variational deterministic problem. Solving such problems has long been undertaken with the help of numerical methods with a positive result. In the position [23] proposals of the route for variational problems for optimization problems in time are presented. The main focus was on the non-convex problems, which often occur with optimal control problems. In the book by Glowinski [24] a review of the methods for solving variational problems was made. And then [25] presented a complicated method of approximating functions for the problem optimization.

SIRC model.
The subject of the work is to solve the problem optimizing the spread of the disease, which can be modeled with the SIRC model. This model was proposed by Casagrandi et al. [9]. Its creation was intended to create a better model that would describe the course of influenza type A. Contrary to appearances, this is an important topic because the disease, although it is widely regarded as weak, is a huge problem for healthcare. In the US alone, the cost associated with the influenza epidemic in the season is estimated to exceed USD 10 billion (v. [26]) and the number of deaths is over 21,000 per season (v. [27]).
Various articles have previously proposed many different mathematical models to describe a pandemic of influenza type A. An overview of such models was made by Earn [28]. In general, it came down to combining SIR models with the use of cross-resistance parameters (cf. details in the papers by Andreasen et al. [29] and Lin et al. [30]). The authors of the article write that the main disadvantage of this approach is the difficulty of the analysis and calculations with a large number of strains. The results of their research showed that the classic SIR model cannot be used to model and study influenza epidemics. Instead, they proposed the extension of the SIR model to a new class C that will simulate a state between a susceptible state and a fully protected state. This extension aims to cover situations where vulnerable carriers are exposed to similar stresses as they had before. As a result, their immune system is stronger when fighting this disease.
The SIRC model divides the community into 4 groups: ‚ S -persons susceptible to infection, who have not previously had contact with this disease and have no immune defense against this strain ‚ I -people infected with the current disease ‚ R -people who have had this disease and are completely immune to this strain ‚ C -people partially resistant to the current strain (e.g. vaccinated or those who have had a different strain) Figure 1 contains a general scheme of this model. The four rhombus represent the four compartments of individuals, the movement between the compartments is indicated by the continuous arrows. The main advantage of this model over other SIR users is the fact that after recovering from a given strain, in addition to being completely resistant to this strain, they are also partly immune to a new virus that will appear later. This allows you to model the resistance and response of people in the group to different types of disease. Parameters α, δ i γ can be interpreted as the reciprocal of the average time spent by a person in order of ranges I, R, C. Parameters µ with indices S, I, R, C represent the natural mortality in each group, respectively. In some versions of the model, additional mortality rates are considered for the group of infected people. Here we assume that this factor is not affected by the disease and we will denote it as µ in later formulas (µ S " µ I " µ R " µ C " µ). The next one σ is the likelihood of reinfection of a person who has cross-resistance while the parameter β describes the contact indicator.
The SIRC model is represented as a set of four ordinary differential equations. Let Sptq, Iptq be the number of people in adequate compartments. We have (v. [9]) with initial conditions

Derivation of optimization functions
An approach to optimize this model was proposed by Iacoviello and Stasio [31]. In this article one can find the suggestion concerning performing the calculations as outlined by Zaman et al. in [32]. Two functions were proposed in the approach as the parameter or controls. One relates to the susceptible people and the other describes the number of sick people. The method described by Kamien and Schwartz [33] was used to determine the optimal solution. In order to apply it, the set of equations (1) should be updated accordingly. After these modifications, it has the following form: where gpSptq, uptqq " ρ 1 Sptquptq, uptq represent the percentage of those susceptible who have been taken care of thanks to using control on population and vptq represent the percentage of infected people with the same description, respectively. ρ 1 and ρ 2 are weights that optimize the proportion of given control options. Both functions gpSptq, uptqq hpIptq, vptqq and in order they can be interpreted as actions performed on people susceptible to the disease and infected. In addition, two new conditions related to optimization functions are added to the initial conditions.
They describe the limits on what part of the population care can be given to. The smallest values u min and v min is zero. The existence of the solution is shown by Iacoviello et al. [31] and they are functions that minimize the following cost index: α 1 , α 2 are used to maintain the balance in the susceptible and infected group. τ 1 , τ 2 is interpreted as weighting in the cost index. t 1 , t 2 determine the time interval. The square next to the functions uptq and vptq indicates increasing intensity of functions [32]. Thus the function Jpu, vq reflects the human value of susceptible and infected persons, taking into account the growing value of funds used over a specified period of time.
The objective function can be changed as long as there is a minimum. The CE method does not impose any restrictions here. However, to compare the results with [31], the function proposed by them is used also here. There, the objective function must be square in order for the quadratic programming techniques to be used. Usually quadratic function is good in mathematics, but not in practice. Changing the objective function and impact of it can be considered further.

Optimization of the epistemological model by CE method.
Sani and Kroese [6] present the way in which the Cross-Entropy method can be used to solve the optimization of the epistemological model consisting of ordinary differential equations. The main task was to minimize the objective function Jpuq depending on one optimization function uptq over a certain set U consisting of continuous functions u. The minimum can be saved as: Parameterizing the minimum problem looks as follows: where u c is a function from U, which is parameterized by a certain control vector c P R m . Collection C is a set of vectors c. It should be noted that the selected set C should be big enough to get the enough precise solution. Then in such a set there are such c˚which will be the optimal control vector and γ˚-optimal value, respectively. One way to parameterize a problem is to divide the time interval rt 1 , t 2 s at small intervals and using these intervals together with control vector points c to define a function u c . This function can be created by interpolating between points. Such interpolation can be done using e.g. the finite element method, finite difference method, finite volume method or cubic B-Spline . Fang et al. in [34] and Caglar et al. in [35] show that all of this methods can be used to the two-point boundary value problems. Here the FEM method was used (v. [36]). The results for the other methods have not been checked. The choice of the approximation method is a difficult and interesting issue, but this is not what is consider here. With this function u c , it takes form where k i ptq " Now when u c is represented as a parameterized function, the value can be countedJpu c q by solving the system of ordinary differential equations (3) e.g. by the Runge-Kutta method. The idea of using the CE method in this problem is to use a multi-level algorithm with generated checkpoints c i , here, from a normal distribution. The CE method does not impose the distribution. Usually it is chosen from the family of density which is expected. Here, it's expected that there will be one high peak at the peak of the epidemic. Hence the normal distribution. Then, update the distribution parameters using the CE procedure using the target indicator as a condition for selecting the best samples. The calculations are carried out until the empirical variance of optimal control function is smaller than the given . This means that the function values are close to the optimal expected value The idea of applying the CE method to the SIRC model is similar. The only difficulty is the fact that when optimizing the SIRC model there are two functions instead of one as in the case of the algorithm described here. In the form of an algorithm, it looks as follows: Algorithm ( Modification of the Sani and Kroese's algorthm (v. [6]) to two optimal functions case):    (6) is adopted.

Description of the numerical results.
This section will present the results obtained using the Cross-Entropy method. For the correct comparison the same parameters for the SIRC model were used. These values are: For optimization functions uptq and vptq the following restrictions have been applied: 0 ď uptq ď 0.9, 0 ď vptq ď 0.9 Restrictions were proposed by Lenhart and Workman [37]). Their value was explained by the fact that the whole group cannot be controlled. In addition, the weights used for functions have values ρ 1 " 2 and ρ 2 " 2 Now, all that's left is to propose variable values for the objective function: The parameter values are adjusted using historical data like parameter mortality rate µ, which is counted as average lifetime of a host or parameters α, δ, γ mean the inverted time of belonging to the group I, R and C. A detailed description of determining parameters along with their limitations is presented in the article by Casagrandi et al. [9].

A remark about adjusting the parameters of the control determination procedure.
Here these parameters are given and describes the influenza A epidemics well, because of long historical data sets. However, this may not be the case, and then these parameters are subject to adjustment in the control determination procedure. An overview of applications and accuracy of calibration methods, which can be used for it is presented in the article by Hazelbag et al. [38], which provides an overview of the model calibration methods. All parameters in the model are constant and independent of time, so methods which try to optimizes a goodness-of-fit (GOF) can be use to this example. GOF is a measure that assesses consistency between model the output and goals. As a result, it gives the best combination of parameters. Examples of such methods are Grid search, Nelder-Mead method, the Iterative, descent-guided optimisation algorithm, Sampling from a tolerable range. After finding the appropriate parameters, other algorithms like the profile likelihood method or Fisher information can be used to calculate the confidence intervals for these coefficients. If the epidemic described by the model consists of transition probabilities that cannot be estimated from currently available data, calibrations can be performed to many end points. Then GOF is measured as the mean percentage deviation of the values obtained at the endpoints (v. [39]).

Proposed optimization methods in the model analysis.
The results will be presented for two moments in the model: at the beginning and in the middle of the epidemic. This is initialized with other initial parameters. For the beginning of the epidemic, they are as follows: And for the widespread epidemic: The values in the model have been normalized for the entire population, and they add up to 1. The results for both models without the use of controls are shown in 2. They were obtained using the Runge-Kutty method. The cost index in this case was in order 0.00799 and 0.00789 for the model of the beginning and the development of an epidemic. In Sections 3.3 and 3.4 two ways of calculating optimal functions uptq and vptq for the SIRC model will be considered. In the first version, the functions will be calculated separately. In the next both uptq and vptq will be counted at the same time. All versions will be presented at two moments of the epidemic: when it began and when it has already spread. The results will be compared with those obtained in the article by Iacoviello et al. [31], where the problem was solved using the sequential quadratic programming method using a tool from Matlab. Unfortunately, the article does not specify the cost index value for solving the obtained sequential quadratic programming method. Therefore, in the paper, there were attempts to recreate the form of the function uptq and vptq and the cost index obtained for them (0.003308 for the first situation and 0.006489 for the second one) and the model solution.
As can be seen in Figure 3 in the first situation, the control functions helped reduce peak infection, which was previously without control ( Figure 2) occurred between 2 and 4 months. Previously, the largest peak was around 20% occurred between 2 and 4 months. Previously, the largest peak was around 10%. In the case of the second situation, the peak also decreased, but not very clearly (it is also visible in the value of the cost index, which for the first situation is 0.003308, and for the other one 0.006489). It can be seen that timely control is also very important to reduce the harmful effects of an epidemic. For more conclusions on how to apply control properly in this model, see [31].  In the first method, the algorithm shown at the end of the section 2.3 has been applied twice. Once for the calculation of the optimal function uptq with a fixed vptq, and then vptq with a fixed uptq. In the next step, the results were combined and the result for the SIRC model was calculated using these two functions. This approach is possible due to the lack of a combined restriction on uptq and vptq. Features are not dependent on each other with resources.
The cost index was 0.003086 for the start of epidemic and 0.006443 a developed epidemic. The figure 4 show the graphs of functions uptq and vptq along with a graph of solutions for both moments during the epidemic. For functions uptq the results are similar, only the decrease here is more linear. However, the solution of the SIRC model gives the same results. Function vptq is much more interesting. The CE method showed that it should have very low values, even close to 0 throughout the entire period in both cases. However, this does not affect the cost index (it is even smaller than in the previous case) and the SIRC model. More on this subject will be included in the summary.

CE method version 2.
In the second case, it was necessary to change slightly the algorithm from Section 2.3 to be able to use it for two control functions. In point 1 of the algorithm a variable is generated C m " Npµ k , σ 2 k q, which serves to designate u C m , where µ k , σ k are requested by the user. Here 2 µ will need to be initialized µ  This section gathers the results from each version and compares them with each other. At first, the behavior of each group of the SIRC model was looked at and compared with the solution of the uncontrolled model. This is shown in Figures 6,7,8,9. The results are presented only for the first situation, where the epidemic begins to spread, because there you can see the results of using control much better.   Table 1). This is probably due to the high value of the function vptq, which increases the cost index. The question remains why there is such a discrepancy between the obtained function values vptq with both methods. In this article [31] weight and impact of functions uptq and vptq were compared. The chart in the article compares the results obtained for the application of both functions, one at a time and none. It turned out that the main influence on the number of infected has control over susceptible persons, i.e. the function uptq. Choosing the optimal values for vptq is not very important for the solution of the model. Therefore, such discrepancies are possible when choosing optimal values. The table 1 contains a comparison of the cost index values and the duration of the algorithms for each method. The first value is when the epidemic started, and the second is when it has already developed. The article does not specify the algorithm time of the method used. It can be seen that for each version the cost index values came out very similar. The differences can be seen in time. This is because in the first version of the CE method a smaller sample was simulated because both functions were considered separately. Fortunately, the functions were not closely related and it was possible to consider them separately.

Summary.
The study examined the possibility of using the CE method to determine the optimal control in the SIRC model. Similar models are being developed to model the propagation of malicious software in computer networks (v. Taynitskiy et al. [40]). However, the number of works with this approach is still relatively small, although its universality should encourage checking its effectiveness. There are two control functions in the model, which when properly selected optimize the cost function. The CE method was used in two versions: considering the two functions separately and together. The results were then compared with those obtained by Iacoviello and Stasio [31] using routing of Matlab 1 . The Cross-Entropy method had little problems when considering two functions at the same time. Due to the large number of possibilities, it was necessary to simulate a large sample, which significantly extended the algorithm's time. The second way came out much better. However, it only worked because the functions were not dependent on each other. In the opposite situation there is also a possibility to get result via CE results however, it will be necessary to change the algorithm and the dependence of how the functions depend on each other and add this to the algorithm when initiating optimization functions in the code. The same situation applies to changing the objective function and the density used for approximation. Changing the objective function gives the possibility to use other methods. If it stays quadratic and the constraints are linear, quadratic programming techniques like the sequential quadratic programming method can 1 Matlab is a paid environment with a number of functions. Unlike other environments and programming languages, additional features are created by developers in a closed environment and can only be obtained through purchase. still be used. If the objective function and the constraints stays convex, we use general methods from convex optimization. The CE method is based on well-known and simply classical rules and is therefore quite problem-independent. There are no rigid formulas and therefore it requires consideration for each problem separately. So it is very possible to reconsider it in another way, which can be interesting for further work. The CE method may also facilitate the analysis of modifications of epidemiological models related to virus mutation or delayed expression (v. Gubar et al. [41], Kochańczyk et al. [42]).
Author Contributions: KSz gives an idea of application CE to SIRC model; MKS realizes implementation and simulation.
Funding: KSz are thankful to Faculty of Pure and Applied Mathematics for research funding.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this article:

CE
Cross Entropy Method (v. page 2) FEM the finite element methods (v. page 7) GOF a goodness-of-fit (v. page 9) IS the importance sampling (v. page 2) SIR The SIR model is one of the simplest compartmental models. The letters means the number of Susceptible-Infected-Removed (and immune) or deceased individuals. (v. [43], [44]). SIRC The SIR model with the additional group of partially resistant to the current strain people: Susceptible -Infectious -Recovered -Cross-immune (v. page 4).

Appendix A. Optimization by the method of cross-entropy.
Appendix A.1. Multiple selection to minimize the sum of ranks.
Let pa 1 , . . . , a N q be a permutation of integers p1, 2, . . . , Nq, where all permutations are equally probable. For every i " 1, 2, . . . , N Y i will be the number of values a 1 , . . . , a i , which are ď a i . Y i " cardt1 ď j ď i : a j ď iu is the relative rank of the i-th object. The decision-maker observe the relative ranks and the realite ranks define filtration: F i " σtY 1 , Y 2 , . . . , Y i u. Let S be the set of Markov time with respect to tF i u where τ " pτ 1 , . . . , τ k q, τ i P S.
We want to find the optimal procedure τ˚" pτ1 , . . . , τk q and the value of the optimization problem v. Let F pmq i be σ-algebra generated by pY 1 , . . . , Y m i q. If we accept The problem is reduced to the problem of stopping sequences multiple times Z pmq k . As shown in [45] and [46] the solution to the problem is the following strategy: If there are integer vectors on the set F i´1 " tω : τ1 " m 1 , . . . , τi´1 " m i´1 u. i " 2, . . . , k, F 0 " Ω. For small N there is a possibility to get accurate values by the analytical method. The CE method solves this problem by changing the estimation problem to the optimization problem. Then by randomizing this problem using a defined family of probability density functions. With this the CE method solves this efficiently by making adaptive changes to this pdf and go in the direction of the theoretically optimal density.
Appendix A.2. Minimization of mean sum of ranks.
Let's consider the following problem of minimizing the mean sum of ranks: where χ " tx " px p1q , . . . , x pkq q : conditions from (A1) are preservedu is some defined set. R " pR 1 , . . . , R N q is a random permutation of numbers 1, . . . , N.Ŝ is an unbiased estimator ESpx, Rq with the following formula:Ŝ where pR n1 , . . . , R nN q is the nth copy of a random permutation R. Now a cross-entropy algorithm can be applied (see 1.3, [5]). Let's define the indicator collections tI tSpxqďγu u on χ for different levels γ P R. Let t f pp¨; uqu be the density family on χ parameterized by the actual parameter value u. For a specific u we can combine (A2) with the problem of estimation lpγq " P u pSpXq ď γq " where P u is a measure of the probability in which the random state X has a density t f pp¨; uqu and γ is a known or unknown parameter. l we estimate using the Kullback-Leibler distance. The Kullback-Leibler distance is defined as follows: Dpg, hq " ż gpxq ln gpxqdx´ż gpxqhpxqdx Usually gpxq is chosen from the family of density f p¨; vq. So here D for g and f px; vq it comes down to selecting such a reference parameter v for which´ş gpxq ln f px; vqdx is the smallest, that is, it comes down to maximization: max v ż gpxq f px; vqdx.
After some transformations: So firstly, let's generate a pair tpγ t , u t qu, which we then update until the stopping criterion is met and the optimal pair is obtained tpγ˚, u˚qu. More precisely, arrange u 0 and choose not too small and we proceed as follows: 1) Updating γ t Generate a sample X 1 , . . . , X N 2 from t f pp¨; u t´1 qu. CalculateŜpX 1 q, . . . ,ŜpX N 2 q and sort in ascending order. Forγ t chooseγ t "Ŝ pr Nsq 2) Updating u tût obtain from the Kullback-Leibler distance, that is, from maximization As in [47], here a 3-dimensional matrix of parameters u " tu ijl u is consider.
where X n " tX nij u, X nij is a random variable from f px piq j ;û t´1 q, corresponding to the formula (A4). Instead of updating a parameter use the following smoothed version u t " αû t`p 1´αqû t´1 .

3) Stopping Criterion
The criterion is from [16], which stop the algorithm whenγ T (T is last step) has reached stationarity. To identify the stopping point of T, consider the following moving average process where K is fixed.
Then let's define Ct pK, Rq " min j"1,...,R C t`j pKq and Ct pK, Rq " max where R is fixed. Then the stopping criterion is defined as follows T " mintt : Ct pK, Rq´Ct pK, Rq Ct pK, Rq ď u (A5) where K and R are fixed and is not too small.
Appendix A.3. The vehicle routing problem.
The classical vehicle routing problem (VRP) is defined on a graph G " pV, Aq, where V " tv 0 , v 1 , . . . , v n u is a set of vertices and A " tpv i , v j q : i, j P t0, . . . , nu, v i , v j P Vu is the arc set. A matrix L " pL i,j q can be defined on A, where the coefficient L i,j defines the distance between the nodes v i and v j and is proportional to the cost of travelling by the corresponding arc. There can be one or more vehicles starting off from the depot v 0 with a given capacity, visiting all or a subset of the vertices, and returning to the depot after having satisfied the demands at the vertices. The Stochastic Vehicle Routing Problem (SVRP) arises when elements of the vehicle routing problem are stochastic -the set of customers visited, the demands at the vertices, or the travel times.
Let us consider that a certain type of a product is distributed from a plant to N customers, using a single vehicle, having a fixed capacity Q. The vehicle strives to visit all the customers periodically to supply the product and replenish their inventories. On a given periodical trip through the network, on visiting a customer, an amount equal to the demand of that customer is downloaded from the vehicle, which then moves to the next site. The demands of a given customer during each period are modelled as independent and identically distributed random variables with known distribution. A reasonable assumption is that all the customers' demands belong to a certain distribution (say normal) with varying parameters for different customers.
The predominant approach for solving the SVRP class of problems is to use a "here-and-now" optimization technique, where the sequence of customers to be visited is decided in advance. On the given route, if the vehicle fails to meet the demands of a customer, there is a recourse action taken. The recourse action could be in the form of going back to the depot to replenish and fulfil the customers' demand, and continue with the remaining sequence of the customers to be visited or any other meaningful formulation. The problem then reduces to a stochastic optimization problem where the sequence with the minimum expected distance of travel (or equivalently, the minimum expected cost) has to be arrived at. The alternative is to use a re-optimization strategy where upon failure at a node, the optimum route for the remaining nodes is recalculated. The degree of re-optimization varies. At one extreme is the use of a dynamic approach where one can re-optimize at any point, using the newly obtained data about customer demands, or to re-optimize after failure. Neuro Dynamic Programming has been used to implement techniques based on re-optimization (cf., e.g., [48]).
The vehicle departs at constant capacity Q and has no knowledge of the requirements it will encounter on the route, except for the probability distributions of individual orders. Hence, there is a positive probability that the vehicle runs out of the product along the route, in which case the remaining demands of that customer and the remaining customers further along the route are not satisfied (a failure route). Such failures are discouraged with penalties, which are functions of the recourse actions taken. Each customer in the set can have a unique penalty cost for not satisfying the demand. The cost function for a particular route travelled by the vehicle during a period is calculated as the sum of all the arcs visited and the penalties (if any) imposed. If the vehicle satisfies all the demands on that route, the cost of that route will simply be the sum of the arcs visited including the arc from the plant to the first customer visited, and the arc from the last customer visited back to the plant.
Alternatively, if the vehicle fails to meet the demands of a particular customer, the vehicle heads back to the plant at that point, terminating the remaining route. The cost function is then the sum of all the arcs visited (including the arc from the customer where the failure occurred back to the plant) and the penalty for that customer. In addition, the penalties for the remaining customers who were not visited will also be imposed. Thus, a given route can have a range of cost function values associated with it. The objective is to find the route for which the expected value of the cost function is minimum compared to all other routes.
Let us adopt the "here-and-now" optimization approach. It means that the operator decides the sequence of vertices to be visited in advance and independent of the demands encountered. Such approach leads to a discrete stochastic optimization problem with respect to the discrete set of finite routes that can be taken. Let Gprq :" ErHpr, D r qs, where Hpr, D r q is the deterministic cost function of the route r " pr 1 , . . . , r N q, R -is the discrete and finite (or countable finite) feasible set of values that r can take, and demands D r i are a random variable that may or may not depend on the parameters r i . The class of discrete stochastic optimization problems we are considering here are those of the form min rPR tGprqu.
Thus, Hpr, D r q is a random variable whose expected value, Gprq is usually estimated by Monte Carlo simulation. The global optimum solution set can then be denoted by R˚" tr˚P R : Gpr˚q ď Gprq, @ rPR u.