On the Entropy of Events under Eventually Global Inflated or Deflated Probability Constraints. Application to the Supervision of Epidemic Models under Vaccination Controls

This paper extends the formulation of the Shannon entropy under probabilistic uncertainties which are basically established in terms of relative errors related to the theoretical nominal set of events. Those uncertainties can eventually translate into globally inflated or deflated probabilistic constraints. In the first case, the global probability of all the events exceeds unity, while in the second one it lies below unity. A simple interpretation is that the whole set of events loses completeness and that some events of negative probability might be incorporated to keep the completeness of an extended set of events. The proposed formalism is flexible enough to evaluate whether or not compensatory probability events need to be introduced, depending on each particular application. In particular, such a design flexibility is emphasized through an application to epidemic models under vaccination and treatment controls. Switching rules are proposed to choose through time the active model, among a predefined set of models organized in a parallel structure, which better describes the registered epidemic evolution data. The supervisory monitoring is performed in the sense that the tested accumulated entropy of the absolute error of the model versus the observed data is minimized at each supervision time-interval occurring between each two consecutive switching time instants. The active model generates the (vaccination/treatment) controls to be applied to the monitored population. In this application, it is not proposed to introduce a compensatory event to complete the global probability to unity; instead, the estimated probabilities are re-adjusted to design the control gains.


Introduction
Classical entropy is a state function in Thermodynamics. The originator of the concept of entropy was the celebrated Rudolf Clausius in the mid-nineteenth century. It is well known that reversible processes have a zero variation of entropy among equilibrium states while irreversible processes have an increase of entropy. Reversible processes only occur under ideal theoretical modelling of isolated processes without energy losses such as, for instance, the Carnot cycle. Real processes are irreversible because the above ideal conditions are impossible to fulfil. In classical Thermodynamics, the Clausius equality establishes that the entropy variation in reversible cycle processes is zero. That is, the entropy variation is path-independent since entropy is a state function which takes an identical value at a final state coincident with the initial state. However, in irreversible processes the entropy variation is positive. This is the motivating reason for associating a positive variation of entropy with a "disorder increase". It has to be pointed out that, in order to interpret correctly the non-negative variations of entropy, the system under consideration has to be an isolated one. In other words, some of the mutually interacting subsystems of an isolated system can exhibit negative variations of entropy. Later on, Boltzmann, Gibbs and Maxwell defined entropy under a statistical framework. On the other hand, Shannon entropy (named after Claude E. Shannon, 1916-2001, "the father of information theory") is a very important tool to measure the amount of uncertainty in processes characterized in the probabilistic framework by sets of events [1][2][3][4][5]. In some extended studies in the frameworks of physics, economics or fractional calculus, the existence of events with negative probabilities is admitted [4][5][6].
Entropy may be interpreted as information loss [1][2][3][7][8][9] and is useful, in particular, to characterize dynamic systems from this point of view [8]. On the other hand, entropy tools have also been used either to model certain epidemic processes or as complementary modelling support to evaluate certain epidemic models [10][11][12][13][14][15][16][17].
In some of the studies, the investigation of epidemic models, which are ruled by a differential system of coupled equations involving the various subpopulations, has been proposed in the framework of a patchy environment [13]. The model uncertainty amount is evaluated in such a way that the time-derivative of the entropy is shown to be non-negative at a set of testing sampling instants. In [9], a technique to develop a formal Shannon entropy in the complex framework is proposed. In this way, the components of the overall entropy are calculated so as to determine the real and the imaginary parts of the state complex Shannon entropy as a natural quantum-amplitude generalization of the classical Shannon entropy.
Epidemic evolution has typically been studied through models based on differential equations, difference equations or mixed hybrid models. Such models, because of their structure, are very appropriate for studying the equilibrium points, the oscillatory behaviors, the illness permanence and the vaccination and treatment controls. See, for instance, [18][19][20][21][22][23][24] and some of the references therein. More recently, entropy-based models have been proposed for epidemic models. See, for instance, [14][15][16][25] and some of the references therein. In particular, entropy tools for analysis are mixed with differential-type models in [16,25,26]. Also, the control techniques are appropriate for studies of alternative biological problems [27][28][29] and, in particular, for implementation of decentralized control techniques in patchy environments where several nodes are interlaced [27,29]. Under that generic framework, one can describe, for instance, the situation of several towns, each with its own health centers where the controls are implemented, whose susceptible and infectious populations interact through incoming and outgoing population fluxes. This paper proposes a supervisory design tool to decrease the uncertainty between the observed data related to an infectious disease, or those given by a complex model of the disease, via the use of a set of simpler dynamic models integrated together with a higher-level supervisory switching algorithm. Such a hierarchical structure selects online the most appropriate model as the one which has the smallest error uncertainty according to an entropy description (the so-called "active model"). Such an active model is updated through time in the sense that another active model can enter into operation.
In that way, the active model is used to generate the correcting sanitary actions to control the epidemics such as, for instance, the gains of the vaccination and antiviral or antibiotic treatment controls and the corresponding control interventions.
Entropy 2020, 22, 284
The paper is organized as follows: Section 2 recalls some basic concepts of Shannon entropy. Also, those concepts are extended to the case of eventual presence of relative probabilistic uncertainties in the defined set of events compared to the nominal set of probabilities. Section 3 states and proves mathematical results for the Shannon entropy for the case when a nominal complete finite and discrete set of events is eventually subject to relative errors in the associated probabilities of some or all of its events. The error-free nominal system of events is assumed to be complete. The current system of events under probabilistic errors may be either deflated or inflated in the sense that the total probability for the whole set might be either below or beyond unity, respectively, so that it might be non-complete. In the case of globally inflated or deflated probability of the whole set of events, new compensatory events can be used to achieve a unity global probability. The entropy of the current system is compared via quantified worst-case results to that of the nominal system. Section 4 partially applies the results of the former sections to epidemics control in the case that the disease transmission coefficient rate is either not well known or varies through time due, for instance, to seasonality. The controls are typically of vaccination or treatment type or appropriate combinations of both. A finite predefined set of running models, each described by a system of coupled differential equations, is set. Each model is driven by a constant disease transmission rate and the whole discrete set of models covers the whole range of foreseen variation of such a parameter in the real system, within known lower-bound and upper-bound limits.
Since the set of models is finite, the various values of the disease transmission rate are integrated in a discrete set within the whole range of admissible variation of the true coefficient rate. A supervisory technique of control monitoring is proposed which chooses the so-called active model which minimizes the accumulated entropy of the absolute error data/model within each supervision interval. A switching rule allows to choose another active model as soon as it is detected that the current active model becomes more uncertain than other(s) related to the observed data. The active model supplies the (vaccination and/or treatment) controls to be injected to the real epidemic process. Due to the particular nature of the problem, compensatory events are not introduced for equalizing the global probability to unity. Instead, a re-adjustment of the error probabilities related to the true available data is performed to calculate the control gains provided by the active model. It can be pointed out that some parameters of the epidemic description evolutions, typically the coefficient transmission rate, can vary according to seasonality [30]. This justifies the use of simpler active invariant models to describe the epidemics evolution along time subject to appropriate model switching. On the other hand, the existing medical tests which evaluate the proportions of healthy and infected individuals within the total population are not always subject to confluent worst-case estimation errors. See, for instance [31,32]. In the case that the estimated global probability of the various subpopulations is not unity, the estimated worst-case probabilities need to be appropriately amended before an intervention. This design work is performed and discussed in Section 5 through numerical simulations in confluence with the above mentioned supervision monitoring technique. Finally, conclusions end the paper.

Basic Entropy Preliminaries
Assume a finite system of events $A = \{A_i : i \in \bar{n}\}$ of respective probabilities $p_i$ for $A_i$; $i \in \bar{n} = \{1, 2, \cdots, n\}$, with $\{p_i : i \in \bar{n}\} \subset [0, 1]$ and $\sum_{i=1}^{n} p_i = 1$, which is complete, that is, only one event occurs at each trial (e.g., the appearance of 1 to 6 points in rolling a die). The Shannon entropy is defined as follows:
$$H(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \ln p_i$$
which serves as a suitable measure of the uncertainty of the above finite scheme. The name "entropy" pursues a physical analogy with parallel problems, for instance, in Thermodynamics or Statistical Physics, which does not have a similar sense here, so there is no need to go into it in the current context. The above entropy has been defined with neperian logarithms but any logarithm with a fixed base could be used instead with no loss of generality. Note that $H(p_1, p_2, \ldots, p_n) \ge 0$ with $H(p_1, p_2, \ldots, p_n) = 0$ if and only if $p_j = 1$ for some arbitrary $j \in \bar{n}$ and $p_i = 0$; $\forall i (\ne j) \in \bar{n}$, and
$$H_{max} = H(p, p, \ldots, p) = H(1/n, 1/n, \ldots, 1/n) = \max_{p_1, \ldots, p_n} H(p_1, p_2, \ldots, p_n) = \ln n$$
with $H_{max} = 0$ if and only if $n = p = 1$. Assume that the probabilities are uncertain, given by $p_i(1 + \varepsilon_i)$; $\forall i \in \bar{n}$, subject to bounding constraints on each $\varepsilon_i$ in which the bounds $\varepsilon_{i0}$ and $\varepsilon_{i1}$ are known, so that the current entropy (or the entropy of the current system) is uncertain and given by $H_\varepsilon = H(p_1(1 + \varepsilon_1), p_2(1 + \varepsilon_2), \ldots, p_n(1 + \varepsilon_n))$. The reference entropy $H^* = H(p_1, p_2, \ldots, p_n)$ for the case when the probabilities of all the events are precisely known, being equal to $p_i$; $\forall i \in \bar{n}$, is said to be the nominal entropy (or the entropy of the nominal system). Some constraints have to be fulfilled in order for the formulation to be coherent with the entropy bounds under probabilistic constraints in the set of involved events. The following related result follows:
Lemma 1.
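The definitions of this paragraph can be illustrated with a minimal Python sketch. The function names `shannon_entropy` and `perturbed_probs`, the fair-die data and the chosen relative errors are illustrative assumptions, not part of the original formulation; the convention 0·ln 0 = 0 is assumed.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy with neperian logarithms; 0*ln(0) is taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def perturbed_probs(probs, eps):
    """Current probabilities p_i * (1 + eps_i) for relative errors eps_i."""
    return [p * (1.0 + e) for p, e in zip(probs, eps)]

# Nominal complete system: a fair die, whose entropy is H_max = ln 6.
nominal = [1.0 / 6.0] * 6
h_nominal = shannon_entropy(nominal)

# Hypothetical relative errors chosen so that sum(eps_i * p_i) = 0,
# i.e. the current system of events is still complete.
eps = [0.3, -0.3, 0.0, 0.0, 0.0, 0.0]
current = perturbed_probs(nominal, eps)
h_current = shannon_entropy(current)

print(h_nominal, h_current)
```

In this example the current system remains complete, so its entropy cannot exceed the nominal value ln 6 reached by the uniform distribution.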
Assume a nonempty finite complete nominal system of events $A^* = \{A_i^* : i \in \bar{n}\}$ of respective nominal probabilities $p_i \in [0, 1]$ for $A_i^*$; $\forall i \in \bar{n} = \{1, 2, \cdots, n\}$ and a current (or uncertain) version of the system of events $A = \{A_i : i \in \bar{n}\}$ of uncertain probabilities $p_i(1 + \varepsilon_i) \in [0, 1]$; $\forall i \in \bar{n}$, where $\varepsilon_i \in R$ are relative probability errors due to probabilistic uncertainties. Define the following disjoint subsets of $A$:
$$A_+ = \{A_i \in A : \varepsilon_i > 0\}, \quad A_- = \{A_i \in A : \varepsilon_i < 0\}, \quad A_0 = \{A_i \in A : \varepsilon_i = 0\}$$
Then, the following constraints hold:
(i) $A$ is complete if and only if $\sum_{i=1}^{n} \varepsilon_i p_i = 0$ (global probabilistic uncertainty mutual compensation).
(ii) $\max(card A_+, card A_-) < n$, and $\min(card A_+, card A_-) > 0$ or $card A_0 = n$ (and then $A_+ = A_- = \emptyset$).
Proof: The proof of Property (i) follows since $A^*$ being complete implies that $A$ is complete if and only if $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) = \sum_{i=1}^{n} p_i + \sum_{i=1}^{n} \varepsilon_i p_i = 1 + \sum_{i=1}^{n} \varepsilon_i p_i = 1$, that is, if and only if $\sum_{i=1}^{n} \varepsilon_i p_i = 0$. Property (i) has been proved. The proof of Property (ii) follows from Property (i). To prove the first constraint, assume, on the contrary, that $\max(card A_+, card A_-) = n$. Then, either $card A_+ = n$ and $A_- = A_0 = \emptyset$ or $card A_- = n$ and $A_+ = A_0 = \emptyset$. Assume that $card A_+ = n$ and $A_- = A_0 = \emptyset$. Then, $\sum_{i=1}^{n} \varepsilon_i p_i > 0$ which contradicts Property (i). Similarly, if $card A_- = n$ and $A_+ = A_0 = \emptyset$ then $\sum_{i=1}^{n} \varepsilon_i p_i < 0$ which again contradicts Property (i). Thus, $\max(card A_+, card A_-) < n$ and the first constraint of Property (ii) has been proved. Now, assume that $\min(card A_+, card A_-) = 0$ and $0 \le card A_0 < n$. First, assume that $card A_- = 0$ and $card A_+ = card A - card A_0 = \max(card A_+, card A_-) \le n$ (from the already proved first constraint of this property). If $card A_+ > 0$ then $\sum_{i=1}^{n} \varepsilon_i p_i > 0$ which contradicts Property (i). If $card A_+ = 0$ then $A = A_0$ and $card A_0 = n$. So, if $card A_- = 0$ then $card A_+ = 0$ and $card A_0 = n$ (then $A = A_0$). In the same way, interchanging the roles of $A_+$ and $A_-$, it follows that, if $card A_+ = 0$ then $card A_- = 0$ and $card A_0 = n$. One concludes that either $\min(card A_+, card A_-) > 0$ or $card A_0 = n$ and Property (ii) is proved.
Note that, in Lemma 1, $A_0 \subset A$ is the subset of probabilistically certain events of $A$ (in the sense that its elements have a known probability) and $A_+ \cup A_- \subset A$ is the subset of probabilistically uncertain events of $A$. For $A$ being nonempty, any of the sets $A_0$, $A_+$ and $A_-$, or pair combinations of them, may be empty. Note also that the probabilistic uncertainties have been considered with fixed values $\varepsilon_i \in [0, 1]$; $\forall i \in \bar{n}$.
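The conditions of Lemma 1 can be checked numerically with a short sketch. The helper names `partition_events` and `is_complete`, the numerical tolerance and the sample data are assumptions made for illustration only; events are indexed from 0 as usual in Python, and the subsets are taken as the events with positive, negative and zero relative error, respectively.

```python
def partition_events(eps, tol=1e-12):
    """Index sets of the events with positive, negative and zero relative error."""
    plus = [i for i, e in enumerate(eps) if e > tol]
    minus = [i for i, e in enumerate(eps) if e < -tol]
    zero = [i for i, e in enumerate(eps) if abs(e) <= tol]
    return plus, minus, zero

def is_complete(probs, eps, tol=1e-12):
    """Property (i): the current system is complete iff sum(eps_i * p_i) = 0."""
    return abs(sum(e * p for p, e in zip(probs, eps))) <= tol

# Hypothetical nominal probabilities and mutually compensating errors.
probs = [0.5, 0.3, 0.2]
eps_compensated = [0.1, -0.1, -0.1]   # 0.05 - 0.03 - 0.02 = 0
eps_inflated = [0.1, 0.1, 0.1]        # all errors positive

plus, minus, zero = partition_events(eps_compensated)
print(is_complete(probs, eps_compensated), plus, minus, zero)
print(is_complete(probs, eps_inflated))
```

With compensating errors both the positive and the negative subsets are nonempty, in agreement with Property (ii), while errors that are all positive cannot give a complete current system.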

Entropy Versus Global Inflated and Deflated Probability Constraints in Incomplete Systems of Events under Probabilistic Uncertainties
An important issue to be addressed is how to deal with the case when, due to incomplete knowledge of the probabilistic uncertainties, the sum of probabilities of the current system of events, i.e., that related to the uncertain values of the individual probabilities, exceeds unity, that is, $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) > 1$, a constraint referred to as "global inflated probability". If both the nominal individual and nominal global probability constraints, $p_i \ge 0$; $\forall i \in \bar{n}$ and $\sum_{i=1}^{n} p_i = 1$, hold then there is a global inflated probability if and only if the probabilistic disturbances fulfill $\sum_{i=1}^{n} p_i \varepsilon_i > 0$. It is always possible to include the case of "global deflated probability" if $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) < 1$, implying that $\sum_{i=1}^{n} p_i \varepsilon_i < 0$ provided that $\sum_{i=1}^{n} p_i = 1$. Three possible solutions to cope with this drawback, associated with an exceeding amount of modeled probabilistic uncertainty leading to global inflated probability, are:
(a) To incorporate a new (nonempty) event $A_{n+1}$ with negative probability which reduces the probabilistic uncertainty so that the global probability constraint of the extended system of events $A_e = A \cup A_{n+1}$ holds. Obviously, $A$ loses its characteristic property of being a complete system of events and $A_e$ is said to be partially complete since it fulfills the global probability constraint but it has one negative probability.
(b) To modify all or some of the probabilistic uncertainty relative amounts ε i ; i ∈ n so that the global probability constraint of the complete system of events holds for the new amended fixed set ε i : i ∈ n .
(c) To consider uncertain normalized probabilities $\bar{p}_i = p_i(1 + \varepsilon_i) / \sum_{j=1}^{n} p_j(1 + \varepsilon_j)$; $\forall i \in \bar{n}$. In this case, there is no need to modify the individual uncertainties of the events, so the initial uncertainties are kept. Furthermore, the events keep their ordering according to uncertainties, namely, if $p_i(1 + \varepsilon_i) \le p_j(1 + \varepsilon_j)$ for $i, j (\ne i) \in \bar{n}$ then $\bar{p}_i \le \bar{p}_j$, the modified set of events is complete, and it keeps the same number of events as the initial one.
Two parallel solutions to cope with global deflated probability are:
(d) To incorporate a new (nonempty) event $A_{n+1}$ with positive probability which decreases the probabilistic uncertainty so that the global probability constraint of the extended complete system of events $A_e = A \cup A_{n+1}$ holds. As before, the current system of events $A$ loses its completeness and $A_e$ is complete.
(e) To modify all or some of the probabilistic uncertainty relative amounts ε i ; i ∈ n so that new amended fixed set ε i : i ∈ n agrees with the global probability constraint.
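The remedies based on a compensatory event, (a) and (d), and on normalization, (c), can be sketched as follows. The function names `compensatory_probability` and `normalize` and the numerical values are illustrative assumptions; the sketch only shows the arithmetic of both amendments and does not prescribe which one suits a given application.

```python
def compensatory_probability(current):
    """Probability of the added event A_{n+1}: negative if inflated, positive if deflated."""
    return 1.0 - sum(current)

def normalize(current):
    """Divide each uncertain probability by the current global sum."""
    total = sum(current)
    return [q / total for q in current]

# Hypothetical current probabilities p_i * (1 + eps_i).
inflated = [0.55, 0.33, 0.22]   # global probability 1.10 > 1
deflated = [0.45, 0.27, 0.18]   # global probability 0.90 < 1

p_extra_inflated = compensatory_probability(inflated)   # about -0.10
p_extra_deflated = compensatory_probability(deflated)   # about +0.10
normalized = normalize(inflated)

print(p_extra_inflated, p_extra_deflated, normalized)
```

Normalization keeps the number of events and their ordering, whereas the compensatory event enlarges the system by one event whose probability has the sign opposite to that of the global excess.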
Firstly, we note that the introduction of negative probabilities invoked in the first proposed solution makes sense in certain problems. In this context, Dirac commented in a speech in 1942 that negative probabilities can make sense in certain problems, as negative money does in some financial situations. Later on, Feynman also used this concept to describe some problems of physics [4] and remarked that, while negative trees are nonsensical, negative probability can make sense just as negative money does in some financial situations. More recently, Tenreiro-Machado has used this concept in the context of fractional calculus [6].
It is now proved in the next result that the incorporation of a new event of negative probability to define, from the system of events $A$, an extended partially complete system of events $A_e$ does not alter the property of the non-negativity of the Shannon entropy, in the sense that the real part of the entropy of the extended system of events is still non-negative. At the same time, it is proved that:
(1) If $1 < \sum_{i=1}^{n} p_i(1 + \varepsilon_i) \le 2$ then the real part of the entropy of $A_e$ (which is complex because of the negative probability of the event added for completeness) is not larger than that of $A$. The interpretation is that the excessive disorder in $A$, due to its global inflated probability, is reduced (equivalently, "the order amount" in $A_e$ is increased with respect to $A$) when building its extended version $A_e$ by the contribution of the added event of negative probability. Also, both $A$ and $A_e$ are not complete while $A_e$ is partially complete since it fulfills the global probability constraint, although not the individual ones for all the events, due to the incorporated event of negative probability.
(2) If $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) > 2$ then the above qualitative considerations on increase/decrease of "order" or "disorder" are reversed with respect to Case (1).
Theorem 1. Assume a nonempty finite complete nominal system of events $A^* = \{A_i^* : i \in \bar{n}\}$ of respective nominal probabilities $p_i \in [0, 1]$ for $A_i^*$; $\forall i \in \bar{n} = \{1, 2, \cdots, n\}$ and a current (or uncertain) version of the system of events $A = \{A_i : i \in \bar{n}\}$ of uncertain probabilities $p_i(1 + \varepsilon_i) \in [0, 1]$; $\forall i \in \bar{n}$. Assume that $2 \ge \sum_{i=1}^{n} p_i(1 + \varepsilon_i) > 1$ (that is, there is a global inflated probability of the current system of events). Define the extended system of events $A_e = A \cup A_{n+1}$ such that $A_{n+1}$ has a probability $p_{n+1} = p_{n+1}(\varepsilon_1, \varepsilon_2, \cdots, \varepsilon_n) = 1 - \sum_{i=1}^{n} p_i(1 + \varepsilon_i)$. Then, $0 \le Re H_{\varepsilon 1} \le H_\varepsilon$, where:
$$H_\varepsilon = -\sum_{i=1}^{n} p_i(1 + \varepsilon_i) \ln\left(p_i(1 + \varepsilon_i)\right), \quad H_{\varepsilon 1} = H_\varepsilon - p_{n+1} \ln p_{n+1}$$
are the respective entropies of $A$ and $A_e$, the second one being complex.
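Proof: Since $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) > 1$ then $A = \{A_i : i \in \bar{n}\}$ is not complete. Since the nominal system of events is complete then $\sum_{i=1}^{n} p_i = 1$ with $p_i \ge 0$; $\forall i \in \bar{n}$ and $p_{n+1} = 1 - \sum_{i=1}^{n} p_i(1 + \varepsilon_i) < 0$. The entropy of $A$ is $H_\varepsilon = -\sum_{i=1}^{n} p_i(1 + \varepsilon_i) \ln\left(p_i(1 + \varepsilon_i)\right)$ and that of the extended system of events $A_e$ is $H_{\varepsilon 1}$. Consider two cases. First, assume that $1 < \sum_{i=1}^{n} p_i(1 + \varepsilon_i) \le 2$. Since the neperian logarithm of a negative real number exists in the complex field, its real part being the neperian logarithm of its modulus and its imaginary part being $\pi$ (with $i = \sqrt{-1}$ the imaginary unit), one gets the following set of relations:
$$H_{\varepsilon 1} = H_\varepsilon - p_{n+1} \ln p_{n+1} = H_\varepsilon + |p_{n+1}| \left(\ln |p_{n+1}| + i\pi\right)$$
Then:
$$Re H_{\varepsilon 1} = H_\varepsilon + \left(\sum_{i=1}^{n} p_i(1 + \varepsilon_i) - 1\right) \ln\left(\sum_{i=1}^{n} p_i(1 + \varepsilon_i) - 1\right) \le H_\varepsilon$$
One concludes that $Re H_{\varepsilon 1} \le H_\varepsilon$ with $0 \le Re H_{\varepsilon 1} = H_\varepsilon$ if and only if $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) = 2$. It has to be proved now that $Re H_{\varepsilon 1} \ge 0$. Assume, on the contrary, that $Re H_{\varepsilon 1} < 0$. Since $Re H_{\varepsilon 1} \le H_\varepsilon$, this happens if and only if $H_\varepsilon < -\left(\sum_{i=1}^{n} p_i(1 + \varepsilon_i) - 1\right) \ln\left(\sum_{i=1}^{n} p_i(1 + \varepsilon_i) - 1\right)$, which leads to a contradiction since $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) > 1$ under the assumption of global inflated probability. Then, $0 \le Re H_{\varepsilon 1} \le H_\varepsilon$.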
Assume now that $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) > 2$. Then $\ln\left(\sum_{i=1}^{n} p_i(1 + \varepsilon_i) - 1\right) > 0$ and the second right-hand-side term of the last identity above changes sign, resulting in $Re H_{\varepsilon 1} > H_\varepsilon$.
It is now proved in the next result, for the case of deflated probability, that the introduction of an additional event of positive probability to define, from the system of events $A$, an extended complete system of events $A_{e0}$ does not modify the property of the non-negativity of the Shannon entropy, in the sense that the entropy of the extended system of events is still non-negative. At the same time, it is proved that the entropy of $A_{e0}$ is larger than that of $A$. The interpretation is that the excessively low disorder in $A$, due to its global deflated probability, is increased (equivalently, "the order amount" is decreased) when building its extended version $A_{e0}$ by the contribution of the added event $A_{n+1}$ of positive probability. Also, $A$ is not complete while $A_{e0}$ is complete.
Theorem 2. Assume a nonempty finite complete nominal system of events $A^* = \{A_i^* : i \in \bar{n}\}$ of respective nominal probabilities $p_i \in [0, 1]$ for $A_i^*$; $\forall i \in \bar{n} = \{1, 2, \cdots, n\}$ and a current (or uncertain) version of the system of events $A = \{A_i : i \in \bar{n}\}$ of uncertain probabilities $p_i(1 + \varepsilon_i) \in [0, 1]$; $\forall i \in \bar{n}$. Assume that $\sum_{i=1}^{n} p_i(1 + \varepsilon_i) < 1$ (that is, there is a global deflated probability of the current system of events). Define the extended system of events $A_{e0} = A \cup A_{n+1}$ such that $A_{n+1}$ has a probability $p_{n+1} = 1 - \sum_{i=1}^{n} p_i(1 + \varepsilon_i) > 0$. Then, $H_{\varepsilon 0} > H_\varepsilon \ge 0$, where $H_\varepsilon$ and $H_{\varepsilon 0} = H_\varepsilon - p_{n+1} \ln p_{n+1}$ are the respective entropies of $A$ and $A_{e0}$.
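Proof: The entropy of $A$ is $H_\varepsilon$ and that of $A_{e0}$ is:
$$H_{\varepsilon 0} = H_\varepsilon - \left(1 - \sum_{i=1}^{n} p_i(1 + \varepsilon_i)\right) \ln\left(1 - \sum_{i=1}^{n} p_i(1 + \varepsilon_i)\right)$$
Since $0 < 1 - \sum_{i=1}^{n} p_i(1 + \varepsilon_i) < 1$, the logarithm above is negative, so the added term is positive. One concludes that $H_{\varepsilon 0} > H_\varepsilon \ge 0$. Note that, since the entropy is defined as a sum of weighted logarithms of nonnegative real numbers bounded by unity in the usual cases (which exclude negative probabilities), the Shannon entropy of the worst case of the probability uncertainty is not necessarily larger than or equal to the nominal one.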
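The statements of Theorems 1 and 2 can be explored numerically with the following sketch, which relies on the principal branch of the complex logarithm provided by Python's `cmath.log`. The function names `real_entropy` and `extended_entropy` and the sample probabilities are assumptions made for illustration; the sketch checks particular instances and is not a proof.

```python
import cmath
import math

def real_entropy(current):
    """Entropy -sum(q ln q) of the current (possibly incomplete) system."""
    return -sum(q * math.log(q) for q in current if q > 0.0)

def extended_entropy(current):
    """Entropy of the system extended with A_{n+1}, of probability 1 - sum(current).

    The result is complex when the added probability is negative, since
    ln(x) = ln|x| + i*pi for real x < 0 (principal branch).
    """
    h = real_entropy(current)
    p_extra = 1.0 - sum(current)
    if p_extra == 0.0:
        return complex(h, 0.0)
    return h - p_extra * cmath.log(p_extra)

# Hypothetical inflated system: global probability 1.4, within (1, 2].
inflated = [0.9, 0.5]
h_inf = real_entropy(inflated)
ext_inf = extended_entropy(inflated)

# Hypothetical deflated system: global probability 0.7 < 1.
deflated = [0.4, 0.3]
h_def = real_entropy(deflated)
ext_def = extended_entropy(deflated)

print(h_inf, ext_inf)
print(h_def, ext_def)
```

In the inflated instance the real part of the extended entropy lies between zero and the entropy of the current system, while in the deflated instance the extended entropy is real and larger than that of the current system.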

Example 1. Assume that the nominal system of two events is
Assume that its probabilistic uncertainty is given by $\varepsilon = 0.1$. Then, the entropy under the uncertain probability of
Example 2. Consider the complete system of events $A = \{A_i : i \in \bar{n}\}$ such that $p_i = \lambda_i p$; $\forall i \in \bar{n}$, for some $p > 0$, is the probability of $A_i$, $\forall i \in \bar{n}$, with constants $\lambda_i \in R_{0+} = R_+ \cup \{0\}$; $\forall i \in \bar{n}$ satisfying $\sum_{i=1}^{n} \lambda_i = 1/p$. Note that $\sum_{i=1}^{n} p_i = 1$ so that there is no negative probability and neither global inflated nor global deflated probabilities. The Shannon entropy is the everywhere continuous real concave function:
$$H_a(p) = -\sum_{i=1}^{n} \lambda_i p \ln(\lambda_i p)$$
whose maximum value is reached at a real constant $p^*$ if $dH_a/dp]_{p=p^*} = 0$. Direct calculation yields such a $p^*$, which is unique for the given set $\{\lambda_i : i \in \bar{n}\}$, since the function $H_a(p)$ is concave in its whole definition domain, and admissible provided that $p^* \in [0, 1]$. The unique solution for each given set $\{\lambda_i : i \in \bar{n}\}$ is $p^* = p^*(\lambda_1, \lambda_2, \ldots, \lambda_n)$ satisfying the constraint $f(p^*) = K_\lambda$. Since $f : [0, 1] \to R$ is continuous and bounded from above with $f(0) = -\infty$ and $f(1) = 1$, and $K_\lambda < 0$, a real $p^*$ exists in $(0, 1]$ and it is unique for each given set $\{\lambda_i : i \in \bar{n}\}$, and the maximum entropy becomes $H_a(p^*)$. In particular, if $n = 2$ with $p_1 = \lambda_1 p$ and $p_2 = \lambda_2 p = 1 - \lambda_1 p$, the condition $dH_a/dp = 0$ reads $\lambda_1 \ln\left((1 - \lambda_1 p)/(\lambda_1 p)\right) = 0$, leading to $\lambda_2 p = 1 - \lambda_1 p = 1/2$ so that $\lambda_1 p = \lambda_2 p = 1/2$ and $p^* = 1/(2\lambda_1)$. Note that this is equivalent to $\ln\left((1 - \lambda_1 p^*)/(\lambda_1 p^*)\right) = 0$. In particular, if $\lambda_1 = \lambda_2 = 1$ then $p^* = 1/2$. Assume now that one uses any basis $b$ of logarithms to define the entropy so that:
$$H_{ab}(p) = -\lambda_1 p \log_b(\lambda_1 p) - (1 - \lambda_1 p) \log_b(1 - \lambda_1 p)$$
leading to:
$$\frac{dH_{ab}(p)}{dp} = \lambda_1 \log_b\left(\frac{1 - \lambda_1 p}{\lambda_1 p}\right) = 0$$
which is satisfied for the same conditions as above, that is, if $\lambda_1 p^* = 1 - \lambda_1 p^* = 1/2$ so that $p^* = 1/(2\lambda_1)$ again. However, $H_{ab}(p^*) \ne H_a(p^*)$ if $b \ne e$ ($e$ being the basis of the neperian logarithms).
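The two-event case of Example 2 can be verified with a simple grid search. The value of λ1, the grid resolution and the name `two_event_entropy` are assumptions chosen for illustration; a finer grid or an analytic check would serve equally well.

```python
import math

def two_event_entropy(p, lam1):
    """Entropy of the complete system {lam1 * p, 1 - lam1 * p} with neperian logarithms."""
    q = lam1 * p
    return -sum(x * math.log(x) for x in (q, 1.0 - q) if x > 0.0)

lam1 = 2.0   # hypothetical value; the admissible range is 0 < p < 1 / lam1
grid = [k / (1000 * lam1) for k in range(1, 1000)]

# The maximizer is expected at p* = 1 / (2 * lam1), where both probabilities are 1/2.
p_star = max(grid, key=lambda p: two_event_entropy(p, lam1))
h_star = two_event_entropy(p_star, lam1)

# Changing the base of the logarithm rescales the entropy but not the maximizer.
h_star_base2 = h_star / math.log(2.0)

print(p_star, h_star, h_star_base2)
```

The maximizer does not depend on the base of the logarithm, whereas the maximum value does: it is ln 2 with neperian logarithms and 1 with base-2 logarithms.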

Example 3.
Consider the particular case of Example 2 for n = 2 with global deflated probability, that is, p 1 = λ 1 p and p 2 = λ 2 p < 1 − λ 1 p. The system of two events {A 1 , A 2 } is not complete and we extend it with a new event A 3 of positive probability p 3 = 1 − (λ 1 + λ 2 )p so that the extended system of events is complete. Now, one has for such an extended system: which is satisfied for p = p * such that with H a (p) being obtained in Example 2 for the non-deflated probability case, as expected from Theorem 2. Note also that the value of p * giving the maximum entropy is less than the one producing the maximum entropy in the non-deflated case of Example 2 since that one p * = 1/(λ 1 + λ 2 ) with give the contradiction 0=1 in the above formula for p = p * and p * < 1/(λ 1 + λ 2 ) would give the contradiction 1 < 0 in such a mentioned formula.
The next result relies on the case where the probabilities are uncertain but each nominal probability is assumed to be eventually subject to the whole set of probabilistic uncertainties so as to cover a wider class of potential probabilistic uncertainties. It is also assumed (contrarily to the assumptions of Theorem 1 and Theorem 2) that the probability disturbances never increase any particular probability of the nominal system of events since they are all constrained to the real interval $[0, 1]$. As a result, the cardinalities of the extended current and nominal systems of events are, in general, distinct. It is found that the entropy of the current system of events is not smaller than the nominal entropy.
Theorem 3. Let $H_{\varepsilon\delta} = H(p_1\delta_1, p_1\delta_2, \ldots, p_1\delta_m, \ldots, p_n\delta_1, p_n\delta_2, \ldots, p_n\delta_m)$ and $H^* = H(p_1, p_2, \ldots, p_n)$ be the extended perturbed and nominal entropies of the sets of nonempty events $A_\alpha = \{A_{ij} : (i, j) \in \bar{n} \times \bar{m}\}$ and $A^* = \{A_i : i \in \bar{n}\}$, respectively, of respective cardinalities $n \times m$ and $n$, with $p_i, \delta_i \in [0, 1]$ and $\sum_{i=1}^{n} p_i = \sum_{i=1}^{m} \delta_i = 1$. Then, the following properties hold:
(i) $H_{\varepsilon\delta} = H^* + H_\delta \ge H^*$, where $H_\delta = H(\delta_1, \delta_2, \ldots, \delta_m) \ge 0$ is the incremental entropy due to the uncertainties, and $H_{\varepsilon\delta} = H^*$ if and only if $\delta_j = 1$ and $\delta_i = 0$; $\forall i (\ne j) \in \bar{m}$ and some arbitrary $j \in \bar{m}$. If $m = n$ and $\delta_i = 1 + \varepsilon_i$ with $\varepsilon_i \in [-1, 0]$; $\forall i \in \bar{n}$ and $\sum_{i=1}^{n} \varepsilon_i = 1 - n$ then the above relations hold with $H_{\varepsilon\delta} = H^*$ if and only if $\varepsilon_j = 0$ and $\varepsilon_i = -1$; $\forall i (\ne j) \in \bar{n}$ and some arbitrary $j \in \bar{n}$.
(ii) Assume that m = n and Then, for any given integer n ≥ 3 and any k ∈ n − 2 such that 0 The following identity holds under the constraints of Property (ii): for any given integer n ≥ 3 and any k ∈ n − 2 such that 0 < n − k + n−1 i=1 ε i < 1.

Proof: Note from the additive property of the entropy that H
Then, Property (i) follows from the non-negativity property of the entropy and the fact that $H_\delta = 0$ if and only if $\delta_j = 1$ and $\delta_i = 0$; $\forall i (\ne j) \in \bar{m}$ and some arbitrary $j \in \bar{m}$, which implies that $H(\delta_1, \delta_2, \ldots, \delta_m) = H(0, 0, \ldots, \delta_j (= 1), 0, \ldots, 0) = 0$. The property for $m = n$ and $\varepsilon_i = \delta_i - 1$; $\forall i \in \bar{n}$ is a particular case of the above one. Property (i) has been proved.
On the other hand, one gets from the recursive property of the entropy if m = n and for any given integer n ≥ 3 each equality for a right-hand-side implying that the sum of the two first argument of the entropy is positive, that is, ε 1 + ε 2 > −2. Property (ii) has been proved. Property (iii) is a consequence of Property (ii).
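The additive property of the entropy invoked in the proof of Property (i) can be checked numerically. In the sketch below the helper name `entropy` and the values of the nominal probabilities and of the disturbances δ are hypothetical; the product scheme contains the n × m probabilities p_i δ_j.

```python
import math

def entropy(probs):
    """Shannon entropy with neperian logarithms; 0*ln(0) is taken as 0."""
    return -sum(q * math.log(q) for q in probs if q > 0.0)

# Hypothetical nominal probabilities p_i and disturbances delta_j, each summing to one.
p = [0.5, 0.3, 0.2]
delta = [0.6, 0.4]

# Extended perturbed system with the n x m probabilities p_i * delta_j.
joint = [pi * dj for pi in p for dj in delta]

h_nominal = entropy(p)
h_delta = entropy(delta)
h_joint = entropy(joint)

# Degenerate disturbance: one delta equals 1, so no entropy is added.
h_degenerate = entropy([pi * dj for pi in p for dj in (1.0, 0.0)])

print(h_joint, h_nominal + h_delta, h_degenerate)
```

The extended entropy equals the nominal entropy plus the entropy of the disturbances, and it reduces to the nominal entropy only when the disturbance is degenerate.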
The next result relies on the case where the probabilities are uncertain but belonging to a known admissibility real interval, rather than fixed as it has been assumed in Lemma 1. Contrarily to Theorem 3, it is not assumed that the probabilistic disturbances are within [0, 1] so that they can increase the corresponding nominal probabilities. Theorem 4. Assume a finite complete system of events A i : i ∈ n such that the nominal and current probabilities of the event A i are p i and p Then, the following properties hold: where: such that an upper-bound of the lower-bound H 0 of H ε is: Proof: Note that, since ε i > −1; ∀i ∈ n then ln(1 + ε i ) ≤ ε i ; ∀i ∈ n so that: and Property (i) follows directly. On the other hand, note that: Note that, since max(ε i0 , ε i1 ) ≤ 1; ∀i ∈ n, we can use Taylor's expansion series around 1 for the above neperian logarithms to get: so that:  (26) and (27) imply that m 1 ∈ [−1, 0] and m 0 ∈ [0, 1]. The proof of Property (i) has been completed. Now, since ε i0 < 1 and ε i1 > −1; ∀i ∈ n then (25) follows since: since −ε i0 > −1; ∀i ∈ n, equivalently, ε i0 < 1; ∀i ∈ n so that: and Property (ii) has been proved. On the other hand, Property (iii) is proved as follows: After using of Cauchy-Schwartz inequality for summable sequences and the constraint n i=1 (p i lnp i ) 2 1/2 ≤ H * , one gets, since ε i1 > −1 and ε i0 < 1, that Property (iii) holds since: and Property (iv) follows from Property (iii) and the fact that:

Applications to Epidemic Systems
This section is devoted to the application of entropy tools to the modelling of epidemics. It should first be pointed out that some parameters describing the epidemic evolution, typically the coefficient transmission rate, can vary according to seasonality [30]. It can also be pointed out that the existing medical tests to evaluate the proportions of healthy and infected individuals are not always subject to confluent estimation of the errors. That is, the estimated worst-case errors for each of the populations can have different running ranges due to the fact that they are obtained with different techniques. See, for instance, [31,32] where the mentioned question is justified taking medical tests as a basis.
- To solve the first problem, one considers the usefulness of designing a parallel scheme of alternative time-invariant models, run by a supervisory switching law, to choose the most appropriate active time-invariant model through time. The whole set of models covers the whole range of expected variations of the model parameters through time. The objective of the parallel structure is to select the model which describes the registered data more tightly along a certain period of time.
- On the other hand, the fact that the existing tests of errors on the subpopulations integrating the model do not always give similar worst-case allowed estimated errors for all the subpopulations justifies an adjustment of the probabilities in the case when the sum of all of them does not equal unity. Two potential actions to overcome this drawback are: (a) the introduction of events of positive (respectively, negative) probabilities in the case of deflated (respectively, inflated) global probability; (b) the readjustment of all the individual estimated probabilities via normalization by the current sum, which is distinct from unity. In the first case, the re-adjustment is made by an algebraic sum manipulation. In the second one, it is made by readjusting all the individual probabilities via normalization. In both cases, the amended global probability is unity.
Assume an epidemic disease with unknown time-varying bounded coefficient transmission rate $\beta(t) \in [\beta_0, \beta_1]$, where $\beta_0$ and $\beta_1$ are known, which is defined by the following differential system of $n$ first-order differential equations:
$$\dot{x}(t) = \Psi(x(t), \beta(t), p)\, x(t) + \Gamma u(t) \qquad (43)$$
where $x(t) \in \mathbb{R}^n$ is the state vector and p is the vector containing all the parameters other than the coefficient transmission rate, such as the recovery rate, the mortality rate, the average survival rate, the average life expectancy irrespective of the illness, etc. In practice, Equation (43) can be replaced by a non-parameterized description based on the state measurements through time, where x(t) is given by experimental on-line data on the subpopulations which, in this case, should be discretized with a small sampling period. In this way, (43) can be either a more sophisticated mathematical model than the simplified ones given later on, which provides data x(t) on the illness close to the real measurements, or the registered real data themselves obtained from the disease evolution. The state vector contains the subpopulations integrated in the model, which depend on the type of model itself, such as the susceptible (S), infectious (I) and recovered (or immune) (R) in the so-called SIR models, to which the exposed subpopulation (E), namely those individuals in the first infection stages with no external symptoms, is added in the so-called SEIR models. The models can also contain a vaccinated subpopulation (V) and can have several nodes or patches describing, for instance, different environments, in general coupled, each having its own set of coupled subpopulations which interact with the remaining ones through population fluxes. The vector $u(t) \in \mathbb{R}^m$ is the control vector. There is typically either one control, namely, vaccination of the susceptible, or two controls, namely, vaccination of the susceptible and (either antiviral or antibiotic) treatment of the infectious, in the case when there is only one node. Those controls might be applied to each subsystem associated with one patch if there are several patches integrated in the model. The matrix function of dynamics $\Psi(x(t), \beta(t), p)$ is a real $n \times n$ matrix for each $t \in \mathbb{R}_{0+}$.
The control matrix Γ has as many columns as controls are applied, and its entries are typically "0" (i.e., no control applied on the state component associated with the corresponding subpopulation), "−1" if the control leads to a decrease of the growth rate of a subpopulation (for instance, the vaccination effort on the susceptible) and "+1" if it leads to a compensatory increase of the rate of a subpopulation due to a corresponding decrease of another one (for instance, the increase of the recovered in the vaccination case, when the susceptible are decreased via vaccination, or again of the recovered in the treatment case, when the infectious are decreased via treatment). For simplicity, it is assumed that p is constant and that there are no delays in the dynamics.
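As an illustration of the entry pattern just described, the following sketch builds a control matrix for a single-node SEIR model with two controls. The state ordering (S, E, I, R), the control ordering (vaccination, treatment) and the control values are assumptions made only for this example.

```python
# Illustrative only: the orderings below are assumptions of this sketch.
STATES = ("S", "E", "I", "R")
CONTROLS = ("vaccination", "treatment")

# One row per state component, one column per control.
GAMMA = [
    [-1,  0],   # S: decreased by vaccination
    [ 0,  0],   # E: no control applied
    [ 0, -1],   # I: decreased by treatment
    [+1, +1],   # R: compensatory increase due to both controls
]


def control_term(gamma, u):
    """Return the vector Gamma * u that is added to the state derivative."""
    return [sum(g * u_k for g, u_k in zip(row, u)) for row in gamma]


u = [0.25, 0.5]   # hypothetical vaccination and treatment efforts
print(dict(zip(STATES, control_term(GAMMA, u))))
```

Each column of this matrix sums to zero, so the controls only move individuals between subpopulations and do not change the total population.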
The proposed control architecture consists of a scheme of approximated models arranged in parallel, one of them being chosen by a higher-level supervisory switching scheme as the active model used to select the controller gains along each current time interval.
In particular, it is proposed to run a set of Q + 1 approximated models of the same dimension as Equation (43), each with a constant coefficient transmission rate $\beta_i$; $\forall i \in \overline{Q+1} = \{1, 2, \dots, Q+1\}$, chosen according to Equations (44) and (45) in such a way that $\beta_1 = \beta^0$ and $\beta_{Q+1} = \beta^1$, that is, the first and the last models take the known lower and upper bounds of β(t), respectively. Note that the approximated models (44) and (45) are parameterized by a constant coefficient transmission rate, contrarily to the real model (43), whose coefficient transmission rate is time-varying. The remaining parameters are constant and eventually time-varying, respectively. Note also that the models (44) are initialized to the initial conditions.
We consider a set of (Q + 1) error events $E = \{E_i : i \in \overline{Q+1}\}$ of the states of the models (44) and (45) with respect to (43), that is, $e_i(t) = x_i(t) - x(t)$; $\forall i \in \overline{Q+1}$. Each event $E_i$ is integrated by a set of events $E_{ij}$ which are the errors of each of its integrating subpopulations with respect to the real system, that is, $E_i = \{E_{ij} : j \in \bar{n}\}$ with component-wise errors $e_{ij}(t) = x_{ij}(t) - x_j(t)$; $\forall j \in \bar{n}$. Define the instantaneous error entropies of each error event by summing up all the component-wise contributions, that is, $H(E_i, t) = -\sum_{j=1}^{n} p_{ij}(t) \ln p_{ij}(t)$; $\forall i \in \overline{Q+1}$, while the corresponding accumulated continuous-time and discrete-time entropies on the time interval $[t, t + T)$ are defined in a natural way from the instantaneous ones, respectively, by integration over the interval and by summation over its sampling instants (the latter being Equation (47)), provided that $T = \alpha\tau$, so that $\alpha = T/\tau$ is the number of sampling intervals of period τ within T, where τ is a submultiple of T and both T and τ are design parameters satisfying these constraints.
The control applied on a time interval $[t, t_{i+1})$ is the one generated by the model whose error event has attained the smallest accumulated entropy among all the error events on a previously tested time interval $[t - t_i, t)$; this model is the so-called active model on $[t, t_{i+1})$. To simplify the exposition, and with no loss of generality, the accumulated discrete-time entropy is the one used for testing in the sequel.
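The selection of the active model by minimum accumulated discrete-time entropy can be sketched as follows. This is only an illustration under stated assumptions: the probabilities $p_{ij}$ are taken as given inputs (the rule of Equation (49) is not reproduced here), the numerical values are hypothetical, the function names are invented for the example, and models are indexed from 0 as usual in Python.

```python
import math


def instantaneous_entropy(probs):
    """H = -sum_j p_j ln p_j, skipping non-positive entries (0 ln 0 taken as 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def accumulated_entropy(prob_samples):
    """Sum of the instantaneous entropies over the sampling instants of a window."""
    return sum(instantaneous_entropy(p) for p in prob_samples)


def select_active_model(prob_samples_per_model):
    """Return (index of the model with the smallest accumulated entropy, all scores)."""
    scores = [accumulated_entropy(s) for s in prob_samples_per_model]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical probabilities: 3 models, 2 sampling instants, 3 subpopulations.
window_probs = [
    [[0.60, 0.30, 0.10], [0.50, 0.30, 0.20]],   # model 0
    [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],   # model 1
    [[0.34, 0.33, 0.33], [0.40, 0.30, 0.30]],   # model 2
]

active, scores = select_active_model(window_probs)
print(active, [round(s, 3) for s in scores])
```

With these hypothetical values the second model (index 1) has the smallest accumulated entropy on the tested window and would become the active one.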
Then, the following switching Algorithm 1 is proposed:

Algorithm 1 (all the subpopulations are run by the same active model within each inter-switching interval)
Step 0-Auxiliary design parameters: Define the prefixed minimum inter-sample period threshold $T_{\min} > 0$ and an auxiliary time interval σ, with $0 < \sigma \ll T_{\min}$, used to measure the possible degradation of the current active model in operation, which anticipates a forthcoming switching.
Step 1-Initial control: $u(t) = u_i(t)$ for $t = t_0$, with $t_0 = 0$, for some arbitrary model $i \in \overline{Q+1}$; the initial active model is $a(0) = i \in \overline{Q+1}$ and the initial running integer for the switching time instants is $k (\in \mathbb{Z}_{0+}) = 0$.
Step 2-Eventually switched control: it is given by (48), such that:
-current active model: $a(t) = i \in \overline{Q+1}$; $\forall t \in [t_k, t_{k+1})$ is the active model in the set of (Q + 1) models which generates the control on $[t_k, t_{k+1})$,
-next active model: the model selected to be in operation at the next switching time instant $t = t_{k+1}$.
Remarks: (1) Note that the initial control runs on a time interval lasting at least the designed time interval length $T_{\min}$. If some "a priori" knowledge about the adequacy of the various models to the epidemic process in the initial stage of the disease is available, this knowledge can be used to overcome the arbitrariness in the selection of the initial controller.
(2) Note also that if $k_{ij} \le \frac{\sum_{j=1}^{n} |e_{ij}(t + j\tau)|}{|e_{ij}(t + j\tau)|}$; $i \in \overline{Q+1}$, $\forall j \in \bar{n}$, $\forall t \in \mathbb{R}_{0+}$, then the global probability of the error event $E_i$ cannot be inflated but it can be deflated. If $k_{ij}(t) = n - 1$; $i \in \overline{Q+1}$, $\forall j \in \bar{n}$, $\forall t \in \mathbb{R}_{0+}$, then such a global probability is neither inflated nor deflated for all time, since in that case $\sum_{j=1}^{n} p_{ij}(t) = n - (n - 1) = 1$; $\forall i \in \overline{Q+1}$, $\forall t \in \mathbb{R}_{0+}$. Note also that $k_{ij}(t)$ can be time-varying, so that the probabilities have a margin for experimental design adjustment; this agrees with the problem statement of probabilities subject to possible errors considered in the theoretical sections above.
(3) Note that in the above algorithm there are zero-probability (although not impossible) events leading to a non-unique solution, such as, for instance, the first accumulated entropy condition in (48) holding with equality instead of the strict inequality that determines the next active controller. If a lack of uniqueness is detected, one can fix any valid solution (from the set of valid ones) to run, or simply give a uniqueness rule such as, for instance, taking as active the valid model whose index is nearest to that of the last active one.
(4) Note also that the events $E_{ij}$ for $j \in \bar{n}$ and each given $i \in \overline{Q+1}$ are not mutually independent, since they consist of the solutions of all the subpopulations, each given by a first-order differential equation of the n-th order differential system of the i-th model. The reason is that the first-order equations of the differential system are coupled.
(5) It can be observed that the switching time instants are also resetting times of the initial conditions with the real (or more accurate) data provided by the real model (43).
(6) The use of Equation (49) to calculate the probabilities, together with their companion saturation rules for the control gains, is sufficient to evaluate the entropies of Equation (48) in order to implement the algorithm. So, there is no need to introduce a compensatory event of negative probability to make the global probability equal to unity.
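The uniqueness rule suggested in Remark (3), namely taking the valid model whose index is nearest to the last active one, can be sketched as follows. The function name, the tolerance `tol` used to detect ties and the scores are hypothetical choices of this example.

```python
def select_with_tie_break(scores, last_active, tol=1e-12):
    """Pick a minimum-score model; on ties, take the index nearest to last_active.

    `scores` holds one accumulated entropy per model (0-based indices).
    If two valid indices are equally near, the lower one is returned
    (an extra convention assumed here to make the choice deterministic).
    """
    best = min(scores)
    valid = [i for i, s in enumerate(scores) if s - best <= tol]
    return min(valid, key=lambda i: (abs(i - last_active), i))


scores = [0.8, 0.5, 0.9, 0.5, 0.5]   # models 1, 3 and 4 are tied
print(select_with_tie_break(scores, last_active=4))
print(select_with_tie_break(scores, last_active=0))
```

Keeping the index close to the last active one favours small changes of the constant transmission rate used to generate the control at a switching instant.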
An alternative to the above switching algorithm consists of defining the error events one per subpopulation of the epidemic description (44) versus their real counterparts in (43). For instance, we can think of choosing the "susceptible error event" with the Q + 1 first error components of all the solutions of (44) compared to the first component of (43), and one proceeds in the same way for the remaining subpopulations. So, we can choose online the best simplified model (that is, the closest one to the true complex model) for each subpopulation. Note that this can be reasonable since the approximated models are designed by choosing parameterizations of the coefficient transmission rate which sweep a region where the true one is point-wise allocated through time. In parallel, each whole set of error events associated with each of the subpopulations consists of mutually independent events since, in this case, the data of all the set of models, including those of the active one, are generated by different differential equations on nonzero measure intervals. In this case, the set of events is $\{E_1, E_2, \dots, E_n\}$, such that $E_j = \{E_{ij} : i \in \overline{Q+1}\}$; $\forall j \in \bar{n}$ is associated with the j-th subpopulation, instead of $\{E_1, E_2, \dots, E_{Q+1}\}$ with $E_i = \{E_{ij} : j \in \bar{n}\}$; $\forall i \in \overline{Q+1}$ associated with one model, as stated in the former design. Then, we have the following Algorithm 2.

Algorithm 2 (each subpopulation can be run by a different active model within each inter-switching interval)
Step 0-Auxiliary design parameters: it is similar to that of Algorithm 1.
Step 1-Initial control: it is similar to that of Algorithm 1.
Step 2-Eventually switched control: it is given by (50), such that $u_j(t)$ is the j-th controller component, and one can distinguish:
-the current active model for the j-th subpopulation: $a(t) = i \in \bar{n}$; $\forall t \in [t_k, t_{k+1})$ is the active model in the set of n models which generates the control on $[t_k, t_{k+1})$,
-the next active model for the j-th subpopulation: the model selected to be in operation at the next switching time instant $t = t_{k+1}$.
-The switching time instants are obtained in a similar way as in Algorithm 1.
-The control law (33) is calculated by computing the accumulated discrete-time entropies (47) with a similar probabilistic rule as that of Algorithm 1 with errors $e_{ij}(t) = x_{ij}(t) - x_j(t)$; $\forall i \in \overline{Q+1}$, $\forall j \in \bar{n}$, $\forall t \in \mathbb{R}_{0+}$.
Step 3-Updating the activation of the next active control and inter-switching time interval: it is similar to that of Algorithm 1.

Remark 6.
Related to (50) in Algorithm 2 versus (48) in Algorithm 1, note that $u_j(t)$ is the j-th controller component. Assume that the epidemic model is a SEIR one such that $x = (x_1, x_2, x_3, x_4)^T = (S, E, I, R)^T$, where S, E, I and R denote the susceptible, exposed, infectious and immune subpopulations, respectively. The precise meaning of the sentence is that the active vaccination control is obtained, for instance, from the active controller $1 \in \bar{n}$ if a feedback vaccination control proportional to the susceptible subpopulation is used. In the same way, the active treatment control is obtained, for instance, from the active controller $3 \in \bar{n}$ if a feedback treatment control proportional to the infectious subpopulation is used. If there are no more controls, the remaining controller components are zeroed for all time. The models for the other components could be omitted from the whole scheme or simply used for information on the estimation of the subpopulations, since no controls to be injected to the real population are obtained from them.
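The structural difference with Algorithm 1, namely one active model per subpopulation, can be sketched as follows. The per-component score table `scores[i][j]` is a hypothetical stand-in for whatever accumulated entropy measure is attached to the pair (model i, subpopulation j); how that measure is obtained is not reproduced here, and all the names and numerical values are assumptions of the example (0-based indices).

```python
def active_model_per_subpopulation(scores):
    """For every subpopulation j, return the index i of the model with the lowest score.

    scores[i][j] is the (hypothetical) score of model i for subpopulation j.
    """
    n_models, n_states = len(scores), len(scores[0])
    return [
        min(range(n_models), key=lambda i, j=j: scores[i][j])
        for j in range(n_states)
    ]


# Hypothetical scores: 3 models (rows), SEIR components S, E, I, R (columns).
scores = [
    [0.9, 0.4, 0.7, 0.5],
    [0.3, 0.6, 0.8, 0.2],
    [0.5, 0.5, 0.1, 0.6],
]
# Hypothetical current states (S, E, I, R) of each model.
states = [
    [0.50, 0.10, 0.20, 0.20],
    [0.45, 0.12, 0.18, 0.25],
    [0.48, 0.11, 0.16, 0.25],
]

active = active_model_per_subpopulation(scores)   # one model index per component
s_active = states[active[0]][0]   # would feed a vaccination law proportional to S
i_active = states[active[2]][2]   # would feed a treatment law proportional to I
print(active, s_active, i_active)
```

Here the susceptible and the infectious components are taken from different models, which is the situation described in Remark 6; the components E and R are selected as well but generate no control.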

Simulation Examples
This section contains some simulation examples illustrating the application of the proposed entropy paradigm to the multi-model epidemic system discussed in Section 4. Thus, the behavior of Algorithms 1 and 2 will be shown in this section through numerical examples in open and closed loop. The accurate model, considered as the one generating the actual or true data, is the SEIR one with vaccination described in [27], where $\mu = 2.0$ years$^{-1}$ is the growth and death rate of the population, $\varepsilon = 1.0$ years$^{-1}$, $\delta = 0.1$ days$^{-1}$ and $\gamma = 0.02$ days$^{-1}$ are the instantaneous per capita rates of leaving the exposed, infected and recovered stages, respectively, and V(t) denotes the vaccination. This model fits in the structure given by (26), where u(t) = V(t) acts as the control command. The initial conditions are given by S(0) = E(0) = I(0) = R(0) = 0.1. All the parameters are assumed to be constant except β(t), the disease transmission coefficient, which describes the seasonality in the infection rate and is given by the widely accepted Dietz's model [30], $\beta(t) = \beta_0 (1 + b\cos(2\pi t))$ with $\beta_0 = 6.2$ and $b = 0.6$. The function β(t) describes annual seasonality in this example. Figure 1 shows the behavior of this system in the absence of any external action (i.e., in open loop). As can be observed in Figure 1, the disease is persistent since the infectious do not converge to a zero steady-state value asymptotically. This situation will be tackled in Section 5.2 by means of the vaccination function in order to generate a closed-loop system whose infectious tend to zero.
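The seasonal transmission coefficient and a set of constant rates covering its range can be sketched as follows. The values of $\beta_0$ and $b$ are those quoted above; the linear spacing of the constant rates, the value Q = 4 and the helper names are assumptions of this example, and `nearest_model` is only a reference for comparison, not the entropy-based switching rule.

```python
import math

BETA0, B = 6.2, 0.6   # values quoted in the text


def beta(t):
    """Dietz-type seasonal transmission coefficient, with t in years."""
    return BETA0 * (1.0 + B * math.cos(2.0 * math.pi * t))


def model_grid(beta_low, beta_high, q):
    """Q + 1 constant rates, linearly spaced between the two bounds (assumed spacing)."""
    return [beta_low + i * (beta_high - beta_low) / q for i in range(q + 1)]


# Range swept by beta(t) over one year.
beta_low, beta_high = BETA0 * (1.0 - B), BETA0 * (1.0 + B)
grid = model_grid(beta_low, beta_high, q=4)   # Q = 4 is a hypothetical choice


def nearest_model(t):
    """Index of the constant rate closest to beta(t)."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - beta(t)))


for t in (0.0, 0.25, 0.5):
    print(t, round(beta(t), 3), nearest_model(t))
```

Since β(t) sweeps the whole interval between its bounds within each year, no single constant rate of the set can match it at all times, which is the motivation for switching among the models.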

It is worth mentioning that none of the models whose trajectories are depicted in Figures 2-5 can by itself describe the dynamics of the whole accurate model, since none of them is able to reproduce the complex behavior generated by the time-varying seasonal incidence rate β(t). In this way, the switching mechanisms given by Algorithms 1 and 2 are employed.
From the above figures we can conclude that the switched system, either with Algorithm 1 or with Algorithm 2, is able to reproduce the shape of the trajectories of the accurate time-varying system. Consequently, the presented approach is useful to describe the overall dynamics of a complex set of data, coming either from real measurements or from a complex model, by means of a simpler piecewise time-invariant model. The switching rules based on the entropy of the errors between each model and the data have proved to be an adequate framework for such a task. Figures 10 and 15 show the active model selected within each time interval by Algorithm 1 to parameterize the switched system. Moreover, Figure 20 shows the active model when Algorithm 2 is employed. It is seen in Figure 20 how the algorithm selects a different model for each one of the state components of the system, which is the particularity of this algorithm. From the simulation results we can also observe that the probability value has only a slight influence on the approximation performance, so that the selection of an appropriate value is not critical. However, the switching time plays a more important role in the obtained performance, as can be deduced from the above examples. Moreover, there are no specific criteria for the selection of the switching time, although trial-and-error work can be performed in order to obtain an appropriate value for it. Finally, it can also be observed that Algorithm 1 is able to attain closer curves to the actual system than Algorithm 2, especially when it comes to the susceptible population. The above figures show that the disease is persistent in the sense that the infectious do not vanish with time. Therefore, in the next Section 5.2 vaccination is used as an external control to make the infectious converge to zero asymptotically.

Closed-Loop Control
In this subsection, vaccination is employed to avoid the persistency of the disease. To this end, a state-feedback type vaccination law [29] is used, with $K_S = 0.1$ and $K_I = 0.01$ being the state-feedback gains and $S_{active}(t)$, $I_{active}(t)$ the state components of the corresponding active model according to Algorithms 1 and 2. Along this section, T = 5 days and $k_{ij} = 15.8$. Figure 25 displays the evolution of the closed-loop system when Algorithm 1 is employed, while Figure 26 shows the corresponding active model selected to parameterize the control command within each time interval. Figure 27 shows the trajectories for Algorithm 2 and Figure 28 shows the corresponding active model selected to parameterize the control command within each time interval. From Figures 25 and 27 we can conclude that the output of the real system and the output of the active model are practically the same when a feedback control action is included in the system. Thus, the approximation errors appearing in open loop vanish in closed loop because of the control action. Moreover, there are no differences in the closed-loop trajectories generated by Algorithms 1 and 2. It can also be observed in Figures 25 and 27 that the infectious tend to zero, eradicating the disease from the population, as desired. Therefore, the control objective is achieved. Figures 29 and 30 display the vaccination function calculated by using both algorithms. It can be seen that both control commands are very similar, with only some peaks associated with the switching process making the difference between one and another. Overall, the proposed approach has been shown to be a powerful tool to model the complex time-varying system.

Example with Actual Data
In this subsection the proposed entropy approach is applied to the real case of measles in New York City (NYC). It was stated in [33] that the measles outbreaks in NYC during the period 1930-1970 can be appropriately described by a SIR model with a time-varying contact rate. Moreover, [33] estimates a monthly contact rate for the model while [34] proposes a Dietz-type contact rate for this problem. Thus, this real situation fits the formulation treated in the application problem. The authors of [34] gathered the weekly number of reported measles cases for 93 years, corresponding to the period 1891-1984, and made them publicly available as supplementary material of [34]. In this simulation, Algorithm 2 from Section 4 will be used to generate a switched model describing the data set of the year 1960. The problem is described by a SIR model whose parameters are estimated in [34] to be $\gamma^{-1} = 13$ days, $\mu = 0.02$ yr$^{-1}$, $\beta_{\min} = 1.79 \cdot 10^{-10}$ and $\beta_{\max} = 5.3831 \cdot 10^{-10}$, and the yearly number of newborns is approximately $10^5$.
The initial values of the populations are S(0) = 7782000, the population of New York in 1960, I(0) = 225 and R(0) = 0, while T = 20 days. There are 40 time-invariant models linearly spaced between $\beta_{\min}$ and $\beta_{\max}$. Actual data for the infectious are used in (49) in order to calculate the error corresponding to each one of the models running in parallel. Figure 31 shows the number of infectious predicted by the switched model compared to the actual data during 1960. The x axis of the figure represents the 52 weeks of the year. As can be observed from Figure 31, the proposed approach succeeds at reproducing the trend contained in the data and at predicting the time when the outbreak reaches its peak. Finally, Figure 32 shows the active model at each time for the infectious population. Since the data are only available for the infectious, only this population has been displayed in the figures.

Conclusions
This paper has presented results for the Shannon entropy when the events of a complete finite and discrete set are eventually subject to relative errors of their associated probabilities. As a result, the current system of events, eventually under probabilistic errors, can lose its completeness since it may be either deflated or inflated in the sense that the total probability for the whole set might be either below or beyond unity. Later on, the previous technical results have been applied to the control of the evolution of epidemics in the case that either the disease transmission coefficient rate is not well known or it varies through time due, for instance, to seasonality. For such a purpose, a finite predefined set of running models, described by coupled differential equations, is chosen which covers a range of variation of such a coefficient transmission rate within known lower-bound and upper-bound limits. Each one of such models is driven by a constant disease transmission rate and the whole set of models covers the whole range of foreseen variation of such a parameter in the real system. In a general context, different uncertain parameters, or groups of parameters, other than the coefficient transmission rate could be checked by the proposed minimum-error entropy supervisory scheme. Two monitoring algorithms have been proposed to select the active model which minimizes the accumulated entropy of the absolute data/model error within each supervision time interval. A switching rule allows choosing another active model as soon as it is detected that the current active model becomes more uncertain than other(s) in relation to the observed data. Some numerical results have also been presented and discussed, including a discussion on a real registered case. The active model, which is currently in operation, generates the vaccination and/or treatment controls to be injected to the real epidemic process.