Some Dissimilarity Measures of Branching Processes and Optimal Decision Making in the Presence of Potential Pandemics

We compute exact values or bounds of dissimilarity/distinguishability measures (in the sense of the Kullback-Leibler information distance (relative entropy) and some transforms of more general power divergences and Renyi divergences) between two competing discrete-time Galton-Watson branching processes with immigration GWI for which the offspring as well as the immigration (importation) is arbitrarily Poisson-distributed; in particular, we allow for an arbitrary type of extinction-concerning criticality and thus for non-stationarity. We apply this to optimal decision making in the context of the spread of potentially pandemic infectious diseases (such as the current COVID-19 pandemic), covering, e.g., different levels of dangerousness and different kinds of intervention/mitigation strategies. Asymptotic distinguishability behaviour and diffusion limits are investigated, too.

Amongst the above-mentioned dissimilarity measures, an important omnipresent subclass is formed by the so-called $f$-divergences of Csiszar [17], Ali & Silvey [18] and Morimoto [19]; important special cases thereof are the total variation distance and the very frequently used $\lambda$-order power divergences $I_\lambda(P,Q)$ (also known as alpha-entropies, Cressie-Read measures, Tsallis cross-entropies) with $\lambda \in \mathbb{R}$. The latter cover, e.g., the very prominent Kullback-Leibler information divergence $I_1(P,Q)$ (also called relative entropy), the (squared) Hellinger distance $I_{1/2}(P,Q)$, as well as the Pearson chi-square divergence $I_2(P,Q)$. It is well known that the power divergences can be built with the help of the $\lambda$-order Hellinger integrals $H_\lambda(P,Q)$ (where, e.g., the case $\lambda = 1/2$ corresponds to the well-known Bhattacharyya coefficient), which are information measures of interest in their own right and which are also the crucial ingredients of $\lambda$-order Renyi divergences $R_\lambda(P,Q)$ (see, e.g., Liese & Vajda [1], van Erven & Harremoes [20]); the case $R_{1/2}(P,Q)$ corresponds to the well-known Bhattacharyya distance.
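As a minimal illustration of how these quantities fit together, the following Python sketch evaluates them for two single Poisson laws $\mathrm{Poi}(\mu_A)$ and $\mathrm{Poi}(\mu_H)$ (not for whole GWI path laws). It assumes the common normalizations $I_\lambda = (1 - H_\lambda)/(\lambda(1-\lambda))$ for $\lambda \notin \{0,1\}$, with the Kullback-Leibler limits at $\lambda \in \{0,1\}$, and $R_\lambda = \log H_\lambda / (\lambda(\lambda-1))$; summing the Poisson series gives the closed form $H_\lambda = \exp(\mu_A^\lambda \mu_H^{1-\lambda} - \lambda\mu_A - (1-\lambda)\mu_H)$. The function names are hypothetical, and $\mu_A, \mu_H > 0$ is assumed.

```python
import math


def hellinger_integral_poisson(mu_a: float, mu_h: float, lam: float) -> float:
    """Closed form of H_lambda = sum_x p_A(x)^lam * p_H(x)^(1 - lam) for two Poisson laws."""
    return math.exp(mu_a**lam * mu_h ** (1.0 - lam) - lam * mu_a - (1.0 - lam) * mu_h)


def power_divergence_poisson(mu_a: float, mu_h: float, lam: float) -> float:
    """I_lambda = (1 - H_lambda) / (lambda (1 - lambda)); KL limits at lambda in {0, 1}."""
    if lam == 1.0:  # Kullback-Leibler divergence I_1(P_A, P_H)
        return mu_a * math.log(mu_a / mu_h) - mu_a + mu_h
    if lam == 0.0:  # reversed Kullback-Leibler divergence I_1(P_H, P_A)
        return mu_h * math.log(mu_h / mu_a) - mu_h + mu_a
    return (1.0 - hellinger_integral_poisson(mu_a, mu_h, lam)) / (lam * (1.0 - lam))


def renyi_divergence_poisson(mu_a: float, mu_h: float, lam: float) -> float:
    """R_lambda = log(H_lambda) / (lambda (lambda - 1)) for lambda not in {0, 1}."""
    return math.log(hellinger_integral_poisson(mu_a, mu_h, lam)) / (lam * (lam - 1.0))


def hellinger_integral_by_summation(mu_a: float, mu_h: float, lam: float, x_max: int = 200) -> float:
    """Brute-force check: truncated sum of p_A(x)^lam * p_H(x)^(1 - lam) over x = 0..x_max."""
    total = 0.0
    for x in range(x_max + 1):
        log_pa = -mu_a + x * math.log(mu_a) - math.lgamma(x + 1)
        log_ph = -mu_h + x * math.log(mu_h) - math.lgamma(x + 1)
        total += math.exp(lam * log_pa + (1.0 - lam) * log_ph)
    return total
```

For $\lambda = 1/2$ the closed form reduces to the Bhattacharyya coefficient $\exp(-(\sqrt{\mu_A} - \sqrt{\mu_H})^2/2)$, and $I_\lambda$ approaches the Kullback-Leibler value as $\lambda \to 1$.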
Another important class of time-dynamic models is given by discrete-time integer-valued branching processes, in particular (Bienaymé-)Galton-Watson processes without immigration GW or with immigration (resp. importation, invasion) GWI, which have numerous applications in biotechnology, population genetics, internet traffic research, clinical trials, asset price modelling, derivative pricing, and many others. Concerning terminology, we subsume both models under the abbreviation GW(I), and write simply GWI whenever GW appears as a parameter special case of GWI; recall that a GW(I) is called subcritical, critical or supercritical if its offspring mean is less than 1, equal to 1 or larger than 1, respectively.
As far as the combined study of information measures and GW processes is concerned, let us first mention that (transforms of) power divergences have been used for supercritical Galton-Watson processes without immigration, for instance, as follows: Feigin & Passy [49] study the problem of finding an offspring distribution which is closest (in terms of a relative-entropy type distance) to the original offspring distribution and under which ultimate extinction is certain. Furthermore, Mordecki [50] gives an equivalent characterization for the stable convergence of the corresponding log-likelihood process to a mixed Gaussian limit, in terms of conditions on Hellinger integrals of the involved offspring laws. Moreover, Sriram & Vidyashankar [51] study the properties of offspring-distribution parameters which minimize the squared Hellinger distance between the model offspring distribution and the corresponding non-parametric maximum likelihood estimator of Guttorp [52]. For the setup of a GWI with Poisson offspring and nonstochastic immigration of constant value 1, Linkov & Lunyova [53] investigate the asymptotics of Hellinger integrals in order to deduce large deviation assertions in hypothesis testing problems.
In contrast to the above-mentioned contexts, this paper pursues the following main goals:
(MG1) for any time horizon and any criticality scenario (allowing for non-stationarities), to compute lower and upper bounds (and sometimes even exact values) of the Hellinger integrals $H_\lambda(P_A\|P_H)$, power divergences $I_\lambda(P_A\|P_H)$ and Renyi divergences $R_\lambda(P_A\|P_H)$ of two alternative Galton-Watson branching processes $P_A$ and $P_H$ (on path/scenario space), where (i) $P_A$ has Poisson($\beta_A$) distributed offspring as well as Poisson($\alpha_A$) distributed immigration, and (ii) $P_H$ has Poisson($\beta_H$) distributed offspring as well as Poisson($\alpha_H$) distributed immigration; the non-immigration cases are covered by $\alpha_A = \alpha_H = 0$; as a side effect, we also aim for corresponding asymptotic distinguishability results;
(MG2) to compute the corresponding limit quantities for the context in which (a proper rescaling of) the two alternative Galton-Watson processes with immigration converge to Feller-type branching diffusion processes, as the time lags between the generation-size observations tend to zero;
(MG3) as an exemplary field of application, to indicate how to use the results of (MG1) for Bayesian decision making in the epidemiological context of an infectious-disease pandemic (e.g., the current COVID-19 pandemic), where, e.g., potential state-budgetary losses can be controlled by alternative public policies (such as different degrees of lockdown) for mitigating the time-evolution of the number of infectious persons (quantified by a GW(I)). Corresponding Neyman-Pearson testing will be treated, too.
Because of the involved Poisson distributions, these goals can be tackled with a high degree of tractability, which is worked out in detail with the following structure: in Section 2, we first introduce (i) the basic ingredients of Galton-Watson processes together with their interpretations in the above-mentioned pandemic setup, where it is essential to study all types of criticality (being connected with levels of reproduction numbers), (ii) the employed fundamental information measures such as Hellinger integrals, power divergences and Renyi divergences, (iii) the underlying decision-making framework, as well as (iv) connections to time series of counts and asymptotic distinguishability. Thereafter, we start our detailed technical analyses by giving recursive exact values or recursive bounds (as well as their applications) of Hellinger integrals $H_\lambda(P_A\|P_H)$ (see Section 3), power divergences $I_\lambda(P_A\|P_H)$ and Renyi divergences $R_\lambda(P_A\|P_H)$ (see Sections 4 and 5). Explicit closed-form bounds of Hellinger integrals $H_\lambda(P_A\|P_H)$ will be worked out in Section 6, whereas Section 7 deals with Hellinger integrals and power divergences of the above-mentioned Galton-Watson type diffusion approximations.

Process Setup
We investigate dissimilarity measures and apply them to decisions in the following context. Let the integer-valued random variable $X_n$ ($n \in \mathbb{N}_0$) denote the size of the $n$th generation of a population (of persons, organisms, spreading news, other kinds of objects, etc.) with specified characteristics, and suppose that for the modelling of the time-evolution $n \mapsto X_n$ we have the choice between the following two (e.g., alternative, competing) models (H) and (A):
(H) a discrete-time homogeneous Galton-Watson process with immigration GWI, given by the recursive description

$$X_n = \sum_{k=1}^{X_{n-1}} Y_{n-1,k} + \widetilde{Y}_n, \qquad n \in \mathbb{N}, \tag{1}$$

(with the convention that the empty sum equals 0), where $Y_{n-1,k}$ is the number of offspring of the $k$th object (e.g., organism, person) within the $(n-1)$th generation, and $\widetilde{Y}_n$ denotes the number of immigrating objects in the $n$th generation. Notice that we employ an arbitrary deterministic (i.e., degenerate random) initial generation size $X_0$. We always assume that under the corresponding dynamics-governing law $P_H$
(GWI1) the collection $Y := \{Y_{n-1,k},\ n \in \mathbb{N},\ k \in \mathbb{N}\}$ consists of independent and identically distributed (i.i.d.) random variables which are Poisson distributed with parameter $\beta_H > 0$,
(GWI2) the collection $\widetilde{Y} := \{\widetilde{Y}_n,\ n \in \mathbb{N}\}$ consists of i.i.d. random variables which are Poisson distributed with parameter $\alpha_H \geq 0$ (where $\alpha_H = 0$ stands for the degenerate case of having no immigration),
(GWI3) $Y$ and $\widetilde{Y}$ are independent.
(A) a discrete-time homogeneous Galton-Watson process with immigration GWI, given by the same recursive description (1) but with a different dynamics-governing law $P_A$, under which (GWI1) holds with parameter $\beta_A > 0$ (instead of $\beta_H > 0$), (GWI2) holds with $\alpha_A \geq 0$ (instead of $\alpha_H \geq 0$), and (GWI3) holds. As a side remark, in some contexts the two models (H) and (A) may function as a "sandwich" of a more complicated, not fully known model.
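To make recursion (1) concrete, here is a minimal simulation sketch in Python (NumPy assumed available) that draws one path of a Poissonian GWI literally as in (1): each of the $X_{n-1}$ objects gets a Poisson($\beta$) number of offspring, and an independent Poisson($\alpha$) number of immigrants is added. The function name and signature are illustrative, not taken from the paper; running it once with $(\beta_H, \alpha_H)$ and once with $(\beta_A, \alpha_A)$ yields one scenario under each of the models (H) and (A).

```python
import numpy as np


def simulate_gwi_by_offspring(x0, beta, alpha, n_generations, seed=None):
    """Simulate (X_0, ..., X_n) of a Poissonian GWI literally via recursion (1).

    Each of the X_{n-1} objects of generation n-1 independently has a
    Poisson(beta) number of offspring Y_{n-1,k}; independently, a Poisson(alpha)
    number of immigrants (tilde Y_n) arrives. X_0 = x0 is deterministic.
    """
    rng = np.random.default_rng(seed)
    path = [int(x0)]
    for _ in range(n_generations):
        offspring = rng.poisson(beta, size=path[-1])  # Y_{n-1,1}, ..., Y_{n-1,X_{n-1}}
        immigrants = rng.poisson(alpha)  # tilde Y_n; always 0 when alpha == 0
        path.append(int(offspring.sum()) + int(immigrants))
    return path


if __name__ == "__main__":
    print(simulate_gwi_by_offspring(5, 1.2, 0.5, 10, seed=42))  # supercritical, with immigration
    print(simulate_gwi_by_offspring(5, 0.8, 0.0, 10, seed=42))  # subcritical, no immigration
```

Summing the $X_{n-1}$ offspring draws is equivalent in distribution to a single Poisson($\beta X_{n-1} + \alpha$) draw, which is the faster shortcut suggested by the conditional Poisson property of the GWI.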
For the sake of brevity, wherever we introduce or discuss corresponding quantities simultaneously for both models H and A, we will use the subscript • as a synonym for either the symbol H or A.
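For illustration, recall the well-known fact that the corresponding conditional distributions $P_\bullet(X_n = \cdot \mid X_{n-1} = k)$ are again Poisson, with parameter $\beta_\bullet \cdot k + \alpha_\bullet$ (as the law of a sum of $k + 1$ independent Poisson random variables with parameters $\beta_\bullet, \dots, \beta_\bullet, \alpha_\bullet$).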
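Because each transition is conditionally Poisson, the likelihood ratio of a path $(X_0, \dots, X_n)$ factorizes into one-step Poisson ratios: with $\mu_{\bullet} := \beta_\bullet X_{m-1} + \alpha_\bullet$, one has $\log \frac{dP_A}{dP_H} = \sum_{m=1}^{n} \big[ X_m \log(\mu_A/\mu_H) - (\mu_A - \mu_H) \big]$. Moreover, the Hellinger integral of the first $n$ generations equals $E_{P_H}[(dP_A/dP_H)^\lambda]$. The sketch below uses this only for a brute-force Monte Carlo estimate; it is an illustrative check, not the recursive exact values or bounds developed in this paper. It assumes that $\alpha_A, \alpha_H$ are both positive or both zero, so that the two conditional means are either both positive or both zero; all names are hypothetical.

```python
import math

import numpy as np


def log_likelihood_ratio(path, beta_a, alpha_a, beta_h, alpha_h):
    """log(dP_A/dP_H) of the observed X_1, ..., X_n, given the deterministic X_0 = path[0].

    Under either model, X_m given X_{m-1} = k is Poisson(beta * k + alpha), so the
    log-ratio is a sum of one-step Poisson log-ratios (the log x! terms cancel).
    Assumes alpha_a, alpha_h are both positive or both zero.
    """
    llr = 0.0
    for k, x in zip(path[:-1], path[1:]):
        mu_a = beta_a * k + alpha_a
        mu_h = beta_h * k + alpha_h
        if mu_a == 0.0 and mu_h == 0.0:
            continue  # extinct without immigration: X_m = 0 under both laws
        llr += x * math.log(mu_a / mu_h) - (mu_a - mu_h)
    return llr


def hellinger_integral_mc(x0, beta_a, alpha_a, beta_h, alpha_h, lam, n,
                          n_paths=20_000, seed=0):
    """Monte Carlo estimate of H_lambda(P_A || P_H) for the path (X_0, ..., X_n),
    via H_lambda = E_{P_H}[(dP_A/dP_H)^lambda] with paths simulated under P_H."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_paths):
        path = [x0]
        for _step in range(n):
            # conditional Poisson transition under the model H
            path.append(int(rng.poisson(beta_h * path[-1] + alpha_h)))
        total += math.exp(lam * log_likelihood_ratio(path, beta_a, alpha_a, beta_h, alpha_h))
    return total / n_paths
```

For $n = 1$, $X_1$ is exactly Poisson under both models, so the estimate can be checked against the closed-form Poisson Hellinger integral with means $\beta_\bullet X_0 + \alpha_\bullet$.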
In order to achieve a transparent representation of our results, we subsume the involved parameters as follows:
(PS1) $\mathcal{P}_{SP}$ is the set of all constellations $(\beta_A, \beta_H, \alpha_A, \alpha_H)$ of real-valued parameters $\beta_A > 0$, $\beta_H > 0$, $\alpha_A > 0$, $\alpha_H > 0$ such that $\beta_A \neq \beta_H$ or $\alpha_A \neq \alpha_H$ (or both); in other words, both models are non-identical and have non-vanishing immigration;
(PS2) $\mathcal{P}_{NI}$ is the set of all constellations $(\beta_A, \beta_H, \alpha_A, \alpha_H)$ of real-valued parameters $\beta_A > 0$, $\beta_H > 0$, $\alpha_A = \alpha_H = 0$ such that $\beta_A \neq \beta_H$; this corresponds to the important special case that both models have no immigration and are non-identical;
(PS3) the resulting disjoint union will be denoted by $\mathcal{P} = \mathcal{P}_{SP} \cup \mathcal{P}_{NI}$.
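The case distinction (PS1) to (PS3) can be restated as a small predicate. This hypothetical helper merely encodes the definitions; for instance, it rejects identical models and constellations in which only one of the two models has immigration.

```python
def parameter_class(beta_a, beta_h, alpha_a, alpha_h):
    """Return "P_SP", "P_NI", or None for a constellation (beta_A, beta_H, alpha_A, alpha_H)."""
    if beta_a <= 0 or beta_h <= 0:
        return None  # offspring means must be positive in both models
    if alpha_a > 0 and alpha_h > 0 and (beta_a != beta_h or alpha_a != alpha_h):
        return "P_SP"  # (PS1): non-identical models, both with immigration
    if alpha_a == 0 and alpha_h == 0 and beta_a != beta_h:
        return "P_NI"  # (PS2): non-identical models, both without immigration
    return None  # identical models, or immigration in only one model: not in P


if __name__ == "__main__":
    print(parameter_class(1.2, 0.8, 0.5, 0.5))
    print(parameter_class(1.2, 0.8, 0.5, 0.0))
```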
For the non-immigration case $\alpha_\bullet = 0$ one has the following extinction properties (see, e.g., Harris [66], Athreya & Ney [55]). As usual, let us define the extinction time $\tau := \min\{i \in \mathbb{N} : X_\ell = 0 \text{ for all integers } \ell \geq i\}$ if this minimum exists, and $\tau := \infty$ otherwise. Correspondingly, let $B := \{\tau < \infty\}$ be the extinction set. If the offspring mean satisfies $\beta_\bullet < 1$ (the subcritical case) or $\beta_\bullet = 1$ (the critical case), then extinction is certain, i.e., $P(B \mid X_0 = 1) = 1$. However, if $\beta_\bullet > 1$ (the supercritical case), then there is a positive probability that the population never dies out, i.e., $P(B \mid X_0 = 1) \in \,]0,1[$. In the latter case, on the non-extinction event $\{\tau = \infty\}$, $X_n$ explodes (a.s.) to infinity as $n \to \infty$.
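For Poisson offspring, the extinction probability $q = P(B \mid X_0 = 1)$ is the smallest root in $[0,1]$ of $s = \exp(\beta_\bullet(s-1))$, the fixed-point equation of the offspring probability generating function; by independence of the lines of descent, $P(B \mid X_0 = x_0) = q^{x_0}$. The following Python sketch computes $q$ by fixed-point iteration started at $s = 0$ (the iterates equal $P(X_m = 0 \mid X_0 = 1)$, which increase to $q$) and returns 1 directly in the (sub)critical case, as stated above. The function name and tolerances are illustrative choices.

```python
import math


def extinction_probability(beta, x0=1, tol=1e-13, max_iter=100_000):
    """P(extinction | X_0 = x0) for a GW process with Poisson(beta) offspring and no immigration."""
    if beta <= 1.0:
        return 1.0  # subcritical or critical: extinction is certain
    s = 0.0
    for _ in range(max_iter):
        s_next = math.exp(beta * (s - 1.0))  # Poisson pgf f(s) = exp(beta (s - 1))
        if abs(s_next - s) < tol:
            s = s_next
            break
        s = s_next
    return s**x0  # independent lines of descent of the x0 initial objects


if __name__ == "__main__":
    for beta in (0.9, 1.0, 1.5, 2.0, 3.0):
        print(beta, extinction_probability(beta))
```

Iterating from $s = 0$ (rather than solving the equation with a generic root finder) guarantees convergence to the smallest root, not to the trivial root $s = 1$.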
In contrast, for the (nondegenerate, nonvanishing) immigration case $\alpha_\bullet > 0$ there is no extinction, viz. $P(B \mid X_0 = 1) = 0$, although the population may be zero, $X_{\ell_0} = 0$, at some intermediate time $\ell_0 \in \mathbb{N}$; but due to the immigration, with probability one there is always a later time $\ell_1 > \ell_0$ such that $X_{\ell_1} > 0$. Nevertheless, also for the setup $\alpha_\bullet > 0$ it is important to know whether $\beta_\bullet < 1$, $\beta_\bullet = 1$ or $\beta_\bullet > 1$ (which is still called subcriticality, criticality and supercriticality, respectively), since, e.g., in the case $\beta_\bullet < 1$ the population size $X_n$ converges in distribution (as $n \to \infty$) to a stationary distribution on $\mathbb{N}_0$, whereas for $\beta_\bullet > 1$ the behaviour is non-stationary (non-ergodic); see, e.g., Athreya & Ney [55].
At this point, let us emphasize that in our investigations (both for $\alpha_\bullet = 0$ and for $\alpha_\bullet > 0$) we do allow for "crossovers" between "different criticalities", i.e., we deal with all cases $\beta_A \lesseqgtr 1$ versus all cases $\beta_H \lesseqgtr 1$; as will be explained in the following, this unifying flexibility is especially important for corresponding epidemiological-model comparisons (e.g., for the sake of decision making). One of our main goals is to quantitatively compare (the time-evolution of) two competing GWI models H and A with respective parameter sets $(\beta_H, \alpha_H)$ and $(\beta_A, \alpha_A)$, in terms of the information measures $H_\lambda(P_A\|P_H)$ (Hellinger integrals), $I_\lambda(P_A\|P_H)$ (power divergences) and $R_\lambda(P_A\|P_H)$ (Renyi divergences). The latter two express a distance (degree of dissimilarity) between H and A. From this, we shall particularly derive applications to decision making under uncertainty (including tests).

Connections to Time Series of Counts
It is well known that a Galton-Watson process with Poisson offspring (with parameter $\beta_\bullet$) and Poisson immigration (with parameter $\alpha_\bullet$) is "distributionally" equal to each of the following models (listed in "tree-type" chronological order):
(M1) a Poissonian Generalized Integer-valued Autoregressive process GINAR(1) in the sense of Gauthier & Latour [67] (see also Dion, Gauthier & Latour [44], Latour [68], as well as Grunwald et al. [45]), that is, a first-order autoregressive time series with Poissonian thinning (with parameter $\beta_\bullet$) and Poissonian innovations (with parameter $\alpha_\bullet$);
(M2) a Poissonian first-order Conditional Linear Autoregressive model (Poissonian CLAR(1)) in the sense of Grunwald et al. [45] (and earlier preprints thereof), since the conditional expectation is $E_{P_\bullet}[X_n \mid \mathcal{F}_{n-1}] = \alpha_\bullet + \beta_\bullet \cdot X_{n-1}$; this can equally be seen as a Poissonian autoregressive Generalized Linear Model GLM with identity link function (cf. [45] as well as Chapter 4 of Kedem & Fokianos [46]), that is, an autoregressive GLM with the Poisson distribution as random component and the identity link as systematic component; the same model was used (and generalized)
(M2i) under the name BIN(1) by Rydberg & Shephard [69] for the description of the number $X_n$ of stock transactions/trades recorded up to time $n$;
(M2ii) under the name Poisson autoregressive model PAR(1) by Brandt & Williams [70] for the description of event counts in political and other social science applications;
(M2iii) under the name Autoregressive Conditional Poisson model ACP(1,0) by Heinen [71];
(M2iv) by Held, Höhle & Hofmann [47] as well as Held et al. [72], as a description of the time-evolution of counts from infectious disease surveillance databases, where $\beta_\bullet$ (respectively, $\alpha_\bullet$) is interpreted as the driving parameter of the epidemic (respectively, endemic) component; in principle, this type of modelling can also be implicitly recovered as a special case of the epidemics-treating work of Finkenstädt, Bjornstad & Grenfell [73], by assuming trend- and season-neglecting (e.g., intra-year) measles data in urban areas of about 10 million people (provided that their population size approximation extends linearly);
(M2v) under the name integer-valued Generalized Autoregressive Conditional Heteroscedastic model INGARCH(1,0) by Ferland, Latour & Oraichi [74] (since the conditional variance is $\mathrm{Var}_{P_\bullet}[X_n \mid \mathcal{F}_{n-1}] = \alpha_\bullet + \beta_\bullet \cdot X_{n-1}$), see also Weiß [75]; this model was subsequently given the more specific name INARCH(1) by Weiß [76,77], and has been frequently applied thereafter; for an "overlapping-generation type" interpretation of the INARCH(1) model, which is an adequate description for the time-evolution of overdispersed counts with an autoregressive serial dependence structure, see Weiß & Testik [78]; for a comprehensive recent survey (also covering more general count time series), the reader is referred to the book of Weiß [48].
Moreover, according to the general considerations of Grunwald et al. [45], the Poissonian Galton-Watson model with immigration may possibly be "distributionally equal" to an integer-valued autoregressive model with random coefficient (thinning).
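Since (M2) states that $E_{P_\bullet}[X_n \mid \mathcal{F}_{n-1}] = \alpha_\bullet + \beta_\bullet X_{n-1}$, a simple way to see the CLAR(1) structure is to regress $X_n$ on $X_{n-1}$ by conditional least squares. The Python sketch below does this for a simulated subcritical path; it is a standard textbook estimator chosen for illustration, not the divergence-based methodology of this paper, and the names, seed and parameter values are arbitrary assumptions.

```python
import numpy as np


def simulate_inarch1(x0, beta, alpha, n, seed=0):
    """Simulate X_0..X_n with X_t | X_{t-1} ~ Poisson(alpha + beta * X_{t-1})."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1, dtype=np.int64)
    x[0] = x0
    for t in range(1, n + 1):
        x[t] = rng.poisson(alpha + beta * x[t - 1])
    return x


def cls_estimate(x):
    """Conditional least squares: ordinary regression of X_t on (1, X_{t-1})."""
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (alpha_hat, beta_hat), *_ = np.linalg.lstsq(design, x[1:].astype(float), rcond=None)
    return float(alpha_hat), float(beta_hat)


if __name__ == "__main__":
    path = simulate_inarch1(x0=0, beta=0.6, alpha=2.0, n=20_000, seed=1)
    print(cls_estimate(path))  # should be close to (2.0, 0.6)
```

Because the conditional variance equals the conditional mean, the regression errors are heteroscedastic; a weighted variant could be more efficient, but the plain version suffices to illustrate the linear conditional-mean structure in the stationary case $\beta_\bullet \in \,]0,1[$.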
Nowadays, besides the name homogeneous Galton-Watson model with immigration GWI, the name INARCH(1) seems to be the most used one, and we follow this terminology (with emphasis on GWI). Typical features of the above-mentioned models (M1) to (M2v) are the use of $\mathbb{Z}$ as the set of times, and the assumptions $\alpha_\bullet > 0$ as well as $\beta_\bullet \in \,]0,1[$, which guarantee stationarity and ergodicity (see above). In contrast, we employ $\mathbb{N}_0$ as the set of times, a degenerate (and thus non-equilibrium) starting distribution, and arbitrary $\alpha_\bullet \geq 0$ as well as $\beta_\bullet > 0$. For such a situation, as explained above, we quantitatively compare two competing GWI models H and A with respective parameter sets $(\beta_H, \alpha_H)$ and $(\beta_A, \alpha_A)$. Since, as can be seen, e.g., in (29) below, we basically employ only (conditionally) distributional ingredients, such as the corresponding likelihood ratio (see, e.g., (13) to (15) and (27) to (29) below), all the results of Sections 3 to 6 can be immediately carried over to the above-mentioned time-series contexts (where we even allow for non-stationarities; in fact, we start with a one-point/Dirac distribution); for the sake of brevity, this will not be mentioned explicitly in the rest of the paper.
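Notice that a Poissonian GWI as well as all models (M1) and (M2) are, despite their conditional Poisson law, typically overdispersed, since (by the law of total variance, applied with the conditional Poisson law of $X_n$ given $\mathcal{F}_{n-1}$ and of $X_{n-1}$ given $\mathcal{F}_{n-2}$)

$$\mathrm{Var}_{P_\bullet}[X_n \mid \mathcal{F}_{n-2}] = E_{P_\bullet}[X_n \mid \mathcal{F}_{n-2}] + \beta_\bullet^2 \cdot (\alpha_\bullet + \beta_\bullet \cdot X_{n-2}) \;\geq\; E_{P_\bullet}[X_n \mid \mathcal{F}_{n-2}],$$

with equality iff (i.e., if and only if) $\alpha_\bullet = 0$ (NI) and $X_{n-2} = 0$ (extinction at $n-2$, with $n \geq 3$).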
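The overdispersion can also be checked through unconditional moments, which obey a simple exact recursion. With $m_n := E_{P_\bullet}[X_n]$ and $v_n := \mathrm{Var}_{P_\bullet}[X_n]$, the law of total variance together with the conditional Poisson law yields $m_n = \alpha_\bullet + \beta_\bullet m_{n-1}$ and $v_n = m_n + \beta_\bullet^2 v_{n-1}$, with $m_0 = X_0$ and $v_0 = 0$; hence $v_n \geq m_n$ for all $n \geq 1$ (with $v_1 = m_1$, since $X_1$ is exactly Poisson). For $\beta_\bullet < 1$ the limits are $\alpha_\bullet/(1-\beta_\bullet)$ and $\alpha_\bullet/((1-\beta_\bullet)(1-\beta_\bullet^2))$. A minimal Python sketch (function names hypothetical):

```python
def gwi_moments(x0, beta, alpha, n):
    """Return [E[X_0], ..., E[X_n]] and [Var[X_0], ..., Var[X_n]] for a Poissonian GWI."""
    means, variances = [float(x0)], [0.0]  # X_0 is deterministic
    for _ in range(n):
        m = alpha + beta * means[-1]  # E[X_j] = alpha + beta * E[X_{j-1}]
        v = m + beta**2 * variances[-1]  # Var[X_j] = E[X_j] + beta^2 * Var[X_{j-1}]
        means.append(m)
        variances.append(v)
    return means, variances


def stationary_moments(beta, alpha):
    """Limits of (E[X_n], Var[X_n]) as n -> infinity, valid for 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("stationary limits require 0 < beta < 1")
    mean = alpha / (1.0 - beta)
    return mean, mean / (1.0 - beta**2)


if __name__ == "__main__":
    m, v = gwi_moments(x0=5, beta=0.6, alpha=2.0, n=10)
    for j in range(len(m)):
        print(j, round(m[j], 4), round(v[j], 4))
    print(stationary_moments(0.6, 2.0))
```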

Applicability to Epidemiology
The above-mentioned framework can be used for any of the numerous fields of application of discrete-time branching processes and of the closely related INARCH(1) models. For the sake of brevity, we explain this in detail, as a kind of running example, for the currently highly important context of the epidemiology of infectious diseases. For insightful non-mathematical introductions to the latter, see, e.g., Kaslow & Evans [79], Osterholm & Hedberg [80]; for a first entry to as well as overviews of modelling, the reader is referred to, e.g., Grassly & Fraser [81], Keeling & Rohani [82], Yan [83,84], Britton [85], Diekmann, Heesterbeek & Britton [86], Cummings & Lessler [87], Just et al. [88], Britton & Giardina [89], Britton & Pardoux [43]. A survey on the particular role of branching processes in epidemiology can be found, e.g., in Jacob [41].
Undoubtedly, by nature, the spreading of an infectious disease through a (human, animal, plant) population is a branching process with possible immigration. Indeed, typically one has the following mechanism:
(D1) at some time $t_k^E$, called the time of exposure (moment of infection), an individual $k$ of a specified population is infected in a wide sense, i.e., entered/invaded/colonized by a number of transmissible disease-causative pathogens (etiologic agents such as viruses, bacteria, protozoans and other parasites, subviruses (e.g., prions and plant viroids), etc.); the individual is then a host (of pathogens);
(D2) depending on the level of immunity and some other factors, these pathogens may multiply/replicate within the host to an extent (over a threshold number) such that at time $t_k^I$ some of the pathogens start to leave their host (shedding of pathogens); in other words, the individual $k$ becomes infectious at the time $t_k^I$ of onset of infectiousness. Ex post, one can then say that the individual became infected in the narrow sense at the earlier time $t_k^E$, and call it a primary case. The time interval $[t_k^E, t_k^I[$ is called the latent/latency/pre-infectious period of $k$, and $t_k^I - t_k^E$ its duration (in some literature, there is no verbal distinction between them); notice that $t_k^I$ may differ from the time $t_k^{OS}$ of onset (first appearance) of symptoms, which leads to the so-called incubation period $[t_k^E, t_k^{OS}[$; if $t_k^I < t_k^{OS}$, the time interval $[t_k^I, t_k^{OS}[$ is called the pre-symptomatic period;
(D3) as long as the individual $k$ stays infectious, by shedding of pathogens it may infect in the narrow sense a random number $Y_k \in \mathbb{N}_0$ of other individuals which are susceptible (i.e., neither immune nor already infected in the narrow sense), where the distribution of $Y_k$ depends on the individual's (natural, voluntary, forced) behaviour, its environment, as well as some other factors, e.g., connected with the type of pathogen transmission; the newly infected individuals are called offspring of $k$, and secondary cases if they are from the same specified population, or exportations if they are from a different population; from the view of the latter, these infections are imported cases and thus can be viewed as immigrants;
(D4) at the time $t_k^R$ of cessation of infectiousness, the individual stops being infectious (e.g., because of recovery, death, or total isolation); the time interval $[t_k^I, t_k^R[$ is called the period of infectiousness (also period of communicability, infectious/infective/shedding/contagious period) of $k$, and $t_k^R - t_k^I$ its duration (in some literature, there is no verbal distinction between them); notice that $t_k^R$ may differ from the time $t_k^{CS}$ of cessation (last appearance) of symptoms, which leads to the so-called sickness period $[t_k^{OS}, t_k^{CS}[$;
(D5) this branching mechanism continues within the specified population until there are no infectious individuals and also no importations anymore (eradication, full extinction, total elimination), up to a specified final time (which may be large or even infinite).
All the above-mentioned times $t_k^{\cdot}$ and time intervals are random, by nature.
Two further connected quantities are also important for modelling (see, e.g., Yan & Chowell [84] (p. 241ff), including a history of the corresponding terminology). Firstly, the generation interval (generation time, transmission interval) is the time interval from the onset of infectiousness in a primary case (called the infector) to the onset of infectiousness in a secondary case (called the infectee) infected by the primary case; clearly, the generation interval is random, and so is its duration (often, the (population-)mean of the latter is also called generation interval). Typically, generation intervals are important ingredients of branching process models of infectious diseases. Secondly, the serial interval is the time interval from the onset of symptoms in a primary case to the onset of symptoms in a secondary case infected by the primary case. By nature, the serial interval is random, and so is its duration (often, the (population-)mean of the latter is also called serial interval). Typically, the serial interval is easier to observe than the generation interval, and thus the latter is often approximately estimated from data of the former. For further investigations on generation and serial intervals, the reader is referred to, e.g., Fine [90], Svensson [91,92] [105].
With the help of the above-mentioned individual ingredients, one can build, by aggregation, numerous different population-wide models of infectious diseases in discrete time as well as in continuous time; the latter are typically observed only at discrete time steps (discrete-time sampling), and hence in the following we concentrate on discrete-time modelling (of the real or the observational process). In fact, we confine ourselves to the important task of modelling the evolution $n \mapsto X_n$ of the number of incidences at "stage" $n$, where incidence refers to the number of newly infected/infectious individuals. Here, $n$ may be a generation number where, inductively, $n = 0$ refers to the generation of the first appearing primary cases in the population (also called initial importations), and $n$ refers to the generation of offspring of all individuals of generation $n-1$. Alternatively, $n$ may be the index of a physical ("calendar") point of time $t_n$, which may be deterministic or random; e.g., $(t_n)_{n \in \mathbb{N}}$ may be a strictly increasing sequence of (i) equidistant deterministic time points (and thus, one can identify $t_n = n$ in appropriate time units such as days, weeks, bi-weeks, months), or (ii) non-equidistant deterministic time points, or (iii) random time points (as a side remark, let us mention that in some situations, $X_n$ may alternatively denote the prevalence at "stage" $n$, where prevalence refers to the total number of infected/infectious individuals (e.g., through some methodical tricks like "self-infection")).
In the light of this, one can loosely define an epidemic as the rapid spread of an infectious disease within a specified population, where the numbers $X_n$ of incidences are high (or much higher than expected) for that kind of population. A pandemic is a geographically large-scale (e.g., multicontinental or worldwide) epidemic. An outbreak/onset of an epidemic in the narrow sense is the (time of) change where an infectious disease turns into an epidemic, which is typically quantified by the exceedance of a threshold; analogously, an outbreak/onset of a pandemic is the (time of) change where the epidemic turns into a pandemic. Of course, one goal of infectious-disease modelling is to quantify "early enough" the potential danger of an emerging outbreak of an epidemic or a pandemic.
Returning to possible models of the incidence-evolution $n \mapsto X_n$, its description may be theoretically derived from more detailed, time-finer, highly sophisticated, individual-based "mechanistic" infectious-disease models such as continuous-time susceptible-exposed-infectious-recovered (SEIR) models (see the above-mentioned introductory texts); however, as pointed out, e.g., in Held et al. [72], the estimation of the numerous parameters involved may be too ambitious for routinely collected, non-detailed disease data, such as daily/weekly counts $X_n$ of incidences, especially in decisive emerging/early phases of a novel disease (such as the current COVID-19 pandemic). Accordingly, in the following we assume that $X_n$ can be approximately described by a Poissonian Galton-Watson process with immigration, or equivalently by a ("distributionally equal") Poissonian autoregressive Generalized Linear Model in the sense of (M2). Depending on the situation, this can be quite reasonable, for the following reasons (apart from the usual "if the data say so"). Firstly, it is well known (see, e.g., Bartoszynski [33], Ludwig [34], Becker [35,36], Metz [37], Heyde [38], von Bahr & Martin-Löf [39], Ball [40], Jacob [41], Barbour & Reinert [42], Section 1.2 of Britton & Pardoux [43]) that in populations with a relatively high number of susceptible individuals and a relatively low number of infectious individuals (e.g., in a large population and in decisive emerging/early phases of the disease spreading), the incidence-evolution $n \mapsto X_n$ can be well approximated by a (e.g., Poissonian) Galton-Watson process with possible immigration, where $n$ plays the role of a generation number. If the above-mentioned generation interval is "nearly" deterministic (leading to nearly synchronous, non-overlapping generations), which is the case, e.g., for (phases of) Influenza A(H1N1)pdm09, Influenza A(H3N2), Rubella (cf. Vink, Bootsma & Wallinga [98]), and COVID-19 (cf. Ferretti et al. [101]), and if the length of the generation interval is approximated by its mean length and the latter is tuned to be equal to the unit time between consecutive observations, then $n$ plays the role of an observation (surveillance) time. This approximation is even more realistic if the period of infectiousness is nearly deterministic and relatively short. Secondly, as already mentioned above, the spreading of an infectious disease is intrinsically a (not necessarily Poissonian Galton-Watson) branching mechanism, which may be blurred by other effects in such a way that a Poissonian autoregressive Generalized Linear Model is still a reasonably fitting model for the observational process in disease surveillance. Such models have been used, e.g., by Finkenstädt, Bjornstad & Grenfell [73], Held, Höhle & Hofmann [47], and Held et al. [72]; they all use non-constant parameters (e.g., to describe seasonal effects, which are, however, unknown in early phases of a novel infectious disease such as COVID-19). In contrast, we employ different, new (namely, divergence-based) statistical techniques, for which we assume constant parameters but also indicate procedures for the detection of changes; the extension to non-constant parameters is straightforward. Returning to Galton-Watson processes, let us mention as a side remark that they can also be used to model the above-mentioned within-host replication dynamics (D2) (e.g., in the time interval $[t_k^E, t_k^I[$ and beyond) on a sub-cellular level, see, e.g., Spouge [106], as well as Taneyhill, Dunn & Hatcher [107] for parasitic pathogens; on the other hand, one can also employ Galton-Watson processes for quantifying snowball-effect (avalanche-effect, cascade-effect) type, economic-crisis triggered consequences of large epidemics and pandemics, such as the potential spread of transmissible (i) foreclosures of homes (cf. Parnes [108]), or clearly also (ii) company insolvencies, downsizings and credit-risk downgradings; moreover, the time-evolution of integer-valued indicators concerning the spread of (rational or unwarranted) fears or perceived threats may be modelled, too.
Summing up, we model the evolution $n \mapsto X_n$ of the number of incidences at stage $n$ by a Poissonian Galton-Watson process with immigration GWI (where $Y_{n-1,k}$ corresponds to the $Y_k$ of (D3), equipped with an additional stage index $n-1$), or by a corresponding "distributionally equal" (possibly non-stationary) Poissonian autoregressive Generalized Linear Model in the sense of (M2); depending on the situation, we may also fix a (deterministic or random) upper time horizon other than infinity. Recall that both models are overdispersed, which is consistent with the current debate on overdispersion in connection with the COVID-19 pandemic. In infectious-disease language, the sum $\sum_{k=1}^{X_{n-1}} Y_{n-1,k}$ can also be loosely interpreted as the epidemic component (in a narrow sense) driven by the parameter $\beta_\bullet$, and $\widetilde{Y}_n$ as the endemic component driven by the parameter $\alpha_\bullet$. In fact, the offspring mean (here, $\beta_\bullet$) is called the reproduction number and plays a major role (also, e.g., in the current public debate about the COVID-19 pandemic), because it crucially determines the rapidity of the spread of the disease and (as already indicated above in the second and third paragraphs after (PS3)) also the probability that the epidemic/pandemic becomes (maybe temporarily) extinct or at least stationary at a low level (that is, endemic). For this to happen, $\beta_\bullet$ should be subcritical, i.e., $\beta_\bullet < 1$, and even better, close to zero. Of course, the size of the importation mean $\alpha_\bullet \geq 0$ matters, too, in a secondary order.
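To see how strongly the reproduction number drives the expected incidence, note that iterating $E_{P_\bullet}[X_n] = \alpha_\bullet + \beta_\bullet E_{P_\bullet}[X_{n-1}]$ gives $E_{P_\bullet}[X_n] = \beta_\bullet^n X_0 + \alpha_\bullet (1-\beta_\bullet^n)/(1-\beta_\bullet)$ for $\beta_\bullet \neq 1$, and $X_0 + n\alpha_\bullet$ for $\beta_\bullet = 1$. The Python sketch below evaluates this closed form and the expected cumulative incidence over a horizon, e.g., to compare a hypothetical scenario without intervention ($\beta_\bullet = 1.3$) with one in which measures push $\beta_\bullet$ down to $0.8$; these numbers and function names are made up for illustration only.

```python
def expected_incidence(x0, beta, alpha, n):
    """E[X_n] for a Poissonian GWI started at the deterministic size x0."""
    if beta == 1.0:
        return x0 + n * alpha  # critical case: linear growth driven by immigration
    return beta**n * x0 + alpha * (1.0 - beta**n) / (1.0 - beta)


def expected_cumulative_incidence(x0, beta, alpha, horizon):
    """E[X_1 + ... + X_horizon], the expected total number of new cases up to the horizon."""
    return sum(expected_incidence(x0, beta, alpha, n) for n in range(1, horizon + 1))


if __name__ == "__main__":
    for beta in (1.3, 1.0, 0.8):
        total = expected_cumulative_incidence(x0=10, beta=beta, alpha=2.0, horizon=20)
        print(f"beta = {beta}: expected cumulative incidence over 20 stages = {total:.1f}")
```

For $\beta_\bullet < 1$ the expected incidence settles at the endemic level $\alpha_\bullet/(1-\beta_\bullet)$, whereas for $\beta_\bullet > 1$ it grows geometrically, which is why pushing the reproduction number below 1 dominates the effect of the importation mean.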
Keeping this in mind, let us discuss which factors the reproduction number $\beta_\bullet$ and the importation mean $\alpha_\bullet$ depend on, and how they can be influenced/controlled. To begin with, by recalling the above-mentioned points (D1) to (D5) and by adapting the considerations of, e.g., Grassly & Fraser [81] to our model, one finds that the distribution of the offspring $Y_{n-1,k}$ (here driven by the reproduction number (offspring mean) $\beta_\bullet$) depends on the following factors:
(B1) the degree of infectiousness of the individual $k$, with three major components:
(B1a) degree of biological infectiousness; this reflects the within-host dynamics (D2) of the "representative" individual $k$, in particular the duration and amount of the corresponding replication and shedding/excretion of the infectious pathogens; this degree thus depends on (i) the number of host-invading pathogens (called the initial infectious dose), (ii) the type of the pathogen with respect to, e.g., its principal capabilities of replication speed, range of spread and drug sensitivity, (iii) features of the immune system of the host $k$, including the level of innate or acquired immunity, and (iv) the interaction between the genetic determinants of disease progression in both the pathogen and the host;
(B1b) degree of behavioural infectiousness; this depends on the contact patterns of an infected/infectious individual (and, if relevant, the contact patterns of intermediate hosts or vectors), in relation to the disease-specific type of route(s) of transmission of the infectious pathogens (for an overview of the latter, see, e.g., Table 3 of Kaslow & Evans [79]); a long-distance-travel behaviour may also lead to the disease exportation to another, outside population (and thus, for the latter, to a disease importation);
(B1c) degree of environmental infectiousness; this depends on the location and environment of the host $k$, which influences the duration of outside-host survival of the pathogens (and, if relevant, of the intermediate hosts or vectors) as well as the speed and range of their outside-host spread; for instance, high temperature may kill the pathogens, and high airflow or rainfall dynamics may ease their spread, etc.
(B2) the degree of susceptibility of uninfected individuals who have contact with k, with the following three major components (with backgrounds similar to those of their infectiousness counterparts): (B2a) degree of biological susceptibility; (B2b) degree of behavioural susceptibility; (B2c) degree of environmental susceptibility.
All these factors (B1a) to (B2c) can, in principle, be influenced/controlled to a certain (respective) extent. Let us briefly discuss this for human infectious diseases, where one major goal of epidemic risk management is to implement countermeasures/interventions that slow down disease transmission (e.g., by reducing the reproduction number β_• to below 1) and eventually even break the chain of transmission, for the sake of containment or mitigation; preparedness and preparation are motives, too, for instance as part of governmental pandemic risk management.
For instance, (B1a) can be reduced or even eliminated through pharmaceutical interventions such as medication (if available), and through preventive strengthening of the immune system by moderate sports activities and healthy food.
Moreover, the following exemplary control measures for (B2) can be put into action either by common-sense self-behaviour, by large-scale public recommendations (e.g., through mass media), or by rules/requirements from authorities: (i) personal preventive measures such as frequent washing and disinfecting of hands; keeping hands away from the face; covering coughs; avoiding handshakes and hugs with non-family members; maintaining physical distance (e.g., of two meters) from non-family members; wearing a face mask of an appropriate protection level (such as a homemade cloth face mask, a particulate-filtering face-piece respirator, a medical (non-surgical) mask, or a surgical mask); self-quarantine; (ii) environmental measures, such as cleaning of surfaces; (iii) community measures aimed at mild or stringent social distancing, such as prohibiting/cancelling/banning gatherings of more than z non-family members (e.g., z = 2, 5, 10, 100, 1000 in various phases and countries during the current COVID-19 pandemic); mask-wearing (see above); closing of schools, universities, and some or even all nonessential ("system-irrelevant") businesses and venues; home-officing/work bans; home isolation of disease cases; isolation of homes for the elderly (nursing homes); stay-at-home orders with exemptions, household or even general quarantine; testing and tracing; lockdown of entire cities and beyond; restricting the degree of travel freedom/allowed mobility (e.g., local, union-state, national, international, including border and airport closures). The latter also affects the mean importation rate α_•, which can also be controlled by vaccination programs in "outside populations".
As far as the degree of biological susceptibility (B2a) is concerned, one obvious preventive countermeasure is a mass vaccination program/campaign (if available).
In the case of highly virulent infectious diseases causing epidemics and pandemics with substantial fatality rates, some of the above-mentioned control strategies and countermeasures may (have to) be "drastic" (e.g., a lockdown) and thus imply considerable social and economic costs, with a huge impact and the potential danger of triggering severe social, economic and political disruptions.
In order to prepare corresponding suggestions for decisions about appropriate control measures (e.g., public policies), it is therefore important (especially for a novel infectious disease such as the current COVID-19 pandemic) to have a model for the time evolution of the incidences in (i) a natural (basically uncontrolled) set-up, as well as in (ii) the control set-ups under consideration. As already mentioned above, we assume that all these situations can be distilled into an incidence evolution n ↦ X_n which follows a Poissonian Galton-Watson process with respectively different parameter pairs (β_•, α_•). Correspondingly, we always compare two alternative models (H) and (A) with parameter pairs (β_H, α_H) and (β_A, α_A) which reflect either a "pure" statistical uncertainty (under the same uncontrolled or controlled set-up), or the uncertainty between two different potential control set-ups (for the sake of assessing the potential impact/efficiency of some planned interventions, compared with alternative ones); the economic impact can also be taken into account within a Bayesian decision framework, discussed in Section 2.5 below. As will be explained in the next subsections, we carry out such comparisons by means of density-based dissimilarity distances/divergences and related quantities.
From the detailed explanations above, it is immediately clear that for the described epidemiological context one should investigate all types of criticality and importation means for the two involved Poissonian Galton-Watson processes with/without immigration (respectively, the equally distributed INARCH(1) models); in particular, this motivates (or even "justifies") the lengthy detailed studies in Sections 3-7 below.
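To make the model comparison concrete, here is a minimal simulation sketch of a Poissonian Galton-Watson process with immigration, assuming Poisson(β_•) offspring and Poisson(α_•) immigration as in the text. The helper name `simulate_poisson_gwi` and the use of NumPy are our own choices. The sketch relies on the fact that, given X_{n−1} = x, the generation size X_n is Poisson(β_• x + α_•) distributed, because independent Poisson counts add up to a Poisson count.

```python
import numpy as np


def simulate_poisson_gwi(beta, alpha, x0, n, seed=None):
    """Simulate X_0, ..., X_n of a Galton-Watson process with immigration
    (hypothetical helper) with Poisson(beta) offspring and Poisson(alpha)
    immigration.

    Given X_{k-1} = x, the x independent Poisson(beta) offspring counts plus
    one independent Poisson(alpha) immigration count sum to a
    Poisson(beta * x + alpha) variable, so one draw per generation suffices.
    """
    if beta < 0 or alpha < 0:
        raise ValueError("beta and alpha must be nonnegative")
    rng = np.random.default_rng(seed)
    path = [int(x0)]
    for _ in range(n):
        path.append(int(rng.poisson(beta * path[-1] + alpha)))
    return path


# Scenario values taken from the text (weekly incidence generations):
# (H) "relatively harmless": beta_H = 0.5, alpha_H = 0.2
# (A) "potentially dangerous": beta_A = 2, alpha_A = 10
path_h = simulate_poisson_gwi(0.5, 0.2, x0=5, n=8, seed=1)
path_a = simulate_poisson_gwi(2.0, 10.0, x0=5, n=8, seed=1)
print(path_h)
print(path_a)
```

Drawing one Poisson variable per generation, rather than one per individual, keeps the simulation cheap even in the supercritical scenario (A), where generation sizes grow roughly like β_A^n.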

Information Measures
Having two competing models (H) and (A) at stake, it makes sense to study questions such as "how far apart are they?" and thus "how dissimilar are they?". This can be quantified in terms of divergences in the sense of directed (i.e., not necessarily symmetric) distances, for which the triangle inequality usually fails. Let us first discuss the employed divergence subclasses in a general set-up of two equivalent probability measures P_H, P_A on a measurable space (Ω, F). In terms of the parameter λ ∈ R, the power divergences (also known as Cressie-Read divergences, relative Tsallis entropies, or the generalized cross-entropy family) are defined as (see e.g., Liese & Vajda [1,10])
$$ I_\lambda(P_A\|P_H) := \begin{cases} \dfrac{1 - H_\lambda(P_A\|P_H)}{\lambda(1-\lambda)}, & \lambda \in \mathbb{R}\setminus\{0,1\},\\[1ex] I(P_A\|P_H), & \lambda = 1,\\ I(P_H\|P_A), & \lambda = 0, \end{cases} \tag{2}$$
where
$$ I(P_A\|P_H) := \int_\Omega p_A \log\frac{p_A}{p_H}\, d\mu \tag{3}$$
is the Kullback-Leibler information divergence (also known as relative entropy) and
$$ H_\lambda(P_A\|P_H) := \int_\Omega p_A^{\lambda}\, p_H^{1-\lambda}\, d\mu \tag{4}$$
is the Hellinger integral of order λ ∈ R\{0,1}; for this, we assume as usual, without loss of generality, that the probability measures P_H, P_A are dominated by some σ-finite measure µ, with densities
$$ p_A := \frac{dP_A}{d\mu}, \qquad p_H := \frac{dP_H}{d\mu} \tag{5}$$
defined on Ω (the zeros of p_H, p_A are handled in (3) and (4) with the usual conventions). Clearly, for The Kullback-Leibler information divergences (relative entropies) in (2) and (3) can alternatively be expressed as (see, e.g., Liese & Vajda [1])
$$ I(P_A\|P_H) = \lim_{\lambda \uparrow 1} I_\lambda(P_A\|P_H), \qquad I(P_H\|P_A) = \lim_{\lambda \downarrow 0} I_\lambda(P_A\|P_H).$$
Apart from the Kullback-Leibler information divergence (relative entropy), other prominent examples of power divergences are the squared Hellinger distance (1/2) I_{1/2}(P_A||P_H) and Pearson's χ²-divergence 2 I_2(P_A||P_H); the Hellinger integral H_{1/2}(P_A||P_H) is also known as (a multiple of) the Bhattacharyya coefficient. Extensive studies of basic and advanced general facts on power divergences, Hellinger integrals and the related Renyi divergences of order λ ∈ R\{0,1} can be found e.g., in Liese & Vajda [1,10], Jacod & Shiryaev [24], and van Erven & Harremoes [20] (as a side remark, R_{1/2}(P_A||P_H) is also known as (a multiple of) the Bhattacharyya distance). For instance, the integrals in (3) and (4) do not depend on the choice of µ.
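The relations between the power divergences, the Kullback-Leibler divergence and the Hellinger integrals can be checked numerically on a finite sample space with µ the counting measure. The sketch below uses our own helper names and is restricted to strictly positive probability vectors, which sidesteps the zero conventions mentioned above. It computes H_λ, I and I_λ from their defining formulas and confirms that (1/2) I_{1/2} is the squared Hellinger distance and 2 I_2 is Pearson's χ²-divergence.

```python
import math


def hellinger_integral(p_a, p_h, lam):
    """H_lambda(P_A||P_H) = sum_x p_A(x)^lam * p_H(x)^(1-lam), for finite
    probability vectors with strictly positive entries (so P_A ~ P_H)."""
    return sum(a ** lam * h ** (1.0 - lam) for a, h in zip(p_a, p_h))


def kullback_leibler(p_a, p_h):
    """I(P_A||P_H) = sum_x p_A(x) * log(p_A(x) / p_H(x))."""
    return sum(a * math.log(a / h) for a, h in zip(p_a, p_h))


def power_divergence(p_a, p_h, lam):
    """I_lambda(P_A||P_H) = (1 - H_lambda) / (lambda (1 - lambda)), with the
    Kullback-Leibler cases at lambda = 1 and lambda = 0."""
    if lam == 1:
        return kullback_leibler(p_a, p_h)
    if lam == 0:
        return kullback_leibler(p_h, p_a)
    return (1.0 - hellinger_integral(p_a, p_h, lam)) / (lam * (1.0 - lam))


p_a = [0.2, 0.5, 0.3]
p_h = [0.4, 0.4, 0.2]
# 0.5 * I_{1/2} equals the squared Hellinger distance sum (sqrt(a) - sqrt(h))^2
sq_hellinger = sum((math.sqrt(a) - math.sqrt(h)) ** 2 for a, h in zip(p_a, p_h))
print(0.5 * power_divergence(p_a, p_h, 0.5), sq_hellinger)
# 2 * I_2 equals Pearson's chi-square divergence sum (a - h)^2 / h
chi_square = sum((a - h) ** 2 / h for a, h in zip(p_a, p_h))
print(2.0 * power_divergence(p_a, p_h, 2.0), chi_square)
```

Evaluating `power_divergence` at λ close to 1 also illustrates the limit representation of the Kullback-Leibler divergence numerically.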
Furthermore, one has the skew symmetries
$$ H_\lambda(P_A\|P_H) = H_{1-\lambda}(P_H\|P_A), \qquad I_\lambda(P_A\|P_H) = I_{1-\lambda}(P_H\|P_A) \tag{8}$$
(the first for all λ ∈ R\{0,1}, the second for all λ ∈ R; see e.g., Liese & Vajda [1]). As far as finiteness is concerned, for λ ∈ ]0,1[ one gets the rudimentary bounds
$$ 0 < H_\lambda(P_A\|P_H) \le 1, \tag{9}$$
and equivalently,
$$ 0 \le I_\lambda(P_A\|P_H) < \frac{1}{\lambda(1-\lambda)}, \tag{10}$$
where the lower bound in (10) (upper bound in (9)) is achieved iff P_A = P_H. For λ ∈ R\]0,1[, one gets the bounds
$$ 1 \le H_\lambda(P_A\|P_H) \le \infty \;\; (\lambda \notin \{0,1\}), \qquad 0 \le I_\lambda(P_A\|P_H) \le \infty, \tag{11}$$
where, in contrast to the above, both the lower bound of H_λ(P_A||P_H) and the lower bound of I_λ(P_A||P_H) are achieved iff P_A = P_H; however, the power divergence I_λ(P_A||P_H) and the Hellinger integral H_λ(P_A||P_H) might be infinite, depending on the particular set-up. The Hellinger integrals can also be used for bounds of the well-known total variation V(P_A||P_H), with p_A and p_H defined in (5). Certainly, the total variation is one of the best-known statistical distances, see e.g., Le Cam [109]. For arbitrary From this, together with the particular choice λ = 1/2, we can derive the fundamental universal bounds We apply these concepts to our set-up of Section 2.1 with two competing models (H) and (A) of Galton-Watson processes with immigration, where one can take Ω ⊂ N_0^{N_0} to be the space of all paths of (X_n)_{n∈N}. More precisely, in terms of the extinction set B := {τ < ∞} and the parameter-set notation (PS1) to (PS3), it is known that for P_SP the two laws P_H and P_A are equivalent, whereas for P_NI the two restrictions P_H|_B and P_A|_B are equivalent (see e.g., Lemma 1.1.3 of Guttorp [52]); with a slight abuse of notation we shall henceforth omit |_B. Consistently, for fixed time n ∈ N_0 we introduce P_{A,n} := P_A|_{F_n} and P_{H,n} := P_H|_{F_n} as well as the corresponding Radon-Nikodym derivative (likelihood ratio)
$$ Z_n := \frac{dP_{A,n}}{dP_{H,n}}, \tag{13}$$
where (F_n)_{n∈N} denotes the canonical filtration generated by X := (X_n)_{n∈N}; in other words, F_n reflects the "process-intrinsic" information known at stage n. Clearly, Z_0 = 1.
By choosing the reference measure µ = P_{H,n}, one obtains from (4) the Hellinger integral H_λ(P_{A,0}||P_{H,0}) = 1, as well as
$$ H_\lambda(P_{A,n}\|P_{H,n}) = E_{P_{H,n}}\big[(Z_n)^\lambda\big] \tag{14}$$
for all n ∈ N, from which one can immediately build I_λ(P_{A,n}||P_{H,n}) (λ ∈ R), respectively R_λ(P_{A,n}||P_{H,n}) (λ ∈ R\{0,1}), respectively bounds of V(P_{A,n}||P_{H,n}), via (2), (7) and (12), respectively.
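For the Poissonian GWI, the likelihood ratio Z_n can be evaluated along an observed path, because X_k given X_{k−1} = x is Poisson(β_• x + α_•) distributed; the factorials 1/X_k! cancel in the ratio. The sketch below uses hypothetical helper names and assumes α_A, α_H > 0 (the P_SP case), so that all Poisson means are positive. It computes log Z_n, and it checks numerically that E_{P_{H,1}}[Z_1] = 1 and that E_{P_{H,1}}[(Z_1)^λ] agrees with the one-step closed form exp(μ_A^λ μ_H^{1−λ} − λμ_A − (1−λ)μ_H), which we derived directly from the Poisson probabilities (it is not a formula quoted from the text).

```python
import math


def log_likelihood_ratio(path, beta_a, alpha_a, beta_h, alpha_h):
    """log Z_n for an observed path [X_0, ..., X_n] of a Poissonian GWI.

    Each transition x -> y contributes the log-ratio of the Poisson(f_A(x))
    and Poisson(f_H(x)) probabilities of y, with f(x) = beta * x + alpha;
    the factorials cancel. An empty product (n = 0) gives log Z_0 = 0.
    """
    log_z = 0.0
    for x, y in zip(path[:-1], path[1:]):
        f_a = beta_a * x + alpha_a
        f_h = beta_h * x + alpha_h
        log_z += y * math.log(f_a / f_h) - (f_a - f_h)
    return log_z


def poisson_log_pmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def one_step_hellinger(x0, beta_a, alpha_a, beta_h, alpha_h, lam):
    """Closed form of E_{P_H}[Z_1^lam] given X_0 = x0 (our own derivation)."""
    mu_a = beta_a * x0 + alpha_a
    mu_h = beta_h * x0 + alpha_h
    return math.exp(mu_a ** lam * mu_h ** (1 - lam) - lam * mu_a - (1 - lam) * mu_h)


params = (2.0, 10.0, 0.5, 0.2)  # (beta_A, alpha_A, beta_H, alpha_H) from the text
x0, lam = 3, 0.5
mu_h = params[2] * x0 + params[3]
# truncated enumeration of E_{P_H}[Z_1^lam] over y = 0, ..., 199
h_enum = sum(
    math.exp(poisson_log_pmf(y, mu_h) + lam * log_likelihood_ratio([x0, y], *params))
    for y in range(200)
)
print(h_enum, one_step_hellinger(x0, *params, lam))
```

Working on the log scale avoids overflow of Z_n, which quickly becomes astronomically large or small when the two models are far apart.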
The resulting values (respectively bounds) of H_λ(P_{A,n}||P_{H,n}) are quite diverse and depend on the choice of the involved parameter pairs (β_H, α_H), (β_A, α_A) as well as on λ; the exact details will be given in Sections 3 and 6 below.
Before doing so, we explain in the following how the resulting dissimilarity results can be applied to Bayesian testing and more general Bayesian decision making, as well as to Neyman-Pearson testing.

Decision Making under Uncertainty
Within the above-mentioned context of two competing models (H) and (A) of Galton-Watson processes with immigration, let us briefly discuss how knowledge about the time evolution of the Hellinger integrals H_λ(P_{A,n}||P_{H,n}) (or, equivalently, of the power divergences I_λ(P_{A,n}||P_{H,n}), cf. (2)) can be used to make decisions under uncertainty, within a framework of Bayesian decision making (BDM) or, alternatively, of Neyman-Pearson testing (NPT).
In our context of BDM, we decide between an action d_H "associated with" the (say) hypothesis law P_H and an action d_A "associated with" the (say) alternative law P_A, based on the sample path observation X_n := {X_l : l ∈ {0, 1, ..., n}} of the GWI generation sizes (e.g., infectious-disease incidences, cf. Section 2.3) up to the observation horizon n ∈ N. Following the lines of Stummer & Vajda [15] (adapted to our branching-process context), for our BDM let us consider as admissible decision rules δ_n : Ω_n → {d_H, d_A} the ones generated by all path sets G_n ⊂ Ω_n (where Ω_n denotes the space of all possible paths of (X_k)_{k∈{1,...,n}}) through
$$ \delta_{G_n}(X_n) := \begin{cases} d_A, & X_n \in G_n,\\ d_H, & X_n \notin G_n, \end{cases}$$
as well as loss functions of the form
$$ \mathcal{L}(d_H, H) = \mathcal{L}(d_A, A) = 0, \qquad \mathcal{L}(d_A, H) = L_H, \qquad \mathcal{L}(d_H, A) = L_A, \tag{16}$$
with pregiven constants L_A > 0, L_H > 0 (e.g., arising as bounds from quantities in worst-case scenarios); notice that in (16), d_H is assumed to be a zero-loss action under H and d_A a zero-loss action under A. By definition, the Bayes decision rule δ_{G_{n,min}} minimizes, over G_n, the mean decision loss
$$ p_H^{prior}\, L_H\, P_{H,n}(G_n) + p_A^{prior}\, L_A\, P_{A,n}(\Omega_n \setminus G_n)$$
for given prior probabilities p_H^{prior} ∈ ]0,1[ for H and p_A^{prior} := 1 − p_H^{prior} for A. As a side remark, in a certain sense the involved model (parameter) uncertainty, expressed by the "superordinate" Bernoulli-type law Pr = Bin(1, p_H^{prior}), can also be reinterpreted as a rudimentary static random environment caused e.g., by a random Bernoulli-type external static force.
By straightforward calculations, one gets with (13) the minimizing path set
$$ G_{n,\min} = \Big\{ Z_n \ge \frac{p_H^{prior} L_H}{p_A^{prior} L_A} \Big\},$$
leading to the minimal mean decision loss, i.e., the Bayes risk,
$$ R_n = \int_{\Omega_n} \min\big\{ p_H^{prior} L_H,\; p_A^{prior} L_A\, Z_n \big\}\, dP_{H,n}.$$
Notice that, by straightforward standard arguments, the alternative decision procedure as well as the lower bound which implies in particular the "direct" lower bound By using (19) (respectively (20)) together with the exact values and the upper (respectively lower) bounds of the Hellinger integrals H_λ(P_{A,n}||P_{H,n}) derived in the following sections, we end up with upper (respectively lower) bounds of the Bayes risk R_n. Of course, with the help of (2) the bounds (19) and (20) can be (i) immediately rewritten in terms of the power divergences I_λ(P_{A,n}||P_{H,n}) and (ii) thus be directly interpreted in terms of dissimilarity-size arguments. As a side remark, in such a Bayesian context the λ-order Hellinger integral H_λ(P_{A,n}||P_{H,n}) = E_{P_{H,n}}[(Z_n)^λ] (cf. (14)) can also be interpreted as the λ-order Bayes-factor moment (with respect to P_{H,n}), since Z_n = Z_n(X_n) is the Bayes factor (i.e., the posterior odds ratio of (A) to (H), divided by the prior odds ratio of (A) to (H)).
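For horizon n = 1 the Bayes risk can be computed by enumerating y = X_1 and compared with the upper bound that follows from the elementary inequality min{a, b} ≤ a^λ b^{1−λ} (a, b ≥ 0, λ ∈ ]0,1[), namely R_n ≤ (p_A^{prior} L_A)^λ (p_H^{prior} L_H)^{1−λ} H_λ(P_{A,n}||P_{H,n}). We believe this is the kind of Hellinger-integral bound the text refers to, but its exact formula is not reproduced above. The sketch uses hypothetical function names, truncated Poisson enumeration, α_A, α_H > 0, and a one-step Poisson Hellinger integral that we derived ourselves. Its loss convention is that L_H is incurred for d_A under (H) and L_A for d_H under (A), which matches the threshold of G_{n,min}.

```python
import math


def poisson_pmf(y, mu):
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def bayes_decision(z, p_a, loss_a, loss_h):
    """Bayes rule: take d_A iff Z_n >= p_H L_H / (p_A L_A)."""
    return "d_A" if z >= (1.0 - p_a) * loss_h / (p_a * loss_a) else "d_H"


def bayes_risk_one_step(x0, beta_a, alpha_a, beta_h, alpha_h, p_a, loss_a, loss_h, y_max=400):
    """R_1 = sum_y min(p_H L_H P_H(y), p_A L_A P_A(y)), truncated at y_max."""
    mu_a = beta_a * x0 + alpha_a
    mu_h = beta_h * x0 + alpha_h
    p_h = 1.0 - p_a
    return sum(
        min(p_h * loss_h * poisson_pmf(y, mu_h), p_a * loss_a * poisson_pmf(y, mu_a))
        for y in range(y_max + 1)
    )


def hellinger_upper_bound(x0, beta_a, alpha_a, beta_h, alpha_h, p_a, loss_a, loss_h, lam):
    """(p_A L_A)^lam (p_H L_H)^(1-lam) H_lam with the one-step Poisson H_lam."""
    mu_a = beta_a * x0 + alpha_a
    mu_h = beta_h * x0 + alpha_h
    h_lam = math.exp(mu_a ** lam * mu_h ** (1 - lam) - lam * mu_a - (1 - lam) * mu_h)
    return (p_a * loss_a) ** lam * ((1.0 - p_a) * loss_h) ** (1 - lam) * h_lam


args = (3, 0.8, 0.5, 0.5, 0.2)  # X_0, beta_A, alpha_A, beta_H, alpha_H (illustrative)
risk = bayes_risk_one_step(*args, p_a=0.3, loss_a=5.0, loss_h=1.0)
for lam in (0.25, 0.5, 0.75):
    print(lam, risk, hellinger_upper_bound(*args, p_a=0.3, loss_a=5.0, loss_h=1.0, lam=lam))
```

The bound can be minimized over λ ∈ ]0,1[; which λ is best depends on the parameter constellation and the losses.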
At this point, potential applicants should be warned about the usual way of asynchronous decision making, where one first tests (A) versus (H) (i.e., L_A = L_H = 1, which leads to 0-1 losses in (16)) and afterwards, based on the resulting outcome (e.g., in favour of (A)), takes the attached economic decision (e.g., d_A); this can lead to distortions compared with synchronous decision making with "full" monetary losses L_A and L_H, as shown in Stummer & Lao [16] within an economic context in connection with discrete approximations of financial diffusion processes (they call this distortion effect a non-commutativity between Bayesian statistical and investment decisions).
For different types of Bayesian analyses based on GW(I) generation-size observations (mainly concerning parameter estimation with squared-error type loss functions), see e.g., Jagers [ [114], and the references therein.
Within our running epidemiological example of Section 2.3, let us briefly discuss the role of the above-mentioned losses L_A and L_H. To begin with, as mentioned above, the unit-free choice L_A = L_H = 1 corresponds to Bayesian testing. Recall that this concerns two alternative infectious-disease models (H) and (A) with parameter pairs (β_H, α_H) and (β_A, α_A) (recall the interpretation of β_• as reproduction number and α_• as importation mean), which reflect either a "pure" statistical uncertainty (under the same uncontrolled or controlled set-up), or the uncertainty between two different potential control set-ups (for the sake of assessing the potential impact/efficiency of some planned interventions, compared with alternative ones). As far as non-unit-free (e.g., macroeconomic or monetary) losses are concerned, recall that some of the above-mentioned control strategies (countermeasures, public policies, governmental pandemic risk management plans) may imply considerable social and economic costs, with a huge impact and the potential danger of triggering severe social, economic and political disruptions; a corresponding trade-off between health and economic issues can be incorporated by choosing L_A and L_H to be (e.g., monetary) values which reflect estimates or upper bounds of losses due to wrong decisions, e.g., if at stage n, due to the observed data, one erroneously thinks (reinforced by fear) that a novel infectious disease (e.g., COVID-19) will lead to (or re-emerge as) a severe pandemic and consequently decides for a lockdown with drastic future economic consequences, versus if one erroneously thinks (reinforced by carelessness) that the infectious disease is (or stays) non-severe and consequently eases some/all control measures, which will lead to extremely devastating future economic consequences.
For the estimates/bounds of L_A and L_H, one can e.g., employ (i) the comprehensive stochastic studies of Feicht & Stummer [115] on the quantitative degree of elasticity and speed of recovery of economies after a sudden macroeconomic disaster, or (ii) the more short-term, Germany-specific, scenario-type (basically non-stochastic) studies of Dorn et al. [116,117] in connection with the current COVID-19 pandemic.
Of course, the above-mentioned Bayesian decision procedure can also be operated in a sequential way. For instance, suppose that we encounter a novel infectious disease (e.g., COVID-19) with a non-negligible fatality rate, and let (A) reflect a "potentially dangerous" infectious-disease-transmission situation (e.g., a reproduction number in the substantially supercritical case β_A = 2, and an importation mean of α_A = 10, for weekly appearing new incidence generations), whereas (H) describes a "relatively harmless/mild" situation (e.g., a substantially subcritical β_H = 0.5, α_H = 0.2). Moreover, let d_A respectively d_H denote (non-quantitatively) the decision/action to accept (A) respectively (H). It can then be reasonable to stop the observation process n ↦ X_n (also called surveillance or online monitoring) of incidence numbers at the first time at which n ↦ Z_n = Z_n(X_n) reaches the threshold p_H^{prior}/p_A^{prior}; if this happens, one takes d_A as decision (and e.g., declares the situation an epidemic outbreak and starts control/intervention measures; however, as explained above, one should synchronously also take the potential economic losses into account), whereas as long as this does not happen, one continues the observation (and implicitly takes d_H as decision). This can be modelled in terms of the pair (τ, d_A) with the (random) stopping time
$$ \tau := \inf\Big\{ n \in \mathbb{N} : Z_n \ge \frac{p_H^{prior}}{p_A^{prior}} \Big\}$$
(with the usual convention that the infimum of the empty set is infinity), and the corresponding decision d_A. After the time τ < ∞ and, e.g., the immediate subsequent employment of some control/counter measures, one can e.g., take the old model (A) as the new (H), declare a new target (A) for the desired quantification of the effectiveness of the employed control measures (e.g., a mitigation to a slightly subcritical case of β_A = 0.95, α_A = 0.8), and start to observe the new incidence numbers until the new target (A) has been reached.
This can be interpreted as online detection of a distributional change; a related comprehensive new framework for the use of divergences (even much beyond power divergences) for distributional change detection can be found e.g., in the recent work of Kißlinger & Stummer [118]. A completely different, SIR-model-based approach for the detection of change points in the spread of COVID-19 is given in Dehning et al. [119]. Moreover, other surveillance methods can be found e.g., in the overview of Frisen [120] and in the Swedish epidemic outbreak investigations of Frisen, Andersson & Schiöler [121].
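For an observed incidence path, the stopping time τ can be evaluated online by accumulating log Z_n, as in the following sketch. The helper name `first_alarm` and the log-scale implementation are our own, and the example parameters are the scenario values (β_A, α_A) = (2, 10), (β_H, α_H) = (0.5, 0.2) from the text. Since only a finite path is ever observed, the function returns infinity when no alarm is raised up to the last observation.

```python
import math


def first_alarm(path, beta_a, alpha_a, beta_h, alpha_h, p_a):
    """Smallest n >= 1 with Z_n >= p_H / p_A along an observed path
    [X_0, ..., X_N] of a Poissonian GWI; returns math.inf if no alarm occurs
    up to N (mirroring the convention inf of the empty set = infinity)."""
    log_threshold = math.log((1.0 - p_a) / p_a)
    log_z = 0.0
    for n in range(1, len(path)):
        x, y = path[n - 1], path[n]
        f_a = beta_a * x + alpha_a
        f_h = beta_h * x + alpha_h
        log_z += y * math.log(f_a / f_h) - (f_a - f_h)  # running log Z_n
        if log_z >= log_threshold:
            return n
    return math.inf


# (A): beta_A = 2, alpha_A = 10; (H): beta_H = 0.5, alpha_H = 0.2; equal priors
print(first_alarm([1, 0, 8], 2.0, 10.0, 0.5, 0.2, p_a=0.5))
print(first_alarm([1, 0, 0], 2.0, 10.0, 0.5, 0.2, p_a=0.5))
```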
One can refine the above-mentioned sequential procedure via two (instead of one) appropriate thresholds c_1 < c_2 and the pair (τ̃, δ_τ̃), with the stopping time
$$ \tilde{\tau} := \inf\big\{ n \in \mathbb{N} : Z_n \notin [c_1, c_2] \big\}$$
as well as a corresponding decision rule δ_τ̃. An exact optimized treatment of the two above-mentioned sequential procedures, and of their connection to Hellinger integrals (and power divergences) of Galton-Watson processes with immigration, is beyond the scope of this paper.
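A two-threshold version can be sketched analogously. Since the decision rule δ_τ̃ is only named here, the sketch assumes the natural sequential-probability-ratio-type rule: take d_A if Z_τ̃ > c_2 and d_H if Z_τ̃ < c_1. This rule and the helper name `two_threshold_rule` are our assumptions, not taken from the source.

```python
import math


def two_threshold_rule(path, beta_a, alpha_a, beta_h, alpha_h, c1, c2):
    """Stop at the first n >= 1 with Z_n outside [c1, c2]; decide d_A if
    Z_n > c2 and d_H if Z_n < c1 (assumed SPRT-type decision rule).
    Returns (math.inf, None) if the observed path never leaves [c1, c2]."""
    if not 0 < c1 < c2:
        raise ValueError("need 0 < c1 < c2")
    log_c1, log_c2 = math.log(c1), math.log(c2)
    log_z = 0.0
    for n in range(1, len(path)):
        x, y = path[n - 1], path[n]
        f_a = beta_a * x + alpha_a
        f_h = beta_h * x + alpha_h
        log_z += y * math.log(f_a / f_h) - (f_a - f_h)  # running log Z_n
        if log_z > log_c2:
            return n, "d_A"
        if log_z < log_c1:
            return n, "d_H"
    return math.inf, None


print(two_threshold_rule([1, 0, 8, 40], 2.0, 10.0, 0.5, 0.2, c1=1e-6, c2=1e6))
```

Widening [c_1, c_2] trades later decisions for fewer wrong ones; how to choose the thresholds optimally is exactly the question left open in the text.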
As a side remark, our suggested method of Bayesian decision making with Hellinger integrals of GWIs differs completely from the very recent work of Brauner et al. [122], who use a Bayesian hierarchical model for a concrete, very comprehensive study of the effectiveness and burden of non-pharmaceutical interventions against COVID-19 transmission.
The power divergences I_λ(P_{A,n}||P_{H,n}) (λ ∈ R) can also be employed in other ways within Bayesian decision making of statistical nature. Namely, by adapting the general lines of Österreicher & Vajda [123] (see also Liese & Vajda [10], as well as the diffusion-process applications in Stummer [5,31,32]) to our context of Galton-Watson processes with immigration, we can proceed as follows. For the sake of convenient notation, we first attach the value θ := 1 to the GWI model (A) (which has prior probability p_A^{prior} ∈ ]0,1[) and θ := 0 to (H) (which has prior probability 1 − p_A^{prior}). Suppose we want to decide, in an optimal Bayesian way, which degree of evidence deg ∈ [0,1] we should attribute (according to a pregiven loss function LO) to the model (A). In order to achieve this goal, we choose a nonnegative loss function LO(θ, deg) defined on {0,1} × [0,1], of two types which will be specified below. The risk at stage 0 (i.e., prior to the GWI-path observations X_n) from the optimal decision about the degree of evidence deg concerning the decision parameter θ is defined as which can thus be interpreted as a minimal prior expected loss (the minimum always exists). The corresponding risk posterior to the GWI-path observations X_n, from the optimal decision about the degree of evidence deg concerning the parameter θ, is given by which is achieved by the optimal decision rule (about the degree of evidence) as well as the representation formula (cf. Österreicher & Vajda [123], Liese & Vajda [10], adapted to our GWI context); in other words, the power divergence I_λ(P_{A,n}||P_{H,n}) can be regarded as a weighted-average statistical information measure (weighted-average decision risk reduction). One can also use other weights of p_A^{prior} in order to get bounds of I_λ(P_{A,n}||P_{H,n}) (analogously to Stummer [5]).
As an alternative to the above-mentioned Bayesian-decision-making applications of Hellinger integrals H_λ(P_{A,n}||P_{H,n}), let us now briefly discuss their use for the corresponding Neyman-Pearson (NPT) framework with randomized tests T_n : Ω_n → [0,1] of the hypothesis P_H against the alternative P_A, based on the GWI generation-size sample path observations X_n := {X_l : l ∈ {0, 1, ..., n}}. In contrast to (17) and (18), a Neyman-Pearson test minimizes, over T_n, the type II error probability ∫_{Ω_n} (1 − T_n) dP_{A,n} in the class of tests for which the type I error probability ∫_{Ω_n} T_n dP_{H,n} is at most ς ∈ ]0,1[. The corresponding minimal type II error probability E_ς(P_{A,n}||P_{H,n}) can, for all ς ∈ ]0,1[, λ ∈ ]0,1[, i ∈ I, be bounded from above by and, for all λ > 1, i ∈ I, it can be bounded from below by which is an adaptation of a general result of Krafft & Plachky [125], see also Liese & Vajda [1] as well as Stummer & Vajda [15]. Hence, by combining (23) and (24) with the exact values respectively upper bounds of the Hellinger integrals H_{1−λ}(P_{A,n}||P_{H,n}) from the following sections, we obtain for our context of Galton-Watson processes with Poisson offspring and Poisson immigration (including the non-immigration case) some upper bounds of E_ς(P_{A,n}||P_{H,n}), which can also be immediately rewritten as lower bounds for the power 1 − E_ς(P_{A,n}||P_{H,n}) of a most powerful test at level ς. In contrast to such finite-time-horizon results, for the (in our context) incompatible set-up of Galton-Watson processes with Poisson offspring but nonstochastic immigration of constant value 1, the asymptotic rates of decrease as n → ∞ of the unconstrained type II error probabilities as well as of the type I error probabilities were studied in Linkov & Lunyova [53] by a different approach, also employing Hellinger integrals.
Some other types of Neyman-Pearson testing investigations concerning Galton-Watson processes, different from ours, can be found e.g., in Basawa & Scott [126], Feigin [127], Sweeting [128], Basawa & Scott [61], and the references therein.

Asymptotical Distinguishability
The next two concepts deal with two general families (P_{A,i})_{i∈I} and (P_{H,i})_{i∈I} of probability measures on the measurable spaces (Ω_i, F_i)_{i∈I}, where the index set I is either N_0 or R_+. For them, the following two general types of asymptotic distinguishability are well known (see e.g., LeCam [109], Liese & Vajda [1], Jacod & Shiryaev [24], Linkov [129], and the references therein). Definition 1. The family (P_{A,i})_{i∈I} is contiguous to the family (P_{H,i})_{i∈I} (in symbols, (P_{A,i}) ◁ (P_{H,i})) if for all sets A_i ∈ F_i with lim_{i→∞} P_{H,i}(A_i) = 0 there holds lim_{i→∞} P_{A,i}(A_i) = 0.

Definition 2.
Families of measures (P_{A,i})_{i∈I} and (P_{H,i})_{i∈I} are called entirely separated (completely asymptotically distinguishable), in symbols (P_{A,i}) △ (P_{H,i}), if there exist a sequence i_m ↑ ∞ as m ↑ ∞ and, for each m ∈ N_0, a set A_{i_m} ∈ F_{i_m} such that lim_{m→∞} P_{A,i_m}(A_{i_m}) = 1 and lim_{m→∞} P_{H,i_m}(A_{i_m}) = 0.
The notion of contiguity is an attempt to carry the concept of absolute continuity over to families of measures. Loosely speaking, (P_{A,i}) is contiguous to (P_{H,i}) if the limit lim_{i→∞} P_{A,i} (existence presupposed) is absolutely continuous with respect to the limit lim_{i→∞} P_{H,i}. However, the definition of contiguity does not require the probability measures to converge to limiting probability measures. On the other hand, entire separation is the generalization of singularity to families of measures.
The corresponding negations will be denoted by ⋪ and △̸. One can easily check that a family (P_{A,i}) cannot be both contiguous to and entirely separated from a family (P_{H,i}). In fact, as shown in Linkov [129], the relation between the families (P_{A,i}) and (P_{H,i}) can be uniquely classified into the following distinguishability types: As demonstrated in the above-mentioned references for a general context, one can infer the type of distinguishability from the time evolution of Hellinger integrals. Indeed, the following assertions can be found e.g., in Linkov [129], where part (c) was established in Liese & Vajda [1] and parts (f), (g) in Vajda [3]. Proposition 1. The following assertions are equivalent: , for all λ ∈ ]0,1[.
In combination with the discussion after Definition 2, one can thus interpret the λ-order Hellinger integral H_λ(P_{A,i}||P_{H,i}) as a "measure" of the distinctness of the two families (P_{A,i}) and (P_{H,i}) up to a fixed finite time horizon i ∈ I.
Furthermore, for contiguity we obtain the equivalence (see e.g., Liese & Vajda [1], Linkov [129]) All the above-mentioned general results can be applied to our context of two competing Poissonian Galton-Watson processes with immigration (GWI) (H) and (A) (reflected by the two different laws P_H resp. P_A with parameter pairs (β_H, α_H) resp. (β_A, α_A)), by taking P_{A,i} := P_A|_{F_i} and P_{H,i} := P_H|_{F_i}. Recall from the preceding subsections (by identifying i with n) that the latter two describe the stochastic dynamics of the respective GWI within the restricted time/stage frame {0, 1, ..., i}.
In the following, we study in detail the evolution of Hellinger integrals between two competing models of Galton-Watson processes with immigration; this turns out to be quite extensive.

A First Basic Result
In terms of our notation (PS1) to (PS3), a typical situation for applications we have in mind is that one particular constellation (β_A, β_H, α_A, α_H) ∈ P (e.g., obtained from theoretical or previous statistical investigations) is fixed, whereas the parameter λ ∈ R\{0,1} for the Hellinger integral or the power divergence might be chosen freely, e.g., depending on which (transform of a) dissimilarity measure one decides to choose for further analysis. At this point, let us emphasize that in general we will not make assumptions of the form β_• < 1, β_• = 1 or β_• > 1, i.e., upon the type of criticality.
To start our investigations, in order to justify, for all n ∈ N_0, Z_n := dP_{A,n}/dP_{H,n} (cf. (13)), (14) and (15) (as well as I_λ(P_{A,n}||P_{H,n}) for λ ∈ R, respectively R_λ(P_{A,n}||P_{H,n}) for λ ∈ R\{0,1}), we first mention the following straightforward facts: (i) if (β_A, β_H, α_A, α_H) ∈ P_NI, then P_{A,n} and P_{H,n} are equivalent (i.e., P_{A,n} ∼ P_{H,n}), and (ii) if (β_A, β_H, α_A, α_H) ∈ P_SP, then P_{A,n} and P_{H,n} are equivalent as well. Moreover, by recalling Z_0 = 1 and using the "rate functions" a version of (13) can easily be determined by calculating for each where for the last term we use the convention (0/0)^x = 1 for all x ∈ N_0. Furthermore, we define for each with the convention 0^0/0! = 1 for the last term. Accordingly, one obtains from (14) the Hellinger integral H_λ(P_{A,0}||P_{H,0}) = 1, as well as for all ( for x_0 = X_0 ∈ N, and for all n ∈ N\{1} From (29), one can see that a crucial role for the exact calculation (respectively the derivation of bounds) of the Hellinger integral is played by the functions defined for where we have used the λ-weighted averages α_λ := λα_A + (1−λ)α_H and β_λ := λβ_A + (1−λ)β_H. Since λ plays a special role, henceforth we typically use it as an index and often omit ( This is consistent with the corresponding generally valid upper and lower bounds (cf. (9) and (11)). As a first indication of our proposed method, let us start by illuminating the simplest case λ ∈ R\{0,1} and γ : In this situation, all three functions (30) to (32) are linear. Indeed, , whereas on P_NI, the no-immigration set-up, we get r^E_λ = 0 for all λ ∈ R\{0,1}. As will be seen later on, such linearity properties are useful for the recursive handling of the Hellinger integrals. However, only on the parameter set P_NI ∪ P_SP,1 are the functions ϕ_λ and φ_λ linear.
Hence, in the general case (β_A, β_H, α_A, α_H, λ) ∈ P × R\{0,1}, we aim for linear lower and upper bounds x ∈ [0,∞[ (ultimately, x ∈ N_0), which by (30) and (31) leads to x ∈ [0,∞[ (ultimately, x ∈ N_0). Of course, the involved slopes and intercepts should satisfy reasonable restrictions. Later on, we shall impose further restrictions on the involved slopes and intercepts in order to guarantee nice properties of the general Hellinger integral bounds given in Theorem 1 below; for instance, in consistency with the nonnegativity of ϕ_λ we could require p^U_λ ≥ p^L_λ ≥ 0 and q^U_λ ≥ q^L_λ ≥ 0, which nontrivially implies that these bounds possess certain monotonicity properties. For the formulation of our first assertions on Hellinger integrals, we make use of the following notation: Notice the interrelation a Clearly, for all q ∈ R\{0} and p ∈ R one has the linear interrelation Accordingly, we obtain the following fundamental Hellinger integral evaluations: , all initial population sizes X_0 ∈ N and all observation horizons n ∈ N one can recursively compute the exact value (35) for all x ∈ N_0 and thus in particular p^L_λ ≤ p^U_λ, q^L_λ ≤ q^U_λ, all initial population sizes X_0 ∈ N and all observation horizons n ∈ N one gets the following recursive (i.e., recursively computable) bounds for the Hellinger integrals: λ,X_0,n, 1 =: B^U_{λ,X_0,n}, (40) for λ ∈ R\[0,1]: where for general λ ∈ R\{0,1}, p ∈ R, q ∈ R\{0} we use the definitions as well as B
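The recursive computability in Theorem 1 can be illustrated on the linear case. The following sketch rests on our own short derivation rather than on the formulas (36)-(39), which are not reproduced here. For Poisson transitions one has E_{P_H}[(P_A(Y|x)/P_H(Y|x))^λ e^{aY}] = exp(ϕ(x) e^a − β_λ x − α_λ) with ϕ(x) := (β_A x + α_A)^λ (β_H x + α_H)^{1−λ}. If ϕ(x) = q x + p exactly, which is the case on P_SP,1 with q = β_A^λ β_H^{1−λ} and p = α_A^λ α_H^{1−λ}, iterating this backwards gives H_λ(P_{A,n}||P_{H,n}) = exp(a_n X_0 + b_1 + ... + b_n), where a_0 = 0, a_k = q e^{a_{k−1}} − β_λ and b_k = p e^{a_{k−1}} − α_λ. We expect these to coincide with the sequences a_n^{(q)}, b_n^{(p,q)} of the text (the form is consistent with (P5c)), but that identification is an assumption, and the function names are hypothetical. The brute-force check evaluates E_{P_{H,n}}[(Z_n)^λ] by a forward recursion on a truncated state space.

```python
import math


def hellinger_recursion(x0, n, p, q, alpha_lam, beta_lam):
    """exp(a_n * x0 + b_1 + ... + b_n) with a_0 = 0,
    a_k = q e^{a_{k-1}} - beta_lam and b_k = p e^{a_{k-1}} - alpha_lam."""
    a, sum_b = 0.0, 0.0
    for _ in range(n):
        e = math.exp(a)
        sum_b += p * e - alpha_lam
        a = q * e - beta_lam
    return math.exp(a * x0 + sum_b)


def hellinger_brute_force(x0, n, beta_a, alpha_a, beta_h, alpha_h, lam, x_max=80):
    """E_{P_H}[Z_n^lam] via a forward recursion over the states 0..x_max:
    w_k(y) = sum_x w_{k-1}(x) * P_A(y|x)^lam * P_H(y|x)^(1-lam).
    Dropping states above x_max can only lose (positive) mass, so this is a
    slight underestimate; it is negligible when the tilted process stays small."""
    log_fact = [math.lgamma(y + 1) for y in range(x_max + 1)]
    w = {x0: 1.0}
    for _ in range(n):
        new_w = [0.0] * (x_max + 1)
        for x, wx in w.items():
            mu_a, mu_h = beta_a * x + alpha_a, beta_h * x + alpha_h
            slope = lam * math.log(mu_a) + (1 - lam) * math.log(mu_h)
            shift = lam * mu_a + (1 - lam) * mu_h
            for y in range(x_max + 1):
                new_w[y] += wx * math.exp(y * slope - shift - log_fact[y])
        w = dict(enumerate(new_w))
    return sum(w.values())


# P_SP,1 example: alpha / beta = 0.8 under both models
beta_a, alpha_a, beta_h, alpha_h, lam = 0.8, 0.64, 0.5, 0.4, 0.3
q = beta_a ** lam * beta_h ** (1 - lam)
p = alpha_a ** lam * alpha_h ** (1 - lam)
alpha_lam = lam * alpha_a + (1 - lam) * alpha_h
beta_lam = lam * beta_a + (1 - lam) * beta_h
print(hellinger_recursion(3, 5, p, q, alpha_lam, beta_lam))
print(hellinger_brute_force(3, 5, beta_a, alpha_a, beta_h, alpha_h, lam))
```

The recursion costs O(n) operations regardless of the population size, whereas the brute-force evaluation grows with the truncation level and becomes impractical in supercritical set-ups.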

Remark 1.
(a) Notice that the expression B^{(p,q)}_{λ,X_0,n} can analogously be defined on the parameter set P_NI ∪ P_SP,1. For the λ,X_0,n = V_{λ,X_0,n} as the exact value (rather than a lower/upper bound (component)).

(b)
In the case q = β_λ one gets the explicit representation B Using the skew symmetry (8), one can derive alternative bounds of the Hellinger integral by switching to the transformed parameter set-up ( . However, this does not lead to different bounds: define (30), (31) and (32) by , and the set of (lower and upper bound) parameters If there are no restrictions on p^L_λ, p^U_λ, q^L_λ, q^U_λ other than (35), the bounds in (40) and (41) can have some inconvenient features, e.g., being 1 for all (large enough) n ∈ N, having oscillating n-behaviour, or being suboptimal in certain (other) senses. For a detailed discussion, the reader is referred to Section 3.16 ff. below. (e) For the (in our context) incompatible set-up of GWI with Poisson offspring but nonstochastic immigration of constant value 1, the exact values of the corresponding Hellinger integrals (i.e., an "analogue" of part (a)) were established in Linkov & Lunyova [53].
Proof of Theorem 1. Let us fix (β_A, β_H, α_A, α_H) ∈ P as well as x_0 := X_0 ∈ N, and start with arbitrary λ ∈ ]0,1[. We first prove the upper bound B^U_{λ,X_0,n} of part (b). Correspondingly, we suppose that the coefficients p^U_λ, q^U_λ satisfy (35) for all x ∈ N_0. From (28), (30), (31), (32) and (35) one immediately gets B^U_{λ,X_0,1} in terms of the first sequence element a^{(q^U_λ)}_1 (cf. (36)). With the help of (29), for all observation horizons n ∈ N\{1} we get (with the obvious shortcut for n = 2) Notice that for the strictness of the above inequalities we have used the fact that φ_λ(x) < φ^U_λ(x) for some (in fact, all but at most two) x ∈ N_0 (cf. Properties 3 (P19) below). Since for some admissible choices of p^U_λ, q^U_λ and some n ∈ N the last term in (43) can become larger than 1, one needs to take into account the cut-off point 1 arising from (9). The lower bound B^L_{λ,X_0,n} of part (b), as well as the exact value of part (a), follow from (29) in an analogous manner by employing p^L_λ, q^L_λ and p^E_λ, q^E_λ, respectively. Furthermore, we use the fact that for (β n. For the sake of brevity, the corresponding straightforward details are omitted here. Although we take the minimum of the upper bound derived in (43) and 1, the inequality B^L_{λ,X_0,n} < B^U_{λ,X_0,n} is nevertheless valid: the reason is that, in order to constitute a lower bound, the parameters p^L_λ, q^L_λ must fulfill either the conditions p^L_λ − α_λ < 0 and q^L_λ − β_λ ≤ 0, or p^L_λ − α_λ ≤ 0 and q^L_λ − β_λ < 0 (or both), which guarantees that B^L_{λ,X_0,n} < 1. The proof for all λ ∈ R\[0,1] works out completely analogously, by taking into account the generally valid lower bound H_λ(P_{A,n}||P_{H,n}) ≥ 1 (cf. (11)).
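Theorem 1(b) leaves the choice of p^L_λ, q^L_λ, p^U_λ, q^U_λ open. For λ ∈ ]0,1[ and strictly positive β_A, β_H, α_A, α_H, one admissible choice (ours, not necessarily one analysed in the later sections) follows from the fact that ϕ(x) := (β_A x + α_A)^λ (β_H x + α_H)^{1−λ} is concave with asymptotic slope q := β_A^λ β_H^{1−λ}. Hence ϕ(x) − q x is nondecreasing, running from ϕ(0) = α_A^λ α_H^{1−λ} up to the limit q(λ α_A/β_A + (1−λ) α_H/β_H). This yields linear lower and upper bounds with common slope q, which coincide on P_SP,1. The sketch assumes, as our own derivation, that a linear bound q x + p on ϕ translates into the bound exp(a_n X_0 + b_1 + ... + b_n) with a_0 = 0, a_k = q e^{a_{k−1}} − β_λ and b_k = p e^{a_{k−1}} − α_λ (each step only multiplies ϕ by e^{a} > 0). It caps the upper bound at 1 as in (9) and checks both bounds against a brute-force evaluation; all helper names are hypothetical.

```python
import math


def recursion_value(x0, n, p, q, alpha_lam, beta_lam):
    """exp(a_n x0 + b_1 + ... + b_n) for the linear coefficients (p, q)."""
    a, sum_b = 0.0, 0.0
    for _ in range(n):
        e = math.exp(a)
        sum_b += p * e - alpha_lam
        a = q * e - beta_lam
    return math.exp(a * x0 + sum_b)


def hellinger_bounds(x0, n, beta_a, alpha_a, beta_h, alpha_h, lam):
    """(lower, upper) bounds on H_lam(P_{A,n}||P_{H,n}) for 0 < lam < 1, from
    q x + p_low <= phi(x) <= q x + p_up with common slope q."""
    if not 0 < lam < 1:
        raise ValueError("this choice of bounds is only justified for 0 < lam < 1")
    q = beta_a ** lam * beta_h ** (1 - lam)
    p_low = alpha_a ** lam * alpha_h ** (1 - lam)  # phi(0)
    p_up = q * (lam * alpha_a / beta_a + (1 - lam) * alpha_h / beta_h)  # lim phi(x) - q x
    alpha_lam = lam * alpha_a + (1 - lam) * alpha_h
    beta_lam = lam * beta_a + (1 - lam) * beta_h
    lower = recursion_value(x0, n, p_low, q, alpha_lam, beta_lam)
    upper = min(recursion_value(x0, n, p_up, q, alpha_lam, beta_lam), 1.0)
    return lower, upper


def hellinger_brute_force(x0, n, beta_a, alpha_a, beta_h, alpha_h, lam, x_max=80):
    """E_{P_H}[Z_n^lam] by forward recursion on the truncated states 0..x_max."""
    log_fact = [math.lgamma(y + 1) for y in range(x_max + 1)]
    w = {x0: 1.0}
    for _ in range(n):
        new_w = [0.0] * (x_max + 1)
        for x, wx in w.items():
            mu_a, mu_h = beta_a * x + alpha_a, beta_h * x + alpha_h
            slope = lam * math.log(mu_a) + (1 - lam) * math.log(mu_h)
            shift = lam * mu_a + (1 - lam) * mu_h
            for y in range(x_max + 1):
                new_w[y] += wx * math.exp(y * slope - shift - log_fact[y])
        w = dict(enumerate(new_w))
    return sum(w.values())


# General P_SP example (alpha/beta differs between the models)
print(hellinger_bounds(2, 4, 0.7, 1.0, 0.5, 0.4, 0.5))
print(hellinger_brute_force(2, 4, 0.7, 1.0, 0.5, 0.4, 0.5))
```

Tighter, x-dependent choices (e.g., tangents of ϕ at a point typical for the current generation sizes) are possible; the criteria for good choices are what the text goes on to study.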

Some Useful Facts for Deeper Analyses
Theorem 1(b) and Remark 1(a) indicate the crucial role of the expression $B^{(p,q)}_{\lambda,X_0,n}$, and that the choice of the quantities $p, q$ depends on the underlying (e.g., fixed) offspring-immigration parameter constellation $(\beta_A, \beta_H, \alpha_A, \alpha_H)$ as well as on the (e.g., selectable) value of $\lambda$, i.e., $p = p(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda)$ and $q = q(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda)$. In order to study the desired time-behaviour $n \mapsto B^{(\cdot,\cdot)}_{\lambda,X_0,n}$ of the Hellinger integral bounds resp. exact values, one therefore faces a six-dimensional (and thus highly non-obvious) detailed analysis, including the search for criteria (in addition to (35)) for good/optimal choices of $p^L_\lambda, q^L_\lambda, p^U_\lambda, q^U_\lambda$. Since these criteria will (almost) always imply the nonnegativity of $p^L_\lambda, q^L_\lambda, p^U_\lambda, q^U_\lambda$, let us first present some fundamental properties of the underlying crucial sequences $(a^{(q)}_n)_{n\in\mathbb{N}}$ and $(b^{(p,q)}_n)_{n\in\mathbb{N}}$ for general $p \ge 0$, $q \ge 0$.

Properties 1.
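For all $\lambda \in \mathbb{R}$ the following holds:
(P1) If $0 < q < \beta_\lambda$, then the sequence $(a^{(q)}_n)_{n\in\mathbb{N}}$ is strictly negative, strictly decreasing and converges to the unique negative solution $x^{(q)}_0$ of the Equation (44).
(P2) If $q = \beta_\lambda$, then $a^{(q)}_n = 0$ for all $n \in \mathbb{N}$.
(P3) If $q > \max\{0, \beta_\lambda\}$, then $(a^{(q)}_n)_{n\in\mathbb{N}}$ is strictly positive and strictly increasing; (P3a) if additionally $q \le \min\{1, e^{\beta_\lambda-1}\}$, it converges to the smallest positive solution $x^{(q)}_0$ of (44); (P3b) otherwise, it diverges to $\infty$. Notice that in this setup, $q = 1$ implies $\min\{1, e^{\beta_\lambda-1}\} = e^{\beta_\lambda-1} < q$.
(P4) If $q = 0$, then one gets $a^{(q)}_n = -\beta_\lambda$ for all $n \in \mathbb{N}$.
Due to the linear interrelation (38), these results directly carry over to the behaviour of the sequence $(b^{(p,q)}_n)_{n\in\mathbb{N}}$:
(P5) If $p > 0$ and $0 < q < \beta_\lambda$, then the sequence $(b^{(p,q)}_n)_{n\in\mathbb{N}}$ is strictly decreasing and converges to $p\,e^{x^{(q)}_0} - \alpha_\lambda$; […] is strictly negative for all $n \in \mathbb{N}$.
(P5c) If additionally $p > \alpha_\lambda$, then $(b^{(p,q)}_n)_{n\in\mathbb{N}}$ is strictly positive for some (and possibly for all) $n \in \mathbb{N}$.
[…]
(P8) For the remaining cases we get: […]
Moreover, in our investigations we will repeatedly make use of the function $\xi^{(q)}_\lambda(x) := q\,e^{x} - \beta_\lambda$ (see also (44)), which has the following properties:
(P9) For $q \in\, ]0,\infty[$ and all $\lambda \in \mathbb{R}\setminus\{0,1\}$ the function $\xi^{(q)}_\lambda(\cdot)$ is strictly increasing, strictly convex and smooth, and there holds […]
The proof of these properties is provided in Appendix A.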
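The case distinction (P1) to (P4) can be illustrated numerically. The sketch below iterates $a_n = \xi^{(q)}_\lambda(a_{n-1})$ with $\xi^{(q)}_\lambda(x) = q\,e^x - \beta_\lambda$ and $a_0 = 0$ (our reading of (36) and (44)), and compares the observed behaviour with the predicted case; the helper names and the concrete $(q, \beta_\lambda)$ pairs are hypothetical illustration choices.

```python
import math

def a_sequence(q, beta_lam, n, blowup=50.0):
    """a_1, ..., a_n of a_k = q*exp(a_{k-1}) - beta_lam with a_0 = 0;
    stops early once a_k exceeds `blowup`, to avoid overflow in exp."""
    a, out = 0.0, []
    for _ in range(n):
        a = q * math.exp(a) - beta_lam
        out.append(a)
        if a > blowup:
            break
    return out

def predicted_case(q, beta_lam):
    """Label of the case (P1)-(P4) that applies to q >= 0."""
    if q == 0:
        return "P4"   # a_n = -beta_lam for all n
    if q < beta_lam:
        return "P1"   # strictly negative, strictly decreasing, convergent
    if q == beta_lam:
        return "P2"   # identically zero
    if q <= min(1.0, math.exp(beta_lam - 1.0)):
        return "P3a"  # strictly positive, strictly increasing, convergent
    return "P3b"      # strictly positive, strictly increasing, divergent

def bisect_root(f, lo, hi, tol=1e-13):
    """Root of f in [lo, hi] by bisection, assuming f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for q, beta_lam in [(0.5, 0.6), (0.6, 0.6), (0.3, 0.2), (0.5, 0.2)]:
    seq = a_sequence(q, beta_lam, 300)
    print(predicted_case(q, beta_lam), [round(s, 4) for s in seq[:3]], len(seq))
```

For $(q, \beta_\lambda) = (0.5, 0.2)$ the threshold $\min\{1, e^{\beta_\lambda-1}\} \approx 0.449$ is exceeded, and the iteration leaves any bounded range after a few dozen steps, in line with (P3b).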
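As already mentioned before, the sequences $(a^{(q)}_n)_{n\in\mathbb{N}}$ and $(b^{(p,q)}_n)_{n\in\mathbb{N}}$, whose behaviours for general $p \ge 0$ and $q \ge 0$ were described by the Properties 1, have to be evaluated at setup-dependent parameters $p = p_\lambda$, $q = q_\lambda$. One of the questions which arises in the course of the desired investigations of the time-behaviour of the Hellinger integral bounds (resp. exact values) is for which $\lambda \in \mathbb{R}$ the sequence $(a^{(q_\lambda)}_n)_{n\in\mathbb{N}}$ converges. In the following, we illuminate this for the important special case […]; there holds $q_\lambda > \max\{0, \beta_\lambda\}$, and from (P3) one can see that $(a^{(q_\lambda)}_n)_{n\in\mathbb{N}}$ does not converge to $x^{(q_\lambda)}_0$ in general, but only for $q_\lambda \le \min\{1, e^{\beta_\lambda-1}\}$, which constitutes an implicit condition on $\lambda$. This can be made explicit with the help of the auxiliary variables $\lambda_-$ and $\lambda_+$, where $\lambda_-$ is defined via a certain set of values of $\lambda$ in case that this set is nonempty and equals 0 else, and $\lambda_+$ is defined analogously in case that the corresponding set is nonempty and equals 1 else.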

For the constellation β
converges for all λ ∈ R\{0, 1} and we can set λ − := −∞ as well as λ + := ∞. Incorporating this and by adapting a result of Linkov & Lunyova [53] on Here, for fixed β ∈]0, ∞[\{1} we denote by z(β) the unique solution of the equation A corresponding proof is given in Appendix A.1.
With these auxiliary basic facts in hand, let us now work out our detailed investigations of the time-behaviour n → H λ (P A,n ||P H,n ), where we start with the exactly treatable case (a) in Theorem 1.

Detailed Analyses of the Exact Recursive Values, i.e., for the Cases $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{NI} \cup \mathcal{P}_{SP,1}$
In the no-immigration case $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{NI}$ and in the equal-fraction case $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{SP,1}$, the Hellinger integral can be calculated exactly in terms of $H_\lambda(P_{A,n}\|P_{H,n}) = V_{\lambda,X_0,n}$ (cf. (39)), as proposed in part (a) of Theorem 1. This quantity depends on the behaviour of the sequence $(a^{(q^E_\lambda)}_n)_{n\in\mathbb{N}}$ and of the sum $\sum_{k=1}^n b^{(p^E_\lambda,q^E_\lambda)}_k$ […]. The last expression is equal to zero on $\mathcal{P}_{NI}$; on $\mathcal{P}_{SP,1}$, this sum is unequal to zero. Using Lemma A1 we […] is strictly negative, strictly decreasing and converges to the unique solution $x^{(q^E_\lambda)}_0$ […] is strictly positive, strictly increasing and converges to the smallest positive solution $x^{(q^E_\lambda)}_0$ of (44) in case that (P3a) is satisfied; otherwise it diverges to $\infty$. Thus, we have shown the following detailed behaviour of Hellinger integrals: […] the sequence $(H_\lambda(P_{A,n}\|P_{H,n}))_{n\in\mathbb{N}}$ given by […] the sequence $(H_\lambda(P_{A,n}\|P_{H,n}))_{n\in\mathbb{N}}$ given by […] under consideration is formally the same, with the parameter $q^E_\lambda := \beta_A^\lambda \beta_H^{1-\lambda} > 0$. However, in contrast to the case $\mathcal{P}_{NI}$, on $\mathcal{P}_{SP,1}$ both the sequence $(a^{(q^E_\lambda)}_n)_{n\in\mathbb{N}}$ and […] are strictly decreasing in case that $\lambda \in\, ]0,1[$, and strictly increasing in case that $\lambda \in \mathbb{R}\setminus[0,1]$. The respective convergence behaviours are given in Properties 1 (P1) and (P3). We thus obtain […] the map $X_0 \mapsto V_{\lambda,X_0,n}$ is strictly decreasing.
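The monotonicity statements for the equal-fraction case can be checked numerically. This sketch assumes that $V_{\lambda,X_0,n} = \exp\{a_n X_0 + \sum_{k=1}^n b_k\}$ with $a_0 = 0$, $a_k = q^E_\lambda e^{a_{k-1}} - \beta_\lambda$, $b_k = p^E_\lambda e^{a_{k-1}} - \alpha_\lambda$, $p^E_\lambda = \alpha_A^\lambda\alpha_H^{1-\lambda}$ and $q^E_\lambda = \beta_A^\lambda\beta_H^{1-\lambda}$ (our reading of (36) to (39)); it uses the parameters $(0.3, 1.2, 1, 4) \in \mathcal{P}_{SP,1}$ of Example 2 below. The function name `exact_value_sp1` is hypothetical.

```python
import math

def exact_value_sp1(beta_A, beta_H, alpha_A, alpha_H, lam, x0, n):
    """Assumed exact Hellinger integral V_{lam,x0,n} in the equal-fraction case."""
    if not math.isclose(alpha_A / beta_A, alpha_H / beta_H):
        raise ValueError("parameters are not in the equal-fraction case")
    p = alpha_A ** lam * alpha_H ** (1 - lam)
    q = beta_A ** lam * beta_H ** (1 - lam)
    alpha_lam = lam * alpha_A + (1 - lam) * alpha_H
    beta_lam = lam * beta_A + (1 - lam) * beta_H
    a, b_sum = 0.0, 0.0
    for _ in range(n):
        b_sum += p * math.exp(a) - alpha_lam  # b_k uses a_{k-1}
        a = q * math.exp(a) - beta_lam        # a_k
    return math.exp(a * x0 + b_sum)

params = (0.3, 1.2, 1.0, 4.0)  # Example 2: (beta_A, beta_H, alpha_A, alpha_H)
for lam in (0.5, 2.0):
    print(lam, [round(exact_value_sp1(*params, lam, 5, n), 4) for n in range(1, 6)])
```

For $\lambda = 0.5$ the values decrease in $n$ (and in $X_0$), whereas for $\lambda = 2$ they increase in $n$, matching the dichotomy $\lambda \in\, ]0,1[$ versus $\lambda \in \mathbb{R}\setminus[0,1]$ described above.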
Due to the nature of the equal-fraction-case P SP,1 , in the assertions (a), (b), (d) of the Propositions 4 and 5, the fraction α A /β A can be equivalently replaced by α H /β H .

Remark 2.
For the (to our context) incompatible setup of GWI with Poisson offspring but nonstochastic immigration of constant value 1, an "analogue" of part (d) of the Propositions 4 resp. 5 was established in Linkov & Lunyova [53].

Some Preparatory Basic Facts for the Remaining Cases
The bounds B L λ,X 0 ,n , B U λ,X 0 ,n for the Hellinger integral introduced in formula (40) in Theorem 1 can be chosen arbitrarily from a (p L λ , q L λ , p U λ , q U λ )-indexed set of context-specific parameters satisfying (34), or equivalently (35).
In order to derive bounds which are optimal with respect to goals that will be discussed later, the following monotonicity properties of the sequences $(a^{(q)}_n)_{n\in\mathbb{N}}$ and $(b^{(p,q)}_n)_{n\in\mathbb{N}}$ with respect to general, context-independent parameters $q$ and $p$ will turn out to be very useful: […], for all $n \in \mathbb{N}$.
As an example, consider the setup $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) = (0.4, 0.8, 5, 3, 0.5)$; within our running-example epidemiological context of Section 2.3, this corresponds to a "nearly dangerous" infectious-disease-transmission situation (H) (with nearly critical reproduction number $\beta_H = 0.8$ and importation mean $\alpha_H = 3$), whereas (A) describes a "mild" situation (with "low" subcritical $\beta_A = 0.4$ and $\alpha_A = 5$). On the nonnegative integers $\mathbb{N}_0$, the function $\varphi_\lambda(x)$ can be bounded from above by the linear functions $\varphi^{U,1}_\lambda(x) := p_1 + q_1 x := 4.040 + 0.593 \cdot x$ as well as by $\varphi^{U,2}_\lambda(x) := p_2 + q_2 x := 4.110 + 0.584 \cdot x$. Clearly, $p_1 < p_2$ and $q_1 > q_2$. Let us show the first eight elements and the respective limits of the corresponding sequences […]. Then there holds […]. From (P10) to (P12) one deduces that both sequences $(a^{(q)}_n)_{n\in\mathbb{N}}$ and $(b^{(p,q)}_n)_{n\in\mathbb{N}}$ […]; hence, for the upper bound $B^U_{\lambda,X_0,n}$ we should use nonnegative context-specific parameters $p^U_\lambda, q^U_\lambda$ which are as small as possible, and for the lower bound $B^L_{\lambda,X_0,n}$ we should use nonnegative context-specific parameters $p^L_\lambda, q^L_\lambda$ which are as large as possible, of course subject to the (equivalent) restrictions (34) and (35).
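The two lines of this example can be recovered, under the assumption that $\varphi_\lambda(x) = (\alpha_A + \beta_A x)^\lambda(\alpha_H + \beta_H x)^{1-\lambda}$ here, as secant lines of $\varphi_\lambda$ through the integer points $x = 6, 7$ and $x = 9, 10$, respectively (this identification is our own computation, not a statement of the paper). Because $\varphi_\lambda$ is concave for $\lambda \in\, ]0,1[$, such a secant lies above $\varphi_\lambda$ at every integer, but below it strictly between its two anchor points; this is why the bound is asserted on $\mathbb{N}_0$ only. The helper `secant_line` is hypothetical.

```python
def phi(x, beta_A, beta_H, alpha_A, alpha_H, lam):
    """Assumed phi_lambda(x) = (alpha_A + beta_A x)^lam * (alpha_H + beta_H x)^(1 - lam)."""
    return (alpha_A + beta_A * x) ** lam * (alpha_H + beta_H * x) ** (1 - lam)

def secant_line(k, params):
    """Intercept p and slope q of the secant of phi through the integer points k and k + 1."""
    q = phi(k + 1, *params) - phi(k, *params)
    p = phi(k, *params) - q * k
    return p, q

params = (0.4, 0.8, 5.0, 3.0, 0.5)  # (beta_A, beta_H, alpha_A, alpha_H, lambda)
p1, q1 = secant_line(6, params)
p2, q2 = secant_line(9, params)
print(f"p1 = {p1:.3f}, q1 = {q1:.3f}; p2 = {p2:.3f}, q2 = {q2:.3f}")
# prints: p1 = 4.040, q1 = 0.593; p2 = 4.110, q2 = 0.584

# Upper bound on the integers only: the secant lies above phi outside ]k, k+1[,
# but strictly below phi inside that interval (e.g. at x = 6.5).
on_integers = all(p1 + q1 * x >= phi(x, *params) - 1e-12 for x in range(500))
print(on_integers, p1 + q1 * 6.5 < phi(6.5, *params))
```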
The properties (P15) to (P20) above describe in detail the characteristics of the function $\varphi_\lambda(\cdot) = \varphi(\cdot, \beta_A, \beta_H, \alpha_A, \alpha_H, \lambda)$. In the previous parameter setup $\mathcal{P}_{NI} \cup \mathcal{P}_{SP,1}$ this function is linear, which can be seen from (P19). In the current parameter setup $\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$, this function can basically be classified into four different types. From (P16) to (P20) it is easy to see that for all current parameter constellations the particular choices
$p^L_\lambda := \alpha_A^\lambda \alpha_H^{1-\lambda}, \qquad q^L_\lambda := \beta_A^\lambda \beta_H^{1-\lambda}$ (45)
are admissible in (35) (for $\lambda \in\, ]0,1[$). Notice that for the previous parameter setup $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{NI} \cup \mathcal{P}_{SP,1}$ these choices led to the exact values of the Hellinger integral, i.e., to the simplification $p^E_\lambda = p^L_\lambda$ and $q^E_\lambda = q^L_\lambda$. For a better distinguishability and easier reference we thus stick to the $L$-notation (resp. $U$-notation) here.

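Lower Bounds for the Cases $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) \in (\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}) \times\, ]0,1[$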
The discussion above implies that the lower bound $B^L_{\lambda,X_0,n}$ for the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$ in (40) is optimal for the choices $p^L_\lambda, q^L_\lambda > 0$ defined in (45). If $\beta_A \neq \beta_H$, then, due to Properties 1 (P1) and Lemma A1, the sequence $(a^{(q^L_\lambda)}_n)_{n\in\mathbb{N}}$ is strictly negative, strictly decreasing and converges to the unique negative solution of the Equation (44). Furthermore, due to (P5), the sequence $(b^{(p^L_\lambda,q^L_\lambda)}_n)_{n\in\mathbb{N}}$, as defined in (37), is strictly decreasing. Since […] Thus, analogously to the cases $\mathcal{P}_{NI} \cup \mathcal{P}_{SP,1}$, we obtain […] the map $X_0 \mapsto B^L_{\lambda,X_0,n}$ is strictly decreasing.
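Assuming that the choices (45) are $p^L_\lambda = \alpha_A^\lambda\alpha_H^{1-\lambda} = \varphi_\lambda(0)$ and $q^L_\lambda = \beta_A^\lambda\beta_H^{1-\lambda} = \lim_{x\to\infty}\varphi_\lambda(x)/x$, with $\varphi_\lambda(x) = (\alpha_A+\beta_A x)^\lambda(\alpha_H+\beta_H x)^{1-\lambda}$, the following sketch checks on a grid that the resulting line is a lower bound of $\varphi_\lambda$ on $[0,\infty[$, and that for $\beta_A \ne \beta_H$ the sequence $a_n = q^L_\lambda e^{a_{n-1}} - \beta_\lambda$ (with $a_0 = 0$, our reading of (36)) is strictly negative and strictly decreasing. The underlying reason: by concavity, the slope of $\varphi_\lambda$ decreases towards its limit $q^L_\lambda$, so $\varphi_\lambda(x) - q^L_\lambda x$ is non-decreasing and hence at least $\varphi_\lambda(0) = p^L_\lambda$. The example setup $(0.4, 0.8, 5, 3, 0.5)$ is taken from above.

```python
import math

def phi(x, beta_A, beta_H, alpha_A, alpha_H, lam):
    """Assumed phi_lambda(x) = (alpha_A + beta_A x)^lam * (alpha_H + beta_H x)^(1 - lam)."""
    return (alpha_A + beta_A * x) ** lam * (alpha_H + beta_H * x) ** (1 - lam)

beta_A, beta_H, alpha_A, alpha_H, lam = 0.4, 0.8, 5.0, 3.0, 0.5
pL = alpha_A ** lam * alpha_H ** (1 - lam)   # value of phi at x = 0
qL = beta_A ** lam * beta_H ** (1 - lam)     # asymptotic slope of phi
beta_lam = lam * beta_A + (1 - lam) * beta_H

grid = [i / 10 for i in range(0, 5001)]      # x in [0, 500]
is_lower = all(pL + qL * x <= phi(x, beta_A, beta_H, alpha_A, alpha_H, lam) + 1e-12 for x in grid)

a, seq = 0.0, []
for _ in range(30):
    a = qL * math.exp(a) - beta_lam          # a_n for q = q^L_lambda
    seq.append(a)
print(is_lower, qL < beta_lam, [round(s, 4) for s in seq[:3]])
```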

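Goals for Upper Bounds for the Cases $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) \in (\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}) \times\, ]0,1[$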
In contrast to the treatment of the lower bounds (cf. the previous Section 3.5), the fine-tuning of the upper bounds of the Hellinger integrals $H_\lambda(P_{A,n}\|P_{H,n})$ is much more involved. To begin with, let us mention that the monotonicity-concerning Properties 2 (P10) to (P12) imply that for a tight upper bound $B^U_{\lambda,X_0,n}$ (cf. (40)) one should choose parameters $p^U_\lambda \ge p^L_\lambda > 0$, $q^U_\lambda \ge q^L_\lambda > 0$ as small as possible. Due to the concavity of the function $\varphi_\lambda(\cdot)$ (cf. Properties 3 (P19)), such a linear upper bound $\varphi^U_\lambda(\cdot)$ (on the ultimately relevant subdomain $\mathbb{N}_0$) must therefore touch $\varphi_\lambda(\cdot)$ either in exactly one point $x \in \mathbb{N}_0$, which corresponds to some "discrete tangent line" of $\varphi_\lambda(\cdot)$ in $x$, or in two adjacent points $x, x+1 \in \mathbb{N}_0$, which corresponds to the secant line of $\varphi_\lambda(\cdot)$ across its arguments $x$ and $x+1$. Since there are many such lines, there is in general no overall best upper bound; of course, one way to obtain "good" upper bounds for $H_\lambda(P_{A,n}\|P_{H,n})$ is to solve the optimization problem
$\min_{(p^U_\lambda, q^U_\lambda)} B^{(p^U_\lambda,q^U_\lambda)}_{\lambda,X_0,n}$ (46)
subject to the constraint (35). However, the corresponding solution generally depends on the particular choice of the initial population $X_0 \in \mathbb{N}$ and on the observation time horizon $n \in \mathbb{N}$. Hence, there is in general no overall optimal choice of $p^U_\lambda, q^U_\lambda$ without the incorporation of further goal-dependent constraints such as $\lim_{n\to\infty} B^U_{\lambda,X_0,n} = 0$ in case of $\lim_{n\to\infty} H_\lambda(P_{A,n}\|P_{H,n}) = 0$. Moreover, mainly because of the non-explicitness of the sequence $(a^{(q^U_\lambda)}_n)_{n\in\mathbb{N}}$ (due to the generally not explicitly solvable recursion (36)) and the discreteness of the constraint (35), this optimization problem does not seem to be straightforward to solve anyway. The choice of parameters $p^U_\lambda, q^U_\lambda$ for the upper bound $B^U_{\lambda,X_0,n} \ge H_\lambda(P_{A,n}\|P_{H,n})$ can be made according to different, partially incompatible ("optimality-" resp. "goodness-") criteria and goals, such as:
(G1) the validity of $B^U_{\lambda,X_0,n} < 1$ simultaneously for all initial configurations $X_0 \in \mathbb{N}$, all observation horizons $n \in \mathbb{N}$ and all $\lambda \in\, ]0,1[$, which leads to a strict improvement of the general upper bound $H_\lambda(P_{A,n}\|P_{H,n}) < 1$ (cf. (9));
(G2) the determination of the long-term limits $\lim_{n\to\infty} H_\lambda(P_{A,n}\|P_{H,n})$ respectively $\lim_{n\to\infty} B^U_{\lambda,X_0,n}$ for all $X_0 \in \mathbb{N}$ and all $\lambda \in\, ]0,1[$; in particular, one would like to check whether $\lim_{n\to\infty} H_\lambda(P_{A,n}\|P_{H,n}) = 0$, which implies that the families of probability distributions $(P_{A,n})_{n\in\mathbb{N}}$ and $(P_{H,n})_{n\in\mathbb{N}}$ are asymptotically distinguishable (entirely separated), cf. (25);
(G3) the determination of the time-asymptotical growth rates $\lim_{n\to\infty}\frac{1}{n}\log H_\lambda(P_{A,n}\|P_{H,n})$ resp. $\lim_{n\to\infty}\frac{1}{n}\log B^U_{\lambda,X_0,n}$ for all $X_0 \in \mathbb{N}$ and all $\lambda \in\, ]0,1[$.
Further goals, with which we do not deal here for the sake of brevity, are for instance (i) a very good tightness of the upper bound $B^U_{\lambda,X_0,n}$ for $n \ge N$ for some fixed large $N \in \mathbb{N}$, or (ii) the criterion (G1) with fixed (rather than arbitrary) initial population size $X_0 \in \mathbb{N}$.
Let us briefly discuss the three Goals (G1) to (G3) and their challenges: due to Theorem 1, Goal (G1) requires that the sequence $(a^{(q^U_\lambda)}_n)_{n\in\mathbb{N}}$ is non-increasing, since otherwise, for each fixed observation horizon $n \in \mathbb{N}$ there is a large enough initial population size $X_0$ such that the upper bound component $B^{(p^U_\lambda,q^U_\lambda)}_{\lambda,X_0,n}$ becomes larger than 1, and thus $B^U_{\lambda,X_0,n} = 1$ (cf. (40)). Hence, Properties 1 (P1) and (P2) imply that one should have $q^U_\lambda \le \beta_\lambda$. Then, the sequence $(b^{(p^U_\lambda,q^U_\lambda)}_n)_{n\in\mathbb{N}}$ is also non-increasing.
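[…] is not necessarily decreasing. Nevertheless, the restriction
$p^U_\lambda \le \alpha_\lambda, \quad q^U_\lambda \le \beta_\lambda$, where at least one of the inequalities is strict, (47)
ensures that both sequences $(a^{(q^U_\lambda)}_n)_{n\in\mathbb{N}}$ and $(b^{(p^U_\lambda,q^U_\lambda)}_n)_{n\in\mathbb{N}}$ are nonpositive and decreasing, where at least one sequence is strictly negative, implying that the sum $\sum_{k=1}^n b^{(p^U_\lambda,q^U_\lambda)}_k$ is strictly negative for $n \ge 2$ and strictly decreasing. To see this, suppose first that (47) is satisfied with two strict inequalities.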
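Then, by (P1) and (P5), both sequences $(a^{(q^U_\lambda)}_n)_{n\in\mathbb{N}}$ and $(b^{(p^U_\lambda,q^U_\lambda)}_n)_{n\in\mathbb{N}}$ are strictly negative and strictly decreasing. If $q^U_\lambda = \beta_\lambda$ and $p^U_\lambda < \alpha_\lambda$, we see from (P2) and (P6) that $a^{(q^U_\lambda)}_n = 0$ and $b^{(p^U_\lambda,q^U_\lambda)}_n = p^U_\lambda - \alpha_\lambda < 0$ for all $n \in \mathbb{N}$ […] is not possible in the current setup $\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$ and for $\lambda \in\, ]0,1[$). In the last case $q^U_\lambda < \beta_\lambda$ and $p^U_\lambda = \alpha_\lambda$, it follows from (P1) and (P5) that $(a^{(q^U_\lambda)}_n)_{n\in\mathbb{N}}$ is strictly negative and strictly decreasing, and that $(b^{(p^U_\lambda,q^U_\lambda)}_n)_{n\in\mathbb{N}}$ (with $b^{(p^U_\lambda,q^U_\lambda)}_1 = 0$) is strictly decreasing and strictly negative for $n \ge 2$.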
Thus, whenever (47) is satisfied, the sum $\sum_{k=1}^n b^{(p^U_\lambda,q^U_\lambda)}_k$ is strictly negative for $n \ge 2$ and strictly decreasing.
To achieve Goal (G2), we have to require that the sequence $(a^{(q^U_\lambda)}_n)_{n\in\mathbb{N}}$ converges and that $(b^{(p^U_\lambda,q^U_\lambda)}_n)_{n\in\mathbb{N}}$ converges to a negative limit, i.e., $\lim_{n\to\infty} b^{(p^U_\lambda,q^U_\lambda)}_n < 0$ […]. The examination of Goal (G2) above enters into the discussion of Goal (G3): if the sequence $(a^{(q^U_\lambda)}_n)_{n\in\mathbb{N}}$ converges and $\lim_{n\to\infty} B^U_{\lambda,X_0,n} = 0$, then there holds
$\lim_{n\to\infty}\frac{1}{n}\log B^U_{\lambda,X_0,n} = \lim_{n\to\infty} b^{(p^U_\lambda,q^U_\lambda)}_n = p^U_\lambda\, e^{x^{(q^U_\lambda)}_0} - \alpha_\lambda.$ (48)
For the case $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) \in (\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times\,]0,1[$, let us now start with our comprehensive investigations of the upper bounds, where we focus on fulfilling the condition (47), which tackles Goals (G1) and (G2) simultaneously; then, Goal (G3) can be achieved by (48). As indicated above, various different parameter subcases can lead to different details of the Hellinger integral upper bounds, which we work out in the following. For better transparency, we employ the following notations (where the first four are just reminders of sets which were already introduced above) […]; notice that because of Lemma A1 and of the Properties 3 (P15) one gets on the domain $]0,$ […]
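The growth-rate formula (48) can be checked numerically. The sketch below computes $\frac1n\log B^{(p,q)}_{\lambda,X_0,n} = \frac1n\big(a_n X_0 + \sum_{k\le n} b_k\big)$ in log-space, assuming the recursion $a_0 = 0$, $a_k = q e^{a_{k-1}} - \beta_\lambda$, $b_k = p e^{a_{k-1}} - \alpha_\lambda$ (our reading of (36) and (37)), and compares it with $p\,e^{x^{(q)}_0} - \alpha_\lambda$, where $x^{(q)}_0$ is the negative fixed point of $\xi^{(q)}_\lambda$. It uses the rounded line $4.040 + 0.593\,x$ from the example $(0.4, 0.8, 5, 3, 0.5)$ above; that particular line has $p > \alpha_\lambda = 4$ and thus does not satisfy (47), but the limit of $(b_n)$ is still negative, so the bound tends to 0. Helper names are hypothetical.

```python
import math

def log_bound(p, q, alpha_lam, beta_lam, x0, n):
    """log B^{(p,q)}_{lam,x0,n} = a_n*x0 + b_1 + ... + b_n (assumed recursion, log-space)."""
    a, b_sum = 0.0, 0.0
    for _ in range(n):
        b_sum += p * math.exp(a) - alpha_lam
        a = q * math.exp(a) - beta_lam
    return a * x0 + b_sum

def negative_fixed_point(q, beta_lam, tol=1e-14):
    """Unique negative solution of q*exp(x) - beta_lam = x for 0 < q < beta_lam (bisection)."""
    g = lambda x: q * math.exp(x) - beta_lam - x
    lo, hi = -beta_lam, 0.0  # g(lo) = q*exp(-beta_lam) > 0 > q - beta_lam = g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha_lam, beta_lam = 4.0, 0.6   # lambda = 0.5: averages of (5, 3) and (0.4, 0.8)
p, q = 4.040, 0.593              # rounded line coefficients from the example
rate = p * math.exp(negative_fixed_point(q, beta_lam)) - alpha_lam
n = 20000
print(rate, log_bound(p, q, alpha_lam, beta_lam, 5, n) / n)
```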

Upper Bounds for the Cases
For this parameter constellation, one has φ λ (0) = 0 and φ λ (0) = 0 (cf. Properties 3 (P16), (P17)). Thus, the only admissible intercept choice satisfying (47) and the minimal admissible slope which implies (35) for all Analogously to the investigation for P SP,1 in the above-mentioned is strictly negative, strictly decreasing, and converges to Moreover, in the same manner as for the case P SP,1 this leads to the map X 0 → B U λ,X 0 ,n is strictly decreasing.

Upper Bounds for the Cases
[…] $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) = (1.8, 0.9, 2.9, 0.7, 0.5)$ for $\varphi_\lambda(0) > 0$; within our running-example epidemiological context of Section 2.3, this corresponds to a "nearly dangerous" infectious-disease-transmission situation (H) (with nearly critical reproduction number $\beta_H = 0.9$ and importation mean $\alpha_H = 0.7$), whereas (A) describes a "dangerous" situation (with supercritical $\beta_A = 1.8$ and $\alpha_A = 2.7, 2.8, 2.9$). However, in all three subcases there holds […] (cf. (47)) such that (35) is satisfied. As explained above, we get the following […] (35) for all $x \in \mathbb{N}_0$, and for all such pairs $(p^U_\lambda, q^U_\lambda)$ and all initial population sizes $X_0 \in \mathbb{N}$ there holds […] of upper bounds for $H_\lambda(P_{A,n}\|P_{H,n})$ given by […]. Notice that all parts of this proposition also hold true for parameter pairs $(p^U_\lambda, q^U_\lambda)$ which satisfy (35) and additionally either $p^U_\lambda = \alpha_\lambda$, $q^U_\lambda < \beta_\lambda$ or $p^U_\lambda < \alpha_\lambda$, $q^U_\lambda = \beta_\lambda$. Let us briefly illuminate the above-mentioned possible parameter choices, where we begin with the case $\varphi_\lambda(0) \le 0$, which corresponds to […] (cf. (P17)); then, the function $\varphi_\lambda(\cdot)$ is strictly negative, strictly decreasing and, due to (P19), strictly concave (and thus, the assumption $\frac{\alpha_H - \alpha_A}{\beta_A - \beta_H} < 0$ is superfluous here). One pragmatic but still reasonable parameter choice is the following: take any intercept […] which corresponds to a linear function $\varphi^U_\lambda$ which is (i) nonpositive on $\mathbb{N}_0$ and strictly negative on $\mathbb{N}$, and (ii) larger than or equal to $\varphi_\lambda$ on $\mathbb{N}_0$, strictly larger than $\varphi_\lambda$ on $\mathbb{N}\setminus\{1,2\}$, and equal to $\varphi_\lambda$ at the point $x = 1$ ("discrete tangent or secant line through $x = 1$"). One can easily see that (due to the restriction (34)) not all […].
Let us first inspect the case A larger intercept would lead to a linear function φ U λ for which (35) is not valid at x max + 1. In the other subcase φ λ ( , 0] and as corresponding slope ≤ 0 (notice that the corresponding line φ U λ is on ] x max , ∞[ strictly larger than the secant line through φ λ ( x max ) and φ λ ( x max + 1)).
If φ λ ( x max ) ≤ φ λ ( x max + 1), one can proceed as above by substituting the crucial pair of points ( x max , x max + 1) with ( x max + 1, x max + 2) and examining the analogous two subcases.
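A small numerical check of the three running examples $(1.8, 0.9, \alpha_A, 0.7, 0.5)$ with $\alpha_A \in \{2.7, 2.8, 2.9\}$: assuming (our reading of the extraction-damaged conditions) that the subcase distinction refers to the sign at $x = 0$ of the derivative of $\varphi_\lambda(x) - (\alpha_\lambda + \beta_\lambda x)$ with $\varphi_\lambda(x) = (\alpha_A+\beta_A x)^\lambda(\alpha_H+\beta_H x)^{1-\lambda}$, the three values of $\alpha_A$ give a negative, a vanishing and a positive derivative, respectively. The sketch uses the logarithmic derivative $\varphi_\lambda'(x) = \varphi_\lambda(x)\big(\frac{\lambda\beta_A}{\alpha_A+\beta_A x} + \frac{(1-\lambda)\beta_H}{\alpha_H+\beta_H x}\big)$; the helper name is hypothetical.

```python
def diff_derivative_at_zero(beta_A, beta_H, alpha_A, alpha_H, lam):
    """d/dx [phi(x) - (alpha_lam + beta_lam*x)] at x = 0, where
    phi(x) = (alpha_A + beta_A x)^lam * (alpha_H + beta_H x)^(1 - lam)."""
    phi0 = alpha_A ** lam * alpha_H ** (1 - lam)
    dphi0 = phi0 * (lam * beta_A / alpha_A + (1 - lam) * beta_H / alpha_H)
    beta_lam = lam * beta_A + (1 - lam) * beta_H
    return dphi0 - beta_lam

for alpha_A in (2.7, 2.8, 2.9):
    print(alpha_A, diff_derivative_at_zero(1.8, 0.9, alpha_A, 0.7, 0.5))
```

For $\alpha_A = 2.8$ the derivative vanishes exactly (up to rounding), since then $\sqrt{0.7\cdot 2.8} = 1.4$ and $1.4\cdot\frac12\big(\frac{1.8}{2.8}+\frac{0.9}{0.7}\big) = 1.35 = \beta_\lambda$.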

Upper Bounds for the Cases
The only difference to the preceding Section 3.8 is that, due to Properties 3 (P15), the maximum value of $\varphi_\lambda(\cdot)$ now achieves 0, at the positive non-integer point $x_{\max} = x^* = \frac{\alpha_H - \alpha_A}{\beta_A - \beta_H} \notin \mathbb{N}$ (take e.g., $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) = (1.8, 0.9, 1.1, 3.0, 0.5)$ as an example, which within our running-example epidemiological context of Section 2.3 corresponds to a "nearly dangerous" infectious-disease-transmission situation (H) (with nearly critical reproduction number $\beta_H = 0.9$ and importation mean $\alpha_H = 3$), whereas (A) describes a "dangerous" situation (with supercritical $\beta_A = 1.8$ and $\alpha_A = 1.1$)); this implies that $\varphi_\lambda(x) < 0$ for all $x$ on the relevant subdomain $\mathbb{N}_0$. Due to (P16), (P17) and (P19) one automatically gets […] such that (47) and (35) are satisfied. Thus, all the assertions (a) to (e) of Proposition 8 also hold true for the current parameter constellations.

Upper Bounds for the Cases
The only difference to the preceding Section 3.9 is that the maximum value of $\varphi_\lambda(\cdot)$ now achieves 0 at the integer point $x_{\max} = x^* = \frac{\alpha_H - \alpha_A}{\beta_A - \beta_H} \in \mathbb{N}$ (take e.g., $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) = (1.8, 0.9, 1.2, 3.0, 0.5)$ as an example). Accordingly, there do not exist parameters $p^U_\lambda, q^U_\lambda$ such that (35) and (47) are satisfied simultaneously. The only parameter pair which ensures $\exp\{a^{(q^U_\lambda)}_n \cdot X_0 + \sum_{k=1}^n b^{(p^U_\lambda,q^U_\lambda)}_k\} \le 1$ for all $n \in \mathbb{N}$ and all $X_0 \in \mathbb{N}$ without further investigations leads to the choices $p^U_\lambda = \alpha_\lambda$ as well as $q^U_\lambda = \beta_\lambda$. Consequently, $B^U_{\lambda,X_0,n} \equiv 1$, which coincides with the general upper bound (9) but violates the above-mentioned desired Goal (G1). However, there might exist parameters $p^U_\lambda < \alpha_\lambda$, $q^U_\lambda > \beta_\lambda$ or $p^U_\lambda > \alpha_\lambda$, $q^U_\lambda < \beta_\lambda$ such that at least the parts (c) and (d) of Proposition 8 are satisfied. Nevertheless, by using a conceptually different method we can prove […] as well as the convergence
$\lim_{n\to\infty} H_\lambda(P_{A,n}\|P_{H,n}) = 0$ (51)
which will be used for the study of complete asymptotical distinguishability (entire separation) below. This proof is provided in Appendix A.1.
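Why the maximum value of $\varphi_\lambda(x) - (\alpha_\lambda + \beta_\lambda x)$ is 0 and is attained at $x^*$ (reading $\varphi_\lambda(x)$ as $(\alpha_A+\beta_A x)^\lambda(\alpha_H+\beta_H x)^{1-\lambda}$): by the weighted AM-GM inequality, $(\alpha_A+\beta_A x)^\lambda(\alpha_H+\beta_H x)^{1-\lambda} \le \lambda(\alpha_A+\beta_A x) + (1-\lambda)(\alpha_H+\beta_H x) = \alpha_\lambda + \beta_\lambda x$ for $\lambda \in\, ]0,1[$, with equality exactly where the two conditional means coincide, i.e., at $x^* = \frac{\alpha_H-\alpha_A}{\beta_A-\beta_H}$. The sketch below, with the hypothetical helpers `gap` and `x_star`, evaluates this difference for the two examples $(1.8, 0.9, 1.1, 3.0, 0.5)$ and $(1.8, 0.9, 1.2, 3.0, 0.5)$ above and confirms that it vanishes on $\mathbb{N}_0$ only in the integer case.

```python
def gap(x, beta_A, beta_H, alpha_A, alpha_H, lam):
    """Weighted geometric mean minus weighted arithmetic mean of the two conditional means."""
    mean_A, mean_H = alpha_A + beta_A * x, alpha_H + beta_H * x
    return mean_A ** lam * mean_H ** (1 - lam) - (lam * mean_A + (1 - lam) * mean_H)

def x_star(beta_A, beta_H, alpha_A, alpha_H):
    """Point where both conditional means coincide (requires beta_A != beta_H)."""
    return (alpha_H - alpha_A) / (beta_A - beta_H)

non_integer = (1.8, 0.9, 1.1, 3.0, 0.5)
integer = (1.8, 0.9, 1.2, 3.0, 0.5)
print(x_star(*non_integer[:4]), max(gap(x, *non_integer) for x in range(50)))
print(x_star(*integer[:4]), gap(2, *integer))
```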

Upper Bounds for the Cases
This setup and the remaining setup $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) \in \mathcal{P}_{SP,4b}\times\,]0,1[$ (see the next Section 3.12) are the only constellations where $\varphi_\lambda(\cdot)$ is strictly negative and strictly increasing, with $\lim_{x\to\infty}\varphi_\lambda(x) = \lim_{x\to\infty}\varphi_\lambda'(x) = 0$, leading to the choices $p^U_\lambda = \alpha_\lambda$ as well as $q^U_\lambda = \beta_\lambda = \beta$ under the restriction that $\exp\{a^{(q^U_\lambda)}_n \cdot X_0 + \sum_{k=1}^n b^{(p^U_\lambda,q^U_\lambda)}_k\} \le 1$ for all $n \in \mathbb{N}$ and all $X_0 \in \mathbb{N}$. Consequently, one has $B^U_{\lambda,X_0,n} \equiv 1$, which is consistent with the general upper bound (9) but violates the above-mentioned desired Goal (G1). Unfortunately, the proof method of (51) (cf. Appendix A.1) cannot be carried over to the current setup. The following proposition states two of the above-mentioned desired assertions, which can be verified by a completely different proof method that is also given in Appendix A.1. […] (35) is satisfied for all $x \in [0,\infty[$ and such that for all initial population sizes $X_0 \in \mathbb{N}$ the parts (c) and (d) of Proposition 8 hold true.

Upper Bounds for the Cases $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) \in \mathcal{P}_{SP,4b}\times\,]0,1[$
The assertions preceding Proposition 9 remain valid. However, any linear upper bound of the function $\varphi_\lambda(\cdot)$ on the domain $\mathbb{N}_0$ possesses a slope $q^U_\lambda - \beta_\lambda \ge 0$. If $q^U_\lambda = \beta_\lambda$, then the intercept is $p^U_\lambda - \alpha_\lambda = 0$, leading to $B^U_{\lambda,X_0,n} \equiv 1$, and thus Goal (G1) is violated. If we use a slope $q^U_\lambda - \beta_\lambda > 0$, then both sequences $(a^{(q^U_\lambda)}_n)_{n\in\mathbb{N}}$ and $(b^{(p^U_\lambda,q^U_\lambda)}_n)_{n\in\mathbb{N}}$ are strictly increasing and diverge to $\infty$.
This comes from Properties 1 (P3b) and (P7b), since $q^U_\lambda > \beta_\lambda = \beta \ge 1$. Altogether, this implies that the corresponding upper bound component $B^{(p^U_\lambda,q^U_\lambda)}_{\lambda,X_0,n}$ (cf. (42)) diverges to $\infty$, so that the cut-off yields $B^U_{\lambda,X_0,n} = 1$ for all large enough $n$.
As mentioned earlier on, starting from Section 3.6 we have mainly focused on constructing upper bounds $B^U_{\lambda,X_0,n}$ of the Hellinger integrals from parameters $p^U_\lambda, q^U_\lambda$ which fulfill (35) as well as further constraints depending on the Goals (G1) and (G2). For the setups in the Sections 3.7 to 3.9, we have proved the existence of special parameter choices $p^U_\lambda, q^U_\lambda$ which are consistent with (G1) and (G2). Furthermore, for the constellation in Section 3.11 we have found parameters such that at least (G2) is satisfied. In contrast, for the setup of Section 3.12 we have not found any choices which are consistent with (G1) and (G2), leading to the "cut-off bound" $B^U_{\lambda,X_0,n} \equiv 1$, which gives no improvement over the generally valid upper bound (9).
In the following, we present some alternative choices of p U λ , q U λ which-depending on the parameter constellation (β A , β H , α A , α H , λ) ∈ (P SP \P SP,1 )×]0, 1[-may or may not lead to upper bounds B U λ,X 0 ,n which are consistent with Goal (G1) or with (G2) (and which are maybe weaker or better than resp. incomparable with the previous upper bounds when dealing with some relaxations of (G1), such as e.g., H λ (P A,n ||P H,n ) < 1 for all but finitely many n ∈ N).
As a first alternative choice for a linear upper bound of $\varphi_\lambda(\cdot)$ (cf. (35)), one could use the asymptote of $\varphi_\lambda(\cdot)$ (cf. Properties 3 (P20)) with the parameters $p^U_\lambda := […]$, where […] is given by (P17). […] Notice that this upper bound is for $y \in\, ]0,\infty[\setminus\mathbb{N}$ "not tight" in the sense that $\varphi^{tan}_{\lambda,y}(\cdot)$ does not hit the function $\varphi_\lambda(\cdot)$ on $\mathbb{N}_0$ (where the generation sizes "live"); moreover, $\varphi^{tan}_{\lambda,y}(x)$ might take on strictly positive values for large enough points $x$, which is counter-productive for Goal (G1). Another alternative choice of a linear upper bound for $\varphi_\lambda(\cdot)$, which in contrast to the tangent line is "tight" (but does not necessarily avoid strict positivity), is the secant line $\varphi^{sec}_{\lambda,k}(\cdot)$ across its arguments $k$ and $k+1$, given by
$\varphi^{sec}_{\lambda,k}(x) := \varphi_\lambda(k) + \big(\varphi_\lambda(k+1) - \varphi_\lambda(k)\big)\,(x - k)$.
Another alternative choice is the horizontal line […]. For $p^U_\lambda \in \{p_\lambda, p^{tan}_{\lambda,y}, p^{sec}_{\lambda,k}\}$ and $q^U_\lambda \in \{q^{tan}_{\lambda,y}, q^{sec}_{\lambda,k}\}$ it is possible that in some parameter cases $(\beta_A, \beta_H, \alpha_A, \alpha_H)$ either the intercept $r^U_\lambda = p^U_\lambda - \alpha_\lambda$ or the slope $s^U_\lambda = q^U_\lambda - \beta_\lambda$ is strictly larger than zero. Thus, it can happen that $B^{(p^U_\lambda,q^U_\lambda)}_{\lambda,X_0,n} > 1$ for some (and even for all) $n \in \mathbb{N}$, such that the corresponding upper bound $B^U_{\lambda,X_0,n}$ for the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$ amounts to the cut-off at 1. However, due to Properties 1 (P5) and (P7a), the sequence $(B^{(p^U_\lambda,q^U_\lambda)}_{\lambda,X_0,n})_{n\in\mathbb{N}}$ may become smaller than 1 and may finally converge to zero. Due to Properties 2 (P14), this upper bound can even be tighter (smaller) than those bounds derived from parameters $p^U_\lambda, q^U_\lambda$ fulfilling (47). As far as our desired Hellinger integral bounds are concerned, in the setup of Section 3.11, where $\lim_{y\to\infty}\varphi^{tan}_{\lambda,y}(\cdot) \equiv 0$, we shall employ the mappings $y \mapsto \varphi^{tan}_{\lambda,y}$ resp. $y \mapsto p^{tan}_{\lambda,y}$ resp. $y \mapsto q^{tan}_{\lambda,y}$ for the proof of Proposition 9 in Appendix A.1. These will also be used for the proof of the below-mentioned Theorem 4.
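The tangent-line and asymptote constructions can be sketched for $\varphi_\lambda(x) = (\alpha_A+\beta_A x)^\lambda(\alpha_H+\beta_H x)^{1-\lambda}$ itself; the paper's own formulas for $p^{tan}_{\lambda,y}$, $q^{tan}_{\lambda,y}$ and the asymptote parameters are not reproduced above, so the expressions below are our own. For concave $\varphi_\lambda$ (i.e., $\lambda \in\, ]0,1[$), every tangent line is a global upper bound on $[0,\infty[$; the asymptote parameters $\bar q = \beta_A^\lambda\beta_H^{1-\lambda}$ and $\bar p = \bar q\,\big(\lambda\alpha_A/\beta_A + (1-\lambda)\alpha_H/\beta_H\big)$ follow from a first-order expansion of $\varphi_\lambda(x)$ as $x \to \infty$. The sketch also shows that a tangent at a non-integer point is not tight on $\mathbb{N}_0$, and that the tangent intercept approaches $\bar p$ as the tangent point $y$ grows. All helper names are hypothetical.

```python
def phi(x, bA, bH, aA, aH, lam):
    """Assumed phi_lambda(x) = (aA + bA x)^lam * (aH + bH x)^(1 - lam)."""
    return (aA + bA * x) ** lam * (aH + bH * x) ** (1 - lam)

def dphi(x, bA, bH, aA, aH, lam):
    """Derivative of phi via the logarithmic derivative."""
    return phi(x, bA, bH, aA, aH, lam) * (lam * bA / (aA + bA * x) + (1 - lam) * bH / (aH + bH * x))

def tangent(y, params):
    """Intercept and slope of the tangent line of phi at the point y >= 0."""
    slope = dphi(y, *params)
    return phi(y, *params) - slope * y, slope

def asymptote(params):
    """Intercept and slope of the asymptote of phi (first-order expansion as x -> infinity)."""
    bA, bH, aA, aH, lam = params
    q_bar = bA ** lam * bH ** (1 - lam)
    return q_bar * (lam * aA / bA + (1 - lam) * aH / bH), q_bar

params = (0.4, 0.8, 5.0, 3.0, 0.5)       # (beta_A, beta_H, alpha_A, alpha_H, lambda)
grid = [i / 4 for i in range(2001)]      # x in [0, 500]
p_bar, q_bar = asymptote(params)
p_tan, q_tan = tangent(6.5, params)      # non-integer tangent point: not tight on N_0
print(p_bar, q_bar, p_tan, q_tan)
```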

Intermezzo 1: Application to Asymptotical Distinguishability
The above-mentioned investigations can be applied to the context of Section 2.6 on asymptotical distinguishability. Indeed, with the help of the Definitions 1 and 2 as well as the equivalence relations (25) and (26) we obtain the following Corollary 1.

(a) For all $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{SP}\setminus\mathcal{P}_{SP,4b}$ and all initial population sizes $X_0 \in \mathbb{N}$, the corresponding sequences $(P_{A,n})_{n\in\mathbb{N}_0}$ and $(P_{H,n})_{n\in\mathbb{N}_0}$ are entirely separated (completely asymptotically distinguishable).
(b) For all $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{NI}$ with $\beta_A \le 1$ and all initial population sizes $X_0 \in \mathbb{N}$, the sequence […]
(c) For all $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{NI}$ with $\beta_A > 1$ and all initial population sizes $X_0 \in \mathbb{N}$, the sequence $(P_{A,n})_{n\in\mathbb{N}_0}$ is neither contiguous to nor entirely separated from $(P_{H,n})_{n\in\mathbb{N}_0}$.
The proof of Corollary 1 will be given in Appendix A.1.

Remark 3.
(a) Assertion (c) of Corollary 1 contrasts with the case of Gaussian processes with independent increments, where one gets either entire separation or mutual contiguity (see e.g., Liese & Vajda [1]). (b) By putting Corollary 1(b) and (c) together, we obtain for different "criticality pairs" in the non-immigration case $\mathcal{P}_{NI}$ the following asymptotical distinguishability types: […]; in particular, for $\mathcal{P}_{NI}$ the sequences $(P_{A,n})_{n\in\mathbb{N}_0}$ and $(P_{H,n})_{n\in\mathbb{N}_0}$ are not completely asymptotically inseparable (indistinguishable). (c) In the light of the above-mentioned characterizations of contiguity resp. entire separation by means of Hellinger integral limits, the finite-time-horizon results on Hellinger integrals given in the "$\lambda \in\, ]0,1[$ parts" of Theorem 1, the Sections 3.3 to 3.13 and also in the below-mentioned Section 6 can loosely be interpreted as "finite-sample (rather than asymptotical) distinguishability" assertions.

The above-mentioned investigations can be applied to the context of Section 2.5 on dichotomous Bayesian decision making on the space of all possible path scenarios (path space) of Poissonian Galton-Watson processes without/with immigration GW(I) (e.g., in combination with our running-example epidemiological context of Section 2.3). More precisely, for the minimal mean decision loss (Bayes risk) $R_n$ defined by (18) we can derive upper (respectively lower) bounds by using (19) respectively (20), together with the exact values or the upper (respectively lower) bounds of the Hellinger integrals $H_\lambda(P_{A,n}\|P_{H,n})$ derived in the "$\lambda \in\, ]0,1[$ parts" of Theorem 1, the Sections 3.3 to 3.13 (and also in the below-mentioned Section 6); instead of providing the corresponding resulting formulas, which would be merely repetitive, we give the illustrative Example 1.
Example 1. Based on a sample path observation $\mathcal{X}_n := \{X_\ell : \ell = 1, \ldots, n\}$ of a GWI, which is governed either by a hypothesis law $P_H$ or by an alternative law $P_A$, we want to make a dichotomous optimal Bayesian decision as described in Section 2.5, namely, decide between an action $d_H$ "associated with" $P_H$ and an action $d_A$ "associated with" $P_A$, with pregiven loss function (16) involving constants $L_A > 0$, $L_H > 0$ which may, e.g., arise as bounds from quantities in worst-case scenarios.
For this, let us exemplarily deal with initial population $X_0 = 5$ as well as the parameter setup $(\beta_A, \beta_H, \alpha_A, \alpha_H) = (1.2, 0.9, 4, 3) \in \mathcal{P}_{SP,1}$; within our running-example epidemiological context of Section 2.3, this corresponds, e.g., to a setup where one is confronted with a novel infectious disease (such as COVID-19) of non-negligible fatality rate, and (A) reflects a "potentially dangerous" infectious-disease-transmission situation (with supercritical reproduction number $\beta_A = 1.2$ and importation mean $\alpha_A = 4$, for weekly appearing new incidence generations), whereas (H) describes a "milder" situation (with subcritical $\beta_H = 0.9$ and $\alpha_H = 3$). Moreover, let $d_H$ and $d_A$ reflect two possible sets of interventions (control measures) in the course of pandemic risk management, with respective "worst-case type" decision losses $L_A = 600$ and $L_H = 300$ (e.g., in units of billion Euros or U.S. Dollars). Additionally, we assume the prior probabilities $\pi = \Pr(H) = 1 - \Pr(A) = 0.5$, which results in the prior-loss constants $\tilde{L}_A = 300$ and $\tilde{L}_H = 150$. In order to obtain bounds for the corresponding minimal mean decision loss (Bayes risk) $R_n$ defined in (18), we can employ the general Stummer-Vajda bounds (cf. [15]) (19) and (20) in terms of the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$ (with arbitrary $\lambda \in\, ]0,1[$), and combine them with the appropriate detailed results on the latter from the preceding subsections. To demonstrate this, let us choose $\lambda = 0.5$ (for which $H_{1/2}(P_{A,n}\|P_{H,n})$ can be interpreted as a multiple of the Bhattacharyya coefficient between the two competing GWI) respectively $\lambda = 0.9$, leading to the parameters $p^E_{0.5} = 3.464$, $q^E_{0.5} = 1.039$ respectively $p^E_{0.9} = 3.887$, $q^E_{0.9} = 1.166$ (cf. (33)). Combining (19) and (20) with the resulting exact values of $H_\lambda(P_{A,n}\|P_{H,n})$ (cf. Theorem 1(a)) yields lower and upper bounds $R^L_n$ resp. $R^U_n$ of the Bayes risk $R_n$. Figure 1 illustrates the lower (orange resp. cyan) and upper (red resp. blue) bounds $R^L_n$ resp. $R^U_n$ of the Bayes risk $R_n$ employing $\lambda = 0.5$ resp. $\lambda = 0.9$, on both a unit scale (left graph) and a logarithmic scale (right graph).
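The parameter values quoted in Example 1 can be reproduced from $p^E_\lambda = \alpha_A^\lambda\alpha_H^{1-\lambda}$ and $q^E_\lambda = \beta_A^\lambda\beta_H^{1-\lambda}$. The sketch below also evaluates the exact Hellinger integrals, assuming the recursion $a_0 = 0$, $a_k = q^E_\lambda e^{a_{k-1}} - \beta_\lambda$, $b_k = p^E_\lambda e^{a_{k-1}} - \alpha_\lambda$ and $H_\lambda(P_{A,n}\|P_{H,n}) = \exp\{a_n X_0 + \sum_{k\le n} b_k\}$, and, for illustration only, the elementary upper bound $R_n \le \tilde L_A^\lambda \tilde L_H^{1-\lambda} H_\lambda(P_{A,n}\|P_{H,n})$. That bound follows from $\min\{u,v\} \le u^\lambda v^{1-\lambda}$ if the Bayes risk has the form $R_n = \int \min\{\tilde L_A\, dP_{A,n}, \tilde L_H\, dP_{H,n}\}$; whether this matches the exact pairing of constants in (18) and (19) is an assumption on our part, so the printed numbers are not the paper's $R^U_n$. Function names are hypothetical.

```python
import math

beta_A, beta_H, alpha_A, alpha_H = 1.2, 0.9, 4.0, 3.0  # Example 1, in P_SP,1
x0 = 5
LA_tilde, LH_tilde = 300.0, 150.0                       # prior-loss constants of Example 1

def example_params(lam):
    """(p^E_lambda, q^E_lambda) for the Example 1 setup."""
    return alpha_A ** lam * alpha_H ** (1 - lam), beta_A ** lam * beta_H ** (1 - lam)

def hellinger_exact(lam, n):
    """Assumed exact recursion exp(a_n*x0 + b_1 + ... + b_n) with p^E, q^E."""
    p, q = example_params(lam)
    alpha_lam = lam * alpha_A + (1 - lam) * alpha_H
    beta_lam = lam * beta_A + (1 - lam) * beta_H
    a, b_sum = 0.0, 0.0
    for _ in range(n):
        b_sum += p * math.exp(a) - alpha_lam
        a = q * math.exp(a) - beta_lam
    return math.exp(a * x0 + b_sum)

def risk_upper_bound(lam, n):
    """Illustrative bound LA_tilde^lam * LH_tilde^(1-lam) * H_lam (assumed form, see text)."""
    return LA_tilde ** lam * LH_tilde ** (1 - lam) * hellinger_exact(lam, n)

for lam in (0.5, 0.9):
    p, q = example_params(lam)
    print(f"lambda={lam}: p={p:.3f}, q={q:.3f}",
          [round(risk_upper_bound(lam, n), 2) for n in (1, 5, 10)])
```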

Neyman-Pearson Testing
By combining (23) with the exact values resp. upper bounds of the Hellinger integrals $H_\lambda(P_{A,n}\|P_{H,n})$ from the preceding subsections, we obtain for our context of GW(I) with Poisson offspring and Poisson immigration (including the non-immigration case) some upper bounds of the minimal type II error probability $E_\varsigma(P_{A,n}\|P_{H,n})$ in the class of the tests for which the type I error probability is at most $\varsigma \in\, ]0,1[$; these can also be immediately rewritten as lower bounds for the power $1 - E_\varsigma(P_{A,n}\|P_{H,n})$ of a most powerful test at level $\varsigma$. As for the Bayesian context of Section 3.15.1, instead of providing the (merely repetitive) resulting formulas for the bounds of $E_\varsigma(P_{A,n}\|P_{H,n})$, we give the illustrative Example 2.
Example 2. Consider Figures 2 and 3, which deal with initial population $X_0 = 5$ and the parameter setup $(\beta_A, \beta_H, \alpha_A, \alpha_H) = (0.3, 1.2, 1, 4) \in \mathcal{P}_{SP,1}$; within our running-example epidemiological context of Section 2.3, this corresponds to a "potentially dangerous" infectious-disease-transmission situation (H) (with supercritical reproduction number $\beta_H = 1.2$ and importation mean $\alpha_H = 4$), whereas (A) describes a "very mild" situation (with "low" subcritical $\beta_A = 0.3$ and $\alpha_A = 1$). Figure 2 shows the lower and upper bounds of $E_\varsigma(P_{A,n}\|P_{H,n})$ with $\varsigma = 0.05$, evaluated from the Formulas (23) and (24), together with the exact values of the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$, cf. Theorem 1 (recall that we are in the setup $\mathcal{P}_{SP,1}$), on both a unit scale (left graph) and a logarithmic scale (right graph). The orange resp. red resp. purple curves correspond to the resulting upper bounds $E^U_n := E^U_n(P_{A,n}\|P_{H,n})$ (cf. (23)) with parameters $\lambda = 0.3$ resp. $\lambda = 0.5$ resp. $\lambda = 0.7$. The green resp. cyan resp. blue curves correspond to the lower bounds $E^L_n := E^L_n(P_{A,n}\|P_{H,n})$ (cf. (24)) with parameters $\lambda = 2$ resp. $\lambda = 1.5$ resp. $\lambda = 1.1$. Notice the different $\lambda$-ranges in (23) and (24).
In contrast, Figure 3 compares the lower bound E L n (for fixed λ = 1.1) with the upper bound E U n (for fixed λ = 0.5) of the minimal type II error probability E ς (P A,n ||P H,n ) for different levels ς = 0.1 (orange for the lower and cyan for the upper bound), ς = 0.05 (green and magenta) and ς = 0.01 (blue and purple) on both a unit scale (left graph) and a logarithmic scale (right graph).
[…] where, for the latter, we have derived in Theorem 1(a) and in Proposition 5 the exact recursive values for the time-behaviour of the Hellinger integrals $H_\lambda(P_{A,n}\|P_{H,n})$ of order $\lambda \in \mathbb{R}\setminus[0,1]$. Moreover, recall that for the case $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) \in (\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times\,]0,1[$ we have obtained in the Sections 3.4 and 3.5 some "optimal" linear lower bounds $\varphi^L_\lambda(\cdot)$ for the strictly concave function $\varphi_\lambda(x) := \varphi(x, \beta_A, \beta_H, \alpha_A, \alpha_H, \lambda)$ on the domain $x \in [0,\infty[$; due to the monotonicity Properties 2 (P10) to (P12) of the sequences $(a^{(q)}_n)_{n\in\mathbb{N}}$ and $(b^{(p,q)}_n)_{n\in\mathbb{N}}$, these bounds have led to the "optimal" recursive lower bound $B^L_{\lambda,X_0,n}$ of the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$ in (40) of Theorem 1(b). In contrast, the strict convexity of the function $\varphi_\lambda(\cdot)$ in the case $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) \in (\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}) \times (\mathbb{R}\setminus[0,1])$ implies that we cannot maximize both parameters $p^L_\lambda, q^L_\lambda \in \mathbb{R}$ simultaneously subject to the constraint (35). This effect carries over to the lower bounds $B^L_{\lambda,X_0,n}$ of the Hellinger integrals $H_\lambda(P_{A,n}\|P_{H,n})$ (cf. (41)); in general, these bounds cannot be maximized simultaneously for all initial population sizes $X_0 \in \mathbb{N}$ and all observation horizons $n \in \mathbb{N}$.
Analogously to (46), one way to obtain "good" recursive lower bounds for $H_\lambda(P_{A,n}\|P_{H,n})$ from (41) in Theorem 1(b) is to solve the optimization problem
$\max_{(p^L_\lambda, q^L_\lambda)} B^{(p^L_\lambda,q^L_\lambda)}_{\lambda,X_0,n}$ subject to (35) (55)
for each fixed initial population size $X_0 \in \mathbb{N}$ and observation horizon $n \in \mathbb{N}$. But for the same reasons as explained right after (46), the optimization problem (55) does not seem to be straightforward to solve explicitly. In a similar way as in the discussion of the upper bounds for the case $\lambda \in\, ]0,1[$ above, we now have to look for suitable parameters $p^L_\lambda, q^L_\lambda$ for the lower bound $B^L_{\lambda,X_0,n} \le H_\lambda(P_{A,n}\|P_{H,n})$ which fulfill (35) and which guarantee certain reasonable criteria and goals; these are similar to the goals (G1) to (G3) from Section 3.6, and are therefore marked with an additional prime:
(G1′) the validity of $B^L_{\lambda,X_0,n} > 1$ simultaneously for all initial configurations $X_0 \in \mathbb{N}$, all observation horizons $n \in \mathbb{N}$ and all $\lambda \in \mathbb{R}\setminus[0,1]$, which leads to a strict improvement of the general lower bound $H_\lambda(P_{A,n}\|P_{H,n}) \ge 1$ (cf. (11));
(G2′) the determination of the long-term limits $\lim_{n\to\infty} H_\lambda(P_{A,n}\|P_{H,n})$ respectively $\lim_{n\to\infty} B^L_{\lambda,X_0,n}$ for all $X_0 \in \mathbb{N}$ and all $\lambda \in \mathbb{R}\setminus[0,1]$; in particular, one would like to check whether $\lim_{n\to\infty} H_\lambda(P_{A,n}\|P_{H,n}) = \infty$;
(G3′) the determination of the time-asymptotical growth rates $\lim_{n\to\infty}\frac{1}{n}\log H_\lambda(P_{A,n}\|P_{H,n})$ resp. $\lim_{n\to\infty}\frac{1}{n}\log B^L_{\lambda,X_0,n}$ for all $X_0 \in \mathbb{N}$ and all $\lambda \in \mathbb{R}\setminus[0,1]$.
In the following, let us briefly discuss how these three goals can be achieved in principle, where we confine ourselves to parameters $p^L_\lambda, q^L_\lambda$ which, in addition to (35), fulfill the requirement
$\big(p^L_\lambda > \max\{0,\alpha_\lambda\} \wedge q^L_\lambda \ge \max\{0,\beta_\lambda\}\big) \vee \big(p^L_\lambda \ge \max\{0,\alpha_\lambda\} \wedge q^L_\lambda > \max\{0,\beta_\lambda\}\big)$, (56)
where $\wedge$ is the logical "AND" and $\vee$ the logical "OR" operator. This is sufficient to tackle all three Goals (G1′) to (G3′). To see this, assume that $p^L_\lambda, q^L_\lambda$ satisfy (35). Let us begin with the two "extremal" cases in (56). Suppose in the first extremal case (i) that $\beta_\lambda \le 0$.
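Then $q^L_\lambda=0$, and Properties 1 (P4) implies that $a_n^{(0)}=-\beta_\lambda\ge0$ for all $n\in\mathbb{N}$. This enters into (41) as follows: the Hellinger integral lower bound satisfies $B^L_{\lambda,X_0,n}\ge\exp\{-\beta_\lambda X_0+(p^L_\lambda-\alpha_\lambda)\cdot n\}>1$. Furthermore, one clearly has $\lim_{n\to\infty}B^L_{\lambda,X_0,n}=\infty$ as well as $\lim_{n\to\infty}\frac{1}{n}\log B^L_{\lambda,X_0,n}=p^L_\lambda e^{-\beta_\lambda}-\alpha_\lambda>0$. If instead $\beta_\lambda>0$, then $q^L_\lambda=\beta_\lambda$ and (P2) gives $a_n^{(\beta_\lambda)}\equiv0$, so that $B^L_{\lambda,X_0,n}=\exp\{(p^L_\lambda-\alpha_\lambda)\cdot n\}>1$; furthermore, one gets $\lim_{n\to\infty}B^L_{\lambda,X_0,n}=\infty$ as well as $\lim_{n\to\infty}\frac{1}{n}\log B^L_{\lambda,X_0,n}=p^L_\lambda-\alpha_\lambda>0$.

Let us consider the other above-mentioned extremal case (ii). Suppose that $q^L_\lambda>\max\{0,\beta_\lambda\}$ together with $q^L_\lambda>\min\{1,e^{\beta_\lambda-1}\}$, which implies that the sequence $(a_n^{(q^L_\lambda)})_{n\in\mathbb{N}}$ is strictly positive, strictly increasing and grows to infinity faster than exponentially, cf. (P3b). Hence, $B^L_{\lambda,X_0,n}\ge\exp\{a_n^{(q^L_\lambda)}X_0\}>1$, $\lim_{n\to\infty}B^L_{\lambda,X_0,n}=\infty$ as well as $\lim_{n\to\infty}\frac{1}{n}\log B^L_{\lambda,X_0,n}=\infty$. If $\max\{0,\beta_\lambda\}<q^L_\lambda\le\min\{1,e^{\beta_\lambda-1}\}$, then $(a_n^{(q^L_\lambda)})_{n\in\mathbb{N}}$ is strictly positive, strictly increasing and converges to $x_0^{(q^L_\lambda)}$ (cf. (P3a)). This carries over to the sequence $(b_n^{(p^L_\lambda,q^L_\lambda)})_{n\in\mathbb{N}}$, which is strictly increasing and converges to $p^L_\lambda e^{x_0^{(q^L_\lambda)}}-\alpha_\lambda>0$, leading to $B^L_{\lambda,X_0,n}>1$ for all $n\in\mathbb{N}$, to $\lim_{n\to\infty}B^L_{\lambda,X_0,n}=\infty$ as well as to $\lim_{n\to\infty}\frac{1}{n}\log B^L_{\lambda,X_0,n}=p^L_\lambda e^{x_0^{(q^L_\lambda)}}-\alpha_\lambda>0$.

It remains to look at the cases where $p^L_\lambda,q^L_\lambda$ satisfy (35), and (56) with two strict inequalities. For this situation, one gets that $(a_n^{(q^L_\lambda)})_{n\in\mathbb{N}}$ is strictly positive, strictly increasing and (iff $q^L_\lambda\le\min\{1,e^{\beta_\lambda-1}\}$) convergent, namely to the smallest positive solution $x_0^{(q^L_\lambda)}$ of $\xi_\lambda^{(q^L_\lambda)}(x)=x$; moreover, $(b_n^{(p^L_\lambda,q^L_\lambda)})_{n\in\mathbb{N}}$ is strictly increasing and strictly positive, cf. (P7). Hence, under the assumptions (35) and $p^L_\lambda>\max\{0,\alpha_\lambda\}\wedge q^L_\lambda>\max\{0,\beta_\lambda\}$, the corresponding lower bounds $B^L_{\lambda,X_0,n}$ of the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$ fulfill for all $X_0\in\mathbb{N}$:
• $B^L_{\lambda,X_0,n}>1$ for all $n\in\mathbb{N}$,
• $\lim_{n\to\infty}B^L_{\lambda,X_0,n}=\infty$,
• $\lim_{n\to\infty}\frac{1}{n}\log B^L_{\lambda,X_0,n}=p^L_\lambda e^{x_0^{(q^L_\lambda)}}-\alpha_\lambda>0$ for the case $q^L_\lambda\in\,]\max\{0,\beta_\lambda\},\min\{1,e^{\beta_\lambda-1}\}]$, respectively $\lim_{n\to\infty}\frac{1}{n}\log B^L_{\lambda,X_0,n}=\infty$ for the remaining case $q^L_\lambda>\min\{1,e^{\beta_\lambda-1}\}$.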
Let us now undertake the detailed investigations of lower and upper bounds of the Hellinger integrals $H_\lambda(P_{A,n}\|P_{H,n})$ of order $\lambda\in\mathbb{R}\setminus[0,1]$ for the various subclasses of $\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$.

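Lower Bounds for the Cases $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\mathcal{P}_{SP,3b}\times(\mathbb{R}\setminus[0,1])$
Within such a constellation one has $x^*\notin\mathbb{N}_0$ for the point $x^*$ from (49), which implies $\phi_\lambda(x)>0$ for all $x$ on the relevant subdomain $\mathbb{N}_0$.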

Lower Bounds for the Cases $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\mathcal{P}_{SP,3c}\times(\mathbb{R}\setminus[0,1])$
Since in this subcase one has $x^*\in\mathbb{N}$ (cf. (49)) and thus $\phi_\lambda(x^*)=0$, there do not exist parameters $p^L_\lambda,q^L_\lambda$ such that (35) and (56) are satisfied. Within our proposed method, the only parameter pair which ensures $B^L_{\lambda,X_0,n}\ge1$ for all $n\in\mathbb{N}$ and all $X_0\in\mathbb{N}$ is the choice $p^L_\lambda=\alpha_\lambda$, $q^L_\lambda=\beta_\lambda$. Consequently, $B^L_{\lambda,X_0,n}\equiv1$, which coincides with the general lower bound (11) but violates the desired Goal (G1′). However, in some constellations there exist nonnegative parameters $p^L_\lambda<\alpha_\lambda$, $q^L_\lambda>\beta_\lambda$ or $p^L_\lambda>\alpha_\lambda$, $q^L_\lambda<\beta_\lambda$ such that at least parts (c) and (d) of Proposition 12 are satisfied. As in Section 3.19 above, by using a conceptually different method (without $p^L_\lambda,q^L_\lambda$) we prove in Appendix A.1 that for all $\lambda\in\mathbb{R}\setminus[0,1]$, all observation times $n\in\mathbb{N}$ and all initial population sizes $X_0\in\mathbb{N}$ there holds $1<H_\lambda(P_{A,n}\|P_{H,n})$ and $\lim_{n\to\infty}H_\lambda(P_{A,n}\|P_{H,n})=\infty$.
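The role of the point $x^*$ can be illustrated numerically. The following Python sketch assumes (the definitions are not reproduced in this excerpt) that $\phi_\lambda(x)=(\alpha_A+\beta_A x)^\lambda(\alpha_H+\beta_H x)^{1-\lambda}-(\alpha_\lambda+\beta_\lambda x)$ with $\alpha_\lambda:=\lambda\alpha_A+(1-\lambda)\alpha_H$ and $\beta_\lambda:=\lambda\beta_A+(1-\lambda)\beta_H$, and that $x^*$ is the point where both Poisson means coincide, $x^*=(\alpha_A-\alpha_H)/(\beta_H-\beta_A)$. The parameter values are hypothetical, chosen only so that $x^*=2$ is a positive integer (the situation described above for $\mathcal{P}_{SP,3c}$); we do not reproduce the formal definitions of the subclasses.

```python
def phi(x, beta_a, beta_h, alpha_a, alpha_h, lam):
    """Assumed phi_lambda(x): weighted geometric minus weighted arithmetic mean of the two Poisson means."""
    mu_a = alpha_a + beta_a * x
    mu_h = alpha_h + beta_h * x
    alpha_lam = lam * alpha_a + (1 - lam) * alpha_h
    beta_lam = lam * beta_a + (1 - lam) * beta_h
    return mu_a ** lam * mu_h ** (1 - lam) - (alpha_lam + beta_lam * x)


def zero_point(beta_a, beta_h, alpha_a, alpha_h):
    """Point where alpha_A + beta_A x = alpha_H + beta_H x (requires beta_A != beta_H)."""
    return (alpha_a - alpha_h) / (beta_h - beta_a)


# Hypothetical parameters, chosen only so that x* = 2 is a positive integer.
params = dict(beta_a=0.5, beta_h=1.5, alpha_a=3.0, alpha_h=1.0)
x_star = zero_point(**params)
print("x* =", x_star)
for lam in (-1.0, 2.0, 3.5):
    # for lambda outside [0, 1], phi is >= 0 and vanishes exactly where the means coincide
    print(lam, [round(phi(x, lam=lam, **params), 6) for x in range(6)])
```

Shifting, e.g., `alpha_a` to 3.2 moves $x^*$ off the integers, so that $\phi_\lambda$ stays strictly positive on $\mathbb{N}_0$, which is the situation described for $\mathcal{P}_{SP,3b}$.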
The only choice of parameters $p^L_\lambda,q^L_\lambda$ which fulfill (35) and $B^L_{\lambda,X_0,n}\ge1$ for all $n\in\mathbb{N}$ and all $X_0\in\mathbb{N}$ is $p^L_\lambda=\alpha_\lambda$ together with $q^L_\lambda=\beta_\lambda=\beta_\bullet$, where $\beta_\bullet$ stands for both (equal) $\beta_H$ and $\beta_A$. Of course, this leads to $B^L_{\lambda,X_0,n}\equiv1$, which is consistent with the general lower bound (11) but violates the desired Goal (G1′). Nevertheless, in Appendix A.1 we prove the following

Proposition 13. For all $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\mathcal{P}_{SP,4a}\times(\mathbb{R}\setminus[0,1])$ there exist parameters $p^L_\lambda>\alpha_\lambda$ (not necessarily satisfying $p^L_\lambda\ge0$) and $0<q^L_\lambda<\beta_\lambda=\beta_\bullet<\min\{1,e^{\beta_\bullet-1}\}=e^{\beta_\bullet-1}$ such that (35) holds for all $x\in[0,\infty[$ and such that for all initial population sizes $X_0\in\mathbb{N}$ the parts (c) and (d) of Proposition 12 hold true.

For the remaining case $\mathcal{P}_{SP,4b}$ (cf. (49)), the assertions preceding Proposition 13 remain valid. However, the proof of Proposition 13 in Appendix A.1 contains details which explain why it cannot be carried over to this case. Thus, the generally valid lower bound $B^L_{\lambda,X_0,n}\equiv1$ cannot be improved with our methods.

Concluding Remarks on Alternative Lower Bounds for all Cases
To achieve the Goals (G1′) to (G3′), in the above investigations of lower bounds of the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$, $\lambda\in\mathbb{R}\setminus[0,1]$, we have mainly focused on parameters $p^L_\lambda,q^L_\lambda$ which satisfy (35) and additionally (56). Nevertheless, Theorem 1(b) gives lower bounds $B^L_{\lambda,X_0,n}$ whenever (35) is fulfilled; however, such a lower bound can be the trivial one, $B^L_{\lambda,X_0,n}\equiv1$. Let us remark here that for the parameter constellations $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\big(\mathcal{P}_{SP,2}\times(\mathbb{R}\setminus([0,1]\cup\mathcal{I}_{SP,2}))\big)\cup\big(\mathcal{P}_{SP,3a}\times(\mathbb{R}\setminus([0,1]\cup\mathcal{I}_{SP,3a}))\big)\cup\big(\mathcal{P}_{SP,3b}\times(\mathbb{R}\setminus([0,1]\cup\mathcal{I}_{SP,3b}))\big)$ one can prove that there exist $p^L_\lambda,q^L_\lambda$ which satisfy (35) for all $x\in\mathbb{N}_0$ as well as a condition generalizing (56) (where at least one of the inequalities is strict), and that for such $p^L_\lambda,q^L_\lambda$ one gets $H_\lambda(P_{A,n}\|P_{H,n})\ge B^L_{\lambda,X_0,n}>1$ for all $X_0\in\mathbb{N}$ and all $n\in\mathbb{N}$; consequently, Goal (G1′) is achieved. However, in these parameter constellations it can unpleasantly happen that $n\mapsto B^L_{\lambda,X_0,n}$ oscillates (in contrast to the monotone behaviour in Propositions 11(b) and 12(b)).
As a final general remark, let us mention that the functions $\phi^{tan}_{\lambda,y}(\cdot)$, $\phi^{sec}_{\lambda,k}(\cdot)$, $\phi^{hor}_\lambda(\cdot)$ and $\phi_\lambda(\cdot)$, defined in (52) to (54) and Properties 3 (P20), constitute linear lower bounds for $\phi_\lambda(\cdot)$ on the domain $\mathbb{N}_0$ in the case $\lambda\in\mathbb{R}\setminus[0,1]$. Their parameters $p^L_\lambda\in\{p^{tan}_{\lambda,y},p^{sec}_{\lambda,k},p^{hor}_{\lambda},p_\lambda\}$ and $q^L_\lambda\in\{q^{tan}_{\lambda,y},q^{sec}_{\lambda,k},q^{hor}_{\lambda},q_\lambda\}$ lead to lower bounds $B^L_{\lambda,X_0,n}$ of the Hellinger integrals which may or may not be consistent with Goals (G1′) to (G3′), and which may be better than, weaker than, or incomparable with the previous lower bounds when some relaxation of (G1′) is admitted, such as the validity of $H_\lambda(P_{A,n}\|P_{H,n})>1$ for all but finitely many $n\in\mathbb{N}$.

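Upper Bounds for the Cases $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times(\mathbb{R}\setminus[0,1])$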
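For the cases $\lambda\in\mathbb{R}\setminus[0,1]$, the investigation of upper bounds for the Hellinger integral $H_\lambda(P_{A,n}\|P_{H,n})$ is much easier than the above derivation of lower bounds. In fact, we face a situation similar to the lower-bound studies for the cases $\lambda\in\,]0,1[$: due to Properties 3 (P19), the function $\phi_\lambda(\cdot)$ is strictly convex on the nonnegative real line; furthermore, it is asymptotically linear, as stated in (P20). The monotonicity Properties 2 (P10) to (P12) imply that for the tightest upper bound (within our framework) one should use the parameters $p^U_\lambda:=\alpha_A^\lambda\alpha_H^{1-\lambda}$ and $q^U_\lambda:=\beta_A^\lambda\beta_H^{1-\lambda}$. If $\beta_A\neq\beta_H$, the corresponding sequence $(a_n^{(q^U_\lambda)})_{n\in\mathbb{N}}$ converges if $\lambda\in[\lambda_-,\lambda_+]$ (cf. Lemma 1(a)), and otherwise it diverges to $\infty$ faster than exponentially (cf. (P3b)). If $\beta_A=\beta_H$ (i.e., if $(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP,4}=\mathcal{P}_{SP,4a}\cup\mathcal{P}_{SP,4b}$), then one gets $q^U_\lambda=\beta_\lambda$ and $a_n^{(q^U_\lambda)}=0$ for all $n\in\mathbb{N}$ (cf. (P2)). Altogether, this leads to

Proposition 14. For all $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times(\mathbb{R}\setminus[0,1])$ and all initial population sizes $X_0\in\mathbb{N}$ there holds, with $p^U_\lambda:=\alpha_A^\lambda\alpha_H^{1-\lambda}$ and $q^U_\lambda:=\beta_A^\lambda\beta_H^{1-\lambda}$: … the sequence $(B^U_{\lambda,X_0,n})_{n\in\mathbb{N}}$ of upper bounds for $H_\lambda(P_{A,n}\|P_{H,n})$ given by … is strictly increasing.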
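A short convexity argument explains why $p^U_\lambda=g(0)$ and $q^U_\lambda$ work, where we write (as an assumption about the notation of Section 3) $g(x):=(\alpha_A+\beta_A x)^\lambda(\alpha_H+\beta_H x)^{1-\lambda}$, so that $\phi_\lambda$ differs from $g$ only by an affine term. For $\lambda\notin[0,1]$, $g$ is convex (a perspective of $t\mapsto t^\lambda$ composed with affine maps) and asymptotically linear with slope $q^U_\lambda$, hence $g'\le q^U_\lambda$ and $g(x)\le p^U_\lambda+q^U_\lambda x$; for $\lambda\in\,]0,1[$ the inequality reverses. The following Python sketch checks this on a grid for hypothetical parameter values; it is a numerical illustration, not a proof.

```python
def g(x, beta_a, beta_h, alpha_a, alpha_h, lam):
    """(alpha_A + beta_A x)^lam * (alpha_H + beta_H x)^(1 - lam)."""
    return (alpha_a + beta_a * x) ** lam * (alpha_h + beta_h * x) ** (1 - lam)


def p_q(beta_a, beta_h, alpha_a, alpha_h, lam):
    """Intercept g(0) = alpha_A^lam alpha_H^(1-lam) and asymptotic slope beta_A^lam beta_H^(1-lam)."""
    return alpha_a ** lam * alpha_h ** (1 - lam), beta_a ** lam * beta_h ** (1 - lam)


def line_minus_g(lam, params, grid):
    """Values of (p + q x) - g(x) on a grid of points x >= 0."""
    p, q = p_q(lam=lam, **params)
    return [p + q * x - g(x, lam=lam, **params) for x in grid]


params = dict(beta_a=0.5, beta_h=1.5, alpha_a=3.0, alpha_h=1.0)  # hypothetical values
grid = [k / 4 for k in range(401)]                               # x in [0, 100]
for lam in (-1.0, 0.5, 2.0):
    gaps = line_minus_g(lam, params, grid)
    # expected: gaps >= 0 for lambda outside [0, 1], gaps <= 0 for lambda in ]0, 1[
    print(f"lambda={lam}: min gap {min(gaps):.4f}, max gap {max(gaps):.4f}")
```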

A First Basic Result
For orders $\lambda\in\mathbb{R}\setminus\{0,1\}$, all the results of the previous Section 3 carry correspondingly over from the Hellinger integrals $H_\lambda(\cdot\|\cdot)$ to the total variation distance $V(\cdot\|\cdot)$, by virtue of the relation (12), to the Renyi divergences $R_\lambda(\cdot\|\cdot)$, by virtue of the relation (7), as well as to the power divergences $I_\lambda(\cdot\|\cdot)$, by virtue of the relation (2); in the following, we concentrate on the latter. In particular, this carrying-over procedure leads to bounds on $I_\lambda(P_{A,n}\|P_{H,n})$ which are tighter than the general rudimentary bounds (cf. (10) and (11)). Because power divergences have a very insightful interpretation as "directed distances" between two probability distributions (e.g., within our running epidemiological example), and serve as important tools in statistics, information theory, machine learning and artificial intelligence, we present explicitly the resulting exact values respectively bounds of $I_\lambda(P_{A,n}\|P_{H,n})$ ($\lambda\in\mathbb{R}\setminus\{0,1\}$, $n\in\mathbb{N}$) in the current and the following subsections. For this, recall the case-dependent parameters … . To begin with, we can deduce from Theorem 1 the following Theorem 2.
In order to deduce the subsequent detailed recursive analyses of power divergences, we also employ the obvious relations lim n→∞ 1 n log 1 as well as for λ ∈ R\[0, 1] (provided that lim inf n→∞ H λ (P A,n ||P H,n ) > 1).
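To make the passage from Hellinger integrals to power divergences concrete, the following Python sketch uses the standard relation $I_\lambda(P\|Q)=\frac{1-H_\lambda(P\|Q)}{\lambda(1-\lambda)}$, $\lambda\in\mathbb{R}\setminus\{0,1\}$ (which we assume is the content of relation (2)), for a single pair of Poisson laws. It compares this with a direct truncated summation of $\sum_k p_H(k)\,f_\lambda(p_A(k)/p_H(k))$, $f_\lambda(t)=\frac{t^\lambda-\lambda t+\lambda-1}{\lambda(\lambda-1)}$, and with the closed-form Poisson Hellinger integral $\exp\{\mu_A^\lambda\mu_H^{1-\lambda}-\lambda\mu_A-(1-\lambda)\mu_H\}$. The means 3 and 5 are illustrative.

```python
import math


def log_poisson(k, mu):
    """log of the Poisson(mu) probability mass at k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def hellinger_poisson(mu_a, mu_h, lam):
    """Order-lam Hellinger integral of Poi(mu_a) vs Poi(mu_h) in closed form."""
    return math.exp(mu_a ** lam * mu_h ** (1 - lam) - lam * mu_a - (1 - lam) * mu_h)


def power_divergence_via_hellinger(mu_a, mu_h, lam):
    """I_lam = (1 - H_lam) / (lam (1 - lam)) for lam not in {0, 1}."""
    return (1.0 - hellinger_poisson(mu_a, mu_h, lam)) / (lam * (1.0 - lam))


def power_divergence_direct(mu_a, mu_h, lam, kmax=120):
    """sum_k [p_A^lam p_H^(1-lam) - lam p_A - (1-lam) p_H] / (lam (lam - 1)), truncated at kmax."""
    total = 0.0
    for k in range(kmax + 1):
        la, lh = log_poisson(k, mu_a), log_poisson(k, mu_h)
        term = math.exp(lam * la + (1 - lam) * lh) - lam * math.exp(la) - (1 - lam) * math.exp(lh)
        total += term / (lam * (lam - 1))
    return total


for lam in (-0.5, 0.5, 2.0):
    print(lam, power_divergence_via_hellinger(3.0, 5.0, lam), power_divergence_direct(3.0, 5.0, lam))
```

Note that for $\lambda\notin[0,1]$ both numerator and denominator are negative, which is why upper (lower) bounds of $H_\lambda$ translate into upper (lower) bounds of $I_\lambda$ there, whereas for $\lambda\in\,]0,1[$ the roles are swapped.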

Detailed Analyses of the Exact
the sequence (I λ (P A,n ||P H,n )) n∈N given by (e) the map X 0 → V I λ,X 0 ,n is strictly increasing.

Corollary 3. For all (β
the sequence (I λ (P A,n ||P H,n )) n∈N given by the map X 0 → V I λ,X 0 ,n is strictly increasing.
In the assertions (a), (b) and (d) of Corollaries 4 and 5, the fraction $\alpha_A/\beta_A$ can equivalently be replaced by $\alpha_H/\beta_H$.
Let us now derive the corresponding detailed results for the bounds of the power divergences for the parameter cases $\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$, where the Hellinger integral, and thus $I_\lambda(P_{A,n}\|P_{H,n})$, cannot be determined exactly. The extensive discussion of the Hellinger-integral bounds in Sections 3.4-3.13 as well as in Sections 3.16-3.24 can be carried over directly to obtain power-divergence bounds. In the following, we summarize the resulting key results, referring to the corresponding above-mentioned subsections for a detailed discussion of the possible choices of the parameters $p^L_\lambda,q^L_\lambda,p^U_\lambda,q^U_\lambda$.
As in Section 3.12, for the parameter setup $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\mathcal{P}_{SP,4b}\times\,]0,1[$ we cannot derive, by employing our proposed $(p^U_\lambda,q^U_\lambda)$-method, a lower bound for the power divergences which improves the generally valid lower bound $I_\lambda(P_{A,n}\|P_{H,n})\ge0$ (cf. (10)).

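Upper Bounds of $I_\lambda(\cdot\|\cdot)$ for the Cases $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times\,]0,1[$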
Since in this setup the upper bounds of the power divergences can be derived from the lower bounds of the Hellinger integrals, we here appropriately adapt the results of Proposition 6.

sequence of upper bounds B I,U
λ,X 0 ,n n∈N for I λ (P A,n ||P H,n ) given by the map X 0 → B I,U λ,X 0 ,n is strictly increasing.  (58)) which satisfy (35) for all x ∈ N 0 and (56), whereas for the constellations (P SP,3a × I SP,3a )∪(P SP,3b × I SP,3b ) we have proved the existence of parameters p L λ , q L λ satisfying both (35) for all x ∈ N 0 and (56) with two strict inequalities. Subsuming this, we obtain Corollary 9. For all (β A , β H , α A , α H , λ) ∈ (P SP,2 × I SP,2 )∪(P SP,3a × I SP,3a )∪(P SP,3b × I SP,3b ) there exist parameters p L λ , q L λ which satisfy max{0, (35) for all x ∈ N 0 , and for all such pairs (p L λ , q L λ ) and all initial population sizes X 0 ∈ N one gets λ,X 0 ,n n∈N of lower bounds for I λ (P A,n ||P H,n ) given by (e) the map X 0 → B I,L λ,X 0 ,n is strictly increasing.
Analogously to the discussions in the Sections 3.17-3.20, for the parameter setups P SP,2 × and for all initial population sizes X 0 ∈ N one can still show 0 < I λ (P A,n ||P H,n ) , and lim n→∞ I λ (P A,n ||P H,n ) = ∞ .
For the penultimate case we obtain (35) is satisfied for all x ∈ [0, ∞[ and such that for all initial population sizes X 0 ∈ N at least the parts (c) and (d) of Corollary 9 hold true.

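Upper Bounds of $I_\lambda(\cdot\|\cdot)$ for the Cases $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times(\mathbb{R}\setminus[0,1])$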
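For these constellations we adapt Proposition 14, which after modification becomes the following: for all $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times(\mathbb{R}\setminus[0,1])$ and all initial population sizes $X_0\in\mathbb{N}$ there holds, with $p^U_\lambda:=\alpha_A^\lambda\alpha_H^{1-\lambda}$ and $q^U_\lambda:=\beta_A^\lambda\beta_H^{1-\lambda}$, … for the sequence $(B^{I,U}_{\lambda,X_0,n})_{n\in\mathbb{N}}$ of upper bounds for $I_\lambda(P_{A,n}\|P_{H,n})$ given by …; the map $X_0\mapsto B^{I,U}_{\lambda,X_0,n}$ is strictly increasing.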

Applications to Bayesian Decision Making
As explained in Section 2.5, the power divergences fulfill the relations (21) and (22), and thus can be interpreted as (i) a weighted-average decision risk reduction (weighted-average statistical information measure) about the degree of evidence deg concerning the parameter θ that can be attained by observing the GWI-path $X_n$ until stage $n$, and as (ii) a limit decision risk reduction (limit statistical information measure). Hence, by combining (21) and (22) with the investigations in the previous Sections 4.1-4.6, we obtain exact recursive values respectively recursive bounds of the above-mentioned decision risk reductions. For the sake of brevity, we omit the details here.

Exact Values Respectively Upper Bounds of I(·||·)
From (2), (3) and (6) in Section 2.4, one can immediately see that the Kullback-Leibler information divergence (relative entropy) between two competing Galton-Watson processes without/with immigration can be obtained by the limit
$I(P_{A,n}\|P_{H,n})=\lim_{\lambda\uparrow1}I_\lambda(P_{A,n}\|P_{H,n})$, (68)
and the reverse Kullback-Leibler information divergence (reverse relative entropy) by $I(P_{H,n}\|P_{A,n})=\lim_{\lambda\downarrow0}I_\lambda(P_{A,n}\|P_{H,n})$. Hence, in the following we concentrate only on (68); the reverse case works analogously. Accordingly, we can use (68) in appropriate combination with the $\lambda\in\,]0,1[$-parts of the previous Section 4 (respectively, the corresponding parts of Section 3) in order to obtain detailed analyses for $I(P_{A,n}\|P_{H,n})$. Let us start with the following assertions on exact values respectively upper bounds, which will be proved in Appendix A.2:

Theorem 3. (a) For all $(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1}$, all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ the Kullback-Leibler information divergence (relative entropy) is given by $I(P_{A,n}\|P_{H,n})=I_{X_0,n}:=$ … (b) For all $(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$, all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ there holds $I(P_{A,n}\|P_{H,n})\le E^U_{X_0,n}$, where …
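The limit (68) can be checked numerically in the equal-fraction setup. The following Python sketch rests on two assumptions that are not spelled out in this excerpt: (i) our reading of the recursion of Theorem 1, $\log H_\lambda(P_{A,n}\|P_{H,n})=a_n^{(q_\lambda)}X_0+\sum_{k=1}^n b_k^{(p_\lambda,q_\lambda)}$ with $a_k^{(q)}=q\,e^{a_{k-1}^{(q)}}-\beta_\lambda$, $b_k^{(p,q)}=p\,e^{a_{k-1}^{(q)}}-\alpha_\lambda$, $a_0^{(q)}=0$; (ii) the Poisson transition law $X_k\mid X_{k-1}=x\sim\mathrm{Poi}(\alpha_\bullet+\beta_\bullet x)$. It compares $I_\lambda$ at $\lambda=1-10^{-6}$ with the Kullback-Leibler divergence obtained independently from the chain rule, i.e., the sum of expected one-step Poisson divergences $\mu_A\log(\mu_A/\mu_H)-\mu_A+\mu_H$ under $P_A$. The parameter values are hypothetical.

```python
import math


def log_hellinger_gwi(beta_a, beta_h, alpha_a, alpha_h, lam, x0, n):
    """log H_lambda(P_A,n || P_H,n) via the assumed recursion (exact in the NI / equal-fraction setups)."""
    alpha_lam = lam * alpha_a + (1 - lam) * alpha_h
    beta_lam = lam * beta_a + (1 - lam) * beta_h
    p = alpha_a ** lam * alpha_h ** (1 - lam)
    q = beta_a ** lam * beta_h ** (1 - lam)
    a, log_h = 0.0, 0.0
    for _ in range(n):
        log_h += p * math.exp(a) - alpha_lam    # b_k = p e^{a_{k-1}} - alpha_lambda
        a = q * math.exp(a) - beta_lam          # a_k = q e^{a_{k-1}} - beta_lambda
    return log_h + a * x0


def kl_chain_rule(beta_a, beta_h, alpha_a, alpha_h, x0, n):
    """Sum over k of E_A[KL(Poi(mu_A(X_{k-1})) || Poi(mu_H(X_{k-1})))], valid when alpha/beta agree."""
    m = float(x0)                               # E_A[X_{k-1}], starting with X_0
    total = 0.0
    for _ in range(n):
        mu_a = alpha_a + beta_a * m
        mu_h = alpha_h + beta_h * m
        # mu_A(x)/mu_H(x) = beta_A/beta_H for every x here, so the one-step KL is affine in x
        total += mu_a * math.log(beta_a / beta_h) - mu_a + mu_h
        m = mu_a                                # E_A[X_k] = alpha_A + beta_A E_A[X_{k-1}]
    return total


params = dict(beta_a=0.8, beta_h=0.5, alpha_a=1.6, alpha_h=1.0)  # alpha/beta = 2 under both laws
x0, n, eps = 3, 5, 1e-6
lam = 1.0 - eps
# expm1 avoids cancellation in 1 - H when H is very close to 1
kl_limit = -math.expm1(log_hellinger_gwi(lam=lam, x0=x0, n=n, **params)) / (lam * (1.0 - lam))
kl_exact = kl_chain_rule(x0=x0, n=n, **params)
print(kl_limit, kl_exact)
```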

Remark 5.
(i) Notice that the exact values respectively upper bounds are in closed form (rather than in recursive form).
(ii) The n−behaviour of (the bounds of) the Kullback-Leibler information divergence/relative entropy I(P A,n ||P H,n ) in Theorem 3 is influenced by the following facts: (b) In the case β A = 1 of (70), there holds

Lower Bounds of $I(\cdot\|\cdot)$ for the Cases $(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$
Again by using (68) in appropriate combination with the "$\lambda\in\,]0,1[$-parts" of the previous Section 4 (respectively, the corresponding parts of Section 3), we obtain the following (semi-)closed-form lower bounds of $I(P_{A,n}\|P_{H,n})$:

Theorem 4. For all $(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$, all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ there holds
$I(P_{A,n}\|P_{H,n})\ge E^L_{X_0,n}:=\sup_{k\in\mathbb{N}_0,\,y\in[0,\infty[}\max\{E^{L,tan}_{y,X_0,n},E^{L,sec}_{k,X_0,n},E^{L,hor}_{X_0,n}\}$, (71)
where for all $y\in[0,\infty[$ we define the (possibly negative) finite bound component $E^{L,tan}_{y,X_0,n}:=$ … and for all $k\in\mathbb{N}_0$ the (possibly negative) finite bound component $E^{L,sec}_{k,X_0,n}:=$ … . Furthermore, on $\mathcal{P}_{SP,4}$ we set $E^{L,hor}_{X_0,n}:=0$ for all $n\in\mathbb{N}$, whereas on $\mathcal{P}_{SP}\setminus(\mathcal{P}_{SP,1}\cup\mathcal{P}_{SP,4})$ we define $E^{L,hor}_{X_0,n}:=$ … . On $\mathcal{P}_{SP}\setminus(\mathcal{P}_{SP,1}\cup\mathcal{P}_{SP,3c})$ one even gets $E^L_{X_0,n}>0$ for all $X_0\in\mathbb{N}$ and all $n\in\mathbb{N}$. For the subcase $\mathcal{P}_{SP,3c}$, one obtains for each fixed $n\in\mathbb{N}$ and each fixed $X_0\in\mathbb{N}$ the strict positivity $E^L_{X_0,n}>0$ if $\frac{\partial}{\partial y}E^{L,tan}_{y,X_0,n}(y^*)\neq0$, where $y^*:=\frac{\alpha_A-\alpha_H}{\beta_H-\beta_A}\in\mathbb{N}$ and hence $\frac{\partial}{\partial y}E^{L,tan}_{y,X_0,n}(y^*)=$ …

A proof of this theorem is given in Appendix A.2.
It seems that the optimization problem in (71) admits in general only an implicitly representable solution, which is why we have used the prefix "(semi-)" above. Of course, as a less tight but less involved explicit lower bound of the Kullback-Leibler information divergence (relative entropy) $I(P_{A,n}\|P_{H,n})$ one can use any term of the form $\max\{E^{L,tan}_{y,X_0,n},E^{L,sec}_{k,X_0,n},E^{L,hor}_{X_0,n}\}$ ($y\in[0,\infty[$, $k\in\mathbb{N}_0$), as well as the following

Corollary 12. (a) For all $(\beta_A,\beta_H,\alpha_A,\alpha_H)\in\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$, all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ there holds $I(P_{A,n}\|P_{H,n})\ge E^L_{X_0,n}\ge\max\{E^{L,tan}_{\infty,X_0,n},E^{L,sec}_{0,X_0,n},E^{L,hor}_{X_0,n}\}$, with the (possibly negative) finite bound component $E^{L,tan}_{\infty,X_0,n}:=$ … . For the cases $\mathcal{P}_{SP,2}\cup\mathcal{P}_{SP,3a}\cup\mathcal{P}_{SP,3b}$ this maximum is even strictly positive for all $X_0\in\mathbb{N}$ and all $n\in\mathbb{N}$.

Applications to Bayesian Decision Making
As explained in Section 2.5, the Kullback-Leibler information divergence fulfills the relation (21), and thus can be interpreted as a weighted-average decision risk reduction (weighted-average statistical information measure) about the degree of evidence deg concerning the parameter θ that can be attained by observing the GWI-path $X_n$ until stage $n$. Hence, by combining (21) with the investigations in the previous Sections 5.1 and 5.2, we obtain exact values respectively bounds of the above-mentioned decision risk reductions. For the sake of brevity, we omit the details here.

Principal Approach
Depending on the parameter constellation $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in\mathcal{P}\times(\mathbb{R}\setminus\{0,1\})$, for the Hellinger integrals $H_\lambda(P_{A,n}\|P_{H,n})$ we have derived in Section 3 corresponding lower/upper bounds respectively exact values of recursive nature, which can be obtained by choosing appropriate parameters $p,q$ in the sequences $(a_n^{(q)})_{n\in\mathbb{N}}$ from (36) and $(b_n^{(p,q)})_{n\in\mathbb{N}}$, where the latter is obtained from the former by the linear transformation (38). Both sequences are "stepwise fully evaluable" but generally do not seem to admit a closed-form representation in the observation horizon $n$; consequently, the time-evolution $n\mapsto H_\lambda(P_{A,n}\|P_{H,n})$ (respectively, the time-evolution of the corresponding recursive bounds) generally cannot be seen explicitly. In order to avoid this lack of transparency (at the expense of losing some precision), one can approximate (36) by a recursion that allows for a closed-form representation; this will also turn out to be useful for investigations concerning diffusion limits (cf. the next Section 7).
To explain the basic underlying principle, let us first assume some general $q\in\,]0,\beta_\lambda[$ and $\lambda\in\,]0,1[$. With Properties 1 (P1) we see that the sequence $(a_n^{(q)})_{n\in\mathbb{N}}$ is strictly negative, strictly decreasing and converges to $x_0^{(q)}\in\,]-\beta_\lambda,q-\beta_\lambda[$. Recall that this sequence is obtained by the recursive application of the function $\xi_\lambda^{(q)}(x)=q\,e^x-\beta_\lambda$ (cf. (36)). As a first step, we want to approximate $\xi_\lambda^{(q)}(\cdot)$ by a linear function on the interval $[x_0^{(q)},0]$. Due to convexity (P9), this is done by using the tangent line $\xi_\lambda^{(q),T}(\cdot)$ of $\xi_\lambda^{(q)}(\cdot)$ at $x_0^{(q)}$ as a linear lower bound, and the secant line $\xi_\lambda^{(q),S}(\cdot)$ of $\xi_\lambda^{(q)}(\cdot)$ across its arguments $0$ and $x_0^{(q)}$ as a linear upper bound. With the help of these functions, we can define the linear recursions $a_0^{(q),T}:=0$, $a_n^{(q),T}:=\xi_\lambda^{(q),T}(a_{n-1}^{(q),T})$ as well as $a_0^{(q),S}:=0$, $a_n^{(q),S}:=\xi_\lambda^{(q),S}(a_{n-1}^{(q),S})$, $n\in\mathbb{N}$. In the following, we will refer to these sequences as the rudimentary closed-form sequence-bounds. Clearly, both sequences are strictly negative (on $\mathbb{N}$) and strictly decreasing, and one gets the sandwiching $a_n^{(q),T}\le a_n^{(q)}\le a_n^{(q),S}$ for all $n\in\mathbb{N}$, with equality on the right side iff $n=1$ (where $a_1^{(q)}=a_1^{(q),S}=q-\beta_\lambda$). Furthermore, such linear recursions allow for the closed-form representation $a_n^{(q),*}=c^{(q),*}\cdot\frac{1-(d^{(q),*})^n}{1-d^{(q),*}}$, where "$*$" stands for either $S$ or $T$, and $c^{(q),*}$, $d^{(q),*}$ denote the intercept and the slope of $\xi_\lambda^{(q),*}(\cdot)$ (cf. (76), (77)). Notice that this representation is valid since $d^{(q),T},d^{(q),S}\in\,]0,1[$. So far, we have considered the case $q\in\,]0,\beta_\lambda[$. If $q=\beta_\lambda$, then one can see from Properties 1 (P2) that $a_n^{(q)}\equiv0$, which is also an explicitly given (though trivial) sequence. For the remaining case $q>\beta_\lambda$, we want to exclude $q\ge\min\{1,e^{\beta_\lambda-1}\}$ for the following reasons. Firstly, if $q>\min\{1,e^{\beta_\lambda-1}\}$, then from (P3) we see that the sequence $(a_n^{(q)})_{n\in\mathbb{N}}$ is strictly increasing and divergent to $\infty$ at a faster-than-exponential rate (P3b); but a linear recursion is too weak to approximate such a growth pattern. Secondly, if $q=\min\{1,e^{\beta_\lambda-1}\}$, then one necessarily gets $q=e^{\beta_\lambda-1}<1$ (since we have required $q>\beta_\lambda$; otherwise $q=1\le e^{\beta_\lambda-1}$ would imply $\beta_\lambda\ge1=q$, a contradiction). This means that the function $\xi_\lambda^{(q)}(\cdot)$ touches the straight line id(·) at its fixed point $x_0^{(q)}=-\log q=1-\beta_\lambda$ (cf. (78)).
In this case, the tangent line $\xi_\lambda^{(q),T}(\cdot)$ coincides with the straight line id(·), so that (81) would not be satisfied.
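The rudimentary closed-form sequence-bounds for $0<q<\beta_\lambda$ can be sketched in a few lines of Python. The sketch takes $\xi_\lambda^{(q)}(x)=q\,e^x-\beta_\lambda$ as recalled above and derives the line parameters directly from the geometric description (tangent at $x_0^{(q)}$; secant through $(0,\xi_\lambda^{(q)}(0))$ and $(x_0^{(q)},x_0^{(q)})$) rather than copying (76) and (77), which are not reproduced here. The fixed point is computed by bisection, and the values $\lambda=0.5$, $\beta_A=0.9$, $\beta_H=0.3$ are hypothetical.

```python
import math


def fixed_point_negative(q, beta, iters=200):
    """Bisection for the fixed point x0 in ]-beta, q - beta[ of xi(x) = q e^x - beta (needs 0 < q < beta)."""
    lo, hi = -beta, q - beta            # h(lo) > 0 > h(hi) for h(x) = q e^x - beta - x
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q * math.exp(mid) - beta - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def linear_closed_form(c, d, n):
    """n-th iterate of x -> c + d x started at 0, i.e. c (1 - d^n) / (1 - d)."""
    return c * (1 - d ** n) / (1 - d)


def rudimentary_bounds(q, beta, n):
    """Return x0 and the tangent-based, exact and secant-based sequences up to index n."""
    x0 = fixed_point_negative(q, beta)
    d_t = q * math.exp(x0)              # tangent slope xi'(x0)
    c_t = x0 * (1 - d_t)                # tangent line: x0 + d_t (x - x0)
    c_s = q - beta                      # secant through (0, xi(0)) ...
    d_s = (x0 - c_s) / x0               # ... and through (x0, xi(x0)) = (x0, x0)
    exact = [0.0]
    for _ in range(n):
        exact.append(q * math.exp(exact[-1]) - beta)
    lower = [linear_closed_form(c_t, d_t, k) for k in range(n + 1)]
    upper = [linear_closed_form(c_s, d_s, k) for k in range(n + 1)]
    return x0, lower, exact, upper


# Hypothetical example: lambda = 0.5, beta_A = 0.9, beta_H = 0.3.
lam, beta_a, beta_h = 0.5, 0.9, 0.3
beta_lam = lam * beta_a + (1 - lam) * beta_h
q = beta_a ** lam * beta_h ** (1 - lam)
x0, lower, exact, upper = rudimentary_bounds(q, beta_lam, 60)
for k in (1, 2, 5, 10, 60):
    print(k, lower[k], exact[k], upper[k])
```

Both linear maps have $x_0^{(q)}$ as their fixed point, so both closed-form bounds converge to the same limit as the exact sequence; the gap in between is what the improved bounds of Lemma 3 aim to reduce.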
In a second step, we want to improve the above-mentioned linear (lower and upper) approximations of the sequence $(a_n^{(q)})_{n\in\mathbb{N}}$ by reducing the error incurred within each iteration. To do so, for both the lower and the upper approximates we employ context-adapted linear inhomogeneous difference equations of the form $a_0:=0$, $a_n:=\xi(a_{n-1})+\rho_{n-1}$, $n\in\mathbb{N}$, with $\xi(x):=c+d\cdot x$ and correction terms $\rho_n$ as in (85), for some constants $c\in\mathbb{R}$, $d\in\,]0,1[$, $K_1,K_2,\kappa,\nu\in\mathbb{R}$ with $0\le\nu<\kappa\le d$. This will later be applied to $c:=c^{(q),S}$, $c:=c^{(q),T}$, $d:=d^{(q),S}$ and $d:=d^{(q),T}$. Meanwhile, let us first present some facts and expressions which are insightful for further formulations and analyses.

Lemma 2.
Consider the sequence $(a_n)_{n\in\mathbb{N}_0}$ defined in (83) to (85). If $0\le\nu<\kappa<d$, then one gets the closed-form representation $a_n=a^{hom}_n+c_n$ with $a^{hom}_n=c\cdot\frac{1-d^n}{1-d}$ and … (86), which leads for all $n\in\mathbb{N}$ to $\sum_{k=1}^n a_k=$ … . If $0\le\nu<\kappa=d$, then one gets the closed-form representation $a_n=a^{hom}_n+c_n$ with $a^{hom}_n=c\cdot\frac{1-d^n}{1-d}$ and $c_n=K_1\cdot n\cdot d^{n-1}$ … (88), which leads for all $n\in\mathbb{N}$ to … .

Lemma 2 will be proved in Appendix A.3. Notice that (88) is consistent with taking the limit $\kappa\uparrow d$ in (86). Furthermore, for the special case $K_2=-K_1>0$ one has from (85) for all integers $n\ge2$ the relation $\rho_{n-1}<0$ and thus $a_n-a^{hom}_n<0$, leading to $c_n<0$ and $\sum_{k=1}^n c_k<0$.
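The structure of Lemma 2 is that of a standard variation-of-constants formula, which the following Python sketch verifies numerically. Since (85) is not reproduced in this excerpt, the sketch assumes the correction terms to be of the form $\rho_j=K_1\kappa^j+K_2\nu^j$ (this is our reading, consistent with the remark on $K_2=-K_1>0$ above). Under that assumption, $c_n=\sum_{j=0}^{n-1}d^{\,n-1-j}\rho_j$, and each geometric convolution $\sum_{j=0}^{n-1}d^{\,n-1-j}r^j$ equals $(d^n-r^n)/(d-r)$ for $r\neq d$ and $n\,d^{n-1}$ for $r=d$; the latter reproduces $c_n=K_1 n d^{n-1}$ when $\kappa=d$ and $K_2=0$. All numerical values are hypothetical.

```python
def iterate_inhomogeneous(c, d, k1, k2, kappa, nu, n):
    """a_0 = 0, a_k = c + d a_{k-1} + rho_{k-1}, with the assumed rho_j = k1 kappa^j + k2 nu^j."""
    a = [0.0]
    for k in range(1, n + 1):
        rho = k1 * kappa ** (k - 1) + k2 * nu ** (k - 1)
        a.append(c + d * a[-1] + rho)
    return a


def geometric_convolution(d, r, n):
    """Closed form of sum_{j=0}^{n-1} d^(n-1-j) r^j."""
    if r == d:
        return n * d ** (n - 1)
    return (d ** n - r ** n) / (d - r)


def closed_form(c, d, k1, k2, kappa, nu, n):
    """Homogeneous part a_hom_n and inhomogeneous part c_n of a_n."""
    a_hom = c * (1 - d ** n) / (1 - d)
    c_n = k1 * geometric_convolution(d, kappa, n) + k2 * geometric_convolution(d, nu, n)
    return a_hom, c_n


# Case kappa < d with K2 = -K1 > 0: c_1 = 0 and c_n < 0 for n >= 2.
a = iterate_inhomogeneous(-0.2, 0.6, -0.3, 0.3, 0.4, 0.1, 25)
for n in (1, 2, 10, 25):
    a_hom, c_n = closed_form(-0.2, 0.6, -0.3, 0.3, 0.4, 0.1, n)
    print(n, a[n], a_hom + c_n, c_n)
```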
Lemma 2 gives explicit expressions for a linear inhomogeneous recursion of the form (83) possessing the extra term given by (85). Therefrom we derive lower and upper bounds for the sequence a . Moreover, our concrete approximation-error-reducing "correction terms" ρ n will have different form, depending on whether 0 < q < β λ or q > max{0, β λ }. In both cases, we express ρ n by means of the slopes d (q), (77)), as well as in terms of the parameters Γ (q) In detail, let us first define the lower approximate by a (q) 0 where ρ (q) The upper approximate is defined by where ρ (q) In terms of (85), we use for ρ (q) n the constants K 2 = ν = 0 as well as n we shall employ the constants −K 1 = K 2 = Γ (q) ,T := qe x (q) 0 and from (77) c (q),S := q − β λ , d (q),S := In the following, we will refer to the sequences a (q) n resp. a (q) n as the improved closed-form sequence-bounds. Putting all ingredients together, we arrive at the (a) in the case 0 < q < β λ : with equality on the right-hand side iff n = 1, where a (q) in the case max{0, β λ } < q < min 1 , e β λ −1 : n , for all n ∈ N, with equality on the right-hand side iff n = 1, where a (q)  A detailed proof of Lemma 3 is provided in Appendix A.3. In the following, we employ the above-mentioned investigations in order to derive the desired closed-form bounds of the Hellinger integrals H λ (P A,n ||P H,n ).

Explicit Closed-Form Bounds for the Cases
Recall that in this setup we have obtained the recursive, non-explicit exact values $V_{\lambda,X_0,n}=H_\lambda(P_{A,n}\|P_{H,n})$ given in (39) of Theorem 1, where we used … . This, together with (39) from Theorem 1, Lemma 2 and the quantities $d^{(q),T}$, $d^{(q),S}$, $\Gamma^{(q)}_<$ and $\Gamma^{(q)}_>$ as defined in (76) and (77) resp. (91), leads to Theorem 5: for all …, all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ the following assertions hold true: (a) the Hellinger integral can be bounded by the closed-form lower and upper bounds …, where the involved closed-form lower bounds are defined by … and the closed-form upper bounds are defined by …, where in the case $\lambda\in\,]0,1[$ … and where in the case … . Notice that $\alpha_A/\beta_A$ can equivalently be replaced by $\alpha_H/\beta_H$ in (96) and in (97).
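Because (39) itself is not reproduced in this excerpt, the following Python sketch works with our reading of the recursion behind Theorem 1 (an assumption): $H_\lambda(P_{A,n}\|P_{H,n})=\exp\{a_n^{(q_\lambda)}X_0+\sum_{k=1}^n b_k^{(p_\lambda,q_\lambda)}\}$ with $a_0^{(q)}=0$, $a_k^{(q)}=q\,e^{a_{k-1}^{(q)}}-\beta_\lambda$, $b_k^{(p,q)}=p\,e^{a_{k-1}^{(q)}}-\alpha_\lambda$, $p_\lambda=\alpha_A^\lambda\alpha_H^{1-\lambda}$, $q_\lambda=\beta_A^\lambda\beta_H^{1-\lambda}$, $\alpha_\lambda=\lambda\alpha_A+(1-\lambda)\alpha_H$ and $\beta_\lambda=\lambda\beta_A+(1-\lambda)\beta_H$. It checks this value against a brute-force backward induction over a truncated state space, which uses only the Poisson transition law $X_k\mid X_{k-1}=x\sim\mathrm{Poi}(\alpha_\bullet+\beta_\bullet x)$, for a hypothetical equal-fraction parameter set ($\alpha_A/\beta_A=\alpha_H/\beta_H=2$).

```python
import math


def log_poisson(k, mu):
    """log of the Poisson(mu) probability mass at k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def hellinger_recursive(beta_a, beta_h, alpha_a, alpha_h, lam, x0, n):
    """exp{a_n X_0 + sum_k b_k} with a_k = q e^{a_{k-1}} - beta_lam, b_k = p e^{a_{k-1}} - alpha_lam."""
    alpha_lam = lam * alpha_a + (1 - lam) * alpha_h
    beta_lam = lam * beta_a + (1 - lam) * beta_h
    p = alpha_a ** lam * alpha_h ** (1 - lam)
    q = beta_a ** lam * beta_h ** (1 - lam)
    a, b_sum = 0.0, 0.0
    for _ in range(n):
        b_sum += p * math.exp(a) - alpha_lam   # uses a_{k-1}
        a = q * math.exp(a) - beta_lam         # becomes a_k
    return math.exp(a * x0 + b_sum)


def hellinger_brute_force(beta_a, beta_h, alpha_a, alpha_h, lam, x0, n, kmax=150):
    """Backward induction f_j(x) = sum_y P_A(y|x)^lam P_H(y|x)^(1-lam) f_{j-1}(y) on {0, ..., kmax}."""
    f = [1.0] * (kmax + 1)
    for _ in range(n):
        new_f = []
        for x in range(kmax + 1):
            mu_a, mu_h = alpha_a + beta_a * x, alpha_h + beta_h * x
            new_f.append(sum(
                math.exp(lam * log_poisson(y, mu_a) + (1 - lam) * log_poisson(y, mu_h)) * f[y]
                for y in range(kmax + 1)))
        f = new_f
    return f[x0]


params = dict(beta_a=0.8, beta_h=0.5, alpha_a=1.6, alpha_h=1.0)  # hypothetical equal-fraction case
for lam in (0.5, 2.0):
    print(lam, hellinger_recursive(lam=lam, x0=2, n=2, **params),
          hellinger_brute_force(lam=lam, x0=2, n=2, **params))
```

The brute-force route costs $O(n\cdot k_{max}^2)$ operations and needs truncation, whereas the recursion is exact and linear in $n$; this is the practical motivation for the recursive representation and, in turn, for its closed-form bounds.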

Explicit Closed-Form
Let us start with closed-form lower bounds for the case $\lambda\in\,]0,1[$; recall that the choice $p^L_\lambda=\alpha_A^\lambda\alpha_H^{1-\lambda}$, $q^L_\lambda=\beta_A^\lambda\beta_H^{1-\lambda}$ led to the optimal recursive lower bounds $B^L_{\lambda,X_0,n}$ of the Hellinger integral (cf. Theorem 1(b) and Section 3.5). Correspondingly, we can derive … Then, the following assertions hold true: (a) For all $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{SP,2}\cup\mathcal{P}_{SP,3a}\cup\mathcal{P}_{SP,3b}\cup\mathcal{P}_{SP,3c})\times\,]0,1[$ (for which in particular $0<q^L_\lambda<\beta_\lambda$ and $\beta_A\neq\beta_H$), all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ there holds … (b) …, all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ there holds … (c) For all $(\beta_A,\beta_H,\alpha_A,\alpha_H,\lambda)\in(\mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1})\times\,]0,1[$ and all initial population sizes $X_0\in\mathbb{N}$ one gets …, where in the case $\beta_A=\beta_H$ there holds $q^L_\lambda=\beta_\lambda$ and … . The proof will be provided in Appendix A.3.

In order to deduce closed-form upper bounds for the case $\lambda\in\,]0,1[$, we first recall from Sections 3.6-3.13 that we have to employ suitable parameters $p^U_\lambda,q^U_\lambda$ satisfying (35). Notice that we automatically obtain … for all $x\in\mathbb{N}_0$, and additionally either $0<q^U_\lambda\le\beta_\lambda$ or $\beta_\lambda<q^U_\lambda<\min\{1,e^{\beta_\lambda-1}\}$ …, all initial population sizes $X_0\in\mathbb{N}$ and all observation horizons $n\in\mathbb{N}$ the following assertions hold true: (a) in the case $0<q^U_\lambda<\beta_\lambda$ one has … (112); furthermore, whenever $p^U_\lambda,q^U_\lambda$ additionally satisfy (47) (such parameters exist particularly in the setups $\mathcal{P}_{SP,2}\cup\mathcal{P}_{SP,3a}\cup\mathcal{P}_{SP,3b}$, cf. Sections 3.7-3.9), then … (c) in the case $\beta_\lambda<q^U_\lambda<\min\{1,e^{\beta_\lambda-1}\}$ the formulas (109) and (110) remain valid, but with …; (d) for all cases (a) to (c) one gets …, where in the case $q^U_\lambda=\beta_\lambda$ there holds … . This Theorem 7 will be proved in Appendix A.3. Notice that for an inadequate choice of $p^U_\lambda,q^U_\lambda$ it may hold that …
To derive closed-form upper bounds of the recursive upper bounds B U λ,X 0 ,n of the Hellinger integral in the case λ ∈ R\[0, 1] , let us first recall from Section 3.24 that we have to use the parameters = 0 for all n ∈ N and for all λ ∈ R\[0, 1]. Correspondingly, we deduce Then, the following assertions hold true: , all initial population sizes X 0 ∈ N and all observation horizons n ∈ N there holds , all initial population sizes X 0 ∈ N and all observation horizons n ∈ N there holds 1] ) and all initial population sizes X 0 ∈ N one gets A proof of Theorem 9 is provided in Appendix A.3.   λ,X 0 ,n from (42) leads to the "improved" closed-form bounds C (p,q),L λ,X 0 ,n resp. C (p,q),U λ,X 0 ,n in all the Theorems 5-9.

Closed-Form Bounds for Power Divergences of Non-Kullback-Leibler-Information-Divergence Type
Analogously to Section 4 (see especially Section 4.1), for orders $\lambda\in\mathbb{R}\setminus\{0,1\}$ all the results of the previous Sections 6.1-6.5 carry correspondingly over from closed-form bounds of the Hellinger integrals $H_\lambda(\cdot\|\cdot)$ to closed-form bounds of the total variation distance $V(\cdot\|\cdot)$, by virtue of the relation (12), to closed-form bounds of the Renyi divergences $R_\lambda(\cdot\|\cdot)$, by virtue of the relation (7), as well as to closed-form bounds of the power divergences $I_\lambda(\cdot\|\cdot)$, by virtue of the relation (2). For the sake of brevity, the (merely repetitive) exact details are omitted.

Applications to Decision Making
The above investigations of Sections 6.1 to 6.6 can be applied to the context of Section 2.5 on dichotomous decision making on the space of all possible path scenarios (path space) of Poissonian Galton-Watson processes without (with) immigration GW(I) (e.g., in combination with our running epidemiological example of Section 2.3). In more detail, for the minimal mean decision loss (Bayes risk) $R_n$ defined by (18) we can derive explicit closed-form upper (respectively lower) bounds by using (19) respectively (20) together with the results of Sections 6.1-6.5 concerning Hellinger integrals of order $\lambda\in\,]0,1[$; we can proceed analogously in the Neyman-Pearson context in order to deduce closed-form bounds of type II error probabilities, by means of (23) and (24). Moreover, in an analogous way we can employ the investigations of Section 6.6 on power divergences in order to obtain closed-form bounds of (i) the corresponding (cf. (21)) weighted-average decision risk reduction (weighted-average statistical information measure) about the degree of evidence deg concerning the parameter θ that can be attained by observing the GW(I)-path $X_n$ until stage $n$, as well as (ii) the corresponding (cf. (22)) limit decision risk reduction (limit statistical information measure). For the sake of brevity, the (merely repetitive) exact details are omitted.
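To indicate how Hellinger integrals of order $\lambda\in\,]0,1[$ bound a Bayes risk, the following Python sketch uses the elementary inequality $\min\{u,v\}\le u^\lambda v^{1-\lambda}$, which yields the well-known Chernoff-type bound $R\le\pi_A^\lambda\pi_H^{1-\lambda}H_\lambda(P_A\|P_H)$ for the 0-1 loss. We do not reproduce (18) and (19) here; the 0-1 loss, the prior $\pi_A=0.3$ and a single Poisson observation with means 4 and 6 are hypothetical choices made for illustration only. For GW(I) paths, $H_\lambda$ would be replaced by $H_\lambda(P_{A,n}\|P_{H,n})$ or by one of its closed-form upper bounds.

```python
import math


def log_poisson(k, mu):
    """log of the Poisson(mu) probability mass at k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def bayes_risk_and_bound(mu_a, mu_h, prior_a, lam, kmax=200):
    """Exact 0-1 Bayes risk for Poi(mu_a) vs Poi(mu_h) and the bound prior_a^lam prior_h^(1-lam) H_lam."""
    prior_h = 1.0 - prior_a
    risk, hell = 0.0, 0.0
    for k in range(kmax + 1):
        la, lh = log_poisson(k, mu_a), log_poisson(k, mu_h)
        # The Bayes rule decides for the larger of prior_a p_A(k), prior_h p_H(k); the smaller one is the error mass.
        risk += min(prior_a * math.exp(la), prior_h * math.exp(lh))
        hell += math.exp(lam * la + (1 - lam) * lh)
    return risk, prior_a ** lam * prior_h ** (1 - lam) * hell


for lam in (0.25, 0.5, 0.75):
    print(lam, bayes_risk_and_bound(4.0, 6.0, 0.3, lam))
```

Minimizing the bound over $\lambda\in\,]0,1[$ gives the tightest bound of this type.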
In order to make the above-mentioned limit procedure rigorous, it is reasonable to work with appropriate approximations such that in each convergence step $m$ one faces the setup $\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1}$ (i.e., the non-immigration or the equal-fraction case), where the corresponding Hellinger integral can be calculated exactly in a recursive way, as stated in Theorem 1. Let us explain the details in the following. … $(\Omega,\mathcal{F})$, where, as above, the subscript $\bullet$ stands for either the hypothesis H or the alternative A.
Analogously to (1), we use for each fixed step $m\in\mathbb{N}$ the representation $X^{(m)}:=(X^{(m)}_\ell,\ \ell\in\mathbb{N})$ with …, where under the law $P^{(m)}_\bullet$ … . Here and henceforth, we always assume that the approximation step $m$ is large enough to ensure that $\beta^{(m)}_\bullet\in\,]0,1]$ and that at least one of $\beta^{(m)}_A,\beta^{(m)}_H$ is strictly less than 1; this will be abbreviated by $m\in\mathbb{N}$. Let us point out that, as mentioned above, our choice entails the best-to-handle setup $\mathcal{P}_{NI}\cup\mathcal{P}_{SP,1}$ (which does not happen if instead of $\eta$ one uses $\eta_\bullet$ with $\eta_A\neq\eta_H$). Based on the GW(I) $X^{(m)}$, let us construct the continuous-time branching process $X^{(m)}:=$ … where $(W^\bullet_s,\ s\in[0,\infty[)$ denotes a standard Brownian motion with respect to the limit probability measure $P_\bullet$. The corresponding proof of Theorem 10, which is outlined in Appendix A.4, is an adaptation of the proof of Theorem 9.1.3 in Ethier & Kurtz [138], which deals with drift parameters $\eta=0$, $\kappa_\bullet=0$ in the SDE (133), whose solution is approached on a $\sigma$-independent time scale by a sequence of (critical) Galton-Watson processes without immigration but with general offspring distribution with mean 1 and variance $\sigma$. Notice that due to (131) the latter is inconsistent with our Poissonian setup, but this is compensated by our chosen $\sigma$-dependent time scale. Other limit investigations for (133), involving offspring/immigration distributions and parametrizations which are also incompatible with ours, are treated, e.g., in Sriram [142].

Remark 8. Notice that the condition
As an illustration of our proposed approach, let us give the following

Example 3. Consider the parameter setup $(\eta,\kappa_\bullet,\sigma)=(5,2,0.4)$ and initial generation size $X_0=3$. Figure 4 shows the diffusion approximation … . The "long-term mean" of the limit process $X_s$ is $\eta/\kappa_\bullet=2.5$ and is indicated as a red line. The "long-term mean" of the approximations …

It is easy to see that $H_\lambda(P$ …
Notice that η = 0 corresponds to the no-immigration (NI) case and that α (m) as well as the connected sequence a with η = 0 in the NI case.
In the following, we employ the SDE-parameter constellations (which are consistent with (131) in combination with our requirement to work here only on (P NI ∪ P SP,1 )) Due to the-not in closed-form representable-recursive nature of the sequences a (q) n n∈N defined by (36), the calculation of lim m→∞ h (m) λ in (135) seems to be not (straightforwardly) tractable; after all, one "has to move along" a sequence of recursions (roughly speaking) since σ 2 mt → ∞ as m tends to infinity. One way to "circumvent" such technical problems is to compute instead of the Then, for all (κ A , κ H , η, λ) ∈ ( P NI ∪ P SP, 1] there holds for all sufficiently large m ∈ N q (m) λ and thus the sequence a λ · e x (m) 0 . By the above considerations, the Theorem 5 (together with Remark 7(a)) adapts to the current setup as follows: where for the (sub)case of all λ ∈]0, 1[ and all t ≥ 0 and for the remaining (sub)case of all λ ∈ λ − , λ + [0, 1] and all t ≥ 0 Notice that the components L  (a) For all (κ A , κ H , η, λ) ∈ P N I × λ − , λ + \{0, 1} the Hellinger integral limit converges to (b) For all (κ A , κ H , η, λ) ∈ P SP,1 × λ − , λ + \{0, 1} the Hellinger integral limit possesses the asymptotical behaviour The assertions of Corollary 15 follow immediately by inspecting the expressions in the exponential of (153) and (154) in combination with (155) to (162).
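To make Example 3 tangible, the following Python sketch simulates only the limit diffusion, not the GW(I) approximations shown in Figure 4. It assumes (the SDE (133) is not reproduced in this excerpt) the Feller-type form $\mathrm{d}X_s=(\eta-\kappa_\bullet X_s)\,\mathrm{d}s+\sigma\sqrt{X_s}\,\mathrm{d}W_s$, which is consistent with the stated long-term mean $\eta/\kappa_\bullet$. An Euler scheme truncated at 0 is used; step size, horizon, number of paths and seed are arbitrary illustrative choices.

```python
import math
import random


def simulate_terminal_values(eta, kappa, sigma, x0, t_end, dt, n_paths, seed=1):
    """Euler scheme for dX = (eta - kappa X) ds + sigma sqrt(X) dW (assumed form of the limit SDE);
    returns X at time t_end for each simulated path."""
    rng = random.Random(seed)
    n_steps = int(round(t_end / dt))
    sqrt_dt = math.sqrt(dt)
    finals = []
    for _ in range(n_paths):
        x = float(x0)
        for _ in range(n_steps):
            x += (eta - kappa * x) * dt + sigma * math.sqrt(x) * rng.gauss(0.0, sqrt_dt)
            x = max(x, 0.0)   # keep the state nonnegative before taking the square root
        finals.append(x)
    return finals


# Parameters of Example 3: (eta, kappa, sigma) = (5, 2, 0.4), X_0 = 3.
eta, kappa, sigma = 5.0, 2.0, 0.4
finals = simulate_terminal_values(eta, kappa, sigma, x0=3, t_end=4.0, dt=0.01, n_paths=500)
mean = sum(finals) / len(finals)
print("empirical mean at s = 4:", round(mean, 3), " long-term mean eta/kappa:", eta / kappa)
```

Under the assumed SDE, the mean satisfies $\mathbb{E}[X_s]=\eta/\kappa_\bullet+(X_0-\eta/\kappa_\bullet)e^{-\kappa_\bullet s}$, so at $s=4$ it is practically equal to $2.5$; the Euler mean recursion has the same fixed point.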

Bounds of Power Divergences for Diffusion Approximations
Analogously to Section 4 (see especially Section 4.1), for orders $\lambda\in\mathbb{R}\setminus\{0,1\}$ all the results of the previous Section 7.2 carry correspondingly over from (limits of) bounds of the Hellinger integrals $H_\lambda(\cdot\|\cdot)$ to (limits of) bounds of the total variation distance (by virtue of (12)), to (limits of) bounds of the Renyi divergences (by virtue of (7)) as well as to (limits of) bounds of the power divergences (by virtue of (2)). For the sake of brevity, the (merely repetitive) exact details are omitted. Moreover, by combining the resulting statements on the above-mentioned power divergences with parts of the Bayesian-decision-making context of Section 2.5, we obtain corresponding assertions on (i) the (cf. (21)) weighted-average decision risk reduction (weighted-average statistical information measure) about the degree of evidence deg concerning the parameter θ that can be attained by observing the GWI-path $X_n$ until stage $n$, as well as (ii) the (cf. (22)) limit decision risk reduction (limit statistical information measure).
In the following, let us concentrate on the derivation of the Kullback-Leibler information divergence KL (relative entropy) within the current diffusion-limit framework. Notice that altogether we face two limit procedures simultaneously: by the first limit $\lim_{\lambda\uparrow 1} I_\lambda$

This immediately leads to the following
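Only the first of the two limit procedures, $\lambda \uparrow 1$, is easy to illustrate in isolation. The following Python sketch is again restricted to two single Poisson laws (not the diffusion-limit setup) and assumes the normalization $I_\lambda = (1 - H_\lambda)/(\lambda(1-\lambda))$. It shows numerically that $I_\lambda$ approaches the Kullback-Leibler divergence $\mu_A\log(\mu_A/\mu_H) - \mu_A + \mu_H$ as $\lambda \uparrow 1$. The function names are hypothetical.

```python
import math


def hellinger_integral_poisson(lam: float, mu_a: float, mu_h: float) -> float:
    """Closed-form lambda-order Hellinger integral of Poi(mu_a) and Poi(mu_h)."""
    return math.exp(mu_a**lam * mu_h**(1.0 - lam) - lam * mu_a - (1.0 - lam) * mu_h)


def power_divergence_poisson(lam: float, mu_a: float, mu_h: float) -> float:
    """I_lam = (1 - H_lam) / (lam * (1 - lam)) for lam not in {0, 1} (assumed normalization)."""
    return (1.0 - hellinger_integral_poisson(lam, mu_a, mu_h)) / (lam * (1.0 - lam))


def kl_poisson(mu_a: float, mu_h: float) -> float:
    """Kullback-Leibler divergence I(Poi(mu_a) || Poi(mu_h))."""
    return mu_a * math.log(mu_a / mu_h) - mu_a + mu_h


if __name__ == "__main__":
    mu_a, mu_h = 3.0, 5.0
    kl = kl_poisson(mu_a, mu_h)
    for lam in (0.9, 0.99, 0.999, 0.9999):
        gap = abs(power_divergence_poisson(lam, mu_a, mu_h) - kl)
        print(f"lambda={lam}: |I_lambda - KL| = {gap:.2e}")
```

The gap shrinks roughly linearly in $1 - \lambda$, as a first-order expansion of $H_\lambda$ around $\lambda = 1$ predicts.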

Remark 9.
In Appendix A.4 we shall see that the proof of the last equality in (163), which concerns the interchange of limits, relies heavily on the use of the extra terms $L^{(1)}$ in (153) and (154). Recall that these terms ultimately stem from (manipulations of) the corresponding parts of the "improved closed-form bounds" in Theorem 5, which were derived by using the solutions of linear inhomogeneous difference equations (cf. (79)) as explicit approximations of the sequence $(a_n^{(q)})_{n\in\mathbb{N}}$. This fact, among others, shows the importance of this more tedious approach.
Interesting comparisons of the above-mentioned results in Sections 7.2 and 7.3 with corresponding information measures of the solutions of the SDE (129) themselves (rather than of their branching approximations) can be found in Kammerer [157].

Applications to Decision Making
Analogously to Section 6.7, the investigations of Sections 7.1-7.3 can be applied to the context of Section 2.5 on dichotomous decision making about GW(I)-type diffusion approximations of solutions of the stochastic differential equation (129). For the sake of brevity, the exact details, which are merely repetitive, are omitted.

Appendix A. Proofs and Auxiliary Lemmas
Appendix A.1. Proofs and Auxiliary Lemmas for Section 3

Lemma A1. For all real numbers $x, y, z > 0$ and all $\lambda \in \mathbb{R}$ one has

with equality in the cases $\lambda \in \mathbb{R}\setminus\{0,1\}$ iff $\frac{x}{y} = z$.
Proof of Properties 1. Property (P9) is trivially valid. To show (P1), we assume $0 < q < \beta_\lambda$, which implies $a_1^{(q)} < 0$. By induction, $(a_n^{(q)})_{n\in\mathbb{N}}$ is strictly negative and strictly decreasing. As stated in (P9), the function $\xi_\lambda^{(q)}$ is strictly increasing, strictly convex and converges to $-\beta_\lambda$ for $x \to -\infty$. Thus, it hits the straight line $\mathrm{id}(x) = x$ once and only once on the negative real line, namely at $x_0^{(q)}$ (cf. (44)). This implies that the sequence $(a_n^{(q)})_{n\in\mathbb{N}}$ converges to $x_0^{(q)} \in\, ]-\beta_\lambda, q - \beta_\lambda[$. Property (P2) follows immediately.

In order to prove (P3), let us fix $q > \max\{0, \beta_\lambda\}$, implying $a_1^{(q)} > 0$; notice that in this setup, the special choice $q = 1$ implies $\min\{1, e^{\beta_\lambda - 1}\} = e^{\beta_\lambda - 1} < q$. By induction, $(a_n^{(q)})_{n\in\mathbb{N}}$ is strictly increasing. Moreover, $h_\lambda^{(q)\prime}(x) = q \cdot e^x - 1$, and therefore the minimal value $h_\lambda^{(q)}(-\log q) = 1 - \beta_\lambda + \log q$ is less than or equal to zero iff $q \le e^{\beta_\lambda - 1}$. It remains to show that for $q > \beta_\lambda$ and $q > \min\{1, e^{\beta_\lambda - 1}\}$ the sequence $(a_n^{(q)})_{n\in\mathbb{N}}$ grows faster than exponentially, i.e., there do not exist constants $c_1, c_2 \in \mathbb{R}$ such that $a_n^{(q)} \le e^{c_1 + c_2 n}$ for all $n \in \mathbb{N}$. We already know that (in the current case) $a_n^{(q)} \to \infty$ as $n \to \infty$. Notice that it is sufficient to verify a corresponding $\limsup_{n\to\infty}$-condition on $\log(a_n^{(q)})$. For the case $\beta_\lambda \ge 0$ the latter is obtained by

An analogous consideration works out for the case $\beta_\lambda < 0$. Property (P4) is trivial, and (P5) to (P8) are direct implications of the already proven properties (P1) to (P4).

(cf. Lemma A1). Below, we follow the lines of Linkov & Lunyova [53], appropriately adapted to our context. We have to find those $\lambda \in \mathbb{R}\setminus\,]0,1[$ for which the following two conditions hold:

(cf. (P3a)), which is equivalent to the existence of a solution of the equation $\xi_\lambda^{(q_\lambda)}(x) = x$ (a positive one, if (i) is satisfied). Notice that the case $q_\lambda = 1$, $\lambda \in \mathbb{R}\setminus[0,1]$, cannot appear in (i), provided that (ii) holds (since, due to Lemma A1, $e^{\beta_\lambda - 1} < e^{q_\lambda - 1} = 1$). For (i), it is easy to check that we have to require
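The dichotomy in (P1)-(P3) can be observed numerically. The Python sketch below assumes the recursion $a_0^{(q)} = 0$, $a_n^{(q)} = \xi_\lambda^{(q)}(a_{n-1}^{(q)})$ with $\xi_\lambda^{(q)}(x) = q\,e^x - \beta_\lambda$ (the form of $\xi$ that appears in the proof of Proposition 9; the start value is our assumption). It uses $h_\lambda^{(q)}(-\log q) = 1 - \beta_\lambda + \log q \le 0$ as the criterion for the existence of a fixed point. The helper names and the bisection are illustrative choices, not the paper's method.

```python
import math


def xi(x: float, q: float, beta: float) -> float:
    """Assumed map xi_lambda^{(q)}(x) = q * exp(x) - beta_lambda."""
    return q * math.exp(x) - beta


def iterate_a(q: float, beta: float, n: int, a0: float = 0.0) -> list[float]:
    """Return [a_0, ..., a_n] with a_k = xi(a_{k-1}); the start a_0 = 0 is an assumption."""
    seq = [a0]
    for _ in range(n):
        seq.append(xi(seq[-1], q, beta))
    return seq


def fixed_point_exists(q: float, beta: float) -> bool:
    """h(x) = xi(x) - x is strictly convex with minimum h(-log q) = 1 - beta + log q (q > 0)."""
    return 1.0 - beta + math.log(q) <= 0.0


def smallest_fixed_point(q: float, beta: float) -> float:
    """Smallest root x_0 of h by bisection on [-beta, -log q], where h is strictly decreasing."""
    if not fixed_point_exists(q, beta):
        raise ValueError("no fixed point, since q > exp(beta - 1)")
    lo, hi = -beta, -math.log(q)  # h(lo) = q * exp(-beta) > 0 >= h(hi)
    for _ in range(200):  # enough halvings to reach double precision
        mid = 0.5 * (lo + hi)
        if xi(mid, q, beta) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Case 0 < q < beta: negative, strictly decreasing, convergent iterates
    seq = iterate_a(0.5, 1.0, 60)
    print(seq[:4], seq[-1], smallest_fixed_point(0.5, 1.0))
    # Case q > max{1, exp(beta - 1)}: no fixed point, explosive growth
    print(fixed_point_exists(1.2, 1.0), iterate_a(1.2, 1.0, 6))
```

For $q = 0.5$, $\beta_\lambda = 1$ the iterates decrease to the fixed point inside $]-\beta_\lambda, q - \beta_\lambda[$. For $q = 1.2$, $\beta_\lambda = 1$ no fixed point exists and the iterates explode after a handful of steps, which is why only six steps are computed (a seventh would overflow `math.exp`).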
Suppose that $\lambda < 0$. Case 1: If $\beta_H = 1$, then condition (ii) is not satisfied whenever $\beta_A \neq \beta_H$, since the right side of (A2) is equal to zero and the left side is strictly greater than zero. Hence, $\lambda_- = 0$.
On the other hand, incorporating the discussion of the function $h_\beta(\cdot)$, we see that

We claim that λ < λ and conclude that the conditions (i) and (ii) are not fulfilled jointly, which leads to $\lambda_- = 0$. To see this, we notice that due to $1 < \beta_H < \beta_A$ we get $\log(\beta_A)/(\beta_A - 1) < \log(\beta_H)/(\beta_H - 1)$ and thus

The representation of $\lambda_+$ follows straightforwardly from the $\lambda_-$-result and the skew symmetry (51). For the parameter constellation in Section 3.10, we employ as upper bound for
Notice that the above proof method of formula (51) does not work for the parameter setup in Section 3.11, because there one gets

Proof of Proposition 9. In the setup $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) \in \mathcal{P}_{SP,4a} \times\, ]0,1[$ we require $\beta_\bullet := \beta_A = \beta_H < 1$. As a linear upper bound for $\varphi_\lambda(\cdot)$, we employ the tangent line at $y \ge 0$ (cf. (52)).

Since in the current setup $\mathcal{P}_{SP,4a}$ the function $\varphi_\lambda(\cdot)$ is strictly increasing, the slope $\varphi_\lambda'(y)$ of the tangent line at $y$ is positive. Thus we have $q_y > \beta_\lambda$, and Properties 1 (P3) implies that the sequence $(a_n^{(q_y)})_{n\in\mathbb{N}}$ is strictly increasing and, provided that $q_y \le e^{\beta_\bullet - 1}$, converges to $x_0^{(q_y)}$, the smallest solution of the equation $\xi_\lambda^{(q_y)}(x) = q_y \cdot e^x - \beta_\bullet = x$. Since $q_y \to \beta_\bullet$ for $y \to \infty$ (cf. Properties 3 (P18)) and additionally $e^{\beta_\bullet - 1} > \beta_\bullet$, there exists a large enough $y \ge 0$ such that the sequence $(a_n^{(q_y)})_{n\in\mathbb{N}}$ converges; its limit exists for $y \ge y_1$ (say). Notice that since $Q^{(q_y)}$

For sufficiently large $y \ge y_2 \ge y_1$ (say), we easily obtain the smaller solution of $Q^{(q_y)}$, where the expression under the root is positive since $q_y \to \beta_\bullet$ for $y \to \infty$. We now have

Hence, it suffices to show that $h(y) < 0$ for some $y \ge y_2$. We recall from Properties 3 (P15), (P17) and (P19) that

which immediately implies $\lim_{y\to\infty} \varphi_\lambda(y) = \lim_{y\to\infty} \varphi_\lambda'(y) = \lim_{y\to\infty} \varphi_\lambda''(y) = 0$ and, with l'Hospital's rule,

The formulas (A5), (A7) and (A9) imply the limits $\lim_{y\to\infty} p_y = \alpha_\lambda$, $\lim_{y\to\infty} q_y = \beta_\bullet$ and $\lim_{y\to\infty} x_0^{(q_y)} = 0$. Notice that $p_y < \alpha_\lambda$ holds trivially for all $y \ge 0$, since the intercept $(p_y - \alpha_\lambda)$ of the tangent line $\varphi^{\tan}_{\lambda,y}(\cdot)$ is negative. Incorporating (A8) we therefore obtain $\lim_{y\to\infty} h(y) \le \lim_{y\to\infty} h(y) = 0$. As mentioned before, for the proof it is sufficient to show that $h(y) < 0$ for some $y \ge y_2$. This holds true if $\lim_{y\to\infty} y \cdot h(y) < 0$. To verify this, notice first that from (A5), (A7) and (A8) we get

Finally, we obtain with (A10)

Proof of Corollary 1.
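The ingredients of this proof can be checked numerically for one concrete constellation. The Python sketch assumes $f_\bullet(y) = \alpha_\bullet + \beta_\bullet y$, $\varphi_\lambda(y) = f_A(y)^\lambda f_H(y)^{1-\lambda} - \lambda f_A(y) - (1-\lambda) f_H(y)$, $\alpha_\lambda = \lambda\alpha_A + (1-\lambda)\alpha_H$ and $\beta_\lambda = \lambda\beta_A + (1-\lambda)\beta_H$. It reads the tangent line at $y$ as $(p_y - \alpha_\lambda) + (q_y - \beta_\lambda)x$. These definitions are our reconstruction and should be compared with (52) and Properties 3. For $\beta_A = \beta_H = 0.8 < 1$ and $\alpha_A \neq \alpha_H$ (our reading of $\mathcal{P}_{SP,4a}$), the sketch checks that the tangent lies above $\varphi_\lambda$, that $q_y > \beta_\bullet$ decreases towards $\beta_\bullet$, and that $p_y < \alpha_\lambda$.

```python
def phi(y, lam, a_a, b_a, a_h, b_h):
    """Assumed phi_lambda(y) = f_A^lam * f_H^(1-lam) - (lam*f_A + (1-lam)*f_H), f affine in y."""
    f_a, f_h = a_a + b_a * y, a_h + b_h * y
    return f_a**lam * f_h**(1.0 - lam) - (lam * f_a + (1.0 - lam) * f_h)


def dphi(y, lam, a_a, b_a, a_h, b_h):
    """Derivative of phi_lambda with respect to y."""
    f_a, f_h = a_a + b_a * y, a_h + b_h * y
    g = f_a**lam * f_h**(1.0 - lam)
    return g * (lam * b_a / f_a + (1.0 - lam) * b_h / f_h) - (lam * b_a + (1.0 - lam) * b_h)


def tangent_p_q(y, lam, a_a, b_a, a_h, b_h):
    """Intercept p_y and slope q_y of the linear bound p_y + q_y * x built from the tangent at y."""
    slope = dphi(y, lam, a_a, b_a, a_h, b_h)
    intercept = phi(y, lam, a_a, b_a, a_h, b_h) - y * slope
    alpha_lam = lam * a_a + (1.0 - lam) * a_h
    beta_lam = lam * b_a + (1.0 - lam) * b_h
    return alpha_lam + intercept, beta_lam + slope


if __name__ == "__main__":
    lam, a_a, a_h, beta = 0.5, 2.0, 1.0, 0.8  # beta_A = beta_H = 0.8, alpha_A != alpha_H
    for y in (0.0, 1.0, 10.0, 100.0, 1000.0):
        p_y, q_y = tangent_p_q(y, lam, a_a, beta, a_h, beta)
        print(f"y={y:7.1f}  p_y={p_y:.6f}  q_y - beta={q_y - beta:.3e}")
```

Concavity of $\varphi_\lambda$ for $\lambda \in\, ]0,1[$ (a weighted geometric mean of affine functions minus an affine function) is what makes every tangent line an upper bound.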
Part (a) follows directly from Proposition 1 (a), (b) and the limit $\lim_{n\to\infty} H_\lambda(P_{A,n}\|P_{H,n}) = 0$ in the respective part (c) of Propositions 7, 8 and 9, as well as from (51). To prove part (b), according to (26) we have to verify $\liminf_{\lambda\uparrow 1}\big\{\liminf_{n\to\infty} H_\lambda(P_{A,n}\|P_{H,n})\big\} = 1$.
Notice that this method does not work for the parameter cases $\mathcal{P}_{SP,4a} \cup \mathcal{P}_{SP,4b}$, since there the infimum in (A12) is equal to one.
Appendix A.2. Proofs and Auxiliary Lemmas for Section 5

We start with two lemmas which will be useful for the proof of Theorem 3. They deal with the sequence $(a_n^{(q_\lambda)})_{n\in\mathbb{N}}$ from (36).
Proof. This can easily be seen by induction: for $n = 1$ there clearly holds

Assume now that $\lim_{\lambda\uparrow 1} a_k^{(q_\lambda)} = 0$ holds for all $k \in \mathbb{N}$, $k \le n-1$; then

Lemma A3. In addition to the assumptions of Lemma A2, suppose that $\lambda \mapsto q_\lambda$ is continuously differentiable on $]0,1[$ and that the limit $l := \lim_{\lambda\uparrow 1} \frac{\partial q_\lambda}{\partial\lambda}$ is finite. Then, for all $n \in \mathbb{N}$ one obtains

which is the unique solution of the linear recursion equation

Furthermore, for all $n \in \mathbb{N}$ there holds

Proof. Clearly, $u_n$ defined by (A22) is the unique solution of (A23). We prove by induction that $\lim_{\lambda\uparrow 1} \frac{\partial a_n^{(q_\lambda)}}{\partial\lambda} = u_n$ holds. For $n = 1$ one gets

Suppose now that (A22) holds for all $k \in \mathbb{N}$, $k \le n-1$. Then, by incorporating (A21) we obtain

The remaining assertions follow immediately.
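Lemmas A2 and A3 can be sanity-checked numerically for the choice $q_\lambda = \beta_A^\lambda \beta_H^{1-\lambda}$ used in the proof of Theorem 3. The Python sketch assumes the recursion $a_0 = 0$, $a_n = q_\lambda e^{a_{n-1}} - \beta_\lambda$ with $\beta_\lambda = \lambda\beta_A + (1-\lambda)\beta_H$. Differentiating this recursion by the chain rule and letting $\lambda \uparrow 1$ (so that $a_{n-1} \to 0$ and $q_\lambda \to \beta_A$) gives $u_n = \beta_A u_{n-1} + l - (\beta_A - \beta_H)$, $u_0 = 0$, with $l = \beta_A\log(\beta_A/\beta_H)$. This linear recursion is derived here under these assumptions and need not coincide in form with (A22)-(A23). Because the recursion is smooth in $\lambda$ around $1$, a central difference at $\lambda = 1$ approximates the one-sided limit of the derivative. Function names are hypothetical.

```python
import math


def a_sequence(lam, beta_a, beta_h, n):
    """a_1..a_n of a_k = q_lam * exp(a_{k-1}) - beta_lam with a_0 = 0 (assumed recursion),
    q_lam = beta_a^lam * beta_h^(1-lam), beta_lam = lam*beta_a + (1-lam)*beta_h."""
    q = beta_a**lam * beta_h**(1.0 - lam)
    beta_lam = lam * beta_a + (1.0 - lam) * beta_h
    a, out = 0.0, []
    for _ in range(n):
        a = q * math.exp(a) - beta_lam
        out.append(a)
    return out


def u_sequence(beta_a, beta_h, n):
    """u_1..u_n of u_k = beta_a * u_{k-1} + l - (beta_a - beta_h), u_0 = 0,
    where l = d q_lam / d lam at lam = 1, i.e. beta_a * log(beta_a / beta_h)."""
    l = beta_a * math.log(beta_a / beta_h)
    u, out = 0.0, []
    for _ in range(n):
        u = beta_a * u + l - (beta_a - beta_h)
        out.append(u)
    return out


if __name__ == "__main__":
    beta_a, beta_h, n, h = 0.9, 0.6, 10, 1e-5
    plus = a_sequence(1.0 + h, beta_a, beta_h, n)
    minus = a_sequence(1.0 - h, beta_a, beta_h, n)
    fd = [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]  # central differences in lambda
    for k, (d, u) in enumerate(zip(fd, u_sequence(beta_a, beta_h, n)), start=1):
        print(k, d, u)
```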
We are now ready to give the

Proof of Theorem 3. (a) Recall that for the setup $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in (\mathcal{P}_{NI} \cup \mathcal{P}_{SP,1})$ we chose the intercept as $p_\lambda := p_\lambda^E := \alpha_A^\lambda \alpha_H^{1-\lambda}$ and the slope as $q_\lambda := q_\lambda^E := \beta_A^\lambda \beta_H^{1-\lambda}$, which in (39) lead to the exact value $V_{\lambda,X_0,n}$ of the Hellinger integral. Because of $\frac{p_\lambda}{q_\lambda}\beta_\lambda - \alpha_\lambda = 0$ as well as $\lim_{\lambda\uparrow 1} q_\lambda = \beta_A$, we obtain by using (38) and Lemma A2, for all $X_0 \in \mathbb{N}$ and for all $n \in \mathbb{N}$, the limit $\lim_{\lambda\uparrow 1} V_{\lambda,X_0,n}$, which leads by (68) to $I(P_{A,n}\|P_{H,n}) = \lim_{\lambda\uparrow 1} \frac{1 - H_\lambda(P_{A,n}\|P_{H,n})}{\lambda(1-\lambda)}$

For further analysis, we use the obvious derivatives

where the subcase $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{NI}$ (with $p_\lambda \equiv 0$) is consistently covered. From (A25) and Lemma A3 we deduce

and, by means of (A21), for all $n \in \mathbb{N}$:

For the last expression in (A24) we again apply Lemma A3 to end up with

which finishes the proof of part (a).

To show part (b), for the corresponding setup $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{SP}\setminus\mathcal{P}_{SP,1}$, let us first choose, according to (45) in Section 3.4, the intercept as $p_\lambda := p_\lambda^L := \alpha_A^\lambda \alpha_H^{1-\lambda}$ and the slope as $q_\lambda := q_\lambda^L := \beta_A^\lambda \beta_H^{1-\lambda}$, which in part (b) of Proposition 6 lead to the lower bounds $B^L_{\lambda,X_0,n}$ of the Hellinger integral. This is formally the same choice as in part (a), satisfying $\lim_{\lambda\uparrow 1} p_\lambda = \alpha_A$ and $\lim_{\lambda\uparrow 1} q_\lambda = \beta_A$; but in contrast to (a), we now have $\frac{p_\lambda}{q_\lambda}\beta_\lambda - \alpha_\lambda \neq 0$, although nevertheless $\lim_{\lambda\uparrow 1}\big(\frac{p_\lambda}{q_\lambda}\beta_\lambda - \alpha_\lambda\big) = 0$.
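The contrast between parts (a) and (b) hinges on the quantity $\frac{p_\lambda}{q_\lambda}\beta_\lambda - \alpha_\lambda$ with $p_\lambda = \alpha_A^\lambda\alpha_H^{1-\lambda}$ and $q_\lambda = \beta_A^\lambda\beta_H^{1-\lambda}$. The Python sketch below evaluates it, assuming that $\alpha_\lambda$ and $\beta_\lambda$ are the convex combinations $\lambda\alpha_A + (1-\lambda)\alpha_H$ and $\lambda\beta_A + (1-\lambda)\beta_H$. It vanishes identically when $\alpha_A = \alpha_H = 0$ (no immigration) and when $\alpha_A/\beta_A = \alpha_H/\beta_H$. We take this proportional constellation to be the defining feature of $\mathcal{P}_{SP,1}$, which is an assumption. Otherwise the quantity vanishes only in the limit $\lambda \uparrow 1$. The function name is hypothetical.

```python
def linearity_gap(lam, alpha_a, beta_a, alpha_h, beta_h):
    """p_lam / q_lam * beta_lam - alpha_lam with p_lam = alpha_a^lam * alpha_h^(1-lam),
    q_lam = beta_a^lam * beta_h^(1-lam) and convex combinations alpha_lam, beta_lam (assumed)."""
    p = alpha_a**lam * alpha_h**(1.0 - lam)
    q = beta_a**lam * beta_h**(1.0 - lam)
    beta_lam = lam * beta_a + (1.0 - lam) * beta_h
    alpha_lam = lam * alpha_a + (1.0 - lam) * alpha_h
    return p / q * beta_lam - alpha_lam


if __name__ == "__main__":
    print(linearity_gap(0.3, 0.0, 1.2, 0.0, 0.7))  # no immigration
    print(linearity_gap(0.3, 2.4, 1.2, 1.4, 0.7))  # alpha_A/beta_A = alpha_H/beta_H = 2
    for lam in (0.5, 0.9, 0.99, 0.999):  # generic constellation: nonzero, tends to 0
        print(lam, linearity_gap(lam, 2.0, 0.8, 1.0, 0.5))
```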
To show the strict positivity $E^L_{X_0,n} > 0$ in the parameter case $\mathcal{P}_{SP,2}$, we inspect the bound $E^{L,\mathrm{sec}}_{0,X_0,n}$. With $\alpha := \alpha_\bullet := \alpha_A = \alpha_H$ (the bullet will be omitted in this proof) and the auxiliary variable $x := \frac{\beta_H}{\beta_A} > 0$, the definition (73), respectively its special case (76), can be rewritten for all $n \in \mathbb{N}$ as $E^{L,\mathrm{sec}}_{0,X_0,n} := E^{L,\mathrm{sec}}_{0,X_0,n}(x) :=$

To prove that $E^{L,\mathrm{sec}}_{0,X_0,n} > 0$ for all $X_0 \in \mathbb{N}$ and all $n \in \mathbb{N}$, it suffices to show that $E^{L,\mathrm{sec}}_{0,X_0,n}(1) = \frac{\partial}{\partial x}E^{L,\mathrm{sec}}_{0,X_0,n}(1) = 0$ and $\frac{\partial^2}{\partial x^2}E^{L,\mathrm{sec}}_{0,X_0,n}(x) > 0$ for all $x \in\, ]0,\infty[\,\setminus\{1\}$. The assertion $E^{L,\mathrm{sec}}_{0,X_0,n}(1) = 0$ is trivial from (A34). Moreover, we obtain

which immediately yields $\frac{\partial}{\partial x}E^{L,\mathrm{sec}}_{0,X_0,n}(1) = 0$. For the second derivative we get $\frac{\partial^2}{\partial x^2}E^{L,\mathrm{sec}}_{0,X_0,n}(x) =$

where the strict positivity of $E^{L,\mathrm{sec}}_{0,X_0,n}$ in the case $\beta_A \neq 1$ follows immediately by replacing $X_0$ with $0$ and by using the obvious relation

The strict positivity in the case $\beta_A = 1$ is trivial by inspection.
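The sufficiency claim above is the usual second-order Taylor argument. Write $f(x) := E^{L,\mathrm{sec}}_{0,X_0,n}(x)$ and assume, as the argument implicitly does, that $f$ is twice continuously differentiable on $]0,\infty[$. The conditions $f(1) = f'(1) = 0$ then give

$$
f(x) = f(1) + f'(1)\,(x-1) + \int_1^x (x-s)\, f''(s)\,\mathrm{d}s
= \begin{cases}
\displaystyle\int_1^x (x-s)\, f''(s)\,\mathrm{d}s > 0, & x > 1,\\[8pt]
\displaystyle\int_x^1 (s-x)\, f''(s)\,\mathrm{d}s > 0, & 0 < x < 1,
\end{cases}
$$

since the integrands are strictly positive on the respective open integration intervals. Hence $f(x) > 0$ for all $x \in\, ]0,\infty[\,\setminus\{1\}$.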
Finally, for the parameter case $\mathcal{P}_{SP,3c}$ we consider the bound $E^{L,\tan}_{y^*,X_0,n}$ with $y^* = \frac{\alpha_A - \alpha_H}{\beta_H - \beta_A}$. Since $\alpha_A + \beta_A y^* = \alpha_H + \beta_H y^*$, it is easy to see that $E^{L,\tan}_{y^*,X_0,n} = 0$ for all $n \in \mathbb{N}$. However, the condition $\frac{\partial}{\partial y}E^{L,\tan}_{y,X_0,n}(y^*) \neq 0$ implies that $\sup_{y \ge 0} E^{L,\tan}_{y,X_0,n} > 0$. The explicit form (75) of this condition follows from the formula for $\frac{\partial}{\partial y}E^{L,\tan}_{y,X_0,n}(y)$, $y \ge 0$, by using the particular choice $y = y^*$ together with $f_A(y^*) = f_H(y^*)$.

(0), and the latter is obviously true. Let us assume that the asserted inequality holds for $n$. From this, (93), (78) and (80) we obtain

Thus, the inequality also holds for $n+1$. For the right-hand inequality in (i), we proceed analogously: for all $x \in\, ]x$

Thus, $\upsilon_\lambda^{(q)}(\cdot)$ constitutes a positive functional lower bound for $\xi_\lambda^{(q)}(\cdot)$ on $[0, x_0^{(q)}]$. Let us now prove the left-hand inequality of (i) by induction: for $n = 1$ we get $a_1^{(q)}$

Moreover, by assuming that the left-hand inequality of (i) holds for some $n \in \mathbb{N}$, we obtain with the above-mentioned considerations and (93), (80) and (82)

Hence, the inequality holds strictly for $n+1$. For the right-hand inequality in part (i), we define the quadratic function

$\lim_{n\to\infty}\frac{1}{n}\log V_{\lambda,X_0,n} = \lim_{n\to\infty}\frac{1}{n}\log C^{(p^E_\lambda, q^E_\lambda),L}_{\lambda,X_0,n} = \lim$

we have to use (A56). From this, (A49) can be deduced directly; the representation (A50) comes from the expressions in the square brackets in the last line of (A56) and from

Proof of Theorem 6. The assertions follow immediately from (A45), Lemma A4(b), (e) and Proposition 6(d), together with the fact that for $\lambda \in\, ]0,1[$ there holds $q^L_\lambda = \beta_A^\lambda \beta_H^{1-\lambda} < \beta_\lambda$ in the case $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{SP}\setminus(\mathcal{P}_{SP,1} \cup \mathcal{P}_{SP,4})$ (i.e., $\beta_A \neq \beta_H$), respectively $q^L_\lambda = \beta_\lambda$ in the case $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}_{SP,4}$ (i.e., $\beta_A = \beta_H$).
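The reason the tangent bound at $y^*$ is trivial can be seen numerically. Assume $f_\bullet(y) = \alpha_\bullet + \beta_\bullet y$ and $\varphi_\lambda(y) = f_A(y)^\lambda f_H(y)^{1-\lambda} - \lambda f_A(y) - (1-\lambda) f_H(y)$; this is our reconstruction of the notation. Under that assumption, the Python sketch below shows that at $y^* = (\alpha_A - \alpha_H)/(\beta_H - \beta_A)$ both $\varphi_\lambda$ and its derivative vanish, so the tangent line at $y^*$ is the zero line. By the weighted arithmetic-geometric mean inequality, $\varphi_\lambda \le 0$ attains its maximum there. The example parameters $\alpha_A = 3$, $\beta_A = 0.5$, $\alpha_H = 1$, $\beta_H = 1.5$ (so $y^* = 2$) are hypothetical, and the sketch does not check whether they satisfy every defining condition of $\mathcal{P}_{SP,3c}$.

```python
def f(y, alpha, beta):
    """Affine conditional Poisson mean alpha + beta * y (assumed notation for f_A, f_H)."""
    return alpha + beta * y


def phi(y, lam, a_a, b_a, a_h, b_h):
    """Assumed phi_lambda(y) = f_A^lam * f_H^(1-lam) - (lam*f_A + (1-lam)*f_H)."""
    fa, fh = f(y, a_a, b_a), f(y, a_h, b_h)
    return fa**lam * fh**(1.0 - lam) - (lam * fa + (1.0 - lam) * fh)


def dphi(y, lam, a_a, b_a, a_h, b_h):
    """Derivative of phi_lambda with respect to y."""
    fa, fh = f(y, a_a, b_a), f(y, a_h, b_h)
    g = fa**lam * fh**(1.0 - lam)
    return g * (lam * b_a / fa + (1.0 - lam) * b_h / fh) - (lam * b_a + (1.0 - lam) * b_h)


def crossing_point(a_a, b_a, a_h, b_h):
    """y* = (alpha_A - alpha_H) / (beta_H - beta_A), the point where f_A(y*) = f_H(y*)."""
    return (a_a - a_h) / (b_h - b_a)


if __name__ == "__main__":
    a_a, b_a, a_h, b_h, lam = 3.0, 0.5, 1.0, 1.5, 0.4
    y_star = crossing_point(a_a, b_a, a_h, b_h)
    # Both values are (numerically) zero, so the tangent line at y* is the zero line.
    print(y_star, phi(y_star, lam, a_a, b_a, a_h, b_h), dphi(y_star, lam, a_a, b_a, a_h, b_h))
```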
Proof of Lemma A6. For each of the assertions (a) to (l), we will make use of l'Hospital's rule. To begin with, we obtain for arbitrary $\mu, \nu \in \mathbb{R}$

From this, the first part of (a) follows immediately, and the second part is a direct consequence of the definition of $\beta_\lambda^{(m)}$. Part (b) can be deduced from (A71):

For the proof of (c), we rely on the inequalities involving $x_0^{(m)}$ (cf. (127) and (128)). These solutions clearly exist in the case $\lambda \in\, ]0,1[$. For sufficiently large approximation steps $m \in \mathbb{N}$, these solutions also exist in the case $\lambda \in\, ]\lambda_-, \lambda_+[\,\setminus[0,1]$, since (138) together with parts (a) and (b) imply

To prove part (c), we show that the limits of the two approximates of $x_0^{(m)}$ coincide. Assume first that $\lambda \in\, ]0,1[$. Using (a) and (b), we obtain, together with the obvious limit $\lim_{m\to\infty} q_\lambda^{(m)}$,

be the adapted version of the auxiliary fixed-point lower bound defined in (125).
By incorporating $\lim_{m\to\infty} \beta_\lambda^{(m)} = 1$, we obtain with (a) and (b)

Combining (A72) and (A73), the desired result (c) follows for $\lambda \in\, ]0,1[$. Assume now that $\lambda \in\, ]\lambda_-, \lambda_+[\,\setminus[0,1]$. In this case the two approximates of $x_0^{(m)}$ have a different form, given in (124) and (126). However, the calculations work out in the same way: with parts (a) and (b) we get

as well as the formulas (A82) and (A83) for the case $\kappa_A = 0$. Accordingly, we compute

For the case $\kappa_A > 0$, one can combine this with (A97), (A99) and (A74) to end up with