Some distance bounds of branching processes and their diffusion limits

We compute exact values, respectively bounds, of "distances" (in the sense of (transforms of) power divergences and relative entropy) between two discrete-time Galton-Watson branching processes with immigration (GWI) for which the offspring as well as the immigration are Poisson-distributed with arbitrary parameters (leading to an arbitrary type of criticality). Implications for the asymptotic distinguishability behaviour of the involved GWI, in terms of contiguity and entire separation, are given as well. Furthermore, we determine the corresponding limit quantities for the context in which the two GWI converge to Feller-type branching diffusion processes, as the time-lags between observations tend to zero. Some applications to (static-random-environment-like) Bayesian decision making and Neyman-Pearson testing are presented as well.


Introduction
It is well known that "distances" in the form of power divergences (which cover the relative entropy) between finite measures are important for probability theory and statistics as well as for their applications to various research fields such as physics, information theory, econometrics, biology, speech and image recognition, transportation of (sorts of) "mass", etc. For probability measures $P_H$, $P_A$ on a measurable space $(\Omega, \mathcal{A})$ and parameter $\lambda \in \mathbb{R}$ these power divergences (also known as Cressie-Read measures, respectively as the generalized cross-entropy family) are defined as (see e.g. Liese and Vajda [47], [48])

\[ I_\lambda(P_A\|P_H) := \begin{cases} I(P_A\|P_H), & \text{if } \lambda = 1, \\ \frac{1}{\lambda(\lambda-1)} \left( H_\lambda(P_A\|P_H) - 1 \right), & \text{if } \lambda \in \mathbb{R}\setminus\{0,1\}, \\ I(P_H\|P_A), & \text{if } \lambda = 0, \end{cases} \tag{1} \]

where

\[ I(P_A\|P_H) := \int_\Omega p_A \log\frac{p_A}{p_H} \, d\mu \tag{2} \]

is the relative entropy (Kullback-Leibler information divergence) and

\[ H_\lambda(P_A\|P_H) := \int_\Omega p_A^{\lambda} \, p_H^{1-\lambda} \, d\mu \tag{3} \]

is the Hellinger integral of order $\lambda \in \mathbb{R}\setminus\{0,1\}$; for this, we assume as usual without loss of generality that the probability measures $P_H$, $P_A$ are dominated by some $\sigma$-finite measure $\mu$, with densities $p_A = \frac{dP_A}{d\mu}$ and $p_H = \frac{dP_H}{d\mu}$ defined on $\Omega$ (the zeros of $p_H$, $p_A$ are handled in (2), (3) with the usual conventions). Apart from the relative entropy, other prominent examples of power divergences are the squared Hellinger distance $\frac{1}{2} I_{1/2}(P_A\|P_H)$ and Pearson's $\chi^2$-divergence $2\, I_2(P_A\|P_H)$. Extensive studies about basic and advanced general facts on power divergences, Hellinger integrals and the related Renyi divergences of order $\lambda \in \mathbb{R}\setminus\{0,1\}$,

\[ R_\lambda(P_A\|P_H) := \frac{1}{\lambda(\lambda-1)} \log H_\lambda(P_A\|P_H), \quad \text{with } \log 0 = -\infty, \]

can be found e.g. in Liese and Vajda [47], [48], Jacod and Shiryaev [29]. For instance, the integrals in (2) and (3) do not depend on the choice of $\mu$. As far as finiteness is concerned, for $\lambda \in ]0,1[$ one gets the rudimentary bounds

\[ 0 \le I_\lambda(P_A\|P_H) \le \frac{1}{\lambda(1-\lambda)}, \]

where the lower bound is achieved if and only if $P_A = P_H$, and the upper bound is achieved if and only if $P_A \perp P_H$ (singularity). For $\lambda \notin ]0,1[$, the power divergences $I_\lambda(P_A\|P_H)$ and Hellinger integrals $H_\lambda(P_A\|P_H)$ might be infinite, depending on the particular setup.
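As a small numerical illustration of the definitions (1) to (3), the following Python sketch evaluates the Hellinger integral and the power divergences for two probability mass functions on a finite set (so that µ is the counting measure) and checks the two special cases mentioned above. The function names and the example vectors are illustrative assumptions and are not taken from the cited literature.

```python
import math

def hellinger_integral(p_a, p_h, lam):
    """H_lambda = sum_i p_a[i]^lam * p_h[i]^(1-lam) (strictly positive entries assumed)."""
    return sum(a ** lam * h ** (1.0 - lam) for a, h in zip(p_a, p_h))

def relative_entropy(p_a, p_h):
    """Kullback-Leibler divergence I(P_A || P_H) for pmfs on a common finite set."""
    return sum(a * math.log(a / h) for a, h in zip(p_a, p_h) if a > 0)

def power_divergence(p_a, p_h, lam):
    """Power divergence I_lambda with the relative-entropy cases lam = 1 and lam = 0."""
    if lam == 1:
        return relative_entropy(p_a, p_h)
    if lam == 0:
        return relative_entropy(p_h, p_a)
    return (hellinger_integral(p_a, p_h, lam) - 1.0) / (lam * (lam - 1.0))

# Hypothetical example distributions (not from the paper).
p_a = [0.5, 0.3, 0.2]
p_h = [0.2, 0.3, 0.5]

# Direct formulas for the two classical special cases.
squared_hellinger = sum((math.sqrt(a) - math.sqrt(h)) ** 2 for a, h in zip(p_a, p_h))
pearson_chi2 = sum((a - h) ** 2 / h for a, h in zip(p_a, p_h))

half_i_half = 0.5 * power_divergence(p_a, p_h, 0.5)   # should equal squared_hellinger
two_i_two = 2.0 * power_divergence(p_a, p_h, 2.0)     # should equal pearson_chi2
```

For orders strictly between 0 and 1, the computed values also respect the rudimentary bounds, i.e. they lie between 0 and 1/(λ(1−λ)).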
For the sake of brevity, we only deal here with the case λ ∈ [0, 1]; the case λ ∉ [0, 1] will appear elsewhere.
Another important class of time-dynamic models is given by discrete-time branching processes, in particular Galton-Watson processes without immigration (GW) respectively with immigration (GWI), which have numerous applications in biotechnology, population genetics, internet traffic research, clinical trials, asset price modelling and derivative pricing. (Transforms of) power divergences have been used for supercritical Galton-Watson processes without immigration (SUPGW), for instance as follows: Feigin and Passy [15] study the problem of finding an offspring distribution which is closest (in terms of a relative-entropy-type distance) to the original offspring distribution and under which ultimate extinction is certain. Furthermore, Mordecki [56] gives an equivalent characterization for the stable convergence of the corresponding log-likelihood process to a mixed Gaussian limit, in terms of conditions on Hellinger integrals of the involved offspring laws. Moreover, Sriram and Vidyashankar [62] study the properties of offspring-distribution parameters which minimize the squared Hellinger distance $\frac{1}{2} I_{1/2}$ between the model offspring distribution and the corresponding non-parametric maximum likelihood estimator of Guttorp [19]. For the setup of GWI with Poisson offspring and nonstochastic immigration of constant value 1, Linkov and Lunyova [52] investigate the asymptotics of Hellinger integrals in order to deduce large deviation assertions in hypothesis testing problems.
In contrast to the abovementioned contexts, this paper pursues the following main goals:

(MG1) for any time horizon and any criticality scenario, to compute (non-rudimentary) lower and upper bounds, and sometimes even exact values, of the Hellinger integrals $H_\lambda(P_A\|P_H)$ and power divergences $I_\lambda(P_A\|P_H)$ ($\lambda \in [0,1]$) of two Galton-Watson branching processes $P_A$, $P_H$ with Poisson($\beta_A$) respectively Poisson($\beta_H$) distributed offspring as well as Poisson($\alpha_A$) respectively Poisson($\alpha_H$) distributed immigration. As a side effect, we also aim for corresponding asymptotic distinguishability results in terms of contiguity and entire separation.

(MG2) to compute the corresponding limit quantities for the context in which (a proper rescaling of) the two Galton-Watson processes with immigration converge to Feller-type branching diffusion processes, as the time-lags between the generation-size observations tend to zero.

(MG3) as an exemplary field of application, to indicate how to use the results of (MG1) for Bayesian decision making and Neyman-Pearson testing based on the sample path observations of the GWI-generation sizes, when the hypothesis law is given by $P_H$ and the alternative law by $P_A$; in a certain sense, this can also be interpreted in terms of a rudimentary static random environment.
Because of the involved Poisson distributions, these goals (which are potentially reasonable also for other types of offspring resp. immigration distributions) can be tackled with a high degree of tractability, which is worked out in detail with the following structure: we first deal with the non-relative-entropy case λ(1 − λ) ≠ 0. Section 2 contains the first basic result concerning Goal (MG1), which is then deepened in Section 3 in order to obtain (depending on the parameter constellation) recursively computable exact values respectively recursively computable lower and upper bounds of $H_\lambda(P_A\|P_H)$. Additionally, we construct related closed-form bounds in Section 4, which will also be used to achieve (the Hellinger-integral part of) Goal (MG2) in Section 5. The power divergences $I_\lambda(P_A\|P_H)$ are treated in Section 6, complemented with the relative-entropy cases λ(1 − λ) = 0 of the Goals (MG1), (MG2). The subsequent Section 7 is concerned with Goal (MG3), whereas the Appendix contains main proofs and auxiliary lemmas.

Process setup and first basic result
Let $X_n$ denote the $n$th generation size of a discrete-time Galton-Watson process with immigration (GWI). We use the recursive description

\[ X_n = \sum_{k=1}^{X_{n-1}} Y_{n-1,k} + \widetilde{Y}_n, \qquad n \in \mathbb{N}, \]

where $Y_{n-1,k}$ is the number of offspring of the $k$th object (e.g. organism, person) within the $(n-1)$th generation, and $\widetilde{Y}_n$ denotes the number of immigrating objects in the $n$th generation. Notice that we employ an arbitrary deterministic initial generation size $X_0$. We always assume that under the law $P_H$ (e.g. a hypothesis),
• the collection $Y := \{Y_{n-1,k},\ n \in \mathbb{N},\ k \in \mathbb{N}\}$ consists of independent and identically distributed (i.i.d.) random variables which are Poisson distributed with parameter $\beta_H > 0$,
• the collection $\widetilde{Y} := \{\widetilde{Y}_n,\ n \in \mathbb{N}\}$ consists of i.i.d. random variables which are Poisson distributed with parameter $\alpha_H \ge 0$ (where $\alpha_H = 0$ stands for the degenerate case of having no immigration),
• $Y$ and $\widetilde{Y}$ are independent.
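The recursive description above can be turned directly into a simulation. The following Python sketch generates one sample path of the generation sizes under a given pair of offspring and immigration parameters; the helper names, the seed and the parameter values are illustrative assumptions, and the Poisson sampler (Knuth's multiplication method) is only meant for the moderate means used here.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson(mean) variate by Knuth's multiplication method."""
    if mean <= 0.0:
        return 0
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_gwi(x0, beta, alpha, n_generations, rng):
    """Return [X_0, ..., X_n] for Poisson(beta) offspring and Poisson(alpha) immigration."""
    path = [x0]
    for _ in range(n_generations):
        # Each of the X_{n-1} objects reproduces independently.
        offspring = sum(sample_poisson(beta, rng) for _ in range(path[-1]))
        # Immigration is independent of the offspring numbers.
        immigrants = sample_poisson(alpha, rng)
        path.append(offspring + immigrants)
    return path

rng = random.Random(12345)  # fixed seed, chosen arbitrarily for reproducibility
path_with_immigration = simulate_gwi(x0=5, beta=0.8, alpha=1.5, n_generations=10, rng=rng)
path_extinct = simulate_gwi(x0=0, beta=1.2, alpha=0.0, n_generations=5, rng=rng)
```

With no immigration (alpha = 0) and an empty initial generation, the path stays at zero, which mirrors the absorbing role of extinction in the no-immigration case.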
In contrast, under the law P A (e.g. an alternative) the same is supposed to hold with parameters β A > 0 (instead of β H > 0) and α A ≥ 0 (instead of α H ≥ 0). Furthermore, let (F n ) n∈N be the corresponding canonical filtration generated by X := (X n ) n∈N .
For the sake of brevity, wherever we introduce or discuss corresponding quantities simultaneously for both the hypothesis H and the alternative A, we will use the subscript • as a synonym for either the symbol H or A. For illustration, recall the well-known fact that the corresponding conditional probabilities $P_\bullet(X_n = \cdot \mid X_{n-1} = k)$ are again Poisson-distributed, with parameter $\beta_\bullet \cdot k + \alpha_\bullet$. In order to achieve a transparently representable structure of our results, we subsume the involved parameters as follows: let $\mathcal{P}_{SP}$ be the set of all constellations $(\beta_A, \beta_H, \alpha_A, \alpha_H)$ of real-valued parameters $\beta_A > 0$, $\beta_H > 0$, $\alpha_A > 0$, $\alpha_H > 0$, such that $\beta_A \neq \beta_H$ or $\alpha_A \neq \alpha_H$ (or both). Furthermore, we write $\mathcal{P}_{NI}$ for the set of all $(\beta_A, \beta_H, \alpha_A, \alpha_H)$ of real-valued parameters $\beta_A > 0$, $\beta_H > 0$, $\alpha_A = \alpha_H = 0$, such that $\beta_A \neq \beta_H$; this corresponds to the important special case of having no immigration. The resulting disjoint union will be denoted by $\mathcal{P} = \mathcal{P}_{SP} \cup \mathcal{P}_{NI}$. A typical situation for the applications we have in mind is that one particular constellation $(\beta_A, \beta_H, \alpha_A, \alpha_H) \in \mathcal{P}$ (e.g. obtained from theoretical or previous statistical investigations) is fixed, whereas the parameter $\lambda \in ]0,1[$ for the Hellinger integral or the power divergence might be chosen freely, e.g. depending on which "probability distance" one decides to choose for further analysis. At this point, let us emphasize that in general we will not make assumptions on how $\beta_\bullet$ compares with 1, i.e. upon the type of criticality.
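To start with our investigations, we define the extinction time $\tau := \min\{l \in \mathbb{N} : X_m = 0 \text{ for all integers } m \ge l\}$ if this minimum exists, and $\tau := \infty$ else. Correspondingly, let $B := \{\tau < \infty\}$ be the extinction set. It is well known that in the case $\mathcal{P}_{NI}$ one gets $P_\bullet(B) = 1$ if $0 < \beta_\bullet \le 1$ and $P_\bullet(B) \in ]0,1[$ if $\beta_\bullet > 1$. In contrast, for $\mathcal{P}_{SP}$ there always holds $P_\bullet(B) = 0$. Furthermore, for $\mathcal{P}_{SP}$ the two laws $P_H$ and $P_A$ are equivalent, whereas for $\mathcal{P}_{NI}$ the two restrictions $P_H|_B$ and $P_A|_B$ are equivalent (see e.g. Lemma 1.1.3 of Guttorp [19]); with a slight abuse of notation we shall henceforth omit $|_B$. Consistently, for fixed time $n \in \mathbb{N}_0$ we introduce $P_{A,n} := P_A|_{\mathcal{F}_n}$ and $P_{H,n} := P_H|_{\mathcal{F}_n}$ as well as the corresponding Radon-Nikodym derivative

\[ Z_n := \frac{dP_{A,n}}{dP_{H,n}}. \tag{6} \]

Clearly, $Z_0 = 1$. By using the "rate functions" $f_\bullet(x) := \beta_\bullet x + \alpha_\bullet$ ($x \in [0,\infty[$), a version of (6) can be easily determined by calculating for each $\omega = (\omega_0, \ldots, \omega_n) \in \Omega_n := \mathbb{N} \times \mathbb{N}_0^n$

\[ Z_n(\omega) = \prod_{k=1}^{n} \exp\{-(f_A(\omega_{k-1}) - f_H(\omega_{k-1}))\} \left( \frac{f_A(\omega_{k-1})}{f_H(\omega_{k-1})} \right)^{\omega_k}, \]

where for the last term we use the convention $\left(\frac{0}{0}\right)^x = 1$ for all $x \in \mathbb{N}_0$. Furthermore, we define for each $\omega \in \Omega_n$

\[ Z^{(\lambda)}_{n,k}(\omega) := \exp\{-(\lambda f_A(\omega_{k-1}) + (1-\lambda) f_H(\omega_{k-1}))\} \, \frac{\left[ (f_A(\omega_{k-1}))^{\lambda} (f_H(\omega_{k-1}))^{1-\lambda} \right]^{\omega_k}}{\omega_k!} \]

with the convention $\frac{(0)^0}{0!} = 1$ for the last term. Accordingly, with the choice $\mu = P_{H,n}$ one obtains from (3) the Hellinger integral $H_\lambda(P_{A,0}\|P_{H,0}) = 1$, as well as for all $\omega_0 \in \mathbb{N}$

\[ H_\lambda(P_{A,1}\|P_{H,1}) = \exp\{ (f_A(\omega_0))^{\lambda} (f_H(\omega_0))^{1-\lambda} - (\lambda f_A(\omega_0) + (1-\lambda) f_H(\omega_0)) \} \tag{8} \]

and for all $n \in \mathbb{N}\setminus\{1\}$

\[ H_\lambda(P_{A,n}\|P_{H,n}) = \sum_{\omega_1=0}^{\infty} \cdots \sum_{\omega_{n-1}=0}^{\infty} \left[ \prod_{k=1}^{n-1} Z^{(\lambda)}_{n,k}(\omega) \right] \cdot \exp\{ (f_A(\omega_{n-1}))^{\lambda} (f_H(\omega_{n-1}))^{1-\lambda} - (\lambda f_A(\omega_{n-1}) + (1-\lambda) f_H(\omega_{n-1})) \}. \tag{9} \]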
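For a single observation step, the Hellinger integral reduces to that of two Poisson laws with means f_A(ω_0) and f_H(ω_0). The Python sketch below compares a truncated direct summation with the closed-form exponential expression for this one-step case; the parameter values, the truncation point and the function names are assumptions made for illustration only.

```python
import math

def rate(beta, alpha, x):
    """Conditional Poisson mean f(x) = beta * x + alpha."""
    return beta * x + alpha

def poisson_pmf(mean, k):
    """Poisson probability mass, evaluated on the log scale for stability."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def one_step_hellinger_sum(beta_a, beta_h, alpha_a, alpha_h, lam, x0, cutoff=200):
    """Truncated sum over the first-generation size of pmf_A^lam * pmf_H^(1-lam)."""
    fa, fh = rate(beta_a, alpha_a, x0), rate(beta_h, alpha_h, x0)
    return sum(poisson_pmf(fa, k) ** lam * poisson_pmf(fh, k) ** (1.0 - lam)
               for k in range(cutoff))

def one_step_hellinger_closed(beta_a, beta_h, alpha_a, alpha_h, lam, x0):
    """exp{ f_A^lam * f_H^(1-lam) - (lam * f_A + (1 - lam) * f_H) } at x0."""
    fa, fh = rate(beta_a, alpha_a, x0), rate(beta_h, alpha_h, x0)
    return math.exp(fa ** lam * fh ** (1.0 - lam) - (lam * fa + (1.0 - lam) * fh))

# Hypothetical parameter constellation (beta_A, beta_H, alpha_A, alpha_H, lambda) and omega_0.
params = (1.2, 0.9, 2.0, 1.0, 0.3)
h_sum = one_step_hellinger_sum(*params, x0=4)
h_closed = one_step_hellinger_closed(*params, x0=4)
```

Both numbers agree up to truncation and rounding error, and they are at most 1, in line with the general upper bound for Hellinger integrals of order between 0 and 1.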
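From (9), one can see that a crucial role for the exact calculation (respectively the derivation of bounds) of the Hellinger integral is played by the three functions (10) to (12),

\[ \phi_\lambda(x) := (f_A(x))^{\lambda} (f_H(x))^{1-\lambda}, \qquad f_\lambda(x) := \lambda f_A(x) + (1-\lambda) f_H(x) = \alpha_\lambda + \beta_\lambda x, \qquad \varphi_\lambda(x) := \phi_\lambda(x) - f_\lambda(x), \]

defined for $x \in [0,\infty[$, where $\alpha_\lambda := \lambda \alpha_A + (1-\lambda) \alpha_H$ and $\beta_\lambda := \lambda \beta_A + (1-\lambda) \beta_H$. Notice that $\varphi_\lambda(x) \le 0$ for all $x \in [0,\infty[$, by the inequality between the weighted geometric and the weighted arithmetic mean. This is consistent with the corresponding generally valid upper bound

\[ H_\lambda(P_{A,n}\|P_{H,n}) \le 1. \tag{13} \]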
As a first indication for our proposed method, let us start by illuminating the simplest case λ ∈]0, 1[ and In this situation, all the three functions (10) to (12) are linear. Indeed, (where the index E stands for exact linearity). Clearly, q E λ > 0 on P NI ∪ P SP,1 , as well as p E λ > 0 on P SP,1 respectively p E λ = 0 on P NI . Furthermore, As it will be seen later on, such kind of linearity properties are useful for the recursive handling of the Hellinger integrals. However, only on the parameter set P NI ∪ P SP,1 the functions ϕ λ and φ λ are linear. Hence, in the general case (β A , β H , α A , α H , λ) ∈ P×]0, 1[ we aim for linear lower and upper bounds x ∈ [0, ∞[ (ultimately, x ∈ N 0 ), which lead to . Of course, the involved slopes and intercepts should satisfy reasonable restrictions. For instance, because of the nonnegativity of ϕ λ we require p U λ ≥ p L λ ≥ 0, q U λ ≥ q L λ ≥ 0 (leading to the nonnegativity of ϕ L λ , ϕ U λ ). Furthermore, (9) and (13) suggest that p L λ ≤ α λ , q L λ ≤ β λ which leads to the nonpositivity of φ L λ . Moreover, it is assumed that at least one of the two inequalities p U λ < α λ , q U λ < β λ holds, (16) and hence φ U λ (x) < 0 for some (but not necessarily all) x ∈ [0, ∞[. Notice that in (16) we do not demand the validity of both inequalities, which might lead to the effect that the constructed Hellinger integral upper bounds have to be cut off at 1 for some (but not all) observation horizons n ∈ N; see (21) below. For the formulation of our first assertions on Hellinger integrals, we make use of the following notation: recursively by a (q) Notice the interrelation a Accordingly, we obtain fundamental Hellinger integral evaluations: (a) For all (β A , β H , α A , α H , λ) ∈ (P NI ∪ P SP,1 )×]0, 1[, all initial population sizes ω 0 ∈ N and all observation horizons n ∈ N one can recursively compute the exact value where αA βA can be equivalently replaced by αH βH . 
Recall that q E λ : (14) holds for all x ∈ N 0 as well as (16), all initial population sizes ω 0 ∈ N and all observation horizons n ∈ N one gets the recursive (i.e. recursively computable) bounds Remark 2.3. From the proof below one can see that both parts of Theorem 2.2 remain true for the cases λ ∉ [0, 1]. For the (to our context) incompatible setup of GWI with Poisson offspring but nonstochastic immigration of constant value 1, the exact values of the corresponding Hellinger integrals (i.e. an "analogue" of part (a)) were established in Linkov and Lunyova [52].
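To make the recursive evaluation in part (a) concrete, the following Python sketch treats the no-immigration and equal-fraction cases. It rests on two assumptions that are not displayed in the text above: that the recursion has the form a_0 = 0, a_n = q·exp(a_{n−1}) − β_λ with q = β_A^λ β_H^{1−λ} and β_λ = λβ_A + (1−λ)β_H (which is what one obtains by summing out the last generation in (9) when the exponent is linear), and that the exact value equals exp{a_n ω_0 + (α_A/β_A) Σ_{k≤n} a_k}. The two-step brute-force summation serves as an independent check; all names and parameter values are illustrative.

```python
import math

def a_sequence(q, beta_lam, n):
    """Return [a_1, ..., a_n] for a_0 = 0, a_k = q*exp(a_{k-1}) - beta_lam (assumed form)."""
    values, prev = [], 0.0
    for _ in range(n):
        prev = q * math.exp(prev) - beta_lam
        values.append(prev)
    return values

def exact_hellinger(beta_a, beta_h, alpha_a, alpha_h, lam, x0, n):
    """Assumed exact value; only meaningful if alpha_A/beta_A == alpha_H/beta_H."""
    if not math.isclose(alpha_a * beta_h, alpha_h * beta_a, abs_tol=1e-15):
        raise ValueError("equal-fraction (or no-immigration) case required")
    q_exact = beta_a ** lam * beta_h ** (1.0 - lam)
    beta_lam = lam * beta_a + (1.0 - lam) * beta_h
    a = a_sequence(q_exact, beta_lam, n)
    return math.exp(a[-1] * x0 + (alpha_a / beta_a) * sum(a))

def poisson_pmf(mean, k):
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def two_step_bruteforce(beta_a, beta_h, alpha_a, alpha_h, lam, x0, cutoff=150):
    """Truncated evaluation of the two-generation Hellinger integral, summing over X_1."""
    fa0, fh0 = beta_a * x0 + alpha_a, beta_h * x0 + alpha_h
    total = 0.0
    for x1 in range(cutoff):
        weight = poisson_pmf(fa0, x1) ** lam * poisson_pmf(fh0, x1) ** (1.0 - lam)
        fa1, fh1 = beta_a * x1 + alpha_a, beta_h * x1 + alpha_h
        # The sum over X_2 is available in closed form (Poisson one-step formula).
        total += weight * math.exp(fa1 ** lam * fh1 ** (1.0 - lam)
                                   - (lam * fa1 + (1.0 - lam) * fh1))
    return total

# Hypothetical equal-fraction constellation: alpha_A/beta_A = alpha_H/beta_H = 0.5.
equal_fraction = (1.2, 0.8, 0.6, 0.4, 0.5)
# Hypothetical no-immigration constellation.
no_immigration = (1.2, 0.8, 0.0, 0.0, 0.5)

value_ef = exact_hellinger(*equal_fraction, x0=3, n=2)
check_ef = two_step_bruteforce(*equal_fraction, x0=3)
value_ni = exact_hellinger(*no_immigration, x0=3, n=2)
check_ni = two_step_bruteforce(*no_immigration, x0=3)
```

The agreement of the recursive value with the brute-force sum for n = 2 supports the assumed form of the recursion in these two cases; it is a numerical sanity check, not a proof.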

Proof:
We first prove the upper bound B U λ,n . Let us fix (β A , β H , α A , α H , λ), p U λ , q U λ , ω 0 ∈ N as described in part (b). From (8), (10), (11), (12) and (14) one gets immediately B U λ,1 , and with the help of (9) for all observation horizons n ∈ N\{1} (with the obvious shortcut for n = 2) Notice that for the strictness of the above inequalities we have used the fact that φ λ (x) < φ U λ (x) for some (in fact, all but at most two) x ∈ N 0 (cf. (p-xiv) below). Since for some admissible choices of p U λ , q U λ and some n ∈ N the last term in (22) can become larger than 1, one needs to take into account the cutoff-point 1 arising from (13). Notice that without assumption (16), the last term in (22) would always be larger than 1 (and thus useless). The lower bound B L λ,n of part (b), as well as the exact value of part (a), follow from (9) in an analogous manner by employing p L λ , q L λ and p E λ , q E λ respectively. Furthermore, we use the fact that for (β A , β H , α A , α H , λ) ∈ (P NI ∪ P SP,1 )×]0, 1[ one gets from (19) . For the sake of brevity, the corresponding straightforward details are omitted here. Although we take the minimum of the upper bound derived in (22) and 1, the inequality B L λ,n < B U λ,n is nevertheless valid: the reason is that for constituting a lower bound, the parameters p L λ , q L λ must fulfil either the conditions [p L λ < 0 and q L λ ≤ 0] or [p L λ ≤ 0 and q L λ < 0] (or both).

Detailed analyses
For part (b) in Theorem 2.2, we have assumed the existence of reasonable linear lower and upper bounds of ϕ λ and φ λ . In the following, we shall carry out a more detailed analysis addressing questions upon the non-uniqueness (and thus, flexibility) of the coefficients p L λ , q L λ , p U λ , q U λ in (14), their "optimal respectively reasonable choices", as well as the corresponding behaviour of the Hellinger integrals H λ (P A,n ||P H,n ) as the observation horizon n increases and finally converges to ∞. Of course, the answers to these questions will depend on the (e.g. fixed) value of (β A , β H , α A , α H ) and the (e.g. selectable) value of λ.
Before starting a closer inspection, notice by induction the general fact that for (β A , β H , α A , α H , λ) ∈ P×]0, 1[ and q ∈]0, ∞[ the principal behaviour of the sequence a (q) n n∈N is strongly governed by its first element: is strictly negative and strictly decreasing, if a (q) is strictly positive and strictly increasing, if a (q) Due to the linear interrelation (19), the monotonicity carries over to the sequence b (p,q) n n∈N0 (p ∈ [0, ∞[, q ∈]0, ∞[) in the following way: Notice that the sign of b (p,q) n might not be the same as the sign of a (q) n (see e.g. (p-i), (p-iv)). Finally, for the remaining case one trivially gets Moreover, for (β A , β H , α A , α H , λ) ∈ P×]0, 1[ and q ∈]0, ∞[ we shall sometimes use the function which has the following obvious properties: λ is strictly increasing, strictly convex and smooth, With these auxiliary basic facts in hand, let us now start our detailed investigations of the time-behaviour n → H λ (P A,n ||P H,n ) for the exactly treatable case (a) in Theorem 2.2. A.1, one has q E λ < β λ and thus, a

Detailed analysis of the exact values
is strictly negative as well as strictly decreasing. Furthermore, because of (p-ix), (p-x) and a (q E λ ) 1 < 0, the function ξ Summing up, we have shown the following detailed behaviour of Hellinger integrals: the sequence (H λ (P A,n ||P H,n )) n∈N given by (aEF) The "equal-fraction-case" (β A , β H , α A , α H , λ) ∈ P SP,1 ×]0, 1[: the sequence (H λ (P A,n ||P H,n )) n∈N given by For the (to our context) incompatible setup of GWI with Poisson offspring but nonstochastic immigration of constant value 1, an "analogue" of part (d) of Proposition 3.2 was established in Linkov and Lunyova [52].
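The qualitative behaviour of the sequence a^{(q)}_n described above (strictly negative, strictly decreasing, convergent to a fixed point) can be inspected numerically. The Python sketch below assumes the recursion a_0 = 0, a_n = q·exp(a_{n−1}) − β_λ with 0 < q < β_λ, and locates the limit as the root of q·exp(x) − β_λ = x in the interval ]−β_λ, q − β_λ[ by bisection; the form of the recursion, the bracketing interval and the numerical values are assumptions made for this illustration.

```python
import math

def a_sequence(q, beta_lam, n):
    """Return [a_1, ..., a_n] for a_0 = 0 and a_k = q*exp(a_{k-1}) - beta_lam (assumed form)."""
    values, prev = [], 0.0
    for _ in range(n):
        prev = q * math.exp(prev) - beta_lam
        values.append(prev)
    return values

def fixed_point(q, beta_lam, tol=1e-14):
    """Bisection for the root of g(x) = q*exp(x) - beta_lam - x on ]-beta_lam, q - beta_lam[."""
    def g(x):
        return q * math.exp(x) - beta_lam - x
    lo, hi = -beta_lam, q - beta_lam   # g(lo) > 0 > g(hi) whenever 0 < q < beta_lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical values: beta_A = 1.2, beta_H = 0.8, lambda = 0.5.
q_value = 1.2 ** 0.5 * 0.8 ** 0.5       # weighted geometric mean of the offspring means
beta_lam = 0.5 * 1.2 + 0.5 * 0.8        # weighted arithmetic mean of the offspring means
seq = a_sequence(q_value, beta_lam, 300)
limit = fixed_point(q_value, beta_lam)
```

In the no-immigration case, where the Hellinger integral is assumed to equal exp(a_n ω_0), this means that the Hellinger integrals decrease in n towards exp(limit · ω_0), which is strictly positive.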
From (p-xi) to (p-xv) it is easy to see that for all current parameter constellations the particular choices -which correspond to the choices (15) (and at least one of the two last inequalities is strict) -lead to the tightest lower bound B L λ,n for H λ (P A,n ||P H,n ) in (21). This situation coincides partially with those in Section 3.1. Formally, p L λ = p E λ and q L λ = q E λ , but because of γ = 0 the relation b is in general not valid anymore and has to be replaced by the relation (cf. (19) Hence, for a better distinguishability and easier reference we stick to the L−notation here. Nevertheless, the behaviour of the sequence a (q L λ ) n n∈N coincides exactly with that of the sequence a (q E λ ) n n∈N in the Subsections 3.1(aNI), (aEF). In particular a (q L λ ) n n∈N is strictly negative, strictly decreasing and converges to the unique solution x Consequently, because of (25) and b is strictly negative and strictly decreasing. As in Subsection 3.1(aEF), we obtain

Detailed analysis of the upper bounds
As above, we again assume (β A , β H , α A , α H , λ) ∈ (P SP \P SP,1 )×]0, 1[ throughout this section. In contrast to the treatment of the lower bounds in Section 3.2, the fine-tuning of the upper bounds is more involved. Because of the strict concavity of the function φ λ (·) (cf. (p-xiv)), there is in general no overall best linear upper bound of φ λ (·) within the framework (15). Different reasonable goals might lead to different reasonable choices of p U λ , q U λ (and thus of r U λ , s U λ ) which might imply different behaviour of the corresponding sequence B U λ,n n∈N of upper bounds in (21). This can be conjectured from the following immediate monotonicity properties: which holds for all n ∈ N; take e.g. (β A , β H , α A , α H , λ) = (1, 0.6, 4, 3, 0.5), p 1 = 3.4641, q 1 = 0.7785 (for which φ U λ (·) corresponds to the secant line through the points φ λ (0) and φ λ (1)), as well as p 2 = 3.4857, q 2 = 0.7746 (for which φ U λ (·) corresponds to the asymptote of φ λ ), and inspect the first six values of the corresponding b n −sequence.
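The numerical values quoted in this example can be recomputed. The Python sketch below assumes ϕ_λ(x) = (β_A x + α_A)^λ (β_H x + α_H)^{1−λ}; under this assumption the quoted intercepts and slopes are reproduced for λ = 0.5, β_A = 1, β_H = 0.6, α_A = 4 and α_H = 3. The first six elements of the two b-sequences are then generated under the further assumption that a_n = q·exp(a_{n−1}) − β_λ (with a_0 = 0) and b_n = p·exp(a_{n−1}) − α_λ; this form and all function names are hypothetical and only serve the illustration.

```python
import math

BETA_A, BETA_H, ALPHA_A, ALPHA_H, LAM = 1.0, 0.6, 4.0, 3.0, 0.5

def phi(x):
    """Weighted geometric mean of the two rate functions (assumed form of phi_lambda)."""
    return (BETA_A * x + ALPHA_A) ** LAM * (BETA_H * x + ALPHA_H) ** (1.0 - LAM)

# Secant line through (0, phi(0)) and (1, phi(1)).
p1 = phi(0.0)
q1 = phi(1.0) - phi(0.0)

# Asymptote of phi for lambda = 1/2: slope sqrt(beta_A*beta_H),
# intercept (alpha_A*beta_H + alpha_H*beta_A) / (2*sqrt(beta_A*beta_H)).
q2 = math.sqrt(BETA_A * BETA_H)
p2 = (ALPHA_A * BETA_H + ALPHA_H * BETA_A) / (2.0 * q2)

ALPHA_LAM = LAM * ALPHA_A + (1.0 - LAM) * ALPHA_H
BETA_LAM = LAM * BETA_A + (1.0 - LAM) * BETA_H

def b_sequence(p, q, n):
    """Return [b_1, ..., b_n] with b_k = p*exp(a_{k-1}) - alpha_lam (assumed recursion)."""
    a_prev, values = 0.0, []
    for _ in range(n):
        values.append(p * math.exp(a_prev) - ALPHA_LAM)
        a_prev = q * math.exp(a_prev) - BETA_LAM
    return values

b_secant = b_sequence(p1, q1, 6)
b_asymptote = b_sequence(p2, q2, 6)
```

Printing b_secant and b_asymptote side by side shows how the two admissible choices of intercept and slope compare over the first six observation horizons.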
The properties (p-xvi) to (p-xx) have corresponding effects on the behaviour (cf. (21)) of the upper bounds. For instance, for any fixed admissible intercept p U λ one would always choose the smallest admissible q U λ in order to achieve the smallest possible upper bound; due to (p-xiv) this implies that on the ultimately relevant subdomain N 0 the linear function φ U λ (·) should hit φ λ (·) in at least one but at most two points (tangent or secant line). Furthermore, we require for the rest of the section that p U λ > 0 and q U λ > 0, because otherwise r U λ < φ λ (0) and s U λ < s λ (cf. (p-xv)), which contradicts the nature of linear upper bounds of φ λ .
The (only partially restricted) choice of parameters p U λ , q U λ for the upper bounds B U λ,n can be made according to different, partially incompatible ("optimality-" respectively "goodness-") criteria, such as: (Ga) very good tightness for n ≥ N for some fixed large N ∈ N, or (Gb) for a fixed initial population size ω 0 ∈ N there holds B U λ,n < 1 for all n ∈ N, or (Gc) there holds B U λ,n < 1 for all n ∈ N and all ω 0 ∈ N (strict improvement of the general upper bound (13)). For the sake of brevity, we investigate only goal (Gc) (with the exception of Subsection 3.3(a7) and Theorem 6.3) which can be achieved if (and "nearly but not fully" iff) (16) holds; this can be seen from and the properties (p-i) to (p-vii). Furthermore, (p-xiv) and (p-xv) imply that the slope s λ := q U λ − β λ in (15) should be greater or equal to the limit slope s λ which leads to the restriction q (15) should be greater or equal to φ λ (0) and thus, p U λ ≥ α λ A α 1−λ H . By comparing the above established lower and upper parameter-bounds, from Lemma A.1 it follows that the case q U λ < β λ automatically implies β A = β H whereas the case p U λ < α λ leads to α A = α H . In consistence with (p-xiv), various different parameter constellations can lead to different Hellinger-integral-upper-bound details, which we investigate in the following.
, and the minimal admissible slope which implies (15) for x ∈ N is given by is strictly negative, strictly decreasing, and converges to the unique solution x Moreover, in the same manner as in Section 3.2, the sequence b is strictly negative and strictly decreasing. This leads to In contrast to P SP,2 , the constellation P SP,3 of all (componentwise) strictly positive (β A , β H , α A , α H ) with α A = α H , β A = β H and αA αH = βA βH is divided into three main parts as follows: because of Lemma A.1 one Accordingly, (for reasons which will be explained below) we denote by P SP,3ab resp. P SP,3c resp. P SP,3d the subset of P SP,3 for which αH−αA βA−βH < 0 resp. αH−αA βA−βH ∈]0, ∞[\N resp. αH−αA βA−βH ∈ N; notice that the case αH−αA βA−βH = 0 can not appear within P SP,3 . For further investigations let us first divide the set P SP,3ab ×]0, 1[ of quintuples (β A , β H , α A , α H , λ) into two parts P λ,≤0 SP,3a and P λ,>0 SP,3b : For the latter, both the strict negativity as well as the vanishing can appear in the current parameter setup, take e.g.
In the current setup, φ λ is a strictly negative, strictly decreasing, and -due to (p-xiv) -strictly concave function (and thus, the assumption αH−αA βA−βH < 0 is superfluous here). In contrast to Subsection (a1), one has the flexibility to choose the intercept (15)). Of course, one way to obtain a reasonable choice of intercept and slope is the search for the optimum subject to the abovementioned constraints. However, the corresponding result generally depends on the choice of the initial population size ω 0 and the observation horizon n. Hence, there is in general no overall optimal choice of p U λ , q U λ (without the incorporation of further goal-dependent constraints such as lim n→∞ B U λ,n = 0 in case of lim n→∞ H λ (P A,n ||P H,n ) = 0). By the way, due to the recursive nature of the sequences in (28) and the nontriviality of the constraints, this optimization problem seems to be not straightforward to solve, in general.
Inspired by Subsection (a1), a more pragmatic but yet reasonable choice is the following: take any intercept , which corresponds to a linear function φ U λ which is (a) nonpositive on N 0 and strictly negative on N, (b) larger than or equal to φ λ on N 0 , strictly larger than φ λ on N\{1, 2}, and equal to φ λ at the point x = 1 ("discrete tangent or secant line through x = 1").
One can easily see that (due to the restriction (14)) not all (1)). Hence, analogously to Subsection (a1) one can derive that a is strictly negative, strictly decreasing, and converges to the unique solution x [ of equation (27). More- is strictly negative and strictly decreasing. Thus, all the assertions (a), SP,3a and all initial population sizes ω 0 ∈ N.
which means that the secant line through φ λ (⌊x max ⌋) and φ λ (⌊x max ⌋ + 1) possesses a non-positive intercept. In this situation it is reasonable to choose A larger intercept would lead to a linear function φ U λ for which (15) is not valid at ⌊x max ⌋ + 1.
, one can proceed as above by substituting the crucial pair of points (⌊x max ⌋, ⌊x max ⌋ + 1) with (⌊x max ⌋ + 1, ⌊x max ⌋ + 2) and examining the analogous two subcases.
With the accordingly derived p U λ , q U λ one gets in all four (sub)cases exactly the same kind of behaviour of the sequences a as in Subsection (a2). Hence, all the assertions (a), (b), (c), (d) of SP,3b and all initial population sizes ω 0 ∈ N.
The only difference to Subsection (a4) is that the maximum value of φ λ (·) now achieves 0 at the integer point ≤ 1 for all n ∈ N and all ω 0 ∈ N, our method leads to the choices r U λ = 0 as well as s U λ = 0. Consequently, B U λ,n ≡ 1, which coincides with the general upper bound (13), but violates the abovementioned desired goal (Gc). However, by using a conceptually different method we can nevertheless prove the convergence lim (which will be used for the study of entire separation below). This will be done in Appendix A.1.
As a next step, let us investigate the last possible parameter constellation: This is the only case where φ λ (·) is strictly negative and strictly increasing, with lim x→∞ φ λ (x) = lim x→∞ φ ′ λ (x) = 0, leading to the choices r U λ = 0 as well as s U λ = 0 under the restriction that exp a 1 for all n ∈ N and all ω 0 ∈ N. Consequently, B U λ,n ≡ 1, which is consistent with the general upper bound (13), but violates the abovementioned desired Goal (Gc). Unfortunately, the proof method of (30) can't be carried over to the current setup (see Appendix A.1).
(a7) Alternative bounds for P SP,2 ∪ P SP,3ab ∪ P SP,3c ∪ P SP,3d Within this last subsection, let us exceptionally ignore the Goal (Gc). Correspondingly, for the derivation of an upper bound B U λ,n one can use the asymptote of ϕ λ given in (p-xv) to end up with p U λ : (16) holds, since we have excluded P SP,4 . However -depending on the choice of (β A , β H , α A , α H ) -the intercept r λ = p U λ − α λ may become strictly positive, and hence may become larger than 1. However, according to properties (p-ii) and (p-vi) the sequence and ω 0 = 10, with p U λ = 1.960, q U λ = 1.225, instead of the (amongst others proposed) choice p U λ = 1.897 and q U λ = 1.249.

Asymptotic distinguishability
For each $n \in \mathbb{N}_0$, let $(\Omega_n, \mathcal{F}_n)$ be a measurable space equipped with two probability measures $\widetilde{P}_n$, $P_n$. The following two general types of asymptotic distinguishability are well known (see e.g. LeCam [42], Liese and Vajda [47], Jacod and Shiryaev [29], Linkov [51], and the references therein):

(CEa) the sequence $(\widetilde{P}_n)_{n \in \mathbb{N}_0}$ is contiguous to the sequence $(P_n)_{n \in \mathbb{N}_0}$ (in symbols, $(\widetilde{P}_n) \triangleleft (P_n)$) if for all sequences $A_n \in \mathcal{F}_n$ with $\lim_{n\to\infty} P_n(A_n) = 0$ there holds $\lim_{n\to\infty} \widetilde{P}_n(A_n) = 0$.

(CEb) the sequences $(\widetilde{P}_n)_{n \in \mathbb{N}_0}$ and $(P_n)_{n \in \mathbb{N}_0}$ are entirely separated (completely asymptotically separable) (in symbols, $(\widetilde{P}_n) \triangle (P_n)$) if there exist a sequence $n_m \uparrow \infty$ as $m \uparrow \infty$ and for each $m \in \mathbb{N}_0$ an $A_{n_m} \in \mathcal{F}_{n_m}$ such that $\lim_{m\to\infty} \widetilde{P}_{n_m}(A_{n_m}) = 1$ and $\lim_{m\to\infty} P_{n_m}(A_{n_m}) = 0$.
The corresponding negations will be denoted by ⊳ and △. As demonstrated in the abovementioned references for a general context, (CEb) holds iff lim inf n→∞ H λ P n ||P n = 0 for some (or equivalently, all) λ ∈]0, 1[; furthermore, Combining these results with the respective part (c) of Propositions 3.1, 3.2 and 3.5 as well as the connected investigations of Subsections 3.3(a2) to (a5), we obtain the following ) and all initial population sizes ω 0 ∈ N, the corresponding sequences (P A,n ) n∈N0 and (P H,n ) n∈N0 are entirely separated.
and all initial population sizes ω 0 ∈ N, the sequence (P A,n ) n∈N0 is neither contiguous to nor entirely separated to (P H,n ) n∈N0 .
Remarks 3.7. (i) Assertion (c) of Corollary 3.6 contrasts the case of Gaussian processes with independent increments where one gets either entire separation or mutual contiguity (see e.g. Liese and Vajda [47]).
(ii) By putting Corollary 3.6(b) and (c) together, we obtain for different "criticality pairs" in the nonimmigration case P NI the following asymptotic distinguishability types: in particular, for P NI the sequences (P A,n ) n∈N0 and (P H,n ) n∈N0 are not completely asymptotically inseparable (indistinguishable). (iii) In the light of the abovementioned (CEa) resp. (CEb) characterizations by means of Hellinger integral limits, the finite-time-horizon results on Hellinger integrals given in Theorem 2.2, Section 3 and also in the following Section 4 can loosely be interpreted as "finite-sample (rather than asymptotic) distinguishability" assertions.

Closed-form bounds
Depending on the parameter constellation, we have given bounds respectively exact values for the Hellinger integrals, which can be obtained with the help of recursions (17) (together with (19) respectively (p-viii)) which are "stepwise fully evaluable" but generally seem not to admit a closed-form representation in the observation horizons n; consequently, the exact time-behaviour of (the bounds of) the Hellinger integrals can generally not be seen explicitly. To avoid this intransparency (at the expense of losing some precision) one can approximate (17) by a recursion that allows for a closed-form representation. Accordingly, we shall employ (context-adapted) linear inhomogeneous difference equations a 0 := 0 ; a n := ξ ( a n−1 ) + ρ n−1 , n ∈ N, with for some constants c ∈] − ∞, 0[, d ∈]0, 1[, K 1 , K 2 , κ, ν ∈ R with 0 ≤ ν < κ < d. As usual, one gets the closed-form representation a n = a hom n + c n with a hom which immediately leads for all n ∈ N to n k=1 Notice that for the special case K 2 = −K 1 > 0 one has from (33) for all integers n ≥ 2 the relation ρ n−1 < 0 and thus a n − a hom n < 0, leading to c n < 0 and n k=1 c n < 0 .
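The closed-form representation of such linear inhomogeneous difference equations can be checked numerically. The Python sketch below assumes that the linear map is ξ(x) = c + d·x and that the inhomogeneity is ρ_n = K_1 κ^n + K_2 ν^n (with the convention 0^0 = 1); these two forms are assumptions, chosen to be consistent with the constants listed above and with the sign statement for the special case K_2 = −K_1 > 0. Under these assumptions the homogeneous part is c(1 − d^n)/(1 − d) and the correction term is K_1 (d^n − κ^n)/(d − κ) + K_2 (d^n − ν^n)/(d − ν).

```python
def iterate_recursion(c, d, k1, k2, kappa, nu, n):
    """Return [a_1, ..., a_n] for a_0 = 0, a_j = c + d*a_{j-1} + rho_{j-1} (assumed forms)."""
    values, prev = [], 0.0
    for j in range(1, n + 1):
        rho = k1 * kappa ** (j - 1) + k2 * nu ** (j - 1)
        prev = c + d * prev + rho
        values.append(prev)
    return values

def homogeneous_part(c, d, n):
    """Solution of a_j = c + d*a_{j-1}, a_0 = 0, at time n."""
    return c * (1.0 - d ** n) / (1.0 - d)

def correction_term(d, k1, k2, kappa, nu, n):
    """Accumulated effect of the inhomogeneity up to time n."""
    return (k1 * (d ** n - kappa ** n) / (d - kappa)
            + k2 * (d ** n - nu ** n) / (d - nu))

# Hypothetical constants with c < 0, 0 <= nu < kappa < d < 1 and K2 = -K1 > 0.
C, D, K1, K2, KAPPA, NU = -0.3, 0.8, -0.05, 0.05, 0.5, 0.2

iterated = iterate_recursion(C, D, K1, K2, KAPPA, NU, 20)
closed = [homogeneous_part(C, D, n) + correction_term(D, K1, K2, KAPPA, NU, n)
          for n in range(1, 21)]
corrections = [correction_term(D, K1, K2, KAPPA, NU, n) for n in range(1, 21)]
```

In this special case the correction term vanishes for n = 1 and is strictly negative for all n ≥ 2, so its partial sums are negative as well.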
In the following, we appropriately apply (31)- (35) to the different parameter contexts of Section 3.

Closed-form lower bounds
Let (β A , β H , α A , α H , λ) ∈ P×]0, 1[. We have seen in the Sections 3.1 and 3.2 that the determination of the exact values and the lower bounds had (more or less) identical structure: by the nonlinear recursion (cf. (17), (23)) and finally end up with (cf. (21), (20)) exp a which is either interpreted as bound B L λ,n in the parameter case (β A , β H , α A , α H , λ) ∈ (P SP \P SP,1 ) ×]0, 1[ or as exact value V λ,n in the parameter case (β A , β H , α A , α H , λ) ∈ (P NI ∪ P SP,1 )×]0, 1[ (where we achieved some further simplifications above). Since is strictly negative, strictly decreasing and converges to the unique solution x we use the following approximative linear recursion in order to obtain a closed-form lower bound for both (here identically treatable) cases L, E: i.e. we replace the nonlinear function ξ and reduce the error we face by adding the "correction-term" In other words, by means of the two functions on the domain [0, ∞[ we use (31), (32), (33) with constants d : ∈ R\{d, 1}, K 2 := 0, ν := 0. Let us first present some fundamental properties which will be proved in Appendix A.2: for all n ∈ N.
is strictly decreasing.
Applying Theorem 2.2, Lemma 4.1 as well as the formulae (19), (34) and (35), one gets Theorem 4.2. For all (β A , β H , α A , α H , λ) ∈ P×]0, 1[ and all initial population sizes ω 0 ∈ N the following assertions hold: (a) for all observation horizons n ∈ N the Hellinger integral can be bounded from below by the closed-form bounds H λ (P A,n ||P H,n ) > C L λ,n given by (b) the sequence C L λ,n n∈N is strictly decreasing.
In order to get an "explicit" lower bound which does not rely on the implicitly given fixed point x (q ⋆ λ ) 0 , one can replace the latter by a close explicit lower approximate x and proceed completely analogously, leading to a smaller lower bound (say) C L λ,n < C L λ,n in assertion (a) of Theorem 4.2; in the corresponding assertions (b), (c) and (d) one then has to replace C L λ,n by C L λ,n and x where if q ⋆ λ ≥ 1; this will be used as an auxiliary tool for the diffusion-limit-concerning proof of Lemma A.3(c) in the appendix.
represents the existing negative intersection of the tangent of ξ

Closed-form upper bounds
In order to achieve closed-form upper bounds, we proceed in principle as in the previous Section 4.1. However, the situation is now more diverse since we have to start from Section 3.3, which carries much more "nonuniqueness" respectively variety than the corresponding Sections 3.1 and 3.2 which we used as a starting point for the investigations in Section 4.1.
Notice first that for the subcases P SP,3ab ×]0, 1[ and P SP,3c ×]0, 1[ (cf. Subsections 3.3(a2),(a3),(a4)) one can achieve a closed-form upper bound without further investigations: if one chooses q U λ = β λ (and thus, the slope s U λ = 0), then by properties (p-i), (p-v) one has a (q) However, there might exist (and for P λ,≤0 SP,3a definitely exists) choices (p U λ , q U λ ) which lead to (fully or eventually partially) tighter upper bounds B U λ,n but for which the non-linear recursion (17) is nontrivial. Such potential cases, for which in particular 0 < q U λ < β λ and 0 < p U λ ≤ α λ holds, will be treated in the following; since the parameter constellation (β A , β H , α A , α H , λ) ∈ (P SP,3d ∪ P SP,4 )×]0, 1[ does not meet this requirement, let us fix (β A , β H , α A , α H , λ) ∈ (P NI ∪ P SP,1 ∪ P SP,2 ∪ P SP,3ab ∪ P SP,3c )×]0, 1[ where we also include the two setups P NI ∪ P SP,1 for which we want to replace the recursive, non-closed-form exact values by closed-form upper bounds. For this situation, we determined recursive upper bounds respectively exact values in a (more or less) identical structure which is also very close to the one given by (37) to (38): choose q G λ for G = U respectively G = E subject to the corresponding parameter case which leads to a (q G λ ) 1 = s G λ = q G λ − β λ < 0, compute the (rest of the) sequence a choose p G λ subject to the corresponding parameter case and evaluate According to (p-ii), the fundamentally important sequence a (q G λ ) n n∈N is strictly negative, strictly decreasing and converges to the unique solution x For an upper bound of the sequence a (q G λ ) n we introduce the recursion a (q G λ ) 0 i.e. 
we replace the nonlinear function ξ (q G λ ) λ (x) = q G λ · e x − β λ by the secant line of ξ (q G λ ) λ across its arguments x (q G λ ) 0 and 0, defined by and reduce the error we face by adding the "correction-term" In other words, by means of (42) and the function on the domain [0, ∞[ we use (31), (32), (33) with the constants d : The following fundamental properties will be proved in the appendix: for all n ∈ N, with equality iff n = 1.
is strictly decreasing.
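The secant-line idea described above can be checked numerically. The following is a minimal sketch, assuming only the map ξ(x) = q · e^x − β λ with 0 < q < β λ and the starting value a 0 = 0 (so that a 1 = q − β λ); the function names and the numerical values are hypothetical, and the additional "correction-term" of the text is deliberately omitted, so the linear sequence computed here is a plain secant bound and not the refined closed-form bound of Lemma 4.4.

```python
import math


def xi(x, q, beta_lam):
    """Nonlinear map xi(x) = q * exp(x) - beta_lam driving the recursion."""
    return q * math.exp(x) - beta_lam


def negative_fixed_point(q, beta_lam, tol=1e-14):
    """Bisection for the unique negative solution x0 of xi(x) = x (assumes 0 < q < beta_lam)."""
    # g(x) = xi(x) - x satisfies g(-beta_lam) = q*exp(-beta_lam) > 0 and g(0) = q - beta_lam < 0.
    lo, hi = -beta_lam, 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if xi(mid, q, beta_lam) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nonlinear_sequence(q, beta_lam, n_max):
    """a_1, ..., a_{n_max} with a_0 = 0 and a_n = xi(a_{n-1})."""
    a, seq = 0.0, []
    for _ in range(n_max):
        a = xi(a, q, beta_lam)
        seq.append(a)
    return seq


def secant_sequence(q, beta_lam, n_max):
    """Linear recursion obtained by replacing xi with its secant through x0 and 0."""
    x0 = negative_fixed_point(q, beta_lam)
    # The secant passes through (x0, x0) and (0, q - beta_lam).
    slope = 1.0 - (q - beta_lam) / x0
    b, seq = 0.0, []
    for _ in range(n_max):
        b = slope * b + (q - beta_lam)
        seq.append(b)
    return seq
```

Since ξ is convex and increasing, the secant lies above ξ on [x 0 , 0], so the linear sequence dominates the nonlinear one, with equality at n = 1. Because the linear recursion has a constant slope, it admits a closed-form solution, which is what makes it attractive as a substitute.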
In order to get an "explicit" upper bound which does not rely on the implicitly given fixed point x (q G λ ) 0 (G ∈ {U, E}), one can replace the latter by a close explicit upper approximate x and proceed completely analogously, leading to a larger upper bound (say) C G λ,n > C G λ,n in assertion (a) of Theorem 4.5; in the corresponding assertions (b), (c) and (d) one then has to replace C G λ,n by C G λ,n and x . One possibility along these lines is the choice which is exactly the unique negative solution to the quadratic equation By inspection of the first two derivatives, one gets Q , 0]. Such a situation will be used as an auxiliary tool for the proof of Lemma A.3(c) in the appendix.
Along these lines of branching-type diffusion limits, it makes sense to consider the solutions of two SDEs (57) with different fixed parameter sets (η, κ A , σ) and (η, κ H , σ), determine for each of them a corresponding approximating GWI, investigate the Hellinger integral between the laws of these two GWI, and finally calculate the Hellinger integral (bounds) limit as the GWI approach their SDE solutions. Notice that for technical reasons (which will be explained below), the constants η and σ ought to be independent of A, H in our current context.
In order to make the above-mentioned limit procedure rigorous, it is reasonable to work with appropriate approximations such that in each convergence step m one faces the setup (P NI ∪ P SP,1 )×]0, 1[ (i.e. the non-immigration or the equal-fraction case), where the corresponding Hellinger integral can be calculated exactly in a recursive way (cf. Section 3.1). Let us explain the details in the following.
Consider a sequence of GWI X (m) m∈N with probability laws P (m) • on a measurable space (Ω, A), where as above the subscript • stands for either the hypothesis H or the alternative A. Analogously to (5), we use for each fixed step m ∈ N the representation X (m) := X (m) n , n ∈ N with where under the law P Here and henceforth, we always assume that the approximation step m is large enough to ensure that β living on the state space E (m) := 1 m N 0 . From (60) one can see immediately the necessity of having σ to be independent of A, H because for the required absolute continuity in (6) both models at stake have to "live" on the same time-scale τ (m) t := σ 2 mt . For this setup, one obtains the following convergence result: 0, ∞[ and X (m) be as defined in (58) to (60). Furthermore, let us suppose that lim m→∞ where W • t denotes a standard Brownian motion with respect to the limit probability measure P • .
Notice that the condition η σ 2 ≥ 1 2 can be interpreted in our approximation setup (59) as α The corresponding proof of Theorem 5.1 -which is outlined in Appendix A.3 -is an adaptation of the proof of Theorem 9.1.3 in Ethier and Kurtz [13] which deals with drift-parameters η = 0, κ • = 0 in the SDE (61) whose solution is approached on a σ−independent time scale by a sequence of (critical) Galton-Watson processes without immigration but with general offspring distribution with mean 1 and variance σ. Notice that due to (59) the latter is inconsistent with our Poissonian setup, but this is compensated by our chosen σ−dependent time scale. Furthermore, (59) is also inconsistent with the other concrete parameter choices in the above-mentioned corresponding references.
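The role of the σ−dependent time scale can be made concrete by a small moment computation. The sketch below is illustrative only and rests on several assumptions that are not confirmed by the text as reproduced here: the limit SDE is taken to have drift η − κx and squared diffusion coefficient σ 2 x, and the step-m Poisson parameters are taken as β (m) = 1 − κ/(σ 2 m) and α (m) = β (m) · η/σ 2 , a hypothetical parametrization that is merely consistent with the remarks above (state space (1/m) N 0 , time scale σ 2 mt, and β (m) ≥ 1/2 for m ≥ 2κ/σ 2 ). All function names are hypothetical.

```python
def gwi_parameters(m, eta, kappa, sigma):
    """Hypothetical step-m offspring mean beta_m and immigration mean alpha_m (an assumption)."""
    beta_m = 1.0 - kappa / (sigma**2 * m)
    alpha_m = beta_m * eta / sigma**2
    return beta_m, alpha_m


def rescaled_drift_and_variance(x, m, eta, kappa, sigma):
    """Conditional mean and variance of the rescaled increment per unit of rescaled time.

    The rescaled process is X_n / m observed at sigma^2 * m generations per unit time.
    Given X_{n-1} = m * x, the next generation size is Poisson with mean
    beta_m * m * x + alpha_m (Poisson offspring plus Poisson immigration).
    """
    beta_m, alpha_m = gwi_parameters(m, eta, kappa, sigma)
    steps_per_unit_time = sigma**2 * m
    size = m * x                                        # unscaled population size
    mean_increment = (beta_m - 1.0) * size + alpha_m    # E[X_n - X_{n-1} | X_{n-1} = size]
    var_increment = beta_m * size + alpha_m             # Poisson variance equals its mean
    drift = steps_per_unit_time * mean_increment / m
    variance = steps_per_unit_time * var_increment / m**2
    return drift, variance
```

Under these assumptions, the drift equals β (m) η − κx and the variance equals σ 2 (β (m) x + α (m) /m), so both approach the coefficients η − κx and σ 2 x of the assumed limit as m grows. This is only a heuristic consistency check and not a substitute for the generator convergence argument of Appendix A.3.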
Let us finally present the corresponding desired limit assertions as the approximation step m tends to infinity, by making use of the quantities Notice that the components L

Power divergences and relative entropy
All the results of the previous sections carry over correspondingly from the Hellinger integrals H λ (·||·) (λ ∈ ]0, 1[) to the power divergences I λ (·||·) by virtue of the relation (cf. (1)) I λ (P A,n ||P H,n ) = (1 − H λ (P A,n ||P H,n )) / (λ · (1 − λ)). In particular, this leads to bounds on I λ (P A ||P H ) which are tighter than the general rudimentary bound (4) connected with (13). Furthermore, it is well known that in general the relative entropy defined by (2) satisfies I(P A ||P H ) = lim λ↗1 I λ (P A ||P H ), see e.g. Liese and Vajda [47]. Accordingly, for our context of GWI we can use (71) in combination with the recursive exact values respectively recursive lower bounds of Theorem 2.2 and Section 3.2 to obtain the following closed-form exact values respectively closed-form upper bounds of the relative entropy I(P A,n ||P H,n ): , all initial population sizes ω 0 ∈ N and all observation horizons n ∈ N (b) For all (β A , β H , α A , α H ) ∈ P SP \P SP,1 , all initial population sizes ω 0 ∈ N and all observation horizons n ∈ N it holds I(P A,n ||P H,n ) ≤ E U n , where Remark 6.2. The n−behaviour of (the bounds of) the relative entropy I(P A,n ||P H,n ) in Theorem 6.1 is influenced by the following facts: • β A · (log(β A /β H ) − 1) + β H ≥ 0 with equality iff β A = β H .
• In the case β A = 1 of (73), there holds In contrast, in order to derive (semi-)closed-form lower bounds of the relative entropy I(P A,n ||P H,n ) we use (71) in combination with the recursive upper bounds of Theorem 2.2(b) and appropriately adapted detailed analyses along the lines of Section 3.3. This amounts to where for all y ∈ [0, ∞[ we define the -possibly negatively valued -finite bound component ln αA+y αH+βHy + β H 1 − αA+y αH+βHy · αA 2 · n 2 + ω 0 + αA and for all k ∈ N 0 the -possibly negatively valued -finite bound component (76) Furthermore, on P SP,4 we set E L,hor n := 0 for all n ∈ N whereas on P SP \(P SP,1 ∪ P SP,4 ) we define In the subcases P SP,2 ∪P SP,3ab ∪ P SP,3c ∪ P SP,4 one gets even E L n > 0 for all ω 0 ∈ N and all n ∈ N. In the subcase P SP,3d , one obtains for each fixed n ∈ N and each fixed ω 0 ∈ N the strict positivity E L n > 0 if ∂ ∂y E L,tan y,n (y * ) = 0, where y * := αA−αH βH−βA ∈ N and hence ∂ ∂y E L,tan and -possibly negatively valued -finite bound component For the cases P SP,2 ∪ P SP,3ab ∪ P SP,3c one gets even E L n > 0 for all ω 0 ∈ N and all n ∈ N.
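The passage from Hellinger integrals to power divergences and, in the limit λ ↗ 1, to the relative entropy can be illustrated in the simplest possible situation of two single Poisson laws. The following sketch assumes the conventions H λ = ∫ p A ^λ p H ^(1−λ) dµ and I λ = (H λ − 1)/(λ(λ − 1)); the function names and the numerical values are hypothetical, and the example concerns two Poisson distributions only, not the full GWI path laws treated in Theorem 6.1.

```python
import math


def hellinger_integral_poisson(a, b, lam):
    """H_lam(Poi(a) || Poi(b)) = sum_k p_a(k)^lam * p_b(k)^(1 - lam), summed in closed form."""
    return math.exp(a**lam * b**(1.0 - lam) - lam * a - (1.0 - lam) * b)


def power_divergence_poisson(a, b, lam):
    """I_lam(Poi(a) || Poi(b)) = (H_lam - 1) / (lam * (lam - 1)) for lam not in {0, 1}."""
    return (hellinger_integral_poisson(a, b, lam) - 1.0) / (lam * (lam - 1.0))


def relative_entropy_poisson(a, b):
    """Kullback-Leibler divergence of Poi(a) from Poi(b): a * (log(a / b) - 1) + b."""
    return a * (math.log(a / b) - 1.0) + b
```

For instance, evaluating `power_divergence_poisson(1.3, 0.9, lam)` for values of `lam` approaching 1 from below gives numbers approaching `relative_entropy_poisson(1.3, 0.9)`. The last function is exactly the expression β A · (log(β A /β H ) − 1) + β H appearing in Remark 6.2, which is nonnegative and vanishes only for equal parameters.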
For the diffusion-limit of the relative entropy we obtain closed-form exact values:

Applications
As already mentioned in the introduction, there are numerous applications of both ingredients: power divergences resp. Hellinger integrals resp. relative entropy on the one hand and Galton-Watson branching processes with immigration on the other hand. In order to indicate the concrete applicability of our combined investigations, for the sake of brevity we confine ourselves to some issues in the context of Bayesian decision making (BDM) and Neyman-Pearson testing (NPT). In BDM, we decide here between an action d H "associated with" the (say) hypothesis law P H and an action d A "associated with" the (say) alternative law P A , based on the sample path observation X n := {X l : l ∈ {0, 1, . . . , n} } of the GWI-generation-sizes up to observation horizon n ∈ N. Following the lines of Stummer and Vajda [67] (adapted to our branching process context), for BDM let us consider as admissible decision rules δ n : Ω n → {d H , d A } the ones generated by all path sets G n ∈ Ω n through δ n (X n ) := δ Gn (X n ) := d A if X n ∈ G n , and d H if X n ∉ G n , as well as loss functions of the form with pregiven constants L A > 0, L H > 0 (e.g. arising as bounds from quantities in worst-case scenarios); notice that in (80), d H is assumed to be a zero-loss action under H and d A a zero-loss action under A. By definition, the Bayes decision rule δ Gn,min minimizes -over G n -the mean decision loss as well as the lower bound H λ (P A,n ||P H,n ) which implies in particular the "direct" lower bound By using (83) (respectively (84)) together with the exact values and the upper (respectively lower) bounds of the Hellinger integrals H λ (P A,n ||P H,n ) derived in the preceding sections, we end up with upper (respectively lower) bounds of the Bayes risk R n . For different types of -mainly parameter estimation (squared-error type loss function) concerning -Bayesian analyses based on GW(I) generation size observations, see e.g. 
Jagers [30], Heyde [23], Heyde and Johnstone [24], Johnson et al. [32], Basawa and Rao [4], Basawa and Scott [6], Scott [60], Guttorp [19], Yanev and Tsokos [74], Mendoza and Gutierrez-Pena [54], and the references therein.
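To indicate how a Hellinger integral controls a Bayes risk, the following sketch treats the observation horizon n = 1, where (given X 0 = ω 0 ) the generation size X 1 is Poisson distributed with mean β • ω 0 + α • . It uses the elementary inequality min(u, v) ≤ u^λ · v^(1−λ) for u, v ≥ 0 and λ ∈ ]0, 1[, which is a standard route to bounds of this type but is not claimed to reproduce the exact form of (83). The prior probability `prior_A`, the truncation level `k_max` and all function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability mass function, computed on the log scale for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def bayes_risk_one_step(mean_A, mean_H, prior_A, loss_A, loss_H, k_max=200):
    """Minimal mean decision loss for one Poisson observation (sum truncated at k_max).

    Deciding d_H under A costs loss_A, deciding d_A under H costs loss_H;
    the minimizing rule picks, for each outcome k, the smaller weighted mass.
    """
    prior_H = 1.0 - prior_A
    return sum(
        min(prior_A * loss_A * poisson_pmf(k, mean_A),
            prior_H * loss_H * poisson_pmf(k, mean_H))
        for k in range(k_max + 1)
    )


def hellinger_upper_bound(mean_A, mean_H, prior_A, loss_A, loss_H, lam):
    """Upper bound w_A^lam * w_H^(1-lam) * H_lam from min(u, v) <= u^lam * v^(1-lam)."""
    w_A = prior_A * loss_A
    w_H = (1.0 - prior_A) * loss_H
    hellinger = math.exp(mean_A**lam * mean_H**(1.0 - lam)
                         - lam * mean_A - (1.0 - lam) * mean_H)
    return w_A**lam * w_H**(1.0 - lam) * hellinger
```

With, say, β A = 1.2, α A = 0.5, β H = 0.8, α H = 0.7 and ω 0 = 3, the two means are 4.1 and 3.1, and the bound holds for every λ ∈ ]0, 1[; one may then minimize over λ to obtain the tightest bound of this form. For larger n the same inequality applies with the path-law Hellinger integral H λ (P A,n ||P H,n ) in place of the single Poisson one.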
Alternatively to the BDM applications above, let us now briefly deal with the corresponding NPT framework with randomized tests T n : Ω n → [0, 1] of the hypothesis P H against the alternative P A , based on the GWI-generation-size sample path observations X n := {X l : l ∈ {0, 1, . . . , n} }. In contrast to (81), (82) a Neyman-Pearson test minimizes -over T n -the type II error probability Ωn (1 − T n ) dP A,n in the class of the tests for which the type I error probability Ωn T n dP H,n is at most ς ∈]0, 1[. The corresponding minimal type II error probability E ς (P A,n ||P H,n ) := inf Tn: Ωn Tn dPH,n≤ς Ωn (1 − T n ) dP A,n can for all ς ∈]0, 1[, λ ∈]0, 1[, n ∈ N be bounded from above by which is an adaptation of a general result of Krafft and Plachky [35], see also Liese and Vajda [47] as well as Stummer and Vajda [67]. Hence, by combining (85) with the exact values respectively upper bounds of the Hellinger integrals H 1−λ (P A,n ||P H,n ) from the preceding sections, we obtain for our context of GWI with Poisson offspring and Poisson immigration (including the non-immigration case) some upper bounds of E ς (P A,n ||P H,n ), which can also be immediately rewritten as lower bounds for the power 1 − E ς (P A,n ||P H,n ) of a most powerful test at level ς. In contrast to such finite-time-horizon results, for the (to our context) incompatible setup of GWI with Poisson offspring but nonstochastic immigration of constant value 1, the asymptotic rates of decrease as n → ∞ of the unconstrained type II error probabilities as well as the type I error probabilities were studied in Linkov and Lunyova [52] by a different approach employing also Hellinger integrals. Some other types of GW(I) concerning Neyman-Pearson testing investigations different to ours can be found e.g. in Basawa and Scott [5], Feigin [14], Sweeting [68], Basawa and Scott [6], and the references therein.
For the sake of brevity, a further more detailed discussion of GWI statistical issues along the lines of this section as well as power-divergences-connected goodness-of-fit investigations will appear in a forthcoming paper.

A.2. Proofs of Section 4
Proof of Lemma 4.1 Recall the fundamental nonlinear recursion of a (q ⋆ λ ) n n∈N0 (cf. (37), (38)), the corresponding "substitute" inhomogeneous linear recursion of a (q ⋆ λ ) n n∈N0 (cf. (39), (40), (41)) and its homogeneous linear relative a (q ⋆ λ ),hom n n∈N0 (cf. (31), (32)) which by (34) and (40) As an auxiliary step, let us compare x → ξ We are now ready to prove part (a) by induction. For n = 1, we easily see that a and the latter is obviously true. To continue, let us assume that holds. From this, (41), (87) and (88) we obtain n+1 holds. In order to show (b), we make use of the straightforward representation which implies that the sequence a (q ⋆ λ ) n n∈N is strictly decreasing since for all k ∈ N 0 there holds by (41) The final assertion follows immediately from (88) and the closed-form representation (34) with the choices K 1 , K 2 , κ, ν, c given right after (42).
Proof of Lemma 4.4 For P SP \(P SP,3d ∪ P SP,4 ) we deal with the fundamental nonlinear recursion of a (q G λ ) n n∈N0 , G ∈ {E, U } (cf. (48), (27)), the corresponding "substitute" inhomogeneous linear recursion of a (q G λ ) n n∈N0 (cf. (49), (50), (51)) and its homogeneous linear counterpart a (q G λ ),hom n n∈N0 (cf. (31), (32)) which by (34) and (50) In analogy to the Proof of Lemma 4.1, we use the quadratic function , 0] a strict upper functional bound of ξ (q G λ ) λ (·). To start with the proof of part (a), let us first observe for n = 1 the obvious relation a Furthermore, let us assume that a N) holds. From this, (51), (89), (90) and the appropriately adapted version of a (·),hom n we obtain the desired inequality a Moreover, the property (b) follows from the representation which implies that the sequence a (q G λ ) n n∈N is strictly decreasing since for all k ∈ N 0 one has ρ (q G λ ) k + (q G λ − β λ ) < 0. Finally, part (c) follows immediately from (90) and the closed-form representation (34) with the choices K 1 , K 2 , κ, ν, c given right after (52).

A.3. Proofs of Section 5
Proof of Theorem 5.1 As already mentioned above, one can adapt the proof of Theorem 9.1.3 in Ethier and Kurtz [13], who deal with drift-parameters η = 0, κ • = 0, and the different setup of σ−independent time-scale and a sequence of critical Galton-Watson processes without immigration with general offspring distribution. For the sake of brevity, we basically outline here only the main differences to their proof; for similar limit investigations involving offspring/immigration distributions and parametrizations which are incompatible with ours, see e.g. Sriram [61].
As a first step, let us define the generator which corresponds to the diffusion process X governed by (61). In connection with (58), we study where the Y But (91) follows mainly from the next with the usual convention S (m) 0 Proof of Lemma A.2 Let us fix f ∈ C ∞ c [0, ∞) . From the involved Poissonian expectations it is easy to see that lim and thus (92) holds for x = 0. Accordingly, we next consider the case x ∈ E (m) \{0}, with fixed m ∈ N. From as well as With our choice β where for the case η = κ = 0 we use the convention o 1 m ≡ 0. Combining (93) to (96) and the centering EP • S (m) mx = 0, the left hand side of equation (92) becomes which immediately leads to the right hand side of (92).
To proceed with the proof of Theorem 5.1, we obtain for m ≥ 2κ • /σ 2 the inequality β (m) • ≥ 1/2 and accordingly for all v ∈]0, 1[, Suppose that the support of f is contained in the interval [0, c]. Correspondingly, for v ≤ 1 − 2c/x the integrand in ε (m) (x) is zero and hence with (97) we can estimate From this, one can deduce lim m→∞ sup x∈E (m) ε (m) (x) = 0 -and thus (91) -in the same manner as at the end of the proof of Theorem 9.1.3 in [13] (by means of the dominated convergence theorem).
The following lemma is the main tool for the proof of Theorem 5.3 below.
Lemma A.3. Let (κ A , κ H , η, λ) ∈ ( P N I ∪ P SP,1 )×]0, 1[. By using the quantities κ λ : (64), one gets for all t > 0 (a) lim (m) lim m→∞ ϑ (m) Proof of Lemma A.3 For each of the assertions (a) to (m), we will make use of l'Hospital's rule. To begin with, we obtain for arbitrary µ, ν ∈ R From this, (a) follows immediately and (b) can be deduced by For the proof of the first part of (c), we rely on the inequalities x From (46) For the first part of (e), we use the general limit lim From this and (c), the second part of (e) is obvious. The limit (f) can be obtained from (d) and (e). The assertions (g) respectively (h) respectively (i) follow from (d) respectively (e) respectively (f) by using the general relation lim m→∞ 1 + xm m m = e limm→∞ xm . The last four limits (j) to (m) are straightforward implications of (a) to (i).
Proof of Theorem 5.3 It suffices to compute the limits of the bounds given in Corollary 5.2 as m tends to infinity. This is done by applying Lemma A.3 which provides corresponding limits of various involved quantities. Accordingly, for all t > 0 the lower bound (66) can be obtained from (62) by ∀ n ∈ N : lim λ↗1 a (q λ ) n = 0.