Exploiting the Massey Gap

In this paper, we find new refinements of the Massey inequality, which relates the Shannon and guessing entropies, introducing a new concept: the Massey gap. By shrinking the Massey gap, we improve upon all previous work without introducing any new parameters, providing closed-form strict refinements, as well as a numerical procedure improving them even further.


Introduction
The guessing entropy associated with a probabilistic vector p = (p_1, p_2, ..., p_n) with p_1 ≥ ... ≥ p_n > 0 is the expected value of the random variable G(p) given by Pr[G(p) = i] = p_i for all i ∈ {1, ..., n}, i.e., E[G(p)] = ∑_{i=1}^n i p_i, and it corresponds to the minimal average number of successive guesses required to determine the value of a random variable distributed according to p. Meanwhile, the Shannon entropy of p is H(p) = −∑_{i=1}^n p_i log p_i, and the first relation between the two was provided by J. Massey in [1], namely that if H(p) ≥ 2, then E[G(p)] ≥ 2^{H(p)−2} + 1.
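These definitions translate directly into code. The following minimal sketch (the helper names and the uniform example distribution are ours, not the paper's) computes both entropies and checks Massey's bound:

```python
import math

def shannon_entropy(p):
    """H(p) = -sum p_i log2 p_i, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def guessing_entropy(p):
    """E[G(p)] = sum i * p_i, with p sorted in decreasing order."""
    q = sorted(p, reverse=True)
    return sum(i * x for i, x in enumerate(q, start=1))

# Uniform distribution on 8 outcomes: H(p) = 3 >= 2, so Massey's bound applies.
p = [1 / 8] * 8
H = shannon_entropy(p)                # 3.0
massey = 2 ** (H - 2) + 1             # 2^{H-2} + 1 = 3.0
assert guessing_entropy(p) >= massey  # E[G(p)] = 4.5
```

On the uniform example, the Massey bound of 3 sits well below the true guesswork of 4.5, illustrating the gap that the rest of the paper works to shrink.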
While the Massey inequality has found many applications, such as password guessing [2], and has encouraged multiple developments, such as moment inequalities (e.g., [3,4]), there are only two known direct refinements of it, both published in 2019. In an ISIT paper, Popescu and Choudary [5] found that, subject to the same conditions as the Massey inequality,

    E[G(p)] ≥ 2^{H(p)+2p_n−2} + 1 − p_n   (1)
            ≥ 2^{H(p)+p_n−2} + 1 − (1/2) p_n   (2)
            ≥ 2^{H(p)−2} + 1.   (3)

Meanwhile, Rioul's inequality [6], published in a CHES paper [7], states that for all values of H(p), we have

    E[G(p)] ≥ 2^{H(p)}/e,   (4)

a bound that is similar to the Massey inequality and refines it when H(p) ≥ log(e/(1 − e/4)). Remarkably, the refinement in [5] is actually strict, since p has finite support, |p| = n ≤ 1/inf p < ∞. In fact, applying the method from [5], it can easily be observed that Equation (4) can be improved further. For example, applying just the first step of the method presented in [5], we obtain that for H(p) ≥ log(e/(2 ln 2)), we have the new refinement of Equation (4):

    E[G(p)] ≥ 2^{H(p)+p_n}/e − (1/2) p_n.

Motivated by this insight, in this paper, we find and fully investigate a method for refining an inequality about finite-support vectors using only the inequality itself. Here, we apply our analysis to the famous Massey inequality and explore multiple avenues of applying our method: shrinking the Massey gap.

Results
In this section, we present our results, as well as their proofs. In Section 2.1, we find a refinement of the initial result of [5], i.e., Equation (2), which we improve in Section 2.2 beyond even the strongest result of [5], i.e., Equation (1). Then, in Section 2.3, we introduce a new concept, the Massey gap, and use it to generalize the method of Section 2.2. Next, in Section 2.4, we show how, using the Massey gap, we can almost always improve our refinement by rebalancing the middle of the distribution, and finally, in Section 2.5, we find closed-form, easy-to-compute refinements of the work of [5].

An Improved Refinement to the Massey Inequality
In this section, we find a new relation between the Shannon and guessing entropies, dependent on the minimal probability of a given distribution, refining the initial bound, Equation (2), of [5].
Consider a decreasing probabilistic vector p = (p_1, p_2, ..., p_n) with H(p) ≥ 2. Inspired by the trick in [5], we construct the new probabilistic vector q = (p_1, p_2, ..., p_{n−1}, (1−α)p_n, αp_n), which is decreasing and strictly positive if and only if α ∈ (0, 1/2]. From the grouping property of entropy,

    H(q) = H(p) + p_n h(α),   (5)

where h(α) = −α log α − (1−α) log(1−α) is the binary entropy function. Moreover, we calculate:

    E[G(q)] = E[G(p)] + α p_n.   (6)

Thus, because H(q) ≥ H(p) ≥ 2, we can apply the Massey inequality to q to obtain:

    E[G(p)] ≥ 2^{H(p)+p_n h(α)−2} + 1 − α p_n.   (7)

Now, we note that h(0) = 0 and h(1/2) = 1 > 1/(2 ln 2) ≈ 0.7213; thus, by the strict concavity of h, we have h(α) > α/ln 2 for α ∈ (0, 1/2]. Using the inequality 2^x > 1 + x ln 2 for x = p_n h(α) > 0, we find that:

    2^{H(p)+p_n h(α)−2} + 1 − α p_n > 2^{H(p)−2} + 1 + p_n (h(α) ln 2 − α) > 2^{H(p)−2} + 1,

i.e., our lower bound is always strictly tighter than Massey's. For the particular value α = 1/2, the bound above reduces to the initial bound, Equation (2), of [5]. Thus, we conclude this subsection with the following refinement of Equation (2), summarizing the above:

Theorem 1. For strictly positive descending probabilistic vectors p ∈ R^n with H(p) ≥ 2, we have:

    E[G(p)] ≥ sup_{α ∈ (0,1/2]} [2^{H(p)+p_n h(α)−2} + 1 − α p_n] ≥ 2^{H(p)+p_n−2} + 1 − (1/2) p_n > 2^{H(p)−2} + 1.

Proof. The first inequality follows by taking the supremum over α in Equation (7). The second and third inequalities follow by taking α = 1/2, which yields the right-hand side of Equation (2), where the strictness of the last inequality follows from the strictness of (7), the right-hand side of which reduces to Massey's bound when α = 0.
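The tail-split construction is easy to check numerically. The sketch below (with an example distribution of our own choosing) verifies the grouping identity for H(q), the shift in E[G(q)], and that the resulting bound of Equation (7) is sandwiched strictly between Massey's bound and the true guesswork:

```python
import math

def H(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def EG(p):
    """Guessing entropy of a decreasing probability vector."""
    return sum(i * x for i, x in enumerate(p, start=1))

def h(a):
    """Binary entropy function."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a) if 0 < a < 1 else 0.0

p = [0.3, 0.25, 0.2, 0.15, 0.1]   # decreasing, H(p) ≈ 2.23 >= 2
alpha = 0.5
pn = p[-1]
q = p[:-1] + [(1 - alpha) * pn, alpha * pn]  # split the tail mass p_n

# Grouping property and guessing-entropy shift (Equations (5) and (6)).
assert abs(H(q) - (H(p) + pn * h(alpha))) < 1e-9
assert abs(EG(q) - (EG(p) + alpha * pn)) < 1e-9

# Equation (7): Massey applied to q, rearranged as a bound for p.
bound = 2 ** (H(p) + pn * h(alpha) - 2) + 1 - alpha * pn
assert 2 ** (H(p) - 2) + 1 < bound <= EG(p)
```

Here α = 1/2 recovers the bound of Equation (2); other α ∈ (0, 1/2] can be substituted to explore the supremum in Theorem 1.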
Up to this point, we have found a direct refinement of the initial result of [5]. In the next section, we leverage Theorem 1 in order to refine the strongest result, the bound in Equation (1).

Refinement through Generalization
In this section, we extend the reasoning of [5] to improve our bound from Theorem 1 beyond (1). We leverage a technique previously used in [8], namely applying the Massey inequality to increasingly complicated perturbations of the input in order to obtain ever tighter bounds for the original input.
Given an initial decreasing probabilistic vector p, we construct a list of probabilistic sequences {Q k }, which we recursively define according to the procedure in the previous subsection.
We begin by fixing an arbitrary parameter α ∈ (0, 1/2] as in the section above. Now, we explicate our construction. Denoting by Q_{k,i} the ith component of the sequence Q_k, we define the terms of the list {Q_k} as follows. We let the first term coincide with p, i.e., Q_0 = (p_1, p_2, ..., p_n, 0, 0, ...), and we define the other terms by recurrence: Q_{k+1} is obtained from Q_k by splitting its last non-zero component, of index n + k, i.e.,

    Q_{k+1,i} = Q_{k,i} for i < n + k,  Q_{k+1,n+k} = (1−α) Q_{k,n+k},  Q_{k+1,n+k+1} = α Q_{k,n+k}.

Now, we expand the recursion relations for the Shannon entropy H(Q_{k+1}) and the guessing entropy E[G(Q_{k+1})], which are analogous to Equations (5) and (6), respectively, to obtain, since Q_{k,n+k} = α^k p_n:

    H(Q_{k+1}) = H(Q_k) + α^k p_n h(α),
    E[G(Q_{k+1})] = E[G(Q_k)] + α^{k+1} p_n,

so, at each step k, we can use Q_{k+1} to obtain an approximation of E[G(Q_k)] that is tighter than Massey's. This procedure allows us to obtain ever tighter approximations of the guessing entropy. For example, fixing an arbitrary mixing rate α > 0 and step number k > 0, we apply the Massey inequality to the distribution Q_{k+1} and find, as per Equation (7), that:

    E[G(Q_k)] ≥ 2^{H(Q_k)+α^k p_n h(α)−2} + 1 − α^{k+1} p_n.   (8)

Now, this refinement trickles down, producing ever tighter bounds on the guessing entropy of lower-index elements of the list, down to the initial element p = Q_0, where the tightest of the enumerated bounds is:

    E[G(p)] ≥ 2^{H(p)+p_n h(α) s_k−2} + 1 − α p_n s_k,  with s_k = ∑_{j=0}^{k} α^j,

which, as we have shown, increases with k up to the limit:

    E[G(p)] ≥ 2^{H(p)+p_n h(α)/(1−α)−2} + 1 − α p_n/(1−α),

where the optimal value of α will be discussed in the following sections. We now conclude this section with an even stronger refinement of the Massey inequality.
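The trickle-down bounds can be sketched numerically. In the snippet below (the example values of H(p), p_n, and α are ours), the geometric partial sums s_k drive a strictly increasing sequence of bounds converging to the limiting expression with s = 1/(1 − α):

```python
import math

def h(a):
    """Binary entropy function."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a) if 0 < a < 1 else 0.0

def bound_after_k_splits(Hp, pn, alpha, k):
    """Massey's bound applied to Q_k: the tail mass p_n has been split k times,
    adding pn*h(alpha)*s to the entropy and pn*alpha*s to the guesswork,
    where s = 1 + alpha + ... + alpha^(k-1)."""
    s = sum(alpha ** j for j in range(k))
    return 2 ** (Hp + pn * h(alpha) * s - 2) + 1 - pn * alpha * s

Hp, pn, alpha = 2.23, 0.1, 0.5   # example values with H(p) >= 2
bounds = [bound_after_k_splits(Hp, pn, alpha, k) for k in range(12)]

# The sequence of bounds is strictly increasing in k ...
assert all(b1 < b2 for b1, b2 in zip(bounds, bounds[1:]))

# ... and converges to the limiting bound, where s -> 1/(1-alpha).
limit = 2 ** (Hp + pn * h(alpha) / (1 - alpha) - 2) + 1 - pn * alpha / (1 - alpha)
assert bounds[-1] < limit < bounds[-1] + 1e-3
```

At k = 0 the bound reduces to plain Massey; already a dozen splits land within 10^{-3} of the limit, so the recursion converges quickly in practice.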

Theorem 2.
For strictly positive descending probabilistic vectors p ∈ R^n with H(p) ≥ 2, we have:

    E[G(p)] ≥ sup_{α ∈ (0,1/2]} [2^{H(p)+p_n h(α)/(1−α)−2} + 1 − α p_n/(1−α)] ≥ 2^{H(p)+2p_n−2} + 1 − p_n > 2^{H(p)−2} + 1.

Proof. The first inequality follows by taking the supremum over α in the limit of Equation (8), and the rest by taking α = 1/2. Furthermore, note that Theorem 2 is a refinement of the strongest result of [5], namely Equation (1).

The Massey Gap
In this section, we reevaluate the scaffolding we used to prove Theorem 2 and re-frame our previous technique as an optimization process applied to an objective function we name the Massey gap: the function M(p) := E[G(p)] − 2^{H(p)−2} − 1, i.e., the slack in the Massey inequality.
The first observation that we make is that Theorem 2 also applies to descending probabilistic infinite sequences p, by taking n to be the size of its support, |p|. Moreover, we notice that the core of the argument was not the precise construction of the list {Q_k}, but rather the following fact:

Remark 1. Let {X_k} be a list of descending probabilistic sequences with X_0 = p and H(X_k) ≥ 2, such that M(X_k) ≥ M(X_{k+1}) for all k ∈ N. Then the quantities y_k := E[G(X_0)] − M(X_k) satisfy E[G(X_0)] ≥ y_{k+1} ≥ y_k.

Notice that this remark provides a sequence {y_k} of ever tighter lower bounds on E[G(X_0)], the weakest of which, y_0, reduces to the Massey inequality, while the strongest is given by lim_{k→∞} y_k. As an example, considering the list X_k := Q_k, this limit coincides with the result of Theorem 2.
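A small sketch makes the optimization view concrete. Writing M(p) = E[G(p)] − 2^{H(p)−2} − 1 for the slack in Massey's inequality, each tail split of Section 2.2 should shrink M (the example distribution and helper names below are ours):

```python
import math

def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def EG(p):
    return sum(i * x for i, x in enumerate(p, start=1))

def massey_gap(p):
    """M(p) = E[G(p)] - 2^(H(p)-2) - 1, the slack in Massey's inequality."""
    return EG(p) - 2 ** (H(p) - 2) - 1

def split_tail(p, alpha=0.5):
    """One rebalancing step: split the tail mass p_n into (1-alpha)p_n, alpha*p_n."""
    return p[:-1] + [(1 - alpha) * p[-1], alpha * p[-1]]

p = [0.3, 0.25, 0.2, 0.15, 0.1]   # H(p) ≈ 2.23 >= 2
gaps = [massey_gap(p)]
for _ in range(8):
    p = split_tail(p)
    gaps.append(massey_gap(p))

# Each split strictly shrinks the Massey gap while keeping it positive,
# so the implied bounds y_k keep improving.
assert all(g1 > g2 > 0 for g1, g2 in zip(gaps, gaps[1:]))
```

The gap stabilizes at a positive value, matching the limit of Theorem 2 for this α; closing the residual gap is what the remaining sections pursue.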

Mixing Probabilities
In the previous section, we presented a refinement of Equation (1) by generalizing the tail rebalancing method of [5]. In this section, we investigate whether we can obtain even better results by applying the same rebalancing technique in the middle of the distribution, rather than at the tail.
In the previous section, we showed how Theorem 2 follows from Remark 1 for some particular {X_k}. However, this choice is not necessarily unique. In this section, we consider alternative methods of optimizing the Massey gap; that is, we fix an arbitrary intermediary configuration X_k, and we find conditions for the existence of an available X_{k+1} that decreases the Massey gap, i.e., M(X_k) ≥ M(X_{k+1}).
In order to simplify the notation, for the remainder of this section, we denote the available configuration X_k by p and the potential next configuration X_{k+1} by q. Now, we are ready for a trick similar to the one in Section 2.2: we fix an arbitrary index i ∈ N such that p_i ≠ 0 and an arbitrary mixing rate α ∈ (0, 1/2], and consider q of the form:

    q_i = (1−α) p_i + α p_{i+1},  q_{i+1} = α p_i + (1−α) p_{i+1},  q_j = p_j otherwise,   (9)

with the same support as p. For q as defined above in Equation (9), we find that E[G(q)] = E[G(p)] + α (p_i − p_{i+1}), while H(q) ≥ H(p), since mixing increases the entropy; thus, the condition M(p) ≥ M(q) is equivalent to:

    α (p_i − p_{i+1}) ≤ 2^{H(p)−2} (2^{H(q)−H(p)} − 1),   (10)

which, as we have seen in the previous section, always holds at least for some i, namely i = |p|. Now, we check whether alternative values of i may be eligible, i.e., whether we may mix probabilities in the middle of the distribution. Using the inequality 2^x − 1 ≥ x ln 2 (tighter bounds may also be used), we find that to satisfy M(p) ≥ M(q), corresponding to a decrease in the Massey gap, it is sufficient that:

    α (p_i − p_{i+1}) ≤ 2^{H(p)−2} (H(q) − H(p)) ln 2,   (11)

so, depending on the value of H(p), multiple indices i and mixing ratios α may be selected for rebalancing, as outlined in the following:

Theorem 3. Let p be a descending probabilistic sequence with H(p) ≥ 2, and let i ∈ N with p_i ≠ 0 and α ∈ (0, 1/2] satisfy Equation (10) or, sufficiently, Equation (11). Then, defining q as in Equation (9), we have:

    E[G(p)] ≥ 2^{H(q)−2} + 1 − α (p_i − p_{i+1}).

Proof. We have shown that Equation (11) is a sufficient condition for M(p) ≥ M(q); thus, applying Remark 1 to the list X = (p, q, q, q, ...), the result follows.
We conclude this section by remarking that Theorem 3 has the potential to further refine bounds obtained using Theorem 1 depending on the form of the distribution p.

Strict Refinement and Optimal Mixing Rate
In this section, we consider the problem of finding the optimal mixing rate α. We notice that an optimal α must be a critical point of the difference between the two sides of Equation (10),

    D(α) = 2^{H(p)−2} (2^{H(q)−H(p)} − 1) − α (p_i − p_{i+1}) = M(p) − M(q),

whose derivative we denote by f(α). For example, we notice that H(q) is maximal at α = 1/2, so that whenever p_i > p_{i+1}, we have f(1/2) = −(p_i − p_{i+1}) < 0, and the mixing rate α = 1/2 is never optimal; thus, Theorem 2 is a strict refinement over the work of [5].
Investigating whether q actually decreases the Massey gap, we now consider the optimality of α = 0. In this case, q_i = p_i and q_{i+1} = p_{i+1}; thus, when p_i ≥ 2^{2^{2−H(p)}} p_{i+1}, we have f(0) > 0, so α = 0 is certainly sub-optimal, as M(p) − M(q) is increasing around zero. Moreover, a simple upper bound on f imposes a non-trivial constraint, Equation (12), on the optimal mixing ratio α*. For example, to apply Equation (12) to the particular case of mixing only at the tail of the distribution, i.e., Theorem 2, we take i = n, hence p_{i+1} = 0, and substituting into Equation (12), we obtain the requirement:

    α* ≤ 1/(1 + 2^{1/(p_n + 2^{H(p)−2})}) < 1/2

on the mixing rate realizing the supremum, so we have a closed-form strict refinement over Equation (2):

Theorem 4. Let p be a descending probabilistic sequence with H(p) ≥ 2 and i ∈ N be such that p_i > 2^{2^{2−H(p)}} p_{i+1}, and let α = 1/(1 + 2^{1/(p_n + 2^{H(p)−2})}). Then, defining q as in Equation (9), we have:

    E[G(p)] ≥ 2^{H(q)−2} + 1 − α (p_i − p_{i+1}).
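For the tail case i = n, the closed-form mixing rate above is straightforward to evaluate. In the sketch below (the example distribution is ours), it lies strictly below 1/2 and yields a tighter tail bound than the α = 1/2 choice underlying Equation (2):

```python
import math

def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def h(a):
    """Binary entropy function."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a) if 0 < a < 1 else 0.0

def tail_bound(p, alpha):
    """Equation (7): 2^(H(p) + p_n h(alpha) - 2) + 1 - alpha p_n."""
    pn = p[-1]
    return 2 ** (H(p) + pn * h(alpha) - 2) + 1 - alpha * pn

p = [0.3, 0.25, 0.2, 0.15, 0.1]   # H(p) ≈ 2.23 >= 2
pn = p[-1]

# Closed-form mixing rate of Theorem 4 for the tail case i = n.
alpha_star = 1 / (1 + 2 ** (1 / (pn + 2 ** (H(p) - 2))))
assert 0 < alpha_star < 0.5

# For this example, the closed-form rate beats the alpha = 1/2 choice of Eq. (2).
assert tail_bound(p, alpha_star) > tail_bound(p, 0.5)
```

On this distribution, alpha_star ≈ 0.367 improves the tail bound from roughly 2.205 to roughly 2.214, a modest but strictly positive gain over Equation (2).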

Discussion
In this section, we discuss in a more detailed manner the context of our work, evaluate the saturation conditions of the discussed bounds, and present prospects for future work.
We begin our discussion by taking a closer look at the most prevalently used form of the Massey inequality, namely Equation (3). Notably, this inequality only holds when H(p) ≥ 2, suggesting that its nice shape is a case of form over function, as illustrated below.
Massey [1] found that the set of probabilistic vectors of arbitrary length having a fixed guessing entropy A is convex for any given A; thus, by the strict concavity of the Shannon entropy, its maximum over this set is attained at an interior point. Using Lagrange multipliers, Massey found that the critical p must be a geometric distribution, and imposing the condition E[G(p)] = A, he found:

    p_i = (1/A) ((A−1)/A)^{i−1},  i ≥ 1.   (13)

Because this distribution maximizes the Shannon entropy, for arbitrary distributions p with A = E[G(p)], we have:

    H(p) ≤ A log A − (A−1) log(A−1) = log(A−1) + A log(A/(A−1)),   (14)

an unwieldy form that is saturated only by geometric distributions. Thus, the bound in Equation (14) is only ever saturated when |p| = ∞; therefore, the Massey inequality is never tight for distributions encountered in most applications, such as cryptography.
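Massey's exact bound is easy to probe numerically. The sketch below (example distributions ours) checks Equation (14) on a few vectors and confirms that a numerically truncated geometric distribution of mean A comes arbitrarily close to saturating it:

```python
import math

def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def EG(p):
    return sum(i * x for i, x in enumerate(p, start=1))

def massey_exact(A):
    """Right-hand side of Equation (14): A log2 A - (A-1) log2 (A-1)."""
    return A * math.log2(A) - (A - 1) * math.log2(A - 1) if A > 1 else 0.0

# Equation (14) holds for arbitrary decreasing distributions.
for p in ([0.5, 0.3, 0.2], [0.4, 0.3, 0.2, 0.1], [0.25] * 4):
    assert H(p) <= massey_exact(EG(p)) + 1e-9

# Geometric distribution with mean A = 3, per Equation (13), truncated far out:
A = 3.0
geo = [(1 / A) * ((A - 1) / A) ** (i - 1) for i in range(1, 200)]
assert abs(H(geo) - massey_exact(EG(geo))) < 1e-6
```

The geometric family is the only one that closes Equation (14), which is precisely why the simpler bound derived from it is never tight on finite-support inputs.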
To find a simpler form of Equation (14), Massey noticed that when A ≥ 2, the second term of Equation (14), A log(A/(A−1)), is at most 2, with equality only at A = 2, so that H(p) ≤ log(A−1) + 2, i.e., E[G(p)] ≥ 2^{H(p)−2} + 1. From this remark, we conclude that any inequality derived from an application of the Massey inequality is saturated if and only if the distribution to which the Massey inequality is ultimately applied is the geometric distribution of ratio 1/2. It follows that for the initial result of [5], Equation (2), as well as for our own initial result from Theorem 1, the bounds are saturated only for the geometric distribution of ratio 1/2, even though they are tighter than the Massey inequality.
However, when these inequalities are applied over an unbounded number of steps, it is possible to obtain bounds saturated by more inputs. Indeed, for the stronger result of [5], as well as for our own stronger result from Theorem 2, the bounds are saturated for any truncation of the geometric distribution of ratio 1/2, i.e., when the original distribution has the form p = (1/2, 1/4, ..., 1/2^{n−2}, 1/2^{n−1}, 1/2^{n−1}) for some n ≥ 2, i.e., for a countable range of E[G(p)]. It is then natural to ask how the tightness improvement in Theorems 1 and 2 is reflected in the saturation conditions. To this end, we notice that the limiting factor is the requirement that the limit of the list of probabilistic distributions be the geometric distribution of ratio 1/2; hence, for saturation, we necessarily have α = 1/2. This is a direct consequence of using the nice form of the Massey inequality, Equation (3).