Generalized Species Richness Indices for Diversity

A generalized notion of species richness is introduced. The generalization embeds the popular index of species richness on the boundary of a family of diversity indices each of which is the number of species in the community after a small proportion of individuals belonging to the least minorities is trimmed. It is established that the generalized species richness indices satisfy a weak version of the usual axioms for diversity indices, are qualitatively robust against small perturbations in the underlying distribution, and are collectively complete with respect to all information of diversity. In addition to a natural plug-in estimator of the generalized species richness, a bias-adjusted estimator is proposed, and its statistical reliability is gauged via bootstrapping. Finally an ecological example and supportive simulation results are given.


Introduction
Consider an ecological community with a well-defined set of species $X = \{\ell_k; k = 1, \cdots, K\}$ and an associated distribution of proportions, also known as species abundances, $p = \{p_k; k = 1, \cdots, K\}$. More generally, $X$ and $p$ may be considered as a countable alphabet and an associated probability distribution, where $K$ may be a finite integer or infinite. In this article, the $\ell_k$s may be interchangeably referred to as letters of an alphabet or species in a community, and $p$ may be referred to as a species abundance distribution or a probability distribution. The notion of diversity in a community has been of long-standing research interest. What diversity is and how it should be quantified have been the two fundamental questions at the center of the diversity literature for many decades. A large number of diversity indices have been proposed over the years; for example, those by Simpson in [1], Shannon in [2], Rényi in [3] and Tsallis in [4] are among the most commonly used, each of which has been argued to have particular merit. The opinions on diversity and possible numerical indices to measure it are indeed diverse. There are even doubts about the general concept of diversity, for example see [5,6]; and there is also a school of thought which believes that the species richness is the only acceptable diversity index, for example see [7]. There have also been unifying efforts to define diversity indices to accommodate a range of such indices, for example see [8][9][10][11], among others. Nevertheless, when it comes to measuring diversity, there is a lack of agreement on a generally satisfactory univariate index. The general consensus in the existing literature seems to be that a better description of diversity should be a multidimensional index set, or a profile. A good introduction to diversity profiles is offered in [10], where many basic concepts are articulated and many related references are found.
The departure point of this article is the species richness index, K, the number of different species in a community. The species richness index is a part of almost every discussion in the existing literature, and it is so for a good reason. Like the notion of happiness, diversity is an intuitively clear notion for most, but is difficult to quantify. Does there exist a universally accepted index (or an index profile) that would please all? The answer is unknown. If there does, it has not been found. If not, then the objective would be to find one that would have wider acceptance. Either way, the search should and does continue. In that regard, the species richness index K is perhaps one of the simplest, the most direct and most intuitive of all existing diversity indices. It is difficult to dismiss such an index.
Nevertheless the species richness index has many weaknesses which can be summarized into the following list.

1. It is oblivious to the magnitude of species abundances.
2. It is ultra-sensitive to redistribution of any arbitrarily small proportion.
3. It is difficult to estimate based on a sample.
4. It does not provide an ordering, or a partial ordering, for communities with infinite number of species.
The first weakness is easily illustrated by a simple example. Consider two distributions with K = 2, p = {0.5, 0.5} and q = {0.99, 0.01}. The species richness is 2 for both but it clearly does not capture the intuitive notion of diversity. In the diversity literature species richness is sometimes considered a separated type of index from those taking abundances into consideration. This article argues that the separation is not necessary and a slight change of perspective would embed species richness in a profile that naturally takes abundances into consideration.
The second weakness is also easily illustrated by a simple example. Consider p = {1 − ε, ε} where ε > 0 is an arbitrarily small value. The species richness of p is K = 2. However taking the abundance p 2 = ε and redistributing it to m new species, k = 2, · · · , m + 1, evenly, a new distribution q = {q 1 = 1 − ε, q k = ε/m; k = 2, · · · , m + 1} is created. It is easily seen first that the species richness of q is K = m + 1, second that m is arbitrarily large so the species richness of q can be carried over all bounds, and third that the arbitrarily large difference in species richness between p and q is due to an arbitrarily small difference between p and q.
The second weakness demonstrated above is not unique to the species richness. Consider Shannon's entropy, $H = \sum_{k=1}^{K} p_k \ln(1/p_k)$. Taking an arbitrarily small quantity $\varepsilon > 0$ (from any $p_k$), redistributing it evenly to $m$ new species, each with proportion $\varepsilon/m$, and hence creating a distribution $q$, would add approximately
$$\sum_{j=1}^{m} \frac{\varepsilon}{m} \ln\frac{m}{\varepsilon} = \varepsilon \ln\frac{m}{\varepsilon} \qquad (1)$$
to $H$ in evaluating the entropy of $q$. For any fixed $\varepsilon > 0$, (1) may be carried over all bounds as $m$ increases indefinitely.
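The ultra-sensitivity described above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper names `richness`, `shannon_entropy` and `stretch` are hypothetical, and the stretched distribution takes the mass ε from the largest abundance, which is one arbitrary choice among many.

```python
import math

def richness(p):
    """Species richness K: number of species with positive abundance."""
    return sum(1 for x in p if x > 0)

def shannon_entropy(p):
    """Shannon's entropy H = sum of p_k * ln(1/p_k), skipping zero terms."""
    return sum(x * math.log(1.0 / x) for x in p if x > 0)

def stretch(p, eps, m):
    """Take mass eps from the largest abundance and spread it evenly over m new species."""
    q = sorted(p, reverse=True)
    q[0] -= eps
    return q + [eps / m] * m

p = [0.5, 0.5]
eps = 1e-3
for m in (10, 10**3, 10**5):
    q = stretch(p, eps, m)
    gain = shannon_entropy(q) - shannon_entropy(p)
    # Columns: m, richness of q, entropy gain, eps * ln(m / eps)
    print(m, richness(q), round(gain, 6), round(eps * math.log(m / eps), 6))
```

Both the richness and the entropy gain keep growing with m even though only a mass of 0.001 has been moved; the entropy gain differs from ε ln(m/ε) only by the small change in the term from which ε was taken.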
In fact, this issue of ultra-sensitivity is well known beyond the boundary of the diversity literature. In modern data science, where the sample space is large, non-metrized, non-ordinal, and not completely prescribed, statistical inference often relies on information-theoretic quantities that are sensitive to the probabilities of rare events. Such information-theoretic quantities are often ultra-sensitive toward small perturbations in the tail of a distribution.
The third weakness is essentially caused by the second weakness. As demonstrated above, two distributions, different only in the way that one is an arbitrarily stretched version of another by an arbitrarily small mass in abundance, can have arbitrarily different values in species richness. In that regard, in a random sample of size n, the species with stretched proportions collectively have very small probability to be represented. This makes it nearly impossible to estimate K with any reliability non-parametrically. Estimating K with a random sample is a long standing difficult problem in statistics. Interested readers may refer to two excellent survey papers, ref. [12,13], respectively. More specifically, a worthy line of approaches based on Turing's formula may also be of interest, see [14]. See also for example, [15][16][17][18]. Nevertheless it is fair to say that, not surprisingly, there are no known generally satisfactory estimators of K.
The fourth weakness is in the generality of the definition. Generally one would prefer to have a notion of diversity not only for communities with finite K but also for K = ∞. The species richness does not provide an ordering, or a partial ordering, for communities with K = ∞. In fact, it does not even provide an ordering, or a partial ordering, among communities with the same K < ∞.
The generalized species richness proposed in this article resolves, or at least alleviates all these weaknesses. Toward introducing the generalized species richness indices, consider the second weakness mentioned above once more. Recognizing the fact that an infinitesimal perturbation in the abundance distribution could greatly impact species richness, one may ask the following questions.

1. If 100 × α%, where α ∈ (0, 1), of the community belonging to species with the lowest abundances is trimmed, what would be the species richness of the remaining community?
2. What is the least number of species that can be represented by 100 × (1 − α)% of the community?
Let the non-increasing ordered $p = \{p_k; k \ge 1\}$ be denoted by
$$p_\downarrow = \{p_{(k)}; k \ge 1\} \qquad (2)$$
where $p_{(k)} \ge p_{(k+1)}$ for all $k \ge 1$. The answer to both questions above is, for a fixed $\alpha \in (0, 1)$,
$$K_\alpha = K_\alpha(p_\downarrow) = \sum_{k \ge 1} k \, 1\Big[\sum_{i=1}^{k-1} p_{(i)} < 1 - \alpha \le \sum_{i=1}^{k} p_{(i)}\Big] \qquad (3)$$
where $1[\cdot]$ is the indicator function. For a given $\alpha \in (0, 1)$, there is only one non-zero term in the summation of (3), with an integer value $k$ such that $1 - \alpha$ is sandwiched between two consecutive partial sums of $p_\downarrow$, so that equivalently
$$K_\alpha = \min\Big\{k \ge 1 : \sum_{i=1}^{k} p_{(i)} \ge 1 - \alpha\Big\} \qquad (4)$$
$$\phantom{K_\alpha} = 1 + \sum_{k \ge 1} 1\Big[\sum_{i=1}^{k} p_{(i)} < 1 - \alpha\Big]. \qquad (5)$$
See a graphic representation of $K_\alpha$ in Figure 1. $K_\alpha$ is the proposed generalized species richness, and it may also be reasonably referred to as the α-trimmed species richness. Let
$$\mathcal{K} = \{K_\alpha; \alpha \in (0, 1)\} \qquad (6)$$
be referred to as the species richness profile.

Revisiting the example of $p = \{0.5, 0.5\}$ and $q = \{0.99, 0.01\}$ mentioned above for the first weakness of species richness $K = K_0$, with say $\alpha = 0.05$, it is easily seen that $K_{0.05}(p) = 2$ and $K_{0.05}(q) = 1$. Revisiting the example of $p = \{1 - \varepsilon, \varepsilon\}$ and its stretched version $q = \{q_1 = 1 - \varepsilon, q_k = \varepsilon/m; k = 2, \cdots, m + 1\}$ mentioned above for the second weakness of species richness $K = K_0$, it is also easy to see that arbitrary stretching of $\varepsilon$, that is, letting $m$ increase indefinitely, will not carry $K_\alpha(q)$ over all bounds so long as $\varepsilon < \alpha$. In this regard, it is clear that $K_\alpha$ may be viewed as a robustified version of species richness. With the influence from arbitrary stretching of an infinitesimal mass in abundance controlled (but not eliminated), the difficulty level in estimating $K_\alpha$ is considerably reduced from that in estimating $K$. Finally, the fourth weakness of species richness is eliminated since $K_\alpha$ is always finite so long as $\alpha > 0$, for distributions with $K < \infty$ as well as $K = \infty$.

In Section 2, several properties of the generalized species richness are established. More specifically, it is established that every member of $\mathcal{K}$ in (6) is a diversity index as it satisfies a weak version of the usual axioms of diversity indices; and a notion of "breakdown point" is introduced and the robustness of $K_\alpha$ is gauged accordingly. Furthermore, a notion of "completeness" in profiles is introduced and $\mathcal{K}$ of (6), as a profile, is shown to be complete.
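As a concrete illustration of the definition, the sketch below computes the α-trimmed richness under the assumption that it is the least number of species whose largest abundances sum to at least 1 − α. The function name `k_alpha` is hypothetical, and exact rational arithmetic is used only to avoid rounding issues at the boundary.

```python
from fractions import Fraction

def k_alpha(p, alpha):
    """Least k such that the k largest abundances sum to at least 1 - alpha."""
    masses = sorted((Fraction(x) for x in p), reverse=True)
    masses = [m for m in masses if m > 0]
    # Normalising by the total guards against inputs that do not sum exactly to one.
    target = (1 - Fraction(alpha)) * sum(masses)
    running = Fraction(0)
    for k, mass in enumerate(masses, start=1):
        running += mass
        if running >= target:
            return k
    return len(masses)

# The two distributions used to illustrate the first weakness of species richness.
print(k_alpha(["0.5", "0.5"], "0.05"))    # both species are needed
print(k_alpha(["0.99", "0.01"], "0.05"))  # the dominant species alone suffices
```

Passing abundances as strings (or `Fraction` objects) keeps values such as 0.99 exact; plain floats are also accepted and are converted to their exact binary values.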
To estimate $K_\alpha$, let an identically and independently distributed (iid) sample of size $n$ be summarized into sample species frequencies, $\{Y_k; k \ge 1\}$, and relative species frequencies, $\hat{p} = \{\hat{p}_k = Y_k/n; k \ge 1\}$; and let $\hat{p}_\downarrow = \{\hat{p}_{(k)}; k \ge 1\}$ be a non-increasingly ordered $\hat{p}$. A natural estimator of $K_\alpha$ is (3), (4) or (5), with $\hat{p}_{(i)}$ in place of $p_{(i)}$, that is,
$$\hat{K}_\alpha = K_\alpha(\hat{p}_\downarrow) = \sum_{k \ge 1} k \, 1\Big[\sum_{i=1}^{k-1} \hat{p}_{(i)} < 1 - \alpha \le \sum_{i=1}^{k} \hat{p}_{(i)}\Big] \qquad (7)$$
specifically noting that $\hat{K}_\alpha$ is based on the same functional $K_\alpha(\cdot)$ in (3) but evaluated at the empirical distribution $\hat{p}_\downarrow$ instead of $p_\downarrow$. It is easy to see that (7) simply counts the number of species in the sample after 100 × α% of the observations in the sample with the lowest (observed) species relative frequencies are trimmed. $\hat{K}_\alpha$ in (7) will be referred to as the plug-in estimator of $K_\alpha$ in subsequent text. However, $\hat{K}_\alpha$ significantly under-estimates $K_\alpha$ due to a well-known phenomenon: a perpetual under-representation of small probability letters in a finite sample. This phenomenon was perhaps first explicitly identified by Alan Turing during World War II in an effort to break the German naval Enigma, and is referred to as the Turing phenomenon in the subsequent text. The core of the Turing phenomenon is the total probability associated with letters of the alphabet that are not represented in a sample, that is, $\pi_0 = \sum_{k \ge 1} p_k 1[Y_k = 0]$, also sometimes known as the "missing probability". In non-parametric estimation of information-theoretic quantities, small probability letters often carry much information and the fact that many (possibly infinitely many) of them are missing in a sample often causes a significant downward bias. For example, in view of $\sum_{k \ge 1} w_k p_k = 1$ where $w_k = 1$ and $p_k > 0$, Shannon's entropy $H = \sum_{k \ge 1} (\ln(1/p_k)) p_k$ is a weighted average of $\{p_k\}$ with $w_k = \ln(1/p_k)$. For another example, the species richness $K = \sum_{k \ge 1} (1/p_k) p_k$ is a weighted average of $\{p_k\}$ with $w_k = 1/p_k$.
In both cases, the small probability events get heavy weights and therefore under-representation of them in a sample translates to under-estimation. In comparison of the two examples mentioned above, the Turing phenomenon has a much more profound impact on estimation of K than H in the sense that (ln(1/p))p → 0 and (1/p)p → 1 as p → 0. Having realized the difficulty in estimating such quantities, it would seem reasonable to devise mechanisms, either by modifying the estimands (provided that the modified estimands remain relevant) or the assumptions on the underlying distribution, to control the behavior of the corresponding estimators. For example, refs. [19,20] discuss certain optimal rates of convergence for a class of estimators of entropy and community size under conditions that prevent p k from being arbitrarily small, which in turn control the behavior of the estimators. This article, however, seeks such controls by means of α-trimming, both in the estimand, K α , and in its estimator, specifically with regard to the notion of species richness.
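The plug-in estimator and the missing probability can be illustrated on a toy sample. This is a sketch under stated assumptions: the function names are hypothetical, the counts below are made up for illustration only, and the plug-in value is taken to be the least number of observed species covering at least (1 − α)n observations.

```python
from fractions import Fraction

def plug_in_k_alpha(counts, alpha):
    """Plug-in estimate from species counts Y_k: trimmed richness of the empirical distribution."""
    observed = sorted((c for c in counts if c > 0), reverse=True)
    n = sum(observed)
    target = (1 - Fraction(alpha)) * n
    running = 0
    for k, c in enumerate(observed, start=1):
        running += c
        if running >= target:
            return k
    return len(observed)

def missing_probability(p, counts):
    """pi_0: total probability of the species that do not appear in the sample."""
    return sum(pk for pk, c in zip(p, counts) if c == 0)

# Hypothetical community: 10 equally abundant species, and a made-up sample of size 10.
p = [Fraction(1, 10)] * 10
counts = [3, 2, 2, 1, 1, 1, 0, 0, 0, 0]

print(plug_in_k_alpha(counts, 0))                # observed richness, below the true K = 10
print(plug_in_k_alpha(counts, Fraction(1, 10)))  # one singleton observation is trimmed
print(missing_probability(p, counts))            # mass of the four unseen species
```

In this toy case the four unseen species carry a missing probability of 2/5, and the plug-in values fall well below the corresponding population values, which is the downward bias discussed above.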
On the other hand, the plug-in estimator $\hat{K}_\alpha$ in (7) may be improved by means of bias correction. There are many possible ways to correct the bias. For simplicity, a bias-adjusted estimator, $\tilde{K}_\alpha$, based on the basic bootstrap method is proposed as in (14) of Section 3. In the same section, the statistical properties of both $\hat{K}_\alpha$ of (7) and $\tilde{K}_\alpha$ of (14) are discussed. More specifically, several asymptotic properties of partial sums of $\hat{p}_\downarrow$ are given. Based on these asymptotic results, several conservative one-sample and two-sample inferential procedures regarding the underlying generalized species richness are proposed and justified. Several simulation results are also reported in gauging the performance of the estimators. Finally, a real-life ecological data set is used to illustrate the proposed method.
The article ends with an appendix where many lemmas, corollaries and propositions, along with their proofs, are found.

Properties of Generalized Species Richness Indices
Diversity as an intuitive notion is quite clear in most minds. However, the quantification of diversity is still quite a distance away from a point of universal consensus. In the diversity literature it is commonly accepted that an index may be reasonably referred to as a diversity index if it satisfies several axioms. For notational convenience, let $P_K$ be the family of all distributions such that $K = \sum_{k \ge 1} 1[p_k > 0]$, that is, on a community with $K$ species (or a finite alphabet with cardinality $K$), and let $P$ be the family of all possible distributions on a general countable community. It follows that $P = \cup_{K=1}^{\infty} P_K$. Let $D(p)$ be a functional defined for every $p \in P$. The essential axioms of diversity indices include:
A 1 : A diversity index $D(p)$ is invariant under any permutation of species labels, that is, any permutation on the index set $\{k; k \ge 1\}$.
A 2 : A diversity index $D(p)$ is minimized at $p = \{p_{(1)} = 1, p_{(k)} = 0; k \ge 2\}$, a distribution with a single species.
A 3 : A diversity index $D(p)$, restricted to $P_K$, is maximized at $p = \{p_k = 1/K; k = 1, \cdots, K\}$, the uniform distribution in $P_K$ for every positive integer $K$.
A 4 : For any distribution $p$, let $p^*$ be the distribution resulting from a transfer of a mass $\delta > 0$ from a higher $p_i$ to a lower $p_j$ subject to $\delta \le p_i - p_j$, with all other $p_k$s remaining unchanged. A diversity index $D(p)$ satisfies $D(p) \le D(p^*)$.
The list of axioms may grow longer, representing a more stringent imposition on the underlying diversity indices. There are also stronger versions of the axioms. For example, A 2 as stated is a weaker version of one that requires the index $D(p)$ to be minimized only at $p = \{p_{(1)} = 1, p_{(k)} = 0; k \ge 2\}$ but not at any other distribution. Similarly, A 3 as stated above also has a stronger version which requires the index $D(p)$ to be maximized only at $p = \{p_{(k)} = 1/K; k = 1, \cdots, K\}$ but not at any other distribution. Axiom A 4 also has a stronger version which requires a strict inequality, that is, $D(p) < D(p^*)$. The weaker axioms are chosen in this article because species richness $K$, the reference index of the discussion, satisfies them.
Regardless of the length or the version of the axioms, Axiom A 1 is the most essential of them all and is universally accepted. It is important to recognize the implication of A 1 : every diversity index is a functional of p only through p ↓ . Consequently the domain of all diversity indices can be represented by the subset of P that contains only distributions in non-increasing order, denoted as P ↓ .
For a given α ∈ (0, 1), it is clear that K α satisfies A 1 , A 2 and A 3 . The fact that K α satisfies A 4 is true but not so obvious. This fact is one of the main results of this article and is summarized in Proposition A1 along with a lengthy proof, both of which are given in Appendix A. The fact that K α satisfies all axioms A 1 through A 4 suggests that it may be reasonably regarded as a diversity index.
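Axiom A 4 can be spot-checked numerically, which is no substitute for the proof in Appendix A. The sketch below is an assumption-laden illustration: `k_alpha` and `transfer` are hypothetical helpers, the trimmed richness is taken to be the least number of species covering a mass of 1 − α, and the random communities are generated arbitrarily with exact rational abundances.

```python
import random
from fractions import Fraction

def k_alpha(p, alpha):
    """Least k such that the k largest abundances sum to at least 1 - alpha."""
    masses = sorted((Fraction(x) for x in p if x > 0), reverse=True)
    target = (1 - Fraction(alpha)) * sum(masses)
    running = Fraction(0)
    for k, mass in enumerate(masses, start=1):
        running += mass
        if running >= target:
            return k
    return len(masses)

def transfer(p, i, j, delta):
    """Move mass delta from the higher p[i] to the lower p[j], with delta <= p[i] - p[j]."""
    assert p[i] > p[j] and 0 < delta <= p[i] - p[j]
    q = list(p)
    q[i] -= delta
    q[j] += delta
    return q

rng = random.Random(0)
alpha = Fraction(1, 20)
checked = violations = 0
for _ in range(2000):
    weights = [rng.randint(1, 50) for _ in range(rng.randint(2, 8))]
    p = [Fraction(w, sum(weights)) for w in weights]
    i, j = rng.sample(range(len(p)), 2)
    if p[i] < p[j]:
        i, j = j, i
    if p[i] == p[j]:
        continue  # no admissible transfer between equal abundances
    delta = (p[i] - p[j]) * Fraction(rng.randint(1, 10), 10)
    checked += 1
    if k_alpha(p, alpha) > k_alpha(transfer(p, i, j, delta), alpha):
        violations += 1
print(checked, violations)
```

No violation is expected, since an admissible transfer can only lower the partial sums of the ordered abundances.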
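To quantify the robustness of the generalized species richness indices against disturbances due to redistributions of a small abundance (or probability) mass, a notion of breakdown point may be introduced. The breakdown point, roughly speaking, is the greatest proportion of data whose worst behavior may not carry a function of the data over all bounds. To be more precise, let $p \in P$ be an abundance distribution, let $\varepsilon \in (0, 1)$ be an arbitrarily small value, and let $\boldsymbol{\varepsilon}_1 = \{\varepsilon_{1,k}; k \ge 1\}$ and $\boldsymbol{\varepsilon}_2 = \{\varepsilon_{2,k}; k \ge 1\}$ be two non-negative sequences, each with total mass $\varepsilon > 0$, that is, $\sum_{k \ge 1} \varepsilon_{1,k} = \sum_{k \ge 1} \varepsilon_{2,k} = \varepsilon$, with $\varepsilon_{1,k} \le p_k$ for every $k$. Let $p - \boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_2$ represent a perturbation obtained by subtracting a mass $\varepsilon$ away from $p$ by means of $\boldsymbol{\varepsilon}_1$ and adding back the same mass by means of $\boldsymbol{\varepsilon}_2$.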

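Definition 1. Let $D(p)$ be any non-negative function of $p \in P$. The breakdown point of $D$ at $p$ is
$$B_p(D) = \sup\Big\{\varepsilon \in [0, 1] : \sup_{\boldsymbol{\varepsilon}_1, \boldsymbol{\varepsilon}_2} D(p - \boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_2) < \infty\Big\},$$
where the inner supremum is over all perturbations of total mass $\varepsilon$. Obviously $0 \le B_p(D) \le 1$. A higher value of $B_p(D)$ is regarded as an indication that $D$ is more robust at $p$.

Definition 2. Let $B_p(D)$ be as in Definition 1. Let $P_0$ be a sub-family of $P$. For any given $\alpha \in (0, 1]$, if $B_p(D) \ge \alpha$ for every $p \in P_0$, then $D(p)$ is said to be 100 × α% robust with respect to $P_0$. In particular, if $B_p(D) \ge \alpha$ for every $p \in P$, then $D(p)$ is said to be 100 × α% robust.

Example 1. The species richness, K, is 0%-robust. This is so because sup ε K((1 − ε)p + ε) = ∞ for any p ∈ P and any small ε > 0.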
Example 2. The generalized species richness, K α , is 100 × α%-robust. This claim is one of the main results of this article and is summarized in Proposition A2. Both the proposition and its proof are given in Appendix A.
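The contrast between a perturbation mass below α and one above α can be seen on the stretched distribution {1 − ε, ε/m, ..., ε/m} used in the Introduction. This is a minimal sketch with hypothetical helper names, assuming the trimmed richness is the least number of species covering a mass of 1 − α; it illustrates the claim and does not replace Proposition A2.

```python
from fractions import Fraction

def k_alpha(p, alpha):
    """Least k such that the k largest abundances sum to at least 1 - alpha."""
    masses = sorted((Fraction(x) for x in p if x > 0), reverse=True)
    target = (1 - Fraction(alpha)) * sum(masses)
    running = Fraction(0)
    for k, mass in enumerate(masses, start=1):
        running += mass
        if running >= target:
            return k
    return len(masses)

def stretched(eps, m):
    """q = {1 - eps, eps/m, ..., eps/m}: a mass eps spread evenly over m species."""
    eps = Fraction(eps)
    return [1 - eps] + [eps / m] * m

alpha = Fraction(5, 100)
for m in (10, 100, 1000):
    below = k_alpha(stretched(Fraction(1, 100), m), alpha)   # eps = 0.01 < alpha
    above = k_alpha(stretched(Fraction(10, 100), m), alpha)  # eps = 0.10 > alpha
    print(m, below, above)
```

With ε = 0.01 the trimmed richness stays at 1 for every m, whereas with ε = 0.10 it grows as 1 + m/2, so the stretched mass must exceed α before it can carry the index over all bounds.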
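In passing, it may also be of interest to evaluate the robustness of two other community diversity indices, Shannon's entropy $H = -\sum_{k \ge 1} p_k \ln p_k$ and the Gini-Simpson index $D = 1 - \sum_{k \ge 1} p_k^2$.

Example 3. Shannon's entropy is 0%-robust. To see this, for a given $p$, let $\varepsilon > 0$ be an arbitrarily small value and let a total mass of $\varepsilon > 0$ be cumulatively trimmed from the right end in $p_\downarrow = \{p_{(1)}, p_{(2)}, \cdots\}$, that is, using the language of Definition 1,
$$\boldsymbol{\varepsilon}_1 = \{0, \cdots, 0, \varepsilon_{K_\varepsilon}, p_{(K_\varepsilon + 1)}, p_{(K_\varepsilon + 2)}, \cdots\},$$
which has zeros in the first $K_\varepsilon - 1$ positions and $\varepsilon_{K_\varepsilon} = \varepsilon - \sum_{i = K_\varepsilon + 1}^{\infty} p_{(i)}$ in the $K_\varepsilon$-th position. In such a construction, the remainder of the mass of $1 - \varepsilon$ covers $K_\varepsilon$ species, and
$$p_\downarrow - \boldsymbol{\varepsilon}_1 = \{p_{(1)}, \cdots, p_{(K_\varepsilon - 1)}, p_{(K_\varepsilon)} - \varepsilon_{K_\varepsilon}, 0, 0, \cdots\}.$$
Redistributing the mass $\varepsilon > 0$ uniformly over $m$ new species adds $\varepsilon \ln(m/\varepsilon)$ to the entropy of the trimmed remainder, which increases without bound as $m \to \infty$.

Example 4. The Gini-Simpson index is 100%-robust. This is clearly true because $0 \le D(p) < 1$ for any abundance distribution $p \in P$.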
A diversity profile is a set of diversity indices containing more than one index. A profile is generally preferred over a single diversity index because it is commonly accepted that diversity is a multi-dimensional notion and is better captured by a multivariate index. An immediate question naturally arises: how much diversity information is contained in a profile? This question can be partially answered with a notion of completeness defined below.
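Definition 3. A profile $D_p = \{D_\theta(p); \theta \in \Theta\}$, where $\Theta$ is a set containing more than one element, is said to be complete if, for any two distributions $p$ and $q$, $D_p = D_q$ implies $p_\downarrow = q_\downarrow$.

Definition 3 essentially says that a complete profile $D_p$ uniquely determines $p_\downarrow$, and in turn uniquely determines any other diversity index evaluated at $p_\downarrow$.

Example 5. The species richness profile of (6) is complete. This claim is clearly true noting, for each positive integer $i$,
$$\sum_{j=1}^{i} p_{(j)} = 1 - \inf\{\alpha \in (0, 1) : K_\alpha \le i\},$$
so that the profile determines every partial sum of $p_\downarrow$ and hence $p_\downarrow$ itself.

(6) is not the only complete profile. The two well known families of diversity indices given in the following two examples are also complete.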
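Completeness of the species richness profile can be made tangible by recovering the ordered abundances from the map α → K α alone. The sketch below is an illustration under assumptions: it uses the identity that the infimum of the α values with K α ≤ i equals one minus the sum of the i largest abundances, it locates that infimum by bisection, and the names `k_alpha` and `recover_ordered` are hypothetical.

```python
from fractions import Fraction

def k_alpha(p, alpha):
    """Least k such that the k largest abundances sum to at least 1 - alpha."""
    masses = sorted((Fraction(x) for x in p if x > 0), reverse=True)
    target = (1 - Fraction(alpha)) * sum(masses)
    running = Fraction(0)
    for k, mass in enumerate(masses, start=1):
        running += mass
        if running >= target:
            return k
    return len(masses)

def recover_ordered(profile, iters=60):
    """Recover the ordered abundances using only evaluations of alpha -> K_alpha."""
    n_species = profile(Fraction(0))  # K_0 is the ordinary richness (finite case only)
    partial = [Fraction(0)]
    for i in range(1, n_species + 1):
        lo, hi = Fraction(0), Fraction(1)
        for _ in range(iters):  # bisection for inf{alpha : K_alpha <= i}
            mid = (lo + hi) / 2
            if profile(mid) <= i:
                hi = mid
            else:
                lo = mid
        partial.append(1 - hi)  # approximates the sum of the i largest abundances
    return [partial[i] - partial[i - 1] for i in range(1, n_species + 1)]

p = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
recovered = recover_ordered(lambda a: k_alpha(p, a))
print([float(x) for x in recovered])
```

The recovered values agree with the sorted abundances up to the bisection tolerance; the sketch covers only communities with finitely many species.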

Inference
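Let the discussion of this section begin with a natural estimator of $K_\alpha$, $\hat{K}_\alpha = K_\alpha(\hat{p}_\downarrow)$, as given in (7), which may be viewed as an estimator based on the right tail of $\hat{p}_\downarrow$ being trimmed by a fixed mass $\alpha$. This estimator however presents several difficulties in developing valid inferential procedures regarding $K_\alpha$. Towards describing some of these difficulties, the following proposition is first stated and proved.

Proposition 1. Let $p = \{p_k; k \ge 1\}$ be the underlying distribution on a countable alphabet, satisfying $p_k \ge p_{k+1}$ for every $k \ge 1$, let $\hat{p} = \{\hat{p}_i; i \ge 1\}$ be the corresponding relative letter frequencies in an iid sample of size $n$, and let $K'$ be a positive integer such that $1 \le K' < K$. Suppose the multiplicity of $p_{K'}$ in $p$ is one. Then as $n \to \infty$,
1. $\sqrt{n}\big(\sum_{k=1}^{K'} \hat{p}_k - \sum_{k=1}^{K'} p_k\big) \to N(0, \sigma^2)$ in distribution, where $\sigma^2 = \big(\sum_{k=1}^{K'} p_k\big)\big(1 - \sum_{k=1}^{K'} p_k\big)$;
2. $\sqrt{n}\big(\sum_{k=1}^{K'} \hat{p}_{(k)} - \sum_{k=1}^{K'} \hat{p}_k\big) \to 0$ in probability; and
3. $\sqrt{n}\big(\sum_{k=1}^{K'} \hat{p}_{(k)} - \sum_{k=1}^{K'} p_k\big) \to N(0, \sigma^2)$ in distribution.

Proof. Part 1 directly follows from the central limit theorem. For Part 2, first consider an aggregation of the letters as follows. If $K < \infty$ let $K'' = K$, and if $K = \infty$ let $K''$ be any index such that $p^*_{K''} = \sum_{i = K''}^{\infty} p_i < p_{K'}$. Let the observed relative letter frequencies in the sample be aggregated accordingly, in particular let $\hat{p}^*_{K''} = \sum_{i \ge K''} \hat{p}_i$, and write $p^* = \{p_1, \cdots, p_{K''-1}, p^*_{K''}\}$ and $\hat{p}^* = \{\hat{p}_1, \cdots, \hat{p}_{K''-1}, \hat{p}^*_{K''}\}$. By the law of large numbers, $P(\hat{p}^* \in n_\varepsilon(p^*)) \to 1$, where $n_\varepsilon(p^*)$ is an arbitrarily small ε-neighborhood centered at the point $p^*$. Noting that the $p_k$s are arranged in a non-increasing order, $p_{K'}$ has multiplicity 1, $K''$ is finite, and $n_\varepsilon(p^*)$ is arbitrarily small, the event $\{\hat{p}^* \in n_\varepsilon(p^*)\}$ implies the event that the set of the $K'$ largest $\hat{p}$s is identical to the first $K'$ $\hat{p}$s in $\hat{p}$, that is,
$$O_n(K') = \Big\{\sum_{k=1}^{K'} \hat{p}_{(k)} = \sum_{k=1}^{K'} \hat{p}_k\Big\}.$$
It follows that $P(O_n(K')) \to 1$, and that
$$P\Big(\sqrt{n}\,\Big|\sum_{k=1}^{K'} \hat{p}_{(k)} - \sum_{k=1}^{K'} \hat{p}_k\Big| > \varepsilon\Big) \le 1 - P(O_n(K')) \to 0$$
for any $\varepsilon > 0$. For Part 3, since
$$\sqrt{n}\Big(\sum_{k=1}^{K'} \hat{p}_{(k)} - \sum_{k=1}^{K'} p_k\Big) = \sqrt{n}\Big(\sum_{k=1}^{K'} \hat{p}_{(k)} - \sum_{k=1}^{K'} \hat{p}_k\Big) + \sqrt{n}\Big(\sum_{k=1}^{K'} \hat{p}_k - \sum_{k=1}^{K'} p_k\Big)$$
and the first term converges to zero in probability by Part 2, the asymptotic normality follows from Part 1 by Slutsky's theorem.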
The first difficulty of $\hat{K}_\alpha = K_\alpha(\hat{p}_\downarrow)$ is that it cannot be guaranteed to be consistent under general conditions. To see this, one needs only to consider a special case of $\sum_{k \le K_\alpha} p_{(k)} = 1 - \alpha$. By Part 3 of Proposition 1, for sufficiently large $n$,
$$P(\hat{K}_\alpha = K_\alpha) \approx P(\hat{K}_\alpha = K_\alpha + 1) \approx \frac{1}{2}. \qquad (10)$$
(10) implies inconsistency and, in addition to that, (10) also suggests that, for sufficiently large $n$, $\hat{K}_\alpha$ could over-estimate $K_\alpha$, albeit by at most one. Clearly the said inconsistency is caused by the discrete nature of the functional $K_\alpha(p_\downarrow)$. The second difficulty of $\hat{K}_\alpha = K_\alpha(\hat{p}_\downarrow)$ is its significant downward bias when $n$ is relatively small. To illustrate the bias, consider the extreme case of $\alpha = 0$ in $K_\alpha$, which is simply the species richness index, $K$, in the case of a finite sample space. If $K$ is relatively large, a relatively small iid sample of size $n$ would likely not cover all $K$ species in the community. In fact, the sample would typically miss a large number of species, that is, $K_{obs} \ll K$ where $K_{obs}$ is the observed number of species in a sample. Consequently the empirical distribution, $\hat{p} = \{\hat{p}_k; k = 1, \cdots, K\}$, would consist of mostly zeros and hence would severely under-represent $p = \{p_k; k = 1, \cdots, K\}$ in terms of species richness. When $\alpha > 0$ but small, the same qualitative argument explains the significant downward bias of $\hat{K}_\alpha$.
The possible inconsistency, along with the persistent and significant downward bias, gives much difficulty in developing inferential procedures under general conditions based on asymptotic properties such as Part 3 of Proposition 1.
First let it be noted that the quantile method, $[\hat{\theta}^*_{\beta/2}, \hat{\theta}^*_{1-\beta/2}]$, gives an inadequate 100 × (1 − β)% confidence interval. To see this, let the extreme case of $K_\alpha = K$ with $\alpha = 0$ be considered once again. There, given an empirical distribution $\hat{p}_\downarrow = \{\hat{p}_{(1)}, \hat{p}_{(2)}, \cdots\}$, it is clear that $\hat{K}_\alpha \ll K_\alpha$ as already argued above. For the same reason, by sampling from $\hat{p}_\downarrow$, every $\hat{K}^*_\alpha \le \hat{K}_\alpha \ll K_\alpha$. Consequently $[\hat{\theta}^*_{\beta/2}, \hat{\theta}^*_{1-\beta/2}]$ necessarily excludes $K_\alpha$ far to the right, causing the bootstrapping interval to have coverage much lower than $1 - \beta$. This is to say that, in terms of estimating $K_\alpha$, the downward bias of $\hat{K}_\alpha$ strikes twice in bootstrapping with the quantile method, once in using the original sample and once in using a bootstrapping sample. In fact, it is commonly observed with real data sets that
$$\hat{K}^*_{\alpha,\beta/2} \le \hat{K}^*_{\alpha,1-\beta/2} < \hat{K}_\alpha \qquad (11)$$
where $\hat{K}^*_{\alpha,\beta/2}$ and $\hat{K}^*_{\alpha,1-\beta/2}$ are the 100 × β/2 th and 100 × (1 − β/2) th percentiles of the estimates of $\hat{K}_\alpha$ based on bootstrapping samples. See Example 8 below. The discomforting (11) essentially disqualifies the bootstrapping confidence interval based on the quantile method as a valid inferential tool.
However, bootstrapping based on the centered quantile method, also known as the basic bootstrapping method, is qualitatively different. There the downward bias $\hat{K}_\alpha - K_\alpha$ is offset by the bootstrapping downward bias $\hat{K}^*_\alpha - \hat{K}_\alpha$. Once again in the extreme case of $K_\alpha = K$ with $\alpha = 0$, since $\hat{K}^*_\alpha \le \hat{K}_\alpha$ for every bootstrapping sample, it follows that $\hat{K}_\alpha - \hat{K}^*_{\alpha,\beta/2} \ge \hat{K}_\alpha - \hat{K}^*_{\alpha,1-\beta/2} \ge 0$ and hence $\hat{K}_\alpha \le \hat{K}_\alpha + (\hat{K}_\alpha - \hat{K}^*_{\alpha,1-\beta/2}) \le \hat{K}_\alpha + (\hat{K}_\alpha - \hat{K}^*_{\alpha,\beta/2})$, or
$$\hat{K}_\alpha \le 2\hat{K}_\alpha - \hat{K}^*_{\alpha,1-\beta/2} \le 2\hat{K}_\alpha - \hat{K}^*_{\alpha,\beta/2}, \qquad (12)$$
that is, the centered bootstrapping confidence interval excludes $\hat{K}_\alpha$ to the left of the interval. In fact (12) is commonly observed with real data sets even when $\alpha > 0$ is small. See Example 8 below. Unlike (11), the fact that $\hat{K}_\alpha$ is outside of the centered bootstrapping confidence interval in (12) only indicates inadequacy of the estimator $\hat{K}_\alpha$ but not that of the interval itself. In fact the centered bootstrapping confidence interval,
$$\big[\,2\hat{K}_\alpha - \hat{K}^*_{\alpha,1-\beta/2},\; 2\hat{K}_\alpha - \hat{K}^*_{\alpha,\beta/2}\,\big], \qquad (13)$$
represents a bias adjustment in the right direction, that is, the bias in $\hat{K}_\alpha$ as an estimator of $K_\alpha$ is partially offset by that in $\hat{K}^*_\alpha$ as an estimator of $\hat{K}_\alpha$. It is to be noted that (12) suggests a bias-adjusted alternative estimator to $\hat{K}_\alpha$,
$$\tilde{K}_\alpha = 2\hat{K}_\alpha - \hat{K}^*_{\alpha,1/2}, \qquad (14)$$
where $\hat{K}^*_{\alpha,1/2}$ is the median of bootstrapped estimates.

The 100 × (1 − β)% bootstrapping confidence interval, or confidence set since only the integer values in the interval are relevant, in (13) provides a basic assessment of $K_\alpha$'s whereabouts. However, its coverage does not necessarily converge to the claimed value $1 - \beta$ as $n$ increases indefinitely, due to the above-mentioned possible inconsistency of $\hat{K}_\alpha$ and the consequential "at-most-one" over-estimation asymptotically. To take that into consideration, a conservative adjustment may be adopted by extending the lower limit of (13) by one, that is,
$$\big[\,2\hat{K}_\alpha - \hat{K}^*_{\alpha,1-\beta/2} - 1,\; 2\hat{K}_\alpha - \hat{K}^*_{\alpha,\beta/2}\,\big]. \qquad (15)$$
An advantage of (15) is that its asymptotic coverage is at least $1 - \beta$ for general $p_\downarrow$, but a disadvantage is that the limiting form of (15) necessarily contains two integer values instead of one, which (13) could achieve when $\hat{K}_\alpha$ is consistent.
On the other hand, while (15) accommodates the issue of possible asymptotic over-estimation (by at most one) by $\hat{K}_\alpha$, in most practical cases the more acute issue is still the under-estimation of $K_\alpha$ by $\hat{K}_\alpha$ when $n$ is not sufficiently large. The confidence set in (15) generally requires $n$ to be quite large for its coverage to be reasonably close to the claimed coverage $1 - \beta$. To help accelerate the convergence of the actual coverage to the claimed coverage, a more conservative adjustment may be adopted by extending the right limit of (15) by one, that is,
$$\big[\,2\hat{K}_\alpha - \hat{K}^*_{\alpha,1-\beta/2} - 1,\; 2\hat{K}_\alpha - \hat{K}^*_{\alpha,\beta/2} + 1\,\big]. \qquad (16)$$
Advantages of (16) are that its asymptotic coverage is at least $1 - \beta$ for general $p_\downarrow$ and that its actual coverage converges to at least $1 - \beta$ faster as $n$ increases. However, a disadvantage is that the limiting form of (16) necessarily contains three integer values and no fewer.

The bootstrapping confidence intervals described in (13), (15) and (16) may also be utilized in testing hypotheses with different degrees of conservativeness. For example, based on (13) and at the β level of significance, in testing $H_0: K_\alpha = k_\alpha$ versus $H_a: K_\alpha > k_\alpha$, $H_a: K_\alpha < k_\alpha$ or $H_a: K_\alpha \ne k_\alpha$, where $k_\alpha$ is a pre-specified positive integer, one may choose to reject $H_0$ when $k_\alpha$ falls below the lower limit of (13), above the upper limit of (13), or outside (13), respectively. Based on (15) and at the β level of significance, in testing the same hypotheses, one may choose to reject $H_0$ when $k_\alpha$ falls below the lower limit of (15), above the upper limit of (15), or outside (15), respectively. Based on (16) and at the β level of significance, in testing the same hypotheses, one may choose to reject $H_0$ when $k_\alpha$ falls below the lower limit of (16), above the upper limit of (16), or outside (16), respectively.

Suppose there are two communities and it is of interest to estimate the difference between the two α-trimmed richness indices,
$$D_\alpha = K_{1,\alpha} - K_{2,\alpha}, \qquad (26)$$
where $K_{1,\alpha}$ and $K_{2,\alpha}$ are the α-trimmed richness indices of the two underlying communities, respectively.
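The proposed estimator of $D_\alpha$ in (26) is
$$\hat{D}_\alpha = \tilde{K}_{1,\alpha} - \tilde{K}_{2,\alpha}, \qquad (27)$$
where $\tilde{K}_{1,\alpha} = 2\hat{K}_{1,\alpha} - \hat{K}^*_{1,\alpha,1/2}$ and $\tilde{K}_{2,\alpha} = 2\hat{K}_{2,\alpha} - \hat{K}^*_{2,\alpha,1/2}$, where $\hat{K}_{1,\alpha}$ and $\hat{K}_{2,\alpha}$ are as in (7) and $\hat{K}^*_{1,\alpha,1/2}$ and $\hat{K}^*_{2,\alpha,1/2}$ are the respective bootstrapping medians from the two samples as in (14).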
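The one-sample bootstrap quantities can be sketched in a few lines. This is an illustration under assumptions rather than a reference implementation: the centered interval is taken to be [2K̂ − K̂*(1−β/2), 2K̂ − K̂*(β/2)], the bias-adjusted estimate is twice the plug-in value minus the bootstrap median, the two conservative sets extend the limits by one, and percentiles use a nearest-rank rule. All function names, the seed and the synthetic sample are hypothetical.

```python
import math
import random
from collections import Counter

def plug_in_k_alpha(sample, alpha):
    """Least number of observed species covering at least (1 - alpha) of the sample."""
    counts = sorted(Counter(sample).values(), reverse=True)
    n, running = len(sample), 0
    for k, c in enumerate(counts, start=1):
        running += c
        if running >= (1 - alpha) * n - 1e-9:  # small tolerance for float rounding
            return k
    return len(counts)

def nearest_rank(sorted_values, q):
    """q-th sample quantile by the nearest-rank rule."""
    return sorted_values[max(0, math.ceil(q * len(sorted_values)) - 1)]

def bootstrap_summary(sample, alpha, beta=0.05, n_boot=1000, seed=0):
    """Plug-in estimate, bias-adjusted estimate and three centered bootstrap sets."""
    rng = random.Random(seed)
    n = len(sample)
    k_hat = plug_in_k_alpha(sample, alpha)
    boots = sorted(plug_in_k_alpha(rng.choices(sample, k=n), alpha)
                   for _ in range(n_boot))
    low_q, median, high_q = (nearest_rank(boots, q)
                             for q in (beta / 2, 0.5, 1 - beta / 2))
    basic = (2 * k_hat - high_q, 2 * k_hat - low_q)      # centered interval
    return {
        "plug_in": k_hat,
        "bias_adjusted": 2 * k_hat - median,              # plug-in shifted by bootstrap bias
        "basic": basic,
        "lower_extended": (basic[0] - 1, basic[1]),       # lower limit extended by one
        "both_extended": (basic[0] - 1, basic[1] + 1),    # both limits extended by one
    }

# Synthetic sample from 20 equally likely species, for illustration only.
sample = random.Random(1).choices(range(20), k=200)
print(bootstrap_summary(sample, alpha=0.05))
```

A two-sample version would apply the same function to each sample and difference the two bias-adjusted estimates.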
In testing equality of generalized species richness of two communities, $D_\alpha = K_{1,\alpha} - K_{2,\alpha}$, one may first consider a bootstrapping $1 - \beta$ confidence interval for $D_\alpha$, as in (28), based on two independent samples of sizes $n_1$ and $n_2$, respectively, where $\hat{D}^*_{\alpha,\beta/2}$ and $\hat{D}^*_{\alpha,1-\beta/2}$ are the 100 × β/2 th and the 100 × (1 − β/2) th percentiles of the bootstrapping estimates, each of which is based on a sample of size $n_1$ from $\hat{p}_{1,\downarrow}$ and a sample of size $n_2$ from $\hat{p}_{2,\downarrow}$, where, for $j = 1$ or $j = 2$, $\hat{p}_{j,\downarrow}$ is the ordered relative frequencies of letters in the sample of size $n_j$ from the $j$th community.
For $H_0: D_\alpha = K_{\alpha,1} - K_{\alpha,2} = d_0$, where $K_{\alpha,1}$ and $K_{\alpha,2}$ are the respective generalized species richness of two communities and $d_0$ is a pre-fixed integer, approximate testing procedures may be devised based on (28) or (29). For example, based on (28), one may choose to reject $H_0$ when $d_0$ falls outside the confidence interval of (28). Similarly, based on (29), one may choose to reject $H_0$ when $d_0$ falls outside the confidence interval of (29).

To assess the reliability of the inferential procedures discussed above, several simulation studies are conducted. The studies are carried out under three different distributions. The first distribution is the uniform distribution with $K = 20$ and $p_k = 0.05$ for $k = 1, \cdots, 20$. The second distribution is a triangular distribution with $K = 20$ and $p_k = k/210$ for $k = 1, \cdots, 20$. The third distribution is the Poisson distribution with $\lambda = 10$ and $p_k = e^{-\lambda} \lambda^k / k!$ for $k = 0, 1, \cdots$, noting that in this case $K$ is infinite.
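For orientation, the population values of the trimmed richness under the three simulation distributions can be computed directly. The sketch below makes several assumptions: the trimmed richness is the least number of species covering a mass of 1 − α, the triangular weights are normalized by 210 so that they sum to one, and the Poisson support is truncated at 80, beyond which the remaining mass is negligible. The function name `k_alpha` is hypothetical.

```python
import math
from fractions import Fraction

def k_alpha(p, alpha):
    """Least k such that the k largest abundances sum to at least 1 - alpha."""
    masses = sorted((Fraction(x) for x in p if x > 0), reverse=True)
    target = (1 - Fraction(alpha)) * sum(masses)
    running = Fraction(0)
    for k, mass in enumerate(masses, start=1):
        running += mass
        if running >= target:
            return k
    return len(masses)

uniform = [Fraction(1, 20)] * 20
triangular = [Fraction(k, 210) for k in range(1, 21)]
lam = 10.0
# Poisson probabilities on a truncated support (an approximation of the infinite alphabet).
poisson = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(80)]

for name, dist in (("uniform", uniform), ("triangular", triangular), ("Poisson", poisson)):
    print(name, k_alpha(dist, "0.01"), k_alpha(dist, "0.05"))
```

These are the targets against which the bias of an estimator would be measured in a simulation of this kind.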
In Tables 1-3, the bias and the mean squared errors of the plug-in estimator $\hat{K}_\alpha$ of (7) and the bias-adjusted estimator $\tilde{K}_\alpha$ of (14) are compared, at two levels of α, α = 0.01 and α = 0.05, for various sample sizes, $n$. Tables 1-3, respectively, summarize the results under three different distributions, the uniform, the triangular and the Poisson. Each simulation scenario is based on 5000 repeated samples. Each sample is bootstrapped 1000 times. The bias is defined in such a way that a positive value indicates an under-estimation and a negative value indicates an over-estimation. The variable $T$ is the average of Turing's formula, $T_n = n_1/n$, where $n_1$ is the number of singletons in a sample, based on 5000 simulated samples. $T$ helps to indicate the adequacy of the sample size. Turing's formula, $T_n$, is sometimes called the sample coverage deficit and $1 - T_n$ is the sample coverage (see [17]). It is quite clear that $\tilde{K}_\alpha$ generally has a smaller simulated bias than $\hat{K}_\alpha$. More specifically, if one considers an absolute bias of less than one to be satisfactory, then $\tilde{K}_\alpha$ gets there faster, as $n$ increases, than $\hat{K}_\alpha$ in all cases considered in the simulation studies.
To assess the performance of the confidence sets in (13), (15) and (16), their actual coverage rates are evaluated by simulation studies with 1 − β = 0.95 for various sample sizes and distributions. For each scenario, the coverage rate is based on 5000 simulated samples and, for each sample, the bootstrapping confidence set is based on 1000 bootstrapping samples. The results are summarized in Tables 4-6. Let it be noted that, although the confidence set of (13) could perform well in some cases (see Columns 3 and 6 in Table 4, and Columns 6 and 12 of Table 5), it has difficulty in providing an appropriate coverage in many other cases (see Column 12 of Table 4, Columns 3, 6, 9 and 12 of Table 5, and Columns 3 and 9 of Table 6). The said difficulty is partially caused by the inconsistency mentioned above in combinations of certain distributional characteristics and the values of α. Similarly, the confidence set of (15) suffers from the same difficulty though to a lesser degree. It could also perform well in some cases (see Columns 4, 7, 10 and 13 in Table 4, Columns 7 and 10 of Table 5, and Columns 7 and 10 of Table 6), but it does not in many other cases (see Column 4 of Table 5, Columns 4 and 9 of Table 6). Since in practice the underlying distribution is not observable, it cannot be determined a priori what values of α are appropriate and what are not. This fact essentially disqualifies the confidence sets of (13) and (15), but not that of (16), as general inferential procedures. Additionally, it is to be noted that the confidence set of (16) performs well across all cases in the simulation studies, albeit more conservatively. The confidence sets of (28) and (29) generally perform better than their one-sample counterparts due to an offset of bias between the two one-sample estimators.
Another point of interest pertains to the practically important question of how large a sample should be in order for (16) to produce a reasonable coverage. Simulation results in Tables 4-6 seem to indicate that the coverage is adequate when Turing's formula, which estimates the total probability associated with the letters of the alphabet not represented in a given sample, takes on a value not much greater than α, that is, approximately T = n 1 /n < α, where n 1 is the number of species observed exactly once in the sample. This condition is referred to as the rule of thumb below. (Interested readers may refer to Zhang (2017) for a comprehensive introduction to Turing's formula.) In summary, all things considered and observing the rule of thumb, the estimator of (14) is the proposed estimator of K α , and the confidence set of (16) is the proposed basis for inferences about K α .

Example 8.
Two samples of trees from 1-ha plots (#6 and #18) of tropical forest in the experimental forest of Paracou, French Guiana, described in [22] and indexed as samples 6 and 18, respectively, are compared in terms of biodiversity. In the two plots, 643 and 481 trees, respectively, with diameter at breast height over 10 cm were inventoried. The data are available in the entropart package for R. In these samples, 147 and 149 tree species are observed in plots #6 and #18, respectively, along with their frequencies.
In [23], the data are analyzed by using generalized Simpson's indices, and it is concluded that plot #18 is more diverse than plot #6. In the respective samples, Turing's formula takes on the values of T 6 = 10.58% and T 18 = 15.38%. Observing the rule of thumb, let the generalized species richness be evaluated at α ≥ 0.15. The estimates by (14) give two curves in Figure 2, which visually suggest that plot #18 is more diverse than plot #6 for a wide range of α. The estimated difference D α = K 18,α − K 6,α as a function of α, along with the 95% point-wise confidence band by means of (29), is given in Figure 3, where it is evident that, with reasonable statistical confidence, K 18,α > K 6,α for α values in the range from 0.15 to 0.6, that is, for 1 − α values from 0.4 to 0.85.

Summary
This article proposes a generalized richness index, K α of (3), or equivalently of (4) or of (5), and an estimator of K α as in (14). Here α ∈ [0, 1) is a user-chosen constant, and when α = 0, K α becomes the well-known original richness index, K. K α may also be referred to as the α-trimmed richness index. It is designed to remove or to alleviate several weaknesses of K. First, K is finitely defined for some distributions but not for all, whereas K α is finitely defined for all distributions on a countable alphabet. Second, K does not take the abundances {p k ; k ≥ 1} into consideration, but K α does. Third, K is ultra-sensitive to the redistribution of an arbitrarily small mass, but K α is not, as evidenced by Definitions 1 and 2, Examples 1 and 4, and Proposition A2.
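The contrast between K and K α under a small redistribution of mass may be illustrated by a minimal Python sketch. It rests on an assumed reading of the α-trimmed richness, namely the smallest k such that the k largest proportions sum to at least 1 − α; this reading is not copied from (3), (4) or (5), and the function name trimmed_richness, the numerical tolerance and the toy distributions are all hypothetical.

```python
def trimmed_richness(p, alpha, tol=1e-12):
    """Smallest k such that the k largest proportions sum to at least 1 - alpha.

    ASSUMPTION: an illustrative reading of the alpha-trimmed richness, not the
    paper's displayed definition. `tol` guards against floating-point round-off.
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted((x for x in p if x > 0), reverse=True)  # non-increasing order
    target = 1 - alpha
    total = 0.0
    for k, mass in enumerate(ordered, start=1):
        total += mass
        if total >= target - tol:
            return k
    return len(ordered)  # fallback if round-off keeps the sum below the target


# Toy distribution on three species (illustrative only).
p = [0.5, 0.3, 0.2]
# Move a mass of 0.001 from the first species to ten new, very rare species.
q = [0.499, 0.3, 0.2] + [0.0001] * 10

print(trimmed_richness(p, 0.0), trimmed_richness(q, 0.0))    # richness K
print(trimmed_richness(p, 0.05), trimmed_richness(q, 0.05))  # trimmed, alpha = 0.05
```

Under this assumed reading, the untrimmed richness jumps from 3 to 13 after a mass of only 0.001 is redistributed, while the value at α = 0.05 remains 3.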
A conservative confidence interval based on bootstrapping is proposed in (16). This confidence interval provides the basic support for inferences about K α . A rule of thumb to judge whether the sample is adequate in supporting the proposed methodology is also proposed based on Turing's formula: T = n 1 /n < α, where n 1 is the number of singletons in the sample of size n. The rule of thumb is illustrated by simulated results in Tables 4-6. More specifically, in Table 4, the rule of thumb amounts to n ≥ 110 for α = 0.01, n ≥ 60 for α = 0.05, n ≥ 50 for α = 0.10 and n ≥ 40 for α = 0.15. The simulated coverages are all near or above the target 95%. In Table 5, the rule of thumb amounts to n ≥ 150 for α = 0.01, n ≥ 70 for α = 0.05, n ≥ 50 for α = 0.10 and n ≥ 40 for α = 0.15. The simulated coverages are all above the target 95%. In Table 6, the rule of thumb amounts to n ≥ 450 for α = 0.01, n ≥ 70 for α = 0.05, n ≥ 40 for α = 0.10 and n ≥ 30 for α = 0.15. The simulated coverages are all above the target 95%.
The one-sample estimator of K α in (14) for a single community is extended to a two-sample estimator of D α of (26), the difference of the α-trimmed richness indices of two communities. The proposed estimator of D α is given in (27). A proposed 100 × (1 − β)% confidence interval for D α is given in (29). This interval provides the basic support for testing hypotheses regarding D α , as specified in (32) and (33).
For the two-sample problem, the rule of thumb for the one-sample problem is modified to be: T 1 = n 1,1 /n 1 < α and T 2 = n 2,1 /n 2 < α where n 1 and n 2 are the respective sample sizes of the two independent samples, and n 1,1 and n 2,1 are the respective numbers of singletons in the two independent samples.
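The two-sample rule of thumb can be checked with a few lines of Python, sketched here with the sample sizes of Example 8. The singleton counts 68 and 74 are not reported in the text; they are assumptions back-calculated from the reported values T 6 = 10.58% and T 18 = 15.38%. The function names are hypothetical.

```python
def turing_formula(n_singletons, n):
    """Return T = n_singletons / n for a sample of size n."""
    if n <= 0 or not 0 <= n_singletons <= n:
        raise ValueError("need n > 0 and 0 <= n_singletons <= n")
    return n_singletons / n


def two_sample_rule_of_thumb(n11, n1, n21, n2, alpha):
    """True when both samples satisfy T < alpha (strict form of the rule)."""
    return turing_formula(n11, n1) < alpha and turing_formula(n21, n2) < alpha


# Sample sizes from Example 8; singleton counts are ASSUMED (back-calculated).
n_6, singletons_6 = 643, 68
n_18, singletons_18 = 481, 74

t_6 = turing_formula(singletons_6, n_6)
t_18 = turing_formula(singletons_18, n_18)
print(f"T_6 = {100 * t_6:.2f}%, T_18 = {100 * t_18:.2f}%")

ok_015 = two_sample_rule_of_thumb(singletons_6, n_6, singletons_18, n_18, 0.15)
ok_016 = two_sample_rule_of_thumb(singletons_6, n_6, singletons_18, n_18, 0.16)
print(ok_015, ok_016)
```

With these assumed counts, the strict inequality is narrowly missed for plot #18 at α = 0.15 and is met at α = 0.16, which is in line with the rule being stated only as an approximate guide.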
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data used in Example 8 are available in the entropart package for R.

Conflicts of Interest: The author of this manuscript has no financial or non-financial interests that are directly or indirectly related to the work submitted for publication.

Appendix A
The claims that K α satisfies Axiom A 4 and that K α is 100 × α% robust (the claim of Example 2) are established in this section.
For clarity of the proof, a definition and two lemmas are needed. The generalized species richness K α (p ↓ ) as in (3), (4) or (5) is defined for an underlying p being a probability distribution, that is, more specifically, p k ≥ 0 for each k and ∑ k≥1 p k = 1. For notational convenience in the proofs of this section, let the definition of K α (p ↓ ) be extended to any sequence of non-negative numbers, p = {p k ; k ≥ 1} or p ↓ = {p (k) ; k ≥ 1}, such that ∑ k≥1 p k < ∞, which implies that p (k) → 0 as k → ∞, specifically noting that ∑ k≥1 p k is not necessarily one.
Definition A1. For any sequence of non-negative values p = {p k ; k ≥ 1} such that ∑ k≥1 p k < ∞, and an α ∈ (0, ∑ k≥1 p k ), the generalized species richness is given by (A1).
It is clear that if p is a bona fide probability distribution, then K α (p ↓ ) given in (3), (4) or (5) is identical to (A1) in Definition A1. In this section, the notion of K α used is that of (A1).
Lemma A1. For any given sequence of non-negative values p = {p k ; k ≥ 1}, let a mass of ε > 0 be taken away from p i for a specific index i, where ε ∈ (0, p i ]. Let p * i = p i − ε, let p * be the sequence p but with p * i in place of p i , and let p * ↓ = {p * (k) ; k ≥ 1} be the re-arranged p * in a non-increasing order. For any α ∈ (0, ∑ k≥1 p k ), K α (p * ↓ ) ≤ K α (p ↓ ).