Extreme Tail Ratios and Overrepresentation among Subpopulations with Normal Distributions

Abstract: Given several different populations, the relative proportions of each in the high (or low) end of the distribution of a given characteristic are often more important than the overall average values or standard deviations. In the case of two different normally-distributed random variables, as is shown here, one of the (right) tail ratios will not only eventually be greater than 1 from some point on, but will even become infinitely large. More generally, in every finite mixture of different normal distributions, there will always be exactly one of those distributions that is not only overrepresented in the right tail of the mixture but even completely overwhelms all other subpopulations in the rightmost tails. This property (and the analogous result for the left tails), although not unique to normal distributions, is not shared by other common continuous centrally symmetric unimodal distributions, such as Laplace, nor even by other bell-shaped distributions, such as Cauchy (Lorentz) distributions.


Introduction
In comparisons of the distributional values of a given psychological or physical trait between two populations (e.g., treated/untreated, male/female, exposed/nonexposed, elderly/youthful), the relative proportion of each population with values exceeding specified threshold or cutoff levels is often of more interest than comparisons of average values, standard deviations, or combinations thereof such as the Cohen d effect-size measure. The (right) tail ratio of one distribution compared to a second distribution, a measure of the relative tail proportions, is the ratio of the fraction of the first population above a given cutoff to the fraction of the second population above that same cutoff.
Tail ratios are a common measure of differences in extremes between populations in general and are of particular interest in psychological research, as emphasized by [1] in their review of the background, history, and practical advantages of tail ratios. For example, ref. [2] reported actual numerical ranges (between two and four) of certain male/female tail ratios and [3] studied how certain male/female tail ratios have changed over time. More generally, the issues of over- and underrepresentation of various fractions of populations consisting of two or more subpopulations have become an important subject of study (see [4][5][6][7][8]).
The values of many biological or psychological traits (e.g., blood pressure, IQ, height) are often assumed to have normal (Gaussian) distributions (e.g., [9][10][11]), and the goal of this note is to record a simple fact about normal distributions that may be useful in interpreting statistical data concerning both tail ratios and over- and underrepresentation in mixed populations. In particular, it is shown below that in every population consisting of a finite number of subpopulations with different distributions of a normally distributed trait, exactly one of the subpopulations will not only dominate every other one in the right tail but also will do this in an extreme manner, eventually overwhelming all the other subpopulations. This property, although not unique to normal distributions, is not shared by other common distributions, including ones that are also continuous, centrally symmetric, and unimodal, nor even by other bell-shaped distributions such as the common Cauchy (Lorentz) distributions.

Tail Ratios and Right-Tail Dominance
Recall that a probability measure P on the real line is uniquely determined by its complementary cumulative distribution function (ccdf) F_P, defined by F_P(x) = P((x, ∞)) for all x ∈ R. (F_P is also often called the survival function of P, since F_P(c) represents the P-probability of the set above the cutoff threshold c, i.e., the fraction that survives when all values less than or equal to c are removed.) With this notation, the formal definition of the tail ratio of P_1 to P_2 is as follows.

Definition 1. Given probability distributions P_1 and P_2, the tail ratio of P_1 to P_2 is the function F_1(c)/F_2(c), where F_1 and F_2 are the ccdfs of P_1 and P_2, respectively.
(N.B. By convention, the tail ratio of P_1 to P_2 is that of the right tail; i.e., the ratio of the ccdf of P_1 to the ccdf of P_2, not the ratio of the cdf of P_1 to the cdf of P_2, which would yield the left tail ratio.) A distribution P_1 may be said to dominate another distribution P_2 in the right tail if, for all sufficiently large cutoffs c, the tail ratio of P_1 to P_2 is strictly greater than 1, i.e., F_1(c) > F_2(c). A much stronger notion of domination in the right tail is when the survival probabilities of P_1 eventually become arbitrarily larger than those of P_2 as the cutoff increases; this is formalized in the next definition.
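As an illustration (not part of the original text), the tail ratio of two normal distributions can be computed directly from their survival functions using only Python's standard library (`math.erfc`); the pair N(1, 1) and N(0, 1) below is a hypothetical example:

```python
import math

def normal_ccdf(c, mean, sd):
    """Survival function P(X > c) for X ~ N(mean, sd^2)."""
    return 0.5 * math.erfc((c - mean) / (sd * math.sqrt(2)))

def tail_ratio(c, dist1, dist2):
    """Right tail ratio of dist1 to dist2 at cutoff c, where dist = (mean, sd)."""
    return normal_ccdf(c, *dist1) / normal_ccdf(c, *dist2)

# Hypothetical pair N(1, 1) vs. N(0, 1): past the crossing point the ratio
# exceeds 1, and it keeps growing as the cutoff moves right.
for c in (1, 2, 4, 6):
    print(c, tail_ratio(c, (1.0, 1.0), (0.0, 1.0)))
```

The printed ratios increase without bound, which is exactly the "strong domination" behavior defined next.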

Definition 2.
A probability distribution P_1 strongly dominates distribution P_2 in the right tail if the (right) tail ratio of P_1 to P_2 becomes infinitely large as the cutoff increases, i.e., if

lim_{c→∞} F_1(c)/F_2(c) = ∞.

We also recall that a continuous (more precisely, absolutely continuous) probability distribution P is uniquely determined by its probability density function f_P : R → [0, ∞) via P((a, b)) = ∫_a^b f_P(x) dx; so, in particular, F_P(c) = ∫_c^∞ f_P(x) dx. The next lemma, which records a simple relationship between the quotients of probability density functions (pdfs) and the quotients of the corresponding ccdfs, is used in several examples and proofs below.

Lemma 1. Suppose P_1 and P_2 are continuous probability distributions with strictly positive continuous pdfs f_1 and f_2 and with ccdfs F_1 and F_2, respectively. If lim_{x→∞} f_1(x)/f_2(x) = L for some L ∈ [0, ∞], then lim_{x→∞} F_1(x)/F_2(x) = L.

Proof. Let f_1 and f_2 be strictly positive continuous pdfs with corresponding ccdfs F_1 and F_2, respectively. Then, since lim_{x→∞} F_1(x) = lim_{x→∞} F_2(x) = 0 and f_2(x) > 0 for all x,

lim_{x→∞} F_1(x)/F_2(x) = lim_{x→∞} F_1′(x)/F_2′(x) = lim_{x→∞} f_1(x)/f_2(x) = L,

where the first equality follows by the general form of L'Hôpital's rule, the second since F_i′(x) = −f_i(x), and the third by hypothesis.
The next example illustrates the difference between domination and strong domination.
Example 1. (i) Let P_1 and P_2 be Cauchy distributions with equal medians and scale parameters s_1 = 2 and s_2 = 1, respectively. Then f_1(x)/f_2(x) → s_1/s_2 = 2 as x → ∞, so by Lemma 1, lim_{c→∞} F_1(c)/F_2(c) = 2, which implies that, as c → ∞, the P_1-probability of the set of numbers greater than c approaches exactly twice the P_2-probability of numbers greater than c. Thus, although P_1 dominates P_2 in the right tail, neither P_1 nor P_2 strongly dominates the other in the right tail.
(ii) Let P_1 and P_2 be Laplace distributions with medians m_1 = 1 and m_2 = 0 and equal scale parameters s_1 = s_2 = 1, respectively. Then f_1(x)/f_2(x) = e for all x > 1, so by Lemma 1, lim_{c→∞} F_1(c)/F_2(c) = e; again, neither P_1 nor P_2 strongly dominates the other in the right tail.
(iii) Let P_1 and P_2 be Laplace distributions with medians m_1 = m_2 = 0 and scale parameters s_1 = 1 and s_2 = 0.5, respectively. Then P_1 strongly dominates P_2 in the right tail since f_1(x)/f_2(x) = e^x/2 → ∞ as x → ∞.
(iv) Let P_1 and P_2 be normal distributions with identical variances 1 and with means 1 and 0, respectively. Then f_1(x)/f_2(x) = e^{x−1/2} → ∞ as x → ∞; so, by Lemma 1, P_1 strongly dominates P_2 in the right tail.
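The contrast in Example 1 can be sketched numerically (an illustrative check, not from the source; the unit scale parameters in the Laplace case are an assumption): the Laplace tail ratio settles at the constant e, while the normal tail ratio grows without bound, as Lemma 1 predicts.

```python
import math

def laplace_ccdf(x, m, s):
    """Survival function of a Laplace distribution with median m and scale s."""
    if x >= m:
        return 0.5 * math.exp(-(x - m) / s)
    return 1.0 - 0.5 * math.exp((x - m) / s)

def normal_ccdf(x, m, sd):
    """Survival function of N(m, sd^2)."""
    return 0.5 * math.erfc((x - m) / (sd * math.sqrt(2)))

# Laplace medians 1 and 0 with equal unit scales: the tail ratio is the constant e.
for c in (2, 5, 10):
    print(c, laplace_ccdf(c, 1, 1) / laplace_ccdf(c, 0, 1))

# Normals with equal variance and means 1 and 0: the tail ratio grows without bound.
for c in (2, 5, 8):
    print(c, normal_ccdf(c, 1, 1) / normal_ccdf(c, 0, 1))
```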

Tail Ratios in Normal Distributions
When population research studies report only the means and standard deviations of their results, the default scientific understanding is that the data are approximately normally distributed; that is, the distributions in question are close to normal (Gaussian) distributions (see [12] for a comprehensive treatment of this classic distribution). For example, if a research study reports that its data have an average value of 2 and a standard deviation of 1, then the usual understanding is that the underlying dataset looks like the diagram in Figure 1(left) with m = 2 and s = 1, not like the somewhat similar Cauchy distribution in Figure 1(right). The underlying theoretical basis for the assumption of normality in most cases is the remarkable Central Limit Theorem, which says that if the numerical results of many independent repetitions of an experiment (with finite variance) are added, the distribution of the suitably standardized sum (and consequently of the sample average) approaches a normal distribution. For instance, in the present context of tail ratios, the survey article by Voracek, Mohr, and Hagmann states "all tail-ratio calculations discussed here assume normally distributed variables" ([1], p. 882).
The appropriateness of assuming that given data have a normal distribution is often tested using the well-known empirical observation called the "68%-95%-99.7% rule" of normality illustrated in Figure 1(left). The one key property of a continuous centrally symmetric unimodal distribution that makes it normal is the unique (after rescaling) rate of decrease of its density function away from its mean. The normal density function, discovered by Gauss in 1809 in connection with his studies of astronomical observation errors, decreases from its mean at a rate exactly proportional to e^{−x²}, and not to e^{−|x|} or x^{−2}, for example, as is the case for the Laplace and Cauchy distributions, respectively. The Cauchy distribution, for instance, which sees widespread application in physics, also has a continuous centrally symmetric unimodal bell-shaped density similar to that of the normal distribution, but the Cauchy distribution has an undefined mean and variance and satisfies a different empirical rule, namely the 50%-70%-79.5% rule illustrated in Figure 1(right). Thus, it is very easy in practice to distinguish between these two similar-looking common bell-shaped distributions.
As is easy to see, the density functions of any two different normal distributions intersect in either exactly one point (when the variances are equal and the means differ) or exactly two distinct points (when the variances differ). Thus, the density function of one of those two distributions is strictly larger than that of the other at all points greater than the larger of the two intersection points (or the unique one, if there is only one). This, in turn, implies that the proportion of that distribution above that point is strictly larger than that of the other distribution above that point; thus, that distribution will be overrepresented in the right tail. This is illustrated in the following example.
(N.B. For brevity, the standard notation N(m, σ²) will be used throughout this note to denote a normal distribution with mean m and standard deviation σ > 0.)

Example 2. (i) Let P_1 ∼ N(100, 10²) and P_2 ∼ N(110, 10²). It is clear that the unique crossing point of the density functions of P_1 and P_2 is at x = 105, which implies that the proportion of P_2 above any cutoff c > 105 is greater than the proportion of P_1 above c, i.e., the tail ratio of P_2 to P_1 is greater than 1 for all cutoff values c strictly greater than 105. Conversely, the proportion of P_1 below any c < 105 is greater than the proportion of P_2 below c.
(ii) Let P_1 ∼ N(100, 10²) and P_2 ∼ N(101, 11²). By basic algebra, the two crossing points of the density functions of P_1 and P_2 are seen to be at x_1 ≈ 83.52 and x_2 ≈ 106.95, which implies that the tail ratio of P_2 to P_1 is greater than 1 for all cutoffs c > x_2. Similarly, in this case P_2 also dominates P_1 in the lower tail in that the proportion of P_2 below any cutoff c < x_1 is also greater than the proportion of P_1 below c.
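The crossing points in both parts of Example 2 can be recovered by equating the two log-densities, which reduces to a quadratic equation; the following sketch (illustrative only, not from the source) solves that quadratic for the parameters above:

```python
import math

def normal_density_crossings(m1, s1, m2, s2):
    """Points where the N(m1, s1^2) and N(m2, s2^2) densities are equal."""
    # Equating log-densities yields a*x^2 + b*x + c = 0.
    a = 1 / (2 * s2**2) - 1 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2) - math.log(s1 / s2)
    if abs(a) < 1e-15:          # equal variances: one crossing, at the midpoint
        return [-c / b]
    disc = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])

print(normal_density_crossings(100, 10, 110, 10))   # single crossing at 105
print(normal_density_crossings(100, 10, 101, 11))   # two crossings, near 83.52 and 106.95
```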
As was seen in Example 1, for two given different Cauchy distributions or two different Laplace distributions, neither distribution may strongly dominate the other in the right tail. This is in sharp contrast to the main conclusion of this note, where it will be shown that in every finite collection of different normal distributions, there is always a unique one of those distributions that strongly dominates every one of the other distributions in the right tail.

Theorem 1. Let P_1 and P_2 be different normal distributions. Then, (i) either P_1 strongly dominates P_2 in the right tail or P_2 strongly dominates P_1 in the right tail; (ii) if P_1 strongly dominates P_2 in the right tail, then either P_1 has greater mean (average value) than P_2 or P_1 has greater variance than P_2 or both; (iii) if P_1 has greater variance than P_2, then P_1 strongly dominates P_2 in both the right and left tails, independent of the means.
The same essential argument extends easily to show that among every finite collection of different normal distributions, strong domination in the right tail by exactly one of those distributions is inevitable.

Corollary 1. Given a finite number of different normal distributions P_1, . . . , P_n, there is a unique one of these distributions that strongly dominates all the others in the right tail.
Proof. For each i ∈ {1, . . . , n}, let the normal distribution P_i have mean m_i and standard deviation σ_i. Since the distributions are all different, m_i = m_j and σ_i = σ_j together imply i = j, so there exists a unique i* ∈ {1, . . . , n} such that m_{i*} = max{m_j : σ_j = max{σ_1, . . . , σ_n}}. By the arguments for Cases 1 and 2 in the proof of Theorem 1, P_{i*} strongly dominates P_i in the right tail for all i ≠ i*.
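Per the proof of Corollary 1, the strongly dominating distribution can be identified mechanically: take the largest standard deviation, breaking ties by the largest mean. A minimal sketch (the function name is our own, for illustration):

```python
def strongly_dominant(dists):
    """Index of the distribution that strongly dominates in the right tail:
    largest standard deviation, ties broken by largest mean (Corollary 1).
    Each entry of dists is a (mean, sd) pair, assumed pairwise distinct."""
    return max(range(len(dists)), key=lambda i: (dists[i][1], dists[i][0]))

dists = [(100, 10), (110, 10), (101, 11)]   # (mean, sd) pairs
print(strongly_dominant(dists))             # index 2: N(101, 11^2) has the largest sd
```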
As was seen in Theorem 1, if P 1 has either greater variance than P 2 or the same variance and higher mean, then P 1 will strongly dominate P 2 in the right tail. Moreover, for many practical purposes, "most important, what might appear to be trivial group differences in both variability and central tendency can cumulate to yield very appreciable differences between the groups in numbers of extreme scorers" ([10], p. 11). The next example, a slight modification of the numerical example suggested by Feingold, illustrates this observation with the two normal distributions in Example 2(ii) that are close in mean value (100 vs. 101) and in standard deviation (10 vs. 11).
Example 3. Suppose a population X consists of two mutually exclusive subpopulations X_1 and X_2, where the values of a given trait are normally distributed with distributions P_1 ∼ N(100, 10²) and P_2 ∼ N(101, 11²), respectively, as in Example 2(ii). A normal distribution calculator yields P(X_1 > 130) ≈ 1.3499 × 10⁻³ and P(X_2 > 130) ≈ 4.1900 × 10⁻³, which yields the tail ratio P(X_2 > 130)/P(X_1 > 130) ≈ 3.104. Thus, even with distributions this close in average value and standard deviation, if the two subpopulations X_1 and X_2 are of the same size, then the X_2 population will comprise more than 75% of the combined population beginning only three standard deviations above the mean.
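The figures in Example 3 can be checked with a few lines of standard-library Python (an illustrative verification, not part of the source):

```python
import math

def normal_ccdf(c, mean, sd):
    """Survival function P(X > c) for X ~ N(mean, sd^2)."""
    return 0.5 * math.erfc((c - mean) / (sd * math.sqrt(2)))

p1 = normal_ccdf(130, 100, 10)   # P(X1 > 130), three SDs above the mean of X1
p2 = normal_ccdf(130, 101, 11)   # P(X2 > 130)
print(p1, p2)                    # roughly 1.35e-3 and 4.19e-3
print(p2 / p1)                   # tail ratio, roughly 3.10
print(p2 / (p1 + p2))            # X2's share of the combined tail, over 0.75
```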
Note that the results in Theorem 1 and Corollary 1 for normal distributions only depend on first-order asymptotic terms, and the question of which more general classes of distributions with rapidly decaying tails satisfy the same conclusions is left to the interested reader.

Overrepresentation in the Right Tail
Whether a particular subpopulation is overrepresented or underrepresented among the other subpopulations with respect to given values for a specific trait depends on the relative size of that subpopulation with those trait values compared to the size of the whole population with those trait values. For example, if subpopulation X 1 comprises 30% of the total population, but comprises 40% of the population with trait values above a given cutoff c, then X 1 is overrepresented in the portion of the total population with values greater than c.
The goal of this section is to show that a simple consequence of Corollary 1 is that in every finite mixture of different normal distributions, exactly one of those distributions will be strongly overrepresented in the right tail. (Recall that a finite mixture of distributions is a probability distribution with cdf F satisfying F = ∑_{i=1}^n w_i F_i, where n > 1, F_1, . . . , F_n are cdfs, and w_1, . . . , w_n are strictly positive weights with ∑_{i=1}^n w_i = 1.)

Definition 3. Given a finite mixture of distributions F = ∑_{i=1}^n w_i F_i, the distribution F_{i*} is strongly overrepresented in the right tail of F if, as c → ∞, the proportion of subpopulation F_{i*} with values above c approaches 100% of the total population of F with values above c, that is, if

lim_{c→∞} w_{i*}(1 − F_{i*}(c)) / (1 − F(c)) = 1.

Theorem 2. In every finite mixture of different normal distributions F = ∑_{i=1}^n w_i F_i, there is a unique i* ∈ {1, . . . , n} such that F_{i*} is strongly overrepresented in the right tail of F.
Proof. For each i ∈ {1, . . . , n}, let F_i denote the cdf of P_i. By Corollary 1, there exists a unique i* ∈ {1, . . . , n} such that P_{i*} strongly dominates P_i in the right tail for all i ≠ i*, i.e., (1 − F_i(c))/(1 − F_{i*}(c)) → 0 as c → ∞ for all i ≠ i*. Hence

lim_{c→∞} w_{i*}(1 − F_{i*}(c)) / (1 − F(c)) = lim_{c→∞} w_{i*} / (w_{i*} + ∑_{i≠i*} w_i (1 − F_i(c))/(1 − F_{i*}(c))) = 1,

so F_{i*} is strongly overrepresented in the right tail of F.
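Theorem 2 can be illustrated numerically: in a hypothetical equal-weight mixture of three normals (all parameters below are our own illustrative choices), the component with the largest standard deviation takes over the right tail as the cutoff grows.

```python
import math

def normal_ccdf(c, mean, sd):
    """Survival function P(X > c) for X ~ N(mean, sd^2)."""
    return 0.5 * math.erfc((c - mean) / (sd * math.sqrt(2)))

# Equal-weight mixture of three normals; the component with the largest
# standard deviation (here N(100, 12^2)) should take over the right tail.
components = [(100.0, 10.0), (105.0, 10.0), (100.0, 12.0)]
weights = [1 / 3] * 3

for c in (120, 140, 160, 180):
    tails = [w * normal_ccdf(c, m, s) for w, (m, s) in zip(weights, components)]
    share = tails[2] / sum(tails)          # N(100, 12^2)'s share of the mixed tail
    print(c, round(share, 4))
```

The printed shares climb toward 1, exactly the strong overrepresentation of Definition 3.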
Two concrete examples of normally-distributed traits among human populations are height and test scores such as those in the College Board Scholastic Aptitude Test (SAT). There are enormous amounts of data on human height, which are essentially continuous and are very close to being normally distributed ( [11], p. 24). Scores on the SAT, on the other hand, are originally discrete but the distribution "obtained from a continuized, smoothed frequency distribution of original SAT scores" is a linear transformation of a normal distribution ( [9], p. 59). Thus, since all linear transformations of normal distributions are normal, for all practical purposes, the resulting smoothed SAT scores have normal distributions.

Example 4.
The SAT scores of males and females are usually assumed (or designed) to be approximately normally distributed [9]. Unless the distributions are identical, Theorem 2 implies that exactly one of those two sexes must be strongly overrepresented in the right tail, and that this overrepresentation will increase as the score range increases; Figure 2 illustrates this with actual College Board statistics. Figure 2 (from [13], p. 7, the table titled "The numbers College Board didn't publish") shows statistics for nearly two million students in the 2016 edition of the Scholastic Aptitude Test, broken down by gender and score range. Note that the proportions of males in the various score ranges, i.e., the tail ratios, increase as the score range increases, and the left tail ratios also increase as the score range decreases. (About 10% more females participated than males, which is reflected in the adjusted male/female ratios.)

Discussion
In real-life examples, of course, no variable is exactly normally distributed, since the normal distribution is continuous and the number of people in the various categories, for example, is necessarily finite. However, if distributions are close to being normally distributed, the right-tail overrepresentation of a unique subpopulation predicted by Theorem 2 (and the analogous conclusion for left tails) is perhaps reasonable to expect. Similarly, in real-life examples, calculations involving tails that are 6 or 7 standard deviations out involve probabilities of less than one in a billion and are meaningless for the current human population of this planet.

Conflicts of Interest:
The authors declare no conflict of interest.