Abstract
This paper proposes a new index, the “distribution non-uniformity index (DNUI)”, for quantitatively measuring the non-uniformity or unevenness of a probability distribution relative to a baseline uniform distribution. The proposed DNUI is a normalized, distance-based metric ranging between 0 and 1, with 0 indicating perfect uniformity and 1 indicating extreme non-uniformity. It satisfies our axioms for an effective non-uniformity index and is applicable to both discrete and continuous probability distributions. Several examples are presented to demonstrate its application and to compare it with two distance measures, namely, the Hellinger distance (HD) and the total variation distance (TVD), and two classical evenness measures, namely, Simpson’s evenness and Buzas and Gibson’s evenness.
1. Introduction
Non-uniformity, or unevenness, is an inherent characteristic of probability distributions, as outcomes or values from a probability system are typically not distributed uniformly or evenly. Although the shape of a distribution can offer an intuitive sense of its non-uniformity, researchers often require a quantitative measure to assess this property. Such a measure is valuable for constructing distribution models and for comparing the non-uniformity across different distributions in a consistent and interpretable way.
A probability distribution is considered uniform when all outcomes have equal probability, in the discrete case, or when the probability density is constant, in the continuous case. Therefore, the uniform distribution serves as the natural baseline for assessing the non-uniformity of any given distribution, and non-uniformity refers to the degree to which a distribution deviates from this uniform benchmark. It is essential to ensure that the distribution being evaluated and the baseline uniform distribution share the same support. This requirement is especially important in the continuous case, where a fixed and clearly defined support is crucial for meaningful comparison.
The Kullback–Leibler (KL) divergence or the χ2 divergence may be used as a metric for measuring the non-uniformity of a given distribution by quantifying how different the distribution is from a baseline uniform distribution. For a discrete random variable X with probability mass function (PMF) $p(x)$ and n possible outcomes, the KL divergence relative to the uniform distribution with PMF 1/n is given by

$$D_{KL} = \sum_{i=1}^{n} p(x_i) \ln[n\, p(x_i)] \tag{1}$$
The χ2 divergence is given by

$$D_{\chi^2} = n \sum_{i=1}^{n} \left[ p(x_i) - \frac{1}{n} \right]^2 \tag{2}$$
While a KL or χ2 divergence value of zero indicates perfect uniformity, there is no natural upper bound that allows us to specify how non-uniform a distribution is. Furthermore, as shown in Equations (1) and (2), the KL or χ2 divergence will tend to infinity as the number of possible outcomes (n) goes to infinity, regardless of the distribution (except for the uniform distribution). The lack of an upper bound can make interpretation difficult, especially when comparing different distributions or when the scale of the divergence matters. Therefore, we will not discuss the KL and χ2 divergence further in this paper.
The Hellinger distance (HD) and the total variation distance (TVD), two well-known distances, may be used to measure the non-uniformity of a given distribution relative to a baseline uniform distribution. For the discrete case, the HD, as a non-uniformity measure relative to a uniform distribution with PMF 1/n, is given by

$$HD = \left( 1 - \sum_{i=1}^{n} \sqrt{\frac{p(x_i)}{n}} \right)^{1/2} \tag{3}$$
The TVD as a non-uniformity measure is given by

$$TVD = \frac{1}{2} \sum_{i=1}^{n} \left| p(x_i) - \frac{1}{n} \right| \tag{4}$$
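For concreteness, the following Python sketch evaluates Equations (1)–(4) for a discrete PMF relative to the uniform baseline; the function name and implementation details are ours, not from the cited literature.

```python
import numpy as np

def divergences_from_uniform(p):
    """KL divergence, chi-squared divergence, Hellinger distance, and total
    variation distance of a discrete PMF `p` relative to the uniform PMF 1/n,
    per Equations (1)-(4). Illustrative sketch only."""
    p = np.asarray(p, dtype=float)
    n = p.size
    u = 1.0 / n
    mask = p > 0                                  # 0*ln(0) -> 0 by convention
    kl = np.sum(p[mask] * np.log(n * p[mask]))    # Eq. (1)
    chi2 = n * np.sum((p - u) ** 2)               # Eq. (2)
    hd = np.sqrt(1.0 - np.sum(np.sqrt(p / n)))    # Eq. (3)
    tvd = 0.5 * np.sum(np.abs(p - u))             # Eq. (4)
    return kl, chi2, hd, tvd

# A fair coin gives (0, 0, 0, 0); a fully biased coin gives
# (ln 2, 1, 0.541, 0.5); note that the HD and TVD stay well below 1.
print(divergences_from_uniform([0.5, 0.5]))
print(divergences_from_uniform([1.0, 0.0]))
```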
The HD and TVD range between 0 and 1 and do not require standardization or normalization. This is a desirable property for non-uniformity metrics. However, to the best of the author’s knowledge, the HD and TVD have not been used to measure distribution non-uniformity. Therefore, their performance is unknown.
In recent work, Rajaram et al. [1,2] proposed a measure called the “degree of uniformity (DOU)” to quantify how evenly the probability mass or density is distributed across available outcomes or support. Specifically, they defined the DOU for a partial distribution on a fixed interval $[a, b]$ as the ratio of the exponential of the Shannon entropy to the coverage probability of that interval [1,2]:

$$DOU_P = \frac{e^{H_P}}{p_{[a,b]}} = \frac{D_P}{p_{[a,b]}} \tag{5}$$

where the subscript “P” denotes “part”, referring to the partial distribution on the fixed interval $[a, b]$, $p_{[a,b]}$ is the coverage probability of the interval, $H_P$ is the entropy of the partial distribution, and $D_P = e^{H_P}$ is the entropy-based diversity of the partial distribution. When the entire distribution is considered, $p_{[a,b]} = 1$, and thus, the DOU equals the entropy-based diversity $D = e^{H}$. It should be noted that the DOU is neither standardized nor normalized and does not explicitly measure the deviation of a given distribution from a uniform benchmark. Therefore, we will not discuss the DOU further in this paper.
Classical evenness measures, such as Simpson’s evenness and Buzas and Gibson’s evenness, are essentially diversity ratios. For a discrete random variable X with PMF $p(x)$ and n possible outcomes, Simpson’s evenness is defined as (e.g., [3])

$$E_S = \frac{D_S}{n} = \frac{1}{n \sum_{i=1}^{n} [p(x_i)]^2} \tag{6}$$

where $D_S = 1 / \sum_{i=1}^{n} [p(x_i)]^2$ is called Simpson’s diversity, representing the effective number of distinct elements in the probability system $\{X; p(x)\}$, and n is the maximum diversity, which corresponds to a uniform distribution with PMF 1/n. The concept of effective number is the core of diversity measures in biology [4].
Buzas and Gibson’s evenness is defined as [5]

$$E_{BG} = \frac{e^{H(X)}}{n} \tag{7}$$

where $H(X) = -\sum_{i=1}^{n} p(x_i) \ln p(x_i)$ is the Shannon entropy of X, and $\ln n$ is the entropy of the baseline uniform distribution. The exponential of the Shannon entropy, $e^{H(X)}$, is the entropy-based diversity and is also considered to be an effective number of elements in the probability system $\{X; p(x)\}$.
Both $D_S$ and $e^{H(X)}$ are normalized by n, the maximum diversity corresponding to the baseline uniform distribution. Therefore, these indices range between 0 and 1, with 0 indicating extreme unevenness and 1 indicating perfect evenness. Since evenness is negatively correlated with unevenness, we consider the complements of $E_S$ and $E_{BG}$ as unevenness (i.e., non-uniformity) indices. That is, we denote $1 - E_S$ as Simpson’s unevenness and $1 - E_{BG}$ as Buzas and Gibson’s unevenness, with 0 indicating perfect evenness (uniformity) and 1 indicating extreme unevenness (non-uniformity).
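The two unevenness indices are equally simple to compute; a minimal Python sketch (function names ours) follows:

```python
import numpy as np

def simpson_unevenness(p):
    """1 - E_S, with E_S from Equation (6): Simpson's diversity over n."""
    p = np.asarray(p, dtype=float)
    d_s = 1.0 / np.sum(p ** 2)      # Simpson's diversity (effective number)
    return 1.0 - d_s / p.size

def buzas_gibson_unevenness(p):
    """1 - E_BG, with E_BG from Equation (7): exp(entropy) over n."""
    p = np.asarray(p, dtype=float)
    n = p.size
    q = p[p > 0]                    # 0*ln(0) -> 0 by convention
    h = -np.sum(q * np.log(q))      # Shannon entropy H(X)
    return 1.0 - np.exp(h) / n

# Both are 0 for a uniform PMF and 1 - 1/n for a degenerate PMF.
print(simpson_unevenness([0.25] * 4), buzas_gibson_unevenness([0.25] * 4))
print(simpson_unevenness([1, 0, 0, 0]), buzas_gibson_unevenness([1, 0, 0, 0]))
```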
However, as Gregorius and Gillet [6] pointed out, “Diversity-based methods of assessing evenness cannot provide information on unevenness, since measures of diversity generally do not produce characteristic values that are associated with states of complete unevenness.” This limitation arises because diversity measures are primarily designed to capture internal distribution characteristics, such as concentration and relative abundance within the distribution. For example, the quantity $\sum_{i=1}^{n} [p(x_i)]^2$ is often called the “repeat rate” [7] or Simpson concentration [4]; it has historically been used as a measure of concentration [7]. Moreover, since diversity metrics are not constructed within a comparative distance framework, they inherently lack the ability to quantify deviations from uniformity in a meaningful or interpretable way. This limitation significantly diminishes their effectiveness when the goal is specifically to detect or describe high degrees of non-uniformity.
The aim of this study is to develop a new normalized, distance-based index that can effectively quantify the non-uniformity or unevenness of a probability distribution. In the following sections, Section 2 describes the proposed distribution non-uniformity index (DNUI). Section 3 presents several examples to compare the proposed DNUI with the Hellinger distance (HD), the total variation distance (TVD), Simpson’s unevenness, and Buzas and Gibson’s unevenness. Section 4 and Section 5 provide the discussion and conclusions, respectively.
2. The Proposed Distribution Non-Uniformity Index (DNUI)
The mathematical formulation of the proposed distribution non-uniformity index (DNUI) differs for discrete and continuous random variables.
2.1. Discrete Cases
Consider a discrete random variable X with PMF $p(x)$ and n possible outcomes. Let U denote the uniform distribution with the same possible outcomes, so that its PMF is $u(x) = 1/n$ for all x. We use this uniform distribution as the baseline for measuring the non-uniformity of the distribution of X.
The difference between the two PMFs $p(x)$ and $u(x)$ is given by

$$\Delta p(x) = p(x) - \frac{1}{n} \tag{8}$$

Thus, $p(x)$ can be written as

$$p(x) = \frac{1}{n} + \Delta p(x) \tag{9}$$

Taking squares on both sides of Equation (9) yields

$$[p(x)]^2 = \frac{1}{n^2} + \frac{2}{n} \Delta p(x) + [\Delta p(x)]^2 \tag{10}$$

Then, taking the expectation on both sides of Equation (10), with respect to the distribution of X, yields

$$E[p(X)^2] = \frac{1}{n^2} + \frac{2}{n} E[\Delta p(X)] + E\{[\Delta p(X)]^2\} = \frac{1}{n^2} + D_T \tag{11}$$
In Equation (11), the second moment $E[p(X)^2]$ is expressed as the sum of the baseline term $1/n^2$ and $D_T$, which is called the total deviation and is given by

$$D_T = \frac{2}{n} E[\Delta p(X)] + E\{[\Delta p(X)]^2\} = V + B^2 + \frac{2}{n} B \tag{12}$$

where $V$ is the variance of $\Delta p(X)$ relative to the baseline uniform distribution, given by

$$V = E\{[\Delta p(X)]^2\} - (E[\Delta p(X)])^2 \tag{13}$$

and $B$ is the bias of $p(X)$ relative to the baseline uniform distribution, given by

$$B = E[\Delta p(X)] = E[p(X)] - \frac{1}{n} \tag{14}$$

where $E[p(X)] = \sum_{i=1}^{n} [p(x_i)]^2$ is called the (discrete) informity of X in the theory of informity proposed by Huang [8]; it is the expectation of the PMF $p(x)$. The informity of the baseline uniform distribution is $1/n$. Therefore, $B$ is the difference between the two discrete informities.
Definition 1.
The proposed DNUI (denoted by $\theta_X$) for the distribution of X is given by

$$\theta_X = \frac{\sqrt{D_T}}{\sqrt{E[p(X)^2]}} = \sqrt{1 - \frac{1/n^2}{E[p(X)^2]}} \tag{15}$$

where $\sqrt{E[p(X)^2]}$ is the root mean square (RMS) of $p(X)$, and the second equality follows from Equation (11). The second moment $E[p(X)^2]$ can be calculated as

$$E[p(X)^2] = \sum_{i=1}^{n} [p(x_i)]^3 \tag{16}$$
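As an illustration, the following Python sketch implements Equations (15) and (16) for a discrete PMF; the function name and implementation details are ours, not part of the paper.

```python
import numpy as np

def dnui_discrete(p):
    """Discrete DNUI, Definition 1: sqrt(1 - (1/n^2) / E[p(X)^2]),
    where the second moment E[p(X)^2] = sum_i p(x_i)^3 (Equation (16))."""
    p = np.asarray(p, dtype=float)
    n = p.size
    m2 = np.sum(p ** 3)                       # second moment, Eq. (16)
    return np.sqrt(1.0 - (1.0 / n ** 2) / m2)

print(dnui_discrete([0.25] * 4))    # uniform PMF -> 0.0
print(dnui_discrete([1, 0, 0, 0]))  # degenerate PMF -> sqrt(1 - 1/16) ~ 0.968
```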
2.2. Continuous Cases
Consider a continuous random variable Y with probability density function (PDF) $f(y)$ defined on an unbounded support, such as $[0, \infty)$. Since there is no baseline uniform distribution defined over an unbounded support, we cannot measure the non-uniformity of the entire distribution. Instead, we examine parts of the distribution on a fixed interval $[a, b]$, which allows us to assess local non-uniformity.
According to Rajaram et al. [1], the PDF of a partial distribution on $[a, b]$ is given by the renormalization of the original PDF $f(y)$:

$$f_P(y) = \frac{f(y)}{p_{[a,b]}}, \quad a \le y \le b \tag{17}$$

where $p_{[a,b]} = \int_a^b f(y)\,dy$, which is the coverage probability of the interval $[a, b]$.
Let $U_{[a,b]}$ denote the uniform distribution on $[a, b]$, with PDF $u(y) = 1/(b-a)$. We use this uniform distribution as the baseline for measuring the non-uniformity of the partial distribution.
Similar to the discrete case, the difference between the two PDFs $f_P(y)$ and $u(y)$ is given by

$$\Delta f_P(y) = f_P(y) - \frac{1}{b-a} \tag{18}$$

Thus, $f_P(y)$ can be written as

$$f_P(y) = \frac{1}{b-a} + \Delta f_P(y) \tag{19}$$

Taking squares on both sides of Equation (19) yields

$$[f_P(y)]^2 = \frac{1}{(b-a)^2} + \frac{2}{b-a}\,\Delta f_P(y) + [\Delta f_P(y)]^2 \tag{20}$$

Then, taking the expectation on both sides of Equation (20), with respect to the partial distribution, yields

$$E[f_P(Y)^2] = \frac{1}{(b-a)^2} + \frac{2}{b-a}\,E[\Delta f_P(Y)] + E\{[\Delta f_P(Y)]^2\} = \frac{1}{(b-a)^2} + D_T \tag{21}$$
The total deviation $D_T$ is given by

$$D_T = \frac{2}{b-a}\,E[\Delta f_P(Y)] + E\{[\Delta f_P(Y)]^2\} = V_P + B_P^2 + \frac{2}{b-a}\,B_P \tag{22}$$

where $V_P$ is the variance of $\Delta f_P(Y)$ relative to the baseline uniform distribution, given by

$$V_P = E\{[\Delta f_P(Y)]^2\} - (E[\Delta f_P(Y)])^2 \tag{23}$$

and $B_P$ is the bias of $f_P(Y)$ relative to the baseline uniform distribution, given by

$$B_P = E[\Delta f_P(Y)] = E[f_P(Y)] - \frac{1}{b-a} \tag{24}$$
Definition 2.
The proposed DNUI for the partial distribution on $[a, b]$ (denoted by $\theta_P$) is given by

$$\theta_P = \frac{\sqrt{D_T}}{\sqrt{E[f_P(Y)^2]}} = \sqrt{1 - \frac{1/(b-a)^2}{E[f_P(Y)^2]}} \tag{25}$$

where $E[f_P(Y)^2]$ is the second moment of $f_P(Y)$, given by

$$E[f_P(Y)^2] = \int_a^b [f_P(y)]^3\,dy \tag{26}$$
Definition 3.
If the continuous distribution is defined on the fixed support $[a, b]$, and $p_{[a,b]} = 1$, the proposed DNUI for the entire distribution of Y (denoted by $\theta_Y$) is given by

$$\theta_Y = \frac{\sqrt{D_T}}{\sqrt{E[f(Y)^2]}} = \sqrt{1 - \frac{1/(b-a)^2}{E[f(Y)^2]}} \tag{27}$$

where $E[f(Y)^2]$ is the second moment of $f(Y)$, given by

$$E[f(Y)^2] = \int_a^b [f(y)]^3\,dy \tag{28}$$

the variance $V$ is given by

$$V = E\{[\Delta f(Y)]^2\} - (E[\Delta f(Y)])^2 \tag{29}$$

and the bias $B$ is given by

$$B = E[\Delta f(Y)] = E[f(Y)] - \frac{1}{b-a} \tag{30}$$
The quantity $E[f(Y)] = \int_a^b [f(y)]^2\,dy$ is denoted by $\iota_Y$ and is called the continuous informity of Y in the theory of informity [8]. The continuous informity of the baseline uniform distribution is $1/(b-a)$. Therefore, $B$ is the difference between the two continuous informities.
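For continuous distributions, the integrals in Definitions 2 and 3 can be evaluated numerically. The following Python sketch uses SciPy quadrature (the helper name is ours, and it assumes the supplied PDF integrates to 1 on [a, b]); as a check, it reproduces the raised-cosine value discussed in Section 3.3, assuming the standard raised-cosine form on [0, 1].

```python
import numpy as np
from scipy.integrate import quad

def dnui_continuous(pdf, a, b):
    """DNUI of Definition 3 for a PDF on the fixed support [a, b]:
    sqrt(1 - (1/(b-a)^2) / E[f(Y)^2]), with E[f(Y)^2] equal to the
    integral of f^3 over [a, b] (Equation (28))."""
    m2, _ = quad(lambda y: pdf(y) ** 3, a, b)        # second moment, Eq. (28)
    return np.sqrt(1.0 - (1.0 / (b - a) ** 2) / m2)

# Raised cosine on [0, 1]: f(y) = 1 - cos(2*pi*y). The result,
# sqrt(0.6) ~ 0.7746, agrees with the largest DNUI value in Table 2.
print(dnui_continuous(lambda y: 1.0 - np.cos(2.0 * np.pi * y), 0.0, 1.0))
```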
3. Examples
3.1. Coin Tossing
Consider tossing a coin, which is the simplest two-state probability system: $\{X; P(x)\} = \{\text{head}, \text{tail}; P(\text{head}), P(\text{tail})\}$, where $P(\text{head}) + P(\text{tail}) = 1$. With $n = 2$, the DNUI for the distribution of X is given by

$$\theta_X = \sqrt{1 - \frac{1/4}{E[p(X)^2]}} \tag{31}$$

where the second moment $E[p(X)^2]$ can be calculated as

$$E[p(X)^2] = [P(\text{head})]^3 + [P(\text{tail})]^3 \tag{32}$$
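A few values of Equations (31) and (32), computed with the short sketch below (function name ours), trace the curve shown in Figure 1:

```python
import numpy as np

def dnui_coin(p_head):
    """Coin-toss DNUI from Equations (31) and (32)."""
    m2 = p_head ** 3 + (1.0 - p_head) ** 3   # second moment, Eq. (32)
    return np.sqrt(1.0 - 0.25 / m2)

for p in (0.5, 0.6, 0.8, 1.0):
    print(p, round(dnui_coin(p), 4))
# 0.5 -> 0 (fair coin); 1.0 -> sqrt(3)/2 ~ 0.866 (maximally biased coin)
```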
Figure 1 shows the DNUI for the distribution of X as a function of the bias represented by $P(\text{head})$. The HD, TVD, $1 - E_S$, and $1 - E_{BG}$ are also shown in Figure 1 for comparison.
Figure 1.
The DNUI for the distribution of X as a function of the bias represented by the probability of heads, compared with the Hellinger distance (HD), the total variation distance (TVD), Simpson’s unevenness $1 - E_S$, and Buzas and Gibson’s unevenness $1 - E_{BG}$.
As shown in Figure 1, when the coin is fair (i.e., $P(\text{head}) = P(\text{tail}) = 0.5$), the DNUI, HD, TVD, $1 - E_S$, and $1 - E_{BG}$ are all 0, indicating perfect uniformity or evenness. As the coin becomes increasingly biased toward either heads or tails, all indices increase. In the extreme case where $P(\text{head}) = 0$ or $P(\text{head}) = 1$, the DNUI reaches a maximum value of $\sqrt{3}/2 \approx 0.866$, reflecting a high degree of non-uniformity. However, the HD reaches a maximum value of only 0.541, and the TVD, $1 - E_S$, and $1 - E_{BG}$ reach a maximum value of 0.5; these values are significantly smaller than 1, indicating that these indices fail to capture the high degree of non-uniformity.
3.2. Three Frequency Data Series
JJC [9] posted a question on Cross Validated about quantifying distribution non-uniformity. He supplied three frequency datasets (Series A, B, and C), each containing 10 values (Table 1). Visually, Series A is almost perfectly uniform, Series B is nearly uniform, and Series C is heavily skewed by a single outlier (0.6). Table 1 lists these datasets alongside the corresponding DNUI, HD, TVD, , and values.
Table 1.
Three frequency data series and the corresponding DNUI, HD, TVD, $1 - E_S$, and $1 - E_{BG}$ values.
From Table 1, we can see that the DNUI value for Series A is 0.1864, confirming its high uniformity, while the DNUI value for Series B is 0.2499, indicating near-uniformity. In contrast, the DNUI value for Series C is 0.9767 (close to 1), signaling extreme non-uniformity. These results align well with intuitive expectations. The HD, TVD, $1 - E_S$, and $1 - E_{BG}$ values range from 0.0060 to 0.04 for Series A and from 0.0109 to 0.06 for Series B, which may be considered to reflect the uniformity of these two series fairly well. However, for Series C, the HD, TVD, $1 - E_S$, and $1 - E_{BG}$ values range only from 0.4121 to 0.7375, values that are too low to adequately reflect the severity of the non-uniformity.
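As a numerical check, the DNUI for Series C (whose ten values are listed in Section 4.2) can be reproduced directly from Equations (15) and (16); the snippet below is a sketch, and the variable names are ours:

```python
import numpy as np

series_c = np.array([0.03, 0.02, 0.6, 0.02, 0.03, 0.07,
                     0.06, 0.05, 0.05, 0.07])

n = series_c.size                       # n = 10
m2 = np.sum(series_c ** 3)              # second moment, Eq. (16)
print(round(np.sqrt(1.0 - (1.0 / n ** 2) / m2), 4))   # 0.9767, as in Table 1
```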
3.3. Five Continuous Distributions with Fixed Support
Consider five continuous distributions with the fixed support $[0, a]$: uniform, triangular, quadratic, raised cosine, and half-cosine. Table 2 summarizes their PDFs, variances, biases, second moments, and DNUIs.
Table 2.
The PDF $f(y)$, variance $V$, bias $B$, second moment $E[f(Y)^2]$, and DNUI $\theta_Y$ for five continuous distributions with fixed support $[0, a]$.
As shown in Table 2, the DNUI is independent of the scale parameter a, which is a desirable property for a measure of distribution non-uniformity. By definition, the DNUI for the uniform distribution is 0. In contrast, the DNUI values for the other four distributions range from 0.5932 to 0.7746, indicating moderate to high non-uniformity. These results align well with intuitive expectations. Notably, the raised cosine distribution has the highest DNUI value among the five distributions, suggesting it exhibits the greatest non-uniformity.
3.4. Exponential Distribution
The PDF of the exponential distribution with support $[0, \infty)$ is

$$f(y) = \lambda e^{-\lambda y}, \quad y \ge 0 \tag{33}$$
where $\lambda > 0$ is the rate parameter.
We consider a partial exponential distribution on the interval $[0, b]$ (i.e., $a = 0$), so that b is the length of the interval. Thus, the DNUI for the partial exponential distribution is given by

$$\theta_P = \sqrt{1 - \frac{1/b^2}{E[f_P(Y)^2]}} \tag{34}$$

where the second moment $E[f_P(Y)^2]$ is given by

$$E[f_P(Y)^2] = \int_0^b [f_P(y)]^3\,dy = \frac{1}{(p_{[0,b]})^3} \int_0^b [f(y)]^3\,dy \tag{35}$$

The coverage probability of the interval $[0, b]$ is given by

$$p_{[0,b]} = \int_0^b \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda b} \tag{36}$$

The integral $\int_0^b [f(y)]^3\,dy$ can be solved as

$$\int_0^b [f(y)]^3\,dy = \lambda^3 \int_0^b e^{-3\lambda y}\,dy = \frac{\lambda^2}{3} \left( 1 - e^{-3\lambda b} \right) \tag{37}$$
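Combining Equations (34)–(37) yields a closed-form expression that is easy to evaluate; the following Python sketch (function name ours) reproduces the behavior plotted in Figure 2:

```python
import numpy as np

def dnui_partial_exponential(b, lam=1.0):
    """DNUI of the partial exponential distribution on [0, b],
    combining Equations (34)-(37)."""
    cov = 1.0 - np.exp(-lam * b)                                # Eq. (36)
    int_f3 = (lam ** 2 / 3.0) * (1.0 - np.exp(-3.0 * lam * b))  # Eq. (37)
    m2 = int_f3 / cov ** 3                                      # Eq. (35)
    return np.sqrt(1.0 - (1.0 / b ** 2) / m2)                   # Eq. (34)

for b in (0.1, 1.0, 5.0, 20.0):
    print(b, round(dnui_partial_exponential(b), 4))
# approaches 0 as b -> 0 and approaches 1 as b grows, consistent with Figure 2
```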
Figure 2 shows the plot of the DNUI for the partial exponential distribution with $\lambda = 1$ as a function of the interval length b. It also shows the PDF of the original exponential distribution, Equation (33) with $\lambda = 1$, as a function of y.
Figure 2.
Plots of the DNUI for the partial exponential distribution with $\lambda = 1$ and the PDF of the original exponential distribution.
As shown in Figure 2, when the interval length b is very small (approaching 0), the DNUI is close to 0, reflecting the high local uniformity within small intervals. As the interval length b increases, the DNUI also increases, indicating the growing local non-uniformity with larger intervals. When the interval length b becomes very large, the DNUI approaches 1, indicating that the distribution over a large interval is extremely non-uniform. These observations align well with intuitive expectations.
4. Discussion
4.1. Axioms for an Effective Non-Uniformity Index
It is important to note that non-uniformity indices require an axiomatic foundation to ensure their validity and meaningful interpretation. This foundation should be built upon a set of axioms that any acceptable non-uniformity index should satisfy. We propose the following four axioms for an effective non-uniformity index:
- Normalization: The index should range between 0 and 1 (or approximately), with 0 indicating perfect uniformity and 1 (or near 1) indicating extreme non-uniformity.
- Sensitivity to Deviations: The index should be sensitive to any deviations from a baseline uniform distribution, producing a value that reflects the extent of non-uniformity.
- Consistency and Comparability: The index should yield consistent results when applied to similar distributions and enable comparisons across different distributions.
- Intuitive Interpretation: The index should be easy to understand and interpret, providing a clear indication of how close a distribution is to perfect uniformity.
Of the eight non-uniformity measures evaluated in this paper, the DOU, KL divergence, and χ2 divergence fail to meet Axiom 1 (normalization), as noted in the Introduction. The Hellinger distance (HD), total variation distance (TVD), Simpson’s unevenness $1 - E_S$, and Buzas and Gibson’s unevenness $1 - E_{BG}$ do not satisfy Axiom 2 (sensitivity to deviations), as demonstrated in Examples 3.1 and 3.2. Only the proposed DNUI satisfies all four axioms, making it a robust and effective measure.
4.2. Normalization, Benchmarks for Defining Non-Uniformity Levels, and Invariance to Probability Permutations
The definition of the proposed DNUI is both mathematically sound and intuitively interpretable. It is a normalized, distance-based metric derived from the total deviation defined in Equations (11) and (21). Importantly, this total deviation incorporates two components, namely, variance and bias, both measured relative to the baseline uniform distribution. The particular normalization using the second moment of the PMF or PDF provides a natural and robust scaling factor, which ensures that the DNUI consistently reflects deviations from uniformity across diverse distributions while maintaining a normalized range of [0, 1], as demonstrated in the presented examples.
The proposed DNUI ranges between 0 and 1, with 0 indicating perfect uniformity and 1 indicating extreme non-uniformity. Lower DNUI values (near 0) suggest a more uniform or flatter distribution, while higher values (near 1) suggest a greater degree of non-uniformity or unevenness. Since there are no universally accepted benchmarks for defining levels of non-uniformity, we tentatively propose DNUI values of 0.25, 0.5, and 0.75 to represent low, moderate, and high non-uniformity, respectively. These thresholds divide the DNUI’s normalized [0, 1] range approximately into quartiles and align with the empirical DNUI values observed in Examples 3.1 and 3.2, where values near 0.25 indicate minor deviations, values near 0.5 indicate moderate deviations, and values of 0.75 or higher indicate significant deviations from uniformity.
Note that the DNUI (similar to other indices) depends solely on the probability values and not on the associated outcomes (or scores) or their order. This property can be illustrated using the frequency data from Series C in Section 3.2: {0.03, 0.02, 0.6, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07}. If, for example, the second and third values are swapped, the DNUI value remains unchanged. Therefore, the DNUI is not a one-to-one function of the distribution; it can “collapse” different distributions into the same value. This property is analogous to how different distributions can share the same mean or variance. The invariance of the DNUI to probability permutations implies that it may not distinguish distributions with identical probability sets but different arrangements, suggesting that in applications like clustering or anomaly detection, the DNUI should be complemented with order-sensitive metrics when structural differences are critical.
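This invariance is immediate from Equation (16): the sum of cubed probabilities does not depend on their order. A two-line check (a sketch; the helper name is ours):

```python
import numpy as np

def dnui_discrete(p):
    """Discrete DNUI, Equations (15) and (16)."""
    p = np.asarray(p, dtype=float)
    return np.sqrt(1.0 - (1.0 / p.size ** 2) / np.sum(p ** 3))

c = [0.03, 0.02, 0.6, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07]
c_swapped = [0.03, 0.6, 0.02, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07]
print(np.isclose(dnui_discrete(c), dnui_discrete(c_swapped)))  # True
```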
4.3. Upper Bounds of Non-Uniformity Indices in the Discrete Case
In the discrete case, when X follows a uniform distribution, the DNUI, HD, TVD, $1 - E_S$, and $1 - E_{BG}$ are all 0 regardless of the number of possible outcomes. However, in the extreme case, where all outcomes have probability 0 except one with probability 1, the upper bound of these indices depends on the number of possible outcomes. The upper bound of the DNUI is given by

$$\theta_{X,\max} = \sqrt{1 - \frac{1}{n^2}} \tag{38}$$

The upper bound of the HD is given by

$$HD_{\max} = \sqrt{1 - \frac{1}{\sqrt{n}}} \tag{39}$$

The upper bound of the TVD is given by

$$TVD_{\max} = 1 - \frac{1}{n} \tag{40}$$

The upper bound of $1 - E_S$ is given by

$$(1 - E_S)_{\max} = 1 - \frac{1}{n} \tag{41}$$

The upper bound of $1 - E_{BG}$ is given by

$$(1 - E_{BG})_{\max} = 1 - \frac{1}{n} \tag{42}$$
Note that the TVD, $1 - E_S$, and $1 - E_{BG}$ have the same upper bound. Figure 3 shows plots of the upper bounds of the five indices as functions of the number of possible outcomes. It can be seen that, among the five indices, the DNUI has the largest upper bound at n = 2 (where the upper bound is at its minimum), and this bound increases rapidly to 1 as n increases. In contrast, the other indices have a very low upper bound at n = 2, no greater than 0.541, which increases slowly to 1 as n increases. Intuitively, this extreme case represents a very high degree of non-uniformity and should be represented by an index value of 1 or close to 1. Therefore, the DNUI performs best among the five non-uniformity indices.
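The five upper-bound formulas in Equations (38)–(42) are easy to tabulate; the following sketch reproduces the values underlying Figure 3:

```python
import numpy as np

n = np.arange(2, 7)
ub_dnui = np.sqrt(1.0 - 1.0 / n ** 2)      # Eq. (38)
ub_hd = np.sqrt(1.0 - 1.0 / np.sqrt(n))    # Eq. (39)
ub_rest = 1.0 - 1.0 / n                    # Eqs. (40)-(42): TVD, 1-E_S, 1-E_BG

print(np.round(ub_dnui, 3))   # [0.866 0.943 0.968 0.98  0.986]
print(np.round(ub_hd, 3))     # [0.541 0.65  0.707 0.743 0.769]
print(np.round(ub_rest, 3))   # [0.5   0.667 0.75  0.8   0.833]
```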
Figure 3.
Plots of the upper bounds of the five indices as functions of the number of possible outcomes.
5. Conclusions
Four axioms for an effective non-uniformity index are proposed: normalization, sensitivity to deviations, consistency and comparability, and intuitive interpretation. Among the eight non-uniformity measures evaluated in this paper, the degree of uniformity (DOU), KL divergence, and χ2 divergence fail to satisfy Axiom 1 (normalization). The Hellinger distance (HD), total variation distance (TVD), Simpson’s unevenness $1 - E_S$, and Buzas and Gibson’s unevenness $1 - E_{BG}$ do not satisfy Axiom 2 (sensitivity to deviations). Only the proposed DNUI satisfies all four axioms.
The proposed DNUI provides an effective metric for quantifying the non-uniformity or unevenness of probability distributions. It is applicable to any distribution, discrete or continuous, defined on a fixed support. It can also be applied to partial distributions on fixed intervals to examine local non-uniformity, even when the overall distribution has unbounded support. The presented examples have demonstrated the effectiveness of the proposed DNUI in capturing and quantifying distribution non-uniformity.
It is important to emphasize that the DNUI, as a normalized and axiomatically grounded measure of non-uniformity, could be applied in fields such as ecological modeling, information theory, and machine learning. For example, the DNUI’s sensitivity to deviations and intuitive interpretation could support its use as an alternative to diversity-based evenness measures in evenness/unevenness analysis in ecology. The scope of application of the DNUI needs to be further studied and expanded.
Funding
This research received no external funding.
Data Availability Statement
The data are contained within this article.
Acknowledgments
The author would like to thank three anonymous reviewers for their valuable comments that helped to improve the quality of this article.
Conflicts of Interest
Author Hening Huang was employed by the company Teledyne RD Instruments and retired in February 2022. The author declares that this study received no funding from the company. The company was not involved in the study design; the collection, analysis, or interpretation of data; the writing of this article; or the decision to submit it for publication.
References
- Rajaram, R.; Ritchey, N.; Castellani, B. On the mathematical quantification of inequality in probability distributions. J. Phys. Commun. 2024, 8, 085002.
- Rajaram, R.; Ritchey, N.; Castellani, B. On the degree of uniformity measure for probability distributions. J. Phys. Commun. 2024, 8, 115003.
- Roy, S.; Bhattacharya, K.R. A theoretical study to introduce an index of biodiversity and its corresponding index of evenness based on mean deviation. World J. Adv. Res. Rev. 2024, 21, 22–32.
- Jost, L. Entropy and diversity. Oikos 2006, 113, 363–375.
- Buzas, M.A.; Gibson, T.G. Species diversity: Benthonic foraminifera in western North Atlantic. Science 1969, 163, 72–75.
- Gregorius, H.R.; Gillet, E.M. The Concept of Evenness/Unevenness: Less Evenness or More Unevenness? Acta Biotheor. 2021, 70, 3.
- Rousseau, R. The repeat rate: From Hirschman to Stirling. Scientometrics 2018, 116, 645–653.
- Huang, H. The theory of informity: A novel probability framework. Bull. Taras Shevchenko Natl. Univ. Kyiv Phys. Math. 2025, 80, 53–59.
- JJC. How Does One Measure the Non-Uniformity of a Distribution? Available online: https://stats.stackexchange.com/q/25827 (accessed on 20 March 2025).