1. Introduction
Non-uniformity, or unevenness, is an inherent characteristic of probability distributions, as outcomes or values from a probability system are typically not distributed uniformly or evenly. Although the shape of a distribution can offer an intuitive sense of its non-uniformity, researchers often require a quantitative measure to assess this property. Such a measure is valuable for constructing distribution models and for comparing the non-uniformity across different distributions in a consistent and interpretable way.
A probability distribution is considered uniform when all outcomes have equal probability, in the discrete case, or when the probability density is constant, in the continuous case. Therefore, the uniform distribution serves as the natural baseline for assessing the non-uniformity of any given distribution, and non-uniformity can be defined as the degree to which a distribution deviates from this uniform benchmark. It is essential that the distribution being evaluated and the baseline uniform distribution share the same support. This requirement is especially important in the continuous case, where a fixed and clearly defined support is crucial for meaningful comparison.
The Kullback–Leibler (KL) divergence or the χ² divergence may be used as a metric for measuring the non-uniformity of a given distribution by quantifying how different the distribution is from a baseline uniform distribution. For a discrete random variable X with probability mass function (PMF) $p_i$ ($i = 1, 2, \ldots, n$) and n possible outcomes, the KL divergence relative to the uniform distribution with PMF 1/n is given by

$$D_{\mathrm{KL}} = \sum_{i=1}^{n} p_i \ln\left(\frac{p_i}{1/n}\right) = \sum_{i=1}^{n} p_i \ln(n p_i) \tag{1}$$

The χ² divergence is given by

$$\chi^2 = \sum_{i=1}^{n} \frac{(p_i - 1/n)^2}{1/n} = n\sum_{i=1}^{n} p_i^2 - 1 \tag{2}$$
While a KL or χ² divergence value of zero indicates perfect uniformity, there is no natural upper bound that allows us to specify how non-uniform a distribution is. Furthermore, as shown in Equations (1) and (2), the KL or χ² divergence will tend to infinity as the number of possible outcomes (n) goes to infinity, regardless of the distribution (except for the uniform distribution). The lack of an upper bound can make interpretation difficult, especially when comparing different distributions or when the scale of the divergence matters. Therefore, we will not discuss the KL and χ² divergence further in this paper.
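As a purely illustrative sketch (not part of the original paper), the following Python snippet computes the KL and χ² divergences of Equations (1) and (2) relative to the uniform baseline. The truncated-geometric family used in the loop is an arbitrary choice intended only to show how both divergences grow with n even though the shape of the distribution stays qualitatively the same.

```python
import numpy as np

def kl_from_uniform(p):
    """KL divergence of PMF p from the uniform distribution on its support (Equation (1))."""
    p = np.asarray(p, dtype=float)
    n = p.size
    nz = p > 0  # terms with p_i = 0 contribute nothing to the sum
    return float(np.sum(p[nz] * np.log(n * p[nz])))

def chi2_from_uniform(p):
    """Chi-squared divergence of PMF p from the uniform distribution (Equation (2))."""
    p = np.asarray(p, dtype=float)
    n = p.size
    return float(n * np.sum(p**2) - 1.0)

# Illustration: a truncated geometric-type PMF diverges more and more from uniformity
# as the number of outcomes n grows, even though its shape is essentially unchanged.
for n in (5, 50, 500):
    p = 0.5 ** np.arange(1, n + 1)
    p /= p.sum()  # normalize to a valid PMF
    print(n, round(kl_from_uniform(p), 3), round(chi2_from_uniform(p), 3))
```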
The Hellinger distance (HD) and the total variation distance (TVD), two well-known distances, may be used to measure the non-uniformity of a given distribution relative to a baseline uniform distribution. For the discrete case, the HD as a non-uniformity measure, relative to a uniform distribution with PMF 1/n, is given by

$$\mathrm{HD} = \sqrt{\frac{1}{2}\sum_{i=1}^{n}\left(\sqrt{p_i}-\sqrt{\frac{1}{n}}\right)^2} = \sqrt{1-\sum_{i=1}^{n}\sqrt{\frac{p_i}{n}}}$$

The TVD as a non-uniformity measure is given by

$$\mathrm{TVD} = \frac{1}{2}\sum_{i=1}^{n}\left|p_i-\frac{1}{n}\right|$$
The HD and TVD range between 0 and 1 and require no standardization or normalization, which is a desirable property for non-uniformity metrics. However, to the best of the author's knowledge, the HD and TVD have not been used to measure distribution non-uniformity, so their performance in this role is unknown.
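The following short Python sketch (an illustration, not code from the paper; the example PMF is arbitrary) computes the HD and TVD relative to the uniform baseline exactly as defined above.

```python
import numpy as np

def hellinger_from_uniform(p):
    """Hellinger distance between PMF p and the uniform distribution on the same support."""
    p = np.asarray(p, dtype=float)
    n = p.size
    return float(np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p / n)))))

def tvd_from_uniform(p):
    """Total variation distance between PMF p and the uniform distribution on the same support."""
    p = np.asarray(p, dtype=float)
    n = p.size
    return float(0.5 * np.sum(np.abs(p - 1.0 / n)))

# Example: a mildly peaked PMF on 4 outcomes
p = [0.4, 0.3, 0.2, 0.1]
print(hellinger_from_uniform(p))  # ≈ 0.17
print(tvd_from_uniform(p))        # 0.20
```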
In recent work, Rajaram et al. [1,2] proposed a measure called the "degree of uniformity (DOU)" to quantify how evenly the probability mass or density is distributed across the available outcomes or support. Specifically, they defined the DOU for a partial distribution on a fixed interval as the ratio of the exponential of the Shannon entropy to the coverage probability of that interval [1,2]:

$$\mathrm{DOU}_P = \frac{e^{H_P}}{p_P}$$

where the subscript "P" denotes "part", referring to the partial distribution on the fixed interval, $p_P$ is the coverage probability of the interval, $H_P$ is the entropy of the partial distribution, and $e^{H_P}$ is the entropy-based diversity of the partial distribution. When the entire distribution is considered, $p_P = 1$, and thus the DOU equals the entropy-based diversity $e^{H}$. It should be noted that the DOU is neither standardized nor normalized and does not explicitly measure the deviation of a given distribution from a uniform benchmark. Therefore, we will not discuss the DOU further in this paper.
Classical evenness measures, such as Simpson's evenness and Buzas and Gibson's evenness, are essentially diversity ratios. For a discrete random variable X with PMF $p_i$ ($i = 1, 2, \ldots, n$) and n possible outcomes, Simpson's evenness is defined as (e.g., [3])

$$E_{S} = \frac{D_S}{n} = \frac{1}{n\sum_{i=1}^{n} p_i^2}$$

where $D_S = 1/\sum_{i=1}^{n} p_i^2$ is called Simpson's diversity, representing the effective number of distinct elements in the probability system, and n is the maximum diversity, which corresponds to a uniform distribution with PMF 1/n. The concept of effective number is the core of diversity measures in biology [4].
Buzas and Gibson's evenness is defined as [5]

$$E_{BG} = \frac{e^{H}}{n}$$

where $H$ is the Shannon entropy of X, $H = -\sum_{i=1}^{n} p_i \ln p_i$, and $\ln n$ is the entropy of the baseline uniform distribution. The exponential of the Shannon entropy, $e^{H}$, is the entropy-based diversity and is also considered to be an effective number of elements in the probability system.
Both $E_S$ and $E_{BG}$ are normalized by n, the maximum diversity corresponding to the baseline uniform distribution. Therefore, these indices range between 0 and 1, with 0 indicating extreme unevenness and 1 indicating perfect evenness. Since evenness is negatively correlated with unevenness, we consider the complements of $E_S$ and $E_{BG}$ as unevenness (i.e., non-uniformity) indices. That is, we denote $1 - E_S$ as Simpson's unevenness and $1 - E_{BG}$ as Buzas and Gibson's unevenness, with 0 indicating perfect evenness (uniformity) and 1 indicating extreme unevenness (non-uniformity).
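As an illustration (not code from the original paper; the example PMF is arbitrary and the function names follow the notation above), the Python sketch below computes both evenness measures and their complements for a discrete PMF.

```python
import numpy as np

def simpson_evenness(p):
    """Simpson's evenness E_S: Simpson's diversity 1/sum(p_i^2) divided by the maximum diversity n."""
    p = np.asarray(p, dtype=float)
    n = p.size
    simpson_diversity = 1.0 / np.sum(p**2)
    return float(simpson_diversity / n)

def buzas_gibson_evenness(p):
    """Buzas and Gibson's evenness E_BG: entropy-based diversity exp(H) divided by n."""
    p = np.asarray(p, dtype=float)
    n = p.size
    nz = p > 0                                    # 0 * ln(0) is taken as 0
    shannon_entropy = -np.sum(p[nz] * np.log(p[nz]))
    return float(np.exp(shannon_entropy) / n)

p = [0.4, 0.3, 0.2, 0.1]
print(1.0 - simpson_evenness(p))       # Simpson's unevenness, ≈ 0.17
print(1.0 - buzas_gibson_evenness(p))  # Buzas and Gibson's unevenness, ≈ 0.10
```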
However, as Gregorius and Gillet [6] pointed out, "Diversity-based methods of assessing evenness cannot provide information on unevenness, since measures of diversity generally do not produce characteristic values that are associated with states of complete unevenness." This limitation arises because diversity measures are primarily designed to capture internal distribution characteristics, such as concentration and relative abundance within the distribution. For example, the quantity $\sum_{i=1}^{n} p_i^2$ is often called the "repeat rate" [7] or Simpson concentration [4]; it has historically been used as a measure of concentration [7]. Moreover, since diversity metrics are not constructed within a comparative distance framework, they inherently lack the ability to quantify deviations from uniformity in a meaningful or interpretable way. This limitation significantly diminishes their effectiveness when the goal is specifically to detect or describe high degrees of non-uniformity.
The aim of this study is to develop a new normalized, distance-based index that can effectively quantify the non-uniformity or unevenness of a probability distribution. The remainder of this paper is organized as follows.
Section 2 describes the proposed distribution non-uniformity index (DNUI).
Section 3 presents several examples comparing the proposed DNUI with the HD, the TVD, Simpson's unevenness, and Buzas and Gibson's unevenness.
Section 4 and
Section 5 provide discussion and conclusion, respectively.
4. Discussion
4.1. Axioms for an Effective Non-Uniformity Index
It is important to note that non-uniformity indices require an axiomatic foundation to ensure their validity and meaningful interpretation. This foundation should be built upon a set of axioms that any acceptable non-uniformity index should satisfy. We propose the following four axioms for an effective non-uniformity index:
Normalization: The index should range between 0 and 1 (or approximately), with 0 indicating perfect uniformity and 1 (or near 1) indicating extreme non-uniformity.
Sensitivity to Deviations: The index should be sensitive to any deviations from a baseline uniform distribution, producing a value that reflects the extent of non-uniformity.
Consistency and Comparability: The index should yield consistent results when applied to similar distributions and enable comparisons across different distributions.
Intuitive Interpretation: The index should be easy to understand and interpret, providing a clear indication of how close a distribution is to perfect uniformity.
Of the eight non-uniformity measures evaluated in this paper, the DOU, the KL divergence, and the χ² divergence fail to meet Axiom 1 (normalization), as noted in the Introduction. The Hellinger distance (HD), the total variation distance (TVD), Simpson's unevenness, and Buzas and Gibson's unevenness do not satisfy Axiom 2 (sensitivity to deviations), as demonstrated in Examples 3.1 and 3.2. Only the proposed DNUI satisfies all four axioms, making it a robust and effective measure.
4.2. Normalization, Benchmarks for Defining Non-Uniformity Levels, and Invariance to Probability Permutations
The definition of the proposed DNUI is both mathematically sound and intuitively interpretable. It is a normalized, distance-based metric derived from the total deviation defined in Equations (11) and (21). Importantly, this total deviation incorporates two components, variance and bias, both measured relative to the baseline uniform distribution. Normalization by the second moment of the PMF or PDF provides a natural and robust scaling factor, ensuring that the DNUI consistently reflects deviations from uniformity across diverse distributions while maintaining a normalized range of [0, 1], as demonstrated in the presented examples.
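For intuition only, the following Python sketch shows one way such a normalized, distance-based index can be computed in the discrete case. It assumes the total deviation is the sum of squared deviations of the PMF from 1/n and that the index is the square root of its ratio to the second moment $\sum_i p_i^2$. This is an illustrative reading of the verbal description above, not a restatement of Equations (11) and (21); the exact definition is given in Section 2.

```python
import numpy as np

def dnui_sketch(p):
    """Illustrative non-uniformity index: sqrt(total squared deviation from the uniform PMF,
    normalized by the second moment of the PMF). This is an assumed form based on the
    verbal description in Section 4.2, not necessarily the exact DNUI of Section 2."""
    p = np.asarray(p, dtype=float)
    n = p.size
    total_deviation = np.sum((p - 1.0 / n) ** 2)   # squared deviation from the uniform baseline
    second_moment = np.sum(p**2)                   # normalizing factor
    return float(np.sqrt(total_deviation / second_moment))

print(dnui_sketch([0.25, 0.25, 0.25, 0.25]))  # 0.0 for a uniform PMF
print(dnui_sketch([1.0, 0.0, 0.0, 0.0]))      # sqrt(1 - 1/n) ≈ 0.866 in the extreme case
```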
The proposed DNUI ranges between 0 and 1, with 0 indicating perfect uniformity and 1 indicating extreme non-uniformity. Lower DNUI values (near 0) suggest a more uniform or flatter distribution, while higher values (near 1) suggest a greater degree of non-uniformity or unevenness. Since there are no universally accepted benchmarks for defining levels of non-uniformity, we tentatively propose DNUI values of 0.25, 0.5, and 0.75 to represent low, moderate, and high non-uniformity, respectively. These thresholds divide the DNUI's normalized [0, 1] range approximately into quartiles and align with the empirical DNUI values observed in Examples 3.1 and 3.2, where values near 0.25 indicate minor deviations, values near 0.5 indicate moderate deviations, and values of 0.75 or higher indicate substantial deviations from uniformity.
Note that the DNUI (similar to other indices) depends solely on the probability values and not on the associated outcomes (or scores) or their order. This property can be illustrated using the frequency data from Series C in
Section 3.2: {0.03, 0.02, 0.6, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07}. If, for example, the second and third values are swapped, the DNUI value remains unchanged. Therefore, the DNUI is not a one-to-one function of the distribution; it can “collapse” different distributions into the same value. This property is analogous to how different distributions can share the same mean or variance. The invariance of the DNUI to probability permutations implies that it may not distinguish distributions with identical probability sets but different arrangements, suggesting that in applications like clustering or anomaly detection, the DNUI should be complemented with order-sensitive metrics when structural differences are critical.
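The permutation invariance can be checked directly. The short Python check below (an illustration, using the Series C frequencies quoted above and the TVD as a representative index) confirms that swapping two probabilities leaves the index unchanged.

```python
import numpy as np

def tvd_from_uniform(p):
    """Total variation distance between PMF p and the uniform distribution on its support."""
    p = np.asarray(p, dtype=float)
    return float(0.5 * np.sum(np.abs(p - 1.0 / p.size)))

series_c = [0.03, 0.02, 0.6, 0.02, 0.03, 0.07, 0.06, 0.05, 0.05, 0.07]
swapped = series_c.copy()
swapped[1], swapped[2] = swapped[2], swapped[1]   # swap the second and third values

# Identical results: the index depends only on the multiset of probabilities, not their order.
print(tvd_from_uniform(series_c), tvd_from_uniform(swapped))  # 0.5 0.5
```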
4.3. Upper Bounds of Non-Uniformity Indices in the Discrete Case
In the discrete case, when X follows a uniform distribution, the DNUI, the HD, the TVD, $1 - E_S$, and $1 - E_{BG}$ are all 0, regardless of the number of possible outcomes. However, in the extreme case where all outcomes have probability 0 except one with probability 1, the upper bound of these indices depends on the number of possible outcomes. The upper bound of the DNUI is given by

$$\mathrm{DNUI}_{\max} = \sqrt{\frac{n-1}{n}}$$
The upper bound of the HD is given by

$$\mathrm{HD}_{\max} = \sqrt{1-\frac{1}{\sqrt{n}}}$$
The upper bound of the TVD is given by

$$\mathrm{TVD}_{\max} = 1-\frac{1}{n}$$
The upper bound of Simpson's unevenness $1 - E_S$ is given by

$$\left(1-E_S\right)_{\max} = 1-\frac{1}{n}$$
The upper bound of Buzas and Gibson's unevenness $1 - E_{BG}$ is given by

$$\left(1-E_{BG}\right)_{\max} = 1-\frac{1}{n}$$
Note that the TVD, Simpson's unevenness, and Buzas and Gibson's unevenness have the same upper bound, $1 - 1/n$.
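The shared bound follows directly from the definitions above. In the extreme case, with a single outcome of probability 1 and n − 1 outcomes of probability 0,

$$
\begin{aligned}
\mathrm{TVD}_{\max} &= \frac{1}{2}\left[\left(1-\frac{1}{n}\right) + (n-1)\cdot\frac{1}{n}\right] = 1-\frac{1}{n},\\
1 - E_{S} &= 1 - \frac{1/\sum_{i} p_i^2}{n} = 1 - \frac{1}{n} \quad \text{(since } \textstyle\sum_{i} p_i^2 = 1\text{)},\\
1 - E_{BG} &= 1 - \frac{e^{H}}{n} = 1 - \frac{1}{n} \quad \text{(since } H = 0\text{)}.
\end{aligned}
$$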
Figure 3 shows plots of the upper bounds of the five indices as functions of the number of possible outcomes. Among the five indices, the DNUI has the largest upper bound at n = 2 (where each index's upper bound attains its minimum), and its bound increases rapidly toward 1 as n increases. In contrast, the other indices have much lower upper bounds at n = 2 (at most approximately 0.541), which increase slowly toward 1 as n increases. Intuitively, this extreme case represents a very high degree of non-uniformity and should therefore be assigned an index value of 1 or close to 1. By this criterion, the DNUI performs best among the five non-uniformity indices.
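This comparison can be reproduced numerically. The sketch below (illustrative, not the code used to produce Figure 3; the DNUI bound uses the expression given above) tabulates the upper bounds for a few values of n.

```python
import numpy as np

n = np.array([2, 5, 10, 50, 100])

dnui_ub = np.sqrt((n - 1) / n)        # upper bound of the DNUI (expression given above)
hd_ub = np.sqrt(1 - 1 / np.sqrt(n))   # upper bound of the HD
shared_ub = 1 - 1 / n                 # shared bound of the TVD and both unevenness indices

for row in zip(n, dnui_ub, hd_ub, shared_ub):
    print("n = {:>3d}  DNUI {:.3f}  HD {:.3f}  TVD/1-E_S/1-E_BG {:.3f}".format(
        int(row[0]), row[1], row[2], row[3]))
```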
5. Conclusions
Four axioms for an effective non-uniformity index are proposed: normalization, sensitivity to deviations, consistency and comparability, and intuitive interpretation. Among the eight non-uniformity measures evaluated in this paper, the degree of uniformity (DOU), the KL divergence, and the χ² divergence fail to satisfy Axiom 1 (normalization). The Hellinger distance (HD), the total variation distance (TVD), Simpson's unevenness, and Buzas and Gibson's unevenness do not satisfy Axiom 2 (sensitivity to deviations). Only the proposed DNUI satisfies all four axioms.
The proposed DNUI provides an effective metric for quantifying the non-uniformity or unevenness of probability distributions. It is applicable to any distribution, discrete or continuous, defined on a fixed support. It can also be applied to partial distributions on fixed intervals to examine local non-uniformity, even when the overall distribution has unbounded support. The presented examples have demonstrated the effectiveness of the proposed DNUI in capturing and quantifying distribution non-uniformity.
It is important to emphasize that the DNUI, as a normalized and axiomatically grounded measure of non-uniformity, could be applied in fields such as ecological modeling, information theory, and machine learning. For example, its sensitivity to deviations and intuitive interpretation could support its use as an alternative to diversity-based evenness measures in evenness/unevenness analysis in ecology. The scope of application of the DNUI needs to be studied and expanded further.