1. Introduction
In the late nineteenth century, astronomer Simon Newcomb observed that in the logarithmic books at his workplace, certain pages were “more worn than others” [1]. In particular, the earlier pages showed more wear and tear than the later pages. He deduced that there is a “bias” towards smaller leading digits, with the digit 1 showing up roughly 30% of the time, the digit 2 showing up roughly 18% of the time, and so on. Newcomb’s findings were practically ignored until about fifty years later, when physicist Frank Benford published his own research on the distribution of leading digits in Reader’s Digest [2]. Benford displayed a table of roughly 20,000 observations from twenty different sets of data, shown in Figure 1.
Benford called this distribution the “law of Anomalous Numbers”. However, due to the popularity of his publication, the phenomenon eventually became known as “Benford’s law”; see Figure 2 for probabilities. Benford’s law is a powerful phenomenon that occurs in a variety of data, including accounting, elections, finance, geosciences, physics, population data, street addresses, and more. The law’s prevalence makes it a valuable tool for checking data integrity. For example, it is often used in fraud detection for tax returns, insurance claims, and expense reports [3,4]. For further details on the history of Benford’s law, see [5,6,7,8,9].
There is extensive research on fragmentation problems related to Benford’s law, which is the subject of our paper. In [
10], Kakutani considered the following fragmentation problem: Start with the initial set
and a fixed constant
. At each stage, from
, construct
by adding the point
, where
. Kakutani was able to show that as
, the points of
converge to a uniform distribution on
and are therefore non-Benford. This problem has been generalized to various settings; see [11,12,13,14,15,16] for examples. For other decomposition problems, see [17,18] for fragmentation in two dimensions, [19] for fragmentation in a fractal setting, and [20] for discrete fragmentation.
In this paper, we investigate a fragmentation problem inspired by the work of Becker et al. [21], who studied the stick fragmentation problem. Their model begins with a stick of length L and a density function f on $(0,1)$ from which proportions are sampled. At Stage 1, the stick is split into two substicks at proportion $p_1$. At Stage 2, the left substick is split into two substicks at proportion $p_2$, while the right substick is split into two substicks at proportion $p_3$. Iterating this procedure for N stages can produce up to $2^N$ sticks.
Becker et al. analyzed three versions of this process: (i) the unrestricted case, in which a new proportion is drawn from f at each stage and all sticks split; (ii) the restricted case, in which a new proportion is drawn at each stage but only one of the two substicks splits; and (iii) the fixed-proportion case, in which all sticks split but a single proportion p is fixed in advance and applied throughout. For the restricted and unrestricted cases, they established convergence to strong Benford behavior under mild conditions on the Mellin transform of f and on the mean and variance of for . For the fixed-proportion case, they proved that convergence occurs if and only if $\log_{10}\frac{1-p}{p}$ is irrational.
We extend the fixed single-proportion stick fragmentation problem to a fixed multi-proportion setting. Specifically, we begin with a stick of length L, an integer $m \ge 2$, and any proportions $p_1, \dots, p_m \in (0,1)$ satisfying $p_1 + \cdots + p_m = 1$, independent of L. At Stage 1, the stick is split into m substicks of lengths $Lp_1, \dots, Lp_m$. At Stage 2, each of these substicks is split into m smaller substicks according to the same proportions. In general, at Stage N, every substick from Stage $N-1$ is split into m substicks using $p_1, \dots, p_m$. After N stages, the process produces $m^N$ substicks.
Our principal question is whether the stick lengths exhibit Benford behavior. The novelty of this model lies in splitting each stick into more than two substicks at each stage, which renders the techniques used in Becker et al. [21] inapplicable. Our main result is a necessary and sufficient condition under which this fixed multi-proportion stick fragmentation model yields stick lengths converging to strong Benford behavior.
Theorem 1. Let be an integer, and choose such that . At each stage, a stick is split into m substicks according to the proportions . After N stages, the process produces sticks, whose lengths are for all non-negative integers with . For , set . Then the stick lengths converge to strong Benford behavior if and only if is irrational for some .
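In symbols, the statement can be sketched as follows. The notation is an assumption made for illustration: the proportions are written $p_1, \dots, p_m$, the exponents $k_1, \dots, k_m$, and the last proportion $p_m$ is taken as the reference inside the logarithms. Choosing a different reference proportion gives an equivalent condition, since $\log_{10}(p_i/p_j) = \log_{10}(p_i/p_m) - \log_{10}(p_j/p_m)$.

```latex
% Sketch of Theorem 1 under assumed notation
% (p_i, k_i, y_i and the reference proportion p_m are illustrative choices).
\[
  m \ge 2, \qquad p_1, \dots, p_m \in (0,1), \qquad p_1 + \cdots + p_m = 1 .
\]
% After N stages there are m^N sticks, with lengths
\[
  L \, p_1^{k_1} p_2^{k_2} \cdots p_m^{k_m},
  \qquad k_1, \dots, k_m \in \mathbb{Z}_{\ge 0}, \quad k_1 + \cdots + k_m = N .
\]
% Assumed normalization of the exponents y_i
\[
  y_i := \log_{10}\!\left(\frac{p_i}{p_m}\right), \qquad 1 \le i \le m-1 ,
\]
\[
  \text{strong Benford behavior as } N \to \infty
  \iff
  y_i \notin \mathbb{Q} \ \text{ for some } i \in \{1, \dots, m-1\} .
\]
```

For $m = 2$ this reduces to the irrationality of $\log_{10}(p_1/p_2)$, which matches the fixed single-proportion condition up to the sign of the logarithm.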
1.1. Definitions and Theory of Benford’s Law
We begin with the definition of Benford’s law (see, for example, [8,22]).
Definition 1 (Benford’s Law for the Leading Digit). A dataset is said to satisfy Benford’s law for the leading digit if the frequency of observations with leading digit d is given by $\log_{10}\left(1 + \frac{1}{d}\right)$ for all $d \in \{1, 2, \dots, 9\}$.
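The leading-digit frequencies in Definition 1 are the standard Benford values $\log_{10}(1 + 1/d)$. As a quick numerical illustration, the following minimal sketch tabulates them; the function name `benford_probabilities` is hypothetical and chosen here only for the example.

```python
import math

def benford_probabilities():
    """Return the Benford frequency log10(1 + 1/d) for each leading digit d."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_probabilities()
for d, p in probs.items():
    print(f"leading digit {d}: {p:.4f}")
```

The nine values sum to 1 because the sum telescopes to $\log_{10} 10$, and they decrease in d, which is the “bias” towards smaller leading digits described in the introduction.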
There are several approaches to proving that a dataset follows Benford’s law. A standard method is to use the Uniform Characterization Theorem. For this, we first introduce the notion of the significand of a real number.
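Definition 2 (The Significand). For any $x > 0$, we can uniquely write
$$x = S_{10}(x) \cdot 10^{k},$$
where $S_{10}(x) \in [1, 10)$ and $k \in \mathbb{Z}$. Equivalently, $k = \lfloor \log_{10} x \rfloor$ and $S_{10}(x) = x \cdot 10^{-k}$. We call $S_{10}(x)$ the significand of x.
The following definition generalizes Benford’s law from the leading digit to the entire significand.
Definition 3. A sequence of random variables $\{X_N\}$ is said to converge to strong Benford’s law if
$$\lim_{N \to \infty} \mathrm{Prob}\left(S_{10}(X_N) \le s\right) = \log_{10} s$$
for all $s \in [1, 10]$. Note that the index N typically represents the size of the dataset, so convergence to strong Benford’s law should be interpreted as an asymptotic property.
Definition 4 (Uniform Distribution Modulo 1). A sequence of random variables $\{Y_N\}$ is said to converge to being equidistributed mod 1 if
$$\lim_{N \to \infty} \mathrm{Prob}\left(Y_N \bmod 1 \in [a, b]\right) = b - a$$
for all $0 \le a < b \le 1$.
We are now ready to state the Uniform Characterization Theorem [8].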
Theorem 2 (Uniform Characterization Theorem). A sequence of random variables converges to strong Benford’s law if and only if the sequence of their base-10 logarithms converges to being equidistributed mod 1.
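The equivalence in Theorem 2 can be illustrated numerically on a deterministic dataset. The sketch below is not taken from the paper: it uses the powers of 2 as an illustrative choice, and the helper names `leading_digit`, `log10_mod1`, and `compare` are hypothetical. For each digit d it compares the fraction of values with leading digit d against the fraction whose base-10 logarithm mod 1 lies in $[\log_{10} d, \log_{10}(d+1))$, next to the Benford value.

```python
import math

def leading_digit(n):
    """Leading decimal digit of a positive integer, read off its decimal string."""
    return int(str(n)[0])

def log10_mod1(n):
    """Fractional part of log10(n)."""
    return math.log10(n) % 1.0

def compare(data):
    """For each digit d: (d, digit frequency, log-interval frequency, Benford value)."""
    rows = []
    for d in range(1, 10):
        lo, hi = math.log10(d), math.log10(d + 1)
        by_digit = sum(leading_digit(x) == d for x in data) / len(data)
        by_log = sum(lo <= log10_mod1(x) < hi for x in data) / len(data)
        rows.append((d, by_digit, by_log, hi - lo))
    return rows

data = [2 ** n for n in range(1, 1001)]  # illustrative dataset, not from the paper
rows = compare(data)
for d, by_digit, by_log, benford in rows:
    print(d, f"{by_digit:.3f}", f"{by_log:.3f}", f"{benford:.3f}")
```

The first two columns agree because having leading digit d is the same event as the logarithm mod 1 falling in the corresponding interval, and both are close to the Benford value because the fractional parts of $n \log_{10} 2$ are equidistributed.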
Thus, convergence to strong Benford’s law is equivalent to convergence to equidistribution modulo 1. Consequently, proving or disproving convergence to strong Benford’s law can be reduced to proving or disproving convergence to equidistribution modulo 1. In what follows, we illustrate the above definitions with an example of random variables that converge to strong Benford behavior, as well as an example of random variables that do not.
Example 1. Let $X_N = 10^{Y_N}$, where $Y_N$ is normally distributed with mean 0 and variance N. No assumptions are made on the dependence structure among the $Y_N$. Then the sequence $\{X_N\}$ converges to strong Benford’s law. This follows from the fact that $\{Y_N\}$ converges to being equidistributed modulo 1. For details, see [8].
Example 2. Let $X_N$ be uniformly distributed on for all $N$, with no assumptions on dependence among the $X_N$. Then for every N, By comparison with Definition 3, it follows that $\{X_N\}$ does not converge to strong Benford’s law.
1.2. Fixed-Proportion Stick Fragmentation Model and the Multinomial Distribution
With this foundation for Benford’s law in place, we now describe in detail the model used in this paper. The model can be viewed as an extension of the fixed single-proportion stick fragmentation model introduced by Becker et al. [21].
Suppose we start with a stick of length L. We split the stick at a fixed proportion $p \in (0,1)$. After the first break, we obtain two sticks of lengths $Lp$ and $L(1-p)$. We then split each of these two sticks again at the same fixed proportion p, producing four sticks of lengths $Lp^2$, $Lp(1-p)$, $L(1-p)p$, and $L(1-p)^2$. Repeating this process for N stages, we obtain $2^N$ sticks in total, with $N+1$ distinct lengths. These lengths follow a binomial distribution.
Becker et al. were interested in whether the stick lengths converge to strong Benford’s law. They established a necessary and sufficient condition for this convergence in [21].
Theorem 3 (Fixed Single-Proportion Stick Fragmentation Theorem [21]). Consider the fixed single-proportion stick fragmentation model. Choose y so that $\frac{1-p}{p} = 10^{y}$. The fragmentation model produces stick lengths that converge to strong Benford’s law if and only if y is irrational.
In the non-Benford case, Becker et al. observed cyclic behavior in the significands and used the multisection formula [23]. By contrast, establishing the Benford case was more difficult. They adopted methods from [22,24,25], applying truncation to demonstrate roughly equal probability across intervals, and ultimately proved equidistribution modulo 1. In summary, they showed that stick lengths governed by a binomial distribution converge to strong Benford’s law when the ratio $\frac{1-p}{p}$ equals 10 raised to an irrational power and fail to converge when it equals 10 raised to a rational power.
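The dichotomy in Theorem 3 can be explored numerically. The following is a minimal sketch, not the method of [21]: it takes L = 1, weights each of the N + 1 possible lengths by its binomial multiplicity, and reads the leading digit from the logarithm mod 1. It assumes the ratio in Theorem 3 is $(1-p)/p$; using $p/(1-p)$ would only flip the sign of y. The function name and the two test proportions are illustrative choices.

```python
import math

def leading_digit_frequencies(p, N):
    """Frequencies of leading digits 1..9 among the 2**N sticks after N stages.

    With L = 1, a stick that took the p-side k times has length
    p**k * (1 - p)**(N - k), and C(N, k) of the 2**N sticks share that length.
    """
    log_p, log_q = math.log10(p), math.log10(1 - p)
    freq = [0.0] * 9
    for k in range(N + 1):
        # log of C(N, k) / 2**N, kept in log space to avoid overflow
        log_weight = (math.lgamma(N + 1) - math.lgamma(k + 1)
                      - math.lgamma(N - k + 1) - N * math.log(2))
        frac = (k * log_p + (N - k) * log_q) % 1.0  # log10(length) mod 1
        digit = min(int(10 ** frac), 9)             # guard against rounding up to 10
        freq[digit - 1] += math.exp(log_weight)
    return freq

N = 2000
rational = leading_digit_frequencies(1 / 11, N)   # (1-p)/p = 10, so y = 1
irrational = leading_digit_frequencies(1 / 3, N)  # (1-p)/p = 2, so y = log10(2)
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

print("rational y:  ", [round(f, 3) for f in rational])
print("irrational y:", [round(f, 3) for f in irrational])
print("Benford:     ", [round(b, 3) for b in benford])
```

With $p = 1/11$ every stick length is a power of 10 times $11^{-N}$, so all sticks share one leading digit, whereas with $p = 1/3$ the frequencies come out close to the Benford values.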
We now extend their fixed single-proportion stick fragmentation model to what we call the fixed multi-proportion stick fragmentation model. Becker et al. considered only the case in which a stick of length L is cut at a single fixed proportion p at each stage. We generalize this by cutting the stick at multiple distinct fixed proportions at every stage. Our model is as follows:
Suppose we have a stick of length L. We split the stick simultaneously at fixed proportions $p_1, \dots, p_{m-1}$, with $p_1 + \cdots + p_{m-1} < 1$. Define $p_m = 1 - (p_1 + \cdots + p_{m-1})$. Thus, after Stage 1, we obtain sticks of lengths $Lp_1, \dots, Lp_m$. At Stage 2, we cut each stick obtained from the previous stage at the same fixed proportions $p_1, \dots, p_{m-1}$. The resulting stick lengths from Stage 2 are $Lp_1^2$, $Lp_1p_2$, and so on. After Stage N, we are left with $m^N$ sticks in total, with $\binom{N+m-1}{m-1}$ distinct lengths (see
Figure 3 for the case where
and
). The number of distinct lengths arises because the process can be interpreted as unordered sampling with replacement.
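The two counts in this description can be checked by brute force for small cases. The sketch below is illustrative only: the proportions (1/2, 3/10, 1/5) and the helper name `fragment` are assumptions made for the example, and exact fractions are used so that equal lengths compare equal. It also checks how many sticks share one particular length against $N!/(k_1! \cdots k_m!)$.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial

def fragment(proportions, N, L=Fraction(1)):
    """Brute-force list of all stick lengths after N stages (small N only)."""
    sticks = [L]
    for _ in range(N):
        sticks = [s * p for s in sticks for p in proportions]
    return sticks

# Illustrative proportions (not from the paper); they sum to 1.
proportions = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
m, N = len(proportions), 4

sticks = fragment(proportions, N)
counts = Counter(sticks)

print(len(sticks), m ** N)                  # total number of sticks
print(len(counts), comb(N + m - 1, m - 1))  # number of distinct lengths

# Sticks sharing the length p1^2 * p2 * p3, i.e. exponents (2, 1, 1)
length = proportions[0] ** 2 * proportions[1] * proportions[2]
shared = counts[length]
print(shared, factorial(4) // (factorial(2) * factorial(1) * factorial(1)))
```

The count of distinct lengths equals $\binom{N+m-1}{m-1}$ only when different exponent vectors give different products, as they do here. For special proportions, for example when two proportions are equal, some lengths coincide and the number of distinct lengths is smaller.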
Moreover, the stick lengths are distributed according to the multinomial distribution, a generalization of the binomial distribution. It is defined in terms of the multinomial coefficient, which in turn generalizes the binomial coefficient. We recall the following definition and result (see [26]).
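Definition 5 (Multinomial Coefficient). For any non-negative integer N and positive integer m, the multinomial coefficient is
$$\binom{N}{k_1, k_2, \dots, k_m} = \frac{N!}{k_1! \, k_2! \cdots k_m!} \qquad (5)$$
for non-negative integers $k_1, \dots, k_m$ with $k_1 + \cdots + k_m = N$. A random vector $(K_1, \dots, K_m)$ follows a multinomial distribution with parameters N and $(p_1, \dots, p_m)$ if
$$\mathrm{Prob}(K_1 = k_1, \dots, K_m = k_m) = \binom{N}{k_1, \dots, k_m} \, p_1^{k_1} \cdots p_m^{k_m}$$
for all non-negative integers $k_1, \dots, k_m$ with $k_1 + \cdots + k_m = N$.
Remark 1. Formula (5) gives the number of ways to choose N objects with exactly $k_j$ objects of type j, which is when order does not matter.
Theorem 4 (Multinomial Theorem). Let N be any non-negative integer and $x_1, \dots, x_m$ be real numbers. Then
$$(x_1 + \cdots + x_m)^N = \sum_{k_1 + \cdots + k_m = N} \binom{N}{k_1, \dots, k_m} \, x_1^{k_1} \cdots x_m^{k_m},$$
where the $k_j$s are non-negative integers summing to N.
2. Fixed Multi-Proportion Stick Fragmentation Model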
Recall that we are interested in whether a stick fragmentation process produces stick lengths that converge to strong Benford’s law. Becker et al. proved that if the ratio $\frac{1-p}{p}$ equals 10 raised to an irrational power, the stick lengths converge to strong Benford’s law, whereas if it equals 10 raised to a rational power, they do not.
In what follows, we generalize the results of Becker et al. to the fixed multi-proportion stick fragmentation model introduced in Section 1.2. In particular, we restate Theorem 1 and prove the necessity of the condition.
Theorem 1. Let be an integer, and choose such that . At each stage, a stick is split into m substicks according to the proportions . After N stages, the process produces sticks, whose lengths are for all non-negative integers with . For , set . Then the stick lengths converge to strong Benford behavior if and only if is irrational for some .
We prove the necessity of the condition; i.e., if
is rational for all
, then the stick fragmentation model produces lengths that do not converge to strong Benford’s law. The sufficiency of the condition, i.e., if
is irrational for some
, then the model produces lengths that converge to strong Benford’s law, is more technical; see [
27] for details. Before presenting the proof, we refer the reader to numerical simulations supporting the theorem. See
Figure A1 for rational cases and
Figure A2 for irrational cases.
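A small experiment in the same spirit can be run directly. The following sketch is not the code behind Figures A1 and A2. It assumes m = 3, L = 1, and two illustrative choices of proportions, and it weights each exponent vector $(k_1, k_2, k_3)$ by the fraction of the $3^N$ sticks that carry it, namely the multinomial coefficient divided by $3^N$. All function names are hypothetical.

```python
import math

def log_multinomial_weight(N, ks):
    """log of N! / (k_1! ... k_m!) / m**N, the fraction of sticks with exponents ks."""
    m = len(ks)
    return (math.lgamma(N + 1) - sum(math.lgamma(k + 1) for k in ks)
            - N * math.log(m))

def stick_statistics(proportions, N):
    """Leading-digit frequencies and the values of log10(length) mod 1.

    Assumes exactly three proportions and L = 1.
    """
    logs = [math.log10(p) for p in proportions]
    freq = [0.0] * 9
    fracs = []
    for k1 in range(N + 1):
        for k2 in range(N + 1 - k1):
            ks = (k1, k2, N - k1 - k2)
            frac = sum(k * lg for k, lg in zip(ks, logs)) % 1.0
            digit = min(int(10 ** frac), 9)  # guard against rounding up to 10
            freq[digit - 1] += math.exp(log_multinomial_weight(N, ks))
            fracs.append(frac)
    return freq, fracs

def count_clusters(values, tol=1e-9):
    """Count groups of values separated by more than tol (ignores wrap-around at 0 and 1)."""
    ordered = sorted(values)
    return 1 + sum(b - a > tol for a, b in zip(ordered, ordered[1:]))

N = 300
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Ratios p1/p3 = 100 and p2/p3 = 10: both base-10 logarithms are rational.
rational_freq, rational_fracs = stick_statistics((100 / 111, 10 / 111, 1 / 111), N)
# Ratios p1/p3 = 3 and p2/p3 = 2: both base-10 logarithms are irrational.
irrational_freq, irrational_fracs = stick_statistics((1 / 2, 1 / 3, 1 / 6), N)

rational_dev = max(abs(f - b) for f, b in zip(rational_freq, benford))
irrational_dev = max(abs(f - b) for f, b in zip(irrational_freq, benford))

print("distinct log10(length) mod 1, rational case:", count_clusters(rational_fracs))
print("max deviation from Benford, rational case:  ", round(rational_dev, 3))
print("max deviation from Benford, irrational case:", round(irrational_dev, 3))
```

In the rational example every stick length is a power of 10 times $111^{-N}$, so the logarithms collapse to a single value mod 1 and one leading digit carries all the mass. In the irrational example the leading-digit frequencies are close to the Benford values.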
Proof. By the Uniform Characterization Theorem (Theorem 2), it suffices to show that the base-10 logarithms of the stick lengths do not converge to being equidistributed mod 1. Since Benford’s law is scale-invariant [6], we may assume $L = 1$ without loss of generality. Each stick length
factors as
Since
is rational for all
, write
with
,
, and
. Then
Taking
gives
The term
has at most
distinct values under mod 1, since it is periodic with a period of at most
. More generally,
has at most
distinct values under mod 1. The term
is constant across all
. Hence,
has at most
distinct values, independent of
N. This number is also finite for fixed
m,
.
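Since the uniform distribution on the unit interval is continuous, whereas the base-10 logarithms of the stick lengths take only a bounded number of values mod 1 no matter how large N is, they cannot converge to being equidistributed mod 1. Thus, the fragmentation process produces stick lengths that do not converge in distribution to strong Benford’s law. □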
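The counting step in this argument can be written out explicitly. The sketch below uses assumed notation for illustration: proportions $p_1, \dots, p_m$, exponents $k_1, \dots, k_m$ summing to N, $L = 1$, and $y_i = \log_{10}(p_i/p_m) = r_i/q_i$ in lowest terms for $1 \le i \le m-1$. Taking $p_m$ as the reference proportion is an assumption; another reference leads to the same conclusion.

```latex
% Assumed notation: y_i = log10(p_i / p_m) = r_i / q_i, with r_i an integer,
% q_i a positive integer, and gcd(r_i, q_i) = 1, for 1 <= i <= m-1.
\begin{align*}
  p_1^{k_1} \cdots p_m^{k_m}
    &= p_m^{N} \prod_{i=1}^{m-1} \left( \frac{p_i}{p_m} \right)^{k_i}
       && \text{since } k_m = N - \sum_{i=1}^{m-1} k_i \\
    &= p_m^{N} \prod_{i=1}^{m-1} 10^{k_i r_i / q_i}, \\
  \log_{10}\!\left( p_1^{k_1} \cdots p_m^{k_m} \right)
    &= N \log_{10} p_m + \sum_{i=1}^{m-1} \frac{k_i r_i}{q_i} .
\end{align*}
% k_i r_i / q_i mod 1 depends only on k_i mod q_i, so it takes at most q_i values;
% N log10(p_m) is the same for every stick at stage N. Hence
\[
  \# \left\{ \log_{10}\!\left( p_1^{k_1} \cdots p_m^{k_m} \right) \bmod 1 \right\}
  \;\le\; q_1 q_2 \cdots q_{m-1},
\]
% a bound that does not grow with N.
```

For instance, with the illustrative proportions $(100/111, 10/111, 1/111)$ one has $y_1 = 2$ and $y_2 = 1$, so $q_1 = q_2 = 1$ and every stick at stage N has the same significand.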