Base Dependence of Benford Random Variables

A random variable X that is base b Benford will not in general be base c Benford when c is not equal to b. This paper builds on two of my earlier papers and is an attempt to cast some light on the issue of base dependence. Following some introductory material, the "Benford spectrum" of a positive random variable is introduced and known analytic results about Benford spectra are summarized. Some standard machinery for a "Benford analysis" is introduced and combined with my method of "seed functions" to yield tools to analyze the base c Benford properties of a base b Benford random variable. Examples are generated by applying these general methods to several families of Benford random variables. Berger and Hill's concept of "base-invariant significant digits" is discussed. Some potential extensions are sketched.

If the view is accepted that phenomena fall into geometric series, then it follows that the observed logarithmic relationship is not a result of the particular numerical system, with its base, 10, that we have elected to use. Any other base, such as 8, or 12, or 20, to select some of the numbers that have been suggested at various times, would lead to similar relationships; for the logarithmic scales of the new numerical system would be covered by equally spaced steps by the march of natural events. As has been pointed out before, the theory of anomalous numbers is really the theory of phenomena and events, and the numbers but play the poor part of lifeless symbols for living things.
This argument seems compelling, and it might seem to apply to Benford random variables as well as to geometric sequences and exponential functions. It is therefore somewhat surprising to observe that a random variable that is base $b$ Benford is not generally base $c$ Benford when $c \neq b$. We'll see some examples shortly.
This paper builds on two of my earlier papers ([2], [3]) and is an attempt to cast some light on the issue of base dependence. It's organized as follows. Section 2 introduces the significand function and the fractional part notation and gives several logically equivalent definitions of "Benford random variable." The base $b$ First Digit Law is introduced, and several examples of random variables are presented that are Benford relative to one base but not to another. Section 3 introduces the "Benford spectrum" of a positive random variable and summarizes some of the known analytical results that involve it. Section 4 is a brief digression listing some facts about Fourier transforms that are needed in subsequent sections. Section 5 introduces some fundamental notation and results that provide a framework for the "Benford analysis" of a positive random variable. Section 6 combines the framework of Section 5 with my method of "seed functions" to develop the theory of the base $c$ Benford properties of random variables that are known to be Benford relative to base $b$, and Section 7 gives several examples of such random variables. Section 8 discusses Berger and Hill's concept of "base-invariant significant digits." Section 9 is a summary and a look ahead.

Benford Random Variables
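The best way to define Benford random variables is via the significand function. Let $b > 1$ be a fixed "base." Any $x > 0$ may be written uniquely in the form $x = s \cdot b^k$ where $s \in [1, b)$ and $k \in \mathbb{Z}$, and the base $b$ significand of $x$, written $S_b(x)$, is defined as $S_b(x) = s$ where $x = s \cdot b^k$ and $s \in [1, b)$. (Berger and Hill [4] define the significand of $x$ for all real $x$.) Now let $X$ be a positive random variable; that is, $\Pr(X > 0) = 1$. Assume that $X$ is continuous with a probability density function (pdf).
Definition: $X$ is base $b$ Benford (or $b$-Benford) if and only if $\Pr(S_b(X) \le s) = \log_b s$ for all $s \in [1, b)$.
Nothing written above requires that $b$ be an integer. For this paragraph alone, we assume that $b$ is an integer greater than or equal to 3. Let $D_1$ denote the first (i.e., leftmost or most significant) digit of $X$ in the base $b$ representation of $X$, so $D_1 \in \{1, 2, \ldots, b - 1\}$. (Leading zeros, if there are any, are ignored.) If $X$ is $b$-Benford, then
$$\Pr(D_1 = d) = \log_b\left(1 + \frac{1}{d}\right)$$
for all $d \in \{1, 2, \ldots, b - 1\}$. This is the base $b$ First Digit Law. To prove it, it is sufficient to note that $D_1 = d$ if and only if $d \le S_b(X) < d + 1$.
It's useful at this point to introduce some non-standard notation. Let $y \in \mathbb{R}$ and recall that the "floor" of $y$, written $\lfloor y \rfloor$, is defined as the largest integer that is less than or equal to $y$. The fractional part of $y$ is written $\langle y \rangle = y - \lfloor y \rfloor$. Then $X$ is $b$-Benford if and only if $\langle \log_b X \rangle \sim U[0, 1)$, where the notation "$Y \sim U[0, 1)$" means that $Y$ is uniformly distributed on the half open interval $[0, 1)$. (More generally, I use the symbol "$\sim$" to mean "is distributed as." Hence, for example, "$X \sim f$" means that $X$ is distributed with pdf $f$, [...])
[...] Hence, $X$ fails to satisfy the base 10 First Digit Law.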
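The significand and the First Digit Law can be checked numerically. The sketch below is illustrative only: the helper names (`significand`, `first_digit`, `first_digit_law`, `max_gap`) are hypothetical, and the test variable $X = 10^U$ with $U$ uniform on $[0, 1)$ is an assumed example, chosen because it is base 10 Benford by construction. It is not claimed to be one of the examples referred to in this section.

```python
import math
import random


def significand(x, b):
    """Return the base-b significand s in [1, b) of x > 0, where x = s * b**k."""
    if x <= 0 or b <= 1:
        raise ValueError("need x > 0 and b > 1")
    k = math.floor(math.log(x) / math.log(b))
    s = x / b ** k
    # Guard against floating-point rounding at (or very near) exact powers of b.
    if s >= b:
        s /= b
    elif s < 1:
        s *= b
    return s


def first_digit(x, b):
    """First (most significant) base-b digit of x > 0, for an integer base b >= 3."""
    return int(significand(x, b))


def first_digit_law(b):
    """Base-b First Digit Law: Pr(D1 = d) = log_b(1 + 1/d) for d = 1, ..., b - 1."""
    return {d: math.log(1 + 1 / d) / math.log(b) for d in range(1, b)}


def max_gap(samples, b):
    """Largest absolute gap between empirical first-digit frequencies and the law."""
    counts = {d: 0 for d in range(1, b)}
    for x in samples:
        counts[first_digit(x, b)] += 1
    law = first_digit_law(b)
    return max(abs(counts[d] / len(samples) - law[d]) for d in law)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [10 ** rng.random() for _ in range(100_000)]  # assumed example: X = 10**U
print("max gap, base 10:", round(max_gap(samples, 10), 4))
print("max gap, base 8: ", round(max_gap(samples, 8), 4))
```

For this particular $X$, the base 8 first digit equals 1 exactly when $X \in [1, 2) \cup [8, 10)$, which has probability $\log_{10} 2 + \log_{10}(10/8) \approx 0.398$, whereas the base 8 law assigns $\log_8 2 = 1/3$. The base 8 gap therefore stays near 0.065 however large the sample, while the base 10 gap shrinks toward zero.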

The Benford Spectrum.
Let $X$ be a positive random variable.
Definition 3.1. Following Wójcik [7], the "Benford spectrum" of $X$, denoted $B_X$, is defined as
$$B_X = \{ b \in (1, \infty) : X \text{ is } b\text{-Benford} \}.$$
The Benford spectrum of $X$ may be empty. In fact, the Benford spectra of essentially all the standard random variables used in statistics are empty.
We say of this result that the Benford property of a random variable is "scale-invariant."
Proposition 3.5. Suppose that $X$ and $Y$ are independent positive random variables and that $X$ is $b$-Benford. Then the product $XY$ is also $b$-Benford.
Citations: [3], [4], [7].
Proposition 3.6 (a corollary of Proposition 3.5). If $X$ and $Y$ are independent positive random variables, then $B_X \cup B_Y \subseteq B_{XY}$.
So far, the spectra we've seen are at most countably infinite. One may wonder if there exists a random variable with an uncountable spectrum. Whittaker showed by an example that such a random variable exists. Let [...] be given. Define [...] by [...]. It may be shown that [...] is a legitimate pdf, and [...] is u.d. mod 1 [...]. It may then be shown that [...] is u.d. mod 1 (and hence that [...]). In summary, [...].
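Proposition 3.5 lends itself to a quick simulation. The following is a minimal sketch under stated assumptions: $X = b^U$ with $U$ uniform on $[0, 1)$ stands in for a $b$-Benford variable, $Y$ is taken to be uniform on $[1, 3]$ (an arbitrary positive variable that is not $b$-Benford), and the Kolmogorov-Smirnov distance from the uniform distribution is used as a convenient yardstick. The function names are hypothetical.

```python
import math
import random


def frac_log(x, b):
    """Fractional part of log_b(x) for x > 0."""
    y = math.log(x) / math.log(b)
    return y - math.floor(y)


def ks_distance_from_uniform(values):
    """Kolmogorov-Smirnov distance between a sample and the U[0, 1) distribution."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def product_demo(b=10.0, n=50_000, seed=1):
    """Return the KS distances for <log_b Y> and <log_b (X * Y)>."""
    rng = random.Random(seed)
    x = [b ** rng.random() for _ in range(n)]      # b-Benford by construction
    y = [rng.uniform(1.0, 3.0) for _ in range(n)]  # assumed example, not b-Benford
    d_y = ks_distance_from_uniform([frac_log(v, b) for v in y])
    d_xy = ks_distance_from_uniform([frac_log(u * v, b) for u, v in zip(x, y)])
    return d_y, d_xy


d_y, d_xy = product_demo()
print("KS distance for Y alone:", round(d_y, 4))
print("KS distance for X * Y:  ", round(d_xy, 4))
```

The distance for $Y$ alone is large because $\log_{10} Y$ never exceeds $\log_{10} 3 \approx 0.477$, while the distance for the product should be of the order of sampling noise. A simulation of this kind illustrates the proposition but of course does not prove it.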

Digression: Fourier Transforms.
Before going much further, we need to list some facts about Fourier transforms. Let $g$ denote the pdf of a real valued random variable $Y$. The Fourier transform of $g$ is the function $\hat{g}$ [...]. Note that the real part of $\hat{g}$ is an even function and the imaginary part of $\hat{g}$ is an odd function, and hence that $\hat{g}(-\xi) = \overline{\hat{g}(\xi)}$. [...] (proof left to reader), and [...]. The Appendix of this paper contains a table of selected Fourier transforms.

A Framework for Benford Analysis.
Suppose that $X$ is a positive random variable and that $b > 1$. We may wish to know if $X$ is $b$-Benford and, if it is not, how far it departs from "Benfordness." I call an attempt to answer these and related questions a "Benford analysis." In this section I establish some notation I'll use for a Benford analysis, and give some fundamental results that allow us to proceed.
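Let $Y = \log_b X$, let $g$ denote the pdf of $Y$, and let $\tilde{g}$ denote the pdf of $\langle Y \rangle$. Given $\tilde{g}$ we may answer the two questions given above. (1) $X$ is $b$-Benford if and only if $\tilde{g}(y) = 1$ for almost all $y \in [0, 1)$. (2) If $X$ is not $b$-Benford, we may measure its deviation from Benfordness by any measure of the deviation of $\tilde{g}$ from a uniform distribution. If $\tilde{g}$ is continuous, or if its only discontinuities are "jumps," we could use the infinity norm:
$$\lVert \tilde{g} - 1 \rVert_\infty = \sup_{0 \le y < 1} \lvert \tilde{g}(y) - 1 \rvert.$$
We need a way to find $\tilde{g}$ from $g$. Under a reasonable assumption, it may be shown that
$$\tilde{g}(y) = \sum_{k = -\infty}^{\infty} g(y + k) \quad \text{for all } y \in [0, 1). \tag{5.3}$$
The "reasonable assumption" is described in [2]. In this paper we'll just accept eq. (5.3) as given.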
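The infinity-norm measure of deviation can be sketched numerically. The code below assumes that the density of the fractional part of $\log_b X$ is obtained by summing the density of $\log_b X$ over all integer shifts (the standard "wrapped" density), truncates that sum at a finite number of terms, and approximates the supremum on a grid. The normal density for $\log_b X$ (a lognormal $X$) is an assumed example, and all function names are hypothetical.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a normal random variable with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def wrapped_pdf(pdf, y, k_max=20):
    """Truncated sum over integer shifts: sum of pdf(y + k) for |k| <= k_max."""
    return sum(pdf(y + k) for k in range(-k_max, k_max + 1))


def sup_deviation(pdf, grid_size=500, k_max=20):
    """Grid approximation of the supremum over [0, 1) of |wrapped_pdf(y) - 1|."""
    return max(
        abs(wrapped_pdf(pdf, i / grid_size, k_max) - 1.0) for i in range(grid_size)
    )


# Assumed example: log_b X is normal with mean 0 and several standard deviations.
for sigma in (0.2, 0.5, 1.0):
    dev = sup_deviation(lambda y, s=sigma: normal_pdf(y, 0.0, s))
    print(f"sigma = {sigma}: sup deviation from uniform ~ {dev:.3e}")
```

In this example the deviation falls off very quickly as the standard deviation of $\log_b X$ grows. The truncation level `k_max` and the grid size are tuning choices and would need to be larger for heavy-tailed or sharply peaked densities.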
Although eq. (5.3) is fundamental for a Benford analysis of $X$, it is not very useful for finding the answers to some analytical questions one may ask. Fourier analysis provides the tools needed to continue the analysis. It may be shown [3] that the Fourier series representation of the pdf of $\langle \log_b X \rangle$ is [...] (5.4). At first sight this expression may not seem very useful; the series of real valued functions in eq. (5.3) has been replaced by a series of complex valued functions multiplied by complex coefficients. But [...], and eq. (5.4) may be written as [...] (5.5). As [...] is the complex conjugate of [...], it follows that each term in brackets in eq. (5.5) is real valued. In fact, [...]. Combining eqs. (5.5) and (5.6) yields [...] (5.8). In practice, it is often convenient to go one step further and rewrite eq. (5.8) as [...] where [...] and [...] is any solution to [...]. The two parameters are not uniquely determined by eqs. (5.10) and (5.11), but in practice natural candidates for them often present themselves. I'll call the first an "amplitude" (though this term generally refers to its absolute value) and the second a "phase." Solving eq. (5.7) for [...] and [...], we find [...]. It follows that [...].
The first thing to observe is that [...] is proportional to [...]: [...]. It then follows from Proposition 4.1 that [...] for any [...].
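An amplitude-and-phase form of the Fourier series can be illustrated with a case where everything is known in closed form. The sketch below makes several assumptions that are not taken from the text: $Y = \log_b X$ is normal with mean `mu` and standard deviation `sigma`; the $n$-th Fourier coefficient is taken to be $E[e^{-2\pi i n Y}]$, which for a normal $Y$ equals $\exp(-2\pi i n \mu - 2\pi^2 n^2 \sigma^2)$; the amplitude is twice the modulus of that coefficient and the phase is its argument. The paper's own sign and scaling conventions may differ, and the function names are hypothetical.

```python
import cmath
import math


def normal_pdf(y, mu, sigma):
    """Density of a normal random variable with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def wrapped_direct(y, mu, sigma, k_max=50):
    """Density of the fractional part of Y as a truncated sum over integer shifts."""
    return sum(normal_pdf(y + k, mu, sigma) for k in range(-k_max, k_max + 1))


def fourier_coefficient(n, mu, sigma):
    """E[exp(-2*pi*i*n*Y)] for a normal Y (assumed convention)."""
    return cmath.exp(-2j * math.pi * n * mu - 2.0 * (math.pi * n * sigma) ** 2)


def amplitude_phase(n, mu, sigma):
    """Amplitude 2*|c_n| and phase arg(c_n) of the n-th real harmonic."""
    c = fourier_coefficient(n, mu, sigma)
    return 2.0 * abs(c), cmath.phase(c)


def wrapped_series(y, mu, sigma, n_max=10):
    """1 + sum over n of A_n * cos(2*pi*n*y + phase_n), truncated at n_max."""
    total = 1.0
    for n in range(1, n_max + 1):
        a_n, phase_n = amplitude_phase(n, mu, sigma)
        total += a_n * math.cos(2.0 * math.pi * n * y + phase_n)
    return total


mu, sigma = 0.3, 0.4  # assumed parameter values for the illustration
worst = max(
    abs(wrapped_series(i / 200, mu, sigma) - wrapped_direct(i / 200, mu, sigma))
    for i in range(200)
)
print("largest gap between the two representations:", worst)
print("first amplitude and phase:", amplitude_phase(1, mu, sigma))
```

The two representations should agree to within rounding error. In this normal example the amplitudes depend only on `sigma` and the phases only on `mu` (and on $n$), so a few leading harmonics already describe the departure from uniformity.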

  
To use eq. (6.2) we first need to say something about [...]. I introduced "seed functions" in [2] and showed that every pdf of a u.d. mod 1 random variable may be written [...]. Under various assumptions about [...], we may combine eqs. ([...]. It's clear from the rightmost expression in this equation that [...]. When [...], an initial integration by parts yields [...]. Evaluating this integral, for any [...]. We see once again that [...] for all [...] whenever [...] is an integral [...]. This is essentially the possibility that was exploited in the construction of Whittaker's random variable. We'll return to this point in a moment.
Example C 1. Still working with the assumption that the seed function is increasing and absolutely continuous, we now make the additional assumption that its derivative is an even function, which implies that the Fourier transform of the derivative is an even function. Under these assumptions, eq. (6.12) implies that [...].
Example C 2. In Example C 1 we assume that the derivative of the seed function is even, so that it's symmetrical around the point 0. Now assume that it is symmetrical around the point [...] for [...]. Define [...] so [...] is an even function. It is easy to show that [...]. Combining this fact with eq. (6.12) yields [...]. Equation (6.14) implies that [...] for Example C 2. We observe that the phase depends on [...] and [...], but not on [...].
Equations (6.12) and (6.15) provide the scaffolding for the construction of [...], but require insertion of an actual formula for [...] in eq. (6.12) or [...] in eq. (6.15) for completion. This section completes this construction using the table of Fourier transforms in the Appendix.
Every distribution function is a legitimate seed function. Hence every Fourier transform given in the Appendix is a legitimate candidate for the transform required in eq. (6.12). Moreover, four of the distributions in the Appendix (the normal, Laplace, Cauchy, and logistic) have even density functions, and their Fourier transforms are therefore legitimate candidates for the transform required in eq. (6.15). All four of these distributions have a fixed scale, however, and it is desirable to append a "scale" parameter that allows it to be adjusted. Proposition 4.1 justifies the following expanded table of Fourier transforms.
Note: Among these four distributions, the scale parameter is the standard deviation of the rescaled random variable for the normal distribution only.
Gauss-Benford random variables. Suppose that the seed function is the cdf of a normal random variable with nonzero mean, i.e., a zero-mean normal random variable shifted to the right. I'll call the random variable implied by this seed function a "Gauss-Benford" random variable. Combining eq. (6.15) with the appropriate entry from this table, we obtain [...] so [...]. Viewed as a function of [...] or [...], [...] oscillates within an envelope [...], and [...] for all [...] and [...]. Asymptotically, letting any of the parameters [...], [...], or [...] [...] implies that [...] tends to 0.
[...] Equation (7.2) implies that [...]. The asymptotic behavior for this [...] is identical to that of the previous three random variables. The rate of convergence of [...] to zero is comparable to that of a Cauchy-Benford random variable.
Gamma-Benford random variables. Suppose that the seed function is the cdf of a Gamma random variable. I'll call the random variable implied by this seed function a "Gamma-Benford" random variable. This seed function is increasing and absolutely continuous, but is not symmetrically distributed around any point, so eq. (6.15) does not apply. Combining eq. (6.12) with the appropriate entry from the table of Fourier transforms found in the Appendix, we obtain [...]. To "compare and contrast" these results with those with symmetric distributions, we make the following observations. (1) [...]
I wish to acknowledge that I first encountered many of the ideas discussed in this section in Michal Wójcik's admirable paper [7]. To the best of my knowledge, however, Proposition 8.4 is entirely my own.