We first assume that the data has a single underlying period. We model the event times as a finite set of positive real numbers $S = \{s_j\}_{j=1}^{n}$, with
$$s_j = k_j \tau + \varphi + \eta_j, \tag{1}$$
where the period $\tau$ is a fixed positive real number, the $k_j$’s are non-repeating positive integers, the phase $\varphi$ is a real random variable uniformly distributed over the interval $[0, \tau)$, and the additive noise elements $\eta_j$ are zero-mean independent and identically distributed (iid) error terms. We refer to $\tau$ as the generator of the process. We also assume that the $\eta_j$’s have a symmetric probability density function (pdf) and that $|\eta_j| < \tau/2$ for all $j$. The assumption that $|\eta_j| < \tau/2$ for all $j$ is important, because it does not allow noisy data elements to cross over each other. This crossover could, in effect, create false periods.
The MEA extracts an estimate of $\tau$. The procedure is computationally efficient and requires little input data. Given reasonable data, it quickly converges with very high probability to either the exact value of $\tau$ (when the data is noise-free) or an estimate $\hat{\tau}$ (when the data has additive noise). When there is no noise, the MEA produces $\tau$ with very high probability even if we are given very few (e.g., $n = 10$) data elements, independent of the number of missing measurements. In the presence of noise (non-zero $\eta_j$’s in (1)) and false data (or outliers), there is a tradeoff between the number of data samples, the amount of noise and the percentage of outliers. The algorithm performs well given low noise for $n = 10$, but will degrade as noise is increased. Given more data, various statistical techniques ([2,3,4,15]) can be used to reduce noise effects and speed up convergence. The EQUIMEA, although more complicated, can also be used to extract $\tau$.
2.1. The Modified Euclidean Algorithm (MEA)
Assume that we have a set of event times $S$ (as in (1)). The MEA is a procedure for finding $\tau$. The mathematical justifications for the procedure use number theory. We cite Hardy and Wright [16], Ireland and Rosen [17], Knuth [18,19,20], Leveque [21], Rosen [22], and Schroeder [23] as number theory references.
First, we recall the Euclidean algorithm. (Knuth calls the Euclidean algorithm “the grandfather of all algorithms” because it is the oldest algorithm still in use today ([19], p. 318).) Given two positive integers $a$ and $b$, $a > b$, we say that $b$ divides $a$ if there exists a positive integer $k$ such that $a = kb$. This is denoted by $b \mid a$. An integer $p > 1$ that has no divisors other than 1 and itself is a prime. An integer $m > 1$ that is not prime is called a composite. Prime numbers are the fundamental building blocks of the integers.
The Fundamental Theorem of Arithmetic (see Rosen [22], p. 97) states that every positive integer $m > 1$ is a product of powers of primes, unique up to the ordering of the primes.
The Euclidean algorithm is based on the property that $\mathbb{Z}$ is a Euclidean domain, i.e., given two positive integers $a$ and $b$, $a > b$, there exist integers $q \geq 1$ and $r$ such that $a = qb + r$, $0 \leq r < b$. If $r = 0$, then $b$ divides $a$. This property can be used to develop the Euclidean algorithm, which can be represented as follows. Let “⟵” denote replacement, e.g., “$a \longleftarrow b$” means that the value of the variable $a$ is to be replaced by the current value of the variable $b$. Given $a$, $b$, $a > b$, proceed as follows:
- (1.) Find $q$ and $r$ such that $a = qb + r$, $0 \leq r < b$.
- (2.) The algorithm terminates if $r = 0$. Set $\gcd(a, b) = b$.
- (3.) Else, set $a \longleftarrow b$ and $b \longleftarrow r$. Go to (1.).
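The three steps above can be sketched in a few lines of Python. This is a minimal illustration of the classical division form only; the function name `euclid_gcd` is chosen here for illustration and does not come from the text.

```python
def euclid_gcd(a: int, b: int) -> int:
    """Division form of the Euclidean algorithm for two positive integers."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    while True:
        r = a % b        # step (1.): a = q*b + r with 0 <= r < b
        if r == 0:       # step (2.): b divides a, so b is the gcd
            return b
        a, b = b, r      # step (3.): a <- b, b <- r, then go to (1.)
```

For example, `euclid_gcd(1071, 462)` passes through the remainders 147 and 21 before stopping with 21.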
The procedure yields the greatest common divisor of $a$ and $b$, denoted $\gcd(a, b)$. The greatest common divisor is the product of all the powers of prime factors $p$ that divide both $a$ and $b$. Note that $\gcd(k_1, \ldots, k_n)$, the greatest common divisor of the set $\{k_j\}_{j=1}^{n}$, is not the pairwise gcd of the set $\{k_j\}$. If $\gcd(k_1, \ldots, k_n) = 1$, the set is called mutually relatively prime. If $\gcd(k_i, k_j) = 1$ for all $i \neq j$, the set is called pairwise relatively prime. If a set is pairwise relatively prime, it is mutually relatively prime. However, the converse is not true. For example, consider the set $\{6, 10, 15\}$. Note that no pair in this set is relatively prime, but the entire set is mutually relatively prime.
To work with our data sets, we have to make modifications to the standard Euclidean algorithm. We have data sets with more than two elements. The gcd of a set of more than two integers can be computed using Proposition 1(i.). We also have data sets that possibly have non-integer periods. Proposition 1(ii.) extends the gcd to multiples of a fixed real number $\tau$.
Proof. See Leveque [21], p. 16. □
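As a hedged illustration of these two modifications, the sketch below folds a pairwise gcd over a set of integers and then repeats the computation on multiples of a fixed number $\tau$. To keep the arithmetic exact, $\tau$ is assumed here to be rational and is stored as a `Fraction`; that restriction and the helper names `gcd_of_set` and `gcd_of_multiples` are assumptions of this example, not part of the text.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def gcd_of_set(ks):
    """gcd of a set of positive integers, obtained by folding the pairwise gcd."""
    return reduce(gcd, ks)


def _gcd_fraction(x: Fraction, y: Fraction) -> Fraction:
    """Euclidean algorithm on exact rationals (remainder form)."""
    while y:
        x, y = y, x % y
    return x


def gcd_of_multiples(tau: Fraction, ks):
    """gcd of the set {tau * k}, computed directly on the scaled values."""
    return reduce(_gcd_fraction, [tau * k for k in ks])


tau = Fraction(3, 7)
ks = [12, 18, 30]
# The common unit of {tau*k} is tau times the integer gcd of the k's.
assert gcd_of_multiples(tau, ks) == tau * gcd_of_set(ks)
```

The set `[6, 10, 15]` gives `gcd_of_set([6, 10, 15]) == 1` even though no pair in it is relatively prime, which is the distinction between mutual and pairwise relative primeness.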
Let , and so . Then, plays the role of the fundamental unit of the set . We say that the elements in the set S are commensurate to , i.e., every element in the set S can be expressed as an integer multiple of . For example, the elements of the set are commensurate to . Note, . If no such fundamental unit exists, we call the elements of the set incommensurate.
Remark 2.
All finite sets of rational numbers are commensurate. Add a single irrational number to a finite set of rational numbers and this new set is incommensurate.
The first step of the Euclidean algorithm involves division. Noisy data makes this step unstable in the following sense. The additive noise components could be non-zero, but arbitrarily close to zero. Dividing by these numbers could result in arbitrarily large numbers. Our third modification of the Euclidean algorithm addresses this. We develop a form of the procedure based on sorting and subtraction rather than division. Although this requires additional iterations, it establishes the groundwork to modify the algorithm so that it is stable with respect to noise.
Remark 3.
It is important to note that Proposition 2 plays a fundamental role in both the MEA and EQUIMEA.
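Proof. First assume that $a$ is a positive integer that divides each $k_j$, i.e., $a \mid k_j$ for $j = 1, \ldots, n$. Then, $a \mid (k_j - k_{j+1})$ for $j = 1, \ldots, n-1$, and $a \mid k_n$. Therefore, $a$ is a divisor of $\{(k_j - k_{j+1})\}_{j=1}^{n-1} \cup \{k_n\}$. Conversely, assume $a$ is a positive integer such that $a \mid (k_j - k_{j+1})$ for $j = 1, \ldots, n-1$, and $a \mid k_n$. Therefore, there exist positive integers $c$ and $d$ such that $k_{n-1} - k_n = ca$ and $k_n = da$. Thus, $k_{n-1} = (c + d)a$, and so $a \mid k_{n-1}$. Continuing in this fashion, we get that $a \mid k_j$ for $j = 1, \ldots, n$. Therefore, since the sets $\{k_j\}_{j=1}^{n}$ and $\{(k_j - k_{j+1})\}_{j=1}^{n-1} \cup \{k_n\}$ have the same divisors, $\gcd(k_1, \ldots, k_n) = \gcd(k_1 - k_2, \ldots, k_{n-1} - k_n, k_n)$. □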
Remark 4.
After the first sort and subtraction, the gcd of the integers is invariant throughout the remainder of the process.
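The invariance used here can be checked numerically. The sketch below sorts a set of distinct positive integers in descending order, forms the adjacent differences together with the minimum, and compares the gcd before and after. The helper name `sort_and_subtract` is hypothetical.

```python
from functools import reduce
from math import gcd


def sort_and_subtract(ks):
    """Adjacent differences of the descending sort, with the minimum adjoined."""
    s = sorted(ks, reverse=True)
    return [s[j] - s[j + 1] for j in range(len(s) - 1)] + [s[-1]]


ks = [84, 60, 36, 24]
before = reduce(gcd, ks)
after = reduce(gcd, sort_and_subtract(ks))
assert before == after == 12
```

Repeating `sort_and_subtract` on its own output (after discarding zeros) keeps the gcd fixed while the entries shrink, which is the mechanism the subtraction-based procedure relies on.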
In order to eliminate the phase information $\varphi$, we subtract it. This is justified by the following.
Proposition 3.
Ifthen for all .
Proof. Let . Then, since and there exist positive integers c and d such that and . Adding gives . Continuing in this fashion gives the result for . Shifting to gives the general result. □
We call the procedure the modified Euclidean algorithm (MEA). We first sort the $s_j$’s in descending order to allow for a more straightforward implementation of our algorithm, i.e., $s_1 \geq s_2 \geq \cdots \geq s_n$. (Thus $s_n$ is the minimum of $S$.) We form a new set by subtracting adjacent pairs of these numbers, given by $s_j - s_{j+1}$. After this first operation, the phase information has been subtracted out and the resulting set has the simpler form
$$S' = \{s'_j\}_{j=1}^{n-1}, \quad s'_j = K_j \tau + \eta'_j,$$
where $K_j = k_j - k_{j+1}$ and $\eta'_j = \eta_j - \eta_{j+1}$. In subsequent iterations of the algorithm, the data maintains this same general form. Eliminating the phase information simplifies the process. The main theorem of the MEA, Theorem 1, applies to $S'$ after $\varphi$ has been eliminated.
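Initialize: Sort the elements of $S$ in descending order. Set $\mathrm{iter} = 0$ and import $\eta_0$.
- (1.) [Adjoin 0 after first iteration.] If $\mathrm{iter} > 0$, then $S \longleftarrow S \cup \{0\}$.
- (2.) [Form the new set with elements $(s_j - s_{j+1})$.] Set $S \longleftarrow \{s_j - s_{j+1}\}$.
- (3.) [Sort.] Sort the elements in descending order.
- (4.) [Eliminate noise.] If $0 \leq s_j \leq \eta_0$, then $S \longleftarrow S \setminus \{s_j\}$.
- (5.) [Terminate or loop.] The algorithm terminates if $S$ has only one element $s$. Declare $\hat{\tau} = s$. If not, then set $\mathrm{iter} \longleftarrow \mathrm{iter} + 1$. Go to (1.).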
We then sort, subtract adjacent pairs and, after the first iteration, eliminate noise and adjoin the previous non-zero minimum to the set. The algorithm is continued by iterating this process of sorting, subtracting and eliminating the elements in $[0, \eta_0]$, adjoining the previous non-zero minimum at each new iteration, and terminating when only a single element remains. Note that Proposition 2 guarantees that the gcd remains unchanged. The lone remaining element is the estimate $\hat{\tau}$, which equals $\tau$ plus an error term that results from the noise terms after several iterations of the MEA and the noise floor $\eta_0$.
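The sort, subtract and threshold loop described above can be sketched as follows. This is an illustrative reading of the procedure and not the authors' implementation. The function name `mea`, the parameter name `noise_floor` (standing for the noise floor) and the iteration cap `max_iter` are assumptions of this example.

```python
def mea(event_times, noise_floor, max_iter=100_000):
    """Sketch of the sort/subtract/threshold loop; returns the period estimate."""
    s = sorted(event_times, reverse=True)            # initialize: descending sort
    for iteration in range(max_iter):
        if iteration > 0:
            s = s + [0.0]                            # adjoin 0 after the first pass
        # adjacent differences, then sort in descending order
        diffs = sorted((s[j] - s[j + 1] for j in range(len(s) - 1)), reverse=True)
        # eliminate elements at or below the noise floor
        s = [d for d in diffs if d > noise_floor]
        if not s:
            raise ValueError("all differences fell at or below the noise floor")
        if len(s) == 1:                              # terminate on a single element
            return s[0]
    raise RuntimeError("no convergence within max_iter iterations")


# Noise-free example: tau = 0.5, phase = 0.125, k = 3, 8, 12, 19, 25.
times = [0.5 * k + 0.125 for k in (3, 8, 12, 19, 25)]
print(mea(times, noise_floor=1e-9))
```

Adjoining 0 before differencing is what carries the previous minimum into the next set, since the last difference is then the minimum itself. A small positive `noise_floor` is used even for noise-free floating-point data so that rounding residue is discarded along with exact zeros. With very few events the result can be a multiple of the period: the times `[2.25, 6.25, 10.25]` have differences 4 and 4, so the sketch returns 4.0 rather than 1.0.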
2.1.1. The Noise Floor
The additive noise elements $\eta_j$ create a need for a noise floor $\eta_0$. The $\eta_j$ are zero-mean independent identically distributed (iid) error terms. We assume that the $\eta_j$’s have a symmetric probability density function (pdf) and that $|\eta_j| < \tau/2$ for all $j$. This assumption does not allow noisy data elements to cross over each other and thus create false periods.
To deal with the noise, we establish a noise threshold $\eta_0$. After the first sort and subtract (which eliminates the phase $\varphi$), we “zero out” the noise by eliminating the elements in $[0, \eta_0]$. Setting this noise floor parameter is key. If we know an estimate of the noise in the data, we set $\eta_0$ as twice the maximum range of the noise.
In general, we do not have an estimate of the noise. For unknown noise models, this estimate can be tricky. First, after the first iteration, the differencing operation has removed the independence of the error terms. Second, the ordering operation makes the nature of the dependence in subsequent iterations difficult to determine. Analysis of order statistics very often rests on an iid assumption, e.g., see Sarhan and Greenberg [24] and Reiss [25]. Without the iid assumption, this analysis leads to many open questions (see Reiss [25]). In general, beyond the first iteration, the pdf of the subsequent error terms becomes asymmetric, even when starting with iid $\eta_j$’s with a symmetric pdf. This occurs due to the reordering before differencing at each iteration, and because after the first iteration, the errors are no longer iid.
We developed the following method of estimating
in [
4]. Suppose the pdf of the
’s is given by
, and consider the set of differences obtained in the first iteration, given by
Invoking the zero-mean iid assumption on the ’s, the pdf of is given by the convolution . So, for example, if ( is uniformly distributed with parameter ) then , the triangle function centered at . A straightforward method for clustering the data is to employ a gradient operator to determine when a step has occurred. After the first iteration, the gradient is estimated, with large gradient values indicating a step or “edge” in the data. We have employed a simple estimator by convolving with an impulse response given by . The gradient operator has the effect of binning the data. Each bin gives an estimate , which is given by the largest data point in the bin minus the smallest in the bin. Let equal twice the maximum estimate over all of the bins.
This binning process also yielded a very quick method of estimating $\tau$. After subtracting out the phase and binning after applying the gradient operator, simply average across the elements in each bin, and then apply the MEA. This led to a multistep estimating process which achieved the Cramér–Rao bound (CRB). We then developed a multi-step procedure that
- (i). Estimates $\tau$,
- (ii). Estimates the $k_j$’s, and then
- (iii). Refines the estimate of $\tau$ using the estimated $k_j$’s in a least-squares solution.
In ([4], p. 2291), we refer to this procedure as the MEA-LS. An extensive comparative analysis was presented in ([4], pp. 2296–2298), Figures 1–4. Each of these figures presents a comparative performance analysis of MEA, MEA-LS, the 1024 point periodogram, and the 4096 point periodogram against the Cramér–Rao bound as the signal-to-noise ratio (SNR) went from 0 to 50. Figure 5 from ([4], p. 2299) did a similar analysis for increasing jitter noise, as jitter increased from 5% to 35%.
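The three-step refinement can be sketched under simple assumptions. Given a coarse period estimate, the relative indices are estimated by rounding offsets from the earliest event, and an ordinary least-squares line is then fitted to the event times against those indices. This generic line fit is an assumption of the example and is not necessarily the estimator analyzed in [4]; the names `refine_period_ls` and `tau_hat` are hypothetical.

```python
def refine_period_ls(event_times, tau_hat):
    """Refine a coarse period estimate with a least-squares line fit.

    Returns (refined period, intercept, estimated relative indices).
    """
    s = sorted(event_times)
    # Step (ii): estimate indices relative to the earliest event by rounding.
    k_rel = [round((x - s[0]) / tau_hat) for x in s]
    n = len(s)
    k_mean = sum(k_rel) / n
    s_mean = sum(s) / n
    sxx = sum((k - k_mean) ** 2 for k in k_rel)
    if sxx == 0:
        raise ValueError("need at least two distinct indices")
    sxy = sum((k - k_mean) * (x - s_mean) for k, x in zip(k_rel, s))
    tau_ls = sxy / sxx                      # step (iii): slope of the fitted line
    intercept = s_mean - tau_ls * k_mean    # absorbs the phase and the first index
    return tau_ls, intercept, k_rel


times = [0.5 * k + 0.125 for k in (3, 8, 12, 19, 25)]
tau_ls, intercept, k_rel = refine_period_ls(times, tau_hat=0.505)
print(tau_ls, k_rel)
```

The rounding step only recovers the correct indices when the coarse estimate is accurate enough that its relative error times the largest index stays below one half, which is why a good first-stage estimate matters.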
2.1.2. Connection with Analytic Number Theory
Theorem 1 and its corollaries give us that $\hat{\tau} \to \tau$ with probability 1 as $n \to \infty$. Convergence is exponentially quick. Therefore, the modified Euclidean algorithm yields either the exact value of $\tau$ (when the data is noise-free) or an estimate $\hat{\tau}$ (when the data has additive noise). In the noise-free case, the theory tells us that the algorithm very likely yields $\tau$ given as few as 10 data samples.
This is a manifestation of the structure of randomness over the integers. The key result is Theorem 1, which is proven by showing that in an arbitrarily large lattice of positive integers in $n$-dimensional Euclidean space, if we choose an element of the lattice, the probability that the $n$-tuple is relatively prime quickly converges to certainty as $n$ increases. We cannot “randomly choose” positive integers $k_1, \ldots, k_n$, but we can choose the $n$-tuple as an element of a lattice in $\mathbb{R}^n$.
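- (1). Given a randomly chosen $n$-tuple $(k_1, \ldots, k_n)$ of positive integers in a finite symmetric lattice in $\mathbb{R}^n$, $P\{\gcd(k_1, \ldots, k_n) = 1\} \to 1$ quickly as $n$ increases.
- (2). Moreover, we can compute $P$: $P\{\gcd(k_1, \ldots, k_n) = 1\} \to [\zeta(n)]^{-1}$ as the size of the lattice goes to infinity, where $\zeta$ is Riemann’s zeta function.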
The connection with the zeta function may at first seem surprising. However, if one looks at Euler’s product Formula (4), it is easier to see how the zeta function can play a role in understanding relative primeness. We discuss the zeta function in the following section. (See [1,2] for additional discussion.)
2.2. Pi, the Primes, and Probability
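Riemann’s zeta function is defined in the complex numbers $\mathbb{C}$. Given a complex number $z = x + iy$ ($x \in \mathbb{R}$, $y \in \mathbb{R}$), we say that $x$ is the real part of $z$ (denoted by $\Re(z)$) and $y$ is the imaginary part of $z$ (denoted by $\Im(z)$). Riemann’s zeta function is defined on the complex half plane $\{z \in \mathbb{C} : \Re(z) > 1\}$ by
$$\zeta(z) = \sum_{k=1}^{\infty} \frac{1}{k^z}.$$
Let $\mathcal{P} = \{p_1, p_2, p_3, \ldots\} = \{2, 3, 5, \ldots\}$ be the set of all prime numbers. Euler connected the zeta function to the primes in 1736 by proving that
$$\zeta(z) = \prod_{j=1}^{\infty} \frac{1}{1 - p_j^{-z}}, \quad \Re(z) > 1 \tag{4}$$
(see Conway [26], pp. 187–194). We will show that given $n$ “randomly chosen positive integers” $\{k_1, \ldots, k_n\}$, the probability that this $n$-tuple is relatively prime is expressed in terms of $\zeta(n)$. This result is key to the MEA.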
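In the following, $P$ denotes probability.
Given a randomly chosen $n$-tuple $(k_1, \ldots, k_n)$ of positive integers in a finite symmetric lattice in $\mathbb{R}^n$,
$$P\{\gcd(k_1, \ldots, k_n) = 1\} \longrightarrow [\zeta(n)]^{-1}$$
as the size of the lattice goes to infinity.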
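Heuristically, we could argue as follows. Given randomly distributed positive integers, by the law of large numbers, about $1/2$ of them are even, and $1/3$ of them are multiples of three and $1/p$ are multiples of some prime $p$. Thus, given $n$ independently chosen positive integers,
$$P\{p \mid k_1, \ldots, p \mid k_n\} = P\{p \mid k_1\} \cdots P\{p \mid k_n\} = \frac{1}{p^n}.$$
Therefore,
$$P\{p \text{ does not divide every } k_j\} = 1 - \frac{1}{p^n}.$$
Calculating this for all of the primes gives us that
$$P\{\gcd(k_1, \ldots, k_n) = 1\} = \prod_{j=1}^{\infty} \left(1 - \frac{1}{p_j^n}\right),$$
where $p_j$ is the $j$th prime. In this last equation, we have used the fact that by The Fundamental Theorem of Arithmetic, the prime factor decomposition of any integer $k$ appears among the prime numbers $p_j$ raised to some power. By Euler’s formula,
$$\zeta(n) = \prod_{j=1}^{\infty} \frac{1}{1 - p_j^{-n}}.$$
Thus,
$$P\{\gcd(k_1, \ldots, k_n) = 1\} = \frac{1}{\zeta(n)}.$$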
Remark 5.
This heuristic argument breaks down on the first line. Any uniform distribution on the positive integers would have to be identically zero. The merit in the argument lies in the fact that it gives an indication of how the zeta function plays a role in the problem.
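The role of the zeta function can be illustrated numerically without appealing to a uniform distribution on all positive integers. The sketch below counts the relatively prime $n$-tuples in a finite lattice with coordinates from 1 to a chosen bound and compares the fraction with the reciprocal of a truncated zeta sum. The function names, the lattice bound `ell` and the truncation length `terms` are choices made for this example.

```python
from functools import reduce
from itertools import product
from math import gcd


def coprime_fraction(n: int, ell: int) -> float:
    """Fraction of n-tuples in {1, ..., ell}^n whose overall gcd is 1."""
    hits = sum(1 for ks in product(range(1, ell + 1), repeat=n)
               if reduce(gcd, ks) == 1)
    return hits / ell ** n


def zeta(n: int, terms: int = 100_000) -> float:
    """Truncated series for zeta(n); adequate for integer n >= 2."""
    return sum(k ** -n for k in range(1, terms + 1))


for n, ell in [(2, 100), (3, 20)]:
    print(n, coprime_fraction(n, ell), 1.0 / zeta(n))
```

The enumeration is exhaustive, so it is only practical for small `n` and `ell`; the agreement improves as the lattice bound grows.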
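The formal proof is developed as follows. Let $\mathrm{card}\{\cdot\}$ denote cardinality of the set $\{\cdot\}$, and for $\ell \in \mathbb{N}$, let $\{1, \ldots, \ell\}^n$ denote the sublattice of positive integers in $\mathbb{R}^n$ with coordinates $c$ such that $1 \leq c \leq \ell$. Therefore,
$$N_n(\ell) = \mathrm{card}\{(k_1, \ldots, k_n) \in \{1, \ldots, \ell\}^n : \gcd(k_1, \ldots, k_n) = 1\}$$
is the number of relatively prime elements in $\{1, \ldots, \ell\}^n$. Let $P_n(\ell)$ be the probability that $n$ positive integers chosen at random from $\{1, \ldots, \ell\}$ are relatively prime. Thus
$$P_n(\ell) = \frac{N_n(\ell)}{\ell^n}.$$
We have that $\ell \to \infty$ gives the asymptotic meaning of “randomly chosen” positive integers.
Theorem 1. Let $N_n(\ell) = \mathrm{card}\{(k_1, \ldots, k_n) \in \{1, \ldots, \ell\}^n : \gcd(k_1, \ldots, k_n) = 1\}$. For $n \geq 2$, we have that
$$\lim_{\ell \to \infty} \frac{N_n(\ell)}{\ell^n} = [\zeta(n)]^{-1}.$$
We begin with the following lemma, which gives us a counting formula for $N_n(\ell)$ expressed in terms of primes and products of primes. Let $\lfloor x \rfloor$ denote the floor function of $x$, namely
$$\lfloor x \rfloor = \max\{k \in \mathbb{Z} : k \leq x\}.$$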
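Lemma 1. Let $N_n(\ell)$ be the number of relatively prime elements in $\{1, \ldots, \ell\}^n$. Then
$$N_n(\ell) = \ell^n - \sum_{p_i} \left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \sum_{p_i < p_j} \left\lfloor \frac{\ell}{p_i p_j} \right\rfloor^n - \sum_{p_i < p_j < p_k} \left\lfloor \frac{\ell}{p_i p_j p_k} \right\rfloor^n + \cdots.$$
Proof of Lemma 1. Choose a prime number $p_i$. The number of integers in $\{1, \ldots, \ell\}$ such that $p_i$ divides an element of that set is $\lfloor \ell / p_i \rfloor$. (Note that it is possible to have $\lfloor \ell / p_i \rfloor = 0$, because we may have $p_i > \ell$.) Therefore, the number of $n$-tuples $(k_1, \ldots, k_n)$ contained in the lattice $\{1, \ldots, \ell\}^n$ such that $p_i$ divides every integer in the $n$-tuple is
$$\left\lfloor \frac{\ell}{p_i} \right\rfloor^n.$$
Next, if $p_i p_j$ divides an integer $k$, then $p_i \mid k$ and $p_j \mid k$. Therefore, the number of $n$-tuples $(k_1, \ldots, k_n)$ contained in the lattice $\{1, \ldots, \ell\}^n$ such that $p_i$ or $p_j$ or their product divide every integer in the $n$-tuple is
$$\left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \left\lfloor \frac{\ell}{p_j} \right\rfloor^n - \left\lfloor \frac{\ell}{p_i p_j} \right\rfloor^n,$$
where the last term is subtracted so that we do not count the same numbers twice (in a simple application of the inclusion–exclusion principle).
Continuing in this fashion, for three primes, say $p_i, p_j, p_k$, the number of $n$-tuples $(k_1, \ldots, k_n)$ contained in the lattice $\{1, \ldots, \ell\}^n$ such that $p_i$, $p_j$, or $p_k$ or any of their products divide every integer in the $n$-tuple is given by the inclusion–exclusion principle as
$$\left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \left\lfloor \frac{\ell}{p_j} \right\rfloor^n + \left\lfloor \frac{\ell}{p_k} \right\rfloor^n - \left\lfloor \frac{\ell}{p_i p_j} \right\rfloor^n - \left\lfloor \frac{\ell}{p_i p_k} \right\rfloor^n - \left\lfloor \frac{\ell}{p_j p_k} \right\rfloor^n + \left\lfloor \frac{\ell}{p_i p_j p_k} \right\rfloor^n.$$
We can therefore see by induction that the number of $n$-tuples $(k_1, \ldots, k_n)$ contained in the lattice $\{1, \ldots, \ell\}^n$ such that any of the primes $p_i$ or any of their products divide every integer in the $n$-tuple is given by the inclusion–exclusion principle as
$$\sum_{p_i} \left\lfloor \frac{\ell}{p_i} \right\rfloor^n - \sum_{p_i < p_j} \left\lfloor \frac{\ell}{p_i p_j} \right\rfloor^n + \sum_{p_i < p_j < p_k} \left\lfloor \frac{\ell}{p_i p_j p_k} \right\rfloor^n - \cdots.$$
But this counts the complement of $N_n(\ell)$ in the lattice $\{1, \ldots, \ell\}^n$. Therefore,
$$N_n(\ell) = \ell^n - \sum_{p_i} \left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \sum_{p_i < p_j} \left\lfloor \frac{\ell}{p_i p_j} \right\rfloor^n - \cdots.$$
This completes the proof of Lemma 1. □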
Proof of Theorem 1. Lemma 1 gives us that if
is the number of relatively prime elements in
, then
We now observe that
Since
, this series is convergent. Thus, each term in the expansion of
is convergent. Now, let
By noting that since
and the sum is over
, we get
Since the
term in the expansion of
is dominated by
and since
is convergent, the series converges absolutely.
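We now need the Möbius inversion function $\mu$, which is defined as follows. Let
$$\mu(m) = \begin{cases} 1, & m = 1, \\ (-1)^r, & m = p_{i_1} p_{i_2} \cdots p_{i_r} \text{ for distinct primes } p_{i_1}, \ldots, p_{i_r}, \\ 0, & \text{otherwise}. \end{cases}$$
The function $\mu$ is called an inversion function because if $f$ is a function defined for all positive integers (an arithmetic function) and $F$ is its sum over all divisors, i.e., $F(n) = \sum_{d \mid n} f(d)$, then for all positive integers $n$, $\mu$ inverts $f$ and $F$ by
$$f(n) = \sum_{d \mid n} \mu(d) F(n/d)$$
(see Rosen [22], pp. 251–255). Euler showed that
$$\frac{1}{\zeta(n)} = \prod_{j=1}^{\infty} \left(1 - \frac{1}{p_j^n}\right) = \sum_{m=1}^{\infty} \frac{\mu(m)}{m^n},$$
where the last sum is over $m \in \mathbb{N}$. For $n \geq 2$, this series is absolutely convergent. This last equality follows because for $n \geq 2$,
$$\left(\sum_{k=1}^{\infty} \frac{1}{k^n}\right) \left(\sum_{m=1}^{\infty} \frac{\mu(m)}{m^n}\right) = \sum_{r=1}^{\infty} \frac{1}{r^n} \sum_{d \mid r} \mu(d) = 1,$$
where we use the fact that both series in the first term converge absolutely and thus can be rearranged in any order (see Leveque [21], p. 120).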
Now let
be given. We want to show that there exists
such that for all
,
By (
13), there exists
such that for all
,
We use the fact that for all
and all
,
if and only if
. Let
Then
But
Choose
such that for all
,
. Let
. Then, for all
This completes the proof of the theorem. □
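The inclusion–exclusion count over primes and their products can be written compactly with the Möbius function, since the signed sum runs over squarefree integers. The sketch below evaluates that sum for a finite lattice bound and checks it against direct enumeration. The function names and the use of trial division for the Möbius function are choices made for this example.

```python
from functools import reduce
from itertools import product
from math import gcd


def mobius(m: int) -> int:
    """Moebius function by trial division: 0 if a square divides m."""
    result = 1
    p = 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:      # p^2 divided the original m
                return 0
            result = -result    # one more distinct prime factor
        p += 1
    if m > 1:                   # leftover prime factor
        result = -result
    return result


def count_coprime_mobius(n: int, ell: int) -> int:
    """Signed inclusion-exclusion sum: sum over d of mu(d) * floor(ell/d)^n."""
    return sum(mobius(d) * (ell // d) ** n for d in range(1, ell + 1))


def count_coprime_brute(n: int, ell: int) -> int:
    """Direct enumeration of n-tuples in {1, ..., ell}^n with gcd equal to 1."""
    return sum(1 for ks in product(range(1, ell + 1), repeat=n)
               if reduce(gcd, ks) == 1)


assert count_coprime_mobius(2, 30) == count_coprime_brute(2, 30)
assert count_coprime_mobius(3, 12) == count_coprime_brute(3, 12)
```

Dividing `count_coprime_mobius(n, ell)` by `ell ** n` gives the finite-lattice probability, which can be evaluated for bounds far beyond what enumeration allows.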
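Corollary 1.
Let $n \geq 2$. Given a randomly chosen $n$-tuple of positive integers $(k_1, \ldots, k_n)$, we have that $\lim_{n \to \infty} P\{\gcd(k_1, \ldots, k_n) = 1\} = \lim_{n \to \infty} [\zeta(n)]^{-1} = 1$. Thus, $\gcd(k_1, \ldots, k_n) = 1$ with probability 1 as $n \to \infty$.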
Evaluating the zeta function, even at positive integer values, is challenging. Euler gave us a remarkable formula which evaluates $\zeta(n)$ at the even integers. Ireland and Rosen describe this as one of Euler’s “most remarkable computations” ([17], p. 231). The exact evaluations of $\zeta(n)$ at the odd integers $n \geq 3$
are still open (see, e.g., [
27]). We list the values of
for
in
Table 1.
We can, however, estimate $\zeta(n)$ at the odd integers. Moreover, the estimate shows that $[\zeta(n)]^{-1} \to 1$ quickly as $n$ increases. In fact, the rate of convergence is exponential. This estimate, combined with Theorem 1, explains why as few as 10 data elements are needed to estimate $\tau$, as demonstrated in Table 2.
Proposition 4.
Let $n \geq 2$. Then $\lim_{n \to \infty} [\zeta(n)]^{-1} = 1$, converging to 1 from below faster than $1 - 2^{1-n}$.
Proof. Since
and
,
Thus,
and so
converging to 1 from below faster than
. □
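The speed of this convergence is easy to see numerically. The sketch below tabulates the reciprocal of a truncated zeta sum for small integer arguments and compares each value with the elementary lower bound $1 - 2^{1-n}$. The truncation length and the function names are assumptions of this example, and the truncated sum slightly underestimates the zeta value.

```python
from math import pi


def zeta(n: int, terms: int = 200_000) -> float:
    """Truncated series for zeta(n), integer n >= 2."""
    return sum(k ** -n for k in range(1, terms + 1))


def inverse_zeta_table(n_values):
    """Map each n to the reciprocal of the truncated zeta(n)."""
    return {n: 1.0 / zeta(n) for n in n_values}


table = inverse_zeta_table(range(2, 13))
for n, value in table.items():
    # reciprocal zeta value next to the lower bound 1 - 2^(1 - n)
    print(n, round(value, 6), 1.0 - 2.0 ** (1 - n))

# Closed form at n = 2 for reference: 1/zeta(2) = 6/pi^2.
print(6.0 / pi ** 2)
```

By $n = 10$ the reciprocal already exceeds 0.999, which matches the observation that about ten data elements suffice in the noise-free case.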
The first step of the MEA eliminates the additive phase $\varphi$. After this first step, the MEA is working with the differences of pairs of the initial data, having the form
By Proposition 3, a prime
divides each
if and only if
divides
,
. This just shifts the point
to the point
. Therefore, for
, we have that
Combining Theorem 1 with Propositions 1 and 3 and inequality (21) shows that the algorithm generates the underlying period $\tau$ in the noise-free case as the number of data elements $n$ goes to infinity. Moreover, (21) shows that the algorithm very likely produces this value in the noise-free case with as few as 10 data elements.
Corollary 2.
Let $n \geq 2$. Given a randomly chosen $n$-tuple of positive integers $(k_1, \ldots, k_n)$ and a fixed positive real number $\tau$, $\gcd(k_1 \tau, \ldots, k_n \tau) = \tau$ with probability 1 as $n \to \infty$.
2.3. Simulations of the MEA
We tested the MEA by designing computer simulations. We set up 100-loop Monte-Carlo runs, and then calculated statistics based on these computations. Let $n$ denote the number of data elements in a given experiment, and without loss of generality, let $\tau = 1$ in all experiments. This choice is arbitrary, but it does allow for a direct visual analysis of the results. Any other fixed real positive number would have yielded similar results.
Let
denote the value the algorithm gives for
, and let
denote the experimental standard deviations. The initial phase
was chosen randomly in
, and did not play a factor as it was eliminated after the first differencing. Noise values
were modeled as uniformly distributed with the probability distribution function (pdf)
, where
denotes the uniform distribution across the interval
. Then, for example,
implies random phase jitter that is
of the period
. In
Table 3 various noise threshold values
equaled
. (We noted that an increase in the noise floor
had the effect of speeding up the algorithm.)
The first set of simulations assumed that the data had no additive noise, i.e., for these simulations, $\eta_j = 0$.
- (1.) Estimation from data without additive noise.
The first simulation examined the effects that changes in $n$ and in the percentage of missing observations have on the algorithm’s performance. The data points had no additive noise, i.e., $\eta_j = 0$ for all $j$. The algorithm converged in many cases to the exact value of $\tau$. When the number of events $n$ was extremely low, the MEA also converged to multiples of $\tau$.
The missing data elements were modeled by creating jumps in the $k_j$’s as follows. We chose an integer $l$ randomly from the interval $[1, M]$. Given $k_j$, $k_{j+1} = k_j + l$. Increasing $M$ increased possible jumps, thus making the data increasingly sparse. Results are shown in Table 2. We let % miss denote the experimentally determined average percentage of missing observations and iter denote the average number of iterations required to converge. To interpret these, again visualize the data as zero-crossings of
. A random process has removed
of the zero crossings of
f, leaving only
n observations.
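A data generator in the spirit of this experiment can be sketched as follows. Every specific here is an assumption made for illustration: the jump is drawn uniformly from the integers 1 to `M`, the phase uniformly from 0 to `tau`, the optional noise uniformly from an interval of total width `delta * tau` centered at zero, and the percentage of missing observations is computed from the span of the indices. None of these choices is confirmed as the exact setup behind the reported tables.

```python
import random


def simulate_event_times(n, M, tau=1.0, delta=0.0, seed=None):
    """Generate n sparse event times k_j*tau + phase + noise (illustrative model).

    Returns (times, indices, phase, percent_missing).
    """
    rng = random.Random(seed)
    phase = rng.uniform(0.0, tau)
    ks = [rng.randint(1, M)]
    while len(ks) < n:
        ks.append(ks[-1] + rng.randint(1, M))   # jump of 1..M between kept events
    half_width = delta * tau / 2.0
    times = [k * tau + phase + rng.uniform(-half_width, half_width) for k in ks]
    span = ks[-1] - ks[0] + 1                   # possible events between first and last
    percent_missing = 100.0 * (1.0 - n / span)
    return times, ks, phase, percent_missing


times, ks, phase, pct = simulate_event_times(n=10, M=100, seed=1)
print(len(times), round(pct, 2))
```

Larger `M` allows larger jumps between retained events, so the same number of observations covers a longer stretch of the underlying periodic process.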
The top half of Table 2 shows the effect of changing $M$ and, therefore, changing the percentage of missing observations. Given insufficient data, the algorithm may converge to a multiple of $\tau$
. Columns labeled
,
,
, and
indicate the percentage of runs that converged to these values. The algorithm is able to choose
correctly based on
data samples, even with
of the possible observations missing. Convergence in the noise-free case depends on
n but is independent of
M, as shown by the analysis above. The bottom half of
Table 2 illustrates the effect of changing
n for
M fixed. Reliable results are achieved for
. Note, however, that although it is very probable that as few as 10 data elements will produce $\tau$, it is still possible that one could get a multiple of $\tau$. We did get an outlier in one simulation, as one can see in the fourth line of the lower table.
Table 2 shows that given
event times in
S, the MEA works very well (even with
of the zero crossings removed). However, the algorithm breaks down as the number of elements in
S is reduced below 10. This is consistent with the mathematical underpinnings of the MEA, as given by Theorem 1 and Proposition 4. We note that the result for four data elements is a bit low, and should be closer to
, whereas those for six and eight are closer to theoretical values of
and
, respectively.
Table 2.
Modeling the MEA with noise-free data.
| n | M | % miss | iter | | | | | ≥ |
|---|---|---|---|---|---|---|---|---|
| 10 | | | | | 0 | 0 | 0 | 0 |
| 10 | | | | 100 | 0 | 0 | 0 | 0 |
| 10 | | | | 100 | 0 | 0 | 0 | 0 |
| 10 | | | | 100 | 0 | 0 | 0 | 0 |
| 10 | | | | 100 | 0 | 0 | 0 | 0 |
| 4 | | | | | 6 | 4 | 2 | 0 |
| 6 | | | | 97 | 3 | 0 | 0 | 0 |
| 8 | | | | 99 | 1 | 0 | 0 | 0 |
| 10 | | | | 99 | 1 | 0 | 0 | 0 |
| 12 | | | | 100 | 0 | 0 | 0 | 0 |
| 14 | | | | 100 | 0 | 0 | 0 | 0 |
- (2.)
Uniformly distributed noise.
We assume that the
’s have a uniform distribution, given by
. The top half of
Table 3 illustrates the effect of increasing
M, resulting in more missing observations with a fixed noise parameter. Larger
M generally requires more data to maintain the same accuracy in
and results in larger
. The bottom half shows the effect of increasing noise with
n and
M fixed. The noise floor was given by
. This prevents noisy elements from crossing over each other. Again, if noisy elements cross over, this creates false periods.
Table 3.
Modeling the MEA with noisy data.
| n | M | Δ | % miss | iter | | |
|---|---|---|---|---|---|---|
| 10 | | | | | | |
| 10 | | | | | | |
| 50 | | | | | | |
| 10 | | | | | | |
| 10 | | | | | | |
| 10 | | | | | | |
Table 3 shows that the estimates of the period skew toward underestimating $\tau$. We again note that this leads to open questions involving order statistics. In the original data, the assumption is that the noise components $\eta_j$ are independent identically distributed (iid). The first differencing removes the iid property, and the subsequent differencing and sorting then make this noise negatively skewed. Statistical analysis of the noise after several steps of the MEA is an open question. Standard results in order statistics assume iid (e.g., see Sarhan and Greenberg [24] and Reiss [25]).