Article

The Analysis and Deinterleaving of Periodic Point Processes

Department of Mathematics and Statistics, American University, 4400 Massachusetts Ave., NW, Washington, DC 20016-8050, USA
Mathematics 2026, 14(10), 1660; https://doi.org/10.3390/math14101660
Submission received: 8 February 2026 / Revised: 20 April 2026 / Accepted: 20 April 2026 / Published: 13 May 2026
(This article belongs to the Special Issue New Perspectives in Harmonic Analysis)

Abstract

This paper addresses the problem of determining the underlying structure of a given point process. The point process data is a finite set of event times, given by a set of real numbers. We wish to determine if the process has been generated by one or more periodic processes and, if so, extract the period or periods from the data. If there are several periods, we also wish to deinterleave the data into separate periodic processes, each generated by a single period. We approach the problem by developing two algorithms. These algorithms are designed to work on all data sets, but, in particular, on extremely sparse data sets where other procedures do not work. The first algorithm works on sets of event times with only one underlying period, quickly producing an estimate of that period. The mathematical justification of the procedure involves number theory, including an interpretation of the Riemann zeta function as an asymptotic probability distribution. The second algorithm analyzes event times from multiple periodic processes. It relies on the first algorithm as an “engine” to create larger data sets, from which it produces estimates of underlying periods. This second procedure also relies on a mathematical justification, a result from both number theory and harmonic analysis—the equidistribution theorem of Weyl. We then deinterleave the data, breaking it down into components, each generated by a single period.

1. Introduction

Point processes are an important component of data analysis, appearing in everything from analysis of astronomy data, atmospheric chemistry, radar and sonar, to the analysis of neural networks. This paper deals with analyzing a given point process, and in particular, determining if the process has been generated by one or more periodic processes. The point process data is a finite set of event times, given as a set of real numbers. The goal is to determine if the data has been generated by periodic processes, and if so, extract the period or periods from the data. If there are several periods, we then deinterleave the data into separate periodic processes, each generated by a single period.
Our approach is to create two algorithms to analyze the data. The algorithms rely on the structure of randomness (see Section 2 and Section 3), which tells us that random data can settle into a structure based on the set from which the data is extracted. This approach to solving the data analysis problems allows us to bring in some deep mathematical ideas, which in turn explain why the algorithms work so efficiently. The algorithms are designed to work on all event-time data, including data where other methods break down because of sparsity (see Section 2.4 for comparisons with least squares and periodograms). The first algorithm works on event times with only one underlying period, efficiently producing an estimate of that period. The second works on sets of event times with several underlying periods, producing estimates of those periods, which in turn leads to deinterleaving the data into components, each generated by a single underlying period. We present the algorithms and provide mathematical justifications as to why they work.
The first procedure is the modified Euclidean algorithm or MEA. (The steps in the algorithms and their mathematical justifications led naturally to their names.) The algorithm is extremely efficient, stable given reasonable noise, and converges very quickly [1,2,3,4]. Numerous applications of the MEA are discussed in [1,2,3,4,5].
The second procedure, the equidistributed MEA or EQUIMEA, works on event-time data, and relies on the MEA as an “engine” to create larger data sets, from which it produces estimates of the underlying periods. We then deinterleave the data, breaking it down into components, each generated by a single period. Several applications of the EQUIMEA are discussed in [6].
The MEA and EQUIMEA work on all reasonable event-time data. The assumption is that data elements are the results of periodic processes, e.g., radar or sonar data, with missing observations and additive noise. The noise is jitter noise, a variation around the exact time of a given event. The missing data elements can be visualized by seeing the event times as zero crossings of sinusoids, where a random process has removed zero crossings. With no missing observations, standard methods, e.g., least squares or periodograms, can extract the periods even in a relatively noisy environment. If, however, many of these zero crossings are missing, these methods will break down (see Section 2.4).
The EQUIMEA and deinterleaving problem is quite important, especially in radar. Histogram methods for deinterleaving radar can be found in [7,8]. The paper [9] features histograms and neural networks, while histograms and AI are featured in the recent article [10]. Clustering methods can be found in [11]. The Lomb–Scargle periodogram has been used in astronomy [12], atmospheric chemistry [13], and biology [14].
The MEA has been compared several times with standard methods; these comparisons can be found in [1,2,3,4].
The mathematical justification of the MEA involves number theory, including an interpretation of the Riemann zeta function as an asymptotic probability distribution. This interpretation allows us to prove Theorem 1, which shows that in an arbitrarily large lattice of positive integers in n-dimensional Euclidean space, if we randomly choose an element of the lattice, the probability that the n-tuple is relatively prime quickly converges to 1 as n increases. Given noise-free data, the algorithm very likely converges to the exact value of the period with as few as ten data samples.
Remark 1. 
The mathematical reasons that only ten data elements are needed come from Theorem 1, estimates given in Proposition 4, and calculations of the zeta function given in Table 1. Computations from computer simulations in Table 2 demonstrate the precision of this result.
For data with additive noise, simulation results show good estimation of the period, with the estimate converging as the noise decreases and the number of data points increases (see Table 3 for computer simulations of these results). Variations on the MEA, which deal with very noisy environments, can be found in [2,3,4].
The EQUIMEA procedure works on periodic processes created by several sources, each with different underlying periods. We assume that the periods are independent. Therefore, with probability one, they are not rational multiples of each other. We note, however, that even if two or more periods are rational multiples, the algorithm can still produce results by utilizing the phase data. The general algorithm extracts the set of fundamental periods by using Weyl’s equidistribution theorem to help separate sources. Given two periods, differences of their respective period multiples will reinforce a given period, but intermixed period differences will not be equal to a rational multiple of either number with probability one. The relative primeness of data generated by one period will backfill the missing elements for that period, whereas the data from two different periods will become Weyl flat (see Definition 4). Theoretically, if we allowed the process to go on indefinitely, it would become ergodic, with the set of differences from different periods becoming a set of full measure, while the set of differences from the same period remains a set of measure zero (see Theorem 3). Using these periods, we then deinterleave the processes. We demonstrate the EQUIMEA on three different data sets, all with ≈90% missing observations: a sparse single-period set (Figures 1 and 2), a sparse set with two periods analyzed then deinterleaved (Figures 3–5), and a sparse set with three periods analyzed then deinterleaved (Figures 6–8).
The paper is organized as follows. The MEA is developed in Section 2. This includes the mathematical justifications for the procedure, computer simulations, and a discussion of how to estimate other components of the data. Section 2.2 includes the proof of Theorem 1, the main justification of the MEA. Section 3 gives a description and analysis of the EQUIMEA, including the deinterleaving procedure. Weyl’s equidistribution theorem plays a key role in this work (Theorem 3), but we point out again that the “engine” for the EQUIMEA is the MEA. Three simulations of the EQUIMEA are given in Section 4, including the deinterleaving of data sets with multiple periods.
We finish with some observations, discussing how the structure of these algorithms naturally leads to questions about mathematical modeling, computational science, and data science.

2. Data Sets with a Single Period

We first assume that the data has a single underlying period. We model the event times as a finite set of positive real numbers,
$$S = \{s_j\}_{j=1}^{n}, \quad \text{with } s_j = k_j \tau + \varphi + \eta_j, \tag{1}$$
where the period $\tau$ is a fixed positive real number, the $k_j$'s are non-repeating positive integers, the phase $\varphi$ is a real random variable uniformly distributed over the interval $[0, \tau)$, and the additive noise elements $\{\eta_j\}$ are zero-mean independent and identically distributed (iid) error terms. We refer to $\tau$ as the generator of the process. We also assume that the $\eta_j$'s have a symmetric probability density function (pdf) and that $|\eta_j| < \frac{\tau}{2}$ for all $j$. This bound is important because it does not allow noisy data elements to cross over each other. Such a crossover could, in effect, create false periods.
By assuming that the $k_j$'s are increasing, we can interpret the $s_j$'s as an increasing sequence of event times with time gaps determined by the $k_j$'s. Given $S$ and a value of $\tau$ that is an underlying period of $S$, the numbers $\frac{\tau}{2}, \frac{\tau}{3}, \ldots$ are also periods. We refer to these values as the harmonics of $\tau$.
We can think of the event times as the set of zero-crossings of a sinusoid. If $f(t) = \sin\left(\frac{\pi}{\tau}(t - \varphi)\right)$, then $S$ is a finite subset of the zero-crossings of $f$ with missing observations and small perturbations in the zero-crossings (jitter noise). Given a set of event times with an underlying period, the signal processing community usually models event times with a set of Dirac deltas called a pulse train. Work on the analysis of event times includes other methods, including signal sampling, periodogram-based methods for spectral analysis of the pulse train, filtering, and binning and searching. The papers [2,3,4,15] contain numerous references to the signal processing literature.
The MEA extracts an estimate of $\tau$. The procedure is computationally efficient and requires little input data. Given reasonable data, it quickly converges with very high probability to either the exact value of $\tau$ (when the data is noise-free) or an estimate $\hat{\tau}$ (when the data has additive noise). When there is no noise, the MEA produces $\tau$ with very high probability even if we are given very few (e.g., $n = 10$) data elements, independent of the number of missing measurements. In the presence of noise (non-zero $\eta_j$'s in (1)) and false data (or outliers), there is a tradeoff between the number of data samples, the amount of noise, and the percentage of outliers. The algorithm performs well given low noise for $n \geq 10$, but will degrade as noise is increased. Given more data, various statistical techniques ([2,3,4,15]) can be used to reduce noise effects and speed up convergence. The EQUIMEA, although more complicated, can also be used to extract $\tau$.

2.1. The Modified Euclidean Algorithm (MEA)

Assume that we have a set of event times S (as in (1)). The MEA is a procedure for finding τ . The mathematical justifications for the procedure use number theory. We cite Hardy and Wright [16], Ireland and Rosen [17], Knuth [18,19,20], Leveque [21], Rosen [22], and Schroeder [23] as number theory references.
First, we recall the Euclidean algorithm. (Knuth calls the Euclidean algorithm “the grandfather of all algorithms” because it is the oldest algorithm still in use today [19], p. 318.) Given two positive integers $a$ and $b$, $b < a$, we say that $b$ divides $a$ if there exists a positive integer $k$ such that $a = k \cdot b$. This is denoted by $b \mid a$. An integer $p > 1$ that has no divisors other than 1 and itself is a prime. An integer $m > 1$ that is not prime is called a composite. Prime numbers are the fundamental building blocks of the integers. The Fundamental Theorem of Arithmetic (see Rosen [22], p. 97) states that every positive integer $\geq 2$ is a product of powers of primes, unique up to the ordering of the primes.
The Euclidean algorithm is based on the property that $\mathbb{Z}$ is a Euclidean domain, i.e., given two positive integers $a$ and $b$, $a > b$, there exist a positive integer $q$ and a nonnegative integer $r$ such that $a = q \cdot b + r$, $0 \leq r < b$. If $r = 0$, then $b$ divides $a$. This property can be used to develop the Euclidean algorithm, which can be represented as follows. Let “$\longleftarrow$” denote replacement, e.g., “$a \longleftarrow b$” means that the value of the variable $a$ is to be replaced by the current value of the variable $b$. Given $a, b$, $a > b$, proceed as follows:
(1.) $a = b \cdot q + r$, $0 \leq r < b$.
(2.) The algorithm terminates if $r = 0$. Set $\gcd(a, b) = b$.
(3.) Else, set $a \longleftarrow b$ and $b \longleftarrow r$. Go to (1.).
The procedure yields the greatest common divisor of $a$ and $b$. The greatest common divisor is the product of all the powers of prime factors $p$ that divide both $a$ and $b$. Note that $\gcd(k_1, \ldots, k_n)$, the greatest common divisor of the set $\{k_j\}$, is not the pairwise gcd of the set $\{k_j\}$. If $\gcd(k_1, \ldots, k_n) = 1$, the set $\{k_j\}$ is called mutually relatively prime. If $\gcd(k_i, k_j) = 1$ for all $i \neq j$, the set $\{k_j\}$ is called pairwise relatively prime. If a set is pairwise relatively prime, it is mutually relatively prime. However, the converse is not true. For example, consider the set $\{15, 21, 35\}$. No pair in this set is relatively prime, since $\gcd(15, 21) = 3$, $\gcd(15, 35) = 5$, and $\gcd(21, 35) = 7$, but the entire set is mutually relatively prime.
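As a minimal illustration of steps (1.) to (3.) and of the distinction between pairwise and mutual relative primeness, the following Python sketch may help. The function names `euclid_gcd` and `gcd_of_set` are hypothetical, and folding the two-argument gcd over a list is one convenient way (an assumption of this sketch) to obtain the gcd of a whole set.

```python
from functools import reduce

def euclid_gcd(a: int, b: int) -> int:
    """Greatest common divisor by repeated division with remainder."""
    while b != 0:
        a, b = b, a % b   # a <- b and b <- r, where r = a mod b
    return a

def gcd_of_set(values):
    """gcd of a whole set, obtained by folding the two-argument gcd."""
    return reduce(euclid_gcd, values)

ks = [15, 21, 35]
# gcds of the pairs (15, 21), (15, 35), (21, 35)
pairwise = [euclid_gcd(x, y) for i, x in enumerate(ks) for y in ks[i + 1:]]
print(pairwise)         # no pair is relatively prime
print(gcd_of_set(ks))   # yet the set as a whole has gcd 1
```

The pairwise values are all greater than 1 while the gcd of the full set is 1, which is the situation described for $\{15, 21, 35\}$.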
To work with our data sets, we have to make modifications to the standard Euclidean algorithm. We have data sets with more than two elements. The gcd of a set of more than two integers can be computed using Proposition 1(i.). We also have data sets that possibly have non-integer periods. Proposition 1(ii.) extends the gcd to multiples of a fixed real number $\tau > 0$.
Proposition 1. 
(i.) $\gcd(k_1, \ldots, k_n) = \gcd\bigl(k_1, \ldots, k_{n-2}, \gcd(k_{n-1}, k_n)\bigr)$.
(ii.) $\gcd(k_1 \tau, \ldots, k_n \tau) = \tau \gcd(k_1, \ldots, k_n)$.
Proof. 
See Leveque [21], p. 16.    □
Let $\gcd(k_1, \ldots, k_n) = \kappa$, and so $\gcd(k_1 \tau, \ldots, k_n \tau) = \kappa \tau$. Then, $\kappa \tau$ plays the role of the fundamental unit of the set $S = \{k_1 \tau, \ldots, k_n \tau\}$. We say that the elements in the set $S$ are commensurate to $\kappa \tau$, i.e., every element in the set $S$ can be expressed as an integer multiple of $\kappa \tau$. For example, the elements of the set $\{\frac{3}{7}, 1\frac{4}{7}, 4\}$ are commensurate to $\frac{1}{7}$, since they equal $3 \cdot \frac{1}{7}$, $11 \cdot \frac{1}{7}$, and $28 \cdot \frac{1}{7}$. Note, $\gcd(3, 11, 28) = 1$. If no such fundamental unit exists, we call the elements of the set incommensurate.
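The commensurability example can be checked with exact rational arithmetic. The sketch below reads the example set as $\{3/7, 11/7, 4\}$ (an assumption consistent with the stated $\gcd(3, 11, 28) = 1$); the helper name `gcd_commensurate` is hypothetical.

```python
from fractions import Fraction
from functools import reduce

def gcd_commensurate(a, b):
    """Euclidean algorithm on exact rationals: the largest unit dividing both."""
    while b != 0:
        a, b = b, a % b
    return a

values = [Fraction(3, 7), Fraction(11, 7), Fraction(4)]
unit = reduce(gcd_commensurate, values)     # fundamental unit of the set
multiples = [v / unit for v in values]      # integer multiples of that unit
print(unit, multiples)
```

Here `unit` plays the role of $\kappa \tau$, and the entries of `multiples` are the integers whose gcd is 1.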
Remark 2. 
All finite sets of rational numbers are commensurate. Add a single irrational number to a finite set of rational numbers and this new set is incommensurate.
The first step of the Euclidean algorithm involves division. Noisy data makes this step unstable in the following sense. The additive noise components could be non-zero, but arbitrarily close to zero. Dividing by these numbers could result in arbitrarily large numbers. Our third modification of the Euclidean algorithm addresses this. We develop a form of the procedure based on sorting and subtraction rather than division. Although this requires additional iterations, it establishes the groundwork to modify the algorithm so that it is stable with respect to noise.
Remark 3. 
It is important to note that Proposition 2 plays a fundamental role in both the MEA and EQUIMEA.
Proposition 2. 
$$\gcd(k_1, \ldots, k_n) = \gcd\bigl((k_1 - k_2), (k_2 - k_3), \ldots, (k_{n-1} - k_n), k_n\bigr).$$
Proof. 
First assume that $\gamma$ is a positive integer that divides each $k_j$, i.e., $\gamma \mid k_j$ for $j = 1, \ldots, n$. Then, $\gamma \mid (k_j - k_{j+1})$ for $j = 1, \ldots, n-1$, and $\gamma \mid k_n$. Therefore, $\gamma$ is a divisor of $\bigl((k_1 - k_2), (k_2 - k_3), \ldots, (k_{n-1} - k_n), k_n\bigr)$. Conversely, assume $\delta$ is a positive integer such that $\delta \mid (k_j - k_{j+1})$ for $j = 1, \ldots, n-1$, and $\delta \mid k_n$. Therefore, there exist positive integers $c$ and $d$ such that $c\delta = k_n$ and $d\delta = (k_{n-1} - k_n)$. Thus, $d\delta + k_n = (d + c)\delta = k_{n-1}$, and so $\delta \mid k_{n-1}$. Continuing in this fashion, we get that $\delta \mid k_j$ for $j = 1, \ldots, n$. Therefore, since the sets $\{k_j\}$ and $\{(k_j - k_{j+1})\} \cup \{k_n\}$ have the same divisors, $\gcd(k_1, \ldots, k_n) = \gcd\bigl((k_1 - k_2), (k_2 - k_3), \ldots, (k_{n-1} - k_n), k_n\bigr)$.    □
Remark 4. 
After the first sort and subtraction, the gcd of the integers is invariant throughout the remainder of the process.
In order to eliminate the phase information φ , we subtract it. This is justified by the following.
Proposition 3. 
If
$$\beta = \gcd\bigl((k_1 - k_2), (k_2 - k_3), \ldots, (k_{n-1} - k_n)\bigr),$$
then $\beta \mid (k_l - k_m)$ for all $1 \leq l < m \leq n$.
Proof. 
Let $\beta = \gcd\bigl((k_1 - k_2), (k_2 - k_3), \ldots, (k_{n-1} - k_n)\bigr)$. Then, since $\beta \mid (k_1 - k_2)$ and $\beta \mid (k_2 - k_3)$, there exist positive integers $c$ and $d$ such that $c\beta = (k_1 - k_2)$ and $d\beta = (k_2 - k_3)$. Adding gives $(c + d)\beta = (k_1 - k_3)$. Continuing in this fashion gives the result for $l = 1$. Shifting to $l > 1$ gives the general result.    □
We call the procedure the modified Euclidean algorithm (MEA). We first sort the $s_j$'s in descending order to allow for a more straightforward implementation of our algorithm, i.e., $s_1 \geq s_2 \geq \cdots \geq s_n$. (Thus $k_n$ is the minimum of $\{k_j\}$.) We form a new set by subtracting adjacent pairs of these numbers, given by $s_j - s_{j+1}$. After this first operation, the phase information has been subtracted out and the resulting set has the simpler form
$$S' = \{s'_j\}_{j=1}^{n-1}, \quad \text{with } s'_j = K_j \tau + \eta'_j,$$
where $K_j = k_j - k_{j+1}$ and $\eta'_j = \eta_j - \eta_{j+1}$. In subsequent iterations of the algorithm, the data maintains this same general form. Eliminating the phase information simplifies the process. The main theorem of the MEA, Theorem 1, applies to $\{K_j\}$ after $\varphi$ has been eliminated.
The Modified Euclidean Algorithm (MEA)
Initialize: Sort the elements of $S$ in descending order. Set iter $= 0$ and import $\eta_0$.
(1.) [Adjoin 0 after first iteration.] If iter $> 0$, then $S \longleftarrow S \cup \{0\}$.
(2.) [Form the new set with elements $(s_j - s_{j+1})$.] Set $s_j \longleftarrow (s_j - s_{j+1})$.
(3.) [Sort.] Sort the elements in descending order.
(4.) [Eliminate noise.] If $0 \leq s_j \leq \eta_0$, then $S \longleftarrow S \setminus \{s_j\}$.
(5.) [Terminate or loop.] The algorithm terminates if $S$ has only one element $s_1$. Declare $\hat{\tau} = s_1$. If not, then set iter $\longleftarrow$ (iter $+ 1$). Go to (1.).
In summary, we sort, subtract adjacent pairs, and eliminate noise; after the first iteration, we also adjoin 0, so that the differencing step carries the previous non-zero minimum into the new set. The algorithm is continued by iterating this process of sorting, subtracting, and eliminating the elements in $[0, \eta_0]$, retaining the previous non-zero minimum at each new iteration, and terminating when only a single element remains. Note that Proposition 2 guarantees that $\gcd(K_1, \ldots, K_{n-1})$ remains unchanged. The lone remaining element is equal to $\gcd(K_1, \ldots, K_{n-1}) \cdot \tau + \text{error term}$, where the error term is the result of the noise terms $\eta_j$ after several iterations of the MEA and the noise floor $\eta_0$.
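Steps (1.) to (5.) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `mea`, the `max_iter` guard, and the raised exceptions are illustrative additions. The example uses exact `Fraction` arithmetic so that a noise floor of $\eta_0 = 0$ is admissible for noise-free data; with floating-point or noisy data a positive $\eta_0$ would be needed.

```python
from fractions import Fraction

def mea(event_times, eta0=0, max_iter=10_000):
    """Sketch of the MEA: returns the single element left after the loop."""
    s = sorted(event_times, reverse=True)        # Initialize: descending sort
    for iteration in range(max_iter):
        if iteration > 0:
            s.append(0)                          # (1.) adjoin 0; list stays sorted
        # (2.) adjacent differences; with the adjoined 0 the last
        #      difference reproduces the previous minimum
        s = [s[j] - s[j + 1] for j in range(len(s) - 1)]
        s.sort(reverse=True)                     # (3.) sort in descending order
        s = [x for x in s if x > eta0]           # (4.) drop elements in [0, eta0]
        if not s:
            raise ValueError("all differences fell inside the noise floor")
        if len(s) == 1:                          # (5.) terminate
            return s[0]
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical noise-free data: period 7/10, phase 3/10, most multiples missing.
tau, phi = Fraction(7, 10), Fraction(3, 10)
ks = [3, 10, 17, 24, 52, 61, 88, 95, 131, 140]
times = [k * tau + phi for k in ks]
print(mea(times))   # prints 7/10
```

In this example the adjacent differences of the sorted $k_j$'s are 9, 36, 7, 27, 9, 28, 7, 7, 7, whose gcd is 1, so the lone surviving element is $\tau$ itself.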

2.1.1. The Noise Floor η 0

The additive noise elements create a need for a noise floor $\eta_0$. The $\{\eta_j\}$ are zero-mean independent identically distributed (iid) error terms. We assume that the $\eta_j$'s have a symmetric probability density function (pdf) and that $|\eta_j| < \frac{\tau}{2}$ for all $j$. This bound does not allow noisy data elements to cross over each other, which could otherwise create false periods.
To deal with the noise, we establish a noise threshold $\eta_0$. After the first sort and subtract (which eliminates the phase $\varphi$), we “zero out” the noise by eliminating the elements in $[0, \eta_0]$. Setting this noise floor parameter $\eta_0$ is key. If we know an estimate of the noise in the data, we set $\eta_0$ as twice the maximum range of the noise.
In general, we do not have an estimate of $\eta_0$. For unknown noise models, this estimate can be tricky. First, after the first iteration, the differencing operation has removed the independence of the error terms. Second, the ordering operation makes the nature of the dependence in subsequent iterations difficult to determine. Analysis of order statistics very often rests on an iid assumption, e.g., see Sarhan and Greenberg [24] and Reiss [25]. Without the iid assumption, this analysis leads to many open questions (see Reiss [25]). In general, beyond the first iteration, the pdf of the subsequent error terms becomes asymmetric, even when starting with iid $\eta_j$'s with symmetric pdf $f_\eta(\eta)$. This occurs due to the reordering before differencing at each iteration, and because after the first iteration, the errors are no longer iid.
We developed the following method of estimating $\eta_0$ in [4]. Suppose the pdf of the $\eta_j$'s is given by $f_\eta(\eta)$, and consider the set of differences obtained in the first iteration, given by
$$y_j = s_j - s_{j+1} = (k_j - k_{j+1})\tau + (\eta_j - \eta_{j+1}).$$
Invoking the zero-mean iid assumption on the $\eta_j$'s, the pdf of $(\eta_j - \eta_{j+1})$ is given by the convolution $f_\eta(\eta) * f_\eta(-\eta)$, which equals $f_\eta(\eta) * f_\eta(\eta)$ because $f_\eta$ is symmetric. So, for example, if $f_\eta(\eta) \sim U[-\frac{\Delta}{2}, \frac{\Delta}{2}]$ ($\eta$ is uniformly distributed with parameter $\Delta$), then $f_{y_j}(y) = \operatorname{tri}[y - (k_j - k_{j+1})\tau]$, the triangle function centered at $(k_j - k_{j+1})\tau$. A straightforward method for clustering the data is to employ a gradient operator to determine when a step has occurred. After the first iteration, the gradient is estimated, with large gradient values indicating a step or “edge” in the data. We have employed a simple estimator by convolving with an impulse response given by $[-1, 0, 1]$. The gradient operator has the effect of binning the data. Each bin gives an estimate of $\eta_0$, which is given by the largest data point $s_k$ in the bin minus the smallest $s_j$ in the bin. Let $\eta_0$ equal twice the maximum estimate over all of the bins.
This binning process also yielded a very quick method of estimating $\tau$. After subtracting out the phase $\varphi$ and binning via the gradient operator, simply average across the elements in each bin, and then apply the MEA. This led to a multi-step estimation procedure which achieved the Cramér–Rao bound (CRB). The procedure
(i).
Estimates τ ,
(ii).
Estimates the k j ’s, and then
(iii).
Refines the estimate of τ using the estimated k j ’s in a least-squares solution.
In ([4], p. 2291), we refer to this procedure as the MEA-LS. An extensive comparative analysis was presented in ([4], pp. 2296–2298, Figures 1–4). Each of these figures presents a comparative performance analysis of the MEA, the MEA-LS, the 1024-point periodogram, and the 4096-point periodogram against the Cramér–Rao bound as the signal-to-noise ratio (SNR) went from 0 to 50. Figure 5 from ([4], p. 2299) gives a similar analysis for increasing jitter noise, as jitter increased from 5% to 35%.
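To make the three-step idea concrete, the following Python sketch shows one plausible way a least-squares refinement could be organized. It is not the procedure of [4]: the function name `refine_period_ls`, the use of differences from the earliest event to remove the phase, the rounding rule for the integer multiples, and the no-intercept fit are all assumptions made here for illustration.

```python
def refine_period_ls(event_times, tau_hat):
    """Hypothetical least-squares refinement of an initial period estimate."""
    s_min = min(event_times)
    # Differences from the earliest event remove the common phase.
    d = [s - s_min for s in event_times]
    # Estimate the integer multiples from the initial estimate tau_hat.
    k_hat = [round(x / tau_hat) for x in d]
    denom = sum(k * k for k in k_hat)
    if denom == 0:
        raise ValueError("need events separated by at least one period")
    # Least-squares slope of d against k_hat (fit through the origin).
    return sum(k * x for k, x in zip(k_hat, d)) / denom

# Hypothetical data: period 0.7, phase 0.3, and a slightly biased first estimate.
times = [0.3 + 0.7 * k for k in [1, 2, 4, 7, 9]]
print(refine_period_ls(times, tau_hat=0.71))
```

The rounding step only recovers the correct multiples when the initial estimate is accurate enough relative to the largest gap in the data, which is one reason a good first estimate of $\tau$ matters in a scheme of this kind.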

2.1.2. Connection with Analytic Number Theory

Theorem 1 and its corollaries give us that $\gcd(K_1, \ldots, K_{n-1}) \longrightarrow 1$ with probability 1 as $n \longrightarrow \infty$. Convergence is exponentially quick. Therefore, the modified Euclidean algorithm yields either the exact value of $\tau$ (when the data is noise-free) or an estimate $\hat{\tau}$ (when the data has additive noise). In the noise-free case, the theory tells us that the algorithm very likely yields $\tau$ given as few as 10 data samples.
This is a manifestation of the structure of randomness over $\mathbb{Z}$. The key result is Theorem 1, which is proven by showing that in an arbitrarily large lattice of positive integers in $n$-dimensional Euclidean space, if we choose an element of the lattice, the probability that the $n$-tuple is relatively prime quickly converges to certainty as $n$ increases. We cannot “randomly choose” positive integers $\{k_1, \ldots, k_n\}$, but we can choose the $n$-tuple as an element of a lattice in $\mathbb{R}^n$.
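(1). Given a randomly chosen $n$-tuple $(k_1, \ldots, k_n)$ ($n \geq 2$) of positive integers in a finite symmetric lattice in $\mathbb{R}^n$,
$$P\{\gcd(k_1, \ldots, k_n) = 1\} \longrightarrow 1 \ \text{ quickly as } n \longrightarrow \infty.$$
(2). Moreover, we can compute this probability:
$$P\{\gcd(k_1, \ldots, k_n) = 1\} \longrightarrow [\zeta(n)]^{-1} \ \text{ as the size of the lattice} \longrightarrow \infty,$$
where $\zeta(z)$ is Riemann’s zeta function.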
The connection with the zeta function may at first seem surprising. However, if one looks at Euler’s product Formula (4), it is easier to see how the zeta function can play a role in understanding relative primeness. We discuss the zeta function in the following section. (See [1,2] for additional discussion.)

2.2. Pi, the Primes, and Probability

Riemann’s zeta function is defined in the complex numbers $\mathbb{C}$. Given a complex number $z = x + iy$ ($i^2 = -1$, $x, y \in \mathbb{R}$), we say that $x$ is the real part of $z$ (denoted by $x = \Re(z)$) and $y$ is the imaginary part of $z$ (denoted by $y = \Im(z)$). Riemann’s zeta function is defined on the complex half plane $\{z \in \mathbb{C} : \Re(z) > 1\}$ by
$$\zeta(z) = \sum_{n=1}^{\infty} n^{-z}.$$
Let $P = \{p_1, p_2, p_3, \ldots\} = \{2, 3, 5, \ldots\}$ be the set of all prime numbers. Euler connected the zeta function to the primes in 1736 by proving that
$$\zeta(z) = \prod_{p_j \in P} \frac{1}{1 - (p_j)^{-z}}$$
(see Conway [26], pp. 187–194). We will show that given $n$ ($n \geq 2$) “randomly chosen positive integers” $\{k_1, \ldots, k_n\}$, the probability that this $n$-tuple is relatively prime is expressed in terms of $1/\zeta(n)$. This result is key to the MEA.
In the following P { · } denotes probability.
Given a randomly chosen $n$-tuple $(k_1, \ldots, k_n)$ ($n \geq 2$) of positive integers in a finite symmetric lattice in $\mathbb{R}^n$,
$$P\{\gcd(k_1, \ldots, k_n) = 1\} \longrightarrow [\zeta(n)]^{-1}$$
as the size of the lattice $\longrightarrow \infty$.
Heuristically, we could argue as follows. Given randomly distributed positive integers, by the law of large numbers, about $1/2$ of them are even, $1/3$ of them are multiples of three, and $1/p$ of them are multiples of a given prime $p$. Thus, given $n$ independently chosen positive integers,
$$P\{p \mid k_1,\ p \mid k_2,\ \ldots,\ \text{and } p \mid k_n\} \overset{\text{(Independence)}}{=} P\{p \mid k_1\} \cdot P\{p \mid k_2\} \cdots P\{p \mid k_n\} = 1/(p) \cdot 1/(p) \cdots 1/(p) = 1/(p)^n.$$
Therefore, the probability that $p$ is not a common divisor of the $n$-tuple is
$$1 - P\{p \mid k_1,\ p \mid k_2,\ \ldots,\ \text{and } p \mid k_n\} = 1 - 1/(p)^n.$$
Calculating this for all of the primes gives us that
$$P\{\gcd(k_1, \ldots, k_n) = 1\} = \prod_{j=1}^{\infty} \left(1 - 1/(p_j)^n\right),$$
where $p_j$ is the $j$th prime. In this last equation, we have used the fact that, by the Fundamental Theorem of Arithmetic, every prime factor of any integer $k_i > 1$ appears among the prime numbers $p_j$, raised to some power. By Euler’s formula,
$$\zeta(z) = \prod_{j=1}^{\infty} \frac{1}{1 - (p_j)^{-z}}, \quad \Re(z) > 1.$$
Thus,
$$P\{\gcd(k_1, \ldots, k_n) = 1\} = 1/(\zeta(n)).$$
Remark 5. 
This heuristic argument breaks down on the first line. Any uniform distribution on the positive integers would have to be identically zero. The merit in the argument lies in the fact that it gives an indication of how the zeta function plays a role in the problem.
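Although the heuristic is not a proof, the product it arrives at is easy to examine numerically. The sketch below truncates the Euler product at a prime bound and prints it for a few values of $n$; the function names and the truncation limit of 10,000 are arbitrary choices made for illustration.

```python
def primes_up_to(limit: int):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Mark the multiples of p starting at p*p as composite.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]

def euler_product(n: int, limit: int = 10_000) -> float:
    """Truncated product over primes p <= limit of (1 - p**(-n))."""
    prod = 1.0
    for p in primes_up_to(limit):
        prod *= 1.0 - p ** -n
    return prod

for n in (2, 3, 4, 10):
    print(n, euler_product(n))
```

For $n = 2$ and $n = 4$ the truncated products can be compared with the closed forms $6/\pi^2$ and $90/\pi^4$ for $1/\zeta(2)$ and $1/\zeta(4)$; for larger $n$ the product is already very close to 1.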
The formal proof is developed as follows. Let $\operatorname{card}\{\cdot\}$ denote cardinality of the set $\{\cdot\}$, and for $n \geq 2$, let $\{1, \ldots, \ell\}^n$ denote the sublattice of positive integers in $\mathbb{R}^n$ with coordinates $c$ such that $1 \leq c \leq \ell$. Therefore,
$$N_n(\ell) = \operatorname{card}\{(k_1, \ldots, k_n) \in \{1, \ldots, \ell\}^n : \gcd(k_1, \ldots, k_n) = 1\}$$
is the number of relatively prime elements in $\{1, \ldots, \ell\}^n$. Let $P_n(\ell)$ be the probability that $n$ positive integers chosen at random from $\{1, \ldots, \ell\}$ are relatively prime. Thus
$$P_n(\ell) = \frac{N_n(\ell)}{\ell^n}.$$
We have that $\lim_{\ell \to \infty} \frac{N_n(\ell)}{\ell^n}$ gives the asymptotic meaning of “randomly chosen” positive integers.
Theorem 1. 
Let
$$N_n(\ell) = \operatorname{card}\{(k_1, \ldots, k_n) \in \{1, \ldots, \ell\}^n : \gcd(k_1, \ldots, k_n) = 1\}.$$
For $n \geq 2$, we have that
$$\lim_{\ell \to \infty} \frac{N_n(\ell)}{\ell^n} = [\zeta(n)]^{-1}.$$
We begin with the following lemma, which gives us a counting formula for $N_n(\ell)$ expressed in terms of primes and products of primes. Let $\lfloor x \rfloor$ denote the floor function of $x$, namely
$$\lfloor x \rfloor = \max_{k \leq x} \{k : k \in \mathbb{Z}\}.$$
Lemma 1. 
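Let $N_n(\ell) = \operatorname{card}\{(k_1, \ldots, k_n) \in \{1, \ldots, \ell\}^n : \gcd(k_1, \ldots, k_n) = 1\}$ be the number of relatively prime elements in $\{1, \ldots, \ell\}^n$. Then
$$N_n(\ell) = \ell^n - \sum_{p_i} \left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \sum_{p_i < p_j} \left\lfloor \frac{\ell}{p_i \cdot p_j} \right\rfloor^n - \sum_{p_i < p_j < p_k} \left\lfloor \frac{\ell}{p_i \cdot p_j \cdot p_k} \right\rfloor^n + \cdots.$$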
Proof of Lemma 1. 
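Choose a prime number $p_i$. The number of integers in $\{1, \ldots, \ell\}$ that are divisible by $p_i$ is $\lfloor \ell / p_i \rfloor$. (Note that it is possible to have $p_i > \ell$, in which case $\lfloor \ell / p_i \rfloor = 0$.) Therefore, the number of $n$-tuples $(k_1, \ldots, k_n)$ contained in the lattice $\{1, \ldots, \ell\}^n$ such that $p_i$ divides every integer in the $n$-tuple is
$$\left\lfloor \frac{\ell}{p_i} \right\rfloor^n.$$
Next, if $p_i \cdot p_j$ divides an integer $k$, then $p_i \mid k$ and $p_j \mid k$. Therefore, the number of $n$-tuples $(k_1, \ldots, k_n)$ contained in the lattice $\{1, \ldots, \ell\}^n$ such that $p_i$ or $p_j$ or their product divides every integer in the $n$-tuple is
$$\left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \left\lfloor \frac{\ell}{p_j} \right\rfloor^n - \left\lfloor \frac{\ell}{p_i \cdot p_j} \right\rfloor^n,$$
where the last term is subtracted so that we do not count the same $n$-tuples twice (in a simple application of the inclusion–exclusion principle).
Continuing in this fashion, for three primes, say $p_i < p_j < p_k$, the number of $n$-tuples $(k_1, \ldots, k_n)$ contained in the lattice $\{1, \ldots, \ell\}^n$ such that $p_i$, $p_j$, or $p_k$ or any of their products divides every integer in the $n$-tuple is given by the inclusion–exclusion principle as
$$\left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \left\lfloor \frac{\ell}{p_j} \right\rfloor^n + \left\lfloor \frac{\ell}{p_k} \right\rfloor^n - \left( \left\lfloor \frac{\ell}{p_i \cdot p_j} \right\rfloor^n + \left\lfloor \frac{\ell}{p_i \cdot p_k} \right\rfloor^n + \left\lfloor \frac{\ell}{p_j \cdot p_k} \right\rfloor^n \right) + \left\lfloor \frac{\ell}{p_i \cdot p_j \cdot p_k} \right\rfloor^n.$$
We can therefore see by induction that the number of $n$-tuples $(k_1, \ldots, k_n)$ contained in the lattice $\{1, \ldots, \ell\}^n$ such that some prime $p_i, p_j, p_k, \ldots$ or any product of these primes divides every integer in the $n$-tuple is given by the inclusion–exclusion principle as
$$\sum_{p_i} \left\lfloor \frac{\ell}{p_i} \right\rfloor^n - \sum_{p_i < p_j} \left\lfloor \frac{\ell}{p_i \cdot p_j} \right\rfloor^n + \sum_{p_i < p_j < p_k} \left\lfloor \frac{\ell}{p_i \cdot p_j \cdot p_k} \right\rfloor^n - \cdots.$$
But this counts the complement of the set of relatively prime $n$-tuples in the lattice $\{1, \ldots, \ell\}^n$. Therefore,
$$N_n(\ell) = \ell^n - \sum_{p_i} \left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \sum_{p_i < p_j} \left\lfloor \frac{\ell}{p_i \cdot p_j} \right\rfloor^n - \sum_{p_i < p_j < p_k} \left\lfloor \frac{\ell}{p_i \cdot p_j \cdot p_k} \right\rfloor^n + \cdots.$$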
This completes the proof of Lemma 1.    □
Proof of Theorem 1. 
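Lemma 1 gives us that if $N_n(\ell) = \operatorname{card}\{(k_1, \ldots, k_n) \in \{1, \ldots, \ell\}^n : \gcd(k_1, \ldots, k_n) = 1\}$ is the number of relatively prime elements in $\{1, \ldots, \ell\}^n$, then
$$N_n(\ell) = \ell^n - \sum_{p_i} \left\lfloor \frac{\ell}{p_i} \right\rfloor^n + \sum_{p_i < p_j} \left\lfloor \frac{\ell}{p_i \cdot p_j} \right\rfloor^n - \sum_{p_i < p_j < p_k} \left\lfloor \frac{\ell}{p_i \cdot p_j \cdot p_k} \right\rfloor^n + \cdots.$$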
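We now observe that
$$\frac{1}{\ell^n} \sum_{p_i < p_j < \cdots < p_k} \left\lfloor \frac{\ell}{p_i \cdot p_j \cdots p_k} \right\rfloor^n \leq \frac{1}{\ell^n} \sum_{p_i < p_j < \cdots < p_k} \left( \frac{\ell}{p_i \cdot p_j \cdots p_k} \right)^n = \sum_{p_i < p_j < \cdots < p_k} \left( \frac{1}{p_i \cdot p_j \cdots p_k} \right)^n \leq \left( \sum_{p \ \text{prime}} \frac{1}{p^n} \right)^k \leq \left( \sum_{j=2}^{\infty} \frac{1}{j^n} \right)^k.$$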
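Since $n \geq 2$, this series is convergent. Thus, each term in the expansion of $N_n(\ell)/\ell^n$ is convergent. Now, let
$$M_k = \left( \sum_{j=2}^{\infty} \frac{1}{j^n} \right)^k, \quad \text{for } k = 0, 1, 2, \ldots.$$
Since $n \geq 2$ and the sum is over $j \in \mathbb{N} \setminus \{1\}$, we get
$$0 < \sum_{j=2}^{\infty} \frac{1}{j^n} \leq \frac{\pi^2}{6} - 1 < 1.$$
Since the $k$th term in the expansion of $N_n(\ell)/\ell^n$ is dominated by $M_k$ and since
$$\sum_{k=0}^{\infty} M_k \leq \sum_{k=0}^{\infty} \left( \frac{\pi^2}{6} - 1 \right)^k = \frac{6}{12 - \pi^2}$$
is convergent, the series converges absolutely.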
We now need the Möbius inversion function $\mu$, which is defined as follows. Let
$$\mu(1) = 1, \qquad \mu(m) = \begin{cases} 0 & \text{if } m \text{ is divisible by the square of a prime}, \\ (-1)^r & \text{if } m = p_1 \cdot p_2 \cdots p_r, \ p_1, p_2, \ldots, p_r \text{ distinct}. \end{cases}$$
The function $\mu$ is called an inversion function because if $f$ is a function defined for all positive integers (an arithmetic function) and $F$ is its sum over all divisors, i.e., $F(n) = \sum_{d \mid n} f(d)$, then for all positive integers $n$, $\mu$ inverts $f$ and $F$ by
$$f(n) = \sum_{d \mid n} \mu(d) F(n/d)$$
(see Rosen [22], pp. 251–255). Euler showed that
  1 p i 1 p i n + p i < p j 1 p i · p j n p i < p j < p k 1 p i · p j · p k n + = m μ ( m ) m n = [ ζ ( n ) ] 1 .
where the last sum is over m N . For n 2 , this series is absolutely convergent. This last equality follows because for j , k , m , n N ,
m 1 m n j μ ( j ) j n = m , j μ ( j ) m j n = k 1 k n d | k μ ( d ) = 1 ,
where we use the fact that both series in the first term converge absolutely and thus can be rearranged in any order (see Leveque [21], p. 120).
Now let $\epsilon > 0$ be given. We want to show that there exists $L > 0$ such that for all $\ell > L$,
$$\left| \frac{N_n(\ell)}{\ell^n} - [\zeta(n)]^{-1} \right| < \epsilon .$$
By (13), there exists $L_1 > 0$ such that for all $\ell > L_1$,
1 p i 1 p i n + p i < p j 1 p i · p j n p i < p j < p k 1 p i · p j · p k n + [ ζ ( n ) ] 1 < ϵ 2 .
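We use the fact that for all $x \in \mathbb{R}$ and all $n \in \mathbb{Z}$, $\lfloor x \rfloor \le n$ if and only if $x < (n + 1)$. Let
$$P = \sum_{\ell < p_i} \frac{1}{p_i^n} - \sum_{\ell < p_i < p_j} \frac{1}{(p_i \cdot p_j)^n} + \sum_{\ell < p_i < p_j < p_k} \frac{1}{(p_i \cdot p_j \cdot p_k)^n} - \cdots .$$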
Then
N n ( ) n 1 p i 1 p i n + p i < p j 1 p i · p j n p i < p j < p k 1 p i · p j · p k n + | P | .
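But
$$|P| \le \sum_{m = \ell + 1}^{\infty} \frac{1}{m^n} \le \sum_{m = \ell + 1}^{\infty} \frac{1}{m^2} \le \sum_{m = \ell + 1}^{\infty} \frac{1}{m (m - 1)} = \sum_{m = \ell + 1}^{\infty} \left( \frac{1}{m - 1} - \frac{1}{m} \right) = \frac{1}{\ell} .$$
Choose $L_2$ such that for all $\ell > L_2$, $\frac{1}{\ell} < \frac{\epsilon}{2}$. Let $L = \max\{L_1, L_2\}$. Then, for all $\ell > L$,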
  N n ( ) n [ ζ ( n ) ] 1 = N n ( ) n 1 p i 1 p i n + + 1 p i 1 p i n + [ ζ ( n ) ] 1 N n ( ) n 1 p i 1 p i n + + 1 p i 1 p i n + [ ζ ( n ) ] 1 < ϵ 2 + ϵ 2 = ϵ .
This completes the proof of the theorem.    □
Corollary 1. 
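Let $n \ge 2$. Given a randomly chosen $n$-tuple of positive integers $(k_1, \dots, k_n) \in \{1, \dots, \ell\}^n$, we have that
$$\lim_{\ell \to \infty} \frac{N_n(\ell)}{\ell^n} = [\zeta(n)]^{-1} .$$
Thus,
$$\gcd(k_1, \dots, k_n) \longrightarrow 1 ,$$
with probability $[\zeta(n)]^{-1}$ as $\ell \to \infty$.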
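Corollary 1 can be checked empirically. The following is a minimal Monte Carlo sketch, not part of the original experiments: the function name `coprime_fraction`, the lattice size, the trial count, and the seed are all assumptions chosen for illustration. It samples $n$-tuples uniformly from $\{1, \dots, \ell\}^n$ and compares the fraction with $\gcd = 1$ against the closed forms $1/\zeta(2) = 6/\pi^2$ and $1/\zeta(4) = 90/\pi^4$.

```python
import math
import random
from functools import reduce


def coprime_fraction(n, ell, trials, seed=0):
    """Estimate N_n(ell) / ell^n by sampling n-tuples uniformly from {1,...,ell}^n."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    hits = 0
    for _ in range(trials):
        ks = [rng.randint(1, ell) for _ in range(n)]
        if reduce(math.gcd, ks) == 1:
            hits += 1
    return hits / trials


# Closed forms for 1/zeta(n) at n = 2 and n = 4.
inv_zeta = {2: 6 / math.pi**2, 4: 90 / math.pi**4}

for n in (2, 4):
    estimate = coprime_fraction(n, ell=10**6, trials=20000)
    print(n, round(estimate, 4), round(inv_zeta[n], 4))
```

With a finite lattice and a finite number of trials the agreement is only approximate; the sampling error shrinks like the inverse square root of the trial count.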
Evaluating the zeta function, even at positive integer values, is challenging. Euler gave us a remarkable formula which evaluates $\zeta(z)$ at the even integers. Ireland and Rosen describe this as one of Euler’s “most remarkable computations” [17], p. 231. The exact evaluations of $\zeta$ at $\{3, 5, 7, \dots\}$ are still open (see, e.g., [27]). We list the values of $\zeta(z)$ for $z = 2, 4, 6, 8, \dots, 16$ in Table 1.
We can, however, estimate $\zeta(n)$ at the odd integers. Moreover, the estimate shows that $[\zeta(n)]^{-1} \to 1$ quickly as $n$ increases. In fact, the rate of convergence is exponential. This estimate, combined with Theorem 1, explains why as few as 10 data elements are needed to estimate $\tau$, as demonstrated in Table 2.
Proposition 4. 
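Let $\omega \in (1, \infty)$. Then
$$\lim_{\omega \to \infty} [\zeta(\omega)]^{-1} = 1 ,$$
converging to 1 from below faster than $(1 - 2^{1 - \omega})$.
Proof. 
Since
$$\zeta(\omega) = \sum_{n=1}^{\infty} n^{-\omega}$$
and $\omega > 1$,
$$1 \le \zeta(\omega) = 1 + \frac{1}{2^\omega} + \frac{1}{3^\omega} + \frac{1}{4^\omega} + \frac{1}{5^\omega} + \cdots \le 1 + \frac{1}{2^\omega} + \frac{1}{2^\omega} + \underbrace{\frac{1}{4^\omega} + \cdots + \frac{1}{4^\omega}}_{4 \text{ times}} + \underbrace{\frac{1}{8^\omega} + \cdots + \frac{1}{8^\omega}}_{8 \text{ times}} + \cdots = \sum_{k=0}^{\infty} \left( \frac{2}{2^\omega} \right)^k = \frac{1}{1 - \frac{2}{2^\omega}} = \frac{1}{1 - 2^{1 - \omega}} .$$
Thus,
$$(1 - 2^{1 - \omega}) \le [\zeta(\omega)]^{-1} < 1 ,$$
and so
$$[\zeta(\omega)]^{-1} \to 1 \text{ as } \omega \to \infty ,$$
converging to 1 from below faster than $(1 - 2^{1 - \omega})$.    □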
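The bound in Proposition 4 is easy to inspect numerically. The sketch below is an illustration only; the helper name `zeta_bounds` and the truncation length are assumptions. It brackets $\zeta(\omega)$ between a partial sum and the partial sum plus the integral bound on the tail, $\sum_{n > N} n^{-\omega} \le N^{1 - \omega}/(\omega - 1)$, and then compares a guaranteed lower bound on $[\zeta(\omega)]^{-1}$ with $1 - 2^{1 - \omega}$.

```python
def zeta_bounds(omega, terms=100000):
    """Return (lower, upper) bounds on zeta(omega) for omega > 1.

    lower: the partial sum of the first `terms` terms.
    upper: the partial sum plus the integral bound on the remaining tail.
    """
    partial = sum(n ** -omega for n in range(1, terms + 1))
    tail = terms ** (1.0 - omega) / (omega - 1.0)
    return partial, partial + tail


for omega in (2, 3, 5, 9, 15):
    lower, upper = zeta_bounds(omega)
    inv_zeta_lower = 1.0 / upper          # guaranteed lower bound on 1/zeta(omega)
    prop4_bound = 1.0 - 2.0 ** (1 - omega)
    print(omega, prop4_bound, inv_zeta_lower)
```

For every listed $\omega$ the second printed value sits below the third, and both approach 1 rapidly, which is the behavior the proposition describes.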
The first step of the MEA eliminates the additive phase $\varphi$. After this first step, the MEA is working with the differences of pairs of the initial data, having the form
$$K_1 \tau, \dots, K_{(n-1)} \tau , \quad \text{where } K_j = k_j - k_{(j+1)} , \ j = 1, \dots, (n-1) .$$
By Proposition 3, a prime $p_i$ divides each $k_j$ if and only if $p_i$ divides $k_j - k_n$, $j = 1, \dots, (n-1)$. This just shifts the point $(1, 1, \dots, 1)$ to the point $((k_n + 1), (k_n + 1), \dots, (k_n + 1))$. Therefore, for $n \ge 3$, we have that
$$\lim_{\ell \to \infty} \frac{N_{(n-1)}(\ell, k_n)}{\ell^{\,n-1}} = [\zeta(n-1)]^{-1} .$$
Combining Theorem 1 with Propositions 1 and 3 and inequality (21) shows that the algorithm generates the underlying period τ in the noise-free case as the number of data elements n goes to infinity. Moreover, (21) shows that the algorithm very likely produces this value in the noise-free case with as few as 10 data elements.
Corollary 2. 
Let $n \ge 3$. Given a randomly chosen $n$-tuple of positive integers $(k_1, \dots, k_n) \in \{1, \dots, \ell\}^n$ and a fixed positive real number $\tau$,
$$\gcd(k_1 \tau, \dots, k_{(n)} \tau) \longrightarrow \tau ,$$
with probability 1 as $n \to \infty$.

2.3. Simulations of the MEA

We tested the MEA by designing computer simulations. We set up 100-loop Monte Carlo runs, and then calculated statistics based on these computations. Let $n$ denote the number of data elements in a given experiment, and without loss of generality, let $\tau = 1$ in all experiments. This choice is arbitrary, but it does allow for a direct visual analysis of the results. Any other fixed positive real number would have yielded similar results.
Let $\hat{\tau}$ denote the value the algorithm gives for $\tau$, and let $\mathrm{std}(\hat{\tau})$ denote the experimental standard deviation. The initial phase $\varphi$ was chosen randomly in $[0, \tau)$ and played no role, as it was eliminated after the first differencing. Noise values $\eta_j$ were modeled as uniformly distributed with the probability density function (pdf) $f_\eta(\eta) \sim U[-\frac{\Delta}{2}, \frac{\Delta}{2}]$, where $U[\alpha, \beta]$ denotes the uniform distribution across the interval $[\alpha, \beta]$. Then, for example, $\Delta = 2 \times 10^{-1}$ implies random phase jitter that is $\pm 10\%$ of the period $\tau = 1$. In Table 3, the noise threshold values $\eta_0$ equaled $\Delta$. (We noted that an increase in the noise floor $\eta_0$ had the effect of speeding up the algorithm.)
The first set of simulations assumed that the data had no additive noise, i.e., for these simulations, $\eta_0 = 0$.
(1.) 
Estimation from data without additive noise.
The first simulation examined the effects that changes in $n$ and in the percentage of missing observations have on the algorithm’s performance. The data points had no additive noise, i.e., $\eta_j = 0$ for all $j$. The algorithm converged in many cases to the exact value of $\tau = 1$. When the number of events $n$ was extremely low, the MEA also converged to multiples of $\tau$.
The missing data elements were modeled by creating jumps in the $k_j$’s as follows. We chose an integer $l$ randomly from the interval $[1, M]$. Given $k_j$, $k_{j+1} = k_j + l$. Increasing $M$ increased possible jumps, thus making the data increasingly sparse. Results are shown in Table 2. We let $\%miss$ denote the experimentally determined average percentage of missing observations and $iter$ denote the average number of iterations required to converge. To interpret these, again visualize the data as zero-crossings of $f(t) = \sin\left( \frac{\pi}{\tau} (t - \varphi) \right)$. A random process has removed $\%miss$ of the zero crossings of $f$, leaving only $n$ observations.
The top half of Table 2 shows the effect of changing $M$ and, therefore, changing the percentage of missing observations. Given insufficient data, the algorithm may converge to a multiple of $\tau$. Columns labeled $\tau$, $2\tau$, $3\tau$, and $4\tau$ indicate the percentage of runs that converged to these values. The algorithm is able to choose $\tau$ correctly based on $n = 10$ data samples, even with $99.99988928\%$ of the possible observations missing. Convergence in the noise-free case depends on $n$ but is independent of $M$, as shown by the analysis above. The bottom half of Table 2 illustrates the effect of changing $n$ for $M$ fixed. Reliable results are achieved for $n \ge 10$. Note, however, that although it is very probable that as few as 10 data elements will produce $\tau$, it is still possible to get a multiple of $\tau$. We did get an outlier in one simulation, as one can see in the fourth line of the lower table.
Table 2 shows that given $n \ge 10$ event times in $S$, the MEA works very well (even with $99.99988928\%$ of the zero crossings removed). However, the algorithm breaks down as the number of elements in $S$ is reduced below 10. This is consistent with the mathematical underpinnings of the MEA, as given by Theorem 1 and Proposition 4. We note that the result for four data elements is a bit low, and should be closer to $1/\zeta(4) = 90/\pi^4$, whereas those for six and eight are closer to the theoretical values of $1/\zeta(6) = 945/\pi^6$ and $1/\zeta(8) = 9450/\pi^8$, respectively.
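The noise-free experiment can be imitated in a few lines. The sketch below is hypothetical: `mea_sketch` is an assumed reading of the iteration (sort in descending order, take adjacent differences, discard values in $[0, \eta_0]$, and append a zero from the second pass onward), and `sparse_event_times` is an assumed data generator following the jump model $k_{j+1} = k_j + l$ with $l$ drawn from $[1, M]$. It is not the code used to produce Table 2.

```python
import math
import random
from functools import reduce


def mea_sketch(times, eta0=0.0, max_iter=10**6):
    """Assumed MEA-style iteration; returns the last surviving element."""
    s = sorted(times, reverse=True)
    first_pass = True
    for _ in range(max_iter):
        if not first_pass:
            s = s + [0.0]              # carry the smallest element forward
        first_pass = False             # the first pass only removes the phase
        diffs = [s[j] - s[j + 1] for j in range(len(s) - 1)]
        s = sorted((d for d in diffs if d > eta0), reverse=True)
        if len(s) <= 1:
            break
    return s[0] if s else None


def sparse_event_times(n, M, tau=1.0, phi=0.25, seed=1):
    """Event times k_j * tau + phi with random jumps of size 1..M between the k_j."""
    rng = random.Random(seed)
    k, ks = 0, []
    for _ in range(n):
        k += rng.randint(1, M)
        ks.append(k)
    return ks, [kj * tau + phi for kj in ks]


ks, times = sparse_event_times(n=10, M=100)
tau_hat = mea_sketch(times)
g = reduce(math.gcd, [b - a for a, b in zip(ks, ks[1:])])
print(tau_hat, g)  # tau_hat is g * tau; g == 1 means tau itself was recovered
```

In this sketch the output is always the greatest common divisor of the gaps times $\tau$, so the run returns $\tau$ exactly when the gaps are relatively prime, which is the event whose probability Theorem 1 quantifies.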
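Table 2. Modeling the MEA with noise-free data.
| $n$ | $M$ | % miss | iter | $\tau$ (%) | $2\tau$ (%) | $3\tau$ (%) | $4\tau$ (%) | $5\tau$ (%) |
| 10 | $10^1$ | 88.917 | 7.70 | 100 | 0 | 0 | 0 | 0 |
| 10 | $10^2$ | 98.878 | 47.58 | 100 | 0 | 0 | 0 | 0 |
| 10 | $10^3$ | 99.888 | 436.79 | 100 | 0 | 0 | 0 | 0 |
| 10 | $10^4$ | 99.989 | 2293.16 | 100 | 0 | 0 | 0 | 0 |
| 10 | $10^5$ | 99.999 | 30,167.68 | 100 | 0 | 0 | 0 | 0 |
| 4 | $10^2$ | 94.754 | 11.38 | 88 | 6 | 4 | 2 | 0 |
| 6 | $10^2$ | 93.076 | 11.84 | 97 | 3 | 0 | 0 | 0 |
| 8 | $10^2$ | 90.853 | 9.25 | 99 | 1 | 0 | 0 | 0 |
| 10 | $10^2$ | 88.951 | 6.87 | 99 | 1 | 0 | 0 | 0 |
| 12 | $10^2$ | 87.100 | 6.74 | 100 | 0 | 0 | 0 | 0 |
| 14 | $10^2$ | 85.144 | 6.38 | 100 | 0 | 0 | 0 | 0 |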
(2.) 
Uniformly distributed noise.
We assume that the η j ’s have a uniform distribution, given by f η ( η ) U [ Δ 2 , Δ 2 ] . The top half of Table 3 illustrates the effect of increasing M, resulting in more missing observations with a fixed noise parameter. Larger M generally requires more data to maintain the same accuracy in τ ^ and results in larger s t d ( τ ^ ) . The bottom half shows the effect of increasing noise with n and M fixed. The noise floor was given by η 0 Δ . This prevents noise from having noisy elements cross over each other. Again, if noisy elements cross over, this creates false periods.
Table 3. Modeling the MEA with noisy data.
| $n$ | $M$ | $\Delta$ | % miss | iter | $\hat{\tau}$ | $\mathrm{std}(\hat{\tau})$ |
| 10 | $10^1$ | $10^{-3}$ | 89.123 | 7.31 | 0.9995 | 0.0002 |
| 10 | $10^2$ | $10^{-3}$ | 97.887 | 7.73 | 0.9980 | 0.0010 |
| 50 | $10^3$ | $10^{-3}$ | 99.803 | 11.24 | 0.9973 | 0.0035 |
| 10 | $10^1$ | $10^{-3}$ | 89.13 | 4.31 | 0.9995 | 0.0002 |
| 10 | $10^1$ | $10^{-2}$ | 87.94 | 4.45 | 0.9883 | 0.0051 |
| 10 | $10^1$ | $10^{-1}$ | 88.05 | 4.33 | 0.8857 | 0.0432 |
Table 3 shows that the estimates of the period skew toward underestimating $\tau$. We again note that this leads to open questions involving order statistics. In the original data, the assumption is that the noise components $\eta_j$ are independent and identically distributed (iid). The first differencing destroys the iid property, and the subsequent differencing and sorting then make this noise negatively skewed. Statistical analysis of the noise after several steps of the MEA is an open question. Standard results in order statistics assume iid samples (e.g., see Sarhan and Greenberg [24] and Reiss [25]).

2.4. Comparison with Other Methods

Given noisy data from a periodic point process that has no missing observations, least squares procedures can be used to solve for maximum likelihood estimates of the period. Under more general conditions, Fourier analytic methods, e.g., Wiener’s periodogram, can be used to solve for estimates which are approximately maximum likelihood. However, these methods break down when the data has an increasing number of missing observations. Juxtaposed with these methods, the MEA provides parameter estimations that, while not being maximum likelihood, can be used as initialization in an algorithm that achieves the Cramér–Rao bound for moderate noise levels. We describe the conditions under which the least squares procedures and Fourier analytic methods do not produce estimates close to maximum likelihood, and show that the number theoretic methods provide a reliable estimate in these cases. We also discuss the type of data for which the number theoretic methods fail to produce good estimates.

2.4.1. Least Squares

In this section, we assume our data is in the form (1) with the additional assumption that $\eta_j$ is zero-mean additive white Gaussian noise. Initialize by sorting the elements of $S$ in descending order and import $\eta_0$. We eliminate $\varphi$ by forming the differences $y_j = t_{j+1} - t_j = (k_{j+1} - k_j) \tau + (\eta_{j+1} - \eta_j)$, and clean out the elements in $[0, \eta_0]$. This yields
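$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{N-1} \end{bmatrix} = \begin{bmatrix} k_2 - k_1 \\ k_3 - k_2 \\ \vdots \\ k_N - k_{N-1} \end{bmatrix} \tau + \begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_{N-1} \end{bmatrix} ,$$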
where $\delta_j = \eta_{j+1} - \eta_j$. We may write (22) compactly in an obvious notation as
$$y = X_d \tau + \delta .$$
Equation (23) is a linear regression problem whose least squares solution yields the minimum-variance unbiased estimate when the noise is zero-mean Gaussian (e.g., see Kay [28]). Generally, use of (23) is preferred for estimating $\tau$, since it avoids estimation of $\varphi$, which has high variance, and so $\varphi$ can have an arbitrary non-negative value. The solution to (23) corresponds to maximum likelihood estimation and takes the form of a least squares estimate
$$\hat{\tau} = (X_d^T R_\delta^{-1} X_d)^{-1} X_d^T R_\delta^{-1} y ,$$
where $R_\delta = E[\delta \delta^T]$. We have assumed white noise, so $R_\delta = \sigma_v^2 \tilde{R}_\delta$, where $\tilde{R}_\delta$ has 2’s on the main diagonal, $-1$’s on the first upper and lower diagonals, and zeros elsewhere. In general $R_\delta$ is full rank and its inverse can be expressed element-wise as $[R_\delta^{-1}]_{ij} = \min(i, j) - ij/N$, and is therefore easily computed [28].
Although optimal, use of (24) requires knowledge of $X_d$. This is not a problem if there are no missing observations, for then $k_j = j$ for $j = 1, 2, \dots, N$. More generally, it also works with explicit knowledge of the $k_j$’s. However, when observations are arbitrarily missing, the $k_j$’s are not known in general, and one is faced with more unknowns than equations in (23). We also note that if there were only a few missing observations, one could still use least squares by applying the procedure to data clusters in which the $k_j$’s are known, and then optimizing over these estimates.
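When the multiples are known, the estimate (24) reduces to a ratio of two quadratic forms, because $X_d$ is a single column. The sketch below is illustrative only: the function names are assumptions, and the closed form $\min(i, j) - ij/N$ is applied to the normalized tridiagonal matrix (2 on the diagonal, $-1$ beside it), with the noise variance cancelling in the ratio. `check_inverse` verifies that closed form by multiplying it against the tridiagonal matrix.

```python
def r_inverse(i, j, N):
    """Closed-form entry min(i, j) - i*j/N, for 1-based i, j in 1..N-1."""
    return min(i, j) - i * j / N


def check_inverse(N, tol=1e-9):
    """Check that the tridiagonal (2, -1) matrix times the closed form is the identity."""
    for i in range(1, N):
        for j in range(1, N):
            acc = 2 * r_inverse(i, j, N)
            if i > 1:
                acc -= r_inverse(i - 1, j, N)
            if i < N - 1:
                acc -= r_inverse(i + 1, j, N)
            expected = 1.0 if i == j else 0.0
            if abs(acc - expected) > tol:
                return False
    return True


def gls_period(x, y):
    """Weighted least squares estimate of tau in y = x * tau + delta.

    x holds the known differences of the multiples, y the observed differences.
    """
    N = len(x) + 1
    num = den = 0.0
    for i in range(1, N):
        for j in range(1, N):
            w = r_inverse(i, j, N)
            num += x[i - 1] * w * y[j - 1]
            den += x[i - 1] * w * x[j - 1]
    return num / den


x = [1.0, 2.0, 1.0, 3.0]
y = [3.1415 * v for v in x]        # noise-free differences with tau = 3.1415
print(check_inverse(6), gls_period(x, y))
```

The sketch makes the limitation discussed above concrete: `gls_period` cannot be called without the vector `x`, which is exactly what is unavailable when observations are arbitrarily missing.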

2.4.2. Periodograms

For a general periodogram, let the data elements $t_j$ be thought of as recorded event times of a periodic process. This generates a zero-one time series or delta train with additive noise $\eta_j(t)$
$$s(t) = \sum_{j=1}^{n} \delta(t - (k_j \tau + \varphi)) + \eta_j(t) .$$
An obvious approach to estimating the period $\tau$ is to circularly convolve $s$ with $x(t, \tau) = \sum_{k=1}^{n} \delta(t - k\tau)$. This is equivalent to circularly applying a matched filter. This is not an optimal detector because noise manifests itself in the pulse, not the amplitude.
Another approach is to perform a spectral analysis of $s$ using Wiener’s periodogram,
$$S_s(\omega) = \frac{1}{N} \left| \sum_{j=1}^{N} \exp(2 \pi i t_j \omega) \right|^2 .$$
The highest peak of $S_s(\omega)$ then gives an estimate of $\tau$, e.g., see Bartlett [29] and Brillinger [30]. This approach has been shown to yield estimates of $\tau$ which are approximately maximum likelihood for a small percentage of missing observations. However, the estimate degrades as the percentage of missing observations exceeds 50–75%.
We now assume the noise $\eta_j$ has the pdf
$$f(\eta) = \frac{\exp(\alpha \cos(2 \pi \eta / \tau))}{2 \pi I_0(\alpha)} , \quad |\eta| \le \frac{\tau}{2} ,$$
with parameter $\alpha$, where $I_0(\alpha)$ is the zeroth-order modified Bessel function evaluated at $\alpha$. This pdf was used by Van Trees for analyzing phase-lock synchronization schemes. Note that $f(\eta)$ tends to a uniform pdf as $\alpha \to 0$, and tends to a Gaussian pdf as $\alpha \to \infty$. This is a fortuitous choice for the noise pdf, as it results in a maximum likelihood estimate of $\tau$ given by
$$\hat{\tau}_x = \operatorname*{argmax}_{\tau} \left| \sum_{j=1}^{N} \exp\left( \frac{2 \pi i t_j}{\tau} \right) \right|^2$$
(see Fogel and Gavish [31]).
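This is seen as follows. First note that $f(\eta) = f(-\eta)$ for all $\alpha$ implies zero mean. Assuming that $\beta = [\varphi, \tau]^T$ is known, the pdf of $t_j$ given $\beta$ is
$$f(t_j | \beta) = f(t_j - k_j \tau - \varphi) ,$$
and for iid noise,
$$f(t | \beta) = \prod_{j=1}^{N} f(t_j | \beta) .$$
The log-likelihood function is
$$L(\beta) = \ln f(t | \beta) = \sum_{j=1}^{N} \left[ \alpha \cos\left( \frac{2 \pi}{\tau} (t_j - \varphi) \right) - \ln\left( 2 \pi I_0(\alpha) \right) \right] .$$
Therefore, the maximum likelihood estimate is found by maximizing $L(\beta)$ with respect to $\tau$, yielding
$$\hat{\tau}_{ML} = \operatorname*{argmax}_{\tau} \sum_{j=1}^{N} \cos\left( \frac{2 \pi (t_j - \varphi)}{\tau} \right) .$$
Using the fact that
$$\cos\left( \frac{2 \pi}{\tau} (t_j - \varphi) \right) = \operatorname{Re}\left\{ \exp\left( \frac{2 \pi i (t_j - \varphi)}{\tau} \right) \right\} ,$$
then, for $\varphi$ constant, maximizing $L(\beta)$ is equivalent to
$$\hat{\tau}_x = \operatorname*{argmax}_{\tau} \left| \sum_{j=1}^{N} \exp\left( \frac{2 \pi i t_j}{\tau} \right) \right|^2 ,$$
which is in the form of spectrum analysis of a point process with occurrence times $t_j$.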
These estimators depend directly on the exponential sum (as does the periodogram), and do not require knowledge of the pdf parameter α . Use of (27) requires a fine search over f = 1 / τ . This may result in a significant computational load because the FFT is not a good computational option, versus direct computation of the exponential sum, given a large number of zeros in x ( t ) . Moreover, estimates decay for >50–75% missing observations, and drop off for >90% missing observations.
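The fine search over candidate periods can be sketched by direct computation of the exponential sum. Everything specific below is an assumption made for illustration: the function names, the search window $[2, 5]$, the grid step, and the particular multiples $k_j$ (chosen with gaps whose greatest common divisor is 1, and with a window that excludes the harmonic $\tau / 2$).

```python
import cmath
import math


def exp_sum_power(times, tau):
    """|sum_j exp(2*pi*i*t_j / tau)|^2, evaluated by direct summation."""
    total = sum(cmath.exp(2j * math.pi * t / tau) for t in times)
    return abs(total) ** 2


def grid_search_period(times, tau_min, tau_max, step):
    """Return the grid value of tau that maximizes the exponential sum."""
    count = int(round((tau_max - tau_min) / step)) + 1
    grid = [tau_min + i * step for i in range(count)]
    return max(grid, key=lambda tau: exp_sum_power(times, tau))


tau, phi = 3.1415, 0.7
ks = [0, 1, 3, 4, 7, 8, 12, 15, 19, 20]       # about half of the events are missing
times = [k * tau + phi for k in ks]
tau_hat = grid_search_period(times, 2.0, 5.0, 0.0005)
print(tau_hat)
```

The cost grows with the number of grid points times the number of events, which reflects the computational load noted above; the phase $\varphi$ only rotates the sum and does not affect its modulus.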

2.4.3. The MEA

The MEA is designed to fill in the gap for >50–75% missing observations. It works when least squares and Wiener periodograms fail. The data can again be thought of as a set of event times of a periodic process, which generates a zero-one time series or delta train with additive noise $\eta_j(t)$
$$s(t) = \sum_{j=1}^{n} \delta(t - (k_j \tau + \varphi)) + \eta_j(t) .$$
The $k_j$’s determine the best procedure for analyzing this data. Given a sequence of consecutive $k_j$’s or even an explicit knowledge of the $k_j$’s and assuming white Gaussian noise, we can get a maximum likelihood estimate using least squares. Fourier analytic methods work with some missing observations, but when the percentage of missing observations is too large (>50–75%), they break down. Number theoretic methods can work with very sparse data sets (>90% missing observations). However, here there is also a trade-off. Given 10 data elements with a high signal-to-noise ratio and no outliers, the number theoretic methods can produce reliable estimates even with 99% missing observations. However, given an extremely noisy environment and a large number of outliers, the number theoretic methods require considerably larger data sets to get reliable estimates. If the period $\tau$ is much lower than the noise, this then creates false outliers, and the MEA will break down.

2.4.4. The MEA and the Periodogram: Missing Data

We finish this section with a comparison: an analysis of what happens to the MEA vs. the periodogram as the percentage of missing observations increases. The MEA is designed to work on all one-periodic data, but especially when the data becomes sparse. We created data sets with no missing observations, first with no jitter noise and then with relatively minimal (5% jitter) noise. We then created, in sequence, data sets with 10%, 25%, 50%, 75%, and finally 90% missing observations. We let $\tau = 3.1415$ in all data sets. We set up 100-loop Monte Carlo runs, and then calculated statistics based on these computations. These results are presented in Table 4. The values under ($\tau$ noise-free) and ($\tau$ 5% jitter) are averages.
Note the systematic drop in the data with 5% jitter. This is because of the increase in jitter noise as the data progressed through the algorithm (see Table 3 and the discussion following Table 3).
We then, using the same data, applied the periodogram. These results are presented in Table 5. Here, we measured the range of accuracy by computing the percentage of trials within an error of 2% of $\tau = 3.1415$.
Table 5 shows a relatively small systematic drop off for the MEA at 90% missing data, consistent with Table 3 and Table 4. In contrast, the periodogram performs very well up to ≈50–75% missing, and falls off for ≥90% missing. The periodogram produced peaks, but the height of the peaks varied with each simulation. It became impossible to determine the highest peak (see Section 2.4.2).

2.5. Estimating Phase, Multiples, and Noise

Given an estimate τ ^ of τ , it is then possible to estimate the phase φ , the multiples k j , and the noise η j in the data. The phase estimation can be useful in the analysis of radar and sonar data, the multiples k j can be useful in the analysis of data with an associated shift register (such as hop times of frequency-hopping spread-spectrum (FHSS) radios), and information about the noise can be useful in algorithm analysis.
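Phase estimation is important in radar and sonar analysis, and several researchers have addressed this problem. Fogel and Gavish [31] produced an estimator $\hat{\varphi}$ for $\varphi$, given by
$$\hat{\varphi} = \frac{\hat{\tau}}{2 \pi} \arg\left\{ \sum_{j=1}^{n} \exp\left( \frac{2 \pi i s_j}{\hat{\tau}} \right) \right\} .$$
Given a good estimate $\hat{\tau}$ of $\tau$, we can derive $\hat{\varphi}$. We have that $\exp(2 \pi i k_j \tau / \hat{\tau}) \approx 1$. For $\eta_j \ll \tau \approx \hat{\tau}$, $\eta_j / \hat{\tau} \ll 1$, and so $\exp(2 \pi i \eta_j / \hat{\tau}) \approx \exp(0) = 1$. Therefore,
$$\begin{aligned} \frac{\hat{\tau}}{2 \pi} \arg\left\{ \sum_{j=1}^{n} \exp\left( \frac{2 \pi i s_j}{\hat{\tau}} \right) \right\} &= \frac{\hat{\tau}}{2 \pi} \arg\left\{ \sum_{j=1}^{n} \exp\left( \frac{2 \pi i k_j \tau}{\hat{\tau}} \right) \exp\left( \frac{2 \pi i \eta_j}{\hat{\tau}} \right) \exp\left( \frac{2 \pi i \varphi}{\hat{\tau}} \right) \right\} \\ &\approx \frac{\hat{\tau}}{2 \pi} \arg\left\{ \sum_{j=1}^{n} \exp\left( \frac{2 \pi i \varphi}{\hat{\tau}} \right) \right\} = \frac{\hat{\tau}}{2 \pi} \arg\left\{ n \cdot \exp\left( \frac{2 \pi i \varphi}{\hat{\tau}} \right) \right\} \\ &= \frac{\hat{\tau}}{2 \pi} \left[ \arg\{n\} + \arg\left\{ \exp\left( \frac{2 \pi i \varphi}{\hat{\tau}} \right) \right\} \right] = \frac{\hat{\tau}}{2 \pi} \arg\left\{ \exp\left( \frac{2 \pi i \varphi}{\hat{\tau}} \right) \right\} \\ &= \frac{\hat{\tau}}{2 \pi} \cdot \frac{2 \pi \varphi}{\hat{\tau}} = \varphi . \end{aligned}$$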
The MEA data after the first difference has the form
$$\sigma = \{ K_j \tau + \eta'_j \}_{j=1}^{n-1} \cup \{ k_n \tau + \varphi + \eta_n - \hat{\varphi} \} ,$$
where $K_j = k_j - k_{j+1}$ and $\eta'_j = \eta_j - \eta_{j+1}$. Let $\mathrm{round}(x) = \lfloor x + \frac{1}{2} \rfloor$ denote rounding to the nearest integer. Given the estimate $\hat{\tau}$, estimate $k_n$ by
$$\hat{k}_n = \mathrm{round}\left( \frac{k_n \tau + \varphi + \eta_n - \hat{\varphi}}{\hat{\tau}} \right)$$
and $K_j$ by
$$\hat{K}_j = \mathrm{round}\left( \frac{K_j \tau + \eta'_j}{\hat{\tau}} \right) .$$
Then, $\hat{k}_{n-1} = \hat{K}_{n-1} + \hat{k}_n$, $\hat{k}_{n-2} = \hat{K}_{n-2} + \hat{k}_{n-1}$, and so on.
Estimating the noise $\eta_j$ can be approached as follows. If we have a sufficient amount of data elements, at each step of the MEA, data will be in separate clusters determined by the noise floor. We can bin the data by employing a gradient operator. In each bin, compute the average, and then subtract this value from each element of the bin to get the $\eta_j$ values. Then compute statistics on these values. This approach was used in [4], pp. 2293–2294, and had the effect of increasing the convergence rate of the algorithm and reducing the skewness in the estimate of $\tau$.
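The phase and multiple estimates above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the data are noise-free and sorted in descending order, the differences $s_j - s_{j+1}$ stand in for $K_j \tau + \eta'_j$, and the estimated phase is reduced modulo $\hat{\tau}$ so that it lies in $[0, \hat{\tau})$.

```python
import cmath
import math


def estimate_phase(s, tau_hat):
    """(tau_hat / 2*pi) * arg(sum_j exp(2*pi*i*s_j / tau_hat)), mapped into [0, tau_hat)."""
    total = sum(cmath.exp(2j * math.pi * sj / tau_hat) for sj in s)
    return ((tau_hat / (2 * math.pi)) * cmath.phase(total)) % tau_hat


def estimate_multiples(s, tau_hat, phi_hat):
    """Recover the k_j by rounding; s must be sorted in descending order."""
    k_hat = [0] * len(s)
    k_hat[-1] = round((s[-1] - phi_hat) / tau_hat)        # estimate of k_n
    for j in range(len(s) - 2, -1, -1):
        K_hat = round((s[j] - s[j + 1]) / tau_hat)        # estimate of K_j
        k_hat[j] = K_hat + k_hat[j + 1]                   # back-substitution
    return k_hat


tau, phi = 3.1415, 0.7
ks = [20, 15, 12, 7, 3, 1]
s = [k * tau + phi for k in ks]
phi_hat = estimate_phase(s, tau)
print(phi_hat, estimate_multiples(s, tau, phi_hat))
```

With noisy data the rounding step succeeds only while the accumulated error in each quotient stays below one half, which is why a good estimate $\hat{\tau}$ is needed first.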

3. Data Sets with Multiple Periods

We now discuss the second algorithm, which gives the analysis and deinterleaving of multi-period pulse trains. We assume that the underlying periods are independent of each other, and therefore, with probability one, are not rational multiples of each other. However, even without this assumption, it is important to note that the algorithm will still produce results even with data that includes periods that are rational multiples.
The goal is to first identify the underlying periods (or “generators”) of data, and then to separate out the data that comes from a given generator. We model the data as the union of $M$ copies of our previous data sets $\{S_i\}$, each with different underlying periods $\Gamma = \{\tau_i\}$, different $k_{ij}$’s, and different phases $\varphi_i$. Our data is
$$S = \bigcup_{i=1}^{M} S_i = \bigcup_{i=1}^{M} \{ \varphi_i + k_{ij} \tau_i + \eta_{ij} \}_{j=1}^{n_i} .$$
In this formulation, $n_i$ is the number of elements from the $i$th generator; $\{k_{ij}\}$, for fixed $i$, is a linearly increasing sequence of natural numbers with missing observations; $\varphi_i$ is a random variable uniformly distributed in $[0, \tau_i)$; and the $\eta_{ij}$’s are zero-mean iid Gaussian with standard deviation $3 \sigma_{ij} < \tau_i / 2$. The data models event times from $M$ periodic processes. The data, after being sorted and then reindexed, has the form
$$S = \{ s_l \}_{l=1}^{N} = \{ s_1, \dots, s_N \} ,$$
where $N = \sum_i n_i$. The data will maintain this form throughout the algorithm.
We assume that the different periods or “generators” $\{\tau_i\}$ come from different processes. Therefore, we can assume that the difference of data elements
$$(\varphi_i + k_{ij} \tau_i + \eta_{ij}) - (\varphi_{i'} + k_{i'j'} \tau_{i'} + \eta_{i'j'}) ,$$
for $i \ne i'$, is incommensurate to $\tau_i$ and to $\tau_{i'}$ with probability one. This follows from the assumption that the generators are independent from each other and that the rational numbers are a set of measure zero.
Remark 6. 
If two or more periods are commensurate, the algorithm will generate the fundamental period of the subset of commensurate periods. The phase information will then allow, almost surely, that these data sets can be deinterleaved.
We difference as in the MEA, but we compute and save all of the differences. We repeat this $m$ times, saving the elements from each iteration. We form a union of all of these data elements. The relative primeness of data generated by one generator will backfill the missing elements for that generator (see Proposition 2), whereas the differenced data from two different generators will become Weyl flat.
We assume that we have knowledge of the range of $\{\tau_i\}$, namely lower and upper bounds $T_L$, $T_U$ such that $0 < T_L \le \tau_i < T_U$. We then “phase wrap” the data as follows. Let $x \in \mathbb{R}$, let $\lfloor x \rfloor$ be the floor function, and let $\langle x \rangle = x - \lfloor x \rfloor$, the fractional part of $x$. We define the mapping
$$\Phi_\rho(s_l) = \left\langle \frac{s_l}{\rho} \right\rangle = \frac{s_l}{\rho} - \left\lfloor \frac{s_l}{\rho} \right\rfloor ,$$
for $\rho \in [T_L, T_U)$. Therefore $\Phi_\rho(s_l) \in [0, 1)$. Weyl’s Theorem applies asymptotically. We will show that for almost every choice of $\rho$ (in the sense of Lebesgue measure) $\Phi_\rho(s_l)$ is essentially uniformly distributed in the sense of Weyl.
This is a manifestation of the structure of randomness over a continuous interval $[T_L, T_U)$.
(1.) 
For almost every choice of $\rho$ (in the sense of Lebesgue measure), $\Phi_\rho(s_l)$ is essentially uniformly distributed in the sense of Weyl.
(2.) 
The set of $\rho$’s for which this is not true consists of rational multiples of $\{\tau_i\}$. Since the rationals are countable and a finite union of countable sets is countable, this is a set of measure zero. Therefore, except for rational multiples of $\{\tau_i\}$, $\Phi_\rho(s_l)$ is essentially uniformly distributed in $[T_L, T_U)$.
Given $S_{iter} = \{ s_1, \dots, s_N \}$, we phase wrap the data by computing the modulus of the spectrum, i.e., compute
$$| \mathrm{Spec}_{iter}(\tau) | = \left| \sum_{l=1}^{N} e^{(2 \pi i s_l / \tau)} \right| .$$
The values of $| \mathrm{Spec}_{iter}(\tau) |$ will have peaks at the periods $\tau_j$ and, for $k > 1$, the harmonics $(\tau_j) / k$.
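The contrast between peaks at the generators and a flat response elsewhere can be seen on synthetic interleaved data. The sketch below is a toy illustration, not the EQUIMEA itself: the helper names, the two periods $1$ and $\sqrt{2}$, the phases, the gap model, and the probe value $1.2345$ are all assumptions, and no differencing iterations are performed.

```python
import cmath
import math
import random


def spec_modulus(s, tau):
    """|sum_l exp(2*pi*i*s_l / tau)| for event times s and candidate period tau."""
    return abs(sum(cmath.exp(2j * math.pi * sl / tau) for sl in s))


def generator(tau, phi, n, max_gap, rng):
    """n noise-free event times phi + k*tau with random gaps of 1..max_gap in k."""
    k, out = 0, []
    for _ in range(n):
        k += rng.randint(1, max_gap)
        out.append(phi + k * tau)
    return out


rng = random.Random(0)
tau1, tau2 = 1.0, math.sqrt(2.0)
s = sorted(generator(tau1, 0.3, 200, 5, rng) + generator(tau2, 0.9, 200, 5, rng))

for rho in (tau1, tau2, 1.2345):
    print(rho, round(spec_modulus(s, rho), 1))
```

At each true period the 200 events of that generator add coherently, while the events of the other generator, and all events at a generic $\rho$, contribute phases that are spread around the circle and largely cancel.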

3.1. Weyl’s Equidistribution Theorem

Weyl’s equidistribution Theorem 2 and its extension Theorem 3 are the theoretical foundations of the EQUIMEA. They play key roles for the EQUIMEA, analogous to the role Theorem 1 played for the MEA.
The idea of Weyl’s Theorem is that given a fixed irrational number $\gamma$, the sequence of numbers
$$\langle \gamma \rangle, \langle 2 \gamma \rangle, \dots, \langle k \gamma \rangle, \dots$$
is essentially uniformly scattered over $[0, 1)$. This can be easily visualized using some tools from topological dynamics. Create a lattice $\{ (m, n) : m, n \in \mathbb{N} \cup \{0\} \}$ in the upper right hand quadrant in $\mathbb{R}^2$. Now, draw a ray starting at $(0, 0)$ making an angle $\theta$ with the positive $x$-axis. If $\theta$ is a rational number, then the ray intersects a lattice point $(m, n)$ (and infinitely many other lattice points $(km, kn)$, $k \in \mathbb{N}$). However, if $\theta$ is irrational, it never intersects with any lattice points, but does come arbitrarily close to infinitely many lattice points. (For example, if $\theta$ is the Golden Mean, given any $\epsilon > 0$, there are infinitely many pairs of adjacent Fibonacci numbers within $\epsilon$ of the ray.) Now, consider the unit square with vertices $\{ (0, 0), (1, 0), (1, 1), (0, 1) \}$. Glue the opposite sides together, resulting in a torus $\mathbb{T}$. If $\theta$ is rational, the ray on the torus will be a closed loop. If $\theta$ is irrational, the ray never closes. However, given any $\epsilon > 0$ and any point $t \in \mathbb{T}$, the ray will appear in the $\epsilon$ disk around $t$ infinitely often. In fact, the rate at which the ray “fills in” the disk is related to the Diophantine approximation of $\theta$. Very deep work of Katok, Stepin, Margulis et al. [32,33] addresses this.
Fourier series play a key role in Weyl’s Theorem. The definition, from Dym and McKean [34], follows. Let $\exp(\cdot) = e^{(\cdot)}$.
Definition 1 (Fourier Series). 
Let $f$ be a periodic, absolutely and square integrable function on $\mathbb{R}$, with period $2\Psi$, i.e., $f \in L^1 \cap L^2(\mathbb{T}_{2\Psi})$. The Fourier coefficients of $f$, $\hat{f}[n]$, are defined by
$$\hat{f}[n] = \frac{1}{2\Psi} \int_{-\Psi}^{\Psi} f(t) \exp(- i \pi n t / \Psi) \, dt .$$
If $\{ \hat{f}[n] \}$ is absolutely and square summable ($\{ \hat{f}[n] \} \in l^1 \cap l^2$), then the Fourier series of $f$ is
$$f(t) = \sum_{n \in \mathbb{Z}} \hat{f}[n] \exp(i \pi n t / \Psi) .$$
The space $L^2(\mathbb{T}_{2\Psi})$ is the canonical example of a separable Hilbert space, and the set $\{ \exp(i \pi n t / \Psi) : n \in \mathbb{Z} \}$ is the canonical example of an orthonormal basis for $L^2(\mathbb{T}_{2\Psi})$.
Once again, let card { · } denote the cardinality of { · } .
Theorem 2 (Weyl). 
Given a fixed irrational number $\gamma$, then for every $a, b$ such that $0 \le a < b < 1$,
$$\lim_{n \to \infty} \frac{1}{n} \mathrm{card}\{ 1 \le k \le n : a \le \langle k \gamma \rangle \le b \} = (b - a) .$$
The theorem is strikingly intuitive. Weyl’s original proof used techniques and theory from Fourier series. Our proof follows the development in Körner [35], which follows Weyl’s original development.
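The statement can be observed numerically. The following is a small sketch, with the function name, the choice $\gamma = \sqrt{2}$, the interval $[0.2, 0.5]$, and the sample size all chosen here for illustration. It counts how often the fractional part $\langle k \gamma \rangle$ lands in $[a, b]$ and, for contrast, repeats the count for the rational value $\gamma = 1/4$.

```python
import math


def fraction_in_interval(gamma, n, a, b):
    """(1/n) * card{1 <= k <= n : a <= <k*gamma> <= b}, with <x> the fractional part."""
    hits = sum(1 for k in range(1, n + 1) if a <= (k * gamma) % 1.0 <= b)
    return hits / n


# Irrational gamma: the fraction approaches b - a = 0.3.
print(fraction_in_interval(math.sqrt(2.0), 100000, 0.2, 0.5))

# Rational gamma = 1/4: the fractional parts only take the values 0, 1/4, 1/2, 3/4,
# so an interval that avoids all four is never visited.
print(fraction_in_interval(0.25, 100000, 0.3, 0.45))
```

The second call returns zero for every sample size, since the orbit of a rational $\gamma$ is a finite set.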
We first prove the following lemma.
Lemma 2. 
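Let $\gamma$ be a fixed irrational number, and let $a, b$ be real numbers such that $0 \le a < b < 1$. Let $f : \mathbb{T} \to \mathbb{C}$ be a continuous function. Then
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(2 \pi k \gamma) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \, dt$$
and
$$\lim_{n \to \infty} \frac{1}{n} \mathrm{card}\{ 1 \le k \le n : (2 \pi k \gamma) \in [2 \pi a, 2 \pi b] \} = (b - a) .$$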
Proof of Lemma 2. 
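Let
$$G_n(f) = \frac{1}{n} \sum_{k=1}^{n} f(2 \pi k \gamma) - \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \, dt .$$
We want to show that $G_n(f) \to 0$ as $n \to \infty$. Note that
$$G_n(1) = \frac{1}{n} \sum_{k=1}^{n} 1 - \frac{1}{2\pi} \int_{-\pi}^{\pi} 1 \, dt = 0 .$$
If $t \in [-\pi, \pi]$ and $m \in \mathbb{Z}$, $m \ne 0$,
$$\begin{aligned} | G_n(\exp(i m t)) | &= \left| \frac{1}{n} \sum_{k=1}^{n} \exp(2 \pi i k m \gamma) - \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp(i m t) \, dt \right| \\ &= \left| \frac{1}{n} \exp(2 \pi i m \gamma) \sum_{k=0}^{n-1} \exp(2 \pi i k m \gamma) - 0 \right| \\ &= \left| \frac{1}{n} \exp(2 \pi i m \gamma) \, \frac{1 - \exp(2 \pi i n m \gamma)}{1 - \exp(2 \pi i m \gamma)} \right| \\ &= \frac{1}{n} \left| \frac{1 - \exp(2 \pi i n m \gamma)}{1 - \exp(2 \pi i m \gamma)} \right| \\ &\le \frac{2}{n} \, \frac{1}{| 1 - \exp(2 \pi i m \gamma) |} . \end{aligned}$$
Since $\gamma$ is irrational, $1 - \exp(2 \pi i m \gamma)$ is never zero for $m \ne 0$, and so
$$\frac{2}{n} \, \frac{1}{| 1 - \exp(2 \pi i m \gamma) |} \to 0 \text{ as } n \to \infty .$$
Now, let $P(t) = \sum_{m=-l}^{l} a_m \exp(i m t)$ be any trigonometric polynomial. Then by linearity, and (48) and (49),
$$G_n(P) = \sum_{m=-l}^{l} a_m G_n(\exp(i m t)) \to 0 \text{ as } n \to \infty .$$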
Now, let $g, h : \mathbb{T} \to \mathbb{C}$ be two continuous functions. Let $\epsilon > 0$ be given. Assume that
$$| g(t) - h(t) | < \epsilon$$
for all $t \in \mathbb{T}$. Then
$$| G_n(g) - G_n(h) | \le \frac{1}{n} \sum_{k=1}^{n} | g(2 \pi k \gamma) - h(2 \pi k \gamma) | + \frac{1}{2\pi} \int_{-\pi}^{\pi} | g(t) - h(t) | \, dt < \frac{1}{n} \, n \epsilon + \frac{1}{2\pi} \, 2 \pi \epsilon = \epsilon + \epsilon = 2 \epsilon .$$
We now use Weierstrass approximation (see [34,35]). By Weierstrass, given any continuous function $f : \mathbb{T} \to \mathbb{C}$, there exists a trigonometric polynomial $P(t)$ such that
$$\sup_{t \in \mathbb{T}} | P(t) - f(t) | \le \epsilon / 3 .$$
By (50) we can find $N$ such that for all $n > N$,
$$| G_n(P) | < \epsilon / 3 .$$
Thus, by (51),
$$| G_n(f) - G_n(P) | < 2 \epsilon / 3 .$$
Therefore, by the triangle inequality, for all $n > N$,
$$| G_n(f) | \le | G_n(P) | + | G_n(f) - G_n(P) | < \epsilon / 3 + 2 \epsilon / 3 = \epsilon .$$
To finish, we need two continuous partitions of unity $f_u$ and $f_l$ such that $f_u$ is one on $[2 \pi a, 2 \pi b]$, zero outside of $[2 \pi a - \epsilon, 2 \pi b + \epsilon]$, and $f_l$ is one on $[2 \pi a + \epsilon, 2 \pi b - \epsilon]$, zero outside of $[2 \pi a, 2 \pi b]$. By our computations above, there exists $N$ such that for all $n > N$,
$$| G_n(f_u) | < \epsilon , \quad | G_n(f_l) | < \epsilon .$$
Thus,
$$\frac{1}{2\pi} \int_{-\pi}^{\pi} f_u(t) \, dt + \epsilon \ge \frac{1}{n} \mathrm{card}\{ 1 \le k \le n : (2 \pi k \gamma) \in [2 \pi a, 2 \pi b] \} \ge \frac{1}{2\pi} \int_{-\pi}^{\pi} f_l(t) \, dt - \epsilon .$$
Since $\epsilon$ is arbitrary,
$$\frac{1}{n} \mathrm{card}\{ 1 \le k \le n : (2 \pi k \gamma) \in [2 \pi a, 2 \pi b] \} \to (b - a) \text{ as } n \to \infty .$$
This completes the proof of Lemma 2.    □
To finish the proof of Theorem 2, normalize the interval to $[0, 1)$.
Remark 7. 
If $\gamma$ is rational, i.e., there exist $k \in \mathbb{Z}$ and $n \in \mathbb{N}$ such that $\gamma = k / n$, then the conclusion of the Weyl equidistribution theorem is clearly false. For $m \in \mathbb{N}$, the sequence of values $\langle m \gamma \rangle$ is a finite set.
We finish this subsection by recalling some basic measure theory. A set $X$ is a set of measure zero if, given any $\epsilon > 0$, we can cover the set with open sets of total length less than $\epsilon$. Thus, any finite set $\{ x_1, \dots, x_n \}$ is of measure zero, for we can cover it with $\{ (x_1 - \epsilon / 4n, x_1 + \epsilon / 4n), \dots, (x_n - \epsilon / 4n, x_n + \epsilon / 4n) \}$. Similarly, any countable set $\{ x_1, \dots, x_n, \dots \}$ is of measure zero, for we can cover it with $\{ (x_1 - \epsilon / 4, x_1 + \epsilon / 4), \dots, (x_n - \epsilon / 4^n, x_n + \epsilon / 4^n), \dots \}$. The set of rationals $\mathbb{Q}$ is countable, and therefore of measure zero. Given an interval $[a, b]$ in $\mathbb{R}$, and two measurable functions $f, g$ defined on $[a, b]$, we say that $f = g$ a.e. (almost everywhere) if they differ only on a set of measure zero.
The space of functions that are absolutely and square Lebesgue integrable on the real line is denoted by $L^1 \cap L^2(\mathbb{R})$. If $f$ is a periodic, absolutely and square integrable function on $\mathbb{R}$, with period $2\Psi$, i.e., $f \in L^1 \cap L^2(\mathbb{T}_{2\Psi})$, then $f$ may be expanded in a Fourier series.

3.2. Extending Weyl to Measures

A given element of our data set has the form $\varphi_i + k_{ij} \tau_i + \eta_{ij}$. The data elements are then mixed by an iterative process of sorting and subtraction. We again assume that the different periods or “generators” $\{\tau_i\}$ come from different processes. Therefore, we can assume that the difference of data elements
$$(\varphi_i + k_{ij} \tau_i + \eta_{ij}) - (\varphi_{i'} + k_{i'j'} \tau_{i'} + \eta_{i'j'}) ,$$
for $i \ne i'$, is incommensurate to both $\tau_i$ and $\tau_{i'}$, i.e., is not equal to a rational multiple of either number. The relative primeness of data generated by one generator will backfill the missing elements for that generator, whereas the data from two different generators will become Weyl flat. Theoretically, if we allowed the process to go on indefinitely, it would become ergodic, with the sets of differences from different generators becoming a set of full measure, while the set of differences from the same generator remains a set of measure zero. Background for this section includes Blum and Mizel [36], Breiman [37], and Walters [38]. In particular, we base our discussion on Chapter 6 of Breiman [37] and Chapters 1–4 of Walters [38].
Recall that a probability space consists of a set $X$, a collection $\mathcal{B}$ of Borel subsets of $X$, and $P$, an additive measure normalized so that $P(X) = 1$.
Definition 2. 
Suppose that $(X_1, \mathcal{B}_1, P_1)$ and $(X_2, \mathcal{B}_2, P_2)$ are probability spaces.
(a.) A transformation $T : X_1 \to X_2$ is measurable if
$$B_2 \in \mathcal{B}_2 \implies T^{-1}(B_2) \in \mathcal{B}_1.$$
(b.) A transformation $T : X_1 \to X_2$ is measure-preserving if $T$ is measurable and
$$P_1(T^{-1}(B_2)) = P_2(B_2) \quad \text{for all } B_2 \in \mathcal{B}_2.$$
The term ergodic is an amalgamation of the Greek words ergon (work) and odos (path). It was first used by Boltzmann to describe the action $T_\theta(t) : t \in \mathbb{R}$ on an energy surface. Our discussion of the path of the ray on the torus $\mathbb{T}$ is relevant. The intuition is that we have an ergodic process if $\theta$ is irrational and do not have one if $\theta$ is rational.
Definition 3. 
Let $(X, \mathcal{B}, P)$ be a probability space. A measure-preserving transformation $T$ of $(X, \mathcal{B}, P)$ is called ergodic if the only members $B$ of $\mathcal{B}$ with the property that $T^{-1}(B) = B$ satisfy $P(B) = 0$ or $P(B) = 1$.
For a random variable, we say that a property holds almost surely if it holds except on a set of measure zero.
Definition 4. 
A sequence of real random variables $\{x_j\} \subset [0, 1)$ is essentially uniformly distributed in the sense of Weyl if, given $a, b$ with $0 \leq a < b < 1$, $\frac{1}{n}\,\mathrm{card}\{1 \leq j \leq n : x_j \in [a, b]\} \to (b - a)$ as $n \to \infty$ almost surely.
Remark 8. 
We refer to data elements that become essentially uniformly distributed in the sense of Weyl as Weyl flat.
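The counting condition in Definition 4 can be illustrated numerically. The following is a minimal sketch for the classical deterministic case $x_j = j\theta \bmod 1$ (not the random-variable setting of the definition); the function name `fraction_in_interval`, the interval $[0.2, 0.5]$, and the sample sizes are illustrative assumptions.

```python
import math

def fraction_in_interval(theta, n, a, b):
    """Fraction of the points {j*theta mod 1 : 1 <= j <= n} that land in [a, b]."""
    hits = sum(1 for j in range(1, n + 1) if a <= (j * theta) % 1.0 <= b)
    return hits / n

phi = (1 + math.sqrt(5)) / 2   # irrational rotation (Golden Mean)
a, b = 0.2, 0.5                # hypothetical test interval of length 0.3

# For an irrational rotation the fraction should approach b - a = 0.3.
for n in (100, 1000, 10000):
    print(n, fraction_in_interval(phi, n, a, b))

# A rational rotation only visits finitely many points, so it need not equidistribute:
# theta = 1/2 alternates between 0.5 and 0.0, and only 0.5 lies in [a, b].
print(fraction_in_interval(0.5, 10000, a, b))
```

The contrast between the two cases is the intuition behind calling the irrational case “Weyl flat”.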
To apply this measure-theoretic variation of Weyl’s Theorem to our data sets, we have to assume that the data sets $\{S_i\}$ are not finite. The way to do this is to assume that, for each $i$, $\{k_{ij}\}$ is a linearly increasing infinite sequence of natural numbers with missing observations such that $k_{ij} \to \infty$ as $j \to \infty$. We must make this assumption because the result is only approximately true for a finite-length sequence.
Theorem 3. 
For almost every choice of $\rho$ (in the sense of Lebesgue measure), $\Phi_\rho(s_l)$ is essentially uniformly distributed in the sense of Weyl.
Proof. 
Let $\rho \in [T_L, T_U]$, let $\lfloor \cdot \rfloor$ be the floor function, and let $\langle \cdot \rangle$ be the fractional part. The mapping
$$\Phi_\rho(s_l) = \left\langle \frac{s_l}{\rho} \right\rangle = \frac{s_l}{\rho} - \left\lfloor \frac{s_l}{\rho} \right\rfloor,$$
is a measurable and measure-preserving mapping into $[0, 1)$.
Claim. For a.e. choice of $\rho$, $\Phi_\rho$ is ergodic.    □
Proof of Claim. 
Normalize $[T_L, T_U)$ to $[0, 1)$. Let $X$ be an absolutely and square Lebesgue integrable random variable, i.e., $X \in L^1 \cap L^2$. Therefore, we can expand $X$ in a Fourier series
$$X(t) = \sum_{n \in \mathbb{Z}} \hat{X}[n] \exp(i \pi n t),$$
with the Fourier coefficients $\hat{X}[n]$ given by
$$\hat{X}[n] = \int_0^1 X(t) \exp(-i \pi n t)\, dt.$$
Then
$$X(\Phi_\rho(t)) = \sum_{n \in \mathbb{Z}} \hat{X}[n] \exp(i \pi n t / \rho).$$
For $X$ to be invariant, for a.e. $t$,
$$\hat{X}[n]\,(1 - \exp(i \pi n t / \rho)) = 0 \quad \text{for all } n.$$
This implies either
$$\hat{X}[n] = 0 \quad \text{or} \quad \exp(i \pi n t / \rho) = 1.$$
But since $\rho$ is irrational a.e., by Weyl’s equidistribution theorem,
$$\exp(i \pi n t / \rho) \neq 1$$
on a set of full measure.    □
Moreover, the set of $\rho$’s for which this is not true consists of rational multiples of $\{\tau_i\}$. Since $\mathbb{Q}$ is a countable set, and a finite union of countable sets is countable, the set of rational multiples of $\{\tau_i\}$ is countable, and therefore a set of measure zero. Thus, except for those values, $\Phi_\rho(s_l)$ is essentially uniformly distributed in $[0, 1)$. The values at which $\Phi_\rho(s_l) = 0$ almost surely are $\rho \in \{\tau_i / n : n \in \mathbb{N}\}$, which is a set of measure zero.

3.3. The EQUIMEA

The EQUIMEA algorithm is built upon the MEA, relying on the MEA procedure to backfill data elements from a given fixed generator. It has numerous applications [6]. The EQUIMEA uses Theorem 3 to separate out differenced elements from different generators. We difference as in the MEA (see Steps (1.)–(4.) of the EQUIMEA), but we compute and save all of the differences. We repeat this iter times, saving the elements from each iteration. We form a union of all of these data elements (see Step (5.)).
The set $S$ is the union of the data generated by the $M$ periodic processes, i.e., $S = \bigcup_{i=1}^{M} S_i = \bigcup_{i=1}^{M} \{\varphi_i + k_{ij}\tau_i + \eta_{ij}\}_{j=1}^{n_i}$. Sort $S$ in descending order and reindex. Let $\mathrm{card}\{S\} = N$, and let $\mathrm{Range}\{S\} = s_{\max} - s_{\min}$.
The relative primeness of data generated by one generator will “fill in” the missing elements for that generator (see Proposition 2). In contrast, the differenced data from two or more different generators will become Weyl flat.
We rely on a Wiener periodogram and Theorem 3 to sift out the data from a given fixed generator from the differenced data from two or more different generators (see Steps (6.)–(11.)).
The equidistribution of $\Phi_\rho(s_l)$ for a.e. $\rho$ leads to a Weyl flat range for $S$ for
$$\rho \notin \{\tau_i / n : n \in \mathbb{N}\}.$$
Given that we have to produce an answer in finite time, and therefore have to terminate, we pre-set a noise floor $\eta_0$ for the MEA component and a degree of accuracy parameter $E_\eta$, which sets an iterative convergence baseline. The convergence rate of the algorithm is related to $\eta_0$ and $E_\eta$. At each step, the data is sorted in descending order
$$S_{iter} = \{s_l\}_{l=1}^{N} = \{s_1, \ldots, s_N\}.$$
We phase wrap the data by computing the modulus of the spectrum, i.e., compute
$$|\mathrm{Spec}_{iter}(\tau)| = \left| \sum_{l=1}^{N} e^{(2 \pi i s_l / \tau)} \right|.$$
The values of $|\mathrm{Spec}_{iter}(\tau)|$ will have peaks at the periods $\tau_i$ and their harmonics $(\tau_i)/k$. Recall, letting $\rho \in [T_L, T_U]$, we define
$$\Phi_\rho(s_l) = \left\langle \frac{s_l}{\rho} \right\rangle = \frac{s_l}{\rho} - \left\lfloor \frac{s_l}{\rho} \right\rfloor,$$
a function with range $[0, 1)$. For a.e. values of $\rho$, $\Phi_\rho(s_l)$ is essentially uniformly distributed in the sense of Weyl. Moreover, the set of $\rho$’s for which this is not true consists of rational multiples of $\{\tau_i\}$. The values at which $\Phi_\rho(s_l) = 0$ are almost surely $\rho \in \{\tau_i / n : n \in \mathbb{N}\}$, which is a set of measure zero. This is why we get peaks at $\{\tau_i / n\}$.
Given $|\mathrm{Spec}_{iter}(\tau)|$, choose the rightmost peak. Label it as $\tau_{iter}$. The parameter $E_\eta$ is used as a difference between iterations. If
$$|\mathrm{Spec}_{iter}(\tau_{iter})| > E_\eta,$$
declare $\hat{\tau}_i = \tau_{iter}$.
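The phase-wrapping step can be sketched in a few lines. This is a minimal illustration of the sum $|\sum_l e^{2\pi i s_l/\tau}|$ on a grid of trial periods, not the author's implementation; the function name `spectrum_modulus`, the synthetic noise-free data, and the four trial periods are assumptions made for the example.

```python
import numpy as np

def spectrum_modulus(s, taus):
    """|Spec(tau)| = |sum_l exp(2*pi*i*s_l/tau)| for each trial period tau."""
    s = np.asarray(s, dtype=float)
    taus = np.asarray(taus, dtype=float)
    phases = 2.0 * np.pi * s[None, :] / taus[:, None]   # one row per trial period
    return np.abs(np.exp(1j * phases).sum(axis=1))

# Hypothetical noise-free data: multiples k*tau_true with about 90% of the k's removed.
rng = np.random.default_rng(0)
tau_true = 1.0
k = np.sort(rng.choice(np.arange(1, 1001), size=100, replace=False))
s = k * tau_true

taus = np.array([0.37, 0.5, 0.83, 1.0])
spec = spectrum_modulus(s, taus)
for tau, value in zip(taus, spec):
    print(f"tau = {tau:4.2f}   |Spec| = {value:7.2f}")
```

At the true period and at its harmonic $\tau/2$ every term has phase a multiple of $2\pi$, so the modulus equals the number of data points; at the other trial periods the terms largely cancel.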
The EQUIMEA Algorithm: Multiple Periods
Initialize: Sort the elements of $S$ in descending order. Import $\eta_0$. [Eliminate $\varphi$’s and initial noise.] Form the new set with elements $(s_l - s_{l+1})$. If $0 \leq s_j \leq \eta_0$, then $S \leftarrow S \setminus \{s_j\}$. Set iter $= 1$, $i = 1$, and compute $E_\eta$.
(1.) [Adjoin 0.] $S_{iter} \leftarrow S \cup \{0\}$.
(2.) [Sort.] Sort the elements of $S_{iter}$ in descending order.
(3.) [Compute all differences. This backfills the set $S$.] Set $S_{iter} = \{s_j - s_k : s_j > s_k\}$.
(4.) [Eliminate noise.] If $0 \leq s_j \leq \eta_0$, then $S \leftarrow S \setminus \{s_j\}$.
(5.) [Adjoin previous iteration.] Form $S_{iter} \leftarrow S_{iter} \cup S_{iter-1}$, sort and reindex.
(6.) [Compute spectrum.] Compute $|\mathrm{Spec}_{iter}(\tau)| = \left| \sum_{l=1}^{N} e^{(2 \pi i s_l / \tau)} \right|$.
(7.) [Threshold.] Choose the rightmost peak. Label it as $\tau_{iter}$.
(8.) [Test.] If $|\mathrm{Spec}_{iter}(\tau_{iter})| > E_\eta$, declare $\hat{\tau}_i = \tau_{iter}$. If not, iter $\leftarrow$ (iter $+ 1$). Go to (1.).
(9.) [Remove $\hat{\tau}_i$ and harmonics.] Given $\hat{\tau}_i$, remove it and its harmonics, i.e., $|\mathrm{Spec}_{iter}(\tau)|$ for $\hat{\tau}_i / m$, $m \in \mathbb{N}$. Label as $\mathrm{Notch}_{iter}(\tau)$.
(10.) [Recompute frequency notched spectrum.] Compute $|\mathrm{Spec}_{iter}(\tau) - \mathrm{Notch}_{iter}(\tau)|$.
(11.) [Threshold.] If $|\mathrm{Spec}_{iter}(\tau) - \mathrm{Notch}_{iter}(\tau)| < E_\eta$, the algorithm terminates. Else, let $i \leftarrow i + 1$. Go to step (7.).
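The differencing part of the algorithm (the initialization and Steps (1.) to (5.)) can be sketched as follows. This is an illustrative reading of those steps under stated assumptions, not the author's code: the helper names `initialize` and `difference_step`, the noise floor value, and the synthetic single-period data are all hypothetical, and the spectral Steps (6.) to (11.) are omitted.

```python
import numpy as np

def initialize(s, eta0):
    """Adjacent differences of the sorted data: removes the phase, drops values <= eta0."""
    s = np.sort(np.asarray(s, dtype=float))[::-1]      # descending order
    d = s[:-1] - s[1:]
    return d[d > eta0]

def difference_step(s_prev, eta0):
    """Steps (1.)-(5.): adjoin 0, sort, take all positive differences,
    drop values at or below the noise floor, and adjoin the previous set."""
    s = np.sort(np.append(s_prev, 0.0))[::-1]          # (1.) adjoin 0, (2.) sort
    diffs = (s[:, None] - s[None, :]).ravel()          # (3.) all pairwise differences
    diffs = diffs[diffs > eta0]                        # (4.) keeps only s_j - s_k > eta0
    return np.sort(np.union1d(diffs, s_prev))[::-1]    # (5.) union with previous set

# Hypothetical sparse data: phase 0.3, period 1, 30 of 300 events observed, no jitter.
rng = np.random.default_rng(1)
k = np.sort(rng.choice(np.arange(1, 301), size=30, replace=False))
data = 0.3 + 1.0 * k

s0 = initialize(data, eta0=0.05)
s1 = difference_step(s0, eta0=0.05)
print(len(s0), len(s1))
```

In this noise-free example every element of `s0` and `s1` is (up to rounding) an integer multiple of the period, and the differencing step adds multiples that were absent from `s0`, which is the backfilling effect the text describes. With real data the pairwise difference set grows quadratically, so the set sizes need to be watched.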
We then have to deinterleave the data. We use a standard discrete matched filtering algorithm, correlating a known delayed signal (a “template”) with an unknown signal to detect the presence of the template in the unknown signal. Here, our known signal has the form
$$\sum_{k \in \mathbb{Z}} \delta(t - k\tau_i),$$
a pulse train of period $\tau_i$ with no missing observations, and our unknown signal consists of those elements of the original data $S$ generated by all of the underlying periods $\Gamma = \{\tau_i\}$.
Given the original data and the set of generating periods $\{\tau_1, \ldots, \tau_n\}$, convolve the data with
$$\sum_{k \in \mathbb{Z}} \delta(t - k\tau_1).$$
This convolution will identify the elements in the original data set $S$ that are generated by the generating period $\tau_1$. Call these elements $S_{\tau_1}$.
Let $S_2 = S \setminus S_{\tau_1}$. Convolve $S_2$ with
$$\sum_{k \in \mathbb{Z}} \delta(t - k\tau_2).$$
This second convolution will identify the elements in the data set $S_2$ that are generated by the generating period $\tau_2$. Call these elements $S_{\tau_2}$. Let $S_3 = S_2 \setminus S_{\tau_2}$. Repeat the process for $\tau_3$ up to $\tau_n$.
This process deinterleaves the data into components generated by individual generators $\tau_i$. Further analysis can be carried out on these individual components, e.g., using the analysis outlined in Section 2.5.
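The successive extraction described above can be sketched as follows. This is a simplified stand-in for the matched filter, not the convolution itself: instead of convolving with a pulse train, it folds the event times modulo $\tau$ and keeps the events whose residues cluster at a common phase. The helper names `extract_component` and `deinterleave`, the tolerance `tol`, and the synthetic noise-free data are assumptions; with jittered data `tol` would have to be matched to the jitter level.

```python
import numpy as np

def extract_component(s, tau, tol):
    """Boolean mask of the events lying within tol of phase + k*tau,
    where the phase is the data residue (mod tau) that collects the most events."""
    s = np.asarray(s, dtype=float)
    r = np.mod(s, tau)
    d = np.abs(r[:, None] - r[None, :])
    d = np.minimum(d, tau - d)               # circular distance between residues
    counts = (d <= tol).sum(axis=1)
    best = np.argmax(counts)
    return d[best] <= tol

def deinterleave(s, periods, tol):
    """Peel off one component per period, in the order given."""
    remaining = np.asarray(s, dtype=float)
    parts = {}
    for tau in periods:
        mask = extract_component(remaining, tau, tol)
        parts[tau] = remaining[mask]
        remaining = remaining[~mask]
    return parts, remaining

# Hypothetical data: two sparse, noise-free processes with periods 1 and the Golden Mean.
phi = (1 + np.sqrt(5)) / 2
rng = np.random.default_rng(2)
k1 = rng.choice(np.arange(1, 301), size=30, replace=False)
k2 = rng.choice(np.arange(1, 301), size=30, replace=False)
s_one = 0.25 + 1.0 * k1
s_phi = 0.60 + phi * k2
data = np.sort(np.concatenate([s_one, s_phi]))

parts, leftover = deinterleave(data, [phi, 1.0], tol=1e-6)
print(len(parts[phi]), len(parts[1.0]), len(leftover))
```

The pairwise residue comparison is quadratic in the number of events, which is acceptable for a sketch but would need a histogram or sorting-based variant for large data sets.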

The Parameter $E_\eta$

The parameter $E_\eta$ sets an iterative convergence baseline. The convergence rate of the algorithm is related to $\eta_0$ and $E_\eta$.
Remark 9. 
If we had no missing observations and minimal noise, we could proceed as follows. Form the pulse train $\sum_{l=1}^{N} \delta(\tau - s_l)$, and then compute the data correlation
$$C(\tau) = \sum_{l=1}^{N} \sum_{m=1}^{l} \delta(\tau - s_l + s_m).$$
Phase wrap the data by computing the modulus of the spectrum, i.e., computing
$$|S_{iter}(\tau)| = \left| \sum_{l=1}^{N} \sum_{m=1}^{l} \delta(\tau - s_l + s_m)\, e^{\left(2 \pi i \left(\frac{s_l}{(s_l - s_m)}\right)\right)} \right|.$$
The values of $|S_{iter}(\tau)|$ will have peaks at the periods $\tau_j$. This relies on the following fact. If $\omega$ is an $n$th root of unity not equal to 1, then $1 + \omega + \cdots + \omega^{n-1} = 0$.
However, given that we have (possibly very many) missing observations in our data, computing $|S_{iter}(\tau)|$ will not produce clearly defined underlying periods. This is why we backfill to reinforce the data so that $|S_{iter}(\tau)|$ reveals the underlying periods (see Step 5 in the EQUIMEA Algorithm (Section 2.1)).
The parameter $E_\eta$ is computed from the initialized data. (A variation, with analysis, for setting $E_\eta$ in certain cases is given in [39]. Their model, however, assumes no missing observations.) It is a type of “spectral envelope,” but only requires the data from the first iteration $S_1 = \{s_l\}_{l=1}^{N}$. It is computed in the initialization step. We have assumed a priori that the periods $\{\tau_i\}$ lie in a range $[T_L, T_U)$. Let $T = T_U - T_L$. Given $N$ data points, segment the data into $N$ bins of width $T/N$. For each of these bins, we can compute the correlation and the spectrum of the data elements in the $K$th bin. Let this data be represented as $S_K = \{s_1, s_2, \ldots, s_{N_K}\}$. From the pulse train $\sum_{l=1}^{N_K} \delta(\tau - s_l)$, compute the data correlation
$$C_K(\tau) = \sum_{l=1}^{N_K} \sum_{m=1}^{l} \delta(\tau - s_l + s_m)$$
and the modulus of the spectrum, i.e., compute
$$S_K(\tau) = \left| \sum_{l=1}^{N_K} \sum_{m=1}^{l} e^{\left(2 \pi i \left(\frac{s_l}{(s_l - s_m)}\right)\right)} \right|$$
for $s_l$ in the $K$th bin. Then
$$E_\eta = \max\{C_K, S_K\}.$$
Given the rightmost peak, which is labeled as $\tau_{iter}$, if
$$|\mathrm{Spec}_{iter}(\tau_{iter})| > E_\eta,$$
declare $\hat{\tau}_i = \tau_{iter}$.

4. EQUIMEA Simulation Results

We now present the analysis of three data sets. The first had a single periodic generator, the second, two generators, and the third, three generators. All of these were sparse data sets, with ≈90% missing data elements. We used the EQUIMEA algorithm to isolate the periodicities in the data. Once the periods were isolated, we used convolution-matched filtering algorithms to deinterleave the data sets.
The data set in Figure 1 is generated by $\tau = 1$, with 90% of the information randomly removed, and 10% jitter noise. The data set had 1000 total points. (For each original data set, we show a snapshot of the data in the interval $[0, 100]$ and the deinterleaved data in $[0, 200]$.) Figure 2 shows the graph of $|\mathrm{Spec}_{iter}|$ after three iterations of the EQUIMEA. The value $\hat{\tau}$ is the rightmost peak. Recall that, given a one period data set $S$, the period of that data set is the unique maximum value of $\tau$ such that the $k_j$ are all integers. Note that for any period $\tau_i$ that fits the form of $S$, the numbers $\tau_i/2, \tau_i/3, \ldots$ (the harmonics of $\tau_i$) are also possible values. Thus, what one sees in all of the spectral outputs are peaks at the values $\tau_i, \tau_i/2, \tau_i/3, \ldots$. This explains the need for the following steps in the EQUIMEA algorithm:
(7.) [Threshold.] Choose the rightmost peak. Label it as $\tau_{iter}$.
(8.) [Test.] If $|\mathrm{Spec}_{iter}(\tau_{iter})| > E_\eta$, declare $\hat{\tau}_i = \tau_{iter}$. If not, iter $\leftarrow$ (iter $+ 1$). Go to (1.).
(9.) [Remove $\hat{\tau}_i$ and harmonics.] Given $\hat{\tau}_i$, remove it and its harmonics, i.e., $|\mathrm{Spec}_{iter}(\tau)|$ for $\hat{\tau}_i / m$, $m \in \mathbb{N}$. Label as $\mathrm{Notch}_{iter}(\tau)$.
(10.) [Recompute frequency notched spectrum.] Compute $|\mathrm{Spec}_{iter}(\tau) - \mathrm{Notch}_{iter}(\tau)|$.
Also note that the EQUIMEA algorithm can extract the period from extremely sparse data (90% of the information randomly removed, and 10% jitter noise; see Figure 1), and tell us, after the “frequency notching step,” that there is only one period in the data set (Figure 2).
Figure 1. Sparse data set containing one underlying period.
Figure 2. One period EQUIter3 spectrum.
The data in Figure 3 had two underlying periods equaling 1 and $\phi = (1 + \sqrt{5})/2$, with 90% of the information randomly removed and 10% jitter noise. Figure 4 shows $|\mathrm{Spec}_{iter}|$ after two iterations. By proceeding right to left, one can easily see the two underlying periods, $\frac{1 + \sqrt{5}}{2}$ and 1. Each will reinforce the data elements from the specific generator, allowing the elements to be extracted. The algorithm backfilled the multiples of 1 and $\frac{1 + \sqrt{5}}{2}$. The sufficient amount of data and the ergodicity of the periods relative to each other both played significant roles in making the periods stand out against the Weyl flat data.
The strengths of the signals $|\mathrm{Spec}_{iter}|$ are reliant on the fact that the generators are incommensurate, and so the differences of data elements $(\varphi_i + k_{ij}\tau_i + \eta_{ij}) - (\varphi_{i'} + k_{i'j'}\tau_{i'} + \eta_{i'j'})$, for $i \neq i'$, are incommensurate to $\tau_i$ and $\tau_{i'}$. These data elements in $|\mathrm{Spec}_{iter}|$ become Weyl flat. But this is a function of the degree of the accuracy of the computation, the ergodicity of the periods relative to each other, and, if applicable, Diophantine approximations of the periods. We again note that very deep work of Katok, Stepin, Margulis, et al. [32,33] addresses this.
Figure 4 shows $|\mathrm{Spec}_{iter}|$ after two iterations. The two underlying periods, $\frac{1 + \sqrt{5}}{2}$ and 1, are clear as the peaks furthest to the right. Deinterleaving was then done using a matched filtering technique, convolving the original data set with the individual pulse trains
$$\sum_{k \in \mathbb{Z}} \delta(t - k(1 + \sqrt{5})/2) \quad \text{and} \quad \sum_{k \in \mathbb{Z}} \delta(t - k).$$
Each pulse train reinforces the data elements from the specific generator, allowing the elements to be extracted. Figure 5 shows the data deinterleaved, demarked by color. The red elements were generated by 1, the green by $\frac{1 + \sqrt{5}}{2}$.
The data in Figure 6 had three underlying periods equaling $\tau_1 = 1$, $\tau_2 = \phi = (1 + \sqrt{5})/2$, and $\tau_3 = \sqrt{7}$. The data set had 90% of the information randomly removed and 10% jitter noise. The original data is the union of three data sets, totaling 3000 data points. Figure 7 shows $|\mathrm{Spec}_{iter}|$ after two iterations of the EQUIMEA. By proceeding right to left, one can easily see the three underlying periods: $\sqrt{7}$, $\frac{1 + \sqrt{5}}{2}$, and 1. (Step 9 (“Frequency notching”) eliminates the first harmonic peak of $\sqrt{7}$, located at $\sqrt{7}/2$, and so the peak at 1 stands out.) Deinterleaving was then done using a matched filtering technique, convolving the original data set with the individual pulse trains
$$\sum_{k \in \mathbb{Z}} \delta(t - k\sqrt{7}), \quad \sum_{k \in \mathbb{Z}} \delta(t - k(1 + \sqrt{5})/2), \quad \text{and} \quad \sum_{k \in \mathbb{Z}} \delta(t - k).$$
Again, each pulse train reinforces the data elements from the specific generator, allowing the elements to be extracted. Figure 8 shows the data deinterleaved, demarked by color. The red elements were generated by 1, the green by $\frac{1 + \sqrt{5}}{2}$, and the cyan by $\sqrt{7}$.
The EQUIMEA is reliant on the “incommensurate nature” of the periods to each other. This manifests itself in the difference of data elements
$$(\varphi_i + k_{ij}\tau_i + \eta_{ij}) - (\varphi_{i'} + k_{i'j'}\tau_{i'} + \eta_{i'j'})$$
for $i \neq i'$. The elements are almost surely incommensurate to $\tau_i$ and $\tau_{i'}$, i.e., not equal to a rational multiple of either number.
We note that if two underlying processes have commensurate (or even equal) periods, the EQUIMEA will first capture the largest of the periods (or the period). It is highly probable that the two or more events will not have the same phase. When this happens, the original data can then be separated by the deinterleaving process.

The “Spectrum” of Point Processes Tools

We close this section by addressing the role the number theoretic methods can play in the “spectrum” of the tools used to analyze point processes. The focus of this paper (the MEA/EQUIMEA approach) is only one component of an extensive toolbox to attack point process problems. The choice of the best tool to use comes from a preliminary analysis of the data, including characteristics of the data, how it was generated, and a sense of the SNR and the sparsity of the data elements.
Least squares, periodograms, and the MEA/EQUIMEA are all useful in the analysis of general point process data. The sparsity, as determined by the $k_j$’s, provides guidance toward the best procedure for analyzing this data. Given a sequence of consecutive $k_j$’s or even an explicit knowledge of the $k_j$’s, and assuming white Gaussian noise, we can get a maximum likelihood estimate using least squares. Periodograms work with some missing observations, but when the percentage of missing observations is too large (>50–75%), they begin to break down. Number theoretic methods can work with very sparse data sets (>90% missing observations). There is, however, a trade-off. Given 10 data elements with a high signal-to-noise ratio and no outliers, the number theoretic methods can produce reliable estimates. But these too can break down. Given an extremely noisy environment and a large number of outliers, the number theoretic methods require considerably larger data sets to get reliable estimates. If the period $\tau$ is much lower than the noise, the MEA will create false outliers, producing unreliable answers.

5. Theory vs. Computation

We finish the paper with a discussion of computability. The following are well-known results on Fibonacci numbers and the Golden Mean $\phi$.
Lemma 3 
(Fibonacci and $\phi$). Let $F_1 = F_2 = 1$, $F_{n+1} = F_n + F_{n-1}$, the Fibonacci numbers, and $\phi = (1 + \sqrt{5})/2$. Then $\phi = [1, 1, 1, 1, \ldots]$, i.e.,
$$\phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}}$$
and
$$\lim_{n \to \infty} F_{n+1} / F_n = \phi.$$
Thus, given any $\epsilon > 0$, there exist infinitely many Fibonacci numbers such that
$$|(F_{n+1} / F_n) - \phi| < \epsilon.$$
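The convergence in Lemma 3 is easy to check numerically. The sketch below is an illustration only (the variable names and the cutoff $n = 20$ are arbitrary choices); it also computes $F_n\,|F_n\phi - F_{n+1}|$, a standard measure of how slowly the rational approximations to $\phi$ improve.

```python
import math

phi = (1 + math.sqrt(5)) / 2

a, b = 1, 1                      # F_1, F_2
errors = []                      # |F_{n+1}/F_n - phi| for n = 1, ..., 20
for n in range(1, 21):
    errors.append(abs(b / a - phi))
    last_pair = (a, b)           # (F_n, F_{n+1}) for the current n
    a, b = b, a + b

for n in (1, 5, 10, 20):
    print(n, errors[n - 1])

# Scaled error q * |q*phi - p| for p/q = F_21/F_20; it stays near 1/sqrt(5)
# rather than shrinking, which is the sense in which phi is poorly approximated.
q, p = last_pair
scaled = q * abs(q * phi - p)
print(scaled, 1 / math.sqrt(5))
```

The ratio errors shrink monotonically, but only at the rate $1/F_n^2$, in contrast with the Liouville constant discussed next.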
Ergodicity plays a key role in the efficiency of the EQUIMEA. We set up computational experiments using two numbers:
$$\phi = \frac{1 + \sqrt{5}}{2} \quad \text{and} \quad L = \sum_{k=1}^{\infty} 1/10^{k!}.$$
Here, $\phi$, the Golden Mean, is poorly approximated by rationals, whereas $L$ is the Liouville constant, a transcendental number well approximated by rationals. We first note that the EQUIMEA algorithm on data sets with two generators, 1 and $L$, will not work. The number $L$ is irrational, but it is also well approximated by rationals. In fact, $L = 0.110001000000000000000001\ldots$ (with ones in decimal places 1, 2, 6, 24, and so on), and the gaps between non-zero digits become so large that the EQUIMEA numerically views it as a rational. Thus, even though $L$ is irrational, and therefore Weyl’s Theorem holds, the computations of the EQUIMEA cannot separate the periods. The key, as we will show, is ergodicity.
Definition 5. 
A real number $\beta$ is a Liouville number, denoted by $\beta \in \mathcal{L}$, if $\beta$ is irrational and if for every integer $m \geq 2$, there exist integers $p, q$ with $q \geq 2$ such that
$$\left| \beta - \frac{p}{q} \right| < \frac{1}{|q|^m}.$$
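Definition 5 can be checked for the Liouville constant with exact rational arithmetic. The sketch below is an illustration under a stated assumption: since $L$ is an infinite series, a five-term partial sum (`L_proxy`) stands in for it, which neglects a tail smaller than $2 \cdot 10^{-720}$. The helper name `liouville_partial` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def liouville_partial(k_max):
    """Partial sum sum_{k=1}^{k_max} 10^(-k!) as an exact rational."""
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(1, k_max + 1))

L_proxy = liouville_partial(5)       # stand-in for L (tail below 2e-720 is neglected)

checks = []
for m in range(2, 5):                # exponents m = 2, 3, 4 from Definition 5
    approx = liouville_partial(m)    # rational p/q with q = 10^(m!)
    q = approx.denominator
    checks.append(abs(L_proxy - approx) < Fraction(1, q ** m))

print(liouville_partial(3))          # the first three terms as a fraction
print(checks)
```

Each partial sum $p/q$ with $q = 10^{m!}$ misses the target by less than $1/q^m$, because the next term of the series is $10^{-(m+1)!}$, far smaller than $10^{-m \cdot m!}$.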
Liouville developed the Liouville numbers $\mathcal{L}$ as a special class of transcendentals. The complement of $\mathcal{L}$ includes numbers that are poorly approximated by rationals. The Golden Mean $\phi = (1 + \sqrt{5})/2$ is the least rapidly approximated, and this is seen in its continuant $[1; \bar{1}]$. In contrast, the continuant of $L$ is $[0; 9, 11, 99, \ldots]$, with gaps between increasingly large sequences of 9’s. In terms of their approximation by rationals, $\phi$ and $L$ are quite different. $|\mathrm{Spec}_{iter}|$ will become Weyl flat very quickly for the two periods 1 and $\phi$ (see Figure 3). The ergodicity of the periods relative to each other plays a role in making the periods stand out against the “flat” data. In this sense, although Weyl’s theorem holds for all irrationals, its “numerical ergodicity” is a function of the continuants, and where the behaviour of these continuants lies, with $\phi$ and $L$ being at opposite extremes. This is evident in the following numerical computation.
We plot $\exp(2 \pi i n \tau_j)$ for
$$\tau_1 = \phi = \frac{1 + \sqrt{5}}{2} \quad \text{and} \quad \tau_2 = L = \sum_{k=1}^{\infty} 1/10^{k!}.$$
In Figure 9, we see the ergodicity of $\phi$, essentially “filling” the circle with only 300 points. In contrast, Figure 10 shows that $L$ at first seems to follow the distribution of $\tau = 11/100$, and, even after 3000 points, has noticeable missing regions. But $L$ is irrational, and therefore, by Weyl’s Theorem, the points will, in theory, eventually fill in. However, numerically, $L = 0.110001000000000000000001\ldots$ (with ones in decimal places 1, 2, 6, 24, and so on), and the gaps between non-zero digits in $L$ become so large that the missing regions will always be present in any numerical experiment.
However, encountering an element of $\mathcal{L}$ is unlikely.
If $\sigma$ is a positive real number, and $X \subset \mathbb{R}$, then we say that $X$ has $\sigma$-dimensional Hausdorff measure zero if for every $\epsilon > 0$, there is a sequence of intervals $I_n$ such that
$$X \subset \bigcup_{n=1}^{\infty} I_n,$$
$$\mathrm{length}(I_n) < \epsilon \text{ for all } n, \text{ and}$$
$$\sum_{n=1}^{\infty} (\mathrm{length}(I_n))^{\sigma} < \epsilon.$$
If $\sigma = 1$, $X$ has Lebesgue measure zero. If $0 < \sigma < 1$, the condition is stronger. For example, if $C$ denotes the Cantor middle-thirds set, then $C$ has Lebesgue measure zero, but for any $\sigma < \log(2)/\log(3)$, $C$ does not have $\sigma$-dimensional Hausdorff measure zero.
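Theorem 4. 
The class of Liouville numbers $\mathcal{L}$ has Lebesgue measure zero.
Remark 10. 
In fact, $\mathcal{L}$ has $\sigma$-dimensional Hausdorff measure zero for all $\sigma > 0$.
Proof. 
Let $\beta \in \mathcal{L}$. Then, for all $n \in \mathbb{N}$ there exist $p, q$ such that $q > 2$, $p/q \in \mathbb{Q}$, and
$$\left| \beta - \frac{p}{q} \right| < \frac{1}{|q|^n}.$$
Now let
$$Q_n = \bigcup_{q=2}^{\infty} \bigcup_{p=-\infty}^{\infty} \left( \frac{p}{q} - \frac{1}{q^n}, \frac{p}{q} + \frac{1}{q^n} \right).$$
Then, $Q_n$ is a countable union of open intervals. Moreover, $Q_n$ includes every number of the form $p/q$, for $q \geq 2$. We claim that $\beta \in \mathcal{L}$ if and only if
$$\beta \in (\mathbb{R} \setminus \mathbb{Q}) \cap \bigcap_{n=1}^{\infty} Q_n.$$
But since
$$\left| \beta - \frac{p}{q} \right| < \frac{1}{|q|^n}$$
for all $n$ and all $q \geq 2$,
$$-\frac{1}{|q|^n} < \beta - \frac{p}{q} < \frac{1}{|q|^n},$$
and so
$$\frac{p}{q} - \frac{1}{|q|^n} < \beta < \frac{p}{q} + \frac{1}{|q|^n}.$$
This proves the claim. Thus, $\mathcal{L} \subset Q_n$ for every $n$. Now, let
$$Q_{n,q} = \bigcup_{p=-\infty}^{\infty} \left( \frac{p}{q} - \frac{1}{q^n}, \frac{p}{q} + \frac{1}{q^n} \right),$$
for $q = 2, 3, \ldots$. For any $m, n \in \mathbb{N}$,
$$\mathcal{L} \cap (-m, m) \subset Q_n \cap (-m, m) = \bigcup_{q=2}^{\infty} Q_{n,q} \cap (-m, m) \subset \bigcup_{q=2}^{\infty} \bigcup_{p=-mq}^{mq} \left( \frac{p}{q} - \frac{1}{q^n}, \frac{p}{q} + \frac{1}{q^n} \right).$$
Let $\epsilon > 0$ be given. For each $m \in \mathbb{N}$ we want to find a sequence of intervals $I_n$ such that
$$\mathcal{L} \cap (-m, m) \subset \bigcup_{n=1}^{\infty} I_n, \quad \mathrm{length}(I_n) < \epsilon \text{ for all } n, \quad \text{and} \quad \sum_{n=1}^{\infty} (\mathrm{length}(I_n))^{\sigma} < \epsilon.$$
Choose $n$ so that
$$\frac{1}{2^{n-1}} < \epsilon,$$
$$n\sigma > 2, \quad \text{and} \quad \frac{(2m+1)\, 2^{\sigma}}{(n\sigma - 2)} < \epsilon.$$
For $n$ which satisfy these conditions,
$$\mathrm{length}\left( \frac{p}{q} - \frac{1}{q^n}, \frac{p}{q} + \frac{1}{q^n} \right) = \frac{2}{q^n} \leq \frac{2}{2^n} < \epsilon.$$
We also have
$$\sum_{q=2}^{\infty} \sum_{p=-mq}^{mq} \left( \frac{2}{q^n} \right)^{\sigma} = \sum_{q=2}^{\infty} \frac{(2mq + 1)\, 2^{\sigma}}{q^{n\sigma}} \leq \sum_{q=2}^{\infty} \frac{(2mq + q)\, 2^{\sigma}}{q^{n\sigma}} = (2m+1)\, 2^{\sigma} \sum_{q=2}^{\infty} \frac{1}{q^{(n\sigma - 1)}} \leq (2m+1)\, 2^{\sigma} \int_{1}^{\infty} \frac{dt}{t^{(n\sigma - 1)}} = \frac{(2m+1)\, 2^{\sigma}}{(n\sigma - 2)} < \epsilon.$$
This completes the proof. □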
We computed the gaps between the dots.
  • For 300 rotations of $\phi$, the biggest gap between angles is 0.0316 radians (about 1.8 degrees).
  • For 3000 rotations of Liouville’s number, the biggest gap between angles is 0.0452 radians (about 2.6 degrees).
The latter is about 1.4 times the former, even though it has 10 times as many dots.
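The gap computation can be reproduced with a short script. This is a sketch under stated assumptions rather than the author's code: the function name `max_gap` is hypothetical, the angles are taken in radians, and $L$ is replaced by its first four terms, which agree with $L$ to well beyond double precision.

```python
import numpy as np

def max_gap(tau, n_points):
    """Largest angular gap (radians) between the points exp(2*pi*i*n*tau), n = 1..n_points."""
    frac = np.sort(np.mod(np.arange(1, n_points + 1) * tau, 1.0))
    gaps = np.diff(frac)
    wrap = 1.0 - frac[-1] + frac[0]          # gap across the 0/1 seam of the circle
    return 2.0 * np.pi * max(gaps.max(), wrap)

phi = (1 + np.sqrt(5)) / 2
L = sum(10.0 ** -f for f in (1, 2, 6, 24))   # first four terms of the Liouville constant

gap_phi = max_gap(phi, 300)
gap_L = max_gap(L, 3000)
print(gap_phi, gap_L, gap_L / gap_phi)
```

The two values should come out close to the gaps quoted above. For $L$ the points cluster just past the multiples of $1/100$, since $nL \approx 0.11\,n + n \cdot 10^{-6}$, which leaves most of each arc of length $2\pi/100$ empty.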

Funding

The research was partially supported by the U.S. Air Force Office of Scientific Research (AFOSR) Grant Number FA9550-20-1-0030.

Data Availability Statement

All the data for the article were generated by Python code written by the author. All pseudocode for the generation of data and the subsequent algorithms/procedures is available upon request.

Acknowledgments

The author wishes to thank his son Thomas J. Casey and his student Richard Laurberg for assistance with Python programming. He also wishes to thank the referees for suggestions which led to improvements in the paper.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Casey, S.D.; Sadler, B.M. Pi, the primes, periodicities and probability. Am. Math. Mon. 2013, 120, 594–608.
  2. Casey, S.D.; Sadler, B.M. Modifications of the Euclidean algorithm for isolating periodicities from a sparse set of noisy measurements. IEEE Trans. SP 1996, 44, 2260–2272.
  3. Sadler, B.M.; Casey, S.D. Sinusoidal frequency estimation via sparse zero crossings. J. Frankl. Inst. 2000, 337, 131–145.
  4. Sadler, B.M.; Casey, S.D. On pulse interval analysis with outliers and missing observations. IEEE Trans. SP 1998, 46, 2990–3003.
  5. Lake, D.; Sadler, B.M.; Casey, S.D. Detecting regularity in minefields using collinearity and a modified Euclidean algorithm. Proc. SPIE 1997, 3079, 234–241.
  6. Casey, S.D. Periodic point processes: Theory and application. Appl. Stoch. Model. Bus. Ind. Spec. Issue Stat. Qual. Product. 2020, 36, 1131–1146.
  7. Chunjie, Z.; Yuchen, L.; Weijian, S. Synthetic algorithm for deinterleaving radar signals in a complex environment. Radar Sonar Navig. 2020, 14, 1918–1928.
  8. Liu, Y.; Zhang, Q. Improved method for deinterleaving radar signals and estimating PRI values. IET Radar Sonar Navig. 2018, 12, 506–514.
  9. Chao, W.; Liting, S.; Zhangmeng, L.; Zhitao, H. A radar signal deinterleaving method based on semantic segmentation with neural network. Trans. Signal Process. 2022, 70, 5806–5821.
  10. Qu, Z.; Zhang, J.; Zhou, Y.; Ni, L. The Intelligent Evolution of Radar Signal Deinterleaving: A Systematic Review from Foundational Algorithms to Cognitive AI Frontiers. Sensors 2025, 26, 248.
  11. Cheng, W.; Zhang, Q.; Dong, J.; Wang, C.; Liu, X.; Fang, G. An enhanced algorithm for deinterleaving mixed radar signals. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 3927–3940.
  12. VanderPlas, J.T. Understanding the Lomb–Scargle periodogram. Astrophys. J. Suppl. Ser. 2018, 236, 28.
  13. Hocke, K.; Kämpfer, N. Gap filling and noise reduction of unevenly sampled data by means of the Lomb-Scargle periodogram. Atmos. Chem. Phys. 2009, 9, 4197–4206.
  14. Ruf, T. The Lomb-Scargle Periodogram in biological rhythm research: Analysis of incomplete and unequally spaced time-series. Biol. Rhythm. Res. 2010, 30, 178–201.
  15. Sidiropoulos, N.D.; Swami, A.; Sadler, B.M. Quasi-ML period estimation from incomplete timing data. IEEE Trans. Signal Process. 2005, 53, 733–739.
  16. Hardy, G.H.; Wright, E.M. An Introduction to the Theory of Numbers, 5th ed.; The Clarendon Press: Oxford University Press: Oxford, UK, 1979.
  17. Ireland, K.; Rosen, M. A Classical Introduction to Modern Number Theory; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1982; Volume 84.
  18. Knuth, D.E.; Buckholtz, T.J. Computation of tangent, Euler and Bernoulli numbers. Math. Comp. 1967, 21, 663–688.
  19. Knuth, D.E. The Art of Computer Programming, Volume II: Seminumerical Algorithms, 2nd ed.; Addison-Wesley: Reading, MA, USA, 1981.
  20. Knuth, D.E. The Art of Computer Programming, Volume III: Sorting and Searching; Addison-Wesley: Reading, MA, USA, 1973.
  21. Leveque, W.J. Topics in Number Theory, Volumes 1 and 2; Addison-Wesley: Reading, MA, USA, 1956.
  22. Rosen, K.H. Elementary Number Theory and Its Applications, 4th ed.; Addison-Wesley: Reading, MA, USA, 2000.
  23. Schroeder, M.R. Number Theory in Science and Communication, 2nd ed.; Springer: Berlin, Germany, 1986.
  24. Sarhan, A.E.; Greenberg, B.G. (Eds.) Contributions to Order Statistics; John Wiley: New York, NY, USA, 1962.
  25. Reiss, R.-D. Approximate Distributions of Order Statistics; Springer: New York, NY, USA, 1989.
  26. Conway, J.B. Functions of One Complex Variable, 2nd ed.; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1978; Volume 11.
  27. Borwein, J.M.; Bradley, D.M.; Crandall, R.E. Computational strategies for the Riemann zeta function. J. Comp. App. Math. 2000, 121, 247–296.
  28. Kay, S.M. Fundamentals of Statistical Signal Processing; Prentice-Hall: Upper Saddle River, NJ, USA, 1993.
  29. Bartlett, M.S. The spectral analysis of point processes. J. R. Stat. Soc. B 1963, 25, 264–280.
  30. Brillinger, D.R. The spectral analysis of stationary interval functions. In Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Oakland, CA, USA, 1972; pp. 483–513.
  31. Fogel, E.; Gavish, M. Performance evaluation of zero-crossing-based bit synchronizers. IEEE Trans. Commun. 1989, 37, 663–665.
  32. Katok, A.B.; Stepin, A.M. Approximations in ergodic theory. Usp. Mat. Nauk 1967, 22, 77–102.
  33. Kleinbock, D.Y.; Margulis, G.A. Flows on homogeneous spaces and Diophantine approximation on manifolds. Ann. Math. (Second Ser.) 1998, 148, 339–360.
  34. Dym, H.; McKean, H.P. Fourier Series and Integrals; Academic Press: Orlando, FL, USA, 1972.
  35. Körner, T.W. Fourier Analysis; Cambridge University Press: New York, NY, USA, 1988.
  36. Blum, J.R.; Mizel, V.J. A generalized Weyl equidistribution theorem for operators, with applications. Trans. AMS 1972, 165, 291–307.
  37. Breiman, L. Probability; Addison-Wesley: Reading, MA, USA, 1968.
  38. Walters, P. An Introduction to Ergodic Theory; Springer: New York, NY, USA, 1982.
  39. Nishiguchi, K.; Kobayashi, M. Improved algorithm for estimating pulse repetition intervals. IEEE Trans. Aerosp. Electron. Syst. 2000, 36, 408–421.
Figure 3. Sparse data set containing two underlying periods.
Figure 4. Two periods EQUIter2 spectrum.
Figure 5. Two periods deinterleaved.
Figure 6. Sparse data set containing three underlying periods.
Figure 7. Three periods EQUIter2 spectrum.
Figure 8. Three periods deinterleaved.
Figure 9. The ergodicity of $\phi$. Note that the points nearly fill the unit circle with only 300 dots.
Figure 10. The ergodicity of $L$. Note that even with 3000 points, there are still noticeable uniform gaps and the points do not fill the unit circle.
Table 1. Some values of the Zeta Function $\zeta(n)$.
n     $\zeta(n)$
2     $\pi^2/6$
4     $\pi^4/90$
6     $\pi^6/945$
8     $\pi^8/9450$
10    $\pi^{10}/93555$
12    $691\pi^{12}/638512875$
14    $2\pi^{14}/18243225$
16    $3617\pi^{16}/325641566250$
Table 4. Modeling the MEA with noise-free and 5% jitter data.
n      % Missing    $\tau$ Noise-Free    $\tau$ 5% Jitter
200    0%           3.1415               3.1415
200    10%          3.1415               3.1413
200    25%          3.1415               3.1411
200    50%          3.1415               3.1408
200    75%          3.1415               3.1401
200    90%          3.1415               3.0103
Table 5. Modeling the MEA and the Periodogram with 5% jitter data.
n      % Missing    MEA Within Error    Periodogram Within Error
200    0%           100%                100%
200    10%          100%                100%
200    25%          100%                100%
200    50%          100%                100%
200    75%          100%                64.3%
200    90%          95.63%              5.5%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Casey, S.D. The Analysis and Deinterleaving of Periodic Point Processes. Mathematics 2026, 14, 1660. https://doi.org/10.3390/math14101660
