The RONO (Rank-Order-Normalization) Procedure for Power-Spectrum Analysis of Datasets with Non-Normal Distributions

Abstract: Standard (Lomb-Scargle, likelihood, etc.) procedures for power-spectrum analysis provide convenient estimates of the significance of any peak in a power spectrum, based, typically, on the assumption that the measurements being analyzed have a normal (i.e., Gaussian) distribution. However, the measurement sequence provided by a real experiment or a real observational program may not meet this requirement. The RONO (rank-order normalization) procedure generates a proxy distribution that retains the rank-order of the original measurements but has a strictly normal distribution. The proxy distribution may then be analyzed by standard power-spectrum analysis. We show by an example that the resulting power spectrum may prove to be quite close to the power spectrum obtained from the original data by a standard procedure, even if the distribution of the original measurements is far from normal. Such a comparison would tend to validate the original analysis.


Introduction
The investigation of time series often involves a search for oscillations. This is usually carried out by analyses such as the Lomb-Scargle procedure [1,2]. Alternatively, one may use a likelihood procedure that yields the same power as the Lomb-Scargle procedure but yields, in addition, amplitude and phase estimates [3,4]. These procedures have the convenient property that, on the assumption that the data are derived from random measurements that have a normal (i.e., Gaussian) distribution, the probability of finding a power S or more at a given frequency is given by

$$P = e^{-S}. \qquad (1)$$

However, in practice a laboratory experiment or observational sequence may generate data that do not conform to the normality requirement. One then has the following options:
(i) Analyze the procedure that generates the data and find or derive a valid procedure for assessing the significance of an oscillation generated by such a procedure; or
(ii) Ignore the normality requirement and apply Equation (1) as if the normality requirement were valid.
Algorithms 2020, 13, 157

Option (i) can be difficult and time-consuming and is probably rarely adopted in practice. Option (ii) is simple and convenient, but has the unfortunate consequence that the validity of the result is uncertain. For these reasons, we here suggest a third option.
If one can convert a given dataset into a "proxy" dataset that has a normal distribution, then one may apply the Lomb-Scargle or a similar procedure to the proxy dataset, and one would then be entitled to use Equation (1) for assessing the significance of any oscillation in the proxy dataset.
This raises the question: what procedure can be used to convert a given dataset into a proxy form that (a) has a normal distribution, and (b) somehow retains information in the original dataset that is significant for time-series analysis, so that an oscillation in the original dataset will lead to a corresponding oscillation in the proxy dataset?
We suggest that what we refer to as the RONO (for rank-order normalization) operation is a practical candidate for such a procedure.
We describe this procedure in Section 2, and give an example of its application, with brief comments, in Section 3.
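As a brief illustration of how a significance estimate of this kind is used, the following sketch assumes that Equation (1) has the exponential form that is usual for Lomb-Scargle powers normalized by the variance, P = exp(−S), and that it refers to a single frequency specified in advance. The function names are hypothetical and are introduced only for this example.

```python
import math


def false_alarm_probability(power):
    """Probability of finding a power >= `power` at one given frequency.

    Assumes normally distributed measurements and the exponential
    form P = exp(-S) for the normalized power S.
    """
    return math.exp(-power)


def power_for_probability(probability):
    """Power S at which the assumed exponential form gives `probability`."""
    return -math.log(probability)


if __name__ == "__main__":
    for s in (1.0, 5.0, 10.0):
        print(f"S = {s:4.1f}  ->  P = {false_alarm_probability(s):.3e}")
```

Under this assumption each additional unit of power lowers the probability by a factor of e, which is why the estimate is only meaningful if the normality requirement actually holds.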

The RONO Operation
Consider a sequence of measurements, x_n, n = 1, ..., N, which we denote by {x}, taken at times t_n, n = 1, ..., N, which we denote by {t}. We arrange the sequence of measurements in ascending order, and denote the re-arranged measurement sequence by

$$\{\tilde{x}\} = R\{x\},$$

where R denotes the re-ordering operation.
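We denote by f the error function, defined so that it increases from 0 to 1 as the independent variable g increases from −∞ to ∞; that is, f is the cumulative distribution function of a normal distribution. We now define the ordered sequence {ỹ} by assigning to the n-th element of the ordered measurements the value ỹ_n at which f equals the fraction associated with rank n, so that the values f(ỹ_n) are evenly spaced between 0 and 1. We now reverse the ordering procedure to obtain

$$\{y\} = R^{-1}\{\tilde{y}\}.$$

Then the sequence {y} has the same rank-order as {x}, but has a strictly normal distribution.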
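The operation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of the authors' software: the function name `rono` is hypothetical, the rank n is mapped to the fraction (n − 1/2)/N (one common choice that keeps the fractions strictly inside the interval from 0 to 1), the target distribution is taken to have zero mean and unit variance, and tied measurements are ranked in their order of appearance.

```python
from statistics import NormalDist


def rono(x):
    """Return a proxy sequence with the rank order of `x` and normal quantiles.

    Assumptions (not taken from the source): fraction (rank - 1/2) / N,
    zero mean, unit variance, ties broken by position in the input.
    """
    n = len(x)
    # R: indices that arrange the measurements in ascending order.
    order = sorted(range(n), key=lambda i: x[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    y = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # Invert the normal cumulative distribution at the assumed fraction,
        # and write the value back at the original position (R^-1).
        y[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return y


if __name__ == "__main__":
    measurements = [3.2, 100.0, -1.0, 3.3, 50.0]  # strongly skewed toy data
    print(rono(measurements))
```

Because only ranks enter the calculation, any strictly increasing transformation of the measurements (a logarithm of positive data, for instance) gives the same proxy sequence.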

Example and Discussion
We consider, as an example of a dataset for which the distribution is far from normal, a sequence of 85,284 radon decay measurements acquired at the Geological Survey of Israel (GSI) [5,6]. The measurements, registered by universal time, were acquired at 1-hour intervals between day 86 of 2007 and day 312 of 2016, for local hour of day in the range 10 pm to 2 am [6]. The distribution of measurements, shown as a histogram in Figure 1a, is obviously far from normal. A power spectrum was computed by a likelihood procedure [4] in which σ is the standard deviation of the measurements and, for each frequency, the complex amplitude A is adjusted to maximize the power S. This choice can be made by noting that the complex amplitude that maximizes S is the amplitude at which S is unchanged, to first order, by arbitrary small perturbations in the complex amplitude, so that we can determine the appropriate complex amplitude by the operation ∂S/∂A* = 0.
This likelihood procedure yields exactly the same power as would be derived from the Lomb-Scargle procedure, but it also yields, from the complex amplitude A, the amplitude and phase for each frequency. The power spectrum computed in this way is shown in Figure 1c.
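A power spectrum of this general type can be sketched as follows. The sketch assumes one common form of the likelihood power: at each frequency ν the zero-mean data are fitted by least squares with the model A exp(i2πνt) + A* exp(−i2πνt), and S is the resulting reduction in the sum of squares divided by 2σ². That form, the function name `likelihood_power`, and the subtraction of the mean are assumptions made for illustration and are not taken from the text above.

```python
import math


def likelihood_power(t, x, freqs):
    """Least-squares sinusoid fit at each frequency (assumed form of S).

    Returns (powers, amplitudes), where amplitudes are the complex A in
    the model A*exp(i*2*pi*nu*t) + conj(A)*exp(-i*2*pi*nu*t).
    Frequencies must be nonzero, otherwise the fit is singular.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]            # zero-mean data
    var = sum(v * v for v in d) / n      # sigma squared
    powers, amplitudes = [], []
    for nu in freqs:
        c = [math.cos(2.0 * math.pi * nu * tn) for tn in t]
        s = [math.sin(2.0 * math.pi * nu * tn) for tn in t]
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        cs = sum(ci * si for ci, si in zip(c, s))
        xc = sum(di * ci for di, ci in zip(d, c))
        xs = sum(di * si for di, si in zip(d, s))
        # Solve the 2x2 normal equations for the fit a*cos + b*sin.
        det = cc * ss - cs * cs
        a = (xc * ss - xs * cs) / det
        b = (xs * cc - xc * cs) / det
        # Reduction in the sum of squares, normalized by 2*sigma^2.
        powers.append((a * xc + b * xs) / (2.0 * var))
        # a*cos + b*sin equals A*e^{i theta} + conj(A)*e^{-i theta}
        # when A = (a - i*b) / 2.
        amplitudes.append(complex(a, -b) / 2.0)
    return powers, amplitudes
```

The modulus and argument of each returned complex amplitude give the amplitude (as 2|A|) and phase of the fitted oscillation at that frequency. Applying the same function to the raw measurements and to their rank-order-normalized proxy gives the two spectra that are compared in the remainder of this section.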
The distribution of measurements, normalized according to the RONO procedure, is shown in Figure 1b. It is indeed precisely normal.
The power spectrum formed from the normalized measurements, shown in Figure 1d, is visually indistinguishable from Figure 1c. Table 1 lists the top 20 peaks in the power spectrum formed from the original dataset and the top 20 peaks in the power spectrum formed from the RONO-normalized dataset. We see that the frequencies are exactly the same and the powers differ by only a few percent.
This result suggests that a typical power-spectrum analysis may be less sensitive to departures from normality of the dataset than one might expect.
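A comparison of the kind summarized in Table 1 could be carried out along the following lines. The sketch assumes that both spectra have been sampled on the same frequency grid; it locates the highest local maxima of the raw spectrum and reports the power of the proxy spectrum at the same grid points, which is a simplification of listing the top peaks of each spectrum separately. The function names are hypothetical.

```python
def top_peak_indices(power, k=20):
    """Indices of the k largest local maxima of a sampled power spectrum."""
    peaks = [
        i
        for i in range(1, len(power) - 1)
        if power[i - 1] < power[i] >= power[i + 1]
    ]
    return sorted(peaks, key=lambda i: power[i], reverse=True)[:k]


def compare_spectra(freqs, power_raw, power_proxy, k=20):
    """For each top raw peak: (frequency, raw power, proxy power, % change)."""
    rows = []
    for i in top_peak_indices(power_raw, k):
        change = 100.0 * (power_proxy[i] - power_raw[i]) / power_raw[i]
        rows.append((freqs[i], power_raw[i], power_proxy[i], change))
    return rows
```

Percentage changes of a few percent or less at every listed peak would correspond to the agreement reported here.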
A further development of the RONO procedure would be to implement it as a data-preprocessing step for time-frequency analysis methods (e.g., wavelet analysis, Stockwell transform) or for correlation analysis methods (e.g., wavelet coherence, linear and non-linear correlation analysis).

Table 1. The frequency and power of the top 20 peaks in power spectra formed from GSI night-time data for the frequency range 0–6 year⁻¹, as computed from the raw (i.e., un-normalized) data and from the RONO-normalized data.

Author Contributions: Conceptualization, software, visualization, writing (original draft preparation), writing (review and editing), P.S.; visualization, writing (review and editing), F.S. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.