Testing for a Random Walk Structure in the Frequency Evolution of a Tone in Noise

Inference and hypothesis testing are typically constructed on the basis that a specific model holds for the data. To determine the veracity of conclusions drawn from such data analyses, one must be able to identify the presence of the assumed structure within the data. In this paper, a model verification test is developed for the presence of a random walk-like structure in the variations in the frequency of complex-valued sinusoidal signals measured in additive Gaussian noise. This test combines, in a joint inference, the random walk hypothesis tests found in the economics literature, which seek random walk behaviours in time series data, with an additional test that accounts for how the random walk behaves in frequency space.


Introduction
Many analyses done today are model-based, and inference results drawn from these works rely heavily on strong assumptions holding. One may ask whether the models used are the correct choice and, whether or not they are, what impact that would have on the veracity of the results. The questions surrounding the verification of model selection decisions have been popular fields of inquiry. (See [1,2] for thorough reviews of model selection methods.) This paper considers a specific structure identification (SID) problem: to determine if a given time series arose from noisy samples of a sinusoid whose frequency varies according to a random walk (RW). This problem has important practical applications in several areas of signal processing, as discussed briefly in Section 1.1, and is part of the long-established theoretical tradition of optimal detection and estimation for tones with randomly varying frequency. Our current motivation relates to the, as yet unsuccessful, search for continuous gravitational waves (CGWs). The structure of the frequency variation of such signals has astrophysical significance.
Since the seminal works of Kolmogorov and Wiener in the 1940s, the majority of sensor-type signal processing approaches have been model-based. Sensor-type signal processing techniques based on carefully defined models of sensors and signals have two major advantages: first, optimal processing methods can be developed by exploiting model structure; second, performance can often be characterized analytically.
There are, however, certain disadvantages. If the model is not an accurate description of the sensor, the signals, or both, then the above advantages can be diminished.
To address this issue, over the past 50 years many techniques for identifying parameterized models of given structure from measured data have been developed including adaptive methods which continually adjust the model parameters in real-time based on the most recent data. Robust techniques have been developed to maintain near-optimal performance under fairly broad modelling errors including classes of structural errors.
The next important step in model-based sensor and signal processing problems is the validation of models. In some situations, model validation can be based on ground truth, though this is rarely possible.
Recently, there has been growing interest and activity in quantifying the evidence for a model which is being used in a model-based optimal sensor signal processing scenario. That is, given a model and an optimal processing solution, there is interest in determining the evidence for the model being used, based on observed data and optimally processed outputs. A comprehensive review covering methods for evaluating evidence for a model can be found in the recent paper [3].
The theoretical foundations of general SID (as opposed to parameter estimation) are still not well developed. Existing approaches such as Bayes factor-based methods, model residual testing, and multiple hypotheses testing all propose a structure, and then test in different ways how strongly the data are consistent with the model. While this is an advance on parametric estimation for a fixed given model, it is still a long way from true structural discovery.
For the problem considered in this paper, an initial question is how to even determine whether an RW is present in the data. This is a well-researched field in the econometrics literature known as random walk hypothesis (RWH) testing [4]. Such methods try to determine from a recording of time-series data whether the data behaves as an RW.
The second question is then how to distinguish between an RW and a trigonometric function of an RW. One key property that comes into play here is that the resulting process is bounded, so this is trivial to do given a sufficiently long recording time. However, many random processes are bounded, and the RWH techniques would, by design, fail to single out the raw trigonometrically transformed data among them. To distinguish, one must pick out the component believed to act as an RW (the frequency of the signal). Conventionally, this is done via the Fourier transform.
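As a quick illustration of the boundedness argument, the following minimal simulation (not from the paper; all parameters arbitrary) contrasts a plain RW, which grows without bound, with its trigonometric transform, which stays in [-1, 1]:

```python
import numpy as np

# Illustrative sketch: a long realisation of a random walk wanders far
# from the origin, while any trigonometric transform of it is bounded.
rng = np.random.default_rng(0)
steps = rng.normal(0.0, 1.0, size=100_000)
walk = np.cumsum(steps)            # plain RW: unbounded as length grows
transformed = np.cos(walk)         # trig transform of the RW: bounded

print(np.max(np.abs(walk)) > 10)         # True for essentially every realisation
print(np.max(np.abs(transformed)) <= 1)  # True by construction
```

The boundedness check alone, of course, cannot tell the transformed RW apart from other bounded random processes, which is the point made above.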
Once in Fourier space, one may naively think to simply apply RWH testing techniques and obtain a viable result. A challenge comes from the fact that frequency space is cyclic; it then becomes difficult to distinguish between something that would resemble an RW under RWH testing and the Fourier transform of white Gaussian noise (WGN).
We argue here that, should the trigonometrically transformed RW in question vary slowly, it can be distinguished from the Fourier transform of WGN and so is susceptible to RWH testing methodologies.
We will present a non-nested multiple hypothesis procedure to test if a structure (RW frequency signal) is a good model. The choice of hypotheses is important in order to avoid structures that are RW-like but not actually RWs. We create four hypotheses which enable sharp discrimination between RW structure and structures which generate data close to an RW in frequency space.
For completeness, we briefly outline a number of approaches to SID. We then explain the logic underlying the multiple hypothesis structure we analyse in this paper.
Bellman & Åström [5] set out a criterion for structural identifiability, and a generalised approach to SID problems. However, a key challenge is the arbitrary nature of setting criteria for accepting a given model over others while minimising complexity. One of the earliest attempts to deal with this complex problem is the Akaike Information Criterion (AIC) proposed in [6], based on a form of dimension-penalization. However, as Rissanen argued [7], Akaike's criterion only minimizes total predictive error in the degenerate case where one model size yields an estimate with probability one. Rissanen proposed the Minimum Descriptor Length (MDL) criterion [7], which is invariant under linear coordinate transformations and consistent with respect to dimension estimation. Additionally, Willems formalized SID problems through a behavioural approach [8] that asks 'given inputs and outputs to a system, does any structure exist at all which may model such behaviours?' 'Tearing' [framing the system as a network of interconnected subsystems], 'zooming' [modelling the subsystems], and 'linking' [modelling the connections between subsystems] offer systematic approaches to the task. More recently, kernel-based methods implemented in the machine learning methodology have been a popular tool (see [9] for a thorough review). Few of them have addressed the issue of generating hypothesis tests on the model structure when applied to an observed phenomenon. Kernel methods test whether a structure of some sort works, without identifying it explicitly [9]; this is not what we seek to do. Rather, we ask whether a specific structure accurately models an observed phenomenon in a statistically significant sense.
We explore an SID problem, which we phrase through the lens of model detection. This is intrinsically different from a signal detection problem with a known (or, more appropriately, assumed) signal model. We wish to decide whether any sinusoidal signal with random time-based variations in its instantaneous frequency can provide a fit, with confidence, to the given data. That is, we seek the presence of a structure in a record, not the presence of a signal with a specified structure. However, what we observe is the signal itself, and not its underlying parameters. Furthermore, we seek whether the data fits into a class of models, and so do not impose any specific model (target signal) within the said class. Thus, given that within the class the model parameters may be arbitrary, we assume that they are unknown. For detection with unknown parameters, we must then implement estimation procedures to verify whether the data resembles a sinusoidal signal with time variation in its tone.
The parameter estimation problem of optimally estimating the frequency of a constant-frequency tone from noisy measurements is among the most well-studied problems in signal processing, with many papers exploring its finer details. The general problem itself was formulated in discrete time by Rife and Boorstyn [10], following the work of Slepian [11] in continuous time. In approaching our problem, we assume that the instantaneous frequency varies linearly over short time intervals (snapshots, blocks) and follows an RW structure from snapshot to snapshot. A potential extension would be blockwise (in time) frequency modulation of arbitrary polynomial order (with respect to time); this would, in effect, extend the problem to any continuous frequency modulation.
The statistical theory surrounding RWH testing [12] has been shown to be effective at deciding in certain problems whether the stock market can be said not to be following an RW [13][14][15], with implications for forecasting of market trends.
However, it is argued that such tests can lead to ambiguous conclusions [16] (in the sense of what the conclusions drawn mean with regard to the question) or inconsistent results [17] (in the sense of robustness). This is thought to arise from the sensitivity of variance-based inference in small-sample regimes [18]. Furthermore, because of phase-wrapping behaviours, as discussed in the proof of Remark 1, such methods do not discriminate between a signal measured in noise with an RW in frequency and WGN.
The problem that we consider involves the detection of a model of a trigonometric transformation of an RW process, rather than the detection of an RW process. This is inherently different to standard RWH testing, as trigonometric transformations are nonlinear in nature. The plausibility of testing the RWH under such a transformation has been considered by other researchers in the economics literature [19,20].
An auto-regressive (AR) time series {ξ_n}_n of order 1, known as an AR(1) process, has a recursive representation of the form ξ_n = ρξ_{n−1} + ς_n, with independent and identically distributed (IID) driving noise {ς_n}_n. The Dickey-Fuller (DF) unit root test [15], assuming the presence of an AR(1) process {ξ_n}_n (not corrupted by noise) in the data, tests a null hypothesis of a stationary (in the statistical sense) first-difference process (ξ_n − ξ_{n−1}), which would occur if |ρ| = 1, against an alternative hypothesis |ρ| ≠ 1.
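The AR(1) recursion and the DF null can be sketched in a short simulation; this is an illustrative demonstration, not part of the paper's procedure, and the sample size is an arbitrary choice:

```python
import numpy as np

# Sketch of the AR(1) recursion xi_n = rho * xi_{n-1} + varsigma_n.
# When |rho| = 1, the first-difference process equals the IID driving
# noise, which is exactly the stationarity the Dickey-Fuller null tests.
def ar1(rho, n, rng):
    noise = rng.normal(size=n)
    xi = np.zeros(n)
    for i in range(1, n):
        xi[i] = rho * xi[i - 1] + noise[i]
    return xi

rng = np.random.default_rng(1)
unit_root = ar1(1.0, 50_000, rng)
diff = np.diff(unit_root)

# For rho = 1 the differences are just the unit-variance driving noise.
print(round(np.var(diff), 1))   # → 1.0
```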
It has been shown [20] that the DF test, if applied to the trigonometric transformation of an AR(1) process, may asymptotically still be used to test for stationarity of the first-difference process, following analytical [21] and empirical [19] arguments on the asymptotics of nonlinear transformations of first-order integrated time series I(1), and subsequently of AR(1) processes.
However, this is different to the model we test for: the signal model of a scaled trigonometric transformation of an AR(1) process with |ρ| = 1, given noise-corrupted measurements. As we do not assume, inherently, that a (transformed) AR(1) time series is present within the data, but rather are testing the validity of such a model-given the data-the DF test regime does not apply here, and so we follow a different scheme that will be discussed below.
In this paper, we advance a scheme for discriminating between WGN and a sinusoidal signal whose instantaneous frequency executes a discrete-time RW, which, when combined with RWH methods, provides a more general inference on the existence of an RW structure in frequency. In Section 2.1, the problem is formulated mathematically. We design an estimator-detector in Sections 2.2 and 2.3, and discuss its analytical performance in Sections 3.1 and 4.
In continuous time the RW is replaced by a Wiener process and in this context, prior to sampling, we are faced with signals of infinite bandwidth. This is, as is often the case with mathematical models, a useful idealization. Whereas a typical function of a Wiener process has infinite bandwidth, ultimately the signal has to be measured and this forces a bandwidth constraint on the measured signal imposed by the sensor. This has to be taken into account in the sampling and subsequent processing of the discretized signal. We work within the discrete time domain here and the bandwidth issues are overcome by the determination of the sampling rate of the continuous time signal.

Applications
The concept of a time-varying auto-regressive (TVAR) tone appears across various fields. Below are some applications of TVAR tones in the literature, including the motivating application for this paper in gravitational wave (GW) astronomy.

Gravitational Wave Astronomy
Predicted theoretically by Einstein in 1916, GWs are disturbances in space and time, generated by the acceleration of asymmetric bodies, which propagate at the speed of light [22] as perturbations of the metric tensor g_{μν}, which describes distances in spacetime through the invariant interval ds² = g_{μν} dx^μ dx^ν (with respect to displacement four-vectors dx^μ).
GWs cause oscillations in the proper displacements of freely falling test masses and can be detected by long-baseline laser interferometers, such as the Laser Interferometer Gravitational-Wave Observatory (LIGO) [23].
A GW incident normally on the plane of an interferometer stretches and squeezes distances along the arms [22], along two characteristic polarisations: 'plus' (whose principal axes of action align with those of the spatial dimensions) and 'cross' (whose principal axes of action are rotated π/4 with respect to those of the spatial dimensions). At the time of writing, dozens of GW signals have been detected, including the first discovery of the chirped tones from a binary black hole merger in 2015 [24] and a binary neutron star merger in 2017 [25], inaugurating a new era in astronomy.
Instruments such as LIGO search for a variety of waveforms, not just chirps from mergers. The theory also predicts the existence of persistent, sinusoidal, quasimonochromatic CGW signals, believed to originate (for example) from isolated or binary neutron stars [26]. Under the biaxial rotor model (a rigid body with two equal principal moments of inertia), a pulsar emits GWs continuously at f and 2f [27].
For an isolated neutron star, f decays monotonically over timescales of ~10^3 years, but wanders stochastically on shorter timescales [28]; suspected mechanisms include seismic activity in the crust or far-from-equilibrium avalanche processes in the superfluid interior.
In pulsar binaries, by contrast, the stochastic wander of the spin frequency is driven by fluctuations in the accretion torque of gas from the binary companion [29,30]; the underlying process is suspected to result from transient disk formation [31] or instabilities in the disk-magnetosphere [30].
The modelling and detection of such CGWs from interferometric data (such as from aLIGO) follows the work of Jaranowski [27] (and its subsequent parts). The stochastic wander in f is incorporated into detection schemes based on hidden Markov models [32,33].

Structural Analysis
Materials used in construction exist in meta-stable states, and their properties evolve stochastically over long timescales because of a variety of factors (such as cracking or the influence of humidity). To ensure structural stability one must be aware of the material's natural frequency and harmonics, to damp against excitation such as from applied wind pressure (for example, the case of the Tacoma Narrows Bridge, "Galloping Gertie") or from synchronous loading (as seen with the London Millennium Bridge). When considering concrete elements, for example, such a natural frequency wanders randomly in time. Efforts are made to track this randomly wandering natural frequency in papers such as [34].

Sound Processing
The tracking of TVAR harmonics in signals is an approach considered in communications and sound processing problems such as in the tracking of a time-varying chirp (such as the Doppler shifts in mobile communications). Such applications have been discussed in papers such as [35] and others.

Problem Formulation
To begin, we denote the discretisation {x_n^{(k)}}_n (for indices n = 0, . . . , N − 1 within blocks k = 1, . . . , K). This discretisation provides a partition of a record of data sampled uniformly in time (where, in the temporal discretisation structure noted above, we overlap the last observation of any two adjoining blocks k' = k, k + 1) into a data record of K(N − 1) + 1 observations, or KN points including repeated points.
For a block {x_n^{(k)}}_n (fixed k), we consider N to be the sample size of that subset; the parameter N is the blockwise sample size. Graphically, the discretisation (with overlap at endpoints) is constructed as in Figure 1.
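The overlapping-block partition can be sketched as follows; the function name and the NumPy slicing are illustrative choices, not the paper's implementation:

```python
import numpy as np

# Sketch of the blocking scheme: K blocks of N samples, each block
# sharing its last sample with the first sample of the next, so a record
# of K*(N-1) + 1 observations yields K*N points counting repeats.
def make_blocks(record, N):
    K = (len(record) - 1) // (N - 1)
    return np.array([record[k * (N - 1): k * (N - 1) + N] for k in range(K)])

K, N = 8, 64
record = np.arange(K * (N - 1) + 1)   # K(N-1)+1 unique observations
blocks = make_blocks(record, N)

print(blocks.shape)                    # → (8, 64): K blocks of N points each
print(blocks[0, -1] == blocks[1, 0])   # → True: adjoining blocks overlap
```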
Let {x_n^{(k)}}_n be a sampled complex sinusoid measured in noise, where, for fixed k, {φ_n^{(k)}}_n is the instantaneous phase of the sampled sinusoid (assuming a quadratic structure in time), corresponding to a linearly varying instantaneous frequency within the block.

Definition 1.
We will say that the pivot frequencies {f_0^{(k)}}_k are stable if they vary 'slowly' in the sense of condition (6), with quantities as defined in (8). We note that this is effectively a band-limiting statement on the signal: the centre frequencies of the Fourier bins of the signal are limited in their variation between consecutive blocks to be significantly less than the length N of a block. Effectively, this prevents wrapping of the frequencies within a block, and aliasing.

Remark 1. By means of RWH testing, one cannot distinguish between an RW in frequency space that violates condition (6) and the sequence of peak frequencies {f_n}_n of WGN when processing via Fourier transforms.
Existing RWH tests, such as the LOMAC test [13] or the Chow-Denning test [14], are structured around variance ratio (VR) testing (for an overview of such tests, see [12]). If a process {ς_n}_n does not satisfy condition (6), then, for any realistic N, the existing RWH tests would view the process {[f_n(ς)]_{f_s}}_n exactly as they would view the peak frequencies of WGN; a variance ratio test would not be able to distinguish between the two. Another issue is the wrapping behaviour of frequency spaces at the edges. If the frequency variations do not occur sufficiently far from the boundaries, then there would also be issues distinguishing an RW frequency process, which is f_s-modular, from the peak frequencies of a WGN process, which are f_s-modular.
To clarify, should a process {ξ_n}_n in the frequency space F step outside the domain by some small amount ε > 0, it would appear to jump to the other boundary. Because of this wrapping, an RWH testing algorithm utilising Fourier transforms would read this as a jump of length f_s − ε rather than ε. Such a process would then appear to have variance comparable to the size of the domain itself and would be indistinguishable from the peak frequencies of a WGN process.
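The wrapping pathology can be illustrated with a short simulation (an assumption-laden sketch with arbitrary parameters, not the paper's analysis):

```python
import numpy as np

# Illustrative sketch: a slow RW taken modulo the frequency span fs looks
# slow while it stays inside the band, but each boundary crossing
# registers as a jump of nearly fs, inflating the apparent step variance.
rng = np.random.default_rng(2)
fs = 1.0
walk = np.cumsum(rng.normal(0.0, 0.01, size=20_000))  # slow RW in frequency
wrapped = np.mod(walk, fs)                             # what the DFT exposes

true_steps = np.diff(walk)
seen_steps = np.diff(wrapped)

print(np.max(np.abs(true_steps)) < 0.1)  # True: the real steps are small
print(np.max(np.abs(seen_steps)) > 0.9)  # True: wrap events look like fs-sized jumps
```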

Remark 2.
Recall that, on an unbounded interval, an RW process is also unbounded, and thus a sinusoid with an RW frequency would have infinite bandwidth in the asymptotic case. However, the act of measurement is a finite process in finite time, which imposes a finite bandwidth upon the process.
By standard practice, the band-limiting of the signal would be determined by an anti-aliasing filter setting the lower bound on the sampling frequency to the Nyquist limit. Then, one would be able to choose the block-wise sample size determined by the sampling frequency as per condition (6), and the other parameters can be determined to optimise the efficiency of the verification process.
For a more detailed discussion of how to optimally set out the parameters, refer to Section 4.
We discretise the frequency range [−f_s/2, f_s/2]. The discretisation in time (and subsequently in frequency) then introduces a displacement term δ_n^{(k)} representing the uncertainty in phase due to the resolution of frequency; if nothing is known about its distribution, we assume δ_n^{(k)} to be uniformly distributed across a frequency bin. This is a non-physical prior which imposes little to no bias on inferences, and is used similarly in discretised methods such as HMMs [32,33].

Definition 2. We will say that a process {ξ_n}_n is RW-like if, when testing it against the RWH, its behaviour would not be considered significantly different from that of an RW, in the statistical sense.
In terms of AR(1) models of the form ξ_n = ρξ_{n−1} + ς_n, with {ς_n}_n IID, we would say that a process {ξ_n}_n is RW-like if, when testing it against the RWH, one cannot say in a statistically significant sense that |ρ| ≠ 1.
The importance of applying appropriately primed SID approaches to data series assumed to contain RWs, prior to significance tests based on that assumption, is made clear by Durlauf and Phillips [36], who analytically characterised the behaviour of regression coefficients for time trends, assuming trend-stationary data, on I(1) processes (a class under which RW-like series fall).
They found, analytically, that regressions of an RW against a time trend yield incorrect inferences of a greater significance of a trend than is actually present, resulting in a stronger bias towards hypotheses assuming time trends than is appropriate; this supports previous empirical work done via Monte Carlo (MC) simulations [37].
For time-series data tested against the RWH, alternative hypotheses of RW-like series may be distinguished from an RW by spectral methods [38], by utilising the weak convergence (in the Hilbert sense, that is, ⟨x_n, y⟩ → ⟨x, y⟩ for all y) under the sup metric d(f, g) = sup_x {d'(f(x), g(x))} of the periodogram deviation process (the sequence of deviations of the time series in question from the periodogram of a true RW process) to a Brownian bridge on C[0, 1]. This convergence is not preserved under trigonometric transforms, and thus we are not able to apply equivalent tests in our case.
This, taken with Remark 1, shows that the following hypothesis testing structure suffices to detect an RW in frequency space, as per our model. In terms of AR(1) models of the form ξ_n = ρξ_{n−1} + ς_n, with {ς_n}_n IID, instability can be taken to mean |ρ| > 1, and non-RW-likeness can be taken to mean |ρ| ≠ 1 in a statistically significant sense.
The reasoning behind these hypothesis tests will be expanded on, once the necessary tools are extracted from the literature and developed for our problem's lens (see the discussion at the end of Section 2.3).

Carrier Frequency Estimation
Given that the parameters (A, f , φ) of the signal model are assumed to be unknown, frequency estimation is necessary to apply the hypothesis test introduced in (9).
Following [39] the carrier frequency of a linear chirp is taken to be the average value of the lower and upper frequencies of the peak region of the Fourier-transformed signal.
The 'slow' nature of the RW in frequency (6) asserts that the endpoints of the peak region are separated by a negligible distance with respect to the size of the frequency space.
Under the assumption of the 'slowness' of the RW, the signal's peak power is distributed across the Fourier bins in a manner that would resemble a narrow peak in Fourier space.
The coarse search process given by Rife and Boorstyn [10] estimates the constant-valued frequency of a single-toned sinusoid. Given the sharpness of the peak in Fourier space, the methods in [10] are an appropriate means of estimating the carrier frequency of the signal.
We note that the estimation process here is not necessarily optimal, but rather is implemented for its well-understood statistical properties.
Taking the 1/N-normalised Discrete Fourier Transform (DFT) {X_m^{(k)}}_m, where m = 0, . . . , N − 1 is the Fourier bin number, we estimate the location of the carrier frequency by the method in [10] as the bin m̂^{(k)} maximising the DFT magnitude, where carets indicate an estimated quantity. We represent the carrier frequency estimator f̂_c^{(k)}, at fixed k, by the 'true' carrier frequency of the signal displaced by an estimation error ε^{(k)}. The variance of the peak location of a sinusoidal signal in noise, as estimated according to (12) and (13), is shown in [40,41] to be σ_ε² = 6 / (N (2πNT)² SNR), with SNR = A²/σ_w². Given that the limiting distributions of periodogram frequency estimators [42] are normal to leading order, and given the asymptotic normality of the Maximum Likelihood Estimator (MLE), we will assume that ε^{(k)} follows a distribution of the form ε^{(k)} ∼ N(0, σ_ε²) for sufficiently large K. Note, however, that the rate of convergence here is slow, with Theorem 8 of [42] stating effectively that the standardised version of 2πε carries higher-order correction terms involving He_4(ξ)/4 + He_6(ξ)/18, where He_r(x) = (−1)^r e^{x²/2} (d^r/dx^r) e^{−x²/2} is the probabilistic Hermite polynomial of order r. As will be noted again in later discussion, the simulations failed to provide results for SNRs less than −20 dB. We argue that the procedure provides a reasonable degree of performance for SNRs greater than −20 dB; the failure in the lower SNR regime is likely due to the estimation procedures, which are used here as a proof of concept for the broader issue of model verification in the problem being considered. More sophisticated estimation approaches could be applied; however, as this is not an estimation-detection problem, such refinement is not the focus of the paper. We need only establish whether the model verification lens can be used for this problem, and the regime of SNRs greater than −20 dB is sufficient to that end.
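A minimal sketch of the coarse peak search in the spirit of [10] follows; all parameters, and the on-bin placement of the carrier, are illustrative choices rather than the paper's settings:

```python
import numpy as np

# Sketch of the coarse search: locate the carrier of a noisy complex
# sinusoid as the argmax of the DFT magnitude (after Rife & Boorstyn).
rng = np.random.default_rng(3)
N, T = 1024, 1.0                    # samples per block, sampling interval
f_true_bin = 200                    # carrier placed on bin 200 for clarity
n = np.arange(N)
A, sigma_w = 1.0, 0.5               # SNR = A^2 / sigma_w^2 = 6 dB here
signal = A * np.exp(2j * np.pi * f_true_bin * n / N)
noise = sigma_w * (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
x = signal + noise

X = np.fft.fft(x) / N               # 1/N-normalised DFT
m_hat = int(np.argmax(np.abs(X)))   # coarse peak-bin estimate
f_hat = m_hat / (N * T)             # back to frequency units

print(m_hat)   # → 200
```

At this SNR the signal bin dominates every noise bin with overwhelming probability, so the coarse estimate lands on the true bin.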
We do, however, note that Figure A1a,b shows that, as one would anticipate, the parameters (T, K) do not affect the performance of the procedure.

Detection of Random Walk Structure
Under H_1, by (11), we know that f_c^{(k)} is the average of two RW elements. Recalling that the variance of an RW of finite length is its length, then, under H_1, we assert (15), where we use V[·] to denote the ensemble variance operator. Now, (15) tells us that the same simplifications used in the analysis of RWs in [12] should be used here for 'regularity' (in a statistical sense). In line with this, we define the first difference process {ŷ_c^{(k)}}_k in (16), and also define, at lag 1 ≤ τ ≤ K/2, a general difference process {Δ_τ^{(k)}}_k in (17). The power of WGN (i.e., {|X_m^{(k)}|²}_m under H_0) is equally distributed between Fourier bins. Through some algebraic manipulation, we find the result (18) under H_0. Then, under H_0, by standard definitions and properties of the distributions, and by (16), we define (19). Now, combining (8), (11), (13) and (16), we may represent ŷ_c^{(k)} under H_1 as in (20), where the discretised driving term of the RW-like structure is defined to be a sequence of integer-valued jumps between Fourier bins, represented by {u^{(k)}}_k.
To illustrate the range of behaviours, we consider, at extreme ends, two different forms of {u^{(k)}}_k. The first case is a discrete uniform jump structure, representing the case where minimal information is assumed about jumps satisfying (6).
Thus, for this problem, the 'slowly' varying nature (in the sense of condition (6)) of the RW in frequency space under H_1 is manifest as σ_0² ≫ σ_1². Given the presence of the discretisation displacement terms in representation (20), the data are non-Gaussian regardless of the jump structure considered. Motivated by the asymptotic properties of variance estimators for non-Gaussian data, as shown in [43], we construct the test statistic χ̂_0² in (26), where D̂oF = 2(K − 1)/(κ̂ − 1) is the sample degrees-of-freedom estimator, with κ̂ and σ̂² the sample kurtosis and sample variance estimators, respectively, for {ŷ_c^{(k)}}_k. As explained in [44], we require K ≥ 11 for the asymptotic properties in (25) to hold to at least 1 sigma.
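To make the variance contrast between the two extreme jump models concrete, the following simulation compares small bin-to-bin steps with uniform bin re-selection; the bin count, step bound, and record length are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative sketch: under the slowness condition, the variance of
# uniform bin re-selection (noise-like behaviour) dwarfs the variance of
# small bin-to-bin steps (slow-RW behaviour), i.e. sigma_0^2 >> sigma_1^2.
rng = np.random.default_rng(4)
N, K = 1024, 4000
slow_steps = rng.integers(-2, 3, size=K)        # jumps of at most 2 bins
uniform_bins = rng.integers(0, N, size=K)       # uniform bin re-selection
uniform_steps = np.diff(uniform_bins)

sigma2_slow = np.var(slow_steps)
sigma2_uniform = np.var(uniform_steps)
print(sigma2_uniform / sigma2_slow > 100)   # → True
```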
However, given that the above procedure seeks controlled (in the sense of a 'slow' wander as defined in (6)), but RW-like, jumps in the frequency estimators {f̂_c^{(k)}}_k, any stable random process (in the sense of mean-reversion-type behaviour; |ρ| ≤ 1) bounded on a sufficiently small domain would also be flagged by this test as statistically significant. Thus, Ẑ_0 not only distinguishes between H_0 and H_1, but also between H_2 and H_3.
The hypothesis problem (9) addressed here prompts three questions to be tested:
1. Does the measured frequency exhibit a structure resembling an RW?
2. Assuming that the measured frequency does exhibit an RW-like structure, does that structure wander 'slowly' (in the sense of condition (6)), rather than jumping in a manner that could be characterised by selecting Fourier bins uniformly at random? That is to say, assuming that the answer to Question 1 was affirmative, is that a true RW-like structure, or an artefact of the estimation procedure acting on pure noise?
3. Does the measured frequency revert to the mean, as in an Ornstein-Uhlenbeck process? That is to say, does {f_0^{(k)}}_k wind down? (For good resources on processes like Ornstein-Uhlenbeck processes, we refer the reader to [45].) Mean reversion is likely in astrophysical and GW applications, e.g., signals from rotating neutron stars [46].
In the economics literature, VR-based RWH testing of time series data (such as the LOMAC test [13] or the Chow-Denning test [14]) determines whether the ratio of the variance of the first difference process (16) to the variance of the difference process (17) at lag 2 ≤ τ ≤ K/2 (τ fixed) departs from unity in a statistically significant sense. This determines whether mean-reversion behaviours, as seen in an Ornstein-Uhlenbeck process, are present.
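A minimal variance-ratio sketch follows; the function is a simplified illustration of the VR idea only, not the LOMAC or Chow-Denning statistic with its bias corrections and heteroskedasticity-robust standard errors:

```python
import numpy as np

# Sketch of the variance-ratio idea: for a pure RW, the variance of the
# lag-tau difference equals tau times the variance of the lag-1
# difference, so VR(tau) stays near 1; mean reversion pushes it below 1.
def variance_ratio(series, tau):
    d1 = np.diff(series, 1)
    dtau = series[tau:] - series[:-tau]
    return np.var(dtau) / (tau * np.var(d1))

rng = np.random.default_rng(5)
walk = np.cumsum(rng.normal(size=50_000))

print(round(variance_ratio(walk, 4), 1))   # → 1.0 for a true RW
```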
Whereas alternative RWH testing approaches seek the presence of a unit root against stationarity (such as the DF test [15], or Bhargava's von Neumann ratio test [47] for the finite sample regime), there are stationary processes that appear like RWs from the lens of unit root testing [48].
It has been shown in [49] (and other papers) that the DF unit root test statistic ρ̂ has low power against edge cases (near the unit root) and against trend-stationary processes, resulting in incorrect conclusions being drawn when testing hypotheses for discriminating between an RW and mean-reverting processes (such as Ornstein-Uhlenbeck processes) via unit root tests.
Furthermore, many traditional detectors (such as F test statistics, Hausman test statistics, or Durbin-Watson test statistics) which may be framed to test for (or test against) stationarity tend to yield inconclusive results when tested in an RWH framework [36].
Lastly, it has been shown that the LOMAC and Chow-Denning test statistics are considerably more powerful than the Dickey-Fuller test statistic and other unit root tests [13,14], even in borderline cases such as distinguishing between AR(p) and ARIMA(p,d,q) models; and that the Chow-Denning test statistic is more powerful than the LOMAC [14].
Thus, for the purposes of this study, we consider VR-based approaches (such as LOMAC or Chow-Denning) and not unit root-based approaches (such as DF).
The first question is one inherent in the RWH testing literature, and so, such methodologies try to address this question.
The second question can be covered by the Controlled Variation Test defined above (26), which checks that the first difference process in frequency varies in a sufficiently controlled manner, that is, that it is more than likely not just randomly picking tones in a uniform manner across the bins.
The third question, similarly to the first question, can be covered by an RWH test in determining if the variance of the lagged differences changes over time in a statistically significant sense.

The Šidák correction α_S = 1 − (1 − α)^{1/J} is used for the joint inference of a Chow-Denning test (27) of size J. J is taken as the Šidák correction factor, rather than J + 1 (as the additional independent Controlled Variations test would imply), due to the non-nested nature of the tests [50].
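As a concrete illustration (a sketch, not taken from the paper), the Šidák-corrected per-test level for a family of J independent tests at family-wise false alarm rate α is:

```python
def sidak_alpha(alpha_family: float, J: int) -> float:
    """Per-test significance level such that J independent tests
    jointly achieve the family-wise false alarm rate alpha_family,
    i.e. 1 - (1 - alpha_per_test)**J == alpha_family."""
    return 1.0 - (1.0 - alpha_family) ** (1.0 / J)
```

For example, a size-4 Chow-Denning test at family-wise α = 0.05 would run each component test at roughly the 0.0127 level.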
Set hypothesis tests for the Controlled Variations Test (26) and the (size-J) Chow-Denning Test (27) as (31) and (32), respectively. The joint inference of these hypothesis tests determines which of the four hypotheses described in (9) holds.
Given that the two tests are independent of each other (as per the assumptions that the Chow-Denning test statistic is constructed upon [14]), our testing structure falls under the non-nested hypothesis testing methodologies [50].
Four possible results could arise from this joint inference problem:
• If the test (27) asserts that one cannot reject the RWH, and the test (26) asserts that the tones vary in a sufficiently controlled sense: then one could argue that there is an RW in the frequency.
• If the test (27) asserts that the data reject the RWH, and the test (26) asserts that the tones vary in a sufficiently controlled sense: then one could argue that the frequency follows a stable process that does not resemble an RW (an Ornstein-Uhlenbeck process, for example).
• If the test (27) asserts that one cannot reject the RWH, and the test (26) asserts that the tones do not vary in a controlled sense: then one could argue that there is just noise.
• If the test (27) asserts that the data reject the RWH, and the test (26) asserts that the tones do not vary in a controlled sense: then one could argue that the frequency follows an unstable process that does not resemble an RW.
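The four-way decision logic above can be sketched as a simple lookup (the function and its labels are illustrative, not from the paper):

```python
def classify(reject_rwh: bool, controlled: bool) -> str:
    """Map the outcomes of the size-J Chow-Denning test (27) and the
    Controlled Variations test (26) to the four joint conclusions."""
    if controlled:
        return ("stable non-RW process (e.g. mean-reverting)"
                if reject_rwh else "RW in frequency")
    return ("unstable non-RW process"
            if reject_rwh else "noise only")
```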
To elaborate, should the Chow-Denning test (27) find that the RWH cannot be rejected at the given significance level, then, by Definition 2, the central instantaneous frequency process { f̂_c^(k) }_k, as estimated from the observation process { x_n^(k) }_{k,n}, is RW-like; that is, it has some component behaving in a manner that is indistinguishable from that of an RW under RWH testing methods.
As 'RW-like' (Definition 2) is defined in a binary sense, as is 'stability' (Definition 1), we may take a Fisher-esque philosophical interpretation of the results (i.e., if it is not statistically significant to say that it is A, then we treat it as statistically significant to say it is not A).
The joint inference of the two above tests is summarised with respect to the hypotheses (9), (31) and (32) in Table 1. Each case explicitly covers one of the four possible outcomes of the constructed problem. Thus, the hypothesis tests described in (9) are fully covered by the interpretations proposed above, and it follows that the proposed tests are sufficient to fully specify the considered problem given the assumptions upon which it was constructed.
Proof. Under H 1 , the probability distribution L({ f̂_c^(k) }_k) of the estimated central frequencies is a perturbation of the probability distribution L(u^(k)) of the process {u^(k)}_k (scaled by a factor as per (7)), perturbed by the uniformly-distributed variables {δ^(k)}_k [40]. Then, by local asymptotic normality [43,51], the degrees of freedom estimator D̂oF should converge to the Gaussian case DoF = K − 2 for sufficiently large blockwise sample sizes N, provided the degree of perturbation is not too large. So, we assert that D̂oF → K − 2 as N → ∞. □

Two constructions for characterising the jump structure of the RW-like structure on the pivot frequencies { f_0^(k) }_k are considered for the above modelling. We demonstrated the following convergence properties under simulations.
As shown in Figure A2a,b, under the uniform jump model (22) for the RW-like structure, D̂oF does not converge to K − 2 as N → ∞, whereas under the normal jump model (23) it does (see Figure A2c,d).
Consider a 'slow' branching process generated by normally-distributed variables for the jumps characterising the RW-like structure. Assuming normal jumps (23), the probability of detection, P_D, may then be determined. Applying the Wilson-Hilferty transform to (34) relates the test statistic (26) to its distribution under H 1 ; through algebraic manipulation of the test statistic Ẑ_0 (see Appendix B), this generates a family of asymptotic analytical ROC curves in the parameters (α, J, N, K, T, SNR). Given the analytical performance limits presented in Proposition 1, we then ask for which sample sizes and signal-to-noise ratios these bounds reflect the performance of the model detection methodology in practice. Figure A5a,b show the rate of convergence of the performance with respect to increasing blockwise sample size, at a given signal-to-noise ratio.
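For reference, the Wilson-Hilferty transform maps a chi-square variate to an approximately standard normal one via a cube root. A minimal sketch follows (the chi-square simulation and parameter values are for illustration only):

```python
import math
import random

def wilson_hilferty(x, k):
    """Approximate standard-normal deviate for x ~ chi-square(k) via
    the Wilson-Hilferty cube-root transform:
    (x/k)**(1/3) ~ N(1 - 2/(9k), 2/(9k))."""
    c = 2.0 / (9.0 * k)
    return ((x / k) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)

# Sanity check: transformed chi-square draws should be close to N(0, 1).
random.seed(1)
k = 10
z = [wilson_hilferty(sum(random.gauss(0, 1) ** 2 for _ in range(k)), k)
     for _ in range(20000)]
mean = sum(z) / len(z)
var = sum(t * t for t in z) / len(z) - mean ** 2
```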
As Figure A5a,b show, the analytical performance bounds presented in Proposition 1 represent the practical performance of the detectors in the discrete sample regime (N ≤ 128) for signal-to-noise ratios above −10 dB.
However, simulations failed to provide results for signal-to-noise ratios of −20 dB or less, most likely due to the non-optimal estimation procedures used (this being a proof of concept for the verification method). Further, the simulated ROC curves did not appear to converge to their theoretical counterparts in the very-low SNR regime. It therefore seems likely that the analytical performance bounds presented in Proposition 1 fail to represent the performance of the detectors for signal-to-noise ratios of −20 dB or less.

Now, the Chow-Denning test statistic V_1 (27) of size J, constructed on the pivot frequencies across a recording of K blocks, follows a Studentised Maximum Modulus SMM(µ, ν) distribution with parameters µ = J, ν = K − 1 [12,14].
Following the same logic by which the Chow-Denning test statistic assumes the independence of the LOMAC test statistics from which it is constructed [14], assume the independence of the Chow-Denning test statistic V_1 (27) and the Controlled Variations test statistic Ẑ_0 (26). Given this assumption, we may define the detection probability of the joint tests, under appropriate Šidák corrections to the false alarm rate, as the product P*_D := P(Ẑ_0 < γ_1, V_1 > γ_2 | H 1 ), recalling that, as noted above, V_1 ∼ SMM(J, K − 1).
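Critical values of the SMM(J, K − 1) distribution can be tabulated or simulated. A minimal Monte Carlo sketch follows; the construction (the maximum of J independent |N(0,1)| variates, studentised by a common chi(ν)/√ν denominator) is the standard definition of the Studentised Maximum Modulus, while the parameter values and sample counts are illustrative:

```python
import math
import random

def smm_quantile(J, nu, alpha, n_sim=20000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of SMM(J, nu): the maximum of
    J independent |N(0,1)| variates, divided by an independent
    chi(nu)/sqrt(nu) denominator."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sim):
        s = math.sqrt(sum(rng.gauss(0, 1) ** 2 for _ in range(nu)) / nu)
        m = max(abs(rng.gauss(0, 1)) for _ in range(J))
        draws.append(m / s)
    draws.sort()
    return draws[int((1.0 - alpha) * n_sim)]
```

A size-J Chow-Denning test would then compare V_1 against smm_quantile(J, K - 1, alpha).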

Discussion
When performing a model verification test as shown above, to a specified α ∈ (0, 1), the analytical performance bounds (as defined in (33), and illustrated in Figures A3a and A4) allow one to partition the data optimally across blocks, to better prime the data for analysis, by the following procedure:
• Determine the sampling rate f_s given the Nyquist frequency of the hypothesized signal;
• Given an SNR > −20 dB, fix N ∈ [N_undersampling, N_oversampling] for which the estimator converges to within a neighborhood smaller than the discretisation;
• Take minimal K ≥ 11 (while maximising J ∈ {1, …, K/2 − 1}) such that P_D is optimal.
If the signal-to-noise ratio is not greater than −20 dB, however, then a different estimation procedure would need to be considered.
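The procedure above can be sketched as a hypothetical helper (not from the paper): the divisibility requirement and the SNR guard encode the bullet points, with J set to its maximal value K/2 − 1.

```python
def partition_plan(n_samples, snr_db, K_min=11):
    """Choose the smallest number of blocks K >= K_min that divides
    the record, then the largest Chow-Denning size J in
    {1, ..., K//2 - 1}. The SNR guard reflects the regime in which
    the analytical bounds of Proposition 1 were observed to hold."""
    if snr_db <= -20:
        raise ValueError("outside the validated SNR regime; a "
                         "different estimation procedure is needed")
    for K in range(K_min, n_samples + 1):
        if n_samples % K == 0:
            N = n_samples // K      # blockwise sample size
            J = K // 2 - 1          # maximal test size
            return K, N, J
    raise ValueError("record too short to form K_min blocks")
```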
However, as mentioned in Section 2, the drawback is that the detection method only works given a few key assumptions:
• The blockwise phases {φ_n^(k)}_n are quadratic;
• The model by which the jumps generate the RW-like structure must be standard normal;
• The period in which the blockwise frequency modulation effectively occurs is known.
That is to say, the temporal location of (at least) one of the pivots f_0^(k) must be known with sufficient accuracy to align the record with the structure in the derivation; that is, the pivot frequencies do indeed occur at {t_0^(k)}_k. Note, however, that while the degrees of freedom estimator did not converge to the theoretical values under the discrete uniform jump model, the ROC curves still provide intuition on how to partition the data for analysis.

Conclusions
A model verification test has been designed and shown in simulations to operate as anticipated. The test generalises the temporal RWH testing from the economics literature to test for an RW in frequency, by introducing an additional test to account for phase-wrapping behaviours and for how the peak frequencies of sampled WGN behave.
The use of such a model verification test should reduce the need for more heuristic forms of vetting false-positive detections.
We have yet to explore the impact of biases (such as a mean-reverting behaviour as exhibited under H 2 ) on the detection procedure above. Further simulations and modelling are needed to understand how biases affect the ability to determine the presence of RW-like structures in tonal variations.
The above modelling is done for the case where pivots in frequency space are discrete in time. It would be worthwhile to extend the above model verification procedure for Brownian motion-like (BM) structures, with continuous randomness in tonal variations.
Additionally, the above derivations are formulated only for linear blockwise frequency modulations. It would not be complicated to generalise to polynomial blockwise frequency modulations, refining the model verification problem to an SID problem with respect to model order.
Lastly, one normally considers the apparent randomness of behaviours in periodic parameters (such as tones) when interested in forecasting (estimating and/or detecting) a potential trend in the signal. That is, for the model presented of the underlying object creating the perceived phenomenon, do the data admit a long-term trend?
A key issue in this problem would be the large number of similar models that could be determined to occur under the same case from the hypothesis structure explored in this paper.
In a follow-up paper, we intend to examine how the question of trend-based behaviours enters each of the outcomes of the hypothesis test, how one would distinguish between these different trend-based behaviours, and how to test whether the data support such models.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. First Difference Variance Derivations
Under H 0 , the carrier frequency estimators { f̂_c^(k) }_k are distributed according to (18), and so taking the first two moments yields (A1) and (A2). Thus, the variance under H 0 follows from (A1) and (A2) as (A3). Given that, under H 0 , the central frequency estimators { f̂_c^(k) }_k are serially independent, the algebra of variances gives the variance of the first difference estimator under H 0 as (A4).

Under the above assumptions, by the algebra of variances and the serial independence of the jump process {u^(k)}_k, for the distributions considered for the jump process,

E[(u^(k+1))² | H 1 ] = 2/3 for uniform jumps, and 13/12 for normal jumps.

Thus, the first difference variances under H 1 follow as

Var = (2/3)(1/(NT))² for uniform jumps, and (13/12)(1/(NT))² for normal jumps. (A8)
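The two second moments above are consistent with, for example, a discrete uniform jump over {−1, 0, 1} (E[u²] = 2/3) and a standard normal jump perturbed by a uniform quantisation error on [−1/2, 1/2] (E[u²] = 1 + 1/12 = 13/12). These specific jump models are an assumption made here for illustration, not a statement of the paper's definitions; a quick Monte Carlo check:

```python
import random

def mc_second_moment(draw, n=200000, seed=2):
    """Monte Carlo estimate of E[u^2] for a jump variable sampled by
    draw(rng)."""
    rng = random.Random(seed)
    return sum(draw(rng) ** 2 for _ in range(n)) / n

# Assumed illustrative jump models (not taken from the paper):
def uniform_jump(rng):
    # discrete uniform over {-1, 0, 1}: E[u^2] = 2/3
    return rng.choice((-1.0, 0.0, 1.0))

def normal_jump(rng):
    # standard normal plus uniform quantisation error: E[u^2] = 13/12
    return rng.gauss(0, 1) + rng.uniform(-0.5, 0.5)
```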

Appendix B. Probability of Detection Derivation
To verify H 0 at a false alarm rate α, we check Ẑ_0 against some threshold γ such that, if Ẑ_0 ≥ γ, we fail to reject H 0 at false alarm rate α.
If, however, a normal jump structure is assumed, then, given that under H 1 the statistic Ẑ_1 is asymptotically distributed as N(0, 1), the detection probability follows directly. Lastly, recall that, in the case of a normal jump structure, the first difference variances take the form derived in Appendix A (A8).
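As a minimal numerical sketch of this detection calculation (using Python's statistics.NormalDist; the H 1 mean shift below is a hypothetical stand-in for the SNR- and sample-size-dependent quantity derived above, not the paper's expression):

```python
from statistics import NormalDist

def threshold_for_far(alpha: float) -> float:
    """gamma with P(Z < gamma | H0) = alpha for a standard normal
    statistic, matching the convention P_D := P(Z_hat_0 < gamma | H1)."""
    return NormalDist().inv_cdf(alpha)

def detection_probability(gamma: float, mu1: float,
                          sigma1: float = 1.0) -> float:
    """P_D = P(Z < gamma | H1) for a Gaussian alternative with
    (hypothetical) moments mu1, sigma1."""
    return NormalDist(mu1, sigma1).cdf(gamma)
```

Sweeping α (and hence γ) through (0, 1) traces out one ROC curve of the family described by Proposition 1.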