Geometric Approach to Analytic Marginalisation of the Likelihood Ratio for Continuous Gravitational Wave Searches

: The likelihood ratio for a continuous gravitational wave signal is viewed geometrically as a function of the orientation of two vectors; one representing the optimal signal-to-noise ratio, and the other representing the maximised likelihood ratio or F -statistic. Analytic marginalisation over the angle between the vectors yields a marginalised likelihood ratio, which is a function of the F -statistic. Further analytic marginalisation over the optimal signal-to-noise ratio is explored using different choices of prior. Monte-Carlo simulations show that the marginalised likelihood ratios had identical detection power to the F -statistic. This approach demonstrates a route to viewing the F -statistic in a Bayesian context, while retaining the advantages of its efﬁcient computation.

Given an assumed signal model-a quasisinusoid that evolves with the rotation frequency of the neutron star, and is modulated by the relative motion between the star and an Earth-based detector-a matched filter can be constructed to achieve maximum detection power, in the Neyman-Pearson [31] sense of maximising the probability of detection (true positive) at a given probability of false alarm (false positive). Furthermore, as first shown in [32], the matched filter likelihood ratio can be analytically maximised over four amplitude parameters A 1 , A 2 , A 3 , A 4 , resulting in the well-known F -statistic.
The Bayesian approach to signal detection and parameter inference has become central to gravitational-wave astronomy, e.g., [33]. It was recognised in [34,35] that maximisation over signal parameters can bias detection statistics: from the Bayesian viewpoint, maximisation implicitly assumes prior probabilities for the maximised parameters, which may not be physically motivated.
In [36], the F -statistic is shown to possess such a bias due to analytic maximisation over the four amplitude parameters. The A 1 , A 2 , A 3 , A 4 are functions of four physical parameters of the continuous wave signal model: the overall signal strength h 0 ; the inclination ι and polarisation ψ angles, which orient the neutron star rotation axis relative to the observer; and the signal phase φ 0 at some reference time. Given no prior knowledge of the orientation of the neutron star, or the signal phase, one would assume uniform priors on cos ι, ψ, and φ 0 ; and the absence of detections of continuous waves to date is consistent with a choice of prior on h 0 , which prefers weaker signals to stronger ones. The F -statistic, however, implicitly adopts priors which prefer stronger signals (i.e., larger h 0 ) compared to weaker ones. It is also biased in favour of linearly polarised signals where cos ι ∼ 0 (i.e., the neutron star is viewed "edge-on" with the rotation axis at right angles to the line of sight) compared to circularly polarised signals where | cos ι| = 1 (i.e., the neutron star is viewed "face-on" with the rotation axis parallel to the line of sight).
By instead marginalising the likelihood ratio over h 0 , cos ι, ψ, and φ 0 with physicallymotivated priors, [36] introduced the B-statistic, a Bayesian alternative to the F -statistic. Monte-Carlo simulations were performed to estimate the receiver-operator curve, which plots the probability of detection against the probability of false alarm. The B-statistic was found to be a more powerful detection statistic than the F -statistic, assuming a signal population where the distributions of cos ι, ψ, and φ 0 are consistent with the B-statistic priors [35].
A practical downside of the B-statistic is that, to date, a convenient analytic expression for the marginalised likelihood ratio has not been found, and therefore the marginalisation must be performed by numerical integration. This puts the B-statistic at a disadvantage with respect to the F -statistic, for which computationally efficient implementations exist [32,[37][38][39]. Past work has sought to address this issue though transformation of the amplitude parameters to new coordinate systems, and approximations to the marginalisation integrals in various limits [40][41][42][43].
This paper presents an alternative route to marginalising the likelihood ratio for continuous gravitational wave searches. A geometric view of the likelihood ratio is presented in Section 2, which permits analytic marginalisation over its parameters in Section 3. Receiver-operator curves for the marginalised likelihood ratio are presented in Section 4, and a discussion in Section 5 concludes the paper.

Geometric View of the Likelihood Ratio
Gravitational waves detectors measure strain-the differential displacement between test particles due to a passing gravitational wave. The strain due to a continuous wave signal may be written as [32] h(t, A, p) = A · h(t, p) , where A ∈ R 4 is a vector of the amplitude parameters, and h(t, p) ∈ R 4 is a vector of timedependent basis functions. 1 Additional parameters p of h encode the phase modulation of the continuous wave signal: these typically include Taylor coefficients of the evolution of the gravitational wave frequency, the position of the neutron star in the sky, and if necessary the parameters of the orbit of the neutron star around a companion. The likelihood ratio for continuous waves arises from considering two hypotheses: that the data x(t) consist only of Gaussian stationary noise, with single-sided power spectral density S; or that the data additionally contain a signal specified by Equation (1). The log-likelihood ratio between the two hypotheses is then [32,44] A search for a continuous wave is performed by repeated computation of Equation (2) for different choices of p, corresponding to different choices of signal hypothesis. Typically, a fixed set of p, called a template bank, is constructed in such a way as to ensure any signal in x(t) matches at least one of the signal hypotheses with low loss in signal-to-noise ratio, typically 30% [45]. A metric on the parameter space of p is often used in constructing template banks [44,[46][47][48].
The elements of the vector X(x; p) ∈ R 4 in Equation (2) are inner products (normalised by S) of the data x(t) with the basis functions h(t, p). The elements of the matrix M ∈ R 4 ⊗ R 4 are inner products of the h(t, p) with each other. The typical time-span of data searched for continuous waves (days to years) far exceeds the time-scale of oscillations in h(t, p) due to the gravitational wave frequency (∼1-10 3 Hz); as a result, some inner products between the h(t, p) quickly average to zero. The remaining non-zero elements of M are [32,49,50] (3) The element E = 0 under the assumption that the gravitational wavelength is much larger than the size of the detector; this holds for terrestrial gravitational-wave interferometers, though not for proposed space-based detectors [49,50]. The elements A, B, and C can be expressed as inner products between two functions a(t, p) and b(t, p), which are related to the response of the gravitational wave detector to the two fundamental polarisations-"plus" and "cross"-of gravitational waves in general relativity. The matrix M is symmetric and positive definite [32]. It follows that its four leading principal minors D 1 , D 2 , D 3 , and D 4 are all strictly positive: It also follows that M possesses a Cholesky decomposition: a lower triangular matrix N ∈ R 4 ⊗ R 4 such that M = N N T , where N T is the transpose of N . The elements of N are given in terms of the elements of M and the leading principal minors D 2 and D 3 : The log-likelihood ratio of Equation (2) can then be re-expressed as where b 2 ≡ b · b defines the vector norm. The lengths of the vectors b and y(x; p) are related to two well-known quantities. The length of b is proportional to the optimal signal-to-noise ratio of the matched filter (cf. [44], Equation (24)): The length of y(x; p) is proportional to the F -statistic 2 (cf. [44], Equation (19)): be the cosine of the angle between b and y(x; p). The substitution of Equations (9)-(11) into Equation (8) gives As shown in Figure 1, the log-likelihood ratio may be viewed geometrically as a function of the relative orientation of two vectors. One vector, y(x; p), is a function of the data x(t), and represents the matched filter; the other vector, b, represents the expected signalto-noise ratio. Maximisation of the log-likelihood ratio with respect to A is equivalent to aligning b and y(x; p): maximising Equation (12) with respect to ρ gives and Equations (12) and (13) are maximised when κ = 1, i.e., when b and y(x; p) are parallel:

Analytic Marginalisation of the Likelihood Ratio
Instead of maximising the likelihood ratio with respect to κ and ρ, one could marginalise over these parameters with suitable priors. Marginalisation over κ is performed in Section 3.1, followed by marginalisation over ρ, considering different choices of prior, in Section 3.2.  (6)) and y(x; p) (Equation (7)), their lengths (Equations (9) and (10)), and the cosine κ of the angle between them (Equation (11)).

Marginalisation over κ
In the absence of a deeper understanding of the relationship between b and y(x; p), it is not unreasonable to adopt a prior on κ that assumes no preferred orientation between the two vectors. The prior on κ is then given by the distribution of u · v, where u ∈ R 4 and v ∈ R 4 are unit vectors uniformly distributed on the 3-sphere S 3 ⊂ R 4 .
By invoking spherical symmetry, one can, without a loss of generality, fix one vector, say u = (1, 0, 0, 0). The problem then reduces to finding the distribution of u · v = v 1 . It is well known [51] that a vector uniformly distributed on the (d − 1)-sphere S d−1 ⊂ R d may be found by generating a vector z ∈ R d whose elements are independent standard normal variates, then normalising z to unit length. Applying this procedure to v, the square of its first element is, therefore, The distributions of z 2 1 and z 2 2 + z 2 3 + z 2 4 are chi-squared distributions with 1 and 3 degrees of freedom, respectively. It follows that the distribution of κ 2 ∼ v 2 1 is a beta distribution with the parameters α = 1/2, β = 3/2: To find the distribution of κ, perform a change of variables and expand the range of the distribution to [−1, 1]: Marginalisation of the likelihood ratio, in the form of Equation (12), over κ with the prior of Equation (18) gives the analytic expression where I n is the modified Bessel function of the first kind of order n. This function of ρ and F is plotted in Figure 2. When ρ is fixed, Λ(x; ρ, F ) is a monotonically increasing function of F . When F is fixed, Λ(x; ρ, F ) monotonically decreases as a function of ρ for F ≤ 2, but achieves a local maximum at some ρ > 0 for F > 2.

Marginalisation over ρ
The marginalised likelihood ratio of Equation (19) may be further analytically marginalised over ρ, depending on its choice of prior. For example, the choice of a uniform (improper) prior on ρ, leads to This is a strictly increasing function of F , and is plotted in Figures 3 and 4. Another possible choice is an exponential prior on ρ: with parameter ρ 0 . This choice of prior is consistent with the assumption that the signal-tonoise ratio of continuous wave signals is weak, with the most likely value at ρ = 0, and most values at ρ ρ 0 . Figure 3 plots the exponential priors for choices of the parameter ρ 0 ; larger values of ρ 0 lower the peak at ρ = 0 and flatten out the distribution. Marginalisation of Equation (19) with the exponential prior on ρ results in This is a strictly increasing function of F and ρ 0 and it is plotted alongside Λ unif. (x; F ) in Figure 3.
with parameter ρ c . This function is plotted in Figure 3 for choices of ρ c ; it has a peaked shape, with the maximum occurring at ρ = ρ c . This choice of prior is consistent with the assumption that the signal-to-noise ratio of continuous wave signals has some preferred value around ρ ≈ ρ c , as might be expected if neutron stars possess a minimum ellipticity [29]. Marginalisation of Equation (19) with this peaked prior on ρ leads to This is a strictly increasing function of F and ρ c and is plotted alongside Λ unif. (x; F ) in Figure 4.  (24)) as a function of ρ, for choices of the parameter ρ c . Bottom: the likelihood ratio marginalised over κ and ρ as a function of F , with the peaked prior on ρ (Equation (25)) for choices of ρ c , and with the uniform prior (Equation (21)). Figure 5 plots the likelihood ratios Λ exp. (x; F , ρ 0 ) and Λ peak. (x; F , ρ c ) marginalised over κ and ρ with the exponential and peaked priors, respectively, as functions of the priors' respective parameters ρ 0 and ρ c . The likelihoods are evaluated at F = 2, the expectation value of F assuming no signal is present. The behaviour of the likelihood ratios at F = 2 gives some indication of which hypothesis is favoured in the absence of evidence for a signal. Both likelihood ratios favour the noise hypothesis (Λ < 1) for strictly positive parameter values. The limiting behaviour at zero parameter values are: Figure 5. The likelihood ratios marginalised over κ and ρ with the exponential (Equation (23)) and peaked (Equation (25)) priors, as functions of the priors' respective parameters ρ 0 and ρ c , at fixed F = 2.

Receiver-Operator Curves
In Section 3.2, all three likelihood ratios marginalised over ρ (Equations (21), (23) and (25)) were found to be strictly increasing functions of F . This implies that each marginalised likelihood ratio will have the same detection power as the F -statistic.
Detection power is most commonly determined by Monte-Carlo simulations of the detection statistic (e.g., F ), in both the absence and presence of a signal. First, a set of random values of F is generated, assuming no signal is present. A threshold F is determined that gives a chosen false alarm probability, p f.a. : the fraction of simulated trials where F | no signal < F . Then, a second set of random values of F is generated, this time assuming the presence of a signal. Finally, the detection probability p det. is determined: the fraction of simulated trials where F | signal > F . The receiver-operator curve is the function p det . F (p f.a. ) . The most powerful detection statistic is that which gives the largest p det. at a given p f.a. .
If g(F ) is a strictly increasing function of F , then, by definition, F | no signal < F implies g(F )| no signal < g(F ), and F | signal > F implies g(F )| signal > g(F ). Hence, by applying g(·) to all simulated values of F , the transformed threshold g(F ) will yield the same false alarm and detection probabilities, and therefore g(F ) will have the same detection power as F .
To confirm, receiver-operator curves are computed for the F -statistic, B-statistic, and the likelihood ratio Λ unif. (x; F ) marginalised over κ and ρ with the uniform prior on ρ. Following [36], the elements of M are fixed at A = 0.154, B = 0.234, C = −0.0104, and E = 0, and four signal populations are chosen: (i) fixed ρ = 4, cos ι = 0 (i.e., the neutron star is viewed "edge-on"), ψ = 0; (ii) fixed ρ = 4, cos ι = 0.99 3 (i.e., the neutron star is viewed "face-on"), ψ = 0; (iii) fixed ρ = 4, randomly drawn cos ι ∈ [−1, 1], ψ ∈ [−π/4, π/4]; and (iv) fixed h 0 √ T/S = 10, randomly drawn cos ι ∈ [−1, 1], ψ ∈ [−π/4, π/4]; where T = 25 hours. For all signal populations, φ 0 was randomly chosen from [0, 2π]. For the no-signal population, and for each of the signal populations, 10 5 random values of F and B were generated using the program lalapps_synthesizeBstatMC from the software package LALSuite [52]; values of Λ unif. (x; F ) were then computed from F using Equation (21). Figure 6 shows receiver-operator curves for the four signal populations listed above. The curves for the F -statistic and B-statistic reproduce Figures 2 and 3 of [36]. The curves for Λ unif. (x; F ) overlay the corresponding curves for F , confirming that Λ unif. (x; F ) has identical detection power to the F -statistic. Receiver-operator curves for Λ exp. (x; F , ρ 0 ), and Λ peak. (x; F , ρ c ) were computed, for various choices of ρ 0 and ρ c , respectively, and were found to be identical to the curve for Λ unif. (x; F ). Figure 7 compares the distribution of κ computed from the Monte-Carlo samples using Equation (11) with the assumed prior of Equation (18). The Monte-Carlo distribution is a good fit to the prior for cos ι = 0, and a poor fit for cos ι ∼ 1; for the two signal populations where cos ι was randomly drawn, the fit is intermediate between the two extremes. This suggests that the initial choice of prior on κ, which assumed no preferred orientation between the vectors b and y(x; p), is biased in favour of linearly polarised signals. This is consistent with Λ unif. (x; F ) being of equivalent detection power to the F -statistic, which, as noted in [36], is also biased in favour of linearly polarised signals.

Discussion
This paper presents an alternative approach (cf. [40][41][42][43]) to analytically marginalising the likelihood ratio used in continuous wave searches. Marginalised likelihood ratios were derived assuming a prior on κ, and for example priors on ρ. The expressions for the marginalised likelihood ratios are in analytic form, involving only exponential and Bessel functions. Receiver-operator curves show that the marginalised likelihood ratios have the same detection power as the F -statistic, being strictly increasing functions of F .
The marginalised likelihood ratios fail to capture the additional detection power of the B-statistic for signal populations with randomly drawn cos ι. That said, as shown in [36] and reproduced in Figure 6, the advantage of the B-statistic over the F -statistic appears to be slight. A slightly higher detection probability, from using using the B-statistic instead of the F -statistic, corresponds to slightly smaller h 0 at which continuous waves can be detected at a given confidence 1 − p det. . This small difference, however, could well be relatively insignificant; for example, it could be within the error in h 0 due to the calibration uncertainty of gravitational wave detectors [53]. Computationally efficient implementations of detection statistics, as exist for the F -statistic, are essential for wide-parameter-space, computationally-costly searches. To date, the advantage of the B-statistic in terms of detection power have not outweighed its disadvantage in terms of computational efficiency, and no wide-parameter-space search for continuous wave has been performed by computing the B-statistic directly.
It should also be noted that the B-statistic, as presented in [36], assumes a particular emission model for continuous waves: the triaxial model, where the neutron star radiates at twice its rotation frequency, and the amplitudes of the "plus" and "cross" polarisations are given by h 0 (1 + cos 2 ι)/2 and h 0 cos ι, respectively. In the absence of a continuous wave detection, however, one cannot be certain whether this is the correct emission model. Continuous wave radiation at other frequencies [54][55][56] are modelled by different expressions for the "plus" and "cross" polarisations in terms of h 0 , cos ι, and other parameters.
It is possible that the detection power of the marginalised likelihood ratios could be improved by a different choice of prior on κ. As seen in Figure 7, the prior initially assumed in Section 3.1 is not necessarily a good fit, depending on the distribution of cos ι. If a simple analytic expression for the distribution of κ computed from the Monte-Carlo samples could be determined-either from first principles, or simply as an empirical fit-it is possible that the likelihood ratio marginalised over κ could still be expressed analytically.
In marginalising the likelihood ratio over the parameters κ and ρ, it was assumed that the priors on these variables, p(κ) and p(ρ), were independent. Figure 7, however, shows that the prior on κ should be a function of cos ι, and since ρ also depends on cos ι, a joint prior p(κ, ρ) might be needed in order to increase the detection power beyond the F -statistic. It is unclear, however, whether a simple but physically motivated joint prior could be found that still permits analytic marginalisation of the likelihood ratio. A joint prior would also make it more difficult to change the prior on ρ, should one wish to consider different models for the population of continuous wave signals.
Even if the parameterisation of the likelihood ratio in terms of κ, ρ, and F does not prove a fruitful route to obtaining an analytic expression for the B-statistic, it could nevertheless provide a useful method of incorporating the F -statistic into a Bayesian framework. The marginalised likelihood ratios presented in Section 3 are readily computed, as they are a function only of the F -statistic and well-known special functions. These are able to harness the computational efficiency of existing implementations of the F -statistic, while permitting an assumption of a prior on ρ that is more physically reasonable than the prior implicit in the F -statistic, which is biased toward stronger signals [36]. More physically reasonable priors on ρ than the examples explored here, such as the Fermi-Dirac prior of [57], could be amenable to analytic marginalisation through this approach.
An example of where a Bayesian treatment of the F -statistic could be interesting is inferring properties of the population of Galactic neutron stars. While methods have been proposed for inferring properties from an ensemble of known pulsars [58,59], a similar framework does not yet exist for wide-parameter-space searches. Traditionally, such searches have computed an upper limit on h 0 satisfying the following property: given a false alarm probability (typically 1%, taking into account the trials factor of the search), and assuming a population of signals with constant h 0 (and other parameters chosen at random from physical priors), a high fraction of the signal population (typically 90-95%) would have been detected. It is not expected, however, that the population of Galactic neutron stars are all radiating gravitational waves at the same h 0 , and it is not clear what may be inferred from the upper limit on h 0 .
Perhaps, instead, a framework could be developed to compute posteriors on the parameters of an assumed model for the distribution of h 0 ; for example, assuming the exponential prior of ρ of Equation (22), and inferring the posterior on its parameter ρ c from a wide-parameter-space search. The approach to marginalisation of the likelihood ratio presented in this paper might provide a route toward constructing this framework.