Search for Extreme Mass Ratio Inspirals Using Particle Swarm Optimization and Reduced Dimensionality Likelihoods

: Extreme-mass-ratio inspirals (EMRIs) are significant observational targets for space-borne gravitational wave detectors, namely, LISA, Taiji, and Tianqin, which involve the inspiral of stellar-mass compact objects into massive black holes (MBHs) with a mass range of approximately 10 4 ∼ 10 7 M ⊙ . EMRIs are estimated to produce long-lived gravitational wave signals with more than 10 5 cycles before plunge, making them an ideal laboratory for exploring the strong-gravity properties of the spacetimes around the MBHs, stellar dynamics in galactic nuclei, and properties of the MBHs itself. However, the complexity of the waveform model, which involves the superposition of multiple harmonics, as well as the high-dimensional and large-volume parameter space, make the fully coherent search challenging. In our previous work, we proposed a 10-dimensional search using Particle Swarm Optimization (PSO) with local maximization over the three initial angles. In this study, we extend the search to an 8-dimensional PSO with local maximization over both the three initial angles and the angles of spin direction of the MBH, where the latter contribute a time-independent amplitude to the waveforms. Additionally, we propose a 7-dimensional PSO search by using a fiducial value for the initial orbital frequency and shifting the corresponding 8-dimensional Time Delay Interferometry responses until a certain lag returns the corresponding 8-dimensional log-likelihood ratio’s maximum. The reduced dimensionality likelihoods enable us to successfully search for EMRI signals with a duration of 0.5 years and signal-to-noise ratio of 50 within a wider search range than our previous study. However, the ranges used by both the LISA Data Challenge (LDC) and Mock LISA Data Challenge (MLDC) to generate their simulated signals are still wider than the those we currently employ in our direct searches. Consequently, we discuss further developments, such as using a hierarchical search to narrow down the search ranges of certain parameters and applying Graphics Processing Units to speed up the code. These advances aim to improve the efficiency, accuracy, and generality of the EMRI search algorithm.

Time-frequency methods provide a straightforward solution for detecting high SNR signals without the need for waveform models.Once the signal tracks in the time-frequency plane are well fitted, the waveform models can be used to estimate a subset of source parameters [26,27].The advantage of this approach is that it is computationally cheap.However, the disadvantage is that it requires a lot of tuning for the threshold fitting in the timefrequency plane, and it is difficult to detect signals with a low SNR.Recently, Convolutional Neural Network (CNN)-based methods have been developed, where different inputs such as time domain data [28], frequency domain data [29], and time-frequency planes by Q-transform [30,31] are fed to the neural network.These methods provide an alternative computationally efficient solution for EMRI data analysis, but they are still limited to high SNR signals.
Template-based matched filtering is the best option for a deeper search in SNRs, although it is computationally expensive.In EMRI data analysis, accurate EMRI waveforms are quite complicated and computationally expensive when considering the self-force of the COs [32].As a result, phenomenological waveforms from the kluge family are widely used at present in the development of EMRI data analysis methods.The analytical kluge (AK) waveform [33] is used in the Mock LISA Data Challenges (MLDCs) [34][35][36][37] and the latest LISA Data Challenge (LDC) [38], while the augmented analytical kluge (AAK) waveform [39] is used in the Taiji data challenge [40].The AK waveform includes 14 parameters, with the spin of the COs usually being ignored.Six of these parameters contribute to the phase evolution of the waveform and need to be estimated with high precision, thus contributing a prominent 6-dimensional sharp peak to the signal location in the parameter space of the fitness function.The AK waveform consists of a superposition of multiple harmonics, resulting in multiple secondary peaks surrounding the primary one in the parameter space [41].The primary peak indicates a good match of all the harmonics, while the secondary peaks indicate that a subset of harmonics is matched well, especially the dominant ones.Therefore, it is difficult for a global optimizer to locate a complete signal in such a high-dimensional and multimodal parameter space.
It is well known that longer-duration signals contribute more sensitivity and less flexibility to coherent matched filtering [42], and the sharp peak can only be located within a reasonable range width [43].As a result, hierarchical search methods are effective in overcoming the methodological difficulties in EMRI data analysis.It can be implemented by either using shorter-duration signals and gradually turning to longer signals with the constrained information utilized in the next search [42,44] or by initially searching for fixed-duration signal within a wide range and later focusing on narrower ranges extracted from the previous searches [43] in matched filtering.It is also beneficial to develop mixed versions by combining these two approaches together.
Given the fitness function usually defined by the log-likelihood ratio (LLR), Bayesian [45] or Fisherian methods [46] are the most commonly used ones for estimating the posterior probability density function or the global best-fit fitness value and location, which are then used for signal detection and parameter estimation.In EMRI data analysis, modified Markov Chain Monte Carlo (MCMC) methods, such as constrained Metropolis-Hastings Monte Carlo (MHMC) [44], Evolutionary Monte Carlo (EMC) [47], and parallel tempered Markov Chain Monte Carlo (PTMCMC) [48,49], have been used in previous works.The global optimizer, Particle Swarm Optimization (PSO), first proposed in [50,51] and validated by [52,53], was used in our previous work for a 10-dimensional EMRI search problem [54] and was proven effective in the LIGO data analysis of inspiral signals [55][56][57][58] and transient signals [59][60][61][62], the pulsar timing array data analysis of supermassive black holes [63][64][65][66][67][68], and the LISA data analysis of galactic binaries [69][70][71][72][73].Besides gravitational wave data analysis, it is also widely used in other fields, such as electromagnetics [74,75].More comprehensive discussions can be seen in [76][77][78][79].The advantages of PSO are its fewer tunable parameters and smaller number of LLR evaluations needed to reach the global LLR maximum compared with the MCMC method.In this paper, we extend the application of PSO to an EMRI search problem with two different dimensions: an 8-dimensional search and a 7-dimensional search, respectively.Our results demonstrate that the PSO-based search algorithm is able to accurately estimate the simulated signals with an SNR value of 50 and a duration of 0.5 years by using these reduced dimensionality LLRs.Notably, it should be emphasized that the current search ranges employed are substantially broader than those utilized in our previous work, resulting in a significant increase in the parameter space volume, approximately ∼50-fold for the 8-dimensional search and ∼100-fold for the 7-dimensional search.
The rest of the paper is organized as follows.In Section 2, we describe the consistent TDI combinations, noise model, and the signal model as LDCs used in the paper.In Section 3, we present how the reduced dimensionality likelihoods are defined.The Particle Swarm Optimization algorithm used for matched filtering is illustrated in Section 4. Finally, in Section 5, we report the results and give the corresponding discussions in Section 6.

Data Description
First, we describe the application of time-delay interferometry (TDI) [80] in this paper, which is employed by space-based GW detectors to mitigate the dominant laser frequency noise.Subsequently, we present the theoretical model of power spectral densities (PSDs) utilized by LDCs.Lastly, we provide a description of the current standard waveform model employed for EMRI data analysis.

TDI Combinations
Throughout the paper, we adhere to the coordinate and TDI conventions defined in [81].Given the definitions of the polarization tensors ϵ +,× and LISA orbit, we can derive the corresponding geometrical quantities n l and R k from the orbit.Here, n l represents the unit vector along the arm link l between the two involved satellites, and R k denotes the position vector of the k-th satellite.The sky's location, θ s and ϕ s , can be used to define the unit vector k which indicates the direction of the GW propagation.The antenna patterns F +,× l of the single arm l are given by where ψ is the polarization angle and the quantities U +,× l are defined by The symbol : denotes the contraction operation on arbitrary tensors U and V, namely, By mapping the antenna patterns F +,× l to the polarized waveforms h +,× , we can express the corresponding strain response of the arm l as Φ l : The expression for the single-arm response of the laser along the arm l can then be given as follows: where the labels s and r represent the laser sender and receiver of the satellite, respectively, and L l is the corresponding arm length of link l.The sign of l is positive when the label slr follows a cyclic permutation of indices 1 → 2 → 3 → 1 labeling the three satellites; otherwise, it is negative.By following the well-designed optical path of the TDI combinations X, Y, and Z of the first generation, the laser frequency noise can be canceled under the approximation of a constant arm length.This cancellation is achieved by linearly combining the artificially delayed single-arm responses y slr,L l , as shown below: where the y slr,L l are linked to the single-arm responses y slr through y slr,L l (t) = y slr (t − L l ).
The first-generation TDI combination X (same for Y or Z) is calculated as the difference between two Michelson-type responses.Each Michelson-type response consists of an optical path with 4 single-arm responses.Each single-arm response introduces a delay corresponding to its arm length along the optical path.Consequently, there are 0, 1, 2, and 3 accumulated indices for the delays L l in the y slr,L l of Equation ( 6), respectively, and the corresponding signs follow the same rule as the links l.Additionally, we can obtain the mutually independent noise TDI combinations A, E, and T by linearly combining the TDI combinations X, Y, and Z as follows: In this paper, we focus on the data analysis method for an individual EMRI source.As a result, the corresponding data model of each combination I is described by where d I represents the TDI combination I, with I ∈ {A, E, T}; h I denotes the single EMRI signal; and n I represents the purely instrumental noise for simplicity.We concentrate solely on the TDI combinations A and E because the TDI combination T is less sensitive to GWs, which aligns with the treatment employed by numerous other studies.

Noise Model and Signal-to-Noise Ratio
We utilize the identical PSD model of TDI combinations A and E of the first generation, as provided in [81]: where ω is the angular frequency of gravitational waves, f = ω 2π is the corresponding frequency in Hz, and L is the constant arm length whose value is 2.5 × 10 9 m in the current design of LISA.The acceleration noise S Acc and the Instrumental Optical Metrology System noise S IMS under the noise model "SciRDv1" are defined in [82] as follows: Hz , ) 4 1 Hz . ( Having acquired the analytical expressions of the PSD, the inner product between two signals a and b is defined by where x denotes the Discrete Fourier Transform (DFT) of a time series x = (x 0 , x 1 , . . ., x N−1 ), and f k = k f s /N, k = 0, 1, . . ., N − 1, with f s being the sampling frequency.In terms of the inner product, the SNR of a signal can be defined as follows: It is also convenient to define the combined overlap between two signals h which is commonly used to assess the quality of the match between injected and estimated signals in mock data analysis [34].The overlap of the individual combination, either A or E, can be obtained by setting the other combination to zero.

Signal Model: EMRI Waveform
The AK waveform [33] includes 14 parameters, namely, µ, M, λ, S/M 2 , e 0 , ν 0 , θ s , ϕ s , θ k , ϕ k , ϕ 0 , γ 0 , α 0 , and D. The first six parameters represent the mass of the COs, the mass of the MBH, the inclination angle between the orbital angular momentum of the COs and the spin direction of the MBH, the spin magnitude of the MBH, the initial orbital eccentricity, and the initial orbital frequency.These parameters contribute to the orbital dynamics of EMRI sources.The angles θ s and ϕ s denote the ecliptic colatitude and longitude of the source's sky location in the Solar System Barycenter (SSB) frame, while θ k and ϕ k represent the polar and azimuthal angles of the spin direction of the MBH in the SSB frame.Additionally, ϕ 0 , γ 0 , and α 0 correspond to the initial angles of orbital motion, pericenter precession, and Lense-Thirring precession, respectively.Finally, D represents the distance between the source and the SSB center.The polarization angle ψ is a constant in the static frame, as discussed in [44], and depends on θ s , ϕ s , θ k , and ϕ k .
The orbital dynamics in the AK waveform are described by the following set of ordinary differential equations (ODEs).These ODEs contain five quantities: ϕ, ν, γ, e, and α, where ν and e are the orbital frequency and the orbital eccentricity, respectively, and ϕ, γ, and α are the phases describing orbital motion, pericenter precession, and Lense-Thirring precession, respectively.
It is computationally expensive to solve the ODEs using a time interval of 15 s, which corresponds to the observational cadence of LISA.However, the slow evolution of the orbital parameters predicted for most EMRI sources allows us to use a larger cadence of 15,360 s when solving the ODEs.As suggested in [81], the fifth-order Cash-Karp Runge-Kutta ODEs solver [83] is used at the larger cadence, and the solutions are then interpolated to the desired cadence of 15 seconds.With the ODEs solutions at our disposal, we can now proceed to the calculation of the polarized waveforms.For each harmonic labeled as (n, 2, m), the following quantities in their polarized waveforms are time independent: (1) the amplitude factor A such as 1/D, (2) the initial phase Φ n2m 0 = nϕ 0 + 2 γ 0 + mα 0 , and (3) the time-independent amplitude A c,m [81], and the superscript c indicates that the quantity is an unknown constant.Therefore, the polarized waveforms can be expressed as follows: where the parameter set Θ contains 14 parameters; θ ′ denotes the 13 parameters excluding D; θ represents the 8 parameters excluding D, ϕ 0 , γ 0 , α 0 , θ k , and ϕ k ; and the parameter set θ ′′ includes the 6 ODE-related parameters, µ, M, λ, S/M 2 , e 0 , and ν 0 .Thus, we have Based on the number of parameters that they depend on, h n2m +,× (Θ) and s n2m +,× (θ ′ ) denote the 14 and 13-dimensional polarized waveforms, respectively, while the time-varying components correspond to the term x n (θ ′′ ), where the power distributions among harmonics depend on the index n.In the case of the AK model, the range of values for m is from −2 to 2, resulting in a total of 5 harmonics for each n.Here, we adopt the same choice as our previous work [54] to select the loudest 10 harmonics by analyzing x n (θ ′′ ).Therefore, the choice is to pick up two values for n from the values 1, 2, 3, 4, 5 used in LDC.It is worth mentioning that additional harmonics could be considered once computational limitations, such as accessing sufficient cores or utilizing a Graphics Processing Units (GPUs) code, are overcome.However, for the current study, we focus on the loudest 10 harmonics based on the cluster resources available to us.As shown in Table 1, the power distributions among harmonics indicate that the harmonics with n = 1 are considerably weaker compared to other harmonics with different values of n.Furthermore, as n increases (with n ≥ 2), the strength of the harmonics diminishes.This trend holds true for moderately eccentric sources, such as those with e 0 ≤ 0.5.Table 1 follows the same conventions as Table 1 in our previous paper [54], with two exceptions: (1) the harmonics indices become n, varying from 1 to 5, and (2) the power fraction is used instead of the SNR fraction, as their summation equals unity.Therefore, for moderately eccentric sources, the optimal choice for the loudest 10 harmonics would be those with n ∈ {2, 3}.It requires more attention to select the dominant harmonics for high eccentric sources, e.g., e 0 > 0.5, where the power distributions across harmonics exhibit greater fluctuations.
Table 1.Illustration of variation in the order of contributions of harmonics to the total power of an EMRI signal as a function of its parameters.Five typical signals are represented in the format of C/F, where C indicates the harmonics index, and F represents the corresponding power fraction of the signal.The row labeled as "power fraction" indicates the cumulative power of the top 10 harmonics.Signals in columns 3, 4, 5, and 6 utilize the LDC-1.2[81] parameters but modify only one parameter specified in the header row.

SNR Order (Descending)
The h I (Θ) is usually called template in matched filtering to distinguish it from the unknown and true signal encoded in the noisy data.In the Generalized Likelihood Ratio Test [46], the global maximum of the LLR L G and the corresponding location Θ, where are used for signal detection and parameter estimation, respectively.Analytically maximizing over A by ∂Λ(θ ′ , A)/∂A = 0 leads to with the maximizer being We call ρ(θ ′ ) the 13-dimensional LLR [44].Creating further nested levels in the maximization of ρ(θ ′ ) that separate out the time-independent parts provide reduced dimensionality LLRs, namely, 8-dimensional and 7-dimensional ones, as discussed below.

8-Dimensional LLR
By incorporating the polarized waveforms of the i-th harmonic in Equation ( 21) with the antenna patterns of arm l in Equation ( 1), we can obtain the corresponding strain response as follows: Here, the map for harmonics indices from (n, m) to i are n = ⌊(i − 1)/5⌋ + 1 and m = ((i − 1) mod 5) − 2, where i ranges from 1 to 25 in LDC.The linearity from strain responses to TDI responses for combination I leads to the same linear combination, because only the time-varying terms x I,i p (θ) are projected to the TDI delays, and the time-independent coefficients a i p , which absorb the parameters ϕ 0 , γ 0 , α 0 , θ k , and ϕ k , remain unchanged.
To apply this linear decomposition in Equation (30) to the inner products in the 13-dimensional LLR in Equation ( 27), we can express the inner products as follows: In our previous work [54], we introduced an approach in which the three initial angles ϕ 0 , γ 0 , and α 0 are separated from the remaining 10 parameters in Equation (27).This allows us to apply local maximization [84] over the three initial angles for a given point in the 10-dimensional parameter space and perform the search over the 10 parameters using PSO.In this paper, we extend the approach by employing local maximization [84] over the five parameters: θ k , ϕ k , ϕ 0 , γ 0 , and α 0 , using PSO for the remaining 8-dimensional search.
The following quantities, (d I x I,i p (θ)) and (x I,i p (θ) x I,j q (θ)) can be pre-calculated for each specific θ.This enables computationally efficient local maximization over the coefficients a i p , namely, over θ k , ϕ k , ϕ 0 , γ 0 , and α 0 .
The nature of the fitness function over the 5-dimensional subspace, consisting of θ k , ϕ k , ϕ 0 , γ 0 , and α 0 , is illustrated in Figure 1.The figure showcases the LLR (square root) landscape of a 2-dimensional slice of θ k and ϕ k (Figure 1a), as well as three randomly selected planes (Figure 1b-d) in the 3-dimensional subspace composed of ϕ 0 , γ 0 , and α 0 .This representation is valid for the specific location, although similar patterns are observed from the other locations as well.Given the presence of a fairly small number of local maxima with comparable or equal values, the local maximization approach is well-suited for handling this 5-dimensional subspace.To ensure that the global maximum is caught, we employed a total of 243 independent runs of a local maximizer starting from initial points distributed over a grid, with each angle in the 5-dimensional subspace enumerated from the 1-dimensional grid {0, 2π/3, 4π/3}, which are uniform spacings from 0 to 2π.The best-fit 5-dimensional location is determined from the run that returns the highest value.

7-Dimensional LLR
The initial orbital frequency, namely, ν 0 , corresponds to the moment t 0 at which the EMRI signal is captured by the detector, thus its varying results in a uniform shift of time labels to all the harmonics of the signal.As discussed in [47], the corresponding shift of the time label can be numerically maximized in two ways for an arbitrary harmonic, denoted as x here.The first is a phase rotation in the frequency domain: where n denotes the number of the shift and ∆t represents the observational cadence.The inverse Fast Fourier Transform of the term x( f k )e i2π f k n∆t , which rotates the x( f k ) by the same amount of n∆t at each f k , returns the delayed term x(t − n∆t).For the same shift, the second is a straightforward lag sliding in the time domain as follows: (x 0 , x 1 , . . ., x N−1 ) n −→ (x n , x n+1 , . . ., x N−1 , 0, . . ., 0) , (33) where the zero paddings at the end of the shifted signal cover n zeros.The detector noise in the low-frequency region is usually large; as a result, a fiducial ν 0 , e.g., 1 mHz, can be determined through a pre-analysis of the detector's features, which indicates that the detector has reached a level of sensitivity to detect the GWs of EMRI signals starting from the chosen fiducial ν 0 .Therefore, the 8-dimensional TDI responses x I,i p (θ) in Equation ( 30) could be calculated by running forward ODEs using θ with its initial ν 0 specified as the selected fiducial value, and the initial e 0 being one of the parameters for matched filtering.We can then systematically shift the x I,i p (θ) lag-by-lag starting from the lag of the fiducial ν 0 until the 8-dimensional LLR maximum is achieved.The corresponding lag provides the best-fit estimation of e 0 and ν 0 .Here, we set the number of shifts to 11 for computational limitations.
Figure 2 illustrates the 8-dimensional and the 7-dimensional LLRs which share the same values for parameter set θ \ {ν 0 } but vary ν 0 for the former and the fiducial ν 0 for the latter.The lag varies from −10 to 10 where the zero lag corresponds to the true lag of LDC ν 0 , 7.3804631408 × 10 −4 Hz.It can be observed that the 8-dimensional LLRs using the negative lags can be successfully mapped to the 7-dimensional LLRs with a well-fitted ν 0 by properly shifting the corresponding x I,i p (θ).This is possible because the total 11 shifts can cover the zero lag anyway, whereas the positive lags fail to locate the zero lag due to the rightward shift of x I,i p (θ).In this paper, the lag of the fiducial ν 0 is determined by considering 4 lags ahead of the LDC's true lag; thus, the corresponding value is 7.3804587134 × 10 −4 Hz.In order to accurately capture unknown EMRI signals, it would generally be necessary to fit more lags.However, due to the computational expense of the shifting operations for x I,i p (θ) and the evaluations of the 8-dimensional LLRs by using the current code, only 11 lag-by-lag shifts are utilized, enumerating lags from −4 to 6 as illustrated in Figure 2.This setting ensures the scanning of the true lag, and it is used to demonstrate the functionality of the 7-dimensional LLR.In future works, we plan to address these computational challenges by implementing a GPU-accelerated code, which will allow for the exploration of additional lags.

Particle Swarm Optimization
As discussed earlier, the search using reduced dimensional likelihoods involves the following steps.First, the distance D in the 14-dimensional LLR in Equation ( 23) is analytically maximized.Next, the local maximization over the five angles θ k , ϕ k , ϕ 0 , γ 0 , and α 0 is carried out using the Simplex algorithm of Nelder and Mead [84].Finally, the remaining parameters in the set θ (8-dimensional search), or θ excluding ν 0 (7-dimensional search), are numerically maximized by PSO.In this chapter, we briefly describe the PSO algorithm [50,51,79].
Given the fitness function f (x), where x is defined in R M , the optimization problem can be stated as follows: The best location, x * , refers to the point in the search space D that yields the highest fitness value, represented as f (x * ); M is the dimension of the parameter space for f (x).Locating the primary peak of a multimodal fitness function can be challenging.The PSO algorithm, which is utilized in this paper as a global maximizer, is a suitable approach to addressing such challenges.Successful applications of PSO in handling similar issues are discussed in Section 1.It should be noted that in our case, the fitness functions are the 8-dimensional LLR discussed in Section 3.2 and the 7-dimensional LLR discussed in Section 3.3.PSO consists of multiple agents known as particles.Each particle updates its position by considering the information from both itself and its neighboring particles at each iteration.The algorithm aims to converge towards the global maximum, which corresponds to the primary peak of the fitness function within the search space, by utilizing a balance between global exploration and local exploitation.Such balance typically results in the good performance of a PSO search.However, finding the right balance requires tuning the related parameters, which is problem-specific.One of the key advantages of the PSO algorithm is that it requires only a few tunable parameters, namely, the number of iterations N iter and the number of independent runs N runs of PSO.If the probability that an individual PSO fails to locate the primary peak of the fitness function is denoted as p, then the probability that at least one search from N runs independent PSO searches, using different random seeds, succeeds in locating the primary peak is given by 1 − p N runs .This probability approaches unity exponentially fast with N runs .Therefore, multiple independent runs are a quick and easy way to significantly enhance the performance of a PSO-based search.It is recommended to start with N runs in the range of 6∼12 and to set N iter to 2000, as discussed in [79].These values can be adjusted based on the specific fitness function being used.The actual values used in this paper are described in Section 5.For more detailed information on an objective strategy for tuning PSO parameters, refer to [55].
The PSO dynamics of the i-th particle in the swarm is described by two equations as follows: where t represents an iteration, and x i (t) and x i (t + 1) denote the respective positions before and after the update.v i (t + 1) represents the amount of positional increment, referred to as velocity, while v j i (t + 1) is the corresponding projection component for the j-th parameter.The quantities x j i (t) and p j i (t) represent the current location and personal best (pbest) location of the j-th parameter, while g j (t) represents the global best (gbest) location among all particles of the j-th parameter.Equation (37) provides the key feature of a PSO update.The first term represents the influence of the momentum of the i-th particle with ω being the inertia weight.The second and third terms represent the acceleration effects, where the former considers the influence of the particle itself and the latter represents the influence from neighboring particles, with c 1 and c 2 being the acceleration coefficients.The randomness of the PSO algorithm arises from the utilization of random variables r 1 and r 2 , which are drawn from a uniform distribution between 0 and 1.The locations of pbest and gbest are updated following the rules below: The typical settings for PSO are as follows: (1) c 1 = c 2 = 2; (2) linearly decreasing inertia weight ω over iterations; (3) constraining the velocity by a given parameter, referred to as the maximum velocity, V max , whose value is usually 0.2, but 0.5 is used here to strengthen the search ability of PSO, such that −V max ≤ v j i (t) ≤ V max for all iterations and particles; (4) randomly generating initial positions and velocities for all particles; and (5) setting the number of particles N p in the swarm to N p = 40.The "let-them-fly" boundary condition is used, where the position and velocity of a particle remain unchanged, and a fitness value of −∞ is assigned once the particle leaves the search space.As a result, the actual number of fitness function (likelihood) evaluations for an individual PSO search would be smaller than the value of N iter • N p .
To enhance the exploitation capability of PSO, particularly for multimodal fitness functions, a variation called local best (lbest) PSO [51] is proposed as an improvement over the gbest PSO.In the lbest PSO, for each particle i, a smaller swarm is utilized to determine the lbest position denoted as p local,i (t) and the corresponding fitness value f (p local,i (t)).These values are then used to replace the gbest position g j (t), g(t + 1) in Equation ( 37) and the corresponding fitness value f (g(t)) in Equation (39).The typical configuration for the smaller swarm surrounding the i-th particle is a ring structure consisting of three particles, whose indices are given by N i = i − 1, i, i + 1, with the first and last particle connected in a circular manner.It is worth noting that the lbest PSO reduces to the gbest PSO when the ring includes all the particles.The selection of the fitness value for the lbest of the i-th particle follows the criteria shown below: Here, a more comprehensive exploitation is achieved by slower convergence in the lbest PSO, thus making it more computationally expensive than gbest PSO.

Results
In this study, we utilized 0.5 years of data containing a single EMRI signal with the same source parameters as LDC-1.2[81], except for a shorter distance D of 1.535300 Gpc, resulting in an SNR value of 50 for the injected signal.This SNR value has been widely used as a benchmark for 0.5 years signals in recent studies [30,31,43].While our search method is not inherently restricted to a shorter data length, constraints on computational resources and a pending GPU-acceleration of our code sets the above limit on the data length.The noise realization used in our analysis is obtained by subtracting the signal from the data, with both provided in LDC-1.2 [81], ensuring that our simulated data share the same characteristics as the LDC data but with a scaling of a shorter duration and a higher SNR.In Figure 3, the spectra of the injected signal and the simulated data for TDI combinations A and E are displayed, revealing the relatively weak nature of the injected signal compared to the simulated data.The values of the source parameters and the width of the search ranges used for the 8-dimensional and the 7-dimensional searches are presented in Table 2.The Fisher Information Matrix (FIM) σ represents the estimation error of the Cramer-Rao Lower Bound (CRLB) for each parameter at an SNR of 50, evaluated at the injected source parameters.The injected signal parameters are also called the true ones in the following analysis.We set the tunable hyperparameters for PSO as follows: N runs is set to 6, and N iter is set to 15,000 for the 8-dimensional searches, 20,000 for the first two 7-dimensional searches, and 25,000 for the remaining four 7-dimensional searches.Due to limited computational resources, the 6 independent searches have to be carried out serially.Due to the presence of noise in the data, both PSO and local maximization are expected to find best-fit fitness values that are higher than that at the true location, which are called a successful search.In order to further reduce computational costs, the searches are terminated once a successful search occurs.Consequently, the actual N runs is 4 for the 8-dimensional searches and 6 for the 7-dimensional searches.Table 2.The injected source parameters and range width used in our search.Currently, the location of the injected signal is set at the center of the given range.We leave a more general search, with injected signals placed non-centrally in the search space, to future work.The value of 0.3456 (7D) is the corresponding orbital frequency difference between lag 6 and lag −4 relative to its σ value; see more details in Section 3.3.The results obtained from the 8-dimensional and the 7-dimensional searches are summarized in Table 3 and Table 4, respectively.We report the square roots of the best-fit fitness values from each PSO search, which provide the estimated SNRs.Additional details regarding Tables 3 and 4 are provided below.

Parameters
1.The 4-th PSO in the 8-dimensional searches is successful as indicated by the estimated SNR shown in bold.However, no similar successful search is observed in the 7dimensional searches.
2. Parameter estimation errors are determined by subtracting the corresponding signal parameter's best-fit values from their true values.The six ODE-related parameters, namely, µ, M, λ, S/M 2 , e 0 , and ν 0 , are expressed relative to their respective FIM σ (evaluated at the true location).The estimation error for D is expressed relative to its true value itself.For the parameters θ s and ϕ s that represent the sky's location, we show the errors themselves.The sky's locations (θ s , ϕ s ) and (π − θ s , ϕ s + π) [26] contribute a degeneracy to the LLR in Equation (27).As a result, we use the asterisk ( * ) to show the corresponding errors after the degeneracy is taken care of.3. To consider the impact of weak harmonics beyond the loudest 10 on the estimation of the initial angles ϕ 0 , γ 0 and α 0 , as well as the angles θ k and ϕ k denoting the spin direction of the MBH, we conduct a rerun of the 5-dimensional local maximization using a waveform with all the 25 harmonics at the best-fit location from each PSO search, where the templates used in the search are restricted to the loudest 10 harmonics with n ∈ {2, 3}.The estimated angles are then utilized in the estimation of the distance D using Equation ( 28). 4. The recovered 14-dimensional parameters obtained previously are utilized in Equation (6) and Equation ( 7) to estimate the signal of A and E. The separate and combined overlaps between the injected and the estimated signal are quantified as ff A , ff E , and ff AE , respectively.
For the 8-dimensional PSO outputs shown in Table 3, it can be observed that the errors in the parameters µ, M, λ, and S/M 2 are ≈2σ, while the error in the parameter D is ≈1%.The errors in the sky's location are within ≈0.1 radians.However, the errors in the parameters e 0 and ν 0 vary significantly among different PSO outputs where the successful PSO output returns a minimum error of approximately ∼2σ with an overlap of 98%, and the other PSO outputs yield larger errors up to ∼8σ with smaller overlap values.These discrepancies are reasonable because e 0 and ν 0 are the initial values of the ODEs in Equation ( 20) which describe the orbital dynamics of the EMRI source (the other three initial angles do not determine the morphology of the ODEs' solution, only contributing a constant shift).Therefore, the phase match are more sensitive to these parameters, thus requiring longer iterations to converge.
For the 7-dimensional PSO outputs presented in Table 4, no successful search is found where the best-fit fitness value exceeds that at the true location.However, the 4th PSO output returns errors of approximately ∼1σ for the parameters µ, M, λ, S/M 2 , e 0 , and ν 0 ; ∼5% for the distance D; and ∼5% radians for the sky's location, with an overlap of 97%.This indicates that the signal is indeed captured, making it a successful search.The 3rd and 5th PSO outputs exhibit similar features to the first three PSO outputs in the 8-dimensional searches, where the larger errors in e 0 result in the smaller fitness values.The errors in ν 0 are the same for the 1st, 2nd, 3rd, and 5th PSO outputs, which may be attributed to the small range of 11 lags used to shift the 8-dimensional A and E template starting from the lag of the fiducial ν 0 .It should be noted that the fitting of ν 0 should cover more lags to obtain a more accurate estimation over ν 0 .
The successful PSO searches (the 4th PSO for both dimensions) demonstrate smaller errors in the parameters µ, M, λ, S/M 2 , e 0 , and ν 0 for the 7-dimensional search (∼1σ) compared to those for the 8-dimensional search (∼2σ).This suggests that the utilization of reduced dimensional LLR and increased iterations effectively reduce estimation errors, particularly for parameters that are related to the GW phase.The fact that all PSO runs obtained fitness values close to each other but at various offsets for estimated errors, ranging from 1σ to 8σ, illustrates the presence of large number of secondary peaks in the fitness function.

Discussion
We extended the previous work on a 10-dimensional LLR [54] search to an 8-dimensional and a 7-dimensional LLR search, in which progressively more parameters are locally maximized while the remaining are globally maximized using PSO.In the 8-dimensional search, we performed a 5-dimensional local maximization over the three initial angles ϕ 0 , γ 0 , and α 0 , and the angles θ k and ϕ k describing the spin direction of the MBH.In the 7-dimensional search, we used a fiducial value of ν 0 and applied a lag-by-lag shift to the 8-dimensional TDI responses to fit the true ν 0 .
The low estimated errors and the corresponding high overlap between the estimated and injected signals indicate that both the 8-dimensional and the 7-dimensional search work well within a wider search range.Our approach used the same search range widths for µ and M as the low mass-ratio sources prescribed in MLDC 1.3.4 and 1.3.5, and half the width of the MLDC value for the parameter S/M 2 [35].This serves as a guide for future hierarchical searches, as demonstrated in [43] using certain clustering techniques, for how much they need to narrow down the search ranges for parameters such as e 0 , ν 0 , and λ.
The larger errors observed for e 0 and ν 0 , compared to the smaller errors for other parameters in Tables 3 and 4, indicate that matched filtering is more sensitive to these two parameters.Thus, it becomes more difficult to accurately determine them.This insight inspires us to explore more advanced optimization algorithms, such as the Cooperative Coevolution Particle Swarm Optimization (CCPSO) [85], where only a subset of parameters are updated at each iteration to improve the optimization process.We expect that this approach will help PSO particles in escaping from secondary peaks and converging faster towards the primary peak in the parameter space of the fitness function.The additional computational cost can be mitigated by implementing a faster code using GPU acceleration.
In systems with higher eccentricity (e 0 > 0.5), the power distributions over harmonics become more erratic, which depend on the harmonics index n only for the 8-dimensional waveform.Consequently, the loudest 10 harmonics, with fixed values n belonging to the set {2, 3}, might not be the optimal choice any longer.Hence, we need to develop methods to select the dominant harmonics on the fly in such systems.
The existence of multiple secondary peaks can hinder the PSO update process, making it difficult for particles to converge towards the primary peak.As a result, larger estimation errors of the signal parameters may occur.To effectively tackle this issue, one possible approach is to employ the reduced dimensional LLR and increase the number of iterations for PSO searches.Nevertheless, the increased computational requirements necessitate the utilization of additional cores or GPUs in the code.
In our previous 10-dimensional searches, where computational costs were lower, we examined injected signals with SNR values of 50, 40, and 30 and a duration of 0.5 years.However, in this paper, our focus is on the computationally expensive 8-dimensional and 7-dimensional searches.Consequently, we only cover injected signals with an SNR of 50 and the same duration due to our limited computational resources.This SNR value of 50 is higher compared to the SNR of the LDC-1.2signal with the same duration.In future work, it is important to explore lower SNR values to assess the robustness of our method.We also plan to conduct additional tests, such as a random placement of the true location, wider search ranges, and longer data duration, to further validate our approach.
, and ⊗ represents (a ⊗ b) ij = a i b j for arbitrary vectors a and b.

Figure 1 .
Figure 1.Illustrations of structures of the 5-dimensional subspace evaluated at a location.The X and Y axes lie in these planes in the 3-dimensional subspace composed of ϕ 0 , γ 0 , and α 0 , and the range along both is [−π, π].The dot in panel a denotes the true values of the corresponding locations of the injected signal, which also labels the global maximum of the given landscape.

Figure 2 .
Figure 2. Illustrations of the square root of the LLRs over lags.The square roots of the 8-dimensional LLRs are in red and the corresponding 7-dimensional values are in blue, connected with a solid magenta line for each lag.

Figure 3 .
Figure 3. Magnitudes of the FFTs of the injected signal with SNR = 50 in blue and the corresponding data in red, where the TDI combination A is illustrated in the left panel and the TDI combination E is displayed in the right panel.See their definitions in Equation (7).

Table 3 .
PSO outputs of 8-dimensional searches.The square root of the fitness value at the true 8-dimensional location is 47.879594.Further details about the table are discussed in Section 5.

Table 4 .
PSO outputs of 7-dimensional searches.The square root of the fitness value at the true 7-dimensional location is 47.882605.Further details about the table are discussed in Section 5.