Article

Parameter Inference for Coalescing Massive Black Hole Binaries Using Deep Learning

1. School of Fundamental Physics and Mathematical Sciences, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
2. School of Physical Sciences, University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
3. International Centre for Theoretical Physics Asia-Pacific, University of Chinese Academy of Sciences, Beijing 100049, China
4. Taiji Laboratory for Gravitational Wave Universe, University of Chinese Academy of Sciences, Beijing 100049, China
5. CAS Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
Universe 2023, 9(9), 407; https://doi.org/10.3390/universe9090407
Submission received: 4 August 2023 / Revised: 3 September 2023 / Accepted: 4 September 2023 / Published: 6 September 2023
(This article belongs to the Special Issue Newest Results in Gravitational Waves and Machine Learning)

Abstract

In the 2030s, a new era of gravitational wave (GW) observations will dawn as multiple space-based GW detectors, such as the Laser Interferometer Space Antenna, Taiji, and TianQin, will open the millihertz window for GW astronomy. These detectors are poised to detect a multitude of GW signals emitted by different sources. It is a challenging task for GW data analysis to recover the parameters of these sources at a low computational cost. Generally, the matched filtering approach entails exploring an extensive parameter space for all resolvable sources, incurring a substantial cost owing to the generation of GW waveform templates. To alleviate the challenge, we make an attempt to perform parameter inference for coalescing massive black hole binaries (MBHBs) using deep learning. The model trained in this work has the capability to produce 50,000 posterior samples for the redshifted total mass, mass ratio, coalescence time, and luminosity distance of an MBHB in about twenty seconds. Our model can serve as an effective data pre-processing tool, reducing the volume of parameter space by more than four orders of magnitude for MBHB signals with a signal-to-noise ratio larger than 100. Moreover, the model exhibits robustness when handling input data that contain multiple MBHB signals.

1. Introduction

Several space-based gravitational wave (GW) detectors are expected to launch in the 2030s, including the Laser Interferometer Space Antenna (LISA) [1], Taiji [2], and TianQin [3], which will conduct all-sky surveys of GWs in the millihertz frequency band. One of the key focal points for these detectors is the GW signals emitted by coalescing massive black hole binaries (MBHBs) with total masses ranging from $10^5 M_\odot$ to $10^8 M_\odot$. Based on estimates from population models, these detectors are expected to observe more than one MBHB coalescence per year [4]. Recovering the parameters of these systems is a key task for space-based GW data analysis. By harnessing this information, we can trace the origin, growth, and merger history of MBHBs.
Typically, the matched filtering (MF) method [5] stands as the primary choice for analyzing a weak signal buried in noise. It has been widely used to infer the parameters of stellar-mass binary black holes (BBHs) in ground-based GW detection [6,7,8]. However, the method is computationally expensive due to the extensive generation of waveform templates during the stochastic exploration of parameter space. As the number of GW events detected by the LIGO/Virgo Collaboration continues to surge, the substantial cost of the MF method becomes increasingly problematic. The challenge further intensifies when applying the MF method to space-based GW detection, because the GW waveform templates are more complicated due to the motion of the detectors and the application of the time-delay interferometry (TDI) technique [9]. Moreover, GW signals emitted by some sources can be observed for days, weeks, or even months during the lifetime of a detector; this is the case for the MBHBs considered in this work. It is foreseeable that the strain data of the detector will contain a mixture of multiple GW signals from different sources. A global fit analysis has been proposed to recover the source parameters of all resolvable signals [10,11,12]. The method explores the joint parameter space of all sources, resulting in a considerable computational cost. As the launch of the space-based GW detectors approaches, there is an urgent need for novel techniques that can effectively mitigate the computational cost of parameter inference.
Currently, the application of deep learning to parameter inference has garnered significant attention within the GW community. Many researchers have applied deep learning models to produce posteriors for the source parameters of stellar-mass BBHs [13,14,15,16,17,18,19,20,21]. Some of these models achieve performance comparable to the MF approach on the GW events detected by LIGO/Virgo. Moreover, parameter inference for space-based GW detection with deep learning models has been considered in [22]. The authors implement a successful example of producing a two-dimensional posterior for MBHBs with component masses $m_{1,2} \in [1.25, 10] \times 10^5 M_\odot$ using a deep learning model. Their model is trained on the family of 2.5 PN TaylorF2 waveforms, which are characterized by five parameters: the masses and spins of the two black holes, as well as the signal-to-noise ratio (SNR) of the waveform. However, the detector responses of the TDI channels are not considered.
In this paper, we present an implementation of parameter inference for nonprecessing spinning MBHBs using a deep learning model based on the normalizing flow (NF) [23]. The NF architecture has demonstrated remarkable capability in parameter inference for ground-based GW sources [16,17,18,21]. In light of this success, we extend its application to coalescing MBHBs detected by future space-based GW detectors. Taking simulated LISA data as input, our model produces a reliable posterior for the redshifted total mass, mass ratio, coalescence time, and luminosity distance of MBHBs in the presence of instrumental noise. The model takes only about twenty seconds to draw 50,000 posterior samples for the four parameters, which is much faster than the MF approach; the prototype global fit analysis takes O(5) days to process a year of data using O(10^3) CPUs [12]. Although our model provides less precise parameter ranges than the MF approach, it serves as a valuable data pre-processing tool. Specifically, the model rapidly establishes a narrowed prior within tens of seconds, a cost that is negligible compared with the time-consuming stochastic sampling of the parameter space, and the computational cost of the MF can be reduced by adopting the narrowed prior. Furthermore, our model is robust against the presence of multiple MBHB signals in the strain data. This noteworthy characteristic makes it a potential candidate for integration with the global fit analysis, enabling the source parameters of all resolvable MBHB signals to be recovered at a lower computational cost. In this work, we use simulated LISA data to train our model; in principle, the approach can be easily extended to other space-based GW detectors.
The paper is organized as follows. In Section 2, we introduce the framework of our model. In Section 3, we illustrate the generation of simulated LISA data used to train and test the model. Next, Section 4 shows the test results of our model. Finally, we give a summary and discussions in Section 5.

2. Model

Our task is to obtain the posterior $p(\Theta|s)$ of the source parameters Θ from the strain data s. To achieve this, we construct a generative model conditioned on the strain data to draw samples of a random variable θ; in other words, the model is a sample generator for the conditional distribution $q(\theta|s)$. By tuning the learnable parameters of the model during training, the conditional distribution becomes an estimate of the posterior of the source parameters. Practically, we combine a convolutional neural network (CNN) [24] and an NF to construct the model. The framework of our model is shown in Figure 1, which can be divided into two parts.
The first part of the model is designed to extract key features from the strain data s, as indicated within the smaller black dashed box in Figure 1. This part is required to encode the features into a low-dimensional output, as a higher-dimensional output would increase the number of parameters of the subsequent network and consequently raise the cost of the training process. We explored several neural network structures commonly used in the GW community to implement this part and, after careful consideration, settled on the CNN owing to its final performance and training difficulty. Specifically, the CNN is composed of three convolutional layers and three fully connected layers. It encodes the key information of the strain data in a feature vector l that is fed into the rest of the model. The dimension of l is set to 256 in this work.
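The feature-extraction stage described above can be sketched in PyTorch as follows. The paper specifies only the overall shape (three convolutional layers, three fully connected layers, two TDI input channels, a 256-dimensional output); the kernel sizes, strides, channel counts, and pooling layer below are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the feature-extraction CNN: three convolutional
# layers followed by three fully connected layers, mapping the two TDI
# channels (A and E) to a 256-dimensional feature vector l. All layer
# widths and kernel sizes are assumptions; the paper does not list them.
class FeatureCNN(nn.Module):
    def __init__(self, n_channels=2, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=16, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=16, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32),            # fix length before flattening
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * 32, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, feature_dim),         # feature vector l, dim 256
        )

    def forward(self, s):
        return self.fc(self.conv(s).flatten(start_dim=1))

model = FeatureCNN()
l = model(torch.randn(4, 2, 98304))  # batch of 4 simulated strain segments
print(l.shape)  # torch.Size([4, 256])
```

The 98,304-sample input length matches the data described in Section 3; the feature vector l then conditions every block of the NF.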
The second part of the model is implemented using an NF, which describes the transformation of an initial probability distribution into another probability distribution. The transformation is defined as an invertible and smooth mapping $f_l : \mathbb{R}^D \to \mathbb{R}^D$, where D is the dimension of the sample space. In this paper, we set D = 4 for the redshifted total mass, mass ratio, coalescence time, and luminosity distance of the MBHB. Note that the NF is conditioned on the feature vector l. Applying the transformation to a random variable z with distribution $\pi(z)$, the resulting random variable $\theta = f_l(z)$ obeys the distribution [23]
$$q(\theta|s) = \pi(z)\left|\det\frac{\partial f_l}{\partial z}\right|^{-1}.$$
The transformation comprises a series of artificially defined invertible and smooth mappings, which can be written as
$$f_l(z) = f_{l,N} \circ f_{l,N-1} \circ \cdots \circ f_{l,1}(z).$$
Each mapping $f_{l,i}$ ($i = 1, \ldots, N$) represents a block of the NF, which corresponds to the bigger black dashed box in Figure 1. There are many types of NF corresponding to different designs of the mapping $f_{l,i}$. In this work, we adopt a neural spline flow [25], which is also used in some studies of ground-based GW data analysis [16,17,18]. The construction of the neural spline flow block is depicted within the bigger black dashed box in Figure 1. The input $z^{(i)} = f_{l,i} \circ \cdots \circ f_{l,1}(z)$ of the $(i+1)$th block first undergoes a random permutation through an intermediate layer implemented using the LU-decomposition approach [26]. This layer ensures that the components of $z^{(i)}$ can interact with one another. The output of the permutation layer is then split into two parts, which can be written as
$$z^{(i)} = \left[z^{(i)}_{1:d-1},\, z^{(i)}_{d:D}\right].$$
Next, $z^{(i)}$ is input to a monotonic rational quadratic (RQ) function $G_{l,i}$, which transforms the two parts of $z^{(i)}$ separately via
$$G_{l,i}\left(z^{(i)}\right)_j = \begin{cases} z^{(i)}_j & \text{if } j < d, \\ g_{\varphi_i}\left(z^{(i)}_j\right) & \text{if } j \ge d, \end{cases}$$
where $g_{\varphi_i}$ is an artificially defined function parameterized by $\varphi_i$, and the parameters are determined by l and $z^{(i)}_{1:d-1}$. Specifically, the function $g_{\varphi_i}$ maps the interval $[-L, L]$ to $[-L, L]$ and divides it into K bins (see more details in Figure 1 of [25]). The bins are delimited by $K+1$ knot coordinates $\{(x^{(k)}, y^{(k)})\}_{k=0}^{K}$ with $(x^{(0)}, y^{(0)}) = (-L, -L)$ and $(x^{(K)}, y^{(K)}) = (L, L)$. For $x \in [-L, L]$, the function in the kth bin is evaluated through a monotonically increasing function given by [25]
$$g^{(k)}_{\varphi_i}(\xi) = y^{(k)} + \frac{\left(y^{(k+1)} - y^{(k)}\right)\left[\tau^{(k)}\xi^2 + \delta^{(k)}\xi(1-\xi)\right]}{\tau^{(k)} + \left[\delta^{(k+1)} + \delta^{(k)} - 2\tau^{(k)}\right]\xi(1-\xi)},$$
$$\xi(x) = \frac{x - x^{(k)}}{x^{(k+1)} - x^{(k)}}, \qquad \tau^{(k)} = \frac{y^{(k+1)} - y^{(k)}}{x^{(k+1)} - x^{(k)}},$$
where $\delta^{(k)}$ denotes the derivative of the function at the knot $(x^{(k)}, y^{(k)})$, and the boundary derivatives $\{\delta^{(0)}, \delta^{(K)}\}$ are set to 1. The function $g_{\varphi_i}$ is therefore parameterized by $3K - 1$ parameters, which can be written as
$$\varphi_i = \left\{\varphi_i^{w}, \varphi_i^{h}, \varphi_i^{d}\right\}.$$
Here, the $2K$ parameters $\varphi_i^{w}$ and $\varphi_i^{h}$ determine the widths and heights of the K bins, respectively, and the remaining $K-1$ parameters $\varphi_i^{d}$ determine the interior derivatives $\{\delta^{(k)}\}_{k=1}^{K-1}$. For the neural spline flow used in this work, the parameters $\varphi_i$ are given by a residual network [27] that takes the feature vector l and the $z^{(i)}_{1:d-1}$ part as input. The residual network contains 14 residual blocks, and each block combines two fully connected hidden layers of 512 units. Moreover, we trained models with different numbers N of NF blocks and numbers K of bins, and finally chose N = 22 and K = 8 after comparing the performance of the models. As shown in Figure 1, the initial variable z is transformed into the variable θ after passing through all blocks of the NF and a permutation layer. Generally, one can choose a simple distribution $\pi(z)$ from which samples of z are convenient to draw. We take $\pi(z)$ to be the standard multivariate normal distribution in this work.
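The monotonic RQ spline transform at the heart of each block can be illustrated numerically. The sketch below evaluates the spline for a single input dimension with toy knot positions and unit derivatives; in the model these quantities come from the residual network conditioned on l and $z^{(i)}_{1:d-1}$.

```python
import numpy as np

# Minimal numeric sketch of the monotonic rational-quadratic spline g_phi
# defined above. Knot positions and derivatives here are toy values; in the
# model they are produced by the conditioning residual network.
def rq_spline(x, xs, ys, deltas):
    """Evaluate the RQ spline at x in [-L, L] with knots (xs, ys) and derivatives deltas."""
    k = np.searchsorted(xs, x, side="right") - 1
    k = np.clip(k, 0, len(xs) - 2)
    xi = (x - xs[k]) / (xs[k + 1] - xs[k])                   # xi(x)
    tau = (ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k])          # bin slope tau^(k)
    num = (ys[k + 1] - ys[k]) * (tau * xi**2 + deltas[k] * xi * (1 - xi))
    den = tau + (deltas[k + 1] + deltas[k] - 2 * tau) * xi * (1 - xi)
    return ys[k] + num / den

L_bound, K = 1.0, 8
xs = np.linspace(-L_bound, L_bound, K + 1)       # K bins, K + 1 knots
ys = np.sort(np.random.default_rng(0).uniform(-L_bound, L_bound, K + 1))
ys[0], ys[-1] = -L_bound, L_bound                # boundary knots fixed
deltas = np.ones(K + 1)                          # boundary derivatives set to 1

x = np.linspace(-L_bound, L_bound, 200)
y = rq_spline(x, xs, ys, deltas)
print(np.all(np.diff(y) >= 0))  # monotonically increasing: True
```

Because the slopes $\tau^{(k)}$ and derivatives $\delta^{(k)}$ are positive, the resulting map is strictly monotonic and hence invertible, which is what makes the change-of-variables density tractable.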
To make the distribution $q(\theta|s)$ described by the model close to the GW posterior $p(\Theta|s)$, we train the model using the expected value of the cross entropy between the two distributions, which is given as [16]
$$H(p, q) = -\int ds\, p(s) \int d\theta\, p(\theta|s) \log q(\theta|s).$$
Practically, we train the model by minimizing the cross entropy, which is a metric of the difference between two distributions. However, it is difficult to obtain the posterior $p(\theta|s)$ in the integral. The posterior can be converted to the likelihood using Bayes' theorem, and the cross entropy can then be written as
$$H(p, q) = -\int d\theta\, p(\theta) \int ds\, p(s|\theta) \log q(\theta|s).$$
In the training stage, the training dataset is divided into many mini-batches. $H(p, q)$ is calculated on every mini-batch, and the integral can be estimated using a Monte Carlo approximation [16,18]
$$H(p, q) \approx -\frac{1}{B} \sum_{n=1}^{B} \ln q(\theta_n | s_n),$$
where B denotes the batch size and $s_n$ represents the nth simulated strain data segment in the mini-batch. Note that $\theta_n$ is drawn from the prior of the source parameters and $s_n$ is generated by combining the GW waveform with noise, as explained in detail in Section 3. Furthermore, we use the Adam optimizer [28] to minimize the cross entropy stochastically over mini-batches. In this work, we implement the model based on PyTorch [29], nflows [30], and the codes shared in [31].
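The training objective above can be sketched end to end. To keep the example self-contained, a simple conditional Gaussian stands in for the normalizing flow (in the actual model, `log_prob` is provided by the nflows-based network); the loss is exactly the Monte Carlo cross-entropy estimate, minimized with Adam on mini-batches.

```python
import torch
import torch.nn as nn

# Sketch of the training objective: the Monte Carlo estimate of the cross
# entropy, H(p, q) ≈ -(1/B) Σ_n ln q(θ_n | s_n), minimized with Adam.
# A toy conditional Gaussian stands in for the normalizing flow here.
class ToyConditionalDensity(nn.Module):
    def __init__(self, s_dim=8, theta_dim=4):
        super().__init__()
        self.mean = nn.Linear(s_dim, theta_dim)       # mean conditioned on "strain" s
        self.log_std = nn.Parameter(torch.zeros(theta_dim))

    def log_prob(self, theta, context):
        dist = torch.distributions.Normal(self.mean(context), self.log_std.exp())
        return dist.log_prob(theta).sum(dim=-1)       # ln q(theta | s)

torch.manual_seed(0)
model = ToyConditionalDensity()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for step in range(200):
    s = torch.randn(512, 8)                           # toy stand-in for strain data
    theta = s[:, :4] + 0.1 * torch.randn(512, 4)      # toy "source parameters"
    loss = -model.log_prob(theta, context=s).mean()   # Monte Carlo cross entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[-1] < losses[0])  # the cross entropy decreases: True
```

Swapping the toy density for a conditional neural spline flow built with nflows leaves the training loop unchanged: the flow exposes the same `log_prob(theta, context)` interface.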

3. Datasets

The strain data s considered in this work is composed of the GW signal and detector noise, which can be written as
$$s = h(\Theta) + n,$$
where h(Θ) is a GW signal from an MBHB with parameters Θ and n is the detector noise. We simulate the GW signals used in the training and test stages using the IMRPhenomD waveforms [32,33], which model nonprecessing spinning inspiral–merger–ringdown waveforms. The GW waveforms are generated with random sampling over an 11-dimensional set of source parameters: redshifted total mass M, mass ratio q, coalescence time $t_c$, luminosity distance $d_L$, dimensionless spins $(s_{1z}, s_{2z})$, inclination angle ι, ecliptic latitude β, ecliptic longitude λ, reference phase $\phi_c$, and polarization angle ψ. The priors of the source parameters are listed in Table 1. The range of coalescence time covers nearly a whole year, and the range of luminosity distance is converted from the range of redshifts $z \in [0.5, 5]$ assuming a flat ΛCDM cosmology with $\Omega_m = 0.31$, $\Omega_\Lambda = 0.69$, and $H_0 = 67.74$ km s$^{-1}$ Mpc$^{-1}$ [34]. Moreover, to simulate the GW signal in real data, the IMRPhenomD waveforms should be modulated using the response function. For LISA, one of the strongest components of instrumental noise is the laser phase noise. It can be suppressed using the TDI technique, which combines measurements from different arms of LISA into a composite observable [9]. We choose the uncorrelated TDI observables A and E [35] as the two channels of the input data passed through our model. For the random noise n in the strain data, we generate Gaussian instrumental noise for the A and E channels using the power spectral density stated in the LISA Science Requirement Document [36]. In this work, the simulated GW signals are generated using the codes of the LISA Data Challenge (LDC) group [37], and the instrumental noise is generated using the PyCBC package [38].
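The Gaussian-noise component of the strain can be sketched as frequency-domain coloring. The analytic PSD below is a placeholder, not the LISA Science Requirement Document curve, and the normalization convention is one common choice for a one-sided PSD.

```python
import numpy as np

# Sketch of generating stationary Gaussian noise with a target one-sided PSD
# S(f), as done for the A and E channels. The 1/f^2 PSD is a placeholder
# stand-in for the actual instrumental-noise curve.
def colored_noise(psd_fn, n, dt, rng):
    freqs = np.fft.rfftfreq(n, dt)
    psd = psd_fn(np.maximum(freqs, freqs[1]))     # avoid evaluating at f = 0
    # Independent Gaussian real/imag parts with variance S(f) * n / (4 dt),
    # one common convention relating the FFT to a one-sided PSD.
    scale = np.sqrt(psd * n / (4 * dt))
    spectrum = scale * (rng.standard_normal(len(freqs))
                        + 1j * rng.standard_normal(len(freqs)))
    return np.fft.irfft(spectrum, n=n)

rng = np.random.default_rng(42)
n, dt = 98304, 5.0                                # input length and cadence used here
noise = colored_noise(lambda f: 1e-40 / f**2, n, dt, rng)
print(noise.shape)  # (98304,)
```

A training segment is then simply the sampled waveform response plus one such noise realization per channel, matching s = h(Θ) + n.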
For each TDI channel, we set the number of input data points to 98,304 and the sampling time interval to 5 s. During each epoch of the training process, we generate 60,000 GW signals by randomly drawing source parameters from the prior and combining the waveforms with random noise realizations. This approach effectively prevents overfitting and enhances the robustness of the model. As is standard practice in training deep learning models, the dataset is split into training and validation sets: 90% of the data are allocated for training, while the remainder is used for validation.

4. Results

In this work, we trained many models on an NVIDIA Tesla A40 GPU, experimenting with different learning rates, batch sizes, and hyperparameters. We evaluated the performance of these models based on their final validation loss; the test results reported in this paper correspond to the model with the lowest validation loss. That model was trained for about 5 days with a batch size of 512 over a total of 3200 epochs. The learning rate was set to 0.0001 at the beginning and gradually decreased via cosine annealing [39] during training. The model takes only about twenty seconds to produce 50,000 posterior samples of the redshifted total mass, mass ratio, coalescence time, and luminosity distance for an MBHB signal.
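The cosine-annealing schedule can be written in closed form. The paper states only the initial rate of 0.0001; the horizon `t_max` (set to the 3200 training epochs) and the floor `eta_min` below are assumptions for illustration.

```python
import numpy as np

# Sketch of the cosine-annealing learning-rate schedule: the rate starts at
# eta_max = 1e-4 and decays smoothly to eta_min over t_max epochs. t_max and
# eta_min are assumptions; the paper states only the initial rate.
def cosine_annealing(epoch, eta_max=1e-4, eta_min=0.0, t_max=3200):
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + np.cos(np.pi * epoch / t_max))

print(cosine_annealing(0))      # 1e-4 at the start
print(cosine_annealing(3200))   # ~0 at the final epoch
```

In practice the same schedule is available as `torch.optim.lr_scheduler.CosineAnnealingLR`, which updates the optimizer's rate each epoch.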
Generally, it is convenient to demonstrate the reliability of a generative model through the Kolmogorov–Smirnov (KS) test [40]. We conducted the KS test on our model using a dataset of 1000 simulated strain data segments. The GW signals injected in the strain data were generated with source parameters randomly drawn from the prior shown in Table 1. For each test signal, our model produced 20,000 samples to estimate the posterior for the four source parameters. The results are summarized in the P-P plot shown in Figure 2, with the p-values of the four parameters given in the upper left corner of the figure. The colored lines represent the empirical cumulative distribution function (CDF) of the number of times the true value of each parameter fell within a credible interval p, as a function of p. As shown in the figure, the empirical CDF lines lie close to the true CDF line (the diagonal black dashed line), which confirms that our model is a reliable estimator of the GW posterior.
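The calibration check behind such a P-P plot can be sketched on a toy problem: compute the percentile rank of each true value among its posterior samples; for a well-calibrated posterior the ranks are uniform on [0, 1], which a KS test against the uniform distribution quantifies. The conjugate Gaussian model below is a stand-in for the GW inference problem.

```python
import numpy as np
from scipy import stats

# Sketch of a P-P / KS calibration test: the rank of the truth among
# posterior samples should be uniform if the posterior is well calibrated.
# A toy conjugate-Gaussian problem stands in for the GW model.
rng = np.random.default_rng(1)
n_events, n_samples = 1000, 20000

ranks = np.empty(n_events)
for i in range(n_events):
    theta_true = rng.normal()                 # draw the truth from the N(0, 1) prior
    d = theta_true + rng.normal()             # noisy observation with unit noise
    # analytic posterior for this conjugate toy problem: N(d/2, 1/2)
    posterior = rng.normal(d / 2, np.sqrt(0.5), size=n_samples)
    ranks[i] = np.mean(posterior < theta_true)

result = stats.kstest(ranks, "uniform")
print(result.pvalue)
```

A large p-value (no evidence against uniformity) corresponds to the empirical CDF hugging the diagonal in Figure 2; a miscalibrated posterior (for example, one that is systematically too narrow) pushes the ranks toward 0 and 1 and drives the p-value down.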
Compared with the uncertainties of the source parameters given by the MF approach, our model provides rougher estimates. Nonetheless, the model can produce an estimated posterior in tens of seconds, which is valuable for data pre-processing. Based on the output of our model, we can determine a substantially narrowed prior relative to the initial prior used in the MF approach, and the additional time cost incurred by our model is negligible compared to the cost of the stochastic exploration of parameter space. Figure 3 shows the reduction in parameter-space volume achieved with the help of our model as a function of SNR. Specifically, we estimate the value from the 90% credible region of the posterior samples produced by our model in the KS test. The initial priors for M, q, and $d_L$ used in the MF approach are assumed to be the same as the priors in Table 1, and the initial prior for $t_c$ is assumed to span the input duration of our model. We average the estimated reductions in parameter-space volume within each SNR bin. As shown in Figure 3, our model provides a greater reduction for MBHBs with larger SNRs and is capable of reducing the parameter-space volume by more than four orders of magnitude when SNR > 100.
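One simple way to estimate such a volume reduction is as the product of the 90% credible-interval widths divided by the product of the prior widths. The prior ranges and the mock posterior below are illustrative placeholders standing in for Table 1 and the model output, not the paper's actual values.

```python
import numpy as np

# Sketch of a parameter-space volume-reduction estimate: the product over
# parameters of (90% credible-interval width) / (prior width). Priors and
# posterior samples here are illustrative placeholders.
def volume_reduction(samples, prior_ranges):
    widths = np.quantile(samples, 0.95, axis=0) - np.quantile(samples, 0.05, axis=0)
    prior_widths = np.array([hi - lo for lo, hi in prior_ranges])
    return float(np.prod(widths / prior_widths))

rng = np.random.default_rng(7)
# 50,000 mock posterior samples for (M, q, t_c, d_L), tightly peaked
samples = rng.normal(loc=[1.0e6, 0.5, 1.0e7, 40.0],
                     scale=[1.0e4, 0.01, 100.0, 1.0], size=(50000, 4))
priors = [(1.0e5, 1.0e8), (0.1, 1.0), (0.0, 3.0e7), (3.0, 50.0)]  # placeholder ranges
print(volume_reduction(samples, priors) < 1e-4)  # over four orders of magnitude: True
```

For a high-SNR signal the coalescence-time posterior in particular is far narrower than the year-long prior, which is why the reduction grows with SNR in Figure 3.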
Considering that a GW signal emitted by an MBHB can evolve in the sensitive frequency band of LISA for days, months, or even years, it is foreseeable that multiple MBHB signals will be superimposed in real data. Although our model is trained on strain data containing a single MBHB signal, it also exhibits robustness when dealing with data that contain multiple signals. The LDC group simulated one year of LISA data comprising a mixture of 15 MBHB signals, with these systems undergoing mergers at different times throughout the year [41]. We combined the mixture signal with instrumental noise to test the performance of our model on multiple signals. Note that eight of the MBHBs possessed parameters falling within the priors listed in Table 1, whereas the remaining MBHBs had luminosity distances beyond the specified range. Thus, the test of our model focused on the eight MBHBs, which are labeled by LDC with the numbers {1, 2, 3, 8, 9, 10, 13, 14, 15}. As a comparison, we additionally generated eight GW signals individually using the parameters provided by LDC for these systems and combined each signal with its own noise realization to test the performance of our model. Due to the input length limitations of the model, the merger phase of MBHB 2 always appears in input data together with the merger phase of another MBHB, and the model cannot produce reliable samples for input data containing multiple merger phases. For the other seven MBHBs, our model produced similar posteriors in both the single-signal and multiple-signal cases. Figure 4 shows the estimated posteriors of MBHB 14 produced by the model in these two scenarios. MBHB 14 is the first to evolve to the merger phase, which means its GW signal overlaps with the inspiral phases of all the other GW signals. As shown in Figure 4, the marginalized one- and two-dimensional posterior distributions of the multiple-signal case (blue) closely align with those of the single-signal case (orange).
Our model has the capability to adapt effectively when multiple MBHB signals are observed during the lifetime of LISA. It is expected that the model can contribute to the global fit analysis by determining a reduced parameter space.

5. Summary and Discussions

In this paper, we presented an implementation of parameter inference for coalescing MBHBs using deep learning. While training a deep learning model may require some time (several days in this study), the trained model can produce a large number of posterior samples of the source parameters in a very short time: ours generates 50,000 samples of the four source parameters in about twenty seconds. Due to the complexity of the strain data arising from the motion of space-based detectors and the TDI technique, the current model cannot match the precision of parameter inference achieved by the MF approach. However, it remains feasible to treat the model as a data pre-processing tool that reduces the parameter-space volume by over four orders of magnitude for MBHB signals with SNR > 100, which accordingly reduces the computational cost of the follow-up exploration of the parameter space.
In reality, the real data of space-based GW detection may contain many unexpected features, so the current simulated data cannot be completely consistent with future real-world scenarios. However, the generalization ability of deep learning enables it to work beyond the scope of the training data, making it adaptable to unseen variations and unexpected features that may arise in the future. Although our model is trained on data containing a single MBHB signal, it exhibits robustness when analyzing data containing multiple MBHB signals. In our test, the model yields similar posteriors in both scenarios, showcasing the potential of deep learning to be integrated with a global fit analysis of real LISA data. This combined approach holds promising prospects for handling numerous resolvable GW signals at a low computational cost.
The generalization capability of the deep learning model can be further extended. In this work, we trained the model on Gaussian instrumental noise. During real detection, however, other noise components will be present, such as the non-Gaussian and non-stationary foreground noise composed of tens of millions of GW signals from Galactic binaries, as well as data gaps and glitches [42]. Currently, our model has not been extended to analyze input data containing these components. They can be handled by adding these components to the training data and increasing the complexity of the neural network. Given the superior ability of deep learning in dealing with unknown data features, it holds great potential for further development in space-based GW data analysis.

Author Contributions

Conceptualization, W.R., H.W., C.L. and Z.G.; methodology, W.R., H.W. and Z.G.; software, W.R.; data curation, W.R.; investigation, W.R., H.W., C.L. and Z.G.; writing—original draft preparation, W.R.; writing—review and editing, W.R., H.W., C.L. and Z.G. All authors have read and agreed to the published version of the manuscript.

Funding

WHR is supported by the National Natural Science Foundation of China Grant No. 12247140. CL is supported by the National Natural Science Foundation of China Grant No. 12147132. ZKG is supported in part by the National Natural Science Foundation of China Grants No. 12075297 and No. 12235019. HW is supported by the National Key Research and Development Program of China Grant No. 2021YFC2203004, the National Natural Science Foundation of China Grant No. 12147103, 12247187 and the Fundamental Research Funds for the Central Universities.

Data Availability Statement

This study used the dataset generated by the LDC group, which can be obtained from [41].

Acknowledgments

We thank the LDC group for providing the software and datasets. We also thank the authors of [16] for their open-source codes.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Amaro-Seoane, P.; Audley, H.; Babak, S.; Baker, J.; Barausse, E.; Bender, P.; Berti, E.; Binetruy, P.; Born, M.; Bortoluzzi, D.; et al. Laser Interferometer Space Antenna. arXiv 2017, arXiv:1702.00786. [Google Scholar]
  2. Hu, W.-R.; Wu, Y.-L. The Taiji Program in Space for Gravitational Wave Physics and the Nature of Gravity. Natl. Sci. Rev. 2017, 4, 685–686. [Google Scholar] [CrossRef]
  3. Luo, J.; Chen, L.-S.; Duan, H.-Z.; Gong, Y.-G.; Hu, S.; Ji, J.; Liu, Q.; Mei, J.; Milyukov, V.; Sazhin, M.; et al. TianQin: A Space-borne Gravitational Wave Detector. Class. Quantum Gravity 2016, 33, 035010. [Google Scholar] [CrossRef]
  4. Klein, A.; Barausse, E.; Sesana, A.; Petiteau, A.; Berti, E.; Babak, S.; Gair, J.; Aoudia, S.; Hinder, I.; Ohme, F.; et al. Science with the Space-based Interferometer eLISA: Supermassive Black Hole Binaries. Phys. Rev. D 2016, 93, 024003. [Google Scholar] [CrossRef]
  5. Owen, B.J.; Sathyaprakash, B.S. Matched Filtering of Gravitational Waves from Inspiraling Compact Binaries: Computational Cost and Template Placement. Phys. Rev. D 1999, 60, 022002. [Google Scholar] [CrossRef]
  6. Allen, B.; Anderson, W.G.; Brady, P.R.; Brown, D.A.; Creighton, J.D. FINDCHIRP: An Algorithm for Detection of Gravitational Waves from Inspiraling Compact Binaries. Phys. Rev. D 2012, 85, 122006. [Google Scholar] [CrossRef]
  7. Abbott, B.; Jawahar, S.; Lockerbie, N.; Tokmakov, K. GW150914: First Results from the Search for Binary Black Hole Coalescence with Advanced LIGO. Phys. Rev. D 2016, 93, 122003. [Google Scholar] [CrossRef]
  8. Abbott, B.P.; Abbott, R.; Abbott, T.; Abernathy, M.; Acernese, F.; Ackley, K.; Adams, C.; Adams, T.; Addesso, P.; Adhikari, R.; et al. GW151226: Observation of Gravitational Waves from a 22-Solar-Mass Binary Black Hole Coalescence. Phys. Rev. Lett. 2016, 116, 241103. [Google Scholar] [CrossRef]
  9. Tinto, M.; Dhurandhar, S.V. Time-Delay Interferometry. Living Rev. Relativ. 2014, 17, 1. [Google Scholar] [CrossRef]
  10. Cornish, N.J.; Crowder, J. LISA Data Analysis Using Markov Chain Monte Carlo Methods. Phys. Rev. D 2005, 72, 043005. [Google Scholar] [CrossRef]
  11. Littenberg, T.B.; Cornish, N.J.; Lackeos, K.; Robson, T. Global Analysis of the Gravitational Wave Signal from Galactic Binaries. Phys. Rev. D 2020, 101, 123021. [Google Scholar] [CrossRef]
  12. Littenberg, T.B.; Cornish, N.J. Prototype Global Analysis of LISA Data with Multiple Source Types. Phys. Rev. D 2023, 107, 063004. [Google Scholar] [CrossRef]
  13. George, D.; Huerta, E.A. Deep Learning for Real-time Gravitational Wave Detection and Parameter Estimation: Results with Advanced LIGO Data. Phys. Lett. B 2018, 778, 64. [Google Scholar] [CrossRef]
  14. Green, S.R.; Simpson, C.; Gair, J. Gravitational-wave Parameter Estimation with Autoregressive Neural Network Flows. Phys. Rev. D 2020, 102, 104057. [Google Scholar] [CrossRef]
  15. Krastev, P.G.; Gill, K.; Villar, V.A.; Berger, E. Detection and Parameter Estimation of Gravitational Waves from Binary Neutron-star Mergers in Real LIGO Data Using Deep Learning. Phys. Lett. B 2021, 815, 136161. [Google Scholar] [CrossRef]
  16. Green, S.R.; Gair, J. Complete Parameter Inference for GW150914 Using Deep Learning. Mach. Learn. Sci. Technol. 2021, 2, 03LT01. [Google Scholar] [CrossRef]
  17. Dax, M.; Green, S.R.; Gair, J.; Macke, J.H.; Buonanno, A.; Schölkopf, B. Real-time Gravitational Wave Science with Neural Posterior Estimation. Phys. Rev. Lett. 2021, 127, 241103. [Google Scholar] [CrossRef]
  18. Shen, H.; Huerta, E.; O’Shea, E.; Kumar, P.; Zhao, Z. Statistically-informed Deep Learning for Gravitational Wave Parameter Estimation. Mach. Learn. Sci. Technol. 2021, 3, 015007. [Google Scholar] [CrossRef]
  19. Schmidt, S.; Breschi, M.; Gamba, R.; Pagano, G.; Rettegno, P.; Riemenschneider, G.; Bernuzzi, S.; Nagar, A.; Del Pozzo, W. Machine Learning Gravitational Waves from Binary Black Hole Mergers. Phys. Rev. D 2021, 103, 043020. [Google Scholar] [CrossRef]
  20. Gabbard, H.; Messenger, C.; Heng, I.S.; Tonolini, F.; Murray-Smith, R. Bayesian Parameter Estimation Using Conditional Variational Autoencoders for Gravitational-wave Astronomy. Nat. Phys. 2022, 18, 112. [Google Scholar] [CrossRef]
  21. Langendorff, J.; Kolmus, A.; Janquart, J.; Van Den Broeck, C. Normalizing Flows as an Avenue to Studying Overlapping Gravitational Wave Signals. Phys. Rev. Lett. 2023, 130, 171402. [Google Scholar] [CrossRef]
  22. Chua, A.J.; Vallisneri, M. Learning Bayesian Posteriors with Neural Networks for Gravitational-wave Inference. Phys. Rev. Lett. 2020, 124, 041102. [Google Scholar] [CrossRef]
  23. Rezende, D.; Mohamed, S. Variational Inference with Normalizing Flows. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1530–1538. [Google Scholar]
  24. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278. [Google Scholar] [CrossRef]
  25. Durkan, C.; Bekasov, A.; Murray, I.; Papamakarios, G. Neural Spline Flows. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 7509–7520. [Google Scholar]
  26. Oliva, J.; Dubey, A.; Zaheer, M.; Poczos, B.; Salakhutdinov, R.; Xing, E.; Schneider, J. Transformation autoregressive networks. In Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; pp. 3895–3904. [Google Scholar]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  28. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  29. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035. [Google Scholar]
30. Durkan, C.; Bekasov, A.; Murray, I.; Papamakarios, G. nflows: Normalizing Flows in PyTorch; Zenodo, 2020; record 4296287. [Google Scholar]
  31. lfigw: Likelihood-Free Inference for Gravitational Waves. Available online: https://github.com/stephengreen/lfi-gw (accessed on 1 November 2022).
  32. Husa, S.; Khan, S.; Hannam, M.; Pürrer, M.; Ohme, F.; Forteza, X.J.; Bohé, A. Frequency-domain Gravitational Waves from Nonprecessing Black-hole Binaries. I. New Numerical Waveforms and Anatomy of the Signal. Phys. Rev. D 2016, 93, 044006. [Google Scholar] [CrossRef]
  33. Khan, S.; Husa, S.; Hannam, M.; Ohme, F.; Pürrer, M.; Forteza, X.J.; Bohé, A. Frequency-domain Gravitational Waves from Nonprecessing Black-hole Binaries. II. A Phenomenological Model for the Advanced Detector Era. Phys. Rev. D 2016, 93, 044007. [Google Scholar] [CrossRef]
  34. Ade, P.A.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A.; Barreiro, R.; Bartlett, J.; Bartolo, N.; et al. Planck 2015 Results. XIII. Cosmological Parameters. Astron. Astrophys. 2016, 594, A13. [Google Scholar]
  35. Prince, T.A.; Tinto, M.; Larson, S.L.; Armstrong, J. LISA Optimal Sensitivity. Phys. Rev. D 2002, 66, 122002. [Google Scholar] [CrossRef]
  36. The LISA Science Study Team, ESA-L3-EST-SCI-RS-001. 2018. Available online: https://atrium.in2p3.fr/f5a78d3e-9e19-47a5-aa11-51c81d370f5f (accessed on 1 November 2022).
  37. Babak, S.; Petiteau, A. LISA Data Challenge Manual. 2018. Available online: https://lisa-ldc.lal.in2p3.fr/static/data/pdf/LDC-manual-002.pdf (accessed on 1 November 2022).
38. Nitz, A.; Harry, I.; Brown, D.; Biwer, C.M.; Willis, J.; Canton, T.D.; Capano, C.; Dent, T.; Pekowsky, L.; Williamson, A.R.; et al. PyCBC Software. 2021. Available online: https://github.com/gwastro/pycbc (accessed on 1 November 2022).
39. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
  40. Massey, F.J., Jr. The Kolmogorov-Smirnov Test for Goodness of Fit. J. Am. Stat. Assoc. 1951, 46, 68. [Google Scholar] [CrossRef]
  41. LISA Consortium’s LDC Working Group, LISA Data Challenges. 2019. Available online: https://lisa-ldc.lal.in2p3.fr (accessed on 1 November 2022).
  42. Cornish, N.J. Low Latency Detection of Massive Black Hole Binaries. Phys. Rev. D 2022, 105, 044007. [Google Scholar] [CrossRef]
Figure 1. Framework of the neural network used in this work. The small black dashed box represents the CNN used to extract key features from the strain data s. The large black dashed box depicts one block of the NF; the network stacks 22 such blocks. The initial variable z is transformed into θ after passing through the whole neural network.
Figure 2. P-P plot for the redshifted total mass M (blue), mass ratio q (green), coalescence time t_c (orange), and luminosity distance d_L (red) of MBHBs. We generate a test dataset of 1000 simulated strain series to conduct the KS test on our model; the p-values are listed in the upper left corner. The model produces 20,000 samples for each test case. The colored lines denote the empirical CDFs provided by the model, and the black dashed line denotes the true CDF. The grey regions represent the 1σ, 2σ, and 3σ confidence bounds.
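The P-P construction behind Figure 2 can be reproduced from posterior samples alone. A minimal sketch (function and variable names are illustrative, not from the paper; it assumes one array of posterior draws and one injected value per test case):

```python
import numpy as np
from scipy import stats

def pp_curve(posterior_samples, truths):
    """For each injection, compute the fraction of posterior samples
    below the injected value; for a well-calibrated model these
    percentiles are uniform on [0, 1], so their empirical CDF
    follows the diagonal of the P-P plot."""
    percentiles = np.array([
        np.mean(samples < truth)
        for samples, truth in zip(posterior_samples, truths)
    ])
    x = np.sort(percentiles)               # empirical CDF, x-axis
    y = np.arange(1, len(x) + 1) / len(x)  # empirical CDF, y-axis
    # One-sample KS test of the percentiles against Uniform(0, 1).
    pvalue = stats.kstest(percentiles, "uniform").pvalue
    return x, y, pvalue

# Toy check with a trivially calibrated setup: the "posterior" equals
# the prior, so each truth's percentile is uniform by construction.
rng = np.random.default_rng(0)
truths = rng.standard_normal(1000)
posteriors = [rng.standard_normal(5000) for _ in truths]
x, y, p = pp_curve(posteriors, truths)
```

Plotting `y` against `x` for each parameter, together with the diagonal and its confidence bands, yields a figure of the same kind as Figure 2.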
Figure 3. Reduction in parameter-space volume as a function of SNR. The reduction is calculated from the 90% credible region of the posterior samples produced by our model in the KS test. The blue line denotes the average value within each SNR bin.
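The volume reduction of Figure 3 can be estimated by comparing the prior volume with the volume occupied by the credible region. A minimal sketch (names are illustrative; it approximates the 90% credible region by a box of one-dimensional 5%-95% credible intervals, which is a simplification of whatever region definition the paper uses):

```python
import numpy as np

def volume_reduction(samples, prior_bounds):
    """samples: (N, D) array of posterior draws.
    prior_bounds: list of (lo, hi) prior limits, one per parameter.
    Returns prior volume / credible-region volume, with the 90%
    credible region approximated by a box of 1D 5%-95% intervals."""
    lo, hi = np.percentile(samples, [5, 95], axis=0)
    credible_widths = hi - lo
    prior_widths = np.array([b[1] - b[0] for b in prior_bounds])
    return float(np.prod(prior_widths / credible_widths))

# Toy posterior: 4 tightly constrained parameters inside wide priors.
rng = np.random.default_rng(1)
samples = rng.normal(loc=0.0, scale=0.01, size=(50000, 4))
bounds = [(-1.0, 1.0)] * 4
r = volume_reduction(samples, bounds)  # of order 1e7 for this toy case
```

Because the per-parameter reductions multiply, even modest one-dimensional constraints compound into the four-orders-of-magnitude figure quoted in the abstract.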
Figure 4. Posterior distributions of the redshifted total mass M, mass ratio q, coalescence time t_c, and luminosity distance d_L produced by our model. The orange line denotes the result for a single MBHB signal, and the blue line denotes the result for the same signal overlapped with the inspiral phases of 14 other MBHB signals. The model produces 50,000 posterior samples in each case.
Table 1. Priors of the source parameters used in this work.

Parameter		Prior
M			LogUniform[10^6 M_⊙, 10^7 M_⊙]
q			Uniform[1, 5]
t_c			Uniform[3 d, 365 d]
d_L			Uniform[2910 Mpc, 47,312 Mpc]³
(s_1z, s_2z)		Uniform[−1, 1]
cos ι			Uniform[−1, 1]
sin β			Uniform[−1, 1]
λ			Uniform[0, 2π]
ϕ_c			Uniform[0, 2π]
ψ			Uniform[0, π]
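The priors in Table 1 can be drawn directly with NumPy. The sketch below assumes the bracketed exponent on the d_L row denotes a prior uniform in the cube of the luminosity distance (i.e., uniform in volume); variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000

M = 10.0 ** rng.uniform(6, 7, N)       # LogUniform[1e6, 1e7] M_sun
q = rng.uniform(1, 5, N)               # mass ratio
t_c = rng.uniform(3, 365, N)           # coalescence time in days
# Uniform in volume: sample d_L^3 uniformly, then take the cube root.
d_lo, d_hi = 2910.0, 47312.0           # Mpc
d_L = rng.uniform(d_lo**3, d_hi**3, N) ** (1.0 / 3.0)
s1z, s2z = rng.uniform(-1, 1, (2, N))  # aligned spin components
cos_iota = rng.uniform(-1, 1, N)       # inclination
sin_beta = rng.uniform(-1, 1, N)       # ecliptic latitude
lam = rng.uniform(0, 2 * np.pi, N)     # ecliptic longitude
phi_c = rng.uniform(0, 2 * np.pi, N)   # coalescence phase
psi = rng.uniform(0, np.pi, N)         # polarization angle
```

Sampling in M, cos ι, and sin β (rather than log M, ι, and β) matches the table's convention of priors that are flat in the listed variable.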
Share and Cite

Ruan, W.; Wang, H.; Liu, C.; Guo, Z. Parameter Inference for Coalescing Massive Black Hole Binaries Using Deep Learning. Universe 2023, 9, 407. https://doi.org/10.3390/universe9090407
